Improved Integral Sliding Mode Control for AUV Trajectory Tracking Based on Deep Reinforcement Learning

Zhang, Ruizhi; Wang, Zongsheng; Li, Hongyu; Ma, Weizhuang; Liu, Xiaodong; Liu, Jia

doi:10.3390/jmse14010103



by Ruizhi Zhang, Zongsheng Wang *, Hongyu Li, Weizhuang Ma, Xiaodong Liu and Jia Liu

College of Ocean Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2026, 14(1), 103; https://doi.org/10.3390/jmse14010103

Submission received: 18 December 2025 / Revised: 31 December 2025 / Accepted: 2 January 2026 / Published: 4 January 2026

(This article belongs to the Section Ocean Engineering)


Abstract

Trajectory tracking control of autonomous underwater vehicles (AUVs) faces challenges in complex nearshore environments due to model uncertainties and external environmental disturbances. Traditional control methods often rely on expert knowledge and manual parameter tuning, which limit the adaptability of AUVs to structural variations and changing operating conditions. Moreover, inappropriate parameter selection in conventional sliding mode control may induce high-frequency chattering, degrading control accuracy and operational efficiency. To address these issues, this paper proposes an improved integral sliding mode control (IISMC) strategy integrated with deep reinforcement learning (DRL). In the proposed framework, DRL is employed to adaptively tune key controller parameters, including the sliding surface coefficients and reaching law gains, while preserving the analytical structure of the IISMC scheme. This adaptive tuning mechanism effectively suppresses chattering and enhances robustness against uncertainties and disturbances. Numerical simulation results demonstrate that the proposed DRL-assisted IISMC method achieves improved disturbance rejection capability, higher trajectory tracking accuracy, and smoother control performance compared with conventional sliding mode control (SMC) approaches under identical operating conditions.

Keywords: autonomous underwater vehicle (AUV); trajectory tracking; improved sliding mode control; deep reinforcement learning

1. Introduction

As a key component of global trade and marine activities, the ocean places increasing demands on the performance of underwater exploration technologies. Autonomous underwater vehicles (AUVs), which act as primary platforms for carrying sensing and inspection equipment, play an important role in a wide range of underwater engineering tasks. These applications require AUVs to exhibit high levels of responsiveness, accuracy, and stability in motion control.

This study focuses on trajectory tracking control, which represents a fundamental problem in underwater vehicle motion control. Unlike simple position regulation, trajectory tracking is time-dependent and requires the vehicle to accurately follow a predefined time-varying path in both position and velocity. After deployment from a mothership or docking station, AUVs are typically expected to follow prescribed trajectories to perform underwater exploration missions. However, variations in vehicle structures and onboard payload configurations introduce internal modeling uncertainties. In addition, complex hydrodynamic disturbances, such as turbulence and vortices commonly encountered in ports and coastal environments, further challenge control system performance. Consequently, a variety of control methodologies have been investigated for AUV motion control, including robust control, adaptive control, backstepping control, and intelligent control approaches.

Han et al. [1] proposed an adaptive fuzzy backstepping control method that integrates a command filter with an adaptive fuzzy scheme. The method constructs virtual control functions based on quantized state feedback to handle discontinuous states and model uncertainties. However, its recursive design introduces considerable computational complexity and relies heavily on expert knowledge for controller tuning. Du et al. [2] presented an adaptive backstepping sliding mode control strategy employing a condition-based adaptation mechanism, in which the control gain is updated only when necessary to enhance system stability. This approach exhibits adaptability to coupled nonlinearities, unknown system parameters, uncertain disturbances, and input saturation. Tian et al. [3] developed a trajectory tracking framework combining Bayesian optimization with nonlinear model predictive control (NMPC). Although improved tracking performance was reported, the required system linearization results in high computational burden and still depends on accurate dynamic modeling. Among these approaches, sliding mode control (SMC) is particularly attractive due to its low dependence on precise system models, inherent robustness against disturbances, and guaranteed convergence properties. Owing to these advantages, SMC is adopted in this study as the baseline control strategy for addressing AUV trajectory tracking problems under high robustness requirements.

Sliding mode control is characterized by fast dynamic response, rapid convergence, and a clearly defined control structure, which makes it suitable for practical engineering applications. Hu et al. [4] introduced an adaptive SMC scheme with predefined performance by designing a sliding surface that explicitly incorporates desired closed-loop behavior, while an adaptive law was employed to estimate external disturbances and uncertain parameters. Nevertheless, predefined performance constraints may become inadequate when the system is subject to sudden or unexpected environmental variations. Luo and Liu [5] employed a nonlinear disturbance observer to estimate complex external disturbances and improved the integral sliding surface by introducing extended exponential terms for parameter tuning. However, the associated parameters lack adaptability to changing environments, and the method was validated only for trajectory tracking on the horizontal plane. An et al. [6] developed a fixed-time disturbance observer to estimate unknown external disturbances and implemented an integral sliding mode control (ISMC) scheme, where high-order error terms were incorporated to enhance robustness against uncertainties. Close et al. [7] combined proportional–integral–derivative (PID) control with fixed-time sliding mode control (FTSMC), smoothing the switching behavior of the control input to achieve fast convergence while mitigating chattering. Guerrero et al. [8] proposed a saturation-based super-twisting algorithm (STA) within the framework of high-order sliding mode control (HOSMC), extending its application to multi-input multi-output (MIMO) systems and achieving improved tracking accuracy compared with conventional STA methods. Most existing studies addressing chattering in SMC focus on modifying the discontinuous switching function, whereas relatively few investigate how system state information influences control performance. 
Moreover, the integration of emerging learning-based techniques into SMC-based trajectory tracking remains limited. Inspired by the hybrid control paradigms reported in Gao et al. [9] and Roopaei et al. [10], this study explores the feasibility of incorporating autonomous learning capabilities into classical sliding mode control, with the aim of combining modern data-driven methods with traditional control theory.

Intelligent control methods have been widely applied in complex dynamic systems due to their adaptability and self-tuning capabilities. Ebrahimpour et al. [11] addressed internal and external disturbances in quadrotor unmanned aerial vehicles (UAVs) by designing a hybrid control law that combines fuzzy logic control (FLC) with integral sliding mode control (ISMC), supplemented by a disturbance observer to estimate signal gains and mitigate chattering. However, internal model disturbances were not explicitly considered in their framework. Dong et al. [12] proposed a data-driven adaptive fuzzy sliding mode control scheme in which real-time model parameters are extracted from navigation data and fuzzy control principles are employed to reduce convergence time and chattering. Nevertheless, parameter identification based on recursive least squares (RLS) suffers from limited accuracy and adaptability under rapidly varying conditions. Jiang et al. [13] employed a radial basis function neural network to model and estimate system uncertainties and time-varying disturbances. Although this approach enhances adaptability, directly learning the system structure through the neural network substantially increases exploration complexity. Despite being developed within the sliding mode control framework, most of the aforementioned methods still rely heavily on prior knowledge and accurate system models. Wang et al. [14] developed a model-free reinforcement learning approach based on deterministic policy gradients, using a continuous hybrid modeling framework for adaptive tuning of SMC parameters. While this method is suitable for highly uncertain and complex systems, it introduces considerable training cost and implementation complexity due to the use of multiple reinforcement learning modules and a large number of hyperparameters. 
Moreover, when applied independently to trajectory tracking problems, reinforcement learning methods often face challenges such as high-dimensional state spaces and susceptibility to local optima, which may lead to slow or unstable convergence.

To address these limitations, this study combines deep reinforcement learning (DRL) with SMC to improve AUV trajectory tracking performance. DRL employs deep neural networks as function approximators and has shown potential in reducing the strong dependence of control strategies on accurate hydrodynamic models. Zhang et al. [15] integrated DRL with PID control in a dock-handling system, where multiple controllers generated candidate control signals that were evaluated and weighted through a neural network-based quality assessment mechanism. Lee and Kim [16] applied a deep Q-network (DQN) for navigation path planning, followed by conventional control strategies for trajectory tracking. Fang et al. [17] demonstrated that DRL enables rapid deployment of trajectory and position tracking tasks in AUVs without requiring full system identification. Similarly, Wang et al. [18] reported that DRL-based path-following algorithms exhibit promising generalization in uncertain ocean environments. More recently, data-driven roadmaps for marine robotics presented by Ma et al. [19] identified DRL as a key direction for future intelligent control architectures, highlighting its capability to enhance adaptability under environmental uncertainty. In addition, Usama et al. [20] integrated DRL with robust control frameworks, such as active disturbance rejection control (DRL-ADRC), illustrating how learning-based adaptation can be combined with classical robustness guarantees. Collectively, these studies provide methodological support for developing control strategies with reduced reliance on precise model accuracy, which motivates the proposed DRL-enhanced IISMC framework in this paper.

In parallel, the integration of DRL with SMC has been increasingly investigated as an effective approach to address the long-standing chattering problem. Li et al. [21] proposed an RL-guided optimization strategy for nonlinear switching functions, enabling adaptive suppression of high-frequency oscillations without compromising robustness. Similarly, Qiu et al. [22] introduced a cross-domain framework in which RL was employed to tune SMC parameters for vibration attenuation, demonstrating the feasibility of combining data-driven learning with classical discontinuous control laws.

These studies indicate a clear trend in the literature: hybrid RL–SMC controllers retain the robustness inherent to sliding mode control while incorporating adaptive learning capabilities to achieve smoother control performance. This observation supports the effectiveness of the proposed DRL-optimized IISMC, particularly in enhancing adaptability and mitigating chattering effects in AUV motion control.

This study aims to develop and validate a motion control strategy for AUVs operating in nearshore environments for underwater surveying tasks. The main contributions of this paper are summarized as follows:

An improved integral sliding mode control (IISMC) scheme is developed by incorporating a nonlinear error-dependent function into the integral term. This design enables the IISMC system to dynamically adjust its control response according to different levels of state errors, thereby enhancing the robustness of the sliding surface structure and allowing rapid correction of trajectory deviations, which improves tracking accuracy. In addition, a modified power reaching law is introduced together with a boundary layer design to ensure smooth control input transitions within the boundary layer and effectively mitigate chattering effects.
A DRL-optimized IISMC (DIISMC) controller is proposed, in which DRL is employed to adaptively tune the sliding surface parameters and boundary layer thickness. This adaptive tuning enables the reaching law to switch between linear and nonlinear forms, allowing the controller to autonomously learn control strategies that are well suited to the dynamic and uncertain conditions of nearshore environments. It is worth noting that the proposed DIISMC framework relies on the structural properties of sliding mode control and bounded disturbance assumptions, rather than on specific mass or geometric parameters. As a result, the method exhibits inherent scale independence and can be applied to different AUV platforms. References [23,24] provide theoretical support for the scalability and generalizability of our approach.
The stability and finite-time convergence properties of the proposed control method are theoretically analyzed and rigorously proven. A series of simulation experiments are conducted on AUV trajectory tracking tasks, and performance comparisons are carried out between the proposed DIISMC controller and the baseline IISMC controller. The results demonstrate that the DRL-enhanced controller provides improved trajectory tracking accuracy under nearshore operating conditions.

The remainder of this paper is organized as follows. Section 2 presents the kinematic and dynamic modeling of the AUV. Section 3 describes the proposed IISMC control framework, including the controller design and the DRL-based optimization strategy. Section 4 presents the simulation and experimental results, with a focus on trajectory tracking performance before and after controller optimization. Section 5 discusses the issues identified through experimental data analysis and outlines potential directions for future research. Finally, Section 6 concludes the paper.

2. Materials and Methods

Section 2 introduces the coordinate frame definitions adopted for the AUV considered in this study, followed by the corresponding kinematic and dynamic analyses. Based on marine vehicle maneuvering theory, the control objectives are formulated in a mathematical framework to provide a theoretical foundation for the design of the motion control system. Figure 1 illustrates the overall geometry of the AUV used in this work, together with the definitions of the associated reference coordinate frames.

2.1. Preliminaries

To analyze the motion of the AUV, the North–East–Down (NED) inertial frame and the body-fixed frame are defined, as illustrated in Figure 1. The vector $\eta_{1} = [x\ y\ z]^{T}$ represents the position of the AUV in the NED frame, while $\eta_{2} = [\varphi\ \theta\ \psi]^{T}$ denotes its orientation corresponding to the roll, pitch, and yaw angles, respectively. The complete pose of the vehicle is described by the state vector $\eta = [x\ y\ z\ \varphi\ \theta\ \psi]^{T}$.

The velocity vector expressed in the body-fixed frame is defined as $v = [u\ v\ w\ p\ q\ r]^{T}$, which consists of the linear velocity vector $v_{1} = [u\ v\ w]^{T}$ and the angular velocity vector $v_{2} = [p\ q\ r]^{T}$.

Remark 1.

The AUV used in the experiments is specifically designed for nearshore underwater surveying tasks. Compared with vehicles intended for deep-sea or offshore operations, this AUV operates at relatively low speeds and is subject to simpler hydrodynamic effects. Therefore, only first-order nonlinear hydrodynamic drag is considered in the dynamic model. The corresponding hydrodynamic coefficients are determined following the method described in [25].

Remark 2.

Prior to the experiments, manual calibration was performed to ensure a stable vehicle configuration, resulting in positive buoyancy of the AUV. Under these conditions, coordinate transformations are applied to convert motion information between the NED frame and the body-fixed frame.

Remark 3.

In the structural design of the AUV, the center of gravity is defined as $R_{G}(r_{G}^{1}, r_{G}^{2}, r_{G}^{3})$ with respect to the body-fixed reference origin, while the center of buoyancy is defined as $R_{B}(r_{B}^{1}, r_{B}^{2}, r_{B}^{3})$ and is located slightly above the center of gravity. This assumption simplifies the dynamic model by eliminating the need to explicitly compute complex restoring force and moment terms.

2.2. Kinematic and Dynamic Modeling

The kinematic model of the AUV considered in this study is defined as:

$$\begin{cases} \dot{\eta} = J(\eta)\, v \\ J(\eta) = \begin{bmatrix} J_{1}(\eta) & O \\ O & J_{2}(\eta) \end{bmatrix} \end{cases} \tag{1}$$

where $J(\eta)$ denotes the transformation matrix between the inertial frame and the body-fixed frame. It consists of two blocks corresponding to the transformations of linear and angular velocities, forming a $6 \times 6$ matrix. Here, $O$ represents a $3 \times 3$ zero matrix, $J_{1}(\eta)$ is the coordinate transformation matrix for linear velocity, and $J_{2}(\eta)$ is the transformation matrix for angular velocity.
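The block structure of Equation (1) can be illustrated with a short Python sketch. The explicit entries of $J_{1}$ and $J_{2}$ are not given in this section, so the sketch assumes the standard ZYX (yaw, pitch, roll) Euler-angle convention commonly used in marine vehicle modeling; the function name `transformation_matrix` is hypothetical.

```python
import numpy as np

def transformation_matrix(eta):
    """Build the 6x6 matrix J(eta) of Eq. (1) for eta = [x, y, z, phi, theta, psi].

    Assumption: ZYX Euler-angle convention (not stated explicitly in the text).
    """
    phi, theta, psi = eta[3], eta[4], eta[5]
    cphi, sphi = np.cos(phi), np.sin(phi)
    cth, sth = np.cos(theta), np.sin(theta)
    cpsi, spsi = np.cos(psi), np.sin(psi)

    # J1: rotation taking body-frame linear velocities to the NED frame.
    J1 = np.array([
        [cpsi * cth, -spsi * cphi + cpsi * sth * sphi,  spsi * sphi + cpsi * cphi * sth],
        [spsi * cth,  cpsi * cphi + sphi * sth * spsi, -cpsi * sphi + sth * spsi * cphi],
        [-sth,        cth * sphi,                       cth * cphi],
    ])

    # J2: maps body angular rates [p, q, r] to Euler-angle rates.
    # It becomes singular as |theta| approaches pi/2.
    tth = sth / cth
    J2 = np.array([
        [1.0, sphi * tth, cphi * tth],
        [0.0, cphi,       -sphi],
        [0.0, sphi / cth, cphi / cth],
    ])

    J = np.zeros((6, 6))
    J[:3, :3] = J1  # upper-left block: linear velocities
    J[3:, 3:] = J2  # lower-right block: angular velocities
    return J
```

With zero roll, pitch, and yaw the matrix reduces to the identity, and the off-diagonal $3 \times 3$ blocks stay zero for any attitude, which matches the structure stated in Equation (1).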

The dynamic model of the AUV is formulated following the standard marine vehicle modeling framework presented in [26], and is expressed as:

$$M \dot{v} + C(v)\, v + D(v)\, v + g(\eta) = \tau_{p} - \tau_{d} \tag{2}$$

where $\tau_{p}$ denotes the vector of control forces and moments generated by the thrusters and $\tau_{d}$ represents the disturbance vector with unknown structure, including both internal modeling uncertainties and unmeasurable external environmental disturbances.

The inertia matrix $M$ consists of the rigid-body inertia matrix $M_{RB}$ and the added mass matrix $M_{A}$, such that $M = M_{RB} + M_{A}$.

The Coriolis and centripetal matrix $C(v)$ accounts for the coupling effects induced by the vehicle motion and is composed of two parts: the rigid-body Coriolis and centripetal matrix $C_{RB}(v)$, which arises from the rigid-body inertia, and the added-mass Coriolis matrix $C_{A}(v)$, which captures the velocity-dependent effects associated with the surrounding fluid. Accordingly, $C(v) = C_{RB}(v) + C_{A}(v)$.

The hydrodynamic damping matrix $D(v)$ represents the energy dissipation caused by fluid–structure interaction and is composed of linear and nonlinear components. Specifically, the linear damping matrix $D_{L}$ models low-speed viscous effects, while the nonlinear damping matrix $D_{NL}$ accounts for higher-order drag forces that become dominant at increased velocities. Thus, the total damping matrix is expressed as $D(v) = D_{L} + D_{NL}$.

Considering that the AUV employed in this study is laterally symmetric, and that only minor asymmetries exist in the longitudinal and vertical directions, the nonlinear damping matrix is assumed to be diagonal to simplify dynamic modeling and controller design. This assumption implies that the dominant damping contribution in each degree of freedom originates primarily from the corresponding velocity component. Such modeling simplifications have been widely adopted in AUV dynamic modeling and provide satisfactory engineering practicality within acceptable control accuracy margins.

The term $g(\eta)$ represents the hydrostatic restoring forces and moments acting on the AUV, which result from the combined effects of buoyancy $B$ and gravitational force $W$. These restoring effects depend on the relative positions of the center of gravity and the center of buoyancy, as well as the roll and pitch angles of the vehicle.

2.3. Thruster Configuration Analysis

To clearly illustrate the spatial arrangement of the thrusters with respect to the AUV’s body-fixed coordinate system, a three-dimensional isometric projection is employed. The thruster layout and its relationship to the vehicle geometry are further presented through the orthographic views shown in Figure 2.

The six thrusters, denoted as $T_{1}, T_{2}, T_{3}, T_{4}, T_{5}$ and $T_{6}$, operate cooperatively to achieve full three-dimensional motion of the AUV. In the body-fixed coordinate system, thrusters $T_{1}, T_{2}, T_{5}$ and $T_{6}$ are mounted such that their thrust axes are inclined at an angle $\alpha$ with respect to the body-frame z-axis. The distances from the centers of these thrusters to the x-axis and y-axis are denoted by $l_{1}$ and $l_{2}$, respectively. Thrusters $T_{3}$ and $T_{4}$ are placed symmetrically along the body-frame y-axis at a distance $l_{3}$, and their vertical offsets relative to the z-axis are given by $l_{4}$.

The overall thrust vector $\tau_{p}$, which represents the generalized forces and moments acting on the AUV, is distributed among the six thrusters according to the structural configuration. A transformation matrix $T_{p}$ is defined to map the control output signal $u_{p}$ to the actual thrust vector $\tau_{p}$. Accordingly, the thrust allocation is expressed as Equation (3):

$$\begin{cases} \tau_{p} = T_{p} u_{p} \\ u_{p} = [T_{1}\ T_{2}\ T_{3}\ T_{4}\ T_{5}\ T_{6}]^{T} \\ T_{p} = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 & 0 \\ \sin\alpha & -\sin\alpha & 0 & 0 & \sin\alpha & -\sin\alpha \\ -\cos\alpha & -\cos\alpha & 0 & 0 & \cos\alpha & \cos\alpha \\ l_{1}\cos\alpha & -l_{1}\cos\alpha & 0 & 0 & -l_{1}\cos\alpha & l_{1}\cos\alpha \\ -l_{2}\cos\alpha & -l_{2}\cos\alpha & 0 & 0 & -l_{2}\cos\alpha & -l_{2}\cos\alpha \\ 0 & 0 & l_{4} & l_{4} & 0 & 0 \end{bmatrix} \end{cases} \tag{3}$$
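As a rough illustration of Equation (3), the following Python sketch assembles $T_{p}$ entry by entry as printed above and maps a vector of individual thruster commands to the generalized force vector. The geometric values for `alpha`, `l1`, `l2`, and `l4` are hypothetical placeholders, not the parameters of the vehicle studied here.

```python
import numpy as np

def allocation_matrix(alpha, l1, l2, l4):
    """Return T_p with the entries written in Eq. (3)."""
    s, c = np.sin(alpha), np.cos(alpha)
    return np.array([
        [0.0,     0.0,     1.0, 1.0, 0.0,     0.0],
        [s,       -s,      0.0, 0.0, s,       -s],
        [-c,      -c,      0.0, 0.0, c,       c],
        [l1 * c,  -l1 * c, 0.0, 0.0, -l1 * c, l1 * c],
        [-l2 * c, -l2 * c, 0.0, 0.0, -l2 * c, -l2 * c],
        [0.0,     0.0,     l4,  l4,  0.0,     0.0],
    ])

# Hypothetical geometry, chosen only to make the example concrete.
T_p = allocation_matrix(alpha=np.deg2rad(30.0), l1=0.2, l2=0.3, l4=0.1)

# Unit thrust on T3 and T4 only, all other thrusters idle.
u_p = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
tau_p = T_p @ u_p  # generalized forces and moments, tau_p = T_p u_p
```

In this example only the first and sixth components of `tau_p` are non-zero, since $T_{3}$ and $T_{4}$ appear only in the first and last rows of the matrix.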

3. AUV Trajectory Tracking Control Scheme Design

Based on the modeling and analysis presented in the previous sections, Section 3 introduces the IISMC approach developed for AUV trajectory tracking. The structure and implementation of the IISMC motion controller are described, together with its DRL-optimized extension. In addition, the stability of the proposed control schemes is theoretically analyzed and validated. The overall system architecture is illustrated in Figure 3.

3.1. Design of the Improved Integral Sliding Mode Control

Before constructing the controller, the following assumptions are made.

Assumption 1.

The pitch angle of the AUV is bounded such that $|\theta| < \pi/2$ in order to avoid potential singularities in the stability analysis.

Assumption 2.

The external disturbance signal $\tau_{d}$ is assumed to be a bounded and Lipschitz continuous deterministic signal, with its maximum magnitude denoted by $f$, satisfying $|\tau_{d}| \leq f$. In Section 4, the deterministic disturbance is constructed using sinusoidal functions over a finite experimental time horizon and does not involve any stochastic components, thereby ensuring boundedness. Moreover, similar assumptions have been widely adopted in related studies [27,28,29], implying that the disturbance does not exhibit abrupt variations. This formulation facilitates the application of Lyapunov-based theoretical tools for the subsequent stability analysis.

To address the AUV trajectory tracking problem, the sliding surface and reaching law of the sliding mode controller are designed. First, a first-order linear sliding surface $s_{1}$ is constructed, incorporating the tracking error $e(t)$ and its time derivative $\dot{e}(t)$. Here, $\eta_{d}$ and $\dot{\eta}_{d}$ denote the desired position and velocity profiles, respectively, provided by a predefined reference trajectory. Both are represented as six-dimensional column vectors. To simplify the theoretical analysis, all six degrees of freedom (6-DOFs) are treated in a coupled manner, and unified variables $s$ and $\dot{s}$ are used instead of matrix-form expressions. The sliding surface is defined as Equation (4):

$$s_{1} = e(t) + \dot{e}(t) \tag{4}$$

where the tracking error terms are given by:

$$\begin{cases} e(t) = \eta - \eta_{d} \\ \dot{e}(t) = v - \dot{\eta}_{d} \end{cases} \tag{5}$$

The dynamic model described in Equation (2) can be rewritten as Equation (6):

$$\dot{v} = -M^{-1} C(v)\, v - M^{-1} D(v)\, v - M^{-1} g(\eta) - M^{-1} \tau_{p} - M^{-1} \tau_{d} \tag{6}$$

By introducing the substitution variables $\phi_{1} = e(t)$ and $\phi_{2} = \dot{e}(t)$, the tracking error dynamics in Equation (5) can be reformulated as Equation (7):

$$\begin{cases} \dot{\phi}_{1} = \phi_{2} \\ \dot{\phi}_{2} = -M^{-1} C(v)\, v - M^{-1} D(v)\, v - M^{-1} g(\eta) - M^{-1} \tau_{p} - M^{-1} \tau_{d} - \ddot{\eta}_{d} \end{cases} \tag{7}$$

In this study, the sliding surface is designed as Equation (8):

$$s = \lambda e(t) + \lambda_{1} \dot{e}(t) + \mu \int_{0}^{t} G(e(\tau))\, d\tau \tag{8}$$

Compared with conventional ISMC, the proposed control law demonstrates improvements in both structural design and control performance. Traditional ISMC typically incorporates the integral of the error into a first-order linear sliding surface to eliminate steady-state error. However, its tuning flexibility is limited because it usually relies on a single proportional factor, and the integral term is often designed in a linear or signum form, which may lead to chattering and integral saturation.

In contrast, the proposed sliding surface retains the classical first-order linear structure while introducing additional flexibility through the parameters $\lambda$ and $\lambda_{1}$, allowing more effective adjustment of the system’s responsiveness to state errors. To further compensate for steady-state error, the method incorporates the integral term $\int_{0}^{t} G(e(\tau))\, d\tau$, which accumulates over time when non-zero errors persist, continuously enhancing the controller output and driving the system state toward the desired trajectory.

Unlike conventional linear integral functions, the nonlinear error function $G(e(t))$ is designed as a piecewise-continuous sinusoidal function: when the absolute error is within a small threshold $\beta$, the function takes the smooth form $\beta \sin\left(\frac{\pi e(t)}{2\beta}\right)$, effectively suppressing chattering; when the error exceeds this range, the function saturates to $\pm\beta$, ensuring both fast convergence and robustness while avoiding integral divergence. The functional profile of $G(e(t))$ is illustrated in Figure 4. Overall, the proposed method provides advantages over conventional ISMC in terms of steady-state error elimination, chattering suppression, and enhanced tuning flexibility.

This structure ensures both nonlinear sensitivity and bounded control output. Specifically, it provides strong corrective action when the tracking error is large and gradually reduces control intensity as the error diminishes, thereby suppressing chattering and enhancing robustness. The function is defined as:

$$G(e(t)) = \begin{cases} \beta, & e(t) \geq \beta \\ \beta \sin\dfrac{\pi e(t)}{2\beta}, & |e(t)| < \beta \\ -\beta, & e(t) \leq -\beta \end{cases} \tag{9}$$
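Equation (9) translates almost directly into code. The following is a minimal vectorized sketch in Python; the function name `saturated_sine` is a hypothetical label for $G$, and the threshold value in the usage line is arbitrary.

```python
import numpy as np

def saturated_sine(e, beta):
    """Evaluate G(e) from Eq. (9) element-wise for a scalar or array error e."""
    e = np.asarray(e, dtype=float)
    inner = beta * np.sin(np.pi * e / (2.0 * beta))  # smooth branch, |e| < beta
    outer = beta * np.sign(e)                        # saturated branch, |e| >= beta
    return np.where(np.abs(e) < beta, inner, outer)

# Example with an arbitrary threshold beta = 0.1.
values = saturated_sine([-1.0, -0.1, 0.0, 0.05, 0.1, 1.0], beta=0.1)
```

At $|e(t)| = \beta$ the sine branch evaluates to $\beta \sin(\pi/2) = \beta$, so the two branches meet continuously, and the output never exceeds $\beta$ in magnitude.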

In Equation (8), the linear gain parameters $\lambda > 0$ and $\lambda_{1} > 0$ are design variables, while the integral gain $\mu = \mathrm{diag}(\mu_{1}, \mu_{2}, \dots, \mu_{6})$ is a diagonal positive-definite matrix. The threshold parameter $\beta$ is also subject to controller design.
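In a sampled implementation, the integral in Equation (8) has to be accumulated numerically. The sketch below shows one possible way to do this with a forward-Euler running sum; the class name, the discretization scheme, and the numerical gains are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def saturated_sine(e, beta):
    """G(e) from Eq. (9), evaluated element-wise."""
    e = np.asarray(e, dtype=float)
    inner = beta * np.sin(np.pi * e / (2.0 * beta))
    return np.where(np.abs(e) < beta, inner, beta * np.sign(e))

class IntegralSlidingSurface:
    """Discrete-time evaluation of s in Eq. (8) (forward-Euler integral, assumed)."""

    def __init__(self, lam, lam1, mu_diag, beta):
        self.lam = lam                   # scalar gain on e(t)
        self.lam1 = lam1                 # scalar gain on e_dot(t)
        self.mu = np.diag(mu_diag)       # diagonal integral gain matrix
        self.beta = beta                 # saturation threshold of G
        self.integral = np.zeros(len(mu_diag))

    def update(self, e, e_dot, dt):
        """Accumulate the integral of G(e) over one step and return s."""
        self.integral += saturated_sine(e, self.beta) * dt
        return self.lam * e + self.lam1 * e_dot + self.mu @ self.integral

# Hypothetical gains; a constant error of 0.5 on every axis for 1 s.
surface = IntegralSlidingSurface(lam=2.0, lam1=1.0, mu_diag=[3.0] * 6, beta=0.1)
for _ in range(10):
    s = surface.update(e=np.full(6, 0.5), e_dot=np.zeros(6), dt=0.1)
```

Because $G$ saturates at $\beta$, the integral term grows by at most $\beta \, dt$ per step on each axis, which bounds the rate at which the integral contribution can build up.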

The time derivative of the sliding surface is expressed as:

$$\dot{s} = \lambda \dot{e}(t) + \lambda_{1} \ddot{e}(t) + \mu G(e(t)) \tag{10}$$

By substituting $\ddot{e}(t)$ with $\dot{\phi}_{2}$ and defining the nonlinear vector $f = -M^{-1} C(v)\, v - M^{-1} D(v)\, v - M^{-1} g(\eta)$, Equation (10) can be rewritten as:

$$\dot{s} = \lambda \dot{e}(t) + \lambda_{1} f + \mu G(e(t)) - \lambda_{1} \left( M^{-1} \tau_{p} + M^{-1} \tau_{d} + \ddot{\eta}_{d} \right) \tag{11}$$

The reaching law adopted in this study is designed as:

$$\dot{s} = -\xi |s|^{\gamma}\, \mathrm{sign}(s) - \lambda e(t) \tag{12}$$

Compared with conventional reaching laws, such as constant-rate, exponential, and power-based schemes, the proposed method directly incorporates the tracking error $e(t)$ into the reaching law. This enables direct regulation of the tracking error during the reaching phase, enhancing convergence speed while maintaining effective control responsiveness in the terminal stage. Furthermore, it reduces abrupt transitions caused by high-frequency switching, thereby improving the system’s robustness and smoothness under disturbances and uncertainties.

The nonlinear gain $\xi$ is a tunable design parameter that determines the control intensity in regions with large errors. The exponent $\gamma$ defines the degree of nonlinearity in the reaching law and is implemented in a piecewise manner. When the sliding surface $s$ approaches zero, a lower exponent is chosen to ensure a smoother control input, as $\lim_{s \to 0} |s|^{a} = 0$.

To balance control speed and robustness, the exponent $a$ is typically selected within the range $a \in (0, 1)$, allowing exploration of the effect of the power term on the performance of the modified reaching law. The structure of the reaching law is adaptively switched based on the boundary layer thickness to achieve an optimal degree of linearity. The exponent $a$ is treated as a design parameter, and $\gamma$ is defined as in Equation (13):

$$\gamma = \begin{cases} a, & |s| > 1 \\ 1, & |s| < 1 \end{cases} \tag{13}$$
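The reaching law of Equation (12) with the piecewise exponent of Equation (13) can be sketched in a few lines of Python. Equation (13) does not specify the case $|s| = 1$; the sketch assigns it to the linear branch, which is an assumption, although both branches give the same value $\xi$ for the power term at that point. The function name and the numerical values are hypothetical.

```python
import numpy as np

def reaching_law(s, e, xi, lam, a):
    """Return s_dot from Eq. (12) with gamma chosen per Eq. (13), element-wise."""
    s = np.asarray(s, dtype=float)
    e = np.asarray(e, dtype=float)
    # gamma = a for |s| > 1, gamma = 1 otherwise (|s| == 1 assigned to the
    # linear branch by assumption; Eq. (13) leaves this case open).
    gamma = np.where(np.abs(s) > 1.0, a, 1.0)
    return -xi * np.abs(s) ** gamma * np.sign(s) - lam * e

# Hypothetical gains: xi = 2, lam = 1, a = 0.5, zero tracking error.
s_dot = reaching_law(s=[4.0, 0.5, -4.0], e=[0.0, 0.0, 0.0], xi=2.0, lam=1.0, a=0.5)
```

For $|s| = 4$ the power branch gives a magnitude of $2 \cdot 4^{0.5} = 4$, while for $|s| = 0.5$ the linear branch gives $2 \cdot 0.5 = 1$; in both cases the sign opposes $s$.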

By combining Equations (3), (11) and (12), the simplified thrust calculation formula is obtained as Equation (14). Letting $A = M^{-1} T_{p}$, the final expression for the thrust control input is given as Equation (15).

$$-\lambda_{1} M^{-1} T_{p} u_{p} = -\xi |s|^{\gamma}\, \mathrm{sign}(s) - \lambda e(t) - \lambda \dot{e}(t) - \lambda_{1} f + \lambda_{1} \ddot{\eta}_{d} - \mu G(e(t)) + \lambda_{1} M^{-1} \tau_{d} \tag{14}$$

$$u_{p} = A^{-1} \left( f - \ddot{\eta}_{d} - M^{-1} \tau_{d} \right) + A^{-1} \frac{\xi |s|^{\gamma}\, \mathrm{sign}(s) + \lambda e(t) + \lambda \dot{e}(t) + \mu G(e(t))}{\lambda_{1}} \tag{15}$$
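The control law in Equation (15) can be sketched as a single Python function. Several points are assumptions of this sketch rather than statements from the paper: the disturbance $\tau_{d}$ is unknown in practice, so the argument `tau_d_hat` stands for whatever estimate or bound is available (zero by default); and since the allocation matrix as printed in Equation (3) has proportional first and sixth rows, the linear system $A u_{p} = \text{rhs}$ is solved in a least-squares sense rather than by forming $A^{-1}$ explicitly. When $A$ is invertible the two coincide. All names are hypothetical.

```python
import numpy as np

def iismc_control(M, T_p, f, eta_dd_d, s, e, e_dot, G_e,
                  lam, lam1, mu, xi, gamma, tau_d_hat=None):
    """Thruster command u_p following the structure of Eq. (15).

    f         : nonlinear vector -M^-1 C v - M^-1 D v - M^-1 g
    eta_dd_d  : desired acceleration
    G_e       : G(e) from Eq. (9), already evaluated
    mu        : diagonal integral gain matrix
    tau_d_hat : disturbance estimate (assumption; zero if not provided)
    """
    if tau_d_hat is None:
        tau_d_hat = np.zeros(len(s))
    M_inv = np.linalg.inv(M)
    A = M_inv @ T_p

    # Power-rate switching term xi * |s|^gamma * sign(s).
    switching = xi * np.abs(s) ** gamma * np.sign(s)

    # Right-hand side of A u_p = ..., obtained by dividing Eq. (14) by -lam1.
    rhs = (f - eta_dd_d - M_inv @ tau_d_hat
           + (switching + lam * e + lam * e_dot + mu @ G_e) / lam1)

    # Least-squares solve; equals A^-1 rhs whenever A is invertible.
    u_p, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return u_p
```

One way to sanity-check such an implementation is to substitute the returned `u_p` back into Equation (11) with an invertible test matrix and confirm that the resulting $\dot{s}$ reproduces the reaching law of Equation (12).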

3.2. Stability Analysis

To evaluate the stability of the IISMC controller, a theoretical analysis is conducted based on Lyapunov stability theory. A Lyapunov function $V(s)$ and its time derivative $\dot{V}(s)$ are defined as follows:

$$\begin{cases} V(s) = \frac{1}{2} s^{2} \\ \dot{V}(s) = s \dot{s} = -\xi |s|^{\gamma + 1} - \lambda s\, e(t) \end{cases} \tag{16}$$

To ensure finite-time stability, the following condition must be satisfied:

$$\dot{V}(s) \leq -c V(s) \tag{17}$$

where $c > 0$ is a positive constant guaranteeing convergence to the equilibrium within a finite time. The second term in Equation (16) contains the tracking error. To handle this term, the following assumption is introduced:

Assumption 3.

In the vicinity of the sliding surface, there exists a constant $\delta > 0$ such that $|e(t)| \leq \delta$. Applying Young’s inequality yields:

$$|s\, e(t)| \leq \frac{s^{2}}{2\varepsilon} + \frac{\varepsilon\, (e(t))^{2}}{2} \leq \frac{s^{2}}{2\varepsilon} + \frac{\varepsilon \delta^{2}}{2} \tag{18}$$

By introducing the auxiliary parameter $\varepsilon$ and letting $\delta = \frac{1}{2\varepsilon}$, the inequality can be simplified as:

$$-s\, e(t) \leq |s\, e(t)| \leq \delta s^{2} + \frac{\delta}{4} \tag{19}$$
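The bound in Equation (19) can also be checked numerically. With $\delta = 1/(2\varepsilon)$, the first term of Equation (18) becomes $\delta s^{2}$ and the second becomes $\varepsilon \delta^{2} / 2 = \delta / 4$; the bound then holds for every $s$ because $\delta s^{2} + \delta/4 - \delta |s| = \delta (|s| - 1/2)^{2} \geq 0$. The Python sketch below samples random values to illustrate this; the value of `delta` and the sampling ranges are arbitrary choices.

```python
import numpy as np

def young_bound(s, delta):
    """Right-hand side of Eq. (19): delta * s^2 + delta / 4."""
    return delta * s ** 2 + delta / 4.0

rng = np.random.default_rng(0)
delta = 0.3                                  # hypothetical error bound
s = rng.uniform(-5.0, 5.0, 10_000)           # sampled sliding-variable values
e = rng.uniform(-delta, delta, 10_000)       # errors satisfying |e| <= delta

# Count samples where |s e| would exceed the bound of Eq. (19).
violations = int(np.sum(np.abs(s * e) > young_bound(s, delta)))
```

The bound is tightest at $|s| = 1/2$ with $|e| = \delta$, where both sides equal $\delta / 2$.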

This conservative estimate replaces the actual error with its upper bound, ensuring that worst-case stability conditions are met. Consequently, when $\lambda > 0$, it suffices that:

$$-\xi |s|^{1 + \gamma} + \lambda \delta s^{2} \leq -\frac{c}{2} s^{2} \tag{20}$$

During the transient phase, when $s \neq 0$, the following condition must hold:

$$\xi |s|^{\gamma - 1} \geq \lambda \delta + \frac{c}{2} \tag{21}$$

Thus, the Lyapunov condition is satisfied when

|s|

exceeds a certain threshold

s_{0}

, defined as:

|s| \geq {(\frac{λ δ + \frac{c}{2}}{ξ})}^{\frac{1}{γ - 1}} ≜ s_{0}

(22)

This ensures that the system state reaches the sliding surface in finite time. According to finite-time stability theory [30,31,32], the upper bound on the convergence time

T_{t}

is given by:

T_{t} \leq \frac{2}{c (1 - \frac{1 + γ}{2})} {V (0)}^{1 - \frac{1 + γ}{2}}

(23)
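
The bound in Equation (23) is straightforward to evaluate for given design values. The following Python sketch implements the expression exactly as printed; the function name `convergence_time_bound` and the input check are illustrative assumptions, and the check simply reflects that the printed expression is positive and finite only when γ < 1.

```python
def convergence_time_bound(c, gamma, V0):
    """Upper bound on the convergence time T_t from Equation (23), as printed."""
    alpha = (1.0 + gamma) / 2.0          # exponent (1 + gamma) / 2 in Equation (23)
    if c <= 0.0 or alpha >= 1.0 or V0 < 0.0:
        raise ValueError("the printed bound needs c > 0, gamma < 1 and V(0) >= 0")
    return 2.0 / (c * (1.0 - alpha)) * V0 ** (1.0 - alpha)
```

For example, with the hypothetical values c = 2, γ = 0.5 and V(0) = 1, the expression evaluates to 4 s.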

Therefore, the convergence time is finite and depends on the initial sliding variable

s_{0}

and the nonlinearity exponent

γ

. From Equations

(16)

and

(17)

, it follows that

\dot{s} < - \frac{c}{2} s

. Considering the relation

\frac{d s^{2}}{d t} = 2 s \dot{s}

, separating variables and integrating yields:

\ln s^{2} \leq - c t + C

(24)

where

C

is an integration constant. Taking the exponential form gives

s^{2} \leq e^{C} e^{- c t}

. Since

e^{C}

is a constant and

c > 0

,

s^{2}

decreases monotonically over time and asymptotically approaches zero. Therefore, the sliding surface converges to zero, which ensures that the system state reaches and remains on the sliding manifold, completing the proof of finite-time stability.

It should be noted that this Lyapunov-based stability analysis assumes fixed controller parameters, while the proposed DIISMC framework incorporates online parameter adaptation via DRL. This means that the presented stability analysis does not directly account for time-varying parameters.

In practice, the learned parameters are constrained within predefined bounds, and their variation is relatively slow compared to the system dynamics. Therefore, the stability analysis can be interpreted as guaranteeing local stability for frozen parameter values, which is a standard qualitative assumption in adaptive and learning-based control systems. Reference [33] provides support for this approach, demonstrating that Lyapunov theory can be effectively applied to analyze the stability of controllers with bounded, slowly varying, time-dependent parameters. Additionally, Reference [34] investigates adaptive sliding mode control for nonlinear systems with time-varying parameters and bounded disturbances, applying the Lyapunov method to perform a stability analysis and discussing how local stability can be guaranteed when parameter variations are slow. These interpretations highlight a limitation in the current theoretical analysis, while still providing valuable insight into the closed-loop stability behavior under bounded and slowly varying adaptations.

This study also employs the traditional SMC method as a benchmark for comparison. Compared to IISMC, the traditional SMC lacks the integral term, and certain parameters used in the IISMC approach are not involved in the SMC framework. Because of its simpler structure, the stability of the traditional SMC follows from the classical SMC literature [35], so a detailed mathematical derivation is omitted here.

3.3. IISMC Motion Controller Optimized by Deep Reinforcement Learning

To optimize the tunable parameters in the IISMC algorithm, this study employs a DRL framework. The DRL mechanism is leveraged to identify parameter configurations that best adapt to the operating environment, thereby ensuring both trajectory tracking accuracy and robustness for the AUV.

Since the AUV operates in a continuous action space, value-based DRL methods typically require discretization of the action space, which reduces control precision. In contrast, policy-based DRL methods rely on stochastic policies, which often result in slow convergence and high parameter variance. Consequently, both approaches are suboptimal for systems such as AUVs, which demand fast response, high stability, and precise control.

To overcome these limitations, the Deep Deterministic Policy Gradient (DDPG) algorithm is adopted in this study. DDPG combines the advantages of value-based and policy-based methods, making it suitable for efficient training in continuous action spaces. The principle of the algorithm is illustrated in Figure 5 and described in [36].

Both the actor and critic networks in the DDPG framework are implemented as fully connected feedforward neural networks (MLPs). The actor network comprises two hidden layers with 128 and 64 neurons, respectively, employing ReLU activation functions, and outputs continuous control parameters corresponding to the sliding surface and reaching law coefficients. The critic network has a similar architecture, with the state and action inputs concatenated at the second hidden layer. Other related parameters are summarized in Table 1.
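
The actor structure described above can be sketched as a plain forward pass. The following Python example is only an illustration of a 128-64 ReLU network: the output dimension `N_PARAMS`, the weight initialization, and the tanh scaling of the outputs into a bounded parameter range are assumptions made for this sketch and are not taken from Table 1.

```python
import numpy as np

STATE_DIM = 36   # six 6-D components of s_t, see Equation (25)
N_PARAMS = 4     # hypothetical number of tuned IISMC parameters

def init_actor(rng, state_dim=STATE_DIM, n_out=N_PARAMS):
    """Random weights for a 128-64 MLP (initialization scheme is an assumption)."""
    sizes = [state_dim, 128, 64, n_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def actor_forward(params, state, low, high):
    """Map a state vector to controller parameters inside [low, high]."""
    h = state
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)   # ReLU hidden layers (128, then 64 units)
    W, b = params[-1]
    u = np.tanh(h @ W + b)               # assumed output squashing to (-1, 1)
    return low + 0.5 * (u + 1.0) * (high - low)
```

Bounding the outputs in this way is one simple means of keeping the learned parameters within predefined ranges during deployment.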

By directly mapping system states to continuous action vectors, the DDPG algorithm eliminates the need for discretizing the action space, thereby preserving control precision. To enhance training stability, target networks are introduced to mitigate instabilities arising from simultaneous updates of the policy and value networks. Furthermore, the deterministic policy structure of DDPG aligns naturally with the control logic of sliding mode control, ensuring continuity of control inputs and maintaining system stability. Collectively, these characteristics enhance both the practical applicability and robustness of the proposed DRL-optimized IISMC motion controller.

In this framework, the current AUV state information and the reward function evaluation are fed into the policy network to generate control signals, which are subsequently evaluated by the critic network to assess their tracking performance. The target policy network maps the next control signal based on the subsequent state sampled from the replay buffer, while the target critic network provides the corresponding evaluation value. The difference between evaluation values at two consecutive time steps is used to perform gradient descent, thereby updating the weights of both the critic and policy networks. At fixed intervals, the weights of the target networks are softly updated. Ultimately, the policy network outputs parameterized control signals, enabling an adaptive control process.
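
Two of the steps described above, the critic target built from the target networks and the soft update of the target weights, follow the standard DDPG formulation in [36] and can be sketched as below. The function names and the default values of the discount factor and the update rate `tau` are hypothetical placeholders, not the settings of Table 1.

```python
import numpy as np

def td_target(reward, next_q_target, done, discount=0.99):
    """Critic target y = r + discount * Q'(s', mu'(s')); no bootstrap at episode end."""
    return reward + discount * (1.0 - done) * next_q_target

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]
```

The critic is then trained to reduce the squared difference between its own estimate and `td_target`, while the actor follows the gradient of the critic output with respect to its action.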

It should be noted that the deep reinforcement learning module is trained offline prior to deployment. During online operation, the controller performs only a forward pass through the trained policy network to update a small set of sliding mode parameters, while the IISMC law itself remains analytically defined. Consequently, the online computational burden is limited, making the approach suitable for real-time implementation on embedded processors.

3.4. Network Configuration and Reward Function Design

A precise definition of the state input vector

s_{t}

is crucial for the performance of the DDPG algorithm. In the context of AUV motion control, this vector should comprehensively represent the vehicle’s pose and velocity information. In this study, the state input

s_{t}

is constructed based on tracking errors and their temporal evolution, including the current tracking error

e (t)

, its time derivative

\dot{e} (t)

, as well as historical error trends. Specifically, the integral of the tracking error over a short time window

\int_{t - 1}^{t} e (τ) d τ

and the integral of its derivative

\int_{t - 1}^{t} \dot{e} (τ) d τ

, together with the tracking error and its derivative at the previous time step,

e (t - 1)

and

\dot{e} (t - 1)

, are included to capture both the magnitude and the variation trend of the state deviation. Each of these components is represented as a six-dimensional row vector, resulting in the state vector formulation shown in Equation

(25)

:

s_{t} = [\begin{matrix} e (t) \\ \dot{e} (t) \\ \int_{t - 1}^{t} e (τ) d τ \\ \int_{t - 1}^{t} \dot{e} (τ) d τ \\ e (t - 1) \\ \dot{e} (t - 1) \end{matrix}]

(25)
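
A minimal Python sketch of how the state vector of Equation (25) could be assembled from the current and previous errors is given below. The function name `build_state` is hypothetical, and approximating the two window integrals by the trapezoidal rule over a single sampling step `dt` is an assumption about the window length.

```python
import numpy as np

def build_state(e, e_dot, e_prev, e_dot_prev, dt):
    """Stack the six 6-D components of s_t in the order of Equation (25)."""
    # Window integrals, approximated over one step by the trapezoidal rule
    int_e = 0.5 * dt * (e + e_prev)
    int_e_dot = 0.5 * dt * (e_dot + e_dot_prev)
    return np.concatenate([e, e_dot, int_e, int_e_dot, e_prev, e_dot_prev])
```

The result is a 36-element vector, which is the input dimension seen by the actor and critic networks under this construction.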

The reward function

r_{t}

is designed as a piecewise function that evaluates the tracking performance across all 12 dimensions (pose errors and velocity errors) at each time step t, as defined in Equation

(26)

:

r_{t} = (\sum_{i = 1}^{12} \{\begin{matrix} \frac{1}{|e^{i} (t)| + ι}, |e^{i} (t)| \geq χ \\ - κ | e^{i} (t) |, |e^{i} (t)| < χ \end{matrix}) + k t - 1000 Ω

(26)

Here,

e^{i} (t)

denotes the error of the

i

-th component at the corresponding time step

t

, with

i

ranging from 1 to 12. The parameter

χ

defines the error threshold,

κ

determines the penalty strength for large errors, and

ι

is a small positive constant to prevent singularities. The parameter

k

acts as a time weight to balance smoothness during the initial training phase and responsiveness during later stages.

Ω

is a binary termination flag, taking values of 1 or 0. When

Ω = 1

, at least one component of the pose or velocity error exceeds the limit, requiring the current training episode to be terminated and restarted. When

Ω = 0

, all errors are within acceptable limits, and training continues until the episode is completed. A severe penalty is applied in failure cases.
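
The reward of Equation (26) can be sketched in Python as follows. The branch conditions are implemented exactly as printed in Equation (26); the function name `reward` and all numerical values used when calling it are hypothetical, since the actual settings are not restated here.

```python
import numpy as np

def reward(errors, t, terminated, chi, kappa, iota, k):
    """Evaluate r_t of Equation (26) for the 12 pose and velocity errors."""
    abs_e = np.abs(np.asarray(errors, dtype=float))
    # Piecewise term per dimension, with the threshold chi as printed
    per_dim = np.where(abs_e >= chi, 1.0 / (abs_e + iota), -kappa * abs_e)
    # Time-weighted term k * t and the termination penalty 1000 * Omega
    return float(per_dim.sum() + k * t - 1000.0 * float(terminated))
```

Passing `terminated=True` corresponds to Ω = 1 and subtracts the fixed penalty of 1000 from the step reward.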

This reward structure combines positive reinforcement and penalties for erroneous exploration, guiding the agent to quickly learn the desired behavior and thereby accelerating convergence during training. The incorporation of multiple reference factors further enhances the stability and robustness of policy learning.

4. Simulation

The algorithm in this study is implemented in MATLAB and Simulink (version R2023a) for simulation and verification. To validate the effectiveness and robustness of the proposed IISMC method and its DDPG-based optimization, a series of simulations were conducted on a computer equipped with an Intel i7 processor, 16 GB of RAM, and a 256 GB SSD, which is sufficient to meet the computational requirements of the algorithm. The simulation sampling time is set as

T_{s} = 0.01 s

, with each training episode lasting

T_{f} = 10 s

, and a total of 200 training episodes. The performance of the proposed controller is compared with that of the traditional SMC method based on a constant-rate reaching law. Furthermore, the improvement resulting from the integration of DDPG with IISMC (hereafter referred to as DIISMC) is evaluated.

The reference trajectory is a three-dimensional spiral with a period of

T_{2 f} = 20 s

, defined as Equation (27):

\{\begin{matrix} x_{d} (t) = 2 \sin (\frac{2 π}{T} t) \\ y_{d} (t) = 2 \cos (\frac{2 π}{T} t) - 2 \\ Z_{d} (t) = 0.1 t \\ φ_{d} (t) = 0 \\ θ_{d} (t) = \operatorname{atan} \frac{{\dot{Z}}_{d} (t)}{\sqrt{{\dot{x}}_{d}^{2} (t) + {\dot{y}}_{d}^{2} (t)}} \\ ψ_{d} (t) = \operatorname{atan} \frac{{\dot{y}}_{d} (t)}{{\dot{x}}_{d} (t)} \end{matrix}

(27)
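
For reference, the desired pose of Equation (27) can be generated with the short Python sketch below. The function name `reference_pose` is hypothetical, the period is taken as T = 20 s from the text, and `arctan2` is used in place of the printed atan of the ratio so that the yaw reference stays defined when the forward velocity reference passes through zero; this substitution is an assumption of the sketch.

```python
import numpy as np

def reference_pose(t, T=20.0):
    """Desired pose [x_d, y_d, z_d, phi_d, theta_d, psi_d] of Equation (27)."""
    w = 2.0 * np.pi / T
    x_d = 2.0 * np.sin(w * t)
    y_d = 2.0 * np.cos(w * t) - 2.0
    z_d = 0.1 * t
    # Analytic time derivatives of the position references
    x_dot = 2.0 * w * np.cos(w * t)
    y_dot = -2.0 * w * np.sin(w * t)
    z_dot = 0.1
    phi_d = 0.0
    theta_d = np.arctan(z_dot / np.hypot(x_dot, y_dot))
    psi_d = np.arctan2(y_dot, x_dot)   # assumed four-quadrant form of atan(y_dot / x_dot)
    return np.array([x_d, y_d, z_d, phi_d, theta_d, psi_d])
```

Because the horizontal speed of this trajectory is constant, the pitch reference is constant over the whole run.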

The external disturbances

τ_{d}

are generated as a combination of sinusoidal functions with different amplitudes and frequencies, along with constant bias terms. In addition, a Gaussian noise term is superimposed to emulate sensor noise, further testing the robustness of the control system. It should be noted that this noise is not part of the theoretical disturbance model and is introduced solely for simulation purposes. Since no probabilistic modeling is involved, the inclusion of this noise does not affect the validity of the system stability analysis. The disturbance vector is defined as Equation (28):

\begin{matrix} τ_{d} = [\begin{matrix} 0.4 + 0.4 \sin (0.18 t) + 0.2 \cos (0.25 t) \\ 0.1 + 0.3 \sin (0.12 t) + 0.2 \cos (0.18 t) \\ 0.1 + 0.2 \sin (0.18 t) + 0.3 \cos (0.06 t) \\ 0.4 + 0.2 \sin (0.06 t) + 0.3 \cos (0.18 t) \\ 0.4 + 0.3 \sin (0.25 t) + 0.2 \cos (0.12 t) \\ 0.4 + 0.4 \sin (0.18 t) + 0.3 \cos (0.12 t) \end{matrix}] \end{matrix}

(28)
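
The deterministic part of the disturbance in Equation (28) can be reproduced with the following Python sketch. The array and function names are hypothetical, and the Gaussian noise term is deliberately left out because its variance is not specified in the text.

```python
import numpy as np

# One entry per degree of freedom, read row by row from Equation (28)
BIAS = np.array([0.4, 0.1, 0.1, 0.4, 0.4, 0.4])
A_SIN = np.array([0.4, 0.3, 0.2, 0.2, 0.3, 0.4])
W_SIN = np.array([0.18, 0.12, 0.18, 0.06, 0.25, 0.18])
A_COS = np.array([0.2, 0.2, 0.3, 0.3, 0.2, 0.3])
W_COS = np.array([0.25, 0.18, 0.06, 0.18, 0.12, 0.12])

def disturbance(t):
    """Six-dimensional disturbance vector tau_d at time t (noise-free part only)."""
    return BIAS + A_SIN * np.sin(W_SIN * t) + A_COS * np.cos(W_COS * t)
```

At t = 0 the sine terms vanish, so each component equals its bias plus its cosine amplitude.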

The physical parameters of the AUV used in the simulations are summarized in Table 2. The table includes key vehicle characteristics such as mass

m

, displaced volume

Δ

, and density

ρ

, as well as the geometric dimensions (length, width, and height). The positions of the center of gravity

R_{G} (r_{G}^{1}, r_{G}^{2}, r_{G}^{3})

and center of buoyancy

R_{B} (r_{B}^{1}, r_{B}^{2}, r_{B}^{3})

are defined relative to the body-fixed frame origin, simplifying the calculation of restoring forces. Moments and products of inertia

I_{x x}, I_{y y}, I_{z z}, I_{x y}, I_{y z}, I_{x z}

describe the rotational dynamics of the vehicle. Hydrodynamic effects are modeled using added mass coefficients

(X_{\dot{u}}, Y_{\dot{v}}, Z_{\dot{w}}, K_{\dot{p}}, M_{\dot{q}}, N_{\dot{r}})

, linear damping coefficients

(X_{u}, Y_{v}, Z_{w}, K_{p}, M_{q}, N_{r})

and nonlinear damping coefficients

(X_{u |u|}, Y_{v |v|}, Z_{w |w|}, K_{p |p|}, M_{q |q|}, N_{r |r|})

. These parameters are adopted from conventional marine vehicle modeling methods and provide a reasonable representation of the dynamics for AUV operations.

Table 3 lists the parameter values used in the experiments along with the corresponding ranges for the DIISMC method. These values were determined based on relevant literature and an iterative simulation-based tuning procedure. To ensure fair comparison, parameters with identical physical meanings in the SMC and IISMC controllers were assigned the same values under identical experimental conditions.

Some parameters are not applicable to the simpler SMC formulation and are therefore marked as “N/A” in Table 3. All SMC and IISMC parameters were chosen within the predefined ranges of the DIISMC method to maintain consistency.

Over fifty tuning trials were conducted to identify parameter sets that provided favorable control performance. The values presented in Table 3 represent one representative set from these trials. Additional comparative simulations were then performed using alternative parameter values and combinations to comprehensively evaluate the performance of each control method.

The trajectory tracking results are presented in Figure 6. The black curve represents the performance of the traditional SMC controller, the blue curve corresponds to the IISMC method, and the green curve illustrates the DDPG-optimized IISMC (DIISMC) controller. The desired helical trajectory is shown in red for reference. The detailed tracking performance in terms of position and velocity, as well as the corresponding tracking errors, are depicted in Figure 7, Figure 8, Figure 9 and Figure 10.

An analysis of Figure 7 reveals that all three controllers successfully achieve trajectory tracking in the position domain, with DIISMC and IISMC showing improved tracking accuracy compared to SMC. Considering that the experimental workspace radius is approximately ten times larger than the AUV size, this improvement is expected to translate into satisfactory tracking performance in practical scenarios. Regarding attitude tracking, DIISMC and IISMC exhibit smaller oscillation ranges and faster convergence than SMC, achieving steady-state operation within approximately half a cycle. Moreover, DIISMC demonstrates a noticeable reduction in overshoot compared to IISMC, indicating enhanced pose tracking performance while maintaining stability.

Analysis of the tracking performance of linear and angular velocities in Figure 8 shows that DIISMC and IISMC exhibit slightly larger amplitude variations compared to SMC. However, after convergence, there is no significant difference in the oscillation range among the three methods. The relationship between these results and velocity tracking performance will be explored in future studies.

Additionally, the analysis of the error plots in Figure 9 reveals that DIISMC achieves the fastest convergence and the smallest steady-state errors among the three controllers. Particularly, in Figure 9f, the yaw angle tracking error, SMC and IISMC show varying degrees of lag, while DIISMC does not exhibit this issue, highlighting its rapid response advantage. However, for the velocity errors presented in Figure 10, DIISMC does not show significant improvements similar to pose tracking, which will be a key focus of our future work.

Table 4 presents the statistical indicators of tracking performance, including the standard deviation and boundary limits of the tracking errors, thereby quantifying the improvements or deteriorations across the different control methods. A detailed analysis of specific motion dimensions reveals that, compared with the traditional SMC, both IISMC and DIISMC achieve enhanced forward and lateral position tracking. Specifically, the upper bound of the forward position error is reduced to 60.0% and 25.0% of the corresponding SMC value for IISMC and DIISMC, respectively, while the lateral position error upper bound decreases to 89.7% and 66.7%, respectively. These results indicate that both the proposed IISMC and the DIISMC methods effectively constrain the range of tracking deviations, demonstrating superior control performance.

Analysis of the error standard deviations indicates that, for IISMC, the standard deviations of forward and lateral position errors are reduced by 9.3% and 46.2%, respectively. For DIISMC, after autonomous parameter adaptation, the standard deviations of pose tracking errors decrease by an average of 35.5%, while those of velocity tracking are reduced by an average of 2.1%. These results demonstrate that the proposed controller effectively limits the frequency of error fluctuations, thereby enhancing overall system stability during the control process. Notably, as observed in Figure 7f and Figure 8f, DIISMC successfully mitigates the control lag in yaw regulation, achieving a favorable balance among control speed, accuracy, and stability. Additionally, by extracting the twelve sets of error data shown in Figure 9 and Figure 10, and calculating the standard deviations of the control performance errors for both DIISMC and SMC, it is found that DIISMC provides an 18.8% performance improvement over SMC on average. Further, the local zoom-in views in Figure 7 and Figure 8 illustrate the process by which the actual trajectory approaches the desired trajectory at specific moments.
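
Indicators of the kind reported in Table 4 can be computed from logged error signals as in the following Python sketch. The function names are hypothetical, and using the population standard deviation (the NumPy default) is an assumption, since the exact estimator is not stated.

```python
import numpy as np

def error_stats(err):
    """Standard deviation and lower/upper bounds of one tracking-error signal."""
    err = np.asarray(err, dtype=float)
    return {"std": float(err.std()), "lower": float(err.min()), "upper": float(err.max())}

def std_reduction_percent(err_baseline, err_candidate):
    """Percentage reduction of the error standard deviation relative to a baseline."""
    s_b = np.std(err_baseline)
    s_c = np.std(err_candidate)
    return float(100.0 * (s_b - s_c) / s_b)
```

A positive value of `std_reduction_percent` means the candidate controller fluctuates less than the baseline in that dimension, and averaging it over the twelve dimensions gives a single summary figure.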

Based on the observations from Figure 7, Figure 8, Figure 9 and Figure 10, the DIISMC method effectively constrains both the error standard deviation and the error bounds. Under steady-state conditions, the fluctuation amplitude is well-controlled, and the system maintains accurate trajectory tracking even in the presence of external disturbances. Overall, the proposed method shows improvements in trajectory tracking accuracy, disturbance rejection, and system robustness, providing a solid foundation for further experimental validation and potential application in marine environments. However, as indicated by the data in Table 4, IISMC does not outperform traditional SMC in all dimensions. In some cases, the tracking accuracy does not exceed that of SMC, and certain error bounds are not sufficiently suppressed. These limitations will be further analyzed and discussed in Section 5.

5. Discussion

The simulation results demonstrate the effectiveness and robustness of the proposed DIISMC method. To further evaluate its strengths and limitations, key aspects are discussed below.

5.1. Error Source Analysis

Although DIISMC generally outperforms traditional SMC, some tracking metrics show slight deterioration. Possible causes include:

Integral Error in Sliding Mode Surface:
Unlike traditional SMC, DIISMC incorporates an integral term in the sliding surface and state definition. In the early tracking phase, system inertia may lead to noticeable overshoot. Over time, the integral term accumulates errors to compensate for disturbances, whereas traditional SMC cannot fully eliminate persistent errors. This results in residual fluctuations near the sliding surface. In simulations with limited duration, traditional SMC may appear more stable in certain metrics, but over longer tracking periods, DIISMC is expected to achieve superior performance due to stable convergence in later stages.
Exploration Phase in DDPG Algorithm:
The initial exploration required by the DDPG algorithm can temporarily generate suboptimal actions. Mitigation strategies include refining the exploration mechanism, applying smoothing filters, and adjusting the algorithm to balance exploration and convergence. Advanced DRL variants such as Twin Delayed DDPG (TD3) or Soft Actor–Critic (SAC) leverage dual evaluation and entropy regularization to improve exploration efficiency, reduce suboptimal actions, and accelerate convergence, thereby enhancing overall controller performance.

This analysis identifies potential directions for improving DIISMC, aiming for more robust tracking under complex dynamic conditions.

5.2. Applicability and Limitations

Compared with RL-SMS in [22], which primarily utilizes reinforcement learning to design or modify the sliding surface, the proposed DRL-based improved integral sliding mode controller (DIISMC) maintains the analytical integral sliding mode framework and employs DRL to adaptively tune controller parameters. This distinction brings several advantages: while RL-SMS offers structural adaptability of the sliding surface, it often complicates theoretical stability analysis and may exhibit limited robustness under sustained disturbances. In contrast, DIISMC preserves stability guarantees and integral action, effectively suppressing steady-state errors, while enhancing adaptability and robustness against model uncertainties and external perturbations. It should be noted that the RL-SMS method in [22] has not yet been applied effectively within the present study context. In future work, we plan to explore potential synergies between RL-SMS and DIISMC in both theory and practice, aiming to combine their respective strengths to further advance the practical application of control engineering.

While DIISMC performs well in nearshore simulations, real-world deployment introduces challenges such as communication latency, sensor noise, and environmental uncertainties. Domain randomization or online fine-tuning may be necessary to maintain performance outside the simulation environment.

Moreover, hardware constraints—especially the need for high computational power in compact embedded systems—pose challenges for real-time implementation. Future work will focus on experimental validation under variable current fields, robustness testing, and embedded hardware optimization to enable real-time adaptive control for practical AUV missions.

6. Conclusions

This study aims to address the trajectory tracking problem for AUVs operating in complex nearshore marine environments. An adaptive and robust control approach is proposed by integrating Deep Reinforcement Learning (DRL) with an Improved Integral Sliding Mode Control (IISMC) framework. Building upon the traditional sliding mode control structure, the proposed method incorporates a nonlinear error function and a boundary layer mechanism, enhancing robustness against disturbances and model uncertainties. Furthermore, the Deep Deterministic Policy Gradient (DDPG) algorithm is employed for online adaptive optimization of the sliding surface parameters and reaching law, improving the controller’s adaptability and precision in dynamic conditions.

Simulation results indicate that, compared to conventional constant-rate SMC and the unoptimized IISMC methods, the proposed approach achieves superior performance in pose tracking accuracy, velocity response smoothness, and overall system robustness. Quantitative evaluation based on the variation in error standard deviations across all relevant pose and velocity tracking dimensions shows an average improvement of 18.8%. In addition, the controller effectively suppresses oscillations induced by external disturbances. Notably, even when trained in a disturbance-free environment, the controller maintains strong tracking performance under perturbations, ensuring the stability and accuracy of the overall control system.

Author Contributions

Conceptualization, Z.W.; Methodology, Z.W.; Software, W.M.; Investigation, R.Z.; Resources, H.L.; Data curation, R.Z.; Writing—original draft, R.Z.; Writing—review and editing, Z.W.; Supervision, X.L., H.L., J.L. and W.M.; Project administration, H.L. and Z.W.; Funding acquisition, W.M. and H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shandong Provincial Natural Science Foundation (Grant No. ZR2024MD066).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We thank the members of the marine robot group of Shandong University of Science and Technology for their contribution to the research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this article:

AUV	Autonomous Underwater Vehicle
SMC	Sliding Mode Control
ISMC	Integral Sliding Mode Control
IISMC	Improved Integral Sliding Mode Control
DRL	Deep Reinforcement Learning
UAV	Unmanned Aerial Vehicle
NED	North–East–Down inertial frame
DDPG	Deep Deterministic Policy Gradient
DIISMC	DDPG-optimized IISMC

References

1. Han, Y.; Liu, J.; Yu, J.; Sun, C. Adaptive fuzzy quantized state feedback control for AUVs with model uncertainty. Ocean Eng. 2024, 313, 119496.
2. Du, P.; Yang, W.; Wang, Y.; Hu, R.; Chen, Y.; Huang, S. A novel adaptive backstepping sliding mode control for a lightweight autonomous underwater vehicle with input saturation. Ocean Eng. 2022, 263, 112362.
3. Tian, Z.; Zheng, H.; Wu, W.; Xu, W. Integrated front reconstruction and AUV tracking control with Bayesian optimization and NMPC. Ocean Eng. 2025, 326, 120761.
4. Hu, Y.; Song, Z.; Zhang, H. Adaptive sliding mode control with pre-specified performance settings for AUV’s trajectory tracking. Ocean Eng. 2023, 287, 115882.
5. Luo, W.; Liu, S. Disturbance observer based nonsingular fast terminal sliding mode control of underactuated AUV. Ocean Eng. 2023, 279, 114553.
6. An, S.; Wang, X.; Wang, L.; He, Y. Observer based fixed-time integral sliding mode tracking control for underactuated AUVs via an event-triggered mechanism. Ocean Eng. 2023, 284, 115158.
7. Close, J.; Van, M.; McIlvanna, S. PID-Fixed Time Sliding Mode Control for Trajectory Tracking of AUVs under Disturbance. IFAC-PapersOnLine 2024, 58, 281–286.
8. Guerrero, J.; Chemori, A.; Creuze, V.; Torres, J.; Campos, E. Saturated STA-based sliding-mode tracking control of AUVs: Design, stability analysis, and experiments. Ocean Eng. 2024, 301, 117560.
9. Gao, T.; Liu, Y.-J.; Liu, L.; Li, D. Adaptive neural network-based control for a class of nonlinear pure-feedback systems with time-varying full state constraints. IEEE/CAA J. Autom. Sin. 2018, 5, 923–933.
10. Roopaei, M.; Zolghadri, M.; Meshksar, S. Enhanced adaptive fuzzy sliding mode control for uncertain nonlinear systems. Commun. Nonlinear Sci. Numer. Simul. 2009, 14, 3670–3681.
11. Ebrahimpour, M.; Lungu, M.; Kakavand, M. Antisaturation fixed-time backstepping fuzzy integral sliding mode control for automatic landing of fixed-wing unmanned aerial vehicles. J. Frankl. Inst. 2024, 361, 107185.
12. Dong, Z.; Zhou, W.; Tan, F.; Wang, B.; Wen, Z.; Liu, Y. Simultaneous modeling and adaptive fuzzy sliding mode control scheme for underactuated USV formation based on real-time sailing state data. Ocean Eng. 2024, 314, 119743.
13. Jiang, G.; Er, M.J.; Gong, H.; Wang, S. Adaptive Neural Network Dynamic Surface Trajectory Tracking Control for Underactuated Autonomous Underwater Vehicles. In Proceedings of the 2024 6th International Conference on Robotics and Computer Vision (ICRCV), Wuxi, China, 20–22 September 2024; IEEE: New York, NY, USA.
14. Wang, D.; Shen, Y.; Wan, J.; Sha, Q.; Li, G.; Chen, G.; He, B. Sliding mode heading control for AUV based on continuous hybrid model-free and model-based reinforcement learning. Appl. Ocean Res. 2022, 118, 102960.
15. Zhang, Q.; Tan, B.; Gu, B.; Hu, X. Docking ship heave compensation system for loading operations based on a DDPG and PID hybrid control method using a judge network. Ocean Eng. 2024, 305, 117727.
16. Lee, H.-T.; Kim, M.-K. Optimal path planning for a ship in coastal waters with deep Q network. Ocean Eng. 2024, 307, 118193.
17. Fang, Y.; Huang, Z.; Pu, J.; Zhang, J. AUV position tracking and trajectory control based on fast-deployed deep reinforcement learning method. Ocean Eng. 2022, 245, 110452.
18. Wang, C.; Du, J.; Wang, J.; Ren, Y. AUV Path Following Control using Deep Reinforcement Learning Under the Influence of Ocean Currents. In Proceedings of the 2021 5th International Conference on Digital Signal Processing, Chengdu, China, 26–28 February 2021; Association for Computing Machinery: Beijing, China, 2021; pp. 225–231.
19. Ma, X.; Wang, Y. The Adaptive Sampling of Marine Robots in Ocean Observation: An Overview. IEEE J. Ocean. Eng. 2025, 50, 1103–1126.
20. Usama, H.; Nasir, M.; Fareh, R.; Ghommam, J.; Khadraoui, S.; Bettayeb, M. Reinforcement Learning-Enhanced Active Disturbance Rejection Control for Mobile Robot Trajectory Tracking. In Proceedings of the 2025 IEEE 22nd International Multi-Conference on Systems, Signals & Devices (SSD), Monastir, Tunisia, 17–20 February 2025.
21. Li, X.; Dong, L.; Xue, L.; Sun, C. Hybrid Reinforcement Learning for Optimal Control of Non-Linear Switching System. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 9161–9170.
22. Qiu, Z.-C.; Chen, S.-W. Sliding mode reinforcement learning vibration control of a translational coupled double flexible beam system. Eur. J. Control 2025, 84, 101239.
23. Arcos-Legarda, J.; Gutiérrez, Á. Robust model predictive control based on active disturbance rejection control for a robotic autonomous underwater vehicle. J. Mar. Sci. Eng. 2023, 11, 929.
24. Bazrafshan, S. Sliding Mode Control Techniques for Autonomous Underwater Vehicles: A Comprehensive Review and Future Outlook. Authorea Preprint 2025.
25. Bao, H.; Zhu, H. Modeling and trajectory tracking model predictive control novel method of AUV based on CFD data. Sensors 2022, 22, 4234.
26. Fossen, T. Nonlinear maneuvering theory and pathfollowing control. In Centre for Marine Technology and Engineering (CENTEC) Anniversary Book; Guedes Soares, C., Garbatov, Y., Fonseca, N., Texeira, A.P., Eds.; CRC Press, Taylor & Francis Group: Boca Raton, FL, USA, 2012.
27. Guerrero, J.; Torres, J.; Creuze, V.; Chemori, A. Trajectory tracking for autonomous underwater vehicle: An adaptive approach. Ocean Eng. 2019, 172, 511–522.
28. Manzanilla, A.; Ibarra, E.; Salazar, S.; Zamora, Á.E.; Lozano, R.; Munoz, F. Super-twisting integral sliding mode control for trajectory tracking of an Unmanned Underwater Vehicle. Ocean Eng. 2021, 234, 109164.
29. Li, B.; Gao, X.; Huang, H.; Yang, H. Improved adaptive twisting sliding mode control for trajectory tracking of an AUV subject to uncertainties. Ocean Eng. 2024, 297, 116204.
30. Hong, Y.; Huang, J.; Xu, Y. On an output feedback finite-time stabilisation problem. In Proceedings of the 38th IEEE conference on decision and control (Cat No 99CH36304), Phoenix, AZ, USA, 7–10 December 1999; pp. 1302–1307.
31. Bhat, S.P.; Bernstein, D.S. Finite-time stability of continuous autonomous systems. SIAM J. Control Optim. 2000, 38, 751–766.
32. Polyakov, A. Nonlinear feedback design for fixed-time stabilization of linear control systems. IEEE Trans. Autom. Control 2011, 57, 2106–2110.
33. Patil, O.S.; Sun, R.; Bhasin, S.; Dixon, W.E. Adaptive control of time-varying parameter systems with asymptotic tracking. IEEE Trans. Autom. Control 2022, 67, 4809–4815.
34. Yuan, G.; Zhang, Z.; Qin, C.; Ge, S.S. Adaptive control for nonlinear time-varying systems with unknown control coefficients and external disturbances. IEEE Trans. Syst. Man Cybern. Syst. 2023, 54, 119–130.
35. Utkin, V.I. Sliding modes in control of electric motors. In Sliding Modes in Control and Optimization; Springer: Berlin/Heidelberg, Germany, 1992; pp. 250–264.
36. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.

Figure 1. Geometry of the AUV and reference coordinate frames.

Figure 2. Three views of the installation position and angle of the thrusters. (a) Front View; (b) Top View; (c) Left View.

Figure 3. Block diagram of the proposed DIISMC controller.

Figure 4. Nonlinear piecewise function $G(e(t))$ with $β = 2$ as an example.

Figure 5. Schematic diagram of the principle of the DDPG.

Figure 6. Helical trajectory tracking results.

Figure 7. Position and attitude tracking of the AUV. (a) Forward position. (b) Lateral position. (c) Vertical position. (d) Roll angle. (e) Pitch angle. (f) Yaw angle.

Figure 8. Linear and angular velocity tracking of the AUV. (a) Forward velocity. (b) Lateral velocity. (c) Vertical velocity. (d) Roll angular velocity. (e) Pitch angular velocity. (f) Yaw angular velocity.

Figure 9. Position and attitude tracking errors of the AUV. (a) Forward error. (b) Lateral error. (c) Vertical error. (d) Roll error. (e) Pitch error. (f) Yaw error.

Figure 10. Linear and angular velocity tracking errors of the AUV. (a) Forward velocity error. (b) Lateral velocity error. (c) Vertical velocity error. (d) Roll angular velocity error. (e) Pitch angular velocity error. (f) Yaw angular velocity error.

Table 1. Training parameters.

Parameters	Value/Range
Policy network learning rate	$0.001$
Q-network learning rate	$0.001$
Discount factor	$0.99$
Target network soft update factor	$0.005$
Initial exploration noise	$0.1 \sim 0.2$
Replay buffer size	$10^{6}$
Mini-batch size	$64 \sim 256$
Action bound normalization	$[-1, 1]$
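
As an illustration of how the soft update factor and the discount factor in Table 1 are typically used in DDPG, the following minimal sketch applies a Polyak-averaged target update and forms a one-step critic target. The function names soft_update and td_target, and the use of plain lists in place of network weights, are assumptions made for this example and are not taken from the paper.

```python
TAU = 0.005    # target network soft update factor (Table 1)
GAMMA = 0.99   # discount factor (Table 1)


def soft_update(target, online, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


def td_target(reward, next_q, done, gamma=GAMMA):
    """One-step critic target: r + gamma * Q'(s', mu'(s')) unless terminal."""
    return reward + (0.0 if done else gamma * next_q)


# Toy weights standing in for network parameters (hypothetical values).
target_weights = soft_update([0.0, 1.0], [1.0, 1.0])
y = td_target(reward=1.0, next_q=2.0, done=False)
```

With a factor of 0.005, each update moves the target weights only 0.5% of the way toward the online weights, which is why the target networks change slowly during training.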

Table 2. Coefficients of the AUV.

Parameter	Value	Parameter	Value	Parameter	Value
$m$	$10\ \mathrm{kg}$	$ρ$	$1.04\ \mathrm{g/cm^{3}}$	$Δ$	$0.0145\ \mathrm{m^{3}}$
Length	$0.44\ \mathrm{m}$	Width	$0.3\ \mathrm{m}$	Height	$0.19\ \mathrm{m}$
$r_{g}^{1}$	$0\ \mathrm{m}$	$r_{g}^{2}$	$0\ \mathrm{m}$	$r_{g}^{3}$	$0\ \mathrm{m}$
$r_{b}^{1}$	$0\ \mathrm{m}$	$r_{b}^{2}$	$0\ \mathrm{m}$	$r_{b}^{3}$	$0\ \mathrm{m}$
$I_{xx}$	$0.01\ \mathrm{kg \cdot m^{2}}$	$I_{yy}$	$0.154\ \mathrm{kg \cdot m^{2}}$	$I_{zz}$	$0.068\ \mathrm{kg \cdot m^{2}}$
$I_{xy}$	$0.0032\ \mathrm{kg \cdot m^{2}}$	$I_{yz}$	$0.0036\ \mathrm{kg \cdot m^{2}}$	$I_{xz}$	$0\ \mathrm{kg \cdot m^{2}}$
$X_{\dot{u}}$	$-11.68\ \mathrm{kg}$	$X_{u}$	$-22.78\ \mathrm{kg}$	$X_{u|u|}$	$-22.89\ \mathrm{kg}$
$Y_{\dot{v}}$	$-19.87\ \mathrm{kg}$	$Y_{v}$	$-26.8\ \mathrm{kg}$	$Y_{v|v|}$	$-32.6\ \mathrm{kg}$
$Z_{\dot{w}}$	$-43.6\ \mathrm{kg}$	$Z_{w}$	$-46.87\ \mathrm{kg}$	$Z_{w|w|}$	$-55.6\ \mathrm{kg}$
$K_{\dot{p}}$	$-0.78\ \mathrm{kg \cdot m}$	$K_{p}$	$-1.56\ \mathrm{kg \cdot m}$	$K_{p|p|}$	$-2.46\ \mathrm{kg \cdot m}$
$M_{\dot{q}}$	$-1.53\ \mathrm{kg \cdot m}$	$M_{q}$	$-3.06\ \mathrm{kg \cdot m}$	$M_{q|q|}$	$-3.06\ \mathrm{kg \cdot m}$
$N_{\dot{r}}$	$0.082\ \mathrm{kg \cdot m}$	$N_{r}$	$-0.82\ \mathrm{kg \cdot m}$	$N_{r|r|}$	$-1.23\ \mathrm{kg \cdot m}$
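
To show how the translational coefficients in Table 2 would enter a dynamic model, the short sketch below combines the rigid-body mass with the added-mass terms. It assumes the common marine-craft convention $M = M_{RB} + M_{A}$ with $M_{A} = -\mathrm{diag}(X_{\dot{u}}, Y_{\dot{v}}, Z_{\dot{w}})$ and neglects coupling terms (consistent with $r_{g} = 0$ in the table); the function name effective_mass is hypothetical, and the convention should be checked against the model equations in Section 2.2.

```python
# Values copied from Table 2.
MASS = 10.0  # m, in kg
ADDED_MASS = {
    "surge": -11.68,  # X_udot
    "sway": -19.87,   # Y_vdot
    "heave": -43.6,   # Z_wdot
}


def effective_mass(mass=MASS, added=ADDED_MASS):
    """Diagonal translational inertia under the assumed convention
    M = M_RB + M_A with M_A = -diag(X_udot, Y_vdot, Z_wdot)."""
    return {axis: mass - coeff for axis, coeff in added.items()}


effective = effective_mass()
for axis, value in effective.items():
    print(f"{axis}: {value:.2f} kg")
```

Under this assumption the inertia felt in heave is roughly five times the dry mass, which is one reason the vertical channel responds more slowly than surge for the same thrust.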

Table 3. Parameter settings of the three experimental methods (one representative case).

Parameter	SMC	IISMC	Range in DIISMC
$λ_{1}$	$5$	$5$	$[1, 10]$
$μ$	N/A	$2$	$[0, 5]$
$ξ$	$0.03$	$0.03$	$[0.01, 0.05]$
$λ$	$10$	$10$	$[1, 10]$
$β$	N/A	$0.03$	$[0.01, 0.1]$
$a$	N/A	$0.2$	$[0.01, 0.5]$
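
One plausible way to connect the normalized action bound $[-1, 1]$ of Table 1 to the DIISMC ranges of Table 3 is a per-parameter affine rescaling, sketched below. The linear mapping, the six-element action layout, and the name scale_action are assumptions made for illustration; the paper may map actor outputs to controller parameters differently.

```python
# Parameter ranges copied from the "Range in DIISMC" column of Table 3.
RANGES = {
    "lambda_1": (1.0, 10.0),
    "mu": (0.0, 5.0),
    "xi": (0.01, 0.05),
    "lambda": (1.0, 10.0),
    "beta": (0.01, 0.1),
    "a": (0.01, 0.5),
}


def scale_action(action):
    """Map each actor output in [-1, 1] linearly onto its parameter range.

    Assumed layout: one action component per parameter, in RANGES order.
    """
    if len(action) != len(RANGES):
        raise ValueError("expected one action component per parameter")
    params = {}
    for value, (name, (low, high)) in zip(action, RANGES.items()):
        clipped = max(-1.0, min(1.0, value))  # guard against noisy actions
        params[name] = low + 0.5 * (clipped + 1.0) * (high - low)
    return params


# Example: extreme and mid-range actions (hypothetical actor output).
params = scale_action([-1.0, 1.0, 0.0, 0.0, 0.0, 2.0])
```

Clipping before scaling keeps exploration noise from pushing a gain outside the range in which the controller was tuned.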

Table 4. Statistical indicators of tracking errors under different control methods.

Error	Indicator	SMC	IISMC	DIISMC
$e_{x}$	Standard Deviation	$0.043$	$0.039$	$0.027$
$e_{x}$	Error Bounds	$[-0.224, 0.040]$	$[-0.243, 0.024]$	$[-0.182, 0.010]$
$e_{y}$	Standard Deviation	$0.013$	$0.007$	$0.005$
$e_{y}$	Error Bounds	$[-0.023, 0.039]$	$[-0.016, 0.035]$	$[-0.007, 0.026]$
$e_{z}$	Standard Deviation	$0.009$	$0.010$	$0.006$
$e_{z}$	Error Bounds	$[-0.037, 0.008]$	$[-0.041, 0.013]$	$[-0.030, 0.010]$
$e_{φ}$	Standard Deviation	$0.011$	$0.023$	$0.014$
$e_{φ}$	Error Bounds	$[-0.026, 0.018]$	$[-0.130, 0.052]$	$[-0.089, 0.030]$
$e_{θ}$	Standard Deviation	$0.026$	$0.019$	$0.018$
$e_{θ}$	Error Bounds	$[-0.159, 0.020]$	$[-0.159, 0.018]$	$[-0.159, 0.004]$
$e_{ψ}$	Standard Deviation	$0.288$	$0.167$	$0.065$
$e_{ψ}$	Error Bounds	$[-0.022, 6.268]$	$[-0.277, 6.279]$	$[-0.281, 0.098]$
$e_{u}$	Standard Deviation	$0.044$	$0.051$	$0.042$
$e_{u}$	Error Bounds	$[-0.627, 0.086]$	$[-0.627, 0.151]$	$[-0.627, 0.126]$
$e_{v}$	Standard Deviation	$0.007$	$0.006$	$0.004$
$e_{v}$	Error Bounds	$[-0.022, 0.014]$	$[-0.029, 0.017]$	$[-0.022, 0.017]$
$e_{w}$	Standard Deviation	$0.007$	$0.009$	$0.007$
$e_{w}$	Error Bounds	$[-0.099, 0.014]$	$[-0.099, 0.027]$	$[-0.099, 0.026]$
$e_{p}$	Standard Deviation	$0.016$	$0.025$	$0.018$
$e_{p}$	Error Bounds	$[-0.012, 0.015]$	$[-0.136, 0.123]$	$[-0.119, 0.093]$
$e_{q}$	Standard Deviation	$0.012$	$0.018$	$0.017$
$e_{q}$	Error Bounds	$[-0.015, 0.059]$	$[-0.013, 0.115]$	$[-0.006, 0.121]$
$e_{r}$	Standard Deviation	$0.026$	$0.025$	$0.021$
$e_{r}$	Error Bounds	$[-0.048, 0.315]$	$[-0.313, 0.066]$	$[-0.313, 0.058]$
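
The two indicators reported in Table 4 can be computed from a logged error series as sketched below. This is a sketch under stated assumptions: the paper does not say whether the population or the sample standard deviation is used (the population form is assumed here), the error bounds are taken to be the minimum and maximum over the run, and the sample data are made up for the example.

```python
import statistics


def error_indicators(errors):
    """Return the standard deviation and [min, max] bounds of an error series.

    Assumption: population standard deviation (statistics.pstdev).
    """
    return {
        "std": statistics.pstdev(errors),
        "bounds": (min(errors), max(errors)),
    }


# Hypothetical tracking-error samples, not data from the paper.
sample_errors = [-0.2, 0.0, 0.2]
indicators = error_indicators(sample_errors)
print(f"std = {indicators['std']:.3f}, bounds = {indicators['bounds']}")
```

Because the bounds are extreme values, a single transient (for example at the start of the run) can dominate them, whereas the standard deviation reflects the spread over the whole trajectory.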


