Research on Real-Time Trajectory Planning and Tracking Control for Multi-ROV Shipwreck Search

Gan, Wenyang; Liang, Haozhe; Cai, Caixia

doi:10.3390/jmse14090802


by Wenyang Gan *, Haozhe Liang and Caixia Cai

Logistics Engineering College, Shanghai Maritime University, No. 1550, Haigang Avenue, Pudong New Area, Shanghai 201306, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2026, 14(9), 802; https://doi.org/10.3390/jmse14090802

Submission received: 23 March 2026 / Revised: 13 April 2026 / Accepted: 21 April 2026 / Published: 28 April 2026

(This article belongs to the Section Ocean Engineering)


Abstract

Multi-robot collaboration and marine robotics constitute key research directions in intelligent autonomous systems. In this context, multi-ROV cooperative operations are increasingly deployed for sunken ship search missions. A central technical challenge in such applications is to ensure efficient, non-redundant coverage while maintaining accurate formation tracking. Two principal difficulties arise. First, overlapping operational regions among multiple ROVs tend to produce both redundant coverage and search blind zones. Second, trajectory tracking accuracy is significantly degraded by the combined effects of hydrodynamic disturbances and inherent actuator constraints in ROVs. To address these challenges, an improved dynamic window approach (DWA), incorporating a search distance penalty mechanism, is proposed for multi-ROV trajectory planning. Concurrently, a cascaded tracking control architecture is constructed, wherein a model predictive kinematic controller generates constrained velocity references, while an adaptive sliding mode dynamic controller augmented with an extended state observer provides robust disturbance rejection. Collaborative search is conducted using a three-ROV leader–follower formation. Simulation results indicate that the proposed planning algorithm effectively improves regional search coverage and significantly reduces areas of repeated detection. The designed controller achieves real-time trajectory tracking under two typical strong disturbance conditions, namely time-varying disturbances and abrupt disturbances, while satisfying the thruster thrust constraints. Under time-varying disturbances, the proposed scheme enables all three ROVs to complete the tracking task while reducing the frequency of thrust saturation events by a factor of up to seven. In contrast, under the conventional MPC–ASMC controller, one ROV deviates from the formation and fails to complete the tracking task. Under abrupt disturbances, the proposed approach reduces the trajectory tracking error by a factor of up to six and the frequency of thrust saturation events by a factor of up to four.

Keywords:

improved algorithm based on the dynamic window approach; trajectory planning; extended state observer; model predictive control; adaptive sliding mode control

1. Introduction

Multi-robot collaboration and marine robotics constitute key research directions in intelligent autonomous systems. With the rapid advancement of underwater vehicle technologies, multi-ROV cooperative operations are increasingly deployed for complex maritime missions, particularly in sunken ship search-and-rescue scenarios where efficiency, coverage, and robustness are paramount. The integration of autonomous planning and control architectures for coordinated multi-ROV systems has thus emerged as a critical enabler for enhancing the effectiveness of underwater search operations.

Since humanity embarked on the journey of ocean exploration, shipwreck incidents have frequently occurred. Subsequent to shipwrecks, air chambers can be formed in the enclosed spaces of hulls under certain conditions, which can be utilized as critical shelters for distressed personnel to maintain vital signs and await external rescue. In 1939, the USS Squalus submarine of the United States Navy sank to a depth of 74 m, and a stable air chamber was created by the sealed forward compartment, allowing 33 crew members to survive until the arrival of rescue teams. In 2013, following the capsizing of the Nigerian tugboat Jascon 4, crew member Harrison was rescued after surviving for 60 h in a compressed air layer merely 1.2 m high in the toilet adjacent to the engineer’s office.

Nevertheless, the existence of air chambers only secures a valuable time window for rescue operations, while the inherent limitations of traditional search-and-rescue modes have not been fundamentally eliminated. In 1997, in the icy waters of Antarctica, British navigator Bullimore survived for 140 h in the air chamber of a capsized sailboat with only a chocolate bar and a small quantity of fresh water, and was eventually detected by a frigate of the Royal Australian Navy.

It has been indicated by relevant studies that the golden rescue window period for shipwreck search and rescue normally does not exceed 72 h. Although air chambers formed after vessel submergence can supply distressed personnel with life-sustaining oxygen, such spaces are not hermetically sealed and stable. Oxygen concentration inside the air chambers is continuously reduced with water seepage in the cabins and respiratory consumption, while carbon dioxide concentration is gradually elevated. In the context where the efficiency of traditional search-and-rescue methods is severely limited, how to determine the location of the sunken ship before oxygen runs out has become a key factor in improving the survival rate of the survivors.

Compared with traditional manual marine search and rescue, underwater robots demonstrate irreplaceable core advantages in operation scope, work efficiency and safety assurance. In terms of operation space, the physiological constraints of human beings are not imposed on underwater robots, which can be deployed to extreme and harsh environments in the deep sea, breaking through the limits of human diving depth and operational conditions. In terms of operation efficiency, long-term continuous operations can be conducted by underwater robots without being restricted by working hours, and the efficiency and quality of shipwreck search and rescue are significantly improved. In terms of operation safety, for high-risk tasks, including underwater pipeline inspection, shipwreck salvage, emergency rescue, and construction in high-risk waters, operations in hazardous environments can be fulfilled by underwater robots instead of personnel, which greatly enhances the safety factor of underwater operations. Therefore, the new generation of underwater robots with autonomy and intelligence as the core will gradually reshape the traditional underwater search-and-rescue operation system, and comprehensively replace the manual underwater operation mode featuring low efficiency and high risks.

In shipwreck search scenarios, extremely stringent requirements are imposed on the motion control of underwater vehicles due to complex seabed topography and turbulent current conditions. This necessitates that trajectory planning algorithms dynamically generate obstacle-avoidance trajectories based on real-time bathymetric data to precisely cover designated search zones in target waters, while tracking control technologies must be relied upon to ensure stable adherence to predefined trajectories under strong disturbances, thereby preventing collisions with obstacles and mission failures caused by tracking deviations. For unknown environments, it is impossible to provide trajectories in advance for tracking. Moreover, since real-time mapping data is mapped along with the movement of underwater vehicles, it is also impossible to conduct trajectory planning first and then carry out tracking control. Therefore, it is necessary to study the real-time trajectory planning and tracking control of underwater vehicles. A Remotely Operated Vehicle (ROV) is categorized as a type of underwater vehicle, which is characterized by advantages such as economy, high efficiency, unlimited operation time, and stable communication capabilities [1].

Many trajectory planning methods exist for underwater vehicles. While conventional planning methods remain widely used in engineering practice, reinforcement learning has emerged as a prominent research direction in recent years. Researchers such as Behnaz Hadi have used deep learning and its derivative technologies to plan the trajectories of underwater vehicles [2,3]. Tao Liu et al. proposed a navigation method that integrates the reciprocal velocity obstacle principle with model predictive path integral control [4]. Xinyu Jian et al. proposed an improved hybrid planning strategy that combines the dynamic window approach (DWA) with the rapidly exploring random tree (RRT) [5]. Rui Wang et al. proposed a stepwise planning algorithm that integrates time-space Bezier curves and Particle Swarm Optimization (PSO) to achieve multi-BUV cooperative formation and efficient obstacle avoidance re-planning [6]. Deep reinforcement learning offers stronger autonomy and greater optimization potential in complex unknown environments, but its data-driven nature results in high computational cost, difficult safety verification, and heavy reliance on training quality. By contrast, methods like PSO are better suited to scenarios where global optimization in static environments is required, models are precisely known, and offline planning is allowed. DWA provides a reactive local planning and rapid safety assessment framework, making it particularly suitable for dynamic unknown environments. Consequently, DWA is well suited in practice to multi-ROV trajectory planning.

At present, there are many methods to track and control the underwater vehicle. Researchers such as Hongtao Liang and Zhao Wang used deep learning to track and control underwater robots [7,8]. Researchers such as Yuan Chen used sliding mode control (SMC) and its derivative technologies to track and control underwater robots [9,10,11,12]. Traditional SMC is inherently limited by its dependence on discontinuous switching control laws, which tend to induce actuator wear. The methodology is incapable of predicting trajectories or explicitly satisfying complex constraints, while its robust performance is constrained by conservative parameter design, rendering the system inadequately prepared to handle time-varying disturbances. Researchers such as Haoyu Yang have used model predictive control (MPC) and its derivative technologies to track and control underwater robots [13,14,15]. MPC shows superior performance in dealing with complex constraints and generating optimized reference trajectories for ROVs. Nevertheless, its performance is highly dependent on the accuracy of kinematic and dynamic models, and the online optimization process will bring corresponding computational delay, which makes it unsuitable to be directly employed as a high-frequency inner-loop controller. In contrast, adaptive sliding mode control (ASMC) possesses strong robustness against model uncertainties and external disturbances with low computational complexity, which is competent for high-frequency real-time closed-loop control. However, it lacks the ability of global trajectory optimization and explicit constraint management. Given the complementary advantages of the two control methods, the hierarchical structure with outer-loop MPC and inner-loop ASMC has become a common scheme in relevant research. In this paper, this mature and effective control architecture is adopted. 
By decomposing kinematic optimization and dynamic tracking tasks, the advantages of MPC in constraint handling and trajectory optimization, as well as the merits of ASMC in robustness and real-time performance, can be fully utilized, so as to improve the overall performance of the control system.

Despite significant progress in individual domains, such as reactive local planning via DWA, constrained optimal control via MPC, robust dynamic tracking via ASMC, and disturbance estimation via extended state observer (ESO), existing approaches typically address these challenges in isolation. In the context of multi-ROV collaborative search, a fundamental research gap remains: how to simultaneously ensure efficient, non-redundant coverage among multiple vehicles while maintaining precise trajectory tracking under both actuator saturation and strong hydrodynamic disturbances. Conventional DWA formulations lack inter-vehicle coordination mechanisms and are therefore prone to search overlap and blind zones in deployments involving multiple agents. Standard MPC- and adaptive sliding mode control (ASMC)-cascaded controllers, although effective for constraint handling and robustness with a single vehicle, do not inherently compensate for unknown environmental disturbances unless augmented with active feedforward strategies. Conversely, ESO-based disturbance rejection schemes are seldom integrated into a complete planning and control hierarchy that explicitly accounts for multi-vehicle search coordination and thruster limitations. To bridge this gap, the present work proposes an integrated framework that couples an augmented leader and follower DWA planner with a cascaded kinematic and dynamic tracking controller enhanced by ESO-based feedforward compensation. This unified architecture is specifically designed to reconcile the competing demands of formation coverage, constraint satisfaction, and disturbance rejection in time-critical shipwreck search scenarios. The proposed framework is outlined as follows.

In this paper, an improved DWA is used to construct a multi-ROV collaborative search architecture for trajectory planning. For tracking control, a cascade control structure is formed by combining a kinematic controller and an adaptive dynamic controller. DWA is a local path planning algorithm based on velocity space, which is often employed for dynamic obstacle avoidance and real-time trajectory optimization. To address the issues of repeated detection and coverage blind spots in multi-ROV formation search under the traditional DWA algorithm, this study proposes a hierarchical optimization architecture based on dynamic potential field adjustment. The core idea of this architecture is to jointly optimize formation safety and search efficiency through a distance-sensitive coupling mechanism. Specifically, formation collaboration constraints are integrated into the heading, obstacle avoidance, and speed evaluation framework of the classical DWA. For the tracking control of ROVs, MPC is combined with ASMC to form a dual-loop tracking control system. In this configuration, MPC serves as the kinematic controller in the position outer loop, converting the desired position into a velocity reference signal. ASMC functions as the dynamic controller in the velocity inner loop, tracking the velocity reference and generating actuator torque commands. In ROV tracking control, disturbances encompass various non-ideal factors that affect the motion state or trajectory of the vehicle, arising from complex and diverse sources. To mitigate the impact of unknown disturbances on ROV tracking performance, this paper incorporates an ESO in conjunction with ASMC. The ESO treats uncertainties such as unmodeled dynamics, parameter variations, and external interferences as a lumped total disturbance and provides real-time disturbance estimation for feedforward compensation. The combined controller thereby realizes closed-loop optimization, encompassing disturbance estimation, feedforward compensation, and robust control.

The remainder of this paper is structured as follows: the improved DWA integrated with search distance penalty for multi-ROV cooperative trajectory planning, which is proposed to solve the problems of repeated detection and coverage blind spots in cooperative search, is elaborated in Section 2; the dual-loop tracking control framework consisting of MPC and ASMC, which is developed to realize high-performance trajectory tracking under strong underwater disturbances and thruster constraints, is presented in Section 3; simulation experiments and a comprehensive result analysis to verify the effectiveness of the proposed trajectory planning and control methods are provided in Section 4; and, finally, the whole work is concluded and possible future research directions are discussed in Section 5.

2. DWA Trajectory Planning

When a ship sinks after sending a distress signal from the sea surface, its actual sinking position often differs from the position where the signal was sent, because the vessel drifts under natural factors such as wind, ocean currents, and tides. Search-and-rescue personnel can roughly estimate the offset position based on meteorological data and experience, but the actual search requires ROVs. A single ROV cannot adequately cover the entire search area, so multiple ROVs are needed for collaborative search. When one ROV finds the wreck, it sends a signal instructing the other ROVs to converge on its position, which facilitates the subsequent detailed search and recovery. The schematic diagram of multi-ROV collaborative search is shown in Figure 1.

In Figure 1, the black target point represents the position of the final signal, and the orange target point represents the actual shipwreck position under the action of ocean current. The gray areas of different shapes represent different obstacles. The black dotted line represents the planned trajectory if the target point is not moved, while the black solid line represents the actual planned trajectory.

To verify the multi-ROV trajectory planning and tracking control system proposed in this paper, the formation is simplified to three ROVs: one leader ROV and two follower ROVs. The three-ROV case is representative, since larger formations can use the same trajectory planning and tracking control system for cooperation.

The task of trajectory planning is to generate trajectory points from real-time mapping data within a limited time, so that the vehicle reaches the target points while avoiding obstacles. The improved DWA is adopted for trajectory planning in this paper, and its main block diagram is presented in Figure 2. The core module of DWA is the dynamic window. Based on the given maximum velocity, maximum acceleration, and maximum angular velocity, it determines the range of velocities that are feasible within a short sampling period. This range is discretized into numerous candidate combinations of linear and angular velocities. The kinematic model is then used to predict the motion trajectory of each velocity combination over a certain period of time in the future. Each trajectory is scored by the objective function, and the scores are normalized. The higher the score, the better the trajectory meets the requirements. The trajectory $η_{d}$ with the highest score is output to the tracking control link as a reference trajectory.

2.1. ROV Kinematic Model

The three-dimensional motion of ROVs is studied in this paper, and only four degrees of freedom are considered, namely, surge $(u)$, sway $(v)$, heave $(w)$, and yaw $(r)$. The pose and velocity of the ROV can be expressed as $η = {[x, y, z, ψ]}^{T}$ and $v = {[u, v, w, r]}^{T}$, respectively. The kinematic model can be expressed in the following form [16]:

\dot{η} = [\begin{matrix} \dot{x} \\ \dot{y} \\ \dot{z} \\ \dot{ψ} \end{matrix}] = J (η) v = [\begin{matrix} cos ψ & - sin ψ & 0 & 0 \\ sin ψ & cos ψ & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix}] [\begin{matrix} u \\ v \\ w \\ r \end{matrix}]

(1)

The Falcon system is employed as the simulation prototype for the ROV, where five thrusters are configured, comprising four horizontal thrusters and one vertical thruster, thereby enabling independent motion in all directions. Given the minimal displacement of the ROV over consecutive time intervals, its motion can be approximated as uniform linear motion, with the kinematic model formulated as follows:

η_{t + 1} = [\begin{matrix} x_{t + 1} \\ y_{t + 1} \\ z_{t + 1} \\ ψ_{t + 1} \end{matrix}] = [\begin{matrix} x_{t} \\ y_{t} \\ z_{t} \\ ψ_{t} \end{matrix}] + [\begin{matrix} Δ t cos ψ_{t} & - Δ t sin ψ_{t} & 0 & 0 \\ Δ t sin ψ_{t} & Δ t cos ψ_{t} & 0 & 0 \\ 0 & 0 & Δ t & 0 \\ 0 & 0 & 0 & Δ t \end{matrix}] [\begin{matrix} u \\ v \\ w \\ r \end{matrix}]

(2)
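As a minimal sketch of how Formula (2) might be used to predict a candidate trajectory, the following Python functions advance the pose by one step and roll a constant velocity sample forward. The function names `step_pose` and `rollout` and the tuple layout are illustrative assumptions, not part of the original method description.

```python
import math

def step_pose(pose, vel, dt):
    """Advance the 4-DOF pose (x, y, z, psi) by one step of Formula (2)."""
    x, y, z, psi = pose
    u, v, w, r = vel
    # Rotate body-frame surge/sway into the earth frame using the current yaw.
    x_next = x + dt * (u * math.cos(psi) - v * math.sin(psi))
    y_next = y + dt * (u * math.sin(psi) + v * math.cos(psi))
    z_next = z + dt * w      # heave is not affected by yaw
    psi_next = psi + dt * r  # yaw integrates the yaw rate
    return (x_next, y_next, z_next, psi_next)

def rollout(pose, vel, dt, steps):
    """Predict a short trajectory by holding one velocity sample constant."""
    trajectory = [pose]
    for _ in range(steps):
        pose = step_pose(pose, vel, dt)
        trajectory.append(pose)
    return trajectory
```

Holding the sampled velocity constant over the prediction interval mirrors the uniform-motion approximation stated above; the step size and number of steps are left to the caller.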

2.2. Speed Sampling

The International Regulations for Preventing Collisions at Sea (COLREGs) applicable to surface vessels do not directly govern underwater ROV operations, whereas the ROV operational guidelines established by the International Marine Contractors Association (IMCA) have become the universal safety framework in the industry [17,18].

The core logic inherits key maritime safety principles from COLREGs, including safe speed, safe encounter distance, effective collision avoidance maneuvers, and the allocation of avoidance priorities.

Aiming at the three-dimensional underwater operating environment, the algorithm proposed in this paper complies with such safety criteria: it responds to the requirement of safe speed by introducing velocity constraints in the DWA framework, maintains safe separation among multiple ROVs via a distance penalty function, and defines a master–slave relationship for multi-ROV systems in accordance with the shipwreck search task, reflecting the hierarchical levels of different ROVs.

In DWA, a velocity space $(u, v, w, r)$ is defined for the ROV within a specific time frame; this space contains infinitely many velocity combinations. A feasible range of $(u, v, w, r)$ within a short time is calculated based on the current velocity and acceleration constraints of the ROV. This window limits the actual combination of speed and steering that the ROV can execute in the current state, avoiding the generation of commands beyond its physical capabilities. Within the dynamic window of one cycle, multiple possible speed and steering values are sampled to generate multiple local trajectories. Each trajectory represents a potential path of the ROV over a certain period of time in the future. Finally, the optimal solution is selected based on the evaluation function. The flowchart of DWA velocity sampling and trajectory generation is shown in Figure 3.

DWA defines dynamic windows through three physical constraints:

(1) Kinematic constraints: the maximum linear velocities $u_{max}$, $v_{max}$, $w_{max}$ and the maximum angular velocity $r_{max}$ of the ROV are specified.
(2) Dynamic constraints: the maximum accelerations ${\dot{u}}_{max}$, ${\dot{v}}_{max}$, ${\dot{w}}_{max}$, and ${\dot{r}}_{max}$ are specified.
(3) Braking distance constraint: $0 \leq v \leq \sqrt{2 d_{min} a_{b}}$, where $d_{min}$ represents the distance from the ROV to the nearest obstacle and $a_{b}$ represents the braking acceleration of the ROV.

The dynamic window is composed of two constraint conditions, whose mathematical expressions are respectively expressed as Formulas (3) and (4) [19].

Constraint 1—Within this window, specify the speed that can be achieved within a short period of time:

V_{d} = \{(u, v, w, r) |\begin{matrix} u \in [u_{c} - {\dot{u}}_{max} Δ t, u_{c} + {\dot{u}}_{max} Δ t], \\ v \in [v_{c} - {\dot{v}}_{max} Δ t, v_{c} + {\dot{v}}_{max} Δ t], \\ w \in [w_{c} - {\dot{w}}_{max} Δ t, w_{c} + {\dot{w}}_{max} Δ t], \\ r \in [r_{c} - {\dot{r}}_{max} Δ t, r_{c} + {\dot{r}}_{max} Δ t], \end{matrix}\}

(3)

In Formula (3), $u \in [u_{c} - {\dot{u}}_{max} Δ t, u_{c} + {\dot{u}}_{max} Δ t]$, $v \in [v_{c} - {\dot{v}}_{max} Δ t, v_{c} + {\dot{v}}_{max} Δ t]$, and $w \in [w_{c} - {\dot{w}}_{max} Δ t, w_{c} + {\dot{w}}_{max} Δ t]$ give the ranges of the linear velocities $u$, $v$, and $w$ that can be reached within the time interval $Δ t$ under the maximum accelerations ${\dot{u}}_{max}$, ${\dot{v}}_{max}$, and ${\dot{w}}_{max}$. Similarly, $r \in [r_{c} - {\dot{r}}_{max} Δ t, r_{c} + {\dot{r}}_{max} Δ t]$ indicates that the range of the angular velocity is centered on the current angular velocity $r_{c}$ and expands in the positive and negative directions according to the maximum angular acceleration ${\dot{r}}_{max}$ and the time $Δ t$.

Constraint 2—The ROV shall be enabled to perform an emergency stop before colliding with an obstacle.

V_{a} = \{(u, v, w, r) |\begin{matrix} | u | \leq \sqrt{2 \cdot d i s t_{1} (u, v, w, r) \cdot {\dot{u}}_{max}}, \\ | v | \leq \sqrt{2 \cdot d i s t_{1} (u, v, w, r) \cdot {\dot{v}}_{max}}, \\ | w | \leq \sqrt{2 \cdot d i s t_{1} (u, v, w, r) \cdot {\dot{w}}_{max}}, \\ | r | \leq \sqrt{2 \cdot d i s t_{2} (u, v, w, r) \cdot {\dot{r}}_{max}} \end{matrix}\}

(4)

In Formula (4), $dist_{1}$ represents the arc distance between the ROV and the nearest obstacle, and $dist_{2}$ represents the central angle between the ROV and the nearest obstacle. The central angle is the angle formed by the line connecting the edge and the center of the obstacle from the ROV’s observation perspective, with the geometric center of the obstacle as the vertex.
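The two window constraints can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the reachable interval of Formula (3) is additionally clipped to the speed limits of the kinematic constraint, the braking check uses the form $v \leq \sqrt{2 d_{min} a_{b}}$ given with the physical constraints, and the function names and the acceleration values in the usage note are hypothetical.

```python
import math

def dynamic_window(current, acc_max, vel_max, dt):
    """Reachable interval per axis (Formula (3)), clipped to the speed limits."""
    window = []
    for v_c, a_max, v_max in zip(current, acc_max, vel_max):
        low = max(-v_max, v_c - a_max * dt)
        high = min(v_max, v_c + a_max * dt)
        window.append((low, high))
    return window

def sample_axis(low, high, n):
    """Return n evenly spaced samples covering [low, high]."""
    if n == 1:
        return [0.5 * (low + high)]
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]

def is_admissible(speed, dist, acc_max):
    """Braking check: the vehicle can stop within dist at deceleration acc_max."""
    return abs(speed) <= math.sqrt(2.0 * dist * acc_max)
```

For example, `dynamic_window((1.0, 0.0, 0.0, 0.0), (0.5, 0.5, 0.5, 0.2), (2.0, 2.0, 2.0, 1.0), 0.1)` yields a surge interval of roughly 0.95 to 1.05 m/s; candidate velocities would then be drawn with `sample_axis` and filtered with `is_admissible` before scoring.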

2.3. Objective Function

The traditional evaluation function is as follows [20]:

G (u, v, w, r) = α \cdot Heading (u, v, w, r) + β \cdot Dist (u, v, w, r) + γ \cdot vel (u, v, w)

(5)

In Formula (5), $Heading (u, v, w, r)$ represents the alignment degree of the direction between the end point of the trajectory and the target point, $Dist (u, v, w, r)$ represents the normalized value of the distance to the nearest obstacle on the trajectory, and $vel (u, v, w)$ represents the normalized value of the linear velocity.
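A minimal sketch of how Formula (5) could be evaluated over a set of candidate trajectories is given below. The normalization used here (dividing each term by its sum over all candidates) is an assumption, since the text does not specify the normalization rule; the function names and weight values are likewise illustrative.

```python
def normalize(values):
    """Scale raw scores so that they sum to 1 (all zeros if the sum is 0)."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [v / total for v in values]

def best_candidate(heading, dist, vel, alpha, beta, gamma):
    """Return (index, score) of the candidate that maximizes Formula (5)."""
    h, d, s = normalize(heading), normalize(dist), normalize(vel)
    scores = [alpha * hi + beta * di + gamma * si
              for hi, di, si in zip(h, d, s)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Each input list holds one raw score per candidate trajectory, so the returned index identifies the velocity sample whose predicted trajectory would be passed on as the reference.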

According to the traditional evaluation function, trajectory planning from the starting point to the target point can be achieved by DWA. However, for multi-ROV cooperative search systems, the problem of repeated detection caused by task area overlap arises, which not only leads to resource waste, data redundancy, and interference, but also may mislead conclusions, increase costs, and reduce work efficiency. Therefore, the traditional evaluation function is optimized to realize trajectory planning under the multi-ROV cooperative search architecture. The improved evaluation function is as follows:

\begin{matrix} G (u, v, w, r) = & α \cdot Heading (u, v, r) + β \cdot Dist (u, v, w, r) \\ + γ \cdot Velocity (u, v, w) + λ \cdot Dist 1 (u, v, w, r) \end{matrix}

(6)

In Formula (6), $Dist1 (u, v, w, r)$ is the normalized value of the distance score between the leader ROV and the follower ROV. The $Dist1 (u, v, w, r)$ score calculation process is shown in Figure 4.

First, the inter-ROV distance to the leader is evaluated. If this distance is less than 3 m, the algorithm enters a repulsion stage and outputs a corresponding repulsion score. For distances between 3 m and 30 m, the algorithm operates in the collaborative stage. In this stage, the deviation from the prescribed optimal separation is computed: if the deviation exceeds 0.5 m, a suboptimal score is assigned; otherwise, a Gaussian score is calculated such that the score increases as the deviation decreases. For distances exceeding 30 m, the algorithm transitions to an attenuation stage, wherein the score diminishes monotonically with increasing distance. The resulting trajectory score influences the objective function by adjusting the relative weighting among orientation, obstacle clearance, velocity, and inter-ROV cooperation terms; higher trajectory scores correspond to configurations better aligned with mission requirements. Based on this augmented DWA formulation, the optimal trajectory $η_{d}$ is generated and transmitted to the tracking controller in real time.

The establishment of the threshold range of 3 to 30 m is not solely based on sensing coverage considerations; the 3 m threshold is employed as the minimum safe separation distance for multi-ROV cooperative formation, which is determined by integrating the physical dimensions of the ROV and the operational margin of the umbilical cable, and by which inter-ROV collisions and cable entanglement can be effectively avoided. Meanwhile, the upper limit of the threshold is set to 30 m to be matched with the other three objective functions in this paper, so that the collaborative coordination of the overall optimization objective can be realized through weight allocation among multiple objective functions.

Sensitivity analysis has been conducted over the threshold range of 27 to 33 m, and it is demonstrated that the scanning coverage of multi-ROV cooperative perception and formation stability are maintained at a favorable level with no significant fluctuations within this interval, whereby the rationality of adopting 30 m as the threshold is validated.

Owing to its smooth decay characteristic, the evaluation score can be dynamically adjusted by the Gaussian scoring function in accordance with the cooperative distance, so that the smoothness of formation trajectory planning is guaranteed.
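The three-stage scoring described in this section can be sketched as a piecewise function. Only the 3 m and 30 m thresholds and the 0.5 m deviation tolerance come from the text; the optimal separation `D_OPT`, the Gaussian width `SIGMA`, the constant `SUBOPTIMAL`, the decay length `DECAY`, and the zero repulsion score are hypothetical values chosen purely for illustration.

```python
import math

D_MIN = 3.0       # m, minimum safe separation (from the text)
D_MAX = 30.0      # m, upper cooperation threshold (from the text)
DEV_TOL = 0.5     # m, deviation tolerance (from the text)
D_OPT = 15.0      # m, hypothetical optimal separation
SIGMA = 0.5       # m, hypothetical Gaussian width
SUBOPTIMAL = 0.3  # hypothetical score for large deviations
DECAY = 10.0      # m, hypothetical attenuation length

def dist1_score(d):
    """Score the leader-follower distance d (in metres)."""
    if d < D_MIN:
        # Repulsion stage: assumed here to give the lowest score.
        return 0.0
    if d <= D_MAX:
        # Collaborative stage: Gaussian reward near the optimal separation.
        deviation = abs(d - D_OPT)
        if deviation > DEV_TOL:
            return SUBOPTIMAL
        return math.exp(-deviation ** 2 / (2.0 * SIGMA ** 2))
    # Attenuation stage: score decays monotonically with distance.
    return SUBOPTIMAL * math.exp(-(d - D_MAX) / DECAY)
```

With these assumed constants the score peaks at 1 when the follower sits exactly at the optimal separation, drops to the constant suboptimal value elsewhere in the collaborative band, and fades beyond 30 m.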

3. ESO Combined with MPC–ASMC Tracking Control

The task of the tracking controller is to track the trajectory planned by the improved DWA within a limited time. In this paper, MPC and ASMC are combined to form a dual-loop tracking system. Specifically, MPC is employed as the kinematic controller and acts as the position outer loop, where the desired position is converted into velocity reference signals. ASMC serves as the dynamic controller and acts as the velocity inner loop, tracking the velocity reference and outputting the actuator torque. In ROV tracking control, the influence of various disturbances on the tracking performance needs to be considered. Therefore, an ESO is added as feedforward compensation for unknown disturbances. The main block diagram of the control system is shown in Figure 5.

3.1. MPC Kinematic Controller

In the MPC kinematic controller, the error between the tracking trajectory points and the reference trajectory points is taken as the input. The error model of the ROV tracking control system is represented as follows [21]:

\begin{matrix} \dot{\tilde{η}} = [\begin{matrix} \dot{x} - {\dot{x}}_{d} \\ \dot{y} - {\dot{y}}_{d} \\ \dot{z} - {\dot{z}}_{d} \\ \dot{ψ} - {\dot{ψ}}_{d} \end{matrix}] & = [\begin{matrix} 0 & 0 & 0 & - u_{d} sin ψ_{d} - v_{d} cos ψ_{d} \\ 0 & 0 & 0 & u_{d} cos ψ_{d} - v_{d} sin ψ_{d} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}] [\begin{matrix} x - x_{d} \\ y - y_{d} \\ z - z_{d} \\ ψ - ψ_{d} \end{matrix}] & + [\begin{matrix} cos ψ_{d} & - sin ψ_{d} & 0 & 0 \\ sin ψ_{d} & cos ψ_{d} & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix}] [\begin{matrix} u - u_{d} \\ v - v_{d} \\ w - w_{d} \\ r - r_{d} \end{matrix}] \end{matrix}

(7)

After discretizing the above linear error model, the formula is as follows:

\tilde{η} (k + 1) = a_{t} \tilde{η} (k) + b_{t} \tilde{v} (k)

(8)

In Formula (8),

a_{t} = [\begin{matrix} 1 & 0 & 0 & - T (u_{d} sin ψ_{d} + v_{d} cos ψ_{d}) \\ 0 & 1 & 0 & T (u_{d} cos ψ_{d} - v_{d} sin ψ_{d}) \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix}]

,

b_{t} = [\begin{matrix} T cos ψ_{d} & - T sin ψ_{d} & 0 & 0 \\ T sin ψ_{d} & T cos ψ_{d} & 0 & 0 \\ 0 & 0 & T & 0 \\ 0 & 0 & 0 & T \end{matrix}]

where $T$ is the sampling period, $\tilde{η} (k)$ contains the state quantities, and $\tilde{v} (k)$ is the control quantity.
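As an illustrative sketch, the discrete error model can be assembled by forward-Euler discretization of Formula (7) with sampling period T, that is, the state matrix is the identity plus T times the continuous state matrix and the input matrix is T times the continuous input matrix. The function names and the list-based matrix representation below are assumptions made for this example.

```python
import math

def error_model(u_d, v_d, psi_d, T):
    """Build the discrete state and input matrices from Formula (7)."""
    c, s = math.cos(psi_d), math.sin(psi_d)
    a_t = [
        [1.0, 0.0, 0.0, -T * (u_d * s + v_d * c)],
        [0.0, 1.0, 0.0, T * (u_d * c - v_d * s)],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    b_t = [
        [T * c, -T * s, 0.0, 0.0],
        [T * s, T * c, 0.0, 0.0],
        [0.0, 0.0, T, 0.0],
        [0.0, 0.0, 0.0, T],
    ]
    return a_t, b_t

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def predict_error(eta_err, vel_err, u_d, v_d, psi_d, T):
    """One-step prediction of the pose error from the current errors."""
    a_t, b_t = error_model(u_d, v_d, psi_d, T)
    return [p + q for p, q in zip(mat_vec(a_t, eta_err), mat_vec(b_t, vel_err))]
```

For instance, with a reference surge speed of 1 m/s, zero reference yaw, and T = 0.1 s, a pure heading error of 0.1 rad produces a lateral position error of about 0.01 m after one step, which matches the intuition that a misaligned heading accumulates cross-track error.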

The objective function is designed as follows:

J_{1} (t) = \sum_{i = 1}^{N_{p}} {∥ η (t + i | t) - η_{d} (t + i | t) ∥}_{Q_{1}}^{2} + \sum_{i = 1}^{N_{c}} {∥ Δ v (t + i | t) ∥}_{R_{1}}^{2}

(9)

In Formula (9), $N_{p}$ is the prediction horizon, while $N_{c}$ is the control horizon.

The speed constraints are as follows:

\{\begin{matrix} - 2 m / s < u < 2 m / s \\ - 2 m / s < v < 2 m / s \\ - 2 m / s < w < 2 m / s \\ - 1 rad / s < r < 1 rad / s \end{matrix}

(10)

The control quantity and control increment constraint functions are as follows:

\begin{matrix} v_{e p min} (t + k) & \leq v_{e p} (t + k) \leq v_{e p max} (t + k) \\ Δ v_{e p min} (t + k) & \leq Δ v_{e p} (t + k) \leq Δ v_{e p max} (t + k) \\ k & = 0, 1, 2, \dots, N_{c} - 1 \end{matrix}

(11)
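The effect of the box constraints in Formulas (10) and (11) can be illustrated with a simple projection step. The sketch below is not how a quadratic programming solver enforces constraints; it only shows which commands are admissible. The velocity bounds come from Formula (10), whereas the increment bounds `DV_MAX` are hypothetical because the paper does not list them, and the bounds are treated as closed intervals for simplicity.

```python
V_MAX = [2.0, 2.0, 2.0, 1.0]   # Formula (10): u, v, w in m/s and r in rad/s
DV_MAX = [0.5, 0.5, 0.5, 0.2]  # hypothetical per-step increment limits

def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

def constrain_command(v_prev, v_cmd):
    """Limit the increment first, then the resulting velocity."""
    out = []
    for prev, cmd, v_max, dv_max in zip(v_prev, v_cmd, V_MAX, DV_MAX):
        dv = clamp(cmd - prev, -dv_max, dv_max)
        out.append(clamp(prev + dv, -v_max, v_max))
    return out

# Example: a surge request of 3.0 m/s from 1.8 m/s is cut back to the bound.
limited = constrain_command([1.8, 0.0, 0.0, 0.0], [3.0, -0.1, 0.0, 0.9])
```

In the optimization itself, the same bounds would be passed to the solver as inequality constraints over the control horizon rather than applied after the fact.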

The essence of the MPC algorithm is the optimization of the objective function. Minimizing the above objective function subject to the constraints is therefore recast as a quadratic programming problem, whose objective function is as follows:

min_{v_{e p} (t)} (v_{e p} (t) H_{d} v_{e p} {(t)}^{T} + G_{d} v_{e p} {(t)}^{T})

(12)

In Formula (12),

H_{d} = B_{d}^{T} Q B_{d} + R

,

G_{d} = B_{d}^{T} Q A_{d} e_{η}

.

3.2. ASMC Dynamics Controller

The control quality of ROV tracking is determined by the performance of the adaptive sliding mode controller, while the performance and stability of the controller system are determined by the design of the sliding surface. Based on the concept of sliding mode control, the velocity error formula of ROV is defined as follows:

e_{v} = v_{e p} - v

(13)

In Formula (13),

v_{e p}

is the desired velocity generated by the MPC kinematic controller according to the optimal trajectory, and v is the actual velocity at the current moment.

The sliding mode surface constructed from the velocity error, written as e_{c} in Formulas (14) and (16), is as follows [22]:

s = {\dot{e}}_{c} + 2 Λ e_{c} + Λ^{2} \int e_{c} d t

(14)

The sliding mode control law is as follows:

τ = τ_{e q} + τ_{a d}

(15)

In Formula (15),

τ_{e q}

is the equivalent control law of the ROV, as shown below:

τ_{e q} = M ({\dot{v}}_{c} + \frac{{\dot{e}}_{c}}{2 Λ} + \frac{Λ e_{c}}{2}) + C (v) + D (v) + g

(16)

In Formula (16), M is the inertia matrix of the ROV itself,

Λ

is a constant, C is the Coriolis and centripetal matrix of the ROV, D is the drag matrix, and g is the gravity and buoyancy vector.

τ_{a d}

is the adaptive control law, which includes the estimated perturbations, as follows:

τ_{a d} = K s + τ_{e s o}

(17)

In Formula (17), K is a constant and s is the sliding mode surface.

τ_{e s o}

is the perturbation estimated by ESO.
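For a single degree of freedom, the sliding surface of Formula (14) and the adaptive term of Formula (17) can be sketched as follows. The sketch assumes a finite-difference derivative and a rectangle-rule integral, reads the gain term in Formula (17) as the product of K and s, and uses hypothetical values for Λ, K, and the sampling period; the equivalent control of Formula (16) is omitted.

```python
class SlidingSurface:
    """Single-DOF version of s = de/dt + 2*Lambda*e + Lambda^2 * integral(e)."""

    def __init__(self, lam, dt):
        self.lam = lam          # Lambda in Formula (14)
        self.dt = dt            # sampling period in seconds
        self.integral = 0.0     # running integral of the velocity error
        self.prev_e = None      # previous error, used for the derivative

    def update(self, v_ref, v):
        e = v_ref - v                                   # Formula (13)
        de = 0.0 if self.prev_e is None else (e - self.prev_e) / self.dt
        self.integral += e * self.dt                    # rectangle rule
        self.prev_e = e
        return de + 2.0 * self.lam * e + self.lam ** 2 * self.integral

def adaptive_term(k_gain, s, tau_eso):
    """Formula (17): gain times sliding variable plus the ESO estimate."""
    return k_gain * s + tau_eso

# Hypothetical values: Lambda = 2, dt = 0.1 s, K = 1.2, ESO estimate 3 N.
surface = SlidingSurface(lam=2.0, dt=0.1)
s1 = surface.update(v_ref=1.0, v=0.0)
s2 = surface.update(v_ref=1.0, v=0.5)
tau_ad = adaptive_term(1.2, s1, 3.0)
```

The integral term is what distinguishes this surface from a plain first-order one: it keeps s away from zero while a constant velocity offset persists.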

3.3. ESO Compensation

In the operational process of multi-ROVs, uncertainties such as unmodeled internal dynamics, parameter variations, and external disturbances pose significant challenges to the tracking control system. These factors can compromise the control precision and robustness, as they introduce unaccounted nonlinearities and time-varying characteristics that deviate from the nominal system model. Therefore, ESO is introduced to weaken the influence of various disturbances on multiple ROVs. The core concept of ESO is that uncertainties within the entire system and external disturbances are treated as a unified entity. These entities are referred to as total disturbances and are defined as extended state variables. Through the observation and compensation of total disturbances, their impact on multi-ROV tracking is mitigated [23].

The main types of disturbances to underwater vehicles are as follows:

(1): Environmental disturbances: unstable flow velocity, changes in flow direction, and local vortices cause the ROV to deviate from its path.
(2): Equipment and system disturbances: motor response delay, thrust nonlinearity, or mechanical wear causes thrust deviation.
(3): Modeling errors: higher-order hydrodynamic effects are ignored, and the estimation error of the hydrodynamic coefficients may vary over time.
(4): Numerical calculation disturbances: an excessive time step leads to insufficient integration accuracy, and small floating-point deviations accumulate over long-term tracking and control, leading to continuously increasing errors.
(5): External interference: disturbances caused by collisions with fish schools or other marine organisms, obstacles, other vehicles, or acoustic interference.

In the tracking control of multi-ROV systems, two typical types of unknown disturbances exist in the underwater environment, namely, slowly time-varying disturbances and abrupt disturbances, and they affect tracking performance through markedly different mechanisms. To address this, this study introduces the ESO as a feedforward compensation link for the two independent scenarios of time-varying disturbances and abrupt disturbances. By adopting compensation strategies adapted to the characteristics of each disturbance type, unknown disturbances are accurately estimated and suppressed, thereby ensuring the stability and accuracy of the tracking control for multi-ROV systems.

Since these disturbances cannot be predicted in advance, they tend to degrade the tracking control of ROVs. The ESO is therefore used to estimate strong disturbances and proactively mitigate their impact on the controller. The internal uncertainties and external disturbances of the system are treated together as a “total disturbance”, which is estimated and compensated in real time through an extended state variable, thereby enhancing the system robustness.

The kinematic and dynamic equations of the ROV are expressed as follows:

\{\begin{matrix} \dot{η} = J (η) v \\ M \dot{v} + D (v) v + C (v) v + g (η) = τ + τ_{d} \end{matrix}

(18)

In Formula (18),

τ

represents the control input and

τ_{d}

represents the total disturbance.

The total disturbance is defined as follows:

τ_{d} = [M - \hat{M}] \dot{v} + [D (v) - \hat{D} (v)] v + [C (v) - \hat{C} (v)] v + [g (η) - \hat{g} (η)] + d_{e x t} (t)

(19)

In Formula (19),

[M - \hat{M}] \dot{v} + [D (v) - \hat{D} (v)] v + [C (v) - \hat{C} (v)] v + [g (η) - \hat{g} (η)]

represents the model uncertainty and

d_{ext} (t)

represents the external disturbance.

Taking the total disturbance

τ_{d}

as the expanded state, a new state vector is defined as follows:

x = [\begin{matrix} η \\ v \\ τ_{d} \end{matrix}]

(20)

Then, the expanded equation of state is as follows:

\{\begin{matrix} \dot{η} = J (η) v \\ \dot{v} = M^{- 1} (τ - D (v) v - C (v) v - g (η) + τ_{d}) \\ {\dot{τ}}_{d} = h (t) \end{matrix}

(21)

In Formula (21),

h (t)

is the change rate of the total disturbance

τ_{d}

, which is assumed to be bounded. ESO implicitly processes

h (t)

through the error-driven mechanism. When the actual disturbance

τ_{d}

changes, that is,

h (t)

is not equal to 0, the position error e increases, and the disturbance estimation is updated by the gain

β_{3}

.

Suppose the nominal parameters

\hat{M}

,

\hat{D}

,

\hat{C}

and

\hat{g}

are known; then, the dynamic equation can be rewritten as

\dot{v} = {\hat{M}}^{- 1} (τ - \hat{D} v - \hat{C} v - \hat{g}) + {\hat{M}}^{- 1} τ_{d}

(22)

The total disturbance is as follows:

d_{t o t a l} = {\hat{M}}^{- 1} τ_{d} + Δ f

(23)

In Formula (23),

Δ f

is the model error term.

Therefore, the ESO update equation is as follows:

\{\begin{matrix} e_{k} = η_{k} - {\hat{η}}_{k} \\ \dot{\hat{η}} = \hat{v} + β_{1} e_{k} \\ \dot{\hat{v}} = M^{- 1} (τ - D \hat{v}) + {\hat{τ}}_{d} + β_{2} e_{k} \\ {\dot{\hat{τ}}}_{d} = β_{3} e_{k} \end{matrix}

(24)

In Formula (24),

e_{k}

is the position and attitude estimation error at time k;

\dot{\hat{η}}

is the derivative of the estimated position and attitude;

\hat{v}

is the estimated speed of the ROV;

β_{1}

is the gain of the position observer, which determines the error correction speed;

\dot{\hat{v}}

is the derivative of the estimated velocity;

{\hat{τ}}_{d}

is the estimated total disturbance (including model uncertainty and external interference);

β_{2}

is the gain of the velocity observer, which determines the error correction speed;

{\dot{\hat{τ}}}_{d}

is the derivative of the estimated disturbance; and

β_{3}

is the gain of the perturbation observer, which determines the dynamic response speed of perturbation estimation.
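The observer update can be illustrated for one degree of freedom with a forward-Euler loop. This is a simplified sketch, not the authors' implementation: it assumes a linear drag term, a constant unknown disturbance, and hypothetical values for the mass `m`, drag coefficient `d`, and thrust `tau`. The disturbance estimate is expressed as an acceleration, as in Formula (24), and the gains follow the bandwidth parameterization used later in the simulation section (β1 = 4ω0, β2 = 4ω0², β3 = 4ω0³).

```python
def eso_step(state, eta_meas, tau, m, d, betas, dt):
    """One forward-Euler step of a single-DOF linear ESO.

    state = (eta_hat, v_hat, f_hat); f_hat is the lumped-disturbance
    estimate, expressed as an acceleration as in Formula (24).
    """
    eta_hat, v_hat, f_hat = state
    b1, b2, b3 = betas
    e = eta_meas - eta_hat                        # estimation error e_k
    eta_dot = v_hat + b1 * e
    v_dot = (tau - d * v_hat) / m + f_hat + b2 * e
    f_dot = b3 * e
    return (eta_hat + dt * eta_dot, v_hat + dt * v_dot, f_hat + dt * f_dot)

def simulate(f_true=0.5, tau=20.0, m=100.0, d=50.0, omega0=4.0,
             dt=0.01, steps=1000):
    """Run a toy plant and the observer side by side; return the final f_hat."""
    betas = (4 * omega0, 4 * omega0 ** 2, 4 * omega0 ** 3)
    eta, v = 0.0, 0.0
    state = (0.0, 0.0, 0.0)
    for _ in range(steps):
        state = eso_step(state, eta, tau, m, d, betas, dt)
        # Toy plant with a constant unknown disturbance acceleration f_true.
        eta, v = eta + dt * v, v + dt * ((tau - d * v) / m + f_true)
    return state[2]

f_hat_final = simulate()
```

Only the position is measured; the velocity and disturbance estimates are both driven by the same position error, which is why β3 alone sets how quickly the disturbance estimate responds.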

4. Simulation Results and Analysis

In this paper, the Falcon ROV is used as the simulation prototype. The hydrodynamic parameters of the Falcon are shown in Table 1.

The Falcon is equipped with five thrusters, namely, four horizontal thrusters

[T_{1}, T_{2}, T_{3}, T_{4}]

and one vertical thruster

[T_{5}]

. The maximum output thrust of each thruster is 550 N. The relationship between the resultant forces and moment in the four degrees of freedom and the thruster outputs is as follows:

[\begin{matrix} τ_{X} \\ τ_{Y} \\ τ_{Z} \\ τ_{ψ} \end{matrix}] = [\begin{matrix} cos α & cos α & cos α & cos α & 0 \\ sin α & - sin α & sin α & - sin α & 0 \\ 0 & 0 & 0 & 0 & 1 \\ A & - A & - A & A & 0 \end{matrix}] [\begin{matrix} T_{1} \\ T_{2} \\ T_{3} \\ T_{4} \\ T_{5} \end{matrix}]

(25)

In Formula (25),

α

is the mounting angle of the horizontal thrusters relative to the X-axis direction, which is 36°.

A = (\frac{b}{2}) sin α + (\frac{a}{2}) cos α

,

a = 0.6

m,

b = 1

m.
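The allocation in Formula (25) can be checked numerically with a short sketch. The geometry (α = 36°, a = 0.6 m, b = 1 m) and the 550 N limit are taken from the text; the symmetric saturation in both thrust directions and the function name are assumptions made for illustration.

```python
import math

ALPHA = math.radians(36.0)   # thruster angle from Formula (25)
A_LEN, B_LEN = 0.6, 1.0      # a and b in metres
T_MAX = 550.0                # per-thruster limit in newtons (assumed symmetric)

# Moment arm A = (b/2) sin(alpha) + (a/2) cos(alpha)
ARM = (B_LEN / 2) * math.sin(ALPHA) + (A_LEN / 2) * math.cos(ALPHA)

ALLOCATION = [
    [math.cos(ALPHA), math.cos(ALPHA), math.cos(ALPHA), math.cos(ALPHA), 0.0],
    [math.sin(ALPHA), -math.sin(ALPHA), math.sin(ALPHA), -math.sin(ALPHA), 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [ARM, -ARM, -ARM, ARM, 0.0],
]

def generalized_forces(thrusts):
    """Map [T1..T5] to [tau_X, tau_Y, tau_Z, tau_psi], saturating each thruster."""
    limited = [max(-T_MAX, min(T_MAX, t)) for t in thrusts]
    return [sum(row[j] * limited[j] for j in range(5)) for row in ALLOCATION]

# Equal horizontal thrust gives pure surge force; T5 gives heave only.
tau = generalized_forces([100.0, 100.0, 100.0, 100.0, 50.0])
```

Because all four horizontal thrusters contribute cos α to surge, the largest surge force available under the limit is 4 · 550 · cos 36°, which is noticeably less than 4 · 550 N.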

4.1. Trajectory Planning Simulation

In the trajectory planning phase, the last known position of the shipwreck is taken as

(0, 0, 0)

m, where the surface search vessel loses contact. Based on the historical ocean current direction at this location, a fixed deployment heading of 45° is adopted for the ROV search operation. For the multi-ROV scenario, each vehicle is assigned a nominal search radius of 10 m. Initially, the leader and the two follower ROVs, denoted A and B, are all deployed at the origin

(0, 0, 0)

m. Considering the potential drift of the shipwreck, the estimated coordinates of the search terminus are set to

(120, 120, - 50)

m. To prevent redundant coverage, the improved DWA algorithm directs follower A to prioritize a waypoint at

(20, 0, - 20)

m based on its search radius, while its final target is correspondingly shifted to

(140, 120, - 50)

m. Similarly, follower B is guided to prioritize

(0, 20, - 20)

m, with its final target shifted to

(120, 140, - 50)

m, thereby establishing zoned coverage. In the simulation, the actual shipwreck position is placed at

(90, 90, - 57)

m.
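The follower targets quoted above are consistent with shifting the leader's target horizontally by twice the 10 m search radius. The paper does not state this rule explicitly, so the spacing rule in the sketch below is an assumption, and the helper name is hypothetical.

```python
SEARCH_RADIUS = 10.0  # nominal search radius per ROV, in metres

def shifted(point, dx=0.0, dy=0.0):
    """Shift a 3-D point in the horizontal plane; depth is unchanged."""
    x, y, z = point
    return (x + dx, y + dy, z)

leader_goal = (120.0, 120.0, -50.0)
spacing = 2.0 * SEARCH_RADIUS   # assumed: adjacent search discs just touch

goal_a = shifted(leader_goal, dx=spacing)   # follower A final target
goal_b = shifted(leader_goal, dy=spacing)   # follower B final target

# Priority waypoints use the same horizontal spacing at the stated depth.
waypoint_a = (spacing, 0.0, -20.0)
waypoint_b = (0.0, spacing, -20.0)
```

Under this reading, the three vehicles sweep adjacent corridors whose search discs touch but do not overlap, which matches the stated aim of zoned coverage.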

In Equation (6), the weight coefficients

α

,

β

,

γ

, and

λ

adopted in this study are set as

α = 0.05

,

β = 0.1

,

γ = 0.1

, and

λ = 0.006

. The individual components of the objective function are normalized prior to weighted summation. These values are designed to balance the formation coverage efficiency and collision avoidance requirements in the simulated search scenarios. The sampling period of the DWA is set to 0.1 s, and the prediction horizon is 3 s.

The trajectory generated by the conventional DWA is shown in Figure 6. The actual shipwreck location is marked by red circles, and obstacles are classified into two distinct categories: conical seabed reefs and circular floating objects. As can be observed in Figure 6, the planned trajectories exhibit redundant coverage of task areas and fail to locate the shipwreck. This limitation arises because the DWA accounts only for heading, obstacle proximity, and velocity when generating paths between the start and target points. Although the point-to-point planning task is nominally completed, the absence of inter-vehicle coordination results in excessive search overlap, limited overall efficiency, and ultimately the failure of ROV(A) to detect the shipwreck within its assigned search radius.

The trajectory planning after optimizing the DWA objective function is shown in Figure 7. It can be observed from Figure 7 that the search area is significantly expanded and the position of the sunken ship is successfully located. This is because the optimized DWA objective function considers not only the heading, obstacle, and velocity factors when planning the trajectory from the starting point to the target point, but also search cooperation in the multi-ROV scenario. The distance penalty function prompts the ROVs to maintain a certain distance from each other. Thus, trajectory planning from the start to the target point is achieved while the search range is expanded and the search efficiency is improved.

4.2. Tracking Control Simulation

Due to the uncertainties of the marine environment, ROVs are inevitably subject to disturbances, so their impact on tracking control needs to be considered. The Band-Limited White Noise module in Simulink is used to generate random noise with a specified power spectral density to simulate random disturbances in the marine environment, such as currents and waves. White noise represents a class of worst-case disturbances characterized by a relatively uniform energy distribution. If the trajectory tracking control of the ROV achieves stable convergence under Band-Limited White Noise excitation and enables the ROV to complete its search mission, it can also be expected to tolerate structured disturbances in real operating environments. In other words, the white noise model provides a moderately conservative yet effective test benchmark. Through Formula (26), the random noise is amplified and converted into a disturbing force with practical physical significance.

\begin{matrix} τ_{d - x} & = [\begin{matrix} 1 & 0 & 0 & 0 \end{matrix}] \cdot (R \cdot N - inv (M) \cdot B), \\ τ_{d - y} & = [\begin{matrix} 0 & 1 & 0 & 0 \end{matrix}] \cdot (R \cdot N - inv (M) \cdot B), \\ τ_{d - z} & = [\begin{matrix} 0 & 0 & 1 & 0 \end{matrix}] \cdot (R \cdot N - inv (M) \cdot B), \\ τ_{d - ψ} & = [\begin{matrix} 0 & 0 & 0 & 1 \end{matrix}] \cdot (R \cdot N - inv (M) \cdot B) . \end{matrix}

(26)

In Formula (26), R is the amplification matrix, N is the random noise generated by the Band-Limited White Noise module, M is the ROV inertia matrix, and B is the current state information matrix of the ROV.
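A per-axis reading of Formula (26) can be sketched as follows. The sketch assumes diagonal R and M so that each axis can be computed independently, treats B as a four-element vector, and uses seeded Gaussian samples as a stand-in for the Simulink noise block. All numerical values and function names are hypothetical.

```python
import random

def white_noise(n, sigma, seed=0):
    """Seeded Gaussian samples used as a stand-in for the noise source."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def disturbance_forces(r_gain, noise, m_diag, b_state):
    """Per-axis value of R*N - inv(M)*B for diagonal R and M.

    All arguments are 4-element sequences ordered (x, y, z, psi).
    """
    return [r * n - b / m for r, n, m, b in zip(r_gain, noise, m_diag, b_state)]

# Hypothetical gains and inertia values, one noise sample per axis.
noise = white_noise(4, sigma=1.0, seed=1)
tau_d = disturbance_forces([50.0] * 4, noise,
                           [400.0, 350.0, 600.0, 200.0], [0.0] * 4)
```

The row vectors in Formula (26) simply select one component of the result, so the list returned here corresponds to the four disturbance components in order.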

The perturbation curve used for time-varying disturbances is shown in Figure 8.

A comparison of the three-dimensional tracking performance of the ROVs with and without ESO feedforward compensation under time-varying disturbances is shown in Figure 9.

In the MPC controller, the core parameters are

N_{p} = 5

,

N_{c} = 1

, and sampling time

T = 0.1

. In the ASMC controller, the constant gain coefficient

n m n = 1.2

. In the ESO, the observer bandwidth is set to

ω_{0} = 4

, and the feedback gains are designed as

β_{1} = 4 \times ω_{0}

,

β_{2} = 4 \times ω_{0}^{2}

,

β_{3} = 4 \times ω_{0}^{3}

, based on the parameterization of the observer bandwidth.

As shown in Figure 9, without ESO compensation, the time-varying disturbances prevent the ROVs from accurately tracking the planned trajectories, leading to noticeable overshoot. Consequently, both the leader and the follower ROVs exhibit degraded tracking performance. In particular, without disturbance feedforward compensation, follower ROV(B) accumulates error progressively and eventually diverges from the prescribed path. Such behavior renders this control scheme impractical for field deployment.

In contrast, as shown in Figure 9, the proposed ESO-based feedforward compensation proactively estimates the time-varying disturbances. Consequently, all three ROVs track the prescribed trajectories with high fidelity and exhibit no discernible deviation.

The three-dimensional tracking error curves of the ROVs with and without ESO feedforward compensation under time-varying disturbances are shown in Figure 10.

The comparison of the maximum tracking error of each degree of freedom with and without ESO is shown in Table 2.

In Table 2, data exhibiting significant differences in tracking error are highlighted in blue. As shown in Figure 10 and Table 2, the control system without ESO-based feedforward compensation can eventually drive the tracking error toward zero after prolonged transient adjustment. However, in obstacle-dense regions, the adverse effects of disturbances are amplified, and the cascaded MPC–ASMC controller fails to respond with sufficient speed and precision, resulting in noticeable overshoot and obstacle collisions. In contrast, the system with ESO feedforward compensation not only achieves comparable steady-state error reduction, but also effectively mitigates disturbance impacts in cluttered environments. Consequently, the augmented cascaded controller—incorporating ESO-based feedforward compensation—delivers timely and accurate trajectory corrections, substantially improving tracking fidelity in challenging scenarios.

The thrust output curves of the five thrusters under time-varying disturbances, with and without ESO feedforward compensation, are shown in Figure 11.

The maximum thrust output of each thruster under time-varying disturbances, with and without ESO feedforward compensation, is shown in Table 3.

Since the maximum thrust of every thruster is 550 N, the proportion of the operating time during which each thruster reaches this limit in the tracking control process is calculated, as shown in Table 4.

For the Falcon ROV, the maximum thrust output of an individual thruster is 550 N. As shown by the thrust response curves in Figure 11 and the quantitative summary in Table 4, the two control schemes produce markedly different thrust saturation profiles. With ESO feedforward compensation, operation at the 550 N saturation limit is rare: even the most heavily saturated thruster remains at the limit for only 10.30% of the total operating duration, and the saturation events are brief. The majority of thrusters exhibit no saturation whatsoever. In contrast, without ESO compensation, all thrusters operate against the 550 N saturation bound for a substantial portion of the mission; the thruster experiencing the most severe saturation is constrained at the limit for 77.63% of the operating time. This confirms that, in the absence of ESO feedforward compensation, the system operates under prolonged thrust saturation. The proposed scheme enables all three ROVs to successfully complete the tracking task under time-varying disturbances while reducing the frequency of thrust saturation events by up to seven times. In contrast, under the conventional MPC–ASMC controller, one ROV deviates from the formation and fails to complete the tracking task. Under abrupt disturbances, the proposed approach reduces the trajectory tracking error by up to six times and decreases the frequency of thrust saturation events by up to four times.
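The saturation share discussed above can be computed from a logged thrust signal as the fraction of samples at the limit. The paper does not describe how Table 4 was produced, so the sketch below assumes uniformly sampled thrust data and a small tolerance around the 550 N bound; the function name is hypothetical. The last line simply divides the two worst-case percentages quoted in the text.

```python
def saturation_ratio(thrust_samples, limit=550.0, tol=1e-6):
    """Fraction of samples whose magnitude is at (or beyond) the thrust limit."""
    if not thrust_samples:
        return 0.0
    hits = sum(1 for t in thrust_samples if abs(t) >= limit - tol)
    return hits / len(thrust_samples)

# Toy log: two of four samples sit at the limit.
example_ratio = saturation_ratio([550.0, -550.0, 100.0, 0.0])

# Ratio of the two worst-case figures quoted above (percent of operating time).
reduction_factor = 77.63 / 10.30
```

The quotient is roughly 7.5, which is consistent with the reported reduction of up to seven times under time-varying disturbances.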

Prolonged thrust saturation renders the trajectory tracking control system of the leader ROV unable to accurately track the reference trajectory planned by the DWA algorithm. The feedback controller continuously outputs large-magnitude error correction commands but fails to achieve real-time and effective adjustments to the ROV’s pose. As tracking errors accumulate, the ROV gradually deviates from the preset trajectory planning path, ultimately leading to the failure of the trajectory tracking task.

For abrupt disturbances, simulations were conducted using the random number generator module and pulse generator module in Simulink. The generated amplitude–phase spectrum is shown in Figure 12.

The generated random pulses are transformed and amplified through Formula (26), and the resulting abrupt disturbances are shown in Figure 13.

Figure 14 compares the tracking performance of the multi-ROV formation under DWA-based trajectory planning in the presence of abrupt marine disturbances. With ESO feedforward compensation, only minor transient deviations from the planned trajectory are observed, and the tracking error is corrected rapidly. In contrast, the system without ESO compensation exhibits substantial tracking deviations, which lead to formation distortion, unnecessary path detours, and error amplification due to disturbance accumulation; moreover, the recovery time required to realign the actual trajectory with the planned path is considerably longer. These results confirm that ESO-based feedforward compensation provides effective real-time disturbance estimation and attenuation, substantially enhancing trajectory tracking accuracy under abrupt disturbance conditions.

The tracking error curves of the ROVs under abrupt disturbances, with and without ESO feedforward compensation, are shown in Figure 15.

The comparison of the maximum tracking error of each degree of freedom with and without ESO is shown in Table 5.

As shown in Figure 15 and Table 5, the cascaded MPC–ASMC controller without ESO feedforward compensation fails to provide timely dynamic correction for trajectory deviations and external disturbances under the imposed thrust constraints. Error accumulation consequently leads to large-amplitude tracking overshoot and substantial performance degradation. For both the leader and follower ROVs, the maximum tracking error recorded without ESO compensation is approximately six times greater than that achieved with ESO feedforward compensation, rendering the former scheme impractical for precise ROV tracking applications.

In contrast, the proposed system with ESO feedforward compensation effectively suppresses external disturbances across all operating conditions and achieves the asymptotic convergence of tracking errors via closed-loop regulation. By providing real-time disturbance estimation and feedforward compensation, the ESO enables the cascaded MPC–ASMC controller to rapidly and accurately correct trajectory deviations, mitigate performance degradation induced by thrust constraints, and prevent severe tracking overshoot. Consequently, the overall trajectory tracking accuracy and robustness are substantially enhanced under the combined effects of actuator saturation and complex environmental disturbances.

The maximum thrust outputs of each thruster with and without ESO under three-dimensional abrupt disturbances are shown in Figure 16.

The maximum thrust outputs of each thruster under three-dimensional abrupt disturbances, with and without ESO feedforward compensation, are shown in Table 6.

Since the maximum thrust of every thruster is 550 N, the proportion of the operating time during which each thruster reaches this limit in the tracking control process is calculated, as shown in Table 7.

Actuator thrust saturation constitutes a critical constraint that degrades ROV trajectory tracking performance in unknown underwater environments. For the Falcon ROV employed in this study, the maximum thrust output of an individual thruster is limited to 550 N. As evidenced by the quantitative results in Figure 16 and Table 7, the control schemes with and without ESO feedforward compensation exhibit markedly different thrust saturation characteristics. With ESO compensation, the incidence of thrust saturation is substantially reduced: the most heavily loaded thruster operates at its saturation limit for only 14.30% of the total duration, and the saturation events are brief in duration.

In stark contrast, the system lacking ESO compensation experiences severe and persistent thrust saturation. For the most overloaded thruster, up to 50.96% of the operating time is spent constrained at the 550 N limit. Such prolonged saturation prevents the leader ROV from accurately tracking the DWA-generated reference trajectory. Under these conditions, the feedback controller issues large-amplitude corrective commands but lacks the capacity for effective real-time pose adjustment, leading to progressive tracking error accumulation and, ultimately, mission failure. The underlying mechanism for this improvement is straightforward: the ESO provides real-time observation and feedforward compensation of unknown environmental disturbances, thereby alleviating the control burden on the baseline MPC–ASMC controller and reducing the required thrust amplitudes.

Continuous thrust saturation renders the trajectory tracking control system of the leader ROV unable to accurately track the reference trajectory planned by the DWA algorithm. The feedback controller can only continuously output large-amplitude error correction commands, yet fails to achieve real-time and effective adjustment of the ROV’s pose. With the continuous accumulation of tracking errors, the ROV gradually deviates from the preset trajectory planning path, ultimately leading to the failure of the trajectory tracking task.

5. Conclusions

Based on the scenario of a multi-ROV search for sunken ships in unknown underwater environments, the trajectory planning and tracking control strategies of multiple ROVs are investigated in this paper. When the conventional DWA algorithm is applied to formation search, repeated detection and coverage blind zones are encountered. To address these drawbacks, an improved DWA algorithm embedded with a hierarchical optimization architecture based on dynamic potential fields is proposed, in which formation collaboration constraints are integrated into the classic DWA evaluation framework consisting of heading, obstacle avoidance, and velocity indices. Collaborative optimization between formation safety and search efficiency is thereby realized. For the tracking control problem of multiple ROVs, two extreme disturbance scenarios, time-varying and abrupt, are considered: an extended state observer is adopted for feedforward compensation and, combined with the MPC–ASMC dual closed-loop system, achieves effective disturbance suppression. The simulations conducted in MATLAB/Simulink R2022a show that the proposed planning architecture effectively eliminates the overlapping search areas and blind zones and successfully detects the actual position of the sunken ship. The results also verify that, compared with the traditional control structure, the control system with integrated ESO compensation significantly reduces the trajectory tracking error under both time-varying and abrupt disturbances, the two extreme strong disturbance cases. In future work, the proposed algorithms will be deployed on physical multi-ROV platforms for field verification, the computational efficiency of the trajectory planning module will be further optimized, and adaptive disturbance observation methods will be developed to improve the adaptability to complex underwater disturbance environments.

Author Contributions

Conceptualization, W.G. and C.C.; Methodology, W.G.; Validation, H.L.; Formal analysis, H.L.; Writing—original draft, H.L.; Writing—review & editing, W.G.; Supervision, C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by grants from the National Natural Science Foundation of China under Grant 52271321, 52471336, 52101362, 52501392 and the Creative Activity Plan for the Science and Technology Commission of Shanghai 18DZ2253100.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sun, H.; Wang, L.; Li, D. Development Status and Key Technology Analysis of Deep-Sea Working ROVs. Shipbuild. China 2024, 65, 130–144.
2. Hadi, B.; Khosravi, A.; Sarhadi, P. Deep reinforcement learning for adaptive path planning and control of an autonomous underwater vehicle. Appl. Ocean Res. 2022, 129, 103326.
3. Roh, E.J.; Song, I.S.; Kim, S.; Park, S. Autonomous mission-oriented unmanned underwater vehicle control using directional policy optimization. Ocean Eng. 2025, 320, 120242.
4. Liu, T.; Zhao, J.; Huang, J.; Li, Z. A hybrid RVO-MPPI approach for efficient collision avoidance for multiple autonomous underwater vehicles. Ocean Eng. 2024, 312, 119205.
5. Jian, X.; Zou, T.; Vardy, A.; Bose, N. A Hybrid Path Planning Strategy of Autonomous Underwater Vehicles. In Proceedings of the 2020 IEEE/OES Autonomous Underwater Vehicles Symposium (AUV); IEEE: New York, NY, USA, 2020; pp. 1–6.
6. Wang, R.; Jiang, T.; Bai, G.; Wang, Y.; Wang, S.; Tan, M. Stepwise Cooperative Trajectory Planning for Multiple BUVs Based on Temporal–Spatial Bezier Curves. IEEE Trans. Instrum. Meas. 2023, 72, 8503414.
7. Liang, H.; Yu, J.; Li, H. Appointed-time robust tracking control for uncertain unmanned underwater vehicles with prescribed performance. Ocean Eng. 2025, 322, 120436.
8. Wang, Z.; Xiang, X.; Duan, Y.; Yang, S. Adversarial deep reinforcement learning based robust depth tracking control for underactuated autonomous underwater vehicle. Eng. Appl. Artif. Intell. 2024, 130, 107728.
9. Chen, Y.; Zhang, R.; Zhao, X.; Gao, J. Adaptive fuzzy inverse trajectory tracking control of underactuated underwater vehicle with uncertainties. Ocean Eng. 2016, 121, 123–133.
10. Yan, Y.; Yu, S. Sliding mode tracking control of autonomous underwater vehicles with the effect of quantization. Ocean Eng. 2018, 151, 322–328.
11. Li, W.; Lai, X.; Du, S.; Lu, C.; Wang, Y.; Chen, Z.; Wu, M. A Trajectory Tracking Method using Dynamic Sliding Mode Control with Parameter Optimization for Autonomous Underwater Vehicles. In Proceedings of the 2023 IEEE 6th International Conference on Industrial Cyber-Physical Systems (ICPS); IEEE: New York, NY, USA, 2023; pp. 1–6.
12. Gong, Q.; Zhang, W.; Su, Y.; Yang, H. Guidance and Control of Underwater Hexapod Robot Based on Adaptive Sliding Mode Strategy. J. Bionic Eng. 2025, 22, 118–132.
13. Yang, H.; Yan, Z.; Zhang, W.; Gong, Q.; Zhang, Y.; Zhao, L. Trajectory tracking with external disturbance of bionic underwater robot based on CPG and robust model predictive control. Ocean Eng. 2022, 263, 112215.
14. Zhao, R.; Miao, M.; Lu, J.; Wang, Y.; Li, D. Formation control of multiple underwater robots based on ADMM distributed model predictive control. Ocean Eng. 2022, 257, 111585.
15. Yan, Z.; Yan, J.; Cai, S.; Yu, Y.; Wu, Y. Robust MPC-based trajectory tracking of autonomous underwater vehicles with model uncertainty. Ocean Eng. 2023, 286, 115617.
16. Mei, M.; Zhu, D.; Gan, W.; Jiang, X. Trajectory Tracking of Underwater Robots Based on Model Predictive Control. Control Eng. China 2019, 26, 1917–1924.
17. Namgung, H. Local Route Planning for Collision Avoidance of Maritime Autonomous Surface Ships in Compliance with COLREGs Rules. Sustainability 2022, 14, 198.
18. Namgung, H.; Kim, J.S.; Jang, D.U. Path planning and collision avoidance technologies for maritime autonomous surface ships: A review of COLREGs compliance, algorithmic trends and the navigation-GPT framework. J. Navig. 2026.
19. Zhang, J.; Zhu, Z.; Xue, Y.; Deng, Z.; Qin, H. Local path planning of under-actuated AUV based on VADWA considering dynamic model. Ocean Eng. 2024, 310, 118705.
20. Tao, B.; Kim, J.H. Deep reinforcement learning-based local path planning in dynamic environments for mobile robot. J. King Saud Univ. Comput. Inf. Sci. 2024, 36, 102254.
21. Gan, W.; Zhu, D.; Hu, Z.; Shi, X.; Yang, L.; Chen, Y. Model Predictive Adaptive Constraint Tracking Control for Underwater Vehicles. IEEE Trans. Ind. Electron. 2020, 67, 7829–7840.
22. Zhao, H.; Zhu, D. Model Predictive Sliding Mode Tracking Control Algorithm for UUVs. Control Eng. China 2022, 29, 1195–1203.
23. Wang, C.; Gao, X.; Wang, L. BESO-PPF: A PPF-optimized ship heading controller based on backstepping control and the ESO. Ocean Eng. 2025, 316, 119925.

Figure 1. Multi-ROV cooperative search schematic diagram.

Figure 2. DWA trajectory planning.

Figure 3. DWA velocity sampling and trajectory generation flow chart.

Figure 4. Multi-ROV collaborative search objective function flow chart.

Figure 5. Tracking control system block diagram.

Figure 6. Traditional DWA trajectory planning.

Figure 7. Improved DWA trajectory planning.

Figure 8. Time-varying disturbance applied along each axis.

Figure 9. Comparison of three-dimensional ROV tracking performance with and without ESO feedforward compensation under time-varying disturbances.

Figure 10. Tracking error curves of the ROVs in three dimensions with and without ESO feedforward compensation under time-varying disturbances.

Figure 11. Maximum thrust output of each thruster in three-dimensional space with and without ESO under time-varying disturbances.

Figure 12. Amplitude–phase spectrum of the three-dimensional disturbance.

Figure 13. Abrupt disturbance applied along each axis in three dimensions.

Figure 14. Comparison of ROV tracking performance under abrupt disturbances with and without ESO feedforward compensation.

Figure 15. Tracking error curves of the ROVs under abrupt disturbances with and without ESO feedforward compensation.

Figure 16. Maximum thrust output of each thruster under three-dimensional abrupt disturbances with and without ESO.

Table 1. Falcon hydrodynamic parameters.

Parameter	Value	Unit	Physical Meaning
$X_{\dot{u}}$	281	kg	Added mass in the u direction
$Y_{\dot{v}}$	224	kg	Added mass in the v direction
$Z_{\dot{w}}$	509	kg	Added mass in the w direction
$N_{\dot{r}}$	157	$N \cdot m \cdot s^{2}$	Added moment of inertia in the r direction
$X_{u}$	109.2	$N \cdot s / m$	Linear damping in the u direction
$X_{u u}$	169.5	$N \cdot s^{2} / m^{2}$	Quadratic damping in the u direction
$Y_{v}$	123.9	$N \cdot s / m$	Linear damping in the v direction
$Y_{v v}$	493.1	$N \cdot s^{2} / m^{2}$	Quadratic damping in the v direction
$Z_{w}$	225.9	$N \cdot s / m$	Linear damping in the w direction
$Z_{w w}$	140.7	$N \cdot s^{2} / m^{2}$	Quadratic damping in the w direction
$N_{r}$	225.9	$N \cdot s / m$	Linear damping in the r direction
$N_{r r}$	140.7	$N \cdot s^{2}$	Quadratic damping in the r direction
$I_{z}$	165	$N \cdot m \cdot s^{2}$	Moment of inertia
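
As a rough illustration of how the damping coefficients in Table 1 could enter a simulation, the sketch below evaluates a decoupled damping term of the form (linear + quadratic · |velocity|) · velocity for each degree of freedom. This decoupled form is an assumption that is common in ROV modelling and may differ from the exact expression used in the paper's dynamic model. The names `DAMPING` and `damping_term` are hypothetical, and the coefficients are used exactly as listed in Table 1.

```python
# Illustrative sketch only: per-axis damping built from the Table 1 coefficients.
# ASSUMPTION: decoupled damping d(v) = (D_lin + D_quad * |v|) * v for each axis.
# The paper's own dynamic model may use a different (e.g. coupled) expression.

# (linear coefficient, quadratic coefficient) per degree of freedom, from Table 1.
DAMPING = {
    "u": (109.2, 169.5),  # surge
    "v": (123.9, 493.1),  # sway
    "w": (225.9, 140.7),  # heave
    "r": (225.9, 140.7),  # yaw, values as listed in Table 1
}


def damping_term(axis: str, velocity: float) -> float:
    """Return the damping load on one axis for a given body-frame velocity.

    The result carries the sign of the velocity, so it is the load that the
    thrusters must overcome to sustain that velocity.
    """
    linear, quadratic = DAMPING[axis]
    return (linear + quadratic * abs(velocity)) * velocity


# Print the damping term on every axis at an example velocity of 0.5.
for axis in ("u", "v", "w", "r"):
    print(axis, round(damping_term(axis, 0.5), 3))
```

Because the quadratic term grows with |velocity|, it dominates at higher speeds, which is one reason a fixed thrust limit becomes restrictive when disturbances push the vehicle off its reference.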

Table 2. Maximum tracking error in each degree of freedom with and without ESO in three-dimensional space.

ROV Type	X-Axis (m)		Y-Axis (m)		Z-Axis (m)		$\psi$-Axis (rad)
	No ESO	ESO	No ESO	ESO	No ESO	ESO	No ESO	ESO
Leader ROV	20.1627	1.3074	15.5158	1.8377	0.0745	0.11115	3.0211	1.0646
Follower ROV (A)	3.0734	2.6936	3.3978	3.6524	0.0752	0.11656	1.8217	1.426
Follower ROV (B)	105.88	2.5896	72.47	2.5871	0.52	0.1246	48.61	0.9344

Table 3. Maximum thrust output of each thruster under time-varying disturbances with and without ESO feedforward compensation.

ROV Type	T1 (N)		T2 (N)		T3 (N)		T4 (N)		T5 (N)
	No ESO	ESO	No ESO	ESO	No ESO	ESO	No ESO	ESO	No ESO	ESO
Leader ROV	550.00	550.00	550.00	488.35	550.00	550.00	550.00	405.63	530.84	531.34
Follower ROV (A)	550.00	550.00	495.39	335.02	550.00	550.00	401.95	280.22	550.00	550.00
Follower ROV (B)	550.00	550.00	550.00	533.91	550.00	550.00	550.00	507.53	550.00	550.00

Table 4. Percentage of time during which each thruster's thrust exceeds the limit in tracking control under time-varying disturbances, with and without ESO.

ROV Type	T1 (%)		T2 (%)		T3 (%)		T4 (%)		T5 (%)
	No ESO	ESO	No ESO	ESO	No ESO	ESO	No ESO	ESO	No ESO	ESO
Leader ROV	53.80	5.15	26.21	0	39.97	0.77	34.67	0	0	0
Follower ROV (A)	3.15	4.53	0	0	0.38	0.54	0	0	0.15	0.15
Follower ROV (B)	77.63	10.30	70.95	0	54.65	5.69	45.20	0	0.15	0.15
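
To make the comparison in Table 4 easier to read, the following sketch computes, for each thruster, the ratio of the no-ESO percentage to the ESO percentage. The values are transcribed from Table 4. The names `TABLE4`, `reduction_factor`, and `summarize` are hypothetical, and the ratio is only one possible way to summarize the table, not necessarily the metric used by the authors.

```python
# Illustrative sketch: ratio of no-ESO to ESO limit-exceedance percentages (Table 4).
from typing import Dict, List, Optional, Tuple

# (no ESO %, ESO %) for thrusters T1..T5, transcribed from Table 4.
TABLE4: Dict[str, List[Tuple[float, float]]] = {
    "Leader ROV": [(53.80, 5.15), (26.21, 0.0), (39.97, 0.77), (34.67, 0.0), (0.0, 0.0)],
    "Follower ROV (A)": [(3.15, 4.53), (0.0, 0.0), (0.38, 0.54), (0.0, 0.0), (0.15, 0.15)],
    "Follower ROV (B)": [(77.63, 10.30), (70.95, 0.0), (54.65, 5.69), (45.20, 0.0), (0.15, 0.15)],
}


def reduction_factor(no_eso: float, eso: float) -> Optional[float]:
    """Return no_eso / eso, or None when the ESO value is zero (ratio undefined)."""
    if eso == 0.0:
        return None
    return no_eso / eso


def summarize(table: Dict[str, List[Tuple[float, float]]]) -> Dict[str, List[Optional[float]]]:
    """Compute the reduction factor for every ROV and thruster in the table."""
    return {
        rov: [reduction_factor(no_eso, eso) for no_eso, eso in pairs]
        for rov, pairs in table.items()
    }


# Print one line per ROV; "n/a" marks thrusters whose ESO percentage is zero.
for rov, factors in summarize(TABLE4).items():
    cells = ["n/a" if f is None else f"{f:.2f}" for f in factors]
    print(rov, cells)
```

A factor above 1 means the percentage fell when the ESO was used, and a factor below 1 means it rose, as for T1 of Follower ROV (A), where the value goes from 3.15% to 4.53%. Entries where the ESO value is zero are reported as undefined rather than as an infinite reduction.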

Table 5. Maximum error in each degree of freedom under abrupt disturbances with and without ESO.

ROV Type	X-Axis (m)		Y-Axis (m)		Z-Axis (m)		$\psi$-Axis (rad)
	No ESO	ESO	No ESO	ESO	No ESO	ESO	No ESO	ESO
Leader ROV	30.2112	5.4182	16.9007	8.1867	0.57492	0.59326	5.1039	6.9648
Follower ROV (A)	32.9511	5.5851	25.9012	5.0911	0.5393	0.56843	8.6988	6.9395
Follower ROV (B)	49.6087	9.429	33.8687	8.9981	0.57405	0.58904	4.1802	7.0135

Table 6. Maximum thrust output of each thruster under three-dimensional abrupt disturbances with and without ESO feedforward compensation.

ROV Type	T1 (N)		T2 (N)		T3 (N)		T4 (N)		T5 (N)
	No ESO	ESO	No ESO	ESO	No ESO	ESO	No ESO	ESO	No ESO	ESO
Leader ROV	550.00	550.00	550.00	550.00	550.00	550.00	550.00	550.00	550.00	550.00
Follower ROV (A)	550.00	550.00	550.00	550.00	550.00	550.00	550.00	550.00	550.00	550.00
Follower ROV (B)	550.00	550.00	550.00	550.00	550.00	550.00	550.00	550.00	550.00	550.00

Table 7. Percentage of time during which each thruster's thrust exceeds the limit in tracking control under abrupt disturbances, with and without ESO.

ROV Type	T1 (%)		T2 (%)		T3 (%)		T4 (%)		T5 (%)
	No ESO	ESO	No ESO	ESO	No ESO	ESO	No ESO	ESO	No ESO	ESO
Leader ROV	30.75	4.46	35.51	4.46	30.36	14.30	21.45	4.30	10.22	9.99
Follower ROV (A)	27.59	2.84	25.44	6.76	26.44	12.91	31.44	4.15	10.30	10.15
Follower ROV (B)	50.96	6.38	24.29	4.00	32.05	13.07	11.61	6.92	10.61	10.53
