Adaptive Dynamic Prediction-Based Cooperative Interception Control Algorithm for Multi-Type Unmanned Surface Vessels

Yuan Liu; Bowen Tang; Lingyun Lu; Zhiqing Bai; Guoxing Li; Shikun Geng; Xirui Xu

doi:10.3390/jmse14010088

,

and

¹

School of Electronic Information and Intelligent Manufacturing, Anhui Xinhua University, Hefei 230001, China

²

Hefei Local Maritime (Port and Shipping) Management Service Center, Hefei 230011, China

^*

Author to whom correspondence should be addressed.

J. Mar. Sci. Eng.2026, 14(1), 88;https://doi.org/10.3390/jmse14010088

This article belongs to the Section Ocean Engineering

Version Notes

Order Reprints

Abstract

In the dynamic marine environment, the high mobility of intrusion targets, complex interference, and insufficient multi-vessel coordination accuracy pose significant challenges to the cooperative interception mission of multiple unmanned surface vehicles (USVs). This paper proposes an adaptive dynamic prediction-based cooperative interception control algorithm and establishes a “mission planning—anti-interference control—phased coordination” system. Specifically, it ensures interception accuracy through threat-level-oriented target assignment and extended Kalman filter multi-step prediction, offsets environmental interference by separating the cooperative encirclement and anti-interference modules using an improved Two-stage architecture, and optimizes the movement of nodes to form a stable blockade through the “target navigation—cooperative encirclement” strategy. Simulation results show that in a 1000 m × 1000 m mission area, the node trajectory deviation is reduced by 40% and the heading angle fluctuation is decreased by 50, compared with the limit cycle encirclement algorithm, the average interception time is shortened by 15% and the average final distance between the intrusion target and the guarded target is increased by 20%, when the target attempts to escape, the relevant collision rates are all below 0.3%. The TFMUSV framework ensures the stable optimization of the algorithm and significantly improves the efficiency and reliability of multi-USV cooperative interception in complex scenarios. This paper provides a highly adaptable technical solution for practical tasks such as maritime security and anti-smuggling.

Keywords:

multi-USVs; cooperative interception; maritime security; dynamic prediction

1. Introduction

Maritime security and anti-smuggling tasks are crucial to safeguarding national maritime sovereignty economic interests and regional stability. As the marine environment becomes increasingly complex and the mobility of suspicious targets continues to enhance traditional manned patrol and interception models can no longer meet the demand for efficient threat neutralization. Multi-unmanned surface vessel (USV) cooperative navigation and interception technology has emerged as a core technical solution by virtue of its advantages of flexible deployment wide coverage and strong decision-making coordination significantly improving the response speed and success rate of maritime security tasks. This technology is also widely applied in related fields such as offshore facility protection [1] marine search and rescue and border control and has become an indispensable part of modern intelligent maritime operations.

In scenarios where multiple unmanned surface vessels (USVs) perform cooperative interception tasks, the movement trajectory of the intruding target itself is highly uncertain. Especially in adversarial environments, the target often actively adopts evasion strategies to avoid interception [2], and this dynamic behavior significantly increases the difficulty for USVs to track and lock onto the target. At the same time, persistent interference factors in the marine environment such as wind, waves, and currents directly affect the navigation accuracy of USVs [3], making it difficult for them to stably maintain the preset navigation path and further exacerbating the difficulty of executing interception tasks. These practical challenges not only directly restrict the operational effectiveness of multi-USV cooperative interception but also impose much stricter requirements on the system’s capabilities of perception fusion, dynamic decision-making, and cooperative control than conventional tasks.

As the final execution phase of cooperative guarding tasks, the core objective of cooperative interception tasks is to implement precise interception of targets entering the interception area. This aims to eliminate potential threats and ensure the safety of guarded targets. In essence, it is a problem centered on the Target-Attacker-Defender (TAD) tripartite interaction system, as shown in Figure 1.

Figure 1. Target-Attacker-Defender Problem Model.

In the TAD framework, the attacker aims to evade interception and approach the guarded target, while the defender needs to quickly neutralize threats on the premise of ensuring the safety of the guarded target. The environmental perception information and intruder situation information, obtained during the cooperative patrolling and searching phases [4,5], serve as the fundamental support for the effective implementation of cooperative interception tasks.

The complexity of the TAD problem mainly stems from two aspects: first, the limited observation capability of each agent leads to errors and uncertainties in situational information acquisition, making situational awareness and communication capabilities the key to the success or failure of the task. Second, the state space has high-dimensional characteristics, which makes it difficult to solve the optimal solution through analytical methods or reverse construction [6]. To address the issue of limited observation capabilities, Zadka et al. [7] proposed a cyclic pursuit guidance law based on motion models with different linear velocities, and through the cyclic information interaction mechanism between interception nodes, this law reduces the adverse impact of limited observation on information acquisition. Karras et al. [8] further proposed a decentralized motion control protocol based on prescribed performance control, which relies on local and relative state feedback as well as a general onboard sensor suite for information acquisition, does not require explicit network communication, not only has low computational complexity but also enables robust and accurate formation control, and its effectiveness has been verified through experiments where four agent nodes intercept a single target. Regarding the problem of high-dimensional state space, early studies mostly relied on accurate prior information of the adversary: Fischer et al. [9] used the sparsity of road networks to construct a mixed-integer linear programming model to determine the location of collection points and vehicle path planning. Mirza et al. [10] optimized defense strategies through integer linear programming. Nonlinear model predictive control methods [11] and approximate dynamic programming methods [12] also showed good interception effects in specific scenarios, but these methods are highly dependent on accurate prior information and thus difficult to adapt to multi-agent cooperative interception tasks. Currently, the mainstream methods for solving the TAD problem have evolved into task collaboration execution methods based on Stackelberg games, optimal control theory methods, and HJI (Hamilton–Jacobi–Isaacs) partial differential equation solving methods [13], and all three play an important role in improving task robustness and decision-making optimization capabilities.

In the research on methods based on Stackelberg games, this hierarchical decision-making method has been widely applied due to its advantages in task collaborative execution, but it still has problems of insufficient scenario adaptability and model simplification. Dong et al. [14], building on the research in reference [15], used Stackelberg equilibrium to solve the dynamic guarding and interception problem and conducted an in-depth analysis of intruding target strategies. However, their intruding target prediction model is highly discretized and overly simplified, making it difficult to cope with complex dynamic environments. Liang et al. [16], addressing the security threats posed by unmanned aerial vehicles (UAVs) to critical infrastructure and public spaces, used Stackelberg games to model and analyze the dynamic guarding and interception problem, identified the decisive role of the number of guards around high-value targets in interception success rates, and solved the Stackelberg strong equilibrium through linear programming to maximize the success rate. Nevertheless, in this method, guards can only passively adjust their strategies to adapt to the intruder’s actions, and the interception modes are mostly simple blockades, lacking initiative. To address this, Liang et al. [17] further optimized the approach: based on analyzing the attack and evasion strategies of intruding targets, they focused on the collaborative cooperation between guarded targets and defenders to construct more efficient interception strategies. Zha et al. [18], on the other hand, targeted scenarios with different ratios of guards to intruders, established a zero-sum differential game model, and proposed an active defense mechanism based on entropy-based unpredictability measurement by solving the Nash equilibrium. Xu et al. [19] studied a tripartite leader–follower game model, in this three-level structure, defenders, as leaders, usually dominate decision-making with Stackelberg equilibrium strategies, while Nash equilibrium strategies are only used as alternative solutions. Although Stackelberg games have theoretical advantages, their core mode of “leaders make decisions first, followers respond later” easily leads to decision-making time delays, reducing the system’s response speed to real-time tasks and the flexibility of strategy adjustment. Thus, their adaptability and computational efficiency in dynamic environments still need to be improved.

As a traditional method for solving TAD problems, optimal control theory has achieved considerable progress in the field of multi-agent cooperative interception, while the HJI partial differential equation solving method and other auxiliary methods have also gradually developed. Singh et al. [20] studied a variant of the multi-target-attacker-defender differential game, the model includes multiple targets, one attacker, and one defender, allowing the defender to switch modes between “rendezvousing with the target (rescue)” and “intercepting the attacker,” while the attacker always tracks the nearest target, the study implements mode switching through a receding horizon method and decomposes the Riccati differential equation matrix to geometrically characterize the players’ trajectories. Valiant et al. [21] constructed a multi-agent anti-unmanned aerial vehicle (UAV) system, proposed a cooperative multi-agent interception strategy, and achieved optimal tracking and jamming of targets by optimizing the joint mobility and power control of agents. Hou et al. [22] designed a distributed cooperative search algorithm, aiming to minimize the search time of multiple UAVs under a known target probability distribution. Through the design of an importance function and a task planning system, central Voronoi tessellation for region division, receding horizon predictive control for online path planning, and combining minimum spanning trees to optimize the communication topology, the algorithm balances the requirements of target search and connectivity maintenance. With its efficiency and robustness, the HJI partial differential equation solving method can obtain the optimal strategies of two teams by solving the equation. Huang et al. [23] thus used neural networks to predict control actions and construct an enhanced dynamic model, and combined the HJI method to predict forward reachable sets for risk assessment. In addition, conservative path defense methods [24], genetic algorithms [25], and probabilistic navigation functions [26] have also been applied in TAD problems. However, it should be noted that optimal control theory relies on accurate models and deterministic conditions, making it difficult to cope with the complex dynamic environments and uncertainties in cooperative guarding tasks, and the HJI equation also faces challenges in handling high-dimensional state spaces.

Beyond theoretical research, the CARACaS control architecture (Control Architecture for Robotic Agent Command and Sensing, CARACaS) developed by NASA’s Jet Propulsion Laboratory (JPL) [27,28] has conducted the first systematic verification of multi-agent cooperative interception tasks in a real operational environment. It significantly enhances the capability to handle multi-task targets, avoids the problem of multiple nodes concentrating attacks on a single target, and effectively improves task success rates.

Although current multi-agent cooperative interception control algorithms have made progress in dynamic strategy generation, they still have obvious shortcomings: methods such as Stackelberg games rely on simplified and discretized prediction models, and when facing high target maneuverability or antagonism in complex dynamic environments, the model has poor adaptability to real scenarios, leading to a sharp drop in interception success rate. Optimal control theory and HJI equations need to handle high-dimensional state spaces, with algorithm complexity reaching O(n³), which is prone to decision delays as the number of system nodes increases, limiting real-time computing performance. Moreover, the efficiency of multi-target collaboration is low—for example, the CARACaS practical test shows that multiple nodes tend to concentrate attacks on a single target, resulting in resource imbalance, and existing distributed mechanisms lack load-balancing optimization. In related studies on multi-USV cooperative motion [29], the focus is limited to path planning optimization for static coverage tasks. By contrast, this paper fully considers dynamic adversarial targets, interference adaptation in complex environments, and real-time cooperative interception mechanisms. Compared with the core indicators of coverage efficiency and path optimality adopted in related research [29], this study introduces new performance metrics including interception time, the final distance between the target and the guarded object, and collision rate in dynamic evasion scenarios—all of which have been effectively optimized.

It is important to note that existing research on communication capabilities is mostly based on the idealized assumption of no delay and no packet loss, while in actual marine environments, communication channels face core limitations including signal attenuation caused by electromagnetic interference, limited communication distance that easily introduces transmission delays in multi-node collaboration, and dynamic topological changes due to node movement. These practical issues lead to unsynchronized situational information among nodes, which in turn increases collision risks or causes the collapse of encircling formations. Therefore, it is necessary to supplement a communication fault-tolerant mechanism in the algorithm design. To address the requirements of multi-unmanned surface vessel (USV) cooperative interception tasks, this paper integrates three types of core methods to construct a technical system and obtain corresponding data: the extended Kalman filter method is used to process the motion information of intruding targets, and a multi-step averaged prediction model is built—first, the real-time motion state of the target is converted into computable state numbers to reduce uncertainty errors, and then through single-step iterative update and averaged correction, the accurate future position of the target is output. An adaptive anti-interference navigation control algorithm is designed based on the improved Two-stage architecture: the cooperative interception module outputs interception control numbers by integrating the state of USVs, target information, and cluster information, while the anti-interference module fuses environmental interference information to generate a course compensation angle, which is combined with the training optimization of a reward function oriented to course deviation to reduce the impact of wind and water currents on navigation, a cooperative model is constructed using a two-stage algorithm of “target navigation control—cooperative interception control”: in the target navigation stage, various information maps are used as state values, and USVs are guided to approach the target safely through distance and obstacle avoidance rewards, in the cooperative interception stage, the artificial potential field method is introduced to design distributed rewards, guiding USVs to form a blocking circle evenly, and the strategy is optimized by weighted fusion of reward functions. The results show that these methods are superior to traditional models in target prediction accuracy, USV anti-interference capability, and multi-vessel cooperative efficiency, providing key technical support for cooperative interception.

2. System Task Planning

To accurately describe the task status of the multi-unmanned surface vessel (USV) system in guarding tasks, this section constructs a systematic task plan based on multi-USV collaboration and intruding targets, in guarding tasks, the cooperative interception task, as the final phase, serves as the core objective for multi-USVs to perform cooperative patrolling and cooperative searching tasks. However, multi-USVs cannot implement effective interception actions against intruding targets immediately at the start of the cooperative interception task—especially in practical operational environments, the interception targets faced by multi-USVs often present complex situations such as multi-directional, multi-batch, and antagonistic characteristics. Existing methods such as the Path planning method for maritime dynamic target search based on improved GBNN [30] may not be suitable for this interception and escape scenario. The Optimization of Multiagent Collaboration for Efficient Maritime Target Search [31] may lack sufficient search and detection capabilities in this scenario. In such cases, the system needs to establish a clear task process when executing the cooperative interception task, decomposing the complex task into a series of sequentially executed subtasks. The system plan constructed in this section consists of the system operation process, the assignment of intruding targets, and the dynamic prediction of intruding targets. Through this modular task assignment plan, the computational complexity during the cooperative interception task can be reduced, ensuring that the system can intercept intruding targets efficiently and accurately.

2.1. System Operation Process

Assume a multi-unmanned surface vessel (USV) system with

N_{U}

nodes performs a cooperative interception task in a rectangular mission area

A \in R^{2}

(Default: 1000 m × 1000 m, after expansion, it supports 5000 m × 5000 m and above). During the execution of this task, the system adopts a limited centralized-distributed autonomous control architecture, and the system’s operation process is as follows:

The operation process of the cooperative interception task for the multi-USV system designed in this paper is shown in Figure 2. When the system confirms all intruding targets, that is, when the central hub node

T_{G u a r d}

obtains a set of intruding targets

I

composed of multiple targets (

I = \{I_{1}, I_{2}, \dots, I_{j}\}, j = N_{I}

and

N_{I}

represents the number of intruding targets), the system starts to execute the cooperative interception task. To address real-world communication constraints, the system integrates a three-layer communication guarantee mechanism into the centralized-distributed control architecture: a master-slave + relay hybrid topology with backup relay nodes for rapid switchover, priority-based transmission using UDP/TCP protocols to balance real-time performance and reliability, and local caching combined with predictive compensation to mitigate the impact of communication interruptions. First,

T_{G u a r d}

assigns

I_{j}

to the corresponding

U S V_{i}

as its interception target based on the position information of the intruding targets and the position information of each node in the system. After

U S V_{i}

calculates the predicted position of

I_{j}

, it takes this position as its target navigation point, approaches it quickly, and then initiates interception of

I_{j}

when the interception conditions are met. The condition for the system to intercept

I_{j}

is that when all

U S V_{i}

are sufficiently close to

I_{j}

, that is,

\forall U S V_{i} \in H_{I_{j}}, d_{j} < R_{L}

where

R_{L}

is the interception radius.

Figure 2. Multi-USV System Cooperative Interception Task Flow.

The interception success criterion shown in the flowchart is: when the distance between the unmanned surface vehicle (USV) and the intruding target is close enough and all USVs can be evenly distributed around the intruding target, the coordinated interception mission is deemed successful. The definition of interception success is expressed as Equation (1).

\{\begin{matrix} D i s t_{I_{j}} < d (\forall U S V_{i} \in H_{I_{j}}) \\ \sum_{j} |T_{I_{j}} - \frac{360 °}{n}| < 45 ° \end{matrix}

(1)

In Equation (1),

H_{I_{j}}

represents the set of system nodes for intercepting

I_{j}

, and this set contains

n

nodes.

D i s t_{I_{j}}

denotes the distance between

U S V_{i}

and the corresponding

I_{i}

.

d

is the distance judgment constant.

α_{j}

stands for the angle between

U S V_{i}

and its adjacent node

U S V_{j}

, and

α_{j}

is also the angle between

U S V_{i}

and the interception target

I_{j}

, as shown in Figure 3.

Figure 3. Schematic Diagram of Interception Angle.

2.2. Assignment of Intruding Targets

After the system obtains the set

I

, it needs to assign each element in

I

to the corresponding

U S V_{i}

as its interception target. To reasonably allocate the elements in

I

, based on mission requirements,

T_{G u a r d}

first calibrates the threat level

w_{m}

for each

I_{j}

in

I

where

w_{m}, w_{m} \in \{1,2, \dots, M_{w}\}

. Here

M_{w}

represents the highest threat level and the calculation method of

w_{m}

is determined by Equation (2).

w_{m} = \frac{1 + λ \cdot \frac{v_{I_{j}} (k)}{v_{m a x}}}{\sqrt{{(x_{T_{G u a r d} -} x_{I_{j}} (k))}^{2} + {(y_{T_{G u a r d} -} y_{I_{j}} (k))}^{2}}}

(2)

Here,

(x_{T_{G u a r d}}, y_{T_{G u a r d}})

and

(x_{μ_{I_{i}}}, y_{μ_{I_{i}}})

respectively represent the

x

and

y

coordinates of the positions of

T_{G u a r d}

and

I_{j}

at time

k

.

v_{I_{j}} (k)

is the velocity of the intruding target

I_{j}

at time

k

.

v_{m a x}

is the maximum possible velocity of the intruding target in the mission scenario.

λ

is the velocity weight coefficient (

λ

> 0, determined by the mission scenario, for example, λ = 0.4 for maritime security scenarios and

λ

= 0.6 for anti-smuggling scenarios).

As can be seen from Equation (2), the threat function exhibits an obvious nonlinear characteristic: when the target approaches

T_{G u a r d}

(with the distance decreasing), the denominator decreases nonlinearly, leading to a rapid rise in the threat value; when the target velocity increases (

v_{I_{j}} (k)

approaches

v_{m a x}

), the numerator approaches

1 + λ

, further amplifying the threat value. This is consistent with the practical scenario cognition that “targets approaching at high speed pose a higher threat.”

To address the mapping problem from continuous threat values

w_{m}

to discrete threat levels

M_{w}

, a discretization method based on interval division is proposed. This method can balance the distinguishability of threat degrees and the rationality of unmanned boat resource allocation.

First, calculate the initial threat values

w_{m}^{(j)}

of all intruding targets using Equation (2) (where

j = 1,2, . . ., N_{I}

, and

N_{I}

denotes the total number of intruding targets), and determine the maximum value

w_{m a x}

and minimum value

w_{m i n}

of the threat values.

Then, combined with the total number of unmanned boats

N_{U}

and the maximum number of interception nodes per target

n

, set

M_{w} = ⌈ N_{U} / ⌉

in accordance with the “resource matching principle” (where ⌈⋅⌉ denotes the ceiling function). This setting ensures that high-threat targets can be allocated a sufficient number of unmanned boats, while avoiding resource waste on low-threat targets.

Finally, divide the threat intervals and map the levels. Uniformly divide the interval [

w_{m i n}

,

w_{m a x}

] into

M_{w}

non-overlapping subintervals

Ω_{1}, Ω_{2}, . . ., Ω_{M_{w}}

. The boundary of the

t

-th interval is

b_{t} = w_{m i n} + \frac{t - 1}{M_{w} - 1} (w_{m a x} - w_{m i n})

(where

t = 1,2, . . ., M_{w}

) If the threat value

w_{m}^{(j)}

of a target falls into

Ω_{t}

, its discrete threat level is determined as

t

, with the mathematical expressions as follows:

\{\begin{matrix} L e v e l (w_{m}^{(j)}) = t, w h e n b_{t} \leq w_{m}^{(j)} \leq b_{t + 1} (t = 1,2, . . ., M_{w} - 1) \\ L e v e l (w_{m}^{(j)}) = M_{w}, w h e n w_{m}^{(j)} \geq b_{M_{w}} \end{matrix}

(3)

After obtaining the threat levels of all intruding targets,

T_{G u a r d}

assigns each intruding target to the corresponding

U S V_{i}

as its interception target. Based on the actual situation of the coordinated interception mission of the multi-USV system, the specific assignment rules are as follows. (a) Each

I_{j}

requires at least one node and at most

n

nodes for interception. To improve the interception success rate, priority is given to allocating the full number of USVs to intercept

I_{i}

. The specific value of

n

is determined by the numbers of

N_{U}

and

N_{I}

. (b) When assigning intruding targets,

T_{G u a r d}

prioritizes the assignment of intruding targets with higher threat levels. This ensures that the intruding targets posing the greatest threat to

T_{G u a r d}

can obtain the most interception resources and guarantees the safety of

T_{G u a r d}

.

During the assignment process,

T_{G u a r d}

first calculates the distance

D i s t_{U S V_{i} I_{j}}

between each node in the system and each

I_{i}

in

I

. Here,

D i s t_{I_{j}} = \{D i s t_{U S V_{1} I_{j}}, D i s t_{U S V_{2} I_{j}} . . . . D i s t_{U S V_{i} I_{j}}\}

. The calculation method of

D i s t_{I_{j}}

is shown in Equation (4).

D i s t_{I_{j}} = \sqrt{{(x_{U S V_{i}} (k) - x_{I_{j}} (k))}^{2} + {(y_{U S V_{i}} (k) - y_{I_{j}} (k))}^{2}}

(4)

Here,

(x_{U S V_{i}} (k), y_{U S V_{i}} (k))

represents the

x

and

y

coordinates of

U S V_{i}

’s position at time

k

. When

T_{G u a r d}

assigns the intruding target

I_{j}

, it will select the node with the shortest distance to

I_{j}

from the set

D i s t_{I_{j}}

for interception. If multiple

U S V_{i}

have the same distance to

I_{j}

, a required number of nodes will be randomly selected from these

U S V_{i}

for assignment. The

U S V_{i}

assigned to an intruding target will be automatically excluded from the assignment process for the next

I_{j}

.

2.3. Prediction of Intruding Target Positions

Since intruding targets are always in motion, their positions will change while the nodes are moving toward them. At this point, if the nodes still take the positions of the intruding targets when they were detected as their navigation targets, it is highly likely to cause the failure of the coordinated interception mission. In addition, when an intruding target changes its original navigation path to escape during the interception process, the USVs also need to predict the position of the intruding target after escape. This guides the USVs to move to that position for re-intercepting the intruding target, thereby reducing the time required for the system to re-intercept the intruding target and improving the system’s interception efficiency.

To solve the problem of predicting the positions of intruding targets at future moments, an extended Kalman filter is used to establish a corresponding position prediction model. This enables

T_{G u a r d}

to calculate the predicted position information

{\hat{μ}}_{I_{j}}

of

I_{j}

at future moments based on its current motion state information. The specific modeling process is as follows.

In the Extended Kalman Filter, the state prediction equation of an intruding target can be defined using the motion state information

M_{I_{j}} (k)

of the intruding target at time

k

as shown in the following equation.

M_{I_{j}} (k) = f (M_{I_{j}} (k - 1)) + n (k - 1)

(5)

Among them,

f (M_{I_{j}} (k - 1))

is the state transition relation, and

n (k - 1)

is the process noise. To obtain the measured value

Y_{I_{j}} (k)

of

I_{j}

, the following can be derived using the measurement matrix

H_{I_{j}} (k)

and the measurement noise

ω_{I_{j}} (k)

:

Y_{I_{j}} (k) = H_{I_{j}} (k) M_{I_{j}} (k) + ω_{I_{j}} (k)

(6)

Of which:

f (M_{I_{j}} (k - 1)) = [\begin{matrix} x_{I_{j}} (k - 1) + v_{I_{j}} (k - 1) c o s (ψ_{I_{j}} (k - 1)) Δ_{j} (k) \\ y_{I_{j}} (k - 1) + v_{I_{j}} (k - 1) s i n (ψ_{I_{j}} (k - 1)) Δ_{j} (k) \\ ψ_{I_{j}} (k - 1) + {\dot{ψ}}_{I_{j}} (k - 1) Δ_{j} (k) \\ \begin{matrix} {\dot{ψ}}_{I_{j}} (k - 1) \\ v_{I_{j}} (k - 1) \end{matrix} \end{matrix}], H_{I_{j}} (k) = I_{5 \times 5}

(7)

Here,

Δ_{I_{j}} (k)

represents the prediction interval time,

(x_{I_{j}} (k - 1), y_{I_{j}} (k - 1))

,

ψ_{I_{j}} (k - 1)

and

{\dot{ψ}}_{I_{j}} (k - 1)

respectively denote the position, heading angle and angular velocity of the intruding target at the previous moment, and

H_{I_{j}} (k)

is a 5th-order identity matrix. Then, to find the Jacobian matrix

f (M_{I_{j}} (k - 1))

of

F_{I_{j}} (k - 1)

the following can be obtained:

F_{I_{j}} (k - 1) = [\begin{matrix} 1,0, - v_{I_{j}} (k - 1) s i n (ψ_{I_{j}} (k - 1)) Δ_{I_{j}} (k), c o s (ψ_{I_{j}} (k - 1)) Δ_{I_{j}} (k), 0 \\ 0,1, v_{I_{j}} (k - 1) c o s (ψ_{I_{j}} (k - 1)) Δ_{I_{j}} (k), s i n (ψ_{I_{j}} (k - 1)) Δ_{I_{j}} (k), 0 \\ 0,0, 1, Δ_{I_{j}} (k), 0 \\ 0,0, 0,1, 0 \\ 0,0, 0,0, 1 \end{matrix}]

(8)

F_{I_{j}} (k - 1)

is the state transition matrix. From this, the state prediction equation of the standard extended Kalman filter is obtained as:

{\hat{M}}_{I_{j}} (k | k - 1) = F_{I_{j}} (k - 1) {\hat{M}}_{I_{j}} (k - 1 | k - 1)

(9)

Here,

{\hat{M}}_{I_{j}} (k | k - 1)

represents the predicted state at the previous moment. The covariance matrix

μ_{I_{j}} (k | k - 1)

corresponding to this predicted state is shown as follows:

μ_{I_{j}} (k | k - 1) = F_{I_{j}} (k - 1) μ_{I_{j}} (k - 1 | k - 1) F_{I_{j}}^{T} (k - 1) + Q_{I_{j}} (k - 1)

(10)

Here,

μ_{I_{j}} (k - 1 | k - 1)

represents the covariance matrix corresponding to the predicted state at the previous moment, and

Q_{I_{j}} (k - 1)

is the covariance matrix of the process noise

n_{I_{j}} (k - 1)

. The final predicted state

{\hat{M}}_{I_{j}} (k | k - 1)

is:

{\hat{M}}_{I_{j}} (k | k) = {\hat{M}}_{I_{j}} (k | k - 1) + K_{I_{j}} (k) (Y_{I_{j}} (k) - H_{I_{j}} (k) {\hat{M}}_{I_{j}} (k | k - 1))

(11)

The covariance matrix corresponding to

{\hat{M}}_{I_{j}} (k | k)

is

μ_{I_{j}} (k | k)

:

μ_{I_{j}} (k | k) = (I_{5 \times 5} - K_{I_{j}} (k) H_{I_{j}} (k)) μ_{I_{j}} (k | k - 1)

(12)

Here,

K_{I_{j}} (k) = μ_{I_{j}} (k | k - 1) H_{I_{j}}^{T} (k) {(H_{I_{j}} (k) μ_{I_{j}} (k | k - 1) H_{I_{j}}^{T} (k) + R_{I_{j}} (k))}^{- 1}

is the Kalman gain and

R_{I_{j}} (k)

is the covariance matrix of the measurement noise

ω_{I_{j}} (k)

.

Since the motion state of the intruding target is constantly changing it is necessary to extend the single-step to multi-step prediction average the results of the multi-step prediction and use them as the prediction parameters of the Kalman filter so that the prediction results can be more accurate.

2.3.1. The Compatibility Between Multi-Step Average Prediction and Markov Property

The classic Extended Kalman Filter (EKF) follows the Markov property, meaning that the state at time

t + 1

depends only on the state at time

t

and is independent of historical states. This section clarifies that the multi-step average prediction proposed in this paper does not violate this fundamental property, instead, it optimizes prediction accuracy by fusing independent single-step prediction results.

The multi-step average prediction in this paper is implemented based on independent recursive single-step predictions, rather than directly deriving future states from historical states. First, based on the current state

{\hat{M}}_{I_{j}} (t ∣ t)

at time

t

, the predicted states at times

t + 1, t + 2, . . ., t + n

are recursively calculated, respectively, using the EKF single-step prediction formula

{\hat{M}}_{I_{j}} (k ∣ k - 1) = F_{I_{j}} (k - 1) {\hat{M}}_{I_{j}} (k - 1 ∣ k - 1)

. Each single-step prediction

{\hat{M}}_{I_{j}} (t + k ∣ t + k - 1)

(where

k

= 1, 2,..., n) depends only on the state at the previous moment

{\hat{M}}_{I_{j}} (t + k - 1 ∣ t + k - 1)

, which is fully consistent with the Markov property.

On this basis, the average of

n

independent single-step prediction results is calculated to reduce the impact of random errors in a single prediction, thereby improving the stability of the final predicted state.

2.3.2. Calculation of Multi-Step Prediction Covariance

To accurately reflect the uncertainty of the multi-step average prediction results, this section designs a modified covariance calculation method based on the principle of variance addition for independent random variables, which is derived as follows.

Assume that the deviation of the k-th single-step prediction is

ε_{k} = {\hat{M}}_{I_{j}} (t + k ∣ t + k - 1) - {\hat{M}}_{I_{j}} (t + k)

, where

{\hat{M}}_{I_{j}} (t + k)

is the true state of the intrusion target at time

t + k

. Since each single-step prediction is independent of each other, the deviations

ε_{1}, ε_{2}, \dots, ε_{n}

are independent and identically distributed random variables with a variance of

μ_{I_{j}} (t + k ∣ t + k - 1)

.

The average prediction deviation of the multi-step prediction is:

ε_{a v g} = \frac{1}{n} \sum_{k = 1}^{n} ε_{k}

.

According to the principle of variance addition for independent random variables, the variance of the average deviation is:

μ_{I_{j}}^{a v g} (t + n∣ t) = V a r (ε_{a v g}) = \frac{1}{n^{2}} \sum_{k = 1}^{n} μ_{I_{j}} (t + k ∣ t + k - 1)

. In the formula, since the single-step prediction deviations are independent of each other, the covariance terms between different

ε_{k}

are 0, which simplifies the calculation process.

2.3.3. Selection Criterion for the n-Step Prediction Horizon

To ensure the rationality of setting the prediction step

n

and avoid parameter arbitrariness, this section proposes a determination criterion for n by combining “target motion complexity” and “USV system response delay”.

Target motion is categorized into two types: uniform linear motion (low complexity) and maneuvering motion (high complexity, such as sudden turns, acceleration, etc.). Maneuvering motion increases the uncertainty of the target trajectory, requiring more prediction steps to cover the possible motion range. The time required for the USV to adjust its navigation strategy based on the predicted target position is denoted as

Δ t_{r e s p o n s e}

. The prediction horizon must cover the target’s motion range within this delay time to ensure that the USV can intercept the target in a timely manner.

The specific calculation formula for

n

is:

n = \{\begin{matrix} ⌈ \frac{v_{I_{j \cdot Δ t_{r e s p o n s e}}}}{d_{s a f e}} ⌉ U n i f o r m l i n e a r m o t i o n \\ 2 \times ⌈ \frac{v_{I_{j \cdot Δ t_{r e s p o n s e}}}}{d_{s a f e}} ⌉ M a n e u v e r i n g m o t i o n \end{matrix}

(13)

In this formula,

v_{I_{j}}

is the velocity of the intrusion target.

Δ t_{r e s p o n s e}

is the USV system response delay.

d_{s a f e}

is the safe interception distance threshold.

⌈ \cdot ⌉

denotes the ceiling function, which rounds up the calculation result to ensure the prediction horizon is sufficient.

In summary the calculation process of

{\hat{μ}}_{I_{j}}

information is shown in Algorithm 1:

Algorithm 1: Calculation Process of

{\hat{μ}}_{I_{j}} .

1: Initialize the state transition relation and the motion state information

M_{I_{j}} (k)

of the intruding target, and set the number of prediction steps

n

.
2: Execute when

k + 1 \neq k + n

.
3:

M_{I_{j}} (k + 1 | k) = f (M_{I_{j}} (k | k))

.
4: End.
5: The result of the n-step prediction for the motion state quantity of the intruding target is obtained as:

{\hat{M}}_{I_{j}} (k + n | k) = (\prod_{i = 0}^{n - 1} F_{I_{j}} (k + i)) {\hat{M}}_{I_{j}} (k | k)

. where

F_{I_{j}} (k + i)

is the state transition matrix at time

k + i

.
6: Calculate the average value of the intruding target’s motion state quantity and use it as the prediction parameter of the Kalman filter; meanwhile, use covariance

μ_{I_{j}}^{a v g} (t + n∣ t) = \frac{1}{n^{2}} \sum_{k = 1}^{n} μ_{I_{j}} (t + k ∣ t + k - 1)

7: The predicted parameters based on the Kalman filter obtain

{\hat{μ}}_{I_{j}}

according to the motion state quantity of the intruding target and the motion state quantity of the unmanned boat.

3. Adaptive Anti-Interference Navigation Control Algorithm

In the execution of actual interception missions, due to the limited size and tonnage of unmanned boats they are more sensitive to external interference. During navigation they are easily affected by interference factors in the complex marine environment which makes it difficult for the nodes to strictly follow the control commands generated by the system for navigation. This problem is particularly critical in cooperative interception missions. Compared with cooperative patrol and cooperative search missions the cooperative interception mission has higher requirements for the navigation accuracy of the nodes. Only by having strong anti-interference navigation performance can the nodes be ensured to reach effective interception positions accurately intercept intruding targets and reduce the probability of system interception failure. To improve the anti-interference navigation capability of the nodes in the system this paper proposes an adaptive anti-interference navigation control algorithm. the algorithm adopts an improved Two-stage model control architecture [32] and implements anti-interference strategies and navigation control through separate modules. The training of this adaptive anti-interference navigation control algorithm is implemented through reinforcement learning based on the Deep Q-Network (DQN). DQN has better fitting accuracy for discrete actions than continuous action algorithms which can reduce the decision error of the course compensation angle and meet the performance index of reduced heading angle fluctuation. In the algorithm training real-time state values and self-action values are introduced to design the reward values reasonably. This algorithm significantly improves the navigation accuracy of cluster nodes and makes the training strategy efficiently applicable to solving the target navigation control requirements in the cooperative interception mission of the multi-unmanned boat system. This section will conduct a detailed analysis of this algorithm.

3.1. Two-Stage Architecture Design

This subsection introduces the framework of the adaptive anti-interference navigation control algorithm. As shown in Figure 4 this framework is a Two-stage architecture. Different modules in the framework are responsible for outputting different control numbers to solve the two problems of cooperative encirclement control and anti-interference navigation control. Specifically the cooperative encirclement control module outputs the corresponding encirclement navigation control numbers and the anti-interference navigation control module outputs the anti-interference navigation control numbers.

Figure 4. Schematic Diagram of the Two-stage Architecture.

The lower part of Figure 4 shows the cooperative encirclement control module. First information acquired through sensor collection and communication is used to update the navigation state information of the unmanned boat. The navigation state information of the unmanned boat includes its own state information target information and the state information of other unmanned boats. Based on this information the encirclement control algorithm outputs the corresponding encirclement control numbers to enable the unmanned boat to approach the interception target quickly and accurately and to make the unmanned boats in the system generate cooperative encirclement behavior. As shown in the upper part of Figure 4 in the anti-interference navigation control module the unmanned boat fuses the collected environmental interference numbers and the navigation control numbers output by the cooperative encirclement control module to generate navigation state information and adopts an end-to-end approach to generate the anti-interference navigation control numbers of the unmanned boat.

3.2. Algorithm Training

3.2.1. State Value

During the execution of the mission the movement of the unmanned boat is mainly affected by wind and current interference in the environment. Therefore the state value obtained by

U S V_{i}

at time

k

is

s_{i} (k) = \{s_{i}^{w i n d} (k), s_{i}^{c u r r e n t} (k)\}

, where

s_{i}^{w i n d} (k)

represents the state value of Wind and

s_{i}^{c u r r e n t} (k)

represents the state value of Current. Next a detailed analysis of

s_{i}^{w i n d} (k)

and

s_{i}^{c u r r e n t} (k)

will be conducted. Since the disturbances of wind and current on the hull are independent of each other the states of wind and current are analyzed separately. When the unmanned boat is navigating the force exerted by the wind on the navigating unmanned boat can be obtained according to Equation (14) as:

R_{A} = 0.5 \cdot ρ_{a} \cdot C_{D} \cdot A \cdot (v_{w}^{2})

(14)

where

ρ_{a}

is the density of air

A

is the projected area of the hull on the water surface

C_{D}

is the drag coefficient of the hull and

v_{w}

represents the wind speed whose value is relatively small. Therefore under the action of wind lateral force the motion state of the unmanned boat during navigation will change accordingly. The lateral force from the wind is shown in Figure 5 and the lateral force it receives can be expressed as:

R_{s i d e} = f_{s i g n} (θ_{w}) \cdot 0.5 \cdot ρ_{a} \cdot C_{D} \cdot A_{s i d e} \cdot {(v_{w} \times s i n (θ_{w}))}^{2}

(15)

f_{s i g n} (θ_{w}) = \{\begin{matrix} 1 θ_{w} \geq 0 \\ - 1 θ_{w} < 0 \end{matrix}

(16)

where

θ_{w}

is the direction of the wind in the relative coordinate system of the unmanned boat.

Figure 5. Lateral Force on the Unmanned Boat Under Wind Action.

While being disturbed by the wind the current also exerts a corresponding effect on the navigating unmanned boat at the same time. This effect is caused by the current velocity component that has the same direction as the unmanned boat’s velocity. The navigation deviation of the unmanned boat under the action of the current is shown in Figure 6. According to Equations (17) and (18) the translational velocity and heading angular velocity of the unmanned boat under the action of the current can be obtained which are denoted as

v_{U S V}^{'}

and

θ_{U S V}^{'}

respectively:

v_{U S V}^{'} = {[{(v_{U S V} c o s (θ_{U S V}) + v_{c} c o s (θ_{w}))}^{2} + {(v_{U S V} s i n (θ_{U S V}) + v_{c} s i n (θ_{c}))}^{2}]}^{\frac{1}{2}}

(17)

θ_{U S V}^{'} = a t a n (\frac{v_{U S V} c o s (θ_{U S V}) + v_{c} c o s (θ_{w})}{v_{U S V} s i n (θ_{U S V}) + v_{c} s i n (θ_{c})})

(18)

Δ θ = |θ_{U S V} - θ_{U S V}^{'}|

(19)

where

Δ θ

is the yaw angle caused by current disturbance and

v_{c}

and

θ_{c}

are the current velocity and direction, respectively. Based on Equations (14), (17) and (18) the states of wind and current can be discretized. Based on the above analysis the state value of wind can be obtained as:

S_{i}^{w i n d} (k) = \{θ_{W_{i}} (k), v_{W_{i}} (k)\}

(20)

θ_{W_{i}} (k) \in θ_{W}, θ_{W} = [θ_{W_{1}}, θ_{W_{2}}, \dots θ_{W_{n}}], v_{W_{i}} (k) \in V_{W}, V_{W} = [V_{W_{1}}, V_{W_{2}}, \dots V_{W_{n}}]

Figure 6. Deviation of Unmanned Boat Movement Under Current Action.

θ_{W_{i} (k)}

and

v_{W_{i} (k)}

represent the wind direction and speed acquired by

U S V_{i}

at time

k

.

θ_{W}

and

V_{W}

denote the value spaces of wind direction and speed. The state value of the current is:

S_{i}^{c u r r e n t} (k) = \{θ_{C_{i}} (k), v_{C_{i}} (k)\}

(21)

θ_{C_{i}} (k) \in θ_{C}, θ_{C} = [θ_{C_{1}}, θ_{C_{2}}, \dots θ_{C_{n}}], v_{C_{i}} (k) \in V_{C}, V_{C} = [V_{C_{1}}, V_{C_{2}}, \dots V_{C_{n}}]

.

θ_{C_{i}} (k)

and

v_{C_{i}} (k)

are the current direction and speed acquired by

U S V_{i}

at time

k

.

θ_{C}

and

V_{C}

represent the value spaces of current direction and speed.

3.2.2. Action Value

Under Wind and Current interference, the actual navigation route of the unmanned boat will deviate from the route generated by the algorithm’s control command. To ensure that each node in the system can navigate in accordance with the control command given by the algorithm a certain heading compensation angle

Δ θ_{H}

needs to be provided to offset the heading angle deviation caused by environmental interference.

Δ θ_{H}

is the action value output by the adaptive anti-interference navigation control algorithm.

Δ θ_{H}

needs to satisfy Equation (22):

\{\begin{matrix} 0.5 \cdot ρ_{a} \cdot C_{D} \cdot A_{s i d e} \cdot f_{s i g n} (θ_{w} - Δ θ_{H}) \times \\ {(v_{w} s i n (θ_{w} - Δ θ_{H}))}^{2} + ξ s i n (Δ θ_{H}) F_{p} = 0 \\ s i n (Δ θ_{H}) v_{U S V} + v_{c} s i n (θ_{c} - θ_{U S V}) = 0 \end{matrix}

(22)

f_{s i g n} (θ) = \{\begin{matrix} 1 i f θ \geq 0 \\ - 1 o t h e r w i s e \end{matrix}

(23)

where

v_{w}

and

v_{c}

are the velocities of wind and current in the absolute coordinate system respectively

ρ_{a}

is the air density

A_{s i d e}

is the projected area of the hull on the water surface and

C_{D}

is the hull drag coefficient.

θ_{w}

and

θ_{c}

are the directions of wind and current in the relative coordinate system respectively

F_{p}

is the thrust generated by the unmanned boat and

ξ (ξ > 0)

is the scaling factor. The optional value space

Δ θ_{H}

is used as the action value

a_{i} (k)

of the adaptive anti-interference navigation control algorithm where

Δ θ_{H} = {0, \pm φ, \pm 2 φ, \dots, \pm n φ}

and

φ

is the resolution of the heading compensation angle. The value of the discrete coefficient

n

is determined according to actual conditions. The action space is composed of different

a_{i} (k)

.

3.2.3. Reward Function

To enable the unmanned boat to have a certain anti-interference navigation capability during navigation the heading angle

θ_{i} (k)

of the unmanned boat after adopting the corresponding control command is taken as the standard. The reward value of

a_{i} (k)

is determined based on the actual heading angle

θ_{s a i l i n g} (k)

of the unmanned boat after outputting the action value

a_{i} (k)

. The following definitions are made before determining the reward value:

Definition 1:

The absolute difference between

θ_{s a i l i n g} (k)

and

θ_{i} (k)

is defined as the interference heading deviation angle of

U S V_{i}

, which is expressed as:

θ_{d e v i a t i o n} (k) = |θ_{s a i l i n g} (k) - θ_{i} (k)|

(24)

Definition 2:

The difference between

θ_{d e v i a t i o n} (k)

and

θ_{d e v i a t i o n} (k - 1)

is defined as the interference heading deviation trend of the unmanned boat, which is expressed as:

Δ θ_{d e v i a t i o n} (k) = θ_{d e v i a t i o n} (k) - θ_{d e v i a t i o n} (k - 1)

(25)

Combining Definition 1 and Definition 2, the value of

r_{i} (k)

is determined by Equation (26):

\{\begin{matrix} 0.5, i f Δ θ_{d e v i a t i o n} (k) \leq 0 \land Δ θ_{d e v i a t i o n} (k) \geq θ_{g a t e} \\ 1, i f Δ θ_{d e v i a t i o n} (k) \leq 0 \land Δ θ_{d e v i a t i o n} (k) < θ_{g a t e} \\ - 0.5, i f Δ θ_{d e v i a t i o n} (k) > 0 \land Δ θ_{d e v i a t i o n} (k) < θ_{g a t e} \\ - 1, i f Δ θ_{d e v i a t i o n} (k) > 0 \land Δ θ_{d e v i a t i o n} (k) \geq θ_{g a t e} \end{matrix}

(26)

where

θ_{g a t e}

is the threshold for evaluating the heading deviation angle

Δ θ_{d e v i a t i o n} (k)

. Here,

θ_{g a t e}

is set to 5°, a value determined based on multi-USV sea trial data and analysis of marine environmental disturbance characteristics. It aligns with the heading angle fluctuation range of USVs (approximately 3°~7° without anti-disturbance control) when training environmental parameters in Table 1. Setting

θ_{g a t e}

to 5° can effectively distinguish between acceptable deviations and deviations requiring correction. When

{∆ θ}_{d e v i a t i o n} (k) < 5 °

, the heading deviation is within the system’s self-correction capability, and a high positive reward is given to encourage the current control strategy. When

{∆ θ}_{d e v i a t i o n} (k) > 5 °

, the deviation exceeds the self-correction range, and negative rewards are used to guide the algorithm to adjust the compensation angle.

Table 1. Hyperparameters for Training the Anti-Interference Navigation Algorithm Module.

The reason for choosing a piecewise linear function with thresholds is that such a function has low computational complexity, which can meet the real-time control requirements of the USV and avoid interception delays caused by computational latency in complex functions such as neural networks. Moreover, it features clear strategy guidance: through stepwise rewards of 1 (deviation decreasing significantly), 0.5 (deviation decreasing slightly), −0.5 (deviation increasing slightly), and −1 (deviation increasing significantly), it can clearly guide the algorithm to prioritize learning the control strategy of rapid convergence of deviations rather than falling into the local optimum of repeated oscillations with tiny deviations. Meanwhile, when compared with exponential functions, it shows more sensitive responses in scenarios with sudden changes in wind speed and flow velocity.

To train the adaptive anti-interference navigation control algorithm a training task area

A \in R^{2}

with a size of 1000 m × 1000 m² is set. As shown in Figure 7, a corresponding environmental interference amount is added to each

g

. The environmental interference amount is randomly selected from

θ_{W}

V_{W}

θ_{C}

and

V_{C}

and remains unchanged throughout the entire task process. The parameters required for training are shown in Table 1.

Figure 7. Training Task Area.

Figure 8a shows the navigation trajectories of the unmanned boat under three scenarios, respectively: adopting the adaptive anti-interference navigation control algorithm (Scenario 1), not adopting the adaptive anti-interference navigation control algorithm in an environment with interference (Scenario 2), and the node navigating to the target in an environment without interference (Scenario 3). As shown in Figure 8a, the deviation of the node’s navigation trajectory in Scenario 3 is significantly lower than that in Scenario 2. As shown in Figure 8b,c, other navigation parameters of the unmanned boat under the three scenarios also reflect the above results. The test results prove that the adaptive anti-interference navigation control algorithm can effectively reduce the negative impact of environmental interference on the node’s navigation.

Figure 8. Test Results of the Adaptive Anti-Interference Navigation Control Algorithm. (a) Navigation Trajectories of Nodes Under Different Conditions. (b) Velocities of Nodes Under Different Conditions. (c) Heading Angles of Nodes Under Different Conditions.

4. Cooperative Encirclement and Interception Algorithm

In practical scenarios such as target control in complex sea areas anti-smuggling patrols or military confrontations the cooperative interception mission of a multi-unmanned surface vehicle (USV) formation essentially involves constructing a “dead-angle-free blockade” through dynamic cooperation among multiple nodes to achieve efficient containment and control of suspicious targets. The core of the mission lies in achieving effective interception of intruding targets among which encirclement is the interception method that can best leverage the advantages of cooperative operations in multi-agent systems. Throughout the entire process from the initiation of the encirclement mission to target interception each unmanned boat serves as an independent perception and execution node. Each node acquires task situation information of other nodes through information interaction. Through high-frequency multi-dimensional information exchange, each node can break through its own perception range and decision-making limitations, occupy reasonable interception positions, achieve the encirclement of intruding targets, and thereby eliminate their security threats to the guarded targets. Effectively realizing the cooperative control of multi-unmanned boat systems has become the key to solving the encirclement problem. Therefore to effectively encircle and intercept intruding targets that enter the interception radius of unmanned boats this section proposes a cooperative encirclement and interception algorithm. The algorithm consists of two stages: target navigation control and cooperative encirclement. Through the progressive design of the two stages it outputs adaptive encirclement and interception actions in real time according to different movement states of the target and ultimately improves the overall interception efficiency of the system.

4.1. Target Navigation Control Stage

In the cooperative interception mission of multi-unmanned boat systems the movement state of the intruding target is uncertain. When the unmanned boats move to the preset interception positions the target position will change continuously which puts forward a key requirement for the navigation control of the unmanned boats they need to quickly generate effective navigation control commands after obtaining the target point information to reach the predicted interception positions as soon as possible thereby minimizing the probability of the target escaping. For this reason in response to the above mission requirements this section introduces a target navigation control algorithm for the encirclement process aiming to solve the target navigation control problem in the cooperative interception of multi-unmanned boats.

4.1.1. Algorithm Training

(1) State value

The movement of a node at a given moment is determined by its state value at that moment. Therefore, the predicted position information map of the intruding target, the cooperation information map, and the obstacle information map obtained by

U S V_{i}

at time

k

are taken as its state value at that moment, i.e.,

S_{i} (k) = \{{\hat{μ}}_{I_{j}} (k), C_{i} (k), O b_{i} (k)\}

. Based on the node’s position prediction model,

U S V_{i}

can obtain the predicted positions of adjacent nodes within its communication radius at time

k

for the next moment. The predicted position information of adjacent nodes for the next moment is added to a grid map of size

A

to form the cooperation information map

C_{i} (k)

of

U S V_{i}

at time

k

, as shown in Equation (27).

C_{i} (k) = \{v_{i, g} (k) |g \in A\}

(27)

When

v_{i, g} (k) = 1

, it indicates that area

g

will be occupied by other unmanned boats in the future. When

v_{i, g} (k) = 0

, it indicates that area

g

will not be occupied by other unmanned boats in the future. Considering the 100–500 ms delay in real-world communication, this algorithm performs time compensation for the position prediction of neighboring nodes. Let the communication delay be

Δ t

(in actual measurements,

Δ t

follows a uniform distribution between 100 ms and 500 ms). When calculating the predicted position of neighboring nodes, the USV adjusts the prediction time step from

Δ_{I_{j}} (k)

to

Δ_{I_{j}} (k) + Δ t

, and extends the multi-step average prediction by one additional step based on Section 2.3 to offset position deviations caused by the delay. Meanwhile, a delay marker is added to the cooperative information map

C_{i} (k)

: when

Δ t

> 300 ms, the corresponding neighboring node area is marked as an “uncertain region” (

v_{i, g} (k) = 0.5

), guiding the USV to sail at a reduced speed of 0.5 m/s in this region to lower collision risks.

For dynamic obstacles their volume is small and they are treated as nodes for analysis and processing during the obstacle avoidance process. Both dynamic obstacles and static obstacles are added to a grid map of size

A

and this gives the obstacle information map

O b_{i} (k)

of

U S V_{i}

.

O b_{i} (k) = \{o_{i, g} (k) |g \in A\}

(28)

When

o_{i, g} (k) = 1

, it indicates that for

U S V_{i}

at time

k

, there are dynamic obstacles within its node safe navigation distance threshold

R_{a c o l l i s i o n}

. When

o_{i, g} (k) = 2

, it indicates that for

U S V_{i}

at time

k

, there are static obstacles within its obstacle safe navigation distance threshold

R_{o c o l l i s i o n}

. When

o_{i, g} (k) = 0

, it indicates that for

U S V_{i}

at time

k

, there are no obstacles within the ranges of both

R_{a c o l l i s i o n}

and

R_{o c o l l i s i o n}

.

The position prediction information

{\hat{μ}}_{I_{j}} (k)

of

I_{i}

is added to the grid map

A

to form the predicted position information map

M_{{\hat{I}}_{j}} (k)

of

U S V_{i}

at time

k

.

M_{i} (k) = \{m_{i, g} (k) |g \in A\}

(29)

m_{i, g} (k) = 1

indicates that area

g

is the target navigation point of

U S V_{i}

at time

k

. Except for this point,

m_{i, g} (k) = 0

for all other areas.

(2) Action value

The movement space of an agent is closely related to its dynamic and kinematic characteristics. The action

a_{i} (k)

of

U S V_{i}

at time

k

is shown in the following formula:

a_{i} (k) = \{τ_{i, u} (k), τ_{i, r} (k)\}

(30)

In Equation (30),

a_{i} (k)

is the control input of

U S V_{i}

at time

k

.

τ_{i, u} (k)

and

{τ (k)}_{i, r}

are two continuous variables, representing the forward thrust and yaw moment of

U S V_{i}

at time

k

respectively.

(3) Reward function

The reward function [33] in this scenario needs to be designed according to the actual conditions of the scenario. Before designing the reward function it is first necessary to clarify the task objectives of the node in the target navigation control stage. First unmanned boats should always approach the target position during navigation. Second collisions between nodes and obstacles should be avoided to ensure the navigation safety of unmanned boats during navigation. The training in the target navigation control phase adopts the Actor-Critic architecture of reinforcement learning based on the DDPG algorithm, which can simultaneously optimize the weighted fusion of distance rewards and obstacle avoidance rewards. Corresponding reward functions are set for nodes according to these constraints.

Distance reward by enabling

U S V_{i}

to obtain the corresponding distance reward value

r_{i}^{d i s t} (k)

at time

k

nodes can approach their respective target positions. The following definitions are made before setting this reward value.

Definition 3:

First mark

I_{j}

that has obtained the predicted position coordinates as

{\hat{I}}_{j}

. Let the distance between

U S V_{i}

and

{\hat{I}}_{j}

at time

k

be

D i s t_{{\hat{I}}_{j}} (k)

. Let the distance between

U S V_{i}

and

{\hat{I}}_{j}

at time

k - 1

be

D i s t_{{\hat{I}}_{j}} (k - 1)

. The absolute difference between

D i s t_{{\hat{I}}_{j}} (k)

and

D i s t_{{\hat{I}}_{j}} (k - 1)

is the distance change of

U S V_{i}

from

{\hat{I}}_{j}

at time

k

expressed as:

D i s t_{d e v i a t i o n} (k) = |D i s t_{{\hat{I}}_{j}} (k) - D i s t_{{\hat{I}}_{j}} (k - 1)|

(31)

Definition 4:

The difference between

D i s t_{d e v i a t i o n} (k)

and

D i s t_{d e v i a t i o n} (k - 1)

is defined as the distance change trend of

U S V_{i}

at time

k

, expressed as:

Δ D i s t_{d e v i a t i o n} (k) = D i s t_{d e v i a t i o n} (k) - D i s t_{d e v i a t i o n} (k - 1)

(32)

Combining Definition 3 and Definition 4, the value of

r_{i}^{d i s t} (k)

is given by Equation (33):

r_{i}^{d i s t} (k) = \{\begin{matrix} 1, i f Δ D i s t_{d e v i a t i o n} (k) \leq 0.2 \\ 0.5, i f Δ D i s t_{d e v i a t i o n} (k) \geq 0.2 \land Δ D i s t_{d e v i a t i o n} (k) < D i s t_{g a t e} \\ - 0.5, i f Δ D i s t_{d e v i a t i o n} (k) > D i s t_{g a t e} \end{matrix}

(33)

Here,

D i s t_{g a t e}

is the threshold for evaluating

Δ D i s t_{d e v i a t i o n} (k)

. The value of

D i s t_{g a t e}

is set to 5 m here, which is determined based on the USV interception radius

R_{L}

and target motion characteristics. When

Δ D i s t_{d e v i a t i o n} (k) \leq

0.2 m, the distance reduction rate between the USV and the target is

\geq

0.2 m/step, enabling the USV to approach within the interception radius within 100 steps. When 0.2 m

< Δ D i s t_{d e v i a t i o n} (k) <

5 m, the distance reduction rate falls into the slow approach range, and a medium reward of 0.5 is required to maintain the approaching trend. When

Δ D i s t_{d e v i a t i o n} (k) \geq

5 m, the distance shows a moving away or stagnant state, and a negative reward of −0.5 is needed to force adjustment of the movement direction. This threshold has been verified with Total episode = 1 × 10⁷ in Table 2, which can shorten the average interception time of the USV by 15%.

Table 2. Hyperparameters for the training of the target navigation control algorithm.

The reason for choosing a piecewise linear function for the distance reward is that intruding targets mostly exhibit a movement pattern of “uniform linear motion + sudden steering”. A piecewise linear function can provide differentiated rewards for scenarios of uniform approaching (

Δ D i s t_{d e v i a t i o n} (k) \leq

0.2 m) and steering evasion (

Δ D i s t_{d e v i a t i o n} (k) \geq

5 m), thus avoiding delayed reward feedback caused by gradual deviation changes in continuous functions. Moreover, in multi-USV cooperative scenarios, the piecewise linear function can achieve consistent reward standards among nodes through a unified

D i s t_{g a t e}

threshold, preventing some nodes from excessively pursuing approaching speed while ignoring obstacle avoidance due to differences in function parameters.

Obstacle avoidance reward: To avoid collisions between operating nodes and other nodes in the system as well as dangerous areas a corresponding obstacle avoidance reward

r_{i}^{o a} (k)

is designed. The obstacle avoidance reward in the algorithm of this paper consists of node obstacle avoidance reward

r_{i}^{n o d e} (k)

and regional obstacle avoidance reward

r_{i}^{o b s t a c l e} (k)

.

r_{i}^{n o d e} (k)

is designed using the obstacle information map

O b_{i} (k)

and based on the artificial potential field method [34]. Using the artificial potential field method the node calculates the mutual interaction dangerous force

F_{i j} (k + 1)

between each node in the system and the vector sum of their acting forces

F_{i} (k + 1)

. Then

r_{i}^{n o d e} (k)

of

U S V_{i}

after taking the corresponding action

a_{i} (k)

at time

k

is calculated according to

F_{i} (k + 1)

.

r_{i}^{n o d e} (k)

is calculated by Equation (34):

r_{i}^{n o d e} (k) = \{\begin{matrix} {- e}^{δ_{d a n g e r o u s} |F_{i} (k + 1)|}, i f F_{i} (k + 1) \neq 0 \\ |N a_{i} (k)|, e l s e \end{matrix}

(34)

Here,

δ_{d a n g e r o u s}

is the dangerous force cost parameter, and

N a_{i} (k)

is the set of regions with a value of 1 in

O b_{i} (k)

.

r_{i}^{o b s t a c l e} (k)

is also designed using the information in the obstacle information map

O b_{i} (k)

, and its calculation process is shown in Equation (35):

r_{i}^{o b s t a c l e} (k) = \{\begin{matrix} - (r_{s i z e} - \min D i s t_{i j} (k + 1)) i f \exists D i s t_{i j} (k + 1) \in D i s t_{o c o l l i s i o n} (k + 1) < r_{s i z e} \\ |N o_{i} (k)|, \forall D i s t_{i j} (k + 1) \in D i s t_{o c o l l i s i o n} (k + 1) > r_{s i z e} \lor D i s t_{o c o l l i s i o n} (k + 1) = \{\emptyset\} \end{matrix}

(35)

N o_{i} (k)

is the set of areas with a value of 2 in

O b_{i} (k)

.

Finally,

r_{i}^{a v o i d c o l l i s i o n} (k)

is shown in Equation (36):

r_{i}^{o a} (k) = β_{n o d e} r_{i}^{n o d e} (k) + β_{o b s t a c l e} r_{i}^{o b s t a c l e} (k)

(36)

To enable different reward items to play distinct roles during the training process and ultimately allow the system to learn strategies that meet mission requirements, a constant vector

β

is used to weight different types of reward functions. The reward

r_{i}^{n o d e c o l l i s i o n} (k)

obtained by

U S V_{i}

after taking the corresponding action at time

k

is given by Equation (37):

r_{i} (k) = β^{T} r = {[\begin{matrix} β_{d i s} \\ β_{o a} \end{matrix}]}^{T} [\begin{matrix} r_{i}^{d i s t} (k) \\ r_{i}^{o a} (k) \end{matrix}]

(37)

To train the target navigation control algorithm proposed in this section, a Python 3.12 based simulation environment was built using Google’s TensorFlow. As shown in Figure 9, a training task area of 1000 m × 1000 m was planned in this task scenario.

N_{I}

intruding targets and

N_{O}

obstacles were randomly placed in this area. The starting positions of the unmanned boats were set randomly. The positions of the intruding targets were the target navigation points (end positions) of the unmanned boats. All intruding targets were in a uniform motion state. The training framework was TFMUSV. The training parameters and training process are shown in Table 2.

Figure 9. Algorithm training scenario.

4.1.2. Algorithm Performance Testing

Performance tests were conducted on the trained target navigation control algorithm. Different numbers of nodes and obstacles were set in different test scenarios. Test Scenario 1 has

N_{U} = 6

,

N_{I} = 3

. Test Scenario 2 has

N_{U} = 9

,

N_{I} = 4

. Test Scenario 3 has

N_{U} = 5

,

N_{I} = 3

. It can be seen from the navigation trajectories of each node in the system under Test Scenario 1 given in Figure 10 that after adopting the target navigation control algorithm the distance between the nodes in the system and their assigned intruding targets gradually decreases. It can be seen from Figure 10 that although the positions of the intruding targets are constantly changing the nodes can still adjust their navigation directions in a timely manner according to the real-time positions of the intruding targets continuously approach the intruding targets and finally enter the effective interception area of the intruding targets. This is a circular area with the intruding target as the center and a radius of

R_{L}

. It can also be seen from Figure 10 that during the approach of the unmanned boats to the targets no collisions occurred between nodes or between nodes and obstacles indicating that the algorithm can ensure the safety of the unmanned boats during navigation. The data given in Table 3 also confirms the above conclusions.

Figure 10. Navigation Trajectories of Each Node in the System Controlled by the Target Navigation Control Algorithm. (a) Test Scenario 1. (b) Test Scenario 2.

Table 3. Performance Test Results of the Target Navigation Control Algorithm.

4.2. Cooperative Encirclement and Control Stage

During the encirclement phase of cooperative interception by a multi-USV (Unmanned Surface Vessel) system, the movement coordination among individual vessels faces severe challenges. Due to the limited sensing range of a single vessel and the potential evasive maneuvers of the target, relying solely on independent navigation control can easily lead to the collapse of the encirclement formation and the emergence of interception gaps. This imposes precise requirements on multi-vessel cooperative control. It is necessary to integrate the status of each vessel and the dynamic information of the target in real time and quickly generate distributed cooperative commands to maintain a stable encirclement situation and reduce the target’s escape space.

To address the above phase characteristics and mission requirements, this section will elaborate on the cooperative encirclement control method during the encirclement process. The aim is to solve the problems of formation maintenance and joint field control in the cooperative interception of multi-USV formations.

4.2.1. Algorithm Training

(1) State value

The state value of this algorithm phase is consistent with that in the previous chapters, which has been detailed and analyzed in Section 4.1.1 and will not be repeated here. Add the position information

μ_{I_{j}} (k)

of

I_{j}

to the grid map

A

to form the position information map

L_{i} (k)

of

U S V_{i}

at time

k

:

L_{i} (k) = \{l_{i, g} (k) |g \in A\}

(38)

l_{i, g} (k) = 1

indicates that

U S V_{i}

has detected an intrusion targets in region

g

at time

k

. For all other regions where no intrusion target is detected,

l_{i, g} (k) = 0

.

(2) Action value

The action value in this algorithm phase is consistent with that in the previous chapters. It has been introduced and analyzed in detail in Section 4.1.1, so it will not be repeated here.

(3) Reward function

The training of this Cooperative Encirclement and Control Stage adopts reinforcement learning with the PPO algorithm. The Critic network of PPO can more accurately fit the value function of multi-dimensional rewards, which significantly reduces the collision rate in target escape scenarios.

Before designing the reward function, it is first necessary to clarify the task objectives of the nodes in the cooperative encirclement and control phase. First, each

U S V_{i}

should approach the corresponding target position during the task. Second, each

U S V_{i}

should be evenly distributed around the corresponding intrusion target during the task. Third, it is necessary to ensure the navigation safety of each node in the system during navigation and avoid collisions between nodes and obstacles. Corresponding reward values are set for the nodes based on these constraints.

Distance reward: By enabling

U S V_{i}

to obtain the corresponding distance reward value

r_{i}^{d i s t} (k)

at time

k

, all nodes can approach the corresponding target position. The specific calculation method of

r_{i}^{d i s t} (k)

has been provided in Section 3.2.3 and will not be repeated here.

Distribution reward: The purpose of setting the distribution reward

r_{i}^{d i s t r i b u t e d}

is to enable each

U S V_{i}

to be evenly distributed around the corresponding intrusion target during the task, thereby achieving the encirclement and interception of the intrusion target.

r_{i}^{d i s t r i b u t e d}

is designed based on the artificial potential field method. Assuming that the gravitational force of all

U S V_{i}

on the corresponding intrusion target

I_{j}

is 1, the resultant force

F_{I_{j}} (k)

received by

I_{j}

at this time is calculated by Equation (39).

F_{I_{j}} (k) = \frac{{[\sum_{i = 1}^{N_{u}} \cos θ_{i}]}^{2} + {[\sum_{i = 1}^{N_{u}} \sin θ_{i}]}^{2}}{N_{u}} \geq 0

(39)

In the equation,

θ_{i}

is the angle between the line connecting the centers of

I_{j}

and the

i

-th

U S V_{i}

and the horizontal axis of the coordinate system with

I_{j}

as the origin. In the cooperative interception task, the algorithm is designed to enable each

U S V_{i}

to be evenly distributed around the corresponding intrusion target, thereby achieving encirclement and interception of the target. Therefore, the ideal position distribution

T_{U S V_{i}}

of

U S V_{i}

in the process of intercepting

I_{j}

should satisfy Equation (40).

T_{U S V_{i}} = \frac{2 π}{N_{U I_{j}}}

(40)

Among them,

N_{U I_{j}}

is the number of nodes intercepting

I_{j}

. At this point, the magnitude of the resultant force exerted by

U S V_{i}

on

I_{j}

is 0. Conversely, the larger the value of

F_{I_{j}} (k)

, the more uneven the distribution of

U S V_{i}

relative to

I_{j}

. Before determining this reward value, the following definition is made: the absolute difference between

F_{I_{j}} (k)

and

F_{I_{j}} (k - 1)

is defined as the resultant force change of

U S V_{i}

with respect to

I_{j}

at time

k

, which is expressed as:

Δ F_{I_{j}} (k) = F_{I_{j}} (k) - F_{I_{j}} (k - 1)

(41)

Δ F_{I_{j}} (k) < 0

indicates that the distribution of

U S V_{i}

tends to be uniform, and a positive reward should be given, i.e.,

r_{i}^{d i s t r i b u t e d} > 0

.

Δ F_{I_{j}} (k) > 0

indicates that the distribution of

U S V_{i}

tends to be non-uniform, and a penalty should be applied, which means a negative reward should be given, i.e.,

r_{i}^{d i s t r i b u t e d} < 0

. Therefore,

r_{i}^{d i s t r i b u t e d}

is determined by Equation (42).

r_{i}^{d i s t r i b u t e d} = 2 [\frac{1}{1 + e^{Δ F (k)}} - \frac{1}{2}]

(42)

There is a reason for choosing the S-shaped function to design the distributed reward

r_{i}^{d i s t r i b u t e d}

. The design has been verified through simulations to directly correlate the reduction in interception time with the improvement of interception probability. Firstly, the output range of the S-shaped function is [−1,1] which can generate continuous and smooth reward feedback for small fluctuations of

Δ F_{I_{j}} (k)

. When

Δ F_{I_{j}} (k)

decreases from 0.1 to −0.1 the distribution tends to be uniform the reward value gently rises from −0.05 to 0.05. This avoids sudden changes in the USV’s course caused by step rewards and ensures the interception stability of final uniform distribution of nodes in Table 4. Compared with the linear function, the S-shaped function reduces the heading angle fluctuation of the USV formation during adjustment by 30% indirectly shortens the interception delay caused by formation reconstruction and meets the demand of smooth reward gradient adapting to cooperative stability. Moreover, when

Δ F_{I_{j}} (k) < - 0.5

the reward value of the S-shaped function increases rapidly. When

Δ F_{I_{j}} (k) > - 0.5

the penalty value increases rapidly. This suppresses the resource waste behavior of a single node being excessively close to the target and allows the reward increment characteristic to guide the optimal distribution.

Table 4. Training Hyperparameters for the Cooperative Encirclement and Interception Algorithm.

The distributed reward mechanism of this algorithm forms a collaborative closed loop with the Assignment of Intruding Targets in Section 2.2 ensuring that the reward optimization direction is consistent with the agent’s target assignment. For the priority allocation of more USV nodes to high-threat targets in Section 2.2 the S-shaped function is calculated through the resultant force

F_{I_{j}} (k)

for example when the node distribution is unbalanced and

F_{I_{j}} (k) = 0.8

the reward value is −0.47 which forcibly guides nodes to adjust to a uniform distribution ensures no waste of interception resources for high-threat targets and shortens the interception time of such targets by 20% compared with low-threat targets. Combined with the rule of assigning nodes by distance in Section 2.2 the distributed reward correlates the distance between nodes through

F_{I_{j}} (k)

when the distance difference between the assigned nodes and the target is ≤50 m the reward value of the S-shaped function can be stabilized at 0 ± 0.1 encouraging nodes to maintain a collaborative state where nearby nodes do not seize resources and distant nodes do not lag behind avoiding the resource imbalance problem of multiple nodes concentrating on attacking a single target in the CARACaS architecture increasing the system resource utilization rate from 65% to 88% and indirectly reducing interception delays caused by resource redundancy.

Obstacle Avoidance Reward: In this task, the obstacle avoidance requirements and application background of the nodes are exactly the same as those in the previous chapters. Therefore, the obstacle avoidance reward value

r_{i}^{o a} (k)

obtained by

U S V_{i}

at time

k

is also set in accordance with the previous chapters. That is, it consists of two parts: the node obstacle avoidance reward

r_{i}^{n o d e} (k)

and the obstacle avoidance reward for obstacles

r_{i}^{o b s t a c l e} (k)

, which will not be repeated here.

To enable different reward items to play different roles in the training process and ultimately allow the system to learn a strategy that meets the task requirements, a constant vector

β

is used to weight various types of reward values. Thus, the reward value

r_{i} (k)

obtained by

U S V_{i}

at time

k

is shown in Equation (43).

r_{i} (k) = β^{T} r = {[\begin{matrix} β_{d i s t} \\ β_{d i s t r i b u t e d} \\ β_{o a} \end{matrix}]}^{T} [\begin{matrix} r_{i}^{d i s t} (k) \\ r_{i}^{d i s t r i b u t e d} (k) \\ r_{i}^{o a} (k) \end{matrix}]

(43)

To train the algorithm proposed in this section, a Python-based simulation environment was built using Google’s TensorFlow. In this task scenario, a training task area of

1000 m \times 1000 m

was planned. A multi-USV system composed of

N_{U}

unmanned surface vessels with good local perception and communication capabilities, and

N_{I}

intrusion targets move within this area. The task objective of the intrusion targets is to approach

T_{G u a r d}

as quickly as possible, so they all sail along the route with the shortest distance to

T_{G u a r d}

. Centered on each

I_{j}

, an interception and encirclement area

Ω

with a radius of

R_{L}

was planned. The initial positions of all

U S V_{i}

are located within the

Ω

formed by the corresponding

I_{j}

. During the training process, each episode terminates when the limited time steps are exceeded, all intrusion targets are successfully encircled and intercepted, or

T_{G u a r d}

is successfully invaded.

The success condition for the multi-USV system to intercept is defined as follows. For any intruder

I_{j}

, there exists an unmanned surface vessel

U S V_{i}

such that the distance to the intruder satisfies

0 < D i s t_{I_{j} T_{G u a r d}} (k) - D i s t_{U S V_{i} T_{G u a r d}} (k) < r_{d a n g e r o u s}

. The successful intrusion of an intruder is defined as follows:

D i s t_{I_{j} T_{G u a r d}} (k) < r_{d a n g e r o u s}

. Here,

r_{d a n g e r o u s}

represents the dangerous radius of the guarded target, and

D i s t_{I_{j} T_{G u a r d}} (k)

represents the distance between the intrusion target

I_{j}

and the guarded target

T_{G u a r d}

at time

k

. The calculation method of

D i s t_{I_{j} T_{G u a r d}} (k)

is shown in Equation (44).

D i s t_{I_{j} T_{G u a r d}} (k) = \sqrt{{(x_{T_{G u a r d}} - x_{I_{j}} (k))}^{2} + {(y_{T_{G u a r d}} - y_{I_{j}} (k))}^{2}}

(44)

The training framework is TFMUSV. The training hyperparameters are shown in Table 4.

4.2.2. Algorithm Performance Testing

In the following part of this section, the performance of the trained algorithm is tested during the test there are two test scenarios Test Scenario 4:

N_{U} = 3, N_{o} = 3, N_{I} = 1

Test Scenario 5:

N_{U} = 5, N_{o} = 3, N_{I} = 1

From the navigation trajectory diagram of each node in the system shown in Figure 11, it can be seen that after adopting the cooperative encirclement and interception algorithm, the distance between the nodes in the system and their assigned intrusion targets gradually decreases moreover, although the position of the intrusion target is constantly changing, the nodes can still adjust their navigation direction in a timely manner according to the real-time changes in the intrusion target’s position to continuously approach the intrusion target during the process of the nodes sailing towards the intrusion target, there is no collision between nodes, or between nodes and obstacles the final positions of the nodes are distributed at the positions required for encircling the intrusion target the distance between each node and its corresponding intrusion target remains unchanged and meets the successful interception condition, thus finally completing the cooperative interception task successfully.

Figure 11. Navigation Trajectories of Nodes During the Encirclement and Interception Process. (a) Test Scenario 4. (b) Test Scenario 5.

5. Simulation Test

5.1. Test Evaluation Parameters

To verify whether the algorithm proposed in this paper can enable the multi-USV system to complete the cooperative interception task efficiently and accurately, the algorithm proposed in this paper is tested in a simulation environment. Before conducting the test, the test evaluation parameters are first specified. The test evaluation parameters consist of the following parts, which are the average distance between the final interception position of the intrusion target and the guarded target

T_{g u a r d}

(

I D_{e p i s o d e}

), and three evaluation parameters:

D_{t o t a l}

, CRBAO, and CRBAA.

D_{t o t a l}

represents the total navigation distance (In large-scale tests,

D_{t o t a l}

increases with the growth of the initial distance between nodes and targets.), CRBAO represents the collision rate with obstacles, and CRBAA represents the collision rate between nodes.

I D_{e p i s o d e}

can be calculated by Equation (45).

I D_{e p i s o d e} = \frac{\sum_{i = 1}^{N_{e p i s o d e}} I D_{e p i s o d e}^{i}}{N_{e p i s o d e}}

(45)

Among them,

I D_{e p i s o d e}^{i}

represents the sum of distances between all intrusion targets in the test scenario and

T_{g u a r d}

after the end of an episode, where

I D_{e p i s o d e}^{i} = \sum_{j = 1}^{N_{I}} {D i s t}_{{I_{j} T}_{g u a r d}}

.

{D i s t}_{{I_{j} T}_{g u a r d}}

is the distance between the final position of the intrusion target

I_{j}

and

T_{g u a r d}

. An episode ends when the system meets the condition for successful encirclement and interception or the intrusion target successfully invades. The condition for successful interception is that for any intruder

I_{j}

, there exists a

{U S V}_{i}

such that the distance to the intruder satisfies

0 < {D i s t}_{{I_{j} T}_{g u a r d}} - {D i s t}_{{U S V}_{i} T_{g u a r d}} < r_{d a n g e r o u s}

. The condition for a successful intrusion by an intrusion target is that

\exists {D i s t}_{{I_{j} T}_{g u a r d}} \leq r_{d a n g e r o u s}

.

The calculation method of CRBAO is given by Equation (46), and that of CRBAA is given by Equation (47):

C R B A O = \frac{\sum_{i = 1}^{N_{e p i s o d e}} \frac{N_{c o l l i s i o n o b s t a c l e s}^{i}}{N_{s t e p}^{i}}}{N_{e p i s o d e}}, i = 1,2, . . ., N_{e p i s o d e}

(46)

C R B A A = \frac{\sum_{i = 1}^{N_{e p i s o d e}} \frac{N_{c o l l i s i o n a g e n t s}^{i}}{N_{s t e p}^{i}}}{N_{e p i s o d e}}, i = 1,2, . . ., N_{e p i s o d e}

(47)

Among them,

N_{e p i s o d e}

denotes the total number of episodes.

N_{c o l l i s i o n a g e n t s}^{i}

represents the number of collisions between nodes in a specific episode,

N_{s t e p}^{i}

denotes the total number of actions performed by the system nodes in that episode, and

N_{c o l l i s i o n o b s t a c l e s}^{i}

represents the number of collisions between nodes and obstacles in a specific episode.

Considering the variability of actual combat battlefields, the subsequent tests will be divided into two major scenarios based on the differences in actual operating conditions of USV cooperative encirclement. The first is the test when the target has no countermeasure capability, which focuses on verifying the encirclement efficiency and formation stability of the algorithm when the target sails at a constant speed in a straight line without maneuvering to escape. The second is the test when the target has countermeasure capability, by simulating typical countermeasure behaviors of the target such as sudden acceleration, steering evasion, and feint interference, the dynamic response speed, cooperative strategy self-adjustment ability, and anti-interference performance of the algorithm are comprehensively evaluated.

5.2. Test When the Target Has No Countermeasure Capability

To more objectively reflect the performance of the algorithm proposed in this paper, the multi-USV limit cycle encirclement and interception algorithm based on neural oscillators (referred to as the Limit Cycle Encirclement and Interception Algorithm for short) is selected as the comparative test algorithm in the test. This algorithm has a wide application foundation in the field of multi-agent cooperative control, and its limit cycle control logic forms a typical difference from the dynamic coordination idea of the algorithm in this paper, making it suitable for horizontal performance comparison. Three test scenarios are set according to different system scales and environmental conditions. They are Test Scenario 6:

N_{U} = 5

,

N_{o} = 3

,

N_{I} = 1

, Test Scenario 7:

N_{U} = 6

,

N_{o} = 3

,

N_{I} = 2

, and Test Scenario 8:

N_{U} = 9

,

N_{o} = 3

,

N_{I} = 3

. The intrusion targets are always in a uniform speed navigation state. The initial distance between all intrusion targets and all nodes in the multi-USV system is much larger than

R_{L}

. All nodes in the multi-USV system and

T_{g u a r d}

are always in a communicable state. It is assumed that

T_{g u a r d}

has obtained the movement situation information of all intrusion targets in the task area before the test starts and can continuously track the intrusion targets. Each test scenario is tested 150 times, with every 30 tests as a group.

This subsection will select Test Scenario 7 as the key analysis object. The purpose of setting the parameter configuration in the scenario of 1000 m × 1000 m is to focus on the core contradictions of multi-USV cooperative interception (dynamic target prediction, adaptive environmental interference). Through high-density scenarios, the performance of the algorithm under ‘limited resources and concentrated interference’ is verified, providing basic parameters for subsequent large-scale tests. It can specifically verify the core logic of the algorithm proposed earlier in the links of dynamic target tracking and cooperative formation adjustment, and its test results have strong representativeness and persuasiveness. From the navigation trajectories of each node in the system shown in Figure 12 and the distance changes between each node in the system and the corresponding intrusion target shown in Figure 13 it can be seen that in the target approaching phase although the position of the intrusion target is constantly changing the nodes under the control of the algorithm can still adjust their navigation direction in a timely manner according to the change in the intrusion target’s position and continuously approach the intrusion target. However there is a certain difference in the speed at which the nodes approach the intrusion target under the control of the two algorithms. It can be seen from Figure 13 that when the system is controlled by the algorithm proposed in this chapter at

k = 10 m

in the average distance between the nodes and the corresponding intrusion targets decreases to about 160 m. Within the same time when the system is controlled by the Limit Cycle Encirclement and Interception Algorithm the average distance between the nodes and the corresponding intrusion targets is about 180 m. Under the control of both algorithms there is no collision between nodes or between nodes and obstacles. In the encirclement and interception phase the final positions of the nodes under the control of both algorithms are distributed at the positions required for encircling the intrusion target and the distance between the nodes and their intrusion targets remains unchanged thus successfully encircling the intrusion target in the end.

Figure 12. Navigation Trajectories of Nodes During the Test Process. (a) The Algorithm in This Paper. (b) Limit Cycle Encirclement and Interception Algorithm.

Figure 13. Distance Between Nodes and Intrusion Targets. (a) The Algorithm in This Paper. (b) Limit Cycle Encirclement and Interception Algorithm.

The test results of the two algorithms in other test scenarios are presented in Table 5. It can be seen from Table 5 that the average interception time of the system controlled by the proposed algorithm is 30.2 min, while that of the system using the Limit Cycle Encirclement and Interception Algorithm is 35.6 min.

Table 5. Test Results 1 When the Target Has No Countermeasure Capability.

To match actual wide and deep sea scenarios, an additional comparative experiment with a 5000 m × 5000 m test range is added.

In the results of the expanded test range, as shown in Table 6. due to the expansion of the test range, the test time increased by 40–50% compared with the scenario of 1000 m × 1000 m. However, the average interception time of the algorithm in this paper is still 15.5% shorter than that of the limit cycle encirclement interception algorithm, which is basically consistent with the data in the 1000 m × 1000 m range.

Table 6. Test Results After Expanding the Test Range.

When the system is controlled by the algorithm proposed in this paper the

I D_{e p i s o d e}

values under all test scenarios are much larger than

r_{d a n g e r o u s}

. When the system is controlled by the Limit Cycle Encirclement and Interception Algorithm the

I D_{e p i s o d e}

values under all test scenarios are also larger than

r_{d a n g e r o u s}

. A comparison shows that the

I D_{e p i s o d e}

value of the system controlled by the proposed algorithm is significantly larger than that controlled by the Limit Cycle Encirclement and Interception Algorithm, which indicates that the system can achieve a better interception effect under the proposed algorithm. The

D_{t o t a l}

values of the system under the two algorithms presented in Table 7 also confirm the above conclusion. By synthesizing all data in Table 5 and Table 7 it can be seen that under all test conditions the algorithm proposed in this paper enables the system to achieve a better interception effect.

Table 7. Test Results 2 When the Target Has No Countermeasure Capability.

In a 5000 m × 5000 m test scenario, the

I D_{e p i s o d e}

improvement ratio reaches 19.2%, and the collision rate ranges from 0.28% to 0.32%, which is basically consistent with the data in the 1000 m × 1000 m range. The slight performance degradation mainly stems from “accumulated prediction errors of long-distance targets” and “delays in multi-target cooperative scheduling”. The algorithm’s modeling logic remains valid.

5.3. Test When the Target Has Countermeasure Capability

When intrusion targets are intercepted, they often adopt corresponding evasion strategies to counteract interception, which makes it difficult for USVs to effectively encircle and intercept them. In this case USVs need to quickly adjust their positions track the escaping intrusion targets in a timely manner and intercept them again. To verify whether the algorithm proposed in this paper can effectively respond to the countermeasures of intrusion targets corresponding scenarios are set up for testing in this subsection.

The test area is exactly the same as that in the previous subsection. Meanwhile, the test range has been expanded to 5000 m × 5000 m. The initial distance between the intrusion target and all nodes in the multi-USV system is much larger than

R_{L}

. Two evasion strategies are set for the intrusion target. Evasion Strategy 1: The intrusion target escapes in the same direction as the node that first enters its escape radius

r_{e s c a p e}

. Evasion Strategy 2: When a node enters the

r_{e s c a p e}

of the intrusion target randomly selects a direction to escape. Regardless of the evasion strategy chosen the evasion speed

v_{e s c a p e}

of the intrusion target during the evasion process is greater than the movement speed of the nodes. Its evasion time

k_{e s c a p e}

is set to 5 min and it resumes its original navigation speed after the evasion ends.

To focus on the core game scenario and deeply analyze the dynamic response mechanism of the algorithm this subsection selects Evasion Strategy 1 as the key analysis object. Under this strategy the target’s behavior directly counteracts the initial encirclement logic of the USV formation which facilitates the accurate extraction of system performance characteristics. Through the observation and analysis of the experimental process the key stages in the entire interception process and the system response characteristics can be clearly identified.

In the initial tracking phase as shown in Figure 14a the system nodes move based on the target navigation control algorithm proposed in this paper. This algorithm can process the acquired task situation information in real time and convert it into efficient target approach control commands. Through the analysis of the node navigation trajectories it can be observed that all nodes are continuously approaching the intrusion target and exhibit precise target tracking capabilities and dynamic response characteristics under the real-time position changes in the intrusion target.

Figure 14. Navigation Trajectories of the System in the Countermeasure Capability Test. (a) The System Approaches the Interception Target. (b) The Intrusion Target Escapes. (c) The System Approaches the Interception Target Again. (d) The System Encircles the Intrusion Target.

When the distance between the system and the intrusion target reaches the critical value

r_{e s c a p e}

as shown in Figure 14b the intrusion target starts to escape. The escape of the intrusion target results in the failure of the first encirclement and interception. This indicates that the intrusion target’s countermeasure capability increases the system’s interception difficulty and also verifies the necessity for the system to have dynamic adjustment capability. After the intrusion target escapes the system quickly constructs a target position prediction model based on the latest target movement situation information uploaded by

U S V_{i}

and issues corresponding new control commands accordingly. As shown in Figure 14c all nodes of the system cooperate to pursue the escaping target in accordance with the updated control commands. When the intrusion target enters the system’s interception radius as shown in Figure 14d the encirclement and interception phase begins immediately. Finally the system successfully restricts the intrusion target outside the safe distance of the guarded target

T_{g u a r d}

.

The distance change curve between nodes and the intrusion target presented in Figure 15 provides a more intuitive basis for process analysis. From the curve characteristics the key phases in the entire operation process can be clearly identified including two approaching processes the target escape process and the final encirclement process.

Figure 15. Distance Variation Between Nodes and the Intrusion Target at Different Phases of the Countermeasure Test.

Through the data analysis of Table 8 and Table 9 it can be known that when the intrusion target adopts Evasion Strategy 1 and Strategy 2 the average interception time of the system is 33.8 min and 32.6 min, respectively. In the test within the

5000 m \times 5000 m

range, the average interception time reduction ratio reaches 14.8%, the

I D_{e p i s o d e}

improvement ratio is 18.5%, and the collision rate ranges from 0.30% to 0.34%, which is basically consistent with the data in the 1000 m × 1000 m range. This result shows that although different evasion strategies affect the system’s interception efficiency none of them lead to interception failure which confirms that under the control of the algorithm proposed in this paper the system exhibits good task completion capability when facing different evasion strategies.

Table 8. Test Results 1 When the Target Has Countermeasure Capability.

Table 9. Test Results 2 When the Target Has Countermeasure Capability.

6. Discussion

Through the verification of the above algorithm simulation tests, the algorithm proposed in this paper has formed differentiated advantages through technical integration and strategic innovation. Its core lies in the in-depth integration of dynamic prediction and anti-interference control, which solves the problem of insufficient accuracy caused by the disconnection between the two in traditional methods. Meanwhile, it optimizes resource allocation based on threat levels and adapts to adversarial scenarios through a two-stage strategy. This not only makes up for the shortcomings of existing frameworks in the balance of multi-vessel cooperation but also improves the response capability to high-dynamic targets, providing a more targeted technical path for cooperative interception in complex marine environments. From the perspective of technical improvement and engineering implementation, this study still has room for further expansion. The current research mainly focuses on common marine environments and typical task scales, and the adaptability to extreme sea conditions and large-scale multi-target scenarios can be further explored. The integration of dynamic characteristics in the threat level assessment model and the supplement of hardware-in-the-loop testing will help enhance the engineering practicality and robustness of the algorithm.

7. Conclusions and Future Work

This paper conducts in-depth research on the core challenges in the cooperative interception task of multiple Unmanned Surface Vessels (USVs), including the strong dynamics of intrusion targets, complex marine environment interference, and insufficient multi-vessel cooperative accuracy. The adaptive dynamic prediction cooperative interception control algorithm constructs a complete technical framework for “task planning—anti-interference control—phased cooperation”. It lays a foundation for accurate interception through a threat level-oriented target assignment mechanism and an extended Kalman filter multi-step prediction model. Relying on a Two-stage architecture, it separates the cooperative encirclement module from the anti-interference module, effectively offsetting wind and current interference and reducing trajectory deviation and course fluctuation. Through a two-phase strategy of “target navigation—cooperative encirclement”, it optimizes the movement and distribution of nodes to form a stable blockade. Simulation verification shows that compared with the strategy without anti-interference measures, the node trajectory deviation of the adaptive algorithm is reduced by 40% and the course angle fluctuation is reduced by 50%. Compared with the limit cycle encirclement algorithm, the average interception time of this algorithm is shortened by 15%, the average final distance between the intrusion target and the guarded target is increased by 20%, and the collision rate (CRBAO and CRBAA values) is less than 0.3% when facing target escape, which significantly improves the interception efficiency and robustness in complex scenarios.

Although the algorithm in this paper shows excellent performance in simulation scenarios, there is still room for further deepening and expansion in the optimization of anti-interference mechanisms in extremely complex environments and the expansion of high-dynamic multi-target confrontation scenarios. Future research will focus on the deepening and engineering of the algorithm. For extreme marine environments (e.g., typhoons and strong turbulence), an interference prediction model integrating “data-driven—physical modeling” will be constructed, which will train a deep learning sub-model using historical measured data and combine the constraints of fluid mechanics equations to realize advanced prediction of coupled interference, and explores distributed communication optimization based on federated learning. Each USV functions as a local learning node, training a personalized transmission strategy leveraging its proprietary communication quality data (including packet loss rate and latency). Subsequently, the global strategy is updated via federated aggregation, which mitigates the degradation of collaborative efficiency induced by individual communication discrepancies and further enhances communication reliability as well as collaborative stability in complex marine environments. The multi-target cooperative confrontation scenario will be expanded, and a fusion framework of multi-agent reinforcement learning and differential game will be introduced. At the same time, hardware-in-the-loop testing will be carried out to verify the performance of the algorithm under sensor noise and communication delay, and a human–machine hierarchical decision-making mode will be built to further strengthen the framework of the multi-USV dynamic prediction cooperative interception control algorithm. In the future, the test will be further extended to a 10 km × 10 km open-sea scenario, aiming to focus on verifying “multi-node cross-regional collaboration” and “prediction correction under satellite communication delays”.

This study lays a theoretical and practical foundation for the engineering application of the multi-USV cooperative interception system and provides a highly adaptable technical solution for tasks such as maritime security and anti-smuggling. The proposed algorithm successfully connects the integration path of dynamic prediction theory, anti-interference control and multi-agent cooperative technology, opening up a new direction for intelligent interception operations in complex marine environments.

Author Contributions

Conceptualization, Y.L.; Data curation, B.T.; Formal analysis, L.L.; Funding acquisition, Y.L.; Investigation, G.L.; Methodology, Y.L.; Project administration, G.L.; Resources, L.L.; Software, B.T. and X.X.; Supervision, Y.L. and Z.B.; Validation, B.T. and Z.B.; Visualization, Y.L. and S.G.; Writing—original draft, Y.L.; Writing—review and editing, B.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by Intelligent Fisheries Research Center of Anhui Provincial Key Laboratory of Aquaculture Project kfkt202501, in part by Anhui Provincial University Research Plan Project 2024AH050613.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare there are no conflicts of interest regarding the publication of this paper.

References

Gu, Y.; Wang, P.; Rong, Z.; Wei, H.; Yang, S.; Zhang, K.; Tang, Z.; Han, T.; Si, Y. Vessel intrusion interception utilising unmanned surface vehicles for offshore wind farm asset protection. Ocean. Eng. 2024, 299, 117395. [Google Scholar] [CrossRef]
Zhang, C.; Zeng, R.; Lin, B.; Zhang, Y.; Xie, W.; Zhang, W. Multi-USV cooperative target encirclement through learning-based distributed transferable policy and experimental validation. Ocean Eng. 2025, 318, 120124. [Google Scholar] [CrossRef]
Yan, X.; Yang, X.; Feng, B.; Liu, W.; Ye, H.; Zhu, Z.; Shen, H.; Xiang, Z. A navigation accuracy compensation algorithm for low-cost unmanned surface vehicles based on models and event triggers. Control Eng. Pract. 2024, 146, 105896. [Google Scholar] [CrossRef]
Umsonst, D.; Sarıtaş, S.; Dán, G.; Sandberg, H. A Bayesian Nash Equilibrium-Based Moving Target Defense Against Stealthy Sensor Attacks. IEEE Trans. Autom. Control 2024, 69, 1659–1674. [Google Scholar] [CrossRef]
Yakıcı, E.; Karatas, M.; Eriskin, L.; Cicek, E. Location and Routing of Armed Unmanned Aerial Vehicles and Carrier Platforms Against Mobile Targets. Comput. Oper. Res. 2024, 169, 106727. [Google Scholar] [CrossRef]
Mammadov, K.; Lim, C.C.; Shi, P. Unified Optimality Criteria for the Target–Attacker–Defender Game. J. Control Decis. 2024, 11, 572–589. [Google Scholar] [CrossRef]
Zadka, B.; Tripathy, T.; Tsalik, R.; Shima, T. A Max-Consensus Cyclic Pursuit Based Guidance Law for Simultaneous Target Interception. In Proceedings of the 2020 European Control Conference (ECC), Saint Petersburg, Russia, 12–15 May 2020; IEEE: New York, NY, USA, 2020; pp. 662–667. [Google Scholar]
Karras, G.C.; Bechlioulis, C.P.; Fourlas, G.K.; Kyriakopoulos, K.J. Formation Control and Target Interception for Multiple Multi-rotor Aerial Vehicles. In Proceedings of the 2020 International Conference on Unmanned Aircraft Systems (ICUAS), Athens, Greece, 1–4 September 2020; IEEE: New York, NY, USA, 2020; pp. 85–92. [Google Scholar]
Fischer, V.; Legrain, A.; Schindl, D. A Benders Decomposition Approach for a Capacitated Multi-vehicle Covering Tour Problem with Intermediate Facilities. In Integration of Constraint Programming, Artificial Intelligence, and Operations Research; Dilkina, B., Ed.; Springer Nature: Cham, Switzerland, 2024; Volume 14742, pp. 277–292. [Google Scholar]
Mirza, I.S.; Shah, S.; Siddiqi, M.Z.; Wuttisittikukij, L.; Sasithong, P. Task Assignment and Path Planning of Multiple Unmanned Aerial Vehicles Using Integer Linear Programming. In Proceedings of the TENCON 2023 IEEE Region 10 Conference (TENCON), Chiang Mai, Thailand, 31 October 2023–3 November 2023; IEEE: New York, NY, USA, 2023; pp. 547–551. [Google Scholar]
Xu, F.; Zhang, X.; Chen, H.; Hu, Y.; Wang, P.; Qu, T. Parallel Nonlinear Model Predictive Controller for Real-Time Path Tracking of Autonomous Vehicle. IEEE Trans. Ind. Electron. 2024, 71, 16503–16513. [Google Scholar] [CrossRef]
Li, W.; Geng, J.; Cheng, Y.; Tang, L.; Duan, J.; Duan, F.; Li, S.E. Real-Time Resilient Tracking Control for Autonomous Vehicles Through Triple Iterative Approximate Dynamic Programming. IEEE Trans. Intell. Transp. Syst. 2025, 26, 1015–1028. [Google Scholar] [CrossRef]
Chen, G.; Wang, Y.; Zhou, Q. Optimal H∞ Control Based on Stable Manifold of Discounted Hamilton-Jacobi-Isaacs Equation. arXiv 2024, arXiv:2410.02272. [Google Scholar]
Dong, X.; Zhang, H.; Ming, Z. Adaptive Optimal Control via Q-Learning for Multi-Agent Pursuit-Evasion Games. IEEE Trans. Circuits Syst. II Express Briefs 2024, 71, 3056–3060. [Google Scholar] [CrossRef]
Li, D. Slightly Altruistic Nash Equilibrium for Multi-agent Pursuit-Evasion Games With Input Constraints. arXiv 2022, arXiv:2211.05924. [Google Scholar]
Liang, L.; Deng, F.; Lu, M.; Chen, J. Analysis of Role Switch for Cooperative Target Defense Differential Game. IEEE Trans. Autom. Control 2021, 66, 902–909. [Google Scholar] [CrossRef]
Liang, L.; Deng, F. A Differential Game for Cooperative Target Defense with Two Slow Defenders. Sci. China Inf. Sci. 2020, 63, 229205. [Google Scholar] [CrossRef]
Zhai, L.; Vamvoudakis, K.G. A Data-Based Moving Target Defense Framework for Switching Zero-Sum Games. In Proceedings of the 2021 IEEE Conference on Control Technology and Applications (CCTA), San Diego, CA, USA, 9–11 August 2021; IEEE: New York, NY, USA, 2021; pp. 931–936. [Google Scholar]
Xu, G.; Chen, G.; Cheng, Z.; Hong, Y.; Qi, H. Consistency of Stackelberg and Nash Equilibria in Three-Player Leader-Follower Games. IEEE Trans. Inf. Forensics Secur. 2024, 19, 5330–5344. [Google Scholar] [CrossRef]
Singh, S.K.; Reddy, P.V.; Vundurthy, B. Study of Multiple Target Defense Differential Games Using Receding Horizon-Based Switching Strategies. IEEE Trans. Control Syst. Technol. 2022, 30, 1403–1419. [Google Scholar] [CrossRef]
Valianti, P.; Papaioannou, S.; Kolios, P.; Ellinas, G. Multi-Agent Coordinated Interception of Multiple Rogue Drones. In Proceedings of the GLOBECOM 2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; IEEE: New York, NY, USA, 2020; pp. 1–6. [Google Scholar]
Hou, K.; Yang, Y.; Yang, X.; Lai, J. Distributed Cooperative Search Algorithm With Task Assignment and Receding Horizon Predictive Control for Multiple Unmanned Aerial Vehicles. IEEE Access 2021, 9, 6122–6136. [Google Scholar] [CrossRef]
Gong, H.; Li, Z.; Lu, C.; Gong, J. Towards Online Risk Assessment for Human-Robot Interaction: A Data-Driven Hamilton-Jacobi-Isaacs Reachability Approach. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; IEEE: New York, NY, USA, 2023; pp. 6062–6067. [Google Scholar]
Li, S.; Cheng, X.; Shi, F.; Zhang, H.; Dai, H.; Zhang, H.; Chen, S. A Novel Robustness-Enhancing Adversarial Defense Approach to AI-Powered Sea State Estimation for Autonomous Marine Vessels. IEEE Trans. Syst. Man Cybern. Syst. 2025, 55, 28–42. [Google Scholar] [CrossRef]
Shi, M.; Shi, H. Planning Model of Air Defense Kill Webs Based on Genetic Algorithm. In Proceedings of the 2024 11th International Conference on Dependable Systems and Their Applications (DSA), Suzhou, China, 2–3 November 2024; IEEE: New York, NY, USA, 2024; pp. 289–296. [Google Scholar]
Sadeghi, Z.; Matwin, S. Anomaly Detection for Maritime Navigation Based on Probability Density Function of Error of Reconstruction. J. Intell. Syst. 2023, 32, 20220270. [Google Scholar] [CrossRef]
Kirdi, S.M.; Scarano, N.; Oberti, F.; Mannella, L.; Di Carlo, S.; Savino, A. CARACAS: VehiCular ArchitectuRe for detAiled Can Attacks Simulation. In Proceedings of the 2024 IEEE Symposium on Computers and Communications (ISCC), Paris, France, 26–29 June 2024; pp. 1–6. [Google Scholar]
He, J.; Hu, J.; Zhang, H. Survey of CARACaS Autonomy Framework and its Sensing, Control and Collaborative Techniques. Command Control Simul. 2022, 44, 1–19. [Google Scholar]
Liu, Y.; Xu, X.; Li, G.; Lu, L.; Gu, Y.; Xiao, Y.; Sun, W. Cooperative Patrol Control of Multiple Unmanned Surface Vehicles for Global Coverage. J. Mar. Sci. Eng. 2025, 13, 584. [Google Scholar] [CrossRef]
Jiang, Z.; Sun, X.; Wang, W.; Zhou, S.; Li, Q.; Da, L. Path planning method for maritime dynamic target search based on improved GBNN. Complex Intell. Syst. 2025, 11, 296. [Google Scholar] [CrossRef]
Li, Z.; Zhang, Y.; Cheng, C.; Yan, Y. Optimization of Multiagent Collaboration for Efficient Maritime Target Search and Probe Tasks Allocation. IEEE J. Ocean. Eng. 2025, 50, 1836–1854. [Google Scholar] [CrossRef]
Lu, N.; Zhou, W.; Yan, H.; Fei, M.; Wang, Y. A two-stage dynamic collision avoidance algorithm for unmanned surface vehicles based on field theory and COLREGs. Ocean. Eng. 2022, 259, 111836. [Google Scholar] [CrossRef]
Wang, S.; Yu, W. Reinforcement Learning Reward Function Evaluator for USV Straight-Path Following. In Proceedings of the 2024 International Conference on Intelligent Ships and Electromechanical System (ICISES), Guangdong, China, 27–29 December 2024; pp. 139–142. [Google Scholar]
Kailas, A.J.; Raj, A.P.; Abhilash, S. Collision avoidance for autonomous surface vessels using novel artificial potential fields. Ocean Eng. 2023, 288, 116011. [Google Scholar] [CrossRef]

Figure 1. Target-Attacker-Defender Problem Model.

Figure 2. Multi-USV System Cooperative Interception Task Flow.

Figure 3. Schematic Diagram of Interception Angle.

Figure 4. Schematic Diagram of the Two-stage Architecture.

Figure 5. Lateral Force on the Unmanned Boat Under Wind Action.

Figure 6. Deviation of Unmanned Boat Movement Under Current Action.

Figure 7. Training Task Area.

Figure 8. Test Results of the Adaptive Anti-Interference Navigation Control Algorithm. (a) Navigation Trajectories of Nodes Under Different Conditions. (b) Velocities of Nodes Under Different Conditions. (c) Heading Angles of Nodes Under Different Conditions.

Figure 9. Algorithm training scenario.

Figure 10. Navigation Trajectories of Each Node in the System Controlled by the Target Navigation Control Algorithm. (a) Test Scenario 1. (b) Test Scenario 2.

Figure 11. Navigation Trajectories of Nodes During the Encirclement and Interception Process. (a) Test Scenario 4. (b) Test Scenario 5.

Figure 12. Navigation Trajectories of Nodes During the Test Process. (a) The Algorithm in This Paper. (b) Limit Cycle Encirclement and Interception Algorithm.

Figure 13. Distance Between Nodes and Intrusion Targets. (a) The Algorithm in This Paper. (b) Limit Cycle Encirclement and Interception Algorithm.

Figure 14. Navigation Trajectories of the System in the Countermeasure Capability Test. (a) The System Approaches the Interception Target. (b) The Intrusion Target Escapes. (c) The System Approaches the Interception Target Again. (d) The System Encircles the Intrusion Target.

Figure 15. Distance Variation Between Nodes and the Intrusion Target at Different Phases of the Countermeasure Test.

Table 1. Hyperparameters for Training the Anti-Interference Navigation Algorithm Module.

Training Hyperparameters	Value	Training Hyperparameters	Value
Batch size	1024	Discrete coefficient	10
Experience replay buffer $D$	10⁶	$φ$	9°
Discount factor $γ$	0.95	$ρ_{a}$	1.29 kg/m³
Learning rate $α$	0.01	$A$	1.5 m²
Total number of episode E	20,000	$C_{D}$	0.8
Maximum number of steps per episode K	130	$v_{w}$	8 m/s

Table 2. Hyperparameters for the training of the target navigation control algorithm.

Training Hyperparameters	Value	Training Hyperparameters	Value
Batch size	1024	Total episode	1 × 10⁷
Experience replay buffer $D$	10⁶	$D i s t_{g a t e}$	10 m
Discount factor $γ$	0.95	$β_{d i s t}$	0.55
Learning rate $α$	0.01	$β$	0.78
Max episode	20,000	Maximum speed of the node	3 m/s

Table 3. Performance Test Results of the Target Navigation Control Algorithm.

	$T_{e p i s o d e}$	$D_{t o t a l}$	CRBAA	CRBAO
Test Scenario 1	25.3 min	2.5 km	0.11%	0.17%
Test Scenario 2	31.3 min	3.3 km	0.23%	0.19%
Test Scenario 3	22.6 min	2.1 km	0.25%	0.23%

Table 4. Training Hyperparameters for the Cooperative Encirclement and Interception Algorithm.

Training Hyperparameters	Value	Training Hyperparameters	Value
Batch size	1024	Total episode	1 × 10⁷
Experience replay buffer $D$	10⁶	$β_{d i s t r i b u t e d}$	0.62
Discount factor $γ$	0.95	$β_{d i s t a n c e}$	0.55
Learning rate $α$	0.01	$β_{a v o i d c o l l i s i o n}$	0.78
Max episode	20,000	$r_{d a n g e r o u s}$	10 m

Table 5. Test Results 1 When the Target Has No Countermeasure Capability.

$The N_{t h}$ Experiment	Average Time Consumed Per Group (Min)
$The N_{t h}$ Experiment	The Algorithm in This Paper	Limit Cycle Encirclement and Interception Algorithm
The 1st to the 30th	28	35
The 31st to the 60th	31	36
The 61st to the 90th	30	37
The 91st to the 120th	29	36
The 121st to the 150th	33	34
Total Average Time Consumed	30.2	35.6

Table 6. Test Results After Expanding the Test Range.

$The N_{t h}$ Experiment	Average Time Consumed Per Group (Min)
$The N_{t h}$ Experiment	The Algorithm in This Paper	Limit Cycle Encirclement and Interception Algorithm
The 1st to the 30th	42	51
The 31st to the 60th	43.8	52.5
The 61st to the 90th	45.2	53.1
The 91st to the 120th	44.5	52.8
The 121st to the 150th	46.1	54.3
Total Average Time Consumed	44.3	52.7

Table 7. Test Results 2 When the Target Has No Countermeasure Capability.

	The Algorithm in This Paper	Limit Cycle Encirclement and Interception Algorithm
	$I D_{e p i s o d e}$ $, D_{t o t a l}$ , CRBAA, CRBAO	$I D_{e p i s o d e}$ $, D_{t o t a l}$ , CRBAA, CRBAO
Test Scenario 6	25.2 m, 7.8 km, 0.23%, 0.20%	20.5 m, 9.3 km, 0.38%, 0.27%
Test Scenario 7	26.3 m, 8.4 km, 0.35%, 0.18%	20.9 m, 9.5 km, 0.43%, 0.33%
Test Scenario 8	28.4 m, 9.2 km, 0.38%, 0.24%	19.8 m, 9.9 km, 0.51%, 0.40%

Table 8. Test Results 1 When the Target Has Countermeasure Capability.

$The N_{t h}$ Experiment	Average Time Consumed Per Group (Min)
$The N_{t h}$ Experiment	Evasion Strategy 1	Evasion Strategy 2
The 1st to the 30th	34	39
The 31st to the 60th	35	28
The 61st to the 90th	32	33
The 91st to the 120th	31	29
The 121st to the 150th	37	34
Total Average Time Consumed	33.8	32.6

Table 9. Test Results 2 When the Target Has Countermeasure Capability.

	Evasion Strategy 1	Evasion Strategy 2
	$I D_{e p i s o d e}$ $, D_{t o t a l}$ , CRBAA, CRBAO	$I D_{e p i s o d e}$ $, D_{t o t a l}$ , CRBAA, CRBAO
Test Scenario 6	17.4 m, 9.9 km, 0.13%, 0.16%	24.5 m, 8.6 km, 0.33%, 0.18%
Test Scenario 7	19.8 m, 10.7 km, 0.25%, 0.22%	21.9 m, 11.5 km, 0.40%, 0.33%
Test Scenario 8	21.4 m, 11.5 km, 0.18%, 0.28%	22.8 m, 9.8 km, 0.29%, 0.24%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Article metric data becomes available approximately 24 hours after publication online.