Distributed Dynamic Predictive Control for Multi-AUV Target Searching and Hunting in Unknown Environments

Abstract: The research and development of the ocean has been gaining in popularity in recent years, and target searching and hunting in the unknown marine environment has become a pressing problem. To solve this problem, a distributed dynamic predictive control (DDPC) algorithm based on the idea of predictive control is proposed. By predicting the state of the multi-AUV system and making online task optimization decisions, the algorithm obtains the task-environment region information and the input of the AUV state update and then locks the search area for the following moment. Once a moving target is found during the search, the AUVs conduct a distributed hunt based on the theory of potential points, which solves the problem of distributing potential points reasonably during hunting and allows the hunting formation to be formed rapidly. The simulation results show that, compared with other methods, the algorithm exhibits high efficiency and adaptability.


Introduction
It is always a challenging task to explore and develop the enormous, complex, and hazardous marine environment, and an autonomous underwater vehicle (AUV) is the best technical means to deal with the current challenges as it is an underwater device with good concealment, flexibility in underwater movement, economic applicability, and other technical characteristics of high-tech devices [1]. The reconnaissance efficiency of a single AUV is low because of realistic conditions such as limited energy consumption and restricted communication. Therefore, greater multi-AUV collaborative operation is required. In order to accomplish underwater tasks better, the AUV needs to have capabilities of adaptive searching, decision-making and dynamic target hunting. Therefore, how to utilize limited resources rationally and coordinate AUVs to complete target searching and hunting are the most critical problems.
For the problem of target searching by multi-agent systems in an unknown environment, extensive research has been carried out [2]. Liu et al. established the distributed multi-AUV collaborative search system (DMACSS) and proposed that the autonomous collaborative search learning algorithm (ACSLA) be integrated into it. The test results demonstrate that the DMACSS runs stably and that the search accuracy and efficiency of ACSLA outperform other search methods, thus better realizing the cooperation between AUVs and allowing the DMACSS to find the target more accurately and faster [3]. Sun proposed a fuzzy-based bio-inspired neural network approach for multi-AUV target searching, which can effectively plan search paths. A fuzzy algorithm was introduced into the bio-inspired neural network to make the obstacle-avoidance trajectory of the AUV smoother. Simulation results show that the proposed algorithm can control a multi-AUV system to complete multi-target search tasks with higher search efficiency and adaptability [4]. For multiple AUVs, Yao et al. proposed a bidirectional negotiation with biased min-consensus (BN-BMC) algorithm to determine the allocation and prioritization of the sub-regions to be searched and utilized an adaptive oval-spiral coverage (AOSC) strategy to plan the coverage path within each sub-region. The Dubins curve, which satisfies the AUV kinematic constraint, is taken as the transition path between sub-regions [5]. Yue et al. presented a combinatorial option and reinforcement learning algorithm for target search in unknown environments, which uses reinforcement learning to improve the dynamic characteristics so that real-time tasks can be handled [6]. However, this kind of method adapts poorly to an unknown working environment, and both its real-time performance and its target search efficiency are reduced. Luo et al. put forward a bio-inspired neural network algorithm for unknown environments [7]. The weight transfer attenuation process between neurons is constructed in the environment so as to build a real-time map and realize the robot's search in the unknown two-dimensional environment. However, due to the limited sonar detection range, this method can only be applied to target searches in local unknown environments. Cai et al. proposed an improved PSO-based approach for robot cooperative target searching in unknown environments, which used the potential field function as the fitness function of the particle swarm optimization (PSO) algorithm [8]. The unknown areas are divided into different search levels and the collaborative rules are redefined. Dadgar et al. proposed a distributed method based on PSO to overcome robot workspace limitations so that the PSO-based algorithm could achieve the overall optimum under a global mechanism [9]. Saadaoui et al. proposed a local PSO algorithm based on information fusion and sharing for collaborative target search [10]. In addition, a probabilistic map and a deterministic map were established for searching a location based on the map data. Finally, the path was planned by the PSO algorithm for further confirmation of the target. Wu et al. applied a reinforcement learning algorithm to the snake game, using the process of a snake searching for fruit to simulate an unmanned aerial vehicle (UAV) searching for targets in an unknown area [11]. Ivić et al. proposed a search algorithm based on dynamic system traversal theory for the specific case of MH370, combining optimal search theory with traversal theory [12].
For the problem of hunting a dynamic target, Wang et al. came up with a method to let robots form a pursuit team through negotiation, and then the team carried out cooperative pursuit according to the motion multi-target cooperative pursuit algorithm [13]. Wang et al. proposed an optimal capture strategy based on potential points, and the ant colony algorithm was used to realize capture and collision avoidance during the AUV task [14]. Taking into account port security issues, Meng et al. proposed a predictive planning interception (PPI) algorithm for dynamic targets [15]. The underwater vehicle predicts the target position by tracking the target, and the path to the target is planned by artificial potential field method in advance for interception. However, due to the randomness and uncontrollability of moving targets, it is often difficult to achieve advanced interception. Cao et al. used polynomial fitting to dynamically update the sampling points to find the navigation rules of the dynamic target, and then the reinforcement learning algorithm was used to find the shortest path to capture the target [16].
In conclusion, in view of the various interference factors in the unknown underwater environment and the limitations of its own search capacity and energy expenditure, the distributed dynamic predictive control (DDPC) search method is proposed. For the problem of hunting dynamic targets, a dynamic distribution method is proposed to make a reasonable distribution of potential points around the target, which allows the AUV to improve the accuracy and the rapidity of hunting.
The rest of this article is organized as follows: Section 2 presents the model establishment; Section 3 presents the process of searching and hunting; Section 4 describes the DDPC algorithm; Section 5 gives the details of the simulation setup and results; and Section 6 summarizes the conclusions and future work.

The Environment Model
This paper divides the region into two levels [17]. Information such as the traversal status, search results, and area coverage of the area can be better obtained during the execution of the task by the grid division of the search area. The size of the first-level sub-region is related to the distance the AUV moves over a predicted time step, and the second-level sub-region is consistent with the effective detection range of the AUV sonar. The state information structure contained in different level regions and how it is updated is discussed in detail in Section 3.2.1. As shown in Figure 1, the bright red box represents the first-level sub-region while the dark red box represents the second-level sub-region, and the blue number represents the first-level sub-region number and the black number represents the second-level sub-region number.

The AUV Kinematic Model
According to the definition in reference [18], two reference coordinate systems are adopted, namely the earth-fixed reference coordinate system and the body-fixed reference coordinate system, which include the velocity vector $V = [u, v, w]^T$, the angular velocity vector, and the attitude angle vector $\eta = [\varphi, \theta, \psi]^T$, as shown in Figure 2. The AUV kinematic model is as follows:
In a two-dimensional environment, the kinematic model of the AUV is usually simplified as:
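For reference, a widely used planar form of the kinematics can be written as below. This is a sketch that assumes the standard three-degree-of-freedom horizontal-plane model with position $(x, y)$ in the earth-fixed frame, surge velocity $u$, sway velocity $v$, and yaw angle $\psi$; the yaw-rate symbol $r$ and the exact arrangement are assumptions and may differ from the notation of [18].

```latex
\begin{aligned}
\dot{x}    &= u\cos\psi - v\sin\psi, \\
\dot{y}    &= u\sin\psi + v\cos\psi, \\
\dot{\psi} &= r.
\end{aligned}
```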

Forward-Looking Sonar Model
The mathematical model was established according to the working principle of a real sonar [19]. The detection radius of the sonar is $R$, the horizontal opening angle of detection is $\alpha$, and the opening angle of detection in the depth direction is $\beta$. An array statistic matrix is established according to the opening angle range of the sonar, and the array elements of the matrix are used to judge whether there is a target in the visible range. The model can be simply represented as in Figure 3. The mathematical model between the object and the sonar is then established. In a two-dimensional environment, the targets for which information is available should meet the following requirements:
where $(x_t, y_t, z_t)$ is as follows:
where $(x, y)$ represents the position coordinate of the object and $(x_0, y_0)$ is the coordinate of the sonar under the body coordinate system. Forward-looking sonar is prone to interference by external noise in the process of collecting underwater information, which affects the detected data. Therefore, interference noise is added to the sonar model to simulate the attenuation of the visual field and the obstruction by obstacles, which are specifically described as follows:
where $y_{x-q}$ is the environmental characteristic information detected by the sonar, $L$ is the sonar effective detection range, $h$ is the sonar detection function under noise-free interference, $l_n$ is the distance between the detected target and the sonar at time $n$, and $\zeta$ denotes the nonlinear interference. When $l_n > L$ or the object to be detected is blocked by obstacles, the environmental characteristic information cannot be detected.
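The two-dimensional visibility condition can be illustrated with a minimal sketch. It assumes that a target is visible when it lies within the detection radius and within half of the horizontal opening angle on either side of the sonar heading; occlusion by obstacles and the noise term are deliberately left out, and the function name `in_sonar_view` and its parameters are hypothetical.

```python
import math


def in_sonar_view(sonar_xy, heading, target_xy, radius, open_angle):
    """Return True if the target lies inside the 2-D sonar sector.

    Assumption: the sector is centred on the sonar heading, with range
    `radius` (R in the text) and total horizontal opening `open_angle`
    (alpha in the text), both given in metres and radians respectively.
    """
    dx = target_xy[0] - sonar_xy[0]
    dy = target_xy[1] - sonar_xy[1]
    distance = math.hypot(dx, dy)
    if distance > radius:
        return False          # beyond the effective detection range
    if distance == 0.0:
        return True           # target coincides with the sonar position
    # Bearing of the target relative to the heading, wrapped to [-pi, pi).
    bearing = math.atan2(dy, dx) - heading
    bearing = (bearing + math.pi) % (2.0 * math.pi) - math.pi
    return abs(bearing) <= open_angle / 2.0
```

A target straight ahead and within range is reported as visible, while a target behind the vehicle or beyond the range is not.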

Target Model
It is assumed that all targets are regarded as particles. The model of a static target is shown as:
where $x_{n+1}$ and $\dot{x}_{n+1}$, respectively, represent the coordinate and velocity of the target in the X-axis direction at time $n+1$, and $y_{n+1}$ and $\dot{y}_{n+1}$, respectively, represent the coordinate and velocity in the Y-axis direction at time $n+1$. $\dot{x}_{n+1}$ and $\dot{y}_{n+1}$ are identically 0 when the target is static.
It is assumed that dynamic targets move at the same depth and perform uniform circular motion [20]. The discrete-time equation of dynamic targets can be described as follows:
where $\omega$ and $T$ represent the turning angular velocity and the sampling time, respectively.
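One discrete-time model of uniform circular motion can be sketched as follows. The sketch assumes the standard constant-turn-rate transition on the state $(x, \dot{x}, y, \dot{y})$ with turning angular velocity $\omega$ and sampling time $T$; it is not necessarily the exact form used in [20], and the function name is hypothetical.

```python
import math


def step_turning_target(state, omega, T):
    """Advance a target in uniform circular motion by one sampling period.

    state: (x, vx, y, vy); omega: turning angular velocity [rad/s];
    T: sampling time [s]. Assumes the standard constant-turn-rate model.
    """
    x, vx, y, vy = state
    if abs(omega) < 1e-9:
        # Degenerate case: straight-line motion (also covers a static target).
        return (x + vx * T, vx, y + vy * T, vy)
    s, c = math.sin(omega * T), math.cos(omega * T)
    x_next = x + (s / omega) * vx - ((1.0 - c) / omega) * vy
    y_next = y + ((1.0 - c) / omega) * vx + (s / omega) * vy
    vx_next = c * vx - s * vy      # the velocity vector rotates by omega * T
    vy_next = s * vx + c * vy
    return (x_next, vx_next, y_next, vy_next)
```

The speed is preserved at every step, and a target with zero velocity stays where it is, which matches the static-target case described above.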

Multiple AUV Communication Content
In order to realize multi-AUV cooperative searching and hunting, this paper analyzes the information interaction content of AUV systems. The communication contents are shown in Table 1, which are the AUV state information, target information and sub-region state information. The communication mode is such that each AUV exchanges information in the form of broadcast over a period of time so that each AUV can obtain the information content of the global environment, thus realizing collaborative searching and hunting.

The Searching and Hunting Process
According to the task requirements, the target search problem for the AUVs can be described as follows: $N_T$ static and moving targets, $N_P$ random obstacles, and $N_V$ AUVs are randomly distributed in the mission area. The cooperative control of multiple AUVs is required to search for as many unknown targets as possible with a lower search cost and within a limited time [21].
In the target search process, the AUV marks the location of static targets; for the moving target, the first AUV which discovers moving target is confirmed as an organizer, and then a hunting message is sent to other AUVs according to the distance between the AUV and target. After receiving the message, the AUVs sail around the predetermined hunting position at first, and then narrow down to the required formation to hunt the dynamic target. When the AUV encounters obstacles during the search mission, it switches from the search mode to the obstacle avoidance mode. After successfully avoiding the obstacle, it continues to perform the search task. The search flowchart is shown in Figure 4.
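The mode switching described above can be sketched as a small decision function. The sketch assumes three modes and gives obstacle avoidance priority over hunting; this priority, the `Mode` names, and the function `next_mode` are illustrative assumptions rather than part of the flowchart in Figure 4.

```python
from enum import Enum


class Mode(Enum):
    SEARCH = "search"
    AVOID = "avoid"
    HUNT = "hunt"


def next_mode(obstacle_detected, hunt_requested):
    """Pick the operating mode for the next time step.

    obstacle_detected: True when an obstacle blocks the current path.
    hunt_requested: True when this AUV found a moving target or received
    a hunting message from the organizer.
    Assumption: avoidance takes priority over hunting and searching.
    """
    if obstacle_detected:
        return Mode.AVOID
    if hunt_requested:
        return Mode.HUNT
    return Mode.SEARCH       # default: continue the search task
```

Once the obstacle flag clears and no hunt is requested, the function returns to the search mode, which mirrors the behaviour described in the text.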

The Distributed Dynamic Predictive Control Algorithm
The core of the DDPC algorithm is to predict the change of environment after each AUV obtains some prior information through communication, and then decides its next action based on the prediction. If the AUV can obtain the state of the mission environment area, multi-AUV system, and target for a period of time in the future when executing the target search decision, the entire system better adapts to the unknown underwater environment, which is the original purpose of the DDPC algorithm. Figure 5 shows the task execution process of the AUV system using the DDPC algorithm.
(1) AUV and environment state feedback: The system feeds back the changes in the state information of the current actuator (the AUV) and of the task environment model, and the feedback information is used as the input of the system state prediction;
(2) System state prediction: The state for the next $N$ steps is dynamically predicted from the feedback information, and the predicted state at the current time $n$ is obtained. The predicted state is represented by $X(n) = \{x(n+1|n), x(n+2|n), \cdots, x(n+N|n)\}$;
(3) Online task optimization decision: The algorithm combines distributed dynamic prediction with optimization methods for online decision making, confirming the actuator state input information and the area state information, which are taken as the state inputs;
(4) State updates for AUV and task area: The state of the actuator and the state information of the entire environmental area are updated through the decision input to obtain $M(n)$ and $Y(n)$, respectively, and the AUV system is finally controlled to perform the collaborative target search.

Task Area State
The environment region is divided into $N_x \times M_y$ discrete first-level task sub-regions, and each first-level sub-region contains a state information structure, as follows: $M_{1xy}(n) = \{A_{xy}(n), B_{xy}(n), T_{xy}(n), P_{xy}(n)\}$ (13), where $A_{xy}(n) \in \{0, 1\}$ describes the state of the first-level sub-region $(S_x, S_y)$ allocated by dynamic prediction: $A_{xy}(n) = 0$ indicates that the first-level sub-region has not been allocated by prediction, and $A_{xy}(n) = 1$ indicates that the first-level sub-region has been allocated and is locked by the AUV. When the number of regions that have not been allocated by prediction is less than the number of AUVs, the unlocking action does not proceed. $B_{xy}(n)$ is the traversal state of the sub-region at time $n$, which is defined as follows:
where $d_{xy}(n)$ denotes the distance between the AUV and the allocated sub-region, $r_{xy} = \alpha \cdot \min(L_{S_x}, L_{S_y})$ with $\alpha \in (0, 1)$ is the distance measure of the traversal degree of the sub-region, $L_{S_x}$ and $L_{S_y}$ are the length and width of the region, and $\alpha$ is a dynamic regulator.
$T_{xy}(n)$ represents the status value of the current first-level sub-region effectively traversed by the AUV, which is defined as:
$P_{xy}(n) \in [0, 1]$ represents the degree of certainty of the target presence information in the current first-level sub-region, with $P_{xy}(n) = P_{xy}(l_n)$. Since the degree of certainty of the target existence information is obtained by observing the environment with sonar, the AUV is required to continuously conduct the search in the allocated area to update the target existence probability in the area. The update equation is as follows:
where $\tau \in [0, 1]$ is the dynamic coefficient of the certainty degree of the target existence and $q_n(n)$ is a binary variable: $q_n(n) = 0$ represents that the target is found, and $q_n(n) = 1$ represents the absence of detection. The first-level dynamic predictive search state information set of the AUV is defined as:
For the second-level sub-regions, $W_{ij}(n) \in \{0, 1\}$ represents the state information of the second-level sub-region traversed by the AUV at time $n$: $W_{ij}(n) = 0$ represents a region that has not been traversed by the AUV, and $W_{ij}(n) = 1$ represents a region that has been traversed, so that no other AUV needs to traverse it again. $Z_{ij}(n) \in \{0, 0.5, 1\}$ indicates the three states of the second-level sub-region at time $n$: not locked, locked but not arrived, and locked and arrived. $Z_{ij}(n)$ is expressed as follows:
where $H_{ij}(n)$ is the effective traversal state of the second-level sub-region at time $n$, which is defined as:
where $d_{ij}(n)$ represents the distance from the AUV to the center of the predicted allocation sub-region, the distance measure of the traversal degree of the second-level sub-region is defined analogously to $r_{xy}$, $D_{b_i}$ and $D_{b_j}$ represent the length and width of the second-level sub-region, and $\beta$ is the dynamic adjustment factor.
Similar to the certainty of target presence information update method in the first-level sub-region, the update equation of the currently locked second-level sub-region is defined as: The parameter setting of Equation (21) is consistent with the parameter of Equation (17). The dynamic prediction search states of all the second-level sub-regions in a first-level sub-region are expressed as: The dynamic predictive search states of the entire task environment region can be expressed as:
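The three-valued lock state of a second-level sub-region can be illustrated with a minimal sketch. It assumes that a sub-region counts as arrived when the distance from the AUV to its center does not exceed a threshold of the form factor times the shorter side, by analogy with the first-level distance measure; the threshold rule, the mapping of the values 0, 0.5, and 1 to the three states in the order listed, and the function names are assumptions.

```python
def traversal_radius(length, width, factor):
    """Distance measure of the traversal degree: factor * min(length, width).

    factor plays the role of the dynamic regulator and must lie in (0, 1).
    """
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must lie in the open interval (0, 1)")
    return factor * min(length, width)


def second_level_state(locked, distance, length, width, beta):
    """Return the lock state of a second-level sub-region.

    0.0: not locked; 0.5: locked but not arrived; 1.0: locked and arrived.
    Assumption: 'arrived' means distance <= traversal_radius(...).
    """
    if not locked:
        return 0.0
    arrived = distance <= traversal_radius(length, width, beta)
    return 1.0 if arrived else 0.5
```

With a 10 m by 20 m sub-region and a factor of 0.5, for example, the sub-region is treated as reached once the AUV is within 5 m of its center.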

The AUV State
Assuming that the AUVs execute the target search task in a horizontal plane at a certain depth, the state of each AUV is denoted as $X_a^i(n) = [ps_i(n), \psi_i(n)]$, where $ps_i(n)$ represents the position of the $i$-th AUV, with position coordinate $(x_a^i(n), y_a^i(n))$, and $\psi_i(n)$ represents the heading angle of the AUV. The optimal decision input of the AUV consists of the sailing speed of the AUV at time $n$ and the heading deflection angle $r_i(n)$ of the AUV; the state equation $Q_i$ of the AUV is thus:
According to the decision input of the AUV, the sailing distance over a future period of time is calculated, and the distance is mapped onto the coordinate axes according to the AUV heading angle at the current time to obtain the increment $(\Delta x, \Delta y)$. The specific calculation is shown in the following formula:
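The mapping from the decision input to the position increment can be sketched as follows. The sketch assumes that the sailing distance over one step is speed times step duration, that it is projected with the heading angle at the current time, and that the heading deflection is applied afterwards; the step duration `dt`, the update order, and the function name are assumptions.

```python
import math


def update_state(x, y, psi, speed, deflection, dt):
    """Apply one decision input (speed, heading deflection) to an AUV state.

    Returns the new (x, y, psi). The increment (dx, dy) uses the heading at
    the current time; the new heading is wrapped to [-pi, pi).
    """
    dx = speed * dt * math.cos(psi)     # increment along the X axis
    dy = speed * dt * math.sin(psi)     # increment along the Y axis
    psi_next = (psi + deflection + math.pi) % (2.0 * math.pi) - math.pi
    return x + dx, y + dy, psi_next
```

For an AUV at the origin heading along the X axis, a speed of 2 m/s over a 1 s step moves it to (2, 0), and a deflection of 90 degrees then turns it toward the Y axis for the following step.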

The Function of Decision-Making
The purpose of the multi-AUV cooperative target search is to find targets and determine target information as much as possible within a certain mission area. Therefore, the following requirements need to be met: (1) Reduce the cost of multi-AUV cooperative target search; (2) Improve the determination degree of target information in the task area; (3) Allocate search area reasonably.
J(O(n), X(n), U(n)) describes the comprehensive revenue of the multi-AUV system [22], which is a multi-objective synthesis function that needs to satisfy several conditions to make the final optimal solution. This function comprehensively considers the regional target discovery revenue J T , environment target search revenue J S , execution cost C v , and the predicted allocation revenue J P of sub-regions in the task environment.
(1) Regional target discovery revenue: The regional target discovery revenue of target searching is related to the degree of certainty of the target existence information in the first-level and second-level sub-regions allocated by the distributed predictive control method. The specific regional target discovery revenue $J_T$ is defined as:
where $p^k_{S_x S_y}(n)$ represents the target existence probability of the first-level sub-region where the $k$-th AUV is located in the task environment, which is related to the position of the AUV and the traversal state of the current sub-region. $q_{S_x S_y}(n)$ is a binary variable representing whether the target is found. The target is found when the deterministic probability of the target existence is greater than the threshold $\gamma_p$. The specific definition of $q_{S_x S_y}(n)$ is expressed as:
(2) Environment target search revenue: The environment target search revenue is associated with the reduction of the target information uncertainty in the sub-region within the effective sensor detection range of the $k$-th AUV. The concept of target information entropy is introduced to describe the revenue, which is specifically defined as:
The information entropy $H_k(n)$ is expressed as:
(3) Execution cost: The execution cost of the multi-AUV system represents the comprehensive consumption in the target search process, which is generally expressed as the time consumption or energy consumption of the AUV in travelling from the current position to the predicted allocation area. Here, the estimation is performed over $N$ steps in the future, and the specific representation is as follows:
(4) Sub-region predicted allocation revenue: The sub-region predicted allocation revenue takes into account the change of the global environmental information due to the change of the sub-region information at time $n$. The algorithm can improve the certainty of regional target information during prediction, thereby reducing the uncertainty of the entire environment. The specific definition is expressed as:
where $O^k_{s_x s_y}(n)$ represents the dynamic prediction search states of the first-level sub-regions in the task area.
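The idea of measuring search revenue by a reduction in target information entropy can be illustrated with a minimal sketch. It assumes the binary Shannon entropy of the target existence probability of one sub-region and defines the revenue as the entropy before a step minus the entropy after it; both choices are assumptions and are not claimed to be the exact expression used for $H_k(n)$.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a target-existence probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0                       # no uncertainty left
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def search_revenue(p_before, p_after):
    """Uncertainty removed from one sub-region by a sensing step."""
    return binary_entropy(p_before) - binary_entropy(p_after)
```

A sub-region with probability 0.5 carries the maximum uncertainty of one bit, so resolving it completely yields the largest possible revenue under this assumption.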
To sum up, given the state of the AUVs and the search state of the task area, and after the multi-AUV cooperative target search system adopts the control input of the online task optimization decision, the optimization objective function of the entire system $J(O(n), X(n), U(n))$ is defined as:
where $X(n)$ represents the state of the AUVs, $O(n)$ represents the search state of the task area, $U(n)$ is the control input, and $0 \le \omega_i \le 1$ ($i = 1, 2, 3, 4$) are the weight coefficients, which reflect the degree of performance preference of the system. The weight coefficients should be adjusted appropriately according to specific task requirements. In addition, as the above revenues have different dimensions, they must be normalized before the summation.
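The normalization and weighted summation can be sketched as below. The sketch assumes min-max normalization of each term across a set of candidate decisions, a weighted sum in which the execution cost enters with a negative sign, and the weight order ($J_T$, $J_S$, $C_v$, $J_P$); these choices and the function names are assumptions.

```python
def normalize(values):
    """Min-max normalize a sequence to [0, 1]; constant input maps to zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def best_candidate(candidates, weights):
    """Score candidate decisions and return (best_index, scores).

    candidates: list of (J_T, J_S, C_v, J_P) tuples, one per candidate.
    weights: (w1, w2, w3, w4), each assumed to lie in [0, 1].
    Assumption: revenues are added and the execution cost is subtracted.
    """
    columns = [normalize(column) for column in zip(*candidates)]
    w1, w2, w3, w4 = weights
    scores = [
        w1 * jt + w2 * js - w3 * cv + w4 * jp
        for jt, js, cv, jp in zip(*columns)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Normalizing per term keeps a quantity with large raw values, such as an energy cost, from dominating the sum regardless of the weights.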

System-State Prediction and Online Optimization Decision-Making
(1) System-state prediction based on rolling optimization: Rolling optimization can be used as the solution method for the multi-AUV cooperative target search objective function of distributed dynamic predictive control. Using the state equation and the objective function, a rolling optimization model for a multi-AUV system with $N$-step prediction is established. The system state and control input at time $n + m$ are dynamically predicted at time $n$. Within this period of time, the overall performance index of the system is denoted as:
The rolling model of task optimization decisions for the multi-AUV system at time $n$ is obtained as follows:
Finally, the state equation obtained according to the solution of the rolling model is expressed as:
$X(n+m+1|n) = f(x(n+m|n), u(n+m|n)), \quad m = 0, 1, \cdots, N-1$
$O(n+m+1|n) = \varphi(O_{1xy}(n+m|n), O_{2ij}(n+m|n)), \quad m = 0, 1, \cdots, N-1$
Within the rolling time window, the state input sequences of the AUV state space and of the regional information structure are obtained through online optimization and decision-making, which are $U^*(n) = \{u^*(n|n), \cdots, u^*(n+N-1|n)\}$ and $O^*(n) = \{o^*(n|n), \cdots, o^*(n+N-1|n)\}$, respectively. Then, the first element $u^*(n) = u^*(n|n)$ of the optimization sequence is used as the state input of the actuator at the current moment, and $o^*(n) = o^*(n|n)$ is used as the state input of the task area, thereby changing the decision input of the actuator in the future.
By optimizing all performance indicators of the system, which include the state of the task environment area, the predictive control, and the decision-making optimization input, the optimal decision sequence of the entire system can be obtained. The rolling optimization model based on a certain time window can transform an infinite time domain optimization problem into a series of finite time domain optimization problems. Therefore, it is very suitable for the online dynamic solution process of state input.
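The rolling optimization can be illustrated with a minimal receding-horizon sketch. It assumes a small discrete set of heading deflections, a constant speed, and an exhaustive search over all control sequences of length $N$; the stage reward is supplied by the caller and stands in for the comprehensive revenue. These simplifications and all names are assumptions.

```python
import itertools
import math


def rollout(state, controls, speed, dt):
    """Predict the states reached by applying a heading-deflection sequence."""
    x, y, psi = state
    path = []
    for deflection in controls:
        x += speed * dt * math.cos(psi)   # move with the current heading
        y += speed * dt * math.sin(psi)
        psi += deflection                 # then apply the deflection
        path.append((x, y, psi))
    return path


def rolling_decision(state, stage_reward, horizon, deflections,
                     speed=1.0, dt=1.0):
    """Return (first control, best sequence) over a finite horizon.

    Only the first control is meant to be applied; the optimization is
    repeated at the next time step with updated feedback.
    """
    best_seq, best_value = None, -math.inf
    for seq in itertools.product(deflections, repeat=horizon):
        value = sum(stage_reward(s) for s in rollout(state, seq, speed, dt))
        if value > best_value:
            best_seq, best_value = seq, value
    return best_seq[0], best_seq
```

Exhaustive enumeration grows exponentially with the horizon, so this form is only practical for short horizons and coarse control sets; it is shown here because it makes the finite-window structure explicit.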
(2) The online task-optimization decision: In Formula (36), the cooperative target search of the multi-AUV system is solved through the rolling optimization model by a centralized method, which requires a unified model of all the actuators and a designated central solution node. This node collects all of the multi-AUV system state information $X(n) = [X_a^1, X_a^2, \cdots, X_a^{N_v}]^T$ and the task environment area state, and solves the optimal task decision and the sub-region state update information for all members of the system. Such a centralized solution is computationally intensive and time-consuming for a large and complex multi-AUV system. Therefore, this method limits the scale of the multi-AUV system and the decision-making and control capabilities of the whole system.
When AUVs perform the target search task in their task sub-regions, they are decoupled from each other, that is, they exist independently in the global task environment. The only coupling factors are the state information of the task sub-regions and the communication between the AUVs. Therefore, in a system with such independent states, the global state information of the agents can be obtained through the communication network between the AUVs and the exchange of task sub-region state information under the distributed dynamic predictive control method, so that the multi-AUV system performs the target search cooperatively. The structure chart of the DDPC for the AUV system is shown in Figure 6. The whole system is decoupled into independent small systems on the basis of distributed dynamic prediction. Supposing that the state equation of the $k$-th AUV is denoted as $f_k$, then the whole system is shown as follows:
The task region state information is set as $O_k$, then the whole system is:
Then, the optimization objective function of the entire multi-AUV system can be decomposed into the optimization objective functions of the $N_v$ AUVs, with the specific form as follows:
where $J_k$ represents the optimization objective function of the $k$-th AUV; $\lambda_k$ is the weight coefficient; $O_k(n)$ denotes the state change of the task region at time $n$ caused by the AUV; $X_k(n)$ and $U_k(n)$ represent the dynamic prediction state and the optimization decision input of the AUV, respectively; $\tilde{O}_k(n)$ represents the influence of the other AUVs in the system on the environment state; $\tilde{X}_k(n)$ represents the dynamic prediction state of the other AUVs; and $\tilde{U}_k(n)$ represents the optimal decision input of the other AUVs. The specific representation is shown below:
$X_k(n) = \{x_a^k(n|n), \cdots, x_a^k(n+N-1|n)\}$
$U_k(n) = \{u_a^k(n|n), \cdots, u_a^k(n+N-1|n)\}$
$\tilde{O}_k(n) = \{O_l(n) \mid l \neq k,\ l = 1, 2, \cdots, N_v\}$
$\tilde{X}_k(n) = \{X_a^l(n) \mid l \neq k,\ l = 1, 2, \cdots, N_v\}$
$\tilde{U}_k(n) = \{u_a^l(n) \mid l \neq k,\ l = 1, 2, \cdots, N_v\}$
The global optimization problem of the target search system can thus be decomposed into $N_v$ local finite-time-domain problems. Each AUV solves its own problem separately, and the rolling optimization model of the $k$-th AUV is shown as follows:
Then, according to the solving conditions of the rolling model, the optimization decision input and the sub-region state information of each AUV are obtained, as shown in the following formula:
Finally, the state and decision variables of each optimization subsystem are obtained as follows:
It can be seen that the solution of the local optimization problem also contains the states of the other AUV subsystems, their decision variables, and the state changes of the sub-regions, so the obtained solution is based on the AUV cooperation mechanism. The states of the other members of the system and the effects of their decision-making information can be obtained through communication, so the state and decision input of the $k$-th AUV are only associated with the current local state. The cooperative target search problem of the whole multi-AUV system becomes the optimization problem of each independent AUV together with the update problem of the state information of the sub-regions, which greatly reduces the optimization scale of the whole system.
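The decomposition into local problems coupled only through shared sub-region information can be illustrated with a simplified sketch. Each AUV scores the sub-regions that are not locked by others and locks its best choice, and the lock set plays the role of the broadcast sub-region state. The score (region gain minus travel distance), the sequential order that emulates the broadcast, and all names are assumptions and stand in for the local objective $J_k$.

```python
import math


def local_decision(auv_xy, region_centers, region_gain, locked):
    """Solve one AUV's local problem: pick the best unlocked sub-region."""
    best, best_score = None, -math.inf
    for index, center in enumerate(region_centers):
        if index in locked:
            continue                      # already allocated to another AUV
        score = region_gain[index] - math.dist(auv_xy, center)
        if score > best_score:
            best, best_score = index, score
    return best


def distributed_round(auv_positions, region_centers, region_gain):
    """Run one decision round; returns {auv_index: region_index}."""
    locked = set()                        # shared sub-region lock state
    assignment = {}
    for k, position in enumerate(auv_positions):
        choice = local_decision(position, region_centers, region_gain, locked)
        if choice is not None:
            locked.add(choice)            # 'broadcast' the new lock
            assignment[k] = choice
    return assignment
```

Each local problem only scans the sub-regions once, so the work per AUV does not grow with the number of AUVs, which is the scaling benefit the decomposition is aiming for.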

Hunting Formation
Considering the problems of the hunting task, this paper proposes a dynamically distributed hunting method which is suitable for hunting formation and transformation. The time that the AUV takes to adjust the heading is taken into account as the time consumption for forming the hunting formation during the analysis of the hunting conditions.
Suppose that there are three hunt executors with the target (red AUV in Figure 7) as the center of the hunting formation. According to the hunting critical diagram, the minimum value of the ratio of the speed of the hunting executor to that of the target is obtained as follows:
where $v_a^i$ represents the speed of the hunting executor, which is the same for all AUVs, and $v_D$ represents the speed of the moving target. By reasonable extrapolation, the general formula of the required speed for the hunting formation of the multi-AUV system is shown as follows:
where $i$ is the serial number of the hunting AUV. According to Formula (45), it can be concluded that the greater the number of AUVs involved in hunting, the smaller the minimum moving speed required. In this paper, the method of reducing the hunting circle is adopted to form an effective hunting formation for moving targets with a fixed number of AUVs. By analyzing this method, it can be concluded that the relation between the field angle $\theta$ of the target (as shown in Figure 8), the distance $l$ between the executor and the target, and the effective detection radius $r$ is expressed as:
As the AUV gets closer to the target to be hunted, $\theta$ increases because the effective detection range is fixed, and thus the probability of the target escaping is reduced. When the AUV reaches the effective range at which it changes into the hunting formation, it moves to its respective hunting point, as shown in Figure 9.

Formation of the Hunting Potential Point
In order to ensure the rapidity and effectiveness of hunting, it is necessary to develop an appropriate method for the formation of hunting potential points [9]. Assuming that the position of the moving target is $D(x_D, y_D)$ and its speed and heading angle are $v_D$ and $\varphi$, respectively, the coordinate formula of the hunting point is as follows:
where the radius of the virtual hunting circle is $l$ and $n$ represents the number of hunting AUVs. The arc length between the hunting potential points is $L$.
Assume that the maximum radius of the hunting potential point is $r$ and the maximum radius of the target is $r_D$. The safe distance between the hunting executors is set as $S$, and $L$ and $r$ should satisfy the following requirements:
where $S > r$, $S > r_D$, and $\lambda_1 \ge 1$, $\lambda_2 \ge 1$ are the adjustment coefficients of the safety distance. With $n$ hunting executors, the circumference of the virtual hunting circle is $nL = 2\pi l$, and the relation between the number of hunting executors and the parameters mentioned above is obtained after solving the minimum hunting circle radius from the above inequality:
After the safety distance coefficients are fixed as constants according to the actual situation, the completion of an effective hunting circle depends on the radii of the potential points of the executors and of the target. Thus, as the number of hunting executors increases, the radius of the virtual hunting circle also increases.
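The placement of hunting potential points on the virtual hunting circle can be sketched as follows. The sketch assumes that the $n$ points are evenly spaced on a circle of radius $l$ around the target, with the first point lying on the target heading; the even spacing and the starting direction are assumptions. The arc length follows from $nL = 2\pi l$ as stated in the text.

```python
import math


def hunting_points(target_xy, heading, radius, n):
    """Return n potential points evenly spaced on the virtual hunting circle.

    Assumption: point i sits at polar angle heading + 2*pi*i/n around the
    target, so point 0 lies directly ahead of the target.
    """
    xd, yd = target_xy
    points = []
    for i in range(n):
        angle = heading + 2.0 * math.pi * i / n
        points.append((xd + radius * math.cos(angle),
                       yd + radius * math.sin(angle)))
    return points


def arc_length(radius, n):
    """Arc length L between neighbouring points, from n * L = 2 * pi * l."""
    return 2.0 * math.pi * radius / n
```

For three hunters this produces the triangle formation around the target, and the arc length can then be compared with the required safe distance before the circle is tightened.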

The Task Assignment of the Hunting Formation
In the process of the target search, when a dynamic target appears in the sonar field, it triggers the mechanism of the hunting task. In this paper, a triangle formation is adopted, as shown in Figure 10. Here, D is the origin of polar coordinates, $\theta_i$ is the polar angle, and the heading direction of the target is taken as the polar axis. The polar angle of the current position of each hunter is calculated, and the hunters are sorted by polar angle from smallest to largest to form the sequence P. The elements of P and T are then matched in order to achieve the optimal assignment of tasks, as follows: where T is the set of the potential hunting positions. A virtual hunting circle and hunting potential points are formed immediately by the AUV that finds the target, and the information is sent to the other AUVs in the environment. Each AUV with a decision-making mechanism decides whether or not to participate in hunting and sends its reply to the organizer. This creates a joint contractual relationship between the hunting members. In this way, the task assignment and role switching of AUVs can be described as hunting contracts becoming valid and invalid. The specific decision-making mechanism requirements are as follows: (1) If an AUV fails to reach its predetermined position within the time limit after it has been identified as a hunting executor, the contract becomes invalid and the role is changed; (2) If the required cooperative hunting executors do not all reach the corresponding potential points within the time limit, the contract is re-established; (3) After the target is destroyed, the contract becomes invalid immediately. The initiator of the hunting shall send the message of giving up to other executors in the team for role switching; (4) When the initiator gives up, a message is sent to the other executors about the success of the chase.
The following provisions apply when an AUV decides whether or not to join the hunting contract: (1) Once an executor commits to hunting a target, it waives all other mission roles while the contract is in effect; (2) All AUVs are required to exchange information before the hunting contract becomes effective. The role switch is abandoned when the AUV that is about to sign the hunting contract has confirmed that the team does not need it.
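The polar-angle matching described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's implementation: angles are measured counter-clockwise from the target heading and wrapped to [0, 2π), and the function names and the dictionary layout are hypothetical.

```python
import math


def polar_angle(point, target, heading):
    """Angle of point seen from target, measured from the target heading."""
    dx, dy = point[0] - target[0], point[1] - target[1]
    return (math.atan2(dy, dx) - heading) % (2.0 * math.pi)


def assign_points(hunters, potential_points, target, heading):
    """Pair hunters with potential points by matching sorted polar angles.

    hunters: dict mapping a hunter id to its (x, y) position.
    Returns a dict mapping each hunter id to its assigned potential point.
    """
    # P: hunter ids ordered by the polar angle of their current position.
    p = sorted(hunters, key=lambda h: polar_angle(hunters[h], target, heading))
    # T: potential points ordered by their polar angle.
    t = sorted(potential_points, key=lambda q: polar_angle(q, target, heading))
    # Elements of P and T correspond one to one, in order.
    return dict(zip(p, t))


hunters = {"AUV1": (5.0, 1.0), "AUV3": (-4.0, 3.0), "AUV5": (-1.0, -6.0)}
triangle = [(-5.0, -8.66), (10.0, 0.0), (-5.0, 8.66)]
assignment = assign_points(hunters, triangle, (0.0, 0.0), 0.0)
```

Matching in angular order prevents the hunters' paths from crossing on the way to the circle, which is one plausible reason for sorting both sequences the same way.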

Search Algorithm Verification
By comparing the search results with the random method and the scan line method, this paper verified the high efficiency of the DDPC search algorithm. The experiments were run on a computer with an AMD Ryzen 5 4600H CPU at 3.00 GHz and 16.00 GB RAM.
Three AUVs were set in the simulation environment with 30 random static targets and obstacles with different shapes and positions. The simulation was set to compare the final target search results of each algorithm in 1000 time steps. Each target is marked after it is found by an AUV. The final experimental results of the three methods are displayed below. Figure 11 shows the result of the target search method proposed in this paper, where four targets remained unfound at the deadline. Figure 12 shows the results of the random search method. Each AUV sailed randomly in the environment, and 16 static targets were found within the specified time. Figure 13 shows that the scan line method failed to cover the whole search environment smoothly and efficiently, and 22 static targets were found within the specified time. The three methods were compared on two indexes: (1) Regional coverage; (2) Average number of found targets.
The indexes respectively describe the ratio of the regions searched by the AUVs in a certain task time to the whole environment and the number of targets searched for by the AUVs. The specific formula is shown in (51).
$$\text{Regional coverage} = \frac{\bigcup_{i} \text{AUV}_i \text{ sonar detection range}}{\text{Task area}} \tag{51}$$
Figure 14 indicates that the method proposed in this paper covers almost all task areas within the specified task time, followed by the scan line method. Due to its unique search method, the scan line method can cover every region it passes through. If the task time is long enough, the scan line method can achieve full area coverage. However, the heading of the AUV in the random search method is random at each time point, so the AUV either searches the same area repeatedly or arrives at a sub-area and then leaves quickly, which lowers the coverage rate. The data statistics confirm that the coverage rate of the random search method is the lowest.
In order to explore the influence of the prediction on the algorithm, we compared the influence of different prediction steps on the average computation time of each step and the detection rate of the target after 1500 search steps for 30 static targets, as shown in Figure 15. According to the data in the figure, we chose the predicted steps to be 10.
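The regional coverage index in (51) can be estimated numerically by rasterizing the task area and counting the cells that fall inside at least one sonar footprint. The sketch below is one possible way to do this, not the evaluation code used in the paper: it assumes a square task area, a sector-shaped sonar footprint defined by a range and an opening angle, and a hypothetical pose format `(x, y, heading)`.

```python
import math


def in_sonar_sector(cell, pose, radius, aperture):
    """True if cell (x, y) lies inside the sonar sector of pose (x, y, heading)."""
    dx, dy = cell[0] - pose[0], cell[1] - pose[1]
    if math.hypot(dx, dy) > radius:
        return False
    # Bearing of the cell relative to the AUV heading, wrapped to [-pi, pi).
    bearing = (math.atan2(dy, dx) - pose[2] + math.pi) % (2.0 * math.pi) - math.pi
    return abs(bearing) <= aperture / 2.0


def regional_coverage(pose_history, size, cell, radius, aperture):
    """Fraction of grid cells in a size x size area seen from at least one pose."""
    n = int(size // cell)
    covered = set()  # a set gives the union of all footprints
    for pose in pose_history:
        for ix in range(n):
            for iy in range(n):
                centre = ((ix + 0.5) * cell, (iy + 0.5) * cell)
                if in_sonar_sector(centre, pose, radius, aperture):
                    covered.add((ix, iy))
    return len(covered) / (n * n)


# Two poses of one AUV in a 2000 m x 2000 m area, 150 m range, 120 degree sector.
poses = [(1000.0, 1000.0, 0.0), (1004.0, 1000.0, 0.0)]
coverage = regional_coverage(poses, 2000.0, 20.0, 150.0, math.radians(120.0))
```

Because the covered cells are stored in a set, revisiting an area does not raise the index, which is why repeated searches penalize the random method.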

Hunting Algorithm Verification
In this section, four AUVs and two dynamic targets were set up in the environment to prove the validity of the hunting algorithm. At first, the AUVs were searching in different areas, as shown in Figure 16. Figure 17 shows that AUV 1 found the target and organized the other two AUVs to go to the hunting potential point while the fourth AUV was still searching. As can be seen in the figure, AUV 1 was already at the hunting potential point, while the other two AUVs were still heading towards the assigned positions. The formation of the hunt was created as shown in Figure 18. Figure 19 shows that after the dynamic target was destroyed, the formation disbanded and the search mission continued. As can be seen from Figures 16-19, the hunting algorithm was tested with a few AUVs in a simple environment, and in the next section it is shown to work in complex environments as well.

Cooperative Searching and Hunting Simulation
In order to prove the feasibility and effectiveness of the searching and hunting method proposed in this paper, the simulation environment was set to be a two-dimensional area of 2000 m × 2000 m, and the AUVs were set to perform the target search and hunting task at a fixed depth in the horizontal plane. In the simulation environment, the speed of each AUV was set at a constant 4 m/s while searching and increased to 5 m/s when executing the hunting mission. The maximum angular velocity of turning was $r_a = \pi/20$ (rad/s). The detection performance parameters of the forward-looking sonar were $V_e = 0.8$, $V_{none} = 0.2$, $R = 150$ m and $\alpha = 120°$. Global communication was assumed. The environment contained obstacles of different shapes, 40 static targets at random positions and 2 dynamic targets with different tracks, as shown in Tables 2 and 3. Six AUVs were launched from the positions listed in Table 4 to perform the target search task with a specified running time of T = 2000 (steps). The figures below show the multi-AUV task execution at selected times during the simulation. As can be seen from the figures, the six AUVs mark static targets when they find them and maintain the formation to hunt and destroy dynamic targets. Figures 20-24 show the process of two dynamic targets being hunted and destroyed. Figure 21 shows that AUV 1 finds dynamic target 1 and then organizes two other AUVs to hunt according to the target hunting algorithm. At that moment, AUV 3 and AUV 5 are performing a target search. When they become hunting executors, they abandon the current search task and accelerate towards the hunting potential points. After dynamic target 1 is hunted, AUV 5 finds dynamic target 2.
AUV 3 and AUV 4, which are not performing a hunting task at this time, become members of the new team and hunt the moving target. When encountering obstacles, an AUV switches to obstacle avoidance mode and does not return to hunting mode until obstacles no longer appear in the view of the forward-looking sonar. Each team eventually forms a triangle formation and maintains the formation for a while before the dynamic target is finally destroyed, as shown in Figures 22 and 23. After the hunting formation is disbanded, the target search is carried out again until the time limit is reached or all the targets have been found. Figure 24 shows the final result. These results demonstrate that the method is effective and that dynamic targets can be hunted successfully.
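The motion constraints stated in the simulation settings (4 m/s while searching, 5 m/s while hunting, turn rate limited to π/20 rad/s) can be illustrated with a simple planar update. This is a sketch only: the unicycle-style model, the function name `step`, and the assumption that one simulation step lasts one second are not taken from the paper.

```python
import math

SEARCH_SPEED = 4.0            # m/s, from the simulation settings
HUNT_SPEED = 5.0              # m/s, from the simulation settings
MAX_TURN_RATE = math.pi / 20  # rad/s, from the simulation settings


def step(x, y, heading, desired_heading, hunting, dt=1.0):
    """Advance one AUV by dt seconds with a clamped turn rate.

    Assumption: planar unicycle kinematics; dt = 1 s per step is hypothetical.
    """
    # Smallest signed heading error, wrapped to [-pi, pi).
    error = (desired_heading - heading + math.pi) % (2.0 * math.pi) - math.pi
    max_turn = MAX_TURN_RATE * dt
    heading += max(-max_turn, min(max_turn, error))
    speed = HUNT_SPEED if hunting else SEARCH_SPEED
    return (
        x + speed * math.cos(heading) * dt,
        y + speed * math.sin(heading) * dt,
        heading,
    )


# A hunter turning towards a potential point that lies 90 degrees to its left.
state = step(0.0, 0.0, 0.0, math.pi / 2, hunting=True)
```

With these limits a full 90° turn takes ten steps, so the time a newly recruited executor needs to reach its potential point depends on its heading when the contract is signed as well as on its distance.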

Conclusions
For the first time, dynamic prediction and online optimization decision-making are conducted based on the environmental region state and AUV state to solve the problem of multi-AUV cooperative searching and hunting. This algorithm divides the large-scale unknown environment faced by the AUV into two-level search sub-regions and establishes a mathematical model. Based on the distributed search theory, the AUV state model and the regional state information update mechanism are introduced. The predicted region state information and AUV input state are obtained through the time window rolling optimization model, and the online optimization decision function is used to solve for the regional and AUV state update input, which realizes the multi-AUV collaborative target search. When the AUV finds a dynamic target, it hunts the dynamic target and destroys it. Combined with the traditional hunting organization method, a dynamic distribution hunting method is proposed to reasonably allocate the hunting potential points of the moving target so that the AUV can form a hunting formation more quickly. Finally, the simulation verification of the multi-AUV cooperative target searching and hunting is given, which demonstrates the effectiveness of the method.
Because the actual unknown underwater environment is more complex, there are still many problems and deficiencies in the research content that need to be improved in the future, including the following: (1) Communication delay and loss of information; (2) Complex groups of dynamic obstacles; (3) Dynamic targets with multiple motion states; (4) Application in the 3D underwater environment.