Coevolution Pigeon-Inspired Optimization with Cooperation-Competition Mechanism for Multi-UAV Cooperative Region Search

In this paper, a dynamic two-stage closed search (DTSCS) scheme for the unmanned aerial vehicle (UAV) cooperative region search is designed, which satisfies the range constraint (RC) and orientation constraint (OC). The closed trajectory is composed of two coupling stages, the search stage and the return stage. The position and orientation at the end of the search stage are the starting cell and orientation of the return stage. In the first stage, a coevolution pigeon-inspired optimization (CPIO) algorithm based on the cooperation-competition mechanism is proposed for multi-UAV cooperative search. In the return stage, inspired by region searching and trajectory tracking, a search tracking (ST) approach is presented to obtain the lowest-cost path under OC. The simulation results show that: (i) Np = 5 is the best prediction time step. (ii) CPIO algorithm performs better than the compared intelligent algorithms in region searching. (iii) ST has high tracking performance than other algorithms. (iv) The DTSCS scheme enables every UAV to make the best use of its fuel to cover more region and return to the airport within the RC, and the average range utilization of UAVs is 97% under the 3OC.


Introduction
In the wide-area, complex and changeable environment, multiple unmanned aerial vehicles (UAVs) cooperative control is a critical research field [1][2][3][4] of unmanned system. In particular, multi-UAV cooperative search (MUCS) can be applied to prevent forest fires, patrol the border and probe potential safety hazards in cities and other aspects. Therefore, the study of MUCS strategy has practical significance. In [5], Sujit uses k-shortest path algorithm to search the target in an uncertain environment, but this algorithm cannot adapt to the situation that the edge weights of the search cell changes with the number of UAV passes. In [6], Bertuccelli models the uncertainty in the environment as the prior probabilities in the region and uses the Beta distribution to predict the minimum number of times required by UAV to search the target. This method, however, only considers a single UAV.

Problem Formulation
The searching region is usually divided into grid cells. The target existence probability (TEP) represents the probability of target existence in a cell. Environmental uncertainty (EU) indicates the degree to which UAVs do not know about the environment. These two variables are regarded as prior information. Every cell is modeled as the key or non-key region, and then the search probability graph of the whole searching region is formed. UAVs accomplish the search task via environment perception and information interaction. This section includes three parts: the environment model, UAV kinematic model and the analysis of UAV closed search.

Region Searching Model
This subsection describes the basic mathematical model of region searching. Firstly, rasterizing the searching region, and then the environment information is modeled as the prior probability information. The TEP in the whole search environment is updated by Bayesian rule. And the environmental uncertainty of every cell is updated according to search number of UAVs.

Environment Modeling
UAVs are assumed to fly on a fixed plane above the searching region. M UAVs search N t targets. The searching region R is uniformly divided into L x · L y cells: R = {C(m, n) | m = 1, 2, ..., L x ; n = 1, 2, ..., L y }, (1) where C(m, n) is the coordinate of cell (m, n). The side length of a cell is the unit length. Discretizing searching time of U AV i (i = 1, 2, ..., M), U AV i can search a cell in one time step [24,25]. The real-time position of U AV i can be described by the geometric center of the cell it searches.

Probability Map Update
It is known that the TEP P m,n e (k) ∈ [0, 1] and the EU χ m,n e (k) ∈ [0, 1]. If 0 ≤ P m,n e (k) ≤ 0.3 and 0 ≤ χ m,n e (k) ≤ 0.3, the cell (m, n) is modeled as a known region. If 0.3 < P m,n e (k) ≤ 0.7 and 0.3 < χ m,n e (k) ≤ 0.7, the cell (m, n) is modeled as a non-key region. And if 0.7 < P m,n e (k) ≤ 1 and 0.7 < χ m,n e (k) ≤ 1, the cell (m, n) is modeled as a key region. Dangerous areas, such as hostile radar detection regions, peaks, etc., are designated as no-fly zones. Bayesian rule is used to update whether a cell exists a target [26]. The update equations for the probability that a target exists but is not detected and that the target does not exist but is reported are written as follows: where P m,n c (k) ∈ [0, 1] is the detection probability of the airborne sensor to the target [27]. P m,n f (k) ∈ [0, 1] is the false alarm rate, which indicates the probability that there is no target in the cell but a target is reported. χ m,n e (k) decreases with the search number of UAVs, satisfying the following equation: where N f (m, n) ∈ N is the number of times that a cell (m, n) searched by UAVs.

UAV Kinematic Model
The kinematic model of U AV i with constant velocity and OC is obtained as follows: where (x i (t), y i (t), z i (t)) is the position of U AV i , v i is its velocity. θ i is course angle. The fourth term sets the constraint on the angle rate, which cannot exceed ε i . In the grid map, U AV i can only move to one of its eight neighbor cells at a time step, corresponding to the heading D i (k) = {0, 1, 2, 3, 4, 5, 6, 7} [28].

Definition 1. (D-orientation constraint, DOC).
In the grid map, The degree of freedom of the heading in which the UAV is allowed to walk in the next time step is D.
A slow-moving robot usually satisfies 8OC (equivalent to no constraint ) or 4OC , as shown in Figure 1a,b. The fast UAV can not turn backward or turn vertically during flying. Therefore, it is subject to 3OC: go straight (0 • ), turn left (45 • ), and turn right (−45 • ), which can be seen in Figure 1c. The orientation of the next time step of U AV i can be summarized as follows:

Preliminary Analysis
In the research of MUCS, we always discuss the cooperative search strategy under the assumption that there is enough fuel, which is not consistent with the actual situation. Every UAV sets out from the airport and returns to its airport after completing the search task. The flight path is a closed track. We divide the closed search into two stages: search stage and return stage.

Definition 2. (Closed path).
According to the definition of the loop in the directed graph, a closed path is a flight trajectory of a UAV starting form an airport to carry out task and returning back to the same airport.
The closed search in this paper is different from the traveling salesman problem, which to find a solution that traverses all the reachable nodes from the starting point and returns to the starting point with the least travel cost. In this paper, the closed search refers to the closed path that UAV returns to the airport after completing the search task in the grid map.
Rasterizing searing region is consistent with discretizing searching time, so the RC of UAV is equated with its searching time limit. The closed path of the UAV under the RC should be composed of two stages: (1) search stage; (2) return stage. Long searching time means that UAV has a limited time to return and may not even return on time. And the short searching time means that the UAV can not effectively complete the search task. Ideally, the UAV would return to the airport at the time step when it is running out of fuel.

Search Stage
UAVs accomplish the search task via environment perception and information interaction. The principles of MUCS should include the following: The search strategy is no longer considered in the return stage. Our goal is to find a safe and collision-free lowest path under the OC. This path starts with the current cell and orientation and ends at the airport.
For the closed search, the primary purpose of this paper is to solve the following three problems: (1) Set the time step for UAV to return to the airport; (2) Maximize search rewards in the search stage; (3) The purpose of the return phase is to obtain a shortest path that satisfies the OC.

CPIO Algorithm with Cooperation-Competition Mechanism
According to the principle of cooperative search discussed above, we design the reward function of MUCS and regard it as the fitness function of PIO. In [8,29,30] coevolution genetic algorithm (GA) is applied to multi-robot platform to find the optimal solution through the cooperation among robots. In some conditions, the optimal solutions of every robots may sometimes be mutually exclusive. This conflict is consistent with the competition of subgroups in nature. Inspired by the cooperation-competition relationship between subgroups within a biological community in nature, a CPIO algorithm is used as the search heuristics for MUCS in the search stage. The optimal solution obtained via the CPIO algorithm is the position of a target, which is consistent with the attribute of region searching.

Design Reward Function
The MUCS process is modeled as a multiobjective nonlinear programming function with RC, OC, no-fly zone and collision free and the model is given in Equations (7) and (8): s.t.
where ω 1 , ω 2 , ω 3 , ω 4 are weighting factors, which satisfy ω 1 , ω 2 , ω 3 , ω 4 ∈ (0, 1) and ω 1 + ω 2 + ω 3 + ω 4 = 1. R c is the searchable region in R. T i,1 (k) and T i are the searching time and the range of 2 denotes the Euclidean distance of U AV i and U AV j . N p ≥ 1 is a positive integer. The first and second terms of the J 1 are designed to maximize search rewards, which corresponding to the first and second terms of the principles. The reduction of EU of the cell (m, n) can be defined as follows: The third of the J 1 is consistent with the fourth of the principles and is designed to avoid collision between UAVs: where d min (k) is the minimum element of the d ij (k) at time step k, and d ij (k) is given in Appendix A Equation (A1). When the distance between any two UAVs is less than or equal to √ 2 unit lengths, the reward of this term will be −100. And then the rewards of J 1 will also be negative. U AV i can only choose to search cells away from other UAVs. With regard to the fourth term of the J 1 , it implies that the further away U AV i is from the airport, the more rewards it obtains (principle 5): where d ∑ (k) is the sum of all the elements in d ih (k), which is written in Appendix A Equation (A2).

Definition 4. (N p -step-ahead prediction, N p SAP).
Learning from the idea of rolling optimization in model predictive control theory [31]. In each term of J 1 designed in Equation (7), the rewards of the reachable cells to be calculated is not only the next time step, but also the next N p time steps.

Remark 1.
Although n steps are predicted in advance each time, U AV i flies forward only one-time step.
The series of reachable waypoints of N p SAP of U AV i maps the predicted cells sequence .., N p ) from the step (k + 1) to the future time step (k + N p ), which is based on the present position L i (k) at time k. The reachable cells included in 3SAP construct an expanding tree, as shown in Figure 2a. Obviously, the expanding tree generated by N p SAP contains 3 N p alternative paths and the l-th path can be illustrated as: where , the path 2 will be chosen. For an extreme situation, if N p = T i,1 , the path with the largest reward in E i (k) must be the optimal path in the whole search process. As N p grows, the number of paths increases exponentially. Therefore, we use PIO algorithm to find the optimal solution that is difficult for conventional optimization algorithms.

Overview of Basic PIO
Inspired by natural phenomenon of the autonomous homing behavior of pigeon swarms, Duan proposes the PIO algorithm in [14], which the optimization process can be divided into two operators based on the distance of the pigeons from the destination.

Operator 1: Map and compass operator
During the prometaphase of the search, pigeons is far away from the goal. The real-time information of the sun is abstracted into map and compass operator to adjust the flight orientation, which is a rough navigation process. Suppose the number of pigeons is C 1 . The position and velocity of pigeon a , (a = 1, 2, ..., C 1 ) in the two-dimensional (2D) plane is expressed as: The renewal equations of position and velocity are: where u = 1, 2, ..., N c1 is the current iteration number. R p is the coefficient of the map and compass operator. rand is a random number from 0 to 1. L best denotes the position of the pigeon closest to the goal in the pigeons in iteration u − 1. After N c1 iterations, the rough navigation stage is completed. PIO algorithm enters the second stage of optimization.

Operator 2: Landmark operator
When the pigeons arrive near the target, the PIO algorithm switches to the landmark operator. The landmark information of the nearby environment will provide precise guidance information. The speed of the pigeon does not change at this stage, while the position is updated: where v = 1, 2, ..., N c2 is the current iteration number.
is the number of pigeons in v iteration. L v−1 center denotes the position of the central pigeon in v − 1 iteration. f itness(·) is the fitness function. The optimal solution will be obtained after Operator 2 is performed N c2 iterations.

CPIO and Cooperative Search
In biology, a population may divide into some subgroups [32,33]. In the face of natural enemies, the subgroups will cooperate to resist. Nevertheless, they also compete with each other in the interests of food, mating, territory, and so on. Cooperation and competition make the population survive and evolve better. To simulate these natural behaviors, UAVs work together to complete the search task via information interaction, in the mean time UAVs compete with each other to search the specific vital cells. We propose a CPIO algorithm based on the cooperation-competition mechanism as the search algorithm for MUCS, as shown in Figure 3. In which, one subgroup pigeons is abstracted as one UAV. The cooperation-competition relationship between pigeons reflects the cooperative relationship between UAVs.
MUCS based on CPIO is composed of three parts: CPIO, UAVs, and environment model. The principle of which is shown in Figure 4. In the CPIO module, the initial information of the environment, the environment information detected by the UAVs and the real-time pose signals of the UAVs are modeled as prior information. This information is then transmitted to the reward function J 1 . Subgroup i transforms the optimal solution obtained by the cooperative-competitive mechanism into discrete flight signals. And these signals will be transmitted to U AV i at each time step. Thus, the cooperation-competition mechanism can be described as follows:

•
Get the maximum P m,n e (k) and the maximum ∆χ m,n e (k) • Stay away from airports and search for more unknown regions • Avoid searching locally highly remunerated cells

Competition mechanism
• Forbid UAVs to search for the same cell at the same time step • Avoid UAVs repeatedly searching the same cell • Stay away from other UAVs to search more unknown cells Under all constraints, the subgroup that maximizes the total rewards will win. Cooperation and competition are not entirely opposed to each other. The winning subgroup will search the controversial cell, and the other subgroups compete to search other cells. The result of the competition mechanism is consistent with the original intention of the cooperation mechanism.  Let f itness(·) = J 1 , thus Equation (15) can be transformed into:

Competition-Cooperation
In the searching region, L u a and L v a are neighbor cells of L u−1 a and L v−1 a , respectively. Under the constraint of Equation (6), the position (orientation) in the PIO algorithm can be expressed as: The MUCS process based on the CPIO algorithm can be expressed as Algorithm 1.

Algorithm 1 MUCS based on CPIO algorithm
Input: Initializing environmental information and pose information of U AV i . Output: The searching trajectory of every UAV and its searching time steps T i,1 (k). 1: begin: 2: for k = 1, ..., T i do 3: Predict N time steps in advance, and get the alternative paths. 4: Delete the paths that does not satisfy Equation (8).

5:
Use the CPIO algorithm to find the path with the highest fitness value among the remaining paths. 6: while U AV i receives the return order, U AV i returns to the airport. 7: Record the searching trajectory and searching time steps T i,1 (k) when U AV i returns. 8: end while 9: end for 10: end

ST Approach
Inspired by the knowledge of cooperative search and trajectory tracking [34,35], we propose the ST approach. The lowest path a without the OC of motion is mapped to a trajectory to be followed, and the cells it passes through are modeled as the key regions. Other blank cells are shaped as non-key regions, and obstacles are designed as no-fly zones. The input signal is the orientation sequence to maximize the search rewards of U AV i . The process of tracking path a is the same as that of a UAV searching from the starting cell to the goal cell. Since the ST approach is only applied to a single UAV, there is no need to consider the cooperation-competition relationship between UAVs. ST approach based on basic PIO algorithm consists of four operators.

Operator 1: Obtain path a
We use A* algorithm [23] to find path a , and A* algorithm can be simplified as the following steps: Step 1: Specify the OC for the UAV.
Step 2: Design a cost function f (m, n) = g(m, n) + h(m, n), where g(m, n) denotes the movement cost: corresponds to the expenditure of moving the current position (m, n) to other cell moved into the neighbor. h(m, n) denotes the heuristic cost: corresponds to the expenditure of changing from current cell to goal cell. When g(m, n) = 0, A* algorithm degenerates to Dijkstra algorithm. When h(m, n) = 0, A * algorithm degenerates to greedy best first search algorithm.
Step 3: Estimate total expenditure h(m, n) and change to the cell with the least cost.
Step 4: Repeat Step 3 until the goal cell is reached.
Step 5: When reaching the goal, choose the final path with least cost.

Operator 2: Mark key cells
Referring to the concept of the TEP in the region searching, the blank cells are set as the non-key regions, which implies P m,n e,i (k) = 0. The cells that the path of U AV i path a,i passes through are marked as the key regions, and their P m,n e,i (k) are denoted as follows: where s i = 1, 2, ..., N m,i is the serial number of the mark points from the starting cell to the goal cell in path a,i . s i = N m,i stands for goal point. When the key cell is searched for N f ,i = 0 time, the existence probability of the goal cell is set to be 1, and the probability of the other key cell is 1 5 . WhenN f ,i ≥ 0, the probability of the corresponding key cell becomes 0. (20), U AV i will only track path a,i . The key cells marked for other UAVs are non-key cells for U AV i . This ensures that the tracking process of the UAVs does not affect each other.

Operator 3: Design fitness function
The reward function of the ST approach relates only to the TEP for every cell in Operation 2: The original cell of path a is the starting position of the ST approach. And its initial flight orientation is pointed from the starting position to the second cell of path a .

Operator 4: Track path a
The process of tracking is similar to that of Algorithm 1 and can be described as the following steps: Step 1: Initialize the information of the region to be searched and the pose of the UAVs.
Step 2: Execute N p SAP, and then get the predicted state sequence E i (k).
Step 3: Let f itness(·) = J 2 , the path p l i (k) corresponding to the sequence with the greatest fitness in E i (k) solved by basic PIO.
Step 4: U AV i flies forward for a time step.
Step 5: Repeat Step 2 to Step 4 until the goal cell is searched.
Step 6: When reaching the goal, choose the final path with least cost and OC, as well as the total time steps T i,2 (k) from the starting point to the goal point.
In light of the knowledge of graph theory, the key cells included by path a is the only directed connection sequence from the starting position to the goal cell, which is equivalent to a directed path. Through the four operations described above, we can obtain a lowest path that satisfies 3OC. ST approach as opposed to path following or trajectory tracking. The errors of the latter two are the absolute errors between the actual tracking trajectory and the ideal trajectory. ST approach chooses the path with the highest reward in all alternative paths, and the resulting tracking errors are the relative errors. Minimizing tracking errors can be converted to maximizing search rewards and it can be represented as: where p * i (k) denotes the path with the greatest fitness in E i (k) at time step k. L r (k) = (x r (k), y r (k)) is the coordinate of the key cell to be tracked. L c (k) = (x c (k), y c (k)) is the coordinate of the cells actually tracked.
If L r (k) ∈ E i (k), the tracking process with the highest search rewards must be error-free tracking. Algorithm 2 demonstrates the implementation procedure of the ST approach.

Remark 3.
(1) ST approach is not only suitable for A* algorithm. It is effective for other algorithms.
(2) In addition to the 3OC mentioned in this section, the path can also be tracked under 4OC and 8OC. ST approach is suitable for different types of robots or different environments. (3) This approach can also track the 3D path or the path under the dynamic environment.
Set the RC for the U AV i to be 100 steps. Taking an example shown in Figure 5, the remaining range at this time step is assumed to be six steps. If U AV i continues to search, it will take at least nine steps, as shown in black arrow, for U AV i to return to the airport at the next step. According to Equations (23) and (24), U AV i should return to the airport at this time step, as shown in red arrow. In this case, η i = 95%. Algorithm 3 describes the detailed flows of the DTSCS scheme.

Algorithm 3 DTSCS scheme
Input: Initializing environmental information, pose information of U AV i , and T i . Output: T i,1 (k), T i,2 (k), closed trajectory and η i . 1: begin: 2: do the search stage (Algorithm 1).

Numerical Simulation and Analysis
In this section, three simulations are performed to verify the performance of the proposed CPIO and the effectiveness of the ST approach and the DTSCS scheme. The simulation programs are coded in Matlab R2016a and implemented on Intel Core I7-2600 3.40 GHz personal computer with 4 GB random access memory.

MUCS Based on CPIO Algorithm
The region R consists of 50 × 50 cells, the center of every cell is the allowed waypoint. And the setting of the coordinate system of R is consistent with Figure 9a Table 1. The advantages of N p SAP are discussed in Section 3.1. If N p is relatively large, the benefits of N p SAP will be limited or even negligible. In contrast, the number of alternative paths will increase exponentially. Figure 6 shows the simulation results and error analysis with different values of N p .
In Figure 6a, the running time T a of the personal computer increases with the increase of N p . When N p = 7 and the time steps of the search is 100, the computation time is T a = 121.7 s. Combined with the actual search process, it do not meet the requirements of the real-time search task. Figure 6b shows that if N p = 1 or 2, the average fitness value, average rewards for the search process, decreases gradually in the later stages of the search, this is because one or two time steps in advance cannot effectively avoid obstacles or continue to search within the key region. When N p = 4, UAV has enough time to make obstacle avoidance and continue to search in the key regions. However, the increase of N p will not significantly improve the rewards. In Figure 6c, when N p ≤ 4, the number of targets found out is less instead. When N p = 5 to 7, the number of targets found out are the same. Based on the above analysis, we choose N p = 5 as the best prediction steps. Figure 7 shows the search trajectories of four UAVs when N p = 5.

Scenario 2: Searching performance of CPIO
To demonstrate the effectiveness of CPIO, comparative experiments are conducted with identical initial conditions. The searching performance of CPIO is compared to the basic PIO, particle swarm optimization (PSO), and GA. In Figure 8a, The convergence speed of the CPIO algorithm is faster than the basic PIO, PSO and GA. And the standard deviations of 100 experiments are also the smallest. Figure 8b shows that when the searching time step is 100, the average number of targets found out by CPIO is 9.6, which is more than the other three algorithms. Therefore, the CPIO algorithm based on cooperation-competition mechanism is superior to the compared algorithms in the cooperative search task.

Tracking Performance of the ST Approach
Definition 6. (Tracking efficiency φ). The proportion of the key cells tracked in the total cells tracked by ST approach, which is the indicator of tracking performance.
The cells that A* algorithm passes through are modeled as the key regions, and Figure 9a shows the marked results. Using basic PIO algorithm to track these key cells, the tracking results under 3OC, 4OC, and 8OC are shown in Figure 9b. In some slit areas, φ will be reduced because the key cells cannot be tracked completely under the 3OC. The A* algorithm and Dijistra algorithm can not satisfy the 3OC in Area 1 and Area 2. To maximize the rewards, there will be some adaptive path selections, which via a few steps ahead of the turn or lag a few steps back to track the key cells. The φ for different OCs are given in Figure 10. Its abscissa represents the unit length after subdividing Figure 9a. The search efficiency of 8OC is higher than that of 3OC and 4OC. As the grid map is subdivided, The φ of 3OC and 4OC increases.

Closed Trajectory
The numerical simulation is carried out according to the DTSCS scheme designed by Section 5. Let T i,1 (k) = 100. The closed trajectory is shown in Figure 11, where the dotted line is the search path, and the solid line is the return path. Figure 12 shows the relationship between range utilization and time step for every UAV. The average range utilization of UAVs is 97% under the 3OC. If switching to the 8OC, the range utilization of every UAV will increases. X-axis Y-axis

Conclusions
According to the RC of UAV in the search process, we design a dynamic two-stage scheme to implement the closed search. Every UAV needs to take into account the time steps needed to return to the airport during searching process. When the searching time and the returning time meet the Equations (23) and (24), UAVs can return back to the starting airports. To improve the target search efficiency, the CPIO with cooperation-competition mechanism is proposed and applied to the search stage problem. The simulation results show that N p = 5 is the best prediction time steps. The CPIO algorithm outperforms the compared algorithms regarding the number of targets found out and the convergence speed. In the return stage, the ST approach is presented to ensure UAVs safely return to the their starting airport. Closed target searching simulations demonstrate the effectiveness and efficiency of the proposed algorithm. In the future, we will explore the cooperative search strategy and path planning in dynamic 3D environments.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: UAV unmanned aerial vehicle MUCS multi-UAV cooperative search DTSCS dynamic two-stage closed search ST search tracking RC range constraint OC orientation constraint N p SAP N p -step-ahead prediction TEP target existence probability EU environmental uncertainty PIO pigeon-inspired optimisation CPIO coevolution pigeon-inspired optimization 3D three-dimensional