1. Introduction
In the wide-area, complex and changeable environment, multiple unmanned aerial vehicles (UAVs) cooperative control is a critical research field [
1,
2,
3,
4] of unmanned system. In particular, multi-UAV cooperative search (MUCS) can be applied to prevent forest fires, patrol the border and probe potential safety hazards in cities and other aspects. Therefore, the study of MUCS strategy has practical significance. In [
5], Sujit uses 
k-shortest path algorithm to search the target in an uncertain environment, but this algorithm cannot adapt to the situation that the edge weights of the search cell changes with the number of UAV passes. In [
6], Bertuccelli models the uncertainty in the environment as the prior probabilities in the region and uses the Beta distribution to predict the minimum number of times required by UAV to search the target. This method, however, only considers a single UAV. Riehl transforms MUCS into a finite sequence of updates to a dynamic graph via sampling region with a high probability of target existence in [
7], but ignores the orientation constraint (OC) of UAV. In [
8], Tian maps the search rewards of the target to fitness function and proposes a coevolution genetic algorithm to search the random target under the OC of UAV. In [
9], Hu designs a multi-agent mapping fusion scheme based on distributed control, which converges the individual search probability of agent to the whole searching region. Nevertheless, this method does not take into account the impacts of environmental conditions.
Currently, almost all achievements on MUCS have mentioned that the range limits the application of UAV. To our knowledge, however, there is no study of the search strategies of MUCS under the range constraints (RC). The motivation of this paper is to present a new search strategy considering the range constraint (RC) and the OC. The search path of UAV under RC should be a closed trajectory. UAVs set out from the airports and need to return to their respective airports after completing the search task. The search algorithm based on an intelligent algorithm has been proved to be superior to the traditional optimization algorithm in [
10,
11,
12]. Intelligent algorithms start the optimization with a series of possible solutions, and their performances depend primarily on the parameter initializations. Therefore, the constant parameters are not entirely consistent with the evolutionary spirit of the algorithm itself [
13,
14,
15]. The results of optimization may be premature, divergent or locally optimal. Duan presents the pigeon-inspired optimization (PIO) algorithm based on a dynamic size of solution agents in [
16]. Map and compass operator and landmark operator are used respectively by the distance of the pigeons from the target. Although there are only two switching operators, the optimization ability of PIO has been verified to be better than other heuristic algorithms in many applications [
17,
18,
19].
The grid map can accurately describe the size, location, and shape of obstacles. Eight non-obstacle neighbor cells are reachable nodes for every cell. Its center is the allowed waypoint [
20,
21]. The lowest-cost path in grid model corresponding to finding a suitable sequence of cells to move the UAV from the original cell to a goal cell such that its total accumulated expenditure is minimized. The least-cost path solved by Dijkstra algorithm [
22] or A* algorithm [
23] does not take into account the OC of the UAV. It is assumed that the cells sequence of the lowest 
 without the OC is 
, in which 
 is the cell that 
 contains. After adding the OC, if an element of 
 is not in the feasible region, 
 will become a illegal path. Therefore, A* algorithm and other algorithms cannot find the lowest-cost path under the OC.
Our contributions: (i) We apply the concept of the closed search to MUCS for the first time, and design a dynamic two-stage closed search (DTSCS) scheme to realize the closed path search of UAV. The first stage is the search stage, and the second stage is the return stage. The pose, position and orientation, at the end of the search stage is the starting pose of the return stage. (ii) Inspired by the cooperation-competition relationship between subgroups within a population in nature, a CPIO algorithm based on the cooperation-competition mechanism is used as the search algorithm for MUCS in the first stage. Every subgroup pigeons is abstracted as one UAV. The cooperation-competition relationship between pigeons reflects the cooperative relationship between UAVs. (iii) In the second stage, a search tracking approach (ST) is proposed. The cells containing in the lowest-cost path without the OC are modeled as the key regions in the region searching. The basic PIO algorithm is used to obtain the lowest-cost path from the starting point to the goal point which satisfies the OC. Maximizing search rewards is equivalent to minimizing tracking errors.
The rest of the paper is organized as follows. 
Section 2 introduces the concept of the closed path and the basics of region searching. In 
Section 3, A CPIO algorithm based on the cooperation-competition mechanism is proposed for MUCS. A ST approach is given to obtain the lowest-cost path with the OC in 
Section 4. 
Section 5 describes the DTSCS scheme. Numerical simulations and analysis are drawn in 
Section 6. Concluding remarks are given in the final section.
  2. Problem Formulation
The searching region is usually divided into grid cells. The target existence probability (TEP) represents the probability of target existence in a cell. Environmental uncertainty (EU) indicates the degree to which UAVs do not know about the environment. These two variables are regarded as prior information. Every cell is modeled as the key or non-key region, and then the search probability graph of the whole searching region is formed. UAVs accomplish the search task via environment perception and information interaction. This section includes three parts: the environment model, UAV kinematic model and the analysis of UAV closed search.
  2.1. Region Searching Model
This subsection describes the basic mathematical model of region searching. Firstly, rasterizing the searching region, and then the environment information is modeled as the prior probability information. The TEP in the whole search environment is updated by Bayesian rule. And the environmental uncertainty of every cell is updated according to search number of UAVs.
  2.1.1. Environment Modeling
UAVs are assumed to fly on a fixed plane above the searching region. 
M UAVs search 
 targets. The searching region 
R is uniformly divided into 
 cells:
          where 
 is the coordinate of cell 
. The side length of a cell is the unit length. Discretizing searching time of 
, 
 can search a cell in one time step [
24,
25]. The real-time position of 
 can be described by the geometric center of the cell it searches.
  2.1.2. Probability Map Update
It is known that the TEP 
 and the EU 
. If 
 and 
, the cell (m, n) is modeled as a known region. If 
 and 
, the cell (m, n) is modeled as a non-key region. And if 
 and 
, the cell (m, n) is modeled as a key region. Dangerous areas, such as hostile radar detection regions, peaks, etc., are designated as no-fly zones. Bayesian rule is used to update whether a cell exists a target [
26]. The update equations for the probability that a target exists but is not detected and that the target does not exist but is reported are written as follows:
          where 
 is the detection probability of the airborne sensor to the target [
27]. 
 is the false alarm rate, which indicates the probability that there is no target in the cell but a target is reported. 
 decreases with the search number of UAVs, satisfying the following equation:
          where 
 is the number of times that a cell (m, n) searched by UAVs.
  2.2. UAV Kinematic Model
The kinematic model of 
 with constant velocity and OC is obtained as follows:
        where 
 is the position of 
, 
 is its velocity. 
 is course angle. The fourth term sets the constraint on the angle rate, which cannot exceed 
. In the grid map, 
 can only move to one of its eight neighbor cells at a time step, corresponding to the heading 
 [
28].
Definition 1. (D-orientation constraint, DOC). In the grid map, The degree of freedom of the heading in which the UAV is allowed to walk in the next time step is D.
 A slow-moving robot usually satisfies 8OC (equivalent to no constraint ) or 4OC , as shown in 
Figure 1a,b. The fast UAV can not turn backward or turn vertically during flying. Therefore, it is subject to 3OC: go straight (
), turn left (
), and turn right (
), which can be seen in 
Figure 1c. The orientation of the next time step of 
 can be summarized as follows:
  2.3. Preliminary Analysis
In the research of MUCS, we always discuss the cooperative search strategy under the assumption that there is enough fuel, which is not consistent with the actual situation. Every UAV sets out from the airport and returns to its airport after completing the search task. The flight path is a closed track. We divide the closed search into two stages: search stage and return stage.
  2.3.1. Closed Path Search
Definition 2. (Closed path). According to the definition of the loop in the directed graph, a closed path is a flight trajectory of a UAV starting form an airport to carry out task and returning back to the same airport.
 The closed search in this paper is different from the traveling salesman problem, which to find a solution that traverses all the reachable nodes from the starting point and returns to the starting point with the least travel cost. In this paper, the closed search refers to the closed path that UAV returns to the airport after completing the search task in the grid map.
Rasterizing searing region is consistent with discretizing searching time, so the RC of UAV is equated with its searching time limit. The closed path of the UAV under the RC should be composed of two stages: (1) search stage; (2) return stage. Long searching time means that UAV has a limited time to return and may not even return on time. And the short searching time means that the UAV can not effectively complete the search task. Ideally, the UAV would return to the airport at the time step when it is running out of fuel.
  2.3.2. Search Stage
UAVs accomplish the search task via environment perception and information interaction. The principles of MUCS should include the following:
- Get the maximum TEP 
- Acquire the maximum reduction of EU 
- Avoid UAVs repeatedly searching the same cell 
- Prohibit collisions between UAVs 
- Fly as far away from the airport as possible to search more unknown cells 
- Avoid searching for locally highly remunerated sequences of cells 
  2.3.3. Return Stage
Definition 3. (The lowest path with D-orientation constraint). The solution that the UAV seeks from origin to the goal cell with the smallest travel cost under the OC.
 The search strategy is no longer considered in the return stage. Our goal is to find a safe and collision-free lowest path under the OC. This path starts with the current cell and orientation and ends at the airport.
For the closed search, the primary purpose of this paper is to solve the following three problems: (1) Set the time step for UAV to return to the airport; (2) Maximize search rewards in the search stage; (3) The purpose of the return phase is to obtain a shortest path that satisfies the OC.
  3. CPIO Algorithm with Cooperation-Competition Mechanism
According to the principle of cooperative search discussed above, we design the reward function of MUCS and regard it as the fitness function of PIO. In [
8,
29,
30] coevolution genetic algorithm (GA) is applied to multi-robot platform to find the optimal solution through the cooperation among robots. In some conditions, the optimal solutions of every robots may sometimes be mutually exclusive. This conflict is consistent with the competition of subgroups in nature. Inspired by the cooperation-competition relationship between subgroups within a biological community in nature, a CPIO algorithm is used as the search heuristics for MUCS in the search stage. The optimal solution obtained via the CPIO algorithm is the position of a target, which is consistent with the attribute of region searching.
  3.1. Design Reward Function
The MUCS process is modeled as a multiobjective nonlinear programming function with RC, OC, no-fly zone and collision free and the model is given in Equations (
7) and (
8):
        where 
 are weighting factors, which satisfy 
 and 
. 
 is the searchable region in 
R. 
 and 
 are the searching time and the range of 
 respectively. 
 denotes the Euclidean distance of 
 and 
. 
 is a positive integer. The first and second terms of the 
 are designed to maximize search rewards, which corresponding to the first and second terms of the principles. The reduction of EU of the cell 
 can be defined as follows:
The third of the 
 is consistent with the fourth of the principles and is designed to avoid collision between UAVs:
        where 
 is the minimum element of the 
 at time step 
k, and 
 is given in  
Appendix A Equation (
A1). When the distance between any two UAVs is less than or equal to 
 unit lengths, the reward of this term will be −100. And then the rewards of 
 will also be negative. 
 can only choose to search cells away from other UAVs. With regard to the fourth term of the 
, it implies that the further away 
 is from the airport, the more rewards it obtains (principle 5):
        where 
 is the sum of all the elements in 
, which is written in 
Appendix A Equation (
A2).
Definition 4. (-step-ahead prediction, SAP). Learning from the idea of rolling optimization in model predictive control theory [31]. In each term of  designed in Equation (7), the rewards of the reachable cells to be calculated is not only the next time step, but also the next  time steps.  Remark 1. Although n steps are predicted in advance each time,  flies forward only one-time step.
 The series of reachable waypoints of 
SAP of 
 maps the predicted cells sequence 
 from the step 
 to the future time step 
, which is based on the present position 
 at time 
k. The reachable cells included in 3SAP construct an expanding tree, as shown in 
Figure 2a. Obviously, the expanding tree generated by 
SAP contains 
 alternative paths and the 
l-th path can be illustrated as:
        where 
.
The positive aspects of SAP include:
- Avoid greedy thought corresponding to the sixth term of principles referred in  Section 2.3.2
- Circumvent the no-fly zone ahead of time and continue to search the key region 
- Prohibit collision between UAVs 
- Reduce the impact of communication delays 
As given in 
Figure 2b, Comparing the reward of 
 and 
, if 
, the 
 is chosen at this time step. Else if 
, the 
 will be chosen. For an extreme situation, if 
, the path with the largest reward in 
 must be the optimal path in the whole search process. As 
 grows, the number of paths increases exponentially. Therefore, we use PIO algorithm to find the optimal solution that is difficult for conventional optimization algorithms.
  3.2. Overview of Basic PIO
Inspired by natural phenomenon of the autonomous homing behavior of pigeon swarms, Duan proposes the PIO algorithm in [
14], which the optimization process can be divided into two operators based on the distance of the pigeons from the destination.
Operator 1: Map and compass operator
During the prometaphase of the search, pigeons is far away from the goal. The real-time information of the sun is abstracted into map and compass operator to adjust the flight orientation, which is a rough navigation process. Suppose the number of pigeons is 
. The position and velocity of 
 in the two-dimensional (2D) plane is expressed as:
The renewal equations of position and velocity are:
        where 
 is the current iteration number. 
 is the coefficient of the map and compass operator. 
 is a random number from 0 to 1. 
 denotes the position of the pigeon closest to the goal in the pigeons in iteration 
. After 
 iterations, the rough navigation stage is completed. PIO algorithm enters the second stage of optimization.
Operator 2: Landmark operator
When the pigeons arrive near the target, the PIO algorithm switches to the landmark operator. The landmark information of the nearby environment will provide precise guidance information. The speed of the pigeon does not change at this stage, while the position is updated:
        where 
 is the current iteration number. 
 is the number of pigeons in 
v iteration. 
 denotes the position of the central pigeon in 
 iteration. 
 is the fitness function. The optimal solution will be obtained after Operator 2 is performed 
 iterations.
  3.3. CPIO and Cooperative Search
In biology, a population may divide into some subgroups [
32,
33]. In the face of natural enemies, the subgroups will cooperate to resist. Nevertheless, they also compete with each other in the interests of food, mating, territory, and so on. Cooperation and competition make the population survive and evolve better. To simulate these natural behaviors, UAVs work together to complete the search task via information interaction, in the mean time UAVs compete with each other to search the specific vital cells. We propose a CPIO algorithm based on the cooperation-competition mechanism as the search algorithm for MUCS, as shown in 
Figure 3. In which, one subgroup pigeons is abstracted as one UAV. The cooperation-competition relationship between pigeons reflects the cooperative relationship between UAVs.
MUCS based on CPIO is composed of three parts: CPIO, UAVs, and environment model. The principle of which is shown in 
Figure 4. In the CPIO module, the initial information of the environment, the environment information detected by the UAVs and the real-time pose signals of the UAVs are modeled as prior information. This information is then transmitted to the reward function 
. 
 transforms the optimal solution obtained by the cooperative-competitive mechanism into discrete flight signals. And these signals will be transmitted to 
 at each time step. Thus, the cooperation-competition mechanism can be described as follows:
Cooperation mechanism
- Get the maximum  and the maximum  
- Stay away from airports and search for more unknown regions 
- Avoid searching locally highly remunerated cells 
Competition mechanism
- Forbid UAVs to search for the same cell at the same time step 
- Avoid UAVs repeatedly searching the same cell 
- Stay away from other UAVs to search more unknown cells 
Under all constraints, the subgroup that maximizes the total rewards will win. Cooperation and competition are not entirely opposed to each other. The winning subgroup will search the controversial cell, and the other subgroups compete to search other cells. The result of the competition mechanism is consistent with the original intention of the cooperation mechanism.
Let 
, thus Equation (
15) can be transformed into:
In the searching region, 
 and 
 are neighbor cells of 
 and 
, respectively. Under the constraint of Equation (
6), the position (orientation) in the PIO algorithm can be expressed as:
The MUCS process based on the CPIO algorithm can be expressed as Algorithm 1.
| Algorithm 1 MUCS based on CPIO algorithm | 
| Input: Initializing environmental information and pose information of . | 
| Output: The searching trajectory of every UAV and its searching time steps . | 
| 1: begin: | 
| 2:      for  do | 
| 3:          Predict N time steps in advance, and get the alternative paths. | 
| 4:          Delete the paths that does not satisfy Equation (8). | 
| 5:          Use the CPIO algorithm to find the path with the highest fitness value among the remaining paths. | 
| 6:          while  receives the return order,  returns to the airport. | 
| 7:          Record the searching trajectory and searching time steps  when  returns. | 
| 8:          end while | 
| 9:      end for | 
| 10: end | 
  4. ST Approach
Inspired by the knowledge of cooperative search and trajectory tracking [
34,
35], we propose the ST approach. The lowest 
 without the OC of motion is mapped to a trajectory to be followed, and the cells it passes through are modeled as the key regions. Other blank cells are shaped as non-key regions, and obstacles are designed as no-fly zones. The input signal is the orientation sequence to maximize the search rewards of 
. The process of tracking 
 is the same as that of a UAV searching from the starting cell to the goal cell. Since the ST approach is only applied to a single UAV, there is no need to consider the cooperation-competition relationship between UAVs. ST approach based on basic PIO algorithm consists of four operators.
Operator 1: Obtain 
We use A* algorithm [
23] to find 
, and A* algorithm can be simplified as the following steps:
- Step 1: Specify the OC for the UAV. 
- Step 2: Design a cost function , where  denotes the movement cost: corresponds to the expenditure of moving the current position  to other cell moved into the neighbor.  denotes the heuristic cost: corresponds to the expenditure of changing from current cell to goal cell. When , A* algorithm degenerates to Dijkstra algorithm. When , A * algorithm degenerates to greedy best first search algorithm. 
- Step 3: Estimate total expenditure  and change to the cell with the least cost. 
- Step 4: Repeat Step 3 until the goal cell is reached. 
- Step 5: When reaching the goal, choose the final path with least cost. 
Operator 2: Mark key cells
Referring to the concept of the TEP in the region searching, the blank cells are set as the non-key regions, which implies 
. The cells that the path of 
 passes through are marked as the key regions, and their 
 are denoted as follows:
      where 
 is the serial number of the mark points from the starting cell to the goal cell in 
. 
 stands for goal point. When the key cell is searched for 
 time, the existence probability of the goal cell is set to be 1, and the probability of the other key cell is 
. When 
, the probability of the corresponding key cell becomes 0.
Remark 2. As shown in the fourth term of Equation (20),  will only track . The key cells marked for other UAVs are non-key cells for . This ensures that the tracking process of the UAVs does not affect each other.  Operator 3: Design fitness function
The reward function of the ST approach relates only to the TEP for every cell in Operation 2:
The original cell of  is the starting position of the ST approach. And its initial flight orientation is pointed from the starting position to the second cell of .
Operator 4: Track 
The process of tracking is similar to that of Algorithm 1 and can be described as the following steps:
- Step 1: Initialize the information of the region to be searched and the pose of the UAVs. 
- Step 2:  Execute SAP, and then get the predicted state sequence . 
- Step 3:  Let , the path  corresponding to the sequence with the greatest fitness in  solved by basic PIO. 
- Step 4:   flies forward for a time step. 
- Step 5:  Repeat Step 2 to Step 4 until the goal cell is searched. 
- Step 6:  When reaching the goal, choose the final path with least cost and OC, as well as the total time steps  from the starting point to the goal point. 
In light of the knowledge of graph theory, the key cells included by 
 is the only directed connection sequence from the starting position to the goal cell, which is equivalent to a directed path. Through the four operations described above, we can obtain a lowest path that satisfies 3OC. ST approach as opposed to path following or trajectory tracking. The errors of the latter two are the absolute errors between the actual tracking trajectory and the ideal trajectory. ST approach chooses the path with the highest reward in all alternative paths, and the resulting tracking errors are the relative errors. Minimizing tracking errors can be converted to maximizing search rewards and it can be represented as:
      where 
 denotes the path with the greatest fitness in 
 at time step 
k. 
 is the coordinate of the key cell to be tracked. 
 is the coordinate of the cells actually tracked.
If , the tracking process with the highest search rewards must be error-free tracking. Algorithm 2 demonstrates the implementation procedure of the ST approach.
Remark 3. (1) ST approach is not only suitable for A* algorithm. It is effective for other algorithms. (2) In addition to the 3OC mentioned in this section, the path can also be tracked under 4OC and 8OC. ST approach is suitable for different types of robots or different environments. (3) This approach can also track the 3D path or the path under the dynamic environment.
 | Algorithm 2 ST approach | 
| Input: The lowest  without the OC. | 
| Output: The lowest  with the OC, the tracking time steps . | 
| 1: begin: | 
| 2:      Use the A* algorithm to get . | 
| 2:      Mark the key cells contained in . | 
| 3:      for | 
| 4:          for  do | 
| 5:              Assign different  to the cells of whole map in term of Equation (23). | 
| 6:          end for | 
| 7:      end for | 
| 8:      Let . | 
| 9:      for (unknow) | 
| 10:        Use basic PIO algorithm to track  under OC. | 
| 11:        while the goal cell (airport) is reached. | 
| 12:            Record the return  and . | 
| 13:        end while | 
| 14:     end for | 
| 15: end | 
  5. DTSCS Scheme for MUCS
According to the analysis and requirements of 
Section 2.3, we design the DTSCS scheme for closed search. In the first stage, the CPIO algorithm proposed in the 
Section 3 is used as the search algorithm for MUCS. In the return stage, the ST approach proposed in the former section is used to obtain the shortest path with OC to return to the airport. The two stages are coupled in time. The pose of UAV at the end of the search stage is the starting pose for the return stage, which in turn determine the time steps needed for UAV to return to the starting airport.
In the search stage, 
 needs to calculate its remaining distance and the time steps needed to return to the airport in real time. Assume the RC of 
 is 
. 
 represents the total time steps of the search stage. 
 is the total time steps of the return stage. Assume that 
 has searched for 
k time steps and 
 can still safely return to the airport. Before executing the next-step search, 
 need first to calculate the position to be reached at the next step and how long it will take from the current cell returning back to the airport. If at the next step, the calculated fuel can still guarantee 
 safely returning back to the airport, then the 
 search forward a time step. Otherwise, the 
 stop searching and return directly to the airport. Therefore, the time for 
 returning to the airport should satisfy the following relationships:
      where Equation (
23) means that the sum of 
 and 
 at any time step cannot exceed the RC. Equation (
24) specifies the condition for the return of the 
. The DTSCS scheme can not only ensure that UAV gets the maximum search rewards, but also maximize the range. Besides, it is not limited to the fixed range 
. It can be changed during the task. We can specify the searching time for the first stage, or we can issue a return order at any time during the search. All of these scenarios are possible cases in the application of MUCS.
      
Remark 4. Under return conditions, UAV may have surplus fuel after returning to the airport, which is due to the OC of itself.
 Definition 5. (Range utilization). The proportion of time-of-flight of UAV in  is defined as:
 Set the RC for the 
 to be 100 steps. Taking an example shown in 
Figure 5, the remaining range at this time step is assumed to be six steps. If 
 continues to search, it will take at least nine steps, as shown in black arrow, for 
 to return to the airport at the next step. According to Equations (
23) and (
24), 
 should return to the airport at this time step, as shown in red arrow. In this case, 
. Algorithm 3 describes the detailed flows of the DTSCS scheme.
| Algorithm 3 DTSCS scheme | 
| Input: Initializing environmental information, pose information of , and . | 
| Output:, , closed trajectory and . | 
| 1: begin: | 
| 2:      for  do | 
| 3:          if  and | 
| 4:             do the search stage (Algorithm 1). | 
| 5:          else if  and | 
| 6:             do the return stage (Algorithm 2). | 
| 7:          while the airport is reached. | 
| 8:          end while | 
| 9:      end for | 
| 10: end |