Multi-Agent Cooperative Target Search

This paper addresses a vision-based cooperative search for multiple mobile ground targets by a group of unmanned aerial vehicles (UAVs) with limited sensing and communication capabilities. The airborne camera on each UAV has a limited field of view and its target discriminability varies as a function of altitude. First, by dividing the whole surveillance region into cells, a probability map can be formed for each UAV indicating the probability of target existence within each cell. Then, we propose a distributed probability map updating model which includes the fusion of measurement information, information sharing among neighboring agents, information decay and transmission due to environmental changes such as the target movement. Furthermore, we formulate the target search problem as a multi-agent cooperative coverage control problem by optimizing the collective coverage area and the detection performance. The proposed map updating model and the cooperative control scheme are distributed, i.e., assuming that each agent only communicates with its neighbors within its communication range. Finally, the effectiveness of the proposed algorithms is illustrated by simulation.


Introduction
With the fast development of high resolution imaging devices and processing technologies, unmanned aerial vehicles (UAVs) with air-borne cameras are increasingly employed in civil and military applications such as environmental monitoring, battlefield surveillance and map building, where ground-target search is one of the major applications [1,2]. Target tracking and search have been one of the most popular utilizations of UAVs [3,4]. The conventional method for target search by UAVs in a closed region divides the whole surveillance region into cells, and associates each cell with a probability or confidence of target existence in the cell which constitutes a probability map for the whole region [5,6].
In [7], an online planning and control method is proposed for cooperative search by a group of UAVs, where each agent keeps an individual probability map for the whole region updated according to the Dempster-Shafer theory. A path planning algorithm is designed by using the obtained measurement information, which requires each agent to directly communicate with all other agents. In [8], target detection is considered as part of an integrated mission including coverage control and data collection as parallel tasks for multi-agent networks. The coverage control method aims to maximize the joint detection probability of random events and the probability of target existence is updated by measurements based on the Bayesian rule. However, only the measurement information of direct neighbors is exchanged, which makes it difficult to obtain the target information of the whole surveillance region. In [9], a decentralized gradient-based control strategy is proposed for multiple autonomous mobile sensor agents searching for targets of interest by minimizing the joint team probability of no detection within action horizon based on range detection sensing model. However, each agent is required to collect detection information from all other agents. In [10], a decentralized search algorithm is developed which includes a two-step updating procedure for the probability maps. Each agent first obtains observations over the cells within its sensing region and updates its individual probability map by the Bayesian rule. Then, each agent transmits its individual probability map to its neighbors for map fusion. This algorithm is distributed and full network connectivity is not required. However, the lack of information correlation makes the map fusion difficult and only a heuristic fusion method is given in [10], the performance of which has not been analyzed. In our recent work [11], a distributed iterative map updating model is proposed to fuse the information from measurements and the maps of neighbors based on a logarithmic transformation of the Bayesian rule. Through this, the nonlinear Bayesian update is replaced by a linear one which simplifies the computation. The convergence speed of individual probability map of an agent is also analyzed under fixed detection and false alarm probabilities for the search of static targets.
The cooperative control is an important task for efficient target search by a group of UAVs. Compared with the centralized control algorithms, distributed control algorithms are more robust to accidental failures of UAVs and breaks of communication links [12]. In [13], a distributed multi-agent coverage control method is proposed based on a given sensing performance function related to the distance to robots and gradient descent algorithms are designed for a class of utility functions to optimize the coverage and sensing performance. In [14], a distributed, adaptive control law is developed to achieve an optimal sensing configuration for a network of mobile robots which obtain sensory information of a static environment and exchange their estimates of the environment with neighbors. In [15], a three dimensional distributed control strategy is proposed to deploy hovering robots with downward facing cameras to collectively monitor an environment. A new optimization criterion is defined as the information obtained by each pixel of a camera. In [16], a dynamic awareness model is proposed to control a multi-vehicle sensor network with intermittent communications. The state of awareness of each individual vehicle is updated by its own sensing model and sharing information with its neighbors. However, none of the coverage control schemes mentioned above has considered the detection results of target existence which may affect UAVs' movement decisions in target search. Moreover, there are very few works addressing the issue of distributed vision-based cooperative search for multiple mobile targets with probabilistic detections.
In this paper, we investigate the vision-based cooperative search for multiple ground mobile targets by a group of UAVs with limited sensing and communication capabilities. The main contribution of this paper is that a distributed strategy of information fusion and cooperative control is proposed for searching multiple mobile targets using multi-agent networks based on probabilistic detections. In addition, the time-varying detection and false alarm probabilities are considered which are due to the varying altitudes of the agents with 3-dimensional dynamics. Each agent under our search strategy shares local target information and controls its own behavior in a distributed manner. Based on the probability map updating model proposed in [11], we generalize the model by considering the information decay and transmission between cells due to environmental changes such as the target movement. The influence of the time-varying detection probability on the update of probability maps due to the three-dimensional UAV dynamics is also analyzed. Then, a coverage optimization problem is formulated to balance the coverage area and detection performance. The proposed map updating model and cooperative control scheme are distributed, i.e., each agent only communicates with the agents within its communication range. This paper is organized as follows: Section 2 describes the basic notations and assumptions used in this paper. Section 3 presents the probability map updates by measurements and information sharing with time-varying detection probabilities. In Section 4, a three-dimensional coverage control method is presented for target search. Simulation results are shown in Section 5, and the conclusions are drawn in Section 6.

Basic Definitions and Assumptions
The surveillance region O ∈ R 2 is assumed to be on a plane ground and has been uniformly divided into a set of cells of the same size. We assume that all UAVs (or agents) use the same global Cartesian coordinate system and the position of each agent is denoted as µ i,k = c T i,k , h i,k T ∈ R 3 for agent i (i = 1, 2, · · · , N ) at time k (as shown in Figure 1a), where c i,k ∈ R 2 is the planar coordinate of its projection on O, h i,k ∈ R is the altitude of the agent above O, N is the total number of agents and "T" denotes the transpose operation. Each agent is assumed to have access to its own position at any time.
Each cell in the surveillance region is associated with a probability or confidence of target existence within the cell which is modeled using the Bernoulli distribution, i.e., θ g,k = 1 (a target is present) with probability P i (θ g,k = 1) and θ g,k = 0 (no target is present) with probability 1 − P i (θ g = 1) for agent i and cell g at time k, where g ∈ R 2 is the location of the cell center in O. If more than one target are present within a cell, they are treated as one single target. In this paper, we mainly discuss about the vision-based detections where each agent carries an airborne camera facing downward to surveillance region (as shown in Figure 1a). Each agent independently takes measurements Z i,g,k over the cells within its sensing region C i,k at time k, where and • denotes the 2-norm for vectors. Each agent is assumed to have the same angle of field of view, half of which is denoted by ϕ. We also assume that the size of each cell is sufficiently small comparing with the size of C i,k so that we can ignore the boundary effect and roughly consider a cell as wholly within C i,k if its center is within C i,k . Only two observation results are defined for each cell, Z i,g,k = 0 or Z i,g,k = 1. For all cells, P (Z i,g,k = 1|θ g,k = 1) = p i,k and P (Z i,g,k = 1|θ g,k = 0) = q i,k are assumed to be known by agent i as the detection probability and false alarm probability respectively. The topology of the network of all agents at time k is modeled by an undirected graph G k = (E k , V). V = {1, 2, . . . , N } is the vertex set and E k = {{i, j} : i, j ∈ V; µ i,k − µ j,k R c } is the edge set, where each edge {i, j} is an unordered pair of distinct agents and R c is the communication range of each agent. The graph or the network is connected if for any two vertices i and j there exists a sequence of edges (a path) {i, ν 1 }, {ν 1 , ν 2 }, . . . , {ν n−1 , ν n }, {ν n , j} in E k . Let N i,k = {j ∈ V| {i, j} ∈ E k } ∪ {i} denote the set of neighbors of agent i at time k where an agent is assumed to be a neighbor of itself. The degree (number of neighbors) of agent i at time k is denoted as d i,k = |N i,k |.

Bayesian Update and Consensus-Based Map Fusion
In [11], we proposed a cooperative control scheme for target search in multi-agent systems. In a group of UAVs, each agent i keeps an individual probability map P i,g,k of the whole region, where P i,g,k P i (θ g,k = 1) and is updated by the Bayesain rule: P i,g,k = P (Z i,g,k |θ g,k = 1) P i,g,k−1 P (Z i,g,k |θ g,k = 1) P i,g,k−1 + P (Z i,g,k |θ g,k = 0) (1 − P i,g,k−1 ) where 0 < P i,g,0 < 1 and 1 > p i,k , q i,k > 0. For the cases with p i,k = 0 or 1 or q i,k = 0 or 1, simplified conclusions can be obtained as shown in [11] and will not be considered in this paper. By letting we get the following transformation of Equation (1): where Keeping Q i,g,k as the updated term instead of P i,g,k simplifies the nonlinear update in Equation (1) into the linear one in Equation (3). For a group of UAVs, we let each agent i at time k first take measurements and transmit the measurements to its neighbors. After receiving the measurements from all its neighbors, Q i,g,k is updated as follows: Then, each agent i transmits the updated Q i,g,k of the whole region to its neighbors for map fusion, which is given by: where Then, a matrix composed of w i,j,k can be defined as: which is a doubly stochastic matrix [17]. The communications of neighboring agents are assumed to be synchronized within a short time interval. Time synchronization in distributed networks is not the focus of this paper and has been addressed by many works [18][19][20][21].

Time-Varying Detection Probability
In [11], we only considered a 2-dimensional control scheme assuming that all agents move on a fixed plane parallel to the ground plane. However, in the real world, UAVs such as helicopters can change their altitudes according to their task requirements so as to enlarge their sensing area (here we only consider cameras with a fixed zooming level). Therefore, in this paper, we will consider the influence of 3-dimensional dynamics of UAVs on the detection performance.
For vision-based detection, the detection probability relies on the picture resolutions. Figure 1b shows the basic imaging scheme by an airborne camera similar to the one given in [15,22]. In general, a desirable property for good target recognition is a "right" ratio between the size of the image and the size of the target, where "right" depends on the target type and the detection algorithm that is employed. To be more simplified, it can be assumed that the larger the image of a target in the picture (in terms of the number of occupied pixels) obtained by the UAV, the easier for the UAV to discriminate the target no matter what recognition method is used. Hence, we can model the target discriminability of a UAV as a function ρ proportional to the ratio between the size of a target image taken by the camera denoted by S TI and the size of one pixel denoted by S P , i.e., ρ ∝ S TI S P . Here, we assume that all targets are of the same visual properties such as color, shape and size that are influential on target discriminability. It is also assumed that each camera has a fixed focal length so that we can only consider the change of ρ due to the variation of agent altitude. Then, by denoting the size of the projection of a target on the ground plane as S T , we can derive that: where h is the altitude of the UAV and b is the fixed distance between the image and the lens (as shown in Figure 1b). In a multi-agent system, for the i-th agent at time k, we have However, in reality, ρ i,k cannot be infinitely large and there should be an upper limit when h i,k is smaller than a threshold h. That is to say, the target discrimination ability will not be improved any more if a UAV is descending very close to the ground.
The target discriminability determines the detection probability when a UAV is detecting the existence of targets within each cell under surveillance. It is natural to conceive that the detection probability p i,k increases and the false alarm probability q i,k decreases as ρ i,k increases. When the altitude of the UAV becomes larger than a threshold h, it runs out of its ability to discriminate any target from the background environment, which means that the detection result dose not rely on the true existence of the target any more.
should be a monotonically increasing function of ρ i,k , or more explicitly, a monotonically decreasing function of h i,k . Therefore, we assume the following detection probability model: Similarly, we can assume the false alarm probability model as a monotonically increasing function of h i,k : In this paper, the altitude h i,k of an agent is allowed to vary from 0 to ∞ for theoretic analysis, though it may not happen in the real world due to system limitations. (9) is motivated by the natural understanding of the interaction between the altitude of an agent and its detection and false alarm probabilities. It only reflects the general relation between those parameters, and is not restricted to a specific parametric representation of f 1 and f 2 . Hence, our method is applicable for any detection probability function that fits for the model. An experimental detection probability model of CCD camera has been given in [22].

Remark 1. Model
Remark 2. p i,k and q i,k can also be cell-dependent, i.e., they may vary from place to place due to environment conditions. For example, the target is often easier to be discriminated on an open ground than on a land with trees. In complex environments, agents must know the detection probability and false alarm probability models of different type of regions. For the ease of expression, we assume the models to be constant across the whole surveillance region.
Denoting by m i,g,k the number of observations taken over cell g up to time k by agent i and defining m g,k = N i=1 m i,g,k , we can get the following conclusions for the update of Q i,g,k .
Theorem 1. Given the initial prior probability map 0 < P i,g,0 < 1 ∀i ∈ V, if there exists a constant ε > 0 such that p i,k 0.5 + ε and q i,k 0.5 − ε ∀i ∈ V, and the network topology G k is connected at all times, the following conclusions hold by implementing the map updating rule (5) and (6).
(2) If no target is present within cell g, Q i,g,k a.s.
Proof. See Appendix A.

Environment-Based Probability Map
In the map updates (5) and (6), the effect of the environmental changes such as the information decay and transmission between cells has not been considered. For example, if targets may randomly appear or disappear during search, the historical information about the target existence cannot reflect the true current situation and revisits of certain frequency to the detected cells are needed for information update. This problem can be formulated as the information decay for each cell. If a target may move from one cell to another, then part of the information for the former cell should be removed and counted as the new information for the latter cell. This problem can be formulated as the information transmission between each two cells. Therefore, we need to generalize the aforementioned map updating model to be applicable to the case with such environmental changes. Similar to the assumption made in [16], we assume that Q i,g,k decays exponentially for each cell if there is no prior knowledge and/or no measurement information. The information transmission between cells due to target movement is modeled based on the transition of probabilities. In addition, the prior knowledge about the environmental change is taken as the system input. All these lead to the following generalized updating model for Q i,g,k : where α 0 is the information decaying factor, T is the sampling period of all UAVs, a i,g,r,k and b i,g,r,k are the information transmission factors which are nonnegative, and ξ i,g,k is the input information vector given by the prior knowledge about the target existence within cell g. Specifically, b i,g,r,k satisfies b i,g,g,k = 1 and b i,g,r,k = 0 (g = r) for Q i,r,k−1 > 0, and b i,g,r,k = P θ g,k = 1 θ r,k−1 = 1 for Q i,r,k−1 0. a i,g,r,k is determined by a i,g,r i ,k = 1 and a i,g,r,k = 0 (r =r i ), wherê Remark 3. a i,g,r,k and b i,g,r,k are defined based on the physical meaning of information transmission due to the target movement in the real world. Since the combination of Q i,r,k (r ∈ O) into a cell g involves the combination of historical measurement information of all cells r ∈ O, the correlation of which may not be known, we need to be careful in dealing with the fusion of such information. If Q i,g,k > 0, we are more confident that no target exists within cell g. Otherwise, we are more confident that a target exists within cell g. Since an information transmission out of a cell at time k is expected to occur only when a target exists within the cell at time k − 1, we let b i,g,r,k = 0 if Q i,r,k−1 > 0, which means there is no transmission of information (or target movement) from cell r to cell g. If Q i,r,k−1 0, an information transmission occurs from cell r to cell g due to the possible target movement from r to g and the amount of information transmitted should be proportional to P θ g,k = 1 θ r,k−1 = 1 , i.e., equal to b i,g,r,k Q i,r,k−1 . The smaller b i,g,r,k is, the less amount of information is retained for cell g. Furthermore, by assuming that within one cell there can only exist up to one target at a time, i.e., at most one target can move into a cell at a time, we select the information stream with the largest transmitted amount as the newly stored information for cell g when there are incoming information streams from multiple cells r ∈ B g,k . That is, to take the smallest b i,g,r,k Q i,r,k−1 subject to Q i,r,k−1 0 as the newly stored information after the transmission, which corresponds to the most probable target movement to g in all possible movements to g from different cells. The information decaying factor α is set to be positive in the case that the prior knowledge of b i,g,r,k is not accurate or targets may appear and disappear unpredictably during the search. In this case, the information decay makes the agents revisit the detected regions at a certain frequency. As for the input information vector ξ, it only denotes the effect brought by the prior knowledge and there is no need to calculate it out in real implementations, because any prior knowledge on the target existence can be directly used to update the probabilities of target existence and thus update directly following its definition in Equation (2).
Here we give a simple example to illustrate how the parameters are designed if the true target dynamic model is give by x k+1 = Ψx k where x k is a vector including the target location. In this case, one can calculate the transition probability P (θ g,k+1 = 1|θ r,k = 1) for any two cells r and g where θ r,k = 1 represents that the target locates within cell r at time k. Following this, given the current accumulated information on target existence Q i,r,k of agent i for cell r at time k, one can calculate b i,g,r,k following its definition. Further, with the results of b i,g,r,k for any two cells r and g, one can calculate a i,g,r,k according to its definition in Equation (12). If the target will not suddenly disappear/appear, the decaying factor α can be set as zero.
Define the following augmented variables: where τ and s are respectively the row and column indices of an appropriate cell in A i,k , and M is the total number of cells, we get the following generalized updating model: where ⊗ denotes the Kronecker product. According to Theorem 1, Q k can be seen as the gathered information for decision making on the target existence and the larger the Q k , the less the uncertainty about the target existence or nonexistence. Hence, our aim of controlling the UAVs is to maximize Q k in some sense, which will be discussed in the following section.

Cooperative Coverage Control
In the previous section, a distributed map updating scheme was proposed for fusion of the knowledge of multiple agents. In this section, we will design a cooperative control strategy that optimizes the trajectories of agents for target search based on their real-time updated knowledge about the target information. Within each time interval, an agent first updates its probability map by the the map updating scheme designed in Section 3.3 and then makes a control decision on which place it should move to for the next observation by collective optimization which will be addressed in this section. The two steps make the whole network form a closed-loop sensing and feedback control system.
Here we consider the waypoint motion model for each agent: where u i,k ∈ R 3 is the control input (or the waypoint displacement) of the i-th agent at time k. Note that the above motion model only deals with the waypoints of agents at discrete-time steps. The true dynamics of agents is not discussed in this paper since we do not want to limit our results on the dynamic model of any specific type of UAV. How to make the agents achieve the desired waypoints by their inner-loop flight controllers is a technical issue which will not be addressed in this paper but left to be solved in our real system experiments. Our job is to optimize the selection of the next waypoint (i.e., u i,k ) for each agent given its current waypoint (i.e., u i,k−1 ). Following Equation (13), we can get At time k − 1, G k can be seen as the prior information, and V k the information gathered from measurements. Since V k and E [V k ] are both related to the true target existence which is unknown, we cannot predict the values of Q k or E [Q k ] before taking measurements. What we can do at time k − 1 is to find the optimal next time sampling position µ i,k so as to maximize the information to be gathered at time k. More precisely, the problem can be formulated as the optimization problem: max where µ k µ T 1,k , . . . , µ T N,k T . Considering that W k includes the global topological information which is often hard to obtain for each individual agent in a distributed system, and (W k ⊗ I) V k V k , we replace Equation (15) with the following suboptimal optimization: Notice that Equation (16) is not an approximation of Equation (15), but a new cost function we intend to optimize which is an upper bound of Equation (16). Such way of defining the cost function is very common in statistics and estimation theory such as the Cramér-Rao lower bound, which is often selected as the cost function if the true variance of estimation error is time-varying and unknown.
Following Equation (16), we should try to maximize E v 2 j,g,k and the collective sensing area of all agents. From Equation (4), we get for g ∈ C j,k : It is straightforward to find that E v 2 j,g,k is monotonically increasing with respect to p j,k and monotonically decreasing with respect to q j,k no matter θ g,k = 1 or not. Thus, Equation (16) is further replaced with the following optimization problem: where φ k is a given nonnegative weighting function of r ∈ O at time k, and its influence on the control law will be shown later.
Voronoi partition. The introduction of the partition is for avoidance of collision between UAVs and ease of dealing with the overlapped sensing regions between neighboring agents, which will be discussed later. Since p i,k q i,k , H (µ k ) is always nonnegative and H (µ k ) ≡ 0 if h i,k h. Denoting by ∂ (•) the boundary of the corresponding region and n ∂(•) (r) the outward pointing normal vector of the boundary ∂ (•) at point r, we can compute the gradient of H (µ k ) as follows.
Theorem 2. The gradient of the cost function H (µ k ) with respect to µ i,k (h i,k < h) is given by for h i,k ∈ h, h , and Proof. See Appendix B.
Following Theorem 2, a gradient-based control law is given by where K u is a positive gain parameter. A larger K u may lead to faster convergence of to the sub-optimal configuration, but may also cause larger convergence error or oscillation around the settle points due to the discrete-time control. In real system implementations, users should choose the parameter by trading off the two performance indices.
Remark 4. Note that the control input is always upper bounded in real systems, i.e., u i,k u max for some positive number u max and the height of each agent is also bounded by h i,k < h to have meaningful detections. Moreover, to avoid collision when neighboring agents are at the same altitude, the motion of each agent should be constrained by c i,k+1 ∈ M i,k . Therefore, the control law (20) is modified as follows to be adapted to the constraints: where λ i,k is a scaling factor defined by Λ i,k is a buffer region enclosing the border of M i,k defined as follows: where 1 , 2 > 0 are given parameters for limiting the width of the buffer region and the height of each agent respectively.
Generally, it is favored that UAVs stay longer in the region with less gathered information to take more measurements. Thus, we define the weighting function φ i,k (g) as a function of the gathered information Q i,g,k−1 for each cell, i.e.: where K φ is a positive gain parameter. By this model, cells with less gathered information are given higher weights for detection. There is no specific rule for choosing the optimal K φ since it only denotes the user's preference on the search priority for different cells. In general, K φ is only required not to be too large or too small in order to properly scale the weights of different cells. For example, we find that K φ = 2 is one of the many suitable settings in our simulation.
Remark 5. The partition {M 1,k , . . . M N,k } can be static or time-varying. Partition is commonly used where each UAV only takes charge of one part of the whole surveillance region so that the whole searching task is shared by multiple agents. Users can predefine the task regions for each UAV or let the UAVs dynamically compute the partition following some rules. An example of the dynamic partition is the Voronoi partition which has been widely used in the distributed control [13].

Simulation Environment
We deploy multiple UAVs to search for four ground targets. The whole surveillance region is a square region of Initially, we set Q i,g,0 = 0 for all i and g within roads (i.e., P i,g,k = 0.5 for g ∈ O R ), and Q i,g,0 to a fixed large value for g outside the roads (i.e., P i,g,k ≈ 0 for g / ∈ O R ). The detection probability function and the false alarm probability function are assumed respectively where K 1 , K 2 , K 3 , K 4 are positive parameters satisfying the conditions in Equations (9) and (10). We test the proposed target search method in two scenarios. In Scenario I, all targets appear at k = 0 and keep stationary during the whole searching process, i.e., V Tar = 0 m/s. In Scenario II, we set V Tar = 1 m/s to test the influence of target mobility on the convergence of probability maps. The four targets also appear at k = 0 but do not disappear during the search. In these two scenarios, we verify the effectiveness of the proposed target search method by deploying different number of UAVs and using different information decaying factors. The initial positions of UAVs are randomly selected within region [0, 5] 3 m 3 . The partition {M 1,k , . . . M N,k } is generated by Voronoi partition. The communication range is set as R c = 20 m and the communication control protocol in [11] is applied for connectivity maintenance. A distributed K-connectivity maintenance algorithm has also been developed by the authors in [23] which can be applied in cooperative target search. Readers may refer to the references for more details on the communication protocols or maintenance algorithms. The cell size is fixed as 1 × 1 m 2 Other key parameters are respectively set as K u = 0.3, K φ = 2, q = 0.1,p = 0.99,q = 0.01, h = 10 m, h = 5 m, α = 0, u max = 2 m/s and T = 1 s.  Since the convergence of the individual probability map P i,g,k of agent i implies that the weight φ i,k (g) defined by Equation (24) approaches 0 for each cell, we define the following average weight to evaluate the convergence performance of the whole network: where M R denotes the total number of cells within the roads. It is easy to find that the initial value of g∈O R e −K φ Q i,g,0 = 1. In the simulations, we compare the results of φ k with different system parameters. The results are averaged from 200 Monte Carlo simulations. Figure 2 shows an example of the convergence process of individual probability maps in Scenario I with stationary targets, where the probabilities converge to 1 for the cells within which targets truly exist and 0 for the cells within which no target exists. The snapshots of UAVs in Scenario I are shown in Figure 3. Additionally, φ k finally converges to 0 and the more agents are deployed, the faster it converges as shown in Figure 4a. The convergence process of an individual probability map in Scenario II with mobile targets is shown in Figure 5, where the probabilities for the cells around targets may not converge to 0 as in Scenario I due to the random mobility of targets. However, we still can infer that there are four targets on the roads and have a rough estimation of their positions based on the envelopes of the final probability maps of UAVs. The snapshots of UAVs in Scenario II at according times are shown in Figure 6. In this case, φ k does not converge to 0 as shown in Figure 4b. However, a smaller φ k can be obtained with more agents deployed since the collective sensing area becomes larger. Compared with the results in Scenario I, the number of deployed agents has a greater impact on the convergence performance of probability maps in Scenario II with random target mobility. Hence, the algorithm is more robust with more UAVs deployed.   In addition, we also test the impact of the information decaying factor on the convergence results. According to the simulation results (as shown in Figure 7), a larger decaying factor will lead to larger average uncertainty about the target existence in the whole region, because the accumulated information for each cell decays faster. In fact, the design purpose of the decaying factor is to let agents revisit each cell at certain frequency to update the latest information about target existence in the cell. Therefore, the tradeoff lies in that, a larger decaying factor leads to larger uncertainty, but makes the agents pay more attention to the cells with fewer observations. However, there is no quantitative means of choosing the decaying factor and users may find a proper one via simulation method.

Conclusions
In this paper, we studied the three-dimensional vision-based cooperative control and information fusion in target search by a group of UAVs with limited sensing and communication capabilities.
First, heuristic detection probability and false alarm probability models were built which are related to the target discriminability of a camera and varies as a function of altitude. Then, we formulated the target search problem as a coverage optimization problem by balancing the coverage area and the detection performance. A generalized probability map updating model was proposed, by considering the information decay and transmission due to environmental changes such as the target movement. The simulation results showed that the proposed algorithms can make the individual probability maps of all agents converge to the same one which reflects the true environment when the targets are stationary. The influence of target mobility and the number of deployed UAVs on the convergence of probability maps has also been illustrated by simulation. Following this work, there is still a big potential area for future development and generalization of the proposed method. For example, the extension for detection by heterogenous sensors is an interesting topic since more types of information can be combined to improve the detection performance. More realistic environmental and system conditions that can affect the search results need to be considered such as the light intensity, block on the line of sight, camera with adjustable focus, asynchronous communication, etc.

Author Contributions
Jinwen Hu, Lihua Xie and Jun Xu conceived and designed the study. Jinwen Hu and Zhao Xu designed and implemented the simulation and method validation. Jinwen Hu wrote the paper. Lihua Xie and Jun Xu reviewed and edited the manuscript. All authors read and approved the manuscript.

Conflicts of Interest
The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1
First, we consider Case 1, where a target is present within cell g. Define the following augmented notations for the entire system: Υ g,k [Q 1,g,k , Q 2,g,k , . . . , Q N,g,k ] T Φ g,k [v 1,g,k , v 2,g,k , . . . , v N,g,k ] T Then, the updating rule (5) and (6) can be replaced by the following equation: Hence, where E [v j,g,t ] = p j,t ln q j,t p j,t + (1 − p j,t ) ln 1 − q j,t 1 − p j,t 1 {g∈C j,t } and 1 {g∈C j,t } is the indicator function defined as: Since Q i,g,0 , v j 1 ,g,η 1 and v j 2 ,g,η 2 are independent for j 1 = j 2 or η 1 = η 2 , we can get the variance where D [v j,g,t ] = p j,t (1 − p j,t ) ln q j,t p j,t − ln 1 − q j,t 1 − p j,t Considering that D [v j,g,t ] is a continuous function of p j,t and 1 >p p j,t 0.5 + ε > 0.5, 0 <q q j,t 0.5 − ε < 0.5, there exists a constant real number σ = 2 p (0.5 − ε) ln 0.5−ε 0.5+ such that D [v j,g,t ] σ 2 for g ∈ C j,t , which implies − − → 0, as m g,t → +∞ Hence, , as m g,t → +∞ On the other hand, since p j,t > q j,t , it is straightforward to get ∂E [v j,g,t ] ∂p j,t = ln p j,t q j,t − ln 1 − p j,t 1 − q j,t < 0 which implies E [v j,g,t ] 2ε ln 0.5 − ε 0.5 + ε < 0 Hence, lim sup Since the network is connected all the time, Q i,g,k for each agent i will almost surely converge to −∞ by implementing the average consensus protocol (6) (as shown in [11,25]).
Following the same procedure of the proof above, we can prove the conclusion of Case 2.

Appendix B. Proof of Theorem 2
We first consider h i,k ∈ h, h . From Equation (17), we get Defining a set of agents for agent i, i.e., N i,k {j : ∂ (M j,k ) C i,k = ∅}, it follows that For r ∈ S 2 i,j (j ∈ N i,k ), we have ∂r ∂µ i,k T = 0. Hence, From Equation (9) and according to the results of [15], we get and ∂r ∂c i,k T n S 2 i (r) = ∂r ∂c i,k T n S 2 i (r) ∂r ∂h i,k T n S 2 i (r) = tan ϕ where ϕ is half of the angle width of the field of view of each agent (as shown in Figure 1b). Therefore, Equation (18) holds. According to Equation (9), p i,k =p for h i,k ∈ (0, h). It is straightforward to get Equation (19) following the same procedure of proof as above.