Task Assignment of UAV Swarm Based on Wolf Pack Algorithm

Abstract: Performing air missions with an unmanned aerial vehicle (UAV) swarm is a significant trend in warfare. Task assignment among the UAVs of the swarm is one of the key issues in such missions. This paper proposes PSO-GA-DWPA (a discrete wolf pack algorithm with the principles of particle swarm optimization and the genetic algorithm) to solve the task assignment of a UAV swarm with fast convergence speed. The PSO-GA-DWPA is validated experimentally in three different ground-attack scenarios. The comparative results show that the improved algorithm not only converges faster than the original WPA and PSO, but also exhibits excellent search quality in high-dimensional space.


Introduction
In the last two decades, how to use unmanned aerial vehicles (UAVs) for various military missions has received growing attention. Over the years, UAVs have been used in military missions, such as ground attack missions (GAM), wide area search missions (WASM), suppression of enemy air defense (SEAD), and combat intelligence, surveillance, and reconnaissance (ISR).
At present, research on UAVs has shifted from a focus on a large full-function UAV to a swarm of small single-function stealth UAVs, because the cost of a small single-function UAV is much lower than that of a large full-function UAV. The basic requirement of UAV swarm operations is fast and reliable assignment and execution, which is very complex to implement because of the large amount of computation.
The task assignment of a UAV swarm is usually viewed as an optimization problem, in which the optimal solution is found by maximizing or minimizing an objective function with constraints. The classical optimization models include a mixed integer linear programming (MILP) [1] model, the traveling salesman problem (TSP) [2] model, the generalized assignment problem (GAP) [3] model, the vehicle routing problem (VRP) [4] model and the cooperative multiple task assignment problem (CMTAP) [5] model.
The methods to solve optimization models are divided into two categories. One category obtains an exact solution by mathematical programming, such as the Hungarian algorithm [6], the branch and bound search algorithm [7], dynamic programming [8], the exhaustive method [9], Newton's method [10] and the gradient method [11]. These methods can guarantee an optimal solution of a convex problem if a solution exists, but they have difficulty solving an NP-hard non-convex problem. Moreover, this kind of method is normally suitable only for low-dimensional spaces.
Another category is evolutionary algorithms, which are inspired by the mechanism of nature's evolution of a biological population. These methods usually do not have polynomial time complexity, but can approach a global optimal or suboptimal solution with finite calculation cost. In the task assignment of the UAV swarm, it is difficult to establish an accurate mathematical model and guarantee that the model is convex. Therefore, the evolutionary algorithms are the most common methods to solve this kind of problem. The classical evolutionary algorithms include the simulated annealing algorithm (SAA) [12], the tabu search algorithm (TS) [13], the genetic algorithm (GA) [14], the ant colony optimization algorithm (ACO) [15], and particle swarm optimization algorithm (PSO) [16].
Considering UAV swarm operations in the future, task assignment is challenging due to its computational complexity, which has two causes. The first is the complexity of the task assignment model, including determining the order of tasks, the constraints and the coupling between task assignment and trajectory optimization. The second is the large amount of computation and long solution time caused by the dimension of the variables. For example, in a scenario where 80 aircraft attack 100 targets, assuming that only a single attack task is performed and each target is attacked only once, the variables describing the UAVs assigned to all targets reach 100 dimensions, which greatly increases the computation. Jia et al. [17] and Su [18] improved the variable's coding method by adding the constraint information so as to overcome the model complexity caused by constraints. The co-evolutionary ant colony algorithm was presented in [19] to decompose a high-dimensional variable into several low-dimensional variables. This method can effectively improve the solving speed of the algorithm itself, but it is necessary to fuse all UAVs' decisions to meet the constraints of task assignment, such as the order of tasks. In [20], PSO is integrated with the beetle antennae search (BAS) algorithm, which overcomes the tendency of PSO to fall into local convergence, so that the convergence speed and solution accuracy of the algorithm are improved. In both [18] and [20], the fixed parameters of PSO are made linear or dynamic so as to effectively accelerate the convergence speed. Braun [21] developed the island model, where several populations isolated on islands are optimized by GA until they degenerate. Then the degeneration is removed by refreshing the population on each island through individuals of other islands. This method effectively solves the problem of population degradation.
To mitigate parameter tuning on high-dimensional problems, which is a computationally expensive procedure, Tuani et al. [22] introduced an adaptive approach to a heterogeneous ant colony population that evolves the alpha and beta controlling parameters for ACO. In addition, distributed task assignment methods were proposed to improve the computation efficiency and overcome the problem of large computation and long solution time. Matin et al. [23] proposed minimum distance greedy search (MDGS) to make the decision for a UAV to select the targets when the arrivals are dynamic and appear uniformly in a known rectangular region. This approach is currently very effective in the field of dynamic task assignment for individual UAVs. Di et al. [24] used the distributed auction algorithm, in which each UAV decides its own task goals according to its own local information, to reduce the computational load of the system to a certain extent. Similarly, a market-based decentralized algorithm was proposed by Oh et al. [25] to realize online task assignment. Zhang et al. [26] used a contract network to rapidly update the task assignment scheme after pre-assignment based on PSO. Such methods do not necessarily find the optimal solution based on the global information, so the optimal assignment of resources cannot be guaranteed.
Despite the great progress in improving efficiency, dimension explosion is still the primary concern of these methods. High-dimensional combinatorial optimization remains a difficult part of the task assignment problem. One possible direction is to establish a hybrid method to overcome the defects of each individual method, such as PSO-SA [27], PSO-GA [28,29], and DPSO-GT [30,31]. Simulation experiments have demonstrated the effectiveness of these methods for problems with fewer than 30 dimensions. The other direction is to design a more efficient algorithm, especially for high-dimensional problems. Wu et al. [32] proposed a new heuristic swarm intelligence algorithm named the wolf pack algorithm (WPA), based on a swarm of intelligent wolves. Simulation results showed that the WPA is especially suitable for solving high-dimensional and multimodal function optimization problems. Subsequently, the algorithm was applied to some classical optimization problems and achieved good results [33,34]. However, whether the WPA, with its excellent performance in solving high-dimensional continuous problems, is suitable for discrete optimization problems such as task assignment needs further verification.
The main contribution of this paper is the application of WPA in high-dimension task assignment and the improvement on the WPA with the ideas of PSO and a GA to accelerate convergence speed. The remainder of this paper is organized as follows. Section 2 formalizes the UAV swarm task assignment problem with respect to performance requirements. The principles of the WPA and the improvement of four aspects of the WPA are described in Section 3. In Section 4, the PSO-GA-DWPA is compared with the WPA and PSO by simulation experiments in three task scenarios. Finally, Section 5 concludes the paper and briefly explores future work.

Task Assignment Model
In this section, a background incorporating a UAV swarm that executes an attack mission in a large field is presented. Then, a mathematical formulation is established.
There are $N_T$ static targets whose initial positions have been revealed, and $N_V$ attacking UAVs planning to attack these targets with the minimum cost.
In actual task assignment for ground attack, the task assignment model is very complex; it includes many factors, such as fuel constraints, ammunition quantity, timing constraints between multiple tasks, UAV mobility, threats, and redundancy tolerance. The factors considered in different combat scenarios are different, so the task assignment model varies with the actual situation. This paper considers a more general task assignment model which is not limited to a specific scenario and a specific UAV type. The point is to verify whether the proposed PSO-GA-DWPA is suitable for solving high-dimensional discrete optimization problems.
To avoid the influence of uncertain factors on the analysis of simulation results, the following ideal assumptions are set up to simplify the formalization of the problem.
1. There are no obstacles or no-fly zones in the task scenario. A flight trajectory can be described as a connection of straight lines.
2. The time spent preparing and firing the weapon is not taken into account. In other words, only the time the UAV swarm takes to reach the target positions is considered.
3. The UAV swarm maintains the same constant velocity and hence the flight time can be represented by the flight distance.
4. Each target can be attacked only once. There are more targets than UAVs in the swarm, which means each UAV will probably be assigned multiple targets.
The targets can be expressed as $T = \{T_1, T_2, \ldots, T_{N_T}\}$, and the attacking UAV swarm can be expressed as $V = \{V_1, V_2, \ldots, V_{N_V}\}$, where $N_T > N_V$. Without loss of generality, two cost indicators that must be considered in task assignment are adopted: the shortest total range of all UAVs (i.e., the lowest fuel cost) and the shortest time to complete all tasks.

1. Total range of all UAVs

For all UAVs in the swarm, the average range is

$$J_1 = \frac{1}{N_V} \sum_{i=1}^{N_V} Range_i, \quad (3)$$

where $Range_i$ is the total range flown by UAV $V_i \in V$.

2. Time to complete all tasks

$$J_2 = \max_{V_i \in V} Time_i, \quad (4)$$

where $Time_i$ is the time of UAV $V_i \in V$ to finish all of its tasks. With consideration of the same constant velocity for all UAVs, the time of UAV $V_i \in V$ to finish all of its tasks can be expressed as its total range. Therefore, Equation (4) can be represented as Equation (5).

$$J_2 = \max_{V_i \in V} Range_i. \quad (5)$$

The optimization goal is to minimize both the total range of all UAVs and the maximum range among all UAVs.
Thus, the cost function of task assignment is

$$J = \omega_1 J_1 + \omega_2 J_2, \quad (6)$$

where $\omega_1$ and $\omega_2$ are weighting factors reflecting the importance of each performance criterion, decided by the commander, and $\omega_1 + \omega_2 = 1$.
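As a minimal sketch of how such a weighted cost could be evaluated, the Python snippet below scores one candidate assignment as a weighted sum of the average range and the maximum range. It is an illustration under stated assumptions, not the implementation used in the experiments: the function names, the `uav_starts` and `target_positions` inputs, the default weights of 0.5, and the rule that each UAV visits its targets in increasing index order are all hypothetical. Straight-line flight and the use of range in place of time follow assumptions 1 and 3.

```python
import math


def route_length(start, waypoints):
    """Length of the straight-line path start -> waypoints[0] -> waypoints[1] -> ..."""
    total, here = 0.0, start
    for point in waypoints:
        total += math.dist(here, point)
        here = point
    return total


def assignment_cost(assignment, uav_starts, target_positions, w1=0.5, w2=0.5):
    """Weighted cost w1 * (average range) + w2 * (maximum range).

    assignment[j] is the 0-based index of the UAV assigned to target j.
    Assumption (not from the paper): a UAV visits its targets in
    increasing target index.
    """
    routes = [[] for _ in uav_starts]
    for target, uav in enumerate(assignment):
        routes[uav].append(target_positions[target])
    ranges = [route_length(s, r) for s, r in zip(uav_starts, routes)]
    average_range = sum(ranges) / len(ranges)
    longest_range = max(ranges)
    return w1 * average_range + w2 * longest_range


if __name__ == "__main__":
    starts = [(0.0, 0.0), (10.0, 0.0)]
    targets = [(3.0, 4.0), (10.0, 5.0), (6.0, 8.0)]
    print(assignment_cost([0, 1, 0], starts, targets))
```

Because the number of UAVs is fixed, minimizing the average range is equivalent to minimizing the total range, so either quantity can serve as the first term.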

The Basics of the Wolf Pack Algorithm
The WPA adopts the bottom-up design principle and simulates wolves hunting cooperatively according to their system of divided responsibilities, as shown in Figure 1. Wolves are divided into a leader wolf, exploring wolves and fierce wolves, and they share information with each other. The leader wolf is closest to the prey because it is most sensitive to smell, so it guides the wolf pack. Exploring wolves are responsible for detecting the environment to find the position of the prey, due to their sensitivity to smell. Fierce wolves are responsible for closing in to catch the prey quickly. The problem space is defined as an $N \times D$ Euclidean space, where $N$ is the number of wolves in the wolf pack, and $D$ is the dimension of each wolf's position. The position of wolf $i \in N$ is expressed as $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$. For the position $X_i$ of wolf $i \in N$, the smell concentration of prey perceived by the wolf is described by the cost function value. The distance between wolves $p$ and $q$ is defined as the Manhattan distance $L(p, q) = \sum_{d=1}^{D} |x_{pd} - x_{qd}|$.
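For concreteness, the Manhattan distance between two wolf positions can be computed as in the short Python sketch below. The function name `manhattan_distance` is hypothetical and positions are assumed to be plain sequences of numbers of equal length.

```python
def manhattan_distance(wolf_p, wolf_q):
    """Sum of absolute coordinate differences between two wolf positions."""
    return sum(abs(a - b) for a, b in zip(wolf_p, wolf_q))


if __name__ == "__main__":
    print(manhattan_distance([1, 2, 3], [3, 2, 0]))
```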
The WPA is shown in Algorithm 1. The whole process of wolf pack hunting can be abstracted into three features:
• the selection of the leader wolf based on the winner-take-all rule;
• three cooperative behaviors, namely walking, calling and sieging;
• an update mechanism based on the strongest-survives law.

The Selection of Leader Wolf Based on Winner-Take-All Rule
The wolf with the minimum cost can be the leader wolf at any time. Initially, the wolf with the minimum cost is chosen as the leader wolf. In the subsequent stages, if the cost of some wolf is less than that of the current leader wolf, that wolf will be the new leader. If multiple wolves are the best at the same time, pick one at random to be the leader wolf. The leader wolf goes directly to the next iteration without executing the following cooperative behaviors, until it is replaced by a new leader wolf.
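The winner-take-all rule with random tie-breaking can be sketched in Python as follows. The function name `select_leader` and the representation of the pack as a list of cost values are assumptions made for illustration.

```python
import random


def select_leader(costs, rng=random):
    """Return the index of a minimum-cost wolf, breaking ties at random."""
    best = min(costs)
    candidates = [i for i, c in enumerate(costs) if c == best]
    return rng.choice(candidates)


if __name__ == "__main__":
    print(select_leader([3.0, 1.0, 2.0]))
```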

Three Cooperative Behaviors
1. Walking behavior. $S$ suboptimal wolves are selected as exploring wolves to perform the walking behavior. $S$ is a random integer picked from $[N/(\alpha+1), N/\alpha]$, where $\alpha$ is the scale factor of exploring wolves. Starting from the current position, exploring wolf $i \in N$ makes one step forward in each of $h$ directions. On account of individual differences, $h$ generally takes a random integer within a limited range. The new position in the $p$-th direction ($p = 1, 2, \ldots, h$) is

$$x_{id}^{p} = x_{id} + \sin(2\pi p / h) \cdot step_a, \quad (8)$$

where $step_a$ is the length of the walking step. Exploring wolf $i \in N$ returns to its initial position after detecting each direction, then chooses the best one to update its position. All exploring wolves keep walking until they satisfy one of the following conditions.
• As long as one exploring wolf's new position is better than that of the leader wolf, this exploring wolf will be the new leader wolf and the wolf pack will move to the calling behavior.
• When the number of walking steps reaches the maximum $T_{\max}$, the wolf pack moves to the calling behavior.
2. Calling behavior. Each fierce wolf moves towards the position of the leader wolf by raid steps of length $step_b$. Fierce wolves will continue to get close to the leader wolf until one of the following conditions is satisfied.
• As long as one fierce wolf's new position is better than that of the leader wolf, this fierce wolf will be the new leader and the wolf pack moves to the sieging behavior.

• The Manhattan distance between a fierce wolf and the leader wolf is less than the threshold distance $d_{near}$. $d_{near}$ is estimated according to

$$d_{near} = \frac{1}{D \cdot \omega} \sum_{d=1}^{D} |\max_d - \min_d|, \quad (10)$$

where $\omega$ is the factor representing the threshold distance and $[\min_d, \max_d]$ is the value range of the $d$-th variable.

3. Sieging behavior. The leader wolf, whose position is treated as the prey's position, guides all other wolves to siege the prey with a small step. For iteration $k$, the position of the prey is $X_{leader}^{k}$; then the new position of wolf $i \in N$ is updated according to

$$X_i^{k+1} = X_i^{k} + \lambda \cdot step_c \cdot |X_{leader}^{k} - X_i^{k}|, \quad (11)$$

where the absolute value is taken element-wise, $\lambda$ is a random real number in $[-1, 1]$, and $step_c$ is the length of the siege step. For any wolf, if its new position is better than the current position, its position will be updated; otherwise its current position will be kept. The wolf whose position is the best will be chosen as the leader wolf.
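The three cooperative behaviors can be sketched for a continuous problem as below. This is a hedged illustration rather than the authors' code: the update rules follow the form commonly given for the WPA in the literature (a sine-scaled step over $h$ directions for walking, a fixed step towards the leader for calling, and a random step scaled by the distance to the leader for sieging), and the function names `walk_step`, `raid_step` and `siege_step` as well as the `cost` callable are hypothetical.

```python
import math
import random


def walk_step(position, cost, h, step_a):
    """Probe h directions from the current position; keep the best, or stay put."""
    best, best_cost = position, cost(position)
    for p in range(1, h + 1):
        candidate = [x + math.sin(2 * math.pi * p / h) * step_a for x in position]
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best


def raid_step(position, leader, step_b):
    """Move one raid step towards the leader in every dimension."""
    moved = []
    for x, g in zip(position, leader):
        direction = (g > x) - (g < x)  # +1, -1 or 0
        moved.append(x + direction * step_b)
    return moved


def siege_step(position, leader, cost, step_c, rng=random):
    """Take a small random step near the leader; accept it only if it is better."""
    lam = rng.uniform(-1.0, 1.0)
    candidate = [x + lam * step_c * abs(g - x) for x, g in zip(position, leader)]
    return candidate if cost(candidate) < cost(position) else position


if __name__ == "__main__":
    sphere = lambda v: sum(x * x for x in v)
    print(walk_step([1.0, 1.0], sphere, h=4, step_a=0.5))
    print(raid_step([0.0, 5.0, 2.0], [3.0, 1.0, 2.0], step_b=1.0))
    print(siege_step([2.0, 2.0], [0.0, 0.0], sphere, step_c=0.5))
```

The greedy acceptance in `walk_step` and `siege_step` mirrors the rule that a wolf keeps its current position unless the new one is better.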

Update Mechanism Based on Strong-Survive Law
Prey is distributed from the strong to the weak, which causes the weakest wolves to starve; that is, the wolves too far away from the prey are eliminated. In the algorithm, the $R$ weakest wolves are eliminated from the population and $R$ new wolves are generated randomly. The larger $R$ is, the more new wolves are generated, which helps maintain the diversity of individuals in the population. However, if $R$ is too large, the algorithm tends to perform a random search. Conversely, if $R$ is too small, diversity is hard to maintain, and the ability of the algorithm to open up new regions of the solution space is weakened. Since the size and number of prey differ with each capture, the number of wolves starving to death varies. $R$ is a random integer in $[N/(2\beta), N/\beta]$, where $\beta$ is the scale factor of the wolf population update.
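A minimal Python sketch of this renewal step is given below, assuming a continuous search space bounded by `lower` and `upper` in every dimension. The function name `renew_population` and its parameters are hypothetical; the number of replaced wolves `r` is passed in by the caller.

```python
import random


def renew_population(wolves, costs, r, lower, upper, rng=random):
    """Drop the r highest-cost wolves and add r uniformly random newcomers."""
    dim = len(wolves[0])
    order = sorted(range(len(wolves)), key=lambda i: costs[i])
    survivors = [wolves[i] for i in order[:len(wolves) - r]]
    newcomers = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(r)]
    return survivors + newcomers


if __name__ == "__main__":
    pack = [[0.0], [5.0], [1.0], [9.0]]
    print(renew_population(pack, [0.0, 25.0, 1.0, 81.0], r=2, lower=-10.0, upper=10.0))
```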

The Proposed PSO-GA-DWPA
The WPA is suitable for continuous problems where the variable in each dimension changes continuously. However, the task assignment problem is a discrete problem in which the variable in each dimension is an integer belonging to a set. As such, based on the rules of the WPA, it is necessary to improve the WPA to match the integer discrete character of the task assignment. In addition, in order to improve the convergence speed and solution accuracy, PSO and a GA are introduced into WPA. The details of the PSO-GA-DWPA are as follows and the algorithm is shown in Algorithm 2.

Integer Matrix Coding
Integer matrix coding is adopted to express the assignment scheme, where $X_i$ is the position of wolf $i \in N$, $N$ is the population size of the wolf pack, and the dimension of the variable is $N_T$ because there are more targets than UAVs and each target can only be attacked once. The 1st row of Equation (12) , then the task assignment scheme is as follows:
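A hypothetical illustration of decoding an integer-coded position is sketched below in Python. It assumes, as a reading of the coding rather than a detail confirmed here, that the position stores one UAV index per target (both 1-based), so that a wolf has $N_T$ entries. The function name `decode_assignment` and the example values are invented for illustration and are not the paper's example.

```python
def decode_assignment(uav_row, n_uavs):
    """Map a wolf position (one UAV index per target) to per-UAV target lists."""
    plan = {uav: [] for uav in range(1, n_uavs + 1)}
    for target, uav in enumerate(uav_row, start=1):
        plan[uav].append(target)
    return plan


if __name__ == "__main__":
    # Five targets and three UAVs: target 1 goes to UAV 2, target 2 to UAV 1, ...
    print(decode_assignment([2, 1, 3, 1, 2], n_uavs=3))
```

Under this coding every target appears exactly once by construction, so the constraint that each target is attacked only once never has to be checked separately.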

Improvement on Walking Behavior
The walking behavior of wolves is essentially an active exploration of the unknown environment. This process determines how closely the current best solution approaches the global optimal solution. In the WPA, each exploring wolf scouts the prey from $h$ directions that divide 360 degrees equally, as in Equation (8), and adjusts the coverage of the scout by changing the size of $h$. The bigger $h$ is, the larger the coverage of the scout will be, but the slower the scout will be. To scout the prey more effectively, PSO is introduced in this paper so that the walking process is completed from as few directions as possible, under the guidance of the global extremum and the individual extremum.
Each exploring wolf is represented as one particle in the PSO. The implementation of PSO is divided into three steps [20]: tracking individual extremum, tracking global extremum and individual variation.

Tracking individual extremum
This part of the formula is expressed in the discrete domain as Equation (13). This represents a copy operation with probability $c_1$.
$X_i^k$ represents the position of wolf $i \in S$ in iteration $k$. $P_i^k$ is the optimal solution of wolf $i \in S$ according to the current scouted directions, representing the individual extremum of wolf $i \in S$ in iteration $k$. Note that $\phi_i^k$ is a temporary variable.
During the operation in Figure 2, a random number

Tracking global extremum
Equation (14) represents a copy operation with probability $c_2$. During the operation in Figure 3, a random number

Individual variation
This part of the formula is described in the discrete domain as Equation (15), where $step_a$ is the walking step, which determines how many columns of $\lambda_i^k$ will mutate. $W_i^k$ is a temporary variable representing the result of the walking behavior.
During the operation in Figure 4, function ( )
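The three steps of the PSO-guided walking behavior (tracking the individual extremum, tracking the global extremum, and individual variation) might be sketched as follows. This is an assumption-laden illustration: the exact copy operation is not reproduced here, so the sketch copies a random contiguous segment from the guiding solution, and mutation redraws `step_a` randomly chosen entries as random UAV indices in `1..n_uavs`. The names `copy_segment` and `pso_walk` are hypothetical.

```python
import random


def copy_segment(target, source, rng):
    """Overwrite a random contiguous slice of target with the same slice of source."""
    i, j = sorted(rng.sample(range(len(target) + 1), 2))
    return target[:i] + source[i:j] + target[j:]


def pso_walk(x, personal_best, global_best, c1, c2, step_a, n_uavs, rng=random):
    """One discrete walking move guided by the individual and global extrema."""
    # Track the individual extremum with probability c1.
    phi = copy_segment(x, personal_best, rng) if rng.random() < c1 else list(x)
    # Track the global extremum with probability c2.
    lam = copy_segment(phi, global_best, rng) if rng.random() < c2 else phi
    # Individual variation: redraw step_a entries at random.
    w = list(lam)
    for column in rng.sample(range(len(w)), step_a):
        w[column] = rng.randint(1, n_uavs)
    return w


if __name__ == "__main__":
    rng = random.Random(2)
    print(pso_walk([1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3],
                   c1=0.5, c2=0.5, step_a=2, n_uavs=3, rng=rng))
```

Since every entry of the result is still a valid UAV index, the move never leaves the feasible set defined by the integer coding.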

Improvement on Calling Behavior
The calling behavior is an essential process for fierce wolves to converge to the current optimal solution, which is the position of the leader wolf. This behavior determines the convergence rate of the algorithm. Inspired by gene segment duplication in the GA, each fierce wolf copies a segment of the leader's position over its own so as to approach the leader wolf quickly.
This part of the formula is expressed in the discrete domain as Equation (16).
During the operation in Figure 5, a continuous segment of length $step_b$ is randomly selected from wolf $i \in M$ and then replaced by the corresponding segment of the leader wolf.
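The segment replacement described above can be sketched in Python as follows, assuming positions are stored as flat lists of UAV indices. The function name `call_towards_leader` is hypothetical.

```python
import random


def call_towards_leader(wolf, leader, step_b, rng=random):
    """Replace a random contiguous segment of length step_b with the leader's."""
    start = rng.randrange(len(wolf) - step_b + 1)
    return wolf[:start] + leader[start:start + step_b] + wolf[start + step_b:]


if __name__ == "__main__":
    print(call_towards_leader([1] * 6, [2] * 6, step_b=3, rng=random.Random(3)))
```

A larger `step_b` pulls the fierce wolf closer to the leader in a single move; with `step_b` equal to the position length the wolf becomes a copy of the leader.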

Improvement on Sieging Behavior
This part of the formula is expressed in the discrete domain as Equation (17), where $step_c$ is the siege step; the equation yields the result of the siege behavior. During the operation in Figure 6

Improvement in Wolf Population Update
In the WPA, the $R$ weakest wolves are eliminated and replaced by new wolves generated randomly, just as in the initialization. Experiments show that new wolves generated in this way are not competitive; they are eventually eliminated and contribute nothing to the algorithm. In the PSO-GA-DWPA, a new wolf is generated by applying a small variation to the leader wolf. The process is similar to individual variation in the walking behavior. This is expressed in the discrete domain as Equation (18), where $step_d$ is the variation step. In this way, the new wolf is not only competitive, but can also conduct local optimization around the position of the leader wolf to avoid local convergence.
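Generating a new wolf as a small variation of the leader can be sketched as below. The sketch assumes that the variation redraws `step_d` randomly chosen entries of the leader's position as random UAV indices in `1..n_uavs`; the function name `spawn_near_leader` and this particular mutation rule are illustrative assumptions.

```python
import random


def spawn_near_leader(leader, step_d, n_uavs, rng=random):
    """Copy the leader and redraw step_d randomly chosen entries."""
    wolf = list(leader)
    for column in rng.sample(range(len(wolf)), step_d):
        wolf[column] = rng.randint(1, n_uavs)
    return wolf


if __name__ == "__main__":
    print(spawn_near_leader([1, 2, 3, 1, 2, 3], step_d=2, n_uavs=3, rng=random.Random(4)))
```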

Experiments of Task Assignment for UAV Swarm Using PSO-GA-DWPA
The performance of the PSO-GA-DWPA is analyzed in this section using simulation according to the assumptions in Section 2. By setting three scenarios with different numbers of UAVs and targets, we verified the applicable scope and efficiency of the algorithm. The parameters used for the simulation and the PSO-GA-DWPA are summarized in Table 1.

Monte Carlo Simulation in Different Scenarios
A Monte Carlo study, consisting of 50 runs, is used in this section to compare the performance of the PSO-GA-DWPA, WPA and PSO algorithms with respect to the cost function of Equation (6). First, a small scenario of 5 UAVs on 8 targets is analyzed. In Figures 9 and 10, the cost curve and the running time curve (the intersection of the solid line and dotted line indicates the convergence time) are plotted to compare the convergence performances of the three methods. Figure 9 shows that the optimization capabilities of the three algorithms are similar, but the WPA and the PSO-GA-DWPA converge faster than PSO. PSO is also the most time-consuming, whereas the WPA and the PSO-GA-DWPA are time-saving. Since the calculation process of the PSO-GA-DWPA is more complicated than that of the WPA, the running time for each iteration of the PSO-GA-DWPA is slightly longer than that of the WPA. However, the PSO-GA-DWPA needs fewer iterations to converge, so its convergence time is shorter than that of the WPA.
Second, we set the size to 20 UAVs on 30 targets; in other words, the dimension of the optimization problem is 30. As shown in Figure 10, the optimization performances of the WPA and the PSO-GA-DWPA are much better than that of PSO: both obtain a better solution in a shorter time. The PSO-GA-DWPA has the same optimization performance as the WPA; however, its convergence speed is better than that of the WPA. Finally, the dimension of the optimization problem is set to 150, that is, the scenario is 100 UAVs on 150 targets. The convergence performance and running time of the different methods are shown in Figure 11. With the increase of dimension, PSO becomes ineffective: (i) the optimal solution cannot be obtained; and (ii) the calculation speed is too slow, with the time required for 1000 iterations being nearly 2.5 times that of the other two algorithms. The running time for each iteration of the PSO-GA-DWPA is slightly longer than that of the WPA, but its convergence speed in the early stage is obviously faster, so its convergence time is shorter than that of the WPA.
The comparison of the three methods in the three scenarios is shown in Table 2, which indicates that the overall performance of the PSO-GA-DWPA is better than that of the other two methods. According to Table 2, compared with PSO, which is recognized as fast and accurate, the advantage of the WPA and the proposed PSO-GA-DWPA in the low-dimensional solution space (such as the 8-dimensional solution space in Scenario 1) is not obvious. The experiments in Scenario 1 demonstrate the applicability of the WPA and the PSO-GA-DWPA to the task assignment problem. The higher the dimension of the solution space is, the more obvious the advantage of the proposed algorithm becomes. Compared with the WPA, the PSO-GA-DWPA has the advantage of a faster convergence speed. Although their convergence costs are similar, the PSO-GA-DWPA can obtain a convergent solution in a shorter time.

Real-Time Analysis of the PSO-GA-DWPA
In practical application, the situation of the battlefield is rapidly changing; as a result, the time of task assignment is limited. When the optimization problem scale is small, the global optimal solution or sub-optimal solution can be obtained within the specified time. However, if the dimension of the optimization problem is high, it is impossible to get the optimal solution in the specified time. In this case, an algorithm with a faster convergence speed is required in the early stage to obtain a relatively reasonable solution.
As shown in Figure 9b, to obtain the convergent solution of the optimization problem with 8 dimensions, the WPA needs 5 s and the PSO-GA-DWPA needs 2 s, which meets the real-time requirement. However, as shown in Figures 10b and 11b, the WPA needs 70 s and the PSO-GA-DWPA needs 30 s when the dimension of the problem is 30; even worse, the WPA needs 2500 s and the PSO-GA-DWPA needs 1800 s when the dimension of the problem goes up to 150. Hence, in practice, a compromise must be made between the optimal solution and the convergence time.
In the 150-dimensional scenario, the task assignment decision must be made within a specified time. Fifty experiments were conducted with each of the three methods to compare their adaptability. The experimental results are shown in Figure 12. As can be seen from the results in the figure, the PSO-GA-DWPA obtains a better solution within the specified time. If a good but not necessarily optimal solution is needed in finite time, the PSO-GA-DWPA is the better choice.
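Running an iterative optimizer under a fixed time budget and returning the best solution found so far can be sketched as follows. This is a generic harness under stated assumptions, not the experimental code: `optimise_within`, `improve` (one iteration that proposes a new candidate from the current best) and `cost` are hypothetical names, and any of the three algorithms could play the role of `improve`.

```python
import time


def optimise_within(budget_seconds, initial, cost, improve):
    """Iterate until the time budget expires; return the best solution seen."""
    best, best_cost = initial, cost(initial)
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        candidate = improve(best)
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best, best_cost


if __name__ == "__main__":
    # Toy problem: walk an integer towards zero.
    result = optimise_within(0.05, 5, abs, lambda x: x - 1 if x > 0 else x)
    print(result)
```

With such a harness, an algorithm that converges quickly in the early iterations yields a lower cost at the deadline even if its per-iteration time is slightly longer.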

Conclusions
Aiming at the problem of UAV swarm task assignment, this paper introduces a novel swarm intelligence algorithm, the wolf pack algorithm (WPA), which is particularly suitable for high-dimensional continuous optimization problems. In order to apply this method to the discrete optimization problem and improve the convergence speed, this paper proposes the PSO-GA-DWPA based on matrix coding and the principles of particle swarm optimization and the genetic algorithm. According to a general task assignment model and simulation experiments, this method proved to be applicable to high-dimensional task assignment.
The simulation results show that in the case of a small dimension such as Scenario 1, the PSO-GA-DWPA has almost the same optimization ability as PSO. However, when the dimension increases to 30 in Scenario 2, the PSO-GA-DWPA is obviously superior to PSO in terms of both convergence speed and solution accuracy. When the dimension increases to 150 in Scenario 3, PSO fails, while the PSO-GA-DWPA can still obtain an effective solution. Therefore, this method can serve as a useful reference for UAV swarm task assignment.

Funding: This research received no external funding.

Acknowledgments:
The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which improved the quality of the paper.

Conflicts of Interest:
The authors declare no conflict of interest.