The Application of Improved Harmony Search Algorithm to Multi-UAV Task Assignment

: In this work, aiming at the problem of cooperative task assignment for multiple unmanned aerial vehicles (UAVs) in actual combat, battleﬁeld tasks are divided into reconnaissance tasks, strike tasks and evaluation tasks, and a cooperative task-assignment model for multiple UAVs is built. Meanwhile, heterogeneous UAV-load constraints and mission-cost constraints are introduced, the UAVs and their constraints are analyzed and the mathematical model is established. The exploration performance and convergence performance of the harmony search algorithm are analyzed theoretically, and the more general formulas of exploration performance and convergence performance are proved. Based on theoretical analysis, an algorithm called opposition-based learning parameter-adjusting harmony search is proposed. Using the algorithm to test the functions of different properties, the value range of key control parameters of the algorithm is given. Finally, four algorithms are used to simulate and solve the assignment problem, which veriﬁes the effectiveness of the task-assignment model and the excellence of the designed algorithm. Simulation results show that while ensuring proper assignment, the proposed algorithm is very effective for the multi-objective optimization of heterogeneous UAV-cooperation mission planning with multiple constraints.


Introduction
It is very common to use multiple unmanned aerial vehicles to perform various tasks [1,2], especially military tasks [3]. UAVs cooperatively patrolling perimeters, monitoring areas of interest or escorting convoys have been key research topics [4][5][6]. Power inspection, air rescue, large parties and so on are the most common application examples of Multi-UAV cooperation [7][8][9]. Therefore, it is an important research aspect to seek an efficient task-assignment mechanism to solve the task assignment problem of multiple UAVs [10]. In this context, the problem of multi-UAV cooperative task allocation arises at the historic moment.
Collaborative task assignment of multiple UAVs is to assign one or more tasks to UAVs under various constraints in the battlefield environment and determine the sequence of tasks to be executed by UAVs. Task assignment is an important part of collaborative task planning of multiple UAVs, and is also the prerequisite for realizing collaborative operation of UAV groups [11]. In the allocation process, factors such as the type of tasks, timing constraints among tasks, types of tasks that can be performed by UAVs, combat capabilities and payloads should be fully considered, and combat tasks should be reasonably assigned to the UAV fleet with the goal of achieving optimal efficiency of all tasks [12,13]. Therefore, the collaborative task-assignment problem of multiple UAVs is a non-deterministic polynomial (NP) problem of combinatorial optimization under multiple constraints [14,15]. underlying search mechanisms of HS to balance exploration and exploitation, establish the iteration equation of HS and propose an improved HS; (c) use the improved HS algorithm to simulate and verify a multi-UAV task-allocation combat example.
The remainder of this paper is organized as follows. In Section 2, the modeling of heterogeneous UAV task-assignment planning is presented in detail. Section 3 is the theoretical analysis of harmony search. The proposed algorithm is described in Section 4. The experiments and their results of the proposed algorithm are presented in Section 5. The task-allocation simulation experiment is designed and discussed in Section 6. Section 7 discusses related work. Section 8 then makes concluding remarks and indicates future works.

Modeling Heterogeneous UAV Task Assignment Planning
Multi-UAV cooperative task assignment is a complex multi-objective optimization and decision problem. Most current research models assume that the mission attributes of UAVs are consistent. However, multi-UAV formation is not a linear addition of a certain number of individual UAVs. Heterogeneous formation disperses the functions of reconnaissance and surveillance, electronic interference, strike and evaluation into a large number of single UAVs with low cost and single function. Through the high level of cooperation and self-organization among a large number of heterogeneous individuals, the cluster function far exceeds the individual function that can be realized. In addition, as the multi-UAV formation combat involves various aspects of the combat target and the task complexity is different, some large tasks need a certain combat sequence, and it is necessary to deploy and combine UAVs with different functions and characteristics to complete the corresponding combat task. For the above reasons, this section proposes a heterogeneous multi-UAV task-allocation model to enhance the antagonism of combat subjects and achieve higher-formation combat effectiveness in view of future diversified combat requirements.

Objective Function of Multi-UAV Cooperative Target Assignment Model
If the UAV formation of our side performs the task U = (U 1 , U 2 , U 3 , . . . , U N UVA ), the target list of the task is T = (T 1 , T 2 , T 3 , . . . , T N TARGET ), M = {M 1 , M 2 , M 3 } represents the task type, N UVA and N TARGET are the number of UAVs in the formation and the number of targets in the task list respectively, M 1 represents reconnaissance, M 2 represents attack and M 3 represents evaluation. The essence of multi-UAV formation task assignment is to solve a target-assignment scheme to maximize target fitness (Tar_fitness). Based on this, the objective function of a multi-UAV cooperative target-allocation model is given as follows Definition 1.
Tar_fitness i (T j ) = Reward i (T j )/Cost i (T j ). (1) where Tar_fitness i (T j ) is the fitness of the UAV numbered i performing the task numbered j, Reward i (T j ) is the reward of the UAV numbered i performing the task numbered j and Cost i (T j ) is the cost of the UAV numbered i performing the task numbered j. It can be seen that task fitness is directly proportional to the reward of performing tasks and inversely proportional to the cost of performing tasks.

Task Rewards Modeling
In this paper, the rewards of UAV numbered i performing a certain task numbered j are mainly determined by three factors: (1) UAV's capability to perform specific tasks. The capability of UAV to perform tasks is quantified by its payload, such as detection device and weapon system, combined with its own performance, and described in probability based on historical experience (whether it has performed relevant target tasks in the past).
(2) the worth of enemy targets. The worth of enemy targets is determined in advance based on the information obtained beforehand, combined with the tactical plan available for reference. For example, destroying the enemy's command center can paralyze the enemy's operation, so the worth of the command center is relatively high, while striking the enemy's small warehouse has relatively low worth compared to the whole operation.
(3) the defense capability of enemy targets. Combined with the detected battlefield environment information and advance intelligence information, the enemy targets, such as surface-to-air missile positions, anti-aircraft gun positions, flight bases and ground radar are analyzed and quantified.
From the above analysis, the mission rewards of UAV can be defined as follows: where Reward i (T j ) is the rewards of a certain task numbered j performed by a UAV numbered i with a specific load, P i (T j ) is the completion capability of the UAV numbered i to perform tasks the task numbered j and Defence(T j ) is the defense capability of the target.
The reward of each UAV mission meets the additivity, and the decision variable of the mission is introduced as follows: x ij = 1, UAVi performs the task j 0, UAVi does not perform the task j , Then, the total rewards of multiple UAV formation's task execution on the target list are:

Task Cost Modeling
In the course of reconnaissance, attack or evaluation, UAV formation will encounter confrontational threats and terrain threats. Confrontational threats are mainly radar detection and fire attack by enemy targets during the execution of missions, so the defense capability of our UAV and the attack capability of enemy targets should be considered in task allocation. Terrain threats mainly involve the air-range cost of a UAV during its mission, terrain height and altitude change rate. On some terrain that is difficult to leap over, the UAV flying too high will consume too much energy, and the UAV climbing too-steep terrain is likely to impact the terrain and encounter other threats, so it should try to avoid the distribution results of too-long air-range or steep terrain. Based on the above analysis, the task-cost modeling in this section includes terrain-cost modeling and loss-cost modeling. At the same time, terrain cost and loss cost of different dimensions are normalized.
(1) Terrain Cost The air-range cost of a UAV reflects the duration of the mission, the length of the UAV and the energy consumption. The Euclidean distance between the UAV and the enemy target is used in this paper to estimate the air-range cost. This is because the formation cannot accurately obtain the relevant data before the actual execution of the mission.
where Pos x (U i ), Pos y (U i ) is the initial position coordinate of the UAV numbered i, Pos x (T i ), Pos y (T i ) is the position coordinate of the enemy target numbered i, Pos x T j , Pos y T j is the position coordinate of the enemy target numbered j. Equation (5) gives the Euclidean distance between the UAV numbered i and the task target numbered j, as well as the Euclidean distance between the task numbered i and the task numbered j. To facilitate calculation, the matrix Range is established as the distance matrix.
According to the above equation, Range is a matrix of (N UAV + N TARGET ) × N TARGET , which contains the Euclidean distance between UAV and enemy targets and the Euclidean distance between enemy targets. Thus, the total air-range of the UAV numbered i returning to its initial position after completing the corresponding task target list can be given as follows: where N represents the total number of tasks performed by UAV numbered i. Further analysis of Equation (7): when n = 0, Length(i) 0,1 represents the distance between the initial position of the UAV numbered i and the mission target numbered 1. When n = {1, 2, . . . , N − 1}, Length(i) n,n+1 represents the Euclidean distance between the No. n mission target, performed by the UAV numbered i, and the No.(n + 1) mission target. When n = N, Length(i) N, N+1 represents the distance of UAV numbered i returning to its starting point after completing the last mission target. Considering the impact of terrain height and altitude-change rate on the UAV's loss, terrain-steepness factor λ is introduced to the weighted UAV's track. Assuming that the battlefield environment can be known in advance before the execution of the mission, the value of the terrain-steepness factor λ is determined by combining the result of task assignment with the terrain information known in advance. In this paper, λ ranges from [1.2, 2.5] according to the complexity of the terrain.
(2) Loss Cost The loss cost of a UAV is mainly caused by the confrontational threats encountered by UAV in the execution of tasks. Therefore, the loss cost mainly considers the following factors: the worth of UAV with specific load, the defensive capability of UAV itself, and the attack capability of enemy targets. Combined with the significance of loss cost, it can be concluded from the above analysis: where LossCost ij is the loss cost of the UAV numbered i when executing the mission target numbered j, Strike(T j ) is the attack capability of the mission target numbered j, Worth(U i ) is the worth of UAV numbered i, Defense(U i ) is the defense capability of UAV numbered i.
Thus, it can be concluded that the task cost of multi-UAV formation in task allocation is as followed: x ij · LossCost ij LossCost max (9) Electronics 2022, 11, 1171 6 of 28 where LossCost max is the maximal element with the total loss cost of performing all tasks, λ i is the terrain steepness factor reflecting the complexity of different UAV flight paths, ω 1 and ω 2 respectively represent the UAV's track weight and loss weight and x ij is the task decision variables introduced in Section 2.2.

Modeling Constraints
(1) Constraints on Task Completion During the execution of the mission, UAVs involved in such tasks as strike and reconnaissance must perform all the missions, and there cannot be a situation where some tasks are not assigned, namely, where, i = {1, 2, . . . , N UAV } and j = {1, 2, . . . , N TARGET } represent the number of UAVs and the number of assigned tasks respectively.
(2) Constraints on Task Non-Redundancy During task execution, it must be ensured that each task can only be performed by a certain UAV, namely, (3) Constraints on UAV's Capability The number of task targets numbered j performed by the UAV numbered i should not exceed the maximum of its own task capability: where Load i represents the maximum number of tasks that UAV numbered i can execute.
(4) Constraints on UAV's Attack Range When task target assignment is completed, the air-range of the total mission performed by the UAV numbered i cannot exceed its maximal flight distance, namely: where λ i is the terrain steepness factor, and D(i) max is the maximal air-range of UAV numbered i. In order to transform constrained optimization problems into unconstrained problems, this section introduces the penalty function and designs an outpoint method to punish the consumption cost of infeasible solutions when infeasible solutions appear in the population. The constrained optimization method improves the efficiency of the task-assignment model. Therefore, Equation (1) is further updated, and the final total task fitness is as follows: Pu = 1, when task assignment meets constraints 0, when task assignment does not meet constraints , where δ is a positive number known as the penalty factor, and Pu is the penalty function.

Task-Assignment Coding
Constructing the mapping relationship between particle and feasible solution of cooperative-task assignment for multiple UAVs is the key step for the swarm intelligence algorithm to solve the optimization problem. In the collaborative-task assignment process for multiple UAVs, the response of the UAVs to enemy target reconnaissance, attack and evaluation tasks should be ensured, and the timing of tasks should also be ensured. That is, an enemy target must be reconnoitered before the attack task, and finally the effect evaluation, which embodies the double-layer coding meaning of particles on the taskassignment problem. Referring to reference [31], this paper adopts real vector coding. Suppose that the dimension of the particle is the number of the task target list, the subscript of the particle corresponds to the number of tasks, the integer part of the particle position is rounded up to indicate the number of UAVs and the fractional part of the particle position corresponds to the task order of UAVs from small to large. The mapping relationship between particle coding and task target assignment can be expressed by the following example. Suppose that three UAVs must perform nine tasks, and the task-assignment coding is shown in Figure 1.
where  is a positive number known as the penalty factor, and Pu is the penalty function.

Task-Assignment Coding
Constructing the mapping relationship between particle and feasible solution of cooperative-task assignment for multiple UAVs is the key step for the swarm intelligence algorithm to solve the optimization problem. In the collaborative-task assignment process for multiple UAVs, the response of the UAVs to enemy target reconnaissance, attack and evaluation tasks should be ensured, and the timing of tasks should also be ensured. That is, an enemy target must be reconnoitered before the attack task, and finally the effect evaluation, which embodies the double-layer coding meaning of particles on the taskassignment problem. Referring to reference [31], this paper adopts real vector coding. Suppose that the dimension of the particle is the number of the task target list, the subscript of the particle corresponds to the number of tasks, the integer part of the particle position is rounded up to indicate the number of UAVs and the fractional part of the particle position corresponds to the task order of UAVs from small to large. The mapping relationship between particle coding and task target assignment can be expressed by the following example. Suppose that three UAVs must perform nine tasks, and the task-assignment coding is shown in Figure 1. Figure 1. Mapping relationship between particle coding and UAV task assignment problem.

The Theoretical Analysis of Harmony Search
This section analyzes the performance of the HS algorithm from two aspects: exploration performance and iterative convergence performance.

Exploration Performance
The HS algorithm explores the solution space by improvisation, and its development is accomplished by updating the operation steps. In reference [30], the exploration ability of HS is measured by the evolution of the expected population variance with the number of iterations. However, only symmetric intervals [−a, a], a ∈ R are considered for analyzing the evolution of the group variables of HS and its exploration performance. Here, this paper gives results for asymmetric intervals, as shown in Theorem 1.
Proof. when x ∈ [−a, a], a ∈ R, the following result as Equation (17) is obtained, which had been done by S. Das et al. in literature [30].
where E(x) is the mathematical expected value of the variable x. In combination with the random selection operation expressed as Equation (20) in improvisation, Equations (18) and (19) are used to deduce the expressions of E(x new ) and E((x new ) 2 ).
where φ(R) is the probability density function of R. Then Equation (23) is obtained.

Convergence Performance
According to literature [30], bw = k · E(x) is obtained, so Then, the expected variance of the population variable x g in the g-th generation is: With increasing evolution generations, the increase in expected population variance gives the algorithm strong exploration ability. However, if only the exploration ability is considered, the HS algorithm will not converge in the last generation, that is, the HS algorithm keeps searching for new information, but does not get effective information at the end of the algorithm. Therefore, the convergence of the algorithm is a very critical problem. In the following, the convergence of the HS algorithm is analyzed from the expectation of population variance and expectation of population mean.
According to Equations (37) and (39), y = Mx + B is obtained. After the evolution of g-th generation, y g = M g x + (M g−1 + M g−2 + ··· + M 2 + M) · B can be obtained. Thus, the characteristic roots of the iterative equation can be obtained as This proof is complete.
From the proof of Theorem 2, the following conclusions are obtained.
(1) The parameter HMCR is actually the spectral radius of the iteration matrix, which directly determines the convergence of the iteration equation.
(2) When ε is close to 0 and parameter HMCR is close to 1, the algorithm has a fast convergence rate, which verifies the rationality of HMCR > 0.9.
(3) In order to find a balance between higher convergence accuracy and better exploration ability, the parameter bw can be adjusted to prevent the algorithm from falling into local optimal.

The Opposition-Based Learning Parameter-Adjusting Harmony Search (OLPDHS) Algorithm
Through the above analysis and discussion, on the one hand, E(var(x)), which is related to the original HM initialization, is the part of E(var(y)), so it is necessary to find a better initialization method to explore an effective offspring population for improving the algorithm's exploration. On the other hand, adjusting key control parameters' values is a good choice to obtain better fine-tuning properties and convergence speed.
The OLPDHS algorithm is proposed. On the one hand, the algorithm improves the initialization method. On the other hand, the fine-tuning properties and convergence speed of HS can be improved by adjusting parameter value. What the OLPDHS algorithm has in common with the classical HS algorithm is that both of them go through four processes: initializing the parameters involved in the algorithm, initializing the harmony memory, improvising a new harmony and updating the harmony memory. The difference between the two is the way of population initialization and the way of control parameter change.

The Opposite-Based Learning Strategy for Population Initialization
Opposite-based learning (OBL) was first proposed by Tizhooshin in the literature [33]. The main idea of OBL is to obtain a better approximated solution than the current candidate solution, by considering both an estimate and another estimate that is opposite to this estimate, so OBL expands the solution space and reduces the search time [34,35].
In the original HS algorithm, the initial population is generated in a random way. In order to improve the quality of the initial harmonic memory warehouse, this paper proposes a variety of alternative opposite learning strategies, which include five methods based on opposite learning strategies [36]. Method 1:

Parameter Adjustment Policy
In order to balance the exploration ability and development ability of the HS algorithm, it is necessary to adjust the key control parameter bw dynamically during the search process.
According to Theorem 2, the parameter bw (bw j = γ · E(x),x = (1/HMS)∑ HMS i=1 x i ) is dynamically adjusted as follows. To better detail bw, the more general expression for the average x of the group is given as Equation (40): where j is the jth dimension of the problem, D is the maximum dimension of the problem and x j represents the average value of the jth dimension variable. Then, get E(x l,j ) = E(x r,j ), here x r,j are randomly selected from HM and P(r = w) = 1 HMS , thus E(x r,j ) can be obtained as Equation (41).
Thus the result of the parameter bw is obtained.
It can be seen from Equation (42) that the dynamic adjustment of the parameter bw mainly depends on the square root of the mean value of the harmony vectors of the population after each update. In this way, it is easier to realize the actual operation of dynamic adjustment of the parameter bw. When the new solution is fine-tuned according to Equation (40) or Equation (41), the information of the current population is carried out by the parameter bw. The fine-tuning of the new solution has global guiding significance. However, the fine-tuning solution with the fixed value of the parameter bw does not consider the current population search situation. Additional calculation E x j after each evolution is required, which increases the calculation amount of the algorithm.

Simulation Experiments and Analysis of OLPDHS
In order to verify the rationality and effectiveness of the OLPDHS algorithm in this paper, MATLAB is used to test algorithm performance from different perspectives.

Influence of the Parameters HMCR, HMS and PAR on OLPDHS
Firstly, the influence of the parameter HMCR on the performance of the OLPDHS algorithm is analyzed by simulation. The unimodal function Sphere and the multimodal function Rastrigin are used as the test functions, and the parameters are set as HMS = 5, PAR = 0.5, bw j = γ · E(x), γ = E(x), the maximum number of evolutions G = 1000, the dimension of the function is 30, and the HMCR is changed dynamically from 0.1 to 0.99, proceeding through 0.1, 0.25, 0.5, 0.75, 0.9 and 0.99. MATLAB is used for simulation, and the relationship between the population variance expectation and the iterations number is obtained, as shown in Figure 2. It reflects that for the function Sphere and the function Rastrigin, when the values of parameter HMCR are different, the search capability of the OLPDHS algorithm changes over iterations. Figure 3 shows the relation between the Sphere function value and the Rastrigin function value obtained by the OLPDHS algorithm under different parameter HMCR values with the change of evaluation numbers, reflecting the influence of different parameter HMCR values on the convergence performance of the OLPDHS algorithm.    As can be seen from Figures 2 and 3, the value of HMCR has a great impact on the expected population variance and the convergence of the OLPDHS algorithm. With the increase in the value of HMCR, the smaller the expected population variance and convergence accuracy, the larger the value of HMCR and the better the convergence rate of the OLPDHS algorithm.
In addition, when HMCR values are 0.01, 0.25, 0.5, 0.75, 0.9, 0.95, and 0.99, the mean value and variance results of the two test functions are shown in Table 1 and Table 2, respectively. For the Sphere function, Table 1 shows that the smaller the value of HMCR, the worse the performance of the OLPDHS algorithm. When HMCR < 0.9, the results obtained by the OLPDHS algorithm have deviated significantly from the global optimal solution. When HMCR > 0.9, the OLPDHS algorithm quickly converges to or near the global optimal solution. For the multimodal function Rastrigin, Table 2 shows that the value of HMCR has a similar impact on the performance of the OLPDHS algorithm to that in the Sphere function. According to Tables 1 and 2, the convergence accuracy is better with the increase in HMCR value.  As can be seen from Figures 2 and 3, the value of HMCR has a great impact on the expected population variance and the convergence of the OLPDHS algorithm. With the increase in the value of HMCR, the smaller the expected population variance and convergence accuracy, the larger the value of HMCR and the better the convergence rate of the OLPDHS algorithm.
In addition, when HMCR values are 0.01, 0.25, 0.5, 0.75, 0.9, 0.95, and 0.99, the mean value and variance results of the two test functions are shown in Tables 1 and 2, respectively. For the Sphere function, Table 1 shows that the smaller the value of HMCR, the worse the performance of the OLPDHS algorithm. When HMCR < 0.9, the results obtained by the OLPDHS algorithm have deviated significantly from the global optimal solution. When HMCR > 0.9, the OLPDHS algorithm quickly converges to or near the global optimal solution. For the multimodal function Rastrigin, Table 2 shows that the value of HMCR has a similar impact on the performance of the OLPDHS algorithm to that in the Sphere function. According to Tables 1 and 2, the convergence accuracy is better with the increase in HMCR value. Then, the influence of parameter HMS and parameter PAR on the OLPDHS algorithm is analyzed through simulation experiments. Nine functions ( f 1 ∼ f 9 ) with different characteristics are selected to study the impact of parameters HMS and PAR on the performance of the OLPDHS.
The dimension of all functions is set to 30, the maximum number of iterations to 20,000, and each function is set to run 30 times independently. Table 3 Table 4. Effects of HMS for nine functions.        As observed in Table 3, none of the PAR values performs well for all of the test functions. When the value of PAR is in the interval [0.3, 0.7], the mean and SD of all the functions except function f 4 and function f 7 are relatively good. For function f 4 , the larger value of parameter PAR results in poor performance of the OLPDHS algorithm. For function f 7 , the values of PAR have little effect on the performance of the OLPDHS algorithm. By observing Figure 4 and comparing the optimization curves, it can be found that when PAR is in the interval [0.3, 0.7], the optimization curves of the OLPDHS algorithm are better. Thus, the value of PAR has an impact on the performance of the OLPDHS algorithm, and the value is determined based on the analysis of the specific optimization problem.
As can be seen from Table 4, except for the function f 4 (when the HMS value is large, the optimization result is better), the convergence accuracy of other functions deteriorates as the HMS value increases. As shown in Figure 5, except for function f 4 , the optimization curve obtained by other functions is better with the decrease in HMS value. Through analysis, it can be concluded that the value of HMS is appropriate in the interval [5,40].
Finally, the parameter value range of the OLPDHS algorithm to obtain better performance can be obtained through the above simulation analysis. The specific values of parameters should be determined by the optimization problem itself and the optimization model, which must be tested repeatedly in practice.

Performance Comparison of the OLPDHS Algorithm and Other HS Algorithms
In order to prove the efficiency of the OLPDHS algorithm, the OLPDHS algorithm is compared with the recently developed HS algorithm and its improved version (IHS [37], GHS [38], SAHS [39], SGHS [40], NGHS [41] NDHS [42], EHS [30] and ITHS [43]) by testing the unconstrained optimization problem. Related parameters are set as follows: (1) Figure 6 shows the convergence properties of 10 algorithms tested by 9 functions with 30 dimensions. In most cases, the OLPDHS algorithm has better convergence than the other nine HS algorithms. Except for functions f 4 and f 7 , the convergence of the OLPDHS algorithm is not as good as expected, while SGHS, NGHS and EHS algorithms have a better convergence trend. For functions f 3 and f 9 , the OLPDHS algorithm has a convergence no better than the ITHS algorithm, but the OLPDHS algorithm has better performance than other algorithms. On the whole, the OLPDHS algorithm has faster convergence than other HS algorithms.

Task-Allocation Simulation Experiment Based on OLPDHS
In order to fully verify the effectiveness of the task-allocation algorithm and model designed in this paper, three unmanned aerial vehicles with different loads were used to perform eight different mission objectives as a combat example for simulation verification. In the simulation, four algorithms are used to independently solve the task assignment model 10 times, respectively. The simulation environment and algorithm parameters are consistent with the Section 5.

Initial Information of Simulation
Battlefield space is a 100 km × 100 km square map. According to the description of the modeling process in Section 2, the terrain-steepness factor is

Task-Allocation Simulation Experiment Based on OLPDHS
In order to fully verify the effectiveness of the task-allocation algorithm and model designed in this paper, three unmanned aerial vehicles with different loads were used to perform eight different mission objectives as a combat example for simulation verification. In the simulation, four algorithms are used to independently solve the task assignment model 10 times, respectively. The simulation environment and algorithm parameters are consistent with the Section 5.

Initial Information of Simulation
Battlefield space is a 100 km × 100 km square map. According to the description of the modeling process in Section 2, the terrain-steepness factor is λ = 1.5, track weight and loss weight are ω 1 = 0.5, ω 2 = 0.5 and the penalty factor of the constraint term is δ = 5. At the same time, the maximum number of tasks that each UAV can perform is set to Load i = 3, where i represents the number of UAVs. The maximum range of the UAVs is set to D(i) max = 600 km. According to the task-allocation code in Section 2.5, the dimension of the allocation model is set to the task target number dim = 8, the scope of individual search is (0,3), the number of individuals in each algorithm is set to 10, and the iteration number is 1000. The heterogeneity of multiple UAV formations with different Electronics 2022, 11, 1171 22 of 28 loads is directly reflected by the capability value of performing a specific task. The specific parameters of UAVs and mission targets are shown in Tables 5 and 6. In order to visually display the battlefield environment and task-assignment process, this paper draws the battlefield environment through Visio 2017 software, as shown in Figure 7 below. search is (0,3), the number of individuals in each algorithm is set to 10, and the iteration number is 1000. The heterogeneity of multiple UAV formations with different loads is directly reflected by the capability value of performing a specific task. The specific parameters of UAVs and mission targets are shown in Tables 5 and 6. In order to visually display the battlefield environment and task-assignment process, this paper draws the battlefield environment through Visio 2017 software, as shown in Figure 7 below.

Simulation and Analysis
Four intelligent algorithms are used to independently solve the task-allocation model 10 times. The solution results are shown in Table 7, where Tar_fitness _final represents the final task fitness; the higher the fitness, the better the feasible solution iterated by the algorithm. P represents the success rate of the algorithm solution, and Mean is the average of the solution results obtained 10 times for each algorithm. Further analysis of Table 7 shows that the solving success rate of each algorithm is 100%, which verifies the effectiveness of the multi-UAV task-allocation model constructed in this paper. Meanwhile, the average fitness of 10 independent solutions of the OLPDHS algorithm is 1.5388, which is higher than the average fitness of the other three algorithms. Therefore, in solving task assignment problems, the optimization algorithm based on improved chaotic adaptive strategies designed in this paper is superior to the other three comparison algorithms. Figure 8 shows the fitness convergence process of four algorithms for solving the task assignment model. It can be seen from the figure that the OLPDHS algorithm can quickly converge to a relatively excellent fitness value in the early stage, and the fitness is higher than the other three algorithms in the late iteration, which further reflects the effectiveness of OLPDHS in solving the task assignment. In order to directly reflect the time consumption of the two algorithms, the tic and toc functions in MATLAB are used to obtain the time-consumption data in Table 8. Based on the data in the table, the time-consumption graph of the four algorithms after 10 iterations of independent operation is drawn in Figure 9. It can be seen from the figure that the SAHS algorithm has the lowest time consumption, while the NDHS algorithm has the highest time consumption. Although OLPDHS is more time-consuming, OLPDHS improves the fitness of feasible solutions of the task-allocation model, so the cost of sacrificing some time is acceptable.  In order to directly reflect the time consumption of the two algorithms, the tic and toc functions in MATLAB are used to obtain the time-consumption data in Table 8. Based on the data in the table, the time-consumption graph of the four algorithms after 10 iterations of independent operation is drawn in Figure 9. It can be seen from the figure that the SAHS algorithm has the lowest time consumption, while the NDHS algorithm has the highest time consumption. Although OLPDHS is more time-consuming, OLPDHS improves the fitness of feasible solutions of the task-allocation model, so the cost of sacrificing some time is acceptable.  In order to directly reflect the time consumption of the two algorithms, the tic and toc functions in MATLAB are used to obtain the time-consumption data in Table 8. Based on the data in the table, the time-consumption graph of the four algorithms after 10 iterations of independent operation is drawn in Figure 9. It can be seen from the figure that the SAHS algorithm has the lowest time consumption, while the NDHS algorithm has the highest time consumption. Although OLPDHS is more time-consuming, OLPDHS improves the fitness of feasible solutions of the task-allocation model, so the cost of sacrificing some time is acceptable. Figure 9. The average times of the four algorithms running independently 10 times. Table 8 shows the target allocation results corresponding to the optimal fitness obtained by running four algorithms for 10 times. Meanwhile, the allocation results are drawn on the basis of Figure 7 as shown in Figure 10.  Figure 9. The average times of the four algorithms running independently 10 times. Table 8 shows the target allocation results corresponding to the optimal fitness obtained by running four algorithms for 10 times. Meanwhile, the allocation results are drawn on the basis of Figure 7 as shown in Figure 10.

Related Work
There are many researchers dedicated to the study of task allocation, and many methods have been put forward. This section mainly summarizes the methods of task allocation in various fields and the achievements achieved in UAV task allocation.
The threshold-based method is used to solve task-allocation problems. In 2005, ex-

Related Work
There are many researchers dedicated to the study of task allocation, and many methods have been put forward. This section mainly summarizes the methods of task allocation in various fields and the achievements achieved in UAV task allocation.
The threshold-based method is used to solve task-allocation problems. In 2005, extreme teams, large-scale agent teams operating in dynamic environments, were on the horizon. Low-communication approximate distributed constraint optimization (LADCOP), a distributed threshold-based algorithm, was proposed to solve task-allocation problems in the domain of simulated disaster rescue. The tasks are perceived by the agents in the environment [44]. In 2007, swarm generalized assignment problem (Swarm-GAP) was used to experiment in the RoboCup rescue simulator [45]. In 2010, considering the various actors in the RoboCup rescue, Swarm-GAP, LADCOP and a greedy method were used to solve distributed task allocation among teams of agents in a RoboCup rescue scenario [46]. The results showed that the performance of Swarm-GAP and LADCOP are similar and that they outperform a greedy strategy. The threshold-based approach is applied to division of labor control for a robot group [47], but the task ordering is performed by a central command unit, which generates a significant number of messages. Swarm-GAP was adopted to deal with this problem and a new method with three algorithm variants was proposed in 2017 [48]. This method effectively prevented some agents from being overloaded with tasks while others remained idle. The aforementioned researchers aimed at the optimization of their resource usage applied in the context of static environments. In 2020, Amorim J C evaluated the performance of three swarm-GAP variants in dynamic contexts, and extended these algorithms to properly address more realistic dynamic scenarios [49].
The market-based method is applied to task-allocation problems. Task specification trees (TSTs) as a highly expressive specification language for complex multiagent tasks were used. Meanwhile, a sound and complete distributed heuristic search algorithm for allocating the individual tasks in a TST to platforms was proposed in 2010 [50]. A decentralized distributed solution approach based on multi-agent systems (MAS) to manage emergency vehicles was developed, and a multi-agent architecture to fit real emergency systems was proposed. A more refined and efficient auction mechanism based on implicit agents' coordination was examined to coordinate agents to reach good quality solutions in a distributed manner [51]. For the multi-robot dynamic task-allocation problem, multiobjective optimization (MOO) was used to estimate and subsequently make an offer for its assignment. That is, after task detection, an auction took place amongst robots capable of executing it. Robots calculated their bid using MOO [52].
There are also other works that propose methods for allocating tasks. ZORLU [22] considered the UAV load constraints and modeled the problem as a CVRP model with load constraints. Shima T [53] proposed the CMTAP model for UAVs, which divided the task types into identification, attack and damage assessment. The time sequence between tasks, execution time of tasks and multi-UAVs cooperative constraints were added into the model. Min Yao [54] designed a collaborative multi-task assignment model for UAV groups that is suitable for multiple UAVs, multi-task targets and multi-task types. Wen Gu [55] proposed a solution to UAV target allocation problem based on the Hungarian algorithm, and Jin Zhang et al. [56] extended the Hungarian algorithm to multi-target allocation. In addition, [57] used the MILP method to solve multi-UAV target allocation, and other optimization methods include dynamic programming and graph theory.
This current paper considers the type of task, the UAV load constraints, multiple UAVs, multi-task targets and multi-task types, features that are similar to previous studies in the literature [21,53,54]. Unlike these studies, this paper builds a model from two aspects of the rewards and costs of performing tasks.

Conclusions
In this paper, the complex task set was decomposed into sub-tasks suitable for a single UAV, and then the task-allocation problem was described and defined from the aspects of task rewards and task cost. The UAV's own constraints and mission constraints were defined. The mathematical model of task assignment was established, being more suitable for actual battlefield environment. The OLPDHS algorithm was proposed through discussion of exploration performance and convergence performance and its parameter values (HMCR > 0.9, PAR is in the interval [0.3, 0.7], HMS is appropriate in [5,40]) were given via a number of simulation experiments. In comparison with IHS, GHS, SAHS, SGHS, NGHS, NDHS, EHS and ITHS, the superior performance of OLPDHS was proven by a number of simulation experiments. A multi-UAV cooperative task-allocation example was designed to verify the superior performance of OLPDHS (Fitness is 1.5491, time cost is 0.0819). This is the first application of HS to the multi-UAV cooperative taskallocation problem. These performance qualities are ideal for helping decision-makers devise allocation schemes.
In the environment or in the system itself at runtime, real-world scenarios abound with the aforementioned dynamism. It is necessary for cooperative systems to deal with this dynamism to keep the execution and results level. Future work must study the dynamic redistribution problem, which is closer to real-world scenarios, focusing on how to establish the redistribution problem model in a dynamic environment, so that decision makers can make effective decisions in the battlefield's changeable environment.