1. Introduction
With the continuous development of information technology, unmanned aerial vehicle (UAV) systems have been widely applied in various complex scenarios, such as remote monitoring [1,2], target recognition [3], data collection [4,5], and urban logistics [6,7]. To improve the task execution efficiency of multiple UAVs in complex and dynamic scenarios, efficient scheduling has become one of the important research directions in intelligent systems and UAV collaborative technology [8].
One strategy for solving the UAV task allocation problem is to regard the task allocation process of the UAV system as a dynamic decision-making process of a multi-agent system, as in Deep Reinforcement Learning methods [9]. In this approach, the UAV system can autonomously perceive the environment and respond accordingly. Jingyi Guo et al. [10] modeled the policy function using Graph Neural Networks and Attention Mechanisms, successfully solving the problem of multi-UAV cooperative multi-target task allocation. Peng Pengfei et al. [11] proposed an Evolutionary Reinforcement Learning algorithm based on a Deep Q-Network, which effectively addressed the uncertainty in the optimal solution space of task allocation. Bo Zhang et al. [12] employed an improved MAPPO algorithm to optimize the search and rescue routes of multiple UAVs in disaster-stricken environments, significantly reducing UAV energy consumption and enhancing rescue efficiency. Le Han et al. [13] optimized DQN using the particle swarm optimization (PSO) algorithm, generating smoother and more efficient paths for UAVs. Although Deep Reinforcement Learning methods show clear advantages in complex, dynamic, and uncertain scenarios, their high performance comes with challenges such as slow training convergence and high control complexity.
The dynamic task allocation problem for UAVs can be regarded as a typical combinatorial optimization problem [14]. Metaheuristic algorithms can usually obtain approximate or high-quality solutions quickly, within a limited time and with low computational complexity, which makes them well suited to multi-objective task allocation problems. Cong Rui et al. [15] proposed an improved TS-NSGA-II based on a Genetic Algorithm (GA), which provides an effective solution for UAVs performing cooperative detection tasks in complex environments with multiple constraints. Chen Xin et al. [16] used the Ant Colony Algorithm and the Greedy Algorithm to solve the task allocation and path planning problems in multi-UAV route planning, respectively, obtaining optimal solutions. Jingling Wang et al. [17] proposed a novel Discrete Non-Dominated Sorting Algorithm to solve the delivery task allocation problem. Zexian Huang et al. [18] enhanced the optimization capability of multi-UAV cooperative path planning algorithms by using an improved Grey Wolf Optimization algorithm. Gang Huang et al. [19] minimized UAV flight distance and achieved obstacle avoidance in UAV trajectory planning using their proposed Two-Stage Cooperative Co-Evolution Multi-Objective Evolutionary Algorithm (TSCEA). Senlin Liu et al. [20] employed an improved Multi-Objective Grey Wolf Optimization algorithm to optimize a heterogeneous multi-UAV cooperative multi-task allocation model, demonstrating the proposed algorithm's high convergence and diversity in addressing this problem. Pei Zhu et al. [21] integrated the MP-GWO and NSGA-II algorithms to optimize multi-objective firefighting task allocation and path planning for multiple UAVs in dynamic forest fire environments. These studies illustrate the flexibility and simplicity of optimization algorithms in solving various task allocation problems.
Dynamic multi-objective task allocation problems impose high requirements on optimization algorithms, including rapid responsiveness to environmental changes, the ability to continuously approximate the time-varying Pareto front, and the capability to escape local optima. PSO, with its fast convergence, strong local search capability, simple principles, few parameters, and high compatibility, is an excellent candidate for solving dynamic multi-objective task allocation problems. For instance, Ming Yan et al. [22] demonstrated the strong anti-interference capability and high efficiency of an improved GA-PSO algorithm in solving dynamic multi-UAV task allocation problems in marine environments. Gao Yang et al. [23] improved the convergence and distribution of the traditional MOPSO algorithm using a Monte Carlo resampling method, achieving excellent performance in multi-UAV task allocation scenarios. Xiaolong Zheng et al. [24] developed an Evolutionary Multi-tasking Optimization algorithm based on the classical PSO algorithm, providing a good solution for Multi-Task Optimization (MTO) problems. Jian-feng Wang et al. [25] proposed an improved Multi-Objective Quantum-Behaved Particle Swarm Optimization algorithm, which successfully optimized a complex four-objective task allocation model for heterogeneous UAVs. Gurwinder Singh et al. [26] combined the PSO and AOQPIO algorithms into an optimization approach that effectively addresses UAV task allocation and path planning under dynamic wind conditions, significantly improving convergence speed for this dynamic problem. Shuyue Liu et al. [27] modeled how UAV swarms acting as temporary mobile base stations can simultaneously ensure coverage and communication quality, and used Tent-PSO for path planning, significantly enhancing the performance of the solution set. Ary Shared Rosas-Carrillo et al. [28] proposed a PSO algorithm with adaptive inertia weight, successfully optimizing UAV reconnaissance task allocation during volcanic eruptions. Yu Chen et al. [29] used an improved PSO algorithm to plan tourist routes in scenic areas based on point cloud data collected by UAVs. Ying Zeng [30] proposed an improved MOPSO algorithm to optimize a UAV Cooperative Air Combat Route Planning model, demonstrating the algorithm's convergence, diversity, and robustness.
Although the particle swarm optimization algorithm is susceptible to issues such as becoming trapped in local optima and producing an unevenly distributed solution set when solving dynamic multi-objective task allocation problems, its simple structure, few parameters, and strong scalability give it significant potential for improvement in such scenarios. To address these challenges, this paper proposes a dynamic multi-objective particle swarm optimization algorithm based on a centroid-shift prediction strategy (DCMPSO). The algorithm introduces a range prediction mechanism based on the historical movement of the solution set's centroid, thereby enhancing adaptability to dynamic environments. In terms of population updating, DCMPSO integrates a GA-based crossover and mutation strategy as well as a novel multi-population cooperative velocity and position update mechanism. Additionally, a dynamic parameter adjustment strategy is designed to handle environmental changes. For external archive maintenance, a Euclidean distance-based projection pruning mechanism is introduced, which enhances solution diversity and robustness while maintaining convergence, thus improving the distribution quality of the solution set. In terms of modeling, this paper adopts a dedicated encoding and decoding approach to construct a multi-objective task allocation model for dynamic UAV scenarios. Algorithmic validation is conducted on this model, and the results demonstrate that DCMPSO exhibits excellent adaptability and robustness in solving dynamic multi-objective task allocation problems involving multiple UAVs.
The structure of this paper is organized as follows:
Section 1 introduces the research background and reviews related literature;
Section 2 elaborates on the structure and innovative mechanisms of the proposed DCMPSO algorithm;
Section 3 conducts ablation and comparative experiments to verify the algorithm’s effectiveness;
Section 4 constructs a realistic multi-UAV dynamic task allocation scenario and applies DCMPSO to solve it;
Section 5 provides a summary and discussion of the study.
2. Dynamic Multi-Objective Particle Swarm Optimization Algorithm Based on Centroid-Shift Range Prediction
This section introduces the proposed DCMPSO algorithm.
Section 2.1 presents the concept and principles of PSO.
Section 2.2 describes the environment change detection mechanism.
Section 2.3 designs the environment change response mechanism.
Section 2.4 focuses on the design of the actual optimization process. Finally,
Section 2.5 presents the overall structure of the algorithm.
2.1. Particle Swarm Optimization Algorithm
The particle swarm optimization algorithm is a typical swarm intelligence optimization algorithm proposed by Kennedy and Eberhart in 1995, inspired by the foraging behavior of bird flocks. Based on this idea, PSO simulates a group of “particles” moving through the solution space to search for the optimal solution. Each particle represents a potential solution and has two key attributes: position and velocity, corresponding to the solution and its search direction. During the iterative process, each particle updates its state based on its own experience (i.e., the best position it has found, known as pBest) and the experience of the swarm (i.e., the best position found by any particle in the population, known as gBest). By leveraging swarm intelligence, the particles collectively converge toward the global optimal position, thereby identifying the best solution.
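For reference, a minimal Python sketch of this canonical update is shown below; the inertia weight w and the learning factors c1 and c2 are illustrative values rather than parameters prescribed in this paper.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, bounds=(0.0, 1.0)):
    """One canonical PSO update for a swarm of positions x and velocities v (shape N x D)."""
    r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # velocity update
    x = np.clip(x + v, bounds[0], bounds[1])                    # position update, kept in bounds
    return x, v
```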
2.2. Environmental Change Detection
The purpose of environmental change detection is to accurately identify the time points of environmental changes in discrete dynamic environments, thereby providing guidance for the subsequent response mechanisms. The algorithm determines whether the environment has changed at generation n by comparing the objective function values of the population in the objective space between generation n−1 and generation n.
Unlike other environmental change detection mechanisms, DCMPSO calculates an environmental correlation coefficient between consecutive environments once a change is detected. Specifically, it employs the Pearson correlation coefficient, denoted as corr, which ranges from −1 to 1. The coefficient is calculated as a weighted combination of two correlation components, $corr_f$ and $corr_x$, as defined in Equation (1), i.e., $corr = \omega_1\,corr_f + \omega_2\,corr_x$.
Here, $corr_f$ denotes the correlation coefficient of the objective function values of the current population between the $i$-th and $(i+1)$-th discrete sub-environments, whose calculation is given in Equation (2); $corr_x$ represents the correlation coefficient of the fronts obtained by the algorithm in the decision space between the $(i-1)$-th and $i$-th sub-environments, calculated as shown in Equation (3); and $\omega_1$ and $\omega_2$ are the weighting coefficients of the two correlation components, respectively.
In Equation (2), $f_{m,k}^{i}$ and $f_{m,k}^{i+1}$ represent the function values of the $m$-th objective for the $k$-th individual in the final generation of the population, evaluated under the $i$-th and $(i+1)$-th sub-environments, respectively; $\bar{f}_{m}^{i}$ and $\bar{f}_{m}^{i+1}$ denote the average function values of the $m$-th objective over all individuals in the population under the $i$-th and $(i+1)$-th discrete sub-environments, respectively; $n$ represents the number of individuals used to compute the linear correlation coefficient; and $M$ denotes the number of objectives.
In Equation (3), $x_{d,k}^{i-1}$ and $x_{d,k}^{i}$ represent the values of the $k$-th individual in the final generation of the population on the $d$-th dimension of the decision space under the $(i-1)$-th and $i$-th sub-environments, respectively; $\bar{x}_{d}^{i-1}$ and $\bar{x}_{d}^{i}$ denote the average values of the $d$-th dimension over all individuals in the population within the decision space under the $(i-1)$-th and $i$-th sub-environments, respectively; $n$ again denotes the number of individuals used to compute the linear correlation coefficient, and $M$ is the number of objectives. When $i = 1$, there is no historical environment available for reference, so the weight coefficient $\omega_1$ in the first sub-environment of the algorithm's execution is set to 1, i.e., $\omega_1 = 1$ and $\omega_2 = 0$.
The weighted correlation coefficient corr is used to assess the predictability of the true Pareto front (the set of all optimal solutions in the objective space that are not dominated by any other feasible solution, forming a curve or surface known as the true Pareto front) in the $(i+1)$-th sub-environment, by jointly considering the linear correlation of the objective space between the $i$-th and $(i+1)$-th sub-environments and the linear correlation of the decision space between the $(i-1)$-th and $i$-th sub-environments.
Algorithm 1 presents the pseudocode of the environment change detection mechanism.
Algorithm 1: Pseudocode of the environment change detection mechanism
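As a concrete illustration of the detection and correlation steps, the following Python sketch compares re-evaluated objective values and combines two Pearson components into a weighted coefficient. It is a minimal sketch under assumptions made here (per-objective and per-dimension Pearson coefficients averaged, the names corr_f and corr_x, and the weights w1 and w2), not a reproduction of Equations (1)–(3) or Algorithm 1.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two equally sized 1-D arrays."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    denom = a.std() * b.std()
    return 0.0 if denom == 0 else float(np.mean((a - a.mean()) * (b - b.mean())) / denom)

def change_detected(F_old, F_new, tol=1e-12):
    """Flag an environmental change when re-evaluated objective values differ."""
    return not np.allclose(F_old, F_new, atol=tol)

def weighted_corr(F_i, F_next, X_front_prev, X_front_i, w1=0.5, w2=0.5):
    """Assumed form of corr = w1*corr_f + w2*corr_x.

    F_i, F_next: objective values (n x M) of the same final population evaluated
                 under the i-th and (i+1)-th sub-environments.
    X_front_prev, X_front_i: decision-space fronts (n x D, equal sizes assumed)
                 of the (i-1)-th and i-th sub-environments (None when i = 1).
    """
    corr_f = np.mean([pearson(F_i[:, m], F_next[:, m]) for m in range(F_i.shape[1])])
    if X_front_prev is None:           # i = 1: no historical environment
        return corr_f                  # w1 is set to 1, w2 to 0
    corr_x = np.mean([pearson(X_front_prev[:, d], X_front_i[:, d])
                      for d in range(X_front_i.shape[1])])
    return w1 * corr_f + w2 * corr_x
```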
2.3. Environment Change Response Mechanism Based on Centroid Shift Prediction
After an environmental change, it is necessary to perturb the population to prevent it from falling into local optima. However, simple random reinitialization of the population tends to slow down the convergence speed of the algorithm and exhibits poor robustness. To enhance the algorithm's performance in dynamic environments, this section proposes a range prediction strategy based on centroid shift to respond to environmental changes. In addition, the exponential weighted moving average method [31] is adopted to predict the number of iterations for which the next sub-environment will persist, serving as a guide for the dynamic adjustment of parameters during the optimization process.
2.3.1. Range Prediction Strategy Based on Centroid Shift
By employing the range prediction strategy based on centroid shift designed in this work, the possible location range of the true Pareto front in the decision space of the next sub-environment can be predicted. The population in the current environment is then initialized within this range with a certain degree of randomness, which can accelerate the optimization speed of the population under the new environment and enhance the robustness of the algorithm, thereby adapting to rapidly changing dynamic environments.
Figure 1 illustrates the schematic diagram of the proposed range prediction strategy based on centroid shift.
When an environmental change occurs, i.e., when the algorithm transitions from the $i$-th to the $(i+1)$-th discrete sub-environment, a translation vector $\Delta C_i$ is generated from the centroid position $C_{i-1}$ of the population in the decision space of the $(i-1)$-th discrete sub-environment and the centroid position $C_i$ of the $i$-th sub-environment according to Equation (4), i.e., $\Delta C_i = C_i - C_{i-1}$.
Using the positions of the current generation's individuals in the decision space as reference points, a range whose width is $\varepsilon$ times the length of the vector $\Delta C_i$ is generated. Each individual $x_k^j$ of the current generation is reinitialized within a range centered at itself, defined by the vector interval $[x_k^j - \frac{\varepsilon}{2}\Delta C_i,\ x_k^j + \frac{\varepsilon}{2}\Delta C_i]$, producing the new individual positions $x_k^{j+1}$ in the new environment. Subsequently, the positions of these newly generated individuals in the decision space are checked, and any particles violating the constraints are restricted to the boundaries of the decision space. Equation (5) gives the calculation formula for the positions $x_k^{j+1}$ of the newly generated individuals in the decision space.
where $x_k^{j+1}$ and $x_k^{j}$ represent the position vectors of the $k$-th individual in the population at generations $j+1$ and $j$ in the decision space, respectively; rand is a random number in the interval $[0, 1]$; and $N$ denotes the total number of individuals in the population. The value of $\varepsilon$ changes according to the environmental linear correlation coefficient corr. When corr lies in the upper range, it indicates a strong linear correlation in both the decision space and the objective space between the current sub-environment and its two adjacent sub-environments, reflecting good predictability; in this case, a smaller $\varepsilon$ is used for precise prediction. When corr lies in the intermediate range, the linear correlation is weaker and the predictability is moderate; therefore, a larger $\varepsilon$ is used to expand the prediction range and prevent premature convergence to local optima. When corr falls outside both ranges, the true Pareto front in the next sub-environment is difficult to predict, and the algorithm adopts a random initialization method to generate new population individuals to cope with the environmental change.
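The sketch below illustrates one possible implementation of the centroid-shift range prediction under the assumptions stated above: a range of width ε·ΔC centered on each individual, uniform sampling within it, illustrative thresholds on corr, and clipping to the decision-space bounds. The exact form of Equation (5) and the thresholds used in the paper may differ.

```python
import numpy as np

def centroid_shift_reinit(X, centroid_prev, centroid_curr, corr, lb, ub,
                          eps_small=0.2, eps_large=0.6, corr_hi=0.8, corr_lo=0.5):
    """Reinitialize population X (N x D) for the next sub-environment.

    centroid_prev / centroid_curr: decision-space centroids of the (i-1)-th and
    i-th sub-environments. Thresholds and epsilon values are illustrative only.
    """
    if abs(corr) < corr_lo:                       # poor predictability: random restart
        return np.random.uniform(lb, ub, size=X.shape)
    delta_c = centroid_curr - centroid_prev       # translation vector (Equation (4))
    eps = eps_small if abs(corr) >= corr_hi else eps_large
    rand = np.random.rand(*X.shape)               # uniform in [0, 1]
    X_new = X + (rand - 0.5) * eps * delta_c      # range of width eps*|delta_c| centered at X
    return np.clip(X_new, lb, ub)                 # repair boundary violations
```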
2.3.2. Exponential Weighted Moving Average Method for Predicting Sub-Environment Iteration Count
The prediction mechanism based on the exponential weighted moving average (EWMA) method is used to estimate the number of iterations each discrete sub-environment lasts. During the algorithm's execution, if an environmental change is detected, the duration $T_i$ of the current, i.e., the $i$-th, discrete sub-environment is recorded. Based on the durations of the historical sub-environments $T_1, T_2, \ldots, T_i$, the EWMA method [31] is employed to predict $\hat{T}_{i+1}$. Equation (6) presents the formula for calculating $\hat{T}_{i+1}$ using EWMA.
where $\alpha$ is the smoothing factor that controls the influence of historical data on the prediction result. Equation (7) shows the recursive expansion of Equation (6).
As shown by the recursive expansion, when the smoothing factor assigns a large weight to the most recent observation, the weights of older observations decay quickly, enabling the prediction mechanism to respond rapidly to the latest changes; this setting is suitable for rapidly changing data. Conversely, when more weight is retained by older historical environments, the weights decay slowly and the prediction curve becomes smoother, which is more appropriate for regularly changing data with small variation amplitudes. The value of $\alpha$ can be adjusted according to the nature of the problem.
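A minimal sketch of the duration prediction is given below; it assumes the common EWMA convention T_hat(i+1) = α·T(i) + (1 − α)·T_hat(i), seeded with the first observed duration, which may differ in detail from Equation (6).

```python
def ewma_predict(durations, alpha=0.5):
    """Predict the next sub-environment duration via EWMA.

    Assumed convention: T_hat(i+1) = alpha * T_i + (1 - alpha) * T_hat(i),
    seeded with the first observed duration.
    """
    t_hat = float(durations[0])
    for t in durations[1:]:
        t_hat = alpha * t + (1 - alpha) * t_hat
    return t_hat
```

For example, ewma_predict([40, 50, 45], alpha=0.6) returns about 45.4, weighting the most recent sub-environment durations most heavily.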
Algorithm 2 presents the pseudocode for the environmental change response component.
Algorithm 2: Centroid-translation-based range prediction
2.4. The Actual Optimization Component of the Algorithm
The actual optimization component is responsible for the optimization process of DCMPSO within a single discrete sub-environment. In this section, a hybrid population update strategy combining GA-based crossover and mutation with PSO-based velocity and position update is proposed as the updating mechanism during the population’s iterative process in the decision space. A dynamic adjustment scheme for certain parameters during population updating is designed to balance the convergence and diversity of the algorithm. Finally, an external archive maintenance method based on Euclidean distance projection pruning is developed.
2.4.1. GA-Based Crossover and Mutation Update Strategy
The single velocity and position update strategy in MOPSO often leads to excessive convergence, causing the population to easily fall into local optima. In contrast, the crossover and mutation update strategies in the GA enhance population diversity and distribution, and improve the exploration capability in the solution space. Therefore, this study introduces a GA-based population update strategy that incorporates two-point crossover and polynomial mutation to enhance population distribution and facilitate better exploration of the solution space.
When a population update operation is required during the algorithm's iteration process, a selection pressure $p_s$ is introduced and a random number rand is generated within the range $[0, 1]$. Equation (8) determines the update strategy for the current generation by comparing the values of rand and $p_s$.
When the random number rand is less than the selection pressure $p_s$, the current generation updates its positions in the decision space using the GA's two-point crossover and polynomial mutation operators; otherwise, the population is updated using the PSO velocity and position update strategy.
To enable the actual optimization component to better explore the solution space, accelerate population convergence, and enhance population diversity, we design a dynamic variation scheme for the selection pressure $p_s$ within a single sub-environment. Equation (9) describes the dynamic variation of $p_s$ in the $(i+1)$-th sub-environment.
where $p_{s,\max}$ and $p_{s,\min}$ denote the maximum and minimum values of the selection pressure, respectively; $t$ is the iteration counter of the algorithm within the $(i+1)$-th sub-environment; and $\hat{T}_{i+1}$ is the predicted number of iterations the $(i+1)$-th sub-environment will last. Equations (8) and (9) show that the algorithm applies the GA's crossover- and mutation-based population updates more frequently in the early stage of a single sub-environment to enhance exploration of the solution space, while in the later stage it favors the PSO velocity and position update strategy to accelerate convergence.
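The hybrid update choice can be sketched as follows; the linear decay of the selection pressure from ps_max to ps_min over the predicted duration is an assumption standing in for Equation (9), while the threshold rule mirrors Equation (8).

```python
import random

def selection_pressure(t, t_hat, ps_max=0.9, ps_min=0.1):
    """Assumed linear decay of the selection pressure over the predicted
    sub-environment duration t_hat (an Equation (9) analogue)."""
    frac = min(t / max(t_hat, 1), 1.0)
    return ps_max - (ps_max - ps_min) * frac

def choose_update(t, t_hat):
    """Return 'GA' early in a sub-environment (exploration) and 'PSO' later
    (convergence), following the rand-vs-p_s rule of Equation (8)."""
    return 'GA' if random.random() < selection_pressure(t, t_hat) else 'PSO'
```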
Figure 2 illustrates the schematic diagram of the two-point crossover operation. First, two parent individuals, parent 1 and parent 2, with relatively better fitness values are selected from the population using binary tournament selection. Two crossover points are then randomly chosen on their position vectors in the decision space. The gene segments between these two points are exchanged to generate two offspring individuals, child 1 and child 2.
The generated offspring have a certain probability of undergoing polynomial mutation. For a mutated offspring, each decision-variable dimension is perturbed with a mutation probability $p_m$, causing local variation near the original value. The perturbation follows a polynomial probability distribution with distribution index $\eta_m$, which controls the magnitude of the mutation. After mutation, the offspring are projected back into the feasible decision space to ensure that all variables remain within the predefined boundary constraints.
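A compact sketch of the two operators is given below; the mutation probability pm and distribution index eta are illustrative values, and the polynomial mutation uses a standard simplified formulation rather than a form taken from this paper.

```python
import numpy as np

def two_point_crossover(p1, p2):
    """Exchange the gene segment between two random cut points of two parents
    (assumes at least three decision variables)."""
    d = len(p1)
    a, b = sorted(np.random.choice(range(1, d), size=2, replace=False))
    c1, c2 = p1.copy(), p2.copy()
    c1[a:b], c2[a:b] = p2[a:b].copy(), p1[a:b].copy()
    return c1, c2

def polynomial_mutation(x, lb, ub, pm=0.1, eta=20.0):
    """Polynomial mutation with per-dimension probability pm.
    lb, ub: per-dimension lower/upper bound arrays."""
    y = x.copy()
    for d in range(len(x)):
        if np.random.rand() < pm:
            u = np.random.rand()
            if u < 0.5:
                delta = (2 * u) ** (1 / (eta + 1)) - 1
            else:
                delta = 1 - (2 * (1 - u)) ** (1 / (eta + 1))
            y[d] = np.clip(y[d] + delta * (ub[d] - lb[d]), lb[d], ub[d])
    return y
```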
2.4.2. Multiple Population Update Strategies of Particle Swarm Optimization
When rand > $p_s$, the algorithm updates the population individuals using the velocity and position update strategy of particle swarm optimization (PSO). In traditional PSO, there is only a single population during iteration, which means there is only one global best solution guiding the update. When this global best solution becomes trapped in a local optimum, the other individuals in the population tend to follow and become trapped as well. To address this, this study designs a multi-swarm velocity and position update strategy, as illustrated in Figure 3.
According to Figure 3, the algorithm uniformly divides the population into multiple subpopulations. Particles of the same color belong to the same subpopulation, and solid particles indicate the best solutions within their respective subpopulations. Each subpopulation possesses its own local best solution, which guides the updates of individuals within that subpopulation. Multiple local best solutions help prevent all particles from being guided by a single global best solution and thus avoid premature convergence to local optima. This approach increases the population's distribution and diversity without compromising the convergence ability of the algorithm. Equation (10) gives the velocity and position update formula.
where $gBest_s^n$ denotes the global best solution of the $s$-th subpopulation at generation $n$, and $pBest_{s,i}^n$ denotes the historical best solution of the $i$-th individual within the $s$-th subpopulation at generation $n$. The coefficients $c_1$, $c_2$, and $\omega$ represent the learning factors and the inertia weight, which vary dynamically with the iteration number within a single sub-environment. Their dynamic variation in the $(i+1)$-th sub-environment is described by Equations (11) and (12).
where $\hat{T}_{i+1}$ denotes the predicted number of iterations the $(i+1)$-th sub-environment will last; $c_{1,ini}$ and $c_{2,ini}$ represent the initial values of the learning factors $c_1$ and $c_2$, respectively; $c_{1,fin}$ and $c_{2,fin}$ denote their final values; and $t$ is the iteration counter within the current sub-environment. Equation (11) indicates that, in the early stage of each sub-environment, the algorithm uses a larger individual learning factor $c_1$ and a smaller social learning factor $c_2$, relying more on each particle's own experience to explore the solution space more broadly. In the later stage, a smaller $c_1$ and a larger $c_2$ are used to rely more on the guidance of the local best solution within each subpopulation, accelerating convergence toward the algorithm's Pareto front.
where $\omega$, $\omega_{\max}$, and $\omega_{\min}$ denote the adaptive inertia weight, the maximum inertia weight, and the minimum inertia weight, respectively. The term $f_k$ represents the fitness value of the current individual, computed as the normalized average of its objective function values over all objectives, and $f_{\min}$ and $f_{\max}$ denote the minimum and maximum fitness values within the current population, respectively. As shown in Equation (12), the inertia weight $\omega$ adopts an adaptive variation strategy that is independent of the environmental change cycle: it varies according to the convergence status of the current particle. If the particle shows good convergence, the inertia weight decreases to promote exploitation; if convergence is poor, the inertia weight increases to encourage further exploration of the search space for better solutions.
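The sketch below combines the multi-swarm velocity and position update with the dynamic learning factors and adaptive inertia weight described above; the linear c1/c2 schedule and the fitness-proportional form of ω are assumptions standing in for Equations (10)–(12).

```python
import numpy as np

def learning_factors(t, t_hat, c1_ini=2.5, c1_fin=0.5, c2_ini=0.5, c2_fin=2.5):
    """Assumed linear schedule: c1 decreases and c2 increases over the predicted
    sub-environment duration t_hat (an Equation (11) analogue)."""
    frac = min(t / max(t_hat, 1), 1.0)
    return c1_ini + (c1_fin - c1_ini) * frac, c2_ini + (c2_fin - c2_ini) * frac

def adaptive_inertia(fit, fit_min, fit_max, w_min=0.4, w_max=0.9):
    """Assumed adaptive inertia weight: smaller for well-converged (low-fitness)
    particles, larger for poorly converged ones (an Equation (12) analogue)."""
    if fit_max == fit_min:
        return w_max
    return w_min + (w_max - w_min) * (fit - fit_min) / (fit_max - fit_min)

def subswarm_update(X, V, pbest, sub_gbest, sub_id, fitness, t, t_hat, lb, ub):
    """One multi-swarm PSO step: each particle follows its own pbest and the best
    solution of its subpopulation (sub_gbest[sub_id[k]]) instead of a single gbest."""
    c1, c2 = learning_factors(t, t_hat)
    f_min, f_max = fitness.min(), fitness.max()
    for k in range(X.shape[0]):
        w = adaptive_inertia(fitness[k], f_min, f_max)
        r1, r2 = np.random.rand(X.shape[1]), np.random.rand(X.shape[1])
        V[k] = (w * V[k] + c1 * r1 * (pbest[k] - X[k])
                + c2 * r2 * (sub_gbest[sub_id[k]] - X[k]))
        X[k] = np.clip(X[k] + V[k], lb, ub)
    return X, V
```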
2.4.3. External Archive Maintenance
The solutions stored in the external archive serve as the final representation of the solution set, so both their convergence and their diversity are crucial. Therefore, the proposed DCMPSO implements a Euclidean distance-based projection pruning method to update and maintain the external archive Rep, which improves the distribution of the solution set in the objective space. Taking the two-dimensional objective space as an example (see Figure 4), the method calculates the pairwise Euclidean distances among all individuals in the critical layer, identifies the two individuals with the greatest Euclidean distance, and connects them with a line segment. This segment is then divided evenly to produce $k$ endpoints, where $k$ is the number of particles still needed to fill the new external archive. All other individuals in the critical layer are projected onto this line segment, and those whose projections are closest to each endpoint are retained in the external archive Rep, thus completing the maintenance of the external archive.
The Euclidean distance-based projection pruning method has advantages over crowding-distance-based external archive maintenance in low-dimensional objective spaces. Therefore, the proposed DCMPSO algorithm adopts the crowding distance method to maintain the external archive when the number of objectives satisfies $M \geq 3$, and uses the Euclidean distance-based projection pruning method when $M = 2$.
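A bi-objective sketch of the projection pruning step follows; it assumes that the k endpoints include the two extremes of the segment and that each endpoint retains the individual whose projection lies closest, which matches the description above but not necessarily every implementation detail of the paper.

```python
import numpy as np

def projection_prune(F, k):
    """Select k individuals from a bi-objective critical layer F (n x 2) by
    projecting all members onto the segment joining the two most distant ones."""
    n = F.shape[0]
    if k >= n:
        return list(range(n))
    dist = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
    a, b = np.unravel_index(np.argmax(dist), dist.shape)   # farthest pair
    seg = F[b] - F[a]
    seg_len2 = float(seg @ seg)
    if seg_len2 == 0.0:                                    # degenerate layer
        return list(range(k))
    t = np.clip((F - F[a]) @ seg / seg_len2, 0.0, 1.0)     # scalar projections in [0, 1]
    targets = np.linspace(0.0, 1.0, k)                     # k evenly spaced endpoints
    chosen, used = [], set()
    for tgt in targets:
        for idx in np.argsort(np.abs(t - tgt)):            # nearest free projection
            if idx not in used:
                used.add(int(idx))
                chosen.append(int(idx))
                break
    return chosen
```

Because the endpoints are evenly spaced along the segment joining the two extreme solutions, the retained archive members tend to be evenly distributed along a bi-objective front.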
Algorithm 3 presents the pseudocode of the actual optimization process in DCMPSO.
Algorithm 3: Population individual update
2.5. Algorithm Structure
As shown in Figure 5, the structure of the proposed algorithm is divided into three main components: environmental change detection, environmental change response, and the actual optimization process. The algorithm first performs environmental change detection, which includes detecting environmental changes and calculating the weighted linear correlation coefficient. It determines whether an environmental change has occurred by monitoring whether the objective function values of the same population differ across two consecutive generations. Once a change is detected, it computes the weighted linear correlation coefficient between two adjacent environments to assess the predictability of the environmental change.
The environmental change response module includes a centroid-shift-based range prediction strategy and an exponentially weighted moving average-based iteration prediction method. The centroid-shift-based prediction strategy estimates the potential location of the true Pareto front in the upcoming environment by utilizing the centroid shift vector of the population’s front and the corresponding weighted correlation coefficient in previous environments. The population is then reinitialized within the predicted range to respond to environmental changes. Meanwhile, the EWMA-based prediction method estimates the number of iterations that the next sub-environment will last based on the observed duration of past sub-environments. This prediction is used to guide dynamic parameter adjustments in the actual optimization process.
The actual optimization component involves the design of population update strategies, dynamic parameter adjustment, and external archive maintenance. In terms of population update, the algorithm divides the population into multiple subpopulations to reduce the likelihood of premature convergence caused by velocity–position update mechanisms in standard PSO. It further integrates the crossover and mutation strategies of the GA with the velocity–position updates from PSO. Additionally, adaptive mechanisms are designed for certain parameters to balance convergence and diversity in dynamic environments. For external archive maintenance, a novel Euclidean distance-based projection truncation method is proposed for bi-objective optimization problems to maintain the diversity and quality of archived solutions.
Algorithm 4 is the pseudocode of the complete algorithm.
Algorithm 4: Pseudocode of the complete algorithm
5. Conclusions and Discussion
This paper proposes a dynamic multi-objective particle swarm optimization algorithm, DCMPSO, based on centroid shift prediction, in response to the limitations of traditional particle swarm optimization (PSO) algorithms in solving dynamic multi-objective task allocation problems for UAVs. The DCMPSO framework comprises three main components: environmental change detection, environmental change response, and actual optimization. The centroid shift prediction mechanism designed for the detection and response modules addresses the issues of premature convergence, poor robustness, and slow convergence typically encountered by PSO in dynamic environments. In the actual optimization phase, the GA-based crossover and mutation update strategy, combined with dynamic parameter adjustment, effectively balances convergence and diversity within each sub-environment, reducing the likelihood of local optima and significantly enhancing the overall algorithmic performance.
We conducted systematic ablation and comparative experiments. The ablation study demonstrates that the centroid-based range prediction strategy improves performance by approximately 17% on complex benchmark functions, while the GA-based crossover and mutation strategy yields an additional improvement of about 17% in similarly challenging problems. The comparative experiments show that DCMPSO outperforms two conventional algorithms, DNSGA-II and SGEA, with improvements in convergence and diversity ranging from approximately 37% to 70%. Subsequently, we applied DCMPSO to optimize a dynamic multi-objective task allocation model for multiple UAVs. The resulting solutions clearly demonstrate the robustness and flexibility of DCMPSO in real-world applications. In conclusion, the experimental results fully indicate that DCMPSO achieves superior convergence and diversity in dynamic multi-objective optimization problems and holds strong engineering potential in UAV task allocation under dynamic conditions.
Future research on this topic may focus on several aspects. First, in the environmental change response phase, nonlinear modeling techniques based on deep learning, such as Deep Neural Networks (DNNs) or Recurrent Neural Networks (RNNs), could be introduced to achieve more accurate predictions of the true Pareto front. Second, in terms of the task allocation scenario, more realistic three-dimensional geographic environment models could be constructed, and additional dynamic environmental factors, such as moving obstacles and variations in wind speed or direction, could be incorporated to extend the model's applicability to a wider range of real-world scenarios, such as post-disaster reconnaissance in natural disaster areas and wildlife protection reconnaissance tasks. Finally, given DCMPSO's strong solution diversity and distribution capabilities, it could also play a greater role in multimodal optimization problems, for example by generating multiple feasible approximate solutions for UAV task allocation.