1. Introduction
In recent years, major disasters have struck frequently around the world, including in Ecuador, Turkey, Afghanistan, Tibet, and Myanmar. During the golden 72 h rescue window, adverse conditions such as heavy rain and nighttime darkness hinder rescuers from rapidly obtaining comprehensive situational information about the disaster-stricken area, prolonging the rescue response time. Meanwhile, complex terrain composed of collapsed buildings and broken roads compels ground rescue teams to perform repeated surveys and detours, further reducing rescue efficiency. A fixed-wing Unmanned Aerial Vehicle (UAV), taking advantage of its long-endurance flight, wide-area coverage capability [1], high flight stability [2], and strong environmental adaptability [3], enables continuous surveying of post-disaster areas. This avoids interruptions in data collection caused by frequent recharging trips, thereby providing continuous dynamic data support for rescue decision-making. Equipped with a Light Detection And Ranging (LiDAR) device, it can output stable and reliable point cloud data under harsh conditions, allowing rescuers to grasp the overall disaster situation in the affected area. Based on these capabilities, a fixed-wing UAV equipped with LiDAR can continuously cover disaster-stricken areas at night and in heavy rain, providing technical support for All-Weather Post-Disaster Coverage Path Planning (PDCPP).
The Sequential Path Coverage (SPC) algorithm [4] enables reliable, complete coverage of post-disaster areas through continuous, parallel, non-overlapping scanning. However, the turning capability of a fixed-wing UAV during transitions between adjacent scan lines is bounded by its flight dynamics, which are governed by factors such as allowable bank angle, climb angle, and airspeed. Meanwhile, the spacing between neighboring coverage paths is determined by the LiDAR focal length and flight altitude; in many cases, this spacing is smaller than the UAV's feasible turn radius [5]. This mismatch forces the aircraft to execute multiple consecutive curved maneuvers when shifting from one parallel path to the next, which increases the turning distance and energy consumption and reduces task timeliness. Reference [6] studied the Traveling Salesman Problem-Coverage Path Planning (TSP-CPP) in post-disaster scenarios, proposed a mixed-integer programming formulation suitable for this scenario, and introduced a CPP method for covering polygonal areas. The validity and performance of the proposed approach were confirmed through rigorous theoretical analysis and simulation studies. In that study, the fixed-wing UAV determined the optimal scanning direction by calculating the supporting parallel lines of a convex polygonal area and generated parallel, non-intersecting coverage paths. Reference [7] focused on agricultural operation scenarios and proposed a coverage path generation scheme for fixed-wing UAVs, applicable to arbitrary polygonal areas. Its core idea is to account for wind field interference with the fixed-wing UAV's ground track so as to minimize operation time. In this approach, coverage paths are represented as parallel strips, and the method further identifies the optimal alignment of these strips, together with the entry priority and trajectory orientation of each strip. However, it uses only the straight-line distance between path endpoints when calculating turning distance and does not fully account for the kinematic constraints and flight safety limits of a fixed-wing UAV, which cannot perform "point-to-point" straight-line turns; this significantly deviates from real operational scenarios. Similarly, references [4,6,7] do not account for the maneuvering limitations imposed on a fixed-wing UAV during operations in post-disaster regions.
Based on the correlation between the minimum maneuvering radius and the sensor detection range, reference [8] proposed a staggered reciprocating flight mode. While this mode improved the turning efficiency of the fixed-wing UAV to some extent, it relied on a fixed access sequence and lacked a global optimization mechanism, which can easily lead to redundant flight paths. Additionally, as the number of paths increased, the size of the resulting TSP instance grew quadratically, significantly extending path planning time. Reference [9] used the SPC algorithm to generate initial flight paths and then optimized the regional traversal order using a Genetic Algorithm (GA) to shorten the turning distance. However, the initial GA population was mostly generated randomly, resulting in low individual quality, too few excellent solutions, restricted convergence speed, and compromised global search performance. It also led to high individual redundancy, making it difficult to fully cover the optimal solution region. This intensified the randomness in the early stage of evolution, making the algorithm prone to falling into local optima. Reference [10] attempted to introduce heuristic rules to improve population initialization. However, this method generated only some of the components heuristically, while the rest were still generated randomly, resulting in insufficient population diversity. Thus, it failed to overcome the local-optimum limitation caused by frequent turns along short-distance paths.
Reference [11] achieved initial CPP by combining the minimum-span algorithm with a round-trip path generation algorithm, and realized global path optimization by improving the crossover operator in the Dubins-based Enhanced Genetic Algorithm (DEGA). However, this single crossover operator slowed the algorithm's convergence and significantly increased the probability of generating infeasible solutions in constrained or combinatorial optimization problems, which impaired the algorithm's performance. Reference [12] proposed matching multiple crossover operators to two parent chromosomes and constraining the sum of all operators' usage probabilities to 1. However, this method used an equal-probability selection strategy, failing to adjust dynamically according to operator characteristics or the population's evolutionary state. This leads to insufficient offspring diversity and traps the algorithm in local optima.
Deep Q-Network (DQN)-based reinforcement learning (RL) techniques, along with other RL approaches, have also been investigated in previous studies [13]. For instance, using a Markov Decision Process model, a DQN-based deep RL method was designed for search and rescue path planning. Reference [14] adopted an RL-based strategy to solve the problem, realizing path planning via training across diverse environments. To address the instability of the Deep Deterministic Policy Gradient (DDPG) algorithm when applied to fixed-wing UAV path planning, reference [15] proposed a multi-critic delayed DDPG algorithm. However, in practical applications, relying solely on RL for decision-making is easily affected by the distribution of training samples, and its convergence stability in multi-constraint scenarios urgently needs improvement.
To overcome the limitations of GA in generating high-quality initial populations, the excessive turning energy consumption of SPC, and the slow convergence associated with single crossover operators, the main contributions of this study are as follows:
This study constructs a mixed-integer programming model to optimize the PDCPP problem of the fixed-wing UAV. To minimize energy expenditure during flight, the model incorporates constraints including turning angle and endurance, enabling All-Weather rescue and supporting efficient disaster relief operations;
To address the energy consumption issue in PDCPP, this study proposes the Multi-Selector Genetic Algorithm-Reinforcement Learning (MSGA-RL) algorithm, which features distance-priority initialization, multi-selector crossover, and an RL-based Elite Archive to avoid local optima and enhance convergence;
This study conducts simulation experiments in a post-disaster rescue environment, selecting convex quadrilateral and pentagonal areas as representative task scenarios. The results indicate that MSGA-RL significantly reduces energy consumption compared with benchmark algorithms. Further ablation experiments verify the effectiveness of each improvement strategy in minimizing energy consumption. Considering that algorithmic stability is critical for reliable mission execution, a boxplot-based analysis is performed on the multi-selector crossover operator in MSGA-RL, demonstrating its superior stability.
The paper is organized as follows:
Section 2 introduces the materials and methods of the PDCPP.
Section 3 details the MSGA-RL algorithm.
Section 4 presents simulation experiments and performance evaluation.
Section 5 concludes with a summary and future research directions.
2. Materials and Methods of the PDCPP
In this study, to focus on validating the effectiveness of the proposed MSGA-RL algorithm in typical post-disaster coverage scenarios, several simplifying assumptions are made regarding the environment and UAV operations. Specifically, the following simplifying assumptions are considered:
- 1. Communication failures, including loss of command and telemetry channels, are not considered.
- 2. Extreme meteorological conditions, such as strong winds or turbulence, are not considered.
- 3. The proposed MSGA-RL algorithm focuses on task-level and path-level offline planning and does not directly involve low-level flight control or real-time trajectory tracking.
These simplifications allow the study to first evaluate the algorithm’s performance under standardized and controlled conditions, while more complex operational factors are left for future work.
The symbols and parameters used in the MIP model in this study are presented in Table 1.
As shown in Figure 1, the fixed-wing UAV equipped with a LiDAR [16] device performs PDCPP starting from the take-off point. It avoids obstacles, covers the target area, and then returns to the take-off point, providing rescuers with terrain information of the disaster-stricken region and significantly reducing rescue time. According to reference [17], disaster-affected areas can often be modeled as convex polygons. Non-convex disaster-affected regions can be convexified—while ensuring complete coverage—using methods such as the convex hull algorithm [18], partition-based convexification [19], or buffer-based convexification [20]. Taking a pentagon as an example, as illustrated in Figure 1, the vertices of the convex pentagon are defined in counterclockwise order in the Cartesian coordinate system, where no three consecutive vertices are collinear. Circular obstacles within the mission area are indexed i = 1, 2, 3.
Figure 2 presents the schematic of a fixed-wing UAV outfitted with a LiDAR system executing PDCPP. Let l represent the flight altitude and ABCD the projection area, whose short and long sides bound the sensing footprint. In this paper, the scanning width of the fixed-wing UAV is defined as the short side length of the sensing area (as shown in Figure 2) to avoid potential edge blurring in captured images. This ensures full coverage of the target area without missing edge details.
During All-Weather rescue missions, when flight speed and payload are fixed, the UAV's minimum maneuvering radius represents the key constraint. Figure 3 depicts the relevant motion characteristics and variables, from which the relationship in Equation (1) can be obtained. The variables in Equation (1) are the lift experienced by the fixed-wing UAV, the maximum roll angle, the minimum maneuvering radius, the flight speed, and the gravitational acceleration. The formula for the minimum maneuvering radius can then be expressed as follows [21]:
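As a concrete illustration, the following sketch assumes the standard coordinated-turn relation r_min = v² / (g · tan φ_max); the original Equation (2) is not reproduced here, so this form is an assumption consistent with the variables listed above, and the function name is illustrative.

```python
import math

def min_turn_radius(v, phi_max_deg, g=9.81):
    """Minimum maneuvering radius of a coordinated level turn.
    Assumes the standard relation r_min = v^2 / (g * tan(phi_max)),
    consistent with the variables listed for Equation (1)."""
    phi = math.radians(phi_max_deg)
    return v ** 2 / (g * math.tan(phi))

# e.g., 25 m/s airspeed with a 30-degree maximum roll angle
r = min_turn_radius(25.0, 30.0)  # ≈ 110 m
```

At typical survey airspeeds this radius easily exceeds the strip spacing set by the sensing footprint, which is the mismatch discussed below.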
SPC is a classic method for solving the PDCPP problem. It is particularly suitable for early-stage application scenarios with low real-time performance and hardware requirements. With sequential traversal of sub-regions as its core logic, SPC requires no complex optimization calculations. The path is highly deterministic: once the sub-region division is complete, the traversal order, turning points, and other elements are fixed. This results in high predictability, facilitating advance planning and risk prediction. The SPC algorithm determines the optimal flight direction based on the minimum span direction [22]. By traversing the boundary of the task area, it calculates the maximum vertical distance from each vertex to each edge. The smallest of these maximum vertical distances is recorded as the minimum span, and the corresponding edge is regarded as the optimal flight direction for the fixed-wing UAV. Figure 4 shows a schematic diagram of the optimal flight direction, where each side connects a pair of adjacent vertices, the blue edge denotes the minimum width of the area, and the red edge indicates the optimal flight direction for the fixed-wing UAV. The minimum number of scan paths required to cover the entire area is then given by Equation (3). The intersection points of each scan line with the boundary of the target area are calculated, and the resulting set is defined as F(i), which is given in Equation (4).
In Equation (3), w represents the scanning width of the fixed-wing UAV, the minimum span is the minimum width of the target area, and the ceiling function (rounding up) ensures that the scanning paths fully cover the boundary of the area.
In Equation (4), the elements of F(i) are the coordinates of all intersection points between the i-th scanning path and the area boundary.
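A minimal sketch of the minimum-span criterion and Equation (3) follows; `min_span` and `num_scan_paths` are illustrative names, and the perpendicular-distance formula assumes a convex polygon with vertices listed in counterclockwise order.

```python
import math

def min_span(vertices):
    """Minimum width of a convex polygon: for each edge, take the maximum
    perpendicular distance from any vertex to that edge, then keep the
    smallest such value (the SPC optimal-flight-direction criterion)."""
    n = len(vertices)
    best = float("inf")
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        edge_len = math.hypot(x2 - x1, y2 - y1)
        # point-to-line distance via the 2D cross product
        far = max(abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / edge_len
                  for (x, y) in vertices)
        best = min(best, far)
    return best

def num_scan_paths(span, w):
    """Equation (3): N = ceil(span / w) strips of width w cover the area."""
    return math.ceil(span / w)
```

For a 2 × 1 rectangle, for example, the minimum span is 1, so a scanning width of 0.3 requires 4 strips.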
The SPC algorithm requires the fixed-wing UAV to fly along paths in sequential order. When the UAV switches between adjacent paths, it must detour through multiple curved segments because it cannot turn instantaneously, which leads to additional energy consumption. Based on this, the turning process between adjacent paths can be divided into two scenarios, each with a targeted optimization [23], as illustrated in Figure 5.
Figure 5 illustrates two regional coverage strategies, where the yellow line segments indicate the turning trajectories of the fixed-wing UAV beyond the boundaries of the mission area. The turning flight distance is calculated as follows, where d is the span between adjacent flight paths and the remaining terms are determined by the minimum maneuvering radius and the turn geometry.
SPC achieves PDCPP by generating continuous scanning flight paths. However, there is a flaw in the trajectory switching phase. Due to the kinematics of the fixed-wing UAV, the UAV’s minimum maneuvering radius often exceeds the spacing between adjacent flight paths. This compels the UAV to perform multiple detours. These redundant detours increase energy consumption per unit distance and reduce task timeliness, which is unfavorable for rescue operation implementation.
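To make the radius-spacing mismatch concrete, the sketch below (with assumed helper names) computes the simple U-turn length when the strip spacing is at least twice the turn radius, and builds a staggered visiting order in the spirit of the staggered reciprocating mode of reference [8]; transitions between staggered blocks may still require wider turns.

```python
import math

def u_turn_length(spacing, r_min):
    """Length of a U-turn between two parallel strips separated by `spacing`,
    assuming spacing >= 2*r_min: two quarter-circle arcs of radius r_min
    joined by a straight segment of length spacing - 2*r_min."""
    if spacing < 2 * r_min:
        raise ValueError("spacing below 2*r_min needs a multi-arc detour")
    return math.pi * r_min + (spacing - 2 * r_min)

def staggered_order(n_strips, d, r_min):
    """Visit strips in a staggered order so that consecutive transitions
    within a block span at least 2*r_min."""
    k = max(1, math.ceil(2 * r_min / d))  # minimum strip offset per transition
    order = []
    for offset in range(k):
        order.extend(range(offset, n_strips, k))
    return order
```

With d = 20 m and r_min = 30 m, every transition must skip at least two strips; the staggered order for 7 strips becomes [0, 3, 6, 1, 4, 2, 5].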
3. The MSGA-RL Method
In All-Weather coverage scenarios, high energy consumption has become a bottleneck restricting continuous operations. To achieve full-area coverage, a fixed-wing UAV needs to maintain long-endurance flight in complex environments. However, flight paths with continuous small-angle sharp turns lead to a sharp increase in energy consumption, making it difficult to meet the requirements of uninterrupted All-Weather operations. The post-disaster UAV coverage path planning (PDCPP) problem can be formalized as a variant of the Generalized Traveling Salesman Problem (GTSP). In PDCPP, each coverage strip can be regarded as a "cluster," and the optimization objective is to determine the visiting sequence of all coverage clusters such that the UAV completes the entire scanning mission while minimizing turning energy consumption. Specifically, given the set of coverage strips, the task can be interpreted as selecting one representative endpoint from each cluster and determining the visiting order that minimizes cumulative turning costs. This formulation aligns exactly with the definition of GTSP, where each cluster must be visited exactly once.
Compared with standard TSP solvers, MSGA-RL offers the following advantages:
- Multi-strategy genetic operations: by employing multi-selector crossover and adaptive mutation, MSGA-RL can flexibly explore the complex solution space of coverage strip sequences.
- Reinforcement learning-based individual retention mechanism: high-quality individuals are preserved to prevent the loss of superior paths, accelerating convergence and reducing energy consumption.
- Adaptation to complex obstacle environments: traditional TSP methods typically assume fixed distances between nodes, whereas in PDCPP the UAV must avoid obstacles. MSGA-RL can optimize the visiting sequence while ensuring obstacle avoidance.
Therefore, MSGA-RL is particularly well-suited for this GTSP variant and can substantially reduce UAV turning energy and improve mission efficiency in complex post-disaster environments.
Therefore, this study selects “minimum energy consumption” as the optimization objective. Reducing redundant turns and optimizing attitude adjustments can maximize the coverage duration and range of the fixed-wing UAV in a single flight, thereby avoiding mission interruptions caused by insufficient endurance.
In Equation (6), the objective accumulates the minimum flight energy consumed by the fixed-wing UAV along the Dubins-feasible curve connecting consecutive scan strips. Each scan strip has two intersection points with the area boundary; the connection point of the i-th strip is the endpoint linked to the next strip, and the connection point of the (i+1)-th strip is the endpoint linked to the previous strip. In Equation (7) [24], P denotes the turn power, which depends on the turn angle and the turn slope angle. Equation (8) describes the power consumption of the fixed-wing UAV during turning flight; its two constants are related to the UAV's parameters and environment (estimated from weight, wing area, air density, etc.) [24], and the remaining variable is the flight velocity.
Furthermore, in the context of the path connection and energy consumption control model for post-disaster rescue scenarios, Equation (6) is subject to the constraints listed below:
Equation (9) indicates that each path point is visited exactly once, with a binary decision variable indicating whether a path point is selected; here, i denotes the serial number of the flight path, and k represents the serial number of the intersection point between the flight path and the area boundary. Equation (10) specifies that the start point or end point of each path must have exactly one incoming connection from the end point or start point of another path, where the decision variables refer to the intersection points between the flight paths and the area boundary. Equation (11) ensures that the total distance-based power consumption of the fixed-wing UAV does not exceed its maximum battery capacity; in this equation, the path distance is weighted by the UAV's power consumption per unit distance and bounded by the UAV's maximum battery capacity. Equation (12) represents the total turning distance of the fixed-wing UAV within the mission area.
To effectively address the high energy consumption issue of the fixed-wing UAV in the All-Weather PDCPP scenario, this section proposes the MSGA-RL algorithm. The specific innovations are as follows:
Population Initialization: The algorithm applies a distance-priority heuristic during population initialization to strengthen diversity. This heuristic further enhances its ability to explore the search space effectively.
Crossover Strategy: The algorithm employs a multi-selector crossover operator. This approach allows it to achieve faster convergence. It also helps maintain a diverse set of candidate solutions.
Individual Retention Mechanism: The algorithm integrates an RL-based retention mechanism with an Elite Archive. This integration reduces the likelihood of premature convergence. It also helps maintain high-quality individuals throughout the search process.
3.1. Generation of Initial Population
To improve the quality of initial individuals, enhance population diversity, and strengthen the algorithm’s initial exploration capability, this study proposes a distance-priority heuristic greedy initialization strategy.
Within the general population, each chromosome is defined as a specific sequence of target points, and its generation integrates two distinct approaches, namely a distance-prioritized heuristic and a greedy algorithm. The distance-prioritized heuristic functions as follows. Initially, all target points are sorted in ascending order of their Euclidean distance from the initial point. Subsequently, a partial segment of the chromosome is generated according to this sequence. Once this partial segment has been constructed, the remainder of the chromosome is completed by a greedy algorithm. More precisely, let the chromosome length equal the total number of task points and let the population size be fixed; then, for each individual in turn, the length of the segment generated according to the heuristic rule is given as follows:
For the remaining task points not included in the heuristic sequence, the greedy algorithm generates the subsequent segment. The specific steps are as follows. Set the last task point in the heuristic segment as the initial starting point, then begin the iterative construction phase. At each step, select from the set U of currently unvisited tasks the task with the minimum transfer cost from the current task, following the formula below.
The optimal task is the one minimizing the transfer cost C between the current task and a candidate task (the transfer cost is given by Equation (6)). After selection, the chosen task becomes the new current task and is removed from the unvisited set U. This process is repeated until all remaining task points are visited. The chromosome segment formed by this greedy path is then concatenated after the heuristic sequence to form a complete individual chromosome.
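A minimal sketch of the distance-priority plus greedy initialization, assuming Euclidean distance as a stand-in for the Equation (6) transfer cost; `heuristic_len` (at least 1) plays the role of the per-individual heuristic segment length, and the function name is illustrative.

```python
import math

def init_individual(points, start, heuristic_len):
    """Build one chromosome: the first `heuristic_len` genes follow
    ascending distance from `start`; the rest are appended greedily by
    nearest-neighbour cost (Euclidean stand-in for Equation (6))."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    # distance-priority segment: task indices sorted by distance from start
    ranked = sorted(range(len(points)), key=lambda i: dist(points[i], start))
    chrom = ranked[:heuristic_len]
    unvisited = set(ranked[heuristic_len:])
    # greedy completion: repeatedly append the cheapest unvisited task
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist(points[chrom[-1]], points[j]))
        chrom.append(nxt)
        unvisited.remove(nxt)
    return chrom
```

Varying `heuristic_len` across individuals, as the heuristic rule prescribes, yields a diverse yet high-quality initial population.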
3.2. Fitness Function
The choice of the fitness function plays a crucial role in determining both the convergence efficiency of the algorithm and its ability to identify the global optimum. In this study, the problem is fundamentally formulated as minimizing energy consumption, with the objective defined by the energy cost function in Equation (6). Therefore, the fitness function is constructed by negating the energy function in Equation (6), ensuring compatibility with the maximization-based optimization mechanism of the MSGA-RL algorithm. The corresponding fitness function is given in Equation (15).
3.3. Adaptive Crossover Operator
As a core operator for exploring new solution spaces, the crossover operator is indispensable. However, most crossover methods are designed for general purposes and lack heuristic guidance tailored to specific problems, making it difficult for offspring to effectively inherit feasible or high-quality features related to problem characteristics from their parents [25]. In the MSGA-RL algorithm, this study proposes a multi-selector heuristic crossover operator to facilitate the search process.
The adopted crossover operators include the segment rearrangement operator, the gene recombination operator, the three-point crossover operator, the distance-priority operator, and the cost-driven insertion operator guided by common subpaths. The specific operations of these crossover operators are as follows.
Segment Rearrangement Operator: Randomly select a continuous gene sequence in the chromosome, shuffle the order of genes within this segment, and then reinsert the shuffled segment into its original position. It features strong local perturbation, simple and efficient operation, and can quickly break local optimal solutions. For example, in post-disaster rescue paths, the problem of local detours can be optimized by rearranging the order of a segment of continuous path points.
Gene Recombination Operator: Randomly extract multiple genes from the chromosome, rearrange them to form a new gene segment, and then insert it into a randomly selected position to generate a new chromosome individual. Gene recombination has high flexibility, making it easier to produce diverse individuals. The randomness of the insertion position further broadens the population search space, helping to reduce the likelihood of the algorithm becoming trapped in local optima.
Three-Point Crossover Operator: According to the length of the chromosome, three random crossover points are determined, defining two crossover intervals between consecutive points. New offspring are created by exchanging genes within these crossover intervals. If gene duplication occurs after the exchange, the necessary gene swaps are performed in the non-crossover intervals to eliminate the duplicates. The multi-interval exchange balances local optimization and global recombination; the randomness of crossover positions enhances population diversity, and the controllable exchange logic prevents the loss of high-quality gene segments.
Distance-Priority Operator: After selecting a chromosome segment, reorder the task numbers in the segment based on the distance between each path point in the segment and the preceding chromosome (sorted from nearest to farthest). Distance constraints avoid the blindness of random rearrangement, enabling efficient optimization of excessively long detours between adjacent path points in post-disaster rescue.
Cost-Driven Insertion Operator Guided by Common Subpaths: Randomly select a common subpath containing three consecutive target points from the two parent paths. For the remaining unvisited cities, calculate the comprehensive cost of each candidate insertion position using the fitness function, and continuously select the position with the minimum comprehensive cost for insertion until all cities are visited. The cost-driven insertion logic balances objectives such as energy consumption and safety, generating paths that better meet the requirement of minimizing the comprehensive cost in post-disaster rescue.
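As one interpretation of the first operator above, the segment rearrangement operation might be sketched as follows (the function name and use of Python's `random` module are illustrative):

```python
import random

def segment_rearrangement(chrom, rng=random):
    """Shuffle a random contiguous gene segment in place, leaving the
    genes outside the segment untouched."""
    n = len(chrom)
    i, j = sorted(rng.sample(range(n), 2))  # segment boundaries, i < j
    segment = chrom[i:j + 1]
    rng.shuffle(segment)
    return chrom[:i] + segment + chrom[j + 1:]
```

The result is always a permutation of the parent, so path feasibility (each strip visited once) is preserved by construction.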
In the algorithm initialization phase, each crossover operator is assigned an initial score, and its weight is updated according to the following formula:
Here, each operator's score indicates its current performance, and its weight is the relative proportion of that score among all crossover operators' scores. In each generation, the algorithm selects the crossover operator via a roulette mechanism based on the operators' weights. This allows the choice of crossover strategy to be adjusted dynamically, with operator selection driven by the fitness improvement of newly generated individuals relative to the best individuals of the previous generation.
According to Equation (17), the score of the operator that produced the i-th new individual is adjusted as follows: the largest increment is added when the new individual's fitness surpasses the current best; a medium increment is added when it surpasses the previous generation's best but not the current best; and the smallest increment is added in all remaining situations. The weights of all operators are then recalculated from the updated scores, and the roulette selection mechanism is adjusted accordingly. This updated mechanism guides the selection of crossover operators in subsequent iterations.
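The score-and-roulette mechanism can be sketched as follows; the increment values `r_high`/`r_mid`/`r_low` are assumed placeholders rather than the paper's actual settings, fitness is treated as maximized (Equation (15)), and the function names are illustrative.

```python
import random

def select_operator(scores, rng=random):
    """Roulette-wheel selection: each operator's weight is its score's
    proportion of the total (Equation (16))."""
    total = sum(scores.values())
    r = rng.uniform(0, total)
    acc = 0.0
    for op, s in scores.items():
        acc += s
        if r <= acc:
            return op
    return op  # numerical fallback

def update_score(scores, op, new_fit, best_fit, prev_best_fit,
                 r_high=3.0, r_mid=2.0, r_low=1.0):
    """Reward the operator according to how its offspring compares with
    the incumbent bests (assumed increments)."""
    if new_fit > best_fit:
        scores[op] += r_high
    elif new_fit > prev_best_fit:
        scores[op] += r_mid
    else:
        scores[op] += r_low
```

Because scores only grow, every operator keeps a nonzero selection probability, which preserves exploration while favoring recently successful operators.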
3.4. Individual Retention Mechanism Combined with RL
Once adaptive crossover and mutation operations have been performed, it is necessary to determine whether the newly generated individuals should be added to the offspring population. While introducing only individuals with better performance (based on fitness) can accelerate convergence, it also reduces population diversity, ultimately leading the algorithm to fall into local optimal solutions. When the algorithm’s search process shows a tendency to deviate from the region of potential optimal solutions, the algorithm should promptly identify this deviation, filter out ineffective search directions and dynamically adjust the search range. To address this issue, this study proposes a memory pool mechanism based on RL, which introduces an agent that decides whether to retain an individual based on the performance of the currently optimized individual. Decisions regarding actions are made by the Agent using only the current search state, satisfying MDP requirements. Therefore, RL methods can be used to train and optimize the agent.
In practical application scenarios, the Agent continuously executes decision-making actions at different iteration stages of the algorithm, based on the current search state of the population (changes in the population’s optimal fitness and individual diversity). It obtains rewards through the interaction results with the evolutionary environment, and after multiple iterations, gradually learns the optimal individual screening strategy to achieve a higher expected return. Since this study aims to use RL to assist individual retention in PDCPP, priority is given to RL methods that have a simple structure, are easy to integrate with evolutionary algorithms, and offer stable performance. Based on this, the proposed algorithm adopts Q-learning—a classic model-free method in RL. Considering the core requirement of balancing population diversity and convergence speed in the PDCPP problem, the state, action, and reward of the Agent are defined below.
State: the state space contains three states: the first denotes that the current population's optimal fitness has improved relative to the prior generation; the second indicates that the best fitness remains unchanged across generations; and the third reflects a decline in the current population's optimal fitness compared with its predecessor.
Action: the action dimension concerns how the algorithm manages the target individual and encompasses three possible treatments, including keeping it within the current population, transferring it to the Elite Archive for potential reuse, or removing it from consideration. The Elite Archive serves as a buffer for preserving promising individuals during the search.
Reward: the reward is determined by evaluating the change in optimal fitness between successive generations, and its formulation is presented in Equation (18). The difference parameter in Equation (18) is the fitness value of the optimal individual in the current population minus that of the optimal individual in the previous generation. According to the basic principle of Q-learning, after the Agent evaluates a specific state-action pair, its Q-value is updated as given by Equation (19), where Q denotes the action-value function (Q-value), together with a learning rate and a discount factor.
In most cases, MSGA-RL favors the action associated with the highest Q-value. To balance exploration and exploitation, a threshold-based mechanism is incorporated. For each decision step, a random number in [0, 1] is generated; when this number falls below the threshold, the algorithm performs a stochastic action choice, whereas otherwise it adopts the action currently holding the maximum Q-value. This mechanism strengthens exploratory behavior and reduces the risk of the search becoming trapped in suboptimal regions.
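A minimal tabular Q-learning sketch of the retention agent follows; the state and action labels are assumptions based on the definitions above, and the threshold mechanism mirrors the exploration rule just described.

```python
import random

ACTIONS = ("retain", "archive", "discard")     # assumed action labels
STATES = ("improved", "unchanged", "declined") # assumed state labels

def choose_action(Q, state, threshold=0.1, rng=random):
    """Threshold-based exploration: explore with probability `threshold`,
    otherwise exploit the action with the maximum Q-value."""
    if rng.random() < threshold:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update (assumed form of Equation (19))."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
```

With a state space of only three states and three actions, the table stays tiny, which is why Q-learning integrates cheaply with the evolutionary loop.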
3.5. Elite Archive
When the RL agent elects to place an individual into the Elite Archive, the system executes a two-stage procedure. Initially, the individual exhibiting the highest performance in the current iteration is selected as a candidate, which is subsequently inserted into the Elite Archive for temporary retention.
In this study, the Elite Archive is configured to 20% of the population size. For populations between 50 and 200 individuals, this setting provides an effective balance between preserving high-quality genes and maintaining diversity, and its effectiveness is validated in subsequent experiments. For populations outside this range, the proportion may need to be recalibrated. Once this capacity is reached, a population-update evaluation is initiated. During this process, the system compares the best fitness value contained within the Archive with that of the active population. Based on this assessment, the algorithm determines whether the individuals preserved in the Archive should supplant the current population.
When the Elite Archive contains an individual with superior fitness compared to the active population, the current population is fully substituted with individuals drawn from the Elite Archive, thereby introducing higher-quality evolutionary genes. Otherwise, the structure of the current population remains unchanged. Once the population-update evaluation is complete, the Elite Archive is fully cleared, preparing it for the next cycle of individual retention and storage. Through this dynamic temporary storage–evaluation–replacement cycle, the Elite Archive mechanism not only preserves promising evolutionary directions for the population but also, via periodic resetting, mitigates the degradation of population diversity, ultimately forming a complementary and synergistic effect with the RL decision-making process.
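The temporary storage–evaluation–replacement cycle can be sketched as below. This is a minimal illustration: the 20% capacity follows the text, while the lower-is-better fitness convention (matching energy minimization) and the cycling refill of the population from a smaller archive are assumptions.

```python
class EliteArchive:
    """Buffer that stores elite individuals and periodically challenges the population."""

    def __init__(self, pop_size, ratio=0.2):
        self.capacity = max(1, int(pop_size * ratio))  # 20% of the population size
        self.members = []                              # (fitness, individual) pairs

    def store(self, fitness, individual):
        self.members.append((fitness, individual))

    def is_full(self):
        return len(self.members) >= self.capacity

    def evaluate_and_reset(self, population, pop_fitness):
        """If the archive best beats the population best, substitute the population;
        the archive is cleared either way, ready for the next retention cycle."""
        best_fit, _ = min(self.members)        # lower fitness = lower energy = better
        replaced = best_fit < min(pop_fitness)
        if replaced:
            # Refill the whole population from the archive (cycling if the
            # archive is smaller than the population -- an assumed policy).
            new_pop = [ind for _, ind in sorted(self.members)]
            population[:] = (new_pop * (len(population) // len(new_pop) + 1))[:len(population)]
        self.members.clear()                   # cleared after every evaluation
        return replaced
```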
As the final part of this section, Algorithm 1 provides the main framework of MSGA-RL.
Algorithm 1 The Framework of MSGA-RL
Require: algorithm parameters (see Table 1), Q-table Q, Elite Archive H
Ensure: Minimum energy consumption
1:  Initialize the parameters as shown in Table 1;
2:  Set the iteration counter to zero and empty the Elite Archive;
3:  Generate the initial population with the distance-prioritized greedy strategy;
4:  Evaluate the fitness of the initial population;
5:  for each generation do
6:      Perform selection;
7:      Apply the multi-selector crossover operator;
8:      Apply mutation;
9:      Evaluate the fitness of the offspring;
10:     if Preserve Offspring then
11:         Update the current population with the offspring;
12:     end if
13:     if save the individual into the Archive then
14:         Insert the best individual of the current generation into the Elite Archive;
15:     end if
16:     if the Elite Archive has reached its capacity then
17:         Compare the best fitness in the Archive with that of the current population;
18:         if the Archive holds a superior individual then
19:             Replace the current population with the Archive individuals;
20:             Clear the Elite Archive;
21:         end if
22:     end if
23:     Observe the current state;
24:     Compute the reward by Equation (18);
25:     Update the Q-value by Equation (19);
26:     Select the action with the maximum Q-value;
27:     if the random number falls below the threshold then
28:         Replace the selected action with a stochastic choice;
29:     end if
30: end for
4. Simulation Experiments and Results
This study selects post-disaster rescue areas as the research scenario, simplifying them into two typical geometric shapes: convex quadrilaterals and convex pentagons. These shapes cover the common geometric features of post-disaster regions [
17,
26,
27]. Post-disaster areas often contain numerous obstacles, such as building ruins. To ensure path safety, this study adopts a dual-circle obstacle avoidance strategy [
11] to meet the requirements of the PDCPP scenario.
To verify the performance of the proposed MSGA-RL algorithm in solving PDCPP problems, this section conducts simulation experiments with the following analyses. First, within the same convex polygon region, the energy consumption and path length of the MSGA-RL algorithm are compared with those of three other algorithms (GA, SPC, and DEGA). This comparison intuitively evaluates the performance advantages of MSGA-RL. Second, control experiments are conducted on each improved strategy within the MSGA-RL algorithm. These experiments quantitatively analyze the contribution of key optimization strategies to algorithm performance. Finally, boxplots are used to analyze differences between the multi-selector crossover operator and the crossover operators in other comparison algorithms. This clarifies its role in improving algorithm stability.
To ensure uniform experimental conditions, the parameters of GA, DEGA, and MSGA-RL are all set strictly following the configurations in Reference [
11]. The detailed parameter settings are listed in
Table 2.
To evaluate the search performance of each algorithm more accurately, each experimental instance is run 5000 times.
Let N denote the population size, G the number of iterations, n the number of coverage strips, and k the number of individuals updated via the RL retention mechanism. As shown in Table 3, the computational complexity of GA and DEGA is O(N · G · n), while MSGA-RL further adds O(k · G) to account for the reinforcement learning-based individual retention mechanism. Therefore, MSGA-RL has a slightly higher computational cost compared with GA and DEGA. However, this additional overhead is acceptable in post-disaster rescue scenarios, as the optimization is performed offline before the UAV executes the coverage mission. This analysis clarifies the trade-off between enhanced path optimization and increased computation time, and it demonstrates that MSGA-RL is suitable for high-endurance missions that tolerate offline planning.
To determine the impact of the Elite Archive size on the algorithm, five control groups with Elite Archive sizes of 5%, 10%, 20%, 30%, and 50% of the population size were established under the conditions of a convex quadrilateral region and n = 10 coverage strips. Final stable turning energy consumption and convergence iteration count were used as evaluation metrics to verify how different Elite Archive sizes affect the algorithm’s performance. As shown in Table 4, the 20% group demonstrated lower final stable turning energy consumption and a faster convergence rate compared to the other control groups. Its convergence iterations were reduced by 13.2%, 5.9%, 7.3%, and 19.6% compared with the 5%, 10%, 30%, and 50% groups, respectively. This result indicates that an Elite Archive size of 20% optimally balances the retention of high-quality genes and the maintenance of population diversity. It prevents the loss of high-quality, low-energy-consumption individuals while not compressing the evolutionary space of ordinary individuals, thus achieving optimal algorithm performance. Therefore, in the subsequent simulation experiments of this study, the Elite Archive size is set to 20% of the population size.
MSGA-RL adopts different turning strategies based on the relationship between the scanning width w of the fixed-wing UAV and the minimum turning radius r_min. When w ≤ r_min, the improved MSGA-RL is used to optimize the access sequence of coverage paths; when w > r_min, a row-by-row access strategy is adopted. Therefore, this study introduces different scanning widths to conduct comparative experiments within convex quadrilateral and convex pentagonal areas. Based on Equation (3), three distinct test scenarios were designed, containing 10, 20, and 50 coverage paths within the target area, respectively.
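The width-based strategy switch described above can be expressed as a simple dispatch. In this sketch, the turning-radius parameter (here called `r_min`) and both function names are illustrative assumptions; the sequence optimizer is a placeholder for the full MSGA-RL search, not the paper’s method.

```python
def plan_coverage(strip_width, r_min, strips):
    """Dispatch between the two turning strategies described in the text.

    strip_width (w): LiDAR scanning width; r_min: assumed minimum turning
    radius of the fixed-wing UAV. Returns the visiting order of the strips.
    """
    if strip_width <= r_min:
        # Narrow strips: optimize the strip access sequence (MSGA-RL in the
        # paper; a placeholder reordering stands in for it here).
        return optimize_sequence(strips)
    # Wide strips: a simple row-by-row (sequential) access order suffices.
    return list(strips)

def optimize_sequence(strips):
    # Placeholder optimizer: interleave strips so that consecutive visits
    # are spaced farther apart than adjacent rows would be.
    strips = list(strips)
    return strips[::2] + strips[1::2]
```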
Since the flight direction of the UAV is predetermined by the minimum-span algorithm, the SPC energy consumption of each algorithm within the same test area is fixed. Only the turning flight energy consumption outside the mission area is accounted for in the experiments;
Figure 6 and
Figure 7 present the respective comparison results.
In the comparative experiments on turning energy consumption in both convex quadrilateral and convex pentagon regions, the turning energy consumption of all algorithms showed an upward trend as the number of flight strips (10, 20, 50) increased. This is because a larger number of strips leads to more turns between strips, thereby accumulating higher total turning energy consumption.
From the horizontal comparison of algorithms, the MSGA-RL algorithm achieved the lowest turning energy consumption in all strip-quantity scenarios across both types of regions. In the convex quadrilateral region (first set of data), when n = 10, the energy consumption of MSGA-RL was 31.5%, 16.4%, and 8.1% lower than that of SPC, GA, and DEGA, respectively; when n = 50, its energy consumption was 52.8% lower than that of SPC, and the advantage became more significant as the number of strips increased. In the convex pentagon region (second set of data), MSGA-RL consistently outperformed the other algorithms. When n = 10, its energy consumption was lower than that of SPC and GA; when n = 50, it was 5.7% lower than that of DEGA.
In addition, among traditional algorithms, SPC consistently had the highest turning energy consumption. Although GA and DEGA outperformed SPC, their energy consumption levels were significantly higher than that of MSGA-RL. This indicates that MSGA-RL can more effectively reduce the invalid turning consumption between strips. Especially in complex scenarios with a large number of strips, its energy consumption optimization capability is more practically valuable.
In this section, the convergence performance, algorithm stability, and the analysis and verification of the variant algorithms are all focused on the scenario with n = 10. The core rationale is that under this specific number of flight strips, the baseline flight energy consumption levels of the four algorithms exhibit minimal discrepancies. This setup enables a more precise delineation of the stability differences among the algorithms during the iterative process, while precluding interference from large initial energy consumption gaps that could obscure the accurate assessment of the algorithms’ core performance.
Figure 8 and
Figure 9 illustrate the convergence performance of the different algorithms under the two scenarios (convex quadrilateral and convex pentagon, n = 10), respectively. As observed from the curve trends, MSGA-RL exhibits the best convergence performance. During the early phase of the search in both scenarios, the algorithm exhibits substantially lower energy demand for turning maneuvers relative to DEGA and GA, providing an initial advantage in rapid optimization. As iterations proceed, MSGA-RL continues to expand the solution space via its adaptive crossover–mutation strategy and reinforcement learning-based individual retention mechanism, ultimately achieving stable convergence earlier than DEGA and GA. It should be noted that the convergence curve of the SPC algorithm is not included in the figures; the reason lies in the inherent difference in its mechanism. SPC adopts a fixed-order regional scanning strategy, and its path generation does not rely on an evolutionary iteration process. Therefore, the corresponding path length and flight energy consumption are constant values, and a convergence curve that changes with the number of iterations cannot be formed. To ensure the validity and interpretability of the comparison, this study only presents the iterative convergence processes of the GA, DEGA, and MSGA-RL algorithms, which possess evolutionary search capabilities.
Subsequently, to verify the role of the distance-prioritized greedy initial population strategy in the PDCPP scenario, this study constructs three variant algorithms with key modules removed: MSGA-RL without the distance-prioritized greedy initial population strategy (denoted as MSGA-RL-W1); MSGA-RL without the RL-based individual retention mechanism (denoted as MSGA-RL-W2); and MSGA-RL without the multi-selector crossover operator strategy (denoted as MSGA-RL-W3). Control experiments are conducted between the three variant algorithms and the full MSGA-RL algorithm. In the PDCPP scenarios of different convex polygon regions (n = 10), a comparative analysis is carried out using the fitness value as the monitoring indicator. By comparing the fitness differences between the original algorithm and each variant, the effect of the distance-prioritized greedy initial population strategy on the initial quality of the population can be clearly observed, thereby clarifying its role in optimizing the energy consumption of coverage paths.
As shown in
Figure 10, the multiple proposed improved strategies all contribute to enhancing the overall search capability of the algorithm in the PDCPP scenario. Among them, the distance-prioritized greedy initial population strategy has the most prominent effect on improving algorithm performance. This is because, in the early stage of path planning, a high-quality initial population can lay a solid foundation for the subsequent evolutionary search of the algorithm. By prioritizing distance factors to generate the initial population, this strategy enables the initial solutions to possess a certain potential for path optimization. It reduces the time and computational cost required for the algorithm to screen high-quality solutions from a large number of low-quality solutions in subsequent iterations, thereby facilitating the algorithm to quickly discover high-quality solutions.
In the PDCPP scenario with energy consumption as the optimization objective, algorithm stability serves as a core prerequisite for ensuring the efficient completion of tasks. To enhance algorithm stability and thereby support the stable execution of tasks by UAVs, this paper adopts a multi-selector crossover operator to construct MSGA-RL. Through comparative experiments with the crossover operators of other algorithms, the stability of the multi-selector crossover operator employed in MSGA-RL is verified in the PDCPP scenario. As shown in the boxplots of
Figure 11 and
Figure 12, in both convex quadrilateral and convex pentagon regions, the energy consumption cost median of the multi-selector crossover operator used in the MSGA-RL algorithm is lower than that of the three-point crossover operator and the random crossover operator. This intuitively demonstrates that the multi-selector crossover operator has better path optimization capabilities and can effectively reduce flight energy consumption. Meanwhile, the multi-selector crossover operator has the smallest interquartile range, indicating that the paths generated in multiple experiments have extremely high stability. In contrast, the random crossover operator has a large number of outliers, which reflects that it tends to fall into local optima and is difficult to continuously explore globally better paths.
5. Conclusions
A fixed-wing UAV can quickly collect terrain and position information in post-disaster rescue scenarios, providing key data support for the efficient organization of rescue operations. To address the rescue timeliness of a fixed-wing UAV in All-Weather PDCPP missions, this study proposes the MSGA-RL optimization algorithm. In convex quadrilateral and pentagonal areas, three different scanning widths were introduced, and performance comparison experiments were conducted using various path planning algorithms. When n = 20, MSGA-RL reduces turning energy consumption compared with the SPC, GA, and DEGA algorithms in both convex quadrilateral and pentagonal regions, with reductions of 45.53%, 22.89%, and 9.32% in the quadrilateral region, and 31.89%, 21.69%, and 11.58% in the pentagonal region, while also demonstrating superior stability. Specifically, the distance-prioritized greedy initialization strategy ensures the quality of the initial population, the multi-selector crossover operator enhances convergence efficiency and global search capability, and the RL-based adaptive individual retention mechanism combined with the Elite Archive optimizes solution quality and search efficiency. To evaluate the performance of the algorithm, this study simulates complex post-disaster scenarios using convex polygons and conducts comparative experiments between MSGA-RL and classical algorithms such as SPC, GA, and DEGA. The results indicate that MSGA-RL can substantially reduce the total flight energy consumption of a fixed-wing UAV while effectively enhancing the responsiveness of post-disaster rescue operations.
It should be noted that the primary focus of MSGA-RL is on minimizing UAV turning energy to ensure efficient post-disaster area coverage. While the algorithm achieves significant energy savings, it requires offline computation to optimize the coverage path sequences. In practical scenarios, this introduces a trade-off between the “energy-saving benefits” and the “potential computational overhead.” Therefore, MSGA-RL is particularly suitable for high-endurance missions where the offline planning time is acceptable. For scenarios demanding real-time path planning, the computational cost should be considered, and MSGA-RL can be adapted or combined with faster heuristics to balance energy efficiency and planning speed.
However, in real post-disaster scenarios, the affected areas can be extensive, and operations relying on a single fixed-wing UAV may fail to meet timeliness requirements. Therefore, future research could focus on multi-UAV collaborative path planning and coverage strategies to further improve the efficiency and coverage quality of large-scale post-disaster rescue missions.