Disaster Relief Coverage Path Planning for Fixed-Wing UAV Based on Multi-Selector Genetic Algorithm and Reinforcement Learning

Yang, Jing; Lu, Xuemeng; Cui, Mingyang

doi:10.3390/aerospace13020192

Open AccessArticle

Disaster Relief Coverage Path Planning for Fixed-Wing UAV Based on Multi-Selector Genetic Algorithm and Reinforcement Learning

by

Jing Yang

^1,2,3,*,

Xuemeng Lu

^1,2,3 and

Mingyang Cui

^1,2,3

¹

Key Laboratory of Grain Information Processing and Control, Henan University of Technology, Ministry of Education, Zhengzhou 450001, China

²

Henan Key Laboratory of Grain Storage Information Intelligent Perception and Decision Making, Henan University of Technology, Zhengzhou 450001, China

³

Henan Engineering Research Center of Grain Condition Intelligent Detection and Application, Henan University of Technology, Zhengzhou 450001, China

^*

Author to whom correspondence should be addressed.

Aerospace 2026, 13(2), 192; https://doi.org/10.3390/aerospace13020192

Submission received: 5 January 2026 / Revised: 11 February 2026 / Accepted: 16 February 2026 / Published: 17 February 2026

(This article belongs to the Special Issue Advances in UAV Technology: Dynamics, Guidance, Navigation, and Control of Transformative Aerial Vehicles)

Download

Browse Figures

Versions Notes

Abstract

When a fixed-wing Unmanned Aerial Vehicle (UAV) conducts All-Weather Post-Disaster Coverage Path Planning (PDCPP), the commonly used Sequential Path Coverage (SPC) method tends to generate redundant flight distance during turning transitions between adjacent coverage paths, which in turn increases the UAV’s flight energy consumption and thereby compromises the timeliness of rescue information acquisition. To address these challenges, this paper proposes a Multi-Selector Genetic Algorithm with Reinforcement Learning (MSGA-RL). It enhances population diversity through a distance-priority heuristic greedy initialization strategy, employs a multi-selector crossover operator to improve both solution diversity and convergence speed, and integrates a reinforcement learning-based individual retention mechanism with an elite pool protection strategy to prevent premature convergence. To simulate post-disaster scenarios, the disaster-affected area is modeled as a convex polygonal region with obstacles, while the flight energy consumption and stability of MSGA-RL are evaluated under different numbers of coverage paths. Simulation results indicate that, across all coverage path settings, MSGA-RL consistently achieves lower flight energy consumption than SPC, the Genetic Algorithm (GA), and the Dubins-based Enhanced Genetic Algorithm (DEGA), while exhibiting superior stability. In particular, in the convex quadrilateral scenario with 50 coverage paths, the flight energy consumption of MSGA-RL is reduced by 52.80%, 32.06%, and 15.96% compared with SPC, GA, and DEGA, respectively.

Keywords:

MSGA-RL; PDCPP; energy consumption; fixed-wing UAV

1. Introduction

In recent years, global disasters have occurred frequently. Countries like Ecuador, Turkey, Afghanistan, Tibet, and Myanmar have been hit by a series of major disasters. During the golden 72 h rescue window, adverse conditions, including heavy rains and nighttime darkness, hinder rescuers from rapidly obtaining comprehensive situational information about the disaster-stricken area, prolonging the rescue response time. Meanwhile, complex terrain composed of collapsed buildings and broken roads compels ground rescue teams to perform repeated surveys and detours, which further reduces rescue efficiency. A fixed-wing Unmanned Aerial Vehicle (UAV), taking advantage of its strengths in long-endurance flight, wide-area coverage capability [1], high flight stability [2], and strong environmental adaptability [3], enables continuous surveying of post-disaster areas. This avoids interruptions in data collection due to frequent recharging trips, thereby providing continuous dynamic data support for rescue decision-making. Equipped with a Light Detection And Ranging (LiDAR) device, it can output stable and reliable point cloud data under harsh conditions, allowing rescuers to grasp the overall disaster situation in the affected area. Based on these capabilities, the fixed-wing UAV equipped with LiDAR can continuously cover disaster-stricken areas in scenarios such as nighttime and heavy rains, providing technical support for All-Weather Post-Disaster Coverage Path Planning (PDCPP).

The Sequential Path Coverage (SPC) algorithm [4] enables reliable, complete coverage of post-disaster areas through continuous and parallel non-overlapping scanning. However, the turning capability of a fixed-wing UAV during transitions between adjacent scan lines is bounded by its flight dynamics, which are governed by factors such as allowable bank angle, climb angle, and airspeed. Meanwhile, the spacing between neighboring coverage paths is determined by the LiDAR focal length and flight altitude, in many cases, this spacing becomes smaller than the UAV’s feasible turn radius [5]. This mismatch forces the aircraft to execute multiple consecutive curved maneuvers when shifting from one parallel path to the next. These maneuvers increase the turning distance and energy consumption, reducing task timeliness. Reference [6] studied the Traveling Salesman Problem-Coverage Path Planning (TSP-CPP) in post-disaster scenarios, proposed a mixed-integer programming formulation suitable for this scenario, and introduced a CPP method for covering polygonal areas. The validity and performance of the proposed approach were confirmed through rigorous theoretical analysis and simulation studies. In this study, the fixed-wing UAV determined the optimal scanning direction by calculating the supporting parallel lines of a convex polygonal area and generated parallel, non-intersecting coverage paths. Reference [7] focused on agricultural operation scenarios and proposed a coverage path generation scheme for the fixed-wing UAV, applicable to arbitrary polygonal areas. Its core idea is to integrate wind field interference on the fixed-wing UAV’s ground track to minimize operation time. In this approach, coverage paths are represented as parallel strips, and the method further identifies the optimal alignment of these strips, together with the priority of entry and the trajectory orientation for each strip. However, it only uses the straight-line distance between path endpoints when calculating turning distance and does not fully account for the kinematic constraints and flight safety limits of the fixed-wing UAV. This scheme cannot perform “point-to-point” straight-line turns, which significantly deviates from real operational scenarios. Similarly, references [4,6,7] do not account for the maneuvering limitations imposed on a fixed-wing UAV during operations in post-disaster regions.

Based on the correlation between the minimum maneuvering radius and the sensor detection range, reference [8] proposed a staggered reciprocating flight mode. While this mode improved the turning efficiency of the fixed-wing UAV to some extent, it relied on a fixed access sequence and lacked a global optimization mechanism. This can easily lead to redundant flight paths. Additionally, as the number of paths increased, the data volume of the TSP grew quadratically, significantly extending path planning time. Reference [9] used the SPC algorithm to generate initial flight paths and then optimized the regional traversal order using a Genetic Algorithm (GA) to shorten the turning distance. However, the initial GA population was mostly generated randomly. This resulted in low individual quality, insufficient excellent solutions, restricted convergence speed, and compromised global search performance. It also led to high individual redundancy, which made it difficult to fully cover the optimal solution region. This intensified the randomness in the early stage of evolution, making the algorithm prone to falling into the trap of local optimality. Reference [10] attempted to introduce heuristic rules to improve the initialization process of the population. However, this method only generated some of the components heuristically, while the rest were still generated randomly, resulting in insufficient population diversity. Thus, it failed to overcome the limitation of local optimal solutions caused by frequent turns in short-distance paths.

Reference [11] achieved initial CPP by combining the minimum span algorithm and the round-trip path generation algorithm. They also realized global path optimization by improving the crossover operator in the Dubins-based Enhanced Genetic Algorithm (DEGA). However, this single crossover operator slowed down the algorithm’s convergence speed and significantly increased the probability of generating infeasible solutions in constrained or combinatorial optimization problems, which significantly impaired the algorithm’s performance. Reference [12] proposed matching multiple crossover operators to two parent chromosomes and constraining the sum of all operators’ usage probabilities to 1. However, this method used an equal-probability selection strategy, failing to dynamically adjust according to operator characteristics or the population’s evolutionary state. This leads to insufficient population offspring diversity and traps the algorithm in local optimality.

Deep Q-network (DQN)-based reinforcement learning (RL) techniques, along with other RL approaches, have also been investigated in previous studies [13]. For instance, using the Markov Decision Process model, a DQN based deep RL method was designed for search and rescue path planning. Reference [14] adopted an RL-based strategy to solve the problem, realizing path planning via training across diverse environments. To address the instability of the Deep Deterministic Policy Gradient (DDPG) algorithm when applied to fixed-wing UAV path planning, reference [15] proposed a multi-critic delayed DDPG algorithm. However, in practical applications, relying solely on RL for decision-making is easily affected by the distribution of training samples, and its convergence stability needs to be improved urgently in multi-constraint scenarios.

To overcome the limitations of GA in generating high-quality initial populations, the excessive turning energy consumption of SPC, and the slow convergence associated with single crossover operators, the main contributions of this study are as follows:

This study constructs a mixed-integer programming model to optimize the PDCPP problem of the fixed-wing UAV. To minimize energy expenditure during flight, the model incorporates constraints including turning angle and endurance, enabling All-Weather rescue and supporting efficient disaster relief operations;
To address the energy consumption issue in PDCPP, this study proposes the Multi-Selector Genetic Algorithm-Reinforcement Learning (MSGA-RL) algorithm. Featuring distance-priority initialization, multi-selector crossover, and RL-based Elite Archive to avoid local optima and enhance convergence;
This study conducts simulation experiments in a post-disaster rescue environment, selecting convex quadrilateral and pentagonal areas as representative task scenarios. The results indicate that MSGA-RL significantly reduces energy consumption compared with benchmark algorithms. Further ablation experiments verify the effectiveness of each improvement strategy in minimizing energy consumption. Considering that algorithmic stability is critical for reliable mission execution, a boxplot-based analysis is performed on the multi-selector crossover operator in MSGA-RL, demonstrating its superior stability.

The paper is organized as follows: Section 2 introduces the materials and methods of the PDCPP. Section 3 details the MSGA-RL algorithm. Section 4 presents simulation experiments and performance evaluation. Section 5 concludes with a summary and future research directions.

2. Materials and Methods of the PDCPP

In this study, to focus on validating the effectiveness of the proposed MSGA-RL algorithm in typical post-disaster coverage scenarios, several simplifying assumptions are made regarding the environment and UAV operations. Specifically, the following simplifying assumptions are considered:

1.: Communication failures, including loss of command and telemetry channels, are not considered.
2.: Extreme meteorological conditions, such as strong winds or turbulence, are not considered.
3.: The proposed MSGA-RL algorithm focuses on task-level and path-level offline planning and does not directly involve low-level flight control or real-time trajectory tracking.

These simplifications allow the study to first evaluate the algorithm’s performance under standardized and controlled conditions, while more complex operational factors are left for future work.

As shown in Table 1, the symbols and parameters used in the MIP model in this study are presented.

As shown in Figure 1, the fixed-wing UAV equipped with a LiDAR [16] device performs PDCPP starting from the take-off point. It avoids obstacles, covers the target area, and then returns to the take-off point. This provides rescuers with terrain information of the disaster-stricken region and significantly reduces rescue time. According to reference [17], disaster-affected areas can often be modeled as convex polygons. For non-convex disaster-affected regions, convexification can be performed—while ensuring complete coverage—using methods such as the convex hull algorithm [18], partition-based convexification [19], or buffer-based convexification [20]. Taking a pentagon as an example, as illustrated in Figure 1, the vertices of the convex pentagon are defined in a counterclockwise order in the Cartesian coordinate system, denoted as

V = (v_{1}, v_{2}, \dots, v_{5})

, where no three consecutive vertices are collinear.

O_{i}

(where i = 1, 2, 3) represents circular obstacles within the mission area.

Figure 2 presents the schematic of a fixed-wing UAV outfitted with a LiDAR system, executing PDCPP. Let l represent the flight altitude, ABCD represent the projection area, and

w_{1}

and

w_{2}

represent the short and long sides of the sensing area, respectively. In this paper, the scanning width of the fixed-wing UAV is defined as the short side length

w_{1}

of the sensing area (as shown in Figure 2,

w = w_{1}

), to avoid potential edge blurring in captured images. This ensures full coverage of the target area without missing edge details.

During All-Weather rescue missions, when flight speed and payload are fixed, the UAV’s minimum maneuvering radius represents the key constraint. Figure 3 depicts the relevant motion characteristics and variables.

According to Figure 3, the following relationship can be obtained:

\{\begin{matrix} F \cos φ_{m a x} & = m g \\ F \sin φ_{m a x} & = \frac{m v^{2}}{R_{m i n}} \end{matrix}

(1)

In Equation (1),

F

denotes the lift experienced by the fixed-wing UAV,

φ_{m a x}

is the maximum roll angle,

R_{m i n}

is the minimum maneuvering radius,

v

is the flight speed, and

g

is the gravitational acceleration. The calculation formula for

R_{m i n}

can be expressed as follows [21]:

R_{m i n} = \frac{v^{2}}{g \tan φ_{m a x}}

(2)

SPC is a classic method for solving the PDCPP problem. It is particularly suitable for early-stage application scenarios with low real-time performance and hardware requirements. With sequential traversal of sub-regions as its core logic, SPC does not require complex optimization calculations. The path is highly deterministic in that once the sub-region division is complete, the traversal order, turning points, and other elements are fixed. This results in high predictability, facilitating advance planning and risk prediction. The SPC algorithm determines the optimal flight direction based on the minimum span direction [22]. By traversing the boundary of the task area, it calculates the maximum vertical distance from each vertex to the edges. Among these maximum vertical distances, the minimum one is recorded as

L_{m i n}

, and the edge corresponding to this minimum vertical distance

L_{m i n}

is regarded as the optimal flight direction for the fixed-wing UAV.

Figure 4 shows a schematic diagram of the optimal flight direction, where

e_{i} = (e_{1}, e_{2}, \dots, e_{5})

represents the side length between vertices

v_{i}

and

v_{i + 1}

, the blue edge denotes the minimum width of the area, the red edge indicates the optimal flight direction for the fixed-wing UAV. Then, the minimum number of scan paths required to cover the entire area is denoted as

N_{c}

, as shown in Equation (3). The intersection points of each scan line with the boundary of the target area are calculated, and the resulting set is defined as F(i), which is given in Equation (4).

N_{c} = ⌈\frac{L_{m i n}}{w}⌉

(3)

F (i) = {m_{i, 1}, m_{i, 2}}, i = 1, \dots, N_{c}

(4)

In Equation (3), w represents the coverage range of the fixed-wing UAV equipped with an on-board camera,

L_{m i n}

denotes the minimum width distance of the target area, and

⌈ . ⌉

stands for the ceiling function (rounding up). This ensures that the scanning paths can fully cover the boundary of the area.

In Equation (4),

m_{i, 1}

,

m_{i, 2}

represent the coordinates of all intersection points between the i-th scanning path and the area boundary.

SPC algorithm requires the fixed-wing UAV to fly along paths in sequential order. When the UAV switches between adjacent paths, it must detour through multiple curved segments to adapt to its kinematic constraint of being unable to turn instantaneously, which leads to additional energy consumption. Based on this, the turning process between adjacent paths can be divided into two scenarios, each of which has a targeted optimization [23], as illustrated in Figure 5.

Figure 5 illustrates two regional coverage strategies, where the yellow line segments indicate the turning trajectories of the fixed-wing UAV beyond the boundaries of the mission area. The calculation formula for the turning flight distance is as follows:

D = \{\begin{matrix} T + π \cdot N_{m i n} + 2 μ \cdot R_{m i n}, & d < R_{m i n} \\ π \cdot N_{m i n} + d \cdot \cos γ, & d = 2 R_{m i n} \end{matrix}

(5)

where

μ = arccos (\frac{d}{2 R_{m i n}})

,

γ = arctan (\frac{|m_{i, k} . y - m_{i + 1, l} . y|}{|m_{i, k} . x - m_{i + 1, l} . x|})

, and

T = R_{m i n} \times \sin (μ) - d \times \tan (γ)

, d is the span between flight paths.

SPC achieves PDCPP by generating continuous scanning flight paths. However, there is a flaw in the trajectory switching phase. Due to the kinematics of the fixed-wing UAV, the UAV’s minimum maneuvering radius often exceeds the spacing between adjacent flight paths. This compels the UAV to perform multiple detours. These redundant detours increase energy consumption per unit distance and reduce task timeliness, which is unfavorable for rescue operation implementation.

3. The MSGA-RL Method

In All-Weather coverage scenarios, high energy consumption has become a bottleneck restricting continuous operations. To achieve full-area coverage, a fixed-wing UAV needs to maintain long-endurance flight in complex environments. However, flight paths with continuous small-angle sharp turns will lead to a sharp increase in energy consumption, making it difficult to meet the requirements of uninterrupted All-Weather operations. The post-disaster UAV coverage path planning (PDCPP) problem can be formalized as a variant of the Generalized Traveling Salesman Problem (GTSP). In PDCPP, each coverage strip can be regarded as a “cluster,” and the optimization objective is to determine the visiting sequence of all coverage clusters such that the UAV completes the entire scanning mission while minimizing turning energy consumption. Specifically, if the set of coverage strips is

{C_{1}, C_{2}, \dots, C_{N_{c}}}

, the task can be interpreted as selecting one representative point from each cluster and determining the visiting order to minimize cumulative turning costs. This formulation aligns exactly with the definition of GTSP, where each cluster must be visited exactly once.

Compared with standard TSP solvers, MSGA-RL offers the following advantages: Multi-strategy genetic operations: By employing multi-selector crossover and adaptive mutation, MSGA-RL can flexibly explore the complex solution space of coverage strip sequences. Reinforcement learning-based individual retention mechanism: High-quality individuals are preserved to prevent the loss of superior paths, accelerating convergence and reducing energy consumption. Adaptation to complex obstacle environments: Traditional TSP methods typically assume fixed distances between nodes, whereas in PDCPP, UAV must avoid obstacles. MSGA-RL can optimize the visiting sequence while ensuring obstacle avoidance. Therefore, MSGA-RL is particularly well-suited for this GTSP variant, and it can substantially reduce UAV turning energy and improve mission efficiency in complex post-disaster environments.

Therefore, this study selects “minimum energy consumption” as the optimization objective. Reducing redundant turns and optimizing attitude adjustments can maximize the coverage duration and range of the fixed-wing UAV in a single flight, thereby avoiding mission interruptions caused by insufficient endurance.

\min_{\begin{matrix} k, l \in {1, 2} \end{matrix}} \sum_{i = 0}^{N_{c} - 1} E_{d i s} (m_{i, k}, m_{i + 1, l})

(6)

E_{d i s} (m_{i, k}, m_{i + 1, l}) = P \times (\frac{θ v}{g \tan φ_{m a x}})

(7)

P = (c_{1} + \frac{c_{2}}{g^{2} r^{2}}) v^{3} + \frac{c_{2}}{v}

(8)

In Equation (6),

E_{d i s}

(

m_{i, k}

,

m_{i + 1, l}

) represents the minimum flight energy consumption of the fixed-wing UAV along the Dubins feasible curve between

m_{i, k}

and

m_{i + 1, l}

.

m_{i, k}

represents the connection intersection point of the i-th scan strip, where

k \in {1, 2}

(i.e., either

m_{i, 1}

or

m_{i, 2}

), serving as the endpoint connected to the next scan strip.

m_{i + 1, l}

represents the connection intersection point of the (

i + 1

)-th scan strip, where

l \in {1, 2}

(i.e., either

m_{i + 1, 1}

or

m_{i + 1, 2}

), serving as the endpoint connected to the previous scan strip. In Equation (7) [24], P denotes turn power,

θ

denotes turn angle, and

φ_{m a x}

denotes turn slope angle. Equation (8) describe the power consumption of the fixed-wing UAV during its turning flight. Here,

c_{1}

and

c_{2}

are constants related to the fixed-wing UAV’s parameters and environment (estimated based on weight, area, air density, etc.) [24],

v

denotes velocity.

Furthermore, in the context of the path connection and energy consumption control model for post-disaster rescue scenarios, Equation (6) is subject to the constraints listed below:

\sum m_{i, k} = 1, i \in {1, 2, \dots, N_{c}}, k \in {1, 2}, m_{i, k} \in {0, 1}

(9)

\sum_{z \neq i} m_{m_{z 1}, m_{i 1}} = 1, \sum_{z \neq i} m_{m_{i 1}, m_{z 1}} = 1

(10)

E_{u} \sum_{i \neq j} \sum_{k, l \in {1, 2}} ℓ (m_{i, k}, m_{j, l}) \leq E_{m a x}

(11)

ℓ (m_{i, k}, m_{j, l}) = D, i \neq j \in {1, 2, \dots, N_{c}}, k, l \in {1, 2}

(12)

Equation (9) indicates that each path point is visited exactly once, where

m_{i, k} = 1

means the path point is selected and the path selection is a binary decision. Here, i denotes the serial number of the flight path, and k represents the serial number of the intersection point between the flight path and the area. Equation (10) specifies that the start point or end point of each path must have exactly one incoming path from the end point or start point of another path, where

m_{m_{z 1}, m_{i 1}}

refers to the coordinates of the intersection points between the flight paths and the area. Equation (11) ensures that the total distance-based power consumption of the fixed-wing UAV does not exceed its maximum battery capacity. In this equation,

ℓ (m_{i, k}, m_{j, l})

is the path distance,

E_{u}

represents the fixed-wing UAV’s power consumption per unit distance, and

E_{m a x}

denotes the fixed-wing UAV’s maximum battery capacity. Equation (12) represents the total turning distance of the fixed-wing UAV within the mission area.

To effectively address the high energy consumption issue of the fixed-wing UAV in the All-Weather PDCPP scenario, this section proposes the MSGA-RL algorithm. The specific innovations are as follows:

Population Initialization: The algorithm applies a distance-priority heuristic during population initialization to strengthen diversity. This heuristic further enhances its ability to explore the search space effectively.
Crossover Strategy: The algorithm employs a multi-selector crossover operator. This approach allows it to achieve faster convergence. It also helps maintain a diverse set of candidate solutions.
Individual Retention Mechanism: The algorithm integrates an RL-based retention mechanism with an Elite Archive. This integration reduces the likelihood of premature convergence. It also helps maintain high-quality individuals throughout the search process.

3.1. Generation of Initial Population

To improve the quality of initial individuals, enhance population diversity, and strengthen the algorithm’s initial exploration capability, this study proposes a distance-priority heuristic greedy initialization strategy.

Within the general population, each chromosome is defined as a specific sequence of target points, and its generation integrates two distinct approaches, namely a distance-prioritized heuristic and a greedy algorithm. The distance-prioritized heuristic functions as follows. Initially, all target points are sorted in ascending order based on their Euclidean distance from the initial point. Subsequently, a partial segment of the chromosome is generated according to this sequence. Once a partial segment of a chromosome has been constructed using the distance-prioritized heuristic, the remainder of the chromosome is completed through a greedy algorithm. More precisely, let the length of a chromosome be

|F (i)|

for i = 1, 2, …,

N_{c}

(i.e., the total number of task points), and let the population size be

N_{p}

. Then, for individuals from the first to the

N_{p}

-th, the lengths of the segments generated according to the heuristic rules are given as follows:

\frac{| F (i) |}{N_{P} + 1}, \frac{2 | F (i) |}{N_{P} + 1}, \dots, \frac{(N_{P} - 1) | F (i) |}{N_{P} + 1}, \frac{N_{P} | F (i) |}{N_{P} + 1}

(13)

s^{*} = \arg \min_{s_{j} \in U} C (s_{c u r}, s_{j})

(14)

For the remaining task points not included in the heuristic sequence, the greedy algorithm is adopted to generate the subsequent segment. The specific steps are as follows. Set the last task point in the heuristic segment as the initial starting point, then initiate the iterative construction phase. At each step, select a task from the set U of currently unvisited tasks, where

s^{*}

has the minimum transfer cost from the current task

s_{c u r}

. This selection follows the formula.

The optimal task

s^{*}

is selected, where C(

s_{c u r}

,

s_{j}

) represents the transfer cost from task

s_{c u r}

to task

s_{j}

(the transfer cost is given by Equation (6)). After selection,

s^{*}

is set as the new current task and removed from the unvisited set U. This process is repeated until all remaining task points are visited. The chromosome segment formed by this greedy path is then concatenated after the heuristic sequence to form a complete individual chromosome.

3.2. Fitness Function

The choice of the fitness function plays a crucial role in determining both the convergence efficiency of the algorithm and its ability to identify the global optimum. In this study, the problem is fundamentally formulated as minimizing energy consumption, and the objective function

f i t

is defined as

f i t = - \sum_{i = 1}^{N_{c} - 1} E_{d i s} (m_{i, k}, m_{i + 1, l}), k, l \in {1, 2}

(15)

The core optimization objective of this study is to minimize the UAV’s flight energy consumption, which is defined by the energy cost function in Equation (6). Therefore, the fitness function is constructed by negating the energy function in Equation (6) to ensure compatibility with the maximization-based optimization mechanism of the MSGA-RL algorithm. The corresponding fitness function is given in Equation (15).

3.3. Adaptive Crossover Operator

As a core operator for exploring new solution spaces, the crossover operator is indispensable. However, most crossover methods are designed for general purposes and lack heuristic guidance tailored to specific problems, and this makes it difficult for offspring to effectively inherit feasible or high-quality features related to problem characteristics from parents [25]. In the MSGA-RL algorithm, this study proposes a multi-selector heuristic crossover operator to facilitate the search process of the algorithm.

The adopted crossover operators include the segment rearrangement operator, the gene recombination operator, the distance-priority operator, and the cost-driven insertion operator guided by common subpaths. The specific operations of these crossover operators are as follows.

Segment Rearrangement Operator: Randomly select a continuous gene sequence in the chromosome, shuffle the order of genes within this segment, and then reinsert the shuffled segment into its original position. It features strong local perturbation, simple and efficient operation, and can quickly break local optimal solutions. For example, in post-disaster rescue paths, the problem of local detours can be optimized by rearranging the order of a segment of continuous path points.

Gene Recombination Operator: Randomly extract multiple genes from the chromosome, rearrange them to form a new gene segment, and then insert it into a randomly selected position to generate a new chromosome individual. Gene recombination has high flexibility, making it easier to produce diverse individuals. The randomness of the insertion position further broadens the population search space, helping to reduce the likelihood of the algorithm becoming trapped in local optima.

Three-Point Crossover Operator: According to the length of the chromosome, three random crossover points

c p_{1}

,

c p_{2}

, and

c p_{3}

are determined on the chromosome, defining two crossover intervals:

s u b_{1}

= [

c p_{1}

,

c p_{2}

- 1

] and

s u b_{2}

= [

c p_{3}

,

N_{c}

]. New offspring are created by exchanging genes within these crossover intervals. If gene position duplication occurs after the exchange, necessary gene swaps are performed in non-crossover intervals to eliminate such duplicates. The multi-interval exchange balances local optimization and global recombination; the randomness of crossover positions enhances population diversity, and the controllable exchange logic prevents the loss of high-quality gene segments.

Distance-Priority Operator: After selecting a chromosome segment, reorder the task numbers in the segment based on the distance between each path point in the segment and the preceding chromosome (sorted from nearest to farthest). Distance constraints avoid the blindness of random rearrangement, enabling efficient optimization of excessively long detours between adjacent path points in post-disaster rescue.

Cost-Driven Insertion Operator Guided by Common Subpaths: Randomly select a common subpath containing three consecutive target points from the two parent paths. For the remaining unvisited cities, calculate the comprehensive cost of each candidate insertion position using the fitness function, and continuously select the position with the minimum comprehensive cost for insertion until all cities are visited. The cost-driven insertion logic balances objectives such as energy consumption and safety, generating paths that better meet the requirement of minimizing the comprehensive cost in post-disaster rescue.

In the algorithm initialization phase, each crossover operator is assigned an initial score

s o c_{i}

, and the weight

η_{i}

is updated according to the following formula:

η_{i} = \frac{s o c_{i}}{\sum_{j = 1}^{5} s o c_{j}}

(16)

Here,

s o c_{i}

indicates the current performance score of the considered crossover operator, and its corresponding weight

η_{i}

represents the relative proportion of this operator’s score among all crossover operators’ scores. In each generation, the algorithm selects the crossover operator in use via a roulette mechanism based on the weights assigned to the operators. This allows for dynamically adjusting the choice of crossover strategies, where operator selection is based on the fitness improvement of newly generated individuals relative to the best individuals from the previous generation.

According to Equation (17), the score of the operator is adjusted as follows.

s o c_{i}^{'}

denotes the updated score of the i-th individual.

λ_{1}

is added when the new individual’s fitness

f_{n e w}

is worse than the current best

f_{g l o b a l}

;

λ_{2}

is added when it is worse than the previous generation’s best

f_{p r e}

but better than the current best

f_{g l o b a l}

; in all remaining situations,

λ_{3}

is added.

s o c_{i}^{'} = s o c_{i} + \{\begin{matrix} λ_{1}, & f_{new} < f_{global} \\ λ_{2}, & f_{global} \leq f_{new} < f_{pre} \\ λ_{3}, & o t h e r w i s e \end{matrix}

(17)

The weights of all operators are then recalculated based on the updated scores and the roulette selection mechanism is adjusted accordingly. This updated mechanism then guides the selection of crossover operators in subsequent iterations.

3.4. Individual Retention Mechanism Combined with RL

Once adaptive crossover and mutation operations have been performed, it is necessary to determine whether the newly generated individuals should be added to the offspring population. While introducing only individuals with better performance (based on fitness) can accelerate convergence, it also reduces population diversity, ultimately leading the algorithm to fall into local optimal solutions. When the algorithm’s search process shows a tendency to deviate from the region of potential optimal solutions, the algorithm should promptly identify this deviation, filter out ineffective search directions and dynamically adjust the search range. To address this issue, this study proposes a memory pool mechanism based on RL, which introduces an agent that decides whether to retain an individual based on the performance of the currently optimized individual. Decisions regarding actions are made by the Agent using only the current search state, satisfying MDP requirements. Therefore, RL methods can be used to train and optimize the agent.

In practical application scenarios, the Agent continuously executes decision-making actions

a_{t}

at different iteration stages of the algorithm, based on the current search state

s_{t}

of the population (changes in the population’s optimal fitness and individual diversity). It obtains rewards

r_{t}

through the interaction results with the evolutionary environment, and after multiple iterations, gradually learns the optimal individual screening strategy to achieve a higher expected return. Since this study aims to use RL to assist individual retention in PDCPP, priority is given to RL methods that have a simple structure, are easy to integrate with evolutionary algorithms, and offer stable performance. Based on this, the proposed algorithm adopts Q-learning—a classic model-free method in RL. Considering the core requirement of balancing population diversity and convergence speed in the PDCPP problem, the state, action, and reward of the Agent are defined below.

State: the state space is defined as

S = {S_{1}, S_{2}, S_{3}}

.

S_{1}

denotes that the current population’s optimal fitness has improved relative to the prior generation.

S_{2}

indicates that the best fitness remains unchanged across generations.

S_{3}

reflects a decline in the current population’s optimal fitness compared with its predecessor.

Action: the action dimension concerns how the algorithm manages the target individual and encompasses three possible treatments, including keeping it within the current population, transferring it to the Elite Archive for potential reuse, or removing it from consideration. The Elite Archive serves as a buffer for preserving promising individuals during the search.

Reward: the reward is determined by evaluating the change in optimal fitness between successive generations, and its formulation is presented in Equation (18).

The difference parameter

r_{t}

is calculated as

r_{t} = f_{t}^{b e s t} - f_{t - 1}^{b e s t}

(18)

Here,

f_{t}^{b e s t}

denotes the fitness value of the optimal individual in the current population, while

f_{t - 1}^{b e s t}

represents the fitness value of the optimal individual in the previous generation population. According to the basic principle of Q-learning, after the Agent evaluates a specific state-action pair, the update of its Q-value can be given by Equation (19). Here, Q

(s_{t}, a_{t})

denotes the action-value function (Q-value),

α_{L}

denotes the learning rate, and

γ

denotes the discount factor.

Q (s_{t}, a_{t}) \leftarrow Q (s_{t}, a_{t}) + α_{L} \cdot ⌊ r_{t} + γ \cdot max_{a^{'}} Q (s_{t + 1}, a^{'}) - Q (s_{t}, a_{t}) ⌋

(19)

In most cases, MSGA-RL favors the action associated with the highest Q-value. To balance exploration and exploitation, a threshold-based mechanism is incorporated. For each decision step, a random number in [0, 1] is generated; when this number falls below the threshold, the algorithm performs a stochastic action choice, whereas otherwise it adopts the action currently holding the maximum Q-value. This mechanism strengthens exploratory behavior and reduces the risk of the search becoming trapped in suboptimal regions.

3.5. Elite Archive

When the RL agent elects to place an individual into the Elite Archive, the system executes a two-stage procedure. Initially, the individual exhibiting the highest performance in the current iteration is selected as a candidate, which is subsequently inserted into the Elite Archive for temporary retention.

In this study, the Elite Archive is configured to 20% of the population size. For populations between 50 and 200 individuals, this setting provides an effective balance between preserving high-quality genes and maintaining diversity, and its effectiveness is validated in subsequent experiments. For populations outside this range, the proportion may need to be recalibrated. Once this capacity is reached, a population-update evaluation is initiated. During this process, the system compares the best fitness value contained within the Archive with that of the active population. Based on this assessment, the algorithm determines whether the individuals preserved in the Archive should supplant the current population.

When the Elite Archive contains an individual with superior fitness compared to the active population, the current population is fully substituted. The replacement individuals come from the Elite Archive. This is to introduce higher-quality evolutionary genes. If not, the structure of the current population remains unchanged. Once the population update evaluation is complete, the Elite Archive is fully cleared. This prepares it for the next cycle of individual retention and storage. This temporary storage-evaluation-replacement cycle is dynamic. Through it, the Elite Archive mechanism not only preserves potential high-quality evolutionary directions for the population but also mitigates the degradation of population diversity. This dual objective is achieved through periodic resetting, ultimately forming a complementary and synergistic effect with the RL decision-making process.

As the final part of this section, Algorithm 1 provides the main framework of MSGA-RL.

Algorithm 1 The Framework of MSGA-RL

Require:: $N_{p}$ , $G e n$ , $P_{c}$ , $P_{m}$ , $O_{c}$ , $s o c_{i}$ , Q, $ε$ , $η$ , $α_{L}$ , $γ$ , H
Ensure:: Minimum energy consumption
1:: The initialization parameters are shown in Table 1;
2:: Set $A r c h i v e \leftarrow []$ , $t \leftarrow 0$ ;
3:: $P_{0} \leftarrow Equation (13), Equation (14)$ ;
4:: $f_{global} \leftarrow Equation (15)$ ;
5:: for $g e n = 1$ to $G e n$ do
6:: $O_{c} \leftarrow Select individual by roulette wheel method$ ;
7:: $O f f s p r i n g \leftarrow crossover and mutation operation (P_{c}, O_{c}, P_{m})$ ;
8:: $f_{global} \leftarrow Equation (15)$ ;
9:: $a_{t} \leftarrow Select action (Q, ε, S)$ ;
10:: if Preserve Offspring then
11:: $P \leftarrow O f f s p r i n g$ ;
12:: end if
13:: if save the individual into the Archive then
14:: $A r c h i v e \leftarrow Add (A r c h i v e, l b_{indi})$ ;
15:: end if
16:: if $| A r c h i v e | = = 20$ then
17:: $f \leftarrow AvgFitness (A r c h i v e)$ ;
18:: if $f < global_best$ then
19:: $P \leftarrow A r c h i v e$ ;
20:: $A r c h i v e \leftarrow []$ ;
21:: end if
22:: end if
23:: $r_{t} \leftarrow Equation (18)$ ;
24:: $Q \leftarrow Equation (19)$ ;
25:: $t \leftarrow t + 1$ ;
26:: $s o c_{i} \leftarrow Equation (17)$ ;
27:: if $g e n % 10 = = 0$ then
28:: $H \leftarrow Equation (16)$ ;
29:: end if
30:: end for

4. Simulation Experiments and Results

This study selects post-disaster rescue areas as the research scenario, simplifying them into two typical geometric shapes, convex quadrilaterals and convex pentagons. These shapes cover the common geometric features of post-disaster regions [17,26,27]. Post-disaster areas often contain numerous obstacles, such as building ruins. To ensure path safety, this study adopts a dual-circle obstacle avoidance strategy [11] to meet the requirements of the PDCPP scenario.

To verify the performance of the proposed MSGA-RL algorithm in solving PDCPP problems, this section conducts simulation experiments with the following analyses. First, within the same convex polygon region, the energy consumption and path length of the MSGA-RL algorithm are compared with those of three other algorithms (GA, SPC, and DEGA). This comparison intuitively evaluates the performance advantages of MSGA-RL. Second, control experiments are conducted on each improved strategy within the MSGA-RL algorithm. These experiments quantitatively analyze the contribution of key optimization strategies to algorithm performance. Finally, boxplots are used to analyze differences between the multi-selector crossover operator and the crossover operators in other comparison algorithms. This clarifies its role in improving algorithm stability.

To ensure uniform experimental conditions, the parameters of GA, DEGA, and MSGA-RL are all set strictly following the configurations in Reference [11]. The detailed parameter settings are listed in Table 2.

To evaluate the search performance of each algorithm more accurately, each experimental instance is run repeatedly for 5000 times.

N_{p}

is the population size,

G e n

is the number of iterations,

N_{c}

is the number of coverage strips, and

N_{R L}

represents the number of individuals updated via the RL retention mechanism. As shown in Table 3, the computational complexity of GA and DEGA is

O (N_{p} \cdot G e n \cdot N_{c}^{2})

, MSGA-RL further adds

N_{R L} \cdot N_{c}

to account for the reinforcement learning-based individual retention mechanism. Therefore, MSGA-RL has a slightly higher computational cost compared with GA and DEGA. However, this additional overhead is acceptable in post-disaster rescue scenarios, as the optimization is performed offline before the UAV executes the coverage mission. This analysis clarifies the trade-off between enhanced path optimization and increased computation time, and it demonstrates that MSGA-RL is suitable for high-endurance missions that tolerate offline planning.

To determine the impact of Elite Archive size on the algorithm, five control groups with Elite Archive sizes of 5%

N_{p}

, 10%

N_{p}

, 20%

N_{p}

, 30%

N_{p}

, and 50%

N_{p}

were established under the conditions of a convex quadrilateral region and

N_{c}

= 10. Final stable turning energy consumption and convergence iteration count were used as evaluation metrics to verify how different Elite Archive sizes affect the algorithm’s performance. As shown in Table 4, the 20%

N_{p}

group demonstrated lower final stable turning energy consumption and a faster convergence rate compared to the other control groups. Its convergence iterations were reduced by 13.2%, 5.9%, 7.3%, and 19.6% compared with the 5%

N_{p}

, 10%

N_{p}

, 30%

N_{p}

, and 50%

N_{p}

groups, respectively. This result indicates that an Elite Archive size of 20%

N_{p}

optimally balances retention of high-quality genes and maintenance of population diversity. It prevents the loss of high-quality, low-energy-consuming individuals while not compressing the evolutionary space of ordinary individuals, thus achieving the optimal algorithm performance. Therefore, in the subsequent simulation experiments of this study, the Elite Archive size is set to 20%

N_{p}

.

MSGA-RL adopts different turning strategies based on the relationship between the scanning width of fixed-wing UAV and the size of

R_{m i n}

. When w ≤

R_{m i n}

, the improved MSGA-RL is used to optimize the access sequence of coverage paths; when w >

R_{m i n}

, a row-by-row access strategy is adopted. Therefore, this study introduces different scanning widths to conduct comparative experiments within convex quadrilateral and convex pentagonal areas. Based on Equation (3), three distinct test scenarios were designed, with each scenario containing 10, 20, and 50 coverage paths within the target area, respectively.

Since the flight direction of UAV is predetermined by the minimum span algorithm, the SPC energy consumption of each algorithm within the same test area is fixed. Only the turning flight energy consumption outside the mission area is accounted for in the experiments; Figure 6 and Figure 7 present the respective comparison results.

In the comparative experiments on turning energy consumption in both convex quadrilateral and convex pentagon regions, the turning energy consumption of all algorithms showed an upward trend as the number of flight strips

N_{c}

(10, 20, 50) increased. This is because a larger number of strips leads to more turns between strips, thereby accumulating higher total turning energy consumption.

From the horizontal comparison of algorithms, the MSGA-RL algorithm achieved the lowest turning energy consumption in all strip quantity scenarios across both types of regions. In the convex quadrilateral region (first set of data), when

N_{c}

= 10, the energy consumption of MSGA-RL was 31.5%, 16.4%, and 8.1% lower than that of SPC, GA, and DEGA, respectively; when

N_{c}

= 50, its energy consumption was 52.8% lower than that of SPC, and the advantage became more significant as the number of strips increased. In the convex pentagon region (second set of data), MSGA-RL consistently outperformed the other algorithms. When

N_{c}

= 10, its energy consumption was lower than that of SPC and GA; when

N_{c}

= 50, it was 5.7% lower than that of DEGA.

In addition, among traditional algorithms, SPC consistently had the highest turning energy consumption. Although GA and DEGA outperformed SPC, their energy consumption levels were significantly higher than that of MSGA-RL. This indicates that MSGA-RL can more effectively reduce the invalid turning consumption between strips. Especially in complex scenarios with a large number of strips, its energy consumption optimization capability is more practically valuable.

In this section, the convergence performance, algorithm stability, and analysis as well as verification of variant algorithms are all focused on the scenario with

N_{c} = 10

. The core rationale lies in the fact that under this specific number of flight strips, the baseline flight energy consumption levels of the four algorithms exhibit minimal discrepancies. This setup enables more precise delineation of the stability differences among the algorithms during the iterative process, while precluding interference from excessive initial energy consumption gaps that could obscure the accurate assessment of the algorithms’ core performance.

Figure 8 and Figure 9 illustrate the convergence performance of different algorithms under two scenarios (convex quadrilateral and convex pentagon) (

N_{c}

= 10), respectively. As observed from the curve trends, MSGA-RL exhibits the most superior convergence performance. During the early phase of the search in both scenarios, the algorithm exhibits substantially reduced energy demand for turning maneuvers relative to DEGA and GA, providing an initial advantage in rapid optimization. As iterations proceed, MSGA-RL continues to expand the solution space via its adaptive crossover–mutation strategy and reinforcement learning-based individual retention mechanism, ultimately achieving stable convergence earlier than DEGA and GA. It should be specifically noted that the convergence curve of the SPC algorithm is not included in the figures, and the core reason lies in the inherent difference in its algorithm mechanism. SPC adopts a fixed-order regional scanning strategy, and path generation does not rely on an evolutionary iteration process. Therefore, the corresponding path length and flight energy consumption are constant values, and a convergence curve that changes with the number of iterations cannot be formed. To ensure the validity and interpretability of the comparison, this study only presents the iterative convergence processes of the GA, DEGA, and MSGA-RL algorithms that possess evolutionary search capabilities.

Subsequently, to verify the role of the distance-prioritized greedy initial population strategy in the PDCPP scenario, this study constructs three variant algorithms with key modules removed. MSGA-RL without the distance-prioritized greedy initial population strategy (denoted as MSGA-RL-W1); MSGA-RL without the RL-based individual retention feature removed (denoted as MSGA-RL-W2); MSGA-RL without the multi-selector crossover operator strategy (denoted as MSGA-RL-W3). Control experiments are conducted between the three aforementioned variant algorithms and the MSGA-RL algorithm. In the PDCPP scenarios of different convex polygon regions (

N_{c}

= 10), comparative analysis is carried out using the fitness value as the monitoring indicator. By comparing the fitness differences between the original algorithm and each variant algorithm, the effect of the distance-prioritized greedy initial population strategy on the initial quality of the population can be clearly observed, thereby clarifying its role in optimizing the energy consumption of coverage paths.

As shown in Figure 10, the multiple proposed improved strategies all contribute to enhancing the overall search capability of the algorithm in the PDCPP scenario. Among them, the distance-prioritized greedy initial population strategy has the most prominent effect on improving algorithm performance. This is because, in the early stage of path planning, a high-quality initial population can lay a solid foundation for the subsequent evolutionary search of the algorithm. By prioritizing distance factors to generate the initial population, this strategy enables the initial solutions to possess a certain potential for path optimization. It reduces the time and computational cost required for the algorithm to screen high-quality solutions from a large number of low-quality solutions in subsequent iterations, thereby facilitating the algorithm to quickly discover high-quality solutions.

In the PDCPP scenario with energy consumption as the optimization objective, algorithm stability serves as a core prerequisite for ensuring the efficient completion of tasks. To enhance algorithm stability and thereby support the stable execution of tasks by UAVs, this paper adopts a multi-selection crossover operator to construct MSGA-RL. Through comparative experiments with crossover operators of other algorithms, the stability of the multi-selection crossover operator employed in MSGA-RL is verified in the PDCPP scenario. As shown in the boxplots of Figure 11 and Figure 12, in both convex quadrilateral and convex pentagon regions, the energy consumption cost median of the multi-selector crossover operator used in the MSGA-RL algorithm is lower than that of the three-point crossover operator and the random crossover operator. This intuitively demonstrates that the multi-selector crossover operator has better path optimization capabilities and can effectively reduce flight energy consumption. Meanwhile, the multi-selector crossover operator has the smallest interquartile range, indicating that the paths generated in multiple experiments have extremely high stability. In contrast, the random crossover operator has a large number of outliers, which reflects that it tends to fall into local optima and is difficult to continuously explore globally better paths.

5. Conclusions

A fixed-wing UAV can quickly collect terrain and position information in post-disaster rescue scenarios, providing key data support for the efficient organization of rescue operations. To address the rescue timeliness of a fixed-wing UAV in All-Weather PDCPP missions, this study proposes the MSGA-RL optimization algorithm. In convex quadrilateral and pentagonal areas, three different scanning widths were introduced, and performance comparison experiments were conducted using various path planning algorithms. When

N_{c}

= 20, MSGA-RL reduces turning energy consumption compared with the SPC, GA, and DEGA algorithms in both convex quadrilateral and pentagonal regions, with reductions of 45.53%, 22.89%, and 9.32% in the quadrilateral region, and 31.89%, 21.69%, and 11.58% in the pentagonal region, while also demonstrating superior stability. Specifically, the distance-priority greedy initialization strategy ensures the quality of the initial population, the multi-selection crossover operator enhances convergence efficiency and global search capability, and the RL adaptive individual retention mechanism combined with a memory pool optimizes solution quality and search efficiency. To evaluate the performance of the algorithm, this study simulates complex post-disaster scenarios using convex polygons and conducts comparative experiments between MSGA-RL and classical algorithms such as SPC, GA, and DEGA. The results indicate that MSGA-RL can substantially reduce the total flight energy consumption of a fixed-wing UAV while effectively enhancing the responsiveness of post-disaster rescue operations.

It should be noted that the primary focus of MSGA-RL is on minimizing UAV turning energy to ensure efficient post-disaster area coverage. While the algorithm achieves significant energy savings, it requires offline computation to optimize the coverage path sequences. In practical scenarios, this introduces a trade-off between the “energy-saving benefits” and the “potential computational overhead.” Therefore, MSGA-RL is particularly suitable for high-endurance missions where the offline planning time is acceptable. For scenarios demanding real-time path planning, the computational cost should be considered, and MSGA-RL can be adapted or combined with faster heuristics to balance energy efficiency and planning speed.

However, in real post-disaster scenarios, the affected areas can be extensive, and operations relying on a single fixed-wing UAV may fail to meet timeliness requirements. Therefore, future research could focus on multi-UAV collaborative path planning and coverage strategies to further improve the efficiency and coverage quality of large-scale post-disaster rescue missions.

Author Contributions

Conceptualization, X.L.; methodology, X.L.; software, X.L.; validation, X.L.; formal analysis, X.L.; investigation, X.L.; resources, X.L., J.Y. and M.C.; data curation, X.L.; writing—original draft preparation, X.L.; writing—review and editing, X.L., J.Y. and M.C.; visualization, X.L.; supervision, X.L., J.Y. and M.C.; project administration, X.L.; funding acquisition, X.L., J.Y. and M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was partially supported by the Scientific and Technological Project in Henan Province (Grant No. 252102210238), the Open Project of the Grain Information Processing Center of Henan University of Technology (Grant Nos. KFJJ2022014, KFJJ2024017), the Key Project of Science and Technology Research of the Henan Provincial Department of Education (Grant No. 24B510002), the Natural Science Project of the Zhengzhou Bureau of Science and Technology (Grant No. 22ZZRDZX40), and the Youth Project of the Henan Provincial Natural Science Foundation. The APC (Article Processing Charge) was funded by Henan University of Technology.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data will be kept confidential for the time being as it will be used for further research based on this study.

Acknowledgments

During the preparation of this manuscript, ChatGPT 4.0 was utilized to optimize the fluency of certain English sentences, adjust the standardization of academic expressions, and refine the accuracy of professional terminology. The authors have reviewed, revised, and edited the generated outputs sentence by sentence to ensure the content aligns with the research objectives and adheres to academic rigor, and they assume full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Karaca, A.; Buğday, E. Generation of digital terrain models using different geospatial measurement technologies: A comparative analysis of the terrain data sets. Turk. J. For. 2025, 26, 25–35. [Google Scholar] [CrossRef]
Vidhyalakshmi, M.; Manivannan, K. Enhancing spatial accuracy in disaster response: A DTBiFP-YOLOv8 model for drone-based search and rescue operations. Earth Sci. Inform. 2025, 18, 240. [Google Scholar] [CrossRef]
Yucesoy, E.; Balcik, B.; Coban, E. The role of drones in disaster response: A literature review of operations research applications. Int. Trans. Oper. Res. 2025, 32, 545–589. [Google Scholar] [CrossRef]
Pérez-González, A.; Benítez-Montoya, N.; Jaramillo-Duque, Á. Coverage path planning with semantic segmentation for UAV in PV plants. Appl. Sci. 2021, 11, 12093. [Google Scholar] [CrossRef]
Elijah, T.; Jamisola, R.S., Jr.; Tjiparuro, Z. A review on control and maneuvering of cooperative fixed-wing drones. Int. J. Dyn. Control 2021, 9, 1332–1349. [Google Scholar] [CrossRef]
Du, L.; Fan, Y.; Gui, M.; Zhao, D. A Multi-Regional Path-Planning Method for Rescue UAVs with Priority Constraints. Drones 2023, 7, 692. [Google Scholar] [CrossRef]
Richards, A. Flight Optimization for an Agricultural Unmanned Air Vehicle. In Proceedings of the 2018 European Control Conference (ECC), Limassol, Cyprus, 12–15 June 2018; pp. 2891–2896. [Google Scholar] [CrossRef]
Huang, J.; Fu, W.; Luo, S.; Wang, C.; Zhang, B.; Bai, Y. A Practical Interlacing-Based Coverage Path Planning Method for Fixed-Wing UAV Photogrammetry in Convex Polygon Regions. Aerospace 2022, 9, 521. [Google Scholar] [CrossRef]
Tang, G.; Tang, C.; Zhou, H.; Claramunt, C.; Men, S. R-DFS: A Coverage Path Planning Approach Based on Region Optimal Decomposition. Remote Sens. 2021, 13, 1525. [Google Scholar] [CrossRef]
Zhan, H.; Zhang, Y.; Huang, J. A reinforcement learning-based evolutionary algorithm for the unmanned aerial vehicles maritime search and rescue path planning problem considering multiple rescue centers. Memet. Comput. 2024, 16, 373–386. [Google Scholar] [CrossRef]
Wu, Y.; Du, C.; Yang, R. Area coverage path planning for tilt-rotor unmanned aerial vehicle based on enhanced genetic algorithm. J. Zhejiang Univ. Eng. Sci. 2024, 58, 2031–2039. [Google Scholar] [CrossRef]
Gutiérrez-Aguirre, P.; Contreras-Bolton, C. A multioperator genetic algorithm for the traveling salesman problem with job-times. Expert Syst. Appl. 2024, 10, 122–152. [Google Scholar] [CrossRef]
Ma, Q.; Zhang, D.; Wan, C.; Zhang, J. Multiobjective emergency resources allocation optimization for maritime search and rescue considering accident black-spots. Ocean Eng. 2022, 261, 112–178. [Google Scholar] [CrossRef]
Wu, R.; Gu, F.; Liu, H. UAV Path Planning Based on Multicritic-Delayed Deep Deterministic Policy Gradient. Wirel. Commun. Mob. Comput. 2022, 2022, 9017079. [Google Scholar] [CrossRef]
Zhang, S.; Dai, Q. UAV Path Planning Based on Improved Deep Deterministic Policy Gradients. J. Syst. Simul. 2025, 37, 875–881. [Google Scholar] [CrossRef]
Salach, A.; Bakula, K.; Pilarska, M.; Ostrowski, W. Accuracy Assessment of Point Clouds from LiDAR and Dense Image Matching Acquired Using the UAV Platform for DTM Creation. ISPRS Int. J. Geo Inf. 2018, 7, 342. [Google Scholar] [CrossRef]
Wang, M.; Zhang, D.; Li, C. Multiple fixed-wing UAVs collaborative coverage 3D path planning method for complex areas. Def. Technol. 2025, 47, 197–215. [Google Scholar] [CrossRef]
Wang, L.; Zhuang, X.; Zhang, W.; Cheng, J.; Zhang, T. Coverage Path Planning for UAVs: An Energy-Efficient Method in Convex and Non-Convex Mixed Regions. Drones 2024, 8, 776. [Google Scholar] [CrossRef]
Kang, B.; Wang, C.; Su, Y. Multi-UAV forest area inspection path planning based on concave polygon region decomposition. Sci. Rep. 2025, 15, 41990. [Google Scholar] [CrossRef]
Zhang, P.; Mei, Y.; Wang, H.; Wang, W.; Liu, J. Collision-free trajectory planning for UAVs based on sequential convex programming. Aerosp. Sci. Technol. 2024, 152, 109404. [Google Scholar] [CrossRef]
Miarka, D.; Hnidka, J. Empirical Approach to Determination of Key Flight Performance Parameters of UAV. In Proceedings of the 2025 International Conference on Military Technologies (ICMT), Brno, Czech Republic, 27–30 May 2025; pp. 1–6. [Google Scholar] [CrossRef]
Wang, S.; Bai, Y.; Zhou, C. Coverage Path Planning Design of Mapping UAVs Based on Particle Swarm optimization Algorithm. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 27–30. [Google Scholar] [CrossRef]
Yuan, J.; Liu, Z.; Lian, Y. Global optimization of UAV area coverage path planning based on good point set and genetic algorithm. Aerospace 2022, 9, 86. [Google Scholar] [CrossRef]
Ribeiro, P.; Coelho, A.; Campos, R. On the energy consumption of rotary-wing and fixed-wing UAVs in flying networks. In Proceedings of the 2025 20th Wireless On-Demand Network Systems and Services Conference (WONS), Hintertux, Austria, 27–29 January 2025; IEEE: New York, NY, USA, 2025; pp. 1–4. Available online: https://ieeexplore.ieee.org/abstract/document/10926026 (accessed on 4 January 2026).
Xue, Y.; Zhu, H.; Liang, J.; Słowik, A. Adaptive crossover operator based multi-objective binary genetic algorithm for feature selection in classification. Knowl.-Based Syst. 2021, 227, 107218. [Google Scholar] [CrossRef]
Xiong, Y.; Zhou, Y.; She, J. Collaborative coverage path planning for UAV swarm for multi-region post-disaster assessment. Veh. Commun. 2025, 53, 100915. [Google Scholar] [CrossRef]
Kang, Z.; Ling, H.; Zhu, T.; Luo, H. Coverage flight path planning for multi-rotor UAV in convex polygon area. In Proceedings of the 2019 Chinese Control And Decision Conference (CCDC), Nanchang, China, 3–5 June 2019; IEEE: New York, NY, USA, 2019; pp. 1930–1937. [Google Scholar] [CrossRef]

Figure 1. Task scenario of PDCPP.

Figure 2. Kinematic model of a fixed-wing UAV.

Figure 3. Schematic of a fixed-wing UAV carrying a LiDAR system.

Figure 4. Best flight direction.

Figure 5. Two types of area coverage strategies.

Figure 6. Performance comparison of region-covering path algorithms (quadrilateral region).

Figure 7. Performance comparison of region-covering path algorithms (pentagon region).

Figure 8. Fitness function iteration curve in the convex quadrilateral region.

Figure 9. Fitness function iteration curve in the convex pentagon region.

Figure 10. Comparison results of different strategy algorithms.

Figure 11. Comparison of the stability of different operators in a convex quadrilateral region.

Figure 12. Comparison of the stability of different operators in a convex pentagonal region.

Table 1. Variables and symbols of the MIP model.

Symbol	Description	Type/Unit
$m_{i, k}$	Whether the k-th endpoint of the i-th scan strip is selected	Binary variable {0, 1}
i	Index of scan strip	Integer, 1 to $N_{c}$
k	Index of endpoint of a scan strip	1 or 2
$m_{i + 1, l}$	l-th endpoint of the $(i + 1)$ -th scan strip	Binary variable {0, 1}
$N_{c}$	Total number of coverage strips	Integer
D	Turning path length/UAV flight distance	m
$ℓ (m_{i, k}, m_{j, l})$	Path distance between points $m_{i, k}$ and $m_{j, l}$	m
$E_{d i s} (m_{i, k}, m_{i + 1, l})$	Minimum flight energy along Dubins feasible curve	J
P	Turning power	W
$θ$	Turn angle	rad
$φ_{m a x}$	Maximum roll angle	rad
$c_{1}, c_{2}$	Constants related to UAV and environment (from the literature)	-
$E_{u}$	UAV energy consumption per unit distance	J/m
$E_{m a x}$	UAV maximum battery capacity	J

Table 2. Parameter settings of the MSGA-RL algorithm and simulation environment.

Parameter Name	Value
Population Size ( $N_{P}$ )	100
Number of Iterations	200
Turning Radius	70 m
Crossover Probability	0.8
Mutation Probability	0.1
Elite Archive Size	20
RL Learning Rate	0.1
RL Discount Factor	0.9
Vertex Coordinates of Convex Quadrilateral	(1500, 75), (1500, 650), (500, 650), (750, 75) (Unit: m)
Vertex Coordinates of Convex Pentagon	(925, 0), (1625, 350), (1500, 650), (500, 650), (400, 250) (Unit: m)
Vertex Coordinates of Obstacles in Convex Quadrilateral	(1000, 400, 40), (1200, 200, 30), (800, 300, 35) (Unit: m; Note: The third value represents the obstacle radius)
Vertex Coordinates of Obstacles in Convex Pentagon	(900, 450, 40), (1300, 350, 30), (700, 250, 35) (Unit: m; Note: The third value represents the obstacle radius)

Table 3. Computational complexity of compared algorithms.

Algorithm	Computational Complexity
GA	$O (N_{p} \cdot G e n \cdot N_{c}^{2})$
DEGA	$O (N_{p} \cdot G e n \cdot N_{c}^{2})$
MSGA-RL	$O (N_{p} \cdot G e n \cdot N_{c}^{2} + N_{R L} \cdot N_{c})$

Table 4. Comparative experiment on algorithm performance under different elite archive sizes.

Scale of Elite Talent Pool	5% $N_{p}$	10% $N_{p}$	20% $N_{p}$	30% $N_{p}$	50% $N_{p}$
Final Stable Energy Consumption Value	8255.61 J	7947.13 J	7735.458 J	7883.53 J	8147.71 J
Number of Convergence Iterations (Generations)	144	135	127	137	158

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, J.; Lu, X.; Cui, M. Disaster Relief Coverage Path Planning for Fixed-Wing UAV Based on Multi-Selector Genetic Algorithm and Reinforcement Learning. Aerospace 2026, 13, 192. https://doi.org/10.3390/aerospace13020192

AMA Style

Yang J, Lu X, Cui M. Disaster Relief Coverage Path Planning for Fixed-Wing UAV Based on Multi-Selector Genetic Algorithm and Reinforcement Learning. Aerospace. 2026; 13(2):192. https://doi.org/10.3390/aerospace13020192

Chicago/Turabian Style

Yang, Jing, Xuemeng Lu, and Mingyang Cui. 2026. "Disaster Relief Coverage Path Planning for Fixed-Wing UAV Based on Multi-Selector Genetic Algorithm and Reinforcement Learning" Aerospace 13, no. 2: 192. https://doi.org/10.3390/aerospace13020192

APA Style

Yang, J., Lu, X., & Cui, M. (2026). Disaster Relief Coverage Path Planning for Fixed-Wing UAV Based on Multi-Selector Genetic Algorithm and Reinforcement Learning. Aerospace, 13(2), 192. https://doi.org/10.3390/aerospace13020192

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Disaster Relief Coverage Path Planning for Fixed-Wing UAV Based on Multi-Selector Genetic Algorithm and Reinforcement Learning

Abstract

1. Introduction

2. Materials and Methods of the PDCPP

3. The MSGA-RL Method

3.1. Generation of Initial Population

3.2. Fitness Function

3.3. Adaptive Crossover Operator

3.4. Individual Retention Mechanism Combined with RL

3.5. Elite Archive

4. Simulation Experiments and Results

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI