Article

A Population-Based Iterative Greedy Algorithm for Multi-Robot Rescue Path Planning with Task Utility

School of Computer Science, Liaocheng University, Liaocheng 252059, China
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(1), 164; https://doi.org/10.3390/math14010164
Submission received: 27 November 2025 / Revised: 25 December 2025 / Accepted: 29 December 2025 / Published: 31 December 2025

Abstract

Multi-robot rescue path planning (MRRPP) is critical for ensuring the rapid and effective completion of post-disaster rescue tasks. Most studies focus on minimizing the length of rescue paths, the number of robots, and rescue time, while neglecting task utility, which reflects the effect of timely delivery of emergency supplies and is equally important for post-disaster rescue. In this study, we integrate multiple optimization indicators into the rescue cost and model the problem as a variant of the vehicle routing problem (VRP) with timeliness and battery constraints. A population-based iterative greedy algorithm with Q-learning (QPIG) is proposed to solve it. First, two problem-specific heuristic schemes are designed to generate a high-quality and diverse population. Second, a competition-oriented destruction-reconstruction mechanism is applied to improve the global search ability of the algorithm. In addition, a Q-learning-based local search strategy is developed to enhance the algorithm’s exploitation ability. Moreover, a historical information-based constructive strategy is investigated to accelerate the convergence speed of the algorithm. Finally, the proposed QPIG is validated by comparing it with five efficient algorithms on 56 instances. Experimental results show that the proposed QPIG significantly outperforms the compared algorithms in terms of rescue cost and convergence speed.

1. Introduction

With the increasing frequency of natural disasters, the security and stability of human society are facing severe challenges [1]. In recent years, post-disaster relief has become an important research field [2]. In some severely affected regions, high-risk factors such as structural collapse and leakage of hazardous substances make it difficult for rescue workers to enter the disaster area safely [3]. The development of robot technology provides a new solution for post-disaster rescue. Rescue robots are capable of replacing or assisting humans in search, positioning, material transportation, and other tasks in dangerous environments [4]. However, post-disaster rescue tasks are usually numerous and widely distributed, and a single robot can hardly meet actual needs. Multiple robots can work together to complete complex rescue tasks efficiently [5]. Therefore, how to effectively design rescue sequences for multiple robots in post-disaster rescue is an urgent problem to be solved.
Currently, the multi-robot path planning (MRPP) research mainly focuses on the shortest path or the minimum energy consumption and has made significant progress [6,7]. Many researchers model these problems as variants of the VRP to optimize the path of the robot. For example, Cui et al. [8] introduced a mathematical model for the rescue vehicle routing problem with time windows and limited survival time, which aims to minimize the total distance traveled by helicopters while considering time windows, survivor life strength, and the capacities of transport and medical helicopters. Lin et al. [9] designed a grouping method in disaster environments to minimize task failures and rescue time based on task locations and time constraints. Geng et al. [10] proposed a modified PSO algorithm to maximize the number of successfully rescued survivors by assigning tasks to robots while considering uncertain survival times, ensuring that each robot operates within time constraints and optimizes task efficiency. Huang et al. [11] proposed a model to minimize both path cost and time cost to complete the tasks in the shortest time, while ensuring that each robot has enough supplies to perform its assigned task. Wang et al. [12] developed an improved IG algorithm to maximize the number of rescued survivors by considering life strength constraints and the time robots take to reach each survivor. Although existing research in rescue problems often considers optimization objectives such as path length, time, and the number of robots, most studies overlook the critical factor of task utility. Task utility refers to the rescue effect of the timely delivery of emergency supplies, and it decays exponentially over time. If emergency supplies can be delivered promptly and assistance can be provided, the task utility will be extremely high. If delivery is delayed, even if the supplies are intact, their utility value will decrease. 
Therefore, path planning should not only optimize the shortest distance and time but also maximize task utility to enhance rescue effectiveness. Additionally, considering that rescue robots have limited battery, path planning must also account for battery usage to ensure that robots can effectively complete their tasks. Therefore, based on the traditional VRP model, a new problem model is established considering the timeliness of the task and the battery constraint of the robot. The optimization objective of the model is composed of path length, task utility, number of robots, and rescue time. It aims to achieve efficient path planning by minimizing the objective function.
The MRPP problem is essentially an NP-hard problem, as demonstrated by reductions from well-established NP-hard problems such as the vehicle routing problem (VRP) and the path planning problem [13,14,15,16]. This makes it challenging to obtain the optimal solution using exact algorithms within a specified time. Heuristics and metaheuristics are among the most efficient approaches for finding optimal or near-optimal solutions to NP-hard problems [17]. As a straightforward and efficient metaheuristic, the iterative greedy (IG) algorithm offers the benefits of a simple structure, minimal parameter requirements, and embeddability. The IG algorithm has been extensively used to solve combinatorial optimization problems [18,19,20,21]. The MRRPP problem and combinatorial optimization problems have similar solution structures: both require identifying the optimal sequence among a large number of possible configurations. Therefore, this study explores the applicability of the IG algorithm to the MRRPP problem. However, given the features of the MRRPP problem, the traditional IG algorithm is limited by its single-solution greedy strategy and is susceptible to becoming trapped in local optima when handling large-scale problems. Therefore, this work proposes an improved population-based iterative greedy algorithm for solving the MRRPP problem. Specifically, a history-based exploration strategy, a collective intelligence-driven strategy, and an elite-oriented optimization strategy are designed to enhance solution diversity. Additionally, six neighborhood operators based on critical and non-critical paths are developed for further exploration, and Q-learning is investigated to select these operators, thereby enhancing the algorithm’s effectiveness in exploring the search space. The primary contributions of this study are as follows:
(1)
A novel mathematical model for the MRRPP problem is proposed, emphasizing the integration of time-decaying task utility to optimize rescue performance.
(2)
A competition-oriented destruction-reconstruction mechanism is designed to improve the QPIG’s global search ability.
(3)
A Q-learning-based local search strategy is developed to increase the exploitation ability of the algorithm.
(4)
A historical information-based constructive strategy is investigated to accelerate the algorithm’s convergence speed.
The remainder of this study is structured as follows. Section 2 reviews the related literature. Section 3 explains the formulation and description of the problem. Section 4 provides a detailed explanation of the proposed QPIG algorithm. Section 5 describes parameter calibration, effectiveness analysis, and comparative algorithms for the experiment. Finally, Section 6 summarizes this study and presents possible directions for further research.

2. Literature Review

In recent years, several algorithms have been utilized by many researchers to solve rescue path planning problems. For example, Xu et al. [22] proposed an obstacle constraint and task of equal division-based k-means clustering algorithm, along with a glow-worm swarm optimization algorithm with chaotic initialization (GSOCI), which significantly improves the accuracy and convergence speed in addressing the emergency rescue planning issue. Michael et al. [23] applied a variant of the ACO algorithm to determine the optimal search path under time and resource constraints, aiming to maximize the probability of locating survivors. Additionally, Wen et al. [24] developed an unmanned aerial vehicle (UAV) route planning method that incorporated a probabilistic model to estimate potential target locations and two search algorithms to maximize detection probability in solving maritime search and rescue challenges. Meanwhile, Alrayes et al. [25] used an enhanced drone route planning scheme that integrates quasi-oppositional-based learning with the traditional search-and-rescue optimization algorithm to optimize secure communication flight paths. Liu et al. [26] implemented a multi-objective path planning framework, using hierarchical target filtering and sorting strategies to improve efficiency in complex disaster scenarios. Although the aforementioned algorithms have achieved notable results in rescue path planning, there remains room for improvement in balancing solution quality and computational efficiency, especially in large-scale environments.
Given its efficiency and ability to generate near-optimal solutions, the IG algorithm has been extensively adopted to address combinatorial optimization problems since its introduction. In the past several years, many variants of IG have been developed, among which the most notable is the population-based iterative greedy (PIG) algorithm. For instance, Bouamama et al. [27] were the first to establish the PIG algorithm for the minimum weight vertex cover problem. Miao et al. [28] used a PIG algorithm enhanced with Q-learning for solving the scheduling issue of multiple weeding robots, focusing on minimizing completion time while lowering total operational costs. In addition, Zou et al. [29] implemented a mixed-integer linear optimization model and PIG algorithm to tackle the scheduling of automated guided vehicles with unloading safety detection in a matrix workshop. Wang et al. [30] designed the PIG algorithm to address the cascaded flowshop joint scheduling problem, where the search alternates between two sequential phases to effectively reduce total tardiness. Moreover, Zhao et al. [31] presented the PIG algorithm, incorporating heuristic initialization, local search, and selection mechanism, and achieving superior performance on extensive instances. Wang et al. [32] developed two rapid evaluation approaches and a modified idle time insertion method within the PIG algorithm, resulting in notable improvements in both efficiency and solution robustness. Therefore, considering its verified efficiency in solving various combinatorial optimization problems, the PIG algorithm is employed as the core framework to balance search intensification and diversification.
Reinforcement learning (RL) has attracted growing interest recently, prompting researchers to investigate its potential for tackling optimization problems. To improve solution quality in multi-objective optimization, Li et al. [33] designed a Q-learning-based dimension detection search method for identifying promising non-dominated solutions in resource-constrained flexible flowshop scheduling with robotic transportation. Zhao et al. [34] introduced a Q-learning-based approach for determining the weighting coefficients, aiming to achieve smaller objective values in the distributed no-idle permutation flowshop scheduling problem. Ren et al. [35] used an ensemble artificial bee colony algorithm enhanced with Q-learning to solve the bi-objective disassembly line scheduling problem, optimizing both disassembly time and smoothing index while considering task interference. In addition, Luo et al. [36] developed a Q-learning memetic algorithm (QLMA) that considers product priorities and factory heterogeneity to minimize energy consumption and total tardiness. To address the multi-UAV cooperative plant protection task allocation problem, Chen et al. [37] investigated a learning-based memetic algorithm (L-MA) that utilizes Q-learning to assist operators in determining the optimal number of UAVs for each field. Moreover, Fang et al. [38] suggested an improved ant colony optimization algorithm (IAC-IQL), integrated with enhanced Q-learning, for global path planning of search and rescue robots based on Bessel curves, achieving higher efficiency and smoother paths. Zhan et al. [39] developed a reinforcement learning-based genetic algorithm (GA-RL), which leverages Q-learning to guide offspring selection and population renewal in UAV maritime search and rescue involving multiple rescue centers. 
Inspired by the successful integration of Q-learning for scheduling and path planning problems, this study employs Q-learning to intelligently steer the choice of operators, allowing the algorithm to learn from experience and improve solution quality.
Based on the algorithms reviewed above, we chose nine representative algorithms and summarized their advantages and limitations in Table 1. The comparison shows that swarm intelligence algorithms such as GSOCI, IACO, DABC, QLMA, IAC-IQL, and GA-RL have more parameters, complex population structures, and higher computational complexity. In the MRRPP problem, we usually only need a single optimal path solution. Compared to swarm intelligence algorithms, an improved IG algorithm can explore solutions more thoroughly and optimize their quality more effectively. Based on the advantages of the IG algorithm and the current research, this work introduces further improvements on the basis of the IG algorithm.
The traditional IG algorithm performs excellently on small-scale problems. As the number of rescue points increases, the advantage of its iterative improvement strategy gradually diminishes, and the algorithm may overly focus on certain areas, causing it to get stuck in local optima. This work therefore proposes a competition-oriented destruction-reconstruction mechanism to improve global search capability. Additionally, the traditional IG algorithm only uses insertion operators, resulting in incomplete exploration of the solution space. Therefore, this work designs six neighborhood operators based on critical and non-critical paths, expanding the search range and enabling more comprehensive exploration of the solution space.

3. Problem Description and Formulation

3.1. Problem Description

In post-disaster rescue scenarios, there are multiple rescue points, assuming each rescue point corresponds to a task. Robots are mainly responsible for carrying out material distribution tasks from rescue centers to various rescue points, usually involving the delivery of emergency supplies. Tasks are usually time sensitive, and their utility gradually declines over time. The task utility refers to the effect or value of the timely delivery of emergency supplies. In post-disaster rescue scenarios, for example, the task is to deliver medication to patients. The “value” of the medication does not refer to its monetary worth as an emergency supply, but rather to the therapeutic effect it can provide. If the medication can be delivered to the patient promptly and provide treatment, its task utility will be very high, as it can significantly improve the patient’s health condition and increase the likelihood of a successful recovery. If the delivery is delayed, even if the medicine is not physically damaged, its task utility value will decrease because it does not provide timely assistance, impacting the rescue effectiveness. This study formulated the rescue problem as a variation of the VRP with timeliness and battery constraints.
The problem is formulated on a directed connected graph G = (N, A), where N = {0} ∪ T is the vertex set, 0 denotes the rescue center, T is the set of rescue points, and A = {(i, j) | i, j ∈ N, i ≠ j} is the arc set, with each arc weighted by the distance between its endpoints. All rescue tasks are performed by k robots, and the robot set is R = {1, 2, …, k}. Each rescue point must be accessed by at least one robot to ensure smooth completion of all tasks. Additionally, without loss of generality, the following assumptions are made:
(1)
All robots are homogeneous.
(2)
There is only one rescue center, and robots are required to leave the rescue center and return after completing their tasks.
(3)
All robots are operating normally without any downtime or malfunctions.
(4)
The maximum load of all robots is the same.
(5)
Robots move at a known constant speed.
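To make the notion of time-decaying task utility concrete, the sketch below computes the utility of a single delivery under the exponential decay model described above; the function name, the decay-rate parameter `mu`, and the numeric values are illustrative assumptions, not part of the formal model.

```python
import math

def task_utility(u0, mu, arrival, earliest):
    """Time-decayed utility of delivering supplies to one rescue point.

    u0       -- initial utility value of the task
    mu       -- decay rate (an assumed per-point parameter)
    arrival  -- time the robot starts service at the point
    earliest -- earliest service start time of the point's time window
    """
    # No decay before the window opens; exponential decay afterwards.
    delay = max(0.0, arrival - earliest)
    return u0 * math.exp(-mu * delay)

# A prompt delivery preserves the full utility, while a late one loses
# value even though the supplies themselves are intact.
on_time = task_utility(79.0, 0.1, arrival=6.0, earliest=6.0)
late = task_utility(79.0, 0.1, arrival=16.0, earliest=6.0)
```

With these illustrative numbers, `late` is strictly smaller than `on_time`, mirroring the medication-delivery example.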

3.2. Problem Formulation

In this section, a mathematical model is established as follows:
Decision variables:
xijk: if robot Rk travels from rescue point Ti to rescue point Tj, then xijk = 1; otherwise, xijk = 0.
yik: if robot Rk visits rescue point Ti to perform a task, then yik = 1; otherwise, yik = 0.
Objective:
$$\min f = w_l \sum_{k=1}^{|R|} \sum_{i=0}^{|N|} \sum_{j=0,\, j \neq i}^{|N|} x_{ijk}\, d_{ij} \;+\; w_n\,|R| \;+\; w_t\,(\tau_n^{e} - \tau_1^{s}) \;-\; w_r \sum_{k=1}^{|R|} \sum_{i=1}^{|N|} y_{ik}\, U_i\, e^{-\mu_i (t_{ik} - a_i)} \tag{1}$$
Constraints:
$$\sum_{j \in N} x_{0jk} = 1, \quad \forall k \in R \tag{2}$$
$$\sum_{i \in N} x_{i0k} = 1, \quad \forall k \in R \tag{3}$$
$$\sum_{k \in R} y_{ik} = 1, \quad \forall i \in T \tag{4}$$
$$a_i \le t_{ik} \le b_i, \quad \forall i \in N,\ k \in R \tag{5}$$
$$\sum_{i=1}^{|N|} y_{ik}\, dr_i \le C_k^{\max}, \quad \forall k \in R \tag{6}$$
$$E_k = d_k \times E_a, \quad \forall k \in R \tag{7}$$
$$E_k + \xi \le Q_k, \quad \forall k \in R \tag{8}$$
The objective function (1) consists of four parts: path length, number of robots, rescue time, and task utility. Constraints (2) and (3) indicate that each path taken by the robot starts and ends at the rescue center. Constraint (4) represents that the robot can only access each rescue point once. Constraint (5) ensures that the service time of each rescue point is within its time window. Constraint (6) guarantees that the robot’s material transport does not surpass its maximum carrying capacity. Constraint (7) represents the battery consumption of the robot during task execution, which depends on the travel distance and the battery usage per unit distance. Constraint (8) indicates a battery constraint, ensuring that the combined battery usage of the robot and its backup does not exceed the robot’s total battery capacity.
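As a rough illustration of how objective (1) aggregates its four parts, the sketch below evaluates a candidate set of routes; the data layout (dictionaries keyed by point id), the default unit weights, and the simplified rescue-time span are assumptions made for brevity.

```python
import math

def rescue_cost(routes, dist, utility, mu, arrive, earliest,
                wl=1.0, wn=1.0, wt=1.0, wr=1.0):
    """Sketch of objective (1): weighted path length plus robot count plus
    rescue-time span, minus the total time-decayed task utility gained.

    routes   -- one list of point ids per robot, depot 0 included
    dist     -- dist[i][j]: distance between points i and j
    utility  -- utility[i]: initial utility U_i of rescue point i
    mu       -- mu[i]: utility decay rate of point i
    arrive   -- arrive[i]: service start time at point i
    earliest -- earliest[i]: time-window opening a_i of point i
    """
    length = sum(dist[r[k]][r[k + 1]]
                 for r in routes for k in range(len(r) - 1))
    # Stand-in for the rescue-time span (last service start minus first).
    span = max(arrive.values()) - min(arrive.values())
    gained = sum(utility[i] * math.exp(-mu[i] * (arrive[i] - earliest[i]))
                 for r in routes for i in r if i != 0)
    return wl * length + wn * len(routes) + wt * span - wr * gained
```

Minimizing this value trades path length, fleet size, and time span against the utility recovered by timely deliveries.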

4. The Proposed QPIG Algorithm

The proposed QPIG is structured around four components: the population initialization, destruction and reconstruction phase, local search phase, and acceptance criteria phase. By introducing population-based optimization, multiple solution individuals are processed in parallel. The population-based IG framework not only preserves the advantage of IG in exploring a single solution in depth but also maintains high diversity in the solution space, thereby enhancing global search capability.
In the initialization phase, two problem-specific heuristic schemes (i.e., a utility-based initialization strategy and a rescue cost-based initialization strategy) are designed to establish an initial population composed of high-quality individuals. In the destruction and reconstruction phase, a competition-oriented destruction-reconstruction mechanism generates candidate solutions and selects the optimal individual from the candidate solutions to conduct local search. This mechanism includes a history-based exploration strategy, a collective intelligence-driven strategy, and an elite-oriented optimization strategy to improve the search process. Within the competition-oriented destruction-reconstruction mechanism, individuals in the population undergo a selection process that continuously filters out the best solutions, further improving search efficiency. The competition mechanism prompts the algorithm to select the best individuals, thereby accelerating the global search. In the local search phase, a Q-learning-based local search strategy is investigated to select among six neighborhood operators to improve the quality of solutions. To overcome the limitations of traditional fixed selection strategies in complex problems, Q-learning adapts by selecting operators and dynamically adjusting the search strategy, allowing the algorithm to make flexible decisions based on the current solution’s state and historical experience. This effectively balances global and local optimization, improving search efficiency and enhancing the quality of the solution. The proposed QPIG framework is presented in Algorithm 1. Lines 10 to 19 of Algorithm 1 describe the acceptance criteria phase, while lines 20 to 24 provide a historical information-based constructive strategy, which clears the stored historical solution set periodically after a predefined number of executions.
Specifically, the historical information-based constructive strategy includes the historical path allocation operator and path refinement operator to accelerate the convergence speed of the algorithm. The algorithm continues executing the above loop until the stopping criterion is met and then outputs the optimal path. Figure 1 provides a flowchart of the QPIG.
Algorithm 1: Framework of the proposed QPIG
Input: population size: ps, destruction-reconstruction size: ld, simulated annealing acceptance probability: p, predefined constant: c
Output: the best solution πbest
1(π, POP) = Initialization;
2i = 0, RSeq = ϕ, Histsol = ϕ;
3πbest= π;
4while the stopping criterion is not satisfied do
5       POP’ = Competition-oriented destruction-reconstruction mechanism(POP, π, ld);
6       POP’’ = Q-learning-based local search strategy(POP’);
7       idx_best = arg min_{1 ≤ j ≤ n} f(POP’’j);
8       π1 = POP’’idx_best;
9       RSeq = RSeq ∪ {π1};
10       if f(π1) < f(π) then
11               π = π1;
12               if f(π1) < f(πbest) then
13                       πbest = π1;
14                end
15        else
16               if p < rand(0,1) then
17                       π = π1;
18                end
19        end
20       if mod(i, c) == 0 then
21               πhistory = Historical information-based constructive strategy(RSeq);
22               Add πhistory to Histsol;
23               RSeq = ϕ;
24        end
25       i = i + 1;
26end
27return πbest;
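The acceptance criterion in lines 10 to 19 of Algorithm 1 can be sketched as follows; the helper name and the default probability `p` are illustrative, not taken from the paper's calibration.

```python
import random

def accept(candidate_cost, current_cost, best_cost, p=0.3):
    """Acceptance rule of lines 10-19 in Algorithm 1: always keep an
    improving candidate; otherwise accept it with a simulated-annealing
    style probability (the value of p here is illustrative).

    Returns (replace_current, replace_best)."""
    if candidate_cost < current_cost:
        return True, candidate_cost < best_cost
    # Line 16 of Algorithm 1: accept a worse solution when p < rand(0,1).
    return p < random.random(), False
```

Occasionally accepting a worse solution lets the search escape local optima without losing the best solution found so far.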

4.1. Solution Representation

Figure 2 shows an example of the MRRPP problem, where 0 represents the rescue center and {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} denote the rescue points. R1, R2, and R3 represent the robots. Taking the path of robot R1 as an example, it starts from the rescue center {0} and visits the rescue points {5, 3, 4, 1} in sequence. The t5(6,9) next to rescue point 5 indicates that the earliest service start time of rescue point 5 is 6 and the latest service start time is 9. The demand for rescue point 5 is dr5 = 10, which means 10 units of rescue supplies are required. Each time a robot visits a rescue point, its load is reduced. If the remaining load is insufficient to meet the demand at the rescue point, the robot will return to the rescue center. The initial utility value at rescue point 5 is U = 79, indicating that the robot’s timely delivery of rescue supplies is expected to contribute a value of 79 to this rescue point. As shown in the upper right corner of Figure 2, each robot has an initial load capacity of 150 and a total battery capacity of 110. In this study, a two-dimensional array is employed to represent each solution. The first dimension of the array corresponds to the robot number, and the second dimension represents the rescue points allocated to that robot. Figure 3 illustrates an example of the encoding method.

4.2. Population Initialization

An effective initial solution significantly speeds up the convergence process and improves the solution’s quality. In this study, two problem-specific heuristic schemes are designed. The first heuristic scheme is the utility-based initialization strategy, which sorts tasks according to their utility values. Firstly, tasks with low utility values are assigned to each robot as the starting point. Then, the optimal task insertion is iteratively selected for each path to maximize the overall task utility and improve rescue efficiency while satisfying constraint conditions. Algorithm 2 describes pseudocode for the utility-based initialization strategy.
Algorithm 2: Utility-based initialization strategy
Input: parameters
Output: the solution: π
1N’ ← N;
2Calculate f0j, j ∈ {1, 2, …, n}, n = sizeof(N’);
3Sort tasks in N’ by ascending order of f0j, and select the top m tasks;
4Delete the above m tasks from N’;
5k = 1, i ← πk,1;
6while N’ ≠ ϕ do
7        Calculate fij;
8        fmin = min{fi1, fi2, …, fiN};
9        Insert the task x of fmin into πk;
10        if the constraint is satisfied then
11                i ← x, delete task x from N’;
12         else
13                k = k + 1, i ← πk,1;
14         end
15end
16return π;
The second heuristic scheme is the rescue cost-based initialization strategy. The rescue cost-based initialization strategy has the same structure as the utility-based initialization strategy, except that the utility value is replaced by the ranking of rescue cost. This strategy assigns tasks with low rescue cost to each robot as the starting point, selects the optimal task insertion for each path in turn, and constructs a multi-robot initial solution under the premise of satisfying the constraint conditions.
As a population-based metaheuristic, the QPIG algorithm relies on a high-quality and diverse population for its evolution. Therefore, we use the two proposed heuristics (the utility-based initialization strategy and the rescue cost-based initialization strategy) to generate two individuals and ensure the population’s quality. Meanwhile, the remaining ps − 2 individuals are generated randomly to guarantee the diversity of the population.
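A minimal sketch of the utility-based initialization (Algorithm 2) is given below. For brevity, the insertion score f_ij is collapsed into a single per-task value and the constraint check is abstracted into a callback; both simplifications are ours, not the paper's.

```python
def utility_based_init(tasks, score, m, feasible):
    """Sketch of Algorithm 2: seed m routes with the lowest-scored tasks,
    then greedily append the cheapest feasible task to the current route,
    moving on to the next route when no feasible insertion remains.

    tasks    -- iterable of task ids
    score    -- score[t]: value used for sorting (stand-in for f_0j / f_ij)
    m        -- number of seed tasks, one per initial route
    feasible -- feasible(route, task) -> bool, the problem's constraints
    """
    remaining = sorted(tasks, key=lambda t: score[t])
    routes = [[t] for t in remaining[:m]]      # seed each robot's route
    remaining = remaining[m:]
    k = 0
    while remaining:
        best = min(remaining, key=lambda t: score[t])
        if k < len(routes) and feasible(routes[k], best):
            routes[k].append(best)
            remaining.remove(best)
        else:
            k += 1                              # open the next route
            if k >= len(routes):
                routes.append([])
    return routes
```

The rescue cost-based variant is identical except that `score` ranks tasks by rescue cost instead of utility.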

4.3. Destruction and Reconstruction Phase

The IG algorithm perturbs the current solution during the destruction and reconstruction phase, improving the diversity of the search process. In the destruction phase, ld tasks are chosen randomly from the path sequence, yielding two subsequences: πD, which includes the removed tasks, and πR, which contains the remaining tasks. Parameter ld represents the destruction size of the path sequence. The value of ld is analyzed and set in Section 5.3. In the construction phase, the tasks πD extracted from the destruction stage are reinserted into the sequence πR. First, randomly extract a task δ from the sequence πD. After attempting to insert task δ into all positions of πR, select the position that minimizes the rescue cost for insertion. This process continues until all the removed tasks have been reinserted into the path sequence.
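The basic destruction and reconstruction steps can be sketched as follows; the `cost` callback is a stand-in for the rescue-cost evaluation of a path sequence.

```python
import random

def destruct(seq, ld, rng=random):
    """Destruction: remove ld randomly chosen tasks from the sequence,
    returning the removed tasks and the remaining subsequence."""
    removed = rng.sample(seq, ld)
    kept = [t for t in seq if t not in removed]
    return removed, kept

def reconstruct(removed, kept, cost):
    """Reconstruction: greedy best-position reinsertion. Each removed task
    is tried at every position and placed where cost(sequence) is lowest."""
    seq = list(kept)
    for task in removed:
        candidates = [seq[:p] + [task] + seq[p:]
                      for p in range(len(seq) + 1)]
        seq = min(candidates, key=cost)
    return seq
```

Trying every insertion position for each removed task is what makes the reconstruction greedy; the perturbation comes entirely from the random choice of which tasks to remove.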
To enhance the diversity and quality of solutions, this study proposes a competition-oriented destruction-reconstruction mechanism. This mechanism includes a history-based exploration strategy, a collective intelligence-driven strategy, and an elite-oriented optimization strategy. Among them, the history-based exploration strategy destroys and reconstructs a randomly selected individual from the historical solutions, enhancing solution diversity; the collective intelligence-driven strategy performs destruction-reconstruction on all individuals in the population while retaining the one with the best rescue cost, mining global information to enhance search depth; and the elite-oriented optimization strategy focuses on the current optimal individual and finely optimizes its neighborhood to enhance local search ability. Moreover, this mechanism evaluates the candidate solutions generated by the three strategies based on their rescue cost and selects the best-performing solution. Algorithm 3 provides the pseudocode for the competition-oriented destruction-reconstruction mechanism.
Algorithm 3: Competition-oriented destruction-reconstruction mechanism
Input: population: POP, population size: ps, destruction-reconstruction size: ld, the set of historical solutions: Histsol
Output: new population: POP’
1Candsolϕ, Unionϕ;
2# History-based exploration strategy    # indicates a remark.
3πrh ← randomly select an individual from Histsol;
4(πrhD, πrhR) ← Destruction(πrh, ld);
5π1 ← Reconstruction(πrhD, πrhR);
6Add π1 to Candsol;
7# Collective intelligence-driven strategy
8for each individual ∈ POP do
9          (πiD, πiR) ← Destruction(πi, ld);
10          πi ← Reconstruction(πiD, πiR);
11end
12POP1 ← update POP;
13idxbest1 = arg min_{1 ≤ j ≤ ps} f(POP1j);
14π2 ← POP1idxbest1;
15Add π2 to Candsol;
16# Elite-oriented optimization strategy
17idxbest2 = arg min_{1 ≤ j ≤ ps} f(POPj);
18πeo ← POPidxbest2;
19(πeoD, πeoR) ← Destruction(πeo, ld);
20π3 ← Reconstruction(πeoD, πeoR);
21Add π3 to Candsol;
22UnionPOPCandsol;
23POP’ ← Select the top ps individuals with the rescue cost ranking from Union;
24return POP’;

4.4. Q-Learning-Based Local Search Strategy

Q-learning is an RL algorithm that allows agents to identify the most effective actions through feedback obtained from reward mechanisms. It mainly consists of intelligent agents, environment, state, actions, and rewards. An agent takes actions in a given state, and the environment will transition to a new state based on the actions and the current state and reward the agent. Through repeated loops, the experience of the intelligent agent will be improved. The Q-table is an important component of Q-learning, used to guide individuals in selecting actions during different iteration processes, as shown in Table 2. Rows represent states, and columns represent actions. Each combination of states and actions corresponds to a Q-value, which reflects the learning process of the agent. The Q-value update is given by the following formula:
$$Q(S_t, a_t) = (1 - \alpha)\, Q(S_t, a_t) + \alpha \left( R + \gamma \max_{a_{t+1}} Q(S_{t+1}, a_{t+1}) \right)$$
where Q(St, at) denotes the Q-value for executing action at in state St, α is the learning rate, R represents the reward received after taking the action, γ is the discount rate, and maxQ(St+1, at+1) corresponds to the maximum expected Q-value in the Q-table for state St+1.
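The update formula above corresponds to a single tabular Q-learning step; the sketch below implements it on a nested-dict Q-table, with the default learning and discount rates chosen purely for illustration.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular update of the Q-value formula above:
    Q(s,a) <- (1 - alpha)*Q(s,a) + alpha*(R + gamma * max_a' Q(s',a'))."""
    best_next = max(q[next_state].values())   # max expected value in s'
    q[state][action] = ((1 - alpha) * q[state][action]
                        + alpha * (reward + gamma * best_next))
    return q[state][action]
```

Repeated updates of this form propagate reward information backwards through the state-action table, which is what lets the agent learn which operator to apply in which state.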
(1)
Status
The solutions form the set of states in Q-learning, with each individual representing a particular state. Specifically, state St represents the current solution of the QPIG algorithm at time step t. The state St is observable and distinguishable, meaning that the agent can accurately identify and distinguish different states based on the characteristics of the current solution. State transitions satisfy the Markov property, meaning that future states depend only on the current state and the selected action and are independent of historical states.
(2)
Action
The action indicates the neighborhood structure of the solution in the Q-learning environment model. Currently, no universal neighborhood-based improvement strategy exists that is applicable to all problems. In QPIG, the set of neighborhood structures is designed with six problem-specific operators. Here, the critical path refers to the path sequence with the highest rescue cost. These neighborhood operators are explained as follows, and the graphical representation is provided in Figure 4.
(a)
Critical path swap operator (LS_op1)
Identify the critical path first, and then randomly choose a task from it as the critical task, Tc. Then, exchange Tc with each non-critical task Tn until a promising path sequence is found, as shown in Figure 4a.
(b)
Critical path and non-critical path swap operator (LS_op2)
Randomly choose a critical task Tc from the critical path, and randomly choose a non-critical path from the solution. Swap the critical task Tc with each non-critical task Tn in the non-critical path to generate a new path sequence, as depicted in Figure 4b.
(c)
Critical path insertion operator (LS_op3)
Select a task Tc at random from the critical path. Subsequently, Tc is tentatively inserted into the positions preceding and following every non-critical task Tn to generate a new path sequence, as illustrated in Figure 4c.
(d)
Critical path and non-critical path insertion operator (LS_op4)
Choose a task Tc at random from the critical path, and randomly choose a non-critical path from the solution. Subsequently, insert the critical task Tc into the preceding and following locations of every non-critical task Tn in the non-critical path to generate a promising path sequence, as presented in Figure 4d.
(e)
Path reverse operator (LS_op5)
Select a continuous subsequence of tasks in the critical path or a non-critical path, and reverse the subsequence to generate a promising path sequence, as depicted in Figure 4e.
(f)
Path right shift operator (LS_op6)
Within the critical path or a non-critical path, shift a subsequence to the right: the last element of the subsequence moves to its first position, and the remaining elements each move back one position, generating a new path sequence, as shown in Figure 4f.
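The six operators reduce to four basic list moves on a path encoded as task IDs in visit order. The sketch below is illustrative: the function names and index arguments are assumptions, and the choice of critical task/path is left to the caller.

```python
def swap_tasks(path, i, j):
    """LS_op1/LS_op2-style move: exchange the tasks at positions i and j."""
    p = path[:]
    p[i], p[j] = p[j], p[i]
    return p

def insert_task(path, i, j):
    """LS_op3/LS_op4-style move: remove the task at i and reinsert it at j."""
    p = path[:]
    task = p.pop(i)
    p.insert(j, task)
    return p

def reverse_segment(path, i, j):
    """LS_op5: reverse the subsequence path[i..j]."""
    return path[:i] + path[i:j + 1][::-1] + path[j + 1:]

def right_shift(path, i, j):
    """LS_op6: the last element of path[i..j] moves to the front of the segment."""
    seg = path[i:j + 1]
    return path[:i] + [seg[-1]] + seg[:-1] + path[j + 1:]
```

For example, `right_shift([1, 2, 3, 4], 1, 3)` rotates the segment `[2, 3, 4]` into `[4, 2, 3]`, exactly the behavior described for LS_op6.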
(3)
Action selection
The ε-greedy strategy is employed to explore new action options during the training process, thereby improving the global optimality of learning. In this study, a larger ε value is set in the initial stage to encourage the algorithm to explore a wide range of local operators. As iterations continue, the value of ε progressively decreases linearly, causing the strategy to lean towards utilizing existing knowledge. The formula used to compute ε is as follows:
$$\varepsilon_g = \max\!\Big(\varepsilon_{\min},\; \varepsilon_0 - \frac{g}{G}\,(\varepsilon_0 - \varepsilon_{\min})\Big)$$
where ε0 is the initial value of ε, εmin is its minimum value, g indicates the current iteration count, and G denotes the total number of allowed iterations.
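The decay schedule and the ε-greedy choice can be sketched as follows; the default values ε0 = 0.9 and εmin = 0.1 are illustrative assumptions, as the section does not fix them.

```python
import random

def epsilon(g, G, eps0=0.9, eps_min=0.1):
    """Linear decay: eps_g = max(eps_min, eps0 - (g/G)*(eps0 - eps_min))."""
    return max(eps_min, eps0 - (g / G) * (eps0 - eps_min))

def select_action(q_row, eps):
    """Epsilon-greedy: explore a random operator, else exploit the best one."""
    if random.random() < eps:
        return random.randrange(len(q_row))          # explore
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit
```

Early iterations (g small) yield a large ε and mostly random operator picks; by g = G the rate has settled at εmin and the best-valued operator dominates.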
(4)
Reward
After executing an action, an individual can receive a reward R. In QPIG, the reward R is calculated as follows:
$$R = \begin{cases} 1, & f' < f \\ 0, & f' = f \\ -1, & f' > f \end{cases}$$
where f’ denotes the total rescue cost of the new solution, and f represents the total rescue cost of the current solution.

4.5. Acceptance Criteria

The acceptance criterion based on simulated annealing (SA) is employed to enhance the diversity of solutions [40]. Its core strategy is that, when the rescue cost of a newly generated solution exceeds that of the current one, the new solution is still accepted with a probability p instead of being discarded outright. The acceptance probability p is calculated as follows:
$$p = \min\!\left(\exp\!\left(\frac{F(\pi) - F(\pi_{new})}{Temperature}\right),\; 1\right)$$
where π represents the current solution and πnew denotes the newly generated solution. The temperature settings are as follows:
$$Temperature = \sigma \times \frac{\sum_{k=1}^{m}\sum_{i=1}^{n} t_{ik}}{10 \times m \times n}$$
where σ is the constant temperature value, tik is the service time of robot k at rescue point i, m denotes the number of robots, and n refers to the number of rescue points.
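A minimal sketch of the two formulas above, under the assumption that the service times are given as an m × n nested list; the boolean `accept` helper is an illustrative wrapper, not the authors' code.

```python
import math
import random

def temperature(service_times, sigma):
    """Temperature = sigma * (sum_k sum_i t_ik) / (10 * m * n)."""
    m, n = len(service_times), len(service_times[0])
    total = sum(sum(row) for row in service_times)
    return sigma * total / (10 * m * n)

def accept(f_cur, f_new, temp):
    """SA rule: always keep improvements; accept a worse solution with
    probability min(exp((f_cur - f_new)/temp), 1)."""
    if f_new <= f_cur:
        return True
    return random.random() < math.exp((f_cur - f_new) / temp)
```

Because the exponent is negative only when the new solution is worse, the probability stays at 1 for improvements and decays with both the cost gap and a cooling temperature.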

4.6. Historical Information-Based Constructive Strategy

The construction strategy based on historical information aims to generate a better path by borrowing high-quality parts from multiple historical solutions. Its core idea is to identify the parts that perform well by utilizing path data from historical solutions and optimize them accordingly. By integrating the advantages of different solutions, this strategy aims to reduce the search space, thereby improving the algorithm’s convergence efficiency. The pseudocode of the historical information-based constructive strategy is shown in Algorithm 4. The historical path allocation operator is described in Algorithm 5. Algorithm 6 provides pseudocode for the path refinement operator.
Algorithm 4: Historical information-based constructive strategy
Input: the set of current solutions: RSeq
Output: historical solution: πhistory
1:  rmax = 0;
2:  for each π ∈ RSeq do
3:      cn = count(unique robots in π);
4:      if cn > rmax then
5:          rmax = cn;
6:      end
7:  end
8:  RobMat = CreateRobotPathMatrix(rmax);
9:  PEbest = Historical path allocation operator(rmax, RobMat);
10: πhistory = Path refinement operator(PEbest, RSeq);
11: return πhistory;
Algorithm 5: Historical path allocation operator
Input: maximum number of robots: rmax, path matrix: RobMat
Output: the optimal path for each robot: PEbest
1:  nsol = size(RobMat, 2);
2:  for i = 1:rmax do
3:      Pbest = inf;
4:      for j = 1:nsol do
5:          Pnew = RobMat(i, j);
6:          if f(Pnew) < f(Pbest) then
7:              Pbest = Pnew;
8:          end
9:      end
10:     PEbest(i) = Pbest;
11: end
12: return PEbest;
Algorithm 6: Path refinement operator
Input: the optimal path for each robot: PEbest, the set of current solutions: RSeq
Output: historical solution: πhistory
1:  πopt = CombinePath(PEbest);
2:  πopt1 = RemoveDuplicates(πopt);
3:  πref = RSeq(randi([1, length(RSeq)]));
4:  MN = FindMissingPoints(πref, πopt1);
5:  for each point z in MN do
6:      bestpos = FindOptimalInsertion(πopt1, z);
7:      πopt1 = InsertPointsToPath(πopt1, z, bestpos);
8:  end
9:  πhistory = πopt1;
10: return πhistory;
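Algorithms 4–6 can be sketched compactly under simplifying assumptions: each solution is encoded as a list of per-robot task lists, the path cost f is plain travel distance from and back to depot 0, the reference solution is taken as the first rather than a random one, and the result is returned as a single merged task sequence instead of per-robot paths.

```python
def path_cost(path, dist):
    """Travel distance of one robot path starting and ending at depot 0."""
    cost, prev = 0.0, 0
    for p in path + [0]:
        cost += dist[prev][p]
        prev = p
    return cost

def historical_constructive(solutions, dist):
    # Algorithm 4: maximum robot count over all historical solutions
    rmax = max(len(sol) for sol in solutions)
    # Algorithm 5: cheapest path per robot index across solutions
    best = []
    for i in range(rmax):
        candidates = [sol[i] for sol in solutions if i < len(sol)]
        best.append(min(candidates, key=lambda p: path_cost(p, dist)))
    # Algorithm 6: combine, deduplicate, then reinsert missing points
    merged, seen = [], set()
    for path in best:
        for point in path:
            if point not in seen:
                seen.add(point)
                merged.append(point)
    reference = solutions[0]          # simplification: first, not random
    missing = {p for path in reference for p in path} - seen
    for z in sorted(missing):
        pos = min(range(len(merged) + 1),
                  key=lambda k: path_cost(merged[:k] + [z] + merged[k:], dist))
        merged.insert(pos, z)
    return merged
```

The final loop mirrors FindOptimalInsertion: each missing point is tried at every position and kept where the cost increase is smallest.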

4.7. Computational Time Complexity of QPIG

In the initialization phase, two problem-specific heuristic strategies are designed: the utility-based initialization strategy and the rescue cost-based initialization strategy. The time complexity of the utility-based initialization strategy is O(n2), and that of the rescue cost-based initialization strategy is also O(n2), where n represents the number of rescue points. In the destruction and reconstruction phase, a competition-oriented destruction-reconstruction mechanism is developed to improve the diversity of the search process. Specifically, the time complexity of the history-based exploration strategy is O(ld), where ld represents the destruction-reconstruction size. The collective intelligence-driven and elite-oriented optimization strategies each have a time complexity of O(ps⋅ld), where ps denotes the population size. Additionally, operations such as merging populations and selecting the top individuals introduce complexities of O(ps) and O(ps·log ps), respectively. Therefore, the overall time complexity of this mechanism is O(ps⋅ld + ps·log ps). In the local search phase, each of the six neighborhood operators (critical path swap, critical path and non-critical path swap, critical path insertion, critical path and non-critical path insertion, path reverse, and path right shift) has a time complexity of O(n). Additionally, operations such as Q-value updates, reward calculations, and state management are performed. Therefore, the overall time complexity of the Q-learning-based local search strategy is O(n⋅G), where G is the number of iterations in this strategy. The time complexity of the historical information-based constructive strategy is O(n2). In summary, the time complexity of the QPIG algorithm is O(n2).

5. Simulation Results and Analysis

5.1. Experimental Instances

To evaluate the effectiveness of the QPIG algorithm, we conducted experiments in MATLAB R2020a on a computer with an Intel Core i7 2.10 GHz processor and 16 GB of RAM. The experiments used geographic information from Yantai, a city in Shandong Province, as the real-world background. The test data were sourced from the Solomon dataset [41], which contains three types of instances: random, clustered, and semi-clustered. The extended instances are consistent with the Solomon dataset in terms of coordinates, demand, service time, and time window, and an initial task utility is generated randomly for each rescue point. To distinguish these extended instances, we added the prefix “T” to the instance names, indicating that they are based on the Solomon dataset and extended with task utility. Taking TC101 as an example, the distribution of rescue centers and rescue points is shown in Figure 5. The red cross denotes the rescue center, and the red dots represent the rescue points. To guarantee fairness, identical stopping criteria and a consistent maximum CPU time of 100 s are applied to all algorithms. The effectiveness of the algorithms is evaluated by averaging 30 independent runs for each instance.

5.2. Weight Coefficient Setting

The metrics in the rescue task differ in their ranges of variation. After preliminary experiments, all metrics are normalized using min–max normalization to ensure comparability. The normalization equation is expressed as follows:
$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
where x and xnorm denote the original and normalized values, respectively, and xmin and xmax are the minimum and maximum values of the corresponding metric. Additionally, the weights are determined using a least squares fitting approach based on reference solutions [42], thereby reflecting the relative importance of each indicator. The weight coefficients are computed as follows: wl = 0.42, wn = 0.62, wt = 0.09, and wr = 0.12.
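The normalization and the fitted weights can be combined into a single scalar cost, sketched below. The dictionary keys and the `bounds` argument are illustrative, and treating every metric as minimized after normalization is a simplifying assumption.

```python
def min_max(x, x_min, x_max):
    """Min-max normalization; zero-range metrics map to 0 (assumption)."""
    return (x - x_min) / (x_max - x_min) if x_max > x_min else 0.0

# weight coefficients fitted by least squares in this section
W = {"length": 0.42, "robots": 0.62, "time": 0.09, "utility": 0.12}

def rescue_cost(metrics, bounds):
    """Weighted sum of normalized metrics; bounds maps each metric
    to its (min, max) pair observed on the instance."""
    return sum(w * min_max(metrics[k], *bounds[k]) for k, w in W.items())
```

With every metric at its maximum, the cost equals the sum of the weights (1.25), which is the upper bound of this aggregate.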

5.3. Parameter Settings and Sensitivity Analysis

The algorithm’s performance is greatly influenced by the appropriate adjustment of its parameters. In this work, four parameters are considered: population size (ps), the number of removed tasks in the destruction and reconstruction phase (ld), learning rate (α), and discount rate (γ). Based on our preliminary experiments on instance TC101, the parameter levels are presented in Table 3. Similarly to [43,44], the design-of-experiments (DOE) method is employed to assess how these four parameters influence the performance of the algorithm. The parameter combinations are determined using the orthogonal array L16(4^4), as shown in Table 4. To ensure statistical reliability, QPIG is run independently 30 times for each parameter combination, and the average rescue cost is collected as the response variable (RV). Figure 6 illustrates the trend of the factor levels for the parameters. Moreover, Table 5 provides the average response value and ranking of each parameter. It can be observed that (1) these four parameters have a significant impact on the performance of the proposed QPIG; (2) the parameter ld exerts the most significant influence on the algorithm, followed by ps, γ, and α; and (3) the parameter combination ps = 20, ld = 10, α = 0.3, and γ = 0.7 achieves the best performance of the algorithm.
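The per-parameter averages and rankings in Table 5 follow from a standard range analysis of the orthogonal design; a pure-Python sketch with toy design rows and response values (both illustrative) is given below.

```python
def level_means(design, responses, factor):
    """Mean response variable (RV) per level of one factor."""
    sums, counts = {}, {}
    for row, rv in zip(design, responses):
        lvl = row[factor]
        sums[lvl] = sums.get(lvl, 0.0) + rv
        counts[lvl] = counts.get(lvl, 0) + 1
    return {lvl: sums[lvl] / counts[lvl] for lvl in sums}

def factor_range(design, responses, factor):
    """Range of level means; a larger range means a more influential parameter."""
    means = level_means(design, responses, factor).values()
    return max(means) - min(means)
```

Ranking the factors by `factor_range` reproduces the kind of ordering reported in Table 5 (ld first, then ps, γ, α), and the best level of each factor is the one with the lowest mean RV.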

5.4. Performance of the Developed Components

This study presents three key components: two problem-specific heuristic schemes, the competition-oriented destruction-reconstruction mechanism, and the historical information-based constructive strategy. To evaluate their performance, we investigated four algorithms: the first is QPIG; the second does not use the two problem-specific heuristic schemes (QPIG-NI) and instead employs a random approach to establish the initial population, with all other components unchanged; the third does not use the competition-oriented destruction-reconstruction mechanism (QPIG-ND); and the fourth does not use the historical information-based constructive strategy (QPIG-NH). All algorithms are independently run 30 times on 56 instances. Performance is quantified using the relative percentage increase (RPI):
$$RPI = \frac{f_c - f_{best}}{f_{best}} \times 100$$
where fbest is the best value found among all compared algorithms, and fc is the best value obtained by the algorithm under consideration.
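For one instance, the RPI of every algorithm can be computed in a single pass; the dictionary encoding below is an illustrative assumption.

```python
def rpi_table(results):
    """results maps algorithm name -> best value it found on one instance;
    returns the RPI of each algorithm relative to the overall best."""
    f_best = min(results.values())
    return {alg: (v - f_best) / f_best * 100.0 for alg, v in results.items()}
```

By construction, the algorithm that attains fbest gets RPI = 0, and every other entry measures its percentage gap to that best value.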
Table 6 lists the results of all compared algorithms. The first column gives the instance name, the second column reports the best solution found across the four algorithms, and the next four columns present the experimental results of QPIG, QPIG-NI, QPIG-ND, and QPIG-NH together with their corresponding RPI values. The comparison results indicate that (1) among the 56 large-scale instances, the QPIG algorithm obtained 49 optimal solutions, whereas the other three algorithms achieved only seven optimal solutions in total. Specifically, (a) in the TC and TR2 instance series, QPIG obtained all optimal solutions. In these test instances, the rescue point distribution in the TC series is clustered, while in the TR2 series, it is randomly distributed with relatively wide time windows. (b) In the TR1 instance series, the QPIG algorithm obtained ten optimal solutions, while QPIG-NI obtained two. In the TR1 series, the rescue point distribution is random, and the time windows are relatively narrow. The reason is as follows: although heuristic initialization accelerates convergence, it may lead to local optima. Random initialization allows the algorithm to start from different initial solutions, enabling a broader exploration of the solution space and avoiding premature convergence to local optima. With randomly distributed rescue points and narrow time windows, heuristic initialization may favor certain specific distributions, limiting the exploration of the solution space; random initialization avoids this problem and provides more opportunities for exploration. (c) In the TRC instance series, the QPIG algorithm obtained twelve optimal solutions, QPIG-NI obtained three, and QPIG-NH obtained two. In the TRC series, the rescue point distribution exhibits a mixed pattern of random and clustered distributions.
The reasons for this situation are as follows: excessive reliance on historical information may cause the algorithm to be unable to adapt to new environments, especially in the case of a mixed distribution of rescue points that are both random and clustered. QPIG-NH can dynamically adjust the search strategy based on the current task and environment, better adapting to changing conditions. In mixed distribution scenarios, QPIG-NI starts from multiple random solutions, avoiding the limitations of heuristic initialization, enabling a broader exploration of the solution space, and preventing local optima. (2) The QPIG algorithm achieves an average RPI of 0.15, which is markedly lower than the mean RPI values achieved by the other three algorithms. The average RPI value of QPIG-ND is the highest, indicating that it has the most significant impact on the algorithm’s performance. These data suggest that future improvements should focus on enhancing the destruction-reconstruction component to improve the algorithm’s global search capability.
To determine the differences between these compared algorithms, we use the analysis of variance (ANOVA) method. Figure 7 illustrates the ANOVA results of these four algorithms. Figure 7a displays the results of the ANOVA implemented by QPIG and QPIG-NI. The p-value is 4.76 × 10−10, substantially lower than 0.05, indicating a notable difference between these two algorithms and verifying the effectiveness of the two problem-specific heuristic schemes. The ANOVA between QPIG and QPIG-ND shows a p-value of 1.32 × 10−21, which is much smaller than 0.05, as illustrated in Figure 7b. This indicates that the competition-oriented destruction-reconstruction mechanism is effective. The p-value is 6.94 × 10−5, which is much lower than 0.05, demonstrating a notable difference between QPIG and QPIG-NH and verifying the effectiveness of the historical information-based constructive strategy, as illustrated in Figure 7c. Figure 7d presents the result plot between QPIG components obtained through Tukey’s test within a 95% LSD confidence interval. The data indicate that the competition-oriented destruction-reconstruction mechanism is the most important component of QPIG, followed by the historical information-based constructive strategy and two problem-specific heuristic schemes, with their importance decreasing in order.
To illustrate the impact of the Q-learning-based local search strategy, we designed four algorithms: the first is QPIG; the second does not use a local search strategy (QPIG-NL); the third does not use Q-learning and instead employs a random method to select the six neighborhood operators (QPIG-RS); and the fourth likewise does not use Q-learning and instead employs a probabilistic method to select the six neighborhood operators (QPIG-PS). Each algorithm is run independently 30 times to collect the data. Table 7 presents the comparative results of the QPIG, QPIG-NL, QPIG-RS, and QPIG-PS algorithms. It can be inferred from Table 7 that (1) among the given 56 instances, the QPIG algorithm obtained all optimal solutions; (2) the mean RPI obtained by the QPIG algorithm is 0, which is much lower than that of the other three algorithms. Figure 8 shows the ANOVA results for the QPIG, QPIG-NL, QPIG-RS, and QPIG-PS algorithms. The p-value is 1.24 × 10−22, significantly lower than 0.05, demonstrating the effectiveness of Q-learning in selecting local search operators. This significant improvement is due to the adaptive nature of Q-learning, which dynamically adjusts the local search strategy based on the algorithm’s performance history. Compared to random or probabilistic methods, Q-learning can focus on more promising search directions, thereby enhancing the algorithm’s local search ability.

5.5. Comparison with Five Effective Algorithms

To analyze the performance of the QPIG algorithm in solving the MRRPP problem, five effective algorithms are selected for comparison, namely IIG [8], CDABC [45], MPSO [46], Q_DPIG [28], and QIG [47]. The IIG algorithm designs the feasible-first destruction-construction strategy aimed at broadening the search scope, which is used to solve the helicopter rescue routing problem with time windows and limited survival time. The CDABC algorithm has strong collaborative search capabilities and incorporates single-robot and multi-robot neighborhood operators. It utilizes dynamic strategy lists and the collaborative following-bee strategy to effectively escape local optima. The MPSO algorithm initializes the particle swarm within a predefined search space and iteratively updates to find the global optimal position, determining the key waypoints of the robot’s path, with strong adaptability. The Q_DPIG algorithm uses Q-learning to select the appropriate destruction operator. An efficient and simple method is designed for the local optimization of the critical robot or the robot with the highest load, improving the stability of the solution. The QIG algorithm proposes a new selection mechanism inspired by Q-learning to provide strategic guidance, exhibiting excellent performance in convergence speed and solution quality. All compared algorithm parameters are calibrated, as shown in Table 8. We encoded and tested these five algorithms on the same computer, with consistent testing conditions. For each instance, each algorithm is executed 30 times, with a maximum CPU time of 100 s per execution, and the average rescue cost is recorded.
T: temperature parameter for acceptance criterion; rcal: the experimentally calibrated parameter in the collaborative employed bee stage; λ: maximum iteration threshold for continuous unimproved individuals; ωmax: maximum inertia weight; ωmin: minimum inertia weight; c1max: maximum cognitive parameter; c1min: minimum cognitive parameter; c2max: maximum social parameter; c2min: minimum social parameter; v: particle’s velocity; : greedy selection probability.
Table 9 presents a comparison of the average rescue cost and RPI values across all algorithms. The following points are evident from Table 9: (1) the proposed QPIG achieved 51 optimal values in the 56 instances; (2) the remaining five algorithms obtained only five optimal solutions in total, among which the IIG algorithm obtained four, the QIG algorithm obtained one, and the CDABC, MPSO, and Q_DPIG algorithms did not obtain any. (3) According to the last row of Table 9, the QPIG algorithm has an average RPI of 0.05, which is only 0.63%, 0.59%, 0.75%, 0.63%, and 1.05% of the average RPI values of the IIG, CDABC, MPSO, Q_DPIG, and QIG algorithms, respectively. The ANOVA shows a p-value of 1.29 × 10−25, as shown in Figure 9, which is substantially lower than 0.05, demonstrating notable differences between the proposed QPIG algorithm and the compared algorithms. The data distribution of the QPIG algorithm is relatively concentrated with few outliers, and its performance is stable. This suggests that the QPIG algorithm outperforms the others in solving the MRRPP problem.
The 56 instances are divided into four categories based on coordinates: TC101 to TC109, TC201 to TC208, TR101 to TR112, and TRC101 to TRC208. One instance is randomly selected from each category, and its rescue route is shown in Table 10. The first column indicates the selected instance name, the second column shows the number of robots (Nr), and the last column represents the rescue route. Table 10 illustrates the performance of the QPIG algorithm across various instances. Furthermore, Figure 10 presents the rescue path diagrams of these instances.
To clearly demonstrate the evolution process of all the algorithms, Figure 11 shows the convergence performance of the rescue cost on six selected instances (i.e., TC106, TC201, TR112, TR209, TRC106, and TRC208). All compared algorithms are set with a maximum runtime limit of 100 s. As shown in Figure 11, QPIG achieved the optimal initial value, indicating that the two problem-specific heuristic schemes effectively generate high-quality initial solutions. Additionally, among all the compared algorithms, the QPIG algorithm has a better convergence speed and the lowest total rescue cost, which reflects the effectiveness of the competition-oriented destruction-reconstruction mechanism, Q-learning-based local search strategy, and historical information-based constructive strategy. In summary, the simulation results indicate that QPIG can provide higher-quality solutions when solving the MRRPP problem.

5.6. Statistical Analysis

The Friedman test was employed to analyze the performance differences among IIG, CDABC, MPSO, Q_DPIG, QIG, and QPIG. The null hypothesis assumes no significant performance differences between the algorithms, with a significance level set at α = 0.05. The null hypothesis is rejected when the p-value is less than α. Figure 12 visually presents the significant differences between the algorithms under the two evaluation metrics, RPI and rescue cost. Table 11 provides more detailed data, including rank values, the number of instances (N), mean, standard deviation, minimum, and maximum.
In Table 11 for the RPI metric, the results show that QPIG, IIG, CDABC, MPSO, Q_DPIG, and QIG achieve ranks of 1.16, 4.36, 4.77, 3.64, 4.52, and 2.55, respectively. QPIG demonstrates the best performance with a minimum value of 0, a maximum of 1.49, a standard deviation of 0.21, and a mean of 0.05. Similarly, for the rescue cost metric, the rankings remain consistent, with QPIG achieving the top results, with a minimum of 962.40, a maximum of 2823.61, a standard deviation of 543.84, and a mean of 1515.62. These results highlight QPIG’s outstanding performance in both the RPI and rescue cost metrics, supporting its reliability and superiority in addressing the MRRPP problem.
To further validate the performance distinctions between QPIG and the five compared algorithms, we applied Wilcoxon signed-rank tests to both the RPI and rescue cost metrics. The detailed results, presented in Table 12, indicate statistically significant differences at the α = 0.05 level across all 56 instances, clearly demonstrating the variability in the performance of these algorithms.
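The mean-rank statistic reported in Table 11 can be computed as a plain per-instance ranking, sketched below in pure Python; tie handling is omitted in this sketch, and in practice scipy.stats' friedmanchisquare and wilcoxon would supply the actual p-values.

```python
def mean_ranks(costs):
    """costs maps algorithm name -> list of per-instance rescue costs;
    returns the mean Friedman rank per algorithm (lower cost = rank 1)."""
    algs = list(costs)
    n = len(next(iter(costs.values())))
    total = {a: 0.0 for a in algs}
    for i in range(n):
        ordered = sorted(algs, key=lambda x: costs[x][i])
        for rank, a in enumerate(ordered, start=1):
            total[a] += rank
    return {a: total[a] / n for a in algs}
```

An algorithm that is cheapest on every instance ends with a mean rank of exactly 1, which is the pattern QPIG approaches in Table 11 (rank 1.16).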

6. Conclusions and Future Work

This study presents the QPIG algorithm to solve the MRRPP problem. To obtain a high-quality initial population, two problem-specific heuristic schemes are designed. A competition-oriented destruction-reconstruction mechanism is applied, which includes the history-based exploration strategy, the collective intelligence-driven strategy, and the elite-oriented optimization strategy to enhance global exploration ability. A Q-learning-based local search strategy is implemented to select six operators to improve the algorithm’s exploitation ability. Additionally, a historical information-based constructive strategy is investigated to accelerate the algorithm’s convergence. Finally, experiments on 56 instances are conducted using the QPIG algorithm and the compared algorithms, confirming the effectiveness of QPIG in tackling the MRRPP problem. In addition, we analyzed the convergence curve of the algorithm and found that the proposed QPIG can achieve results close to the optimal solution within 10 to 20 s, with the difference from the optimal value usually no more than 5%. This result indicates that our method not only converges rapidly but also has good stability and efficiency, making it suitable for real-time applications with strict time requirements. Therefore, even as the problem scale or map resolution increases, QPIG can maintain stable computational behavior, converge within a reasonable time, and provide near-optimal solutions, indicating good scalability.
The limitations of this work include the following: (1) The QPIG algorithm does not adequately account for dynamic obstacles in rescue scenarios. (2) The proposed QPIG lacks consideration of the priority of task points, which may lead to neglecting the priority processing of emergency task points. (3) This study does not consider robot malfunctions or damage, which could disrupt the vital search process.
Future research will investigate the following directions: (1) The influence of environmental conditions, particularly the challenges introduced by moving obstacles, will be taken into account. (2) As mentioned in [43], we will address the rescheduling problem when robots encounter damage, ensuring that tasks are reassigned and operations continue efficiently. (3) We will focus on incorporating priority approaches to handle urgent rescue tasks in path planning.

Author Contributions

Conceptualization, M.L. and P.D.; methodology, M.L.; software, M.L.; validation, M.L.; investigation, M.L. and P.D.; writing—original draft preparation, M.L.; writing—review and editing, P.D.; supervision, P.D.; funding acquisition, P.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Open Research Fund of Key Laboratory of Machine Intelligence and System Control, Ministry of Education (MISC-202411), the Natural Science Foundation of Shandong Province, China (ZR2021MD090), and the Discipline with Strong Characteristics of Liaocheng University-Intelligent Science and Technology, China (319462208).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

0	Rescue center
N	Set of the rescue center and rescue points
T	Set of rescue points
R	Set of robots
i, j	Indices of rescue points
k	Index of robots
dij	Distance between rescue points Ti and Tj
wl	Weight coefficient for path length
wn	Weight coefficient for the number of robots
wt	Weight coefficient for rescue time
wr	Weight coefficient for task utility
ai	Start time of the time window for rescue point Ti
bi	End time of the time window for rescue point Ti
dri	Material demand at rescue point Ti
Ckmax	Maximum carrying capacity of robot Rk
Ui	Initial utility of the task at rescue point Ti
μi	Time decay factor affecting the task at rescue point Ti
tik	Service start time of robot Rk at rescue point Ti
τne	Completion time of the last rescue point task
τ1s	Start time of the first rescue point task
Ek	Battery consumption of robot Rk while executing the task at rescue point Ti
di,k	Distance traveled by robot Rk while executing the task at rescue point Ti
Ea	Battery consumption per unit distance traveled by the robot
ξ	Reserve battery capacity
Qk	Total battery capacity of robot Rk

References

  1. Pelton, J.N. Natural disasters. In Space Systems and Sustainability: From Asteroids and Solar Storms to Pandemics and Climate Change; Springer Nature: Berlin/Heidelberg, Germany, 2021; pp. 193–207. [Google Scholar] [CrossRef]
  2. Ransikarbum, K.; Mason, S.J. A bi-objective optimisation of post-disaster relief distribution and short-term network restoration using hybrid NSGA-II algorithm. Int. J. Prod. Res. 2022, 60, 5769–5793. [Google Scholar] [CrossRef]
  3. Surmann, H.; Daun, K.; Schnaubelt, M.; von Stryk, O.; Patchou, M.; Böcker, S.; Wietfeld, C.; Quenzel, J.; Schleich, D.; Behnke, S. Lessons from robot—Assisted disaster response deployments by the German rescue robotics center task force. J. Field Robot. 2024, 41, 782–797. [Google Scholar] [CrossRef]
  4. Martinez-Alpiste, I.; Golcarenarenji, G.; Wang, Q.; Alcaraz-Calero, J.M. Search and rescue operation using UAVs: A case study. Expert Syst. Appl. 2021, 178, 114937. [Google Scholar] [CrossRef]
  5. Al-Hussaini, S.; Gregory, J.M.; Gupta, S.K. Generating task reallocation suggestions to handle contingencies in human-supervised multi-robot missions. IEEE Trans. Autom. Sci. Eng. 2023, 21, 367–381. [Google Scholar] [CrossRef]
  6. Akay, R.; Yildirim, M.Y. Multi-strategy and self-adaptive differential sine–cosine algorithm for multi-robot path planning. Expert Syst. Appl. 2023, 232, 120849. [Google Scholar] [CrossRef]
  7. Lyu, M.; Zhao, Y.; Huang, C.; Huang, H. Unmanned aerial vehicles for search and rescue: A survey. Remote Sens. 2023, 15, 3266. [Google Scholar] [CrossRef]
  8. Cui, X.; Yang, K.; Wang, X.; Duan, P. An improved iterated greedy algorithm for solving collaborative helicopter rescue routing problem with time window and limited survival time. Algorithms 2024, 17, 431. [Google Scholar] [CrossRef]
  9. Lin, S.; Geng, N.; Sun, J.; Zhang, Y. A grouping method based on improved PSO for task allocation in rescue environment. In Proceedings of the 2019 IEEE Congress on Evolutionary Computation (CEC), Wellington, New Zealand, 10–13 June 2019; pp. 619–626. [Google Scholar] [CrossRef]
  10. Geng, N.; Chen, Z.; Nguyen, Q.A.; Gong, D. Particle swarm optimization algorithm for the optimization of rescue task allocation with uncertain time constraints. Complex Intell. Syst. 2021, 7, 873–890. [Google Scholar] [CrossRef]
  11. Huang, J.; Song, Q.; Xu, Z. Multi robot cooperative rescue based on two-stage task allocation algorithm. J. Phys. Conf. Ser. 2022, 2310, 012091. [Google Scholar] [CrossRef]
  12. Wang, X.; Duan, P.; Meng, L.; Yang, K. An improved iterated greedy algorithm for solving rescue robot path planning problem with limited survival time. Comput. Mater. Contin. 2024, 80, 931–947. [Google Scholar] [CrossRef]
  13. Liu, W.; Zhou, Y.; Liu, W.; Qiu, J.; Xie, N.; Chang, X.; Chen, J. A hybrid ACS-VTM algorithm for the vehicle routing problem with simultaneous delivery & pickup and real-time traffic condition. Comput. Ind. Eng. 2021, 162, 107747. [Google Scholar] [CrossRef]
  14. Chen, Y.; Huang, K. Solving Heterogeneous Capacitated Vehicle Routing Problem Assisted with Deep Reinforcement Learning. In Proceedings of the 2025 International Conference on Advanced Machine Learning and Data Science (AMLDS), Tokyo, Japan, 19–21 July 2025; pp. 472–477. [Google Scholar] [CrossRef]
  15. Wang, H.; Chen, W. Multi-robot path planning with due times. IEEE Robot. Autom. Lett. 2022, 7, 4829–4836. [Google Scholar] [CrossRef]
  16. Asfora, B.A.; Banfi, J.; Campbell, M. Mixed-integer linear programming models for multi-robot non-adversarial search. IEEE Robot. Autom. Lett. 2020, 5, 6805–6812. [Google Scholar] [CrossRef]
  17. Velasco, L.; Guerrero, H.; Hospitaler, A. A literature review and critical analysis of metaheuristics recently developed. Arch. Comput. Methods Eng. 2024, 31, 125–146. [Google Scholar] [CrossRef]
  18. Wang, Y.; Wang, Y.; Han, Y.; Gao, K.; Li, J.; Wang, Y. Enhancing distributed blocking flowshop group scheduling: Theoretical insight and application of an iterated greedy algorithm with idle time insertion and rapid evaluation mechanisms. Expert Syst. Appl. 2025, 271, 126600. [Google Scholar] [CrossRef]
  19. Zou, W.; Pan, Q.; Tasgetiren, M.F. An effective iterated greedy algorithm for solving a multi-compartment AGV scheduling problem in a matrix manufacturing workshop. Appl. Soft Comput. 2021, 99, 106945. [Google Scholar] [CrossRef]
  20. Liu, J.; Pan, Q.; Li, W.; Wang, B. A novel elite-preserving iterated greedy algorithm with Q-learning for cascaded flowshop joint scheduling problem. Expert Syst. Appl. 2026, 297, 129512. [Google Scholar] [CrossRef]
  21. Wang, Y.; Han, Y.; Wang, Y.; Li, J.; Gao, K.; Liu, Y. An effective two-stage iterated greedy algorithm for distributed flowshop group scheduling problem with setup time. Expert Syst. Appl. 2023, 233, 120909. [Google Scholar] [CrossRef]
  22. Xu, X.; Zhang, L.; Trovati, M.; Palmieri, F.; Asimakopoulou, E.; Johnny, O.; Bessis, N. PERMS: An efficient rescue route planning system in disasters. Appl. Soft Comput. 2021, 111, 107667. [Google Scholar] [CrossRef]
  23. Morin, M.; Abi-Zeid, I.; Quimper, C.-G. Ant colony optimization for path planning in search and rescue operations. Eur. J. Oper. Res. 2023, 305, 53–63. [Google Scholar] [CrossRef]
  24. Wen, H.; Shi, Y.; Wang, S.; Chen, T.; Di, P.; Yang, L. Route planning for UAVs maritime search and rescue considering the targets moving situation. Ocean Eng. 2024, 310, 118623. [Google Scholar] [CrossRef]
  25. Alrayes, F.S.; Dhahbi, S.; Alzahrani, J.S.; Mehanna, A.S.; Al Duhayyim, M.; Motwakel, A.; Yaseen, I.; Atta Abdelmageed, A. Enhanced search-and-rescue optimization-enabled secure route planning scheme for internet of drones environment. Appl. Sci. 2022, 12, 7950. [Google Scholar] [CrossRef]
  26. Liu, K.; Liu, D.; An, K.; Liudandan, L. Efficient multi-objective path planning for complex disaster environments: Hierarchical target filtering and ranking optimization strategies. Intell. Serv. Rob. 2025, 18, 13–25. [Google Scholar] [CrossRef]
  27. Bouamama, S.; Blum, C.; Boukerram, A. A population-based iterated greedy algorithm for the minimum weight vertex cover problem. Appl. Soft Comput. 2012, 12, 1632–1639. [Google Scholar] [CrossRef]
  28. Miao, Z.; Guo, H.; Pan, Q.; Peng, C.; Xu, Z. A reinforcement learning-enhanced multi-objective iterated greedy algorithm for weeding-robot operation scheduling problems. Expert Syst. Appl. 2025, 263, 125760. [Google Scholar] [CrossRef]
  29. Zou, W.; Zou, J.; Sang, H.; Meng, L.; Pan, Q. An effective population-based iterated greedy algorithm for solving the multi-AGV scheduling problem with unloading safety detection. Inf. Sci. 2024, 657, 119949. [Google Scholar] [CrossRef]
  30. Wang, C.; Pan, Q.; Sang, H. The cascaded flowshop joint scheduling problem: A mathematical model and population-based iterated greedy algorithm to minimize total tardiness. Robot. Comput. Integr. Manuf. 2024, 88, 102747. [Google Scholar] [CrossRef]
  31. Zhao, H.; Pan, Q.; Gao, K. A cooperative population-based iterated greedy algorithm for distributed permutation flowshop group scheduling problem. Eng. Appl. Artif. Intell. 2023, 125, 106750. [Google Scholar] [CrossRef]
  32. Wang, Y.; Wang, Y.; Han, Y.; Li, J.; Gao, K. A rapid population-based iterated greedy for distributed blocking group flowshop scheduling with delivery time windows under multiple processing time scenarios. Comput. Ind. Eng. 2025, 202, 110949. [Google Scholar] [CrossRef]
  33. Li, J.; Li, R.; Li, J.; Yu, X.; Xu, Y. A multi-dimensional co-evolutionary algorithm for multi-objective resource-constrained flexible flowshop with robotic transportation. Appl. Soft Comput. 2025, 170, 112689. [Google Scholar] [CrossRef]
  34. Zhao, F.; Zhuang, C.; Wang, L.; Dong, C. An iterative greedy algorithm with Q-learning mechanism for the multiobjective distributed no-idle permutation flowshop scheduling. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 3207–3219. [Google Scholar] [CrossRef]
  35. Ren, Y.; Gao, K.; Fu, Y.; Li, D.; Suganthan, P.N. Ensemble artificial bee colony algorithm with Q-learning for scheduling bi-objective disassembly line. Appl. Soft Comput. 2024, 155, 111415. [Google Scholar] [CrossRef]
  36. Luo, C.; Gong, W.; Ming, F.; Lu, C. A Q-learning memetic algorithm for energy-efficient heterogeneous distributed assembly permutation flowshop scheduling considering priorities. Swarm Evol. Comput. 2024, 85, 101497. [Google Scholar] [CrossRef]
  37. Chen, T.; Miao, Z.; Li, W.; Pan, Q. A learning-based memetic algorithm for a cooperative task allocation problem of multiple unmanned aerial vehicles in smart agriculture. Swarm Evol. Comput. 2024, 91, 101694. [Google Scholar] [CrossRef]
  38. Fang, W.; Liao, Z.; Bai, Y. Improved ACO algorithm fused with improved Q-Learning algorithm for Bessel curve global path planning of search and rescue robots. Robot. Auton. Syst. 2024, 182, 104822. [Google Scholar] [CrossRef]
  39. Zhan, H.; Zhang, Y.; Huang, J.; Song, Y.; Xing, L.; Wu, J.; Gao, Z. A reinforcement learning-based evolutionary algorithm for the unmanned aerial vehicles maritime search and rescue path planning problem considering multiple rescue centers. Memet. Comput. 2024, 16, 373–386. [Google Scholar] [CrossRef]
  40. Li, J.; Du, Y.; Gao, K.; Duan, P.; Gong, D.; Pan, Q.; Suganthan, P.N. A hybrid iterated greedy algorithm for a crane transportation flexible job shop problem. IEEE Trans. Autom. Sci. Eng. 2021, 19, 2153–2170. [Google Scholar] [CrossRef]
41. Li, C.; Zhu, Y.; Lee, K.Y. Route optimization of electric vehicles based on reinsertion genetic algorithm. IEEE Trans. Transp. Electrif. 2023, 9, 3753–3768. [Google Scholar] [CrossRef]
  42. Zhang, J.; Yu, M.; Feng, Q.; Leng, L.; Zhao, Y. Data-Driven robust optimization for solving the heterogeneous vehicle routing problem with customer demand uncertainty. Complexity 2021, 2021, 6634132. [Google Scholar] [CrossRef]
  43. Wang, X.; Zou, W.; Meng, L.; Zhang, B.; Li, J.; Sang, H. Effective metaheuristic and rescheduling strategies for the multi-AGV scheduling problem with sudden failure. Expert Syst. Appl. 2024, 250, 123473. [Google Scholar] [CrossRef]
  44. Duan, P.; Yu, Z.; Gao, K.; Meng, L.; Han, Y.; Ye, F. Solving the multi-objective path planning problem for mobile robot using an improved NSGA-II algorithm. Swarm Evol. Comput. 2024, 87, 101576. [Google Scholar] [CrossRef]
  45. Guo, H.; Miao, Z.; Ji, J.; Pan, Q. An effective collaboration evolutionary algorithm for multi-robot task allocation and scheduling in a smart farm. Knowl. Based Syst. 2024, 289, 111474. [Google Scholar] [CrossRef]
  46. Poy, Y.L.; Darmaraju, S.; Kwan, B.-H. Multi-robot path planning using modified particle swarm optimization. In Proceedings of the 2023 IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS), Shah Alam, Malaysia, 17 June 2023; pp. 225–230. [Google Scholar] [CrossRef]
  47. Qin, H.; Han, Y.; Chen, Q.; Wang, L.; Wang, Y.; Li, J.; Liu, Y. Energy-efficient iterative greedy algorithm for the distributed hybrid flow shop scheduling with blocking constraints. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 7, 1442–1457. [Google Scholar] [CrossRef]
Figure 1. The flow chart of the proposed QPIG.
Figure 2. An example of the MRRPP problem.
Figure 3. An example of the solution representation.
Figure 4. Graphical representation of six neighborhood operators.
Figure 5. Locations of the rescue center and rescue points in instance TC101.
Figure 6. Factor level trend of parameters.
Figure 7. ANOVA results of the compared algorithms. (a) ANOVA of QPIG and QPIG-NI; (b) ANOVA of QPIG and QPIG-ND; (c) ANOVA of QPIG and QPIG-NH; (d) Tukey test 95% LSD intervals among QPIG components.
Figure 8. ANOVA comparisons of the QPIG, QPIG-NL, QPIG-RS, and QPIG-PS.
Figure 9. ANOVA comparisons of QPIG, IIG, CDABC, MPSO, Q_DPIG, and QIG.
Figure 10. Rescue paths corresponding to four instances.
Figure 11. Convergence performance for all algorithms across six instances. (a) Instance TC106; (b) instance TC201; (c) instance TR101; (d) instance TR112; (e) instance TRC106; (f) instance TRC208.
Figure 12. Related-samples Friedman’s two-way analysis of variance by ranks.
Table 1. Analysis of related algorithms.

Algorithm | Advantages | Limitations
GSOCI [22] | The initial distribution is enhanced by chaotic initialization and Chebyshev mapping to improve population diversity. | The initial clustering effect directly determines the quality of path planning.
IACO [23] | Proposes an effective restart strategy to achieve diverse search. | The parameter configuration is complex, and the optimal parameter combination varies significantly across scenarios.
IIG [12] | A greedy-based insertion strategy improves the algorithm's exploration capability. | Rescue operations are restricted to a single robot.
Q_DPIG [28] | Selects destruction operators adaptively to evaluate the search condition. | Executes several destruction operations, resulting in high computational complexity.
PBIG [29] | A two-stage destruction strategy avoids local optima; a dual-selection reconstruction strategy obtains better neighborhood solutions. | Prone to getting trapped in local optima on complex problems.
ABC_QL [35] | Two Q-learning-based strategies select high-quality operators. | Converges slowly when the solution space is highly complex.
QLMA [36] | An efficient neighborhood structure guides the population towards better convergence. | Weak global search capability, so it is prone to getting stuck in local optima.
IAC-IQL [38] | A heuristic learning evaluation model dynamically adjusts the learning factors to guide the search path. | The time complexity of the algorithm is not analyzed.
GA-RL [39] | An individual retention mechanism updates the population based on the performance of individuals in the elite depot. | Local search is insufficient, and the solution space is not fully explored.
Table 2. Q-table.

State | a1 | a2 | … | at
S1 | Q(S1, a1) | Q(S1, a2) | … | Q(S1, at)
S2 | Q(S2, a1) | Q(S2, a2) | … | Q(S2, at)
… | … | … | … | …
Sn | Q(Sn, a1) | Q(Sn, a2) | … | Q(Sn, at)
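The Q-table above is maintained with the standard temporal-difference update, Q(s, a) ← Q(s, a) + α(r + γ·max_a′ Q(s′, a′) − Q(s, a)). A minimal sketch follows; the state/action indices and the reward are illustrative placeholders rather than the paper's exact definitions, and α, γ are set near the calibrated levels of the tuning experiments:

```python
def q_update(Q, s, a, reward, s_next, alpha=0.3, gamma=0.7):
    """One TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next])                      # greedy estimate of next state's value
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])
    return Q[s][a]

# Q-table with n states and t actions, initialized to zero (rows = states S1..Sn).
n_states, n_actions = 3, 4
Q = [[0.0] * n_actions for _ in range(n_states)]

# One hypothetical transition: in state S1, action a2 earns reward 1.0.
q_update(Q, s=0, a=1, reward=1.0, s_next=2)         # Q(S1, a2) becomes 0.3
```

The learned entries then bias which local-search operator is applied in a given search state, with untried (state, action) pairs keeping their zero initialization.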
Table 3. Tested values for each parameter level.

Parameter | Level 1 | Level 2 | Level 3 | Level 4
ps | 15 | 20 | 30 | 50
ld | 5 | 10 | 15 | 20
α | 0.1 | 0.2 | 0.3 | 0.4
γ | 0.6 | 0.7 | 0.8 | 0.9
Table 4. Orthogonal array and RV values.

Combination | ps | ld | α | γ | Response Variable
1 | 15 | 5 | 0.1 | 0.6 | 1716.09
2 | 15 | 10 | 0.2 | 0.7 | 1654.36
3 | 15 | 15 | 0.3 | 0.8 | 1634.45
4 | 15 | 20 | 0.4 | 0.9 | 1670.21
5 | 20 | 5 | 0.2 | 0.8 | 1700.05
6 | 20 | 10 | 0.1 | 0.9 | 1630.80
7 | 20 | 15 | 0.4 | 0.6 | 1675.06
8 | 20 | 20 | 0.3 | 0.7 | 1662.97
9 | 30 | 5 | 0.3 | 0.9 | 1723.61
10 | 30 | 10 | 0.4 | 0.8 | 1690.11
11 | 30 | 15 | 0.1 | 0.7 | 1671.90
12 | 30 | 20 | 0.2 | 0.6 | 1699.16
13 | 50 | 5 | 0.4 | 0.7 | 1685.63
14 | 50 | 10 | 0.3 | 0.6 | 1679.17
15 | 50 | 15 | 0.2 | 0.9 | 1692.32
16 | 50 | 20 | 0.1 | 0.8 | 1711.55
Table 5. The average RPI response values and the rank of each parameter.

Level | ps | ld | α | γ
1 | 1669 | 1706 | 1683 | 1692
2 | 1667 | 1664 | 1686 | 1669
3 | 1696 | 1668 | 1675 | 1684
4 | 1692 | 1686 | 1680 | 1679
Delta | 29 | 43 | 11 | 24
Rank | 2 | 1 | 4 | 3
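The level means and Delta values in Table 5 follow mechanically from Table 4: each level mean averages the response values of the four combinations that use that level, and Delta is the spread of the four level means. A short pure-Python sketch (variable names are ours, not the paper's):

```python
# Rows of Table 4: (ps, ld, alpha, gamma, response value).
rows = [
    (15, 5, 0.1, 0.6, 1716.09), (15, 10, 0.2, 0.7, 1654.36),
    (15, 15, 0.3, 0.8, 1634.45), (15, 20, 0.4, 0.9, 1670.21),
    (20, 5, 0.2, 0.8, 1700.05), (20, 10, 0.1, 0.9, 1630.80),
    (20, 15, 0.4, 0.6, 1675.06), (20, 20, 0.3, 0.7, 1662.97),
    (30, 5, 0.3, 0.9, 1723.61), (30, 10, 0.4, 0.8, 1690.11),
    (30, 15, 0.1, 0.7, 1671.90), (30, 20, 0.2, 0.6, 1699.16),
    (50, 5, 0.4, 0.7, 1685.63), (50, 10, 0.3, 0.6, 1679.17),
    (50, 15, 0.2, 0.9, 1692.32), (50, 20, 0.1, 0.8, 1711.55),
]

def level_means(col):
    """Mean RV over the four combinations at each distinct level of one parameter."""
    levels = sorted({r[col] for r in rows})
    return {v: sum(r[4] for r in rows if r[col] == v) / 4 for v in levels}

ps_means = level_means(0)                                   # column 0 is ps
delta = max(ps_means.values()) - min(ps_means.values())     # Delta for ps
```

Rounding `ps_means` reproduces the ps column of Table 5 (1669, 1667, 1696, 1692) and `delta` rounds to the reported 29; the same routine on the other columns yields the ld, α, and γ rows.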
Table 6. Comparison results for QPIG, QPIG-NI, QPIG-ND, and QPIG-NH.

Instance | Best Value | QPIG | QPIG-NI | QPIG-ND | QPIG-NH | RPI (QPIG) | RPI (QPIG-NI) | RPI (QPIG-ND) | RPI (QPIG-NH)
TC101 | 1609.70 | 1609.70 | 1645.17 | 1774.19 | 1669.22 | 0.00 | 2.20 | 10.22 | 3.70
TC102 | 1726.05 | 1726.05 | 1746.08 | 1836.37 | 1738.15 | 0.00 | 1.16 | 6.39 | 0.70
TC103 | 1665.16 | 1665.16 | 1671.13 | 1758.55 | 1666.70 | 0.00 | 0.36 | 5.61 | 0.09
TC104 | 1590.13 | 1590.13 | 1599.36 | 1716.51 | 1603.89 | 0.00 | 0.58 | 7.95 | 0.87
TC105 | 1649.47 | 1649.47 | 1680.00 | 1773.87 | 1670.68 | 0.00 | 1.85 | 7.54 | 1.29
TC106 | 1595.79 | 1595.79 | 1621.46 | 1642.68 | 1600.00 | 0.00 | 1.61 | 2.94 | 0.26
TC107 | 1604.30 | 1604.30 | 1615.43 | 1749.23 | 1613.98 | 0.00 | 0.69 | 9.03 | 0.60
TC108 | 1561.28 | 1561.28 | 1580.57 | 1667.09 | 1567.56 | 0.00 | 1.24 | 6.78 | 0.40
TC109 | 1520.16 | 1520.16 | 1544.04 | 1616.34 | 1526.73 | 0.00 | 1.57 | 6.33 | 0.43
TC201 | 2823.61 | 2823.61 | 2836.77 | 2921.51 | 2839.42 | 0.00 | 0.47 | 3.47 | 0.56
TC202 | 2782.36 | 2782.36 | 2788.31 | 2862.43 | 2783.06 | 0.00 | 0.21 | 2.88 | 0.03
TC203 | 2779.81 | 2779.81 | 2794.54 | 2838.33 | 2780.00 | 0.00 | 0.53 | 2.11 | 0.01
TC204 | 2797.82 | 2797.82 | 2836.66 | 2860.28 | 2815.44 | 0.00 | 1.39 | 2.23 | 0.63
TC205 | 2804.19 | 2804.19 | 2815.23 | 2860.57 | 2807.40 | 0.00 | 0.39 | 2.01 | 0.11
TC206 | 2608.94 | 2608.94 | 2659.28 | 2691.27 | 2614.82 | 0.00 | 1.93 | 3.16 | 0.23
TC207 | 2619.96 | 2619.96 | 2650.60 | 2724.46 | 2623.79 | 0.00 | 1.17 | 3.99 | 0.15
TC208 | 2612.20 | 2612.20 | 2639.99 | 2696.85 | 2616.07 | 0.00 | 1.06 | 3.24 | 0.15
TR101 | 1261.10 | 1261.10 | 1262.20 | 1338.54 | 1271.44 | 0.00 | 0.09 | 6.14 | 0.82
TR102 | 1129.82 | 1129.82 | 1135.80 | 1215.47 | 1138.39 | 0.00 | 0.53 | 7.58 | 0.76
TR103 | 1096.06 | 1096.06 | 1097.41 | 1167.82 | 1108.18 | 0.00 | 0.12 | 6.55 | 1.11
TR104 | 1008.83 | 1008.83 | 1010.86 | 1111.04 | 1019.06 | 0.00 | 0.20 | 10.13 | 1.01
TR105 | 1129.40 | 1129.40 | 1133.46 | 1209.42 | 1142.65 | 0.00 | 0.36 | 7.09 | 1.17
TR106 | 1108.09 | 1108.09 | 1110.52 | 1185.31 | 1120.74 | 0.00 | 0.22 | 6.97 | 1.14
TR107 | 1050.06 | 1052.40 | 1050.06 | 1140.52 | 1068.60 | 0.22 | 0.00 | 8.61 | 1.77
TR108 | 972.06 | 972.06 | 973.86 | 1089.26 | 974.60 | 0.00 | 0.19 | 12.06 | 0.26
TR109 | 1021.67 | 1021.67 | 1023.39 | 1134.18 | 1024.08 | 0.00 | 0.17 | 11.01 | 0.24
TR110 | 1031.55 | 1031.55 | 1037.60 | 1153.14 | 1049.50 | 0.00 | 0.59 | 11.79 | 1.74
TR111 | 1035.51 | 1035.51 | 1037.11 | 1126.43 | 1044.26 | 0.00 | 0.16 | 8.78 | 0.85
TR112 | 960.91 | 962.40 | 960.91 | 1107.71 | 968.27 | 0.15 | 0.00 | 15.28 | 0.77
TR201 | 1467.80 | 1467.80 | 1472.24 | 1561.33 | 1478.62 | 0.00 | 0.30 | 6.37 | 0.74
TR202 | 1515.16 | 1515.16 | 1521.19 | 1619.63 | 1526.68 | 0.00 | 0.40 | 6.89 | 0.76
TR203 | 1426.66 | 1426.66 | 1430.03 | 1515.00 | 1437.62 | 0.00 | 0.24 | 6.19 | 0.77
TR204 | 1377.51 | 1377.51 | 1378.42 | 1471.82 | 1407.19 | 0.00 | 0.07 | 6.85 | 2.15
TR205 | 1460.35 | 1460.35 | 1462.50 | 1549.42 | 1466.99 | 0.00 | 0.15 | 6.10 | 0.46
TR206 | 1425.37 | 1425.37 | 1433.72 | 1518.82 | 1437.53 | 0.00 | 0.59 | 6.56 | 0.85
TR207 | 1337.63 | 1337.63 | 1338.79 | 1416.76 | 1344.20 | 0.00 | 0.09 | 5.92 | 0.49
TR208 | 1285.34 | 1285.34 | 1287.70 | 1352.25 | 1293.81 | 0.00 | 0.18 | 5.21 | 0.66
TR209 | 1306.99 | 1306.99 | 1310.11 | 1376.45 | 1309.29 | 0.00 | 0.24 | 5.31 | 0.18
TR210 | 1449.77 | 1449.77 | 1458.20 | 1556.45 | 1460.37 | 0.00 | 0.58 | 7.36 | 0.73
TR211 | 1196.74 | 1196.74 | 1216.65 | 1263.81 | 1200.07 | 0.00 | 1.66 | 5.60 | 0.28
TRC101 | 1169.57 | 1169.57 | 1180.21 | 1331.81 | 1190.16 | 0.00 | 0.91 | 13.87 | 1.76
TRC102 | 1159.09 | 1159.09 | 1165.51 | 1294.60 | 1173.73 | 0.00 | 0.55 | 11.69 | 1.26
TRC103 | 1141.06 | 1141.06 | 1145.89 | 1152.81 | 1147.95 | 0.00 | 0.42 | 1.03 | 0.60
TRC104 | 1040.57 | 1040.57 | 1044.43 | 1302.51 | 1171.26 | 0.00 | 0.37 | 25.17 | 12.56
TRC105 | 1163.66 | 1163.66 | 1163.83 | 1214.35 | 1190.34 | 0.00 | 0.01 | 4.36 | 2.29
TRC106 | 1054.76 | 1059.40 | 1054.76 | 1253.97 | 1076.29 | 0.44 | 0.00 | 18.89 | 2.04
TRC107 | 1026.93 | 1067.67 | 1073.58 | 1204.33 | 1026.93 | 3.97 | 4.54 | 17.27 | 0.00
TRC108 | 1012.76 | 1012.76 | 1013.68 | 1204.33 | 1026.93 | 0.00 | 0.09 | 18.92 | 1.40
TRC201 | 1559.46 | 1559.46 | 1567.41 | 1712.76 | 1574.04 | 0.00 | 0.51 | 9.83 | 0.93
TRC202 | 1507.01 | 1554.15 | 1547.12 | 1624.32 | 1507.01 | 3.13 | 2.66 | 7.78 | 0.00
TRC203 | 1497.82 | 1497.82 | 1508.79 | 1646.90 | 1518.81 | 0.00 | 0.73 | 9.95 | 1.40
TRC204 | 1482.91 | 1482.91 | 1491.86 | 1756.87 | 1618.00 | 0.00 | 0.60 | 18.47 | 9.11
TRC205 | 1586.29 | 1587.36 | 1586.29 | 1605.35 | 1635.51 | 0.07 | 0.00 | 1.20 | 3.10
TRC206 | 1435.68 | 1445.01 | 1435.68 | 1531.42 | 1476.00 | 0.65 | 0.00 | 6.67 | 2.81
TRC207 | 1343.05 | 1343.05 | 1343.83 | 1402.19 | 1398.73 | 0.00 | 0.06 | 4.40 | 4.15
TRC208 | 1181.76 | 1181.76 | 1200.11 | 1402.19 | 1217.19 | 0.00 | 1.55 | 18.65 | 3.00
Mean | 1513.71 | 1515.62 | 1524.83 | 1621.64 | 1531.75 | 0.15 | 0.69 | 8.11 | 1.36
The bold font indicates the minimum values.
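The RPI columns in this and the following tables are consistent with the usual relative-percentage-increase definition over the best value found for an instance. A one-line sketch, verified against the TC101 row of Table 6:

```python
def rpi(cost, best):
    """Relative percentage increase of an algorithm's rescue cost over the best value."""
    return 100.0 * (cost - best) / best

best_tc101 = 1609.70
print(round(rpi(1645.17, best_tc101), 2))  # QPIG-NI on TC101 -> 2.2
print(round(rpi(1774.19, best_tc101), 2))  # QPIG-ND on TC101 -> 10.22
```

An algorithm that attains the best value on an instance thus scores an RPI of 0.00, and the per-algorithm means in the last row average the RPI over all 56 instances.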
Table 7. Comparison results for QPIG, QPIG-NL, QPIG-RS, and QPIG-PS.

Instance | Best Value | QPIG | QPIG-NL | QPIG-RS | QPIG-PS | RPI (QPIG) | RPI (QPIG-NL) | RPI (QPIG-RS) | RPI (QPIG-PS)
TC101 | 1609.70 | 1609.70 | 1644.04 | 1709.19 | 1711.37 | 0.00 | 2.13 | 6.18 | 6.32
TC102 | 1726.05 | 1726.05 | 1765.66 | 1822.97 | 1817.18 | 0.00 | 2.29 | 5.62 | 5.28
TC103 | 1665.16 | 1665.16 | 1799.51 | 1876.54 | 1809.02 | 0.00 | 8.07 | 12.69 | 8.64
TC104 | 1590.13 | 1590.13 | 1823.85 | 1976.83 | 1858.45 | 0.00 | 14.70 | 24.32 | 16.87
TC105 | 1649.47 | 1649.47 | 1717.10 | 1861.16 | 1769.01 | 0.00 | 4.10 | 12.83 | 7.25
TC106 | 1595.79 | 1595.79 | 1714.59 | 1745.00 | 1777.13 | 0.00 | 7.44 | 9.35 | 11.36
TC107 | 1604.30 | 1604.30 | 1751.81 | 1775.09 | 1798.98 | 0.00 | 9.19 | 10.65 | 12.13
TC108 | 1561.28 | 1561.28 | 1782.14 | 1785.05 | 1790.64 | 0.00 | 14.15 | 14.33 | 14.69
TC109 | 1520.16 | 1520.16 | 1762.69 | 1738.91 | 1750.02 | 0.00 | 15.95 | 14.39 | 15.12
TC201 | 2823.61 | 2823.61 | 2851.75 | 2885.25 | 2900.56 | 0.00 | 1.00 | 2.18 | 2.73
TC202 | 2782.36 | 2782.36 | 2829.74 | 2854.45 | 2864.94 | 0.00 | 1.70 | 2.59 | 2.97
TC203 | 2779.81 | 2779.81 | 2834.61 | 2878.16 | 2849.53 | 0.00 | 1.97 | 3.54 | 2.51
TC204 | 2797.82 | 2797.82 | 2865.15 | 2913.52 | 2879.87 | 0.00 | 2.41 | 4.14 | 2.93
TC205 | 2804.19 | 2804.19 | 2829.98 | 2885.01 | 2859.39 | 0.00 | 0.92 | 2.88 | 1.97
TC206 | 2608.94 | 2608.94 | 2647.00 | 2665.41 | 2681.70 | 0.00 | 1.46 | 2.16 | 2.79
TC207 | 2619.96 | 2619.96 | 2667.47 | 2681.99 | 2718.80 | 0.00 | 1.81 | 2.37 | 3.77
TC208 | 2612.20 | 2612.20 | 2651.36 | 2672.35 | 2674.99 | 0.00 | 1.50 | 2.30 | 2.40
TR101 | 1261.10 | 1261.10 | 1265.36 | 1279.88 | 1293.19 | 0.00 | 0.34 | 1.49 | 2.54
TR102 | 1129.82 | 1129.82 | 1144.20 | 1152.49 | 1159.71 | 0.00 | 1.27 | 2.01 | 2.65
TR103 | 1096.06 | 1096.06 | 1107.22 | 1191.67 | 1124.81 | 0.00 | 1.02 | 8.72 | 2.62
TR104 | 1008.83 | 1008.83 | 1034.86 | 1162.30 | 1059.19 | 0.00 | 2.58 | 15.21 | 4.99
TR105 | 1129.40 | 1129.40 | 1139.54 | 1211.67 | 1165.21 | 0.00 | 0.90 | 7.29 | 3.17
TR106 | 1108.09 | 1108.09 | 1116.46 | 1131.47 | 1142.00 | 0.00 | 0.75 | 2.11 | 3.06
TR107 | 1052.40 | 1052.40 | 1065.23 | 1078.72 | 1095.00 | 0.00 | 1.22 | 2.50 | 4.05
TR108 | 972.06 | 972.06 | 982.93 | 1006.20 | 1022.77 | 0.00 | 1.12 | 3.51 | 5.22
TR109 | 1021.67 | 1021.67 | 1031.34 | 1051.54 | 1024.25 | 0.00 | 0.95 | 2.92 | 0.25
TR110 | 1031.55 | 1031.55 | 1049.17 | 1080.21 | 1094.20 | 0.00 | 1.71 | 4.72 | 6.07
TR111 | 1035.51 | 1035.51 | 1045.57 | 1058.79 | 1076.57 | 0.00 | 0.97 | 2.25 | 3.97
TR112 | 962.40 | 962.40 | 979.39 | 1002.32 | 1033.17 | 0.00 | 1.77 | 4.15 | 7.35
TR201 | 1467.80 | 1467.80 | 1512.93 | 1552.74 | 1585.21 | 0.00 | 3.07 | 5.79 | 8.00
TR202 | 1515.16 | 1515.16 | 1565.63 | 1582.00 | 1621.33 | 0.00 | 3.33 | 4.41 | 7.01
TR203 | 1426.66 | 1426.66 | 1460.21 | 1613.33 | 1490.95 | 0.00 | 2.35 | 13.08 | 4.51
TR204 | 1377.51 | 1377.51 | 1409.67 | 1519.02 | 1431.62 | 0.00 | 2.33 | 10.27 | 3.93
TR205 | 1460.35 | 1460.35 | 1515.56 | 1609.91 | 1537.57 | 0.00 | 3.78 | 10.24 | 5.29
TR206 | 1425.37 | 1425.37 | 1474.07 | 1496.83 | 1513.72 | 0.00 | 3.42 | 5.01 | 6.20
TR207 | 1337.63 | 1337.63 | 1362.55 | 1395.94 | 1403.83 | 0.00 | 1.86 | 4.36 | 4.95
TR208 | 1285.34 | 1285.34 | 1307.34 | 1333.17 | 1328.91 | 0.00 | 1.71 | 3.72 | 3.39
TR209 | 1306.99 | 1306.99 | 1319.29 | 1342.50 | 1363.21 | 0.00 | 0.94 | 2.72 | 4.30
TR210 | 1449.77 | 1449.77 | 1471.81 | 1499.34 | 1505.00 | 0.00 | 1.52 | 3.42 | 3.81
TR211 | 1196.74 | 1196.74 | 1218.24 | 1238.91 | 1251.96 | 0.00 | 1.80 | 3.52 | 4.61
TRC101 | 1169.57 | 1169.57 | 1185.76 | 1217.83 | 1227.30 | 0.00 | 1.38 | 4.13 | 4.94
TRC102 | 1159.09 | 1159.09 | 1182.30 | 1202.53 | 1204.94 | 0.00 | 2.00 | 3.75 | 3.96
TRC103 | 1141.06 | 1141.06 | 1177.50 | 1353.37 | 1201.76 | 0.00 | 3.19 | 18.61 | 5.32
TRC104 | 1040.57 | 1040.57 | 1054.51 | 1227.28 | 1078.06 | 0.00 | 1.34 | 17.94 | 3.60
TRC105 | 1163.66 | 1163.66 | 1166.80 | 1244.01 | 1204.98 | 0.00 | 0.27 | 6.90 | 3.55
TRC106 | 1059.40 | 1059.40 | 1070.43 | 1105.15 | 1116.09 | 0.00 | 1.04 | 4.32 | 5.35
TRC107 | 1067.67 | 1067.67 | 1101.57 | 1135.60 | 1142.37 | 0.00 | 3.17 | 6.36 | 7.00
TRC108 | 1012.76 | 1012.76 | 1045.70 | 1067.90 | 1108.69 | 0.00 | 3.25 | 5.44 | 9.47
TRC201 | 1559.46 | 1559.46 | 1617.68 | 1671.01 | 1678.22 | 0.00 | 3.73 | 7.15 | 7.61
TRC202 | 1554.15 | 1554.15 | 1593.70 | 1674.47 | 1645.89 | 0.00 | 2.55 | 7.74 | 5.90
TRC203 | 1497.82 | 1497.82 | 1557.29 | 1770.06 | 1576.40 | 0.00 | 3.97 | 18.18 | 5.25
TRC204 | 1482.91 | 1482.91 | 1522.99 | 1739.00 | 1570.26 | 0.00 | 2.70 | 17.27 | 5.89
TRC205 | 1587.36 | 1587.36 | 1647.44 | 1775.65 | 1693.98 | 0.00 | 3.78 | 11.86 | 6.72
TRC206 | 1445.01 | 1445.01 | 1473.30 | 1519.19 | 1547.31 | 0.00 | 1.96 | 5.13 | 7.08
TRC207 | 1343.05 | 1343.05 | 1397.26 | 1427.21 | 1437.16 | 0.00 | 4.04 | 6.27 | 7.01
TRC208 | 1181.76 | 1181.76 | 1226.22 | 1273.42 | 1285.61 | 0.00 | 3.76 | 7.76 | 8.79
Mean | 1515.62 | 1515.62 | 1563.17 | 1620.03 | 1599.18 | 0.00 | 3.10 | 7.23 | 5.70
The bold font indicates the minimum values.
Table 8. The parameters for the compared algorithms.

Algorithm | Parameters
IIG | ld = 0.2, T = 0.4
CDABC | ps = 130, rcal = 0, λ = 50
MPSO | ps = 100, ωmax = 0.95, ωmin = 0.4, c1max = 2.0, c1min = 0.5, c2max = 2.0, c2min = 0.5, v = [2, 8]
Q_DPIG | ps = 15, ld = 13, α = 0.6, γ = 0.2, = 0.7
QIG | α = 0.6, γ = 0.8, ld = 3
Table 9. Comparison performance of QPIG with five algorithms.

Instance | Best Value | QPIG | IIG | CDABC | MPSO | Q_DPIG | QIG | RPI (QPIG) | RPI (IIG) | RPI (CDABC) | RPI (MPSO) | RPI (Q_DPIG) | RPI (QIG)
TC101 | 1609.70 | 1609.70 | 1652.79 | 1704.47 | 1631.47 | 1719.96 | 1656.36 | 0.00 | 2.68 | 5.89 | 1.35 | 6.85 | 2.90
TC102 | 1726.05 | 1726.05 | 1770.02 | 1919.69 | 1802.28 | 1833.75 | 1746.71 | 0.00 | 2.55 | 11.22 | 4.42 | 6.24 | 1.20
TC103 | 1665.16 | 1665.16 | 1824.25 | 1893.21 | 1811.05 | 1850.56 | 1740.90 | 0.00 | 9.55 | 13.70 | 8.76 | 11.13 | 4.55
TC104 | 1590.13 | 1590.13 | 1910.70 | 1879.14 | 1852.29 | 1903.80 | 1706.77 | 0.00 | 20.16 | 18.18 | 16.49 | 19.73 | 7.34
TC105 | 1649.47 | 1649.47 | 1791.49 | 1811.98 | 1800.37 | 1801.02 | 1752.36 | 0.00 | 8.61 | 9.85 | 9.15 | 9.19 | 6.24
TC106 | 1595.79 | 1595.79 | 1817.57 | 1797.75 | 1782.39 | 1796.05 | 1723.97 | 0.00 | 13.90 | 12.66 | 11.69 | 12.55 | 8.03
TC107 | 1604.30 | 1604.30 | 1833.31 | 1819.17 | 1824.62 | 1837.09 | 1758.92 | 0.00 | 14.27 | 13.39 | 13.73 | 14.51 | 9.64
TC108 | 1561.28 | 1561.28 | 1872.65 | 1838.50 | 1826.79 | 1839.07 | 1711.75 | 0.00 | 19.94 | 17.76 | 17.01 | 17.79 | 9.64
TC109 | 1520.16 | 1520.16 | 1899.60 | 1767.68 | 1816.86 | 1832.60 | 1733.87 | 0.00 | 24.96 | 16.28 | 19.52 | 20.55 | 14.06
TC201 | 2823.61 | 2823.61 | 3023.53 | 2990.16 | 3007.04 | 2959.29 | 2970.72 | 0.00 | 7.08 | 5.90 | 6.50 | 4.81 | 5.21
TC202 | 2782.36 | 2782.36 | 3015.04 | 2969.01 | 2985.18 | 2912.95 | 2931.22 | 0.00 | 8.36 | 6.71 | 7.29 | 4.69 | 5.35
TC203 | 2779.81 | 2779.81 | 3023.35 | 2987.33 | 2977.88 | 2893.12 | 2933.04 | 0.00 | 8.76 | 7.47 | 7.13 | 4.08 | 5.51
TC204 | 2797.82 | 2797.82 | 3059.34 | 3030.75 | 3025.89 | 2927.20 | 2945.83 | 0.00 | 9.35 | 8.33 | 8.15 | 4.62 | 5.29
TC205 | 2804.19 | 2804.19 | 3025.36 | 3003.30 | 3010.00 | 2914.38 | 2956.37 | 0.00 | 7.89 | 7.10 | 7.34 | 3.93 | 5.43
TC206 | 2608.94 | 2608.94 | 2893.50 | 2900.00 | 2887.10 | 2710.70 | 2789.23 | 0.00 | 10.91 | 11.16 | 10.66 | 3.90 | 6.91
TC207 | 2619.96 | 2619.96 | 2847.94 | 2881.26 | 2863.22 | 2730.10 | 2788.66 | 0.00 | 8.70 | 9.97 | 9.28 | 4.20 | 6.44
TC208 | 2612.20 | 2612.20 | 2885.65 | 2888.43 | 2858.87 | 2701.21 | 2776.85 | 0.00 | 10.47 | 10.57 | 9.44 | 3.41 | 6.30
TR101 | 1242.53 | 1261.10 | 1242.53 | 1282.80 | 1258.36 | 1287.97 | 1247.10 | 1.49 | 0.00 | 3.24 | 1.27 | 3.66 | 0.37
TR102 | 1126.00 | 1129.82 | 1126.00 | 1174.76 | 1139.31 | 1169.70 | 1127.09 | 0.34 | 0.00 | 4.33 | 1.18 | 3.88 | 0.10
TR103 | 1093.83 | 1096.06 | 1093.83 | 1140.50 | 1105.17 | 1124.98 | 1100.72 | 0.20 | 0.00 | 4.27 | 1.04 | 2.85 | 0.63
TR104 | 1008.83 | 1008.83 | 1025.37 | 1064.00 | 1029.75 | 1064.03 | 1022.70 | 0.00 | 1.64 | 5.47 | 2.07 | 5.47 | 1.37
TR105 | 1126.85 | 1129.40 | 1126.85 | 1175.27 | 1151.80 | 1177.02 | 1133.13 | 0.23 | 0.00 | 4.30 | 2.21 | 4.45 | 0.56
TR106 | 1108.09 | 1108.09 | 1109.96 | 1157.13 | 1126.67 | 1140.82 | 1109.06 | 0.00 | 0.17 | 4.43 | 1.68 | 2.95 | 0.09
TR107 | 1051.55 | 1052.40 | 1051.55 | 1095.05 | 1070.00 | 1091.03 | 1059.32 | 0.08 | 0.00 | 4.14 | 1.75 | 3.75 | 0.74
TR108 | 972.06 | 972.06 | 997.06 | 1031.86 | 998.50 | 1021.49 | 989.08 | 0.00 | 2.57 | 6.15 | 2.72 | 5.09 | 1.75
TR109 | 1018.64 | 1021.67 | 1034.90 | 1050.85 | 1043.46 | 1070.01 | 1018.64 | 0.30 | 1.60 | 3.16 | 2.44 | 5.04 | 0.00
TR110 | 1031.55 | 1031.55 | 1065.03 | 1073.14 | 1073.31 | 1117.15 | 1043.77 | 0.00 | 3.25 | 4.03 | 4.05 | 8.30 | 1.18
TR111 | 1035.51 | 1035.51 | 1047.43 | 1066.98 | 1053.23 | 1097.71 | 1037.00 | 0.00 | 1.15 | 3.04 | 1.71 | 6.01 | 0.14
TR112 | 962.40 | 962.40 | 1001.18 | 987.64 | 1007.92 | 1037.27 | 975.32 | 0.00 | 4.03 | 2.62 | 4.73 | 7.78 | 1.34
TR201 | 1467.80 | 1467.80 | 1544.00 | 1542.74 | 1544.67 | 1614.86 | 1524.64 | 0.00 | 5.19 | 5.11 | 5.24 | 10.02 | 3.87
TR202 | 1515.16 | 1515.16 | 1603.30 | 1601.02 | 1591.90 | 1640.28 | 1581.19 | 0.00 | 5.82 | 5.67 | 5.06 | 8.26 | 4.36
TR203 | 1426.66 | 1426.66 | 1501.54 | 1497.50 | 1479.39 | 1502.11 | 1441.09 | 0.00 | 5.25 | 4.97 | 3.70 | 5.29 | 1.01
TR204 | 1377.51 | 1377.51 | 1479.37 | 1460.39 | 1420.49 | 1447.21 | 1436.02 | 0.00 | 7.39 | 6.02 | 3.12 | 5.06 | 4.25
TR205 | 1460.35 | 1460.35 | 1544.32 | 1554.26 | 1541.49 | 1554.47 | 1507.04 | 0.00 | 5.75 | 6.43 | 5.56 | 6.45 | 3.20
TR206 | 1425.37 | 1425.37 | 1531.49 | 1520.63 | 1501.49 | 1542.17 | 1455.20 | 0.00 | 7.44 | 6.68 | 5.34 | 8.19 | 2.09
TR207 | 1337.63 | 1337.63 | 1435.11 | 1428.36 | 1406.43 | 1420.97 | 1407.38 | 0.00 | 7.29 | 6.78 | 5.14 | 6.23 | 5.21
TR208 | 1285.34 | 1285.34 | 1380.08 | 1376.17 | 1325.95 | 1336.69 | 1342.60 | 0.00 | 7.37 | 7.07 | 3.16 | 3.99 | 4.45
TR209 | 1306.99 | 1306.99 | 1402.40 | 1393.78 | 1359.05 | 1370.90 | 1337.90 | 0.00 | 7.30 | 6.64 | 3.98 | 4.89 | 2.36
TR210 | 1449.77 | 1449.77 | 1544.08 | 1513.73 | 1493.69 | 1525.01 | 1477.03 | 0.00 | 6.51 | 4.41 | 3.03 | 5.19 | 1.88
TR211 | 1196.74 | 1196.74 | 1310.09 | 1280.97 | 1251.68 | 1282.70 | 1221.86 | 0.00 | 9.47 | 7.04 | 4.59 | 7.18 | 2.10
TRC101 | 1169.57 | 1169.57 | 1171.07 | 1232.29 | 1190.19 | 1227.34 | 1179.06 | 0.00 | 0.13 | 5.36 | 1.76 | 4.94 | 0.81
TRC102 | 1153.06 | 1159.09 | 1159.51 | 1217.49 | 1162.49 | 1225.45 | 1172.00 | 0.00 | 0.04 | 5.04 | 0.29 | 5.73 | 1.11
TRC103 | 1141.06 | 1141.06 | 1172.13 | 1228.53 | 1167.45 | 1213.30 | 1181.63 | 0.00 | 2.72 | 7.66 | 2.31 | 6.33 | 3.56
TRC104 | 1040.57 | 1040.57 | 1103.10 | 1111.57 | 1070.52 | 1096.12 | 1092.91 | 0.00 | 6.01 | 6.82 | 2.88 | 5.34 | 5.03
TRC105 | 1163.66 | 1163.66 | 1172.51 | 1250.77 | 1167.58 | 1202.88 | 1181.60 | 0.00 | 0.76 | 7.49 | 0.34 | 3.37 | 1.54
TRC106 | 1059.40 | 1059.40 | 1085.22 | 1150.81 | 1091.66 | 1138.33 | 1086.86 | 0.00 | 2.44 | 8.63 | 3.05 | 7.45 | 2.59
TRC107 | 1067.67 | 1067.67 | 1121.66 | 1160.09 | 1126.24 | 1175.64 | 1110.94 | 0.00 | 5.06 | 8.66 | 5.49 | 10.11 | 4.05
TRC108 | 1012.76 | 1012.76 | 1090.56 | 1109.06 | 1070.60 | 1091.82 | 1073.83 | 0.00 | 7.68 | 9.51 | 5.71 | 7.81 | 6.03
TRC201 | 1559.46 | 1559.46 | 1762.85 | 1695.61 | 1736.90 | 1723.93 | 1693.59 | 0.00 | 13.04 | 8.73 | 11.38 | 10.55 | 8.60
TRC202 | 1554.15 | 1554.15 | 1772.80 | 1663.15 | 1718.88 | 1703.24 | 1683.22 | 0.00 | 14.07 | 7.01 | 10.60 | 9.59 | 8.30
TRC203 | 1497.82 | 1497.82 | 1704.26 | 1630.99 | 1652.12 | 1620.51 | 1634.44 | 0.00 | 13.78 | 8.89 | 10.30 | 8.19 | 9.12
TRC204 | 1482.91 | 1482.91 | 1692.69 | 1645.44 | 1615.15 | 1629.58 | 1625.89 | 0.00 | 14.15 | 10.96 | 8.92 | 9.89 | 9.64
TRC205 | 1587.36 | 1587.36 | 1796.36 | 1712.06 | 1763.47 | 1744.58 | 1738.07 | 0.00 | 13.17 | 7.86 | 11.09 | 9.90 | 9.49
TRC206 | 1445.01 | 1445.01 | 1655.54 | 1627.73 | 1639.92 | 1620.91 | 1577.30 | 0.00 | 14.57 | 12.65 | 13.49 | 12.17 | 9.16
TRC207 | 1343.05 | 1343.05 | 1566.92 | 1534.93 | 1514.68 | 1495.29 | 1476.90 | 0.00 | 16.67 | 14.29 | 12.78 | 11.34 | 9.97
TRC208 | 1181.76 | 1181.76 | 1393.17 | 1387.37 | 1306.01 | 1361.51 | 1291.55 | 0.00 | 17.89 | 17.40 | 10.51 | 15.21 | 9.29
Mean | 1514.61 | 1515.62 | 1638.64 | 1641.24 | 1620.19 | 1626.25 | 1589.61 | 0.05 | 7.49 | 8.02 | 6.31 | 7.50 | 4.49
Table 10. Rescue paths for the four selected instances (Nr is the number of robots; 0 denotes the rescue center).

TC105 (Nr = 9):
0→27→26→25→28→29→31→34→33→32→30→0
0→45→43→51→42→44→47→0
0→4→3→5→6→7→8→9→10→11→0
0→52→67→66→65→63→64→68→70→69→2→1→0
0→46→38→36→35→37→39→41→40→0
0→56→54→53→49→48→50→55→0
0→16→17→15→14→13→12→0
0→19→20→18→22→24→23→21→0
0→62→60→59→57→58→61→0

TC206 (Nr = 7):
0→65→53→2→8→10→11→12→9→7→3→0
0→55→61→54→51→57→58→56→67→0
0→49→48→46→39→38→37→40→43→42→41→29→32→33→35→36→34→31→30→0
0→14→19→22→25→27→20→18→15→17→6→5→0
0→13→16→21→4→24→28→26→23→0
0→47→44→50→52→45→0
0→1→69→70→68→66→64→59→60→62→63→0

TR102 (Nr = 7):
0→65→14→5→36→13→37→28→38→15→39→62→4→0
0→64→47→8→49→48→66→26→7→25→21→0
0→53→54→24→16→51→50→55→6→61→27→41→40→1→0
0→3→69→67→68→70→35→29→11→34→33→63→10→0
0→46→2→45→12→32→58→56→57→17→31→42→0
0→20→44→18→52→30→43→0
0→22→59→23→19→60→9→0

TRC207 (Nr = 8):
0→50→1→4→3→5→2→70→0
0→36→61→40→56→51→55→34→68→49→0
0→9→11→31→12→10→6→8→7→0
0→63→45→69→33→39→52→54→38→59→0
0→46→17→18→16→13→14→15→44→0
0→48→41→29→30→28→26→25→27→35→65→67→58→0
0→37→53→62→43→60→66→64→0
0→42→32→23→21→19→20→22→24→47→57→0
Table 11. Friedman test results (α = 0.05) for the RPI and the rescue cost.

Algorithm | Mean Rank | N | RPI Mean | RPI Std. Dev. | RPI Min | RPI Max | Cost Mean | Cost Std. Dev. | Cost Min | Cost Max
QPIG | 1.16 | 56 | 0.05 | 0.21 | 0.00 | 1.49 | 1515.62 | 543.84 | 962.40 | 2823.61
IIG | 4.36 | 56 | 7.49 | 5.96 | 0.00 | 24.96 | 1638.64 | 617.56 | 997.06 | 3059.34
CDABC | 4.77 | 56 | 8.02 | 3.86 | 2.62 | 18.18 | 1641.24 | 601.19 | 987.64 | 3030.75
MPSO | 3.64 | 56 | 6.31 | 4.59 | 0.29 | 19.52 | 1620.19 | 610.65 | 998.50 | 3025.89
Q_DPIG | 4.52 | 56 | 7.50 | 4.10 | 2.85 | 20.55 | 1626.25 | 566.63 | 1021.49 | 2959.29
QIG | 2.55 | 56 | 4.49 | 3.39 | 0.00 | 14.06 | 1589.61 | 590.18 | 975.32 | 2970.72
The bold font indicates the minimum values.
Table 12. Results of the Wilcoxon signed-rank test for the RPI and the rescue cost.

QPIG vs. | RPI p-Value | Significant | Decision | Cost p-Value | Significant | Decision
IIG | 0.00 | Yes | Reject H0 | 0.00 | Yes | Reject H0
CDABC | 0.00 | Yes | Reject H0 | 0.00 | Yes | Reject H0
MPSO | 0.00 | Yes | Reject H0 | 0.00 | Yes | Reject H0
Q_DPIG | 0.00 | Yes | Reject H0 | 0.00 | Yes | Reject H0
QIG | 0.00 | Yes | Reject H0 | 0.00 | Yes | Reject H0
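The mean ranks reported in Table 11 can be sanity-checked with the textbook Friedman statistic for k treatments ranked on N blocks, χ²_F = 12N/(k(k+1)) · Σ r_j² − 3N(k+1), where r_j is algorithm j's mean rank. A pure-Python sketch using only the values from Table 11 (the critical-value constant is the standard χ² quantile, not taken from the paper):

```python
# Mean ranks of the six algorithms over the 56 instances (Table 11).
mean_ranks = {"QPIG": 1.16, "IIG": 4.36, "CDABC": 4.77,
              "MPSO": 3.64, "Q_DPIG": 4.52, "QIG": 2.55}
N, k = 56, 6

# Friedman statistic from mean ranks: 12N/(k(k+1)) * sum(r_j^2) - 3N(k+1).
chi2 = 12 * N / (k * (k + 1)) * sum(r * r for r in mean_ranks.values()) - 3 * N * (k + 1)

CRITICAL_5DF = 11.07  # chi-square critical value, 5 degrees of freedom, alpha = 0.05
print(chi2 > CRITICAL_5DF)  # True: the rank differences are significant
```

The statistic comes out around 156, far beyond the 11.07 critical value, which is consistent with the paired Wilcoxon rejections in Table 12; note also that the reported mean ranks sum to k(k+1)/2 = 21, as a valid ranking must.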
Share and Cite

Li, M.; Duan, P. A Population-Based Iterative Greedy Algorithm for Multi-Robot Rescue Path Planning with Task Utility. Mathematics 2026, 14, 164. https://doi.org/10.3390/math14010164