Article

Cooperative Drone and Water Supply Truck Scheduling for Wildfire Fighting Using Deep Reinforcement Learning

1
College of Field Engineering, Army Engineering University of PLA, Nanjing 210007, China
2
School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China
*
Author to whom correspondence should be addressed.
Drones 2025, 9(7), 464; https://doi.org/10.3390/drones9070464
Submission received: 24 May 2025 / Revised: 28 June 2025 / Accepted: 29 June 2025 / Published: 30 June 2025
(This article belongs to the Special Issue Unmanned Aerial Vehicles for Enhanced Emergency Response)

Abstract

Wildfires often spread rapidly and cause significant casualties and economic losses. Firefighting drones carrying water capsules provide an efficient way for wildfire extinguishing, but their operational capabilities are limited by their payloads. This weakness can be compensated by using ground vehicles to provide mobile water supply. To this end, this paper presents an optimization problem of scheduling multiple drones and water supply trucks for wildfire fighting, which allocates burning subareas to drones, routes drones to perform fire-extinguishing operations in burning subareas and reload water between every two consecutive operations, and routes trucks to provide timely water supply for drones. To solve the problem within the limited emergency response time, we propose a deep reinforcement learning method, which consists of an encoder for embedding the input instance features and a decoder for generating a solution by iteratively predicting the subarea selection decision through attention. Computational results on test instances constructed upon real-world wilderness areas demonstrate the performance advantages of the proposed method over a collection of heuristic and metaheuristic optimization methods.

1. Introduction

Wildfires, such as forest fires and mountain fires, often spread rapidly and cause significant casualties and economic losses. Due to their high speed, flexibility, and accessibility, drones (unmanned aerial vehicles) have been increasingly used in firefighting operations in recent years [1,2,3]. Firefighting drones are typically high-payload drones, which can carry and disperse water or fire-extinguishing compounds to suppress fire in quite an efficient way. There are two common forms of firefighting drones. In the first form, a drone carries a hose that is connected to a ground facility, such that powerful water flow can be directly drawn from the ground water source for firefighting. However, the hose length and the water pressure strictly limit the operational range. In the second form, a drone carries a water capsule or fire-extinguishing bomb(s), which can be considerably more flexible than the first form. Nevertheless, the volume of water or fire-extinguishing compounds a drone can carry at a time is limited, and the drone often needs to fly back and forth between fire spots and water sources.
This study focuses on the second form, where firefighting drones carry water capsules to the fire site for fire-extinguishing. To compensate for its weakness, our approach considers using ground vehicles (water supply trucks) to provide mobile water supply for drones, such that the time consumed on water reloading can be significantly shortened and the efficiency of firefighting can be significantly improved. In this scenario, however, the cooperative scheduling of drones and water supply trucks, which needs to determine drone paths, truck paths, as well as docking locations where drones reload water from trucks, can be a challenging problem.
In this paper, we formulate an optimization problem of scheduling multiple drones and water supply trucks for firefighting in a wilderness area consisting of a subset of ignitable subareas and a subset of water resource subareas, where the fire dynamics is characterized using the wildfire spread model established in our previous work [4]. Each burning subarea will be allocated to a batch of drones for fire-extinguishing, and each drone will select either a truck or a water resource subarea to reload water between every two consecutive burning subareas in its operational sequence, as illustrated in Figure 1. The problem objective is to minimize the completion time of the operation, i.e., the time at which the fire is completely extinguished. To solve the problem within the limited emergency response time, we propose a deep reinforcement learning (DRL) network, which consists of an encoder for embedding the input instance features and a decoder for generating a solution by iteratively predicting the subarea selection decision through attention. We conduct extensive computational experiments on test instances constructed upon three wilderness areas in Zhejiang Province, China, and the results demonstrate the performance advantages of the proposed method over a collection of heuristic and metaheuristic optimization methods. The main contributions of this paper can be summarized as follows:
  • We present an optimization problem of cooperatively scheduling drones and water supply trucks for wildfire fighting, which is increasingly popular but has been rarely studied in the literature.
  • To meet the emergency response requirement, we propose a DRL method, which encodes a roadway network, airway network, environmental features, water supply information, and drone and truck features into high-level embeddings, which are then iteratively decoded to generate sequential decisions for the problem.
  • We demonstrate the performance advantages of the proposed method compared to the state-of-the-art heuristic and metaheuristic optimization methods.
In the remainder of this paper, we review the related work in Section 2, formulate the cooperative drone–truck scheduling problem for wildfire fighting in Section 3, and propose the DRL method in Section 4; Section 5 presents the experimental results, and finally Section 6 concludes with discussions.

2. Related Work

Cooperative scheduling of drones and water supply trucks for firefighting, to our knowledge, has not been reported in the literature. Here, we discuss related work in two aspects: (1) drone scheduling for firefighting; (2) cooperative drone–truck scheduling for other tasks (mainly delivery tasks).

2.1. Drone Scheduling for Firefighting

Early applications of drones in fire management mainly focused on sensing and monitoring tasks [1,5,6]. An early work of Kumar et al. [7] considered fire monitoring and fighting tasks performed by drones based on optimization of respective utility functions; however, they assumed that a drone can carry unlimited fire-suppressing fluid, which never holds in realistic situations. Only in recent years, with the increase in their payloads, have drones been directly used in fire extinguishing tasks. Ghamry et al. [8] developed a particle swarm optimization (PSO) algorithm for assigning drones to fire spots according to their relative distances and then planning each drone path to minimize the travel distance. However, the separation of the two sub-problems might miss the optimal solution to the complete problem. Ausonio et al. [9] proposed a forest firefighting system that used a swarm of drones to generate a continuous flow of extinguishing liquid on the fire front, which was supported by battery replacement and extinguishing liquid refill. Yu et al. [10] used a scheduling algorithm to group drones for wildfire extinguishing, which was tested on cases in New South Wales and East Victoria in Australia. Chen et al. [11] proposed a collaborative multi-drone scheduling framework for integrated sensing and operation in large-scale wildfires, which contained a spatio-temporal confidence-aware assessment model for pinpoint location and a priority graph-instructed scalable scheduler for drone coordination. Tan et al. [12] proposed a warm-up heuristic to schedule a limited number of drones to complete a number of firefighting tasks, but the heuristic cannot be extended to large-scale instances. Zhu et al. [13] proposed an adaptive multiple drone swarm collaborative firefighting strategy, which used a temperature change-driven adaptive step-length search strategy to detect fire spots and employed an emergency bidding algorithm to coordinate multiple drone swarms under limited resources. Also considering the scheduling of multi-swarm drones for forest firefighting, John et al. [14] proposed an information-driven search and divide-and-conquer mitigation control approach, where the local attraction among the swarm members helped the non-detector members reach the fire location faster, and the divide-and-conquer mitigation control ensured a non-overlapping fire sector allocation for all members quenching the fire. In [4], the authors presented a mathematical model for estimating wildfire spread and economic losses simultaneously, which can also help to determine the minimum number of firefighting drones in preparation for wildfire; based on the model, a metaheuristic optimization algorithm was used to schedule a limited number of drones in response to wildfire occurrence to minimize the total expected loss. Zheng et al. [15] studied a problem of cooperatively scheduling inspection drones and deicing drones for power grid deicing in icing disasters; the authors proposed a fully parallelizable evolutionary algorithm combining global search without individual interaction and adaptive local search that uses a fuzzy inference system to determine the operator to be applied on each solution. In [16], a hybrid memetic optimization and DRL method was proposed to cooperatively schedule drones for railway catenary deicing to minimize the total negative effect caused by the freezing events on train operations, where DRL was used to adaptively select the most appropriate neighborhood search operators.

2.2. Cooperative Drone–Truck Scheduling

Drone–truck cooperation has been popularized in logistics to achieve complementary advantages [17]. To solve a collaborative truck–drone routing problem for contactless parcel delivery, Wu et al. [18] proposed an improved variable neighborhood descent that combines the Metropolis acceptance criterion of simulated annealing and tabu search, where the initial solution was generated based on K-means clustering and nearest neighbor. In [19], the authors proposed an encoder-decoder framework combined with reinforcement learning for truck-and-drone coordinated delivery routing, where all final deliveries are completed by drones while the truck acts as a movable charging station and a carrier. Weng et al. [20] considered cooperative truck–drone delivery in a restricted traffic zone, where the truck travels along the outer boundary of the zone to send and receive the drone for delivering the cargo to customers; the problem was solved by an adapted water wave optimization (WWO) algorithm. The truck–drone hybrid delivery model presented by Young Jeong and Lee [21] also used a truck to launch and collect multiple drones: the truck did not visit customers but moved between launch positions, looking for the best location to launch the drones. Liu et al. [22] proposed another truck–drone delivery model with multiple trucks, each equipped with a multi-visit drone; they designed a variable neighborhood search algorithm integrated with simulated annealing to solve the problem by constructing drone and truck routes synchronously. Recently, Lv et al. [23] presented a problem of collaborative human–drone–truck search-and-rescue task scheduling in earthquakes, where drones are used to quickly search for survivors who are expected to be ultimately rescued by human rescuers, while a truck serves as a mobile battery depot for drones, such that the survivors can be rescued as many as possible and as early as possible. 
To solve the problem, the authors proposed a memetic algorithm that combines ecogeography-based optimization (EBO) for global exploration, adaptive local search for improving solution accuracy, and a modified simulated annealing for truck path planning.
The key difference between our study and existing studies on drone–truck cooperation for parcel (goods) delivery is that the water taken by drones from supply trucks is consumed in fire extinguishing, and the water must be released in the appropriate amount at the appropriate time in order to be effective in suppressing the fire. To the best of our knowledge, no existing studies have addressed this kind of problem.

3. Problem Formulation

A wilderness area is divided into a set of subareas based on topographic features (e.g., one subarea of grassland and one subarea of shrubland) and boundaries (e.g., two subareas separated by a river). Among these subareas, two subsets are identified:
  • The first subset of $m$ ignitable subareas (e.g., woodlands, grasslands, and shrublands), denoted by $A = \{A_1, A_2, \dots, A_m\}$.
  • The second subset of $\tilde{m}$ water resource subareas (e.g., pools, lakes, and rivers), denoted by $\tilde{A} = \{A_{m+1}, A_{m+2}, \dots, A_{m+\tilde{m}}\}$.
At the beginning time $t = 0$, one or several fire (ignition) points are observed in $A$. Our previous work [4] has established a wildfire spread model, which can be used to estimate:
  • For each ignitable subarea $A_i$ initially without fire, the expected time $t^{\mathrm{ig}}(i)$ at which $A_i$ will be ignited;
  • For each ignitable subarea $A_i$, the heat release rate $\theta(i, t)$ at any time $t \ge t^{\mathrm{ig}}(i)$.
The problem is to schedule a set $D$ of $n_D$ firefighting drones and a set $W$ of $n_W$ water supply trucks to extinguish the wildfire. The capacity (water volume) of a water capsule carried by a drone is $q$, the capacity of a water supply truck is $Q$, and the capacity of a water resource subarea $A_i$ is $Q_i$. The volume of water needed for extinguishing the fire in each ignitable subarea $A_i$ at time $t$ is proportional to its area $a_i$ and heat release rate $\theta(i, t)$. Nonetheless, a very accurate calculation of water volume for fire extinguishing is unnecessary; what we need to determine is the number of water capsules (i.e., number of drones) for extinguishing the fire in subarea $A_i$ at time $t$, which is estimated as
$$N(i, t) = \begin{cases} \lceil c_1 a_i \theta(i, t) / q \rceil, & \theta(i, t) < \hat{\theta} \\ \lceil c_2 a_i \theta(i, t) / q \rceil, & \theta(i, t) \ge \hat{\theta} \end{cases} \tag{1}$$
where $\lceil \cdot \rceil$ denotes rounding up to the nearest integer, $c_1$ and $c_2$ are two constant coefficients, and $\hat{\theta}$ is a threshold of heat release rate: when the heat release rate reaches the threshold, the fire development enters the full combustion (flashover) stage [24] and the fire extinguishing becomes significantly more difficult. Therefore, $c_2$ is larger than $c_1$.
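As a minimal sketch of the piecewise estimate of $N(i, t)$ above, the following Python function picks the coefficient according to the flashover threshold and rounds up. The function name, argument names, and all numeric values are illustrative assumptions, not parameters reported in this paper.

```python
import math


def drones_required(area, heat_release_rate, capsule_capacity,
                    c1, c2, flashover_threshold):
    """Estimate the number of water capsules (drones) for one subarea.

    Uses coefficient c1 below the flashover threshold and c2 at or above it,
    then rounds the required water volume up to whole capsules.
    """
    if capsule_capacity <= 0:
        raise ValueError("capsule_capacity must be positive")
    coeff = c1 if heat_release_rate < flashover_threshold else c2
    return math.ceil(coeff * area * heat_release_rate / capsule_capacity)


# Hypothetical values chosen only to exercise both branches.
before_flashover = drones_required(100.0, 2.5, 50.0, 0.5, 1.0, 3.0)
after_flashover = drones_required(100.0, 3.0, 50.0, 0.5, 1.0, 3.0)
```

Because $c_2 > c_1$, the same subarea needs more capsules once the heat release rate reaches the threshold, which is why delaying an operation can be costly.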
Without loss of generality, we use $A_0$ to denote the fire station at which the drones and trucks are initially located (which can be easily extended to different initial locations of the drones and trucks). A drone can fly directly from a subarea to another, whereas a truck travels along the road. Therefore, we differentiate the following travel times between any two subareas $A_i$ and $A_{i'}$ ($i, i' \in \{0, 1, \dots, m + \tilde{m}\}$):
  • $\Delta t_D(i, i')$ consumed by a drone carrying a fully loaded water capsule from $A_i$ to $A_{i'}$.
  • $\Delta \bar{t}_D(i, i')$ consumed by a drone carrying an empty water capsule from $A_i$ to $A_{i'}$.
  • $\Delta t_W(i, i')$ consumed by a fully loaded water supply truck from $A_i$ to $A_{i'}$.
  • $\Delta \bar{t}_W(i, i')$ consumed by an empty water supply truck from $A_i$ to $A_{i'}$.
When the water volume carried by a water supply truck is $Q'$, the corresponding travel time from $A_i$ to $A_{i'}$ is estimated as
$$\Delta t_W(i, i', Q') = \Delta \bar{t}_W(i, i') - \big(\Delta \bar{t}_W(i, i') - \Delta t_W(i, i')\big) \frac{Q'}{Q} \tag{2}$$
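The load-dependent truck travel time is a linear interpolation between the empty and fully loaded travel times. A small Python sketch, with hypothetical function and argument names and made-up numbers, makes the two endpoints easy to check:

```python
def truck_travel_time(t_empty, t_full, load, capacity):
    """Interpolate truck travel time between empty and fully loaded values.

    load = 0 returns t_empty, load = capacity returns t_full.
    """
    if capacity <= 0 or not 0 <= load <= capacity:
        raise ValueError("load must lie in [0, capacity] with capacity > 0")
    return t_empty - (t_empty - t_full) * load / capacity


# Hypothetical leg: 10 min empty, 16 min full, truck capacity 8 units.
half_loaded = truck_travel_time(10.0, 16.0, 4.0, 8.0)
```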
The first decision of the problem is a sequence $\mathbf{x} = \{x_1, x_2, \dots, x_m\}$ of the $m$ ignitable subareas, which gives the order in which the firefighting operations are conducted. Let $\Delta t_F$ be the time duration for a drone to release water. For the first subarea $x_1$ in the sequence, the earliest arrival time $t^{\mathrm{arr}}(x_1)$ and the departure time $t^{\mathrm{dep}}(x_1)$ of a drone are given by Equations (3) and (4), respectively:
$$t^{\mathrm{arr}}(x_1) = \Delta t_D(0, x_1) \tag{3}$$
$$t^{\mathrm{dep}}(x_1) = t^{\mathrm{arr}}(x_1) + \Delta t_F \tag{4}$$
The number of drones required for firefighting in $x_1$ (with $x_1 = A_i$) is
$$N(x_1) = N\big(i, t^{\mathrm{arr}}(x_1)\big) \tag{5}$$
Here, we use $D_1(x_1)$ to denote the first batch of $N(x_1)$ drones for firefighting in $x_1$, and use $D_0(x_1) = D \setminus D_1(x_1)$ to denote the set of remaining $n_D - N(x_1)$ available drones in $A_0$. For the second batch $D_2(x_2)$, if the number of drones in $D_0(x_1)$ is sufficient, the drones are all selected from $D_0(x_1)$; otherwise, some drones should be selected from $D_1(x_1)$. Each such drone, denoted by $d$, after completing its current task in $x_1$, should choose either a water resource subarea $A_{\tilde{i}}$ from $\tilde{A}$ or a water supply truck $w$ from the non-empty truck subset $\tilde{W} \subseteq W$ to reload water, such that it can arrive at the target $x_2$ at the earliest time, denoted by $t^{\mathrm{arr}}(d, x_2)$:
$$t^{\mathrm{arr}}(d, x_2) = t^{\mathrm{dep}}(x_1) + \min\Big( \min_{A_{\tilde{i}} \in \tilde{A}} \big( \Delta \bar{t}_D(1, \tilde{i}) + \Delta t_D(\tilde{i}, 2) \big),\; \min_{w \in \tilde{W}} \big( \Delta \bar{t}_D(1, w_d) + \Delta t_D(w_d, 2) \big) \Big) + \Delta t_L \tag{6}$$
where $\Delta t_L$ denotes the time duration for reloading a water capsule.
After determining the second batch $D_2(x_2)$, we update the sets of available drones in $A_0$ and $A_1$ as $D_0(x_2) = D_0(x_1) \setminus D_2(x_2)$ and $D_1(x_2) = D_1(x_1) \setminus D_2(x_2)$, respectively.
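The reload choice can be sketched in a few lines of Python: each candidate (a water resource subarea or a truck docking point) contributes an empty-capsule leg plus a loaded-capsule leg, and the drone takes the cheapest sum. The function name, dictionary layout, candidate labels, and numbers are illustrative assumptions; truck docking points are taken as already determined.

```python
def earliest_arrival(departure, water_legs, truck_legs, reload_duration):
    """Return (arrival_time, chosen_source) for one drone.

    water_legs / truck_legs map a candidate label to a pair
    (empty_leg_time, loaded_leg_time) via that candidate.
    """
    candidates = {**water_legs, **truck_legs}
    if not candidates:
        raise ValueError("no water source available")
    # Pick the candidate with the smallest total detour time.
    choice = min(candidates, key=lambda name: sum(candidates[name]))
    return departure + sum(candidates[choice]) + reload_duration, choice


# Hypothetical example: one lake and one truck docking point.
arrival, source = earliest_arrival(
    departure=20.0,
    water_legs={"lake": (5.0, 7.0)},
    truck_legs={"truck_1": (3.0, 6.0)},
    reload_duration=2.0,
)
```

In this made-up case the truck wins because its docking point shortens both legs, which is the effect the mobile water supply is meant to achieve.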
By analogy, for the $i$-th batch $D_i(x_i)$, if the number of available drones in $D_0(x_{i-1})$ is insufficient (typically, there will be no drone left in $A_0$ after several batches, otherwise the problem is trivial), the drones should be selected from earlier batches $D_1(x_{i-1}), \dots, D_{i-1}(x_{i-1})$; if a drone $d$ is selected from the $i'$-th batch ($i' < i$), it should choose either a water resource subarea or a water supply truck to reload water, such that it can arrive at the target $x_i$ at the earliest time $t^{\mathrm{arr}}(d, x_i)$:
$$t^{\mathrm{arr}}(d, x_i) = t^{\mathrm{dep}}(x_{i'}) + \min\Big( \min_{A_{\tilde{i}} \in \tilde{A}} \big( \Delta \bar{t}_D(i', \tilde{i}) + \Delta t_D(\tilde{i}, i) \big),\; \min_{w \in \tilde{W}} \big( \Delta \bar{t}_D(i', w_d) + \Delta t_D(w_d, i) \big) \Big) + \Delta t_L \tag{7}$$
For each $i$-th batch $D_i(x_i)$, we iteratively select a drone $d$ with the earliest arrival time at $x_i$ among all candidates until the number of drones is sufficient:
$$d = \arg\min_{d' \in \bigcup_{\iota=0}^{i-1} D_\iota(x_\iota)} t^{\mathrm{arr}}(d', x_i) \tag{8}$$
The location of a water resource subarea is fixed, whereas the location of a water supply truck is movable. In Equations (6) and (7), $w_d$ is the docking location of the drone $d$ and truck $w$ for water reloading. It is determined by starting from the truck location $w(t)$ at time $t = 0$ if the truck has never been assigned, or at $t = t_w$ when the truck completes its last water supply task, and searching along all road directions $RD$ towards (closer to) either the current drone location $x_{i'}$ or the target subarea $x_i$, such that $t^{\mathrm{arr}}(d, x_i)$ is minimized, as illustrated in Figure 2. Letting $Q_w(t)$ be the water volume carried by the truck $w$ at time $t$, the problem of docking location selection can be formulated as
$$w_d = \arg\min_{p \in R,\; R \in RD} \Big[ \max\big( t^{\mathrm{dep}}(x_{i'}) + \Delta \bar{t}_D(i', p),\; t_w + \Delta t_W(w(t_w), p, Q_w(t_w)) \big) + \Delta t_D(p, x_i) \Big] \tag{9}$$
which can be optimized efficiently using the Gaussian binary simulated annealing method proposed in [23]. Afterwards, for the selected truck $w$, its water volume is reduced by $q$ for each drone it supplies; whenever its water volume is smaller than $q$, the truck goes to the nearest water resource subarea (including $A_0$) to reload water; in this case, $w$ is removed from the non-empty truck set $\tilde{W}$ and, according to Equation (7), cannot be chosen by drones until the truck has been reloaded.
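To illustrate the structure of the docking objective, the sketch below simply enumerates a finite list of candidate points along the road. This exhaustive search is a stand-in for illustration only; it is not the Gaussian binary simulated annealing method of [23]. The function name, tuple layout, and travel times are hypothetical.

```python
def choose_docking_point(drone_departure, truck_ready_time, candidates):
    """Pick the docking point that minimizes the drone's arrival at its target.

    Each candidate is a tuple:
    (point_label, drone_empty_time, truck_time, drone_loaded_time).
    The meeting time is the later of the drone's and the truck's arrival at
    the point; the loaded flight to the target is added afterwards.
    """
    if not candidates:
        raise ValueError("no candidate docking points")

    def arrival_at_target(candidate):
        _, drone_empty, truck_time, drone_loaded = candidate
        meet = max(drone_departure + drone_empty, truck_ready_time + truck_time)
        return meet + drone_loaded

    best = min(candidates, key=arrival_at_target)
    return best[0], arrival_at_target(best)


# Hypothetical road points: p3 is reached by the truck before the drone
# arrives, so neither vehicle waits there.
point, arrival = choose_docking_point(
    drone_departure=10.0,
    truck_ready_time=0.0,
    candidates=[("p1", 4.0, 20.0, 3.0), ("p2", 6.0, 12.0, 5.0), ("p3", 8.0, 15.0, 2.0)],
)
```

The `max` term captures the waiting time of whichever vehicle reaches the point first, so a point close to the drone is not necessarily the best one if the truck cannot reach it in time.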
After determining the $i$-th batch $D_i(x_i)$, we update the sets of available drones in previous subareas as follows ($0 \le \iota < i$):
$$D_\iota(x_i) = D_\iota(x_{i-1}) \setminus D_i(x_i) \tag{10}$$
Thus, we obtain the earliest arrival time $t^{\mathrm{arr}}(x_i)$ and the departure time $t^{\mathrm{dep}}(x_i)$ of the $i$-th batch of drones for subarea $x_i$ as Equations (11) and (12), which are exactly the beginning time and the end time of the firefighting operation in $x_i$, respectively:
$$t^{\mathrm{arr}}(x_i) = \min_{d \in D_i(x_i)} t^{\mathrm{arr}}(d, x_i) \tag{11}$$
$$t^{\mathrm{dep}}(x_i) = \max_{d \in D_i(x_i)} t^{\mathrm{arr}}(d, x_i) + \Delta t_F \tag{12}$$
Note that a firefighting operation will change the fire spread. At each $t^{\mathrm{dep}}(x_i)$, we remove the fire in subarea $x_i$, and re-invoke the wildfire spread model [4] to estimate the fire spread after $t^{\mathrm{dep}}(x_i)$.
During the above iterative process, if the current subarea $x_i$ is not ignited, we swap $x_i$ and the next ignited subarea $x_{i'}$ and continue the iterative calculations using Equations (7)–(12). If all remaining subareas $x_i, x_{i+1}, \dots, x_m$ are not ignited, the whole operation is completed, and their firefighting ending times $t^{\mathrm{dep}}(\cdot)$ are simply set to 0.
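The batch assembly step can be sketched as follows: given each candidate drone's earliest arrival time at the target subarea, take the required number of earliest drones, then derive the operation's beginning and end times. This is a simplified illustration under the assumption that arrival times are fixed once computed; in the full formulation a truck's remaining water changes as drones are assigned, so arrival times may need recomputing between selections. All names and numbers are hypothetical.

```python
def assemble_batch(arrival_times, n_required, release_duration):
    """Select the n_required earliest drones and time the operation.

    arrival_times maps a drone id to its earliest arrival time at the target.
    Returns (batch, start_time, end_time), where the operation starts when the
    first selected drone arrives and ends when the last one has released water.
    """
    if not 1 <= n_required <= len(arrival_times):
        raise ValueError("n_required must be between 1 and the number of drones")
    ranked = sorted(arrival_times, key=arrival_times.get)
    batch = ranked[:n_required]
    start = min(arrival_times[d] for d in batch)
    end = max(arrival_times[d] for d in batch) + release_duration
    return batch, start, end


# Hypothetical arrival times (minutes) for four candidate drones.
batch, start, end = assemble_batch(
    {"d1": 12.0, "d2": 9.0, "d3": 15.0, "d4": 10.0},
    n_required=3,
    release_duration=1.5,
)
```

Drones left out of the batch (here `d3`) remain available for later subareas, which corresponds to the set-difference update of the available drone sets.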
The problem objective is to minimize the time at which the wildfire is completely extinguished:
$$\min f(\mathbf{x}) = \min \max_{1 \le i \le m} t^{\mathrm{dep}}(x_i) \tag{13}$$
$$\text{s.t.} \quad t^{\mathrm{ig}}_{\mathbf{x}}(x_i) < t, \quad \forall t > t^{\mathrm{dep}}(x_i), \; 0 < i \le m \tag{14}$$
$$\mathbf{x} \in P_m(\{A_1, A_2, \dots, A_m\}) \tag{15}$$
where $t^{\mathrm{ig}}_{\mathbf{x}}(x_i)$ denotes the (last) ignition time in subarea $x_i$ under the fire extinguishing solution $\mathbf{x}$, and $P_m$ denotes the set of all permutations of a given set. Constraint (14) indicates that after a fire extinguishing operation in $x_i$ ends at $t^{\mathrm{dep}}(x_i)$, the subarea should not be reignited. Given a solution $\mathbf{x}$, the flowchart for evaluating the objective function $f(\mathbf{x})$ in Equation (13) is shown in Figure 3.
Note that our problem formulation does not explicitly consider battery consumption of drones. In practice, water supply trucks can also carry batteries (whose weight is significantly smaller than the weight of water): if the battery level of a drone is below a threshold, it can reload water and replace the battery simultaneously. Typically, during an operation, the frequency of battery replacement is significantly lower than that of the water reloading, and hence battery replacement will have a trivial effect on the whole schedule. In addition, trucks can serve as mobile base stations to enhance the communication between drones [25,26,27].

4. Deep Reinforcement Learning for the Problem

The proposed DRL network consists of an encoder and a decoder. The encoder takes a problem instance as the input and learns input features through embedding. The decoder generates a solution to the problem instance by iteratively predicting the subarea selection decision from the current state through attention. The architecture of the learning network is illustrated in Figure 4.

4.1. Encoder

The input to the encoder consists of the following parts:
  • A roadway network $G_W$ used by water supply trucks, which is represented by a weighted adjacency matrix that stores the truck travel time on each edge (roadway segment).
  • An airway network $G_D$ used by drones, also represented by a weighted adjacency matrix that stores the drone travel time on each edge (pair of vertices). The vertices of $G_D$ include not only the subareas, but also the vertices of $G_W$.
  • $T$ environmental feature vectors, where each $V_t^E$ stores the temperature, humidity, wind force, and wind direction at time $t$ during the decision period $T$ ($t = 0, 1, \dots, T - 1$).
  • $m$ ignitable subarea feature vectors, where each $V_i^S$ stores the area, combustible vegetation density, total combustion heat, and initial ignition state (true or false) of an ignitable subarea $A_i$ ($i = 1, 2, \dots, m$).
  • A water volume vector $V_\omega$ that stores the water volume of each water supply subarea $A_{\tilde{i}}$ ($\tilde{i} = 0, m + 1, \dots, m + \tilde{m}$).
  • The number $n_D$ of drones and the number $n_W$ of trucks.
$G_W$ is processed by a graph neural network (GNN) to generate the embedding, which is then concatenated with $n_W$ to form a hidden representation $H_W$. Similarly, $G_D$ is processed by another GNN to generate the embedding, which is concatenated with $n_D$ to form a hidden representation $H_D$. The environmental feature vectors are iteratively processed by a recurrent neural network (RNN), where the $t$-th step takes both $V_t^E$ and the output of the $(t-1)$-th step as the input ($1 \le t < T$); finally, the outputs of all steps are embedded into a hidden representation $H_E$. The subarea feature vectors are also iteratively processed by an RNN to generate a hidden representation $H_S$. $H_W$, $H_D$, $H_E$, and $H_S$, together with the water volume vector $V_\omega$, are concatenated and fed into a convolutional neural network, whose topmost representation $H$ will be decoded by the decoder to construct the solution to the problem.

4.2. Decoder

The decoder performs $m$ steps of decoding, each deciding the subarea $x_i$ to be added to the subarea sequence in the solution ($1 \le i \le m$). At the first step, the subarea embedding $H_S$ and the water volume vector $V_\omega$ are concatenated and fed into an RNN; the RNN output, together with the topmost representation $H$ of the encoder, is sent to an attention module, which produces the probability (normalized by a Softmax function) of each subarea being selected, and the subarea with the maximum probability is selected as $x_1$. At each following $(i+1)$-th step, $H_S$ is reconstructed by excluding the subarea $x_i$ selected at the previous step, and $V_\omega$ is reconstructed by subtracting the water volume used for fire extinguishing in $x_i$; the concatenation of the updated $H_S$ and $V_\omega$, together with the RNN output at the previous step, is re-fed into the RNN, and then the attention module produces the probabilities $p(x_{i+1} = A_\iota \mid x_1, x_2, \dots, x_i)$ ($1 \le \iota \le m$) for selecting $x_{i+1}$. The procedure continues until the solution is complete.
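The iterative selection can be illustrated with a plain-Python sketch of greedy decoding over a masked Softmax. The scoring function here is a hypothetical stand-in for the RNN and attention module (it is not the network described in this paper); only the masking of already selected subareas and the maximum-probability choice are shown.

```python
import math


def masked_softmax(scores, available):
    """Softmax over the available entries; unavailable ones get probability 0."""
    if not any(available):
        raise ValueError("no subarea left to select")
    peak = max(s for s, ok in zip(scores, available) if ok)
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, available)]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(score_fn, m):
    """Build a permutation of m subareas, one maximum-probability pick per step.

    score_fn(selected) returns m raw scores given the partial sequence; it is
    an assumed placeholder for the learned RNN + attention scores.
    """
    selected = []
    available = [True] * m
    for _ in range(m):
        probs = masked_softmax(score_fn(selected), available)
        choice = max(range(m), key=probs.__getitem__)
        selected.append(choice)
        available[choice] = False
    return selected


# Toy scorer with fixed scores, independent of the partial sequence.
order = greedy_decode(lambda selected: [0.1, 2.0, 1.0], 3)
```

Masking guarantees that the output is a valid permutation, so the decoded sequence always satisfies the permutation constraint of the problem.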

4.3. Training Method

The training of the DRL network, parameterized by $\theta$, is to minimize the expected value of objective function (13) of the solution $\mathbf{x}$ generated according to the network policy $p_\theta$ for any problem instance (state) $s$:
$$\mathcal{L}(\theta \mid s) = \mathbb{E}_{\mathbf{x} \sim p_\theta(\cdot \mid s)} \big[ f(\mathbf{x} \mid s) \big] \tag{16}$$
Given a baseline $\mathrm{base}(s)$, the gradient of the loss function is
$$\nabla_\theta \mathcal{L}(\theta \mid s) = \mathbb{E}_{\mathbf{x} \sim p_\theta(\cdot \mid s)} \big[ \big( f(\mathbf{x} \mid s) - \mathrm{base}(s) \big) \nabla_\theta \log p_\theta(\mathbf{x} \mid s) \big] \tag{17}$$
Using Monte Carlo sampling of $B$ instances $\{s_1, s_2, \dots, s_B\}$ from the problem distribution $S$, the gradient can be approximated as
$$\nabla_\theta \mathcal{L}(\theta) = \sum_{j=1}^{B} \big( f(\mathbf{x}_j \mid s_j) - \mathrm{base}(s_j) \big) \nabla_\theta \log p_\theta(\mathbf{x}_j \mid s_j) \tag{18}$$
We employ the policy gradient with rollout baseline algorithm [28] to optimize the network parameters $\theta$ according to (18). The algorithm randomly initializes $\theta$ and uses the best network $\theta^{*}$ found so far as the baseline. At each epoch, the two networks are simultaneously tested on a batch of instances, and $\theta$ is improved using the Adam optimizer [29]; if $\theta$ performs significantly better than $\theta^{*}$ on the batch, $\theta^{*}$ is updated by $\theta$; otherwise, if $\theta^{*}$ is not updated for $\hat{e}$ consecutive epochs, $\theta$ is rolled back to $\theta^{*}$. The pseudo-code of the training method is presented in Algorithm 1. The time complexity of the algorithm is $O(\mathrm{epoch}_{\max} \cdot A_{\mathrm{iter}} \cdot B \cdot D)$, where $A_{\mathrm{iter}}$ is the number of iterations used by Adam in the inner loop, and $D$ is the dimensionality of the data.
Algorithm 1: The policy gradient with rollout baseline algorithm for training the network (presented as an image in the original article).
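As a rough illustration of the batch gradient estimate with a baseline, the sketch below uses a one-step categorical Softmax policy in place of the sequence policy $p_\theta$. For such a policy with logits as parameters, the gradient of the log-probability of the sampled action is the one-hot vector of that action minus the probability vector. This is a toy stand-in under stated assumptions, not the authors' Algorithm 1; the function names and numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient(logits, actions, costs, baselines):
    """Sum over the batch of (cost - baseline) * grad log p(action).

    For a categorical softmax policy, grad log p(a) w.r.t. the logits is
    one_hot(a) - probs. Costs are objective values to be minimized.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for action, cost, base in zip(actions, costs, baselines):
        advantage = cost - base
        for k in range(len(logits)):
            indicator = 1.0 if k == action else 0.0
            grad[k] += advantage * (indicator - probs[k])
    return grad


# One sample whose cost (3.0) is worse than its baseline (1.0).
grad = policy_gradient([0.0, 0.0], actions=[0], costs=[3.0], baselines=[1.0])
```

A gradient descent step on these logits lowers the probability of the action whose cost exceeded the baseline, and a sample whose cost equals the baseline contributes nothing, which is the variance-reduction role of the baseline.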

5. Computational Results

We select three wilderness areas, one belonging to the Hangzhou West Mountain Forest Park and two belonging to the Tianmu Mountain Nature Reserve, all in Zhejiang Province, China. Their subarea information is summarized in Table 1. The function parameters of drones are set based on the specification of the Spider H200 UAV (https://spideruav.com/product/agricultural-drone/agri-drone-h200/, accessed on 15 April 2025). Currently, the first two areas are equipped with 10 drones and the last is equipped with 15 drones; we scale these numbers over a reasonable range to test the performance under different numbers of drones. The number of trucks is set according to the guideline that one truck serves around 8–10 drones. For each wilderness area, an instance of the proposed DRL network is established and trained on a wide set of problem instances generated by setting different initial ignition subareas, different wind and temperature conditions, and different numbers of drones and water supply trucks. The numbers of samples for training the three network instances are 200, 350, and 560, respectively.
For comparison, we implement the following heuristic/metaheuristic permutation optimization methods:
  • Nawaz–Enscore–Ham (NEH) heuristic [30].
  • Suliman heuristic [31] to solve each instance $\mathbf{x}$, and use the better one as the $\mathrm{base}(\mathbf{x})$.
  • Discrete differential evolution (DE) metaheuristic [32].
  • EBO metaheuristic [33].
  • WWO metaheuristic [34] adapted for permutation optimization [35].
  • Variable neighborhood search (VNS) algorithm [36].
  • A memetic algorithm (denoted by Meme) for permutation optimization [37].
For each wilderness area, we choose five test instances with fire scales increasing from small to large; the numbers of drones and trucks increase with the fire scale, as summarized in the last three columns in Table 1. For each of the seven heuristic/metaheuristic algorithms, we record its results after one minute and after three minutes of CPU running time on each instance: for an emergency firefighting operation, the solution time is expected to be within one minute and at most three minutes. The computational environment is a workstation with an Intel Core i9-13900 3.0 GHz CPU, one NVIDIA RTX 4090Ti 32 GB GDDR6X GPU, and 128 GB DDR5 5600 MHz RAM. The operating system is Microsoft Windows 10. The algorithm is implemented with Python 3.8.5 and PyTorch 2.7.
Figure 5, Figure 6 and Figure 7 present the box plots of the results, which show the median (yellow line), average (green triangle), minimum, maximum, first quartile (Q1), and third quartile (Q3) of the objective function values obtained by each comparative method on each test instance over 30 runs. Table 2 presents the CPU time consumed by DRL on each test instance. Given an instance to be solved, the trained DRL network can produce a solution very quickly, typically within 30 s, which is significantly shorter than the time consumed by the heuristic/metaheuristic algorithms.
On the small-size instances A-1 and A-2, DRL always obtains the optimal solutions. Among the seven heuristic/metaheuristic algorithms, on instance A-1, DE, EBO, WWO and Meme also obtain the optimal solution after three minutes, but their results after one minute are worse; on instance A-2, only Meme always obtains the optimal solution after three minutes, but its result after one minute is worse.
On the remaining instances, none of the methods can guarantee the optimal solutions. On instances A-3 and B-1, the results of Meme after three minutes are better than the results of DRL, while the results of Meme after one minute as well as the results of the other six comparative algorithms are worse than the results of DRL. On all other instances, the results of DRL are significantly better than those of all seven comparative algorithms. On the largest-size instance C-5, in terms of median objective function values, the worst solution of NEH needs 695 min to extinguish the fire, the best solutions among the comparative algorithms except DRL need 621 and 520 min after running one and three minutes, respectively, and the solution of DRL needs only 424 min.
With increasing problem instance size, the solution space increases exponentially. The NEH and Suliman heuristics start from a single solution and then iteratively try to improve the solution by heuristic operations such as reinsertion and permutation, which cannot explore the whole solution space effectively. Consequently, except for the smallest instance A-1, the performance of NEH and Suliman is unacceptable on all other instances. By using a population of solutions to simultaneously explore the solution space, the five metaheuristic algorithms are significantly more effective than the two heuristics, and are expected to obtain good or relatively good results when the algorithms converge. Nevertheless, the metaheuristics typically need considerable time to converge, whereas the solution time for emergency decisions such as firefighting scheduling is quite limited [38]. Using one or three minutes, the results obtained by these metaheuristics are still far from satisfactory on large-size instances. Among the five metaheuristics, the memetic algorithm combining global search and neighborhood search exhibits the best performance. VNS places most emphasis on neighborhood search, and hence exhibits relatively good performance on small-size instances, but its performance deteriorates quickly on large-size instances. EBO has a good global exploration ability in early search stages, but its solutions improve slowly due to the insufficient local search ability in late stages. DE also has a good global exploration ability, and it performs better than EBO because its crossover operator is more effective in escaping local optima. WWO balances global and local search using wavelength-based propagation and neighborhood-based breaking; its overall performance is worse only than that of Meme and better than that of the other three metaheuristics.
Unlike the heuristic/metaheuristic algorithms, which explicitly use one individual or a population of individuals to search the solution space, the proposed DRL learns an implicit mapping from the input features of problem instances to high-quality solutions, and uses this mapping to directly construct a solution for a given new instance. Thus, the solution time of DRL is mainly consumed by instance encoding and solution decoding, and it grows linearly or quasi-linearly (rather than exponentially) with the instance size. For small-size instances, DRL learns the mapping from the state space to the solution space sufficiently well and thus can produce optimal or near-optimal solutions. For large-size instances, DRL can also produce acceptable solutions that are significantly better than those of the heuristic/metaheuristic algorithms, which can only explore a small portion of the larger solution space.
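The constructive nature of this approach can be sketched as follows. The code is a simplified, hypothetical illustration of attention-based greedy decoding with masking, not the authors' network: the embeddings are hand-written rather than learned, the state update simply reuses the embedding of the last selected item, and all names (`greedy_decode`, `context`, `embeddings`) are assumptions.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def greedy_decode(
    context: Sequence[float],
    embeddings: Sequence[Sequence[float]],
) -> List[int]:
    """Build a selection order one item per step using masked attention scores."""
    dim = len(context)
    query = list(context)
    visited = set()
    order: List[int] = []
    for _ in range(len(embeddings)):
        # Scaled dot-product scores; already selected items are masked out.
        scores = [
            dot(query, emb) / math.sqrt(dim) if k not in visited else -math.inf
            for k, emb in enumerate(embeddings)
        ]
        # Softmax over the unmasked items (exp(-inf) evaluates to 0.0).
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Greedy choice: take the most probable item.
        choice = max(range(len(probs)), key=probs.__getitem__)
        order.append(choice)
        visited.add(choice)
        # Hypothetical state update: the next query is the chosen embedding.
        query = list(embeddings[choice])
    return order


# Hand-written stand-ins for learned subarea embeddings (assumed values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
context = [1.0, 0.0]
order = greedy_decode(context, embeddings)
print(order)
```

In this sketch the number of decoding steps equals the number of items, and each step is a single scoring pass over the candidates with no iterative search over complete solutions, which is the structural reason a learned constructive policy can return a solution quickly.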

6. Conclusions

This paper studies the problem of cooperatively scheduling drones and trucks for extinguishing dynamically spreading wildfires, where the trucks provide a mobile water supply to support the firefighting operations of the drones. The objective is to minimize the time at which the fire is completely extinguished. To efficiently solve this problem within the limited emergency response time, the proposed DRL method combines GNN and RNN to encode the roadway network, airway network, environmental features, water supply information, and drone and truck features into high-level embeddings, which are then iteratively decoded through RNN and attention to generate sequential decisions. Computational results demonstrate the significant performance advantages of DRL over the selected comparative heuristic and metaheuristic algorithms on test instances constructed from real-world wilderness areas in Zhejiang Province, China.
This study focuses on the scenario where drones carry water capsules for firefighting. Our ongoing work is extending the scenario to include drones carrying fire-extinguishing bombs and drones carrying water hoses, both of which can benefit from using trucks as mobile water resources. Our future work will also consider integrating firefighting operations with rescuing victims, which requires cooperative scheduling of drones, ground vehicles, and human rescuers [39,40,41].

Author Contributions

Conceptualization, L.-Y.B. and Y.-J.Z.; methodology, X.-Y.C. and Y.-J.Z.; software, L.-Y.B.; validation, X.-Y.C. and H.-F.L.; investigation, H.-F.L.; data curation, L.-Y.B.; writing—original draft preparation, L.-Y.B.; writing—review and editing, Y.-J.Z.; visualization, X.-Y.C.; funding acquisition, Y.-J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant number 62372148.

Data Availability Statement

The datasets used in this paper can be downloaded from https://www.compintell.cn/en/dataAndCode.html, accessed on 30 June 2025.

DURC Statement

The current research is limited to the use of drones in fire control, which is beneficial and does not pose a threat to public health or national security. The authors acknowledge the dual-use potential of the research and confirm that all necessary precautions have been taken to prevent potential misuse. As an ethical responsibility, the authors strictly adhere to relevant national and international laws about DURC. The authors advocate for responsible deployment, ethical considerations, regulatory compliance, and transparent reporting to mitigate misuse risks and foster beneficial outcomes.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

DRL: Deep reinforcement learning
PSO: Particle swarm optimization
WWO: Water wave optimization
NEH: Nawaz–Enscore–Ham
DE: Discrete differential evolution
EBO: Ecogeography-based optimization
VNS: Variable neighborhood search
GNN: Graph neural network
RNN: Recurrent neural network

References

  1. Akhloufi, M.A.; Couturier, A.; Castro, N.A. Unmanned Aerial Vehicles for Wildland Fires: Sensing, Perception, Cooperation and Assistance. Drones 2021, 5, 15.
  2. Roldán-Gómez, J.J.; González-Gironda, E.; Barrientos, A. A survey on robotic technologies for forest firefighting: Applying drone swarms to improve firefighters’ efficiency and safety. Appl. Sci. 2021, 11, 363.
  3. Mohd Daud, S.M.S.; Mohd Yusof, M.Y.P.; Heo, C.C.; Khoo, L.S.; Chainchel Singh, M.K.; Mahmood, M.S.; Nawawi, H. Applications of drone in disaster management: A scoping review. Sci. Justice 2022, 62, 30–42.
  4. Wu, R.Y.; Xie, X.C.; Zheng, Y.J. Firefighting drone configuration and scheduling for wildfire based on loss estimation and minimization. Drones 2024, 8, 17.
  5. Jemmali, M.; Loai Kayed, B.M.; Boulila, W.; Amdouni, H.; Alharbi, M.T. Optimizing Forest Fire Prevention: Intelligent Scheduling Algorithms for Drone-Based Surveillance System. Proc. Comput. Sci. 2023, 225, 1562–1571.
  6. Liu, W.; Lyu, S.K.; Liu, T.; Wu, Y.T.; Qin, Z. Multi-Target Optimization Strategy for Unmanned Aerial Vehicle Formation in Forest Fire Monitoring Based on Deep Q-Network algorithm. Drones 2024, 8, 201.
  7. Kumar, M.; Cohen, K.; HomChaudhuri, B. Cooperative Control of Multiple Uninhabited Aerial Vehicles for Monitoring and Fighting Wildfires. J. Aerosp. Comput. Inf. Commun. 2011, 8, 1–16.
  8. Ghamry, K.A.; Kamel, M.A.; Zhang, Y. Multiple UAVs in forest fire fighting mission using particle swarm optimization. In Proceedings of the 2017 International Conference on Unmanned Aircraft Systems (ICUAS), Miami, FL, USA, 13–16 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1404–1409.
  9. Ausonio, E.; Bagnerini, P.; Ghio, M. Drone Swarms in Fire Suppression Activities: A Conceptual Framework. Drones 2021, 5, 17.
  10. Yu, Q.; He, H.; Li, M.; Hou, D.; Zhang, J.; Wang, X. Research on UAV Scheduling Optimization in the Forest Fire. In Lecture Notes on Data Engineering and Communications Technologies, Proceedings of the International Conference Machine Learning and Big Data Analytics for IoT Security and Privacy, Online, 30 October 2021; Macintyre, J., Zhao, J., Ma, X., Eds.; Springer: Cham, Switzerland, 2022; pp. 770–777.
  11. Chen, X.; Xiao, Z.; Cheng, Y.; Hsia, C.C.; Wang, H.; Xu, J.; Xu, S.; Dang, F.; Zhang, X.P.; Liu, Y.; et al. SOScheduler: Toward Proactive and Adaptive Wildfire Suppression via Multi-UAV Collaborative Scheduling. IEEE Internet Things J. 2024, 11, 24858–24871.
  12. Tan, Q.; Wu, N.; Wu, X. Optimization-Based (UAV) Scheduling Model for Wildfire Management. In Lecture Notes in Networks and Systems, Proceedings of the 2nd International Conference Frontiers of Robotics and Software Engineering, Guiyang, China, 14–16 June 2024; Hu, J., Zhang, J., Eds.; Springer: Singapore, 2025; pp. 52–59.
  13. Zhu, P.; Song, R.; Zhang, J.; Xu, Z.; Gou, Y.; Sun, Z.; Shao, Q. Multiple UAV Swarms Collaborative Firefighting Strategy Considering Forest Fire Spread and Resource Constraints. Drones 2025, 9, 17.
  14. John, J.; Harikumar, K.; Senthilnath, J.; Sundaram, S. An Efficient Approach With Dynamic Multiswarm of UAVs for Forest Firefighting. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 2860–2871.
  15. Zheng, Y.J.; Zhang, Z.Y.; Yan, J.Y.; Sheng, W.G. Cooperative UAV Scheduling for Power Grid Deicing Using Fuzzy Learning and Evolutionary Optimization. IEEE Open J. Ind. Appl. 2025, 6, 15–33.
  16. Zheng, Y.J.; Xie, X.C.; Zhang, Z.Y.; Shi, J.T. Deep reinforcement learning assisted memetic scheduling of drones for railway catenary deicing. Swarm Evol. Comput. 2024, 91, 101719.
  17. Zheng, Y.J.; Liu, H.; Zhang, H.; Chen, S. Guest Editorial Introduction to the Special Issue on Intelligent Transportation Systems in Epidemic Areas. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25059–25061.
  18. Wu, G.; Mao, N.; Luo, Q.; Xu, B.; Shi, J.; Suganthan, P.N. Collaborative Truck-Drone Routing for Contactless Parcel Delivery During the Epidemic. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25077–25091.
  19. Wu, G.; Fan, M.; Shi, J.; Feng, Y. Reinforcement Learning Based Truck-and-Drone Coordinated Delivery. IEEE Trans. Artif. Intell. 2023, 4, 754–763.
  20. Weng, Y.Y.; Wu, R.Y.; Zheng, Y.J. Cooperative Truck-Drone Delivery Path Optimization under Urban Traffic Restriction. Drones 2023, 7, 59.
  21. Young Jeong, H.; Lee, S. Drone routing problem with truck: Optimization and quantitative analysis. Expert Syst. Appl. 2023, 227, 120260.
  22. Liu, Y.; Shi, J.; Luo, Z.; Hu, X.; Pedrycz, W.; Liu, Z. Cooperated Truck-Drone Routing With Drone Energy Consumption and Time Windows. IEEE Trans. Intell. Transp. Syst. 2024, 25, 20390–20404.
  23. Lv, K.C.; Zhang, Z.Y.; Bai, L.Y.; Jiang, X.L.; Zheng, Y.J. Memetic Optimization of Collaborative Human-UAV-Truck Search-and-Rescue Task Scheduling in Earthquakes. Unmanned Syst. 2025, 13, 1–21.
  24. Filkov, A.I.; Tihay-Felicelli, V.; Masoudvaziri, N.; Rush, D.; Valencia, A.; Wang, Y.; Blunck, D.L.; Valero, M.M.; Kempna, K.; Smolka, J.; et al. A review of thermal exposure and fire spread mechanisms in large outdoor fires and the built environment. Fire Saf. J. 2023, 140, 103871.
  25. Wang, S.; Zheng, C.; Wandelt, S. Policy Challenges for Coordinated Delivery of Trucks and Drones. J. Air Transp. Res. Soc. 2024, 2, 100001.
  26. Tu, W. Resource-efficient seamless transitions for high-performance multi-hop UAV multicasting. Comput. Netw. 2022, 213, 109051.
  27. Zeng, Y.; Xu, X.; Zhang, R. Trajectory Optimization for Completion Time Minimization in UAV-Enabled Multicasting. arXiv 2017, arXiv:1708.06478.
  28. Kool, W.; van Hoof, H.; Welling, M. Attention, learn to solve routing problems! In Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
  29. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference for Learning Representations, San Diego, CA, USA, 7–9 May 2015.
  30. Nawaz, M.; Enscore, E.E.; Ham, I. A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. Omega 1983, 11, 91–95.
  31. Suliman, S.M.A. A two-phase heuristic approach to the permutation flow-shop scheduling problem. Int. J. Prod. Econom. 2000, 64, 143–152.
  32. de Fátima Morais, M.; Ribeiro, M.H.D.M.; da Silva, R.G.; Mariani, V.C.; dos Santos Coelho, L. Discrete differential evolution metaheuristics for permutation flow shop scheduling problems. Comput. Ind. Eng. 2022, 166, 107956.
  33. Zheng, Y.J.; Ling, H.F.; Xue, J.Y. Ecogeography-Based Optimization: Enhancing Biogeography-Based Optimization with Ecogeographic Barriers and Differentiations. Comput. Oper. Res. 2014, 50, 115–127.
  34. Zheng, Y.J. Water wave optimization: A new nature-inspired metaheuristic. Comput. Oper. Res. 2015, 55, 1–11.
  35. Zheng, Y.J.; Lu, X.Q.; Du, Y.C.; Xue, Y.; Sheng, W.G. Water wave optimization for combinatorial optimization: Design strategies and applications. Appl. Soft Comput. 2019, 83, 105611.
  36. Shao, W.; Shao, Z.; Pi, D. Multi-local search-based general variable neighborhood search for distributed flow shop scheduling in heterogeneous multi-factories. Appl. Soft Comput. 2022, 125, 109138.
  37. Zhao, F.; Hu, X.; Wang, L.; Li, Z. A memetic discrete differential evolution algorithm for the distributed permutation flow shop scheduling problem. Complex Intell. Syst. 2022, 8, 141–161.
  38. Zheng, Y.J.; Chen, S.Y.; Ling, H.F. Evolutionary optimization for disaster relief operations: A survey. Appl. Soft Comput. 2015, 27, 553–566.
  39. Zheng, Y.J.; Du, Y.C.; Sheng, W.G.; Ling, H.F. Collaborative human-UAV search and rescue for missing tourists in nature reserves. INFORMS J. Appl. Analy. 2019, 49, 371–383.
  40. Zheng, Y.; Du, Y.; Ling, H.; Sheng, W.; Chen, S. Evolutionary collaborative human-UAV search for escaped criminals. IEEE Trans. Evol. Comput. 2020, 24, 217–231.
  41. Zheng, Y.J.; Du, Y.C.; Su, Z.L.; Ling, H.F.; Zhang, M.X.; Chen, S.Y. Evolutionary human-UAV cooperation for transmission network restoration. IEEE Trans. Ind. Informat. 2021, 17, 1648–1657.
Figure 1. Illustration of a scenario of cooperative drone and water supply truck scheduling for wildfire fighting.
Figure 2. Illustration of searching a docking location of water supply for a drone from the current batch to the target subarea. The search should be conducted on roadway segments with purple solid line, which consist of points towards (closer to) either the current drone location or the target subarea.
Figure 3. Flowchart for evaluating the objective function given by Equation (13).
Figure 4. Architecture of the DRL network for the cooperative scheduling problem.
Figure 5. Box plots (including median, average, minimum, maximum, Q1, and Q3) of the objective function values (time duration in minutes for fire extinguishing) obtained by the comparative methods on the five test instances in wilderness area A. Any objective function value outside the range $[Q_1 - 1.5(Q_3 - Q_1),\ Q_3 + 1.5(Q_3 - Q_1)]$ is regarded as an outlier.
Figure 6. Box plots (including median, average, minimum, maximum, Q1, and Q3) of the objective function values (time duration in minutes for fire extinguishing) obtained by the comparative methods on the five test instances in wilderness area B. Any objective function value outside the range $[Q_1 - 1.5(Q_3 - Q_1),\ Q_3 + 1.5(Q_3 - Q_1)]$ is regarded as an outlier.
Figure 7. Box plots (including median, average, minimum, maximum, Q1, and Q3) of the objective function values (time duration in minutes for fire extinguishing) obtained by the comparative methods on the five test instances in wilderness area C. Any objective function value outside the range $[Q_1 - 1.5(Q_3 - Q_1),\ Q_3 + 1.5(Q_3 - Q_1)]$ is regarded as an outlier.
Table 1. Main information of the three wilderness areas and the test instances used in the computational tests.
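| Wilderness Area | Num of Ignitable Subareas | Num of Water Supply Subareas | Test Instance | Num of Drones | Num of Trucks |
|---|---|---|---|---|---|
| A | 127 | 16 | A-1 | 6 | 1 |
| A | 127 | 16 | A-2 | 8 | 1 |
| A | 127 | 16 | A-3 | 10 | 2 |
| A | 127 | 16 | A-4 | 12 | 2 |
| A | 127 | 16 | A-5 | 15 | 2 |
| B | 189 | 12 | B-1 | 6 | 1 |
| B | 189 | 12 | B-2 | 8 | 1 |
| B | 189 | 12 | B-3 | 10 | 2 |
| B | 189 | 12 | B-4 | 12 | 2 |
| B | 189 | 12 | B-5 | 15 | 2 |
| C | 265 | 19 | C-1 | 6 | 1 |
| C | 265 | 19 | C-2 | 10 | 2 |
| C | 265 | 19 | C-3 | 15 | 2 |
| C | 265 | 19 | C-4 | 18 | 3 |
| C | 265 | 19 | C-5 | 21 | 3 |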
Table 2. CPU time (in seconds) consumed by DRL on each test instance.
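| Instance | A-1 | A-2 | A-3 | A-4 | A-5 | B-1 | B-2 | B-3 | B-4 | B-5 | C-1 | C-2 | C-3 | C-4 | C-5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Time | 12 | 12 | 14 | 14 | 15 | 15 | 15 | 18 | 19 | 21 | 17 | 21 | 23 | 29 | 32 |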
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bai, L.-Y.; Chen, X.-Y.; Ling, H.-F.; Zheng, Y.-J. Cooperative Drone and Water Supply Truck Scheduling for Wildfire Fighting Using Deep Reinforcement Learning. Drones 2025, 9, 464. https://doi.org/10.3390/drones9070464
