Modeling Wildfire Initial Attack Success Rate Based on Machine Learning in Liangshan, China

Xu, Yiqing; Zhou, Kaiwen; Zhang, Fuquan

doi:10.3390/f14040740


Yiqing Xu ¹, Kaiwen Zhou ² and Fuquan Zhang ²,*

¹ School of Computer and Software, Nanjing Vocational University of Industry Technology, Nanjing 210023, China

² College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China

* Author to whom correspondence should be addressed.

Forests 2023, 14(4), 740; https://doi.org/10.3390/f14040740

Submission received: 14 February 2023 / Revised: 13 March 2023 / Accepted: 31 March 2023 / Published: 4 April 2023

(This article belongs to the Special Issue Advances in Forest Fire and Other Detection Systems)


Abstract

The initial attack is a critical phase of firefighting, in which the first resources are deployed to prevent the spread of a fire. This study aimed to analyze the factors that affect the success of the initial attack. It used three machine learning models (logistic regression, XGBoost, and an artificial neural network) to simulate the initial attack success rate in a specific region. Each model was evaluated by accuracy, AUC (area under the ROC curve), and F1 Score, and the XGBoost model performed best. The study also examined the effect of weather on the initial attack success rate by comparing a normal weather scenario with an extreme weather scenario. These results can help forest fire managers plan resource allocation to improve the success rate of the initial attack in the area.

Keywords:

logistic regression; XGBoost; artificial neural network; spatial distribution

1. Introduction

The basic principle of wildfire fighting is often summarized as: “put out fires early, put out fires while they are small, put out fires completely” [1]. The aim is to control or extinguish a fire in its initial stage and so reduce the economic losses and casualties it causes.

Current fire resource dispatching relies on ground firefighting and aerial firefighting. Ground firefighting uses ground-based suppression resources such as hand tools, water pumps, and hoses to fight the fire directly on the ground. Aerial firefighting uses helicopters or airplanes to drop water or fire retardant on the fire from the air. A well-developed forest road network plays a key role in transporting firefighting resources and shortening the response time of ground firefighting [2,3,4]. In China, firefighting resources are currently dispatched mainly through the road network [5].

The actual wildfire fighting operation is divided into three stages: the initial attack stage, the extended attack stage, and the transition stage. Among these, the initial attack stage plays a crucial role [6]. The initial attack stage refers to the aggressive action taken by the first responding resources to put out the fire, usually within 1–8 h of the start of the fire. The objective is to control or extinguish the fire as quickly as possible, given the fire’s initial scale [7,8].

Previous studies have examined the factors that affect initial attack success and the impact of the initial attack on the environment. Arienti used fuels, anthropogenic linear features, fire weather, and management to build an empirical model of initial attack success [9]. Beverly evaluated the effect of previous fires on containment outcomes to explore whether past wildfires exert a negative feedback on subsequent fires in these ecosystems [10]. Cardil studied the factors that influence firefighting success in Quebec Province, Canada. That work helps forest fire protection agencies better understand their wildland fire suppression systems and adapt to upcoming changes in fire regimes [11]. Tremblay studied the factors affecting the size distribution of wildland fires in Alberta, Canada, including weather, fuels, and firefighting activities [12]. Podur studied the factors influencing the growth and suppression of large forest fires in Ontario, Canada, and developed corresponding simulation models [13]. In general, weather, initial attack delay, fire spread rate, availability of firefighting resources, and arrival time all affect the success rate of the initial attack [14,15].

Suppression resources also affect initial attack success. Collins' research on Australian forest and grass fires indicates that suppression resources are the primary factor in controlling grass and forest fires in Australia [16]. Current research in this area focuses on optimizing resource scheduling to improve the initial attack success rate in a given region. For example, Minas used mixed integer linear programming (MIP) to optimize resource allocation for the initial attack [17]. Mendes proposed an iterated local search (ILS) metaheuristic to configure fire resources for the initial attack [18]. However, these approaches may not suit real-world conditions, in which fire occurrence is largely random. To account for this uncertainty, Ntaimo and Wei divided the initial attack process into two parts: (1) pre-deploying resources at fire stations, and (2) deciding which resources to dispatch after a fire occurs [19,20]. They used a stochastic programming approach to deploy and dispatch initial attack resources [21].

Despite these efforts to optimize initial attack resource scheduling, research that evaluates the initial attack success rate in China is lacking. Reimer improved the burn probability (BP) approach to evaluate the effect of the initial attack [8]. Rashidi proposed an attacker-defender model to analyze the impact of the initial attack on wildfire suppression [15]. Some studies assume that the probability of initial attack (IA) success is a key factor in the overall success of fire suppression. With the rapid development of machine learning, Rodrigues treated the estimation of initial attack success as a binary classification problem and solved it by combining random models with geographical factors [22]. Marshall used a random forest model to identify the variables that control fires within the first 24 h of response, and then modeled suppression success [23].

Optimizing resource scheduling can improve the initial attack success rate of a region, but these strategies are reactive: they address the problem only after a wildfire has been discovered. A comprehensive analysis of the factors that affect the initial attack success rate, together with a map of its spatial distribution in the target area, can further inform resource scheduling.

In this study, we focused on the factors that influence the initial attack success rate. We used machine learning to map the spatial distribution of the success rate in the region under different weather conditions. Four factors were identified: initial attack delay, arrival time, fire spread potential, and wildfire fighting capability. We collected remote sensing data, fire points, weather, and other data. Class imbalance was addressed with SMOTE oversampling, and redundant features were identified with the Pearson correlation coefficient. Each machine learning method has its own strengths in evaluating initial attack success, so we employed several models to assess how the various factors affect the initial attack success rate in the region:

- logistic regression provides interpretable results;
- XGBoost can handle complex datasets and missing values;
- ANNs can capture non-linear relationships between variables and learn from large amounts of data.

These three models were used to simulate the spatial distribution of the initial attack success rate in the study area under two scenarios. Model performance was evaluated using accuracy, AUC, and F1 Score, and XGBoost performed best in terms of both accuracy and robustness. A comparison of the two scenarios showed that the initial attack success rate was lower under extreme weather, but its spatial distribution was similar to that under normal weather.

2. Materials and Methods

2.1. Study Area

The study area is located in Lushan, Xichang, Liangshan Yi Autonomous Prefecture, in the southwest of Sichuan Province, and covers about 129.45 km² (see Figure 1). The region is rich in forest resources and had a forest coverage rate of 43.0% at the end of 2017. The area has a subtropical monsoon climate, with the following features:

- hot, rainy summers, and dry winters and springs with little rain;
- strong solar radiation during the day and large temperature differences between day and night;
- an annual average temperature of about 18.1 °C;
- more than 2500 h of sunshine per year.

Fires in the study region frequently occur during the winter and spring seasons due to the lack of rainfall and the accumulation of combustible materials. In contrast, fires are less likely to occur in the summer due to the abundant rainfall and dense vegetation growth [24].

In this study, we added a 2 km buffer around the study area to account for nearby fire stations and road networks. The road network analysis (see Section 2.3.2) and viewshed analysis (see Section 2.3.1) were performed within this buffer. National highways G348, G248, and G108 separate the buffer zone from the study area and form an isolation zone, which makes it difficult for fires to spread into the buffer. We also obtained ASTER Global DEM (ASTGTM v003) tiles from NASA Earthdata. These tiles provide a Terra-derived digital elevation model (DEM) with a spatial resolution of 30 m × 30 m, shown in Figure 1. Elevation in the study area ranges from 1467 m to 2326 m, with higher ground on the southeast side and relatively flat terrain around it.

2.2. Data Source

The road network and the distribution of residential areas in the experimental area were obtained from OpenStreetMap. These data were used to calculate the initial attack delay and the resource arrival time. Commonly used weather data (temperature, wind speed, and wind direction) and remote sensing data were also obtained to assess fire spread potential. The weather data were obtained from the National Climatic Data Center (NCDC) and its Climate Data Records (CDRs). The remote sensing data are Landsat satellite images with low cloud coverage, which are considered to have the highest available data quality and are suitable for NDVI calculation [25].

We used Google Earth Engine (GEE) to batch-process NDVI and burned area for the study area. GEE is a cloud-based platform for planetary-scale Earth science data and geospatial analysis, and it is widely used in studies of natural disasters, drought, climate monitoring, and environmental protection [26,27].

Fire point data for the study area from 2008 to 2021 were obtained from NASA's MODIS, VIIRS S-NPP, and VIIRS NOAA-20 products. The extracted information includes the fire point distribution, brightness temperature, and thermal radiation [28]. Remote sensing may miss very small fires or fires without highly active fire behavior, but it is very effective in areas with limited historical fire records. According to NASA's description of VIIRS, its spatial resolution may not be suitable for studies of extremely small fires. Nevertheless, data at this resolution are still used in initial attack success modeling when more precise data are unavailable [29,30].

Following previous studies, an initial attack is considered successful if the fire is contained within 10 ha and a failure if the burned area exceeds 50 ha [22,31]. These burned area thresholds distinguish fires that were successfully suppressed from those that escaped. Fires with burned areas between 10 and 50 ha were discarded to ensure a clear separation between the two categories [9]. The fire point distribution is displayed in Figure 2, and a summary of the data sources is given in Table 1.
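The labeling rule above can be sketched in a few lines of Python. This is an illustration rather than the authors' code. The text does not say how fires of exactly 10 ha or 50 ha are handled, so the boundary treatment here is an assumption. The input format of (fire id, burned area in ha) pairs is also hypothetical.

```python
SUCCESS_MAX_HA = 10.0  # contained within 10 ha: initial attack success
FAILURE_MIN_HA = 50.0  # more than 50 ha: initial attack failure


def label_initial_attack(burned_area_ha):
    """Return 1 (success), 0 (failure) or None (ambiguous, excluded)."""
    if burned_area_ha <= SUCCESS_MAX_HA:
        return 1
    if burned_area_ha > FAILURE_MIN_HA:
        return 0
    return None  # assumption: (10, 50] ha is the discarded range


def build_labels(fires):
    """fires: iterable of (fire_id, burned_area_ha) pairs (hypothetical format)."""
    labels = {}
    for fire_id, area_ha in fires:
        label = label_initial_attack(area_ha)
        if label is not None:
            labels[fire_id] = label
    return labels


print(build_labels([("a", 3.0), ("b", 25.0), ("c", 120.0)]))  # {'a': 1, 'c': 0}
```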

2.3. Feature Selection

Generally, the success rate of the initial attack is influenced by many factors, which can be mainly divided into four aspects: initial attack delay, arrival time, fire spread potential, and wildfire fighting capability. We focus on the extraction of relevant features from these four aspects.

2.3.1. Initial Attack Delay

The initial attack delay is the time elapsed between the detection of a fire and the arrival of the first suppression resource [32]. Fire size is correlated with the time that passes before external intervention, and early detection can reduce losses. The detection time therefore has a significant impact on the success rate of the initial attack. Despite its importance, the initial attack delay is often not recorded and is difficult to obtain.

Wooster used images from the Spinning Enhanced Visible and InfraRed Imager (SEVIRI), combined with detection algorithms, to obtain fire detection times [33]. However, the limited resolution of remote sensing makes small fires hard to monitor. Small fires are also the easiest to suppress through initial attack, so their detection time is crucial. Rodrigues used a Geographic Information System (GIS) to calculate the size of the viewable area as a partial substitute for fire detection time, which makes the detection of small fires easier to account for [22]. Fire detection time, however, depends not only on the visible area around the fire point but also on the time of day and the temperature of the fire [6]. In this study, we therefore characterize the initial attack delay through the following four aspects:

(1): Viewshed size of the fire point

The visible area is typically calculated using viewshed analysis, with each fire point as the observation point. People usually first notice the smoke from a fire, so we raise each fire point by 300 m to simulate the smoke column. Residential areas and roads serve as observed points (Figure 3a), and each observed point is raised by a further 3 m to account for observer height, such as building floors. We set the atmospheric refraction coefficient to 0.13 and account for the curvature of the earth and the visible range of the human eye. This gives the binary viewshed generated by each fire point, as shown in Figure 3b [34]. Overlaying this viewshed with the road network and residential areas then gives the size of the visible range of each fire point. We performed the viewshed analysis for every fire point in the study area with the viewshed analysis plug-in for Quantum GIS [34].
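The core of a viewshed computation is a line-of-sight test. The sketch below checks visibility along a single elevation profile using the common curvature-and-refraction correction (1 − k)d²/(2R). It takes k = 0.13 from the text and assumes a mean Earth radius of 6,371 km. The 300 m smoke offset and 3 m observer offset also follow the text. This is not the QGIS plug-in's code; a full viewshed repeats the test for every target cell.

```python
EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius
REFRACTION_K = 0.13           # refraction coefficient from the text
SMOKE_OFFSET_M = 300.0        # simulated smoke column above the fire point
OBSERVER_OFFSET_M = 3.0       # extra height at road / residential cells


def curvature_drop(d_m, k=REFRACTION_K, r=EARTH_RADIUS_M):
    """Apparent lowering of terrain at distance d_m due to Earth curvature,
    partly offset by atmospheric refraction."""
    return (1.0 - k) * d_m ** 2 / (2.0 * r)


def is_visible(profile, spacing_m):
    """profile[0]: ground elevation at the fire point; profile[-1]: elevation
    of the observed cell; samples in between are equally spaced."""
    n = len(profile) - 1
    if n < 1:
        return True
    z_fire = profile[0] + SMOKE_OFFSET_M
    d_target = n * spacing_m
    z_target = profile[-1] + OBSERVER_OFFSET_M - curvature_drop(d_target)
    target_slope = (z_target - z_fire) / d_target
    # An intermediate cell whose corrected elevation angle exceeds the
    # target's blocks the line of sight.
    for i in range(1, n):
        d = i * spacing_m
        z = profile[i] - curvature_drop(d)
        if (z - z_fire) / d > target_slope:
            return False
    return True


print(is_visible([1500, 1500, 1500, 1500, 1500], 30.0))  # True (flat terrain)
print(is_visible([1500, 1500, 2100, 1500, 1500], 30.0))  # False (ridge blocks view)
```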

(2): Time period of fire

Detection delays differ with the time of day, and fires that start during the day are usually spotted sooner [6]. We therefore divide the fire time into day (6 a.m. to 8 p.m.) and night (8 p.m. to 6 a.m.), based on the detection time recorded by FIRMS. This feature is categorical, so we use one-hot encoding to represent the day or night condition at the time of the fire.
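A minimal sketch of the day/night one-hot feature follows. Two assumptions are not from the text: FIRMS acquisition times are 'HHMM' strings in UTC, and local time is UTC+8. Assigning 6:00 to day and 20:00 to night is also an assumption.

```python
DAY_START_H, NIGHT_START_H = 6, 20  # 6 a.m. and 8 p.m. local time (from the text)
UTC_OFFSET_H = 8  # assumption: acquisition times are UTC; local time is UTC+8


def local_hour(acq_time, utc_offset_h=UTC_OFFSET_H):
    """acq_time: 'HHMM' value such as '0430' (FIRMS-style acquisition time)."""
    return (int(str(acq_time).zfill(4)[:2]) + utc_offset_h) % 24


def one_hot_period(hour):
    """One-hot encode day (06:00-19:59) versus night (20:00-05:59)."""
    is_day = DAY_START_H <= hour < NIGHT_START_H
    return {"is_day": int(is_day), "is_night": int(not is_day)}


# 04:30 UTC is 12:30 local time
print(one_hot_period(local_hour("0430")))  # {'is_day': 1, 'is_night': 0}
```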

(3): The brightness of the fire point

The brightness of the fire point also affects the initial attack delay, because a brighter fire is generally discovered earlier. In this study, we measure the brightness of a fire pixel with MODIS channels 21/22 and 31 from the FIRMS data and with the VIIRS I-5 channel. Brightness here means brightness temperature: a measure of the photons at a specific wavelength received by the sensor, expressed in kelvin.

(4): Thermal radiation from the fire point

The thermal radiation generated by the fire point also influences the detection time of the fire. It is related to both the scale of the fire and the weather conditions at the time of the fire. In this study, thermal radiation is described by Fire Radiative Power (FRP), an integrated measure of the radiant power of the fire in megawatts. The FRP data are generated by the MODIS (1 km) and VIIRS (375 m) active fire detection algorithms, and the VIIRS 375 m algorithm is designed to improve the response to small fires.

2.3.2. Arrival Time

The arrival time of resources is mainly the time it takes resources to travel from the fire station to the fire point. It depends on factors such as the distance between the fire station and the fire point and the condition of the roads in between. The Technical Standards for Highway Engineering (JTG B01-2014), issued by the Ministry of Transport, divide roads into the following categories: (1) walking roads, (2) third-class roads, (3) second-class highways, and (4) first-class highways. Detailed road information is given in Table 2.

There are three firefighting stations in the experimental area, and when a fire is detected, resources are dispatched from all three stations to extinguish it. The stations are referred to as the No.1, No.2, and No.3 fire fighting stations, and their locations are shown in Figure 4.

To simulate the dispatching of fire resources, we used the network analysis function (Network Analysis) of Quantum GIS, with the fire station as the starting point and the fire point as the end point. This process accounts for the different road grades shown in Figure 5. Figure 4 displays the simulated time for each of the three fire stations to reach each fire point. Colors closer to white indicate shorter arrival times, that is, fire points closer to the station. Arrival times are longer on the southeast side of the area. The DEM on the right side of Figure 1 shows that this part of the study area lies at a higher altitude. The high elevation, combined with sparse road network coverage, leads to longer arrival times there.
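QGIS network analysis computes shortest travel-time routes. The sketch below shows the same idea with Dijkstra's algorithm on a small graph, where each edge cost is road length divided by a speed for its road class. The speeds in SPEED_KMH are hypothetical placeholders for the values in Table 2. Fire points are assumed to have been snapped to the nearest network node.

```python
import heapq

# Hypothetical speeds (km/h) per road class; the study's values are in Table 2.
SPEED_KMH = {"walking": 4.0, "third_class": 30.0, "second_class": 60.0, "first_class": 80.0}


def build_graph(edges):
    """edges: (node_u, node_v, length_km, road_class); roads are two-way."""
    graph = {}
    for u, v, length_km, road_class in edges:
        minutes = length_km / SPEED_KMH[road_class] * 60.0
        graph.setdefault(u, []).append((v, minutes))
        graph.setdefault(v, []).append((u, minutes))
    return graph


def travel_times(graph, source):
    """Dijkstra: shortest travel time (minutes) from source to every reachable node."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > best[node]:
            continue  # stale heap entry
        for nxt, cost in graph.get(node, []):
            new_t = t + cost
            if new_t < best.get(nxt, float("inf")):
                best[nxt] = new_t
                heapq.heappush(heap, (new_t, nxt))
    return best


edges = [
    ("station_1", "A", 10.0, "first_class"),   # 7.5 min
    ("A", "fire", 2.0, "walking"),             # 30 min
    ("station_1", "B", 20.0, "second_class"),  # 20 min
    ("B", "fire", 5.0, "third_class"),         # 10 min
]
print(round(travel_times(build_graph(edges), "station_1")["fire"], 2))  # 30.0
```

Running the search once per station yields one arrival-time feature per station, such as station_1, station_2, and station_3.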

2.3.3. Spread Potential

“Fire spread potential” refers to the likelihood that a fire will spread. Generally, the higher the spread potential of a fire, the easier it will be for it to spread without human intervention in its initial stage. There are several factors that can impact the fire spread potential [6]. We have divided fire spread potential into three categories:

(1): Weather factor

Weather conditions have a significant impact on the success of the initial attack. This study considered the main weather factors: temperature, wind speed, and wind direction. Temperature records were grouped by year and month, and the average temperature (°C) of the current month was calculated. In the experimental area, July and August are hot, with an average temperature of around 23 °C, while January and February are cool, with an average temperature of around 10 °C.

The wind speed (m/s) and wind direction data were also grouped by year. They show that the area experiences wind throughout the year, although most wind speeds are below 3 m/s.

(2): Vegetation density

Vegetation density has a large impact on the spread of fires; in general, fires do not spread easily in areas with little vegetation. Vegetation density is represented by the Normalized Difference Vegetation Index (NDVI), which has been widely used to study vegetation distribution and monitor habitat changes [35]. NDVI ranges from −1 to 1. Negative values indicate a lack of vegetation cover, and the closer the value is to 1, the denser the vegetation [36]. NDVI is calculated with Equation (1):

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \tag{1}$$

where NIR and RED are the reflectances in the near-infrared and red bands, respectively.
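Equation (1) can be applied pixel by pixel to band arrays, as in the NumPy sketch below. Pixels where both bands are zero are returned as NaN; this choice is an assumption, not taken from the text.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED); NaN where NIR + RED == 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Dense vegetation, bare surface, and an empty pixel: about 0.667, -0.5 and nan
print(ndvi([0.5, 0.1, 0.0], [0.1, 0.3, 0.0]))
```

For Landsat 8/9 OLI, the red and near-infrared bands are bands 4 and 5. In Google Earth Engine, the same index can also be obtained with ee.Image.normalizedDifference.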

(3): Topography

Topography is considered a crucial factor influencing fire spread [37,38]. In this experiment, two essential components of topography, slope and aspect, were analyzed. The Slope and Aspect functions in Quantum GIS were applied to the DEM of the experimental area, and the results are presented in Figure 6. Figure 6a,b show that the middle part of the experimental area has steep terrain, whereas the surrounding area is relatively flat, with slopes facing mainly north and south.
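GIS slope and aspect tools typically use a 3 × 3 kernel such as Horn's method. The NumPy sketch below is a simpler illustration based on central differences (numpy.gradient), not the QGIS implementation. It assumes a north-up grid with 30 m cells. Aspect is reported as the compass bearing of the downslope direction, clockwise from north.

```python
import numpy as np


def slope_aspect(dem, cell_size_m=30.0):
    """Slope (degrees) and aspect (degrees clockwise from north, NaN if flat)."""
    dem = np.asarray(dem, dtype=float)
    # Derivatives along rows (southward on a north-up grid) and columns (eastward)
    dz_drow, dz_dcol = np.gradient(dem, cell_size_m)
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dcol, dz_drow)))
    # Downslope vector: east component -dz_dcol, north component +dz_drow
    aspect_deg = np.mod(np.degrees(np.arctan2(-dz_dcol, dz_drow)), 360.0)
    aspect_deg[(dz_dcol == 0) & (dz_drow == 0)] = np.nan  # flat cells have no aspect
    return slope_deg, aspect_deg


# Terrain rising 30 m per 30 m cell toward the east: 45 degree, west-facing slope
dem = np.array([[0.0, 30.0, 60.0]] * 3)
slope, aspect = slope_aspect(dem)
print(slope[1, 1], aspect[1, 1])
```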

2.3.4. Wildfire Fighting Capability

Wildfire fighting capability refers to the resources that fire fighting stations can deploy during a fire incident. The availability of these resources is often affected by the number of simultaneous fires. When several fires burn at the same time, fire resources may be dispersed, and fires may not be extinguished in time. To capture this effect, this study uses the number of fires on the current day and the number of fires on the previous day as input features.
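The two fire-load features can be derived from fire dates alone, as in the sketch below. Counting the fire itself in its own day's total is an assumption, and the list-of-dates input is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def fire_load_features(fire_dates):
    """For each fire, count fires on its own day and on the previous day."""
    per_day = Counter(fire_dates)  # days without fires count as 0
    return [
        {"fires_today": per_day[d], "fires_prev_day": per_day[d - timedelta(days=1)]}
        for d in fire_dates
    ]


dates = [date(2020, 3, 1), date(2020, 3, 2), date(2020, 3, 2)]
print(fire_load_features(dates)[1])  # {'fires_today': 2, 'fires_prev_day': 1}
```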

2.4. Feature Pretreatment

2.4.1. SMOTE Oversampling

The fire point distribution in the study area is shown in Figure 2. We classified the fire points by the outcome of the initial attack: failed initial attacks accounted for 76.9% of the samples and successful ones for 23.1%. The class distribution is therefore clearly imbalanced. This imbalance poses a risk of overfitting, because the model may rely too heavily on majority-class samples during training. To mitigate this risk and improve model performance, we used the Synthetic Minority Over-sampling Technique (SMOTE) to balance the classes.

For each minority-class sample, the SMOTE algorithm finds its nearest minority-class neighbors by a distance metric. It then synthesizes new samples at random points on the line segments between the sample and those neighbors, and repeats this until the desired number of samples is reached. Unlike undersampling, SMOTE keeps all majority-class samples and so preserves their characteristics, which improves recognition accuracy. Random oversampling merely duplicates minority samples, whereas SMOTE expands the decision region of the minority class more reasonably. By synthesizing additional minority-class samples, SMOTE improves the recognition accuracy of the classifier and the overall AUC of the model, and enhances its generalization performance [39].
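The interpolation step described above can be sketched in NumPy. This simplified version picks base samples at random, finds neighbors with brute-force Euclidean distances, and uses a fixed seed; all three choices are assumptions. It is not the implementation used in the study.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Synthesize n_new minority-class samples by interpolating between a
    randomly chosen sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # needs at least two minority samples
    # Pairwise Euclidean distances within the minority class
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                   # starting samples
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                            # position on the segment
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


X_success = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(X_success, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

To balance the classes, n_new would be the number of majority-class samples minus the number of minority-class samples. In practice, the imbalanced-learn package offers a maintained implementation through SMOTE().fit_resample(X, y).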

2.4.2. Correlation Analysis

In the feature selection process, we considered a total of 17 features, but the relationships among them were unclear. High feature dimensionality may lead to the so-called “curse of dimensionality” and reduce the accuracy and performance of the model. To address this issue, we used the Pearson correlation coefficient to quantify the correlation between each pair of features. The Pearson correlation coefficient measures the degree of linear correlation between two continuous variables and ranges from −1 to 1. The closer its absolute value is to 1, the stronger the correlation between the two variables.

The Pearson correlation results show a high coefficient between the arrival times of Fire Station 1 (station_1) and Fire Station 2 (station_2), which may be due to the close proximity of the two stations. The viewshed results for residential areas (viewshed_area) and roads (viewshed_road) are also strongly correlated, which may reflect the large overlap between residential areas and road coverage. We therefore removed two features from subsequent model training: the arrival time of Fire Station 2 (station_2) and the viewshed result for residential areas (viewshed_area).
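The screening step can be sketched with numpy.corrcoef, flagging feature pairs whose absolute correlation exceeds a cutoff. The text does not state a cutoff, so threshold=0.8 is a hypothetical choice. The toy data merely mimic two near-duplicate station features.

```python
import numpy as np


def high_corr_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for feature pairs with |r| >= threshold."""
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)  # columns = features
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs


# Toy data: station_2 nearly duplicates station_1; ndvi is unrelated
X = np.column_stack([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 1.0, 4.0, 2.0, 3.0],
])
for a, b, r in high_corr_pairs(X, ["station_1", "station_2", "ndvi"]):
    print(a, b, round(r, 3))
```

Which member of a flagged pair to drop remains a modeling decision; the study kept station_1 and viewshed_road.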

2.5. Machine Learning Model

Once the features are determined, the initial attack success rate must be simulated for each location in the study area. Three machine learning models are trained:

- logistic regression, a traditional machine learning method;
- XGBoost, a representative ensemble learning method;
- artificial neural networks, which have become popular in recent years.

Two initial scenarios, normal weather and extreme weather, are considered when simulating the initial attack, and their results are compared. Finally, the performance of the three models is compared from multiple perspectives.

2.5.1. Logistic Regression

Logistic regression is a well-established machine learning technique widely used for classification tasks [40]. It passes the output of a linear regression through the logistic (sigmoid) function to obtain a class probability, and each sample is classified according to this probability. Gradient descent is often used to optimize the model.

In this study, we implemented logistic regression in Python using the linear_model module of the scikit-learn library [41]. During training, we selected liblinear as the optimization method (solver) and applied L2 regularization (penalty).
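The model itself can be illustrated with a from-scratch NumPy sketch of L2-regularized logistic regression trained by batch gradient descent. This is not what the study ran: liblinear uses a different optimizer. The learning rate, iteration count, and penalty weight below are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=0.01):
    """Batch gradient descent on the mean log loss plus an L2 penalty on w."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        err = sigmoid(X @ w + b) - y          # dLoss/dz for each sample
        w -= lr * (X.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()                  # intercept is not penalized here
    return w, b


def predict(X, w, b, threshold=0.5):
    return (sigmoid(np.asarray(X, dtype=float) @ w + b) >= threshold).astype(int)


X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
w, b = fit_logistic(X, y)
print(predict(X, w, b))  # [0 0 1 1]
```

In scikit-learn, the configuration described in the text corresponds to LogisticRegression(solver="liblinear", penalty="l2").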

2.5.2. XGBoost

XGBoost (Extreme Gradient Boosting) is a powerful ensemble learning method that can be used for both classification and regression problems. It is based on boosting, in which base learners (here, decision trees) are trained one after another. Each new learner focuses on correcting the errors made by the ensemble built in previous rounds. In XGBoost, each new tree is fitted using the first- and second-order gradients of the loss with respect to the current predictions. This serial training process effectively improves training accuracy and reduces bias.

The XGBClassifier class of the xgboost package is used for model training. During training, the booster parameter is set to “gbtree” to build tree models, the learning rate (eta) is set to 0.1, and the maximum tree depth (max_depth) is set to 6.
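The NumPy sketch below illustrates the boosting principle described above; it is not XGBoost. Each round computes the gradient and Hessian of the log loss and fits a depth-1 tree (the study used max_depth=6) using XGBoost's split gain and leaf weight −G/(H + λ). The tree's output is then shrunk by eta = 0.1 before it is added to the ensemble. XGBoost's γ penalty, subsampling, and missing-value handling are omitted, and λ = 1 is an assumed value.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def best_stump(X, g, h, lam=1.0):
    """Depth-1 tree maximizing G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)."""
    G, H = g.sum(), h.sum()
    best, best_gain = None, -np.inf
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, gs, hs = X[order, j], g[order], h[order]
        GL, HL = np.cumsum(gs)[:-1], np.cumsum(hs)[:-1]
        GR, HR = G - GL, H - HL
        gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)
        gain[xs[:-1] == xs[1:]] = -np.inf  # cannot split between equal values
        i = int(np.argmax(gain))
        if gain[i] > best_gain:
            best_gain = gain[i]
            threshold = 0.5 * (xs[i] + xs[i + 1])
            # Optimal leaf weights w = -G / (H + lam)
            best = (j, threshold, -GL[i] / (HL[i] + lam), -GR[i] / (HR[i] + lam))
    return best


def stump_output(stump, X):
    j, threshold, w_left, w_right = stump
    return np.where(X[:, j] <= threshold, w_left, w_right)


def fit_boosted_stumps(X, y, n_rounds=50, eta=0.1, lam=1.0):
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    score = np.zeros(len(y))  # raw log-odds; 0 means p = 0.5
    stumps = []
    for _ in range(n_rounds):
        p = sigmoid(score)
        g, h = p - y, p * (1.0 - p)  # gradient and Hessian of the log loss
        stump = best_stump(X, g, h, lam)
        score += eta * stump_output(stump, X)  # shrink each tree by eta
        stumps.append(stump)
    return stumps


def predict_proba(stumps, X, eta=0.1):
    """eta must match the value used in training."""
    X = np.asarray(X, dtype=float)
    return sigmoid(sum(eta * stump_output(s, X) for s in stumps))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)
stumps = fit_boosted_stumps(X, y)
acc = np.mean((predict_proba(stumps, X) >= 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```

With the real library, the configuration in the text would be written as xgboost.XGBClassifier(booster="gbtree", learning_rate=0.1, max_depth=6). The scikit-learn wrapper exposes eta under the name learning_rate.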

2.5.3. Artificial Neural Networks

Artificial Neural Networks (ANNs) are inspired by the structure and function of the human brain and are commonly used in the fields of artificial intelligence, machine learning, and deep learning to recognize patterns and solve problems. ANNs consist of multiple node layers, including an input layer, one or more hidden layers, and an output layer. Each node, also known as an artificial neuron, is connected to other nodes and has associated weights and thresholds. If the output of any single node exceeds the specified threshold, the node becomes activated and sends data to the next layer of the network, otherwise, the data is not passed on.

The structure of the artificial neural network model is shown in Figure 7. The input layer has 15 neurons, one for each input feature. The first hidden layer contains 128 neurons and the second hidden layer contains 256 neurons. The output layer contains 2 neurons, representing the success and failure of the initial attack, respectively. The network is built with the nn module of PyTorch and trained using mini-batch gradient descent for a total of 500 iterations with a learning rate of 0.01.
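The layer sizes above can be checked with a NumPy forward pass of a 15-128-256-2 network. The text does not report the activation function, initialization, or output ordering. ReLU hidden layers, He-style initialization, a softmax output, and the column order [P(failure), P(success)] are therefore assumptions. Training is not shown.

```python
import numpy as np

LAYER_SIZES = [15, 128, 256, 2]  # input, hidden 1, hidden 2, output (from the text)


def init_params(sizes=LAYER_SIZES, seed=0):
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # He-style scaling (assumption; the study's initialization is not reported)
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((W, np.zeros(n_out)))
    return params


def forward(params, X):
    a = np.asarray(X, dtype=float)
    for i, (W, b) in enumerate(params):
        z = a @ W + b
        a = np.maximum(z, 0.0) if i < len(params) - 1 else z  # ReLU on hidden layers
    # Softmax over the two output neurons (column order assumed)
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def n_parameters(params):
    return sum(W.size + b.size for W, b in params)


params = init_params()
probs = forward(params, np.zeros((4, 15)))
print(probs.shape, n_parameters(params))  # (4, 2) 35586
```

A PyTorch version with the same assumed activations would be nn.Sequential(nn.Linear(15, 128), nn.ReLU(), nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2)).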

3. Results and Discussion

The training described above produced three machine learning models. Some model inputs, such as temperature, wind speed, and wildfire fighting capability, change with weather and fire activity. Two scenarios were therefore established to simulate these conditions: normal weather and extreme weather.

3.1. Normal Weather

Normal weather refers to a scenario in which variables such as temperature and wind speed are at their annual average values, with no wind and no fires on the previous day or the current day. Figure 8 shows the success-rate maps produced by the three machine learning models under this scenario.

The simulation results of the three models (Figure 8) show that the initial attack success rate is higher near the edges of the experimental area, where the initial attack is less likely to fail. The rate is lower in the center, where the initial attack is more likely to fail. More resources are therefore needed for initial attack in the central part of the experimental area.

3.2. Extreme Weather

Extreme weather refers to a scenario in which variables such as temperature and wind speed exceed their annual 75th percentiles and 10 fires occur simultaneously. Figure 9 shows the simulation results of the three models under this scenario.
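The two scenarios fix the dynamic inputs shared by every grid cell before the models are applied. The sketch below builds those inputs from one year of observations; the feature names and input format are hypothetical. Three further assumptions are not specified in the text. Normal-weather wind speed is set to the annual mean, following the "average values" wording. "Exceeding the 75% quantile" is represented by the 75th percentile itself. The previous-day fire count under extreme weather is set to 0.

```python
import numpy as np


def scenario_inputs(temps_c, winds_ms, scenario):
    """Dynamic inputs applied to every grid cell for one scenario.
    temps_c, winds_ms: one year of observations (hypothetical inputs)."""
    temps = np.asarray(temps_c, dtype=float)
    winds = np.asarray(winds_ms, dtype=float)
    if scenario == "normal":
        # Annual means; no fires on the current or previous day
        return {"temperature": float(temps.mean()), "wind_speed": float(winds.mean()),
                "fires_today": 0, "fires_prev_day": 0}
    if scenario == "extreme":
        # 75th percentiles and 10 simultaneous fires; previous-day count assumed 0
        return {"temperature": float(np.percentile(temps, 75)),
                "wind_speed": float(np.percentile(winds, 75)),
                "fires_today": 10, "fires_prev_day": 0}
    raise ValueError(f"unknown scenario: {scenario!r}")


print(scenario_inputs([0, 10, 20, 30, 40], [1, 2, 3, 4, 5], "extreme"))
# {'temperature': 30.0, 'wind_speed': 4.0, 'fires_today': 10, 'fires_prev_day': 0}
```

These values would be combined with each cell's static features (viewshed, arrival times, NDVI, slope, aspect) before the trained models predict a success probability for the cell.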

The three models produce relatively similar results under extreme weather (Figure 9). The initial attack success rate near the edges of the experimental area is higher than in the central area, indicating that the initial attack is less likely to fail there. The success rate in the lower region of the experimental area is lower.

3.3. Comparison under Different Scenes

Table 3 compares the results for normal and extreme weather. The number of fire points with initial attack failure increases under extreme weather compared with normal weather. Among the three machine learning models, XGBoost shows the smallest increase in fire points with initial attack failure under extreme weather, at 0.32%. The increase is 0.48% for the logistic regression model and 1.62% for the artificial neural network model.

3.4. Model Evaluation

The quality of a classifier can be evaluated with a confusion matrix, which summarizes its predictions visually. The four cells are defined as follows:

- True Positive (TP): the IA was successful and the model predicts success;
- False Negative (FN): the IA was successful but the model predicts failure;
- False Positive (FP): the IA was unsuccessful but the model predicts success;
- True Negative (TN): the IA was unsuccessful and the model predicts failure.

To evaluate each model, we used 100 repetitions of five-fold cross-validation and measured the accuracy, AUC, and F1 Score of the models.
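The confusion-matrix counts, accuracy, and repeated five-fold splits described above are sketched below in NumPy. Success (label 1) is treated as the positive class, as in the definitions above. The text does not say whether folds were stratified, so plain shuffled folds with a fixed seed are an assumption. scikit-learn's RepeatedStratifiedKFold(n_splits=5, n_repeats=100) is a stratified alternative.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """(TP, FN, FP, TN) with 1 = initial attack success as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fn, fp, tn


def accuracy(y_true, y_pred):
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + fp + tn)


def repeated_kfold(n_samples, n_splits=5, n_repeats=100, seed=0):
    """Yield (train_idx, test_idx), reshuffling before every repetition."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(n_samples), n_splits)
        for k in range(n_splits):
            train = np.concatenate([f for i, f in enumerate(folds) if i != k])
            yield train, folds[k]


print(confusion_counts([1, 1, 0, 0], [1, 0, 1, 0]))  # (1, 1, 1, 1)
```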

Accuracy, the proportion of correctly classified samples among all samples, is a commonly used metric for evaluating classification models. Figure 10 shows the accuracy of the three machine learning models: logistic regression, XGBoost, and the neural network. Logistic regression achieves an accuracy between 0.737 and 0.85, with a mean of 0.793; XGBoost between 0.82 and 0.952, with a mean of 0.888; and the neural network between 0.683 and 0.832, with a mean of 0.764. XGBoost therefore outperforms the other models in accuracy, while the neural network has the lowest mean accuracy of the three and clearly lags behind the other two models.
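
The four confusion-matrix cells and the accuracy defined above can be computed as follows. This is a minimal sketch, assuming binary labels coded as 1 for a successful initial attack (the positive class) and 0 for a failed one; the function names and example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN); labels must be 1 = IA succeeded, 0 = IA failed."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # success predicted as success
        elif actual == 1:
            fn += 1  # success predicted as failure
        elif predicted == 1:
            fp += 1  # failure predicted as success
        else:
            tn += 1  # failure predicted as failure
    return tp, fn, fp, tn


def accuracy(y_true, y_pred):
    """Share of correctly classified samples: (TP + TN) / all samples."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + fp + tn)


# Hypothetical labels for eight fires
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(confusion_counts(y_true, y_pred))  # (4, 1, 1, 2)
print(accuracy(y_true, y_pred))          # 0.75
```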

The area under the ROC curve (AUC) measures how well a binary classification model ranks positive samples above negative ones. Its value ranges from 0 to 1: 0.5 indicates that the model is no better than random chance, values below 0.5 indicate a model worse than random, and 1 indicates a perfect classifier [42].
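
AUC can also be computed without drawing the ROC curve: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half (the Mann–Whitney formulation). The sketch below implements this pairwise definition directly. It runs in O(n_pos × n_neg) time and is meant only to illustrate the metric; the example scores are made up.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predicted success probabilities for four fires
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A perfect ranking gives 1.0, a constant score gives 0.5, and a fully reversed ranking gives 0.0, which is why values below 0.5 are possible.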

Figure 11 shows the AUC of each model. The AUC of logistic regression ranges from 0.733 to 0.851, with a mean of 0.793. XGBoost achieves an AUC of 0.823 to 0.95, with a mean of 0.889, making it the best-performing model on this metric. The AUC of the neural network ranges from 0.681 to 0.832, with a mean of 0.764, the lowest of the three models, which suggests that its ability to distinguish successful from failed initial attacks is weaker than that of the other two models.

The F1 Score is the harmonic mean of precision and recall; a higher F1 Score indicates that the model achieves both high precision and high recall. As shown in Figure 12, the F1 Score of logistic regression ranges from 0.739 to 0.865, with a mean of 0.801; that of XGBoost ranges from 0.824 to 0.956, with a mean of 0.893; and that of the neural network ranges from 0.658 to 0.846, with a mean of 0.77. Logistic regression and XGBoost therefore perform consistently across the cross-validation runs. The artificial neural network, however, reaches a minimum F1 Score of only 0.658 and has the largest standard deviation (0.034, Table 4), which indicates that its performance is less robust.
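
Written in terms of confusion-matrix counts, the F1 Score reduces to 2TP / (2TP + FP + FN). The sketch below computes it from precision and recall using hypothetical counts. Because F1 is a harmonic mean, an imbalance between precision and recall pulls it below their arithmetic mean.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; equals 2TP / (2TP + FP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives at all
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 3/4 = 0.75, recall = 3/6 = 0.5
print(f1_score(tp=3, fp=1, fn=3))
```

Here the F1 Score is 0.6, whereas the arithmetic mean of precision and recall would be 0.625.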

The results in Table 4 show that XGBoost, an ensemble learning method, outperforms both the traditional machine learning method (logistic regression) and the artificial neural network on every evaluation metric. Ensemble learning combines the predictions of many weak base learners (in XGBoost, gradient-boosted decision trees), which typically yields better performance than a single traditional model such as logistic regression. The weaker performance of the artificial neural network may be attributed to the limited amount of data used in the experiment and to the type of problem addressed. Artificial neural networks generally require large amounts of data to optimize their many parameters; the limited data available here may not meet that requirement, which can lead to underfitting. In addition, the task is a binary classification problem on a relatively small set of tabular features, a setting in which traditional machine learning methods often perform as well as or better than neural networks.

4. Conclusions

Forest fires pose a significant threat to both the natural environment and human safety, and their prevention, control, and suppression have been a research focus for many scholars. This paper reviews the concept of initial attack, summarizes the methods that international researchers have used to estimate initial attack success rates, and examines the factors that may affect the success of an initial attack. We combined these factors with machine learning to model the spatial distribution of initial attack success rates in the region. Because the data used in the experiment are relatively easy to obtain, the approach can help regions that lack detailed historical records to assess initial attack success rates and can help decision-makers deploy suppression resources in advance. Using fire point data from the experimental area, three machine learning models (logistic regression, XGBoost, and artificial neural network) were trained and compared using accuracy, AUC, and F1 Score. The initial attack was simulated in two scenarios, normal weather and extreme weather, to obtain and compare the success rate under different conditions. The results show that the initial attack success rate at the edge of the study area is markedly higher than in the central region. Possible reasons include the following:
(1) The road network in the central region is sparser than at the edge. Because the study area relies mainly on the road network to transport firefighting volunteers, longer resource arrival times there may reduce the initial attack success rate.
(2) The central region is farther from the fire stations, which may lengthen the initial attack delay relative to the edge region and consequently reduce the initial attack success rate.

The comparison of the three methods used in this study (XGBoost, ANN, and LR) has several implications for modeling. First, XGBoost outperformed the other two methods in predicting initial attack success, suggesting that machine learning methods such as XGBoost can provide more accurate and reliable predictions of initial attack success rates than traditional regression models such as LR. Second, the comparison highlights the importance of choosing modeling techniques suited to the research question. This study predicted initial attack success from a range of factors, including weather conditions and fire characteristics. Tree ensembles such as XGBoost, and neural networks when sufficient training data are available, are better suited to capturing complex, nonlinear relationships among such factors, whereas LR may be more appropriate for simpler models with fewer predictors. Finally, the comparison suggests that further advanced techniques, such as additional ensemble methods, may improve the accuracy and reliability of initial attack success rate predictions even more. However, such methods may require more data and computational resources, so their feasibility and practicality should be assessed carefully in each context.

The experimental results indicate that, among the three machine learning methods, XGBoost predicted the initial attack success rate in the region most accurately. The overall initial attack success rate was slightly lower under extreme weather than under normal weather, with escape rates rising by 0.32 to 1.62 percentage points depending on the model (Table 3).

Author Contributions

All three authors contributed equally to every stage of this research. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Start-up Fund for New Talented Researchers of Nanjing Vocational University of Industry Technology [Grant No. YK22-05-01].

Data Availability Statement

All data generated or presented in this study are available from the corresponding author upon request. The models and code used in the study cannot be shared at this time, as the data also form part of an ongoing study.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zhan, Z. Li Keqiang Made Important Instructions on Forest and Grassland Fire Prevention and Control Work. Mod. Occup. Saf. 2020. Available online: http://www.xinhuanet.com/politics/2020-09/18/c_1126513000.htm (accessed on 13 February 2023).
2. Gao, C.; Lin, H.L.; Hu, H.Q.; Song, H. A review of models of forest fire occurrence prediction in China. Ying Yong Sheng Tai Xue Bao J. Appl. Ecol. 2020, 31, 3227–3240.
3. Cao, Y.; Wang, M.; Liu, K. Wildfire susceptibility assessment in Southern China: A comparison of multiple methods. Int. J. Disaster Risk Sci. 2017, 8, 164–181.
4. Zhong, M.; Fan, W.; Liu, T.; Li, P. Statistical analysis on current status of China forest fire safety. Fire Saf. J. 2003, 38, 257–269.
5. Zhang, F.; Dong, Y.; Xu, S.; Yang, X.; Lin, H. An approach for improving firefighting ability of forest road network. Scand. J. For. Res. 2020, 35, 547–561.
6. Paudel, A.; Martell, D.L.; Woolford, D.G. Factors that affect the timing of the dispatch of initial attack resources to forest fires in northeastern Ontario, Canada. Int. J. Wildland Fire 2018, 28, 15–24.
7. Lee, Y.; Fried, J.S.; Albers, H.J.; Haight, R.G. Deploying initial attack resources for wildfire suppression: Spatial coordination, budget constraints, and capacity constraints. Can. J. For. Res. 2013, 43, 56–65.
8. Reimer, J.; Thompson, D.K.; Povak, N. Measuring initial attack suppression effectiveness through burn probability. Fire 2019, 2, 60.
9. Arienti, M.C.; Cumming, S.G.; Boutin, S. Empirical models of forest fire initial attack success probabilities: The effects of fuels, anthropogenic linear features, fire weather, and management. Can. J. For. Res. 2006, 36, 3155–3166.
10. Beverly, J.L. Time since prior wildfire affects subsequent fire containment in black spruce. Int. J. Wildland Fire 2017, 26, 919–929.
11. Cardil, A.; Lorente, M.; Boucher, D.; Boucher, J.; Gauthier, S. Factors influencing fire suppression success in the province of Quebec (Canada). Can. J. For. Res. 2019, 49, 531–542.
12. Tremblay, P.O.; Duchesne, T.; Cumming, S.G. Survival analysis and classification methods for forest fire size. PLoS ONE 2018, 13, e0189860.
13. Podur, J.J.; Martell, D.L. A simulation model of the growth and suppression of large forest fires in Ontario. Int. J. Wildland Fire 2007, 16, 285–294.
14. Gonzalez-Olabarria, J.R.; Reynolds, K.M.; Larrañaga, A.; Garcia-Gonzalo, J.; Busquets, E.; Pique, M. Strategic and tactical planning to improve suppression efforts against large forest fires in the Catalonia region of Spain. For. Ecol. Manag. 2019, 432, 612–622.
15. Rashidi, E.; Medal, H.; Hoskins, A. An attacker-defender model for analyzing the vulnerability of initial attack in wildfire suppression. Nav. Res. Logist. (NRL) 2018, 65, 120–134.
16. Collins, K.M.; Price, O.F.; Penman, T.D. Suppression resource decisions are the dominant influence on containment of Australian forest and grass fires. J. Environ. Manag. 2018, 228, 373–382.
17. Minas, J.; Hearne, J.; Martell, D. An integrated optimization model for fuel management and fire suppression preparedness planning. Ann. Oper. Res. 2015, 232, 201–215.
18. Mendes, A.B.; e Alvelos, F.P. Iterated local search for the placement of wildland fire suppression resources. Eur. J. Oper. Res. 2022.
19. Ntaimo, L.; Arrubla, J.A.G.; Stripling, C.; Young, J.; Spencer, T. A stochastic programming standard response model for wildfire initial attack planning. Can. J. For. Res. 2012, 42, 987–1001.
20. Ntaimo, L.; Gallego-Arrubla, J.A.; Gan, J.; Stripling, C.; Young, J.; Spencer, T. A simulation and stochastic integer programming approach to wildfire initial attack planning. For. Sci. 2013, 59, 105–117.
21. Wei, Y.; Bevers, M.; Belval, E.J. Designing seasonal initial attack resource deployment and dispatch rules using a two-stage stochastic programming procedure. For. Sci. 2015, 61, 1021–1032.
22. Rodrigues, M.; Alcasena, F.; Vega-García, C. Modeling initial attack success of wildfire suppression in Catalonia, Spain. Sci. Total Environ. 2019, 666, 915–927.
23. Marshall, E.; Dorph, A.; Holyland, B.; Filkov, A.; Penman, T.D. Suppression resources and their influence on containment of forest fires in Victoria. Int. J. Wildland Fire 2022, 31, 1144–1154.
24. Jia, L. Research on Forest Fire Prevention and Control Countermeasures in Liangshan Prefecture. For. Fire Prev. 2019, 3, 9–14.
25. Chander, G.; Markham, B.L.; Helder, D.L. Summary of current radiometric calibration coefficients for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sens. Environ. 2009, 113, 893–903.
26. Mutanga, O.; Kumar, L. Google Earth Engine Applications; MDPI: Basel, Switzerland, 2019.
27. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
28. Giglio, L.; Descloitres, J.; Justice, C.O.; Kaufman, Y.J. An enhanced contextual fire detection algorithm for MODIS. Remote Sens. Environ. 2003, 87, 273–282.
29. Xofis, P.; Konstantinidis, P.; Papadopoulos, I.; Tsiourlis, G. Integrating remote sensing methods and fire simulation models to estimate fire hazard in a south-east mediterranean protected area. Fire 2020, 3, 31.
30. Keane, R.E.; Burgan, R.; van Wagtendonk, J. Mapping wildland fuels for fire management across multiple scales: Integrating remote sensing, GIS, and biophysical modeling. Int. J. Wildland Fire 2001, 10, 301–319.
31. Plucinski, M.P. Factors affecting containment area and time of Australian forest fires featuring aerial suppression. For. Sci. 2012, 58, 390–398.
32. Plucinski, M.P. Modelling the probability of Australian grassfires escaping initial attack to aid deployment decisions. Int. J. Wildland Fire 2012, 22, 459–468.
33. Wooster, M.; Roberts, G.; Freeborn, P.; Xu, W.; Govaerts, Y.; Beeby, R.; He, J.; Lattanzio, A.; Fisher, D.; Mullen, R. LSA SAF Meteosat FRP products–Part 1: Algorithms, product contents, and analysis. Atmos. Chem. Phys. 2015, 15, 13217–13239.
34. Cuckovic, Z. Advanced viewshed analysis: A Quantum GIS plug-in for the analysis of visual landscapes. J. Open Source Softw. 2016, 1, 32.
35. Pettorelli, N.; Vik, J.O.; Mysterud, A.; Gaillard, J.M.; Tucker, C.J.; Stenseth, N.C. Using the satellite-derived NDVI to assess ecological responses to environmental change. Trends Ecol. Evol. 2005, 20, 503–510.
36. Myneni, R.B.; Hall, F.G.; Sellers, P.J.; Marshak, A.L. The interpretation of spectral vegetation indexes. IEEE Trans. Geosci. Remote Sens. 1995, 33, 481–486.
37. Lydersen, J.M.; North, M.P.; Collins, B.M. Severity of an uncharacteristically large wildfire, the Rim Fire, in forests with relatively restored frequent fire regimes. For. Ecol. Manag. 2014, 328, 326–334.
38. Prichard, S.J.; Kennedy, M.C. Fuel treatments and landform modify landscape patterns of burn severity in an extreme fire event. Ecol. Appl. 2014, 24, 571–590.
39. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
40. Maalouf, M. Logistic regression in data analysis: An overview. Int. J. Data Anal. Tech. Strateg. 2011, 3, 281–299.
41. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
42. Hosmer, D.W., Jr.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; John Wiley & Sons: Hoboken, NJ, USA, 2013; Volume 398.

Figure 1. Study area, located in Liangshan, Sichuan, China. The left image shows the aspect of the study area, the middle images show its overall location, and the upper-right image shows the Digital Elevation Model of the study area, derived from ASTGTMv003 imagery.

Figure 2. Ignition points distribution.

Figure 3. Viewshed Extraction; (a) Roads and residential areas; (b) Visual area.

Figure 4. Arrival Time; (a) No. 1 fire station; (b) No. 2 fire station; (c) No. 3 fire station.

Figure 5. Distribution of the road network in the study area and of the three fire stations that can respond promptly to fires occurring in it. Different road types are delineated on the map (road network data sourced from OpenStreetMap).

Figure 6. Topography; (a) Slope; (b) Aspect.

Figure 7. Artificial Neural Network.

Figure 8. Simulated Initial Attack Results Under Normal Weather; the color scale from low to high corresponds to a probability from 0 to 1; (a) Logistic Regression; (b) XGBoost; (c) Artificial Neural Network.

Figure 9. Simulated Initial Attack Results Under Extreme Weather; (a) Logistic Regression; (b) XGBoost; (c) Artificial Neural Network.

Figure 10. Accuracy of Machine Learning Models.

Figure 11. AUC (Area Under Curve) of Machine Learning Models.

Figure 12. F1 Score of Machine Learning Models.

Table 1. Data source.

Data	Description	Source
Landsat 7	NDVI (2008–2013)	https://developers.google.com/earth-engine/data (access date: 2 April 2022)
Landsat 8	NDVI (2014–2021)	https://developers.google.com/earth-engine/datasets (access date: 2 April 2022)
ASTGTMv003	Slope, Aspect	https://lpdaac.usgs.gov/products/astgtmv003/ (access date: 2 April 2022)
MODIS	Fire points Distribution, Brightness, Thermal Radiation	https://earthdata.nasa.gov/earth-observation-data/near-real-time/firms/about-firms (access date: 2 April 2022)
S-NPP	Fire points Distribution, Brightness, Thermal Radiation	https://earthdata.nasa.gov/earth-observation-data/near-real-time/firms/about-firms (access date: 2 April 2022)
NOAA-20	Fire points Distribution, Brightness, Thermal Radiation	https://earthdata.nasa.gov/earth-observation-data/near-real-time/firms/about-firms (access date: 2 April 2022)
OSM	Road network	https://www.openstreetmap.org/ (access date: 2 April 2022)
CDRs	Temperature, Wind speed, Wind direction	https://www.ncei.noaa.gov/products/climate-data-records (access date: 2 April 2022)

Table 2. Road Classification.

Grading	Speed Interval (km/h)	Description
Walking Road	0–20	Difficult for vehicles to pass; usable by a limited number of pedestrians.
Third Class Highway	30–40	Winding terrain; passable only by small vehicles.
Second Class Highway	60–80	Single lane on flat terrain; passable by medium-sized vehicles.
First Class Highway	80–100	Multiple lanes; can carry several large vehicles at the same time.
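
The speed intervals in Table 2 make it possible to turn a route into an estimated travel time. The sketch below is purely illustrative. It assumes that each road class is traversed at the midpoint of its speed interval and that a route is given as hypothetical (road class, length in km) pairs; the study's own arrival-time calculation (Figure 4) may use different speeds and a full network analysis.

```python
# Midpoints of the Table 2 speed intervals, in km/h. Using the midpoint is an
# assumption for this sketch; the study does not say which speed it applied.
SPEED_KMH = {
    "walking road": (0 + 20) / 2,
    "third class highway": (30 + 40) / 2,
    "second class highway": (60 + 80) / 2,
    "first class highway": (80 + 100) / 2,
}


def travel_time_minutes(route):
    """Travel time for a route given as (road_class, length_km) segments."""
    hours = sum(length_km / SPEED_KMH[road_class] for road_class, length_km in route)
    return hours * 60


# Hypothetical route from a fire station to an ignition point
route = [
    ("first class highway", 18.0),
    ("second class highway", 7.0),
    ("third class highway", 3.5),
    ("walking road", 1.0),
]
print(round(travel_time_minutes(route), 1))  # 30.0
```

Each segment in this example takes six to twelve minutes, so the final kilometre on a walking road costs as much time as seven kilometres on a second-class highway.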

Table 3. Comparison of Simulation Initial Attack Results in Different Scenarios.

Machine Learning Model	Scenario	Number of Escape Fire Points	Escape Rate (%)	Escape Rate Increase (percentage points)
Logistic regression	Normal weather	14,927	22.22	N/A
Logistic regression	Extreme weather	15,144	22.54	0.32
XGBoost	Normal weather	15,112	22.49	N/A
XGBoost	Extreme weather	15,432	22.97	0.48
Artificial neural network	Normal weather	14,926	22.22	N/A
Artificial neural network	Extreme weather	16,015	23.84	1.62

Table 4. Comparison of Machine Learning Models.

Model	LR			XGBoost			ANN
Statistic	Accuracy	AUC	F1 Score	Accuracy	AUC	F1 Score	Accuracy	AUC	F1 Score
mean	0.793	0.793	0.801	0.888	0.889	0.893	0.764	0.764	0.77
std	0.028	0.027	0.028	0.025	0.025	0.026	0.03	0.03	0.034
min	0.737	0.733	0.739	0.82	0.823	0.824	0.683	0.681	0.658
25%	0.772	0.777	0.779	0.868	0.873	0.876	0.743	0.745	0.752
50%	0.796	0.795	0.802	0.886	0.887	0.894	0.766	0.766	0.771
75%	0.81	0.811	0.822	0.91	0.906	0.913	0.784	0.782	0.79
max	0.85	0.851	0.865	0.952	0.95	0.956	0.832	0.832	0.846
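
Apart from the count row, the rows of Table 4 (mean, std, min, 25%, 50%, 75%, max) match the summary produced by pandas' `DataFrame.describe()`, which reports the sample standard deviation and linearly interpolated percentiles. Assuming the per-run scores from cross-validation are available as a list, the same summary can be reproduced with Python's standard library as sketched below; the paper does not state whether the authors used pandas, and the example scores are hypothetical.

```python
from statistics import mean, quantiles, stdev


def describe(scores):
    """Table 4-style summary of per-run scores (count row omitted)."""
    # method="inclusive" interpolates linearly between order statistics,
    # the same rule pandas/NumPy use for percentiles by default
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "mean": mean(scores),
        "std": stdev(scores),  # sample standard deviation (n - 1 denominator)
        "min": min(scores),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(scores),
    }


# Hypothetical accuracy values from five cross-validation runs
print(describe([0.79, 0.81, 0.77, 0.80, 0.78]))
```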

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xu, Y.; Zhou, K.; Zhang, F. Modeling Wildfire Initial Attack Success Rate Based on Machine Learning in Liangshan, China. Forests 2023, 14, 740. https://doi.org/10.3390/f14040740


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
