Performance Evaluation of Machine Learning Methods for Forest Fire Modeling and Prediction

Abstract: Predicting and mapping fire susceptibility is a top research priority in fire-prone forests worldwide. This study evaluates the abilities of the Bayes Network (BN), Naïve Bayes (NB), Decision Tree (DT), and Multivariate Logistic Regression (MLR) machine learning methods for predicting and mapping fire susceptibility across the Pu Mat National Park, Nghe An Province, Vietnam. The modeling methodology was formulated based on processing the information from 57 historical fires and a set of nine spatially explicit explanatory variables, namely elevation, slope degree, aspect, average annual temperature, drought index, river density, land cover, and distance from roads and residential areas. Using the area under the receiver operating characteristic curve (AUC) and seven other performance metrics, the models were validated in terms of their abilities to elucidate the general fire behaviors in the Pu Mat National Park and to predict future fires. Although the AUC values differed only slightly, the BN model, with an AUC value of 0.96, outperformed the other models in predicting future fires. The second best was the DT model (AUC = 0.94), followed by the NB (AUC = 0.939) and MLR (AUC = 0.937) models. Our robustness analysis demonstrated that these models are sufficiently robust to changes in the training and validation datasets. Further, the results revealed that moderate to high levels of fire susceptibility are associated with ~19% of the Pu Mat National Park, where human activities are numerous. This study and the resultant susceptibility maps provide a basis for developing more efficient fire-fighting strategies and reorganizing policies in favor of sustainable management of forest resources.


Introduction
Fires are potentially the most destructive natural disaster in forested areas [1,2]; they burn millions of hectares annually [3,4] and are responsible for the loss of biodiversity, soil quality, and CO2 capture [5]. The susceptibility of forests and their adjacent areas, i.e., human settlements and infrastructure, to fires is a major concern to communities in many land ecosystems of the world [6][7][8][9][10][11][12]. Increasing changes in socioeconomic processes and climate, which have induced extensive modification of the natural environment [13,14] and prolonged drought periods [15][16][17][18][19], have placed strong demands on authorities and decision makers to temporally and spatially delineate forested areas in terms of susceptibility to fires [6,11,20]. Identifying areas with high/very high fire susceptibility must be undertaken to successfully design fire management plans [21] and allocate firefighting resources [22][23][24][25][26]. To this end, robust approaches and tools are required to enable managers and engineers to accurately estimate the time, location, and extent of future fires [8,10,12,27,28,29]. Improvements in techniques for predicting fire susceptibility and delineating forested areas into different susceptibility levels can help forest managers and policy makers achieve a better understanding of fires, which facilitates the development of prevention measures for fire-prone forests [4,30].
In forest fire prediction, however, it is difficult to compile sufficient amounts of spatially explicit geo-environmental data, particularly over large-scale forests, due to field survey difficulties and budgetary constraints. Over the past decade, machine learning methods have gained prominence as an alternative to traditional field-survey methods for predicting forest fire susceptibility; they elucidate the relationship between historic fire events and different explanatory variables in order to predict future fires [20]. Examples of machine learning methods suggested and used for forest fire prediction include decision tree based classifiers [31,32], artificial neural network (ANN) [33,34], neuro-fuzzy [27,35,36,37], and support vector machine [7,33,38].
Despite the widespread application of these methods, many regions of the world have not yet been delineated in terms of fire susceptibility. Further, no single model/method has yet been identified that captures fire behavior in all regions, owing to the variation of training data from different regions [6,8,10,11,36,39,40]. To fill this significant gap in fire prediction efforts, this study aimed to develop a suite of predictive models based on four machine learning methods, namely Bayes Network, Naïve Bayes, Decision Tree, and Multivariate Logistic Regression, for the prediction of fire susceptibility in the Pu Mat National Park of Vietnam. Although these methods have been broadly investigated in environmental studies, particularly for the prediction of landslides and floods [41][42][43][44][45], their joint application and comparison have not yet been reported for forest fire prediction. The outcomes of this study allow researchers to determine whether a particular predictive model derived from machine learning methods aligns with their objectives for modeling and mapping forest fire susceptibility.

Study Area
The Pu Mat National Park is located in the Nghe An Province in the north-central coast region (18°46′ north latitude and 104°24′ east longitude) of Vietnam (Figure 1). The park was established on 8 November 2001 and is a part of the Western Nghe An Biosphere Reserve. With an area of about 94,804 ha, the park spreads across Tuong Duong, Con Cuong, and Anh Son of Nghe An. Out of the total land area, the strictly protected area encompasses about 89.5 ha, the ecological recovery area covers about 1.6 ha, and the buffer zone comprises about 86.000 ha. The park is located in a region characterized by the tropical monsoon climate. The average annual rainfall is recorded to be 1800 mm. Topography strongly controls temperature such that the average annual temperature is 20 °C on the coast, 15 °C in the areas with an altitude of 900 m, 12 °C in the areas with an altitude of 1800 m, and 5 °C in the areas with an altitude of 2700 m. The highest temperatures are recorded in August, often exceeding 35 °C during the day. The park usually faces a five-month drought period, which typically extends from April to August. In general, the Pu Mat National Park is a highly biodiverse area in Vietnam that has periodically suffered fire damage. To safeguard the biodiversity as well as human settlements from recurrent fire events and to make more informed decisions for fire suppression operations, systematic and continuous studies, such as the one presented in this paper, are required.

Fire Inventory Map
An inventory map represents the historical fires that occurred across the landscape. To prepare the inventory map of the Pu Mat National Park, we used the georeferenced perimeters of 57 historical fires from the period 2014-2016. The records for these fires were obtained from historical archives and were verified via multiple field surveys and observations. These fires usually occurred during the drought period. However, extensive human activities are thought to intensify their occurrence.

Explanatory Variables
Another important step in forest fire modeling and mapping is compiling a set of independent explanatory variables known as fire causative factors based on their potential relationship with the local characteristics of the area being investigated, historical fires, and data availability. In this study, we collected nine geo-environmental, climate, and human variables (i.e., elevation, slope degree, aspect, average annual temperature, drought index, river density, land cover, and distance from roads and residential areas) and converted each variable to categorized raster format with a cell size of 30 × 30 m (Figure 2). Topography-related variables (elevation, slope degree, aspect) were selected due to their relevance to fire occurrence that has been widely demonstrated in the literature [12,24,28,29,36]. Terrain morphology heavily affects human accessibility, species density and composition, and fire behavior [9]. The topography-related variables considered in this study were derived from a 30-m digital elevation model (DEM) of the Pu Mat National Park.
As a hydrological variable, we used river density to quantify the amount of surface water and surrounding humidity within the study area and their influences on fire susceptibility across the study area. River density is the total length of rivers in a drainage basin divided by the total area of the drainage basin. In general, the region with a higher river density has lower sensitivity to fire occurrences [27].
Land cover was another variable that we used for modeling fire occurrence in the Pu Mat National Park. Land cover is a measure of forests, agriculture, wetlands, impervious surfaces, and other land types in a landscape. Land cover is typically used as a proxy for flammability of the landscape [46] for modeling fire probability [47,48]. The land cover map of the Pu Mat National Park was produced using the Landsat satellite images for the year 2016.
The climate-related variables selected for modeling fire susceptibility in the Pu Mat National Park were average annual temperature and drought index. Temperature is an important variable for fire prediction because of its effect on the moisture content of the fuel, which in turn is a crucial parameter in fire ignition [49,50]. For this study, the meteorological data corresponding to the 2014-2016 period were used to develop a thematic map of average annual temperature for the Pu Mat National Park. Drought index was another climate-related variable used in this study because forest fires tend to occur in the regions most affected by drought periods [19,51]. Following Karnieli et al. [52], we computed the drought index of the Pu Mat National Park based on the relationship between the normalized difference vegetation index (NDVI) and land surface temperature (LST), with NDVI computed as follows:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}$$

where NIR and RED are the near-infrared and red spectral bands, respectively. This drought index was selected because the data required for calculating it were available. Although other types of drought indices (e.g., standardized precipitation index, standardized precipitation evapotranspiration index, vegetation condition index, Palmer drought severity index, and temperature condition index) have been used for fire modeling [19], their application needs long-term precipitation and temperature data [15,16] that are unavailable for the Pu Mat National Park.
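As a small illustration of the NDVI component used by the drought index, the sketch below computes NDVI per pixel from near-infrared and red reflectance. The function name `ndvi`, the zero-signal convention, and the reflectance values are assumptions made for this example only; the way NDVI is combined with LST to obtain the drought index is not shown here.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for a single pixel."""
    total = nir + red
    if total == 0:
        # Assumption for this sketch: pixels with no signal get NDVI = 0.
        return 0.0
    return (nir - red) / total


# Hypothetical (NIR, RED) reflectance pairs, not values from the study.
pixels = [(0.50, 0.10), (0.30, 0.20), (0.25, 0.25)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]
```

Denser, healthier vegetation reflects more strongly in the near-infrared band than in the red band, so the first hypothetical pixel yields the highest NDVI and the last one yields zero.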
The literature identified human activities as a main cause of the majority of forest fires [7,39,53,54]. Activities such as picnic fires, shepherd fires, smoking, hunting, stubble burning, and arson have been repeatedly identified as the main cause of fire ignitions in forests worldwide [29]. Previous studies quantified the effects of human activities on fire probability using the proximity variables that incorporate the information related to distance from roads, railways, houses, industrial areas, and airports into the modeling process [24,55,56]. In this study, we elected to use two main proximity variables: distance from roads and distance from residential areas. The information for generating these layers was obtained from topographic maps at the scale of 1:100,000 obtained from the North Central Geological Federation of Vietnam.

Relief-F Feature Selection Method
Selection of the most influential explanatory variables is a crucial step in a machine learning modeling task because it allows modelers to focus on the variables that best explain input-output interactions and contribute the most to the modeling process. Feature selection aims to remove irrelevant and redundant features and to retain a small number of features that describe the dataset better than the original set. In a modeling study using machine learning methods, this can be achieved by measuring the importance of each variable for obtaining a higher classification accuracy. One of the well-known methods in variable selection is the Relief-F method, which was originally developed by Kira and Rendell [57] and then upgraded by Kononenko [58]. The original Relief algorithm can detect conditional dependencies between attributes for feature selection, but it is restricted to two-class problems. Moreover, it does not handle incomplete, noisy, and duplicate attributes in the dataset. The extended Relief-F algorithm, in contrast, handles multi-class problems. The Relief-F method is widely used for feature selection in the literature due to its simplicity and efficiency for variable ranking [59]. Theoretically, this algorithm ranks different features in terms of their utility for the problem being modeled and determines the most efficient features for the prediction task. In fire modeling, Relief-F measures the spatial associations between fire locations and different causative factors to calculate the average merit (AM) of each causative factor in separating fire-prone and fire-proof portions of the landscape.
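The idea behind Relief-type ranking can be sketched as follows. This is a minimal two-class Relief with a single nearest hit and nearest miss per sample, not the full Relief-F algorithm and not the implementation used in the study; it visits every sample once instead of drawing random samples. The function name `relief_weights` and the toy data are hypothetical.

```python
def relief_weights(samples, labels):
    """Two-class Relief: a feature gains weight when it differs at the
    nearest miss (other class) and agrees at the nearest hit (same class)."""
    n_samples, n_features = len(samples), len(samples[0])

    # Per-feature value ranges, used to scale differences into [0, 1].
    spans = []
    for f in range(n_features):
        column = [row[f] for row in samples]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(n_features))

    weights = [0.0] * n_features
    for i, row in enumerate(samples):
        same = [j for j in range(n_samples) if j != i and labels[j] == labels[i]]
        other = [j for j in range(n_samples) if labels[j] != labels[i]]
        hit = samples[min(same, key=lambda j: distance(row, samples[j]))]
        miss = samples[min(other, key=lambda j: distance(row, samples[j]))]
        for f in range(n_features):
            weights[f] += (diff(f, row, miss) - diff(f, row, hit)) / n_samples
    return weights


# Toy data: feature 0 separates the classes, feature 1 is noise.
toy_samples = [(0.10, 0.50), (0.20, 0.10), (0.15, 0.90),
               (0.90, 0.40), (0.80, 0.95), (0.85, 0.20)]
toy_labels = [1, 1, 1, 0, 0, 0]
weights = relief_weights(toy_samples, toy_labels)
```

In this toy case the informative feature receives a clearly larger weight than the noisy one, which is the behavior a ranking such as the AM is meant to capture.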

Bayes Network (BN)
BN [60] is a probabilistic, statistical model that forms a set of random variables and their conditional dependencies (Bayes law) within an annotated directed acyclic graph. BN is a promising tool for explaining the relationships between an event and several possible explanatory variables. Structurally, the BN classifier is a directed acyclic graph where the arcs have a formal interpretation of probabilistic conditional independence. The quantitative part of this graph is a collection of conditional probability tables, each attached to a node that represents the probability of the variable at the node conditioned on its parents in the network. One of the important advantages of BN is that this method handles risk analysis and uncertainty assessment more accurately than the other models that only predict values.
Further advantages of this method include handling missing values in the input data, combining quantitative and qualitative data, and providing approximate solutions using simulation techniques or estimation methods in cases where a precise solution is not available [61]. Bayes theory enables forward and backward computation: in addition to predicting the target variable from the states of the input variables, the network can be used to determine the effect of each input variable on the model output given the state of the predicted variable [62].
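Forward and backward computation can be illustrated with a deliberately tiny network of two nodes, NearRoad -> Fire. The structure, the variable names, and every probability below are hypothetical values chosen for the example and are not estimates from the study.

```python
# Hypothetical two-node Bayes network: NearRoad -> Fire.
p_near_road = {True: 0.3, False: 0.7}          # prior P(NearRoad)
p_fire_given_road = {True: 0.20, False: 0.02}  # CPT: P(Fire = 1 | NearRoad)


def forward_p_fire():
    """Forward (predictive) inference: marginal P(Fire = 1)."""
    return sum(p_near_road[r] * p_fire_given_road[r] for r in (True, False))


def backward_p_near_road_given_fire():
    """Backward (diagnostic) inference with Bayes' rule:
    P(NearRoad = 1 | Fire = 1)."""
    return p_near_road[True] * p_fire_given_road[True] / forward_p_fire()


p_fire = forward_p_fire()
p_road_given_fire = backward_p_near_road_given_fire()
```

With these assumed numbers, observing a fire raises the probability that the location is near a road well above its prior of 0.3, which is the kind of diagnostic reasoning the backward pass provides.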

Naïve Bayes (NB)
The NB classifier is a simple supervised method and a special form of discriminant analysis. NB is a member of the family of probabilistic classifiers that apply Bayes' theorem and assume independence between variables to perform a classification task. The Bayesian classification technique is typically used as a simple way to classify and label objects or points. Although the NB classifier has some drawbacks (e.g., low performance or biased estimation of prior probability) due to its basic assumptions on variable relationships, this method has been proven to work efficiently for many real-world problems [63,64]. To apply NB for forest fire modeling, suppose X = (x_1, x_2, ..., x_n) is a vector of n properties that are independent explanatory variables. The posterior probability of each class C_k (e.g., fire or non-fire) given X is then:

$$p(C_k \mid x_1, \ldots, x_n) = \frac{p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)}{p(x_1, \ldots, x_n)}$$

NB can be used for both binary and multi-class classification problems. The NB classifier is very useful in high-dimensional problems.
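A minimal sketch of an NB classifier for categorized variables is given below, assuming Laplace smoothing (the `alpha` parameter) and log-space scoring; both are choices made for this example rather than details taken from the study. The function names and the toy samples are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            values_seen[i].add(v)
    return class_counts, value_counts, values_seen, alpha


def predict_proba(model, row):
    """Posterior p(C_k | x) under the conditional independence assumption."""
    class_counts, value_counts, values_seen, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log prior
        for i, v in enumerate(row):
            num = value_counts[(i, y)][v] + alpha
            den = n_y + alpha * len(values_seen[i])
            score += math.log(num / den)  # log smoothed likelihood
        log_scores[y] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {y: math.exp(s - top) for y, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {y: s / norm for y, s in exp_scores.items()}


# Hypothetical categorized samples: (distance to road, land cover).
rows = [("near", "shrub"), ("near", "shrub"), ("near", "forest"),
        ("far", "forest"), ("far", "forest"), ("far", "shrub")]
labels = [1, 1, 1, 0, 0, 0]  # 1 = fire, 0 = non-fire
model = train_nb(rows, labels)
proba = predict_proba(model, ("near", "shrub"))
```

The denominator p(x_1, ..., x_n) never has to be computed explicitly because it is the same for every class and cancels in the final normalization.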

Decision Tree (DT)
DT is a non-parametric, supervised learning method designed for classification and prediction problems. It is easy to interpret (due to the tree structure) and behaves as a Boolean function when each decision is binary (i.e., false or true). Decision trees extract predictive information in the form of human-understandable tree rules (If/Then rules). Each decision in the tree can be seen as a feature. To make a prediction using decision trees, a tree-like structure is built that starts with all training samples, selects the variable that best separates the classes, and creates sub-branches [65]. The tree branches are the result of a test performed by the algorithm at each internal node, and the predictions appear on the leaves of the tree. The split criterion in a node is based on the standard deviation of the output values that reach that node as a measure of error. By testing each attribute (parameter) in the node, the expected decrease in error is calculated [66]. The standard deviation reduction is calculated as

$$\mathrm{SDR} = \frac{m}{|T|} \times \beta(i) \times \left[ \mathrm{sd}(T) - \sum_{j \in \{L, R\}} \frac{|T_j|}{|T|} \times \mathrm{sd}(T_j) \right]$$

where SDR is the standard deviation reduction, T is the set of samples that reach the node, sd(·) is the standard deviation of the output values in a set, m is the number of samples that have no missing values for this parameter, β(i) is a correction factor, and T_L and T_R are the subsets created by splitting on this parameter.
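The standard deviation reduction criterion can be sketched in a simplified form that assumes no missing values, so that the ratio m/|T| and the correction factor β(i) are both taken as 1. The function name `sdr` and the toy targets are hypothetical, and the population standard deviation is used here as one reasonable choice.

```python
from statistics import pstdev


def sdr(parent, left, right):
    """Standard deviation reduction obtained by splitting `parent` into
    `left` and `right` (simplified: no missing values, beta = 1)."""
    n = len(parent)
    weighted = sum(len(part) / n * pstdev(part)
                   for part in (left, right) if part)
    return pstdev(parent) - weighted


# Hypothetical binary targets (1 = fire, 0 = non-fire) reaching a node.
targets = [0, 0, 0, 1, 1, 1]
clean_split = sdr(targets, [0, 0, 0], [1, 1, 1])  # separates the classes
mixed_split = sdr(targets, [0, 1, 0], [1, 0, 1])  # barely separates them
```

A split that isolates the classes removes all within-branch variation and therefore achieves the larger reduction, so the tree-growing procedure would prefer it.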

Multivariate Logistic Regression (MLR)
In a regression model, an equation is developed for predicting the values of the dependent variable based on one or more independent predictor variables. Here, the dependent variable (i.e., occurrence or non-occurrence of a fire) is a two-state qualitative variable that takes the value of 1 or 0. In fire probability modeling, the objective of MLR is to find the best model to describe the relationships between the occurrence or non-occurrence of a fire (i.e., dependent variable) and a set of independent variables known as fire influencing factors [47]. The general form of the logistic regression equation can be given as follows:

$$P = \frac{1}{1 + e^{-Z}}$$

where P is the probability of fire occurrence and Z is the linear combination of the explanatory variables that is expressed by

$$Z = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$

where b_0 is the intercept of the equation, b_i (i = 1, 2, ..., n) are the model coefficients, and x_i (i = 1, 2, ..., n) are the fire explanatory variables.
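The logistic transformation from the linear predictor Z to a fire probability P can be sketched as below. The function name `fire_probability` and all coefficient values are hypothetical and chosen only to show the mechanics; they are not fitted values from the study.

```python
import math


def fire_probability(x, intercept, coefficients):
    """P = 1 / (1 + exp(-Z)) with Z = b0 + sum(b_i * x_i)."""
    z = intercept + sum(b * v for b, v in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for two standardized explanatory variables.
b0, b = -1.0, [2.0, 0.5]
p_low = fire_probability([0.0, 0.0], b0, b)   # Z = -1.0
p_high = fire_probability([1.0, 1.0], b0, b)  # Z = 1.5
```

Because the logistic function is monotonic, a positive coefficient always raises the predicted probability as its variable increases, and Z = 0 corresponds to P = 0.5.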

Validation Metrics
One of the most important steps after developing a model is to evaluate its training and predictive performance [18,67,68]. In this study, we used receiver operating characteristics (ROC) and several statistical measures (true positive (TP), true negative (TN), false positive (FP), false negative (FN), positive predictive value (PPV), negative predictive value (NPV), sensitivity (SST), specificity (SPF), accuracy (ACC), Kappa, and root mean square error (RMSE)) for the evaluation and comparison of the models developed for fire probability mapping. The following subsections provide a brief description of each metric.

Receiver Operating Characteristics (ROC)
The receiver operating characteristics (ROC) curve is one of the most important and widely used performance metrics for the evaluation of classification models in terms of their goodness-of-fit and generalizability [69][70][71]. The ROC curve is a probability-based curve that evaluates a model across different classification thresholds [72]. It represents a trade-off between sensitivity on the y-axis and 1-specificity on the x-axis. A model with excellent performance achieves an area under the ROC curve (AUC) of >0.90 [73].
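One way to compute the AUC without tracing the curve is to use its rank interpretation: the AUC equals the probability that a randomly chosen fire sample receives a higher score than a randomly chosen non-fire sample, with ties counted as one half. The sketch below uses this pairwise definition; the function name `auc` and the scores are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (fire, non-fire) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores and observed classes (1 = fire).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
observed = [1, 1, 0, 1, 0, 0]
auc_value = auc(scores, observed)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for datasets of the size used here; a sort-based rank computation would be preferable for large rasters.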

Statistical Metrics
The statistical metrics used for machine learning evaluation are categorized into three main groups: threshold, probability, and ranking metrics [74]. Threshold and ranking metrics are the most widely used [75]. For this study, we opted to use the following five established and applicable metrics to evaluate the BN, NB, DT, and MLR models: specificity (SPF), sensitivity (SST), accuracy (ACC), Kappa, and root mean square error (RMSE). Using these performance metrics, we investigated how well the different models used for the prediction of forest fire susceptibility captured the relationships between historical fires and different explanatory variables (i.e., goodness-of-fit with the training dataset) and made decisions when tested with the unseen validation dataset (i.e., generalization ability). We evaluated the goodness-of-fit and generalization ability of the models based on the four components (i.e., true positive (TP), true negative (TN), false positive (FP), and false negative (FN)) of a 2 × 2 confusion matrix. TP and TN are the numbers of samples that are correctly classified as fires and non-fires, respectively. FP is the number of non-fires that are incorrectly classified as fires, and FN is the number of fires that are incorrectly classified as non-fires [47,76]. The SPF, SST, ACC, Kappa, and RMSE are calculated as follows:

$$\mathrm{SPF} = \frac{TN}{TN + FP}$$

$$\mathrm{SST} = \frac{TP}{TP + FN}$$

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Kappa} = \frac{P_{obs} - P_{exp}}{1 - P_{exp}}, \qquad P_{exp} = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{(TP + TN + FP + FN)^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{obs,i} - X_{est,i} \right)^2}$$

where P_obs is the observed agreement (equal to ACC), P_exp is the agreement expected by chance, n is the number of samples, X_obs denotes the observations (i.e., validation dataset), and X_est denotes the values estimated by the forest fire predictive models.
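The threshold metrics can be computed directly from the four confusion-matrix counts, as in the following sketch. The function names and the example counts are hypothetical; the counts describe an imagined 34-sample validation set and are not taken from the study's tables. The sketch assumes that every denominator is non-zero.

```python
import math


def threshold_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics; assumes all denominators are non-zero."""
    n = tp + tn + fp + fn
    acc = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return {
        "SST": tp / (tp + fn),
        "SPF": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "ACC": acc,
        "Kappa": (acc - p_exp) / (1 - p_exp),
    }


def rmse(observed, estimated):
    """Root mean square error between observed classes and estimates."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical counts: 17 fires and 17 non-fires, two fires missed.
result = threshold_metrics(tp=15, tn=17, fp=0, fn=2)
error = rmse([1, 0], [0.8, 0.2])
```

Kappa discounts the agreement that balanced classes would produce by chance (0.5 in this example), which is why it is lower than ACC for the same counts.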

Modeling Methodology
The flowchart of the methodology proposed for the prediction of forest fire susceptibility in the Pu Mat National Park is shown in Figure 3. The methodology starts by compiling a set of explanatory variables and generating an inventory map of the locations of historical fires. The historical fires were randomly allocated to two different sets: training dataset that contained 40 forest fire locations (70%) and validation dataset that included the remaining 17 forest fire locations (30%) [15,67,69,77,78]. To construct the final datasets, an equal number of non-fire points was randomly sampled from non-fire portions of the Pu Mat National Park. We coded the fire points as "1", whereas the non-fire points were coded as "0" [79][80][81][82]. This process yielded training and validation datasets that consisted of 80 and 34 samples, respectively. Then, the dataset pre-processing was carried out using the Relief-F feature selection method to identify the variables with null predictive usefulness [31,68,70,83]. Through the modeling step, the machine learning methods were trained using the training dataset to develop forest fire predictive models. To check for the model robustness, a five-fold cross-validation procedure that produced five different folds of training and validation datasets was used [32,84,85]. In this procedure, one group out of the five groups was used as the validation dataset and the rest were used as the training dataset. Then, the models were trained using the training sets and validated using the validation dataset. This modeling process was repeated until each one of the five groups was used as the validation dataset. The ultimate outcomes of the modeling process were four distribution maps of forest fire susceptibility that were quantitatively analyzed [48] and compared to each other.
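The data partitioning described above can be sketched as a random 70/30 split of the fire locations followed by a five-fold partition of the labelled samples. The function names, the fixed seed, and the integer identifiers standing in for fire and non-fire points are assumptions for this example; the study's actual sampling tool is not specified here.

```python
import random


def split_70_30(points, seed=0):
    """Randomly assign about 70% of the points to training, the rest to validation."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def k_folds(samples, k=5, seed=0):
    """Yield (training, validation) pairs so each fold validates exactly once."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, folds[i]


fire_ids = list(range(57))  # stand-ins for the 57 fire locations
train_fires, valid_fires = split_70_30(fire_ids)

# 57 fire points plus an equal number of non-fire points, coded 1 and 0.
all_samples = [(i, 1) for i in range(57)] + [(i, 0) for i in range(57, 114)]
cv_pairs = list(k_folds(all_samples, k=5))
```

With 57 fire locations the split yields 40 training and 17 validation fires, matching the 70%/30% allocation; this plain shuffle does not balance fire and non-fire samples within each fold, which a stratified variant would.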
Symmetry 2020, 12, x FOR PEER REVIEW 9 of 22

Variable Importance
The Relief-F ranking of the fire explanatory variables based on their AM showed that distance from roads had the highest influence on fire occurrences in the Pu Mat National Park (Table 2). This variable obtained the greatest AM (85.9), followed by distance from residential areas (83.4), land cover (79.5), elevation (74.4), and annual temperature (71.8). Aspect, river density, slope degree, and drought index had the lowest AM values, equal to 56.5, 55.1, 53.8, and 48.7, respectively. These results revealed that the human-related variables (distance from roads and residential areas, and land cover) were the most influential variables, corroborating previous studies in Vietnam that reported on the significance of human activities in increasing the probability of fire occurrences [25,27,86]. Some other studies suggested that proximity to roads and residential areas intensifies the likelihood and frequency of fire ignitions, even in regions with a relatively low population density [20,87]. In a recent study, Elia, Giannico, Spano, Lafortezza, and Sanesi [56] demonstrated the significance of distance to roads in increasing fire probabilities in more urbanized Mediterranean regions of southern Italy. In contrast, Gralewicz et al. [88] showed a reduced probability of fire occurrence close to urbanized regions of Canada due to the policy of fully suppressing all wildfires.
Land cover was ranked among the most influential factors causing fire occurrence in the Pu Mat National Park. Several studies have documented how specific land cover types (e.g., grasslands) are closely related to fire events, whereas others (e.g., farmlands and orchards) are negatively related [89,90]. Nunes et al. [91] demonstrated that fires are selective for land cover such that they prefer specific land cover types. While there was a marked preference for shrubland and forest cover types, farmlands were clearly avoided. Our results revealed that fires are highly correlated with those portions of the Pu Mat National Park that experienced afforestation and urbanization, whereas natural forests appear far less susceptible to fire.
Although the Pu Mat National Park suffered recurrent prolonged drought occurrences, it seems that human-related variables are much stronger than the climate-related variables for fire ignition.
Since the AM of all nine explanatory variables was greater than zero, the spatial modeling was performed using all factors [67,68,70,77].

Model Validation and Comparison
To validate the models and compare them in terms of training and validation performance, we computed several performance metrics on both the training and validation datasets (Table 3). Regarding the PPV metric, which is the proportion of correctly classified fire samples out of all samples classified as fire, the BN model with PPV training = 89.74% and PPV validation = 100% performed the best. In terms of the NPV metric, which is the proportion of correctly classified non-fire samples out of all samples classified as non-fire, the DT and MLR models with values equal to 100% were identified as the best models. Regarding the SST metric, which measures the proportion of all fire samples that a model predicts as fire (i.e., true positives), the DT and MLR models with values equal to 100% were dominant over the other models. In terms of the SPF metric, which measures the proportion of all non-fire samples that a model predicts as non-fire (i.e., true negatives), the BN model with SPF training = 89.47% and SPF validation = 100% was the best model. In terms of the ACC metric, which measures the overall efficiency of the models, the MLR (ACC = 92.31%) and BN (ACC = 94.12%) were the most efficient models in the training phase and validation phase, respectively. Regarding the Kappa index, the MLR (Kappa = 0.846) and BN (Kappa = 0.884) showed the highest agreement between observed fires and predicted fires in the training phase and validation phase, respectively. These variant training and validation performances, which have also been observed previously in different models used for different applications [24,67,68,69,71,77,78,83,92], can be attributed to the specific nature and structure of the models applied to different datasets. These results underscore the conclusion drawn by Bui, Khosravi, Tiefenbacher, Nguyen and Kazakis [84] that no model exists that always performs the best for all datasets from different sources.
In the matter of the magnitude of the modeling error, the four models exhibited training error that ranged from 0.255 (MLR) to 0.339 (NB) and validation error that ranged from 0.192 (BN) to 0.306 (DT) (Figure 4). Again, we are inclined to attribute these asymmetric performances of a model in training and validation phases to its computational algorithm when tested with different datasets.
The AUC values for the BN, DT, MLR, and NB models obtained from the training phase processing the training dataset were 0.99, 0.969, 0.986, and 0.983, respectively (Figure 5a). Based on these values, all four models performed excellently in distinguishing between training samples (fires and non-fires) with respect to the explanatory variables, although the BN model performed slightly better than the others.
The AUC values obtained from the validation phase that processed the validation dataset exhibited values of 0.96, 0.94, 0.937, and 0.939 for the BN, DT, MLR, and NB models, respectively (Figure 5b). Based on these results and the interpretation of AUC ranges given in the literature [73,93], we can say all four models developed in this study have a very high ability to predict future fire occurrences. In contrast to our results, previous studies have found that the LR models are often outperformed by other models for fire prediction [7,34,87,94].
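To make the AUC values above concrete, the sketch below computes the AUC through its rank interpretation: the probability that a randomly chosen fire sample receives a higher predicted score than a randomly chosen non-fire sample, with ties counted as one half. This is a generic illustration, not necessarily the procedure used to produce Figure 5; the function name and sample data are hypothetical.

```python
def auc_from_scores(scores, labels):
    """AUC as the share of (fire, non-fire) pairs ranked correctly.

    scores: predicted fire probabilities
    labels: 1 for fire, 0 for non-fire
    """
    fire = [s for s, y in zip(scores, labels) if y == 1]
    non_fire = [s for s, y in zip(scores, labels) if y == 0]
    if not fire or not non_fire:
        raise ValueError("both classes must be present")
    wins = 0.0
    for f in fire:
        for n in non_fire:
            if f > n:
                wins += 1.0
            elif f == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(fire) * len(non_fire))


# Hypothetical scores for five samples.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
print(auc_from_scores(scores, labels))
```

An AUC of 1.0 means every fire sample outranks every non-fire sample, and 0.5 corresponds to random ranking.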

Robustness Analysis
The analysis of model robustness based on the five different datasets (Fold 1-5) and three performance metrics (ACC, RMSE, and AUC) showed that the models were very stable and that their performance varied within a narrow range (Table 4). For example, in the training phase of the BN model, ACC ranged from 87.17% to 88.46% (mean = 87.44%, standard deviation = 0.57%), RMSE ranged from 0.279 to 0.301 (mean = 0.29, standard deviation = 0.01), and AUC ranged from 0.98 to 0.99 (mean = 0.98, standard deviation = 0.00). In the validation phase of this model, ACC ranged from 99.85% to 100% (mean = 99.90%, standard deviation = 0.06%), RMSE ranged from 0.192 to 0.31 (mean = 0.28, standard deviation = 0.05), and AUC ranged from 0.941 to 0.96 (mean = 0.96, standard deviation = 0.01). Overall, these results revealed that the four models used in this study are reliable and robust to changes in the training and validation datasets. Our results are supported by previous works that reported on the reliability and robustness of machine learning methods for environmental studies [32,35,36,59,68,95] as well as for other real-world problems [96,97].
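The per-fold summary described above (range, mean, and standard deviation of a metric across Fold 1-5) can be sketched as follows. The fold values are hypothetical and are not the entries of Table 4, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form was used.

```python
import statistics


def summarize_folds(values):
    """Summarize one metric across folds."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # Assumption: sample standard deviation (n - 1 denominator).
        "sd": statistics.stdev(values),
    }


# Hypothetical validation AUC values for five folds.
fold_auc = [0.95, 0.96, 0.94, 0.96, 0.95]
summary = summarize_folds(fold_auc)
print({key: round(value, 3) for key, value in summary.items()})
```

A small standard deviation relative to the mean is what the text refers to as performance changing within a narrow range.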

Forest Fire Susceptibility Maps
Four maps were generated to depict the forest fire susceptibility predicted by the BN, DT, MLR, and NB models (Figure 6). For each map, the probability of fire occurrence was classified into three levels of susceptibility, from low to high, using the natural breaks classification method. A quantitative analysis of the produced maps revealed that the NB model assigned the low susceptibility level to 81.28% of the land area, whereas the moderate and high levels covered 4.52% and 14.2% of the area, respectively (Figure 7a). The MLR model classified 81.54%, 5.93%, and 12.53% of the study area as low, moderate, and high susceptibility, respectively. The DT model classified 77.99% as low susceptibility, 16.68% as moderate susceptibility, and 5.33% as high susceptibility. The BN model classifications were 82.68%, 8.51%, and 8.81% for the low, moderate, and high susceptibilities, respectively. On average, the models classified 80.9%, 8.9%, and 10.2% of the Pu Mat National Park as having low, moderate, and high susceptibility to fire occurrence. Given that the moderate and high susceptibility classes are concentrated around areas where human activities are numerous (i.e., roads and residential areas), we conclude that anthropogenic pressures have made 19.1% of the study area susceptible to future fires. The reliability of the forest fire susceptibility maps was assessed using frequency ratio analysis (Figure 7b,c), which denotes the ratio between the percentage of actual fires and the percentage of the entire area for each susceptibility zone. In all four maps, the highest frequency ratio values belonged to the high susceptibility classes, followed by the moderate and low classes. This indicates that all models performed well in delineating the Pu Mat National Park with respect to the historical fire locations [47,48]. Despite these promising results, there might be several uncertainties in such susceptibility maps.
One possible source of uncertainty is the edge effect, which occurs when fires ignited outside the study area spread into it and may alter the level of susceptibility near the boundaries [98,99]. Since no information is available on fires spreading from other areas into the Pu Mat National Park, we could not analyze the potential edge effect in this study. When the required information is available, the application of edge detection methods [98] can help researchers ensure that edge-affected regions are identified and removed.
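The map statistics discussed in this section can be checked with a short sketch. The class shares below are the percentages reported in the text for each model; the fire percentages used for the frequency ratio are hypothetical placeholders, since the values behind Figure 7b,c are not given here, and the function name is illustrative.

```python
# Percent of the study area in the (low, moderate, high) classes,
# as reported in the text for each model.
class_shares = {
    "NB": (81.28, 4.52, 14.20),
    "MLR": (81.54, 5.93, 12.53),
    "DT": (77.99, 16.68, 5.33),
    "BN": (82.68, 8.51, 8.81),
}

# Mean share per class across the four models.
mean_shares = [
    round(sum(column) / len(column), 1)
    for column in zip(*class_shares.values())
]
print(mean_shares)


def frequency_ratio(fire_pct, area_pct):
    """Percent of historical fires in a class divided by the
    percent of the study area occupied by that class."""
    return [f / a for f, a in zip(fire_pct, area_pct)]


# Hypothetical distribution of historical fires over the three classes.
hypothetical_fire_pct = (10.0, 20.0, 70.0)
ratios = frequency_ratio(hypothetical_fire_pct, class_shares["BN"])
print([round(r, 2) for r in ratios])
```

A frequency ratio above 1 means a class holds more fires than its share of the area would suggest, so a reliable map should show ratios that increase from the low to the high class.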

Conclusions
The accurate prediction of fire probability aids forest managers in drafting more efficient firefighting strategies and also helps to reorganize policies for sustainable management of forest resources. To this end, we evaluated and compared four fire predictive models derived from the BN, NB, DT, and MLR machine learning methods for predicting and mapping fire susceptibility in the Pu Mat National Park, Vietnam. We formulated our modeling methodology based on processing the information from the historical fires and a set of spatially explicit explanatory variables. The outcome of the ROC-AUC method and several other performance metrics revealed that all four models developed in this study had high accuracy in predicting future fire susceptibilities (AUC > 0.90) in the Pu Mat National Park, although the BN model performed slightly better than the others. Given the similar performance of these models, perhaps the most remarkable difference between them is interpretability. Managers and decision makers prefer a model and value its outputs if they have some understanding of how the model produced those outputs, so the poor interpretability of a machine learning model may restrict its application in practice. From our experience, it is much easier to interpret the MLR model than the more complex, algorithm-based Bayesian and DT models. Therefore, the outcomes of our study provide several implications for the selection of a specific model over the others. Although the explanatory variables of the models are not expected to change significantly over time, and the results could thus be seen as a long-term prediction of fire susceptibility for the Pu Mat National Park, human activities that change land use patterns would render our long-term susceptibility estimates obsolete, making it necessary to regularly update the current susceptibility maps.
Our findings also demonstrated that ~19% of the study area, where human activities are numerous, falls within the moderate to high classes of susceptibility to fire occurrence, mainly because of increased road development and forest-human interfaces, underscoring the need for careful attention from managers to avoid catastrophes in these portions of the area. Although we achieved a high level of prediction accuracy using the current dataset and four machine learning models, future studies could incorporate other explanatory variables (e.g., vegetation type and density, wind speed and direction, and different types of socio-economic factors and drought indices) into the modeling process to better explain fire behavior in the Pu Mat National Park.
