Comparisons of Diverse Machine Learning Approaches for Wildfire Susceptibility Mapping

Climate change has increased the probability of catastrophes such as wildfires, floods, and storms across the globe in recent years. Weather conditions continue to grow more extreme, and wildfires are occurring more frequently and spreading with greater intensity. Wildfires ravage forest areas, as seen in the Amazon, the United States, and, more recently, Australia. The availability of remotely sensed data has vastly improved and enables us to precisely locate wildfires for monitoring purposes. Wildfire inventory data was created by integrating polygons collected through field surveys using global positioning systems (GPS) with data collected from the Moderate Resolution Imaging Spectroradiometer (MODIS) thermal anomalies product between 2012 and 2017 for the study area. The inventory data, along with sixteen conditioning factors selected for the study area, was used to appraise the potential of various machine learning (ML) methods for wildfire susceptibility mapping in Amol County. The ML methods chosen for this study are artificial neural network (ANN), dmine regression (DR), DM neural, least angle regression (LARS), multi-layer perceptron (MLP), random forest (RF), radial basis function (RBF), self-organizing maps (SOM), support vector machine (SVM), and decision tree (DT), along with the statistical approach of logistic regression (LR), which is well suited to wildfire susceptibility studies. The wildfire inventory data was split into three folds, with 66% used for training the models and 33% used for accuracy assessment within three-fold cross-validation (CV). Receiver operating characteristic (ROC) curves were used to assess the accuracy of the ML approaches. RF had the highest accuracy of 88%, followed by SVM with an accuracy of almost 79%, while LR had the lowest accuracy of 65%. This shows that RF is better suited for wildfire susceptibility assessments in our case study area.


Introduction
Forests are considered crucial natural resources that play an integral role in preserving the ecological equilibrium of the environment, covering about one-third of the earth. According to the FAO, forests across the world occupy an area of about 4000 million ha, which is approximately 30% of the earth's total land area [1]. Forests play a vital role in producing oxygen and purifying the environment [2]. The state and well-being of a forest are true indicators of the ecological health of its region. In addition, forests have economic and social importance and play important roles in the existence of all living things on planet Earth. Forests also regulate the climate.

Figure 1. The location of the study area and the wildfire inventory data from 2012 to 2017 that was created from Moderate Resolution Imaging Spectroradiometer (MODIS) data and field surveys.


Wildfire Inventory Data
Inventory data is crucial for training the model, and adequate and reliable inventory data is a prerequisite for machine learning approaches in particular. The Moderate Resolution Imaging Spectroradiometer (MODIS) is an instrument in operation on both the Terra and Aqua spacecraft. Its electromagnetic spectral bands range from the visible to the thermal infrared [33]. To generate the wildfire inventory data, we used GPS-derived data from the state wildlife organization and MODIS fire event data, which is freely available, to precisely map the affected areas as polygons for the period 2012 to 2017. The MODIS fire data includes information about the spatial distribution of fires as well as their timestamps. Burned areas in the MODIS fire data are characterized by the removal of vegetation, deposits of charcoal and ash, and alteration of the vegetation structure. MODIS fire products, or thermal anomalies, are primarily derived from the MODIS 4 and 11 micrometer radiances; we used MOD14, which has a temporal resolution of 5 min with level 2 processing. MODIS data on Terra and Aqua are acquired from each platform twice daily at mid-latitudes. The fire detection method is based on absolute detection: a fire is recognized when its strength is sufficient to be detected. A total of 34 fire hotspots (17,420 pixels) were identified for the study area for the given period at a resolution of 1 km, which is appropriate for identifying the burned area. The identified hotspots were verified against the GPS data collected through field surveys for credibility. The wildfire inventory data was thus created from a combination of GPS data and MODIS data. All corrections were carried out using ArcGIS software. The dataset collected by the state wildlife organization through GPS surveys may not be complete and may have missed smaller fires in the region.
These smaller fires were captured by the MODIS dataset, while the larger wildfires were incorporated from the state department's records, resulting in comprehensive wildfire inventory data for the study area.

Conditioning Factors
The spatial probability of wildfire occurrence is the probability that a given region will be affected by wildfires based on its environmental conditions. These conditions can be geomorphological, topographical, meteorological, hydrological, and anthropological. Known as conditioning factors, they can strongly influence the final susceptibility mapping and are selected based on their relevance to the study area. It is pivotal to standardize the causative factors when preparing any natural hazard susceptibility map [13]. For this study, sixteen conditioning factors were selected for Amol County based on their relevance to the study area and the availability of the data. The conditioning factors were categorized into topographic, hydrological, meteorological, and anthropological factors, along with the normalized difference vegetation index (NDVI) as the vegetation factor, as seen in Table 1. The ranges for the input conditioning factors were classified based on their relevance and importance to forest fires, the literature review, and expert experience (see Table 2). The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) (NASA, California Institute of Technology, USA) on board the Terra spacecraft of the National Aeronautics and Space Administration (NASA) delivers the Global Digital Elevation Model (GDEM) with a resolution of 30 m. This was used to derive topographic conditioning factors such as distance to stream, landform, topographic wetness index (TWI), plan curvature, slope, slope aspect, and altitude.
The data for annual temperature, annual rainfall, wind effect, distance to road, recreation area, potential solar radiation, and distance to village were provided by the state wildlife organization of Amol County (SWOAC) and the state meteorological organization of Amol County (SMOAC). Landsat-8 imagery from the United States Geological Survey (USGS) archive (http://earthexplorer.usgs.gov), with a resolution of 30 m, was used to generate the NDVI conditioning factor for the highly vegetative period as well as the land use factor.
All the input conditioning factors were resampled using ArcGIS software with a resolution of 30 m and were classified based on expert opinion, literature review, and relevance to the study area.
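As an illustration of how the NDVI factor can be derived from the Landsat-8 red and near-infrared bands, the sketch below uses small synthetic reflectance arrays in place of real imagery (NumPy assumed; the array values are hypothetical):

```python
import numpy as np

# Synthetic stand-ins for Landsat-8 surface reflectance:
# band 5 (near infrared) and band 4 (red).
nir = np.array([[0.45, 0.50], [0.30, 0.05]])
red = np.array([[0.10, 0.08], [0.20, 0.04]])

# NDVI = (NIR - Red) / (NIR + Red); guard against division by zero.
ndvi = np.where(nir + red > 0, (nir - red) / (nir + red), 0.0)

print(ndvi.round(3))  # values near 1 indicate dense vegetation
```

In practice the same expression is applied pixel-wise to the full 30 m raster before resampling and classification.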

Methods
Machine learning approaches have contributed significantly in recent years to the evolution of prediction systems, providing enhanced performance and efficient results. The persistent advancement of ML approaches over the last few years has established their suitability for various natural hazard predictions, often surpassing traditional approaches. ML is independent of expert knowledge and hinges exclusively on the inventory data. For this study, we used eleven ML approaches for wildfire susceptibility assessment, namely ANN, DR, DM neural, LARS, MLP, RF, RBF, LR, SOM, SVM, and DT. An artificial neural network imitates the behavior of the human brain through a set of interconnected nodes [34]. The ANN imitates the human brain in two main respects: firstly, it obtains knowledge through a learning procedure; and secondly, the knowledge gained is stored through synaptic weights [35]. The ANN approach is trained to differentiate and generalize the association between input and output. In general, ANNs model multiple-input non-linear processes through small interrelated and interconnected neural networks with weighted interconnections. Spatial prediction of wildfire susceptibility is a complex, non-linear problem for which an optimal solution can be found through ANN by determining the patterns between the conditioning factors and responses. There are various neural network architectures for different purposes; we used the widely adopted MLP architecture and the backpropagation algorithm (BPA) to train the model. Neurons in the same hidden layer are not connected in the MLP architecture, but neurons in one layer are connected to the neurons in the next layer. The number of hidden layers in the neural network depends on the complexity of the problem. The size and number of the hidden layers in the ANN model are generally established based on the application [36].
To minimize the error arising from the random selection of initial weightings, a repeated backward process is used to update them. For this study, the network structure consisted of an input layer of 16 neurons (based on the number of input conditioning factors), one hidden layer (20 neurons), and one output layer. The learning rate was fixed at 0.09, and the number of epochs was set to 500 for our model.
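A configuration of this shape can be sketched with scikit-learn's `MLPClassifier`; this is only an illustration of the stated settings (one 20-neuron hidden layer, learning rate 0.09, 500 epochs), not the study's implementation, and the synthetic data merely stands in for the 16 conditioning factors:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
# Synthetic stand-in: 300 samples x 16 conditioning factors,
# with a hypothetical binary fire / no-fire label.
X = rng.random((300, 16))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# One hidden layer of 20 neurons, learning rate 0.09, up to 500 epochs,
# trained with (stochastic) backpropagation.
model = MLPClassifier(hidden_layer_sizes=(20,), learning_rate_init=0.09,
                      max_iter=500, solver="sgd", random_state=0)
model.fit(X, y)
print(model.score(X, y))
```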

Dmine Regression (DR)
The Dmine regression (DR) technique executes a regression analysis on data sets that have a binary or interval-level target variable. The DR technique computes a forward stepwise least squares regression. At each step, the independent variable that contributes most to the model's R-squared value is selected. DR is able to compute all two-way interactions of classification variables and uses the AOV16 variables to recognize non-linear relations between the interval variables and the target. Another advantage is that DR can use group variables to reduce the number of levels of the classification variables.
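The forward stepwise idea can be approximated with scikit-learn's `SequentialFeatureSelector`, which greedily adds the variable that most improves a cross-validated least squares fit; this is only an analogy to DR, not the SAS implementation, and the data is synthetic:

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((200, 8))
# Interval-level target driven mainly by features 0 and 3.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(200)

# Greedy forward selection: each step adds the variable that most
# improves the fit of the least squares model.
selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=2,
                                     direction="forward")
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected variables
```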

Dmneural
The Dmneural technique is an ML modelling tool for fitting a non-linear model. The non-linear model uses transformed principal components as inputs to forecast a binary or interval target variable. The Dmneural technique is intended to offer flexible target prediction by means of an algorithm similar to a neural network. By using the principal components approach, the problem of selecting useful inputs is avoided. The complexity of the model is controlled by selecting the number of stages in the multi-stage prediction formula.
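The principal-components-then-neural-model idea can be mimicked with a scikit-learn pipeline; this is only an analogy to Dmneural, not the original implementation, and uses synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.random((300, 16))
y = (X[:, 0] > 0.5).astype(int)  # hypothetical binary target

# Transformed principal components feed a small neural model,
# sidestepping explicit input selection.
model = make_pipeline(PCA(n_components=5),
                      MLPClassifier(hidden_layer_sizes=(8,), max_iter=500,
                                    random_state=0))
model.fit(X, y)
print(model.predict(X[:5]))
```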

Least Angle Regression (LARS)
LARS is a classic model selection approach related to the well-known forward selection and forward stepwise regression techniques [37]. LARS is a formalized version of the stage-wise technique that uses a simple mathematical formula to fast-track the calculations. Forward stepwise regression develops a model in sequence, adding a single variable at a time. At each step, it identifies the best variable to include in the active set and then updates the least squares fit to comprise all the active variables. LARS uses a similar approach but only enters predictors as it needs them. As a first step, it identifies the variable best correlated with the response. Rather than fitting this variable entirely, LARS moves the coefficient of this variable continuously toward its least squares value. As soon as another variable catches up in correlation with the evolving residual, it is added to the active set, and the coefficients then move together. LARS has a simple structure and lends itself to inferential analysis [38].
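scikit-learn provides a `Lars` estimator implementing this procedure; a minimal sketch on synthetic data where one predictor dominates:

```python
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 6))
# The response is driven almost entirely by predictor 2.
y = 2.0 * X[:, 2] + 0.01 * rng.standard_normal(100)

# LARS enters predictors one at a time, starting with the one
# most correlated with the response.
model = Lars(n_nonzero_coefs=2)
model.fit(X, y)
print(model.coef_)  # predictor 2 receives a coefficient near 2.0
```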

Multi-Layer Perceptron (MLP)
The MLP is a neural network that has several hidden layers, with neurons connected between neighboring layers. MLP is normally used as a feed-forward supervised neural network and is widely used owing to its clean architecture, fast operation, ease of implementation, and competency in resolving intricate classification problems [39,40]. The MLP system consists of three main layers: an input layer, a hidden layer, and an output layer. These three main layers are used for input of data, transmission of data, and data output, respectively. The function of the hidden layer is to transfer the results to the output layer [41]. The output of each neuron can be described mathematically as shown in Equation (1):

y_j = f(∑_i w_ij x_i), (1)

where y_j denotes the output of node j, f is an activation function that can be a threshold, sigmoid, or hyperbolic tangent, w_ij represents the weight between nodes i and j, and x_i signifies the output from node i.
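A single neuron output of this form can be evaluated directly; the weights and inputs below are hypothetical, with a hyperbolic tangent as the activation:

```python
import numpy as np

def neuron_output(x, w, f=np.tanh):
    """y_j = f(sum_i w_ij * x_i) for a single node j."""
    return f(np.dot(w, x))

x = np.array([0.2, -0.5, 0.8])   # outputs x_i from the previous layer
w = np.array([0.4, 0.1, -0.3])   # weights w_ij into node j
print(neuron_output(x, w))       # tanh of the weighted sum
```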

Random Forest (RF)
The RF algorithm was first created using the random subspace method [42]. Random forest (RF) is a machine learning technique in which the input dataset is classified based on an ensemble of multiple decision trees. RF has received increasing attention in recent years due to its ability to produce excellent classification results with a rapid processing speed [43]. The feature set is selected randomly at each stage at which the output is forecasted, and each of the outputs is given a weighting based on the votes obtained. Based on the outputs of the decision tree assessments, the majority vote converges on a single decision for the final classification [44]. To overcome the uncertainty associated with a single decision tree, an ensemble of trees is used, which results in higher prediction accuracy [45]. Obtaining a high degree of variance between the individual decision trees is crucial for this classification method. RF is regarded as one of the most effective non-parametric ensemble learning approaches for wildfire susceptibility mapping and modeling. The main training options in the RF model are the maximum number of trees, the number of variables considered in the split search, and the variant of the sampling process [46]. The first and second training options are considered in the split searching of the RF. For the purposes of this analysis, the maximum number of trees was set to 200. The sampling option was specified as a fraction defining the percentage of observations used for each tree. For the final forest model, an out-of-bag (OOB) sample statistic was used. This OOB sample statistic indicates how the model will perform on new inputs.
The inputs used in the training samples are called bagged observations, and the input data is referred to as bagged data for the decision tree in the RF approach.
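The configuration described above (200 trees, OOB evaluation) can be sketched with scikit-learn; the synthetic data stands in for the conditioning factors and this is not the study's implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.random((500, 16))  # stand-in for the 16 conditioning factors
y = (X[:, 0] + X[:, 5] > 1.0).astype(int)

# 200 trees as in the text; oob_score=True reports the out-of-bag
# estimate, which approximates performance on unseen inputs.
model = RandomForestClassifier(n_estimators=200, oob_score=True,
                               random_state=0)
model.fit(X, y)
print(model.oob_score_)
```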

Radial Basis Function (RBF)
The RBF is a neural network that contains an input layer, a hidden layer, and an output layer in a feed-forward structure [39,47]. The hidden layer assembles the data from the input layer and passes it to the Gaussian transfer function, which transforms and regulates the data nonlinearly. The Gaussian function responses are then linearly combined to generate the data of the output layer. The RBF is broadly applied in numerous applications such as data classification, time series prediction, system control, and dynamic system problems, owing to its capability of approximating the system behavior directly from the input and output data [41]. RBF networks attempt to minimize the training error, which can be described as shown in Equation (2):

E(t) = (1/2) ∑_j e_j(t)², (2)

where e_j(t) is the error of each output unit.
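A minimal forward pass of such a network, with hypothetical centers, widths, and output weights, can be sketched as:

```python
import numpy as np

def rbf_forward(x, centers, widths, weights):
    """Gaussian RBF hidden layer followed by a linear output layer."""
    # Gaussian activations exp(-||x - c||^2 / (2 sigma^2)) per hidden unit.
    d2 = np.sum((centers - x) ** 2, axis=1)
    h = np.exp(-d2 / (2.0 * widths ** 2))
    return h @ weights  # linear combination forms the output

# Hypothetical parameters for a two-unit hidden layer.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.array([0.5, 0.5])
weights = np.array([1.0, -1.0])
print(rbf_forward(np.array([0.0, 0.0]), centers, widths, weights))
```

Training would adjust the centers, widths, and output weights to minimize the error E(t) over the training set.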

Logistic Regression (LR)
LR belongs to the class of statistical models known as generalized linear models. Although LR is not considered to be an ML model, it was used for wildfire susceptibility mapping in this study due to its popularity in this field. LR is able to investigate a range of problems in which the results are impacted by one or more factors. The factors influencing the results are referred to as independent variables, which can be discrete or continuous, or a combination of discrete and continuous [48]. Logistic regression allows the forecasting of discrete outcomes, such as group membership, from a set of variables that may be continuous, discrete, or a mixture of these types. LR is intended to build regressions that fit the fundamental associations between several explanatory variables and the dependent variable [49]. The dependent variable is typically binary, with values of either 0 or 1. Logistic regression also offers knowledge of the relationships and strengths among the variables. The main advantage of LR is that, with the addition of a fitting link function to the typical linear regression model, the variables may be either continuous or discrete, or a combination of both types, and they need not be normally distributed [50].
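A minimal sketch with scikit-learn's `LogisticRegression` on synthetic binary data (the labels below are hypothetical stand-ins for fire / no-fire):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 4))
y = (X[:, 0] > 0).astype(int)  # binary dependent variable (0 or 1)

model = LogisticRegression()
model.fit(X, y)
# predict_proba gives the fitted probability via the logistic link,
# p = 1 / (1 + exp(-(b0 + b.x))).
print(model.predict_proba(X[:3]))
```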

Self-Organizing Maps (SOM)
Self-organizing map (SOM) is an unsupervised, neurally inspired learning technique used in various data analysis tasks. When the classifications of a test set are not known, unsupervised neural networks are used to carry out classifications. Kohonen originally defined this unsupervised neural network as an approach for repetitively partitioning the classification space [51], an approach designated as a self-organizing map (SOM). SOMs comprise a non-parametric analysis approach built on neural networks that derives general patterns from user-defined gridded data over a region [52]. Multivariate and multi-dimensional inputs can be processed by SOMs on a user-defined grid by forming a spatially arranged set of general patterns from the input data. A user-defined grid may be, for example, 4 × 3, i.e., twelve nodes [53].
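The core SOM update (find the best matching unit, then pull it and its grid neighbours toward the input) can be sketched in plain NumPy on a 4 × 3 grid; the decay schedules below are illustrative choices, not the study's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 4, 3, 2                # a 4 x 3 grid of twelve nodes
weights = rng.random((grid_h, grid_w, dim))  # one prototype vector per node
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)
data = rng.random((200, dim))                # synthetic multivariate inputs

for t, x in enumerate(data):
    lr = 0.5 * (1.0 - t / len(data))         # decaying learning rate
    sigma = 1.5 * (1.0 - t / len(data)) + 0.1
    # Best matching unit: the node whose prototype is closest to the input.
    d = np.sum((weights - x) ** 2, axis=-1)
    bmu = np.unravel_index(np.argmin(d), d.shape)
    # Pull the BMU and its grid neighbours toward the input.
    g2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-g2 / (2.0 * sigma ** 2))[..., None]
    weights += lr * h * (x - weights)

print(weights.shape)  # the trained prototypes form the spatially arranged map
```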

Support Vector Machines (SVM)
A support vector machine is a data mining ML approach widely used with a set of linear indicator functions to perform function estimation [54]. SVM is also known as the maximum-margin method, which provides better performance and superior results even with an inadequate number of data points. SVM is grounded in statistical learning theory and maps the dataset into a high-dimensional feature space through non-linear transformations to generate the best hyperplane [55]. The best hyperplane is achieved when the separation between the margins of the defined classes of the problem is maximal. SVMs have two layers that can implement diverse functions such as linear, radial, polynomial, or sigmoid; hence, they are unidirectional. The performance of the SVM model is greatly influenced by the choice of kernel function, such as linear, polynomial, sigmoid, or radial basis function (RBF) [56].
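A minimal sketch with scikit-learn's `SVC` and an RBF kernel, on a synthetic problem that is not linearly separable in the original space:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
# Class 1 lies inside the unit circle: not linearly separable.
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)

# The RBF kernel implicitly maps the data into a high-dimensional
# feature space where a maximum-margin hyperplane can separate them.
model = SVC(kernel="rbf", C=1.0, gamma="scale")
model.fit(X, y)
print(model.score(X, y))
```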

Decision Tree (DT)
The DT can be described as a supervised, non-parametric learning method for prediction and classification [57]. DT has the distinct advantage of easy interpretation and comprehension for the purposes of comparison and validation of options by decision-makers. DTs are easy to build and interpret, and their predictions are efficient. The core idea of the decision tree is to divide the data recursively into subsets so that each subset consists of more or less homogeneous states of the target variable (predictable attribute). All input attributes are assessed for their impact on the predictable attribute at each split in the tree, and when this recursive process is finished, a decision tree has been constructed [58]. The decision tree is called a classification tree if the predictable target attribute consists of discrete data; if it is a continuous variable, it is called a regression tree. The whole process of decision tree building is known as decision tree induction. Many approaches have been developed for performing decision tree induction; however, the general approach is similar for all types of decision tree. Each approach employs a different learning algorithm to determine the model that best fits the relationship between the attribute set and the class label of the input data. A model generated by a learning algorithm should both fit the input data well and correctly predict the class labels of records it has never seen before [59].
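A minimal sketch with scikit-learn's `DecisionTreeClassifier`; the feature names below are hypothetical labels for illustration, not the study's actual factors:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)
X = rng.random((300, 3))
y = (X[:, 1] > 0.6).astype(int)  # discrete target -> classification tree

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)
# The learned splits are directly interpretable:
print(export_text(tree, feature_names=["slope", "ndvi", "rainfall"]))
```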

Results
Each of the eleven selected ML approaches was applied using the sixteen wildfire conditioning factors in order to derive the susceptibility mapping for each ML approach for Amol County in Iran. The resulting susceptibility map for each ML approach is shown below in Figure 2. A susceptibility map indicates the probability of wildfire occurrence in a given region based on the relevant conditioning factors. There is no specific classification method prescribed for classifying the resulting susceptibility maps; however, the widely used natural break classification method was used to classify the susceptibility maps derived from the eleven ML approaches. The natural break classification method is useful for interpreting values that lie close to each class boundary. Each of the eleven susceptibility maps was classified into one of five classes (very high, high, moderate, low, and very low), which served as a uniform classification system for the purpose of comparing the results. The natural break classification method is based on the data itself, whereby the groupings occur naturally and the classification intervals are chosen where the data suggests optimal groupings or similar values [60]. This also helps in the accuracy assessments for the machine learning approaches, which were used to ascertain the best machine learning approach for wildfire susceptibility mapping. Based on the differences between the resulting susceptibility maps, it can be seen that each model was influenced by one or more conditioning factors that distinguished it from the others. However, the role of the slope aspect factor was much more significant.


Accuracy Assessment
The accuracy assessment is a crucial step in understanding the accuracy of the wildfire susceptibility maps. For the validation, we used three-fold CV along with the receiver operating characteristics (ROC) curve approach in order to determine the accuracy of each model.

Receiver Operating Characteristics (ROC)
The ROC method is frequently used to illustrate the performance of a model [61]. The ROC curves plot the true positive rate against the false positive rate on the y-axis and x-axis, respectively. To assess the accuracy of each approach, the area under the curve (AUC) was used, for which values closer to 1 indicate higher accuracy and values closer to zero indicate lower accuracy of the susceptibility map. The ROC curves were calculated for all of the wildfire susceptibility maps derived from the eleven ML approaches (see Figure 3). The corresponding AUC values for each wildfire susceptibility map based on each ML and each fold are presented in Table 3.
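The AUC computation itself is straightforward; a toy example with hypothetical labels and model scores using scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical fire labels for a handful of test pixels and the
# corresponding susceptibility scores produced by a model.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

auc = roc_auc_score(y_true, y_score)
print(auc)  # fraction of (positive, negative) pairs ranked correctly
```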

Cross-Validation (CV)
The three-fold CV was applied in order to prepare the inventory dataset of wildfire pixels for the training and testing processes of each applied ML. For the three-fold CV method, the wildfire inventory data pixels are arbitrarily distributed into three mutually exclusive folds: D1, D2, and D3. Each ML is run using two folds for training the model, and the resulting wildfire susceptibility map is tested with the third fold, which was not used for training. For example, when an ML is trained on the two folds D2 and D3, the resulting maps are tested with the D1 fold. As our inventory dataset of wildfire pixels was randomly divided into three folds, each time 66% (11,614 pixels) were used for training the applied ML approaches, while 33% (5806 pixels) of the wildfire inventory data was reserved for the accuracy assessment. The three folds allowed the full volume of wildfire inventory pixels to be exploited across the eleven ML approaches. Since we used several ML approaches, the CV method was helpful for dealing with the negative effects of randomness on the resulting maps based on each ML approach. The application of a k-fold CV when using MLs for hazard susceptibility mapping has been described in detail in [62]. The average accuracy assessment of all applied folds is considered as the CV value for each wildfire susceptibility map based on each ML, as presented in Table 3. Thus, the whole process was implemented three times, using different folds of the inventory dataset of wildfire pixels each time. For the accuracy assessment using the three-fold CV along with ROC-AUC, the values ranged from 0 to 1 (0-100%) when determining the performance of each ML approach.
If the ROC value ranged between 0.5 and 0.6 (<60%), the performance of the model was considered poor; values between 0.6 and 0.7 (<70%) were considered moderate; values between 0.7 and 0.8 (<80%) good; values between 0.8 and 0.9 (<90%) very good; and values of more than 0.9 and up to 1 (>90%) were considered to indicate exceptional performance of the model.
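The three-fold scheme with ROC-AUC scoring can be sketched with scikit-learn's `cross_val_score`; the data is synthetic and the model settings mirror only the tree count mentioned earlier:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.random((300, 16))  # stand-in for the 16 conditioning factors
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Three folds: each run trains on two folds (~66%) and tests on the
# held-out third (~33%); the mean AUC across folds is the CV value.
scores = cross_val_score(RandomForestClassifier(n_estimators=200,
                                                random_state=0),
                         X, y, cv=3, scoring="roc_auc")
print(scores, scores.mean())
```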

Discussion
Susceptibility mapping is a crucial component of addressing wildfire risk, which poses a high risk to lives and the ecosystem. It is a fundamental module in emergency management and in the planning of mitigation measures aimed at decreasing adverse impacts [63]. The ML approaches fare better than statistical approaches, as shown in previous studies [64]. Moreover, compared to the other wildfire susceptibility studies in the same area and neighboring forestry areas [4,6,9,65], we used and compared many more of the available and commonly used ML approaches in order to show their capabilities with respect to wildfire susceptibility mapping.
Of all the ML approaches used in this study, RF showed the highest accuracy in every fold of the inventory data set. The highest accuracy for this ML approach was obtained within the second fold, with an AUC value of more than 94%, while the third fold had a lower AUC value of 85%. Two ML approaches, DR and LARS, presented almost the same accuracies in each fold and consequently in the resulting CV values. Among all of the applied MLs, RBF had the lowest CV accuracy.

Conclusions
Wildfires have become a frequent hazard across the globe, ravaging forests, burning acres of habitat and causing a loss of lives. The prediction of wildfires is a significant component of emergency management, and mapping susceptible wildfire areas will help in mitigating the impact of fires. Susceptibility maps are widely and increasingly frequently used in order to prioritize locations with respect to managing hazards. However, each susceptibility map might differ based on the input parameters and the methodology used for producing them, which may have varying accuracies. For this study, we used eleven machine learning approaches that were developed and trained based on the historic wildfire events from 2012 to 2017, along with sixteen relevant conditioning factors for the study area. The performance of each ML approach was assessed with respect to accuracy using the ROC curve. Most of the ML approaches showed an accuracy above 70%, except for LR, Dmneural, and RBF, which had the lowest accuracies, indicating that those approaches were not suitable for wildfire susceptibility mapping in our study area. However, this might be different for other study areas, in which these ML approaches might have higher accuracies depending on the available conditioning factors and training data sets.
We possessed sufficient wildfire inventory data for training the ML approaches and also for testing the ML models. However, the manner in which the quantity of training data influences the performance of each ML approach is still not clear, and this might be a limitation of this study that could be studied in more detail in future work. However, we used three-fold CV to address this limitation, as well as the randomness among the resulting wildfire susceptibility maps. Thus, the application of a k-fold CV is highly recommended for similar studies. Furthermore, for future work, we would like to consider the vulnerable areas in the study area, as susceptibility maps can demonstrate the locations of elements at risk and can be incorporated in order to derive wildfire risk maps that incorporate a vulnerability analysis. Obtaining information on the communities within the study area with limited capacities and capabilities to prevent wildfires will be crucial for deriving risk maps and will help in mitigation and planning. In addition, seasonal aspects such as seasonal climate data will be considered for the wildfire susceptibility mappings of our future work.