1. Introduction
Forest fires are a worldwide phenomenon. They affect the structure, composition, and diversity of forests [
1]. The intensity of forest fires has increased profoundly during the last two decades due to the uncontrolled expansion of built-up areas and hot weather conditions [
2,
3]. The main natural drivers of forest fires are very high temperatures and dry fuel, while the main artificial driver is anthropogenic activity. However, meteorological conditions are the major player; they account for more than 90% of instances of forest fire propagation, behavior, and progress. Dry fuel, in turn, strongly influences the flammability and combustion properties of forest fires and, consequently, the spread and behavior of fire. In addition, wind direction and speed largely affect the propagation of forest fires [
4].
The forest fire regime has four major features and two minor features [
5]. The major features are the frequency, size, intensity, and seasonality of fire, whereas the minor features are the type and severity of fire. These six features are mainly influenced by weather conditions, vegetation type, and human activities. The frequency of fire depends on the season; if low precipitation and high temperature prevail during summer, then a high frequency of fires is expected in the equatorial regions. The vegetation type, in turn, affects the size of the forest fire. Specifically, a forest with highly dense deciduous vegetation is much more prone to large and aggressive fires [
5].
Fire intensity greatly depends on the type of fuel(s) involved and the meteorological and topographical conditions [
2]. It can be estimated based on the amount of energy discharged during a single fire event. Seasonality of fires, however, largely, or almost solely, depends on the meteorological conditions [
6]. A slight change in climatic conditions profoundly affects the fire seasonality trajectory. The various fire types (i.e., surface fires, crown fires, and ground fires) are largely controlled by the fuel state, whether it is dry or moist [
2]. Lastly, fire severity is a measure of the immediate effects of fire on vegetation, litter, and the soils in the impacted areas [
7].
A review of the literature reveals a strong relationship between forest fire events and weather conditions. Several studies have shown a strong correlation between fire frequency and changing climatic conditions globally [
8,
9]. For instance, Ahmad and Goparaju [
10] analyzed the association of forest fires with climatic parameters at different temporal scales and found a significant relationship between the incidence of forest fires and meteorological parameters. Most studies [
11,
12] of forest fires in different parts of the world have focused on fire frequency, coverage and on the linkages between forest fire frequency and fire risk based on susceptibility assessments and modeling [
13,
14].
In recent decades, however, researchers have looked into forest fire susceptibility mapping (FFSM) and other hazard types mapping using remote sensing [
15,
16,
17,
18], and the geographic information system (GIS) in combination with (i) bivariate statistical analysis method [
19,
20], (ii) data mining methods [
21,
22], and (iii) multi-criteria decision-making (MCDM) methods [
23,
24]. Of the various forest fire susceptibility (FFS) modeling approaches attempted, the adaptive neuro-fuzzy inference system (ANFIS) and its combinations with meta-heuristic algorithms have been among the most widely used models for FFSM [
25,
26,
27]. On the other hand, support vector regression (SVR), which is a more recent modeling approach than the ANFIS, has been one of the most successful models in hazard mapping as well as in other applications [
28,
29,
30,
31,
32,
33,
34]. Since its advent, the support vector machine (SVM) has competed with the artificial neural network (ANN). Some studies indicate the superiority of ANNs over SVMs, while others advocate the superiority of SVMs over ANNs. So far, a plethora of hybrid models have been developed by combining the ANFIS with evolutionary algorithms, but only a handful of studies have compared the performance of these models with that of hybrids of SVR and evolutionary algorithms, especially in FFSM. Therefore, this study compares the performance of hybrid models based on two powerful algorithms, SVR and the ANFIS. The study aimed at identifying the locations that are susceptible to forest fires in the governorate of Ajloun in Jordan by using SVR and the ANFIS in integration with the genetic algorithm (GA) and the shuffled frog-leaping algorithm (SFLA). To the best of our knowledge, the combination of these models has never before been employed in FFSM. The performance of each developed model was evaluated using three measures: the area under the receiver operating characteristic (AUROC) curve, the root-mean-squared error (RMSE), and the mean absolute error (MAE).
2. Study Area
The governorate of Ajloun lies in the western highlands, in the northwest of Jordan (
Figure 1). The city of Ajloun (32°19′57″ N, 35°45′6″ E) is the center of the governorate. It is located approximately 75 km north of Amman. The total area of the governorate is 420 km². In 2019, its population was estimated at 194,700 inhabitants. As such, it is one of the smallest governorates in Jordan (the second smallest governorate in area after Jerash). Nevertheless, the percentage of the land area of the governorate of Ajloun that is covered by vegetation is the highest among all governorates. The vegetation cover in this governorate is highly variable. It includes forests, cultivated trees, seasonal crops, pastures, and grass.
The unique geographic location of Ajloun, together with its remarkable weather conditions, is manifested in an evergreen forest covering 20,000 acres of its area and in high biodiversity. Additionally, the presence of many water springs in this governorate makes it a preferred destination for both domestic and regional tourism. These features of Ajloun support various types of tourism activities, such as ecotourism, camping, hiking, heritage tourism, adventure tourism, and recreation tourism.
Ajloun has a Mediterranean climate. The average annual precipitation depth is 750 mm while the average temperatures are 5 °C in winter (with a few degrees Celsius below zero recorded during some of the winter nights) and 33 °C in summer (temperatures reach, and even exceed, 40 °C on some summer days).
The topography of Ajloun is also variable; the altitudes range from 590 m to 1240 m above sea level. This variation in altitude has led to diversity in agricultural production in this governorate. These distinctive climatic conditions, the high contrast between the summer and winter temperatures (about 50 °C), and the diverse vegetative cover, especially of weeds and field crops in summer, increase the probability of forest fires in this governorate, particularly in the summer and autumn when the temperature reaches its maximum (around 40 °C) and the vegetation cover (pastures and field crops) is mostly dry.
2.1. Forest Fire Inventory Map and Conditioning Factors
Information on previous wildfire outbreaks and data on forest fire predictor factors were collected to evaluate the vulnerability of the governorate of Ajloun to wildfires. The data pertain to the major factors that have been reported in the literature as determinants of forest fire outbreaks. However, the effects of these determinants on the incidence of forest fires vary from one factor to another. Moreover, the volume of the compiled dataset was governed by the availability of relevant data on these determinants. Remote sensing (RS) data are used in a wide range of applications, especially in hazard mapping [
35,
36]. Data on these variables were pre-processed and analyzed. The General Directorate of Civil Defense, Jordan, reported 109 fire events in this governorate, of which 101 locations were selected for study. In addition, an equal number of non-fire locations (101) beyond the fire zones were randomly selected and incorporated in the analysis. This resulted in a wide distribution of study sites within the governorate, separated from one another by a distance of 50 m or more. The fire and non-fire locations were selected according to a scheme based on random generation and distribution of sampling points. Afterwards, the sample locations were divided into a training subset, consisting of 70% of the locations, and a testing subset, consisting of the remaining 30% of the locations. This splitting was intended to provide data for model training and validation [
29,
37] and to ensure reasonable generalizability of the developed models. Values of the success rate were calculated using wildfire susceptibility mapping, the 70% training data subset (83 fire sites), and the 30% validation data subset (36 fire locations) [
38].
2.1.1. Elevation
Elevation is considered one of the main factors affecting fire size and severity [
39] and is one of the most important spatial layers that is used in many fields [
40,
41,
42]. In this respect, the higher areas are often more dangerous than the lower areas when fires break out due to rapid drying of the weeds and litter in these areas and the difficulty which the firefighting teams face in accessing these high areas. As
Figure 2a shows, elevation varied in the study area from −227 m to 1225 m above sea level. The elevation map (
Figure 2a) was derived from the ALOS DEM with a resolution of 12.5 m that was downloaded from the Alaska Satellite Facility (ASF) Distributed Active Archive Center (DAAC). The danger in high areas is greatest when the access roads are unpaved or unsuitable for movement of the fire-fighting teams and their large equipment. The situation becomes even worse if the areas facing the fire site have high slopes, which increases the difficulty in reacting to the fires. However, it should be noted that rapid drying of the weeds and litter in these areas takes place during a short period of time (two to three weeks after the end of spring), thus defining the period of time with the highest natural fire risk.
2.1.2. Slope
Land slope is important in the outbreak and spread of fires and in controlling them because areas with high slopes require complex fire control procedures [
39]. The slope controls the advancement of fire, that is, whether it travels uphill or downhill. Backing fires are more prevalent on locations with low slopes; they usually spread more slowly and have short flame durations. Furthermore, distinct types of flora are associated with elevation and slope. Specifically, the steep slopes have thick woody plants with tiny, hard leaves whilst the lower slopes have plenty of medic vegetation. A slope map of the study area that was extracted from the aforementioned DEM is shown in
Figure 2b.
2.1.3. Aspect
Aspect influences the quantity of solar radiation that the land receives and the moisture content of the soil. Moreover, it has an indirect impact on the outbreak of fires since it determines the type and density of plants found in a certain location. An aspect map was created for the study area using the aforementioned DEM. The obtained aspect values were then categorized into nine categories: flat, north, northeast, east, southeast, south, southwest, west, and northwest (
Figure 2c).
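The nine-class aspect categorization can be illustrated with a minimal sketch. It assumes aspect is expressed in degrees clockwise from north and that flat cells carry a negative value, which is a common GIS convention; the function name and the 45° sector boundaries are illustrative assumptions, not details taken from the study.

```python
def aspect_class(deg):
    """Map an aspect angle (degrees clockwise from north) to one of nine classes.

    Negative values are treated as flat (assumed convention).
    """
    if deg < 0:
        return "flat"
    names = ["north", "northeast", "east", "southeast",
             "south", "southwest", "west", "northwest"]
    # Each sector is 45 degrees wide and centred on its compass direction,
    # so north covers 337.5-360 and 0-22.5 degrees.
    index = int(((deg % 360) + 22.5) // 45) % 8
    return names[index]


# Hypothetical aspect values, not data from the study.
classes = [aspect_class(a) for a in (-1, 0, 100, 350)]
```

The modulo step makes the north sector wrap around 0°, so 350° and 10° fall in the same class.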
2.1.4. Land Use
Land use is an important factor in determining the locations in which fires can occur, as places where weeds and field crops are abundant are more vulnerable to fires than other places, particularly in the summer when daytime temperatures are at their maxima. Meanwhile, the likelihood of fires in residential areas and forests is lower. A land cover and use map was prepared for Ajloun by supervised classification of Landsat 8 Operational Land Imager (OLI) images for the year 2019 using the maximum likelihood algorithm. The land covers and uses prevalent in the study area were categorized into nine classes: water bodies, desert, vegetation, urban areas, rural areas, soil, rocks, valleys, and sand (
Figure 2d).
2.1.5. The Normalized Difference Vegetation Index (NDVI)
Normalized Difference Vegetation Index (NDVI) maps are important for analyzing the vegetation cover in any area and for identifying the places most vulnerable to fires, especially where seasonal field crops, weeds, and pastures are present. While the differences in the NDVI values are not that high in areas of evergreen trees such as oak forests and olive groves, the NDVI maps differ greatly between summer and spring [
43]. In the current study, the NDVI map was created from Landsat 8 OLI images that were downloaded from the United States Geological Survey (USGS) website (
http://glovis.usgs.gov/, accessed on 15 November 2021). The NDVI values in the study area, which ranged from –0.35 to 0.20, were classified into six classes as illustrated in
Figure 2e.
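For reference, NDVI is conventionally computed per pixel as (NIR − Red)/(NIR + Red); for Landsat 8 OLI, the near-infrared and red bands are bands 5 and 4, respectively. The sketch below shows this calculation on a few hypothetical reflectance pairs; the zero-denominator handling is an assumption of this example.

```python
def ndvi(nir, red):
    """Return the NDVI of one pixel from its NIR and red reflectances."""
    total = nir + red
    if total == 0:
        return 0.0  # assumed fallback for pixels with no signal in either band
    return (nir - red) / total


# Hypothetical (NIR, red) reflectance pairs; not data from the study.
pixels = [(0.45, 0.10), (0.20, 0.18), (0.05, 0.08)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]
```

Values near zero or below typically indicate bare soil, rock, or water, whereas higher values indicate denser green vegetation.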
2.1.6. Rainfall Rate
Seasonal field crops and pastures, which are the main factors in wildfire proliferation, are highly abundant in areas with high rainfall if these areas are not planted with fruit trees or evergreen plants [
44]. Therefore, maps of rainfall were prepared for the study area to be overlaid over the land use maps and processed with the other determinant factors of forest fire outbreaks. The rainfall rate in the study area was digitized from an annual precipitation isohyetal map prepared by the Jordanian Meteorological Department (JMD) and converted into raster format using the Spatial Analyst Tools in ArcGIS 10.5.
The rainfall rate in the study area, which ranged from 300 to 500 mm/y, was categorized into six classes (
Figure 2f).
Rainfall follows a general pattern of decrease from 550 mm in the north and northeast to less than 250 mm in the west near the Dead Sea, which illustrates the rain shadow effect of the western highlands.
2.1.7. Temperature
Temperature is the most influential factor in the incidence of fires in the study area. All fires took place in the summer when the grass was dry and the field crops were ready for harvest. No fires took place in winter or under low-temperature conditions.
In the present study, data on climatic variables obtained from eight meteorological stations with quarter-century records were interpolated with the Inverse Distance Weighted (IDW) interpolation method in the ArcGIS 10.5 environment using the Spatial Analyst Tools [
45]. Then, raster layers with certain indices were created (
Figure 2g).
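The idea behind IDW interpolation can be sketched as follows: the value at an unsampled point is a weighted mean of the station values, with weights proportional to the inverse of the distance raised to a power. The power of 2, the planar coordinates, and the station values below are illustrative assumptions and do not reproduce the ArcGIS configuration used in the study.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations is a list of (x, y, value) tuples in planar coordinates.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # the query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical stations with mean temperatures in degrees Celsius.
demo_stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw(demo_stations, 1.0, 0.0)
```

A point equidistant from both stations receives equal weights, so the estimate is the simple mean of the two station values.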
Temperatures vary in the study area from one location to another, even during the same season. In light of this, the temperatures recorded during the study period were classified into five classes, with values ranging from 15.8 °C to >20 °C, and a temperature map (
Figure 2g) was produced for the study area.
2.1.8. Wind Speed
Wind speed plays an important role in the spread of fires after they start, and in some cases it may make it impossible to control their spread [
46]. The wind speed varied in the study area from less than 2 m/s in some places to 6 m/s in others. In light of this, wind speed was classified into four classes and a wind speed map was developed for the study area (
Figure 2h).
2.1.9. Radiation
Radiation has a positive correlation with temperature [
47] and it is associated with the incidence of fires, particularly in areas where the land is dominated by dry grass and field crop cover. Usually, radiation varies from one location to another according to many factors such as soil texture and humidity percentage. In this study, the radiation values fell in the range 5.16–5.76 W/m². They were categorized into four classes (
Figure 2i). A radiation map was then created for the study area.
2.1.10. Soil Texture
Soil texture determines the types of plants that can grow in the soil [
44]. This study used a soil texture map of the study area that was originally created by the Ministry of Agriculture (MoA) at the scale of 1:250,000. The researchers digitized this map and, afterwards, processed it using ArcGIS 10.5. Then, the texture of soil of the study area was categorized into four texture classes: loam, silt loam, clay loam, and silty clay loam (
Figure 2j).
2.1.11. Topographic Wetness Index (TWI)
The topographic wetness index (TWI) is a morphological factor. It describes the topography of the area and other relating conditions that affect the spatial patterns of soil texture and soil moisture [
48]. Moore et al. [
48] suggested the following equation to calculate TWI:
$$\mathrm{TWI} = \ln\left(\frac{A}{\tan \beta}\right) \quad (1)$$
where $A$ is the cumulative upslope area (m²) and $\beta$ is the slope gradient in degrees. The values of the TWI in the area under study were positive. They increased as the catchment area increased and the slope gradient decreased. They were classified into three classes (
Figure 2k).
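As a small numerical illustration, the commonly used form of the index, TWI = ln(A / tan β), can be evaluated per cell as sketched below. The minimum-slope clamp that avoids division by zero on flat cells and the sample values are assumptions of this example, not settings reported in the study.

```python
import math


def twi(upslope_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index, ln(A / tan(beta)), for one cell.

    The slope is clamped to min_slope_deg (an assumed safeguard) so that
    flat cells do not cause a division by zero.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area / math.tan(beta))


# Hypothetical cells: (upslope area in m^2, slope in degrees).
cells = [(100.0, 20.0), (100.0, 5.0), (1000.0, 5.0)]
twi_values = [twi(area, slope) for area, slope in cells]
```

The three sample cells reproduce the behavior described above: the index rises when the upslope area grows and when the slope becomes gentler.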
2.1.12. Distance to Drainage
The recharge potential of an area is directly influenced by its distance from the drainage network [
47]. Consequently, this morphological factor was taken into account in the current study. Its values ranged from zero meters to higher than 600 m. These values were categorized into five classes (
Figure 2l) and a map of the distance from drainage was then created for the study area.
2.1.13. Population Density
Because many fires are directly or indirectly caused by human activities and by activities that permit naked flames to strike inflammable woody biomass, the incidence of fire and the number of fires that take place in a given location correlate positively with the population density [
49]. Human presence and activity in forest areas increases the likelihood of wildfires igniting. As a result, the forests are anticipated to always be at risk of fire outbreaks from nearby population settlements. In line with this, the presence of roads can contribute to flagging and provision of paths that can be used for fire suppression responses.
As can be seen in
Figure 2m, the population density in the study governorate varied from 0.35 capita/km² to more than 11.76 capita/km². These densities were categorized into five density classes (
Figure 2m).
2.1.14. Distance to Roads
Figure 2n shows that the distance to roads in the study governorate ranged from 0 m to more than 600 m. These values were classified into five distance intervals. The study area is characterized by an extensive road network, as the distance from any point to a road does not exceed 150 m in most of the governorate (
Figure 2n).
3. Methods
In the present study, support vector regression and ANFIS models were used as the base models and the GA and SFLA meta-heuristic algorithms were combined with these base models to tune their parameters. As a result of this integration, four hybrid models were developed, i.e., ANFIS-GA, ANFIS-SFLA, SVR-GA, and SVR-SFLA, and used to create FFSMs (
Figure 3). A description of the two base models and the two meta-heuristic algorithms is given in the following four sub-sections.
3.1. Adaptive Neuro-Fuzzy Inference System (ANFIS)
The Adaptive Neuro-Fuzzy Inference System (ANFIS) is a modeling system that combines an ANN with a fuzzy inference system. The inference system uses fuzzy logic and fuzzy arithmetic to map the input data to the output [
50]. This system is capable of mimicking human reasoning and decision making. However, it is usually difficult to determine the type of membership function, number of rules, and design parameters of the fuzzy systems [
51]. As a solution, neural networks have been proposed to tune the fuzzy parameters. However, the search for solution to these problems of the fuzzy systems further led to development of the ANFIS, which solved the problems of the fuzzy method and unlocked its potential [
52]. The ANFIS is a fuzzy model placed in a neural network framework to form a high-level modeling system with advanced inferential capabilities [
52].
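This system is made up of five layers, some with fixed nodes and others with adaptive nodes. The parameters of the adaptive nodes change during the training process until an optimal value is obtained. Equations (2) and (3) show the rules of this model with two inputs [50]:
$$\text{Rule 1: if } x \text{ is } A_1 \text{ and } y \text{ is } B_1, \text{ then } f_1 = p_1 x + q_1 y + r_1 \quad (2)$$
$$\text{Rule 2: if } x \text{ is } A_2 \text{ and } y \text{ is } B_2, \text{ then } f_2 = p_2 x + q_2 y + r_2 \quad (3)$$
where $x$ and $y$ are the inputs to the ANFIS; $A_i$ and $B_i$ are fuzzy sets; and $p_i$, $q_i$, and $r_i$ are parameters that are determined during the training process. Other details of the layers of the ANFIS are described in the subsequent sub-sections.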
3.1.1. The First Layer
The first layer of the ANFIS is composed of adaptive nodes and is responsible for fuzzifying the inputs. The output of this layer is obtained from the following equation [50]:
$$O_{1,i} = \mu_{A_i}(x), \; i = 1, 2; \qquad O_{1,i} = \mu_{B_{i-2}}(y), \; i = 3, 4 \quad (4)$$
where $\mu_{A_i}$ and $\mu_{B_i}$ can be any membership functions, such as the Gaussian, triangular, or bell-shaped functions. For example, $\mu_{A_i}$ is expressed by considering a bell-shaped membership function as follows:
$$\mu_{A_i}(x) = \frac{1}{1 + \left| \dfrac{x - c_i}{a_i} \right|^{2 b_i}} \quad (5)$$
In this equation, $x$ stands for the input feature whereas $\mu_{A_i}$ represents the membership function. The parameters $a_i$, $b_i$, and $c_i$ are the parameters of the bell-shaped membership function, commonly known as the premise parameters.
3.1.2. The Second Layer
The second layer of the ANFIS includes fixed nodes. An appropriate weight is assigned to each rule in this layer and the output is calculated using Equation (6) [50]:
$$O_{2,i} = w_i = \mu_{A_i}(x) \, \mu_{B_i}(y), \quad i = 1, 2 \quad (6)$$
3.1.3. The Third Layer
The third layer of the ANFIS is also composed of fixed nodes. The weights are normalized in this layer for better convergence according to Equation (7) [50]:
$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \quad (7)$$
3.1.4. The Fourth Layer
The fourth layer of the ANFIS is composed of adaptive nodes. Their outputs are calculated by using Equation (8) [50]:
$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i \right) \quad (8)$$
The parameters in this layer, also known as the consequent parameters, are $p_i$, $q_i$, and $r_i$. The learning algorithm of the ANFIS should tune the premise and consequent parameters so as to minimize the estimation error [53]. In this study, two meta-heuristic algorithms, the GA and SFLA, were employed for tuning these parameters.
3.1.5. The Fifth Layer
The fifth layer of the ANFIS consists of only one fixed node. This node calculates the sum of the inputs based on Equation (9) [50]:
$$O_{5} = \sum_{i} \bar{w}_i f_i = \frac{\sum_{i} w_i f_i}{\sum_{i} w_i} \quad (9)$$
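To make the five layers concrete, the following minimal sketch evaluates a two-rule, two-input first-order Sugeno ANFIS with bell-shaped membership functions. The function names and the parameter values are hypothetical; in the study the premise and consequent parameters are tuned by the GA and SFLA rather than fixed by hand.

```python
def bell(x, a, b, c):
    """Generalized bell membership function with premise parameters a, b, c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a first-order Sugeno ANFIS with two inputs.

    premise: one ((a, b, c) for x, (a, b, c) for y) pair per rule.
    consequent: one (p, q, r) triple per rule.
    """
    # Layers 1 and 2: fuzzify both inputs and multiply the memberships
    # to obtain the firing strength of each rule.
    w = [bell(x, *mx) * bell(y, *my) for mx, my in premise]
    # Layer 3: normalize the firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weight each rule's linear consequent.
    f = [p * x + q * y + r for p, q, r in consequent]
    # Layer 5: sum the weighted consequents.
    return sum(wb * fi for wb, fi in zip(w_bar, f))


# Hypothetical parameters for two rules.
demo_premise = [((1.0, 2.0, 0.0), (1.0, 2.0, 0.0)),
                ((1.0, 2.0, 1.0), (1.0, 2.0, 1.0))]
demo_consequent = [(1.0, 1.0, 0.0), (0.0, 0.0, 2.0)]
demo_output = anfis_forward(0.5, 0.5, demo_premise, demo_consequent)
```

Because the bell function is strictly positive, the normalization in the third layer never divides by zero. At the sample point both rules fire equally, so the output is the average of the two rule consequents.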
3.2. Support Vector Regression (SVR)
The SVR algorithm is one of the most popular machine learning algorithms. It is widely used in various fields for estimating the dependence of a function $Y$ on a set of independent variables, $X_i$ [54]. This regression method is based on statistical learning. It uses structural risk minimization (SRM) and the Vapnik–Chervonenkis theory in the modeling [55]. An important feature of any statistical learning approach is its ability to generalize, which is a notable advantage of SVMs [
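56]. In this model, the objective is to specify the function $f$, which can be obtained from Equation (10):
$$f(x) = w^{T} \varphi(x) + b \quad (10)$$
where $w$, $b$, and $\varphi$ are the coefficient vector, constant value, and kernel function, respectively. To calculate $w$ and $b$, the error function (Equation (11)) in the SVM model should be optimized by considering the respective constraints established in Equation (12) [56].
$$\min_{w, b, \xi, \xi^{*}} \; \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \quad (11)$$
$$\text{subject to} \quad y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \quad w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i, \xi_i^{*} \ge 0, \quad i = 1, \dots, N \quad (12)$$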
In Equation (11), $C$ is a parameter that determines the penalty for the model calibration error, $N$ is the number of samples, and $\xi_i$ and $\xi_i^{*}$ are the slack variables that determine the upper and lower bounds of the training error relative to the allowable error, $\varepsilon$. In general, the data are expected to fall within the boundaries of the interval $(-\varepsilon, +\varepsilon)$. Data falling outside this interval result in errors equal to $\xi_i$ and $\xi_i^{*}$.
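The SVM model alleviates the overfitting and underfitting problems by simultaneously minimizing the $\frac{1}{2} \lVert w \rVert^{2}$ and $C \sum_{i=1}^{N} (\xi_i + \xi_i^{*})$ terms in Equation (11). By defining two Lagrange coefficients, $\alpha_i$ and $\alpha_i^{*}$, the optimization problem can be solved by numerically maximizing the following quadratic function (Equation (13)), taking into account the conditions preset in Equation (14):
$$\max_{\alpha, \alpha^{*}} \; -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) K(x_i, x_j) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^{*}) \quad (13)$$
$$\sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) = 0, \quad 0 \le \alpha_i, \alpha_i^{*} \le C \quad (14)$$
where $K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$.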
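The objective function in Equation (13) is convex. Hence, the solution to this equation is unique and optimal. After defining the Lagrange coefficients in Equation (13), the $w$ and $b$ parameters in the SVM model are calculated using the Karush–Kuhn–Tucker (KKT) conditions according to Equation (15) [56]:
$$w = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) \varphi(x_i), \qquad b = y_k - w^{T} \varphi(x_k) - \varepsilon \;\; \text{for any } k \text{ with } 0 < \alpha_k < C \quad (15)$$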
Eventually, Equation (16) is formulated for the SVM model as follows [56]:
$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) K(x_i, x) + b \quad (16)$$
The Lagrange coefficients may be zero or non-zero. Only the samples whose Lagrange coefficients are non-zero enter the final equation. These samples are known as the support vectors [56].
Of the various kernels studied in the literature, the radial basis function (RBF) kernel (Equation (17)) has been reported to outperform the other kernels:
$$K(x_i, x_j) = \exp\left( -\gamma \lVert x_i - x_j \rVert^{2} \right) \quad (17)$$
where $\gamma$ is the kernel parameter.
Determining the values of $C$ and $\gamma$ plays a critical role in the development of an appropriate SVM model. On account of this, two algorithms, the GA and SFLA, were used for this purpose.
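The roles of the RBF kernel, the support-vector expansion, and the allowable error can be illustrated with a short sketch. It evaluates a prediction of the form f(x) = Σ (α_i − α_i*) K(x_i, x) + b for given coefficients and computes the ε-insensitive error of a single sample. All names and numbers are hypothetical, and the sketch does not solve the dual optimization problem itself.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, coefficients, b, gamma):
    """Evaluate f(x) = sum_i coef_i * K(x_i, x) + b.

    coefficients holds the differences (alpha_i - alpha_i^*) of the
    support vectors, assumed to be known from a solved dual problem.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coefficients)) + b


def eps_insensitive_error(y_true, y_pred, eps):
    """Error that is zero inside the eps-tube and linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical support vectors and coefficients.
demo_svs = [[0.0, 0.0], [1.0, 1.0]]
demo_coefs = [0.5, -0.2]
demo_prediction = svr_predict([0.0, 0.0], demo_svs, demo_coefs, b=0.1, gamma=1.0)
```

A larger γ makes the kernel decay faster with distance, so each support vector influences a smaller neighborhood. This is why γ has to be tuned together with C.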
3.3. The Genetic Algorithm (GA)
As one of the most widely used optimization methods, the Genetic algorithm (GA) is based on the Darwinian principles of natural biological evolution [
57]. It modifies a population of individual solutions to a problem iteratively, which is a procedure that is described as evolution. In the first step, the GA generates chromosomes, which are also referred to as the initial population of solutions. This population contains a set of individual solutions to the problem under consideration [
57].
Generation of the initial population can be carried out using different methods, for example, the completely at random method, or according to the analyst’s opinion [
58]. A fitness (target) function should then be defined to evaluate each solution. Generation of the initial population is followed by two general steps: evaluating the error of the generated solutions and, thereafter, updating the population. To update the population, two members of the current population are selected as parents, either randomly or based on the analyst’s opinion, and modified by the random operators of selection, combination (crossover), and mutation, after which the best solutions are selected for the next iteration. Each new generation replaces its precursor. This process is iterated until a stopping criterion is met. The criterion of choice can be the number of generations and/or other criteria such as the RMSE.
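The GA loop described above can be sketched in a few lines. This is a minimal real-coded variant under stated assumptions: tournament selection, arithmetic crossover, Gaussian mutation, elitism, and a fixed number of generations as the stopping criterion. These operator choices, the function names, and the toy error function are illustrative and are not the configuration reported in the study.

```python
import random


def genetic_minimize(fitness, bounds, pop_size=20, generations=50,
                     mutation_rate=0.2, seed=0):
    """Minimize fitness over box-bounded real parameters with a simple GA."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def tournament(population):
        # Selection: pick two members at random and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) < fitness(b) else b

    population = [random_individual() for _ in range(pop_size)]
    history = []  # best error at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        children = [population[0][:]]  # elitism: carry over the best solution
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            # Combination: arithmetic crossover of the two parents.
            t = rng.random()
            child = [t * g1 + (1 - t) * g2 for g1, g2 in zip(p1, p2)]
            # Mutation: Gaussian perturbation, clipped to the bounds.
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[k] += rng.gauss(0.0, 0.1 * (hi - lo))
                child[k] = min(hi, max(lo, child[k]))
            children.append(child)
        population = children  # the new generation replaces its precursor
    best = min(population, key=fitness)
    return best, fitness(best), history


def toy_error(params):
    """Hypothetical stand-in for a model's RMSE as a function of two parameters."""
    return sum(p * p for p in params)


ga_best, ga_error, ga_history = genetic_minimize(toy_error, [(-5.0, 5.0), (-5.0, 5.0)])
```

In a hybrid model, `toy_error` would be replaced by a function that trains the SVR or ANFIS with the candidate parameters and returns its RMSE. With elitism, the best error never increases from one generation to the next.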
3.4. Shuffled Frog-Leaping Algorithm (SFLA)
The SFLA is a novel meta-heuristic search algorithm and an efficient optimization method. It is inspired by the natural evolution of a group of frogs seeking a location with the largest food supplies [
59]. This algorithm is capable of solving large-scale, non-linear, and complex problems while offering a high convergence speed [
59]. The principal steps of this algorithm are the following [
59]:
1. First, an initial population is randomly generated to represent a number of solutions, N, for the problem, and a fitness function is defined to allocate a fitness value to each solution in this population.
2. The population members are sorted in descending order of their fitness values.
3. The sorted solutions are divided into m sub-groups, named memeplexes, each containing n solutions. To assign the solutions to the memeplexes, the first solution, which has the highest fitness value, is allocated to the first memeplex, the second solution is assigned to the second memeplex, and the m-th solution is allotted to the m-th memeplex. Subsequently, the (m + 1)th solution is allocated to the first memeplex. This allocation process continues until n solutions are assigned to each of the m sub-groups.
4. To perform a local search, the change in position of the worst solution in each memeplex, $D_i$, is first determined from the difference between the solutions with the best fitness ($X_b$) and the worst fitness ($X_w$) in that memeplex using the following equation:
$$D_i = \mathrm{rand} \times (X_b - X_w) \quad (18)$$
where $\mathrm{rand}$ is a uniform random number between 0 and 1. The new position of the solution is then determined using Equation (19) subject to $D_{\max}$, which is defined as the maximum allowed change in the position of a solution:
$$X_w^{\mathrm{new}} = X_w + D_i, \quad -D_{\max} \le D_i \le D_{\max} \quad (19)$$
When this change in position results in a solution with a better fitness, the worst solution is replaced with the new one. Otherwise, the best solution in the entire population ($X_g$) replaces $X_b$ in Equation (18) and a new solution is generated. When this also fails to improve the solution, the worst solution, $X_w$, is eliminated and replaced with a randomly generated new solution. This evolutionary procedure is repeated for a predetermined number of evolutionary steps for each memeplex.
5. After finishing the local search, all the population members are combined and sorted in descending order of fitness values. The population is, again, divided into several sub-groups and the aforementioned procedure continues until the stopping criterion (e.g., the number of iterations and/or the value of a specific error measure) is met, in which case the SFLA finishes its operation and the solution having the highest fitness value is returned as the best solution.
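The steps above can be sketched as a compact minimizer in which a lower cost corresponds to a higher fitness. The population size, the number of memeplexes, the step limit, and the toy cost function are illustrative assumptions, not the settings used in the study.

```python
import random


def sfla_minimize(cost, bounds, n_memeplexes=4, per_memeplex=5,
                  local_steps=5, shuffles=20, seed=0):
    """Minimize cost over box-bounded real parameters with a basic SFLA."""
    rng = random.Random(seed)
    d_max = [0.5 * (hi - lo) for lo, hi in bounds]  # assumed maximum step size

    def random_frog():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def leap(worst, target):
        # Move the worst frog toward the target, limiting the step to
        # [-d_max, d_max] and keeping the new position inside the bounds.
        new_position = []
        for k, (lo, hi) in enumerate(bounds):
            step = rng.random() * (target[k] - worst[k])
            step = max(-d_max[k], min(d_max[k], step))
            new_position.append(min(hi, max(lo, worst[k] + step)))
        return new_position

    frogs = [random_frog() for _ in range(n_memeplexes * per_memeplex)]
    history = []  # best cost at the start of each shuffle
    for _ in range(shuffles):
        frogs.sort(key=cost)  # best frog first
        history.append(cost(frogs[0]))
        global_best = frogs[0]
        # Round-robin allocation: frog 1 to memeplex 1, frog 2 to memeplex 2, ...
        memeplexes = [frogs[i::n_memeplexes] for i in range(n_memeplexes)]
        for memeplex in memeplexes:
            for _ in range(local_steps):
                memeplex.sort(key=cost)
                best, worst = memeplex[0], memeplex[-1]
                candidate = leap(worst, best)
                if cost(candidate) >= cost(worst):
                    candidate = leap(worst, global_best)
                if cost(candidate) >= cost(worst):
                    candidate = random_frog()
                memeplex[-1] = candidate
        # Shuffle: merge all memeplexes back into one population.
        frogs = [frog for memeplex in memeplexes for frog in memeplex]
    best_frog = min(frogs, key=cost)
    return best_frog, cost(best_frog), history


def toy_cost(params):
    """Hypothetical stand-in for a model's RMSE as a function of two parameters."""
    return sum(p * p for p in params)


sfla_best, sfla_error, sfla_history = sfla_minimize(toy_cost, [(-5.0, 5.0), (-5.0, 5.0)])
```

Only the worst frog of a memeplex is ever replaced, so the best solution found so far is never lost between shuffles.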
4. Proposed Methodology
As shown in the equations of the ANFIS (
Section 3.1), there are two types of fuzzy system parameters, namely, premise and consequent parameters, that need to be tuned. Because of the large number of parameters and the wide range of their values, it is very difficult, if not impossible, to examine all possible values for these parameters by trial and error. To overcome this problem, two metaheuristic algorithms, the GA and SFLA, were used in this study to optimize the modeling parameters of the ANFIS according to the following steps:
Preparing training data;
Creating the basic fuzzy model;
Adjusting values of the premise and consequent parameters of the basic fuzzy model using the error function and the two optimization methods, GA and SFLA; and
Identifying the fuzzy system with the best values of the parameters as a final result.
In addition to that, as the equations of the SVR model (
Section 3.2) demonstrate, the hyperparameters of C and γ in the SVR model with RBF kernel function need to be tuned. These two parameters were tuned in the current study using the GA and SFLA.
5. Results
After the factors contributing to incidence of fire had been identified, one hundred and one forest fire locations were included in this study. Of these locations, almost 70% (i.e., 70 locations) were selected for model training and 30% (i.e., 31 locations) were used for model testing. Then, the 70 forest fire locations and 70 non-forest fire locations were merged to build the training dataset. Similarly, the 31 forest fire locations and 31 non-forest fire locations were compiled to construct the validation dataset. The total numbers of the training and testing samples were 140 and 62, respectively.
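The sample counts above can be reproduced with a short sketch of a class-wise 70/30 split. The identifiers standing in for the fire and non-fire locations, the seeds, and the truncation of 70% of 101 to 70 are assumptions made for illustration.

```python
import random


def split_locations(locations, train_fraction=0.7, seed=0):
    """Shuffle the locations and split them into training and testing parts."""
    rng = random.Random(seed)
    shuffled = locations[:]
    rng.shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))  # 70% of 101 truncates to 70
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers for the 101 fire and 101 non-fire locations.
fire = [("fire", i) for i in range(101)]
non_fire = [("non_fire", i) for i in range(101)]

fire_train, fire_test = split_locations(fire, seed=0)
non_fire_train, non_fire_test = split_locations(non_fire, seed=1)

training_set = fire_train + non_fire_train
testing_set = fire_test + non_fire_test
```

Splitting each class separately keeps the training and testing sets balanced between fire and non-fire locations.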
It should be highlighted that the GA and SFLA algorithms were used in this study in a different manner from those in previous studies. Specifically, prior to SVR–SFLA, SVR–GA, ANFIS–GA, and ANFIS–SFLA modeling, selection of the most influential features (explanatory variables) for the SVR and ANFIS models was performed using the GA and SFLA. This careful selection of variables was additionally intended to exclude collinear predictors from modeling. Feature selection in the current study was informed by the studies of [
28,
29,
31]. Hence, the interested reader is referred to these three studies for further details on this step.
Elevation, aspect, temperature, radiation, distance to drainage, TWI, population density, rainfall rate, distance to roads, and NDVI were the significant fire outbreak determinants in the SVR–GA model. Meanwhile, distance to drainage, TWI, wind speed, population density, rainfall rate, and distance to roads were the significant predictors of fire incidence in the SVR–SFLA model. On the other hand, elevation, temperature, radiation, distance to drainage, TWI, population density, rainfall rate, and NDVI were the significant explanatory variables of fire outbreaks in the ANFIS–GA model, whereas elevation, temperature, radiation, distance to drainage, TWI, rainfall rate, and NDVI were the significant fire outbreak predictor variables in the ANFIS–SFLA model.
The selected factors were input to the corresponding models and modeling was conducted. As mentioned in the description of the ANFIS, metaheuristic algorithms can be used to find the optimum values of the premise and consequent parameters. The number of epochs was set to 1000.
Figure 4 and
Figure 5 display, for the ANFIS–GA and ANFIS–SFLA models, respectively, the target values and model outputs for the training and testing phases, together with the MSE and RMSE values and the frequency distribution of errors for both the training and testing samples.
Figure 4 shows that the values of the MSE and RMSE were 0.126 and 0.354, respectively, for the training data subset and 0.169 and 0.411, respectively, for the testing data subset in the ANFIS–GA model. As for the ANFIS–SFLA model,
Figure 5 shows that the corresponding values were 0.115 and 0.340, respectively, for the training data subset and 0.199 and 0.445, respectively, for the testing data subset.
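For reference, the two error measures are related by RMSE = √MSE. The sketch below computes both from target and output values and checks that the reported pairs agree with this relation to within about 0.002. The target and output lists are made-up values for illustration only, not data from this study.

```python
import math


def mse(targets, outputs):
    """Mean squared error between target values and model outputs."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def rmse(targets, outputs):
    """Root mean squared error, i.e. the square root of the MSE."""
    return math.sqrt(mse(targets, outputs))


# Made-up example: 1 = fire, 0 = non-fire, with continuous model outputs.
targets = [1, 0, 1, 0]
outputs = [0.8, 0.3, 0.6, 0.1]
print(round(mse(targets, outputs), 3), round(rmse(targets, outputs), 3))
# 0.075 0.274

# Reported (MSE, RMSE) pairs for ANFIS-GA and ANFIS-SFLA, training and testing.
reported = [(0.126, 0.354), (0.169, 0.411), (0.115, 0.340), (0.199, 0.445)]
for m, r in reported:
    assert abs(math.sqrt(m) - r) < 0.002
```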
Figure 6 shows the best fitness (best RMSE) and mean fitness (mean RMSE) values over 1000 epochs for the SVR–GA and SVR–SFLA models. This figure shows that the best fitness in these models corresponded to RMSE values of 0.283 and 0.288, respectively.
Table 1 lists the optimum values of
C and
γ for the SVR–GA and SVR–SFLA models, along with the RMSE and
R² values for the training and testing data subsets. A comparison between the RMSE values of the SVR- and ANFIS-based hybrid models indicates that both the SVR–GA and SVR–SFLA models have relatively low RMSE values.
The AUROCC is another model accuracy evaluation metric.
Figure 7 displays the receiver operating characteristic (ROC) curves of the training and testing data subsets for the four models. This figure shows that the areas under the curves (AUCs) of the SVR–GA and SVR–SFLA models are larger than those of the ANFIS–GA and ANFIS–SFLA models. The best AUC is associated with the SVR–SFLA model, whereas the second-best AUC pertains to the SVR–GA model (
Figure 7).
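To make the AUC concrete, the following minimal sketch computes it as the probability that a randomly chosen fire location receives a higher susceptibility score than a randomly chosen non-fire location, with ties counted as one half. This rank-based definition is equivalent to the area under the ROC curve. The labels and scores are made-up values for illustration, and the function name `auc` is an assumption rather than part of this study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fire point ranked above non-fire point
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up example: three fire (1) and three non-fire (0) locations.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(auc(labels, scores), 3))  # 0.889
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a testing subset of 62 points; a sort-based rank computation would be preferable for large datasets.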
Figure 8 presents the FFS maps generated by the four integrated models. Analysis of these maps with reference to the factors contributing to incidence of fires shows that the densely populated areas are areas of low fire risk. Consequently, a low incidence of fire and, hence, a low number of casualties are anticipated in these areas.
An examination of the places where fire incidents have occurred shows that they are more frequent on the outskirts of residential areas than in villages and rural areas, and clearly less frequent within residential areas. In addition, fire incidents increase along roadsides and decrease in areas that are difficult to access, whether by citizens or vehicles; in many cases, however, fires started in areas close to public roads and then spread to areas that are difficult to access. Fire incidents decrease significantly on privately owned lands that are maintained through soil plowing, tree pruning, and seasonal weeding, whereas they increase in areas that do not receive such agricultural care. Fire incidents also decrease in areas where prevention measures are taken, such as weeding and monitoring the activities of local residents in picnic areas, which are carried out by some municipalities, provincial councils, and local community organizations. These measures are effective in the areas where they are taken, but they are not sufficient.
Figure 9 displays the frequencies of fire risk in Ajloun across five risk classes (very low, low, moderate, high, and very high risk) for the four models. As this figure shows, about half of the study area (the governorate of Ajloun) is classified as high-risk or very high-risk.
6. Discussion
Uncontrolled fires cause extensive damage to rangeland and forest ecosystems in Jordan every year. Because these fires cannot be controlled naturally, they should be prevented by identifying high-risk areas and taking precautionary measures in due time. Studies indicate that the relationship between susceptibility to fire and the weather conditions is not linear. Therefore, there is a need for advanced models that can identify the fire susceptibility determinants and model the relationships between incidence of fires and its driving factors.
The driving factors for fires have significant effects on the performance and efficiency of data mining and machine learning models that employ them as predictors of fire events. Feature selection is one of the most important processes in knowledge discovery and data mining as it can identify and eliminate irrelevant and redundant predictor variables from modeling.
In the present study, four feature selection algorithms based on GA and the wrapper approach were employed in FFSM. The results of feature selection underscored that four factors (distance from drainage, rainfall rate, TWI, and population density) play a major role in outbreaks of fires in the study area, the governorate of Ajloun in northern Jordan. Distance from drainage lines is one of the most important factors in preventing fires because these lines create buffer zones that curb the spread of fires. Consistent with this, refs. [
60,
61] report that drainage lines are one of the most important factors in preventing the spread of fire. Similarly, ref. [
29] concluded that distance to drainage is one of the most important factors for controlling the propagation of fires in Jordan.
Rainfall is another important factor in the incidence of fires in the governorate of Ajloun. This finding agrees with the findings of previous studies [
62,
63,
64]. Population density is another important factor that plays a major role in fire occurrence in the study area. Previous studies [
65,
66,
67] have also found that population density is one of the key factors that contribute to incidence of fires. Lastly, in harmony with the findings of previous studies (e.g., [
68,
69]), this study found that TWI is an important factor in fire outbreaks in the study area.
After determining the optimal settings for the four feature selection methods proposed herein, four hybrid models (SVR–GA, SVR–SFLA, ANFIS–GA, and ANFIS–SFLA) were used for FFSM. The results of this study reveal that the SVR-based models are more accurate than the ANFIS-based models. Similar results have been reported in previous studies. For instance, ref. [
70] employed ANN, SVR, and the ANFIS to predict wind speed and direction. Their results suggest that the SVR model produces better results than the other models. Likewise, ref. [
71] used ANFIS, SVM, and ANN models to forecast river flow in semi-arid mountainous regions. Their results confirmed that SVR gives better results than the other models. Similar results were obtained by Hipni et al. [
72].
The selection of training and testing samples from the available samples involves considerable uncertainty. If the number of fire locations is high, it is difficult to manually select an equal number of non-fire locations. On the other hand, random generation of non-fire points can produce points that are in fact prone to fire. In addition, selecting samples with high variance in the values of the conditioning factors is one of the most important steps for achieving higher modeling accuracy. These uncertainties were not investigated in the present study and will be addressed in future work.
7. Conclusions
In this study, forest fire susceptibility and risk were investigated in the governorate of Ajloun, north of Jordan, using four hybrid models: SVR–GA, SVR–SFLA, ANFIS–GA, and ANFIS–SFLA. Subsequently, forest fire susceptibility mapping was performed and susceptibility maps were developed using 14 predictor variables (elevation, slope, aspect, land use, NDVI, rainfall rate, temperature, wind speed, radiation, soil texture, TWI, distance to drainage, population density, and distance to roads).
The study results demonstrate that the SVR–GA and SVR–SFLA models provide better results than ANFIS–GA and ANFIS–SFLA. On average, the SVR-based hybrid models had about 5.0% higher values of AUROCC than the ANFIS-based models in the validation phase. Accordingly, the outcomes of this study indicate that the SVR-based hybrid models are the best of the four models. Given the high accuracy of the various models developed here, especially the SVR-based models, the researchers conclude that the SVR-based models can be reliably used to predict FFS and conduct fire risk mapping in order to properly manage the locations at risk of fire and prevent or reduce potential irreversible damage by fires in forest areas in Jordan and elsewhere.
The fire susceptibility maps are a useful tool for early warning of fires. They contribute to guided management of the risk of forest fire, prevention of forest fires, and rapid and effective extinguishing of forest fires once they break out. With the help of these maps, it is possible to provide the qualified personnel and facilities that are required to deal with fire in the high-risk areas, including construction of watchtowers and water tanks in impassable areas, coverage of the areas at risk with wireless sensor networks to detect fires as early as possible, and establishment of rescue infrastructure prior to the fire season. In addition, damage and losses can be readily mitigated by training villagers to deal with fires and create fire barriers and lobbies.