Machine Learning for Modeling Wildfire Susceptibility at the State Level: An Example from Arkansas, USA

Fire susceptibility modeling is crucial for sustaining and managing forests among many other valuable land resources. With 56% of its area covered by forests, Arkansas is known as the “natural state”. About 1000 wildfires occurred and burned more than 10,000 acres each year during 1981–2018. In this paper, we use remote-sensing-based machine learning methods to address the natural and anthropogenic factors influencing wildfires and model fire susceptibility in Arkansas. Among the 15 explored variables, potential evapotranspiration, soil moisture, Palmer drought severity index, and dry season precipitation were recognized as the most significant factors contributing to the fire density. The obtained R-squared values are significant, with 0.99 for training the model and 0.92 for the validation. The results show that the Ouachita National Forest and the Ozark Forest, in west-central and west Arkansas, respectively, have the highest susceptibility to wildfires. The southern part of Arkansas has low-to-moderate fire susceptibility, while the eastern part of the state has the lowest fire susceptibility. These new results for Arkansas demonstrate the potency of remote-sensing-based random forest in predicting fire susceptibility at the state level that can be adapted to study fires in other states and help with fire preparedness to reduce loss and save the precious environment.


Introduction
Numerous geospatial methods have been developed to study wildfires, and new approaches are being advanced to overcome the currently existing limitations. Remote sensing and geographic information systems (GIS) have taken advantages of the recently developed methods for fire susceptibility modeling. Fire susceptibility refers to the vulnerability of an area to fire due to its natural and/or anthropogenic conditions. Various statistical methods are widely used to map the fire ignition probability over variant spatial and temporal scales. Examples include kernel density interpolation, a method that calculates the probability of density of functions of random variables; weights-of-evidence modeling, a method that provides evidence to the predictive power of an independent variable to a dependent variable; Bayesian belief network analysis, a method for representing the conditional dependence among variables; logistic regression, a method that predicts categorical variable with a given set of independent variables; analytical hierarchy process (AHP), an organized mathematical structure for analyzing complex decisions; and fuzzy logic, a controlled system for creating heuristic functions for regression [1][2][3][4][5][6]. Each of these methods can take various spatial data representing topography, climate, vegetation, and human activity, but they differ from each other in terms of pre-processing, inputs, calculations, and output accuracies. Although these methods are robust and some of them can be utilized in machine learning (ML), they exhibit distinctive problems as different assumptions must be made ahead of the analysis.
More advanced ML methods, such as decision trees (DT), random forest (RF), support vector machine (SVM), and artificial neural networks (ANN), have become popular in implement state-wide precaution measures and reduce the damage and impact of wildfires across Arkansas.

Study Area
As a "natural state", Arkansas is blessed with great biodiversity and natural resources. About 56% of the state is covered by forest, with approximately 11.8 billion trees [34]. Every year between 1981 and 2018, nearly more than 1000 wildfires occurred, and in some years this number exceeded 4000 events (Figure 1). The severity of these wildfires varied from year to year, but on a yearly average more than 10,000 acres were burned, and in some years, this number nearly grasped 90,000 acres. The graph in Figure 1 shows long-term declining trends, but records indicate that there was a significant number of wildfires occurring from time to time. Many spikes in the number of wildfires and burned areas occurred between 1981 and 2018, and the time interval between these spikes became shorter in recent years. The local topography, climate, vegetation cover, and human practice in Arkansas made it susceptible to wildfires. Although wildfires are prominent in Arkansas, unfortunately fire susceptibility studies over the state are still very limited. Arkansas is situated in the south-central part of the United States. It is mainly covered by forests and lakes and has major mountain valleys ( Figure 2). The forest landcover occupies more than half of the entire state. Oak covers about 42%, and shortleaf pine occupies 29% of the forest type. About 58% of the timberland is owned by private landowners, such as farmers and ranchers [35]. Nearly 13% of the state's forested acreage is considered for national forest status, and forest resource companies lease or own about 23% of the state's timberland [36]. The state's forest resource provided job opportunities to about 64,789 people in 2011, amounting to USD 3.6 billion of labor income. Additionally, 75% of the energy needed for the forest industries is generated from wood waste [37]. This highlights the value and importance of the forest resource in Arkansas.
lights the value and importance of the forest resource in Arkansas.
In terms of population, the three main population centers are Northwest Arkansas, the Little Rock metropolitan area, and Northeastern Arkansas. Northwest Arkansas, consisting of Fayetteville, Springdale, Bentonville, and Rogers, has more than 525,000 people living in the region, making it the second most populated area in Arkansas [38]. The Little Rock metropolitan area is located at the center of the state, including Jacksonville, Little Rock, Benton, Cabot, Conway, and Maumelle, and approximately 739,000 people live in this area. Northeastern Arkansas, including Jonesboro, is the third main population hub in the state with a population over 100,000 [39].

Data Acquisition and Processing
The study approach consisted of three major steps: (1) data acquisition and processing for training and prediction, (2) exploring the relationship between fire and selected variables, and (3) geospatial modeling to produce a fire susceptibility map for Arkansas. Four different types of remotely sensed data were acquired for this study, including vegetation, climate, topography, and human intervention factors. The fire incident data were obtained from the Monitoring Trends in Burn Severity (MTBS). MTBS has been collecting fire incident data from 1984 to present, and it is monitored by U.S. Geological Survey's Earth Resources Observation and Science (EROS) Center and the U.S. Department of Agriculture's Forest Service Geospatial Technology and Applications Center (GTAC). Using this dataset, fire density was calculated for Oklahoma using the kernel density. The spatial distribution of fire incidents was highly clustered, and thus, the density was estimated by the k-nearest neighbor. The density of each cell was estimated by adding all the values of surfaces where they overlay the cell center. The size of each cell was calculated using the shorter width or height of the extent in the defined coordinate system, divided by 250, to maintain the necessary details for our analysis. Dividing by a smaller number resulted in mixing multiple features and led to the loss of some important information, while dividing by a larger number resulted in a smaller cell size and a larger raster file that required extensive computation. In terms of population, the three main population centers are Northwest Arkansas, the Little Rock metropolitan area, and Northeastern Arkansas. Northwest Arkansas, consisting of Fayetteville, Springdale, Bentonville, and Rogers, has more than 525,000 people living in the region, making it the second most populated area in Arkansas [38]. The Little Rock metropolitan area is located at the center of the state, including Jacksonville, Little Rock, Benton, Cabot, Conway, and Maumelle, and approximately 739,000 people live in this area. Northeastern Arkansas, including Jonesboro, is the third main population hub in the state with a population over 100,000 [39].

Data Acquisition and Processing
The study approach consisted of three major steps: (1) data acquisition and processing for training and prediction, (2) exploring the relationship between fire and selected variables, and (3) geospatial modeling to produce a fire susceptibility map for Arkansas. Four different types of remotely sensed data were acquired for this study, including vegetation, climate, topography, and human intervention factors. The fire incident data were obtained from the Monitoring Trends in Burn Severity (MTBS). MTBS has been collecting fire incident data from 1984 to present, and it is monitored by U.S. Geological Survey's Earth Resources Observation and Science (EROS) Center and the U.S. Department of Agriculture's Forest Service Geospatial Technology and Applications Center (GTAC). Using this dataset, fire density was calculated for Oklahoma using the kernel density. The spatial distribution of fire incidents was highly clustered, and thus, the density was estimated by the k-nearest neighbor. The density of each cell was estimated by adding all the values of surfaces where they overlay the cell center. The size of each cell was calculated using the shorter width or height of the extent in the defined coordinate system, divided by 250, to maintain the necessary details for our analysis. Dividing by a smaller number resulted in mixing multiple features and led to the loss of some important information, while dividing by a larger number resulted in a smaller cell size and a larger raster file that required extensive computation.
Vegetation distribution and composition are affected by topographic factors, which themselves are indirectly related to the flammability of forest and climatic influences. Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) of 1 arc-second spatial resolution was used to derive the elevations for the training and predicting processes. Later, the slope of the study area was extracted from the elevation data.
The landcover map of 2016 was used to identify the burnable vegetation in the study area. Vegetation data of a 30m spatial resolution were obtained from the National Land Cover Database (NLCD). NLCD data divided the landcover into 16 classes, but for the sake of this analysis, the data were reclassified into 7 classes only by combining the similar classes together to reduce multicollinearity among the variables. Then, the vegetation class was subdivided into herb, shrub, tree, and wildland urban interface based on the spatial extent of these subclasses in the dataset obtained from the Landfire program. These subdivisions were necessary to better understand the contribution of various vegetation types to wildfires.
Climatic conditions are also very important for modeling fire susceptibility. Several conditions, such as precipitation, temperature, lighting, and soil moisture, have considerable contributions to fire ignition and propagation, so they were considered in this analysis. Daymet climate data from 1990 to 2015 were used to calculate the average dry season precipitation and annual average temperature [40]. The average annual soil moisture from 1990 to 2015 was estimated from the TerraClimate dataset of a 4km spatial resolution. A thorough examination of the explanatory regression outputs, such as the p-value and R-squared, revealed that annual measures were more suitable than seasonal measures and improved the model performance in some cases; however, the opposite was true in some other cases. Precipitation, directly and indirectly, influences the fire density, and in this study, the influence of precipitation was more noticeable in the dry season (June-October). For temperature and soil moisture, the model showed higher sensitivity to annual averages than seasonal variations.
In addition, the climate water deficit (DEF), potential evapotranspiration (PET), and Palmer drought severity index (PDSI) were used in this study to better understand the wildfire drivers. DEF is the difference between PET and the actual evapotranspiration [41] and helps to estimate the drought stress on soils and plants. PET is the amount of water evaporated by transpiration and evaporation from an area, given a uniform vegetation cover and being well supplied with water. PDSI is designed to calculate the agricultural drought by adding precipitation to the top two layers of the soil and using a temperature-driven evapotranspiration algorithm to remove the moisture. All these metrics are commonly used in fire research [42,43], and thus they were taken into account in this study. Geospatial data of 4 km resolution for these variables were collected for the 1990-2015 period from the TerraClimate dataset for calculating the annual averages.
Human intervention is another important determinant in all wildfire studies. For example, accessibility to protected areas can be represented by road density because roads serve as pathways to the vegetated range and people can intentionally or unintentionally start a fire. Recent investigations have also shown that electricity lines can act as a source of fire ignitions [44]. To determine the effects of road and electricity line networks in the study area, their data were extracted from the Topologically Integrated Geographic Encoding and Referencing (TIGER) database, and then Euclidean distances to the electricity lines and kernel density of the road lines were calculated.
Gridding the study area was found to be an effective way to incorporate all the variables for the model in one attribute. This not only facilitated the desired analysis, but also allowed extracting the required information from different GIS layers to a standard form. Therefore, a fishnet gridding was applied to produce a 10 × 10 km grid for training and predicting purposes, and then a spatial joining was applied to join the required data from different layers to one attribute table. In regression analysis, the data need to be normally distributed, so various transformations, such as log transformation and square root transformation, were applied to standardize the variables.

OLS and GWR Analyses
The quality and robustness of the predicted model depend mainly on the training data, making the variable selection for training the model critical. To identify significant variables, explanatory regression was applied, and its outputs were used to narrow down the factors that best explained the dependent variable. Variables identified from the explanatory regression were then applied to OLS, which is a linear least-squares method that minimizes the sum of the squares of the differences between the dependent and the independent variables in the given dataset. The least-squares estimatorβ obtains a value that minimizes the sum of squared residuals of the model as the following: where X is the matrix regressor variable X, T is the matrix transpose, and y is vector of the value of the response variable. Assumptions such as normality, homogeneity, and independence of residual are taken into account as they are essential for accurate modeling and any violation of these assumptions may lead to inefficient and biased results [45]. OLS produces statistical outputs that can be used to evaluate the model performance. Coefficients represent the type and strength of the relationship between the individual explanatory and dependent variable; they can be either positively or negatively correlated. Analysis of variance (ANOVA) measures whether the model is statistically significant or not by calculating the p-value.
The T-test (p-value) developed by Gösset can be used to identify the statistically significant parameters [46].
where X is the sample mean from X 1 , X 2 , . . . . . . , X n out of a size n, µ is the population mean,σ is the standard deviation of the population, S is the standard error of the mean, and Z is the standardized statistic. In this paper, the sample mean followed the tendency of normal distribution. The consistency in the relationship between the explanatory and the dependent variables over geographic and data spaces was measured by Koenker (BP) statistic. If BP results are statistically significant, this indicates robust probabilities. Multicollinearity, the intercorrelation among model variables, is an important factor to consider when developing a model as it reveals redundancy in model variables that must be removed. Higher multicollinearity leads to higher variance and covariance, eventually leading to unreliable statistical results. The variance inflation factor (VIF) was employed to measure the severity of multicollinearity in the linear regression model. A higher VIF value than 10 indicates severe multicollinearity due to strong associations among the variables in the model. In this study, a VIF threshold <7.5 for all variables was used, and the R-squared and Akaike information criterion (AICc) were used to evaluate the overall model performance. The VIF value can be calculated as the following: where R 2 i represents the coefficient of determination between the ith variable X i and the other variables.
Through the smoothing of local regression, GWR was developed and was later improved with the advancement of statistical measures [47][48][49]. OLS does not consider local variations and geographical data that may vary over space, making it difficult to fit everything into one global equation and may not even represent the true situation. Generally, spatial data are non-stationary, their structure affects the correlation between variables, and the association between variables may vary in space [50]. GWR incorporated spatial heterogeneity as it applies multiple calibrated local regression at each sample point to capture the spatial variation [51]. The GWR general framework can be expressed as: where I = 1, 2, 3, . . . , n, (u i ,v i ) is point of coordinates at point i, Y i is the value of random variable, ε i is the random error term, and X i is the value of a fixed variable that is known and does not contain errors.
The three smoothing functions available in GWR include predefined bandwidth, corrected AICc, and cross validation. GWR provides important diagnostic statistics, such as the standardized residual and the difference between the observed and the calculated values by the model, to understand the local information and evaluate the model performance. The local variations of the adjusted R-squared were used to test how the model explained the dependent variable, and the leverage values were used to measure the influence of the explanatory variables on the model calibration.

RF Classification
RF is an ensemble ML algorithm developed by Breiman based on decision trees. The number of decision trees generated (nbtree) and the number of selected variables for each split (nbvar) are the two hyperparameters in RF. The RF algorithm bootstrapping is applied to generate nbtree subsets of training dataset from the full dataset with random sampling, with replacement. Bootstrapping improves model stability by averaging all the model outputs. For each of these subsets, RF grows a decision tree by (1) selecting randomly nbvar variables for each split and (2) computing the Gini index and best variables with the best threshold value for the nbvar and output variables for minimizing error in prediction. After successfully training the decision tree, a prediction (y pred ) is made from the average of all the decision trees (y i ). This produces a robust mean value for prediction by reducing the variance as follows: Training the RF model for prediction was accomplished by establishing a relationship between the explanatory variables and the variable to predict via building a "forest". In the training process of building the "forest", we excluded 20% of the data for validation. The trained model successfully predicted that excluded data and compared them to the observed values to assess the accuracy. RF provides the option to predict a variable as categorical or continuous, as it can predict features beyond the training area; but it must have all the associated explanatory variables of the prediction area that were used in the training.
Following the training, numerous diagnostics, such as the forest characteristics, the out-of-bag (OOB) error, and the summary of variable importance, were employed to assess the model performance. The OOB error can help to evaluate the model performance when 100% of the dataset is used in the training, as it is calculated by the unseen subset of data that is not used in building the "forest". The variable importance was estimated via the Gini coefficient by calculating the number of times a variable caused split and the impact of that split divided by the number of "trees". In an unstable RF model, the importance of a variable changes across the validation. By increasing the number of trees, the instability in the model can be decreased. The number of trees, minimum leaf size, maximum tree depth can help to achieve model stability. Increasing the number of trees can make more accurate predictions, but the model takes a longer time to run. In this study, we used 300 "trees" to test the model and excluded 20% of the data for validation. Additionally, we used 5 as the minimum leaf size. Minimum leaf size refers to the number of observations required to keep a terminal node on a tree without further splits, and maximum tree depth refers to the maximum number of splits that will be made. Increasing the number of maximum tree depth can raise the probability of overfitting. The data availability per tree (%) is basically the percentage of input training data used per decision tree. In this study, approximately two-thirds of the randomly sampled data were used in a decision tree. Finally, outputs, such as the R-squared, p-value, and standard error, were used to evaluate the model performance.

OLS and GWR Outputs
Considering the explanatory regression results, the following predictors were selected for the analysis: slope, soil moisture, dry season precipitation (June-October), average maximum temperature (Tmax), PET, DEF, PDSI, portion of cultivated area, portion of tree area, portion of herb area, portion of shrub area, portion of barren area, portion of wildland urban interface, road lines density, and distance to electricity lines (Figure 3a-j). The dependent variable was defined as the fire density for both training and prediction. For training the model, the fire density of Oklahoma was selected. The produced fire density map (Figure 4) shows higher fire density in both the northeastern and the southern parts of the state. the number of maximum tree depth can raise the probability of overfitting. The data availability per tree (%) is basically the percentage of input training data used per decision tree. In this study, approximately two-thirds of the randomly sampled data were used in a decision tree. Finally, outputs, such as the R-squared, p-value, and standard error, were used to evaluate the model performance.

OLS and GWR Outputs
Considering the explanatory regression results, the following predictors were selected for the analysis: slope, soil moisture, dry season precipitation (June-October), average maximum temperature (Tmax), PET, DEF, PDSI, portion of cultivated area, portion of tree area, portion of herb area, portion of shrub area, portion of barren area, portion of wildland urban interface, road lines density, and distance to electricity lines (Figure 3a-j). The dependent variable was defined as the fire density for both training and prediction. For training the model, the fire density of Oklahoma was selected. The produced fire density map (Figure 4) shows higher fire density in both the northeastern and the southern parts of the state.   After running the OLS, the model showed the expected relation with fire density. The topographic parameter of slope was positively correlated with fire density, suggesting that fire tends to spread at a higher slope than at a lower slope. Vegetation was also positively correlated with fire density. All subclasses of vegetation, including trees, shrubs, and herbs, were positively correlated with fire density, with trees having the strongest influence. Wildland urban interface (WUI) and barren area also showed a positive correlation with fire density. Cultivated land showed a negative correlation with fire density (Table 1) most likely because cultivated areas are frequently monitored and are in private ownership, making the chance of ignition extremely low. The relations between fire density and the selected 15 variables are shown in Figure 5. StdError is the standard deviation error; t-Statistic is the ratio between estimated and hypothesized values relative to StdError; probability and robust probability (Pr) are significant when p-values < 0.01 and VIF values < 7.5.  Dry season precipitation and average maximum temperature tended to have a direct relationship with fire density. As precipitation in the dry season decreased, fire density increased. Additionally, fire density increased as maximum temperature increased. Other positively related factors to fire density included PDSI and DEF. Average soil moisture, Dry season precipitation and average maximum temperature tended to have a direct relationship with fire density. As precipitation in the dry season decreased, fire density increased. Additionally, fire density increased as maximum temperature increased. Other positively related factors to fire density included PDSI and DEF. Average soil moisture, however, showed a negative relation with fire density, proving that soil moisture inversely affected fire incidents. Furthermore, PET showed a negative correlation with fire density, along with human-induced factors, such as distance to electric lines, but the road lines density was positively associated with fire density ( Table 1).

Variable
The p-value for all these explanatory variables is less than 0.1, indicating that these variables are statistically significant. Multicollinearity was tested by VIF. A VIF threshold between 5 and 10 is usually recommended as values out of this range would indicate the existence of collinearity and, thus, redundancy [52][53][54][55]. Among the 15 selected variables, only 4 have VIF values over 5; however, these are the most significantly important factors in modeling fire susceptibility, so a threshold less than 7.5 was applied in this study. Robust probability value is also less than 0.1 (Table 1), adding more confidence to the regression results. The performance of the model is evaluated by the adjusted R-squared value, calculated to be 0.51, meaning that the model can explain nearly half of the observed variation by these explanatory variables ( Table 2). OLS was able to explain about 50% of the variance of the dependent variable with the explanatory variables; however, GWR provided better results as the adjusted R-squared value increased from 0.51 to 0.87 (Table 2). This demonstrates that the variables are spatially related, and GWR is superior to OLS in this case for fire density modeling. In this study, the golden search approach was applied to determine the suitable neighbor size. This approach tries to find a suitable number of neighbors based on the lowest AICc value. The findings determined the suitable number of neighbors as 31 with the AICs value of 5171, and GWR provided better results again compared to OLS, adding confidence to the use of these variables in a non-parametric method to predict fire density at a fully different location (Table 2).

RF Outputs
Fire distribution showed an irregular pattern in the training dataset, and OLS and GWR results indicated that fire density was influenced by both natural and human-related factors. The adjusted R-squared value inferred a nonlinear relationship ( Table 2), suggesting that a non-parametric method would be more suitable for this study [18]. This explains why RF outperformed linear regression. A total of 300 "trees" were selected to test the model, and 20% of the data were excluded for validation. The adjusted R-squared value for training the model over Oklahoma was 0.99 with a standard error of 0.002 and a p-value of 0.00, indicating the result is statistically significant. For validation, the adjusted R-squared value was 0.92 with a standard error of 0.01 and a p-value less than 0.00. The validation data share more than 95% of the training data range, indicating that the model is well validated. Overall, the results show that the model is well fitted for prediction. Therefore, the same 15 explanatory variables for the prediction area were fed to the model to predict wildfire densities in Arkansas ( Figure 6). to evaluate the model performance. The MSE value for 50% of the "trees" was 924.42, and the percentage of variation explained was 90.14; for 100% of the "trees", the MSE was 913.81, and the percentage of variation explained was 90.26. The difference in values for both cases was minimal, indicating the number of "trees" had minimal effect on the model outcome.  To predict features at a fully different location, the RF model required all the associated explanatory variables that were used in the training. Thus, a 10 × 10 km grid was created for Arkansas and spatial joining was applied to join the necessary data from different GIS layers to one attribute table. Additionally, a total of 300 "trees" were used in the prediction, but no data were excluded for validation, hence OOB results were utilized to evaluate the model performance. The MSE value for 50% of the "trees" was 924.42, and the percentage of variation explained was 90.14; for 100% of the "trees", the MSE was 913.81, and the percentage of variation explained was 90.26. The difference in values for both cases was minimal, indicating the number of "trees" had minimal effect on the model outcome.
Of the 15 variables, PET showed the highest importance in RF (Table 3); however, in OLS, it showed a negative strong relationship. The influence of PET on wildfires was explained in details in previous work [56]. Wildland fuels, such as live vegetation, organic soil, and dead fuels, are affected by water deficits; therefore, increasing the water deficit may make fuels drier and more susceptible to ignition and burning [57]. It has been found that summer evapotranspiration has a significant correlation with the wildfires in southwest and southern California [58]. Soil moisture directly influences the dryness of fuels, and affects the dead fuels that are generally found in the ground, so it acts as a proxy to drought [1]. Under low wet conditions at the surface, burnable fuels are more susceptible to burning [59]. PDSI shows significant importance in explaining wildfires, as previous wildfires in the western states have shown a correlation with PDSI [60]. PDSI with precipitation and temperature have been used previously to explain fire events in the 12 ecoregions of the western U.S. [42]. Human factors, as well, have a major influence on fire density. Many reports have identified faulty electricity lines acted as a source of ignition.  Climatic factors have a significant impact on fire distribution [61]. Average maximum temperature and dry season precipitation have a demonstrated noticeable importance in the model. In this study, average dry season precipitation showed more importance than average maximum temperature. Temperatures above 20 • C promoted wildfires and showed a positive correlation in the linear regression.
The only topographic factor included in this fire modeling is the slope; initially, elevation and aspect factors were also included, but they were eventually excluded due to multicollinearity. The slope of the area showed a notable effect on the RF results, as well as vegetation covers, such as portions of tree, shrub, and herb, which acted as proxies for fuel. In linear regression, these variables demonstrated a significant correlation with fire density. Cultivated area, barren, and WUI also showed some importance in the RF results.
The final stage of RF modeling revealed the predicted fire density for Arkansas, which we categorized into five divisions using an equal interval approach that divided the full range of fire density values into equal-size subranges. Ultimately, the produced five categories were integrated with the fire inventory of Arkansas to create a fire likelihood map for the entire state (Figure 7), assuming that the predicted areas in the highest fire density category are highly susceptible to wildfire and vice versa. Our results indicate that the Ouachita National Forest and the Ozark Forest are highly susceptible to wildfire, while the eastern side of the state is less susceptible to wildfire, and the southern part of the state is moderately susceptible to wildfire. The Ouachita National Forest is close to Little Rock, which is densely populated in central Arkansas, putting this area at high risk. Northwest Arkansas, which is also highly populated, is close to the Ozark Forest, a high-risk zone for wildfires. It is noteworthy that most of the historic fire events coincide with the predicted high susceptibility, which adds more confidence to the established model. Figure 7 shows a strong correlation between the predicted high and high-medium susceptible areas and the documented fires, while it shows weak correlation in the southern part of the state. However, most of the historic fire events in the southern part are just prescribed burns performed to maintain the forest and agricultural production, but they are still documented as wildfires in the archives. The southern part of Arkansas is known as the West Gulf Coastal Plain. It has a gentle slope with rich soil properties and includes extensive areas of loblolly shortleaf pine forests and agricultural fields. Forest harvesting related to hardwood products is predominant in this plain. the full range of fire density values into equal-size subranges. Ultimately, the produced five categories were integrated with the fire inventory of Arkansas to create a fire likeli hood map for the entire state (Figure 7), assuming that the predicted areas in the highes fire density category are highly susceptible to wildfire and vice versa. Our results indicate that the Ouachita National Forest and the Ozark Forest are highly susceptible to wildfire while the eastern side of the state is less susceptible to wildfire, and the southern part o the state is moderately susceptible to wildfire. The Ouachita National Forest is close to Little Rock, which is densely populated in central Arkansas, putting this area at high risk Northwest Arkansas, which is also highly populated, is close to the Ozark Forest, a high risk zone for wildfires. It is noteworthy that most of the historic fire events coincide with the predicted high susceptibility, which adds more confidence to the established model Figure 7 shows a strong correlation between the predicted high and high-medium sus ceptible areas and the documented fires, while it shows weak correlation in the southern part of the state. However, most of the historic fire events in the southern part are jus prescribed burns performed to maintain the forest and agricultural production, but they are still documented as wildfires in the archives. The southern part of Arkansas is known as the West Gulf Coastal Plain. It has a gentle slope with rich soil properties and include extensive areas of loblolly shortleaf pine forests and agricultural fields. Forest harvesting related to hardwood products is predominant in this plain.

Conclusions
Remotely sensed data of moderate spatial resolution (30 m) are capable of elucidating the relationship between the historic fire events and the identified explanatory variable with an acceptable statistical significance. Remote-sensing-based RF is precise and reliable in predicting fire susceptibility beyond the training area at the state level. With highe

Conclusions
Remotely sensed data of moderate spatial resolution (30 m) are capable of elucidating the relationship between the historic fire events and the identified explanatory variables with an acceptable statistical significance. Remote-sensing-based RF is precise and reliable in predicting fire susceptibility beyond the training area at the state level. With higher confidence and accuracy, RF overcomes the limitations of conventional statistical methods in predicting fire vulnerability. This study demonstrates that RF can capture both human influences and contemporary climate changes in fire susceptibility modeling. Among the 15 investigated variables, potential evapotranspiration, soil moisture, Palmer drought severity index, and dry season precipitation were found to be the most significant factors contributing to fire density in Arkansas. The training and validation processes provided significant statistical outputs, indicating high precision in the RF prediction of fire vulnerability. RF outperformed OLS and GWR with an adjusted R-squared value exceeding 0.9. The adjusted R-squared values for OLS and GWR analyses are 0.51 and 0.89, respectively.
The cross checking of the predicted fire susceptibility categories with the documented fire events revealed strong correlations, which highlights the prediction capability of RF that incorporates both physical and human factors. Our modeling results also indicate a strong relationship between fire density and the explanatory variables used in the analysis. Although the relationship between fire density and the contributing factors may vary by location, the model still can be used to identify the main sources of fire. Fire prevention and planning can benefit from these results to better prepare for future fire events and minimize loss by taking necessary precaution measures in a given area according to its local fire drivers. Funding: This research was funded by the United States Geological Survey (USGS) and AmericaView, grant number GR013384UAF, awarded to Mohamed H. Aly. The APC was fully waived for this invited paper.