Multi-Temporal Analysis of Forest Fire Probability Using Socio-Economic and Environmental Variables

As most of the forest fires in South Korea are related to human activity, socio-economic factors are critical in estimating their probability. To estimate and analyze how human activity is influencing forest fire probability, this study considered not only environmental factors such as precipitation, elevation, topographic wetness index, and forest type, but also socio-economic factors such as population density and distance from urban area. The machine learning Maximum Entropy (Maxent) and Random Forest models were used to predict and analyze the spatial distribution of forest fire probability in South Korea. The model performance was evaluated using the receiver operating characteristic (ROC) curve method, and models’ outputs were compared based on the area under the ROC curve (AUC). In addition, a multi-temporal analysis was conducted to determine the relationships between forest fire probability and socio-economic or environmental changes from the 1980s to the 2000s. The analysis revealed that the spatial distribution was concentrated in or around cities, and the probability had a strong correlation with variables related to human activity and accessibility over the decades. The AUC values for validation were higher in the Random Forest result compared to the Maxent result throughout the decades. Our findings can be useful for developing preventive measures for forest fire risk reduction considering socio-economic development and environmental conditions.


Introduction
Analysis of forest fire probability is important in disaster risk reduction (DRR) because it provides means for preventing and managing forest fires. The most direct cause of forest fires that occur in South Korea, where approximately 65% of the land is covered by forest, is human activity [1-3]. Most of these forest fires, caused by human negligence, waste incineration, stubble burning, and discarded cigarettes, are considered to be accidental. While human activity directly causes most forest fires in South Korea, climatic, meteorological and environmental conditions cannot be disregarded, as they contribute to the ignition, combustion, and spread of accidental forest fires [4,5]. According to the Korea Forest Service (KFS), approximately 57% of the forest fires occurred between March and May during the period 1974-2017, and an average of 37% of the forest fires are reported to have resulted from human negligence, followed by stubble burning (17%), waste incineration (14%) and discarded cigarettes (5%) [6,7]. Therefore, cautionary periods are generally announced in Korea for the spring and fall, when relatively low humidity and high temperatures prevail, to enhance forest fire prevention [6,7].
Since the Korean War, which lasted from 1950 to 1953, South Korea has experienced rapid economic growth, especially since the 1970s, a phenomenon known as the "Miracle on the Han River" [8]. Along with the socio-economic development, which can be explained by the growth in the gross domestic product (GDP) and GDP per capita, the urban population has increased as urban space has expanded, particularly during the 1990s (Figure 1) [9]. As most forest fires are caused by human activity, increased forest fire occurrence can be a consequence of a higher urbanization rate. Therefore, it was necessary for this study to include socio-economic factors, as well as environmental factors, in developing an effective forest fire probability model.
Figure 1. Socio-economic development of South Korea as described by urban population, GDP, GDP per capita and number of national park visitors. Data are from the Korean Statistical Information Service (KOSIS) and the Korea Forest Service (KFS) [6,7].
Socio-economic variables are among the main factors contributing to the occurrence of forest fires in South Korea, but modeling these factors spatially and temporally has often been considered unimportant and challenging [10-12]. However, numerous studies worldwide have included both environmental and socio-economic factors to predict forest fires, focusing on the spatial pattern of human-caused fires and using various statistical methods, including Generalized Linear Models (GLMs) and Generalized Linear Mixed Models (GLMMs) [1,13-18]. The temporal scales of these studies range from daily to seasonal and longer periods.
Machine learning tools have also been used to predict the probability of wildfire occurrence considering both factors using models such as Maximum Entropy (Maxent) and Random Forest [19-21]. Machine learning algorithms, which can be referred to as nonparametric models, use iterative training with a random data subset [22]. Maxent is known as a non-linear regression model and was originally designed to predict the spatial distribution of species using point locations and layers [23,24]. It has been applied to fire ignition probability in several studies, because fire ignition distribution can be considered a form of species distribution, and has obtained fairly good results [19-21,25,26]. Random Forest is also a nonparametric model and is based on ensemble techniques for classification and regression trees [27-30]. The model has been applied to estimate fire probability and maps in several studies, and achieved good accuracy [20,29-31].
In this study, multiple socio-economic factors that could influence forest fires are included in the analysis to predict and analyze the spatial distribution of forest fire probability in South Korea. The aims of this study are as follows: (1) to predict and analyze the spatial distribution using both Maxent and Random Forest models based on a multi-temporal analysis; (2) to compare the results of the models; and (3) to determine the relationships between forest fire probability and socio-economic or environmental changes from the 1980s to the 2000s.

Study Site
The study area is South Korea, the southern part of the Korean Peninsula, with a total land area estimated at 99,720 km² (Figure 2). South Korea reforested its degraded and devastated forest landscape over a short period of time following the Korean War [32,33]. Currently, following the country's enormous reforestation efforts, which have been carried out since the 1970s, 65% of the land is covered by dense forest, of which 37% is coniferous, 32% is deciduous, and 27% is mixed [2]. Thus, forest fires are likely to occur on a large scale because the land is primarily covered by coniferous forest, which is known to be susceptible to fire [34-36]. From 2008 to 2017, an average of 421 forest fires occurred per year, resulting in an average loss of 5989 ha of forest per fire and 6,326,285 ha of forest in total [6,7,37].
South Korea has a temperate climate with four distinct seasons that are greatly influenced by prevailing winds [38]. Temperature and humidity are high during the summer due to the heavy rainfall, whereas temperature and humidity are low during the winter [39,40]. The mean annual temperature of South Korea is 10 °C to 15 °C, while the average August temperature is 23 °C to 26 °C and the average January temperature is −6 °C to −3 °C. About 50-60% of rainfall is concentrated in summer and the average annual precipitation is around 1200 mm. The average humidity of July and August is between 70% and 85%, while that of March and April is between 50% and 70% [41]. Spring and fall are transitional periods, when the temperatures are mild, with less rainfall [42]. However, as the temperature increases with the low humidity during the spring, the forest fire probability also increases [43].

Forest Fire Occurrence Data
The KFS offers locational forest fire occurrence data based on field surveys [6,7]. These data include the area, extent, cause, date, and geographical location of forest fires. From the 1980s to the 2000s, there was an increase in the number of forest fire occurrences (Figure 3). The total fire occurrences were 1291, 3279, and 5196 during the 1980s, 1990s, and 2000s, respectively (Figure 4). In the analysis using Maxent, we used only the original location data provided by the KFS, because the Maxent model does not need absence data. In contrast, for the analysis using Random Forest, we created absence data as random points and combined them with the original location data, as required for the analysis [29]. The absence points in this study are individual points randomly distributed in fire-free sites.
The fire locations dataset is commonly partitioned into two subsets: training and validation. Based on the literature, using 70% of the fire data for model training is common practice, with the remainder set aside to assess the accuracy of the models' predictions [44]. The non-fire locations (i.e., absence points) were also randomly split in a 70/30 ratio for calibration of the Random Forest model and for validation [29,45].
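The 70/30 partition described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function name `split_points`, the random seeds, the synthetic coordinates, and the choice of as many absence points as fire points are all assumptions made for the example. Only the count of 1291 fires (1980s) comes from the text.

```python
import random

def split_points(points, train_fraction=0.7, seed=0):
    """Randomly partition a list of points into training and validation subsets."""
    rng = random.Random(seed)
    shuffled = list(points)          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Synthetic (x, y) coordinates standing in for real locations (assumption).
rng = random.Random(42)
fire_points = [(rng.random(), rng.random()) for _ in range(1291)]
# Assumed for illustration: one random absence point per fire point.
absence_points = [(rng.random(), rng.random()) for _ in range(1291)]

# Presence and absence points are split separately, each at 70/30.
fire_train, fire_valid = split_points(fire_points, seed=1)
abs_train, abs_valid = split_points(absence_points, seed=2)
```

Splitting presence and absence points separately keeps the same 70/30 ratio in both classes, which matches the description of the absence points being "also randomly split".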

Socio-Economic and Environmental Factors
Previous researchers have used several different factors to model forest fire susceptibility [1,12-21,25,26,28,29,31,44-46]. To analyze the socio-economic and environmental impact on forest fire probability, several variables that are highly correlated with forest fire occurrence, such as slope, elevation, aspect, population density, and distance to roads, were selected based on the literature review for inclusion in the model [1,12-21,28,29,31,47,48] (Table 1, Figure 5). Environmental data included forest type (forestype), elevation (elev), topographic wetness index (TWI), precipitation during spring (prcp-spr), average SPI-6 during spring (SPI-spr), and fire weather index (FWI). A forest type map, provided by the KFS in a vector format, was classified into four categories: coniferous, deciduous, mixed forest, and non-forest. The coniferous category includes artificial coniferous forest, Larix kaempferi, Pinus densiflora artificial forest, Pinus densiflora, Pinus koraiensis, and Pinus rigida, while the deciduous category includes artificial boreal forest, boreal forest, Castanea crenata, and Quercus. Elevation was extracted from a digital elevation model (DEM) at 30-km spatial resolution provided by the Ministry of Land, Infrastructure and Transport (MOLIT). TWI, which characterizes steady-state soil wetness, was calculated using the relevant equation [49].
For meteorological data, we gathered the observed data of the Automated Synoptic Observing System from the Korea Meteorological Administration and interpolated them using the inverse distance weighted (IDW) method. To relate the geographical features to precipitation, the precipitation lapse rate reflecting elevation was applied to the dataset [50,51]. The 6-month Standardized Precipitation Index (SPI-6), which is widely used to detect meteorological drought with monthly precipitation data, was calculated using the R package 'SPIGA' [52,53]. This index measures rainfall conditions over a 6-month period, and it can reflect the amount of antecedent precipitation, which is also correlated with fire occurrence [54]. Meteorological variables during the spring were considered in the study because forest fires in South Korea historically occur primarily during the spring; high precipitation during the summer affects the annual average and makes the probability analysis more challenging [6,7]. The spring in this study was defined as the months of March, April, and May [55]. The fire weather index (FWI) was also calculated from the observed data using the R package 'fwi.fbp' and interpolated using the IDW method [56].
The socio-economic variables included the population density (pop), number of national park visitors (visitors), and distance from urban area (urban). The primary data of socio-economic variables, predominantly statistical data, were transformed into spatial data using ArcMap 10.4.1. The population data were given as statistical but non-spatial data, and the administrative boundaries were given as polygon data at the municipal level. The population was spatially joined to the polygon data, and the population density was then calculated within the boundaries. The number of national park visitors was also given as statistical data and was spatially joined to the national park boundaries. The rest of the area in this study was interpolated using the IDW method. Distance from the given area to an urban area was calculated using the Euclidean distance. We used the land cover map derived from the Landsat 1, 2 MSS, 5 TM, and 7 ETM provided by BIZ-GIS to calculate the distance between a given grid point and an urban area [57].
For this study, all the input variables were set to a spatial resolution of 1 km, in an ASCII grid format for Maxent and in a point format for Random Forest. In addition, as the study aims to produce a higher-resolution map of forest fire probability than those of previous studies, we resampled both the coarser-resolution and the higher-resolution datasets to 1 km, because we considered that downscaling would not change the properties of the coarser data but would allow us to keep the details in the high-resolution data.
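The inverse distance weighted (IDW) interpolation applied to the station observations and the visitor data can be sketched as below. This is an illustrative sketch only: the study performed the interpolation in GIS software, and the function name `idw`, the distance exponent of 2, and the three example stations are assumptions, not values taken from the study.

```python
import math

def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at location (x, y).

    stations: iterable of (sx, sy, value) tuples.
    power: distance exponent; 2 is a common default and is assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value             # exactly on a station: use its observation
        w = 1.0 / d ** power         # closer stations receive larger weights
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical stations: (x in km, y in km, spring precipitation in mm).
stations = [(0.0, 0.0, 200.0), (10.0, 0.0, 260.0), (0.0, 10.0, 230.0)]
midpoint = idw(stations, 5.0, 0.0)   # estimate halfway between the first two stations
```

Because the weights are positive and normalized, every estimate lies between the smallest and largest observed values, so IDW smooths between stations but cannot extrapolate beyond the observed range.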

Maximum Entropy Model (Maxent)
Maxent is a machine learning tool primarily used to predict the current and potential spatial distribution of species by evaluating the contrasts between point locations and layers. Maxent uses presence-only data to estimate the probability distribution or habitat suitability of species based on the theory of maximum entropy. The model predicts the distribution subject to constraints derived from the input variables, which can be continuous or categorical [22,58,59].
Maxent generates a probability of presence varying from 0 to 1 and the response curves for each variable through model training. In addition, the results of the model run include the area under the receiver operating characteristic curve (AUC). AUC values are frequently used as a measure of model performance and accuracy. The range of an AUC value is from 0.0 to 1.0, and a value of 0.5 indicates that the model performance is no better than random, while values nearer 1.0 indicate better model performance. A model generating fair or good predictions has an AUC value greater than 0.7 [60,61].
Moreover, Maxent is known to achieve high statistical accuracy [22-25,62-64]. It has obtained fairly good results in predicting forest fires in several studies because fire occurrence can be considered as a species distribution [18-20,24,25]. In this study, Maxent version 3.4.1 was used to estimate forest fire probability in South Korea at a 1-km spatial resolution. The output format was set to logistic to obtain probability values between 0 and 1, 30% of the forest fire location points were randomly withheld for validation, the maximum number of iterations was set to 5000, and 10 replicate runs were conducted with bootstrapping to reduce the probabilistic uncertainty.

Random Forest Model
Random Forest is an algorithm based on ensemble techniques for classification and regression trees [29-31]. The model grows decision trees on randomly selected bootstrap samples, selecting a random subset of explanatory variables at every node, and obtains a prediction from each tree. The final outcome of the model is the average of the results of all the trees [65]. When running the Random Forest model, it is necessary to define the number of variables randomly sampled as candidates at each node (mtry) and the number of trees to be grown in the forest (ntree). The model is known to have low bias and low variance because the result is averaged over a large number of trees.
For each tree, the Random Forest model leaves about one-third of the samples out of the bootstrap sample and uses them for validation, which yields an unbiased estimate of the generalization error. The proportion of misclassifications (%) over all out-of-bag elements is called the out-of-bag (OOB) error. The OOB error can be used to evaluate the model performance without a separate test set [66,67].
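The "about one-third" figure follows from bootstrap sampling: when n samples are drawn with replacement from n, a given sample is never drawn with probability (1 - 1/n)^n, which approaches e^-1 (roughly 0.368) for large n. The short simulation below checks this. The function name `oob_fraction`, the sample size of 1000, and the 200 bootstrap samples are assumptions chosen for illustration, and no trees are actually fitted.

```python
import random

def oob_fraction(n_samples, n_trees=200, seed=0):
    """Average share of samples that a bootstrap sample of size n_samples leaves out."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        # Indices drawn with replacement; the set keeps the distinct ones.
        drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
        total += 1.0 - len(drawn) / n_samples
    return total / n_trees

simulated = oob_fraction(1000)
# Probability that one given sample is never drawn in 1000 draws.
theoretical = (1.0 - 1.0 / 1000) ** 1000
```

In a fitted forest, each sample is predicted only by the trees whose bootstrap sample excluded it, and the OOB error is the misclassification rate of those predictions.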
In the calibration phase, the Random Forest model is fitted to analyze the relationship between the independent variables and the dependent variable, and the weight of each factor is reported as variable importance [68]. In this study, the R packages 'randomForest' and 'sdm' were used [69,70]. The forest fire occurrence data were used as the dependent variable, and forest type, elevation, topographic wetness index, precipitation during spring, SPI-6 during spring, fire weather index, population density, number of national park visitors, and distance from urban area were used as independent variables.

Model Performance
Evaluating model performance is a necessary part of the modeling process. For this purpose, we used validation data sets that were not used in the model training step. In this study, the predictive performance of the models was evaluated by applying the most common threshold-independent method, the receiver operating characteristic (ROC) curve [71]. The ROC curve is drawn by plotting, for all thresholds, the sensitivity on the vertical axis against the proportion of false positives (1 − specificity) on the horizontal axis. Sensitivity is the fraction of fire occurrences (i.e., positive points) that are correctly predicted, while "1 − specificity" is the fraction of non-fire locations (i.e., negative points) that are incorrectly predicted as fire occurrences [74]. The area under the ROC curve (AUC) has been considered to be a quantitative performance metric [29,45,46]. An AUC value of 1 indicates a perfect prediction, whereas an AUC of 0.5 or below indicates performance no better than random [72]. Model performance based on the AUC metric can be classified as follows: 50-60% (poor), 60-70% (moderate), 70-80% (good), 80-90% (very good), and 90-100% (excellent) [73].
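The construction of the ROC curve and the AUC can be sketched as follows. This is a minimal illustration, not the evaluation code used in the study: the function names `roc_points` and `auc` and the eight labelled validation points are assumptions. The curve is built by lowering the threshold step by step, and the area is integrated with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs, one per score threshold."""
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Points scoring at or above the threshold are predicted as fire.
        tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Hypothetical validation points: 1 = fire, 0 = non-fire, with predicted probabilities.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
example_auc = auc(labels, scores)
```

The AUC equals the probability that a randomly chosen fire point receives a higher score than a randomly chosen non-fire point. In this example, 15 of the 16 fire/non-fire pairs are ranked correctly.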

Maxent Results
For the Maxent analysis, the average forest fire probability was 0.421, 0.464, and 0.461 during the 1980s, 1990s, and 2000s, respectively. The analysis revealed that the average forest fire probability increased until the 1990s and then slightly decreased during the 2000s.
With regard to the spatial distribution, the probability of forest fires was higher in urban areas and the eastern coastal area, where the population has access to low-elevation forests (Figure 6). Over the decades, the probability has been concentrated in South Korea's largest cities, Seoul, Busan, Daejeon, and Gwangju.
The Maxent results show the percent contribution of each input variable to forest fire probability. The contribution of pop was substantial during all decades (Table 2). Elev also had a high percent contribution across all time periods. Other than these variables, urban accounted for a significant percentage during the 1980s and the 1990s, and TWI during the 2000s. Additionally, the percent contribution of climate variables such as prcp-spr, SPI-spr, and FWI showed either a decreasing trend or little importance over time.
Maxent produced response curves that show how the predicted forest fire probability changes across each variable. Each variable showed a similar correlation with forest fire probability (Figure 7); the red curves show the average response of the replicates. Maxent predicted a significant negative correlation between elev and forest fire probability. TWI, which is related to soil moisture, also showed a strong negative correlation in both analyses and across all decades. Additionally, forestype was a variable with only intermediate importance, based on its percent contribution, and there was little difference, less than 10%, between the categories of forestype. However, the probability was the highest in coniferous forest during the entire period.
Prcp-spr generally showed a negative correlation. On the other hand, SPI-spr exhibited inconsistent results, as it showed both negative and positive correlations. FWI showed a negative correlation during the 1980s but a positive correlation during the 2000s. Taken together, the analysis supports the conjecture that lower rainfall can contribute to the occurrence of forest fires.
Pop showed a positive correlation during the entire period of the Maxent analysis. This indicates that forest fire probability increases as population density increases. However, visitors did not have a significant correlation with the probability. In addition, the forest fire probability slightly decreased as the distance from urban area (urban) increased.

Random Forest Results
For the Random Forest analysis, the average forest fire probabilities, which are the average raster values for the study area, were 0.617, 0.628, and 0.618 during the 1980s, 1990s, and 2000s, respectively. The analysis also revealed that the average forest fire probability increased until the 1990s and then decreased during the 2000s.
With regard to spatial distribution, there were some differences over time, but the Random Forest analysis showed a high forest fire probability in and around urban areas and in the eastern coastal area (Figure 8). Over the decades, forest fire probability has been concentrated in South Korea's largest cities, Seoul, Busan, Daejeon, and Gwangju.
The Random Forest results show the importance of each input variable to forest fire probability. The results indicate the significance of pop and elev across all decades (Table 3). Other than these variables, SPI-spr accounted for a significant percentage during the 1980s and the 2000s, and urban during the 2000s.

Comparison and Validation of the Models
Both the Maxent and Random Forest results showed that the spatial distribution of forest fires in South Korea is concentrated around cities and the eastern coastal area. This concentration of probability increased over the decades. Although the Random Forest results showed a higher average probability than the Maxent results, the overall spatial distribution of forest fire was similar. In addition, the average probability was highest during the 1990s in both analyses.
In terms of variable importance, both the Maxent and Random Forest results showed a high contribution of pop and elev to the forest fire probability. In particular, pop was the most important variable in both the Maxent and Random Forest analyses across all decades. The contribution of pop was highest during the 1990s in both analyses. Elev was also one of the significant variables across all decades in both analyses. The Maxent result showed little significance for the climate variables, particularly those related to precipitation, whereas the Random Forest result assigned considerable importance to SPI-spr.
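To illustrate what a variable importance score expresses, the following minimal sketch computes a permutation-based importance: the drop in accuracy when one input column is shuffled. This section does not state which importance measure the Random Forest analysis used, so permutation importance is an assumed stand-in, and the toy data, the rule-based `toy_model`, and all function names are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the model prediction matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(labels)


def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one input column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        permuted = [
            row[:column] + [value] + row[column + 1:]
            for row, value in zip(rows, shuffled)
        ]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical toy data: column 0 mimics population density, column 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.uniform(0, 1000), data_rng.uniform(0, 1)] for _ in range(200)]
labels = [1 if pop > 500 else 0 for pop, _ in rows]


def toy_model(row):
    """Stand-in for a fitted classifier; it only looks at column 0."""
    return 1 if row[0] > 500 else 0


imp_pop = permutation_importance(toy_model, rows, labels, column=0)
imp_noise = permutation_importance(toy_model, rows, labels, column=1)
print(f"importance of pop-like column: {imp_pop:.3f}")
print(f"importance of noise column:    {imp_noise:.3f}")
```

In this sketch a variable that the model relies on loses accuracy when shuffled, whereas an ignored variable scores zero, which is the sense in which pop and elev are described as important above.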
In this study, AUC values were used to measure the model performance and accuracy. The results showed relatively high statistical accuracy, considering that forest fire occurrence in South Korea is primarily a result of human activities and that it was simulated at a 1-km spatial resolution on a national level. The AUC values of fire probability were 0.753, 0.652, and 0.636 during the 1980s, 1990s, and 2000s, respectively, for Maxent, whereas for Random Forest they were 0.909, 0.898, and 0.906, respectively. Therefore, the Random Forest model performs better than the Maxent model in terms of accuracy in estimating South Korean forest fire probability.
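As a reminder of what the AUC values above measure, the sketch below computes AUC as the probability that a randomly chosen fire location receives a higher predicted probability than a randomly chosen non-fire location, with ties counted as one half. The scores and labels are invented for illustration and the function name is hypothetical; the study's own AUC values came from its model outputs, not from this code.

```python
def auc_from_scores(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: predicted fire probability for six grid cells,
# where label 1 marks a cell with an observed fire.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = auc_from_scores(scores, labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the Maxent and Random Forest values are compared.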

Impact of Socio-Economic and Environmental Drivers on Forest Fires
As South Korea has grown rapidly, urban sprawl and the concentration of the urban population have increased significantly alongside the country's socio-economic development from the 1980s to the 2000s, particularly during the 1990s [75]. During this period of rapid development, there has been an increasing trend in the occurrence of forest fires. In this study, we attempted to determine the relationship between socio-economic development and forest fire probability in South Korea.
As most of the forest fires in South Korea are reported to be a result of human activity, population density and forest elevation can help explain the results of the study. A higher population concentration and lower forest elevation can signify a higher presence of human activity [76]. These variables are considered relevant to human activity because higher population density leads to more human activity and forests at higher elevations are less accessible.
This study confirmed that forest fire probability has a strong correlation with human-related variables such as population density, especially during the 1990s, and elevation. In both the Maxent and Random Forest results, the percent contribution or variable importance of population density and elevation was consistently high throughout the entire study period. However, the percent contribution and importance of population density were highest during the 1990s, when there was significant urban sprawl. In terms of spatial distribution, forest fires in South Korea were found mostly in urban areas and in the eastern coastal area in both analyses.
In line with this, the distance from urban area had a fairly high significance with regard to fire probability. The significance decreased over the decades in both analyses, and we conjecture that the sudden urbanization throughout the country reduced the importance of distance from urban area over time.
Among the environmental variables, TWI also showed a distinct contribution in both analyses across all decades and a negative correlation with the probability in the Maxent result. As TWI is an indicator of long-term soil moisture availability, it is assumed that the forest fire probability will be higher in dry soil conditions. Also, in the Maxent analysis, precipitation during spring was generally negatively correlated with the probability.
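The link between TWI and soil moisture can be made concrete with the commonly used definition TWI = ln(a / tan β), where a is the upslope contributing area per unit contour length and β is the local slope. This section does not state the formula or software used to derive TWI, so the definition, the function name, the parameter names, and the minimum-slope guard below are assumptions for illustration only.

```python
import math


def topographic_wetness_index(specific_catchment_area_m, slope_deg, min_slope_deg=0.1):
    """TWI = ln(a / tan(beta)) under the commonly used definition.

    specific_catchment_area_m: upslope area per unit contour length (m).
    slope_deg: local slope in degrees.
    min_slope_deg: assumed lower bound that avoids division by zero on flat cells.
    """
    slope_rad = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area_m / math.tan(slope_rad))


# Hypothetical cells: a gentle valley bottom versus a steep upper slope.
twi_valley = topographic_wetness_index(1000.0, 2.0)
twi_ridge = topographic_wetness_index(50.0, 30.0)
print(f"valley TWI = {twi_valley:.2f}, ridge TWI = {twi_ridge:.2f}")
```

Under this definition, steep cells with little upslope area receive low TWI values, which is consistent with the negative correlation between TWI and fire probability reported for the Maxent result.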
There is a less than 10% difference between forest types in the Maxent analysis, but in most cases coniferous forest shows a slightly stronger correlation with fire probability. This may be because the Japanese red pine (Pinus densiflora), a coniferous tree that burns readily, is the dominant tree species in South Korea. Hence, as the dominant species is coniferous, the forest has been susceptible to fire and the amount of available fuel is high during the spring [77-79].

Machine Learning Models with Regard to Spatial Distribution and Accuracy
In terms of spatial distribution, the forest fire probability is more concentrated in or around urban areas over time in both the Maxent and Random Forest analyses. This reflects a lower range of spatial variability in fire probability but indicates that fires occur more frequently near cities.
Forest fires in South Korea, which are mostly caused by human activity, show lower spatial autocorrelation than other disasters or natural forest fires [80]. This has reduced the prediction accuracy of models, but we obtained statistically significant results by applying machine learning models to forest fire prediction. The AUC value decreased over the decades in the Maxent analysis. This is because the samples become larger over the decades, which can violate the assumption of independent observations due to spatial autocorrelation [18,81]. However, in the Random Forest analysis, the AUC value was highest during the 1980s and lowest during the 1990s. The value did not decline consistently over the decades, despite the larger samples. This suggests that the Random Forest model is better able to deal with spatial autocorrelation issues with larger samples [82,83].
The overall spatial distributions of the two model results and the importance of the main variables were similar. However, the Random Forest model obtained higher AUC values, while its forest fire probability was higher overall, or overestimated, compared to the Maxent result. In the case of the Maxent result, the AUC values were relatively low, but the fire probability was estimated adequately and the model provided a response analysis of each variable. By using both the Maxent model and the Random Forest model, it is possible to address the issues of overestimation and prediction accuracy, to obtain the spatial distribution of forest fire, and to verify the impact of socio-economic drivers on forest fires.

Limitations and Uncertainty
The forest fire occurrence data used in this study was based on field surveys from the Korea Forest Service. The agency uses the field data to document forest fire occurrence, but understanding the potential uncertainty and data reliability is necessary. Fire detection with satellite imagery is considered one option based on scientific studies. A comparison between field data and satellite-derived data, together with an analysis of the errors, is suggested for a future study.
Most of the socio-economic source data used in this study were not created for the purpose of spatial analysis. However, the data had to be used as variables in order to demonstrate the relationships between forest fire probability and socio-economic changes, so they were transformed into grid maps. This may create data uncertainty, which can lead to model uncertainty.
Due to the absence of some data, some of the variables were interpolated. Interpolation is useful for filling gaps where values are missing, but it can also affect the accuracy of models because of the uncertainty it introduces. In this study, the observed climate data was interpolated into grid maps to derive prcp-spr, SPI-spr, and FWI. In the Maxent result, SPI-spr and FWI showed an inconsistent correlation with fire probability. This may be due to the uncertainty from interpolation. Socio-economic data was interpolated as well as climate data. The number of visitors to national parks was included to represent human access to forest areas. It was used as the best available alternative because no existing data count visitors to all forest areas. By interpolating, it was possible to approximately estimate the number of people visiting forest areas from the 1980s onward. As a result, the number of national park visitors showed an insignificant correlation with fire probability. This might be explained by the fact that the source data was limited to the number of visitors to national parks.
To make comparisons among the decades, only the variables that were available or were created since the 1980s could be used. Available datasets have increased since then, but they could not be used for comparisons across the whole period. For example, road density from a node and link dataset can be used to represent urbanization, but unfortunately, this was only available from 2008. Thus, distance from urban area was used instead to explain the effect of human accessibility on fire occurrence.
As society becomes more complex, future studies need to incorporate more advanced socio-economic variables and different model approaches in order to gain new insights and further reduce uncertainties. For instance, south-facing slopes in forests present ideal conditions for fire outbreaks due to the low humidity and soil moisture that result from an abundance of sunlight. These conditions also encourage the growth of pine trees, which, in turn, leads to a larger accumulation of fuel for a given fire [48]. More than 50% of the forest fires studied occurred on south-facing slopes according to the '2011 Forest Disaster White Paper' published by the National Institute of Forest Science, which included an investigation of forest fire characteristics from 2007 to 2011 [84]. However, slope aspect was not considered among the input variables in this study, because the model was simulated at a spatial resolution of 1 km, which is too large an area to be described by a single aspect per grid cell. Thus, slope aspect should be included in a future study with downscaled model output.

Towards Forest Fire Risk Reduction
This study is meaningful as it considers both socio-economic and environmental variables in the model and shows the linkage between socio-economic development and forest fire probability. For both analyses, the average forest fire probability was highest during the 1990s, exceeding even the 2000s, despite ongoing socio-economic development. We assume that this is a result of increased and more advanced forest management during the 2000s. The KFS established and implemented the Basic Plan for the Prevention of Forest Fires (2006-2010) after a large-scale forest fire in 2000 and continued to expand the budget to prevent forest fires across the country [85].
The results indicate that effectively managing human activity is important in preventing forest fire occurrence and that socio-economic factors are increasingly relevant to both the cause and the prevention of forest fires. Additionally, through analyses of forest fire probability and the identification of key input variables, potential forest fire occurrences can be decreased, and sustainable forest management can be achieved [80,86,87]. Probability maps can be utilized to reduce disaster risk in fire-prone areas. Several measures can be used to manage forests more sustainably and effectively, including controlling forest density to mitigate water competition and fire spreading, increasing species diversity to diversify burn severity, selecting tree species to mitigate climate change and potential fire risk, establishing fire safety standards and preventive measures in populated areas, and monitoring fire-prone areas to quickly respond to forest fire occurrences [88][89][90][91][92][93][94][95]. Because forest fires are, to some extent, preventable disasters, we suggest that it is critical to implement preventive and precautionary measures in a timely fashion, especially measures related to human activity in forests close to urban areas.

Conclusions
South Korea has rapidly urbanized and has experienced an increasing trend in forest fire occurrence. By modeling forest fire probability for three separate periods using the machine learning algorithms Maxent and Random Forest, this study simulated the forest fire probability and identified the impact of socio-economic changes, such as urban sprawl, on fire probability.
The average forest fire probability was highest during the 1990s, and the probability of forest fires was high in urban areas and the eastern coastal area in both analyses. The variables with the highest percent contribution were pop and elev across all decades. In particular, population density was found to be the most significant and positively correlated variable for fire occurrence in this study, particularly during the 1990s.
The results showed relatively high statistical accuracy, given that forest fires in South Korea primarily result from human activity that is fairly unpredictable, and that a 1-km spatial resolution was used for the analysis. The accuracy of the model was higher in the Random Forest result compared to that of the Maxent, but the average forest fire probability was overestimated in the Random Forest model.
This study shows that over the decades, the spatial distribution of fire probability has become more and more concentrated in or around cities, and that the forest fire probability has a strong correlation with human-related variables over time. This indicates that it is important to implement preventative and preparedness measures in forest fire management to reduce the occurrence and impact of forest fires in South Korea.

Figure 1. Socio-economic development of South Korea as described by urban population, GDP, GDP per capita, and number of national park visitors. Data are from the Korean Statistical Information Service (KOSIS) and the Korea Forest Service (KFS) [6,7].

Figure 2. Study area and its land cover map (2007) from the Ministry of Environment of the Republic of Korea.

Figure 3. Number of annual forest fire occurrences from 1980 to 2009 based on the point location data. Data from the KFS.

Figure 5. Maps of environmental and socio-economic input variables for the 1980s, 1990s, and 2000s.

Figure 7. Response curves illustrating the relationship between forest fire probability and the input variables during the 1980s, 1990s, and 2000s (Maxent analysis).

Table 1. Input variables used to estimate forest fire probability.

Table 2. Percent contribution of each input variable to forest fire probability during the 1980s, 1990s, and 2000s (Maxent analysis).
Note. Bold indicates significance.

Table 3. Variable importance of each input variable to forest fire probability during the 1980s, 1990s, and 2000s (Random Forest analysis).
Note. Bold indicates significance.
