Developing a New Hourly Forest Fire Risk Index Based on Catboost in South Korea

: Forest ﬁres can cause enormous damage, such as deforestation and environmental pollution, even with a single occurrence. It takes a lot of e ﬀ ort and long time to restore areas damaged by wildﬁres. Therefore, it is crucial to know the forest ﬁre risk of a region to appropriately prepare and respond to such disastrous events. The purpose of this study is to develop an hourly forest ﬁre risk index (HFRI) with 1 km spatial resolution using accessibility, fuel, time, and weather factors based on Catboost machine learning over South Korea. HFRI was calculated through an ensemble model that combined an integrated model using all factors and a meteorological model using weather factors only. To conﬁrm the generalized performance of the proposed model, all forest ﬁres that occurred from 2014 to 2019 were validated using the receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) values through one-year-out cross-validation. The AUC value of HFRI ensemble model was 0.8434, higher than the meteorological model. HFRI was compared with the modiﬁed version of Fine Fuel Moisture Code (FFMC) used in the Canadian Forest Fire Danger Rating Systems and Daily Weather Index (DWI), South Korea’s current forest ﬁre risk index. When compared to DWI and the revised FFMC, HFRI enabled a more spatially detailed and seasonally stable forest ﬁre risk simulation. In addition, the feature contribution to the forest ﬁre risk prediction was analyzed through the Shapley Additive exPlanations (SHAP) value of Catboost. The contributing variables were in the order of relative humidity, elevation, road density, and population density. It was conﬁrmed that the accessibility factors played very important roles in forest ﬁre risk modeling where most forest ﬁres were caused by anthropogenic factors. The interaction between the variables was also examined.


Introduction
Forest fires have a profound impact on ecological processes on the Earth, such as deforestation, habitat destruction, and loss of soil nutrients, as well as the environmental, economic, and social sectors [1][2][3]. It takes a long time and significant effort to restore areas with forest fires to their previous state. Therefore, it is necessary to make efforts to reduce the damage caused by forest fires. One such effort is to identify areas with high forest fire risk in advance to establish proper mitigation and response plans from a management perspective.
In recent years, many studies have also tried to predict where a forest fire is likely to occur [1,[4][5][6]. Anthropogenic, topographic, meteorological, and fuel factors are mainly used to calculate fire risk [7][8][9]. The anthropogenic factors, such as distance from roads, distance from rivers, and urban areas, and topographic factors, such as elevation and slope, have been used for accessibility consideration [10,11].
As meteorological data, temperature, relative humidity, precipitation, and wind speed have been mainly used to consider dry weather conditions prone to fires. These meteorological variables have been typically extracted by interpolating station values or from numerical model data [7,12]. Some studies usually utilized satellite-derived indices to extract fire-prone dry fuel or weather conditions to explain the relationship between input factors and forest fires. However, there are limitations due to cloud contamination and coarse spatial or temporal resolution when using optical satellite sensor systems [4].
To predict forest fire susceptibility and risk, a lot of efforts have been made worldwide. The Canadian Forestry Service has developed the Canadian Forest Fire Weather Index (CFWI) and has been officially conducting forest fire risk forecasting since 1970. CFWI has been widely studied for its feasibility in many countries including Australia, the United Kingdom, Greece, China, and South Korea [5,[13][14][15][16][17]. The Met Office Fire Severity Index (MOSPI) forest fire risk forecasting system provided by the UK is also based on CFWI [18]. CFWI is a weather index that combines fine fuel moisture code (FFMC), duff moisture code (DMC), drought code (DC), initial spread index (ISI), and build up index (BUI) information obtained using weather information such as temperature, relative humidity, wind speed, and precipitation [19]. CFWI has complicated calculation processes and empirical standards (i.e., coefficients for formula), and thus may not fit well for other areas with different weather conditions, requiring additional careful calibration [20][21][22].
The Korea Forest Service's National Forest Research Institute (NFRI) uses weather data-temperature, relative humidity, and wind speed-to calculate real-time Daily Weather Index (DWI) based on logistic regression [23]. NFRI provides DWI showing the degree of forest fire risk by administrative units. DWI is classified into four fire risk classes (low, moderate, high, and very high), providing real-time and 72-h forecasts at the 3-h interval [23,24]. DWI is mainly focused on weather factors without consideration of other accessibility or surface factors. Thus, it might not work well to predict the susceptibility of forest fires that frequently occur due to anthropogenic factors. In particular, the forest fires caused by anthropogenic factors such as incineration of paddy fields, waste incineration, and cigarettes exceed 70% according to historical data from 2010 to 2019 in South Korea. According to Kim et al. [25], human activities and accessibility in South Korea are closely related to the probability of forest fire occurrence. Thus, it is necessary to develop a forest fire risk monitoring model that considers not only weather conditions but also topographic and anthropogenic factors.
In order to simultaneously consider weather, fuel, topography, and anthropogenic factors, it is advantageous to use machine learning. In particular, machine learning has proved successful in applying a multitude of variables for predicting forest fire risk [8,9,26]. Main input variables to the forest fire risk prediction include satellite-derived vegetation indices, anthropogenic factors such as proximity to roads, meteorological factors such as temperature, and hydrological factors such as rainfall [26]. Many studies have modeled forest fire susceptibility based on the binary information with forest fire location or its surrounding areas as 1 and non-fire areas far from forest fires as 0. A variety of machine learning approaches including artificial neural networks, random forest (RF), support vector machines, generalized additive model, and boosted regression trees have been used for forest fire risk modeling [7,8,20,27].
However, many of the studies have used fixed parameters and focused on long-term (i.e., yearly, monthly) fire risk. Nowadays, studies on dynamic models focusing on short-term (i.e., daily, sub-daily) fire risk are becoming active [5]. Since forest fire risk has its diurnal cycle, it is necessary to derive a forest fire risk index at very short time scale (i.e., every hour) to provide practical information for decision making on forest fire management. Thus, there is still room for improvement such as providing forest fire susceptibility and risk hourly or sub-hourly. So far, machine learning-based forest fire susceptibility studies have focused on a few large forest fire cases or are not updated with time (not hourly) [5,28]. When an approach is proposed for fire risk prediction, its transferability should be evaluated on both spatial and temporal domains. Therefore, this study aims to propose a new hourly forest fire risk index (HFRI) at 1 km resolution over South Korea. Specifically, the measurable objectives of this research were to (1) develop a machine learning-based ensemble approach that combines two models with the topographic and anthropogenic factors as well as weather factors, (2) analyze the impact of each factor on HFRI, and (3) to evaluate the transferability of the proposed approach on a temporal domain. As mentioned above, forest fires in South Korea have mainly been caused by anthropogenic factors. Thus, the proposed HFRI that synergistically integrates weather, topographic, and anthropogenic factors is expected to better predict the forest fire risk in South Korea than other risk indices based solely on weather conditions such as DWI.

Study Area
The study area is South Korea where more than 70% of the land is forested areas (Figure 1). While the mean altitude is 448 m, the mean slope is 5.7 • , which is about 2 • steeper than the overall average of 3.9 • in East Asia including Japan, the east part of Mongolia, and China [29]. Most forest fire cases have occurred in the spring season when the weather is dry and warm, and hikers frequently visit forested areas. On the other hand, the number of forest fire occurrences in summer is very low due to the highly concentrated precipitation and relatively high humidity. Among the forest fire events that occurred from 2010 to 2019, 58% occurred in spring, followed by winter (23%) [30]. The frequency and intensity of forest fires continue to increase as on-going climate change tends to increase extreme weather conditions [30]. The study area of South Korea with (a) elevation and (b) land cover data. The elevation data come from ALOS World 3D 30m v3.1 [31]. The land cover data were provided by the Ministry of Environment [32].

Data
In this study, major weather factors, geographical data considering accessibility, fuel data, and time-related data related to the seasonal pattern of forest fires were used as input data to develop forest fire risk models. The input variables are summarized in Table 1. The Digital Forecast is the weather forecast information provided by the Korea Meteorological Administration at 5 km resolution every 3 h based on the Unified Model Regional Data Assimilation and Prediction System numerical forecast model. It forecasts weather conditions hourly for the first two hours and with the three-hour interval after that up to two days (48 h). Digital Forecast provides basic meteorological products including temperature, precipitation, relative humidity, wind direction, and wind speed. The real-time Digital Forecast data are post-processed by Observation Data Analysis Module (ODAM) utilizing the in-situ observations [33]. In this study, the ODAM data were used for forest fire risk modeling. Among the products from Digital Forecast, relative humidity, temperature, and wind speed that affect forest fire occurrences were used in this study [1,34].

Geographical Data
The elevation from the ALOS World 3D 30 m (AW3D30) provides height information above the sea level globally at 1 arcsecond (approximately 30 m) resolution. It was released in May 2016 and updated in April 2020 by the Japan Aerospace Exploration Agency. It was constructed using the Panchromatic Remote-sensing Instrument for Stereo Mapping boarded on the Advanced Land Observing Satellite [35]. Studies have shown that AW3D30 has a better skill core than Shuttle Radar Topography Mission Digital Elevation Model (DEM) or Advanced Spaceborne Thermal Emission and Reflection Radiometer DEM in the Philippines and Mexico that have rugged mountains at high altitudes [35][36][37]. In particular, AW3D30 showed the most stable results among the three models even on the high altitude and dense vegetation. The elevation data were used as input considering accessibility because areas at the high altitude are less accessible by people, reducing the chance of forest fire occurrences.
The population density was derived from the Gridded Population of the World, Version 4, and Revision 11 data provided by the Society Economic Data and Applications Center [38]. The population density is provided as raster data at multiple spatial scales from 30 arcseconds to 60 arcminutes all over the world. As it is provided every five years from 2000, this study used 30 arcseconds (~1 km) data generated in 2015. The road density was calculated using the global road dataset provided by the Global Roads Inventory Project [39]. It contains five types of roads: highways, primary roads, secondary roads, tertiary roads, and local roads [40]. In this study, road density was calculated from the GRIP vector data (all road types) through the line density function in the ArcGIS 10.4 program. The population and the road densities were also used as factors to indirectly consider accessibility.

Fuel Data
The forest density was calculated using the forest type data provided by the Korea Forest Service's Forest Geospatial Information System [41]. The forest type map has been produced every five years using aerial photography and field observations at two spatial scales: 1:5000 and 1:25,000 scale. The forest type map contains information on forest type, diameter-at-breast class, age class, and density. This study used the density data at the 1:25,000 scale. Forest density is classified into three categories, divided into forest stands with a crown area of 50 percent or less, those with 51 to 70 percent, and those with 71 percent or more. A larger value means a higher density.

Time-Related Data
In the study area, forest fire occurrences were concentrated in spring and winter while they were not frequent in summer and fall. Therefore, day of the year (DOY) was used as a variable to consider the seasonality of forest fires. DOY has been used as a variable after converting through sine or cosine functions to consider seasonality in many studies [42,43]. In this study, the sine function was applied to DOY, which generated a value between −1 (winter) and 1 (summer) with one year period.

In Situ Observation Data
The in-situ forest fire observation data provided by the Korea Forest Service were used for training and validation of the proposed models. The observation data contain the time from the start of a forest fire to the extinguishment, as well as information on the location, area, and cause of the forest fire. In this study, forest fire cases occurred from 2014 to 2019 were used, resulting in a total of 3118 cases. It should be noted that only eight of these cases caused large damaged areas more than 100 ha, while most resulted in damaged areas less than 5 ha.

Methods
The process flow of this study consists of three main parts: data preprocessing, machine learning modeling, and model validation ( Figure 2). The input variables were first preprocessed to have the same spatial extent and resolution. Multicollinearity of the variables was then tested using the variance inflation factor (VIF). Monthly forest fire density through the stratified sampling was finally generated in the preprocessing stage. An ensemble model combining two machine learning models with different input variables was developed in the modeling part. The ensemble model was cross-year validated (One-year-out cross-validation) and further compared to the existing forest fire risk indices.
One-year-out cross-validation refers to excluding one-year data for validation and training the model with the remaining five-year data. By repeating this process for each year (i.e., six times), all six years were cross-validated. The spatial distribution of the hourly forest fire risk index was qualitatively evaluated as well.

Data Preprocessing
Since input data have different spatial scales, all data were aggregated using an average function or resampled using the nearest neighbor to 1 km: Among the data used in this study, Roaddens, Popdens, elevation, and DN with less than 1 km spatial resolution were aggregated to the target resolution of 1 km using an average function. The smoothing effect caused by upscaling the original fine resolution data was negligible to the model performance (Supplementary Figure S1). The weather data at 5 km spatial resolution were simply downscaled to 1 km so that all 25 pixels in a 5-km grid have the same value. The VIF test was used to examine the multicollinearity between the input variables. VIF values > 10 generally indicate high multicollinearity between the variables, so the value of 10 is often used as a threshold for the VIF test [7,44]. In this study, all VIF values were below 10, so all input variables were used to develop the models.
Since this study was to develop a forest fire risk index, we approached it as a regression problem targeting forest fire risk rather than the binary classification of forest fire and non-fires. However, since in-situ data were provided only for forest fire occurrences, reference data for low forest fire risk were extracted in the areas where fires did not occur. To extract non-fire pixels (i.e., very low fire risk), the statistics of historical data were used. Monthly forest fire density was calculated using data of the past five years for each sampling year, resulting in a normalized value ranging from 0 to 1. For example, to extract non-fire samples in January 2015, the monthly forest fire density was calculated using January data from 2010 to 2014. If there were many fire occurrences in the surrounding areas of a location in the past 5 years, the location would have a high density. The density was then divided into four classes (25% each) using the percentiles for stratified sampling (i.e., non-fire samples were evenly extracted for each class). Note that the monthly forest fire density was only used as a supporting material to allocate the dependent variable value of non-fire samples and was not itself used as a dependent variable. The pixels with the high-density class (the top 25%) were excluded for extracting non-fire samples, while the pixels with the lower density classes had values from 0 to 0.2 ( Table 2). The reason for assigning different values instead of 0 value to all non-fire pixels is to ensure the reliability of the non-fire pixel selection in the consideration of the forest fire pattern in recent years. In many studies on wildfire susceptibility, fires and non-fires were designated as 1 and 0, respectively, which often resulted in bimodal output (i.e., very high close to 1 and very low close to 0) with limited areas that have moderate susceptibility (e.g., 0.5-0.6) [9,45]. If no forest fire has occurred at a region in the past years, the forest fire risk for the region might be statistically low, so it is desirable to allocate it to zero. However, the forest fire risk of the region might not be low if the region suffered from forest fires in the past considering the causes of the fires. Thus, we assigned low values to some non-fire samples with moderate forest fire density. The values of 0.1 and 0.2 were empirically determined by testing multiple combinations. It should be noted that the proposed model was designed to predict forest fire risk, not to detect forest fires themselves.

Machine Learning Modeling
An ensemble model was proposed in this study, combining two machine learning models with different input variables. The reason for adopting the ensemble approach is to complement the different characteristics of two types of input variables: static variables and dynamic weather variables. As forest fires generally occur due to anthropogenic factors in South Korea, static input variables such as Popdens and Roaddens often have a relatively higher influence on the model than other dynamic variables (Supplementary Figure S2), which tends to result in high risk around urban areas all the time in the model results when using all variables (hereinafter, the integrated model). To mitigate the problem, the meteorological model based only on weather input variables was developed to show the seasonal and diurnal changes of forest fire risk. These two models were linearly combined through the weighted averaging approach to produce the final HFRI. The empirically determined optimum weighting ratio was 8:2 (i.e., integrated model: meteorological model).
The pixels of in-situ forest fire observations within the fire duration up to eight hours were assigned with a value of "1" showing the highest fire risk. Most forest fires occurred in the study area were extinguished within 8 h, except for a few very large (more than 100 ha damaged area) fires. In addition, even if a forest fire has continued over 8 h, it is difficult to determine whether the situation at that time is really a high forest fire risk. It should be noted that the model was targeting forest fire risk, not fire itself. Therefore, we used eight hours as a threshold when selecting training samples to avoid any impact from fire extinguishing work. Non-fire samples representing low fire risk were randomly extracted only at the time that forest fires have occurred in the forest areas except those within 5 km from forest fire locations for the integrated model. The buffer of 5 km was empirically determined with the assumption of relatively high fire risk within the buffer. Stratified sampling by the monthly forest fire density was used to collect non-fire samples. Since non-fire samples were taken only at the same time as forest fires, the integrated model can distinguish between high and low forest fire risk depending on the location. On the other hand, non-fire samples for the meteorological model were additionally collected covering all seasons and time with the same stratified random sampling approach. That way, differences in variables according to the temporal change were well reflected in the meteorological model to distinguish high and low forest fire risk. As a result, non-fire samples for the integrated model were extracted almost the same rate as the number of forest fires, while non-fire samples for the meteorological model were not to be biased in season and time by extracting non-fire samples in time when forest fires did not occur, which resulted in the non-fire sample size three times more than those for the integrated model. Different sampling strategies for two models were to intentionally emphasize the spatial characteristics of forest fires for the integrated model, while the temporal characteristics for the meteorological model. The training sample sizes used for the two machine learning models are summarized in Table 3.
Both the integrated and meteorological models were developed using Catboost. Catboost is one of the boosted machine learning techniques, featuring ordered boosting unlike the well-known gradient boosting techniques [46]. The typical boosting algorithms such as extreme gradient boosting (XGboost) have often an overfitting problem caused by data leakage. However, Catboost better responds to the overfitting problem than other boosting algorithms by unbiased boosting using the independent permutated historical samples at each training step [47]. In addition, Catboost is characterized by its specialization for handling categorical variables. The most common method of dealing with categorical variables in other machine learning techniques is the one-hot encoding, which converts a categorical variable to features with a binary class for each category. Another method is to use target statistics (TS), which assigns expected values for each category (usually the average value of the dependent variable for each category). TS is an efficient way to prevent losing information than the one-hot encoding and a beneficial way for high-dimension categorical features because there is no need to create binary classes for each category [48]. Catboost uses an ordered TS technique.
Catboost can provide feature importance similar to other decision tree-based machine learning. In this study, two types of indicators were used for feature importance analysis-loss function change and Shapley Additive exPlanations (SHAP) values. The loss function change is calculated using how much the loss function has differed when a variable is omitted. The larger the value, the more important the variable is. SHAP calculates the expected value (the average of predictions for all training samples) and the contribution of each feature for each sample. When the contributions of all features are added to the expected value, it equals the prediction. Therefore, it helps to interpret how the input samples contribute to each feature to have the final prediction [49]. Catboost is, thus, a suitable technique to achieve the purpose of analyzing the effects of variables on HFRI. The formula for calculating the SHAP value for feature i is as follows: where M is the number of input features, N is the set of all input features, S is the set of non-zero feature indices and the f x (s) = E[ f (x) x s ] is the models' prediction for the input x [31,49]. Although Catboost was recently introduced, some studies have already shown that Catboost produced better results than other machine learning approaches such as random forest (RF), support vector machine, and generalized regression neural network [47,50]. We tested multiple machine learning approaches, namely Catboost, RF, and XGboost, and among them, Catboost performed better than the others. Thus, Catboost was adopted to develop the ensemble model in this study. The Catboost was run using the open-source Catboost library in python [51].

Model Validation and Comparison
The developed machine learning models were evaluated using a one-year-out cross-validation method with receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) values. A ROC curve is frequently used in forest fire risk and landslide susceptibility mapping studies because of its threshold independent interpretation [52][53][54][55][56]. Using the false positive rate and the true positive rate, the ROC curve shows the trade-off of the two rates. The prediction accuracy is usually determined by AUC which can represent the excellent performance if the value is 0.9-1, good for 0.7-0.9, and poor for less than 0.7 [52,55].
The small number of forest fire occurrences of a region does not directly mean the low forest fire risk for the region. Thus, after dividing the validation data into four risk classes (low, moderate, high, and very high) based on the equal interval approach except for the low class with the value from 0 to 0.49, the results were additionally evaluated qualitatively through the distribution of risk classes by hourly, seasonally, and yearly. Low class (i.e., very low susceptibility) was assigned for HFRI less than 0.5, referencing the class standard of a national official index (DWI). The rest of the sections were set from 0.5 to 1 as moderate-high-very high by adopting the equal interval method (Table 4). In the previous susceptibility mapping studies, risk classes were usually classified based on the natural breaks method, which results in varied class ranges depending on the training data [7,10,57]. On the other hand, in the case of the equal interval, the threshold value does not change by training data, so it is very easy to apply to the timely updated model. HFRI's threshold for class standards is very similar to DWI's original class standards provided by the Korea Forest Service. HFRI derived from the ensemble model was compared to the DWI and revised FFMC, which is a FFMC module of the Canadian Forest Fire Information System customized for South Korea [58]. According to [19], FFMC predicts the moisture quantity of fine fuel, and the larger the number, the higher the probability of ignition. In the study area, FFMC was more closely related to forest fires than CFWI [58,59], and even could be more correlated with forest fires when adjusted considering the environmental characteristics in South Korea (i.e., revised FFMC) [58]. Since DWI data were only available from 2014 to 2017, this period was used to compare two indices based on the percentile rank and distribution of forest fire risk pixels by class. The percentile rank was adopted for a relative comparison because these indices have different ranges. The percentile rank is the percentage of observations that are less than or equal to the specific observation value.

One-Year-Out Cross-Validation with ROC Curve
In order to evaluate the ensemble model and to compare it with the individual integrated and meteorological models, the ROC curves of one-year-out cross-validation (Figure 3a-f) and the entire years (Figure 3g) are depicted in Figure 3. Since a large weight (i.e., 80%) was applied to the integrated model when combining two models, it is not surprising to see similar ROC curves between the ensemble and integrated models. When the entire year results were summed, the AUC of the meteorological model was 0.8097, while the AUCs of the integrated and the ensemble models were 0.8434. The ensemble and integrated models outperformed the meteorological model. This means that the models benefited from the use of accessibility, fuel, and time-related variables. The AUC 0.8434 means that the ensemble model is good at distinguishing between high and low fire risk areas.  Figure 4 shows the feature importance identified by the integrated model based on the Loss function change. Rehu, Roaddens, Elevation, Popdens, and DOY in order were the top five contributing variables identified by Catboost. The most important variable was Rehu, which has been proved as a very useful and relevant variable representing dry weather conditions and dead fuel [20,60,61]. Rehu had a clearly opposite relationship with the number of forest fire cases considering their diurnal and seasonal patterns. Rehu has a peak in the dawn and decreases in the afternoon, and has a peak in summer and a decreasing pattern in winter and spring. The Pearson correlation coefficient of Rehu with HFRI was −0.8. Except for Rehu, contributing variables were closely related to accessibility, especially for the Roaddens, Elevation, and Popdens. The Pearson correlation coefficients of the three variables with HFRI were 0.4, −0.4, and 0.3, respectively. This agrees with the fact that most forest fire cases have occurred due to anthropogenic factors. The forest areas with high Roaddens, low Elevation, and high Podens have relatively high accessibility, which is vulnerable to forest fires [27]. DOY moderately contributed to the model. DOY is an effective factor for modeling the seasonality of forest fire risk: frequent forest fires in the dry season (spring and winter), while less in wet summer [62].  Using the SHAP values, we investigated how each feature contributed to the prediction in the integrated model. This tendency is an overall summary of the SHAP values. It means each sample has a different contribution of each feature for the prediction. For example, when Rehu has a value close to the average rather than a particularly high or low value, the contribution of Roaddens or Elevation to the prediction is much greater. These interactions between features are analyzed with a dependence plot (Figure 5b). SHAP values enable to identify how each feature affects the prediction when the feature has a certain value. In Figure 5a, the higher Rehu has negative contribution to the prediction up to −0.2. On the other hand, when the Rehu was low, it has positive contribution to the prediction up to 0.13. The high elevation decreased HFRI. While the low elevation increased the prediction, its effect was smaller than the decreasing effect. When the Roaddens was high, it increased the prediction. When Roaddens was lower, the prediction was lowered. Popdens was inversely related to elevation. It can be inferred that many of the samples predicted as low risk were determined mainly based on the influence of Rehu, elevation, and Roaddens. The weakness of the process of determining the forest fire risk through these three variables is that areas with low Roaddens and high elevation always result in low forest fire risk. It is because Rehu is a dynamic variable, so it reflects the situation at all times, but elevation and Roaddens are static variables. Therefore, it can be seen that the accessibility factors are very important in simulating forest fire risk. These results show the necessity of ensemble the meteorological model to complement the integrated model. Figure 5b shows how Rehu works in the model depending on the Roaddens feature value. The SHAP value was close to 0 when the Rehu value was near 50. If the Rehu had a moderate value, a static forest fire risk map might be generated since the prediction was mostly determined by Elevation, Roaddens, and Popdens. The interaction between the Rehu and the Roaddens changed from the Rehu feature value of 50. When the Rehu was less than 50, it had a larger positive SHAP value with the low Roaddens. This is because if the Rehu was low and the road density was high, both features have the same direction to high forest fire risk, which leads to the relative reduction of the Rehu's contribution. However, if the Roaddens was low despite the outbreak of a forest fire, the contribution of low Rehu would increase. This means that the contribution to the prediction varies depending on the feature value and the interaction between features.

Comparison of Percentile Rank between HFRI, Revised FFMC, and DWI
HFRI was compared to the revised FFMC and DWI using the percentile rank. To confirm the spatial accuracy, the percentile ranks of the forest fire pixels were calculated targeting all forest pixels every hour. Even if the indices (i.e., HFRI and DWI) have different ranges and distributions, since each index is spatially ordered for every pixel, it can be used to compare them which one is better at the discrimination between forest fire (representing high risk) and non-fire locations. For example, the percentile rank value of 95 means that the pixel is at the top 5% among all forest pixels at that time. Figure 6 shows the box plot of the percentile ranks of forest fire pixels. The mean percentile ranks of all forest fire pixels were 68, 59, 61 for HFRI, revised FFMC, and DWI, respectively. Revised FFMC and DWI had the almost same mean values but that of DWI was slightly higher, while HFRI had a remarkably higher mean value than the other two indices. The most noticeable part here is that the percentage of forest fire pixels with a percentile rank of 90 or higher in HFRI is 30%, whereas the other two indices were only 20%. Moreover, the percentage of forest fires with a percentile rank of over 50 was 72%, whereas 65% for DWI and 61% for the revised FFMC. HFRI showed a higher mean percentile rank than DWI thanks to the integrated model. When comparing the percentile rank of HFRI-I and HFRI-M, the percentile rank of HFRI-I appears at the same level as that of HFRI, whereas the percentile rank of HFRI-M is similar to that of DWI developed based on weather factors. Since the integrated model is focused on extracting high-risk areas by sampling non-fire samples only at the time of forest fire and using accessibility factors together, it could have a higher percentile rank in forest fire areas. Therefore, it can be said that HFRI simulates the forest fire risk better in spatial detail and accuracy than the DWI. Since the DWI showed better results than the revised FFMC, the DWI was used in the subsequent analysis and comparison to HFRI.
Although HFRI had a high mean percentile rank, some forest fire pixels had values below 50. Such pixels were generally found in the spring (i.e., 74% in March and April) when forest fires were frequent. In the spring, the forest fire risk was high nationwide and many forest fire cases (e.g., 7) occurred at the same time. In that situation, the percentile rank of the forest fire pixels may be relatively lowered because the forest fire risk was nationwide high. However, it does not mean HFRI failed to simulate the forest fire risk because the actual forest fire risk index was high and only about 5-6% of the forest fire cases occurred in March and April fell into this category.

Comparison of Forest Fire Risk Classes between HFRI and DWI
In this section, HFRI and DWI distributions by risk class were examined using the results of one year out cross-validation by time and season. Figure 7 shows the classification result of HFRI and DWI for forest fires using machine learning validation data with a two-hour interval from 0:00 to 24:00 KST (local time). The sum of the values in each time zone is 100% and the value for each risk class is the proportion of the forest fire pixels for the corresponding time zone. Both HFRI and DWI produced good performances in the afternoon from 12:00 to 16:00 local time, showing many forest fires occurred in the high risk pixels. However, about 50% and 70% of the forest fires occurred in the early morning and at night (22:00-08:00) had low risk for HFRI and DWI, respectively. This is because both HFRI and DWI are affected by the temporal distribution of forest fires since they are empirical models: samples were not evenly distributed on the temporal domain. Forest fires tend to occur intensively in the afternoon (Supplementary Figure S3), increasing the forest fire risk index. Oppositely, the forest fire risk was low in the early morning and night when there were fewer forest fires. Although a majority of forest fires in the early morning and at night were classified as low class, the actual average number of forest fires classified as low class in the early morning and at night was only up to 58 because the absolute number of forest fires was very small when compared to the afternoon. The largest difference in the proportion between the two indices was observed in low class. For HFRI, the proportion of low class for all times decreased by at least 3.9% up to 26.5% when compared to the DWI. Although both indices rapidly dropped in the morning (6:00-12:00), the rate of Low class was much lower for HFRI than the DWI. While the rate of Moderate and High classes was much larger for HFRI than the DWI at dawn and early morning. It is due to the high relative humidity and very low temperature in the early morning, which play an important role in calculating DWI. The average Rehu of forest fire pixels occurring in the dawn and morning (from 00:00 to 10:00) was 62.4%, which was higher than the average Rehu of 43.4% at the other times (from 11:00 to 23:00). Interestingly, the average values of the Popdens and Roaddens of forest fire pixels at dawn and early morning were 1683 and 179, respectively, higher than the other times (1295 and 150, respectively), which implies that accessibility increased the risk even at that time. In other words, DWI, simply based on the meteorological conditions, has limitations in simulating the forest fire risk at dawn and early morning, while HFRI improved them by incorporating anthropogenic features in addition to weather conditions. Figure 8 shows the classification results of HFRI and DWI by season. Both HFRI and DWI well simulated the forest fire risk in spring. However, they showed limited performance in summer and fall. DWI did not work well for winter, either. This might be because empirical modeling approaches were used for both indices with the use of much larger sample size in spring than other seasons. The results of the two indices in spring and summer showed almost similar patterns, but very different in fall and winter. In the DWI, the rate of forest fire occurrences in high class in fall and winter rapidly decreased, and most of the forest fires occurred in low class. On the other hand, in HFRI, the proportion of high class was high in fall and winter resulting in a decrease of 30.6 and 30.87% in low class when compared to the DWI. The mean Rehu of forest fire pixels in fall (52%) had the higher value than in spring and winter (40% and 45%), resulting in less prone to forest fires. For these reasons, the percentage of the high class both HFRI and DWI declined in fall. Nevertheless, HFRI was more likely to be classified as high risk compared to DWI because the relative influence of the Rehu was offset by other factors in HFRI. Overall, the low class rate for HFRI was lower than for DWI for all seasons. In addition, the class rates of HFRI did not change much by season, while those of DWI dramatically changed, which implies the temporal robustness of HFRI over DWI.  Figure 9 shows the spatial distribution of two indices (i.e., HFRI and DWI) with a two-hour interval on 25 February 2016 with three forest fire cases. The difference in spatial distribution of two indices over time and the advantages of the fine spatial resolution of HFRI are described in Figure 7. Both HFRI and DWI had similar spatial distributions overall, and the high forest fire risk areas were well-matched with actual fire locations. This is because both HFRI and DWI are largely influenced by Rehu, which is a common factor. However, as mentioned in Figure 7, DWI was not able to detect the forest fire risk in the morning compared to HFRI. This is well illustrated on the map at 10:00 KST (Figure 9b,h). Despite the forest fire occurred in Gwangju at 9:55, that area was classified as Low class of DWI (Figure 9h), but the high class of HFRI (Figure 9b). Moreover, as mentioned in Figure 8, DWI was not able to detect the forest fires well in winter when compared to HFRI. Forest fires usually occurred in low class of DWI in winter, while HFRI well detected them in high class. This supports that HFRI is stable regardless of season, showing the advantages of more detailed spatial variations within administrative districts. Figure 10 shows the spatial distribution of HFRI and DWI from 30 to 31 March 2016 with a total of 7 forest fires. According to Figures 7 and 8, DWI has almost the same rate as the High class with HFRI from 10:00 to 20:00 in spring. Therefore, the spatiotemporal patterns between the two indices in spring are examined in Figure 10. On the map from 12:00 on March 30 to 2:00 on 31 March, the forest fire risk at the fire location matched a relatively high risk zone in both HFRI and DWI. On the other hand, paying attention to the circle with a dashed red line where the forest fire did not occur, HFRI showed a relatively lower risk distribution (low or moderate) on the red circle region than the location of the fires. However, the DWI has a nationally high class (high and very high) in the afternoon and early evening in the region with the red circle. In particular, the 20:00 maps (Figure 10f,n) show that the DWI still showed a high-risk across the country (Figure 10n), compared to HFRI that has a high risk mainly around the forest fire location and lower risk in other areas (Figure 10f). In spring, the forest fire risk naturally rises due to dry and warm weather so the DWI tended to overestimate the fire risk. Thus, the proportion of forest fires that occurred in high class rapidly increased in the afternoon in spring. On the other hand, HFRI can simulate the detailed forest fire risk in a single administrative district with fine resolutions so it did not show excessively high risk spatial distribution over a large area. Furthermore, the maps from 22:00 on 30 March and 2:00 on 31 March (Figure 10g,h,o,p) showed that HFRI had a relatively high risk at the forest fire location while the DWI had a low risk. This confirms that DWI did not work well at night, resulting in a low risk even in spring, as shown in Figure 7.

Conclusions
Existing forest fire risk indices as an operational use are mostly based on weather and fuel data, and often require complex processes through various conditions based on statistics. In this study, a new hourly forest fire risk index (HFRI) based on the Catboost machine learning using various factors including weather, accessibility, and fuel data was proposed over South Korea. HFRI is provided by the ensemble model that linearly combined the integrated and meteorological models with different input variables. HFRI was validated through the ROC curves and AUC values, and compared to the revised FFMC and DWI through the percentile rank. The AUC of HFRI was 0.8434, much higher than the meteorological model (i.e., typical fire risk model). Moreover, HFRI has shown stable performance in identifying forest fire risk areas dynamically with diurnal and seasonal change. In addition, spatial variations exist within administrative districts because of their fine resolution (1km), which can also help deploy forest fire fighting resources at the district level. However, since HFRI uses local forest density data and the numerical model (i.e., digital forecast model) available in the study area, it is necessary to replace them with globally available data (e.g., Global Data Assimilation and Prediction System) to improve the transferability of the index. In addition, the causes of forest fires may vary by region, it is crucial to extensively train and evaluate the index over multiple areas with different characteristics of forest fires.
Feature contribution analysis through SHAP values showed that Rehu had the greatest contribution, followed by elevation, Roaddens, and Popdens, in order. Most of the contributing features were related to the accessibility, implying that such features were critical especially for areas with frequent forest fires caused by anthropogenic reasons. Due to these static accessibility variables, the high forest fire risk was generally concentrated around urban areas even though HFRI was complemented through the ensemble model structure with the meteorological model. In the future, improved performance is expected if additional spatiotemporally dynamic accessibility variables, such as the number of floating population and the number of hikers by date, are incorporated in the ensemble model.