Mapping Forest Fire Risk Zones Using Machine Learning Algorithms in Hunan Province, China

Abstract: Forest fire is a primary disaster that destroys forest resources and the ecological environment, and has a serious negative impact on the safety of human life and property. Predicting the probability of forest fires and drawing forest fire risk maps can provide a reference basis for forest fire control management in Hunan Province. This study selected 19 forest fire impact factors based on satellite monitoring hotspot data, meteorological data, topographic data, vegetation data, and social and human data from 2010–2018. It used random forest, support vector machine, and gradient boosting decision tree models to predict the probability of forest fires in Hunan Province and selected the RF algorithm to create a forest fire risk map of Hunan Province to quantify the potential forest fire risk. The results show that the RF algorithm performs best compared to the SVM and GBDT algorithms with 91.68% accuracy, 91.96% precision, 92.78% recall, 92.37% F1, and 97.2% AUC. The most important drivers of forest fires in Hunan Province are meteorology and vegetation. There are obvious differences in the spatial distribution of seasonal forest fire risks in Hunan Province, and winter and spring are the seasons with high forest fire risks. The medium- and high-risk areas are mostly concentrated in the south of Hunan.


Introduction
As an important component of terrestrial ecosystems, forests are energy reservoirs, gene banks, water reservoirs, and carbon reservoirs on Earth, and play a vital role in maintaining the ecological balance of the planet and improving the ecological environment [1]. Forest fires, as a global phenomenon [2], pose a serious threat to the ecological environment as well as the safety of human life and property [3][4][5][6]. In recent years, the frequency of forest fires has increased due to global warming and frequent human activities [7]. Forest fire risk is defined as the likelihood of fire occurrence and its consequences [2]. Forest fire risk assessment is a scientific method to quantify the level of forest fire risk [8]. Forest fire risk level zoning map is an important part of forest fire risk assessment. It is based on the probability of forest fire occurrence at a specific threshold [9] and provides an effective map for resource allocation for forest fire risk management. Therefore, in order to protect forest resources and forest ecosystem functions, it is important to map the forest fire risk level zones for forest fire prevention and control [10].
Forest fires are influenced by a variety of factors [11,12]. With the advancement of forest fire risk prediction studies, at present, most analyses are carried out in a comprehensive manner with multiple factors, such as meteorology, topography, vegetation, and human activities [13][14][15]. Meteorology is considered to be a determinant of forest fires, which mainly affects forest fires in two ways: by influencing the frequency of forest fire weather and the water content of combustible materials [16]. Differences in topography can influence wind, water, and heat transfer between sites [17]. Furthermore, they have

Forest Fire Data
The Department of Fire Prevention and Control Management, Ministry of National Emergency Management, China, provided the satellite monitoring hotspot data from 2010 to 2018. Fire point data were extracted from these data: abnormal samples in the original dataset were removed, and the remaining records were screened to keep only fires on forest land. Because the dependent variable in forest fire prediction modelling is binary, a certain number of random points must be created to take part in the modelling as non-fire points. We created random points (non-fire points) at a 1:1 ratio within the forested area of the 30 m surface cover data for Hunan Province in 2020, and performed a 500 m buffer analysis on the fire points so that no random point was located at or near a fire point. Fire points were coded as 1 and random points as 0. The random points follow the principle of double randomness in time and space [19,30]. The forest land data were obtained from the 30 m global land cover dataset GlobeLand30 from the Global Geographic Information Public Product (http://www.globallandcover.com/ accessed on 16 January 2022), and the numbers of fire points and random points were 12,815 and 10,539, respectively.
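The sampling of non-fire points can be illustrated with a minimal sketch. It assumes projected coordinates in metres and uses a plain rectangle in place of the forest mask; the function name, the rejection-sampling approach, and the uniform draw of dates over 2010–2018 are illustrative assumptions, not the GIS workflow used in the study.

```python
import math
import random
from datetime import date, timedelta


def sample_non_fire_points(fire_points, n_points, bounds, min_dist=500.0, seed=0):
    """Draw random non-fire samples as (x, y, day, label=0).

    Assumptions (not from the paper): coordinates are in a projected CRS with
    metre units, and the rectangle `bounds` = (xmin, ymin, xmax, ymax) stands
    in for the forest mask taken from the land cover data.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    start, end = date(2010, 1, 1), date(2018, 12, 31)
    n_days = (end - start).days + 1

    samples = []
    attempts = 0
    while len(samples) < n_points and attempts < 1000 * n_points:
        attempts += 1
        # Random in space: uniform location inside the study rectangle.
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        # Reject candidates that fall inside the buffer of any fire point.
        if any(math.hypot(x - fx, y - fy) < min_dist for fx, fy in fire_points):
            continue
        # Random in time: uniform day within the study period.
        day = start + timedelta(days=rng.randrange(n_days))
        samples.append((x, y, day, 0))
    return samples
```

With real data, the rectangle test would be replaced by a lookup in the forest land raster, and a spatial index would make the buffer check practical for thousands of fire points.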

Forest Fire Impact Factor Data
The driving factors in the forest fire risk model mainly include meteorology, topography, vegetation, and human activities. We chose 22 variables as the initial factors influencing the occurrence of forest fires in Hunan Province for this study, and detailed descriptions of the categories are shown in Table 1. All were continuous variables except for aspect and special festival, which were categorical variables. The meteorological data were obtained from the China Meteorological Data Network (https://data.cma.cn/ accessed on 30 June 2021), which provides the daily value dataset (V3.0) of Chinese surface climate data; we used the records from 2010 to 2018. After pre-processing the meteorological data, we finally selected 10 meteorological factors as the initial forest fire meteorological variables: daily average surface temperature, daily maximum surface temperature, cumulative precipitation at 20-20 (the 24-h cumulative precipitation from 20:00 to 20:00 the following day), daily average air pressure, daily average relative humidity, daily minimum relative humidity, hours of sunshine, daily average temperature, daily maximum temperature, and average wind speed.

Topographic Data
The incidence and spread of forest fires are impacted by topographic variations. Differences in topography play an important role in the composition of vegetation types and the spatial distribution of combustible materials, which directly affect the occurrence and spread of forest fires, for which elevation, slope, and aspect have been widely reported [17]. To collect altitude, slope, and aspect information for Hunan Province, the DEM data with a spatial resolution of 30 m was acquired from the Geospatial Data Cloud website (http://www.gscloud.cn/ accessed on 16 January 2022). We divided the aspect into nine categories as shown in Table 2.
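As an illustration of how a continuous aspect angle can be turned into nine categories, the sketch below uses the common GIS convention of one flat class plus eight 45° compass sectors. The class boundaries and labels are assumptions; the definitions in Table 2 may differ.

```python
def aspect_class(aspect_deg):
    """Map an aspect angle to one of nine classes.

    Assumed convention (not taken from Table 2): a negative value marks flat
    terrain, 0 degrees is north, angles increase clockwise, and each compass
    sector spans 45 degrees centred on its direction.
    """
    if aspect_deg < 0:
        return "Flat"
    sectors = ["North", "Northeast", "East", "Southeast",
               "South", "Southwest", "West", "Northwest"]
    # Shift by half a sector so that north covers 337.5-360 and 0-22.5 degrees.
    index = int(((aspect_deg % 360) + 22.5) // 45) % 8
    return sectors[index]
```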

Vegetation Data
Changes in NDVI values can indicate changes in water and nutrient availability, plant diseases, and other stressors, which in turn are indicators of vegetation vulnerability to fire [50]. As a result, the vegetation data in this study were expressed by the NDVI (Normalized Difference Vegetation Index). The Resource Environment Science and Data Center (http://www.resdc.cn/ accessed on 30 June 2021) provided the spatial distribution dataset of the China Quarterly Vegetation Index (NDVI) with a spatial resolution of 1 km. The year was divided into four seasons based on vegetation status: spring (March-May), summer (June-August), autumn (September-November), and winter (December-February).
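Matching each sample to the quarterly NDVI layer requires the season of its date. A small helper along the following lines would do this; the function name is hypothetical.

```python
def season_of(month):
    """Return the season for a calendar month (1-12), using the grouping in the text."""
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    if month in (9, 10, 11):
        return "autumn"
    if month in (12, 1, 2):
        return "winter"
    raise ValueError(f"month must be in 1..12, got {month}")
```

Note that winter spans two calendar years, so a January fire belongs to the winter that started in the previous December.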

Social and Humanistic Data
The basic geographic data were obtained from the 1:250,000 National Basic Geographic Database on the website of the National Geographic Information Resource Catalog System (http://www.webmap.cn/ accessed on 30 June 2021). Using ArcGIS software, the shortest distance from each sample point to infrastructure, such as railroads, roads, and settlements, was calculated. Socioeconomic data included population density, gross domestic product (GDP) per capita, and special festival. GDP and population density were downloaded from the National Earth System Science Data Center (http://www.geodata.cn/ accessed on 30 June 2021) as kilometer-grid datasets of the 2015 spatial distribution of population and GDP, with a resolution of 1 km. Since people burn paper money to pay respects to their deceased relatives on certain traditional Chinese holidays, which may lead to forest fires, Chinese New Year's Eve, the first day of Chinese New Year, the second day of Chinese New Year, the Lantern Festival, the Tomb Sweeping Festival, and the Zhongyuan Festival (i.e., July 15 of the lunar calendar) were set as special festival days and denoted as 1; non-special festival days were denoted as 0.

Normalization
The driving factors of forest fires differ in magnitude and units, which affects the analysis of the data. The data therefore need to be normalized so that every factor is on the same scale, which avoids both the influence of differing magnitudes among indicators and excessive differences in the magnitudes of the output data. The normalization formula is as follows:

$$x_i^{*} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_i$ and $x_i^{*}$ are the values before and after normalization of the data, respectively; and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the sample data, respectively.
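The min-max normalization described above takes only a few lines. This is a minimal sketch; mapping a constant column to zeros is an assumption made here to avoid division by zero and is not discussed in the text.

```python
def min_max_normalize(values):
    """Rescale a sequence of numbers to [0, 1] using its own minimum and maximum."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant factor carries no information, so map it to 0.
        return [0.0 for _ in values]
    span = x_max - x_min
    return [(x - x_min) / span for x in values]
```

For example, `min_max_normalize([10, 20, 30])` gives `[0.0, 0.5, 1.0]`. In a train/test setting, the minimum and maximum would normally be taken from the training samples only.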

Multiple Collinearity Test
Assessing the multicollinearity of the independent variables clarifies their respective importance and role in constructing the optimal model [51]. Therefore, the variance inflation factor (VIF) was applied in this study to screen out independent variables with significant collinearity. Generally, VIF > 10 indicates that an independent variable should be excluded because of significant collinearity. Since the multicollinearity diagnosis applies only to continuous variables and not to categorical variables, aspect and special festival were not tested and entered the importance test stage of the model directly. After excluding three variables, namely daily average surface temperature, daily average temperature, and minimum relative humidity (VIF values of 76.849, 89.026, and 14.605, respectively), the VIF values of the remaining 17 continuous variables were all less than 10, indicating no multicollinearity (see Table 3). Finally, 17 continuous variables and the 2 categorical variables of aspect and special festival, a total of 19 feature variables, entered the model fitting stage.

Random Forest
Random forest (RF) is an ensemble learning algorithm [52], which is an inheritance and improvement of the traditional decision tree, capable of analyzing and evaluating the relative importance of the input factors with high classification accuracy and computational speed as well as robustness to outliers [47]. The performance of RF is influenced by two important parameters: the number of trees in the forest (ntree) and the number of random variables tried at each split node (mtry). Therefore, these two parameters must be set appropriately beforehand. As classification accuracy is relatively insensitive to ntree, we set ntree to 500 trees [20] and used five-fold cross-validation to determine the optimal value of mtry, finally settling on mtry = 4.
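The VIF screening used in the multicollinearity test can be sketched with NumPy. Each VIF is computed as 1 / (1 - R²), where R² comes from regressing one variable on all the others. The removal strategy shown here (drop the variable with the largest VIF and recompute) is an assumption; the text does not specify the exact procedure.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of every column of X (rows are samples, columns are variables)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_vars = X.shape
    vifs = []
    for j in range(n_vars):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(n_samples), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
        r2 = 1.0 - float(residual @ residual) / float(((y - y.mean()) ** 2).sum())
        vifs.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return vifs


def drop_collinear(X, names, threshold=10.0):
    """Assumed strategy: repeatedly drop the variable with the largest VIF above threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = variance_inflation_factors(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names
```

Only the continuous variables would be passed to such a routine; the categorical variables (aspect and special festival) bypass it, as described in the text.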

Support Vector Machine
SVM is a machine learning method that is applicable to classification and regression. Its basic idea is to maximize the margin between different classes of samples by finding an optimal hyperplane in the feature space as the basis for classification [53]. The basic SVM classifier assumes that the training samples are linearly separable, but real data may be more complex. To solve nonlinear problems in classification or regression, kernel functions are introduced into SVM. A kernel function maps the original input space to a new feature space, so that samples that are not linearly separable in the input space may become separable in the kernel space. Commonly used kernel functions include the linear kernel, the polynomial kernel, and the radial basis function (RBF) kernel. In this study, the RBF kernel was selected to build the model, and the penalty parameter C and the kernel parameter g (gamma) were determined by grid search; their optimal values were 100 and 0.01, respectively.
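To make the role of the kernel parameter concrete, the RBF kernel can be written out directly. This sketch uses the standard definition K(x, z) = exp(-gamma * ||x - z||²) with the gamma value reported in the text as the default; it illustrates the kernel only and is not an SVM implementation.

```python
import math


def rbf_kernel(x, z, gamma=0.01):
    """Standard RBF kernel: similarity decays with squared Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)
```

Identical samples give a kernel value of 1, and the value falls toward 0 as samples move apart. A smaller gamma makes the decay slower, so each training sample influences a wider neighbourhood.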

Gradient Boosting Decision Tree
Gradient boosting decision tree (GBDT) builds weak learners sequentially: each weak learner is fitted to the residuals between the current output and the true value, and the outputs of the weak learners are accumulated so that the residuals shrink during training until the classification goal is achieved [54]. The GBDT algorithm has the advantages of high prediction accuracy, robustness, and the ability to handle both continuous and discrete data [55]. The main purpose of the GBDT algorithm is to optimize the loss function, using the negative gradient of the loss function to fit the residuals of the previous round of weak learners, and the training process can be represented by the following equations [56]:

$$f_M(x) = \sum_{m=1}^{M} T(x, \theta_m)$$

where $M$ is the number of iterations, $T(x, \theta_m)$ is the weak classifier generated at the $m$-th iteration, and $\theta_m$ is its parameter, which is obtained by minimizing the loss function:

$$\hat{\theta}_m = \arg\min_{\theta_m} \sum_{i=1}^{N} L\big(y_i,\; f_{m-1}(x_i) + T(x_i, \theta_m)\big)$$

where $L$ is the loss function, $N$ is the number of training samples, and $f_{m-1}(x)$ is the model after $m-1$ iterations. Through five-fold cross-validation and grid search, this study set the learning rate of the GBDT model to 0.1 and the number of weak learners to 190.
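The residual-fitting idea behind GBDT can be shown with a deliberately small sketch. It makes several simplifying assumptions that are not from the study: a single input feature, a squared-error loss (for which the negative gradient equals the residual), and depth-one trees (stumps) as weak learners. The defaults reuse the learning rate and number of learners reported in the text.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error on the residuals.

    Requires at least two distinct values in xs.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbdt(xs, ys, n_rounds=190, learning_rate=0.1):
    """Additive model: start from the mean, then add shrunken stumps fitted to residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def gbdt_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)
```

With a learning rate of 0.1, each round removes about a tenth of the remaining residual on well-separated data, which is why a fairly large number of weak learners is needed.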

Model Performance Evaluation
In this study, the classification ability of different machine learning methods was evaluated using five metrics: accuracy, precision, recall, F1 (H-mean), and area under curve (AUC) [31,48]. Accuracy is the proportion of correctly classified samples in the total sample, precision is the proportion of samples predicted as positive that are actually positive, and recall is the proportion of actually positive samples that are predicted as positive [57]. F1 is the harmonic mean of precision and recall and summarizes both in one value. The relationship between sensitivity and specificity is represented by the receiver operating characteristic (ROC) curve, and the area under the ROC curve is known as the AUC. This area is frequently used to assess the predictive power of classification models, and the closer its value is to 1, the more accurate the model prediction [49]. The accuracy, precision, recall, and F1 can be expressed by the following equations:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP (true positive) is the number of positive samples predicted as positive, FP (false positive) is the number of negative samples predicted as positive, TN (true negative) is the number of negative samples predicted as negative, and FN (false negative) is the number of positive samples predicted as negative.
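The four count-based metrics follow directly from the confusion matrix. The sketch below uses the standard definitions; the function names are illustrative, and no guard against empty classes (zero denominators) is included.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

As a consistency check, `f1_from(0.9196, 0.9278)` rounds to 0.9237, which agrees with the F1 of 92.37% reported for the RF model alongside its precision of 91.96% and recall of 92.78%.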

Results
In this study, the original sample data were randomly divided into 70% training samples (for model building) and 30% test samples (for model testing).

Comparison and Validation of the Three Models
This research conducted a grid search and five-fold cross-validation on each classifier to tune its parameters and measure the predictive accuracy of the model. The final RF, SVM, and GBDT models were trained on the training dataset. Five evaluation metrics were then used to validate the performance of these three machine learning algorithms: accuracy, precision, recall, F1 value, and AUC. According to the results on the test dataset, all three models fit well (AUC > 0.85; see Figures 2 and 3). A higher accuracy value indicates stronger predictive ability, and the order of accuracy of the three models was RF > GBDT > SVM. Therefore, RF was found to be the best method for predicting forest fire risk in Hunan.
Among the three machine learning algorithms, the RF algorithm performed the best, outperforming the other two algorithms in every evaluation index with 91.68% accuracy, 91.96% precision, 92.78% recall, 92.37% F1, and 97.2% AUC. RF algorithms can tolerate outliers and noise, and have the ability to handle redundant attributes and good generalization [58,59]. This was followed by GBDT with 89.38% accuracy, 88.56% precision, 92.36% recall, 90.42% F1, and 95.83% AUC. SVM had the worst performance with an accuracy of 88.88%, precision of 87.07%, recall of 93.38%, F1 value of 90.11%, and AUC of 95.29%. Although SVM also shows good classification and generalization capabilities, it is very time consuming to calibrate [48,59].
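The comparison workflow can be sketched with scikit-learn. This is an illustration only: the data are synthetic (generated by `make_classification`), the mapping of ntree and mtry to `n_estimators` and `max_features` is an assumption about equivalent settings, and the software actually used in the study is not stated here. Hyperparameter values are those reported in the text; all other settings are library defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC


def build_models(seed=0):
    """The three classifiers with the hyperparameter values reported in the text."""
    return {
        "RF": RandomForestClassifier(n_estimators=500, max_features=4,
                                     random_state=seed),
        "SVM": SVC(kernel="rbf", C=100, gamma=0.01, random_state=seed),
        "GBDT": GradientBoostingClassifier(learning_rate=0.1, n_estimators=190,
                                           random_state=seed),
    }


def tune_mtry(X, y, candidates=(2, 4, 6), n_trees=500, seed=0):
    """Choose max_features (mtry) by grid search with five-fold cross-validation."""
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=n_trees, random_state=seed),
        param_grid={"max_features": list(candidates)},
        cv=5,
        scoring="accuracy",
    )
    search.fit(X, y)
    return search.best_params_["max_features"]


def compare_models(n_samples=600, seed=0):
    # Synthetic stand-in for the 19-feature fire / non-fire samples.
    X, y = make_classification(n_samples=n_samples, n_features=19,
                               n_informative=8, random_state=seed)
    # 70% training samples, 30% test samples.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)

    results = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # AUC needs a continuous score: decision value or class-1 probability.
        if hasattr(model, "decision_function"):
            score = model.decision_function(X_test)
        else:
            score = model.predict_proba(X_test)[:, 1]
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred),
            "recall": recall_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
            "auc": roc_auc_score(y_test, score),
        }
    return results
```

The ranking obtained on synthetic data says nothing about the ranking on the Hunan dataset; the sketch only shows how the split, tuning, and five metrics fit together.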

Importance of Feature Factors
The RF algorithm is able to automatically identify the relative importance of feature variables by mean decrease accuracy [2]; therefore, the 19 drivers were ranked by their mean decrease accuracy. The results show (see Figure 4) that average relative humidity is significantly more important than the other variables and ranks first, followed by the maximum daily temperature and hours of sunshine. Higher temperatures and long periods of sunshine tend to reduce the water content of vegetation and increase the likelihood of forest fires. Among the meteorological factors, the average air pressure has a relatively small effect on the occurrence of forest fires in Hunan Province. In this study, NDVI was the fourth most important factor influencing the occurrence of forest fires in Hunan, followed by the daily maximum surface temperature and latitude, and the influence of longitude was relatively small compared to latitude. Longitude and latitude reflect, to some extent, differences in forest and tree species categories, as well as differences in the degree of flammability of forests of different tree species. Among the topographical factors, elevation has a greater influence on the occurrence of forest fires in Hunan than slope and aspect. The human activity factors rank as follows in their influence on the occurrence of forest fires in Hunan Province: GDP, nearest distance from the fire point to a railway, population density, special festival, nearest distance from the fire point to a residential area, and nearest distance from the fire point to a highway.
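Mean decrease accuracy is a permutation-based measure: the values of one variable are shuffled and the resulting drop in accuracy is recorded. A comparable ranking can be sketched with scikit-learn's `permutation_importance`. Note the assumptions: this version measures the drop on a held-out test set rather than on out-of-bag samples, the forest size of 200 is arbitrary, and it is not necessarily the implementation used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def rank_by_permutation_importance(X, y, names, seed=0):
    """Rank features by the mean drop in test accuracy after shuffling each one."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    result = permutation_importance(model, X_test, y_test, scoring="accuracy",
                                    n_repeats=10, random_state=seed)
    order = np.argsort(result.importances_mean)[::-1]
    return [(names[i], float(result.importances_mean[i])) for i in order]
```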

Seasonal Fire Zoning Map of Hunan Province
The spatial distribution of forest fire occurrence is crucial for forest fire prevention and control as well as fire management. In order to achieve the optimal allocation of firefighting resources, the forest fire occurrence probabilities predicted by the best performing model, RF, were used to map the fire risk in Hunan Province over the four seasons. The kriging interpolation method in the ArcGIS 10.4 software was used to interpolate the predicted fire probabilities. The fire risk zones in Hunan Province were classified into five categories (I-V). I: forest fire probability range of 0.0-0.2 represents the very-low-risk zone, i.e., forest fires are basically unlikely to occur; II: forest fire probability range of 0.2-0.4 represents the low-risk zone, i.e., forest fires are unlikely to occur; III: forest fire probability range of 0.4-0.6 represents the medium-risk zone, i.e., forest fires are likely to occur; IV: forest fire probability range of 0.6-0.8 represents the high-risk zone, i.e., a forest fire is likely to occur; and V: forest fire probability range of 0.8-1.0 represents the very-high-risk zone, i.e., forest fire is very likely to occur. Figure 5 illustrates the stark seasonal differences in the spatial extent of forest fire risk in Hunan Province. Winter and spring are the seasons with high forest fire risk: the areas of medium and high fire risk are relatively large and mainly concentrated in southern Hunan. There are relatively few forest fires in autumn and summer. In winter, the very-high-risk zones are mainly located in Yongzhou City, Chenzhou City, Hengyang City, Zhuzhou City, the central part of Loudi City, the southeast of Shaoyang City, and the east and south of Yueyang City.
In spring, the very-high-risk zones are mainly located in the central and southern parts of Yongzhou City, the eastern part of Shaoyang City, the central part of Loudi City, the southern and eastern parts of Hengyang City, and the northern and central parts of Huaihua City. In autumn, they are mainly distributed in the southeast of Hengyang City, Zhuzhou City, and the center of Yongzhou City. In summer, they are mostly located in the south of Hengyang City and the east of Shaoyang City. The relevant management authorities in Hunan Province should step up their fire prevention efforts in spring and winter, especially in the cities mentioned above.
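The five-level zoning rule can be written as a simple lookup on the predicted probability. The text gives ranges that share their end points, so this sketch assumes each class includes its lower bound (for example, exactly 0.2 falls in class II); that boundary choice and the function name are assumptions.

```python
from bisect import bisect_right

RISK_LEVELS = ["I", "II", "III", "IV", "V"]
RISK_NAMES = {"I": "very low", "II": "low", "III": "medium",
              "IV": "high", "V": "very high"}
UPPER_BOUNDS = [0.2, 0.4, 0.6, 0.8]


def risk_zone(probability):
    """Map a predicted fire probability in [0, 1] to a risk level I-V."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Assumption: lower bounds are inclusive, so 0.2 belongs to level II.
    return RISK_LEVELS[bisect_right(UPPER_BOUNDS, probability)]
```

Applied cell by cell to the interpolated probability surface, such a rule yields the class raster shown on a zoning map.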

Discussion
Three machine learning methods (RF, SVM, and GBDT) and 19 forest-fire-driving factors were used in this study to predict the likelihood of forest fire occurrence in Hunan Province. The results demonstrate that all three models are suitable for predicting forest fire occurrence in Hunan Province (prediction accuracy is greater than 85%), but RF has a higher generalization ability than GBDT and SVM. The optimal model's accuracy is 91.68%, precision is 91.96%, recall is 92.78%, F1 is 92.37%, and the AUC is 97.2%, indicating that the random forest model is the most appropriate for this task. The results can provide a reference for future forest fire modeling in Hunan. RF is able to operate on large datasets with a large number of feature variables, has a high tolerance to noise and missing data, and can efficiently assess complex interactions and nonlinearities among explanatory variables [2,30]. Due to its powerful functions and high usability, RF has become one of the most popular machine learning methods. SVM has the advantage of nonlinear mapping without excessive interference from noisy data and is not prone to overfitting; however, it requires considerable time to test different kernel functions and model parameters to find the best model, making this approach impractical for dealing with large sample datasets [31,50]. GBDT is an additive model consisting of multiple CART regression trees, which improves the accuracy of prediction by fitting the residuals and reducing them continuously over the training rounds. This study concluded that SVM is less suitable for predicting forest fire incidence in Hunan than the other two methods, mainly because it is difficult to calibrate, too time-consuming to optimize, and does not achieve the accuracy of RF or GBDT. Because each model's prediction accuracy depends heavily on the input data and the tuned parameters, different results may be obtained for other study areas and datasets [48,49].
In addition, we compared a number of other studies related to forest fire risk prediction in Hunan Province and found that the optimal model in this study, RF, achieved a high prediction accuracy, as detailed in Table 4.

Table 4. Comparison of the prediction accuracy of some relevant studies on forest fire risk in Hunan Province.

| Method | Description | Impact Factor | Prediction accuracy |
| --- | --- | --- | --- |
| Guo et al. [44] | Combined with the principal component analysis method, a weighted forest fire risk weather index model was established to determine the forest fire risk weather level according to the weather index. | Meteorology (5 factors) | AUC = 74.2% |
| Wang et al. [45] | The logistic model was used to predict the probability of forest fire risk to classify the forest fire risk level in Hunan Province. | Meteorology, vegetation, topography, social/humanity (7 factors) | AUC = 77.9% |
| Yang et al. [60] | Construction of the Maxent wildfire risk assessment model using GIS to analyze the contribution, importance, and response of environmental variables to wildfire in Hunan Province. | | |

The analysis of the significance of the characteristic factors revealed that slope, among the topographic factors, has the least influence on the incidence of forest fires in Hunan Province, followed by human activity infrastructures. However, elevation, among the topographic factors, has a greater influence on forest fire occurrence in Hunan. The higher the elevation, the higher the relative humidity and the less likely fires are to occur [61,62], while the lower the elevation, the higher the human accessibility and the more significant the human activities; thus, the more likely forest fires are to occur [48]. Among the human activity factors, GDP and population density have a greater influence on the occurrence of forest fires in Hunan Province, most likely because the development of the forestry industry is closely linked to human activities, and an increase in population density and GDP promotes the occurrence of forest fires [63,64]. Meteorological factors and vegetation factors are the most important influencing factors of forest fires in Hunan Province. Meteorological elements, including mean relative humidity, daily maximum temperature, and hours of sunshine, carry a great deal of weight. They affect the water and heat conditions of the environment as well as the moisture content of forest fuels [65,66], which is one of the main causes of forest fires. Among the vegetation factors, NDVI has an important influence.
Although longitude and latitude reflect the differences in forest and tree species categories to some extent, they have less influence than NDVI, mainly because NDVI directly reflects the amount of fuel available for forest fires, and the amount of fuel directly determines how readily the forest burns. Although latitude, longitude, and NDVI reflect the state of combustible materials to a certain extent, they have limitations. Soil moisture affects the physiological activity of vegetation and is related to vegetation water content. Therefore, in future work, we hope to incorporate accurate combustible material data and soil moisture data into forest fire prediction modeling.
The forest fire risk level map of Hunan Province indicates that the majority of the medium- and high-risk areas for forest fires are located in the southern part of the province, especially in Hengyang City, Shaoyang City, Yongzhou City, Chenzhou City, central Loudi City, and southern Zhuzhou. These are key locations for monitoring and forecasting forest fires, which is largely consistent with previous studies [45]. The main reason is the combined influence of the monsoon and the topography: Hunan is under the influence of the winter monsoon in winter, and its geomorphology, surrounded by mountains in the southeast and west and open to the north, allows cold air to drive deep into the province, so that temperatures are generally high in the south and low in the north. In addition, Hunan's forest resources are mainly distributed in the south, and the northern region has a wide water area with small forest coverage and fewer fires. The eastern part of Yueyang City (Linxiang City and Pingjiang County) is also an area with a high incidence of forest fires in this study, probably because of the developed man-made infrastructure and large population in the area, and the large forested area with poor fire resistance of forest tree species. Given the forest fire risk in the high-risk areas listed above, it is advisable to increase investment in the comprehensive prevention and control of forest fires in order to safeguard the forest resources of Hunan Province. Hunan Province experiences a large number of forest fires in winter and spring, so the forestry department should concentrate its fire prevention efforts on these two seasons. In winter, Hunan is under the control of the winter monsoon and the weather is dry, which increases the risk of forest fires.
In spring, the temperature rises, human activities such as agricultural production and Qingming festival sacrifices increase, and the frequency of forest fires is high. Therefore, the forestry department should strengthen the management of human activities and fire prevention publicity and education, and raise awareness of forest fire prevention among all people. In addition, when a forest fire occurs, structural changes occur in the forest ecosystem. How it recovers is a complex process that includes a series of events, actions or changes, and the role of humans [67]. Resilience can be seen as a key parameter in decision-making processes [68], such as event mitigation following forest fires. In order to minimize the impact of forest fires and reduce the recovery period, resilience in forest fire-prone areas needs to be assessed. In future research, we hope to explore forest fire resilience in order to aid the decision-making processes of local management bodies.

Conclusions
Accurately predicting the probability of forest fires and mapping scientific forest fire risk levels can help forestry management departments to make scientific and effective forest resource management decisions. In order to achieve these goals, this study used the 2010-2018 satellite monitoring hotspot data provided by the Department of Fire Prevention and Control Management, Ministry of National Emergency Management, China, taking into account meteorological, terrain, vegetation, and socio-human factors, and using three machine learning methods (RF, SVM, and GBDT) to evaluate and map forest fire risk zones. The model performance comparison showed that the RF model was more suitable for forest fire occurrence prediction in Hunan Province, with the optimal model having 91.68% accuracy, 91.96% precision, 92.78% recall, 92.37% F1, and 97.2% AUC. In addition, the main characteristic factors of forest fires in Hunan Province were meteorological factors and vegetation factors by RF importance ranking. The drawn forest fire risk level zoning map showed that there are obvious differences in the spatial distribution of seasonal forest fire risk in Hunan Province, among which winter and spring are the seasons with high forest fire risk. The high-risk area of forest fires is mainly concentrated in the south of Hunan Province, and the prevention of forest fires should focus on these areas, and the authorities at all levels should develop scientific management strategies and make reasonable emergency resource allocation according to the local conditions. The results of this study can provide some reference basis for future forest fire management and prevention and control in Hunan Province.
Since the geographical distribution of forest fires and their influencing factors is highly heterogeneous in space, the relationship between them has significant spatial instability [69]. Therefore, in future work, we will consider adding a geographically weighted regression model for comparative studies, which incorporates spatial location information in the regression parameters and is capable of conducting the spatial analysis of the influencing factors and spatial prediction of forest fires. In addition, in future studies, we expect to add more accurate combustible data, soil moisture data, and different types of socio-economic factors to better support forest fire risk assessment in Hunan Province.

Data Availability Statement:
The fire point data used in this study were obtained from the Department of Fire Prevention and Control Management, Ministry of National Emergency Management, China, and are available from the corresponding authors upon request. Meteorological data were obtained from the China Meteorological Data Network (https://data.cma.cn/ accessed on 30 June 2021). DEM was taken from the Geospatial Data Cloud (http://www.gscloud.cn/ accessed on 16 January 2022). The China Quarterly Vegetation Index (NDVI) spatial distribution dataset was obtained from the Resource Environment Science and Data Center (http://www.resdc.cn/ accessed on 30 June 2021). Forest land data were obtained from the Global Geographic Information Public Product (http://www.globallandcover.com/ accessed on 16 January 2022). GDP and population data were obtained from the National Center for Earth System Science and Data (http://www.geodata.cn/ accessed on 30 June 2021). Road and residential datasets were obtained from the National Geographic Information Resource Catalog System (https://www.webmap.cn accessed on 30 June 2021).