Identifying Forest Fire Driving Factors and Related Impacts in China Using Random Forest Algorithm

Abstract: Reasonable forest fire management measures can effectively reduce the losses caused by forest fires, and forest fire driving factors and their impacts are important aspects that should be considered in forest fire management. We used the random forest model and MODIS Global Fire Atlas dataset (2010~2016) to analyse the impacts of climate, topographic, vegetation and socioeconomic variables on forest fire occurrence in six geographical regions in China. The results show clear regional differences in the forest fire driving factors and their impacts in China. Climate variables are the forest fire driving factors in all regions of China, the vegetation variable is a forest fire driving factor in all regions except the Northwest region, and topographic variables and socioeconomic variables are driving factors of forest fires in only a few regions (Northwest and Southwest regions). The model predictive capability is good: the AUC values are between 0.830 and 0.975, and the prediction accuracy is between 70.0% and 91.4%. High fire hazard areas are concentrated in the Northeast region, Southwest region and East China region. This research will aid in providing a national-scale understanding of forest fire driving factors and fire hazard distribution in China and help policymakers to design fire management strategies to reduce potential fire hazards.


Introduction
Forests are ecosystems with rich biodiversity [1][2][3], and they play an important role in soil and water conservation, climate regulation, the carbon cycle and other aspects [4,5]. Fire, which affects the biodiversity, species composition and ecosystem structure of forest ecosystems, is the dominant disturbance factor in many forest ecosystems [6][7][8][9]. Moreover, fire also affects human lives, regional economies and environmental health [10][11][12]. In short, forest fires threaten the sustainable development of modern forestry and human security [13]. Therefore, as an important component of global environmental change, forest fires have become the focus of forestry and ecological research [14,15]. An important aspect of forest fire management and prevention is studying forest fire driving factors and their impacts, which can help fire prevention departments to accurately assess forest fire hazards and effectively implement forest fire prevention strategies [11,16]. Forest fires are affected by many driving factors in complex ways, so it is very important to select appropriate forest fire driving factors and prediction models.
Forest fire driving factors have generally been divided into four types, namely, climate, vegetation, topography and socioeconomic [17,18], which vary at different temporal and spatial scales [19]. Regarding impact modes, climate factors control the accumulation and water content of forest fuels [20,21], which are usually considered as the major determinants of forest fire occurrence [22]. Vegetation is a source of forest fuel and directly affects the ignition capacity [23,24]. Topography can affect the structure and distribution of vegetation, thus affecting the possibility of forest fires as well as the spread speed and direction of forest fires [25]. Socioeconomic factors affect forest fire occurrence via building expansion, traffic network construction and human-related activities, which increase pressure on wildlands, bringing ignition sources close to forests [23,26]. In terms of impact scope, climate affects forest fires on a larger scale while the vegetation, topography and socioeconomic factors affect forest fires on a smaller scale [27]. In terms of impact relationship, there are nonlinear relationships and thresholds between forest fire driving factors and forest fire occurrence [28][29][30]. Random forest is a machine learning algorithm, which can automatically select important variables and flexibly evaluate the complex interaction between variables. In recent years, random forest has been used in the study of forest fire driving factors and has shown better prediction ability than multiple linear regression [31] and logistic regression [18]. Previous studies have analysed forest fire distribution, forest fire frequency and burnt area at the national scale in China. Tian [32] analysed the spatial and temporal distribution characteristics of wildfires for 2008-2012 in mainland China. 
Chang [33] explored the environmental factors influencing the spatial variation in the mean number of fires and mean burned forest area from 1987 to 2007 at a provincial level using cluster analysis and redundancy analysis. Zhong [34] analysed the changes in fire frequency and burnt area during 1992-1999 in China. Lu [35] analysed the impacts of annual temperature and precipitation on the burnt area dynamics in China. Ying used ground-based data to analyse the environmental and social factor contributions to the spatial variation of forest fire frequency and burnt area summarized at a county level during 1989-1991 in China [36]. However, previous studies have used models to analyse the driving factors and their impacts of forest fire occurrence in China, mainly at the provincial scale, such as in Fujian province [18], Heilongjiang province [29,37] and Shanxi province [38]. There is still a lack of nationwide research on forest fire driving factors and their influence on recent forest fires. The value of this study lies in conducting the nationwide research which can provide a detailed analysis and practical information of the forest fire hazard and would help governments to formulate more accurate forest fire prevention strategies and allocate resources rationally. In this study, we used the random forest model and forest fire ignitions for 2010~2016 (obtained from MODIS Global Fire Atlas dataset) to evaluate the impact of four types of forest fire driving factors and the regional differences of these factors in China. This study has three objectives: (1) to determine the forest fire driving factors in various geographical regions of China and analyse how they affect forest fire occurrence; (2) to map the likelihood of forest fire occurrence in China and (3) to discuss forest fire prevention strategies in different geographical areas of China.

Study Area
The study area covered mainland China (Hong Kong, Macao and Taiwan were not analysed due to a lack of data). The driving factors of forest fires and their effects were analysed in six geographical regions: Northeast region (NE), North China region (N), East China region (E), Northwest region (NW), Southwest region (SW) and Mid-south region (MS). Each region is an aggregation of provinces with adjacent locations and similar topography, economy and climate. The details of each region are shown in Figure 1 and Table 1.

Table 1.

Northwest region
Terrain: The dominant terrain types, which fluctuate greatly, are deserts and high mountains. The elevation in most areas is between 500 and 5000 m.
Vegetation: Temperate desert and alpine vegetation on the Qinghai-Tibetan plateau. Forest coverage is 8.21% [39].
Population and economy: The total population is 101.86 million. The per capita GDP is ¥45,463 [40].

Southwest region
Provinces: Tibet, Sichuan, Chongqing, Yunnan and Guizhou.
Climate: Subtropical monsoon climate and alpine climate.
Terrain: The terrain is complex and consists of basins, plateaus and mountains. The elevation in most areas is between 500 and 6000 m.
Vegetation: Alpine vegetation on the Qinghai-Tibetan plateau, subtropical evergreen broadleaved forest and tropical rainforest. Forest coverage is 25.75% [39].
Population and economy: The total population is 200.95 million. The per capita GDP is ¥43,609 [40].

Mid-south region
Provinces: Henan, Hubei, Hunan, Guangxi, Guangdong and Hainan.
Climate: Subtropical monsoon climate and tropical monsoon climate.
Terrain: The dominant terrain types are plains and mountains. The elevation in most areas is below 1000 m.
Population and economy: The total population is 393.01 million. The per capita GDP is ¥57,664 [40].

Dependent Variables
We identified 17,466 forest fires (ignitions) between 2010 and 2016 across mainland China with the Global Fire Atlas dataset (downloaded from the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC), https://daac.ornl.gov) and a Chinese land-use type dataset (downloaded from the Resource and Environment Data Cloud Platform, http://www.dsac.cn). The timing and location of the fire ignitions were provided by the Global Fire Atlas, which is a global dataset that records the daily dynamics of individual fires based on the Global Fire Atlas algorithm and estimated burn dates from the Moderate Resolution Imaging Spectroradiometer (MODIS) [41]. The Chinese land-use type dataset provided the forest land range in mainland China for 2015 at a 1000-m spatial resolution. According to this range, we identified Chinese forest fire ignitions for 2010-2016 from the Global Fire Atlas dataset in ArcGIS 10.2 software (Environmental Systems Research Institute, Redlands, CA, USA). Figure 2 shows the distribution of the forest fire ignitions for six geographical regions in China.

Modelling forest fire occurrence requires a binary target variable, so control points (nonfire points) were generated randomly according to three principles: (1) the ratio of forest fire ignition points to control points was 1:1.5 [29], (2) the control points were located within the forest land range in mainland China and (3) the points were random in both time and space. ArcGIS 10.2 software was used to randomly generate the control points, and the dates of the control points were randomly selected from 2010-2016 so that the points were also random in time.
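The three principles for generating control points can be illustrated with a short sketch. It is not the ArcGIS workflow used in the study: it assumes that the forest land range is available as a list of candidate coordinates (the hypothetical `forest_cells`), and the function name and the record layout are illustrative only.

```python
import random
from datetime import date, timedelta

def sample_control_points(forest_cells, n_fires, ratio=1.5,
                          start=date(2010, 1, 1), end=date(2016, 12, 31),
                          seed=0):
    """Draw nonfire points that are random in space and time.

    forest_cells: list of (lon, lat) tuples inside the forest land range
                  (hypothetical input; the study used ArcGIS 10.2).
    n_fires:      number of fire ignition points.
    ratio:        control points per ignition point (1:1.5 in the text).
    """
    rng = random.Random(seed)
    n_controls = round(n_fires * ratio)
    span_days = (end - start).days
    points = []
    for _ in range(n_controls):
        lon, lat = rng.choice(forest_cells)                     # random in space
        day = start + timedelta(days=rng.randint(0, span_days))  # random in time
        points.append({"lon": lon, "lat": lat, "date": day, "fire": 0})
    return points

# Toy usage with three made-up forest cells and four ignitions.
cells = [(126.5, 45.8), (102.7, 25.0), (118.1, 26.6)]
controls = sample_control_points(cells, n_fires=4)
```

With the 17,466 ignitions reported above, the same ratio would give 26,199 control points.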

Explanatory Variables
A total of 21 variables, grouped into climate, topography, vegetation and socioeconomic categories, were selected as the initial forest fire driving factors (Table 2). All variables were integrated in ArcGIS 10.2 software and extracted to the forest fire ignition points and control points.

Climate variables affect fuel accumulation and moisture, which largely determine the time, location and occurrence probability of forest fires [31]. In this study, the initial climate variables include annual variables and daily variables. The annual variables are precipitation and soil moisture. As climate factors in the period before the fire can also affect vegetation accumulation and the fuel moisture content, precipitation and soil moisture in the year before individual forest fire ignition during 2010-2016 were also taken into consideration [29,31]. We downloaded precipitation data with a 1-km spatial and monthly temporal resolution [42] and soil moisture data with a 0.25° spatial and monthly temporal resolution from the National Earth System Science Data Sharing Infrastructure, National Science & Technology Infrastructure of China (http://www.geodata.cn). Based on these data, we calculated the annual cumulative precipitation and the annual average soil moisture for 2009-2016.
The daily initial climate variables include daily average temperature, daily average ground surface temperature, daily average relative humidity, daily minimum relative humidity, daily precipitation, daily average atmospheric pressure, sunshine hours, daily average wind speed and daily maximum wind speed. The daily humidity, precipitation, wind speed and sunshine hours affect the possibility of forest fire occurrence by reflecting fuel moisture. Daily temperature is the key condition triggering fire ignition. Atmospheric pressure can affect the oxygen content in the air, and the pressure obviously differs due to significant altitude differences and the complex terrain in China; therefore, atmospheric pressure was also considered as an initial climate variable. Daily climate data were obtained from the Daily Data Set of China's Surface Climate Data (V3.0) of the National Meteorological Information Centre (http://data.cma.cn), and we included daily data from 824 national weather stations in China. The daily climate variable values for each fire ignition and control point were provided by the weather station nearest that point.
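Assigning each point the record of its nearest weather station can be sketched as follows. The text does not state which distance measure was used; great-circle (haversine) distance is assumed here, and the station identifiers and coordinates are made up for illustration.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def nearest_station(point, stations):
    """Return the station dict closest to point = (lon, lat)."""
    lon, lat = point
    return min(stations, key=lambda s: haversine_km(lon, lat, s["lon"], s["lat"]))

# Hypothetical stations; the study used 824 national weather stations.
stations = [
    {"id": "station_A", "lon": 126.6, "lat": 45.7},
    {"id": "station_B", "lon": 102.7, "lat": 25.0},
]
closest = nearest_station((125.0, 46.0), stations)
```

The daily values for a fire ignition or control point would then be looked up from the chosen station's record for the date of that point.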

Topographic Variables
Topography influences the possibility of forest fire occurrence by affecting the vegetation composition and distribution and local microclimate [25]. In this study, the initial topographic variables include elevation, slope and aspect. Data for these variables were extracted from digital elevation model (DEM) data in China with 90-m spatial resolution (obtained from Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences, http://www.gscloud.cn). Aspect was divided into 8 categories according to the criteria in Table 3.

Vegetation Variable
The initial vegetation variable is the fractional vegetation cover (FVC), which is the percentage of the vertical projection of vegetation area to the ground surface within a unit area [48] and can well represent the amount of forest fuel [18,29]. The normalized difference vegetation index (NDVI) is significantly better than other vegetation indices in estimating FVC [49,50], so we calculated FVC based on the annual NDVI dataset for 2010-2016. The calculation formula is as follows:

$$FVC = \frac{NDVI - NDVI_{soil}}{NDVI_{veg} - NDVI_{soil}}$$

where $NDVI_{soil}$ is the NDVI of bare soil and $NDVI_{veg}$ is the NDVI of dense vegetation canopy. The annual NDVI dataset for 2010-2016 was from the Resource and Environment Data Cloud Platform (http://www.resdc.cn), and the resolution was 1 km [51].

Forests 2020, 11, 507
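As a worked illustration of the FVC calculation, the sketch below assumes the commonly used linear pixel model, FVC = (NDVI − NDVI_soil) / (NDVI_veg − NDVI_soil). The endpoint values 0.05 and 0.85 are placeholders rather than values from the study, and clamping the result to the interval [0, 1] is an added assumption.

```python
def fractional_vegetation_cover(ndvi, ndvi_soil, ndvi_veg):
    """Linear pixel model: scale NDVI between bare soil and dense canopy."""
    if ndvi_veg <= ndvi_soil:
        raise ValueError("ndvi_veg must be larger than ndvi_soil")
    fvc = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    # Assumption: pixels outside the endpoint range are clamped to [0, 1].
    return min(1.0, max(0.0, fvc))

# Placeholder endpoints (not from the paper).
NDVI_SOIL, NDVI_VEG = 0.05, 0.85
example_fvc = fractional_vegetation_cover(0.45, NDVI_SOIL, NDVI_VEG)
```

A pixel halfway between the two endpoints therefore has an FVC of about 0.5.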

Socioeconomic Variables
Socioeconomic variables affect the probability of forest fire occurrence through human activities. Travel and production activities in or around forests increase the probability of forest fire occurrence. In this study, the initial socioeconomic variables include the distance to the nearest road and railway, the distance to the nearest settlement, gross domestic product (GDP) and population density. Collectively, these variables can reflect the accessibility of a forest and the possibility of people engaging in fire-prone behaviours in forests [23,26]. The road, railway and settlement datasets were from the 1:1 million National Basic Geographic Database, which was published on the National Catalogue Service for Geographic Information website (http://www.webmap.cn). The distance from each forest fire ignition and control point to the nearest road, railway and settlement area was calculated using the ArcGIS 10.2 "near analysis tool." The population density dataset and GDP dataset were downloaded from the National Earth System Science Data Center (http://www.geodata.cn), and the resolution was 1 km.

Model
The random forest model was used to identify the forest fire driving factors and their corresponding impacts on forest fire occurrence in the six geographical regions of China and the whole study area. Random forest is an ensemble learning technique that is derived from classification and regression trees (CART). Random forest has a high prediction accuracy and high tolerance to outliers and "noise," and it has shown good prediction ability in forest fire forecasting [30,52]. The random forest model is composed of a combination of classification trees, each of which is generated from a bootstrap sample. For each tree, approximately two-thirds of the data are used for training and the remaining one-third (the out-of-bag samples, OOB) are used for model validation [53]. Variable importance can also be measured with the OOB samples by comparing the increase in OOB error when that variable is randomly permuted and all others are left unchanged [54,55]. The importance score of a variable is as follows [56]:

$$VI(X_j) = \frac{1}{ntree} \sum_{t=1}^{ntree} \left( \widetilde{errOOB}_t^{\,j} - errOOB_t \right)$$

where $X_j$ is the jth variable, $ntree$ is the number of trees, $errOOB_t$ is the OOB error of tree $t$ and $\widetilde{errOOB}_t^{\,j}$ is the OOB error of tree $t$ when $X_j$ is permuted while all other variables remain unchanged among the OOB data. For regression, the OOB error is the mean square error; for classification, the OOB error is the misclassification rate.
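The permutation step behind the importance score can be sketched in a few lines. This is an illustration of the idea only, not the R implementation used in the study: each "tree" is represented by a hypothetical triple of a prediction function and its out-of-bag rows and labels, and the toy stump below is made up.

```python
import random

def misclassification_rate(predict, rows, labels):
    """Share of rows whose predicted class differs from the label."""
    wrong = sum(predict(r) != y for r, y in zip(rows, labels))
    return wrong / len(rows)

def permutation_importance(trees, j, seed=0):
    """Mean increase in OOB error over trees after shuffling variable j.

    trees: list of (predict, oob_rows, oob_labels); rows are tuples.
    """
    rng = random.Random(seed)
    total = 0.0
    for predict, rows, labels in trees:
        base = misclassification_rate(predict, rows, labels)
        column = [r[j] for r in rows]
        rng.shuffle(column)  # permute variable j only
        permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        total += misclassification_rate(predict, permuted, labels) - base
    return total / len(trees)

# Toy "forest": three copies of a stump that looks only at variable 0.
def stump(row):
    return int(row[0] > 0.5)

oob_rows = [(0.1, 5.0), (0.9, 1.0), (0.7, 3.0), (0.2, 2.0)]
oob_labels = [0, 1, 1, 0]
toy_trees = [(stump, oob_rows, oob_labels)] * 3

vi_used = permutation_importance(toy_trees, j=0)
vi_unused = permutation_importance(toy_trees, j=1)
```

Because the stump never reads variable 1, shuffling it cannot change the error and its importance is exactly zero, whereas shuffling variable 0 can only leave the error unchanged or increase it.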
In this study, the random forest (RF) model was used for classification, with the dependent variable divided into two categories: forest fire occurrence and forest fire nonoccurrence. When using an RF model, the number of trees to grow (ntree) and the number of variables to try at each split (mtry) need to be defined. According to previous experience [56,57], the value of mtry was set as $\sqrt{\text{number of variables}}$ and the value of ntree was set to 2000. The varSelRF package in R statistical software was applied to select significant variables from the initial variables. Then, we measured and ranked the importance of these variables. The partialPlot function was used to draw partial dependence plots, which describe the relationship between the dependent variable and each explanatory variable.
To reduce bias, in each study region and the whole study area, we selected 80% of the original dataset (training dataset) to build the model, and the remaining 20% of the original dataset (independent validation dataset) was used to assess the performance of the final model. Each training dataset was randomly divided into an inner training dataset (60%) and an inner validation dataset (40%) [52]; this procedure was repeated 5 times, so that 5 random subsamples of data were obtained in each study region and the whole study area. Each subsample contained an inner training dataset and an inner validation dataset. In each region, the driving factors and training dataset were used to build a final model, and the independent validation dataset was used to validate the model [30].
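The nested splitting scheme described above can be sketched as follows. The function names are illustrative, and simple random shuffling is assumed because the text does not say whether the splits were stratified by fire and nonfire class.

```python
import random

def split(items, first_fraction, rng):
    """Shuffle a copy of items and cut it into two parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]

def make_subsamples(records, repeats=5, seed=0):
    """80/20 outer split, then `repeats` random 60/40 inner splits."""
    rng = random.Random(seed)
    training, validation = split(records, 0.8, rng)
    subsamples = [split(training, 0.6, rng) for _ in range(repeats)]
    return training, validation, subsamples

# Toy usage: 100 record identifiers standing in for fire and control points.
training, validation, subsamples = make_subsamples(list(range(100)))
```

Each of the five subsamples reuses the same 80% training dataset, so the independent validation dataset is never seen while the intermediate models are built.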

Prediction Accuracy of the Models
The receiver operating characteristic (ROC) curve was applied to measure the predictive capability of the RF models using the area under the curve (AUC) [28,58,59]. The AUC values ranged from 0.5 to 1, with values closer to 1 indicating a higher accuracy; an AUC value of >0.8 usually indicates good predictive capability [18,60]. We used the Youden criterion, calculated from the sensitivity and specificity of the ROC curve (Youden criterion = sensitivity + specificity − 1) [28,61], to determine the cut-off point, which was the threshold for judging whether a fire occurred in the models [62]. If the predicted probability was higher than the cut-off point, it was assumed that a forest fire had occurred, and vice versa. The prediction accuracy of the model was calculated based on the cut-off point.
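The cut-off selection and the AUC can be illustrated with a small standard-library sketch. The labels and scores are invented, candidate thresholds are taken from the observed scores, and a point is classified as fire when its score is at or above the threshold; these are assumptions of the sketch rather than details given in the text.

```python
def roc_points(labels, scores):
    """(threshold, sensitivity, specificity) for each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points

def youden_cutoff(labels, scores):
    """Threshold maximising sensitivity + specificity - 1."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)[0]

def auc(labels, scores):
    """Rank-based AUC: chance that a fire point outscores a nonfire point."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 4 nonfire points (0) and 3 fire points (1).
labels = [0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.7, 0.9]
cutoff = youden_cutoff(labels, scores)
area = auc(labels, scores)
```

In this toy case the threshold 0.5 captures all three fire points while misclassifying one nonfire point, which gives the largest Youden value (0.75).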

Mapping Forest Fire Occurrence Likelihood
Based on the fire occurrence probability calculated by the random forest model for fire ignitions and nonfire points, we used ordinary kriging interpolation to map the forest fire occurrence likelihood in mainland China in ArcGIS 10.2 [30].

Table 4 and Figure 3 show the forest fire driving factors and their importance rank in the six regions and the whole study area. Table A1 and Figure A1 show the significant variables and their importance rank in each intermediate model.

Table 4. The variance inflation factor (VIF) was used to measure the amount of multicollinearity in the explanatory variables. A variable with VIF > 10 was considered collinear and was excluded from the random forest model. "+" indicates that the variable was identified as a forest fire driving factor in a given region, and "/" indicates that the variable was excluded due to multicollinearity.

Influence of the Forest Fire Driving Factors on Forest Fire Occurrence in Different Regions
Partial dependence plots of each forest fire driving factor in each region were drawn to analyse the variables' influence intervals and trends on the probability of forest fire occurrence, where x is the variable value and y is half the logit of the probability of forest fire occurrence [30]. The markers on the x-axis show the data distribution; fewer marks indicate less training data and less reliable model predictions, so only the impact trends within the dense data range are discussed in this study. Figure 4 shows a nonlinear relationship between each forest fire driving factor and the probability of forest fire occurrence. The vegetation variable shows the same influence trend on the forest fire occurrence probability in each region, and the overall trend is fluctuating. When the fractional vegetation cover is approximately 0.9, the probability of forest fire occurrence peaks and then declines sharply, and the probability is lowest when the fractional vegetation cover is approximately 0.98. The impact of climate variables is complex. The daily average temperature shows the same influence trend in the Northeast region and Southwest region: it was positively correlated with the probability of forest fire occurrence initially and negatively correlated after the values exceeded thresholds (12 °C in the Northeast region and 21 °C in the Southwest region). However, it shows another influence trend in the Mid-south region and East China region: the probability of forest fire occurrence remains stable at a high level below 20 °C and decreases sharply when the daily average temperature exceeds 20 °C. The average daily relative humidity and the minimum daily relative humidity are generally negatively correlated with the probability of forest fire occurrence in the respective regions.
The annual soil moisture shows different influence trends in the North China and Southwest regions: in the North China region, it shows a fluctuating trend, while in the Southwest region, the probability of forest fire occurrence increases initially and then decreases as the annual soil moisture increases. For other climate variables, annual precipitation in the year before the fire and the year of the fire, daily average air pressure and daily maximum wind speed were generally positively correlated with the probability of forest fire occurrence. The elevation shows similar influence trends in the Northwest region and Southwest region and is negatively correlated with the probability of forest fire occurrence. Among socioeconomic variables, the probability of forest fire occurrence decreases with increasing distance from roads and increases initially and then declines with increasing population density and GDP.
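To read the partial dependence plots, it helps to convert the plotted y value back to a probability. For two-class classification, the partialPlot function of the R randomForest package plots the log of the class probability minus the mean log probability over both classes, which equals half the logit. The helper names below are illustrative.

```python
import math

def half_logit(p):
    """y value on the plot for a fire probability p (two-class case)."""
    return 0.5 * math.log(p / (1.0 - p))

def probability_from_half_logit(y):
    """Invert the plot scale: p = 1 / (1 + exp(-2y))."""
    return 1.0 / (1.0 + math.exp(-2.0 * y))

y_even = half_logit(0.5)                        # equal odds of fire and nonfire
p_back = probability_from_half_logit(half_logit(0.8))
```

A y value of 0 therefore corresponds to a fire probability of 0.5, positive values to probabilities above 0.5 and negative values to probabilities below 0.5.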

Model Prediction Accuracy in Different Regions
The AUC values of each final model and intermediate model are greater than 0.85, and the prediction accuracy is between 70.0% and 91.4% (Table 5), which indicates that the model predictive capability is good. In the final models, the AUC (0.974) and prediction accuracy (91.4% for training and 89.3% for testing) in the East China region were the highest. The AUC (0.871) and prediction accuracy (81.75% for training and 70.52% for testing) in the Northwest region were the lowest, which may be due to the small number of fire ignitions in the Northwest region.

Figures 5 and 6 show that the areas with a high probability of forest fires are concentrated in the Northeast and Mid-south regions as well as the south of the East China region and the northwest of the Northwest region. To compare the results of the national model and the regional models, we drew a map of the difference in the likelihood of forest fire occurrence calculated based on the whole study area model and the regional models (Figure 7). The map shows that the probability from the whole study area model was higher than those from the regional models in most areas of the Southwest region and North China region and in the centre of the Northwest region, and lower than those from the regional models in most areas of the Northeast, Northwest, East China and Mid-south regions and in the north of the Northwest region.

Figure 7. Map of the likelihood of forest fire occurrence obtained from the whole study area model minus that obtained from the regional model.

Forest Fire Driving Factors and Their Influence
Previous studies have found regional and scale differences in forest fire factors [18,36,63], and this study confirms this point. We found that due to the differing geographical and social conditions in China from region to region, the forest fire driving factors vary in different regions, and the same variables also operate differently depending on the region and the scale of analysis, which illustrates the spatial applicability of forest fire research and the importance of formulating forest fire management systems based on regional characteristics. All final models included a smaller number of variables selected from the initial set. Previous studies have also noted that a parsimonious model is more stable [28,49].
In this study, all final models included climate variables, which are considered the dominant factors affecting forest fires [64][65][66]. Among climate variables, daily average temperature was a forest fire driving factor in the largest number of study areas (Northeast, Southwest, Mid-south, East China and the whole study area). Previous studies [29,30] have shown thresholds and complex nonlinear relationships between temperature and forest fire occurrence probability, and our study confirms this point. The probability of forest fire occurrence initially increases or stabilizes at a higher value with the increase in temperature. When the temperature exceeds a certain threshold (12 °C in the Northeast region, 21 °C in the Southwest region and 20 °C in the Mid-south region and East China region), the probability shows a sharp downward trend. There may be two reasons for this situation. (1) Although high temperatures can increase plant evaporation, thereby reducing the moisture content of forest fire fuels [67], most parts of China experience a monsoon climate (Table 1), and rainfall and heat are synchronous. Therefore, high-temperature weather is often accompanied by high relative humidity levels, which have the opposite effect on forest fires, so impact thresholds appear. (2) At high temperatures, forest fire prevention departments are vigilant, implementing strict fire prevention systems and limiting the occurrence of forest fires [68]. Relative humidity is also one of the main forest fire driving factors. In this study, relative humidity showed a similar influence trend in the respective regions, and it was negatively correlated with the occurrence probability of forest fires despite some moderate fluctuations. This is because high relative humidity increases the moisture content of combustible materials and reduces the possibility of fire [64].
It is noteworthy that the daily minimum relative humidity was also selected as the forest fire driving factor in the whole study area, which indicates that this variable operates at both regional and large scales. Air pressure affects the oxygen content and fuel ignition temperature, and a relatively lower pressure will lead to a lower oxygen content and higher ignition temperature, thus reducing the possibility of forest fire occurrence [69]. However, in the Southwest region, when the daily air pressure is higher than 860 hPa, the probability of forest fire occurrence shows a small decrease, which indicates that there is also an impact threshold of air pressure. The daily maximum wind speed is also one of the driving factors of forest fires in southwest China. The wind will increase evaporation capacity, and the higher the wind speed, the smaller the water content of forest combustibles; hence, the wind speed has a positive correlation with the occurrence probability of forest fires, which is consistent with the research results of Guo [18] in Fujian province of China. The soil moisture in the year of the fire directly affects the water content of forest combustibles [31], so this variable is negatively correlated with the occurrence probability of forest fire. Annual precipitation promotes the accumulation of plant fuels, thus having a positive impact on the occurrence probability of forest fires.
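The impact thresholds discussed above (for example, the temperature and air pressure values) describe how the modelled fire probability responds to a single variable. The following is a minimal sketch of one way such a response curve could be computed and a threshold read from it, assuming the curves are of the partial dependence type. The function names, the toy predictor and all numbers are hypothetical illustrations, not the authors' code or data.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average predicted probability when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)        # copy so the input rows stay untouched
            modified[feature] = value   # overwrite only the variable of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def sharpest_drop(grid, curve):
    """Grid value immediately before the largest decrease between neighbouring points."""
    drops = [curve[i] - curve[i + 1] for i in range(len(curve) - 1)]
    return grid[drops.index(max(drops))]


def toy_predict(row):
    """Hypothetical stand-in for a fitted classifier (not the paper's model)."""
    temperature, humidity = row
    base = 0.7 if temperature <= 20.0 else 0.2   # assumed step at 20 degrees C
    return base - 0.002 * humidity               # assumed mild humidity effect


# Hypothetical samples: [daily average temperature, daily average relative humidity]
rows = [[5.0, 40.0], [15.0, 60.0], [25.0, 80.0]]
grid = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0]

curve = partial_dependence(toy_predict, rows, 0, grid)
threshold = sharpest_drop(grid, curve)
print(threshold)
```

With a real fitted model, `predict` would be replaced by a function returning the predicted fire probability for one sample. Taking the largest single drop is only one possible reading of a "threshold"; a curve with gradual decline would need a different rule.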
Among the topographic factors, elevation is a forest fire driving factor in the Northwest and Southwest regions, and it is negatively correlated with the occurrence probability of forest fires in both. We suspect that this is because the terrain relief of these two regions is large; the elevation in most areas is 500~5000 m in the Northwest region and 500~6000 m in the Southwest region. As elevation increases, human activity decreases, and the effects of elevation on weather conditions, vegetation and soil moisture are also not conducive to forest fire occurrence [49,70,71]. Tian's research [32] also showed that forest fires mainly occurred at low elevations in China.
The vegetation variable (fractional vegetation cover) is a forest fire driving factor in all regions except the Northwest region, and its importance ranked first in four regions (Northeast region, Southwest region, East China region and the whole study area). Previous studies have also shown that vegetation cover has an important impact on forest fires [29,67]. Generally, the higher the vegetation coverage, the more fuel is available, so high vegetation coverage leads to a high forest fire rate. However, in this study, fractional vegetation cover showed a complicated influence trend: when the fractional vegetation cover is between 0.8 and 0.97, the occurrence probability of forest fires fluctuates at a higher value, and then it drops rapidly, reaching a minimum value when the fractional vegetation cover is approximately 0.98. We suspect that this is because in forest land with high vegetation coverage, canopy occlusion will lead to some small fires that are not easily detected by MODIS [67].
Compared with other variables, socioeconomic variables are forest fire driving factors in only a few regions (Northwest region and Southwest region), with low degrees of importance (Figure 4). Distance from the road is negatively correlated with the probability of forest fires in the Northwest region because forests close to roads are vulnerable to traffic accidents and human activities (e.g., smoking and picnics) [26,61]. GDP and population density show similar influence trends: they initially have a positive impact on the occurrence probability of forest fires and then have a negative impact after exceeding a certain threshold (GDP of 200 RMB/km² and population density of 100 people/km²). This may be because, within a certain range, increases in population density and GDP increase human activity in forests, thereby promoting forest fire occurrence [51,72,73]. However, in economically prosperous and high-population-density areas, the forest coverage rate is low and humans conduct few forest-related production activities, so the occurrence probability of forest fires decreases [18,29,32].

Implications for Forest Fire Prevention
There are differences in the forest fire driving factors (Figure 3 and Table 4) and the prediction results (Figures 5-7) between the regional models and the whole study area model. Therefore, it is necessary to study forest fire driving factors based on geographical regions, and regional differences should also be fully considered in forest fire management. Forest fire management departments should formulate forest fire prevention strategies according to the differences in forest fire driving factors and impact thresholds in different regions. For example, in the Northwest region, elevation has the greatest impact on forest fire occurrence, and the probability of forest fires is higher in low-elevation areas. Therefore, the Northwest region forest fire management departments should strengthen forest fire monitoring in low-elevation areas, for instance by setting up more forest fire observation towers and forest fire brigades [30]. In the North China region, soil moisture has the greatest impact, so changes in soil moisture should be taken into account when developing a forest fire prevention strategy. In the Northeast region and East China region, when the daily average temperature reaches the impact threshold, the occurrence probability of forest fires reaches a maximum; hence, forest fire management departments should be more vigilant in the corresponding weather. In the Southwest region, there are 13 forest fire driving factors. These factors should be integrated into the local assessment index systems of forest fire hazard, and their influence should be considered comprehensively when judging the forest fire hazard. In the Mid-south region, forest fire management departments should pay attention to monitoring the daily minimum relative humidity.
The map of the likelihood of forest fire occurrence is also crucial to forest fire management [74]. Understanding the distribution of forest fire occurrence likelihood can help to determine the location and number of fire observation towers [28], contributing to more effective use of financial and human resources. Figure 5 shows that areas with a high probability of forest fires are concentrated in the Northeast and Mid-south regions as well as the south of East China region and the northwest of Northwest region; thus, more stringent forest fire prevention systems should be implemented in these areas.

Strengths and Limitations
Previous forest fire research has usually been based on eco-geographical areas or forest types [29,75,76]. However, these zoning methods ignore administrative boundaries, whereas forest fire management strategies are often formulated by administrative areas. Therefore, we chose a zoning method that takes administrative divisions into account, trying to provide a more practical reference for China's fire prevention departments. Our research is based on the geographical regions of China, a division method that considers both administrative divisions and natural conditions and that has been used in some forestry analyses [77,78]. Each region is an aggregation of provinces with adjacent locations and similar topography, economy and climate. However, this zoning method has its shortcomings. First, a province with complex topography and varied climate and vegetation types (such as Tibet in the Southwest region) must still be assigned to a single region. We think that this may have led to a far higher number of forest fire driving factors in the Southwest region than in the other regions. The second limitation concerns the model. To reveal the nonlinear relationships and influence thresholds between forest fire driving factors and forest fire occurrence probability, we used the random forest model, which has shown good prediction ability in previous studies on forest fire [18,30,31]. However, behaving as a "black box," this method cannot calculate regression coefficients or confidence intervals [63,79]. To address these two limitations, in future research we will try to analyse forest fire driving factors using geographically weighted regression, a spatially explicit technique that would overcome the necessity of building predetermined regions.
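To illustrate why geographically weighted regression would not need predetermined regions, the sketch below fits a separate weighted least-squares line at each target location, with weights that decay with distance. It is a simplified one-predictor version under stated assumptions: the Gaussian kernel, the bandwidth, the function name `gwr_local_fit` and all data are hypothetical choices for illustration, not a method or result from this study.

```python
import math


def gwr_local_fit(target, coords, x, y, bandwidth):
    """Fit y = intercept + slope * x by weighted least squares around `target`.

    Each observation is weighted by a Gaussian kernel of its distance to
    `target`, so nearby observations dominate the local coefficients.
    """
    weights = [
        math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2) for c in coords
    ]
    w_sum = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical observations: a western cluster where y rises with x
# and an eastern cluster where y falls with x.
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 0), (101, 0), (100, 1), (101, 1)]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0, 4.0, 3.0, 2.0, 1.0]

west = gwr_local_fit((0.5, 0.5), coords, x, y, bandwidth=10.0)
east = gwr_local_fit((100.5, 0.5), coords, x, y, bandwidth=10.0)
print(west, east)
```

The local slope changes sign between the two locations without any region boundary being drawn, and, unlike the random forest, each location yields explicit coefficients. A full analysis would use several predictors, a data-driven bandwidth, and a model suited to a binary fire/no-fire response.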

Conclusions
We used the random forest model to analyse the forest fire driving factors in different geographical regions in China for 2010 to 2016. The model predictive capability is good, with a prediction accuracy between 70.0% and 91.4%. Furthermore, we mapped the probability of forest fire occurrence in China based on the results of the model. In China, there are obvious regional differences in the types of forest fire driving factors and their impacts. Climate variables (especially temperature and humidity) have major impacts on forest fire occurrence, and the vegetation variable is secondary. Topographic variables and socioeconomic variables are forest fire driving factors only in the Southwest and Northwest regions. There are nonlinear relationships and influence thresholds between forest fire driving factors and forest fire occurrence probability. High fire hazard areas are concentrated in the Northeast and Mid-south regions as well as the south of East China region and the northwest of Northwest region. This research will aid in providing a national-scale understanding of forest fire driving factors and fire hazard distribution in China and help policymakers to design fire management strategies and allocate resources reasonably to reduce potential fire hazards.

VIF (variance inflation factor) was used to measure the amount of multicollinearity in the explanatory variables. A variable with VIF > 10 was considered collinear and was excluded from the random forest model. "+" indicates that the variable was identified as being a forest fire driving factor in a given region, and "/" indicates that the variable was excluded due to multicollinearity.
Pre_year0: annual precipitation in the year before the fire; Pre_year1: annual precipitation in the year of the fire; Soil_mois0: annual soil moisture in the year before the fire; Soil_mois1: annual soil moisture in the year of the fire; Tem_avg: daily average temperature; GST_avg: daily average ground surface temperature; RH_avg: daily average relative humidity; RH_min: daily minimum relative humidity; Pre_daily: daily precipitation; Pres_avg: daily average air pressure; SSD: sunshine hours; Win_avg: daily average wind speed; Win_max: daily maximum wind speed; DEM: elevation; FVC: fractional vegetation cover; Dis_road: the distance to road and railway; Dis_sett: the distance to settlement; Pop: population density; GDP: gross domestic product. The same abbreviations apply to Table A1.
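The VIF screening mentioned in the table note can be sketched as follows. The VIF of each variable equals the corresponding diagonal element of the inverse of the correlation matrix of the explanatory variables, and variables with VIF > 10 are flagged. The variable names are taken from the abbreviation list, but the values, the helper functions and the choice to only flag (rather than decide which variable of a collinear pair to drop) are hypothetical assumptions for illustration.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]


def invert(matrix):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


# Hypothetical values; Tem_avg and GST_avg are made nearly collinear on purpose.
data = {
    "Tem_avg": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GST_avg": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "Win_avg": [2.0, 1.0, 1.0, 1.0, 1.0, 2.0],
}
scores = dict(zip(data, vif(list(data.values()))))
flagged = [name for name, value in scores.items() if value > 10]
print(flagged)
```

In this toy data set the two temperature variables are flagged and the wind variable is not. In practice, variables are often removed one at a time and the VIF recomputed, since dropping one member of a collinear pair lowers the VIF of the other.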