Modeling Forest Lightning Fire Occurrence in the Daxinganling Mountains of Northeastern China with Maxent

Forest lightning fire is a recurrent and serious problem in the Daxinganling Mountains of northeastern China. Information on the spatial distribution of fire danger is needed to improve local fire prevention actions. Maxent (Maximum Entropy Models), which is widely used for modeling habitat distribution, was used to predict the probability of lightning fire occurrence on a 1 × 1 km grid based on historical fire data and environment variables in the Daxinganling Mountains during the period 2005–2010. We used a jack-knife test to assess the independent contributions of lightning characteristics, meteorological factors, topography and vegetation to the goodness-of-fit of models and evaluated the prediction accuracy with the kappa statistic and AUC (area under the receiver operating characteristic curve) analysis. The results showed that rainfall, number of strikes and lightning current intensity were major factors, and vegetation and geographic variables were secondary, in affecting lightning fire occurrence. The model performed well in terms of accuracy, with an average AUC and maximum kappa value of 0.866 and 0.782, respectively, for the validation sample. The prediction accuracy also increased with the sample size. Our study demonstrated that the Maxent model can be used to predict lightning fire occurrence in the Daxinganling Mountains. This model can provide guidance to forest managers in spatial assessment of daily fire danger.


Introduction
Forest fire is a major environmental problem in many forest biomes across the world [1]. Forest fires can be divided into natural and human-caused fires, according to the source of ignition, and lightning is the main natural source of forest fire. Many lightning fires occur annually in China, of which the most frequent and concentrated area is the Daxinganling Mountains [2]. According to the statistics compiled from 1988-2007, the number of lightning fires accounted for more than 60% of the fire incidents in the Daxinganling Mountains of Heilongjiang province, with burned areas of 4833 ha, and there was a significant increase in lightning-caused fires (especially in the number of fires and fire days) during this period [3]. Lightning fires are more difficult to suppress than those caused by humans because of their remoteness and aggregation in time and space [4,5]. Therefore, lightning fires usually cause serious damage to forest resources and the environment [6].
Lightning fires do not occur at random, but rather tend to start in specific places [7]. The efficiency of individual lightning strikes in igniting a forest fire is mainly affected by variation in lightning properties such as the quantity, polarity and intensity [5]. Research has shown that the long-continuing current of a lightning discharge is responsible for most, if not all, lightning-caused fires [8–10]. The advent of lightning location systems has advanced lightning-caused fire prediction [11–13]. Models of lightning fire occurrence have greater temporal and spatial specificity (correct prediction) than models of human-caused fires because the proximate cause is obvious.
Various mathematical models have been applied to predict lightning fire occurrence, such as the ordinary least squares (OLS) regression model [14], binary logistic regression model [15–18], cellular automaton model [19], weights-of-evidence model [20], generalized linear model (GAM) [21], negative binomial regression model [22], etc. Most of these models have limitations in application. For the least squares regression model, the fire data need to be normally distributed with equal variance. The negative binomial regression model requires the variance of the fire data to be greater than the mean. Modern data-mining techniques, such as artificial neural networks and support vector machines, may be superior to parametric methods, but risk overfitting to the data.
The principle of maximum entropy is to estimate the probability distribution of maximum entropy, which is, under a set of constraints (the environmental conditions), the most spread-out or closest to uniform [23]. Maxent (Maximum Entropy Models) iteratively evaluates the contrasts between the values of these observations and those of a background consisting of the mean observations over the entire study area, as sampled from a large number of points. One of Maxent's most important features is the capacity to fit highly complex response functions by combining several function types (linear, quadratic, product, threshold, and hinge). It can fit jagged and sharply discontinuous responses that cannot be modeled in even the most flexible regression techniques, such as generalized additive models. It is also adjusted for overfitting through a process called "regularization", a mechanism that prevents the algorithm from matching the data too closely.
Maxent has been shown to perform well in modeling habitat distribution in comparison with other methods. It aims to predict the potential geographical distribution of biological species from observations of species occurrences [24,25]. This is a very similar problem to predicting a fire risk level as a function of external explanatory variables [26]. Fire is similarly strongly regulated by the "fire environment triangle", i.e., topography, fuels and weather [27,28], which can be assessed from the conditions in which fires have already been observed. The model can express a per-pixel probability of fire occurrence from a set of environmental raster layers, which can be used as a critical tool for forest management.
Several studies have applied Maxent models to investigate the susceptibility of fire occurrence to environmental conditions. Parisien [28] used a Maxent model to assess the relationship between fire occurrence and environmental factors at three spatial scales in the USA. The results showed that the model and its relevant concepts can be applied to study the spatial distribution of fire occurrence. Renard [1] applied the Maxent algorithm to provide a quantitative understanding of the environmental controls regulating the spatial distribution of forest fires in the Western Ghats of India, and constructed local models for predicting annual forest fire occurrence.
The aims of this study are to apply the Maxent model to predict the risk of daily lightning fire occurrence based on historical fire data and environment factors in the Daxinganling Mountains during the 2005-2010 period, and to provide a new method for spatially explicit assessment of daily lightning fire occurrence.

Study Areas
The Daxinganling Mountains are located at 50°10′-53°33′ N, 121°12′-127°00′ E in the Heilongjiang province of China (Figure 1), with an area of 8.46 × 10⁴ km². The study area has a cool continental monsoon climate, with an annual average temperature of 2.8 °C and annual precipitation of 350-500 mm, mainly concentrated in July-August. Snow cover lasts five months in winter, and snow depth can reach 30-50 cm in forest areas. The area has a relatively flat terrain, and the elevation ranges from 300 to 1400 m, with gentle slopes <15° accounting for >80% of the area. Forest cover is 76.3% of the area. Dominant tree species include Larix gmelinii, Betula platyphylla and Pinus sylvestris var. mongolica. Owing to a high frequency of summer thunderstorms, lightning fires usually occur from April to July each year, with a peak in June. The numbers and burned areas of lightning fires of the region rank first in China.

Data Resources
The data used in this paper consist of 627 historical lightning fire records with geographical coordinates and the time of occurrence in the Daxinganling Mountains during the 2005-2010 period, lightning data (location and date of lightning strikes, lightning current intensity and charge amount), weather over the last three days before ignition (temperature, relative humidity, wind speed, rainfall), forest fuel type at the ignition location, and topography (altitude, slope and aspect). The particular period (2005-2010) was selected because the lightning detection system began to operate in 2005.
Lightning fire historical records were provided by the Forest Fire Prevention Department of the Forestry Group Company of Daxinganling in the Heilongjiang province. Lightning location data were obtained from the lightning device monitoring system of the National Weather Service located in Daxinganling.
Meteorological data were from 10 meteorological observation stations of the national daily meteorological data sharing network, five stations in the research area and the other five stations near the study area. The daily meteorological data for the area were generated by using Daymet [29], which uses regression and digital elevation maps to interpolate data from existing weather stations over complex terrain.

Data Preparation and Variable Selection
The dependent variable was the probability of fire ignition in each grid cell on each day. An estimated ignition, or start date, of each fire was determined by examining the daily lightning strike record prior to the detection of the fire. That is, we estimated the ignition date of each fire as the most recent day, prior to the fire's detection date, on which lightning occurred within a 10 km radius of the fire ignition location. Based on experience from related modeling work and preliminary analyses, the environmental variables in Table 1 were considered as candidate explanatory variables.
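The ignition-date rule described above can be sketched as follows. This is only an illustration: the tuple layout of the strike records, the function names, the use of great-circle (haversine) distance, and the choice to treat "prior to" as strictly earlier than the detection date are all assumptions, not details given in the text.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def estimate_ignition_date(fire_lat, fire_lon, detection_date, strikes, radius_km=10.0):
    """Return the most recent strike date before detection within radius_km, or None.

    strikes: iterable of (strike_date, lat, lon) tuples (hypothetical layout).
    Assumption: a strike on the detection date itself does not count.
    """
    candidates = [
        d
        for d, lat, lon in strikes
        if d < detection_date
        and haversine_km(fire_lat, fire_lon, lat, lon) <= radius_km
    ]
    return max(candidates) if candidates else None
```

A fire with no qualifying strike returns `None`, which leaves the decision about how to handle such records to the caller.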
The Maxent software (Princeton, NJ, USA) requires environment variable inputs to be in ESRI ASCII raster format or csv format. Therefore, all environmental variables in this study were supplied as ASCII grid raster layers at a 1 × 1 km spatial resolution. The raster layers were generated in ArcGIS 10.0 software (ESRI, Redlands, CA, USA). Meteorological data were interpolated to raster layers with a 1 × 1 km spatial resolution. Topography data were derived from a DEM with a 90 m resolution. Because combustion characteristics differ among forest fuels [17,30,31], fuel types at the ignition location were categorized into five classes: larch forest, Scots pine forest, mixed forest with larch, birch and oak trees, mixed forest with Scots pine, birch and oak trees, and grass. Environmental variables may influence each other, and hence we conducted multi-collinearity diagnostics for all candidate environmental factors and excluded the variables that had a close linear relationship with other variables before model building. The VIF (variance inflation factor) value was employed to measure the collinearity among environmental factors. Variables with VIF values >5 were removed from the analysis.
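The VIF screening step can be illustrated with a short sketch. It relies on the standard identity that the VIF of variable j equals the j-th diagonal element of the inverse of the correlation matrix of the predictors. The function names and the column-list data layout are assumptions made for illustration; the software actually used for the diagnostics is not stated in the text.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [
        row[:] + [1.0 if i == j else 0.0 for j in range(k)]
        for i, row in enumerate(matrix)
    ]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def screen_variables(columns, names, threshold=5.0):
    """Keep the names of variables whose VIF does not exceed the threshold."""
    vifs = variance_inflation_factors(columns)
    return [name for name, v in zip(names, vifs) if v <= threshold]
```

The sketch screens all variables in a single pass, and it assumes no variable is constant or an exact linear combination of the others (either case would make the computation fail with a division by zero).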

Maxent Modelling of Fire Occurrences
The principle of the Maxent model is to estimate the probability distribution of maximum entropy, which is, under a set of constraints (the environmental conditions), the most spread-out or closest to uniform [23]. Maxent iteratively evaluates the contrasts between the values of these observations and those of a background consisting of the mean observations over the entire study area, as sampled from a large number of points. One of Maxent's most important features is the capacity to fit highly complex response functions by combining several function types (linear, quadratic, product, threshold, and hinge). It can fit jagged and sharply discontinuous responses that cannot be modeled in even the most flexible regression techniques, such as generalized additive models. It is also adjusted for overfitting through a process called "regularization", a mechanism that prevents the algorithm from matching the data too closely.
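For reference, the maximum entropy formulation commonly given for Maxent can be written as below. The notation is an assumption introduced here and does not appear in the original text: x denotes a grid cell, f_j the feature functions (linear, quadratic, product, threshold, hinge), λ_j their fitted weights, x_1, ..., x_m the training fire occurrences, and β_j the regularization parameters.

```latex
% Gibbs form of the maximum entropy distribution over grid cells x
q_\lambda(x) = \frac{\exp\left(\sum_j \lambda_j f_j(x)\right)}{Z_\lambda},
\qquad
Z_\lambda = \sum_{x'} \exp\left(\sum_j \lambda_j f_j(x')\right)

% Regularized fit: maximize the penalized average log-likelihood
\max_{\lambda} \; \frac{1}{m} \sum_{i=1}^{m} \ln q_\lambda(x_i) \; - \; \sum_j \beta_j \,\lvert \lambda_j \rvert
```

The penalty term is what the text refers to as regularization: larger β_j values shrink the weights and give smoother, less closely fitted response functions.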
The Maxent model was fitted to our data using 70% of the fire occurrences (training points). The predictive power of models was assessed by cross-validations using the 30% remaining occurrences (test points) not used to fit the model [23,32] and a set of 1000 random locations representing background (or pseudo-absence) points [24].
Models were computed in Maxent 3.1 [24]. The fitted parameters were used in the model to produce a raw probability map that underwent a transformation to produce the so-called "logistic" output [33]. The mapped value of each cell of this output represented an estimate of relative probability ranging from 0 to 1, and a high value of the Maxent function at a particular location indicates that it is fire-prone. The default values of the regularization parameters were used in the model [34].
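The 70/30 partition and the 1000 background points could be drawn along the following lines. This is a minimal sketch: the function names, the fixed random seed, and the row/column indexing of grid cells are assumptions for illustration, and the Maxent software performs its own sampling internally.

```python
import random


def split_occurrences(records, train_fraction=0.7, seed=0):
    """Randomly split fire records into training and test subsets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def sample_background(n_rows, n_cols, n_points=1000, seed=0):
    """Draw distinct random grid cells as (row, col) background points.

    Requires n_points <= n_rows * n_cols.
    """
    rng = random.Random(seed)
    cells = rng.sample(range(n_rows * n_cols), n_points)
    return [divmod(c, n_cols) for c in cells]
```

With 627 records this split yields 439 training and 188 test occurrences.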

Contribution of Environmental Variables
We analyzed the relative contribution of each environmental variable to the most suited models based on jack-knife tests of regularized training gain and AUC values [35]. The regularized training gain was measured using a heuristic estimate in which the increase or decrease in model gain attributed to each variable from one iteration of the training algorithm to the next was added to the variable contribution. At the end of the iterations, the larger the gain value of an environment variable, the greater its contribution. The AUC value method assessed which variables matter most by calculating the AUC values when only including and when excluding each environment variable. When the AUC value obtained with only that variable was not remarkably less than that obtained with all variables, or the AUC value obtained without that variable was remarkably less than that obtained with all the variables, the environment variable had a larger contribution. The graphics were produced in SigmaPlot 10.0 software (Stanford, CA, USA).

Model Evaluation
Model prediction usually produces two types of errors (Table 2). The first type of error is underestimation, in which an area where fire was present is classified as having a low risk of fire occurrence (a false negative). The second is overestimation, in which an area where fire was absent is classified as having a high risk of fire occurrence (a false positive). Both types of errors are closely related to the choice of the threshold value. The following indices are often used to evaluate the accuracy of the model: overall accuracy, true positive rate, true negative rate, and the kappa statistic (Table 3). The overall accuracy is prone to bias because it depends on the proportion of fire presences in the sample. The true positive rate reflects the ability to predict fire presence, but it is vulnerable to the second type of error. The true negative rate indicates the ability to predict fire absence, but it is vulnerable to the first type of error.
Notes: a: true positive; b: false positive; c: false negative; d: true negative; n: total sample number.
The kappa statistic takes both the true positive rate and the true negative rate into account, but it is sensitive to the threshold value. The kappa value ranges from −1 to 1. A maximum kappa value close to 1 means that the predictions agree very closely with the observations. Landis and Koch [36] have suggested the following ranges of agreement for the kappa statistic: poor K < 0.4, good 0.4 < K < 0.75 and excellent K > 0.75. A kappa value equal to or less than 0 indicates that the prediction is no better than a random distribution model.
ROC (receiver operating characteristic) curve analysis is a threshold-independent method for evaluating the accuracy of prediction. A pair of true positive and false positive rates is obtained for each value of the diagnostic threshold. The plot of sensitivity (true positive rate) versus 1 − specificity (false positive rate) is the ROC curve. AUC is the area under the ROC curve. AUC values typically vary between 0.5, indicating that model predictions are no better than a random classification of observations, and 1. Generally, an AUC value of 0.5 to 0.7 implies that the model accuracy is low, 0.7 to 0.9 that the accuracy is moderate, and more than 0.9 that the accuracy is high [37].
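The threshold-dependent measures can be computed from the error matrix counts a, b, c and d defined in the notes above. The sketch below uses the standard definitions of overall accuracy, true positive rate, true negative rate and Cohen's kappa; since the contents of Table 3 are not reproduced here, these formulas are assumed to match it. The function names and the rule "score >= threshold means predicted fire" are illustrative choices.

```python
def confusion(scores, labels, threshold):
    """Count a (TP), b (FP), c (FN), d (TN) for a given threshold."""
    a = b = c = d = 0
    for s, y in zip(scores, labels):
        predicted_fire = s >= threshold
        if predicted_fire and y:
            a += 1
        elif predicted_fire and not y:
            b += 1
        elif not predicted_fire and y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def accuracy_measures(a, b, c, d):
    """Overall accuracy, true positive/negative rates and Cohen's kappa."""
    n = a + b + c + d
    observed = (a + d) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0
    return {
        "overall_accuracy": observed,
        "true_positive_rate": a / (a + c) if a + c else float("nan"),
        "true_negative_rate": d / (b + d) if b + d else float("nan"),
        "kappa": kappa,
    }


def max_kappa(scores, labels):
    """Return (maximum kappa, threshold) over all observed score values."""
    return max(
        (accuracy_measures(*confusion(scores, labels, t))["kappa"], t)
        for t in sorted(set(scores))
    )
```

For example, counts of a = 40, b = 10, c = 5 and d = 45 give an overall accuracy of 0.85, a chance agreement of 0.5, and therefore a kappa of 0.7.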
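The AUC can also be computed without tracing the ROC curve, because it equals the probability that a randomly chosen presence point receives a higher score than a randomly chosen absence (or background) point, with ties counted as one half. The function below is a minimal sketch of that rank-based calculation; the name `auc` and the input layout are assumptions for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: predicted probabilities; labels: truthy for fire presence,
    falsy for absence or background points.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = sum(
        (p > q) + 0.5 * (p == q) for p in positives for q in negatives
    )
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of points, which is acceptable for a few hundred fires and 1000 background points; a sort-based version would be preferable for much larger samples.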
The AUC value and the maximum kappa value were used to measure the performance of the predicted model in the study.

Sample Size
To evaluate how many observations were required to build good predictive models, the effect of increasing the sample size of the training points was assessed by building the Maxent model. Models created from a subsample of training points can produce different predictions for the entire data set. Ten iterations of the model were run, each iteration using a random selection of training data. The mean and variability of the predictions were examined for each data set. A set of 1000 random background (or pseudo-absence) points was used for the increased sets of sample points for model training.
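The resampling design described above can be outlined as a small harness. Here `fit_and_score` is a hypothetical callable standing in for "train Maxent on this subsample and return its AUC (or maximum kappa)"; it is not part of the Maxent software, and the fractions and seed are likewise illustrative.

```python
import random
from statistics import mean, stdev


def sample_size_experiment(records, fractions, fit_and_score, iterations=10, seed=0):
    """For each training fraction, repeat random subsampling and summarize scores.

    Returns {fraction: (mean score, standard deviation)}.
    """
    rng = random.Random(seed)
    records = list(records)
    results = {}
    for fraction in fractions:
        k = max(1, round(fraction * len(records)))
        scores = [
            fit_and_score(rng.sample(records, k)) for _ in range(iterations)
        ]
        results[fraction] = (mean(scores), stdev(scores))
    return results
```

Each fraction is summarized by its mean and standard deviation over the repeated runs.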

Multi-Collinearity Relations between Environmental Variables
The VIF values of lightning energy (LE) and neutralized charge amount (NC) were greater than 5 (Table 4), indicating that these two variables had a significant linear relationship with other variables, and they were excluded from model training. This may be due to the interaction between lightning energy (LE), neutralized charge amount (NC) and lightning current intensity (LCI). The VIF values of other environmental variables were all less than 5 (Table 4), implying that no significant collinearity exists between them and that they can be used for model training.
Notes: DMT: daily average maximum temperature in the 3 days before the ignition; DAH: daily average relative humidity in the 3 days before the ignition; DAWS: daily average wind speed in the 3 days before the ignition; DR: rainfall in the 3 days before the ignition; FT: forest fuel type of ignition; ALT: altitude; ASP: aspect; SLO: slope; LCI: lightning current intensity for all strikes; LN: number of strikes on the day of the ignition; LE: lightning energy for all strikes; NC: neutralized charge amount for all strikes.

Variables Contribution to the Predicted Model
The regularized gain values of number of strikes (LN), rainfall (DR), and lightning current intensity (LCI) were significantly higher than those of the other environmental variables (Figure 2). This suggested that these three variables had the largest contribution to the prediction model. The gain values of the other variables were low, and they were as follows, in descending order: maximum temperature (DMT), average relative humidity (DAH), aspect (ASP), fuel type (FT), average wind speed (DAWS), altitude (ALT), and slope (SLO).
The AUC values of rainfall (DR), lightning current intensity (LCI) and number of strikes (LN) were significantly higher than those of the other variables when each variable was included alone in model training (Figure 3). When the three variables (DR, LCI and LN) were excluded, the AUC values in model training clearly decreased compared to those obtained when including all the variables. This suggests that the prediction accuracy improves significantly when the three variables are included in the model training. As to the other environmental variables, the AUC values decreased when they were excluded compared to including all variables, but the AUC values declined markedly when only those variables were included. This indicates that these variables did not contribute to the model prediction accuracy in a significant fashion.
The importance ordering of the environment variables obtained by the jack-knife of regularized training gain is roughly consistent with that derived from the AUC value method. Number of strikes (LN), lightning current intensity (LCI) and rainfall (DR) were the top three in importance by both methods, suggesting that these three variables were the primary factors affecting forest lightning fire occurrence. Average wind speed (DAWS), altitude (ALT) and slope (SLO) ranked last in order of importance, indicating that these variables had little effect on lightning fire occurrence.

Figure 3. The relative contribution of environment variables to the model according to the AUC value method (AUC is the area under the curve of the sensitivity vs. 1-specificity plot; codes for the variables are as given in Table 1).

Model Fit
Prediction of lightning fire occurrence can be conducted after a lightning strike. After the lightning location system detects lightning data, all the environment variable data are input into the Maxent software in the form of raster layers, and the software will automatically calculate the fire probability value of each grid cell and output the results in the form of raster layers. The probability values, ranging from 0 to 1, represent the risk of lightning fire occurrence, which was divided into five classes in the study: very low (0-0.2), low (0.2-0.4), moderate (0.4-0.6), high (0.6-0.8), and very high (0.8-1). Lightning data are dynamic and change every day, and therefore the fire danger rating map is also updated dynamically with lightning data.
Observations on a day with many fires were compared with the modeled probability of fire occurrence on that day to examine the fit of the model. Figure 4 shows the modeled fire danger on September 26, 2010 and its agreement with the observed fires. In locations where actual fires occurred, the fire danger class was relatively high; elsewhere, it was relatively low. The western region had a relatively higher fire danger than the other areas. Seven fires occurred on that day, and all of them occurred in locations of high or very high fire danger classes.
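Mapping the logistic output to the five danger classes is a simple binning step, sketched below. The class ranges in the text share their end points, so the assignment of a boundary value (for example 0.2) to the upper class is an assumption of this sketch, as are the function names and the list-of-lists grid layout.

```python
from bisect import bisect_right

CLASS_BOUNDS = [0.2, 0.4, 0.6, 0.8]
DANGER_CLASSES = ["very low", "low", "moderate", "high", "very high"]


def danger_class(probability):
    """Map a probability in [0, 1] to one of the five fire danger classes."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # bisect_right sends a boundary value to the upper class (assumption).
    return DANGER_CLASSES[bisect_right(CLASS_BOUNDS, probability)]


def classify_grid(grid):
    """Apply danger_class to every cell of a 2-D list of probabilities."""
    return [[danger_class(p) for p in row] for row in grid]
```

Applying `classify_grid` to each day's probability raster would yield the daily danger rating map described above.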

Effect of Sample Size on Model Predictions
When different proportions of data were selected for model training, the maximum kappa value and the AUC values of the model varied accordingly. With an increase in the proportion of training data, the maximum kappa and AUC values presented an increasing trend (Figure 5). Model performance in Maxent relative to that of the full data set was still acceptable, or at least informative, when 4/10 of the total observations were used for model training in this study region. However, the models experienced substantial loss of prediction accuracy when 3/10 of the available observations were used. For more than 4/10 of the total observations in the study, all AUC values and the maximum kappa values were more than 0.75, with average values of 0.859 and 0.772, respectively, indicating that the predicted models had a higher prediction accuracy in the range of the training data.

Factors Influencing Performance of the Predicted Model
According to the AUC and kappa statistics used for evaluation, the Maxent model developed in this study was found to be suitable for predicting lightning-caused fire occurrence in the Daxinganling region of China. The AUC and the maximum kappa values were on average 0.866 and 0.782, respectively, for the validation sample, depending on the size of the data set analyzed. The values of the prediction accuracy obtained were less than those obtained in some studies on the occurrence of lightning-caused fires in other regions [1,28]. This is primarily due to the factors discussed below. First of all, the key to the Maxent model application is the choice of characteristic function. The characteristic function represents the way environment variables respond to forest lightning fire occurrence. This study used the characteristic functions provided by the Maxent software, which are based on the relationship between species distribution and environment variables. The relationship between lightning fire occurrence and environment variables is very complex and is very difficult to express with these characteristic functions. Therefore, exploring new characteristic functions more applicable to lightning fire occurrence may improve the prediction accuracy of the model. Secondly, the interpolation of meteorological data between weather stations may have introduced a source of potential variability. Only 10 meteorological stations were used in the study, and therefore the interpolation accuracy was inevitably affected by the limited number of weather stations. Improving estimation of the spatial distribution of precipitation is a likely key to improving lightning fire occurrence prediction. Spatial interpolation of precipitation data might be improved by increasing the density of weather stations, or by assimilating ground- or satellite-based radar reflectivity data. Finally, the sample size may be another important factor influencing the prediction accuracy. Our results suggested that the prediction accuracy increased with an increase in the proportion of training sample data (Figure 5). Since the lightning location system was constructed in 2005, our study was limited to the historical lightning fire data during the 2005-2010 period. As more historical sample data become available, the prediction accuracy of the model is expected to improve.

Environmental Determinants of Lightning Fire Occurrence
The contribution of environment variables in the model suggested that rainfall was one of the strongest determinants of lightning fire occurrence (Figure 2). Rainfall can increase dead fuel moisture content, which is one of the best variables to explain the incidence of lightning-induced fire [30]. The occurrence of dry thunderstorms is one of the most important variables in the probability of lightning-induced ignition. It was reported that dry thunderstorms (defined as thunderstorms without significant concurrent rainfall) were common and an important contributor to fire ignition [3,5]. A lightning fire is more likely to be observed when precipitation on the day of a lightning strike is null or negligible, possibly because greater precipitation extinguished fires prior to discovery [38,39].
The occurrence of lightning fires was largely influenced by the number of lightning strikes rather than the number of lightning discharges (Figures 2 and 3). Multiple-stroke discharges are more likely to cause ignition because of a greater likelihood of a long-continuing current [30,31,40]. Shindo and Uman [41] studied 90 cloud-to-ground discharges and found that only 1 of 19 single-stroke discharges was followed by a long-continuing current, whereas 21 of the remaining 71 multiple-stroke discharges contained a long-continuing current. These results supported our finding that an increased probability of multiple-stroke discharges led to an increased probability of ignition.
The topographic variables (altitude, slope and aspect) had less effect on the occurrence of lightning fires (Figures 2 and 3). Similar results were found in other studies [41]. However, it has been reported that fires occurring on sun-exposed terrain were more likely to spread due to higher solar incidence and comparatively drier fuel [42], and that lightning-induced fires mainly occurred on steeper slopes [43,44]. The weaker topographic effect in our study may be related to the spatial scale of the environmental variable data. Because of the spatial precision of the lightning location system, the spatial resolution of the raster data in this study was limited to 1 km, which therefore ignored terrain differences within the scale of 1 km.

Practical Insights for Fire Management
The models developed in this study can be very useful for local forest fire management agencies to improve their forest fire prevention actions. The outputs of this model can be considered robust and are good indicators for spatially explicit assessment of lightning fire occurrence, and hence the model is a useful tool for forest fire prevention planning and coordination of regional efforts in the Daxinganling region. The wet or dry conditions before a lightning strike, in terms of whether the forest floor was drying, had a significant influence on the probability of ignition by a lightning strike. The probability of an ignition will decrease if rain occurred during previous days. Therefore, considering precipitation during previous days as a dry or wet event is likely crucial to improving lightning fire occurrence prediction. Once actual lightning activity has been observed and combined with previous precipitation, the map of ignition probability can be created to identify the areas that are sensitive to ignition. Forest fire managers can allocate manpower and material resources according to the spatial distribution of fire danger ratings to improve the efficiency of forest fire management. Using the Maxent model, the map of potential fire danger could be prepared for the current operational day as well as for several days in the future using forecasted weather. If the Maxent model is applied to other areas, it would require validation with local historical fire and environmental data.

Conclusions
Based on the historical fire data and the environment variables influencing forest lightning fire occurrence, it was demonstrated that the habitat distribution model Maxent performed well in modelling lightning fire occurrence in the Daxinganling Mountains area in China. Our results also showed that rainfall and the number and current intensity of lightning strikes were the major determinants of lightning fire occurrence, while the influences of vegetation and geographic variables on lightning fire occurrence were not obvious in this region. The prediction accuracy of the model increased with the sample size. Data covering a longer historical period can improve the prediction ability of the model, which will help construct a superior predictive model of fire occurrence. The model will be useful for forest managers to improve their fire prevention actions and focus their efforts on endangered sites predicted to have a high probability of lightning fire occurrence.
Lightning fire is an active ecological factor that acts on the forest ecosystem frequently, with a dual nature. On the one hand, fire can burn forests and do great damage to a forest's structure and function. On the other hand, fire can be regarded as a tool and means of forest management. Fire return interval and intensity have played a role in changing plant species composition and affecting succession of forest vegetation in the Daxinganling area. In the north of the study area, the fire return interval is 110~120 a, while the fruition age of Larix-Betula-Pinus forest is 70~150 a, so fire of low or moderate intensity can maintain relative stability of the forest. However, the fire return interval is 30~40 a at most in the central or south-east area, so self-renewal of Larix forest is impossible, and as a result there is no distribution of Larix forest in this area. Under high-intensity fire, Larix forest will undergo retrogressive succession towards Betula forest. Betula forest will undergo progressive succession towards Larix forest under the interference of low-intensity fire. A Larix forest community can develop better with a certain interval of mild fire interference.

Figure 1. Location of the study areas in Heilongjiang province, northeastern China.

Figure 2. The relative contribution of environment variables to the model according to a jack-knife of regularized training gain (codes for the variables are as given in Table 1).

Figure 4. Comparison between the modelled fire danger rating and observed fire occurrence on September 26, 2010.
The predicted model performed well in terms of accuracy relative to a random model (AUC = 0.5), with a mean AUC and maximum kappa value of 0.866 and 0.782, respectively. The Maxent model is adequate for predicting lightning fire occurrence in this region.

Figure 5. The area under the curve (AUC, mean ± SD) and the maximum kappa value for 10 iterations of the Maxent model as a function of increasing fraction of sample points for the Daxinganling region in China (AUC is the area under the curve of the sensitivity vs. 1-specificity plot).

Table 1. Candidate variables for predicting the probability of lightning fire occurrence in the Daxinganling Mountains area, China.

Table 2. An error matrix used to evaluate the predictive accuracy of models.

Table 3. Measures of predictive accuracy commonly used for models.

Table 4. Multi-collinearity diagnostics for environmental candidate variables.