Mapping Burn Severity of Forest Fires in Small Sample Size Scenarios

Mapping burn severity of forest fires can contribute significantly to understanding, quantifying and monitoring of forest fire severity and its impacts on ecosystems. In recent years, several remote sensing-based methods for mapping burn severity have been reported in the literature, of which the implementations are mainly dependent on several field plots. Therefore, it is a challenge to develop alternative method of mapping burn severity using limited number of field plots. In this study, we proposed a support vector regression based method using multi-temporal satellite data to map the burn severity, evaluated its performance by calculating correlations between the predicted and the observed Composite Burn Index, and compared the performance with that of the regression analysis method (based on dNBR). The results show that the performance of support vector regression based mapping method is more accurate (RMSE = 0.46–0.57) than that of regression analysis method (RMSE = 0.53–0.68). Even with fewer training sets, it can map the detailed distribution of burn severity of forest fires and can achieve relatively better generalization, compared to regression analysis burn severity mapping methods. It could be concluded that the proposed support vector regression based mapping method is an alternative to the regression analysis method in small sample size scenarios. This method with excellent generalization performance should be recommended for future studies on burn severity of forest fires.


Introduction
Forest fires, as a major ecological disturbance agent in forest ecosystems, annually destroy millions of hectares of forest land around the world [1][2][3].Mapping the severity of such fires for their impact assessment has become widely accepted by forest managers and scientific researchers [4].Specifically, the quantitative mapping of burn severity not only is very useful to managers who want to improve the management responses effectively and timely, but also helps researchers to understand the relation between fires and forest ecosystems accurately to obtain more detailed insights [5,6].
Traditionally, burn severity of forest fires has been estimated through field investigations, e.g., Composite Burn Index (CBI) and a modified version of the Composite Burn Index (GeoCBI), which demands a great deal of manpower, money and time [7,8].In the last few decades, the remote Forests 2018, 9, 608 2 of 14 sensing technique has become the major choice of burn severity mapping over large areas, because of its characteristics of large-scale monitoring, short revisiting cycle and low cost [9].Several types of remotely sensed data have been commonly used, such as MODIS, AVIRIS, ALS, TM/ETM+, ASTER, SPOT and IKONOS [4,10].Among them, the Landsat TM/ETM+ has been recognized as the most appropriate one to map burn severity of forest fires, due to the fact that its spatial (~30 m) and temporal resolutions (8−16 days) are sufficient and it is available for free [11].
However, several studies have also indicated that the spectral index is not optimal for burn severity mapping [1,8,[28][29][30][31].Some alternative methods have therefore been proposed to map burn severity.Based on the Radiative Transfer Models (RTM), DeSantisandChuvieco [28] applied a physical method for burn severity mapping in terms of Composite Burn Index (CBI).Further, some image analysis techniques, including the Tasseled Cap Transformation [17,32], the Principal Components Transformation [32,33] and the Spectral Mixture Analysis [1,34] have been applied in mapping burn severity in local study areas.
While the above methods have been successfully applied to map the burn severity of forest fires, there are still room for accuracy improvement, i.e., developing more accurate mapping methods.Furthermore, in practical applications, the successful implementations of traditional methods require several field plots as training samples [35,36].However, this need cannot be met in most cases, since the field work demands a great deal of manpower, money and time.Therefore, a promising method with better generalization performance is needed to improve the accuracy of burn severity mapping and to deal with small sample size problem.
Support vector regression (SVR), as an attractive alternative in estimation and regression analysis [37,38] and a very promising method for machine learning problems, has been studied in many fields, such as land cover classification [39], drought severity estimation [38,40], air pollution forecasting [41], burn severity assessment [42], and so on.Many reported studies have indicated that the SVR has the potential of providing excellent generalization performance and more accurate estimation when the number of field samples is small [43,44].However, to the best of our knowledge, the effectiveness of SVR on mapping burn severity in small sample size scenarios remain unclear.Thus, our aim here is to propose a SVR-based method to map the burn severity of forest fires, and presents the evaluation of its performance.This paper is organized as follows.Section 2 introduces SVR, the optimization of parameters, and the method for performance evaluation.Section 3 describes the study areas and the data required for the study, followed by results and discussions in Section 4. Section 5 concludes the paper.

Study Areas
In this study, we selected five forest fires in the western United States as study areas (Figure 1).The Bear Fire, located in the Southwest of United States, had scorched more than 18.62 km 2  This area belongs to the Highland (alpine) climate and its elevation varies between 1700 and 2740 m.The Precipitation is less than 30 cm per year.The main land covers are trees, shrubs, and grasses.The predominant vegetation are ponderosa pine and Douglas-fir, followed by willows, boxelder, narrow-leaf cottonwood, big sagebrush, rubber rabbit brush, service berry, black grease wood, and so on [45].As the biggest and typical fire in historical records of the Black Hills, the Jasper Fire scorched more than 336 km 2 in the southern Black Hills from 24 August 2000 to 22 September 2000.This area belongs to the Semiarid Steppe climate and its average precipitation is between 44 and 67.3 cm.The elevation in this area ranges from 1050 to 2207 m and the Ponderosa pine is the dominant conifer tree species, which covers about 95% of the Jasper fire [46].There also are white spruce, Limber pine, lodgepole pine, and Rocky Mountain juniper [47,48].The Mule Fire, located in the Northern Rockies region, was started by a lightning strike on 11 July 2002, and smoldered for several days.This area also belongs to the Highland (alpine) climate and the mean annual precipitation ranges from 75 to 115 cm.Primary tree species in this area include lodgepole pine, Engelmann spruce, Douglas-fir, subalpine fir, aspen and whitebark pine [49].The remaining two fires (i.e., the Pw03 fire and the Wolf fire) were located in Yosemite National Park and their occurrence times are close to each other.This area's elevation ranges from 657 to 3997 m with the mean monthly temperatures ranging from −3 • C to 32 • C. It has a Mediterranean climate and contains five major vegetation zones: chaparral/oak woodland, lower montane forest, upper montane forest, subalpine zone, and alpine [50].More detailed descriptions of these fires can be found in the publication of Zhu et al. [51].

in the
Forests 2018, 9, x FOR PEER REVIEW 3 of 14 cm per year.The main land covers are trees, shrubs, and grasses.The predominant vegetation are ponderosa pine and Douglas-fir, followed by willows, boxelder, narrow-leaf cottonwood, big sagebrush, rubber rabbit brush, service berry, black grease wood, and so on [45].As the biggest and typical fire in historical records of the Black Hills, the Jasper Fire scorched more than 336 km 2 in the southern Black Hills from 24 August 2000 to 22 September 2000.This area belongs to the Semiarid Steppe climate and its average precipitation is between 44 and 67.3 cm.The elevation in this area ranges from 1050 to 2207 m and the Ponderosa pine is the dominant conifer tree species, which covers about 95% of the Jasper fire [46].There also are white spruce, Limber pine, lodgepole pine, and Rocky Mountain juniper [47,48].The Mule Fire, located in the Northern Rockies region, was started by a lightning strike on 11 July 2002, and smoldered for several days.This area also belongs to the Highland (alpine) climate and the mean annual precipitation ranges from 75 to 115 cm.Primary tree species in this area include lodgepole pine, Engelmann spruce, Douglas-fir, subalpine fir, aspen and whitebark pine [49].The remaining two fires (i.e., the Pw03 fire and the Wolf fire) were located in Yosemite National Park and their occurrence times are close to each other.This area's elevation ranges from 657 to 3997 m with the mean monthly temperatures ranging from −3 °C to 32 °C.It has a Mediterranean climate and contains five major vegetation zones: chaparral/oak woodland, lower montane forest, upper montane forest, subalpine zone, and alpine [50].More detailed descriptions of these fires can be found in the publication of Zhu et al. [51].

Field Plots and Sub-Sampling
The Composite Burn Index (CBI) is an effective field-measurement of burn severity levels after fire [4].During the first/second following growing season after each fire happened, the fieldmeasurement CBI data (total 262 field plots) were collected by the investigators of the National Park Service (NPS) and the US Geological Survey (USGS).For each field plot, the CBI value was calculated by averaging the fire effects in five strata based on the four or five variables assessed visually and scored from zero to three in each strata independently [4].More descriptions of the CBI data may be found in the publication of KeyandBenson [4].
For the Jasper fire, the post-fire vegetation environment of some field plots was affected by some human activities, such as logging on remaining trees, misregistration errors and creek channel issues prior to field investigation [46].To remove any possible biases resulting from the use of these field plots (8 field plots), we therefore eliminated them before the experiments.Furthermore, since the rest

Field Plots and Sub-Sampling
The Composite Burn Index (CBI) is an effective field-measurement of burn severity levels after fire [4].During the first/second following growing season after each fire happened, the field-measurement CBI data (total 262 field plots) were collected by the investigators of the National Park Service (NPS) and the US Geological Survey (USGS).For each field plot, the CBI value was calculated by averaging the fire effects in five strata based on the four or five variables assessed visually and scored from zero to three in each strata independently [4].More descriptions of the CBI data may be found in the publication of KeyandBenson [4].
For the Jasper fire, the post-fire vegetation environment of some field plots was affected by some human activities, such as logging on remaining trees, misregistration errors and creek channel issues prior to field investigation [46].To remove any possible biases resulting from the use of these field plots (8 field plots), we therefore eliminated them before the experiments.Furthermore, since the rest of field plots (66 field plots) were all located within the burn perimeter, their CBI values were all greater than 0, making the distribution of field plots values not representative for the burn severity conditions within the fire.Therefore, 10 field plots outside the burn perimeter were selected randomly as background field plots on an assumption that their CBI values were 0 s.As a result, the available field plots were 55 in the Bear Fire, 76 in the Jasper Fire, 55 in the Mule Fire, and 78 in the Pw03-Wolf Fires.The total number of field plots was 264.For each fire, its selected field plots were then stratified randomly into two sets: 50% for training and the remaining 50% for testing [52].

Imagery and Pre-Processing
The TM/ETM+ images have already been recognized as the most suitable remotely sensed data for burn severity mapping [4,11].Therefore, multi-temporal TM/ETM+ images were collected in five forest fires (Table 1).For each fire, we obtained the post-fire image acquired around the time when the field data was collected, which can minimize the plant phenology differences between the image acquisition and the field data collection.Pre-fire TM or ETM+ images acquired at the same season one or two years before the acquisition of post-fire image were also obtained for each fire.This was helpful for reducing the differences in sun angle, atmospheric effects and plant phenology [28], when the multi-temporal images were used in this study.All of these images with the Level L1G of processing were downloaded from the U.S. Geological Survey EarthExplorer (USGS 2013).Their pre-processing, including atmospheric correction, geometric normalization and reflectance conversion, was conducted using ENVI 5.0 software (ITT Visual Information Solutions, Boulder, CO, USA).According to the coordinates recorded by GPS in the field, the field-measurement data points were matched with the corresponding pixel in the remote sensing image.The spectral reflectance of remote sensing images was then extracted by using a 3 × 3 pixel matrix to reduce the effect of the GPS positioning error [6,53].This matching process was realized using ArcGIS 9.3 software (EsriChina, Beijing, China).

Support Vector Regression
The basic learning problems addressed by traditional/statistical regression are to establish an unknown model f (w, x) (i.e., the quantitative relationship between input and output), based on a limited number of training samples: (x 1 , y 1 ), (x 2 , y 2 ), • • • , (x n , y n ) [43,54,55], as in Equation (1).
Forests 2018, 9, 608 5 of 14 where w i means a set of parameters in regression function, g i (x) represents a set of non-linear transformations, and b denotes the "bias" term.
Traditionally, when estimating model parameters, we only consider the prediction risk.This can ensure that the model has a better performance on limited training samples [43].Most typical prediction risk measures are based on the Empirical Risk Minimization (ERM) [54,55], which can be determined by Equation (2).
where L(y, f (w, x)) is the loss function, and the training sample is assumed to satisfy some (unknown) distributions, which can be defined as a joint probability density function p(x, y).However, in actual applications, both excellent performance on training samples and good generalization on unknown samples (i.e., predicting samples) are needed from the model.Therefore, the Structural Risk Minimization (SRM), an inductive principle based on the Vapnik-Chervonenkis theory (VC-theory, a form of computational learning theory), has been introduced in the SVR modeling [54,55].Based on the VC-theory, the model generalization is affected by the complexity of model and the number of training samples, as expressed in Equation (3).
where H(•) is often called trust scope, h means a measure of model complexity (VC-dimension), and n represents the number of training samples.Therefore, according to the above SRM principle, the model generalization of support vector regression (SVR) can be improved by reducing the model complexity and selecting the training samples.Specifically, the SVR model estimation procedure can be expressed as a convex optimization problem solving [43,44,54-56]: where the w 2 measures the model complexity and the constant C > 0 determines the tradeoff between the smoothness of model and the amount up to which the deviations larger than ε are tolerated [54].Slack variables (ξ i , ξ * i ), which are used to measure the deviation of training samples outside ε-insensitive zone, are based on a different type of loss function (ε-insensitive loss function) proposed by Vapnik for SVR [43, 54,55]: By introducing Lagrange multipliers (α i , α * i ) and kernel function K(x i , x), this convex optimization problem can be then transformed into a dual problem based on the Wolfe duality theory [43, 44,54,55]: subject to Forests 2018, 9, 608 6 of 14 The established SVR model can be expressed as follows [43,44,54-56]: where Lagrange multipliers (α i , α * i ) for every training sample cannot be nonzero at the same time.Consequently, those training samples are named support vectors (SVs).

SVR Modeling
The software platform for the SVR modeling in this study was the LIBSVM Toolbox of Matlab 2011b which can be downloaded on website (https://www.csie.ntu.edu.tw/~cjlin/libsvm/index. html) [57].The hardware platform was a Personal Computer with CPU 1333 MHz and 2 GB RAM.
Similar to traditional/statistical regression methods, the parameters optimization for the SVR modeling also was needed, since appropriate setting of the key parameters (i.e., kernel parameters, C, and ε) can basically affect the SVR performance [54,58].The procedures for parameter optimization are described in the following.

Kernel Parameters
Kernel parameters (i.e., kernel type and kernel function parameters) are related to the distribution of the input values of the training set and their selections are usually based on application-domain knowledge [54].For regression tasks in this study, based on several trials, it has been found that both the linear kernel and the RBF kernel selecting can ensure that the SVR exhibits good performance.Although the RBF kernel is more widely used in remote sensing applications, we selected linear kernel as optimal Kernel Parameters in this study, since this selection can keep us only focus on the choice of C and ε [54].The following optimization procedures can be simplified.

Regularization Parameter
Regularization parameter C > 0 determines the tradeoff between the complexity of model ( w 2 ) and the amount up to which the deviations larger than ε are tolerated.Larger C makes the objective of optimization formulation intend to focus more on minimizing the empirical risk [54].On the other hand, if C is too small, sufficient stress will be placed on minimizing the model complexity.

Parameter ε
Parameter ε controls the width of the ε-insensitive zone used to select the number of support vectors.Bigger ε means that more training samples inside ε-insensitive zone will be eliminated, which can result in "flatter" (i.e., less complex) estimates [54].Hence, in this study, the ability of SVR model generalization was affected by C and ε.
Generally, these two parameters are determined by the above parameter optimization process.They can be defined based on the grid search method [38,57].This search method includes the following three steps:

•
Different types of parameters are paired together within the assigned ranges of these parameters.

•
With a selected pair of parameters in a search grid, the SVR are trained and then the common three-fold cross-validation precision (i.e., the root mean squared error (RMSE) between the predicted and the observed CBI values) is obtained and recorded.

•
By altering the parameters pairs in other search grids, a series of cross-validation RMSEs are then returned too.Finally, the pair of key parameters of SVR (C and ε) with the minimum cross-validation RMSE value is selected.

Performance Evaluations
To evaluate the performance of SVR-based mapping methods, its predicted CBI values with the observed CBI values of testing field data were compared.The Correlation Coefficient (R) and the RMSE (recommended by DeSantisandChuvieco [8]) for the testing set were calculated.Further, these performance measurements of the SVR-based mapping method were evaluated by comparing to those of the RA method.Specifically, the model for the RA method was dNBR-based quadratic nonlinear regression [46,59,60].The calculations of R and RMSE are shown as follows.
Moreover, an additional experiment (see Figure 2) was designed in this study to test the capabilities of the SVR-based mapping method in handling limited number of field samples, in which the number of training samples for each fire was randomly reduced by the 10%.


By altering the parameters pairs in other search grids, a series of cross-validation RMSEs are then returned too.Finally, the pair of key parameters of SVR (C and ε) with the minimum crossvalidation RMSE value is selected.

Performance Evaluations
To evaluate the performance of SVR-based mapping methods, its predicted CBI values with the observed CBI values of testing field data were compared.The Correlation Coefficient (R) and the RMSE (recommended by DeSantisandChuvieco [8]) for the testing set were calculated.Further, these performance measurements of the SVR-based mapping method were evaluated by comparing to those of the RA method.Specifically, the model for the RA method was dNBR-based quadratic nonlinear regression [46,59,60].The calculations of R and RMSE are shown as follows.
Moreover, an additional experiment (see Figure 2) was designed in this study to test the capabilities of the SVR-based mapping method in handling limited number of field samples, in which the number of training samples for each fire was randomly reduced by the 10%.

SVR-Based Burn Severity Mapping and Performance Evaluation
Figure 3 shows the SVR-based burn severity mapping results.The burn severity is also mapped using RA methods (based on dNBR) in Figure 3. From a visual perspective, the spatial distributions of burn severity are more detailed in SVR-based maps than that of RA-based maps.The correlation

SVR-Based Burn Severity Mapping and Performance Evaluation
Figure 3 shows the SVR-based burn severity mapping results.The burn severity is also mapped using RA methods (based on dNBR) in Figure 3. From a visual perspective, the spatial distributions of burn severity are more detailed in SVR-based maps than that of RA-based maps.The correlation coefficient (R) and root mean squared error (RMSE) between the predicted and the observed CBI values of the testing set are presented in Table 2.The R value for SVR-based method is between 0.83 and 0.89, which is larger than that of RA method (R: 0.73-0.85).The RMSE value of SVR-based method (0.46-0.57) is smaller than that of RA method (0.53-0.68) (see Table 2).coefficient (R) and root mean squared error (RMSE) between the predicted and the observed CBI values of the testing set are presented in Table 2.The R value for SVR-based method is between 0.83 and 0.89, which is larger than that of RA method (R: 0.73-0.85).The RMSE value of SVR-based method (0.46-0.57) is smaller than that of RA method (0.53-0.68) (see Table 2).The above results indicate that the SVR-based method performed better than RA method.According to statistics of the TM band reflectance response to burn severity [4], the obvious changes of each TM band reflectance between pre-and post-fire remote sensing images can be observed.This indicates that each TM band might contain specific information about burn severity.The SVR-based burn severity mapping fully uses the spectral information contained in all bands of remote sensing images, while most spectral indices of burn severity in previous studies use only two spectral bands of remote sensing images, for example, the dNBR calculation for RA method only uses near-infrared spectral band (RNIR) and shortwave-infrared spectral band (RSWIR 2.2 ) in multi-temporal images.Therefore, the full use of spectral information contained in the remotely sensed data makes the SVR-based method to reveal spatial distributions of burn severity in more details.

Burn Severity Mapping Accuracy Sensitivity to Training Samples
Figure 4 shows the generalization performances of both SVR-based and RA methods in the case of small sample size.We find that the accuracy of both methods varies with the reduction of training samples (by 10%).In particular, the overall precision measured by RMSE indicates that SVR outperformed RA during the reduction of training samples size (more details about training samples can be found in Table 3).This fact is supported by that of previous studies of other fields [35,36].
It is believed that the better performance of SVR on small number of field plots should attribute to its better model generalization in case of limited training samples.As described in Section 2.1, a measurement of model complexity (i.e., w 2 ) has been introduced into the objective function of the SVR model.It can be reduced by adjusting a specific parameter (i.e., ε).With a bigger ε, more training samples inside ε-insensitive zone will be eliminated, which can result in "less complex" estimates (i.e., better generalization) [54].This means that only those remaining training samples (i.e., named support vectors) can actually affect the mapping results.If these non-support vectors are eliminated in the training process, the SVR generalization performance will not be affected, which means that the SVR is less sensitive to the reduction of training samples size.
In Figure 4 and Table 3, it can be seen that the SVR performed poorly, when 10% training sets were selected to train it.This could be explained by the fact that the number of training sets is too small (i.e., training field plots number is 6-8).With very few training samples, the Empirical Risk Minimization (ERM) of SVR-based method will be enlarged extremely.Although its effect on model generalization can be adjusted by parameter C, this kind of adjustment will not sufficiently improve the model generalization in this situation.This has reconfirmed an appropriate balance (i.e., C) between the complexity of the model and the ERM is needed during the parameter optimization process of SVR.
Based on above discussion, we can conclude that the SVR can map the burn severity accurately and achieve relatively better generalization performance even with fewer training sets.This method should be recommended in future research.

Conclusions
In this study, the SVR-based burn severity mapping method was proposed to accurately obtain the burn severity of forest fires using multi-temporal images.The evaluation of this method was carried out by comparing the predicted and observed CBI values of training sets, and compared with the dNBR-based RA method.The results demonstrated that the performance of the SVR-based mapping method is more accurate than that of the RA method.Even with a smaller number of training sets, it can map the detailed distribution of burn severity of forest fires and achieve relatively better generalization, compared to RA burn severity mapping methods.Therefore, the SVR-based mapping method with better generalization performance should be recommended in future research of burn severity of forest fires.

Figure 2 .
Figure 2. The flowchart of SVR overall process.(a) Is data collection, (b) are data pre-processing and SVR modeling, and (c) is performance evaluations.

Figure 2 .
Figure 2. The flowchart of SVR overall process.(a) Is data collection, (b) are data pre-processing and SVR modeling, and (c) is performance evaluations.

Figure 3 .
Figure 3. Burn severity maps, expressed as Composite Burn Index (CBI) and calculated by using the RA (based on dNBR) and the SVR in five fires: (a-1) the RA mapping results of the Bear Fire; (a-2) the SVR mapping results of the Bear Fire; (b-1) the RA mapping results of the Jasper Fire; (b-2) the SVR mapping results of the Jasper Fire; (c-1) the RA mapping results of the Mule Fire; (c-2) the SVR mapping results of the Mule Fire; (d-1) the RA mapping results of the Pw03-wolf Fires; and (d-2) the SVR mapping results of the Pw03-Wolf Fires.

Figure 4 .
Figure 4. Comparison of overall performance for the SVR-and RA-based methods using different percentages of training samples: (a) Bear Fire; (b) Jasper Fire; (c) Mule Fire; and (d) Pw03-wolf Fire.RMSE represents the Root Mean Square Error.

Figure 4 .
Figure 4. Comparison of overall performance for the SVR-and RA-based methods using different percentages of training samples: (a) Bear Fire; (b) Jasper Fire; (c) Mule Fire; and (d) Pw03-wolf Fire.RMSE represents the Root Mean Square Error.

Table 1 .
Summary of characteristics of forest fires and datasets used in this study.

Table 2 .
Summary of parameters and precisions of RA-and SVR-based methods.The model for RA method is CBI predicted = a × (dNBR) 2 + b × (dNBR) + c (all p values < 0.05).

Table 3 .
Summary of precisions of RA-and SVR-based methods during the reduction of training samples size.

Table 3 .
Summary of precisions of RA-and SVR-based methods during the reduction of training samples size.