Self-Adjusting Thresholding for Burnt Area Detection Based on Optical Images

: Mapping of regional ﬁres would make it possible to analyse their environmental, social and economic impact, as well as to develop better ﬁre management systems. However, automatic mapping of burnt areas has proved to be a challenging task, due to the wide diversity of vegetation cover worldwide and the heterogeneous nature of ﬁres themselves. Here, we present an algorithm for the automatic mapping of burnt areas using medium-resolution optical images. Although developed for Landsat images, it can be also applied to Sentinel-2 images without modiﬁcation. The algorithm draws upon the classical concept of di ﬀ erences in pre- and post-ﬁre reﬂectance, but also takes advantage of the object-oriented approach and a new threshold calculation method. It consists of four steps. The ﬁrst concerns the calculation of spectral indices and their di ﬀ erences, together with di ﬀ erences in spectral layers based on pre- and post-ﬁre images. In the second step, multiresolution segmentation and masking are performed (clouds, water bodies and non-vegetated areas are removed from further analysis). Thirdly, ‘core’ burnt areas are detected using automatically-adjusted thresholds. Thresholds are calculated on the basis of speciﬁc functions established for di ﬀ erence layers. The last step combines neighbourhood analysis and patch growing to deﬁne the ﬁnal shape of burnt areas. The algorithm was tested in 27 areas located worldwide, and covered by various types of vegetation. Comparisons with manual interpretation show that the fully-automated classiﬁcation is accurate. Over 82% of classiﬁcations were considered satisfactory (overall accuracy > 90%; user and producer accuracy > 70%).


Introduction
Forest fires, both human-made and natural, are one of the main causes of adverse ecological, economic and social impacts worldwide. Not only do they lead to the loss of human life [1], they influence climate and carbon cycle changes [2], biodiversity [3], and change soil properties [4]. The reconstruction of the fire history makes it possible to define at least some, very important, aspects of the fire regime: its spatial pattern, distribution, frequency and seasonality [5]. This knowledge is crucial for the development of accurate fire management strategies and policies, not to mention damage assessment.
Satellite images are an exceptional source of data about forest fires, because it is an opportunity to derive long time series information. In regions where fire statistics are non-existent, remote sensing is the only possible source of data. Several successful attempts to map global and regional burnt areas have been carried out with the use of low resolution (5 km) National Oceanic and Atmospheric Administration (NOAA) and Advanced Very High Resolution Radiometer (AVHRR), or coarse resolution (1 km) Moderate Resolution Imaging Spectroradiometer (MODIS) data [6][7][8][9][10][11][12][13][14]. Furthermore, regional studies have been carried out in various environmental conditions: tropical forests [15,16], savannahs and

Data
Atmospherically-corrected pairs of multispectral optical images, with their corresponding cloud and water masks were used. Most were Level-2 Surface Reflectance products from Landsat 4, 5, 7 [48] and 8 [49], supplemented by a few pairs of Sentinel-2 images. In the case of Landsat images no further processing was needed as they are provided as reflectance values along with the cloud mask layers. In the case of Sentinel-2 the atmospheric correction was applied. Sen2Cor processor was employed to transform Level-1C top-of-atmosphere data into bottom-of-atmosphere surface reflectance (Level-2A) and to obtain a cloud mask [50]. The first image in a pair was acquired before the fire season and was considered as the reference image. Alternatively, the reference image was the one acquired one year before the fire season. The second was taken during, or shortly after, the fire season. Finally, the algorithm requires a slope layer, calculated from Shuttle Radar Topography Mission elevation data.
Two datasets were prepared. The first was used to develop the algorithm and was composed of pairs of Landsat images for the scenes that cover Western Greece and Ionian Islands (path 185, row 33) and Northern Portugal (path 204, row 33). These areas were chosen as they are at the limits between the Mediterranean and semi desert vegetation, in the case of Greece, and between the Mediterranean and Atlantic vegetation in the case of Portugal. Twenty images were selected to cover different combinations of pre-and post-fire images, as well as years with different annual precipitation [51] (Table 1). The second dataset was used to test the accuracy of the algorithm. This was composed of 69 pairs of Landsat images, which covered 19 locations worldwide, and four pairs of Sentinel-2 images. The aim was to cover different vegetation types: tropical forests, coniferous forests, broadleaf forests, savannahs, Mediterranean vegetation, grassland and semi-desert ( Figure 1).

Method
The method consists of the following main steps ( Figure 2): (1) band arithmetic; (2) segmentation and masking; (3) the detection of changes and core burnt areas; and (4) region growing. All steps are implemented in Trimble eCognition® software (Trimble, Munich, Germany).

Method
The method consists of the following main steps (

Method
The method consists of the following main steps ( Figure 2): (1) band arithmetic; (2) segmentation and masking; (3) the detection of changes and core burnt areas; and (4) region growing. All steps are implemented in Trimble eCognition® software (Trimble, Munich, Germany).

Band Arithmetic
In the first step, additional layers are calculated from pre-and post-fire images. Two well-known and widely used spectral indices are computed: the Normalized Burnt Ratio (NBR) [36] and the Normalized Difference Vegetation Index (NDVI) [33]: where NIR is the near infrared band, and SWIR is the short wave infrared band where NIR is the near infrared band and R is the red band. NDVI is known for its usefulness for the monitoring of changes in vegetation. NBR has been successfully applied for burnt detection. Both indices are easy to calculate and especially in the case of NDVI could be computed on images from various imaging systems.

Segmentation and Masking
Once the additional layers are calculated, segmentation can start. Multi-level segmentation is performed. First, images that correspond to both (T 1 , T 2 ) Landsat or Sentinel-2 cloud cover masks are divided into objects. This is done using the quadtree procedure [52] which produces homogenous square objects. The objects which contain information about clouds, snow and water are classified and excluded from further analysis. Next, objects that remain unclassified are merged. In the following step, large non-vegetated areas are masked. Unclassified areas are again segmented using the quadtree algorithm, but this time with both (T 1 , T 2 ) NDVI layers as input, and the scale parameter equal to 10. The largest objects that could be obtained were of the size of 1024 pixels, and only these objects were taken into consideration. The object was classified as non-vegetated when the NDVI for T 1 , T 2 was low (<0.17), and its temporal change was in the range [−0.04, 0.04]. This approach has two advantages. Firstly, it accurately classifies large non-vegetated areas without requiring a detailed analysis, and secondly the computation time is very short. All objects classified as non-vegetated are excluded from further analysis. Any remaining unclassified objects are then merged again, and a segmented to obtain objects which will be directly used for burnt areas mapping is performed. This step uses the multiresolution segmentation method [53]. Trial and error showed that the best layers for segmentation are the NBR T2 and the difference of NIR spectral channels. The scale parameter was set at 100, and did not need to be changed between scenes. In the following paragraphs if we refer to statistics calculated for the scene it means that they were calculated only for the part of scene that has not been excluded from analysis at this point.
Segmentation procedures were selected taking into account the requirement to minimize the processing time needed to exclude areas that would affect the calculation of thresholds, but also to correctly delineate burnt areas with high transferability from one scene to another.

Core Burnt Areas Classification
As noted above, all classical approaches to burnt area mapping require manual, regional calibration. The approach that we propose overcomes this important problem. Here, core burnt areas are classified in two steps. The first concerns the coarse classification of potential burnt areas based on the NBR T2 layer of post-fire images. Objects with NBR T2 values much lower than the mean of the whole scene represent vegetation cover which suffered negative changes caused by, e.g., fires or harvesting. Objects that fulfil the following condition are considered for further analysis: where µ o is the mean for an object of NBR T2 , µ s is the mean for the scene of NBR T2 and σ s is the standard deviation for the scene of NBR T2 . In the next step, differences in each spectral parameter is calculated for pre-and post-fire images of the scene. We consider following differences of spectral parameters: indices-dNBR, dNDVI; spectral bands-near infrared (dNIR) and short wave infrared (dSWIR 1 , dSWIR 2 ). These values are used to calculate the threshold for a specified pair of images; they are expressed as relative values and calculated as follows: where dµ is the difference of a spectral parameter at scene level, µ sT1 is the mean of a spectral parameter of pre-fire image, and is the mean of a spectral parameter of post-fire image. Potential burnt areas are refined using a set of thresholds designed to distinguish between unburnt and burnt areas, calculated for the following layers: dNIR, dSWIR 1 and dSWIR 2 . The algorithm uses two kinds of thresholds: variable, which are calculated on the basis of threshold functions for each pair of analyzed images; and constant. Variable thresholds (T) are calculated as a linear or polynomial function of the difference between pre-and post-fire images (dNIR, dSWIR 1 , dSWIR2 and dNDVI): (5) or as a function of a post-fire (T 2 ) image of green and red spectral bands (G and R, respectively): To find threshold functions, burnt areas were manually mapped on reference pairs of Landsat images for the Ionian Islands and Western Greece and Northern Portugal (Table 1). Statistics for unburnt (clouds, water, snow and shadows were excluded) and burnt segments were extracted from the difference image for all parameters. Using these statistics, thresholds were calculated on the basis of the normal distribution [54]. Three cases were considered. In the case where σ 1 σ 2 and ∆ > 0, two intersection points x 1 , x 2 exist. The intersection point for burnt area mapping, which is located between the maximum of the histograms maxima of the distribution probabilities of classes, was used. Intersection points were established using the following formulas: where µ 1 , µ 2 are the means of classes; σ 1 , σ 2 are the standard deviation of classes, and: Only one intersection point exists in cases when ∆ = 0 or σ 1 = σ 2 . When ∆ = 0 and σ 1 σ 2 the intersection point is calculated as follows: When ∆ > 0 and σ 1 = σ 2 , the intersection is at following point: Finally, it should be noted that ∆ cannot be smaller than 0 for any values of µ 1 , µ 2, σ 1 or σ 2 .
Once the thresholds for all parameters have been calculated for each pair of reference images, linear or polynomial regressions between thresholds and the image difference were found.
Core burnt areas are distinguished from potential burnt areas using the conditions presented in Table 2. The conditions were set after the analysis of the thresholds obtained for different calibration sites and values of indices and spectral bands differences in objects considered as burnt areas. Table 2. Conditions used for core burnt area mapping, µ o is a mean value of object (in the case of dNBR, dNIR, dSWIR 1 , dSWIR 2 µ o is the value of relative mean difference) and in the case of image bands R, G and NDVI index µ o is the absolute mean value). T is the threshold calculated using the function.

Conditions Using Constant Thresholds Conditions Using Variable Thresholds (T) Derived from the Function
In order to avoid the misclassification of crops harvested between data acquisitions, which have a similar spectral profile to burnt areas [13], we removed changes detected in the agricultural areas which fulfilled the following conditions: they were located on plains (slope ≤ 6, a slope map was calculated from SRTM data); they were relatively small (< 30 ha, this condition was fitted after the statistical analysis of dimensions of false detections of core burnt areas) and homogeneous (small internal variation). For the purposes of this reclassification, and to ensure that all three assumptions could be correctly evaluated, an additional segmentation level was prepared based on spectral values and slope layer.

Region Growing
Region growing avoids significant omission errors. An object-oriented approach was adopted, as it makes it possible to analyse relations between neighbouring objects. Here, we analysed the neighbourhood of core burnt areas in order to evaluate if the adjacent object could be considered as another burnt area. Once core burnt areas have been mapped, region growing starts. In the first step, objects classified as core burnt areas are merged, and spectral statistics (mean and standard deviation of dNBR) are calculated for all detected burnt areas. Next, neighbouring objects are classified as burnt areas if their spectral distance from the core is lower than the standard deviation of all areas classified as burnt in a scene: where µ BA is the mean of dNBR values of the total burnt area detected on the scene, σ BA is the standard deviation for dNBR values in the total burnt area detected on the scene, and µ o is the mean of dNBR value of each object adjacent to the core burnt area object. If a neighbouring object is classified as a burnt area, it is merged into the core and the process is repeated until there are no more changes. To prevent uncontrolled growing (found in some specific cases), an additional condition, based on dNDVI thresholding, is used. This is calculated in the same way as the thresholds used for the calculation of core burnt areas.
Post-processing is the final step of the classification. Here, objects are merged, and the minimum mapping unit (MMU) is applied. The size of the MMU was set to 1 ha. It allows to filter out objects that were too small to provide plausible statistical values due to low number of pixels used for calculation.
An enclosure analysis is applied to reclassify small unclassified areas (or areas classified as clouds) that are enclosed by burnt areas.

Validation
The method was tested in various geographical areas, on both Landsat and Sentinel-2 images. A total of 73 classifications were validated. For four regions, all available Landsat images were classified. For another 19 regions distributed worldwide, one pair of images was tested. Reference datasets were prepared for each pair of images by visual interpretation. Other authors have reported that commission errors are the main problem in this type of analysis [28,43]; hence, a stratified random sampling approach was applied to address the problem. Stratification was based on the classification of the detected burnt areas. A point density was defined for burnt and unburnt areas. One point indicated 100 pixels of burnt area. Depending on the extent of the burnt area in the scene, a test point was drawn per 1000 pixels classified as unburnt (if the burnt area occupied > 1.5% of the analysed area), or per 10,000 pixels (if the burnt area occupied < 1.5% of the area). The allocation of sampling points results in a dense representation of burnt points (compared to unburnt), and decreases the standard error in the estimated user accuracy of this class [55].
For all classifications, accuracy was assessed using the confusion (error) matrix [56], which compares mapped classes with those observed in the reference dataset. Overall accuracy is usually used as the measure, but if classes are not more-or-less equally represented, it may not be reliable as the standard error increases [56,57]. Consequently, we focused on omission (producer accuracy, precision) and commission (user accuracy, recall) errors. Specifically, we calculated how many tested pairs of images had a user and producer accuracy above 95, 90, 85, 80, 75 and 70 or less. We considered a classification to be 'very good' when all accuracies were > 90; 'good' [80, 90); acceptable [70, 80]; and 'unacceptable' < 70. Overall accuracy was an additional condition, and was set at > 90% for a classification to be considered correct.

Results
A function was established that estimates thresholds for almost all analysed parameters. For the NDVI index and the NIR, SWIR 1 and SWIR 2 spectral bands, they are based on the relative difference in mean values for pre-and post-fire images. In the case of green and red bands, the function is established for mean values of the post-fire image. The regression for the NBR index was impossible to establish. Correlation coefficients for rest of the functions varied from r 2 = 0.67 (NDVI), r 2 = 0.68 (SWIR 1 ) to r 2 = 0.73 (NIR and R), r 2 = 0.75 (SWIR 1 ) and r 2 = 0.81 (G) (Figure 3). Correlation coefficients for parameters were satisfactory, as we did not attempt to map all burnt areas using them, but only the 'seed' areas.
Average producer accuracy was 94.4% (14%-100% for individual images) and average user accuracy was 93.6% (46%-99% for individual images). A total of 17.4% of classifications were unacceptable; 82.6% were considered satisfactory. 46.4% were evaluated as very good (user and producer accuracies of burnt and unburnt classes were higher than 90%), 24.6% were good (user and producer accuracy for burnt and unburnt classes was 80%-90%), while 11.6% were acceptable.
With respect to the performance of the algorithm, we found that it behaved similarly in different geographic settings (Figure 4, Table 2). Independently of the geographic zone, the surface of changes detected by the coarse classification was around three times larger than the final burnt area surface. The exception concerned cases where agricultural areas covered large part of the scene; in this case, the proportion was up to 15-20 times higher. Detected core fires constituted, on average, around 60% of the final burnt surface, but this varied from 32% (for the Sentinel-2 scene of Colombia) to 92% (for the Landsat scene of Kansas). Average producer accuracy was 94.4% (14%-100% for individual images) and average user accuracy was 93.6% (46%-99% for individual images). A total of 17.4% of classifications were unacceptable; 82.6% were considered satisfactory. 46.4% were evaluated as very good (user and producer accuracies of burnt and unburnt classes were higher than 90%), 24.6% were good (user and producer accuracy for burnt and unburnt classes was 80%-90%), while 11.6% were acceptable.
With respect to the performance of the algorithm, we found that it behaved similarly in different geographic settings (Figure 4, Table 2). Independently of the geographic zone, the surface of changes detected by the coarse classification was around three times larger than the final burnt area surface. The exception concerned cases where agricultural areas covered large part of the scene; in this case, the proportion was up to 15-20 times higher. Detected core fires constituted, on average, around 60% of the final burnt surface, but this varied from 32% (for the Sentinel-2 scene of Colombia) to 92% (for the Landsat scene of Kansas).  Although the algorithm was designed based on Landsat images, it was successfully employed on Sentinel-2 images without modification. Four pairs of Sentinel-2 images were tested for fires that occurred in 2016 and 2017 in Colombia (Table 3), California and for two areas in Portugal (Table 4). In all cases, the accuracy was > 90%. Although the algorithm was designed based on Landsat images, it was successfully employed on Sentinel-2 images without modification. Four pairs of Sentinel-2 images were tested for fires that occurred in 2016 and 2017 in Colombia (Table 3), California and for two areas in Portugal (Table 4). In all cases, the accuracy was > 90%.

Discussion
Although a general results are positive it is necessary to comment the failure classifications in order to find sources of errors and point out a possible ways of improvements. With respect to unacceptable classifications, half were due to commission errors, and half due to omission errors. Most were caused by deficiencies in the cloud mask. When not detected by the cloud mask, low stratus clouds, fog and haze altered the scene statistics and degraded the threshold calculation. As we did not implement topographic correction, other sources of error were the presence of shadows in mountainous areas. Another reason, especially for omission errors, is a time distance between fire and post-fire image acquisition. It was especially seen for fires occurred in grasslands. For example, the post-fire image of grassland in Kansas was acquired four days after main fires and the classification result was very good (Table 3), however, the next cloud free image of the area was acquired two months later and on this image it was impossible to detect fires, even by visual interpretation.
Regarding land cover we did not find any relation between the vegetation type and a specific kind of errors. However, further, more extensive testing should be performed in additional natural and semi-natural environments. In the current version of the method agricultural burns are considered only in the case of fields larger than 30 ha. The statistical analysis of false 'core' areas detections showed that the vast majority of them was located at arable lands smaller than 30 ha, which were harvested instead of being burnt. Moreover, it seems that the scenes which image coastal areas are classified incorrectly more often (Supplementary Materials). It can be due to inefficiency of atmospheric correction [49].
In some cases, no objective reason could be found for the failure. We investigated if there is a specific NDVI difference between pre-and post-fire images expressed in absolute values which is especially good for burnt area mapping or makes it impossible. We found that the NDVI difference cannot be used to preselect images in order to obtain satisfactory results of burnt area mapping as for all ranges of differences there were correct classifications done. However, in the case of NDVI of whole post-fire images higher that 0.05-0.1 compared to pre-fire images, it seems that the probability of successful classification decreases ( Figure 5). Further tests should be run to confirm this hypothesis.

Discussion
Although a general results are positive it is necessary to comment the failure classifications in order to find sources of errors and point out a possible ways of improvements. With respect to unacceptable classifications, half were due to commission errors, and half due to omission errors. Most were caused by deficiencies in the cloud mask. When not detected by the cloud mask, low stratus clouds, fog and haze altered the scene statistics and degraded the threshold calculation. As we did not implement topographic correction, other sources of error were the presence of shadows in mountainous areas. Another reason, especially for omission errors, is a time distance between fire and post-fire image acquisition. It was especially seen for fires occurred in grasslands. For example, the post-fire image of grassland in Kansas was acquired four days after main fires and the classification result was very good (Table 3), however, the next cloud free image of the area was acquired two months later and on this image it was impossible to detect fires, even by visual interpretation.
Regarding land cover we did not find any relation between the vegetation type and a specific kind of errors. However, further, more extensive testing should be performed in additional natural and semi-natural environments. In the current version of the method agricultural burns are considered only in the case of fields larger than 30 ha. The statistical analysis of false 'core' areas detections showed that the vast majority of them was located at arable lands smaller than 30 ha, which were harvested instead of being burnt. Moreover, it seems that the scenes which image coastal areas are classified incorrectly more often (supplementary material). It can be due to inefficiency of atmospheric correction [49].
In some cases, no objective reason could be found for the failure. We investigated if there is a specific NDVI difference between pre-and post-fire images expressed in absolute values which is especially good for burnt area mapping or makes it impossible. We found that the NDVI difference cannot be used to preselect images in order to obtain satisfactory results of burnt area mapping as for all ranges of differences there were correct classifications done. However, in the case of NDVI of whole post-fire images higher that 0.05-0.1 compared to pre-fire images, it seems that the probability of successful classification decreases ( Figure 5). Further tests should be run to confirm this hypothesis.  The analysis of individual burnt patches suggests that the region-growing algorithm should be improved. In cases when the fire intensity changes significantly between the core fire and other areas, omission errors are seen. The most representative case of such a situation is the classification carried out for the coniferous forest in Russia (Figure 4a). If fire is of low intensity than the NBR value of the post-fire image is only slightly lower than those of pre-fire image. Hence, it does not fulfil the condition expressed in Equation (12).

Conclusions
The burnt area mapping method presented here was tested in various areas, on scenes that represent diverse types of vegetation: tropical forests, coniferous forests, broadleaf forests, savannah, Mediterranean vegetation, grassland, and semi-desert. Thresholds to delimit core burnt areas were established from functions developed from statistics of pairs of pre-and post-fire images. The method proved to be easily transferable from one region to another, and accuracy remained satisfactory with no loss in automation. A total of 82.6% of pairs of images from different parts of the world were classified correctly. However, further tests are necessary to check the performance of the method in different environmental conditions. As thresholding is dependent on image statistics, it is possible to transfer the method from the Landsat sensor to Sentinel-2 without modification. Four Sentinel-2 datasets were classified with producer and user accuracies > 90%. Nevertheless, additional tests using Sentinel-2 images are needed. Although, overall, the method is accurate, the region-growing procedure needs to be improved, especially for low-severity fires. The main restriction in the use of the method is its dependency on the quality of the cloud mask. Furthermore, in mountainous areas topographic correction should be applied in an image pre-processing step. Future testing of the method should focus on classification of large datasets under various environmental conditions (e.g., snow and ice presence, leaf on/leaf off conditions, shadows). We hope that the further application of the method will benefit not only our understanding of past and present fire regimes, but also help natural resource management.

Conflicts of Interest:
The authors declare no conflict of interest.