A Novel Feature Extension Method for Forest Disaster Monitoring Using Multispectral Data

Abstract: Remote sensing image classification is a key technology for monitoring forest changes. Texture features have been demonstrated to be more effective than spectral features alone in improving classification accuracy. The accuracy of window-based texture extraction depends on the choice of the window size, which should ideally match the spatial scale of the object or class under consideration. However, most existing texture feature extraction methods are based on a single window and do not adequately account for the scale of different objects. Our first proposition is to extract texture features with a composite window, that is, a small window surrounded by a larger one. Our second proposition is to reinforce the performance of the trained ensemble classifier by retraining it using only the most important features. These features feed a random forest classifier, chosen for its fast training speed and small number of parameters. Measures of feature importance are estimated while the base classifiers, here decision trees, are grown. We aim to classify each pixel of forest images disturbed by hurricanes and fires into three classes, damaged, not damaged, or unknown, as this could be used to compute time-dependent aggregates. Two research areas, Nezer Forest in France and Blue Mountain Forest in Australia, are used to validate the effectiveness of the proposed method. Numerical simulations show increased performance and improved monitoring of forest disturbance when using these two propositions. Compared with the reference methods, the best increase in overall accuracy obtained by the proposed algorithm is 4.77% on the Nezer Forest data and 2.96% on the Blue Mountain Forest data.


Introduction
As an integral part of natural ecosystems, forests not only regulate the circulation of air and water in nature and protect the soil from wind and rain, but also reduce the harm caused by environmental pollution to people. The complete or partial destruction of forest and vegetation cover generally results in many adverse consequences for the ecosystem, including the intensification of runoff and erosive processes, loss of biodiversity, and an impact on global warming [1].
In recent years, climate change has made extreme weather more and more frequent. Hurricanes and fires are two types of natural disturbance to forest ecosystems [2,3]. A severe hurricane can widely impact the vegetation composition, structure, and succession of forests, and accordingly influence the terrestrial carbon sink [4,5]. Damage from fires, a common and prevalent disturbance affecting forests [3,6-8], can be severe, reducing species richness and above-ground live biomass [9]. Between 2015 and 2020, approximately 10 million hectares of the world's forests were lost each year, according to the records of the Food and Agriculture Organization of the United Nations [10].
It is of great significance to map windfall damage to forests and estimate the affected areas for post-disaster management [11]. However, field-based studies are time-consuming and laborious, and they usually cover only small areas, given the time and cost of making the observations. Moreover, forest disasters generally happen in remote regions [12] that can be hard to reach for in situ measurements [13]. Consequently, such methods are constrained to an extremely limited spatial and temporal extent. On the contrary, as exemplified in [14-17], remote sensing offers an opportunity to accurately assess all areas covered by damage in a reliable, cost-effective, and time-efficient manner, to the extent that machine learning provides accurate classification tools.
A substantial number of approaches have been devised for monitoring forest changes using remote sensing image classification [18,19]. For example, Zarco-Tejada et al. detected forest decline using high-resolution hyperspectral and Sentinel-2A imagery [20]. White et al. evaluated the utility of a spectral index of recovery based on the Normalized Burn Ratio (NBR), the years to recovery, as an indicator of the return of forest vegetation following forest harvest (clear-cutting) [21]. Bar et al. identified pre-monsoonal forest fire patches over Uttarakhand Himalaya using medium-resolution optical satellite data [22]. However, most of these methods utilize only the spectral features derived from image pixels. This limitation cannot resolve the similar reflectance characteristics of different objects and thus leads to a decrease in classification accuracy.
In later research endeavors, apart from the spectral information, texture features have been widely used in image classification to assist in the discrimination of different target classes with similar spectral features [23,24]. They provide an opportunity to increase the classification performance of the forest disturbances, specifically for forest hurricane and forest fire disturbance mapping. For example, Jiang et al. computed texture features of the image to recognize forest fire smog [25]. The recognition accuracy is 98% with robustness on their smog image database. Benjamin et al. extracted the fire region from forest fire images using color rules and texture analysis [26]. Kulkarni et al. used the local fractal TPSA method to compute the textural features of the image and evaluated the impacts of Hurricane Hugo on the land cover of Francis Marion National Forest, South Carolina [27]. Textural features are statistical measures of the spatial arrangement of the gray-level intensities of an image [28,29]. Generally, texture feature extraction approaches are applied to pixels of the input image by evaluating some type of difference among neighboring pixels through square windows that overlap over the entire image [30]. The accuracy of extracting texture information by window-based method depends on the choice of various parameters. For example, the study by Marceau et al. [31] proved that the window size is the most crucial parameter impacting classification accuracy. Murray et al. systematically analyzed the influence of the window size and texture feature selection on classification accuracy [32]. Garcia et al. analyzed the part played by both the shape and size of the window, then indicated that texture features are much more affected by the window size than by its shape [33].
In fact, the size of the texture window should ideally match the spatial scale of the object or class under consideration. However, the aforementioned texture feature extraction methods are all based on a single window. In other words, the single window approaches do not adequately consider the scale of different objects, and always face the trade-off problem between window size and classification accuracy [32]. On the one hand, the window size should be large enough to obtain the relevant patterns, but if the size is too large, edge effects could dominate the results [34]. On the other hand, finding precise localizations of boundary edges between adjacent regions is a fundamental goal for the segmentation task, but it can only be ensured with relatively small windows [30]. Consequently, how to obtain a good texture characterization is a very important research direction for the improvement of the image classification accuracy.
Ensemble learning techniques have been successfully applied to remote sensing image classification and are attracting more attention for their increased accuracy [35-40]. They carry out various training tasks inducing multiple base classifiers whose outcomes are fused into a final prediction. Random Forest (RF) is a very powerful ensemble algorithm: a collection of decision trees usually induced with the Classification and Regression Trees (CART) algorithm [41]. RF has been widely used for forest disturbance mapping. For example, Einzmann et al. used the RF classifier to detect areas damaged by hurricanes [42]. Their methodology was evaluated on two test sites in Bavaria with RapidEye data at 5 m pixel resolution and identified over 90% of the windthrow areas. Ramo et al. developed an RF algorithm for MODIS global burned area classification [43]. To assess the performance of their methods, these models were used to classify daily MCD43A4 images on three test sites over three consecutive years (2006-2008). Considering the advantages of random forest, such as fast training speed and few parameters, it is also used for forest disaster monitoring in this paper. In the application under focus, we assume that a fair number of already classified pixels are available for training. These pixels should be relevant by having sufficiently similar reflectance characteristics, and yet, in general, they should not be expected to be part of the studied image. In RF, the base classifiers are decision trees, and diversity is achieved by multiple random samplings of both the training set and the feature set. The outcomes of all decision trees are combined by majority rule. As a supplementary benefit and with no extra computation, the training of the decision trees provides measurements of feature importance.
These measurements are used for feature selection, as a dimension reduction technique improving the ability to discriminate between classes.
Our aim is to classify pixels from a multispectral image captured over a forest landscape into three classes (damaged, not damaged, and unknown), denoted here as D, N, and U, respectively. The specific objectives of this study are to (1) develop a new algorithm for extracting texture features, more specifically using a composite window; (2) utilize the feature importance calculated while training the RF for feature selection, as a class-discriminative dimension-reduction tool; and (3) validate this algorithm. Finally, the texture features extracted in this study could improve the classification accuracy of remote sensing images and enhance the monitoring of forest disturbance.
The structure of this paper is as follows. Section 2 of this paper describes the studied area and the imagery being used for this study. In Section 3, the proposed method is presented. Section 4 presents the results and the analysis, including the overall accuracy, the kappa coefficient, the per-class accuracy, the per-class area, and color maps towards evaluating forest damage. The discussion is presented in Section 5. Section 6 shows the conclusions that can be drawn from the methods and results.

Study Area
Nezer Forest and Blue Mountain Forest are the two studied areas shown in Figure 1. Nezer Forest covers approximately 60 km² and is located near the Atlantic coast in southwest France, within a large European maritime pine forest. Figure 2 presents the maximal wind speed recorded during storm Klaus and the damage to trees in France [44]. As can be seen from Figure 2b, the southwestern part of France, where our study area lies, was severely damaged by the storm. Figure 3 shows the bitemporal Formosat-2 images from pre- and post-windstorm Klaus, acquired on 22 December 2008 and 4 February 2009, respectively. Only the second image is used here. The images have 8 m spatial resolution and four spectral bands (red, green, blue, and near-infrared). Image radiance is rescaled between 0 and 255. We performed atmospheric correction on these data with ENVI (5.3). The atmospheric model is Sub-Arctic Winter and the aerosol model is rural. In addition, the aerosol retrieval we selected is 2-band (K-T). The other parameters were left at their default values. All images are orthorectified and georeferenced. We have been provided with an already labeled classification of the second image into the three classes (D, N, and U). It is based on an exhaustive visual inspection comparing Figure 3a,b, recorded as a georeferenced map of the locations of damaged and undamaged areas. The second studied area is located in the southwest of the Blue Mountains National Park, in the territory of New South Wales, approximately 100 km from Sydney, Australia. Covering an area of approximately 47.8 km², Blue Mountain Forest is home to large areas of virgin forest and subtropical rainforest, of which the eucalyptus, the national tree of Australia, is the most famous. Figure 4, provided by EMSINA and Geoscience Australia, Department of Environment, presents where fire agency maps of burned areas and world heritage areas overlap [47].
Eighty percent of the Blue Mountains rainforests in which our study area is located were burnt in bush fires. Figure 5 shows the two satellite remote sensing images (before and after the fire) obtained through Sentinel-2A in 22

The Sample Data
Samples have been evenly and randomly split into the training set I and the test set T, regardless of their class membership. For Nezer Forest, the training and test samples are shown in Figure 6a and Figure 6b, respectively. For Blue Mountain Forest, they are shown in Figure 7a and Figure 7b, respectively. We note that the training and test sets are independent for both research areas. In both figures, red, green, and white represent the damaged, undamaged, and unknown objects, respectively. Tables 1 and 2, respectively, display the number of training and testing samples from each of the three classes (N, D, U) for both images (Nezer Forest and Blue Mountain Forest). Interestingly, both datasets are fairly balanced between the N and D classes. Note that when comparing the training set and the testing set, the class distributions appear to differ. This is caused by the randomization and has been left unchanged, as such differences may occur in the application considered.

The Proposed Method
The proposed training method is summarized in the flowchart shown in Figure 8. On the left of the flowchart, a multispectral image is depicted as a stack of images (i.e., one image per wavelength). The "preprocessing" indicated with a right arrow is here a normalization mapping pixel values into the range (0, 1). At the upper center of the flowchart, a large frame entitled "Texture feature extraction" transforms images into feature values. Pixels inside windows have specific spatial coordinates assembled in two sets: V for the smaller window and W for the larger one. The images are first scanned by a composite window, as disclosed in Section 3.1. From the pixels inside each window, five feature values are then extracted, as enumerated in Section 3.2. On the right of the flowchart, a frame entitled "Feature selection and classification" comprises two tasks. The extracted feature values feed an RF ensemble classifier, as summarized in Section 3.3. Besides, while the base classifiers are being trained, measurements of feature importance are first read out, then used for feature selection, as explained in Section 3.4. The proposed training method is sketched in Algorithm 1. The actual feature values used to train the ensemble classifier come from the training set. Moreover, the labels of the test set are compared to the predictions of the trained ensemble classifier. These connections are indicated with two long arrows at the bottom of the flowchart, and the comparisons of the predictions are figured with a small frame entitled "Prediction evaluation" at the bottom right. The equations of these comparisons are given in Section 3.5. The parameter setting is presented in Section 3.6.

Scanning with a Composite Window
The composite window proposed here is a set of two windows sharing the same center, the first being larger than the second. To ease notations, windows are assumed to be square with odd-integer sizes, written as (2q + 1)×(2q + 1) and (2p + 1)×(2p + 1), respectively. Multispectral images, of size denoted M×N, are processed in raster-scan order with a stride of 1. There are K = #K possible locations for both windows as they are moved together, K being the set of all possible location indexes k. Locations are discarded when they would be centers of windows exceeding the image's bounds.
The coordinates of their common center are (m_k, n_k). The values extracted from the two k-th windows are

A(i, j, d, k) = I(m_k + i, n_k + j, d), i, j ∈ {−p, …, p},
B(i, j, d, k) = I(m_k + i, n_k + j, d), i, j ∈ {−q, …, q},

where I(m, n, d) is the pixel value at location (m, n) for the wavelength indexed by d. This composite windowing technique is illustrated on the left of Figure 9, which shows a flowchart of the feature extraction process with values specific to the Nezer dataset. The four superimposed darker squares figure a multispectral image with four wavelengths, from which four band-specific images can be extracted, denoted in equations as I(m, n, 1), …, I(m, n, 4). In the upper left corner of the first frame, a white square contains a gray square; both figure the first location of the two windows sharing a common center, as indicated by a left-oriented arrow. The central part of the Nezer-specific flowchart lists eight windows: two windows for each captured wavelength. These 2×D windows are specific to each location, so a multispectral image yields 2×D×K windows. Note an unusual consequence of this windowing process in terms of machine learning: because features are extracted from pixels inside moving windows, the real training and test sets should not each be considered as a set of pixels, but rather as sets of pixels contained in windows whose centers are, respectively, in the so-called training and test sets. The training and test sets may thus have common pixels. While the reader ought to bear this oddity in mind, we think it is not too much of an issue, as this experimental setting is used here for comparison purposes. For the sake of simplicity, the training and test sets mentioned in the following sections are described as featured pixels, not sets of pixels.
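The scanning step can be sketched for a single band as follows. This is a minimal numpy illustration; `composite_windows` is a hypothetical helper name, and the centered indexing is our assumption about the windowing described above.

```python
import numpy as np

def composite_windows(image, p, q):
    """Slide a composite window (a small (2p+1)x(2p+1) window inside a
    larger (2q+1)x(2q+1) one, sharing the same center) over a single-band
    image with stride 1, discarding centers where the larger window would
    exceed the image borders. Returns the stacked small and large patches."""
    M, N = image.shape
    A, B = [], []
    for m in range(q, M - q):          # valid centers for the larger window
        for n in range(q, N - q):
            A.append(image[m - p:m + p + 1, n - p:n + p + 1])
            B.append(image[m - q:m + q + 1, n - q:n + q + 1])
    return np.array(A), np.array(B)

# Toy example: 6x6 band, small 3x3 window (p = 1) inside a large 5x5 (q = 2).
img = np.arange(36, dtype=float).reshape(6, 6)
A, B = composite_windows(img, p=1, q=2)   # K = (6 - 2*2)^2 = 4 locations
```

Both patch stacks share the same center pixel at each location, which is the defining property of the composite window.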

Texture Features Extracted
The flowchart of Figure 9 shows on its right a list of five features (Data range, Mean, Variance, Entropy, and Skewness), extracted from each of the 2×D windows available at each k-indexed location. These features are defined below, all computed using only the values of pixels inside a given window. They are denoted f_{r,d,k} when extracted from the smaller window and g_{r,d,k} when extracted from the larger window, r being a feature index ranging from 1 to 5.
• f_{1,d,k} and g_{1,d,k} are data ranges, defined as the difference between the maximum and the minimum pixel values.
• f_{2,d,k} and g_{2,d,k} are means, computed by adding all pixel values and dividing by their number.
• f_{3,d,k} and g_{3,d,k} are variances, computed by averaging the squared differences between the pixel values and their mean.
• f_{4,d,k} and g_{4,d,k} have a strong flavor of entropy and for that reason are referred to as entropy in this paper. Being computed on pixel values and not on their distribution, they should not be considered as equivalent to an entropy; rather, they can be thought of as aggregates of pixel values nonlinearly transformed with x → −x ln(x), x ∈ (0, 1). Here, x stands for A(i, j, d, k) or B(i, j, d, k). Up to some normalization, they can be approximated by the average absolute differences between the pixel values and 0.5. Figure 10 supports this claim by showing on a graph the mapping x → −x ln(x) as a smooth curve and x → 1/2 − |1/2 − x| as a fairly similar triangle function.
• f_{5,d,k} and g_{5,d,k} are estimates of the skewness.
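The five per-window statistics can be sketched as follows. This is a minimal illustration assuming pixel values already normalized to the open interval (0, 1), as required by the x → −x ln(x) mapping; `window_features` is our name, not the paper's.

```python
import numpy as np

def window_features(w):
    """Compute the five texture features for one window of pixel values in
    (0, 1). The 'entropy' follows the paper's definition: an aggregate of
    pixel values mapped through x -> -x ln(x), not a histogram entropy."""
    x = w.ravel()
    data_range = x.max() - x.min()
    mean = x.mean()
    variance = ((x - mean) ** 2).mean()
    entropy = np.mean(-x * np.log(x))              # pixel-wise -x ln x
    skewness = np.mean((x - mean) ** 3) / variance ** 1.5
    return np.array([data_range, mean, variance, entropy, skewness])

# Toy 2x2 window with values in (0, 1):
feats = window_features(np.array([[0.2, 0.4], [0.6, 0.8]]))
```

For this symmetric toy window the skewness is zero, and the data range, mean, and variance are 0.6, 0.5, and 0.05, respectively.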
For each sample, the extracted features are recorded as a row vector in the following order: the l-th component equals f_{r_l, d_l, k} when s_l = 0 and g_{r_l, d_l, k} when s_l = 1, where l ∈ {1, …, 10D}, s_l = ⌊(l − 1)/(5D)⌋, d_l = 1 + ⌊(l − 1 − 5D s_l)/5⌋, and r_l = l − 5(d_l − 1) − 5D s_l. The row vectors corresponding to samples in the training set are then stacked into a matrix X of size (#I)×(10D), where each row is related to a specific location and each column to a feature defined by its window, its wavelength, and its defining equation among (5)-(9). A ground-truth label c is assigned to each row; these labels are stacked into a column vector Y of size (#I)×1.
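The index bookkeeping above can be checked with a short sketch (D = 4 as for the Nezer data; `unpack` is a hypothetical helper):

```python
D = 4  # number of spectral bands (Nezer data)

def unpack(l):
    """Map a flat feature index l in {1,...,10D} to (window, band, feature):
    s = 0 for the smaller window, 1 for the larger; d in {1,...,D} is the
    band index; r in {1,...,5} selects data range, mean, variance,
    entropy, or skewness."""
    s = (l - 1) // (5 * D)
    d = 1 + (l - 1 - 5 * D * s) // 5
    r = l - 5 * (d - 1) - 5 * D * s
    return s, d, r
```

With D = 4, indexes 1 to 20 address the smaller-window features and 21 to 40 the larger-window ones, five features per band.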

Classification
The ensemble classifier considered here is an RF. The following three tasks are repeated V times, V being the number of decision trees to induce.
Rows of X, figuring the training instances, are sampled with replacement. This is equivalent to left multiplication by a matrix S_v of size (#I)×(#I), where each row contains a single component equal to 1, its column position drawn uniformly from {1, …, #I}. The assigned labels are likewise transformed into S_v Y. Moreover, a random G-sized subset of the X-columns is selected. This is equivalent to right multiplication by F_v of size (10D)×G, defined as the identity matrix deprived of a random set of 10D − G columns. The labels remain unmodified. Finally, (S_v X F_v, S_v Y) is used to train a decision tree with the Classification And Regression Trees (CART) methodology. The trained decision tree h_v maps any feature row vector of size G to an integer c ∈ {1, …, C}, C being the number of classes.
The predictions of the V decision trees are fused according to the majority rule,

H(x) = argmax_{c ∈ {1,…,C}} Σ_{v=1}^{V} 1(h_v(x) = c),

where x is a feature row vector, and 1("statement") is equal to one when "statement" is true and zero if not.
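The bootstrap and feature-subset samplings, and the majority-rule fusion, can be sketched in matrix form as follows. This is a toy illustration with dummy tree predictions; all sizes and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
nI, nF, G, V, C = 8, 6, 3, 5, 3   # #I samples, 10D = 6 features, G kept, V trees, C classes
X = rng.random((nI, nF))
Y = rng.integers(1, C + 1, size=nI)

# Bootstrap as left multiplication: S_v is (#I)x(#I) with a single 1 per
# row, its column drawn uniformly from {1,...,#I} (sampling with replacement).
rows = rng.integers(0, nI, size=nI)
S_v = np.zeros((nI, nI))
S_v[np.arange(nI), rows] = 1.0

# Feature subset as right multiplication: F_v is the (10D)x(10D) identity
# deprived of 10D - G columns.
keep = np.sort(rng.choice(nF, size=G, replace=False))
F_v = np.eye(nF)[:, keep]

Xv = S_v @ X @ F_v          # training matrix for the v-th tree
Yv = S_v @ Y                # matching bootstrapped labels

# Majority rule over V (here dummy) tree predictions for one sample:
preds = np.array([1, 2, 2, 3, 2])                            # h_1(x),...,h_V(x)
H = np.argmax(np.bincount(preds, minlength=C + 1)[1:]) + 1   # majority class
```

In practice the matrix products are of course replaced by index selections; the matrix form only makes the bootstrap and subspace sampling explicit.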

Measurements of Feature Importance
As side information, the training of the decision trees provides measurements of feature importance. Let us consider the v-indexed decision tree trained with (S_v X F_v, S_v Y) and denote by N_v the number of its internal nodes (including the root). At a specific node indexed by n, after the previous splits there remains a set of samples and labels, which are rows of S_v X F_v and S_v Y; the set of these row indexes is denoted K_{v,n}. Components of S_v X F_v and S_v Y are denoted x^v_{k,f} and y^v_k. At this node, samples are split upon a feature f_{v,n} ∈ {1, …, 10D}, randomly selected without repetition, and a cutting point s_n, defining two complementary subsets,

K⁻_{v,n} = {k ∈ K_{v,n} : x^v_{k, f_{v,n}} ≤ s_n},  K⁺_{v,n} = {k ∈ K_{v,n} : x^v_{k, f_{v,n}} > s_n}.
Splitting modifies the class distribution p_{v,n,c} into p⁻_{v,n,c} and p⁺_{v,n,c}. These class distributions are associated with K_{v,n}, K⁻_{v,n}, and K⁺_{v,n}, respectively.
The importance of the feature f_{v,n} used at node n is here the amount by which the split reduces entropy,

Θ_{v,n} = H(p_{v,n}) − (#K⁻_{v,n}/#K_{v,n}) H(p⁻_{v,n}) − (#K⁺_{v,n}/#K_{v,n}) H(p⁺_{v,n}), with H(p) = −Σ_c p_c ln(p_c).

The relative number of samples handled by each node is used as a weight when averaging the Θ_{v,n}. The measures of feature importance are defined as

Θ(f) = (1/Θ̄) Σ_v Σ_{n : f_{v,n} = f} (#K_{v,n}) Θ_{v,n},

where Θ̄ is a normalization factor, here defined as

Θ̄ = Σ_{f=1}^{10D} Σ_v Σ_{n : f_{v,n} = f} (#K_{v,n}) Θ_{v,n}.

Note that Θ(f) does not take into account which class each sample is assigned to; it assumes that at each leaf, the class predicted is the most likely one. The rationale is that a decision tree makes more reliable predictions when the training instances reaching a leaf more often belong to the same class, and entropy precisely measures that "purity". As exemplified in [48], we use feature selection to improve performance and to help prevent the "curse of dimensionality". Compared to principal component analysis, which is generally used for dimension reduction, the Θ-driven feature selection is expected to be more effective at discriminating between classes. We select the G most important features and train a new model using only those features, G being a new parameter. To select them, the features are first ranked in decreasing order of importance as measured by Θ, and the first G features are kept.
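The per-node entropy reduction can be sketched as follows (a minimal illustration using natural-log entropy; the function names are ours, not the paper's):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (natural log) of the class distribution of a label set."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

def split_gain(y, y_left, y_right):
    """Entropy reduction achieved by one split, with children weighted by
    their share of the node's samples, as in the per-node importance above."""
    n = len(y)
    return entropy(y) - len(y_left) / n * entropy(y_left) \
                      - len(y_right) / n * entropy(y_right)

# A pure split of a balanced two-class node removes all entropy, i.e. ln 2.
y = np.array([1, 1, 2, 2])
gain = split_gain(y, y[:2], y[2:])
```

A split that leaves both children with the same mixed distribution as the parent yields zero gain, which is why such features accumulate no importance.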
Here, f_1, f_2, …, f_{10D} denote the features in ranked order, and F = {f_1, …, f_G} is the set of selected features. This selection is equivalent to right multiplication by a matrix F_G obtained by removing from the (10D)×(10D) identity matrix the columns f_{G+1}, …, f_{10D}. The new ensemble classifier is trained as in Section 3.3, but using (X F_G, Y) instead of (X, Y).

Precision Evaluation
Three classical evaluation metrics are used here to test a classifier h on a test set (x k ) k for which the true labels are known and denoted (y k ) k .
• OA: The Overall Accuracy measures the correct prediction rate,

OA = (1/#T) Σ_k 1(h(x_k) = y_k),

where #T is the number of test samples.
• κ: The kappa statistic corrects the overall accuracy for chance agreement,

κ = (OA − p_e)/(1 − p_e),

where p_e is the agreement expected by chance, computed from the marginal class frequencies of the true and predicted labels.
• pcA(c): The per-class accuracy measures the prediction rate when testing only samples of class c,

pcA(c) = (1/#T_c) Σ_{k : y_k = c} 1(h(x_k) = c),

where #T_c is the number of test samples belonging to class c.
• pcAr(c): The per-class area converts the correctly classified pixel count of class c into a surface,

pcAr(c) = S_p Σ_{k : y_k = c} 1(h(x_k) = c),

where S_p is the area covered by a pixel.
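These metrics can be sketched as follows. This is a minimal illustration; the kappa here uses the standard chance-agreement definition, which may differ in detail from the paper's exact equation.

```python
import numpy as np

def metrics(y_true, y_pred, classes):
    """Overall accuracy, Cohen's kappa, and per-class accuracy."""
    oa = np.mean(y_true == y_pred)
    # expected chance agreement from marginal class frequencies
    pe = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    kappa = (oa - pe) / (1 - pe)
    pca = {c: np.mean(y_pred[y_true == c] == c) for c in classes}
    return oa, kappa, pca

y_true = np.array([1, 1, 1, 2, 2, 3])
y_pred = np.array([1, 1, 2, 2, 2, 3])
oa, kappa, pca = metrics(y_true, y_pred, classes=[1, 2, 3])
```

For this toy case, OA = 5/6, κ = 17/23, and the per-class accuracies are 2/3, 1, and 1 for classes 1, 2, and 3, respectively.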

Parameter Setting
Two parameters need to be set in the RF classifier: the number of decision trees to be generated (Ntree) and the number of variables to be selected and tested for the best split when growing the trees (Mtry). In this study, the default value of 100 was used for Ntree, and Mtry was set to the square root of the number of input variables. For comparison purposes, five learning techniques, M_1, M_2, M_3, M_4, and M_5, are trained with the same training set I and tested on the same test set T. In addition, since Blue Mountain Forest is a fire-burned area, the differenced Normalized Burn Ratio (dNBR) is added to classify that area in this study.

Table 3 shows the OA (overall accuracy) and κ (kappa statistic) for all six methods M_0, M_1, M_2, M_3, M_4, and M_5 when tested on Nezer Forest and Blue Mountain Forest. The different methods are nearly always ranked in the same order whether measured by OA or κ and whether applied to one dataset or the other. Increased performance is observed when using texture features: M_2, M_3, M_4, M_5, and M_0 all outperform M_1. With respect to the window size, performance appears to increase up to a threshold and then decrease. The threshold seems to be 7×7 for Nezer Forest and 5×5 for Blue Mountain Forest, as among {M_2, M_3, M_4}, M_4 performs best on the first dataset and M_3 on the second. Using a composite window proves successful on both datasets when compared to the best-performing single window size: M_5 increases the overall accuracy by 0.5% compared with M_4 on the first dataset, and likewise compared with M_3 on the second. Performance appears to increase further when the number of features is reduced. Indeed, when comparing M_0 with M_5, the overall accuracy remains unchanged on the first dataset and is increased by 0.5% on the second.
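The parameter setting above corresponds, for instance, to the following scikit-learn configuration (a sketch on synthetic data; the dataset and the choice of G = 10 are illustrative, not the paper's):

```python
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Ntree = 100 trees; Mtry = sqrt(number of features), i.e. scikit-learn's
# max_features="sqrt".
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=0)

rng = np.random.default_rng(0)
X = rng.random((200, 40))                 # e.g. 10D = 40 features (D = 4)
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # toy labels driven by features 0 and 1
rf.fit(X, y)

importances = rf.feature_importances_     # read out after training (Section 3.4)
top_G = np.argsort(importances)[::-1][:10]   # keep the G = 10 most important
```

On this toy problem, the two label-driving features dominate the importance ranking, which is the behavior the Θ-driven selection relies on.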
Considering the popularity of dNBR in fire monitoring, it is used in this study in an attempt to improve classification performance on the Blue Mountain Forest data. The corresponding results are shown in Table 3. When dNBR is combined with spectral or texture features for classification, no clear improvement in classification performance is observed.
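For reference, dNBR is commonly computed from pre- and post-fire Normalized Burn Ratios using NIR and SWIR reflectances (a generic sketch, not this paper's exact processing chain; the usual Sentinel-2 pairing is B8 for NIR and B12 for SWIR):

```python
def nbr(nir, swir):
    """Normalized Burn Ratio from NIR and SWIR reflectances."""
    return (nir - swir) / (nir + swir)

def dnbr(nir_pre, swir_pre, nir_post, swir_post):
    """Differenced NBR: pre-fire NBR minus post-fire NBR.
    Higher values indicate more severe burn."""
    return nbr(nir_pre, swir_pre) - nbr(nir_post, swir_post)

# Healthy vegetation (high NIR, low SWIR) burned to charred ground
# (NIR drops, SWIR rises), giving a strongly positive dNBR:
val = dnbr(nir_pre=0.5, swir_pre=0.1, nir_post=0.15, swir_post=0.35)
```

Here the pre-fire NBR is 2/3 and the post-fire NBR is −0.4, so dNBR is strongly positive, consistent with a burned pixel.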

Feature Importance Results
The feature importance measurements obtained for the two studied areas are shown in Figures 11 and 12, respectively. In Figure 11, the first twenty features are computed with the smaller window and the next twenty with the larger window. Within each group of twenty, the first five are data range, mean, variance, entropy, and skewness computed using the red wavelength (630-690 nm); in the same way, the following features account, respectively, for the green (520-600 nm), blue (450-520 nm), and near-infrared (NIR) (760-900 nm) wavelengths. For Nezer Forest, the mean appears to be the most important of these five feature types, with two significant exceptions: skewness computed at the green wavelength and aggregated with the smaller window, and variance captured at the blue wavelength using the larger window, seem to be more important than the mean. Among the texture features, entropy seems to be the least significant at most wavelengths. Entropy is a measure of the amount of information in an image and indicates the degree of nonuniformity of its texture; when the pixel values in the window are highly random and dispersed, the entropy value is high. In this study, the entropy of the Nezer Forest is extracted with windows (7 × 7 and 9 × 9) containing a small number of pixels, so entropy contributes little to the classification.
In Figure 12, the first 65 features are computed with the smaller window, the other 65 with the larger window. Within each set, the first five are data range, mean, variance, entropy, and skewness computed using the coastal aerosol wavelength (442 nm). Similarly, the following features account, respectively, for the blue (492 nm), green (560 nm), red (664 nm), vegetation red edge (704, 740, and 783 nm), near-infrared (NIR) (832 nm), and subsequent infrared wavelengths (865 nm, 945 nm, 1373 nm, 1614 nm, and 2202 nm). Moreover, dNBR's feature importance value of 0.3 is shown at the bottom of the figure. For the Blue Mountain Forest, the results show that the relative importance of features to classification is similar to that of the Nezer Forest. Compared with the other features, the mean makes a greater contribution to classification at most wavelengths, with one significant exception: skewness computed at the NIR wavelength and aggregated with the larger window makes the greatest contribution to classification.
Besides, for the two sites, whether features are aggregated with the smaller or the larger window seems to have little or no effect on feature importance. Interestingly, greater importance seems to be assigned to features computed from intensities captured at the NIR wavelength and, to a lesser extent, at the red wavelength. This bears some similarity to the results shown in Figure 5b of [13], which indicate that canopy reflectance has much higher values at such wavelengths. Full agreement should not be expected, as higher reflectance values do not necessarily imply higher discriminative power. Tables 4 and 5, respectively, show the pcA (per-class accuracy) and pcAr (per-class area) achieved by the six models when tested on Nezer Forest and Blue Mountain Forest. Similar trends are displayed by pcA and pcAr; this is no surprise, as these metrics are defined by proportional equations in (21) and (22). As with Table 3, very similar model comparisons can be drawn, validating our proposed approach: performance is increased when considering texture features, when increasing the window size up to a threshold, when using a composite window, and when appropriately selecting features. Per-class metrics deliver slightly more subtle information: on both datasets, it seems that the optimal window size for computing texture features is 5 × 5 for D-samples and 7 × 7 for N-samples; greater accuracy is displayed for classes for which more training samples are available relative to the number of test samples, and for those classes the proposed approach shows comparatively less increased performance. For Nezer Forest, when using the composite window (M_5), the accuracy of the not-damaged class is increased by 1.02% and that of the damaged class by 0.11% compared with M_4. For M_0, the accuracy of the not-damaged class is improved by 0.32% and that of the damaged class by 0.28% compared with M_5.
In this study area, the damaged area is 28.48 km², nearly half of the total area. For Blue Mountain Forest, when the composite window (M_5) is used, the accuracies of the not-damaged and damaged classes are improved by 0.03% and 0.99%, respectively, compared with the optimal single window. Compared with M_5, the accuracies of the not-damaged and damaged classes obtained from M_0 are increased by 0.01% and 0.82%, respectively. The damaged area obtained from M_0 is nearly 22.75 km², about half the size of this study area. Besides, when dNBR is combined with spectral or texture features, the classification accuracy improvements of the methods (M_1, M_2, M_3, and M_5) are not very obvious. More insight is needed to further verify these observations and to investigate their possible causes. Consistent with Tables 3-5, the classification performance improves from the first of the three color-map figures to the last, the last one having the highest classification accuracy and relatively few wrongly classified pixels. Moreover, the color maps obtained by the proposed method (Figures 13f, 14f and 15e) are clearer and smoother. Unfortunately, the increases in performance as measured by some accuracy metrics do not translate here into maps that are easier to interpret, an issue exceeding the scope of this paper.

Discussion
In this study, we propose a novel texture feature extension method based on composite windows for forest disaster monitoring. As a supplementary benefit, the training of the random forest provides measurements of feature importance, which are used for feature selection. The results on the two study areas demonstrate the effectiveness of the proposed method.
(1) In this study, five types of texture features are computed with windows of different sizes. The results in Tables 3-5 show that, compared with spectral features, texture information contributes to increased accuracy in the classification of forest disturbance. This is because the texture features of an image carry vital information about the spatial structure of objects. In the traditional pixel-based method, pixels are classified separately according to their digital values, without any spatial or contextual information [49]. In this case, the misclassification rate is usually high due to (1) the similar spectral features of some classes, and (2) the existence of mixed pixels located at the borders between classes. In our study, by incorporating texture features in the classification, different substances with the same spectral features can be distinguished efficiently, and higher classification accuracies can be attained. The studies of Jiang et al. [25] and Kulkarni et al. [27] also demonstrated that texture information has a strong impact on forest disturbance monitoring. (2) As described in [50], the window size is extremely important for texture feature extraction. To find the optimal single window size, different window sizes, including 3 × 3, 5 × 5, and 7 × 7, were used to compute texture information in our study. The RF classifier was applied to the features derived from these three window sizes. The OA and kappa coefficient for the two study areas are displayed in Table 3. This table shows that, in the Nezer Forest, the highest kappa value and overall accuracy were obtained with a texture window size of 7 × 7. Murray et al. [32] and Puissant et al. [34] also used a 7 × 7 window to extract texture features, which indicates that this size has advantages for the accurate extraction of texture information. The situation in Blue Mountain Forest is somewhat different from that in the first area.
The results show that the overall classification performance increases from 3 × 3 to 5 × 5 and then dips slightly at a window size of 7 × 7. Because the image resolutions and object sizes differ, the coarseness of the textures also differs; therefore, the optimal window sizes of the two research areas are different. (3) Although the texture features extracted with a single window contribute to the classification, this approach does not adequately account for the scale of different objects. It is therefore valuable to combine windows of different sizes so as to cover targets of different sizes. Considering the contribution of texture information extracted from different optimal windows to classification performance, the two windows with the best performance (5 × 5 and 7 × 7) were combined in this study. When using the composite window, the classification performance is superior for both study areas: the OA, kappa coefficient, and pcA are all significantly improved. The main reason is that the composite window not only contains more information but also allows precise localization of the boundary edges between adjacent regions. (4) For the RF classifier, the number of decision trees (Ntree) is set to 500 in many studies, because the errors stabilize before this number of trees is reached [51]. Other studies have also obtained good classification performance with different values of Ntree, such as 100 [52]. In our study, the default value of 100 for Ntree was used for all methods, and the experimental results show that the OA is over 97% for both study areas. Increasing Ntree to 500 does not greatly improve the classification accuracy but does increase the training time. (5) In this study, five types of texture features (data range, mean, variance, entropy, and skewness) were computed for all bands.
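As an illustration of the composite-window idea in point (3), the sketch below concatenates per-pixel texture features computed with a 5 × 5 and a 7 × 7 window on one band. It is a simplified sketch, not the paper's exact pipeline: only three of the five features are computed (mean, variance, and data range; entropy and skewness would be added analogously), using `scipy.ndimage` sliding-window filters.

```python
import numpy as np
from scipy import ndimage

def window_features(band, size):
    """Per-pixel local mean, variance, and data range in a size x size window."""
    mean = ndimage.uniform_filter(band, size)
    sq_mean = ndimage.uniform_filter(band ** 2, size)
    var = sq_mean - mean ** 2
    rng = ndimage.maximum_filter(band, size) - ndimage.minimum_filter(band, size)
    return np.stack([mean, var, rng], axis=-1)

def composite_window_features(band, small=5, large=7):
    """Concatenate texture features from the small and the surrounding large window."""
    return np.concatenate([window_features(band, small),
                           window_features(band, large)], axis=-1)

# Usage on a synthetic 64 x 64 band
band = np.random.rand(64, 64)
feats = composite_window_features(band)
print(feats.shape)  # (64, 64, 6): 3 features x 2 window sizes
```

In the full method, the feature vectors from all spectral bands would be stacked in the same way before being fed to the classifier.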
Using all texture features in the RF classifier may not be desirable because of feature redundancy. Some studies show that feature selection not only reduces classification complexity but can also enhance classification accuracy [53,54]. In our study, the variable importances shown in Figures 11 and 12 were obtained with RF. To reduce the feature dimension, we selected a subset of features for training according to these two figures. We used the two datasets to test the effects of different variable importance thresholds on classification performance. Compared to using all the features, the classification performance is enhanced when only the selected features are used. Genuer et al. also demonstrated that utilizing RF for variable selection is an effective method [55]. In future studies, we will test the impact of more diverse importance thresholds on classification performance. Moreover, more fire and hurricane data are needed to test the proposed algorithm. (6) In this paper, the proposed method is tested on two small data sets. When dNBR is combined with spectral or texture features, the classification accuracies of the methods (M1, M2, M3, and M5) improve only a little; the limited extent of the research areas may be the main reason for this phenomenon. Therefore, a larger study area will be considered to test the effectiveness of the proposed algorithm in future work.
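The feature selection step described above can be sketched with scikit-learn's random forest, whose Gini-based `feature_importances_` are a by-product of training, so selection adds no extra cost. The synthetic data, the threshold (here the mean importance), and Ntree = 100 are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                      # 500 samples, 10 candidate features
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)   # only features 0-2 are informative

rf = RandomForestClassifier(n_estimators=100, random_state=0)  # Ntree = 100
rf.fit(X, y)

# Keep features whose Gini importance exceeds a threshold (here the mean importance)
importances = rf.feature_importances_
selected = np.flatnonzero(importances > importances.mean())
print(selected)  # the informative features should dominate

# Retrain on the reduced feature set
rf_sel = RandomForestClassifier(n_estimators=100, random_state=0)
rf_sel.fit(X[:, selected], y)
```

Sweeping the threshold, as done in the study, amounts to repeating the last two steps for several cutoff values and comparing the resulting accuracies.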

Conclusions
In the context of monitoring forest changes using multispectral images, and more specifically texture features, we have investigated two propositions and evaluated them on two multispectral datasets with publicly available ground truths, namely Nezer Forest and Blue Mountain Forest. The first proposition is the use of two windows, one larger surrounding the other, as an alternative to the classical use of a single window whose size must be precisely estimated. The discriminative ability of the computed texture features is examined by feeding them into a random forest ensemble classifier, which is known to perform well. The second proposition is to use the measurements of feature importance, available at no extra computational cost when training the ensemble classifier, for feature selection. The classification results show that both propositions increase the classification performance as measured by overall accuracy, the kappa statistic, and per-class accuracy. In the future, a larger range of forest disturbance data is needed to verify the effectiveness of the proposed method. We also recommend extending and testing our approach in other ecosystems, such as agricultural ecosystems.
Author Contributions: Y.Q. and W.F. conceived and designed the experiments. X.Z. performed the experiments and wrote the paper. W.F. and G.D. revised the paper. L.G. and M.X. edited the manuscript. All authors reviewed and approved the final manuscript.