Mapping Spatial Distribution of Larch Plantations from Multi-Seasonal Landsat-8 OLI Imagery and Multi-Scale Textures Using Random Forests

Knowledge of the spatial distribution of plantation forests is critical for forest management, monitoring programs and functional assessment. This study demonstrates the potential of multi-seasonal (spring, summer, autumn and winter) Landsat-8 Operational Land Imager imagery with random forests (RF) modeling to map larch plantations (LP) in a typical plantation forest landscape in North China. The spectral bands and two types of textures were used to create 675 input variables for RF. An accuracy of 92.7% for LP, with a Kappa coefficient of 0.834, was attained using the RF model. An RF-based importance assessment reveals that the spectral bands and the bivariate textural features calculated by pseudo-cross variogram (PC) strongly promoted forest class-separability, whereas the univariate textural features had only a weak influence. A feature selection strategy eliminated 93% of the variables, leaving a subset of the 47 most essential variables. In this subset, PC textures derived from summer and winter appeared most frequently, suggesting that the variability between the peak growing season and the non-growing season can effectively enhance forest class-separability. An RF classifier applied to the subset led to 91.9% accuracy for LP, with a Kappa coefficient of 0.829. This study provides insight into approaches for discriminating plantation forests with phenological behaviors.

Remote Sens. 2015, 7 (Open Access)


Introduction
Natural forests have been destroyed worldwide for more than a century due to human activity, and plantation forests have been established to meet timber demand as a substitute for natural forests [1]. China possesses the largest area of planted forests (PF) in the world, which accounts for 36% of the country's total forested area [2]. Larix spp. is one of the most important planted timber tree species. Since the 1950s, about 3.78 million ha of larch plantations (LP) have been planted in North China, which provide essential ecosystem services, including timber production, water resource conservation and carbon sequestration [3]. With the continuing increase in planted LP area, some problems have been reported regarding LPs, such as poor natural regeneration capability [4,5] and a decline in soil fertility [6,7]. Knowledge of the spatial patterns of LP is of prime importance to forest management, monitoring programs and the functional assessment of ecological services, all of which serve the strategic goals of managing plantation forests [1,8,9]. Nevertheless, due to the extensive area and long history of LP planting in North China, its spatial distribution pattern remains unclear.
Remote sensing is particularly useful for forest mapping, as it provides large coverage at relatively high levels of detail [10-14]. Nevertheless, for specific tree species observed at a single date (e.g., in the growing season), accurate remote sensing-based classification is still challenging, as the spectral signals of different trees may be very hard to distinguish from one another. Multi-seasonal remote sensing images introduce the temporal variability of objects as a feature to increase class-separability [15-17]. In global land-cover mapping, for example, Gong et al. found multi-temporal Landsat images useful for land-cover classification [18]. Multi-temporal image-based classification has been applied to map land cover [17,19], wetlands [16] and forest biomass [20]. Although this idea has received increasing interest, multi-seasonal imagery-based classification that allows the separation of different forest types is still lacking.
The spatial variability of vegetation cover patterns can be incorporated into the mapping process to distinguish objects more effectively [17,21]. Image texture, which describes visual spatial variability, has been widely used in identifying vegetation types [22,23]. Various texture measures that quantify the smoothness, symmetry, regularity, etc., of an image have been developed based on the grey level co-occurrence matrix (GLCM) and geostatistical functions [24]. Among these approaches, the geostatistical approach derived from variogram functions can be calculated not only for a single image band (univariate mono-seasonal texture), but also for a set of bands, describing the covariances between bands of images in a time series representing different seasons (bivariate multi-seasonal texture). It is especially useful where the seasonal differences between forest types are important for class discrimination. Larch is a deciduous coniferous tree species with remarkable seasonal changes. In summer, the spectral signal of coniferous forest differs from that of broadleaf forest; in winter, larch sheds its leaves and can thereby be discriminated from evergreen coniferous tree species. Furthermore, most LPs in China are single-species monocultures [5]. These multi-seasonal texture measures may help to extract the seasonal dynamics of pure forests more accurately. However, little attention has been paid to phenological texture-based classification of forest types.
Although a large number of input variables derived from multi-seasonal spectral data and textures can capture features of forest types and improve classifier accuracy, applying various transformations to the original satellite images also introduces redundant features, and the resulting curse of dimensionality may hinder the expected increase in accuracy [25]. Two approaches have been used to address this problem: selecting a robust classifier capable of handling a large number of variables, or selecting only the most informative variables by evaluating individual input variables [17,26]. Random Forests (RF) is an ensemble learning algorithm that has been documented as an excellent performer in the analysis of many complex remote sensing datasets [27-29]. It exhibits many desirable properties, including high accuracy, the ability to process thousands of input variables, and integrated measures of variable importance [30,31]. The RF-based variable importance measure can be applied to reduce data dimensionality and further improve classifier efficiency. Previous studies have reported RF-based classification approaches with multi-seasonal imagery for land cover mapping [17,19]. However, these studies mostly focused on specific seasons (e.g., spring and summer). For discriminating a given forest type, a time series data set spanning the growing cycle of trees can efficiently identify the phenological behaviors of different forest types, and may thus be well suited to LP discrimination. Additionally, although previous studies examined the RF-based importance assessment of remote sensing variables for land cover classification [17,32], the applicability of these variables to forest type classification is uncertain.
The objectives of this study are to (1) assess the performance of the RF learning algorithm in the discrimination of LP; (2) quantify the importance of input variables; and (3) map the spatial distribution of LP at a local scale. To achieve these goals, multi-seasonal (spring, summer, autumn and winter) Landsat-8 Operational Land Imager (OLI) imagery, texture models and ancillary data were employed in an RF-based feature selection (FS) method. Based on the evaluation of these input variables, an RF classifier was developed to map LP distribution.

Study Area
This study was conducted at the Saihanba Forestry Center (SFC) in Hebei Province, Northern China (116°52′E-117°39′E, 42°04′N-42°36′N; ca. 93,000 ha; Figure 1). It is located in the transition zone between the Inner Mongolian Plateau and North Hebei Mountain, with elevation ranging from 1042 m to 1936 m. The climate is semi-arid to semi-humid, with a short growing season from May to September. Annual mean air temperature and precipitation were −1.2 °C and 530 mm, respectively. SFC consists of six sub-forestry centers. Since the 1960s, over 74,000 ha of forest have been planted at SFC, making it the largest planted forestry center in China; at present, the forest cover of SFC is as high as 80%. The main planted forest types are Larix principis-rupprechtii plantations (LP), Pinus sylvestris var. mongolica plantations (MP), Picea asperata plantations (AP) and Pinus tabulaeformis plantations (PP); the main natural secondary forests are Betula platyphylla forests (BF) and rare hardwood deciduous forests (HDF). SFC is a typical area of plantation forests in Northern China, with forest types consisting of coniferous and deciduous tree species; it is thus a suitable location for investigating LP mapping.

Data Acquisition and Preprocessing
The remotely sensed imagery used in this study is from Landsat-8 OLI, a new sensor in the Landsat series. Landsat-8 extends the Landsat record and has enhanced capabilities, including new spectral bands, improved sensor signal-to-noise performance and associated improvements in radiometric resolution [33]. In this study, six Landsat-8 bands were used: OLI2 (blue, 0.45-0.51 μm), OLI3 (green, 0.53-0.59 μm), OLI4 (red, 0.64-0.67 μm), OLI5 (near infrared, 0.85-0.88 μm), OLI6 (shortwave infrared, 1.57-1.65 μm) and OLI7 (shortwave infrared, 2.11-2.29 μm). Geometric correction was performed with approximately 50 ground control points to reduce the error to less than 15 m for the four Landsat OLI scenes. Atmospheric correction of the images was then performed using the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) software package in ENVI 5.0.
Accurate and sufficient training plot data are crucial for supervised classification. A reference dataset was obtained from the 2011 forest inventory data of SFC. Ground data from two field surveys in 2013 and high-resolution aerial imagery (0.5 m) from 2012 were used to verify the reference dataset; a few polygons (ca. 5% of the total area) were adjusted to obtain an improved ground reference dataset. A random sampling approach was then applied directly to the improved forest map of SFC. A total of 9909 homogeneous areas were selected to identify forest types for training and testing the classifier (Table 1). Each site was confined to a 90 m by 90 m homogeneous area, following the recommendations of Congalton and Green [34]. Additionally, the forest map of SFC was employed as a base forest map to exclude other land covers. Plantation forests tend to be planted in flat areas, while natural forests are distributed over rugged terrain. Considering the topographic effects on the distribution of forest types, three topographic features were used as ancillary input variables for the RF classifier. Altitude was derived from a Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) raster with a spatial resolution of 30 m by 30 m. Slope and aspect were generated from the DEM.

Textural Analysis
Image texture carries useful information for discriminating forest types. Multi-scale textural features, combining geostatistical texture and GLCM texture, were calculated from the Landsat-8 multi-seasonal OLI bands (band 2 to band 7) to introduce textural variables into LP discrimination. Selecting the window size is a practical problem that depends on the spatial resolution of the image and the characteristics of the land cover [17,24,35]. A proper window size ensures a robust textural estimator. Rodriguez-Galiano et al. [17] suggested that a small window size provides a more representative description of the most heterogeneous environments with high local variance, while larger window sizes may provide an accurate representation of a homogeneous pattern of spatial variability over large areas. Based on previous studies and the patch size of forests in SFC, three window sizes were tested: 5 × 5 pixels (150 × 150 m), 9 × 9 pixels (270 × 270 m) and 13 × 13 pixels (390 × 390 m).

Geostatistical Texture
The geostatistical approach is a textural analysis tool used to measure spatial variation (e.g., spatial autocorrelation) in remotely sensed data [36]. The variogram is one of the most promising geostatistical techniques and has been widely used in image texture characterization [37]. It can be applied not only to calculate the texture contrast within a single spectral band (univariate analysis) but also to describe the relationships between pairs of spectral bands (bivariate analysis) [32]. The bivariate analysis can be performed with two bands of one season or with the same band of two coupled seasons. To identify the characteristics of LP in SFC, three geostatistical measures (variogram, madogram and pseudo-cross variogram) were applied to mono-seasonal and multi-seasonal images. The direction of variogram computation and the lag distance h are the two important parameters affecting texture values. In this study, the four main directions (N-S, E-W, NE-SW and NW-SE) were averaged to produce an omnidirectional variogram texture [36,37]. Because different lag distances have a limited effect on classification results [17,32], a single lag (h = 1) was selected for the three window sizes. More detailed descriptions of the variogram functions and parameter definitions can be found in previous studies [36-38]. All the prediction variables derived from geostatistical texture are shown in Figure 2.
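The three geostatistical measures can be illustrated for a single moving-window position. The sketch below is not the authors' implementation: it assumes the commonly used estimators (half the mean squared difference for the variogram, half the mean absolute difference for the madogram, and half the mean squared difference between band or season a at x and band or season b at x + h for the pseudo-cross variogram), since the paper defers the exact definitions to [36-38]. All function names are hypothetical.

```python
from statistics import mean

# Lag direction as (row step, column step); rows are assumed to run north to south.
DIRECTIONS = {"N-S": (1, 0), "E-W": (0, 1), "NW-SE": (1, 1), "NE-SW": (1, -1)}


def _omnidirectional(win_a, win_b, lag, term):
    """Average term(a(x) - b(x + h)) over all in-window pairs, then over directions."""
    n_rows, n_cols = len(win_a), len(win_a[0])
    per_direction = []
    for d_row, d_col in DIRECTIONS.values():
        values = []
        for r in range(n_rows):
            for c in range(n_cols):
                r2, c2 = r + d_row * lag, c + d_col * lag
                if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                    values.append(term(win_a[r][c] - win_b[r2][c2]))
        per_direction.append(mean(values))
    return mean(per_direction)


def variogram(win, lag=1):
    """Univariate texture: half the mean squared difference at the given lag."""
    return _omnidirectional(win, win, lag, lambda d: 0.5 * d * d)


def madogram(win, lag=1):
    """Univariate texture: half the mean absolute difference at the given lag."""
    return _omnidirectional(win, win, lag, lambda d: 0.5 * abs(d))


def pseudo_cross_variogram(win_a, win_b, lag=1):
    """Bivariate texture between two bands, or one band in two seasons."""
    return _omnidirectional(win_a, win_b, lag, lambda d: 0.5 * d * d)


# A 5 x 5 window whose values increase from west to east, and a brighter copy.
ramp = [[c for c in range(5)] for _ in range(5)]
shifted = [[v + 2 for v in row] for row in ramp]
print(variogram(ramp), madogram(ramp), pseudo_cross_variogram(ramp, shifted))
```

In a full workflow the same calculation would be repeated with the window centred on every pixel, for each of the three window sizes, to build one texture layer per variable.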

GLCM Texture
GLCM is an alternative texture analysis tool that is widely used to describe specific textural characteristics of an image [39]. Its texture features are calculated from a matrix containing the relative frequencies with which pairs of grey levels (digital numbers) occur in pixels at a fixed relative position in an image [22,32]. As with geostatistical texture, some parameters must be defined to calculate GLCM texture. For an explanation of the GLCM parameters (window size, lag distance and orientation), refer to the definition of geostatistical textures (Section 2.3.1). A variety of GLCM measures are employed in current studies. To avoid correlation between textural features and to reduce data dimensionality, we followed Coburn et al. and selected three GLCM measures [39]. The defined parameters and texture features of GLCM are shown in Figure 2.
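As an illustration of how such features arise from the co-occurrence matrix, the following sketch computes contrast, entropy and angular second moment (the three measures named later in the discussion) for one window and one pixel offset. It is a minimal example under stated assumptions: a symmetric, normalised GLCM and the natural logarithm for entropy are choices made here, not details given in the paper, and `glcm_features` is a hypothetical name.

```python
from collections import Counter
from math import log


def glcm_features(window, d_row=0, d_col=1):
    """Return (contrast, entropy, second moment) for one window and one offset.

    window: list of rows of integer grey levels.
    (d_row, d_col): relative position of the second pixel of each pair.
    """
    counts = Counter()
    n_rows, n_cols = len(window), len(window[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = window[r][c], window[r2][c2]
                counts[(i, j)] += 1  # count each pair in both orders so that
                counts[(j, i)] += 1  # the matrix is symmetric (assumption)
    total = sum(counts.values())
    probs = {pair: n / total for pair, n in counts.items()}

    contrast = sum(p * (i - j) ** 2 for (i, j), p in probs.items())
    entropy = -sum(p * log(p) for p in probs.values())
    second_moment = sum(p * p for p in probs.values())
    return contrast, entropy, second_moment


# A uniform window has no contrast and a single occupied GLCM cell.
uniform = [[3] * 5 for _ in range(5)]
print(glcm_features(uniform))
```

To mirror the omnidirectional setting described for the geostatistical textures, the features could be computed for the four offsets (1, 0), (0, 1), (1, 1) and (1, -1) and averaged.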

Random Forests Classifier
The machine learning technique RF is an improved version of the Classification and Regression Tree that can be viewed as an ensemble of individual tree-like classifiers [27,40]. The RF algorithm is less sensitive to noise and less prone to overfitting than other classifiers based on bagging or boosting [32,41]. It can handle thousands of input variables and evaluate their importance in classification [27]. In this study, an RF classifier was used to discriminate LP from the other forest types and to estimate the importance of 675 input variables.
RF classification requires two basic parameters to generate a prediction model: the number of trees in the forest (ntree) and the number of prediction variables tried at each split when growing a decision tree (mtry). Although Breiman [27] suggested that adding more trees to an RF model does not induce over-training, redundant trees increase computation time. Reducing the number of prediction variables makes each individual tree less robust but also reduces the correlation between trees, which increases classifier accuracy [42]. Therefore, the parameters of the RF model should be optimized to obtain a highly efficient RF classifier. Considering the computation time of RF, the upper limit of ntree was set to 1000. A number of RF models based on possible values of ntree and mtry were created and evaluated to analyze the effect of the two parameters on classification accuracy. The optimal parameters were then applied to improve the efficiency of the RF classifier. The ground reference data (Table 1) were divided randomly into 70% for training and 30% for testing the RF model (Section 2.2), with the number of training sites per class kept roughly equal [17]. A confusion matrix was generated to assess classification accuracy.
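The accuracy figures reported in the results (overall accuracy, Kappa coefficient, user's and producer's accuracies) all follow from the confusion matrix. The sketch below shows the standard calculations; the matrix orientation (rows as mapped classes, columns as reference classes) and the two-class example values are assumptions for illustration only and are not data from this study.

```python
def accuracy_report(matrix):
    """Summarise a square confusion matrix.

    matrix[i][j] = number of test samples mapped as class i whose reference
    class is j (rows = map, columns = reference; an assumed orientation).
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diagonal = [matrix[i][i] for i in range(k)]

    overall = sum(diagonal) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - expected) / (1 - expected)

    # User's accuracy: share of pixels mapped as a class that are correct.
    users = [d / r if r else 0.0 for d, r in zip(diagonal, row_sums)]
    # Producer's accuracy: share of reference pixels of a class that are found.
    producers = [d / c if c else 0.0 for d, c in zip(diagonal, col_sums)]
    return {"overall": overall, "kappa": kappa,
            "users": users, "producers": producers}


# Made-up two-class example: class 0 absorbs 10 samples of class 1.
example = [[50, 10],
           [0, 40]]
print(accuracy_report(example))
```

In this toy matrix the first class has a lower user's accuracy than producer's accuracy, which is the pattern produced when samples of other classes are assigned to it.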

Feature Selection
Three kinds of data, derived from the original satellite spectral bands, textural measures and topographic data, produced 675 predictive variables for RF (Figure 2). The original spectral variables are informative and have often been used directly in remote sensing-based classification [26,43]. Additionally, to capture the monoculture structure and phenological behaviors of LP, two textural methods (GLCM and geostatistical texture) were employed. Since textural measures with multiple parameter combinations (e.g., window size, direction or lag) can dramatically increase the number of variables, considerable effort was made to reduce data dimensionality (Section 2.3).
The textural variables may increase classification accuracy, but they may also be highly correlated, increase computation time and give rise to the "curse of dimensionality" [25]. It is important to know how each predictive variable influences the RF model and to identify the efficiency of variables for LP discrimination. The assessment of variable importance produced by the RF model can help to select the more effective variables with little reduction in classification accuracy. In this process, the values of a variable are randomly permuted in the out-of-bag (OOB) samples, and the increase in estimation error between the original and the permuted OOB data is taken as the importance of that variable [44]. Based on the importance assessment, the optimal subset of variables was selected to produce a highly efficient RF model, which was applied to map LP at SFC. All the input predictive variables of RF are shown in Figure 2. The RF model was run in the statistical software R 2.15.2.
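The total of 675 variables can be reproduced from the counts given in the Figure 2 caption. The short tally below is a cross-check rather than part of the original workflow; the GLCM count (three measures for each of the 24 band-season layers at each window size) is not stated explicitly in the caption and is inferred here as the combination that makes the total add up.

```python
from math import comb

N_BANDS = 6        # OLI2 to OLI7
N_SEASONS = 4      # spring, summer, autumn, winter
N_WINDOWS = 3      # 5 x 5, 9 x 9 and 13 x 13 pixels
N_TOPOGRAPHIC = 3  # altitude, slope, aspect


def count_variables():
    """Tally the RF input variables by group."""
    layers = N_BANDS * N_SEASONS  # 24 band-season layers
    counts = {
        "spectral": layers,
        "topographic": N_TOPOGRAPHIC,
        # Variogram and madogram for every layer and window size.
        "gt_univariate": layers * 2 * N_WINDOWS,
        # Pseudo-cross variogram between band pairs within one season.
        "pc_mono_seasonal": comb(N_BANDS, 2) * N_SEASONS * N_WINDOWS,
        # Pseudo-cross variogram of one band between two seasons.
        "pc_multi_seasonal": N_BANDS * comb(N_SEASONS, 2) * N_WINDOWS,
        # Inferred: three GLCM measures per layer and window size.
        "glcm": layers * 3 * N_WINDOWS,
    }
    counts["total"] = sum(counts.values())
    return counts


print(count_variables())
```

The 15 band combinations and 6 seasonal combinations quoted in the caption correspond to the number of unordered pairs among 6 bands and among 4 seasons, respectively.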
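The permutation idea behind the importance measure can be sketched independently of any RF library. The example below is a simplified stand-in: it permutes one variable at a time in a held-out sample and records the mean increase in error of an arbitrary `predict` function, whereas RF does this tree by tree on each tree's own OOB samples. The function name, the toy model and the data are hypothetical.

```python
import random


def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    """Mean increase in error rate after shuffling each column in turn.

    predict: callable mapping one row (list of values) to a class label.
    rows: list of feature rows (lists); labels: reference class per row.
    """
    rng = random.Random(seed)

    def error_rate(data):
        wrong = sum(predict(row) != label for row, label in zip(data, labels))
        return wrong / len(labels)

    baseline = error_rate(rows)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild the data with only this column permuted.
            permuted = [row[:col] + [value] + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            increases.append(error_rate(permuted) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data: the label depends on column 0 only, column 1 is irrelevant.
toy_rows = [[i % 2, (i * 7) % 5] for i in range(40)]
toy_labels = [row[0] for row in toy_rows]
print(permutation_importance(lambda row: row[0], toy_rows, toy_labels))
```

A variable the model never uses receives an importance of exactly zero, while permuting an informative variable raises the error; ranking variables by this increase gives the ordering used for feature selection.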

Performance of Random Forests Classifier
The 675 input variables generated from the multi-seasonal spectral data, two textural measures and topographic data were used in RF classification to discriminate LP from the other forest types. The accuracy of the RF classifier for all forest types was 91.0%, with a Kappa coefficient of 0.834 (mtry and ntree were set to 15 and 1000, respectively). To assess the effect of the input parameters (mtry and ntree) on classification accuracy, numerous RF classifiers were created with various values of mtry and ntree. As shown in Figure 3, when mtry is greater than 5, its effect on the overall accuracy of the RF classifier is very limited. The overall accuracy increased with increasing ntree, but this trend was rather weak once 100 trees had been grown.

Importance Measure of Variables
The two textural measures were used directly in RF classification together with the multi-seasonal spectral data. The results indicate that the differences in classification accuracy among the separate season variables were very minor (Figure 4a). Figure 4b shows the accuracy of RF classifiers using summer bands (main growing season), multi-seasonal bands and all variables, respectively. Compared with the RF classifier using summer bands only, including the spectral bands of the other three seasons increased accuracy by 4.06% for all categories and 3.06% for LP (an increase in Kappa coefficient of 10.19). Furthermore, adding textural variables to the RF model of multi-seasonal spectral bands increased accuracy by 2.92% for all categories and 1.86% for LP (an increase in Kappa coefficient of 3.02). When the three subsets (spectral bands, GLCM and GT features) were considered in isolation, the GT subset produced the highest accuracy, 90.33% for all categories and 91.15% for LP, with a Kappa coefficient of 0.817; the spectral and GLCM subsets were relatively less effective (Figure 4c). The importance of the three types of variables determined by the RF classifier is shown in Figure 5. The importance of multi-seasonal spectral bands was clearly greater than that of the GLCM and GT variables (p < 0.05, Figure 5a). Nevertheless, when only the ten most effective variables of each subset were considered, multi-seasonal spectral bands as well as GT measures produced more important variables for the RF classifier (p < 0.05, Figure 5b). The textural variables derived from GLCM measures resulted in consistently less accurate classifications than spectral bands and GT variables (p < 0.05).
The importance of each variable with respect to spectral bands, seasons, textural measures and window sizes was assessed in the same way (Figure 6). Figure 6a shows the efficiency of the multi-seasonal spectral bands for LP. In summer, autumn and winter, several spectral bands were much more important than the rest, although the important bands varied between seasons. The green (OLI3) and red (OLI4) bands of summer and autumn and the shortwave infrared band (OLI7) of winter had more predictive power. The differences in importance among the spring bands were relatively small. The pattern of important variables for all categories was roughly similar to that for LP, and the most important variables were the shortwave infrared band of winter, the red and green bands of summer and the green band of autumn (Figure 6b). Textural measures and window sizes also affected classification accuracy. Figure 6c,d show the importance of the textural variables calculated by the GLCM and GT measures. In general, the importance of the bivariate variables derived from PC was greater than that of the univariate variables for LP (p < 0.05). The differences in importance among the five kinds of univariate variables with the three window sizes were not significant for either LP or all categories. For bivariate variables, the most relevant window size was 5 × 5 pixels of PC for LP (p < 0.05); however, the effect of window size on variable importance was not significant for all categories (p > 0.05, Figure 6d).

Mapping LP by Feature Selection of Random Forests
Based on the variable importance described in Section 3.2, we attempted to select the most important variables for an RF classifier with little decrease in classification accuracy. Figure 7 illustrates the changes in accuracy and Kappa coefficient for RF models from which the least informative variables were gradually removed. The Kappa coefficient and the classification accuracies for all categories and for LP followed a similar trend, with a well-marked turning point once 93% of the variables had been removed. Accuracy fluctuated with no significant trend before the turning point (p > 0.05); after it, however, classification accuracy decreased dramatically (R² = 0.78, p < 0.05).
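The gradual removal of the least informative variables can be sketched as a simple loop. This is an illustrative outline only: it assumes that the importance ranking is computed once and that each reduced subset is scored by a user-supplied `evaluate` function (for example, retraining a classifier and returning test accuracy); the paper does not specify these details, and the names are hypothetical.

```python
def n_kept(n_total, percent_removed):
    """Number of variables left after removing a given percentage (at least 1)."""
    return max(1, n_total * (100 - percent_removed) // 100)


def elimination_curve(importances, evaluate, percents):
    """Score the top-ranked subset left after each removal percentage.

    importances: dict mapping variable name to importance score.
    evaluate: callable taking a list of variable names, returning a score.
    percents: iterable of integer percentages of variables to remove.
    Returns a list of (percent_removed, n_variables_kept, score) tuples.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    curve = []
    for percent in percents:
        kept = ranked[:n_kept(len(ranked), percent)]
        curve.append((percent, len(kept), evaluate(kept)))
    return curve


# Removing 93% of 675 variables leaves the 47 reported in the text.
print(n_kept(675, 93))
```

Plotting the score against the percentage removed gives a curve of the kind summarised in Figure 7, on which a turning point can be located.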

Figure 7. The effect of variable reduction on classification accuracies.
Based on the FS of the RF model, a subset of the most informative variables (47 variables) was applied to an RF model to map the distribution of LP in SFC (mtry and ntree were set to 3 and 200, respectively). The confusion matrix (Table 2) shows that the classification accuracy was similar to that of the RF classifier based on all 675 variables. The user's accuracy was 91.9% for LP, which was higher than for the other main forest types (Betula platyphylla secondary forests and Pinus sylvestris var. mongolica plantations). The producer's accuracy of LP was marginally higher than its user's accuracy (94.8% vs. 91.9%). The overall accuracy was 90.7%, with a Kappa coefficient of 0.829. Finally, this classifier was applied to map the LP distribution of SFC. The mapping results are shown in Figure 8. LP had the widest distribution in SFC (47,176 ha, 62.0% of the total forest area), followed by BF (21,094 ha, 27.7% of the total forest area) and MP (7642 ha, 10.0% of the total forest area). These three categories together accounted for over 99% of the SFC forest area. Less than 1% of pixels were classified as AP, PP and HDF (162 ha, 0.2% of the total forest area).

Table 2. Confusion matrix of the RF classifier for the 6 forest types using validation samples. The total accuracy is the percentage of correctly classified samples among all testing samples. LP, Larix principis-rupprechtii plantations; MP, Pinus sylvestris var. mongolica plantations; BF, Betula platyphylla secondary forests; AP, Picea asperata plantations; PP, Pinus tabulaeformis plantations; HDF, hardwood deciduous secondary forests.

Importance of Input Variables
The 24 multi-seasonal spectral bands improved the classification accuracy substantially, suggesting that these original variables carry a great deal of information about the multi-temporal characteristics of different forest types. For spring, summer and autumn, bands at visible wavelengths showed higher importance; however, the importance of the near-infrared band, a key band for vegetation discrimination, was relatively weak. Previous studies reported the high efficiency of the near-infrared band in land cover classification [17,32], because it can differentiate vegetation from other land cover types. Our study covers forested area only (other land cover is excluded), which may be the main reason for the low importance of the near-infrared band in classification. Although the importance of the mono-seasonal near-infrared band is low, its multi-seasonal variability (textural variables calculated by PC) showed significant importance (Section 4.2). Furthermore, shortwave infrared (OLI7) in winter is the most important of all the multi-seasonal spectral bands. Differences in canopy and soil water content among the main forest types may help to explain the high effectiveness of shortwave infrared [45].
The effect of window size on textural variable importance differed between univariate and bivariate textures. For univariate textural variables, the effect of window size on variable importance was not significant; for bivariate textural variables, however, the smaller window size significantly outperformed the larger ones (for LP). Generally, larger window sizes may provide a more accurate representation of homogeneous areas [17]. In this study, the bivariate texture tended to describe the temporal correlation between the same bands of coupled seasons (multi-seasonal PCs). Part of this temporal variability is probably lost in a large window, whereas a smaller window may be more representative. Therefore, a small window size is suggested to obtain a more accurate representation for bivariate textural measures.

Feature Selection of the Most Important Variables
The classification results derived from the subset of 47 informative variables suggest that RF-based FS is a reliable approach for reducing redundant variables and improving classifier efficiency. The subset of the "best" variables included 7 spectral bands, 19 multi-seasonal PCs and 21 multi-band PCs. Although topographic variables were not included in this subset, they ranked high on the list of variable importance; none of the univariate textural variables calculated by the two GT functions (variogram and madogram) or the three GLCM functions (contrast, entropy and second moment) were selected. Previous studies have also reported the good performance of PC for land cover classification [17,32,37]. Taken together, these results imply that PC textural measures may have broader potential for use in remote sensing-based mapping.
In the subset of the "best" variables, the variables derived from PC textural measures had the greatest influence on class-separability, followed by spectral bands. The most important textural variable was the PC calculated between the blue and green visible bands of autumn, and the most important multi-seasonal textural variable was the PC calculated between the summer and autumn near-infrared bands. The near-infrared band was also the band appearing most frequently in the subset of multi-seasonal textural variables (9 times), although the importance of the mono-seasonal near-infrared band is relatively low. Since the near-infrared band is very sensitive to the green biomass of forests [33,46], its temporal variability can produce greater inter-class discrimination among forest categories. Furthermore, to enhance the separability of LP, this study emphasized a time series data set spanning the growing cycle of LP. We found that the summer variables were the most informative, followed by autumn and winter. The textural variables derived from summer and winter were also the most frequent multi-seasonal PCs, probably because of the characteristic phenology of the forests in SFC, which mainly consist of LP (deciduous coniferous forest), MP (evergreen coniferous forest) and BF (deciduous broad-leaved forest). The variability between the peak growing season and the non-growing season can maximize class-separability for forests with markedly seasonal behavior [47,48].

The Accuracy and Uncertainty of Random Forests Classification
RF classifiers built with different parameters (mtry and ntree) showed minor differences in classification accuracy within certain value ranges (mtry greater than 5 and ntree greater than 100). RF provided highly reliable accuracies for forest type classification [43,49]. The mapping result indicates that LP and MP mainly appeared in the western and northern parts of SFC, while BF was distributed in the eastern and southern parts. This spatial pattern is related to the topography of SFC. LP and MP were probably planted in relatively flat areas, whereas natural forest (BF) is distributed over rugged terrain (Figure 8).
Although an encouraging classification result was obtained, some uncertainty in the RF classifier was observed. There is a discrepancy between the producer's and user's accuracies: for LP, the user's accuracy was lower than the producer's accuracy; in contrast, for the other forest types, the user's accuracy was greater than the producer's accuracy. This difference suggests that the map produced by the RF model tended to misclassify other forest types as LP. Similar misclassification by RF models has also been reported in previous studies, but the reasons for the misclassification differed [15,26]. In the present study, the misclassification was probably due to the unbalanced proportions of ground samples for different forest types (LP samples accounted for over half of the total ground reference dataset), although an RF classifier can handle unbalanced data well [50]. This problem is difficult to resolve because of the forest spatial patterns of SFC. We suggest that more studies in various areas should be undertaken to examine the applicability of this method.

Conclusions
This study aimed to evaluate the performance of the RF classifier for forest classification and to map LP in a typical planted forest area. Multi-seasonal spectral bands and textural variables extracted from Landsat-8 OLI were used as input to RF models. The results indicate that the RF model provided reliable classification results, with minor sensitivity to its parameters within certain value ranges. An RF-based importance assessment showed that spectral bands and bivariate variables strongly influence forest type-separability in SFC, while the importance of univariate variables is weak. On the basis of the importance assessment, an FS strategy was applied to produce a subset of the most important variables. This subset consisted of spectral bands and PC variables. The most effective variable was the PC texture calculated between the blue and green visible bands of autumn. PC texture derived from summer and winter was the most frequent multi-seasonal PC, highlighting that the variability between the peak growing season and the non-growing season can effectively enhance forest class-separability. An RF classifier created with the subset of the 47 most important variables produced 91.9% accuracy for LP and 90.7% accuracy for all forest types, with a Kappa coefficient of 0.829. This classification accuracy was approximately equal to that produced by the RF with all 675 variables. The main uncertainty of this study, however, is a tendency to misclassify other forest types as LP, caused by unbalanced proportions of samples. The study provides a reference for selecting predictive remote sensing variables and insight into approaches for mapping plantation forests with phenological behaviors at a regional scale.

Figure 2. Flowchart describing the process of calculating the input variables of the RF classifier. The total of 675 potential variables consisted of spectral bands, textural variables and topographic data. The six bands of the multi-seasonal OLI images and the topographic variables (altitude, slope and aspect) derived from the DEM were directly utilized as RF input variables. For geostatistical textures (GT), both mono-seasonal (univariate and bivariate) and multi-seasonal (bivariate) images were employed. The mono-seasonal GTs were calculated from single bands (variogram and madogram, 24 bands × 2 GTs) and from pairs of bands (pseudo-cross variogram, 15 band combinations × 4 seasons); the multi-seasonal GT was calculated from single bands for different combinations of coupled seasons (6 bands × 6 seasonal combinations). All the texture variables were generated omnidirectionally (averaged over the four main directions), with one lag distance (h = 1) and at different scales (window sizes of 5 × 5 pixels, 9 × 9 pixels and 13 × 13 pixels).
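
The variable counts given in this caption can be checked with a short tally. The sketch below is a bookkeeping aid, not part of the authors' workflow. The spectral, topographic and GT counts follow the caption directly. The GLCM count is an assumption: the caption does not state it, so it is inferred here as three GLCM measures (contrast, entropy and second moment, as named in the Figure 6 caption) per band per window size. All variable names are illustrative.

```python
from math import comb

N_BANDS = 6     # OLI bands used per season
N_SEASONS = 4   # spring, summer, autumn, winter
N_WINDOWS = 3   # 5 x 5, 9 x 9 and 13 x 13 pixels

seasonal_bands = N_BANDS * N_SEASONS   # 24 single-band images
band_pairs = comb(N_BANDS, 2)          # 15 band combinations within one season
season_pairs = comb(N_SEASONS, 2)      # 6 combinations of coupled seasons

counts = {
    # Used directly as RF inputs (no windowing).
    "spectral bands": seasonal_bands,
    "topographic (altitude, slope, aspect)": 3,
    # Univariate GTs: variogram and madogram for each band image and window.
    "variogram + madogram": seasonal_bands * 2 * N_WINDOWS,
    # Bivariate GT between band pairs within the same season.
    "mono-seasonal pseudo-cross variogram": band_pairs * N_SEASONS * N_WINDOWS,
    # Bivariate GT of one band between two coupled seasons.
    "multi-seasonal pseudo-cross variogram": N_BANDS * season_pairs * N_WINDOWS,
    # ASSUMPTION: three GLCM measures per band image per window size.
    "GLCM (assumed 3 measures)": seasonal_bands * 3 * N_WINDOWS,
}

total = sum(counts.values())
for name, value in counts.items():
    print(f"{name}: {value}")
print(f"total: {total}")
```

Under the stated GLCM assumption the tally equals the 675 variables reported in the caption, which suggests the assumption is at least consistent with the totals.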

Figure 3. Effect of the number of trees and the number of random split variables (mtry) on the overall accuracy.

Figure 4. Spider charts representing the user's accuracies of the RF classifier for (a) the separate season variables; (b) the increase variables; (c) the three separate subsets of variables. The figure includes four forest types: LP, Larix principis-rupprechtii plantations; MP, Pinus sylvestris var. mongolica plantations; BF, Betula platyphylla secondary forests; OT, the other forest types. The tree species of OT are Picea asperata, Pinus tabulaeformis and hardwood deciduous trees. Owing to their very low proportion (ca. 2%) among the forest types in SFC, they are merged into one class in these figures.

Figure 5. (a) The importance of the three types of variables determined by the RF model; (b) the importance of the ten most effective variables determined by the RF model. Bands refers to multispectral bands; GLCM and GT refer to GLCM variables and GT variables, respectively. Topographic variables (altitude, slope and aspect) are not listed because of their relatively low importance. All of the listed results were averaged over the importance of the corresponding bands. A one-way ANOVA test was applied to compare the importance among the three subsets of input variables. Importance values (bars) with different letters are significantly different (p < 0.05).

Figure 6. Variable importance of the multi-seasonal spectral bands and texture functions. (a) Variable importance of multi-seasonal spectral bands for LP and (b) for all categories; (c) variable importance of textural functions and window sizes for LP and (d) for all categories. OLI refers to multispectral bands; CON., ENT. and SM. refer to contrast, entropy and second moment, respectively (GLCM textures); VAR., MAD. and PC. refer to variogram, madogram and pseudo-cross variogram, respectively (geostatistical textures). Each value for the multi-seasonal spectral bands represents the importance of a single band (a,b). The other listed results are importance values averaged over the corresponding textural measures. A one-way ANOVA test was applied to compare the importance of the various subsets of input textural variables. Importance values (bars) with different letters are significantly different (p < 0.05).
Forest types map of SFC using a RF classifier.
LP: Larix principis-rupprechtii plantations; MP: Pinus sylvestris var. mongolica plantations; BF: Betula platyphylla secondary forests; AP: Picea asperata plantations; PP: Pinus tabulaeformis plantations; HDF: Hardwood deciduous secondary forests. The LP area is 47,176 ha, representing 62.0% of the total forest area; the BF area is 21,094 ha, representing 27.7% of the total forest area; the MP area is 7642 ha, representing 10.0% of the total forest area; the other three forest types together cover 162 ha, representing 0.2% of the total forest area.