Bamboo Classification Using WorldView-2 Imagery of Giant Panda Habitat in a Large Shaded Area in Wolong, Sichuan Province, China

This study explores the ability of WorldView-2 (WV-2) imagery for bamboo mapping in a mountainous region in Sichuan Province, China. A large area of this place is covered by shadows in the image, and only a few sampled points derived were useful. In order to identify bamboos based on sparse training data, the sample size was expanded according to the reflectance of multispectral bands selected using the principal component analysis (PCA). Then, class separability based on the training data was calculated using a feature space optimization method to select the features for classification. Four regular object-based classification methods were applied based on both sets of training data. The results show that the k-nearest neighbor (k-NN) method produced the greatest accuracy. A geostatistically-weighted k-NN classifier, accounting for the spatial correlation between classes, was then applied to further increase the accuracy. It achieved 82.65% and 93.10% of the producer’s and user’s accuracies respectively for the bamboo class. The canopy densities were estimated to explain the result. This study demonstrates that the WV-2 image can be used to identify small patches of understory bamboos given limited known samples, and the resulting bamboo distribution facilitates the assessments of the habitats of giant pandas.


Introduction
As an endangered species, giant pandas (Ailuropoda melanoleuca) are threatened by continuous habitat loss and a low birth rate. Giant pandas live in a few mountain ranges in central China, mainly in Wolong, Sichuan Province, where bamboos act as the main food source for wild giant pandas. Estimating and mapping suitable habitat are critical to endangered species conservation planning and policy [1]. As a result, knowledge of the spatial distribution of bamboos becomes important for identifying the habitat of giant pandas. The increasing availability of remotely-sensed data has led to widespread use in habitat mapping. The common approach employed for habitat mapping using remote sensing is land cover classification, combined with ancillary information, such as digital elevation models (DEM) and the water system [2]. There have been ongoing studies for mapping bamboos and other tree species using remote sensing [3][4][5][6][7][8][9]. Most of these studies applied classification over large areas using medium or low spatial resolution imagery, such as Landsat TM/ETM+ [10][11][12][13][14] and MODIS [1], or using multi-temporal data, for example Wide Field Sensor data [3] and hyperspectral data [15].
Sensors 2016, 16,1957 3 of 21 of wild pandas has been greatly shrunken and fragmented, due to agricultural expansions, increasing demands for timber products and infrastructure constructions. After the Wenchuan Earthquake happened in 2008, landslides and mudslides have worsened the situation.
There is a high variation in topography, soils and climate that leads to a diverse flora and fauna in Wolong reserve. Broadleaf forests are dominated by evergreen species below 1600 m and by a mixture of evergreen and deciduous species between 1600 m and 2000 m. Above 2000 m, a mixed coniferous and deciduous broadleaf forest gradually changes to a subalpine coniferous forest around 2600 m. The forest reaches about 3600 m, until it is replaced by alpine meadows. Under forest canopies, evergreen bamboo species dominate the understory layer [50].
The study area is the Wuyipeng research site (Figure 1), which was once a research facility of the giant panda reserve center in Wolong, for researchers' convenient access to the habitat of giant pandas. With the establishment of other giant panda reserves, this site is no longer fully in service. A WV-2 subscene (30 •  Sensors 2016, 16,1957 3 of 22 increasing demands for timber products and infrastructure constructions. After the Wenchuan Earthquake happened in 2008, landslides and mudslides have worsened the situation. There is a high variation in topography, soils and climate that leads to a diverse flora and fauna in Wolong reserve. Broadleaf forests are dominated by evergreen species below 1600 m and by a mixture of evergreen and deciduous species between 1600 m and 2000 m. Above 2000 m, a mixed coniferous and deciduous broadleaf forest gradually changes to a subalpine coniferous forest around 2600 m. The forest reaches about 3600 m, until it is replaced by alpine meadows. Under forest canopies, evergreen bamboo species dominate the understory layer [50]. The study area is the Wuyipeng research site (Figure 1), which was once a research facility of the giant panda reserve center in Wolong, for researchers' convenient access to the habitat of giant pandas. With the establishment of other giant panda reserves, this site is no longer fully in service.  In the Wuyipeng area, there is an uphill route existing from northwest to southeast of the image, and the altitude changes from 1900 m-3450 m. There are mainly two bamboo species in this area: arrow bamboos (Bashania fabri) and umbrella bamboos (Fargesia robusta); and both of them are favored by giant pandas. Most umbrella bamboos grow taller than arrow bamboos, and both are sparsely distributed and covered by tree canopies at an altitude above 2000 m. However, the spectra of these two bamboo species are similar in the WV-2 imagery, and it is difficult to identify small patches of bamboo species using remote sensing techniques without hyperspectral information involved. Therefore, we did not identify tree species in this study, and the land cover types we focused on were classified as bamboo, coniferous, broadleaved, mixed woodland, brush and barren land. In the Wuyipeng area, there is an uphill route existing from northwest to southeast of the image, and the altitude changes from 1900 m-3450 m. There are mainly two bamboo species in this area: arrow bamboos (Bashania fabri) and umbrella bamboos (Fargesia robusta); and both of them are favored by giant pandas. Most umbrella bamboos grow taller than arrow bamboos, and both are sparsely distributed and covered by tree canopies at an altitude above 2000 m. However, the spectra of these two bamboo species are similar in the WV-2 imagery, and it is difficult to identify small patches of bamboo species using remote sensing techniques without hyperspectral information involved. Therefore, we did not identify tree species in this study, and the land cover types we focused on were classified as bamboo, coniferous, broadleaved, mixed woodland, brush and barren land.

Fieldwork
Extensive fieldwork at Wuyipeng was carried out in two field visits. The first one was on 11 June 2014, with the aim of measuring feature points for image geometric correction and collecting training data for classification. The second field visit was on 11-12 September 2014, for the purpose of testing the accuracy of the classification results. In both field trips, a Trimble ® GeoXH™ 6000 handheld GPS receiver (Trimble Mexico S. de R.L., Mexico City, Mexico) was used to collect sample points. An antenna was connected to the GPS with a 2-m centering rod to ensure that the GPS signals from multiple navigation satellites could be received under the canopy of large trees. The GeoXH handheld uses both EVEREST™ multipath rejection technology and H-Star™ technology to provide decimeter (10 cm) positioning accuracy. Finally, eight feature points were measured at the road junctions and the corners of houses, which were then used for geometric correction.
In the first fieldwork, about 300 points were sampled with their locations and vegetation categories recorded. However, since we did not know the precise locations of the image in the first fieldwork, many points fell into shadows after geometric correction, and their spectral information could not be used for training. Thus, these samples were discarded for classification. In the remaining training data, only three points were labelled as bamboos, and four for each of the remaining classes (coniferous, broadleaved, mixed woodland, brush and barren land). Commonly, the training and testing data should be sampled from different areas to make a fair comparison. However, only one route could be accessed to get to the top of the mountain, and there were no other routes in a different area nearby. There were still some small landslides in this area, making the place very dangerous to perform the fieldwork; therefore, the same area was explored in the second fieldwork, and 432 points were recorded as testing data. It should be noted that the shadow class was also used as a category for classification, which can be easily identified from the image. However, since it is not a typical land cover class type, the shadow class was not listed in the classification results.

Classification Methods
Several classification methods were used in the experiment, and a brief review of the related methods is presented in this section.
The Bayesian classification is based on Bayes' theorem. It can predict class membership probabilities and then allocate a pixel to a class based on the maximum a posteriori decision rule.
The support vector machine (SVM) classifier is a supervised learning model with associated learning algorithms that analyze data and recognize patterns used for classification and regression analysis. Given a set of training data, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new data into one category or the other, making it a non-probabilistic binary linear classifier.
The classification and regression tree (CART) classifier is a non-parametric method, and its main idea is to recursively partition the data into smaller and smaller strata in order to improve the fit as best as possible. CART partitions the sample space into a set of rectangles and fits a regression model in each one. The sample space is originally split into two regions. The optimal split is found over all variables at all possible split points. For each of the two regions created, this process is repeated again. The major components of the CART method are the selection and stopping rules. The selection rule determines which stratification to perform at every stage, and the stopping rule determines the final strata to be formed. Once the strata are created, the impurity of each stratum is measured. The least squares model is used to measure the impurity of the nodes in regression trees. The heterogeneity of the outcome categories within a stratum is referred to as "node impurity".
The k-nearest neighbor (k-NN) method classifies a pixel by majority voting according to its k neighbors in the feature space [51]. The geostatistically-weighted k-NN (gk-NN) classification was proposed by Atkinson and Naser [52] and was tested for the object-based method later [53]. In this method, the probability that a pixel u belongs to class m can be evaluated as follows: where h is the separation lag; the subscript uk of h indicates the lag between pixel u and its neighbor k. p m,m (h uk ) is the fitted model of the spatial covariance, which also refers to the class-conditional probability. m is a class index for m = 1, . . . , M classes, and m is the class of interest. S g is a proportional weight between 0 and 1. The larger the S g , the larger the weight given to the geographical component feeding into the probability. The class-conditional probability p m,m (h uk ) of a pixel u belonging to class m, given a neighbor k in class m at a given lag h, is estimated as follows: where N is the number of training pixels in the image, and c(h) represents the class value at a lag h (i.e., the class at the neighboring pixel location k). I is an indicator function. The function I takes a value of one if the condition is satisfied, otherwise zero. The spherical, exponential and Gaussian models are usually fitted to the class-conditional probability plot. The gk-NN method can account for the spatial dependence between the unknown location and its nearest neighboring (in the feature space) sample locations. Therefore, both spectral and spatial information iteratively affect the classification result.

Data Processing
The OBIA classification scheme was adopted given such a high spatial resolution image, so the salt and pepper effect can be avoid in the classification. Furthermore, OBIA facilitates the incorporation of the geometry, texture and contextual information, which may be beneficial to classification. The flowchart is shown in Figure 2.
A multi-resolution image segmentation was first applied to the WV-2 image of all the eight MS bands. Then, 63,356 image segments were delineated given a scale parameter of 10, with a mean size of 27.6 pixels. The segmentation result was not further edited since it is difficult to visually identify boundaries in the image. However, the size of the training data (i.e., 23) sampled in the field is rather small relative to the total number of the image objects (i.e., 63,356), which may severely suppress the classification accuracy; it is therefore necessary to expand the size of training data to achieve a high classification accuracy. Thus, the principal component analysis (PCA) was performed to initially select important features, and then, a reflectance analysis was used to seek a method to expand the training data. After deriving the expanded training data, a feature space optimization method was tested on two sets of the training data to further select the features for classification. Then the original and expanded training data were both involved for classification, in which several methods were used to test the abilities of different classifiers given both sets of training data. The most effective classification scheme was selected, and an enhanced classifier was applied to increase the accuracy. Finally, the canopy densities were estimated to further explain the result.

Principal Component Analysis
The PCA was applied to all of the MS bands, along with three geometry features (the ratio of length to width, border index and shape index) and eight contextual features extracted from the grey-level co-occurrence matrix (GLCM) (mean, standard deviation, homogeneity, contrast, dissimilarity, entropy, correlation and angular second moment). The aim of PCA was to select the appropriate features for expanding training data and for classification, as well. The statistics and the loadings of the resulting principal components (PCs) are shown in Tables 1 and 2, respectively. Only the statistics of the first ten PCs are shown in the tables. Table 1 suggests that the first four PCs are critical, achieving a cumulative proportion of 1.0. The standard deviation of the first PC (PC1) is almost four-times more than that of the second PC (PC2), and the proportion of the variance of PC1 accounts for 0.91. Table 2 shows that Bands 6, 7 and 8 have the largest loadings (only depending on the absolute value) for PC1, which correspond to the red edge, and NIR1 and NIR2 bands, respectively. Band 3, corresponding to the green band, has the largest loading for PC2, whereas GLCM contrast has the largest loading for the third PC (PC3).

Principal Component Analysis
The PCA was applied to all of the MS bands, along with three geometry features (the ratio of length to width, border index and shape index) and eight contextual features extracted from the grey-level co-occurrence matrix (GLCM) (mean, standard deviation, homogeneity, contrast, dissimilarity, entropy, correlation and angular second moment). The aim of PCA was to select the appropriate features for expanding training data and for classification, as well. The statistics and the loadings of the resulting principal components (PCs) are shown in Tables 1 and 2, respectively. Only the statistics of the first ten PCs are shown in the tables. Table 1 suggests that the first four PCs are critical, achieving a cumulative proportion of 1.0. The standard deviation of the first PC (PC1) is almost four-times more than that of the second PC (PC2), and the proportion of the variance of PC1 accounts for 0.91. Table 2 shows that Bands 6, 7 and 8 have the largest loadings (only depending on the absolute value) for PC1, which correspond to the red edge, and NIR1 and NIR2 bands, respectively. Band 3, corresponding to the green band, has the largest loading for PC2, whereas GLCM contrast has the largest loading for the third PC (PC3).  Table 2. Loadings of PCs (grey-level co-occurrence matrix (GLCM) Layers 1-8 represent the mean, standard deviation, homogeneity, contrast, dissimilarity, entropy, correlation and angular second moment, respectively).

Expand Sample Size
As mentioned before, the proportion of classes and distribution of features may not be properly reflected due to a small sample size. Therefore, the sample size needs to be expanded to reduce its effects on the classification result. Here, a reflectance analysis was performed to check the spectral distribution of different classes across different bands, seeking a method to expand the training data.
According to the selected 23 training data, box-and-whisker plots of the spectral variability of the training data of six land cover types across eight MS bands are shown in Figure 3. The spectral reflectance is the mean value of the pixels within the segmented object. The bottom and top of the box are the first and third quartiles, respectively, and the band inside the box is the median value. As shown, the red edge and two NIR bands have stronger spectral separability of classes than other bands. The reflectance of bamboo is separable from all of the other classes across these three bands, but the spectral ranges of mixed woodland and brush overlap. The bamboo class also shows a great difference from other classes across the green and yellow bands. As indicated in Table 2, PC1 is exactly the combination of these five bands. Therefore, according to the reflectance analysis above, it is possible to utilize the mean value and the standard deviation of the reflectance for each class across these five bands to expand the training data.
A parameter t is given for µ ± tσ, where µ and σ stand for the mean value and standard deviation of the reflectance, respectively. Two rules are followed: (i) the spectral range of each band is allowed little or no overlap between different classes; and (ii) an appropriate size of the expanded training data should be considered. Here, five MS bands of PC1 were considered for expanding the training data because the cumulative proportion of PC1 achieves above 90%. Another reason is that if too many features are included, the parameter t would take a large value in order to select enough training data, Sensors 2016, 16, 1957 8 of 21 thus loosening the constraint of the features of training data. The parameters are shown in Table 3, and the spectral ranges of the expanded training data are indicated using arrows in Figure 4.
Sensors 2016, 16,1957 8 of 22 many features are included, the parameter t would take a large value in order to select enough training data, thus loosening the constraint of the features of training data. The parameters are shown in Table 3, and the spectral ranges of the expanded training data are indicated using arrows in Figure 4.  In Figure 4, the arrows show spectral ranges of the expanded training data for different classes across five bands. It can be seen that the red edge, NIR1 and NIR2 bands distinguish well all of the classes, but the spectral ranges of barren land and vegetation overlap in the green and yellow bands. After expanding the training data, as shown in Table 3, the total number of training data is 801 (including 83 samples of the shadow class), accounting for 1.26% of the total image objects (63,356). The spatial distributions of the expanded training data and the testing data collected in the second fieldwork are shown in Figure 5. The testing points are located within different segments to avoid redundancy of information, which may affect the reported accuracy. many features are included, the parameter t would take a large value in order to select enough training data, thus loosening the constraint of the features of training data. The parameters are shown in Table 3, and the spectral ranges of the expanded training data are indicated using arrows in Figure 4.  In Figure 4, the arrows show spectral ranges of the expanded training data for different classes across five bands. It can be seen that the red edge, NIR1 and NIR2 bands distinguish well all of the classes, but the spectral ranges of barren land and vegetation overlap in the green and yellow bands. After expanding the training data, as shown in Table 3, the total number of training data is 801 (including 83 samples of the shadow class), accounting for 1.26% of the total image objects (63,356). The spatial distributions of the expanded training data and the testing data collected in the second fieldwork are shown in Figure 5. The testing points are located within different segments to avoid redundancy of information, which may affect the reported accuracy. In Figure 4, the arrows show spectral ranges of the expanded training data for different classes across five bands. It can be seen that the red edge, NIR1 and NIR2 bands distinguish well all of the classes, but the spectral ranges of barren land and vegetation overlap in the green and yellow bands. After expanding the training data, as shown in Table 3, the total number of training data is 801 (including 83 samples of the shadow class), accounting for 1.26% of the total image objects (63,356). The spatial distributions of the expanded training data and the testing data collected in the second fieldwork are shown in Figure 5. The testing points are located within different segments to avoid redundancy of information, which may affect the reported accuracy.

Feature Space Optimization
It is common to use geometry and contextual features for object-based classification. However, the previous PCA result shows that only GLCM contrast has a great weight for PC3; the other geometry and contextual information do not contribute to the first four PCs. This is because there are rarely large vegetation patches in such a mountainous area; the geometry and contextual information do not show distinctive differences between small segments; thus this information cannot be effectively used.
Here, a feature space optimization method was used to further select the appropriate features for classification. We do not merely use the PCA selected features because PCA is estimated based on the whole image, whereas the class separability is also dependent on the features of training data. Therefore, nine features contributing to the first four PCs (the cumulative proportion achieves 1.0 using the first four PCs), including all eight MS bands and GLCM contrast, were used for feature space optimization based on two sets of training data. The barren land and shadow classes are easier to be identified, so these two classes were exclusive to avoid causing a dominated influence when estimating the optimized features. The method mathematically calculates the distances between the training samples of different classes in the feature space and chooses features that produce the largest average minimum distance as the best combination. The chart of the feature dimension against separation distance is shown in Figure 6. It turns out that five features (Bands 4-8) produced the largest distance (0.16) for the original training data, whereas the same five features and the GLCM contrast layer resulted in the largest distance (0.23) for the expanded training data.

Feature Space Optimization
It is common to use geometry and contextual features for object-based classification. However, the previous PCA result shows that only GLCM contrast has a great weight for PC3; the other geometry and contextual information do not contribute to the first four PCs. This is because there are rarely large vegetation patches in such a mountainous area; the geometry and contextual information do not show distinctive differences between small segments; thus this information cannot be effectively used.
Here, a feature space optimization method was used to further select the appropriate features for classification. We do not merely use the PCA selected features because PCA is estimated based on the whole image, whereas the class separability is also dependent on the features of training data. Therefore, nine features contributing to the first four PCs (the cumulative proportion achieves 1.0 using the first four PCs), including all eight MS bands and GLCM contrast, were used for feature space optimization based on two sets of training data. The barren land and shadow classes are easier to be identified, so these two classes were exclusive to avoid causing a dominated influence when estimating the optimized features. The method mathematically calculates the distances between the training samples of different classes in the feature space and chooses features that produce the largest average minimum distance as the best combination. The chart of the feature dimension against separation distance is shown in Figure 6. It turns out that five features (Bands 4-8) produced the largest distance (0.16) for the original training data, whereas the same five features and the GLCM contrast layer resulted in the largest distance (0.23) for the expanded training data. In order to make a fair comparison using different training data, the same features should be involved for classification. Therefore, referring to the optimization result, we chose the six features selected based on the expanded training data. As a result, the yellow, red, red edge, NIR1 and NIR2 bands and a GLCM contrast layer were included for classification.

Initial Classification Results
Four popular classification algorithms, including the CART, k-NN, Bayesian and SVM methods, were applied in this study. Both the original and expanded training data were used for classification based on the mean values of the yellow, red, red edge, NIR1 and NIR2 bands and the GLCM contrast layer.
In the CART method, the maximum depth of the decision tree was set to six; the minimum number of sample data for each node was set to five; and a six-fold cross-validation was performed. Figure 7 shows the decision rules of the regression tree. However, as can be seen, the original training data only involved three MS bands, and the brush class was missed in the decision rule, whereas the expanded training data distinguished all of the classes, but only using two MS bands. In the k-NN method, the k-value was set to five. In the SVM methods, a five-fold cross-validation was applied to select the cost of constraints violation C and the kernel parameter γ for the radial basis function (RBF) kernel. The optimal parameters C = 1 and γ = 0.2 and C = 2 and γ = 0.33 were used for the original and expanded training data, respectively. The classification results are shown in Figures 8 and 9.
Comparing Figures 8 and 9, the classification results using the Bayesian method show the most distinctive difference based on both sets of training data. The misclassification was caused by the non-normal distribution of the features and the inaccurate estimation of the covariance matrix. For the rest of the methods, the brush was not classified using the CART method based on the original training data. The results based on the expanded training data generally show more brush than those based on the original training data. The CART and SVM methods classify more areas as bamboos, and the k-NN method allocates more pixels as broadleaved based on the original training data. Comparing the four classification methods based on the expanded training data in Figure 9, brush is dominated in the Bayesian result. The SVM result has the highest number of pixels classified as bamboos, whereas the Bayesian result has the least. The broadleaved and mixed woodland classes appear more in the SVM result, and the broadleaved class also appears more in the k-NN result. In order to make a fair comparison using different training data, the same features should be involved for classification. Therefore, referring to the optimization result, we chose the six features selected based on the expanded training data. As a result, the yellow, red, red edge, NIR1 and NIR2 bands and a GLCM contrast layer were included for classification.

Initial Classification Results
Four popular classification algorithms, including the CART, k-NN, Bayesian and SVM methods, were applied in this study. Both the original and expanded training data were used for classification based on the mean values of the yellow, red, red edge, NIR1 and NIR2 bands and the GLCM contrast layer.
In the CART method, the maximum depth of the decision tree was set to six; the minimum number of sample data for each node was set to five; and a six-fold cross-validation was performed. Figure 7 shows the decision rules of the regression tree. However, as can be seen, the original training data only involved three MS bands, and the brush class was missed in the decision rule, whereas the expanded training data distinguished all of the classes, but only using two MS bands. In the k-NN method, the k-value was set to five. In the SVM methods, a five-fold cross-validation was applied to select the cost of constraints violation C and the kernel parameter γ for the radial basis function (RBF) kernel. The optimal parameters C = 1 and γ = 0.2 and C = 2 and γ = 0.33 were used for the original and expanded training data, respectively. The classification results are shown in Figures 8 and 9.
Comparing Figures 8 and 9, the classification results using the Bayesian method show the most distinctive difference based on both sets of training data. The misclassification was caused by the non-normal distribution of the features and the inaccurate estimation of the covariance matrix. For the rest of the methods, the brush was not classified using the CART method based on the original training data. The results based on the expanded training data generally show more brush than those based on the original training data. The CART and SVM methods classify more areas as bamboos, and the k-NN method allocates more pixels as broadleaved based on the original training data. Comparing the four classification methods based on the expanded training data in Figure 9, brush is dominated in the Bayesian result. The SVM result has the highest number of pixels classified as bamboos, whereas the Bayesian result has the least. The broadleaved and mixed woodland classes appear more in the SVM result, and the broadleaved class also appears more in the k-NN result.

Accuracy Assessment
The accuracies are reported as radar charts in Figure 10. As shown from Figure 10a, the overall accuracies are greater based on the expanded training data than for the original training data using all four classification methods. The SVM method has the greatest overall accuracy based on the original training data, but it only has a 4.4% increase in the overall accuracy, which indicates that the SVM is the most stable classifier; thus, sample size has little effect on it. The overall accuracies using the Bayesian method are the lowest based on both training data, but the increase in accuracy is great, which achieves 12.77%. The CART method increased 9.4% in accuracy based on the expanded training data, and its overall accuracy is greater than for the SVM method. The k-NN has the greatest overall accuracy based on the expanded training data, where the increase in accuracy is 34.98%.
Comparing the producer's accuracies based on different training data, the SVM method shows a sharp decrease for the broadleaved class and an increase for the mixed woodland class based on the expanded training data. It reflects in Figures 8b and 9b, where the dominant vegetation turned from broadleaved to mixed woodland. A similar situation also occurred for the CART method, where the dominant broadleaved class turned into brush, causing an increase in producer's accuracy for the latter class. The classes of mixed woodland, brush and barren land suffer a severe misclassification

Accuracy Assessment
The accuracies are reported as radar charts in Figure 10. As shown from Figure 10a, the overall accuracies are greater based on the expanded training data than for the original training data using all four classification methods. The SVM method has the greatest overall accuracy based on the original training data, but it only has a 4.4% increase in the overall accuracy, which indicates that the SVM is the most stable classifier; thus, sample size has little effect on it. The overall accuracies using the Bayesian method are the lowest based on both training data, but the increase in accuracy is great, which achieves 12.77%. The CART method increased 9.4% in accuracy based on the expanded training data, and its overall accuracy is greater than for the SVM method. The k-NN has the greatest overall accuracy based on the expanded training data, where the increase in accuracy is 34.98%.
Comparing the producer's accuracies based on different training data, the SVM method shows a sharp decrease for the broadleaved class and an increase for the mixed woodland class based on the expanded training data. It reflects in Figures 8b and 9b, where the dominant vegetation turned from broadleaved to mixed woodland. A similar situation also occurred for the CART method, where the dominant broadleaved class turned into brush, causing an increase in producer's accuracy for the latter class. The classes of mixed woodland, brush and barren land suffer a severe misclassification based on the original training data using the Bayesian method; the brush and barren land classes were corrected based on the expanded training data, but the accuracy reduced for the bamboo class. The producer's accuracies of all of the classes increased using the k-NN method, except for broadleaved forest, since it was classified more based on the original training data, as mentioned before. The user's accuracies of all of the classes are generally greater based on the expanded training data than for the original training data, leading to an increase in the overall accuracy. Therefore, the expanded training data successfully increased the accuracies for all of the classification methods.
Sensors 2016, 16,1957 13 of 22 based on the original training data using the Bayesian method; the brush and barren land classes were corrected based on the expanded training data, but the accuracy reduced for the bamboo class. The producer's accuracies of all of the classes increased using the k-NN method, except for broadleaved forest, since it was classified more based on the original training data, as mentioned before. The user's accuracies of all of the classes are generally greater based on the expanded training data than for the original training data, leading to an increase in the overall accuracy. Therefore, the expanded training data successfully increased the accuracies for all of the classification methods.

Improving the Classifier
Here, the k-NN method based on the expanded training data produced the most accurate result. In order to further increase the classification accuracy, a spatial weighting scheme was performed to improve the k-NN classifier and derive a contextual classification result. As described in Section 2.3, a geostatistical modelling was first performed to estimate class-conditional probability for each class in the gk-NN classification method and provided spatial information for the k-NN classifier. In the geostatistical model, the conditional probability is a function of distance. The Euclidean distance between segments was calculated to build the geostatistical model. In order to do so, the point located at the center of each segment was extracted using the FeatureToPoint tool (inside option) in ArcGIS. The central locations were recorded, and so, the distance between any two segments could be inferred.
Class-conditional probability plots were estimated from 801 sample points and then fitted with the covariance models without considering anisotropy. The class-conditional probability plots for the seven classes and the fitted models are shown in Figure 11. Then, the gk-NN classification method using Equation (1) was performed, based on the same six features of the expanded training data as in the previous classification methods.

Improving the Classifier
Here, the k-NN method based on the expanded training data produced the most accurate result. In order to further increase the classification accuracy, a spatial weighting scheme was performed to improve the k-NN classifier and derive a contextual classification result. As described in Section 2.3, a geostatistical modelling was first performed to estimate class-conditional probability for each class in the gk-NN classification method and provided spatial information for the k-NN classifier. In the geostatistical model, the conditional probability is a function of distance. The Euclidean distance between segments was calculated to build the geostatistical model. In order to do so, the point located at the center of each segment was extracted using the FeatureToPoint tool (inside option) in ArcGIS. The central locations were recorded, and so, the distance between any two segments could be inferred.
Class-conditional probability plots were estimated from 801 sample points and then fitted with the covariance models without considering anisotropy. The class-conditional probability plots for the seven classes and the fitted models are shown in Figure 11. Then, the gk-NN classification method using Equation (1) was performed, based on the same six features of the expanded training data as in the previous classification methods.  The classified map generated using the gk-NN method is shown in Figure 12. As indicated, Figures 9b and 12 are similar in class distribution. The bamboo class in red color appears slightly more in the k-NN result than in the gk-NN result, whereas the mixed woodland class appears more in the gk-NN result. Tables 4 and 5 show the complete confusion matrices and Kappa coefficients of the k-NN and gk-NN classification results. As can be seen, the overall accuracy and Kappa coefficient of the gk-NN method is 81.16% and 0.768, respectively, which are greater than for the k-NN method (76.28% and 0.706). The producer's accuracies of the bamboo class are the same using both methods, and the user's accuracy is slightly greater using the gk-NN method, indicating that the introduced spatial weighting has little effect on the bamboo class. The class that benefits the most from the spatial weighting is broadleaved forest, whose producer's and user's accuracies achieved the increases of 23.29% and 12.77%, respectively. The user's accuracy of the coniferous class, the producer's accuracy of the brush class and both the producer's and user's accuracies of the mixed woodland class increased. It can be seen in Figure 11 that as these four vegetation classes have shorter distance ranges than the other classes, the classification result was improved by spatial context within a short distance range. In conclusion, the spatial weighting used in the gk-NN classifier increased the accuracy of the k-NN method.
The classified map generated using the gk-NN method is shown in Figure 12. As indicated, Figures 9b and 12 are similar in class distribution. The bamboo class in red color appears slightly more in the k-NN result than in the gk-NN result, whereas the mixed woodland class appears more in the gk-NN result. Tables 4 and 5 show the complete confusion matrices and Kappa coefficients of the k-NN and gk-NN classification results. As can be seen, the overall accuracy and Kappa coefficient of the gk-NN method is 81.16% and 0.768, respectively, which are greater than for the k-NN method (76.28% and 0.706). The producer's accuracies of the bamboo class are the same using both methods, and the user's accuracy is slightly greater using the gk-NN method, indicating that the introduced spatial weighting has little effect on the bamboo class. The class that benefits the most from the spatial weighting is broadleaved forest, whose producer's and user's accuracies achieved the increases of 23.29% and 12.77%, respectively. The user's accuracy of the coniferous class, the producer's accuracy of the brush class and both the producer's and user's accuracies of the mixed woodland class increased. It can be seen in Figure 11 that as these four vegetation classes have shorter distance ranges than the other classes, the classification result was improved by spatial context within a short distance range. In conclusion, the spatial weighting used in the gk-NN classifier increased the accuracy of the k-NN method.

Canopy Density Estimation
In this study area, bamboos are covered by tree crowns at many locations, and their spectra do not show distinctive differences from surrounding vegetation and above tree crowns. Therefore, it is worth exploring what degree of canopy density (i.e., the percentage of vegetation to ground) can allow understory bamboos to be identified from the WV-2 imagery. Figure 13 shows two photos that were vertically taken using a fisheye camera at two testing points. Figure 13a was marked as the bamboo class surrounded by brush and was correctly classified, whereas Figure 13b was recorded as the bamboo class covered by mixed woodland, but was misclassified as brush using the gk-NN method.

Canopy Density Estimation
In this study area, bamboos are covered by tree crowns at many locations, and their spectra do not show distinctive differences from surrounding vegetation and above tree crowns. Therefore, it is worth exploring what degree of canopy density (i.e., the percentage of vegetation to ground) can allow understory bamboos to be identified from the WV-2 imagery. Figure 13 shows two photos that were vertically taken using a fisheye camera at two testing points. Figure 13a was marked as the bamboo class surrounded by brush and was correctly classified, whereas Figure 13b was recorded as the bamboo class covered by mixed woodland, but was misclassified as brush using the gk-NN method.  It is obvious that the bamboos in Figure 13a are less covered by canopies than those in Figure 13b. The canopy densities were estimated for these two photos. Thus, the tree canopies were extracted and are shown in binary maps in Figure 14. The background pixels outside the fisheye camera were excluded by masking when estimating canopy densities. The canopy densities are 0.82 It is obvious that the bamboos in Figure 13a are less covered by canopies than those in Figure 13b. The canopy densities were estimated for these two photos. Thus, the tree canopies were extracted and are shown in binary maps in Figure 14. The background pixels outside the fisheye camera were excluded by masking when estimating canopy densities. The canopy densities are 0.82 and 0.67 for Figure 14a,b, respectively. Therefore, in a WV-2 image, it is possible to extract bamboos in the areas with a median canopy density (from 0.2-0.7) or in a sparse forest (the canopy density is less than 0.2), but it is difficult to identify bamboos in the areas with a high canopy density (over 0.70).

Canopy Density Estimation
In this study area, bamboos are covered by tree crowns at many locations, and their spectra do not show distinctive differences from surrounding vegetation and above tree crowns. Therefore, it is worth exploring what degree of canopy density (i.e., the percentage of vegetation to ground) can allow understory bamboos to be identified from the WV-2 imagery. Figure 13 shows two photos that were vertically taken using a fisheye camera at two testing points. Figure 13a was marked as the bamboo class surrounded by brush and was correctly classified, whereas Figure 13b was recorded as the bamboo class covered by mixed woodland, but was misclassified as brush using the gk-NN method.  It is obvious that the bamboos in Figure 13a are less covered by canopies than those in Figure 13b. The canopy densities were estimated for these two photos. Thus, the tree canopies were extracted and are shown in binary maps in Figure 14. The background pixels outside the fisheye camera were excluded by masking when estimating canopy densities. The canopy densities are 0.82 Figure 14. The canopies shown in binary maps of the photos shown in Figure 13. (a) The bamboo class surrounded by brush and was correctly classified; (b) the bamboo class covered by mixed woodland and was misclassified as brush.

Classification Accuracy Comparison
For the classification methods used in the experiment, the overall accuracies are all below 55% based on the original training data. This is because the classification model can be very vulnerable given such a small sample size, causing some class types to be severely misclassified, such as coniferous trees in the k-NN and brush in the CART methods. The expanded training data provided a proper proportion of classes as the prior knowledge and, therefore, increased the reliability of the classification model. The Bayesian classifier is sensitive to the distribution of features, whereas the SVM method shows its robustness in classification. The feature space optimization method is estimated based on the class separation distance, which is more suited to the k-NN method. This may be the reason that k-NN can achieve the greatest accuracy. However, even based on the expanded training data, the overall accuracies are not very high. There are three factors that may affect the accuracies: the understory bamboos, the diverse vegetation and some errors that have inevitably been introduced into the expanded training data, thus causing difficulties in classification. In order to compare the derived accuracy with other studies, Table 6 presents an overview of studies on bamboo classification carried out within the last 10 years using different remotely-sensed images.
The overall accuracies, reported in Table 6, range between 48% and 93%. However, the great accuracies are generally achieved in an area where bamboos are dominated and not covered by trees. For the classification of understory bamboos, the overall accuracies range between 74% and 88% and usually by using assistant data (e.g., elevation). The producer's and user's accuracies of bamboo obtained in this study (82.65% and 93.10%) are greater than most of the accuracies reported in comparable studies. Most of the methods for bamboo classification are widely used, such as the maximum likelihood classifier (MLC), SVM and some machine-learning methods, for example, neural networks and maximum entropy (MaxEnt). We also reviewed some advanced methods for classifying diverse vegetation (without bamboo class), including random forest [1], linear discriminant analysis [54,55], logistic regression [56], etc. However, it is more common to use hyperspectral images for forest classification [15,[57][58][59][60][61]. The research using VHR images for bamboo classification has appeared very recently [16,23], and there is no related research using VHR images to extract understory bamboos as performed in this study. Elevation data and the derivatives can be combined to improve the classification results, which can be explored in the future. It should be noted that the objective of the paper was to check the possibility to extract understory bamboos using a WV-2 image in Wolong reserve rather than the validation of methods. Therefore, the methods in this paper including feature selection and classification are not universal; it may lead to different results when testing on another area. Table 6. Results from bamboo classification using different remotely-sensed images and classification methods from the last 10 years (in chronological order).

Performance of the Gk-NN Method
It is worth exploring how the gk-NN method utilizes spatial information. Bamboos in the study area normally grow as small patches, and thus, unlike in previous research [16], texture information does not facilitate mapping bamboo patches. Instead, the spatial correlation was established between sparsely-distributed vegetated areas. The k-NN classifier was chosen to incorporate the spatial information because the location information of training data can be retained and a geographical weighting can be easily integrated into the classifier. It has been proven that both the distance weighted scheme and the geostatistical scheme can lead to sound classification results [52]. The spatial information lies in the conditional probability function, which derives from each pair of an unknown location and its nearest neighbors. For instance, the distance range of the covariance model for the bamboo class is 70 (Figure 11), which means that the spatial correlation between bamboos decreases within a distance range from 0-70 pixels (0-140 m), and the correlation will reduce to zero outside this range.
The last issue is the applicability of the gk-NN method in an object-based level. Although the classification applied in the case study is object based, the sample points for geostatistical modelling were obtained from the center of the training segments, and so, the covariance model describes the spatial correlation between different segments. The probability plots in Figure 11 show both the correlation within each segment (a peak) and the correlation between segments (a trend). The method characterizes the correlation between segments. Therefore, the peaks were ignored when fitting the geostatistical models. In this case study, as the number of segments is large and the average size of segments is small, the distances between centers of segments are similar to pixel-based measurement, and thus, the gk-NN method is applicable for the object-based classification.

Conclusions
This study explored the potential of VHR WV-2 imagery for extracting small patches of understory bamboos in a mountainous region in Wolong, Sichuan Province, China. A simple, but effective method was used to expand the training data to an appropriate sample size based on the PCA and reflectance analysis. The features of training data were then optimized for classification. Four regular object-based classification methods were applied based on both the original and expanded training data. The results were analyzed and compared through the field validation. The expanded training data successfully increased the accuracies for all of the classification methods, in which the k-NN method achieved the greatest accuracy. Then, an enhanced k-NN classifier weighted with a geostatistical scheme was applied to produce a final land cover map. This method produced the overall accuracy of 81.16%; the producer's and user's accuracies of the bamboo class are 82.65% and 93.10%, respectively. The canopy densities were estimated to check the possibility of extracting bamboos under tree crowns. This study demonstrates that the WV-2 image can be used to identify small patches of understory bamboos in a forest-covered mountainous area, given limited known sample points. The result is critical to identifying habitats of giant pandas and supporting the conservation of the endangered animals.