1. Introduction
As an endangered species, giant pandas (Ailuropoda melanoleuca) are threatened by continuous habitat loss and a low birth rate. Giant pandas live in a few mountain ranges in central China, mainly in Wolong, Sichuan Province, where bamboos are the main food source for wild giant pandas. Estimating and mapping suitable habitat are critical to endangered species conservation planning and policy [1]. Knowledge of the spatial distribution of bamboos is therefore important for identifying the habitat of giant pandas. The increasing availability of remotely-sensed data has led to its widespread use in habitat mapping. The common approach to habitat mapping using remote sensing is land cover classification combined with ancillary information, such as digital elevation models (DEMs) and the water system [2]. There have been ongoing studies on mapping bamboos and other tree species using remote sensing [3,4,5,6,7,8,9]. Most of these studies applied classification over large areas using medium or low spatial resolution imagery, such as Landsat TM/ETM+ [10,11,12,13,14] and MODIS [1], or using multi-temporal data, for example Wide Field Sensor data [3] and hyperspectral data [15].
In recent decades, the rapid development of satellite techniques has enabled researchers to work on tree species mapping using very high resolution (VHR) multispectral (MS) imagery [16]. Much research has focused on extracting the desired land cover classes from VHR imagery. For example, Kamagata et al. [17] applied pixel-based and object-based classifications of IKONOS images to identify forest physiognomy. Ouma and Tateishi [18] estimated biomass by classifying QuickBird imagery. Pouteau et al. [19] also utilized QuickBird imagery to map rare plants. Sasaki et al. [20] classified tree species by integrating LiDAR and VHR imagery data. Hu et al. [21] explored the use of Google Earth imagery for land cover mapping in urban areas. Accurate bamboo mapping is becoming feasible with the increasing availability of VHR satellite imagery. For example, Araujo et al. [22] mapped bamboo-dominated gaps using QuickBird imagery, and Han et al. [23] mapped Moso bamboo forest using SPOT-5 imagery.
WorldView-2 (WV-2), a satellite-borne sensor launched by the DigitalGlobe Company in 2009, is the first high resolution commercial satellite with eight MS bands. The data provider postulates that all four new bands (coastal blue, yellow, red edge and Near Infrared 2) are strongly related to vegetation properties [24]. Recent studies have also demonstrated that WV-2 imagery has a high potential for the classification of tree species. Immitzer et al. [25] examined the suitability of eight-band WV-2 satellite data for the identification of 10 tree species in Austria. Pu and Landry [26] explored the potential of WV-2 for identifying and mapping urban tree species/groups and compared the capabilities of IKONOS and WV-2 imagery. Omer et al. [27] predicted the Leaf Area Index (LAI) of endangered tree species using WV-2 data. Karlson et al. [28] used WV-2 imagery to map tree crowns in managed woodlands. Chuang and Shiu [29] used WV-2 pan-sharpened imagery to map tea crops. WV-2 has shown advantages in classifying bamboo patches as well; for example, Ghosh and Joshi [16] compared classification algorithms for bamboo mapping using WV-2 imagery.
When processing VHR imagery, such as WV-2, advanced classification techniques, which have been studied for many years [30,31,32,33], are important. It is generally agreed that object-based image analysis (OBIA) is superior to pixel-based image analysis (PBIA) for processing VHR remotely-sensed data [34,35,36,37,38,39], because the former groups pixels into image objects (also known as segments), thus overcoming the salt-and-pepper effect that often occurs in the latter. OBIA is now widely used to classify VHR images, mainly in land cover mapping of vegetation [32,40], forest [41], urban areas [26], shaded areas [42], burned areas [43], etc. For PBIA, much research has proven that spatial information can be used to improve classification results, such as geometry, homogeneity, entropy [44], contrast, dissimilarity [45,46], the grey-level co-occurrence matrix (GLCM) and the grey-level difference vector (GLDV) [47,48]. The OBIA approach also allows these features of image objects to be incorporated into classifiers. However, spatial dependency (e.g., spatial correlation and heterogeneity) is rarely considered in OBIA. According to Tobler's first law of geography, everything is related to everything else, but near things are more related than distant things [49]. Therefore, the spatial correlation between classes can also be incorporated to increase classification accuracy.
The Wolong natural reserve of giant pandas in Sichuan Province is a mountainous region. In this area, bamboos are sparsely distributed as fragments, mixed with brush and covered by tree canopies, which makes detecting and identifying bamboos difficult. Therefore, this paper explores the possibility of accurately mapping small patches of understory bamboo using a VHR image in a complicated environment, which is critical to identifying the habitats of giant pandas and supporting the conservation of endangered animals.
2. Materials and Methods
2.1. Study Area
The Wolong natural reserve is located in the southwest of Wenchuan County, Aba Tibetan and Qiang Autonomous Prefecture, Sichuan Province. The region lies on the southeastern slope of the Qionglai Mountains, 130 km from the provincial capital, Chengdu City. In the Wolong region, the habitat of wild pandas has greatly shrunk and fragmented due to agricultural expansion, increasing demand for timber products and infrastructure construction. Since the Wenchuan Earthquake in 2008, landslides and mudslides have worsened the situation.
High variation in topography, soils and climate leads to a diverse flora and fauna in the Wolong reserve. Broadleaf forests are dominated by evergreen species below 1600 m and by a mixture of evergreen and deciduous species between 1600 m and 2000 m. Above 2000 m, a mixed coniferous and deciduous broadleaf forest gradually changes to a subalpine coniferous forest around 2600 m. The forest reaches about 3600 m, where it is replaced by alpine meadows. Under the forest canopies, evergreen bamboo species dominate the understory layer [50].
The study area is the Wuyipeng research site (Figure 1), which was once a research facility of the giant panda reserve center in Wolong, providing researchers convenient access to the habitat of giant pandas. With the establishment of other giant panda reserves, this site is no longer fully in service. A WV-2 subscene (30°58′41″–31°0′10″ N, 103°8′57″–103°10′34″ E) with a size of 1383 × 1263 pixels, acquired on 14 January 2014 over the Wuyipeng area, is used in this study. The dataset consists of eight MS bands with a spatial resolution of 2 m: coastal blue (0.400–0.450 μm), blue (0.450–0.510 μm), green (0.510–0.580 μm), yellow (0.585–0.625 μm), red (0.630–0.690 μm), red edge (0.705–0.745 μm), Near-Infrared 1 (NIR1) (0.770–0.895 μm) and Near-Infrared 2 (NIR2) (0.860–1.040 μm).
In the Wuyipeng area, an uphill route runs from the northwest to the southeast of the image, and the altitude ranges from 1900 m to 3450 m. There are mainly two bamboo species in this area, arrow bamboo (Bashania fabri) and umbrella bamboo (Fargesia robusta), both of which are favored by giant pandas. Most umbrella bamboos grow taller than arrow bamboos, and both are sparsely distributed and covered by tree canopies at altitudes above 2000 m. However, the spectra of these two bamboo species are similar in the WV-2 imagery, and it is difficult to identify small patches of the individual bamboo species using remote sensing techniques without hyperspectral information. Therefore, we did not identify individual species in this study, and the land cover types we focused on were bamboo, coniferous, broadleaved, mixed woodland, brush and barren land.
2.2. Fieldwork
Extensive fieldwork at Wuyipeng was carried out in two field visits. The first one was on 11 June 2014, with the aim of measuring feature points for image geometric correction and collecting training data for classification. The second field visit was on 11–12 September 2014, for the purpose of testing the accuracy of the classification results. In both field trips, a Trimble® GeoXH™ 6000 handheld GPS receiver (Trimble Mexico S. de R.L., Mexico City, Mexico) was used to collect sample points. An antenna was connected to the GPS with a 2-m centering rod to ensure that the GPS signals from multiple navigation satellites could be received under the canopy of large trees. The GeoXH handheld uses both EVEREST™ multipath rejection technology and H-Star™ technology to provide decimeter (10 cm) positioning accuracy. Finally, eight feature points were measured at the road junctions and the corners of houses, which were then used for geometric correction.
In the first fieldwork, about 300 points were sampled, with their locations and vegetation categories recorded. However, since the precise image locations were not known during the first fieldwork, many points fell into shadows after geometric correction, and their spectral information could not be used for training; these samples were therefore discarded. In the remaining training data, only three points were labelled as bamboo, and four for each of the remaining classes (coniferous, broadleaved, mixed woodland, brush and barren land). Ideally, the training and testing data should be sampled from different areas to make a fair comparison. However, only one route could be used to reach the top of the mountain, and there were no other routes in a different area nearby. There were still some small landslides in this area, making fieldwork there very dangerous; therefore, the same area was explored in the second fieldwork, and 432 points were recorded as testing data. It should be noted that shadow, which can be easily identified in the image, was also used as a category for classification. However, since it is not a typical land cover class, the shadow class is not listed in the classification results.
2.3. Classification Methods
Several classification methods were used in the experiment, and a brief review of the related methods is presented in this section.
The Bayesian classification is based on Bayes’ theorem. It can predict class membership probabilities and then allocate a pixel to a class based on the maximum a posteriori decision rule.
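The maximum a posteriori rule can be sketched as follows. This is a minimal Gaussian class-conditional model in pure Python for illustration only; the function names are hypothetical, and the study's actual implementation is not shown here.

```python
import math

def train_gaussian_bayes(samples):
    """Estimate per-class priors and per-feature Gaussian parameters
    from (feature_vector, label) pairs."""
    by_class = {}
    for x, y in samples:
        by_class.setdefault(y, []).append(x)
    model = {}
    n_total = len(samples)
    for y, xs in by_class.items():
        n = len(xs)
        d = len(xs[0])
        means = [sum(x[i] for x in xs) / n for i in range(d)]
        # Variance with a small floor to avoid division by zero
        variances = [max(sum((x[i] - means[i]) ** 2 for x in xs) / n, 1e-6)
                     for i in range(d)]
        model[y] = (n / n_total, means, variances)
    return model

def classify_map(model, x):
    """Allocate x to the class with the maximum a posteriori probability,
    computed in log space to avoid numerical underflow."""
    best, best_lp = None, -math.inf
    for y, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for xi, mu, var in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```

Since the class priors and likelihoods enter as a sum of logs, the allocation depends only on which class maximizes the posterior, not on the normalizing constant.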
The support vector machine (SVM) classifier is a supervised learning model with associated learning algorithms that analyze data and recognize patterns used for classification and regression analysis. Given a set of training data, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new data into one category or the other, making it a non-probabilistic binary linear classifier.
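A linear SVM of this kind can be sketched with sub-gradient descent on the regularized hinge loss. This is an illustrative toy solver under assumed hyperparameters (lam, lr, epochs), not the optimizer used in the study.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Sub-gradient descent on the regularized hinge loss; labels in {-1, +1}."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin or misclassified: hinge sub-gradient step
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct side of the margin: only the regularizer acts
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def svm_predict(w, b, x):
    """The sign of the decision function assigns the binary class."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Multi-class problems, as in this study, are handled by combining several such binary classifiers (e.g., one-against-one or one-against-all voting).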
The classification and regression tree (CART) classifier is a non-parametric method whose main idea is to recursively partition the data into smaller and smaller strata in order to improve the fit as much as possible. CART partitions the sample space into a set of rectangles and fits a simple model in each one. The sample space is initially split into two regions, with the optimal split found over all variables at all possible split points; this process is then repeated for each of the two regions created. The major components of the CART method are the selection and stopping rules. The selection rule determines which stratification to perform at every stage, and the stopping rule determines the final strata to be formed. Once the strata are created, the impurity of each stratum is measured; the heterogeneity of the outcome values within a stratum is referred to as "node impurity". In regression trees, the least squares criterion is used to measure the impurity of the nodes.
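The core of the recursive partitioning, finding the single split that most reduces the least-squares node impurity, can be sketched as follows (a minimal illustration with hypothetical names; a full CART additionally recurses on the two regions and applies a stopping rule):

```python
def sse(values):
    """Least-squares node impurity: sum of squared errors around the mean."""
    if not values:
        return 0.0
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)

def best_split(X, y):
    """Search every variable at every possible split point, returning the
    (variable, threshold) pair that most reduces node impurity."""
    best = None
    best_impurity = sse(y)
    d = len(X[0])
    for j in range(d):
        for t in sorted({x[j] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            if left and right and sse(left) + sse(right) < best_impurity:
                best, best_impurity = (j, t), sse(left) + sse(right)
    return best
```

Repeating this search inside each resulting region, until the stopping rule halts the recursion, yields the tree of strata described above.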
The k-nearest neighbor (k-NN) method classifies a pixel by majority voting among its k nearest neighbors in the feature space [51]. The geostatistically-weighted k-NN (gk-NN) classification was proposed by Atkinson and Naser [52] and was later tested for the object-based method [53]. In this method, the probability that a pixel u belongs to class m can be evaluated as follows:

$$\hat{p}(m \mid u) = \frac{1}{k} \sum_{j=1}^{k} \Big[ (1 - S_g)\, I\big(c(u_j) = m\big) + S_g\, \hat{p}\big(m \mid m'_j, h_{uj}\big) \Big]$$

where h is the separation lag, and the subscript uj of h indicates the lag between pixel u and its neighbor j. $\hat{p}(m \mid m', h)$ is the fitted model of the spatial covariance, which also refers to the class-conditional probability. m′ is a class index for m′ = 1, …, M classes, and m is the class of interest. $S_g$ is a proportional weight between 0 and 1: the larger $S_g$, the larger the weight given to the geographical component feeding into the probability. The class-conditional probability $\hat{p}(m \mid m', h)$ of a pixel u belonging to class m, given a neighbor in class m′ at a given lag h, is estimated as follows:

$$\hat{p}(m \mid m', h) = \frac{\sum_{i=1}^{N} I\big(c(u_i) = m\big)\, I\big(c(u_i + h) = m'\big)}{\sum_{i=1}^{N} I\big(c(u_i + h) = m'\big)}$$

where N is the number of training pixels in the image, and c(u_i + h) represents the class value at a lag h from pixel u_i (i.e., the class at the neighboring pixel location). I is an indicator function, which takes a value of one if the condition is satisfied and zero otherwise. The spherical, exponential and Gaussian models are usually fitted to the class-conditional probability plot. The gk-NN method can account for the spatial dependence between the unknown location and its nearest neighboring (in the feature space) sample locations. Therefore, both spectral and spatial information jointly affect the classification result.
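The blending of the spectral vote with the geographical component can be sketched as follows. This is a minimal illustration with hypothetical names; the exponential-decay class-conditional model stands in for a properly fitted spherical, exponential or Gaussian model.

```python
import math

def p_cond(m, m_prime, h, practical_range=100.0):
    """Stand-in class-conditional probability model for two classes: the
    chance that class m occurs at lag h from a pixel of class m_prime decays
    exponentially from certainty (same class at lag 0) toward the
    no-information value 0.5. A fitted model would replace this."""
    decay = math.exp(-h / practical_range)
    return 0.5 + (0.5 if m == m_prime else -0.5) * decay

def gknn_probability(m, neighbors, s_g):
    """Blend the spectral k-NN vote with the geographical class-conditional
    term, weighted by s_g. `neighbors` lists (class_label, lag_h) for the k
    nearest training samples in the feature space."""
    k = len(neighbors)
    vote = sum(1.0 for c, _ in neighbors if c == m) / k       # spectral term
    geo = sum(p_cond(m, c, h) for c, h in neighbors) / k      # geographical term
    return (1.0 - s_g) * vote + s_g * geo
```

With s_g = 0 this reduces to the ordinary k-NN vote; with s_g = 1, only the spatial dependence drives the allocation.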
3. Data Processing
The OBIA classification scheme was adopted given the high spatial resolution of the image, so that the salt-and-pepper effect could be avoided in the classification. Furthermore, OBIA facilitates the incorporation of geometry, texture and contextual information, which may be beneficial to classification. The flowchart is shown in Figure 2.
A multi-resolution image segmentation was first applied to all eight MS bands of the WV-2 image, delineating 63,356 image segments with a scale parameter of 10 and a mean size of 27.6 pixels. The segmentation result was not further edited, since it is difficult to visually identify boundaries in the image. However, the size of the training dataset (i.e., 23 samples) collected in the field is rather small relative to the total number of image objects (i.e., 63,356), which may severely suppress the classification accuracy; it is therefore necessary to expand the training data to achieve a high classification accuracy. Thus, principal component analysis (PCA) was performed to initially select important features, and a reflectance analysis was then used to devise a method for expanding the training data. After deriving the expanded training data, a feature space optimization method was applied to both sets of training data to further select the features for classification. The original and expanded training data were then both used for classification, with several methods applied to test the abilities of different classifiers given each set of training data. The most effective classification scheme was selected, and an enhanced classifier was applied to increase the accuracy. Finally, canopy densities were estimated to further explain the result.
3.1. Principal Component Analysis
The PCA was applied to all of the MS bands, along with three geometry features (the ratio of length to width, border index and shape index) and eight contextual features extracted from the grey-level co-occurrence matrix (GLCM) (mean, standard deviation, homogeneity, contrast, dissimilarity, entropy, correlation and angular second moment). The aim of the PCA was to select appropriate features both for expanding the training data and for classification. The statistics and the loadings of the resulting principal components (PCs) are shown in Table 1 and Table 2, respectively. Only the statistics of the first ten PCs are shown in the tables.
Table 1 suggests that the first four PCs are critical, achieving a cumulative proportion of 1.0. The standard deviation of the first PC (PC1) is almost four times that of the second PC (PC2), and PC1 alone accounts for 0.91 of the variance. Table 2 shows that Bands 6, 7 and 8 have the largest loadings (in absolute value) for PC1; these correspond to the red edge, NIR1 and NIR2 bands, respectively. Band 3, corresponding to the green band, has the largest loading for PC2, whereas GLCM contrast has the largest loading for the third PC (PC3).
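The mechanics behind these statistics, eigendecomposition of the feature covariance matrix, can be sketched as follows. The data below are synthetic and the function name hypothetical; the study's PCA was of course computed over the actual image feature layers.

```python
import numpy as np

def pca(X):
    """PCA via eigendecomposition of the covariance matrix of X (rows are
    observations, columns are features). Returns the proportion of variance
    explained by each PC and the loadings (columns are PCs), sorted by
    decreasing eigenvalue."""
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]       # sort PCs by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs
```

The "proportion of variance" column of a table like Table 1 is exactly the normalized eigenvalue spectrum, and the loadings of a table like Table 2 are the entries of the corresponding eigenvectors.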
3.2. Expanding the Sample Size
As mentioned before, the proportions of classes and the distributions of features may not be properly reflected by such a small sample. Therefore, the sample size needs to be expanded to reduce its effects on the classification result. Here, a reflectance analysis was performed to examine the spectral distributions of the different classes across the bands, in order to devise a method for expanding the training data.
Based on the 23 selected training samples, box-and-whisker plots of the spectral variability of the six land cover types across the eight MS bands are shown in Figure 3. The spectral reflectance is the mean value of the pixels within a segmented object. The bottom and top of each box are the first and third quartiles, respectively, and the band inside the box is the median. As shown, the red edge and the two NIR bands provide stronger spectral separability between classes than the other bands. The reflectance of bamboo is separable from all of the other classes across these three bands, although the spectral ranges of mixed woodland and brush overlap. The bamboo class also differs greatly from the other classes across the green and yellow bands. As indicated in Table 2, PC1 is exactly the combination of these five bands. Therefore, according to the reflectance analysis, it is possible to use the mean value and the standard deviation of the reflectance for each class across these five bands to expand the training data.
A parameter t is specified for the interval µ ± tσ, where µ and σ stand for the mean value and standard deviation of the reflectance, respectively. Two rules are followed: (i) the spectral range of each band is allowed little or no overlap between different classes; and (ii) an appropriate size of the expanded training data should be obtained. Here, the five MS bands contributing to PC1 were used for expanding the training data, because the cumulative proportion of PC1 is above 90%. Another reason is that if too many features were included, the parameter t would have to take a large value in order to select enough training data, thus loosening the constraints on the features of the training data. The parameters are shown in Table 3, and the spectral ranges of the expanded training data are indicated by arrows in Figure 4.
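The µ ± tσ selection rule above can be sketched as follows. The data structures and function name are hypothetical; in the study, the rule was applied to the mean reflectances of the segmented image objects across the five selected bands.

```python
def expand_training(objects, stats, t):
    """Label an unlabelled image object with class m if its mean reflectance
    falls within mu +/- t*sigma for that class in every selected band, and
    within no other class's range (rule (i): little or no overlap).
    `objects` maps object id -> per-band mean reflectance;
    `stats` maps class -> (per-band means, per-band standard deviations)."""
    expanded = []
    for obj_id, refl in objects.items():
        matches = [m for m, (mu, sigma) in stats.items()
                   if all(mu[b] - t * sigma[b] <= refl[b] <= mu[b] + t * sigma[b]
                          for b in range(len(refl)))]
        if len(matches) == 1:            # keep only unambiguous matches
            expanded.append((obj_id, matches[0]))
    return expanded
```

Increasing t grows the expanded sample (rule (ii)) at the cost of admitting objects farther from the class means, which is why only the five most discriminative bands were constrained.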
In Figure 4, the arrows show the spectral ranges of the expanded training data for the different classes across the five bands. It can be seen that the red edge, NIR1 and NIR2 bands distinguish all of the classes well, but the spectral ranges of barren land and vegetation overlap in the green and yellow bands. After expansion, as shown in Table 3, the total number of training samples is 801 (including 83 samples of the shadow class), accounting for 1.26% of the total image objects (63,356). The spatial distributions of the expanded training data and the testing data collected in the second fieldwork are shown in Figure 5. The testing points are located within different segments to avoid redundancy of information, which could otherwise affect the reported accuracy.
3.3. Feature Space Optimization
It is common to use geometry and contextual features for object-based classification. However, the PCA result shows that only GLCM contrast carries great weight for PC3; the other geometry and contextual features do not contribute to the first four PCs. This is because there are rarely large vegetation patches in such a mountainous area, so the geometry and contextual features do not show distinctive differences between small segments and thus cannot be used effectively.
Here, a feature space optimization method was used to further select appropriate features for classification. We did not merely use the PCA-selected features, because the PCA is estimated from the whole image, whereas class separability also depends on the features of the training data. Therefore, the nine features contributing to the first four PCs (whose cumulative proportion reaches 1.0), namely all eight MS bands and GLCM contrast, were used for feature space optimization based on the two sets of training data. The barren land and shadow classes are easier to identify, so these two classes were excluded to avoid a dominant influence when estimating the optimized features. The method mathematically calculates the distances between the training samples of different classes in the feature space and chooses the features that produce the largest average minimum distance as the best combination. The chart of feature dimension against separation distance is shown in Figure 6. It turns out that five features (Bands 4–8) produced the largest distance (0.16) for the original training data, whereas the same five features plus the GLCM contrast layer resulted in the largest distance (0.23) for the expanded training data.
In order to make a fair comparison using different training data, the same features should be used for classification. Therefore, referring to the optimization result, we chose the six features selected based on the expanded training data. As a result, the yellow, red, red edge, NIR1 and NIR2 bands and a GLCM contrast layer were included for classification.
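The optimization criterion, the largest average minimum distance between training samples of different classes, can be sketched as a subset search. This is a simplified illustration with hypothetical names; production tools also standardize the feature values before computing distances, which this sketch omits.

```python
from itertools import combinations
import math

def avg_min_distance(samples, feats):
    """Average, over all training samples, of the Euclidean distance to the
    nearest sample of a *different* class, using only the chosen features.
    `samples` is a list of (feature_vector, class_label) pairs."""
    def dist(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in feats))
    total = sum(min(dist(x, x2) for x2, y2 in samples if y2 != y)
                for x, y in samples)
    return total / len(samples)

def best_feature_subset(samples, n_feats):
    """Exhaustively score every non-empty feature subset and keep the one
    with the largest average minimum between-class distance."""
    best, best_score = None, -1.0
    for r in range(1, n_feats + 1):
        for feats in combinations(range(n_feats), r):
            score = avg_min_distance(samples, feats)
            if score > best_score:
                best, best_score = feats, score
    return best, best_score
```

Plotting the best score for each subset size reproduces the kind of feature-dimension-versus-separation-distance chart described above.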