Integration of Satellite Imagery , Topography and Human Disturbance Factors Based on Canonical Correspondence Analysis Ordination for Mountain Vegetation Mapping : A Case Study in Yunnan , China

The integration between vegetation data, human disturbance factors, and geo-spatial data (Digital Elevation Model (DEM) and image data) is a particular challenge for vegetation mapping in mountainous areas. The present study aimed to incorporate the relationships between species distribution (or vegetation spatial distribution pattern) and topography and human disturbance factors with remote sensing data, to improve the accuracy of mountain vegetation maps. Two different mountainous areas located in Lancang (Mekong) watershed served as study sites. An Artificial Neural Network (ANN) architecture classification was used as image classification protocol. In addition, canonical correspondence analysis (CCA) ordination was applied to address the relationships between topography and human disturbance factors with the spatial distribution of vegetation patterns. We used ordinary kriging at unobserved locations to predict the CCA scores. The CCA ordination results showed that the vegetation spatial distribution patterns are strongly affected by topography and human disturbance factors. The overall accuracy of vegetation classification was significantly improved by incorporating DEM or four CCA axes as additional channels in both the northern and southern study areas. However, there was no significant difference between using DEM or four CCA axes as extra channels in OPEN ACCESS Remote Sens. 2014, 6 1027 the northern steep mountainous areas because of a strong redundancy between CCA axes and DEM data. In the southern lower mountainous areas, the accuracy was significantly higher using four CCA axes as extra bands, compared to using DEM as an extra band. In the southern study area, the variance of vegetation data explained by human disturbance factors was larger than the variance explained by topographic attributes.


Introduction
Remote sensing of vegetation in rugged and mountainous areas is severely affected by topographic effects [1,2].Environmental factors such as topography, soil, water, and climate influence vegetation distribution in mountainous area [3][4][5].Numerous authors have shown the relationships between ecosystem structure and composition and topographic features, such as elevation, slope angle, aspect and indices of relative moisture based on potential solar radiation and topographic redistribution of precipitation [6][7][8][9].Topographic effects cause a difference in radiance values between inclined and horizontal surfaces.This causes the same land cover type on opposite facing slopes to have different spectral values and different land cover types to have similar spectral values [10].
The spatial distribution pattern of mountainous vegetation is the result of complex interactions between natural processes and human influences [4,35].Besides the topography as such human disturbance factors are also a key factor influencing distribution of vegetation types [36][37][38][39].Identifying the factors controlling the distribution, abundance and diversity of species at local scale is one of the central objectives in community ecology [40,41].Ordination and classification are the two main techniques for analyzing effects of environmental factors on plant community data [42].Both techniques organize community data on species abundances independent of the habitat template [43].These studies, however, merely examined the relationships between plant community or vegetation data and the environmental or topographic attributes.Interactions between the spatial components of the vegetation, such as spatial distribution pattern and the geographical distribution of the ecological and human disturbance factors have hardly been addressed [35].Bio [44] recommended that the spatial components of plant species response should be an integral part of the exploratory data analysis and mapping.
Remote sensing methods mainly focus on developing linkages between spectral response and surface features (such as vegetation types or land-use/land-cover types) at a broad range of spatial scales.Mapping plant communities by image classification however, is often limited by spectral similarities among essentially different communities [9,45,46].To overcome this issue, environmental response models have been used to predict or produce potential vegetation maps in high mountainous areas [3][4][5]35,37].However, environmental response models are insufficient to map actual vegetation patterns due to their limited capability to handle vegetation distribution prone to disturbance and perturbation [45,47,48].In mountainous areas the integration between plant community data, human disturbance factors and geo-spatial data (DEM and image data), all featuring different spatial scales, is a particular challenge for vegetation mapping [35,43,45].Hence there is an obvious need to establish the link between theories and practices of ecological research strategies on one side and remote sensing image analysis on the other side [43,49].
Zhang et al. [34] reported that incorporating the relationships between vegetation and environmental factors into image classification requires further investigation.This study aims to add relationships between vegetation spatial distribution pattern (SDP), and environmental/human disturbance factors to the remote sensing image classification process.Two mountainous study sites were selected that differ in terms of climate, ecology, geography, and population density.Selection of two study sites rather than one enables to establish whether the specific characteristics of these mountain zones are a determining factor in the success of the experimental approach.
The objectives of this study are  To quantify the relationships between vegetation SDP and topography and human disturbance attributes by using ordination analysis;  To map mountain vegetation by integrating ordination models into remote sensing (Landsat Thematic Mapper) image analysis;  To test the effectiveness of this mapping approach by evaluating its accuracy against two alternative classification strategies: (1) ordinary image classification without ancillary information (null model); and (2) classification with a DEM as extra input channel; and finally;  To determine whether the significant differences of the two mountain zones are determinant in the outcomes of the experimental procedures.

Study Area
The two research sites are situated in the Lancang (Mekong) watershed, Yunnan province, China (Figure 1).They are located in the northern and southern part of this watershed and feature large differences.The northern part is a very steep mountainous area marked by the presence of four of Southeast Asia's largest rivers: Yangtse, Mekong, Irrawaddy and Salween.The southern part of the watershed is a tropical region containing Xishuangbanna Dai Autonomous Prefecture and is one of the most important areas in China in terms of biodiversity [50].The northern study area is located in the Weixi and Lanping counties and ranges from latitude: 26°7'12"N-27°53'24"N; longitude: 98°58'12"E-99°37'48"E (Figure 1).This study area covers the northern part of the Hengduan Mountain range of the eastern Himalayas and belongs to the World Heritage site of the Three Parallel Rivers of Yunnan Protected Area.The Hengduan Mountain region is one of the most biologically diverse temperate ecosystems on Earth [51][52][53].It is important not only for biodiversity but also for its water resources.This area is a typical steep mountainous area, with altitudes ranging between 1,500 and 4,500 m.The yearly average temperature 11.3 °C, and average annual rainfall is 954 mm.Because of the altitude differences, the climate varies significantly with elevation.The climate can be divided into four zones from the bottom to the top of the mountain: warm temperate, temperate, cold temperate and sub-frigid [54].This vertical climate zonation highly influences the vegetation spatial distribution pattern.In addition, many minority ethnic groups, such as Tibetans, Naxi, Bai, Lisu, are living in the northern study area [51].
The southern study area in the Xishuangbanna prefecture of China's Yunnan Province is located between 22°00'N-23°50'00"N and 100°00'12"E-102°00'E and covers about 50,000 ha (Figure 1).It belongs to the lower catchment of the Lancang River.Xishuangbanna is home to the richest biological and ethnic diversity (i.e., Thai, Lahu, Jinuo, and Jingpo people) in China.The maximum altitude is about 2,000 m.The climate of the region is strongly seasonal.The monthly average temperature ranges between 16.4°C and 22.0 °C.Between May and October the South-West Monsoon air masses brings about 80% of annual rainfall, whereas the dry and cold air of the Southern edges of the jet stream dominates the weather pattern between November and April.Annual rainfall varies between 1,200 mm in the Lancang valley and 1,900 mm at altitudes above 1,500 m.

Software
Canoco 4.5 was used to perform ordination.ArcView 3.2 and ArcGIS 9.1 were used to calculate topographic attributes and to perform ordinary kriging.Artificial Neural Networks (ANN) classification was conducted with LNNS, a neural network simulation program developed at the Laboratory of Forest Management and Spatial Information Techniques (FORSIT), Ghent University, Belgium.LNNS is freely available at http://dfwm.ugent.be/forsit[55].Idrisi Andes was used to assess accuracy comparison between different classification strategies.

Image Data
Each study area is covered by a 30 m resolution Landsat Thematic Mapper 5 (TM) image (Table 1).The available Landsat images are third level products of the China Remote Sensing Satellite Ground Station (http://www.rsgs.ac.cn/English/Standard%20Product.htm).Basic radiometric and geometric preprocessing was done by the image provider.Gitas and Devereux (2006) [56] mentioned that topographic correction should be regarded as an essential element of any classification methodology.We compared six different topographic normalization methods (Cosine correction, Minnaert correction, Ccorrection, SCS correction, two-stage topographic normalization, and slope matching technique) for their effectiveness in enhancing vegetation classification in mountainous environments.However, none of these topographic correction methods could significantly improve overall classification accuracy (Zhang et al., 2011) [57].Therefore, in this study, none of the topographic correction techniques was applied.  2 and 3).In view of this study, their selection was based on a twofold rationale.The first relates to the major vegetation communities in both study areas, while the second is driven by the vegetation classes that can be discerned from Landsat TM imagery.

DEM, Topographic Maps and Derived Attributes
The 25 m resolution DEM data produced by the China State Bureau of Surveying and Mapping was made available for this study (http://www.sbsm.gov.cn/).The DEM was re-sampled to 30 m and projected into the Universal Transverse Mercator (UTM) projection system, zone 47, WGS84 (World Geodetic System 1984).In this study, topographic maps produced in the 1970s by Chinese military at a scale of 1:100,000 were used providing useful information with respect to villages, towns, rivers, roads and land cover.The topographic maps can provide land cover information because the plant community's types did not change notably, especially in the remote mountainous areas.Moreover, human disturbance factors, such as road (small roads and trails), villages, rivers, and towns can easily be derived from topographic maps by digitization.
To analyze the relationship between topographic attributes and vegetation spatial distribution pattern, eight topographic attributes were derived from the DEM data.The applied topographic attributes can be divided into primary and secondary attributes.Primary attributes include elevation, slope, aspect, plan and profile curvature, and topography position index, and are calculated directly from the derivatives of a topographic surface [58].The secondary attributes, computed from two or more primary attributes, include a solar and a wetness index.No erosion index could be computed as the rainfall data was not available.Details of the computed attributes are listed in Table 4 [59][60][61][62][63][64][65].Table 3 lists the dominant vegetation classes and dominant species in the southern study area.Table 4. Short description of the selected topographic factors.

ELEV Elevation
Elevation is one of most important topographic factors in regulating mountain vegetation patterns [39,59,60].

SLO Slope
Slope also is one of important topographic factors for mountain vegetation patterns because it will influence features such as soil moisture, wind, and solar radiation [37,[59][60][61].

PRF Slope profile curvature
This index measures the rate of change of potential gradient and hence is important for characterizing changes in flow velocity and sediment transport processes [58].It also potentially indicates soil moisture [37].

PLF Planiform curvature
This index is related to converging/diverging flow and soil water content [62].

TPI Topographic position index
Topographic position index is the basis of the topography classification system and is simply the difference between a cell elevation value and the average elevation of the neighborhood around that cell [63].This index can affect the vegetation patterns in mountainous areas [4,59,64].

CTI
Compound topographic index CTI is a steady state wetness index [62].Wetness index has been shown to affect vegetation spatial patterns [3,4].

PADIR Potential annual direct incident radiation
PADIR is a solar index, and was developed by McCune and Keon [65].
Solar radiation is the primary atmospheric control over soil moisture status between precipitation events in vegetation not receiving melt water and appears to influence the local adaptation of vegetation [3].

Human Disturbance Factors
Forestry, agriculture, economic development, transport infrastructure are some of the disturbance drivers that influence vegetation spatial distribution pattern and landscape pattern at large regional scales [38].Numerous studies have shown that population density and proximate factors are important driving forces for both land cover and vegetation pattern change as well as for deforestation [66][67][68][69].As it pertains to the distribution of the population over a region, the surface density of villages was computed through GIS analysis [70].Secondly, proximate factors such as distance to settlements and markets (villages and towns) were considered [66,68].In rural areas in China, farmland and rural settlements are the most important land use classes [71].Most of the farming activities tend to cluster into the villages [72].In addition, mining, road and dam construction are important human interventions in the landscapes of both study areas.These activities are directly related to the occurrence of roads and rivers.Distance to roads and rivers reflect the landscape accessibility.Serneels and Lambin [66], and Overmars and Verburg [68], demonstrated that land use and land cover change was strongly related to distance to roads and rivers, while Nagendra et al. [73] claimed that accessible areas featured a higher degree of deforestation and fragmentation.Therefore and thirdly, distance to roads and rivers were included as human disturbance factors in the current analysis (Table 4).Most of the digital line (rivers and roads) and point (villages and towns) data were derived from the topographic maps.Because the topographic maps date back to the 1970s, an update, especially for the road database, was required.This was accomplished using the traffic map of Yunnan published in 2007 (http://www.ynbsm.gov.cn/emapshow.asp?col=ynly).The roads were divided into two categories: wide (width ≥ 3 m) and narrow (width < 3 m) roads.The rivers were categorized into three levels based on the standards of China's river system (http://nfgis.nsdi.gov.cn/nfgis/english/default.htm) (Table 5).Distance is a basic concept inherent to any geographical space.Euclidian distance, the simplest distance measurement, is widely used to calculate the distance of a location to the nearest destination of interest [66,68,74,75].However, Euclidean distance is computed by using the Pythagorean Theorem, which is not very realistic when representing effective spatial accessibility [76].Typically in a mountainous environment, the distance between two points over a sloping surface is significantly larger than over a flat path.Another critical issue is that of friction which impedes movement [77].Several researchers have shown the utility of modeling cost distance that takes into account the spatial heterogeneity [76,78,79].Hence, we measured the cost distance of a cell to the destinations of interest taking into account the existing friction based on slope angles (Table 5).There are a number of ways to assign friction coefficients to slopes.Here, an empirically binomial equation (Equation (1)), which was developed from sample data of walking times in mountains in the Middle Himalaya of Nepal, was used to model friction magnitudes for slope [77]: where Y is friction and X is the slope derived from DEM. Afterward, the cost distance is measured as the least cost (in terms of effort, expense, etc.) when moving over a friction surface [80].

Training and Validation Data
Three field surveys (September to October 2004 in the northern study area; March 2004 and September to October 2006 in the southern study area) were carried out.During field surveys, at several spots that were thought to be of a relevant size with regard to the spatial resolution of the available images, the present vegetation type, as well as a GPS measurement of the co-ordinates and elevation, were recorded.For each point, we also recorded the dominant vegetation type (Tables 2 and 3), the cover percentage and estimated height of vegetation types.In addition, human impact, such as harvest residue, fire, logging or grazing was recorded.Other features that were recorded, include vegetation groups (e.g., coniferous or broad-leaved, evergreen or deciduous, etc.), vertical structure of plant communities (e.g., subcanopy forest regeneration), and succession stages (such as weeds, pioneers, or climax vegetation types).They were used to establish the main characteristics and variability of the vegetation cover, and to acquire reliable field data for both training the classifiers and evaluating the accuracy of the classification results.Furthermore, at each point several photographs were taken in different directions to ensure the possibility of future comparison after the field work was completed.The field data were supplemented with manually digitized reference pixels.As a specific vegetation class might consist of several spectral subclasses, training data for each different spectral subclass was assembled in a ground truth digitizing stage [81].There exists, for example, -pine forest on shaded slopes‖, -pine forest on sunny slopes‖, -old rubber trees‖ and -young rubber trees‖.The number of spectral subclasses of the two images is listed in Table 1.The ground truth polygons were converted into a raster map with a cell size of 30 m × 30 m.Then, this map was randomly divided into training set, test set, and validation set be used in an artificial neural networks classification protocol.The test set was used to evaluate the algorithm's convergence while the validation data served for independent accuracy assessment.The number of pixels of the training sets and validation sets for each class is listed in Table 6.Moreover, in order to measure the topographic effects on spectral patterns of vegetation classes, histograms of digital number (DN) values of vegetation classes were examined (Figure 2).

Classification Algorithm Selection
Since the beginning of the 1990s, artificial neural networks (ANN) have been applied in the analysis of remote sensing images [82].Many authors have reported considerable advantages of ANN over conventional methods (e.g., Verbeke et al. [55], Mas and Flores [83]).One of the main advantages of ANN classifiers is that they are independent of the distribution of the class-specific data in feature space [84].Figure 2 clearly shows that the DN distributions of the ground truth data of some classes (i.e., fir and spruce forest, pine forest, and mixed forest) deviate from a Gaussian model.Considering the non-normality and heteroscedasticity of the ground truth data ANNs were used to perform image classification.After classification, the subclasses were merged into 10 broad land cover classes based upon ancillary information (GPS points, topographic maps, DEM data, and plant community data) and field ecological expertise (Tables 2 and 3).In addition, in order to build a CCA ordination model, a spatial database including vegetation data as well as topographic and human disturbance attributes was built.

Outline
The mountain vegetation mapping approach of this study contains three main stages: CCA ordination analysis (Stage 1), interpolation of CCA axes (Stage 2), and image classification incorporating ordination results or DEM data (Stage 3).In Stage 1, the relationship between topography and human disturbance factors and vegetation data was identified.In Stage 2, the CCA scores for unobserved locations were predicted based on the CCA scores at the sampled locations Ordination methods can be divided in indirect ordination or unconstrained ordination (i.e., detrended correspondence analysis, DCA and principal component analysis, PCA) and direct ordination or constrained ordination (i.e., canonical correspondence analysis, CCA and redundancy analysis, RDA) [85].The direct ordination methods are particularly suited to address relationships between species composition and environmental variables.Direct ordination methods have also been applied to study the relationships between vegetation SDP and environmental factors, especially topographic factors [61,64,[86][87][88].In this study, the spatial distribution of the known pixels was derived from our ground truth data which are homogeneous.In order to build CCA ordination model, a spatial database including vegetation data and topographic and human disturbance attributes was built.This spatial database included two matrices: vegetation type matrix (to each ground truth polygon a vegetation type was assigned) and topography and human disturbance attributes matrix (to each ground truth polygon, the average value of topography and human disturbance attributes listed in Tables 4 and 5 were assigned).
DCA was used in a preliminary test to assess the length of the environmental gradients of the axes [85].If that value was larger than 4.0, unimodal methods (DCA, CA, or CCA) was used.On the other hand, if the longest gradient was shorter than 3.0, the linear method (PCA or RDA) was considered to be a better choice [89].In the range between 3 and 4, both types of ordination methods are assumed to be applicable.
In both study areas, the lengths of the environmental gradients are larger than 4 (Table 7); the CCA ordination procedure was carried out to investigate relationships between topographic attributes and vegetation spatial distribution pattern at both study areas.The core of direct constrained ordination by CCA developed by Ter Braak [90] is a weighted multivariate linear regression of community gradients on environmental variables [91].First, the main variation in the species (or vegetation) data is analyzed by indirect gradient ordination (e.g., correspondence analysis, CA); Secondly, the indirect gradient ordination axis is used to relate to the environmental variables by multiple regression of the site scores on the environmental variables [90].A joint plot (biplot) showed the general vegetation types and their correlations with topographic and human disturbance variables.The relationships between vegetation data and topographic/human disturbance variables can be estimated by perpendicularly projecting vegetation points (triangles) onto the arrows (topographic and human disturbance variables).The lengths of the arrows are used to compare the size of such an effect across the topographic and human disturbance variables.The approximated correlation between topographic and human disturbance variables is equal to the cosine of the angle between the corresponding arrows (Leps and Smilauer, 2003) [85].The significance of the relationships between vegetation variables and topographic/human disturbance variables was determined by Monte Carlo permutation tests (Legendre and Legendre, 1998) [92].In addition, the intraset correlations between topography and human disturbance variables and canonical axes in CCA were used as indicators of which variables were more important in structuring the ordination (Ter Braak, 1986) [90].Therefore, a multiple linear regression was used to address those intraset correlations.CCA analysis reduced the vegetation data and environmental (topography and human disturbance) variables into four independent ordination axes.The four CCA axes explained the vegetation and environment data by recording the dispersion of vegetation and environmental (topography and human disturbance) variables scores.The spatial interpolation of 4 CCA environment axes scores at the sampled locations was carried out by using kriging [35,91,93,94].This geostatistical approach assumes that the spatial correlation structure between points (samples), which are closer together have more similar values than those that are further apart [95].Firstly, semivariogram analysis was used to explore the spatial correlation of the four CCA scores (Figure 3); Secondly, to assess spatial correlation, a model was fitted to the semivariogram with spherical, exponential, or Gaussian functions as candidate models resulting in estimated prediction standard errors.If the prediction standard errors are valid the root-mean-square standardized errors (RMSSE) should be close to 1 [95].From the calculation of the RMSSE a spherical model was used to fit the semivariogram of axis1, axis3, and axis4, while the exponential model was used to fit the semivariogram of axis2 [90].
Afterward, ordinary kriging was used to predict CCA scores at unobserved locations (Burrough and McDonnel 1998) [93].It is the most robust and often used method because it minimizes the variance error between the model and the estimate [96].The kriging interpolation is a weighted average of the observed values which is used to estimate the value of , identified at a specific location x 0 where there are no measured values [93,95,97].By using the spatial structure defined by the semivariogram, a kriging system of linear equations combining neighboring information can be defined [98].

Stage 3: Image Classification
There are many different types of neural networks for image classification [99].In this study, a three layer neural network was constructed, consisting of one input layer, one hidden and one output layer [100].The number of input neurons corresponded to the number of input layers (image bands complemented with DEM or CCA axes).In order to be consistent with the DN value range of the TM image data, all of the extra channels (DEM or CCA axes) were normalized on a 0 to 255 value scale.The number of neurons in the output layer equaled the number of subclasses while the number of the hidden neurons was arbitrary set to twenty.A sigmoid activation function was used with a learning rate of 0.001 and momentum of 0.2.

Accuracy Assessment
The accuracy of the classifications was assessed using confusion or error matrices and Kappa statistics (KHAT) [101].In addition, in order to test the null hypothesis of no differences between different images classification strategies, the pair-wise Z-statistic was calculated [102][103][104].

Ordination Results
From the Monte Carlo permutation test, except for planiform curvature (PLF) and cost distance to streams (CDS), all topographic and human disturbance variables were significantly correlated with the vegetation and non-vegetation spatial distribution pattern data in the northern study areas (Table 8).
In the southern study area, topographic position index (TPI), Slope planiform curvature (PLF), and slope profile curvature (PRF) were not significantly correlated with the vegetation data (Table 8).The results of CCA are shown in two diagrams for the two study areas (Figures 4 and 5 arrows.The length of arrows indicate that elevation (ELEV) and surface density of villages (SDV) are the two most important driving factors determining vegetation spatial distribution pattern in both study areas (Figures 4 and 5).Profile (PRF) and planiform (PLF) curvature have relatively small effects on the vegetation spatial distribution pattern.CCA diagrams also illustrate the relationships between each vegetation class and the set of environmental variables.Compared to the other land cover classes, snow (SN) and fir and spruce forest (FSF) are predicted to occur at higher elevation and are located relatively far from villages, towns, the Lancang River, and roads (Figure 4).Conversely, these two classes have lower surface density of village (SDV) values implying that, in the northern study area, snow (SN) and fir and spruce forest (FSF) occur at higher altitude with less human disturbance.Water (WT), pine forest (PF) and agricultural land (AL) all feature higher surface density of village (SDV) values and lower elevation and proximate values.Based on field knowledge, water (WT) (Lancang River), agricultural land (AL) and pine forest (PF) always occur at lower elevations where most of villages and roads are present.Water (WT) also has the highest compound topographic index (CTI) value and lowest slope degree.In contrast, the cast shadow class (CS) has the highest slope value.Figure 4 shows that dwarf shrub and meadow (DSM) is close to the center.It means that the dwarf shrub and meadow is weakly correlated to elevation and other factors.It can be found at any elevation and anywhere in the study area.It includes both natural vegetation (alpine and subalpine shrub/meadow), and vegetation that is featuring anthropogenic disturbance (grazing land and abandoned farm land).Mixed forest is also close to the center.It also includes both natural vegetation (oak and spruce mixed forest), and vegetation that is featuring anthropogenic disturbance (secondary mixed forest such as pine and poplar mixed forest).4. The symbols of environmental variables are described in Tables 2 and 3.By projecting the points of the vegetation classes (triangles) onto the arrows of quantitative environmental variable, an approximate ordering of those vegetation classes with respect to that environmental variable can be obtained.4. The symbols of environmental variables are described in Tables 2 and 3.By projecting the points of the vegetation classes (triangles) onto the arrows of quantitative environmental variable, an approximate ordering of those vegetation classes with respect to that environmental variable can be obtained.
In the southern study area, rubber plantation (YRT and ORT) and agricultural land (AL) are predicted to have a lower elevation and distance to towns (CDT), villages (CDV), roads (CDSR and CDLR), and rivers (CDS, CDMR, and CDLC) (Figure 5).However, these classes have higher values for surface density of villages (SDV) and potential annual direct incident radiation (PADIR).In contrast, evergreen forest (EF), low density forest (LDF), shrub and grass land (SGL), and burned land (BL) have higher elevation (ELEV) values (Figure 4).From field expertise it is known that the two anthropogenic classes, rubber plantation (YRT and ORT) and agricultural land (AL), always occurs at lower altitudes and close to towns and villages.
The importance of each ordination axis with respect to the original vegetation pattern is indicated by the eigenvalues.For the northern study area, all summed CCA axes explained about 41% of the vegetation data variability and about 32% of the variation of the vegetation data was explained by the four extracted axes.On the other hand, for the southern study area, about 37% of vegetation data variability was assessed by all CCA axes, and the four extracted axes explained about 28%.The eigenvalues showed that the explained variability in the vegetation data is for the larger part expressed by the first two dimensions.
We compared the partial CCA ordination results for both study areas.In the northern zone the topographic attributes explained about 28% (17.76% + 10.82%) of the total variability of the vegetation and non-vegetation data, while human disturbance factors accounted for considerably less (23.65% = 12.83% + 10.82%) of the variance (Table 9).However, in the southern study area, the variance explained by human disturbance factors (22.86% = 15.63%+ 7.23%) was almost equal to the variance explained by topographic attributes (21.94% = 14.71% + 7.23%).

Image Classification
The poorest ANN classification accuracy was obtained using the 7 TM image bands training 10 broad classes (Tables 10 and 11).For both study areas, the highest classification accuracies were obtained when classifying all spectral subclasses (41 and 46 for the northern and southern study area respectively) incorporating the 4 CCA axes as extra channels (Tables 10 and 11) (Figure 6).
Pair-wise Z test values are listed in Tables 12 and 13.These values show that classification accuracy can be significantly improved by training spectral classes or subclasses as compared to training broad classes [81,105].

Discussion
In steep mountainous terrain, topographic effects may cause the same vegetation type on opposite facing slopes to feature different spectral values and different vegetation types to have similar spectral values [106].Figure 7 shows a bimodal distribution of the pine forest (PF) spectral response patterns (training data) indicating the presence of two pine subclasses, pine on sunny slopes and pine on shady slopes.For both the northern and southern study areas, classification accuracy could be significantly improved by incorporating DEM or 4 CCA axes as additional channels (Tables 12 and 13).In the northern steep mountainous terrain, the importance of the DEM or CCA axes for vegetation mapping can be explained by the fact that the distribution of forest types is mainly determined by elevation (i.e., CCA axis1 is strongly correlated to elevation (Table 14).Likewise, the spatial distribution of fir and spruce forest (FSF), pine forest (PF), agricultural land (AL) and water are strongly affected by elevation (Figure 4).In the southern study area, elevation is also the main determining factor for the distribution of rubber trees, agricultural land, deciduous forest, and evergreen forest.
The classification performed without the DEM resulted in unrealistic distributions of some of the vegetation classes: e.g., pine forest (PF) and agricultural land (AL) occurring at altitude levels above 3,300 m, and even above 4,000 m; and fir and spruce forest (FSF) occurring below 2,500 m (Figure 8).Without DEM agricultural land (AL) was confused with alpine and subalpine shrubs and meadows (DSM), and pine forest (PF) was always mistakenly classified as fir and spruce forest (FSF) (Figure 2).Hence, integrating DEM data in the classification process resulted in a more realistic spatial distribution pattern of vegetation classes whose occurrence is determined by altitude, such as fir and spruce forest (FSF), pine forest (PF) and agricultural land (AL) (Figure 9).Both pine forest (PF) and agricultural land (AL) are no longer found in areas with an elevation above 3,500 m, while the fir and spruce forest (FSF) no longer occurs at elevations lower than 3,000 m.The distributions of the training data for three classes (fir and spruce forest (FSF), pine forest (PF), and agricultural land (AL)) in the Landsat TM near infrared (NIR) band are displayed in Figure 10.Pine forest (PF) overlaps both with the patterns of fir and spruce forest (FSF) and agricultural land (AL) indicating that pine forest is difficult to separate from both classes.Patterns of overlap in the DEM values are different: the DEM values of fir and spruce forest (FSF) almost have no overlap with those of pine forest (PF) (Figure 11).On the other hand, agricultural land and pine forest overlap completely when their altitude distributions are compared.These graphs indicate that incorporating DEM values in the image classification process is recommended for vegetation types whose distributions are determined by altitude.This corresponds with research work reported by Elummoh et al. [23], Liu et al. [26], Ren et al. [28], Zhang et al. [34].In the northern study area, the improvements to classification accuracy when adding the DEM or CCAs were very similar (Table 12).This is quite logical as the CCA ordination results showed that the CCA axes are strongly correlated to the computed topographic attributes, particularly for CCA ax1 and elevation (r = 0.95) (Table 10), suggesting that CCA axes and DEM data are redundant.
In the southern study area, the accuracy is significantly higher using 4 CCA axes compared to incorporating a DEM as an extra band (Table 13).As mentioned above, in the southern study area, the variance explained by human disturbance factors is higher than the variance explained by topographic attributes.As vegetation and spatial distribution pattern's are strongly influenced by human disturbance (e.g., rubber plantations, burned land, and agricultural land), classification accuracy can be significantly improved by adding ancillary data relating to human influence factors.Figure 12 shows a large overlap between the spectral response patterns of young rubber trees and agricultural land in the NIR band (Band 4).This overlap can be reduced by using CCA axis2, which is significantly correlated with the human influence factors, except for cost distance to middle rivers (Table 14) as an extra channel (Figure 13).A large part of the variation in the vegetation data remains unexplained.The explained variability depends on the number of vegetation classes and samples [81].Mountain vegetation is the result of a complex interaction between natural processes and human intervention, and it is impossible to completely explain the response of the vegetation to a limited set of attributes [4,32].Additionally, some species have a relatively wide ecological amplitude, so there will never be a perfect species-environment correlation.Soil is another important biophysical factor for vegetation spatial distribution pattern, but this parameter was not included in the research work.Furthermore, the Kriging interpolation scheme for CCA axes introduced errors in feature space, which may explain why the CCA scores do add little to the accuracy of the image classification.
Compared with the southern study area, the northern study area is much more rugged: The effects of topographic factors on vegetation spatial distribution, hence stronger.However, in the southern study area, with its higher population density, the human disturbance is stronger than in the northern study area [107].Anthropogenous effects on the vegetation spatial pattern are stronger than the effects of topography in this area.Since the establishment of rubber plantations in 1956 in the southern study area (Xishuangbanna prefecture, Yunnan, China), the expansion of rubber plantation has accelerated, particularly after 1978 [108][109][110].In the southern study area, another important human disturbance factor is fire (burned land).Xishuangbanna is the traditional homeland of upland minority people (-hill tribes‖) including Dai, Hani (called Akha in Thailand), and Bulang [110].Hani (Akha) and Bulang people traditionally practice shifting cultivation in the uplands and cultivated rice in the lowlands [110][111][112].From field observations it has been established that the local people commonly use fire to clear forest and shrub land for rubber plantation.

Conclusions
The purpose of this study was to present a novel method for mapping mountain vegetation in Lancang Mekong, China, whereby the added value of Canonical Correspondence Analysis (CCA) to an Artificial Neural Network (ANN) was investigated.All the topographic attributes were significantly correlated with vegetation spatial distribution patterns in both study areas.Elevation and slope were established as the two most important influencing factors.In addition, CCA ordination reduced the vegetation data and environmental (topography and human disturbance) factors into four independent ordination axes.Inclusion of these additional variables improved the classifications up to a maximum of 6%.This was particularly the case for vegetation classes strongly affected by topography, such as example pine forest, fir and spruce forest in the north, and rubber plantation in the south.Also, with an overall accuracy of more than 95% and kappa value of more than 0.95, it was demonstrated that inclusion of the CCA ordination model into image analysis holds considerable promise for an automated approach of mountain vegetation mapping.It was found important to separate the vegetation classes into subclasses based on the spectral response patterns, to implement the training stage eventually resulting in improved classification accuracy.In the case of the more rugged terrain in the northern study area, elevation (DEM) and CCA axes were equally important since the most important factor driving the distribution of vegetation is altitude.For the southern area, the CCA axes were more important than the DEM for improving the classification.Accurate mapping of rubber plantations in the south will be very useful for ecosystem conservation, sustainable planning, and management of ecologically fragile mountain area.
As mentioned in the introduction, one of the big challenges for mapping mountain vegetation is the scale issue of the integration between plant community data, human disturbance factors and geo-spatial data (i.e., DEM and image data).Further work will therefore focus on the question of scale interactions and scale gaps concerning the data used for dominant vegetation community mapping, such as plant community data, geographic spatial data and multi-resolution image data.Furthermore, it is acknowledged that the Kriging interpolation scheme for CCA axes introduces errors in feature space, which may partly explain why the CCA scores attribute to the accuracy of the final image classification.Further development of the interpolation scheme may need to be developed in future work as well.

Figure 1 .
Figure 1.Location of the two study areas.
. Next, in Stage 3 an ANN was applied classifying the TM image data incorporating ancillary information from the 4 CCA axes or from the original DEM.Finally, classification accuracy was assessed.Stage 1: Ordination Analysis

Figure 3 .
Figure 3. Fitted variogram for each extracted canonical correspondence analysis (CCA) ordination axis in the northern study area.(a) Semivariance diagram of CCA scores for axis 1; (b) Semivariance diagram of CCA scores for axis 2; (c) Semivariance diagram of CCA scores for axis 3; (d) Semivariance diagram of CCA scores for axis 4.
). Vegetation classes are represented by triangles, and topography and human disturbance factors are represented by 0

Figure 4 .
Figure 4. CCA ordination of vegetation classes' spatial pattern in relation to environmental variables in the northern study area.The symbols of vegetation classes are listed in Table4.The symbols of environmental variables are described in Tables2 and 3.By projecting the points of the vegetation classes (triangles) onto the arrows of quantitative environmental variable, an approximate ordering of those vegetation classes with respect to that environmental variable can be obtained.

Figure 5 .
Figure 5. CCA ordination of vegetation classes' spatial pattern in relation to environmental variables in southern study area.The symbols of vegetation classes are listed in Table4.The symbols of environmental variables are described in Tables2 and 3.By projecting the points of the vegetation classes (triangles) onto the arrows of quantitative environmental variable, an approximate ordering of those vegetation classes with respect to that environmental variable can be obtained.

Figure 6 .
Figure 6.Land cover maps derived from ANN image classification incorporating the four CCA axes as extra channels in both study areas.(a) Land cover image classification in the northern study area; (b) Land cover image classification in the southern study area.

Figure 7 .
Figure 7. Distribution of DN values for pine forest training data in Landsat 5 TM NIR (band 4).

Figure 8 .
Figure 8. Frequency distribution of fir and spruce forest (FSF), pine forest (PF), and agricultural land (AL) obtained by ANN classification without DEM as an extra channel.

Figure 9 .
Figure 9. Frequency distribution of fir and spruce forest (FSF), pine forest (PF), and agricultural land (AL) obtained by ANN classification using DEM as an extra channel.

Figure 10 .
Figure 10.Frequency distribution of DN value for FSF, PF and AL in Landsat 5 TM NIR band.

Figure 11 .
Figure 11.Frequency distribution of DEM values for FSF, PF and AL.

Figure 12 .
Figure 12.Frequency distribution of DN values for YRT and AL in Landsat TM band 4.

Figure 13 .
Figure 13.Frequency distribution of CCA scores for YRT and AL in axis2.

Table 1 .
Satellite image data and number of training classes and subclasses.

Table 2 .
The dominant vegetation classes and dominant species in the northern study area.

Table 3 .
The dominant vegetation classes and dominant species in the southern study area.

Table 6 .
The number of pixels of the training and validation sets for each class in the both study areas.The frequency distribution of digital number (DN) values of the training data in the Thematic Mapper near infrared (NIR) band (band 4).FSF: fir and spruce forest; PF: Pine forest; OF: oak forest; MF: mixed forest; LDF: low density forest and tall shrub; DSM: dwarf shrub and meadow; AL: agricultural land.

Table 7 .
Summary table of DCA applied to vegetation training data.

Table 8 .
Statistical significance of topography and human disturbance variables for vegetation spatial patterns in the two study areas.
Note: The underlined numbers are cases with p > 0.05.

Table 9 .
Decomposed explained variance in vegetation data.

Table 10 .
Summary of the TM image (2 December 2003) classification results in the northern study area.: n is the number of pixels.ANN, Artificial Neural Network.DEM, Digital Elevation Model. Note

Table 11 .
Summary of the TM image (1 March 2004) classification results in the southern study area.
Note: n is the number of pixels.

Table 12 .
Pair-wise Z statistic test of the comparisons of the different classification strategies in the northern study area.

Table 13 .
Pair-wise Z statistic test of the comparisons of the different classification strategies in the southern study area.

Table 14 .
Pearson correlations of CCA axes with topographic and human disturbance attributes.