A Novel Land Cover Classification Map Based on a MODIS Time-Series in Xinjiang, China

Accurate mapping of land cover on a regional scale is useful for climate and environmental modeling. In this study, we present a novel land cover classification product based on spectral and phenological information for the Xinjiang Uygur Autonomous Region (XUAR) in China. The product is derived at a 500 m spatial resolution using an innovative approach employing moderate resolution imaging spectroradiometer (MODIS) surface reflectance and the enhanced vegetation index (EVI) time series. The classification results capture regional scale land cover patterns and small-scale phenomena. By applying a regionally specified classification scheme, an extensive collection of training data, and regionally tuned data processing, the quality and consistency of the phenological maps are significantly improved. With the ability to provide an updated land cover product considering the heterogenic environmental and climatic conditions, the novel land cover map is valuable for research related to environmental change in this region.


Introduction
Land cover (LC) information provides thematic characterizations of the Earth's surface that indirectly represent the biotic and abiotic properties, which are closely related to the ecological condition of land areas [1]. As surface properties affect the biosphere-atmosphere interaction, accurate LC information is required to evaluate the effect of LC changes on the environment [2]. Additionally, land cover changes are among the most important agents of environmental change at the local to global scales, and have significant implications on the health of the ecosystem and on sustainable land management [3,4]. Land cover products are available at different spatial resolutions, ranging from 300 m to 1 km at the global scale. Since the 1990s, large-scale land cover mapping based on satellite data has become possible using datasets derived from the Advanced Very High Resolution Radiometer (AVHRR) [5,6]. With the emergence of newer medium resolution remote sensing data sources (e.g., moderate resolution imaging spectroradiometer (MODIS), SPOT VEGETATION, and MERIS), global land cover data with a higher level of detail have been developed. The current generation of global land cover products includes the GLC2000 product generated from SPOT VEGETATION [7], the MODIS Collection 5 Land Cover Product [8], and the GlobCover product produced using data from MERIS [9]. Despite the availability of these various global land cover products, the problem of uncertainty and comparability of these products remains [10]. For example, comparisons have been undertaken between global land cover datasets [11-13] or between global and specific regional products [14,15] based on the prior harmonization of different products. Agreements can be achieved for very clearly defined classes and, typically, for homogeneous areas [11], whereas heterogeneous landscapes and transition zones have been reported to be major challenges when utilizing medium resolution land cover data [12,16-18]. For many regions, the overall relative quality of the existing products is not well known and has not been investigated in depth.
The Xinjiang Uygur Autonomous Region (XUAR) covers an extensive area of 1,660,000 km², which is more than one sixth of China's territory. As the largest autonomous region in China, the XUAR contains a large proportion of the country's arid area. Since 1978, an unprecedented combination of economic reforms, exploration of natural resources, and population growth has led to a dramatic transformation of land cover across the XUAR [19]. Because producing a land cover product for the entire XUAR requires processing a large volume of data (more than one hundred Landsat scenes), previous studies have focused primarily on subsets of the overall area. For example, land cover and land-use dynamics have been mapped using satellite images for selected oases in the XUAR [20-22]. Studies on land cover mapping throughout the XUAR are still lacking.
To satisfy the requirement of accurate land cover mapping in the XUAR, the primary objective of this study was to develop a product covering the entire region. Before processing novel data for the XUAR, we compared seven existing land cover maps of the XUAR extracted from existing regional and global products. To this end, we harmonized these datasets based on the well-known Land Cover Classification System (LCCS) [11]. We then developed a classification scheme to generate a novel land cover map at a 500 m resolution, covering the entire XUAR for the year 2010. Using the scheme and reference data, the classification was performed using the TWOPAC (Twinned Object- and Pixel-based Automated classification Chain) classification software [23], employing a C5.0 decision tree algorithm built on a time series featuring phenological metrics derived from annual enhanced vegetation index (EVI) data. Finally, the quality of the product was evaluated quantitatively and qualitatively.

Study Area
The topography and climate of the XUAR are presented in Figure 1. Located in the northwestern part of China, the XUAR is situated far from oceans and other large water bodies and has a variable arid to semi-arid continental climate with a mean annual precipitation of 100-200 mm [19]. The mean July temperature is 27.1 °C, and the mean January temperature is −17.1 °C [19]. Areas within high mountain ranges have a typical mountain climate, which is characterized by long, cold winters and short, hot summers. The northern part of the province is influenced by the Siberian climate. The mountain ranges extend in an east-west direction with most elevations exceeding 3000 m. The Junggar Basin and the Gurbantunggut Desert lie between the Altay and Tianshan Mountains. The Tarim Basin and the Taklamakan Desert are situated between the Tianshan and Kunlun Mountains. The XUAR is primarily covered by grassland and sandy desert [21]. Forest areas are sparsely scattered within the high mountains and along the rivers. Oasis landscapes ranging from small to moderate in size (0.01 to 15,000 km²) have developed within inland river deltas, alluvial-diluvial plains, and along the edges of diluvial-alluvial fans. Agricultural land and human settlements are distributed around these oases.

The Requirement for a Novel Land Cover Product over the XUAR
In our study, seven land cover maps from existing regional and global products for the XUAR were compared. These maps are as follows:
1. UMD 1992/93: University of Maryland Global Land Cover Product [24]
2. GLC 2000: Global Land Cover [25,26]
3. Landuse2000 [27]
4. MODIS Land Cover MCD12Q1 2001 [8]
5. GlobCover 2004/2006 [9]
6. GlobCover 2009 [28]
7. MODIS Land Cover MCD12Q1 2009 [8]
The product characteristics of these datasets are presented in Table 1. The products, at spatial resolutions varying from 300 m to 1 km, were derived from different sensors, such as AVHRR, SPOT-4, MERIS, MODIS, and Landsat TM. The products are characterized by a varying number of land cover classes and represent the land cover at different points in time. We re-projected these datasets to a geographic projection (lat/long) as the common reference.
To compare the products, the land cover classes of the individual products were translated to a harmonized legend, following Herold et al. [11] and the GOFC-GOLD Report No. 43. The selected products are characterized by 14 to 25 land cover classes, which must be translated into the 13 classes determined by the LCCS. This scheme is based on a general agreement of the UN Land Cover Classification System [1], which provides a common land cover language for building land cover legends and translating and comparing existing legends. The LCCS defines classifiers rather than categories, thus standardizing the terminology and the attributes used to define the thematic classes in the maps [34]. Figure 2 presents a visual comparison of the seven harmonized land cover maps for the XUAR. Discrepancies among the harmonized products are clear, based solely on a visual comparison. Barren lands dominate the study area in all of the datasets. Large water bodies can be consistently identified in the seven products. Most disagreements occur in the class assignments of the vegetation types. For example, the UMD product clearly overestimates shrub lands where barren lands are distributed [21]. Although vast areas of herbaceous vegetation were detected in the middle of the XUAR in some of these products, only a small part was classified as such in others, for example, in the GlobCover dataset. In addition, the area of forest coverage shows a large discrepancy among the seven products. A comparison was performed to identify the level of agreement between each 1 km² pixel in the seven datasets using the LCCS. Seven levels of agreement were calculated as follows: no agreement, for pixels containing a different LCCS class in each dataset; level 2 to level 6 agreement, for pixels in which two to six of the seven datasets are in agreement, respectively; and full agreement, for pixels in which all seven datasets are in agreement.
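The per-pixel agreement level described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes that the level is the number of datasets sharing the most frequent harmonized class at a pixel, and the class labels and pixel values are hypothetical.

```python
from collections import Counter

def agreement_level(labels):
    """Return how many of the input maps share the most common class at one pixel.

    A result of 1 means no two maps agree; len(labels) means full agreement.
    Assumption: the level is the size of the largest group of agreeing maps.
    """
    return Counter(labels).most_common(1)[0][1]

# Hypothetical harmonized class labels for three pixels across seven products.
pixels = [
    ["barren"] * 7,
    ["shrub", "herbaceous", "herbaceous", "barren", "cropland", "herbaceous", "barren"],
    ["barren", "shrub", "herbaceous", "cropland", "water", "snow_ice", "built_up"],
]
levels = [agreement_level(p) for p in pixels]
print(levels)  # [7, 3, 1]
```

The first pixel shows full agreement, the second level 3 agreement, and the third no agreement.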
According to the levels of agreement in Figure 3, full agreement was obtained for the vast desert areas. Full or level six agreement was achieved for most of the water bodies. In the marginal areas of deserts and in the transition zones between deserts and mountain ranges, partial agreement occurred. Most disagreements exist in the mountainous areas of the Kunlun Mountains and in the barren land area around the eastern part of the Tianshan Mountains. To quantitatively analyze the differences, the percentage areas for the 13 LCCS classes were calculated, and they are illustrated in Figure 4. Evergreen broadleaf trees can be neglected for the comparison because they occupy a very small percentage of the total area. There is reasonable agreement across the datasets for barren land, herbaceous vegetation, open water, built-up areas, snow and ice, and croplands. Disagreements were primarily found for vegetation types, including shrubs, mixed trees, deciduous trees, and other vegetation. According to the statistical data [35], the agricultural area of the XUAR at the end of 2008 was 41,245 km², covering 2.48% of the entire area. The percentage area of the UMD product was closest to the statistical data (2.57%), and all of the other six products overestimated the area of the agricultural lands (3.4% to 10.54%). The urban area in 2010 was 838 km², covering 0.05% of the entire area [35]. The UMD and GLC2000 products underestimated the built-up area (0.019%), and the other four products showed overestimations (0.08% to 0.17%).
The Manas River Basin was selected as a test site for the local comparison and detailed analyses. Figure 5 presents the seven harmonized products in greater detail for this test site. The Manas River watershed in the Xinjiang Province is a typical inland watershed in an arid area. Over the past 50 years, the population of this river basin increased from 59,000 in 1949 to 1,109,000 in 2004, which has led to intensive changes in land use, including farmland enlargement and urbanization [20,36]. The grasslands were detected in the MODIS data, whereas they were classified as shrub land in the UMD product and as croplands in the GlobCover data. The built-up area was not discernible in the GLC2000 data. The forest types differ significantly in the seven products. The area of snow and ice is smaller in the MODIS product compared with the other data. A temporal assessment and reasoning can be performed based on Figure 5. Although the GlobCover 2009 data and the MODIS land cover type product from 2009 were generated for the same year, the differences between the two products are significant. GlobCover 2009 displays a significantly larger areal coverage with cultivated agricultural land than the MODIS 2009 product.
The land cover products vary in the production algorithm, data resolution, and time of data acquisition, and these factors are responsible for the disagreements. The complex topography in the mountainous area may also lead to data noise and misclassification. The disagreements found globally and at the test site, across the seven datasets, indicate that users should review the global datasets before employing them in regional studies. Integrating LC products with low accuracies into predictive models (hydrologic modeling, biomass modeling, climate change predictions, etc.) may severely compromise any statements on the future development of an area.

Classification Scheme
Based on the LCCS of the FAO of the United Nations, a hierarchical classification scheme (Figure 6) was developed specifically for the XUAR and was applied in this study [34]. The presented classification scheme with its 12 classes follows the international LCCS standards with clear and systematic definitions of each land cover class, providing internal consistency. All of the classes are clearly defined by unique labels, which were derived based on the LCCS software [34]. Table 3 provides descriptions of all of the classes of the introduced classification scheme, the unique class dichotomous codes, and the associated class short names. Each label (dichotomous code) involves the specific class construction and a detailed description concerning the life form, canopy coverage, and predominant land use.

Methodology
The classification workflow consists of several steps, as shown in Figure 7. The MODIS data were preprocessed and ingested into an automatically generated decision tree classifier. The reference data used as training samples to build the decision tree were manually collected by visual interpreters. Furthermore, a large number of validation samples were collected for a subsequent accuracy assessment. After post-classification processing of the automatically classified image, the final land cover map was generated [23,37]. The classification methodology is based on a C5.0 decision tree algorithm, which belongs to the supervised machine learning algorithms [38,39]. The C5.0 classifier is an empirical learning system that uses training samples with known labels to extract informative patterns. The extracted patterns are assembled into a tree-structured classifier, which is subsequently used to classify unseen cases [38]. A C5.0 classifier can be expressed as a decision tree or as a set of simple if-then rules (ruleset). The TWOPAC software was used to implement the C5.0 classification process [23].
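To illustrate how a tree-structured classifier relates to a set of if-then rules, the following toy sketch classifies a sample by walking a small tree and then flattens the same tree into one rule per leaf. It is not C5.0 and not the TWOPAC implementation: the feature names, thresholds, and class labels are hypothetical, and C5.0 rulesets are additionally simplified and pruned rather than copied from tree paths.

```python
# Hypothetical toy tree: feature names and thresholds are invented for illustration.
tree = {
    "feature": "evi_summer_max", "threshold": 0.2,
    "below": {"label": "bare land"},
    "above": {
        "feature": "evi_annual_amplitude", "threshold": 0.4,
        "below": {"label": "grassland"},
        "above": {"label": "cropland"},
    },
}

def classify(node, sample):
    """Walk the tree from the root until a leaf label is reached."""
    while "label" not in node:
        branch = "below" if sample[node["feature"]] <= node["threshold"] else "above"
        node = node[branch]
    return node["label"]

def to_rules(node, conditions=()):
    """Flatten the tree into one (conditions, label) rule per leaf."""
    if "label" in node:
        return [(list(conditions), node["label"])]
    f, t = node["feature"], node["threshold"]
    return (to_rules(node["below"], conditions + (f"{f} <= {t}",))
            + to_rules(node["above"], conditions + (f"{f} > {t}",)))

rules = to_rules(tree)
for conds, label in rules:
    print("IF", " AND ".join(conds), "THEN", label)
```

Both representations assign the same class to a sample; the rule form is often easier to inspect.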

Input Data
The input data for the classification are composed of spectral and temporal information derived from a one-year time-series (2010) of MODIS EVI and reflectance of the red and near-infrared channels, all at a spatial resolution of 500 m. The EVI time-series was calculated from the MODIS Surface-Reflectance Product (MOD09A1), which is available in 8-day composites. For these time series (46 time steps of MOD09A1), phenological metrics were derived as descriptive statistics for temporal sections of the time-series. Four temporal sections were defined, namely the winter/spring section from January to March (before the growing season), the summer section from April to September (the growing season), the autumn/winter section from October to December (after the growing season), and the full annual cycle from January to December.
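The following sketch shows one way to compute the EVI from surface reflectance and to assign the 46 composites to the seasonal sections. It uses the standard MODIS EVI formulation; the text does not state which coefficients or which assignment rule were used, so both are assumptions here. In particular, composites are assumed to start on day of year 1, 9, ..., 361 and to be assigned by their start date, and the reflectance values are hypothetical.

```python
def evi(nir, red, blue):
    """Standard MODIS EVI from surface reflectance given as fractions (0-1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def section_of(composite_index):
    """Map a 0-based 8-day composite index (0..45) to a seasonal section.

    Assumes composites start on day of year 1, 9, ..., 361 and are assigned
    by their start date in a non-leap year such as 2010.
    """
    start_doy = 1 + 8 * composite_index
    if start_doy <= 90:       # January to March
        return "winter_spring"
    if start_doy <= 273:      # April to September
        return "summer"
    return "autumn_winter"    # October to December

counts = {}
for i in range(46):
    counts[section_of(i)] = counts.get(section_of(i), 0) + 1
print(counts)
print(round(evi(nir=0.40, red=0.08, blue=0.04), 3))
```

Under these assumptions the three seasonal sections contain 12, 23, and 11 composites; the fourth section, the full annual cycle, uses all 46.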
For each of the temporal sections, the median, minimum, maximum, and amplitude values of the EVI, red, and near-infrared reflectances were calculated, resulting in 48 metrics. For all of the calculations, all of the observations labeled as cloudy or adjacent to clouds were removed. Six metrics were excluded from further processing due to their high correlation with the others, resulting in a total of 42 MODIS metrics, which were finally used as features in the classification process.
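The count of 48 metrics follows from 4 sections × 4 statistics × 3 variables. A minimal sketch of the metric extraction for one pixel is given below. The composite index ranges for the sections, the amplitude definition (maximum minus minimum), the synthetic time series, and the cloud flags are assumptions for illustration only.

```python
from statistics import median

# Assumed composite index ranges (0-based) for the four temporal sections.
SECTIONS = {
    "winter_spring": range(0, 12),
    "summer": range(12, 35),
    "autumn_winter": range(35, 46),
    "annual": range(0, 46),
}

def section_metrics(series, cloudy):
    """Median, minimum, maximum and amplitude of the clear observations per section."""
    out = {}
    for name, idx in SECTIONS.items():
        clear = [series[i] for i in idx if not cloudy[i]]
        if not clear:          # no usable observation in this section
            continue
        out[f"{name}_median"] = median(clear)
        out[f"{name}_min"] = min(clear)
        out[f"{name}_max"] = max(clear)
        out[f"{name}_amplitude"] = max(clear) - min(clear)
    return out

# Hypothetical pixel: a synthetic seasonal curve with two cloud-flagged composites.
evi_series = [0.1 + 0.4 * max(0, 1 - abs(i - 24) / 12) for i in range(46)]
cloud_flags = [i in (5, 24) for i in range(46)]

features = {}
for variable in ("evi", "red", "nir"):
    # The same synthetic series stands in for all three variables here.
    for key, value in section_metrics(evi_series, cloud_flags).items():
        features[f"{variable}_{key}"] = value
print(len(features))  # 48
```

Because the peak composite is flagged as cloudy, it does not contribute to the maximum or the amplitude, which mirrors the cloud screening described above.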

Training and Validation Data Collection
The selection of reference data for training and validation is primarily a manual process. A reference sample is defined as a polygon with a minimum size of nine MODIS pixels. The polygons must be spatially homogeneous and characterized by one land cover type. For training sample generation, 17 Landsat images were acquired during May and June of 2010. The images were equally distributed over the study area and included all of the classes defined in the classification scheme. Suitable polygons were manually selected from the Landsat images and assigned to the appropriate land cover class, based on additional reference information gathered from high-resolution satellite imagery and field data (Figure 8). For each of the 12 classes, at least 10 evenly distributed polygons were collected for each Landsat scene. The final reference dataset included 26,000 training samples (500 m × 500 m), approximating 6500 km² or 0.5% of the XUAR. Compared with the MODIS Global Land Cover with 14,136 pixels for all of Asia [8], this reference dataset was more comprehensive. For the classification itself, two thirds of the reference dataset was used as training data, whereas the other third was employed for validation of the result.
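The split into training and validation data, and the area covered by the reference samples, can be sketched as follows. The text does not state how the split was drawn, so a seeded random split over individual samples is assumed here; the sample IDs are placeholders.

```python
import random

def split_reference(samples, train_fraction=2 / 3, seed=0):
    """Shuffle reference samples reproducibly and split them into two disjoint sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

sample_ids = range(26000)          # placeholder IDs for the 26,000 reference samples
train, validation = split_reference(sample_ids)
pixel_area_km2 = 0.5 * 0.5         # one 500 m x 500 m sample
print(len(train), len(validation), len(sample_ids) * pixel_area_km2)
```

In practice, splitting by polygon rather than by pixel may be preferable, since neighboring pixels of the same polygon are strongly correlated.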

Post Classification
To improve the classification results of certain spectrally and temporally ambiguous classes, we applied several post-classification steps. The SRTM digital elevation model (DEM), with a spatial resolution of 90 m, was resampled to the MODIS pixel size over the XUAR for spatial consistency, because a DEM helps to reduce misclassifications between spectrally ambiguous classes. For example, the class "ice and snow" has similar spectral features to "bare areas with salt flats". Bare lands with salt flats or saline lands are primarily distributed in diluvial-alluvial plains in front of the Kunlun and Tianshan Mountains, and in the alluvial plains and deltas of large rivers, where the altitude is lower than 1000 m [40,41]. Therefore, we reclassified the "bare areas with salt flats" pixels higher than 1000 m to "ice and snow". According to Han et al., irrigated cultivated lands distributed in the oasis plains along the middle and lower reaches of the inland rivers support 95% of the population of the XUAR [42]. We assumed that crop growing is limited at heights above 2000 m in the XUAR due to climate conditions. Therefore, we reclassified "cropland" pixels higher than 2000 m as "grassland".
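The two elevation rules can be expressed compactly. The sketch below applies them to a few hypothetical pixels; the class names follow the text, while the pixel values and the function name are illustrative.

```python
def post_classify(label, elevation_m):
    """Apply the two elevation rules described in the text to one pixel."""
    if label == "bare areas with salt flats" and elevation_m > 1000:
        return "ice and snow"
    if label == "cropland" and elevation_m > 2000:
        return "grassland"
    return label

# Hypothetical pixels: (classified label, elevation in metres from the resampled DEM).
pixels = [
    ("bare areas with salt flats", 850),
    ("bare areas with salt flats", 4200),
    ("cropland", 1200),
    ("cropland", 2600),
    ("grassland", 3000),
]
relabelled = [post_classify(label, z) for label, z in pixels]
print(relabelled)
```

Only the second and fourth pixels change class; all other labels pass through unchanged.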

Land Cover Classification Map
Figure 9 shows the classification map generated with the MODIS time-series, which we will refer to as the "XUAR Landcover 2010" product. The classification map shows the extent and distribution of the different land cover types for the year 2010 over the XUAR. The Tianshan and Altay Mountains are primarily classified as grassland, ice and snow, and evergreen forest. The Kunlun Mountains are primarily characterized by bare land, grassland, and snow because the scarce precipitation hinders the growth of forests. Bare land extends over the Gurbantunggut and Taklamakan Deserts. Grassland and sparse vegetation spread along the transition zone between the mountains and deserts. Agricultural lands and built-up areas are primarily distributed close to river oases. Deciduous forests are distributed along rivers, and salty lands are distributed around lakes and rivers. Figure 10 shows three subsets within the XUAR, with locations indicated in Figure 9. For each subset, a Landsat TM scene (a) and the classification map (b) are presented. Plot I is located in the northeast of the XUAR in the grassland between the Junggar Basin and the eastern Tianshan Mountains. The Tianshan Mountains in this area are characterized by extensive areas of evergreen forest and grassland with certain bare regions and snow and ice occurring in high elevation areas. The classification map (Ia) correctly differentiated the distribution of evergreen forests and grasslands. The area of soil salinization is also discernible around the lake. The second subset covers a large area of the western Tianshan Mountains. The mountainous area is primarily covered by large areas of snow and ice in the higher regions. Forests and grasslands are distributed in the middle and lower elevation zones of the mountain range. The third plot is a typical oasis located near the southern range of the Taklamakan Desert and to the north of the Kunlun Mountains and is characterized by highly managed agricultural lands in the river plain. The built-up areas with varying sizes are differentiated from croplands in the land cover map (IIIc).

Accuracy Assessment
An accuracy assessment provides information on product quality and identifies possible sources of errors. The compilation of a confusion matrix is a standardized method to represent the accuracy of classification results derived from remote sensing data by calculating accuracy measurements, such as overall accuracy, producer's accuracy, and user's accuracy [43]. To evaluate the accuracy of the classification map, we created confusion matrices based on the validation datasets. The validation result is based on a non-overlapping set of samples and is calculated automatically within the TWOPAC classification chain. The results derived from the confusion matrix (Table 4) yield an overall accuracy (OA) of 77.61%. The class "evergreen forest" has a user's accuracy of 77.05% due to a certain amount of misclassification with grassland. Deciduous forest areas have an accuracy of 87.5%. In certain cases, the grassland area was mislabeled as forest, cropland, and bare land, thus achieving a user's accuracy of 61.41%. The wetland class was partially misclassified as cropland and grassland and yields a user's accuracy of 68.75%. The built-up areas reach an accuracy of 84.62%. High accuracies (>90%) were achieved for "cropland", "water", and "snow and ice". Some of the most affected misclassified classes were relabeled by the post-classification procedure. The confusion matrix after the post-classification step is also shown in Table 4, and an overall accuracy of 79.78% is achieved. After post-classification, the user's accuracies of four land cover types (cropland, grassland, bare areas with salt flats, and snow and ice) were improved.
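For reference, the three accuracy measures can be derived from a confusion matrix as in the sketch below. The matrix values are hypothetical and are not taken from Table 4; the orientation (rows as mapped classes, columns as reference classes) is an assumption that must match the layout of the matrix being evaluated.

```python
def accuracy_measures(matrix, classes):
    """Overall, user's and producer's accuracy from a square confusion matrix.

    Assumes rows hold the mapped (classified) class and columns the reference class.
    """
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    users = {classes[i]: matrix[i][i] / sum(matrix[i]) for i in range(n)}
    producers = {
        classes[j]: matrix[j][j] / sum(matrix[i][j] for i in range(n))
        for j in range(n)
    }
    return correct / total, users, producers

# Hypothetical three-class example.
classes = ["grassland", "cropland", "bare land"]
matrix = [
    [40, 5, 5],    # mapped as grassland
    [6, 50, 0],    # mapped as cropland
    [4, 0, 90],    # mapped as bare land
]
oa, users, producers = accuracy_measures(matrix, classes)
print(round(oa, 4), users, producers)
```

User's accuracy describes how reliable a mapped class is (commission errors), whereas producer's accuracy describes how completely a reference class was captured (omission errors).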

Discussion
The Xinjiang Uygur Autonomous Region, which spans an area of 1,660,000 km² and extends for over 1600 km from north to south and 2000 km from east to west, is characterized by a complex topography, with elevations ranging from −192 m to 8028 m, and a unique inland continental location, which results in complex ecosystems. Considering these challenging conditions and the significant disagreements among the existing land cover products analyzed prior to our own product generation, the accuracy of the novel XUAR 2010 land cover map can be considered satisfactory. As illustrated in Figure 9, the land cover distribution differs significantly across the region. According to the State Forestry Administration of China, the desertification area in the XUAR is 1,071,200 km², constituting 65.24% of its territory [44]. The spatial distribution of the bare land in the XUAR 2010 land cover map agrees with the sandy deserts and desertified lands map produced with visual interpretation of the Landsat images [19]. The large-scale vegetation distribution is also consistent with the ecosystem distribution based on the topographic and climate system of this region [45]. From the plot analysis, the small-scale distribution is also well presented in the land cover product. The comparison of the classification product with the Landsat images in Figure 10 indicates that a distinction can be detected among the land cover classes at a spatial resolution of 500 m per pixel. The application of a regionally specified classification scheme, extensive training data, and regionally tuned data processing has been proven to significantly increase the quality and consistency of LC maps [46,47].
The 79.78% OA of our XUAR Landcover 2010 map is comparable with that of the cross-validated MODIS land cover product (OA 75%) [8]. Although the OA of our XUAR Landcover 2010 map is lower than that of classification results for small areas, classification with medium resolution images for large areas has been reported as being challenging. For example, Gong et al. have reported their effort to produce 30 m resolution global land cover maps using Landsat TM and ETM+ data, and the highest OA achieved is only 64.89% [48]. Low levels of accuracy primarily exist for spectrally and temporally ambiguous classes. Grasslands of different densities are occasionally confused with other land cover types. High-density grasslands have similar spectral and phenological characteristics to croplands, whereas sparse grasslands may be confused with sparse vegetation and bare lands. Misclassification of land cover types, such as wetland and built-up areas, can be attributed to mixed pixels. Wetland areas are typically a spectral mixture of vegetation and water. Built-up areas are mixed pixels of impervious areas and green areas, which may lead to their confusion with grassland. In addition, the derivation of ambiguous classes covering small areas is difficult at the 500 m resolution of the MODIS sensor.
Despite these misclassifications, which occur with all classification results for larger areas, the product retains significant potential because it is currently, to our knowledge, the best land cover product that exists for this region. The XUAR Landcover 2010 map presented in this study might be a valuable tool for the modeling community, such as in the fields of hydrologic modeling, biomass modeling, climate forecasting, or future land use change prediction. Interested scientists are encouraged to contact the authors to receive the novel classification product in a digital format.
Based on the classification approach proposed in this study, LC maps with refined classification schemes can be produced in further studies. For example, various forests and crops can be discriminated based on their different phenological characteristics [49,50]. For the refinement of the final LC product, the MODIS data from 2009 and 2011 can also be included to better characterize the phenologies of various vegetation types in the classification process.

Conclusions
Accurate land cover mapping on a regional scale in the XUAR is useful for regional climate and environmental modeling. In this study, we evaluated the accuracy of seven global land cover products over the XUAR and found that significant discrepancies exist. Furthermore, the novel XUAR Landcover 2010 product was derived based on an automatic decision tree classification procedure employing the TWOPAC classification software. An extensive MODIS-derived EVI time series was utilized as the input data, covering six MODIS tiles with 46 dates each, which were first preprocessed and then used to extract phenological metrics. After post-processing, including the SRTM digital elevation model and parameters derived thereof, good accuracies of 79.78% for the overall product and accuracies ranging from 22.52% to 99.2% for the individual classes could be attained. For selected areas within the XUAR, we also compared the results with higher resolution Landsat data and found that small-scale types, such as salty lands and the differentiation between deciduous forest and grasslands, can be captured. We consider that the XUAR Landcover 2010 product is a solid input for the modeling community or for future studies on regional land cover change. The novel product in this study can be shared with interested researchers active in the XUAR area.

Figure 1. The Xinjiang Uygur Autonomous Region (XUAR) with its geographic units, elevation zones, and typical climatic regimes. The monthly mean temperature and the monthly mean precipitation records at each meteorological site from January to December 2010 were obtained from the National Meteorological Information Center (NMIC) of China and are shown in the climate charts.

Figure 2. Comparison of seven of the available land cover products covering the XUAR (LCCS harmonized).

Figure 3. Levels of agreement among seven of the available land cover products covering the XUAR, classified according to the LCCS.

Figure 4. Percentage area comparison of the LCCS land cover classes among seven available land cover products covering the XUAR.

Figure 5. Comparison of land cover maps extracted from seven of the available land cover products (LCCS harmonized) for the Manas River Basin of the XUAR.

Figure 6. Land cover classification scheme for the XUAR based on the LCCS standards.

Figure 8. Footprint of the reference data and validation samples.

Figure 9. Land cover classification result for the XUAR derived from a MODIS EVI time series for 2010. The three subsets marked with red rectangles were selected for closer observation (see Figure 10), and the 90 m SRTM data are shown as the background of the study area.

Figure 10. Comparison of the Landsat imagery (a) with the land cover classification result for the XUAR in 2010 (b) for three different subsets (I-III). The Landsat imagery is displayed as an R (band 4), G (band 3), B (band 2) composite.

Table 2 lists the generalized global land cover legend with the LCCS definitions and the corresponding classes from the individual global legends.

Table 1. Characteristics of the land cover datasets covering the Xinjiang Uygur Autonomous Region (XUAR).

Table 2. Generalized global land cover legend with the Land Cover Classification System (LCCS) definitions.

Table 3. Land cover class description for the XUAR classification.

Table 4. Confusion matrices of the XUAR Landcover 2010 before and after post-classification. The bold values are the classification results after post-classification.