Validation of Land Cover Maps in China Using a Sampling-Based Labeling Approach

OPEN ACCESS Remote Sens. 2015, 7 10590

This paper presents a rigorous validation of five widely used global land cover products, i.e., GLCC (Global Land Cover Characterization), UMd (University of Maryland land cover product), GLC2000 (Global Land Cover 2000 project data), MODIS LC (Moderate Resolution Imaging Spectro-radiometer Land Cover product) and GlobCover (GLOBCOVER land cover product), and a national land cover map GLCD-2005 (Geodata Land Cover Dataset for year 2005) against an independent reference data set over China. The land cover reference data sets in three epochs (1990, 2000, and 2005) were collected on a web-based prototype system using a sampling-based labeling approach. Results show that, in China, the highest overall accuracy is observed in GLCD-2005 (72.3%), followed by MODIS LC (68.9%), GLC2000 (65.2%), GlobCover (57.7%) and GLCC (57.2%), while UMd has the lowest accuracy (48.6%); all of the products performed best in representing “Trees” and “Others”, well with “Grassland” and “Cropland”, but were problematic with “Water” and “Urban” across China in general. Moreover, for GLCD-2005, there are significant accuracy differences across seven geographical regions of China, ranging from 46.3% in the Southwest, 77.5% in the South, 79.2% in the Northwest, 80.8% in the North, 81.8% in the Northeast, and 82.6% in the Central region, to 89.0% in the East. This study indicates that a regionally focused land cover map would in fact be more accurate than extracting the same region from a globally produced map.


Introduction
Land cover is a crucial parameter of the ecosystem-based information needed within the global change framework, which has been placed at the top of international scientific and political agendas by an increasing number of multilateral environmental agreements [1-3]. Vegetative covers are required as a boundary layer in a number of general circulation and carbon exchange models [4]. Accurate land cover information is therefore essential. Conventional approaches such as field surveys are not effective for describing features on the Earth's surface because of their limitations, e.g., they are time consuming, expensive, or out of date by the time they are available [5]. Remote sensing offers a practical and economical means of acquiring land cover information over large areas on account of its capacity for systematic observations at various scales [6]; consequently, it has been identified as one of the major data sources for the generation of land cover products.
The general approach to land cover mapping is to produce temporal, usually monthly, composites from daily or weekly mosaics to minimize cloud cover and data noise [7]. In conjunction with other ancillary data sets, the monthly composites are then used to produce land cover categories according to a defined classification scheme at a regional, continental or global scale. The first global land cover map was produced using satellite observations from the Advanced Very High Resolution Radiometer (AVHRR) [8-10]. As more advanced, moderate resolution satellite sensors have emerged, e.g., Systeme Probatoire d'Observation de la Terre (SPOT), Moderate Resolution Imaging Spectro-radiometer (MODIS), and Medium Resolution Imaging Spectrometer (MERIS), the scientific community has witnessed a significant increase in the data sources available for land cover mapping [11]. In the last two decades, land cover products at various scales have been generated to provide primary terrestrial baseline data sets for numerous applications [12-21]. However, significant disagreements have been found between distinct products across several land cover categories, especially forest and cropland related classes [1,22].
Various assessments of the global land cover data sets have been carried out globally or regionally [11,23-27]. For China, a region that presents many challenges for monitoring land cover and its dynamics, attempts at rigorous validation of global land cover products have been very limited. Ran et al. [27] evaluated four 1 km global land cover products with the 1:100,000 land cover map of China as reference data. Wu et al. [28] validated four 1 km global land cover products across China, but only for cropland. We have assessed the five global land cover data sets over China in terms of thematic comparison and consistency analysis [29]. In this paper, we further assess the accuracies of the five widely used global land cover products and a national land cover map of China by validating them against independent land cover samples, which were collected using a sampling-based land cover labeling approach designed to produce high accuracy reference data efficiently over large areas.

Land Cover Maps
Five global land cover maps and a national land cover map of China are assessed here:
(1) GLCC (Global Land Cover Characterization) land cover map with 1 km spatial resolution from the IGBP (International Geosphere-Biosphere Programme) [9].
(2) UMd land cover map with 1 km spatial resolution from the University of Maryland [15].
(3) GLC2000 land cover map with 1 km spatial resolution from the European Commission's Joint Research Center [13].
(4) MODIS land cover map (MODIS LC, hereafter) with 500 m spatial resolution (V005) from NASA (National Aeronautics and Space Administration) [14,30].
(5) GlobCover land cover map with 300 m spatial resolution from the ESA (European Space Agency) [31].
(6) GLCD-2005 (Geodata Land Cover Dataset for year 2005) land cover map of China at a scale of 1 to 250,000 produced by the Data Sharing Infrastructure of Earth System Science [32].
The general characteristics of the six land cover maps are: (i) they were derived from different sensors, such as AVHRR, SPOT-4, MODIS, and MERIS; (ii) they were characterized by a varying number of land cover classes [17,25]; and (iii) they represented the land cover at different points in time, i.e., GLCC and UMd are in 1992-1993, GLC2000 and MODIS LC are in 2000 and 2001, and GlobCover and GLCD-2005 are in 2005.
In the validation initiatives for these maps, GLCD-2005 was assessed using field survey data [32]; MODIS LC was assessed by cross-validation, using as reference information several subsets of the training data that had not been used in the training process [30]; and the other four maps used design-based sampling schemes to collect samples interpreted from high resolution images [19,33-35], such as Landsat TM (Thematic Mapper), SPOT, and Google Earth. Since different approaches and reference data were used in the validation of these land cover maps, the reported accuracy measures are not comparable [7].

Data Pre-Processing
To allow comparative analysis of the six land cover maps, pre-processing addressed two issues: unification of projection and spatial resolution, and harmonization of the classification schemes. Both are described in detail in our earlier work [29]; for example, the Sinusoidal projection at a spatial resolution of 1 km was chosen as the common reference. In addition, we reclassified the six maps here into seven land cover types based on major life forms (Table 1): the five forest-related classes and "Shrubland" were aggregated into "Trees", and the other classes were left intact.
Table 1. Aggregation of land cover classes according to major life forms (refer to [29]).

Accuracy Validation
Validation of a land cover product indicates the confidence with which a pixel or segment has been assigned to the correct thematic class [36]. Generally, four approaches are used to quantify the accuracy of land cover classifications [7]: (i) confidence values of the classifier, (ii) cross-validation with training data sets, (iii) comparison with other reference data, and (iv) sampling and acquisition of ground information, which is regarded as the most reliable. Popular measures of mapping accuracy in remote sensing are based on the error matrix [37,38], a square array of numbers that summarizes the units assigned to each combination of map land cover class and reference class [39,40]. Three widely used parameters derived from the error matrix describe the map and class specific accuracies: overall, user's and producer's accuracy [41]. Overall accuracy is simply the percentage of correctly classified pixels, commonly calculated as area-weighted estimates for the different classes; the user's accuracy of a class is defined as the percentage of map area classified correctly, while the producer's accuracy of a class relates to the percentage of validation sites classified correctly [42].
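The three measures can be illustrated with a small unweighted example. The sketch below assumes the common convention that rows of the error matrix hold the map class and columns hold the reference class; the function name `accuracy_measures` and the counts are invented for illustration and are not taken from Tables 2-4.

```python
def accuracy_measures(matrix, classes):
    """Overall, user's and producer's accuracy from a square error matrix.

    Assumed layout: matrix[i][j] counts sample units mapped as classes[i]
    (row) whose reference label is classes[j] (column).
    """
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total

    users, producers = {}, {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])                       # units mapped as this class
        col_total = sum(matrix[r][i] for r in range(n))  # units whose reference is this class
        users[name] = matrix[i][i] / row_total if row_total else float("nan")
        producers[name] = matrix[i][i] / col_total if col_total else float("nan")
    return overall, users, producers


# Illustrative counts only (not data from this study).
classes = ["Trees", "Cropland", "Water"]
matrix = [
    [40, 8, 2],   # mapped as Trees
    [6, 30, 4],   # mapped as Cropland
    [1, 2, 7],    # mapped as Water
]
overall, users, producers = accuracy_measures(matrix, classes)
print(round(overall, 3), users, producers)
```

In this toy matrix "Water" has a user's accuracy of 7/10 but a producer's accuracy of only 7/13, which shows how the two measures can diverge for the same class.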
Validation of the six land cover maps across China was done in the following three major steps: sampling design, sample labeling, and accuracy estimation.

Sampling Design
As in accuracy assessments of other large-scale land cover maps, constraints of time, cost and workload make it impossible to collect ground truth information through direct investigation over the vast area of China. This study instead uses an alternative approach for collecting land cover samples, inspired by a number of earlier studies [31,43-47]: stratified random sampling based on high resolution satellite images, supplemented by ancillary data sets.
(1) Sampling strategy
In stratified random sampling, strata are smaller groups into which a population is divided according to shared attributes or characteristics of its members [48]. Here, we defined the land cover strata based on two maps: (i) The per-pixel comparison result (agreement map) of the five global land cover maps (GLCC, UMd, GLC2000, MODIS LC, and GlobCover) in China, which was assigned five levels of agreement ranging from "No agreement" to "Full agreement" (identified as 1 to 5 accordingly) (see [29]). We reclassified this agreement into two categories: "high agreement" (pixel value > 3) and "low agreement" (pixel value ≤ 3) (Figure 1). (ii) The synthesized land cover map Geodata LC, which was derived from the five global land cover maps by majority voting in terms of the life forms in Table 1.
If the accumulated number of votes for a pixel is more than three among the five global land cover maps, the final type of this pixel in Geodata LC is the majority class of the corresponding pixel in the five maps. If the number of votes is less than three among the compared maps, the land cover type of this pixel in Geodata LC is set to the corresponding class in MODIS LC.
The reclassified agreement map and Geodata LC were then overlaid to produce the land cover strata for sampling, and finally 14 strata were generated (two agreement levels × seven land cover classes).
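The construction of the strata can be sketched per pixel as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the agreement level is taken as an input from the agreement map of [29], and the vote threshold `min_votes=3` is an assumption, because the text leaves the case of exactly three votes open.

```python
from collections import Counter


def synthesize_class(votes, min_votes=3):
    """Majority life-form class among the five maps, else the MODIS LC class.

    votes: dict mapping map name -> life-form class for one pixel.
    min_votes: ASSUMED threshold for accepting the majority class.
    """
    top_class, count = Counter(votes.values()).most_common(1)[0]
    return top_class if count >= min_votes else votes["MODIS LC"]


def agreement_category(level):
    """Reclassify the 1-5 agreement level: > 3 is high, <= 3 is low."""
    return "high" if level > 3 else "low"


def stratum(level, votes):
    """Stratum label = (agreement category, Geodata LC class)."""
    return (agreement_category(level), synthesize_class(votes))


# Hypothetical pixels for illustration.
majority_votes = {"GLCC": "Trees", "UMd": "Trees", "GLC2000": "Grassland",
                  "MODIS LC": "Trees", "GlobCover": "Cropland"}
split_votes = {"GLCC": "Trees", "UMd": "Grassland", "GLC2000": "Cropland",
               "MODIS LC": "Grassland", "GlobCover": "Others"}
full_votes = {name: "Cropland" for name in majority_votes}

print(stratum(3, majority_votes))  # majority class is kept
print(stratum(2, split_votes))     # falls back to the MODIS LC class
print(stratum(5, full_votes))      # full agreement
```

Two agreement categories combined with seven life-form classes give the 14 possible stratum labels.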
Within the land cover strata, samples were selected using the following criteria:
(a) Sample size depends on the area of the stratum, in that larger sample sizes are allocated to larger strata.
(b) To increase the sample size in the "low agreement" strata, the sampling probability of "low agreement" is set to ten times that of "high agreement".
(c) The sample size of each land cover class was controlled under Geodata LC, so that no class is heavily oversampled while another is undersampled.
The sample units were 1 km, and samples were interpreted based on the maximum area rule in each sample unit.
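Criteria (a) and (b) can be read as an allocation proportional to stratum area, with the "low agreement" strata up-weighted by a factor of ten. The sketch below is one such reading under stated assumptions: the function name, the pixel counts and the total of 1100 samples are invented, and criterion (c) is not modeled because the text does not quantify it.

```python
def allocate_samples(stratum_pixels, total_samples, low_factor=10):
    """Allocate samples to strata in proportion to area x sampling factor.

    stratum_pixels: dict mapping (agreement, land cover class) -> pixel count.
    low_factor: ratio of the "low" to "high" agreement sampling probability.
    """
    scores = {
        key: pixels * (low_factor if key[0] == "low" else 1)
        for key, pixels in stratum_pixels.items()
    }
    total_score = sum(scores.values())
    # Rounding means the allocations may not always sum exactly to total_samples.
    return {key: round(total_samples * score / total_score)
            for key, score in scores.items()}


# Illustrative pixel counts only.
stratum_pixels = {
    ("high", "Trees"): 9000,
    ("low", "Trees"): 100,
    ("high", "Water"): 500,
    ("low", "Water"): 50,
}
allocation = allocate_samples(stratum_pixels, total_samples=1100)
print(allocation)
```

Here the "low agreement" Trees stratum receives 100 samples from 100 pixels' worth of area, a sampling fraction ten times that of the "high agreement" Trees stratum (900 samples for 9000 pixels).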
(2) Calculating weight
The inclusion probability of the samples differed between strata because of the above criteria. Although this ensures that enough samples are collected to represent each stratum, it would lead to bias when validating a land cover data set with samples collected from different strata. A weight is therefore calculated for each sample to correct the sampling bias (Equation (1)),
where W_s is the weight for sample s, N_t is the number of samples in stratum t, N_{s,t} is the total number of pixels in the stratum t in which sample s is located, and N_p is the total number of sampling pixels (here, the total number of pixels in the area of China).
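Equation (1) itself is not reproduced in this text. One form that is consistent with the symbol definitions above, offered here only as an assumption and not as the authors' published formula, divides the area share of a stratum equally among the samples drawn from it:

```latex
\begin{equation}
W_s = \frac{N_{s,t}}{N_p \, N_t}
\end{equation}
```

Under this assumed form, the weights of the N_t samples in a stratum sum to N_{s,t}/N_p, the area share of that stratum, so the weights of all samples sum to one when the strata cover the whole of China.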

Sample Labeling
In our study, we interpreted land cover types at samples (Figure 2) mainly on the basis of high resolution images, aided by auxiliary information such as pictures, charts, and records.

Figure 2. Interpreting land cover types at samples collected in a given area [49].
The interpretation of land cover types at samples collected in a given area was performed using the following reference data sources: (i) Landsat images from the three epochs (1990, 2000, and 2005) provided by the Global Land Survey (GLS) [50-54], (ii) the yearly NDVI (Normalized Difference Vegetation Index) variation profile derived from the eight-day composited MODIS Surface Reflectance products (MOD09A1) after cloud and shadow masking, and (iii) Google Maps and photos collected by Panoramio [55].
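For reference, NDVI is the normalized difference of near-infrared and red reflectance, NDVI = (NIR - Red) / (NIR + Red). The sketch below shows how a yearly profile might be assembled from per-composite reflectance pairs; the function names, the use of `None` for masked composites and the reflectance values are assumptions for illustration, not a description of the processing chain used in this study.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index from red and NIR reflectance."""
    denom = nir + red
    if denom == 0:
        return float("nan")
    return (nir - red) / denom


def ndvi_profile(composites):
    """NDVI per composite; None marks a composite masked as cloud or shadow.

    composites: list of (red, nir) reflectance pairs, or None when masked.
    """
    return [None if pair is None else ndvi(*pair) for pair in composites]


# Illustrative reflectance values only.
profile = ndvi_profile([(0.05, 0.45), None, (0.10, 0.30)])
print(profile)
```

A profile that rises and falls strongly over the year is the kind of evidence that can help an interpreter separate, for example, cropland from evergreen tree cover.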
Data were collected by 21 image analysts experienced in remote sensing image interpretation. Initially, 18 of the 21 analysts performed the interpretation, and each result was passed to another interpreter for cross-checking. The cross-checked interpretation results were then submitted to one of three quality controllers, who are familiar with the land cover types of China, for final checking.

Accuracy Estimation
The land cover samples collected on the online system were applied to the accuracy validation of the five global land cover maps and the national land cover map in China, and error matrices were calculated with the weighted samples. Details of the measures derived from the error matrix, including overall accuracy (OA), user's accuracy (UA), and producer's accuracy (PA), are available in [56].
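A weighted error matrix can be built by accumulating sample weights instead of counts. The following sketch shows the idea under stated assumptions: the function names and the sample weights are invented, and the exact estimators used in the study are those given in [56], which are not reproduced here.

```python
from collections import defaultdict


def weighted_error_matrix(samples):
    """Sum sample weights per (map class, reference class) cell.

    samples: iterable of (map_class, reference_class, weight) tuples.
    """
    matrix = defaultdict(float)
    for map_class, ref_class, weight in samples:
        matrix[(map_class, ref_class)] += weight
    return matrix


def weighted_overall_accuracy(matrix):
    """Share of total weight on the diagonal (map class == reference class)."""
    total = sum(matrix.values())
    agree = sum(w for (m, r), w in matrix.items() if m == r)
    return agree / total


def weighted_users_accuracy(matrix, land_class):
    """Diagonal weight divided by all weight mapped as land_class."""
    mapped = sum(w for (m, _), w in matrix.items() if m == land_class)
    return matrix.get((land_class, land_class), 0.0) / mapped if mapped else float("nan")


# Illustrative samples only: (map class, reference class, weight).
samples = [
    ("Trees", "Trees", 0.4),
    ("Trees", "Cropland", 0.1),
    ("Cropland", "Cropland", 0.3),
    ("Water", "Cropland", 0.2),
]
wmatrix = weighted_error_matrix(samples)
print(weighted_overall_accuracy(wmatrix), weighted_users_accuracy(wmatrix, "Trees"))
```

Because samples from the oversampled "low agreement" strata carry smaller weights, they no longer dominate the accuracy estimate.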

Validation of Six Land Cover Maps in China
Summaries of the error matrices between each map pair and the land cover samples are given in Tables 2-4. In descending order, the overall accuracies of the six land cover maps across China are GLCD-2005 (72.3%), MODIS LC (68.9%), GLC2000 (65.2%), GlobCover (57.7%), GLCC (57.2%), and UMd (48.6%). Both user's and producer's accuracies of the life forms vary considerably. Note that the accuracy of "Wetland" is not considered here. From the user's point of view, life form classes with an accuracy of more than 50% are (Figure 4): (i) "Trees" and "Others" in all of the six land cover maps, (ii) "Grassland" in GLCC, GLC2000, MODIS LC and GLCD-2005, (iii) "Cropland" in all of the six maps except GlobCover, (iv) "Water" in MODIS LC and GlobCover, and (v) "Urban" in GlobCover and GLCD-2005.
From the producer's point of view, life form classes with an accuracy of more than 50% are (Figure 5): (i) "Trees" in all of the six land cover maps except GLCC, (ii) "Grassland" in all of the six maps except GlobCover, (iii) "Cropland" in all of the six maps, (iv) "Water" and "Urban" only in GLCD-2005, and (v) "Others" in all of the six maps except GLCC and UMd. In general, the validation results indicate that the six land cover maps performed best in representing "Trees" and "Others" and well with "Grassland" and "Cropland", whereas they were problematic with "Water" and "Urban" over China.

Validation of GLCD-2005 in Geographical Regions of China
To understand the accuracy discrepancies of GLCD-2005 in different regions, we further assessed its accuracy across seven geographical regions of China (Figure 6): the Northeast, North, Northwest, East, Central, South, and Southwest. The land cover types of the samples in each region are shown in Table 5, and the validation results are presented in Figures 7-9. At the regional scale, the highest overall accuracy of GLCD-2005 is observed in the East of China (89.0%), followed by the Central (82.6%), Northeast (81.8%), North (80.8%), Northwest (79.2%) and South of China (77.5%), while GLCD-2005 has the lowest overall accuracy in the Southwest of China (46.3%). This is expected, as the Southwest is mainly mountainous terrain with much more complex landscapes and is dominated by highly fragmented land cover types. In terms of individual land cover classes, both "Trees" and "Cropland" in GLCD-2005 are represented quite well in the seven geographical regions, whereas "Water" is problematic, particularly in the Northwest of China, although the sample data are not thorough enough to give equal consideration to every land cover type in each region.

Discussion
This study reveals significant accuracy differences among the one national and five global land cover maps when compared in China. Moreover, for the five global land cover maps, the validation results show lower overall accuracies in China than those reported globally. Many factors could potentially affect the final validation results.
Attention should first be paid to factors embedded in each land cover map from the very beginning of its generation, mainly the remote sensors and the classification methods. Acquisition time cannot be the primary factor, since we assessed the accuracy of each land cover map pair against samples collected for the same year. There were substantial accuracy differences between GLCC and UMd, as well as between GlobCover and GLCD-2005. GLC2000 and MODIS LC were more accurate than the more recent GlobCover, even though significant improvements in sensor and data processing approach had been made. In addition, GLCD-2005 has the highest accuracy among the compared maps in this study, mainly because of the classification method and the high accuracy reference data sets adopted at the regional scale of China.
Furthermore, accuracy assessment of land cover maps has always been tightly tied to reference data, i.e., the "true" land cover type [57], and the quality and availability of adequate validation data sets are known to be the most limiting factors for land cover evaluation. In this study, although 3000 validation sites per year were used to assess the maps, the samples are not thorough enough to give equal consideration to every land cover type. It should also be mentioned that the validation sites, collected by stratified random sampling, were designed exclusively for this study, because the strata were based on the reclassified agreement map and Geodata LC.

Conclusions
Accuracy assessment has always been emphasized by researchers for its importance to scientific investigations and policy making based on land cover information [58]. Our study used a sampling-based labeling approach to assess the accuracies of five global land cover maps and a national land cover map (i.e., GLCC, UMd, GLC2000, MODIS LC, GlobCover, and GLCD-2005) in China, all of which were created to provide the ecosystem information needed for global change research but were based on different remotely sensed data and classification methods.
The rigorous validation of the six land cover maps over China suggests that a regionally focused land cover map would in fact be more accurate than extracting the same region from a globally produced map. This is supported by the overall accuracies of the six maps, which vary from 48.6% to 72.3%; in ascending order, the maps are UMd, GLCC, GlobCover, GLC2000, MODIS LC, and GLCD-2005. The differences in user's and producer's accuracies for each life form class among the six maps were substantial. However, all of the six land cover maps performed best in representing "Trees" and "Others" and well with "Grassland" and "Cropland", but were problematic with "Water" and "Urban" over China in general. For GLCD-2005, there are significant differences in accuracy across regions, ranging from 46.3% in the Southwest (lowest) to 89.0% in the East (highest) of China; "Trees" and "Cropland" are represented quite well in all seven geographical regions compared with the other five classes.
The approach to interpreting land cover types at samples presented here can be used to identify land cover types efficiently at a sample of locations distributed across a given area, and the accuracy of the interpreted land cover data is maximized by associating the sample locations with multiple sources of input, such as Landsat images acquired in different seasons and epochs, high-resolution images from Google Maps, and in situ ground photos from web sources [49]. The approach can also be adopted for continental and global land cover research activities. In future work, we will use this sampling-based labeling approach to investigate the accuracy of land cover maps produced recently and (or) at high resolution, such as GLCNMO (Global Land Cover by National Mapping Organizations) [59], FROM-GLC (Finer Resolution Observation and Monitoring of Global Land Cover) [44], and GlobeLand30 (Global land cover dataset with 30 m spatial resolution) [60].

Figure 1. Reclassified agreement map in China.

Figure 3 presents the distribution of land cover samples in China. A total of 9000 samples were collected, 3000 samples for each individual year (1990, 2000, and 2005). Subsequently, the GLCC and UMd pair, the GLC2000 and MODIS LC pair, and the GlobCover and GLCD-2005 pair were evaluated using the 3000 samples from 1990, 2000, and 2005, respectively.

Figure 3. The distribution of land cover samples collected in China.

Figure 4. User's accuracy of the six land cover maps with life forms in China.

Figure 5. Producer's accuracy of the six land cover maps with life forms in China.

Figure 6. Geographical locations of regions and provinces in China.

Figure 7. Overall accuracy for GLCD-2005 over seven geographical regions of China.

Figure 8. User's accuracy for GLCD-2005 over seven geographical regions of China.

Table 2. Summary of error matrices for GLCC (Global Land Cover Characterization) and UMd (University of Maryland land cover product) in China.

Table 3. Summary of error matrices for GLC2000 (Global Land Cover 2000 project data) and MODIS LC (Moderate Resolution Imaging Spectro-radiometer Land Cover product) in China.

Table 4. Summary of error matrices for GlobCover (GLOBCOVER land cover product) and the national land cover map GLCD-2005 (Geodata Land Cover Dataset for year 2005) in China.

Table 5. Land cover types of samples distributed in each region ("Y" means "type-included", "N" means "not type-included"). The number in brackets refers to the sum of samples in each region.