Spatial Consistency Assessments for Global Land-Cover Datasets: A Comparison among GLC2000, CCI LC, MCD12, GLOBCOVER and GLCNMO

Abstract: Numerous global-scale land-cover datasets have greatly contributed to the study of global environmental change and the sustainable management of natural resources. However, land-cover datasets inevitably experience information loss because of the nature of the uncertainty in the interpretation of remote-sensing images. Therefore, analyzing the spatial consistency of multi-source land-cover datasets on the global scale is important to maintain the consistency of time and consider the effects of land-cover changes on spatial consistency. In this study, we assess the spatial consistency of five land-cover datasets, namely, GLC2000, CCI LC, MCD12, GLOBCOVER and GLCNMO, at the global and continental scales through climate and elevation partitions. The influencing factors of surface conditions and data producers on the spatial inconsistency are discussed. The results show that the global overall consistency of the five datasets ranges from 49.2% to 67.63%. The spatial consistency of Europe is high, and the multi-year value is 66.57%. In addition, the overall consistency in the EF climatic zone is very high, around 95%. The surface conditions and data producers affect the spatial consistency of land-cover datasets to different degrees. CCI LC and GLCNMO (2013) have the highest overall consistencies on the global scale, reaching 67.63%. Generally, the consistency of these five global land-cover datasets is relatively low, increasing the difficulty of satisfying the needs of high-precision land-surface-process simulations.


Introduction
Land use and land cover change (LUCC) is a cause and consequence of global environmental change [1,2]. Changes in land use and land cover greatly alter ecological processes and land resource management, e.g., energy fluxes of the land surface, carbon sequestration, water balance, as well as land use policy decisions [3][4][5][6]. At the same time, land use/land cover change is one of the most essential inputs for many land-surface models and is the basis for spatial inferences [7][8][9][10][11]. Compared to traditional methods (e.g., field surveys), remote sensing can provide large-scale and long time-series information of the Earth's surface [12]. Before the 1990s, the most commonly used global land-cover datasets included the Matthews, Olson and Watts, and the Wilson and Henderson-Sellers global databases, whose very coarse resolution (typically 1° latitude by 1° longitude) makes it difficult for them to meet the needs of the global-change science community [13][14][15]. As early as 1995, the International Geosphere-Biosphere Programme (IGBP) and the International Human Dimensions Programme on [...]
Remote Sens. 2018, 10, 1846
[...] and individual continents and (2) identify spatially consistent gradient characteristics of datasets for elevation partitions and climatic zones.

Data Resources and Preprocessing
Before comparing the results, we must understand the similarities and differences among the five datasets (GLC2000, CCI LC, MCD12, GLOBCOVER and GLCNMO; see Table 1). The land-cover datasets are extracted according to the boundaries of continents, climate zones and elevation partitions. After extraction, the datasets are reprojected to the continental-scale Albers equal-area projection. The five datasets range in resolution from 300 m to 1 km, so resampling is required for the comparison, and the spatial resolution is adjusted to 1 km by the nearest-neighbor method. Reclassification is the other important step because the five datasets have different classification schemes. For example, GLOBCOVER (2005/2009) uses the FAO LCCS classification scheme with 22 classes, and MCD12 uses the IGBP classification scheme with 17 classes. To facilitate the comparison, the original classification codes must be normalized, and detailed land-cover classes must be merged to form a set of unified and generalized conversion criteria. The conversion relationship between the original schemes of the land-cover datasets and the target scheme is provided in Table 2. All the land-cover datasets are converted into a consistent classification system with nine types (see Table 2). The ocean and Antarctica are not considered in this comparison. Furthermore, the DEM data are resampled to 1-km resolution from the SRTM 90-m Digital Elevation Database v4.1, and the climate-zone data are reclassified based on world maps of the Köppen-Geiger climate classification [44].
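The two preprocessing steps described above, nearest-neighbor resampling and reclassification to a common legend, can be illustrated with a minimal NumPy sketch. It is not the workflow used in this study: the function names, the assumption that class codes fit in the range 0 to 255, and the example mapping are all hypothetical (the actual conversion rules are those of Table 2).

```python
import numpy as np

def resample_nearest(grid, factor):
    """Nearest-neighbor resampling to a coarser grid.

    Each output cell takes the value of the input cell that contains the
    output cell centre. `factor` is output cell size / input cell size
    (for example 1000 / 300 when going from 300 m to 1 km).
    """
    grid = np.asarray(grid)
    rows = int(round(grid.shape[0] / factor))
    cols = int(round(grid.shape[1] / factor))
    r_idx = np.minimum(((np.arange(rows) + 0.5) * factor).astype(int),
                       grid.shape[0] - 1)
    c_idx = np.minimum(((np.arange(cols) + 0.5) * factor).astype(int),
                       grid.shape[1] - 1)
    return grid[np.ix_(r_idx, c_idx)]

def reclassify(grid, mapping, nodata=0):
    """Map original class codes to target codes through a lookup table.

    Assumes (hypothetically) that all original codes lie in 0..255.
    Codes missing from `mapping` become `nodata`.
    """
    lut = np.full(256, nodata, dtype=np.uint8)
    for source_code, target_code in mapping.items():
        lut[source_code] = target_code
    return lut[np.asarray(grid)]

# Hypothetical mapping for illustration only, NOT the content of Table 2.
example_mapping = {1: 1, 2: 1, 10: 2, 12: 2, 16: 7}

fine = np.array([[1, 1, 2, 2],
                 [1, 1, 2, 2],
                 [10, 10, 16, 16],
                 [10, 10, 16, 16]])
coarse = resample_nearest(fine, factor=2)
unified = reclassify(coarse, example_mapping)
```

Nearest-neighbor resampling is the natural choice for categorical rasters because it never creates class codes that do not exist in the input, unlike bilinear or cubic interpolation.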

Category Composition Similarity Analysis
For each dataset, we count and summarize the area of each land-cover type and calculate the correlation coefficient of the area series between each pair of datasets, which evaluates the similarity of the quantitative composition of the land-cover datasets. Hu also conducted a category composition similarity analysis in Europe for GLOBCOVER2005, GLOBCOVER2009, GLC2000 and MODIS2000 [45]. The category composition similarity is defined as

$$R_i = \frac{\sum_{k=1}^{9} (X_k - \bar{X})(Y_k - \bar{Y})}{\sqrt{\sum_{k=1}^{9} (X_k - \bar{X})^2 \, \sum_{k=1}^{9} (Y_k - \bar{Y})^2}}$$

where $R_i$ is the correlation coefficient of the i-th combination of land-cover datasets, k stands for the k-th land-cover type, $X_k$ represents the area of land-cover type k in dataset X, $Y_k$ represents the area of land-cover type k in dataset Y, $\bar{X}$ stands for the average of all nine types of land areas in dataset X, and $\bar{Y}$ stands for the average of all nine types of land areas in dataset Y.
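As an illustration of this measure, the following sketch computes the correlation coefficient between the per-class area series of two datasets. It assumes the coefficient is the ordinary Pearson correlation, which matches the symbol definitions given here; the function name and the area values are hypothetical and are not results from this study.

```python
import numpy as np

def composition_similarity(areas_x, areas_y):
    """Pearson correlation between two per-class area series.

    areas_x[k] and areas_y[k] are the areas of land-cover type k
    in datasets X and Y, respectively.
    """
    x = np.asarray(areas_x, dtype=float)
    y = np.asarray(areas_y, dtype=float)
    dx = x - x.mean()  # deviations from the mean class area in X
    dy = y - y.mean()  # deviations from the mean class area in Y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical areas for nine land-cover types (arbitrary units).
areas_a = [40.0, 15.0, 12.0, 9.0, 2.0, 3.0, 0.5, 17.0, 1.5]
areas_b = [38.0, 18.0, 10.0, 11.0, 1.0, 3.5, 0.7, 16.0, 1.8]
similarity = composition_similarity(areas_a, areas_b)
```

Because the measure only compares area totals, two datasets can score close to 1 while disagreeing on where each class is located, which is why the per-pixel analyses in the following sections are needed.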

Overall Consistency and Category Consistency Analysis
Category composition similarity analysis can assess the similarities in the composition of areas among the land-cover datasets. However, it cannot easily show how strongly the same land-cover type is spatially confused among different datasets. Thus, we utilize a confusion matrix to obtain the corresponding per-pixel information for each pair of datasets; the matrix describes how the land use types of one dataset map onto those of the other.
Currently, the confusion matrix (or error matrix) is the most commonly used tool for calculating accuracy indices of land-cover data and occupies the core position among accuracy-evaluation methods [46,47]. The confusion matrix can be used to calculate the overall accuracy, user's accuracy, producer's accuracy, Kappa coefficient, etc. [48,49]. Following the meaning of overall accuracy (OA) and producer's accuracy (PA), this study uses two indicators, namely, the overall consistency and the category consistency, to characterize dataset consistency. The formulas are as follows:

$$\text{Overall consistency} = \frac{\sum_{i=1}^{n} X_{ii}}{N} \times 100\%$$

$$\text{Category consistency}_i = \frac{X_{ii}}{X_{+i}} \times 100\%$$

where $X_{ii}$ refers to the number of class i pixels that were correctly classified, $X_{+i}$ stands for the number of class i pixels in the reference data, n is the number of land-cover classes, and N is the total number of pixels. This analysis is applied to the entire globe, the continents, and the different elevation zones and climate zones. Based on the elevation division scheme of the geomorphology unit, 200 m, 500 m, 1000 m, and 3500 m are chosen as the classification thresholds. The spatial consistency of each sub-region is also studied according to the first two letters of the Köppen climate zone. The climate zones include 13 areas: Af for equatorial rainforest and fully humid, Am for equatorial monsoon, Aw for equatorial savannah, BW for desert climate, BS for steppe climate, Cs for warm temperate climate with dry summer, Cw for warm temperate climate with dry winter, Cf for warm temperate climate and fully humid, Ds for snow climate with dry summer, Dw for snow climate with dry winter, Df for snow climate and fully humid, ET for tundra climate, and EF for frost climate.
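The two indicators can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the code used in this study: classes are assumed to be coded 1 to n, the first dataset is treated as the reference, and the function names are hypothetical.

```python
import numpy as np

def confusion_matrix(reference, compared, n_classes):
    """Cross-tabulate two co-registered class rasters coded 1..n_classes.

    Rows are classes in the reference dataset, columns are classes in the
    compared dataset.
    """
    ref = np.asarray(reference).ravel() - 1
    cmp_ = np.asarray(compared).ravel() - 1
    flat_index = ref * n_classes + cmp_
    counts = np.bincount(flat_index, minlength=n_classes ** 2)
    return counts.reshape(n_classes, n_classes)

def overall_consistency(matrix):
    """Percentage of pixels with the same class in both datasets."""
    return 100.0 * np.trace(matrix) / matrix.sum()

def category_consistency(matrix):
    """Per-class agreement relative to the class total in the reference.

    Classes absent from the reference are returned as NaN.
    """
    diagonal = np.diag(matrix).astype(float)
    reference_totals = matrix.sum(axis=1).astype(float)
    result = np.full(diagonal.shape, np.nan)
    np.divide(100.0 * diagonal, reference_totals,
              out=result, where=reference_totals > 0)
    return result

ref_map = np.array([1, 1, 2, 2, 3])
cmp_map = np.array([1, 2, 2, 2, 1])
cm = confusion_matrix(ref_map, cmp_map, n_classes=3)
oc = overall_consistency(cm)
cc = category_consistency(cm)
```

In this toy case three of five pixels agree, so the overall consistency is 60%, while the category consistency is 50%, 100% and 0% for classes 1, 2 and 3. Note that the category consistency is not symmetric: swapping the reference and compared datasets changes the denominators.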

Spatial Multiple-Consistency Analysis
The aim of spatial multiple-consistency analysis is to reveal similarities in the spatial distribution of different classes among three datasets. McCallum assessed the percentage agreement of four global land-cover datasets (IGBP, UMD, GLC2000 and MODIS) by this method [32]. We adopt the method of overlaying maps to directly express the spatial multiple-consistency of specific land-cover types. In particular, we judge a pixel as having spatial multiple-consistency if all three datasets assign it to the same land use type. The formula is as follows:

$$\text{Spatial multiple-consistency}_i = \frac{T_i}{(L_i + M_i + N_i)/3} \times 100\%$$

where $L_i$, $M_i$, and $N_i$ are the number of pixels of the i-th land-cover category in the land-cover datasets L, M, and N, respectively, and $T_i$ is the number of pixels in the i-th land-cover category that the three datasets consistently determined.
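The overlay logic can be sketched as follows. The counting of pixels on which all three datasets agree follows the description above; the normalization by the mean of the three per-class pixel counts is an assumption made for this sketch, and the function name is hypothetical.

```python
import numpy as np

def multiple_consistency(map_l, map_m, map_n, n_classes):
    """Three-way agreement per class, in percent.

    T_i is the number of pixels that all three maps assign to class i.
    ASSUMPTION: T_i is normalized by the mean of the class-i pixel counts
    of the three maps. Classes absent from all maps are returned as NaN.
    """
    l = np.asarray(map_l).ravel()
    m = np.asarray(map_m).ravel()
    n = np.asarray(map_n).ravel()
    all_agree = (l == m) & (m == n)
    result = {}
    for c in range(1, n_classes + 1):
        t_c = np.count_nonzero(all_agree & (l == c))
        mean_count = (np.count_nonzero(l == c)
                      + np.count_nonzero(m == c)
                      + np.count_nonzero(n == c)) / 3.0
        result[c] = 100.0 * t_c / mean_count if mean_count > 0 else float("nan")
    return result

map_l = np.array([1, 1, 2, 2])
map_m = np.array([1, 2, 2, 2])
map_n = np.array([1, 1, 2, 1])
agreement = multiple_consistency(map_l, map_m, map_n, n_classes=2)
```

The boolean `all_agree` mask is also the layer one would map to show where the consistent pixels of each class are located.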

Weighted Complexity of the Land Cover
We refer to the complexity of the land-cover types in a given area as the weighted complexity of land cover. Using the focal statistics tool in ArcGIS 10.2, the number of different land-cover types within the rectangular analysis window (10 × 10 pixels) is taken as the land-cover complexity of the central pixel of the window. We study a total of nine different land use types, so the land-cover complexity ranges from 1 to 9. In other words, when the land-cover complexity of a pixel is x, x different types of land cover exist within the 10 × 10 rectangular area centered on this pixel. The formula is as follows:

$$\text{Weighted complexity} = \frac{\sum_{x=1}^{9} x \, S_x}{\sum_{x=1}^{9} S_x}$$

where x is the land-cover complexity and $S_x$ is the area with land-cover complexity x.
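A NumPy approximation of this procedure is sketched below. It is not the ArcGIS focal statistics tool: edge pixels are handled by replicating the border values, the placement of an even-sized window around its "central" pixel is a choice made for this sketch, pixels are assumed to have equal area, and the weighted complexity is taken as the area-weighted mean of the per-pixel complexity. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def land_cover_complexity(grid, window=10, n_classes=9):
    """Number of distinct classes (coded 1..n_classes) in a moving window.

    ASSUMPTION: borders are padded by edge replication; for an even window
    the extra row/column is placed before the central pixel.
    """
    grid = np.asarray(grid)
    before = window // 2
    after = window - 1 - before
    padded = np.pad(grid, ((before, after), (before, after)), mode="edge")
    windows = sliding_window_view(padded, (window, window))
    complexity = np.zeros(grid.shape, dtype=np.int64)
    for c in range(1, n_classes + 1):
        # Add 1 wherever class c occurs at least once in the window.
        complexity += (windows == c).any(axis=(-2, -1))
    return complexity

def weighted_complexity(complexity, n_classes=9):
    """Area-weighted mean complexity: sum(x * S_x) / sum(S_x).

    S_x is measured in pixel counts (equal-area pixels assumed).
    """
    s = np.bincount(np.asarray(complexity).ravel(), minlength=n_classes + 1)
    x = np.arange(s.size)
    return float((x * s).sum() / s.sum())

# Toy raster: class 1 on the left half, class 2 on the right half.
toy = np.array([[1, 1, 2, 2]] * 4)
toy_complexity = land_cover_complexity(toy, window=3)
toy_weighted = weighted_complexity(toy_complexity)
```

For the toy raster, only the two columns next to the class boundary see both classes, so the per-row complexity is 1, 2, 2, 1 and the weighted complexity is 1.5.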

Category Composition Similarity at the Global and Continental Scales
At the global scale, the correlation coefficients of the compositional similarity among the five datasets are quite different, ranging from 69% to 97%, and the inter-annual variation is small. Obvious intercontinental differences were observed according to the results of the intercontinental comparison (see Figure S1).

Overall Consistency Differences at the Global and Continental Scales
We calculate the confusion matrix of the globe and six continents to obtain the overall consistency of the datasets. Figure 1 indicates the overall inter-annual consistency of the inter-comparison of five datasets for the globe and six continents. On the global scale, the overall consistency of all the datasets is between 49.2% and 67.6%. On the continental scale, the overall consistency of some datasets presents obvious intercontinental and inter-annual differences.
As for the multi-year averages of the overall consistency on the global scale, the highest value, 66.46%, was found between CCI LC and GLC2000. The second highest was found between CCI LC and GLOBCOVER (65.73%). The consistency between CCI LC and GLCNMO was also relatively high (63.26%). The overall consistency between the other land-cover datasets was always lower than 60%. The comparison between MCD12 and GLOBCOVER was the worst: only 49.40%. The global overall consistency of the comparisons involving GLCNMO increased markedly with time: the result of the comparison with CCI LC in 2013 was 10 percentage points higher than that of 2003, while the result of the comparison with MCD12 in 2013 was eight percentage points higher than that of 2003.
As for the multi-year averages of the overall consistency on the continental scale, the comparison of CCI LC with GLC2000 and CCI LC with GLOBCOVER basically exceeded 60% on all continents. However, the comparison of CCI LC with GLCNMO, CCI LC with MCD12, MCD12 with GLCNMO and MCD12 with GLOBCOVER showed that the overall consistency of Oceania was much lower than that of the other continents, with multi-year averages of 47.54%, 25.71%, 39.14% and 21.04%, respectively. Meanwhile, the overall consistency of Europe was slightly higher than that of other continents in most comparisons.
Apart from a sharp increase in the 2003 comparison, the inter-annual variation in the overall consistency between CCI LC and MCD12 was relatively stable, which suggests that the multi-phase CCI LC and MCD12 datasets were interpreted very stably over time. However, the overall consistency of the comparisons between other datasets showed varying degrees of inter-annual change. The overall consistency between MCD12 and GLOBCOVER in Europe showed significant improvement, with an increase of nearly eight percentage points. The overall consistency between CCI LC and GLCNMO and between MCD12 and GLCNMO in most regions increased each year.

Category Consistency Difference at the Global and Continental Scales
These datasets spanned several years, so we calculated the multi-year average of the category consistency of each land use type. The results are shown in Figure 2. On the global scale, the category consistency of forest and bare land was relatively high, and the average results among the datasets were 73.24% and 72.74%, respectively. The category consistency of shrub and wetland was relatively low, and the average results among the datasets were 28.48% and 20.92%, respectively. The spectral characteristics of shrubs and wetlands are not obvious, and the spatial distribution of shrubs and wetlands can easily interlace with other land objects. Therefore, the category consistency of these two land use types was lower in all the dataset comparisons.
On the continental scale, except for the comparison between MCD12 and CCI LC in Africa, the category consistency of forest in all the other datasets was higher than 60%. Forests can be easily identified because their spectral and spatial texture features are clear and easy to distinguish from other land use types, and these regions have a wide and continuous distribution on the global scale. In terms of construction land, the category consistency of each dataset was relatively low, and the degree of recognition of construction land by CCI LC and MCD12 was slightly better than for the other datasets. The identification of construction land was not high, which may have been related to the complex spectral features of construction land and the difficulty in extracting any features.
Furthermore, the category consistency of bare land in Africa and permanent ice and snow in North America was extremely high, close to or exceeding 90%, mainly because of the continuous and concentrated distribution of bare land and permanent ice and snow cover in the Sahara Desert and Greenland, respectively.

Spatial Multiple-Consistency at the Global and Continental Scales
Spatial multiple-consistency occurs if three datasets determine that a certain pixel is the same land use type. Three such datasets exhibited this feature for 2003, 2005, 2008, 2009 and 2013. More precisely, we used the GLCNMO, CCI LC, and MCD12 datasets to compare the spatial multiple-consistency. Figure 3 illustrates the spatial multiple-consistency of nine land use types on six continents and across the globe and describes the characteristics of the spatial distribution of these land use types.
The spatial multiple-consistency of forest was high in South America, Europe, and Asia (higher than 75%, 50%, and 50%, respectively), with consistent forest widely distributed in the Amazon Plain, western Pampas, Southeast Asia, the Korean Peninsula, and the middle and lower reaches of the Yangtze River in Asia. The spatial multiple-consistency of cropland was high in Europe, Asia, and Oceania, with values higher than 57.5%, 50%, and 50%, respectively. The spatial multiple-consistency of construction land was low across all six continents: less than 50%. At the same time, the spatial multiple-consistency of grassland, wetland, and shrub was also low on all six continents, with values less than 25%.
The spatial multiple-consistency of bare land varied widely across the continents, with a high consistency of more than 90% in Africa, while the values in Europe, Oceania, and North America were close to 0. Consistent bare land was mainly concentrated in the Sahara Desert, Arabian Peninsula, Iranian Plateau, Taklamakan Desert, and Mongolian Plateau. The spatial multiple-consistency of water on each continent except Africa and Oceania was approximately 50%, and the distribution of consistent water pixels basically matched the world's major lakes and rivers. Permanent ice and snow was highly spatially consistent in North America, where the values of all five time nodes were more than 75%; it was mainly distributed in the high latitudes of the Northern Hemisphere, such as Greenland and Svalbard. On the global scale, water and forest had high spatial multiple-consistency, often exceeding 50%, whereas grassland, shrub and wetland were below 20%. Generally, a large number of inconsistent areas were present in central and southern Africa, the northern high latitudes outside Greenland, and most of Australia. Subsequent global land-cover datasets should focus on these areas.

Overall Consistency Differences of Continental Elevations
The overall consistency of continental elevation greatly fluctuated with variations in elevation, but the fluctuation trend of each dataset was close. Twenty-four elevation gradient intervals were observed across the six continents. Six datasets exhibited consistent characteristics in two thirds of the intervals. However, at least five datasets had consistent features in 22 intervals, or 91.67% of the total (see Figure S2). Figure 4 illustrates the gradient characteristics of the category consistency with elevation changes for the nine land use types on all six continents. The results show that the datasets had relatively poor consistency in the description of category consistency of the elevation gradient features: four comparison results showed that six datasets consistently characterized the gradient features, and 10 comparison features showed that five datasets presented similar gradient features. That is to say, the comparison results of at least five datasets appeared to be consistent in only one quarter.
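Per-zone values of this kind can be obtained by binning a co-registered DEM with the elevation thresholds from the methods (200 m, 500 m, 1000 m, and 3500 m) and computing the share of agreeing pixels in each bin. The sketch below is illustrative only: the function name is hypothetical, and assigning a pixel that lies exactly on a threshold to the higher zone is an assumption.

```python
import numpy as np

ELEVATION_BREAKS_M = [200, 500, 1000, 3500]

def overall_consistency_by_zone(map_a, map_b, elevation,
                                breaks=ELEVATION_BREAKS_M):
    """Overall consistency (percent of agreeing pixels) per elevation zone.

    Zone 0 is below the first break, zone len(breaks) is at or above the
    last break. Zones without pixels are omitted from the result.
    """
    a = np.asarray(map_a).ravel()
    b = np.asarray(map_b).ravel()
    zones = np.digitize(np.asarray(elevation).ravel(), breaks)
    result = {}
    for zone in range(len(breaks) + 1):
        mask = zones == zone
        if mask.any():
            agreeing = np.count_nonzero(a[mask] == b[mask])
            result[zone] = 100.0 * agreeing / np.count_nonzero(mask)
    return result

elev = np.array([100, 200, 600, 2000, 4000, 4000])
classes_a = np.array([1, 1, 1, 1, 1, 2])
classes_b = np.array([1, 2, 1, 1, 1, 1])
by_zone = overall_consistency_by_zone(classes_a, classes_b, elev)
```

The same masking pattern applies to climate zones: replacing the binned DEM with a raster of Köppen zone codes yields one overall consistency value per climatic zone.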

Category Consistency Differences of Continental Elevations
Africa had three types of land-type gradients with similar characteristics. For Asia, the spatial consistency gradients of wetland were similar, and the spatial consistency of water and permanent ice and snow showed an obvious fluctuation with increasing elevation. The gradient characteristics of the spatial consistency of cropland, water and wetland in Europe were similar. In Oceania, the consistency gradients of construction land and permanent ice and snow were similar, and the characteristics of the consistency gradients of grassland and permanent ice and snow in South America were similar. More than 75% of the results showed that the consistency above 3500 m was lower than that for 1000-3500 m, whose value was 57.40%, while the values of the other two intervals were 54.31% and 43.21% for all six continents.

Overall Consistency Differences of Climatic Zones
Figure 5 shows the overall consistency of the comparison results among the various climate sub-regions, which were divided into five categories based on the natural-breaks method.
The six comparison results exhibited consistency in the first, second and fifth categories. The overall consistency of the EF region was significantly higher than that of other regions: close to 95%. The overall consistency of Af, Am and Bw was also relatively high, with averages of 79.21%, 74.63% and 73.60%, respectively. Meanwhile, the six comparison results showed that the overall consistency in Ds and ET was relatively poor, with averages of 45.44% and 41.92%, respectively. However, the overall consistency of the other climatic zones in the six comparison results greatly fluctuated.
We calculated the standard deviation of the spatial consistency of each climatic zone to quantify the differences in the overall consistency among the climate zones. The average standard deviation of the six comparisons was 15.33, which indicates a certain degree of spatial-consistency difference among the climatic zones. Excluding the comparison of CCI LC with MCD12, the comparisons involving MCD12 had an average standard deviation of 17.98, which was significantly higher than that of the comparisons involving CCI LC, whose average was 12.81. In short, large differences were present in the overall consistency of the climatic zones. According to the comparison results, the overall consistency of the EF, Af, Am and Bw climatic zones was higher, while that of the Ds and ET climatic zones was lower.
Figure 5. Overall consistency of climatic zones. Different letters represent different climatic zones including 13 areas: Af for equatorial rainforest and fully humid, Am for equatorial monsoon, Aw for equatorial savannah, BW for desert climate, BS for steppe climate, Cs for warm temperate climate with dry summer, Cw for warm temperate climate with dry winter, Cf for warm temperate climate and fully humid, Ds for snow climate with dry summer, Dw for snow climate with dry winter, Df for snow climate and fully humid, ET for tundra climate, and EF for frost climate. Different colors or numbers represent different classification results from the natural-breaks method: the lower the number, the higher the overall consistency.

Category Consistency Differences of Climatic Zones
Factors such as the temperature and precipitation created large differences in the land use types' composition in each climatic zone, which affected the category consistency (see Figure S3).

Advantages of Temporal Consistency and Geographical Zoning on Spatial Assessment
According to the above comparison and analyses, the spatial consistency among these datasets was not high. This finding is not surprising and matches earlier conclusions by DeFries, Hansen and Latifovic [50][51][52]. This finding could either be real or simply be caused by differences in sensors, temporal periods, original classification algorithms, or classification schemes [53]. In addition, we have some thoughts on the role of maintaining temporal consistency and basing the analysis on geographical zoning research to consistently compare datasets.
First, maintaining temporal consistency is useful to reduce the effect of land-cover changes on the comparison. Scholars found that land-cover changes between the acquisition dates of the datasets can affect such comparisons [29,54], and the purpose of maintaining temporal consistency is to abandon the original cross-time comparison and reduce the effect of land-cover changes on the comparison. According to previous research, some land use types, such as water and cropland, have greatly changed on the global and continental scales, and values on these scales may hide heterogeneities among regions [55,56]. In other words, land-cover changes can be dramatic in local areas; they may not be the primary factor but are still critical in comparisons.
Second, by maintaining temporal consistency, we can observe whether the interpretation ability of the same set of datasets varies between years. According to the comparisons between GLOBCOVER and CCI LC and between GLOBCOVER and MCD12 in Figure 1, the consistency in Europe greatly improved, and the comparison between CCI LC and MCD12 in 2003 significantly deviated from the other years. In other words, the interpretation of multi-phase datasets may not be stable over time, and agreement found for a certain area at a certain time node does not imply agreement at other time nodes or for all the datasets.
Third, we chose to compare the spatial consistency in geography-based schemes, such as elevation and climatic zones, rather than administrative areas (continental or national scales). Compared to administrative areas, the composition of geography-based schemes is more regular, and we could find features of consistency in homogeneous or heterogeneous environments. The consistency of many datasets above 3500 m declined somewhat compared to the consistency at elevations between 1000 and 3500 m. The consistency of different climate zones was also somewhat different, suggesting that the input data source, classification scheme, and classification methodology were all optimized to the features of regional heterogeneity.

Inconsistencies from Surface Conditions
Temperature and precipitation affect the composition of the Earth's surface. Compared to continents, the composition of surface conditions in different climatic zones can be more regular, which makes climatic zones suitable units for examining how surface conditions lead to inconsistencies in land-cover datasets. This study used the weighted complexity of land cover to measure the land pattern in each climatic zone.
This measure helps to explore the relationship between spatial consistency and the weighted complexity of land cover. Figure 6 shows the weighted complexity of the land cover in each climatic zone and the regression equations between the overall consistency and the weighted complexity of land cover in different climatic zones. Based on the results, the weighted complexity of the land cover in climatic zones with relatively high overall consistency is relatively low. As the weighted complexity of the land cover increases, the overall consistency declines significantly and approximately linearly.
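A regression of this kind can be fitted with ordinary least squares, as in the sketch below. The function name is hypothetical, and the input values are synthetic numbers chosen only to make the example runnable; they are not the values behind Figure 6.

```python
import numpy as np

def fit_line(complexity, consistency):
    """Least-squares line: consistency = slope * complexity + intercept.

    Returns the slope, the intercept and the coefficient of determination.
    """
    x = np.asarray(complexity, dtype=float)
    y = np.asarray(consistency, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    predicted = slope * x + intercept
    ss_res = ((y - predicted) ** 2).sum()   # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()    # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    return float(slope), float(intercept), float(r_squared)

# SYNTHETIC example: one (complexity, consistency) pair per climatic zone.
zone_complexity = np.array([1.0, 2.0, 3.0, 4.0])
zone_consistency = 100.0 - 10.0 * zone_complexity
slope, intercept, r_squared = fit_line(zone_complexity, zone_consistency)
```

A negative slope corresponds to the pattern described above: zones with a more complex land-cover mosaic tend to show lower agreement between datasets.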
Figure 6. The relationship between the weighted complexity of land cover from surface conditions and spatial consistency. The left panel shows the weighted complexity of the land cover in each climatic zone, with red indicating a higher overall consistency of the climatic zone; the letters denote the 13 climatic zones: Af for equatorial rainforest and fully humid, Am for equatorial monsoon, Aw for equatorial savannah, BW for desert climate, BS for steppe climate, Cs for warm temperate climate with dry summer, Cw for warm temperate climate with dry winter, Cf for warm temperate climate and fully humid, Ds for snow climate with dry summer, Dw for snow climate with dry winter, Df for snow climate and fully humid, ET for tundra climate, and EF for frost climate. The right panel shows the regression equations between the overall consistency and the weighted complexity of the land cover for different climatic zones. The two datasets before "/" are used for the overall consistency analysis, and the dataset in "()" is used for calculating the weighted complexity of the land cover.
In the EF zone, the average monthly temperature of the warmest month is below 0 °C, and snow remains accumulated throughout the year, resulting in large snow-covered areas (around 95% of the area). The spectral characteristics of snow and ice are distinct and easy to separate from other features, so the overall consistency is very high: around 95%. The average overall consistency of Af was 79.21%. Af is hot and rainy throughout the year, with an average monthly rainfall of more than 60 mm, which supports the growth and widespread distribution of forests. The spectral and spatial texture features of forests are clear and easy to distinguish from those of other types. Apart from a small amount of mixed forest and cropland, few mixed types were present, so the overall consistency was high. The average overall consistency of Am was 74.63%. Am has dry months during winter, when the monthly precipitation may be less than 60 mm, but the overall rainfall is sufficient. Trees grow densely and are widely distributed. Apart from a small amount of forest mixed with cropland, mixed areas were also relatively scarce, so the overall consistency was high. The average overall consistency of BW was 73.60%. Little rain falls throughout the year, and this zone contains areas where bare land is highly concentrated, such as the Sahara Desert and the Arabian Plateau. Bare land occupies around 70% of the area, so the overall consistency was relatively high. In summary, the surface conditions caused inconsistencies to some extent: the more complex the surface, the greater the inconsistency.

Inconsistencies from Dataset Producers
The dataset-development rules established by dataset producers, including the classification schemes, classification methods, definitions of land-use types, spatial resolution, selected time nodes of images, and international participation, may also result in inconsistencies among land-cover datasets.
Differences in the classification schemes and subordinate definitions of land-use types among datasets are one of the major factors that create deviations in classification results, and they also make the rigorous comparison and synergistic use of different maps challenging, if not impossible [25,54,57,58]. The five datasets used in this study are based on two land-classification systems: LCCS and IGBP. Each dataset defines its land-use types differently. For example, shrubs are divided into evergreen shrubs and deciduous shrubs in GLC2000 and CCI LC; however, both are merged into a single shrub class in GLOBCOVER 2005/2009. Forests and other natural vegetation mosaics are subdivided into forest/shrub/grass mosaics in GLC2000 and into grassland/woodland/shrub mosaics in GLOBCOVER 2005/2009 and CCI LC. The classification scheme of a dataset can therefore affect consistency, and the definitions of land-use types within these schemes can create differences in remote-sensing interpretation. For example, MCD12's height threshold for separating forests from shrubs is 2 m, while that in GLOBCOVER is 5 m and that in GLC2000 is 3 m. Therefore, the classification principles of these datasets require further investigation, and the definitions of land-use types require further revision to reduce the uncertainty in the development of these classification schemes.
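The effect of these differing height thresholds can be illustrated with a short sketch. It applies only the height criterion quoted above and assumes that vegetation at or above the threshold is labelled forest; real legends also rely on canopy cover and other criteria, so the function `woody_class` is a deliberate simplification and not any producer's actual rule.

```python
# Forest/shrub height thresholds in metres, as quoted in the text.
HEIGHT_THRESHOLD_M = {"MCD12": 2.0, "GLOBCOVER": 5.0, "GLC2000": 3.0}

def woody_class(dataset, height_m):
    """Label woody vegetation using only the height criterion.

    Simplifying assumption: vegetation at or above the dataset's
    threshold is 'forest', otherwise 'shrub'. Canopy cover and other
    legend criteria are ignored.
    """
    return "forest" if height_m >= HEIGHT_THRESHOLD_M[dataset] else "shrub"

# The same hypothetical 4 m stand is labelled differently across legends.
labels = {name: woody_class(name, 4.0) for name in HEIGHT_THRESHOLD_M}
print(labels)
```

Under this simplification, a 4 m stand counts as forest under the MCD12 and GLC2000 thresholds but as shrub under the GLOBCOVER threshold, so the three maps would disagree on that pixel even with identical input imagery.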
In addition, maps with more detailed classes are often transformed into generalized schemes with fewer classes in dataset comparison, which naturally removes the ability to describe detailed LC characteristics [59].
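A minimal sketch of such a legend generalization follows, using a lookup table that maps detailed class codes to a coarser legend. The class codes, the `LOOKUP` table, and the `generalize` helper are hypothetical and do not reproduce the harmonization used in this study.

```python
import numpy as np

# Hypothetical detailed codes: 11 = evergreen shrub, 12 = deciduous shrub,
# 20 = grassland. Generalized codes: 1 = shrub, 2 = grassland.
LOOKUP = {11: 1, 12: 1, 20: 2}

def generalize(detailed, lookup, nodata=0):
    """Map detailed class codes to generalized ones; unmapped codes become nodata."""
    detailed = np.asarray(detailed)
    out = np.full(detailed.shape, nodata, dtype=np.int32)
    for source_code, target_code in lookup.items():
        out[detailed == source_code] = target_code
    return out

detailed_map = np.array([[11, 12],
                         [20, 99]])   # 99 has no entry in the lookup table
general_map = generalize(detailed_map, LOOKUP)
print(general_map)
```

After generalization, the evergreen and deciduous shrub pixels are indistinguishable, which is the loss of detail described above.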
Three of the five datasets were based on manually assisted supervised classification. The subjectivity of visual interpretation directly affects the accuracy of interpretation. The GLCNMO and MCD12 datasets use the decision tree method for mapping. However, the process of setting cell values in the leaf nodes of the decision tree causes the dominant categories to be overestimated and the non-dominant categories to be underestimated.
The area, spectral information and other features of many land-use types fluctuate with the seasons, and the images acquired by the issuing agency of a land-cover dataset may not come from the same months. If the selected image was taken in the dry season, the capacity to identify water will be reduced. Using images from the non-growing season will likewise degrade the interpretation of vegetation types.
The spatial resolution of the datasets issued by different institutions also differs. Higher spatial resolution can describe land-cover features more effectively, especially under complex surface conditions. For example, the spatial resolution of GLCNMO was 1 km in 2003 and increased to 500 m in 2008 and 2013, and the spatial consistency improved significantly. Many vegetation classification principles are defined from an ecological perspective using criteria such as tree height and canopy density, and these thresholds were difficult to detect in the coarse-resolution images used in these mapping projects [58].
More than 30 countries and regional and international organizations participated in the development of GLC2000. The input data source, classification scheme, and classification methodology were all optimized to the needs of the participating institutions based on the land-cover types in their respective regions [27]. Because of this broad participation, the overall consistency between CCI LC and GLC2000 was relatively low according to Figure 1.

Lessons of Global Land-Cover Mapping
According to the previous analysis, the results at the global, continental, and local scales all show large inconsistencies, and the complexity of the land surface and the processing of datasets by issuing agencies can greatly affect the inconsistency of datasets (specifications of spatial consistency for some regional land-cover datasets are shown in Figure S4). We should also recognize that land-cover mapping is a long-term process. To meet the needs of dataset construction, we propose the following suggestions.
First, subsequent classification research should focus on areas with high land-cover complexity to further improve the classification accuracy. The problem of forest-grass-shrub mixing because of their similar spectra could be solved by combining topographical features and regional vegetation phenotypic data to reduce errors.
Second, data producers should disclose the complete and detailed mapping process and dataset features, including the time information of each image used in the dataset, so that users can optimize and reclassify the dataset locally according to the time distribution and their research needs.
Third, radar data are sensitive to water underlying vegetation; backscattered signals increase significantly when the ground surface is covered by water [60][61][62]. Dataset producers should consider including radar data in the classification of land-use types to improve the interpretation accuracy of wetland. Furthermore, cloud cover may cause problems in large areas of the humid tropics during a considerable portion of the year [63]. Radar images may overcome these problems to some extent [64,65].
Fourth, the classification of construction land at the global scale faces enormous challenges because of the small patch size. Several studies showed that nighttime-light imagery can effectively extract construction-land areas [66][67][68], and the feasibility of these data should increase with the launch of the 130 m resolution LJ01 nighttime-lighting satellite. In global land-cover mapping, nighttime-lighting data could be integrated to improve the classification accuracy of construction land.
Finally, previous research shows that significant regional differences in spatial consistency exist among land-cover datasets. Datasets that are developed through regional or national cooperation must ensure that their quality is consistent at the global scale.

Conclusions
Based on the premise of temporal consistency, this paper compared the spatial consistency of five datasets, namely GLC2000, CCI LC, MCD12, GLOBCOVER, and GLCNMO, from 2000 to 2013 at the global and continental scales. The comparison was conducted through composition similarity analysis, confusion matrix analysis, and spatial multiple-consistency analysis, and inter-annual changes were examined. At the same time, we studied the gradient characteristics of the spatial consistency of the five datasets across elevations and climatic zones. From the two perspectives of surface conditions and dataset producers, the factors that influence spatial consistency were identified, and suggestions for global land-cover mapping were proposed. The conclusions are summarized as follows.
At the global scale, the average compositional similarity among the five datasets was 70.82%, and the inter-annual variation was small. The mean overall consistency of all the datasets was 56.58%, which leaves much room for improvement. The spatial consistency of cropland, forests, water, bare land, and permanent ice and snow was close to or higher than 50%, while the spatial consistency of wetland and shrub was less than 50%. At the continental scale, the results of multiple datasets showed obvious intercontinental differences in the compositional similarity and overall consistency: Europe's compositional similarity and overall consistency were higher, while those of Oceania were extremely low. Along the elevation gradient, the trends in overall consistency depicted by each dataset were largely consistent, especially in North America and Asia, where the comparison of the six datasets was completely consistent. Along the climate-zone gradient, the dataset comparisons showed that the overall consistency of the EF, Af, Am and BW zones was higher, while that of ET and Ds was the lowest. The classification schemes, classification methods, definitions of land-use types, spatial resolution, selected time nodes of images, international participation, and the complexity of surface conditions can all greatly influence the spatial consistency.
This study showed that the spatial consistency of global datasets is characterized by spatial heterogeneity, that is, the consistency of different regions is quite different. Although the development of these land-cover datasets greatly promotes scientific research, satisfying the needs of high-precision land-surface-process simulations remains difficult. For the future development of global land-cover datasets, more attention should be paid to the experiences of previous global land-cover mapping activities, including land-cover classification in complex areas and regional representative classification sample selection.
In addition, this study did not use independent validation points, and their absence introduces some uncertainty into our results. Although it is difficult to collect validation points from multiple time periods on a global scale, we believe that the cross comparison between datasets still provides a certain level of confidence. Our work would also be more useful if it drew on advanced measures of thematic classification accuracy. Furthermore, our research only focused on medium-resolution land-cover datasets and did not include GlobeLand30, because its resolution of 30 m is significantly higher than that of the other datasets we used. If more 30 m resolution global land-cover datasets become available in the future, we will compare the spatial consistency of these datasets, including GlobeLand30.