1. Introduction
Land cover (LC) monitoring plays a pivotal role in ecological environment governance, agricultural and forestry management, urban and rural planning, and ecological conservation. The rapid development of computer technology and the continuous sharing of medium- to fine-resolution remote sensing data, such as from Landsat and Sentinel, have ushered the world into an era of fine-resolution remote sensing [
1,
2]. For large-scale land cover research, a resolution of 30 m is considered optimal [
3,
4]. Moreover, this resolution has an advantageous data size, enabling more efficient data processing and storage.
In recent years, many 30-m fine-resolution LC products have been produced. For example, Chen et al. (2015) developed the well-known GlobeLand30 using a pixel-object-knowledge-based (POK) classification strategy based on multi-temporal Landsat and HJ-1A/B imagery [
3], Zhang et al. (2021) generated a global-scale 30-m Global Land Cover with Fine Classification System (GLC_FCS30) product based on the spectral library of land cover and time-series Landsat data [
1], and Yang et al. (2021) utilized Landsat data and the random forest classifier to generate the annual China Land Cover Dataset (CLCD) based on the Google Earth Engine (GEE) [
5]. Similarly, Gong et al. (2013) produced the Finer Resolution Observation and Monitoring of Global Land Cover (FROM_GLC) product using the Landsat thematic mapper (TM) and enhanced thematic mapper Plus (ETM+) data [
6]. The United States Geological Survey (USGS) collaborated with multiple federal agencies to produce the 30-m National Land Cover Database (NLCD), which covers the entire United States [
7]. These large-scale fine-resolution LC products provide adequate information to observe the fundamental characteristics and trends of land cover.
However, the accuracies reported for the abovementioned large-scale LC products generally apply to global and national scales (
Table 1). For instance, the accuracy of GlobeLand30 was evaluated using 38,644 globally distributed validation samples [
3]. Similarly, the accuracy of GLC_FCS30 was assessed using 44,043 samples that were distributed across the globe [
1]. For the assessment of CLCD accuracy, a total of 5463 validation points within China were used [
5]. Nevertheless, due to the differences in scene complexity and data quality across regions, the accuracy of land cover products inevitably varies from region to region [
3,
8,
9,
10]. In some complex areas, the accuracy of LC products is considerably lower than the reported overall accuracy [
11]. For example, the overall accuracy of GlobeLand30 is reported to be 83.5% worldwide. However, its accuracy significantly drops to approximately 50% in Africa [
12]. Similarly, in the Central Asian region, the accuracy of GlobeLand30 was found to be only about 46% [
13]. Cui et al. (2023) collected thirteen sets of global or national-scale land cover datasets. Through the visual interpretation of high-resolution images, ground “truth” samples were collected to evaluate the data accuracy across Northeast China [
14]. Accordingly, the quantitative evaluation of the accuracy and consistency of existing 30-m land cover products at the regional scale is of great significance for the scientific use of these products.
The validation of land cover products is typically conducted by randomly selecting validation points and visually interpreting the labels of each point using high-resolution imagery for comparison with the product [
17]. Therefore, in order to accurately evaluate the accuracy of land cover products in a specific region, a scientific high-density validation dataset is a crucial prerequisite [
10,
15,
18,
19,
20]. Constructing such a high-density validation dataset in a scientific manner, with a sufficient number of representative samples covering the different categories in each region, remains a key challenge. Sahr (2011) demonstrated that dividing the study area into equal-sized grids and assigning validation points at the grid level ensures coverage of sample points in all regions [
21]. Zhao et al. (2023) demonstrated that considering multiple factors such as climate and population when allocating sampling points and employing a stratified random sampling approach can result in a more reasonable distribution of validation points [
16]. To ensure an adequate number of samples in complex areas and appropriate placement of sampling points in homogeneous regions, this study employs a multi-indicator equal-area stratified random sampling method [
15]. This approach not only increases the sample size of rare land cover categories but also effectively reduces the standard deviation of accuracy assessment [
16]. Additionally, the use of equal-area stratified random sampling ensures the balance of different categories during the sample selection process, thereby enhancing the representativeness and reliability of the assessment results.
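As an illustration of the sampling idea described above, the following minimal sketch allocates samples per grid cell and then draws them at random within each cell. It is not the procedure used to build the dataset: the function names, the `base` and `extra` parameters, and the single heterogeneity score per cell are hypothetical stand-ins for the multi-indicator constraints.

```python
import random

def allocate_samples(heterogeneity, base=5, extra=10):
    """Per-cell sample count: a fixed base plus a bonus that grows with a
    heterogeneity score in [0, 1] (hypothetical allocation rule)."""
    return {cell: base + round(extra * h) for cell, h in heterogeneity.items()}

def stratified_random_sample(candidates, counts, seed=0):
    """Draw counts[cell] random candidate locations from each cell.

    candidates maps a cell id to a list of (x, y) candidate locations.
    A cell with fewer candidates than requested contributes all of them.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    samples = {}
    for cell, points in candidates.items():
        k = min(counts[cell], len(points))
        samples[cell] = rng.sample(points, k)
    return samples

# Two equal-area cells: "A" is homogeneous, "B" is heterogeneous.
heterogeneity = {"A": 0.1, "B": 0.9}
candidates = {
    "A": [(x, y) for x in range(10) for y in range(10)],
    "B": [(x + 10, y) for x in range(10) for y in range(10)],
}
counts = allocate_samples(heterogeneity)
samples = stratified_random_sample(candidates, counts)
```

Because every cell receives at least `base` samples, homogeneous regions remain covered, while the bonus term concentrates additional samples in complex cells.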
To this end, we first employed an equal-area stratified random sampling approach, with climate, population density, and landscape heterogeneity information as constraints and the hexagonal discrete global grid system (HDGGS) as the basic sampling grid, to develop a High-Density Land cover Validation dataset in Xinjiang (HDLV-XJ). Then, the consistency and accuracy of three widely used 30-m resolution LC products (GLC_FCS30, CLCD, GlobeLand30) in 2020 were analyzed based on the developed validation dataset. Additionally, the suitability of these three LC products under different environmental conditions was analyzed for the first time. The results of this study can provide an important reference for the accuracy assessment of existing LC products at regional scales and for the application of LC products in specific scenarios. Our HDLV-XJ dataset can also provide valuable data support for regional-scale research.
4. Results
4.1. The High-Density Land Cover Validation Dataset for Xinjiang
In order to generate a high-density and representative validation dataset for Xinjiang, the HDGGS was used to generate hexagonal grids as the basic unit for sample allocation.
Figure 6a shows the generated hexagonal grid units, comprising 1604 grids in the Xinjiang region. A total of 20,932 validation samples were generated within these grid units. The spatial distribution and the per-category distribution of these validation samples are illustrated in
Figure 6b. The interpretation of land features relied on the overall conditions observed in high-resolution imagery from Google Earth for 2020. The sample points cover the whole study area, as the equal-area stratified random sampling method employed in this study ensures that sample points are allocated to its different areas.
Furthermore, in order to demonstrate the accuracy of the HDLV-XJ dataset that we developed, we merged the validation samples from SRS_Val and GLV_2015 to cross-compare with our HDLV-XJ product. Considering that the locations of the sample points in the three validation datasets do not correspond one-to-one, the nearest neighbor points were selected as the reference points for accuracy calculation [
45]. The confusion matrix was computed using Equations (2)–(5), and the resulting values are presented in
Table 5. When SRS_Val and GLV_2015 were used as reference benchmarks, the overall accuracy of our HDLV-XJ dataset exceeded 80% for both. Specifically, the producer’s accuracy for bare land and snow/ice was around 90%. In the case of SRS_Val vs. HDLV-XJ, the producer’s accuracy for grassland and cropland was above 90%. Although forest and shrubland showed a lower accuracy than the above land cover types, the agreement still reached 75%. Taking into account errors in the visual interpretation of the validation samples, as well as differences in the years represented by the validation datasets, the comparison with the third-party datasets demonstrated the reliability of our HDLV-XJ dataset.
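The nearest-neighbor pairing used for this cross-comparison can be sketched as follows. This is a minimal illustration under stated assumptions: points are given in a projected coordinate system (metres), distance is Euclidean, and the optional `max_dist` cut-off is a hypothetical parameter that the text does not specify.

```python
import math

def nearest_label_pairs(points_a, points_b, max_dist=None):
    """Pair each point in points_a with the label of its nearest point in points_b.

    Each point is a tuple (x, y, label). Returns a list of
    (label_a, label_b) pairs; pairs farther apart than max_dist are dropped.
    Brute force, so only suitable for small illustrative inputs.
    """
    pairs = []
    for xa, ya, label_a in points_a:
        nearest = min(points_b, key=lambda p: math.hypot(p[0] - xa, p[1] - ya))
        dist = math.hypot(nearest[0] - xa, nearest[1] - ya)
        if max_dist is None or dist <= max_dist:
            pairs.append((label_a, nearest[2]))
    return pairs

# Hypothetical sample points from two validation datasets.
dataset_a = [(0, 0, "bare land"), (100, 100, "water")]
dataset_b = [(10, 0, "bare land"), (90, 110, "grassland"), (5000, 5000, "forest")]

pairs = nearest_label_pairs(dataset_a, dataset_b)
close_pairs = nearest_label_pairs(dataset_a, dataset_b, max_dist=5)
```

The resulting label pairs can then be tallied into a confusion matrix. For datasets of tens of thousands of points, a spatial index would be the more practical choice than this brute-force search.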
4.2. Accuracy Assessment of GLC_FCS30, GlobeLand30, and CLCD
The confusion matrices for the three LC products in Xinjiang were calculated using the validation dataset constructed for this study, utilizing Equations (2)–(5). Based on the obtained results, the GLC_FCS30 exhibits the highest overall accuracy of 88.10%. The GlobeLand30 follows with the second highest overall accuracy of 83.58%, while the CLCD demonstrates the lowest overall accuracy of 81.57%.
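Equations (2)–(5) are not reproduced in this section. The sketch below assumes the standard definitions of overall accuracy, producer's accuracy, user's accuracy, and the kappa coefficient, with rows of the confusion matrix holding the reference (validation) classes and columns holding the mapped classes. The function name and the example matrix are illustrative only.

```python
def accuracy_metrics(matrix):
    """Accuracy metrics from a square confusion matrix.

    matrix[i][j] = number of samples whose reference class is i
    and whose mapped class is j. Returns (overall accuracy,
    producer's accuracies, user's accuracies, kappa).
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]  # reference totals
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # mapped totals
    diag = [matrix[i][i] for i in range(k)]
    nan = float("nan")  # undefined when a class has no samples

    oa = sum(diag) / n
    pa = [d / t if t else nan for d, t in zip(diag, row_tot)]
    ua = [d / t if t else nan for d, t in zip(diag, col_tot)]
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, pa, ua, kappa

# Illustrative two-class matrix (not data from this study).
oa, pa, ua, kappa = accuracy_metrics([[50, 10], [5, 35]])
```

Producer's accuracy divides the diagonal by the reference totals and so reflects omission error, whereas user's accuracy divides by the mapped totals and reflects commission error.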
The GLC_FCS30 demonstrates an overall accuracy of 88.10%, along with a kappa coefficient of 0.799 (
Table 6). Regarding the producer’s accuracy, forest exhibits the highest accuracy, followed by bare land, shrubland, cropland, and water. However, impervious and wetland display lower accuracies. These findings suggest that regions with homogeneous surface cover types occupying larger proportions of the study area generally exhibit higher accuracy levels. Conversely, complex surface cover types are often confused with other types. For instance, wetlands, which possess particularly intricate spectra, are prone to confusion with vegetation [
46]. According to
Table 6, approximately 49.4% of wetland validation points were incorrectly classified as vegetation types, including cropland, forest, shrubland, and grassland. In terms of user’s accuracy, cropland, water, permanent snow/ice, and bare land exhibit similar accuracies to the producer’s accuracy. Water demonstrates the highest user’s accuracy, reaching 96.24%. This indicates the product’s remarkable capability to accurately assign samples to their respective water categories.
The CLCD achieves an overall accuracy of 81.57%, with a kappa coefficient of 0.675 (
Table 7). Considering producer’s accuracy, bare land exhibits the highest accuracy at 91.01%, followed by water, cropland, and grassland. However, shrubland and wetland display relatively low producer’s accuracies, possibly due to identification difficulties within the CLCD. In terms of user’s accuracy, cropland and forest demonstrate higher accuracies at 93.51% and 87.84%, respectively, indicating strong classification abilities for these land cover types. Notably, both the producer’s accuracy and the user’s accuracy for shrubland in the CLCD are zero, meaning that no shrubland samples were correctly classified and suggesting that this category is poorly represented in the training dataset of the CLCD classifier, as depicted in
Figure 7c,d. This deficiency in training data representation for the shrubland potentially undermines the classification performance of this specific class.
The overall accuracy of the GlobeLand30 is 83.58%, with a kappa coefficient of 0.717 (
Table 8). Regarding the producer’s accuracy, bare land obtains the highest accuracy at 90.28%, followed by cropland, water, and wetland. On the other hand, shrubland has a relatively low producer’s accuracy of 18.24%, indicating significant challenges in classifying shrubland. As for the user’s accuracy, cropland and bare land demonstrate higher accuracies of 90.13% and 89.60%, respectively, suggesting GlobeLand30’s strong ability to correctly classify instances of cropland and bare land. However, the user’s accuracy for shrubland is lower, at 32.77%. In summary, GlobeLand30 performs well in some land cover categories, such as cropland and bare land, but there is room for improvement in the classification of other categories.
In conclusion, it is evident that complex land cover types are more susceptible to misclassification. For example, based on
Table 6, it is apparent that wetlands in GLC_FCS30 exhibit a higher confusion ratio, with over 50% of validation samples being misclassified as other types, as demonstrated in
Figure 7a,b. There are also numerous instances of misclassification between similar land cover types. In particular, approximately 20% of shrubland was classified incorrectly as forest and grassland, as depicted in
Figure 7c,d.
4.3. Consistency Analysis for Global Land Cover Products
Through the spatial overlay of the three LC products, a combination of visual mapping and quantitative expression approaches was employed to illustrate the overall pixel consistency between two or more products, as well as the pixel consistency for each land cover type.
To begin, the overall similarity coefficient (
OS) and class similarity coefficient (
CS) were computed for different products using Equations (6) and (7), as presented in
Table 9. The overall similarity coefficient between CLCD and GlobeLand30 was 83.32%, whereas between GLC_FCS30 and CLCD it was 78.46%. The lowest overall similarity coefficient, 75.16%, was observed between GLC_FCS30 and GlobeLand30. Thus, at least approximately 75% of pixels share the same land cover label between any two LC products. Overlaying all three products yielded an overall similarity coefficient of 69.96%, which is lower than each of the pairwise values. In other words, the three products assign identical land cover types in 69.96% of the study area.
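Equation (6) is not reproduced in this section. The following minimal sketch assumes that the overall similarity coefficient is the share of co-registered pixels that carry the same label in every product being compared. The function name and the toy label arrays are hypothetical.

```python
def overall_similarity(*label_maps):
    """Share of pixels with identical labels across all given maps.

    Each map is a flat sequence of class labels, co-registered so that
    position i refers to the same pixel in every map.
    """
    n = len(label_maps[0])
    if any(len(m) != n for m in label_maps):
        raise ValueError("all label maps must have the same number of pixels")
    same = sum(1 for labels in zip(*label_maps) if len(set(labels)) == 1)
    return same / n

# Toy maps of eight pixels each (not data from this study).
a = [1, 1, 2, 3, 3, 4, 4, 4]
b = [1, 2, 2, 3, 3, 4, 4, 1]
c = [1, 1, 2, 3, 2, 4, 4, 4]

os_ab = overall_similarity(a, b)
os_ac = overall_similarity(a, c)
os_bc = overall_similarity(b, c)
os_abc = overall_similarity(a, b, c)
```

Under this definition, a pixel on which all three maps agree is also a pixel on which every pair agrees, so the three-way value can never exceed the lowest pairwise value. This matches the pattern of 69.96% against pairwise values of 75.16% and above.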
Subsequently, the class similarity coefficients were calculated among the three products. The analysis reveals that the class similarity coefficients for bare land, water, and cropland exceed 60%, indicating a spatial correspondence of over 60% between these land cover types (
Figure 8). Notably, the average class similarity coefficient for bare land is the highest, at 77.14%. In contrast, wetland and shrubland exhibit the lowest class similarity coefficients, suggesting limited overlap among all three products. Additionally, the consistency for the impervious category is notably low, with an average class similarity coefficient of 21.52%.
Simultaneously, the pixel consistency between the two products across different land cover types aligns with the aforementioned similarity of the three products. The highest consistency is observed in bare land, with class similarity coefficients of 73.46%, 90.41%, and 69.55% for GLC_FCS30 vs. CLCD, CLCD vs. GlobeLand30, and GLC_FCS30 vs. GlobeLand30, respectively. Following that, cropland exhibits the next highest consistency, with class similarity coefficients ranging from 64% to 80%. Conversely, the class similarity coefficients for shrubland and wetland are below 9%.
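Equation (7) is not reproduced in this section, so the exact form of the class similarity coefficient is not restated here. As one plausible reading, the sketch below measures per-class agreement between two maps as intersection over union: the pixels labeled with a class in both maps divided by the pixels labeled with that class in either map. This definition is an assumption and may differ from the one used in the study.

```python
def class_similarity(map_a, map_b, cls):
    """Per-class agreement between two co-registered label maps.

    Assumed definition (intersection over union): pixels labeled cls in
    both maps divided by pixels labeled cls in at least one map.
    Returns NaN when neither map contains the class.
    """
    both = sum(1 for x, y in zip(map_a, map_b) if x == cls and y == cls)
    either = sum(1 for x, y in zip(map_a, map_b) if x == cls or y == cls)
    return both / either if either else float("nan")

# Toy maps of eight pixels each (not data from this study).
a = [1, 1, 2, 3, 3, 4, 4, 4]
b = [1, 2, 2, 3, 3, 4, 4, 1]

cs = {cls: class_similarity(a, b, cls) for cls in (1, 2, 3, 4)}
```

A measure of this kind penalizes classes that one product maps over a much larger area than the other, which is in line with the low values reported for shrubland and wetland.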
The average areas of the nine land classes for the three LC products, as presented in
Table 10, can be sorted in descending order based on their average area: bare land, grassland, cropland, permanent snow/ice, forest, shrubland, water, impervious, and wetland. Notably, the average areas of bare land, grassland, and cropland surpass 90,000 km². Their respective average areas are 1,102,577.54 km², 350,050.94 km², and 98,989.83 km². Bare land covers 66.22% of the region, making it the predominant geographical landscape.
Table 11 presents the comparative area results for each land class in three LC products: GLC_FCS30, CLCD, and GlobeLand30. There is a relatively high consistency in the area measurements for bare land, cropland, and permanent snow/ice across the three products. The consistency is moderate for water and impervious. However, significant variations exist in the area measurements for some land classes, particularly for shrubland and wetland, indicating low consistency. The area of shrubland in GLC_FCS30 is nearly four times larger than in GlobeLand30, while CLCD only reports shrubland of 1.27 km², significantly smaller than both GLC_FCS30 and GlobeLand30. Moreover, GlobeLand30 indicates a larger wetland area of 9197.70 km² compared to the other two products. In contrast, CLCD reports a mere 500.83 km² of wetland area, accounting for only 0.03% of the total area.
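The area figures above can be related to pixel counts and cross-checked with simple arithmetic. The sketch below assumes that each pixel covers 30 m × 30 m on the ground, which holds only in an equal-area projection, and it derives the total study area from the reported bare land area and share. The function and variable names are hypothetical.

```python
def pixel_count_to_km2(count, pixel_size_m=30.0):
    """Area in km² covered by `count` square pixels of the given side length."""
    return count * pixel_size_m ** 2 / 1e6

# One million 30-m pixels correspond to 900 km².
area_example = pixel_count_to_km2(1_000_000)

# Total study area implied by the reported average bare land figures.
bare_km2 = 1_102_577.54
bare_share = 0.6622
total_km2 = bare_km2 / bare_share

# Share of the CLCD wetland area in that total, in percent.
clcd_wetland_pct = 100 * 500.83 / total_km2
```

The implied total of roughly 1.665 million km² reproduces the reported 0.03% share for the CLCD wetland area. By the same conversion, the 1.27 km² of shrubland in CLCD corresponds to only about 1,400 pixels.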
An analysis of area consistency between GLC_FCS30 and GlobeLand30 reveals notable variations across different land cover types. The areas of bare land in both products exhibit high similarity, encompassing approximately 66% of the total study area. A similar pattern emerges in cropland, where GLC_FCS30 reports an area of 104,931.34 km² and GlobeLand30 reports an area of 104,293.52 km². However, for other categories, the consistency between the two products is relatively low.
6. Conclusions
Fine-resolution LC products have been developed in recent years. However, the accuracy evaluation of these products is typically conducted at the global and national levels, with limited consideration for their accuracy and applicability at the regional level. Therefore, conducting a comprehensive accuracy assessment and consistency analysis of LC products in local areas is crucial for users to effectively compare the performance of different products. This study took Xinjiang as the research area and constructed a high-density land cover validation dataset (HDLV-XJ, containing 20,932 validation samples) based on multi-source remote sensing data. The accuracy and consistency of three mainstream 30-m resolution land cover products in 2020 (GLC_FCS30, CLCD, GlobeLand30) were analyzed based on the constructed validation dataset.
The results indicated that GLC_FCS30 exhibited the highest overall accuracy (88.10%) in Xinjiang, followed by GlobeLand30 (with an overall accuracy of 83.58%). By contrast, CLCD demonstrated the lowest overall accuracy of 81.57%. In terms of consistency, Xinjiang demonstrated high-consistency patterns for bare land, cropland, and water. Specifically, the average consistencies between the GLC_FCS30 and GlobeLand30, GLC_FCS30 and CLCD, and GlobeLand30 and CLCD are 77.14%, 70.21%, and 68.66%, respectively. However, this consistency drops significantly when it comes to wetland and shrubland, with a value of less than 1% for each. Furthermore, among the LC products, GlobeLand30 demonstrated the highest performance in regions characterized by high landscape fragmentation. On the other hand, GLC_FCS30 emerged as the superior product in areas with uneven proportions of land cover types. Additionally, the utilization of a local adaptive classification mapping strategy offers significant advantages in enhancing the accuracy of land cover mapping. The study provided important insights for the application of current land cover products in Xinjiang, uncovering both the accuracy and limitations associated with these products.