Performance of Global Land Use Land Cover Products for Southwest China Karst

Zhang, Chunhua; Qi, Xiangkun; Cheung, Hoi Shan; Zhang, Mingyang; Yue, Yuemin; Wang, Kelin

doi:10.3390/rs18101573

by Chunhua Zhang ¹, Xiangkun Qi ², Hoi Shan Cheung ³, Mingyang Zhang ², Yuemin Yue ² and Kelin Wang ²,*

¹ Department of Biology, Algoma University, Sault Ste. Marie, ON P6A 2G4, Canada

² Huanjiang Observation and Research Station for Karst Ecosystem, Guangxi Key Laboratory of Karst Ecological Processes and Services, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China

³ Independent Researcher, Hong Kong, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(10), 1573; https://doi.org/10.3390/rs18101573

Submission received: 10 March 2026 / Revised: 29 April 2026 / Accepted: 9 May 2026 / Published: 14 May 2026

(This article belongs to the Section Environmental Remote Sensing)

Highlights

What are the main findings?

The ESA WorldCover 2021 product outperformed ESRI Land Cover and Dynamic World (annual mode composite) in mapping accuracy for the karst region of Southwest China, better preserving fragmented features (e.g., roads and small fields) than its competitors;
All three products showed major limitations in separating spectrally similar vegetation (shrub, grass, and crops) and were strongly affected by topographic shadows and haze. Accuracy dropped sharply at patch boundaries (where over half of errors occurred).

What are the implications of the main findings?

ESA WorldCover 2021 offers the best balance of spatial detail and accuracy among 10 m global products. Regional validation remains essential because global products can underperform in complex terrain;
The poor distinction between shrub, crops, and grass classes constrains these datasets’ value for monitoring vegetation recovery. Future improvements in karst areas should integrate multi-source (optical + synthetic aperture radar) and multi-temporal data with topographic correction.

Abstract

Accurate land use and land cover (LULC) data are essential for effective environmental management and reliable ecological modeling within complex landscapes such as the karst region of Southwest China. While new 10 m resolution global LULC products (i.e., ESA WorldCover, ESRI Land Cover, and the annual mode composite of Dynamic World (DW)) offer unprecedented spatial detail, their reliability in heterogeneous karst remains poorly understood. We evaluated the accuracy and spatial consistency of these products for 2021 in the karst regions across five provinces in Southwest China using 1416 reference points collected through stratified random sampling. The ESA WorldCover dataset outperformed the others, achieving the highest overall accuracy (79.39 ± 2.19%). ESRI’s shrub metrics, however, reflect the structural absence of this class from its 2021 product rather than classification error. ESA’s superior performance in preserving fine-scale features is consistent with independent global assessments of both the 2020 and 2021 versions. This superior performance is attributed to its integration of Sentinel-1 SAR with optical data, a finer minimum mapping unit (100 m²), and expert-driven post-classification corrections. While all products successfully identified dominant classes like trees, substantial confusion emerged among spectrally similar classes such as shrubs, grass, and crops. A key finding was the strong effect of landscape heterogeneity on accuracy. Classification accuracy was 19.37 percentage points lower at patch edges (67.38%) than in patch interiors (86.75%). Furthermore, edge reference points contributed disproportionately to total errors. Critically, none of the three products currently provides a sufficient basis for shrub-focused ecological monitoring in this region: ESA rarely detected shrub cover, DW mapped extensive but largely inaccurate shrub areas, and ESRI eliminated the shrub class from its 2021 product. These results show that while global 10 m products provide valuable information, careful product selection and regional validation remain essential for heterogeneous karst environments. Future improvements should integrate multi-source data (optical + synthetic aperture radar), apply topographic compensation for shadow effects, and develop region-specific approaches for mapping vegetation transitions.

Keywords:

accuracy assessment; karst terrain; ESA WorldCover; Dynamic World; ESRI Land Cover; Sentinel-2; landscape heterogeneity; edge effects; vegetation transition

1. Introduction

Land use and land cover (LULC) data are fundamental for ecosystem condition monitoring and modelling, environmental management, and decision making. Over the past two decades, global LULC products have evolved from coarse-resolution datasets (around 1 km) to medium-resolution (30 m) and, more recently, to high-resolution 10 m products derived primarily from Sentinel-2 imagery [1,2,3]. Three 10 m global products, European Space Agency WorldCover (ESA), ESRI Land Cover (ESRI), and the annual mode composite of Dynamic World (DW), now offer unprecedented spatial detail and are increasingly used in regional studies.

While the producers report satisfactory global accuracies (76.7% for ESA, 73.8% for DW, and 85.0% for ESRI for the 2021 datasets [1,2,3]), their reliability in topographically complex, heterogeneous landscapes remains uncertain. Southwest China’s karst region is characterized by extreme relief, small patch sizes, and complex vegetation transitions driven by ongoing ecological restoration, making it an ideal and demanding test case for such an evaluation [4].

Independent regional validations are limited, particularly in karst environments. Previous studies [5,6,7] have shown variable performance across regions, but none has systematically assessed all three 10 m products in Southwest China’s karst region using a design-based sampling approach. This study addresses this gap by (1) assessing the overall and class-specific accuracies of the ESA, ESRI, and DW products in Southwest China’s karst region; (2) quantifying the influence of landscape heterogeneity on classification accuracy; and (3) identifying key challenges and potential improvements for mapping the complex karst terrain.

2. Literature Review

2.1. Evolution of Global LULC Products

LULC data can be obtained from various sources and at different spatial scales. Historically (especially prior to the 1990s), LULC and topographic maps were created using labor-intensive methods [8]. Data from visually interpreted aerial photographs and early satellite images were transferred to hardcopy maps or digital versions.

At global and regional levels, LULC data were initially available at coarse resolution due to the deployment of satellites with both low spatial resolution (1 km and coarser) and high temporal resolution (1 day or less) starting in the late 1980s [9]. Publicly available LULC datasets (e.g., Global Land Cover (GLC) 2000, GLC-Share, MODIS (MCD12Q1), and VIIRS Surface Type (ST) Environmental Data Record (EDR)) are derived from a range of satellite sensors, including the Advanced Very High-Resolution Radiometer (AVHRR), Moderate Resolution Imaging Spectroradiometer (MODIS), SPOT 4 Vegetation, and the Visible Infrared Imaging Radiometer Suite (VIIRS). While this group of LULC datasets has been critical in environmental monitoring and modelling at the global level, their coarse resolution limits their application at regional and local scales. The extensive historical data from the Landsat and Sentinel programs have facilitated the development of finer medium-resolution regional and global LULC products (e.g., Global Land Cover and Land Use Change [10]; GlobeLand30 [11]).

While medium- and coarse-resolution global LULC datasets may be applied to regional studies, they lack the necessary fine details, and their quality is significantly compromised by the severe terrain relief inherent to complex terrains such as karst landforms. One of the most significant recent advances in Earth surface monitoring is the public sharing of 10 m resolution global LULC datasets derived from Sentinel-2 imagery in the early 2020s: DW from Google and World Resources Institute (2015-near-real-time) [1], ESRI developed by Impact Observatory, Microsoft and ESRI (2017–2024) [2], and ESA (2020 and 2021) [3].

2.2. Challenges in Heterogeneous Landscape

These globally created 10 m resolution LULC datasets offer unprecedented spatial detail and significant potential for diverse regional applications, such as ecosystem condition assessment, ecosystem service valuation, and integrated land-atmosphere modelling. However, a critical question arises when applying these global level products to regional-scale studies: do these datasets maintain sufficient accuracy across different landscape types and environmental conditions? This question is particularly relevant for spatially heterogeneous terrains, where classification challenges are amplified by fragmented LULC patterns, small patch sizes, and complex environmental gradients.

Classification accuracy is strongly influenced by landscape heterogeneity, with all three datasets demonstrating reduced performance in spatially complex environments compared to homogeneous landscapes [12]. This heterogeneity effect is particularly pronounced in mountainous regions with fragmented land cover, where small patches, abundant edge pixels, and topographic complexity challenge the spectral signatures and contextual algorithms underlying these global products. Despite the common practice of applying data created at larger scales to regional studies [13,14], researchers typically neglect to verify whether the accuracy of the large-scale product is sufficient for the target area. This lack of quality control can be problematic, as inaccurate LULC data can introduce significant uncertainty and negatively impact research outputs [15,16].

2.3. Karst Landscapes as a Critical Test Case

Karst landscapes are a good test environment for evaluating global LULC products because their extreme spatial heterogeneity presents a significant challenge. These distinctive terrains cover approximately 15% of Earth’s land area [17,18], support roughly 25% of the global population through karst water resources [14], and harbor exceptional biodiversity and endemism [19]. Southwest China’s tower karsts, in particular, rank among the world’s most iconic karst landscapes [20,21]. While earlier studies relied on Landsat-derived LULC products [22,23,24], the advent of 10 m global LULC datasets offers an opportunity to analyze karst landscapes with greater spatial detail, provided that accuracy is adequately maintained in such challenging terrain.

2.4. The Need for Regional Validation

While the producers provide satisfactory accuracy assessments, third-party evaluation is needed to quantify their reliability for various environmental and ecological applications. Globally, only two studies [12,25] have systematically assessed the accuracy of these three products. Both studies found broadly similar trends, with ESA generally outperforming the other two products. However, accuracy values varied depending on the year, reference data, and assessment methodology employed: one reported overall accuracy from 65% (DW) to 75% (ESRI) for the 2020 datasets [25], while the other found values between 73.4% (ESRI) and 83.8% (ESA) for 2021 [12]. Beyond accuracy, these studies noted consistent strengths (high accuracy for Water and Trees) and weaknesses (poor discrimination among spectrally similar vegetation). ESA also demonstrated better spatial detail.

All three LULC datasets performed worse in heterogeneous landscapes than in homogeneous ones [12,25]. However, neither study quantified the relationship between landscape heterogeneity and classification accuracy, nor did either focus on extremely fragmented terrains such as karst regions. Regional evaluations in other areas (e.g., Syria [5], Northwest China [6]) and Southwest China karst (e.g., Guangxi Province [7]) have produced highly variable results, underscoring the need for targeted validation in complex landscapes.

Characterized by extreme topographic relief, small patch sizes, high edge density, and abundant transitional vegetation communities, the Southwest China karst region presents a testing ground that highlights both the strengths and limitations of these global datasets. Accurate LULC mapping is particularly critical in this context because shrubland and grass–shrub–forest transitions are key indicators of ongoing ecological recovery following karst rocky desertification (KRD). Yet, no systematic accuracy assessment of all three 10 m global LULC products has been conducted in this environment using standard design-based methods, leaving a significant gap in our understanding of their utility for regional ecological applications.

3. Study Area

The study area encompasses 382,467 km² of karst terrain within five provinces of Southwest China: Sichuan, Guizhou, Yunnan, Guangxi, and Chongqing (Figure 1). This region is one of the world’s largest continuous karst areas and contains diverse karst landforms, including tower karsts, peak-cluster depressions, cone karsts, gorges, giant collapse depressions, and sinkholes [26]. The region’s bedrock is dominated by limestone, dolostone, and various interbedded formations of carbonate and clastic rocks. The terrain is highly fragmented: dramatic elevation changes produce extreme heterogeneity in land cover patterns. Even within a single landform, variations in slope and aspect create distinct surface features. For example, rock outcrops and grass may dominate the ridges of tower karsts while trees and shrubs flourish on the flanks [21].

The climate is subtropical monsoon, with annual precipitation of 1000–1800 mm and mean temperatures of 14–24 °C [27]. Despite high rainfall, the dual surface–underground hydrological system causes rapid drainage and frequent droughts [27]. The persistent cloud cover throughout much of the year creates specific challenges for optical remote sensing in the region. Topographic shadows and atmospheric haze are common.

Human land use follows topographic constraints. In typical peak–cluster depressions, villages are located near depression bottoms, surrounded by terraced croplands on gentler slopes and natural vegetation (trees and shrubs) on steeper upper slopes. This creates a fine-scale mosaic of small agricultural fields, exposed bedrock, and vegetation patches.

Human activities and national land use policies have significantly shaped the karst landscape over recent decades. Before the 1990s, high population pressure on limited arable land led to the expansion of cropland onto steep slopes [4], causing widespread soil erosion, vegetation loss, and KRD. Areas with severe KRD (bedrock exposure >80%) covered 3.5% of the region by the end of the 20th century [28].

Since the early 2000s, comprehensive ecological restoration programs (e.g., the Grain for Green Project, environmental migration initiatives, and rocky desertification treatment programs) have been implemented to alleviate environmental pressure and promote sustainable land use [4]. Concurrently, substantial rural-to-urban migration has reduced population pressure in many karst areas [29,30]. Driven by active restoration and out-migration, widespread vegetation regrowth has occurred in the area [31,32]. This has created a shifting landscape where transitional shrublands are now essential for controlling soil erosion and promoting forest regrowth [33,34,35].

The dominant LULC types in the region include trees (primarily subtropical evergreen broadleaf forests and secondary forests), crops (featuring small, terraced plots with rice, corn, and sugarcane as major crops), grass, shrub, built areas (highways, urban centers, towns, and rural settlements), bare ground (exposed bedrock and sparse vegetation), and water bodies (rivers, reservoirs, and lakes).

4. Methodology

4.1. Data Acquisition and Preprocessing

The ESA, ESRI, and DW datasets for 2021 were compared in this study. The ESA dataset is available for 2020 and 2021; we selected the 2021 product as it was reported to have higher accuracy than the 2020 version [3]. To obtain map data for the study area, we used the Google Earth Engine (GEE) platform to clip the ESA and ESRI Land Cover datasets to the study area boundary and downloaded them as GeoTIFF file tiles. The downloaded GeoTIFF tiles, including the modified composite of the DW dataset, were subsequently mosaicked in ArcGIS Pro 3.6 (ESRI, Redlands, CA, USA) for analysis. The Copernicus GLO-30 Digital Elevation Model (DEM) was also clipped to the study area via GEE. Rock type and Chinese administrative boundary GIS layers were downloaded from https://www.geodata.cn/main/ (accessed on 6 April 2025).

It is important to note that the DW dataset evaluated in this study is a modified annual composite rather than the native near-real-time DW product. The DW dataset provides LULC classifications and per-class probability scores for Sentinel-2 scenes with low cloud cover (<35%). We synthesized all available images for 2021 into a single LULC data layer by calculating the mode of the classification for each pixel using GEE [25]. Certain high-elevation areas lack 2021 LULC composite data due to persistent cloud cover, high atmospheric opacity, and extreme topographic shadows caused by sharp elevation contrasts. To ensure complete coverage, a systematic gap-filling process was implemented. The procedure prioritizes available LULC data from 2021, then sequentially uses 2020, 2022, 2019, 2023, 2018, 2024, and 2025 data, ordered by temporal proximity. When no annual data were available, a 5-pixel focal median filter was applied as the final gap-filling method. This compositing approach (annual mode aggregation, multi-year gap filling, and focal median smoothing) differs from the native per-scene probabilistic output of DW and should be considered when interpreting the accuracy results reported here. While using LULC labels from different years could lower the overall accuracy (OA) of the composited dataset due to possible LULC changes, the effect is expected to be minor. The gap-filled regions are small, accounting for only 0.01% of all DW pixels, and they are mostly found in high-elevation areas, where LULC change occurs slowly because of low population density.
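The compositing logic described above can be illustrated for a single pixel. The following is a minimal sketch in plain Python, not the GEE workflow used in the study: the function names, the `NODATA` marker, and the tie-breaking rule (smallest class code wins) are assumptions introduced for illustration, and the final focal-filter step is not shown.

```python
from collections import Counter

NODATA = None  # hypothetical marker for "no valid DW label in this scene"

def annual_mode(labels):
    """Most frequent class among one pixel's valid labels; NODATA if none."""
    valid = [c for c in labels if c is not NODATA]
    if not valid:
        return NODATA
    counts = Counter(valid)
    top = max(counts.values())
    # Assumption: ties are broken by the smallest class code.
    return min(c for c, k in counts.items() if k == top)

# Year priority stated in the text: 2021 first, then by temporal proximity.
YEAR_PRIORITY = [2021, 2020, 2022, 2019, 2023, 2018, 2024, 2025]

def gap_filled_label(labels_by_year):
    """Return (label, source_year) from the first priority year with data."""
    for year in YEAR_PRIORITY:
        label = annual_mode(labels_by_year.get(year, []))
        if label is not NODATA:
            return label, year
    # Nothing found in any year: left for the spatial (focal) fill step.
    return NODATA, None
```

For example, a pixel with no valid 2021 label but labels `[5, 5, 2]` in 2020 would receive class 5 with 2020 recorded as its source year.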

4.2. Harmonization of LULC Classification Systems

One important difference between the LULC schemas for the DW and ESRI datasets is that ESRI eliminated the Shrub class starting in 2021 by merging it into either the rangeland or forest categories. Consequently, inter-product comparisons involving the Shrub class are inherently asymmetric: ESRI’s shrub accuracy metrics under the harmonized scheme reflect a structural design choice rather than classification performance and should be interpreted accordingly. However, shrub is a significant LULC type in karst regions due to its distinctive ecological role in ecosystem recovery processes. Flooded vegetation (e.g., rice paddies) is also agriculturally significant in the region. We therefore retained nine LULC types following the DW classification system as described in [25]: Water, Trees, Grass, Crops, Built areas, Bare ground, Flooded vegetation, Shrub, and Ice/snow.

Additional LULC types in ESA were aggregated to align with the DW schema. Specifically, “herbaceous wetland” from ESA was treated as “Flooded vegetation” in DW. The classes “barren/sparse vegetation”, “moss and lichen”, and “bare/sparse vegetation” were merged into “Bare ground”. We assigned numerical codes (0–8) to each class following the DW convention for consistency in analysis: Water (0), Trees (1), Grass (2), Flooded vegetation (3), Crops (4), Shrub (5), Built areas (6), Bare ground (7), and Ice/snow (8).
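The harmonization can be expressed as a lookup table. The sketch below assumes the standard ESA WorldCover legend values (10 to 100) and maps them to the DW codes listed above. The text does not state how ESA's mangrove class (95) was handled, so the sketch leaves it unmapped and flags it with a hypothetical `UNMAPPED` value; `harmonize_esa` is likewise an illustrative name.

```python
# DW class codes as listed in the text.
DW_CODES = {
    "Water": 0, "Trees": 1, "Grass": 2, "Flooded vegetation": 3, "Crops": 4,
    "Shrub": 5, "Built areas": 6, "Bare ground": 7, "Ice/snow": 8,
}

# ESA WorldCover legend value -> harmonized DW class name.
ESA_TO_DW = {
    10: "Trees",               # Tree cover
    20: "Shrub",               # Shrubland
    30: "Grass",               # Grassland
    40: "Crops",               # Cropland
    50: "Built areas",         # Built-up
    60: "Bare ground",         # Bare/sparse vegetation
    70: "Ice/snow",            # Snow and ice
    80: "Water",               # Permanent water bodies
    90: "Flooded vegetation",  # Herbaceous wetland
    100: "Bare ground",        # Moss and lichen
}

UNMAPPED = 255  # hypothetical flag; ESA class 95 (mangroves) is not covered in the text

def harmonize_esa(value):
    """Translate one ESA WorldCover pixel value to its DW code."""
    name = ESA_TO_DW.get(value)
    return DW_CODES[name] if name is not None else UNMAPPED
```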

4.3. Spatial Correspondence Assessment

We quantified the agreement of per-class spatial distributions among the three global LULC products using a pixel-based overlay analysis in ArcGIS Pro. For each pixel location, we compared the LULC classifications across all three datasets and categorized the results into three levels of agreement: unanimous agreement (all three datasets assign the same LULC type to the pixel); majority agreement (exactly two of the three datasets assign the same LULC type); and full disagreement (all three datasets assign different LULC types to the pixel). We calculated the area and percentage of each agreement category across the entire study region. Additionally, we created a Sankey diagram [36] to visualize how pixels were classified differently across the three datasets.
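The three agreement levels reduce to counting the distinct labels at each pixel. A minimal sketch, assuming the three products have already been harmonized and co-registered into equally sized pixel sequences (the function names are illustrative, and equal-area pixels are assumed so that pixel shares stand in for area shares):

```python
def agreement_level(esa, esri, dw):
    """Classify one pixel by the number of distinct labels among the products."""
    distinct = len({esa, esri, dw})
    return {1: "unanimous", 2: "majority", 3: "disagreement"}[distinct]

def agreement_shares(esa_px, esri_px, dw_px):
    """Percentage of pixels in each agreement category."""
    counts = {"unanimous": 0, "majority": 0, "disagreement": 0}
    for a, b, c in zip(esa_px, esri_px, dw_px):
        counts[agreement_level(a, b, c)] += 1
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}
```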

4.4. Accuracy Assessment

We conducted an accuracy assessment following the design-based inference approach recommended by Olofsson et al. [37]. The error matrix was area-weighted (i.e., cells weighted by the proportion of the study area mapped to each class) rather than based on sample counts. OA was calculated as the area-weighted sum of correctly classified proportions (the diagonal elements of the estimated error matrix in terms of proportion of area). User’s accuracy (UA) for each class represents the probability that a pixel mapped as that class is correctly classified (i.e., map correctness from the user’s perspective). Producer’s accuracy (PA) indicates the proportion of the true (reference) area of a class that was correctly mapped. Variance estimates for OA, UA, and PA were computed following the stratified estimation formulas in [37] by incorporating the mapped area proportion for each class and accounting for unequal sample allocation across land cover classes. The 95% confidence intervals (CIs) were calculated as ±1.96 × standard error, where the standard error is the square root of the estimated variance.
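As an illustration of these estimators, the sketch below computes the area-weighted OA, UA, and PA and the 95% CI half-width for OA from a sample-count error matrix. It assumes the strata are the map classes, that every stratum has at least two samples, and that every reference class has a non-zero estimated area; the function and variable names are hypothetical, and the variance expressions for UA and PA are omitted for brevity.

```python
import math

def stratified_accuracy(counts, weights):
    """counts[i][j]: samples mapped as class i with reference class j.
    weights[i]: proportion of the study area mapped as class i (W_i)."""
    q = len(counts)
    n_i = [sum(row) for row in counts]
    # Estimated area proportions: p_ij = W_i * n_ij / n_i.
    p = [[weights[i] * counts[i][j] / n_i[i] for j in range(q)]
         for i in range(q)]
    oa = sum(p[k][k] for k in range(q))
    # UA: share of each map class that is correct (row-wise).
    ua = [counts[i][i] / n_i[i] for i in range(q)]
    # PA: share of each reference class that is correctly mapped (column-wise).
    col = [sum(p[i][j] for i in range(q)) for j in range(q)]
    pa = [p[j][j] / col[j] for j in range(q)]
    # Stratified variance of OA: sum of W_i^2 * UA_i * (1 - UA_i) / (n_i - 1).
    var_oa = sum(weights[i] ** 2 * ua[i] * (1 - ua[i]) / (n_i[i] - 1)
                 for i in range(q))
    return {"oa": oa, "ua": ua, "pa": pa, "oa_ci95": 1.96 * math.sqrt(var_oa)}
```

With a toy two-class matrix `[[45, 5], [10, 40]]` and mapped area proportions `[0.6, 0.4]`, the area-weighted OA is 0.86, whereas the simple count-based accuracy would be 0.85; the two differ whenever sample allocation is not proportional to mapped area.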

Following [37], we constructed a reference dataset for the study area (defined by rock type: mostly limestone and dolostone, and their interbedding with other rock types) using a hybrid approach that combines area-proportional allocation with minimum sample sizes for rare classes. This disproportionate sampling strategy enables precise class-specific estimates while avoiding an impractically large total sample. We set a target standard error of 1% for the OA metric and initially allocated 75 samples to LULC types with low percentages (i.e., Water, Flooded vegetation, and Ice/snow) to ensure sufficient representation for calculating class-specific accuracy metrics. Sample sizes for extremely small classes (e.g., Flooded vegetation) were subsequently reduced to maintain the integrity of the stratified design. The remaining samples were then distributed proportionally to the area of the more common classes (Trees, Grass, and Crops). This resulted in an initial set of 1450 sample points. To enhance the spatial independence of the sample, we subsequently removed any reference points that were located within 1 km of their nearest neighbor of the same LULC type. As a result, a final set of 1416 reference points was used in the accuracy assessment (Table 1).
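One way to apply the 1 km same-class spacing rule is a greedy pass over the candidate points. This is a sketch of one possible interpretation, not necessarily the procedure used here: it assumes projected coordinates in metres, keeps the first point encountered in each cluster, and uses a hypothetical function name.

```python
import math

def thin_by_class(points, min_dist_m=1000.0):
    """points: iterable of (x_m, y_m, class_code) in a projected CRS.
    Keeps a point only if no already-kept point of the same class
    lies closer than min_dist_m."""
    kept = []
    for x, y, cls in points:
        too_close = any(
            k_cls == cls and math.hypot(x - kx, y - ky) < min_dist_m
            for kx, ky, k_cls in kept
        )
        if not too_close:
            kept.append((x, y, cls))
    return kept
```

Because the result of a greedy pass depends on input order, shuffling the candidates with a fixed random seed beforehand would make the outcome reproducible.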

The LULC type for each reference point was determined through visual interpretation of high-resolution natural colour image composites in Google Earth. We prioritized imagery from 2021 to match the temporal window of the evaluated LULC datasets. When 2021 high-resolution imagery was unavailable, we used imagery from adjacent years in Google Earth after cross-validation with cloud-free Sentinel-2 Level-2A composites from 2021 to ascertain the LULC type. To ensure accuracy and consistency, one interpreter labeled all reference points, while a second interpreter verified challenging points whose LULC types are spectrally similar or spatially heterogeneous (e.g., crops, shrubs, and grasses). Discrepancies were resolved through discussion and joint re-examination of the imagery.

4.5. Terrain and Landscape Heterogeneity Analyses

For terrain analysis, we derived slope and aspect layers from the Copernicus GLO-30 DEM in ArcGIS Pro. Elevation, slope, and aspect values were then extracted for each reference point location. To evaluate how data sources and algorithmic choices manifest in the final land cover products, we calculated four landscape metrics (patch density, mean shape index, area-weighted mean shape index, and edge density) for five of the clips (A, B, C, D, and F) in Figure 3. These metrics were generated using the landscapemetrics package in R.

Landscape heterogeneity was quantified using patch size and LULC heterogeneity derived from the ESA dataset [38]. Patch size was calculated as the total number of connected pixels of identical LULC type in the patch containing the sample point. Pixels were considered contiguous if any of the eight surrounding pixels shared the same LULC type as the central pixel. LULC heterogeneity at each reference point was quantified using a 3 × 3 pixel moving window (i.e., the focal pixel and its 8 immediately adjacent neighbors) centered on that pixel. This 30 m × 30 m window is the smallest neighborhood unit that can detect class boundaries while remaining computationally tractable and ecologically meaningful for 10 m imagery. The heterogeneity value is the count of distinct LULC classes present within this 9-pixel neighborhood. A value of 1 means all nine pixels (focal + eight neighbors) share the same LULC class, placing the reference point inside a homogeneous patch. A value greater than 1 means that at least one neighboring pixel belongs to a different LULC class, indicating that the reference point lies at or very close to a patch boundary (i.e., within one pixel of a class transition). The 3 × 3 window was selected because it is the minimum kernel capturing all eight-connected neighbors [38], is consistent with the minimum mapping unit of the evaluated LULC datasets (100–250 m²), and directly captures sub-patch mixed-pixel effects, which represent the primary source of boundary-related error at 10 m resolution. This is conceptually equivalent to identifying edge pixels in landscape ecology, where a pixel is considered an edge if any of its eight-connected neighbors belongs to a different class. These calculations were conducted within a 3-km buffer around each reference point to improve computational efficiency while capturing local landscape context. This buffer size is sufficient to characterize patch sizes and heterogeneity patterns in the fragmented karst landscape, where patches are typically small to medium-sized. The sensitivity of the accuracy estimates to varying neighborhood definitions was evaluated by computing heterogeneity metrics at two additional window sizes (5 × 5 and 7 × 7 pixels) and comparing the resulting interior versus edge accuracy differences across these scales.
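Both metrics can be sketched on a small class raster held as a list of rows. The code below is illustrative only: the function names are hypothetical, windows are clipped at the raster border (an assumption; the text does not say how borders were treated), and `half=1`, `2`, or `3` gives the 3 × 3, 5 × 5, or 7 × 7 window.

```python
from collections import deque

def heterogeneity(grid, r, c, half=1):
    """Number of distinct classes in the (2*half+1)-pixel square window
    centred on (r, c), clipped at the raster border."""
    rows, cols = len(grid), len(grid[0])
    classes = {
        grid[i][j]
        for i in range(max(0, r - half), min(rows, r + half + 1))
        for j in range(max(0, c - half), min(cols, c + half + 1))
    }
    return len(classes)

def patch_size(grid, r, c):
    """Pixel count of the 8-connected patch containing (r, c)."""
    rows, cols = len(grid), len(grid[0])
    target = grid[r][c]
    seen = {(r, c)}
    queue = deque([(r, c)])
    while queue:
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols
                        and (ni, nj) not in seen
                        and grid[ni][nj] == target):
                    seen.add((ni, nj))
                    queue.append((ni, nj))
    return len(seen)
```

A reference point is treated as interior when `heterogeneity(...) == 1` and as an edge point otherwise.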

To verify that our spatially constrained reference samples (1 km minimum distance) adequately represent the heterogeneous karst landscape, we generated 10,000 systematic sample points using a regular fishnet grid (grid spacing: 13.6 km) across the study area extent. After removing points outside the classified area, 1801 valid points remained. We then quantified LULC heterogeneity at the systematic sampling points and used a Chi-square test to assess whether their distribution differed significantly from that of the reference samples. Given the large sample sizes (n = 1416 and 1801, respectively), we also calculated effect sizes to assess practical significance (Cramér’s V < 0.1 indicated a negligible practical difference [39]).
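The comparison of the two heterogeneity distributions can be sketched as a Pearson chi-square statistic on a groups × categories contingency table, followed by Cramér's V. The function name and table layout are assumptions; the sketch requires non-zero expected counts and does not compute a p-value, which would additionally need the chi-square distribution.

```python
import math

def chi_square_cramers_v(table):
    """table[g][k]: number of points in sample group g (e.g., reference vs.
    systematic) falling in heterogeneity category k.
    Returns (chi2, degrees_of_freedom, cramers_v)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for g, row in enumerate(table):
        for k, observed in enumerate(row):
            expected = row_tot[g] * col_tot[k] / n
            chi2 += (observed - expected) ** 2 / expected
    n_rows, n_cols = len(table), len(table[0])
    dof = (n_rows - 1) * (n_cols - 1)
    # Cramér's V rescales chi2 by sample size and table dimension.
    v = math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))
    return chi2, dof, v
```

Because V divides the statistic by the sample size, it stays small when two large samples differ only slightly, even if the chi-square test itself is significant.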

5. Results

5.1. Spatial Consistency Among Global Products

5.1.1. Area Distribution by LULC Type

The three global LULC datasets were consistent in identifying dominant LULC types (e.g., Trees) and sparse land cover (Water, Bare ground, and Ice/snow) (Table 2). All three datasets indicated that tree cover was the most dominant LULC type in the region. The three least extensive LULC types were Water, Bare ground, and Ice/snow. However, the estimated areas and percentages of the LULC types varied from one dataset to another. Tree cover ranged from a conservative 62.11% (ESA) to 70.51% (DW).

Substantial discrepancies emerged for other LULC types. Grass estimates showed an inverse relationship with tree cover: ESA reported the highest grass percentage (20.37%), followed by ESRI (17.72%), while DW estimated only 2.58%. These differences suggest systematic classification confusion between Trees and Grass across products. Cropland estimates increased progressively from DW (5.98%) through ESRI (8.23%) to ESA (13.88%). DW allocated substantially higher proportions to Shrub (10.95%) and Water (1.83%) than the other products. Built areas estimates were relatively consistent between DW (6.41%) and ESRI (8.47%) but substantially lower in ESA (1.62%). This likely reflects different definitions of “Built areas” between land-use-focused (ESRI) and land-cover-focused (ESA, DW) classification schemes.

5.1.2. Pixel-Level Agreement Patterns

The three LULC datasets had a relatively high level of consistency in the karst region. Complete agreement across all three datasets occurred for 60.45% of the total area. An additional 32.45% of the region showed partial consensus, where LULC types matched in two of the three datasets. Specifically, ESA shared LULC types with ESRI alone in 13.22% of the area and with DW alone in 6.59%, while ESRI and DW shared identical LULC types in a further 12.65%. Only 7.09% of the area exhibited complete disagreement. Tree cover was the most consistent LULC type across the evaluated datasets (Table 3), and Shrub was the least consistent.

5.1.3. Cross-Product Classification Transitions

The majority of pixels in dominant LULC types aligned correctly across datasets (Figure 2, Table 4). In total, 81.85% of Trees pixels in DW (2.48 billion out of 3.03 billion) corresponded to Trees pixels in ESA. Similarly, 80.58% and 73.40% of pixels classified as Crops and Grass in DW aligned with their ESA counterparts, respectively. Comparing ESA to ESRI, agreement was even stronger for certain classes: 95.88% of Water pixels, 92.59% of Built areas pixels, and 88.45% of Trees pixels from ESA were classified identically in ESRI.

However, the three datasets disagreed on transitional and heterogeneous LULC types. Only 46.11%, 31.08%, and 22.11% of Bare ground, Water, and Built areas pixels in DW, respectively, corresponded to their counterparts in ESA. The Shrub class in DW was particularly problematic: 64.30%, 19.77%, and 13.45% of its pixels corresponded to Grass, Crops, and Trees pixels in ESA, respectively. This confusion likely stems from differences in class definitions and the spectral similarity between Shrub and other vegetation types in the karst landscape. For the ESA-to-ESRI comparison, Grass, Crops, and Bare ground showed the lowest correspondence rates (51.59%, 41.00%, and 32.13%, respectively). Furthermore, 33.44%, 7.63%, and 6.25% of pixels classified as Grass in ESA were labeled as Trees, Built areas, and Crops in ESRI, respectively. Minor LULC types such as Flooded vegetation and Ice/snow showed even greater misalignment, with the respective LULC types sharing less than 1% of pixels across datasets. These results align with global-scale research, which indicates that Bare ground, Grass, and Shrub classes exhibit the lowest inter-product agreement [25].

5.2. Comparative Accuracy Assessment

The ESA 2021 dataset achieved the highest OA for Southwest China’s karst region, with an OA of 79.39 ± 2.19% (95% CI, Table 5). The area-weighted OA, calculated following [37], was slightly lower than the simple pixel-based accuracy (81.0%) due to the dominance of the Trees class (62.11% of study area) in the mapped area proportions. Comparatively, ESRI showed substantially lower performance (OA: 65.29 ± 2.41%) (Table 5). The OA for the DW dataset was 64.24 ± 2.52%, statistically similar to that of ESRI given their overlapping 95% CIs. Accuracies varied substantially by LULC type and were generally consistent with global assessments [12,25]. Water and Trees were the most accurately mapped classes across all products, with ESA performing slightly better. All three datasets struggled with spectrally similar vegetation, showed confusion between Built areas and Crops, and tended to overestimate Trees.
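The relationship among the three OA estimates can be checked informally by comparing their 95% CIs. The sketch below uses the values reported above; the helper names are illustrative, and interval overlap is a conservative visual check rather than a formal hypothesis test.

```python
def ci_bounds(estimate, half_width):
    """Lower and upper bounds of an estimate ± half-width interval."""
    return (estimate - half_width, estimate + half_width)

def intervals_overlap(a, b):
    """True if two closed intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# OA (%) and 95% CI half-widths as reported in the text.
OA = {"ESA": (79.39, 2.19), "ESRI": (65.29, 2.41), "DW": (64.24, 2.52)}
bounds = {name: ci_bounds(*value) for name, value in OA.items()}

esri_vs_dw = intervals_overlap(bounds["ESRI"], bounds["DW"])
esa_vs_esri = intervals_overlap(bounds["ESA"], bounds["ESRI"])
esa_vs_dw = intervals_overlap(bounds["ESA"], bounds["DW"])
```

The ESRI and DW intervals overlap, whereas the ESA interval lies entirely above both.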

For the ESA 2021 dataset, accuracies varied substantially by LULC type. Water showed high UA (96.00 ± 4.46%) and high PA (82.73 ± 23.45%); the wide CI on PA reflects the small mapped area proportion (0.59%). Ice/snow showed high UA (88.73 ± 7.41%) but low PA (19.00 ± 21.22%), indicating that the map frequently missed Ice/snow areas despite high correctness when it did map this class. The high PA (97.08 ± 1.30%) and high UA (87.87 ± 2.53%) for Trees, the dominant LULC type, reflected the map’s strong performance for this major class.

In contrast, Grass LULC in the ESA 2021 dataset had a low UA (44.55 ± 6.72%) but high PA (89.37 ± 5.68%), indicating substantial overestimation. Out of 211 reference points classified as Grass, 17 were ground-truthed as Trees, 39 as Crops, and 43 as Shrub. This is generally consistent with the assessment conducted at global level [25]. While the Crops LULC type showed a high UA (90.97 ± 4.70%), its PA was low (67.61 ± 5.17%). As a result, over 32% of actual crop fields were labeled as other LULC types (e.g., Grass with 39 references and Trees with 23 references). Similarly, the Built areas LULC type showed low PA (34.10 ± 8.04%) but high UA (88.00 ± 7.40%), suggesting conservative mapping that missed many Built areas. Bare ground demonstrated moderate accuracies (PA: 43.49 ± 12.50%; UA: 78.38 ± 9.44%) with significant confusion with Built areas; thirteen bare ground reference points were incorrectly labeled as built-up. Shrub showed extremely low PA (0.23 ± 0.05%), indicating that ESA failed to detect most shrubland in the karst region. However, UA was high (88.00 ± 7.40%) for the few areas it did map as Shrub. Finally, the high PA (100.00 ± 0.00%) for Flooded vegetation should be interpreted with caution. Due to the extreme rarity of this class (<0.01% of the study area), a smaller sample size (n = 32), and a negligible mapped area proportion (0.0001%), a mathematically perfect PA was achieved. However, only 32 of 48 mapped reference samples were correctly identified (UA = 66.67 ± 13.48%).

As with ESA, Water (PA: 69.17 ± 18.79%; UA: 85.56 ± 7.30%), Trees (PA: 93.46 ± 1.57%; UA: 76.37 ± 3.16%), and Ice/snow (PA: 9.94 ± 4.52%; UA: 90.24 ± 9.20%) were accurately classified in terms of UA in the ESRI dataset, though Ice/snow showed low PA. In contrast, Crops (PA: 41.57 ± 5.00%; UA: 74.77 ± 8.27%) and Built areas (PA: 74.22 ± 8.61%; UA: 56.46 ± 8.04%) showed moderate accuracies. Built areas demonstrated the opposite pattern to ESA, with higher PA but lower UA. This indicated that while the dataset captured more Built areas, a lower proportion of those mapped areas were correctly classified. The Grass LULC type had a high rate of misclassification (UA: 24.30 ± 5.00%; PA: 52.31 ± 8.80%) with Shrub, Crops, and Trees. The Shrub LULC type was expected to have no mapped samples, and therefore an undefined UA, because ESRI removed this class from its LULC dataset starting in 2021. More notably, the lowest performance was observed for Flooded vegetation, which had 0.00% PA and an undefined UA because no samples were mapped. This indicated ESRI’s complete failure to detect this rare class.

For the DW dataset, classification results for Water and Trees were relatively reliable in terms of PA (89.54 ± 12.26% and 96.58 ± 1.14%, respectively). However, UAs were lower (69.03 ± 8.56% and 71.21 ± 3.20%, respectively). As with the other two datasets, spectral similarity caused substantial confusion among several vegetation classes. Trees showed significant confusion with Shrub (93 misclassified instances) and Crops (72 misclassifications) despite high PA. Shrub demonstrated poor UA (23.08 ± 6.14%) and low PA (21.58 ± 5.45%), with widespread misclassification as other vegetation types (Grass, Crops, and Trees). Unlike ESA, which barely mapped Shrub, DW mapped substantial shrub area (10.95%) but with low accuracy for both detection and correctness. Built areas achieved reasonable accuracies (PA: 67.63 ± 9.38%; UA: 60.77 ± 8.43%), showing more balanced performance than ESA or ESRI with moderate confusion. However, the Ice/snow class showed considerable confusion with Bare ground (16 misclassifications out of 61 total) and demonstrated low PA (23.45 ± 7.57%) and moderate UA (59.18 ± 13.90%).

5.3. Spatial Detail and Feature Preservation

Visual comparison of the three datasets revealed systematic differences in spatial detail and feature preservation (Figure 3). The ESA dataset presented fine-grained landscape features relative to the ESRI and DW datasets. Its key advantage lies in the preservation of small-scale linear features and complex heterogeneous patterns such as road networks and the distinct boundaries of agricultural fields (Figure 3). Furthermore, the accuracies for various datasets differ for different LULC types.

5.3.1. Built Areas Classification

For classification of Built areas (Figure 3A,B), all three datasets generally succeed in identifying major settlements and transportation infrastructure. However, there are significant variations in spatial extent. In Image Clip A (Huanjiang town, Guangxi, China), ESRI identified the largest Built area, while ESA identified the smallest. These size discrepancies were apparently due to differences in definitional criteria: ESA explicitly separates Bare ground from established Built areas, whereas ESRI’s land use-focused approach combines both into its Built areas classification. In the lower-left corner of Image Clip A, DW classifies most of the area as Trees, ESRI presents it as a mixture of Shrub and Built areas, and ESA exhibits the best performance by accurately distinguishing between Trees (predominantly eucalyptus plantation), Crops, and Built areas.

Regarding transportation networks (Figure 3B), all three datasets generally identified the linear freeway, highway, and ramps, albeit with varying levels of completeness. Furthermore, ESA failed to identify one industrial patch in the lower-left corner, which was best represented by DW. ESRI minimally depicted this area as Built areas surrounding a dominant patch of Grass. However, DW failed to capture a freeway segment in the upper-left corner and erroneously classified a nearby forested area as Water, an error attributable to shadow effects in the imagery.

5.3.2. Agricultural Land Classification

Agricultural land classification (Figure 3C) showed notable differences across datasets. ESA identified the area as Crops with Trees, with tree patches mapped at precise locations within the agricultural matrix. While ESRI also labels the area as Crops and Trees, it missed some tree patches. In contrast, DW depicted the areas as Trees with some crops. This confusion for DW likely stemmed from the region’s dominant crop, sugarcane. Although sugarcane can be spectrally separated using medium- to high-resolution imagery such as Landsat and Sentinel imagery [40,41], its perennial life cycle (over one year) may cause spectral overlap with shrubs and grass. Additionally, infrastructure representation varies: the highway is clearly present in ESA, intermittent in DW, and absent in ESRI, suggesting likely post-processing dissolution in the latter.

5.3.3. Peak-Cluster Depression Landscapes

Figure 3D,E illustrated classification performance in typical karst peak–cluster depression terrain—one of the most challenging landscape types in the study area. ESA consistently provided the highest level of spatial detail across both clips. However, both the DW and ESRI datasets offered better representation of Built areas (villages), with minimal Built areas identified in the ESA data. In the lower central portion, the classification of built areas in the DW and ESRI datasets appeared exaggerated, likely stemming from shadow-induced misclassification.

A persistent issue for DW is the presence of scattered Water pixels attributed to topographic shadows. The bottom center of Image Clip E, a magnified portion of Image Clip D, clearly illustrated the typical concentric land-use arrangement associated with karst peak-cluster depressions [27]: dwellings are situated near the center (avoiding the depression bottom to prevent flooding during the rainy season), surrounded by cropland on the lower slopes. The steeper and higher-elevation slopes retained the original vegetation cover. ESA best captured this concentric pattern, although it struggled to separate Crops from Grass due to spectral similarity and irregular field boundaries in karst regions. Irregular boundaries of plots aggravate the problem of mixed pixels, especially at the edges. Conversely, both the DW and ESRI datasets exhibited a tendency to overestimate forest acreage while underestimating grass and crop coverage.

While DW has the largest percentage of Shrub cover, the areas classified as Shrub patches within DW are, in fact, Crops (Figure 3E). This misclassification stemmed from the definition of “Shrub” used by DW: “small clusters of plants or individual plants dispersed on a landscape that showed exposed soil and rock”. However, shrub communities in Southwestern China often have both grassy and bare backgrounds. In this specific context, the observed LULC consists of scattered trees interspersed among patches of cropland, suggesting that the agricultural area’s appearance, possibly due to exposed soil in the cropland, caused it to meet the dataset’s definition of Shrub.

Figure 3F included a less rugged area where topographic shadows are less severe. ESA again presented a more realistic LULC pattern. However, the percentage of Crops may be underestimated due to the confusion between Grass and Crops. The DW dataset performed best in showing villages (Built areas), whereas the ESRI dataset slightly exaggerated the Built areas. In contrast, ESA underestimated Built areas, with most of these pixels labelled as Grass and Bare ground. As with Image Clip E, both the ESRI and DW datasets overestimated Trees and underestimated Crops and Grass. Shrub pixels in DW generally should be labelled as Crops.

6. Discussions

6.1. Edge Effects, Landscape Heterogeneity, and Classification Accuracy

Landscape heterogeneity exerts a strong influence on classification accuracy in Southwest China’s karst region. According to the sensitivity analysis of the accuracy estimates across varying neighborhood definitions, both classification accuracy and patch size consistently decline from homogeneous patch interiors (heterogeneity = 1) to edges and mixed areas (heterogeneity ≥ 2), a pattern stable across window sizes (Table 6). Homogeneous patches are characterized by large, continuous patches (median > 100,000 pixels) and high accuracy (86.75–92.59%). Highly heterogeneous patches (Heterogeneity = 3–5) contain much smaller patches (median < 600 pixels) and have substantially lower accuracy (33.33–67.23%). Although larger windows increase measured heterogeneity by incorporating more LULC classes, the fundamental relationship remains unchanged; interior conditions yield higher accuracy than edge or mixed conditions. While the proportion of reference points classified as interior declined monotonically with larger window sizes, the accuracy gap between heterogeneity level 1 and 2 remained stable (18.17–19.21%), confirming that the results are not sensitive to the choice of window size.
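The heterogeneity measure described above can be sketched as follows. This is a minimal illustration that assumes heterogeneity is the number of distinct LULC classes inside a square window centred on a reference point; the function name, the toy grid, and the class codes are hypothetical and are not taken from the study's workflow.

```python
def heterogeneity(grid, row, col, window=3):
    """Count distinct LULC classes in a window x window neighbourhood.

    The window is clipped at the raster border. A value of 1 means the
    point lies in a homogeneous patch interior; 2 or more means an edge
    or mixed area.
    """
    half = window // 2
    classes = set()
    for r in range(max(0, row - half), min(len(grid), row + half + 1)):
        for c in range(max(0, col - half), min(len(grid[0]), col + half + 1)):
            classes.add(grid[r][c])
    return len(classes)


# Toy classified raster with three hypothetical class codes.
grid = [
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [1, 1, 1, 3, 3],
    [1, 1, 1, 3, 3],
]
interior = heterogeneity(grid, 1, 1)            # 3x3 window inside class 1
edge = heterogeneity(grid, 2, 3)                # 3x3 window at a three-way boundary
wider = heterogeneity(grid, 1, 1, window=5)     # same point, larger window
```

The same point can move from the interior category to a mixed category as the window grows, which is why the text reports the interior share declining with larger windows while the accuracy gap between levels stays stable.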

This decline is primarily driven by the mixed-pixel effect at 10 m resolution. Pixels at patch boundaries integrate spectral signals from multiple LULC types. This spectral mixing creates ambiguous signatures that deviate from the pure spectral endmembers used to train classification algorithms. Smaller patches exacerbate the problem due to their higher edge-to-interior ratio, resulting in more mixed pixels. Additionally, misregistration (or misalignment) between the LULC and reference datasets causes mismatches between ground truth and classified LULC data [12,42]. These findings explain the substantially lower accuracies of DW and ESRI in this fragmented karst landscape compared with more homogeneous regions.

6.2. Elevation, Slope and Classification Accuracy

ESA maintains a stable OA of approximately 81.00% across all elevation zones (from 1–500 m to 4001–6000 m) (Figure 4), with only minor fluctuations and slightly higher performance in some mid-to-high ranges (e.g., ~90% at 3001–4000 m). In contrast, both ESRI and DW show a decline in accuracy at higher elevations, dropping below 40% in the 3001–4000 m range before recovering at 4001–6000 m. Although these patterns suggest possible advantages of SAR integration and topographic consideration in ESA’s workflow, this study did not include a controlled analysis to test them. Therefore, the observed elevation-related differences should be interpreted with caution.

Aspect-related patterns are consistent across all three products, as reflected in the symmetrical structure of the radar charts (Figure 5). ESA achieves higher accuracy in all directions. However, it is difficult to determine whether the minor directional variability (e.g., slightly lower performance on north-facing slopes) is statistically significant. As such, a clear link between classification error and aspect remains difficult to establish. Similarly, no significant relationship between slope and accuracy was detected.

6.3. Challenges in Discriminating Spectrally Similar Vegetation Classes

All three 10 m products showed poor performance for spectrally similar vegetation types, though inter-product comparisons for Shrub are not fully equivalent because ESRI does not include Shrub as a native class in its 2021 product. This is particularly limiting in the karst region, where transitional communities serve as key indicators of restoration progress and reliable differentiation is essential for monitoring ecosystem recovery. Shrub mapping was especially problematic. The shrub community is ecologically important for mitigating soil erosion and facilitating forest regeneration, with woody shrub species (e.g., Pyracantha fortuneana, Rosa cymosa) uniquely adapted to drought conditions, rocky substrates, and calcium-rich soils [33]. During recovery, key soil parameters including organic carbon, total nitrogen, and microbial abundance increase significantly from grassland through shrubland stages [34,35].

The ESA dataset rarely detected shrub cover (PA: 0.23 ± 0.05%) despite a high UA (88.00 ± 7.40%). While Shrub classifications are generally reliable when they occur, the dataset suffers from significant omission errors, frequently misclassifying shrubland as either grassland or forest. The DW dataset classified substantial areas as Shrub (10.95% of the study region), but most of these pixels were Trees, Crops, or mixed vegetation (PA: 21.58 ± 5.45%; UA: 23.08 ± 6.14%) (Table 5, Figure 6). ESRI eliminated the Shrub class entirely in 2021 by merging it into rangeland or forest.

The confusion among Grass, Crops, and Shrub stems from multiple factors. First, spectral signatures of these classes overlap considerably, particularly when comparing sugarcane (a perennial crop reaching 2–3 m in height) with grassland or shrubland. The tall structure and extended growing season of sugarcane create spectral properties intermediate between annual crops and perennial vegetation. Second, in the karst landscape, agricultural fields are frequently small and irregular in shape, with boundaries that do not align with the 10 m pixel grid. This creates abundant mixed pixels even in patch interiors. Third, the understory of shrub communities is heterogeneous. Grassy openings, Bare ground patches, and varying canopy densities produce spectral variability that confounds pixel-based classification. Finally, phenological timing of image acquisition strongly influences separability; grass and crops show maximum spectral distinction during peak growing season but may be nearly indistinguishable during senescence or after harvest.

The failure to separate Trees and Shrub (Figure 6) is most likely due to lack of training samples from the region. Spectrally, there are large differences between Trees and Shrub because of the differences in canopy structure, canopy height, and understory vascular plants. In true-colour composites for Sentinel-2 imagery, forests typically exhibit a darker, denser, and more uniform texture due to a closed, multi-layered canopy of tall trees. In contrast, shrubland appears lighter, less homogeneous, and patchier because the vegetation consists of shorter, smaller woody plants that form a discontinuous, lower canopy. This reduced cover often exposes more bedrock and soil background, leading to mixed spectral signatures that fall between dense forest, grass, and bare ground. This makes the overall signature more texturally varied than a closed forest canopy in a true-colour composite.

6.4. Technical Factors Affecting Classification Performance

6.4.1. Atmospheric Correction and Topographic Shadow Effects

The choice of input data preprocessing significantly impacts classification results in mountainous karst terrain. Sentinel-2 imagery is available in two formats: Level-1C (L1C, top-of-atmosphere reflectance) and Level-2A (L2A, atmospherically corrected surface reflectance) generated by the Sen2Cor algorithm [43]. Although the Sen2Cor algorithm performs adequately on a global scale, it can introduce artifacts in rugged terrain where deep topographic shadows exist.

Our examination of L2A imagery showed pronounced over-brightening in shadowed valleys and on sunlit ridges. This was evidenced by a 61.5% to 66.6% increase in the mean digital numbers for the green band. Consequently, pale greenish blocks and residual shadow edges coexist in karst valleys (Figure 7). Third-party evaluation of Sen2Cor’s shadow detection performance reported PAs and UAs of only 19.4% and 80.8% respectively for cloud shadow identification across six sites in Switzerland, France, Morocco, and Senegal [44], confirming the algorithm’s limitations in complex terrain. These artifacts, combined with persistent clouds, haze, and atmospheric opacity, reduce image contrast and quality in the study area (Figure 7). Consequently, DW frequently misclassifies deep topographic shadows as water, especially in winter scenes that dominate the annual composite due to limited clear-sky imagery. Such issues highlight the limitations of optical-only atmospheric correction in complex karst landscapes.

6.4.2. Classification Algorithms and Methodological Differences

Several methodological differences may help explain the relatively better performance of ESA observed in this study. Independent global validations [12,25] attribute this superiority to the fusion of Sentinel-1 SAR with optical imagery, a finer minimum mapping unit (100 m²), and expert-driven post-classification corrections. Research indicates that the ESA product extracts smaller landscape elements (e.g., urban trees, hedgerows, and small agricultural fields) more effectively than the DW and ESRI datasets [12,25]. Furthermore, the ESA product maintains superior spatial detail and higher accuracy in heterogeneous landscapes, although the comparisons remained qualitative [12]. Because this study did not conduct a formal sensitivity analysis to isolate the contribution of each methodological factor, these explanations remain correlative rather than definitive.

The integration of cloud-penetrating SAR, together with topographic context and a finer minimum mapping unit (100 m² vs. 250 m² for the others), gives ESA a clear advantage in persistently cloudy and topographically complex karst regions. Deep-learning approaches adopted by ESRI and DW, which incorporate neighborhood context, produce smoother, more generalized maps that reduce salt-and-pepper noise but lose fine-scale features such as roads and small agricultural fields.

These patterns are supported by Figure 3 and the landscape metrics in Figure 8. Clips A and B are dominated by urban landscapes and transportation networks and as such have higher edge density and patch density (Figure 8). In contrast, edge and patch densities are lower for Clip C (mainly Crops) and Clip D (peak–cluster depression dominated by mistakenly classified Trees and Crops). ESA consistently shows higher edge and patch densities than DW and ESRI, sometimes more than double. Correspondingly, the aggregation index is higher for DW and ESRI than for ESA. Consequently, ESA preserves fine-scale features (e.g., roads and fields) and reflects the actual heterogeneous karst landscape. The quantitative evidence (Figure 8) supports the visual differences observed in Figure 3.
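To make the edge density and patch density comparison concrete, the sketch below computes both on a small classified grid. It is an illustration under stated assumptions rather than the software used for Figure 8: patches are taken as 4-connected groups of equal-class pixels, edges are borders shared by adjacent pixels of different classes (the outer raster boundary is ignored), and the function name and toy grids are hypothetical.

```python
from collections import deque


def landscape_metrics(grid, pixel_size=10.0):
    """Edge density (m/ha) and patch density (patches per 100 ha)."""
    rows, cols = len(grid), len(grid[0])

    # Edge length: count borders between 4-adjacent pixels of different classes.
    edges = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and grid[r][c] != grid[r][c + 1]:
                edges += 1
            if r + 1 < rows and grid[r][c] != grid[r + 1][c]:
                edges += 1

    # Patch count: flood fill over 4-connected pixels of the same class.
    seen = [[False] * cols for _ in range(rows)]
    patches = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            patches += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and grid[ny][nx] == grid[y][x]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))

    area_ha = rows * cols * pixel_size ** 2 / 10_000.0
    return {
        "patches": patches,
        "edge_density_m_per_ha": edges * pixel_size / area_ha,
        "patch_density_per_100ha": patches / area_ha * 100.0,
    }


# A generalized map versus one that keeps a single-pixel feature.
smooth = [[1, 1, 2, 2],
          [1, 1, 2, 2],
          [1, 1, 2, 2],
          [1, 1, 2, 2]]
detailed = [[1, 1, 2, 2],
            [1, 3, 2, 2],
            [1, 1, 2, 2],
            [1, 1, 2, 2]]
m_smooth = landscape_metrics(smooth)
m_detailed = landscape_metrics(detailed)
```

Retaining a single isolated pixel raises both metrics, which is the direction of the difference reported between ESA and the smoother DW and ESRI maps.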

ESA’s independent pixel classification produces systematically different spatial patterns compared to the deep learning approaches used by DW and ESRI. As explained in Section 5.3, most of the smaller patches in ESA are not artifacts of classification error, but consequences of methodological choices with important implications for representing heterogeneous landscapes. Deep learning models with neighborhood context produce spatially smoother results via implicit generalization [25].

6.4.3. Classification Scheme Inconsistencies

Differences in LULC class definitions across products create systematic discrepancies that persist even after harmonization attempts. The ESRI dataset’s land use-focused approach classifies yards, parks, and tree groves as Built areas rather than vegetation. As a result, its built area estimate is larger than those of the land cover-focused ESA and DW products. The definition of Shrub varies slightly across global LULC datasets. ESA defines Shrub as clustered woody species covering more than 10% of the area within an herbaceous background, permitting the presence of trees with less than 10% cover. DW and ESRI (pre-2021) emphasize contrast between woody species and a dynamic herbaceous background with visible bare ground or forest gaps, leading to systematic confusion with Crops containing scattered trees. As illustrated in Image Clip B (Figure 6), the Shrub class in the Southwest China karst is dominated by low-growing vascular plants, interspersed with shrubs or young trees. Furthermore, the high-resolution imagery reveals visible patches of soil or bedrock.

The elimination of the Shrub class from ESRI’s 2021 classification scheme represents a design choice that substantially limits the product’s utility for applications requiring shrubland mapping. Similarly, “Flooded vegetation” in DW and ESRI includes heavily irrigated rice paddies and wetlands, while ESA classifies irrigated agriculture as Crops. These definitional inconsistencies complicate cross-product comparison and contribute to lower agreement in confusion matrices, particularly for mixed vegetation classes. More distinct categories like Water and Built areas align better across datasets because their definitions are more universally consistent.

6.4.4. Reference Data Uncertainty and Temporal Misalignment

Uncertainty in the reference dataset arises from challenges inherent to visual interpretation, including mixed pixels, spectral similarity among vegetation types, and limited availability of very-high-resolution imagery from 2021 [12,25]. While we employed dual-interpreter protocols for ambiguous cases, subjective judgment remains necessary for transitional vegetation types. Spatial dependence was minimized by enforcing a 1 km minimum distance between same-class points. Low Moran’s I values (<0.05) confirm adequate independence (Table 7). The reference samples also adequately represent the heterogeneous karst landscape, as their LULC heterogeneity distribution closely matched that of 1801 systematic grid points (Cramér’s V = 0.095, negligible practical difference) [39]. The narrow 95% CIs from the design-based accuracy assessment (e.g., ±2.2% for ESA) also confirm that our sample size of 1416 points provides stable estimates.
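For readers unfamiliar with Cramér’s V, the following sketch shows how such an effect size can be computed from a contingency table of sample type by heterogeneity level. It uses the standard definition based on the chi-square statistic; the tables below are hypothetical and are not the counts behind the reported value of 0.095.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts."""
    rows, cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[r][c] for r in range(rows)) for c in range(cols)]
    chi2 = 0.0
    for r in range(rows):
        for c in range(cols):
            expected = row_tot[r] * col_tot[c] / n
            if expected > 0:
                chi2 += (table[r][c] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))


# Rows: reference points and systematic grid points (hypothetical counts).
# Columns: heterogeneity levels 1, 2, and 3 or more.
same_distribution = [[50, 30, 20],
                     [100, 60, 40]]
v_same = cramers_v(same_distribution)      # identical proportions

fully_separated = [[10, 0],
                   [0, 10]]
v_separated = cramers_v(fully_separated)   # no overlap between the samples
```

Values near 0 indicate that the two samples share nearly the same distribution across heterogeneity levels, while values near 1 indicate that they occupy different levels almost entirely.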

Temporal misalignment between imagery used for classification and imagery used for reference interpretation introduces additional uncertainty. The specific Sentinel-2 tiles used for creating LULC datasets are not publicly documented, making it impossible to ensure perfect temporal correspondence. Seasonal variations in spectral signatures can cause the same location to be classified differently. For example, an area might appear as bare soil in winter imagery but as grassland during peak growing season. This issue is exacerbated by cloud cover limiting usable images. Dynamic features like water bodies are highly susceptible to such temporal effects; sandbars in dammed rivers might be classified as Water during high flow or as bare ground when exposed.

Cloud cover and atmospheric opacity limit the number of quality images in the study area. Of 146 images available for Tile 48QYM (Figure 6 shows a portion of the tile), only four winter scenes contained complete LULC information. Winter generally offers clearer skies but creates the most severe topographic shadows, leading to the water misclassification issues observed in DW. Eight reference points classified as “Flooded vegetation” in ESA were likely misclassified water bodies due to water level variations between image acquisition dates, illustrating the challenge of validating temporally dynamic features.

6.5. Limitations and Future Improvements

6.5.1. Limitations

While the three global LULC datasets may be used to produce LULC maps for the Southwest China karst, data users should be cautious of the associated errors. Note that the Shrub class in the ESRI classification system was merged with the Grass class to form a Rangeland class starting in 2021. As such, it is inappropriate to apply ESRI directly in the Southwest China karst, since both grass and shrub are important indicators of the recovery of the karst ecosystem. ESRI’s shrub accuracy metrics in this study reflect a classification absence rather than algorithmic performance and are not directly comparable to those of ESA and DW. Although DW includes a Shrub class, the corresponding pixels were generally associated with Grass, Crops, and Trees. The gap-filling process enhanced the geographical completeness of the DW dataset, particularly within high-altitude and cloud-prone regions. By addressing these data gaps, the total number of valid reference points increased by 53, with the most substantial gains observed in the Ice/snow (+34 points) and Shrub (+8 points) categories. Area-weighted OA remained virtually unchanged (64.27 ± 2.53% to 64.24 ± 2.52%). The stable accuracy of most LULC types (e.g., Trees, Shrub, Built areas) confirms that gap-filling resolved data deficiencies in high-relief zones without compromising classification integrity. PA and UA changes were minimal (<3%) for most classes, except Water (UA: −11.39%) and Ice/snow (UA: +16.33%).

Despite offering the most LULC detail and highest accuracy, ESA, like the DW and ESRI products, fails to distinguish between shrubs and trees, which is a critical limitation for accurately assessing the degree of ecosystem recovery in this region. The uncritical application of this dataset to different fields (e.g., climate change, ecosystem services, hydrological, and carbon modeling) may be inappropriate. The overestimation of tree cover inherent in this data may bias the evaluation of environmental recovery initiatives by suggesting greater success than achieved. Consequently, using this data for assessments and predictions without prior quality assurance will yield unreliable results. Globally and nationally derived LULC products are often insufficient for accurate local or regional studies in complex environments. Therefore, their accuracy should always be validated through localized assessment.

6.5.2. Pathways for Improvement

Fusing spectral and radar data with LiDAR-derived elevation and canopy height could improve LULC mapping accuracy in karst regions. The ESA dataset’s good performance demonstrates that fusing Sentinel-1 radar with optical data helps overcome cloud cover limitations. Furthermore, LiDAR-derived elevation and canopy height metrics could improve separation of shrub, grass, and forest by providing structural information absent from spectral data alone. The accuracy of ICESat-2 and the Global Ecosystem Dynamics Investigation (GEDI) has been assessed for canopy height estimation, with GEDI performing better in two cities in Southwest China [45]. The efficacy of global canopy height maps, such as the 10 m ETH Global Sentinel-2 Canopy Height 2020 (which fuses GEDI sparse LiDAR data with Sentinel-2 imagery [46]) and the 30 m annual median vegetation height maps (which integrate ICESat-2 LiDAR and Landsat imagery [47]), also warrants further evaluation in the region. In addition, topographic compensation methods (e.g., Minnaert correction, Sun-Canopy-Sensor models) could reduce shadow artifacts that plague optical imagery in rugged terrain [48,49].
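As an illustration of what a topographic compensation step involves, the sketch below applies one common form of the Minnaert correction to a single pixel. It is not a method applied in this study: the correction form, the function names, and the Minnaert constant `k` (an empirical, band-specific value normally estimated by regression) are assumptions chosen for the example.

```python
import math


def cos_illumination(slope_deg, aspect_deg, sun_zenith_deg, sun_azimuth_deg):
    """Cosine of the local solar incidence angle on a tilted surface."""
    s, a = math.radians(slope_deg), math.radians(aspect_deg)
    z, az = math.radians(sun_zenith_deg), math.radians(sun_azimuth_deg)
    return math.cos(z) * math.cos(s) + math.sin(z) * math.sin(s) * math.cos(az - a)


def minnaert_correct(reflectance, cos_i, sun_zenith_deg, k):
    """One common Minnaert form: rho_flat = rho * (cos(zenith) / cos(i)) ** k.

    Returns None for self-shadowed pixels (cos_i <= 0), where the
    correction is undefined.
    """
    if cos_i <= 0:
        return None
    return reflectance * (math.cos(math.radians(sun_zenith_deg)) / cos_i) ** k


# Hypothetical winter scene: sun at 50 degrees zenith, azimuth 160 degrees.
zenith, azimuth, k = 50.0, 160.0, 0.5
flat = minnaert_correct(0.05, cos_illumination(0, 0, zenith, azimuth), zenith, k)
shaded = minnaert_correct(0.05, cos_illumination(30, 340, zenith, azimuth), zenith, k)
sunlit = minnaert_correct(0.05, cos_illumination(30, 160, zenith, azimuth), zenith, k)
```

Slopes facing away from the sun are brightened and sun-facing slopes are darkened, while flat pixels are left unchanged; deeply self-shadowed pixels cannot be recovered this way and need separate handling.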

Advancements in classification algorithms and training data refinement present additional avenues for improvement. Contextual classification approaches (e.g., object-based image analysis (OBIA)) that model spatial relationships could better handle edge pixels, where our analysis showed maximum confusion. Soft classification techniques, which assign probabilistic class memberships, could more accurately represent heterogeneous and transitional zones. Product developers could address this limitation by implementing confidence metrics that flag high-uncertainty pixels in heterogeneous areas. This would allow users to assess data suitability. Future CNN work could explore adapting existing architectures, training new models, or developing hybrids. Promising directions include combining CNNs with vision transformers to capture broader context and finer detail, or with LSTM networks to integrate spatial and temporal information [12]. For spectrally analogous classes such as Crops, Grass, and Shrub, multi-temporal classification leveraging phenological differences throughout the growing season could improve their separation. Sugarcane, for example, maintains green vegetation year-round, potentially enabling separation from annual crops and natural grassland if multi-date imagery is used systematically. Finally, the development of region-specific training data addressing the shrub, grass, and crop communities present in Southwest China’s karst is fundamental for improving classification of these transitional types. Complementing this with edge-specific training samples would sharpen boundary delineation in fragmented landscapes.
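One way a per-pixel confidence metric of this kind could be built from soft classification output is sketched below. It uses normalized Shannon entropy of the class membership probabilities; the 0.6 threshold, the function names, and the example probabilities are hypothetical, and none of the three products is claimed to expose its uncertainty in this form.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of class probabilities, scaled to the range [0, 1].

    0 means all probability sits on one class; 1 means the classes are
    equally likely. Inputs are renormalized to sum to 1.
    """
    if len(probs) < 2:
        return 0.0
    total = sum(probs)
    shares = [p / total for p in probs if p > 0]
    entropy = sum(-p * math.log(p) for p in shares)
    return entropy / math.log(len(probs))


def flag_uncertain(probs, threshold=0.6):
    """Flag a pixel whose class memberships are too evenly spread."""
    return normalized_entropy(probs) > threshold


# Hypothetical memberships for Grass, Crops, and Shrub at two pixels.
edge_pixel = [0.34, 0.33, 0.33]      # mixed pixel at a patch boundary
interior_pixel = [0.95, 0.03, 0.02]  # pixel well inside a Grass patch
edge_flag = flag_uncertain(edge_pixel)
interior_flag = flag_uncertain(interior_pixel)
```

A flag of this kind would let users mask or down-weight boundary pixels before computing area statistics, rather than treating every hard label as equally reliable.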

6.5.3. Recommendations for Regional Applications

For researchers and practitioners applying global LULC products in Southwest China’s karst region, the ESA dataset should be the default choice when spatial detail and OA are priorities. For applications sensitive to boundary accuracy (e.g., habitat fragmentation analysis, edge effect studies), users should recognize that effective accuracy is lower than the reported OA (81.0%). When temporal monitoring is required, the DW product offers valuable time-series data despite its lower accuracy. Systematic shadow-induced water misclassification should be corrected using ancillary data or expert knowledge. We applied a correction wherein any pixel classified as Water by DW but as a non-water class by ESA was reassigned the ESA label. This correction produced only a slight increase in quantitative accuracy metrics (OA increased by 1%), but it substantially improved the qualitative interpretability and logical consistency of the DW dataset.
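The water correction rule described above is simple enough to express directly. The sketch below assumes both products have already been resampled to the same grid and harmonized to a shared set of class codes; the code `1` for Water, the function name, and the toy grids are hypothetical.

```python
WATER = 1  # hypothetical harmonized class code for Water


def correct_dw_water(dw, esa, water=WATER):
    """Reassign DW Water pixels to the ESA label where ESA maps non-water.

    dw and esa are equally sized 2-D lists of harmonized class codes.
    All other DW pixels are returned unchanged.
    """
    return [
        [e if (d == water and e != water) else d for d, e in zip(dw_row, esa_row)]
        for dw_row, esa_row in zip(dw, esa)
    ]


dw = [[1, 1, 2],
      [3, 1, 2]]
esa = [[1, 2, 2],
       [3, 3, 1]]
corrected = correct_dw_water(dw, esa)
```

The rule is one-directional: it removes Water that only DW reports, but it does not add Water where ESA reports it and DW does not, so genuine DW omissions are left untouched.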

It is critical to note that none of the three products (including the top-performing ESA dataset) provides a sufficient foundation for shrub-focused ecological monitoring in this region. The ESA product rarely detected shrubs (PA: 0.23%), while the DW product mapped extensive but largely inaccurate shrub cover (PA: 21.58%; UA: 23.08%); the ESRI dataset lacks the class. Given these limitations, these global products are unsuitable for assessing ecosystem recovery in this region. This underscores the need for region-specific classification methods that can accurately capture transitional vegetation. Additionally, users should consider landscape heterogeneity when designing sampling strategies or interpreting accuracy metrics; standard OA metrics may be misleading in highly fragmented landscapes.

Although this study focuses on Southwest China’s karst region, the core relationships we uncovered between landscape heterogeneity and classification accuracy are broadly applicable to many other fragmented and topographically complex environments worldwide. Karst landscapes are characterized by extreme topographic relief, small patch sizes, high edge density, and spectral similarity among vegetation types. These challenges are mirrored in mountainous regions (e.g., the Andes, Himalayas, Rockies), agroforestry systems, peri-urban transitional zones, and arctic/subarctic transition areas. Steep slopes, shadows, cloud cover, and fine-grained LULC patterns in these areas similarly degrade automated classification performance. The persistent confusion among spectrally similar transitional classes (shrub, grass, crops) is particularly relevant for ecosystem recovery monitoring, habitat modeling, and land degradation assessment in these diverse landscapes. Regional validation studies in other complex environments should adopt comparable approaches to quantify these effects and inform product selection. When applying global datasets to mountainous, peri-urban, or transitional environments, researchers should expect higher uncertainty and consider supplementing with multi-source data (e.g., SAR, high-resolution imagery) where fine-scale detail or transitional vegetation mapping is critical. This landscape-aware interpretation will enhance the reliable use of global products across a wide range of heterogeneous regions.

7. Conclusions

The public availability of global LULC datasets at 10 m resolution offers unprecedented opportunities for regional environmental monitoring and analysis. This study provides the first comprehensive accuracy assessment of three 10 m global LULC products (ESA, ESRI, and DW) for the ecologically vulnerable Southwest China karst region. ESA outperformed the other two products, achieving the highest overall accuracy (79.39 ± 2.19%) while best preserving fine-scale fragmented features characteristic of karst topography. ESA’s good performance stems from the integration of Sentinel-1 SAR data, a finer minimum mapping unit (100 m²), and expert post-classification corrections.

A key finding is that landscape heterogeneity significantly degrades classification accuracy. Accuracy declined by 19.4 percentage points from patch interiors (86.75%) to edges (67.38%), and edge pixels (29.66% of samples) accounted for 50.93% of total errors. Critically, none of the three products currently provides a sufficient basis for shrub-focused ecological monitoring in this region: ESA rarely detected shrub cover, DW produced largely inaccurate shrub classifications, and ESRI lacks the shrub class. All three products are therefore unsuitable for monitoring grass-shrub-forest transitions, a key indicator of ecosystem recovery in karst regions, without substantial local validation and correction.
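The edge statistics quoted above can be recovered from the 3 × 3 window rows of Table 6, treating heterogeneity 1 as patch interior and heterogeneity ≥ 2 as edge. The worked arithmetic below is a reconstruction from the published table values, so the error counts are rounded.

```latex
\begin{align*}
n_{\text{edge}} &= 342 + 75 + 3 = 420, \qquad \frac{420}{1416} \approx 29.66\% \\
A_{\text{edge}} &= \frac{342(67.54) + 75(68.00) + 3(33.33)}{420} \approx 67.38\% \\
\Delta A &= 86.75 - 67.38 = 19.37 \text{ percentage points} \\
E_{\text{edge}} &\approx 420\,(1 - 0.6738) \approx 137, \qquad
E_{\text{int}} \approx 996\,(1 - 0.8675) \approx 132 \\
\frac{E_{\text{edge}}}{E_{\text{edge}} + E_{\text{int}}} &\approx \frac{137}{269} \approx 50.93\%
\end{align*}
```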

These results highlight the need for regional validation of global products in complex terrains. Future efforts to improve LULC mapping in karst regions should integrate multi-source data (optical + SAR), apply topographic correction for shadows, develop region-specific training data for transitional vegetation, and adopt contextual classification approaches to better handle edge effects.

Author Contributions

Conceptualization, C.Z., X.Q., M.Z., Y.Y. and K.W.; methodology, C.Z. and X.Q.; reference points interpretation, H.S.C. and C.Z.; spatial analysis, C.Z., X.Q. and H.S.C.; writing (original draft preparation), C.Z., X.Q. and K.W.; writing (review and editing), C.Z., X.Q., H.S.C., M.Z., Y.Y. and K.W.; visualization, C.Z., H.S.C. and X.Q.; funding acquisition, C.Z., M.Z. and K.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (Grant No. U2244216, awarded to X.Q.; Grant No. 42430512, awarded to K.W.; Grant No. 42571145, awarded to M.Z.), the West Light Foundation of the Chinese Academy of Sciences (awarded to X.Q.), and the Research Assistant Funding Program of Algoma University (awarded to C.Z.).

Data Availability Statement

The original data presented in the study are openly available in Google Earth Engine.

Acknowledgments

The authors gratefully acknowledge the providers of all open databases for making the necessary data available for this study, as well as Google Earth Engine for providing free academic access. We also sincerely thank the editor and the anonymous reviewers for their valuable comments and constructive suggestions, which greatly improved this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Brown, C.F.; Brumby, S.P.; Guzder-Williams, B.; Birch, T.; Hyde, S.B.; Mazzariello, J.; Tait, A.M. Dynamic World, Near Real-Time Global 10 m Land Use Land Cover Mapping. Sci. Data 2022, 9, 251.
2. Karra, K.; Kontgis, C.; Statman-Weil, Z.; Mazzariello, J.C.; Mathis, M.; Brumby, S.P. Global Land Use/Land Cover with Sentinel-2 and Deep Learning. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Brussels, Belgium, 11–16 July 2021; pp. 4704–4707.
3. Zanaga, D.; Van De Kerchove, R.; Daems, D.; De Keersmaecker, W.; Brockmann, C.; Kirches, G.; Arino, O. ESA WorldCover 10 m 2021 v200. 2022. Available online: https://esa-worldcover.org (accessed on 10 December 2025).
4. Zhang, C.; Qi, X.; Wang, K.; Zhang, M.; Yue, Y. The Application of Geospatial Techniques in Monitoring Karst Vegetation Recovery in Southwest China: A Review. Prog. Phys. Geogr. 2017, 41, 450–477.
5. Chaaban, F.; El Khattabi, J.; Darwishe, H. Accuracy Assessment of ESA WorldCover 2020 and ESRI 2020 Land Cover Maps for a Region in Syria. J. Geovis. Spat. Anal. 2022, 6, 31.
6. Kang, J.; Yang, X.; Wang, Z.; Cheng, H.; Wang, J.; Tang, H.; Bai, Z. Comparison of Three Ten-Meter Land Cover Products in a Drought Region: A Case Study in Northwestern China. Land 2022, 11, 427.
7. Hao, X.; Qiu, Y.; Jia, G.; Menenti, M.; Ma, J.; Jiang, Z. Evaluation of Global Land Use–Land Cover Data Products in Guangxi, China. Remote Sens. 2023, 15, 1291.
8. Lillesand, T.; Kiefer, R.W.; Chipman, J. Remote Sensing and Image Interpretation, 7th ed.; John Wiley & Sons: New York, NY, USA, 2015.
9. Tuanmu, M.N.; Jetz, W. A Global 1-km Consensus Land-Cover Product for Biodiversity and Ecosystem Modelling. Glob. Ecol. Biogeogr. 2014, 23, 1031–1045.
10. Potapov, P.; Hansen, M.C.; Pickens, A.; Hernandez-Serna, A.; Tyukavina, A.; Turubanova, S.; Kommareddy, A. The Global 2000–2020 Land Cover and Land Use Change Dataset Derived from the Landsat Archive: First Results. Front. Remote Sens. 2022, 3, 856903.
11. Chen, J.; Chen, J.; Liao, A.; Cao, X.; Chen, L.; Chen, X.; Mills, J. Global Land Cover Mapping at 30 m Resolution: A POK-Based Operational Approach. ISPRS J. Photogramm. Remote Sens. 2015, 103, 7–27.
12. Xu, P.; Tsendbazar, N.E.; Herold, M.; de Bruin, S.; Koopmans, M.; Birch, T.; Zanaga, D. Comparative Validation of Recent 10 m-Resolution Global Land Cover Maps. Remote Sens. Environ. 2024, 311, 114316.
13. Sertel, E.; Robock, A.; Ormeci, C. Impacts of Land Cover Data Quality on Regional Climate Simulations. Int. J. Climatol. 2010, 30, 1942–1953.
14. Kusi, K.K.; Khattabi, A.; Mhammdi, N. Analyzing the Impact of Land Use Change on Ecosystem Service Value in the Main Watersheds of Morocco. Environ. Dev. Sustain. 2023, 25, 2688–2715.
15. Ge, J.; Qi, J.; Lofgren, B.M.; Moore, N.; Torbick, N.; Olson, J.M. Impacts of Land Use/Cover Classification Accuracy on Regional Climate Simulations. J. Geophys. Res. Atmos. 2007, 112, D05113.
16. Huang, Z.; Du, H.; Mao, F.; Li, X.; Zhou, G.; Sun, J.; Song, M. Assessing the Impact of Land Use and Cover Change on Above-Ground Carbon Storage in Subtropical Forests: A Case Study of Zhejiang Province, China. Geo-Spat. Inf. Sci. 2025, 28, 2781–2807.
17. Yuan, D. Karst of China; Geological Publishing House: Beijing, China, 1991. (In Chinese)
18. Ford, D.; Williams, P.D. Karst Hydrogeology and Geomorphology; Wiley: New York, NY, USA, 2013.
19. Clements, R.; Sodhi, N.S.; Schilthuizen, M.; Ng, P.K. Limestone Karsts of Southeast Asia: Imperiled Arks of Biodiversity. BioScience 2006, 56, 733–742.
20. UNESCO. South China Karst. Available online: http://whc.unesco.org/en/list/1248 (accessed on 10 December 2025).
21. Zhou, G. Karst Peak Cluster-Depression System—Land Use, Population and Settlement Distribution. Carsologica Sin. 1995, 14, 194–198. (In Chinese with English abstract)
22. Xiao, H.; Weng, Q. The Impact of Land Use and Land Cover Changes on Land Surface Temperature in a Karst Area of China. J. Environ. Manag. 2007, 85, 245–257.
23. Kheir, R.B.; Abdallah, C.; Khawlie, M. Assessing Soil Erosion in Mediterranean Karst Landscapes of Lebanon Using Remote Sensing and GIS. Eng. Geol. 2008, 99, 239–254.
24. Tian, Y.; Wang, S.; Bai, X.; Luo, G.; Xu, Y. Trade-Offs among Ecosystem Services in a Typical Karst Watershed, SW China. Sci. Total Environ. 2016, 566, 1297–1308.
25. Venter, Z.S.; Barton, D.N.; Chakraborty, T.; Simensen, T.; Singh, G. Global 10 m Land Use Land Cover Datasets: A Comparison of Dynamic World, World Cover and ESRI Land Cover. Remote Sens. 2022, 14, 4101.
26. Wang, K.; Zhang, C.; Chen, H.; Yue, Y.; Zhang, W.; Zhang, M.; Qi, X.; Fu, Z. Karst Landscapes of China: Pattern, Ecosystem Processes, and Ecosystem Services. Landsc. Ecol. 2019, 34, 2743–2763.
27. Fan, F.; Wang, K.; Xiong, Y.; Xuan, Y.; Zhang, W.; Yue, Y. Assessment and Spatial Distribution of Water and Soil Loss in Karst Regions, Southwest China. Acta Ecol. Sin. 2011, 31, 6353–6362.
28. Ju, J.; Dai, C.; Kuang, S. Monitoring Karst Rocky Desertification Using Remotely Sensed Data; Geological Publishing House: Beijing, China, 2006. (In Chinese)
29. Qi, X.; Li, Q.; Yue, Y.; Liao, C.; Zhai, L.; Zhang, X.; Wang, K.; Zhang, C.; Zhang, M.; Xiong, Y. Rural–Urban Migration and Conservation Drive the Ecosystem Services Improvement in China Karst: A Case Study of HuanJiang County, Guangxi. Remote Sens. 2021, 13, 566.
30. Chang, J.; Yue, Y.; Tong, X.; Brandt, M.; Zhang, C.; Zhang, X.; Qi, X.; Wang, K. Rural Outmigration Generates a Carbon Sink in South China Karst. Prog. Phys. Geogr. Earth Environ. 2023, 47, 655–667.
31. Chang, J.; Li, Q.; Zhai, L.; Liao, C.; Qi, X.; Zhang, Y.; Wang, K. Comprehensive Assessment of Rocky Desertification Treatment in Southwest China Karst. Land Degrad. Dev. 2024, 35, 3461–3476.
32. Tong, X.; Brandt, M.; Yue, Y.; Ciais, P.; Jepsen, M.R.; Penuelas, J.; Fensholt, R. Forest Management in Southern China Generates Short-Term Extensive Carbon Sequestration. Nat. Commun. 2020, 11, 129.
33. Guo, K.; Liu, C.; Dong, M. Ecological Adaptation of Plants and Control of Rocky-Desertification on Karst Region of Southwest China. Chin. J. Plant Ecol. 2011, 35, 991–999. (In Chinese with English abstract)
34. Chen, J.; Zhang, L.; Cai, X.; Luo, W.; Lyu, Y.; Cheng, A.; Wang, S. The Impact of Natural Vegetation Restoration on Surface Soil Moisture of Secondary Forests and Shrubs in the Karst Region of Southwest China. Hydrol. Process. 2024, 38, e15161.
35. Yang, L.; Yang, H.; Liu, L.; Yang, S.; Wen, D.; Li, X.; Zhu, T. Vegetation Restoration Significantly Increased Soil Organic Nitrogen Mineralization and Nitrification Rates in Karst Regions of China. Forests 2025, 16, 1006.
36. Cuba, N. Research Note: Sankey Diagrams for Visualizing Land Cover Dynamics. Landsc. Urban Plan. 2015, 139, 163–167.
37. Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.E.; Wulder, M.A. Good Practices for Estimating Area and Assessing Accuracy of Land Change. Remote Sens. Environ. 2014, 148, 42–57.
38. Smith, J.H.; Wickham, J.D.; Stehman, S.V.; Yang, L. Impacts of Patch Size and Land-Cover Heterogeneity on Thematic Image Classification Accuracy. Photogramm. Eng. Remote Sens. 2002, 68, 65–70.
39. Sullivan, G.M.; Feinn, R. Using Effect Size—Or Why the P Value Is Not Enough. J. Grad. Med. Educ. 2012, 4, 279–282.
40. Abdel-Rahman, E.M.; Ahmed, F.B. The Application of Remote Sensing Techniques to Sugarcane (Saccharum spp. hybrid) Production: A Review of the Literature. Int. J. Remote Sens. 2008, 29, 3753–3767.
41. Wang, J.; Xiao, X.; Liu, L.; Wu, X.; Qin, Y.; Steiner, J.L.; Dong, J. Mapping Sugarcane Plantation Dynamics in Guangxi, China, by Time Series Sentinel-1, Sentinel-2 and Landsat Images. Remote Sens. Environ. 2020, 247, 111951.
42. Smith, J.H.; Stehman, S.V.; Wickham, J.D.; Yang, L. Effects of Landscape Characteristics on Land-Cover Class Accuracy. Remote Sens. Environ. 2003, 84, 342–349.
43. Main-Knorn, M.; Pflug, B.; Louis, J.; Debaecker, V.; Müller-Wilm, U.; Gascon, F. Sen2Cor for Sentinel-2. Proc. SPIE 2017, 10427, 37–48.
44. Tarrio, K.; Tang, X.; Masek, J.G.; Claverie, M.; Ju, J.; Qiu, S.; Woodcock, C.E. Comparison of Cloud Detection Algorithms for Sentinel-2 Imagery. Sci. Remote Sens. 2020, 2, 100010.
45. Fu, L.; Shu, Q.; Yang, Z.; Xia, C.; Zhang, X.; Zhang, Y.; Li, S. Accuracy Assessment of Topography and Forest Canopy Height in Complex Terrain Conditions of Southern China Using ICESat-2 and GEDI Data. Front. Plant Sci. 2025, 16, 1547688.
46. Lang, N.; Jetz, W.; Schindler, K.; Wegner, J.D. A High-Resolution Canopy Height Model of the Earth. Nat. Ecol. Evol. 2023, 7, 1778–1789.
47. Hunter, M.O.; Parente, L.; Ho, Y.F.; Bonannella, C.; Ferreira, L.G.; Morton, D.; Sloat, L. Global 30-m Annual Median Vegetation Height Maps (2000–2022) Based on ICESat-2 Data and Machine Learning. Sci. Data 2025, 12, 1470.
48. Mostafa, Y. A Review on Various Shadow Detection and Compensation Techniques in Remote Sensing Images. Can. J. Remote Sens. 2017, 43, 545–562.
49. Rajah, P.; Odindi, J.; Mutanga, O. Feature Level Image Fusion of Optical Imagery and Synthetic Aperture Radar (SAR) for Invasive Alien Plant Species Detection and Mapping. Remote Sens. Appl. Soc. Environ. 2018, 10, 198–208.

Figure 1. LULC of the Southwest China karst region derived from the 10 m ESA product. The main bedrock types in the Southwest China karst are limestone, dolostone, interbedded limestone and dolomite, interbedded limestone and clastic rock, interbedded dolomite and clastic rock, and interbedded carbonate and clastic rock. The legend has been harmonized to the DW schema.

Figure 2. Sankey diagram showing the relationships between the global LULC datasets (A: DW, B: ESA, C: ESRI). Line thickness represents the number of pixels transitioning between LULC types from different datasets. Small LULC types (i.e., Flooded vegetation, Ice/snow) and transitions (links) smaller than 10,000 were not included in the chart to ensure readability.

Figure 3. Side-by-side comparison of three LULC datasets (DW, ESA, and ESRI) against high-resolution ESRI Imagery to assess classification consistency. (A,B) Built areas/Bare ground; (C) Crops; (D) typical karst peak–cluster depression region; (E) close-up view of peaks and depressions; (F) terrain similar to (D) with lower contrast (reduced shadows). High-resolution imagery acquisition dates: (A) 24 March 2021 (WorldView 2), (B) 4 November 2022 (WorldView 3), (C) 7 September 2022 (WorldView 2), (D) 18 October 2023 (WorldView 3), (E) 18 October 2023 (WorldView 3), and (F) 18 November 2021 (WorldView 2).

Figure 4. Relationships between classification accuracy and elevation ranges ((A) DW, (B) ESA, (C) ESRI). The red dashed line represents the overall accuracy (OA). n indicates the number of reference points in respective ranges.

Figure 5. Relationships between classification accuracy and aspect ((A) DW, (B) ESA, (C) ESRI).

Figure 6. Capabilities of three LULC datasets (DW, ESA, and ESRI) in mapping Trees (A) and Shrub (B). Imagery clips A and B are from a WorldView-2 image acquired on 12 January 2023, available within ArcGIS Pro.

Figure 7. Comparison of the Sentinel-2 L1C image (true colour composite), L2A image (true colour composite), and DW classified layer for a typical karst peak–cluster depression. Sentinel-2 image clips for panels (A–C) are from 14 February, 29 July, and 6 December 2021, respectively. Pictures are shown with default enhancement (Percent Clip). The gap areas are shown as blue because “no value” pixels were confused with “Water (0)” in the classified DW layer downloaded from GEE.

Figure 8. Landscape metrics for the LULC dataset shown in Figure 3: (A) edge density, (B) patch density, and (C) aggregation index.

Table 1. Distribution of reference samples by LULC class based on a stratified random sampling design for accuracy assessment.

LULC Class	Mapped Area (ESA, %)	Initial Allocation	Final Sample Size	Sampling Rationale
Water	0.59	75	82	Minimum threshold—rare class
Trees	62.11	Proportional	584	Area-proportional allocation
Grass	20.37	Proportional	112	Area-proportional allocation
Flooded vegetation	0.00	75	32	Minimum threshold—rare class; reduced to preserve design integrity
Crops	13.88	Proportional	203	Area-proportional allocation
Shrub	0.02	Proportional	147	Area-proportional allocation
Built areas	1.62	75	105	Minimum threshold—rare class
Bare ground	1.40	75	86	Minimum threshold—rare class
Ice/snow	0.01	75	65	Minimum threshold—rare class
Total	100.00	1450	1416

Table 2. Comparison of mapped area percentages for LULC classes across the three global 10 m products (DW, ESRI, ESA) for Southwest China’s karst region.

	Water	Trees	Grass	Flooded Vegetation	Crops	Shrub	Built Areas	Bare Ground	Ice/Snow
DW	1.83%	70.51%	2.58%	0.02%	5.98%	10.95%	6.41%	1.33%	0.38%
ESRI	0.92%	63.98%	17.72%	0.01%	8.23%	-	8.47%	0.60%	0.07%
ESA	0.59%	62.11%	20.37%	0.00%	13.88%	0.02%	1.62%	1.40%	0.01%

Table 3. LULC datasets consistency by label (showing pixels where at least two datasets agree).

LULC Type	Total Consistent Pixels	% of Overall
Trees	15,020,696,000	65.61
Built areas	1,444,298,207	6.31
Grass	2,506,978,001	10.95
Crops	1,791,415,245	7.82
Water	194,191,353	0.85
Bare ground	168,207,967	0.73
Ice/snow	9,709,566	0.04
Shrub	2,427,391	0.01

Table 4. Major cross-product pixel transition percentages (>10%) between the three 10 m global LULC datasets in Southwest China karst region.

Source Class	DW → ESA	ESA → ESRI
Trees	81.85% Trees	88.45% Trees
Grass	73.40% Grass; 13.53% Crops	51.59% Grass; 33.44% Trees
Crops	80.58% Crops	41.00% Crops; 22.99% Grass
Shrub	64.30% Grass; 19.77% Crops; 13.45% Trees	62.48% Grass; 36.07% Trees
Built areas	22.11% Built areas	92.59% Built areas
Bare ground	46.11% Bare ground	32.13% Bare ground
Water	31.08% Water	95.88% Water

Table 5. Design-based accuracy assessment results with 95% CI (i.e., accuracy ± 95% CI) for the global 10 m LULC dataset in Southwest China karst (OA: Overall accuracy. PA: Producer’s accuracy. UA: User’s accuracy. NA: Not Available. NA for ESRI Shrub reflects the absence of this class from its 2021 native classification scheme, not a data gap).

	DW		ESA		ESRI
	PA (%)	UA (%)	PA (%)	UA (%)	PA (%)	UA (%)
Water	89.54 ± 12.26	69.03 ± 8.56	82.73 ± 23.45	96.00 ± 4.46	69.17 ± 18.79	85.56 ± 7.30
Trees	96.58 ± 1.14	71.21 ± 3.20	97.08 ± 1.30	87.87 ± 2.53	93.46 ± 1.57	76.37 ± 3.16
Grass	8.19 ± 4.80	25.00 ± 15.24	89.37 ± 5.68	44.55 ± 6.72	52.31 ± 8.80	24.30 ± 5.00
Flooded vegetation	0.74 ± 0.77	66.67 ± 65.33	100.00 ± 0.00	66.67 ± 13.48	0.00 ± 0.00	NA
Crops	30.41 ± 4.19	78.87 ± 9.56	67.61 ± 5.17	90.97 ± 4.70	41.57 ± 5.00	74.77 ± 8.27
Shrub	21.58 ± 5.45	23.08 ± 6.14	0.23 ± 0.05	88.00 ± 7.40	NA	NA
Built areas	67.63 ± 9.38	60.77 ± 8.43	34.10 ± 8.04	88.00 ± 7.40	74.22 ± 8.61	56.46 ± 8.04
Bare ground	24.51 ± 7.29	56.36 ± 13.23	43.49 ± 12.50	78.38 ± 9.44	9.59 ± 3.09	56.60 ± 13.47
Ice/snow	23.45 ± 7.57	59.18 ± 13.90	19.00 ± 21.22	88.73 ± 7.41	9.94 ± 4.52	90.24 ± 9.20
OA	64.24 ± 2.52		79.39 ± 2.19		65.29 ± 2.41
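For readers who want to reproduce design-based estimates of the kind reported in Table 5, the sketch below applies the stratified estimator of overall accuracy and its 95% confidence interval described by Olofsson et al. (2014). It assumes that the sampling strata coincide with the map classes being assessed; the confusion matrix and area weights are invented toy values, not the study's data, and the function name is hypothetical.

```python
import numpy as np

def stratified_overall_accuracy(confusion, area_weights, z=1.96):
    """Area-weighted overall accuracy and the half-width of its confidence interval.

    confusion[i, j]: sample count with map class i (row) and reference class j (column).
    area_weights[i]: mapped area proportion W_i of map class i (sums to 1).
    """
    n = np.asarray(confusion, dtype=float)
    w = np.asarray(area_weights, dtype=float)
    n_i = n.sum(axis=1)                      # samples per stratum (map class)
    u_i = np.diag(n) / n_i                   # user's accuracy per map class
    oa = (w * u_i).sum()                     # OA = sum_i W_i * n_ii / n_i
    var = (w ** 2 * u_i * (1.0 - u_i) / (n_i - 1.0)).sum()
    return oa, z * np.sqrt(var)

# Toy two-class example (hypothetical counts and weights).
confusion = [[45, 5],
             [10, 40]]
area_weights = [0.7, 0.3]
oa, half_width = stratified_overall_accuracy(confusion, area_weights)
print(f"OA = {100 * oa:.2f} ± {100 * half_width:.2f}%")
```

Because each class is weighted by its mapped area, the estimate can differ from the simple proportion of correctly labelled samples whenever rare classes are oversampled.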

Table 6. Summary statistics for window size, degree of heterogeneity, sample size and accuracy. The heterogeneity indicator quantifies the number of different LULC classes within the 3 × 3, 5 × 5 and 7 × 7 windows (the central pixel and its neighbors in the windows). A value of 1 indicates a homogeneous patch interior. Values ≥ 2 indicate that the reference point is within one pixel (10 m) of a class boundary and is therefore classified as an edge or near-edge location.

Window Size	Heterogeneity	Sample Size	% of Total	Accuracy (%)	Median Patch Size (Pixel Count)
3 × 3	1	996	70.34	86.75	103,282
3 × 3	2	342	24.15	67.54	754
3 × 3	3	75	5.30	68.00	10
3 × 3	4	3	0.21	33.33	552
5 × 5	1	772	54.52	90.16	127,281
5 × 5	2	457	32.27	71.99	4,326
5 × 5	3	161	11.37	66.46	103
5 × 5	4	20	1.41	65.00	365
5 × 5	5	6	0.42	33.33	718
7 × 7	1	621	43.86	92.59	143,099
7 × 7	2	498	35.17	75.50	14,950
7 × 7	3	238	16.81	67.23	457
7 × 7	4	48	3.39	66.67	242
7 × 7	5	11	0.78	36.36	552
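The heterogeneity indicator defined in the caption of Table 6 (the number of distinct LULC classes inside a square window centred on a reference point) can be computed as sketched below. The function name, the toy raster, and the decision to clip windows at the raster border are assumptions made for illustration.

```python
import numpy as np

def heterogeneity(lulc, row, col, window=3):
    """Count distinct classes in a window x window neighbourhood centred on (row, col)."""
    half = window // 2
    # Clip the window at the raster border (an assumption of this sketch).
    r0, r1 = max(row - half, 0), min(row + half + 1, lulc.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, lulc.shape[1])
    return int(np.unique(lulc[r0:r1, c0:c1]).size)

# Toy 5 x 5 raster with three classes (codes are arbitrary).
lulc = np.array([
    [1, 1, 1, 1, 4],
    [1, 1, 1, 4, 4],
    [1, 1, 1, 4, 4],
    [1, 1, 2, 2, 4],
    [1, 1, 2, 2, 2],
])
interior = heterogeneity(lulc, 1, 1, window=3)  # homogeneous 3 x 3 neighbourhood
edge = heterogeneity(lulc, 2, 2, window=3)      # neighbourhood touching two other classes
wider = heterogeneity(lulc, 1, 1, window=5)     # the same point seen through a 5 x 5 window
print(interior, edge, wider)
```

The example also shows why the share of interior points shrinks as the window grows in Table 6: a point that is homogeneous at 3 × 3 can become heterogeneous at 5 × 5.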

Table 7. Moran’s I values and reference spatial dependence.

# of Neighbours	Moran’s I	p
10	0.037	0.001
20	0.018	0.019
30	0.012	0.053
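A global Moran's I with k-nearest-neighbour weights, of the kind summarized in Table 7, can be sketched as follows. This is an illustration under stated assumptions: the analysed variable is taken to be a per-point value such as a 0/1 correctness indicator, the weights are binary and not row-standardized (which yields the same statistic because every point has exactly k neighbours), and the p-values in Table 7 are not reproduced. The function name and toy data are hypothetical.

```python
import numpy as np

def morans_i_knn(coords, values, k):
    """Global Moran's I using binary k-nearest-neighbour spatial weights."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(values, dtype=float)
    n = x.size
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    # Pairwise Euclidean distances; a point is never its own neighbour.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1, kind="stable")[:, :k]
    w = np.zeros((n, n))
    w[np.arange(n)[:, None], nbrs] = 1.0
    z = x - x.mean()
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Toy transect of eight points: clustered versus alternating values.
coords = [[i, 0.0] for i in range(8)]
clustered = morans_i_knn(coords, [0, 0, 0, 0, 1, 1, 1, 1], k=1)
alternating = morans_i_knn(coords, [0, 1, 0, 1, 0, 1, 0, 1], k=1)
print(clustered, alternating)
```

Values near zero, as in Table 7, indicate weak spatial dependence; the clustered toy case gives a strongly positive value and the alternating case a strongly negative one.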

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

