Global 10 m Land Use Land Cover Datasets: A Comparison of Dynamic World, World Cover and Esri Land Cover

Abstract: The European Space Agency’s Sentinel satellites have laid the foundation for global land use land cover (LULC) mapping with unprecedented detail at 10 m resolution. We present a cross-comparison and accuracy assessment of Google’s Dynamic World (DW), ESA’s World Cover (WC) and Esri’s Land Cover (Esri) products for the first time in order to inform the adoption and application of these maps going forward. For the year 2020, the three global LULC maps show strong spatial correspondence (i.e., near-equal area estimates) for water, built area, trees and crop LULC classes. However, relative to one another, WC is biased towards over-estimating grass cover, Esri towards shrub and scrub cover and DW towards snow and ice. Using global ground truth data with a minimum mapping unit of 250 m², we found that Esri had the highest overall accuracy (75%) compared to DW (72%) and WC (65%). Across all global maps, water was the most accurately mapped class (92%), followed by built area (83%), tree cover (81%) and crops (78%), particularly in biomes characterized by temperate and boreal forests. The classes with the lowest accuracies, particularly in the tundra biome, included shrub and scrub (47%), grass (34%), bare ground (57%) and flooded vegetation (53%). When using European ground truth data from LUCAS (Land Use/Cover Area Frame Survey) with a minimum mapping unit of <100 m², we found that WC had the highest accuracy (71%) compared to DW (66%) and Esri (63%), highlighting the ability of WC to resolve landscape elements with more detail compared to DW and Esri. Although not analyzed in our study, we discuss the relative advantages of DW due to its frequent and near real-time data delivery of both categorical predictions and class probability scores.
We recommend that the use of global LULC products should involve critical evaluation of their suitability with respect to the application purpose, such as aggregate changes in ecosystem accounting versus site-specific change detection in monitoring, considering trade-offs between thematic resolution, global versus local accuracy, class-specific biases and whether change analysis is necessary. We also emphasize the importance of not estimating areas from pixel-counting alone but adopting best practices in design-based inference and area estimation that quantify uncertainty for a given study area.


Introduction
Global land use land cover (LULC) maps provide information necessary to quantify and understand Earth system processes and anthropogenic pressures, often at multiple spatial and temporal scales [1][2][3]. Earth observation and satellite remote sensing have enabled mapping LULC in a spatially explicit manner that ultimately informs policy and land management decisions aimed at achieving the global sustainable development goals [4]. Global LULC maps are adopted in a vast range of scientific domains and application environments. A few examples of LULC map application include: data input into mesoscale models for operational numerical weather forecasts and climate models for future climate projections [5,6]; outlining the extent of ecosystems and the change therein, e.g., as the basis for ecosystem service accounting [7,8]; isolating urban areas from their background to quantify local climate impacts [9]; informing spatial species distribution models that can predict and inform biodiversity conservation [10,11]; spatial conservation planning and environmental impact assessments [12]; monitoring deforestation and reporting to policy mechanisms, such as reduced emissions from deforestation and forest degradation (REDD+) [13]. One of the most impactful applications of global LULC maps going forward may be for ecosystem extent mapping following UN statistical standards for ecosystem accounting (EA) under the System of Environmental-Economic Accounting (SEEA) [14].
Over the past two decades, the spatial resolution of land cover maps has kept pace with the resolution of available satellite sensors, including the Moderate Resolution Imaging Spectroradiometer (MODIS; 250-500 m), PROBA-V (100 m) and Landsat (30 m) satellites. The most prominent corresponding global LULC maps include the National Aeronautics and Space Administration (NASA) MCD12Q1 500 m resolution dataset (2001-2018) [15], the European Space Agency (ESA) Copernicus Global Land Service (CGLS) Land Cover 100 m dataset (2015-2019) [16] and GlobLand30 (2010) [17]. While these products have been widely adopted, particularly at provincial to regional spatial scales, the medium spatial resolution prohibits the detection and monitoring of smaller landscape elements, which are vital to finer-scale Earth system processes and local land use planning. For instance, the monitoring and evaluation of agri-environmental schemes [18], such as installing hedgerows or semi-natural vegetation vital to pollinators, is not possible with the aforementioned LULC maps. Similarly, accounting for intra-urban blue-green space requires finer resolution data to distinguish street trees, green roofs and pocket parks from built surfaces [19].
Advancing upon the revolutionary legacy of the open-access Landsat missions [20], the European Space Agency (ESA) and Copernicus Programme have delivered globally consistent optical and radar data from the Sentinel satellites (10-20 m resolution) since 2014. Together with the advances in machine learning algorithms and cloud computing platforms for Earth observation, such as Google Earth Engine (GEE) [21] and openEO [22], the Sentinel satellites have enabled large-scale mapping of LULC at a 10 m resolution [2]. Since 2021, three global Sentinel-based 10 m LULC maps have been released: Google's Dynamic World (DW) [23], ESA's World Cover 2020 (WC) [24] and Esri's 2020 Land Cover (Esri) [25]. All three products have the vision of being multi-temporal, with WC and Esri being annually updated, but only DW is operationally delivering near real-time LULC maps as new Sentinel-2 scenes become available (every 5 days). Esri and DW were both developed from deep learning models trained on the same reference dataset of over 5 billion hand-annotated Sentinel-2 pixel patches from 24,000 individual image tiles (510 × 510 pixels each) distributed over the world [23]. In contrast, WC was produced with a random forest classification algorithm trained on hand-labeled pixels in 100 × 100 m grids at 141,000 unique locations distributed over the world [24]. WC also included both Sentinel-1 and Sentinel-2 data as predictors in its model. Furthermore, a noteworthy difference is that the DW and Esri reference dataset was digitized with a minimum mapping unit (MMU) of 250 m², while the WC reference dataset was digitized with an MMU of 100 m².
To date, there have been no systematic evaluations of the three global 10 m LULC products with reference to one another. Given the importance of global LULC maps for various applications, and the large differences in the production of the recent 10 m products, we aimed to compare DW, WC and Esri global LULC maps in terms of their spatial correspondence with one another and their global and regional accuracy. To quantify accuracy at the global scale, we used the hand-annotated validation dataset published alongside DW. We supplemented this with a regional reference dataset of in situ point-based survey data on LULC across the European Union. We discuss how spatial correspondence and accuracy vary across LULC classes, biomes and human settlement types and explore key limitations and advantages of the three datasets.

Land Cover Data Processing
Data pre-processing and extraction took place in GEE [21] and fed into our complete workflow, as outlined in Figure 1. Data analysis and visualization were performed in R [26]. The WC and Esri global land cover datasets for 2020 are available in the GEE official and community data catalogs. However, DW is provided as a collection of classified Sentinel-2 images with less than 35% cloud cover, as defined by the 'CLOUDY_PIXEL_PERCENTAGE' scene metadata property. Each image has a 'label' band with a discrete classification of LULC, but also 9 probability bands with class-specific probability scores generated by the deep learning model on the basis of the pixel's spatial context. To generate an annual LULC composite comparable with WC and Esri, we calculated the mode of the predicted LULC class in the 'label' band of all DW images for 2020. We also tested annual compositing by calculating the mean and median probability scores for all LULC classes during the year and then classifying by taking the class with the highest probability score per pixel. We found no difference in overall accuracy using the alternative methods (Figure S1), and, because the mode composite on the 'label' band was more computationally efficient, we decided to use that global composite for further analysis. The land cover typologies were identical for DW and Esri; however, we converted the WC typology to match DW and Esri by aggregating four LULC classes, as outlined in Table 1. The three global 10 m LULC maps (Figure 2) were used to assess spatial correspondence and accuracy, as outlined below.
Snow and ice: This class includes any geographic area persistently covered by snow or glaciers.
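The two annual compositing strategies described in this section (mode of the 'label' band versus the class with the highest mean probability) can be illustrated for a single pixel's time series. The following is a minimal sketch in plain Python rather than the GEE workflow used in the study; the function names, the use of integer class indices and the convention that `None` marks a masked observation are assumptions made for illustration.

```python
from collections import Counter


def mode_label(labels):
    """Return the most frequent class label in one pixel's time series.

    `labels` holds one integer class index per image date; `None` marks a
    masked (e.g., cloudy) observation. Ties are not handled specially.
    """
    valid = [lab for lab in labels if lab is not None]
    if not valid:
        return None  # no usable observation during the year
    return Counter(valid).most_common(1)[0][0]


def mean_probability_label(prob_series):
    """Return the class index with the highest mean probability.

    `prob_series` is a list of per-date probability vectors, each holding
    one score per class for the same pixel.
    """
    if not prob_series:
        return None
    n_classes = len(prob_series[0])
    means = [
        sum(p[k] for p in prob_series) / len(prob_series)
        for k in range(n_classes)
    ]
    return max(range(n_classes), key=means.__getitem__)
```

Applied to every pixel, the first function corresponds to a mode composite and the second to a mean-probability composite; a median variant would replace the mean with `statistics.median`.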

Spatial Correspondence Assessment
To assess how strongly the global LULC products corresponded to one another over space, we quantified and compared class-wise LULC area sums that were aggregated to an equal-area hexagonal grid (70,000 km²), which covered the globe. The size of the grid was chosen based on a trade-off between computation time and precision of area aggregation. Pixels were aggregated over mutually overlapping extents and image masks across the three LULC products. To quantify correspondence between products, for each LULC class and unique hexagonal grid cell, we calculated the proportional share of each product's area estimation. Perfect correspondence resulted in a proportional ratio of 0.33:0.33:0.33 or 33% proportional share for each product. We defined strong correspondence as the case where no single product's proportional share of the LULC class area exceeded 40% in a grid cell. Weak correspondence was accordingly defined as a difference of more than 20% between the products with the largest and smallest proportional shares. We visualized these proportional shares for each hexagon over the globe by assigning a color code along a tri-color gradient using the tricolore (v1.2.2) package in R. The relative abundance of the given LULC class (average of the three products' area estimates) was mapped to the opacity of each hexagon grid so that areas where the LULC is abundant appear opaque and those where the LULC is less abundant are transparent.
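The proportional-share calculation for one LULC class in one hexagonal grid cell can be sketched as follows. This is an illustrative sketch only; the function names and the handling of cells where the class is absent from all three products (returning `None`) are assumptions, not part of the published workflow.

```python
def proportional_shares(dw_area, wc_area, esri_area):
    """Share of the summed class area contributed by each product.

    Inputs are the area of one LULC class in one grid cell as estimated
    by DW, WC and Esri (any consistent unit). Returns a (DW, WC, Esri)
    tuple that sums to 1, or None if the class is absent in all products.
    """
    total = dw_area + wc_area + esri_area
    if total == 0:
        return None
    return (dw_area / total, wc_area / total, esri_area / total)


def is_strong_correspondence(shares, max_share=0.40):
    """True when no single product's share exceeds the 40% threshold."""
    return shares is not None and max(shares) <= max_share
```

For example, area estimates of 10, 10 and 10 km² give equal shares of one third each (strong correspondence), whereas 50, 30 and 20 km² give a maximum share of 50% and fail the 40% criterion.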

Accuracy Assessment
To quantify the accuracy of the three LULC products, we used two sources of open-access reference data (Figure 3). The first source was the ground truth validation dataset produced by the Dynamic World team, which included a group of annotators (manual labeling of LULC types using visual interpretation of high-resolution reference imagery) supported by the National Geographic Society in partnership with Google and the World Resources Institute [23]. This team consisted of 25 expert and 45 non-expert annotators who together annotated approximately 24,000 individual image tiles of 510 × 510 pixels from Sentinel-2 imagery from random dates in 2019. Annotators followed the typology definitions outlined in the first column of Table 1 and were instructed to consider an MMU of 250 m², which, by definition, included mosaics of distinct landscape elements within; for example, buildings, trees and grass within a 250 m² area might be labeled as "built area". From the annotated dataset, a stratified random subsample of 409 Sentinel-2 tiles was withheld from the training of the DW deep learning model and used for validation. The validation tiles included expert consensus labels where all three experts agreed, where two experts agreed and the third had no opinion or where one expert had an opinion and the other two did not. We used this dataset constituting 72 million distinct 10 × 10 m pixels for global accuracy assessment of the three LULC products (Figure 3A). WC used a completely different reference dataset for training and validation of their model, which is not open-access, and, thus, we could not use it in the present analysis. Esri used the same reference dataset as DW to train and validate their LULC model; however, the subset of tiles they used for validating their map was not open-access, and, therefore, we cannot be sure that the 409 validation tiles we use here were in fact independent from the dataset used to train the Esri LULC model.
The second source of reference data was a regional dataset from the Land Use/Cover Area Frame Survey (LUCAS) over the European Union (Figure 3B). LUCAS is a systematic grid of 337,845 points that are visited triennially for the collection of in situ land cover and land use data [27]. In contrast to the DW ground truth dataset described above, LUCAS surveyors are instructed to record the land cover within a 1.5 m circle at each point in the grid, and, therefore, when applied to Earth observation, it consists of a significantly smaller MMU. We used the 2018 LUCAS dataset with the first-level classification with the exception of removing "G50: glaciers and permanent snow" from the water category into its own category to match the "snow and ice" category in the global LULC maps (Table 2).

Grass
Grassland (E00): Land predominantly covered by communities of grassland, grass-like plants and forbs. This class includes permanent grassland and permanent pasture that is not part of a crop rotation (normally for 5 years or more). It may include sparsely occurring trees within a limit of a canopy below 10% and shrubs within a total limit of cover (including trees) of 20%. May include: dry grasslands, dry edaphic meadows, steppes with gramineae and artemisia, plain and mountainous grassland, wet grasslands, alpine and subalpine grasslands, saline grasslands, arctic meadows, set aside land within agricultural areas (including unused land where revegetation is occurring) and clear cuts within previously existing forests. Excludes spontaneously re-vegetated surfaces consisting of agricultural land that has not been cultivated this year or the years before, clear-cut forest areas, industrial "brownfields" and storage land.

Shrub and scrub
Shrubland (D00): Areas dominated (at least 10% of the surface) by shrubs and low woody plants normally not able to reach >5 m of height. It may include sparsely occurring trees with a canopy below 10%. Excludes berries, vineyards and orchards.

Trees
Woodland (C00): Areas with a tree canopy cover of at least 10%, including woody hedges and palm trees. Includes a range of coniferous and deciduous forest types. Excludes forest tree nurseries, young plantations or natural stands (<10% canopy cover) dominated by shrubs or grass.

Flooded vegetation
Wetlands (H00): Wetlands located inland and having fresh water and wetlands located on marine coasts or having salty or brackish water as well as areas of a marine origin.

Water
Water areas (G10 to G40): Inland or coastal areas without vegetation and covered by water and flooded surfaces, or likely to be so over a large part of the year.

Snow and ice
Glaciers, permanent snow (G50): Areas covered by glaciers (generally measured at the time of their greatest expansion in the season) or permanent snow.
The 2020 LULC predictions for each global product were sampled over the annotated image pixels (global validation set) and survey locations (regional validation set). Accuracy was quantified globally/regionally, but also stratified by biome, settlement type (urban, rural and uninhabited) and continent. Biomes were defined using the RESOLVE bioregions dataset [28], while human settlement type was derived from the Global Human Settlement Layers, Settlement Grid [29]. We constructed confusion matrices for each LULC product to calculate class-specific user's (precision) and producer's (recall) accuracies and overall accuracies.
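The accuracy metrics named above follow directly from a confusion matrix. The sketch below shows one way to compute them in plain Python from paired reference and map labels; it is not the R code used in the study, and the function names, the integer class encoding and the convention of rows as reference classes are assumptions for illustration. Sample units are counted equally, with no area weighting.

```python
def confusion_matrix(reference, predicted, n_classes):
    """Count sample units by (reference class, map class).

    Rows index the reference (ground truth) class and columns index the
    class predicted by the LULC map.
    """
    cm = [[0] * n_classes for _ in range(n_classes)]
    for ref, pred in zip(reference, predicted):
        cm[ref][pred] += 1
    return cm


def accuracy_metrics(cm):
    """Return (overall, user's, producer's) accuracies from a matrix.

    User's accuracy (precision) divides correct counts by column totals;
    producer's accuracy (recall) divides them by row totals. Classes with
    no samples in the relevant total get None instead of a ratio.
    """
    n = len(cm)
    correct = [cm[k][k] for k in range(n)]
    ref_totals = [sum(cm[k]) for k in range(n)]
    map_totals = [sum(cm[r][k] for r in range(n)) for k in range(n)]
    total = sum(ref_totals)
    users = [c / t if t else None for c, t in zip(correct, map_totals)]
    producers = [c / t if t else None for c, t in zip(correct, ref_totals)]
    return sum(correct) / total, users, producers
```

Stratified results (by biome, settlement type or continent) would be obtained by building one such matrix per stratum.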

Spatial Correspondence
The area estimates for built area, crops, trees and water showed strong correspondence between the three global LULC products, particularly in areas with the greatest relative abundance of the given LULC class (i.e., gray areas with bordered grid cells in Figure 4).However, for some regions, there were discrepancies between products; for example, DW over-estimated the crop cover in the western USA, Kazakhstan and Mongolia relative to the other LULC products.
The LULC classes with the lowest correspondence between global products were bare ground, grass, scrub and shrub (Figure 4). DW estimated higher proportions of bare ground in mid- to lower latitudes, whereas WC estimated more bare ground in higher latitudes (Figure 4C). WC consistently estimated greater grass cover than DW and Esri across most of the world, except for over the taiga in Russia (Figure 4D). Conversely, the Esri product estimated greater shrub and scrub cover across the world except for a small section in Canada and the savanna-forest ecotone in central Africa (Figure 4E).
The flooded vegetation and snow and ice classes exhibited notable disagreements between products for the northern latitudes (Figure 4G,I). Esri estimated the highest proportions of flooded vegetation over North America, whereas WC estimated the highest proportions over Russia. DW estimated significantly more snow and ice cover than both WC and Esri over the whole of the Northern Hemisphere, apart from areas that are permanently snow-covered (e.g., Greenland) or snow-free (e.g., Sahara desert) (Figure 4I).

Accuracy
Using the global ground truth dataset with a minimum mapping unit of 250 m², we found that Esri had the highest overall accuracy (75%) compared to DW (72%) and WC (65%; Table 3; Figure 5). Across all the LULC products, water was consistently the most accurately mapped class (balanced accuracy, i.e., the mean of precision and recall, of 92%; Figure 5), followed by built area (83%), trees (81%) and crops (78%). In contrast, bare ground (57%), grass (34%), shrub and scrub (47%) and flooded vegetation (53%) were mapped with the lowest accuracies (Figure 5). The accuracies across all the LULC products were generally lowest in the tundra biome, where grass, bare ground, shrub and scrub and crops were mis-classified and had the lowest recall and precision accuracies. The accuracies were the highest in temperate and boreal forests, where crops, trees and built area had the highest accuracies. There was very little difference in overall accuracy between urban, rural and uninhabited areas, with the exception that trees had lower precision in urban areas compared to rural and uninhabited areas, particularly for WC (40% lower) and DW (20% lower). Differences in accuracy between the continents were small; however, when averaged across LULC products, the accuracies were highest in North America and lowest in Africa (Figure 6).

| Accuracy | Dynamic World | Esri | World Cover |
| --- | --- | --- | --- |
| Global validation | 72% | 75% | 65% |
| Regional validation (European) | 66% | 63% | 71% |

Using the regional ground truth dataset (LUCAS) across Europe with an MMU of <100 m², we found that the order of product accuracies was reversed compared to the global validation (Table 3; Figure 7). WC exhibited the highest accuracy (71%) compared to DW (66%) and Esri (63%). WC was notably more accurate than DW and Esri in temperate and boreal forests and savanna biomes. Similar to the result from the global validation, the differences in accuracy between human settlement types were minimal. Furthermore, the relative differences in accuracy between LULC products were consistent across human settlement types. It should be noted that, for the European validation data, there were very few data points for the bare ground and snow and ice LULC classes, which may bias accuracy estimates significantly.

Explaining the Differences between Global LULC Products
The production of global LULC maps is inherently difficult due to the extensive biogeographical variations within and across biomes that lead to diverse spectral signatures within a single LULC class [30]. In this sense, it is not surprising that three global LULC maps produced by three independent groups have large differences in accuracy and spatial correspondence. Below, we attempt to explain some of the main differences as presented in our results.

Minimum Mapping Unit
The global LULC maps had contrasting accuracies when validated against the global versus regional reference datasets (Figure 5 versus Figure 7). While Esri and DW were most accurate at the global scale, WC was most accurate at the European scale. Apart from the spatial extent, the most important difference between the reference datasets used to validate the LULC maps is the MMU used to collect ground truth information. The global validation dataset was digitized ex situ (i.e., based on visual interpretation of satellite imagery) using an MMU of 250 m² (i.e., 50 × 50 m square), whereas the regional validation dataset (LUCAS) was collected in situ for point circles with a radius of 1.5 m (MMU <100 m²). We suspect that the main reason Esri and DW had higher accuracies than WC at the global scale was that Esri and DW were produced from models trained on reference data with an MMU of 250 m², while WC was trained on data with an MMU of 100 m². We also posit that this difference in MMU is partially responsible for the difference in LULC classification granularity at the landscape scale. For instance, the inset maps in Figure 2 reveal how Esri and DW predictions are more clustered and generalized compared to WC, which exhibits more of the 'salt and pepper' characteristic of pixel-based classification techniques. Urban gardens and trees are incorporated into the "built area" cluster of pixels in DW and Esri maps, while they are labeled as "grass" or "trees" in WC. This illustrates why WC is 10% more accurate than Esri and DW when classifying built areas across Europe using the point-based LUCAS dataset as a reference (Figure 7).

Classification Typology
Another factor leading to differences between global LULC products is the classification typology used. For the purposes of cross-comparison, we harmonized all three LULC products to a nine-class typology (Table 1). Even though this is a much simpler typology than other regional LULC datasets (e.g., CORINE land cover [31]), there remain significant challenges in distinguishing spectrally similar classes, such as bare ground, grass, shrub and scrub, a finding often echoed in the literature [16,32,33]. These classes are not only difficult for satellite-based machine learning models to distinguish but also for human annotators using aerial or satellite imagery for visual interpretation. For instance, there was large disagreement between expert and non-expert labelers involved in developing the training dataset for DW and Esri [23]. A pixel-based comparison of expert and non-expert annotations revealed a recall rate (producer's accuracy) as low as 22% for grass and 31% for bare ground. Although WC did not publish a similar uncertainty assessment of its annotation team, it is reasonable to assume their reference dataset suffered from the same error. It is known that, even with in situ LULC labeling (i.e., field-based ground truth), similar errors due to sampler bias may exist. For instance, the European Environment Agency discovered that CORINE-2000 accuracy was boosted by 6.4% following post-screening and cleaning of erroneous LULC labels in the LUCAS dataset [34]. This partly explains why the spatial correspondence between global LULC products and class-level accuracy was generally poorest for bare ground, grass and shrub and scrub cover (Figures 4 and 5).
Another source of discrepancy between LULC products is the slight difference in LULC definitions for certain classes (see descriptions in Table 1). For instance, the flooded vegetation class in DW and Esri includes rice paddies and irrigated/inundated agriculture. In WC, this type of cropland is included in the cropland class. Furthermore, the DW reference dataset defines the shrub and scrub class relatively broadly as clusters of plants that are dispersed over an area without any specification of cover percentage ("...moderate to sparse cover of bushes, shrubs, and tufts of grass") [23]. This is different to the WC typology, where grass, bare ground and shrub/scrub are defined using specific cover percentages ("...shrubs having a cover of 10% or more. Shrubs are defined as woody perennial plants with persistent and woody stems and without any defined main stem being less than 5 m tall") [24].

Modeling and Validation Methods
At the global scale, the accuracies of DW, WC and Esri were different to one another (Figure 5) and also different to the independent accuracies reported by the data providers themselves: DW 72% versus reported 74%; WC 65% versus reported 74%; Esri 75% versus reported 85%. The discrepancy for DW is likely due to the fact that we aggregated LULC predictions to an annual composite for 2020 (using the mode of the 'label' band), whereas the DW validation report used scene-level model predictions for Sentinel-2 tiles in 2019. The WC dataset was produced and validated with a completely different reference dataset with a different MMU to the DW validation tiles used here. The metadata published with the Esri dataset is incomplete, and, thus, we do not know which reference data were withheld from model training and whether the 409 validation tiles used here were indeed independent from the Esri training dataset. If the validation data used here were part of the data used to train the Esri deep learning model, then it is possible that our estimates of the Esri map accuracy are overly optimistic.
The DW and Esri maps were produced using a deep learning model, whereas WC was produced using a random forest classification. This difference in modeling framework likely explains some of the difference in accuracy between LULC products. Deep learning models, such as the fully convolutional neural network employed in DW, take the pixel context into account when making inferences, whereas random forest does not. This, together with the difference in MMU, explains why the DW and Esri maps are clustered and generalized at a landscape scale (inset maps Figure 2).
Finally, we acknowledge that we are using reference data from 2019 (global) and 2018 (European) that do not align with the year of the LULC maps (2020) and that this may lead to discrepancies between the validation set and the reality on the ground in 2020. However, we suspect these changes to be minimal, and any bias introduced should be consistent across global LULC products. Furthermore, even the WC reference dataset, which is not open-access, faces the same limitation as it was collected for the year 2015 and used to create and validate the WC map for the year 2020.

Recommendations for Users
The cross-comparison results presented here indicate that there is no "one-size-fits-all" when it comes to global LULC maps and their potential application. We find differences in accuracy across spatial scales (global versus regional), LULC classes, continents, biomes and urban settlement types. Therefore, an overall recommendation is to carefully evaluate the global LULC products with respect to the aforementioned factors and how they relate to the application requirements. We note that it is also possible to use all three LULC products in combination by creating some form of majority vote or weighted average. Nevertheless, we make some general recommendations for users of either DW, Esri or WC or a combination thereof:

• Regardless of LULC product, users should implement design-based inference when calculating LULC areas or changes and avoid drawing conclusions from simple 'pixel-counting', which leads to biased area estimates [35]. Design-based inference involves generating a post-classification reference (validation or "ground truth") sample that is implemented with a probability sampling design (e.g., simple random or stratified random), which can be used to quantify unbiased area and accuracy estimates.

• WC is most appropriate when considering an MMU of <100 m² or when a user wants to resolve smaller landscape elements. For example, WC is advantageous in urban areas and complex agricultural landscapes with small or thin vegetation structures, such as trees or hedgerows.

• Based on our supplementary analysis of the DW compositing method (Figure S1), we find that the type of temporal aggregation of DW predictions has very little effect on global and regional accuracy. However, we note that changing the seasonal extent of temporal aggregation (e.g., growing-season composite) may have significant effects on accuracy (although we did not test this here).

• The delivery format of DW in near real-time, covering the entire Sentinel-2 image collection and including LULC class-specific probability scores, is qualitatively distinct from Esri and WC, which only produce annual LULC maps without probability scores. We encourage users to take advantage of this unique aspect of DW by exploring novel possibilities discussed in the section below.
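To make the design-based recommendation concrete, the sketch below implements a commonly used stratified estimator in which the map classes serve as strata and a simple random reference sample is drawn within each stratum. This is an illustration under those assumptions and was not applied in this study; the function name and input layout are hypothetical, and each stratum is assumed to contain at least two sample units.

```python
import math


def stratified_area_estimates(sample_counts, mapped_area):
    """Estimate class areas and standard errors from a stratified sample.

    sample_counts[i][j] is the number of reference sample units drawn
    from map class i (the stratum) whose reference label is class j.
    mapped_area[i] is the pixel-counted area mapped as class i.
    Returns (estimated areas, standard errors), one entry per class j.
    """
    total_area = sum(mapped_area)
    n_classes = len(sample_counts[0])
    areas, std_errors = [], []
    for j in range(n_classes):
        proportion = 0.0
        variance = 0.0
        for counts, area in zip(sample_counts, mapped_area):
            weight = area / total_area   # stratum weight W_i
            n_i = sum(counts)            # sample size in stratum i
            p_ij = counts[j] / n_i       # share of stratum i that is class j
            proportion += weight * p_ij
            variance += weight ** 2 * p_ij * (1 - p_ij) / (n_i - 1)
        areas.append(total_area * proportion)
        std_errors.append(total_area * math.sqrt(variance))
    return areas, std_errors
```

With a two-class map covering 600 and 400 area units and reference counts of [[90, 10], [20, 80]], the estimated areas are 620 and 380 units rather than the pixel-counted 600 and 400. An approximate 95% confidence interval is the estimate plus or minus 1.96 times the standard error.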

Potential for Future Research
Although our analysis provides important information about global model accuracies, it remains to be seen which LULC product provides area change estimates with the least amount of error as measured using design-based inference for area estimation [35,36]. Quantifying confidence intervals around area estimates or area change estimates is necessary for the adoption of LULC products in policy mechanisms, such as the REDD+ [13] reporting on deforestation or the SEEA ecosystem accounting [14]. It is, therefore, of interest to test DW, Esri and WC (as well as other multi-temporal medium-resolution maps, e.g., Friedl et al., 2022 [37]) in terms of how accurately they estimate changes in LULC at various spatial and temporal scales. However, design-based inference can be time-consuming and costly. A requirement for design-based inference is an explicitly specified population, and, in the context of evaluating map accuracy, this population often refers to a population of pixels included in a map and given different class labels. This population varies spatially and temporally depending on the project scope, and, therefore, a probability-based sample needs to be generated for making an inference each time the project scope changes. The probability scores provided by DW may allow for more efficient alternatives to design-based inference. Sales et al., 2021 [38] have shown that averaging class probability scores from a random forest model can give area estimates that are substantially less biased than 'pixel-counting' and almost as precise as design-based methods. The same might be true of DW class probability scores. Furthermore, there is also scope to use DW probability scores to estimate per-pixel classification accuracy [39,40].
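The difference between 'pixel-counting' and an area estimate based on class probability scores can be illustrated with a small sketch. It is not the procedure of the cited study, and whether it transfers to DW is untested; the function names, the 100 m² pixel area (a 10 × 10 m pixel) and the assumption that each pixel's probabilities sum to 1 are illustrative choices.

```python
def probability_weighted_areas(pixel_probs, pixel_area_m2=100.0):
    """Sum each class's probability over all pixels and scale by area.

    pixel_probs is a list of per-pixel probability vectors (one score
    per class). Each pixel contributes fractionally to every class.
    """
    n_classes = len(pixel_probs[0])
    return [
        pixel_area_m2 * sum(p[k] for p in pixel_probs)
        for k in range(n_classes)
    ]


def pixel_count_areas(pixel_probs, pixel_area_m2=100.0):
    """Assign each pixel wholly to its most probable class, then count."""
    n_classes = len(pixel_probs[0])
    counts = [0] * n_classes
    for p in pixel_probs:
        counts[max(range(n_classes), key=p.__getitem__)] += 1
    return [pixel_area_m2 * c for c in counts]
```

For two pixels that each have probabilities of 0.6 and 0.4 for two classes, pixel-counting assigns 200 m² to the first class and none to the second, whereas the probability-weighted estimate is 120 m² and 80 m².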
The probability scores available in the DW product provide scope for several other avenues of further research and tailored LULC classification. Firstly, they can be aggregated over time frames relevant to the application task. For example, annual composites might be appropriate for some tasks, while summer or winter mosaics might be appropriate for others. Secondly, users can apply custom thresholds or more complex decision frameworks to the predicted probabilities in order to derive continuous or discrete LULC outputs (Figure 8C). For example, this type of thresholding can be used to generate surface water extent data much more frequently than the global surface water (GSW) dataset and without its update lag (currently up to 18 months). Thirdly, users can train their own local machine learning models using DW probability scores and custom reference data as the input (Figure 8D). This may allow for changing the LULC typology or resolving finer-scale landscape elements that are not present in the default DW 'label' band. Users that explore custom models, such as DNN architectures, may also try different loss functions that result in predictions that are less clustered and able to resolve finer details. For example, Lang et al., 2022 [41] use GEDI reference data at 25 m but produce a tree canopy height model at a 10 m resolution. Fourthly, the frequency and near real-time availability of the class probability scores allow for the application of advanced time series analysis, such as LandTrendr (Landsat-based detection of trends in disturbance and recovery) and CCDC (continuous change detection and classification) [42].
There are several spheres of application for which 10 m LULC maps are particularly well-suited. Urban ecology and climate science form one such area because landscape composition and structure often manifest at scales of 10 m or less. On an increasingly urbanized planet, it is important to accurately monitor global and regional urban extent for planning purposes. How these newer-generation datasets compare to existing estimates across regions is still largely unknown [43]. There is also potential for using DW to quantify seasonal land cover dynamics (e.g., including urban vegetation phenology [44]) within cities, which is not possible with static annual LULC maps. The spatial compositions of LULC and LULC changes are important for modeling landscape connectivity and biodiversity community dynamics [45]. Urban ecology may, therefore, benefit from comparing the landscape metrics that result from global 10 m LULC maps.
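The temporal aggregation and thresholding of DW probability scores described above can be sketched in a few lines of Python. This is a minimal illustration on a synthetic array, not code against the DW product itself: the array layout (time, height, width), the function names, the 0.5 threshold and the toy water probabilities are assumptions chosen for the example.

```python
import numpy as np


def composite_probability(prob_stack, months, keep_months):
    """Mean class probability over the observations acquired in keep_months.

    prob_stack: array of shape (time, height, width) for a single class,
                with NaN where an observation is missing (e.g., cloud-masked).
    months:     array of shape (time,) with the acquisition month (1-12).
    """
    selected = np.isin(months, list(keep_months))
    if not selected.any():
        raise ValueError("no observations fall within keep_months")
    return np.nanmean(prob_stack[selected], axis=0)


def threshold_mask(prob, threshold=0.5):
    """Discrete class mask from a continuous probability surface."""
    return prob >= threshold


# Hypothetical water probabilities for a 1 x 2 pixel scene on four dates.
months = np.array([1, 6, 7, 12])
water_prob = np.array([
    [[0.9, 0.1]],
    [[0.2, 0.1]],
    [[0.4, 0.1]],
    [[0.8, 0.1]],
])

annual = composite_probability(water_prob, months, range(1, 13))
summer = composite_probability(water_prob, months, (6, 7, 8))

annual_water = threshold_mask(annual)   # first pixel counts as water over the year
summer_water = threshold_mask(summer)   # but not in the summer composite
```

The same pixel can therefore fall on different sides of a threshold depending on the compositing window, which is why the window and the threshold should be chosen with the application in mind.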
Ecosystem accounting is another application where multi-temporal 10 m LULC maps may prove particularly useful. Many have highlighted the need for remote-sensing-based approaches to implement ecosystem extent-condition accounting in ecosystem services mapping for ecosystem accounts [46]. A challenge for ecosystem accounting has been that biophysical mapping of ecosystem services at a national level has relied on the discrete land cover categories typical of traditional LULC maps. Therefore, ecosystem service models have not been sensitive to the transitional or successional changes from one LULC type to another that are typical of ecological gradients or continuums [47]. It is possible that DW probability scores can detect continuous gradients in ecosystem accounting and that LULC probability can be thought of as a higher-level structural composition indicator within the ecosystem condition typology. Area-weighted probabilities have been computed for biodiversity indices but typically at much coarser spatial levels [48]. It is possible that LULC class probabilities could be applied to some standard ecosystem services (required by EUROSTAT) in decreasing order of accuracy, including (i) biomass, carbon storage and sequestration, air filtration and run-off control; (ii) somewhat less for local climate regulation and landscape aesthetics/greenview exposure and (iii) even less for crop pollination and recreation potential. Therefore, class probability scores could allow a merging of the extent-condition account for the purpose of more accurate ecosystem service computation (i.e., reducing the information loss in using threshold levels for classifying LULC types for the extent account on its own, moving from Figure 8D to 7C).

Conclusions
LULC mapping at global extents has been revolutionized by the plethora of medium-resolution satellite data available from programs such as Landsat and Sentinel. In our cross-comparison of global 10 m resolution LULC maps, we found large inaccuracies and spatial and thematic biases in each product that vary across biomes, continents and human settlement types. Our overarching recommendation is to critically evaluate each LULC product with reference to the application purpose. We highlight the novelty of DW as a global near real-time LULC product with class probability scores. LULC types, regardless of definition and type system, share with ecosystems the property that their composition, structure and processes often vary in a gradual, continuous manner over space and time. We suggest that the DW probability scores offer a fundamental shift in land cover mapping from categorical to continuum concepts.

Figure 1. Flowchart of the methods followed in the cross-comparison and accuracy assessment of global 10 m LULC maps.

Figure 2. Global 10 m resolution land cover maps, including Dynamic World (A), World Cover (B) and Esri Land Cover (C). Inset maps show zoomed extents of landscapes in South Africa and Brazil, indicated with black dots on the world maps, to illustrate the spatial grain of the maps at a local scale. White areas in the Arctic and Antarctic in (A) and (B), although partly mapped in (C), were not included in the analysis.

Figure 3. Distribution of the global (A) and regional (B) reference data locations. Locations in (A) represent Sentinel-2 image tiles of 510 × 510 pixels, which were manually annotated ex situ. Locations in (B) are grid points sampled in situ in the LUCAS area survey. Inset bar plot shows the number of reference samples per LULC class.

Figure 4. Spatial correspondence between global 10 m land cover maps for each of 9 land cover classes. The proportion of land cover within each hexagonal grid cell is calculated for each LULC product and then visualized along a tri-color gradient illustrating the proportional share of areas estimated by each LULC product. Gray areas indicate strong correspondence, whereas colored areas reflect dominance of one LULC product. Grid cells outlined in black indicate strong correspondence, defined as cases where no single LULC product has more than a 40% share of the combined area within the grid cell. The opacity of the grid cells indicates the absolute percentage abundance of each LULC class averaged over the three LULC products. Opaque areas have near-maximum percentage cover for that LULC class, whereas transparent areas have very low percentage cover.
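The share and correspondence rule described in the Figure 4 caption can be written out as a short Python sketch. This is an illustrative reading of the caption and not the code used to produce the figure: the function names and the example proportions are hypothetical, and only the 40% threshold comes from the caption.

```python
import numpy as np


def product_shares(proportions):
    """Share of the combined class area attributed to each product in one grid cell.

    proportions: class cover proportion per product, e.g., (DW, WC, Esri).
    Returns NaN shares when the class is absent from all products.
    """
    proportions = np.asarray(proportions, dtype=float)
    total = proportions.sum()
    if total == 0:
        return np.full(proportions.shape, np.nan)
    return proportions / total


def strong_correspondence(proportions, max_share=0.40):
    """True when no single product exceeds max_share of the combined area."""
    shares = product_shares(proportions)
    if np.isnan(shares).any():
        return False
    return bool(shares.max() <= max_share)


def cell_opacity(proportions):
    """Mean class cover across products, used here as a stand-in for opacity."""
    return float(np.mean(proportions))


# Hypothetical tree-cover proportions for (DW, WC, Esri) in two grid cells.
agreeing_cell = [0.30, 0.32, 0.28]
diverging_cell = [0.10, 0.50, 0.12]

agree = strong_correspondence(agreeing_cell)       # shares are close to one third each
diverge = strong_correspondence(diverging_cell)    # one product dominates the cell
```

Because the three shares sum to one, the largest share can never fall below one third, so the 40% rule only flags cells where the three products report nearly equal areas.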

Figure 5. Accuracy of global 10 m land cover maps across LULC classes, biomes and human settlement type based on hand-annotated image tiles with a minimum mapping unit of 50 × 50 m. Accuracy is expressed as precision/user's, recall/producer's and overall accuracy based on a confusion matrix, with sample sizes indicated in millions (MM) in parentheses on the y-axis.

Figure 6. Accuracy of global 10 m land cover maps across LULC classes and continents based on hand-annotated image tiles with a minimum mapping unit (MMU) of 50 × 50 m. Circles show user's, producer's and overall accuracy. Accuracy is expressed as precision/user's, recall/producer's and overall accuracy based on a confusion matrix, with sample sizes indicated in millions (MM) in parentheses on the y-axis.

Figure 7. Accuracy of global 10 m land cover maps over Europe using point-based ground truth data from the LUCAS survey. Accuracy is stratified by LULC class, ecoregion and human settlement type. Accuracy is expressed as precision/user's, recall/producer's and overall accuracy based on a confusion matrix, with sample sizes indicated in thousands (K) in parentheses on the y-axis.
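For readers less familiar with the accuracy measures reported in Figures 5-7, the following minimal Python sketch derives user's accuracy (precision), producer's accuracy (recall) and overall accuracy from a confusion matrix. The function name, the matrix orientation (rows as map classes, columns as reference classes) and the two-class counts are assumptions made for illustration and are not results from this study.

```python
import numpy as np


def accuracy_metrics(confusion):
    """User's, producer's and overall accuracy from a confusion matrix.

    confusion[i, j]: number of samples mapped as class i whose reference class is j.
    Assumes every class has at least one mapped and one reference sample.
    """
    confusion = np.asarray(confusion, dtype=float)
    correct = np.diag(confusion)
    users = correct / confusion.sum(axis=1)       # precision: correct / all mapped as class
    producers = correct / confusion.sum(axis=0)   # recall: correct / all reference samples of class
    overall = correct.sum() / confusion.sum()     # share of all samples labelled correctly
    return users, producers, overall


# Hypothetical two-class example (e.g., trees versus other).
users, producers, overall = accuracy_metrics([[80, 20],
                                              [10, 90]])
```

User's accuracy answers how often a mapped label is correct, whereas producer's accuracy answers how often a reference class is captured by the map, so the two can diverge strongly for rare or over-predicted classes.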

Figure 8. Example of the flexibility inherent in the Dynamic World data format, which includes multi-temporal class probability estimates. An urban landscape in Prague, Czechia (A) along with an annual mode composite from Dynamic World (B). Dynamic World class probabilities for the tree class are rescaled to highlight intra-urban tree cover (C). The predictions from a random forest model trained on LUCAS reference data and Dynamic World class probabilities are shown in (D), illustrating the possibility of resolving smaller landscape elements.

Table 1. Classification typology cross-walk between the three global 10 m LULC maps included in this study.
amples: rivers, ponds, lakes, oceans, flooded salt plains. This class includes any geographic area covered for most of the year (more than 9 months) by water bodies: lakes, reservoirs and rivers. Can be either fresh- or saltwater bodies. In some cases, the water can be frozen for part of the year (less than 9 months).
Snow and ice Snow and ice Snow and ice
Large homogenous areas of thick snow or ice, typically only in mountain areas or highest latitudes. Large homogenous areas of snowfall. Large homogenous areas of permanent snow or ice, typically only in mountain areas or highest latitudes; examples: glaciers, permanent snowpack, snow fields.

Table 2. Classification typology of the LUCAS data used for regional (European) validation of global LULC maps in this study.
Areas where seasonal or perennial crops are planted and cultivated, including cereals, root crops, non-permanent industrial crops, dry pulses, vegetables, flowers, fodder crops, fruit trees and other permanent crops. Excludes temporary grasslands, which are artificial pastures that may only be planted for one year.
Areas with no dominant vegetation cover on at least 90% of the area or areas covered by lichens/moss. Excludes other bare soil, which includes bare arable land, temporarily unstocked areas within forests, burnt areas, secondary land cover for tracks and parking areas/yards.

Table 3. Summary of overall accuracies quantified at the global and regional scale.

• In general, users can rely on water, built area, trees and crops being mapped with the highest accuracy, while shrub and scrub, grass, bare ground and flooded vegetation have the lowest accuracies. With this in mind, it may be beneficial to simplify the LULC typology by merging classes with low accuracies into aggregate classes if your use-case allows it.
• Users should be aware of the biases in global LULC products (reported relative to one another). Specifically, WC is biased toward estimating greater grass cover, Esri towards shrub and scrub cover and DW towards snow and ice.
• LULC classification accuracy varies by biome, continent and urban settlement type, and, therefore, users may consult Figures 4-6 here to gather information on what to expect given the local context of their work.