Assessing a Prototype Database for Comprehensive Global Aquatic Land Cover Mapping
Remote Sens. 2021, 13, 4012

The monitoring of Global Aquatic Land Cover (GALC) plays an essential role in protecting and restoring water-related ecosystems. Although many GALC datasets have been created, a uniform and comprehensive GALC dataset that meets multiple user needs is lacking. This study aims to assess the effectiveness of using existing global datasets to develop a comprehensive and user-oriented GALC database and to identify the gaps in current datasets for GALC mapping. Eight global datasets were reframed to construct a three-level (i.e., from general to detailed) prototype database for 2015, conforming with the United Nations Land Cover Classification System (LCCS)-based GALC characterization framework. An independent validation was carried out, and the overall results show some limitations of current datasets in comprehensive GALC mapping. The Level-1 map had considerable commission errors in delineating the general GALC distribution. The Level-2 maps were good at characterizing permanently flooded areas and natural aquatic types, while accuracies were poor in the mapping of temporarily flooded and waterlogged areas as well as artificial aquatic types; vegetated aquatic areas were also underestimated. The Level-3 maps were not sufficient for characterizing the detailed life form types (e.g., trees, shrubs) of aquatic land cover. However, the prototype GALC database is flexible enough to derive user-specific maps and has important value for aquatic ecosystem management. With evolving Earth observation opportunities, the limitations in current GALC characterization can be addressed in the future.


Introduction
Aquatic land cover (excluding open oceans) refers to land cover types that are significantly influenced by the presence of water over extensive periods of the year [1], including not only open water but also wetlands in the transitional zones between terrestrial uplands and open water systems [2]. Aquatic ecosystems play an important role in the global carbon cycle and provide crucial ecosystem services for our social, economic, and environmental well-being. However, increasing global water demand and global climate change have put pressure on aquatic ecosystems [3]. Knowledge of the global distribution of aquatic land cover is critical for managing and protecting aquatic ecosystems.
Remote sensing provides an efficient way to monitor the spatial distribution of aquatic land cover. As uniform and comprehensive aquatic land cover classification schemes are lacking, current Global Aquatic Land Cover (GALC) datasets have often been narrowed down to specific classes [4]; most of them focus on providing information on water bodies [5,6] while missing the vegetation and wet soils that are key components of aquatic ecosystems [2]. Currently, the most comprehensive GALC product describing a variety of aquatic classes is the Global Lakes and Wetlands Database (GLWD) [7]. However, because it is based on data from the 1980s, GLWD is out of date for present GALC monitoring. Aquatic classes in Global Land Cover (GLC) products have often been underrepresented and have suffered from low accuracies [8]. The inconsistent classification schemes adopted by different datasets lead to discrepancies in the spatial distribution of aquatic land cover among GALC datasets [5] and further bring uncertainties for users employing these datasets in their research [9].
Depending on the application, GALC map users may require aquatic information at different levels of detail. GALC datasets are most commonly applied to define the region of interest using the general distribution of aquatic land cover [6]. In some other cases, more detailed information on aquatic lands is needed. For instance, a global product distinguishing the vegetation type under different water persistence is helpful for estimating methane emissions, because the production of methane in aquatic ecosystems is affected by water duration [10] and vegetation type [11]. However, such detailed information is rare in existing datasets. Moreover, it is difficult to obtain the user-required information from only one dataset for various applications.
Considering the variety of user needs and the limitations of current global products, a more comprehensive and user-oriented GALC dataset is necessary. Existing classification schemes are either too broad in scope and beyond the capability of satellite sensors (e.g., the Ramsar wetland classification system [12]) or too centered on a national scale (e.g., the Canadian wetland classification system [13]), so a generally applicable GALC characterization framework is required. The ISO-certified United Nations Land Cover Classification System (LCCS) offers a good way to standardize the terminology of a land cover type by combining a set of independent diagnostic attributes, i.e., classifiers [14]. Built upon the LCCS approach, a three-level GALC characterization framework that identifies aquatic land cover from general to detailed levels was recently developed [6]. By organizing information by level and classifier, this framework not only reflects the complexity of aquatic ecosystems but also allows users to derive the information for their own applications.
Because a comprehensive and state-of-the-art GALC dataset is not yet available, existing global maps are often integrated to create an improved dataset that benefits from the strengths of the individual datasets. With map integration, existing thematic information can be adapted to specific user needs by translating it to the user-required legends [15]. Integration also helps to identify the gaps between current datasets and user requirements [16]. Developments in new Earth Observation (EO) data and techniques have promoted the continuous and operational monitoring of global land cover [17]. Although a number of GALC datasets have been created in recent years, these datasets have not been assessed with respect to comprehensive GALC mapping. Given the lack of such research, a closer look at the status of current datasets would provide useful insights for ongoing GALC mapping initiatives.
Here, we present a study assessing the effectiveness of integrating existing datasets for comprehensive and user-oriented GALC mapping. We first generated a prototype GALC database using several representative global products. Then, the limitations of existing datasets for GALC mapping were analyzed through independent validation. Finally, we highlight evolving EO opportunities for improving GALC characterization.

Materials and Methods
According to the review of currently available GALC datasets by Xu et al. [6], users prefer datasets with ≤100 m resolution; thus, the spatial resolution of the prototype GALC database was set to 100 m. The nominal year of the static prototype database was set to 2015 because more global products describing GALC are available around 2015 than for other years [6]. The general steps taken in this study are summarized in Figure 1.

Global Aquatic Land Cover Characterization Framework
The prototype GALC database was built upon the LCCS-based GALC characterization framework proposed by Xu et al. [6] (Figure 2). Level-1 identifies aquatic land cover as a whole, discriminating between aquatic and non-aquatic lands. Xu et al. [6] proposed five classifiers at Level-2, of which this study focused on three: the persistence of water (the duration of water covering the surface), the presence of vegetation (the existence or absence of vegetation), and the artificiality of cover (whether or not a land cover is managed by humans). At Level-3, the vegetated and non-vegetated types are specified into more detailed classes by the life form classifier. This design is intended to enable users to generate maps according to their own needs.

Input Datasets
Input datasets were selected from the 33 GALC datasets reviewed by Xu et al. [6]. To ensure the thematic representativeness and quality of the input datasets, four criteria were used in the selection:

• Thematic detail: The dataset should include information on at least one classifier at Level-2 or Level-3 of the reference GALC characterization framework.
• Temporal range: To minimize the influence of land changes, the dataset should describe aquatic land cover within 2015 ± 3 years.
• Spatial resolution: Considering the limited availability of high-resolution (≤100 m) datasets, the spatial resolution of the dataset should be ≤1 km.
• Accuracy: The dataset should have an overall accuracy >70% or, if it lacks a quantitative assessment, should have been extensively evaluated.
Finally, eight datasets (Table 1) meeting the above criteria were selected, of which five have a single aquatic class and three are GLC products. It should be noted that the selected datasets are considered the best representatives of the datasets available around 2015; however, they might still be inferior to more recently developed ones. If needed, users can include more advanced datasets to update the database.

Validation Datasets
The Level-1 validation dataset used for accuracy assessment was collected as part of the CGLS-LC100 project [26]. The data include 26,714 sample sites across the globe (Figure 3), of which 2989 are aquatic and 23,725 are non-aquatic. Each sample site corresponds to a 100 m × 100 m pixel, which is divided into 100 subpixels at 10 m × 10 m resolution. The reference land cover was labelled at the subpixel level by a group of experts trained to distinguish different land cover types. In this study, the dominant type among the 100 subpixels was used as the land cover class of each 100 m × 100 m sample site. The dataset was generated by stratified random sampling, and the inclusion probabilities of the different sampling strata were taken into account (see Tsendbazar et al. [27] for more details). The satellite imagery used for interpretation was from 2015. The validation of the Level-2 and Level-3 maps requires information on water persistence, vegetation presence, artificiality of cover, and life form types. Such detailed information was not recorded in the CGLS validation dataset. Thus, we randomly selected (i.e., simple random sampling) 800 aquatic sample sites (Figure 4) and visually interpreted the four classifiers on the Geo-wiki platform (http://www.geo-wiki.org, accessed 1 July 2009) using high-resolution Google Earth images, Bing maps, ESRI-WORLD imagery, and Sentinel-2 images from 2015. Time series of Sentinel-2 images (2015-2019) and the Normalized Difference Vegetation Index based on MODIS, Landsat, and PROBA-V were also used to characterize the information on the four classifiers.

Dataset Pre-Processing
The input datasets were reprojected to World Geodetic System (WGS) 1984 latitude/longitude coordinates and resampled to a spatial resolution of 0.00099° (approximately 100 m at the equator). Datasets in a vector format (i.e., GRanD, PEATMAP, global saltmarsh, GMW) were rasterized into the same projection and spatial resolution.
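A quick check of the arithmetic behind "approximately 100 m": the sketch below converts the 0.00099° grid step to metres, assuming a spherical Earth with the WGS 84 equatorial radius (ellipsoidal flattening is ignored). It shows that the step is about 110 m at the equator and that the east-west width of a geographic pixel shrinks with the cosine of latitude, to about half at 60°, which is why area calculations on such a grid need an equal-area projection or a latitude-dependent cell area.

```python
import math
from typing import Tuple

# WGS 84 semi-major (equatorial) radius in metres.
WGS84_A_M = 6_378_137.0
# Length of one degree of arc along the equator (about 111.32 km).
METRES_PER_DEGREE = 2 * math.pi * WGS84_A_M / 360.0


def pixel_size_m(step_deg: float, lat_deg: float) -> Tuple[float, float]:
    """Approximate (east-west, north-south) size in metres of a square
    step_deg x step_deg pixel centred at lat_deg, on a spherical Earth."""
    east_west = step_deg * METRES_PER_DEGREE * math.cos(math.radians(lat_deg))
    north_south = step_deg * METRES_PER_DEGREE
    return east_west, north_south


for lat in (0, 30, 60):
    ew, ns = pixel_size_m(0.00099, lat)
    print(f"latitude {lat:>2} deg: {ew:5.1f} m (E-W) x {ns:5.1f} m (N-S)")
```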

Among the input datasets, there are some repeated classes (e.g., water bodies, mangroves) and overlapping areas, which may cause inconsistencies in the map integration process. To deal with this issue, the priority of each input dataset was evaluated using a ranking based on spatial resolution, temporal range, and accuracy. The general rule is that a dataset with a higher resolution, a higher classification accuracy, and a nominal year closer to 2015 was ranked higher. For datasets in a vector format, the upper bound of the minimum mapping unit (MMU) range (Table 1) was taken as the spatial resolution. Furthermore, to facilitate the comparison of datasets with differing spatial resolutions, we divided the resolutions into six groups: ≤30 m, 30-100 m, 100-300 m, 300-500 m, 500-1000 m, and >1000 m. Datasets with a spatial resolution ≤30 m were ranked on top. For datasets with a long time span (e.g., 1990-2013), the earlier starting year was used to rank the dataset. Regarding the accuracy ranking, the F-score [28] was calculated using Equation (1) whenever the producer's accuracy (PA) and user's accuracy (UA) were available:
$$F = \frac{2 \times PA \times UA}{PA + UA} \qquad (1)$$
For datasets without a quantitative accuracy assessment, the F-score was set to 0.
Based on the above rules, each dataset was given a ranking score for each of the three quality indicators (Table 2). The priority of the input datasets was determined by the average of the three rankings. Among the eight input datasets, GSW was ranked highest, followed by GMW. According to the ranking, water bodies from the CGLS-LC100, CCI-LC, and GLCNMO2013 datasets were excluded, and the mangroves from GLCNMO2013 and the "tree cover, flooded, saline water" class from CCI-LC were not used. For the CGLS-LC100, CCI-LC, and GLCNMO2013 datasets, the F-score in Table 2 was calculated as the average F-score of all their aquatic classes.
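The ranking rules above can be sketched as follows. Each dataset receives a rank for its resolution group, its temporal distance from 2015, and its F-score (the harmonic mean of PA and UA, set to 0 when unavailable), and priority follows the mean of the three ranks. This is a minimal sketch under stated assumptions: the dataset values are hypothetical rather than those of Table 2, and the tie rule (tied values share a rank) is our assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

NOMINAL_YEAR = 2015
# Upper bounds (m) of the first five resolution groups; > 1000 m is the sixth group.
RESOLUTION_BREAKS_M = [30, 100, 300, 500, 1000]


@dataclass
class InputDataset:
    name: str
    resolution_m: float          # grid size, or upper MMU bound for vector data
    start_year: int              # earlier start year for multi-year datasets
    pa: Optional[float] = None   # producer's accuracy (0-1), if reported
    ua: Optional[float] = None   # user's accuracy (0-1), if reported


def f_score(pa: Optional[float], ua: Optional[float]) -> float:
    """Harmonic mean of PA and UA; 0 when no quantitative assessment exists."""
    if pa is None or ua is None or pa + ua == 0:
        return 0.0
    return 2 * pa * ua / (pa + ua)


def resolution_group(resolution_m: float) -> int:
    """0 for <= 30 m, 1 for 30-100 m, ..., 5 for > 1000 m."""
    for group, upper in enumerate(RESOLUTION_BREAKS_M):
        if resolution_m <= upper:
            return group
    return len(RESOLUTION_BREAKS_M)


def dense_rank(values: List[float], higher_is_better: bool) -> List[int]:
    """Rank 1 = best; tied values share a rank (an assumed tie rule)."""
    distinct = sorted(set(values), reverse=higher_is_better)
    return [distinct.index(v) + 1 for v in values]


def priority_order(datasets: List[InputDataset]) -> List[str]:
    """Dataset names from highest to lowest priority (lowest mean rank first)."""
    res = dense_rank([resolution_group(d.resolution_m) for d in datasets], higher_is_better=False)
    year = dense_rank([abs(d.start_year - NOMINAL_YEAR) for d in datasets], higher_is_better=False)
    acc = dense_rank([f_score(d.pa, d.ua) for d in datasets], higher_is_better=True)
    mean_rank = {d.name: mean(r) for d, r in zip(datasets, zip(res, year, acc))}
    return sorted(mean_rank, key=mean_rank.get)


# Hypothetical inputs, not the values reported in Table 2.
demo = [
    InputDataset("dataset_A", 30, 2015, pa=0.90, ua=0.95),
    InputDataset("dataset_B", 300, 2015, pa=0.70, ua=0.80),
    InputDataset("dataset_C", 1000, 1990),
]
print(priority_order(demo))
```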

Legend Harmonization of Input Datasets
Legend harmonization translated the legends of the input datasets into the "classifiers" of the reference GALC characterization framework, based on the original class definitions in the reference papers (Table 1). Taking the mangroves of the GMW dataset as an example, they are defined as "forested wetlands that are uniquely adapted to the intertidal zone" [18]. Accordingly, mangroves were translated as "aquatic" at Level-1; "permanently flooded" (as water is regularly available with tides in the intertidal zone throughout the year), "vegetated" (i.e., "forested wetland"), and "natural" (as the mangrove ecosystem is naturally formed) at Level-2; and "trees" at Level-3. Ambiguities and inconsistencies in class definitions were also identified during the harmonization process; the following explains how we dealt with them.

• Classes without information on the duration of water (e.g., herbaceous wetland of CGLS-LC100) were assumed to be "temporarily flooded".
• Inconsistent class definitions, namely the permanent water and seasonal water of the GSW dataset (Table 1), were adjusted to conform with the reference framework.
• For classes that include more than one cover type under the same classifier without distinguishing between them, all of these types were assigned under that classifier; e.g., the life form of PEATMAP includes both herbaceous cover and shrubs (Table 3), as PEATMAP maps both marshes and shrub swamps.

Datasets were composited in the order of their priority rankings (Table 2) using the Geospatial Data Abstraction Library (GDAL) [29]. The specific GDAL commands used in the map generation are listed in Table S1 (Supplementary Materials). The integrated maps were converted to the World Cylindrical Equal Area projection [30] to calculate the areas of the different classes.
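The harmonization and priority-ordered compositing described above can be sketched on a toy grid. This is a pure-Python illustration rather than the GDAL workflow of Table S1: only the GMW mangrove entry follows the worked example in the text, while the other legend entries, the class label "peatland", and the priority order are hypothetical. Each classifier is composited separately, and a pixel takes its value from the highest-priority dataset that provides one.

```python
from typing import Dict, List, Optional, Tuple

Classifiers = Dict[str, Optional[str]]

# Illustrative legend translation: (dataset, source class) -> framework classifiers.
# GMW follows the text; the other entries are hypothetical, and None = unspecified.
LEGEND: Dict[Tuple[str, str], Classifiers] = {
    ("GMW", "mangroves"): {
        "level1": "aquatic", "persistence": "permanently flooded",
        "vegetation": "vegetated", "artificiality": "natural", "life_form": "trees",
    },
    ("CGLS-LC100", "herbaceous wetland"): {
        "level1": "aquatic", "persistence": "temporarily flooded",  # no duration info, assumed
        "vegetation": "vegetated", "artificiality": None, "life_form": "herbaceous cover",
    },
    ("PEATMAP", "peatland"): {
        "level1": "aquatic", "persistence": "waterlogged",
        "vegetation": "vegetated", "artificiality": None,
        "life_form": "shrubs and herbaceous cover",  # not separable in PEATMAP
    },
}


def composite(layers: Dict[str, List[Optional[str]]], priority: List[str],
              classifier: str) -> List[Optional[str]]:
    """Per-pixel composite of one classifier.

    layers maps dataset name -> source class per pixel (None = not mapped).
    Datasets are visited from highest to lowest priority, and each pixel keeps
    the first non-empty value found for the requested classifier.
    """
    n_pixels = len(next(iter(layers.values())))
    out: List[Optional[str]] = [None] * n_pixels
    for name in priority:
        for i, source_class in enumerate(layers[name]):
            if out[i] is None and source_class is not None:
                out[i] = LEGEND[(name, source_class)][classifier]
    return out


# Three toy pixels; the priority order is hypothetical.
layers = {
    "GMW": ["mangroves", None, None],
    "CGLS-LC100": ["herbaceous wetland", "herbaceous wetland", None],
    "PEATMAP": [None, "peatland", "peatland"],
}
priority = ["GMW", "PEATMAP", "CGLS-LC100"]
print(composite(layers, priority, "persistence"))
print(composite(layers, priority, "life_form"))
```

Compositing each classifier separately means a lower-priority dataset can fill a classifier that a higher-priority one leaves unspecified; the text does not say whether the study handled such gaps this way.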

Level-1: The Aquatic Land Cover Map
The Level-1 map (hereafter referred to as the "integrated Level-1 map") was generated by combining the eight input datasets into one map. To estimate how much aquatic land cover lies on land, the CGLS-LC100 land/sea mask [31] was applied to separate aquatic land cover in the land/sea transitional zones from that on land. The land area defined by the CGLS-LC100 land/sea mask is approximately 134.59 million km² (excluding Antarctica and the land/sea transitional area).
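Because a geographic pixel covers less ground the further it lies from the equator, class areas cannot be obtained by simply counting pixels on the latitude/longitude grid. The study handled this by converting the maps to the World Cylindrical Equal Area projection. A swapped-in alternative that avoids reprojection is to compute each cell's area analytically, sketched below under the assumption of a spherical Earth with the WGS 84 authalic radius (6,371,007.2 m); the function names are hypothetical.

```python
import math
from typing import List

# Authalic radius of the WGS 84 ellipsoid: a sphere with the same surface area.
AUTHALIC_RADIUS_M = 6_371_007.2


def cell_area_km2(lat_south: float, lat_north: float, width_deg: float) -> float:
    """Area of a latitude/longitude cell on a sphere, in km^2:
    A = R^2 * dlon * (sin(lat_north) - sin(lat_south)), with dlon in radians."""
    dlon = math.radians(width_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return AUTHALIC_RADIUS_M ** 2 * dlon * band / 1e6


def class_area_km2(mask_rows: List[List[bool]], north_edge_deg: float, step_deg: float) -> float:
    """Total area of True pixels in a north-up grid of square step_deg pixels
    whose first row starts at latitude north_edge_deg."""
    total = 0.0
    for r, row in enumerate(mask_rows):
        lat_north = north_edge_deg - r * step_deg
        # All pixels in a row share one area, so multiply by the count of True pixels.
        total += cell_area_km2(lat_north - step_deg, lat_north, step_deg) * sum(row)
    return total


print(f"{cell_area_km2(0.0, 0.00099, 0.00099):.4f} km^2 per pixel at the equator")
print(f"{cell_area_km2(60.0, 60.00099, 0.00099):.4f} km^2 per pixel at 60 N")
```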
Level-2: The Persistence of Water, Presence of Vegetation, and Artificiality of Cover Map
The Level-2 maps were created by combining corresponding classes (Table 3) into the three classifiers: persistence of water, presence of vegetation, and artificiality of cover. Figure 5 shows the input datasets to each classifier.
Before creating the persistence of water map, the input datasets were pre-processed. Firstly, the GSW water seasonality map was reclassified into permanent water (≥9 months) and seasonal water (<9 months). Secondly, as the CCI-LC dataset mixes the three water persistence types (Table 3), two masks were used to remove the permanently flooded and waterlogged areas, leaving the "temporarily flooded trees, shrubs, and herbaceous cover". The mask of permanently flooded areas was formed from three permanently flooded classes: the mangroves of the GMW dataset, the reservoirs of the GRanD dataset, and the permanent water of GSW. PEATMAP was used to remove waterlogged areas from CCI-LC.
The GRanD and GSW datasets were also processed before generating the artificiality of cover map. The GRanD dataset contains natural lakes regulated by dams, which is not consistent with the LCCS-based definition, because these lakes are naturally formed and do not require long-term human maintenance. Therefore, we used the "natural lakes with regulation structure" from an external dataset, HydroLAKES [32], to separate natural lakes from reservoirs in the GRanD dataset. Likewise, the natural and artificial water of the GSW dataset were separated using a mask formed from the GRanD reservoirs (excluding dam-regulated natural lakes) and the paddy fields from GLCNMO2013.
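The pre-processing steps above reduce to a few per-pixel Boolean operations. The sketch below works on flat lists of pixels rather than rasters, and its function names are hypothetical; only the 9-month threshold and the choice of masks come from the text.

```python
from typing import List, Optional

PERMANENT_MIN_MONTHS = 9  # threshold stated in the text


def gsw_persistence(seasonality: List[int]) -> List[Optional[str]]:
    """Reclassify GSW water seasonality (months per year with water, 0-12)."""
    return [
        "permanent water" if months >= PERMANENT_MIN_MONTHS
        else "seasonal water" if months > 0
        else None  # no surface water observed
        for months in seasonality
    ]


def cci_temporarily_flooded(cci_flooded: List[bool], gmw_mangrove: List[bool],
                            grand_reservoir: List[bool], gsw_permanent: List[bool],
                            peatmap: List[bool]) -> List[bool]:
    """Keep CCI-LC flooded-vegetation pixels outside the permanently flooded
    mask (GMW mangroves OR GRanD reservoirs OR GSW permanent water) and
    outside the waterlogged mask (PEATMAP)."""
    result = []
    for flooded, mangrove, reservoir, permanent, peat in zip(
            cci_flooded, gmw_mangrove, grand_reservoir, gsw_permanent, peatmap):
        permanently_flooded = mangrove or reservoir or permanent
        result.append(flooded and not permanently_flooded and not peat)
    return result


def gsw_artificiality(gsw_water: List[bool], reservoir: List[bool],
                      paddy: List[bool]) -> List[Optional[str]]:
    """Split GSW water into natural/artificial. `reservoir` is assumed to be the
    GRanD reservoirs with HydroLAKES dam-regulated natural lakes already removed;
    `paddy` is the GLCNMO2013 paddy-field mask."""
    return [
        None if not water else "artificial" if (res or pad) else "natural"
        for water, res, pad in zip(gsw_water, reservoir, paddy)
    ]


print(gsw_persistence([12, 9, 8, 0]))
print(gsw_artificiality([True, True, False], [True, False, True], [False, False, False]))
```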
The presence of vegetation map was composited from the Level-3 life form types ( Figure 5) into the vegetated and non-vegetated categories.

Level-3: The Life Form Map
The Level-3 map was created by combining the corresponding classes for the five life form types (Figure 2). Because none of the selected input datasets contains an aquatic "bare land" class, and because shrubs and herbaceous cover cannot be separated in either PEATMAP or CCI-LC (Table 3), the Level-3 map integrated from the eight input datasets (hereafter called the "integrated life form" map) comprises only three classes: "water body", "trees", and "shrubs and herbaceous cover" (Figure 5).
To obtain a more complete delineation of the five life form types, another map (hereafter called the "CGLS life form" map) was created using the Fractional Land Cover (FLC) maps of the CGLS-LC100 product [24]. This product comprises ten FLC maps, and the value of each map indicates the proportion of a 100 m × 100 m pixel covered by a specific land cover class. As several classes might coexist within the same pixel, we first generated a global dominant cover map from the ten maps in Google Earth Engine (GEE). Eight classes that correspond to our classification scheme, i.e., "bare/sparse vegetation", "permanent water", "seasonal water", "herbaceous grassland", "cropland", "moss/lichen", "shrubland", and "tree", were then selected from the global dominant cover map and exported from GEE. Finally, the resulting map was restricted to aquatic areas in GDAL using the integrated Level-1 map created in this study.
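The per-pixel logic of the CGLS life form map (pick the dominant cover fraction, translate it to a life form, keep only Level-1 aquatic pixels) can be sketched as follows. In the study, the first steps ran in Google Earth Engine and the masking in GDAL; this pure-Python version only illustrates the logic. The correspondence between the eight cover classes and the five life forms, and the "built-up" label used for a non-selected class, are assumptions.

```python
from typing import Dict, List, Optional

# Assumed mapping from the eight selected CGLS-LC100 cover-fraction classes to
# the five life form types; the study's exact correspondence may differ.
TO_LIFE_FORM = {
    "bare/sparse vegetation": "bare land",
    "permanent water": "water body",
    "seasonal water": "water body",
    "herbaceous grassland": "herbaceous cover",
    "cropland": "herbaceous cover",
    "moss/lichen": "herbaceous cover",
    "shrubland": "shrubs",
    "tree": "trees",
}


def dominant_class(fractions: Dict[str, float]) -> str:
    """Class with the largest cover fraction (ties go to the first key listed)."""
    return max(fractions, key=fractions.get)


def cgls_life_form(pixels: List[Dict[str, float]], aquatic: List[bool]) -> List[Optional[str]]:
    """Dominant-cover life form, restricted to pixels flagged aquatic at Level-1.
    Dominant classes outside the eight selected ones (e.g. built-up) give None."""
    return [
        TO_LIFE_FORM.get(dominant_class(fractions)) if is_aquatic else None
        for fractions, is_aquatic in zip(pixels, aquatic)
    ]


# Hypothetical cover fractions (%) for four pixels.
pixels = [
    {"tree": 60, "shrubland": 30, "permanent water": 10},
    {"seasonal water": 50, "herbaceous grassland": 40},
    {"built-up": 70, "cropland": 30},
    {"moss/lichen": 55, "bare/sparse vegetation": 45},
]
print(cgls_life_form(pixels, [True, False, True, True]))
```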

Accuracy Assessment
The integrated Level-1 map was assessed using 26,714 samples from the Level-1 validation dataset (Figure 3). Accuracy estimates such as overall accuracies (OA), class accuracies, and their confidence intervals (CI, at 95% confidence level) were calculated using the same method described in Tsendbazar et al. [27] following the good practice recommendations of stratified random sampling suggested by Olofsson et al. [33]. The sample inclusion probabilities were used in the accuracy calculation to reduce bias arising from the sampling design.
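Weighting each sample by the inverse of its inclusion probability is what makes a stratified sample represent map area rather than sample counts. The sketch below is a generic inverse-probability weighting with hypothetical samples; it is not necessarily the exact estimator of Tsendbazar et al. [27]. It also illustrates how the same samples can yield a much lower user's accuracy once the weights are applied.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Matrix = Dict[Tuple[str, str], float]


def weighted_confusion(samples: Iterable[Tuple[str, str, float]]) -> Matrix:
    """Confusion matrix keyed by (map class, reference class). Each sample is
    weighted by 1 / inclusion probability, i.e. by the area it represents."""
    matrix: Matrix = defaultdict(float)
    for mapped, reference, inclusion_prob in samples:
        matrix[(mapped, reference)] += 1.0 / inclusion_prob
    return matrix


def overall_accuracy(matrix: Matrix) -> float:
    agree = sum(w for (m, r), w in matrix.items() if m == r)
    return agree / sum(matrix.values())


def users_accuracy(matrix: Matrix, cls: str) -> float:
    mapped_as_cls = sum(w for (m, _), w in matrix.items() if m == cls)
    return matrix.get((cls, cls), 0.0) / mapped_as_cls


# Hypothetical sample: aquatic-mapped sites drawn from strata with very
# different inclusion probabilities.
samples = (
    [("aquatic", "aquatic", 0.10)] * 60        # weight 10 each
    + [("aquatic", "non-aquatic", 0.02)] * 40  # weight 50 each
    + [("non-aquatic", "non-aquatic", 0.01)] * 100
)
count_based = weighted_confusion((m, r, 1.0) for m, r, _ in samples)
area_weighted = weighted_confusion(samples)
print(f"UA aquatic, count-based:   {users_accuracy(count_based, 'aquatic'):.3f}")
print(f"UA aquatic, area-weighted: {users_accuracy(area_weighted, 'aquatic'):.3f}")
```

With these toy numbers, the count-based UA of the aquatic class is 0.600, while the area-weighted UA is about 0.231, because each misclassified site stands for five times as much area as each correctly classified one.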
The three Level-2 maps and two Level-3 maps were assessed using the 800 sample sites shown in Figure 4. As some locations in this validation dataset had no data on the Level-2 or Level-3 maps, not all of the 800 samples were used in the confusion matrix calculation. The accuracy estimators for simple random sampling [33,34] were implemented for the Level-2/3 maps: accuracies were adjusted based on the sample-count confusion matrix and the area proportions of the mapped land cover classes. To compare the accuracies of the two Level-3 maps, herbaceous cover and shrubs on the CGLS life form map were merged and bare land was excluded from the validation.
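One common way to adjust a sample-count confusion matrix by mapped area proportions is the set of estimators summarised by Olofsson et al. [33], in which each cell becomes p_ij = W_i n_ij / n_i. The sketch below implements those formulas, including the variance of overall accuracy used for a 95% confidence interval. The two-class counts and area proportions are hypothetical, and whether the study used exactly this variance formula is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def area_adjusted_accuracy(counts: List[List[int]], map_area_props: Sequence[float]
                           ) -> Tuple[float, List[float], List[float], float]:
    """Area-adjusted accuracy from a sample-count confusion matrix.

    counts[i][j] = samples mapped as class i whose reference class is j.
    map_area_props[i] = W_i, the proportion of the mapped area in class i.
    Returns (overall accuracy, user's accuracies, producer's accuracies,
    half-width of the 95% confidence interval of the overall accuracy).
    """
    k = len(counts)
    n_row = [sum(row) for row in counts]
    # Estimated area proportion of each cell: p_ij = W_i * n_ij / n_i.
    p = [[map_area_props[i] * counts[i][j] / n_row[i] for j in range(k)] for i in range(k)]
    oa = sum(p[i][i] for i in range(k))
    ua = [p[i][i] / map_area_props[i] for i in range(k)]  # row sums of p equal W_i
    col_tot = [sum(p[i][j] for i in range(k)) for j in range(k)]
    pa = [p[j][j] / col_tot[j] for j in range(k)]
    var_oa = sum(map_area_props[i] ** 2 * ua[i] * (1 - ua[i]) / (n_row[i] - 1) for i in range(k))
    return oa, ua, pa, 1.96 * math.sqrt(var_oa)


# Hypothetical counts: rows = mapped (natural, artificial), columns = reference.
counts = [[90, 10],
          [20, 30]]
oa, ua, pa, ci = area_adjusted_accuracy(counts, map_area_props=[0.95, 0.05])
print(f"OA = {oa:.3f} +/- {ci:.3f}")
print("UA:", [round(u, 3) for u in ua], "PA:", [round(x, 3) for x in pa])
```

In this toy case the artificial class keeps a UA of 0.6 but its PA drops to 0.24: the few artificial reference sites found inside the large natural map class carry the weight W = 0.95 and dominate the column total.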

Level-1: Aquatic Land Cover
The integrated Level-1 map is presented in Figure 6. The total area of GALC is estimated at 27.5 million km², of which 15.3 million km² is on land (i.e., 11.4% of the global land area). The confusion matrix corrected for unequal inclusion probabilities is shown in Table 4, and the count-based confusion matrix is provided in Table S2 (Supplementary Materials). Although the integrated Level-1 map achieved an overall accuracy of 93.0% ± 0.4% (at 95% CI, Table 4), it had considerable commission errors (100% - UA) in mapping aquatic land cover. The area-weighted UA of aquatic lands (32.7%, Table 4) was much lower than that of the count-based confusion matrix (58.7%, Table S2). This can be explained by the fact that non-aquatic sample sites represent a much larger proportion of the Earth's surface, so they carry larger weights than aquatic sample sites when the unequal inclusion probabilities are accounted for. Still, even when the area weights of the classes were not considered, the UA of the aquatic class was notably low.

Level-2: Persistence of Water, Presence of Vegetation, and Artificiality of Cover
The Level-2 maps are presented in Figure 7. The area-weighted and count-based confusion matrices of the three maps are provided in Tables 5-7 and Tables S3-S5 (Supplementary Materials), respectively. The overall accuracy of the persistence of water map was 50.7 ± 3.8% (at 95% CI, Table 5). This map achieved a higher UA and PA for permanently flooded areas than for temporarily flooded and waterlogged areas. The map overrepresented the waterlogged class at the cost of the temporarily flooded class: almost 72% (100% - PA) of the reference temporarily flooded samples were misclassified as the waterlogged and permanently flooded types.
The presence of vegetation map achieved an overall accuracy of 63.5 ± 3.6% (at 95% CI, Table 6). Generally, the PA of the non-vegetated class was much higher than its UA, whereas the opposite held for the vegetated class, meaning that this map tended to overestimate the non-vegetated class and underestimate the vegetated class.
The natural aquatic class on the artificiality of cover map was highly accurate in terms of PA and UA (i.e., both exceeded 90%, Table 7). However, even though the overall accuracy (88.3 ± 2.0%) was high, artificial aquatic areas were poorly characterized by this map, with a UA and PA of only 26.8% and 37.1%, respectively.

Level-3: Life Form
The integrated life form map and the CGLS life form map are shown in Figure 8. Their area-weighted and count-based confusion matrices are provided in Tables 8 and S6 (Supplementary Materials), respectively. The two Level-3 maps had a similar spatial distribution and areal percentage of water bodies, while they differed considerably in the other life form types (pie charts in Figure 8). The overall accuracies of both maps were relatively low (Table 8). The integrated life form map was better at characterizing shrubs/herbaceous cover than the CGLS life form map (Table 8), but it underestimated trees: around 79% (calculated from Table 8) of the reference tree samples were misclassified as shrubs/herbaceous cover. The CGLS life form map was better at characterizing trees than the integrated life form map, but it tended to overestimate trees at the cost of shrubs/herbaceous cover.

Discussion
With the increasing demand for water resources, the characterization of aquatic land cover has attracted more and more attention. By reframing current datasets consistently, this research created a three-level prototype GALC database ( Figure 9) and evaluated its performance rigorously. In this section, the limitations of existing datasets and possible reasons behind those limitations are discussed. The evolving EO opportunities to improve the GALC characterization are also highlighted. Although the prototype GALC database was developed and evaluated in a systematic way, findings in this study might be subject to some limitations because of the limited number of "waterlogged", "artificial", and "shrub" sample sites for a global assessment. These classes should be investigated further if sufficient validation data are available. Nevertheless, obtaining high-quality global aquatic reference datasets with detailed information on classifiers requires considerable time and expertise given the heterogeneous and dynamic characteristics of aquatic land cover.

General Classification of Global Aquatic Land Cover
The global aquatic area on land estimated by the integrated Level-1 map is 15.3 million km², and the map tends to overestimate the total extent of GALC (Table 4). The overestimation could originate from the input datasets. For instance, the CGLS-LC100 product is prone to confusing herbaceous wetlands with terrestrial grasslands [26].
The most recent studies mapping the overall distribution of GALC, by Hu et al. [35] and Tootchi et al. [36], reported estimates of 29.8 million km² and 29 million km² of aquatic area on land, respectively. In light of our accuracy assessment, these two estimates could also be considerably overestimated, indicating that a global product that can accurately separate aquatic from non-aquatic land is still needed. Among the key components of aquatic ecosystems, aquatic vegetation and wet soils are more difficult to map remotely than water bodies [37]. However, integrating multi-source data such as optical, Synthetic Aperture Radar (SAR), soil, and topographic features has been shown to improve the general-level classification of aquatic lands [38].

Classification of Persistence of Water, Presence of Vegetation, and Artificiality of Cover
The validation of the persistence of water map highlights that current datasets are limited in characterizing waterlogged and temporally dynamic types (Table 5). One reason is that classifying waterlogged areas without evident surface flooding is more difficult than detecting open surface water, because the contrast between wet soils and their surroundings is less pronounced [39]. In addition, the input datasets used to generate the temporarily flooded class represent mainly vegetated aquatic types (Figure 5), and characterizing water bodies under vegetation has always been challenging [4]. Furthermore, information on water persistence is still lacking in existing datasets: except for the GSW dataset, which characterizes water seasonality, the input datasets were all static maps without information on water duration.
The presence of vegetation map tends to underestimate vegetated aquatic lands (Table 6). The main cause is that identifying vegetated aquatic land cover globally remains challenging for remote sensing classification [27]. Unlike open surface water, vegetated aquatic lands are complicated by their distribution across tropical to boreal environments that encompass a wide variety of vegetation types, hydrological regimes, and land-use impacts [37]. Another possible cause is the inconsistency between the class definitions of the input datasets and our reference classification framework. For example, the GSW dataset, which was used as an input to the "non-vegetated" class in this study, considers vegetated areas that represent short-duration flooding events as seasonal water bodies [19].
The artificiality of cover map performs well in characterizing natural aquatic lands (Table 7), while defects of the two source datasets (i.e., GLCNMO2013 and GRanD) lead to low accuracies for the artificial class. Firstly, GLC products, the main source of information on aquatic croplands, often confuse croplands with natural herbaceous types [25]. Secondly, the GRanD dataset delineates reservoirs with a storage capacity of >0.1 km³ while excluding smaller reservoirs, which might cause the omission of artificial water bodies such as fishponds.

Classification of Aquatic Life Forms
As an extension of the Level-2 presence of vegetation map, the Level-3 life form maps have lower overall accuracies (Table 8), revealing prominent gaps in the characterization of vegetation presence and detailed vegetation types in aquatic areas. The significant underestimation of trees on the integrated life form map indicates that its two source datasets for trees, i.e., GMW and CCI-LC, also omitted considerable areas of trees in aquatic environments globally. Both the integrated life form map and the CGLS life form map misclassify trees, shrubs, and herbaceous cover. In fact, these types are challenging to separate with optical sensors alone, as they have similar spectral signatures [26]. Moreover, shrubs frequently grow together with herbaceous vegetation or trees, making them difficult to map independently.
The CGLS-LC100 FLC maps, which offer proportional estimates for basic land cover types, allow users to tailor the maps to their own applications. However, the life form map derived from these maps does not perform well in aquatic areas (Table 8), even though higher accuracies have been reported for the product in the global validation [26]. The poor prediction could result from seasonal or even daily water dynamics, which make it challenging to estimate the exact fractions of different land cover types [37].

Evolving EO Opportunities to Improve the GALC Characterization
Recent cloud-based computational platforms, such as Google Earth Engine [40], offer a unique opportunity for global aquatic land cover mapping through free access to tremendous volumes of EO data [41]. The Sentinel satellite imagery provided by the European Space Agency's Copernicus programme can be easily accessed on the GEE platform. Data acquired from the Sentinel-1 and Sentinel-2 satellites have spatial resolutions of up to 10 m and temporal resolutions of six days and five days, respectively. The improved spatial and temporal resolutions allow capturing variations in water occurrence [42] and small water bodies [43]. The three red-edge bands and two shortwave infrared (SWIR) bands of Sentinel-2 imagery are valuable for discriminating spectrally similar vegetation types [44]. The SWIR bands, which are sensitive to both soil and vegetation moisture, could contribute to characterizing waterlogged areas [45]. Sentinel-1 C-band SAR data have been successfully used to identify water under temporarily flooded vegetation [46].
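To make the role of the green, NIR, and SWIR bands concrete, the sketch below computes two widely used normalized-difference indices from Sentinel-2 bands B3 (green), B8 (NIR), and B11 (SWIR-1): the modified normalized difference water index (MNDWI) and the normalized difference moisture index (NDMI). The choice of these two indices and the reflectance values are illustrative assumptions; the paper does not prescribe particular indices.

```python
from typing import Dict


def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b); returns 0.0 when both inputs are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def s2_water_moisture_indices(b3_green: float, b8_nir: float, b11_swir1: float) -> Dict[str, float]:
    """Two common normalized-difference indices from Sentinel-2 surface reflectance.

    MNDWI (green vs SWIR-1): open water usually gives positive values.
    NDMI  (NIR vs SWIR-1):   sensitive to the water content of vegetation.
    """
    return {
        "MNDWI": normalized_difference(b3_green, b11_swir1),
        "NDMI": normalized_difference(b8_nir, b11_swir1),
    }


# Hypothetical reflectances for an open-water pixel and a wet-vegetation pixel.
print(s2_water_moisture_indices(b3_green=0.08, b8_nir=0.05, b11_swir1=0.02))
print(s2_water_moisture_indices(b3_green=0.06, b8_nir=0.35, b11_swir1=0.15))
```

On GEE, the same ratio can be computed per image with `ee.Image.normalizedDifference`, for example `image.normalizedDifference(['B3', 'B11'])`.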
Integrating multi-sensor (e.g., Landsat and Sentinel-2) and multi-source data (e.g., optical, radar, topographic, and soil data) has a better capacity to capture the inundation extent, vegetation structure, and hydroperiod variations [38] and thus is more suitable to discriminate between the aquatic and terrestrial uplands as well as the temporally dynamic and complex aquatic types (e.g., Level-3 classes). Some new datasets also have potential in improving the GALC characterization. For example, incorporating the height information, such as the recent global forest canopy height dataset [47], could reduce confusions of trees and shrubs.
SAR signals at longer wavelengths can penetrate tree canopies. In particular, the P-band SAR of the upcoming BIOMASS mission [48] has a higher chance of reaching the surface underneath [49]. This capability would enable characterizing water under dense vegetation canopies, improving the mapping of both vegetation in aquatic environments and water persistence in densely vegetated areas. Many innovative methods suited to multi-temporal images have also been proposed for aquatic land cover mapping, such as the Water Wetness Presence Index [39] and the Water Change Tracking algorithm [50]. Evaluating these methods is beyond the scope of the current paper.
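The wavelength gap behind this argument can be checked with λ = c/f. The calculation below uses the centre frequencies of the Sentinel-1 C-band SAR (5.405 GHz) and the BIOMASS P-band SAR (435 MHz). Longer wavelengths interact less with small scatterers such as leaves and twigs, which is why the P-band signal is more likely to reach the ground beneath a canopy.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def wavelength_cm(frequency_hz: float) -> float:
    """Radar wavelength in centimetres, from lambda = c / f."""
    return SPEED_OF_LIGHT_M_S / frequency_hz * 100.0


# Centre frequencies of the two SAR systems discussed in the text
SENSORS = {"Sentinel-1 (C-band)": 5.405e9, "BIOMASS (P-band)": 435e6}

for name, freq in SENSORS.items():
    print(f"{name}: {wavelength_cm(freq):.1f} cm")
# Sentinel-1 (C-band): 5.5 cm
# BIOMASS (P-band): 68.9 cm
```

The P-band wavelength is roughly twelve times the C-band one, which is consistent with C-band detecting flooded vegetation mainly in open or sparse canopies [46].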

Potential of the Prototype GALC Database in Addressing Multiple User Needs
Regardless of the accuracy of the integrated maps, the developed prototype database shows what a comprehensive and user-oriented GALC product could comprise. Because it is sufficiently flexible, the prototype database allows users to obtain the information they require by combining maps at various levels and classifiers. As mentioned before, climate modelers may require a map showing the vegetation type under different water persistence for accurate estimation of methane emissions. Such a map can be derived from the database (Figure 10).

The prototype database also has important implications for aquatic ecosystem management. First, the GALC maps could serve as basic inputs to hydrological and hydrodynamic models [51]. Second, these maps are helpful for determining appropriate input parameters for hydrological modeling. For example, in flood risk management, roughness estimation is an important step in simulating flood flows with hydrological models [52]. Roughness is strongly influenced by the physical properties of surface materials, such as vegetation density, which differs among vegetation types. Accurate Level-2 maps (i.e., presence of vegetation, artificiality of cover) and Level-3 maps therefore hold considerable potential for improving roughness estimation, which can benefit flood risk mitigation and the conservation of aquatic ecosystems.
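Deriving a user-specific map, such as the vegetation type under different water persistence described for climate modelers, amounts to crossing two categorical layers. The minimal sketch below assumes co-registered rasters with hypothetical integer codes. The class codes, the 0 no-data value, and the `10 * persistence + life_form` encoding are illustrative only and are not the database's documented scheme.

```python
import numpy as np

# Hypothetical class codes; the actual raster encodings in the database may differ
PERSISTENCE = {1: "permanently flooded", 2: "temporarily flooded", 3: "waterlogged"}
LIFE_FORM = {1: "trees", 2: "shrubs", 3: "herbaceous"}
NODATA = 0


def combine_levels(persistence: np.ndarray, life_form: np.ndarray) -> np.ndarray:
    """Cross a Level-2 and a Level-3 map into one composite map.

    Composite code = 10 * persistence + life_form, so 12 means shrubs in a
    permanently flooded area. Assumes single-digit codes (1-9) in both maps.
    Pixels that are NODATA in either input stay NODATA.
    """
    p = persistence.astype(np.uint16)
    lf = life_form.astype(np.uint16)
    missing = (p == NODATA) | (lf == NODATA)
    return np.where(missing, NODATA, 10 * p + lf)


def describe(code: int) -> str:
    """Translate a composite code back into a readable label."""
    p, lf = divmod(int(code), 10)
    return f"{LIFE_FORM[lf]} ({PERSISTENCE[p]})"


persistence = np.array([[1, 2, 0]])
life_form = np.array([[2, 3, 1]])
composite = combine_levels(persistence, life_form)
print(composite)               # [[12 23  0]]
print(describe(composite[0, 0]))  # shrubs (permanently flooded)
```

A user who only needs a subset of classes could then reclassify the composite codes, for example by grouping all flooded tree classes, without touching the underlying level maps.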
Maps in the GALC database can also be integrated with external datasets. One important application is global land change monitoring. For instance, integrating the Level-1 map with land or vegetation change datasets (e.g., the Global Forest Watch datasets [53]), or combining the Level-2 and Level-3 maps with water change products (e.g., [54]), allows changes in aquatic areas to be monitored. Such information is valuable for evaluating land disturbance and vegetation regeneration dynamics in aquatic ecosystems.
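At its simplest, integrating the Level-1 map with a change dataset is a mask overlay followed by an area count. This sketch assumes the change layer (e.g., a forest loss product) has already been resampled to the same 100 m grid, and that pixels have constant area, as on an equal-area projection. Both are simplifications; on a geographic grid, pixel area varies with latitude, and the resampling method itself affects the result.

```python
import numpy as np


def change_in_aquatic_areas(aquatic: np.ndarray, change: np.ndarray,
                            pixel_size_m: float = 100.0):
    """Area (ha) of change inside the aquatic extent, and its share of all change.

    aquatic: boolean Level-1 aquatic mask.
    change: boolean change mask (e.g. vegetation loss) on the same grid (assumed).
    pixel_size_m: side length of square, equal-area pixels (assumed).
    """
    pixel_area_ha = pixel_size_m ** 2 / 10_000.0   # 100 m pixel -> 1 ha
    overlap = aquatic & change
    area_ha = float(overlap.sum() * pixel_area_ha)
    total = int(change.sum())
    share = overlap.sum() / total if total else float("nan")
    return area_ha, float(share)


aquatic = np.array([[True, True, False], [False, True, False]])
change = np.array([[True, False, True], [False, True, False]])
area_ha, share = change_in_aquatic_areas(aquatic, change)
print(area_ha, round(share, 3))  # 2.0 0.667
```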
Evolving EO opportunities for more accurate and continuous GALC mapping make it possible to update and enrich the database routinely (e.g., annually). Sustainable Development Goal 6 [55] places emphasis on protecting and restoring water-related ecosystems, and a comprehensive and continuous GALC database would contribute to the implementation of this goal.

Conclusions
With the aim of assessing the integration of current global datasets for comprehensive and user-oriented GALC mapping, this study created a prototype database for 2015 that includes six maps at three levels with 100 m resolution. The combination of existing datasets tends to overestimate the general extent of aquatic land cover. At Level-2, the persistence of water map is good at characterizing permanently flooded areas. However, it is weak in waterlogged areas without evident surface flooding and in temporarily flooded areas with greater water variations. The presence of vegetation map tends to underestimate vegetated aquatic land cover while overestimating non-vegetated aquatic land cover. Natural aquatic types are sufficiently mapped, whereas artificial aquatic lands (i.e., reservoirs and paddy fields) are poorly represented. Current datasets cannot accurately characterize the detailed life form types (Level-3), such as trees and shrubs, for aquatic land cover. Although the integrated maps have relatively low accuracies, the prototype GALC database is flexible for deriving multiple user-required maps and has important implications for aquatic ecosystem management and land change monitoring in aquatic areas. The availability of, and easier access to, data with high spatial and temporal resolution, together with new satellite missions and aquatic land cover classification methods, provide opportunities to address the limitations of current GALC characterization. This work provides insights for next-generation GALC mapping and helps future map users and producers avoid some of the limitations of current global datasets.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/rs13194012/s1, Table S1: GDAL commands used in the map generation, Table S2: Count-based confusion matrix for the integrated Level-1 map, Table S3: Count-based confusion matrix for the Level-2 persistence of water map, Table S4: Count-based confusion matrix for the Level-2 presence of vegetation map, Table S5: Count-based confusion matrix for the Level-2 artificiality of cover map, Table S6: Count-based confusion matrix for the Level-3 maps.

Data Availability Statement:
The datasets used in this study are available from online repositories, with access links provided in Table 1. The maps produced in this study can be accessed from https://figshare.com/s/e06ad06fdc79e4aa43de (accessed on 2 September 2021).