Construction of High Spatial-Temporal Water Body Dataset in China Based on Sentinel-1 Archives and GEE

Surface water is the most important resource and environmental factor for maintaining human survival and ecosystem stability; therefore, timely and accurate information on surface water dynamics is urgently needed. However, existing water datasets fall short of the needs of various organizations and disciplines because of the limitations of optical sensors in dynamic water mapping. The cloud-based Google Earth Engine (GEE) platform and freely shared Sentinel-1 imagery make it possible to map surface water dynamics at high spatial-temporal resolution over large areas. This study first establishes a water extraction method for Sentinel-1 Synthetic Aperture Radar (SAR) data based on statistics from a large number of samples of different land-cover types. Using Sentinel-1 data from 2016 to 2018, it then develops an unprecedented high spatial-temporal water body dataset in China (HSWDC) with monthly temporal and 10-m spatial resolution. The HSWDC is validated with 14,070 random samples across China and achieves high classification accuracy (overall accuracy = 0.93, kappa coefficient = 0.86). The HSWDC is highly consistent with the Global Surface Water Explorer dataset and with water levels from satellite altimetry. In addition to performing well in detecting frozen and small water bodies, the HSWDC can, owing to its high spatial-temporal resolution, also distinguish various water cover/use types. The HSWDC can provide more detailed information on surface water bodies in China and has good potential for developing high-resolution wetland maps.


Introduction
Surface water, the most important terrestrial resource, is undergoing spatial and temporal changes throughout the world caused by many factors, such as land-use/cover change, climate change, seasonal variation, and environmental change [1]. Quantifying the spatiotemporal dynamics of surface water resources provides decision makers with information for designing feasible wetland restoration and management strategies and for evaluating their effects [2].
Many water body databases have been developed in previous studies. The Shuttle Radar Topography Mission (SRTM) water body data are available at latitudes from 56° S to 60° N with a resolution of 30 m [3]. Verpoorter et al. [4] produced a global water body database with a 14.25-m spatial resolution. Feng et al. [5] developed the Global Land Cover Facility inland surface water dataset at 30-m resolution for circa 2000. Carroll et al. [1] produced a global water mask based on MODIS archives. These studies produced static water maps that meet the needs of some applications, but the demand for information on the spatial and temporal changes of inland water bodies and their long-term evolution is still growing [6]. Therefore, many authors have attempted to map the dynamic changes of water bodies [7][8][9][10][11][12][13]. These datasets provide information on water body extent at daily to monthly intervals but cover limited geographic areas. Pekel et al. [11] produced an excellent Global Surface Water Explorer (GSWE) dataset with 30-m spatial resolution and a monthly time interval. It is derived from the entire multi-temporal orthorectified Landsat 5, 7, and 8 archive spanning the past 32 years and shows the spatial and temporal variability of global surface water and its long-term changes. However, because of the limitations of optical data, these efforts to monitor water body dynamics still need improvement. For example, GSWE still cannot provide regular, periodic monitoring of water body dynamics, and dynamic water datasets produced by time-series interpolation are affected by the interpolation model itself.
Synthetic Aperture Radar (SAR) data are unaffected by clouds, so they can be used to monitor surface water regularly [14,15]. However, because of the difficulty of acquiring SAR data at large scales and the complexity of processing them, past water body monitoring and mapping with SAR data has mainly focused on individual water bodies or small areas [16][17][18]. The free distribution of Sentinel-1 data and the release of the Google Earth Engine (GEE) platform now make it possible to monitor and map water bodies dynamically over large areas. At the same time, Sentinel-1 provides data at 10-m resolution, which allows more accurate water maps that include many small water bodies missed by the surface water products mentioned above. Furthermore, hydrological inputs and outputs influence soil biochemistry, and flooding characteristics such as the duration, spatial extent, and timing of high and low water drive plant germination and growth [19,20]. Quantifying long-term spatiotemporal variability and changes in hydroperiod is therefore fundamentally important for wetland management and restoration, which makes high-precision information on surface water dynamics valuable.
To explore the potential of Sentinel-1 for large-scale water body mapping and to obtain an unprecedented high spatial-temporal water body dataset, this study (1) establishes a large-scale water classification method based on time series of Sentinel-1 data, (2) extracts China's surface water bodies at monthly temporal and 10-m spatial resolution from 2016 to 2018 on the GEE platform and evaluates their accuracy, and (3) compares the resulting dataset with existing water products to assess their spatial and temporal differences.

Data Sources and Availability
Sentinel-1 is the first of the Copernicus Programme satellite constellations developed by the European Space Agency. The mission consists of two satellites, Sentinel-1A and Sentinel-1B, each carrying a C-band (~5.5 cm wavelength) SAR instrument that offers data products in single (HH or VV) or dual (HH + VH or VV + VH) polarization [16]. A total of 73,128 ground range detected (GRD) images from 2016 to 2018 cover all of mainland China (Figure 1). The number of observations at a given location ranges from 17 to 128 in 2018, which can support monthly water mapping of China. The auxiliary data include Landsat Operational Land Imager (OLI) images, the SRTM DEM, and the GSWE. The water level dataset, the GSWE, and the time series of the inland surface water dataset in China (ISWDC) are used for cross-comparison [9,11,21]. Table 1 summarizes all of the datasets and tools used in this paper.
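An annual count of 17 or more acquisitions does not by itself guarantee that every month has at least one acquisition. The sketch below, in plain Python, counts acquisitions per calendar month at one location and reports whether all twelve months are covered. The input list of dates and the function names are hypothetical; this is an illustrative check, not the procedure used to derive the 17 to 128 range.

```python
from collections import Counter
from datetime import date, timedelta


def monthly_coverage(acquisition_dates, year):
    """Number of acquisitions in each calendar month (1-12) of `year`."""
    counts = Counter(d.month for d in acquisition_dates if d.year == year)
    return {month: counts.get(month, 0) for month in range(1, 13)}


def supports_monthly_mapping(acquisition_dates, year):
    """True only if every month of `year` has at least one acquisition."""
    return all(n > 0 for n in monthly_coverage(acquisition_dates, year).values())


# Hypothetical acquisition dates for a single location.
regular = [date(2018, 1, 1) + timedelta(days=12 * i) for i in range(31)]    # 12-day revisit
clustered = [date(2018, 1, 1) + timedelta(days=10 * i) for i in range(17)]  # 17 scenes, Jan to Jun only
```

With a regular 12-day revisit every month is covered, whereas 17 scenes bunched into the first half of the year leave July to December empty.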

Methods
The main technical route of this study includes four major parts (Figure 2): (1) preliminary extraction of water bodies from the Sentinel-1 archive, (2) acquisition of the auxiliary water masks, (3) post-processing of the preliminary water extraction, and (4) accuracy validation of the high spatial-temporal water body dataset in China (HSWDC).

Preliminary Extraction of Water Body
In the first step, we generate monthly mean composites from the Sentinel-1 images acquired in each month. The Sentinel-1 data covering China include two polarization modes, VV and VH, and previous research indicates that water requires a different threshold in each polarization [16]. To verify the feasibility of the threshold method, we divide land cover into seven types with reference to the Copernicus Global Land Cover Layers-Collection 2 and high-resolution Google Earth imagery: buildings, snow, forests, crops, sand dunes, grassland, and water bodies [22]. Water bodies are further divided into freshwater lakes, saltwater lakes with high mineral content, reservoirs with complex geometry, rivers, water in high mountains, and periodically frozen water bodies. Sample point locations are generated from the Copernicus Global Land Cover Layers-Collection 2 by stratified random sampling and are then visually confirmed one by one against high-resolution Google Earth imagery. This yields a total of 550 training sample locations for these 11 land-cover types across China. At each sample location, the set of backscatter coefficients of the corresponding land-cover type is extracted from time series of Sentinel-1 VH and VV images. We select 6866 backscatter coefficient samples from VH images in 2018 and 6820 from VV images in 2017, and calculate the median, mean, upper and lower quartiles, and 1.5 times the interquartile range (IQR) of these backscatter coefficients (Figure 3). The results show that all water body subtypes have similar backscatter values and that all non-water land-cover types except sand dunes are well separated from water bodies.
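The box-plot statistics described above (median, mean, quartiles, and whiskers at 1.5 times the IQR) can be sketched with NumPy as follows. The sample values are made up for illustration, and the `whiskers_disjoint` separability test is a hypothetical criterion; the study judged separability from the plots in Figure 3, not from this rule.

```python
import numpy as np


def box_stats(samples_db):
    """Box-plot statistics for one class's backscatter samples (dB)."""
    x = np.asarray(samples_db, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Tukey convention: whiskers end at the most extreme samples inside the fences.
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "median": median, "mean": x.mean(), "q1": q1, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
    }


def whiskers_disjoint(class_a_db, class_b_db):
    """Hypothetical separability criterion: the whisker ranges do not overlap."""
    a, b = box_stats(class_a_db), box_stats(class_b_db)
    return a["whisker_high"] < b["whisker_low"] or b["whisker_high"] < a["whisker_low"]


# Illustrative (made-up) VH samples in dB, not values from the study.
water_vh = [-25.0, -24.0, -23.0, -22.0, -21.0]
built_vh = [-8.0, -7.0, -6.0, -5.0, -4.0]
dune_vh = [-23.0, -22.0, -20.0, -18.0, -16.0]
```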
Accordingly, pixels with a backscatter coefficient not greater than −15 dB on the VV image, and pixels with a backscatter coefficient not greater than −23 dB on the VH image, are regarded as water. Similar thresholds have been used in previous studies [16][23][24][25][26]. The same thresholds also detect water bodies that are periodically frozen in winter. Implemented on GEE with the Sentinel-1 archive, this thresholding approach is operational and computationally efficient for large-scale water mapping.
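A minimal NumPy sketch of the first step is given below: a per-pixel monthly mean composite followed by the −15 dB (VV) and −23 dB (VH) thresholds. Two points are assumptions not stated in the text: the composite is taken directly over dB values, and the VV and VH results are merged with a logical OR by default (an `"and"` option is included because the merge rule is not specified). On GEE, the same two steps would correspond to `ImageCollection.mean()` and `Image.lte()`.

```python
import numpy as np

VV_WATER_MAX_DB = -15.0  # water if VV backscatter <= -15 dB
VH_WATER_MAX_DB = -23.0  # water if VH backscatter <= -23 dB


def monthly_mean_composite(stack_db):
    """Per-pixel mean over a (scenes, rows, cols) stack of one month's images.

    NaN marks pixels not covered by a scene; they are ignored.
    """
    return np.nanmean(np.asarray(stack_db, dtype=float), axis=0)


def preliminary_water(vv_db, vh_db, combine="or"):
    """Threshold VV and VH composites and merge them (merge rule is an assumption)."""
    vv_water = np.asarray(vv_db) <= VV_WATER_MAX_DB  # NaN compares False
    vh_water = np.asarray(vh_db) <= VH_WATER_MAX_DB
    if combine == "or":
        return vv_water | vh_water
    if combine == "and":
        return vv_water & vh_water
    raise ValueError("combine must be 'or' or 'and'")


vv_stack = np.array([[[-20.0, -10.0], [np.nan, -16.0]],
                     [[-18.0, -12.0], [-14.0, np.nan]]])
vv = monthly_mean_composite(vv_stack)  # [[-19, -11], [-14, -16]]
vh = np.array([[-25.0, -20.0], [-24.0, -22.0]])
```

Because NaN comparisons evaluate to False, the OR merge still classifies a pixel that has valid data in only one polarization.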

Development of Different Water Masks
The main weakness of this approach is the confusion of sand dunes, snow, and terrain shadows with water. We therefore build three water masks, namely Water Mask-slope, Water Mask-OLI, and Max Water-GSWE, to refine the preliminary water extent produced in the first step.
Some studies have shown that combining ascending and descending SAR scenes reduces some of the errors caused by radar shadow or layover but does not eliminate them completely [17]. Optical water detection also suffers from terrain shadows, and many studies have used slope data to address this problem [6,11,27,28]. We derive a slope dataset from the SRTM DEM by calculating, for each grid cell, the maximum rate of elevation change to its eight neighboring cells. We then apply a threshold of 3 degrees to exclude steep locations where water is unlikely to exist: the pixels with slopes less than 3 degrees form Water Mask-slope.
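The slope rule described above (maximum elevation change rate from each cell to its eight neighbors, then slope below 3 degrees) can be sketched in NumPy as below. The cell size argument and edge handling are assumptions: border cells are padded with their own elevation, so neighbors outside the DEM contribute zero change. GIS tools such as GEE's terrain functions may use a different slope formula.

```python
import numpy as np


def max_neighbor_slope_deg(dem, cell_size):
    """Slope (degrees) from the maximum elevation change rate to the 8 neighbors."""
    dem = np.asarray(dem, dtype=float)
    padded = np.pad(dem, 1, mode="edge")  # outside cells copy the edge elevation
    rows, cols = dem.shape
    max_rate = np.zeros_like(dem)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbor = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            distance = cell_size * np.hypot(dr, dc)  # diagonal neighbors are farther away
            max_rate = np.maximum(max_rate, np.abs(neighbor - dem) / distance)
    return np.degrees(np.arctan(max_rate))


def slope_mask(dem, cell_size, max_slope_deg=3.0):
    """Water Mask-slope: True where slope is below the threshold."""
    return max_neighbor_slope_deg(dem, cell_size) < max_slope_deg
```

For example, a ramp rising 30 m per 30-m cell yields 45 degrees everywhere and is fully excluded, while a ramp rising 1 m per 30-m cell (about 1.9 degrees) is kept.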
Second, a limitation of the threshold method itself is that sand dunes dominated by dry sand are confused with water in the Sentinel-1 extraction. The normalized difference water index (NDWI) can distinguish sand dunes from water, so we use NDWI calculated from Landsat OLI images to generate Water Mask-OLI. For every Landsat OLI scene acquired within a year, cloud pixels are excluded using the scene's own quality band, and a mean composite of the remaining scenes produces an annual cloud-free Landsat OLI image; each pixel value is thus the average of the original clear observations. Several NDWI formulations have been proposed [29][30][31], and we compared them experimentally. The results show that the formulation of McFeeters best delineates water bodies in Landsat OLI imagery; because this comparison is not the main purpose of this study, the detailed experimental results are not reported here. Accordingly, the NDWI in this article is calculated from the near-infrared (0.85-0.88 μm) and green (0.53-0.59 μm) bands of the annual cloud-free Landsat OLI image. Using the water samples described in Section 3.1, we set the NDWI threshold to −0.06: pixels with an NDWI greater than −0.06 are identified as water, and pixels with an NDWI not greater than −0.06 as non-water.
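The annual clear-sky mean composite and the McFeeters NDWI, (green − NIR) / (green + NIR), with the −0.06 threshold can be sketched in NumPy as follows. Decoding the Landsat quality band into a boolean cloud flag is assumed to have been done already and is not shown; the array shapes and function names are illustrative.

```python
import numpy as np

NDWI_WATER_MIN = -0.06  # water if NDWI > -0.06


def annual_clear_mean(band_stack, cloudy):
    """Mean reflectance per pixel over the scenes in which it is cloud-free.

    band_stack: (scenes, rows, cols) reflectance of one band.
    cloudy: same-shape boolean array, True where the QA band flags cloud.
    Pixels cloudy in every scene are returned as NaN.
    """
    clear = ~np.asarray(cloudy, dtype=bool)
    total = np.where(clear, band_stack, 0.0).sum(axis=0)
    n_clear = clear.sum(axis=0)
    return np.divide(total, n_clear, out=np.full(total.shape, np.nan), where=n_clear > 0)


def ndwi(green, nir):
    """McFeeters NDWI = (green - nir) / (green + nir); NaN where undefined."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = green + nir
    return np.divide(green - nir, denom, out=np.full(denom.shape, np.nan), where=denom != 0)


def ndwi_water(green, nir):
    """Pixels above the NDWI threshold; NaN compares False and is treated as non-water."""
    return ndwi(green, nir) > NDWI_WATER_MIN
```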
The water mask obtained in this way still contains snow pixels. A previous study has shown that snow has a higher near-infrared reflectance than water [32], which allows snow pixels to be excluded from the mask. Using the snow samples described in Section 3.1, we determine a snow threshold of 0.17 on the near-infrared band of the annual cloud-free Landsat OLI image, and remove from the water mask those pixels with an NDWI greater than −0.06 and a near-infrared surface reflectance greater than 0.17. To relax the constraint that a static annual mask imposes on the monthly water extent, the annual water mask is then buffered 10 m outward by morphological dilation [33,34] to obtain Water Mask-OLI.
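The snow removal and the 10-m outward buffer can be sketched as below. The sketch assumes the mask is already on the 10-m grid, so a 10-m buffer equals a one-pixel dilation, and it uses a square 3 × 3 window; the structuring element actually used in the study is not stated. The dilation is written in pure NumPy to keep the example self-contained.

```python
import numpy as np

NIR_SNOW_MIN = 0.17  # NIR surface reflectance above this is treated as snow


def remove_snow(water_mask, nir):
    """Drop NDWI-water pixels whose NIR reflectance exceeds the snow threshold."""
    snow = np.asarray(nir, dtype=float) > NIR_SNOW_MIN
    return np.asarray(water_mask, dtype=bool) & ~snow


def dilate(mask, radius_px=1):
    """Binary dilation with a square (2 * radius_px + 1) window."""
    m = np.asarray(mask, dtype=bool)
    r = radius_px
    padded = np.pad(m, r, mode="constant", constant_values=False)
    rows, cols = m.shape
    out = np.zeros_like(m)
    for dr in range(-r, r + 1):
        for dc in range(-r, r + 1):
            # A pixel becomes True if any pixel within the window is True.
            out |= padded[r + dr:r + dr + rows, r + dc:r + dc + cols]
    return out
```

Under these assumptions, Water Mask-OLI would be `dilate(remove_snow(ndwi_mask, nir), radius_px=1)`, where `ndwi_mask` and `nir` are hypothetical 10-m arrays.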
The GSWE was first released by Pekel et al. [11] in 2016, covering global surface water dynamics from 1984 to 2015, and was later updated to 2018. The maximum water extent layer of the GSWE contains every location where water was ever detected during 1984-2018, and this extent is much larger than the preliminary water extent from the first step. We use this maximum water extent as Max Water-GSWE, and we resample all of the auxiliary masks to 10 m.
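The two operations in this paragraph can be illustrated in NumPy. If one started from a stack of monthly water layers, the maximum extent is the per-pixel logical OR over time; resampling a 30-m mask to 10 m by nearest neighbor repeats each cell as a 3 × 3 block. The sketch assumes the 30-m and 10-m grids are aligned with an integer ratio; real data would normally need reprojection, and the study used the ready-made GSWE maximum extent layer rather than this computation.

```python
import numpy as np


def max_extent(monthly_water):
    """Water detected in any month of a (months, rows, cols) stack."""
    return np.any(np.asarray(monthly_water, dtype=bool), axis=0)


def upsample_nearest(mask, factor):
    """Nearest-neighbor upsampling by an integer factor (30 m to 10 m is factor 3)."""
    m = np.asarray(mask)
    return np.repeat(np.repeat(m, factor, axis=0), factor, axis=1)
```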

Post-Processing Preliminary Water Extraction
Visual inspection of the classification results shows that the confusion between sand dunes and water occurs mainly in western China. We therefore divide China into two parts (Figure 1): the east (Beijing, Tianjin, Hebei, Liaoning, Shanghai, Zhejiang, Fujian, Shandong, Guangdong, Taiwan, Henan, Jiangsu, Anhui, Hubei, Hunan, Jiangxi, and Hainan) and the west (all other provinces). In the west, these commission errors are corrected with Water Mask-slope and Water Mask-OLI: we overlay Water Mask-slope, Water Mask-OLI, and the preliminary monthly water extent from the first step, and only the pixels identified as water in all three layers form the final water map for the west.
In the east, terrain shadows are also misclassified as water. There, Max Water-GSWE is overlaid on the preliminary water extent from the first step, and the pixels marked as water in both layers form the final water map for the east. Finally, the monthly water map of China is produced by mosaicking the east and west water maps.
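The region-specific overlay and mosaicking rules above reduce to per-pixel logical operations. In the sketch below, all inputs are assumed to be same-shape boolean arrays on the 10-m grid, and `is_west` is a hypothetical raster of the east/west division.

```python
import numpy as np


def post_process(prelim, slope_ok, oli_ok, gswe_max, is_west):
    """Combine the preliminary monthly water map with region-specific masks.

    West: water only where preliminary, Water Mask-slope and Water Mask-OLI agree.
    East: water only where preliminary and Max Water-GSWE agree.
    """
    prelim, slope_ok, oli_ok, gswe_max, is_west = (
        np.asarray(a, dtype=bool) for a in (prelim, slope_ok, oli_ok, gswe_max, is_west)
    )
    west_map = prelim & slope_ok & oli_ok
    east_map = prelim & gswe_max
    return np.where(is_west, west_map, east_map)  # mosaic the two regions
```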

Validation of the High Spatial-Temporal Water Body Dataset in China (HSWDC)
The GSWE of Pekel et al. [11], first released in 2016 with global surface water dynamics from 1984 to 2015, was later updated to 2018. Based on the GSWE monthly water products for 2018, we stratify China into water and land and use stratified random sampling to generate a set of points for each month; each sample point is labeled with its month and is either a water sample or a land sample. We initially generate a total of 14,523 sample points from the 12 GSWE monthly water products. Because the GSWE itself contains misclassification errors, mainly from mountain shadows, saline-alkali land, ridges, and roofs (Figure 4a), we visually check the samples one by one against high-resolution Google Earth imagery and remove the points that are clearly not water. In the end, 14,070 samples remain, including 6554 water samples and 7516 land samples (Figure 4b). To avoid errors caused by temporal inconsistency, the HSWDC is validated month by month using the validation samples from the same month, which gives the numbers of correctly and incorrectly classified water and land samples for each month. The confusion matrix is then calculated from the sums of these counts over 2018. The overall accuracy of the HSWDC is 0.93, the kappa coefficient is 0.86, the omission error is 0.14, and the commission error is 0.01.
Additionally, the spatial distribution of surface water can be depicted clearly using the average water area, calculated from the maximum and minimum water body areas in 2018 (Figure 5b). The results show that China's surface water is mainly distributed in the Continental Basin and the Yangtze River Basin, which account for 34.94% and 26% of the total surface water area, respectively. They are followed by the Huaihe River Basin (8.97%), the Songhua and Liaohe River Basin (8.68%), and the Pearl River Basin (6.25%). The Yellow River Basin, Southwest Basin, Haihe River Basin, and Southeast Basin together account for the remaining 15.16% of the national surface water area. Furthermore, the water area of the Continental Basin, Songhua and Liaohe River Basin, Yellow River Basin, Southwest Basin, and Haihe River Basin showed an increasing trend from 2016 to 2018, while the other basins showed a decreasing trend, and the total water area increased slightly over the period (Figure 5).
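The validation procedure, with per-month tallies summed into a single 2 × 2 confusion matrix, and the reported overall accuracy, kappa coefficient, omission error, and commission error can be sketched as follows. The sketch assumes that the omission and commission errors refer to the water class; the counts in the example are made up for illustration and are not the study's confusion matrix.

```python
import numpy as np


def confusion_from_months(monthly_pairs):
    """Sum 2x2 confusion matrices over months.

    monthly_pairs: iterable of (reference, predicted) boolean arrays, True = water,
    for the validation samples of one month.
    Rows: reference [water, land]; columns: predicted [water, land].
    """
    cm = np.zeros((2, 2), dtype=int)
    for ref, pred in monthly_pairs:
        ref, pred = np.asarray(ref, dtype=bool), np.asarray(pred, dtype=bool)
        cm[0, 0] += np.sum(ref & pred)    # water mapped as water
        cm[0, 1] += np.sum(ref & ~pred)   # water mapped as land
        cm[1, 0] += np.sum(~ref & pred)   # land mapped as water
        cm[1, 1] += np.sum(~ref & ~pred)  # land mapped as land
    return cm


def accuracy_metrics(cm):
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    oa = np.trace(cm) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n ** 2
    return {
        "overall_accuracy": oa,
        "kappa": (oa - pe) / (1 - pe),
        "omission_water": cm[0, 1] / cm[0].sum(),       # share of water samples missed
        "commission_water": cm[1, 0] / cm[:, 0].sum(),  # share of mapped water that is land
    }
```

For the hypothetical matrix [[90, 10], [5, 95]], this gives an overall accuracy of 0.925, a kappa of 0.85, a water omission error of 0.10, and a water commission error of about 0.053.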

Comparisons with Existing Datasets
The GSWE dataset, which is based on long-term Landsat TM, ETM+, and OLI images, provides global monthly surface water extent from 1984 to 2018. However, because of cloud cover and the long revisit period (16 days) of the Landsat satellites, GSWE cannot actually provide monthly water dynamics across most of China, especially southern China, which has a humid subtropical monsoon climate with frequent cloudy weather. Figure 6 shows the monthly water dynamics of Poyang Lake, in southern China, in 2018 from the GSWE dataset and the HSWDC. As expected, GSWE provides no surface water maps for January, June, and December 2018 because of these Landsat limitations. Furthermore, in some months, such as May, September, October, and November, GSWE captures only part of the actual water surface.
Taking the GSWE for August 2018 as the reference, the water surfaces extracted by HSWDC and GSWE are highly consistent (gray in Figure 7). Owing to the higher spatial resolution of Sentinel-1, HSWDC also captures more narrow rivers and small ponds (orange in Figure 7). However, limitations of the SAR imagery itself cause some water surfaces to be missed in HSWDC (blue in Figure 7). Another possible reason for the inconsistency between the two datasets is that the two satellites observe on different dates within the same month; more frequent observation of water bodies may address this issue in the future.
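The three categories of the comparison above (water in both datasets, water only in HSWDC, water only in GSWE) amount to a per-pixel agreement map. The sketch assumes both maps have already been brought onto the same 10-m grid; the integer codes are arbitrary choices, not those of the original figure.

```python
import numpy as np

NEITHER, BOTH, HSWDC_ONLY, GSWE_ONLY = 0, 1, 2, 3  # arbitrary category codes


def agreement_map(hswdc_water, gswe_water):
    """Per-pixel agreement between two boolean water maps on the same grid."""
    h = np.asarray(hswdc_water, dtype=bool)
    g = np.asarray(gswe_water, dtype=bool)
    out = np.full(h.shape, NEITHER, dtype=np.uint8)
    out[h & g] = BOTH          # gray in the comparison figure
    out[h & ~g] = HSWDC_ONLY   # orange: e.g. narrow rivers, small ponds
    out[~h & g] = GSWE_ONLY    # blue: water missed by HSWDC
    return out
```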
Remote Sens. 2020, 9, x FOR PEER REVIEW 9 of 15
Taking Dongting Lake as an example, we also analyze the temporal consistency among the HSWDC, the GSWE, and the lake water level from 9 January 2016 to 25 December 2018 at a 10-day temporal resolution. The water level from satellite altimetry is derived from Envisat, ERS-2, Jason-1, Jason-2, TOPEX/Poseidon, and SARAL/AltiKa, and its root mean square difference from in situ data ranges from 4 to 36 cm [21]. Li et al. [35] also used this water level dataset for comparison with the water body dataset they produced. The water area detected by HSWDC is larger than that detected by GSWE because of the higher spatial resolution of HSWDC: as water storage decreases, some small water bodies shrink below the size that GSWE can detect. The difference between the water surfaces detected by HSWDC and GSWE is therefore more pronounced in winter, when water storage is low. Because water area and water level are different quantities, only their trends can be compared (Figure 8). The HSWDC and the water level show highly consistent seasonal variation (Pearson correlation coefficient of 0.92), whereas the consistency between GSWE and the water level is weaker (Pearson correlation coefficient of 0.84) (Figure 8). These results show that the HSWDC can effectively reflect the dynamics of the water body.
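The Pearson correlation coefficient used above to compare the water area series with the water level series can be computed as follows. The sketch assumes the two series are already aligned on the same 10-day time steps, and it drops any step that is missing (NaN) in either series, as happens for GSWE in cloudy months.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation of two aligned series, ignoring steps missing in either."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(y))  # keep only time steps valid in both series
    x, y = x[ok], y[ok]
    xd, yd = x - x.mean(), y - y.mean()
    return float(np.sum(xd * yd) / np.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2)))
```

On complete series this matches `np.corrcoef(x, y)[0, 1]`; the explicit NaN filter is the reason for writing it out.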

Open Water Wetland Classification Based on HSWDC
The short revisit period of Sentinel-1 sensor has a good advantage in open water wetland classification. For example, the inundated duration is important for many wetland-dependent animals and plants, and even the greenhouse gas sequestration and emissions of wetlands. We can use the HSWDC data (water occurrence) to further classify waters into permanent and seasonal water types, which is key to the above ecological and environmental issues. According to the classification system of Ramsar Wetland Convention, seasonal swamps show characteristics of water during the flood period during one year, mud flats are submerged during the rainy season and exposed during the dry season, and permanent water bodies are regions that always show the characteristics of the water body in a year. By referencing the above wetland definitions and Xu et al. [36], we can identify permanent water body, mudflats, seasonal marshes and rice fields in Dongting Lake by using the water occurrence information of the HSWDC (Figure 9a). Results show that the wetland classification results by water occurrence information are consistent with the classification results of Chen et al. [37]. The difference between rice fields in Figure 9a and agricultural land in Figure 9b is that the agricultural land defined by Chen et al. includes dry crop land. One challenge in wetland mapping is water dynamics; therefore, the HSWDC could be applied to future wetland mapping to improve the accuracy of wetland classifications.

Identification of Frozen Water Body
Existing water datasets derived from optical images can omit many water bodies during the ice-covered period, mainly because optical sensors have difficulty distinguishing water from ice and snow. Open water and frozen water, however, have similar backscattering coefficients in Sentinel-1 images (Figure 2). Consequently, SAR-based water mapping can overcome this limitation of optical images and map frozen water. Selinco Lake on the Tibetan Plateau, for example, begins to freeze in December, does not freeze completely until the end of January, begins to thaw in March of the following year, and thaws fully in April [38]. As shown in Figure 10, compared with the ISWDC, the HSWDC maps the complete water surface even during the freezing months. This advantage of SAR-based water mapping enables the HSWDC to extract a more complete surface water extent for China's water bodies.
Remote Sens. 2020, 9, x FOR PEER REVIEW 11 of 15

Uncertainty of This Study
This paper proposes a threshold method based on Sentinel-1 SAR imagery to develop an unprecedented high spatiotemporal resolution water body dataset. We use 14,070 samples, mainly derived from stratified random sampling, to verify the accuracy of HSWDC. Because relatively few of these samples may fall on water body edges and small water bodies, the actual error may be higher than the value we report. The error comes mainly from two sources: limitations of the Sentinel-1 imagery itself and errors in the auxiliary data.
The C-band signal of the Sentinel-1 sensor makes open water detection relatively simple, because almost no signal returns to the antenna from a smooth water surface. For water beneath vegetation, the radar signal is usually attenuated when the water level is high or canopy transmission is low, whereas double-bounce scattering may occur when the water level is low relative to the vegetation [16]. Some studies have tried to detect water beneath a pronounced vegetation canopy [26,39,40], but most of them require the relative heights of the vegetation and water surface and the distribution of vegetation leaves as model inputs. These parameters must be measured in the field, so existing studies are largely limited to small areas. The complex scattering between vegetation and water is difficult to model uniformly over a large region, and water beneath a significant vegetation canopy contributes very little to China's total surface water, so our method ignores it. In addition, during radar scanning, the sub-spherical waves scattered within each resolution cell may reinforce or cancel one another, and their coherent summation produces random variations in intensity. This phenomenon appears as speckle noise in SAR images. Previous studies offer two schemes for handling speckle: filtering the Sentinel-1 image before water extraction [41], or applying morphological operations after water extraction [33,34]. We tested both, and while they reduce noise, they also remove some small water bodies from the HSWDC, such as narrow rivers (Figure 11). We therefore retain the unfiltered results, and users can apply post-processing according to their specific needs.
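The trade-off described above can be illustrated with the post-extraction option. The following is a minimal sketch, not the procedure of [33,34]: it assumes a boolean water mask on the 10 m grid and a 3x3 square structuring element, and implements binary opening directly in NumPy. On a toy mask, the opening removes an isolated speckle pixel but also deletes a one-pixel-wide (about 10 m) river, while the larger lake survives.

```python
import numpy as np


def _neighbourhood(mask: np.ndarray) -> np.ndarray:
    """Stack the 3x3 neighbourhood of every pixel (False-padded at borders)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    rows, cols = mask.shape
    return np.stack([padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)])


def erode(mask: np.ndarray) -> np.ndarray:
    """Binary erosion: keep a pixel only if its whole 3x3 window is water."""
    return _neighbourhood(mask).all(axis=0)


def dilate(mask: np.ndarray) -> np.ndarray:
    """Binary dilation: a pixel becomes water if any 3x3 neighbour is water."""
    return _neighbourhood(mask).any(axis=0)


def opening(mask: np.ndarray) -> np.ndarray:
    """Morphological opening (erosion then dilation) removes isolated speckle."""
    return dilate(erode(mask))


if __name__ == "__main__":
    mask = np.zeros((9, 12), dtype=bool)
    mask[2:7, 1:6] = True   # 5x5-pixel "lake"
    mask[4, 6:12] = True    # 1-pixel-wide "river" leaving the lake
    mask[0, 10] = True      # isolated speckle pixel
    opened = opening(mask)
    print("river pixels kept:", int(opened[4, 6:].sum()))
```

Any feature narrower than the structuring element disappears under erosion and cannot be restored by the following dilation, which is why speckle suppression and narrow-river preservation pull in opposite directions.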
Visual inspection of the classification results shows that confusion between sand dunes and water occurs mainly in western China. Because the NDWI can distinguish sand dunes from water, we use NDWI calculated from Landsat OLI images to generate Water Mask-OLI. To avoid the restriction of a static mask, the annual water mask is buffered 10 m outwards using a morphological dilation to obtain Water Mask-OLI. We overlay Water Mask-OLI on our water body maps to exclude the effect of dunes in the west. This approach assumes that the dynamic change of western water body surfaces is less than 10 m, which may introduce some uncertainty in western China. In addition, the use of the SRTM DEM and the GSWE maximum water extent map may limit the water surface area of HSWDC or propagate their errors into HSWDC. In the future, a classification method that does not depend on these auxiliary data could be developed.
Sentinel-1 data in GEE have only undergone basic preprocessing such as noise removal, calibration, and geocoding. Layover and shadow are known to introduce errors into radar-based water extraction [17]. In this study, compositing ascending and descending SAR scenes reduced this error but did not fully eliminate it. A method for removing the effects of layover and shadow from radar images at a large scale is still urgently needed.
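The dune-masking step above can be sketched as follows. This is an illustration under stated assumptions, not the HSWDC production code: it assumes the annual OLI green (band 3) and near-infrared (band 5) reflectances have already been resampled to the 10 m HSWDC grid, uses the common NDWI > 0 water threshold (the threshold used in this study is not stated here), and approximates the 10 m outward buffer with a one-pixel, four-connected dilation. All function names are hypothetical.

```python
import numpy as np


def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """NDWI = (Green - NIR) / (Green + NIR); 0 where the denominator is 0."""
    green = green.astype(float)
    nir = nir.astype(float)
    denom = green + nir
    out = np.zeros_like(denom)
    np.divide(green - nir, denom, out=out, where=denom != 0)
    return out


def dilate_one_pixel(mask: np.ndarray) -> np.ndarray:
    """Grow a boolean mask by one pixel in the four cardinal directions."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:])


def apply_water_mask_oli(sar_water: np.ndarray, green: np.ndarray,
                         nir: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Keep SAR water pixels only inside the buffered NDWI water mask."""
    water_oli = ndwi(green, nir) > threshold  # assumed threshold
    # Assumption: one 10 m pixel of dilation approximates the 10 m buffer.
    water_mask_oli = dilate_one_pixel(water_oli)
    return sar_water & water_mask_oli


if __name__ == "__main__":
    green = np.full((3, 5), 0.1)
    nir = np.full((3, 5), 0.3)       # dune-like pixels: NDWI < 0
    green[1, 2], nir[1, 2] = 0.3, 0.1  # one optical water pixel: NDWI > 0
    sar_water = np.ones((3, 5), dtype=bool)  # SAR maps everything as water
    print(apply_water_mask_oli(sar_water, green, nir).astype(int))
```

Within GEE itself, the analogous operations would be `ee.Image.normalizedDifference` followed by a focal maximum filter on the thresholded mask (`focalMax` in the JavaScript API, `focal_max` in the Python API); the NumPy version only makes the logic explicit.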

Conclusions
In this study, we develop a threshold method for large-scale SAR-based water mapping based on extensive experiments with samples of different land-cover types across China. The threshold-based method, which is more universal than previous studies that mapped individual water bodies, proves robust and applicable across different seasons of the year. The Sentinel-1 images from 2016 to 2018 are then used to construct the HSWDC on the cloud-based GEE platform together with other auxiliary data. Compared with existing surface water datasets, the HSWDC offers an unprecedented spatial resolution (10 m) and monthly observation frequency. In addition, the HSWDC effectively captures surface water distribution even during the freezing period. Because SAR is not affected by clouds, the HSWDC can serve as an ideal alternative data source for future wetland mapping in cloudy regions.