Evaluating the Performance of Sentinel-1A and Sentinel-2 in Small Waterbody Mapping over Urban and Mountainous Regions

Abstract: Accurate waterbody mapping can support water-related environment monitoring and resource management. The Sentinel series satellites provide high-quality Synthetic Aperture Radar (SAR) and optical observations that are commonly used in waterbody mapping. However, owing to the 10-m spatial resolution of Sentinel data, previous studies mostly focused on the mapping of large waterbodies. In this work, we evaluated the performance of small waterbody mapping over urban and mountainous regions with two datasets: the average annual VH backscatter coefficients (VH_avg), derived from the Sentinel-1A series, and the Modified Normalized Difference Water Index (MNDWI), derived from cloud-free Sentinel-2 imagery. A proven framework of waterbody mapping based on watershed segmentation and noise reduction was employed to assess the performance of the two datasets in waterbody identification. The validation was performed by comparing their results with 1-m spatial resolution reference waterbody data. Assessment metrics, including Precision, Recall, and F-measure, were employed. The results showed that: (1) the MNDWI outperformed the VH_avg by 9 percentage points in F-measure; (2) the VH_avg results had more room to improve in accuracy when combined with noise reduction, i.e., noise reduction improved the Precision by 4 percentage points at a cost of 1 percentage point in Recall; and (3) the potential smallest identifiable waterbody area (Recall rate larger than 0.8) was larger than 10⁴ m². Future work will evaluate the waterbody mapping methods using 0.8-m high-resolution GF-2 data in Guangdong Province, China.


Introduction
Surface waterbodies, including rivers, channels, ponds, lakes, and reservoirs, play an essential role in socioeconomic development and ecosystem balance, and provide irreplaceable natural resources for humans' survival and development [1,2]. The waterbodies in Guangzhou City, an important central city in China with a permanent population of 15.31 million at the end of 2019, are under intensive pressure because of human activities (e.g., housing development, dredging, and agricultural pollution) or natural phenomena (e.g., extreme weather). To better understand waterbody changes and to allow comprehensive water-related studies and planning activities, such as water resource management and water disaster prevention, it is critical to conduct waterbody mapping [3].
Remote sensing is a powerful tool for measuring and monitoring waterbody dynamics. In the past several decades, numerous satellite sensors capable of waterbody mapping were launched. The remotely sensed data can be divided into two categories based on their imaging principles. (1) Optical remote sensing is a passive technique that relies on [...]
(1) Cloud/cloud shadows: SAR images are free of cloud-related contamination, whereas optical images are easily affected by clouds and cloud shadows. Thus, it is necessary to select cloudless images through visual or automatic methods [18].
(2) Mountain/built-up shadows: Mountain and skyscraper shadows are caused by similar factors in optical and SAR images: an object blocks the path of direct radiation in optical imaging and of the radar beam in SAR. However, unlike optical imagery, in which objects in shadows can still be seen because of atmospheric scattering, a SAR shadow contains no information because there is no return signal. Theoretically, mountain shadows can be identified by simulating hill-shading with Digital Elevation Models (DEMs) and the solar azimuth and elevation at the time of image acquisition. In practice, however, owing to the limited DEM resolution and the influence of land cover, a DEM alone is often not enough to eliminate mountain shadows, and ancillary data are required.
(3) Flat surfaces: Noise from flat surfaces such as asphalt, cement, and roads appears mainly in SAR images rather than in optical images. Generally, rough surfaces scatter the energy and return a significant amount back to the antenna, resulting in a bright feature. In contrast, flat surfaces reflect the signal away, resulting in a dark feature. As waterbodies absorb radar energy, they also exhibit a dark feature. Flat surfaces are therefore often misidentified as waterbodies in images of SAR backscatter coefficients. However, these surfaces usually differ from waterbodies in the infrared bands of optical data.
(4) Ships: Ships have a high reflectivity or backscatter coefficient, which may cause "holes" in the waterbody extraction results.
The dense time series of Sentinel offers a unique opportunity to systematically monitor waterbodies by both optical and SAR measurements with similar features, including fine spatial resolution (approximately 10 m), frequent revisit (12 days for Sentinel-1A, 5 days for Sentinel-2 A/B together), and accessibility.
Previous studies mostly focused on the mapping of large waterbodies. It is not clear how effective Sentinel data are in small waterbody mapping over urban and mountainous regions, where multiple types of noise exist. It is necessary to analyze the waterbodies' pixel values in images to determine the capabilities and limitations of Sentinel data in waterbody identification. In this study, we defined the smallest identifiable waterbody area level (SIWAL) as the first waterbody area level at which the Recall rate reached a threshold of 0.8.
This study attempted to answer three questions: (1) What is the difference in the waterbody identification performance between Sentinel-1 and Sentinel-2 data? (2) How much improvement can some classic noise reduction methods bring to the accuracy of waterbody identification? (3) For urban and mountainous regions, how large is the smallest identifiable waterbody for Sentinel data at a 10-m resolution? The remainder of this article is organized as follows. Section 2 describes the experimental materials, a previously developed framework of waterbody mapping, and assessment metrics. Section 3 presents the analysis of the experimental results. Section 4 provides the discussion. Conclusions and future research are given in Section 5.

Study Area
The study area was Guangzhou City, China. The total area of the study region is 7434.4 km². The topography of Guangzhou is mountainous and densely covered with waterways. The terrain heights decrease from north to south. The highest peak has an elevation of 1210 m. The northeast is a moderately mountainous area; the middle is a hilly basin; and the south is a coastal floodplain, a component of the Pearl River Delta [19,20]. According to the Guangzhou Water Resources Bulletin of 2018, the total waterbody area was estimated at 744 km², which accounted for 10.05% of the city's total land area [21].
The extent of the study area is presented in Figure 1. The yellow area was chosen as the training area, and the green area was the testing area.

Data
For this study, three imagery data sources, Sentinel-1A (S1A), Sentinel-2 (S2), and SRTM DEM, were chosen for waterbody mapping and result pruning. The data and preprocessing details are described below.

Sentinel-1A Data and Pre-Processing
The S1A interferometric wide swath (IW) GRD product has a 12-day revisit period. Thus, each location is covered by 30 to 31 images per year. For the entire study area, 421 images acquired in 2017 were downloaded from the European Space Agency (ESA) Sentinels Scientific Data Hub website (https://scihub.copernicus.eu/dhus, accessed on 25 March 2021). The S1A product contains both VH and VV polarizations, but only the VH polarization was selected for waterbody mapping.
The S1A data were preprocessed using the Sentinel Application Platform (SNAP) toolbox (version 6.0.0). The preprocessing stages included: (1) radiometric calibration, with the image band exported as the sigma naught band (σ°); (2) ortho-rectification using range Doppler terrain correction with the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM); (3) re-projection to Sinusoidal projection, World Geodetic System (WGS) 84, and resampling to a spatial resolution of 10 m using the bilinear interpolation method; (4) calculation of the average composite of the annual VH σ° band time series using Equation (1); and (5) conversion of the sigma naught band from linear backscatter units to logarithmic decibels (dB) [22].

$$VH_{avg} = \mathrm{mean}(VH_{annual}) \quad (1)$$

where VH_annual is the VH σ° band time series per year.
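Stages (4) and (5) can be illustrated for a single pixel with the minimal sketch below. It assumes, following the order of the stages listed above, that the averaging is done in linear units before the conversion to dB. The function name `annual_mean_db` and the sample values are hypothetical and are not taken from the SNAP workflow.

```python
import math

def annual_mean_db(sigma0_linear):
    """Average a per-pixel sigma-naught time series in linear units,
    then convert the mean to decibels (stage 4 followed by stage 5)."""
    if not sigma0_linear:
        raise ValueError("empty time series")
    if any(v <= 0 for v in sigma0_linear):
        raise ValueError("linear backscatter must be positive")
    mean_linear = sum(sigma0_linear) / len(sigma0_linear)
    return 10.0 * math.log10(mean_linear)

# Hypothetical pixel observed twice: one dark and one bright acquisition.
vh_avg = annual_mean_db([0.001, 0.1])
```

Averaging in linear units gives bright acquisitions more weight than averaging the dB values would, which is relevant to the ship-traffic effect discussed later in the article.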

Sentinel-2 Data and Pre-Processing
The Sentinel-2 data used in this study were derived from the Sentinel-2A&B L1C product, which had already been rectified through radiometric and geometric corrections [23]. The product was downloaded via the Sentinel-2 repository on the Amazon Simple Storage Service (registry.opendata.aws/sentinel-2, accessed on 25 March 2021). Four indices were calculated from the Sentinel-2 data, namely MNDWI (Modified Normalized Difference Water Index), NDBI (Normalized Difference Built-up Index), NDVI (Normalized Difference Vegetation Index), and NDVI_max, using Equations (2) to (5):

$$MNDWI = \frac{\rho_{green} - \rho_{SWIR1}}{\rho_{green} + \rho_{SWIR1}} \quad (2)$$

$$NDBI = \frac{\rho_{SWIR1} - \rho_{NIR}}{\rho_{SWIR1} + \rho_{NIR}} \quad (3)$$

$$NDVI = \frac{\rho_{NIR} - \rho_{red}}{\rho_{NIR} + \rho_{red}} \quad (4)$$

$$NDVI_{max} = \max(NDVI_{annual}) \quad (5)$$

where ρ_green, ρ_red, ρ_NIR, and ρ_SWIR1 are the visible green, red, near-infrared, and short-wave infrared bands in S2 data, and NDVI_annual is the NDVI time series per year. The ρ_SWIR1 band, with a 20-m spatial resolution, was resampled to 10 m to match the other bands using the bilinear interpolation method.
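The four indices are normalized band differences and a yearly maximum, which the following per-pixel sketch illustrates. The function names, the sample reflectances, and the choice to return 0.0 when both bands are zero are assumptions made for illustration; they are not part of the study's processing chain.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b); 0.0 is returned for a zero denominator
    (an assumption of this sketch, not specified in the article)."""
    total = a + b
    if total == 0:
        return 0.0
    return (a - b) / total

def mndwi(green, swir1):
    return normalized_difference(green, swir1)

def ndbi(swir1, nir):
    return normalized_difference(swir1, nir)

def ndvi(nir, red):
    return normalized_difference(nir, red)

def ndvi_max(ndvi_series):
    """Maximum NDVI of one pixel over a year of observations."""
    return max(ndvi_series)

# Hypothetical reflectances of a water pixel: green is higher than SWIR1,
# so the MNDWI is positive.
water_mndwi = mndwi(green=0.08, swir1=0.02)
```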

Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM)
The topographic SLOPE data were produced from the SRTM DEM product with 1-arcsecond (nearly 30 m) spatial resolution and then resampled to 10 m to match the Sentinel data using the bilinear interpolation method.
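The article does not state which slope algorithm was used. As a minimal sketch under that caveat, slope can be derived from a gridded DEM with central differences; the function name `slope_degrees`, the edge handling, and the sample DEM are assumptions.

```python
import math

def slope_degrees(dem, cell_size):
    """Slope in degrees from a DEM given as a list of rows.
    Central differences are used; edge cells reuse the nearest valid
    neighbor, so slopes along the border are underestimated."""
    rows, cols = len(dem), len(dem[0])

    def z(r, c):
        return dem[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dzdx = (z(r, c + 1) - z(r, c - 1)) / (2.0 * cell_size)
            dzdy = (z(r + 1, c) - z(r - 1, c)) / (2.0 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out

# Hypothetical 3 x 3 DEM rising 30 m per 30-m cell from west to east.
dem = [[0.0, 30.0, 60.0] for _ in range(3)]
slope = slope_degrees(dem, cell_size=30.0)
```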

Waterbody Reference Data
The waterbody reference data (WRD) of Guangzhou City were manually delineated according to high-resolution optical images of Google Maps and Bing Maps from 2017 to 2019 (Figure 2). Ubukawa [24] evaluated the position error of the two map services in 10 cities around the world (including Shanghai, China) and found that the root mean square error (RMSE) for the satellite imagery represented in Google Maps and Bing Maps was 8.2 m and 7.9 m, respectively. This position error was less than the 10-m resolution of the Sentinel data, indicating that the reference data are suitable for validating the Sentinel-derived results. According to the reference data, there were 75,368 waterbody polygons with a total area of 656.63 km². The total area of the WRD was smaller than the estimate in the Guangzhou Water Resources Bulletin 2018 [21]. A possible reason is that some of the images used for WRD delineation were acquired in the dry season, so that some waterbodies were not captured. Each waterbody was classified into one of five types: rivers, water ditches, reservoirs, lakes, and ponds. The details are shown in Table 1.

Framework of Waterbody Mapping
For a comparison of the performance of the S1A SAR and S2 optical datasets in waterbody extraction, we applied an automatic framework of waterbody extraction that was independent of the two datasets (Figure 3). The framework included two stages: the first was the initial waterbody extraction, which was adapted from existing methods [4]; the second was noise reduction, which was designed according to the characteristics of the study area.

Initial Waterbody Extraction
Previous studies suggested that marker-controlled watershed segmentation (MCW) is useful for waterbody mapping, as it locates waterbody edges better than a single threshold separating waterbody from non-water areas [4]. A typical waterbody extraction process using MCW requires three steps: (1) Identifying the markers: among all pixels of the index (VH_avg or MNDWI), those belonging to waterbody or non-water areas with higher credibility are marked first; (2) Calculating the gradient of the index: the Sobel operator is applied to the index to produce the gradient image, which is used to determine the boundaries between the waterbody and non-water markers; and (3) Performing MCW: using the waterbody/non-water markers and the gradient of the index, the MCW expands each marker iteratively until all unmarked pixels are marked as either waterbody or non-water.
In this method, the latter two steps are relatively fixed because no parameters are involved. However, the segmentation result is relatively sensitive to the markers chosen in the first step. As a result, the parameters that control the selection of markers require tuning.
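The three steps can be sketched with a simplified priority-flood variant of marker-controlled watershed. This is a pure-Python illustration under stated assumptions, not the implementation used in this study: the function names, the 4-connectivity, the replicated image edges, and the small test image are all hypothetical.

```python
import heapq

def sobel_magnitude(img):
    """Gradient magnitude from 3 x 3 Sobel kernels (edges replicated)."""
    rows, cols = len(img), len(img[0])

    def px(r, c):
        return img[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = (px(r - 1, c + 1) + 2 * px(r, c + 1) + px(r + 1, c + 1)
                  - px(r - 1, c - 1) - 2 * px(r, c - 1) - px(r + 1, c - 1))
            gy = (px(r + 1, c - 1) + 2 * px(r + 1, c) + px(r + 1, c + 1)
                  - px(r - 1, c - 1) - 2 * px(r - 1, c) - px(r - 1, c + 1))
            grad[r][c] = (gx * gx + gy * gy) ** 0.5
    return grad

def marker_watershed(grad, markers):
    """Grow labelled markers (0 = unmarked) outward, always expanding
    from the queued pixel with the lowest gradient value."""
    rows, cols = len(grad), len(grad[0])
    labels = [row[:] for row in markers]
    heap, order = [], 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != 0:
                heapq.heappush(heap, (grad[r][c], order, r, c))
                order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (grad[nr][nc], order, nr, nc))
                order += 1
    return labels

# Hypothetical VH_avg strip: water (1) on the left, non-water (2) on the
# right, and one undecided column in between.
index = [[-26.0, -26.0, -26.0, -23.0, -19.0, -19.0, -19.0] for _ in range(3)]
markers = [[1, 1, 1, 0, 2, 2, 2] for _ in range(3)]
labels = marker_watershed(sobel_magnitude(index), markers)
```

In this toy case the undecided column joins the water side because the gradient is lower on that side of the boundary.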
We applied a thresholding method to produce the waterbody/non-water markers automatically. For each index, there were two thresholds associated with the typical values of waterbody and non-water pixels. The statistics derived from the WRD and the images were applied to determine the two thresholds through Equations (6) and (7):

$$\begin{cases} \text{waterbodies}, & \text{if } VH_{avg} \le VH_{w} \\ \text{non-waterbodies}, & \text{if } VH_{avg} > VH_{nw} \end{cases} \quad (6)$$

$$\begin{cases} \text{waterbodies}, & \text{if } MNDWI \ge MNDWI_{w} \\ \text{non-waterbodies}, & \text{if } MNDWI < MNDWI_{nw} \end{cases} \quad (7)$$

where VH_w, VH_nw, MNDWI_w, and MNDWI_nw are thresholds to identify waterbody/non-water pixels with higher credibility in the VH_avg and MNDWI indices, respectively. To determine the four thresholds, the WRD of the training area was used to analyze the pixel-value statistics for waterbody and non-water areas. The method was:
(1) Generate the non-water polygons: The non-water (NW) polygons were generated using Equation (8):

$$P_{NW} = Buffering_{3}(P_{WRD}) - Buffering_{1}(P_{WRD}) \quad (8)$$

where P_WRD and P_NW are the polygons of WRD and NW, respectively, and Buffering_3 and Buffering_1 are buffers of the WRD data with 3 and 1 pixels, respectively.
(2) Statistics on pixel values of waterbodies by each area level and non-water: For VH_avg and MNDWI, we calculated the median of the pixel values associated with the NW data and with the WRD for each waterbody area level.
Finally, the initial waterbody mask (IWM) was extracted. However, the mask might contain a lot of noise at this stage, including mountain shadows, flat surfaces, and holes. Thus, a second stage of result pruning was required to correct and filter the IWM.
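The marker rules for the two indices can be illustrated as follows. The sketch uses the threshold values reported later in this article (−24.56 and −20.55 for VH_avg; 0.55 and −0.05 for MNDWI); the function name `make_markers`, the label codes, and the sample pixel values are assumptions.

```python
WATER, NON_WATER, UNMARKED = 1, 2, 0

def make_markers(index, water_thr, non_water_thr, water_is_low):
    """Label high-confidence pixels and leave the rest unmarked.
    water_is_low=True suits VH_avg (water is dark);
    water_is_low=False suits MNDWI (water is bright)."""
    markers = []
    for row in index:
        out = []
        for v in row:
            if water_is_low:
                is_water, is_non_water = v <= water_thr, v > non_water_thr
            else:
                is_water, is_non_water = v >= water_thr, v < non_water_thr
            out.append(WATER if is_water
                       else NON_WATER if is_non_water else UNMARKED)
        markers.append(out)
    return markers

# Hypothetical pixel values, thresholds as reported in this article.
vh_markers = make_markers([[-26.0, -22.0, -18.0]], -24.56, -20.55, True)
mndwi_markers = make_markers([[0.7, 0.2, -0.2]], 0.55, -0.05, False)
```

Pixels falling between the two thresholds stay unmarked and are assigned later by the watershed step.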

Noise Reduction
Three indices, SLOPE, NDVI, and NDBI, were involved in the noise reduction stage:
(1) Mountain shadow removal by SLOPE: A threshold was first applied to the SLOPE to remove pixels in the IWM that satisfied Equation (9). However, the slope data alone could not wholly exclude mountain shadows, as the spatial resolution of the SRTM DEM data was lower than that of the Sentinel data.
(2) Mountain shadow removal by NDVI_max: Mountain shadows are significant noise in waterbody identification. However, the mountains are usually covered by dense vegetation in southern China. Thus, it was possible to use the NDVI_max that satisfied Equation (10) to eliminate mountain shadows.
(3) Flat surface removal by NDBI: Waterbodies and flat surfaces have similar SAR backscattering coefficient values. In optical data, however, these two types exhibit significant differences in the near-infrared and short-wave infrared bands. The NDBI was designed to identify built-up areas and barren land. Thus, Equation (11) was applied to distinguish between a waterbody and a flat surface.
(4) Hole removal by binary image morphology: The waterbody extraction results may contain some "holes" caused by ships or islands. A binary image morphology method was thus applied to remove the holes. Additionally, a threshold satisfying Equation (12) was set on the maximum number of pixels in a hole, so that holes caused by islands were not filled:

$$Area_{hole} < 10 \quad (12)$$
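Step (4) can be sketched as a connected-component fill. The article does not specify the morphology routine used, so the breadth-first search below, the 4-connectivity, and the function name `fill_small_holes` are assumptions; only the limit of fewer than 10 pixels comes from Equation (12).

```python
from collections import deque

def fill_small_holes(mask, max_hole_pixels=10):
    """Fill enclosed non-water regions smaller than max_hole_pixels.
    mask is a list of rows with 1 = water and 0 = non-water."""
    rows, cols = len(mask), len(mask[0])
    filled = [row[:] for row in mask]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] or seen[r0][c0]:
                continue
            # Collect one connected non-water region (4-connectivity).
            region, touches_border = [], False
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_border = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Regions touching the image border are background, not holes.
            if not touches_border and len(region) < max_hole_pixels:
                for r, c in region:
                    filled[r][c] = 1
    return filled

# Hypothetical river patch with a one-pixel "ship" hole in the middle.
river = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
repaired = fill_small_holes(river)
```

Larger enclosed regions, such as islands, exceed the pixel limit and are left untouched.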

Accuracy Assessment
In this study, we defined "accurate waterbody mapping" as mapping the actual distribution of waterbodies at a certain spatial resolution. This requires that enough waterbodies are correctly identified and that the mapped area of the waterbodies is realistic. Thus, the accuracy of the extraction results was evaluated from two aspects: waterbody number and area.
The evaluation of the waterbody area calculated the identification rate over all waterbody pixels. This helped us observe how much of the true waterbody area was identified. This method is commonly used for the verification of waterbody mapping results [17,25]. However, these metrics may not be able to express the misidentification of small waterbodies, because the area of small waterbodies accounts for only a small part of the total waterbody area.
Therefore, we applied another verification method based on the waterbody number, which examines whether individual waterbodies are extracted correctly or falsely. To this end, if the proportion of correctly identified pixels to total pixels for a waterbody of the WRD is equal to or greater than 0.5, the waterbody is considered correctly extracted; otherwise, it is considered wrongly extracted. This helped us observe how many of the individual waterbodies were correctly identified. This method is commonly used for object identification [26]. In this method, each waterbody, regardless of its area, is considered equally important. For both evaluations, three metrics, Precision, Recall, and F-measure, were calculated by comparing the results with the WRD:

$$Precision = \frac{TP}{TP + FP} \quad (13)$$

$$Recall = \frac{TP}{TP + FN} \quad (14)$$

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \quad (15)$$

where true positive (TP) and false positive (FP) are correct and incorrect identifications, respectively. Note that the correct identification for evaluation of the waterbody number meant at least one pixel in a candidate waterbody had been detected. In the assessment of waterbody area, it signified that one pixel of an overall waterbody had been detected. False negative (FN) is a missed identification. Precision represents the probability that the identifications are valid. Recall is the probability that waterbodies were correctly identified, and F-measure is the harmonic mean of Precision and Recall.
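As a worked illustration, the three metrics and the 0.5 overlap rule for individual waterbodies can be computed as below. The function names and the counts are hypothetical and do not come from the study's tables.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, Recall, and F-measure from raw counts.
    A metric whose denominator is zero is reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def is_correctly_extracted(identified_pixels, total_pixels, min_fraction=0.5):
    """Object-level rule: a reference waterbody counts as extracted when
    at least min_fraction of its pixels were identified."""
    return total_pixels > 0 and identified_pixels / total_pixels >= min_fraction

# Hypothetical counts: 8 correct, 2 false, and 8 missed identifications.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)
```

With these counts, a high Precision (0.8) combined with a moderate Recall (0.5) yields an F-measure of about 0.62, closer to the lower of the two values.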

Statistics on the Waterbody Value in Images
The distribution of VH_avg and MNDWI according to waterbodies of different area sizes and non-water areas is shown in Figures 4 and 5, respectively. We observed that the values of VH_avg and MNDWI gradually diverged from the value of the non-water area as the waterbody area increased. For clarity, we divided all of the waterbodies according to their areas into three levels: small waterbodies (smaller than 10³ m²), medium waterbodies (larger than 10³ m² but smaller than 10⁶ m²), and large waterbodies (larger than 10⁶ m²).
Water 2021, 13, x FOR PEER REVIEW 9 of 16

The median value of VH_avg generally decreased as the waterbody area increased; for non-water areas, it was −18.07. For small waterbodies, the two median values were −19.20 and −19.18, which were very close to that of the non-water areas. For medium waterbodies, the three median values were −21.93, −22.94, and −23.55, respectively. For large waterbodies, it was lower than −25.58.
The median value of MNDWI increased as the waterbody area increased; for non-water areas, it was −0.16. For small waterbodies, the two median values were −0.14 and −0.08, which were very close to that of the non-water areas. For medium waterbodies, the three median values were 0.18, 0.32, and 0.40, respectively. For large waterbodies, it was higher than 0.70.
We found that the pixel values changed rapidly in two places. One was between the bars of 10³ and 10⁴; we defined the value of a high-confidence non-water area as the arithmetic mean of the median values of these two bars. The other was between the bars of 10⁶ and 10⁷; we defined the value of a high-confidence waterbody as the arithmetic mean of the median values of these two bars.
As a result, VH_avg pixel values smaller than −24.56 were marked as waterbodies, and those larger than −20.55 were marked as non-water areas. Similarly, MNDWI pixel values larger than 0.55 were marked as waterbodies, and those smaller than −0.05 were marked as non-water areas.
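The arithmetic behind the two VH_avg thresholds can be checked with a short sketch. It assumes that the "two bars" on either side of each jump are the adjacent medians reported above (−19.18 and −21.93 for the non-water threshold; −23.55 and −25.58 for the waterbody threshold); this pairing is an inference from the reported values, and the function name `midpoint` is hypothetical.

```python
def midpoint(median_a, median_b):
    """Arithmetic mean of the medians of two adjacent area-level bars."""
    return (median_a + median_b) / 2.0

# Assumed pairing of the reported VH_avg medians around each jump.
vh_non_water = midpoint(-19.18, -21.93)  # close to the reported -20.55
vh_water = midpoint(-23.55, -25.58)      # close to the reported -24.56
```

Under this pairing, the midpoints agree with the reported VH_avg thresholds to within rounding.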

Accuracy of Waterbody Identification
The assessment of identification accuracy contained three parts:
(1) Accuracy of waterbodies' total area
Table 2 demonstrates that: (i) The noise reduction achieved a more significant improvement in VH_avg than in MNDWI. Notably, comparing the noise-reduced results with the initially extracted results for the VH_avg dataset, the Precision_a improved by approximately 4 percentage points at the cost of lowering the Recall_a by nearly 1 percentage point; and (ii) The Recall_a obtained by MNDWI was significantly higher than that of VH_avg. Moreover, the F-measure_a values for VH_avg and MNDWI were approximately 0.662 and 0.752, respectively.
(2) Accuracy of waterbody number
The results of the waterbody identification were derived from two datasets (VH_avg and MNDWI) and two processing stages (initially extracted and noise-reduced). Several points can be made, as shown in Table 3: (i) The noise reduction increased the Recall_n and decreased the Precision_n as expected. Nevertheless, the highest Recall_n value in the four sets of results was less than 0.27, indicating that a large number of small waterbodies had been misidentified. (ii) The three metrics achieved using MNDWI were higher overall than those achieved using VH_avg. (iii) The F-measure_n for both datasets was lower than 0.409.
(3) Recall rate by waterbody area level
We further tested the Recall_n of the identified waterbodies against their area level. As Table 4 shows, the Recall_n could also be divided into three levels: for small waterbodies, it was lower than 0.028; for medium waterbodies, it ranged from 0.167 to 0.847; and for large waterbodies, it was larger than 0.75. The situation was consistent with that of the training dataset. The table also shows that the smallest identifiable waterbody area level (SIWAL) values for VH_avg and MNDWI were larger than 10⁷ m² and 10⁴ m², respectively.
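The SIWAL rule, the first area level at which Recall reaches 0.8, can be sketched as a simple scan over area levels in ascending order. The function name and the Recall values below are hypothetical placeholders; they are not the values of Table 4.

```python
def smallest_identifiable_level(recall_by_level, threshold=0.8):
    """Return the smallest area level (lower bound in square meters)
    whose Recall reaches the threshold, or None if no level does."""
    for area_level, recall in sorted(recall_by_level.items()):
        if recall >= threshold:
            return area_level
    return None

# Hypothetical Recall per area level, keyed by the level's lower bound.
example_recall = {1e2: 0.01, 1e3: 0.17, 1e4: 0.85, 1e5: 0.90}
siwal = smallest_identifiable_level(example_recall)
```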

Noise Reduction
Comparing the noise-reduced results with the initially extracted results for both VH_avg and MNDWI, the noise reduction increased Precision_a and Precision_n and decreased Recall_a and Recall_n. However, the changes in the last three metrics were only around 1 percentage point. This was because the thresholds of the noise reduction method were relatively loose and had a small effect on the results. The only exception was that the Precision_a for VH_avg increased by 4 percentage points. Thus, it can be inferred that the initial VH_avg results contained some large areas of noise. After checking the results, we found that the major misidentifications were caused by mountain shadows, artificial facilities, and ships.
Figure 6 shows a typical case of mountain shadow denoising. The terrain in this area is quite undulating. For example, the altitude of the Liuxihe Reservoir is only 160 m, whereas the altitude of the adjacent Liuxihe National Forest Park reaches 1000 m. In this case, the VH_avg (Figure 6e) generated significantly more noise than the MNDWI (Figure 6f). Although SAR is an active remote sensing technique that does not rely on sunlight, SAR data are vulnerable to shadowing effects from steep mountains because of the side-looking viewing geometry of SAR satellite sensors [12].
Figure 7 shows a typical case where the runway of Guangzhou Baiyun International Airport was misidentified as a waterbody using VH_avg in the initially extracted results and was then removed using flat surface removal by NDBI. Similar noise, including viaducts, highways, playgrounds, and golf courses, occurred in the results of VH_avg. However, it did not appear in the results of MNDWI.
Typically, compositing time series of backscatter coefficients, for example by averaging, can help reduce random factors such as speckle noise or sidelobes. However, for some waterways with intensive ship traffic, such as those shown in Figure 8, the value of the average backscatter coefficients was high, making the rivers extracted by VH_avg noticeably fragmented.

Comparison of Sentinel-1A & 2 in Waterbody Mapping
To make the two datasets, VH avg and MNDWI, comparable, we used the same training data and training methods to determine the parameters in the waterbody mapping framework. As Tables 2-4 show, the MNDWI outperformed the VH avg in most of the metrics, especially F-measure a and F-measure n, which were higher by 9 and 16 percentage points, respectively.
By inspecting the results, we also found that the MNDWI was more likely to cause commission errors, while VH avg was more likely to cause omission errors. For the large number of small ponds shown in Figure 9, the VH avg identified them as individual ponds but missed many, while MNDWI merged them into a single waterbody.
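The link between these error types and the assessment metrics can be sketched as follows. This is a generic pixel-based computation of Precision, Recall, and F-measure from two binary masks, not the exact validation code of this study; the function name and the toy masks are hypothetical.

```python
import numpy as np

def precision_recall_f(predicted, reference):
    """Pixel-based Precision, Recall, and F-measure for two binary water masks."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    tp = np.count_nonzero(predicted & reference)
    fp = np.count_nonzero(predicted & ~reference)  # commission errors
    fn = np.count_nonzero(~predicted & reference)  # omission errors
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Toy masks: 4 reference water pixels out of 8.
reference = [1, 1, 1, 1, 0, 0, 0, 0]
over_mapped = [1, 1, 1, 1, 1, 1, 0, 0]   # commission-prone result
under_mapped = [1, 1, 0, 0, 0, 0, 0, 0]  # omission-prone result

p_over, r_over, f_over = precision_recall_f(over_mapped, reference)
p_under, r_under, f_under = precision_recall_f(under_mapped, reference)
```

In the toy masks, the commission-prone result keeps full Recall but loses Precision, and the omission-prone result shows the opposite pattern.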

The Smallest Identifiable Area of Waterbody
Sentinel-1 and Sentinel-2 have proved to be high-quality data sources for monitoring the water surface. In our case, the two datasets, VH avg and MNDWI, exhibited similar performance for small waterbodies (smaller than 10³ m²) and large waterbodies (larger than 10⁶ m²). Both datasets were nearly incapable of identifying small waterbodies. However, the MNDWI outperformed the VH avg in identifying medium waterbodies (larger than 10³ m² but smaller than 10⁶ m²). Specifically, the SIWAL was larger than 10⁴ m² for MNDWI and larger than 10⁷ m² for VH avg.
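One way to read the smallest identifiable area from per-waterbody results is sketched below. It assumes that reference waterbodies are grouped by order of magnitude of area and that the smallest identifiable class is the lowest decade from which every populated decade has a Recall above 0.8; the grouping rule, the function names, and the sample data are illustrative assumptions rather than the procedure used in this study.

```python
import numpy as np

def recall_by_area_decade(areas_m2, detected):
    """Recall of reference waterbodies grouped by floor(log10(area in m^2))."""
    areas = np.asarray(areas_m2, dtype=float)
    detected = np.asarray(detected, dtype=bool)
    decades = np.floor(np.log10(areas)).astype(int)
    return {int(d): float(detected[decades == d].mean()) for d in np.unique(decades)}

def smallest_identifiable_decade(recalls, min_recall=0.8):
    """Lowest decade such that it and all larger populated decades exceed min_recall.

    Returns None if even the largest decade does not exceed min_recall.
    """
    result = None
    for decade in sorted(recalls, reverse=True):
        if recalls[decade] > min_recall:
            result = decade
        else:
            break
    return result

# Invented reference waterbodies: area in m^2 and whether each was detected.
areas = [500, 800, 2000, 5000, 20000, 30000, 50000, 60000, 90000, 2e6]
found = [False, False, True, False, True, True, True, True, True, True]

recalls = recall_by_area_decade(areas, found)
exponent = smallest_identifiable_decade(recalls)
```

With the invented sample, the Recall is 0 for the 10² decade, 0.5 for the 10³ decade, and 1.0 from the 10⁴ decade upward, so the sketch reports an exponent of 4, that is, an area class starting at 10⁴ m².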

Conclusions
In this paper, we evaluated the performance of two datasets, Sentinel-1A and Sentinel-2, in the mapping of small waterbodies over urban and mountainous regions. First, we applied an existing two-stage framework of waterbody mapping to the two datasets. Notably, we used the same training data and training methods to determine the parameters of the framework. Then, we performed the validation by comparing the extracted waterbody results with 1-m spatial resolution reference waterbody data, which were delineated from Google Maps over Guangzhou, China. The assessment metrics of Precision, Recall, and F-measure were employed. The results showed that: (1) The MNDWI outperformed the VH avg by 9 percentage points in the F-measure. (2) There was more room for the VH avg results to improve in accuracy when combined with noise reduction, i.e., noise reduction improved Precision by 4 percentage points at a cost of 1 percentage point in Recall. (3) The potential smallest identifiable waterbody area (Recall larger than 0.8) was larger than 10⁴ m². Future work will evaluate the waterbody mapping methods using high-resolution (0.8 m) GF-2 data in Guangdong Province, China.