Fusion of High- and Medium-Resolution Optical Remote Sensing Imagery and GlobeLand30 Products for the Automated Detection of Intra-Urban Surface Water

Intra-urban surface water (IUSW) is an indispensable resource for urban living. Accurately acquiring and updating the distributions of IUSW resources is significant for human settlement environments and urban ecosystem services. High-resolution optical remote sensing data are used widely in the detailed monitoring of IUSW because of their characteristics of high resolution, large width, and high frequency. The lack of spectral information in high-resolution remote sensing data, however, has led to the IUSW misclassification problem, which is difficult to fully solve by relying only on spatial features. In addition, with an increasing abundance of water products, it is equally important to explore methods for using water products to further enhance the automatic acquisition of IUSW. In this study, we developed an automated urban surface-water area extraction method (AUSWAEM) to obtain accurate IUSW by fusing GaoFen-1 (GF-1) images, Landsat-8 Operational Land Imager (OLI) images, and GlobeLand30 products. First, we derived morphological large-area/small-area water indices to increase the salience of IUSW features. Then, we applied an adaptive segmentation model based on the GlobeLand30 product to obtain the initial results of IUSW. Finally, we constructed a decision-level fusion model based on expert knowledge to eliminate the problem of misclassification resulting from insufficient information from high-resolution remote sensing spectra and obtained the final IUSW results. We used a three-case study in China (i.e., Tianjin, Shanghai, and Guangzhou) to validate this method based on remotely sensed images, such as those from GF-1 and Landsat-8 OLI. We performed a comparative analysis of the results from the proposed method and the results from the normalized differential water index, with average kappa coefficients of 0.91 and 0.55, respectively, which indicated that the AUSWAEM improved the average kappa coefficient by 0.36 and obtained accurate spatial patterns of IUSW. Furthermore, the AUSWAEM displayed more stable and robust performance under different environmental conditions. Therefore, the AUSWAEM is a promising technique for extracting IUSW with more accurate and automated detection performance.

On the basis of summarizing and analyzing the existing IUSW literatures [10,32,33,41,42], this paper suggests that there are still three issues that need to be further addressed, and the comprehensive solution through the three issues is reflected in the contribution of this paper:

1.
A technique is proposed to fuse the abundant water indices in the spectral dimension into high-resolution remote sensing images in order to solve the commission problem caused by insufficient spectral information to extract IUSW from high spatial resolution images.

2.
According to the morphological differences of IUSWs, IUSWs are classified into two types of IUSWs, and corresponding spatial features are constructed for both types of IUSWs, thus alleviating the problem that one spatial feature cannot fully represent different morphological types of IUSWs and enhancing the saliency of IUSW features. 3.
Optimized samples from available water products are automatically obtained and used in the IUSW automatic extraction method, thus reducing the uncertainty of a priori information products, and enabling the automatic detection of IUSW with high accuracy.
To this end, this study introduced an automatic surface-water area extraction method (AUSWAEM) for IUSW using a combination of GaoFen-1, Landsat-8 OLI, and GlobeLand30 data. Three main aspects were included: (1) the development of a morphological large/small area water index (MLWI/MSWI), (2) the construction of an adaptive threshold segmentation method based on the GlobeLand30 product, and (3) a decision-level fusion model based on expert knowledge. This manuscript is structured as follows. Section 2 describes the study area (Tianjin, Shanghai, and Guangzhou, China) and the multisource data collection. Section 3 details the AUSWAEM model framework proposed to obtain the IUSW. In Section 4, the proposed method is implemented in the study area, and the results are analyzed and discussed. Finally, the results of this study are discussed and summarized in Sections 5 and 6, respectively.

Study Area and Datasets
The study areas included three coastal mega-cities in China: Tianjin, Shanghai, and Guangzhou ( Figure 1). China is undergoing rapid urbanization, especially in coastal cities, where more rapid economic development has led to dramatic changes in urban landscape patterns and more frequent changes in urban water bodies [43]. At the same time, these areas feature a wide spatial distribution, morphological diversity, and richness of surface waters that are representative. In addition, their visual noise is more informative as a result of complex and diverse land cover types, such as shadows and exposed land, which makes the study area somewhat challenging. Therefore, we focused on extracting water information from the three study areas to validate the feasibility of the present research methodology.
In this study, we selected three GF-1 images and Landsat-8 OLI images featuring different environments, such as lakes, rivers, reservoirs, and ponds, in China to verify the feasibility of the AUSWAEM. The GF-1 satellite was equipped with two 60-km wide high-spatial-resolution sensor cameras (GF-1/PMS1 and GF-1/PMS2). The data collected by the GF-1/PMS camera consisted of a panchromatic band with a spatial resolution of 2 m and four multispectral bands (blue, green, red, and near-red bands) with a spatial resolution of 8 m [44]. We acquired the three Landsat-8 OLI L1T products in the study areas from the U.S. Geological Survey (USGS) website (https://glovis.usgs.gov/app). The data details of the three image scenes are listed in Table 1, and the GF-1 images, as well as the corresponding ground truth references for the study areas, are described in Figure 1. In this study, we selected three GF-1 images and Landsat-8 OLI images featuring different environments, such as lakes, rivers, reservoirs, and ponds, in China to verify the feasibility of the AUSWAEM. The GF-1 satellite was equipped with two 60-km wide high-spatial-resolution sensor cameras (GF-1/PMS1 and GF-1/PMS2). The data collected by the GF-1/PMS camera consisted of a panchromatic band with a spatial resolution of 2 m and four multispectral bands (blue, green, red, and near-red bands) with a spatial resolution of 8 m [44]. We acquired the three Landsat-8 OLI L1T products in the study areas from the U.S. Geological Survey (USGS) website (https://glovis.usgs.gov/app). The data details of the three image scenes are listed in Table 1, and the GF-1 images, as well as the corresponding ground truth references for the study areas, are described in Figure 1.  The study areas consisted of the urban areas of (a) Tianjin, (b) Shanghai, and (c) Guangzhou, China. The GaoFen-1 imagery is shown with a false-color composite of the near-infrared, red, and green bands of the raw image data overlaid with the ground truth maps covering the study areas. We manually delineated the ground truth maps based on careful visual interpretation of the GF-1 data and a field campaign in the study area. In addition, the initial samples of the urban surface-water area were provided by the water body types of the GlobeLand30 (2010) product data, which has a water body type accuracy of 80% [45][46][47]. The GlobeLand30 dataset was produced by the National Basic Geographic Information of China using Landsat multispectral data and pixel object knowledge techniques and downloaded from the GLOBELAND30 website (http://www.globallandcover.com/ GLC30Download/index.aspx).

Methods
The automated detection of IUSW extraction methodology of AUSWAEM was established ( Figure 2). First, the GF-1 and Landsat-8 images were preprocessed to generate a 2 m GF-1 primary image product and a 15 m Landsat-8 primary image product, respectively. See Section 3.1 for a detailed description. Second, the MLWI and MSWI were constructed from the GF-1 primary data product, and the MNDWI index was calculated from the Landsat-8 OLI primary data product. Then, based on the adaptive segmentation model of the GlobeLand30 product, the constructed MLWI, MSWI, and MNDWI water indices were threshold segmented separately to obtain the corresponding initial IUSW results. Finally, the decision-level fusion model was used to fuse the initial IUSW results to obtain the final IUSW results. We manually delineated the ground truth maps based on careful visual interpretation of the GF-1 data and a field campaign in the study area. In addition, the initial samples of the urban surface-water area were provided by the water body types of the GlobeLand30 (2010) product data, which has a water body type accuracy of 80% [45][46][47]. The GlobeLand30 dataset was produced by the National Basic Geographic Information of China using Landsat multispectral data and pixel object knowledge techniques and downloaded from the GLOBELAND30 website (http://www.globallandcover.com/GLC30Download/index.aspx).

Methods
The automated detection of IUSW extraction methodology of AUSWAEM was established ( Figure 2). First, the GF-1 and Landsat-8 images were preprocessed to generate a 2 m GF-1 primary image product and a 15 m Landsat-8 primary image product, respectively. See Section 3.1 for a detailed description. Second, the MLWI and MSWI were constructed from the GF-1 primary data product, and the MNDWI index was calculated from the Landsat-8 OLI primary data product. Then, based on the adaptive segmentation model of the GlobeLand30 product, the constructed MLWI, MSWI, and MNDWI water indices were threshold segmented separately to obtain the corresponding initial IUSW results. Finally, the decision-level fusion model was used to fuse the initial IUSW results to obtain the final IUSW results.

Preprocessing of GF-1 and Landsat-8 Images
The GF-1 images and Landsat-8 OLI data were preprocessed with ortho-rectification and fusion to improve the spatial accuracy and spatial resolution using ENVI 5.0 commercial software. In this study, the GF-1 images were ortho-rectification based on a rational polynomial coefficient (RPC) model (utilizing the RPC file in the system) and were fused using the Gram-Schmidt spectral sharpening method [48,49]. Thus, the GF-1 data primary product with 2 m spatial resolution was obtained.
On the basis of the polynomial model [50], the Landsat-8 OLI data were geometrically corrected using collected ground control points from GF-1 images with an average root-meansquare error of <0.4 pixels. To obtain a surface reflectance dataset from the Landsat-8 OLI data, we employed the fast line-of-sight atmospheric analysis of the spectral hypercube atmospheric Figure 2. Workflow of the integrated urban surface-water area mapping method based on the GaoFen-1 and Landsat-8 OLI imagery. Note: MLWI, MSWI, and MNDWI represent the morphology large-area water index, the morphology small-area water index, and the modified normalized difference water index, respectively. W MLWI , W MSWI , and W MNDWI are thematic maps of the water bodies generated from the MLWI, MSWI, and MNDWI, respectively. GLCM represents the gray-level co-occurrence matrix.

Preprocessing of GF-1 and Landsat-8 Images
The GF-1 images and Landsat-8 OLI data were preprocessed with ortho-rectification and fusion to improve the spatial accuracy and spatial resolution using ENVI 5.0 commercial software. In this study, the GF-1 images were ortho-rectification based on a rational polynomial coefficient (RPC) model (utilizing the RPC file in the system) and were fused using the Gram-Schmidt spectral sharpening method [48,49]. Thus, the GF-1 data primary product with 2 m spatial resolution was obtained.
On the basis of the polynomial model [50], the Landsat-8 OLI data were geometrically corrected using collected ground control points from GF-1 images with an average root-mean-square error of <0.4 pixels. To obtain a surface reflectance dataset from the Landsat-8 OLI data, we employed the fast line-of-sight atmospheric analysis of the spectral hypercube atmospheric correction algorithm [49]. On this basis, to obtain the 15 m Landsat-8 fusion data, the Landsat-8 multispectral data (30 m × 30 m pixel size) was fused with the Landsat-8 panchromatic data (15 m × 15 m pixel size) using the Gram-Schmidt spectral sharpening method [51].

MLWI/MSWI Extraction from GF-1 Imagery
Because of the morphological diversity of urban water in high-spatial-resolution remotely sensed images, it is difficult to represent the diversity of water bodies in built-up areas through a single water index. Therefore, in this study, we divided urban water bodies into two categories, namely, large-area water bodies (with water surface widths >10 m) and small-area water bodies (with water surface widths <10 m). We then proposed the MLWI/MSWI to describe the spectral-spatial properties of water using an extended morphological profile method based on the NDWI, which was characterized by a combination of the water body index and morphological information, thereby improving the recognition accuracy of different forms of water bodies in urban scenes. We used the NDWI to enhance water body information and suppress other types of information in remote sensing imagery [52]. To efficiently suppress the omission and correct inaccurate boundaries of water areas, however, we applied the extended morphology profiles (EMPs), which provided the shape and size information of all water elements present in high-resolution images, to enhance the traditional NDWI spectral features [53].
The MLWI and MSWI calculations were based on the NDWI index extracted from GF-1 images, to enhance the difference between water and non-water bodies in the spectral features. Then, based on the NDWI index feature information, we performed reconstruction and white top-hat operations using multiscale structural elements, to achieve further increases in the difference between global water body information and background values, as well as to enhance small water body profile information. On this basis, we performed averaging calculations to obtain the integrated expression results of the multiscale MCR and WTH feature sequences. Finally, we performed subtraction and addition calculations on the combined expression results to highlight the information of large and small water bodies. The calculation steps of the MLWI and MSWI were as follows: (1) calculation of NDWI: the NDWI image was generated using the green and near-infrared (NIR) bands using Equation (1); (2) morphology closing-by-reconstruction (MCR) and white top-hat (WTH) were employed using Equations (2) and (3), respectively; (3) the EMPs of the extended MCR and WTH were defined as expressed in Equations (4) and (5), respectively; (4) the mean of the EMPs was calculated as expressed in Equations (6) and (7); and (5) the MLWI and MSWI were calculated as expressed in Equations (8) and (9), respectively.
where ρ Green is a green band, such as GF-1 band 2, and ρ NIR is a near-infrared band, such as GF-1 band 4.
where φ re NDWI (r) and γ NDWI (r) represent the closing-by-reconstruction and the opening of the NDWI image, respectively, and r indicates the radius of a circular structural element (SE).
where i indicates the number of MCR, and WTH is the interval of a circular SE.
Remote Sens. 2020, 12, 4037 where EMPs MCR mean and EMPs WTH mean are defined as the means of the EMP profiles, since water exhibits large NDWI values at different multiscales.
where MLWI and MSWI are defined as the addition and subtraction of the EMPs MCR mean and EMPs WTH mean, because they respectively increase the difference between a large-area water body and the background and between a small-area water body and the background. Consequently, large-area and small-area water bodies correspond to large MLWI and MSWI feature values.
Because of the diversity of water structures in the images, a single linear SE is difficult to fully describe. Therefore, in this study, we used a circular SE, which has a good shape presentation for all the elements in an image. The scale parameters i min , i max , and ∆i were set to 1, 21, and 1, respectively, based on the 2-m spatial resolution of the images. The parameters of i were determined by calculating the average diameters of various types of water bodies in the image. It was difficult, however, to select a suitable value of for complex urban scenes, due to the diversity of the water structures present in the images. We obtained the parameters used in this study by measuring and counting the water body information of the images in the study area and obtaining i through trial and error. Thus, the parameters of the determined MLWI/MSWI were relatively stable. The MLWI in Figure 3c enhanced the characteristics of large-area water bodies, while the MSWI in Figure 3d more effectively detected small-area water bodies.

MNDWI Extraction from Landsat-8 Imagery
To effectively solve the problem of mixing caused by the insufficient spectral information of GF-1 images, this study utilized MNDWI extraction based on Landsat-8 data to assist IUSW interpretation. In medium-resolution remote sensing images, the MNDWI can more effectively enhance the water features in regions with large amounts of built-up areas than the NDWI [20]. This is because water bodies have greater positive values of the MNDWI than NDWI because to enhance the difference between large-area water bodies and small-area water bodies and non-water areas, respectively. Note: The dashed yellow rectangular frames represent typical areas where objects with large areas and small areas of water were enhanced.

MNDWI Extraction from Landsat-8 Imagery
To effectively solve the problem of mixing caused by the insufficient spectral information of GF-1 images, this study utilized MNDWI extraction based on Landsat-8 data to assist IUSW interpretation. In medium-resolution remote sensing images, the MNDWI can more effectively enhance the water features in regions with large amounts of built-up areas than the NDWI [20]. This is because water bodies have greater positive values of the MNDWI than NDWI because the image bands generally have stronger absorbability in the short-wave infrared (SWIR) band than in the near-infrared (NIR) band [21]. In addition, soil, vegetation, and built-up areas have smaller negative values because they reflect more SWIR light than green light [12]. Based on this finding, the MNDWI was proposed using Landsat-8 data and is defined as follows: where ρ green is a green band, such as OLI band 3, and ρ SWIR is a middle infrared band, such as OLI band 6.

IUSW Automatic Extraction Based on Adaptive Threshold Segmentation by the GlobeLand30 Product
In this section, we proposed an adaptive threshold segmentation method based on the water body type of the GlobeLand30 product, which mainly includes (1) the sample optimization using k-means clustering from the water body types of GlobeLand30 products and (2) the adaptive segmentation of water body index features based on the optimized samples. The method performed automatic segmentation of MLWI, MSWI, and MNDWI water body indices respectively and obtains the corresponding initial IUSW results. This process is shown in Figure 4. where green  is a green band, such as OLI band 3, and SW IR  is a middle infrared band, such as OLI band 6.

IUSW Automatic Extraction Based on Adaptive Threshold Segmentation by the GlobeLand30 Product
In this section, we proposed an adaptive threshold segmentation method based on the water body type of the GlobeLand30 product, which mainly includes (1) the sample optimization using k-means clustering from the water body types of GlobeLand30 products and (2) the adaptive segmentation of water body index features based on the optimized samples. The method performed automatic segmentation of MLWI, MSWI, and MNDWI water body indices respectively and obtains the corresponding initial IUSW results. This process is shown in Figure 4. Due to the differences in precision, scale, and timeliness between the GlobeLand30 product and the multiple water index, it is difficult to apply the water type in the GlobeLand30 product directly as a sample. Therefore, sample optimization of the water types in the GlobeLand30 product is needed to eliminate the noise mixed with the water types in the GlobeLand30 product and to obtain relatively pure water samples. The k-means is widely used Due to the differences in precision, scale, and timeliness between the GlobeLand30 product and the multiple water index, it is difficult to apply the water type in the GlobeLand30 product directly as a sample. Therefore, sample optimization of the water types in the GlobeLand30 product is needed to eliminate the noise mixed with the water types in the GlobeLand30 product and to obtain relatively pure water samples. The k-means is widely used because of its simple principle, effectiveness, excellent clustering effect, easy interpretation, and the fact that the parameter to be tuned is only the number of cluster types [54]. Thus, we applied the k-means clustering algorithm to further refine the initial water samples of the GlobeLand30 product according to the distances of the samples and cluster centers in the multiwater indices feature space. Based on the concise algorithm principle of k-means, each water body sample S was defined as follows [55]: where k is the number of cluster types; Ci represents the water and non-water classes from the GlobeLand30 product; t is the class in Ci; DN F represents the pixel values of the water index; F represents the water indices (i.e., MSWI, MLWI, and MNDWI); and µ i = 1 |C i | t∈Ci DN F t is a vector of the average pixel values in the water body and non-water-body types. The k clusters were agglomerated into two classes: water and non-water. Table 2 shows the statistical results of water and non-water bodies after clustering the water types of Globeland30 products using the k-means algorithm according to the MNDWI, MLWI and MSWI indices, respectively. The results show that the noise can be well removed from the water types of the GlobeLand30 product, and a relatively pure water sample S can be obtained, even though some outliers are present; however, overall, it does not affect the use as an IUSW segmentation sample. We obtained the corresponding initial IUSW results by automatically segmenting the MNDWI, MLWI, and MSWI indices by the threshold interval, which was determined by the mean and variance calculated from the water sample S (see Section 3.3.1), as shown in Equation (12). Table 3 shows the automatically calculated threshold results based on the water sample S. Specifically, the calculation method of Mean F S is as follows: Specifically, the calculation method of Stdev F S is as follows: where Mean F S and Stdev F S are the mean and standard deviation of the water body sample S, respectively; n is the number of water body sample S; F represents the water indices (i.e., MSWI, MLWI, and MNDWI); and W F is the initial urban surface-water result corresponding to the water body indices F, namely, W MSWI , W MLWI , and W MNDWI .

IUSW Automatic Optimization Based on Decision-Level Fusion Model
In this study, we used a decision-level information fusion model based on expert knowledge to fuse the initial IUSW extraction results to solve the noise problem generated by relying only on high-resolution remote sensing data sources and to achieve the goal of improving the accuracy of the automatic IUSW extraction results. The final IUSW results were composed of two parts: the optimized large-area water body and the optimized small-area water body. We obtained the optimized large-area water body by fusing the initial results of the large-area water body with the initial results based on the MDNWI features using an object-level area ratio index, primarily to eliminate the noise problem caused by insufficient spectral information (e.g., bare ground-type information). We obtained the optimized small-area water body by dividing the interval range of the texture homogeneity features of the optimized large-area water body and the intersecting objects of the small-area water body initial results, and then optimizing the small-area initial water body to achieve eliminate the noise from building shadows.
The steps of the decision-level fusion model based on expert knowledge were as follows: (1) use area ratios on the initial results of water bodies W MLWI and W MNDWI for information fusion calculations, thereby eliminating the noise information in the large-area initial results and obtaining the optimized large-area water body results W PLN ; (2) obtain sample W S through the intersection operation of the optimized water body results W PLN and W MSWI ; (3) calculate the textural homogeneity characteristics of the small-area water body objects, and obtain the maximum and minimum values of the texture homogeneity features of sample W S ; (4) based on the threshold interval of sample W S , use the texture homogeneity features to segment the small-area water body object W MSWI and obtain the optimized small-area water result W PS ; and (5) perform the merge operation on the optimization results W PLN and W PS to obtain the final W IUSW result. The workflow is illustrated in Figure 5.
where W PLN is the optimized large-area water body result; W MLWI (O) and W MNDWI (O) are the initial results of the large-area water body obtained based on the MLWI and MNDWI, respectively; O is the water body object; A is the area of the water body object O; and T 1 is the threshold parameter. The threshold value of 0.4 was determined experimentally.
Remote Sens. 2020, 12, 4037 12 of 23 respectively; O is the water body object; A is the area of the water body object O; and 1 T is the threshold parameter. The threshold value of 0.4 was determined experimentally.

Specifically, GLCM W Smax (O) and GLCM W S min
(O) were calculated as follows: Specifically, W S was calculated as follows: Specifically, GLCM was calculated as follows: where W PS is the optimized small-area water body result; GLCM W Smax (O) and GLCM W S min (O) are the maximum and minimum values of the texture homogeneity of water sample W S , respectively; W S is the water sample; W PLN is the optimized large-area water body result; W MSWI is the initial small-area water body result; GLCM is the grayscale co-presentation matrix; N is the number of grayscale levels; NDWI is the normalized water index of the GF-1 image; O is the water object; and i and j are pixel coordinates. A window size of 7 × 7 pixels is appropriate for classifying remotely sensed imagery in urban areas, and the tested resolutions ranged from 2.5 m × 2.5 m to 10 m × 10 m [56]. Therefore, we set the window size to 7 × 7 pixels in each of the three study regions.
where W IUSW is the final urban inland surface-water result; and W PLN and W PS are the optimized large-area and small-area water body results, respectively.

Classification Accuracy Assessment
In this study, we used the ground truth maps to verify the vector results of water extracted from the three study areas in Tianjin, Shanghai, and Guangzhou. We manually delineated the ground truth maps in the study areas based on careful visual interpretation and a field campaign. We used the kappa coefficient (Kappa), producer's accuracy (PA), and user's accuracy (UA) to evaluate the water extraction accuracy using the ground truth maps [57].
where TP, FN, FP, and TN represent the areas of correct extraction, undetected water bodies, incorrect extraction, and non-water bodies that were correctly rejected, respectively; and T is the total area of the study area.

Comparison of Spatial Distribution Mapping Results of IUSW from the NDWI-Based Method and the AUSWAEM for a Three-Case Study in China
In this study, we used the water body extraction method based on the NDWI as a comparative experiment. To ensure the best results of the water body extraction based on the NDWI, we determined the threshold by trial and error, resulting in thresholds of 0.3, 0.4, and 0.3 in Tianjin, Shanghai, and Guangzhou, respectively. Figure 6 shows the water extraction results using the NDWI-based method and the AUSWAEM. The spatial distributions of the intra-urban surface water extracted by the NDWI-based method and the AUSWAEM were verified by the ground truth data, with the red commission regions and the white omission regions shown in Figure 6. Visual inspection of this figure indicates that the AUSWAEM successfully extracted most of the urban water areas with complete shapes, whereas the extracted results obtained using the NDWI-based method were incomplete. The AUSWAEM had fewer commission and omission errors in each study area, and there were only minor omission errors at the boundaries of the small-area water bodies (e.g., small rivers). The NDWI-based method, however, had clear errors of commission and omission. NDWI, we determined the threshold by trial and error, resulting in thresholds of 0.3, 0.4, and 0.3 in Tianjin, Shanghai, and Guangzhou, respectively. Figure 6 shows the water extraction results using the NDWI-based method and the AUSWAEM. The spatial distributions of the intra-urban surface water extracted by the NDWI-based method and the AUSWAEM were verified by the ground truth data, with the red commission regions and the white omission regions shown in Figure 6. Visual inspection of this figure indicates that the AUSWAEM successfully extracted most of the urban water areas with complete shapes, whereas the extracted results obtained using the NDWI-based method were incomplete. The AUSWAEM had fewer commission and omission errors in each study area, and there were only minor omission errors at the boundaries of the small-area water bodies (e.g., small rivers). The NDWIbased method, however, had clear errors of commission and omission. Figure 6. Comparison of water extraction results between the NDWI-based method and the automatic surface-water area extraction method (AUSWAEM) for the three study areas. Note: the yellow square frames represent 1-km 2 areas. Additionally, to facilitate observation and analysis, we selected 1-km 2 areas from the images of the study area, which appear as square yellow frames. The classification results are presented in Figure 7. This figure shows that the water bodies extracted using the AUSWAEM, including large rivers, small rivers, and reservoirs, were continuous and consistent with the validation data. The main reason for the omission errors at the boundaries of small rivers stemmed from the mixed pixels in the water and non-water boundary areas, which resulted in subtle differences between the AUSWAEM extraction results and the ground truth data. presented in Figure 7. This figure shows that the water bodies extracted using the AUSWAEM, including large rivers, small rivers, and reservoirs, were continuous and consistent with the validation data. The main reason for the omission errors at the boundaries of small rivers stemmed from the mixed pixels in the water and non-water boundary areas, which resulted in subtle differences between the AUSWAEM extraction results and the ground truth data. Figure 7. Comparison of the water extraction results between the NDWI-based method and the AUSWAEM algorithms in local areas (the small areas in the yellow square frames from the image in Figure 6).

Accuracy Assessment Results of IUSW between the NDWI-Based Method and the AUSWAEM for a Three-Case Study in China
We used the accuracy evaluation indicators, including the kappa coefficient (Kappa), producer's accuracy (PA), and user's accuracy (UA), to quantitatively evaluate the accuracy of the water area extraction results using ground truth maps. Table 4 lists the results of the accuracy assessment of the AUSWAEM and NDWI-based method in the Tianjin, Shanghai, and Guangzhou study areas. Figure 7. Comparison of the water extraction results between the NDWI-based method and the AUSWAEM algorithms in local areas (the small areas in the yellow square frames from the image in Figure 6).

Accuracy Assessment Results of IUSW between the NDWI-Based Method and the AUSWAEM for a Three-Case Study in China
We used the accuracy evaluation indicators, including the kappa coefficient (Kappa), producer's accuracy (PA), and user's accuracy (UA), to quantitatively evaluate the accuracy of the water area extraction results using ground truth maps. Table 4 lists the results of the accuracy assessment of the AUSWAEM and NDWI-based method in the Tianjin, Shanghai, and Guangzhou study areas. The accuracy assessment results presented in Table 4 indicate that the average Kappa of the urban water results extracted using the AUSWAEM in the three study areas reached 0.91, which was 0.36 higher than that of the urban water extracted using the NDWI-based method. This result indicated that the AUSWAEM performed well in the extraction of urban water bodies, which is an outcome that could not be achieved by the NDWI-based method. In the three study areas, the accuracy of the AUSWAEM in the extraction of urban water bodies had an average UA value of 96.5% and an average PA value >92%, which were higher than the corresponding values of 38.4% and 14.6% achieved by the NDWI-based method. These results revealed that the AUSWAEM maintained lower commission and omission errors (i.e., <8%) compared with the NDWI-based method.

Analysis of the Importance of MLWI/MSWI Features
To verify the advantages of the MLWI/MSWI features, we used Kappa to compare the water body extraction results from the MLWI/MSWI-based method and the NDWI-based method. Furthermore, to assess the importance of MLWI/MSWI features in the AUSWAEM, we employed Kappa, PA, and PU to quantitatively evaluate the accuracy curve for each step of the AUSWAEM during the water extraction process. Figure 8a shows a comparison of the water body extraction accuracy between the MLWI/MSWI-based and NDWI-based algorithms in Tianjin, Shanghai, and Guangzhou. Figure 8b-d, respectively, present the accuracy curves of Kappa, PA, and UA for each step of the AUSWAEM during the water extraction process in the study areas.
Remote Sens. 2020, 12, 4037 17 of 23 indicated that object-based post-processing had a significant effect on eliminating small-area shadows from low-rise buildings.

Validation and Analysis of Models and Threshold Sensitivity
In order to verify the stability of the model and the determined parameters of the method, the statistical analysis of the kappa coefficients of the model and the automatic segmentation threshold parameters in the model were performed using standard deviations (Std), respectively, in this study. As shown in Table 5, the proposed AUSWAEM has the highest average kappa coefficient of 0.91 and the lowest standard deviation of 0.02 in the three study areas. The results show that AUSWAEM has excellent IUSW detection capability in various urban scenarios. The accuracy assessment results presented in Figure 8a reveal that the Kappa of the urban water area extraction results based on the MLWI/MSWI features of the three study areas were >0.8, which was 0.28 higher than the NDWI-based method and 0.08 lower than the final result from the AUSWAEM model. These results indicated that the MLWI/MSWI features were superior to the NDWI in urban water area extraction performance. Figure 8c shows that the water area extraction results based on the MLWI/MSWI features were basically consistent with the PA values for post-processing (1%) in the study area of Shanghai, whereas they decreased slightly by 1.4% and 1.0% in the study areas of Tianjin and Guangzhou, respectively. Compared with the final water area extraction results, the omission rates were stable at a relatively low level, indicating that the MLWI/MSWI-based water extraction performed well. In Figure 8b,c, the post-processing method used in this study reduced the PA (i.e., the omission rates were increased), although the reduction was only 1%, which had no effect on the overall accuracy. In contrast, Figure 8d indicates that the UA improved in the three study areas after the MNDWI-based post-processing and decision-level fusion method. Among them, the study areas of Tianjin and Guangzhou increased slightly, by 2.4% and 3.4%, respectively, whereas the Shanghai study area increased significantly, by 10.9%. Compared with the research areas in Tianjin and Guangzhou, the accuracy of the UA in Shanghai was significantly improved because Shanghai is particularly affected by the shadows of high-rise buildings, and decision-level fusion based on MNDWI features can better solve this type of noise. In addition, as can be seen in Figure 8b,d, based on the post-processing of objects in the three study areas, the average Kappa and the average UA improved by 0.05 and 9.5%, respectively. This result indicated that object-based post-processing had a significant effect on eliminating small-area shadows from low-rise buildings.

Validation and Analysis of Models and Threshold Sensitivity
In order to verify the stability of the model and the determined parameters of the method, the statistical analysis of the kappa coefficients of the model and the automatic segmentation threshold parameters in the model were performed using standard deviations (Std), respectively, in this study. As shown in Table 5, the proposed AUSWAEM has the highest average kappa coefficient of 0.91 and the lowest standard deviation of 0.02 in the three study areas. The results show that AUSWAEM has excellent IUSW detection capability in various urban scenarios.  Table 6 presents the threshold parameters for all water body indices in the three study areas. The MSWI has the lowest Std of 0.01, which indicates the stability of MSWI in extracting IUSW detail information under different urban scenarios. Secondly, the Std of MNDWI is 0.03, which is lower than that of NDVI (0.05), indicating that MNDWI is more stable than NDVI in extracting IUSW spectral information under different urban scenarios. Finally, the Std of MLWI is the highest 0.1 among all threshold parameters, which is due to the threshold parameter of 0.13 in Shanghai, which is significantly lower than the segmentation threshold parameter of the other two study regions. The significant decrease in the threshold parameter in Shanghai is due to the presence of a large number of confusing types, such as bare land types and bright building types, in the Shanghai study area. This is also the main reason why the accuracy of IUSW extraction is lower than the other two study areas when using only MLWI + MSWI. However, in the AUSWAEM model, the Shanghai study area has the highest accuracy improvement of 0.1, which indicates that the fusion of medium-resolution spectral information can effectively solve the commission problem caused by insufficient spectral information, thus improving the applicability in different urban scenes. Therefore, from the above analysis, it can be concluded that the proposed AUSWAEM method has a well performance to detect IUSWs in different urban scenarios. In addition, the adaptive threshold determination method can automatically determine more reasonable threshold parameters according to the characteristics of different urban scenes.

Discussion
The IUSW results were consistent with the results from the ground truth data, which we delineated manually based on careful visual interpretation and a field campaign. Therefore, the results of the proposed method in this study were reliable. Compared with the results of other studies using high spatial resolution remote sensing images for urban surface water mapping [34,57,58], the overall accuracy of IUSW in this study is higher (0.5-1.0% higher), indicating the validity of the AUSWAEM method proposed in this study. It is also shown that the fusion of medium-resolution spectral information on the basis of high-resolution remote sensing images can effectively improve the accuracy of IUSW detection compared to relying only on high-resolution remote sensing images for IUSW detection. Thus, the fusion of the spectral information of the medium resolution remote sensing image can play a role in improving the accuracy of IUSW detection by high resolution remote sensing. In addition, compared with other methods for IUSW extraction, the proposed AUSWAEM method requires fewer parameters to be determined and has better robustness and applicability, which reflects the good performance of this method.
The main reasons for the reduction in accuracy are twofold: First, slender shadows from some of the larger buildings, such as factories and shopping malls, are easy to confuse with small rivers. Shadows of some larger buildings can be difficult to distinguish from small rivers because such shadows are characterized by a slender shape and a strong homogeneity of texture. This is similar to the characteristics of small rivers, which makes it difficult to distinguish to two features using a water index, texture homogeneity, and shape characteristics. Because of vast differences in the structures of large buildings, the shapes of their shadows vary significantly, making it challenging to determine rules. In addition, although this can lead to a certain degree of misclassification, it did not exert a large impact on the overall accuracy because the occupied area proportion of large buildings was <2% in the study areas. Therefore, to generalize this method, this study did not address this issue further. Second, even with high-resolution images, the problem of mixed pixels between water and non-water objects still exists. Therefore, compared with ground truth maps, the water bodies extracted by this study had some omissions at their boundaries, leading to a decrease in the PA. This problem can be addressed in subsequent research.
The primary advantage of the proposed method was that it did not require the selection of samples, thus saving the cost of manual marking. Additionally, the problem of threshold selection was partially solved, which reduced the time-consuming selection of thresholds and the uncertainty of segmentation. The disadvantage, however, was that some of the thresholds (e.g., texture homogeneity and shape features) still had to be manually determined. In this study, the threshold was selected by trial and error and compared in three study areas, making the parameters of the two thresholds relatively stable. However, the appropriate thresholds still need to be manually adjusted for more types of urban scenes. In follow-up studies on the AUSWAEM, we will focus on the issue of threshold selection in the model. Although the adaptive threshold segmentation based on the GlobeLand30 product in this study could avoid the time-consuming nature of threshold selection to some extent, it requires further investigation. Subsequently, an active learning method from multisource reference data and adaptive mining of global-local relationship information will be attempted to solve the threshold optimization problem. Then, in future applications, the AUSWAEM model will be applied to high-precision automated IUSW dynamic monitoring to explore the dynamic development pattern of surface water resources in each city on a large regional scale, so as to provide detailed information and technical support for the cooperative use and development of water resources in urban agglomerations on a regional scale.

Conclusions
In this study, we proposed a novel intra-urban surface-water extraction method, which integrated the advantages of high-and medium-resolution remotely sensed products to accurately and automatically delineate IUSW. To address the issue of extracting different water spatial forms, we proposed the MLWI/MSWI, in combination with the NDWI. In addition, we designed a decision-level information fusion model to successfully suppress noise from different types of building shadows. We conducted a test of the remotely sensed products of GF-1 and Landsat-8 OLI images covering a three-case study in China to validate the extraction accuracy and performance of the proposed method. The results indicated that the AUSWAEM method performed well, with an average Kappa of 0.91 and an average total commission and omission error of 5.7%. The MLWI/MSWI in the AUSWAEM detected different level water forms, including water body boundaries and small rivers, and was more efficient than the NDWI-based method in terms of visual inspection. Meanwhile, the combined post-processing method in the AUSWAEM was also effective in the suppression of noise from different types of shadows. The AUSWAEM was an effective water area extraction method, which demonstrated that this algorithm could maintain high precision, stability, and robustness in urban areas.
Author Contributions: Z.L. and X.Y. conceived and designed the experiments; Z.L., performed the experiments; Z.L. and X.Y. analyzed the data; Z.L. and X.Y. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.