Water-Body Segmentation for SAR Images: Past, Current, and Future

Abstract: Synthetic Aperture Radar (SAR), as a microwave sensor that can sense a target all day or night under all-weather conditions, is of great significance for detecting water resources, such as coastlines, lakes and rivers. This paper reviews literature published in the past 30 years in the field of water body extraction in SAR images, and makes some proposals that the community working with SAR image water-body extraction should consider. Firstly, this review focuses on the main ideas and characteristics of traditional water body extraction on SAR images, mainly focusing on traditional Machine Learning (ML) methods. Secondly, how Deep Learning (DL) methods are applied and optimized in the task of water-body segmentation for SAR images is summarized from the two levels of pixel and image. We also pay more attention to the most popular networks, such as U-Net and its modified models, and novel networks, such as the Cascaded Fully-Convolutional Network (CFCN) and River-Net. In the end, an in-depth discussion is presented, along with conclusions and future trends, on the limitations and challenges of DL for water-body segmentation.


Introduction
Synthetic Aperture Radar (SAR) can sense targets day and night [1], and its active microwave signal can penetrate clouds and fog. In extreme weather conditions, recourse to SAR is almost compulsory [2,3]. Meanwhile, the different scattering characteristics of water and land make SAR an excellent data source [4][5][6][7] for river and lake segmentation [8,9], coastline extraction [10], river extraction [11], and other similar tasks [12]. Because SAR actively emits microwaves and records the returned echo, it can effectively distinguish water from land. Specifically, water on land is typically a connected area of irregular shape, with low surface roughness and no apparent texture. When the microwaves from SAR hit a smooth water surface, they are reflected away as though from a mirror, returning little energy to the sensor. Consequently, bodies of water appear as low-intensity areas in SAR images. In contrast, the land surface around water is rougher and returns stronger microwave energy to the sensor. Therefore, water appears darker in the SAR image and the surrounding terrain brighter [13]. These characteristics have driven rapid growth in research on water-body segmentation of SAR imagery. To illustrate this trend, this paper counts the publications in the field of water body extraction from SAR images over the past 30 years, as shown in Figure 1.
Prior to the development of Deep Learning (DL) methods, scholars habitually applied traditional Machine Learning (ML) algorithms to the water-body segmentation task of SAR images, and obtained promising results with edge detection, level set, clustering, and Markov random field approaches. However, the special imaging mechanism of SAR results in images full of speckle noise. ML methods typically formulate a fixed mapping relationship to determine the category of a pixel, which limits their flexibility. For SAR images with heavy speckle noise, it is difficult for such a mapping to fit all pixels in the entire image. Fortunately, in recent years, DL methods have overcome these limitations. As an outgrowth of neural networks, DL has received extensive attention from scholars, due to its powerful feature extraction capabilities. By iteratively updating the shared weights in the model, neural networks can build a mapping that accommodates all pixels. At present, many high-performing neural network models are emerging, especially Convolutional Neural Networks (CNNs); typical representatives are U-Net, DeepLab and their derivatives. CNNs have achieved remarkable results in water-body segmentation of SAR images, owing to their translation invariance and weight sharing. At a time when DL methods are attracting more attention than traditional ML methods, this paper reviews and summarizes existing segmentation methods to provide insights for future research.
The remainder of the paper is organized as follows. In Section 2, traditional water-body segmentation methods are reviewed, including some boundary-based and region-based ML algorithms. The development process of DL in the task of water-body segmentation for SAR images is summarized in Section 3. Finally, an in-depth discussion is presented, along with conclusions and prospects, on the limitations and challenges of DL for water segmentation in Section 4.

Traditional Water Body Extraction Methods for SAR Images
To organize the review of traditional SAR image water segmentation algorithms, the methods are classified by data source, technical route, and scene type, and are presented in chronological order. The related algorithms are shown in Figure 2.

Edge Detection
Edge information of water is the most intuitive and important feature, and it is also an essential image feature for early research in the task of extracting water from SAR images [14,15]. Edge detection involves finding a set of pixels whose values change sharply in the image, which is often the contour of water. These methods are usually sensitive to speckle noise in SAR images.

Canny Edge Detection
Canny Edge Detection (CED) makes two improvements to the first-order differential operator: non-maximum suppression and double thresholding. Non-maximum suppression effectively suppresses multiple responses to a single edge, which improves edge localization accuracy [16]. Double thresholding, in turn, effectively reduces the rate of missed edges. In 2004, Liu, H. et al. [17] utilized CED and the Levenberg-Marquardt method to accelerate the convergence of the iterative Gaussian curve fitting process and so improve the reliability of the local threshold [18] used for image segmentation. In 2021, Chen, F. et al. [19] applied classical CED to extract glacial lake outlines and other surface features from different imaging modes of Chinese GF-3 data in the Mount Everest region of the Himalayas. Although these methods are easy to apply, they tend to produce discontinuous edges and are sensitive to speckle noise.
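The double-threshold step can be illustrated in isolation. The following is a minimal sketch of the edge-linking stage only (it omits smoothing, gradient computation and non-maximum suppression); the function name `hysteresis`, the threshold values and the toy gradient array are assumptions made for illustration, not taken from the cited studies.

```python
from collections import deque

def hysteresis(magnitude, low, high):
    """Double-threshold edge linking on a 2-D list of gradient magnitudes.

    Pixels >= high are strong edges. Pixels >= low are kept only if they
    are 8-connected (directly or through other kept pixels) to a strong edge.
    """
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every strong edge pixel
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))
    # Grow the edge set into weak pixels that touch an accepted edge pixel
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc]
                        and magnitude[nr][nc] >= low):
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges

# Toy gradient magnitudes: one strong pixel, a connected weak run,
# and one isolated weak pixel (hypothetical values)
grad = [
    [0, 0, 0, 0, 0],
    [9, 5, 5, 0, 5],
    [0, 0, 0, 0, 0],
]
edge_map = hysteresis(grad, low=4, high=8)
```

In this toy case the weak pixels adjacent to the strong one are retained, while the isolated weak pixel at the end of the row is discarded, which is how the second threshold recovers faint edge segments without admitting isolated speckle responses.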

Sobel Edge Detection
Sobel Edge Detection (SED) is, in essence, a first-order discrete difference operator used to approximate the gradient of the image brightness function [20]. Applying SED at any pixel in the image yields the corresponding gradient vector, or its normal vector. In 2015, Liu, Y. et al. [21] investigated a water body detection method based on the combination of SED and contrast-stretching transformation. In 2021, Wang, B. [22] employed SED to separate the coastline in Sentinel-1 images and fused the result with the original SAR images to view it clearly. SED not only produces a good detection effect, but also has a smoothing effect that suppresses noise. However, the obtained edges are thick, and pseudo edges may appear.
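As an illustration of the operator, the sketch below applies the two standard 3x3 Sobel kernels to a small synthetic scene. The function name `sobel_magnitude`, the border handling (border pixels left at zero) and the intensity values are assumptions made for this example.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical differences
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(image):
    """Return the gradient magnitude of a 2-D list; border pixels stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    v = image[r + i - 1][c + j - 1]
                    gx += KX[i][j] * v
                    gy += KY[i][j] * v
            out[r][c] = math.hypot(gx, gy)
    return out

# Dark water (10) on the left, brighter land (200) on the right
scene = [[10, 10, 200, 200] for _ in range(4)]
mag = sobel_magnitude(scene)
```

Both interior columns next to the water/land step receive the same strong response, which is consistent with the thick edges noted above.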

Level Set Method
Drawing on ideas from fluid dynamics, Osher and Sethian proposed the level set algorithm [23] in 1988. It is an effective numerical method for boundary evolution problems such as water body detection; the computation is stable and applies to spaces of arbitrary dimension. The method was popular in early water-body segmentation in SAR images. In 2008, Silveira, M. et al. [24], basing their research on a regional level set, established a mixture of lognormal densities as the probability model of water, or non-water areas, in ERS-2 images, and employed the expectation maximization (EM) method to estimate the probability density function (PDF) of each area. In 2009, Silveira, M. et al. [25] estimated the PDF in each area using the Parzen window [26] on the basis of previous work [24]. Nonparametric density estimation makes the method more flexible, and the researchers applied the method to river mapping and coastline extraction in Envisat ASAR images [27]. In addition, the regional level set method is still effective in water area segmentation of X-band SAR images. In 2012, Cafaro, B. et al. [28] obtained texture features, based on variational calculus and regional level set, to subsequently segment water bodies in COSMO-SkyMed X-band images, using a support vector machine (SVM). Many more excellent algorithms later appeared for water-body segmentation, but the level set idea has had a far-reaching influence on the region-based active contour models (ACM) developed after 2001. In 2021, Chen, F. et al. [19] introduced the variational B-spline into the level set method, extracting glacial lake outlines in Chinese Gaofen-3 images. This approach can preserve many details of water/land boundaries, especially for uneven areas. It can also effectively smooth noise in SAR images, which is difficult with the traditional level set method. The level set method has the advantage of handling topology changes automatically, but at a high computational cost.

Active Contour Model
In 1988, Kass, M. et al. [29] proposed the Active Contour Model (ACM), which transforms the image segmentation problem into finding the minimum of an energy functional and provided a new idea for image segmentation. The main principle of ACM is to construct an energy functional. Driven toward the minimum of this energy, the contour curve gradually approaches the edge of the object to be detected, and finally segments the target. ACM is the most widely used image segmentation method based on a variational idea [30]. Its biggest advantage is that it can obtain a continuous, smooth, closed segmentation boundary even under high noise. There are many ways to construct the energy functional, and ACM can accordingly be divided into edge-based and region-based methods. However, the drawback of this kind of method is its computational complexity, and many attempts are required to select suitable parameters.
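For reference, the energy minimized by the contour in the classical formulation can be written as follows. This is the standard textbook form; the symbols (the parametric contour v(s), the weights α and β, and the gradient-based external energy) are notation chosen here and may differ from the notation used in [29].

```latex
E_{\text{snake}}
  = \int_{0}^{1} \Big[ \tfrac{1}{2}\big( \alpha\,\lvert v'(s)\rvert^{2}
      + \beta\,\lvert v''(s)\rvert^{2} \big)
      + E_{\text{ext}}\big(v(s)\big) \Big]\, ds ,
\qquad
E_{\text{ext}}(x, y) = -\,\lvert \nabla I(x, y) \rvert^{2}
```

The first term penalizes stretching and the second bending, so the contour stays smooth, while the external term is lowest where the image gradient is large, which draws the contour toward edges such as the water/land boundary.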

Edge-Based Active Contour Model
The edge of the target is not only important information to represent the target, but also an important basis for segmenting the target. Kass, M. et al. proposed a typical edge-based ACM, which mainly detects the target according to the gradient jump of the target edge. On this basis, Cohen, L. [31] developed the notion of some form of pressure term on ACM to maintain its dynamic behavior in regions of low edge density, which segmented the North Sea shoreline on the east coast of the UK. In 2000, Niedermeier, A. et al. [32] refined the water edge by local edge selection and propagation along the wavelet scale, prior to the active contour algorithm connecting edge segments. They used this to investigate the Elbe estuary, located in the intertidal zone of the German Gulf.

Region-Based Active Contour Model
Building on the level-set idea, region-based ACM segments an image by computing region statistics in the image. In 2001, Chan and Vese proposed the classical region-based C-V ACM [33], which is based on the simplified Mumford-Shah model and uses the level set idea to evolve the edge curve by minimizing an energy function. In 2003, Heremans, R. et al. [34] employed region-based ACM to delineate flooded areas during the Flanders flood period on ENVISAT ASAR images, compared it with an object-oriented method, and discussed the features of the two methods. Object-oriented algorithms can usually describe the water-body area in SAR images more accurately, while ACM tends to find the largest area that maintains low intensity variance. In 2010, Hahmann, T. et al. [35] applied two different ACM methods to SAR image segmentation and discussed the results of both models on TerraSAR-X images with calm and rough water surfaces. One method, the geometric ACM by Wasilewski [36], showed good results for calm water bodies with low backscatter values, but could not extract rough water surfaces satisfactorily. The other method, the parametric ACM by Hamarneh [37], worked well for both calm and rough water bodies. Regrettably, its main drawback is that it cannot change topology and requires careful parameter adjustment.
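As a worked reference for the region-based idea, the two-phase piecewise-constant energy of the C-V model is commonly written as below. The notation (image u_0, contour C, region averages c_1 and c_2, and weights μ, ν, λ_1, λ_2) is the usual textbook notation and is an assumption here rather than a quotation from [33].

```latex
F(c_1, c_2, C)
  = \mu \,\mathrm{Length}(C)
  + \nu \,\mathrm{Area}\big(\mathrm{inside}(C)\big)
  + \lambda_1 \int_{\mathrm{inside}(C)} \lvert u_0(x, y) - c_1 \rvert^{2}\, dx\, dy
  + \lambda_2 \int_{\mathrm{outside}(C)} \lvert u_0(x, y) - c_2 \rvert^{2}\, dx\, dy
```

For a fixed contour, the optimal c_1 and c_2 are simply the mean intensities inside and outside C. Because the fitting terms depend on region averages rather than on the image gradient, the model can separate dark water from brighter land even where the boundary is blurred by speckle.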
In 2014, Li, N. et al. [38,39] described an approach, mainly based on ACM, to separate rivers from their backgrounds. The proposed method gave promising results. However, the performance of the algorithm depends heavily on the selection of parameters in the preprocessing used to reduce speckle noise. In 2015, Wang, W. et al. [40] extracted river areas by clustering and morphological processing on Radarsat-1 images. The edge of the river was then extracted with the wavelet transform modulus maximum method (WTMM) and smoothed by ACM. In 2008, Lv, J. et al. [41], based on ACM, segmented the Luoma lake at different SAR image resolutions, and found that the higher the resolution, the better the segmentation performance of ACM. Subsequently, a modified narrow-band-section ACM [42] was used to detect the changing area of Huangqi Lake in a series of Sentinel-1 images. This method combines region-based and edge-based ACM to realize rough-to-fine water-body segmentation and improve segmentation efficiency. In 2017, Hu, J. et al. [43] introduced multi-scale technology to the C-V model, reducing the image size and obtaining a series of images at different spatial resolutions. Experiments show that the method accelerates the formation of the initial level set, shortens coastline extraction time, removes non-coastline, and improves identification precision. In 2019, Li, N. et al. [44] first obtained a rough segmentation with the Fuzzy C-means (FCM) clustering algorithm and then used geometric ACM for more accurate water area segmentation. Finally, they proposed a strategy of division in local regions (SDLR) to process images in parallel, which greatly reduces computing time.

Clustering Methods
The principle of clustering algorithms is to group the pixels in the image into different classes, or clusters, according to a specific criterion (such as a distance rule), so that pixels in the same cluster are as similar as possible and pixels in different clusters are as different as possible. In the field of SAR water-body segmentation, K-means [45] and Fuzzy C-means (FCM) [46] are the two main clustering methods.

K-Means
The K-means clustering method is one of the most commonly used distance-based clustering algorithms in SAR image segmentation. It assumes that the closer two targets are in "distance", the more similar they are. In 2010, Wang, M. et al. [47] presented a novel method to segment water/land in SAR images based on texture information. A gray level co-occurrence matrix (GLCM) was first extracted, the wavelet transform [48] was then utilized to extract texture feature information, and, finally, a K-means clustering algorithm was employed to segment water bodies in SAR images. In 2016, Liu, Z. et al. [49] employed K-means to generate an initial over-segmentation result for the following region-merging stage. Then, an adaptive strategy using subregion classification, based on rough-to-fine object-based region merging, was introduced to extend the automatically selected 'sea' and 'land' seeds and finely segment the final coastline in Sentinel-1A images. In 2017, Zheng, X. et al. [50] extracted GLCM, and other texture feature information, after wavelet transform to establish a multi-dimensional feature space, then employed a K-means clustering method to separate water/land in Sentinel-1 images. In 2018, Wu, L. et al. [51] applied K-means, together with thresholding and a region-growing method, to extract dark region features in water bodies. They then used the support vector machine (SVM) to recognize algal bloom in Taihu Lake in Sentinel-1 and Landsat-8 images. In 2019, Obida, C. et al. [52] mapped the time series river network of the Niger Delta in multitemporal Sentinel-1 images, based on the K-means implementation built into the SNAP software. The K-means algorithm is built into relevant SAR processing software, such as SNAP and ENVI [53], reflecting the maturity and popularity of this method. In 2020, Landuyt, L. et al. [54] presented an unsupervised object-based clustering framework, which can segment floods, whether or not they are covered by vegetation. In detail, the region of interest is first segmented into different objects and clustered by K-means. These clusters are then classified according to an adaptive threshold, and the resulting classification is refined through several region-growing post-processing steps. Finally, dry land, permanent water, open flood and submerged vegetation are segmented in the SAR image. K-means remains important, as it has had a far-reaching influence on super-pixel segmentation methods.
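The assign-and-update loop behind K-means can be illustrated on scalar backscatter values. The following is a minimal sketch, not the implementation used in any of the cited studies or in SNAP/ENVI; the function name `kmeans_1d`, the evenly spread initial centers and the sample intensities are assumptions made for illustration.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Lloyd's algorithm on scalar intensities; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly over the intensity range
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest center
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels

# Hypothetical intensities: low values for water, high values for land
backscatter = [12, 15, 11, 14, 180, 175, 190, 13, 185]
centers, labels = kmeans_1d(backscatter)

# The cluster with the lowest mean intensity is taken to be water
water_label = centers.index(min(centers))
water_mask = [lab == water_label for lab in labels]
```

Taking the darkest cluster as water relies on the low backscatter of smooth water surfaces described in the Introduction; in practice the feature vector would usually include texture measures rather than intensity alone.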

Fuzzy C-Means
Different from the hard clustering of K-means, FCM [55] optimizes the objective function, and calculates the membership degree of each pixel to all class centers so as to determine the category of sample pixels. In this way, FCM achieves the purpose of automatic classification of the sample image. The description of some pixel categories is fuzzy, which can more objectively reflect category information in SAR images. In 2017, Leng, Y. et al. [56] utilized a pixel level change detection method to determine the initial clustering center, and then employed FCM to divide terrains into three categories in Sentinel-1 images. Finally, transition region pixels were further divided into water body and background by nearest neighbor clustering (NNC). In 2019, Li, N. et al. [44] employed FCM to roughly segment water in Gaofen-3 and Sentinel-1 images, which effectively accelerates the convergence of the subsequent fine segmentation model. In 2020, Li, N. et al. [57] presegmented water bodies based on FCM, then reconstructed high-resolution SAR images and extracted water boundaries with high precision.
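To make the notion of membership degree concrete, the sketch below implements the standard FCM update equations on scalar intensities. It is an illustrative assumption, not the code of [44], [56] or [57]; the function name `fcm_1d`, the fuzzifier m = 2, the initial centers and the sample values are all hypothetical.

```python
def fcm_1d(values, centers, m=2.0, iterations=50, eps=1e-9):
    """Fuzzy C-means on scalar intensities.

    Returns (centers, memberships), where memberships[i][j] is the degree
    to which values[i] belongs to cluster j; each row sums to 1.
    """
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        memberships = []
        for v in values:
            dists = [abs(v - c) + eps for c in centers]  # eps avoids division by zero
            row = [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
            memberships.append(row)
        # Center update: mean of the data weighted by u_ij^m
        centers = [
            sum((u[j] ** m) * v for u, v in zip(memberships, values))
            / sum(u[j] ** m for u in memberships)
            for j in range(len(centers))
        ]
    return centers, memberships

# Hypothetical intensities; the last pixel (100) lies between water and land
backscatter = [12, 15, 11, 14, 180, 175, 190, 100]
centers, memberships = fcm_1d(backscatter, centers=[0.0, 255.0])
```

Unlike the hard labels of K-means, the ambiguous last pixel receives a substantial membership in both clusters, which is the kind of transition-region pixel that the cited studies hand to a subsequent, finer classification step.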

Nonlinear Clustering Method
In addition to K-means and FCM, a nonlinear clustering method [58] proposed in 2013 performed well for segmenting water bodies in SAR images. The research proposed an automatic strategy that starts by considering the digital elevation model (DEM) as a priori information and re-balancing the different types of pixels (non-water, permanent-water, and temporary-water), then separates water/non-water pixels, and, finally, separates temporary-water from permanent-water pixels by utilizing non-linear clustering in dedicated feature spaces.

Random Forest
The random forest (RF) algorithm was first proposed in [59]. It is an ML algorithm based on classification and regression decision trees, and it can analyze the importance of up to thousands of input features. The main idea is to integrate the results of many decision trees for an overall analysis of the classification task. In 2015, Xie, L. et al. [60], based on the RF algorithm, further classified water bodies into lakes, ponds and canals in Radarsat-2 images, after segmenting the water bodies with a Wishart classifier. This explored the potential of water type classification based on PolSAR images. In 2019, Zhou, X. et al. [61] proposed an easy-to-implement method to address the inability of previous methods to describe large-scale statistical features. Their method combines RF with multiple features, including a novel statistical feature, named object-based generalized gamma distribution (OGΓD), and two texture features from GLCM and the Local Moran Index (LMI), respectively. In 2020, Shen, G. et al. [62] employed the object-oriented (OO) method to segment component images after Freeman-Durden polarimetric decomposition [63,64]. RF was then used to classify the segmented image and extract open water. As a result, the method offers better spatial continuity of classification results than the H/A/α-Wishart classifier [65].

Support Vector Machine
The support vector machine (SVM) [66] is a common supervised learning classifier. SVM shows many distinctive advantages in solving nonlinear and high-dimensional classification and recognition problems in remote sensing images. The goal of SVM classification is to find an optimal hyperplane that separates the pixel classes while satisfying the classification constraints and maximizing the margin, that is, the distance from the hyperplane to the nearest pixels of each class. In 2010, Lv, W. et al. [67] proposed a novel method for water-body segmentation in SAR images that combines GLCM-based features and SVM. GLCM-based features distinctively depicted the characteristics of water and non-water pixels, and were fed into the SVM classifier to extract water bodies in SAR images. In 2011, Wang, Y. et al. [68] focused more on the different textures of water and non-water in SAR images and proposed a novel method combining circular-window gray features and GLCM. In detail, eighteen features of water and non-water, drawn from the circular-window gray features and GLCM, were fed into an SVM to segment the coastline in TerraSAR-X images. In 2012, Cafaro, B. et al. [28] selected the most significant features via a 1-norm SVM binary classifier [69], transforming SAR images into a suitable feature space, and then separating water bodies from non-water. In the same year, Klemenjak, S. et al. [70] adapted and extended the method in their research [71], proposing a classification strategy that treats SVM as an unsupervised method, on the premise that the training samples are generated automatically. In 2018, Kreiser, Z. et al. [72] described water across synthetic aperture radar data (WASARD), which utilized SVM to segment water bodies automatically in Australia. WASARD can segment water bodies of interest with an accuracy of 97% relative to Geoscience Australia's Water Observations from Space (WOFS). In the same year, Wu, L. et al. [51] applied an algal bloom recognition SVM model to separate algal bloom dark areas and low wind dark areas (water bodies) after feature extraction and analysis of Sentinel-1A images. The scattering characteristics of algae and calm water surfaces are quite similar, so it was very difficult to distinguish them in SAR images before the study. On the basis of water-body segmentation, SVM can be considered a technical means to monitor algal bloom in inland lakes. In 2019, Qin, X. et al. [73] studied the effectiveness of water-body segmentation by employing polarimetric decomposition components and SVM for Gaofen-3 images. In this study, six polarization decomposition components from Cloude and Freeman decomposition were fed into an SVM, which achieved satisfactory results with limited training samples and proved the effectiveness of SVM in segmenting water bodies.
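Several of the methods above feed GLCM-based texture features into an SVM. The sketch below shows how a GLCM and two commonly used texture measures (contrast and homogeneity) can be computed for one pixel offset. It is a simplified illustration: the function names, the single horizontal offset, the four gray levels and the two toy patches are assumptions, and the feature sets used in [67] and [68] are richer.

```python
def glcm(image, levels, dr=0, dc=1):
    """Normalized, symmetric gray-level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                a, b = image[r][c], image[nr][nc]
                counts[a][b] += 1
                counts[b][a] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]

def contrast(p):
    """Large when neighbouring gray levels differ strongly."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def homogeneity(p):
    """Large when neighbouring gray levels are similar."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))

# Hypothetical 4-level patches: nearly uniform water versus textured land
smooth_water = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
rough_land = [[3, 0, 2, 1], [0, 3, 1, 2], [2, 1, 3, 0]]

features = {}
for name, patch in (("water", smooth_water), ("land", rough_land)):
    p = glcm(patch, levels=4)
    features[name] = (contrast(p), homogeneity(p))
```

The water patch yields low contrast and high homogeneity, and the land patch the opposite; feature vectors of this kind are what a classifier such as an SVM would then separate with a maximum-margin hyperplane.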

Markov Random Field
Markov Random Field (MRF) models the image as a grid of random variables, in which the gray value of each pixel depends on its neighboring pixels [74]. This model considers the conditional distribution of each pixel given its adjacent pixels, and effectively describes the local statistical characteristics of the image, which can be used for modeling the spatial relationship between pixels of SAR images and for classifying all pixels. In 2005, Deng, H. et al. [75], to fuse image spatial relationships and various image features, developed a Bayesian-based MRF method, which balances the two terms of the MRF model through a function-based weighting parameter. This method allows automatic estimation of the MRF model parameters to generate more accurate unsupervised segmentation results. In 2010, Martinis, S. [76] presented a hybrid multi-contextual Markov model, which can segment floods in near real time in multi-temporal X-band SAR data. It works by combining Hierarchical Marginal Posterior Mode (HMPM) estimation, on directed graphs, with non-causal Markov image modeling, related to planar MRFs, so that scale and spatiotemporal context information are introduced into classification decision making. In 2011, Cao, F. et al. [77] presented a multi-scale line extraction approach to extract rivers in Ka-band SWOT images. The method used multiscale technology to obtain image pyramids, and then employed MRF to connect river candidate segments, based on the results of water-body segmentation by the threshold method [78]. However, the effectiveness of this method was shown only by preliminary results on simulated SWOT SAR images. In 2017, Lobry, S. et al. [79] proposed a double-MRF classification method that can account for local and global variations of the internal parameters of SAR images and generates more regularized water/land segmentation. This method was applied to airborne TropiSAR and simulated SWOT HR images to extract the river in the Camargue area, France.
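The way an MRF prior regularizes a noisy pixel-wise classification can be sketched with a Potts model optimized by iterated conditional modes (ICM). ICM is used here only because it is short; it is not the optimizer of the cited studies, and the function name `icm_potts`, the class means, `sigma`, `beta` and the toy scene are assumptions made for illustration.

```python
def icm_potts(image, means, sigma=40.0, beta=1.5, sweeps=5):
    """Iterated conditional modes for a Potts-prior MRF segmentation.

    Data term: squared distance to the class mean (Gaussian likelihood).
    Prior term: a penalty of beta for every 4-neighbour with a different label.
    """
    rows, cols = len(image), len(image[0])
    classes = range(len(means))
    # Initialise with the maximum-likelihood label of each pixel
    labels = [[min(classes, key=lambda k: abs(v - means[k])) for v in row]
              for row in image]
    for _ in range(sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                def energy(k):
                    data = (image[r][c] - means[k]) ** 2 / (2 * sigma ** 2)
                    prior = sum(
                        beta
                        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] != k
                    )
                    return data + prior
                best = min(classes, key=energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # no label changed during a full sweep
    return labels

# Water (20) on the left, land (160) on the right, one bright speckle in the water
scene = [
    [20, 20, 20, 160, 160, 160],
    [20, 110, 20, 160, 160, 160],
    [20, 20, 20, 160, 160, 160],
    [20, 20, 20, 160, 160, 160],
]
labels = icm_potts(scene, means=[20.0, 160.0])  # 0 = water, 1 = land
```

Taken alone, the speckle pixel is closer to the land mean, but because all four of its neighbours are water the prior term outweighs the data term and the pixel is relabelled as water, while the true water/land boundary is left in place.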

Mathematical Statistical Algorithm
Mathematical statistics methods are used to discover inherent regularity by observing the frequency of the same category of pixels, and then building a targeted statistical model to make judgments and predictions on other pixels. These methods are generally suitable for highly heterogeneous, but highly dependent, statistical models [80]. The Wishart classifier and the Bayesian classifier are usually used to segment water bodies in SAR images.

Wishart Classifier
The Wishart distribution generalizes the χ² distribution to the multivariate case and can describe the covariance matrix of samples from a multivariate normal distribution. It is therefore well suited to describing the covariance matrix that represents fully polarized SAR data, and many SAR image classification studies have been carried out on this basis. In 1994, Lee, J. et al. [81] first proposed a supervised classification method based on the Wishart distribution and the Maximum Likelihood criterion, namely Wishart-ML. In 2015, Xie, L. et al. [60] extracted water bodies by employing Wishart-ML and reduced false alarms in building areas with the help of spatial semantic information. False alarms have a negative impact on the classification results of complex urban areas, so this method segments lakes better than canals and ponds. In 2017, Zhang, X. et al. [82] implemented and evaluated a multiclass water body change detection method for compact polarimetric SAR (CPSAR) data, based on the complex Wishart distribution, in order to assess the capability of simulated Gaofen-3 compact polarimetric SAR data in π/4 and CTLR modes for water-body extraction. The complex Wishart segmentation experiments showed that the π/4 mode has better water change detection ability than the CTLR mode. In 2018, Irwin, K. et al. [83] utilized the H-Alpha-Wishart, a classifier corresponding to dual-polarization SAR data, to separate water and flooded vegetation. Nevertheless, the H-Alpha-Wishart classifier is not suitable for separating water around ice, and misclassified fields and ice as water. In 2020, Shen, G. et al. [62] further employed the H/A/α-Wishart to segment water bodies from the background with Gaofen-3 quad-polarization data. The method, based on the three components obtained by Cloude decomposition, employed the Wishart classifier to separate water and land. The segmentation result of this method is promising, but the spatial continuity is weak.
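The decision rule behind a Wishart maximum-likelihood classifier can be sketched with the commonly used Wishart distance, d(C, Σ) = ln|Σ| + tr(Σ⁻¹C), where C is the pixel covariance matrix and Σ a class center; the pixel is assigned to the class with the smallest distance. The example below is a deliberate simplification: it uses real symmetric 2x2 matrices instead of the complex Hermitian matrices of real PolSAR data, it assumes equal class priors, and the class centers and pixel values are hypothetical.

```python
import math

def wishart_distance(cov, sigma):
    """d(C, Sigma) = ln|Sigma| + tr(Sigma^-1 C) for real 2x2 matrices."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # tr(Sigma^-1 C) = sum_i sum_k inv[i][k] * C[k][i]
    trace = sum(inv[i][k] * cov[k][i] for i in range(2) for k in range(2))
    return math.log(det) + trace

def classify(cov, class_centers):
    """Assign the pixel to the class center with the smallest distance."""
    return min(class_centers, key=lambda name: wishart_distance(cov, class_centers[name]))

# Hypothetical class centers: weak returns for water, stronger returns for land
class_centers = {
    "water": [[0.01, 0.0], [0.0, 0.005]],
    "land": [[0.20, 0.02], [0.02, 0.10]],
}

dark_pixel = [[0.012, 0.001], [0.001, 0.004]]
bright_pixel = [[0.18, 0.01], [0.01, 0.12]]
dark_label = classify(dark_pixel, class_centers)
bright_label = classify(bright_pixel, class_centers)
```

In a supervised setting such as Wishart-ML, the class centers would be estimated by averaging the covariance matrices of the training pixels of each class.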

Bayesian Classifier
Bayesian classification is based on Bayes' theorem and is widely used in SAR image classification because, in theory, it gives the solution that minimizes classification error. In detail, pixels are classified by calculating the probability that each pixel belongs to a specific class. In Bayesian classification, all features participate in the classification and can therefore affect the segmentation results. In 2005, Deng, H. et al. [75] applied a Bayesian segmentation approach to seamlessly combine image spatial relationships with various image features, and experiments demonstrated that the method can successfully segment water/ice in SAR images. In 2016, D'Addabbo, A. et al. [84] proposed a Bayesian network to integrate remotely sensed images, such as multi-temporal SAR intensity images and interferometric-SAR coherence data, with geomorphic and other ground information. The methodology was employed to extract flooded areas in the Basilicata region (Italy) in December 2013, based on multi-temporal COSMO-SkyMed images, with accuracy of up to 89%. In 2019, Qin, X. et al. [73] employed the Naive Bayesian classifier to explore the effectiveness of water body extraction with polarimetric decomposition in Gaofen-3 images, demonstrating that combining polarimetric decomposition components with Naive Bayesian classifiers can achieve satisfactory accuracy in water body extraction.
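As a rough illustration of how a Naive Bayesian classifier labels a pixel, the sketch below fits one Gaussian per class and per feature and then picks the class with the largest log posterior. The two features (backscatter in dB and a polarimetric entropy value) and all sample values are assumptions made for this example; they are not data from the studies above.

```python
import math

def fit_gaussian_nb(samples, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for cls in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == cls]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(zip(*rows), means)]
        model[cls] = (n / len(samples), means, variances)
    return model

def log_posterior(x, prior, means, variances):
    """log P(class) + sum of per-feature Gaussian log likelihoods."""
    lp = math.log(prior)
    for v, m, var in zip(x, means, variances):
        lp += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
    return lp

def predict(model, x):
    """Return the class with the largest (unnormalized) log posterior."""
    return max(model, key=lambda cls: log_posterior(x, *model[cls]))

# Hypothetical training pixels: (backscatter in dB, polarimetric entropy).
samples = [(-22.0, 0.20), (-20.0, 0.30), (-24.0, 0.25),
           (-8.0, 0.70), (-10.0, 0.60), (-6.0, 0.80)]
labels = ["water", "water", "water", "land", "land", "land"]

model = fit_gaussian_nb(samples, labels)
print(predict(model, (-21.0, 0.28)), predict(model, (-9.0, 0.65)))
```

Every feature contributes one term to the log posterior, which is the sense in which all features participate in the decision.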
In summary, the advantages and disadvantages of several typical segmentation methods are shown in Table 1.

Water-Body Extraction from SAR Images Based on DL
To summarize the development of DL-based water-body segmentation methods for SAR images, this paper describes early artificial neural networks (ANN), water body extraction based on CNN image classification, water-body segmentation based on the semantic segmentation task of fully convolutional networks, and existing and novel model ideas. The related models are shown in Figure 3.

Background of DL in Remote Sensing Field
In recent years, DL has made great achievements in the remote sensing field [85]. More and more scholars are studying how to make CNN technology play a greater role in remote sensing and how to make it more suitable for SAR remote sensing tasks such as image classification [86], target detection [87], and change detection [88]. For the task of water segmentation, the essence of the traditional methods introduced above is to manually design certain mappings or criteria to determine the category of pixels in the images. For SAR images with a large amount of speckle noise, it is difficult for such mappings to adapt to all the pixels in the whole SAR image. By iteratively updating the shared weights in the model, a neural network instead learns a mapping that can adapt to all pixels.

Pixel Level Water Body Classification Based on CNN
The advantage of CNN lies in its multilayer structure, which can automatically extract not only image features, but multilevel features. The shallow feature extraction layer can learn more intuitive image features, while the deeper convolution layer has stronger nonlinear expression ability and can learn more abstract features. These abstract features can represent the more essential information of the target, and are less sensitive to the size, position and direction of the object, which is helpful to improve recognition performance.
In fact, before CNN appeared, some scholars tried to use ANN [89], which can be regarded as the prototype of CNN, to realize pixel-level water body classification. In 2012, Latini, D. et al. [90] proposed Pulse Coupled Neural Networks (PCNN) to automatically extract water bodies from COSMO-SkyMed data with different polarizations, geometric configurations, and measurement modes. PCNN is an early neural network model. When applied to SAR images, the image information is represented as a series of binary pulse signals, and each signal is related to a pixel or a group of pixels. It is worth mentioning that PCNN does not need training and is an unsupervised classification algorithm. In 2018, Dasgupta, A. et al. [91] proposed an Adaptive Neuro-Fuzzy Inference System (ANFIS) applied to texture-enhanced single-polarization SAR data. Specifically, the second-order statistical texture of the SAR images was first extracted and then optimized by dimension reduction. Subsequently, Gaussian curves were used to model the flood and non-flood categories in a fuzzy inference system. Finally, the ANN was trained by defining multiscale polygons in the image.
After 2012, the emergence of CNN quickly attracted the attention of many scholars in the field of computer vision. The abstract features extracted by convolution kernels are very helpful for water-body segmentation and can effectively indicate what kinds of objects are contained in SAR images. In 2017, Li, J. et al. [92] used CNN to separate water/ice. In detail, the training data were obtained by dividing Gaofen-3 images into many patches according to category. The patches of the different categories were then used to train the CNN. Finally, the trained CNN model was used to segment water and ice by traversing the entire SAR image with a patch-based window. In the same year, Ren, Y. et al. [93] fed a three-channel image, composed of HH, HV and incidence angle from dual-polarization SAR data, into AlexNet [94] to separate water and ice in Sentinel-1 images. Instead of building a new CNN model, the researchers employed AlexNet in a transfer learning approach [95] to segment ice and water in SAR imagery. In contrast, some researchers divided the image into many patches, each containing only one category, and colored each input patch uniformly according to the category predicted by the network. Although this method damages the edge information between different categories, it is valuable for classifying open water in large-scale SAR images and can greatly reduce the consumption of computing resources. In 2018, Wang, C. et al. [96] utilized image slices at three spatial scales (32 × 32, 64 × 64, 128 × 128) to build different data sets as CNN training input, and tried to find the best proportion of feature description with different slice sizes. They then used the trained CNN to separate water and ice in ScanSAR images. In 2020, Boulze, H. et al. [97] used the same strategy to segment water and ice in Sentinel-1 images, demonstrating that larger image patches can achieve better segmentation results. In 2021, Chen, K. et al. [98] employed CNN to segment water bodies in four scenarios: plain lakes, mountain lakes, narrow rivers and wide rivers. The experimental results are much better than those of OTSU, SVM and other algorithms. However, the performance of CNN on the extraction of fine and narrow rivers is not satisfactory, implying that dividing the image into a large number of patches may damage the coherence of fine and narrow rivers and have a negative impact on segmentation results.
The boundary is the most important information for extracting a water body. However, because some object details are lost, a classification CNN cannot describe the specific outline of water and indicate which category each pixel belongs to at the same time. It is therefore poorly suited to dense pixel classification tasks. In the CNN-based segmentation methods above, the category of a pixel is determined by feeding an image patch centered on that pixel to the CNN. This approach has several defects. First, the storage overhead is enormous. Second, the computational efficiency is low: the image patches corresponding to adjacent pixels overlap, so the convolution calculations are largely redundant and can easily cause overfitting. Third, the receptive field of the convolution layers is bounded by the size of the patch rather than by the whole SAR image. Because the patches are much smaller than the whole SAR image, only local features can be extracted, which limits performance in segmenting water bodies.
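The storage and redundancy problems of patch-centered classification can be made concrete with a small sketch. It counts how many values must be stored when one patch is cut out per pixel, and it labels each interior pixel from its own window. The `mean_threshold` function is only a stand-in for a trained CNN, and the image values and threshold are assumptions for illustration.

```python
def patch_stats(height, width, patch):
    """Return (patch count, stored values, redundancy) for one patch per valid pixel."""
    n_patches = (height - patch + 1) * (width - patch + 1)
    stored = n_patches * patch * patch
    return n_patches, stored, stored / (height * width)

def classify_by_patches(image, patch, classify_patch):
    """Label every interior pixel from the (odd-sized) patch centered on it."""
    half = patch // 2
    h, w = len(image), len(image[0])
    out = [[None] * w for _ in range(h)]  # border pixels stay unlabeled
    for r in range(half, h - half):
        for c in range(half, w - half):
            window = [row[c - half:c + half + 1]
                      for row in image[r - half:r + half + 1]]
            out[r][c] = classify_patch(window)
    return out

def mean_threshold(window, threshold=0.5):
    """Stand-in for a CNN: dark patches (low mean intensity) are water (1)."""
    values = [v for row in window for v in row]
    return 1 if sum(values) / len(values) < threshold else 0

# Hypothetical 5 x 5 image: dark water on the left, bright land on the right.
image = [[0.1, 0.1, 0.1, 0.9, 0.9] for _ in range(5)]
labels = classify_by_patches(image, 3, mean_threshold)
n_patches, stored, redundancy = patch_stats(512, 512, 33)
print(labels[2], n_patches, round(redundancy, 1))
```

For a 512 × 512 image and 33 × 33 patches the stored volume is several hundred times the image itself, and neighbouring windows share almost all of their pixels, which is the redundancy described above.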

Image Level Water-Body Segmentation Based on CNN
Different from the pixel-level classification described in Section 3.2, the idea of CNN-based image-level segmentation is to take the entire image as input, instead of relatively independent pixels or patches. Because the whole picture is fed to the neural network model, the model can draw on both local and global semantic information to varying degrees, which makes it more robust and 'intelligent'.

Water-Body Segmentation Based on Existing Network Model
The water-body segmentation task at SAR image level is to assign a semantic category, water or non-water, to each pixel in the input image to achieve a pixel-wise dense classification. Although water-body segmentation from SAR images by neural networks has been an important part of SAR remote sensing applications since 2012, it was not until 2015, when Long, J. et al. [99] proposed the fully convolutional network (FCN) for end-to-end segmentation of natural images for the first time, that there was a real basis for establishing an appropriate neural network water-body segmentation model. FCN replaces the fully connected layers at the tail of the CNN with convolutional layers, so that the network can elegantly realize pixel-level dense prediction. Afterwards, U-Net [100] passed the shallow feature information of the network to the deep layers through skip connections and fused the corresponding feature information, making the network model more suitable for segmentation with small sample data sets. The architecture of U-Net is shown in Figure 4.
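The step from a classifier to dense prediction can be sketched in a few lines: a fully connected layer applied with the same weights at every spatial location is a 1 × 1 convolution, and its output is a score map with one class decision per pixel. The weights and the 2 × 2 two-channel feature map below are hypothetical; this is a toy illustration of the idea, not the architecture of [99].

```python
def dense(features, weights, bias):
    """Fully connected layer: one score per class from a feature vector."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def conv1x1(feature_map, weights, bias):
    """Apply the same dense weights at every location (a 1x1 convolution)."""
    return [[dense(pixel, weights, bias) for pixel in row] for row in feature_map]

def argmax_map(score_map):
    """Pick the highest-scoring class index at every location."""
    return [[max(range(len(s)), key=s.__getitem__) for s in row]
            for row in score_map]

# Hypothetical weights: class 0 = non-water, class 1 = water; 2 input channels.
WEIGHTS = [[1.0, 0.5], [-1.0, 0.2]]
BIAS = [0.0, 0.3]

# Hypothetical feature map of shape 2 x 2 x 2 (rows, columns, channels).
feature_map = [[[0.9, 0.1], [0.05, 0.2]],
               [[0.2, 0.1], [0.8, 0.3]]]

label_map = argmax_map(conv1x1(feature_map, WEIGHTS, BIAS))
print(label_map)
```

Because the layer no longer depends on a fixed input size, the same weights yield a label map of whatever spatial size the input feature map has.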
U-Net has been quite a popular network model in the field of SAR image water-body segmentation in recent years, and its water-body segmentation results have proved better than those of traditional algorithms [19,101]. Considering the cost and condition constraints of SAR image acquisition, it is relatively difficult to obtain large-scale data samples. In addition, SAR images are vulnerable to speckle noise, angle sensitivity and other problems, which makes it more difficult to accurately label water bodies. Therefore, the problem of small sample data sets is more prominent in the field of SAR image water body extraction. Fortunately, water bodies in SAR images have some unique advantages that allow U-Net to be used to full effect. On one hand, the microwaves emitted by SAR are scattered differently by water and land, which helps the model to adaptively learn a pixel-level dense mapping. On the other hand, the contour of a water area is smooth and its structure is relatively simple, which is an important factor in the good performance of U-Net when trained on small samples [102]. In 2019, U-Net was employed to extract river channels in SAR images by way of transfer learning [103]. In addition, two different U-Net training schemes were compared for segmenting water in SAR images. One trains U-Net from scratch, and the other initializes U-Net with pre-trained weights through transfer learning. Experiments show that the water-body segmentation results of the two schemes have similar F1-score, pixel accuracy and mean IOU. In 2020, Generative Adversarial Networks (GAN) [104] were used for data augmentation in the field of water segmentation for the first time [105], to address the limited number of SAR image samples. This replaced time-consuming manual annotation and thus improved the efficiency of the model.
Different from previous data augmentation operations, such as rotating, stretching, and flipping images, GAN can produce additional SAR images with labels, so as to increase the scale of the data set and improve the segmentation performance of the subsequent U-Net architecture. In addition, FCN, as a comparative model, was also trained on the same data set, confirming that U-Net outperforms FCN, especially in terms of mean IOU. In the same year, Dai, M. et al. [106] proposed a loss function based on the edge area and a novel training data generation method to improve the segmentation ability of the network. These were used to build an improved water segmentation network based on the Bilateral Segmentation Network (BiSeNet) [107]. BiSeNet restricts the input size by resizing or cropping to reduce computational complexity. Although this is simple and effective, it causes loss of spatial details, especially in predictions near the boundary. The new loss function, based on edge information, can effectively alleviate this problem. Experiments showed that the modified model has advantages in both speed and segmentation accuracy. In 2020, Denbina, M. et al. [108] employed two CNN architectures, U-Net and SegNet, to map flooded areas in data collected by the NASA/JPL Unmanned Aerial Vehicle Synthetic Aperture Radar (UAVSAR). Both U-Net and SegNet show higher accuracy than ML classifiers, and the accuracy of U-Net is slightly higher than that of SegNet. Also in the same year, Zhang, L. et al. [109] proposed a coarse-to-fine framework to segment floods in SAR images. In detail, the coarse segmentation results came from the binary segmentation of GF-3 SAR images and from the water indexes of GF-1/6 multispectral and Zhuhai-1 hyperspectral images. Then, the resulting noisy label images were fed to U-Net to perform fine segmentation. The results demonstrate the reliability and accuracy of the process. In 2021, Katiyar, V. et al. [110] employed U-Net to explore which polarization combination is more suitable for separating permanent water bodies and inundated areas in SAR images, and experiments show that at least two polarization images are needed in water area segmentation for high-precision mapping of flooded areas. In the same year, Lalchhanhima, R. [111] introduced the concept of inception into U-Net, replacing a larger convolution kernel with several smaller convolution kernels. This method not only increases the receptive field of the model, but also reduces computational cost. Dong, Z. et al. [112] then employed five existing CNN models, including HRNet, DenseNet, SegNet, ResNet and DeepLab v3+, to detect floods in Poyang Lake. The results show that, compared with traditional algorithms, DL methods suppress speckle noise better. In 2021, Konapala, G. et al. [113] employed U-Net to fuse Sentinel-1 and Sentinel-2 images to explore the potential of multi-source remote sensing images for water-body segmentation.
After 2020, scholars gradually tried to modify U-Net to achieve better performance, or to explore the applicability of more advanced semantic segmentation models from the field of computer vision to water segmentation in SAR images [114]. In 2021, Bayesian inference was introduced into U-Net for water-body segmentation [115], creating a Bayesian neural network that treats dropout as a random sampling layer in the U-Net architecture. In detail, multiple forward passes are used to build a sampling distribution, from which the model uncertainty of each pixel in the segmentation mask is estimated. The Bayesian neural network reached 95.24% dice similarity, an overall improvement in segmentation performance compared to U-Net. U-Net lowers the size of the data set required by the neural network. Nevertheless, one of the biggest challenges in using DL to segment water bodies in SAR images is the lack of accurately annotated data sets suitable for correctly training supervised algorithms. Asaro, F. et al. [116] presented an effective solution that exploits a weakly labeled dataset rather than relying entirely on manually annotated labels. In detail, this work showed how to train U-Net to segment the water surface in SAR images from weakly labeled datasets.
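The idea of treating dropout as a sampling layer can be sketched as follows: dropout stays active at inference, the same input is passed through the network several times, and the per-pixel mean and variance of the predicted water probability serve as the prediction and its uncertainty. The tiny one-hidden-layer network, its weights and the two-pixel image are assumptions for illustration; they stand in for a trained U-Net and do not reproduce the model of [115].

```python
import math
import random

# Hypothetical weights of a tiny per-pixel network (not a trained U-Net).
WEIGHTS = {"hidden": [(-4.0, 2.0), (4.0, -2.0)], "out": [3.0, -3.0], "bias": 0.0}

def forward(pixel, weights, rng, drop_p):
    """One stochastic pass: ReLU hidden units with dropout kept on at inference."""
    hidden = [max(0.0, w * pixel + b) for w, b in weights["hidden"]]
    # Inverted dropout: surviving units are rescaled by 1 / (1 - drop_p).
    kept = [h / (1.0 - drop_p) if rng.random() >= drop_p else 0.0 for h in hidden]
    logit = sum(v * h for v, h in zip(weights["out"], kept)) + weights["bias"]
    return 1.0 / (1.0 + math.exp(-logit))  # probability of water

def mc_dropout(image, weights, passes=50, drop_p=0.5, seed=0):
    """Return per-pixel mean and variance of the water probability."""
    rng = random.Random(seed)
    samples = [[[forward(px, weights, rng, drop_p) for px in row] for row in image]
               for _ in range(passes)]
    h, w = len(image), len(image[0])
    mean = [[sum(s[r][c] for s in samples) / passes for c in range(w)]
            for r in range(h)]
    var = [[sum((s[r][c] - mean[r][c]) ** 2 for s in samples) / passes
            for c in range(w)] for r in range(h)]
    return mean, var

# One dark (water-like) and one bright (land-like) pixel, assumed intensities.
mean_map, var_map = mc_dropout([[0.1, 0.9]], WEIGHTS)
print(mean_map, var_map)
```

Pixels with a high variance across passes are the ones where the segmentation mask is least reliable, which in practice tends to concentrate along water boundaries.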
In addition to the above methods of introducing transfer learning, Bayesian inference and weak supervision into U-Net (an encoder-decoder structure), scholars have gradually tended to explore how to modify the structure of U-Net so that it can deal with more water-body segmentation scenarios, such as multi-source satellite data fusion and multi-temporal water-body segmentation. Specifically, Li, J. et al. [117] introduced an attention block and spatial pyramid pooling (SPP) modules into U-Net to construct a robust water body extraction network (PA-U-Net). In detail, the SPP module is located between the encoder and decoder and is used for multi-scale feature extraction from deep layers. The attention block lies in the decoder and focuses the network on the detection of water objects. The attention block [118] makes the neural network focus on more valuable features, so as to increase the sparsity of the network model. The SPP module [119] can fuse more contextual semantic information, so that the segmentation layer at the end of the model can sense more global information and reduce the probability of false segmentation. The principle of SPP is shown in Figure 5a. Further, Dirscherl, M. et al. [120] proposed a modified U-Net for segmentation of supraglacial lakes in single-polarization Sentinel-1 images. In the modified model, skip connections are used to fuse shallow and deep features, and the Atrous Spatial Pyramid Pooling (ASPP) module [121] is used for multiscale feature extraction. Based on the idea of dilated convolution [122,123], ASPP introduces a dilation rate on the basis of SPP, which allows the neural network model to capture multiscale semantic information. The principle of ASPP is shown in Figure 5b. In addition, the researchers employed the modified U-Net to extract water bodies in Sentinel-1 and Sentinel-2 images separately, and realized multi-source satellite data fusion through decision-level image fusion to further improve accuracy. The extraction results confirmed the reliability of the workflow. For water bodies, the kappa coefficient was 0.925 and the F1 score was 93.0%. Regarding the attention mechanism, Ren, Y. et al. [124] further introduced the dual-attention mechanism into the original U-Net, forming a dual-attention U-Net model (DAU-Net), which improves the feature representation ability of the model and achieves higher-accuracy water-body segmentation. Dual-attention is composed of the position attention module (PAM) and the channel attention module (CAM), located between the encoding and decoding structures of U-Net, as Figure 6 shows. PAM aggregates the feature values of all locations through weighted summation to update the feature value at a specific location, which helps capture the global spatial correlation between any two locations. When CAM updates the feature values at a position, the channels that respond to the same or similar categories are assigned large weights, thereby capturing interdependencies among channels. Experiments on three Sentinel-1 images show that the water-body segmentation accuracy of DAU-Net is about 1% higher than that of U-Net.
On the other hand, with the continuous development of DL in the field of computer vision, scholars in the field of SAR water-body segmentation are also trying more advanced network models, such as DeepLab and the High-Resolution Network (HRNet), to segment water bodies in SAR images. In 2021, Kim, M. et al. [125] employed representative DL segmentation models, such as FCN, U-Net, DeepUNet, and HRNet, to perform water-body segmentation tasks in KOMPSAT-5 images. DeepUNet is a deeper model with more layers than U-Net. Its characteristic feature is an added 'plus layer', which can keep the network loss from growing and the model from falling into a local optimum. HRNet is one of the latest network models for semantic segmentation; it retains high-resolution features while also extracting low-resolution features. The results show that the performance of the DeepUNet model is similar to that of U-Net, while that of HRNet is 8.49% higher than that of U-Net. The DeepUNet results show that simply increasing the depth of the network model cannot significantly improve the accuracy of water-body segmentation in SAR images. This is because the water/land boundary information in SAR images is relatively clear, and the network can segment it well without extracting target features that are too abstract. In other words, the network model should pay more attention to image semantic information, or to the inherent characteristics of SAR images, to improve segmentation accuracy. In the same year, Verma, U. et al. [126] employed U-Net and DeepLabV3+ [121] to perform river segmentation and subsequent width measurement. Both models performed competitively and achieved an mIoU of 0.96. However, compared to DeepLabV3+, U-Net obtained a more satisfactory segmentation result. This is because the dilated convolution in the DeepLabV3+ model leaves some pixels out of the operation, and these pixels happen to be key to the segmentation results. In addition, this work obtained a significant improvement in accuracy in measuring river width compared to existing river width measurement approaches. However, in some instances, U-Net and DeepLabV3+ failed to identify a very narrow river.
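The scores quoted in this section (pixel accuracy, F1 or dice, and mean IoU) all derive from the same per-pixel confusion counts. A minimal sketch for binary water masks follows, assuming 1 denotes water and 0 denotes background; the two small masks are made-up examples, not results from the cited works.

```python
def confusion(pred, truth):
    """Count true/false positives and negatives for binary masks (1 = water)."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def ratio(num, den):
    """Safe division: an empty class counts as a perfect score."""
    return num / den if den else 1.0

def metrics(pred, truth):
    tp, fp, fn, tn = confusion(pred, truth)
    iou_water = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)
    return {
        "pixel_accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # identical to the dice coefficient
        "miou": (iou_water + iou_background) / 2,
    }

# Hypothetical prediction that misses one water pixel of a narrow channel.
pred = [[1, 1, 0, 0], [1, 0, 0, 0]]
truth = [[1, 1, 1, 0], [1, 0, 0, 0]]
scores = metrics(pred, truth)
print(scores)
```

For binary masks, F1 and dice coincide, and mean IoU averages the water and background IoU, so a narrow river that is missed entirely can hide behind a high background score.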

Water-Body Segmentation Based on Novel Network Models
In recent years, with the increasing understanding of DL among scholars in the field of remote sensing, more novel network models for water body extraction in SAR images have been springing up. In 2019, Zhang, J. et al. [127,128] explored the role of two new convolution structures, depth-wise separable convolution [129] and dilated convolution, and then presented a depth-wise separable dilated convolution network architecture. The network extracts high-dimensional features by means of depth-wise separable and dilated convolutions (DSDC), and then constructs an up-sampling and decoding module, based on bilinear interpolation, to interpret the extracted features. Finally, the corresponding segmentation results are output. In contrast to the standard convolution operation shown in Figure 7a, the principles of depth-wise separable and dilated convolutions are shown in Figure 7b,c, respectively. Depth-wise separable convolutions can adjust the size of the feature maps of the upper layer, or the number of feature maps of the upper layer, so as to reduce calculation cost. The segmentation results showed that the network model remarkably improved segmentation accuracy and robustness and shortened the time spent, compared to existing network models, including U-Net and DeepLabV3+. In the same year, Gao, Y. et al. [130] proposed a transferred multi-level fusion network (MLFN) to segment water/ice in SAR images, which employed cascaded dense blocks to optimize the feature extraction capability of the model. In order to make full use of the complementary information among low-, mid-, and high-level feature representations, multi-layer feature fusion was introduced. Thus, MLFN is able to achieve more discriminative feature extraction. In 2020, Li, N. et al. [57] employed FCM for rough segmentation, and then utilized a lightweight residual CNN for local super-resolution restoration, to achieve high-precision water segmentation from SAR images. Bai, Y. et al. [131] employed BASNet [132] to identify surface water, permanent water, and temporary water. BASNet combines a dense encoder-decoder structure, similar to U-Net, and a modified Residual Refinement Module (RRM). The encoder-decoder structure generates a coarse probability prediction map from the input image, and RRM learns the loss between the map and the ground truth. Moreover, with hybrid loss [133], the model can focus more on boundary information and increase the reliability of the prediction. Experiments show that, on the Sen1Floods11 dataset, the segmentation accuracy of this method for permanent water is 93.84%, which is 3.66% higher than that of DeepLabV3+, and the accuracy of temporary water segmentation is 92.81%, which is 2.54% higher than that of DeepLabV3+.
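The savings from the depth-wise separable and dilated convolutions discussed above can be checked with simple arithmetic. The sketch below counts weights (biases ignored) for a standard and a depth-wise separable layer, computes the effective kernel size under dilation, and runs a one-dimensional dilated convolution to show which samples are skipped. The layer sizes and the toy signal are assumptions for illustration and are not the configuration of [127,128].

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depth-wise k x k filter per input channel, then a 1 x 1 point-wise mix."""
    return k * k * c_in + c_in * c_out

def effective_kernel(k, dilation):
    """Span covered by a k-tap kernel whose taps are `dilation` samples apart."""
    return k + (k - 1) * (dilation - 1)

def dilated_conv1d(signal, kernel, dilation):
    """Valid 1-D dilated convolution (no kernel flip, as in DL frameworks)."""
    span = (len(kernel) - 1) * dilation
    return [sum(kernel[i] * signal[s + i * dilation] for i in range(len(kernel)))
            for s in range(len(signal) - span)]

# Assumed layer: 3 x 3 kernel, 64 input channels, 128 output channels.
print(standard_conv_params(3, 64, 128), separable_conv_params(3, 64, 128))
print(effective_kernel(3, 2))
# With dilation 2 the first output uses samples 1, 3 and 5 and skips 2 and 4.
print(dilated_conv1d([1, 2, 3, 4, 5, 6], [1, 1, 1], 2))
```

The skipped samples in the last line are the same effect noted for DeepLabV3+: a dilated kernel enlarges the receptive field without extra weights, at the price of not touching some neighbouring pixels.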

Portable Module in Neural Network for Water-Body Segmentation
Recently, while some scholars began to propose novel network models, others tried to develop novel portable modules that can be plugged into models to better perform the task of water area segmentation. In 2021, Zhang, J. et al. [134] proposed the cascaded fully-convolutional network (CFCN) to improve the performance of segmenting water bodies in high-resolution SAR images. To address the resolution loss caused by convolutions with large strides in CNN, they designed fully-convolutional Up-sampling Pyramid Networks (UPNs) to reduce this loss and realize pixel-wise water-body segmentation in SAR images. Then, for the fuzzy water boundary, fully-convolutional conditional random fields (FC-CRFs) were introduced into UPNs, which reduce computational complexity and automatically learn the Gaussian kernels in CRFs, leading to higher boundary accuracy. In addition, a novel variable focal loss (VFL) function was proposed to improve the inefficient training caused by the imbalanced distribution of categories in training datasets. It is based on a frequency correlation factor, rather than the constant weighting factor of focal loss. The VFL function improves the water-body segmentation performance not only of the UPN model but also of the DeepLab model, confirming the portability of the VFL. Li, N. et al. [135] combined the Refined-Lee filter concept [136] and the filtering characteristics of convolution to propose the Refined-Lee Kernel (RLK), which can optimize the internal weights of the convolution kernel according to the geometric characteristics of the river. The principle of RLK is shown in Figure 8. Further, River-Net, a novel river extraction neural network model, was proposed to segment the Yellow River in Sentinel-1 images. The operation of a convolution kernel is similar to a filtering process, which means that it is time-consuming to filter the feature maps again after the convolution operation. The RLK module is instead obtained by applying Refined-Lee filtering directly to the convolution kernel matrix, which not only strengthens the feature extraction ability of the convolution kernel, but also reduces the cost of computation. On the other hand, River-Net can refer to more contextual semantic information and weaken the negative effect on the segmentation results of independent water bodies with characteristics similar to rivers, such as pools and fish ponds around the river, in SAR images. The results showed that the model is superior to the U-Net and DeepLabV3+ models. In addition, the RLK module can also improve the segmentation performance of FCN and U-Net.
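For reference, the standard focal loss that VFL modifies can be written in a few lines. The sketch below implements only the original focal loss with its constant weighting factor α and focusing parameter γ; the frequency correlation factor of VFL is not specified in this review and is therefore not reproduced here. The default values of `alpha` and `gamma` and the example probabilities are assumptions.

```python
import math

EPS = 1e-12  # guards against log(0)

def cross_entropy(p_water, label):
    """Binary cross entropy for one pixel; label 1 = water, 0 = background."""
    p_t = p_water if label == 1 else 1.0 - p_water
    return -math.log(max(p_t, EPS))

def focal_loss(p_water, label, alpha=0.25, gamma=2.0):
    """Standard focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = p_water if label == 1 else 1.0 - p_water
    alpha_t = alpha if label == 1 else 1.0 - alpha  # constant class weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, EPS))

# An easy water pixel (p = 0.9) is down-weighted far more than a hard one (p = 0.3).
easy, hard = focal_loss(0.9, 1), focal_loss(0.3, 1)
print(easy, hard, hard / easy)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the cross entropy, which shows that the modulating term (1 − p_t)^γ is what shifts training effort towards hard, often minority-class, pixels.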

Conclusions
Segmenting water bodies in SAR images is an important subject in the field of remote sensing. As more and more high-resolution, multi-band, and multi-polarized satellites become available, research on this subject is becoming increasingly diversified, systematic and innovative. However, the complex terrain information in SAR images and the ubiquitous coherent speckle noise remain obstacles to the task of water-body segmentation. This article systematically reviews research results in this field over the past 30 years. According to the characteristics of the algorithms, this research is divided into ML-based and DL-based water body extraction methods. The essence of the traditional water-body segmentation methods is to manually design a mapping, or criterion, to determine the category of pixels in SAR images. These methods have some problems, such as excessive dependence on the constructed mapping relationship, overly complex algorithm design, and low prediction accuracy. In addition, unreasonable algorithm design usually leads to weak feature expression. For SAR images with a large amount of coherent speckle noise, it is difficult for these mapping relations to adapt to all pixels in the entire SAR image. In DL-based methods, by contrast, the neural network model structure can be regarded as a torso and the trainable parameters as the blood flowing in it. Training then creates an adaptive mapping relationship that can adapt to all pixels. This mapping relationship is based not only on the current pixel and its neighboring pixels, but also on more contextual semantic information. These methods do not rely on specific algorithms or assumptions, and their segmentation results are better than those of traditional methods. However, DL methods also have their own limitations. Most existing network models require a large number of samples and a large amount of manual labor to label data sets for training. 
In addition, since neural network technology aims at constructing the mapping relationship adaptively, its internal mechanism is difficult to interpret. Therefore, reducing the manual labeling cost and improving the interpretability of DL methods are promising directions.

Future Prospects
The introduction of DL methods has made the extraction of water bodies in SAR images by computers more 'intelligent'. However, there is still much room for improvement in the methods of segmenting water bodies from SAR images. According to relevant articles on water-body segmentation for SAR images in the past 30 years, the extraction of water bodies from SAR images has relied more on image intensity information than on amplitude, phase, or even complex-valued information. Because the latter kinds of information are more abstract, it is quite difficult for traditional methods, such as edge detection, ACM, and the random forest method, to establish an accurate mapping relationship to extract water bodies from them. DL methods, however, can exploit such information without added difficulty. Further, the abundant polarimetric scattering information from polarimetric SAR should be fully utilized. The Wishart classifier, and the related polarization decomposition method, provide a solution to water-body segmentation for SAR images by using polarization information. However, the Wishart classifier and polarization decomposition are based on the coherence matrix. The premise of the coherence matrix is that the propagation medium meets the reciprocity condition, which simplifies the model but damages the original SAR data. At present, the use of complex-valued SAR image information for SAR image classification has achieved preliminary results, indicating that it is feasible to feed this abstract SAR image information to neural networks to achieve water-body segmentation for SAR images with higher accuracy.
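
The relationship between these kinds of information can be made concrete with a short sketch. It assumes that a single-look complex (SLC) image is available as a two-dimensional complex array; the function names `slc_channels` and `to_real_imag_stack` are hypothetical, and stacking the real and imaginary parts is only one possible way of presenting complex-valued data to a real-valued network.

```python
import numpy as np

def slc_channels(slc):
    """Derive amplitude, intensity and phase from a complex SAR image."""
    slc = np.asarray(slc, dtype=complex)
    amplitude = np.abs(slc)      # |z|
    intensity = amplitude ** 2   # |z|^2, the quantity most methods rely on
    phase = np.angle(slc)        # arg(z), in radians within (-pi, pi]
    return amplitude, intensity, phase

def to_real_imag_stack(slc):
    """Keep the full complex information as two real-valued channels."""
    slc = np.asarray(slc, dtype=complex)
    # Output shape: (2, height, width), with the real part first.
    return np.stack([slc.real, slc.imag], axis=0)
```

Intensity and amplitude discard the phase, whereas the two-channel stack preserves everything contained in the complex pixel values.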
In addition, how to train network models on small-scale SAR data remains an urgent problem. Although a variety of SAR sensors continue to appear, the limited scale of available SAR image data still conflicts with the large number of samples required to train neural networks. Worse, water and land boundary information occupies only a small part of the whole SAR image, which often leads to imbalance of the dataset in water-body segmentation tasks. Existing solutions focus mostly on data augmentation methods from the field of computer vision, such as copying, rotating, and flipping, as well as the windowing operations commonly used in previous water-body segmentation for SAR images. These methods attempt to increase the robustness of the network model and avoid overfitting. However, the models need to learn from more diverse sample images, rather than from simple duplication, rotation and windowing of the original samples. In this regard, the generative adversarial network (GAN) is a promising method to generate more 'intelligent' land and water boundary information and other sample image information, in order to reduce the imbalance of the dataset and further augment the data.
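
The conventional augmentation operations mentioned above can be sketched as follows, assuming that an image and its water mask are two-dimensional arrays of equal size. The function names and the window parameters are hypothetical; the point is only that the same geometric operation must be applied to the image and to its label mask.

```python
import numpy as np

def sliding_windows(image, mask, size, stride):
    """Windowing: cut matching square patches from an image and its mask."""
    height, width = image.shape[:2]
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            yield (image[top:top + size, left:left + size],
                   mask[top:top + size, left:left + size])

def flips_and_rotations(image, mask):
    """Return the 8 rotated/flipped variants of an (image, mask) pair."""
    variants = []
    for k in range(4):
        # Rotate by k * 90 degrees, identically for image and mask.
        img_r, msk_r = np.rot90(image, k), np.rot90(mask, k)
        variants.append((img_r, msk_r))
        # Add the left-right mirrored version of each rotation.
        variants.append((np.fliplr(img_r), np.fliplr(msk_r)))
    return variants
```

All eight variants contain exactly the same pixels as the original patch, which illustrates why such operations add little diversity to the boundary samples.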
On the other hand, with the rapid development of DL methods in the field of computer vision, scholars are paying more attention to utilizing various excellent modules, such as separable convolution, dilated convolution, spatial pyramid pooling, and self-attention mechanism modules, and even to proposing novel modules, in order to improve the accuracy of network models in extracting water bodies from SAR images. These modules are functionally diverse but lack systematic integration. When data sets are used to train established neural network models, a large number of experiments are often needed to optimize the hyperparameters. Similarly, by exploring novel modules conducive to water body extraction and treating the various excellent existing modules as "hypermodules", the modules most suitable for extracting water from SAR images can be found. This would help guide the design of neural network models and further improve the accuracy of water-body extraction.
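
Of the modules listed above, dilated convolution is the simplest to illustrate. A kernel of size k with dilation d spans d(k - 1) + 1 input samples, so the receptive field grows without additional weights. The following one-dimensional sketch, with the hypothetical function name `dilated_conv1d`, computes the operation without padding and, as is usual in DL frameworks, without flipping the kernel.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """1-D dilated convolution (cross-correlation form, no padding)."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Number of input samples covered by the dilated kernel.
    span = dilation * (len(kernel) - 1) + 1
    n_out = len(x) - span + 1
    out = np.zeros(max(n_out, 0))
    for i in range(len(out)):
        # Kernel taps are placed `dilation` samples apart.
        taps = x[i:i + span:dilation]
        out[i] = np.dot(taps, kernel)
    return out
```

For example, a three-tap kernel with dilation 2 covers five input samples, the same extent as an ordinary five-tap kernel, while keeping only three weights.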
To sum up, although the adoption of various DL models and architectures to segment water bodies in SAR images is increasing, with promising results, the following issues remain to be addressed:

• Exploit the characteristics of the SAR image itself, making full use of more amplitude, phase and complex information of SAR;
• Explore the solutions for small-scale SAR data to train network models, and data augmentation methods to weaken the imbalance of datasets;
• Develop novel modules suitable for water body extraction from SAR images and integrate existing excellent modules.