Superpixel Segmentation of Polarimetric Synthetic Aperture Radar ( SAR ) Images Based on Generalized Mean Shift

The mean shift algorithm has been shown to perform well in optical image segmentation. However, the conventional mean shift algorithm performs poorly if it is directly used with Synthetic Aperture Radar (SAR) images due to the large dynamic range and strong speckle noise. Recently, the Generalized Mean Shift (GMS) algorithm with an adaptive variable asymmetric bandwidth has been proposed for Polarimetric SAR (PolSAR) image filtering. In this paper, the GMS algorithm is further developed for PolSAR image segmentation. A new merging predicate that is defined in the joint spatial-range domain is derived based on the GMS algorithm. A pre-sorting strategy and a post-processing step are also introduced into the GMS segmentation algorithm. The proposed algorithm can be directly used for PolSAR image superpixel segmentation without any pre-processing steps. Experiments using Airborne SAR (AirSAR) and Experimental SAR (ESAR) L-band PolSAR data demonstrate the effectiveness of the proposed superpixel segmentation algorithm. The parameter settings, stability, quality, and efficiency of the GMS algorithm are also discussed at the end of this paper.


Introduction
In recent decades, object-based image analysis has become a new paradigm in remote sensing [1][2][3].Furthermore, in recent years, object-based classification for Polarimetric Synthetic Aperture Radar (PolSAR) images has become more and more popular [4][5][6][7][8][9][10][11][12].Image segmentation is an important pre-processing step in image interpretation and analysis, and object detection, recognition, and tracking [13].However, the state-of-the-art segmentation algorithms, such as the Fractal Net Evolution Approach (FNEA) [14], mean shift [13], spectral clustering [15], Simple Linear Iterative Clustering (SLIC) [16], and Statistical Region Merging (SRM) [17], are mostly designed for optical images.It is therefore necessary to develop segmentation algorithms specifically for PolSAR data.To date, various PolSAR image segmentation algorithms have been proposed.These algorithms can be classified into three categories according to the segmentation results, as follows: (1) Conventional Segmentation Methods.The basic principle of these methods is merging the adjacent and homogeneous pixels into the same region, and dividing the heterogeneous pixels into different regions.The size and shape of the obtained regions vary with the image scenes.Dong et al., [4] proposed using a Gaussian Markov Random Field (GMRF) model for PolSAR image segmentation and size even in large homogeneous areas, preserving the statistical characteristics of the image.Therefore, superpixel segmentation is more suitable for the object-based classification algorithms that are based on statistics such as the Wishart classifier.
Mean shift segmentation is a commonly used segmentation algorithm.Because the segmentation result of the mean shift algorithm is often piecemeal, it is usually regarded as a superpixel segmentation algorithm.However, the conventional mean shift algorithm is designed only for optical images and cannot be directly used with PolSAR images.Recently, Lang et al., [32] extended the conventional mean shift algorithm according to the characteristics of PolSAR images.The proposed Generalized Mean Shift (GMS) algorithm can be directly used for Synthetic Aperture Radar (SAR) and PolSAR image filtering, avoiding unnecessary information loss.In this paper, the GMS algorithm is further extended for the superpixel segmentation of PolSAR images.
The main contributions of this paper are as follows: (1) We propose a new merging predicate to be used for superpixel segmentation based on the basic GMS formula, and we further propose the GMS algorithm based on the new merging predicate.(2) To improve the accuracy of the segmentation, we introduce a pre-sorting strategy into the GMS segmentation algorithm after comparing the pre-sorting strategy and the row-column strategy used in the conventional mean shift segmentation algorithm.
(3) To suppress the influence of speckle noise and preserve strong point targets, we introduce a post-processing step into the GMS segmentation algorithm.
The rest of this paper is organized as follows.Firstly, in Section 2, we briefly review the materials and methods that are used in this study.A new merging predicate defined in the joint spatial-range domain is derived based on the GMS algorithm.A pre-sorting strategy and a post-processing step are introduced into the GMS segmentation algorithm.Section 3 describes the experimental results that were obtained with AirSAR and ESAR L-band PolSAR data and evaluates the effectiveness of the proposed algorithm.The parameter settings, stability, quality, and efficiency of the GMS segmentation algorithm are discussed in Section 4. Finally, in Section 5, we conclude the paper and summarize the next steps in this work.

Experimental Data Sets and Preprocessing
NASA/JPL Airborne SAR (AirSAR) and DLR Experimental SAR (ESAR) L-band PolSAR data were selected to evaluate the effectiveness of the proposed superpixel segmentation algorithm by comparison with the original mean shift algorithm [13], the normalized cuts (Ncut) algorithm used in [8], and the SLIC-PolSAR algorithm using the grid-centered initialization strategy (SLIC-GC) [24].The data sets that were used in this study are freely available from the website of the European Space Agency (ESA) (https://earth.esa.int/web/polsarpro/airborne-data-sources).Because the Ncut and SLIC-GC algorithms are sensitive to speckle noise, filtering processing is required before the segmentation step.In these experiments, the PolSAR data were preprocessed by the refined Lee filter [33].The GMS segmentation algorithm involves the GMS filtering process, so no extra filtering processing was needed.The first data set was a four-look AirSAR L-Band PolSAR image from Flevoland, the Netherlands.Since the original image was as large as 750 × 1024, for the convenience of showing details of the image, a region of 380 × 430, which was as the same as the one shown in Figure 9 in [8], was sampled from the original image.Because the points and lines in the image were small and fine, a speckle filter with a large window might have blurred the points, lines, and edges, which are very important in remote sensing image segmentation, so we used a relatively small window, 3 × 3, for the refined Lee filter.The Red-Green-Blue color image composited by Pauli decomposition parameters (Pauli-RGB image) of this area is shown in Figure 1a, and the 3 × 3 refined Lee filtered image is shown in Figure 1b.Since the filtering window was small, strong speckle noise can still be observed in Figure 1b.The second data set was an ESAR L-band PolSAR image from the Oberpfaffenhofen test site, Germany.The original single-look image was 2861 × 1540 in size.To facilitate the visual interpretation and evaluation, a pre-processing of two looks in the azimuth direction was performed to equalize the resolutions of the azimuth and range, and a region of 540 × 888, which was the same as the one shown in Figure 9 in [32], was sampled from the original image.Since the points and lines in the ESAR image were bigger and thicker than those in the AirSAR image, and the speckle noise of the former was stronger than the latter, we chose a 7 × 7 window for the refined Lee filter.The Pauli-RGB image of this area is shown in Figure 2a, and the 7 × 7 refined Lee filtered image is shown in Figure 2b.It can be observed from Figure 2 that the speckle noise is well suppressed, but some details are blurred, such as the dark lines in the areas that are marked by the ellipse.

Conventional Mean Shift
The mean shift procedure was proposed for nonparametric density gradient estimation by Fukunaga and Hostetler [Error!Reference source not found.],and it was improved and applied to data and image analysis, including smoothing, segmentation, clustering, and real-time tracking of objects by Cheng [34] and Comaniciu et al., [13,[35][36][37][38].The following is a brief introduction to the conventional mean shift algorithm.The second data set was an ESAR L-band PolSAR image from the Oberpfaffenhofen test site, Germany.The original single-look image was 2861 × 1540 in size.To facilitate the visual interpretation and evaluation, a pre-processing of two looks in the azimuth direction was performed to equalize the resolutions of the azimuth and range, and a region of 540 × 888, which was the same as the one shown in Figure 9 in [32], was sampled from the original image.Since the points and lines in the ESAR image were bigger and thicker than those in the AirSAR image, and the speckle noise of the former was stronger than the latter, we chose a 7 × 7 window for the refined Lee filter.The Pauli-RGB image of this area is shown in Figure 2a, and the 7 × 7 refined Lee filtered image is shown in Figure 2b.It can be observed from Figure 2 that the speckle noise is well suppressed, but some details are blurred, such as the dark lines in the areas that are marked by the ellipse.The second data set was an ESAR L-band PolSAR image from the Oberpfaffenhofen test site, Germany.The original single-look image was 2861 × 1540 in size.To facilitate the visual interpretation and evaluation, a pre-processing of two looks in the azimuth direction was performed to equalize the resolutions of the azimuth and range, and a region of 540 × 888, which was the same as the one shown in Figure 9 in [32], was sampled from the original image.Since the points and lines in the ESAR image were bigger and thicker than those in the AirSAR image, and the speckle noise of the former was stronger than the latter, we chose a 7 × 7 window for the refined Lee filter.The Pauli-RGB image of this area is shown in Figure 2a, and the 7 × 7 refined Lee filtered image is shown in Figure 2b.It can be observed from Figure 2 that the speckle noise is well suppressed, but some details are blurred, such as the dark lines in the areas that are marked by the ellipse.

Conventional Mean Shift
The mean shift procedure was proposed for nonparametric density gradient estimation by Fukunaga and Hostetler [Error!Reference source not found.],and it was improved and applied to data and image analysis, including smoothing, segmentation, clustering, and real-time tracking of objects by Cheng [34] and Comaniciu et al., [13,[35][36][37][38].The following is a brief introduction to the conventional mean shift algorithm.

Conventional Mean Shift
The mean shift procedure was proposed for nonparametric density gradient estimation by Fukunaga and Hostetler [34], and it was improved and applied to data and image analysis, including smoothing, segmentation, clustering, and real-time tracking of objects by Cheng [35] and Comaniciu et al., [13,[36][37][38][39].The following is a brief introduction to the conventional mean shift algorithm.
Given n independent and identically distributed (i.i.d.) random vectors x i ∈R d , i = 1, . . ., n, in the d-dimensional space R d , the PDF of x i can be estimated by the multivariate kernel density estimator with kernel K(x) and a symmetric positive definite d × d bandwidth matrix H: where In practice, to reduce the complexity of the estimation, the bandwidth matrix H is chosen as proportional to the identity matrix H = h 2 I.In pattern recognition, a special class of radially symmetric kernels satisfying K(x) = c k k x 2 are usually used, where the function k(x) is called the kernel profile, and constant c k makes the integral of K(x) equal to one.Accordingly, the PDF estimator (Equation ( 1)) can be rewritten as: A key step in the feature space analysis with the underlying density f (x) is to find the local maximum values of the density, i.e., the modes of the density.For a continuous PDF, the modes are located where the gradient ∇f (x) = 0.The mean shift procedure is an efficient way to locate these modes without estimating the PDF.The mean shift vector is calculated, as follows [13]: where q(x) = −k'(x) and Q(x) = c q q(||x|| 2 ).The kernel K(x) was called the shadow of the kernel Q(x) in [35].From Equation (4), it can be found that the mean shift vector is the difference between the weighted mean of the area bounded by bandwidth h and the center point x.At each point, the mean shift vector always points to the direction of maximum increase of f (x), so it can define a path leading to a mode of f (x).The mean shift procedure is defined as an iterative process.At each iteration, the mean shift vector (Equation ( 4)) is calculated and it is added to the center point to obtain the new center until it converges to a mode point where the density gradient is zero.

Mean Shift Filtering
In order to take full advantage of the spatial information that is contained in the image, Comaniciu and Meer [13] proposed introducing a multivariate kernel that is defined in the joint spatial-range domain into the mean shift procedure: where x s is the spatial part, x r is the range part of feature vector x, k s (x) and k r (x) are the profiles used in the two domains, h s and h r are the employed kernel bandwidths, and C is the corresponding normalization constant.
Mean shift filtering is a straightforward application of the mean shift procedure.For each point x defined in the d-dimensional space, find its mode point x m by using the mean shift iterative process, and assign the filtered value of x = (x s , x m,r ).

Mean Shift Segmentation
The principle of the conventional mean shift segmentation is straightforward [13]: firstly, run the mean shift filtering procedure for the image and store both the spatial and range information of the mode point; then, merge the pixels whose mode points' spatial distances are less than h s and range distances are less than h r .
We suppose that x k and z k (k = [1, N]) are the d-dimensional pixels in the original input image I and the filtered image I f defined in the joint spatial-range domain, and L i is the region label that the i-th pixel belongs to.The procedure of the conventional mean shift segmentation algorithm can then be described, as follows: Run the mean shift filtering procedure for the image I, obtain the filtered image I f , and save the coordinate of each pixel's mode point.

2.
Judge the adjacent pixels in turn.Merge the pixels in I f whose mode points' spatial distances are less than h s and range distances are less than h r .3.
(Optional) Merge the small regions whose sizes are less than M pixels into the most similar adjacent regions.

4.
Suppose that the number of final segmented regions is m, and the segmented regions are C p , p = 1, . . .,m.
For each pixel k = 1, . . .,N, assign From the procedure of mean shift segmentation, it can be seen that this algorithm uses the region growing and merging technique, which contains two elements: the merging predicate confirming whether adjacent regions are merged or not, and the merging order followed to test the merging of regions.In this paper, the mean shift segmentation algorithm is improved from these two aspects, so that it is appropriate for PolSAR images.

Generalized Mean Shift
It is known that the probability distribution of single-look and N-look amplitude SAR data is asymmetric, as is the intensity.However, the bandwidth of the conventional mean shift algorithm is symmetric.Furthermore, a single fixed bandwidth is also inadequate, since the range of SAR data is usually very large.Therefore, pre-processing, such as logarithmic transformation and normalization, has to be applied before the mean shift filter is used for SAR images in the existing approaches [40][41][42][43].Clearly, these pre-processing steps will result in a loss of some information and reduce the segmentation quality.To overcome these problems, Lang et al., [32] proposed using an adaptive variable asymmetric bandwidth, and derived the GMS: where x = [x 1 , x 2 , . . ., x p ] is the center pixel in an image with p channels, x i is a sample pixel, q(||x|| 2 ) is the profile of the kernel function Q(x), and h(x) is the bandwidth vector: where xj is the Minimum Mean Square Error (MMSE) [44,45] estimated value of the center pixel in channel j, and (σ 1 , σ 2 ) is the sigma range that maintains the mean value of the SAR intensity or amplitude [32,46].Since the sigma range was described by parameter ξ in [46], the bandwidth vector h is determined only by parameter ξ.
If we suppose that a = [a 1 ,a 2 , . . .,a n ] and b = [b 1 ,b 2 , . . .,b n ] are two n-dimensional vectors, then the vector division a/b is defined as: According to Equations ( 7) and ( 8), the Euclidean norm used in the profile function is: where the subscript j stands for the dimension x i,j ≤x j , and the subscript k stands for the dimension x i,k >x k , in the total p range dimensions.For more information about GMS, we refer the reader to [32].
After the above extension, the GMS algorithm can be used directly with SAR data without any pre-processing steps.Accordingly, we obtain the new GMS that is defined in the joint spatial-range domain: where subscripts s and r stand for the spatial and range domains, respectively.

Merging Predicate
The merging predicate of the conventional mean shift segmentation can be defined in mathematical form as: where z denotes the center pixel filtered by the mean shift filter and z i denotes an adjacent pixel.
The conventional mean shift defined in the joint spatial-range domain is [32]: By comparing Equations ( 11) and ( 12), the predicate (Equation ( 11)) can be modified, as follows: Equation (13) indicates that the merging predicate of the mean shift segmentation algorithm is judging whether the parameter of the profile function q(x) is less than 1 or not.
Accordingly, the merging predicate of the GMS segmentation algorithm can be derived based on Equation ( 10): What we should note here is, that, differing from mean shift, during the region-merging process of GMS, two regions are involved, whose corresponding range bandwidths are h(x) and h(x i ), respectively.However, only the bandwidth of the central pixel is considered in Equation ( 14), which is not comprehensive.We therefore need to consider how to use these two bandwidths.In general, there are four main approaches: (1) minimum; (2) maximum; (3) summation; and, (4) difference.The fourth method can be eliminated because the result is a constant, which can be derived according to Equation (7).The final bandwidths of the other three methods increase from (1) to (3).Since the bigger the bandwidth, the higher the merging probability, we choose the first method to reduce the over-merging problem.Accordingly, the predicate (Equation ( 14)) becomes: In the conventional mean shift segmentation algorithm, the adjacent pixels are used to decide if the regions to which the adjacent pixels belong to should be merged.However, this model might produce unsatisfactory results since the value of a single pixel may not be true in a noise-polluted image.To ensure the reliability of the merging test, the adjacent homogeneous pixels should be used to estimate the central pixel.For a segmentation algorithm that is based on the region-growing and merging technique, the pixels in the same segmented region are considered to be homogenous.Therefore, any pixel in the same region can be represented by the mean of the region.In this paper, the means of two adjacent regions are used to judge if they should be merged.Consequently, the merging predicate (Equation ( 15)) becomes: where R(z) stands for the region to which pixel z belongs and R r (z) stands for the mean values of region R in the range domain.

Merging Order
From Equation ( 16), we can deduce that the merging order will influence the merging result, because the mean region value will be different if the merging order is changed.In the conventional mean shift segmentation algorithm, the merging predicate is tested in row-column order.This strategy can obtain correct segmentation results when the borders between different classes are clear, but when the borders are ambiguous, the results may be wrong.For example, in Figure 3a, where the border of class 1 and class 2 is clear, the row-column strategy can obtain two regions that are consistent with class 1 and class 2. However, in Figure 3b, where the border is ambiguous, class 1 and class 2 are merged into one region (Figure 4a), even if the bandwidth is set as small as 15.
However, it is clear that the smaller the difference of the merged pixels, the better the reliability of the merging predicate.Therefore, we propose to use a pre-sorting strategy that first sorts the adjacent pixel pairs according to their difference or gradient in increasing order, and it then traverses this order only once [17,23].For any current pair of pixels (p 1 ,p 2 ), if R(p 1 ) = R(p 2 ), we make the test P G (R(p 1 ),R(p 2 )) and merge R(p 1 ) and R(p 2 ) if the result is true.When using the pre-sorting strategy, the adjacent pixels with the smallest differences in Figure 3b will be merged first.The border pixels will be later merged into several different regions, as shown in Figure 4b, c, and d.We can see that, even when the bandwidth is set as high as 90, class 1 and class 2 can be preserved.Since the pre-sorting strategy sorts the differences or gradients of adjacent pixels, the gradient function should be deduced according to Equation (15), and not Equation (16).On the basis of Equation ( 15), the gradient function used for sorting is: , ( )

Post-Processing
As with most segmentation algorithms, the result of mean shift segmentation is often very broken due to the influence of speckle noise.A post-processing step is therefore needed to eliminate the "noisy regions", which are composed of only one or several pixels.
The strategy of the post-processing is usually straightforward: merge the regions whose sizes are less than a threshold Nn into their nearest neighbors [13].However, this strategy might result in some small heterogeneous objects being merged inappropriately, which is not conducive to the follow-up image interpretation.To preserve strong point targets while eliminating the noisy regions, we adopt the same strategy as [24]: set a threshold Np, whose value should be set manually, according to the size of the point targets in the PolSAR image.Any region less than Np is directly merged into its most similar adjacent region as a noisy region.If the size of a region is in the range of [Np, Nn), the gradient of the region and its nearest neighbor should be calculated.If the gradient is bigger than a  When using the pre-sorting strategy, the adjacent pixels with the smallest differences in Figure 3b will be merged first.The border pixels will be later merged into several different regions, as shown in Figure 4b-d.We can see that, even when the bandwidth is set as high as 90, class 1 and class 2 can be preserved.When using the pre-sorting strategy, the adjacent pixels with the smallest differences in Figure 3b will be merged first.The border pixels will be later merged into several different regions, as shown in Figure 4b, c, and d.We can see that, even when the bandwidth is set as high as 90, class 1 and class 2 can be preserved.Since the pre-sorting strategy sorts the differences or gradients of adjacent pixels, the gradient function should be deduced according to Equation (15), and not Equation (16).On the basis of Equation ( 15), the gradient function used for sorting is: , ( )

Post-Processing
As with most segmentation algorithms, the result of mean shift segmentation is often very broken due to the influence of speckle noise.A post-processing step is therefore needed to eliminate the "noisy regions", which are composed of only one or several pixels.
The strategy of the post-processing is usually straightforward: merge the regions whose sizes are less than a threshold Nn into their nearest neighbors [13].However, this strategy might result in some small heterogeneous objects being merged inappropriately, which is not conducive to the follow-up image interpretation.To preserve strong point targets while eliminating the noisy regions, we adopt the same strategy as [24]: set a threshold Np, whose value should be set manually, according to the size of the point targets in the PolSAR image.Any region less than Np is directly merged into its most similar adjacent region as a noisy region.If the size of a region is in the range of [Np, Nn), the gradient of the region and its nearest neighbor should be calculated.If the gradient is bigger than a  Since the pre-sorting strategy sorts the differences or gradients of adjacent pixels, the gradient function should be deduced according to Equation (15), and not Equation (16).On the basis of Equation ( 15), the gradient function used for sorting is:

Post-Processing
As with most segmentation algorithms, the result of mean shift segmentation is often very broken due to the influence of speckle noise.A post-processing step is therefore needed to eliminate the "noisy regions", which are composed of only one or several pixels.
The strategy of the post-processing is usually straightforward: merge the regions whose sizes are less than a threshold N n into their nearest neighbors [13].However, this strategy might result in some small heterogeneous objects being merged inappropriately, which is not conducive to the follow-up image interpretation.To preserve strong point targets while eliminating the noisy regions, we adopt the same strategy as [24]: set a threshold N p , whose value should be set manually, according to the size of the point targets in the PolSAR image.Any region less than N p is directly merged into its most similar adjacent region as a noisy region.If the size of a region is in the range of [N p , N n ), the gradient of the region and its nearest neighbor should be calculated.If the gradient is bigger than a threshold G th , the region is preserved as a strong point target; else, merge the two regions.The gradient measure used in this paper is defined, as follows [23,24]: where T diag denotes the vector that is composed by the diagonal elements of the central coherence matrix T of a region R, • 1 denotes the 1-norm, and q is the number of diagonal elements.Since the range of G is [0,1], it is simple to set the threshold G th .In this study, we set G th = 0.2 in all of the experiments.

GMS Superpixel Segmentation for PolSAR Data
Because the GMS segmentation method is used to produce superpixels, the size of the maximum superpixel should be controlled during the region merging.If we suppose that the size threshold of the maximum superpixel is N max , then the complete procedure of the proposed GMS superpixel segmentation algorithm can be summarized, as follows: GMS filtering 1) Let x and z be the d-dimensional input and filtered image pixels in the joint spatial-range domain, x i , i = 1,..., n are the adjacent pixels of x, and y j , j = 1, 2,..., to which the successive locations of the mean shift vectors point.Given an initial point x, assign j=1 and y j = x.

2)
From Equation (10), y j+1 can be calculated by: and the j-th GMS vector can be written as: 3) If M j < ε, then y j+1 is the mode point, assign z = (x s , y j+1 ,r) and break; else, assign y j = y j+1 and j = j + 1, go to step 2), and compute the new y j+1 and M j .

GMS merging 1)
Let S I be the set of adjacent pixel pairs in I in the 8-neighborhood.Compute the gradients of the pairs in S I by Equation (17).
2) Sort the pairs of S I by the gradients computed in Step 1) in increasing order.

3)
In the series of S I , for any current pair (z, by Equation (16).If the result is true, and their total size is less than N max , merge R(z) and R(z i ).

3.
Post-processing.For each segmented region, if its size N s is less than N n , compute the gradient between this region and its neighboring regions.Find the minimum gradient G min .If G min < G th or N s < N p , merge the two regions; else, set the region aside.
It should be noted that, in Algorithm 2, the spatial bandwidths used in the GMS filtering step (denoted by h s f ) and the GMS step (denoted by h s m ) are different.h s f means the radius of the filtering window, and it is a positive integer that is equal to or greater than 1.Meanwhile, h s m means the spatial location threshold of the mode points, and it is a positive real number that is usually assigned as equal to or less than 1.In this study, we set h s f = 5 and h s m = 1 in all of the experiments.

Evaluation Based on AirSAR Data
Firstly, Figure 1a was processed by the original mean shift segmentation algorithm with h s = 7, M = 10, and h r = 10, 15, and 20, respectively.The results are shown in Figure 5a-c, with the right halves of the figures showing the borders of the segmented regions.The obtained numbers of regions are N r = 6474, 5467, and 3444, respectively.From the left halves of Figure 5a-c, it can be clearly seen that the points and edges in Figure 1a are not preserved well.From the right halves of Figure 5a-c, both over-segmentation and over-merging can be clearly observed, as in the area that is marked by the rectangle.The over-segmentation could be reduced by parameter adjustment, while the over-merging situation would increase, and vice versa.To further understand the results, the mean shift filtered images are shown in Figure 5d-f, from which we can see that the speckle noise is barely suppressed, especially in dark areas, such as the areas that are marked by the ellipses.

Evaluation Based on AirSAR Data
Firstly, Figure 1a was processed by the original mean shift segmentation algorithm with hs=7, M = 10, and h r= 10, 15, and 20, respectively.The results are shown in Figure 5a-c, with the right halves of the figures showing the borders of the segmented regions.The obtained numbers of regions are Nr = 6474, 5467, and 3444, respectively.From the left halves of Figure 5a-c, it can be clearly seen that the points and edges in Figure 1a are not preserved well.From the right halves of Figure 5a-c, both oversegmentation and over-merging can be clearly observed, as in the area that is marked by the rectangle.The over-segmentation could be reduced by parameter adjustment, while the overmerging situation would increase, and vice versa.To further understand the results, the mean shift filtered images are shown in Figure 5d-f, from which we can see that the speckle noise is barely suppressed, especially in dark areas, such as the areas that are marked by the ellipses.The GMS superpixel segmentation algorithm was then applied to the AirSAR PolSAR image.The filtering and merging parameters were set as: ξ = 0.9, hs f = 5, hs m = 1, and Nmax = 100.Although there are several other parameters that need to be set, the appropriate values have been discussed in [32].These values were directly selected as the default values.The post-processing parameters were set as: Nn = 49, Np = 4, and Gth = 0.2.The result is shown in Figure 6a, with Nr = 2021.When compared with Figure 5a-c, Figure 6a presents a better segmentation quality: the sizes of the superpixels are similar, the borders of the superpixels are smoother, and the points and edges are preserved well, proving the effectiveness of the GMS superpixel segmentation algorithm.The GMS superpixel segmentation algorithm was then applied to the AirSAR PolSAR image.The filtering and merging parameters were set as: ξ = 0.9, h s f = 5, h s m = 1, and N max = 100.Although there are several other parameters that need to be set, the appropriate values have been discussed in [32].These values were directly selected as the default values.The post-processing parameters were set as: For comparison, the Ncut algorithm was applied to the first data set.The desired superpixel number was set as k = 2000.The result is shown in Figure 6b, with Nr = 2016.From a general view, the left half of Figure 6b is similar to the left half of Figure 6a.However, the right half of Figure 6b shows that the Ncut superpixels are more compact and the shapes and sizes of the superpixels are more similar.On the other hand, the points and lines, such as those that are marked by the red ellipses, are not preserved well.
Finally, the SLIC-GC algorithm was applied to the AirSAR PolSAR image.The desired superpixel number was again set as k = 2000.The post-processing parameters were the same as for the GMS algorithm.The result is shown in Figure 6c, with Nr = 2081.When compared with Figure 6a and Figure 6b, the left half of Figure 6c shows that the segmentation result is broken and discontinuous in homogeneous areas, and both the edges of homogeneous areas and the borders of superpixels are jagged, such as those areas that are marked by the rectangles.This indicates that the SLIC-GC algorithm is still affected by speckle noise.In addition, the compactness of Figure 6c is m = 0.5.When this parameter is set as a larger value, the superpixels are more compact, but the points and lines are not preserved.When it is set as a smaller value, the shapes of the superpixels are more irregular, and the borders are more jagged.
To further evaluate the segmentation quality, the ratio method that was used in [47] was adopted.The first step is to obtain the ratio images of the original intensity image to the segmented versions.Based on the ratio images, there are two ways to evaluate the segmentation quality: 1) Qualitative evaluation: This method is a visual evaluation method used to evaluate the ability of a segmentation algorithm in detail preservation.If the ratio image is made up of random noise without structured information, it can be considered that the segmentation result depicts the ideal segmentation image; otherwise, any sign of structure indicates a loss of detail.
2) Quantitative evaluation: The mean and variance of the ratio image are calculated.For the intensity ratio image, the theoretical mean value is r = 1, and the theoretical variance is given by: ( ) where N is the number of pixels in the ratio image, m is the number of segments, nj is the number of pixels in the j-th segment, and L is the number of looks.The closer the mean value to the theoretical one, the better the radiometric information preservation ability.Since the value of any segment is defined as the local average, the mean value of the ratio image is always 1. Similarly, the closer the variance to the theoretical one, the better the detail information preservation ability.For comparison, the Ncut algorithm was applied to the first data set.The desired superpixel number was set as k = 2000.The result is shown in Figure 6b, with N r = 2016.From a general view, the left half of Figure 6b is similar to the left half of Figure 6a.However, the right half of Figure 6b shows that the Ncut superpixels are more compact and the shapes and sizes of the superpixels are more similar.On the other hand, the points and lines, such as those that are marked by the red ellipses, are not preserved well.
Finally, the SLIC-GC algorithm was applied to the AirSAR PolSAR image.The desired superpixel number was again set as k = 2000.The post-processing parameters were the same as for the GMS algorithm.The result is shown in Figure 6c, with N r = 2081.When compared with Figures 6a and 6b, the left half of Figure 6c shows that the segmentation result is broken and discontinuous in homogeneous areas, and both the edges of homogeneous areas and the borders of superpixels are jagged, such as those areas that are marked by the rectangles.This indicates that the SLIC-GC algorithm is still affected by speckle noise.In addition, the compactness of Figure 6c is m = 0.5.When this parameter is set as a larger value, the superpixels are more compact, but the points and lines are not preserved.When it is set as a smaller value, the shapes of the superpixels are more irregular, and the borders are more jagged.
To further evaluate the segmentation quality, the ratio method that was used in [47] was adopted.The first step is to obtain the ratio images of the original intensity image to the segmented versions.Based on the ratio images, there are two ways to evaluate the segmentation quality: 1) Qualitative evaluation: This method is a visual evaluation method used to evaluate the ability of a segmentation algorithm in detail preservation.If the ratio image is made up of random noise without structured information, it can be considered that the segmentation result depicts the ideal segmentation image; otherwise, any sign of structure indicates a loss of detail.
2) Quantitative evaluation: The mean and variance of the ratio image are calculated.For the intensity ratio image, the theoretical mean value is r = 1, and the theoretical variance is given by: var where N is the number of pixels in the ratio image, m is the number of segments, n j is the number of pixels in the j-th segment, and L is the number of looks.The closer the mean value to the theoretical one, the better the radiometric information preservation ability.Since the value of any segment is defined as the local average, the mean value of the ratio image is always 1. Similarly, the closer the variance to the theoretical one, the better the detail information preservation ability.
The ratio images of the original image to the segmented versions of the different algorithms are shown in Figure 7, being plotted over the range [0.5, 1.5].Weak structural information can be observed in Figure 7a,c, indicating the good detail preservation abilities of the GMS and SLIC-GC algorithms.Meanwhile, strong and obvious structures can be observed in Figure 7b, indicating the poor detail preservation ability of the Ncut algorithm.
The ratio images of the original image to the segmented versions of the different algorithms are shown in Figure 7, being plotted over the range [0.5, 1.5].Weak structural information can be observed in Figure 7a and Figure 7c, indicating the good detail preservation abilities of the GMS and SLIC-GC algorithms.Meanwhile, strong and obvious structures can be observed in Figure 7b, indicating the poor detail preservation ability of the Ncut algorithm.
The quantitative evaluation measurements of the superpixel segmentation results are listed in Table 1.Although the numbers of superpixels are different, the theoretical variances are equal.The variance of the SLIC-GC segmentation is the smallest and it is the closest to the theoretical value.The GMS result is close to the SLIC-GC, which is consistent with the visual assessment.The variance of the Ncut segmentation is the biggest.The execution efficiency is always an important evaluation measurement for a segmentation algorithm.The execution times of the different superpixel segmentation algorithms are listed in Table 2. Since all three algorithms need filtering steps, the execution times of both the filtering and segmentation steps are taken into account.Table 2 shows that although the GMS algorithm has an obvious advantage in the segmentation step, it has a moderate performance in total execution time due to the low efficiency of the GMS filter.The most efficient segmentation algorithm is SLIC-GC.The execution time of the Ncut algorithm is far more than other algorithms.

Evaluation Based on ESAR Data
Figure 8 shows the visual assessment of the GMS superpixel segmentation algorithm, the Ncut segmentation algorithm, and the SLIC-GC segmentation algorithm applied to the ESAR image.For the GMS superpixel segmentation algorithm, the filtering, merging, and post-processing parameters The quantitative evaluation measurements of the superpixel segmentation results are listed in Table 1.Although the numbers of superpixels are different, the theoretical variances are equal.The variance of the SLIC-GC segmentation is the smallest and it is the closest to the theoretical value.The GMS result is close to the SLIC-GC, which is consistent with the visual assessment.The variance of the Ncut segmentation is the biggest.The execution efficiency is always an important evaluation measurement for a segmentation algorithm.The execution times of the different superpixel segmentation algorithms are listed in Table 2. Since all three algorithms need filtering steps, the execution times of both the filtering and segmentation steps are taken into account.Table 2 shows that although the GMS algorithm has an obvious advantage in the segmentation step, it has a moderate performance in total execution time due to the low efficiency of the GMS filter.The most efficient segmentation algorithm is SLIC-GC.The execution time of the Ncut algorithm is far more than other algorithms.

Evaluation Based on ESAR Data
Figure 8 shows the visual assessment of the GMS superpixel segmentation algorithm, the Ncut segmentation algorithm, and the SLIC-GC segmentation algorithm applied to the ESAR image.For the GMS superpixel segmentation algorithm, the filtering, merging, and post-processing parameters were the same as for the first data set.The result is shown in Figure 8a, with N r = 7449.From the right side of Figure 8a, we can see that the points and lines in the image are preserved well, such as those in the areas that are marked by the ellipses.From the left side of Figure 8a, it can be observed that the sizes and shapes of the superpixels are irregular and they vary with the different scenes.were the same as for the first data set.The result is shown in Figure 8a, with Nr = 7449.From the right side of Figure 8a, we can see that the points and lines in the image are preserved well, such as those in the areas that are marked by the ellipses.From the left side of Figure 8a, it can be observed that the sizes and shapes of the superpixels are irregular and they vary with the different scenes.For the Ncut algorithm, the desired superpixel number was set as k = 8000.The result is shown in Figure 8b, with Nr = 8064.From the left side of Figure 8b, it can be seen that the sizes and shapes of the superpixels are regular, and the borders are smooth.However, the right side of Figure 8b shows that the points and lines are blurred obviously, especially those in the areas that are marked by the ellipse, although the number of superpixels is larger than that of the GMS algorithm.
For the SLIC-GC algorithm, the desired superpixel number was also set as k = 8000.The postprocessing parameters were the same as for the first data set.The result is shown in Figure 8c, with Nr = 7728.When compared with Figure 2a, the right side of Figure 8c shows a moderate segmentation quality, since some dark lines and bright points in the areas marked by the ellipse are not preserved well.However, when compared with Figure 2b, Figure 8c can be considered as acceptable.The right side is similar to Figure 2b, and the left side shows that the sizes and shapes of the superpixels are similar, and the borders of the superpixels are smooth, due to the good speckle noise suppression.When combined with the experimental results that were obtained with the first data set, we can conclude that the SLIC-GC algorithm is sensitive to speckle noise, and its segmentation quality depends on the performance of the filtering algorithm.
The ratio method was again used to evaluate the segmentation quality.The ratio images of the different algorithms are shown in Figure 9, being plotted over the range [0.5, 1.5].Similar to the first data set, weak structural information can be observed in Figure 9a and Figure 9c, and strong and obvious structures can be observed in Figure 9b, indicating the good detail preservation abilities of the GMS and SLIC-GC algorithms, and the poor detail preservation ability of the Ncut algorithm.For the Ncut algorithm, the desired superpixel number was set as k = 8000.The result is shown in Figure 8b, with N r = 8064.From the left side of Figure 8b, it can be seen that the sizes and shapes of the superpixels are regular, and the borders are smooth.However, the right side of Figure 8b shows that the points and lines are blurred obviously, especially those in the areas that are marked by the ellipse, although the number of superpixels is larger than that of the GMS algorithm.
For the SLIC-GC algorithm, the desired superpixel number was also set as k = 8000.The post-processing parameters were the same as for the first data set.The result is shown in Figure 8c, with N r = 7728.When compared with Figure 2a, the right side of Figure 8c shows a moderate segmentation quality, since some dark lines and bright points in the areas marked by the ellipse are not preserved well.However, when compared with Figure 2b, Figure 8c can be considered as acceptable.The right side is similar to Figure 2b, and the left side shows that the sizes and shapes of the superpixels are similar, and the borders of the superpixels are smooth, due to the good speckle noise suppression.When combined with the experimental results that were obtained with the first data set, we can conclude that the SLIC-GC algorithm is sensitive to speckle noise, and its segmentation quality depends on the performance of the filtering algorithm.
The ratio method was again used to evaluate the segmentation quality.The ratio images of the different algorithms are shown in Figure 9, being plotted over the range [0.5, 1.5].Similar to the first data set, weak structural information can be observed in Figure 9a,c, and strong and obvious structures can be observed in Figure 9b, indicating the good detail preservation abilities of the GMS and SLIC-GC algorithms, and the poor detail preservation ability of the Ncut algorithm.The quantitative evaluation measurements are listed in Table 3.Since the differences of the numbers of superpixels are much bigger than the first data set, the theoretical variances are different, but they are very close.From Table 3, we can find that the variance of the GMS segmentation is the smallest and it is the closest to the theoretical value.The SLIC-GC result is slightly worse.The variance of the Ncut segmentation is nearly twice the theoretical value.The execution times of the different superpixel segmentation algorithms are listed in Table 4. Table 4 shows that the GMS algorithm still has an obvious advantage in the segmentation step.However, due to the low efficiency of the GMS filter, the total execution time of the GMS algorithm is the longest.The quantitative evaluation measurements are listed in Table 3.Since the differences of the numbers of superpixels are much bigger than the first data set, the theoretical variances are different, but they are very close.From Table 3, we can find that the variance of the GMS segmentation is the smallest and it is the closest to the theoretical value.The SLIC-GC result is slightly worse.The variance of the Ncut segmentation is nearly twice the theoretical value.The execution times of the different superpixel segmentation algorithms are listed in Table 4. Table 4 shows that the GMS algorithm still has an obvious advantage in the segmentation step.However, due to the low efficiency of the GMS filter, the total execution time of the GMS algorithm is the longest.Combined with the first data set, we can find that, when the number of looks is high or the speckle noise is not very strong, the SLIC-GC algorithm has the best segmentation results.When the speckle noise is strong, the SLIC-GC algorithm gets worse, and the GMS algorithm get better.In the aspect of execution efficiency, the GMS algorithm has an obvious advantage in the segmentation step.However, the efficiency of the GMS filter is too low.The GMS algorithm does not have any advantage in the total execution time.How to improve the efficiency of GMS filtering is an urgent problem to be solved.Since this problem is beyond the scope of this paper, limited discussion will be expressed in Section 4.2.

Parameter Settings
It is well known that the parameter settings are of great importance to a segmentation algorithm.There are many parameters that are involved in the GMS algorithm.However, as analyzed in [32], most of the parameters have strong adaptability in the appropriate range.A set of default parameters can nearly always obtain good results for different data.In practical applications, the parameters can be set to the default values.The parameter settings of the filtering step used in this paper were the same as those in [32], except that h s f was set to 5. The reason for this is that a filtering algorithm with a small window is more efficient and it can preserve more details, which is helpful to subsequent segmentation.The new parameters of the merging step are {h s m , N max , N n , N p , G th }.Among them, h s m is usually set to 1 by default.The other four parameters have nothing to do with the segmentation algorithm, but are dependent on the application requirements and the actual situation of the images.N max and N n are related to the desired size or amount of segmented regions.Suppose that the total number of pixels is N I , the excepted size of the segmented regions is S, and the expected number of the segmented regions is N S .They have the follow relationship: N I = S * N S .The empirical value of N max can be set in the range (S, 2S] or (S, 2N I /N S ].The empirical value of N n can be set in the range [S/4, S) or [N I /4N S , S).It should be noted that, due to the presence of some small heterogeneous regions, the actual number of segmented regions N a is often larger than N s .By increasing N max and N n , N a can be close to or even less than N S .N p is related to the minimum target size in the image.When the image resolution is high and the point target is large, N p should be set as a large value, and vice versa.G th reflects the heterogeneity degree of the same class of objects in the image.G th should be set as a large value when the difference between the same class of objects is large.When the difference between different objects in the image is small, G th should be set as a small value.When both situations occur at the same time, priority should be given to the second situation, which is to set a small value for G th to ensure the homogeneity of pixels within the same segmented region.

Stability and Efficiency
The stability of segmentation algorithms was discussed in [48], and a stable mean shift algorithm was proposed.The so-called stable segmentation algorithm means that a segmentation algorithm should get matching results for any two overlapping subsets of an image.The basic mean shift segmentation algorithm is considered to be stable according to the analysis in [48].However, some optimization steps can lead to instability of the mean shift algorithm.Therefore, the proposed stable version in fact achieves the stabilization of the mean shift process by giving up the optimization steps, so as to realize the image tile segmentation.Finally, the execution efficiency of the mean shift segmentation could be improved by the use of parallel computing technology.
The GMS segmentation algorithm that is proposed in this paper is unstable because it adopts a pre-sorting strategy, and different tile partition methods lead to different orders.According to [48], the GMS segmentation algorithm can also have a corresponding stable version, i.e., with the optimization strategies, such as the pre-sorting strategy and the post-processing step not adopted.However, this may result in a decline of the segmentation quality.In order to obtain a better balance between segmentation quality and efficiency, another experiment was undertaken.Figure 10 shows the GMS segmentation results with and without the pre-sorting strategy and the post-processing step, and the filtering results are also shown.The obtained number of regions for the GMS algorithm without the pre-sorting strategy and the post-processing step (GMS-0 for short) is N r = 73511, which is nearly 10 times larger than the result that is shown in Figure 8a.The execution time of the GMS-0 segmentation is less than 2s.The areas that are marked by the circle in Figure 10 show that some of the points, edges, and lines that are preserved well by the proposed GMS algorithm are not preserved by the GMS-0 algorithm, even though the obtained superpixels of the latter are nearly 10 times the size of the former.As it is well known that both effectiveness and efficiency are important to an algorithm, when we cannot have both, we should make a choice that is based on the actual requirements.the GMS segmentation results with and without the pre-sorting strategy and the post-processing step, and the filtering results are also shown.The obtained number of regions for the GMS algorithm without the pre-sorting strategy and the post-processing step (GMS-0 for short) is Nr = 73511, which is nearly 10 times larger than the result that is shown in Figure 8a.The execution time of the GMS-0 segmentation is less than 2s.The areas that are marked by the circle in Figure 10 show that some of the points, edges, and lines that are preserved well by the proposed GMS algorithm are not preserved by the GMS-0 algorithm, even though the obtained superpixels of the latter are nearly 10 times the size of the former.As it is well known that both effectiveness and efficiency are important to an algorithm, when we cannot have both, we should make a choice that is based on the actual requirements.

Conclusions
The work that is proposed in this paper is an extension of the previous work by Lang et al., [32], who proposed the GMS algorithm and applied it to PolSAR image filtering.In this paper, the GMS algorithm is further applied to PolSAR image superpixel segmentation.A new merging predicate defined in the joint spatial-range domain is derived based on the GMS algorithm.A pre-sorting strategy and a post-processing step are also introduced into the GMS segmentation algorithm.When compared with the conventional mean shift segmentation algorithm, the method proposed in this paper can be directly applied to PolSAR images without any pre-processing.The experimental results that were obtained with AirSAR and ESAR L-band PolSAR data showed that the GMS segmentation algorithm has good anti-noise ability and can preserve the detail information by producing superpixels conforming to the actual scene of the image.Although the GMS, Ncut, and SLIC-GC algorithms all require pre-processing filtering to suppress speckle noise, the GMS segmentation is bound to the GMS filtering, and thus show a robust performance.For the Ncut and SLIC-GC algorithms, different choices of filters and filtering parameters can produce very different segmentation results.
The time consumption of the GMS filtering is longer than the other algorithms due to the low efficiency of the GMS iterative process.Although the segmentation efficiency could be improved by abandoning the optimization steps and adopting parallel computing technology, the segmentation quality might decline.How to achieve a balance between segmentation quality and efficiency for the

Conclusions
The work that is proposed in this paper is an extension of the previous work by Lang et al., [32], who proposed the GMS algorithm and applied it to PolSAR image filtering.In this paper, the GMS algorithm is further applied to PolSAR image superpixel segmentation.A new merging predicate defined in the joint spatial-range domain is derived based on the GMS algorithm.A pre-sorting strategy and a post-processing step are also introduced into the GMS segmentation algorithm.When compared with the conventional mean shift segmentation algorithm, the method proposed in this paper can be directly applied to PolSAR images without any pre-processing.The experimental results that were obtained with AirSAR and ESAR L-band PolSAR data showed that the GMS segmentation algorithm has good anti-noise ability and can preserve the detail information by producing superpixels conforming to the actual scene of the image.Although the GMS, Ncut, and SLIC-GC algorithms all require pre-processing filtering to suppress speckle noise, the GMS segmentation is bound to the GMS filtering, and thus show a robust performance.For the Ncut and SLIC-GC algorithms, different choices of filters and filtering parameters can produce very different segmentation results.

Figure 3 .
Figure 3. Class 1 and class 2 with (a) clear and (b) ambiguous borders.

Figure 4 .
Figure 4. Mean shift segmentation results of Error!Reference source not found.busing (a) the row-c olumn strategy with range bandwidth h = 15 and 30 (same result); and, (b)-(d) the pre-sorting strategy with range bandwidth h = 30, 60, and 90, respectively.

Figure 3 .
Figure 3. Class 1 and class 2 with (a) clear and (b) ambiguous borders.

Figure 3 .
Figure 3. Class 1 and class 2 with (a) clear and (b) ambiguous borders.

Figure 4 .
Figure 4. Mean shift segmentation results of Error!Reference source not found.busing (a) the row-c olumn strategy with range bandwidth h = 15 and 30 (same result); and, (b)-(d) the pre-sorting strategy with range bandwidth h = 30, 60, and 90, respectively.

Figure 4 .
Figure 4. Mean shift segmentation results of Figure 3b using (a) the row-column strategy with range bandwidth h = 15 and 30 (same result); and, (b)-(d) the pre-sorting strategy with range bandwidth h = 30, 60, and 90, respectively.

Figure 5 .
Figure 5.The segmentation (a-c) and filtering (d-f) results of the conventional mean shift algorithm with h s = 7, M = 10, and h r = 10 (a,d), h r = 15 (b,e), and h r = 20 (c,f).

Figure 10 .
Figure 10.The GMS filtering and segmentation results of the ESAR PolSAR image.(a) and (d) are the filtering results.(b) and (e) are the segmentation results with the pre-sorting strategy and the postprocessing step.(c) and (f) are the segmentation results without the pre-sorting strategy and the postprocessing step.

Figure 10 .
Figure 10.The GMS filtering and segmentation results of the ESAR PolSAR image.(a) and (d) are the filtering results.(b) and (e) are the segmentation results with the pre-sorting strategy and the post-processing step.(c) and (f) are the segmentation results without the pre-sorting strategy and the post-processing step.

Table 1 .
Means and variances of the ratio images for the different superpixel segmentation algorithms with the AirSAR image.

Table 2 .
Execution times of the different superpixel segmentation algorithms with the AirSAR image.

Table 1 .
Means and variances of the ratio images for the different superpixel segmentation algorithms with the AirSAR image.

Table 2 .
Execution times of the different superpixel segmentation algorithms with the AirSAR image.

Table 3 .
Means and variances of the ratio images for the different superpixel segmentation algorithms with the ESAR image.

Table 4 .
Execution times of the different superpixel segmentation algorithms with the ESAR image.

Table 3 .
Means and variances of the ratio images for the different superpixel segmentation algorithms with the ESAR image.

Table 4 .
Execution times of the different superpixel segmentation algorithms with the ESAR image.