Adaptive Distance-Weighted Voronoi Tessellation for Remote Sensing Image Segmentation

: The spatial fragmentation of high-resolution remote sensing images makes the segmentation algorithm put forward a strong demand for noise immunity. However, the stronger the noise immunity, the more serious the loss of detailed information, which easily leads to the neglect of e ﬀ ective characteristics. In view of the di ﬃ culty of balancing the noise immunity and e ﬀ ective characteristic retention, an adaptive distance-weighted Voronoi tessellation technology is proposed for remote sensing image segmentation. The distance between pixels and seed points in Voronoi tessellation is established by the adaptive weighting of spatial distance and spectral distance. The weight coe ﬃ cient used to control the inﬂuence intensity of spatial distance is deﬁned by a monotone decreasing function. Following the fuzzy clustering framework, a fuzzy segmentation model with Kullback–Leibler (KL) entropy regularization is established by using multivariate Gaussian distribution to describe the spectral characteristics and Markov Random Field (MRF) to consider the neighborhood e ﬀ ect of sub-regions. Finally, a series of parameter optimization schemes are designed according to parameter characteristics to obtain the optimal segmentation results. The proposed algorithm is validated on many multispectral remote sensing images with ﬁve comparing algorithms by qualitative and quantitative analysis. A large number of experiments show that the proposed algorithm can overcome the complex noise as well as better ensure e ﬀ ective characteristics.


Introduction
Image segmentation plays a crucial role in remote sensing image interpretation, segmentation accuracy can directly affect the quality of interpretation [1]. The goal of image segmentation is to partition the image into a group of homogeneous regions, and the features in the homogeneous region are highly similar [2]. With the development of the spatial resolution of remote sensors, the high-spatial-resolution remote sensing images provide rich surface features. However, the fine spatial structure also increases the heterogeneity of homogeneous regions and complicates the spatial correlation. These characteristics bring big challenges to high-accuracy remote sensing image segmentation in balancing the noise immunity and effective characteristic retention [3,4].
At present, the commonly used segmentation methods include watershed [5,6], level set [7,8], clustering [9,10], and so on. In watershed methods, the image is regarded as landforms and the gray as To further improve the performance of the region-based segmentation algorithms in preserving effective details, a novel Voronoi tessellation technology with adaptive weighted distance is proposed for remote sensing image segmentation. To divide the image domain into many sub-regions with high spectral homogeneity and spatial connectivity, the distance between pixels and seed points in Voronoi tessellation extends from only spatial information to both spatial and spectral. An adaptive weight coefficient, modeled by a monotone decreasing function, is designed to control the spatial connectivity. After dividing, the dissimilarity between Voronoi sub-regions and clusters is modeled by the negative logarithm of the Gaussian probability distribution function (pdf) to further accurately describe the cluster characteristics. To consider the effect of spatial neighborhood system, the prior probability in Kullback-Leibler (KL) entropy regularization is defined based on MRF theory. Then, the optimal segmentation result is obtained by solving the segmentation model parameters according to parameter characteristics. The main contributions of this paper are summarized as follows, (1) We design a monotone decreasing function as the adaptive weight coefficient to control the influence intensity of spatial information. The farther the spatial distance is, the more important the spectral information is. Therefore, both the spectral homogeneity and spatial connectivity of sub-regions can be ensured greatly. (2) Integrating the adaptive distance-weighted Voronoi tessellation into the fuzzy clustering framework can describe the segmentation uncertainty more effectively and better balance the noise immunity and effective characteristic retention.
This paper is organized as follows. In Section 2 the adaptive distance-weighted Voronoi tessellation is introduced first, and then the establishing and solving of the regionalized segmentation model are discussed in detail. In Section 3, the experiments are designed to demonstrate the effectiveness of the proposed algorithm qualitatively and quantitatively. In Section 4, a deep analysis of the performance of the proposed algorithm is further discussed. Finally, the conclusion is exposed in Section 5.

Methods
An image I = {I i (a i , b i ): i = 1, . . . , n, (a i , b i ) ∈ Ω} is formed by giving the spectral information on a finite set of 2-D rectangular discrete pixel lattice Ω = {(a i , b i ): i = 1, . . . , n}; where i and n are the index and the total number of pixels, I i = (I ie : e = 1, . . . , h) is the spectral characteristic vector of pixel i, e, and h are the index and the total number of spectral channel, (a i , b i ) is lattice position of pixel i, respectively. For dividing the image domain into a series of sub-regions, the seed points set G = {g j (u j , v j ): j = 1, . . . , m, (u j , v j ) ∈ Ω} need to be produced first, where j and m are the index and the total number of seed points, g j = (g je : e = 1, . . . , h) and (u j , v j ) are spectral characteristic vector and lattice position of seed point j, respectively. The sub-regions corresponding to seed points can be expressed as V = {V j : j = 1, . . . , m}.

Adaptive Distance-Weighted Voronoi Tessellation
Voronoi tessellation is a spatial tessellation technology that divides the space into several sub-spaces according to the distance between points and seed points. the pixel has not only spatial position information but also spectral characteristic information, thus the distance can be described from two aspects by Equations (1) and (2), where d[·] represents distance function, and it is usually defined by Euclidean distance, d ij s and d ij c are the spatial distance and spectral distance between pixel i and seed point j, respectively.
In the traditional Voronoi-based segmentation algorithms, the distance is only modeled based on the spatial distance. Then, the sub-region V j is obtained by Although Equation (3) provides a good way for overcoming the complex noise, the spectral information must be considered by model constraints in the segmentation process. This method is deficient in the accurate description of image characteristics.
To consider the comprehensive effect of spatial position and spectral characteristic information in Voronoi tessellation, the mixed distance d ij of spatial distance d ij s and spectral distance d ij c is established.
Then, the sub-region V j is re-obtained by Equations (4) and (5), where w ij s and w ij c are the weighted coefficients of spatial distance and spectral distance respectively, they are satisfied w ij s + w ij c = 1. In order to make full use of the spatial distance to overcome the noise, and to finely describe the image characteristics with the help of spectral information, the adaptive weight coefficient determined by a monotone decreasing function with the increasing spatial distance is constructed by Equation (6), where M s is the maximum value of d ij s (i∈{1, . . . , n}, j∈{1, . . . , m}), α is the adaptive factor, α∈[0,1].
The curve of w ij s with different α is shown in Figure 1. It shows that the weight is independent of the spatial distance when α = 0, and the bigger α is, the closer the weight is to 1. In particular, d ij s and d ij c are in different orders of magnitude, which easily leads to invalid distance.
where d s ij and d c ij are the normalized spatial and spectral distance, ξ s and ξ c are the corresponding normalization coefficient respectively, According to Equations (5)-(10), d ij can be rewritten as, where M s is the maximum value of d . The visualization of the spatial distance effect is shown in Figure 2. Figure 2a represents the traditional Voronoi tessellation, Figure 2b represents the adaptive distance-weighted Voronoi Tessellation, where the depth of color represents the intensity of spatial distance effect to the corresponding seed point. It can be seen that pixels in the boundary between two homogeneous regions have smaller spatial distance effects, which can consider more spectral information and help to further accurately describe the image characteristics.

Segmentation Model
Assume that there are k homogeneous regions in the image, the fuzzy relationship between sub-regions and clusters can be expressed as R = [r jl ] m×k , where r jl is fuzzy membership of sub-region V j belonging to cluster l, and satisfied k l=1 r jl = 1, r jl ∈ [0, 1]. Then, the regionalized fuzzy cluster objective function can be defined as, where s jl is the dissimilarity measure between sub-region V j and cluster l, λ is the regularization term coefficient used to control the fuzzy degree, is the number of pixels in sub-regions V j , "#" is the counting symbol, ρ jl is the scale factor that protects small-scale cluster from being easily affected by large-scale.
Assuming that the spectra of pixels in sub-region V j follow independently identically Gaussian distribution, the joint probability of pixels in V j conditional on L j = l can be defined as, where . . , k} are mean and covariance of Gaussian distribution that cluster l follows, . . , k} is the cluster label to which sub-region V j belongs. The lager the dissimilarity measure is, the higher the consistency of features is. Thus, combining Equation (13), s jl can be defined as negative log-likelihood of the probability, In order to consider the spatial constraints, the positions and cluster labels of pixels can be regarded as a random field that is called the label field. Then, based on MRF, the scale factor can be defined by the prior probability of sub-region V j belonging to cluster l according to Hammersley-Clifford theorem by Equation (15), where δ j = j : V j ∼ V j is the neighborhood system of sub-region V j , as shown in Figure 3, the pink represents the center sub-region V j , the green represents the neighbor sub-regions V j , "~" represents neighborhood relationship; j is the index of neighborhood sub-regions; c is the index of cliques; C is the clique set; U c is the potential energy of clique c. In the region-level, C is the binary clique composed of all adjacent sub-regions. Thus, U c can be defined based on the Potts model, where β is neighborhood sub-regions interaction intensity, β∈[0, 1].

Parameter Estimation
According to the characteristics of parameters, some applicable solution schemes are designed to obtain the optimal parameter. (1) For R, it needs to achieve the minimization objective function under the constrain k l=1 r jl = 1, r jl ∈ [0, 1], thus, it is necessary to establish Lagrange function L(R, G, θ) by Equation (17), where (2) For θ, the analytical formulas can be obtained directly from derivatives. Let = 0, then, µ l and Σ l can be obtained by Equations (19) and (20), (3) For G. It is implicitly expressed in the objective function and cannot be solved by derivative. To determine the optimal position of seed points, a special descent method based on the image domain is designed. Suppose that the seed point of sub-region V j slightly changes its location from the original position (u j , v j ) to candidate position (u j , v j ) + o(u j , v j ), i.e., (u j , v j ) * , and all other seed points remain at the same position, where o(u j , v j ) represented the descent direction is a minimal moving distance. Let V = {V 1 , V 2 , . . . , V j , . . . , V m } be the sub-regions set generated by the original seed points set G = {g 1 (u 1 , v 1 ), g 2 (u 2 , v 2 ), . . . , g j (u j , v j ), . . . , g m (u m , v m )} and V* = {V 1 , V 2 , . . . , V j *, . . . , V m } be the candidate sub-regions set generated by the candidate seed points set G* = {g 1 (u 1 , v 1 ), g 2 (u 2 , v 2 ), . . . , g j *(u j , v j )*, . . . , g m (u m , v m )}. Owing to the slight moving operation, the sub-regions change slightly. This change occurs only in V j and its adjacent sub-regions, as shown in Figure 4, where the green represents adjacent sub-regions, the black heavy line area represents V j , the red heavy line area represents V j *, the blue represents the newly added area, the yellow represents the discarded area, the purple represents overlapping areas. From this property, the judgment function can be calculated by Equation (21), where s (il) can be regarded as the dissimilarity between pixel i and cluster l. If ∆J < 0, accept the candidate seed point (u j , v j )*, otherwise, stay at the same.

Experimental Results
To verify the effectiveness of the proposed algorithm, the experiments based on the proposed algorithm and five comparing algorithms including FCM_S [16], HMRF_FCM [24], SLIC_FCM [29], SF_FCM [36], VT_HMRF_FCM [25] are carried out on simulated image and multispectral remote sensing images. The characteristics of these comparing algorithms are listed in Table 1. The simulated image can provide a controllable environment for better evaluation and verification. In addition, the effectiveness of the proposed algorithm is also discussed by qualitative and quantitative evaluation, respectively. The recent effective method in region-based segmentation algorithms VT_HMRF_FCM The sub-regions are generated based on Voronoi tessellation according to the spatial information A region-based algorithm with good noise immunity that tessellation and optimization can be carried out simultaneously Figure 6a is the simulated image generated by the template with distribution parameters, as shown in Figure 6b and Table 2, where I-V are five homogeneous regions representing grass, water, forest, road, and artificial surface, respectively. There are similar means in regions I and III, the similar standard deviation in regions I and II, the sharp boundary between regions I and II, and slender geometry-shape in region IV.   Figure 7 is the representative segmentation result of the simulated image, where the three in the column represent segmentation results, outline images of segmentation regions, and superposition images of outline and simulated image, as shown in Figure 7a1-a3, the six in the row represent FCM_S, HMRF_FCM, SLIC_FCM, SF_FCM, VT_HMRF_FCM, and the proposed algorithm, as shown in Figure 7a1-f1. It can be seen that the speckle noise is unavoidable in the segmentation results based on pixel-based algorithms, as shown in Figure 7a1-a3,b1-b3. SLIC_FCM can effectively solve the speckle noise, but it cannot effectively distinguish the boundaries between homogeneous regions with similar spectra and segment the slender roads, as shown in Figure 7c1 To analyze the quality of sub-regions, the fitting deviation of sub-regions to homogeneous regions are shown in Figure 8.  Figure 8a1-a3, the sub-regions of SCLIC_FCM always cross the boundaries of homogeneous regions because the fixed weight used to integrate spatial and spectral information is easy to make the spectral information not giving full play. In Figure 8b1-b3, due to the neglect mechanism of gradient details, the boundaries of the sub-regions deviate from that of the homogeneous regions. In Figure 8c1-c3, although the phenomenon of sub-regions crossing the boundary of homogeneous regions has been partially improved by optimizing the location of the seed points, it is still not accurate enough on the slender road. In Figure 8d1-d3, the sub-regions generated by the adaptive distance-weighted Voronoi tessellation can fit the boundary of the homogeneous regions smoothly and accurately.  To quantitatively evaluate the effectiveness of the proposed algorithm, the User's Accuracy (UA), Producer's Accuracy (PA), Overall Accuracy(OA), and Kappa coefficient (Kappa) are calculated based on the confusion matrix built by segmentation results and template, as shown in Table 3. It shows that FCM_S is at the lowest accuracy, the OA and Kappa are only 74.48% and 0.63, respectively. Those of HMRF_FCM are 95.85% and 0.93 respectively, which is mainly influenced by the UA of region III.

Simulated Image
The OA of the three region-based algorithms, SLIC_FCM, SF_FCM, and VT_HMRF_FCM, are all above 96%, Kappas are also above 0.93. For the proposed algorithm, the OA and Kappa are 98.90% and 0.98, which is much higher than the algorithms mentioned above.

Remote Sensing Image Segmentation
To verify the segmentation effect for multispectral remote sensing images, many different types of images are tested with the proposed algorithm and five comparing algorithms, where the representative images are selected and shown in Figure 9.   Figure 10 is the representative segmentation result of multispectral remote sensing images. From left to right, the six columns represent FCM_S, HMRF_FCM, SLIC_FCM, SF_FCM, VT_HMRF_FCM, and the proposed algorithm, respectively. FCM_S usually makes a big mis-segmentation, such as regions II and III in Figure 10a1, and exists many speckle noise, such as region I in Figure 10a2 and region IV in Figure 10a4. Although HMRF_FCM has greatly improved the confusion phenomenon because of the advantage of the statistical model, there is still a lot of segmentation noise, especially for the region with a big variance, as shown in Figure 10b4. For SLIC_FCM, the segmentation noise is solved, but the results have a high under segmentation rate, such as region III in Figure 10c2 and region II in Figure 10c3. SF_FCM is good at segmenting homogeneous regions and difficult to overcome heterogeneous elements in regions with a big variance, as shown in the comparison between Figure 10d1,d4. VT_HMRFC_FCM has a strong ability to overcome heterogeneity, but it is easy to ignore the positive details, such as the slender road of region II in Figure 10e1 and the gap between houses of region III in Figure 10e2. For the proposed algorithm, it can overcome the heterogeneity while maintaining a great deal of positive information. The slender road, gaps, and the regions with big variance are effectively segmented, as shown in Figure 10f1-f4.  Table 4 is the quantitative evaluation of multispectral remote sensing image segmentation results. It shows that the highest OA and Kappa of FCM_S are only 83.26% and 0.77, respectively. The accuracy of HMRF_FCM is higher than SLIC_FCM, but less than SF_FCM, such as Figure 9a1,d1. For Figure 9a1, SF_FCM and VT_HMRF_FCM have similar accuracy. But for the images with strong heterogeneity, SF_FCM is much lower than VT_HMRF_FCM, such as Figure 9d1. However, the accuracy of the proposed algorithm is always the highest one, the OA and Kappa can be as high as 98.23% and 0.97, respectively. Table 4. Quantitative evaluation of multispectral remote sensing images segmentation results.

Algorithms
Figure 9a1

Discussion
To study the proposed algorithm deeply, the parameters influence and comprehensive performance are further discussed in this section, where the number of sub-regions m can control the fineness of the segmentation, the regularized term coefficient λ controls the fuzzy degree of clustering, the neighborhood sub-regions interaction intensity β determines the degree of spatial constrain, and the adaptive factor α is used to balance the weight of spatial and spectral information. In addition, OA is used as the evaluation criteria.

Parameter Influence Analysis
The influence of different model parameters on segmentation results is discussed by the control variate method, as shown in Figure 11. Figure 11a-d represent m, λ, β, and α respectively. Figure 11a shows that the OA increases from rapid to slow with the increased m, but the running time increases steadily. The bigger the m is, the more updating of seed points is needed. According to segmentation efficiency, it can be seen that the best value range of m is 80-100. It can achieve higher accuracy in less time. Figure 11b illustrates that λ has little effect on segmentation results in the range of 0.1-0.7. In Figure 11c, the OA drops sharply from 0.7, and the highest value is at 0.3. In Figure 11d, OA shows a decreasing trend from 0.2. Generally, the best value of α is about 0.2. Figure 12 is the box-plot for multispectral remote sensing image segmentation results, which is used to evaluate the comprehensive performance of algorithms. The dotted line covers the range of OA. The box shows the range of centralized data. The red line represents the median. Figure 12a-d corresponds to Figure 9a1-d1, respectively. It shows that HMRF_FCM, SLIC_FCM, SF_FCM, and VT_HMRF_FCM have different degrees of instability in segmenting different types of images, FCM_S is stable at a relatively low level, the proposed algorithm is stable at a high level and always above 90%. Besides, the greater the variance of the image, the more obvious the advantage of the proposed algorithm, as shown in Figure 12d.

Conclusions
In this study, an adaptive distance-weighted Voronoi tessellation is proposed to deal with the difficulty of balancing the noise immunity and effective characteristic retention in remote sensing image segmentation. The monotone decreasing function is designed to describe the weight for coupling spatial distance and spectral distance in Voronoi tessellation, which provides a new technique for ensuring the spectral homogeneity and spatial connectivity of sub-regions as much as possible. The pdf of Gaussian distribution used to model dissimilarity can effectively describe the random distribution characteristics of spectra. The Voronoi sub-regions-based fuzzy clustering objective function established by KL entropy regularization term based on MRF model not only can accurately model the segmentation uncertainty, but also effectively model the spatial neighborhood effect at sub-region level. Besides, the parameter solving methods are designed according to the characteristics, which can greatly approach the optimal solution.
A large number of experiments show that the proposed algorithm can keep effective detail information while keeping high noise immunity. The segmentation accuracy is significantly improved, especially for the images with large spectral variance and great differences of cluster scale. However, there are still some limitations, one is that the number of Voronoi sub-regions needs to be determined according to the empirical knowledge, another is that the unimodal distribution cannot effectively describe the multimodal distribution caused by complexed scene. To solve these limitations, the initialization method of the number of sub-regions and modeling method based on multimodal distribution can be further studied.