Image Segmentation by Searching for Image Feature Density Peaks

Abstract: Image segmentation attempts to classify the pixels of a digital image into multiple groups to facilitate subsequent image processing. It is an essential problem in many research areas, such as computer vision and image processing applications. A large number of techniques have been proposed for image segmentation. Among these techniques, clustering-based segmentation algorithms occupy an extremely important position in this field. However, existing popular clustering schemes often depend on prior knowledge and on a threshold used in the clustering process, or lack an automatic mechanism for finding cluster centers. In this paper, we propose a novel image segmentation method that searches for image feature density peaks. We apply the clustering method to the superpixels of an input image and construct the final segmentation map according to the classification result of each pixel. Our method gives the number of clusters directly without prior knowledge, and the cluster centers are recognized automatically without interference from noise. Experimental results validate the improved robustness and effectiveness of the proposed method.


Introduction
Image segmentation refers to partitioning an image into distinctive regions, where each region consists of pixels with similar attributes. The purpose of image segmentation is to simplify or change the representation of an image into common features that are more meaningful and easier to analyze [1][2][3]. Over the past several decades, image segmentation has been widely used in diverse applications of computer vision and image processing, such as object detection [4], face recognition [5], image retrieval [6], and medical image analysis [7].
A large number of techniques and algorithms have been proposed for image segmentation. Color image segmentation of natural and outdoor scenes is a well-studied problem due to its numerous applications in computer vision. Different state-of-the-art methods have been proposed from different perspectives [8,9]. The most commonly used image segmentation methods can be roughly grouped as follows: segmentation based on edge detection, region extraction, thresholding, and clustering techniques [10,11]. There are also graph-cut-based methods, which perform image segmentation by using both edge and regional information [12]. Besides, segmentation may also be viewed as a classification problem based on color and spatial features. In this regard, rough set theory [13], which can extract discriminative features from the original data, has been applied to color image segmentation [14].
In particular, K-means and fuzzy c-means (FCM) clustering are two of the most effective and popular algorithms in this field; both classify elements into different regions based on element similarity [15][16][17]. K-means clustering is a widely used technique with a simple implementation and good convergence speed. It aims to group the pixels of a picture into K clusters according to the similarity between pairs of data components [17]. In the segmentation process, K cluster centers are first determined, and the method then tries to place them as far away from each other as possible to obtain an optimal clustering [1]. However, the clustering performance of this method depends heavily on prior knowledge, and the method may fail when the data points in the feature space form nonspherical clusters. Fuzzy clustering has been applied to many domains. It is superior to most hard clustering techniques in preserving the original image information during the clustering process [18]. FCM is an unsupervised algorithm based on the idea of clustering the data points by iteratively minimizing a cost function and maximizing the distance between cluster centers [19][20][21]. However, the FCM algorithm is very sensitive to additive noise because it does not consider the image context, which reduces its robustness to image noise [22]. In addition, this method requires a large amount of calculation and often produces over-segmentation.
Alex et al. [15] proposed a new clustering algorithm. It first defines two variables for each data point based on the nature of cluster centers. Then, a decision graph is designed in which the cluster centers are isolated from the other data points. However, the strategy in Alex et al. [15] still has drawbacks. For instance, the accuracy of the algorithm depends largely on the choice of a threshold that is difficult to define effectively [23]. Besides, the cluster centers must be manually selected from the final decision graph generated by the algorithm.
In this paper, we propose a novel clustering-based method for image segmentation, which automatically recognizes the cluster centers by efficiently searching for density peaks without defining a threshold. More specifically, we define a function that separates the cluster centers from the other data points, and we change the way the variables are calculated. In addition, the algorithm requires neither iteration nor prior knowledge; it simply selects the cluster centers and returns the number of clusters.
The rest of this paper is organized as follows. We first describe the architecture of our work in Section 2. Experimental results are presented in Section 3. Section 4 presents the conclusion and future work in image segmentation.

Materials and Methods
In this section, we present the technical details of our approach, which is carried out by searching for image feature density peaks. After image preprocessing and color feature extraction, the algorithm abstracts the color features into a distribution of sample points for cluster analysis. Two variables are then defined for each sample point: the local density ρ_i of point i and its distance δ_i from the points with higher density. Finally, the cluster centers are recognized as the points that have anomalously large values of both δ_i and ρ_i.
Figure 1 presents an overview of the approach. Image segmentation usually requires feature information for each pixel, such as luminosity, color, and contrast. We therefore first extract this information and store it numerically in an array or a vector. Furthermore, it is crucial to find a suitable space in which these features can be expressed, analyzed, and quantified. Since the image is relatively large, we preprocess it with a superpixel method to retain the useful feature information while reducing data redundancy.

Superpixel Method for Image Preprocessing
Conventional segmentation approaches process pixels one by one, which produces a massive amount of data that is difficult to analyze, especially when the picture is relatively large. In the present work, we use superpixel segmentation as an intermediate step to reduce the complexity of images containing millions of pixels [24]. Superpixel segmentation partitions a picture into multiple homogeneous cells, called superpixels, and has been widely applied to picture analysis and simplification. Stutz et al. [25] comprehensively evaluated a variety of advanced superpixel algorithms. Among these algorithms, simple linear iterative clustering (SLIC) adheres to image boundaries as well as or better than other schemes [26]. We employ this algorithm to divide the original image into a number of irregular blocks according to their similarity, which turns a pixel-level image into a region-level image. Each block is a perceptually consistent unit, and all pixels in a small superpixel region are most likely uniform in color. The number of superpixels can be set according to the size of the picture, and all pixels contained in a superpixel region are given the same value. Hence, each superpixel can be abstracted as a sample point. This clearly requires less computation than processing every pixel individually.
In this process, superpixel segmentation provides a more characteristic and perceptually meaningful representation of the digital image [27], and most structures in the image are preserved. Working on superpixels also concentrates the processing on the valid elements of the image. There is very little loss in moving from the pixel-level map to the superpixel-level map. In the subsequent image segmentation task, superpixels are used as the sample points for analysis. This method is illustrated by a simple example in Figure 2.
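The step from a pixel-level map to a superpixel-level map can be sketched as follows. This is a minimal illustration, assuming the superpixel label map has already been produced (for example by SLIC); the function names `superpixel_means` and `paint_superpixels` and the toy data are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def superpixel_means(labels, features):
    """Average the per-pixel feature vectors inside each superpixel.

    labels   -- 2-D list of ints; labels[r][c] is the superpixel id of a pixel
    features -- 2-D list of tuples; features[r][c] is that pixel's feature vector
    Returns a dict mapping superpixel id -> mean feature vector (tuple).
    """
    sums = {}
    counts = defaultdict(int)
    for label_row, feature_row in zip(labels, features):
        for label, feat in zip(label_row, feature_row):
            if label not in sums:
                sums[label] = [0.0] * len(feat)
            for k, value in enumerate(feat):
                sums[label][k] += value
            counts[label] += 1
    return {lab: tuple(s / counts[lab] for s in sums[lab]) for lab in sums}


def paint_superpixels(labels, means):
    """Replace every pixel by the mean feature of its superpixel."""
    return [[means[label] for label in row] for row in labels]


# Toy 2x4 "image" with two superpixels and two features per pixel.
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
features = [[(10.0, 2.0), (12.0, 2.0), (50.0, 8.0), (54.0, 8.0)],
            [(11.0, 2.0), (11.0, 2.0), (52.0, 8.0), (52.0, 8.0)]]
means = superpixel_means(labels, features)   # one sample point per superpixel
flat_map = paint_superpixels(labels, means)  # superpixel-level map
```

Each entry of `means` is one sample point for the later cluster analysis, so the number of points to cluster equals the number of superpixels rather than the number of pixels.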

CIELab Color Space for Image Feature Description
In conventional color representation spaces, such as the RGB and CMYK color spaces, the channels contain not only color information but also luminosity information, and the two cannot be extracted separately. In the CIELab space, because of its unique channel settings, luminosity features are stored in the L channel alone, and color features are stored in the a and b channels. The channels are independent of each other, so adjusting the luminosity of an image in the Lab color space does not affect the hue. If luminosity and color features need to be extracted and adjusted, the Lab space facilitates the operation. In addition, the Lab color space has a larger gamut than the RGB space, which means that color information described in the RGB space can also be mapped into the Lab space, and it can compensate for the inhomogeneous color distribution of other color spaces.
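To make the channel separation concrete, the following sketch converts an 8-bit sRGB triple to CIELab using the standard sRGB-to-XYZ matrix and the D65 white point. It is an illustrative assumption about the conversion path (the paper does not state which implementation it uses), and `srgb_to_lab` and `feature_point` are hypothetical helper names.

```python
def srgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB colour to CIELab (D65 reference white)."""

    def linearize(c):
        # Undo the sRGB gamma encoding.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB -> CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # Piecewise cube-root function from the CIELab definition.
        d = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > d ** 3 else t / (3.0 * d * d) + 4.0 / 29.0

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def feature_point(r, g, b):
    """Keep only the (L, a) pair, i.e. a point in a planar L-a feature space."""
    L, a, _ = srgb_to_lab(r, g, b)
    return (L, a)


white = srgb_to_lab(255, 255, 255)
red = srgb_to_lab(255, 0, 0)
gray = feature_point(128, 128, 128)
```

For neutral grays only L changes while a and b stay near zero, which is the separation of luminosity from color described above.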
In this method, we choose the CIELab color space, which consists of three channels for describing the colors visible to human beings [28]. A rough-set-based method for color channel selection proposed by Soumyabrata Dev et al. [14] provides ideas for the choice of color space in our work. It suggests that, in image segmentation, analyzing even a single color channel can accomplish the task satisfactorily, which also helps reduce the computational cost of the algorithm. After the superpixel segmentation is finished, we use the plane spanned by the L channel and the a channel as the image feature space. Cluster analysis can then be performed on the distribution of sample points in this feature space. More specifically, we take a fraction of Figure 2a as an example; the above process is intuitively illustrated in

Improvement of the Clustering Method
The clustering approach is based on the assumption that cluster centers are surrounded by points with lower local density and lie at a relatively large distance from any point with higher density [15]. Under this assumption, only two quantities need to be computed for each point i, 1 ≤ i ≤ N (where N is the total number of superpixels): its local density ρ_i and its distance δ_i from points of higher density [15,23,29]. In the original clustering method, the local density ρ_i is computed according to the following equation:

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c), \qquad \chi(x) = \begin{cases} 1, & x < 0 \\ 0, & \text{otherwise} \end{cases}$$

where d_ij is the Euclidean distance between data point i and data point j, and d_c is the cutoff distance. In other words, ρ_i is the number of points that are closer than d_c to point i. The choice of d_c therefore has a huge impact on the algorithm: if the value of d_c is not appropriate, the effectiveness of the procedure is greatly reduced. The local density ρ_i is an important quantity that indicates how sparsely the sample points are distributed, and it has a considerable influence on the final analysis results [30].
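The dependence on the cutoff distance can be seen in a small sketch of the cutoff-based density, which counts the neighbors of each point that lie closer than d_c. The function name `cutoff_density` and the toy points are illustrative assumptions.

```python
import math


def cutoff_density(points, d_c):
    """rho_i = number of other points closer than d_c to point i."""
    rho = []
    for i, p in enumerate(points):
        count = sum(1 for j, q in enumerate(points)
                    if j != i and math.dist(p, q) < d_c)
        rho.append(count)
    return rho


# Two small groups of points in a 2-D feature space.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]

rho_small = cutoff_density(points, 0.5)    # cutoff too small: every density is 0
rho_medium = cutoff_density(points, 1.5)   # groups are distinguishable
rho_large = cutoff_density(points, 10.0)   # cutoff too large: all densities equal
```

With a cutoff that is too small or too large, all points receive the same density and the density ranking that the method relies on carries no information.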
In our work, we propose another way to compute the local density ρ_i that does not require a cutoff distance. Kernel density estimation (KDE) [31] is the most commonly used density estimation method; it is a non-parametric way to estimate the probability density function of a random variable. Let ((x_1, y_1), (x_2, y_2), ..., (x_n, y_n)) be an independent bivariate sample drawn from the distribution of our superpixels in the feature space. In other words, the values of the sample points in the L channel and the a channel are stored in x and y, respectively. We intend to fit the shape of the probability density function ρ of the samples. In its standard form, the kernel density estimator is:

$$\hat{\rho}_h(x, y) = \frac{1}{n h^{2}} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}, \frac{y - y_i}{h}\right) \tag{3}$$

where K is the kernel, a non-negative function that integrates to one, and h > 0 is the bandwidth (also called the window), a smoothing parameter whose choice strongly influences the estimation results [32].
Kernel density estimation uses a smooth peak function, called the kernel, to fit the observed data points and thereby approximate the true probability distribution. Many types of kernel exist; the Gaussian kernel is one of the most commonly used, so we apply it to compute the local density ρ_i.
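A minimal sketch of a Gaussian kernel density estimate evaluated at every sample point is given below. It assumes an isotropic two-dimensional Gaussian kernel with a single scalar bandwidth h, which is one common choice and not necessarily the exact kernel configuration used in the experiments; `gaussian_kde_density` is a hypothetical name.

```python
import math


def gaussian_kde_density(points, h):
    """Local density rho_i from a 2-D isotropic Gaussian KDE with bandwidth h.

    rho_i = 1 / (n * h^2) * sum_j K((p_i - p_j) / h),
    with K(u) = exp(-|u|^2 / 2) / (2 * pi).
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * h * h)
    rho = []
    for px, py in points:
        total = 0.0
        for qx, qy in points:
            sq_dist = (px - qx) ** 2 + (py - qy) ** 2
            total += math.exp(-sq_dist / (2.0 * h * h))
        rho.append(norm * total)
    return rho


# Three nearby points and one isolated point.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (3.0, 3.0)]
rho = gaussian_kde_density(points, h=0.8)
```

Unlike the cutoff-based count, the resulting densities are continuous values, so points inside a dense group receive a higher density than isolated points without any hard threshold.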
In this method, we use the kernel approach described in [16]. The 3-dimensional density estimation result can be seen clearly in Figure 4a. Besides serving the calculations that follow, the density values and the shape of the estimated function also provide grounds for roughly locating the cluster centers and for classifying the sample points more comprehensively.
After the local density ρ_i has been computed, the distance δ_i is obtained as the minimum distance between point i and any other point with higher density. For the point with the highest density, we take the maximum distance d_ij between data points as its δ_i. The distance δ_i of each point i is thus defined as:

$$\delta_i = \begin{cases} \min\limits_{j:\,\rho_j > \rho_i} d_{ij}, & \text{if there is a point } j \text{ with } \rho_j > \rho_i \\ \max\limits_{j} d_{ij}, & \text{otherwise} \end{cases}$$

An important characteristic of the cluster centers follows from the values of δ_i and ρ_i: the cluster centers are the points with a high local density ρ_i and a relatively large distance δ_i from other points with higher density. Based on this assumption, we construct a new graph that contains both δ_i and ρ_i, called the decision graph. Figure 5 shows the decision graph, which is based on both the local density ρ_i and the distance δ_i adopted in our method. The decision graph is designed to represent the core nature of cluster centers: the horizontal ρ axis and the vertical δ axis pull the cluster centers rightward and upward, respectively, so that they stand out. The KDE accurately locates the density peaks in the plot of the estimated probability function (Figure 4c), and Figure 4d shows that only a few points have a large value of δ. Taking δ and ρ together, the sample points for which both values are large are most likely cluster centers. However, the final result in [15] still has to be selected manually, which is considerably affected by human factors: different choices lead to entirely different segmentation results.
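The computation of δ_i can be sketched as below. The sketch also records, for every point, the index of its nearest neighbor of higher density, since that neighbor is needed later when the remaining points are assigned to clusters. The function name and the toy densities are illustrative assumptions.

```python
import math


def density_peak_distances(points, rho):
    """Return (delta, nearest_higher) for every point.

    delta[i]          -- distance to the nearest point of strictly higher density;
                         for a point with no denser neighbour, the largest
                         distance from it to any other point
    nearest_higher[i] -- index of that denser neighbour, or -1 if none exists
    """
    n = len(points)
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for i in range(n):
        best = math.inf
        for j in range(n):
            if rho[j] > rho[i]:
                d = math.dist(points[i], points[j])
                if d < best:
                    best, nearest_higher[i] = d, j
        if nearest_higher[i] == -1:
            # Density peak of the whole data set: use the maximum distance.
            best = max(math.dist(points[i], q) for q in points)
        delta[i] = best
    return delta, nearest_higher


# Four points on a line with hand-picked densities (toy values).
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
rho = [3.0, 2.0, 2.5, 1.0]
delta, nearest_higher = density_peak_distances(points, rho)
```

In this toy example, points 0 and 2 combine a high density with a large δ, so they are the ones that would stand out in the upper right of a decision graph.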

Adaptive Selection of Cluster Centers
Because manual selection is difficult to make reliable, a separate function is introduced to help pick the cluster centers automatically. The function is based on the method in [33], as follows: k is a constant determined empirically, which affects the accuracy of the algorithm to some extent. In Section 3, the exact value of k and its influence on the accuracy are discussed in detail.
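The selection rule can be sketched generically: a point is kept as a cluster center when the separating function evaluates to a positive value at its (ρ, δ) pair. Note that the function `make_placeholder_fn` below is a purely hypothetical stand-in (a hyperbola on normalized ρ and δ) chosen only so that the sketch runs; it is not the separating function of [33], and the value k = 0.5 is an arbitrary toy value, not the one tuned in the experiments.

```python
def select_centers(rho, delta, separating_fn):
    """Indices of the points for which separating_fn(rho_i, delta_i) > 0."""
    return [i for i, (r, d) in enumerate(zip(rho, delta))
            if separating_fn(r, d) > 0]


def make_placeholder_fn(rho, delta, k):
    """ASSUMPTION: hypothetical curve, not the function used in the paper.

    Points above the hyperbola (rho / rho_max) * (delta / delta_max) = k
    give a positive value.
    """
    rho_max, delta_max = max(rho), max(delta)

    def fn(r, d):
        return (r / rho_max) * (d / delta_max) - k

    return fn


# Toy decision-graph coordinates for four sample points.
rho = [3.0, 2.0, 2.5, 1.0]
delta = [11.0, 1.0, 10.0, 1.0]
centers = select_centers(rho, delta, make_placeholder_fn(rho, delta, k=0.5))
num_clusters = len(centers)
```

Because the separating function is passed in as a callable, the number of clusters falls out of the selection itself and no manual inspection of the decision graph is needed.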
A point is considered a cluster center when R > 0. The separate function thus separates the cluster centers from the other sample points, and the decision graph in Figure 5 can be redrawn as shown in Figure 6, where the points to the right of the curve are considered cluster centers. After the cluster centers have been found, each remaining data point is assigned to the same cluster as its nearest neighbor of higher density. Once a point is assigned to a cluster, the classification information is updated immediately, and this procedure continues until no valid candidates are left to be assigned [8]. We define the candidacy of a point i as follows: For each point, we first check its candidacy. This filters out the points that are not valid candidates for analysis and hence reduces the computational time. This step is shown in Algorithm 1.

Algorithm 1 Assignment of remaining points
for all superpixels computed from the original image.
1: for each superpixel i with cady(i) = 0 among the remaining unclassified superpixels do
2: Search for a series of superpixels SL with lower local density ρ_i by using the density set P and Equations (1)-(4).
3: Search for one superpixel pm with the minimum value of σ_i from SL by using the distance set Q and Equations (5)-(8).

Each pixel is then marked with its cluster number, which is the cluster number of the superpixel region to which the pixel belongs, and all pixels are filled with the mean color of the cluster they belong to. The final segmentation is obtained from the cluster numbers marked in this step, as shown in Figure 7.
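The assignment of the remaining superpixels and the final pixel labelling can be sketched as follows. This is the usual density-peak assignment, in which points are visited in order of decreasing density and inherit the label of their nearest neighbor of higher density; it is a simplified illustration under that assumption and not a literal transcription of Algorithm 1. The inputs are toy values and all names are hypothetical.

```python
def assign_clusters(rho, nearest_higher, centers):
    """Give every sample point the label of its nearest denser neighbour.

    rho            -- local density of each point
    nearest_higher -- index of the nearest point of higher density (-1 if none)
    centers        -- indices of the points selected as cluster centres
    """
    labels = [-1] * len(rho)
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Visiting points from dense to sparse guarantees that the denser
    # neighbour of a point has been processed before the point itself.
    for i in sorted(range(len(rho)), key=lambda idx: rho[idx], reverse=True):
        parent = nearest_higher[i]
        if labels[i] == -1 and parent >= 0:
            labels[i] = labels[parent]
    return labels


def label_pixels(superpixel_map, labels):
    """Each pixel takes the cluster number of the superpixel it belongs to."""
    return [[labels[sp] for sp in row] for row in superpixel_map]


# Toy inputs: four superpixels, two of which are cluster centres.
rho = [3.0, 2.0, 2.5, 1.0]
nearest_higher = [-1, 0, 0, 2]
centers = [0, 2]
labels = assign_clusters(rho, nearest_higher, centers)

superpixel_map = [[0, 0, 1],
                  [2, 3, 3]]
pixel_labels = label_pixels(superpixel_map, labels)
```

A single pass suffices in this formulation because every non-center point has exactly one denser neighbor to inherit from, so no iteration is required.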

Data and Experimental Setting
We employed the Berkeley segmentation database (BSDS300) to evaluate our segmentation scheme. The BSDS300 database consists of 300 natural images, each with multiple ground truth segmentations provided by different individuals. The database contains a variety of content, including landscapes, animals, buildings, and portraits, which makes it challenging for any segmentation algorithm. In addition, it has found wide acceptance as a public benchmark for testing image segmentation algorithms [34,35]. In this paper, the entire BSDS300 database is first employed to investigate the effect of several user-defined parameters and to evaluate the performance of our method compared to other schemes. Then, we divide the database into 7 subsets according to specific content in order to further evaluate how well the segmentation preserves the salient features of the input images. The evaluation details are discussed in Section 3.2. All of our experiments were performed on a PC with an Intel Core i7-6700 CPU at 3.40 GHz and 8 GB of RAM under MS Windows 10.
Following previous works [22,36], we adopt the commonly used Probabilistic Rand index (PRI) [34,37] to measure the image segmentation result for each image in the dataset. PRI was initially introduced for measuring the similarity between two data clusterings. It is now widely used for comparing segmentation algorithms against multiple ground truth images. PRI operates by comparing the compatibility of assignments between pairs of elements in the clusters. Its value for a test segmentation S and a ground truth segmentation G is the number of pixel pairs that have the same label in both S and G, plus the number of pairs that have different labels in both, divided by the total number of pixel pairs [34]. Specifically, given a set of ground truth segmentations {G_k}, the PRI is defined as:

$$\mathrm{PRI}(S, \{G_k\}) = \frac{1}{T} \sum_{i<j} \left[ c_{ij}\, p_{ij} + (1 - c_{ij})(1 - p_{ij}) \right] \tag{10}$$

where c_ij is the event that pixels i and j have the same label, p_ij is its probability, and T is the total number of pixel pairs. We employ the sample mean to estimate p_ij, so Equation (10) amounts to averaging the Rand index over the different ground truth segmentations. The PRI has been reported to suffer from a small dynamic range [38,39], and PRI values across images and algorithms are often similar. In [38], this drawback is addressed by normalizing the index with an empirical estimate of its expected value.
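As a small worked illustration of this definition, the sketch below computes the PRI for flattened label lists, estimating p_ij as the fraction of ground truth segmentations in which pixels i and j share a label. The function name and the toy labelings are illustrative; a full-size image would need a more efficient, contingency-table-based implementation than this quadratic loop.

```python
from itertools import combinations


def probabilistic_rand_index(test, ground_truths):
    """PRI of a test labelling against a list of ground-truth labellings.

    test          -- list of labels, one per pixel
    ground_truths -- list of label lists, each of the same length as test
    """
    pairs = list(combinations(range(len(test)), 2))
    total = 0.0
    for i, j in pairs:
        # c_ij: 1 if pixels i and j share a label in the test segmentation.
        c_ij = 1.0 if test[i] == test[j] else 0.0
        # p_ij: sample mean of "same label" over the ground truths.
        p_ij = sum(1.0 for g in ground_truths if g[i] == g[j]) / len(ground_truths)
        total += c_ij * p_ij + (1.0 - c_ij) * (1.0 - p_ij)
    return total / len(pairs)


test = [0, 0, 1, 1]
pri_same = probabilistic_rand_index(test, [[5, 5, 9, 9]])        # same partition
pri_diff = probabilistic_rand_index(test, [[0, 1, 0, 1]])        # crossed partition
pri_both = probabilistic_rand_index(test, [[5, 5, 9, 9], [0, 1, 0, 1]])
```

The index depends only on whether pairs of pixels are grouped together, so relabelling the clusters does not change the score, and the score for two ground truths is the average of the two individual scores.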

Evaluation
We carried out a series of experiments to validate the usefulness of the output of our segmentation method. We first ran our algorithm on the entire database to investigate the effect of several user-defined parameters and to empirically choose optimal values for them. Then, we evaluated the proposed method by comparing its qualitative and quantitative performance with state-of-the-art schemes. Both the quantitative and the qualitative results demonstrate that our method successfully improves segmentation accuracy.

Effects of Parameters
In our algorithm, there are practically only two parameters to control: h (the bandwidth in Equation (3)) and k (the constant in Equation (6)). The parameter h determines the bandwidth of the smoothing window and has a clear impact on the performance. Typically, an estimate with a smaller value of h might provide a better fit to the empirical cumulative distribution function. We ran our algorithm with varied values of h and found that the segmentation accuracy tends to be stable when h varies from 0.5 to 1.5. Afterwards, we compared the performance for h = 0.5, 0.6, ..., 1.5 under a fixed value of k. We observed that the segmentation accuracy is best when h = 0.8. Thus, we empirically set h to 0.8 in our experiments to keep the performance stable. Moreover, after running our algorithm a number of times, we found that the accuracy tends to be stable when k takes values between 0.04 and 0.06. For k ∈ (0.04, 0.06), k also has a clear impact on the segmentation performance, as shown in Figure 8, where the accuracy indicates the ratio of correctly classified pixels. In addition, the accuracy tends to be stable when k is between 0.0462 and 0.0515, and it reaches the peak of 0.0972 when k = 0.0501.

Quantitative Evaluation
To quantitatively evaluate our algorithm, we use the PRI to compare the performance of the proposed method with other schemes. We first run all methods on the entire database with their optimal settings. The average PRI values computed on the 300 images of the BSDS300 dataset are shown in the 8th row of Table 1, in which our scheme achieves the highest value. In addition, in order to further evaluate the classification performance of our method, we divided the database into 7 subsets based on the content and characteristics of the images. The PRI counts the fraction of pixel pairs whose labels are consistent between the computed segmentation and the ground truth, averaged across multiple ground truth segmentations to account for scale variation in human perception [40]. The PRI values for all types of images are shown in the first 7 rows of Table 1. As shown in this table, the proposed method performs best with PRI = 0.897, which is higher than the values of the other two methods, and it is only 0.002 behind the K-means method [41] on the image type 122048. Moreover, we report the average computation time of all methods, which indicates that our method can be carried out at a lower computational cost.

Qualitative Evaluation
To demonstrate the effectiveness of the proposed method, we compare it with existing segmentation methods on the Berkeley dataset BSDS300. We compute the results of our method on the entire dataset, and the results of FCM-S and K-means are computed by following the optimal settings in previous work [41,42], respectively. Figure 9 shows the qualitative results for 7 images, each randomly selected from one of the 7 image types (corresponding to the first 7 rows of Table 1). These results suggest that our method achieves superior performance in preserving the salient features of the input images due to its accurate pixel classification. For example, Figure 9 clearly shows that a considerable number of pixels in the white kerchief and the human face (1st row), the body of the bird (5th row), and the church clock (last row) are falsely clustered by FCM-S and K-means as background or as part of other objects. Compared with these two segmentation approaches, the proposed scheme avoids these classification errors by classifying the pixels reasonably. Moreover, in the 3rd row of Figure 9, there is obvious intensity inhomogeneity across the image; here the proposed scheme outperforms FCM-S and K-means by producing a more homogeneous background.

Discussion
In this paper, we proposed a new image segmentation method that requires neither a priori knowledge nor a large amount of computation. The method is based on the assumption that a cluster center is surrounded by neighbors with lower density, and that any data point with a higher density than the cluster center must be far from it. The decision graph is designed to take both variables into consideration, and a separate function is defined to select the cluster centers automatically. The algorithm can thus determine the most appropriate number of clusters for each picture in an unsupervised manner.
Image segmentation is a challenging problem, and many issues still need to be resolved. Our method is based on the clustering model in [15], but it is more automatic and does not define a threshold. Superpixel segmentation is used as a preprocessing step to reduce the computational complexity regardless of the size of the input image, so the computation time of the algorithm remains within a reasonable range. Experimental results indicate that our method is more effective and more stable than state-of-the-art clustering methods.
Some challenges remain. For example, more evidence is needed to determine the number of superpixel regions: too few regions yield segmentation results that are not fine enough, and the number is also closely related to the runtime.

Figure 1. Overview of our image segmentation framework.

Figure 2. The process of superpixel pre-segmentation. (a) The original image. (b) The original image partitioned into multiple homogeneous superpixels. (c) The superpixel-level map, obtained by calculating the average of the color features over each superpixel region and using that value to replace the pixel values of all points in the region. This process not only retains as much of the effective information as possible but also removes some unnecessary noise, and it greatly reduces the complexity of the calculation.

Figure 3. The flowchart of Sections 2.1 and 2.2. The small image is a region of interest selected from the original image, and this region is divided into 62 superpixels. We consider each superpixel as a sample point and represent the points in the feature space.

Figure 5. The decision graph, which is based on both the local density ρ_i and the distance δ_i used in our method.

Figure 6. Decision graph with the separate function. The labels of the final cluster centers are marked in red.

Algorithm 1 (continued)
4: if pm is a cluster center then
5: i ← L, where L is the class label of pm
6: Update the candidacy cady(i): cady(i) ← 1
7: end if
8: end for
Output: All of the classified superpixels.

Figure 7. The segmentation result (a) produced by our method for the original picture (b).

Figure 8. Performance with different parameter values; k is a constant between 0.04 and 0.06.

Figure 9. Results for different clustering-based segmentation algorithms. (a) Original image. (b) FCM-S based segmentation. (c) K-means based segmentation. (d) The proposed method.

Table 1. Comparison of different methods on the Berkeley image dataset in terms of the Probabilistic Rand Index (PRI).