Efficient Superpixel-Guided Interactive Image Segmentation Based on Graph Theory

Abstract: Image segmentation is a challenging task in the field of image processing and computer vision. To obtain accurate segmentation, user interaction is commonly used in practical image-segmentation applications. However, a good segmentation method should not rely on much prior information. In this paper, an efficient superpixel-guided interactive image-segmentation algorithm based on graph theory is proposed. In this algorithm, we first perform an initial segmentation with the MeanShift algorithm; a graph is then built by taking the pre-segmented regions (superpixels) as nodes, and the maximum flow–minimum cut algorithm is applied to obtain the superpixel-level segmentation. In this process, each superpixel is represented by a color histogram, and the Bhattacharyya coefficient is used to calculate the similarity between any two adjacent superpixels. Because of the over-segmentation problem of the MeanShift algorithm, a narrow band is constructed along the object contours using morphological operators. To segment the pixels around edges accurately, a graph is created again for the pixels in the narrow band, and the final pixel-level segmentation is obtained with the maximum flow–minimum cut algorithm. Extensive experimental results show that the proposed algorithm obtains much more accurate segmentation results with less user interaction and less running time than the widely used GraphCut algorithm, Lazy Snapping algorithm, GrabCut algorithm and a region merging algorithm based on maximum similarity (MSRM).


Introduction
Image segmentation, which aims to extract objects of interest from a complex background for object detection, tracking, recognition, scene analysis, etc., is one of the basic problems in image processing and has been widely used in pattern recognition and computer vision [1][2][3][4][5][6][7]. Many image-segmentation algorithms have been proposed in recent years. According to the image type, image segmentation can be divided into monochrome and color image segmentation. According to the image representation, it can be divided into single-scale and multi-scale approaches. According to the principle of operation, it can be divided into spatially blind and spatially guided methods. According to whether prior knowledge is provided, it can be divided into automatic and interactive segmentation. For natural images of various types and with complicated content, interactive methods are usually used, because their results are more consistent with users' subjective intentions [12][13][14][15][16][17][18][19][20]. GraphCut is one of the most widely used interactive segmentation algorithms [13][14][15][16][17][18][19] owing to its global optimization, strong numerical robustness, high execution efficiency, freedom in the topology of the partitioned weighted graph, and applicability to N-D image segmentation.
In this paper, we present an efficient superpixel-guided interactive image-segmentation algorithm based on graph theory. First, the MeanShift algorithm is applied to pre-segment an image into regions (superpixels), and the proposed algorithm then constructs a weighted directed graph whose nodes are these pre-segmented regions. This model considers not only the correlation between adjacent superpixels but also the relationship between each superpixel and the user's interaction information. Most importantly, we use a color histogram to represent each superpixel, which accurately describes the color distribution of the region. As in the MSRM segmentation algorithm, we use the Bhattacharyya coefficient to measure the similarity between superpixels, and the maximum flow-minimum cut algorithm is then performed to obtain the first-stage segmentation result. To address the edge leakage problem caused by the MeanShift algorithm, the proposed algorithm creates a narrow band at the object boundary by applying morphological operations to the first-stage segmentation result. At the same time, the foreground and background regions are determined, and the corresponding foreground and background models are built with Gaussian mixture models (GMMs). Finally, a graph is rebuilt with the pixels in the narrow band as nodes, and the final segmentation result is obtained with the maximum flow-minimum cut algorithm. Extensive experiments show that the proposed algorithm achieves better segmentation results with less user interaction and execution time than the GraphCut, Lazy Snapping, GrabCut and MSRM algorithms.
The remainder of this paper is organized as follows. Section 2 summarizes related work, including the GraphCut, Lazy Snapping, GrabCut and MSRM algorithms. Section 3 introduces the motivation of the proposed algorithm, and Section 4 describes it in detail. Section 5 presents extensive experiments to verify the proposed algorithm. Section 6 concludes the paper.

GraphCut Algorithm
The GraphCut algorithm [13] transforms the image-segmentation problem into a graph-partition problem. It represents an image as a weighted directed graph G = (V, E, W), where V is the set of nodes corresponding to the pixels of the image; E is the set of edges connecting adjacent nodes (usually in a 4- or 8-neighborhood); and W is the set of edge weights, which indicate the similarity of adjacent nodes. Before the maximum flow-minimum cut algorithm is executed, users are first asked to mark the foreground (denoted by a set F) and the background (denoted by a set B); the remaining unmarked pixels form a set U, and V = F ∪ B ∪ U. The foreground and background are modeled by gray-level histograms, and two virtual nodes are then added: the source node (corresponding to the foreground model) and the sink node (corresponding to the background model). The weight of the edge between any node in V and the source or sink node indicates the tendency of that node to belong to the foreground or the background. Figure 1 shows a graph cut example for a 3 × 3 image: Figure 1a is a directed graph composed of the pixel set V and the two virtual nodes, source and sink (denoted s and t), and Figure 1b is the optimal cut found by the maximum flow-minimum cut algorithm.
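As a concrete illustration of the s-t formulation, the following Python sketch computes a minimum cut through maximum flow with the Edmonds-Karp method (BFS augmenting paths). It is a generic textbook solver chosen for readability, not necessarily the solver used in [13]; the function name, the dictionary-of-dictionaries graph format and the toy weights are assumptions. Nodes that remain reachable from s after the flow is maximized form the source (foreground) side of the cut.

```python
from collections import deque


def min_cut(capacity, s, t):
    """Minimum s-t cut via max flow (Edmonds-Karp: BFS augmenting paths).

    capacity: dict mapping node -> {neighbor: non-negative edge capacity}.
    Returns (cut_value, source_side), where source_side is the set of nodes
    still reachable from s in the residual graph after the maximum flow.
    """
    # Residual graph: copy forward capacities, add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No augmenting path left: `parent` holds exactly the nodes
            # reachable from s, i.e. the source side of a minimum cut.
            return flow, set(parent)
        # Bottleneck capacity along the path that was found.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow: decrease forward residuals, increase reverse residuals.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy graph: pixels a, b, c in a row; s = source (foreground), t = sink
# (background). All weights are made up for illustration.
graph = {
    "s": {"a": 9, "b": 4},
    "a": {"b": 3, "t": 1},
    "b": {"a": 3, "c": 1, "t": 2},
    "c": {"b": 1, "t": 8},
}
value, fg = min_cut(graph, "s", "t")
print(value, sorted(fg - {"s"}))  # prints: 4 ['a', 'b']
```

Edmonds-Karp runs in O(VE²) time; solvers used in practice for image graphs are considerably faster, so this sketch only shows what the cut computes, not how to compute it efficiently.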
Symmetry 2018, 10, x FOR PEER REVIEW 3 of 22
The GraphCut algorithm solves the optimal segmentation problem by minimizing the following energy function:
$$E(\iota) = \lambda R(\iota) + B(\iota) \tag{1}$$
where ι is the segmentation label vector, with ι_i = 0 indicating background and ι_i = 1 indicating foreground. R(ι) is the region term, B(ι) is the edge (boundary) term, and the parameter λ (λ ≥ 0) balances the region term against the edge term. R(ι) and B(ι) are defined as
$$R(\iota) = \sum_{i \in V} R_i(\iota_i), \qquad B(\iota) = \sum_{\{i,j\} \in N} B_{\{i,j\}} \cdot \delta(\iota_i, \iota_j) \tag{2}$$
where δ(ι_i, ι_j) = 1 if ι_i ≠ ι_j and 0 otherwise, and N represents the neighborhood of pixels. R_i(ι_i) and B_{i,j} are defined as
$$R_i(1) = -\ln \Pr(I_i \mid F), \qquad R_i(0) = -\ln \Pr(I_i \mid B) \tag{3}$$
$$B_{\{i,j\}} \propto \exp\left(-\frac{(I_i - I_j)^2}{2\sigma^2}\right) \cdot \frac{1}{\mathrm{dist}(i,j)} \tag{4}$$
where I_i represents the gray value of pixel i, Pr(I_i|F) is the value of I_i in the histogram of the foreground F, and Pr(I_i|B) is the value of I_i in the histogram of the background B. The parameter σ controls how strongly the gray-level difference between pixels i and j is penalized, and dist(i, j) represents the spatial distance between pixels i and j.
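The GraphCut terms can be checked numerically on a tiny example. The sketch below assumes the usual interactive graph-cut form R_i(1) = -ln Pr(I_i|F), R_i(0) = -ln Pr(I_i|B) and B_{i,j} = exp(-(I_i - I_j)²/(2σ²))/dist(i, j) (taking the proportionality as equality), a 1-D row of pixels whose only neighbors are left and right with dist(i, j) = 1, 8 gray levels, and a small ε to avoid ln 0. The function names and seed values are hypothetical. A labeling that follows the object boundary should have a lower energy than a noisy one.

```python
import math
from collections import Counter


def gray_histogram(values, levels=8):
    """Normalized gray-level histogram; values are assumed to lie in [0, levels)."""
    counts = Counter(values)
    return [counts[g] / len(values) for g in range(levels)]


def region_cost(gray, label, hist_f, hist_b, eps=1e-6):
    """R_i(1) = -ln Pr(I_i|F), R_i(0) = -ln Pr(I_i|B); eps avoids ln(0)."""
    p = hist_f[gray] if label == 1 else hist_b[gray]
    return -math.log(p + eps)


def boundary_weight(gi, gj, sigma=1.0, dist=1.0):
    """B_{i,j} = exp(-(I_i - I_j)^2 / (2 sigma^2)) / dist(i, j)."""
    return math.exp(-((gi - gj) ** 2) / (2 * sigma ** 2)) / dist


def energy(row, labels, hist_f, hist_b, lam=1.0, sigma=1.0):
    """E = lam * sum_i R_i(label_i) + sum of B_{i,j} over neighbors with different labels."""
    region = sum(region_cost(g, lab, hist_f, hist_b) for g, lab in zip(row, labels))
    boundary = sum(
        boundary_weight(row[k], row[k + 1], sigma)
        for k in range(len(row) - 1)
        if labels[k] != labels[k + 1]
    )
    return lam * region + boundary


# Toy 1-D image row (8 gray levels): bright object on the left, dark background on the right.
row = [7, 7, 6, 1, 0, 0]
hist_f = gray_histogram([7, 6, 7])  # hypothetical foreground seed pixels
hist_b = gray_histogram([0, 1, 0])  # hypothetical background seed pixels
good = [1, 1, 1, 0, 0, 0]
noisy = [1, 0, 1, 0, 1, 0]
print(energy(row, good, hist_f, hist_b), energy(row, noisy, hist_f, hist_b))
```

The noisy labeling is penalized mostly through the region term: each pixel whose gray level never occurs in the histogram of its assigned label costs ln(1/ε).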

Lazy Snapping Algorithm
Symmetry 2018, 10, 169
Since the GraphCut algorithm takes pixels as the nodes of the directed graph, the number of nodes is huge, which greatly reduces its efficiency. To address this problem, Li et al. proposed the Lazy Snapping algorithm [18]. They pre-segment the image with the WaterShed algorithm [21] and use the resulting regions, instead of pixels, as the graph nodes. In addition, the algorithm uses color information instead of the gray-level information of the original GraphCut algorithm, which further improves the segmentation accuracy. Based on this region representation of graph nodes, Li et al. defined the following energy function:
$$E(\iota) = \sum_{i \in V} E_1(\iota_i) + \lambda \sum_{\{i,j\} \in N} E_2(\iota_i, \iota_j) \tag{5}$$
where N is the set of adjacent region pairs, E_1(ι_i) represents the likelihood energy and E_2(ι_i, ι_j) represents the prior energy. In the Lazy Snapping algorithm, Li et al. use the mean color of each region (denoted C_i) to represent the node, and the K-means algorithm is used to cluster the marked foreground pixel set F and background pixel set B. The likelihood energy is
$$\begin{aligned} &E_1(\iota_i = 1) = 0, && E_1(\iota_i = 0) = \infty, && i \in F \\ &E_1(\iota_i = 1) = \infty, && E_1(\iota_i = 0) = 0, && i \in B \\ &E_1(\iota_i = 1) = \frac{d_i^F}{d_i^F + d_i^B}, && E_1(\iota_i = 0) = \frac{d_i^B}{d_i^F + d_i^B}, && i \in U \end{aligned} \tag{6}$$
where the minimal distances from region i to the foreground and background are calculated as
$$d_i^F = \min_n \lVert C_i - K_n^F \rVert, \qquad d_i^B = \min_m \lVert C_i - K_m^B \rVert \tag{7}$$
and K_n^F and K_m^B are the foreground and background cluster centers, respectively. It should be noted that although Equations (3) and (6) are exactly opposite, they can obtain the same segmentation results.
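A minimal sketch of the likelihood energy for an unmarked region, assuming the standard Lazy Snapping form in which d_i^F and d_i^B are the minimum Euclidean distances from the region's mean color C_i to the foreground and background cluster centers, with E_1(ι_i = 1) = d_i^F/(d_i^F + d_i^B) and E_1(ι_i = 0) = d_i^B/(d_i^F + d_i^B). The cluster centers below are made-up stand-ins for K-means output (Lazy Snapping itself uses 64 clusters per model), and the function names are hypothetical.

```python
import math


def min_distance(color, centers):
    """Smallest Euclidean distance from an RGB color to any cluster center."""
    return min(math.dist(color, c) for c in centers)


def likelihood_energy(mean_color, fg_centers, bg_centers):
    """Return (E1(label=1), E1(label=0)) for an unmarked region with mean color C_i."""
    d_f = min_distance(mean_color, fg_centers)
    d_b = min_distance(mean_color, bg_centers)
    total = d_f + d_b
    if total == 0:  # degenerate case: the color coincides with both models
        return 0.5, 0.5
    return d_f / total, d_b / total


# Hypothetical cluster centers (RGB) standing in for K-means output.
fg_centers = [(200, 40, 40), (230, 90, 80)]  # reddish foreground
bg_centers = [(30, 120, 30), (60, 160, 60)]  # greenish background

e_fg, e_bg = likelihood_energy((210, 60, 50), fg_centers, bg_centers)
print(round(e_fg, 3), round(e_bg, 3))
```

A reddish region is close to the foreground centers, so labeling it foreground is cheap (small E_1(ι_i = 1)) and labeling it background is expensive. `math.dist` requires Python 3.8 or later.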

GrabCut Algorithm
In interactive image segmentation, to further reduce user interaction, Rother et al. proposed the efficient GrabCut segmentation algorithm [19]. The algorithm only requires users to draw a rectangle around the objects to be segmented: everything outside the rectangle is treated as background, while the area inside contains both foreground and background. The GrabCut algorithm then applies the maximum flow-minimum cut algorithm iteratively to extract the objects automatically. The procedure is as follows:
Step 1. Users mark the image by drawing a rectangle.
Step 2. According to the user's marking, the image is initially divided into two groups: pixels inside the rectangle are taken as the foreground (actually a mixture of foreground and background), and pixels outside the rectangle are taken as the background.
Step 3. A Gaussian mixture model (GMM) with K Gaussian components is created for the foreground and for the background, respectively.
Step 4. Each foreground pixel is assigned to the component of the foreground GMM with the largest probability density value, and likewise each background pixel is assigned to the component of the background GMM with the largest probability density value.
Step 5. The parameters of each Gaussian component of the foreground and background GMMs are updated.
Step 6. A graph is built with all image pixels as nodes, and the maximum flow-minimum cut algorithm is executed to obtain the optimal partition for the current iteration.
Step 7. Steps 4-6 are repeated until the energy obtained in Step 6 no longer changes.
Compared with the GraphCut and Lazy Snapping algorithms, the interaction process of the GrabCut algorithm is simple. Because all pixels outside the rectangle belong to the background and their labels are already fixed, the iterative process only needs to solve the energy minimization with the maximum flow-minimum cut algorithm for the pixels inside the rectangle.
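Steps 4 and 5 amount to a hard-assignment update of each GMM. The sketch below shows this update for scalar gray values, assuming 1-D Gaussian components described by (weight, mean, variance) tuples and assignment by the largest weighted density; GrabCut itself works on RGB colors with full covariance matrices, and the graph cut of Step 6 is omitted here. The function names and toy data are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def update_gmm(pixels, components, min_var=1e-3):
    """One pass of Steps 4-5 for a single GMM of (weight, mean, variance) tuples."""
    # Step 4: assign each pixel to the component with the largest weighted density.
    groups = [[] for _ in components]
    for x in pixels:
        scores = [w * gaussian_pdf(x, m, v) for w, m, v in components]
        groups[scores.index(max(scores))].append(x)

    # Step 5: re-estimate each component from the pixels assigned to it.
    updated = []
    for (w, m, v), members in zip(components, groups):
        if not members:  # an empty component keeps its previous parameters
            updated.append((w, m, v))
            continue
        mean = sum(members) / len(members)
        var = max(sum((x - mean) ** 2 for x in members) / len(members), min_var)
        updated.append((len(members) / len(pixels), mean, var))

    # Renormalize the mixture weights so that they sum to 1.
    total = sum(w for w, _, _ in updated)
    return [(w / total, m, v) for w, m, v in updated]


fg_pixels = [200, 205, 198, 90, 95, 92]              # hypothetical gray values
fg_gmm = [(0.5, 150.0, 400.0), (0.5, 100.0, 400.0)]  # K = 2 initial components
for _ in range(3):
    fg_gmm = update_gmm(fg_pixels, fg_gmm)
print([round(m) for _, m, _ in fg_gmm])
```

In the full algorithm, this update alternates with the graph cut of Step 6, whose new labels change which pixels feed the foreground and background GMMs in the next iteration.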

Region Merging Algorithm Based on Maximum Similarity (MSRM)
Ning et al. proposed a region merging algorithm based on maximum similarity (MSRM) [20]. This algorithm first pre-segments a color image with the MeanShift algorithm [25], then uses the Bhattacharyya coefficient to calculate the similarity between adjacent regions, and finally completes the segmentation automatically through iterative region merging. The MSRM algorithm quantizes the RGB color space into 16 × 16 × 16 = 4096 bins and calculates the normalized histogram H_i of each region. The similarity between two regions i and j is measured by the Bhattacharyya coefficient:
$$\rho(i,j) = \sum_{z=1}^{Z} \sqrt{H_i^z \cdot H_j^z} \tag{8}$$
where H_i^z is the z-th bin of H_i and Z = 4096. To complete the automatic image segmentation, Ning et al. defined the following region-merging rule. For any region R, let Q be an adjacent region of R, and let the set of all adjacent regions of Q be S̄_Q = {S_k^Q}, k = 1, 2, ..., q, so that R belongs to S̄_Q. The similarity between Q and each of its adjacent regions is calculated with Equation (8) and ranked from maximum to minimum. If
$$\rho(R, Q) = \max_{k=1,\dots,q} \rho(Q, S_k^Q)$$
holds, that is, the similarity between Q and R is the largest, the region Q is merged into the region R. The merging is implemented in two steps: the first merges unlabeled regions in set U with background-labeled regions in B, and the second performs automatic merging among the unlabeled regions in set U. The MSRM algorithm repeats these two steps until convergence, that is, until no new merging occurs.
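The MSRM similarity machinery can be sketched in a few lines: 8-bit RGB colors are quantized into 16 levels per channel (integer division by 16, giving 4096 bins), normalized histograms are compared with the Bhattacharyya coefficient ρ = Σ_z sqrt(H_i^z · H_j^z), and Q is merged into R when R is Q's most similar neighbor. The region contents, names and adjacency dictionary are hypothetical, and the two-step merging loop is left out.

```python
import math


def quantized_histogram(pixels, levels=16):
    """Normalized histogram over levels**3 RGB bins for 8-bit (r, g, b) pixels."""
    step = 256 // levels
    hist = [0.0] * (levels ** 3)
    for r, g, b in pixels:
        idx = (r // step) * levels * levels + (g // step) * levels + (b // step)
        hist[idx] += 1
    n = len(pixels)
    return [h / n for h in hist]


def bhattacharyya(h1, h2):
    """rho = sum_z sqrt(h1[z] * h2[z]); equals 1.0 for identical normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))


def should_merge(R, Q, neighbors, hists):
    """MSRM rule: merge Q into R if rho(R, Q) is the maximum over all neighbors of Q."""
    rho_rq = bhattacharyya(hists[R], hists[Q])
    best = max(bhattacharyya(hists[Q], hists[S]) for S in neighbors[Q])
    return rho_rq >= best  # R is itself a neighbor of Q, so ">=" means "attains the max"


hists = {
    "R": quantized_histogram([(200, 30, 30)] * 8 + [(210, 40, 35)] * 2),
    "Q": quantized_histogram([(200, 30, 30)] * 5 + [(100, 100, 100)] * 5),
    "S": quantized_histogram([(20, 150, 20)] * 10),
}
neighbors = {"Q": ["R", "S"]}
print(should_merge("R", "Q", neighbors, hists), should_merge("S", "Q", neighbors, hists))
```

A dense 4096-entry list keeps the code close to Equation (8); since regions usually occupy only a few bins, a sparse representation would be cheaper when the histograms must be recomputed after every merge.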

Motivation
In the Lazy Snapping algorithm [18], the image is pre-segmented with the WaterShed algorithm [21], and the resulting regions replace the pixels of the original GraphCut algorithm [13] as graph nodes, which effectively improves segmentation efficiency and performance. However, this algorithm still has the following problems: (1) Each region is modeled by its mean color, which is not sufficient to describe the region effectively. (2) The K-means algorithm is used to cluster the marked seed points. This clustering algorithm is very sensitive to noise, outliers, the number of clusters, the initial cluster centers, etc., so the clustering result is often unreliable. (3) The clustering process requires a large number of centers. In the Lazy Snapping algorithm, the number of clusters for both the foreground and the background is set to 64, which implies that users need to mark the foreground and background over at least 64 regions, leading to a heavy interaction workload. (4) Because the WaterShed algorithm relies on the gradient of the grayscale image and ignores the rich color information, serious over-segmentation occurs and the number of regions remains large. (5) To obtain better segmentation results, the Lazy Snapping algorithm uses a fine-tuning step during post-processing to further improve the accuracy, but as a consequence the whole interaction process becomes cumbersome.
Compared with the WaterShed algorithm, the MeanShift algorithm [25] performs better and produces far fewer regions. Figure 2 illustrates the segmentation results obtained with the WaterShed and MeanShift algorithms, respectively, on a Flower image of size 500 × 351 (175,500 pixels). The WaterShed result contains 10,072 regions, while the MeanShift result contains only 1101 regions. In the proposed algorithm, we therefore adopt the MeanShift algorithm for pre-segmentation, which provides a good initialization for the subsequent efficient execution of the maximum flow-minimum cut algorithm.
On the other hand, compared with the GraphCut and Lazy Snapping algorithms, the MSRM algorithm [20] offers a considerable improvement by incorporating an automatic merging mechanism. However, the MSRM algorithm also has the following main problems: (1) In the merging process, the algorithm only considers the similarity between adjacent regions, without considering the relationship between each region and the foreground/background information marked by users. In other words, the MSRM algorithm does not make full use of the user's interactive input, which slows down region merging and weakens the segmentation accuracy. (2) The color histogram of each region must be calculated before merging, and it must be recalculated many times during execution as regions are combined. In addition, during merging, the similarity between each region and its adjacent regions must be calculated according to Equation (8). Because the whole segmentation process usually requires a large number of region merges, the time and space overhead is very large. (3) For images with complex content, the execution efficiency of the MSRM algorithm degrades seriously as the image size and the number of regions grow. Ning et al. pointed out this problem in [20], so the application of this algorithm is limited to a certain extent. (4) Although the MeanShift algorithm has excellent segmentation performance, over-segmentation still exists in its results, and edge leakage occurs from time to time. However, the MSRM algorithm performs no post-processing to resolve this problem.
Compared with the GraphCut, Lazy Snapping and MSRM algorithms, the GrabCut algorithm [19] only requires users to set a rectangular area, which greatly simplifies the interaction process. However, this algorithm also has several major problems: (1) The image content inside the user-marked rectangle contains both foreground and background, while the content outside the rectangle contains only background. Although the algorithm builds a foreground model and a background model simultaneously, the accuracy of the background model directly affects the final segmentation results. (2) When an object is very large and occupies almost the whole image, or when there are many objects distributed over almost the entire image, the background area marked by users is usually small. Because of insufficient background information, the background model can hardly represent the background distribution of the whole image accurately. As a result, the algorithm cannot obtain ideal segmentation results even after many iterations, and the whole process is very time-consuming. (3) This algorithm models the foreground through iterative online learning, which is clearly less effective than modeling the foreground directly. However, this modeling method is dictated by the interaction mode of the GrabCut algorithm. (4) Since the number, size and position of the targets in the image directly determine where the rectangle must be placed, and users cannot change this during interaction, the interaction mode is inflexible. It may seem that the execution efficiency and segmentation performance of the GrabCut algorithm depend on the placement of the rectangle on the image. However, the above analysis shows that they are essentially determined by the inherent characteristics of the image itself.

Proposed Algorithm
In this paper, we present an efficient superpixel-guided interactive image-segmentation algorithm based on graph theory. The proposed algorithm works in two stages. In the first stage, we use the MeanShift algorithm to pre-segment the image into superpixels. The superpixels are then taken as nodes to construct a directed graph, and the maximum flow-minimum cut algorithm is applied to obtain a superpixel-level segmentation. In the second stage, to solve the boundary leakage problem caused by the MeanShift algorithm, the proposed algorithm creates a mask image, termed a Trimap, by applying morphological erosion and dilation to the first-stage segmentation result. The Trimap consists of three parts: TrimapUnknown, TrimapForeground and TrimapBackground. TrimapUnknown is a narrow band containing both foreground and background pixels; TrimapForeground contains only foreground, and TrimapBackground contains only background. We then build the foreground and background models from TrimapForeground and TrimapBackground, respectively, and create a pixel-level graph for the narrow band. Finally, the maximum flow-minimum cut algorithm is performed again to complete the pixel-level segmentation.
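The Trimap construction can be sketched with plain binary morphology. This sketch assumes a 3 × 3 square structuring element applied `band` times, treats pixels outside the image as background, and takes the narrow band to be the dilated mask minus the eroded mask; the band width, the codes 0/1/2 and the function names are assumptions rather than the paper's settings.

```python
def _morph(mask, iterations, combine):
    """Apply 3x3 binary erosion (combine=all) or dilation (combine=any) repeatedly."""
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        mask = [
            [
                int(combine(
                    0 <= y + dy < h and 0 <= x + dx < w and bool(mask[y + dy][x + dx])
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                ))
                for x in range(w)
            ]
            for y in range(h)
        ]
    return mask


def erode(mask, iterations=1):
    return _morph(mask, iterations, all)  # out-of-image pixels count as background


def dilate(mask, iterations=1):
    return _morph(mask, iterations, any)


BACKGROUND, FOREGROUND, UNKNOWN = 0, 1, 2  # hypothetical Trimap codes


def make_trimap(S, band=1):
    """S: first-stage binary segmentation (1 = object); band: erosion/dilation steps."""
    inner = erode(S, band)   # TrimapForeground: certainly object
    outer = dilate(S, band)  # everything outside this is TrimapBackground
    return [
        [FOREGROUND if f else (UNKNOWN if g else BACKGROUND) for f, g in zip(fr, gr)]
        for fr, gr in zip(inner, outer)
    ]


# Hypothetical 7 x 9 first-stage result with a 3 x 5 object in the middle.
S = [[1 if 2 <= y <= 4 and 2 <= x <= 6 else 0 for x in range(9)] for y in range(7)]
trimap = make_trimap(S)
for row in trimap:
    print("".join("BFU"[v] for v in row))  # B = background, F = foreground, U = band
```

Only the `U` pixels would become nodes of the second-stage pixel-level graph, which is what keeps that graph small.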
Because each region obtained by MeanShift pre-segmentation is relatively large, a color histogram represents a region more effectively than its mean color. Hence, as in the MSRM algorithm, the proposed algorithm represents each superpixel i by a color histogram H_i. The user-marked foreground F and background B are also modeled by color histograms, denoted H_F and H_B, respectively. We still use the Bhattacharyya coefficient ρ(i, j), defined in Equation (8), to measure the similarity between adjacent superpixels i and j. In addition to the similarity between superpixels, the similarity of each superpixel to the foreground and to the background is calculated as follows:
$$\rho(i,F) = \sum_{z=1}^{Z} \sqrt{H_i^z \cdot H_F^z}, \qquad \rho(i,B) = \sum_{z=1}^{Z} \sqrt{H_i^z \cdot H_B^z}$$
where Z is the number of feature dimensions (the number of bins of the color histogram).
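A minimal sketch of the superpixel-to-seed similarities ρ(i, F) and ρ(i, B): H_F and H_B are built from the user-marked pixels, and each superpixel histogram H_i is compared with both. It assumes the same 16-levels-per-channel RGB quantization as MSRM (Z = 4096), which the text does not state for the proposed method, and stores histograms sparsely as dictionaries because most bins are empty; the helper names and pixel lists are hypothetical.

```python
import math

LEVELS = 16            # assumed quantization: 16 levels per channel -> Z = 4096 bins
STEP = 256 // LEVELS


def histogram(pixels):
    """Sparse normalized RGB histogram {(r_bin, g_bin, b_bin): frequency}."""
    hist = {}
    for r, g, b in pixels:
        key = (r // STEP, g // STEP, b // STEP)
        hist[key] = hist.get(key, 0) + 1
    return {k: v / len(pixels) for k, v in hist.items()}


def rho(h1, h2):
    """Bhattacharyya coefficient; only bins present in both histograms contribute."""
    return sum(math.sqrt(p * h2[k]) for k, p in h1.items() if k in h2)


# Hypothetical user strokes and two superpixels.
H_F = histogram([(220, 30, 40)] * 6 + [(240, 60, 60)] * 4)  # red object strokes
H_B = histogram([(30, 140, 40)] * 10)                      # green background strokes
superpixels = {
    "sp1": histogram([(220, 30, 40)] * 9 + [(30, 140, 40)]),      # mostly red
    "sp2": histogram([(30, 140, 40)] * 8 + [(240, 60, 60)] * 2),  # mostly green
}
for name, H_i in superpixels.items():
    print(name, round(rho(H_i, H_F), 3), round(rho(H_i, H_B), 3))
```

Each superpixel scores higher against the seed model it resembles, which is the information the regional term draws on.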
The above equations show that we consider not only the similarity between adjacent superpixels but also the relationship between each superpixel and the user-marked foreground and background. We therefore define an energy function with two parts. The first part is the regional term, which measures the similarity between each superpixel and the user-marked foreground and background. The second part is the edge term, which measures the similarity between adjacent superpixels. The regional term R_i(ι_i) of superpixel i is defined from ρ(i, F) and ρ(i, B) for the foreground label (ι_i = 1) and the background label (ι_i = 0), and the edge term of two adjacent superpixels i and j is
$$B_{\{i,j\}} = \rho(i,j), \qquad \{i,j\} \in N$$
where N represents the neighborhood system (the pairs of adjacent superpixels), and λ is the parameter that balances the regional and edge terms in the energy function.
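Building the superpixel-level graph requires the neighborhood set N and the edge weights B_{i,j} = ρ(i, j). The sketch below assumes that the MeanShift output is available as a 2-D label map and that two superpixels are adjacent when any of their pixels are 4-neighbors; the label map, the 4-bin histograms and the function names are hypothetical. The source and sink links derived from the regional term are omitted.

```python
import math


def adjacent_pairs(labels):
    """Neighborhood set N: unordered pairs of superpixel labels that touch (4-neighborhood)."""
    pairs = set()
    h, w = len(labels), len(labels[0])
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and down cover every 4-neighbor pair once
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                    pairs.add(tuple(sorted((labels[y][x], labels[ny][nx]))))
    return pairs


def rho(h1, h2):
    """Bhattacharyya coefficient of two normalized histograms of equal length."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))


def edge_weights(labels, hists):
    """Map each adjacent pair {i, j} in N to B_{i,j} = rho(i, j)."""
    return {(i, j): rho(hists[i], hists[j]) for i, j in adjacent_pairs(labels)}


# Hypothetical label map with three superpixels and their 4-bin histograms.
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 1],
]
hists = {0: [0.5, 0.5, 0.0, 0.0], 1: [0.5, 0.5, 0.0, 0.0], 2: [0.0, 0.0, 1.0, 0.0]}
for pair, weight in sorted(edge_weights(labels, hists).items()):
    print(pair, round(weight, 3))
```

Similar neighbors (superpixels 0 and 1) get a large weight, so cutting between them is expensive; dissimilar neighbors are cheap to separate.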

Proposed Algorithm
In this paper, we present an efficient superpixel-guided interactive image-segmentation algorithm based on graph theory.The proposed algorithm is completed in two stages.In the first stage, we first use the MeanShift algorithm to pre-segment the image to extract superpixels.The superpixels are then taken as nodes to construct a directed graph.Finally, we applied the maximum flow-minimum cut algorithm to complete superpixel-level based graph segmentation.In the second stage, in order to solve the boundary leakage problem caused by the MeanShift algorithm, the proposed algorithm creates a mask image, which is termed Trimap, using morphological erosion and expansion operations based on the first-stage segmentation results.This Trimap is composed of three parts: TrimapUnknown, TrimapForeground and TrimapBackground, respectively.The TrimapUnknown is a narrow band area containing both the background and foreground information.The TrimapForeground only contains foreground information, while the TrimapBackground only contains background information.Then we establish the corresponding foreground and background models for TrimapForeground and TrimapBackground respectively, and create a pixel-level graph model for the narrow band area.Finally the maximum flowminimum cut algorithm is performed again to complete the pixel-level based image segmentation.
Because of the relatively large area of each region obtained in the pre-segmentation step using MeanShift, it is more effective to use the color histogram to represent each region than the color mean.Hence, as with the MSRM algorithm, the proposed algorithm also uses a color histogram to represent each superpixel, denoted as i H for superpixel i .For user-marked foreground F and background B , a color histogram is also used to model them, which is denoted as F H and B H respectively.We still use the Bhattacharyya coefficient ( , ) i j ρ to measure the similarity between adjacent superpixels i and j , which is defined as Equation ( 8).In addition to measuring the similarity between superpixels, the similarity of each superpixel with respect to the foreground and the background is calculated as follows, where Z is the number of feature dimensions (size of the color histogram).
The above equations show that we not only consider the similarity between superpixels, but also consider the relationship between each superpixel and the interaction with the foreground and background, thus we define the following energy function, where the first part is the regional term that measures the similarity between each superpixel and the interaction information with the foreground and background.The second part is the edge information that measures the similarity between adjacent superpixels.The definition of the regional term and the edge term is denoted as follows, where, , N represents pixels neighborhood, and λ is a balance control parameter.

Proposed Algorithm
In this paper, we present an efficient superpixel-guided interactive image-segmentatio algorithm based on graph theory.The proposed algorithm is completed in two stages.In the firs stage, we first use the MeanShift algorithm to pre-segment the image to extract superpixels.Th superpixels are then taken as nodes to construct a directed graph.Finally, we applied the maximum flow-minimum cut algorithm to complete superpixel-level based graph segmentation.In th second stage, in order to solve the boundary leakage problem caused by the MeanShift algorithm the proposed algorithm creates a mask image, which is termed Trimap, using morphologica erosion and expansion operations based on the first-stage segmentation results.This Trimap i composed of three parts: TrimapUnknown, TrimapForeground and TrimapBackground respectively.The TrimapUnknown is a narrow band area containing both the background and foreground information.The TrimapForeground only contains foreground information, while th TrimapBackground only contains background information.Then we establish the correspondin foreground and background models for TrimapForeground and TrimapBackground respectively and create a pixel-level graph model for the narrow band area.Finally the maximum flow minimum cut algorithm is performed again to complete the pixel-level based image segmentation.
Because of the relatively large area of each region obtained in the pre-segmentation step usin MeanShift, it is more effective to use the color histogram to represent each region than the colo mean.Hence, as with the MSRM algorithm, the proposed algorithm also uses a color ρ to measure the similarity between adjacent superpixels i and j , which is defined as Equation ( 8).In addition to measuring th similarity between superpixels, the similarity of each superpixel with respect to the foreground and the background is calculated as follows, where Z is the number of feature dimensions (size of the color histogram).The above equations show that we not only consider the similarity between superpixels, bu also consider the relationship between each superpixel and the interaction with the foreground and background, thus we define the following energy function, where the first part is the regional term that measures the similarity between each superpixel and th interaction information with the foreground and background.The second part is the edg information that measures the similarity between adjacent superpixels.The definition of the regiona term and the edge term is denoted as follows, ( 1) ( , ) ( 0) ( , ) where, represents pixels neighborhood, and λ is a balance contro parameter.
Symmetry 2018, 10, 169 9 of 21
Figure 3 illustrates the first-stage segmentation result of the proposed method on a rose image. As shown in Figure 3a, the object edge in this result is not smooth, and some pixels are misclassified because of the over-segmentation of the MeanShift algorithm. Therefore, starting from the first-stage result, we segment the boundary region again. The corresponding binary image is denoted $S$ and is shown in Figure 3b. We first erode $S$ to obtain the region $S_F = S \ominus b$. This region contains only the object and is shown in white (TrimapForeground) in Figure 3c. We then dilate $S$ and take the difference between the dilation and erosion results as the narrow band $S_U = (S \oplus b) - (S \ominus b)$. The narrow band contains both foreground and background and is shown in gray (TrimapUnknown) in Figure 3c. The remaining area contains only background information. It can be expressed as $S_B = \Omega - (S_U \cup S_F)$, where $\Omega$ denotes the whole image domain, and is shown in black (TrimapBackground) in Figure 3c. Here, $b$ is a square structuring element of size $(2d+1)\times(2d+1)$, where $d$ is a positive integer. This yields the ternary mask image (Trimap) shown in Figure 3c. We then use the Orchard-Bouman clustering algorithm [27] to build a Gaussian mixture model (GMM) for the foreground and another for the background. Each GMM has $K$ Gaussian components. Finally, we take the pixels in the narrow band as nodes to build a second graph and apply the maximum flow-minimum cut algorithm to obtain the final segmentation. This step overcomes the over-segmentation caused by the MeanShift algorithm and further improves the accuracy in the narrow band.
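The Trimap construction can be sketched with plain binary morphology. This minimal sketch assumes a binary mask stored as nested lists of 0/1 values and a $(2d+1)\times(2d+1)$ square structuring element. Pixels outside the image are treated as background, which is one common border convention rather than something the paper specifies. The label values 255/128/0 (white, gray and black, as in Figure 3c) and all function names are hypothetical.

```python
FOREGROUND, UNKNOWN, BACKGROUND = 255, 128, 0  # white / gray / black, as in Figure 3c


def _window(mask, y, x, d):
    """Values of mask in the (2d+1)x(2d+1) square around (y, x); outside pixels count as 0."""
    h, w = len(mask), len(mask[0])
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            yy, xx = y + dy, x + dx
            yield mask[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0


def erode(mask, d):
    # S ⊖ b: keep a pixel only if the whole square around it is foreground
    return [[int(all(_window(mask, y, x, d))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask, d):
    # S ⊕ b: set a pixel if any pixel in the square around it is foreground
    return [[int(any(_window(mask, y, x, d))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def make_trimap(S, d):
    """S_F = S ⊖ b, S_U = (S ⊕ b) - (S ⊖ b), S_B = everything else."""
    eroded, dilated = erode(S, d), dilate(S, d)
    return [[FOREGROUND if e else UNKNOWN if g else BACKGROUND
             for e, g in zip(row_e, row_g)]
            for row_e, row_g in zip(eroded, dilated)]


# Usage: a 3x3 object in a 7x7 image with d = 1 (a 3x3 structuring element).
S = [[1 if 2 <= y <= 4 and 2 <= x <= 4 else 0 for x in range(7)] for y in range(7)]
trimap = make_trimap(S, d=1)  # 1 foreground pixel, a 24-pixel gray band, rest background
```

Because erosion shrinks and dilation grows the mask by the same $d$, the gray band is roughly $2d+1$ pixels wide around the first-stage contour. $d$ therefore directly controls how many pixels enter the second, pixel-level graph.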
The whole process of the second stage is similar to the GrabCut algorithm. The difference is that, in the proposed algorithm, the foreground region is completely determined rather than being an unknown region containing both foreground and background information. Therefore, the color models built from the first-stage result do not need to be updated. Moreover, the maximum flow-minimum cut algorithm needs to be executed only once to obtain the desired segmentation. The narrow band determined from the first stage is very small, so the determined foreground and background regions contain almost all of the foreground and background information, respectively. As a result, the foreground and background models established by the proposed algorithm are accurate. In the second stage, we use the same energy function as Equation (10) in the first stage, but the two energy terms are redefined as follows:

where $I_i$ represents the color information of pixel $i$, and $B(\iota_i, \iota_j)$ is the edge term between neighboring pixels $i$ and $j$. In building the graph model we use an 8-neighborhood. Since $\lambda \geq B(\iota_i, \iota_j)$ is satisfied, we set $K = 8\lambda + 1$, which makes $K$ larger than the total weight of the eight edges incident to any pixel.
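Both stages minimize an energy of the form of Equation (10) with a maximum flow-minimum cut. The sketch below shows one standard way to map such an energy onto an s-t graph, including hard constraints with the constant $K = 8\lambda + 1$. It is a minimal sketch under stated assumptions, not the authors' code:

- Regional terms are treated as costs: $R_i(1)$ is paid when node $i$ is labeled foreground, and $R_i(0)$ when it is labeled background.
- Each neighboring pair carries a symmetric weight $w_{ij}$ that is paid when the pair is separated.
- Edmonds-Karp is used purely for brevity; the paper does not say which max-flow solver it uses.

The function names are hypothetical.

```python
from collections import deque


def hard_constraint_K(lam, n_neighbors=8):
    # K = 8*lambda + 1 for an 8-neighborhood, as in the text
    return n_neighbors * lam + 1


def min_cut_labels(regional, edges):
    """Minimize sum_i R_i(L_i) + sum_ij w_ij [L_i != L_j] with an s-t minimum cut.

    regional: list of (R_i(1), R_i(0)) cost pairs, one per node.
    edges: dict {(i, j): w_ij} of symmetric neighbor weights.
    Returns (cut value, labels) with label 1 = foreground (source side).
    """
    n = len(regional)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]

    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse residual edge

    for i, (cost_fg, cost_bg) in enumerate(regional):
        add(s, i, cost_bg)  # cut when i ends on the sink (background) side
        add(i, t, cost_fg)  # cut when i stays on the source (foreground) side
    for (i, j), w in edges.items():
        add(i, j, w)
        add(j, i, w)

    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path (Edmonds-Karp).
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no augmenting path: parent now holds the source side of the cut
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
    labels = [1 if i in parent else 0 for i in range(n)]
    return flow, labels


# Usage: a 4-node chain; node 0 is marked foreground and node 3 background.
K = hard_constraint_K(lam=1.0)
regional = [(0.0, K), (0.0, 0.0), (0.0, 0.0), (K, 0.0)]
edges = {(0, 1): 0.9, (1, 2): 0.1, (2, 3): 0.9}
cut, labels = min_cut_labels(regional, edges)  # cuts the weak 1-2 link
```

The hard constraints work because a marked node's opposite t-link costs $K$. With $K$ larger than any possible sum of its edge weights, the minimum cut never pays it.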
In the redefined terms, $D_x(i)$ ($x \in \{F, B\}$) is the distribution of the $i$th pixel under the foreground or background Gaussian mixture model, calculated as
$$D_x(i)=\sum_{k=1}^{K}\pi_k^x\,\frac{1}{(2\pi)^{3/2}\sqrt{\det\Sigma_k^x}}\exp\!\left(-\frac{1}{2}\left(I_i-\mu_k^x\right)^{\top}\left(\Sigma_k^x\right)^{-1}\left(I_i-\mu_k^x\right)\right)$$
where $\pi_k^x$ denotes the weight of the $k$th Gaussian in the foreground or background GMM, $\mu_k^x$ and $\Sigma_k^x$ denote the mean and covariance matrix of the $k$th Gaussian, and $\det\Sigma_k^x$ denotes the determinant of $\Sigma_k^x$. Here $K$ is the number of Gaussian components, which is distinct from the hard-constraint constant $K = 8\lambda + 1$.
From the above description, the proposed algorithm requires only a small amount of interaction and can obtain accurate segmentation results in a relatively short time. As shown in Figure 2, users must input interaction information in Figure 2a before the GraphCut algorithm is executed, and each pixel must be decided individually. In Figure 2b, the segmentation is region based, but it is still difficult for users to decide which regions to mark before running the Lazy Snapping algorithm. In Figure 2c, by contrast, the large size and small number of regions guide the user and simplify the interaction process. In addition, compared with the MSRM algorithm, the proposed algorithm takes into account not only the similarity between regions but also the similarity between each region and the foreground/background information. Furthermore, the proposed algorithm determines the foreground and background regions accurately after the first-stage segmentation. Compared with the GrabCut algorithm, the proposed algorithm builds a more accurate background model. Most importantly, it also builds a foreground model at the same time. Some images contain only one large object, or many objects distributed over almost the entire image. For such images, the GrabCut algorithm has difficulty obtaining good results even if the maximum flow-minimum cut algorithm is applied more times.
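In the second stage, each narrow-band pixel is scored with the mixture likelihood $D_x(i)$. The following sketch evaluates it for 3-channel colors with full covariance matrices. It is a minimal sketch under stated assumptions, not the authors' code:

- It uses the standard multivariate normal normalization $(2\pi)^{3/2}$.
- It takes already-fitted parameters as input; the Orchard-Bouman clustering step that produces them is not shown.
- It does not show how $D_x(i)$ is converted into an edge capacity.

The function names are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate: inv = adj(m) / det(m)."""
    d = det3(m)
    # Cofactor C[r][c]; the cyclic index form yields the correct signs for 3x3.
    cof = [[m[(r + 1) % 3][(c + 1) % 3] * m[(r + 2) % 3][(c + 2) % 3]
            - m[(r + 1) % 3][(c + 2) % 3] * m[(r + 2) % 3][(c + 1) % 3]
            for c in range(3)] for r in range(3)]
    return [[cof[c][r] / d for c in range(3)] for r in range(3)]  # transpose


def gmm_density(color, weights, means, covs):
    """D_x(i) = sum_k pi_k * N(I_i; mu_k, Sigma_k) for a 3-channel color I_i."""
    total = 0.0
    for pi_k, mu, sigma in zip(weights, means, covs):
        diff = [color[a] - mu[a] for a in range(3)]
        inv = inv3(sigma)
        # Mahalanobis term (I_i - mu_k)^T Sigma_k^{-1} (I_i - mu_k)
        maha = sum(diff[a] * inv[a][b] * diff[b] for a in range(3) for b in range(3))
        norm = 1.0 / ((2 * math.pi) ** 1.5 * math.sqrt(det3(sigma)))
        total += pi_k * norm * math.exp(-0.5 * maha)
    return total


# Usage: a two-component model (weights, means and covariances are made-up values).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
D_F = gmm_density([0.0, 0.0, 0.0], [0.5, 0.5],
                  [[0.0, 0.0, 0.0], [100.0, 100.0, 100.0]], [identity, identity])
```

The far component contributes essentially nothing here, so `D_F` is half the peak density of a standard 3-D Gaussian.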

Experimental Results
We performed experiments on several public datasets. Figure 4 shows three test images, and Table 1 lists their basic information: the image size, the total number of pixels, and the numbers of superpixels produced by the WaterShed [21] and MeanShift [25] algorithms. As the table shows, pre-segmentation greatly reduces the number of elements to process compared with the number of pixels, and the MeanShift algorithm produces far fewer superpixels than the WaterShed algorithm. The parameters of our method were set as follows:
- size of the color histogram: Z = 4096
- balance control parameter in the graph model: λ = 1.0
- size of the square structuring element in the morphology process: b = 2
- number of Gaussians in the foreground and background GMMs: K = 5

All parameter values were chosen through exhaustive experiments.
Figure 5 shows the interaction information marked by users for the flower image. For fairness, the GraphCut [13], Lazy Snapping [18] and MSRM [20] algorithms and the proposed algorithm use the same interaction information, shown in Figure 5a. Red strokes mark the foreground and blue strokes mark the background. The GrabCut algorithm [19] instead takes its user interaction as a rectangle, so for it we only mark a red box, as shown in Figure 5b. All pixels outside the box are treated as background. The pixels inside it contain both foreground and background and are segmented iteratively by the maximum flow-minimum cut algorithm. Figure 6 shows the segmentation results of each algorithm: the first row shows the segmented images and the second row the corresponding binary images. The GraphCut and Lazy Snapping algorithms produce many misclassifications. The GrabCut algorithm handles boundaries well, but the corresponding binary image shows some misclassifications inside the object. The MSRM result is good inside the foreground and background, but over-segmentation around the object edge is severe. This is similar to the first-stage result of the proposed algorithm shown in Figure 3a. After the second, pixel-level segmentation of the narrow band, the proposed algorithm effectively solves the edge leakage problem and produces a good result, as shown in Figure 6e.
Figure 7 shows the interaction information marked by users for the person image. The WaterShed algorithm still produces many small regions, so it provides no guidance to users during interaction. Therefore, in this experiment the GraphCut and Lazy Snapping algorithms use the same interaction information. The MSRM algorithm and our algorithm both use MeanShift pre-segmentation, so these two algorithms also share the same interaction information. As shown in Figure 7c, only a few marks are needed to determine a large amount of foreground and background information. The segmentation results are shown in Figure 8. Because few background areas are marked, as shown in Figure 7a, the GraphCut and Lazy Snapping algorithms produce many misclassifications. The main reason is that users find it hard to make accurate marks without guidance, and this problem becomes more prominent when the image content is complex. As with the flower image, the result of the GrabCut algorithm is acceptable. The background outside the rectangle fully represents the background distribution of the image, so the background model is relatively accurate. However, an accurate background model alone is not enough, and some misclassifications remain inside the object. Furthermore, MeanShift suffers from edge leakage during segmentation. As a result, the MSRM result still shows serious misclassification at the object edge, as shown in Figure 8d. In contrast, the proposed algorithm is considerably more accurate than the others, especially in the object boundary areas, as shown in Figure 8e.
In addition, Figure 9 shows the user interaction for the animal image, and the segmentation results are shown in Figure 10. The results of the GraphCut and Lazy Snapping algorithms are imperfect even when many markers are used. For example, it is difficult for users to mark the narrow tail region, which leads to false segmentation there. For the GrabCut algorithm, the two objects (dogs) are spread across the image. The background information outside the rectangle therefore cannot represent the background distribution effectively, and the rectangle contains a large amount of background. Thus, it is difficult for the GrabCut algorithm to learn an accurate background model. The MSRM algorithm uses the same initial conditions as the proposed algorithm (Figure 9c), yet its segmentation result contains many errors. The main reason is that the MSRM algorithm considers only the similarity among regions. It does not fully consider the relationship between regions and the interaction information marked by users. The GraphCut, Lazy Snapping, GrabCut and MSRM algorithms can improve segmentation accuracy if more markers are added, but this makes the interaction more complicated. The proposed method, by contrast, achieves promising results with the same or fewer marks, which makes it more effective in practical applications.
Table 2 compares the running time of the five algorithms.As can be seen from the table, the proposed algorithm has almost the same efficiency in terms of running time with the GraphCut and Lazy Snapping algorithms.Although the segmentation performance of the GrabCut algorithm is relatively good, it is not efficient to use in real-time applications.The execution efficiency of the MSRM algorithm is mainly determined by the number of partitioned regions, which is also not robust for practical implementation.For our proposed algorithm, the high efficiency of the maximum flow-minimum cut algorithm and its property of being less sensitive to the number of regions makes it efficient and useful for real-time segmentation.In addition, Figure 9 shows the user interaction for the animal image and the segmentation results are shown in Figure 10.From the segmentation results, we can find that the segmentation results of the GraphCut and Lazy Snapping algorithms are not perfect even when a lot of marker information is used.For example, it is difficult for users to mark the narrow tail region, which leads to false segmentation of this region.For the GrabCut algorithm, because the two targets (dogs) are scattered in the image, the marked background information is not enough to represent the background distribution effectively, and a large amount of background information is contained in the rectangular area.Thus, it is difficult for the GrabCut algorithm to learn the background model accurately.Although the MSRM algorithm has the same initial conditions with the proposed algorithm, it can be seen from Figure 9c that the segmentation results by using the MSRM algorithm have a large number of errors.The main reason is that the MSRM algorithm only considers the similarity among regions and does not fully consider the relationship between regions and the interaction information marked by users.Although the GraphCut, Lazy Snapping, GrabCut and MSRM algorithms can improve segmentation 
accuracy by adding more markers, the interaction process becomes more complicated. However, the proposed method achieves very promising segmentation results with the same or even less user interaction, which makes it more effective in practical applications.

Table 2 compares the running time of the five algorithms. As can be seen from the table, the proposed algorithm has almost the same running time as the GraphCut and Lazy Snapping algorithms. Although the segmentation performance of the GrabCut algorithm is relatively good, it is not efficient enough for real-time applications. The execution efficiency of the MSRM algorithm is mainly determined by the number of partitioned regions, which also makes it less robust in practical implementations. For our proposed algorithm, the high efficiency of the maximum flow-minimum cut algorithm and the algorithm's low sensitivity to the number of regions make it efficient and useful for real-time segmentation.

In order to further compare our proposed algorithm with the other four algorithms, we perform a detailed experiment on the Microsoft GrabCut image dataset [19], which is composed of 30 images provided with ground truth. Some test images and the corresponding ground truth are shown in Figure 11. In this experiment, we evaluated the segmentation performance using the misclassification error (ME) [5], the Rand index (RI) [28] and the boundary recall (BR) [26]. ME is defined as follows:

$$ME = 1 - \frac{|B_G \cap B_S| + |F_G \cap F_S|}{|B_G| + |F_G|},$$

where $B_G$ and $F_G$ denote the background and foreground pixels of the ground truth ($G$), $B_S$ and $F_S$ denote the background and foreground pixels of the segmentation result ($S$), and $|\cdot|$ is the cardinality of a set. ME measures the percentage of wrongly assigned pixels; it ranges from zero for no error to one for a completely wrong segmentation. In the definition of RI, $N$ denotes the total number of pixels and $I(\cdot)$ is a binary indicator function that equals one when its argument holds and zero otherwise.

The segmentation results are illustrated in Figure 12 and Tables 3-5. Similarly, the GraphCut, Lazy Snapping and MSRM algorithms and the proposed algorithm use the same interaction information, shown in Figure 12a, where the red strokes mark the foreground and the blue strokes mark the background. The green rectangle, also shown in Figure 12a, is used for the GrabCut algorithm. The results show that the proposed algorithm obtains the best overall segmentation performance in terms of ME, RI and BR.
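A minimal sketch of the ME formula, assuming both masks are flattened to equal-length sequences with 1 for foreground and 0 for background (the function name is illustrative, not from the paper). The four sets mirror $B_G$, $F_G$, $B_S$ and $F_S$ term by term.

```python
def misclassification_error(gt, seg):
    """ME between a ground-truth mask and a segmentation mask.

    gt, seg: equal-length sequences of pixel labels (1 = foreground, 0 = background),
    e.g. flattened binary masks.
    """
    if len(gt) != len(seg) or not gt:
        raise ValueError("masks must be non-empty and the same size")
    pixels = range(len(gt))
    b_g = {i for i in pixels if gt[i] == 0}   # B_G: ground-truth background
    f_g = {i for i in pixels if gt[i] == 1}   # F_G: ground-truth foreground
    b_s = {i for i in pixels if seg[i] == 0}  # B_S: segmented background
    f_s = {i for i in pixels if seg[i] == 1}  # F_S: segmented foreground
    # 1 - (correctly labelled background + foreground) / (all ground-truth pixels)
    return 1.0 - (len(b_g & b_s) + len(f_g & f_s)) / (len(b_g) + len(f_g))


if __name__ == "__main__":
    gt = [1, 1, 1, 0, 0, 0, 0, 0]
    seg = [1, 1, 0, 0, 0, 0, 1, 0]
    print(misclassification_error(gt, seg))  # 2 of 8 pixels wrong -> 0.25
```

Since every pixel is either background or foreground, the denominator $|B_G| + |F_G|$ is simply the image size, so ME is the fraction of mislabelled pixels.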
illustrates the segmentation results obtained by the WaterShed and MeanShift algorithms, respectively, on a Flower image of size 500 × 351 (175,500 pixels). The WaterShed segmentation result contains 10,072 regions, while the MeanShift segmentation result contains only 1101 regions. In the proposed algorithm, we adopt the MeanShift algorithm as the pre-segmentation step to provide a good initialization for the subsequent, highly efficient maximum flow-minimum cut stage.
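Because every pre-segmented region becomes a graph node, the region count directly sets the size of the superpixel-level graph. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes image colors are already quantized into a small number of bins, uses 4-connectivity to decide which regions are adjacent, and weights each edge by the Bhattacharyya coefficient of the two regions' normalized color histograms, the similarity measure named in the abstract. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def region_histograms(labels, colors, num_bins):
    """Normalized color histogram H_i for every region (superpixel) i.

    labels: 2-D list of region ids, e.g. a MeanShift label map.
    colors: 2-D list of quantized color-bin indices in [0, num_bins).
    """
    counts = defaultdict(Counter)
    for label_row, color_row in zip(labels, colors):
        for region, bin_index in zip(label_row, color_row):
            counts[region][bin_index] += 1
    hists = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        hists[region] = [counter[b] / total for b in range(num_bins)]
    return hists


def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient of two normalized histograms (1.0 = identical)."""
    return sum(sqrt(p * q) for p, q in zip(h1, h2))


def region_adjacency_graph(labels, hists):
    """Map each pair of 4-connected neighbouring regions to their similarity."""
    rows, cols = len(labels), len(labels[0])
    edges = {}
    for r in range(rows):
        for c in range(cols):
            for rr, cc in ((r, c + 1), (r + 1, c)):  # right and lower neighbour
                if rr < rows and cc < cols and labels[r][c] != labels[rr][cc]:
                    key = tuple(sorted((labels[r][c], labels[rr][cc])))
                    if key not in edges:
                        edges[key] = bhattacharyya(hists[key[0]], hists[key[1]])
    return edges


if __name__ == "__main__":
    labels = [[0, 0, 1],
              [0, 2, 1],
              [2, 2, 1]]
    colors = [[0, 0, 1],
              [0, 0, 1],
              [0, 0, 1]]
    hists = region_histograms(labels, colors, num_bins=2)
    graph = region_adjacency_graph(labels, hists)
    print(len(hists), "nodes,", len(graph), "edges")  # 3 nodes, 3 edges
    print(graph)
```

On the toy 3 × 3 map, regions 0 and 2 have identical color distributions (coefficient 1.0), while region 1 shares no bins with either (0.0). How the coefficient is converted into an edge capacity for the max-flow step is not specified in this section, so the sketch stops at the similarity weights.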

$H_i$ for superpixel $i$. For the user-marked foreground $F$ and background $B$, a color histogram is also used to model each of them, which is denoted as

Figure 3. First-stage segmentation result and ternary mask images. (a) Segmentation result in the first stage; (b) binary result; (c) trimap.
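The trimap in Figure 3c marks the narrow band along the object contour that is re-segmented at pixel level; the abstract states that this band is built with a morphology operator. A minimal sketch, assuming a square structuring element of hypothetical radius `r` and arbitrary trimap codes 0/128/255 (neither is specified in this section): pixels that survive erosion of the first-stage binary mask stay foreground, pixels outside its dilation stay background, and the band in between is marked unknown.

```python
# Hypothetical trimap codes (not taken from the paper).
BACKGROUND, UNKNOWN, FOREGROUND = 0, 128, 255


def dilate(mask, r):
    """Binary dilation of a 0/1 mask with a (2r+1) x (2r+1) square element.

    Windows are clipped at the image border.
    """
    rows, cols = len(mask), len(mask[0])
    return [
        [
            int(any(
                mask[yy][xx]
                for yy in range(max(0, y - r), min(rows, y + r + 1))
                for xx in range(max(0, x - r), min(cols, x + r + 1))
            ))
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def erode(mask, r):
    """Binary erosion via duality: complement of the dilated complement."""
    inverted = [[1 - v for v in row] for row in mask]
    return [[1 - v for v in row] for row in dilate(inverted, r)]


def trimap(mask, r):
    """Eroded pixels -> foreground, pixels outside the dilation -> background,
    and the narrow band in between -> unknown (to be re-segmented per pixel)."""
    inner, outer = erode(mask, r), dilate(mask, r)
    return [
        [
            FOREGROUND if i else (UNKNOWN if o else BACKGROUND)
            for i, o in zip(inner_row, outer_row)
        ]
        for inner_row, outer_row in zip(inner, outer)
    ]


if __name__ == "__main__":
    first_stage = [[0, 0, 0, 1, 1, 1, 1]]
    print(trimap(first_stage, r=1))  # [[0, 0, 128, 128, 255, 255, 255]]
```

With `r = 1` the unknown band in this one-row example is two pixels wide. A larger `r` tolerates larger first-stage boundary errors but adds more pixel nodes to the second, pixel-level graph.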

Figure 5. User interaction for the flower image. (a) Interaction of the GraphCut, Lazy Snapping, MSRM and proposed algorithms; (b) interaction of GrabCut.

Figure 7. User interaction for the person image. (a) Interaction of GraphCut and Lazy Snapping; (b) interaction of GrabCut; (c) interaction of MSRM and the proposed algorithm.

Symmetry 2018, 10, x FOR PEER REVIEW 13 of 22

Figure 9. User interaction for the animal image. (a) Interaction of GraphCut and Lazy Snapping; (b) interaction of GrabCut; (c) interaction of MSRM and the proposed algorithm.

Figure 10.

Figure 11. Partial test images and ground truth of the GrabCut image dataset.

RI computes the fraction of pixel pairs whose label relationship (same label or different labels) is identical in the segmentation result ($S$) and the ground truth ($G$). RI is defined as follows:

$$RI = \frac{1}{\binom{N}{2}} \sum_{i<j} \left[ I\left(l_i^S = l_j^S\right) I\left(l_i^G = l_j^G\right) + I\left(l_i^S \neq l_j^S\right) I\left(l_i^G \neq l_j^G\right) \right],$$

where $l_i^S$ and $l_i^G$ are the labels of pixel $i$ in $S$ and $G$, respectively. RI takes values in the range [0, 1], where a score of zero indicates that the test segmentation disagrees with the ground truth on every pixel pair and one indicates that they agree on every pixel pair. BR measures the percentage of the ground truth boundaries recovered by the segmentation boundaries and is defined in terms of $S_b$ and $G_b$, which denote the union sets of segmentation boundaries and ground truth boundaries, respectively.
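The RI definition can be checked directly in code. A minimal sketch in Python, assuming label maps flattened into equal-length sequences (all function names are hypothetical): `rand_index_naive` follows the pairwise sum literally over all $\binom{N}{2}$ pairs, while `rand_index` obtains the same value from label co-occurrence counts in linear time, which matters for images with around $10^5$ pixels. The BR equation is not reproduced in this text, so `boundary_recall` assumes the simplest exact-overlap form $|S_b \cap G_b| / |G_b|$; common benchmark variants additionally allow a small pixel tolerance.

```python
from collections import Counter
from itertools import combinations


def rand_index_naive(seg, gt):
    """RI straight from the definition: agreement averaged over all pairs i < j."""
    n = len(seg)
    agree = 0
    for i, j in combinations(range(n), 2):
        # Agreement: both 'same label' or both 'different labels'.
        agree += (seg[i] == seg[j]) == (gt[i] == gt[j])
    return agree / (n * (n - 1) // 2)


def pairs_of(k):
    """Number of unordered pairs among k items, i.e. k choose 2."""
    return k * (k - 1) // 2


def rand_index(seg, gt):
    """Same value via counts: agreements = C(N,2) + 2*joint - same_S - same_G."""
    n = len(seg)
    if n < 2 or len(gt) != n:
        raise ValueError("need two equal-length label sequences with >= 2 pixels")
    total = pairs_of(n)
    joint = sum(pairs_of(c) for c in Counter(zip(seg, gt)).values())  # same in S and G
    same_s = sum(pairs_of(c) for c in Counter(seg).values())          # same in S
    same_g = sum(pairs_of(c) for c in Counter(gt).values())           # same in G
    return (total + 2 * joint - same_s - same_g) / total


def boundary_recall(seg_boundary, gt_boundary):
    """Assumed exact-overlap BR: share of ground-truth boundary pixels hit by S_b.

    Both arguments are sets of (row, col) coordinates; no distance tolerance.
    """
    if not gt_boundary:
        raise ValueError("ground-truth boundary set is empty")
    return len(seg_boundary & gt_boundary) / len(gt_boundary)


if __name__ == "__main__":
    seg = [0, 1, 1, 0, 2, 2, 1, 0]
    gt = [0, 1, 0, 0, 1, 1, 1, 0]
    print(rand_index_naive(seg, gt), rand_index(seg, gt))
    print(boundary_recall({(0, 1), (1, 1), (2, 1)}, {(0, 1), (1, 1), (2, 2), (3, 3)}))
```

The counting form uses the identity that pairs agreeing in both labellings equal all pairs, plus twice the pairs sharing a label in both, minus the pairs sharing a label in $S$ and the pairs sharing a label in $G$.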

The corresponding clustering centers are denoted as $K^F_n$ and $K^B_m$, respectively, where $n \in \{1, \ldots, 64\}$ and $m \in \{1, \ldots, 64\}$. Li et al. defined the likelihood energy and the prior energy as follows,

held, that is, the similarity between $Q$ and $R$ is

Table 1. Number of pixels and superpixels of three test images.
