Symmetry
  • Article
  • Open Access

17 May 2018

Efficient Superpixel-Guided Interactive Image Segmentation Based on Graph Theory

College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China

Abstract

Image segmentation is a challenging task in the field of image processing and computer vision. To obtain accurate segmentation results, user interaction is commonly employed in practical image-segmentation applications. However, a good segmentation method should not rely on much prior information. In this paper, an efficient superpixel-guided interactive image-segmentation algorithm based on graph theory is proposed. In this algorithm, we first perform an initial segmentation using the MeanShift algorithm; a graph is then built by taking the pre-segmented regions (superpixels) as nodes, and the maximum flow–minimum cut algorithm is applied to obtain the superpixel-level segmentation. In this process, each superpixel is represented by a color histogram, and the Bhattacharyya coefficient is chosen to calculate the similarity between any two adjacent superpixels. To address the over-segmentation problem of the MeanShift algorithm, a narrow band is constructed along the object contours using morphological operators. To further segment the pixels around edges accurately, a graph is created again for the pixels in the narrow band and, by applying the maximum flow–minimum cut algorithm once more, the final pixel-level segmentation is completed. Extensive experimental results show that the presented algorithm obtains much more accurate segmentation results with less user interaction and less running time than the widely used GraphCut algorithm, Lazy Snapping algorithm, GrabCut algorithm and a region-merging algorithm based on maximum similarity (MSRM).

1. Introduction

Image segmentation, which aims to extract objects of interest from a complex background for object detection, tracking, recognition, scene analysis, etc., is one of the basic problems in image processing, and has been widely used in pattern recognition and computer vision [1,2,3,4,5,6,7]. Many image-segmentation algorithms have been proposed in recent years [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]. According to image type, image segmentation can be divided into monochrome and color image segmentation. According to image representation, it can be divided into single-scale and multi-scale approaches. According to the principle of operation, it can be divided into spatially blind and spatially guided methods. According to whether a priori knowledge is provided, it can be divided into automatic and interactive segmentation. For natural images of various types and with complicated content, interactive segmentation is usually used, because its results are more consistent with users' subjective intentions [12,13,14,15,16,17,18,19,20]. GraphCut is one of the most widely used interactive segmentation algorithms [13,14,15,16,17,18,19] due to its global optimization, strong numerical robustness, high execution efficiency, free topological structure of the partitioned weighted graph, and N-D image-segmentation ability [13,14,15]. As a pre-segmentation solution, superpixel segmentation has attracted increasing attention, and an abundance of superpixel segmentation algorithms have been proposed in recent years [7,21,22,23,24,25,26], such as watershed algorithms [21,22,23,24], the MeanShift algorithm [25], turbopixels [26], etc. Superpixel algorithms group pixels into perceptually meaningful regions, which captures image redundancy and greatly reduces the complexity of subsequent image processing, such as object segmentation, detection, tracking and recognition tasks [6,7,18,20].
A good interactive image-segmentation method must perform accurate segmentation with minimal user interaction and short feedback time [17]. GraphCut is a representative interactive image-segmentation algorithm. It takes pixels as nodes to construct a weighted directed graph, and the maximum flow–minimum cut algorithm [14] is applied to obtain the global solution. Unfortunately, this algorithm only considers grayscale images, so the segmentation results are poor when the image content is complex. Moreover, when the foreground and background pixel values are close in gray scale, this algorithm usually requires users to provide a lot of interactive information to derive the ideal segmentation results. In addition, since pixels are used as nodes, the number of nodes in the graph model constructed by the GraphCut algorithm becomes huge when the image is large, which causes the maximum flow–minimum cut algorithm to take a great deal of execution time. To overcome these shortcomings, Li et al. proposed the Lazy Snapping segmentation algorithm [18]. In this algorithm, the WaterShed algorithm [21] is first used to pre-segment images and a weighted directed graph is constructed by taking the partitioned regions as nodes, and the maximum flow–minimum cut algorithm is then used to solve the graph-partition problem. Finally, a manual adjustment scheme is adopted to make the segmentation results more accurate.
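To make the s-t graph formulation described above concrete, the following self-contained Python sketch runs the classic Edmonds–Karp maximum-flow algorithm (used here for simplicity; practical segmentation systems rely on faster solvers such as the one in [14]) on a toy two-"pixel" graph. The nodes still reachable from the source in the residual graph after the flow saturates form the foreground side of the minimum cut. All node names and capacities are illustrative only.

```python
from collections import deque

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max-flow. Returns (flow value, set of nodes on the
    source side of the minimum cut). `cap` maps (u, v) -> capacity."""
    adj, res = {}, {}
    for (u, v), c in cap.items():
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)          # residual back-edge
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent, q = {s: None}, deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj.get(u, ()):
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= bottleneck
            res[(v, u)] += bottleneck
        flow += bottleneck
    # Source side of the min cut = nodes reachable via positive residuals.
    reach, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in adj.get(u, ()):
            if v not in reach and res[(u, v)] > 0:
                reach.add(v)
                q.append(v)
    return flow, reach

# Two "pixels": a is strongly tied to the source (foreground terminal),
# b to the sink (background terminal); a weak n-link connects them.
cap = {('s', 'a'): 9, ('a', 'b'): 1, ('b', 't'): 9,
       ('s', 'b'): 1, ('a', 't'): 1}
flow, fg = max_flow_min_cut(cap, 's', 't')
print(flow, sorted(fg - {'s'}))  # min cut of value 3; pixel a is foreground
```

The cut separates `a` (foreground) from `b` (background) by severing the three cheap links, which is exactly the behavior the unary (t-link) and pairwise (n-link) weights of GraphCut are designed to induce.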
This Lazy Snapping algorithm, however, has several major issues: (1) because the WaterShed algorithm only uses the gradient information of grayscale images, the over-segmentation is serious, and the number of regions of the pre-segmentation results is still large; (2) this algorithm uses the mean of color information to characterize each region, which is too simple to accurately represent the color distribution of each region; (3) in the modeling process, a K-means clustering algorithm is chosen, where the clustering performance is greatly affected by the initial conditions and disturbing factors; (4) this algorithm requires a lot of coarse-tuning and fine-tuning in the process of interaction, making the entire interactive process very complicated.
Compared with the WaterShed algorithm, the MeanShift segmentation algorithm [25] has been more widely studied and applied due to its excellent segmentation performance. The MeanShift algorithm makes full use of color information, which suppresses over-segmentation effectively and significantly reduces the number of segmented regions. Combining MeanShift pre-segmentation with a color-histogram representation for each region, Ning et al. proposed a region-merging algorithm based on maximum similarity (MSRM) [20]. In contrast to the maximum flow–minimum cut algorithm, the MSRM algorithm uses an automatic region-merging mechanism to complete color image segmentation. However, this algorithm only considers the similarity between adjacent regions and ignores the relationship between each region and the interaction information. In the region-merging process, the color histogram of each region must be built and the similarity between adjacent regions recalculated repeatedly, so the overall space and time overhead is very high. In addition, the MSRM algorithm does not address the over-segmentation introduced by the MeanShift algorithm, which leads to a large number of segmentation errors around object edges.
To simplify the process of user interaction, Rother et al. proposed the GrabCut algorithm [19]. This algorithm establishes initial foreground and background models according to a rectangular area marked by users. Since the rectangular area contains both foreground and background information, the algorithm iteratively updates the foreground and background models through an iterative maximum flow–minimum cut scheme until the global energy converges. In the initial step, only the background areas are determined accurately, so the final segmentation results of the GrabCut algorithm are affected by the initial background model.
In this paper, we present an efficient superpixel-guided interactive image-segmentation algorithm based on graph theory. First, the MeanShift algorithm is applied to pre-segment an image into regions (superpixels), and the proposed algorithm then constructs a weighted directed graph whose nodes are the pre-partitioned regions. This model considers not only the correlation between adjacent superpixels, but also the relationship between each superpixel and the interaction information. Most importantly, we use a color histogram to represent each superpixel, which accurately captures the regional color distribution. As in the MSRM segmentation algorithm, we choose the Bhattacharyya coefficient to measure the similarity between superpixels, and the maximum flow–minimum cut algorithm is then performed to obtain the first-stage segmentation results. To handle the edge leakage problem caused by the MeanShift algorithm, the proposed algorithm creates a narrow band region at the object boundaries by morphological operations based on the first-stage segmentation results. At the same time, the foreground and background regions are determined, and the corresponding foreground and background models are established with Gaussian mixture models (GMMs). Finally, a graph model is rebuilt by taking the pixels in the narrow band region as nodes, and the final segmentation results are obtained using the maximum flow–minimum cut algorithm. A large number of experiments show that the proposed algorithm achieves better segmentation results with less user interaction and execution time than the GraphCut, Lazy Snapping, GrabCut and MSRM algorithms.
The remainder of this paper is organized as follows: Section 2 summarizes the related work including the GraphCut, LazySnapping, GrabCut and MSRM algorithms. Section 3 introduces the motivation of the proposed algorithm and the detail is described in Section 4. Section 5 performs extensive experiments to verify the proposed algorithm. Section 6 concludes this paper.

3. Motivation

In the Lazy Snapping algorithm [18], a pre-segmentation step is implemented using the WaterShed algorithm [21], and the pixel nodes of the original GraphCut algorithm [13] are replaced by the pre-segmented regions, which effectively improves segmentation efficiency and performance. However, this algorithm also has the following problems:
(1)
Each region in the Lazy Snapping algorithm is modeled by the mean of color in this region, which is not sufficient to describe each region effectively.
(2)
The K-Means algorithm is used to cluster the marked seed points. This clustering algorithm is very sensitive to noise, outliers, the number of clusters, the initial cluster centers, etc. As a result, the clustering performance is often problematic.
(3)
The clustering process needs to set a larger number of centers. In the Lazy Snapping algorithm, the number of clusters for the foreground and the background is set to 64. This implies that users need to mark the foreground and background using at least 64 regions, which leads to much interactive work.
(4)
Because the WaterShed segmentation algorithm is based on the gradient information of the grayscale image and ignores the rich color information, serious over-segmentation occurs and the number of segmented regions remains large.
(5)
In order to get better segmentation results, Lazy Snapping algorithm uses a fine-tuning process to further improve the segmentation accuracy in the post-processing stage, but the consequence is that the entire interaction process is too cumbersome.
Compared with the WaterShed algorithm, the MeanShift algorithm [25] performs better and produces far fewer segmentation regions. Figure 2 illustrates the segmentation results obtained using the WaterShed and MeanShift algorithms, respectively, on a Flower image of size 500 × 351 (175,500 pixels). The WaterShed result contains 10,072 regions, while the MeanShift result contains only 1101 regions. In the proposed algorithm, we adopt the MeanShift algorithm as the pre-segmentation step to provide a good initialization for the subsequent, highly efficient execution of the maximum flow–minimum cut algorithm.
Figure 2. Flower Image and Its Segmentation Results. (a) Flower Image; (b) WaterShed Segmentation Result; (c) MeanShift Segmentation Result.
On the other hand, compared with the GraphCut and Lazy Snapping algorithms, the MSRM algorithm [20] achieves a considerable improvement by incorporating an automatic merging mechanism. However, the MSRM algorithm also has the following main problems:
(1)
This algorithm only considers the similarity between adjacent regions in the merging process, without considering the relationship between each region and the foreground/background information marked by users. Generally speaking, the MSRM algorithm does not make full use of the interactive information input by users, thereby slowing down the region-merging speed and greatly weakening the segmentation accuracy.
(2)
For each region, its color histogram must be calculated before the merging algorithm is performed, and during execution the color histograms must be recalculated many times as regions are combined. In addition, in the region-merging process, the similarity between each region and its adjacent regions must be calculated according to Equation (8). Since the whole segmentation process usually requires a large number of merging operations, the space and time overhead is very large.
(3)
The execution efficiency of the MSRM algorithm degrades seriously as the image size and the number of regions increase for images with complex content. Ning et al. pointed out this problem in reference [20], so the applicability of this algorithm is limited to a certain extent.
(4)
Although the MeanShift algorithm has excellent segmentation performance, the over-segmentation problem still exists in its results, and edge leakage occurs from time to time. However, the MSRM algorithm does not perform any post-processing to resolve this problem.
By contrast with the GraphCut, Lazy Snapping and MSRM algorithms, the GrabCut algorithm [19] only needs to set a rectangular area for users to simplify the interaction process greatly. However, this algorithm also has several major problems:
(1)
The image content inside the rectangle marked by users contains both background and foreground information, while the content outside the rectangle contains background information only. Although the algorithm establishes models for the foreground and the background simultaneously, the accuracy of the background model directly affects the final segmentation results.
(2)
When the size of an object is very large and almost occupies the whole space area in an image, or when the number of objects is large and is almost distributed in the whole image surface, the background area marked by users in the GrabCut algorithm is usually small. Due to insufficient background information, the established background model makes it difficult to represent the distribution of the background information in the whole image accurately. So the algorithm cannot obtain ideal segmentation results even after more iterations, and the whole process is very time-consuming.
(3)
This algorithm models the foreground through iterative online learning, which is clearly less effective than modeling the foreground directly. However, this modeling method of the GrabCut algorithm is determined by its interaction mode.
(4)
Since the number, area size and position distribution of targets in the image to be segmented directly determine the setting of the rectangular box, which cannot be changed by users in the interaction process, the flexibility of the interaction mode is poor. It seems that the execution efficiency and segmentation performance of the GrabCut algorithm are affected by the setting of the rectangular box on the image surface. However, from the above analysis, we can further find that the execution efficiency and segmentation performance of the GrabCut algorithm are essentially determined by the inherent characteristics of the image itself.

4. Proposed Algorithm

In this paper, we present an efficient superpixel-guided interactive image-segmentation algorithm based on graph theory. The proposed algorithm runs in two stages. In the first stage, we use the MeanShift algorithm to pre-segment the image into superpixels. The superpixels are then taken as nodes to construct a directed graph. Finally, we apply the maximum flow–minimum cut algorithm to complete the superpixel-level graph segmentation. In the second stage, in order to solve the boundary leakage problem caused by the MeanShift algorithm, the proposed algorithm creates a mask image, termed Trimap, using morphological erosion and dilation operations based on the first-stage segmentation results. This Trimap is composed of three parts: TrimapUnknown, TrimapForeground and TrimapBackground. TrimapUnknown is a narrow band area containing both background and foreground information; TrimapForeground contains only foreground information, while TrimapBackground contains only background information. We then establish the corresponding foreground and background models for TrimapForeground and TrimapBackground, respectively, and create a pixel-level graph model for the narrow band area. Finally, the maximum flow–minimum cut algorithm is performed again to complete the pixel-level image segmentation.
Because of the relatively large area of each region obtained in the MeanShift pre-segmentation step, it is more effective to represent each region by a color histogram than by the color mean. Hence, as with the MSRM algorithm, the proposed algorithm also uses a color histogram to represent each superpixel, denoted as $H_i$ for superpixel $i$. For the user-marked foreground $F$ and background $B$, color histograms are also used to model them, denoted as $H_F$ and $H_B$, respectively. We still use the Bhattacharyya coefficient $\rho(i,j)$ to measure the similarity between adjacent superpixels $i$ and $j$, as defined in Equation (8). In addition to measuring the similarity between superpixels, the similarity of each superpixel with respect to the foreground and the background is calculated as follows,
$$\begin{cases} \rho(i,F) = \sum_{k=1}^{Z} \sqrt{H_i(k) \cdot H_F(k)} \\ \rho(i,B) = \sum_{k=1}^{Z} \sqrt{H_i(k) \cdot H_B(k)} \end{cases}$$
where $Z$ is the number of feature dimensions (the size of the color histogram).
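As an illustration of the histogram representation and the Bhattacharyya coefficient, the sketch below builds a 4096-bin color histogram (16 uniform bins per RGB channel, matching the $Z = 4096$ used later in the experiments) and compares histograms. The uniform quantization scheme is an assumption, since the paper does not specify how the bins are formed.

```python
from math import sqrt

def color_histogram(pixels, bins_per_channel=16):
    """Normalized RGB histogram with 16^3 = 4096 bins, uniformly
    quantizing each 0-255 channel (illustrative binning scheme)."""
    Z = bins_per_channel ** 3
    h = [0.0] * Z
    step = 256 // bins_per_channel
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        h[idx] += 1.0
    n = float(len(pixels))
    return [v / n for v in h]

def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient in [0, 1]; 1 means identical distributions."""
    return sum(sqrt(a * b) for a, b in zip(h1, h2))

# Three toy "superpixels" with different color content.
red  = color_histogram([(250, 10, 10)] * 50)
pink = color_histogram([(250, 10, 10)] * 25 + [(250, 200, 200)] * 25)
blue = color_histogram([(10, 10, 250)] * 50)
print(round(bhattacharyya(red, pink), 4), bhattacharyya(red, blue))  # 0.7071 0.0
```

The red and pink regions share half their mass in one bin, giving $\sqrt{0.5} \approx 0.707$, while disjoint color distributions score zero, which is why the coefficient is a natural similarity weight for adjacent superpixels.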
The above equations show that we consider not only the similarity between superpixels, but also the relationship between each superpixel and the user-marked foreground and background; we therefore define the following energy function,
$$E(\iota) = \sum_{i \in V} R(\iota_i) + \sum_{(i,j) \in E} |\iota_i - \iota_j| \cdot B(\iota_i, \iota_j)$$
where the first term is the regional term, which measures the similarity between each superpixel and the user-marked foreground and background information; the second term is the edge term, which measures the similarity between adjacent superpixels. The regional term and the edge term are defined as follows,
$$\begin{cases} R(\iota_i = 1) = \Upsilon, \quad R(\iota_i = 0) = 0 & i \in F \\ R(\iota_i = 1) = 0, \quad R(\iota_i = 0) = \Upsilon & i \in B \\ R(\iota_i = 1) = \rho(i, F), \quad R(\iota_i = 0) = \rho(i, B) & i \in U \end{cases}$$
$$B(\iota_i, \iota_j) = \lambda \cdot \rho(i, j)$$
where $\Upsilon = 1 + \max_{i \in V} \sum_{j : \{i,j\} \in N} \rho(i, j)$, $N$ denotes the neighborhood system, and $\lambda$ is a balance control parameter.
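The first-stage unary costs and the constant $\Upsilon$ can be sketched as follows. The function names and the toy similarity values are our own illustrative choices; the cost assignment follows the paper's t-link convention, with $\Upsilon$ acting as an effectively infinite cost that pins user-marked superpixels to their labels.

```python
def regional_term(mark, rho_F, rho_B, gamma):
    """Unary costs (R(l=1), R(l=0)) for one superpixel: hard constraints
    for user-marked superpixels, and the Bhattacharyya similarities to
    the marked foreground/background histograms for unmarked ones."""
    if mark == 'F':            # marked foreground: forbid the other label
        return gamma, 0.0
    if mark == 'B':            # marked background: forbid the other label
        return 0.0, gamma
    return rho_F, rho_B        # unmarked superpixel (i in U)

def gamma_constant(neighbor_rho):
    """gamma = 1 + the largest sum of a superpixel's neighbor similarities,
    so no finite edge cost can outweigh a hard constraint."""
    return 1.0 + max(sum(rhos) for rhos in neighbor_rho.values())

# Toy graph of three superpixels: 0 marked foreground, 2 marked background,
# 1 unmarked; rho(0,1) = 0.9 and rho(1,2) = 0.2.
neighbor_rho = {0: [0.9], 1: [0.9, 0.2], 2: [0.2]}
g = gamma_constant(neighbor_rho)           # = 1 + (0.9 + 0.2)
print(regional_term('F', 0.0, 0.0, g))     # hard foreground constraint
print(regional_term('U', 0.8, 0.3, g))     # similarity-based costs
```

With these unary costs and the pairwise weights $\lambda \cdot \rho(i,j)$, the superpixel graph is ready to be handed to a max-flow solver.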
Figure 3 illustrates the first-stage segmentation result of the proposed method on a rose image. As shown in Figure 3a, the edge of the target is not smooth and some error segmentation occurs due to the over-segmentation of the MeanShift algorithm. Therefore, based on the first-stage segmentation result, we segment the boundary region again. The corresponding binary image is denoted as $S$ and is shown in Figure 3b. In this process, we first morphologically erode the binary image $S$ to obtain a region $S_F = S \ominus b$ which contains only the object and is shown in white (TrimapForeground) in Figure 3c. Then, a morphological dilation is performed on $S$, and the difference between the dilation result and the erosion result is taken as the narrow band region $S_U = (S \oplus b) - (S \ominus b)$, which contains both foreground and background and is shown in gray (TrimapUnknown) in Figure 3c. The remaining area contains only background information, expressed as $S_B = S - (S_U + S_F)$, and is shown in black (TrimapBackground) in Figure 3c. Here, we use a square structuring element $b$ of size $(2d+1) \times (2d+1)$ for the morphological operations, where $d$ is a positive integer. Finally, we obtain the triple mask image (Trimap) of Figure 3c. We then use the Orchard–Bouman clustering algorithm [27] to create the corresponding Gaussian mixture model for the foreground and the background, respectively, each with $K$ Gaussian components. In addition, in order to solve the over-segmentation problem caused by the MeanShift algorithm and further improve the segmentation accuracy in the narrow band region, we take the pixels in the narrow band as nodes to build a graph again and apply the maximum flow–minimum cut algorithm to obtain the final segmentation results.
Figure 3. First stage segmentation result and ternary mask images. (a) Segmentation result in the first stage; (b) binary result; (c) Trimap.
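The Trimap construction can be sketched with straightforward binary erosion and dilation, written library-free here for clarity (a real implementation would use an image-processing library); the 7×7 mask and $d = 1$ are illustrative.

```python
def erode(S, d):
    """Binary erosion with a (2d+1) x (2d+1) square structuring element."""
    H, W = len(S), len(S[0])
    return [[int(all(0 <= y + dy < H and 0 <= x + dx < W and S[y + dy][x + dx]
                     for dy in range(-d, d + 1) for dx in range(-d, d + 1)))
             for x in range(W)] for y in range(H)]

def dilate(S, d):
    """Binary dilation with the same square element."""
    H, W = len(S), len(S[0])
    return [[int(any(0 <= y + dy < H and 0 <= x + dx < W and S[y + dy][x + dx]
                     for dy in range(-d, d + 1) for dx in range(-d, d + 1)))
             for x in range(W)] for y in range(H)]

def trimap(S, d):
    """0 = TrimapBackground, 1 = TrimapUnknown (narrow band between the
    dilated and eroded masks), 2 = TrimapForeground (eroded mask)."""
    SF, SD = erode(S, d), dilate(S, d)
    return [[2 if SF[y][x] else (1 if SD[y][x] else 0)
             for x in range(len(S[0]))] for y in range(len(S))]

# First-stage binary result: a 3x3 foreground square in a 7x7 image.
S = [[1 if 2 <= x <= 4 and 2 <= y <= 4 else 0 for x in range(7)] for y in range(7)]
T = trimap(S, 1)
print(T[3])  # centre row crosses background, narrow band, and foreground
```

The gray narrow band (label 1) is exactly the region that is re-segmented at pixel level in the second stage, while labels 0 and 2 seed the background and foreground GMMs.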
The whole process of the second stage is basically similar to the GrabCut algorithm. The difference is that the foreground region in the proposed algorithm is completely determined and is no longer an unknown region containing both foreground and background information. Therefore, the color models established in the first stage do not need to be updated. Moreover, the maximum flow–minimum cut algorithm only needs to be executed once to obtain the ideal segmentation results. The reason is that the narrow band determined in the first stage is very small, so the determined foreground and background regions contain almost all of the foreground and background information; the foreground and background models established by the proposed algorithm are therefore very accurate. In the second stage, we use the same energy function as Equation (10) in the first stage, but the two energy terms are redefined as follows,
$$\begin{cases} R(\iota_i = 1) = \Upsilon, \quad R(\iota_i = 0) = 0 & i \in \text{TrimapForeground} \\ R(\iota_i = 1) = 0, \quad R(\iota_i = 0) = \Upsilon & i \in \text{TrimapBackground} \\ R(\iota_i = 1) = D_B(i), \quad R(\iota_i = 0) = D_F(i) & i \in \text{TrimapUnknown} \end{cases}$$
$$B(\iota_i, \iota_j) = \frac{\lambda}{dist(i, j)} \exp\left( -\frac{\| I_i - I_j \|^2}{2\sigma^2} \right)$$
where $I_i$ represents the color information of pixel $i$, and $\Upsilon = 1 + \max_{i \in V} \sum_{j : \{i,j\} \in N} B(\iota_i, \iota_j)$. In the process of building the graph model, we use an 8-neighborhood; since $B(\iota_i, \iota_j) \le \lambda$ always holds, we can set $\Upsilon = 8\lambda + 1$. $D_x(i)$ (where $x \in \{F, B\}$) measures how well the $i$th pixel fits the foreground or background Gaussian mixture model, and is calculated as:
$$D_x(i) = -\log \sum_{k=1}^{K} \pi_k^x \frac{1}{\sqrt{\det \Sigma_k^x}} \exp\left( -\frac{1}{2} \left[ I_i - \mu_k^x \right]^T \left( \Sigma_k^x \right)^{-1} \left[ I_i - \mu_k^x \right] \right)$$
where $\pi_k^x$ denotes the weight of the $k$th Gaussian component in the foreground or background mixture model, $\mu_k^x$ and $\Sigma_k^x$ denote the mean and covariance matrix of the $k$th Gaussian component, and $\det \Sigma_k^x$ denotes the determinant of the covariance matrix $\Sigma_k^x$.
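The two second-stage energy terms can be sketched as follows. For brevity, the GMM data term assumes diagonal covariance matrices, whereas the paper estimates full covariances with the Orchard–Bouman algorithm; all numeric values are illustrative.

```python
from math import exp, log, sqrt

def boundary_term(Ii, Ij, dist, lam=1.0, sigma=10.0):
    """Pairwise term: lambda / dist(i,j) * exp(-||Ii - Ij||^2 / (2 sigma^2)).
    Similar neighboring colors give a high cost, discouraging a cut there."""
    d2 = sum((a - b) ** 2 for a, b in zip(Ii, Ij))
    return lam / dist * exp(-d2 / (2.0 * sigma ** 2))

def gmm_data_term(I, weights, means, variances):
    """Negative log-likelihood of pixel color I under a GMM with diagonal
    covariances (a simplification of the paper's full-covariance models)."""
    total = 0.0
    for pi, mu, var in zip(weights, means, variances):
        det, quad = 1.0, 0.0
        for c in range(3):                      # RGB channels
            det *= var[c]
            quad += (I[c] - mu[c]) ** 2 / var[c]
        total += pi * (1.0 / sqrt(det)) * exp(-0.5 * quad)
    return -log(total)

# One-component "foreground" GMM centred on red, "background" on blue.
fg = ([1.0], [(200.0, 30.0, 30.0)], [(400.0, 400.0, 400.0)])
bg = ([1.0], [(30.0, 30.0, 200.0)], [(400.0, 400.0, 400.0)])
pixel = (195.0, 35.0, 35.0)
print(gmm_data_term(pixel, *fg) < gmm_data_term(pixel, *bg))  # likelier under fg
```

A reddish narrow-band pixel yields a lower (better) data term under the foreground model, so the min-cut is pushed toward labeling it foreground unless strong boundary terms pull the other way.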
From the above description, we can see that the proposed algorithm requires only a small amount of interaction information and can obtain ideal segmentation results in a relatively short time. As shown in Figure 2, before executing the GraphCut algorithm, users must input interactive information on the original image (Figure 2a), where every pixel needs to be determined. As for Figure 2b, although the segmentation is region-based, it is still difficult for users to decide which regions should be marked before performing the Lazy Snapping algorithm. In Figure 2c, by contrast, the large region size and small number of regions provide guidance for user interaction and simplify the interaction process. In addition, compared with the MSRM algorithm, the proposed algorithm takes into account not only the similarity between regions, but also the similarity between each region and the foreground/background information. Furthermore, the proposed algorithm can accurately determine the foreground and background regions after the first-stage segmentation. Compared with the GrabCut algorithm, the established background model is more accurate and, most importantly, the foreground model is established at the same time. For images that contain one very large object, or many objects distributed over almost the entire image, it is difficult for the GrabCut algorithm to obtain ideal segmentation results even if the maximum flow–minimum cut algorithm is applied many times.

5. Experimental Results

We performed the experiment on some public datasets. Figure 4 shows three test images and Table 1 shows some basic information about them, including the image size, the total number of pixels, and the number of superpixels obtained with the WaterShed [21] and MeanShift [25] algorithms, respectively. As can be clearly seen from the table, compared with the number of pixels, the number of superpixels produced by the pre-segmentation techniques is greatly reduced, and the MeanShift algorithm is more effective than the WaterShed algorithm. In the experiment, the parameters of our method were set as follows: the size of the color histogram was set as $Z = 4096$, the balance control parameter in the graph model was set as $\lambda = 1.0$, the size parameter of the square structuring element in the morphology process was set as $d = 2$, and the number of Gaussian components of the GMMs in the foreground and background models was set as $K = 5$. All parameter values were determined through exhaustive experiments.
Figure 4. Test images. (a) Flower; (b) person; (c) animal.
Table 1. Number of pixels and superpixels of three test images.
Figure 5 shows the interaction information marked by users for the flower image. For the sake of fairness, the GraphCut [13], Lazy Snapping [18] and MSRM [20] algorithms and the proposed algorithm use the same interaction information, as shown in Figure 5a. The red strokes mark foreground information and the blue strokes mark background information. Because the GrabCut algorithm [19] completes the user interaction by setting a rectangular area, different from the above four algorithms, we only need to mark a red box as shown in Figure 5b. All pixels outside of the red box belong to the background, the others containing both the foreground and background are segmented iteratively by the maximum flow–minimum cut algorithm. Figure 6 illustrates the corresponding segmentation results of each algorithm, where the first row is the original image-segmentation results and the second row is the corresponding binary images. From the experimental results, we can see that the GraphCut and Lazy Snapping algorithms have a large number of misclassifications. Although the GrabCut algorithm is good at boundary processing, there are some misclassifications inside the target, which can be seen from the corresponding binary image. The segmentation result of the MSRM algorithm is ideal inside the foreground and background, but over-segmentation is serious around the edge of the target, which is similar to the segmentation result in the first stage of the proposed algorithm, as shown in Figure 3a. After the second pixel-level based segmentation for the narrow band, the proposed algorithm effectively solves the edge leakage problem, resulting in a good segmentation result, as shown in Figure 6e below.
Figure 5. User interaction for the flower image. (a) Interaction of GraphCut, Lazy Snapping, MSRM and proposed algorithms; (b) interaction of GrabCut.
Figure 6. Segmentation results for the flower image. (a) GraphCut; (b) Lazy Snapping; (c) GrabCut; (d) MSRM; (e) proposed algorithm.
Figure 7 illustrates the interaction information marked by users for the person image. Since the number of regions partitioned by the WaterShed algorithm is still large and the regions are generally small, no guiding information can be provided to users in the interaction process. Therefore, in this experiment, the GraphCut and Lazy Snapping algorithms use the same interaction information. Because the MSRM algorithm and our algorithm use the same MeanShift pre-segmentation, these two algorithms also use the same interaction information. As shown in Figure 7c, only a few marks are needed to determine a large amount of background and foreground information. The corresponding segmentation results are shown in Figure 8. Because of the few background markers, as shown in Figure 7a, a large number of misclassifications occur for the GraphCut and Lazy Snapping algorithms. The main reason is that users find it hard to mark accurately without guidance, a problem that becomes more prominent when the image content is complex. As with the flower image, the segmentation result of the GrabCut algorithm is also acceptable. This is because the background information outside the rectangular area can fully represent the background distribution of the image, so the established background model is relatively accurate. However, an accurate background model alone is not enough, and some misclassifications can be seen inside the target. Furthermore, because the MeanShift algorithm suffers from edge leakage in the segmentation process, the misclassification in the MSRM result remains serious at the target edge, as shown in Figure 8d. In contrast, the proposed algorithm performs much better than the others in terms of segmentation accuracy, especially in the target boundary areas, as shown in Figure 8e.
Figure 7. User interaction for the person image. (a) Interaction of GraphCut and Lazy Snapping; (b) interaction of GrabCut; (c) interaction of MSRM and proposed algorithm.
Figure 8. Segmentation results for the person image. (a) GraphCut; (b) Lazy Snapping; (c) GrabCut; (d) MSRM; (e) proposed algorithm.
In addition, Figure 9 shows the user interaction for the animal image, and the segmentation results are shown in Figure 10. From the segmentation results, we can find that the results of the GraphCut and Lazy Snapping algorithms are not perfect even when a lot of marker information is used. For example, it is difficult for users to mark the narrow tail region, which leads to false segmentation there. For the GrabCut algorithm, because the two targets (dogs) are scattered in the image, the marked background information is not enough to represent the background distribution effectively, and a large amount of background information is contained in the rectangular area; thus, it is difficult for the GrabCut algorithm to learn the background model accurately. Although the MSRM algorithm has the same initial conditions as the proposed algorithm, it can be seen from Figure 10d that its segmentation results have a large number of errors. The main reason is that the MSRM algorithm only considers the similarity among regions and does not fully consider the relationship between regions and the interaction information marked by users. Although the GraphCut, Lazy Snapping, GrabCut and MSRM algorithms can improve segmentation accuracy by adding more markers, this makes the interaction process more complicated. In contrast, the proposed method achieves very promising segmentation results with the same or fewer markings, which is more effective in practical applications.
Figure 9. User interaction for the animal image. (a) Interaction of GraphCut and Lazy Snapping; (b) interaction of GrabCut; (c) interaction of MSRM and proposed algorithm.
Figure 10. Segmentation results for the animal image. (a) GraphCut; (b) Lazy Snapping; (c) GrabCut; (d) MSRM; (e) proposed algorithm.
Table 2 compares the running time of the five algorithms. As can be seen from the table, the proposed algorithm has almost the same running-time efficiency as the GraphCut and Lazy Snapping algorithms. Although the segmentation performance of the GrabCut algorithm is relatively good, it is too slow for real-time applications. The execution efficiency of the MSRM algorithm is mainly determined by the number of partitioned regions, which makes it unreliable for practical implementation. For our proposed algorithm, the high efficiency of the maximum flow–minimum cut algorithm and its low sensitivity to the number of regions make it efficient and useful for real-time segmentation.
Table 2. Running time comparison (ms).
In order to further compare our proposed algorithm with the other four algorithms, we perform a detailed experiment on the Microsoft GrabCut image dataset [19], which is composed of 30 images provided with ground truth. Some testing images and the corresponding ground truth are shown in Figure 11. In this experiment, we evaluated the segmentation performance using the misclassification error (ME) [5], Rand index (RI) [28] and boundary recall (BR) [26]. ME is defined as follows,
$$ME = 1 - \frac{|B_G \cap B_S| + |F_G \cap F_S|}{|B_G| + |F_G|}$$
where $B_G$ and $F_G$ denote the background and foreground pixels of the ground truth ($G$), $B_S$ and $F_S$ denote the background and foreground pixels of the segmentation result ($S$), and $|\cdot|$ is the cardinality of the set. The ME measures the percentage of wrongly assigned pixels, ranging from zero (no error) to one (completely wrong).
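As an illustration, the ME measure can be computed directly from boolean foreground masks. The function below is a minimal sketch; the function name and mask layout are our own, not the evaluation code used in the experiments.

```python
import numpy as np

def misclassification_error(gt, seg):
    """ME = 1 - (|B_G ∩ B_S| + |F_G ∩ F_S|) / (|B_G| + |F_G|).

    gt, seg: boolean arrays of equal shape; True marks foreground.
    The denominator is simply the total pixel count, since every
    pixel of the ground truth is either foreground or background.
    """
    fg_correct = np.logical_and(gt, seg).sum()    # |F_G ∩ F_S|
    bg_correct = np.logical_and(~gt, ~seg).sum()  # |B_G ∩ B_S|
    return 1.0 - (fg_correct + bg_correct) / gt.size

gt  = np.array([[1, 1, 0], [0, 0, 0]], dtype=bool)
seg = np.array([[1, 0, 0], [0, 0, 1]], dtype=bool)
print(misclassification_error(gt, seg))  # 2 of 6 pixels differ -> 0.333...
```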
Figure 11. Partial test images and ground truth of the GrabCut image dataset.
RI computes the ratio of the number of pixel-pairs sharing the same label relationship between the segmentation result ( S ) and the ground truth ( G ). The definition of RI is described as follows,
$$RI = \frac{1}{\binom{N}{2}} \sum_{i,j,\; i<j} \left[ I(l_i^S = l_j^S) \cdot I(l_i^G = l_j^G) + I(l_i^S \neq l_j^S) \cdot I(l_i^G \neq l_j^G) \right]$$
where $N$ denotes the total number of pixels, $l_i^S$ and $l_i^G$ are the labels of pixel $i$ in $S$ and $G$, and $I(\cdot)$ is the indicator function, with $I(\text{true}) = 1$ and $I(\text{false}) = 0$. The RI takes values in the range [0, 1], where a score of zero indicates that the labelling of the test segmentation is totally opposite to the ground truth and a score of one indicates that they agree on every pixel pair.
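The pairwise-agreement definition above can be evaluated literally with a double loop over pixel pairs. The sketch below does exactly that; it is quadratic in the number of pixels and intended only to make the formula concrete, not for use on full images.

```python
import numpy as np
from itertools import combinations

def rand_index(gt, seg):
    """Fraction of pixel pairs (i, j) whose same-label/different-label
    relation agrees between the segmentation S and the ground truth G."""
    g, s = gt.ravel(), seg.ravel()
    n = g.size
    # a pair agrees when both labelings treat i and j the same way:
    # same label in both, or different labels in both
    agree = sum((g[i] == g[j]) == (s[i] == s[j])
                for i, j in combinations(range(n), 2))
    return agree / (n * (n - 1) / 2)

gt  = np.array([0, 0, 1, 1])
seg = np.array([0, 0, 1, 0])
print(rand_index(gt, seg))  # 3 of the 6 pairs agree -> 0.5
```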
BR measures the percentage of the ground truth boundaries recovered by the segmentation boundaries and is defined as follows,
$$BR = \frac{\sum_{p \in G_b} I\left(\min_{q \in S_b} \|p - q\| < 2\right)}{|G_b|}$$
where $S_b$ and $G_b$ denote the sets of segmentation boundary pixels and ground-truth boundary pixels, respectively, and a ground-truth boundary pixel counts as recovered if a segmentation boundary pixel lies within a distance of two pixels.
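A small sketch of this measure, using Euclidean distance and the two-pixel tolerance from the formula; the function name and the brute-force pairwise distance matrix are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np

def boundary_recall(gt_boundary, seg_boundary, tol=2.0):
    """Fraction of ground-truth boundary pixels p in G_b that have a
    segmentation boundary pixel q in S_b within Euclidean distance tol."""
    gp = np.argwhere(gt_boundary)   # (row, col) coordinates of G_b
    sp = np.argwhere(seg_boundary)  # (row, col) coordinates of S_b
    if len(gp) == 0:
        return 1.0                  # no ground-truth boundary to recover
    if len(sp) == 0:
        return 0.0
    # pairwise distance matrix of shape |G_b| x |S_b|
    d = np.linalg.norm(gp[:, None, :] - sp[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) < tol))

gt_b  = np.zeros((5, 5), dtype=bool); gt_b[2, 1:4] = True
seg_b = np.zeros((5, 5), dtype=bool); seg_b[3, 1:4] = True  # one row off
print(boundary_recall(gt_b, seg_b))  # every G_b pixel is 1 pixel away -> 1.0
```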
The segmentation results are illustrated in Figure 12, Table 3, Table 4 and Table 5. As before, the GraphCut, Lazy Snapping and MSRM algorithms and the proposed algorithm use the same interaction information, as shown in Figure 12a, where the red strokes mark the foreground and the blue strokes mark the background. The green rectangular box, also shown in Figure 12a, is used for the GrabCut algorithm. From the segmentation results, we find that the proposed algorithm obtains the best overall segmentation performance in the ME, RI and BR measures.
Figure 12. Segmentation results on the GrabCut image dataset for different algorithms. (a) Interaction markers; (b) GraphCut; (c) Lazy Snapping; (d) GrabCut; (e) MSRM; (f) proposed algorithm.
Table 3. Misclassification error (ME) comparison.
Table 4. Rand index (RI) comparison.
Table 5. Boundary recall (BR) comparison.

6. Conclusions

Interactive image segmentation has a very wide range of applications in natural image editing and other practical tasks. By comparing the existing classical algorithms, including GraphCut, Lazy Snapping, GrabCut and MSRM, this paper summarizes the advantages and disadvantages of each algorithm. On this basis, we propose an efficient superpixel-guided interactive image-segmentation algorithm based on graph theory. Our algorithm uses the MeanShift algorithm as a pre-segmentation technique and then takes the segmented superpixels as nodes to establish the graph model, thus effectively improving the execution efficiency of the maximum flow–minimum cut algorithm. The proposed algorithm uses a color histogram to represent each superpixel, which is more informative than the regional color mean. The foreground and background are also modeled by color histograms, so no additional modeling process is needed. In the interaction process, the pre-segmented superpixels provide guidance information for users, which simplifies the interaction and automatically determines a large amount of foreground or background information from a small number of markers. Considering the over-segmentation problem of the MeanShift algorithm, the proposed algorithm uses morphological operations to construct a narrow-band region near the target boundary, and simultaneously determines the definite foreground and background regions. After that, the corresponding foreground and background models are established, and a graph model for the narrow-band region is constructed taking pixels as nodes. Finally, we execute the maximum flow–minimum cut algorithm again to improve the segmentation accuracy around the object boundary. Experiments show that the proposed algorithm obtains promising segmentation performance compared with the GraphCut, Lazy Snapping, GrabCut and MSRM algorithms.
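The narrow-band construction summarized above can be sketched with basic morphological operations: dilate the superpixel-level foreground mask, erode it, and keep the ring between the two. The band width and the plain-NumPy dilation/erosion below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def dilate(mask, r):
    """Binary dilation with a (2r+1) x (2r+1) square structuring element."""
    p = np.pad(mask, r, constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def erode(mask, r):
    """Binary erosion; pixels outside the image count as background."""
    p = np.pad(mask, r, constant_values=False)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out &= p[dy:dy + h, dx:dx + w]
    return out

def narrow_band(fg_mask, width=1):
    """Band around the contour: dilation minus erosion. Pixels inside the
    eroded mask stay definite foreground, pixels outside the dilated mask
    stay definite background; only the band is re-labelled by the second,
    pixel-level max-flow/min-cut pass."""
    return dilate(fg_mask, width) & ~erode(fg_mask, width)

mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True              # 3x3 foreground block
print(narrow_band(mask).sum())     # 5x5 dilation minus 1-pixel core -> 24
```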
However, because the proposed algorithm is fully based on the MeanShift algorithm, the segmentation performance of our algorithm depends on the segmentation results of the MeanShift algorithm. In future work, we will study multi-scale superpixel-based image-segmentation algorithms.

Author Contributions

J.L. conceived the work, wrote the paper, designed and performed the experiments; X.F. designed the experiments and commented on the paper; X.Z. designed the experiments; J.Z. and G.G. analyzed the experimental results.

Acknowledgments

This work was supported by the National Natural Science Foundation of China for Young Scientists (Grant No. 61502065), the Foundation and Frontier Research Key Program of Chongqing Science and Technology Commission (Grant No. cstc2015jcyjBX0127, No. cstc2017jcyjBX0059), the Humanities and Social Sciences Research Key Program of Chongqing Municipal Education Commission (Grant No. 17SKG136), the Humanities and Social Sciences of Ministry of Education Planning Fund (Grant No. 17YJCZH043), the Scientific and Technological Research Program of Chongqing Municipal Education Commission (Grant No. KJ1500922, No. KJ1600937, No. KJ1600945), the Foundation and Frontier Research Program of Chongqing Science and Technology Commission (Grant No. cstc2017jcyjAX0339, No. cstc2017jcyjAX0144) and the Youth Spark Support Project of Chongqing University of Technology (Grant No. 2015XH16).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yilmaz, A.; Javed, O.; Shah, M. Object tracking: A survey. ACM Comput. Surv. 2006, 38. [Google Scholar] [CrossRef]
  2. Xu, Y.; Dong, J.; Zhang, B.; Xu, D. Background modeling methods in video analysis: A review and comparative evaluation. CAAI Trans. Intell. Technol. 2016, 1, 43–60. [Google Scholar] [CrossRef]
  3. Andreopoulos, A.; Tsotsos, J.K. 50 years of object recognition: Directions forward. Comput. Vis. Image Underst. 2013, 117, 827–891. [Google Scholar] [CrossRef]
  4. Popescu, D.; Ichim, L. Intelligent Image Processing System for Detection and Segmentation of Regions of Interest in Retinal Images. Symmetry 2018, 10, 73. [Google Scholar] [CrossRef]
  5. Sezgin, M.; Sankur, B. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 2004, 13, 146–166. [Google Scholar]
  6. Vantaram, S.R.; Saber, E. Survey of contemporary trends in color image segmentation. J. Electron. Imaging 2012, 21. [Google Scholar] [CrossRef]
  7. Stutz, D.; Hermans, A.; Leibe, B. Superpixels: An evaluation of the state-of-the-art. Comput. Vis. Image Underst. 2018, 166, 1–27. [Google Scholar] [CrossRef]
  8. Long, J.; Shen, X.; Chen, H. Adaptive minimum error thresholding algorithm. Zidonghua Xuebao/Acta Autom. Sin. 2012, 38, 1134–1144. [Google Scholar] [CrossRef]
  9. Li, Q.; Zheng, M.; Li, F.; Wang, J.; Geng, Y.; Jiang, H. Retinal Image Segmentation Using Double-Scale Nonlinear Thresholding on Vessel Support Regions. CAAI Trans. Intell. Technol. 2017, 2, 178–190. [Google Scholar]
  10. Shen, X.; Long, J.; Chen, H.; Wei, W. Otsu thresholding algorithm based on rebuilding and dimension reduction of the 3-dimensional histogram. Tien Tzu Hsueh Pao/Acta Electron. Sin. 2011, 39, 1108–1114. [Google Scholar]
  11. Guo, Y.; Akbulut, Y.; Şengür, A.; Xia, R.; Smarandache, F. An Efficient Image Segmentation Algorithm Using Neutrosophic Graph Cut. Symmetry 2017, 9, 185. [Google Scholar] [CrossRef]
  12. Long, J.; Shen, X.; Chen, H. Interactive document images thresholding segmentation algorithm based on image regions. Jisuanji Yanjiu Yu Fazhan/Comput. Res. Dev. 2012, 49, 1420–1431. [Google Scholar]
  13. Boykov, Y.; Veksler, O.; Zabih, R. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 1222–1239. [Google Scholar] [CrossRef]
  14. Boykov, Y.; Kolmogorov, V. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 1124–1137. [Google Scholar] [CrossRef] [PubMed]
  15. Boykov, Y.; Funka-Lea, G. Graph cuts and efficient nd image segmentation. Int. J. Comput. Vis. 2006, 70, 109–131. [Google Scholar] [CrossRef]
  16. Chen, D.; Li, G.; Sun, Y.; Kong, J.; Jiang, G.; Tang, H.; Ju, Z.; Yu, H.; Liu, H. An interactive image segmentation method in hand gesture recognition. Sensors 2017, 17, 253. [Google Scholar] [CrossRef] [PubMed]
  17. McGuinness, K.; O’connor, N.E. A comparative evaluation of interactive segmentation algorithms. Pattern Recognit. 2010, 43, 434–444. [Google Scholar] [CrossRef]
  18. Li, Y.; Sun, J.; Tang, C.-K.; Shum, H.-Y. Lazy snapping. ACM Trans. Graph. 2004, 23, 303–308. [Google Scholar] [CrossRef]
  19. Rother, C.; Kolmogorov, V.; Blake, A. Grabcut: Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 2004, 23, 309–314. [Google Scholar] [CrossRef]
  20. Ning, J.; Zhang, L.; Zhang, D.; Wu, C. Interactive image segmentation by maximal similarity based region merging. Pattern Recognit. 2010, 43, 445–456. [Google Scholar] [CrossRef]
  21. Vincent, L.; Soille, P. Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 583–598. [Google Scholar] [CrossRef]
  22. Ciecholewski, M. Automated coronal hole segmentation from solar euv images using the watershed transform. J. Vis. Commun. Image Represent. 2015, 33, 203–218. [Google Scholar] [CrossRef]
  23. Cousty, J.; Bertrand, G.; Najman, L.; Couprie, M. Watershed cuts: Minimum spanning forests and the drop of water principle. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 1362–1374. [Google Scholar] [CrossRef] [PubMed]
  24. Cousty, J.; Bertrand, G.; Najman, L.; Couprie, M. Watershed cuts: Thinnings, shortest path forests, and topological watersheds. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 925–939. [Google Scholar] [CrossRef] [PubMed]
  25. Comaniciu, D.; Meer, P. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 603–619. [Google Scholar] [CrossRef]
  26. Levinshtein, A.; Stere, A.; Kutulakos, K.N.; Fleet, D.J.; Dickinson, S.J.; Siddiqi, K. Turbopixels: Fast superpixels using geometric flows. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 2290–2297. [Google Scholar] [CrossRef] [PubMed]
  27. Orchard, M.T.; Bouman, C.A. Color quantization of images. IEEE Trans. Signal Process. 1991, 39, 2677–2690. [Google Scholar] [CrossRef]
  28. Unnikrishnan, R.; Pantofaru, C.; Hebert, M. Toward objective evaluation of image segmentation algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 929–944. [Google Scholar] [CrossRef] [PubMed]
