Unsupervised Object Segmentation Based on Bi-Partitioning Image Model Integrated with Classification
Abstract
1. Introduction
- We integrate the segmentation model with a classifier to solve the image segmentation problem.
- We use the Mumford–Shah functional and the Chan–Vese algorithm for the segmentation model, which allows us to achieve more accurate object boundaries.
- For the classifier, we propose a loss function that can meaningfully distinguish between the background and the foreground.
2. Related Works
2.1. Image Segmentation
2.1.1. Classical Image Segmentation
2.1.2. Supervised Image Segmentation
2.1.3. Unsupervised Image Segmentation
2.1.4. Weakly Supervised Image Segmentation
2.2. Mumford–Shah Functional and Chan–Vese Algorithm
2.2.1. Chan–Vese Algorithm
2.2.2. Mumford–Shah Functional
2.3. Region of Interest Detection
3. Method
3.1. Network Structure
3.1.1. Network with Chan–Vese Energy Function
3.1.2. Network with Piecewise-Smooth Mumford–Shah Energy Function
3.2. Loss Function
3.2.1.
- Chan–Vese Energy Function for Segmentation: To apply the Chan–Vese algorithm to our network, we modify (4) into (7), where $I$ is the input image, $u$ is the mask, and $\nabla u$ is the forward-difference derivative of the mask. The region means $c_1$ and $c_2$ are computed with (8): $c_1$ is the mean intensity of the first region and $c_2$ is the mean intensity of the second region. In (7), the first and second terms partition the input image into two regions of similar pixel intensity. Since the mask $u$ takes only the values 0 and 1, the first term, with $u$ and the constant mean $c_1$, considers only the first region regardless of the second, so the first region groups pixels of similar value. Likewise, the second term, with $1-u$ and the constant mean $c_2$, considers only the second region regardless of the others, so the pixel values within it become similar. The third term controls the size of the masked region, and the fourth term suppresses noise in the mask. We call these four terms the foreground fidelity, background fidelity, mask region regularization, and mask smooth regularization, respectively. For a given mini-batch $\{(I_i, y_i)\}_{i=1}^{M}$ of the training set, where $y_i$ is the ground-truth image-level label (i.e., the class label) of input image $I_i$, we rewrite (7) as (9), which groups the foreground and background fidelity terms into a fidelity loss and the remaining terms into a regularization loss of $u$; $M$ is the mini-batch size.
- Piecewise-Smooth Mumford–Shah Energy Function for Segmentation: Mirroring the relationship between (4) and the traditional piecewise-smooth Mumford–Shah energy function, the only difference between the loss functions of our two networks is that the constants $c_1$ and $c_2$ in (7) are replaced by the foreground matrix $F$ and the background matrix $B$; the resulting loss function is (10). Its terms are, in order, the foreground fidelity, background fidelity, mask region regularization, mask smooth regularization, foreground smooth regularization, and background smooth regularization. With the mask $u$ and the foreground matrix $F$, the first term considers only the regions retained by $u$, regardless of the others, and those regions come to resemble the corresponding regions of the input image. Similarly, the regions of the input image retained by the second term depend on $1-u$. The foreground and background smooth regularizations work with the foreground and background fidelities, respectively, to control the smoothness of $F$ and $B$; together with the mask smooth regularization, they adjust the smoothness of the mask. The mask region regularization and the mask smooth regularization have the same effect as in (7). We rewrite (10) as (11), which groups the foreground and background fidelity terms into a fidelity loss and sums the regularizations of $u$, $F$, and $B$ into a regularization loss.
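As a concrete illustration of the two loss functions, here is a minimal NumPy sketch. This is hypothetical code, not the paper's implementation: the function names and the weights `nu`, `mu`, `alpha` are our own stand-ins for the regularization coefficients.

```python
import numpy as np

def grad_mag(a):
    # Forward-difference gradient magnitude (last row/column repeated).
    dy = np.diff(a, axis=0, append=a[-1:, :])
    dx = np.diff(a, axis=1, append=a[:, -1:])
    return np.sqrt(dx ** 2 + dy ** 2)

def chan_vese_loss(I, u, nu=0.5, mu=0.5):
    # Piecewise-constant model: region means c1, c2 computed from the mask.
    c1 = (I * u).sum() / (u.sum() + 1e-8)
    c2 = (I * (1 - u)).sum() / ((1 - u).sum() + 1e-8)
    fg_fidelity = ((I - c1) ** 2 * u).sum()        # foreground fidelity
    bg_fidelity = ((I - c2) ** 2 * (1 - u)).sum()  # background fidelity
    region_reg = nu * u.sum()                      # mask region regularization
    smooth_reg = mu * grad_mag(u).sum()            # mask smooth regularization
    return fg_fidelity + bg_fidelity + region_reg + smooth_reg

def mumford_shah_loss(I, u, F, B, nu=0.5, mu=0.5, alpha=0.1):
    # Piecewise-smooth model: constants c1, c2 become matrices F and B.
    fg_fidelity = ((I - F) ** 2 * u).sum()
    bg_fidelity = ((I - B) ** 2 * (1 - u)).sum()
    region_reg = nu * u.sum()
    mask_smooth = mu * grad_mag(u).sum()
    fg_smooth = alpha * grad_mag(F).sum()          # foreground smooth regularization
    bg_smooth = alpha * grad_mag(B).sum()          # background smooth regularization
    return (fg_fidelity + bg_fidelity + region_reg
            + mask_smooth + fg_smooth + bg_smooth)
```

With a perfect binary mask on a two-tone image, both fidelity terms vanish and only the regularizers remain, which is exactly the grouping behavior the text describes.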
3.2.2.
3.3. Algorithm
3.3.1. Chan–Vese Algorithm
Algorithm 1 Chan–Vese Segmentation Algorithm.
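For reference, the core of the classical Chan–Vese iteration alternates between updating the two region means and reassigning each pixel to the closer mean. A minimal sketch, with the length and area regularization terms omitted for brevity (all names are ours, not the paper's):

```python
import numpy as np

def chan_vese_segment(I, n_iter=20):
    """Simplified piecewise-constant Chan-Vese iteration:
    alternate between updating the region means c1, c2 and
    reassigning pixels to the region whose mean is closer."""
    u = (I > I.mean()).astype(float)  # initial mask from global mean
    for _ in range(n_iter):
        c1 = (I * u).sum() / (u.sum() + 1e-8)              # mean of region 1
        c2 = (I * (1 - u)).sum() / ((1 - u).sum() + 1e-8)  # mean of region 2
        u = ((I - c1) ** 2 < (I - c2) ** 2).astype(float)  # reassignment step
    return u, c1, c2
```

On a clean two-tone image this alternation converges immediately to the exact bi-partition; the regularization terms matter when the image is noisy.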
3.3.2. Piecewise-Smooth Mumford–Shah Algorithm
Algorithm 2 Mumford–Shah Segmentation Algorithm.
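The piecewise-smooth variant alternates in the same way, but replaces the two constant means with smooth approximations of the image inside and outside the mask. A minimal sketch, using a simple box blur as a stand-in for the smoothing step and a crude global-mean fill outside each region (all names are ours, not the paper's):

```python
import numpy as np

def box_blur(a, k=3):
    # Separable k-by-k box blur with edge padding, as a stand-in
    # for the smoothing implied by the smooth-regularization terms.
    p = k // 2
    out = np.pad(a, p, mode='edge')
    out = np.stack([np.roll(out, s, 0) for s in range(-p, p + 1)]).mean(0)
    out = np.stack([np.roll(out, s, 1) for s in range(-p, p + 1)]).mean(0)
    return out[p:-p, p:-p]

def mumford_shah_segment(I, n_iter=10):
    """Simplified piecewise-smooth alternation: update the smooth
    foreground/background approximations F and B, then reassign
    each pixel to the approximation with the smaller residual."""
    u = (I > I.mean()).astype(float)
    for _ in range(n_iter):
        F = box_blur(I * u + I.mean() * (1 - u))        # smooth foreground estimate
        B = box_blur(I * (1 - u) + I.mean() * u)        # smooth background estimate
        u = ((I - F) ** 2 < (I - B) ** 2).astype(float)  # residual-based reassignment
    return u, F, B
```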
4. Experimental Results
4.1. Training and Testing Details
4.2. Qualitative Comparisons
4.3. Quantitative Comparison
5. Discussions and Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- O’Shea, K.; Nash, R. An introduction to convolutional neural networks. arXiv 2015, arXiv:1511.08458. [Google Scholar]
- Cunningham, P.; Cord, M.; Delany, S.J. Supervised Learning. In Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval; Cord, M., Cunningham, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 21–49. [Google Scholar] [CrossRef]
- Ghahramani, Z. Unsupervised Learning. In Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2–14, 2003, Tübingen, Germany, August 4–16, 2003, Revised Lectures; Bousquet, O., von Luxburg, U., Rätsch, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; pp. 72–112. [Google Scholar] [CrossRef]
- Mumford, D.B.; Shah, J. Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 1989, 42, 577–685. [Google Scholar] [CrossRef] [Green Version]
- Chan, T.F.; Vese, L.A. Active contours without edges. IEEE Trans. Image Process. 2001, 10, 266–277. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Getreuer, P. Chan-vese segmentation. Image Process. Line 2012, 2, 214–224. [Google Scholar] [CrossRef]
- Cohen, R. The chan-vese algorithm. arXiv 2011, arXiv:1107.2782. [Google Scholar]
- Osher, S.; Sethian, J.A. Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton–Jacobi formulations. J. Comput. Phys. 1988, 79, 12–49. [Google Scholar] [CrossRef] [Green Version]
- Dogs vs. Cats Redux: Kernels Edition. Available online: https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/overview (accessed on 15 September 2016).
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef] [Green Version]
- Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137. [Google Scholar] [CrossRef]
- Tobias, O.J.; Seara, R. Image segmentation by histogram thresholding using fuzzy sets. IEEE Trans. Image Process. 2002, 11, 1457–1465. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Arifin, A.Z.; Asano, A. Image segmentation by histogram thresholding using hierarchical cluster analysis. Pattern Recognit. Lett. 2006, 27, 1515–1521. [Google Scholar] [CrossRef]
- Ma, W.Y.; Manjunath, B.S. EdgeFlow: A technique for boundary detection and image segmentation. IEEE Trans. Image Process. 2000, 9, 1375–1388. [Google Scholar] [PubMed] [Green Version]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Kim, B.; Ye, J.C. Mumford–Shah loss functional for image segmentation with deep learning. IEEE Trans. Image Process. 2019, 29, 1856–1866. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26–30 June 2016; pp. 2921–2929. [Google Scholar]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Lee, J.; Kim, E.; Lee, S.; Lee, J.; Yoon, S. Ficklenet: Weakly and semi-supervised semantic image segmentation using stochastic inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5267–5276. [Google Scholar]
- Huang, Z.; Wang, X.; Wang, J.; Liu, W.; Wang, J. Weakly-supervised semantic segmentation network with deep seeded region growing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7014–7023. [Google Scholar]
- Wang, Y.; Zhang, J.; Kan, M.; Shan, S.; Chen, X. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 12275–12284. [Google Scholar]
- Zolna, K.; Geras, K.J.; Cho, K. Classifier-agnostic saliency map extraction. Comput. Vis. Image Underst. 2020, 196, 102969. [Google Scholar] [CrossRef] [Green Version]
- Araslanov, N.; Roth, S. Single-stage semantic segmentation from image labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 4253–4262. [Google Scholar]
- Caselles, V.; Kimmel, R.; Sapiro, G. Geodesic active contours. Int. J. Comput. Vis. 1997, 22, 61–79. [Google Scholar] [CrossRef]
- Malladi, R.; Sethian, J.A.; Vemuri, B.C. Topology-independent shape modeling scheme. In Geometric Methods in Computer Vision II; International Society for Optics and Photonics: Washington, DC, USA, 1993; Volume 2031, pp. 246–258. [Google Scholar]
- Kass, M.; Witkin, A.; Terzopoulos, D. Snakes: Active contour models. Int. J. Comput. Vis. 1988, 1, 321–331. [Google Scholar] [CrossRef]
- Li, K.; Wu, Z.; Peng, K.C.; Ernst, J.; Fu, Y. Tell me where to look: Guided attention inference network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9215–9223. [Google Scholar]
- Fong, R.C.; Vedaldi, A. Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3429–3437. [Google Scholar]
- Singh, K.K.; Lee, Y.J. Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3544–3553. [Google Scholar]
- Fong, R.; Patrick, M.; Vedaldi, A. Understanding deep networks via extremal perturbations and smooth masks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 2950–2958. [Google Scholar]
- Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2017–2025. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the 2015 ICML Deep Learning Workshop, Lille, France, 10–11 June 2015; Volume 2. [Google Scholar]
- Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
Class | Dog | Cat | Horse | Cow | Car | Person |
---|---|---|---|---|---|---|
Baseline CASM | 0.79 | 0.80 | 0.81 | 0.79 | 0.66 | 0.63 |
Only classification loss | 0.56 | 0.54 | 0.42 | 0.36 | 0.21 | 0.62 |
Chan–Vese with classification loss | 0.86 | 0.82 | 0.83 | 0.79 | 0.65 | 0.65 |
Mumford–Shah with classification loss | 0.88 | 0.82 | 0.80 | 0.76 | 0.75 | 0.64 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Choi, H.-T.; Hong, B.-W. Unsupervised Object Segmentation Based on Bi-Partitioning Image Model Integrated with Classification. Electronics 2021, 10, 2296. https://doi.org/10.3390/electronics10182296