Interactive Image Segmentation Based on Feature-Aware Attention
Abstract
1. Introduction
2. Related Work
3. Proposed Method
3.1. Coarse Segmentation Module
3.2. Feature-Aware Attention Module
3.3. Refinement Network
4. Experiments
4.1. Datasets
- GrabCut [3]: The dataset contains 50 images and the segmentation masks of the respective scene objects.
- Berkeley [42]: This dataset comprises 100 object instances across 96 images. Many of the images are difficult to segment because of low foreground-background contrast or heavily textured backgrounds.
- MS COCO [43]: A large-scale image segmentation dataset with 80 object categories. For evaluation, following the implementation of [31], we sample 800 object instances from the COCO 2017 validation set: 10 unique instances from each of the 80 categories.
| Dataset | Year | Classes | Instances | Images | Resolution | 
|---|---|---|---|---|---|
| SBD [38] | 2011 | 20 | 26,843 | 11,355 | variable | 
| GrabCut [3] | 2004 | - | one object each | 50 | variable | 
| DAVIS [40] | 2016 | 4 | one object each | 345 | 640 × 480 | 
| Berkeley [42] | 2010 | - | 100 | 96 | variable | 
| MS COCO [43] | 2014 | 80 | 800 | 800 | variable | 
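
The per-category sampling described for MS COCO (10 unique instances from each of the 80 categories, 800 in total) can be sketched as follows. This is a minimal illustration, not the sampling code of [31]: the function name, the annotation layout (dicts with `id` and `category_id` keys), the fixed seed, and the synthetic data are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_instances_per_category(annotations, per_category=10, seed=0):
    """Pick `per_category` distinct instances from every category.

    `annotations` is a hypothetical list of dicts with the keys "id"
    (instance id) and "category_id", loosely modelled on COCO-style
    instance annotations.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_category = defaultdict(list)
    for ann in annotations:
        by_category[ann["category_id"]].append(ann["id"])

    sampled = []
    for category_id in sorted(by_category):
        instance_ids = by_category[category_id]
        if len(instance_ids) < per_category:
            raise ValueError(f"category {category_id} has too few instances")
        # random.sample draws without replacement, so ids stay unique
        sampled.extend(rng.sample(instance_ids, per_category))
    return sampled


# Synthetic stand-in for validation annotations: 80 categories x 15 instances.
toy_annotations = [
    {"id": c * 100 + i, "category_id": c} for c in range(80) for i in range(15)
]
subset = sample_instances_per_category(toy_annotations)
print(len(subset))  # 800
```

Fixing the random seed is a design choice for the sketch: it makes the evaluation subset repeatable across runs, which matters when comparing methods on the same 800 instances.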
4.2. Experimental Settings
4.3. Evaluation Metric
4.4. Comparison Results
4.5. Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Boykov, Y.Y.; Jolly, M.P. Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV) 2001, Vancouver, BC, Canada, 7–14 July 2001; Volume 1, pp. 105–112.
- Freedman, D.; Zhang, T. Interactive graph cut based segmentation with shape priors. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 755–762.
- Rother, C.; Kolmogorov, V.; Blake, A. “GrabCut” interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 2004, 23, 309–314.
- Grady, L. Random walks for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1768–1783.
- Kass, M.; Witkin, A.; Terzopoulos, D. Snakes: Active contour models. Int. J. Comput. Vis. 1988, 1, 321–331.
- Xu, N.; Price, B.; Cohen, S.; Yang, J.; Huang, T.S. Deep interactive object selection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 373–381.
- Majumder, S.; Yao, A. Content-aware multi-level guidance for interactive instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11602–11611.
- Ding, Z.; Wang, T.; Sun, Q.; Chen, F. Rethinking click embedding for deep interactive image segmentation. IEEE Trans. Ind. Inform. 2022, 19, 261–273.
- Chen, S.; Tan, X.; Wang, B.; Hu, X. Reverse attention for salient object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 234–250.
- Li, K.; Wu, Z.; Peng, K.C.; Ernst, J.; Fu, Y. Tell me where to look: Guided attention inference network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 9215–9223.
- Sinha, A.; Dolz, J. Multi-scale self-guided attention for medical image segmentation. IEEE J. Biomed. Health Inform. 2020, 25, 121–130.
- Lin, Z.; Zhang, Z.; Chen, L.Z.; Cheng, M.M.; Lu, S.P. Interactive image segmentation with first click attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 14–19 June 2020; pp. 13339–13348.
- Chen, L.C.; Yang, Y.; Wang, J.; Xu, W.; Yuille, A.L. Attention to scale: Scale-aware semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3640–3649.
- Zhao, H.; Zhang, Y.; Liu, S.; Shi, J.; Loy, C.C.; Lin, D.; Jia, J. Psanet: Point-wise spatial attention network for scene parsing. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 267–283.
- Bai, X.; Sapiro, G. Geodesic matting: A framework for fast interactive image and video segmentation and matting. Int. J. Comput. Vis. 2009, 82, 113–132.
- Feng, J.; Price, B.; Cohen, S.; Chang, S.F. Interactive segmentation on rgbd images via cue selection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 156–164.
- Chen, X.; Zhao, Z.; Zhang, Y.; Duan, M.; Qi, D.; Zhao, H. FocalClick: Towards Practical Interactive Image Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, June 2022; pp. 1300–1309.
- Liu, Q.; Zheng, M.; Planche, B.; Karanam, S.; Chen, T.; Niethammer, M.; Wu, Z. PseudoClick: Interactive Image Segmentation with Click Imitation. arXiv 2022, arXiv:2207.05282.
- Kontogianni, T.; Celikkan, E.; Tang, S.; Schindler, K. Interactive Object Segmentation in 3D Point Clouds. arXiv 2022, arXiv:2204.07183.
- Wang, Y.; Deng, Z.; Hu, X.; Zhu, L.; Yang, X.; Xu, X.; Heng, P.A.; Ni, D. Deep attentional features for prostate segmentation in ultrasound. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; Springer: Cham, Switzerland, 2018; pp. 523–530.
- Zhou, Y.; Zhu, Y.; Ye, Q.; Qiu, Q.; Jiao, J. Weakly supervised instance segmentation using class peak response. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3791–3800.
- Bearman, A.; Russakovsky, O.; Ferrari, V.; Fei-Fei, L. What’s the point: Semantic segmentation with point supervision. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 549–565.
- Khoreva, A.; Benenson, R.; Hosang, J.; Hein, M.; Schiele, B. Simple does it: Weakly supervised instance and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 876–885.
- Xu, N.; Price, B.; Cohen, S.; Yang, J.; Huang, T. Deep grabcut for object selection. arXiv 2017, arXiv:1707.00243.
- Dai, J.; He, K.; Sun, J. Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1635–1643.
- Lin, D.; Dai, J.; Jia, J.; He, K.; Sun, J. Scribblesup: Scribble-supervised convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3159–3167.
- Xu, C.; Dong, B.; Stier, N.; McCully, C.; Howell, D.A.; Sen, P.; Höllerer, T. Interactive Segmentation and Visualization for Tiny Objects in Multi-megapixel Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–23 June 2022; pp. 21447–21452.
- Liew, J.; Wei, Y.; Xiong, W.; Ong, S.H.; Feng, J. Regional interactive image segmentation networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2746–2754.
- Acuna, D.; Ling, H.; Kar, A.; Fidler, S. Efficient interactive annotation of segmentation datasets with polygon-rnn++. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 859–868.
- Ling, H.; Gao, J.; Kar, A.; Chen, W.; Fidler, S. Fast interactive object annotation with curve-gcn. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5257–5266.
- Sofiiuk, K.; Petrov, I.; Barinova, O.; Konushin, A. f-brs: Rethinking backpropagating refinement for interactive segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 14–19 June 2020; pp. 8623–8632.
- Yang, Z.; Wei, Y.; Yang, Y. Collaborative video object segmentation by foreground-background integration. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 332–348.
- Ding, H.; Cohen, S.; Price, B.; Jiang, X. Phraseclick: Toward achieving flexible interactive segmentation by phrase and click. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 417–435.
- Kontogianni, T.; Gygli, M.; Uijlings, J.; Ferrari, V. Continuous adaptation for interactive object segmentation by learning from corrections. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 579–596.
- Sofiiuk, K.; Petrov, I.A.; Konushin, A. Reviving iterative training with mask guidance for interactive segmentation. arXiv 2021, arXiv:2102.06583.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-attention generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 7354–7363.
- Hariharan, B.; Arbeláez, P.; Bourdev, L.; Maji, S.; Malik, J. Semantic contours from inverse detectors. In Proceedings of the 2011 International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 991–998.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Perazzi, F.; Pont-Tuset, J.; McWilliams, B.; Van Gool, L.; Gross, M.; Sorkine-Hornung, A. A benchmark dataset and evaluation methodology for video object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 724–732.
- Li, Z.; Chen, Q.; Koltun, V. Interactive image segmentation with latent diversity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 577–585.
- McGuinness, K.; O’Connor, N.E. A comparative evaluation of interactive segmentation algorithms. Pattern Recognit. 2010, 43, 434–444.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 740–755.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037.
- Jang, W.D.; Kim, C.S. Interactive image segmentation via backpropagating refinement scheme. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5297–5306.
- Gulshan, V.; Rother, C.; Criminisi, A.; Blake, A.; Zisserman, A. Geodesic star convexity for interactive image segmentation. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 3129–3136.

| Layer | 1 | 2 | 3 | 4 | 5 | 6 | 
|---|---|---|---|---|---|---|
| Convolution | 1 × 1 | 3 × 3 | 3 × 3 | 3 × 3 | 3 × 3 | 1 × 1 | 
| Dilation | 1 | 2 | 4 | 8 | 16 | 1 | 
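
As a worked reading of the kernel sizes and dilation rates in the table above, the short sketch below computes the padding each layer would need to preserve spatial resolution and the receptive field of the whole stack. It assumes stride 1 in every layer, which the table does not state; the helper names are illustrative only.

```python
# (kernel size, dilation) for layers 1-6, copied from the table above.
LAYERS = [(1, 1), (3, 2), (3, 4), (3, 8), (3, 16), (1, 1)]


def same_padding(kernel, dilation):
    """Padding that keeps the spatial size unchanged at stride 1 (odd kernels)."""
    return dilation * (kernel - 1) // 2


def receptive_field(layers):
    """Receptive field of a stride-1 stack: 1 + sum of (kernel - 1) * dilation."""
    return 1 + sum((k - 1) * d for k, d in layers)


paddings = [same_padding(k, d) for k, d in LAYERS]
print(paddings)                 # [0, 2, 4, 8, 16, 0]
print(receptive_field(LAYERS))  # 61
```

Under the stride-1 assumption, doubling the dilation at each 3 × 3 layer grows the receptive field to 61 × 61 pixels with only four 3 × 3 convolutions, while the two 1 × 1 layers leave it unchanged.
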

| Method | GrabCut (mNoc@90) | Berkeley (mNoc@90) | SBD (mNoc@85) | DAVIS (mNoc@85) | COCO (mNoc@85) |
|---|---|---|---|---|---|
| GC [3] | 10 | 14.22 | 13.6 | 15.13 | 18.53 | 
| RW [4] | 13.77 | 14.02 | 12.22 | 16.71 | 14.10 | 
| GSC [49] | 9.12 | 12.57 | 12.69 | 15.35 | 14.08 | 
| DOS [6] | 6.08 | 8.65 | 9.22 | 9.03 | 8.31 | 
| LD [41] | 4.79 | - | 7.41 | 5.05 | - | 
| RIS [28] | 5.00 | - | 6.03 | - | 5.98 | 
| CAG [7] | 3.58 | 5.6 | - | - | 5.4 | 
| BRS [48] | 3.60 | 5.08 | 6.59 | 5.58 | - | 
| f-BRS [31] | 2.98 | 4.34 | 5.06 | 5.04 | - | 
| Ours (VGG19) | 2.89 | 5.16 | 5.32 | 4.58 | 5.79 | 
| Ours (ResNet101) | 2.43 | 4.78 | 4.89 | 4.23 | 5.35 | 
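
The mNoc@85 and mNoc@90 columns above report the mean number of clicks needed to reach an IoU of 0.85 or 0.90. A minimal sketch of that computation follows. The cap of 20 clicks per instance is a common convention in this literature and is an assumption here, as are the function names and the toy IoU sequences, which are not results from the paper.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as nested 0/1 lists."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0  # two empty masks agree trivially


def number_of_clicks(ious_per_click, threshold, max_clicks=20):
    """Clicks needed to reach `threshold` IoU; `max_clicks` if never reached."""
    for click, score in enumerate(ious_per_click[:max_clicks], start=1):
        if score >= threshold:
            return click
    return max_clicks


def mean_noc(all_ious, threshold, max_clicks=20):
    """Average the per-instance click counts (mNoc@threshold)."""
    counts = [number_of_clicks(seq, threshold, max_clicks) for seq in all_ious]
    return sum(counts) / len(counts)


# Toy IoU-after-each-click sequences for three hypothetical instances.
toy = [[0.70, 0.86, 0.91], [0.88], [0.40, 0.55, 0.60]]
print(mean_noc(toy, 0.85))  # (2 + 1 + 20) / 3
print(iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))  # 1 / 3
```

Lower values are better: an instance that never reaches the threshold is charged the full click budget, so a single hard instance can dominate the mean.
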

| Settings | Backbone | GrabCut | Berkeley |
|---|---|---|---|
| Full | VGG19 | 2.89 | 5.16 |
| Full | ResNet50 | 2.50 | 4.97 |
| Full | ResNet101 | 2.43 | 4.78 |
| w/o RF | VGG19 | 3.32 | 5.90 |
| w/o RF | ResNet50 | 3.08 | 5.63 |
| w/o RF | ResNet101 | 2.99 | 5.42 |

| Dataset | Baseline | 1st Click | 2nd Click | 3rd Click |
|---|---|---|---|---|
| GrabCut | 0.81 | 0.83 | 0.89 | 0.93 |
| SBD | 0.70 | 0.72 | 0.81 | 0.83 |
| DAVIS | 0.69 | 0.72 | 0.83 | 0.87 |
| Berkeley | 0.73 | 0.80 | 0.84 | 0.87 |
| COCO | 0.54 | 0.61 | 0.72 | 0.81 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, J.; Ban, X.; Han, B.; Yang, X.; Yao, C. Interactive Image Segmentation Based on Feature-Aware Attention. Symmetry 2022, 14, 2396. https://doi.org/10.3390/sym14112396