Robust Localization-Guided Dual-Branch Network for Camouflaged Object Segmentation
Abstract
1. Introduction
- (1) We designed a comprehensive camouflaged object segmentation network that mimics how humans recognize camouflaged targets: the model first coarsely localizes the target and then distinguishes fine details between background and foreground, improving performance against complex backgrounds.
- (2) We designed dual branches that handle initial localization and detailed background/foreground analysis, respectively. The localization branch employs a robust localization module (RLM) that introduces Atrous Spatial Pyramid Pooling (ASPP) [12] to jointly exploit contextual information and fuse multi-scale high-level features, localizing camouflaged targets more accurately (a minimal ASPP sketch follows this list). The overall refinement branch uses an edge refinement module (ERM) that decodes features from top to bottom, applying an improved SPP to suppress feature noise and highlight the target's edge details.
- (3) We designed an attention-guided head (AG-Head) to guide the predictions of the overall refinement branch, helping the model learn and exploit important features more effectively. In addition, we trained with joint loss functions over the five predictions produced by the two branches (see the loss sketch after this list). Comparative experiments on publicly available COS datasets against 13 state-of-the-art models show that our model achieves the best overall performance, and ablation experiments validate the effectiveness of each key component.
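To make the RLM's multi-scale context aggregation concrete, below is a minimal PyTorch sketch of an ASPP block in the spirit of DeepLab [12]. The dilation rates, channel widths, and module names here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling (illustrative; rates/channels are assumptions)."""

    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        # Parallel atrous convolutions sample context at multiple receptive fields.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3 if r > 1 else 1,
                          padding=r if r > 1 else 0, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # Image-level pooling branch captures global context.
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
        )
        # Fuse all branches back to out_ch channels.
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))
```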
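For the joint training objective, the sketch below shows one common way to supervise several prediction maps in COS networks: a weighted BCE plus weighted IoU "structure loss" per map, summed over all outputs, as popularized by F3Net [32]. Treating this as the paper's exact formulation would be an assumption; Section 3.4 defines the actual loss.

```python
import torch
import torch.nn.functional as F

def structure_loss(pred: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Weighted BCE + weighted IoU loss (F3Net-style); pred is a raw logit map."""
    # Pixels near the mask boundary receive larger weights.
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction="none")
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    prob = torch.sigmoid(pred)
    inter = ((prob * mask) * weit).sum(dim=(2, 3))
    union = ((prob + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def joint_loss(predictions, mask):
    """Sum the per-map loss over all outputs (e.g., the five predictions here)."""
    return sum(structure_loss(p, mask) for p in predictions)
```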
2. Related Work
3. Methodology
3.1. Overall Introduction
3.2. Localization Branch
3.3. Overall Refinement Branch
3.4. Loss Function
4. Experiments
4.1. Experiment Setup
4.2. Evaluation Criteria
- 1. The structure-measure (Sα) evaluates the structural similarity between a prediction and the ground truth, combining a region-aware and an object-aware structural similarity term [35].
- 2. The mean absolute error (MAE) is a widely used metric for evaluating segmentation results; it measures the average absolute per-pixel difference between the prediction and the ground truth.
- 3. The weighted F-measure (Fβw) is a comprehensive evaluation metric based on weighted precision and weighted recall [36].
- 4. The adaptive E-measure (Eφ) considers both pixel-level matching and image-level statistics to evaluate the overall and local accuracy of segmentation results [37]. Pixel-level matching compares the predicted and ground-truth values at each pixel, while image-level statistics summarize properties such as region, shape, and position of the segmented result. By combining both kinds of information, the measure aligns well with human perception and is widely used to evaluate segmentation in applications such as medical image segmentation, scene understanding, and object detection. A sketch of the simpler metric computations follows this list.
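As a concrete reference, here is a minimal NumPy sketch of two pieces of this evaluation: MAE, and the adaptive binarization step (threshold equal to twice the mean prediction value, a common convention in COS evaluation code) that adaptive measures such as Eφ apply before scoring. The full Sα, Fβw, and Eφ definitions are in [35,36,37]; this sketch covers only the simpler parts.

```python
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a [0, 1] prediction map and a binary ground truth."""
    return float(np.mean(np.abs(pred.astype(np.float64) - gt.astype(np.float64))))

def adaptive_binarize(pred: np.ndarray) -> np.ndarray:
    """Adaptive threshold: twice the mean prediction value, clipped to at most 1."""
    thresh = min(2.0 * float(pred.mean()), 1.0)
    return (pred >= thresh).astype(np.uint8)

# Usage: pred is a camouflage map in [0, 1]; gt is a binary mask.
# score = mae(pred, gt)
# binary = adaptive_binarize(pred)  # fed to the E-measure alignment term
```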
4.3. Implementation Details
4.4. Experimental Results and Comparisons
4.5. Ablation Experiments
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Stevens, M.; Merilaita, S. Animal camouflage: Current issues and new perspectives. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 2009, 364, 423–427. [Google Scholar] [CrossRef] [PubMed]
- Perez-de la Fuente, R.; Delclos, X.; Penalver, E.; Speranza, M.; Wierzchos, J.; Ascaso, C.; Engel, M.S. Early evolution and ecology of camouflage in insects. Proc. Natl. Acad. Sci. USA 2012, 109, 21414–21419. [Google Scholar] [CrossRef] [PubMed]
- Fan, D.-P.; Zhou, T.; Ji, G.-P.; Zhou, Y.; Chen, G.; Fu, H.; Shen, J.; Shao, L. Inf-Net: Automatic COVID-19 Lung Infection Segmentation From CT Images. IEEE Trans. Med. Imaging 2020, 39, 2626–2637. [Google Scholar] [CrossRef] [PubMed]
- Le, T.-N.; Nguyen, T.V.; Nie, Z.; Tran, M.-T.; Sugimoto, A. Anabranch network for camouflaged object segmentation. Comput. Vis. Image Underst. 2019, 184, 45–56. [Google Scholar] [CrossRef]
- Lv, Y.; Zhang, J.; Dai, Y.; Li, A.; Liu, B.; Barnes, N.; Fan, D.-P. Simultaneously Localize, Segment and Rank the Camouflaged Objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 11586–11596. [Google Scholar]
- Zhai, Q.; Li, X.; Yang, F.; Chen, C.; Cheng, H.; Fan, D.-P. Mutual Graph Learning for Camouflaged Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 12992–13002. [Google Scholar]
- Tankus, A.; Yeshurun, Y. Convexity-Based Visual Camouflage Breaking. Comput. Vis. Image Underst. 2001, 82, 208–237. [Google Scholar] [CrossRef]
- Bhajantri, N.U.; Nagabhushan, P. Camouflage defect identification: A novel approach. In Proceedings of the 9th International Conference on Information Technology (ICIT’06), Bhubaneswar, India, 18–21 December 2006; pp. 145–148. [Google Scholar]
- Song, L.; Geng, W. A new camouflage texture evaluation method based on WSSIM and nature image features. In Proceedings of the 2010 International Conference on Multimedia Technology, Ningbo, China, 29–31 October 2010; pp. 1–4. [Google Scholar]
- Qin, X.; Zhang, Z.; Huang, C.; Gao, C.; Dehghan, M.; Jagersand, M. Basnet: Boundary-aware salient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7479–7489. [Google Scholar]
- Mei, H.Y.; Ji, G.P.; Wei, Z.Q.; Yang, X.; Wei, X.P.; Fan, D.P. Camouflaged Object Segmentation with Distraction Mining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 8768–8777. [Google Scholar]
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
- Lee, G.Y.; Tai, Y.W.; Kim, J.M. Deep Saliency with Encoded Low Level Distance Map and High Level Features. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 660–668. [Google Scholar]
- Fan, D.P.; Cheng, M.M.; Liu, J.J.; Gao, S.H.; Hou, Q.B.; Borji, A. Salient Objects in Clutter: Bringing Salient Object Detection to the Foreground. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 196–212. [Google Scholar]
- Zhang, P.; Wang, D.; Lu, H.; Wang, H.; Ruan, X. Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 202–211. [Google Scholar]
- Chen, H.; Li, Y. Three-stream attention-aware network for RGB-D salient object detection. IEEE Trans. Image Process. 2019, 28, 2825–2835. [Google Scholar] [CrossRef] [PubMed]
- Wan, B.; Zhou, X.; Zheng, B.; Yin, H.; Zhu, Z.; Wang, H.; Sun, Y.; Zhang, J.; Yan, C. LFRNet: Localizing, Focus, and Refinement Network for Salient Object Detection of Surface Defects. IEEE Trans. Instrum. Meas. 2023, 72, 1–12. [Google Scholar] [CrossRef]
- Pang, Y.; Zhao, X.; Zhang, L.; Lu, H. Multi-scale interactive network for salient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9413–9422. [Google Scholar]
- Chen, Z.; Xu, Q.; Cong, R.; Huang, Q. Global context-aware progressive aggregation network for salient object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 10599–10606. [Google Scholar]
- Uijlings, J.R.R.; van de Sande, K.E.A.; Gevers, T.; Smeulders, A.W.M. Selective Search for Object Recognition. Int. J. Comput. Vis. 2013, 104, 154–171. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Fan, D.-P.; Ji, G.-P.; Sun, G.; Cheng, M.-M.; Shen, J.; Shao, L. Camouflaged object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2777–2787. [Google Scholar]
- Wu, Z.; Su, L.; Huang, Q. Cascaded partial decoder for fast and accurate salient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3907–3916. [Google Scholar]
- Liu, J.; Zhang, J.; Barnes, N. Confidence-aware learning for camouflaged object detection. arXiv 2021, arXiv:2106.11641. [Google Scholar]
- Ji, G.-P.; Zhu, L.; Zhuge, M.; Fu, K. Fast Camouflaged Object Detection via Edge-based Reversible Re-calibration Network. Pattern Recognit. 2022, 123, 108414. [Google Scholar] [CrossRef]
- Zhang, Y.; Wu, C. Unsupervised camouflaged object segmentation as domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 4334–4344. [Google Scholar]
- Shin, G.; Albanie, S.; Xie, W. Unsupervised salient object detection with spectral cluster voting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 3971–3980. [Google Scholar]
- Siméoni, O.; Sekkat, C.; Puy, G.; Vobecký, A.; Zablocki, É.; Pérez, P. Unsupervised object localization: Observing the background to discover objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 3176–3186. [Google Scholar]
- Gao, S.-H.; Cheng, M.-M.; Zhao, K.; Zhang, X.-Y.; Yang, M.-H.; Torr, P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 652–662. [Google Scholar] [CrossRef] [PubMed]
- Luo, Z.; Mishra, A.; Achkar, A.; Eichel, J.; Li, S.; Jodoin, P.-M. Non-local deep features for salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6609–6617. [Google Scholar]
- Milletari, F.; Navab, N.; Ahmadi, S.-A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571. [Google Scholar]
- Wei, J.; Wang, S.; Huang, Q. F3Net: Fusion, feedback and focus for salient object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12321–12328. [Google Scholar]
- Pang, Y.; Zhao, X.; Xiang, T.-Z.; Zhang, L.; Lu, H. Zoom in and out: A mixed-scale triplet network for camouflaged object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2160–2170. [Google Scholar]
- Skurowski, P.; Abdulameer, H.; Błaszczyk, J.; Depta, T.; Kornacki, A.; Kozieł, P. Animal Camouflage Analysis: Chameleon Database; Politechniki Śląskiej: Gliwice, Poland, 2018. [Google Scholar]
- Fan, D.-P.; Cheng, M.-M.; Liu, Y.; Li, T.; Borji, A. Structure-measure: A new way to evaluate foreground maps. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4548–4557. [Google Scholar]
- Margolin, R.; Zelnik-Manor, L.; Tal, A. How to evaluate foreground maps? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 248–255. [Google Scholar]
- Fan, D.-P.; Gong, C.; Cao, Y.; Ren, B.; Cheng, M.-M.; Borji, A. Enhanced-alignment measure for binary foreground map evaluation. arXiv 2018, arXiv:1805.10421. [Google Scholar]
- Fan, D.-P.; Ji, G.-P.; Zhou, T.; Chen, G.; Fu, H.; Shen, J.; Shao, L. Pranet: Parallel reverse attention network for polyp segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Lima, Peru, 4–8 October 2020; pp. 263–273. [Google Scholar]
- Liu, N.; Han, J.; Yang, M.-H. Picanet: Learning pixel-wise contextual attention for saliency detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3089–3098. [Google Scholar]
- Fan, D.-P.; Ji, G.-P.; Cheng, M.-M.; Shao, L. Concealed object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 6024–6042. [Google Scholar] [CrossRef] [PubMed]
- Sun, Y.; Chen, G.; Zhou, T.; Zhang, Y.; Liu, N. Context-aware cross-level fusion network for camouflaged object detection. arXiv 2021, arXiv:2105.12555. [Google Scholar]
- Jia, Q.; Yao, S.; Liu, Y.; Fan, X.; Liu, R.; Luo, Z. Segment, magnify and reiterate: Detecting camouflaged objects the hard way. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4713–4722. [Google Scholar]
- Wang, T.; Wang, J.; Wang, R. Camouflaged Object Detection with a Feature Lateral Connection Network. Electronics 2023, 12, 2570. [Google Scholar] [CrossRef]
- Yan, X.; Sun, M.; Han, Y.; Wang, Z. Camouflaged object segmentation based on matching–recognition–refinement network. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–15. [Google Scholar] [CrossRef] [PubMed]
- Lv, Y.; Zhang, J.; Dai, Y.; Li, A.; Barnes, N.; Fan, D.-P. Toward Deeper Understanding of Camouflaged Object Detection. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 3462–3476. [Google Scholar] [CrossRef]
- De Curtò, J.; de Zarzà, I.; Calafate, C.T. Semantic scene understanding with large language models on unmanned aerial vehicles. Drones 2023, 7, 114. [Google Scholar] [CrossRef]
- Li, J.; Li, D.; Savarese, S.; Hoi, S. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv 2023, arXiv:2301.12597. [Google Scholar]
Model | CHAMELEON Sα↑ | Fβw↑ | MAE↓ | Eφ↑ | COD10K Sα↑ | Fβw↑ | MAE↓ | Eφ↑ | CAMO Sα↑ | Fβw↑ | MAE↓ | Eφ↑
---|---|---|---|---|---|---|---|---|---|---|---|---
FPN [21] | 0.794 | 0.590 | 0.075 | 0.835 | 0.697 | 0.411 | 0.075 | 0.711 | 0.684 | 0.483 | 0.131 | 0.791 |
NLDF [30] | 0.798 | 0.652 | 0.063 | 0.893 | 0.701 | 0.473 | 0.059 | 0.819 | 0.665 | 0.495 | 0.123 | 0.790 |
PiCANet [39] | 0.765 | 0.552 | 0.085 | 0.846 | 0.696 | 0.415 | 0.081 | 0.788 | 0.701 | 0.510 | 0.125 | 0.799 |
BASNet [10] | 0.687 | 0.474 | 0.118 | 0.742 | 0.634 | 0.365 | 0.105 | 0.676 | 0.618 | 0.413 | 0.159 | 0.719 |
CPD [23] | 0.857 | 0.731 | 0.048 | 0.923 | 0.750 | 0.531 | 0.053 | 0.853 | 0.716 | 0.556 | 0.113 | 0.796 |
PraNet [38] | 0.860 | 0.763 | 0.044 | 0.935 | 0.789 | 0.629 | 0.045 | 0.879 | 0.769 | 0.663 | 0.094 | 0.837 |
SINet [22] | 0.872 | 0.806 | 0.034 | 0.936 | 0.776 | 0.631 | 0.043 | 0.874 | 0.745 | 0.644 | 0.092 | 0.829 |
PFNet [11] | 0.882 | 0.810 | 0.033 | 0.942 | 0.800 | 0.660 | 0.040 | 0.868 | 0.782 | 0.695 | 0.085 | 0.852 |
SINet-V2 [40] | 0.888 | 0.816 | 0.030 | 0.942 | 0.815 | 0.680 | 0.037 | 0.887 | 0.820 | 0.743 | 0.070 | 0.882 |
C2FNet [41] | 0.888 | 0.828 | 0.032 | 0.935 | 0.813 | 0.686 | 0.036 | 0.890 | 0.796 | 0.719 | 0.080 | 0.864 |
ZoomNet [33] | 0.865 | 0.823 | 0.031 | 0.939 | 0.821 | 0.741 | 0.032 | 0.866 | 0.789 | 0.741 | 0.076 | 0.829 |
SegMaR (S-1) [42] | 0.892 | 0.823 | 0.028 | 0.937 | 0.813 | 0.682 | 0.035 | 0.880 | 0.805 | 0.724 | 0.072 | 0.864 |
FLCNet [43] | 0.891 | 0.837 | 0.028 | 0.948 | 0.818 | 0.700 | 0.034 | 0.893 | 0.808 | 0.741 | 0.071 | 0.782 |
Ours | 0.901 | 0.841 | 0.028 | 0.946 | 0.827 | 0.706 | 0.033 | 0.893 | 0.820 | 0.745 | 0.070 | 0.878 |
Model | Params (M) | FLOPs
---|---|---
SINet | 22.7 | 19.5G |
PFNet | 31.9 | 26.5G |
C2FNet | 24.1 | 25.2G |
Ours | 28.2 | 24.1G |
Model (COD10K-Test) | Sα↑ | Fβw↑ | MAE↓ | Eφ↑
---|---|---|---|---
BASIC | 0.780 | 0.593 | 0.051 | 0.803 |
BASIC + PD1 | 0.789 | 0.628 | 0.046 | 0.829 |
BASIC + PD2 | 0.791 | 0.633 | 0.045 | 0.833 |
BASIC + PD3 | 0.790 | 0.634 | 0.045 | 0.832 |
BASIC + PD1 + ASPP | 0.791 | 0.631 | 0.046 | 0.831 |
BASIC + PD2 + ASPP | 0.793 | 0.633 | 0.044 | 0.836 |
BASIC + PD3 + ASPP | 0.792 | 0.633 | 0.044 | 0.836 |
Model (COD10K-Test) | Sα↑ | Fβw↑ | MAE↓ | Eφ↑
---|---|---|---|---
BASIC + RLM + 2 × ERM | 0.801 | 0.688 | 0.037 | 0.876 |
BASIC + RLM + 2 × ERM-I | 0.805 | 0.690 | 0.036 | 0.880 |
BASIC + RLM + 3 × ERM | 0.810 | 0.695 | 0.034 | 0.883 |
BASIC + RLM + 3 × ERM-I | 0.813 | 0.699 | 0.033 | 0.886 |
BASIC + RLM + 3 × ERM-I + AG-H | 0.827 | 0.706 | 0.033 | 0.893 |