Dynamic Knowledge Distillation with Noise Elimination for RGB-D Salient Object Detection
Abstract
1. Introduction
- We propose a novel dynamic distillation strategy that adaptively assigns the distillation weight by jointly considering the detection performance of the teacher and student networks during training. As a result, the final model pays more attention to hard samples, which improves overall performance.
- We propose a noise elimination method that exploits the prior knowledge of the teacher network to alleviate the impact of low-quality depth maps. The student network benefits from this method without any additional parameters or computation.
- We adopt a single stream for RGB-D SOD in order to bypass the depth network and avoid designing a complicated model. This single-stream model achieves competitive performance using only VGG16 (57.9 MB) or VGG19 (78.2 MB) as the backbone, which makes it more suitable for practical use. Extensive experiments on five benchmarks demonstrate that our method achieves competitive performance with a fast, lightweight architecture.
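The dynamic weighting idea in the first contribution can be illustrated with a minimal sketch. The weighting rule used here (student error divided by the sum of student and teacher errors) and the names `dynamic_weight` and `dkd_loss` are assumptions made purely for illustration; they are not the formula of this work, which is not reproduced in this text.

```python
import math

EPS = 1e-7  # numerical guard for log() and division


def bce(pred, target):
    """Mean binary cross-entropy between two flattened maps with values in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def mae(pred, target):
    """Mean absolute error between two flattened maps."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def dynamic_weight(teacher, student, gt):
    """Hypothetical per-sample distillation weight in [0, 1].

    The weight grows when the student is much less accurate than the
    teacher on this sample, i.e. when the sample is hard for the student.
    """
    err_t = mae(teacher, gt)
    err_s = mae(student, gt)
    return err_s / (err_s + err_t + EPS)


def dkd_loss(teacher, student, gt):
    """Blend ground-truth supervision and teacher supervision with weight s."""
    s = dynamic_weight(teacher, student, gt)
    loss = (1.0 - s) * bce(student, gt) + s * bce(student, teacher)
    return loss, s


if __name__ == "__main__":
    gt = [1.0, 1.0, 0.0, 0.0]
    teacher = [0.9, 0.9, 0.1, 0.1]
    hard_student = [0.4, 0.5, 0.6, 0.5]    # far from the ground truth
    easy_student = [0.95, 0.9, 0.05, 0.1]  # already close to the ground truth
    print(dkd_loss(teacher, hard_student, gt))
    print(dkd_loss(teacher, easy_student, gt))
```

Under this assumed rule, a sample on which the student lags far behind the teacher receives a larger distillation weight than a sample the student already handles well, which is one way to realize "more attention to hard samples".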
2. Related Work
3. Methodologies
3.1. Overview
3.2. Dynamic Knowledge Distillation
3.3. Noise Elimination with the DKD
Algorithm 1 DKD
Require: the prediction of the teacher network, the prediction of the student network, and the corresponding ground truth G.
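One possible reading of how the inputs of Algorithm 1 (teacher prediction, student prediction, ground truth G) could support noise elimination is sketched below. It assumes that a teacher prediction far from G signals a low-quality depth map, and that distillation is switched off for such samples. The gating rule, the threshold value of 0.2, and the names `teacher_is_reliable` and `gated_weight` are hypothetical and are not taken from the algorithm itself.

```python
def mae(pred, target):
    """Mean absolute error between two flattened maps with values in [0, 1]."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def teacher_is_reliable(teacher_pred, gt, threshold):
    """Treat the teacher as reliable when its error against G is small enough."""
    return mae(teacher_pred, gt) <= threshold


def gated_weight(teacher_pred, gt, weight, threshold=0.2):
    """Keep the distillation weight for reliable teachers, otherwise drop it.

    A weight of 0.0 means the student is supervised by the ground truth only
    for this sample, so a noisy teacher cannot mislead it.
    """
    if teacher_is_reliable(teacher_pred, gt, threshold):
        return weight
    return 0.0


if __name__ == "__main__":
    gt = [1.0, 1.0, 0.0, 0.0]
    clean_teacher = [0.9, 0.8, 0.1, 0.2]  # close to G
    noisy_teacher = [0.3, 0.4, 0.7, 0.6]  # far from G, e.g. poor depth input
    print(gated_weight(clean_teacher, gt, weight=0.5))
    print(gated_weight(noisy_teacher, gt, weight=0.5))
```

Because the gate only changes a scalar in the training loss, a scheme of this kind adds no parameters or inference cost to the student, which is consistent with the claim made for the noise elimination method.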
4. Experiments
4.1. Datasets and Evaluation Metrics
4.2. Implementation Details
4.3. Comparisons with the State of the Art
4.4. Ablation Studies
5. Conclusions
Author Contributions
Funding
Conflicts of Interest

| Dataset | Metric | DF | CTMF | AFNet | MMCI | TANet | DMRA | CPFP | D3Net | A2dele | DANet | FANet | CMINet | Ours | Ours * |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NJUD | ↑ | 0.789 | 0.857 | 0.804 | 0.868 | 0.888 | 0.896 | 0.890 | 0.903 | 0.905 | 0.890 | 0.892 | 0.925 | 0.928 | 0.934 |
| | | 0.735 | 0.849 | 0.772 | 0.859 | 0.878 | 0.885 | 0.878 | 0.895 | 0.867 | 0.897 | 0.899 | 0.939 | 0.916 | 0.920 |
| | | 0.818 | 0.866 | 0.847 | 0.882 | 0.909 | 0.920 | 0.900 | 0.901 | 0.914 | 0.926 | 0.914 | 0.956 | 0.949 | 0.952 |
| | | 0.151 | 0.085 | 0.100 | 0.079 | 0.061 | 0.051 | 0.053 | 0.051 | 0.052 | 0.046 | 0.044 | 0.032 | 0.032 | 0.030 |
| NLPR | | 0.752 | 0.841 | 0.816 | 0.841 | 0.876 | 0.888 | 0.884 | 0.904 | 0.891 | 0.908 | 0.885 | 0.909 | 0.922 | 0.930 |
| | | 0.769 | 0.860 | 0.799 | 0.856 | 0.886 | 0.898 | 0.884 | 0.906 | 0.889 | 0.908 | 0.913 | 0.941 | 0.921 | 0.924 |
| | | 0.840 | 0.869 | 0.884 | 0.872 | 0.926 | 0.942 | 0.920 | 0.934 | 0.937 | 0.945 | 0.951 | 0.964 | 0.958 | 0.960 |
| | | 0.110 | 0.056 | 0.058 | 0.059 | 0.041 | 0.031 | 0.038 | 0.034 | 0.031 | 0.031 | 0.026 | 0.019 | 0.022 | 0.021 |
| DES | | 0.625 | 0.865 | 0.775 | 0.839 | 0.853 | 0.906 | 0.882 | 0.917 | 0.897 | 0.916 | 0.874 | 0.926 | 0.926 | 0.928 |
| | | 0.685 | 0.863 | 0.770 | 0.848 | 0.858 | 0.899 | 0.872 | 0.904 | 0.883 | 0.905 | 0.894 | 0.953 | 0.918 | 0.918 |
| | | 0.806 | 0.911 | 0.874 | 0.904 | 0.919 | 0.944 | 0.927 | 0.956 | 0.918 | 0.961 | 0.925 | 0.970 | 0.965 | 0.966 |
| | | 0.131 | 0.055 | 0.068 | 0.065 | 0.046 | 0.030 | 0.038 | 0.030 | 0.030 | 0.028 | 0.026 | 0.015 | 0.022 | 0.023 |
| LFSD | | 0.854 | 0.815 | 0.780 | 0.813 | 0.827 | 0.872 | 0.850 | 0.849 | 0.858 | - | 0.855 | 0.862 | 0.865 | 0.862 |
| | | 0.786 | 0.796 | 0.738 | 0.787 | 0.801 | 0.847 | 0.828 | 0.832 | 0.833 | - | 0.850 | 0.877 | 0.834 | 0.839 |
| | | 0.841 | 0.851 | 0.810 | 0.840 | 0.851 | 0.899 | 0.867 | 0.860 | 0.875 | - | 0.882 | 0.911 | 0.883 | 0.883 |
| | | 0.142 | 0.120 | 0.133 | 0.132 | 0.111 | 0.076 | 0.088 | 0.099 | 0.077 | - | 0.076 | 0.064 | 0.080 | 0.078 |
| SIP | | 0.704 | 0.720 | 0.756 | 0.840 | 0.851 | 0.847 | 0.870 | 0.882 | 0.855 | 0.901 | - | 0.887 | 0.872 | 0.882 |
| | | 0.653 | 0.716 | 0.720 | 0.833 | 0.835 | 0.800 | 0.850 | 0.864 | 0.828 | 0.878 | - | 0.894 | 0.855 | 0.865 |
| | | 0.794 | 0.824 | 0.815 | 0.886 | 0.894 | 0.858 | 0.899 | 0.903 | 0.890 | 0.914 | - | 0.933 | 0.908 | 0.914 |
| | | 0.185 | 0.139 | 0.118 | 0.086 | 0.075 | 0.088 | 0.064 | 0.063 | 0.070 | 0.054 | - | 0.044 | 0.060 | 0.056 |
| Backbone | | VGG16 | VGG16 | VGG16 | VGG16 | VGG16 | VGG19 | VGG16 | VGG16 | VGG16 | VGG16/19 | VGG16 | ResNet50 | VGG16 | VGG19 |
| Epoch | | - | - | - | 30,000 (ite) | - | 50 | 10,000 (ite) | 30 | 50 | 40 | 40 | 100 | 40 | 40 |

| Method | MMCI | TANet | PCANet | D3Net | CPFP | DMRA | DANet | CMINet | A2dele | Ours |
|---|---|---|---|---|---|---|---|---|---|---|
| Model Size (MB) | 951.9 | 929.7 | 533.6 | 519 | 278 | 238.8 | 106.7 | 84 | 57.3 | 57.9/78.2 |
| FPS | 19 | - | 15 | - | 7 | 10 | 32 | 10 | 120 | 136 |

| Dataset | Metric | RGB | RGBD | s = 0.3 | s = 0.5 | s = 0.7 | s = Dynamic | +Threshold |
|---|---|---|---|---|---|---|---|---|
| SIP | ↑ | 0.704 | 0.773 | 0.832 | 0.845 | 0.843 | 0.849 | 0.853 |
| | | 0.654 | 0.724 | 0.796 | 0.805 | 0.809 | 0.811 | 0.817 |
| | | 0.108 | 0.086 | 0.063 | 0.061 | 0.059 | 0.058 | 0.056 |
| NJUD | | 0.776 | 0.830 | 0.902 | 0.902 | 0.898 | 0.904 | 0.914 |
| | | 0.739 | 0.799 | 0.895 | 0.889 | 0.880 | 0.893 | 0.901 |
| | | 0.080 | 0.060 | 0.030 | 0.034 | 0.037 | 0.032 | 0.030 |
| NLPR | | 0.780 | 0.816 | 0.873 | 0.876 | 0.870 | 0.876 | 0.890 |
| | | 0.746 | 0.781 | 0.877 | 0.875 | 0.865 | 0.876 | 0.887 |
| | | 0.046 | 0.041 | 0.022 | 0.024 | 0.026 | 0.024 | 0.021 |
| LFSD | | 0.713 | 0.784 | 0.825 | 0.832 | 0.830 | 0.834 | 0.835 |
| | | 0.656 | 0.741 | 0.780 | 0.793 | 0.790 | 0.795 | 0.796 |
| | | 0.142 | 0.102 | 0.086 | 0.080 | 0.080 | 0.078 | 0.078 |

| Dataset | Metric | DANet | DANet + DKD |
|---|---|---|---|
| SIP | ↑ | 0.864 | 0.848 |
| | | 0.829 | 0.811 |
| | | 0.054 | 0.058 |
| DES | | 0.891 | 0.892 |
| | | 0.848 | 0.870 |
| | | 0.028 | 0.025 |
| Model Size (MB) | | 106.7 | 78.2 |

| Dataset | Metric | s = 0.3 | s = 0.5 | s = 0.7 | s = Dynamic |
|---|---|---|---|---|---|
| NLPR | ↑ | 0.882 | 0.876 | 0.876 | 0.884 |
| | | 0.869 | 0.865 | 0.866 | 0.879 |
| | | 0.024 | 0.025 | 0.027 | 0.024 |
| LFSD | | 0.790 | 0.786 | 0.788 | 0.798 |
| | | 0.745 | 0.735 | 0.751 | 0.753 |
| | | 0.104 | 0.106 | 0.101 | 0.097 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ren, G.; Yu, Y.; Liu, H.; Stathaki, T. Dynamic Knowledge Distillation with Noise Elimination for RGB-D Salient Object Detection. Sensors 2022, 22, 6188. https://doi.org/10.3390/s22166188