High Precision Detection of Salient Objects Based on Deep Convolutional Networks with Proper Combinations of Shallow and Deep Connections
Abstract
1. Introduction
2. Related Work
3. Analysis and Optimization of the Connection Mechanism of DNNs
3.1. Fully Convolutional Networks (FCNs)
3.2. Deep Convolutional Encoder–Decoder Networks (EN-DE Nets)
3.3. Deep Convolutional Models with Combinations of Shallow and Deep Connections
4. Saliency Detection Based on a DNN with Combinations of Shallow and Deep Connections
4.1. DNN Structures with Combinations of Shallow and Deep Connections for Saliency Detection
4.1.1. Backbone Networks
4.1.2. Skip-Layer Architecture
4.1.3. Integrations of Multiple Side Outputs
4.1.4. Combinatorial Optimization of Shallow and Deep Connections on Various Side Outputs
4.2. A Saliency Detection Model Based on a DNN with Combinations of Shallow and Deep Connections
4.3. A Transferable Model with Combinations of Shallow and Deep Connections Based on ImageNet Training
4.3.1. Hyperparameter Optimization
4.3.2. Performance Comparison of Backbone Networks
4.3.3. Cross-Validation and Assessments
4.3.4. Ablation Analysis
4.4. Integrated Architecture of Saliency Detection System
5. Performance Analysis and Assessment
5.1. Benchmark Datasets and Evaluation Indices
5.1.1. Benchmark Datasets
5.1.2. Evaluation Indices
5.2. Platform and Implementation Details
5.3. Performance Assessment by Verification on ECSSD
5.4. Performance Assessment by Verification on MSRA-10K
5.5. Performance Assessment by Verification on iCoSeg
6. Discussion and Comments
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Sperling, G. A Brief Overview of Computational Models of Spatial, Temporal, and Feature Visual Attention. In Invariances in Human Information Processing; Routledge: Abingdon, UK, 2018; pp. 143–182.
- Liu, T.; Yuan, Z.; Sun, J.; Wang, J.; Zheng, N.; Tang, X.; Shum, H.Y. Learning to detect a salient object. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 353–367.
- Borji, A.; Itti, L. State-of-the-art in visual attention modeling. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 185–207.
- Wang, J.; Borji, A.; Kuo, C.C.J.; Itti, L. Learning a combined model of visual saliency for fixation prediction. IEEE Trans. Image Process. 2016, 25, 1566–1579.
- Liu, N.; Han, J.; Liu, T.; Li, X. Learning to predict eye fixations via multi resolution convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 392–404.
- Xiao, F.; Peng, L.; Fu, L.; Gao, X. Salient object detection based on eye tracking data. Signal Process. 2018, 144, 392–397.
- Ayoub, N.; Gao, Z.; Chen, B.; Jian, M. A synthetic fusion rule for salient region detection under the framework of ds-evidence theory. Symmetry 2018, 10, 183.
- Li, X.; Zhao, L.; Wei, L.; Yang, M.H.; Wu, F.; Zhuang, Y.; Ling, H.; Wang, J. Deep saliency: Multi-task deep neural network model for salient object detection. IEEE Trans. Image Process. 2016, 25, 3919–3930.
- Jiang, H.; Wang, J.; Yuan, Z.; Wu, Y.; Zheng, N.; Li, S. Salient object detection: A discriminative regional feature integration approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2083–2090.
- Zhu, D.; Dai, L.; Luo, Y.; Zhang, G.; Shao, X.; Itti, L.; Lu, J. Multi-scale adversarial feature learning for saliency detection. Symmetry 2018, 10, 457.
- Li, G.; Yu, Y. Visual saliency based on multi scale deep features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5455–5463.
- Yan, Q.; Xu, L.; Shi, J.; Jia, J. Hierarchical saliency detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1155–1162.
- Liu, N.; Han, J. Dhsnet: Deep hierarchical saliency network for salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 678–686.
- Lee, G.; Tai, Y.W.; Kim, J. Deep saliency with encoded low level distance map and high level features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 660–668.
- Tang, Y.; Wu, X. Saliency detection via combining region-level and pixel-level predictions with CNNs. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016; pp. 809–825.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Dumoulin, V.; Visin, F. A guide to convolution arithmetic for deep learning. arXiv, 2016; arXiv:1603.07285.1–31.
- Tong, N.; Lu, H.; Zhang, Y.; Ruan, X. Salient object detection via global and local cues. Pattern Recognit. 2015, 48, 3258–3267.
- Wang, L.; Lu, H.; Ruan, X.; Yang, M.H. Deep networks for saliency detection via local estimation and global search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3183–3192.
- Xie, S.; Tu, Z. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision, Las Condes, Chile, 11–18 December 2015; pp. 1395–1403.
- Borji, A.; Cheng, M.M.; Jiang, H.; Li, J. Salient object detection: A benchmark. IEEE Trans. Image Process. 2015, 24, 5706–5722.
- Li, G.; Yu, Y. Deep contrast learning for salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 478–487.
- Huang, X.; Shen, C.; Boix, X.; Zhao, Q. Salicon: Reducing the semantic gap in saliency prediction by adapting deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, Las Condes, Chile, 11–18 December 2015; pp. 262–270.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, NIPS 2012, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Cornia, M.; Baraldi, L.; Serra, G.; Cucchiara, R. A deep multi-level network for saliency prediction. In Proceedings of the 2016 IEEE 23rd International Conference on Pattern Recognition (ICPR), Cancún, Mexico, 4–8 December 2016; pp. 3488–3493.
- Hou, Q.; Cheng, M.M.; Hu, X.; Borji, A.; Tu, Z.; Torr, P. Deeply supervised salient object detection with short connections. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5300–5309.
- Wang, W.; Shen, J. Deep visual attention prediction. IEEE Trans. Image Process. 2018, 27, 2368–2378.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE Computer Society: Washington, DC, USA, 2017; pp. 2980–2988.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Scene Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, PP, 1–9.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
- Mao, X.; Shen, C.; Yang, Y.B. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 2802–2810.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Turan, M.; Almalioglu, Y.; Araujo, H.; Konukoglu, E.; Sitti, M. Deep endovo: A recurrent convolutional neural network (rcnn) based visual odometry approach for endoscopic capsule robots. Neurocomputing 2018, 275, 1861–1870.
- Połap, D.; Winnicka, A.; Serwata, K.; Kęsik, K.; Woźniak, M. An Intelligent System for Monitoring Skin Diseases. Sensors 2018, 18, 2552.
- Babaee, M.; Dinh, D.T.; Rigoll, G. A deep convolutional neural network for video sequence background subtraction. Pattern Recognit. 2018, 76, 635–649.
- Połap, D.; Woźniak, M.; Wei, W.; Damaševičius, R. Multi-threaded learning control mechanism for neural networks. Future Gener. Comput. Syst. 2018, 87, 16–34.
- Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018, 106, 249–259.
- Kheradpisheh, S.R.; Ganjtabesh, M.; Thorpe, S.J.; Masquelier, T. STDP-based spiking deep convolutional neural networks for object recognition. Neural Netw. 2018, 99, 56–67.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Itti, L.; Koch, C.; Niebur, E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 1254–1259.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 4278–4284.
- Xiao, F.; Deng, W.; Peng, L.; Cao, C.; Hu, K.; Gao, X. Multi-scale deep neural network for salient object detection. IET Image Process. 2018, 12, 2036–2041.
- Lin, T.Y.; Dollár, P.; Girshick, R.B.; He, K. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- Huang, G.; Liu, Z.; Maaten, L.V.D.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE Computer Society: Washington, DC, USA, 2017; pp. 2261–2269.
- Zhao, R.; Ouyang, W.; Li, H.; Wang, X. Saliency detection by multi-context deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1265–1274.
- Jetley, S.; Murray, N.; Vig, E. End-to-end saliency mapping via probability distribution prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5753–5761.
- Simon, M.; Rodner, E.; Denzler, J. Imagenet pre-trained models with batch normalization. arXiv, 2016; arXiv:1612.01452.
- Borji, A.; Itti, L. Exploiting local and global patch rarities for saliency detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 478–485.
- Shi, J.; Yan, Q.; Xu, L.; Jia, J. Hierarchical image saliency detection on extended CSSD. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 717–729.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv, 2014; arXiv:1412.6980.1–15.
- Cheng, M.M.; Mitra, N.J.; Huang, X.; Torr, P.H.; Hu, S.M. Global contrast based salient region detection. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 569–582.
- Batra, D.; Kowdle, A.; Parikh, D.; Luo, J.; Chen, T. icoseg: Interactive co-segmentation with intelligent scribble guidance. In Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 3169–3176.
- Batra, D.; Kowdle, A.; Parikh, D.; Luo, J.; Chen, T. Interactively co-segmentating topically related images with intelligent scribble guidance. Int. J. Comput. Vis. 2011, 93, 273–292.
- Wang, W.; Shen, J.; Shao, L. Video salient object detection via fully convolutional networks. IEEE Trans. Image Process. 2018, 27, 38–49.
- Zhu, W.; Liang, S.; Wei, Y.; Sun, J. Saliency optimization from robust background detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2814–2821.
- Li, X.; Lu, H.; Zhang, L.; Ruan, X.; Yang, M.H. Saliency detection via dense and sparse reconstruction. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2976–2983.
- Jiang, B.; Zhang, L.; Lu, H.; Yang, C.; Yang, M.H. Saliency detection via absorbing Markov chain. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 1665–1672.
- Yang, C.; Zhang, L.; Lu, H. Graph-regularized saliency detection with convex-hull-based center prior. IEEE Signal Process. Lett. 2013, 20, 637–640.
- Goferman, S.; Zelnik-Manor, L.; Tal, A. Context-aware saliency detection. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1915–1926.

| Combination Modes | Validation Accuracy | Validation Loss | |
|---|---|---|---|
| | 0.9476 | 0.1840 | 0.8576 |
| | 0.9531 | 0.1679 | 0.8598 |
| | 0.9602 | 0.1414 | 0.8622 |
| | 0.9567 | 0.1445 | 0.8607 |
| | 0.9596 | 0.1419 | 0.8616 |
| | 0.9619 | 0.1345 | 0.8637 |

| Groups | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0.0022 | 0.0688 | 0.0074 | 0.0401 | 0.0127 | 0.0532 | 0.0656 | 0.0619 | 0.0008 | 0.0002 | 0.0550 | 0.0071 | 0.0006 | 0.0008 | 0.0693 | 0.0068 | 0.0489 | 0.0219 | 0.0008 | 0.0010 |
| | 0.8657 | 0.8432 | 0.7167 | 0.7198 | 0.8952 | 0.9738 | 0.7548 | 0.9983 | 0.7666 | 0.8666 | 0.7999 | 0.7561 | 0.9190 | 0.8981 | 0.7645 | 0.8428 | 0.8589 | 0.7582 | 0.7617 | 0.9000 |
| | 0.9995 | 0.9792 | 0.9995 | 0.9281 | 0.9381 | 0.9424 | 0.9996 | 0.9288 | 0.9936 | 0.9998 | 0.9909 | 0.9938 | 0.9138 | 0.9474 | 0.9946 | 0.9998 | 0.9260 | 0.9991 | 0.9601 | 0.9990 |
| Val. loss | 0.1345 | 0.1358 | 0.1402 | 0.1439 | 0.1366 | 0.1489 | 0.1506 | 0.1398 | 0.1521 | 0.1482 | 0.1539 | 0.1586 | 0.1392 | 0.1421 | 0.1429 | 0.1502 | 0.1383 | 0.1622 | 0.1528 | 0.1374 |

Index Terms | VGG-16 without Pre-Training | VGG-16 with Pre-Training | ResNet-50 without Pre-Training | ResNet-50 with Pre-Training
---|---|---|---|---
Training accuracy | 0.9231 | 0.9579 | 0.9799 | 0.9892
Training loss | 0.1823 | 0.1284 | 0.0619 | 0.0335
Validation accuracy | 0.7674 | 0.8224 | 0.9290 | 0.9619
Validation loss | 0.4793 | 0.4374 | 0.2226 | 0.1345

Index Terms | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Fold 6 | Fold 7 | Fold 8 | Fold 9 | Fold 10 | Mean
---|---|---|---|---|---|---|---|---|---|---|---
Training accuracy | 0.9885 | 0.9888 | 0.9865 | 0.9888 | 0.9903 | 0.9874 | 0.9916 | 0.9915 | 0.9895 | 0.9891 | 0.9892
Training loss | 0.0357 | 0.0343 | 0.0431 | 0.0349 | 0.0293 | 0.0395 | 0.0257 | 0.0256 | 0.0329 | 0.0338 | 0.0335
Validation accuracy | 0.9682 | 0.9657 | 0.9659 | 0.9546 | 0.9633 | 0.9581 | 0.9763 | 0.9408 | 0.9622 | 0.9636 | 0.9619
Validation loss | 0.1510 | 0.1408 | 0.1469 | 0.1228 | 0.1270 | 0.1232 | 0.1592 | 0.1161 | 0.1246 | 0.1329 | 0.1345

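
The Mean column of the 10-fold cross-validation table can be reproduced from the per-fold values. The following is a minimal sketch, assuming the Mean is the plain arithmetic mean of the ten folds rounded to four decimals; the variable and function names are illustrative and do not come from the paper.

```python
from statistics import fmean, pstdev

# Per-fold values copied from the 10-fold cross-validation table.
validation_accuracy = [0.9682, 0.9657, 0.9659, 0.9546, 0.9633,
                       0.9581, 0.9763, 0.9408, 0.9622, 0.9636]
training_loss = [0.0357, 0.0343, 0.0431, 0.0349, 0.0293,
                 0.0395, 0.0257, 0.0256, 0.0329, 0.0338]


def fold_mean(values):
    """Arithmetic mean over folds, rounded to the table's four decimals."""
    return round(fmean(values), 4)


mean_val_acc = fold_mean(validation_accuracy)
mean_train_loss = fold_mean(training_loss)

# Fold-to-fold spread (population standard deviation); the table does not
# report this, it is computed here only as an extra sanity check.
spread_val_acc = pstdev(validation_accuracy)

print(mean_val_acc, mean_train_loss, round(spread_val_acc, 4))
```

The same helper applies to the other two rows of the table; the spread is a quick way to see how much a single fold can deviate from the reported mean.
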
Methods | IoU |
---|---|
FCN | 0.4515 |
EN-DE (E) | 0.6297 |
Skip layers + EN-DE (S+E) | 0.6418 |
Side outputs + Skip layers + EN-DE (S+S+E) | 0.6911 |
Fusion + Side outputs + Skip layers + EN-DE (F+S+S+E) | 0.7547 |

Methods | ECSSD | | | | MSRA | | | | ICOSEG | | |
---|---|---|---|---|---|---|---|---|---|---|---|---
Index Terms | | IoU | MAE | Time (ms) | | IoU | MAE | Time (ms) | | IoU | MAE | Time (ms)
DMLN [25] | 0.7897 | 0.6448 | 0.1038 | 51.79 | 0.8682 | 0.7873 | 0.0546 | 50.96 | 0.8690 | 0.7124 | 0.0889 | 51.58
NDS [55] | 0.8434 | 0.7172 | 0.0815 | 68.79 | 0.8955 | 0.7824 | 0.0479 | 68.32 | 0.8641 | 0.7063 | 0.0912 | 69.16
NSS [55] | 0.8435 | 0.7228 | 0.0812 | 67.31 | 0.8928 | 0.8029 | 0.0428 | 67.28 | 0.8732 | 0.7266 | 0.0874 | 67.62
DVA [27] | 0.8182 | 0.6871 | 0.0922 | 53.46 | 0.8288 | 0.7357 | 0.0730 | 53.58 | 0.8999 | 0.7770 | 0.0790 | 53.74
DSC [26] | 0.8590 | 0.7337 | 0.0745 | 68.39 | 0.9059 | 0.8163 | 0.0636 | 67.82 | 0.9296 | 0.8460 | 0.0717 | 68.31
Ours | 0.8637 | 0.7547 | 0.0722 | 74.66 | 0.9478 | 0.8470 | 0.0438 | 79.38 | 0.9842 | 0.8810 | 0.0555 | 74.92

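
The IoU and MAE columns reported for each dataset can be illustrated with a small computation. This is a minimal sketch under the usual definitions, not the authors' evaluation code: IoU is taken on maps binarized at a threshold (0.5 here is an assumed value, since the threshold is not stated in this text), and MAE is the mean absolute per-pixel difference with values in [0, 1]. The function names `iou` and `mae` are illustrative.

```python
from typing import Sequence

SaliencyMap = Sequence[Sequence[float]]  # rows of per-pixel values in [0, 1]


def iou(pred: SaliencyMap, gt: SaliencyMap, threshold: float = 0.5) -> float:
    """Intersection over union of the binarized prediction and ground truth.

    The 0.5 threshold is an assumption made for this sketch.
    """
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p_on = p >= threshold
            g_on = g >= threshold
            intersection += int(p_on and g_on)
            union += int(p_on or g_on)
    # Two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


def mae(pred: SaliencyMap, gt: SaliencyMap) -> float:
    """Mean absolute error between the continuous prediction and ground truth."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count if count else 0.0


# Toy 2x3 example: three predicted salient pixels, two of them correct.
prediction = [[0.9, 0.8, 0.1],
              [0.7, 0.2, 0.0]]
ground_truth = [[1.0, 1.0, 0.0],
                [0.0, 0.0, 0.0]]

toy_iou = iou(prediction, ground_truth)
toy_mae = mae(prediction, ground_truth)
print(toy_iou, toy_mae)
```

In the toy example the intersection covers two pixels and the union three, so the IoU is 2/3, while the MAE averages the six absolute differences. Higher IoU and lower MAE are better, which is how the table rows should be read.
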
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Guo, L.; Qin, S. High Precision Detection of Salient Objects Based on Deep Convolutional Networks with Proper Combinations of Shallow and Deep Connections. Symmetry 2019, 11, 5. https://doi.org/10.3390/sym11010005