Unsupervised Monocular Depth Estimation Method Based on Uncertainty Analysis and Retinex Algorithm
Abstract
1. Introduction
- An uncertainty-based unsupervised depth estimation network is proposed to address the low accuracy of predicted depth in monocular depth estimation. The convolutional neural networks currently used for this task are highly expressive but cannot assess the reliability of their own output, and uncertainty learning closes this gap. Modeling the uncertainty lets the network predict a confidence for the estimated depth, which quantifies the reliability of the output and improves prediction accuracy.
- Retinex illumination theory is used to construct the photometric loss function, reducing the interference caused by dynamic objects in the scene.
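As a rough illustration of the two contributions above, the sketch below combines a single-scale Retinex transform with an uncertainty-weighted L1 photometric term. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names (`single_scale_retinex`, `uncertainty_weighted_l1`), the Gaussian scale `sigma`, and the Laplace-likelihood form of the loss (in the spirit of the heteroscedastic uncertainty loss of Kendall and Gal, cited in the references) are all chosen here for illustration.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D image, using edge padding."""
    kernel = gaussian_kernel1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(img, radius, mode="edge")
    # Filter along rows, then along columns; "valid" removes the padding again.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows
    )


def single_scale_retinex(img: np.ndarray, sigma: float = 3.0,
                         eps: float = 1e-6) -> np.ndarray:
    """Reflectance estimate log(I) - log(blur(I)) for a 2-D intensity image.

    Assumption: a single-scale Retinex with a Gaussian surround; the paper's
    exact Retinex variant is not given in this section.
    """
    img = img.astype(np.float64) + eps  # avoid log(0)
    return np.log(img) - np.log(gaussian_blur(img, sigma))


def uncertainty_weighted_l1(pred: np.ndarray, target: np.ndarray,
                            log_sigma: np.ndarray) -> float:
    """Mean of |pred - target| * exp(-s) + s, with s the predicted log-scale.

    Assumption: a Laplace negative log-likelihood (up to a constant). A large
    s down-weights the residual at that pixel but is penalized by the +s term.
    """
    residual = np.abs(pred - target)
    return float(np.mean(residual * np.exp(-log_sigma) + log_sigma))
```

A hypothetical photometric term could then be computed as `uncertainty_weighted_l1(single_scale_retinex(warped), single_scale_retinex(target), log_sigma)`, where `warped` is the view synthesized from a neighboring frame and `log_sigma` is a per-pixel network output. Because the Retinex transform subtracts a smoothed log-illumination, a global brightness change between the two frames largely cancels before the residual is measured.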
2. Materials and Methods
2.1. Photometric Loss
2.2. Smoothness Loss
2.3. Uncertainty Analysis
3. Results
3.1. Experimental Environment
3.2. Network Architecture
3.3. Evaluation Index
3.4. Comparisons with the State-of-the-Art Methods
3.5. Ablation Study
4. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Eigen, D.; Puhrsch, C.; Fergus, R. Depth Map Prediction from a Single Image Using a Multi-Scale Deep Network. In Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2366–2374.
- Eigen, D.; Fergus, R. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 11–18 December 2016.
- Liu, F.; Shen, C.; Lin, G. Deep Convolutional Neural Fields for Depth Estimation from a Single Image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
- Liu, F.; Shen, C.; Lin, G.; Reid, I. Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 2024–2039.
- Trigueiros, P.; Ribeiro, F.; Reis, L.P. A Comparison of Machine Learning Algorithms Applied to Hand Gesture Recognition. In Proceedings of the 7th Iberian Conference on Information Systems and Technologies, Mardin, Spain, 20–23 June 2012.
- Li, B.; Shen, C.; Dai, Y.; van den Hengel, A.; He, M. Depth and Surface Normal Estimation from Monocular Images Using Regression on Deep Features and Hierarchical CRFs. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
- Laina, I.; Rupprecht, C.; Belagiannis, V.; Tombari, F.; Navab, N. Deeper Depth Prediction with Fully Convolutional Residual Networks. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016.
- Cao, Y.; Wu, Z.; Shen, C. Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks. IEEE Trans. Circuits Syst. Video Technol. 2018, 28, 3174–3182.
- Xu, D.; Elisa, R.; Ouyang, W.L. Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Arsalan, M.; Hamed, P.; Jana, K. Joint semantic segmentation and depth estimation with deep convolutional networks. In Proceedings of the 4th International Conference on 3D Vision, Stanford, CA, USA, 25–28 October 2016.
- Zhang, Z.Y.; Alexander, G.S.; Sanja, F. Monocular object instance segmentation and depth ordering with CNNs. In Proceedings of the 15th International Conference on Computer Vision, Boston, MA, USA, 7–12 June 2015.
- Liu, B.Y.; Stephen, G.; Stephen, G. Single image depth estimation from predicted semantic labels. In Proceedings of the 23rd IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010.
- Wang, P.; Shen, X.H.; Lin, Z. Towards unified depth and semantic prediction from a single image. In Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
- Zhou, T.; Brown, M.; Snavely, N.; Lowe, D.G. Unsupervised Learning of Depth and Ego-Motion from Video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Yin, Z.; Shi, J. GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
- Mahjourian, R.; Wicke, M.; Angelova, A. Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
- Clement, G.; Oisin, M.A.; Gabriel, J.B. Unsupervised monocular depth estimation with left-right consistency. In Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Zhan, H.; Garg, R.; Weerasekera, C.S.; Li, K.; Agarwal, H.; Reid, I. Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
- Garg, R.; Bg, V.K.; Carneiro, G.; Reid, I. Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue. In Proceedings of the European Conference on Computer Vision 2016, Amsterdam, The Netherlands, 11–14 October 2016.
- Godard, C.; Mac Aodha, O.; Brostow, G.J. Unsupervised Monocular Depth Estimation with Left-Right Consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Kuznietsov, Y.; Stuckler, J.; Leibe, B. Semi-supervised deep learning for monocular depth map prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Kendall, A.; Gal, Y. What uncertainties do we need in Bayesian deep learning for computer vision? In Proceedings of the Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017.
- Kendall, A.; Gal, Y.; Cipolla, R. Multi-task learning using uncertainty to weight losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
- Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. In Proceedings of the Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 7–12 December 2015.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets Robotics: The KITTI dataset. Int. J. Robot. Res. (IJRR) 2013, 32, 1231–1237.
- Liu, F.; Shen, C.; Lin, G.; Reid, I. Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 2024–2039.
- Wang, C.; Miguel Buenaposada, J.; Zhu, R.; Lucey, S. Learning depth from monocular videos using direct methods. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
- Zou, Y.; Luo, Z.; Huang, J.B. DF-Net: Unsupervised joint learning of depth and flow using cross-task consistency. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Ranjan, A.; Jampani, V.; Kim, K.; Sun, D.; Wulff, J.; Black, M.J. Competitive Collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
Method | AbsRel | SqRel | RMSE | RMSlog | <1.25 | <1.25² | <1.25³ |
---|---|---|---|---|---|---|---|
Eigen et al. [2] | 0.203 | 1.548 | 6.307 | 0.282 | 0.702 | 0.890 | 0.958 |
Liu et al. [26] | 0.202 | 1.614 | 6.523 | 0.275 | 0.678 | 0.895 | 0.965 |
Garg et al. [19] | 0.152 | 1.226 | 5.849 | 0.246 | 0.784 | 0.921 | 0.967 |
Kuznietsov et al. [21] | 0.113 | 0.741 | 4.621 | 0.189 | 0.862 | 0.960 | 0.986 |
Godard et al. [23] | 0.148 | 1.344 | 5.927 | 0.247 | 0.803 | 0.922 | 0.964 |
Zhan et al. [18] | 0.144 | 1.391 | 5.869 | 0.241 | 0.803 | 0.928 | 0.969 |
Zhou et al. [14] | 0.208 | 1.768 | 6.856 | 0.283 | 0.678 | 0.885 | 0.957 |
Mahjourian et al. [16] | 0.163 | 1.240 | 6.220 | 0.250 | 0.762 | 0.916 | 0.968 |
Wang et al. [27] | 0.151 | 1.257 | 5.583 | 0.228 | 0.810 | 0.936 | 0.974 |
GeoNet [15] | 0.155 | 1.296 | 5.587 | 0.233 | 0.806 | 0.933 | 0.973 |
DF-Net [28] | 0.150 | 1.124 | 5.507 | 0.223 | 0.806 | 0.933 | 0.973 |
CC [29] | 0.140 | 1.070 | 5.326 | 0.217 | 0.826 | 0.941 | 0.975 |
Ours | 0.112 | 0.792 | 4.526 | 0.191 | 0.843 | 0.965 | 0.967 |

Method | AbsRel | SqRel | RMSE | RMSlog | <1.25 | <1.25² | <1.25³ |
---|---|---|---|---|---|---|---|
Basic | 0.161 | 1.225 | 5.765 | 0.237 | 0.780 | 0.927 | 0.972 |
Basic + Retinex | 0.132 | 0.905 | 4.689 | 0.196 | 0.791 | 0.935 | 0.974 |
Basic + Uncertainty | 0.152 | 0.836 | 4.634 | 0.199 | 0.801 | 0.942 | 0.965 |
Basic + Retinex + Uncertainty | 0.112 | 0.792 | 4.526 | 0.191 | 0.843 | 0.965 | 0.967 |

Method | AbsRel | SqRel | RMSE | RMSlog | <1.25 | <1.25² | <1.25³ |
---|---|---|---|---|---|---|---|
Basic | 0.151 | 1.154 | 5.716 | 0.232 | 0.798 | 0.930 | 0.972 |
Basic + Retinex | 0.129 | 1.023 | 4.785 | 0.196 | 0.802 | 0.923 | 0.974 |
Basic + Uncertainty | 0.145 | 0.866 | 4.854 | 0.201 | 0.800 | 0.915 | 0.975 |
Basic + Retinex + Uncertainty | 0.127 | 0.892 | 4.625 | 0.189 | 0.822 | 0.939 | 0.977 |
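
The column headers in the tables above correspond to the error and accuracy measures commonly used for monocular depth evaluation. The sketch below shows how they are usually computed. It assumes the standard definitions popularized by Eigen et al., which this section does not spell out, and the function name `depth_metrics` and the rule of masking out non-positive ground truth are illustrative choices rather than details taken from the paper.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Standard depth metrics over pixels with valid (positive) ground truth.

    Assumes predictions are strictly positive so that the log and the
    ratio terms are well defined.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    gt = np.asarray(gt, dtype=np.float64).ravel()
    valid = gt > 0  # pixels without ground truth are conventionally skipped
    pred, gt = pred[valid], gt[valid]

    diff = pred - gt
    # Threshold accuracy uses the larger of the two depth ratios per pixel.
    ratio = np.maximum(pred / gt, gt / pred)

    return {
        "abs_rel": float(np.mean(np.abs(diff) / gt)),    # AbsRel, lower is better
        "sq_rel": float(np.mean(diff ** 2 / gt)),         # SqRel, lower is better
        "rmse": float(np.sqrt(np.mean(diff ** 2))),       # RMSE, lower is better
        "rmse_log": float(
            np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))
        ),                                                # RMSlog, lower is better
        "delta1": float(np.mean(ratio < 1.25)),           # <1.25, higher is better
        "delta2": float(np.mean(ratio < 1.25 ** 2)),      # <1.25^2, higher is better
        "delta3": float(np.mean(ratio < 1.25 ** 3)),      # <1.25^3, higher is better
    }
```

For the first four columns a smaller value indicates a better prediction, whereas the three threshold columns report the fraction of pixels whose depth ratio stays below the threshold, so a larger value is better.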
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Song, C.; Qi, C.; Song, S.; Xiao, F. Unsupervised Monocular Depth Estimation Method Based on Uncertainty Analysis and Retinex Algorithm. Sensors 2020, 20, 5389. https://doi.org/10.3390/s20185389