Improving Accuracy and Efficiency of Monocular Depth Estimation in Power Grid Environments Using Point Cloud Optimization and Knowledge Distillation
Abstract
1. Introduction
2. Related Work
2.1. Multi-View Stereo
2.2. Monocular Depth Estimation
2.3. Knowledge Distillation
3. Methods
3.1. MaskModule
3.2. DepthModule
3.3. Knowledge Distillation Architecture
3.4. Depth Optimization
Algorithm 1 Depth Image Optimization
Require: Depth Image
1: DepthImageOptimization
    spatial_filter()
    normal_filter()
    color_filter()
    return
2: spatialFilter
    statistical_filter()
    return
3: normalFilter
    depth_map_to_point_cloud()
    calculate_normals()
    filter_with_normals(, )
    return
4: colorFilter
    extract_colors
    Color threshold {Threshold value used to define color consistency}
    {Simple color consistency check: retain if colors are consistent, discard otherwise}
    return
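The three filtering stages named in Algorithm 1 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the statistical filter is an image-space simplification (local median and standard deviation in a 3×3 window rather than a k-nearest-neighbour test on the point cloud), the normal check compares each normal with the viewing ray, the color check compares each pixel with the local median color, and the pinhole intrinsics `fx`, `fy`, `cx`, `cy` as well as all threshold values are hypothetical parameters.

```python
import numpy as np

def _windows(img, radius=1):
    """Stack all (2r+1)^2 shifted copies of img (edge-padded) along axis 0."""
    pad = [(radius, radius)] * 2 + [(0, 0)] * (img.ndim - 2)
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape[:2]
    k = 2 * radius + 1
    return np.stack([p[i:i + h, j:j + w] for i in range(k) for j in range(k)])

def statistical_filter(depth, std_ratio=2.0):
    """Spatial step (assumed form): keep pixels close to the local median depth."""
    nb = _windows(depth)
    median = np.median(nb, axis=0)
    keep = np.abs(depth - median) <= std_ratio * nb.std(axis=0) + 1e-6
    return keep, median

def depth_map_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map into an HxWx3 map of camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    return np.stack([(u - cx) * depth / fx, (v - cy) * depth / fy, depth], axis=-1)

def calculate_normals(points):
    """Estimate unit normals from the cross product of image-space tangents."""
    n = np.cross(np.gradient(points, axis=1), np.gradient(points, axis=0))
    return n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-12)

def filter_with_normals(points, normals, max_angle_deg=80.0):
    """Keep points whose normal is within max_angle_deg of the viewing ray."""
    rays = points / np.maximum(np.linalg.norm(points, axis=-1, keepdims=True), 1e-12)
    cos_angle = np.abs(np.sum(rays * normals, axis=-1))
    return cos_angle >= np.cos(np.deg2rad(max_angle_deg))

def color_filter(color, threshold=0.2):
    """Keep pixels whose color is within threshold of the local median color."""
    median = np.median(_windows(color), axis=0)
    return np.linalg.norm(color - median, axis=-1) <= threshold

def depth_image_optimization(depth, color, fx, fy, cx, cy):
    """Apply the spatial, normal and color filters in sequence."""
    keep, median = statistical_filter(depth)
    # Replace rejected depths by the local median so they do not distort normals.
    filled = np.where(keep, depth, median)
    points = depth_map_to_point_cloud(filled, fx, fy, cx, cy)
    keep &= filter_with_normals(points, calculate_normals(points))
    keep &= color_filter(color)
    # Rejected pixels are marked invalid with a depth of 0.
    return np.where(keep, depth, 0.0), keep
```

In this sketch `depth` is an H×W float array, `color` is an H×W×3 float array in [0, 1], and the function returns the filtered depth map together with the boolean keep mask. Working on the image grid keeps the example dependency-free; a point cloud library with a neighbour search would be the more usual choice for the statistical step.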
4. Experiments
4.1. Datasets
4.2. Evaluation Metrics
4.3. Training Strategy
4.4. Results
4.5. Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Thrun, S.; Burgard, W.; Fox, D. A real-time algorithm for mobile robot mapping with applications to multi-robot and 3D mapping. In Proceedings of the 2000 ICRA. Millennium Conference, IEEE International Conference on Robotics and Automation, Symposia Proceedings (Cat. No.00CH37065), Online, 24–28 April 2000; Volume 1, pp. 321–328.
2. Kutulakos, K.N.; Seitz, S.M. A theory of shape by space carving. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 December 1999; IEEE: Piscataway, NJ, USA, 1999; Volume 1, pp. 307–314.
3. Tomasi, C.; Kanade, T. Shape and motion from image streams: A factorization method. Proc. Natl. Acad. Sci. USA 1993, 90, 9795–9802.
4. Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42.
5. Barnes, C.; Shechtman, E.; Finkelstein, A.; Goldman, D.B. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Trans. Graph. 2009, 28, 24.
6. Campos, C.; Elvira, R.; Rodriguez, J.J.G.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
7. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
9. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1106–1114.
10. Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
11. Wu, C.Y.; Wang, J.; Hall, M.; Neumann, U.; Su, S. Toward Practical Monocular Indoor Depth Estimation. arXiv 2022, arXiv:2112.02306.
12. Zhao, Q.; Gao, X.; Li, J.; Luo, L. Optimization algorithm for point cloud quality enhancement based on statistical filtering. J. Sens. 2021, 2021, 1–10.
13. Wimbauer, F.; Yang, N.; von Stumberg, L.; Zeller, N.; Cremers, D. MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera. arXiv 2021, arXiv:2011.11814.
14. Longuet-Higgins, H.C. A computer algorithm for reconstructing a scene from two projections. Nature 1981, 293, 133–135.
15. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003.
16. Curless, B.; Levoy, M. A volumetric method for building complex models from range images. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, New York, NY, USA, 1 August 1996; pp. 303–312.
17. Seitz, S.M.; Curless, B.; Diebel, J.; Scharstein, D.; Szeliski, R. A comparison and evaluation of multi-view stereo reconstruction algorithms. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; IEEE: Piscataway, NJ, USA, 2006; Volume 1, pp. 519–528.
18. Newcombe, R.A.; Izadi, S.; Hilliges, O.; Molyneaux, D.; Kim, D.; Davison, A.J.; Kohli, P.; Shotton, J.; Hodges, S.; Fitzgibbon, A. Kinectfusion: Real-time dense surface mapping and tracking. In Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland, 26–29 October 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 127–136.
19. Kim, H.; Leutenegger, S.; Davison, A.J. Real-time 3D reconstruction and 6-DoF tracking with an event camera. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part VI 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 349–364.
20. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Corfu, Greece, 20–25 September 1999; IEEE: Piscataway, NJ, USA, 1999; Volume 2, pp. 1150–1157.
21. Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. In Proceedings of the Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Proceedings, Part I 9. Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417.
22. Lee, L.H.; Braud, T.; Zhou, P.; Wang, L.; Xu, D.; Lin, Z.; Kumar, A.; Bermejo, C.; Hui, P. All one needs to know about metaverse: A complete survey on technological singularity, virtual ecosystem, and research agenda. arXiv 2021, arXiv:2110.05352.
23. Shen, S. Accurate multiple view 3d reconstruction using patch-based stereo for large-scale scenes. IEEE Trans. Image Process. 2013, 22, 1901–1914.
24. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
25. Huang, P.H.; Matzen, K.; Kopf, J.; Ahuja, N.; Huang, J.B. DeepMVS: Learning Multi-view Stereopsis. arXiv 2018, arXiv:1804.00650.
26. Schönberger, J.L.; Frahm, J.M. Structure-from-Motion Revisited. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
27. Yao, Y.; Luo, Z.; Li, S.; Fang, T.; Quan, L. MVSNet: Depth Inference for Unstructured Multi-view Stereo. arXiv 2018, arXiv:1804.02505.
28. Wang, F.; Galliani, S.; Vogel, C.; Speciale, P.; Pollefeys, M. PatchmatchNet: Learned Multi-View Patchmatch Stereo. arXiv 2020, arXiv:2012.01411.
29. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. arXiv 2020, arXiv:2003.08934.
30. Wang, C.; Miguel Buenaposada, J.; Zhu, R.; Lucey, S. Learning Depth From Monocular Videos Using Direct Methods. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762.
32. Pilzer, A.; Lathuilière, S.; Sebe, N.; Ricci, E. Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation. arXiv 2019, arXiv:1903.04202.
33. Wang, Y.; Li, X.; Shi, M.; Xian, K.; Cao, Z. Knowledge distillation for fast and accurate monocular depth estimation on mobile devices. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2457–2465.
34. Ren, W.; Wang, L.; Piao, Y.; Zhang, M.; Lu, H.; Liu, T. Adaptive co-teaching for unsupervised monocular depth estimation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 89–105.
35. Feng, Z.; Yang, L.; Jing, L.; Wang, H.; Tian, Y.; Li, B. Disentangling object motion and occlusion for unsupervised multi-frame monocular depth. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 228–244.
36. Li, H.; Gordon, A.; Zhao, H.; Casser, V.; Angelova, A. Unsupervised monocular depth learning in dynamic scenes. In Proceedings of the Conference on Robot Learning, London, UK, 8–11 November 2021; PMLR: London, UK, 2021; pp. 1908–1917.
37. Hui, T.W. RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes. arXiv 2023, arXiv:2303.04456.
38. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
39. Newcombe, R.A.; Lovegrove, S.J.; Davison, A.J. DTAM: Dense tracking and mapping in real-time. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2320–2327.
40. Godard, C.; Aodha, O.M.; Firman, M.; Brostow, G. Digging Into Self-Supervised Monocular Depth Estimation. arXiv 2019, arXiv:1806.01260.
41. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
42. Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016, 35, 1157–1163.
43. Mallya, A.; Lazebnik, S. PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning. arXiv 2018, arXiv:1711.05769.
44. Wu, Z.; Li, Z.; Fan, Z.G.; Wu, Y.; Wang, X.; Tang, R.; Pu, J. ADU-Depth: Attention-based Distillation with Uncertainty Modeling for Depth Estimation. In Proceedings of the Conference on Robot Learning, Auckland, New Zealand, 14–18 December 2023; PMLR: London, UK, 2023; pp. 3167–3179.
45. Feng, C.; Chen, Z.; Zhang, C.; Hu, W.; Li, B.; Lu, F. Iterdepth: Iterative residual refinement for outdoor self-supervised multi-frame monocular depth estimation. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 329–341.
46. Bhat, S.F.; Birkl, R.; Wofk, D.; Wonka, P.; Müller, M. Zoedepth: Zero-shot transfer by combining relative and metric depth. arXiv 2023, arXiv:2302.12288.
47. Ke, B.; Obukhov, A.; Huang, S.; Metzger, N.; Daudt, R.C.; Schindler, K. Repurposing diffusion-based image generators for monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 9492–9502.
48. Li, Z.; Bhat, S.F.; Wonka, P. Patchfusion: An end-to-end tile-based framework for high-resolution monocular metric depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 10016–10025.
Model | AbsRel | SqRel | RMSE | RMSLE | Model Size (MB) |
---|---|---|---|---|---|
PackNet [43] | 0.080 | 0.331 | 2.914 | 0.124 | 328.0 |
MonoRec [13] | 0.050 | 0.295 | 2.266 | 0.082 | 473.0 |
ADU-Depth [44] | 0.077 | 0.290 | 2.723 | 0.113 | 341.0 |
IterDepth [45] | 0.103 | 1.160 | 3.968 | 0.166 | 502.0 |
ZoeDepth [46] | 0.061 | 0.726 | 3.218 | 0.189 | 522.1 |
Marigold [47] | 0.057 | 0.662 | 1.028 | 0.206 | 419.2 |
PatchFusion [48] | 0.046 | 0.560 | 2.656 | 0.126 | 312.9 |
KD-MonoRec | 0.043 | 0.281 | 2.259 | 0.073 | 289.0 |
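
The error columns in the table above are the usual depth-estimation measures. As a minimal sketch, assuming the conventional definitions (the exact masking and depth range used by the authors are not stated here), they could be computed as follows. The function name `depth_metrics`, the rule that pixels with zero ground-truth depth are ignored, and the use of the natural logarithm for RMSLE are assumptions made for illustration.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Compute AbsRel, SqRel, RMSE and RMSLE over pixels with valid ground truth.

    Assumes depths in the same unit and strictly positive predictions
    (the logarithm in RMSLE is undefined otherwise).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0  # assumption: zero marks missing ground truth
    pred, gt = pred[valid], gt[valid]
    diff = pred - gt
    return {
        "AbsRel": float(np.mean(np.abs(diff) / gt)),
        "SqRel": float(np.mean(diff ** 2 / gt)),
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "RMSLE": float(np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))),
    }
```

Lower is better for all four measures. AbsRel and SqRel normalize the error by the true depth, so they weight near-range errors more heavily than RMSE does.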
Model | AbsRel | SqRel | RMSE | RMSLE |
---|---|---|---|---|
MonoRec | 0.050 | 0.295 | 2.266 | 0.082 |
KDM w/o KD | 0.048 | 0.291 | 2.237 | 0.077 |
KDM w/o Point Cloud Optimization | 0.053 | 0.302 | 2.287 | 0.093 |
KD-MonoRec | 0.043 | 0.281 | 2.259 | 0.073 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xiao, J.; Zhang, K.; Xu, X.; Liu, S.; Wu, S.; Huang, Z.; Li, L. Improving Accuracy and Efficiency of Monocular Depth Estimation in Power Grid Environments Using Point Cloud Optimization and Knowledge Distillation. Energies 2024, 17, 4068. https://doi.org/10.3390/en17164068