Unsupervised 3D Reconstruction with Multi-Measure and High-Resolution Loss
Abstract
1. Introduction
2. Materials and Method
2.1. Multi-Scale Depth Estimation Based on PatchMatch
2.2. Multi-Measure and High-Resolution Loss
2.2.1. Photometric Consistency Loss
2.2.2. Semantic Consistency Loss
2.2.3. Feature Point Consistency Loss
2.2.4. High-Resolution Loss
3. Results
3.1. Performance Evaluation Based on DTU
3.1.1. Implementation Details
3.1.2. Results on the DTU Dataset
3.1.3. Memory and Run-Time Comparison
3.2. Ablation Studies
3.2.1. Effect of Different Loss Modules
3.2.2. Effect of High-Resolution Loss
3.3. Generalization Ability on Tanks and Temples
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Mi, Q.; Gao, T. 3D reconstruction based on the depth image: A review. In Proceedings of the International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, Kitakyushu, Japan, 29 June–1 July 2022; pp. 172–183.
2. Jalayer, S.; Sharifi, A.; Abbasi-Moghadam, D.; Tariq, A.; Qin, S. Modeling and predicting land use land cover spatiotemporal changes: A case study in Chalus watershed, Iran. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 5496–5513.
3. Darmon, F.; Bascle, B.; Devaux, J.C.; Monasse, P.; Aubry, M. Deep multi-view stereo gone wild. In Proceedings of the International Conference on 3D Vision (3DV), Prague, Czech Republic, 1–3 December 2021; pp. 484–493.
4. Sinha, S.N.; Mordohai, P.; Pollefeys, M. Multi-view stereo via graph cuts on the dual of an adaptive tetrahedral mesh. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–21 October 2007; pp. 1–8.
5. Ulusoy, A.O.; Black, M.J.; Geiger, A. Semantic multi-view stereo: Jointly estimating objects and voxels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4531–4540.
6. Furukawa, Y.; Ponce, J. Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1362–1376.
7. Locher, A.; Perdoch, M.; Van Gool, L. Progressive prioritized multi-view stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3244–3252.
8. Galliani, S.; Lasinger, K.; Schindler, K. Massively parallel multiview stereopsis by surface normal diffusion. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 873–881.
9. Xu, Q.; Tao, W. Multi-scale geometric consistency guided multi-view stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5483–5492.
10. Aanæs, H.; Jensen, R.R.; Vogiatzis, G.; Tola, E.; Dahl, A.B. Large-scale data for multiple-view stereopsis. Int. J. Comput. Vis. 2016, 120, 153–168.
11. Knapitsch, A.; Park, J.; Zhou, Q.Y.; Koltun, V. Tanks and Temples: Benchmarking large-scale scene reconstruction. ACM Trans. Graph. 2017, 36, 1–13.
12. Ji, M.; Gall, J.; Zheng, H.; Liu, Y.; Fang, L. SurfaceNet: An end-to-end 3D neural network for multiview stereopsis. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2307–2315.
13. Kar, A.; Häne, C.; Malik, J. Learning a multi-view stereo machine. arXiv 2017, arXiv:1708.05375.
14. Yao, Y.; Luo, Z.; Li, S.; Fang, T.; Quan, L. MVSNet: Depth inference for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 767–783.
15. Yu, Z.; Gao, S. Fast-MVSNet: Sparse-to-dense multi-view stereo with learned propagation and Gauss-Newton refinement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1949–1958.
16. Yang, J.; Mao, W.; Alvarez, J.M.; Liu, M. Cost volume pyramid based depth inference for multi-view stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4877–4886.
17. Khot, T.; Agrawal, S.; Tulsiani, S.; Mertz, C.; Lucey, S.; Hebert, M. Learning unsupervised multi-view stereopsis via robust photometric consistency. arXiv 2019, arXiv:1905.02706.
18. Dai, Y.; Zhu, Z.; Rao, Z.; Li, B. MVS2: Deep unsupervised multi-view stereo with multi-view symmetry. In Proceedings of the International Conference on 3D Vision (3DV), Québec City, QC, Canada, 16–19 September 2019; pp. 1–8.
19. Huang, B.; Yi, H.; Huang, C.; He, Y.; Liu, J.; Liu, X. M3VSNet: Unsupervised multi-metric multi-view stereo network. In Proceedings of the International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 3163–3167.
20. Xu, H.; Zhou, Z.; Qiao, Y.; Kang, W.; Wu, Q. Self-supervised multi-view stereo via effective co-segmentation and data-augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 2–9 February 2021; pp. 3030–3038.
21. Yao, Y.; Luo, Z.; Li, S.; Shen, T.; Fang, T.; Quan, L. Recurrent MVSNet for high-resolution multi-view stereo depth inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5525–5534.
22. Luo, K.; Guan, T.; Ju, L.; Huang, H.; Luo, Y. P-MVSNet: Learning patch-wise matching confidence aggregation for multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10452–10461.
23. Gu, X.; Fan, Z.; Zhu, S.; Dai, Z.; Tan, F.; Tan, P. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2495–2504.
24. Chen, R.; Han, S.; Xu, J.; Su, H. Point-based multi-view stereo network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1538–1547.
25. Wang, F.; Galliani, S.; Vogel, C.; Speciale, P.; Pollefeys, M. PatchmatchNet: Learned multi-view patchmatch stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14194–14203.
26. Barnes, C.; Shechtman, E.; Finkelstein, A.; Goldman, D.B. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Trans. Graph. 2009, 28, 24–33.
27. Yang, J.; Alvarez, J.M.; Liu, M. Self-supervised learning of depth inference for multi-view stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7526–7534.
28. Hui, T.W.; Loy, C.C.; Tang, X. Depth map super-resolution by deep multi-scale guidance. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 353–369.
29. Xu, H.; Zhou, Z.; Wang, Y.; Kang, W.; Sun, B.; Li, H.; Qiao, Y. Digging into uncertainty in self-supervised multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 6078–6087.
30. Bansal, M.; Kumar, M.; Kumar, M. 2D object recognition: A comparative analysis of SIFT, SURF and ORB feature descriptors. Multimed. Tools Appl. 2021, 80, 18839–18857.
31. Tola, E.; Strecha, C.; Fua, P. Efficient large-scale multi-view stereo for ultra high-resolution image sets. Mach. Vis. Appl. 2012, 23, 903–920.
32. Campbell, N.D.; Vogiatzis, G.; Hernández, C.; Cipolla, R. Using multiple hypotheses to improve depth-maps for multi-view stereo. In Proceedings of the European Conference on Computer Vision, Marseille, France, 12–18 October 2008; pp. 766–779.
Category | Method | Acc. (mm) | Comp. (mm) | Overall (mm)
---|---|---|---|---
Geo. | Furu [6] | 0.613 | 0.941 | 0.777
Geo. | Tola [31] | 0.342 | 1.190 | 0.766
Geo. | Camp [32] | 0.835 | 0.554 | 0.694
Geo. | Gipuma [8] | 0.283 | 0.873 | 0.578
Sup. | SurfaceNet [12] | 0.450 | 1.040 | 0.745
Sup. | MVSNet [14] | 0.444 | 0.741 | 0.592
Sup. | R-MVSNet [21] | 0.383 | 0.452 | 0.417
Sup. | PatchmatchNet [25] | 0.427 | 0.277 | 0.352
UnSup. | Unsup_MVS [17] | 0.881 | 1.073 | 0.977
UnSup. | MVS2 [18] | 0.760 | 0.515 | 0.637
UnSup. | M3VSNet [19] | 0.636 | 0.531 | 0.583
UnSup. | Ours | 0.513 | 0.489 | 0.501
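Note: lower is better for all three DTU metrics, and the Overall column is the arithmetic mean of accuracy and completeness, consistent with every row of the table; for example, for Ours, (0.513 + 0.489) / 2 = 0.501 mm.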
Method | Semantic Consistency Loss | Feature Point Consistency Loss | High-Resolution Loss | Acc. (mm) | Comp. (mm) | Overall (mm) |
---|---|---|---|---|---|---|
(a) | × | × | × | 0.714 | 0.662 | 0.688 |
(b) | √ | × | × | 0.696 | 0.534 | 0.615 |
(c) | √ | √ | × | 0.613 | 0.519 | 0.566 |
(d) | √ | √ | √ | 0.513 | 0.489 | 0.501 |
Method | Acc. (mm) | Comp. (mm) | Overall (mm) |
---|---|---|---|
(a) | 0.613 | 0.519 | 0.566 |
(b) | 0.581 | 0.511 | 0.546 |
(c) | 0.537 | 0.505 | 0.521 |
(d) | 0.513 | 0.489 | 0.501 |
Method | Mean | Family | Francis | Horse | Lighthouse | M60 | Panther | Playground | Train |
---|---|---|---|---|---|---|---|---|---|
MVS2 | 37.27 | 47.74 | 21.55 | 19.50 | 44.54 | 44.86 | 46.32 | 43.38 | 29.72 |
M3VSNet | 37.67 | 47.74 | 24.38 | 18.74 | 44.42 | 43.45 | 44.95 | 47.39 | 30.31 |
Ours | 43.91 | 49.00 | 48.56 | 25.74 | 49.52 | 39.77 | 42.44 | 51.14 | 45.15 |
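Note: following the Tanks and Temples benchmark convention [11], these per-scene values are F-scores (%) on the intermediate set, so higher is better; the Mean column averages the eight scene scores.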
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zheng, Y.; Luo, J.; Chen, W.; Zhang, Y.; Sun, H.; Pan, Z. Unsupervised 3D Reconstruction with Multi-Measure and High-Resolution Loss. Sensors 2023, 23, 136. https://doi.org/10.3390/s23010136