Large-Scale 3D Reconstruction from Multi-View Imagery: A Comprehensive Review
Abstract
1. Introduction
2. Traditional Methods
2.1. Sparse Reconstruction: SfM
2.1.1. Incremental SfM
- Sensitivity to the choice of the initial image pair, which caps the quality of the final reconstruction at that of the initialization.
- Accumulation of error as new images are registered, resulting in the scene-drift phenomenon.
- An iterative pipeline in which bundle adjustment is re-run each time an image is added, causing substantial redundant computation and low reconstruction efficiency (see the sketch after this list).
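To make the repeated-optimization point concrete, the following is a minimal sketch of the incremental loop written against OpenCV. It is illustrative only: it assumes a known intrinsic matrix K and pre-loaded grayscale images, and omits the track management, outlier filtering, and repeated bundle adjustment that production systems such as COLMAP [24] build around this skeleton.

```python
import cv2
import numpy as np

def initialize_pair(img1, img2, K):
    """Bootstrap: relative pose of an initial image pair + first triangulation."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    knn = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in knn if m.distance < 0.75 * n.distance]  # Lowe ratio test
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous points
    return (R, t), (X[:3] / X[3]).T

def register_image(pts3d, pts2d, K):
    """Register one new image against already-triangulated points via PnP.

    pts3d/pts2d (Nx3 and Nx2 float arrays) are the 3D-2D correspondences
    found by feature matching. After this step, new points are triangulated
    and bundle adjustment is re-run over all cameras and points; that is
    the expensive step the list above refers to.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec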
2.1.2. Global SfM
- Global SfM aims to optimize the camera poses and 3D scene structure simultaneously, ensuring that the entire reconstruction is globally consistent. This results in more accurate and reliable reconstructions.
- Global SfM typically employs optimization techniques such as global bundle adjustment, allowing it to provide high-precision estimates of camera parameters and 3D point clouds.
- Global SfM is typically suitable for large-scale scenes.
- Global SfM methods are computationally intensive and may require significant amounts of time and computational resources, especially for large datasets with many images and 3D points.
- Global camera position (translation) estimation is unstable, since translation averaging is weakly constrained, e.g., under near-collinear camera motion.
- Global SfM can be sensitive to outliers in the data: incorrect correspondences or noisy relative-pose measurements can significantly distort the global optimization (a minimal rotation-averaging sketch follows this list).
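As an illustration of the motion-averaging stage at the heart of global SfM, below is a minimal least-squares ("chordal") rotation-averaging sketch. It is not any published method's implementation; robust pipelines replace the plain least-squares cost with, e.g., L1 Weiszfeld averaging [32] precisely because of the outlier sensitivity noted above.

```python
import numpy as np

def chordal_rotation_averaging(num_cams, rel_rots, anchor=0):
    """Least-squares (chordal) rotation averaging sketch.

    rel_rots maps camera pairs (i, j) -> 3x3 relative rotation R_ij with
    the convention R_j ~ R_ij @ R_i. Minimizes sum ||R_j - R_ij R_i||_F^2
    over unconstrained 3x3 blocks, fixing the anchor camera at identity,
    then projects each solved block onto SO(3).
    """
    rows = 9 * (len(rel_rots) + 1)
    A = np.zeros((rows, 9 * num_cams))
    b = np.zeros(rows)
    for k, ((i, j), Rij) in enumerate(rel_rots.items()):
        # Encode vec(R_j) - (I3 kron R_ij) vec(R_i) = 0, column-major vec().
        A[9*k:9*(k+1), 9*j:9*(j+1)] = np.eye(9)
        A[9*k:9*(k+1), 9*i:9*(i+1)] = -np.kron(np.eye(3), Rij)
    # Gauge constraint: the anchor camera's rotation is the identity.
    A[-9:, 9*anchor:9*(anchor+1)] = np.eye(9)
    b[-9:] = np.eye(3).flatten(order="F")
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    rotations = []
    for c in range(num_cams):
        M = x[9*c:9*(c+1)].reshape(3, 3, order="F")
        U, _, Vt = np.linalg.svd(M)  # project onto SO(3), keep det = +1
        rotations.append(U @ np.diag([1, 1, np.linalg.det(U @ Vt)]) @ Vt)
    return rotations
```

In practice, rel_rots would be populated from the pairwise essential-matrix decompositions of the view graph; the anchor camera fixes the global rotation gauge.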
2.1.3. Distributed SfM
2.2. Dense Reconstruction: Stereo Matching and MVS
2.2.1. Stereo Matching
Local Matching Method
Global Matching Method
Semi-Global Matching Method
2.2.2. Multi-View Stereo
3. Learning-Based Methods
3.1. SfM with Deep Learning
3.2. Stereo Matching with Deep Learning
3.2.1. Non-End-to-End Methods
3.2.2. End-to-End Methods
3.3. MVS with Deep Learning
3.4. Neural Radiance Fields
- Accurate six-DoF camera pose estimation;
- Normalization of lighting conditions to avoid overexposed scenes;
- Handling open outdoor scenes and dynamic objects;
- Striking a balance between accuracy and computational efficiency; the cost is dominated by the volume rendering quadrature shown below, which queries an MLP many times per pixel.
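The efficiency concern in the last item comes from how NeRF renders. In the discrete quadrature of Mildenhall et al. [129], every pixel's color is an alpha-composite over many MLP queries along its ray, so a single high-resolution image already costs millions of network evaluations:

```latex
% Ray r(t) = o + t d sampled at depths t_1 < ... < t_N; the MLP returns a
% density sigma_i and color c_i at each sample, with delta_i = t_{i+1} - t_i.
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr)
```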
4. Datasets and Evaluation Metrics
4.1. Structure from Motion
4.1.1. Datasets
4.1.2. Evaluation Metrics
- Registered: the number of successfully registered images. More registered images mean more of the input is used in the SfM reconstruction, which indirectly indicates accurate point geometry, since registration depends on the accuracy of the intermediate triangulated points.
- Points: the number of points in the sparse point cloud. More points indicate better agreement between the estimated camera poses and the 2D observations, since triangulation accuracy depends on both.
- Track: the number of 2D observations corresponding to each 3D point. The longer a point's track, the more observations constrain it, which indirectly indicates higher accuracy.
- Reprojection Error: the average distance between each 3D point projected into each frame with the estimated poses and the actually detected 2D point (formalized below). The smaller the reprojection error, the more accurate the overall structure.
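For reference, the reprojection error can be written as follows, where $\mathcal{O}$ is the set of 2D observations, $X_i$ the $i$-th 3D point, $(R_j, t_j)$ the pose of frame $j$, $K$ the intrinsic matrix, $\pi(\cdot)$ the perspective division, and $x_{ij}$ the detected 2D feature:

```latex
e_{\mathrm{reproj}}
  = \frac{1}{|\mathcal{O}|} \sum_{(i,j) \in \mathcal{O}}
    \bigl\| \pi\!\left( K \left( R_j X_i + t_j \right) \right) - x_{ij} \bigr\|_2,
\qquad
\pi\!\left( [u, v, w]^{\top} \right) = [\,u/w,\; v/w\,]^{\top}
```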
4.2. Stereo Matching
4.2.1. Datasets
4.2.2. Evaluation Metrics
- The false matching rate (bad-pixel rate) is the fraction of evaluated pixels whose disparity error exceeds a threshold $\delta_d$: $B = \frac{1}{N} \sum_{(x,y)} \mathbb{1}\!\left( \left| d_{\mathrm{est}}(x,y) - d_{\mathrm{gt}}(x,y) \right| > \delta_d \right)$.
- MAE: the mean absolute disparity error, $\mathrm{MAE} = \frac{1}{N} \sum_{(x,y)} \left| d_{\mathrm{est}}(x,y) - d_{\mathrm{gt}}(x,y) \right|$.
- RMSE: the root-mean-square disparity error, $\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{(x,y)} \left( d_{\mathrm{est}}(x,y) - d_{\mathrm{gt}}(x,y) \right)^2 }$, which penalizes large errors more heavily than the MAE.

Here, $N$ is the number of pixels with valid ground truth (a numpy sketch of all three metrics follows).
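A minimal numpy sketch of the three disparity metrics above; the 3 px default threshold mirrors the common KITTI convention and is an assumption, not something fixed by the definitions:

```python
import numpy as np

def disparity_metrics(d_est, d_gt, valid=None, tau=3.0):
    """False matching rate, MAE, and RMSE on dense HxW disparity maps."""
    if valid is None:
        valid = np.isfinite(d_gt)          # evaluate only where GT exists
    err = np.abs(d_est[valid] - d_gt[valid])
    return {
        "false_matching_rate": float(np.mean(err > tau)),
        "MAE": float(np.mean(err)),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
    }
```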
4.3. Multi-View Stereo
4.3.1. Datasets
4.3.2. Evaluation Metrics
- Accuracy: For each estimated 3D point, a true 3D point is found within a certain threshold, and the final matching ratio is the accuracy. It should be noted that, since the ground truth of the point cloud itself is incomplete, it is necessary to estimate the unobservable part of the ground truth first and ignore it when estimating the accuracy.
- Completeness: The nearest estimated 3D point is found within a certain threshold for each true 3D point, and the final matching ratio is the completeness.
- F1-Score: there is a trade-off between accuracy and completeness, because the entire space could be filled with points to reach 100% completeness, or only a few absolutely accurate points kept to obtain a very high accuracy. The final evaluation metric therefore combines both: with accuracy p and completeness r, the F1-score is their harmonic mean, $F_1 = \frac{2pr}{p + r}$ (a point-cloud sketch of all three metrics follows).
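A sketch of all three point-cloud metrics using a k-d tree. The threshold is in scene units, and, per the note on accuracy above, the unobservable part of the ground truth is assumed to have been masked out beforehand:

```python
import numpy as np
from scipy.spatial import cKDTree

def mvs_metrics(est_pts, gt_pts, threshold):
    """Accuracy, completeness, and F1 for Nx3 point-cloud arrays."""
    d_est_to_gt = cKDTree(gt_pts).query(est_pts)[0]  # nearest GT per estimated point
    d_gt_to_est = cKDTree(est_pts).query(gt_pts)[0]  # nearest estimate per GT point
    p = float(np.mean(d_est_to_gt < threshold))      # accuracy (precision)
    r = float(np.mean(d_gt_to_est < threshold))      # completeness (recall)
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0   # harmonic mean
    return p, r, f1
```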
Methods | Mean | Family | Francis | Horse | Lighthouse | M60 | Panther | Playground | Train |
---|---|---|---|---|---|---|---|---|---|
MVSNet [104] | 43.48 | 55.99 | 28.55 | 25.07 | 50.79 | 53.96 | 50.86 | 47.9 | 34.69 |
RMVSNet [105] | 48.4 | 69.96 | 46.65 | 32.59 | 42.95 | 51.88 | 48.8 | 52 | 42.38 |
PointMVSNet [106] | 48.27 | 61.79 | 41.15 | 34.2 | 50.79 | 51.97 | 50.85 | 52.38 | 43.06 |
P-MVSNet [107] | 55.62 | 70.04 | 44.64 | 40.22 | 65.2 | 55.08 | 55.17 | 60.37 | 54.29 |
MVSCRF [108] | 45.73 | 59.83 | 30.6 | 29.93 | 51.15 | 50.61 | 51.45 | 52.6 | 39.68 |
PVA-MVSNet [109] | 54.46 | 69.36 | 46.8 | 46.01 | 55.74 | 57.23 | 54.75 | 56.7 | 49.06 |
Fast-MVSNet [110] | 47.39 | 65.18 | 39.59 | 34.98 | 47.81 | 49.16 | 46.2 | 53.27 | 42.91 |
CasMVSNet [111] | 56.84 | 76.37 | 58.45 | 46.26 | 55.81 | 56.11 | 54.06 | 57.18 | 49.51 |
CVP-MVSNet [112] | 54.03 | 76.5 | 47.74 | 36.34 | 55.12 | 57.28 | 54.28 | 57.43 | 47.54 |
DSC-MVSNet [121] | 53.48 | 68.06 | 47.43 | 41.6 | 54.96 | 56.73 | 53.86 | 53.46 | 51.71 |
vis-MVSNet [122] | 60.03 | 77.4 | 60.23 | 47.07 | 63.44 | 62.21 | 57.28 | 60.54 | 52.07 |
AACVP-MVSNet [124] | 58.39 | 78.71 | 57.85 | 50.34 | 52.76 | 59.73 | 54.81 | 57.98 | 54.94 |
4.4. Neural Radiance Fields
4.4.1. Datasets
4.4.2. Evaluation Metrics
- Luminance: $l(x,y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$, where $\mu_x$ and $\mu_y$ are the mean intensities of the two images.
- Contrast: $c(x,y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$, where $\sigma_x$ and $\sigma_y$ are the standard deviations.
- Structural Score: $s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$, where $\sigma_{xy}$ is the covariance.
- SSIM: the product of the three terms above; with the usual choices of equal exponents and $C_3 = C_2/2$ [172], this simplifies to $\mathrm{SSIM}(x,y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$ (a minimal sketch follows).
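A minimal numpy sketch of the simplified single-window SSIM above. The standard metric of Wang et al. [172] averages the same statistic over local Gaussian windows, as implemented in, e.g., skimage.metrics.structural_similarity:

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two images (float arrays of equal shape)."""
    c1 = (0.01 * data_range) ** 2            # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2            # stabilizes contrast/structure terms
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```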
4.5. Comprehensive Datasets
5. Results and Discussion
- Reconstructing areas with repetitive or weak textures, such as lakes and walls, often fails and leaves holes in the reconstructed models; the reconstruction of fine object details also remains insufficiently accurate.
- The construction of datasets for large-scale outdoor scenes is crucial for the development of 3D reconstruction techniques. Currently, there is a scarcity of dedicated datasets for large-scale outdoor scenes, especially city-level real-world scenes.
- The current methods for the 3D reconstruction of large-scale scenes are time-intensive and unable to facilitate real-time reconstruction. Despite the implementation of strategies such as scene partitioning during training and the utilization of computing clusters to expedite the process, these methods still fall short of achieving the efficiency levels required for real-time industrial applications.
- Outdoor scenes contain large numbers of dynamic objects that can significantly impact processes such as image feature matching and camera pose estimation, leading to a decrease in the accuracy of the reconstructed models.
- Addressing the issue of regions with weak textures: Previous studies have focused on incorporating semantic information in indoor scenes to recognize and constrain weak-texture areas, thereby improving reconstruction accuracy. However, in the context of the reconstruction of large-scale outdoor scenes, it is crucial to integrate semantic information not only for areas with weak textures but also for common objects in outdoor scenes, such as buildings and dynamic objects. This integration of semantic information represents a significant research direction.
- Building large-scale real-world datasets: Constructing comprehensive datasets for city scenes using data from satellites, aerial planes, drones, and other sources is of paramount importance. Additionally, there is a need for more robust evaluation algorithms for 3D reconstruction. The current metrics, which are largely borrowed from the 2D image domain, may not fully capture the complexities of 3D reconstruction. Future research should focus on developing evaluation algorithms that combine global and local aspects, as well as visual and geometric accuracy, to provide a more comprehensive assessment of 3D reconstruction results.
- Real-time reconstruction: Image-based 3D reconstruction is computationally intensive, making real-time reconstruction a significant challenge. Recent studies have explored methods such as federated learning, where individual drones train using their own data, to improve efficiency. Therefore, integrating techniques such as federated learning and scene partitioning to train lightweight network models using large-scale scene data will be a crucial and challenging research area for achieving the real-time 3D reconstruction of outdoor scenes. This research has significant implications for applications in areas such as smart cities and search-and-rescue missions.
- Fusion of images with other sensors: Another valuable direction is the exploration of efficient fusion techniques that combine images with other sensor data, such as LiDAR, to address challenges posed by large and complex scenes, including unconventional architecture, vegetation, and occlusions, during the reconstruction of outdoor scenes. Effectively integrating multiple sensor modalities can significantly improve reconstruction accuracy, in particular by enforcing the planarity of irregular structures and restoring ground points in scenes with dense vegetation.
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Yu, H.; Feng, S.; Cui, L. Research on multi-scale 3D modeling method for urban digital twin. Appl. Electron. Tech. 2022, 48, 78–80+85. [Google Scholar] [CrossRef]
- Martinez Espejo Zaragoza, I.; Caroti, G.; Piemonte, A. The use of image and laser scanner survey archives for cultural heritage 3D modelling and change analysis. ACTA IMEKO 2021, 10, 114–121. [Google Scholar] [CrossRef]
- Liu, Z.; Dai, Z.; Tian, S. Review of non-contact three-dimensional reconstruction techniques. Sci. Technol. Eng. 2022, 22, 9897–9908. [Google Scholar]
- Tachella, J.; Altmann, Y.; Mellado, N.; McCarthy, A.; Tobin, R.; Buller, G.S.; Tourneret, J.Y.; McLaughlin, S. Real-time 3D reconstruction from single-photon lidar data using plug-and-play point cloud denoisers. Nat. Commun. 2019, 10, 4984. [Google Scholar] [CrossRef] [PubMed]
- Wang, J.; Liang, Y. Generation and detection of structured light: A review. Front. Phys. 2021, 9, 688284. [Google Scholar] [CrossRef]
- Liu, B.; Yang, F.; Huang, Y.; Zhang, Y.; Wu, G. Single-Shot Three-Dimensional Reconstruction Using Grid Pattern-Based Structured-Light Vision Method. Appl. Sci. 2022, 12, 10602. [Google Scholar] [CrossRef]
- Wang, C.; Wen, C.; Dai, Y.; Yu, S.; Liu, M. Urban 3D modeling with mobile laser scanning: A review. Virtual Real. Intell. Hardw. 2020, 2, 175–212. [Google Scholar] [CrossRef]
- Rüfenacht, D.; Fredembach, C.; Süsstrunk, S. Automatic and accurate shadow detection using near-infrared information. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 1672–1678. [Google Scholar] [CrossRef] [PubMed]
- Panchal, M.H.; Gamit, N.C. A comprehensive survey on shadow detection techniques. In Proceedings of the 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India, 23–25 March 2016; pp. 2249–2253. [Google Scholar]
- Tychola, K.A.; Tsimperidis, I.; Papakostas, G.A. On 3D reconstruction using RGB-D cameras. Digital 2022, 2, 401–421. [Google Scholar] [CrossRef]
- Wadhwa, P.; Thielemans, K.; Efthimiou, N.; Wangerin, K.; Keat, N.; Emond, E.; Deller, T.; Bertolli, O.; Deidda, D.; Delso, G.; et al. PET image reconstruction using physical and mathematical modelling for time of flight PET-MR scanners in the STIR library. Methods 2021, 185, 110–119. [Google Scholar] [CrossRef]
- Woodham, R.J. Photometric method for determining surface orientation from multiple images. Opt. Eng. 1980, 19, 139–144. [Google Scholar] [CrossRef]
- Ju, Y.; Shi, B.; Chen, Y.; Zhou, H.; Dong, J.; Lam, K.M. GR-PSN: Learning to Estimate Surface Normal and Reconstruct Photometric Stereo Images. IEEE Trans. Vis. Comput. Graph. 2023. online ahead of print. [Google Scholar] [CrossRef] [PubMed]
- Yang, Y.; Liu, J.; Ni, Y.; Li, C.; Wang, Z. Accurate normal measurement of non-Lambertian complex surface based on photometric stereo. IEEE Trans. Instrum. Meas. 2023, 72, 5032511. [Google Scholar] [CrossRef]
- Ikehata, S. Scalable, Detailed and Mask-Free Universal Photometric Stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 13198–13207. [Google Scholar]
- Zheng, T.X.; Huang, S.; Li, Y.F.; Feng, M.C. Key techniques for vision based 3D reconstruction: A review. Acta Autom. Sin. 2020, 46, 631–652. [Google Scholar]
- Jiang, S.; Jiang, C.; Jiang, W. Efficient structure from motion for large-scale UAV images: A review and a comparison of SfM tools. ISPRS J. Photogramm. Remote Sens. 2020, 167, 230–251. [Google Scholar] [CrossRef]
- Snavely, N.; Seitz, S.M.; Szeliski, R. Photo tourism: Exploring photo collections in 3D. ACM Trans. Graph. 2006, 25, 835–846. [Google Scholar] [CrossRef]
- Liang, Y.; Yang, Y.; Fan, X.; Cui, T. Efficient and Accurate Hierarchical SfM Based on Adaptive Track Selection for Large-Scale Oblique Images. Remote Sens. 2023, 15, 1374. [Google Scholar] [CrossRef]
- Ye, Z.; Bao, C.; Zhou, X.; Liu, H.; Bao, H.; Zhang, G. EC-SfM: Efficient Covisibility-based Structure-from-Motion for Both Sequential and Unordered Images. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 110–123. [Google Scholar] [CrossRef]
- Chen, Y.; Yu, Z.; Song, S.; Yu, T.; Li, J.; Lee, G.H. AdaSfM: From Coarse Global to Fine Incremental Adaptive Structure from Motion. arXiv 2023, arXiv:2301.12135. [Google Scholar]
- Moulon, P.; Monasse, P.; Marlet, R. Adaptive structure from motion with a contrario model estimation. In Proceedings of the Computer Vision–ACCV 2012: 11th Asian Conference on Computer Vision, Daejeon, Republic of Korea, 5–9 November 2012; Revised Selected Papers, Part IV 11. Springer: Berlin/Heidelberg, Germany, 2013; pp. 257–270. [Google Scholar]
- Wu, C. Towards linear-time incremental structure from motion. In Proceedings of the 2013 International Conference on 3D Vision-3DV, Seattle, WA, USA, 29 June–1 July 2013; pp. 127–134. [Google Scholar]
- Schonberger, J.L.; Frahm, J.M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113. [Google Scholar]
- Zhu, S.; Shen, T.; Zhou, L.; Zhang, R.; Fang, T.; Quan, L. Accurate, Scalable and Parallel Structure from Motion. Ph.D. Thesis, Hong Kong University of Science and Technology, Hong Kong, China, 2017. [Google Scholar]
- Qu, Y.; Huang, J.; Zhang, X. Rapid 3D reconstruction for image sequence acquired from UAV camera. Sensors 2018, 18, 225. [Google Scholar] [CrossRef]
- Duan, J. Incremental monocular SFM 3D reconstruction method based on graph optimization. Jiangsu Sci. Technol. Inf. 2019, 36, 37–40. [Google Scholar]
- Liu, B.; Liu, X.; Zhang, H. Linear incremental 3D sparse reconstruction system design. Electron. Opt. Control 2019, 26, 100–104+109. [Google Scholar]
- Cui, H.; Shen, S.; Gao, W.; Liu, H.; Wang, Z. Efficient and robust large-scale structure-from-motion via track selection and camera prioritization. ISPRS J. Photogramm. Remote Sens. 2019, 156, 202–214. [Google Scholar] [CrossRef]
- Sturm, P.; Triggs, B. A factorization based algorithm for multi-image projective structure and motion. In Proceedings of the Computer Vision—ECCV’96: 4th European Conference on Computer Vision, Cambridge, UK, 15–18 April 1996; Proceedings Volume II 4. Springer: London, UK, 1996; pp. 709–720. [Google Scholar]
- Crandall, D.; Owens, A.; Snavely, N.; Huttenlocher, D. Discrete-continuous optimization for large-scale structure from motion. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 3001–3008. [Google Scholar]
- Hartley, R.; Aftab, K.; Trumpf, J. L1 rotation averaging using the Weiszfeld algorithm. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 3041–3048. [Google Scholar]
- Wilson, K.; Snavely, N. Robust global translations with 1dsfm. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part III 13. Springer: Cham, Switzerland, 2014; pp. 61–75. [Google Scholar]
- Sweeney, C.; Sattler, T.; Hollerer, T.; Turk, M.; Pollefeys, M. Optimizing the viewing graph for structure-from-motion. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 801–809. [Google Scholar]
- Cui, H.; Shen, S.; Gao, W.; Hu, Z. Efficient large-scale structure from motion by fusing auxiliary imaging information. IEEE Trans. Image Process. 2015, 24, 3561–3573. [Google Scholar] [PubMed]
- Zhu, S.; Zhang, R.; Zhou, L.; Shen, T.; Fang, T.; Tan, P.; Quan, L. Very large-scale global sfm by distributed motion averaging. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4568–4577. [Google Scholar]
- Pang, Q. Research on Fast 3D Reconstruction Technology of Field Scene Based on UAV Image. Ph.D. Thesis, Hong Kong University of Science and Technology, Hong Kong, China, 2022. [Google Scholar]
- Yu, G.; Liu, X.; Shi, C.; Wang, Z. A robust 3D reconstruction method of UAV images. Bull. Surv. Mapp. 2022, 76–81. [Google Scholar]
- Cui, H.; Gao, X.; Shen, S.; Hu, Z. HSfM: Hybrid structure-from-motion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1212–1221. [Google Scholar]
- Wang, X.; Xiao, T.; Kasten, Y. A hybrid global structure from motion method for synchronously estimating global rotations and global translations. ISPRS J. Photogramm. Remote Sens. 2021, 174, 35–55. [Google Scholar] [CrossRef]
- Li, D.; Xu, L.; Tang, X.S.; Sun, S.; Cai, X.; Zhang, P. 3D imaging of greenhouse plants with an inexpensive binocular stereo vision system. Remote Sens. 2017, 9, 508. [Google Scholar] [CrossRef]
- Zhang, W.; Liu, B.; Li, H. Characteristic point extracts and the match algorithm based on the binocular vision in three dimensional reconstruction. Remote Sens. 2008, 9, 508. [Google Scholar]
- Nguyen, P.H.; Ahn, C.W. Stereo matching methods for imperfectly rectified stereo images. Symmetry 2019, 11, 570. [Google Scholar]
- Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42. [Google Scholar] [CrossRef]
- Hamzah, R.A.; Ibrahim, H.; Hassan, A.H.A. Stereo matching algorithm based on per pixel difference adjustment, iterative guided filter and graph segmentation. J. Vis. Commun. Image Represent. 2017, 42, 145–160. [Google Scholar] [CrossRef]
- Hamzah, R.A.; Ibrahim, H. Literature survey on stereo vision disparity map algorithms. J. Sens. 2016, 2016, 8742920. [Google Scholar] [CrossRef]
- Zheng, G.W.; Jiang, X.H. A fast stereo matching algorithm based on fixed-window. Appl. Mech. Mater. 2013, 411, 1305–1313. [Google Scholar] [CrossRef]
- Yang, C.; Li, Y.; Zhong, W.; Chen, S. Real-time hardware stereo matching using guided image filter. In Proceedings of the 26th Edition on Great Lakes Symposium on VLSI, Boston, MA, USA, 18–20 May 2016; pp. 105–108. [Google Scholar]
- Hirschmüller, H.; Innocent, P.R.; Garibaldi, J. Real-time correlation-based stereo vision with reduced border errors. Int. J. Comput. Vis. 2002, 47, 229–246. [Google Scholar] [CrossRef]
- Yoon, K.J.; Kweon, I.S. Adaptive support-weight approach for correspondence search. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 650–656. [Google Scholar] [CrossRef]
- Wang, Z.F.; Zheng, Z.G. A region based stereo matching algorithm using cooperative optimization. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 24–26 June 2008; pp. 1–8. [Google Scholar]
- Liu, X. Research on Stereo Matching Algorithm Based on Binocular Stereo vision. Ph.D. Thesis, Central South University, Changsha, China, 2011. [Google Scholar]
- Zhong, D.; Yao, J.; Guo, T. Stereo Matching Algorithm Based on Image Segmentation. Video Eng. 2014, 38, 5–7+12. [Google Scholar]
- Brown, M.Z.; Burschka, D.; Hager, G.D. Advances in computational stereo. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 993–1008. [Google Scholar] [CrossRef]
- Sung, M.C.; Lee, S.H.; Cho, N.I. Stereo Matching Using Multi-directional Dynamic Programming. In Proceedings of the 2006 International Symposium on Intelligent Signal Processing and Communications, Yonago, Japan, 12–15 December 2006; pp. 697–700. [Google Scholar]
- Li, K.; Wang, S.; Yuan, M.; Chen, N. Scale invariant control points based stereo matching for dynamic programming. In Proceedings of the 2009 9th International Conference on Electronic Measurement & Instruments, Beijing, China, 16–19 August 2009; pp. 3–769. [Google Scholar]
- Hu, T.; Qi, B.; Wu, T.; Xu, X.; He, H. Stereo matching using weighted dynamic programming on a single-direction four-connected tree. Comput. Vis. Image Underst. 2012, 116, 908–921. [Google Scholar] [CrossRef]
- Sun, J.; Zheng, N.N.; Shum, H.Y. Stereo matching using belief propagation. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 787–800. [Google Scholar]
- Zhou, Z.; Fan, J.; Zhao, J.; Liu, X. Parallel stereo matching algorithm base on belief propagation. Opt. Precis. Eng. 2011, 19, 2774–2781. [Google Scholar] [CrossRef]
- Hong, L.; Chen, G. Segment-based stereo matching using graph cuts. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004, Washington, DC, USA, 19 July 2004; Volume 1, p. I. [Google Scholar]
- Bleyer, M.; Gelautz, M. Graph-cut-based stereo matching using image segmentation with symmetrical treatment of occlusions. Signal Process. Image Commun. 2007, 22, 127–143. [Google Scholar] [CrossRef]
- Lempitsky, V.; Rother, C.; Blake, A. Logcut-efficient graph cut optimization for markov random fields. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio De Janeiro, Brazil, 14–21 October 2007; pp. 1–8. [Google Scholar]
- He, X.; Zhang, G.; Dong, J. Improved stereo matching algorithm based on image segmentation. Microelectron. Comput. 2014, 31, 61–66. [Google Scholar]
- Hirschmüller, H. Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 30, 328–341. [Google Scholar] [CrossRef] [PubMed]
- Zabih, R.; Woodfill, J. Non-parametric local transforms for computing visual correspondence. In Proceedings of the Computer Vision—ECCV’94: Third European Conference on Computer Vision, Stockholm, Sweden, 2–6 May 1994; Proceedings, Volume II 3. Springer: Berlin/Heidelberg, Germany, 1994; pp. 151–158. [Google Scholar]
- Hermann, S.; Klette, R. Iterative semi-global matching for robust driver assistance systems. In Proceedings of the Asian Conference on Computer Vision, Daejeon, Republic of Korea, 5–9 November 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 465–478. [Google Scholar]
- Rothermel, M.; Wenzel, K.; Fritsch, D.; Haala, N. SURE: Photogrammetric surface reconstruction from imagery. In Proceedings of the LC3D Workshop, Berlin, Germany, 4–5 December 2012; Volume 8. [Google Scholar]
- Jie, P. 3D Surface Reconstruction and Optimization Based on Geometric and Radiometric Integral Imaging Model. Ph.D. Thesis, Wuhan University, Wuhan, China, 2016. [Google Scholar]
- Li, Y.; Li, Z.; Yang, C.; Zhong, W.; Chen, S. High throughput hardware architecture for accurate semi-global matching. Integration 2019, 65, 417–427. [Google Scholar] [CrossRef]
- Chai, Y.; Yang, F. Semi-global stereo matching algorithm based on minimum spanning tree. In Proceedings of the 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Xi’an, China, 25–27 May 2018; pp. 2181–2185. [Google Scholar]
- Wang, Y.; Qin, A.; Hao, Q.; Dang, J. Semi-global stereo matching of remote sensing images combined with speeded up robust features. Acta Opt. Sin. 2020, 40, 1628003. [Google Scholar] [CrossRef]
- Shrivastava, S.; Choudhury, Z.; Khandelwal, S.; Purini, S. FPGA accelerator for stereo vision using semi-global matching through dependency relaxation. In Proceedings of the 2020 30th International Conference on Field-Programmable Logic and Applications (FPL), Gothenburg, Sweden, 31 August–4 September 2020; pp. 304–309. [Google Scholar]
- Huang, B.; Hu, L.; Zhang, Y. Improved census stereo matching algorithm based on adaptive weight. Comput. Eng. 2021, 47, 189–196. [Google Scholar]
- Zhao, C.; Li, W.; Zhang, Q. Variant center-symmetric census transform for real-time stereo vision architecture on chip. J. Real-Time Image Process. 2021, 18, 2073–2083. [Google Scholar] [CrossRef]
- Lu, Z.; Wang, J.; Li, Z.; Chen, S.; Wu, F. A resource-efficient pipelined architecture for real-time semi-global stereo matching. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 660–673. [Google Scholar] [CrossRef]
- Kar, A.; Häne, C.; Malik, J. Learning a multi-view stereo machine. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Kutulakos, K.N.; Seitz, S.M. A theory of shape by space carving. Int. J. Comput. Vis. 2000, 38, 199–218. [Google Scholar] [CrossRef]
- Lhuillier, M.; Quan, L. A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 418–433. [Google Scholar] [CrossRef]
- Furukawa, Y.; Ponce, J. Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1362–1376. [Google Scholar] [CrossRef]
- Shen, S. Accurate multiple view 3d reconstruction using patch-based stereo for large-scale scenes. IEEE Trans. Image Process. 2013, 22, 1901–1914. [Google Scholar] [CrossRef]
- Bloesch, M.; Czarnowski, J.; Clark, R.; Leutenegger, S.; Davison, A.J. Codeslam—Learning a compact, optimisable representation for dense visual slam. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2560–2568. [Google Scholar]
- Xue, Y.; Shi, P.; Jia, F.; Huang, H. 3D reconstruction and automatic leakage defect quantification of metro tunnel based on SfM-Deep learning method. Undergr. Space 2022, 7, 311–323. [Google Scholar] [CrossRef]
- Zhou, T.; Brown, M.; Snavely, N.; Lowe, D.G. Unsupervised learning of depth and ego-motion from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1851–1858. [Google Scholar]
- Ummenhofer, B.; Zhou, H.; Uhrig, J.; Mayer, N.; Ilg, E.; Dosovitskiy, A.; Brox, T. Demon: Depth and motion network for learning monocular stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5038–5047. [Google Scholar]
- Wang, C.; Buenaposada, J.M.; Zhu, R.; Lucey, S. Learning depth from monocular videos using direct methods. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2022–2030. [Google Scholar]
- Tang, C.; Tan, P. Ba-net: Dense bundle adjustment network. arXiv 2018, arXiv:1806.04807. [Google Scholar]
- Zbontar, J.; LeCun, Y. Computing the stereo matching cost with a convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1592–1599. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
- Bromley, J.; Guyon, I.; LeCun, Y.; Säckinger, E.; Shah, R. Signature verification using a “siamese” time delay neural network. Adv. Neural Inf. Process. Syst. 1993, 6, 737–744. [Google Scholar] [CrossRef]
- Luo, J.; Xu, Y.; Tang, C.; Lv, J. Learning inverse mapping by autoencoder based generative adversarial nets. In Proceedings of the Neural Information Processing: 24th International Conference, ICONIP 2017, Guangzhou, China, 14–18 November 2017; Proceedings, Part II 24. Springer: Beijing, China, 2017; pp. 207–216. [Google Scholar]
- Chang, J.R.; Chen, Y.S. Pyramid stereo matching network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5410–5418. [Google Scholar]
- Li, J.; Wang, P.; Xiong, P.; Cai, T.; Yan, Z.; Yang, L.; Liu, J.; Fan, H.; Liu, S. Practical Stereo Matching via Cascaded Recurrent Network With Adaptive Correlation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 16263–16272. [Google Scholar]
- Xu, G.; Cheng, J.; Guo, P.; Yang, X. Attention Concatenation Volume for Accurate and Efficient Stereo Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12981–12990. [Google Scholar]
- Wang, C.; Wang, X.; Zhang, J.; Zhang, L.; Bai, X.; Ning, X.; Zhou, J.; Hancock, E. Uncertainty estimation for stereo matching based on evidential deep learning. Pattern Recognit. 2022, 124, 108498. [Google Scholar] [CrossRef]
- Zagoruyko, S.; Komodakis, N. Learning to compare image patches via convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4353–4361. [Google Scholar]
- Khamis, S.; Fanello, S.; Rhemann, C.; Kowdle, A.; Valentin, J.; Izadi, S. Stereonet: Guided hierarchical refinement for real-time edge-aware depth prediction. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 573–590. [Google Scholar]
- Pilzer, A.; Xu, D.; Puscas, M.; Ricci, E.; Sebe, N. Unsupervised adversarial depth estimation using cycled generative networks. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 587–595. [Google Scholar]
- Gwn Lore, K.; Reddy, K.; Giering, M.; Bernal, E.A. Generative adversarial networks for depth map estimation from RGB video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1177–1185. [Google Scholar]
- Matias, L.P.; Sons, M.; Souza, J.R.; Wolf, D.F.; Stiller, C. Veigan: Vectorial inpainting generative adversarial network for depth maps object removal. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 310–316. [Google Scholar]
- Mayer, N.; Ilg, E.; Hausser, P.; Fischer, P.; Cremers, D.; Dosovitskiy, A.; Brox, T. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4040–4048. [Google Scholar]
- Kendall, A.; Martirosyan, H.; Dasgupta, S.; Henry, P.; Kennedy, R.; Bachrach, A.; Bry, A. End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 66–75. [Google Scholar]
- Wang, Y.; Lai, Z.; Huang, G.; Wang, B.H.; Van Der Maaten, L.; Campbell, M.; Weinberger, K.Q. Anytime stereo image depth estimation on mobile devices. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 5893–5900. [Google Scholar]
- Zhang, F.; Prisacariu, V.; Yang, R.; Torr, P.H. Ga-net: Guided aggregation net for end-to-end stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 185–194. [Google Scholar]
- Yao, Y.; Luo, Z.; Li, S.; Fang, T.; Quan, L. Mvsnet: Depth inference for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 767–783. [Google Scholar]
- Yao, Y.; Luo, Z.; Li, S.; Shen, T.; Fang, T.; Quan, L. Recurrent mvsnet for high-resolution multi-view stereo depth inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5525–5534. [Google Scholar]
- Chen, R.; Han, S.; Xu, J.; Su, H. Point-based multi-view stereo network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1538–1547. [Google Scholar]
- Luo, K.; Guan, T.; Ju, L.; Huang, H.; Luo, Y. P-mvsnet: Learning patch-wise matching confidence aggregation for multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10452–10461. [Google Scholar]
- Xue, Y.; Chen, J.; Wan, W.; Huang, Y.; Yu, C.; Li, T.; Bao, J. Mvscrf: Learning multi-view stereo with conditional random fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4312–4321. [Google Scholar]
- Yi, H.; Wei, Z.; Ding, M.; Zhang, R.; Chen, Y.; Wang, G.; Tai, Y.W. Pyramid multi-view stereo net with self-adaptive view aggregation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part IX 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 766–782. [Google Scholar]
- Yu, Z.; Gao, S. Fast-mvsnet: Sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1949–1958. [Google Scholar]
- Gu, X.; Fan, Z.; Zhu, S.; Dai, Z.; Tan, F.; Tan, P. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2495–2504. [Google Scholar]
- Yang, J.; Mao, W.; Alvarez, J.M.; Liu, M. Cost volume pyramid based depth inference for multi-view stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4877–4886. [Google Scholar]
- Cheng, S.; Xu, Z.; Zhu, S.; Li, Z.; Li, L.E.; Ramamoorthi, R.; Su, H. Deep stereo using adaptive thin volume representation with uncertainty awareness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2524–2534. [Google Scholar]
- Yan, J.; Wei, Z.; Yi, H.; Ding, M.; Zhang, R.; Chen, Y.; Wang, G.; Tai, Y.W. Dense hybrid recurrent multi-view stereo net with dynamic consistency checking. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 674–689. [Google Scholar]
- Liu, J.; Ji, S. A novel recurrent encoder-decoder structure for large-scale multi-view stereo reconstruction from an open aerial dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6050–6059. [Google Scholar]
- Wang, L.; Gong, Y.; Ma, X.; Wang, Q.; Zhou, K.; Chen, L. Is-mvsnet: Importance sampling-based mvsnet. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 668–683. [Google Scholar]
- Chang, D.; Božič, A.; Zhang, T.; Yan, Q.; Chen, Y.; Süsstrunk, S.; Nießner, M. RC-MVSNet: Unsupervised multi-view stereo with neural rendering. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 665–680. [Google Scholar]
- Liao, J.; Ding, Y.; Shavit, Y.; Huang, D.; Ren, S.; Guo, J.; Feng, W.; Zhang, K. Wt-mvsnet: Window-based transformers for multi-view stereo. Adv. Neural Inf. Process. Syst. 2022, 35, 8564–8576. [Google Scholar]
- Li, Y.; Zhao, Z.; Fan, J.; Li, W. ADR-MVSNet: A cascade network for 3D point cloud reconstruction with pixel occlusion. Pattern Recognit. 2022, 125, 108516. [Google Scholar] [CrossRef]
- Weilharter, R.; Fraundorfer, F. ATLAS-MVSNet: Attention Layers for Feature Extraction and Cost Volume Regularization in Multi-View Stereo. In Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21–25 August 2022; pp. 3557–3563. [Google Scholar]
- Zhang, S.; Wei, Z.; Xu, W.; Zhang, L.; Wang, Y.; Zhou, X.; Liu, J. DSC-MVSNet: Attention aware cost volume regularization based on depthwise separable convolution for multi-view stereo. Complex Intell. Syst. 2023, 9, 6953–6969. [Google Scholar] [CrossRef]
- Zhang, J.; Li, S.; Luo, Z.; Fang, T.; Yao, Y. Vis-mvsnet: Visibility-aware multi-view stereo network. Int. J. Comput. Vis. 2023, 131, 199–214. [Google Scholar] [CrossRef]
- Yu, D.; Ji, S.; Liu, J.; Wei, S. Automatic 3D building reconstruction from multi-view aerial images with deep learning. ISPRS J. Photogramm. Remote Sens. 2021, 171, 155–170. [Google Scholar] [CrossRef]
- Yu, A.; Guo, W.; Liu, B.; Chen, X.; Wang, X.; Cao, X.; Jiang, B. Attention aware cost volume pyramid based multi-view stereo network for 3d reconstruction. ISPRS J. Photogramm. Remote Sens. 2021, 175, 448–460. [Google Scholar] [CrossRef]
- Gao, J.; Liu, J.; Ji, S. A general deep learning based framework for 3D reconstruction from multi-view stereo satellite images. ISPRS J. Photogramm. Remote Sens. 2023, 195, 446–461. [Google Scholar] [CrossRef]
- Zhang, Y.; Zhu, J.; Lin, L. Multi-View Stereo Representation Revist: Region-Aware MVSNet. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 17376–17385. [Google Scholar]
- Huang, B.; Yi, H.; Huang, C.; He, Y.; Liu, J.; Liu, X. M3VSNet: Unsupervised multi-metric multi-view stereo network. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Virtual, 19–22 September 2021; pp. 3163–3167. [Google Scholar]
- Ma, X.; Gong, Y.; Wang, Q.; Huang, J.; Chen, L.; Yu, F. Epp-mvsnet: Epipolar-assembling based depth prediction for multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 5732–5740. [Google Scholar]
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
- Zhang, K.; Riegler, G.; Snavely, N.; Koltun, V. Nerf++: Analyzing and improving neural radiance fields. arXiv 2020, arXiv:2010.07492. [Google Scholar]
- Rebain, D.; Jiang, W.; Yazdani, S.; Li, K.; Yi, K.M.; Tagliasacchi, A. DeRF: Decomposed Radiance Fields. arXiv 2020, arXiv:2011.12490. [Google Scholar]
- Deng, K.; Liu, A.; Zhu, J.Y.; Ramanan, D. Depth-supervised nerf: Fewer views and faster training for free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12882–12891. [Google Scholar]
- Barron, J.T.; Mildenhall, B.; Tancik, M.; Hedman, P.; Martin-Brualla, R.; Srinivasan, P.P. Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. arXiv 2021, arXiv:2103.13415. [Google Scholar]
- Barron, J.T.; Mildenhall, B.; Verbin, D.; Srinivasan, P.P.; Hedman, P. Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
- Chen, X.; Zhang, Q.; Li, X.; Chen, Y.; Ying, F.; Wang, X.; Wang, J. Hallucinated Neural Radiance Fields in the Wild. arXiv 2021, arXiv:2111.15246. [Google Scholar]
- Li, Z.; Wang, Q.; Cole, F.; Tucker, R.; Snavely, N. Dynibar: Neural dynamic image-based rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 4273–4284. [Google Scholar]
- Yang, G.; Wei, G.; Zhang, Z.; Lu, Y.; Liu, D. MRVM-NeRF: Mask-Based Pretraining for Neural Radiance Fields. arXiv 2023, arXiv:2304.04962. [Google Scholar]
- Chen, A.; Xu, Z.; Zhao, F.; Zhang, X.; Xiang, F.; Yu, J.; Su, H. Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 14124–14133. [Google Scholar]
- Xu, Q.; Xu, Z.; Philip, J.; Bi, S.; Shu, Z.; Sunkavalli, K.; Neumann, U. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5438–5448. [Google Scholar]
- Guo, H.; Peng, S.; Lin, H.; Wang, Q.; Zhang, G.; Bao, H.; Zhou, X. Neural 3d scene reconstruction with the manhattan-world assumption. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5511–5520. [Google Scholar]
- Yen-Chen, L.; Florence, P.; Barron, J.T.; Rodriguez, A.; Isola, P.; Lin, T.Y. inerf: Inverting neural radiance fields for pose estimation. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 1323–1330. [Google Scholar]
- Lin, C.H.; Ma, W.C.; Torralba, A.; Lucey, S. Barf: Bundle-adjusting neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 5741–5751. [Google Scholar]
- Martin-Brualla, R.; Radwan, N.; Sajjadi, M.S.; Barron, J.T.; Dosovitskiy, A.; Duckworth, D. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7210–7219. [Google Scholar]
- Rückert, D.; Franke, L.; Stamminger, M. Adop: Approximate differentiable one-pixel point rendering. ACM Trans. Graph. (ToG) 2022, 41, 1–14. [Google Scholar] [CrossRef]
- Mildenhall, B.; Hedman, P.; Martin-Brualla, R.; Srinivasan, P.P.; Barron, J.T. Nerf in the dark: High dynamic range view synthesis from noisy raw images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16190–16199. [Google Scholar]
- Rudnev, V.; Elgharib, M.; Smith, W.; Liu, L.; Golyanik, V.; Theobalt, C. Nerf for outdoor scene relighting. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 615–631. [Google Scholar]
- Ost, J.; Mannan, F.; Thuerey, N.; Knodt, J.; Heide, F. Neural scene graphs for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2856–2865. [Google Scholar]
- Paul, N. TransNeRF-Improving Neural Radiance Fields Using Transfer Learning for Efficient Scene Reconstruction. Master’s Thesis, University of Twente, Enschede, The Netherlands, 2021. [Google Scholar]
- Rybkin, O.; Zhu, C.; Nagabandi, A.; Daniilidis, K.; Mordatch, I.; Levine, S. Model-based reinforcement learning via latent-space collocation. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 9190–9201. [Google Scholar]
- Kundu, A.; Genova, K.; Yin, X.; Fathi, A.; Pantofaru, C.; Guibas, L.J.; Tagliasacchi, A.; Dellaert, F.; Funkhouser, T. Panoptic neural fields: A semantic object-aware neural scene representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12871–12881. [Google Scholar]
- Turki, H.; Ramanan, D.; Satyanarayanan, M. Mega-nerf: Scalable construction of large-scale nerfs for virtual fly-throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12922–12931. [Google Scholar]
- Derksen, D.; Izzo, D. Shadow neural radiance fields for multi-view satellite photogrammetry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1152–1161. [Google Scholar]
- Xiangli, Y.; Xu, L.; Pan, X.; Zhao, N.; Rao, A.; Theobalt, C.; Dai, B.; Lin, D. Bungeenerf: Progressive neural radiance field for extreme multi-scale scene rendering. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 106–122. [Google Scholar]
- Rematas, K.; Liu, A.; Srinivasan, P.P.; Barron, J.T.; Tagliasacchi, A.; Funkhouser, T.; Ferrari, V. Urban radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12932–12942. [Google Scholar]
- Tancik, M.; Casser, V.; Yan, X.; Pradhan, S.; Mildenhall, B.; Srinivasan, P.P.; Barron, J.T.; Kretzschmar, H. Block-nerf: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8248–8258. [Google Scholar]
- Marí, R.; Facciolo, G.; Ehret, T. Sat-nerf: Learning multi-view satellite photogrammetry with transient objects and shadow modeling using rpc cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1311–1321. [Google Scholar]
- Huang, J.; Stoter, J.; Peters, R.; Nan, L. City3D: Large-scale urban reconstruction from airborne point clouds. arXiv 2022, arXiv:2201.10729. [Google Scholar]
- Zhang, Y.; Chen, G.; Cui, S. Efficient Large-scale Scene Representation with a Hybrid of High-resolution Grid and Plane Features. arXiv 2023, arXiv:2303.03003. [Google Scholar]
- Xu, L.; Xiangli, Y.; Peng, S.; Pan, X.; Zhao, N.; Theobalt, C.; Dai, B.; Lin, D. Grid-guided Neural Radiance Fields for Large Urban Scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 8296–8306. [Google Scholar]
- Crandall, D.J.; Owens, A.; Snavely, N.; Huttenlocher, D.P. SfM with MRFs: Discrete-continuous optimization for large-scale structure from motion. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 2841–2853. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Snavely, N.; Huttenlocher, D.P. Location recognition using prioritized feature matching. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; Proceedings, Part II 11. Springer: Berlin/Heidelberg, Germany, 2010; pp. 791–804. [Google Scholar]
- Li, S.; He, S.; Jiang, S.; Jiang, W.; Zhang, L. WHU-Stereo: A Challenging Benchmark for Stereo Matching of High-Resolution Satellite Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
- Bosch, M.; Foster, K.; Christie, G.; Wang, S.; Hager, G.D.; Brown, M. Semantic stereo for incidental satellite images. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1524–1532. [Google Scholar]
- Patil, S.; Comandur, B.; Prakash, T.; Kak, A.C. A new stereo benchmarking dataset for satellite images. arXiv 2019, arXiv:1907.04404. [Google Scholar]
- Schops, T.; Schonberger, J.L.; Galliani, S.; Sattler, T.; Schindler, K.; Pollefeys, M.; Geiger, A. A multi-view stereo benchmark with high-resolution images and multi-camera videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3260–3269. [Google Scholar]
- Knapitsch, A.; Park, J.; Zhou, Q.Y.; Koltun, V. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Trans. Graph. (ToG) 2017, 36, 1–13. [Google Scholar] [CrossRef]
- Sensefly. Public Dataset. 2023. Available online: https://www.sensefly.com/education/datasets (accessed on 25 July 2023).
- Yao, Y.; Luo, Z.; Li, S.; Zhang, J.; Ren, Y.; Zhou, L.; Fang, T.; Quan, L. Blendedmvs: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1790–1799. [Google Scholar]
- Luo, Z.; Shen, T.; Zhou, L.; Zhu, S.; Zhang, R.; Yao, Y.; Fang, T.; Quan, L. Geodesc: Learning local descriptors by integrating geometry constraints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 168–183. [Google Scholar]
- Lin, L.; Liu, Y.; Hu, Y.; Yan, X.; Xie, K.; Huang, H. Capturing, reconstructing, and simulating: The urbanscene3d dataset. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 93–109. [Google Scholar]
- Aanæs, H.; Jensen, R.R.; Vogiatzis, G.; Tola, E.; Dahl, A.B. Large-scale data for multiple-view stereopsis. Int. J. Comput. Vis. 2016, 120, 153–168. [Google Scholar] [CrossRef]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
Dataset Name | Image Type | Number of Images | Resolution | Number of Scenes | Task | Details of Objects |
---|---|---|---|---|---|---|
Quad6K [160] | Internet | 6514 | / | / | SfM | Street view, landmarks |
Dubrovnik6K [161] | Internet | 6844 | / | / | SfM | Street view, landmarks |
Rome16K [161] | Internet | 16,179 | / | / | SfM | Street view, landmarks |
NotreDame (older version) [18] | Internet | 715 | / | 1 | SfM | Street view, landmarks |
WHU-Stereo [162] | Remote sensing | 1757 pairs | / | 12 | Stereo matching | Buildings and vegetation scenes of six cities in China |
US3D [163] | Remote sensing | 4292 pairs | / | 2 | Stereo matching | Two satellite urban scenes from Jacksonville and Omaha |
Satstereo [164] | Remote sensing | / | / | / | Stereo matching | Images from WorldView-2 and 3 |
ETH3D [165] | Handheld camera | 350 | 2048 × 1536 | 7 | MVS | Buildings, natural landscapes, indoor scenes and industrial scenes |
Tanks and Temples [166] | Handheld camera | / | / | 21 | MVS | Large outdoor scenes such as museums, palaces, and temples; some indoor scenes and sculptures |
Sensefly [167] | UAV | / | / | / | MVS | Cities, highway, blueberry field and other scenes |
BlendedMVS [168] | Rendered from textured meshes | 17,818 | 768 × 576 / 2048 × 1536 | 113 | MVS | 29 large scenes, 52 small scenes, and 32 scenes of sculptures |
Mill19 [151] | UAV | 3618 | 4608 × 3456 | 2 | NeRF | Two scenes around an industrial building and nearby ruins |
GL3D [168,169] | UAV | 125,623 | High-resolution | 543 | SfM, MVS | Including urban areas, rural areas, scenic spots, and small objects |
UrbanScene3D [170] | Cars and UAV | 128K | High-resolution | 16 | MVS, NeRF | Urban scenes including 10 virtual and 6 real scenes |
Metrics | NeRF [129] | NeRF++ [130] | DS-NeRF [132] | mip-NeRF [133] | MVSNeRF [138] | Point-NeRF [139] |
---|---|---|---|---|---|---|
PSNR↑ | 31.01 | 31.65 | 24.9 | 33.09 | 27.07 | 33.31 |
SSIM↑ | 0.947 | 0.952 | 0.72 | 0.961 | 0.931 | 0.978 |
LPIPS↓ | 0.081 | 0.051 | 0.34 | 0.043 | 0.163 | 0.049 |
Methods | Mill19-Building PSNR↑ | Mill19-Building SSIM↑ | Mill19-Building LPIPS↓ | Mill19-Rubble PSNR↑ | Mill19-Rubble SSIM↑ | Mill19-Rubble LPIPS↓ | Quad 6K PSNR↑ | Quad 6K SSIM↑ | Quad 6K LPIPS↓ |
---|---|---|---|---|---|---|---|---|---|
NeRF [129] | 19.54 | 0.525 | 0.512 | 21.14 | 0.522 | 0.546 | 16.75 | 0.559 | 0.616 |
NeRF++ [130] | 19.48 | 0.52 | 0.514 | 20.9 | 0.519 | 0.548 | 16.73 | 0.56 | 0.611 |
Mega-NeRF [151] | 20.93 | 0.547 | 0.504 | 24.06 | 0.553 | 0.516 | 18.13 | 0.568 | 0.602 |
GP-NeRF [158] | 20.99 | 0.565 | 0.49 | 24.08 | 0.563 | 0.497 | 17.67 | 0.521 | 0.623 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).