Video-Based 3D Reconstruction: A Review of Photogrammetry and Visual SLAM Approaches
Abstract
1. Introduction
- What are the primary features and drawbacks of the approaches used today?
- How can photogrammetric constraints and acquisition principles improve real-time reconstruction reliability?
- Why does SfM dominate the field, and what are its practical limitations compared to V-SLAM in real-time applications?
- How do modern systems trade off speed, accuracy, robustness, and computational cost?
2. Fundamentals of Video-Based 3D Reconstruction
2.1. Structure from Motion (SfM)
- Feature detection and description (e.g., SIFT/ORB-type keypoints);
- Feature matching across views to establish correspondences;
- Geometric verification (e.g., via epipolar constraints) to remove outliers;
- Incremental or global estimation of camera poses and scene structure through triangulation and, in incremental pipelines, repeated registration of new views.
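The matching and geometric-verification steps above can be illustrated with a small, dependency-free Python sketch. It is not the article's implementation and rests on simplifying assumptions. Descriptors are 8-bit binary strings stored as integers, standing in for ORB-type descriptors. Tentative matches are accepted with a nearest-neighbour ratio test. The fundamental matrix `F` is assumed to be known already; in practice it would itself be estimated robustly (for example with RANSAC) from the tentative matches. The helper names (`ratio_test_matches`, `sampson_distance`, `epipolar_filter`) are hypothetical.

```python
"""Toy SfM front end: ratio-test matching plus epipolar verification.

Assumptions (illustrative only): descriptors are 8-bit binary strings held
in Python ints, and the fundamental matrix F is already known.
"""


def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Nearest-neighbour matching with a ratio test.

    Keeps (i, j) only if desc2[j] is the closest descriptor to desc1[i]
    and is clearly closer than the second-closest candidate.
    """
    matches = []
    for i, d1 in enumerate(desc1):
        ranked = sorted((hamming(d1, d2), j) for j, d2 in enumerate(desc2))
        if len(ranked) < 2:
            continue
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


def mat_vec(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def sampson_distance(F, p1, p2):
    """Sampson (first-order) squared error of x2^T F x1 = 0."""
    x1 = [p1[0], p1[1], 1.0]
    x2 = [p2[0], p2[1], 1.0]
    Fx1 = mat_vec(F, x1)
    Ft = [[F[c][r] for c in range(3)] for r in range(3)]  # transpose of F
    Ftx2 = mat_vec(Ft, x2)
    residual = sum(x2[k] * Fx1[k] for k in range(3))
    denom = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    return residual ** 2 / denom if denom > 0 else float("inf")


def epipolar_filter(matches, kps1, kps2, F, max_err=1.0):
    """Drop matches whose Sampson error exceeds max_err."""
    return [(i, j) for i, j in matches
            if sampson_distance(F, kps1[i], kps2[j]) <= max_err]


# Pure sideways camera translation with identity intrinsics: F = [t]_x for
# t = (1, 0, 0), so corresponding points must lie on the same image row.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
kps1 = [(10, 20), (30, 40), (50, 60), (70, 80)]
kps2 = [(5, 20), (25, 40.5), (45, 90)]
desc1 = [0b10101010, 0b11110000, 0b00001111, 0b11111111]
desc2 = [0b10101011, 0b11110001, 0b00001110]

matches = ratio_test_matches(desc1, desc2)          # [(0, 0), (1, 1), (2, 2)]
inliers = epipolar_filter(matches, kps1, kps2, F)   # [(0, 0), (1, 1)]
```

In this toy case the ratio test rejects the fourth descriptor because its two closest candidates are equally distant, and the epipolar check removes the third match because the two points lie 30 rows apart although the geometry requires them to share a row.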
2.2. Multi-View Stereo (MVS)
- Build depth maps for selected reference views by searching along epipolar lines and evaluating photo-consistency;
- Apply regularization or filtering to improve robustness in low-texture or repetitive regions;
- Fuse depth maps from multiple views into a dense point cloud.
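For the simplest case of two rectified views, each epipolar line is an image row, so the depth-map step reduces to sliding a window along that row and scoring photo-consistency. The sketch below is a hypothetical illustration, not the method of any cited MVS system. It uses zero-mean normalized cross-correlation (NCC) as the photo-consistency measure, a winner-takes-all disparity choice, and the pinhole stereo relation $Z = fB/d$. The focal length, baseline, and synthetic texture are made-up values. Full MVS pipelines generalize this search to many views and add the regularization and fusion steps listed above.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length patches.

    Returns a score in [-1, 1]; 1 means the patches agree up to gain and offset.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A textureless (constant) patch carries no photo-consistency evidence.
    return num / den if den > 0 else -1.0


def patch(img, r, c, w):
    """Flatten the (2w+1) x (2w+1) window centred at row r, column c."""
    return [img[y][x] for y in range(r - w, r + w + 1)
            for x in range(c - w, c + w + 1)]


def best_disparity(ref, src, r, c, w, max_disp):
    """Winner-takes-all search along the epipolar line of pixel (r, c).

    In a rectified pair the epipolar line is image row r, and a point at
    column c in `ref` appears at column c - d in `src` (d = disparity).
    """
    ref_patch = patch(ref, r, c, w)
    best_d, best_score = None, -math.inf
    for d in range(max_disp + 1):
        if c - d - w < 0:  # window would leave the source image
            break
        score = ncc(ref_patch, patch(src, r, c - d, w))
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


def depth_from_disparity(d, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d (infinite depth for d = 0)."""
    return focal_px * baseline_m / d if d > 0 else math.inf


# Synthetic rectified pair: every row carries the same 1D texture, and the
# source view sees it shifted by a true disparity of 3 pixels.
texture = [3, 9, 1, 7, 2, 8, 5, 0, 6, 4, 9, 2, 7, 1, 3, 8]
ref = [list(texture) for _ in range(5)]
src = [[row[x + 3] if x + 3 < len(row) else 0 for x in range(len(row))]
       for row in ref]

d, score = best_disparity(ref, src, r=2, c=10, w=2, max_disp=5)
z = depth_from_disparity(d, focal_px=500.0, baseline_m=0.1)
# d == 3 and score is 1.0 (up to rounding); z is roughly 16.67 m
```

Because NCC is invariant to gain and offset, it tolerates some brightness change between views; the guard for constant patches reflects the low-texture problem that the regularization step is meant to address.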
2.3. Visual Odometry and Visual SLAM
2.4. Role of Video Data vs. Still Images
3. Categories of Video-Based Reconstruction Methods
3.1. Photogrammetry-Based Methods
3.2. V-SLAM
3.3. Learning-Based and Hybrid Methods
3.4. Four-Dimensional (4D, Spatio-Temporal) Reconstruction Methods
4. Keyframe Extraction from Video
5. Discussion
5.1. Statistical Analysis and Evolution of Research Activity
5.2. Method Selection Under Video Constraints: Trade-Offs and Open Challenges
- Dynamic scenes;
- Drift and long-term consistency;
- Real-time dense reconstruction.
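Why drift matters for long sequences can be seen with a toy planar dead-reckoning model. This is an illustrative assumption, not an analysis from the article. The camera moves 1 m per frame along a straight line, but the estimated heading picks up a hypothetical constant error of 0.1° per frame. Because each relative motion is composed onto the previous estimate, the position error grows roughly quadratically with trajectory length. Loop closure and global optimization are the usual remedies for this kind of accumulation. The function names are hypothetical.

```python
import math


def integrate(steps, step_len, yaw_per_step):
    """Dead reckoning: compose planar relative motions frame by frame."""
    x = y = heading = 0.0
    for _ in range(steps):
        heading += yaw_per_step          # rotation estimate for this frame
        x += step_len * math.cos(heading)
        y += step_len * math.sin(heading)
    return x, y


def endpoint_drift(steps, yaw_bias_rad, step_len=1.0):
    """Distance between the true and the biased end position.

    The true path is straight; the estimate adds a constant heading error
    (an illustrative assumption) to every relative motion.
    """
    true_x, true_y = integrate(steps, step_len, 0.0)
    est_x, est_y = integrate(steps, step_len, yaw_bias_rad)
    return math.hypot(est_x - true_x, est_y - true_y)


bias = math.radians(0.1)             # hypothetical 0.1 degree error per frame
err_100 = endpoint_drift(100, bias)  # roughly 8.8 m
err_200 = endpoint_drift(200, bias)  # roughly 35 m, about 4x err_100
```

Doubling the sequence length roughly quadruples the endpoint error in this model, which is why a per-frame error that looks negligible can dominate long-term consistency.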
5.3. Future Directions
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Bazo, R.; Reis, E.; Seewald, L.A.; Rodrigues, V.F.; da Costa, C.A.; Gonzaga, L., Jr.; Antunes, R.S.; da Rosa Righi, R.; Maier, A.; Eskofier, B.; et al. Baptizo: A sensor fusion based model for tracking the identity of human poses. Inf. Fusion 2020, 62, 1–13. [Google Scholar] [CrossRef]
- Rasti, B.; Ghamisi, P. Remote sensing image classification using subspace sensor fusion. Inf. Fusion 2020, 64, 121–130. [Google Scholar] [CrossRef]
- Trzeciak, M.; Brilakis, I. Dense 3D reconstruction of building scenes by AI-based camera–LiDAR fusion and odometry. J. Comput. Civ. Eng. 2023, 37, 04023010. [Google Scholar] [CrossRef]
- Malihi, S.; Valadan Zoej, M.J.; Hahn, M.; Mokhtarzade, M. Window detection from UAS-derived photogrammetric point cloud employing density-based filtering and perceptual organization. Remote Sens. 2018, 10, 1320. [Google Scholar] [CrossRef]
- Luhmann, T.; Robson, S.; Kyle, S.; Boehm, J. Close-Range Photogrammetry and 3D Imaging; Walter de Gruyter GmbH & Co. KG: Berlin, Germany, 2023. [Google Scholar] [CrossRef]
- Schönberger, J.L.; Frahm, J.M. Structure-from-Motion Revisited. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
- Furukawa, Y.; Ponce, J. Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1362–1376. [Google Scholar] [CrossRef] [PubMed]
- Davison, A.J. Real-time simultaneous localisation and mapping with a single camera. In Proceedings of the Ninth IEEE International Conference on Computer Vision; IEEE: New York, NY, USA, 2003. [Google Scholar] [CrossRef]
- Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar] [CrossRef]
- Zhao, Y.-L.; Hong, Y.-T.; Huang, H.-P. Comprehensive performance evaluation between visual SLAM and LiDAR SLAM for mobile robots: Theories and experiments. Appl. Sci. 2024, 14, 3945. [Google Scholar] [CrossRef]
- Bisson-Larrivée, A.; LeMoine, J.-B. Photogrammetry and the impact of camera placement and angular intervals between images on model reconstruction. Digit. Appl. Archaeol. Cult. Herit. 2022, 26, e00224. [Google Scholar] [CrossRef]
- Panagiotopoulou, A.; Grammatikopoulos, L.; El Saer, A.; Petsa, E.; Charou, E.; Ragia, L.; Karras, G. Super-resolution techniques in photogrammetric 3D reconstruction from close-range UAV imagery. Heritage 2023, 6, 2701–2715. [Google Scholar] [CrossRef]
- Fraser, C. SLAM, SfM and photogrammetry: What’s in a name. In Proceedings of the ISPRS Technical Commission II Symposium, Riva del Garda, Italy, 3–7 June 2018. [Google Scholar]
- Fraser, C.S. Network design considerations for non-topographic photogrammetry. Photogramm. Eng. Remote Sens. 1984, 50, 1115–1126. [Google Scholar]
- Ramirez, D.; Jayasuriya, S.; Spanias, A. Towards Live 3D Reconstruction from Wearable Video: An Evaluation of V-SLAM, NeRF, and Videogrammetry Techniques. arXiv 2022, arXiv:2211.11836. [Google Scholar] [CrossRef]
- Aulinas, J.; Petillot, Y.; Salvi, J.; Lladó, X. The SLAM problem: A survey. Artif. Intell. Res. Dev. 2008, 184, 363–371. [Google Scholar] [CrossRef]
- Fuentes-Pacheco, J.; Ruiz-Ascencio, J.; Rendón-Mancha, J.M. Visual simultaneous localization and mapping: A survey. Artif. Intell. Rev. 2015, 43, 55–81. [Google Scholar] [CrossRef]
- Lu, Z.; Hu, Z.; Uchimura, K. SLAM estimation in dynamic outdoor environments: A review. In Proceedings of the Intelligent Robotics and Applications: Second International Conference, ICIRA 2009, Singapore, 16–18 December 2009; Proceedings 2; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar] [CrossRef]
- Ros, G.; Sappa, A.; Ponsa, D.; Lopez, A.M. Visual SLAM for driverless cars: A brief survey. In Intelligent Vehicles Symposium (IV) Workshops; IEEE: New York, NY, USA, 2012; Available online: https://www.semanticscholar.org/paper/Visual-SLAM-for-Driverless-Cars-%3A-A-Brief-Survey-Ros-Sappa/5229c6781deb77dec8499985943ab3e057a86d26 (accessed on 11 January 2026).
- Yousif, K.; Bab-Hadiashar, A.; Hoseinnezhad, R. An overview to visual odometry and visual SLAM: Applications to mobile robotics. Intell. Ind. Syst. 2015, 1, 289–311. [Google Scholar] [CrossRef]
- Taketomi, T.; Uchiyama, H.; Ikeda, S. Visual SLAM algorithms: A survey from 2010 to 2016. IPSJ Trans. Comput. Vis. Appl. 2017, 9, 16. [Google Scholar] [CrossRef]
- Chen, Y.; Zhou, Y.; Lv, Q.; Deveerasetty, K.K. A review of V-SLAM. In 2018 IEEE International Conference on Information and Automation (ICIA); IEEE: New York, NY, USA, 2018. [Google Scholar] [CrossRef]
- Saputra, M.R.U.; Markham, A.; Trigoni, N. Visual SLAM and structure from motion in dynamic environments: A survey. ACM Comput. Surv. (CSUR) 2018, 51, 37. [Google Scholar] [CrossRef]
- Gao, B.; Lang, H.; Ren, J. Stereo visual SLAM for autonomous vehicles: A review. In 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC); IEEE: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
- Arshad, S.; Kim, G.-W. Role of deep learning in loop closure detection for visual and LiDAR SLAM: A survey. Sensors 2021, 21, 1243. [Google Scholar] [CrossRef]
- Chen, W.; Shang, G.; Ji, A.; Zhou, C.; Wang, X.; Xu, C.; Li, Z.; Hu, K. An overview on visual SLAM: From tradition to semantic. Remote Sens. 2022, 14, 3010. [Google Scholar] [CrossRef]
- Cheng, J.; Zhang, L.; Chen, Q.; Hu, X.; Cai, J. A review of visual SLAM methods for autonomous driving vehicles. Eng. Appl. Artif. Intell. 2022, 114, 104992. [Google Scholar] [CrossRef]
- Jia, G.; Li, X.; Zhang, D.; Xu, W.; Lv, H.; Shi, Y.; Cai, M. Visual-SLAM Classical framework and key Techniques: A review. Sensors 2022, 22, 4582. [Google Scholar] [CrossRef]
- Kazerouni, I.A.; Fitzgerald, L.; Dooly, G.; Toal, D. A survey of state-of-the-art on visual SLAM. Expert Syst. Appl. 2022, 205, 117734. [Google Scholar] [CrossRef]
- Macario Barros, A.; Michel, M.; Moline, Y.; Corre, G.; Carrel, F. A comprehensive survey of visual SLAM algorithms. Robotics 2022, 11, 24. [Google Scholar] [CrossRef]
- Tourani, A.; Bavle, H.; Sanchez-Lopez, J.L.; Voos, H. Visual SLAM: What are the current trends and what to expect? Sensors 2022, 22, 9297. [Google Scholar] [CrossRef] [PubMed]
- Zhang, S.; Zhao, S.; An, D.; Liu, J.; Wang, H.; Feng, Y.; Li, D.; Zhao, R. Visual SLAM for underwater vehicles: A survey. Comput. Sci. Rev. 2022, 46, 100510. [Google Scholar] [CrossRef]
- Sharafutdinov, D.; Griguletskii, M.; Kopanev, P.; Kurenkov, M.; Ferrer, G.; Burkov, A.; Gonnochenko, A.; Tsetserukou, D. Comparison of modern open-source visual SLAM approaches. J. Intell. Robot. Syst. 2023, 107, 43. [Google Scholar] [CrossRef]
- Zhang, Y.; Wu, Y.; Tong, K.; Chen, H.; Yuan, Y. Review of Visual Simultaneous Localization and Mapping Based on Deep Learning. Remote Sens. 2023, 15, 2740. [Google Scholar] [CrossRef]
- Al-Tawil, B.; Hempel, T.; Abdelrahman, A.; Al-Hamadi, A. A review of visual SLAM for robotics: Evolution, properties, and future applications. Front. Robot. AI 2024, 11, 1347985. [Google Scholar] [CrossRef]
- Shen, S.; Meng, J. A Review of Autonomous Navigation Technology for Orchard Robots Based on Visual SLAM. Asian Res. J. Agric. 2025, 18, 261–271. [Google Scholar] [CrossRef]
- Pavoni, G.; Dellepiane, M.; Callieri, M.; Scopigno, R. Automatic Selection of Video Frames for Path Regularization and 3D Reconstruction. In GCH ’16: Proceedings of the 14th Eurographics Workshop on Graphics and Cultural Heritage; Eurographics Association: Goslar, Germany, 2016; Available online: https://dl.acm.org/doi/10.5555/3061275.3061277 (accessed on 11 January 2026).
- Koschel, A.; Müller, C.; Reiterer, A. Selection of Key Frames for 3D Reconstruction in Real Time. Algorithms 2021, 14, 303. [Google Scholar] [CrossRef]
- Zhang, C.; Wang, H.; Li, H.; Liu, J. A fast key frame extraction algorithm and an accurate feature matching method for 3D reconstruction from aerial video. In 2017 29th Chinese Control And Decision Conference (CCDC); IEEE: New York, NY, USA, 2017. [Google Scholar] [CrossRef]
- Iglhaut, J.; Cabo, C.; Puliti, S.; Piermattei, L.; O’Connor, J.; Rosette, J. Structure from motion photogrammetry in forestry: A review. Curr. For. Rep. 2019, 5, 155–168. [Google Scholar] [CrossRef]
- Herrera-Granda, E.P.; Torres-Cantero, J.C.; Peluffo-Ordóñez, D.H. Monocular visual SLAM, visual odometry, and structure from motion methods applied to 3D reconstruction: A comprehensive survey. Heliyon 2024, 10, e37356. [Google Scholar] [CrossRef]
- Niu, Y.; Liu, L.; Huang, F.; Huang, S.; Chen, S. Overview of image-based 3D reconstruction technology. J. Eur. Opt. Soc.-Rapid Publ. 2024, 20, 18. [Google Scholar] [CrossRef]
- Luo, H.; Zhang, J.; Liu, X.; Zhang, L.; Liu, J. Large-scale 3D reconstruction from multi-view imagery: A comprehensive review. Remote Sens. 2024, 16, 773. [Google Scholar] [CrossRef]
- Croce, V.; Billi, D.; Caroti, G.; Piemonte, A.; De Luca, L.; Véron, P. Comparative Assessment of Neural Radiance Fields and Photogrammetry in Digital Heritage: Impact of Varying Image Conditions on 3D Reconstruction. Remote Sens. 2024, 16, 301. [Google Scholar] [CrossRef]
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
- Torresani, A.; Remondino, F. Videogrammetry vs. photogrammetry for heritage 3D reconstruction. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. XLII-2/W15 2019, 42, 1157–1162. [Google Scholar] [CrossRef]
- Yin, H.; Yu, H. Incremental SFM 3D reconstruction based on monocular. In 2020 13th International Symposium on Computational Intelligence and Design (ISCID); IEEE: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
- Tian, X.; Liu, R.; Wang, Z.; Ma, J. High quality 3D reconstruction based on fusion of polarization imaging and binocular stereo vision. Inf. Fusion 2022, 77, 19–28. [Google Scholar] [CrossRef]
- Alsadik, B.; Khalaf, Y.H. Potential use of drone ultra-high-definition videos for detailed 3D city modeling. ISPRS Int. J. Geo-Inf. 2022, 11, 34. [Google Scholar] [CrossRef]
- Zhan, Z.; Xia, R.; Yu, Y.; Xu, Y.; Wang, X. On-the-Fly SfM: What you capture is What you get. arXiv 2023, arXiv:2309.11883. [Google Scholar] [CrossRef]
- Zhan, Z.; Yu, Y.; Xia, R.; Gan, W.; Xie, H.; Perda, G.; Morelli, L.; Remondino, F.; Wang, X. SfM on-the-fly: A robust near real-time SfM for spatiotemporally disordered high-resolution imagery from multiple agents. ISPRS J. Photogramm. Remote Sens. 2025, 224, 202–221. [Google Scholar] [CrossRef]
- Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality; IEEE: New York, NY, USA, 2007. [Google Scholar] [CrossRef]
- Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans. Robot. 2017, 33, 1255–1262. [Google Scholar] [CrossRef]
- Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
- Gao, H.; Mao, W.; Liu, M. VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar] [CrossRef]
- Sayed, M.; Gibson, J.; Watson, J.; Prisacariu, V.A.; Firman, M.D.; Godard, C. 3D Reconstruction Without 3D Convolutions. arXiv 2023. [Google Scholar] [CrossRef]
- Morelli, L.; Ioli, F.; Beber, R.; Menna, F.; Remondino, F.; Vitti, A. COLMAP-SLAM: A Framework for Visual Odometry. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, 48, 317–324. [Google Scholar] [CrossRef]
- Sumikura, S.; Shibuya, M.; Sakurada, K. OpenVSLAM: A versatile visual SLAM framework. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019. [Google Scholar] [CrossRef]
- Rosinol, A.; Leonard, J.J.; Carlone, L. NeRF-SLAM: Real-time dense monocular SLAM with neural radiance fields. arXiv 2022, arXiv:2210.13641. [Google Scholar] [CrossRef]
- Yan, C.; Qu, D.; Xu, D.; Zhao, B.; Wang, Z.; Wang, D.; Li, X. GS-SLAM: Dense visual SLAM with 3D Gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024. [Google Scholar] [CrossRef]
- Zhang, B.; Dong, Y.; Zhao, Y.; Qi, X. DynPL-SLAM: A Robust Stereo Visual SLAM System for Dynamic Scenes Using Points and Lines. IEEE Trans. Intell. Veh. 2024, 1–13. Available online: https://ieeexplore.ieee.org/document/10561575 (accessed on 11 January 2026). [CrossRef]
- Huang, S.; Ren, W.; Li, M. PLFF-SLAM: A Point and Line Feature Fused Visual SLAM Algorithm for Dynamic Illumination Environments. IEEE Access 2025, 13, 34946–34953. [Google Scholar] [CrossRef]
- Zhu, F.; Zhao, Y.; Chen, Z.; Jiang, C.; Zhu, H.; Hu, X. DyGS-SLAM: Realistic Map Reconstruction in Dynamic Scenes Based on Double-Constrained Visual SLAM. Remote Sens. 2025, 17, 625. [Google Scholar] [CrossRef]
- Sun, J.; Xie, Y.; Chen, L.; Zhou, X.; Bao, H. NeuralRecon: Real-time coherent 3D reconstruction from monocular video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [Google Scholar] [CrossRef]
- Hermann, M.; Ruf, B.; Weinmann, M. Real-time dense 3D reconstruction from monocular video data captured by low-cost UAVs. arXiv 2021, arXiv:2104.10515. [Google Scholar] [CrossRef]
- Teed, Z.; Deng, J. DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras. Adv. Neural Inf. Process. Syst. 2021, 34, 16558–16569. [Google Scholar] [CrossRef]
- Albukhari, I.; El-Sayed, A.; Alshibli, M. Mini-DROID-SLAM: Improving Monocular Visual SLAM Using Mini-GRU RNN Network. Sensors 2025, 25, 5448. [Google Scholar] [CrossRef]
- Maggio, D.; Lim, H.; Carlone, L. VGGT-SLAM: Dense RGB SLAM optimized on the SL(4) manifold. arXiv 2025, arXiv:2505.12549. [Google Scholar] [CrossRef]
- Maggio, D.; Carlone, L. VGGT-SLAM 2.0: Real time Dense Feed-forward Scene Reconstruction. arXiv 2026, arXiv:2601.19887. [Google Scholar] [CrossRef]
- Carranza, J.; Theobalt, C.; Magnor, M.A.; Seidel, H.-P. Free-viewpoint video of human actors. ACM Trans. Graph. 2003, 22, 569–577. [Google Scholar] [CrossRef]
- De Aguiar, E.; Stoll, C.; Theobalt, C.; Ahmed, N.; Seidel, H.-P.; Thrun, S. Performance capture from sparse multi-view video. In ACM SIGGRAPH 2008 Papers; Association for Computing Machinery: New York, NY, USA, 2008; pp. 1–10. Available online: https://dl.acm.org/doi/10.1145/1399504.1360697 (accessed on 11 January 2026).
- Habermann, M.; Xu, W.; Zollhoefer, M.; Pons-Moll, G.; Theobalt, C. A deeper look into DeepCap. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 45, 4009–4022. [Google Scholar] [CrossRef]
- Wu, G.; Yi, T.; Fang, J.; Xie, L.; Zhang, X.; Wei, W.; Liu, W.; Tian, Q.; Wang, X. 4D Gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024. [Google Scholar] [CrossRef]
- Wang, G.; Lu, Y.; Zhang, L.; Alfarrarjeh, A.; Zimmermann, R.; Kim, S.H.; Shahabi, C. Active key frame selection for 3D model reconstruction from crowdsourced geo-tagged videos. In 2014 IEEE International Conference on Multimedia and Expo (ICME); IEEE: New York, NY, USA, 2014. [Google Scholar] [CrossRef]
- Dong, Y.; Li, P.; Zhang, L.; Zhou, X.; He, B.; Tang, J. KINND: A Keyframe Insertion Framework via Neural Network Decision-Making for VSLAM. IEEE Robot. Autom. Lett. 2025, 10, 3908–3915. [Google Scholar] [CrossRef]
- Saurabh, A.; Aggrawal, A.; Gupta, S. K-HOG Unsupervised Keyframe Identifier (K-HUKI): Extracting Action-Rich Frames with HOG Features and Unsupervised Learning. 2025. Available online: https://www.researchsquare.com/article/rs-6567616/v1?utm_source=researchgate.net&utm_medium=article (accessed on 11 January 2026). [CrossRef]
- Conti, A.; Poggi, M.; Cambareri, V.; Mattoccia, S. Range-agnostic multi-view depth estimation with keyframe selection. In 2024 International Conference on 3D Vision (3DV); IEEE: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
- Arslan, S.; Tanberk, S. Key frame extraction with attention based deep neural networks. arXiv 2023, arXiv:2306.13176. [Google Scholar] [CrossRef]
- Zhan, Z.; Yu, Y.; Xia, R.; Gan, W.; Xie, H.; Perda, G.; Morelli, L.; Remondino, F.; Wang, X. SfM on-the-fly: Get better 3D from what you capture. arXiv 2024, arXiv:2407.03939. [Google Scholar] [CrossRef]
- Alonso, I.; Riazuelo, L.; Murillo, A.C. Enhancing V-SLAM keyframe selection with an efficient ConvNet for semantic analysis. In 2019 International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2019. [Google Scholar] [CrossRef]
- Chen, B.; Yuan, D.; Liu, C.; Wu, Q. Loop Closure Detection Based on Multi-Scale Deep Feature Fusion. Appl. Sci. 2019, 9, 1120. [Google Scholar] [CrossRef]
- Sheng, L.; Xu, D.; Ouyang, W.; Wang, X. Unsupervised collaborative learning of keyframe detection and visual odometry towards monocular deep slam. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar] [CrossRef]
- Soares, J.C.V.; Gattass, M.; Meggiolaro, M.A. Visual SLAM in human populated environments: Exploring the trade-off between accuracy and speed of YOLO and Mask R-CNN. In 2019 19th International Conference on Advanced Robotics (ICAR); IEEE: New York, NY, USA, 2019. [Google Scholar] [CrossRef]
- Li, J.; Pei, L.; Zou, D.; Xia, S.; Wu, Q.; Li, T.; Sun, Z.; Yu, W. Attention-SLAM: A visual monocular SLAM learning from human gaze. IEEE Sens. J. 2020, 21, 6408–6420. [Google Scholar] [CrossRef]
- Bescos, B.; Campos, C.; Tardós, J.D.; Neira, J. DynaSLAM II: Tightly-coupled multi-object tracking and SLAM. IEEE Robot. Autom. Lett. 2021, 6, 5191–5198. [Google Scholar] [CrossRef]
- Bruno, H.M.S.; Colombini, E.L. LIFT-SLAM: A deep-learning feature-based monocular visual SLAM method. Neurocomputing 2021, 455, 97–110. [Google Scholar] [CrossRef]
- Liu, Y.; Miura, J. RDS-SLAM: Real-time dynamic SLAM using semantic segmentation methods. IEEE Access 2021, 9, 23772–23785. [Google Scholar] [CrossRef]
- Wimbauer, F.; Yang, N.; Von Stumberg, L.; Zeller, N.; Cremers, D. MonoRec: Semi-supervised dense reconstruction in dynamic environments from a single moving camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [Google Scholar] [CrossRef]
- Raoui, Y.; Weber, C.; Wermter, S. NeoSLAM: Neural Object SLAM for Loop Closure and Navigation. In International Conference on Artificial Neural Networks; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar] [CrossRef]
- Zhang, K.; Ma, J.; Jiang, J. Loop closure detection with reweighting NetVLAD and local motion and structure consensus. IEEE/CAA J. Autom. Sin. 2022, 9, 1087–1090. [Google Scholar] [CrossRef]
- Zhou, D.; Luo, Y.; Zhang, Q.; Xu, Y.; Chen, D.; Zhang, X. A Lightweight Neural Network for Loop Closure Detection in Indoor Visual SLAM. Int. J. Comput. Intell. Syst. 2023, 16, 49. [Google Scholar] [CrossRef]
- Zhong, Y.; Hu, S.; Huang, G.; Bai, L.; Li, Q. WF-SLAM: A robust VSLAM for dynamic scenarios via weighted features. IEEE Sens. J. 2022, 22, 10818–10827. [Google Scholar] [CrossRef]
- Qu, H.; Zhang, L.; Mao, J.; Tie, J.; He, X.; Hu, X.; Shi, Y.; Chen, C. DK-SLAM: Monocular Visual SLAM with Deep Keypoints Adaptive Learning, Tracking and Loop-Closing. arXiv 2024, arXiv:2401.09160. [Google Scholar] [CrossRef]
- Dias, P.; Kassim, A.A.; Srinivasan, V. A neural network based corner detection method. In Proceedings of ICNN’95-International Conference on Neural Networks; IEEE: New York, NY, USA, 1995. [Google Scholar] [CrossRef]
- Kendall, A.; Grimes, M.; Cipolla, R. PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015. [Google Scholar] [CrossRef]
- Zhang, X.; Su, Y.; Zhu, X. Loop closure detection for visual SLAM systems using convolutional neural network. In 2017 23rd International Conference on Automation and Computing (ICAC); IEEE: New York, NY, USA, 2017. [Google Scholar] [CrossRef]
- Han, S.; Li, M.; Tang, H.; Song, Y.; Tong, G. UVMO: Deep unsupervised visual reconstruction-based multimodal-assisted odometry. Pattern Recognit. 2024, 153, 110573. [Google Scholar] [CrossRef]
- Zhou, Y.; Sun, M. A visual SLAM loop closure detection method based on lightweight siamese capsule network. Sci. Rep. 2025, 15, 7644. [Google Scholar] [CrossRef]
- Yu, C.H.; Huang, C.C. Incremental map modeling for lightweight SLAM via deep reinforcement learning. In 2023 IEEE International Conference on Consumer Electronics (ICCE); IEEE: New York, NY, USA, 2023. [Google Scholar] [CrossRef]
- Wang, J.; Chen, M.; Karaev, N.; Vedaldi, A.; Rupprecht, C.; Novotny, D. VGGT: Visual geometry grounded transformer. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025. [Google Scholar] [CrossRef]
- Malleson, C.; Guillemaut, J.-Y.; Hilton, A. Hybrid modeling of non-rigid scenes from RGBD cameras. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 2391–2404. [Google Scholar] [CrossRef]
- Mustafa, A.; Volino, M.; Kim, H.; Guillemaut, J.-Y.; Hilton, A. Temporally coherent general dynamic scene reconstruction. Int. J. Comput. Vis. 2021, 129, 123–141. [Google Scholar] [CrossRef]
- Rashidi, A.; Dai, F.; Brilakis, I.; Vela, P. Optimized selection of key frames for monocular videogrammetric surveying of civil infrastructure. Adv. Eng. Inform. 2013, 27, 270–282. [Google Scholar] [CrossRef]
- Crete, F.; Dolmiere, T.; Ladret, P.; Nicolas, M. The blur effect: Perception and estimation with a new no-reference perceptual blur metric. In Human Vision and Electronic Imaging XII; SPIE: Bellingham, WA, USA, 2007; Available online: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/6492/1/The-blur-effect--perception-and-estimation-with-a-new/10.1117/12.702790.full (accessed on 11 January 2026). [CrossRef]
- Griwodz, C.; Gasparini, S.; Calvet, L.; Gurdjos, P.; Castan, F.; Maujean, B.; De Lillo, G.; Lanthony, Y. AliceVision Meshroom: An open-source 3D reconstruction pipeline. In Proceedings of the 12th ACM Multimedia Systems Conference; Association for Computing Machinery: New York, NY, USA, 2021. [Google Scholar] [CrossRef]
- Tian, F.; Gao, Y.; Fang, Z.; Gu, J.; Yang, S. 3D reconstruction with auto-selected keyframes based on depth completion correction and pose fusion. J. Vis. Commun. Image Represent. 2021, 79, 103199. [Google Scholar] [CrossRef]
- Montas-Laracuente, N.; Delgado Martos, E.; Pesqueira-Calvo, C.; Intra Sidola, G.; Maitín, A.; Nogales, A.; García-Tejedor, Á.J. Automatic 3D Reconstruction: Mesh Extraction Based on Gaussian Splatting from Romanesque–Mudéjar Churches. Appl. Sci. 2025, 15, 8379. [Google Scholar] [CrossRef]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on Computer Vision; IEEE: New York, NY, USA, 2011. [Google Scholar]
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Revaud, J.; De Souza, C.; Humenberger, M.; Weinzaepfel, P. R2D2: Reliable and repeatable detector and descriptor. Adv. Neural Inf. Process. Syst. 2019, 32. Available online: https://papers.nips.cc/paper_files/paper/2019/hash/3198dfd0aef271d22f7bcddd6f12f5cb-Abstract.html (accessed on 11 January 2026).
- Dusmanu, M.; Rocco, I.; Pajdla, T.; Pollefeys, M.; Sivic, J.; Torii, A.; Sattler, T. D2-Net: A trainable CNN for joint description and detection of local features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Bonarini, A.; Burgard, W.; Fontana, G.; Matteucci, M.; Sorrenti, D.G.; Tardos, J.D. Rawseeds: Robotics advancement through web-publishing of sensorial and elaborated extensive data sets. In Proceedings of the IROS, Beijing, China, 9–15 October 2006. [Google Scholar] [CrossRef]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2012. [Google Scholar] [CrossRef]
- Handa, A.; Whelan, T.; McDonald, J.; Davison, A.J. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In 2014 IEEE International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2014. [Google Scholar] [CrossRef]
- Burri, M.; Nikolic, J.; Gohl, P.; Schneider, T.; Rehder, J.; Omari, S.; Achtelik, M.W.; Siegwart, R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016, 35, 1157–1163. [Google Scholar] [CrossRef]
- McCormac, J.; Handa, A.; Leutenegger, S.; Davison, A.J. SceneNet RGB-D: Can 5M synthetic images beat generic ImageNet pre-training on indoor segmentation? In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar] [CrossRef]
- Shi, X.; Li, D.; Zhao, P.; Tian, Q.; Tian, Y.; Long, Q.; Zhu, C.; Song, J.; Qiao, F.; Song, L. Are we ready for service robots? The OpenLORIS-Scene datasets for lifelong SLAM. In 2020 IEEE International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
- Wang, W.; Zhu, D.; Wang, X.; Hu, Y.; Qiu, Y.; Wang, C.; Hu, Y.; Kapoor, A.; Scherer, S. TartanAir: A dataset to push the limits of visual SLAM. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
- Nguyen, T.-M.; Yuan, S.; Cao, M.; Lyu, Y.; Nguyen, T.H.; Xie, L. NTU VIRAL: A visual-inertial-ranging-lidar dataset, from an aerial vehicle viewpoint. Int. J. Robot. Res. 2022, 41, 270–280. [Google Scholar] [CrossRef]
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems; IEEE: New York, NY, USA, 2012. [Google Scholar] [CrossRef]









| | SfM [40,41] | MVS [42,43] | V-SLAM [33,35] | Learning-Based Methods [44] |
|---|---|---|---|---|
| Objective | | | | |
| Inputs | Overlapping images (without a specific order) | Overlapping images | Video or a sequence of images | Images (sometimes accompanied by depth data) |
| Outputs | Sparse point cloud and camera poses | High-density point cloud and 3D mesh | Environment map, generally of low density | Volumetric model, point cloud, and 3D mesh |
| Real-time Capability | No | No | Yes | Possible but challenging (mostly offline) |
| Algorithmic Basis | | | | |
| Advantages | | | | |
| Disadvantages | | | | |
| Applications | | | | |

| ORB-SLAM Version | Sensor Support | Output | Application | Author and Year | Reference |
|---|---|---|---|---|---|
| I | Monocular | Camera pose estimation | Indoor navigation | Mur-Artal et al., 2015 | [54] |
| II | Monocular, stereo, RGB-D | Keyframe selection and sparse point cloud | Mobile mapping, VR | Mur-Artal and Tardós, 2017 | [53] |
| III | Monocular, stereo, IMU, fisheye | 2D and 3D mapping | Robotics, 3D reconstruction | Campos et al., 2021 | [55] |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Javadi Moghadam, A.; Kiani, A.; Naeimaei, R.; Malihi, S.; Brilakis, I. Video-Based 3D Reconstruction: A Review of Photogrammetry and Visual SLAM Approaches. J. Imaging 2026, 12, 128. https://doi.org/10.3390/jimaging12030128

