Search Results (10)

Search Parameters:
Keywords = TSDF mapping

20 pages, 4820 KB  
Article
Sem-SLAM: Semantic-Integrated SLAM Approach for 3D Reconstruction
by Shuqi Liu, Yufeng Zhuang, Chenxu Zhang, Qifei Li and Jiayu Hou
Appl. Sci. 2025, 15(14), 7881; https://doi.org/10.3390/app15147881 - 15 Jul 2025
Viewed by 997
Abstract
Amid the surge of research on integrating Simultaneous Localization and Mapping (SLAM) with neural implicit representations, existing methods exhibit clear limitations in environmental semantic parsing and scene understanding. In response, this paper proposes a SLAM system that integrates a full attention mechanism and a multi-scale information extractor. The system constructs a more accurate 3D environmental model by fusing semantic, shape, and geometric orientation features. To mine the semantic information in images more deeply, a pre-trained, frozen 2D segmentation algorithm is employed to extract semantic features, providing strong support for 3D reconstruction. Furthermore, a multi-layer perceptron and interpolation techniques are used to extract multi-scale features, distinguishing information at different scales. This enables effective decoding of semantic, RGB, and Truncated Signed Distance Field (TSDF) values from the fused features, achieving high-quality rendering. Experimental results demonstrate that the method significantly outperforms baseline methods in mapping and tracking accuracy on the Replica and ScanNet datasets. It also performs well in semantic segmentation and real-time semantic mapping, offering a new direction for the development of SLAM technology.
(This article belongs to the Special Issue Applications of Data Science and Artificial Intelligence)

20 pages, 3115 KB  
Article
Real-Time LiDAR–Inertial Simultaneous Localization and Mesh Reconstruction
by Yunqi Cheng, Meng Xu, Kezhi Wang, Zonghai Chen and Jikai Wang
World Electr. Veh. J. 2024, 15(11), 495; https://doi.org/10.3390/wevj15110495 - 29 Oct 2024
Viewed by 2224
Abstract
In this paper, a novel LiDAR–inertial Simultaneous Localization and Mesh Reconstruction (LI-SLAMesh) system is proposed that achieves fast, robust pose tracking and online mesh reconstruction in outdoor environments. LI-SLAMesh consists of two components: LiDAR–inertial odometry and a Truncated Signed Distance Field (TSDF)-free online reconstruction module. First, to reduce odometry drift, scan-to-map matching is used, and inter-frame inertial information generates a prior relative pose estimate for the subsequent LiDAR-dominated optimization. Motivated by the observation that unevenly distributed residual terms tend to degrade nonlinear optimizers, a novel residual-density-driven Gauss–Newton method is proposed to obtain the optimal pose estimate. Second, to achieve fast and accurate 3D reconstruction, a map representation more compact than TSDF-based mapping is proposed: it maintains only the occupied voxels and computes the SDF values of each occupied voxel's vertices using an iterative Implicit Moving Least Squares (IMLS) algorithm. Marching cubes is then performed on the voxels, and a dense mesh map is generated online. Extensive experiments on public datasets demonstrate significant improvements in localization and online reconstruction. The source code will be made public for the benefit of the robotics community.
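The compact representation described above, which keeps only occupied voxels instead of a dense grid, can be illustrated with a hash map keyed by integer voxel coordinates. This is a generic sketch of sparse voxel storage, not the paper's implementation; all names are illustrative.

```python
import numpy as np

def integrate_points(points, voxel_size=0.1):
    """Insert 3D points into a sparse voxel map: only voxels that actually
    contain points are stored, keyed by their integer grid coordinates."""
    voxels = {}
    keys = np.floor(points / voxel_size).astype(int)
    for key, p in zip(map(tuple, keys), points):
        voxels.setdefault(key, []).append(p)
    return voxels

# Two small point clusters far apart occupy only a handful of voxels,
# whereas a dense grid would have to span the whole bounding box.
rng = np.random.default_rng(0)
pts = np.vstack([rng.random((100, 3)) * 0.2,          # cluster near the origin
                 rng.random((100, 3)) * 0.2 + 50.0])  # cluster 50 m away
sparse_map = integrate_points(pts, voxel_size=0.1)
```

Memory then scales with the occupied surface rather than with the scene's bounding volume, which is what makes outdoor-scale online reconstruction tractable.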

18 pages, 7128 KB  
Article
RGBTSDF: An Efficient and Simple Method for Color Truncated Signed Distance Field (TSDF) Volume Fusion Based on RGB-D Images
by Yunqiang Li, Shuowen Huang, Ying Chen, Yong Ding, Pengcheng Zhao, Qingwu Hu and Xujie Zhang
Remote Sens. 2024, 16(17), 3188; https://doi.org/10.3390/rs16173188 - 29 Aug 2024
Viewed by 4905
Abstract
RGB-D image mapping is an important tool in applications such as robotics, 3D reconstruction, autonomous navigation, and augmented reality (AR). Efficient, reliable mapping methods can improve the accuracy, real-time performance, and flexibility of sensors in these fields. However, the widely used Truncated Signed Distance Field (TSDF) still suffers from inefficient memory management, making it difficult to apply directly to large-scale 3D reconstruction. To address this problem, this paper proposes a highly efficient and accurate TSDF voxel fusion method, RGBTSDF. First, exploiting the sparsity of the volume, an improved grid octree is used to manage the whole scene, with a hard-coding method proposed for indexing. Second, during depth map fusion, the depth map is interpolated to achieve more accurate voxel fusion. Finally, a mesh extraction method with texture constraints is proposed to overcome noise and holes and to improve the smoothness and refinement of the extracted surface. We comprehensively evaluate RGBTSDF and similar methods on public datasets and on datasets collected by commercial scanning devices. Experimental results show that RGBTSDF requires less memory, achieves real-time performance using only the CPU, improves fusion accuracy, and produces finer mesh details.
(This article belongs to the Special Issue New Insight into Point Cloud Data Processing)
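The projective TSDF fusion these RGB-D methods build on can be sketched in a few lines: project each voxel into the depth image, truncate the signed distance along the viewing ray, and blend it into the volume with a weighted running average. This is a generic Curless–Levoy-style sketch, not RGBTSDF's implementation; the fixed identity camera pose and all names are simplifying assumptions.

```python
import numpy as np

def fuse_depth(tsdf, weight, depth, K, voxel_origin, voxel_size, trunc=0.05):
    """Integrate one depth map into a TSDF volume (weighted running average).
    tsdf, weight: (X, Y, Z) arrays; depth: (H, W); K: 3x3 intrinsics.
    For brevity the camera sits at the world origin looking down +z."""
    X, Y, Z = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij")
    # Voxel centers in camera coordinates
    pts = voxel_origin + (np.stack([ii, jj, kk], -1) + 0.5) * voxel_size
    z = pts[..., 2]
    u = (K[0, 0] * pts[..., 0] / z + K[0, 2]).round().astype(int)
    v = (K[1, 1] * pts[..., 1] / z + K[1, 2]).round().astype(int)
    H, W = depth.shape
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.where(valid, depth[v.clip(0, H - 1), u.clip(0, W - 1)], 0.0)
    sdf = d - z                              # signed distance along the viewing ray
    keep = valid & (d > 0) & (sdf > -trunc)  # skip voxels far behind the surface
    t = np.clip(sdf / trunc, -1.0, 1.0)
    # Weighted running average: new measurements blend into old ones
    w_new = weight + keep
    tsdf[...] = np.where(keep, (tsdf * weight + t) / np.maximum(w_new, 1), tsdf)
    weight[...] = w_new

# Fuse a synthetic flat wall at depth 1.0 m into an empty volume
K = np.array([[100.0, 0.0, 200.0], [0.0, 100.0, 200.0], [0.0, 0.0, 1.0]])
tsdf = np.zeros((4, 4, 20))
weight = np.zeros((4, 4, 20))
fuse_depth(tsdf, weight, np.full((400, 400), 1.0), K,
           np.array([-0.2, -0.2, 0.05]), voxel_size=0.1, trunc=0.15)
# Voxels in front of the wall fuse toward +1; the first voxels behind it
# go negative, so the zero crossing marks the surface.
```

Averaging over frames is what makes TSDF fusion robust to per-frame depth noise; mesh extraction then locates the zero crossing of the fused field.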

15 pages, 1556 KB  
Article
SVR-Net: A Sparse Voxelized Recurrent Network for Robust Monocular SLAM with Direct TSDF Mapping
by Rongling Lang, Ya Fan and Qing Chang
Sensors 2023, 23(8), 3942; https://doi.org/10.3390/s23083942 - 13 Apr 2023
Cited by 6 | Viewed by 2788
Abstract
Simultaneous localization and mapping (SLAM) plays a fundamental role in downstream tasks such as navigation and planning. However, monocular visual SLAM faces challenges in robust pose estimation and map construction. This study proposes a monocular SLAM system based on a sparse voxelized recurrent network, SVR-Net. It extracts voxel features from a pair of frames for correlation and matches them recursively to estimate the pose and a dense map. The sparse voxelized structure reduces the memory occupied by voxel features, while gated recurrent units iteratively search for optimal matches on the correlation maps, enhancing the system's robustness. Additionally, Gauss–Newton updates are embedded in the iterations to impose geometric constraints, ensuring accurate pose estimation. After end-to-end training on ScanNet, SVR-Net is evaluated on TUM-RGBD and successfully estimates poses on all nine scenes, whereas traditional ORB-SLAM fails on most of them. Furthermore, absolute trajectory error (ATE) results show tracking accuracy comparable to that of DeepV2D. Unlike most previous monocular SLAM systems, SVR-Net directly estimates dense TSDF maps suitable for downstream tasks, with highly efficient use of data. This study contributes to the development of robust monocular visual SLAM systems and direct TSDF mapping.
(This article belongs to the Topic AI and Data-Driven Advancements in Industry 4.0)

18 pages, 10193 KB  
Article
3D Reconstruction of Remote Sensing Mountain Areas with TSDF-Based Neural Networks
by Zipeng Qi, Zhengxia Zou, Hao Chen and Zhenwei Shi
Remote Sens. 2022, 14(17), 4333; https://doi.org/10.3390/rs14174333 - 1 Sep 2022
Cited by 14 | Viewed by 4359
Abstract
The remote sensing 3D reconstruction of mountain areas has a wide range of applications in surveying, visualization, and game modeling. Unlike indoor objects, outdoor mountain reconstruction faces additional challenges, including illumination changes, texture diversity, and highly irregular surface geometry. Traditional neural network-based methods lacking discriminative features struggle with these challenges and tend to generate incomplete, inaccurate reconstructions. The truncated signed distance function (TSDF) is a commonly used parameterized representation of 3D structure that is naturally convenient for neural network computation and storage. In this paper, we propose a novel deep learning method with TSDF-based representations for robust 3D reconstruction from images of mountain terrain. The method takes a set of images captured around an outdoor mountain and produces high-quality TSDF representations of the area. To address lighting variation and texture diversity, we propose a view fusion strategy based on a reweighting mechanism (VRM) to better integrate multi-view 2D features of the same voxel. A feature enhancement (FE) module provides a better discriminative geometry prior in the feature decoding process, and a spatial–temporal aggregation (STA) module reduces ambiguity between temporal features and improves the accuracy of the reconstructed surfaces. A synthetic dataset of mountain terrain images is built for evaluation. Our method outperforms previous state-of-the-art TSDF-based and depth-based reconstruction methods on both 2D and 3D metrics. Furthermore, qualitative results on real-world multi-view terrain images collected from Google Maps demonstrate the good generalization ability of the proposed method.
(This article belongs to the Special Issue Pattern Recognition and Image Processing for Remote Sensing II)

14 pages, 8120 KB  
Article
DFusion: Denoised TSDF Fusion of Multiple Depth Maps with Sensor Pose Noises
by Zhaofeng Niu, Yuichiro Fujimoto, Masayuki Kanbara, Taishi Sawabe and Hirokazu Kato
Sensors 2022, 22(4), 1631; https://doi.org/10.3390/s22041631 - 19 Feb 2022
Cited by 4 | Viewed by 4869
Abstract
Truncated signed distance function (TSDF) fusion is one of the key operations in the 3D reconstruction process, but existing TSDF fusion methods usually suffer from inevitable sensor noise. In this paper, we propose a new TSDF fusion network, named DFusion, to minimize the influence of the two most common sensor noises: depth noise and pose noise. To the best of our knowledge, this is the first depth fusion method that addresses both. DFusion consists of a fusion module, which fuses depth maps and generates a TSDF volume, followed by a denoising module, which takes the TSDF volume as input and removes both depth and pose noise. To exploit the 3D structural information of the TSDF volume, 3D convolutional layers are used in the encoder and decoder of the denoising module. In addition, a specially designed loss function improves fusion performance in object and surface regions. Experiments on a synthetic dataset as well as a real-scene dataset show that our method outperforms existing methods.
(This article belongs to the Special Issue Computer Vision and Machine Learning for Intelligent Sensing Systems)

24 pages, 36055 KB  
Article
VDBFusion: Flexible and Efficient TSDF Integration of Range Sensor Data
by Ignacio Vizzo, Tiziano Guadagnino, Jens Behley and Cyrill Stachniss
Sensors 2022, 22(3), 1296; https://doi.org/10.3390/s22031296 - 8 Feb 2022
Cited by 70 | Viewed by 11319
Abstract
Mapping is a crucial task in robotics and a fundamental building block of most mobile systems deployed in the real world. Robots use different environment representations depending on their task and sensor setup. This paper showcases a practical approach to volumetric surface reconstruction based on truncated signed distance functions (TSDFs). We revisit the basics of this mapping technique and offer an approach for building effective and efficient real-world mapping systems. In contrast to most state-of-the-art SLAM and mapping approaches, we make no assumptions about the size of the environment or the employed range sensor, and we introduce an effective system that works across multiple domains with different sensors. To achieve this, we build upon the Academy-Award-winning OpenVDB library used in filmmaking to realize an efficient 3D map representation. The resulting system is flexible, highly effective, and capable of integrating point clouds from a 64-beam LiDAR sensor at 20 frames per second using a single-core CPU. Along with this publication comes an easy-to-use C++ and Python library for quickly and efficiently solving volumetric mapping problems with TSDFs.
(This article belongs to the Special Issue Best Practice in Simultaneous Localization and Mapping (SLAM))
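The point-based TSDF integration that such range-sensor pipelines perform, marching along each sensor ray and averaging truncated signed distances into the voxels near the hit, can be sketched with a plain dictionary standing in for VDBFusion's OpenVDB grids. This is a minimal illustration under those assumptions, not the library's API.

```python
import numpy as np

def integrate_scan(tsdf, weights, points, origin, voxel_size=0.1, trunc=0.3):
    """Point-based TSDF integration: for every measured point, update the
    voxels along the sensor ray inside the truncation band around the hit.
    tsdf / weights are dicts keyed by integer voxel coordinates (sparse map)."""
    for p in points:
        ray = p - origin
        depth = np.linalg.norm(ray)
        direction = ray / depth
        # March along the ray through the band [depth - trunc, depth + trunc]
        for t in np.arange(max(depth - trunc, 0.0), depth + trunc, voxel_size / 2):
            key = tuple(np.floor((origin + t * direction) / voxel_size).astype(int))
            sdf = np.clip((depth - t) / trunc, -1.0, 1.0)  # + in front of the hit, - behind
            w = weights.get(key, 0.0)
            tsdf[key] = (tsdf.get(key, 0.0) * w + sdf) / (w + 1.0)  # running average
            weights[key] = w + 1.0
    return tsdf, weights

# A single return 1 m in front of the sensor writes a band of voxels along
# the ray: positive SDF before the hit, negative just behind it.
tsdf, weights = integrate_scan({}, {}, np.array([[1.0, 0.0, 0.0]]), np.zeros(3))
```

Because only voxels near observed surfaces are ever touched, this kind of integration needs no bound on the environment's size, which is the flexibility the paper emphasizes.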

17 pages, 10195 KB  
Article
Reconstruction of High-Precision Semantic Map
by Xinyuan Tu, Jian Zhang, Runhao Luo, Kai Wang, Qingji Zeng, Yu Zhou, Yao Yu and Sidan Du
Sensors 2020, 20(21), 6264; https://doi.org/10.3390/s20216264 - 3 Nov 2020
Cited by 4 | Viewed by 3723
Abstract
We present a real-time Truncated Signed Distance Field (TSDF)-based three-dimensional (3D) semantic reconstruction for LiDAR point clouds, which achieves incremental surface reconstruction and highly accurate semantic segmentation. High-precision real-time 3D semantic reconstruction from LiDAR data is important but challenging: Light Detection and Ranging (LiDAR) data are both highly accurate and massive, which strains 3D reconstruction. We therefore propose a line-of-sight algorithm to update the implicit surface incrementally. To use semantic information more effectively, an online attention-based spatial and temporal feature fusion method is proposed and integrated into the reconstruction system. We parallelize the reconstruction and semantic fusion processes, achieving real-time performance. We demonstrate our approach on the CARLA dataset, the Apollo dataset, and our own dataset. Compared with state-of-the-art mapping methods, our method has a clear advantage in both quality and speed, meeting the needs of robotic mapping and navigation.
(This article belongs to the Special Issue Sensors and Computer Vision Techniques for 3D Object Modeling)

19 pages, 40946 KB  
Article
Real-Time Large-Scale Dense Mapping with Surfels
by Xingyin Fu, Feng Zhu, Qingxiao Wu, Yunlei Sun, Rongrong Lu and Ruigang Yang
Sensors 2018, 18(5), 1493; https://doi.org/10.3390/s18051493 - 9 May 2018
Cited by 11 | Viewed by 6815
Abstract
Real-time dense mapping systems have been developed since the advent of consumer RGB-D cameras. Two models are commonly used in dense mapping systems: the truncated signed distance function (TSDF) and surfels. State-of-the-art dense mapping systems usually work well in small regions, but the generated dense surface may be unsatisfactory around loop closures when tracking drift grows large. In addition, surfel-based systems slow down as the number of model points in the map grows. In this paper, we propose using two maps in the dense mapping system. RGB-D images are integrated into a local surfel map, and old surfels that were reconstructed earlier and lie far from the camera frustum are moved from the local map to a global map. The number of surfels updated in the local map per incoming frame thus remains bounded, so our system can reconstruct very large scenes while maintaining a high frame rate. We detect loop closures and optimize the pose graph to distribute tracking drift, and we correct the positions and normals of map surfels with an embedded deformation graph so that they stay consistent with the updated poses. To handle large surface deformations, we propose a new method for constructing constraints from the system trajectory and loop closure keyframes, which stabilizes large-scale surface deformation. Experimental results show that our system outperforms prior state-of-the-art dense mapping systems.
(This article belongs to the Special Issue Sensors Signal Processing and Visual Computing)

13 pages, 1059 KB  
Article
Compressed Voxel-Based Mapping Using Unsupervised Learning
by Daniel Ricao Canelhas, Erik Schaffernicht, Todor Stoyanov, Achim J. Lilienthal and Andrew J. Davison
Robotics 2017, 6(3), 15; https://doi.org/10.3390/robotics6030015 - 29 Jun 2017
Cited by 10 | Viewed by 10592
Abstract
To deal with the scaling problem of volumetric map representations, we propose spatially local methods for high-ratio compression of 3D maps represented as truncated signed distance fields. We show that these compressed maps can serve as meaningful descriptors for selective decompression in scenarios relevant to robotic applications. As compression methods, we compare PCA-derived low-dimensional bases against nonlinear auto-encoder networks. Using two application-oriented performance metrics, we evaluate the impact of different compression rates on reconstruction fidelity as well as on map-aided ego-motion estimation. We demonstrate that lossily reconstructed distance fields, used as cost functions for ego-motion estimation, can outperform the original maps in challenging scenarios from standard RGB-D (color plus depth) datasets, owing to the rejection of high-frequency noise.
(This article belongs to the Special Issue Robotics and 3D Vision)
