1. Introduction
With the rapid development of unmanned robots, geohazard monitoring, and field exploration, the demand for high-precision 3D reconstruction of complex unstructured terrain is becoming increasingly urgent [1]. Unstructured terrain, such as mountains, ruins, and forests, is characterized by irregular surface morphology, severe occlusion, and sparse texture features [2]. Traditional 3D reconstruction methods based on GPS or preset markers have significant limitations in complex environments, yielding sparse reconstructed point clouds, heavy noise interference, and low efficiency [3]. Achieving efficient and robust 3D point cloud reconstruction of unstructured terrain has therefore become a research hotspot in computer vision, autonomous robot navigation, and environment perception [4].
Depending on the sensor type, 3D reconstruction can be divided into two main research directions: LiDAR-based map construction [5] and vision-based map construction [6]. Vision-based map construction has been widely adopted owing to its low energy consumption and light weight [7]. Boston Dynamics combines binocular vision with sensors such as IMUs to enable its quadruped robots to autonomously explore unknown outdoor environments. Cheng et al. investigated visual-sensor-based environment modeling to provide critical Mars surface terrain information for the motion control and path planning of a rover [8]. However, the accuracy of visual modeling is relatively low; in contrast, LiDAR-based map construction can reach millimeter-level accuracy and is more widely adopted in applied research on environment reconstruction [9].
The crucial factor in LiDAR-based environment modeling is the data association and matching between neighboring point cloud frames [10]. A typical method for identifying the transformation between two neighboring frames of LiDAR-scanned point clouds is the ICP algorithm [11]. The algorithm iteratively aligns the two point clouds by finding point correspondences between the frames until a set convergence threshold is reached, and then outputs the transformation parameters between the two point clouds. However, when the scanned point clouds are large, the algorithm incurs excessive computational cost and poor timeliness [12].
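To make the ICP procedure concrete, the following is a minimal point-to-point ICP sketch in Python (NumPy/SciPy); the iteration limit and tolerance are illustrative assumptions, not values from this paper:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, max_iters=50, tol=1e-6):
    """Minimal point-to-point ICP: align source (N,3) to target (M,3)."""
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    tree = cKDTree(target)
    for _ in range(max_iters):
        # Find the closest target point for every source point.
        dists, idx = tree.query(src)
        corr = target[idx]
        # Closed-form rigid alignment of the matched pairs (Kabsch/SVD).
        mu_s, mu_c = src.mean(axis=0), corr.mean(axis=0)
        H = (src - mu_s).T @ (corr - mu_c)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = mu_c - R @ mu_s
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = dists.mean()
        if abs(prev_err - err) < tol:   # convergence threshold reached
            break
        prev_err = err
    return R_total, t_total
```

For large scans, the nearest-neighbor query inside the loop dominates the cost, which is precisely the timeliness problem noted above.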
Zhang et al. proposed LOAM, a real-time laser odometry and mapping algorithm [13], which matches against estimated state information to construct maps in real time. When LiDAR sensors are employed for mapping and motion estimation, LOAM offers several benefits, including high accuracy and strong real-time performance. The algorithm splits the map construction problem into two distinct processes that run concurrently: an odometry algorithm runs at a higher frequency to estimate the single-line LiDAR's motion state, while, as illustrated in Figure 1, a mapping algorithm matches point clouds at a lower frequency and with high accuracy to produce the environmental map in real time.
In recent years, with the development of multi-sensor fusion and deep learning, a variety of LOAM-derived optimization algorithms have emerged. LeGO-LOAM splits a single-frame point cloud into sub-frames for parallel matching and shortens the processing delay [14]. LO-Net jointly learns inter-frame matching and pose estimation with a convolutional network, implicitly handling dynamic objects through a mask-weighted geometric-constraint loss, with accuracy comparable to LOAM and better generalization performance [15]. CSS-LOAM replaces traditional curvature computation with a curvature scale-space algorithm to analyze multi-scale geometric properties of point clouds, improving computational efficiency by 40% [16]. KDD-LOAM constructs keyframe KD-trees for millisecond-level feature retrieval in loop closure detection [17]. However, most of these algorithms perform well only in structured environments, and degrade to some extent when facing the ambiguous environmental features of unstructured terrain and the large pose fluctuations induced by terrain excitation. Therefore, in this paper, we take the LOAM algorithm as the basis for exploring an accurate environment modeling method for unstructured terrain.
However, to gather more precise and thorough 3D environmental information, the LOAM algorithm often adopts a stop-scan-forward motion pattern with a single-line LiDAR, which results in poor real-time performance for the overall system. Additionally, robots operating in complex unstructured terrain frequently experience unstable motion because of the nonlinear excitation of the ground on the tires [18]. This distorts the LiDAR-scanned point cloud data and significantly affects the subsequent data association. Finding trustworthy feature correspondences between two consecutive LiDAR scans is further complicated by the absence of manually placed markers and by the ambiguous features of unstructured terrain environments [19]. In response to these characteristics, this paper introduces IMU motion information to mitigate the distortion of scanned point clouds, and applies graph optimization theory to enhance the conventional LOAM algorithm, accomplishing accurate reconstruction of unstructured terrain.
Figure 2 displays the framework for 3D terrain reconstruction using LiDAR data. First, the LiDAR and IMU are calibrated. Then, the motion information from the IMU is used to lessen the distortion of the scanned point cloud. Next, the scanned point cloud is clustered and segmented, feature points are extracted, and data matching is completed to perform LiDAR motion estimation. Finally, graph optimization theory is introduced under the G2O framework to optimize the mapping and accomplish the reconstruction of unstructured terrain environments.
2. Preprocessing of Point Clouds Scanned by LiDAR
Compared with visual sensors, LiDAR is relatively insensitive to illumination and can be used stably even at night when light is insufficient [20]. In addition, the high resolution of LiDAR permits it to capture environmental details at long range over a wide field of view, and its measurement error is relatively constant regardless of the measured distance [21]. Therefore, robot-mounted LiDAR is used as the environment-sensing sensor for reconstruction in this paper. However, the LiDAR moves together with the robot. If the LiDAR scan rate is significantly faster than the robot's motion, the motion-induced distortion of the 3D point cloud data can usually be disregarded [22]. In this case, the standard ICP algorithm can be utilized to match different scanned frames with good performance, and compensation following ICP-based velocity estimation can reduce the motion-induced distortion. However, if the LiDAR scanning motion is relatively slow, and especially in unstructured terrain where the robot often moves unsteadily, the scanned point cloud becomes distorted and the accuracy of the algorithm suffers. Preprocessing of the LiDAR-scanned point cloud is therefore required; this paper introduces IMU motion information to calibrate the distortion of the acquired raw point clouds.
2.1. Calibration of LiDAR Distortion
One source of distortion in a single-frame point cloud is that the robot-mounted LiDAR moves together with the robot. Unpredictable and variable terrain, especially unstructured terrain, causes sudden changes in the robot's motion, which worsens the distortion of the point cloud data.
Independent pose estimation is a commonly used approach to address LiDAR distortion; odometry is one such measurement technique, for instance motion estimation with a wheel-mounted encoder. However, wheel encoders are inappropriate in unstructured terrain because the wheels are prone to slip. Therefore, in this paper, the robot motion data provided by the IMU is introduced in the initial stage of point cloud processing to eliminate the point cloud distortion caused by nonlinear motion.
Motion compensation is another way of describing the elimination of distortion: every point in the continuously scanned point cloud $P_k$ is mapped to the same time instant $t_{k+1}$. As illustrated in Figure 3, the pose increment during the scanning period $[t_k, t_{k+1}]$ is calibrated between the gradually acquired point cloud $P_k$ and the final reprojected point cloud $\bar{P}_k$.
The data acquisition frequency of the LiDAR is set to 10 Hz to maintain the data density of the acquisition and the effective operation of real-time processing. A single frame of LiDAR data therefore spans 100 milliseconds from the beginning to the end of the sweep, during which the attitude of the body changes. Consequently, as the LiDAR moves with the robot, each scan line is no longer a traditional circle but a spiral, as shown in Figure 4. The scan may break up to varying degrees when the robot's motion changes abruptly, so it is necessary to compensate using the attitude information of the robot's motion.
Compared with the 10 Hz LiDAR data, the IMU's acquisition frequency of 100 Hz can effectively eliminate the data distortion caused by the LiDAR moving with the robot and better compensate for the loss of LiDAR information during scanning. It is important to note that all IMU information for the scanning period is collected before the motion distortion calibration of the point cloud data in the preprocessing phase.
Figure 5 illustrates the method of calibrating LiDAR point cloud distortion. First, the acceleration and Euler angles measured by the IMU are converted to the LiDAR coordinate system using the transformation identified during the calibration process. Then, the acceleration is converted from the LiDAR coordinate system to the world coordinate system via the Euler angle transform, and the IMU acceleration data are used to calculate the cumulative velocity and displacement.
The main contributor to the distortion is the drift caused by the unstable motion of the LiDAR. In this paper, the attitude during the LiDAR's accelerated motion is calculated by interpolating the IMU data.
Since the sampling frequencies of the LiDAR and IMU are 10 Hz and 100 Hz, respectively, the timestamps of the IMU samples bracketing the start and end of the scan, together with the timestamp of the current LiDAR frame, can be determined. The displacement produced by the accelerated motion from the start of the current LiDAR point cloud frame to the current scan time is:

$$S_{\text{drift}} = S_{\text{total}} - v_0\,\Delta t$$

In the formula above, the drift displacement is denoted by $S_{\text{drift}}$ and the total displacement by $S_{\text{total}}$, which is obtained by double integration of the IMU acceleration; $v_0$ is the velocity at the start of the frame and $\Delta t$ is the elapsed scan time.
The calculated values are referenced to the world coordinate system and require an additional conversion to the LiDAR coordinate system in order to compensate the corresponding LiDAR point cloud data and achieve distortion calibration of the point cloud.
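As an illustration of this compensation step, the following is a minimal de-skewing sketch, assuming per-point timestamps and IMU-integrated world-frame poses are available; the function and argument names are hypothetical, with linear interpolation for position and spherical interpolation (SLERP) for orientation:

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew_sweep(points, point_times, imu_times, imu_positions, imu_quats):
    """Re-project every point of one LiDAR sweep to the sweep-end time.

    points        -- (N, 3) raw points in the LiDAR frame
    point_times   -- (N,)   per-point acquisition times within the sweep
    imu_times     -- (M,)   IMU sample times covering the sweep (100 Hz)
    imu_positions -- (M, 3) IMU-integrated positions (world frame)
    imu_quats     -- (M, 4) IMU orientations as quaternions (x, y, z, w)
    """
    slerp = Slerp(imu_times, Rotation.from_quat(imu_quats))

    def pose_at(t):
        # Linear interpolation for position, SLERP for orientation.
        t = np.atleast_1d(t)
        p = np.stack([np.interp(t, imu_times, imu_positions[:, k])
                      for k in range(3)], axis=1)
        return p, slerp(t)

    p_i, r_i = pose_at(point_times)            # pose when each point was captured
    p_end, r_end = pose_at(point_times.max())  # pose at the end of the sweep
    world = r_i.apply(points) + p_i            # each point in the world frame
    # Express all points in the single LiDAR pose at sweep-end time.
    return r_end[0].inv().apply(world - p_end[0])
```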
2.2. Three-Dimensional Point Cloud Feature Extraction
Since unstructured terrain environments contain common, relatively smooth ground, planar segmentation of the original point cloud is performed to speed up the computation of the algorithm. The original point cloud is first downsampled with a VoxelGrid filter to reduce the data volume while maintaining the overall geometric and topological properties of the point cloud. The surface normal is then utilized as a local descriptor of the point cloud, and a number of seed points with minimal curvature are grown into clusters, so that points satisfying a curvature smoothness threshold form planar blocks. Finally, the plane is fitted using a random sample consensus (RANSAC) iterative algorithm.
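A minimal sketch of this preprocessing pipeline using Open3D is shown below; the file name, voxel size, and RANSAC thresholds are illustrative assumptions, and the region-growing clustering step is approximated here by a single RANSAC plane fit:

```python
import open3d as o3d

# "scan.pcd" is a hypothetical file name for one LiDAR frame.
pcd = o3d.io.read_point_cloud("scan.pcd")

# VoxelGrid downsampling: reduce data volume, keep overall geometry.
down = pcd.voxel_down_sample(voxel_size=0.2)

# Normals serve as the local descriptor for the subsequent steps.
down.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=1.0, max_nn=30))

# RANSAC plane fit (stands in here for region growing + plane fitting).
plane_model, inliers = down.segment_plane(
    distance_threshold=0.1, ransac_n=3, num_iterations=1000)
ground = down.select_by_index(inliers)
obstacles = down.select_by_index(inliers, invert=True)
```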
For the generated dense point cloud, it is essential that enough stable feature points can be efficiently extracted to fulfill the alignment requirements. The parameter settings of the feature point detection algorithm are shown in Table 1; all parameters were determined through extensive experiments to guarantee the efficiency and stability of the algorithm.
The point cloud is further segmented to improve processing efficiency and feature extraction accuracy. Suppose $P_k$ is the original single-frame point cloud acquired at time $t_k$, and $p_i$ is a point in $P_k$. $P_k$ is first projected onto a range image, and the value associated with $p_i$ represents the Euclidean distance from the sensor to the corresponding point $p_i$. The points are then clustered using a distance-based clustering segmentation method, and points in the same class are assigned a common label. Since the original point cloud also contains numerous isolated noise points that may produce unreliable features, clusters containing fewer points than a specified threshold are ignored so that feature extraction remains fast and reliable. Once planar segmentation has been performed, only points that may represent large objects are retained for subsequent processing.
To create a single-frame point cloud $P_k$, the LiDAR scans the surroundings at a frequency of 10 Hz. Edge points and planar points are chosen as the feature points in the laser point cloud $P_k$. Let $p_i$ represent a point in $P_k$, $i = 1, 2, \dots, n$, and let $N$ be the set of consecutive points collected by the LiDAR on the same scan line; once a complete point cloud is obtained, $N$ comprises half of its points on either side of $p_i$, with an angular separation of 0.2° between neighboring points. The smoothness $S$ of the local surface is defined as:

$$S = \frac{1}{|N| \cdot \|X_i\|} \left\| \sum_{j \in N,\, j \neq i} \left( X_i - X_j \right) \right\|$$

In the formula above, $X_i$ represents the coordinates of the $i$-th point of the point cloud in the LiDAR origin coordinate system.
The points in the same scan line are sorted by the smoothness value $S$ of the local surface, and feature points are selected. A threshold $S_{th}$ is used to distinguish the different types of feature points: points with smaller $S$ values are chosen as planar feature points, whereas points with larger $S$ values are chosen as edge feature points.
Figure 6 displays the feature point extraction results from the point cloud. The red and yellow points correspond to the planar feature points and edge feature points, respectively.
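The following sketch illustrates this smoothness-based selection on a single scan ring; the window size and the numbers of edge/planar points per ring are illustrative assumptions rather than the parameters of Table 1:

```python
import numpy as np

def extract_features(scan_line, half_window=5, n_edge=2, n_planar=4):
    """Score each point of one scan ring by local smoothness S, pick features.

    scan_line: (N, 3) consecutive points from a single LiDAR scan ring.
    Returns indices of edge points (large S) and planar points (small S).
    """
    n = len(scan_line)
    S = np.full(n, np.nan)
    for i in range(half_window, n - half_window):
        # Neighborhood N: half_window points on either side of p_i.
        nbrs = np.r_[scan_line[i - half_window:i],
                     scan_line[i + 1:i + 1 + half_window]]
        diff = (scan_line[i] - nbrs).sum(axis=0)       # sum of (X_i - X_j)
        S[i] = np.linalg.norm(diff) / (
            2 * half_window * np.linalg.norm(scan_line[i]))
    order = np.argsort(S)                              # ascending smoothness
    valid = order[~np.isnan(S[order])]
    planar_idx = valid[:n_planar]                      # smallest S -> planar
    edge_idx = valid[-n_edge:]                         # largest S -> edge
    return edge_idx, planar_idx
```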
2.3. Matching of Point Cloud Feature Points
The parameter settings of the feature point matching algorithm are shown in Table 2; all parameters were determined through extensive experiments to ensure the efficiency and stability of the algorithm.
Meanwhile, assume $t_{k+1}$ is the start scanning time of the current point cloud frame, during which the LiDAR scans a single-frame point cloud denoted $P_{k+1}$. Additionally, $\bar{P}_k$ represents the previous point cloud $P_k$ reprojected to time $t_{k+1}$. The recently obtained point cloud $P_{k+1}$ is combined with $\bar{P}_k$ within the next scanning period to estimate the LiDAR's motion. Moreover, the set of edge feature points $E_{k+1}$ and the set of planar feature points $H_{k+1}$ are extracted from $P_{k+1}$ using the previously mentioned technique. It is then determined which edge lines in $\bar{P}_k$ match the points in $E_{k+1}$, and which planar blocks in $\bar{P}_k$ match the points in $H_{k+1}$.
Assume that $T_{k+1} = (t_x, t_y, t_z, \theta_x, \theta_y, \theta_z)^T$ is the six-degree-of-freedom pose transformation of the LiDAR during $[t_{k+1}, t]$, where the translation distances along the X, Y, and Z axes are represented by $t_x$, $t_y$, and $t_z$, and the rotation angles about the X, Y, and Z axes by $\theta_x$, $\theta_y$, and $\theta_z$.
Having been calibrated using the IMU, the LiDAR motion can reasonably be assumed to remain constant over a short period of time. The calibration is performed as described in Section 2.1: the LiDAR is first calibrated against the IMU; then the six-degree-of-freedom motion information of the vehicle provided by the IMU is combined with the LiDAR point cloud to eliminate the point cloud distortion due to nonlinear motion. Before the point cloud data calibration process, all six-degree-of-freedom IMU information for the scan period is already available. Finally, the accumulated displacements and velocities are calculated from the six-degree-of-freedom IMU information to form a complete point cloud that is closer to the real unstructured environment.
For a given point $p_i \in P_{k+1}$ acquired at time $t_i \in [t_{k+1}, t]$, the corresponding pose transformation $T_{(k+1,i)}$ can then be determined by linear interpolation of $T_{k+1}$:

$$T_{(k+1,i)} = \frac{t_i - t_{k+1}}{t - t_{k+1}}\, T_{k+1}$$
Let $E_{k+1}$ and $H_{k+1}$ denote the sets of edge points and planar points extracted from $P_{k+1}$, and let $\bar{E}_{k+1}$ and $\bar{H}_{k+1}$ denote the corresponding sets reprojected to time $t_{k+1}$. Between $E_{k+1}$ and $\bar{E}_{k+1}$, or between $H_{k+1}$ and $\bar{H}_{k+1}$, the conversion relationship is:

$$X_{(k+1,i)} = R\,\bar{X}_{(k+1,i)} + T_{(k+1,i)}(1\!:\!3)$$
In the formula above, $X_{(k+1,i)}$ and $\bar{X}_{(k+1,i)}$ represent the coordinates of the point $p_i$ and of its reprojected counterpart in the LiDAR origin coordinate system, $R$ is the rotation matrix, and $T_{(k+1,i)}(1\!:\!3)$ stands for the first to third rows of $T_{(k+1,i)}$. The rotation matrix is given by the Rodrigues formula:

$$R = e^{\hat{\omega}\theta} = I + \hat{\omega}\sin\theta + \hat{\omega}^2\,(1 - \cos\theta)$$

In the equation above, the rotation angle is denoted by $\theta = \|T_{(k+1,i)}(4\!:\!6)\|$, $\omega = T_{(k+1,i)}(4\!:\!6)/\theta$ is the unit vector of the rotation axis, and $\hat{\omega}$ is its skew-symmetric matrix. It can thus be concluded that the reprojected coordinates are:

$$\bar{X}_{(k+1,i)} = R^{T}\left( X_{(k+1,i)} - T_{(k+1,i)}(1\!:\!3) \right)$$
Next, for every point in $E_{k+1}$ and $H_{k+1}$, the closest neighboring point in $\bar{P}_k$ is found. If $p_j$ and $p_l$ are the points forming the corresponding edge line for the edge point $p_i$ in the point cloud $\bar{P}_k$, then the point-to-line distance is:

$$d_E = \frac{\left| \left( \bar{X}_{(k+1,i)} - \bar{X}_{(k,j)} \right) \times \left( \bar{X}_{(k+1,i)} - \bar{X}_{(k,l)} \right) \right|}{\left| \bar{X}_{(k,j)} - \bar{X}_{(k,l)} \right|}$$

In the formula above, the coordinates of the points $p_i$, $p_j$, and $p_l$ are denoted by $\bar{X}_{(k+1,i)}$, $\bar{X}_{(k,j)}$, and $\bar{X}_{(k,l)}$.
For the planar point $p_i$, if $p_j$, $p_l$, and $p_m$ are the points forming the corresponding planar block in the point cloud $\bar{P}_k$, the point-to-plane distance is:

$$d_H = \frac{\left| \left( \bar{X}_{(k+1,i)} - \bar{X}_{(k,j)} \right) \cdot \left( \left( \bar{X}_{(k,j)} - \bar{X}_{(k,l)} \right) \times \left( \bar{X}_{(k,j)} - \bar{X}_{(k,m)} \right) \right) \right|}{\left| \left( \bar{X}_{(k,j)} - \bar{X}_{(k,l)} \right) \times \left( \bar{X}_{(k,j)} - \bar{X}_{(k,m)} \right) \right|}$$

In the formula above, the coordinates of $p_i$, $p_j$, $p_l$, and $p_m$ are represented by $\bar{X}_{(k+1,i)}$, $\bar{X}_{(k,j)}$, $\bar{X}_{(k,l)}$, and $\bar{X}_{(k,m)}$.
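These two residuals translate directly into code. Below is a small NumPy sketch of the point-to-line and point-to-plane distances, assuming the correspondence points have already been found:

```python
import numpy as np

def edge_distance(x_i, x_j, x_l):
    """Point-to-line distance d_E of edge point x_i to the line (x_j, x_l)."""
    cross = np.cross(x_i - x_j, x_i - x_l)     # area of the spanned parallelogram
    return np.linalg.norm(cross) / np.linalg.norm(x_j - x_l)

def plane_distance(x_i, x_j, x_l, x_m):
    """Point-to-plane distance d_H of planar point x_i to plane (x_j, x_l, x_m)."""
    normal = np.cross(x_j - x_l, x_j - x_m)    # (unnormalized) plane normal
    return abs(np.dot(x_i - x_j, normal)) / np.linalg.norm(normal)
```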
3. Algorithm for 3D Reconstruction
3.1. Point Cloud Feature Matching for 3D Reconstruction
Figure 7 displays the schematic diagram for 3D reconstruction. The LiDAR pose $T^W_k$ produced by the algorithm at time $t_k$ is represented by the blue curve, and the LiDAR motion $T^L_{k+1}$ computed with the previously described laser odometry method is represented by the blue dotted line. After obtaining $T^W_k$ and $T^L_{k+1}$, the calibrated point cloud is projected onto the red area of the map $Q_k$ depicted on the current map; on the map, the calibrated point cloud is shown as $\bar{Q}_{k+1}$.
The detailed procedure is as follows:
First, the calibrated point cloud is converted into the global coordinate system and its features are matched with the local map already created; point clouds beyond the predetermined threshold range are eliminated. Second, based on the smoothness $S$ of the local surface, each point is classified as a planar feature point or an edge feature point. Finally, the matching counterparts of the feature points are located on the created local map $Q_k$: the plane block of the matching plane formed by the three nearest neighboring planar points, or the edge line of the corresponding edge formed by the two nearest neighboring edge points, is determined. In this way, newly contributed point clouds are accurately matched to the locally created map, and at the same time the pose determined by the laser odometry method is slightly refined.
Compared with the laser odometry method, the reconstruction algorithm in this paper operates at a much lower frequency: the laser odometry runs at 10 Hz, whereas the reconstruction process runs at 1 Hz. When $t = t_{k+1}$, LiDAR scanning produces an undistorted point cloud frame $\bar{P}_{k+1}$ and the odometry concurrently produces the motion pose transformation $T^L_{k+1}$ of the LiDAR. At the end of the scanning time $t_{k+1}$, the pose of the LiDAR on the local map is denoted by $T^W_{k+1}$, and the map point cloud accumulated up to that point is denoted by $Q_k$. The point cloud $\bar{P}_{k+1}$, projected onto the global coordinate system, is represented by $\bar{Q}_{k+1}$. Next, the LiDAR pose $T^W_{k+1}$ is used to optimize the match between $\bar{Q}_{k+1}$ and $Q_k$.
The matching relationship between feature points is determined first. To facilitate point retrieval, the point cloud $Q_k$ is stored in cubic regions with a predetermined edge length; the points that correspond to the feature points of $\bar{Q}_{k+1}$ are then extracted and stored in an octree. Let $S'$ be the set of surrounding points in $Q_k$ within the region around a feature point of $\bar{Q}_{k+1}$, and let $M$ be the covariance matrix of $S'$, with eigenvalues $V$ and eigenvectors $E$. If the feature point lies on a planar block, $M$ has two larger eigenvalues and one smaller one, and the eigenvector corresponding to the minimal eigenvalue gives the direction (normal) of the planar block. If the feature point lies on an edge, $M$ has one larger and two smaller eigenvalues, and the eigenvector corresponding to the largest eigenvalue gives the orientation of the edge line. Three points are then chosen on the plane block, or two points on the edge line, the fit is refined using the L-M (Levenberg-Marquardt) method, and $\bar{Q}_{k+1}$ is added to the locally created map. This process is performed in order to further calculate the distance between feature points and their corresponding points.
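A compact sketch of this eigenvalue test is given below; the eigenvalue-ratio threshold is an illustrative assumption:

```python
import numpy as np

def classify_neighborhood(neighbors, ratio=3.0):
    """Classify a feature point's map neighborhood S' via the eigenvalues of
    its covariance matrix M (plane: two large + one small eigenvalue;
    edge: one large + two small). Returns a label and a direction vector."""
    M = np.cov(neighbors.T)                 # 3x3 covariance of S'
    vals, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    if vals[1] > ratio * vals[0]:           # two large, one small -> plane
        return "plane", vecs[:, 0]          # smallest-eigenvalue vector: normal
    if vals[2] > ratio * vals[1]:           # one large, two small -> edge
        return "edge", vecs[:, 2]           # largest-eigenvalue vector: edge direction
    return "ambiguous", None
```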
3.2. An Algorithm for Modeling Optimization Derived from Graph Optimization Theory
The accuracy of reconstruction cannot be guaranteed when relying only on data association matching between two point clouds, as this easily leads to incorrect matches and error accumulation [23]. Graph optimization can resolve drift and erroneous matching while further reducing cumulative error, increasing the algorithm's adaptability [24]. To achieve complete motion trajectory estimation of the robot while preserving global consistency in map construction, the algorithm relates the historical pose states of the robot to the current observations and forms correction constraints between the current pose and the historical poses. A back-end graph optimization approach is therefore presented in this paper with the ultimate goal of improving the local map creation results.
Figure 8 shows the process of graph optimization.
It can be broken down into the following steps:
3.2.1. Converting the Real-Time Reconstruction Problem into a Least Squares Problem
The fundamental task of real-time reconstruction is to solve the maximum a posteriori estimate of the positions of the feature points on the map and the poses of the moving robot, given the observations $Z$ and the motion state data produced by the LiDAR odometry method, as indicated by the following equation:

$$X^* = \arg\max_X\, P(X \mid Z, U)$$

In the formula above, $X$ stands for the estimated pose information of the moving robot and the position information of the feature points on the map; the motion control input and the correction constraint result are included in the parameter $U$. To solve the joint probability distribution on the right side of the equation above, it is decomposed as follows:

$$P(X \mid Z, U) \propto P(Z \mid X)\, P(X \mid U)$$
Taking the logarithm of both sides of the preceding equation decomposes the posterior probability estimation into closed-loop constraints and motion information model constraints:

$$X^* = \arg\min_X \left( \sum_i \left\| z_i - h_i(X) \right\|^2_{\Sigma_i} + \sum_j \left\| x_j - f\!\left( x_{j-1}, u_j \right) \right\|^2_{\Lambda_j} \right)$$

At this stage, a least squares optimization problem is solved in place of the real-time reconstruction.
3.2.2. Fundamental Structure Derived from Graph Optimization Theory
Based on graph optimization theory, real-time reconstruction is modeled with a graph model. The algorithm uses the observed data to estimate the moving robot's whole motion trajectory and produces globally consistent mapping results. In the graph model, each node represents the system state of the moving robot and the environment at a particular moment in time, and the edges stand for the constraint relationships.

Graph optimization theory separates the real-time reconstruction problem into two parts: the front-end, which creates graph models from the observations and system constraints, and the back-end, which further optimizes the graph model created by the front-end. Front-end data association concerns the estimation of the robot's relative pose and the inter-frame matching of neighboring data; from the standpoint of data processing, its main goal is to solve the data correlation problem. The back-end graph optimization takes global data correlation into account without directly processing the observed data; from the standpoint of the optimization process, it relies on the observed data to determine the optimal relationship between local graphs and to complete the construction of the graph.
Because of errors and noise in the observations, the graph created by the front-end will not match the global graph. In the ideal case, the following criterion must be satisfied for a fully closed loop:

$$T_{1,2}\, T_{2,3} \cdots T_{n,1} = I$$

In the formula above, the identity matrix is denoted by $I$, and the relative transformation matrix between observed data frames is denoted by $T_{i,j}$.
3.2.3. Building a Dynamic Bayesian Network-Based Graph Model
Dynamic Bayesian networks are currently widely used in research domains like pattern recognition, computer vision, and signal processing. The time-varying random variables and their interrelation can be expressed simply by using dynamic Bayesian networks. Typically, the graph model represented by dynamic Bayesian networks consists of two kinds of nodes: implicit nodes that correspond to unobservable parameter variables and explicit nodes that match observable parameter variables.
Figure 9 depicts the graph model structure for real-time reconstruction expressed as a dynamic Bayesian network. White nodes stand for the observable explicit parameter variables, for instance the observation output information $z_i$ ($i = 1, \dots, n+1$) of the feature points and the control input information $u_i$ ($i = 1, \dots, n$) of the mobile robot. Gray nodes stand for the unobservable implicit parameter variables, such as the pose information $x_i$ ($i = 1, \dots, n+1$) of the mobile robot and the position information $m_i$ ($i = 1, \dots, n+1$) of the feature points. Edges in the graph describe the conditional dependency relationships between the nodes.
In this paper, the constraint equation nodes built for the optimization structure represent the moving robot's current pose, while the edges represent the point cloud matching result parameters and the pose state information. A covariance matrix is established to form an objective function of the error, and the nonlinear least squares approach is used to solve for the optimization result that minimizes the error.
Figure 10 displays the structure of the established graph optimization model, in which $m_i$ stands for the local maps and $x_i$ for the LiDAR pose parameter data.
By incorporating node operators and edge error functions, parameter analysis is carried out under the G2O framework to address the local and global graph optimization problems. The objective function is:

$$\min_x F(x) = \sum_{(i,j) \in C} e\!\left( x_i, x_j, z_{ij} \right)^T \Omega_{ij}\; e\!\left( x_i, x_j, z_{ij} \right)$$

In the formula above, $C$ represents the set of constrained node pairs over the point cloud, $x_i$ and $x_j$ indicate neighboring nodes, each comprising the point cloud's coordinate values and the rotation angle of each axis, $\Omega_{ij}$ indicates the (inverse) covariance matrix, $z_{ij}$ indicates the edge of the observation constraint, and $e(\cdot)$ indicates the error of the edge that has to be estimated.
In the formula above, an unbiased estimate of the error variance scales the correlation matrix $\Omega$; the error at each corresponding location is denoted by $e_{ij}$, and the vector concatenated from the $e_{ij}$ is represented by $e$. The discrepancy between the point cloud's actual position and its predicted optimal position is denoted by $\Delta x$, and the computed decomposition (Jacobian) matrix by $J$; the decomposition matrix is used to determine the covariance of the point cloud and is concatenated from the blocks $J_{ij}$.
$e$ is further linearized in the form of the correlation matrix:

$$e(x + \Delta x) \approx e(x) + J\,\Delta x$$

where $e$ and $J$ are, respectively, composed of $e_{ij}$ and $J_{ij}$ in series. The objective function is then expressed as:

$$F(x + \Delta x) \approx c + 2\,b^T \Delta x + \Delta x^T H\, \Delta x$$
In the formula above, the block array $H$ is made up of the blocks $J_{ij}^T \Omega_{ij} J_{ij}$, and the block array $b$ is made up of the blocks $J_{ij}^T \Omega_{ij} e_{ij}$ connected in series.
Supposing $H = J^T \Omega J$ and $b = J^T \Omega e$, the sub-items $H_{ij}$ and $b_{ij}$ of $H$ and $b$ can be expressed as:

$$H_{ij} = J_{ij}^T\, \Omega_{ij}\, J_{ij}, \qquad b_{ij} = J_{ij}^T\, \Omega_{ij}\, e_{ij}$$
Lastly, the optimization problem is defined as the system of equations $H\,\Delta x = -b$, which is solved to obtain the optimal pose information:

$$x^* = x + \Delta x$$

In the formula above, the current pose is denoted by $x$, the difference between the current and estimated poses is represented by $\Delta x$, and the optimized pose is denoted by $x^*$.
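To make the Gauss-Newton structure above concrete, here is a self-contained toy sketch that builds $H$ and $b$ from relative-pose edges and solves $H\,\Delta x = -b$; for brevity the poses are 2D translations only, and anchoring the first pose is an illustrative gauge-fixing choice (a real implementation would optimize the full 6-DOF state under the G2O framework):

```python
import numpy as np

def optimize_pose_graph(x, edges, iters=10):
    """Gauss-Newton over a toy 2D pose graph.

    x:     (N, 2) initial pose estimates (translations only)
    edges: list of (i, j, z_ij, Omega_ij), where z_ij is the measured
           offset x_j - x_i and Omega_ij its 2x2 information matrix.
    """
    N = x.shape[0]
    for _ in range(iters):
        H = np.zeros((2 * N, 2 * N))
        b = np.zeros(2 * N)
        for i, j, z, omega in edges:
            e = (x[j] - x[i]) - z                 # edge error e_ij
            A, B = -np.eye(2), np.eye(2)          # Jacobians w.r.t. x_i, x_j
            # Accumulate sub-blocks H_ij = J^T Omega J and b_ij = J^T Omega e.
            H[2*i:2*i+2, 2*i:2*i+2] += A.T @ omega @ A
            H[2*i:2*i+2, 2*j:2*j+2] += A.T @ omega @ B
            H[2*j:2*j+2, 2*i:2*i+2] += B.T @ omega @ A
            H[2*j:2*j+2, 2*j:2*j+2] += B.T @ omega @ B
            b[2*i:2*i+2] += A.T @ omega @ e
            b[2*j:2*j+2] += B.T @ omega @ e
        H[0:2, 0:2] += np.eye(2)                  # gauge fix: anchor first pose
        dx = np.linalg.solve(H, -b)               # solve H dx = -b
        x = x + dx.reshape(N, 2)                  # x* = x + dx
    return x
```

For a linear toy problem like this the loop converges in a single iteration; nonlinear SE(3) edges require recomputing the Jacobians on every pass.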
While the local maps are being constructed, the algorithm calculates a fast point feature histogram (FPFH) for each frame of the point cloud. In this paper, loop closure detection is executed with this histogram to improve the efficiency of the algorithm. The new constraints gained from loop closure detection are added into the graph as new edges, which effectively reduces the cumulative error during the robot's movement. As shown in Figure 10, the robot travels sequentially from P0 to P8 and detects a marked similarity with P0 at point P8. Therefore, a P0–P8 pose constraint is established and added into the graph as a new edge. Finally, the local map poses are optimized according to the graph optimization framework, and the local maps are stitched together to form a global map.
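As an illustration of the histogram-based similarity check, the following Open3D sketch computes a per-frame FPFH descriptor and flags loop closure candidates; the voxel size, search radii, and similarity threshold are illustrative assumptions:

```python
import numpy as np
import open3d as o3d

def frame_descriptor(pcd, voxel=0.2):
    """Per-frame global descriptor: the averaged FPFH histogram of the frame."""
    down = pcd.voxel_down_sample(voxel)
    down.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 10, max_nn=100))
    return np.asarray(fpfh.data).mean(axis=1)      # 33-bin mean histogram

def is_loop_candidate(desc_a, desc_b, thresh=0.05):
    """Flag a loop closure candidate (e.g., P8 vs. P0) when descriptors match."""
    return np.linalg.norm(desc_a - desc_b) < thresh * np.linalg.norm(desc_a)
```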