3.1. System Overview
The overview of the proposed framework is depicted in 
Figure 1. The system adopts an optimization-based, tightly coupled framework for LiDAR and IMU integration. To accommodate various LiDAR scanning modes, the point cloud feature extraction module is omitted in favor of direct raw point cloud matching. Initially, the system receives time-synchronized LiDAR point clouds and IMU measurement data. IMU preintegration is then employed for motion compensation of the point cloud. The deskewed point cloud undergoes weighted scan-to-map registration. Based on the locally fitted plane normal vectors during registration, the point cloud is categorized into horizontal plane points, vertical plane points, and general points. The registration residuals of horizontal and vertical plane points are used to compute different components of the 6-DoF pose, termed the first optimization. The optimization results from the first step serve as initial values for the joint optimization of the registration residuals of general points and the IMU preintegration residuals, which include gravity constraints. Compared to predictions from a constant velocity model or an IMU motion model, the optimized results are closer to the true values, facilitating rapid convergence in the second optimization. The estimated odometry from the second optimization updates the local map with the current scan. The updated map is then used for matching the next scan. The following sections provide a detailed exposition of the proposed system.
Before delving into specifics, we establish the notation and frames utilized throughout this paper. We define the IMU frame as the body frame, denoted as . The first IMU frame is considered the world frame, denoted as .  represents the gravity vector in the world frame. We assume a rigid link between the IMU and the LiDAR.
  3.2. IMU Preintegration
The measurements of the IMU are modeled as
        
        where 
 and 
 are the raw IMU measurements in the body frame at time 
t. The IMU measurements are affected by bias 
 and additive white noise 
 and 
. 
 is the rotation matrix from the world frame to the body frame. Assuming that the position, velocity, rotation, and IMU measurements of the platform at time 
t are known, and the platform keeps a constant velocity for a brief period of time. Then, we can predict the position, velocity, and orientation of the platform at time 
:
        where 
 is the time step from time 
t to time 
, and 
, 
, and 
 represent displacement, velocity, and rotation matrix from the body frame to the world frame, respectively. To avoid recalculating the above equation when the linearization point changes at time 
t, the preintegration method proposed in [
11] is introduced, that is, to calculate the relative motion from time 
t to time 
,
        
The incorporation of preintegration not only enhances computational efficiency but also establishes a constraint for sensor fusion, as elaborated in 
Section 3.4. To delve into the specific derivation of preintegration, readers are encouraged to refer to [
11] due to space limitations.
  3.3. Weighted Point Cloud Registration and Classification
The LiDAR point cloud is acquired through continuous scanning, where the movement of the platform during scanning can introduce distortions to the point cloud. As depicted in 
Figure 2, the LiDAR continuously moves from the start to the end of the scanning process, represented as 
. Points originally situated on the same plane on the object (depicted in black) may appear as non-coplanar points (depicted in blue). To achieve precise results in point cloud registration, we utilize IMU measurements to generate motion corrections. However, due to the different sampling frequencies of the sensors, a certain number of LiDAR points may be distributed between two consecutive IMU measurements. Let 
 and 
 denote the start and end times of a LiDAR scan, respectively. For a point 
 measured at time 
m, we identify the two closest IMU measurements in time: 
 and 
. Consequently, the platform’s position at time 
m can be predicted:
        where 
 represents the time interval between 
 and 
. To ensure consistency in the timing of the point cloud within the same scan, all scattered points are projected to the end time of the scan:
        where 
 denotes the extrinsic transformation between the LiDAR and IMU, 
 represents the platform position at the end of the scan in the world frame, and 
 signifies the transformation from the body frame at time 
m to the world frame. The corner mark 
 denotes the body frame at the end of the scan. The undistorted point 
 is subsequently utilized for scan-to-map registration.
We focus on the development of a LiDAR-inertial odometry and mapping system tailored for various LiDAR types. To cater to diverse scan patterns, we employ direct matching on raw points. This method not only leverages intricate environmental features to boost accuracy but also simplifies the process by eliminating manual feature extraction, thereby reducing preprocessing time for LiDAR point clouds. As illustrated in 
Figure 3, the current point is initially represented in the world frame. Subsequently, the system searches for the five closest points to the current point within the local map. A plane is then fitted through these five points. If the distance between each point involved in plane fitting and the fitted plane falls below the pre-set threshold, the current point is selected as the point to be matched and is subsequently used for odometry estimation. Let 
 denote the coefficients of the fitted plane. The distance from point 
 to the fitted plane is calculated as follows:
Traditional point-to-plane matching, as employed by FAST-LIO2 [
17] and Point-LIO [
19], focuses solely on optimizing geometric distances without considering the local geometric distribution of individual feature points. In practical applications, the accuracy of plane fitting is crucial for point cloud matching. Generally, the smaller the average distance from the points involved in plane fitting to the fitted plane, the more accurately the fitted plane can represent the local geometric distribution. Therefore, we introduce a weighting function to further balance the matching process. This weighting function is designed to adjust the importance of each feature point in the matching process based on the distance from the point to the plane, thereby better reflecting the characteristics of the local geometric distribution. To reduce computational costs, we reuse information from previously fitted planes to determine the weighting function. Specifically, for the mean distance 
 from the points involved in plane fitting to the plane, the weight 
W of the fitted plane can be defined as
        
        where 
 is the point cloud at the 
th scan.
As a LiDAR scan typically consists of thousands of points, and with real-time requirements in mind, we commence by downsampling the raw point cloud. Subsequently, we traverse the downsampled points to perform plane fitting. Concurrently, we utilize an ikd-Tree [
18] to manage the map and conduct efficient nearest neighbor searches. The ikd-Tree proves to be a highly effective data structure, allowing for incremental updates and dynamic data balancing. For more in-depth information, readers are encouraged to consult [
18].
Inspired by LeGO-LOAM [
5], we categorize the fitted planes into three classes based on their normal vectors. Specifically, through the fitting of the plane’s normal vector 
, if the angle between 
 and the normal vector of the horizontal plane 
 is approximately 0°, the current point is labeled as a candidate horizontal plane point. Similarly, if this angle is close to 90°, the current point is identified as a candidate vertical plane point. Otherwise, the current point is categorized as a general point. The registration residuals of points labeled as horizontal planes are employed in solving the 
 part of the 6-DoF pose, while the residuals of points labeled as vertical planes contribute to solving the 
 components of the 6-DoF pose. A detailed explanation is provided in 
Section 3.4. It is worth mentioning, LeGO-LOAM considers only ground horizontal constraints and requires additional preprocessing time for ground segmentation. Our method leverages normal vectors during the point cloud registration process for classification, eliminating extra processing time and extending classification beyond just ground points. This approach not only saves computational costs but also enhances the adaptability and robustness of the system across various types of environments.
While we divide points into vertical, horizontal, and general categories, it is important to clarify that our method is designed to adapt to various environmental conditions, including those lacking distinct vertical or horizontal features. In such scenarios, the majority of points may indeed be classified as general points. However, the subsequent point cloud matching and optimization calculations remain effective.
  3.4. Odometry Estimation
Drawing inspiration from visual-inertial odometry [
20,
21], we adopt a tightly coupled, optimization-based approach to construct our proposed LiDAR-inertial odometry system. While previous optimization-based LiDAR-inertial odometry methods [
6,
10,
16] have been explored, few address the gravity vector as an estimated state. Typically, these methods align the gravity vector with the world frame upon system initialization and consider it as a known parameter thereafter. However, it is important to note that the gravity estimation made during initialization may drift over time. To address this issue, we include the gravity vector as one of the states to be optimized in this study. The complete state vector is defined as follows:
        where 
 represents the IMU state at the 
 LiDAR scan end time, encompassing the orientation, position, and velocity of the IMU in the world frame, along with IMU bias in the body frame. The parameter 
n denotes the total number of LiDAR scans.
We design a two-step optimization approach to achieve the maximum a posteriori (MAP) estimate of the system state by solving two nonlinear least squares problems.
  3.4.1. First Step: Optimization
The first-step optimization involves computing the initial 6-DoF pose values based on the classification results of scanned points. In this phase, we select scanned points labeled as horizontal planes and vertical planes, with the quantities denoted by 
v and 
h, respectively. Given that horizontal plane points constrain only 
, and vertical plane points constrain only 
, during optimization, the unconstrained parameters can be fixed to enhance computational efficiency. The optimization aims to minimize the following cost function based on the registration residuals of the scanned points:
          where 
 represents the six DoFs of the pose expressed in Euler angles for rotation components. 
 denotes the squared norm, but fixing some parameters during optimization, where a subset of parameters is held constant. 
 and 
 correspond to the sum of distances from each point in the current scan to the planes fitted in the local map during the registration of points labeled as vertical and horizontal planes, respectively. The Jacobian matrix of the point-to-plane residual is defined by
          
          where
          
The derivation of  is evident. Owing to space constraints, the specific form is not provided in this context.
The pose optimization in this step is initialized using IMU preintegration. In contrast to a constant velocity motion model, this method enables a more accurate prediction of platform motion. Consequently, the initial values for optimization are closer to the optimal solution.
  3.4.2. Second Step: Joint Optimization
The aim of the second optimization is to minimize the residual from IMU preintegration and the registration residual of points labeled as general points in the LiDAR scan. The associated cost function is formulated as follows:
          where 
 and 
 denote the weighted registration residuals of the LiDAR scan and the preintegration residuals of the IMU with the gravity constraint, respectively. 
 is the set of the current scan labeled as a general point. 
 and 
 represent the covariance matrices of the LiDAR and IMU residual terms. The initial values of parameters 
 and 
 in this optimization step are dictated by the outcomes of the first optimization step, given by 
:
          where 
 is the function to convert Euler angles into a rotation matrix.
The LiDAR residuals offer geometric constraints for pose optimization of the current LiDAR frame. For each point 
 in the current scan, the residual is defined as the distance from it to its corresponding fitted plane 
 in the local map:
          where 
 is defined in Equation (
7).
The IMU preintegration provides a relative motion constraint for two consecutive frames of LiDAR scans. To optimize the gravity vector, we extend gravity variation to the IMU measurement residual:
          where 
, 
, and 
 represent the IMU preintegration as defined in Equation (
3). For a detailed derivation of the Jacobian of each residual term of IMU preintegration, we refer readers to [
11]. This paper primarily focuses on optimizing and updating the gravity vector. Despite being a three-dimensional vector, gravity exhibits only two degrees of freedom owing to its known magnitude, meaning it lies on a unit sphere. Simply optimizing the gravity vector as a three-dimensional entity necessitates normalization after each optimization, often leading to unstable system outcomes. Hence, we propose optimizing gravity on the mathematical 
-space [
22], offering the advantage of a closed optimization operation. This ensures that the optimized gravity consistently resides on a sphere without requiring redundant normalization procedures. Let 
 and 
 denote the variables on the manifold 
 and the corresponding tangent plane, respectively. Moreover, let 
 denote the local map: 
. Then, we have
          
          where
          
Then, we can derive the Jacobian matrix of the gravity vector as follows:
          where 
 represents the skew matrix of a 3D vector, and 
 corresponds to the form of 
, as described in Equation (
17).