A Novel LiDAR–IMU–Odometer Coupling Framework for Two-Wheeled Inverted Pendulum (TWIP) Robot Localization and Mapping with Nonholonomic Constraint Factors

This paper proposes a method to solve the localization and mapping problem of a two-wheeled inverted pendulum (TWIP) robot on approximately flat ground using a lidar–IMU–odometer system. When TWIP is in motion, it is constrained by the ground and suffers from motion disturbances caused by rough terrain or shaking. Drawing on the motion characteristics of TWIP, this paper proposes a localization framework built on a lidar–IMU–odometer system. The framework formulates a factor graph with five types of factors, thereby coupling relative and absolute measurements from the different sensors (including ground constraints) into the system. Moreover, we analyze the constraint dimension of each factor according to the motion characteristics of TWIP and propose a new nonholonomic constraint factor for the odometry pre-integration constraint and the ground constraint, allowing them to be added naturally to a factor graph whose robot state nodes live on SE(3). We also calculate the uncertainty of each constraint. Utilizing the nonholonomic constraint factor, a complete lidar–IMU–odometry-based motion estimation system for TWIP is developed via smoothing and mapping. Indoor and outdoor experiments show that our method achieves better accuracy for two-wheeled inverted pendulum robots than existing lidar SLAM methods.


Introduction
In recent years, simultaneous localization and mapping (SLAM) for intelligent mobile robots has become a research hotspot [1,2]. In GPS-denied and unknown environments, mobile robots must be able to utilize onboard sensors to construct an environment map [3] and then rely on this map for navigation, collision avoidance, and path planning. Many solutions have been proposed based on different sensing methods, mainly divided into vision-based [4][5][6], lidar-based [7][8][9], and combinations of the two [10,11]. Meanwhile, in order to cope with complex environments, sensors such as IMUs and encoders that are less affected by the environment can be used to assist visual [12,13] or lidar [14][15][16][17][18] localization. Although vision-based methods are particularly suitable for place recognition, their sensitivity to initialization, illumination, and range, as well as their high depth-computation cost, makes them unreliable for supporting autonomous navigation systems.
On the other hand, lidar can directly obtain depth information about the surrounding environment and is immune to illumination changes. Many impressive lidar-based localization and mapping algorithms have been proposed. In these algorithms, the IMU, encoder, and other high-frequency sensors are usually used to assist lidar localization and to de-skew the point cloud. In [7], the authors present lidar odometry and mapping (LOAM) for low-drift, real-time state estimation and mapping. In this algorithm, the IMU is only used to de-skew the point cloud and does not participate in the optimization. The main contributions of this paper are as follows:
• A new nonholonomic factor is proposed to introduce encoder pre-integration constraints and ground constraints, which are naturally coupled into the factor graph optimization of the system.
• Indoor and outdoor experiments demonstrate the improved localization accuracy and robustness of the proposed lidar–IMU–odometry coupling framework on a two-wheeled inverted pendulum robot navigating on an approximately flat surface.
The rest of the paper is organized as follows: Section 2 provides an overview of the entire methodology; Section 3 reviews the background knowledge used later in the paper; Section 4 details the five types of factors introduced in our algorithm; and Section 5 describes several experiments we conducted to demonstrate the effectiveness of the proposed method.

Methodology Overview
This paper aims to estimate the motion state of a two-wheeled inverted pendulum (TWIP) robot via lidar-IMU-odometry coupling. Compared with a six-degrees-of-freedom robot, TWIP cannot move sideways and is constrained by the ground. Compared with autonomous ground vehicles, TWIP swings back and forth during movement to maintain its balance. If we directly parameterize the robot pose on SE(3) without considering the ground constraints, the accuracy and robustness of state estimation are degraded due to the fewer constraints in the vertical direction of the lidar. If only the ground constraints are considered, that is, a deterministic constraint, the performance of motion estimation is degraded because the six dimensions of the pose are highly coupled with the sensor observations. Therefore, neither an exact planar pose parameterization that enforces the ground constraints nor a full 3D pose parameterization can satisfactorily represent the pose of the ground robot: the former cannot represent out-of-plane motion, while the latter does not exploit the planar motion constraints. This prompted us to study the state estimation of the two-wheeled inverted pendulum robot through a lidar-IMU-odometry system. This paper proposes a localization framework consisting of a lidar-IMU-odometer system; its flow chart is shown in Figure 2. When the system receives a lidar scan, it first uses the IMU measurements to de-skew the point cloud, then extracts the line and surface feature points. If the pose change exceeds a pre-set threshold, the current frame is selected as a keyframe; otherwise, it is discarded. Measurements from the IMU provide the initial pose. The keyframes are then matched to the global map to obtain a rough global pose and construct the lidar pose factor. The IMU and odometer measurements between two adjacent lidar keyframes are used to construct constraint factors via pre-integration techniques.
Meanwhile, we use a random constraint model to construct the ground constraint factor. Finally, all factors are included in the factor graph and jointly optimized to obtain the optimal robot pose. The factor graph formulated for TWIP, shown in Figure 3, incorporates the measurement data of each sensor as a factor in our optimization system. In addition to the above four factors, this factor graph contains loop closure factors. These five factors are grouped into two categories. The IMU pre-integration factor, the lidar pose factor, and the loop closure factor constrain every dimension of the robot pose, and are named holonomic constraint factors. The odometry pre-integration factor and the ground constraint factor constrain only part of the dimensions of the robot's pose, and are thus named nonholonomic constraint factors. For these we propose a nonholonomic constraint factor formulation that adds them to the system, inspired by the SE(2)-constrained SE(3) parameterization method used in [24]. Utilizing this constraint factor, a complete lidar-IMU-odometry-based motion estimation system for TWIP is developed via smoothing and mapping.

Preliminaries
This paper formulates the state estimation problem of the lidar-IMU-odometry system in terms of a factor graph, that is, a nonlinear optimization problem involving quantities that live on smooth manifolds. Before delving into the details, we review several related practical geometric concepts.

On-Manifold Pose Parameterization
We usually use a matrix Lie group to describe three-dimensional rotation, defined as follows:

\mathrm{SO}(3) = \{ R \in \mathbb{R}^{3 \times 3} \mid R R^T = I, \det(R) = 1 \}.

However, composing rotations in SO(3) involves matrix multiplication during the optimisation process, which cannot be manipulated directly the way Euclidean vectors can. Because Lie groups have the properties of differentiable manifolds, the tangent space to the manifold (at the identity), called the Lie algebra, can be introduced as a minimal vector representation; it is related to the Lie group through the matrix exponential and logarithm. The Lie algebra is defined as follows:

\mathfrak{so}(3) = \{ \varphi^{\wedge} \in \mathbb{R}^{3 \times 3} \mid \varphi \in \mathbb{R}^3 \},

where φ is a rotation vector in R^3 corresponding to a 3 × 3 skew-symmetric matrix. The hat operator (·)^∧ maps a vector to its skew-symmetric matrix:

\varphi^{\wedge} = \begin{bmatrix} 0 & -\varphi_3 & \varphi_2 \\ \varphi_3 & 0 & -\varphi_1 \\ -\varphi_2 & \varphi_1 & 0 \end{bmatrix}.

At the same time, we define the inverse mapping of the hat operator, (·)^∨, such that (φ^∧)^∨ = φ. For any two 3D vectors r and s and a rotation matrix R, the skew-symmetric matrix has the following properties:

r^{\wedge} s = -s^{\wedge} r, \qquad (R r)^{\wedge} = R r^{\wedge} R^T.

The Lie algebra can be mapped to the Lie group by the exponential map (Rodrigues formula):

\exp(\varphi^{\wedge}) = I + \frac{\sin \|\varphi\|}{\|\varphi\|} \varphi^{\wedge} + \frac{1 - \cos \|\varphi\|}{\|\varphi\|^2} (\varphi^{\wedge})^2.

A first-order approximation of the exponential map used later on is

\exp(\varphi^{\wedge}) \approx I + \varphi^{\wedge}.

In the optimization process, we often apply a left-multiplicative perturbation on the manifold through the left Jacobian matrix J_l, approximating a local on-manifold transformation with a Lie algebra increment:

\exp\big( (\varphi + \delta\varphi)^{\wedge} \big) \approx \exp\big( (J_l \, \delta\varphi)^{\wedge} \big) \exp(\varphi^{\wedge}).

Another valuable property of the exponential map is

R \exp(\varphi^{\wedge}) R^T = \exp\big( (R \varphi)^{\wedge} \big).
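As a concrete illustration, the hat operator and the Rodrigues exponential map above can be sketched numerically. This is a minimal example using NumPy; the function names are ours, not from the paper:

```python
import numpy as np

def hat(phi):
    """Map a 3-vector phi to its skew-symmetric matrix (the ^ operator)."""
    x, y, z = phi
    return np.array([[0.0,  -z,   y],
                     [  z, 0.0,  -x],
                     [ -y,   x, 0.0]])

def so3_exp(phi):
    """Exponential map so(3) -> SO(3) via the Rodrigues formula."""
    phi = np.asarray(phi, dtype=float)
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        # First-order approximation near the identity: exp(phi^) ~ I + phi^
        return np.eye(3) + hat(phi)
    a = phi / theta  # unit rotation axis
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * np.outer(a, a)
            + np.sin(theta) * hat(a))
```

The identities listed above, such as r^∧ s = −s^∧ r and R exp(φ^∧) Rᵀ = exp((Rφ)^∧), can be checked numerically with these two helpers.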

Factor Graph Optimization
As TWIP moves, lidar measurements are continuously collected to estimate the robot's motion state and model the surrounding environment. This process can be described by a factor graph, where the robot states to be estimated are state nodes. The continuously collected measurements are introduced as factor nodes connected by edges to the involved state nodes (see Figure 3). In general, we assume that these measurements are independent of each other and affected by Gaussian noise. The robot state estimation problem represented by the factor graph can then be formulated as maximum a posteriori estimation, that is, as the following nonlinear least-squares problem:

\hat{x} = \arg\min_{x} \sum_{i} \| r_i(x) \|^2_{\Sigma_i},

where r_i is the zero-mean residual associated with measurement i, x is the parameter vector to be optimized, Σ_i is the measurement covariance, and ‖r_i‖²_{Σ_i} = r_iᵀ Σ_i⁻¹ r_i is the squared Mahalanobis distance. The above least-squares problem can be solved iteratively using the Gauss–Newton method. At each GN iteration, we approximate the objective function (the residual function) by a first-order Taylor expansion at the current estimated state \hat{x}^-:

r_i(\hat{x}^- \oplus \Delta x) \approx r_i(\hat{x}^-) + H_i \Delta x,

where H_i = ∂r_i/∂x is the Jacobian matrix of the i-th residual with respect to the robot state and Δx is the increment of the state obtained at each iteration. We define the generalized update operation, ⊕, which maps a change in the error state to one in the entire state. After solving the linearized normal equations, the current state is updated as \hat{x}^+ = \hat{x}^- \oplus \Delta x. This linearization process is repeated until convergence.
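To make the iteration concrete, here is a minimal Gauss–Newton sketch on a vector-space state (so ⊕ reduces to plain addition). The function names and the line-fitting example are illustrative, not from the paper:

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=20, tol=1e-10):
    """Minimize sum ||r(x)||^2 by repeated linearization at the current estimate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)                          # stacked residual vector
        H = jacobian(x)                          # stacked Jacobian dr/dx
        dx = np.linalg.solve(H.T @ H, -H.T @ r)  # normal equations
        x = x + dx                               # generalized update (plain + here)
        if np.linalg.norm(dx) < tol:
            break
    return x

# Example: fit y = a*t + b to noiseless samples of y = 2t + 1.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * t + 1.0
res = lambda x: x[0] * t + x[1] - y
jac = lambda x: np.column_stack([t, np.ones_like(t)])
a_b = gauss_newton(res, jac, [0.0, 0.0])
```

Because this toy problem is linear, a single GN step already reaches the exact least-squares solution; in the SLAM system, each residual is nonlinear and the linearization must be repeated.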

Pose Parameterization of TWIP
Compared with a six-degrees-of-freedom robot, TWIP cannot move sideways and is constrained by the ground. Compared with autonomous vehicles on the ground, TWIP swings back and forth during movement to maintain the balance of movement. Therefore, we cannot characterize its motion state completely in SE(2) or SE(3) space. A simplified diagram of this movement is shown in Figure 4. In order to analyze the movement of TWIP more conveniently, its coordinate system is defined as follows: Lidar coordinate system (L): The lidar frame originates at the lidar center. The y-axis points to the left, the z-axis points upward, and the x-axis points forward, coinciding with the positive direction of TWIP.
IMU coordinate system (I): The IMU is located below the lidar; its frame originates at the IMU measurement center. The x-, y-, and z-axes are parallel to those of (L) and point in the same directions. Robot body coordinate system (B): As shown in Figure 4, the robot body frame is attached to the robot body during the movement process. Its origin is at the midpoint of the line connecting the centers of the two wheels. The x-, y-, and z-axes are parallel to those of (L) and point in the same directions.
World coordinate system (W): The world frame is the gravity-aligned coordinate system, which is initially coincident with the robot body coordinate system (B).
All of the above frames follow the right-hand rule. We calibrate the transformation between the lidar frame and the robot body frame and between the IMU frame and the robot body frame in advance. The transformation between the robot body coordinate system and the world coordinate system is the motion state of the robot, that is, the target of optimization. Figure 5 shows the attitude of TWIP obtained by the Attitude and Heading Reference System (AHRS) during movement. As seen from the figure, when TWIP moves there is motion in the two-dimensional plane (yaw) and front-and-back swinging (pitch), as well as motion disturbances in the roll direction. Therefore, we parameterize the robot's pose on SE(3). The motion state of the robot corresponding to the lidar keyframe F_i at time i is T^W_{B_i} = (R^W_{B_i}, P^W_{B_i}) ∈ SE(3), where R^W_{B_i} is the robot's rotation and P^W_{B_i} is its position. Because the rotation matrix is not additive, it is not convenient for iterative optimization; thus, we use Euler angles and Lie algebra to represent the robot's rotation in the following derivation. Euler angles easily distinguish the constraint dimension of each factor, and the pitch does not reach ±90 degrees during the movement of TWIP, avoiding any problems with gimbal lock. In this paper, θ = [θ_r, θ_p, θ_y]^T represents the Euler angles, where θ_r, θ_p, and θ_y are the roll, pitch, and yaw angles, respectively. The rotation integral based on Euler angles is only exact up to the first order in robot state estimation [26]. Therefore, we use the Lie algebra, φ = [φ_1, φ_2, φ_3]^T, to represent the rotation in the optimization process; it can be converted to a rotation matrix using Equation (7). According to the motion characteristics of TWIP, [x, y, θ_p, θ_y] is the result of the robot's own motion and is named the primary parameter, while [z, θ_r] is the motion disturbance caused by uneven ground or vibration of the robot and is named the secondary parameter. Part of the notation used in the figure is listed in Table 1.

Symbols | Descriptions
X, Z, O | The x-axis, z-axis, and origin of the world frame
X_B, Z_B, O_B | The x-axis, z-axis, and origin of the robot body frame
X_L, Z_L, O_L | The x-axis, z-axis, and origin of the lidar frame
P_k | The position of the robot in the world frame at time k (in meters)
θ_p,k | The pitch angle of the robot relative to the world frame at time k (in radians)
θ_y,k | The yaw angle of the robot relative to the world frame at time k (in radians)
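As an illustration of the Euler-angle convention discussed above (roll θ_r, pitch θ_p, yaw θ_y), a rotation matrix can be assembled from the three angles as follows. This is a sketch with our own function name, using the common ZYX convention; the paper's exact axis ordering may differ:

```python
import numpy as np

def euler_to_matrix(roll, pitch, yaw):
    """R = Rz(yaw) @ Ry(pitch) @ Rx(roll), the common ZYX convention."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx
```

With this convention, gimbal lock occurs only at pitch = ±90°, which the TWIP never reaches, as noted above.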

Coupling of Lidar-IMU-Odometer via Factor Graph
In this section, we introduce five types of factors to construct the factor graph used to estimate the motion state of TWIP. The robot's state at a specific time is represented by a node of the graph. According to how the factors constrain the nodes, they are divided into holonomic and nonholonomic constraints. The holonomic constraint factors include the lidar pose factor, the IMU pre-integration factor, and the loop closure factor. The nonholonomic constraint factors consist of the odometer pre-integration and ground constraints. A new robot state node is added to the graph when the change in robot pose exceeds a user-defined threshold. The factor graph is optimized upon the insertion of a new node using incremental smoothing and mapping with the Bayes tree (iSAM2) [17]. The process for generating these factors is described in the following sections.

Lidar Pose Factor
A mechanical 3D lidar senses its surroundings by rotating a vertically aligned array of laser beams. When a lidar keyframe F_i is selected, we first de-skew the point cloud. The movement of the carrier causes point cloud distortion during data collection, as the points in a lidar scan are not collected at the same time; that is, the frames of different lidar points are inconsistent. The IMU bias obtained from the factor graph optimization is used to correct the IMU measurements, from which the pose of the carrier is estimated. We find the pose corresponding to each lidar point, transform the points into a common coordinate system, and obtain the resulting point cloud without distortion. The feature point matching method is used in this paper because it is more robust and efficient than noise-sensitive matching methods such as Iterative Closest Point (ICP). Similar to [7], this paper uses plane feature points and edge feature points as matching points, which are extracted by smoothness. For any point p_k in F_i, we select ten neighboring points of p_k from the same sweep, half on either side of p_k. The smoothness of p_k can then be calculated by the following formula:

c = \frac{1}{|D| \cdot \| p_k \|} \left\| \sum_{p_j \in D} (p_k - p_j) \right\|,

where D is the set of selected points adjacent to p_k and |D| is the number of points in D.
We use F_i = {F^π_i, F^ε_i} to represent the feature points extracted from the lidar scan at time i, where F^π_i represents the planar points and F^ε_i represents the edge points. Points with small smoothness are selected as plane points and points with large smoothness are selected as edge points. A detailed description of the feature extraction process is provided in [7].
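A sketch of the smoothness computation for one scan line follows; `smoothness` is our own helper, and LOAM's actual implementation differs in details such as window handling and thresholds:

```python
import numpy as np

def smoothness(scan, k, half_window=5):
    """LOAM-style curvature of point k computed from its ring neighbors.

    `scan` is an (N, 3) array of points from one scan line, in acquisition
    order. Small values indicate planar points; large values indicate edges.
    """
    p = scan[k]
    # Neighbor set D: half_window points on each side of p_k.
    nbrs = np.vstack([scan[k - half_window:k], scan[k + 1:k + 1 + half_window]])
    diff = (p - nbrs).sum(axis=0)  # sum of (p_k - p_j) over the neighbor set D
    return np.linalg.norm(diff) / (len(nbrs) * np.linalg.norm(p))
```

On a locally straight segment the neighbor differences cancel and the value is near zero, while at a corner they accumulate, which is exactly the edge/plane separation used above.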
After the above processing is complete, we have the undistorted feature points of the current lidar frame. For the feature-based matching method, we then need to find the plane and edge line in the target point cloud corresponding to each feature point of the current frame. Unlike traditional algorithms such as lio_sam and lego_loam, which take lidar scans close to the current frame in the time series as matching point clouds, this paper searches for the matching point cloud of the current frame in Euclidean space, which is more accurate because our optimization minimizes point-to-line and point-to-plane spatial distances. Therefore, in this algorithm the current robot position is taken as a prior, and a part of the point cloud is extracted from the global map to align with the undistorted feature points of the current frame in order to estimate the pose of the robot; the IMU provides the initial pose. The global map consists of an edge feature map and a plane feature map, which are updated and maintained separately. The global map is stored in an incremental k-d tree [18]. Compared with traditional dynamic data structures, the incremental k-d tree has efficient nearest-neighbor search and supports incremental map updates. Similar to [18], we convert the current frame to the world frame to obtain W F_i = {W F^ε_i, W F^π_i}. For any edge feature point p^ε_i, we find the N nearest edge points in the global edge map and then calculate the covariance of these nearby points. If one of the eigenvalues of the covariance matrix is significantly larger than the other two, the points are distributed along a line. The eigenvector n^ε_i corresponding to the largest eigenvalue is the direction vector of the line. The edge point p^ε_j closest to the feature point is taken as a point on the line.
The distance from the feature point to the global line is

d^{\varepsilon}_i = \big\| (p^{\varepsilon}_i - p^{\varepsilon}_j) \times n^{\varepsilon}_i \big\|,

where n^ε_i is the unit direction vector of the line. For any plane feature point p^π_i, we find the M nearest plane points in the global plane map and calculate the covariance of these nearby points. If one of the eigenvalues of the covariance matrix is significantly smaller than the other two, the points are distributed in a plane. The eigenvector n^π_i corresponding to the smallest eigenvalue is the normal vector of the plane. We select the plane point p^π_j closest to the feature point as a point on the plane. The distance from the feature point to the global plane is then

d^{\pi}_i = \big| (p^{\pi}_i - p^{\pi}_j)^T n^{\pi}_i \big|.

We can obtain the optimal pose of the current lidar scan relative to the world frame by minimizing the point-to-line and point-to-plane distances:

\hat{T}^W_{L_i} = \arg\min_{T^W_{L_i}} \Big( \sum_{p^{\varepsilon}_i} d^{\varepsilon}_i + \sum_{p^{\pi}_i} d^{\pi}_i \Big).

After optimization, we have the optimal absolute transformation \hat{T}^W_{L_i} from the current keyframe to the world frame, which serves as the measurement of the lidar pose factor.
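The eigen-decomposition step above can be sketched as follows. These are illustrative helpers, not the paper's code; `fit_principal` returns the neighborhood centroid and the eigenvector of either the largest eigenvalue (line direction) or the smallest (plane normal):

```python
import numpy as np

def fit_principal(points, smallest=False):
    """Centroid and an extreme eigenvector of the neighborhood covariance."""
    c = points.mean(axis=0)
    cov = (points - c).T @ (points - c) / len(points)
    w, v = np.linalg.eigh(cov)  # eigenvalues returned in ascending order
    return c, (v[:, 0] if smallest else v[:, -1])

def point_to_line(p, q, n):
    """Distance from p to the line through q with unit direction n."""
    d = p - q
    return np.linalg.norm(d - (d @ n) * n)

def point_to_plane(p, q, n):
    """Distance from p to the plane through q with unit normal n."""
    return abs((p - q) @ n)
```

Checking whether the extreme eigenvalue dominates the other two is what decides, as described above, whether a neighborhood is accepted as a line or a plane.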
We obtain the residual of the lidar pose factor using the following formula:

r_{L_i} = \log\Big( \big( \hat{T}^W_{L_i} \big)^{-1} \, T^W_{B_i} \, T^B_L \Big)^{\vee},

where T^B_L is the transformation between the lidar coordinate system and the robot body coordinate system, calibrated in advance.
The variance of the factor is calculated by the method used in [26], where p_k is a feature point in the current lidar keyframe, transformed into the matching point cloud through the optimal transformation \hat{T}^W_{L_i} obtained from the above matching. Here, N is the number of features in the current lidar keyframe, Λ is the covariance of the zero-mean white Gaussian measurement noise, and σ_x, σ_y, and σ_z are the noise sigmas (in meters) on the x, y, and z axes, respectively. The sigmas are multiplied by two, as both the target point and the matching point are affected by this noise.

IMU Pre-Integration Factor
The main components of an IMU are a three-axis accelerometer and a three-axis gyroscope, which measure the linear acceleration and angular velocity of the carrier, respectively. The pose transformation of the carrier relative to an inertial frame can be obtained by integration. The raw measurements, ã_t and ω̃_t, at moment t are given by

\tilde{a}_t = a_t + R^{I_t}_W g^W + b^a_t + n_a, \qquad
\tilde{\omega}_t = \omega_t + b^{\omega}_t + n_{\omega}.

These IMU measurements are expressed in the IMU frame and are affected by the additive noise terms n_a and n_ω, the accelerometer bias b^a_t, and the gyroscope bias b^ω_t. In this paper, we model the biases as random walks with Gaussian derivatives, while the additive noise of the accelerometer and the gyroscope is assumed to follow a zero-mean Gaussian distribution:

\dot{b}^a_t = n_{b^a}, \quad \dot{b}^{\omega}_t = n_{b^{\omega}}, \qquad
n_a \sim \mathcal{N}(0, \Sigma_a), \quad n_{\omega} \sim \mathcal{N}(0, \Sigma_{\omega}).

Over time, the high-frequency IMU measurements lead to rapid growth in the number of variables in the optimization, increasing the computational cost. We therefore convert the IMU measurements between keyframes into motion constraints via the pre-integration technique, reducing the computational cost and providing a factor, namely, the IMU pre-integration factor. The parameters to be optimized include the robot's pose and the IMU biases.
According to the kinematic model in [26], the pose and velocity of the current keyframe in the world frame can be obtained from the measurements collected since the previous keyframe. Given the pose and velocity of keyframe i, the pose and velocity of the current keyframe j are:

R^W_j = R^W_i \Delta R_{ij}, \qquad
V^W_j = V^W_i - g^W \Delta t_{ij} + R^W_i \Delta V_{ij}, \qquad
P^W_j = P^W_i + V^W_i \Delta t_{ij} - \frac{1}{2} g^W \Delta t_{ij}^2 + R^W_i \Delta P_{ij},

where g^W is the gravity vector in the world frame, Δt is the interval between two adjacent measurements, and Δt_ij = Σ_{k=i}^{j-1} Δt. Iterating the IMU integral between times i and j and separating the parts related to the IMU measurements from the pose and velocity, we obtain the pre-integrated terms:

\Delta R_{ij} = \prod_{k=i}^{j-1} \exp\big( ((\tilde{\omega}_k - b^{\omega}_k) \Delta t)^{\wedge} \big), \qquad
\Delta V_{ij} = \sum_{k=i}^{j-1} \Delta R_{ik} (\tilde{a}_k - b^a_k) \Delta t, \qquad
\Delta P_{ij} = \sum_{k=i}^{j-1} \Big[ \Delta V_{ik} \Delta t + \frac{1}{2} \Delta R_{ik} (\tilde{a}_k - b^a_k) \Delta t^2 \Big],

where δP_ij, δV_ij, and δφ_ij denote the corresponding measurement errors of the IMU, obtained by iterating from the initial values at time i. In order to avoid repeated integration due to changes in the bias during the optimization process, we use a first-order approximation to linearize the pre-integrated terms with respect to the biases:

\Delta P_{ij}(b_i) \approx \Delta P_{ij}(\bar{b}_i) + \frac{\partial \Delta P_{ij}}{\partial b^a} \delta b^a_i + \frac{\partial \Delta P_{ij}}{\partial b^{\omega}} \delta b^{\omega}_i,

and analogously for ΔV_ij and ΔR_ij. Collecting the terms related to the states at times i and j, we construct the motion constraints of the IMU between two keyframes, which naturally provide the IMU pre-integration factor. This factor takes the IMU bias as an optimization parameter, and its residual is defined as follows:

r_{\mathcal{I}_{ij}} = \begin{bmatrix}
(R^W_i)^T \big( P^W_j - P^W_i - V^W_i \Delta t_{ij} + \frac{1}{2} g^W \Delta t_{ij}^2 \big) - \Delta P_{ij} \\
(R^W_i)^T \big( V^W_j - V^W_i + g^W \Delta t_{ij} \big) - \Delta V_{ij} \\
\log\big( \Delta R_{ij}^T (R^W_i)^T R^W_j \big)^{\vee}
\end{bmatrix},

where the IMU states are mapped into the robot body frame using R^B_I, the rotation between the IMU coordinate system and the robot body coordinate system, calibrated in advance.
The variance of the IMU pre-integration factor can be obtained iteratively from an initial value 0 using the method in [26].
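To make the pre-integration loop concrete, here is a minimal bias-corrected sketch; noise and covariance propagation are omitted for brevity, and the function names are ours rather than the paper's:

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues formula, exp: so(3) -> SO(3)."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3)
    a = phi / theta
    K = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def preintegrate_imu(accels, gyros, dt, b_a=np.zeros(3), b_w=np.zeros(3)):
    """Accumulate bias-corrected pre-integrated terms (dR, dV, dP) between keyframes."""
    dR, dV, dP = np.eye(3), np.zeros(3), np.zeros(3)
    for a_m, w_m in zip(accels, gyros):
        a = np.asarray(a_m, dtype=float) - b_a   # bias-corrected acceleration
        w = np.asarray(w_m, dtype=float) - b_w   # bias-corrected angular rate
        dP = dP + dV * dt + 0.5 * (dR @ a) * dt ** 2
        dV = dV + (dR @ a) * dt
        dR = dR @ so3_exp(w * dt)
    return dR, dV, dP
```

Note how the terms depend only on the raw measurements and the bias, not on the world-frame state, which is exactly what allows them to be computed once per keyframe pair.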

Odometry Pre-Integration Factor
Generally, wheeled robots are equipped with wheel encoders, which often provide noisy and only intermittently reliable measurements of the motion of each wheel. In order to reduce the amount of computation and the accumulated error, by analogy with the IMU we use the pre-integration technique on SE(2) to derive the constraints of the odometer. Unlike general ground mobile robots, TWIP swings back and forth in addition to its planar motion, while the encoder only measures the two-dimensional planar motion. To analyze the constraints of the encoder more conveniently, we add an auxiliary coordinate system, namely, a plane coordinate system. This coordinate system is fixed to the robot; its origin coincides with the origin of the robot body frame, its xy-plane coincides with the ground plane, its z-axis points vertically upward, and its y-axis coincides with the y-axis of the robot body frame, as shown in Figure 6. The encoder measurements between two adjacent keyframes describe the pose transformation between the plane coordinate systems of the corresponding moments. Therefore, we analyze the constraints provided by the encoder in the plane coordinate system.
The measurement data of the encoder can be converted into linear velocity and angular velocity on the two-dimensional plane:

v = \frac{r_l \omega_l + r_r \omega_r}{2}, \qquad \omega = \frac{r_l \omega_l - r_r \omega_r}{l}, \tag{34}

where ω_l and ω_r are the rotational velocities of the left and right wheels, respectively, r_l and r_r are their corresponding radii, and l denotes the robot's baseline. We model the encoder's measurement error with a Gaussian model to handle the noisy odometer data. Assuming that the motion between successive odometer readings is planar, the encoder provides the following measurements in the planar coordinate system of the current keyframe:

\tilde{v}_i = v_i + \varepsilon_{v_i}, \qquad \tilde{\omega}_i = \omega_i + \varepsilon_{\omega_i},

where ε_{v_i} and ε_{ω_i} represent zero-mean Gaussian noise.
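Equation (34) is the standard differential-drive conversion; a direct transcription (keeping the sign convention as printed above) might look like this:

```python
def wheel_to_body(omega_l, omega_r, r_l, r_r, baseline):
    """Convert wheel angular rates (rad/s) to forward speed v and yaw rate w."""
    v = (r_l * omega_l + r_r * omega_r) / 2.0       # mean rim speed of both wheels
    w = (r_l * omega_l - r_r * omega_r) / baseline  # sign follows Equation (34)
    return v, w
```

With equal wheel speeds the yaw rate is zero and the robot drives straight, which is a quick sanity check on the calibration of r_l, r_r, and l.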
Given two consecutive lidar keyframes, i and j, we obtain the pose of the plane coordinate system corresponding to the current keyframe in the world coordinate system from the pose of the previous keyframe and the measurements of the encoder:

\varphi^W_{\pi_j} = \varphi^W_{\pi_i} + \sum_{k=i}^{j-1} \tilde{\omega}_k \Delta t, \qquad
P^W_{\pi_j} = P^W_{\pi_i} + \sum_{k=i}^{j-1} \tilde{v}_k \Delta t \begin{bmatrix} \cos \varphi^W_{\pi_k} \\ \sin \varphi^W_{\pi_k} \end{bmatrix},

where φ^W_{π_k}, k = i, j, is the yaw angle of the plane coordinate system corresponding to keyframe k relative to the world coordinate system and P^W_{π_k} = [x_k, y_k]^T, k = i, j, is its 2D position relative to the world coordinate system. Note that the propagated position depends on the rotational state. In order to generate a measurement between two keyframes i and j that does not depend on the robot's state in the world coordinate system, we place the terms related to the measurements together and eliminate the terms related to the states at i and j:

\Delta \varphi_{ij} = \sum_{k=i}^{j-1} \tilde{\omega}_k \Delta t, \qquad
\Delta P_{ij} = \sum_{k=i}^{j-1} \tilde{v}_k \Delta t \begin{bmatrix} \cos \Delta\varphi_{ik} \\ \sin \Delta\varphi_{ik} \end{bmatrix}.

In order to obtain the error-transfer model, we derive the integrated noise terms in the form of an iterative transfer, which can be written compactly as

\begin{bmatrix} \delta\varphi_{i,k+1} \\ \delta P_{i,k+1} \end{bmatrix}
= A_k \begin{bmatrix} \delta\varphi_{ik} \\ \delta P_{ik} \end{bmatrix}
+ B_k \begin{bmatrix} \varepsilon_{\omega_k} \\ \varepsilon_{v_k} \end{bmatrix},
\qquad
\Sigma_{\delta,k+1} = A_k \Sigma_{\delta k} A_k^T + B_k Q_k B_k^T.

Iterating from the initial value Σ_δi = 0_{3×3} at time i, we can incrementally calculate the pre-integrated terms Δφ_ij and ΔP_ij and their covariance matrix. Thus far, we have the encoder's pre-integration terms, Δφ_ij and ΔP_ij, which provide the measurements of the odometer constraints. Next, we formulate the residual of the odometer pre-integration constraint. Through the definition of the plane coordinate system, we can obtain the rotation of the plane coordinate system corresponding to keyframes i and j relative to the world coordinate system, respectively, where e_i denotes the i-th column vector of the identity matrix I_{3×3}.
Then, the relative transformation of the plane coordinate system between keyframes i and j follows. As shown in Appendices A and B, the rotation measurement of the encoder, Δφ_ij, provides only one constraint, and the residual of the rotation part is

r_{\Delta\varphi_{ij}} = \Delta\varphi_{ij} - \big( \varphi^W_{\pi_j} - \varphi^W_{\pi_i} \big).

We obtain the variance of the rotation residual using a first-order linear approximation of the trigonometric functions, and its Jacobian matrix relative to the TWIP motion state (the pose of the robot body frame relative to the world frame) follows accordingly. The residual of the translation part is

r_{\Delta P_{ij}} = \Delta P_{ij} - \Big[ R(\varphi^W_{\pi_i})^T \big( P^W_{\pi_j} - P^W_{\pi_i} \big) \Big]_{xy},

where [·]_{xy} denotes the first two rows of a vector; its covariance and its Jacobian matrix relative to the motion state of TWIP are obtained in the same manner.
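The planar pre-integration loop described above can be sketched in a few lines. This is a simplified Euler-integration version under our own assumptions, without the covariance propagation:

```python
import math

def preintegrate_odom(vs, ws, dt):
    """Accumulate the planar pre-integrated terms (dphi, dP) from velocity samples."""
    dphi, dx, dy = 0.0, 0.0, 0.0
    for v, w in zip(vs, ws):
        dx += v * math.cos(dphi) * dt  # advance along the current relative heading
        dy += v * math.sin(dphi) * dt
        dphi += w * dt                 # then integrate the yaw increment
    return dphi, (dx, dy)
```

Because the accumulated terms are expressed in the plane frame of keyframe i, they depend only on the encoder readings, mirroring the state-independence of the IMU pre-integration.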

Ground Constraint Factor
This paper proposes a new nonholonomic constraint factor in order to add ground constraints. We assume that TWIP moves on an approximately flat surface and that the robot is always constrained by the ground plane during movement. Ideally, the z-coordinate and roll angle would remain constant during planar motion for a two-wheeled inverted pendulum robot. In the natural environment, however, there are motion disturbances in these dimensions of the robot's pose due to rough terrain and shaking during the robot's motion. In order to reduce the error caused by this motion disturbance, we use a random model to introduce the ground constraints and assume that the motion disturbance follows a zero-mean Gaussian model. This is more in line with the natural environment, allowing least squares to be used to model the ground constraints, which naturally adds them as a new factor to the factor graph. As shown in Figure 7, when TWIP moves on an ideal plane, the roll angle and the z-coordinate of its body frame relative to the world frame should both be zero. Therefore, the residual of the ground constraint is defined as follows:

r_{G_i} = \begin{bmatrix} z_i \\ \theta_{r,i} \end{bmatrix}.

The corresponding covariance matrix is

\Sigma_{G} = \begin{bmatrix} \sigma_z^2 & 0 \\ 0 & \sigma_r^2 \end{bmatrix}.

In this paper, we assume that the motion disturbance of the robot due to rough terrain or shaking obeys a Gaussian distribution; σ_z and σ_r are the noise sigmas of z and roll, respectively, which are determined by the conditions of the terrain and the robot's structure.
The Jacobian matrix of this residual relative to the state of TWIP can be derived accordingly.
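In factor-graph terms, the contribution of this factor to the objective is the Mahalanobis cost of r = [z, θ_r]ᵀ under the diagonal covariance; a one-line sketch follows, where the default sigmas are made-up placeholder values, not the paper's calibrated ones:

```python
def ground_cost(z, roll, sigma_z=0.02, sigma_r=0.01):
    """Squared Mahalanobis norm of the ground-constraint residual r = [z, roll]^T
    under Sigma_G = diag(sigma_z^2, sigma_r^2)."""
    return (z / sigma_z) ** 2 + (roll / sigma_r) ** 2
```

Choosing larger sigmas for rougher terrain softens the constraint, which is exactly how the random (rather than deterministic) ground model tolerates motion disturbances.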

Loop Closure Factor
When a new lidar keyframe, F_{i+1}, arrives, its corresponding robot state, K_{i+1}, is added to the factor graph. We obtain the absolute pose of the current lidar keyframe as in Section 4.2.1, and use this pose as a prior to find the previous keyframe closest to F_{i+1} in Euclidean space as a candidate frame. For example, the lidar keyframe F_3 in Figure 3 is one of the returned loop closure candidates. Then, taking the position of the lidar keyframe F_3 as the center, we extract the point cloud within a cuboid with a length of 10 m and a height of 5 m from the global map and transform it into the robot body coordinate system corresponding to F_3 to construct a local map M_{i+1}. We then match F_{i+1} to the local map M_{i+1} using the scan-matching method discussed in Section 4.2.1. In the local map, we select the N nearest feature points for each feature point of the current frame to fit a line or plane. After optimization, we obtain the relative transformation, ∆T_{i+1,3}, and add it to the graph as a loop closure factor. Throughout this paper, we set the loop closure search distance to 15 m from the new lidar keyframe.
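The Euclidean candidate search can be sketched as a linear scan over past keyframe positions. This is a brute-force illustrative helper; an actual implementation would typically use a k-d tree:

```python
import math

def nearest_keyframe(positions, query, max_dist=15.0):
    """Index of the closest stored keyframe within the search radius, or None."""
    best, best_d = None, max_dist
    for i, p in enumerate(positions):
        d = math.dist(p, query)  # Euclidean distance (Python >= 3.8)
        if d < best_d:
            best, best_d = i, d
    return best
```

The 15 m default mirrors the search distance stated above; candidates outside the radius are rejected before any scan matching is attempted.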

Experiment
We conducted experiments on our own collected datasets, as we could not find an open dataset providing calibrated lidar-IMU-odometry data for a two-wheeled inverted pendulum robot. All experiments were carried out on a laptop computer with an Intel i7-8559U CPU (8 cores @ 2.70 GHz) and 8 GB RAM. As shown in Figure 8, the platform was equipped with a Velodyne VLP-16 lidar operating at 10 Hz, an XSENS MTi-G-710 IMU providing acceleration and angular velocity at 200 Hz, and an encoder running at 1 kHz, all of which were synchronized and calibrated in advance. The dataset comes from two scenes: one is a long straight corridor, named Dataset Corridor, and the other is a flat outdoor road, named Dataset Outdoor. The environments of the two datasets are shown in Figure 9. During data collection, the actual ground truth of the robot was unavailable. Inspired by [25], we equipped TWIP with downward-viewing cameras on the front and rear (their positions are shown in Figure 7) and an attitude sensor with a dynamic accuracy of 0.2°. Tens of ArUco markers were laid out on the floor, and their locations in a predefined world reference were measured with a location error of less than 1 cm. The transformations between the industrial cameras, the attitude sensor, and the robot were all pre-calibrated. The two industrial cameras detected and recorded these markers during the robot's movement. In order not to affect the algorithm presented in this paper, as well as to improve the accuracy of the ground truth, the trajectory truth was obtained offline: we extracted the marker corner points in each recorded image and approximated the ground-truth trajectory by optimizing the image reprojection errors together with the attitude sensor's pose constraints in an offline batch. To verify the proposed algorithm, we tested our SLAM system and state-of-the-art lidar SLAM methods (Fast_lio2 [16], Lio_sam [18], Loam [7]) on the same datasets.
Figure 10 shows the results of our algorithm for localization and mapping on Dataset Corridor. Figure 11 compares the trajectories in the x-y plane recovered by our algorithm, Fast_lio2, Lio_sam, Loam, the odometry, and the ground truth on the long straight corridor dataset. The odometry trajectory is computed directly from the raw encoder data; it is included to emphasize how the lidar-based methods correct the odometry's accumulated error. Meanwhile, to compare the long-term positioning accuracy of the algorithms, no algorithm used a loop detection module, and the pose estimation of all algorithms was performed in the robot body frame. Figures 12 and 13 show the corresponding trajectory errors relative to the ground truth. As can be seen from Figure 12, the x-direction is the running direction of TWIP, which has fewer constraints, and all algorithms exhibit more significant errors in the x-direction. Among them, the accuracy of Loam in the x-direction is significantly lower than that of the other algorithms due to the lack of measurements from other sensors. In the y-direction there is a constraint from the walls, and all the lidar-based SLAM algorithms achieve similar accuracy, as they all use the same matching method. Our method exhibits much better accuracy, with a smaller upper bound on the errors in both the x and y directions. Attention should be paid to the z-axis values in Figure 14, where the z-coordinates estimated by our method are well bounded around zero (consistent with a flat indoor environment), while the other algorithms have significantly larger upper bounds. Using a nonholonomic constraint factor and adding planar constraints and encoder pre-integration constraints improves indoor pose estimation accuracy, especially in the z-direction. The numerical results of the estimation errors are presented in Table 2.
The root mean square error (RMSE) is used here, computed for the errors in the principal parameter directions, namely, the x, y, pitch, and yaw coordinates. In terms of RMSE, the error of our algorithm is lower than that of the other algorithms in all directions for a two-wheeled inverted pendulum robot.
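For concreteness, the RMSE metric used in Tables 2 and 3 can be sketched as below. This is the standard definition, assuming the estimated and ground-truth series are already time-aligned; the function name is our own.

```python
import numpy as np

def rmse(estimated, ground_truth):
    """Root mean square error between two aligned 1-D coordinate series
    (e.g. the x positions of the estimated and true trajectories)."""
    e = np.asarray(estimated, dtype=float) - np.asarray(ground_truth, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))
```

The same function is applied independently to each principal parameter direction (x, y, pitch, yaw) to produce one scalar per direction.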

Experimental Results and Analysis of Outdoor Structured Environment
In our outdoor experiments, we compared our algorithm against the same baselines, such as Lio_sam. Figure 15 shows our algorithm's trajectory and environment map on Dataset Outdoor. Figure 16 compares the estimated trajectories in the x and y directions on Dataset Outdoor, including our algorithm, Fast_lio2, Lio_sam, Loam, the odometry, and the ground truth. Figures 17 and 18 show the corresponding errors relative to the ground truth. Figure 19 shows the trajectory in the z-direction, which should be bounded around zero. As can be seen from the results, our algorithm continues to outperform the other algorithms in outdoor scenes.
Moreover, the accuracy improvement of our algorithm is more significant in the z-axis direction than in the x- and y-axis directions. There is no degenerate scene in the outdoor experiment. The lidar feature points mainly come from reflections off surrounding objects, which constrain the robot's pose in the horizontal direction; in contrast, lidar provides relatively few feature points in the vertical direction due to the limitation of its vertical field of view. Our algorithm introduces ground constraints, which add pose constraints on the robot in the vertical direction and improve the localization accuracy of the system, especially in the z-direction. Meanwhile, we quantitatively analyzed the performance of our algorithm and the other algorithms in the principal parameter directions on the outdoor dataset in terms of RMSE. The results are shown in Table 3, indicating that the error of our algorithm in the principal parameter directions is lower than that of the other algorithms. Our indoor and outdoor experiments show that the nonholonomic constraint factor proposed in this paper naturally introduces the encoder pre-integration constraint while adding a ground constraint in the form of the random constraint model, which better matches the actual motion of the two-wheeled inverted pendulum robot and results in improved localization and mapping accuracy.
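To illustrate how a soft ground constraint can bound the vertical drift discussed above, a minimal residual sketch is given below. This is not the paper's exact formulation (which involves plane frames and a constraint-dimension analysis); it only shows the general idea of a whitened planar-ground residual penalizing height above z = 0 and body tilt, with illustrative noise parameters `sigma_z` and `sigma_tilt` chosen by us.

```python
import numpy as np

def ground_constraint_residual(p_world, R_world_body, sigma_z=0.02, sigma_tilt=0.01):
    """Soft (stochastic) planar-ground residual: penalize the body height
    above z = 0 and the tilt of the body up-axis away from the world
    vertical. Returns the whitened 3-vector residual."""
    z_body = R_world_body @ np.array([0.0, 0.0, 1.0])  # body up-axis in world frame
    return np.array([
        p_world[2] / sigma_z,    # height should stay near zero
        z_body[0] / sigma_tilt,  # tilt component along world x
        z_body[1] / sigma_tilt,  # tilt component along world y
    ])
```

In a factor-graph setting, such a residual would be added as a unary factor on each robot state node, so the optimizer trades it off against the lidar, IMU, and encoder factors rather than enforcing the plane exactly.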

Conclusions
This paper proposes a novel Lidar-IMU-Odometer coupled framework for the localization and mapping of a two-wheeled inverted pendulum robot on approximately flat ground. We formulate a factor graph incorporating relative and absolute measurements from different sensors (including ground constraints) into the system as factors. In this coupled framework based on a factor graph, we introduce five types of factors and analyze the constraint dimension of each type of factor on the robot state node according to the motion characteristics of TWIP. For the non-fully constrained factors (the encoder pre-integration factor and the ground constraint factor), we propose a new nonholonomic constraint factor in order to add them naturally to the factor graph with the robot state node on SE(3). Experiments in indoor and outdoor environments validate the superior performance of our system compared to other state-of-the-art lidar-based SLAM algorithms.
This proposal paves the way for using constrained models to account for motion perturbations in state estimation for specially-structured ground robots with different sensors. In the future, building on the work in this paper, we intend to realize the localization and mapping of TWIP on undulating roads via the proposed LiDAR-IMU-Encoder coupling framework.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
This section proves the relationship between the entry r_00 in row 0, column 0 and the entry r_11 in row 1, column 1 of the matrix finally obtained in Equation (43). From Equations (7) and (8), we can see that the entries r_00 and r_11 in the matrix are equal.

Appendix B
This section proves the constraints implicit in the final matrix obtained in Equation (43). When TWIP moves on the ideal ground plane, the x-y planes of the plane coordinate systems corresponding to all keyframes coincide. Therefore, the y-axis of the plane frame corresponding to keyframe j lies in the x-y plane of the plane coordinate system corresponding to keyframe i. As shown in Figure A1, we translate the y-axis y_j of the plane coordinate system π_j to the plane coordinate system π_i; all coordinate axes are represented in the world coordinate system. We assume that the angle between y_j and y_i is α, that the angle between y_j and x_i is β, and that α + β = 90°. Therefore, we have
Figure A1. Geometric relationship between two consecutive plane coordinate systems π_i and π_j.
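The complementary-angle relation α + β = 90° can be checked numerically. The sketch below, with axis vectors of our own choosing, verifies it for a y_j lying in the π_i plane between x_i and y_i (the configuration of Figure A1).

```python
import numpy as np

def angles_to_axes(y_j, x_i, y_i):
    """Angles (degrees) between the translated axis y_j and the
    in-plane axes y_i and x_i of frame pi_i."""
    def ang(u, v):
        c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
    return ang(y_j, y_i), ang(y_j, x_i)  # (alpha, beta)
```

When the two plane frames differ only by an in-plane rotation, alpha and beta always sum to a right angle, which is the geometric fact the appendix relies on.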