Positioning of Quadruped Robot Based on Tightly Coupled LiDAR Vision Inertial Odometer

Quadruped robots, an important class of unmanned ground robots, have broad potential for applications in education, service, industry, military, and other fields. Their independent positioning plays a key role in completing assigned tasks in complex environments. However, positioning based on global navigation satellite systems (GNSS) is susceptible to jamming and signal blockage, so quadruped robots may not operate properly in environments sheltered by buildings. In this paper, a tightly coupled LiDAR vision inertial odometer (LVIO) is proposed to address the positioning inaccuracy of quadruped robots, whose mileage information obtained through the leg and foot structure alone is poor. With this optimization method, the point cloud data obtained by 3D LiDAR, the image feature information obtained by binocular vision, and the IMU inertial data are combined to achieve precise indoor and outdoor positioning of a quadruped robot. This method reduces the errors caused by the uniform motion model in the laser odometer as well as the image blur caused by rapid movements of the robot, which can lead to mismatching in dynamic scenes; at the same time, it alleviates the impact of drift in the inertial measurements. Finally, the quadruped robot in our laboratory is used to build a physical platform for verification. The experimental results show that the designed LVIO effectively realizes positioning of the quadruped robot with high precision and strong robustness, both indoors and outdoors, which verifies the feasibility and effectiveness of the proposed method.


Introduction
Quadruped robots have broad potential for applications in education, service, industry, military, and other fields, due to their advantages, such as good flexibility and strong environmental adaptability [1]. The independent positioning of quadruped robots is a key factor in completing assigned tasks in a complex environment and has become a research hotspot [2]. At present, the mainstream positioning technologies include map matching [3], mileage estimation [4], satellite and beacon positioning [5], and so on. Positioning based on map matching requires expensive high-precision sensors to establish prior maps [6]. Satellite-based positioning accuracy is limited, and the signal is weak when sheltered by buildings [7]. Beacon-based positioning is likewise limited by the layout of positioning base stations [8]. Compared with the aforementioned methods, positioning based on mileage estimation has advantages in both cost and scope of application.
The design of current odometers has mostly been applied to wheeled robots, whose mileage data are obtained by measuring the rotation of the wheels with a built-in encoder to provide accurate positioning information. However, the wheels may idle on smooth ground, and the wheel odometer cannot measure such "skidding", which adversely affects the positioning accuracy [9]. In contrast to wheeled robots, legged robots are driven by motorized leg joints and cannot provide accurate mileage data when relying on the leg and foot structure alone. To ensure high stability and strong robustness in odometer-based location data for quadruped robots, nonlinear optimization is combined with the point cloud data obtained by 3D LiDAR, the image characteristics obtained from binocular vision, and the IMU inertial data, and loopback detection is used for repositioning to eliminate any cumulative error. In addition, this study proposes a tightly coupled LiDAR vision inertial odometer (LVIO) algorithm that can effectively resolve the errors caused by uniform motion models in a LiDAR odometer, such as mismatching, feature loss, pose calculation failures, and data drift, among others. Precise positioning with strong robustness is achieved for quadruped robots, both indoors and outdoors. This paper is arranged as follows. Section 2 introduces the LVIO algorithm and describes the algorithmic flow of the proposed method. In Section 3, the experimental platform and environment are briefly introduced, followed by the calibration of the relevant sensors; the experiments are then conducted and analyzed. Section 4 discusses the experimental results. Finally, Section 5 summarizes the experiment and conclusions and provides directions for future research.

Method
In this section, we introduce the tightly coupled LiDAR visual inertial odometer algorithm. Specifically, the front-end data processing is introduced first, including feature extraction and tracking of the image data acquired by the binocular camera using the optical flow method, pre-integration of the acceleration and angular velocity data from the IMU, and alignment processing of the LiDAR point cloud data. The current state quantity, including position (p), speed (v), and attitude (q), can then be obtained. Next, the tightly coupled back-end nonlinear optimization is performed using a sliding window, and the optimization variables are derived by minimizing the edge, visual, and IMU residuals to obtain the pvq of all frames in the sliding window. Finally, loopback detection and repositioning are performed, and the LiDAR odometry, visual odometry, and IMU pre-integration constraints, as well as the loopback detection constraint, are jointly optimized for the global poses.

Optical Flow Feature Tracking
The front-end of the visual odometer (VO) adopts the Lucas-Kanade (LK) optical flow for feature tracking [27]. Optical flow is the projection of the motion of an object onto the image plane, which describes the movement of pixels between consecutive image frames.

1. On the premise of constant brightness, small motion, and spatial consistency (similar motion of adjacent pixels), the brightness of a pixel point on the image is constant over time, and the basic equation of optical flow can be obtained as follows:

$I(x + dx, y + dy, t + dt) = I(x, y, t)$

2. Let the increments of the coordinates be $dx$, $dy$, and the increment of time be $dt$; the pixel velocity is then $(u, v) = (dx/dt, dy/dt)$.

3. With small motion, the position does not change much with time; the first-order Taylor expansion of the image at $I(x, y, t)$ is as follows:

$I(x + dx, y + dy, t + dt) \approx I(x, y, t) + \frac{\partial I}{\partial x}dx + \frac{\partial I}{\partial y}dy + \frac{\partial I}{\partial t}dt$

so that $I_x u + I_y v + I_t = 0$, which is solved for $(u, v)$ over a local window.

The effect of using the LK optical flow algorithm to track the image feature points is shown in Figure 2.
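As an illustration of this front-end step, the short Python sketch below (an assumed OpenCV-based implementation, not the authors' code; prev_img and curr_img are two consecutive grayscale frames) tracks corner features with pyramidal LK optical flow and keeps only points that pass a forward-backward consistency check, similar to the 0.5 pixel check described in the algorithm flow later in this paper.

```python
import cv2
import numpy as np

def track_features(prev_img, curr_img, prev_pts, fb_thresh=0.5):
    """Track feature points with pyramidal LK optical flow; keep only points whose
    forward-backward tracking error is below fb_thresh pixels."""
    lk = dict(winSize=(21, 21), maxLevel=3,
              criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    curr_pts, st_f, _ = cv2.calcOpticalFlowPyrLK(prev_img, curr_img, prev_pts, None, **lk)
    back_pts, st_b, _ = cv2.calcOpticalFlowPyrLK(curr_img, prev_img, curr_pts, None, **lk)
    fb_err = np.linalg.norm(prev_pts - back_pts, axis=2).ravel()
    good = (st_f.ravel() == 1) & (st_b.ravel() == 1) & (fb_err < fb_thresh)
    return prev_pts[good], curr_pts[good]

# usage sketch: detect corners in the previous frame, then track them into the current one
# prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=150, qualityLevel=0.01, minDistance=30)
# p_prev, p_curr = track_features(prev_img, curr_img, prev_pts)
```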

IMU Pre-Integration
IMU pre-integration adopts the median method, that is, the pose of two adjacent moments is calculated by means of the average value of IMU measurements at two moments [28]. The camera-IMU model is shown in Figure 3.

The traditional IMU recursive calculation method obtains the state quantity pvq at the current moment by using the measured linear acceleration $\hat{a}$ and angular velocity $\hat{\omega}$ (the real values are $a$ and $\omega$) through an integral operation [29].

1. The calculation formula of the IMU measurement is as follows:

$\hat{a}^{b} = q_{bw}\left( a^{w} + g^{w} \right) + b_{a} + n_{a}, \qquad \hat{\omega}^{b} = \omega^{b} + b_{g} + n_{g}$

where w is the world coordinate system, superscript b is the IMU coordinate system, $q_{bw}$ represents the rotation quaternion (from world to IMU coordinates), $g$ represents the gravity vector, and $b$ and $n$ denote the bias and noise of the IMU accelerometer and gyroscope, respectively.

2. The derivative of pvq with respect to time can be written as follows:

$\dot{p}^{w}_{b_t} = v^{w}_{t}, \qquad \dot{v}^{w}_{t} = a^{w}_{t}, \qquad \dot{q}^{w}_{b_t} = q^{w}_{b_t} \otimes \begin{bmatrix} 0 \\ \frac{1}{2}\omega^{b_t} \end{bmatrix}$

3. By integrating the measured values of the IMU from moment i, the state quantity at moment j is as follows:

$p^{w}_{b_j} = p^{w}_{b_i} + v^{w}_{b_i}\Delta t + \iint_{t \in [i,j]} \left( q^{w}_{b_t}\hat{a}^{b_t} - g^{w} \right) dt^{2}$

$v^{w}_{b_j} = v^{w}_{b_i} + \int_{t \in [i,j]} \left( q^{w}_{b_t}\hat{a}^{b_t} - g^{w} \right) dt$

$q^{w}_{b_j} = \int_{t \in [i,j]} q^{w}_{b_t} \otimes \begin{bmatrix} 0 \\ \frac{1}{2}\omega^{b_t} \end{bmatrix} dt$

4. In VIO based on nonlinear optimization, to avoid repeated integration when absolute poses are optimized, the IMU needs to be pre-integrated [30]; that is, the integration model is converted into a pre-integration model, and the formula is as follows:

$q^{w}_{b_t} = q^{w}_{b_i} \otimes q^{b_i}_{b_t}$

5. When $q^{w}_{b_i}$ is separated from $q^{w}_{b_t}$ and moved outside the integral operation, the integral term of the attitude quaternion in the integral formula of pvq becomes the attitude $q^{b_i}_{b_t}$ relative to moment i. According to the acceleration and angular velocity obtained by the IMU sensor, $p_t$, $v_t$, $q_t$ can be obtained by continuous integration on the basis of the initial $p_0$, $v_0$, $q_0$. Equation (7) can then be written as follows:

$p^{w}_{b_j} = p^{w}_{b_i} + v^{w}_{b_i}\Delta t - \frac{1}{2} g^{w}\Delta t^{2} + q^{w}_{b_i}\,\alpha^{b_i}_{b_j}$

$v^{w}_{b_j} = v^{w}_{b_i} - g^{w}\Delta t + q^{w}_{b_i}\,\beta^{b_i}_{b_j}$

$q^{w}_{b_j} = q^{w}_{b_i} \otimes \gamma^{b_i}_{b_j}$

6. In the process of each optimization iteration, the pose is adjusted relative to the world coordinate system; that is, $q^{w}_{b_j}$ is adjusted outside the integral while $q^{b_i}_{b_t}$ remains relatively unchanged, so as to avoid repeated calculation of the integral. In Equation (9),

$\alpha^{b_i}_{b_j} = \iint_{t \in [i,j]} q^{b_i}_{b_t}\hat{a}^{b_t}\, dt^{2}, \qquad \beta^{b_i}_{b_j} = \int_{t \in [i,j]} q^{b_i}_{b_t}\hat{a}^{b_t}\, dt, \qquad \gamma^{b_i}_{b_j} = \int_{t \in [i,j]} q^{b_i}_{b_t} \otimes \begin{bmatrix} 0 \\ \frac{1}{2}\omega^{b_t} \end{bmatrix} dt$

are the pre-integrals of the IMU, corresponding to position (p), velocity (v), and attitude (q), respectively.
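To make the median (midpoint) integration concrete, the following sketch (a simplified illustration with assumed variable names; biases and noise are ignored, and the gravity convention is an assumption) propagates p, v, and q between two consecutive IMU samples using the averaged measurements, as described above.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

GRAVITY = np.array([0.0, 0.0, -9.81])       # world-frame gravity (assumed convention)

def median_integrate(p, v, q, acc0, gyr0, acc1, gyr1, dt):
    """Propagate position p, velocity v, and orientation q (body->world, a scipy
    Rotation) from one IMU sample to the next with the median method: the raw
    accelerations and angular rates at the two moments are averaged."""
    gyr_mid = 0.5 * (gyr0 + gyr1)                  # averaged angular rate
    q_new = q * R.from_rotvec(gyr_mid * dt)        # rotate by the averaged rate
    acc_w0 = q.apply(acc0) + GRAVITY               # world-frame acceleration (old pose)
    acc_w1 = q_new.apply(acc1) + GRAVITY           # world-frame acceleration (new pose)
    acc_mid = 0.5 * (acc_w0 + acc_w1)              # averaged world-frame acceleration
    p_new = p + v * dt + 0.5 * acc_mid * dt ** 2   # constant-acceleration position update
    v_new = v + acc_mid * dt
    return p_new, v_new, q_new
```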

LiDAR Point Cloud Registration
Before registration, feature primitives are extracted from the LiDAR point cloud and synchronized with the camera frames and IMU timestamps so that all sensors can be optimized at once [31]. The processing includes point cloud distortion removal, timestamp synchronization, filtering, primitive extraction, and tracking. The motion distortion compensation and timestamp synchronization of the laser frame point cloud are shown in Figure 4. The point cloud registration adopts the ICP algorithm to align the source point set with the target point set that has similar local geometric features. The traditional point-to-point ICP algorithm [32] has the following disadvantages: noise or abnormal data may prevent the algorithm from converging or may introduce errors, and the selected initial iteration value has an important influence on the final registration result; if the initial value is not properly selected, the algorithm may fall into a local optimum. The point-to-plane ICP algorithm [33], as shown in Figure 5, uses the distance from the measuring point to the plane as the objective function. It converges faster than the point-to-point variant and approximates the nonlinear problem through linear optimization.
In Figure 5, two point sets (the source surface and the destination surface) are shown, with a source point $s_i$, a target point $d_i$, and the target plane; $n_i$ is the normal vector of the target plane, and $l_i$ is the distance from the source point to the target plane. The objective function is as follows:

$M_{\mathrm{opt}} = \arg\min_{M} \sum_{i} \left( \left( M \cdot s_i - d_i \right) \cdot n_i \right)^{2}$

where M is a 4 × 4 3D rigid body transformation matrix composed of rotation (R) and translation (T), which is shown as follows:

$M = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}$

Point cloud data were collected by the 3D LiDAR and registered with the point-to-plane ICP algorithm; the red points are the source point cloud, and the green points are the target point cloud. The registration results are shown in Figure 6. After point cloud registration, a sparse depth map is created; by stacking several frames together, a dense depth map can be obtained for the visual feature points [34].
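As a minimal numerical sketch of the point-to-plane objective above (illustrative only; the correspondences between source points, target points, and target normals are assumed to be already established, e.g., by a nearest-neighbour search), one Gauss-Newton step can be obtained by linearizing the rotation with a small-angle approximation:

```python
import numpy as np

def point_to_plane_icp_step(src, dst, normals):
    """One linearized point-to-plane ICP iteration.
    src, dst: (N, 3) associated source/target points; normals: (N, 3) target normals.
    Returns a 4x4 rigid transform M that moves src towards dst."""
    A = np.hstack([np.cross(src, normals), normals])     # N x 6 Jacobian rows [rotation | translation]
    b = np.einsum("ij,ij->i", normals, dst - src)        # signed point-to-plane distances
    x, *_ = np.linalg.lstsq(A, b, rcond=None)            # small-angle least-squares increment
    rx, ry, rz, tx, ty, tz = x
    Rm = np.array([[1.0, -rz,  ry],                      # first-order rotation I + [r]x
                   [ rz, 1.0, -rx],
                   [-ry,  rx, 1.0]])
    M = np.eye(4)
    M[:3, :3], M[:3, 3] = Rm, [tx, ty, tz]
    return M
```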


Tightly Coupled Nonlinear Optimization
As opposed to the loosely coupled method, which treats image processing as a black box and fuses the IMU data only after the visual odometry (VO) has been calculated, the tightly coupled method adds the image feature information into the state vector. The filter-based tightly coupled VIO has a low-dimensional state vector, and its calculation method is relatively simple: one part predicts and updates the state data, and the other incorporates the measurements from the remaining sensors. However, its positioning accuracy is relatively low [35].
The tightly coupled back-end nonlinear optimization based on sliding windows can achieve high accuracy. An objective function is used to unify the visual constraints, IMU constraints, and closed-loop constraints to implement bundle adjustment (BA), minimize the edge (marginalization) residuals, visual residuals, and IMU residuals, and calculate the Jacobian matrix by differentiating with respect to the optimization variables. The bias, the external parameters from the IMU to the camera, and the pvq of all frames in the sliding window are obtained.
The optimization objective is as follows:

$\min_{X}\left\{ \left\| r_p - H_p X \right\|^{2} + \sum_{k \in B} \left\| r_B\!\left( \hat{z}^{b_k}_{b_{k+1}}, X \right) \right\|^{2}_{P^{b_k}_{b_{k+1}}} + \sum_{(l,j) \in C} \rho\!\left( \left\| r_C\!\left( \hat{z}^{c_j}_{l}, X \right) \right\|^{2}_{P^{c_j}_{l}} \right) \right\}$

The above formula consists of three objective items, namely, the marginalization residual, the IMU measurement residual, and the visual reprojection error. X denotes all states in the sliding window, $x_k$ is the IMU state (pvq, acceleration bias $b_a$, and gyroscope bias $b_g$) captured at the image of frame k, and $x^{b}_{c}$ is the external parameter between the camera and the IMU, which are expressed as follows:

$X = \left[ x_0, x_1, \ldots, x_n, x^{b}_{c}, \lambda_0, \lambda_1, \ldots, \lambda_m \right]$

$x_k = \left[ p^{w}_{b_k}, v^{w}_{b_k}, q^{w}_{b_k}, b_a, b_g \right], \quad k \in [0, n]$

$x^{b}_{c} = \left[ p^{b}_{c}, q^{b}_{c} \right]$

where n is the number of key frames, and m is the total number of all observed landmark points in the sliding window. $r_p$ in the optimization term is the marginalization residual, and $H_p$ is the new information matrix after the Schur complement, where p refers to prior; that is, the measurement information is transformed into prior information.
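To illustrate how these three types of residuals are stacked into a single least-squares problem, the toy sketch below uses made-up one-dimensional states and measurements (three frame positions and one landmark); it is only a structural analogy of the LVIO back-end, in which the marginalization prior, IMU pre-integration, and reprojection terms play these roles and Ceres Solver is used instead of SciPy.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy sliding window: state x = [p0, p1, p2, l0] (three frame positions, one landmark).
prior_p0     = 0.0                     # marginalization-style prior on the oldest frame
imu_deltas   = [1.0, 1.1]              # "pre-integrated" relative motions p1-p0 and p2-p1
landmark_obs = [3.0, 2.0, 0.9]         # "visual" observations of l0 - p_k from each frame

def residuals(x):
    p, l0 = x[:3], x[3]
    r_prior  = [p[0] - prior_p0]                                   # prior term
    r_imu    = [p[k + 1] - p[k] - imu_deltas[k] for k in range(2)] # IMU-like relative constraints
    r_visual = [(l0 - p[k]) - landmark_obs[k] for k in range(3)]   # reprojection-like terms
    return np.array(r_prior + r_imu + r_visual)

# A Huber loss down-weights outliers; the paper applies it to the visual term only,
# whereas SciPy applies the loss to every residual in this simplified sketch.
sol = least_squares(residuals, x0=np.zeros(4), loss="huber", f_scale=1.0)
print(sol.x)     # optimized frame positions and landmark position
```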

1. Marginalization residual: Briefly, the pose and feature point constraint information is removed from the sliding window, while the image information constraints are retained for optimization through the Ceres Solver (a nonlinear optimization library) [36], as shown in Figure 7 (a small numerical sketch of this step is given after this list). In Figure 7, the marked symbols represent the old key frames, the latest frame, the IMU constraints, the visual features, the fixed states, and the estimated states. Marginalization ensures the sparsity of the system and the continuity of pre-integration, ensures enough parallax between key frames, and allows enough feature points to be triangulated.

2. IMU measurement residual: It is generated by the IMU data between adjacent frames, including the state propagation prediction and the residuals of the IMU pre-integration, namely $r_B$ in the optimization term. The optimization variables are the IMU state (pvq and bias). The calculation of $r_B$ is as follows:

$r_B = \begin{bmatrix} q^{b_i}_{w}\left( p^{w}_{b_j} - p^{w}_{b_i} - v^{w}_{b_i}\Delta t + \frac{1}{2} g^{w}\Delta t^{2} \right) - \alpha^{b_i}_{b_j} \\ q^{b_i}_{w}\left( v^{w}_{b_j} - v^{w}_{b_i} + g^{w}\Delta t \right) - \beta^{b_i}_{b_j} \\ 2\left[ \left( q^{w}_{b_i} \right)^{-1} \otimes q^{w}_{b_j} \otimes \left( \gamma^{b_i}_{b_j} \right)^{-1} \right]_{xyz} \\ b_{a_j} - b_{a_i} \\ b_{g_j} - b_{g_i} \end{bmatrix}$
3. Visual residual: The visual reprojection error of a feature point in the sliding window is the error between the observed value and the estimated value of the same landmark point in the normalized camera coordinate system, namely $r_C$ in the optimization term. The calculation formula is as follows:

$r_C = \begin{bmatrix} \dfrac{x^{c_j}_{l}}{z^{c_j}_{l}} - u^{c_j}_{l} \\ \dfrac{y^{c_j}_{l}}{z^{c_j}_{l}} - v^{c_j}_{l} \end{bmatrix}$

The feature quantity to be estimated is the three-dimensional space coordinate $(x, y, z)^{T}$ of the feature point, and the observed value $(u, v)^{T}$ is the coordinate of the feature point on the camera normalized plane. Since the observation depth of feature points may be large and is therefore difficult to optimize, inverse depth parameterization is adopted, which conforms better to a Gaussian distribution and reduces the number of optimization variables. The coordinate relation of a feature point in the normalized camera coordinate system is as follows:

$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \lambda \begin{bmatrix} x \\ y \\ z \end{bmatrix}$

where $\lambda = 1/z$ is called the inverse depth.

4. Jacobian matrix of the residual: The feature point observed in frame i, projected into the camera coordinate system of frame j, is

$f^{c_j}_{l} = q^{c}_{b}\left( q^{b_j}_{w}\left( q^{w}_{b_i}\left( q^{b}_{c}\,\frac{1}{\lambda_l}\begin{bmatrix} u^{c_i}_{l} \\ v^{c_i}_{l} \\ 1 \end{bmatrix} + p^{b}_{c} \right) + p^{w}_{b_i} - p^{w}_{b_j} \right) - p^{b}_{c} \right)$

and its three-dimensional coordinate form is $f^{c_j}_{l} = \left[ x^{c_j}_{l}, y^{c_j}_{l}, z^{c_j}_{l} \right]^{T}$. To solve the Jacobian matrix is to differentiate the visual residual with respect to the above state variables. The derivative of the visual residual with respect to the reprojected 3D point $f^{c_j}$ is as follows:

$\dfrac{\partial r_C}{\partial f^{c_j}} = \begin{bmatrix} \dfrac{1}{z^{c_j}} & 0 & -\dfrac{x^{c_j}}{\left(z^{c_j}\right)^{2}} \\ 0 & \dfrac{1}{z^{c_j}} & -\dfrac{y^{c_j}}{\left(z^{c_j}\right)^{2}} \end{bmatrix}$

The derivatives of the reprojected 3D point $f^{c_j}$ with respect to each optimization variable are obtained in the same way and combined with the above term through the chain rule.
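Two short numerical sketches may help make the above concrete. The first illustrates the Schur complement mentioned for the marginalization residual: given a linearized system H δx = b partitioned into the m variables to be removed and the remaining variables, the prior information $H_p$ and residual $r_p$ kept in the window are computed as follows (a generic illustration, not the exact Ceres Solver implementation).

```python
import numpy as np

def schur_marginalize(H, b, m):
    """Marginalize the first m variables of H dx = b via the Schur complement,
    returning the prior information matrix and residual kept in the window."""
    Hmm, Hmr = H[:m, :m], H[:m, m:]
    Hrm, Hrr = H[m:, :m], H[m:, m:]
    bm, br = b[:m], b[m:]
    Hmm_inv = np.linalg.inv(Hmm)          # in practice a robust (pseudo-)inverse is used
    H_p = Hrr - Hrm @ Hmm_inv @ Hmr       # prior information on the remaining states
    r_p = br - Hrm @ Hmm_inv @ bm         # prior residual on the remaining states
    return H_p, r_p
```

The second sketch evaluates the visual reprojection residual $r_C$ by chaining the inverse-depth back-projection in frame i through the body and world frames into frame j (rotations are scipy Rotation objects; the extrinsics $q^{b}_{c}$, $p^{b}_{c}$ and the frame poses are assumed to be given).

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def reprojection_residual(uv_i, inv_depth, uv_j,
                          q_wbi, p_wbi, q_wbj, p_wbj, q_bc, p_bc):
    """Reprojection error of one feature observed at uv_i in frame i and uv_j in
    frame j, parameterized by its inverse depth in frame i."""
    P_ci = np.array([uv_i[0], uv_i[1], 1.0]) / inv_depth       # back-project in camera i
    P_bi = q_bc.apply(P_ci) + p_bc                              # camera i -> body i
    P_w  = q_wbi.apply(P_bi) + p_wbi                            # body i  -> world
    P_bj = q_wbj.inv().apply(P_w - p_wbj)                       # world   -> body j
    P_cj = q_bc.inv().apply(P_bj - p_bc)                        # body j  -> camera j
    return P_cj[:2] / P_cj[2] - np.asarray(uv_j)                # normalized-plane error
```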

Relocation
Even though marginalization and sliding windows reduce the computational complexity, cumulative drift errors of the system are still inevitable; these errors accumulate in the global position (x, y, z) and the yaw angle (rotation around the direction of gravity). The purpose of introducing relocation is to move the local sliding window so as to align the current pose with a past pose [37]. The specific steps include loopback detection, feature matching between loopback candidate frames, and tightly coupled relocation, as shown in Figure 8.

1. First, the visual bag-of-words place recognition method (DBoW2) [38] is used for loop detection. During this period, only pose estimation is performed (blue part in Figure 8), and the past states (green part) are always recorded. After temporal and spatial consistency checks, the visual bag of words returns the loopback detection candidate frames, and the red dotted line in the figure represents the loopback association.

2. If loopback is detected for the latest frame, the multi-constraint relocation is initiated: the feature correspondences are matched using BRIEF descriptors [39], and the outliers are removed by a two-step geometric elimination method (2D-2D, 3D-2D) [40]. When the number of inliers exceeds a certain threshold, the candidate frame is regarded as a correct loopback detection and relocation is performed. Feature matching of the loopback candidate frames is shown in Figure 9. As shown in Figure 9, loopback is detected at frame 1334 and frame 1387, relocation is started, and feature matching with the loopback candidate frames is performed against frame 949 and frame 27, respectively. The matching effect is good, without feature matching failure. The specific calculation method is as follows:

$\min_{X}\left\{ \left\| r_p - H_p X \right\|^{2} + \sum_{k \in B} \left\| r_B \right\|^{2}_{P^{b_k}_{b_{k+1}}} + \sum_{(l,j) \in C} \rho\!\left( \left\| r_C \right\|^{2}_{P^{c_j}_{l}} \right) + \sum_{(l,v) \in L} \rho\!\left( \left\| r_C\!\left( \hat{z}^{v}_{l}, X, \hat{q}^{w}_{v}, \hat{p}^{w}_{v} \right) \right\|^{2}_{P^{c_v}_{l}} \right) \right\}$

The difference between the above formula and the previous tightly coupled nonlinear optimization model lies in the addition of the loopback term, that is, the reprojection constraints with respect to the loopback frame v, whose pose $\left( \hat{q}^{w}_{v}, \hat{p}^{w}_{v} \right)$ is obtained from the pose graph.

3. Finally, key frames are added to the pose graph, and the global 4-DOF (coordinates x, y, z and yaw angle) pose optimization of the past poses and the closed-loop image frames is performed by minimizing the cost function

$\min_{p, \psi} \left\{ \sum_{(i,j) \in S} \left\| r_{i,j} \right\|^{2} + \sum_{(i,j) \in L} \rho\!\left( \left\| r_{i,j} \right\|^{2} \right) \right\}$

where S is the set of all sequential edges (the relative transformations between two key frames in the local sliding window, directly obtained from VIO), L is the set of all loopback edges (the connections between the latest marginalized key frame i with a loop connection and the previous key frame j), and an additional Huber norm ρ(·) is applied to the loopback edges so that the impact of erroneous loops can be further reduced. $r_{i,j}$ represents the residual between i and j:

$r_{i,j} = \begin{bmatrix} R\!\left( \hat{\phi}_i, \hat{\theta}_i, \psi_i \right)^{-1}\left( p^{w}_{j} - p^{w}_{i} \right) - \hat{p}^{\,i}_{ij} \\ \psi_j - \psi_i - \hat{\psi}_{ij} \end{bmatrix}$

where $\hat{\phi}_i$ and $\hat{\theta}_i$ are the estimates of the roll angle and pitch angle obtained directly from VIO, respectively, $\psi$ denotes the yaw angle, and $\hat{p}^{\,i}_{ij}$ and $\hat{\psi}_{ij}$ are the relative position and yaw measurements of the edge.
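A short sketch of how this residual and the Huber norm can be evaluated is given below (illustrative only; angles are in radians, a ZYX Euler convention is assumed, and the measurement names rel_p_ij and rel_yaw_ij are hypothetical).

```python
import numpy as np

def rot_zyx(yaw, pitch, roll):
    """Rotation matrix from roll/pitch/yaw angles (ZYX convention, an assumed choice)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def edge_residual(p_i, yaw_i, p_j, yaw_j, roll_i, pitch_i, rel_p_ij, rel_yaw_ij):
    """4-DOF pose-graph residual r_{i,j}: relative translation expressed in frame i
    (roll/pitch fixed from VIO) and relative yaw."""
    R_wi = rot_zyx(yaw_i, pitch_i, roll_i)
    r_t = R_wi.T @ (p_j - p_i) - rel_p_ij        # translation part of the edge residual
    r_yaw = (yaw_j - yaw_i) - rel_yaw_ij          # yaw part of the edge residual
    return np.hstack([r_t, r_yaw])

def huber(r, delta=1.0):
    """Huber norm applied to loop edges to reduce the impact of wrong loops."""
    a = np.linalg.norm(r)
    return 0.5 * a ** 2 if a <= delta else delta * (a - 0.5 * delta)
```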

Algorithm Flow
The flow of the LiDAR vision inertial odometer (LVIO) algorithm based on tightly coupled optimization is shown in Figure 10. The program inputs are the image data obtained by the binocular camera, the IMU data, and the point cloud data obtained by the LiDAR. The specific algorithm flow is as follows:
1. The input image data are used for feature point tracking by the optical flow method; that is, the motion of the previous frame is used to estimate the initial positions of the feature points in the current frame. Forward and backward matching between the binocular images is performed by the LK optical flow method, and the error must be less than 0.5 pixel; otherwise, the point is regarded as an outlier. Then, the fundamental matrix (F) is calculated using the tracked points, and outliers are further removed using the epipolar constraint [41]. If the feature points are insufficient, new corner points are extracted to supplement them, and the tracking counts of the feature points are updated. Then the normalized camera coordinates of the feature points and their moving speed relative to the previous frame are calculated. Finally, the feature point data of the current frame (including the normalized camera plane coordinates, pixel coordinates, and normalized camera plane moving speed) are saved.

2. After the IMU data are input, IMU median integration is performed to calculate the current frame pose and update pvq. Specifically, the IMU data between the previous frame and the current frame are used; if the system has not been initialized, the IMU accelerations within this frame interval are averaged and aligned with gravity to obtain the initial IMU attitude. Then the pose of the previous frame is propagated through the IMU integration to obtain the pose of the current frame.

3. The LiDAR odometer is obtained by extracting point cloud features and matching them with the visual features. The feature graph is optimized in real time using a sliding window.

4. The estimation results of the laser odometer are used for initialization, and the depth information of the visual features is optimized using the LiDAR measurement results. The visual odometer is obtained by minimizing the visual residuals, IMU measurement residuals, and marginalization residuals, and the pose estimation is performed.

5. Key frame check: The criterion for confirming key frames is that there are many new feature points or that the parallax of the feature points between the latest two frames is large enough [42] (a minimal sketch of this check follows this list). After the keyframe check, the visual bag-of-words method is used for loop detection.

6. Global pose joint optimization: The whole state estimation task of the system is expressed as a maximum a posteriori probability estimation problem, and the global pose joint optimization is performed using the IMU pre-integration constraint, visual odometer constraint, LiDAR odometer constraint, and loopback detection constraint.

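As a minimal sketch of the key frame check in step 5 (the threshold values are made up for illustration), the average pixel parallax of the tracked features can be compared against a threshold:

```python
import numpy as np

MIN_PARALLAX_PX = 10.0      # average pixel parallax that triggers a new key frame (assumed value)
MIN_TRACKED     = 50        # if fewer features survive tracking, force a key frame (assumed value)

def is_keyframe(prev_pts, curr_pts):
    """Decide whether the current frame is a key frame from tracked feature pairs.
    prev_pts, curr_pts: (N, 2) pixel coordinates of the same features in the
    previous and the current frame."""
    if len(curr_pts) < MIN_TRACKED:                          # too few tracked points
        return True
    parallax = np.linalg.norm(curr_pts - prev_pts, axis=1).mean()
    return parallax >= MIN_PARALLAX_PX                       # large average parallax
```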

Experimental Platform and Environment
The experimental platform is a quadruped robot in our laboratory. The processor is an embedded controller (Jetson Xavier NX), and the software environment is the Ubuntu operating system with the Robot Operating System (ROS). The LiDAR used is a Velodyne 16-line three-dimensional LiDAR (VLP-16), and the camera used is an Intel RealSense depth camera (D455). They are shown in Figure 11.
The RealSense D455 camera provides a global-shutter image and an integrated IMU (BMI055). The experimental image and IMU data are time-synchronized. The image acquisition frequency is 30 Hz, the resolution is 848 × 480, and the IMU data acquisition frequency is 100 Hz.
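Since the camera (30 Hz) and the IMU (100 Hz) run at different rates, one simple way to associate them, shown in the sketch below (an illustrative approach, not necessarily the exact one used on the platform), is to linearly interpolate the IMU samples to each image timestamp.

```python
import numpy as np

def imu_at_image_times(imu_t, imu_data, img_t):
    """Linearly interpolate IMU measurements (e.g. 100 Hz) to image timestamps
    (e.g. 30 Hz).  imu_t: (N,), imu_data: (N, 6) [acc, gyro], img_t: (M,)."""
    cols = [np.interp(img_t, imu_t, imu_data[:, k]) for k in range(imu_data.shape[1])]
    return np.stack(cols, axis=1)                    # (M, 6) IMU values at image times

# usage sketch with made-up numbers: 1 s of data
# imu_t  = np.linspace(0.0, 1.0, 101);  img_t = np.linspace(0.0, 1.0, 31)
# imu_d  = np.random.randn(101, 6)
# synced = imu_at_image_times(imu_t, imu_d, img_t)
```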

The indoor and outdoor experimental environment is shown in Figure 12.

The Sensor Calibration
To obtain the internal parameters, such as the distortion parameters of the camera, the noise density and random walk parameters of the IMU, as well as the rotation and translation matrix from the binocular camera to the IMU, Kalibr (a visual-inertial calibration toolbox) is used to perform the joint calibration. The process is shown in Figure 13.

After calibration, the reprojection error of the binocular camera is shown in Figure 14. As shown in the figure, the reprojection errors of both the left and right cameras are within one pixel; therefore, the calibration effect is good.
The calibration of the external rotation parameters between the 3D LiDAR and the binocular camera is a key factor, as the calibration results directly affect the stability and positioning accuracy of LVIO. In this study, the LiDAR-camera joint calibration is performed using the calibration toolbox Autoware. After calibration, we obtained parameters such as "camera extrinsic mat", "camera mat", and "distcoeff", as well as the reprojection error. The calibration process is shown in Figure 15.

The Indoor Experiment
In the indoor environment, the proposed LVIO method is tested by marching the quadruped robot one lap around the hall. The total distance of the loop is 18 m. At the same time, LVI-SAM [43], the latest LiDAR vision inertial odometer calculation method, is used in a comparative experiment. The trajectories are shown in Figure 16. The blue line is the trajectory drawn by a wheeled odometer, the green line is drawn by our algorithm, and the red line is drawn by LVI-SAM. The trajectory error refers to the absolute error between the estimated trajectory and the output trajectory of the wheeled odometer, which is shown in Table 1. The above results may vary in different situations, depending on the operating environment and motion model of the quadruped robot.
In this study, the absolute positional error (APE) is used to measure the accuracy of the trajectory. The trajectory errors of the LVIO algorithm and the LVI-SAM algorithm are shown in Figure 17. The specific mean error, median error, root-mean-square error, and standard error, the errors of the trajectory in the x, y, and z directions, as well as the errors of the trajectory in the pitch, roll, and yaw angles, are shown in Figure 18.
Positioning accuracy is an important criterion for the operational effectiveness of the LVIO algorithm. To evaluate the measurement accuracy of the visual inertial odometry, the mean error and root-mean-square error are selected to measure the trajectory error. As shown in Table 1 and Figure 17, the average error between the estimated trajectory and the actual trajectory of the proposed method is 0.290409 m, and the root-mean-square error is 0.358962 m. The average error of the trajectory of the LVI-SAM algorithm is 0.306697 m, and the root-mean-square error is 0.523203 m. The proposed method therefore has higher localization accuracy. The significant error visible in the figure is due to the lack of texture on the indoor white wall, which resulted in deviations in feature matching, as shown in Figure 19. In the indoor environment, the light intensity is moderate and its variation is weak, so our algorithm operates as expected. Though there is some deviation in feature matching caused by the lack of wall texture, the algorithm still shows good robustness and accuracy.
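For reference, the absolute positional error statistics can be computed from time-aligned trajectories as in the short sketch below (a simplified illustration assuming the estimated and reference positions are already associated frame by frame; tools such as evo automate the association and alignment).

```python
import numpy as np

def ape_stats(est_xyz, ref_xyz):
    """Absolute positional error between an estimated and a reference trajectory,
    both given as (N, 3) arrays of associated positions."""
    err = np.linalg.norm(est_xyz - ref_xyz, axis=1)   # per-pose translation error
    return {"mean": err.mean(),
            "median": np.median(err),
            "rmse": np.sqrt((err ** 2).mean()),
            "std": err.std()}
```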

The Loop Detection Experiment
In an indoor environment, the quadruped robot travels several laps around a circular path with a certain radius. The operating effect of the algorithm is shown in Figure 20. Before loopback detection was added, the trajectory of the quadruped robot deviated from the established path due to the accumulated drift error of the system after several laps. However, after loopback detection is added and the loopback candidate frame matching is performed, the accumulated drift error can always be eliminated in time after track drift occurs, so as to realize relocation and form a closed loop.


The Outdoor Experiments
To verify the positioning effect of our algorithm when applied to the quadruped robot, relevant experiments need to be performed in an environment with high outdoor lighting intensity and unstable lighting conditions. In the outdoor test, the surroundings of a building are used as the experimental scene, and the quadruped robot marches around it. Using ROS and Leaflet (an open-source interactive map library for mobile devices), the trajectory output by our algorithm is displayed on an offline map through a series of coordinate transformations. The total distance of the loop is 550 m. The effect is shown in Figure 21. In the experiment, in addition to the high light intensity and unstable light conditions, there is also interference from the dynamic environment (e.g., pedestrians and vehicles passing). Nevertheless, our visual inertial odometer algorithm shows good accuracy and robustness.

Discussion
Since the trajectory of a quadruped robot is not confined to a plane, and the robot must maintain its balance during rigorous motion, the visual inertial odometer must be robust and stable. The LiDAR vision inertial odometer calculation used in this experiment performs well on the quadruped robot platform. No feature point loss or pose calculation failure occurred in either the static indoor environment or the dynamic outdoor environment. After adding loopback detection and relocation, the track forms a closed loop, which illustrates the improved positioning effect. Although the experiments show that the proposed method works well both indoors and outdoors, we should clearly recognize that there are also some fluctuations in the trajectories of LVIO due to the limitations of sensor accuracy, degraded environments (e.g., no texture, strong lighting changes), and dynamic environments. How to overcome such undesirable phenomena by improving the accuracy of the sensors and increasing the constraints on degraded and dynamic environments will therefore be a topic of our future work.

Conclusions
In this study, based on nonlinear optimization, the point cloud data obtained by LiDAR, the image characteristics of binocular vision, and the IMU inertial data are effectively integrated, leading to the LVIO algorithm. A loopback detection and repositioning method is also used to eliminate accumulated errors and improve the positioning accuracy and stability. Finally, the average error of indoor positioning is 0.290409 m; in comparison, the positioning error of LVI-SAM [43], the latest LiDAR vision inertial odometer calculation method, is 0.306697 m. The experimental results demonstrate the feasibility and effectiveness of the proposed method. The next step is to study the cumulative drift error of the IMU and to further improve the real-time performance as well as the positioning accuracy over large ranges in our algorithm. We plan to use state-of-the-art MEMS inertial sensors (e.g., Xsens) in future research, and we will improve the feature extraction algorithm, either by adding a quadtree or by fusing point-line features, to ensure the uniformity and robustness of feature extraction. We will also integrate the absolute positioning information of GPS and BeiDou into the system to compensate for the cumulative errors of relative positioning in large scenes.