INS/LIDAR/Stereo SLAM Integration for Precision Navigation in GNSS-Denied Environments

Traditionally, navigation systems have relied solely on global navigation satellite system (GNSS)/inertial navigation system (INS) integration. When a temporary loss of GNSS signal lock is encountered, these systems rely on the INS, which can sustain short outages but drifts significantly during prolonged ones. In this study, an extended Kalman filter (EKF) is employed to develop an integrated INS/LiDAR/Stereo simultaneous localization and mapping (SLAM) navigation system. The first update stage of the filter integrates the INS with the LiDAR, after which the resultant navigation solution is integrated with the stereo SLAM solution, yielding the final integrated navigation solution. The system was tested for different driving scenarios in urban and rural environments using the raw Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset in the complete absence of the GNSS signal. In addition, the selected KITTI drives covered low and high driving speeds in feature-rich and feature-poor environments. It is shown that the proposed INS/LiDAR/Stereo SLAM navigation system yielded better position estimations in comparison to using the INS without any assistance from onboard sensors. The accuracy improvement was expressed as a reduction of the root-mean-square error (RMSE) by 83% and 82% in the horizontal and up directions, respectively. In addition, the proposed system outperformed the positioning accuracy of some state-of-the-art algorithms.


Introduction
Precise positioning is one of the most significant obstacles to overcome in achieving accurate vehicular navigation. An effective navigation system should be capable of functioning effectively in any driving scenario, such as in urban or rural areas with high or low traffic, and in varying weather and lighting conditions. Moreover, incorporating redundant sensors is a vital consideration when creating a navigation system, as redundancy is what makes it robust. For a navigation system to function safely and accurately, multiple onboard sensors are necessary, ensuring that if one sensor fails, for any reason, the other sensors can compensate for the malfunction and the system continues to operate safely and effectively [1][2][3].
The integration of the global navigation satellite system (GNSS) with the inertial navigation system (INS) has been extensively researched and implemented in numerous studies [4][5][6][7][8][9][10][11]. Typically, the GNSS and the inertial measurement unit (IMU) measurements are merged using a Kalman filter [12], which can employ various integration methods between the GNSS and the IMU, such as loosely coupled and tightly coupled integrations. Loosely coupled integration of GNSS/INS involves combining independent outputs of both systems to enhance navigation accuracy and reliability, while tightly coupled integration integrates GNSS measurements directly into INS computations [13,14]. However, relying solely on INS in the event of prolonged GNSS signal outages can lead to degraded positioning accuracy, especially when utilizing a low-cost micro-electro-mechanical system (MEMS) IMU. As a result, it is crucial to mount more onboard sensors, such as camera(s) or LiDAR sensors.
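As a minimal illustration of the loosely coupled scheme described above, the sketch below applies a standard Kalman measurement update in which an independently computed GNSS position fix corrects an INS-predicted error state. The state dimensions, covariances, and residual values are illustrative assumptions, not parameters from any cited system.

```python
import numpy as np

def kf_update(x, P, z, H, R):
    """Standard Kalman measurement update (loosely coupled scheme):
    z is a position fix produced independently by the GNSS receiver,
    compared against the INS-predicted measurement H @ x."""
    y = z - H @ x                      # innovation (GNSS minus INS)
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# Illustrative 6-state [position; velocity] error filter
x = np.zeros(6)                                 # predicted error state
P = np.eye(6) * 4.0                             # predicted covariance
H = np.hstack([np.eye(3), np.zeros((3, 3))])    # GNSS observes position only
R = np.eye(3) * 2.5                             # GNSS position noise (m^2)
z = np.array([1.2, -0.8, 0.3])                  # GNSS-minus-INS position residual

x, P = kf_update(x, P, z, H, R)
```

In a tightly coupled variant, z would instead hold pseudorange residuals and H would map the error state into the range domain; the update equations themselves are unchanged.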
Using a camera or a LiDAR sensor for positioning relies on simultaneous localization and mapping (SLAM) algorithms. SLAM can be formally defined as the ability to map the surroundings while simultaneously tracking the sensor's location [15]. Many visual SLAM (VSLAM) algorithms have been proposed, as collected in [16]. Visual SLAM employs visual markers or landmarks to establish a map of the surroundings and determine the sensor's position. Typically, visual SLAM utilizes a camera as the primary sensor and analyzes images to identify environmental features. By examining the motion of these features between successive images, visual SLAM can approximate the sensor's location while generating a map of the area. Visual SLAM can be executed using various types of cameras, including monocular cameras [17][18][19][20][21], which retrieve the trajectory up to a scale factor, and stereo cameras [22][23][24], which resolve the scale factor. A key benefit of visual SLAM is that it captures considerably more environmental detail than LiDAR. However, contrary to LiDAR SLAM, visual SLAM is considerably impacted by variations in illumination. It is worth mentioning that in some lighting conditions (i.e., bright sunlight), the LiDAR sensor's performance can also be negatively affected as a result of interference or backscattering from highly reflective targets [25].
Similar to the developed visual SLAM algorithms, a large number of studies have proposed LiDAR SLAM algorithms, the leading of which is LiDAR odometry and mapping (LOAM) [26], which was tested on the KITTI odometry benchmark [27]. As a result, numerous variations of LOAM, including A-LOAM and Kitware [28][29][30][31], have been created, enhancing the computational efficiency of the original LOAM algorithm. A major advantage of LiDAR SLAM is that it is not impacted by lighting conditions because the LiDAR scanner is an active sensor.
Considering the advantages of both LiDAR sensors and cameras, the most ideal scenario would be to integrate both sensors (i.e., the camera and LiDAR) to obtain rich environmental information while reducing any negative effects of illumination variations.
Several studies have incorporated an INS with either a LiDAR sensor or a camera. In our previous work in [32], we introduced a loosely-coupled integrated navigation system, which aimed to combine INS and LiDAR SLAM using an EKF. The efficacy of this novel navigation approach was evaluated across a range of driving scenarios and environments, utilizing the raw residential and highway datasets from the KITTI dataset. The study comprised three distinct case studies. The first case study considered residential datasets from the KITTI collection, encompassing a total driving duration of 48 min during a complete GNSS signal outage. The second case study delved into highway datasets spanning a total duration of 7 min, encompassing all available highway datasets within the raw KITTI data. Paralleling the first case study, the LiDAR SLAM system showcased better positional results in extended datasets. The third case study tackled scenarios involving intermittent GNSS signal outages, simulating situations like urban canyons and tunnels on highways. The results demonstrated significant enhancements when compared to the same datasets in the first case study, which dealt with complete GNSS signal loss. Across all driving scenarios, the integrated INS/LiDAR SLAM navigation system showcased substantial improvements in positional accuracy for residential datasets, achieving an average RMSE reduction of 88% and 32% in the horizontal and vertical components, respectively. The improvements for highway datasets were approximately 70% and 0.2% for the horizontal and vertical components, respectively. Comparisons were drawn between the proposed system and three state-of-the-art LiDAR SLAM algorithms. While the system slightly outperformed its counterparts for residential datasets, it significantly outshone other SLAM algorithms in highway environments. 
This comprehensive investigation underscores the potential of the proposed integrated navigation system across various driving scenarios and its notable contributions to enhancing navigation accuracy and reliability.
Similarly, another study [33] integrated INS and LiDAR SLAM in an unmanned aerial vehicle (UAV) mapping system to address a malfunctioning GNSS, and the developed GNSS/INS/LiDAR SLAM system was able to overcome GNSS outages. In another study [34], an EKF was used to fuse data from 3D-RISS, GNSS, and LiDAR odometry (LO) to improve positioning accuracy, yielding a 64% decrease in positioning errors compared to using the INS alone. In [35], a navigation model was proposed that integrated a monocular camera, an IMU, and GNSS to navigate ground vehicles in GNSS environments susceptible to signal outages. The system yielded up to a 74% reduction in position error compared to using GNSS data alone.
In this research, we extend our work in [32] by proposing an integration between the INS, LiDAR SLAM, and stereo visual SLAM, which leverages the often overlooked capabilities of the INS as a navigation sensor that can produce reliable information. Additionally, the developed navigation model stands out in terms of achieving redundancy, which is crucial for safe and efficient navigation. The integration is executed using a series of EKFs in a loosely coupled integration scheme. To comprehensively validate the performance of our advanced navigation model, we conducted extensive testing across various drives utilizing raw data from the KITTI datasets. These drives encompassed a wide spectrum of driving scenarios, ranging from intricate urban environments to sprawling rural landscapes, and included variations in driving speeds. Finally, our integrated system is weighed against some state-of-the-art models.

LiDAR SLAM
The Kitware SLAM [28], an updated version of the LOAM algorithm [26], is the LiDAR SLAM employed in this study. In our work in [32], a thorough explanation of Kitware SLAM and the various improvements it offers over LOAM is presented.

Stereo Visual SLAM
ORB-SLAM [21,24,36] is considered a leading state-of-the-art visual SLAM algorithm that can be implemented for both monocular and stereo cameras. Therefore, ORB-SLAM3 [36,37] was adopted in this research to process the stereo images of the KITTI data.
The pipeline of the stereo SLAM system consists of feature detection, feature matching, pose estimation, point cloud construction, and bundle adjustment. ORB features are detected and matched in the left and right stereo images, with each feature's orientation estimated using the intensity centroid method. The matched points are utilized to estimate the epipolar geometry, and the relative camera pose between frames is then computed by solving the five-point algorithm with RANSAC to exclude outliers. Using a robust optimizer, referred to as the robust information matrix optimization (RIMO) algorithm, which takes into account the uncertainty of the feature measurements, the camera pose estimate is further refined. The environment's 3D point cloud is generated by triangulating the matched feature points in the left and right camera images using the relative pose estimates. The point cloud is then optimized via bundle adjustment to minimize the reprojection error of the 3D points onto the stereo images; the optimization is carried out using the Levenberg-Marquardt algorithm with the Schur complement factorization and includes robust loss functions to manage outliers and noise. Loop closures are identified using a bag-of-words approach, comparing the current image to a database of prior images: images are represented as histograms of visual words, and the current image is matched to the database entry with the highest similarity. The loop closure is validated using geometric constraints, after which the map and camera configurations are optimized using bundle adjustment.
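The triangulation step of the pipeline above can be sketched with a simplified rectified-stereo model; this is not the ORB-SLAM3 implementation, and the focal length, principal point, and baseline below are illustrative, roughly KITTI-like values.

```python
import numpy as np

def stereo_triangulate(uv_left, uv_right, f, cx, cy, B):
    """Triangulate matched features from a rectified stereo pair.
    The disparity d = u_left - u_right fixes the metric scale that a
    monocular system cannot recover: Z = f * B / d."""
    uv_left = np.asarray(uv_left, float)
    uv_right = np.asarray(uv_right, float)
    d = uv_left[:, 0] - uv_right[:, 0]          # disparity (pixels)
    Z = f * B / d                               # depth (m)
    X = (uv_left[:, 0] - cx) * Z / f            # back-project through pinhole model
    Y = (uv_left[:, 1] - cy) * Z / f
    return np.column_stack([X, Y, Z])           # 3D points in the left-camera frame

# One illustrative correspondence (f ~ 718 px, B ~ 0.54 m, KITTI-like)
pts = stereo_triangulate(uv_left=[[700.0, 200.0]],
                         uv_right=[[680.0, 200.0]],
                         f=718.0, cx=607.0, cy=185.0, B=0.54)
```

This is why the text notes that stereo cameras "resolve the scale factor": the known baseline B anchors the depth in metres.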
Note that all poses in LiDAR SLAM are referenced to the first LiDAR frame, while those in visual SLAM are referenced to the first camera frame. Therefore, all local coordinates must be transformed from their local frames to the WGS84 ellipsoid, which is necessary for the subsequent Kalman filtering. Figure 1 illustrates this transformation process graphically.

INS/LiDAR/Stereo SLAM Integration
The measurements of the IMU, the LiDAR, and the stereo camera are fused by the proposed integrated navigation system through a series of EKFs, as illustrated by the block diagram in Figure 2. Firstly, the INS solution is fused with the LiDAR SLAM pose estimations through the first EKF, forming the navigation solution of the INS/LiDAR SLAM integration. The latter is then fused with the stereo SLAM solution in the second EKF to form the final integrated INS/LiDAR/Stereo navigation system. Finally, the updated errors are fed back into the IMU's full mechanization, which creates a closed-loop error scheme.
Pose transformations are executed using 4 × 4 homogeneous transformation matrices. Let P_i^cam and P_i^Li denote the coordinates of a point expressed in the camera frame and the LiDAR frame, respectively, and let P_i^ecef denote the same point expressed in the WGS84 reference frame. Equations (1)-(4) represent the sequence of homogeneous transformations.
where (R/t)_origin^destination symbolizes the 4 × 4 homogeneous transformation matrix from the origin frame to the destination frame. That is, (R/t)_cam^Li is the homogeneous transformation from the camera frame to the LiDAR frame, (R/t)_Li^b denotes the homogeneous transformation from the LiDAR frame to the body frame, (R/t)_b^l is the transformation from the body frame to the local-level frame, and (R/t)_l^ecef is the transformation from the local-level frame to the WGS84 reference frame.
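A minimal numeric sketch of the transformation chain in Equations (1)-(4) follows; the rotations are set to identity and the translations are illustrative placeholders standing in for the real calibration and INS-derived values.

```python
import numpy as np

def homogeneous(R, t):
    """Build a 4x4 homogeneous transform (R/t) from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative (identity-rotation) transforms for each step of the chain;
# real values come from the sensor calibration and the INS attitude/position.
T_cam_to_li  = homogeneous(np.eye(3), np.array([0.27, 0.0, -0.08]))      # camera -> LiDAR
T_li_to_body = homogeneous(np.eye(3), np.array([-0.81, 0.32, 0.80]))     # LiDAR  -> body
T_body_to_ll = homogeneous(np.eye(3), np.zeros(3))                       # body   -> local-level
T_ll_to_ecef = homogeneous(np.eye(3), np.array([4.0e6, 6.0e5, 4.8e6]))   # local-level -> ECEF

p_cam = np.array([1.0, 2.0, 3.0, 1.0])          # point in camera frame (homogeneous)
p_ecef = T_ll_to_ecef @ T_body_to_ll @ T_li_to_body @ T_cam_to_li @ p_cam
```

Composing the matrices right-to-left mirrors the sequence of Equations (1)-(4): camera frame to LiDAR frame, to body frame, to local-level frame, and finally to the WGS84 (ECEF) frame.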

The EKF mathematical and stochastic models (i.e., the system model, the design matrix (H), and the system dynamic matrix (F)) are the same as in our work in [32]. The state vector is given by Equation (5). The first and second update vectors are given by Equations (6) and (7), respectively.
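The cascaded two-stage update and the closed-loop feedback can be sketched as follows. The state ordering follows Equation (5), but the measurement matrices, noise covariances, and innovation values are illustrative assumptions, not the paper's tuned models.

```python
import numpy as np

def ekf_update(x, P, z, H, R):
    """One EKF measurement update on the 15-element error state."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

n = 15                                   # [dr(3), dv(3), de(3), db_a(3), db_g(3)]
x, P = np.zeros(n), np.eye(n)

# Both update stages observe position (states 0-2) and attitude (states 6-8).
H = np.zeros((6, n))
H[:3, 0:3] = np.eye(3)
H[3:, 6:9] = np.eye(3)

z1 = np.array([0.5, -0.2, 0.1, 0.01, 0.0, -0.02])   # INS minus LiDAR SLAM
z2 = np.array([0.4, -0.1, 0.1, 0.00, 0.0, -0.01])   # stage-1 solution minus stereo SLAM
x, P = ekf_update(x, P, z1, H, np.eye(6) * 0.5)      # first EKF: INS/LiDAR
x, P = ekf_update(x, P, z2, H, np.eye(6) * 0.2)      # second EKF: + stereo

# Closed-loop feedback: estimated errors are removed from the mechanization
nav_state = np.zeros(n)
nav_state -= x                # correct the INS output with the estimated errors
x[:] = 0.0                    # reset the error state after feedback
```

The second update sees a smaller measurement noise here, so the fused solution is pulled toward the (more accurate) stereo observation, mirroring the behaviour reported in the case studies.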

where δr = [δφ δλ δh]^T is the position error vector; δv = [δv_e δv_n δv_u]^T is the velocity error vector; δε = [δp δr δy]^T is the attitude error vector; δb_a = [δb_ax δb_ay δb_az]^T is the accelerometer bias error vector; δb_g = [δb_gx δb_gy δb_gz]^T is the gyroscope bias error vector; δZ_k1 and δZ_k2 are the measurement update vectors; δr_1 and δr_2 are the position error vectors in the first and second update stages, respectively (e.g., the latitude error in the first stage: φ_INS−Li = φ_INS − φ_Li); and δε_1 and δε_2 are the attitude error vectors in the two update stages.

Data Acquisition and Description
In this research, the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset [38] is adopted as the data source, which is available through [39].
The KITTI dataset stands out as a valuable resource for researchers internationally and they can publish their results on the KITTI odometry benchmark [27] to enable fair comparisons between different algorithms and methods. It is renowned for its diverse sensor suite, which enables capturing data from various perspectives. The platform's sensor configuration includes an integrated GNSS/IMU unit (OXTS RT3003), a 360-degree rotating mechanical LiDAR with 64 beams (Velodyne HDL-64E), and two Sony stereo pairs that collect both color and grayscale images. These sensors work in harmony to provide a rich and comprehensive view of the environment surrounding the data collection vehicle.
The KITTI dataset was gathered in Germany, specifically in the town of Karlsruhe. The choice of location offers an advantage for researchers, as it presents a wide range of real-world driving scenarios and environmental conditions. The dataset encompasses urban, rural, and highway scenes, making it applicable to various driving applications worldwide.
The KITTI dataset comprises several categories, namely raw data, object detection and tracking, road segmentation, and odometry. Each category serves a specific purpose for vehicular navigation and autonomous driving research. The raw data consist of raw sensor measurements, including images, point clouds, and sensor calibration parameters. Researchers can utilize these data to develop and test algorithms for tasks such as object detection, semantic segmentation, and depth estimation. It is worth noting that the raw datasets are available in both unsynchronized and time-synchronized forms, where the latter is widely utilized. The object detection and tracking dataset focuses on providing annotated 2D and 3D bounding-box information for objects of interest, such as cars, pedestrians, and cyclists, allowing researchers to train and evaluate object detection and tracking algorithms. The third category, road segmentation, is designed to tackle road-scene understanding. It provides pixel-level annotations for road regions, enabling the development of algorithms for road segmentation, a crucial task for autonomous navigation. Finally, the odometry dataset offers precise ego-motion estimates, allowing researchers to evaluate and refine their localization and mapping algorithms. It consists of synchronized camera-LiDAR sequences with accurate ground-truth pose information. However, it does not contain raw IMU measurements (i.e., raw accelerations and angular rates). Therefore, to take advantage of the IMU's raw measurements, which are essential to the proposed navigation model, the raw dataset is the one considered in this study.
The structure of the KITTI raw dataset is a well-defined, consistent structure across all drives. Each dataset is organized into sequences, where each sequence represents a continuous driving segment. For instance, a sequence might cover a specific urban area or a highway stretch. Each sequence contains multiple data modalities, such as grayscale and color images, Velodyne point clouds, and GPS/IMU measurements. Additionally, the dataset includes calibration files for sensor alignment and synchronization purposes.
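Given that layout, the per-frame file paths of a raw KITTI drive can be assembled as in the following sketch. The directory names follow the published raw-data layout; the base path and drive number are hypothetical examples.

```python
from pathlib import Path

def kitti_raw_paths(base, date, drive, frame):
    """Assemble the per-frame file paths of a synchronized raw KITTI drive.
    Sequences live under <base>/<date>/<date>_drive_<NNNN>_sync, with one
    zero-padded file per frame in each modality subfolder."""
    seq = Path(base) / date / f"{date}_drive_{drive:04d}_sync"
    return {
        "image_02": seq / "image_02" / "data" / f"{frame:010d}.png",   # left color camera
        "image_03": seq / "image_03" / "data" / f"{frame:010d}.png",   # right color camera
        "velodyne": seq / "velodyne_points" / "data" / f"{frame:010d}.bin",
        "oxts":     seq / "oxts" / "data" / f"{frame:010d}.txt",       # GPS/IMU record
    }

# Hypothetical base path; date/drive follow the KITTI naming convention
paths = kitti_raw_paths("/data/kitti", "2011_09_26", 34, 0)
```

The calibration files (e.g., camera-to-camera and Velodyne-to-camera) sit one level up, in the date folder, and apply to all drives recorded on that date.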
In this research, both residential and highway datasets are considered, forming two case studies. The drive number of each of the raw KITTI datasets is presented in Table 1. In addition, the table presents the length, the elapsed drive time, the average speed, and the number of frames of each dataset.

Residential Datasets-Case Study 1
The first case study analyzes seven drives of the raw KITTI dataset, in which the driving environment is in urban areas at low speeds with high-density features. All seven datasets add up to approximately 17 kilometres of driving distance for a total driving time of around 34 min. All drives were used to test the performance of the developed integrated navigation system in residential driving environments. The first residential drive to consider was D-34, which is a 920-m drive segment averaging a speed of around 26 km/h. Figure 3 presents the position error in the ENU frame (navigation frame) for four navigation solutions, namely the INS, the LiDAR SLAM, the stereo SLAM, and the final integrated navigation solution. These errors are calculated in reference to the reference trajectory provided by the integrated GNSS/INS unit (OXTS). The errors of the attitude angles (roll, pitch, and yaw) are shown in Figure 4.
From Figure 3, it is observable that the INS solution drifts significantly shortly after the vehicle starts the drive, which is typical of the sole use of an IMU. Both the LiDAR and stereo SLAM yield significantly more accurate results than the INS, with the stereo SLAM being the most accurate. Consequently, the final navigation solution followed that of the stereo SLAM for positioning. On the contrary, in Figure 4, the INS produces more accurate and stable estimations of the attitude angles than the LiDAR and stereo SLAM estimations. This is a result of the high-quality IMU utilized by the KITTI data collection platform.
In addition, this behaviour is consistent with our previous findings in [32]. Consequently, the integrated system was tuned to match the INS's attitude estimation. In Table 2, the error statistics of the INS/LiDAR/Stereo navigation solution are displayed, while Figure 5 compares all four trajectories to the ground truth. The remaining datasets in Table 1 were subjected to the same analysis as performed for D-34 and yielded the same trend for the position and attitude estimations. Figures A1-A10 in Appendix A provide detailed position and attitude results for each of the remaining drives of the KITTI dataset. Figure 6 compares the INS, LiDAR SLAM, stereo SLAM, and the INS/LiDAR/Stereo integrated navigation solution to the ground-truth trajectory for each drive, as displayed in the navigation frame.
Figure 6. Comparison of trajectories of the raw residential KITTI drives in the ENU frame.
Finally, Figure 7 showcases the RMSE reductions of the INS position estimations for the horizontal and up directions for all drives of the KITTI dataset. It is worth mentioning that the average reduction of INS RMSE for all residential drives was approximately 99% and 86% in the horizontal and up directions, respectively.
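The horizontal RMSE and its percent reduction reported above can be computed as in this short sketch; the trajectories here are toy values, not KITTI results.

```python
import numpy as np

def horizontal_rmse(est_en, ref_en):
    """RMSE of the horizontal (east-north) position error against a reference."""
    err = np.asarray(est_en, float) - np.asarray(ref_en, float)
    return float(np.sqrt(np.mean(np.sum(err**2, axis=1))))

def rmse_reduction(rmse_ins, rmse_integrated):
    """Percent reduction of the baseline (INS-only) RMSE."""
    return 100.0 * (rmse_ins - rmse_integrated) / rmse_ins

ref = [[0.0, 0.0], [0.0, 0.0]]            # ground-truth east-north positions
ins = [[3.0, 4.0], [6.0, 8.0]]            # INS-only solution (large drift)
fused = [[0.3, 0.4], [0.6, 0.8]]          # integrated solution

r_ins = horizontal_rmse(ins, ref)
r_fused = horizontal_rmse(fused, ref)
reduction = rmse_reduction(r_ins, r_fused)   # 90% in this toy example
```

The same two functions applied to the up component give the vertical figures quoted for each case study.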

Highway Datasets-Case Study 2
The second case study involves testing the performance of the developed integrated navigation system using raw highway KITTI datasets. Four highway datasets were considered, which totalled a driving distance and time of roughly 5 kilometres and 5 min, respectively. Detailed results are provided for D-42, which is a 2591-m driving segment with a driving time of 121.19 s. Figures 8 and 9 present the position errors in the ENU frame and attitude errors (roll, pitch, and yaw), respectively. The statistics of these errors are presented in Table 3.

It is noticeable from Figures 8 and 9 that the integrated navigation system persists in the same behaviour shown in the first case study. That is, the system follows the position estimations of the more accurate sensor (stereo SLAM) while adopting the most accurate attitude estimates, those of the INS.
Having conducted the same analysis for the remaining highway datasets, Figures A11-A16 in Appendix A provide a detailed comparison of the positional and attitude errors of the datasets. In addition, Figure 10 presents a graphical comparison between the trajectories of the INS, LiDAR SLAM, stereo SLAM, the integrated navigation system, and the ground-truth trajectory in the ENU reference frame.
Figure 11 presents the RMSE reductions of the INS position estimations in the horizontal and up directions for the highway drives considered in the second case study. It is important to note that, on average, the INS RMSE was reduced by approximately 50% in the horizontal direction and 73% in the up direction in highway drives.

Comparison to State-of-the-Art Models
It is worth mentioning that even though the developed INS/LiDAR/Stereo navigation system follows the position of the stereo SLAM and the attitude of the INS, the LiDAR plays a vital role as a redundant onboard sensor. If an unexpected malfunction occurs in the stereo camera, the navigation system will continue to operate, degrading gracefully to the accuracy of the LiDAR SLAM, which prevents a full navigation system failure.
The INS/LiDAR/Stereo navigation system yielded superior performance to the system we developed in [32], as illustrated in Table 4. The table compares both navigation models using a sample of the KITTI dataset. The accuracy improvement in the positional RMSE is around 50% and 64% for the horizontal and up directions, respectively. In addition, the INS/LiDAR navigation model was compared in [32] to other state-of-the-art SLAM algorithms, which establishes the accuracy of our INS/LiDAR/Stereo system relative to both our previous work and other state-of-the-art SLAM algorithms [29,30,41].

Conclusions
In this research, an integrated INS/LiDAR/Stereo navigation system was developed and tested in various driving scenarios using the raw KITTI dataset. The datasets were split into two case studies, namely residential and highway datasets. The first case study covered residential driving scenarios at relatively low speeds with dense features, while the second covered highway driving at higher speeds with fewer features. In both case studies, the developed integrated navigation system followed the position estimations of the stereo SLAM and the attitude estimations of the INS. In addition, the integrated system led to an overall reduction of the INS RMSE across all case studies of 83% and 82% in the horizontal and up directions, respectively. Finally, the developed navigation model was superior to our previously developed INS/LiDAR navigation model by an average accuracy improvement of approximately 50% and 64% for the horizontal and up directions, respectively.
Funding: This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), Toronto Metropolitan University, and the Government of Ontario.