Deep Learning-Aided Inertial/Visual/LiDAR Integration for GNSS-Challenging Environments

This research develops an integrated navigation system that fuses the measurements of an inertial measurement unit (IMU), a LiDAR sensor, and a monocular camera using an extended Kalman filter (EKF) to provide accurate positioning during prolonged GNSS signal outages. The system features an integrated INS/monocular visual simultaneous localization and mapping (SLAM) navigation system that leverages LiDAR depth measurements to correct the scale ambiguity inherent in monocular visual odometry. The proposed system was tested using two datasets, namely, KITTI and Leddar PixSet, which cover a wide range of driving environments. The system yielded an average reduction in the root-mean-square error (RMSE) of about 80% and 92% in the horizontal and upward directions, respectively. The proposed system was also compared with an INS/monocular visual SLAM/LiDAR SLAM integration and with several state-of-the-art SLAM algorithms.


Introduction
Accurate positioning is one of the key challenges in the field of vehicular navigation. A robust navigation system should work under all driving environments (i.e., urban areas, rural roads, and high- or low-traffic conditions) and conditions (i.e., dry, wet, day, or night). In addition, the concept of sensor redundancy is crucial when designing a navigation system. A navigation system should be equipped with several integrated onboard sensors so that if one sensor malfunctions for any reason, the other sensors allow the system to continue operating safely and properly [1][2][3]. Global navigation satellite system (GNSS)/inertial navigation system (INS) integration has been well studied and adopted in a large number of studies [4][5][6][7][8][9][10][11]. Typically, the measurements of the GNSS and the inertial measurement unit (IMU) are fused using a Kalman filter [12] through one of several integration schemes [13] (i.e., loosely coupled or tightly coupled). In a loosely-coupled integration, for instance, the INS estimates the pose of the vehicle via IMU mechanization while receiving position updates from the GNSS at a lower rate, which bounds the significant drift of the IMU. The main problem with such navigation systems is that they rely solely on the INS during prolonged GNSS signal outages. This makes the whole system vulnerable to degraded positioning accuracy as a result of INS solution drift, especially when using a low-cost micro-electro-mechanical system (MEMS) IMU. Consequently, additional onboard sensors are crucial to help maintain a navigation system's accuracy during a GNSS signal outage.
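To make the loosely-coupled scheme concrete, the following is a minimal 1-D sketch, not the paper's implementation: a constant-velocity Kalman filter propagates position and velocity from accelerometer samples at a high rate, while lower-rate GNSS position fixes bound the accumulated drift. All numerical values (noise variances, rates) are illustrative assumptions.

```python
import numpy as np

def lc_fusion(imu_acc, gnss_pos, dt=0.01, gnss_every=100):
    """Minimal 1-D loosely-coupled INS/GNSS Kalman filter sketch.

    The INS propagates position/velocity from accelerations at a high
    rate; GNSS position fixes arrive every `gnss_every` IMU epochs and
    bound the accumulated drift.
    """
    x = np.zeros(2)                            # state: [position, velocity]
    P = np.eye(2)                              # state covariance
    F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity transition
    Q = np.diag([1e-3, 1e-2])                  # process noise (illustrative)
    H = np.array([[1.0, 0.0]])                 # GNSS observes position only
    R = np.array([[1.0]])                      # GNSS position variance (m^2)
    track = []
    for k, a in enumerate(imu_acc):
        # prediction: simple dead reckoning stands in for IMU mechanization
        x = F @ x + np.array([0.5 * a * dt**2, a * dt])
        P = F @ P @ F.T + Q
        if k % gnss_every == 0 and k < len(gnss_pos) * gnss_every:
            # update: GNSS position fix corrects the drifting INS state
            z = gnss_pos[k // gnss_every]
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
            x = x + (K * (z - H @ x)).ravel()
            P = (np.eye(2) - K @ H) @ P
        track.append(x[0])
    return np.array(track)
```

With a constant accelerometer bias, the unaided prediction drifts quadratically, while the periodic GNSS updates keep the fused position bounded.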
Light detection and ranging (LiDAR) sensors and cameras have been heavily investigated for both indoor and outdoor navigation using LiDAR and visual simultaneous localization and mapping (SLAM) techniques. SLAM algorithms are designed to create a map of the surrounding environment while concurrently tracking the location of the sensor, which may be a LiDAR sensor or a camera [14]. One recent LiDAR odometry (LO) method estimates motion by minimizing the difference between the predicted feature points and the corresponding feature points in the current frame. The method was evaluated using the KITTI dataset and compared with several state-of-the-art LO methods. The experimental results showed that it achieves higher accuracy and robustness, particularly in challenging scenarios such as fast motion and occlusion. Moreover, it has a lower computational cost than some existing methods, making it more practical for real-time applications. A review of the use of deep learning in localization is presented in [39].
In this research, we propose an INS/monocular visual SLAM integrated system, where a loosely-coupled integration is implemented between the INS solution and the LIMO algorithm (monocular visual SLAM) using an EKF, with the aid of deep learning for imagery semantic segmentation. The developed model is then tested using the raw KITTI dataset in both urban and residential driving environments. It is worth mentioning that the KITTI dataset was considered in our research as a source of low to moderate traffic environment. Therefore, to expand the testing of our model in high-traffic environments, we used the Leddar PixSet dataset.
Subsequently, we extended our work in [30], which developed an INS/LiDAR SLAM integrated navigation system, to include the LIMO algorithm as a part of the navigation system. As a result, an integrated INS/LiDAR/LIMO navigation system was developed for two reasons. The first reason was to investigate the inclusion of the LiDAR sensor as an additional onboard sensor to the IMU and the monocular camera. Secondly, we tested our proposed INS/LIMO navigation system against the INS/LiDAR/LIMO and certain state-of-the-art navigation algorithms.

LiDAR SLAM
The LiDAR SLAM used in this research is the Kitware SLAM [17], which is an updated version of the original LOAM algorithm [15]. A detailed description of the Kitware SLAM and the different enhancements it provides in comparison to LOAM are presented in our work in [30].

LiDAR/Monocular Visual SLAM Integration
The LIMO [37,40] algorithm was adopted in our research to integrate the LiDAR sensor with the monocular camera. The LIMO algorithm can be divided into two main stages, namely, monocular visual odometry and scale correction. The first stage involves a conventional visual odometry pipeline, from feature extraction and matching to frame-to-frame motion estimation. The second stage corrects the scale ambiguity of the trajectory resulting from the first stage, which is accomplished using the LiDAR sensor. Before the LiDAR points are projected into the camera frame and the image plane, all images are semantically segmented to identify moving objects.
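As an illustration of the scale-correction idea, the sketch below projects LiDAR points that have already been transformed into the camera frame onto the image plane with a standard pinhole model; the intrinsic matrix `K` and all coordinates are assumed for illustration, not taken from LIMO's implementation.

```python
import numpy as np

def project_lidar_to_image(pts_cam, K):
    """Project 3-D LiDAR points (already in the camera frame) onto the
    image plane with a pinhole model.  Points behind the camera are
    dropped.  `K` is an assumed 3x3 intrinsic matrix."""
    pts = pts_cam[pts_cam[:, 2] > 0]          # keep points in front of the camera
    uvw = (K @ pts.T).T                       # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]             # perspective division
    return uv, pts[:, 2]                      # pixel coordinates and depths
```

A monocular feature that lands near a projected LiDAR point can then inherit that point's depth, resolving the unknown scale of the visual odometry.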
Semantic segmentation refers to the task of assigning each pixel in an image to a specific object class. This is a challenging computer vision task that has many practical applications, such as autonomous driving. One of the leading state-of-the-art models is the original NVIDIA semantic segmentation model [41]. The model was designed to be highly accurate and efficient, which led to the processing of large volumes of imagery data in real time. The model uses an encoder-decoder architecture with skip connections. The encoder consists of a series of convolutional layers, followed by batch normalization and ReLU activation. The convolutional layers use small filters (3 × 3 or 1 × 1) and are designed to reduce the spatial dimensionality of the image while increasing the number of feature channels. The decoder consists of a series of upsampling layers, followed by convolutional layers with batch normalization and ReLU activation. The upsampling layers use bilinear interpolation to restore the spatial resolution of the image. The skip connections are implemented by concatenating the feature maps from the encoder with the upsampled feature maps from the decoder. The output layer of the model is a convolutional layer with softmax activation. This layer assigns a probability distribution over the object classes to each pixel in the image. The class with the highest probability is then assigned to that pixel. To train the NVIDIA semantic segmentation model, the Cityscapes dataset was adopted [42]. The dataset contains over 5000 high-resolution images of urban scenes with detailed annotations of objects and their boundaries. The model was trained to minimize a loss function that measures the difference between the predicted segmentation map and the ground truth segmentation map. The loss function used was the cross-entropy loss, which is commonly used in classification tasks.
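The per-pixel classification and cross-entropy loss described above can be sketched in a few lines of NumPy; this is a simplified stand-in for the actual network's output head, with illustrative tensor shapes.

```python
import numpy as np

def pixel_softmax(logits):
    """Per-pixel softmax over class logits of shape (H, W, C)."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean pixel-wise cross-entropy against an integer label map (H, W)."""
    h, w = labels.shape
    p = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-np.log(p + 1e-12).mean())

def segment(logits):
    """Assign each pixel the class with the highest probability."""
    return pixel_softmax(logits).argmax(axis=-1)
```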
In [43], the hierarchical multi-scale attention for semantic segmentation (HMSA) was developed as a modified version of the original NVIDIA semantic segmentation model [41]. The HMSA model introduces additional attention mechanisms and feature aggregation techniques to improve the accuracy of the segmentation. The hierarchical attention mechanism in the HMSA model is composed of three levels of attention modules: a global attention module, a local attention module, and a detail attention module. The global attention module is responsible for capturing the high-level context information of the input image, whereas the local attention module focuses on capturing the mid-level details. The detail attention module is designed to capture the fine-grained details of the input image. Each attention module is implemented as a convolutional neural network (CNN) that takes the feature maps of the previous layer as input and outputs a set of attention maps. The attention maps are used to weigh the feature maps in the previous layer, producing an attention-weighted feature map that is used as input to the next layer of the network. The attention maps are learned via a training process that minimizes the loss function that measures the difference between the predicted segmentation and the ground truth segmentation. In addition to the attention modules, the HMSA model also includes a feature aggregation module that combines features from multiple scales to improve the accuracy of the segmentation. The feature aggregation module is designed to take advantage of the complementary information provided by the features at different scales. Specifically, the feature aggregation module combines the attention-weighted feature maps from each scale using a weighted sum, where the weights are learned via a training process. To further improve the accuracy of the segmentation, the HMSA model uses a multi-scale testing strategy during inference. 
This strategy involves resizing the input image to multiple scales and passing each resized image through the network to obtain a set of segmentation maps at different scales. The final segmentation map is obtained by upsampling and averaging the segmentation maps obtained at each scale.
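The multi-scale testing strategy can be sketched as follows; nearest-neighbour resizing stands in for the bilinear interpolation used by the real model, and `model` is any assumed function mapping an image to per-pixel class probabilities.

```python
import numpy as np

def resize_nn(img, scale):
    """Nearest-neighbour resize (stand-in for bilinear interpolation)."""
    h, w = img.shape[:2]
    rows = (np.arange(int(h * scale)) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(int(w * scale)) / scale).astype(int).clip(0, w - 1)
    return img[rows][:, cols]

def multiscale_segment(img, model, scales=(0.5, 1.0, 2.0)):
    """Run `model` (image -> per-pixel class probabilities) at several
    scales, resample each probability map back to the input resolution,
    and average them before taking the per-pixel argmax."""
    h, w = img.shape[:2]
    acc = np.zeros(model(img).shape)
    for s in scales:
        probs = model(resize_nn(img, s))
        acc += resize_nn(probs, 1.0 / s)[:h, :w]
    return (acc / len(scales)).argmax(axis=-1)
```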
The HMSA model was evaluated using several benchmark datasets, including Cityscapes and PASCAL VOC, and achieved state-of-the-art performance on both. It outperformed other semantic segmentation models that do not use attention mechanisms or feature aggregation techniques.
In particular, on the Cityscapes dataset, the HMSA model achieved a mean intersection over union (mIoU) score of 82.2%, outperforming previous state-of-the-art models such as the pyramid scene parsing network (PSPNet) [44] and DeepLabv3+ [45], which achieved mIoU scores of 81.2% and 80.3%, respectively. On the PASCAL VOC dataset, the HMSA model achieved a mIoU score of 88.5%, outperforming previous state-of-the-art models such as DeepLabv3+ and RefineNet, which achieved mIoU scores of 85.7% and 83.4%, respectively.
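For reference, the mIoU metric quoted above is computed from a pixel-wise confusion matrix; a minimal implementation might look like:

```python
import numpy as np

def mean_iou(pred, truth, n_classes):
    """Mean intersection-over-union from a pixel-wise confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (truth.ravel(), pred.ravel()), 1)   # count (truth, pred) pairs
    inter = np.diag(cm)                               # correctly labelled pixels
    union = cm.sum(0) + cm.sum(1) - inter
    valid = union > 0                                 # ignore classes absent from both maps
    return float((inter[valid] / union[valid]).mean())
```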
These results demonstrate the effectiveness of the attention mechanisms and feature aggregation techniques used in the HMSA model, and its ability to outperform previous state-of-the-art semantic segmentation models on benchmark datasets. Figure 1 presents a sample of the semantic segmentation results using the NVIDIA semantic segmentation model on the KITTI dataset, D-28. The figure depicts one frame of the dataset, frame 1644.  After completion of the semantic segmentation, the LiDAR points are projected into the image plane. Any LiDAR point that lies on a moving object is neglected. The remaining LiDAR points are used for depth estimation, and thus scale correction.
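The rejection of LiDAR points on moving objects can be sketched as a mask lookup on the segmentation map; the dynamic class IDs below are illustrative placeholders, not the actual Cityscapes label set used.

```python
import numpy as np

DYNAMIC_CLASSES = {11, 12, 13}   # illustrative IDs for e.g. person, rider, car

def filter_dynamic_points(uv, depths, seg_map, dynamic=DYNAMIC_CLASSES):
    """Discard projected LiDAR points whose pixel falls on a dynamic
    object in the semantic segmentation map; the survivors serve as the
    depth anchors used for scale correction."""
    h, w = seg_map.shape
    cols = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    keep = ~np.isin(seg_map[rows, cols], list(dynamic))
    return uv[keep], depths[keep]
```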

Coordinate Transformations
SLAM pose estimations are calculated with respect to the reference frame of the sensor. That is, in LiDAR SLAM, all poses are referenced to the local frame of the first point cloud, whereas in visual SLAM, they are referenced to the first camera frame. Therefore, all coordinates are transformed into the WGS84 reference frame, which is essential for subsequent Kalman filtering. The sequence of transformations is shown graphically in Figure 2. The position and rotation transformations can be performed via homogeneous transformations using 4 × 4 transformation matrices. Let P^i_cam and P^i_Li denote the coordinates of a point captured in the local frames of the camera and the LiDAR, respectively, and let P^i_ecef denote the same point expressed in the WGS84 reference frame. The sequence of homogeneous transformations is presented by Equations (1)-(4).
where (R/t)^(destination)_(origin) symbolizes the 4 × 4 homogeneous transformation matrix from the origin frame to the destination frame (i.e., (R/t)^(frame-2)_(frame-1) transforms coordinates from frame 1 to frame 2). The frame abbreviations are defined as follows: cam: camera frame; Li: LiDAR frame; b: body frame; l: local-level frame; and ecef: WGS84 reference frame.
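A sketch of the chained homogeneous transformations of Equations (1)-(4); all rotation and translation values here are assumed for illustration.

```python
import numpy as np

def homogeneous(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def cam_to_ecef(p_cam, T_li_cam, T_b_li, T_l_b, T_ecef_l):
    """Chain the camera -> LiDAR -> body -> local-level -> ECEF transforms
    to express a camera-frame point in the WGS84/ECEF frame."""
    p = np.append(p_cam, 1.0)                       # homogeneous coordinates
    return (T_ecef_l @ T_l_b @ T_b_li @ T_li_cam @ p)[:3]
```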

INS/LIMO Integration
The proposed integrated system features a loosely-coupled (LC) integration between the INS and LIMO using an extended Kalman filter (EKF), as shown by the block diagram in Figure 3. The raw IMU measurements (i.e., accelerations and angular rotations) are fed into a full IMU mechanization, as described in [30,46], which yields the position, velocity, and attitude of the INS. Meanwhile, both monocular camera images and LiDAR point clouds are fed into the LIMO algorithm to produce the position and attitude estimations. These are used as measurement updates in the update stage of the EKF, which results in the final integrated navigation solution. Finally, the estimated errors are fed back into the full mechanization in a closed-loop error scheme. The EKF mathematical and stochastic models (i.e., the system model, the design matrix (H), and the system dynamic matrix (F)) are the same as in our work in [30]. The system noise covariance matrix (Q) is a diagonal matrix that includes the noise variance of each parameter in the system state. The state vector and measurement update vector are given by Equations (5) and (6), respectively.
where δr = [δφ δλ δh]^T is the position error vector; δv = [δv_e δv_n δv_u]^T is the velocity error vector; δε = [δp δr δy]^T is the attitude angles' error vector; δb_a = [δb_ax δb_ay δb_az]^T is the accelerometer bias error vector; and δb_g = [δb_gx δb_gy δb_gz]^T is the gyroscope bias error vector. The covariance matrix of the measurements, R_k-LIMO, is assumed to be diagonal and contains the variances of the position and attitude angles received from the LIMO estimations, as depicted by Equation (7).
where σ²_φ-LIMO, σ²_λ-LIMO, and σ²_h-LIMO are the variances of the LIMO position estimations; and σ²_p-LIMO, σ²_r-LIMO, and σ²_y-LIMO are the variances of the LIMO attitude estimations.
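A minimal sketch of the EKF measurement-update step with the diagonal R_k-LIMO described above; the 15-state layout follows Equation (5), but the matrices here are illustrative, not the paper's tuned values.

```python
import numpy as np

def limo_update(x, P, z_limo, H, r_diag):
    """One EKF measurement update with LIMO position/attitude observations.

    x       : 15-state error vector [dr(3), dv(3), de(3), db_a(3), db_g(3)]
    z_limo  : 6 observed differences (INS minus LIMO position and attitude)
    r_diag  : the 6 LIMO variances forming the diagonal R_k-LIMO
    """
    R = np.diag(r_diag)
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z_limo - H @ x)        # corrected error state
    P = (np.eye(len(x)) - K @ H) @ P    # updated covariance
    return x, P
```

With very small LIMO variances, the update drives the observed position and attitude error states toward the measurement while leaving the unobserved states untouched.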

INS/LIMO/LiDAR Integration
As a continuation of our work in [30], we investigated the integration of the INS, LIMO, and LiDAR SLAM, as opposed to the proposed INS/LIMO system described above. The state vector is the same as expressed by Equation (5). The first and second update stages are expressed by Equations (8) and (9), respectively, where δr_1 and δr_2 are the position error vectors in the first and second update stages, respectively; δε_1 and δε_2 are the attitude error vectors in both update stages; and δZ_k1 and δZ_k2 are the measurement update vectors. The covariance matrices of the measurement updates are R_k and R_k-LIMO, as shown by Equation (7) and in [30], respectively.
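The two update stages can be sketched as two consecutive EKF measurement updates applied within one epoch, the LiDAR SLAM observation first and the LIMO observation second; the example below uses an illustrative one-dimensional state rather than the full 15-state vector.

```python
import numpy as np

def two_stage_update(x, P, z1, H1, R1, z2, H2, R2):
    """Sequential EKF update: the first observation (dZ_k1) is applied,
    then the second observation (dZ_k2) refines the already-updated state."""
    for z, H, R in ((z1, H1, R1), (z2, H2, R2)):
        S = H @ P @ H.T + R                 # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

With a diffuse prior and equal measurement variances, two sequential scalar updates converge to the average of the two observations, as expected.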

Scenario 1: Low to Moderate Traffic Environment
In this research, the first scenario involves using the raw Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI, Karlsruhe, Germany) dataset [35], which features driving within low to medium traffic conditions. In addition, it is widely used as a benchmark dataset, which is accessible online via [36]. Researchers worldwide post their results and models on the KITTI odometry benchmark [16]. The sensors used on the KITTI ground vehicle are an integrated GNSS/IMU unit (OXTS RT3003), a 360-degree rotating mechanical LiDAR with 64 beams (Velodyne HDL-64E), and two stereo camera pairs that collect colour and greyscale images. The dataset includes the estimates of all the system's intrinsic and extrinsic calibration parameters. The KITTI dataset can be broken down into two main categories, namely, the raw dataset and the odometry dataset. The latter is derived from the raw dataset; however, it does not include the raw IMU measurements (i.e., accelerations and angular rotations). Therefore, the raw dataset is the one considered in this research, in order to benefit from the raw IMU measurements, which constitute an essential part of the proposed navigation models.
In this study, the KITTI dataset is used to formulate two case studies. The first case study proposes an integrated INS/LIMO navigation model, as presented in Section 4.1-Case Study 1. Meanwhile, the second case study is an extension to our previous work in [30], which is presented in Section 4.3-Case Study 3. This showcases another integrated navigation system (INS/LIMO/LiDAR SLAM) that achieves the concept of sensor redundancy. Table 1 provides a summary and some descriptive trajectory statistics for the raw KITTI drive considered in the aforementioned case studies.

Scenario 2: High Traffic Environment
The second scenario considers driving in high-traffic environments, for which the Leddar PixSet dataset was used [47]. The dataset was created in 2020 by LeddarTech, a company specializing in LiDAR technology. The data collection platform used by LeddarTech is a ground vehicle (Toyota RAV4). All perception sensors were mounted on the front bumper of the vehicle, as shown in [47]. The data collection sensors included a solid-state LiDAR (Leddar PIXELL), a mechanical LiDAR (Ouster64), three cameras (FLIR), a 180-degree panoramic camera (FLIR), a radar (TI AWR1843 mmWave), an integrated GNSS/IMU unit, and a PEAKCAN interface (Toyota RAV4 CAN bus). The system's extrinsic and intrinsic calibration parameters are provided for all datasets. In addition, LeddarTech provides researchers with an online Python API to perform a number of analysis tasks on the datasets (i.e., data synchronization and colourizing LiDAR point clouds) [48]. The company highly recommends using this API when working with its datasets. It is worth mentioning that the datasets were collected in a variety of weather and lighting conditions and in a variety of challenging driving scenarios (occlusions and crowded intersections).
As a result, we considered this dataset, especially the high-traffic drives, to develop and test the performance of our proposed integrated INS/LIMO navigation system, as presented in Section 4.2 of this paper (Case Study 2). These data are recent in the field, and the onboard sensor configuration of the data collection platform (front bumper) differs from that of the KITTI dataset (vehicle top). In addition, the sensors are of different makes. The drives considered from the Leddar PixSet dataset, along with some descriptive information, are shown in Table 2.

Case Study 1: INS/LIMO Integration-KITTI Dataset
The first case study uses several raw KITTI drives to test the developed INS/LIMO integrated navigation system. The first drive, 2011_09_30_drive_0028_sync, labelled D-28, was considered; its ground truth is the integrated GNSS/INS solution provided by the OXTS unit operating in RTK mode. It is worth mentioning that this drive corresponds to sequence number 8 of the KITTI odometry dataset. The frames used from raw D-28 are frames 1100 to 5170, spanning approximately 408 s, a travelled distance of 3312.20 m, and an average speed of around 28 km/h. Figure 5 presents the errors in the navigation frame (the ENU reference frame) for three navigation solutions, namely, the INS solution, the LIMO solution, and the INS/LIMO solution. The INS solution drifts significantly in the east, north, and upward directions, which is expected behaviour and echoes our previous work in [30]. Conversely, the LIMO positioning solution is significantly more accurate. As a result, the final INS/LIMO solution followed that of LIMO by tuning the covariance matrix of the system noise (Q) and the covariance of the measurement updates (R). Figure 6 shows the attitude results (roll, pitch, and yaw angles) for the three navigation solutions. In contrast to the positioning results, it is noticeable that the INS produces slightly more accurate attitude results than LIMO, as quantified in Table 3. This behaviour also echoes our previous work in [30], where the accuracy of the INS attitude was justified.
Moreover, the INS attitude solution is more stable and significantly less noisy than that of LIMO. This makes the INS attitude estimations preferable, and thereby the INS/LIMO solution followed the INS by tuning the Q and R matrices. Table 3 presents the error statistics for both the position and attitude of all navigation solutions. The RMSE values were calculated using Equation (10).
where i is an index for each data point, n is the total number of points, and x_i and x̂_i denote the reference and estimated values, respectively. The same analysis was conducted for drives D-42, D-16, D-27, D-33, and D-34, which yielded the same trends for position and attitude. This showcases that the proposed integrated INS/LIMO navigation system is robust and accommodates different driving scenarios in both rural and urban environments. Figure 8 compares the ground truth trajectory with all navigation solutions, visualized in the ENU reference frame. Figure 9 presents the error statistics for the aforementioned drives in the form of the reduction in RMSE between the INS/LIMO navigation system and the INS solution. Table 4 presents the values of the RMSE in the east, north, horizontal, and upward directions.
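Equation (10) and the RMSE-reduction percentages reported for the drives can be reproduced with a few lines of NumPy:

```python
import numpy as np

def rmse(estimate, truth):
    """Root-mean-square error between an estimated and a reference series."""
    e = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt((e ** 2).mean()))

def rmse_reduction(rmse_ins, rmse_fused):
    """Percentage reduction in RMSE of the fused solution vs. the INS."""
    return 100.0 * (rmse_ins - rmse_fused) / rmse_ins
```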
The drastic reductions in the horizontal and upward directions for all drives are evident. The only exception is D-16 in the horizontal direction, where the INS slightly outperformed the integrated system solution. The reason is that D-16 is very short (roughly 400 m in 28 s); during this short time, the INS solution did not drift significantly. This stability over short periods is expected from high-quality IMUs, such as the one used in the KITTI data collection platform, and echoes many similar short drives shown in [30].

Case Study 2: INS/LIMO Integration-LeddarTech PixSet Dataset
The second case study adopts the Leddar PixSet dataset as a source for high-traffic driving environments. Therefore, it was considered to further test the developed integrated INS/LIMO navigation system. The first dataset used is 20200721_165008_part39_640_1040, labelled as D-1040. This is an approximately 170 m drive that lasted for 40 s, yielding an average speed of 15 km/h. The ground truth for all drives of the Leddar PixSet dataset is provided by the integrated solution of the GNSS/INS unit (SBG EKINOX) operating in the RTK mode. Figures 10 and 11 present the position and attitude errors of all navigation solutions, respectively. The statistical characteristics of these errors are quantified in Table 5.

In contrast to the first case study (the KITTI dataset), Figure 10 shows that the performance of LIMO is not quite as accurate as in the first case study. However, it still provides a better solution than the INS position estimations. As a result, the integrated navigation solution continues to follow the LIMO solution. Regarding attitude estimations, Figure 11 shows that the INS provides more accurate results for the roll, pitch, and yaw angles, similar to the first case study. The only difference from the first case study is the degradation in the quality of the LIMO attitude estimations, as shown in Table 5. The reason for this degradation is that the onboard sensors are mounted on the front bumper of the car, which narrows the field of view of the sensors (both camera and LiDAR) in comparison to the KITTI dataset (car-top mount). This, in turn, makes the entire system sensitive to scene occlusion while driving in high-traffic environments. Figure 12 illustrates a comparison between the ground truth trajectories and all other navigation solutions in the WGS84 reference frame.
The position and attitude errors for the remaining drives are presented in Figures 13 and 14 and quantified in Table 6. It is important to mention that the same trends of the integrated INS/LIMO system continue to occur across all drives. However, Figure 13 shows that for D-1509, LIMO produces more accurate positioning results in comparison to the other drives. Upon investigation, it was found that D-1509 is a straight driving segment with no turns and minimal scene occlusions, which is not the case in the remaining drives.

Case Study 3: INS/LIMO/LiDAR Integration-KITTI Dataset
The third and final case study is a continuation of our work in [30], where the integrated INS/LIMO/LiDAR navigation system described in Section 2.3.3 is tested using D-28. The main reason that the KITTI dataset was chosen for this case study is that it is a benchmark dataset that allows for comparison with state-of-the-art models. Furthermore, we used the KITTI dataset in [30], and therefore, we continue to use it in this study to expand our previously proposed INS/LiDAR SLAM navigation model.
The position and attitude errors are presented in Figures 15 and 16, respectively, whereas Table 7 provides the statistics of these errors. It is noticeable from Table 7 that the INS/LIMO/LiDAR system offers improved accuracy in comparison with LiDAR SLAM. The reduction in the RMSE is approximately 70% and 60% in the horizontal and upward directions, respectively. Finally, Figure 17 graphically compares the INS, LIMO, LiDAR SLAM, and INS/LIMO/LiDAR navigation solutions in the WGS84 reference frame.

Comparison between the INS/LIMO and INS/LIMO/LiDAR Integration
It is evident from Figures 15 and 16 that the final integrated navigation system follows the LIMO position estimations and the INS attitude estimations. This means that the performance of LIMO is superior to that of the LiDAR SLAM, as solidified by Table 7. As a result, we can conclude that our proposed INS/LIMO integrated system is the preferable configuration, given that no sensor failure occurs. In other words, the inclusion of the LiDAR SLAM in the integrated navigation system is redundant. This redundancy, however, enhances the robustness of the navigation system against potential malfunctioning of the onboard sensors. Therefore, in the case of the INS/LiDAR/LIMO integration, the use of the LiDAR as a redundant onboard sensor will prove important if the monocular camera fails for any known or unknown reason. Consequently, the navigation system will continue to operate and generate reliable pose estimates up to the accuracy of the LiDAR SLAM.

Comparison with State-of-the-Art Models
In order to assess the quality of our proposed navigation model, it is imperative to compare our work with state-of-the-art models. Taking this into account, we used the KITTI dataset in the first and third case studies in this research as a benchmark dataset. In the first case study, our proposed INS/LIMO navigation system provides better attitude estimations in comparison to the sole use of LIMO. In addition, the system positioning estimations rely on LIMO, which outperformed a number of algorithms on the KITTI odometry benchmark [16], such as ORB-SLAM [26,29] and Stereo LSD-SLAM [27].
Regarding the additional integration of INS/LIMO/LiDAR presented in the third case study, we extensively compared the LiDAR SLAM algorithm we used in [30] with leading state-of-the-art LiDAR models [18,20,50]. Our model outperformed the previous models using the KITTI dataset.

Conclusions
In this research, an integrated INS/visual SLAM (LIMO) navigation system was proposed. The system featured a loosely-coupled integration of the INS and the monocular camera SLAM pose estimations using an EKF. The system was tested on two datasets, namely, the KITTI dataset and the Leddar PixSet dataset, which cover various driving environments in terms of the nature of the environment (i.e., rural and urban drives) and traffic intensity (i.e., low to moderate traffic and high traffic). The integrated system achieved an average reduction of 80% and 92% in the RMSE in the horizontal and upward directions, respectively. In addition, the proposed navigation system was tested against another integration scheme that fuses the measurements of the IMU, a monocular camera, and a LiDAR sensor. The results showed a reduction of 99% and 97% in the RMSE in the horizontal and upward directions, respectively. The results also confirmed that the inclusion of the LiDAR did not affect the final accuracy of the navigation system, while adding sensor redundancy to the system, which can be beneficial in the case of a camera malfunction. Finally, our proposed navigation system was compared with state-of-the-art models.

Conflicts of Interest:
The authors declare no conflict of interest.