1. Introduction
The standard DIN 1076 [1] regulates the inspection of engineering structures along roads in Germany. It requires the close-range detection of damage such as cracks, delaminations, spalling, and cavities every three years. Large surface areas and diverse damage characteristics make this approach time-consuming and subjective. Automated mapping and damage detection using autonomous platforms like unmanned ground vehicles (UGVs) or unmanned aerial vehicles (UAVs) can overcome this problem. For the visual inspection of larger surface damage, the resolution of laser scanning can be sufficient. Photogrammetry is particularly suitable for fine structures like cracks. For the detection of subsurface damage, new technologies like LiDAR-based cavity detection can be used, as proposed by Vierhub-Lorenz et al. [2]. For the photogrammetric approach, the usual practice is to take enough images of the bridge from different perspectives and derive a point cloud or textured mesh using a photogrammetric pipeline based on structure from motion (SfM). This computationally intensive approach cannot be used for real-time navigation of the platform but can be applied after the mission using the recorded data. Moreover, real-time navigation is required in the case of real-time damage detection; for example, an additional close-up can be taken if damage is detected in a distant image. Many engineering structures like bridges shadow global navigation satellite system (GNSS) signals, as stated and tested in several works [3,4,5,6,7], complicating precise localization, especially in the case of low structures and narrow areas between piers and girders. Simultaneous localization and mapping (SLAM), as used in indoor environments, solves this problem since it does not require GNSS. There is LiDAR-based SLAM, which usually uses a multi-layer LiDAR sensor and an inertial measurement unit (IMU), and there is visual SLAM, which relies on a single camera or a system of multiple cameras. Visual SLAM depends on texture and context, which in turn depend on the field of view (FoV). Engineering structures like bridges with a low ratio of height to width usually have poor ambient lighting and shadowed areas. High contrast between the structure surface and the surroundings leads to overexposed images. Moreover, to ensure sufficient context in images, large structures require long working distances or a large FoV, both of which degrade the ground sampling distance (GSD). These constraints challenge even state-of-the-art visual SLAM algorithms [8,9,10].
Due to the limitations of visual SLAM, this work deliberately concentrates solely on LiDAR-based SLAM. There are SLAM algorithms using a single LiDAR sensor, like LIO-SAM [11] or an extended version using scan context for improved loop closure called SC-LIO-SAM [12], and algorithms using multiple LiDAR sensors. As an example, Xiao et al. [13] present a tightly coupled dual-LiDAR inertial odometry. They show the benefit of combining a horizontal and a vertical LiDAR, e.g., in stair scenes. The recent work of Jung et al. [14] uses an asynchronous multiple LiDAR-inertial odometry. It is compatible with Velodyne, Ouster, and Livox LiDAR sensors. Apart from SLAM, a recent work on traditional iterative closest point (ICP) algorithms [15] shows that good results can be achieved even without an IMU, using ICP based on subsampled points instead of features. A major advantage is the low number of parameters compared to SLAM. However, the proposed KISS-ICP does not include loop closure.
Recent works focus on SLAM in GNSS-denied areas. Rizk et al. [16] present a method to overcome the limitations in complexity and memory requirements for UAV localization using image stitching. Chen et al. [17] fuse position information from ultra-wideband anchors with LiDAR point cloud data to detect line-of-sight measurements. Saleh and Rahiman [18] give an overview of recent mobile robot applications using visual SLAM in GNSS-denied areas. To optimize processing time, Jang et al. [19] use a GPU-accelerated normal distribution transform localization algorithm for GNSS-denied urban areas. Apart from classical methods, Petrakis and Partsinevelos [20] propose a deep learning method based on depth images. By using target markers and a combination of SLAM, deep learning, and point cloud processing, they achieve accuracies in the range of centimeters. Dai et al. [21] present deep learning-based scenario recognition using GNSS measurements on smartphones to distinguish deep indoor, shallow indoor, semi-outdoor, and open outdoor scenarios; the use of this information is part of their future work. Antonopoulos et al. [22] propose a localization module based on GNSS, inertial, and visual depth data which can be used for the autonomous navigation of UAVs in GNSS-denied areas. An et al. [23] propose a novel unsupervised multi-channel visual LiDAR SLAM method, called MVL-SLAM, which uses deep learning-based features for loop closure. Their experiments on the KITTI odometry dataset result in lower rotation and translation errors than other unsupervised methods, including UnMono [24], SfmLearner [25], DeepSLAM [26], and UnDeepVO [27]. Reitbauer et al. [28] propose LIWO-SLAM for wheeled platforms, which additionally uses wheel odometry information, and show reductions in drift on tunnel datasets. Furthermore, Abdelaziz and El-Rabbany [29] propose a SLAM integration of inertial navigation system, LiDAR, and stereo data for indoor environments and test it on the KITTI dataset, including tunnel scenarios.
For outdoor scenarios, the SLAM results are usually compared to the pose information acquired using GNSS for precise localization. For indoor scenarios, known checkpoints, as used for the Hilti SLAM challenge dataset, can be used for evaluation. In small areas, 6D tracking based on camera setups like OptiTrack, as used by Sier et al. [30], or a laser tracker like the Leica tracker with a T-Probe, provides sufficient ground truth information. However, in outdoor areas, those systems suffer from overexposure and limited range. Another option is moving the system on a controlled path, as carried out by Filip et al. [31]. They test different state-of-the-art SLAM algorithms in a featureless tunnel, with a rectangular path marked on the floor defining the reference path. However, they only consider a 2D trajectory. Moreover, this approach does not work on loose surfaces like sand or soil, since it depends on a surface where reference markings can be attached. For a profound evaluation of new SLAM methods for so-called helmet laser scanning, Li et al. [32] present an outdoor dataset including forests and underground spaces.
In contrast to previous studies, the objectives of this work are (1) presenting an alternative method to measure the performance of SLAM in GNSS-denied outdoor areas using a tachymeter and a reference point cloud acquired by a terrestrial laser scanner; (2) implementing a time synchronization between tachymeter and SLAM trajectory data based on a fitting approach; (3) automatically transforming the SLAM point clouds and the reference point cloud into tachymeter coordinates; (4) evaluating the proposed method using a dual-LiDAR system with IMU on three different algorithms, KISS-ICP, SC-LIO-SAM, and MA-LIO; and (5) using a challenging 3D track including height variations and steep slopes leading to vibrations and high variations in roll and pitch.
3. Results
The results of the proposed time synchronization based on the absolute distance to the start position are depicted in Figure 7. Using this approach, even the low tachymeter tracking frequency of 10 Hz and the longer time intervals between SC-LIO-SAM key frames can be compensated. The graphs already show errors and drift of the SLAM results even without registration. For longer tracks, it might be necessary to use only part of the track, since rotational errors will influence the distance to the starting position.
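To make the synchronization step concrete, the following Python snippet gives a minimal sketch of the idea: both trajectories are reduced to their distance-to-start curves, and the SLAM curve is slid along the time axis until it best matches the reference curve. This is an illustration only, not the authors' implementation; the function names and the search window and step parameters are chosen here for demonstration.

```python
import numpy as np

def distance_to_start(positions):
    """Euclidean distance of each trajectory position to the first one."""
    return np.linalg.norm(positions - positions[0], axis=1)

def find_time_offset(t_ref, p_ref, t_slam, p_slam,
                     search_window=5.0, step=0.001):
    """Slide the SLAM distance-to-start curve along the time axis and
    return the offset (in seconds) minimizing the RMS difference to the
    reference curve. t_* are timestamps, p_* are (N, 3) position arrays."""
    d_ref = distance_to_start(p_ref)
    d_slam = distance_to_start(p_slam)
    best_offset, best_rms = 0.0, np.inf
    for offset in np.arange(-search_window, search_window, step):
        # Resample the shifted SLAM curve at the reference timestamps,
        # restricted to the overlapping time range.
        t_shifted = t_slam + offset
        mask = (t_ref >= t_shifted[0]) & (t_ref <= t_shifted[-1])
        if mask.sum() < 2:
            continue
        d_interp = np.interp(t_ref[mask], t_shifted, d_slam)
        rms = np.sqrt(np.mean((d_interp - d_ref[mask]) ** 2))
        if rms < best_rms:
            best_offset, best_rms = offset, rms
    return best_offset
```

Because the distance-to-start curve is a one-dimensional signal, this matching is insensitive to the different pose frequencies of the evaluated algorithms, which is what allows the sparse SC-LIO-SAM key frames to be synchronized as well.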
The six different SLAM trajectories, transformed into tachymeter coordinates, and the tachymeter reference trajectories are shown in Figure 8. The height map underneath, derived from the RTC360 point cloud, indicates the hills and piers of the pump track. The first two tracks are driven clockwise and counterclockwise, where similar results are expected; however, the driving speed can vary and the exact same path is not driven. Track 3 includes most of the hills of the pump track. Track 4 is an out-and-back run along the same line in the flat side area. Track 5 is a figure eight to include a meeting point for loop closure and to vary the direction of rotation. Track 6, the last track, starts from the center to vary the start location.
The absolute RMS error for each algorithm and track is listed in Table 1. MA-LIO performs best, followed by SC-LIO-SAM and KISS-ICP. SC-LIO-SAM struggles with the ends of track 1 and track 5, corresponding to the bottom side in Figure 8a,e. One reason could be that SC-LIO-SAM has a problem with the person walking behind the robot in this narrow area, or the fact that half of the FoV was blocked by the border of the pump track, obscuring the view of the upper two bridge piers. Based on this error, the overall registration is shifted downwards, leading to RMS errors of 1.693 m and 0.617 m, respectively. The other trajectories have RMS errors of 6 to 14 cm. Even track 3 with its hills has a decent RMS error of 7.6 cm. The performance of MA-LIO is consistent across all trajectories, with errors between 5 and 9 cm. For track 3 with big hills, it was expected that MA-LIO would perform better than SC-LIO-SAM due to the second LiDAR. However, it has a slightly larger error than SC-LIO-SAM, while providing many more poses per trajectory, since it is not limited to key frames placed every 1 m or 0.2 rad. KISS-ICP achieves its best result on track 4, which is the simplest track. Its voxel size parameter was already reduced from the default value of 1 m to 1 cm to improve accuracy. For the other tracks, a mix of drift and jumps along the trajectory leads to larger errors for KISS-ICP. Even if the start pose were placed at the reference start point, the offset would be even higher. Strong vibrations and steep slopes, a LiDAR frame frequency of 10 Hz, and the absence of loop closure and an IMU could be the reasons.
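Once the time offset is known, the absolute error reduces to a straightforward computation over time-corresponding positions. The sketch below (illustrative; it assumes both trajectories are already expressed in tachymeter coordinates and reuses the offset from the synchronization step) shows one way to compute the absolute RMS error reported in Table 1:

```python
import numpy as np

def absolute_rms_error(t_ref, p_ref, t_slam, p_slam, offset):
    """RMS of time-corresponding position differences. Assumes both
    trajectories are in tachymeter coordinates; `offset` is the time
    offset found by the synchronization step."""
    t_shifted = t_slam + offset
    mask = (t_ref >= t_shifted[0]) & (t_ref <= t_shifted[-1])
    # Interpolate each SLAM coordinate at the reference timestamps.
    p_interp = np.column_stack([
        np.interp(t_ref[mask], t_shifted, p_slam[:, i]) for i in range(3)
    ])
    residuals = np.linalg.norm(p_interp - p_ref[mask], axis=1)
    return np.sqrt(np.mean(residuals ** 2))
```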
As previously stated, the relative error can give further information on the local SLAM performance. The aligned sub-trajectories for each track and SLAM method are depicted in Figure 9, and for a comprehensive visualization, the detailed view of each sub-trajectory is given in Figure 10. Compared to Figure 8, most of the SLAM sub-trajectories are very close to the tachymeter reference trajectory. KISS-ICP shows good performance except for tracks 3, 4, and 5, which all include hills of the pump track causing vibrations and strong roll and pitch changes. The mean relative error is given in Table 2. Moreover, Figure 11 depicts the relative sub-trajectory errors per track for the three different SLAM methods, plotted over the distance tracked by the tachymeter. Except for track 3, MA-LIO achieves the best results, followed by SC-LIO-SAM and KISS-ICP. The relative errors are smaller than the absolute trajectory errors. This is due to the fact that even parts of the overall trajectory which are displaced by a previous drift can locally have low errors compared to the reference trajectory. The relative trajectory error highly depends on the movement of the robot and the local environment acquired during a sub-trajectory. Moreover, for this short sub-trajectory duration, only a few key poses of SC-LIO-SAM are included. This decreases the relative error because the alignment using ICP is based on a low number of points compared to MA-LIO and KISS-ICP, which provide more pose estimates in between. For KISS-ICP, the relative error goes down to 10 cm. This shows that for local mapping or navigation, KISS-ICP could be sufficient.
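The relative error evaluation can be sketched as follows (again for illustration only): the time-corresponding position pairs are split into short segments, each segment is rigidly aligned to the reference, and the residual RMS per segment is reported. The segment length and the use of a closed-form Umeyama/Kabsch alignment on corresponding points rather than an iterative ICP are assumptions of this sketch.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Closed-form rigid alignment (rotation + translation, no scale)
    of two (N, 3) point sets with known correspondence."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    cov = (dst - mu_dst).T @ (src - mu_src) / len(src)
    U, _, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:  # avoid reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    t = mu_dst - R @ mu_src
    return R, t

def relative_errors(p_ref, p_slam, points_per_segment=100):
    """Split time-corresponding position pairs into segments, align each
    segment rigidly, and return the RMS residual per segment."""
    errors = []
    for i in range(0, len(p_ref) - points_per_segment, points_per_segment):
        ref_seg = p_ref[i:i + points_per_segment]
        slam_seg = p_slam[i:i + points_per_segment]
        R, t = umeyama_alignment(slam_seg, ref_seg)
        aligned = slam_seg @ R.T + t
        residuals = np.linalg.norm(aligned - ref_seg, axis=1)
        errors.append(np.sqrt(np.mean(residuals ** 2)))
    return np.array(errors)
```

Segmenting by time-corresponding samples rather than by traveled distance avoids the key-frame dependence discussed in Section 4.4.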
Since the tachymeter trajectory contains only position information, the next results include the distances between the point clouds, which also reflect orientation errors. The SLAM point clouds with the cloud-to-cloud distance as the color scale are shown in Figure 12. The point clouds of KISS-ICP and SC-LIO-SAM mainly cover the bridge piers, since only the horizontal LiDAR is used. Due to slopes and far scanning distances, the ceiling is partially covered by SC-LIO-SAM. KISS-ICP also covers the same areas; however, they are not included in Figure 12, since the errors are larger than the set threshold of 50 cm. This is a first indication that the KISS-ICP trajectory also includes orientation errors. MA-LIO covers most of the bridge underside due to the use of both the horizontal and the vertical LiDAR sensor. For track 4 and track 6, some areas are missing due to incomplete coverage by the trajectory and offsets larger than 50 cm. Based on the distribution of distances, as shown in the point clouds and next to the color scale bar in Figure 12, it can be observed that there are multiple layers representing the same surfaces. These can result from partial registration errors or drift along the trajectory. For MA-LIO, however, the largest distances are in the ceiling area. To better compare the different results, Table 3 lists the mean cloud-to-cloud distances. They give a similar ranking for each track as the previously discussed trajectory errors listed in Table 1. For tracks 2, 3, 4, and 6, SC-LIO-SAM performs better. However, this can be due to the effect that roll and pitch errors have fewer consequences, since only a few ceiling points are scanned. In summary, the results show a superior performance of MA-LIO in four of six test tracks, with an RMS trajectory error of 5 to 7 cm, followed by SC-LIO-SAM, with KISS-ICP in last place. SC-LIO-SAM reaches the lowest point cloud-to-reference point cloud distance in four of six test tracks, with 4 to 12 cm.
The results of the last test track passing through all test areas are depicted in Figure 13. Running KISS-ICP in real time using the default rosbag play settings led to major divergence and loss of localization. By playing the rosbag at half real-time speed and using a voxel size of 1 m, this problem is partially solved. While the horizontal poses appear to be correct, there is a large drift in the global vertical component, leading to the upward-curved point cloud and trajectory data. Moreover, the hill area is not mapped correctly. In contrast, SC-LIO-SAM successfully maps the environment without major divergence and even covers large parts of the bridge ceiling. The hill area is also mapped correctly. However, for this result, the minimum time between frames used for loop closure is increased such that no loop closure is applied. With activated loop closure, two similar sections in the flat area are mistakenly matched; depending on the minimum searching radius for loop closure, errors occur at different positions. Further parameter adjustments are probably necessary to make the scan context-based loop closure more stable in the case of homogeneous or repetitive structures. Even without loop closure, MA-LIO performs best. The straight bridge ceiling indicates a correct representation of the environment. Moreover, there are no larger shifts or rotated frames. Lastly, the start and end positions are almost the same, which was manually ensured during the test drive by parking at the same spot where the robot started. This shows that good results can be achieved even without loop closure.
As a final test, the central processing unit (CPU) usage for all trajectories and SLAM methods is depicted in Figure 14 and Figure 15. The visualization shows that the CPU usage is linked to critical areas with larger offsets. The CPU usage of KISS-ICP is relatively high due to the small voxel size of 1 cm for tracks 1 to 6. For the long track, it is reduced due to the larger voxel size of 1 m and the halved rosbag play speed. Despite the second LiDAR, the CPU usage of MA-LIO is lower than that of SC-LIO-SAM.
4. Discussion
4.1. Method
The objective of implementing a post-processing time synchronization of tachymeter and SLAM trajectory data is successfully reached within this work. For all tracks and SLAM algorithms, even those providing key frames only, using the absolute distance to the start position is a simple approach for finding the time offset without using features or registration, and it is more stable and accurate than using only the beginning and end of the movement. The time synchronization is used for the absolute and relative trajectory error calculation. Compared to distance-based matching, it is more precise, since it is independent of the temporal resolution of key poses, as used in SC-LIO-SAM. An alternative approach would be a wireless synchronization between tachymeter and robot, or at least synchronizing the clocks before starting the measurement. However, the proposed post-processing approach is simple and does not require additional hardware or preparation. One limitation is the visual line of sight required for tracking the prism. Especially in indoor areas with many rooms and obstacles, the tachymeter will lose the prism. Moreover, the prism should be mounted in such a way that it is always visible. For using SLAM on a UAV, the prism should face downwards if it is tracked from below.
Transforming the reference point cloud into tachymeter coordinates is performed using reference markers. For the SLAM point cloud transformation, the trajectory transformation is used, which is more accurate than detecting reference markers in sparse or noisy SLAM point cloud data. Drawbacks of this method are the need for a tachymeter and that the respective environment must allow tracking of the prism. One solution for areas behind bridge structures could be using the proposed approach only in the visible area where the robot moves in the beginning. These trajectory data can be used for deriving the required transformation. The transformation of the start area can then be applied to the overall trajectory and point cloud data. Additionally, if the trajectory ends within the starting area, a second tracking could be used, since the tachymeter coordinates stay the same.
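The trajectory-based transformation of the SLAM point cloud can be sketched as follows (illustrative; it reuses the umeyama_alignment helper from the earlier sketch and assumes time-corresponding position pairs from the synchronization step): a rigid transform is estimated between the SLAM and tachymeter trajectories and then applied to the full SLAM point cloud.

```python
import numpy as np

def transform_to_tachymeter(slam_positions, ref_positions, slam_cloud):
    """Estimate the rigid transform mapping time-corresponding SLAM
    trajectory positions onto tachymeter positions and apply it to the
    SLAM point cloud. Both position arrays must be (N, 3) and ordered
    by the common, synchronized time base."""
    # Uses the umeyama_alignment helper defined in the earlier sketch.
    R, t = umeyama_alignment(slam_positions, ref_positions)
    return slam_cloud @ R.T + t
```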
As part of this work, the proposed method is tested on three different algorithms. The gained information on absolute position error, relative error, and cloud-to-cloud distance allows comparing different algorithms or different parameters for the same SLAM approach. The RMS trajectory error and the mean cloud-to-cloud distance for distances below 50 cm give the same ranking for all tracks except tracks 2 and 4, where SC-LIO-SAM performs better.
Although this is not part of this work, the proposed synchronization method can be used to enhance the total accuracy by fusing the position information given by the tachymeter with the orientation information of the SLAM for enhanced mapping results in post-processing. In particular, in SC-LIO-SAM, which allows GNSS input, the GNSS data could be replaced by the tachymeter information.
4.2. Test Environment
Using a pump track as a test scenario has proven to be a good way of testing SLAM with a UGV while including more height and rotational variations. The second LiDAR sensor used by MA-LIO was expected to give better results than SC-LIO-SAM, especially on track 3 with its strong height variations. However, the RMS trajectory error and the mean point cloud-to-reference point cloud distance are slightly larger. As already stated in Section 3, this might be due to the ceiling area, where roll and pitch errors have a big effect due to long distances and a larger surface than the bridge piers. For SC-LIO-SAM, there are fewer consequences due to using only the horizontal LiDAR sensor. Moreover, the lack of loop closure can lead to multiple scanned layers of the same surface in MA-LIO. For track 1, the second LiDAR might have helped to overcome the narrow area with the limited field of view of the horizontal LiDAR. Moreover, for track 5, the figure eight including hills and a change in the direction of rotation, it obtains the best results, which could be due to more context provided by the second LiDAR sensor. This outcome supports the findings of Xiao et al. [13], where the second LiDAR reduced errors in staircases in indoor areas. It must be mentioned that the absolute and relative errors derived using the proposed method depend on the selected bridge scenario, the SLAM algorithm parameters, and the driven trajectories. For other scenarios, the absolute and relative performance can differ. This must be further studied in a variety of scenarios, which is possible using the proposed method.
4.3. Sensor Selection
Within this study, only the VLP16 LiDAR sensor with a vertical FoV of 30° and 16 scanning lines is used. For more context and resolution, there are other sensors with up to 90° vertical FoV and up to 128 channels. In particular, single-LiDAR algorithms like SC-LIO-SAM and KISS-ICP could benefit. It is expected that the drift of KISS-ICP in the hill area could be reduced. However, this would not replace the major advantage of an IMU. Applying the proposed methods to other sensor configurations, including variations in the number, relative orientation, and distance of multiple LiDAR sensors, is part of future work.
4.4. Contributions
The proposed SLAM evaluation method has several advantages over existing methods. Ground truth data in GNSS-denied areas are usually not time-synchronized, since either checkpoints or ICP without correspondence are used. The latter can lead to the following problems. The evaluated absolute error can be smaller than the actual error of time-corresponding positions. Furthermore, selecting sub-trajectories for the evaluation of the relative trajectory error based on traveled distance is critical when comparing SLAM methods using different pose frequencies due to key frame settings: the traveled distance is shorter for more distant key frames and longer if the positions derived using a SLAM method jump. This problem can be amplified in the case of omni-directional movements due to vibrations caused by uneven terrain for UGVs, windy conditions for UAVs, and complex trajectories in small or medium-sized environments required for inspection tasks or full-coverage and high-resolution mapping tasks.
Using time correspondence for registration gives more precise information on the actual absolute and relative position error. Creating time-synchronized ground truth data with millimeter accuracy is most easily possible using an external tracking system. In the case of a tachymeter, Thalmann and Neuner [34] use wireless communication with the robot and reach sub-millisecond synchronization. As part of this work, a post-processing method for time synchronization with millimeter accuracy is proposed which is independent of the tracking system used and does not require wireless communication. By optimizing the absolute distance from the trajectory start to each trajectory position using a sliding time window approach, no point-to-point distances are optimized; instead, a single time offset is derived which can be used for time synchronization. The proposed time synchronization method is successfully used in this work for evaluating multiple SLAM methods on multiple trajectories driven on uneven terrain.
Apart from the time synchronization method, this work analyzes different SLAM methods based on a LiDAR sensor with and without IMU and the effect of an additional tilted LiDAR sensor acquiring data of the bridge ceiling and higher parts of the bridge piers. Previous results of Xiao et al. [13], showing the advantages of a dual-LiDAR system in indoor scenes, are confirmed as part of this work. Most research on multi-LiDAR system configurations is conducted in the field of autonomous driving for road scenarios [35,36]. However, this is also crucial for other scenarios due to the increasing use of multi-LiDAR systems in recent works [14,37,38]. By using the proposed time synchronization method, more combinations of different FoVs, scanning lines, and spatial alignments for different platforms can be tested using the benefits of time correspondence in GNSS-denied areas. More complex environments, like forests with occlusions due to trees or indoor environments with many obstacles, require more complex ground truth creation, as proposed by Li et al. [32] for their helmet laser scanning dataset.
4.5. Future Work
Future work will include the further study and integration of SLAM on different autonomous mobile platforms. This will include the study of different commercially available LiDAR sensors and LiDAR sensors manufactured at Fraunhofer IPM. The use of or combination with visual SLAM and the integration on UAVs are also of research interest. Furthermore, multi-SLAM, which solves SLAM on multiple platforms in real time, and multi-session SLAM, which is important for UAVs with limited flight time requiring multiple flights, will be further studied. The final goal is the autonomous navigation of multiple mobile platforms based on (multi-)SLAM for full-coverage and high-quality mapping of complex engineering structures and real-time damage detection.