A Benchmark for Multi-Modal Lidar SLAM with Ground Truth in GNSS-Denied Environments

Lidar-based simultaneous localization and mapping (SLAM) approaches have achieved considerable success in autonomous robotic systems. This is in part owing to the high accuracy of robust SLAM algorithms and the emergence of new and lower-cost lidar products. This study benchmarks current state-of-the-art lidar SLAM algorithms with a multi-modal lidar sensor setup showcasing diverse scanning modalities (spinning and solid-state) and sensing technologies, and lidar cameras, mounted on a mobile sensing and computing platform. We extend our previous multi-modal multi-lidar dataset with additional sequences and new sources of ground truth data. Specifically, we propose a new multi-modal multi-lidar SLAM-assisted and ICP-based sensor fusion method for generating ground truth maps. With these maps, we then match real-time pointcloud data using a normal distributions transform (NDT) method to obtain the ground truth with full 6 DOF pose estimation. This novel ground truth data leverages high-resolution spinning and solid-state lidars. We also include new open road sequences with GNSS-RTK data and additional indoor sequences with motion capture (MOCAP) ground truth, complementing the previous forest sequences with MOCAP data. We perform an analysis of the positioning accuracy achieved with ten different SLAM algorithm and lidar combinations. We also report the resource utilization in four different computational platforms and a total of five settings (Intel and Jetson ARM CPUs). Our experimental results show that current state-of-the-art lidar SLAM algorithms perform very differently for different types of sensors. More results, code, and the dataset can be found at https://github.com/TIERS/tiers-lidars-dataset-enhanced.


I. INTRODUCTION
Lidar sensors have been adopted as the core perception sensor in many applications, from self-driving cars [1] to unmanned aerial vehicles [2], including forest surveying and industrial digital twins [3]. High-resolution spinning lidars enable a high degree of awareness of the surrounding environment. Denser 3D pointclouds and maps are in increasing demand to support the next wave of ubiquitous autonomous systems as well as more detailed digital twins across industries. However, higher angular resolution comes at increased cost in analog lidars, requiring a higher number of laser beams or more compact electronics and optics. New solid-state and other digital lidars are paving the way to cheaper and more widespread 3D lidar sensors capable of dense environment mapping [4], [5], [6], [7].
§ These authors have contributed equally to this manuscript.
Fig. 1(a): Ground truth map for one of the indoor sequences generated based on the proposed approach (SLAM-assisted ICP-based prior map). This enables benchmarking of lidar odometry and mapping algorithms in larger environments where a motion capture system or similar is not available, with significantly higher accuracy than GNSS/RTK solutions.
So-called solid-state lidars overcome some of the challenges of spinning lidars in terms of cost and resolution, but introduce some new limitations in terms of a relatively small field of view (FoV) [8], [6]. Indeed, these lidars provide more sensing range at significantly lower cost [9]. Other limitations that affect traditional approaches to lidar data processing include irregular scanning patterns or increased motion blur.
Despite their increasing popularity, few works have benchmarked the performance of both spinning lidars and solid-state lidars in diverse environments, which limits the development of more general-purpose lidar-based SLAM algorithms [9]. To bridge this gap in the literature, we present a benchmark that compares lidars of different modalities (spinning, solid-state) in diverse environments, including indoor offices, long corridors, halls, forests, and open roads. To allow for a more accurate and fair comparison, we introduce a new method for ground truth generation in larger indoor spaces (see Fig. 1a). This enhanced ground truth enables a significantly higher degree of quantitative benchmarking and comparison with respect to our previous work [9]. We hope that the extended dataset and ground truth labels, as well as the more detailed data, provide a performance reference for multi-modal lidar sensors in both structured and unstructured environments to both academia and industry.
arXiv:2210.00812v1 [cs.RO] 3 Oct 2022
In summary, this work evaluates state-of-the-art SLAM algorithms with a multi-modal multi-lidar platform as an extension of our previous work [9]. The main contributions of this work are as follows: 1) a ground truth trajectory generation method for environments where MOCAP or GNSS/RTK are unavailable that leverages the multi-modality of the data acquisition platform and high-resolution sensors; 2) a new dataset with data from 5 different lidar sensors, one lidar camera, and one stereo fisheye camera in a variety of environments as illustrated in Fig. 1b, with ground truth data provided for all sequences; 3) the benchmarking of ten state-of-the-art filter-based and optimization-based SLAM methods on our proposed dataset in terms of odometry accuracy, memory, and computing resource consumption. The results indicate the limitations of current SLAM algorithms and potential future research directions.
The structure of the paper is as follows. Section II surveys recent progress in SLAM and existing lidar-based SLAM benchmarks. Section III provides an overview of the configuration of the proposed sensor system. Section IV offers the detailed benchmark and ground truth generation methodology. Section V concludes the study and suggests future work.

II. RELATED WORKS
Owing to its high accuracy, versatility, and resilience across environments, 3D lidar SLAM has received much study as a crucial component of robotic and autonomous systems [10]. In this section, we limit the scope to well-known and well-tested 3D lidar SLAM methods. We also include an overview of the most recent 3D lidar SLAM benchmarks.

A. 3D Lidar SLAM
The primary types of 3D lidar SLAM algorithms today are lidar-only [11], and loosely-coupled [12] or tightly-coupled [13] with IMU data. Tightly-coupled approaches integrate the lidar and IMU data at an early stage, as opposed to SLAM methods that loosely fuse the lidar and IMU outputs towards the end of their respective processing pipelines.
In terms of lidar-only methods, an early work by Zhang et al. on Lidar Odometry and Mapping (LOAM) introduced a method that achieved low drift and low computational complexity as early as 2014 [14]. Since then, there have been multiple variations of LOAM that enhance its performance. By incorporating ground point segmentation and a loop closure module, LeGO-LOAM [15] is more lightweight, achieving the same accuracy at reduced computational expense and with lower long-term drift. However, lidar-only approaches are mainly limited by a high susceptibility to featureless landscapes [16], [17]. By incorporating IMU data into the state estimation pipeline, SLAM systems naturally become more precise and flexible.
In LIOM [13], the authors proposed a novel tightly-coupled approach with lidar-IMU fusion based on graph optimization, which outperformed the state-of-the-art lidar-only and loosely-coupled approaches. Owing to the better performance of tightly-coupled approaches, subsequent studies have focused in this direction. Another practical tightly-coupled method is FAST-LIO [18], which provides computational efficiency and robustness by fusing the feature points with IMU data through an iterated extended Kalman filter. Extending FAST-LIO, FAST-LIO2 [19] integrated a dynamic data structure, the ikd-tree, that allows for incremental map updates at every step, addressing computational scalability issues while inheriting the tightly-coupled fusion framework from FAST-LIO.
The vast majority of these algorithms function well with spinning lidars. Nonetheless, new approaches are in demand since new sensors such as solid-state Livox lidars have emerged with novel sensing modalities, smaller FoVs, and irregular sampling patterns [9]. Several studies are adapting SLAM algorithms to fit these new lidar characteristics. Loam_livox [20] is a robust and real-time LOAM algorithm for these types of lidars. LiLi-OM [6] is another tightly-coupled method that jointly minimizes the cost derived from lidar and IMU measurements for both solid-state lidars and conventional lidars.
It is worth mentioning that there are other studies addressing lidar odometry and mapping by fusing not only IMU but also visual information or other ranging data for more robust and accurate state estimation [21], [22].

B. SLAM benchmarks
There are various multi-sensor datasets available online. We presented a systematic comparison of popular datasets in Table III of our previous work [9]. Not all of these datasets include an analytical benchmark of 3D lidar SLAM based on multi-modal lidars. The KITTI benchmark [23] is the most significant one, supporting the evaluation of several tasks including odometry, SLAM, object detection, and tracking, among others.

III. DATA COLLECTION
Our data collection platform is shown in Fig. 1b, and details of the sensors are listed in Table I. The platform has been mounted on a mobile wheeled vehicle to adapt to varying environments. In most scenarios, the platform is manually pushed or teleoperated, except for the forest environment, where the platform is handheld.

A. Data Collection Platform
The data collection platform contains various lidar sensors, from traditional spinning lidars with different resolutions to novel solid-state lidars featuring non-repetitive scanning patterns. A lidar camera and a stereo fisheye camera are also included. There are three spinning lidars: a 16-channel Velodyne lidar (VLP-16), a 64-channel Ouster lidar (OS1), and a 128-channel Ouster lidar (OS0). The OS0 and OS1 sensors were mounted on the left and right sides, where the OS1 is turned 45 degrees clockwise and the OS0 is turned 45 degrees anticlockwise. The Velodyne lidar is at the top-most position. Two solid-state lidars, Horizon and Avia, were installed in the center of the frame. The OptiTrack marker set for MOCAP-based ground truth and the antenna for GNSS/RTK ground truth are both fixed on top of the aluminum stick to maximize their visibility and detection range. All sensors are connected through a Gigabit Ethernet router to a computer featuring an Intel i7-10750H processor, 64 GB of DDR4 RAM, and 1 TB of SSD storage. The data collection system, including sensor drivers and online calibration scripts, runs entirely on ROS Melodic under Ubuntu 18.04, owing to the wider variety of ROS-based lidar SLAM methods available for Melodic.

B. Calibration and Synchronization
Efficient extrinsic parameter calibration is crucial to multi-sensor platforms, especially for handmade devices where the extrinsic parameters may change due to unstable connections or distortion of the material during transit. Similar to our previous work [9], we calculated the extrinsic parameters of the sensors before each data collection process. Fig. 2 shows the calibration result of sample lidar data from one of the indoor data sequences.
Unlike in our previous work [9], where the timestamps of the Ouster and Livox lidars were based on their own clocks, we synchronized all lidar sensors in Ethernet mode via the software-based Precision Time Protocol (PTP) [24]. We compared the orientation estimates from the sensors' built-in IMUs with the lidar SLAM results and concluded that the latency of our system is below 5 ms.

C. SLAM-assisted Ground Truth Map
To provide accurate ground truth for large-scale indoor and outdoor environments, where the MOCAP system is unavailable or the GNSS/RTK positioning result becomes unreliable due to the multi-path effect, we propose a SLAM-assisted, solid-state lidar-based ground truth map generation framework.
Inspired by the prior map generation method in [25], where a survey-grade 3D imaging laser scanner (Leica BLK360) is utilized to obtain static pointclouds of the target environment, we employed a low-cost solid-state lidar (Livox Avia) and a high-resolution spinning lidar to collect undistorted pointclouds from the environments. According to the Livox Avia datasheet, the range accuracy of the Avia sensor is 2 cm with a maximum detection range of 450 m. Due to the non-repetitive scanning pattern, the environment coverage of the pointcloud within the FoV increases with time. Therefore, we integrated multiple frames while the platform is stationary to get a more detailed, undistorted sampling of the environment. Each integrated pointcloud contains more than 240,000 points. The Livox built-in IMU is used to detect the stationary state of the platform, defined as the acceleration values being smaller than 0.01 m/s$^2$ along all axes.
After gathering multiple undistorted pointcloud submaps from the target environment, the next step is to match and merge all submaps into a global map by ICP. As the ICP process requires a good initial guess, we employ a high-resolution spinning lidar (OS0) with a 360-degree horizontal FoV to provide an initial position estimate by running a real-time SLAM algorithm. This process is outlined in Algorithm 1. A dense and high-definition ground truth map can then be obtained by denoising the map generated by this algorithm. Fig. 1a shows the ground truth map of sequence Indoor08 generated based on Algorithm 1.
Let $P^{s}_{k}$ be the pointcloud produced by the spinning lidar, $P^{d}_{k}$ be the pointcloud generated by the solid-state lidar, and $I_{k}$ be the IMU data from the built-in IMU. Our previous work has shown that a high-resolution spinning lidar has the most robust performance in diverse environments. Therefore, LeGO-LOAM [15] is performed with a high-resolution spinning lidar (OS0-128) and outputs the estimated pose for each submap.
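The stationary check and frame integration described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the `min_len` parameter, and the assumption that the IMU accelerations are already gravity-compensated are hypothetical. Only the 0.01 m/s$^2$ per-axis threshold comes from the text.

```python
import numpy as np

def stationary_mask(accel, threshold=0.01):
    """Mark IMU samples whose acceleration is below the threshold on every axis.

    accel: (N, 3) array in m/s^2, assumed gravity-compensated (assumption).
    """
    accel = np.asarray(accel, dtype=float)
    return np.all(np.abs(accel) < threshold, axis=1)

def stationary_segments(mask, min_len=1):
    """Group consecutive stationary samples into (start, end) index pairs (end exclusive)."""
    segments = []
    start = None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i                      # a stationary run begins
        elif not flag and start is not None:
            if i - start >= min_len:       # keep runs that are long enough
                segments.append((start, i))
            start = None
    if start is not None and len(mask) - start >= min_len:
        segments.append((start, len(mask)))
    return segments

def integrate_frames(frames):
    """Stack the lidar frames captured during one stationary segment into one submap."""
    return np.vstack(frames)
```

Because the platform does not move within a stationary segment, stacking the frames without motion compensation is sufficient in this sketch; a longer segment simply yields a denser submap for a non-repetitive scanner.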
The cached data $S_{cache}$ stores the submaps and the related poses. Let $P_{i}$ be the pointcloud and $p_{i}$ the related pose in $S_{cache}[i]$. The submap $P_{i}$ is first transformed to the map coordinate frame as $P^{m}_{i}$ based on the estimated pose $p_{i}$; then GICP is employed on $P^{m}_{i}$ to iteratively minimize the Euclidean distance between closest points against the pointcloud $Map$; finally, $P^{m}_{i}$ is transformed by the transformation matrix generated by the GICP process and merged into the map $Map$. The resulting map $Map$ is treated as the ground truth map.
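The transform, register, and merge loop above can be sketched as follows. This is a hedged illustration, not the authors' code: plain point-to-point ICP with brute-force nearest neighbours stands in for GICP, and all function names are hypothetical. The initial pose plays the role of the SLAM estimate $p_{i}$.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch/SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def apply_transform(T, pts):
    """Apply a 4x4 homogeneous transform to an (N, 3) pointcloud."""
    return pts @ T[:3, :3].T + T[:3, 3]

def icp(src, target, iters=30, tol=1e-9):
    """Point-to-point ICP; returns the 4x4 transform aligning src to target."""
    T = np.eye(4)
    cur = src.copy()
    prev_err = np.inf
    for _ in range(iters):
        # Brute-force closest points (fine for small clouds only).
        d2 = ((cur[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        idx = d2.argmin(axis=1)
        err = np.sqrt(d2[np.arange(len(cur)), idx]).mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
        R, t = best_rigid_transform(cur, target[idx])
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        cur = apply_transform(step, cur)
        T = step @ T
    return T

def merge_submap(global_map, submap, pose):
    """Move a submap into the map frame with its SLAM pose, refine by ICP, then merge."""
    in_map = apply_transform(pose, submap)
    if len(global_map) == 0:
        return in_map                      # first submap initialises the map
    refined = apply_transform(icp(in_map, global_map), in_map)
    return np.vstack([global_map, refined])
```

The sketch makes the role of the initial guess explicit: closest-point matching only yields correct correspondences when the SLAM pose is already close, which is why the spinning lidar odometry is used to seed the registration.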
After the ground truth map is generated, we employ the NDT method in [26] to match the real-time pointcloud data from the spinning lidar against the high-resolution map, as shown in Fig. 4, to obtain the platform position in the ground truth map. The matching result from the NDT localizer is treated as the ground truth.
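To illustrate the idea behind NDT matching, the sketch below summarises each occupied voxel of a map by a Gaussian and scores a candidate pose by how well the transformed scan points fall under those Gaussians. It is an assumption-laden illustration, not the localizer from [26]: the function names, the cell size, and the covariance regularisation are hypothetical, and a real NDT localizer additionally optimises the pose to maximise this score.

```python
import numpy as np

def build_ndt_cells(map_points, cell_size=1.0, min_points=5):
    """Fit one Gaussian (mean, inverse covariance) per occupied voxel of the map."""
    keys = np.floor(map_points / cell_size).astype(int)
    buckets = {}
    for key, p in zip(map(tuple, keys), map_points):
        buckets.setdefault(key, []).append(p)
    cells = {}
    for key, pts in buckets.items():
        if len(pts) < min_points:
            continue                          # too few points for a stable covariance
        pts = np.array(pts)
        cov = np.cov(pts.T) + 1e-3 * np.eye(3)  # regularise flat cells (assumption)
        cells[key] = (pts.mean(axis=0), np.linalg.inv(cov))
    return cells

def ndt_score(scan, pose, cells, cell_size=1.0):
    """Sum of Gaussian likelihood terms of the scan points under a candidate 4x4 pose."""
    pts = scan @ pose[:3, :3].T + pose[:3, 3]
    keys = np.floor(pts / cell_size).astype(int)
    score = 0.0
    for key, p in zip(map(tuple, keys), pts):
        cell = cells.get(key)
        if cell is None:
            continue                          # point falls in an empty voxel
        mean, inv_cov = cell
        d = p - mean
        score += float(np.exp(-0.5 * d @ inv_cov @ d))
    return score
```

A pose that places the scan on the mapped surfaces scores high, while a pose offset across a thin structure such as a wall scores close to zero, which is what makes the score usable as a registration objective.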

IV. SLAM BENCHMARK
In this study, we evaluated popular 3D Lidar SLAM algorithms in multiple data sequences of various scenarios, including indoor, outdoor, and forest environments.

A. Ground Truth Evaluation
Evaluating the accuracy of the proposed ground truth prior map method is challenging for some scenes in the dataset, as neither GNSS nor MOCAP systems are available in indoor environments such as long corridors. Fig. 5 (a), (b), (c) shows the ground truth generated by the proposed method during the first 10 seconds of sequence Indoor09, when the device is stationary. The standard deviations along the X, Y, and Z axes are 2.2 cm, 4.1 cm, and 2.5 cm, respectively, or about 4.8 cm overall. However, evaluating localization performance when the device is in motion is more difficult. To better understand the order of magnitude of the accuracy, we compare the NDT-based ground truth Z values with the MOCAP-based ground truth Z values in the sequence Indoor06 environment. The results in Fig. 5 (d) show that the maximum difference does not exceed 5 cm.
Algorithm 1: SLAM-assisted ICP-based prior map generation for ground truth data.
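The two checks above (per-axis spread while stationary, and the largest Z discrepancy against MOCAP while moving) can be sketched as follows. This is an illustrative sketch with hypothetical function names; it assumes both Z series are expressed in the same frame and that linear interpolation of the MOCAP samples onto the NDT timestamps is acceptable.

```python
import numpy as np

def per_axis_std(positions):
    """Standard deviation of (N, 3) positions along X, Y, and Z."""
    return np.asarray(positions, dtype=float).std(axis=0)

def max_z_difference(t_ndt, z_ndt, t_mocap, z_mocap):
    """Largest absolute Z difference after resampling MOCAP onto the NDT timestamps."""
    z_ref = np.interp(t_ndt, t_mocap, z_mocap)  # t_mocap must be increasing
    return float(np.max(np.abs(np.asarray(z_ndt, dtype=float) - z_ref)))
```

For example, `per_axis_std` applied to the positions logged during the first 10 seconds of a sequence gives the three per-axis values, and `max_z_difference` gives a single worst-case figure for a moving segment.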

B. Lidar Odometry Benchmarking
Different types of SLAM algorithms are selected and tested in our experiment. The lidar-only algorithms LeGO-LOAM (LEGO) and Livox-Mapping (LVXM) are applied to data from the VLP-16 and Horizon, respectively. A tightly-coupled iterated extended Kalman filter-based method, FAST-LIO (FLIO) [27], is applied to both spinning lidars and solid-state lidars with built-in IMUs. A tightly-coupled lidar-inertial SLAM system based on sliding window optimization, LiLi-OM [28], is tested with the OS1 and Horizon. Furthermore, a tightly-coupled method featuring sliding window optimization developed for the Horizon lidar, LIO-LIVOX (LIOL), has also been tested on Horizon lidar data.
We provide a quantitative analysis of the odometry error based on the ground truth in Table III. To compare the trajectories in the same coordinate frame, we treat the coordinate frame of the OS0 as the reference and transform all trajectories generated by the selected SLAM methods to this reference frame. The absolute pose error (APE) [29] is employed as the core evaluation metric. We calculated the error of each trajectory with the open-source EVO toolset.
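A minimal sketch of the two steps above, re-expressing a trajectory in the reference frame and computing the translational APE statistics, is given below. It is not the EVO implementation: the function names are hypothetical, the extrinsic is assumed to be the 4x4 pose of the sensor in the reference (OS0) frame with a rigid mounting, and the estimated and ground truth poses are assumed to be already associated by timestamp.

```python
import numpy as np

def to_reference_frame(poses, extrinsic):
    """Convert sensor poses (relative to the sensor's start frame) into reference-sensor poses.

    poses: (N, 4, 4) homogeneous transforms; extrinsic: 4x4 pose of the
    sensor in the reference frame (assumed constant, i.e. rigid mounting).
    """
    inv = np.linalg.inv(extrinsic)
    return np.array([extrinsic @ T @ inv for T in poses])

def ape_translation(est_poses, gt_poses):
    """Mean and standard deviation of the position error between matched poses."""
    err = np.linalg.norm(est_poses[:, :3, 3] - gt_poses[:, :3, 3], axis=1)
    return float(err.mean()), float(err.std())
```

The pair returned by `ape_translation` corresponds to the µ/σ format used in Table III, in the units of the input positions.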
From the results, we can conclude that FAST-LIO with the high-resolution spinning lidars OS0 and OS1 has the most robust performance, completing all trajectories on the different sequences with promising accuracy. In particular, for sequence Indoor09, which features a long corridor, all other methods failed and only FAST-LIO with a high-resolution lidar completed the trajectory. Solid-state lidar-based SLAM systems such as LIOL with the Horizon perform as well as or even better than rotating lidars in outdoor environments when paired with appropriate algorithms, but perform significantly worse in indoor environments. For the open road sequence Road03, all SLAM methods perform well, and the trajectories are completed without major disruptions. For the indoor sequence Indoor06, Avia-based and Horizon-based FLIO are able to reconstruct the sensor trajectory, but significant drift accumulates. In all of these sequences, all the methods applied to spinning lidars perform satisfactorily. This result can be expected, as they have a full view of the environment, which has a clear geometry. For the sequence Indoor10, which also features a long corridor, almost all methods can reconstruct the complete trajectory. The best performance comes from OS0-FLIO and OS1-FLIO, with correct alignment between the first and last positions. We hypothesize that this occurs because OS0 has more channels than OS1, leading to lower accumulated angular drift.
In addition to the quantitative trajectory analysis, we visualize trajectories generated by selected methods in 3 representative environments (indoors, outdoors, forest) in Fig. 6. Full reconstructed paths are available in the dataset repository.

C. Run-time evaluation across certain computing platforms
We conducted this experiment on 4 different platforms. First, a Lenovo Legion Y7000P with 16 GB RAM, a 6-core Intel i5-9300H (2.40 GHz) and an Nvidia GTX 1660Ti (1536 CUDA cores, 6 GB VRAM). Then, the Jetson Xavier AGX, a popular computing platform for mobile robots, which has an 8-core ARMv8.2 64-bit CPU (2.25 GHz), 16 GB RAM and a 512-core Volta GPU. From its 7 power modes, we chose the MAX and 30 W (6 cores only) modes. The Nvidia Xavier NX is also a common embedded computing platform, with a 6-core ARMv8.2 64-bit CPU, 8 GB RAM, and a 384-core Volta GPU with 48 Tensor cores. For the NX, we chose the 15 W power mode (all 6 cores). Finally, the UP Xtreme board features an 8-core Intel i7-8665UE (1.70 GHz) and 16 GB RAM.
These platforms all run ROS Melodic on Ubuntu 18.04. The CPU and memory utilization is measured with a ROS resource monitor tool. Additionally, to minimize differences in the operating environment, we unified the dependencies used by each SLAM system to the same versions, and each hyperparameter of the SLAM systems is configured with its default value. The results are shown in Table IV.
The memory utilization of each selected SLAM approach is roughly equivalent across the two processor architectures. However, the same SLAM algorithm generally shows higher CPU utilization on Intel processors than on the other platforms, and also achieves its highest publishing frequency there. LeGO-LOAM has the lowest CPU utilization, but its accuracy is towards the low end (see Table III) and it has a very low pose publishing frequency. FAST-LIO performs well, especially on embedded computing platforms, with good accuracy, low resource utilization, and high pose publishing frequency. In contrast, LIO-LIVOX has the highest CPU utilization due to the computational complexity of the frame-to-model registration method applied to estimate the pose.
A final takeaway is in the generalization of the studied methods. Many state-of-the-art methods are only applicable to a single lidar modality. In addition, those that have higher flexibility (e.g., FLIO) still lack the ability to support a pointcloud resulting from the fusion of both types of lidars.

V. CONCLUSION
In this paper, we provide lidar datasets covering the characteristics of various environments (indoor, outdoor, forest), and systematically evaluate 5 open source SLAM algorithms in terms of lidar odometry and power consumption. The experiments have covered 9 sequences across 4 computing platforms. By including the Nvidia Jetson Xavier platforms, we provide further reference for the application of various SLAM algorithms on computationally resource-constrained devices such as drones. Overall, we found that in both indoor and outdoor environments, the spinning lidar-based FLIO exhibited good performance with low power consumption, which we believe is due to the ability of the spinning lidar to obtain a full view of the environment. However, in the forest environment, the LIOL algorithm based on solid-state lidar has the best performance in terms of accuracy and mapping quality, although it has the highest power consumption due to the sliding window optimization.
Finally, we aim to further extend our dataset to provide more refined and difficult sequences and open source it in the future.In this paper, our benchmark tests only focus on SLAM algorithms based on spinning lidar and solid-state lidar.In the future, we will add benchmark tests based on cameras and even SLAM algorithms based on multiple sensor fusions.
(b) Front view of the multi-modal data acquisition system. Next to each sensor, we show the individual coordinate frames.

Fig. 1: Multi-modal lidar data acquisition platform and samples from maps obtained in the different environments included in the dataset.

Fig. 2: Top view of pointcloud data generated for the calibration process with multiple lidars. The red and green pointclouds represent data obtained from the Livox Horizon and Avia, respectively. The purple, yellow, blue and black clouds are from the VLP-16, OS1, OS0 and L515 sensors, respectively.

Fig. 3: Samples of map data from different dataset sequences. From left to right and top to bottom, we display maps generated from a forest, an urban area, an open road, and a large indoor lab space, respectively.

Fig. 4: NDT localization with ground truth map. External view and internal view when the current laser scan (orange) is aligned with the ground truth map (blue).

Fig. 5: (a) (b) (c): Ground truth position values for the first 10 seconds of the dataset when the device was stationary. Red lines show the mean values over this period of time. (d): Comparison of NDT-based ground truth z-values (green) to MOCAP-based z-values (red) over the course of 60 seconds of the dataset while the device was in motion.

Fig. 6: Demos of trajectories generated by multiple 3D lidar SLAM methods based on data from indoor, road, and wild environments.

TABLE I: Sensor specification for the presented dataset. Angular resolution is configurable in the OS1-64 (varying the vertical FoV). Livox lidars have a non-repetitive scan pattern that delivers higher angular resolution with longer integration times. For lidars, range is based on manufacturer information, with values corresponding to 80% Lambertian reflectivity and 100 klx sunlight, except for the L515 lidar camera.

TABLE II: List of data sequences in our extended dataset. The table includes the sequences introduced in our previous work [9], together with new sequences showcasing new ground truth data sources. The five lidars indicated (5x Lidars) and cameras are listed in Table I.

TABLE III: Absolute position error (APE) (µ/σ) in cm of the selected methods (N/A when odometry estimations diverge). Best results in bold.

TABLE IV: Average run-time resource (CPU/RAM) utilization and performance (pose calculation speed) comparison of selected SLAM methods across multiple platforms. For the pose publishing frequency, the data is played at 15 times the real speed. CPU utilization of 100% equals one full processor core.