LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation

Keypoint detection and description play a pivotal role in various robotics and autonomous applications, including visual odometry (VO), visual navigation, and simultaneous localization and mapping (SLAM). While a myriad of keypoint detectors and descriptors have been extensively studied on conventional camera images, the effectiveness of these techniques on LiDAR-generated images, i.e., reflectivity and range images, has not been assessed. These images have gained attention due to their resilience in adverse conditions such as rain or fog. Additionally, they contain significant textural information that supplements the geometric information provided by LiDAR point clouds in the point cloud registration phase, especially when relying solely on LiDAR sensors. This helps address the drift encountered in LiDAR odometry (LO) in geometrically identical scenarios or where not all of the raw point cloud is informative and parts of it may even be misleading. This paper analyzes the applicability of conventional image keypoint extractors and descriptors to LiDAR-generated images via a comprehensive quantitative investigation. Moreover, we propose a novel approach to enhance the robustness and reliability of LO. After extracting key points, we use them to downsample the point cloud, which is then passed to the point cloud registration phase for odometry estimation. Our experiments demonstrate that the proposed approach achieves comparable accuracy with reduced computational overhead and a higher odometry publishing rate, and even superior performance in scenarios where using the raw point cloud is prone to drift. This, in turn, lays a foundation for subsequent investigations into the integration of LiDAR-generated images with LO. Our code is available on GitHub: https://github.com/TIERS/ws-lidar-as-camera-odom.


I. INTRODUCTION
LiDAR has become a primary sensor for advanced situational awareness in robotics and autonomous systems, with applications ranging from LiDAR Odometry (LO) and Simultaneous Localization and Mapping (SLAM) to object detection and tracking, as well as navigation. Among these applications, LO, as a fundamental component in robotics, has drawn significant attention. Extensive research efforts have focused on the integration of diverse sensors, including Inertial Measurement Units (IMUs), to improve the precision and resilience of LO. Notably, recent years have witnessed substantial progress in LiDAR technology, marked by the emergence of numerous high-resolution spinning and solid-state LiDAR devices offering various modalities of sensor data [1], [2]. The increased point cloud density poses a challenge for point cloud registration, which incurs significant computational overhead, especially on devices with limited computational resources.
Among these modalities, LiDAR-generated images, including reflectivity images, range images, and near-infrared images, have introduced the potential to apply conventional camera image processing techniques to LiDAR data. These images are low-resolution but possibly panoramic, and they exhibit heightened resilience and robustness in challenging environments, such as those characterized by fog and rain, compared to conventional camera images. Additionally, these images can potentially provide crucial information for point cloud registration when geometric data is deficient or the raw point cloud lacks useful information, thereby helping to avoid drift (Fig. 1b).
Keypoint detectors and descriptors have found extensive utility across diverse domains within visual tasks such as place recognition, scene reconstruction, Visual Odometry (VO), Visual Simultaneous Localization and Mapping (VSLAM), and Visual Inertial Odometry (VIO). Nevertheless, there remains a lack of investigation into the performance of existing keypoint detectors and descriptors when applied to LiDAR-generated imagery.
Contemporary methodologies for VO or VIO rely significantly on the operability of visual sensors and require knowledge of camera intrinsics to facilitate Structure from Motion (SfM), a requisite not met by LiDAR-generated images. This makes it difficult to extract key points from LiDAR-generated images in a form that can be used directly for odometry estimation.
Therefore, to address the above issues, in this study: i) we investigate the efficacy of existing keypoint detectors and descriptors on LiDAR-generated images, with multiple specialized metrics providing a quantitative evaluation; ii) we conduct an extensive study of the optimal resolution and interpolation approaches for enhancing the low-resolution LiDAR-generated data so that key points can be extracted more effectively; iii) we propose a novel approach that leverages the detected key points and their neighbors to extract a reliable point cloud (downsampling) for point cloud registration, with reduced computational overhead and fewer deficiencies in valuable point acquisition.
The structure of this paper is as follows. In Section II, we survey recent progress on keypoint detectors and descriptors, including approaches and metrics, point cloud matching, and the status of LO. Section III describes the dataset, the quantitative evaluation of the existing keypoint detectors and descriptors, and the proposed keypoint-assisted point cloud registration. Section IV presents the experimental results in detail. Finally, we conclude the work and sketch out some future research directions in Section V.

II. RELATED WORK
In this section, we begin with a review of the prevailing detector and descriptor algorithms documented in the literature. Subsequently, a brief summary of current advancements in techniques based on LiDAR-generated images is offered. We conclude with a concise analysis of the leading algorithms for point cloud registration in LO.

A. Keypoint Detector and Descriptor
In recent years, multiple detectors and descriptors have been widely applied in the field of computer vision. Table I summarizes the essential characteristics of the different detectors and descriptors.
The Harris detector [3] can be seen as an enhanced version of Moravec's corner detector [4] [5]. It is used to identify corners in an image, which are regions with large intensity variations in multiple directions. The Shi-Tomasi corner detector [6] is an improvement upon the Harris detector, with a slight modification to the corner response function that makes it more robust and reliable in certain scenarios. The Features from Accelerated Segment Test (FAST) [7] algorithm operates by examining a circle of pixels surrounding a candidate pixel and testing for a contiguous segment of pixels that are all significantly brighter or all significantly darker than the central pixel.
For descriptor-only algorithms, Binary Robust Independent Elementary Features (BRIEF) [8] utilizes a set of binary tests on pairs of pixels within a patch surrounding a key point. Fast Retina Keypoint (FREAK) [9] is inspired by the human visual system: it constructs a retinal sampling pattern that is denser towards the center and sparser towards the periphery, and then compares pairs of pixels within this pattern to generate a robust binary descriptor.
With respect to the combined detector-descriptor algorithms, the Scale-Invariant Feature Transform (SIFT) [10] [11] detects key points by identifying local extrema in the Difference of Gaussian scale-space pyramid, then computes a gradient-based descriptor for each key point. Speeded-Up Robust Features (SURF) [12] is designed to address the computational complexity of SIFT while maintaining robustness to various transformations. Binary Robust Invariant Scalable Keypoints (BRISK) [13] uses a scale-space FAST [7] detector to identify key points and computes binary descriptors based on a sampling pattern of concentric circles. Oriented FAST and Rotated BRIEF (ORB) [14] extends the FAST detector with a multiscale pyramid and computes a rotation-invariant version of the BRIEF [8] descriptor, aiming to provide a fast and robust alternative to SIFT and SURF. Accelerated-KAZE (AKAZE) [15] employs a Fast Explicit Diffusion scheme to accelerate the detection process and computes a Modified Local Difference Binary (M-LDB) descriptor [16] for robust matching.
The emergence of deep learning (DL) techniques, particularly convolutional neural networks (CNNs) [17] [18], has revolutionized computer vision over the last decade. The SuperPoint [19] detector employs a fully convolutional network to predict a set of keypoint heatmaps, where each heatmap corresponds to an interest point's probability at a given pixel location. The descriptor part then generates a dense descriptor map for the input image by predicting a descriptor vector at each pixel location.
To sum up, while numerous detector and descriptor algorithms have gained popularity, they have primarily been designed for traditional camera images, not LiDAR-based images. Consequently, it is of paramount importance for this study to identify the algorithms that maintain their efficacy on LiDAR-based images.
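The segment test described above can be illustrated with a minimal sketch. It assumes the commonly used 16-pixel circle of radius 3, an intensity threshold `t`, and a required arc length `n` (9 here); the function name, parameter defaults, and the list-of-rows image format are hypothetical choices for illustration and are not taken from the evaluated implementation.

```python
# Offsets (dx, dy) of the 16-pixel circle of radius 3, listed in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, t=20, n=9):
    """Return True if pixel (x, y) passes the segment test.

    img is a list of rows of grayscale integers, and (x, y) must lie at
    least 3 pixels away from the image border. t and n are assumed defaults.
    """
    center = img[y][x]
    # Label each circle pixel: +1 brighter, -1 darker, 0 similar to the center.
    labels = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        labels.append(1 if v > center + t else (-1 if v < center - t else 0))
    # Search for n contiguous identical non-zero labels. The circle is
    # walked twice so that arcs crossing the start index are also found.
    run, prev = 0, 0
    for lab in labels + labels:
        if lab != 0 and lab == prev:
            run += 1
        else:
            run = 1 if lab != 0 else 0
        prev = lab
        if run >= n:
            return True
    return False
```

Production detectors add a high-speed pre-test and non-maximum suppression on top of this check; both are omitted here for brevity.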

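Table I (excerpt). Columns: method, detector, descriptor, characteristics.
AKAZE: detector ✓, descriptor ✓. Builds on KAZE but is faster; good for wide-baseline stereo correspondence.
Superpoint: detector ✓, descriptor ✓. A state-of-the-art AI approach that exhibits superior performance when applied to traditional camera images.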

B. LiDAR-Generated Images in Robotics
Within the realm of robotics, several studies over the years have delved into the utilization of LiDAR-based images. Before exploring specific applications, it is vital to know the process by which range images and signal images are generated from the point cloud, as detailed in [20], [21]. It is also essential to understand the effectiveness of LiDAR-based images: an extensive evaluation in [22] shows that LiDAR-based images have remarkable resilience to seasonal and environmental variations.
Perception is the first step in the use of LiDAR within robotics. In [23], Ouster explained the possibility of using LiDAR as a camera. They demonstrate the effectiveness of car and road segmentation by feeding the LiDAR-based image into a pretrained DL model. In [24], Tsiourva et al. proposed a saliency detection model based on LiDAR-generated images. In the model, the attributes of reflectivity, intensity, range, and ambient images are carefully contrasted and analyzed. After several advanced image processing steps, multiple conspicuity maps are created. These maps are combined into a unified saliency map, which identifies and emphasizes the most distinct objects in the image. In [25], Sier et al. explored using LiDAR-as-a-camera sensors to track Unmanned Aerial Vehicles (UAVs) in GNSS-denied environments, fusing LiDAR-generated images and point clouds for real-time accuracy. The work in [26] explores the potential of general-purpose deep learning perception algorithms, specifically detection and segmentation neural networks, based on LiDAR-generated images. The study provides both a qualitative and quantitative analysis of the performance of a variety of neural network architectures, showing that DL models built for visual camera images also offer significant advantages when applied to LiDAR-generated images.
Turning to downstream applications such as localization, the research in [27] explores the problem of localizing mobile robots and autonomous vehicles within a large-scale outdoor environment map by leveraging range images produced by a 3D LiDAR.

C. Evaluation Metrics for Keypoint Detectors and Descriptors
The efficacy of detector and descriptor algorithms is typically assessed through specific evaluation metrics. As illustrated in Table II, the first three metrics, Number of Keypoints, Computational Efficiency, and Robustness of Detector, are straightforward to comprehend and implement, and are widely adopted in numerous studies [28] [12] [29]. For instance, the Robustness of Detector [30] is evaluated by comparing key points before and after transformations such as scaling, rotation, and Gaussian noise interference.
When assessing the precision of the entire algorithmic procedure, which is prioritized by the majority of tasks, the prevalent metrics often necessitate benchmark datasets, such as KITTI [31] and HPatches [32]. These datasets either provide the transformation matrix between images or directly contain the keypoint ground truth. For example, in Mukherjee et al.'s study [33], one crucial metric, "Precision", is defined as correct matches / all detected matches, where correct matches are ascertained through geometric verification based on a known camera position provided by the dataset [34]. Similarly, in a recent work [35], the evaluation tasks, including "keypoint verification", "image matching", and "keypoint retrieval", all rely on the homography matrix between images in the benchmark dataset [32].
Nevertheless, given that research on LiDAR images is at a nascent stage, there exists no benchmark dataset in the field of LiDAR-based images, and the effort required for data labeling [36] [37] to produce such a dataset is considerable. To bridge this gap, we select multiple key evaluation metrics from previous studies: Match Ratio, Match Score, and Distinctiveness, as shown in Table II. Match Ratio [33] is defined as number of matches / number of key points. A high Match Ratio suggests that the algorithm is adept at identifying and correlating distinct features. While the exact homography matrix between images remains unknown in the absence of benchmark datasets, it can be approximated from two point sets using mathematical methods. This computed homography can subsequently be used to find correct matches, and number of estimated correct matches / number of matches is denoted as Match Score in our work. Distinctiveness is computed from the two nearest-neighbor matches of each key point, as described in Table II.

Table II. Evaluation metrics for keypoint detectors and descriptors.
Number of Keypoints: A high number of key points can lead to more detailed image analysis and better performance in subsequent tasks like object recognition.
Computational Efficiency: Computational efficiency remains paramount in any computer vision algorithm. We gauge this efficiency by timing the complete detection, description, and matching process.
Robustness of Detector: An efficacious detector should recognize identical key points under varying conditions such as scale, rotation, and Gaussian noise interference.
Match Ratio: The ratio of successfully matched points to the total number of detected points offers insights into the algorithm's capability in identifying and relating unique key points.
Match Score: A homography matrix is estimated from two point sets to distinguish spurious matches; the algorithm precision is then quantified by the inlier ratio.
Distinctiveness: For every image, the k-nearest neighbors algorithm, with k = 2, is employed to identify the two best matches [10]. If the descriptor distance of the primary match is notably lower than that of the secondary match, it demonstrates the algorithm's competence in recognizing and describing highly distinctive key points.
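As a rough illustration of two of these ground-truth-free metrics, the following sketch computes Match Ratio and a Distinctiveness value. The exact Distinctiveness formula is not reproduced in the text, so the version here is an assumption: it reports the share of key points whose best match distance is below a fixed fraction (`ratio`, 0.8 here) of the second-best distance, in the spirit of the ratio test of [10]. All function names are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_ratio(num_matches, num_keypoints):
    """Number of matches divided by number of detected key points."""
    return num_matches / num_keypoints if num_keypoints else 0.0


def distinctiveness(query_desc, train_desc, ratio=0.8):
    """Share of query descriptors whose best match is clearly better than
    the second best. The threshold `ratio` is an assumed value."""
    if not query_desc or len(train_desc) < 2:
        return 0.0
    distinct = 0
    for q in query_desc:
        # k-NN with k = 2: only the two smallest distances are needed.
        d = sorted(l2(q, t) for t in train_desc)
        if d[0] < ratio * d[1]:
            distinct += 1
    return distinct / len(query_desc)
```

For binary descriptors such as BRIEF or ORB, the Euclidean distance would be replaced by the Hamming distance.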

D. 3D Point Cloud Downsampling
Point cloud downsampling is crucial for operating LO or SLAM on a computation-constrained device. A substantial body of work now focuses on employing DL networks, for example, a lightweight transformer [38]. Other approaches utilize various filters to achieve not only point cloud downsampling but also denoising [39].

E. 3D Point Cloud Matching in LO
LO has been widely studied yet remains challenging due to the complexity of the environments encountered in robotics. Contemporary research has witnessed a notable surge in efforts integrating supplementary sensors, such as Inertial Measurement Units (IMUs), aimed at augmenting the precision and resilience of LO. However, as we focus on the point cloud registration phase of LO, such multi-sensor approaches are outside the scope of this part, and we primarily discuss solely LiDAR-based LO. Among these solely LiDAR-based approaches, LOAM [40], a popular matching-based SLAM and LO approach, has inspired a great number of other LO approaches, including LeGO-LOAM [41] and F-LOAM [42].
Point cloud matching, or registration, constitutes the key component of LO. Since its inception approximately three decades ago, the Iterative Closest Point (ICP) algorithm, as introduced by Besl and McKay [43], has spawned numerous variants. These include notable adaptations such as Voxelized Generalized ICP (GICP) [44], CT-ICP [45], and KISS-ICP [46]. Among these ICP variants, KISS-ICP, denoting "keep it small and simple", distinguishes itself by providing a point-to-point ICP approach characterized by robustness and accuracy in pose estimation. Furthermore, the Normal Distributions Transform (NDT) [47] represents another prominent point cloud registration technique frequently employed in LO research. As the latest ICP approach, KISS-ICP is the point cloud registration method we adopted in this study.

III. METHODOLOGY
In this section, we first introduce the dataset we used. Then, we describe our experimental procedure in detail.

A. Dataset
For the evaluation of keypoint detectors and descriptors and of our proposed approach, we utilized the published open-source dataset for multi-modal LiDAR sensing [1]. The dataset covers various LiDARs; among them, the Ouster LiDAR provides not only point clouds but also its generated images. The Ouster LiDAR used in the dataset is the OS0-128, with its detailed specifications shown in Table III. The images generated by the OS0-128, shown in Fig. 2, include signal images, reflectivity images, near-infrared images, and range images, with an expansive 360° × 90° field of view. Signal images represent the signal strength of the light returned to the sensor for a given point, which depends on various factors, such as the angle of incidence, the distance from the sensor, and the material properties of the object. In near-infrared images, each pixel's intensity represents the amount of detected photons that are not emitted by the sensor's own laser pulse but may come from sources such as sunlight or moonlight. Every pixel in a reflectivity image represents the calculated calibrated reflectivity. Finally, range images show the distance from the sensor to objects in the environment.
As indicated by the findings of our previous research, signal images have exhibited superior performance in conventional DL tasks within the domain of computer vision [26]. In light of this, for the first two parts of our experiment, we opt to employ signal images from the "indoor 01 square" scene provided by the dataset, which spans 114 seconds and comprises 1146 image messages.

B. Optimal Preprocessing Configuration Searching for LiDAR-Generated Images
The LiDAR-generated images at hand are typically panoramic but low-resolution. Moreover, these images often exhibit a substantial degree of noise. This raises a concern about using the original images to evaluate keypoint detector and descriptor algorithms, and our preliminary experiments indeed showed unsatisfactory performance across an array of detectors and descriptors when employing the unaltered original LiDAR-generated images. To identify the optimal resolution and interpolation method for resizing the images, an extensive comparative experiment was conducted.
In this part, we apply an array of interpolation techniques to the original images over a wide range of image resolution combinations. The interpolation methods encompass bicubic interpolation (CUBIC), Lanczos interpolation over an 8x8 neighborhood (LANCZOS4), resampling using pixel area relation (AREA), nearest neighbor interpolation (NEAREST), and bilinear interpolation (LINEAR). The primary procedure of the preprocessing is given in Algorithm 1.
More specifically, we iterate over a range of image dimensions and interpolation methods in conjunction with the suite of detector and descriptor algorithms designated for evaluation. Each iteration evaluates the full set of metrics detailed in Table II, and we then compute mean values for these metrics. This assessment aims to identify the preprocessing configuration that offers a balanced performance across different keypoint detectors and descriptors.
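The configuration search described above can be sketched as a plain grid search. This is an illustrative outline, not a reproduction of Algorithm 1: the `evaluate` callback, the metric key names, and the ranking rule (sum of mean Match Score and mean Distinctiveness) are assumptions introduced here.

```python
from itertools import product
from statistics import mean


def search_preprocessing(sizes, interpolations, pairs, evaluate):
    """Return the best (size, interpolation) and the mean metrics per config.

    evaluate(size, interp, pair) is a hypothetical callback that resizes the
    images, runs one detector-descriptor pair, and returns a dict of metrics
    containing at least "match_score" and "distinctiveness".
    """
    results = {}
    for size, interp in product(sizes, interpolations):
        scores = [evaluate(size, interp, pair) for pair in pairs]
        # Average every metric over all detector-descriptor pairs.
        results[(size, interp)] = {
            key: mean(s[key] for s in scores) for key in scores[0]
        }

    # Assumed ranking rule: favor the two accuracy-related metrics.
    def rank(cfg):
        return results[cfg]["match_score"] + results[cfg]["distinctiveness"]

    best = max(results, key=rank)
    return best, results
```

In practice, `evaluate` would wrap the image resizing and the detection, description, and matching pipeline; keeping it as a callback separates the search loop from any particular vision library.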

C. Keypoint Detectors and Descriptors for LiDAR-Generated Images
The evaluation workflow of detector-descriptor algorithms typically comprises three stages: feature extraction, keypoint description, and keypoint matching between successive image frames. This section elaborates on the specific procedures for executing these stages in our experimental setup.
1) Designated Keypoint Detector and Descriptor: An extensive array of keypoint detectors and descriptors, as detailed in Table I from Section II-A, were investigated. The employed keypoint detectors include SHITOMASI, HARRIS, FAST, BRISK, SIFT, SURF, AKAZE, and ORB. Additionally, we integrated Superpoint, a DL-based keypoint detector, into our methodology. The keypoint descriptors implemented in our experiment are BRISK, SIFT, SURF, BRIEF, FREAK, AKAZE, and ORB.
2) Key Points Matching between Images: Keypoint matching, the final stage of the detector-descriptor workflow, focuses on correlating key points between two images, which is essential for establishing spatial relationships and forming a coherent scene understanding. The smaller the descriptor distance between two points, the more likely it is that they are the same point or object in the two images. In our implementation, we employ a technique termed "brute-force match with cross check": for a given descriptor D_A in image A and another descriptor D_B in image B, a valid correspondence requires that each descriptor is the other's closest descriptor.
3) Selected Evaluation Metrics: As explained in Section II-C, we have opted not to rely on ground-truth-based evaluation methodologies due to the lack of benchmark datasets and the substantial labor involved in data labeling. Instead, we combined some specially designed metrics that are independent of ground truth with several intuitive metrics to form the complete set of indicators listed in Table II. To the best of our knowledge, this represents the most extensive set of evaluation metrics currently available in the absence of a benchmark dataset.
4) Evaluation Process: Algorithm 2 outlines the steps carried out by the program. Two nested loops iterate over the different detector-descriptor pairs. For each image, the algorithm detects and describes its key points. If more than one image has been processed, key points from the current image are matched to those of the previous one, and the resulting metrics are recorded to assess the algorithm's performance.
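The mutual-nearest-neighbor condition behind "brute-force match with cross check" can be sketched as follows. This is a minimal pure-Python illustration that assumes binary descriptors stored as integers and compared with the Hamming distance; the function names are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def cross_check_match(desc_a, desc_b, dist=hamming):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are each other's
    nearest neighbors under the given distance function."""
    if not desc_a or not desc_b:
        return []
    # Nearest descriptor in B for every descriptor in A (brute force).
    best_for_a = [
        min(range(len(desc_b)), key=lambda j: dist(a, desc_b[j]))
        for a in desc_a
    ]
    # Nearest descriptor in A for every descriptor in B.
    best_for_b = [
        min(range(len(desc_a)), key=lambda i: dist(desc_a[i], b))
        for b in desc_b
    ]
    # Keep a pair only if the choice is mutual.
    return [(i, j) for i, j in enumerate(best_for_a) if best_for_b[j] == i]
```

The cross check discards one-sided matches at the cost of a second brute-force pass, which tends to reduce spurious correspondences without requiring a distance threshold.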

D. LiDAR-Generated Image Keypoints Assisted Point Cloud Registration
1) Selected Data: The data selected for the evaluation from the dataset mentioned in Section III-A include indoor and outdoor environments. The outdoor environments are a normal road, denoted as "Open road", and a forest, denoted as "Forest". The indoor data include a hall in a building, denoted as "Hall (large)", and two rooms, denoted as "Lab space (hard)" and "Lab space (easy)".
2) Point Cloud Matching Approach: In this part, we applied KISS-ICP as our point cloud matching approach. It also provides odometry information, which allows us to assess the efficacy of our point cloud downsampling approach by examining the positioning error, namely the translation error and rotation error. To generalize our proposed approach, we also tested a simple NDT-based SLAM program.
3) Proposed Method for Point Cloud Downsampling: Following the preprocessing of LiDAR-generated images outlined in Section III-C, we derive optimal configurations for the keypoint detectors and descriptors. Utilizing these configurations as a foundation, we establish the workflow of our proposed methodology, illustrated in Fig. 3. Within this process, we conduct distinct preprocessing procedures for the range and signal images, employing them individually for keypoint detection and descriptor extraction. Subsequently, we combine the key points obtained from both images and search for the K nearest points to each of these key points. We systematically varied K from 3 to 7, with a maximum of 7 to remain aligned with our primary objective of downsampling the point cloud. Finally, we locate the points corresponding to the key points and their neighbors within the raw point cloud, which together constitute the downsampled point cloud.
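The selection of points around each key point can be sketched as below. This is a simplified illustration under several assumptions that are not stated in the text: the point cloud is organized so that point index = row * width + column of the (destaggered) image, key points are already expressed in original-image pixel coordinates, the neighborhood is a k × k pixel window, and columns wrap around because the image is panoramic. Function names are hypothetical.

```python
def downsample_indices(keypoints, k, width, height):
    """Indices of the points covered by a k x k window around each key point.

    keypoints: iterable of (col, row) pixel coordinates in the original image.
    Assumes an organized cloud where point index = row * width + col.
    """
    selected = set()
    half = k // 2
    for col, row in keypoints:
        c0 = int(round(col)) - half
        r0 = int(round(row)) - half
        for r in range(r0, r0 + k):
            if not 0 <= r < height:
                continue  # rows outside the vertical field of view are skipped
            for c in range(c0, c0 + k):
                # Assumed wrap-around: the image spans 360 degrees horizontally.
                selected.add(r * width + (c % width))
    return sorted(selected)


def downsample(points, keypoints, k, width, height):
    """Return the subset of an organized point list selected by the key points."""
    return [points[i] for i in downsample_indices(keypoints, k, width, height)]
```

Because overlapping windows are merged through a set, the downsampled cloud contains each point at most once, even when key points from the signal and range images coincide.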
In our analysis, we examined not only the positional error but also the rotational error, computational resource utilization, downsampling-induced alterations in point cloud density, and the publishing rate of LO.
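The positional and rotational errors mentioned above are not defined explicitly in the text. A common choice, shown here as an assumption rather than the exact evaluation protocol, is the Euclidean distance between estimated and reference positions and the angle of the relative rotation between estimated and reference orientations. Function names are hypothetical.

```python
import math


def translation_error(t_est, t_ref):
    """Euclidean distance between estimated and reference positions."""
    return math.dist(t_est, t_ref)


def rotation_error_deg(R_est, R_ref):
    """Angle (degrees) of the relative rotation R_ref^T R_est.

    Rotations are 3x3 nested lists. Uses trace(R_ref^T R_est), which equals
    the sum of element-wise products of the two matrices.
    """
    trace = sum(R_ref[i][j] * R_est[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against small numerical overshoot.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))
```

For example, comparing the identity with a 90° rotation about the z-axis gives a rotation error of 90°.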

E. Hardware and Software Information
Our experiments were run on ROS Noetic on Ubuntu 20.04. The platform is equipped with an i7 8-core 1.6 GHz CPU and an Nvidia GeForce MX150 graphics card. We primarily used libraries such as OpenCV and PCL. Note that we used some non-free, copyright-protected algorithms from OpenCV, such as SURF, for research purposes only. The assessment of keypoint-based point cloud downsampling was conducted on a Lenovo Legion notebook with the following specifications: 16 GB RAM, a 6-core Intel i5-9300H processor (2.40 GHz), and an Nvidia GTX 1660Ti graphics card (1536 CUDA cores and 6 GB VRAM). Within this study, our primary focus was on the evaluation of the two open-source algorithms described in Section III-D2, namely KISS-ICP and Simple-NDT-SLAM. A consistent voxel size of 0.2 m was employed for both algorithms. Our project is primarily written in C++ (including the DL approach, Superpoint) and is publicly available on GitHub (https://github.com/TIERS/ws-lidar-as-camera-odom).

IV. EXPERIMENT RESULT
In this section, we first cover the final results of our exploration of the preprocessing workflow for LiDAR-based images. Subsequently, an in-depth analysis of keypoint detectors and descriptors for LiDAR-based images is conducted. Then, a detailed quantitative assessment of the performance of LO facilitated by LiDAR-generated image keypoints is presented.

A. Results of preprocessing methods for LiDAR-generated image
As elucidated in Section II-C, Distinctiveness and Match Score are considered the paramount measures of the overall accuracy of the entire algorithm pipeline. Consequently, in scenarios where different sizes and interpolation methods show peak performance on different metrics, these two metrics are our primary concern. Based on this criterion, the size 1024 × 64 demonstrated better performance across all detector and descriptor methods. In Table IV, our evaluation also revealed that the linear interpolation method yielded the best results among the interpolation techniques. The findings in Table V also suggest that there is a clear advantage in moderately reducing the size of an image as opposed to enlarging it. Additionally, when an image is downscaled, one pixel corresponds to several pixels in the original image, so overly downscaled images might lead to substantial deviations in the detected key points when they are reprojected to their original positions. Extreme image size reductions should therefore be avoided.
A more intuitive result shows why reducing the size of an image is far better than enlarging it. In Fig. 4a and Fig. 4b, key points identified by the Superpoint detector are shown as green dots. The enlarged image (Fig. 4a) displays many disorganized points. Conversely, the downscaled image (Fig. 4b) reveals distinct key points, such as room corners and the points where various planes of objects meet. Note that the two images were resized for readability in the paper; their original sizes differ.
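The reprojection concern noted in this subsection, namely that one pixel of a downscaled image covers several original pixels, can be made concrete with a small sketch. The pixel-center alignment convention and the example sizes in the usage note are assumptions for illustration; the function names are hypothetical.

```python
def reproject_keypoint(x, y, resized_size, original_size):
    """Map a key point from the resized image back to original-image pixel
    coordinates. Sizes are (width, height); pixel centers are aligned."""
    sx = original_size[0] / resized_size[0]
    sy = original_size[1] / resized_size[1]
    return ((x + 0.5) * sx - 0.5, (y + 0.5) * sy - 0.5)


def footprint(resized_size, original_size):
    """Number of original pixels covered by one resized pixel along x and y."""
    return (original_size[0] / resized_size[0],
            original_size[1] / resized_size[1])
```

For instance, assuming a hypothetical 2048 × 128 original image, a key point at (10, 5) in a 1024 × 64 image maps to (20.5, 10.5), and each resized pixel covers a 2 × 2 block of original pixels. The footprint grows with stronger downscaling, which is why the localization uncertainty of reprojected key points increases.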

B. Results of Keypoint Detectors and Descriptors For LiDAR Image
Fig. 5 shows the Number of Keypoints, a metric that relates only to detectors. The FAST and BRISK algorithms detected the highest number of key points, but with significant fluctuations in the counts. Comparatively, AKAZE, ORB, and Superpoint identified fewer key points, but with notable consistency. Fig. 6 depicts the Computational Efficiency, where the majority of the algorithms run in less than 50 ms. With CUDA enabled, SuperPoint runs significantly faster with minimal variance. Among all algorithms, BRISK is the most time-consuming; even using BRISK solely as a descriptor with other detectors hinders the overall efficiency. Fig. 7 shows the Robustness of Detector. Superpoint consistently demonstrates robust performance across various transformations. Among conventional detectors, AKAZE has proven effective, especially in handling rotation and noise interference. Most detectors exhibit markedly poor performance under scaling. The horizontal textures inherent in LiDAR-based images might explain this weakness: when the images are enlarged, these textures can be erroneously detected as key points.
As emphasized in Section II-C, a multitude of keypoint detections and rapid matches could be useless if their accuracy is not guaranteed. Therefore, Match Ratio, Match Score, and Distinctiveness, which pertain to algorithmic accuracy, can be regarded as the most pivotal indicators across various application perspectives.
The disparity between the key points extracted from these two types of images shows the significance of using different LiDAR-generated images. Additionally, in the preliminary evaluation of LO, we found that the accuracy of LO is lower if we integrate only the signal images instead of both modalities. This encourages us to utilize both signal and range images in the latter part.
2) LO based Evaluation: In our experiment, various numbers of neighbor points are utilized, ranging from 3 to 7 for each type of LiDAR-generated image. We report a subset of these configurations here, favoring those that are more accurate while retaining fewer points. As found in the previous section, Superpoint detects reliable key points, so we use this DL method to extract key points in our proposed approach, with KISS-ICP as the point cloud registration and LO method. As shown in Table VI, in the Open road, Lab space (hard), and Hall (large) scenarios, LO from KISS-ICP applied to the raw point cloud does not work properly: the drift is so large that the error cannot be calculated. Meanwhile, our proposed approach works in all scenarios. Additionally, even when KISS-ICP works on the raw point cloud, our approach achieves comparable translation estimation while being more robust in rotation estimation in most situations.
In outdoor settings, a neighbor size 4 7 (4 × 4 for signal images and 7 × 7 for range images) exhibits notable efficacy in both translation and rotation state estimation.Conversely, in indoor environments, a neighbor size 5 5 (5 × 5 for signal images and 5×5 for range images) demonstrates commendable performance in the estimation of translation and rotation states, in addition to exhibiting efficient downsampling capabilities, as delineated in Table VI.
Based on the above result, we apply the neighbor size 4 7 for outdoor settings and the neighbor size 5 5 for indoor settings to further extend the performance evaluation by including the conventional keypoint detector approach and another point cloud matching approach, NDT.It is worth noting that the purpose of applying NDT here is not to compare with KISS-ICP but to show the generalization of our proposed approach among other point cloud registration methods.
The results in Table VIII and Table IX show that the conventional keypoint extractor can achieve comparable LO translation estimation and more accurate rotation estimation compared with SuperPoint. This performance is obtained with much lower CPU and memory utilization and fewer cloud points, but with higher odometry publishing rates. Similar results are achieved by the NDT-based approach, which supports the above result to some extent. Notably, the memory consumption of KISS-ICP with the raw point cloud in Table VIII is lower than the others. As our observation indicates, the primary reason is the drift, which results in few points for the point cloud registration.

V. CONCLUSION AND FUTURE WORK
To mitigate computational overhead while retaining a sufficient number of dependable key points for point cloud registration in LO, this study introduces a novel approach that incorporates LiDAR-generated images. A comprehensive analysis of keypoint detectors and descriptors, originally designed for conventional images, is conducted on LiDAR-generated images. This not only informs subsequent sections of this paper but also sets the stage for future research aimed at enhancing the robustness and resilience of LO and SLAM technology. Building upon the insights from this analysis, we propose a methodology for downsampling the raw point cloud while preserving the salient points. Our experiments demonstrate that the proposed approach performs comparably to using the complete raw point cloud and, notably, surpasses it in scenarios where the full raw point cloud proves ineffective, such as in cases of drift. Additionally, our approach is robust to rotational transformations. Its computational overhead is lower than that of LO using the raw point cloud, and its odometry publishing rate is higher.
In future work, there is potential to integrate the current LiDAR-generated image keypoint extraction process into the broader SLAM pipeline. For instance, one avenue of exploration could involve combining features extracted from LiDAR-generated images with those derived from point cloud data, facilitating the development of a lightweight SLAM system complemented by additional sensors, such as an IMU.

Fig. 1: Samples of LiDAR odometry results run in our experiment

Fig. 2: Samples of LiDAR-generated images. From top to bottom: signal image, range image, reflectivity image, and point cloud.

Fig. 3: The process of the proposed LiDAR-generated images assisted point cloud registration
(a) Detect key points in an enlarged image. (b) Detect key points in a downscaled image.

Fig. 5: Number of key points

Fig. 6 depicts the Computational Efficiency, where the majority of the algorithms operate in less than 50 ms. With CUDA enabled, SuperPoint runs significantly faster with minimal variance. Among all algorithms, BRISK is the most time-consuming; even using BRISK solely as a descriptor with other detectors hinders the overall efficiency. Fig. 7 shows the Robustness of Detector. SuperPoint consistently demonstrates robust performance across various transformations. Among conventional detectors, AKAZE has proven effective, especially in handling rotation and noise interference. Most detectors, however, perform markedly poorly under scale changes. The horizontal textures inherent in LiDAR-based images might explain this weakness: when the images are enlarged, these textures can be erroneously detected as keypoints. As emphasized in Section II-C, a multitude of keypoint detections and rapid matches could be useless if their accuracy is not guaranteed. Therefore, Match Ratio, Match Score, and Distinctiveness, which pertain to algorithmic accuracy, can be regarded as the most pivotal indicators across various application perspectives. Fig. ??, Fig. ??, and Fig. ?? present the results of these three metrics, indicating that SuperPoint, when accelerated with CUDA, is the most effective solution. Moreover, among traditional algorithms, AKAZE demonstrates top-tier performance across the majority of evaluated metrics, making it a commendable choice.

C. Results of LiDAR-generated Image Keypoints Assisted LO
1) Downsampled Point Cloud: In Fig. ??, we demonstrate a sample result of the downsampled point cloud in Fig. ?? compared with the raw point cloud in Fig. ??. Notably, in the downsampled point cloud in Fig. ??, the red points are extracted based on signal images and the green ones are from range images. We draw the key points from both images on the signal image shown in the lower part of Fig. ??.

Fig. 6: Computational efficiency

TABLE I: Keypoint detectors and descriptors

TABLE IV: Evaluation metrics under different interpolation approaches.

TABLE V: Evaluation metrics under different resized resolutions.
Table VI shows the performance of LO for different neighbor point sizes in both indoor (Lab space, Hall) and outdoor (Open road and Forest) environments.

TABLE VI: Performance evaluation of LO (KISS-ICP) with the raw point cloud and our downsampled point cloud. 'Sig' and 'Rng' represent the size of the neighboring point areas for the signal and range images, respectively, denoted as Sig Rng.
Table.VIII and Table.IX proves that the conventional keypoint extractor can achieve comparable LO

TABLE VII: The number of points left after downsampling with varied neighbor sizes. 'Sig' and 'Rng' represent the size of the neighboring point areas for the signal and range images, respectively, denoted as Sig Rng.

TABLE VIII: Evaluation of LO based on conventional and DL keypoint detectors with KISS-ICP

TABLE IX: Evaluation of LO based on conventional and DL keypoint detectors with NDT