LIDAR Scan Matching in Off-Road Environments

: Accurately matching the LIDAR scans is a critical step for an Autonomous Land Vehicle (ALV). Whilst most previous works have focused on the urban environment, this paper focuses on the off-road environment. Due to the lack of a publicly available dataset for algorithm comparison, a dataset containing LIDAR pairs with varying amounts of offsets in off-road environments is ﬁrstly constructed. Several popular scan matching approaches are then evaluated using this dataset. Results indicate that global approaches, such as Correlative Scan Matching (CSM), perform best on large offset datasets, whilst local scan matching approaches perform better on small offset datasets. To combine the merits of both approaches, a two-stage fusion algorithm is designed. In the ﬁrst stage, several transformation candidates are sampled from the score map of the CSM algorithm. Local scan matching approaches then start from these transformation candidates to obtain the ﬁnal results. Four performance indicators are also designed to select the best transformation. Experiments on a real-world dataset prove the effectiveness of the proposed approach.


Introduction
Simultaneous Localization and Mapping (SLAM) has always been a hot research topic in the field of autonomous driving. One of the core modules behind SLAM is the ability to accurately register two scans captured in nearby positions. LIDAR is among the most popular sensors for an Autonomous Land Vehicle (ALV) due to its highly precise range measurements [1,2]. Whilst most of the previous works have focused on urban environments, this paper focuses on the problem of registering two LIDAR scans captured in an off-road environment.
There have been a lot of research on LIDAR point cloud registration, and many approaches have been proposed, such as Generalized-ICP (GICP) [3,4], Normal Distribution Transform (NDT) [5,6], LIDAR Odometry and Mapping (LOAM) [7], Correlative Scan Matching (CSM) [8], etc. However, those approaches are mostly designed for urban environments. In the off-road environment, some new challenges might be encountered: • In off-road environments, the distribution of trees and bushes is random and cluttered, as shown in Figure 1. Therefore, the point's curvature, normal and other geometric features, which are frequently used in existing approaches, are unstable. For example, GICP uses the plane-to-plane distance to describe the proximity of two matched points, which requires calculating the covariance matrix and the surface normal at that point. However, cluttered points will weaken these features and may cause mismatches. The edge point and planar point selected in the LOAM algorithm are also closely related to the point's curvature, which may not be accurately estimated in the off-road environment.
• In off-road environments, the vehicle, as well as the LIDAR attached to it, might shake abruptly. This unexpected movement may change the local distribution of points. Consequently, the performance of NDT may be degraded, as this algorithm requires the distribution of local points to be stable. In this paper, the problem of registering LIDAR scans in an off-road environment is investigated. The main contributions of this paper are as as follows: • A dataset containing a large number of LIDAR pairs with different offsets in an off-road environment is proposed.

•
A rigorous evaluation of existing scan matching approaches is performed on this dataset, and results indicate that CSM [8] performs best on large offset dataset, whilst LOAM performs better on small offset dataset. • A more robust approach combining the advantages of CSM and LOAM is proposed. Experiments on the dataset indicate that the proposed method performs better than existing scan registration approaches.

Related Work
For matching two LIDAR scans, based on how they model the point cloud, existing methods could be categorized into local approaches and global approaches. Whilst local approaches treat the point cloud as an assembly of local segments, global approaches treat the point cloud as a whole.
For local approaches, it focuses on optimizing the distance between corresponding local geometric structures. Iterative Closet Point (ICP) algorithm and its variants [3,4,9] are the defacto baseline approaches for LIDAR point cloud matching. These approaches either use point-to-point, point-to-plane, or plane-to-plane distance as the proximity measure for two local geometric structures. Normal Distribution Transformation (NDT) [5,10] is another LIDAR scan matching approach which models the local geometric structure as a normal distribution. The distance between the local geometric structures is then computed using the mahalanobis distance. For faster iteration and more robust distance modeling, LOAM algorithm [7,11] only selects a subset of point clouds which are either the edge points or the planar points. For the edge points, the point-to-line distance was used, whilst the point-to-plane distance was used for the planar points.
For global approaches [12], the point cloud is considered as a whole. To make the computation tractable, it is often necessary to project the 3D point cloud to a 2D ground plane. In [13], the 3D LIDAR scan was projected onto a 2D orthographic reflectivity map. In [8], the Correlative Scan Matching (CSM) approach was used to match the two-point cloud in 2D. Similarly, a 2D height map is used to represent the 3D point cloud in [14,15]. Chong et al. [16] proposed a 2D map representation which combines reflectivity, height, color, curvature and normal for more robust matching. In [15], the authors utilized both the reflectance intensity map and the height map. The scan matching problem is casted as an optical flow [17] computation problem, and an efficient gradient descend approach is proposed to solve it.
Most of the methods mentioned above are designed and tested in urban environments. In this paper, the focus is on off-road environments. Three typical local scan matching methods: GICP, NDT, LOAM and a popular global method (CSM) will be tested in the following experiments.

Existing Scan Matching Approaches
The problem of scan matching is formally defined as follows. Let P and Q represent the target point cloud and the source point cloud respectively. These two-point clouds are then converted as P = C(P) and Q = C(Q) by some conversion function C. The task of scan matching is to find a Euclidean transformation of T so that T (Q ) is best aligned with P . To improve readability, the superscript of the converted cloud is omitted in the following equations.
Possible forms of the conversion function C include: C 0 : a uniform downsampling of the original point cloud; C 1 : project the 3D point cloud to the ground plane to get 2D height map; C 2 : binarize the projected 2D height map to generate an obstacle map; C 3 : deletes those points that belong to the ground plane. A combination of different conversion functions is also possible, such as C 1 + C 2 + C 3 or C 0 + C 3 , etc.
Let q i represent a point in the source point cloud, p (G) i is its closest geometric structure in the target point cloud. The geometric structure can be points (spheres), lines or planes. In order to improve the robustness, a penalty function ρ(•) is introduced to reduce the effect of outliers on optimization. The energy function of scan matching is usually expressed as: (1)

The Generalized ICP (GICP) Approach
In this approach, it is assumed that each point is generated from its local distribution: , wherep i andq i are the underlying ideal points (with no errors due to occlusion or sampling) s.t.p i = T * (q i ). {C P i } and {C Q i } are the covariance matrices associated with the local measurement points. Therefore, . The distance function can be written as: The GICP algorithm treats the point cloud as a collection of many segmented planes. The term C P i + T C Q i T T contains the covariance matrix of the two surfaces. When C P i = I, C Q i = 0, the above formula is equal to the point-to-point ICP. When where n i is the normal vector at q i , the above equation is equivalent to point-to-plane ICP. When C P i and C Q i represent the covariance matrix at p i and q i , GICP becomes the plane-to-plane ICP.

The Normal Distribution Transformation (NDT) Approach
Unlike GICP, this method divides the point cloud into discrete grid cells, and each grid cell is represented by a Gaussian distribution N (µ i , Σ i ). If a transformed point from the source point cloud falls into the i-th cell, its distance from the cell center can be written as: where Σ i captures the local geometry of the cell. For spherical, line or plane structures, the eigenvalues Compared to GICP, NDT only considers the distribution of target cells, which directly depends on the cell resolution. Due to the discretization of the cell structure, NDT algorithm is usually faster than GICP.

The LIDAR Odometry and Mapping (LOAM) Approach
LOAM [7] is a specifically designed scan registration algorithm. In this approach, only a subset of the original point cloud is selected. According to the local geometric structure, these selected points are termed as edge feature points or plane feature points. For edge feature points, the point-to-line distance is used as the distance metric, whilst the point-to-plane distance is used for the plane feature points.
Previous research has shown that, in urban environments, LOAM is faster and more accurate than NDT or GICP. However, in the off-road environments, the less obvious curvature will result in a reduction in the number of effective feature points. In addition, due to the weaker local geometry, the features at this point will also become less prominent.

The Correlative Scan Matching (CSM) Approach
Unlike all the above local scan matching methods, CSM [8] is a global method that treats the point cloud as a whole. For computational efficiency, the 3D point cloud is usually projected to a 2D grid map. Similar to Equuation (1), the distance function can be written as: In the above equation, k represents different forms of compressed 2D grid maps, including obstacle maps or height maps. p ij represents the i-th row and j-th column of the 2D grid map. The local geometry p (G) ij can be an occupied cell (C 2 ) or a vertical bar with a specific height (C 1 ). As shown in Equation (4), L1 norm is used to calculate the distance between the map grid and the measured grid. The calculation speed can be accelerated through a pre-built lookup table. In order to embed the variance in the distance function, a Gaussian filter is used to pre-blur the target map, as shown in Figure 5a.
In this paper, in order to achieve more robust matching in off-road environment, the original CSM algorithm is expanded to include both 2D obstacle map and 2D height map, as shown in Figures 5 and 6. Although obstacle maps are more suitable for global registration, 2D height maps can capture local differences in occupied grids. Therefore, the likelihood function can be written as: Here, 1 k represents the maximum possible distance of the k-th modality, and ω k represents the weight of the corresponding modality.

An Evaluation of Existing Scan Matching Approaches
To collect data in an off-road environment, a vehicle equipped with a Velodyne 64 LIDAR and an integrated Novatal GNSS/IMU is used. It drives in the suburbs of Beijing. The length of the route is about 10 km. Figure 1 shows some typical scenarios.
Since the environment includes tall bridges and tall trees, even the high-end Novatal GNSS/IMU system can produce positioning errors of up to several meters. Therefore, similar to [15], the graph SLAM [18] algorithm is used to obtain the ground-truth position of each scan. The graph SLAM uses a factor graph representation [19]. The factor graph contains several types of different factors, including wheel odometer factor, scan matching factor, loop closing factor and GNSS/IMU factor [20]. A diagram of the factor graph is shown in Figure 2. To ensure the quality of the factor graph optimizer, the scan matching constraints and the loop closing constraints are manually confirmed. The iSAM-optimized [21] poses are considered as the ground-truth positions for LIDAR scans.

Datasets
A total of about 15,000 scans are collected. A subset of the scan pairs is further selected to generate four datasets representing different levels of scan matching difficulty: In all the above four datasets, a zero-mean Gaussian noise with variance of 3 degrees is added to the azimuth angle of the ground-truth. Each dataset contains about 271 scan pairs. The experiments are repeated for 8 times, resulting in a total of 2168 trials on each dataset.

Experimental Results
The aforementioned algorithms were tested on the dataset and the results are shown in Figures 3 and 4. By comparing the performance of NDT, GICP, CSM with NDT+C 3 , GICP+C 3 , CSM+C 3 , it is observed that for GICP and NDT, when the position offset is small (on Dataset#0.5 and Dataset#2), the ground points can be safely removed. However, when the offset is large (on Dataset#5 and Dataset#10), retaining the ground points is a better choice. Since the ground points pose constraints in the vertical direction, these constraints facilitate the convergence of the 3D registration approaches (GICP and LOAM) to the correct transformation, thereby improving the overall performance. For CSM, the ground points should always be removed, but for LOAM it is not necessary to do so. Because LOAM is specifically designed to execute on the entire point cloud, it will automatically select the point-to-plane distance as the distance metric for points on the ground plane. In Dataset#0.5, LOAM obtains the highest accuracy. However, its performance drops rapidly on Dataset#2, Dataset#5 and Dataset#10. This suggests that LOAM is very sensitive to the initial transformation. As long as the initial transformation is quite close to the ground-truth, LOAM will output the correct result.  CSM achieves the highest accuracy on Dataset#2, Dataset#5 and Dataset#10. This is mainly because CSM searches in a relatively large area. As long as the ground-truth position is within the search radius, CSM is likely to find the correct position.
Due to the complementary characteristics of CSM and LOAM, a natural idea is to use CSM in combination with LOAM to improve robustness and accuracy. It is also observed that in Dataset#10, the accuracy of all the tested algorithms drops rapidly. This is mainly because the dense and cluttered weeds can easily cause the scan matching algorithm to fall into a local minimum. In order to overcome this problem, it is necessary to design an evaluation system for evaluating matching performance. In the following section, the method of combining CSM with LOAM and how to evaluate the performance of the LOAM algorithm will be described.

Post-Process of Global Scan Matching
When it is confident in registering with the global scan registration (CSM) method, the matching result T c can be directly passed to LOAM as the initial transformation, as shown in Figure 5d. If not, a sampling based on the matching uncertainty needs to be performed. The sampling operator will generate several high-confidence transformations, which will be passed to the local matcher as shown in Figure 6d. The matching uncertainty of the global method is defined as follows.  In the CSM algorithm, each transformation candidate T i is associated with a matching score S i . Therefore, for each transformation T ri , a score map S ri will be obtained. The size of the score map is the same as the search window as shown in Figures 5d and 6d.
Each grid cell of the score map will be normalized to [0, 1]. For each score map S ri , the high confidence posterior T λ,ri is defined as a set of poses with scores greater than λ, as shown in Equation (6). Here, the size of T λ,ri reflects the translation uncertainty, which is defined in Equation (7). This uncertainty is rotation-specific, and needs to be calculated for each rotation angle. To reduce the computational burden, the different U λt are fused together and only the one with the highest score is retained, as shown in Equation (8). If the translation uncertainty is greater than a threshold T λ , several poses will be sampled from the high confidence region T λ,ri .

Evaluation of Local Scan Matching Performance
Depending on the matching uncertainty of the CSM, there may be several possible transformations passed to LOAM. LOAM will then start from these initial transformations and may converge to different results. To choose the best result, some criterions need to designed to judge which result is better.
It is noticed that the following four parameters are very relevant to the performance of LOAM: the number of matched feature points N p , the optimized objective function (i.e., the residual distance error) J, and the number of iterations I t . Therefore, based on these parameters, four evaluation indicators { f k } are designed as follows: Here, f 1 refers to the average matching error. f 2 is an enhanced version of f 1 , which promotes a larger N p value. f 3 describes the convergence rate. It is reasonable to believe that the faster LOAM converges, the more likely it converges to the correct position. Similarly, f 4 is an enhanced version of f 3 , which also promote a larger number of matched points.
Define f k,i as the matching performance value of the i-th candidate. After matching all N c candidates, a performance array F k : F k = { f k,i } is generated. The corresponding index array I k is then obtained by sorting F k . A higher rank indicates a better matching performance. All the candidates are then ranked according to the four proposed performance indicators. The ranking results are then fused together:Ŝ where ω f k refers to the weight of the k-th matching performance indicator. To determine these weight parameters, 10 scan pairs are randomly selected from Dataset#5. For each scan pair, the local scan registration is performed multiple times from different initial transformations. If the registration result is very close to the ground-truth (translation error < 50 cm, rotation error < 0.5 degree), it is considered to be a correct match, otherwise it is considered as a wrong match. Ideally, the scores for the correct matches should be greater than the wrong matches. Although this problem can be cast as a learning to rank problem [22], a simpler method is adopted in this paper. We randomly select 100 pairs of correct/wrong matches, and count the number of times when the correct match score is larger than the wrong match score. This count value is used as the optimization criterion for selecting the best weight. Since there are only four weight parameters, we simply use a grid search to find the best weight. Assume that each weight lies between 0 and 2. The step size of the grid search is set to 0.1. Finally, these weights are set as follows: The whole algorithm is described in Algorithm 1.

Algorithm 1
The proposed scan matching algorithm. Generate the score map S ri and the high-belief posteriori T λ,ri according to Equation (6); 5: Calculate the translation uncertainty U λt,ri according to Equations (7) and (8); 6: if U λt,ri is larger than some threshold T λ then 7: Perform sampling in high-belief posteriori region and generate the new initial rigid transformation set {T i } for the 3D matcher; 8: for all candidates in {T i } do 9: Sort each performance array F k (ascending) and generate the correspondent index array I k ; 13: Calculate the i-th candidate's synthesis scoreŜ i according to Equation (10); 14: Select the best candidate with maximum score: T * ; 15: else 16: Pass T c to the 3D matcher as the initial transformation; Align Q to P based on LOAM algorithm with initial T c , and generate the matched transformation T * 17: end if 18: Return T *

Experimental Results
The proposed method is tested on Dataset#2, Dataset#5 and Dataset#10. The proposed method is called CSM+LOAM*, in comparison to CSM+LOAM which has no sampling step and only pass one transformation is passed from CSM to LOAM. CSM is also combined with GICP and NDT. The results are shown in Figure 7. As can be seen from Figure 7, the proposed method performs best on all datasets. Since CSM+LOAM* is more accurate than CSM+LOAM, it proves that the proposed sampling and evaluation mechanism is effective.
Comparing Figures 3 and 4 with Figure 7, it can be noticed that the performance of the combined method, including CSM+GICP and CSM+NDT, is much better than the original GICP or NDT, especially in Dataset#5 and Dataset#10. This shows that the proposed sampling and evaluation techniques can be generally adopted to improve the performance of any local scan matching method.

Computational Complexity Analysis
The proposed 'CSM+ * ' (where ' * ' represents any local scan registration approaches) approach mainly consists of two parts: the CSM part and the local scan registration part. Other than that, the computation of the sampling and the evaluation part is neglectable. Let t 0 and t 1 represent the computation time of the CSM algorithm and the local scan registration algorithm respectively. n represents the number of sampled transformations passed from CSM to local scan registration approach. Therefore, the overall computational time of the proposed 'CSM+ * ' will be t 0 + n * t 1 . It could be further accelerated to t 0 + t 1 if the n local scan matchers run in parallel.

Conclusions
This paper studies the problem of scan registration in an off-road environment. Several scan registration approaches, both local and global methods, were evaluated on a newly created dataset. Experimental results indicate that, whilst local scan registration approaches, i.e., LOAM, performs best on small offset dataset, a global scan registration approach, i.e., CSM, performs better on large offset dataset. To combine the advantages of both approaches, a sampling and an evaluation algorithm is proposed. Further experiments show that the sampling and evaluation techniques can improve the performance of any local scan registration approaches, thus making it a universal method.