Large-Scale Outdoor SLAM Based on 2D Lidar

: For autonomous driving, it is important to navigate in an unknown environment. In this paper, we propose a fully automated 2D simultaneous localization and mapping (SLAM) system based on lidar working in large-scale outdoor environments. To improve the accuracy and robustness of the scan matching module, an improved Correlative Scan Matching (CSM) algorithm is proposed. For efﬁcient place recognition, we design an AdaBoost based loop closure detection algorithm which can efﬁciently reject false loop closures. For the SLAM back-end, we propose a light-weight graph optimization algorithm based on incremental smoothing and mapping (iSAM). The performance of our system is veriﬁed on various large-scale datasets including our real-world datasets and the KITTI odometry benchmark. Further comparisons to the state-of-the-art approaches indicate that our system is competitive with established techniques.


Introduction
Simultaneous localization and mapping (SLAM) is one of the key problems for an autonomous mobile robot and has been actively researched for over two decades [1]. As a fundamental module of mobile robot, SLAM has wide application domains ranging from exploration to autonomous driving. While other sensor modalities such as IMU, camera, GPS or RGBD sensors are commonly used, SLAM based on lidar sensor has been very popular, in part because lidar sensor has the advantages of high precision and strong resistance to interference, and is robust to illumination variations. In particular, in this work we focus on large-scale outdoor SLAM based on 2D lidar sensor. 2D lidar sensor enjoys the merits of both high-accuracy range measurement and low-cost compared to the expensive 3D lidar scanners, making it a more suitable choice for cost-sensitive application scenarios.
Typically, a lidar SLAM system consists of a lidar odometry front-end and a back-end that performs the pose graph optimization to obtain a globally optimized map and the full trajectory of the robot. Besides, loop closure detection plays an important role that provides constraints between poses that were captured at the same place but at different times. For the front-end, Iterative Closest Point (ICP) [2], Normalized Distribution Transform (NDT) [3] and Correlative Scan Matcher (CSM) [4] are typical algorithms that were commonly used for 2D lidar scan matching. In particular, ICP and NDT are efficient local scan matching algorithms that depend on an accurate initial guess, but may get stuck in a local optimum if the initial guess is far from the ground-truth. On the contrary, CSM is a global scan matching algorithm that could find the global optimum at a fixed resolution in a probabilistic framework. Various approaches [5,6] have been proposed to improve the performance of CSM for 2D lidar scan matching. Previous works have obtained satisfactory results in relatively small indoor environment, whilst the application in large-scale outdoor environment is still a challenging task.
The primary challenges in designing a 2D lidar SLAM system for large-scale outdoor environment lie in three aspects: Firstly, the front-end should be designed computationally-efficient and robust to initialization error, generating a good prior for loop closure detection and global optimization, especially in low-texture and dynamic environment. Secondly, an efficient loop closure detection algorithm is needed to reduce accumulation drift when loops are detected. This is of particular importance in large-scale settings. However, most existing loop closure detection algorithms are designed for visual SLAM systems [7][8][9][10], and little attention has been given to 2D lidar SLAM systems [11][12][13]. Finally, an accurate and robust optimization back-end is needed to reject false loop closures.
To tackle the above challenges, this paper makes the following three contributions: (1) Firstly, a modified CSM algorithm is proposed to improve the accuracy and robustness of the front-end scan matching algorithm, especially in low-texture and dynamic environment. (2) Secondly, we propose an AdaBoost based loop closure detection algorithm and a false loop closure rejection algorithm which work efficiently to perform place recognition. (3) Thirdly, we propose a light-weight back-end optimization algorithm that works in real-time.
The optimization results are also utilized to eliminate false loop closures.
The remainder of this paper is organized as follows. In Section 2, some related works are introduced. The details of the implementation of our approach are presented in Section 3. Section 4 provides the experimental results and comparisons with state-of-the-art approaches on both publicly available datasets and our own collected datasets. Finally, the conclusions and future works are summarized in Section 5.

Scan Matching Approaches
The aim of scan matching [14] is to estimate the rigid-body transformation between two laser scans. The Iterative Closest Point (ICP) [2] and Iterative Closest Line [15] are the most popular scan matching algorithms which estimate the relative transform by finding the nearest points between the two laser scans. Zhang et al. [16] proposed a low-drift and real-time lidar odometry and mapping (LOAM) algorithm, which selects feature points based on the roughness of the point's local region. The feature points are then matched using either point-to-line or point-to-plane distances. There have been other scan matching approaches, including NDT [3], hough transform-based approaches [17] and histogram-based approaches [18,19]. All these approaches rely on some model assumptions, and may get stuck in a local optimum, especially in large-scale, low-texture and dynamic environments.
Olson et al. [4] proposed the Correlative Scan Matcher (CSM) to search for the best rigid transformation in a probabilistic framework. Compared with other scan matching approaches, CSM is robust to large initialization errors while running in real-time by utilizing the multi-resolution lookup-table rasterization technique. Chong et al. [5] proposed the Extended Correlative Scan Matcher (ECSM) which improves the scan matching accuracy with surface normal information.

Loop Closure Detection
There have been works based on Monte Carlo localization [20] and Rao-Blackwellized filtering [21] to tackle the 2D lidar loop closure detection problem. Recent works on loop closure detection can be roughly classified into three categories: feature based approaches, optimization based approaches and learning based approaches [13].
Bosse et al. [22] proposed to use orientation histogram and projection histogram for loop closure detection. Himstedt et al. [23] augmented the original Fast Laser Interest Point Transform (FLIRT) [24] descriptor with the orientation and distances of landmarks to accelerate place recognition. The approach of cartographer [12] firstly built submaps and then aligned scans to submaps to generate closed loops by a branch-and-bound approach. Li et al. [13] utilized deep learning to perform loop closure detection in indoor environment. The most similar work to our approach is the method proposed in [11], which treat the loop closure detection problem as a classification problem, and use AdaBoost to detect loop closures.

Methodology
As shown in Figure 1, our full system mainly contains three parts: The front-end (Section 3.1), the loop closure detection part (Section 3.2) and the back-end (Section 3.3). The front-end is in charge of key frame selection, data preprocessing, point cloud feature extraction and scan matching. The loop closure detection part is responsible for the detection and validation of the loop closures. The back-end optimizes vehicle poses and global map based on the loop closure detection results and the scan matching results obtained from the front-end. Unlike traditional descriptions, we separate the loop closure part from the front-end to emphasize its importance, especially in mapping large-scale outdoor environment.

Probabilistic Formulation and CSM Overview
Given two lidar scans, CSM is applied to calculate the optimal rigid body transform between the two lidar poses. The scan matching problem can be described in a probabilistic framework. Let the vehicle move from x i−1 to x i , with motion u obtained from the IMU and the wheel encoder. The goal of scan matching is to find the maximum posterior probability p(x i |x i−1 , u, m, z), where m is the environment model and z is the current observation. According to Bayes rule, we have Supposing every lidar observation z j is independent, the observation model can be written as In the context of scan matching, we set the target scan as P, and the source scan as Q. Let T be the rigid body transform that transforms points in Q into P's coordinate. The observation model can be written as Based on Equations (2) and (3), the problem of finding the optimal transformation T * is thus equivalent to search for the maximum log likelihood of the observation model where P raster stands for the grid map rasterized from the target scan P and Q i (i ∈ N) is a single point of scan Q. T corresponds to a 3*3 homogeneous matrix, defined as

The Improved Correlative Scan Matcher
The multi-resolution CSM proposed in [4][5][6] is described in Algorithm 1. This approach could obtain satisfactory results in most structured environment; however, it will encounter difficulties in handling large-scale, low-texture, and dynamic environment, especially for corridors and tunnels. Based on this algorithm, we make the following improvements:

Algorithm 1 The multi-resolution CSM algorithm
Require: Target scan P, Source scan Q, Initial guess T 0 1: Rasterization of lookup table: L high , L low 2: Initialization of scan matching score and transformation: S best (1) Firstly, we improved the rasterization of the lookup table. Instead of rasterizing the single target scan, we utilize historical multi-frame to generate a more robust and more comprehensive rasterized cost table. The multi-frame rasterization algorithm is summarized in Algorithm 2, and the comparison with traditional rasterization approach is shown in Figure 2. To increase the accuracy of the raster table, a weighted sum of historical raster tables can be used, as shown in line 7 of Algorithm 2. A scan that is far away from the current scan will be set a smaller weight. Transform historical scan χ i to coordinate t − 1: 4: for j = 1 to n i do 6: Rasterization of χ i : 7: end if 11: end for 13: end for 14: return L(χ t−1 ) (2) Secondly, the covariance of the transformation T is utilized to recognize scenes that have high matching uncertainty, such as corridors or tunnels. Similar to [4], we calculate the mean and the covariance for T as follows: Suppose T i is a particle corresponding to a single transform along with its weight S i , {T i , S i } is the posterior distribution of T . Let ∑ be the covariance of transformation T , we have Let ∑ x , ∑ y , ∑ z be the diagonal values of ∑ , we define ∑ l and ∑ s as We further define k as the ratio of ∑ s and ∑ l The comparative results of the k-value in different scenes are shown in Figure 3. As shown, scenes that are well constrained in both x and y directions have a value of k nearby 1.0. And scenes like corridors have small values of k, which implies high uncertainty of translation. Figure 4 shows examples taken from the KITTI odometry benchmark [32]. The score distribution is visualized according to the respective translation for a fixed orientation, and the corresponding raw scans are visualized above.   Our improved CSM algorithm utilizes k as the penalty factor ω(k) to increase the odometry accuracy by means of motion model p(x i |x i−1 , u) as where T j − u means the difference between the observation model and the motion model, and T * corresponds to T j with the maximum score S j . We set ω(k) = λ(k − 1), where λ > 0.

Loop Closure Detection and Validation
Our loop closure detection and validation method is mainly composed of three parts: Detecting loop closure candidates based on AdaBoost, hierarchically selecting positive loops with high confidence and rejecting false loops according to the residual error from the back-end optimizer. It should be noted that, our method of detecting loop closure candidates is similar to [11]. However, we improve the accuracy of classification by introducing more robust features and decrease the false alarm by several validation steps.

Point Cloud Feature Extraction
Similar to [11], we extract rotation invariant features as follows: (1) Area features: Area, close area; (2) Distance features: Distance, close distance, far distance, average range, centroid, maximum range, mean deviation, standard deviation of range; (3) Shape features: Circularity radius, circularity residual, curvature mean, curvature standard deviation, regularity; (4) Cluster features: Number of groups, mean group size; and (5) Others: Size and mean of angular difference. For detailed definition and formulation of these features, please refer to [11,33]. Based on [11,33], we have improved the feature extraction in three aspects.
(1) Firstly, for the mean and the standard deviation of the curvatures, the original method in [11] only takes three consecutive points into account to calculate the curvature of the middle point, which is sensitive to noise. Let p i be a point in the lidar cloud P k , and let S i be the neighboring consecutive point sets of p i . We choose three points on each side of p i , and the smoothness is evaluated as The curvature mean c and the standard deviation σ(c) are defined as (2) Secondly, for the number of groups and the mean of group size, rather than using a constant maximum distance to separate consecutive points [11], we adopt an adaptive breakpoint detector as introduced in [34]. The adaptive distance threshold of point p i is defined as where r i is the distance from p i to the origin of the coordinate, ∆ϕ corresponds to the angle interval of the range scan, λ is an empirical value determining the radius of the threshold circle and σ corresponds to the measurement accuracy of the laser scanner. A breakpoint is detected when the distance between consecutive points exceeds D i Th . An example is shown in Figure 5, where the black points represent breakpoints and the red point sets between two consecutive breakpoints belong to one cluster. Besides the above two features, we introduce another five-dimensional feature based on the size of the cluster. We discretize the size of the cluster into 5 levels, and calculate the ratio of each level L = [l 1 , l 2 , l 3 , l 4 , l 5 ], where l i is normalized. We termed this feature as Cluster Distribution Histogram (CDH). The size of each level is set empirically. This five-dimensional feature together with the first two features constitutes the distribution of the point cloud clusters. Three examples of CDH from our real scan data is shown in Figure 6. (3) Thirdly, we introduce three more features based on scan matching result, including the normalized score of the maximum scan matching alignment, the k-value computed by Equation (7) and the missing rate of query scan points. In particular, the k-value reflects the characteristic of the scene, and the missing rate corresponds to the rate of the query scan points which are projected into the black region of the rasterized cost table as shown in Figure 2.

Classification Based on AdaBoost
All those m rotation invariant features are computed for each scan, defined as where N corresponds to the total number of scans. Given two laser scans k and l, the set of features for the classifier is defined as For the training data, a set of n labeled scan pairs are provided where y i ∈ {1, −1} is the label of a sample input indicating matching and nonmatching scan pairs. AdaBoost is applied as the classifier constituted with a series of weak classifiers. The AdaBoost algorithm is described in Algorithm 3. The algorithm firstly trains a base classifier from the initial training sets, and then adjusts the distribution of the training samples according to the classification error, so that the previously misclassified samples receive more attention in the following step. And then the next weak classifier is trained based on the updated sample distribution. The above two steps are repeated until the maximum iteration T is reached. The final classifier is a weighted combination of T weak classifiers, and these weak classifiers with smaller classification errors have larger weights.

Loop Closure Validation
It is worth noting that one single wrong loop closure can cause severe damage to the whole pose graph. It is necessary to perform validation to eliminate the false loop closures. Our system adopts three strategies to reject false loops: (1) Firstly, for every positive loop closure detected by the AdaBoost classifier, we calculate the best score of the query scan projected to the coarse raster table, defined as S best L . The candidates detected from the classifier with S best L < S Th L are rejected. (2) Secondly, we calculate the scan matching results at high resolution, including the best score S best H , the k computed by Equation (8) and the missing rate of query scan points defined as p miss . Three thresholds are used to select positive loop closures with high confidence.
(3) Thirdly, we perform consistency validation by inverse scan alignment which we named as Direct-Inverse Matching Consistency (DIMC). Unlike local matching algorithms such as ICP and ICL, CSM may produce a large deviation between the direct matching and the inverse matching when two scans belong to a false loop closure. An example from our data is shown in Figure 7, where the blue points belong to the target scan and the red points belong to the source scan. Due to the significant changes in viewpoint, the scan matching result is unreliable, which can be verified through DIMC.

Algorithm 3 Classifier based on AdaBoost
Require: Training set D = {(x 1 , y 1 ), (x 2 , y 2 ), . . . , (x n , y n )}, Maximum iteration T 1: Initialize the sample weight distribution: ω 1 (x) = 1 n ; 2: for t = 1 to T do 3: Train a weak classifier h t based on ω t : Compute classification error of h t : is the authenticity discriminant function. 5: if ε t ≥ 0.5 then 6: continue; 7: end if 8: Compute the weight of the weak classifier h t : Update the sample weight distribution: where η t is the normalization factor. 10: end for 11:

Back-End Optimization
The back-end performs graph optimization based on the results generated from the front-end and the loop closure detection, resulting in a consistent trajectory and an optimized map.
The problem can be represented as the Bayesian belief network [25]. The joint probability of the variables is modeled as where X is the set of nodes and U stands for the set of edges, P(x i+1 |x i , u i ) is the motion model constructed from adjacent frame constraints given by the front-end, P(x j |x i , u ij ) corresponds to the measurement model generated from loop closure detections. The state transformation model corresponding to P(x i+1 |x i , u i ) is defined as where f i,i+1 (·) is the state transformation function, w i,i+1 is the Gaussian noise with zero-mean and covariance ∑ i . Similarly, the state transformation model corresponding to P(x j |x i , u ij ) is defined as and the covariance of w i,j is Λ ij . Combining ( (15), (16), (17)), the maximum-a-posteriori estimation can be formed into a nonlinear optimization problem where e 2 ∑ = e T ∑ −1 e. As described in [25], the above nonlinear optimization problem can be transformed into the standard least squares problem based on Gauss-Newton or Levenberg-Marquardt algorithm where X ∈ R n contains all pose variables, the sparse measurement matrix A ∈ R m×n contains all edges, and b ∈ R m is the right hand side (RHS) vector corresponding to the difference between the estimated state and the measurement vector. Different from the work in [25], we improved the back-end mainly in two aspects.
(1) Firstly, the covariance matrix computed in Equation (6) can be embedded into the above optimization framework as an adaptive information matrix. This will ensure that edges with strong confidence are not prone to be adjusted during the optimization procedure, whilst edges with low confidence are more likely to be tuned.
(2) Secondly, our system rejects false loops according to the residual error from the back-end optimizer. The residual error is computed between trajectories before and after the optimization. A large difference indicates a high likelihood of false loops.

Experiments and Results
During our experiments, the algorithms run on a laptop computer with 2.4 GHz Intel Core i7 CPU and 8 GB of RAM. In order to achieve real-time performance, multiple CPU threads are utilized in parallel. We perform experiments both on the data collected by our own platform and also on KITTI benchmark. Our platform is a golf cart equipped with a Velodyne VLP-16, a wheel encoder and a low cost IMU, as shown in Figure 8. Although VLP-16 produces 16 scanning lines, we only utilize one single horizontal scanning line to mimic a 2D lidar. The same method is also applied on the KITTI datasets.

The Results of the Improved Correlative Scan Matcher
For the first experiment, we perform the simple laser scan alignment without loop closure detection or back-end optimization to validate the improvement of our front-end method. The comparative results of the original CSM and our improved CSM are shown in Figure 9. The translation errors are evaluated following the KITTI's protocol with lengths (100 m, 200 m, . . . , 800 m) in 2D coordinates [32].
As shown in Figure 9, the trajectories of our improved CSM are closer to the ground truth than those of the original CSM in both the 0th sequence and the 5th sequence. While the average running time of our improved CSM increases by less than 0.01 s.

The Results of Loop Closure Detection and Validation
As described in Section 3.2, the loop closure classifier based on AdaBoost is trained from our dataset campus #1. The dataset is collected around the campus covering an area of 1750 m × 910 m while traveling around 15.99 kilometers with 22,935 scans. We choose 2000 matching and 2000 nonmatching laser pairs which are pre-labeled to train the classifier by 10-fold cross validation.
We choose three other approaches for comparison: Method 1 [11] is the most similar one to our method, and the main difference is that we utilize more features as described in Section 3.2; method 2 is based on Support Vector Machine (SVM) using the same features as our method; method 3 is based on deep learning algorithm similar to [13]. Figure 10 shows the precision-recall curves for all these three approaches together with our proposed approach.
Compared with method 1, our algorithm performs better on the campus #1 dataset. A threshold of K = 0.3 gives a detection rate of 92.6% and a false alarm rate of 2.7%. Although the false alarm rate of our method is a bit higher than method 1, it will be further reduced through validation (DIMC) and false loop closure rejection based on the back-end optimization. The subsequent experiments will show that our system can achieve nearly 0% false alarm for different types of datasets. Method 3 achieves the worst performance on our dataset. The main reason is that the dataset tested in [13] is a much smaller dataset collected in structured indoor environment whilst our campus #1 dataset is built in large-scale outdoor environment. For time complexity, our method outperforms method 1 both in feature extraction and classification, as shown in Table 1. During all the experiments including fifteen datasets from campus #1, one dataset from the Zhongdian Science and Technology Park and eleven datasets from KITTI benchmark, only two false loop closures are rejected according to the residual error from the back-end optimizer. Our loop closure detection and validation algorithm performs consistently well. Figure 11 shows the mapping of campus #1 area, where the map (shown in red dots) is projected onto a satellite image. The map shows a good consistency with the satellite image. Besides campus #1, we also test our 2D-SLAM system on several other datasets including a dataset from the Zhongdian Science and Technology Park as shown in Figure 12. Results indicate that our system works consistently well in large-scale outdoor environment. Figure 13 shows a snapshot of our mapping process. The red points are obstacles mapped online; the blue points represent the current laser scan; the green points represent the vehicle trajectory and the dark yellow lines are the loop closures. The current vehicle pose is shown as the yellow arrow.   We qualitatively compare our method with a state-of-the-art 2D SLAM algorithm: Cartographer [12]. Figure 14 shows the trajectories generated by our method and cartographer. As can be seen from Figure 14, our method can compete with cartographer in terms of accuracy. For computational efficiency, our method built the whole map in 27.54 minutes compared with 31.09 min for cartographer. Detailed comparisons of these two approaches are shown in Figure 15, where scenes #1, #2 and #3 are labeled in Figure 14. For scene #1, cartographer generates a wrong loop closure which results in blurry lines at place A. As for our map rendering step applies dynamic objects filtering borrowed from Wang et al. [35], the unreliable points are filtered effectively. The different results can be seen from B and C in scene #1, A and B in scene #2 as well as C in scene #3. Place C in scene #2 along with A and B in scene #3 indicate that our map contains richer details with less noise than cartographer.

Test with KITTI Datasets
Our system has also been tested on the large-scale and publicly available KITTI datasets, and the experimental results are shown in Figure 16. To mimic the 2D laser setting, we extract one single laser scan of the Velodyne HDL-64 lidar. Due to space issue, we present results of two typical sequences (Sequences 13 and 15). We choose LOAM, ICP and NDT as the comparison approaches. ICP and NDT are widely used scan matching approaches in lidar SLAM system. LOAM is ranked #1 among all lidar based methods on KITTI dataset.
It is worth mentioning that all these comparison approaches are based on the whole 3D lidar data, whilst our approach only utilizes one single scanning line of the 3D lidar. As shown in Figure 16, the experimental results indicate that our system achieves comparable performance compared to other benchmark approaches. It is worth mentioning that sequence 13 is prone to false loop closures around the viaduct due to the limitation of 2D framework. However, our system can detect and eliminate these wrong loops successfully. For computational efficiency, our system achieves a superior runtime of 0.06 s per scan compared with 0.1 s of LOAM.

Concluding Remarks
In this paper, a robust 2D lidar SLAM system for large-scale outdoor environment is proposed. To improve the accuracy and robustness of scan matching, an extension to the CSM is proposed. For efficient place recognition, an AdaBoost based loop closure detection is proposed. We also propose a light-weight back-end optimization algorithm that works in real-time. The optimization results are also utilized to eliminate false loop closures. Experiments both on our own collected dataset and the publicly available KITTI dataset confirmed the effectiveness of our approach. For the future work, we would like to apply our method to 3D mapping and localization. Moreover, the implementation of our map for localization on different platforms will be verified in our further work.