Feature-Based Laser Scan Matching and Its Application for Indoor Mapping

Scan matching, an approach to recover the relative position and orientation of two laser scans, is a very important technique for indoor positioning and indoor modeling. The iterative closest point (ICP) algorithm and its variants are the most well-known techniques for such a problem. However, ICP algorithms rely highly on the initial guess of the relative transformation, which will reduce its power for practical applications. In this paper, an initial-free 2D laser scan matching method based on point and line features is proposed. We carefully design a framework for the detection of point and line feature correspondences. First, distinct feature points are detected based on an extended 1D SIFT, and line features are extracted via a modified Split-and-Merge algorithm. In this stage, we also give an effective strategy for discarding unreliable features. The point and line features are then described by a distance histogram; the pairs achieving best matching scores are accepted as potential correct correspondences. The histogram cluster technique is adapted to filter outliers and provide an accurate initial value of the rigid transformation. We also proposed a new relative pose estimation method that is robust to outliers. We use the lq-norm (0 < q < 1) metric in this approach, in contrast to classic optimization methods whose cost function is based on the l2-norm of residuals. Extensive experiments on real data demonstrate that the proposed method is almost as accurate as ICPs and is initial free. We also show that our scan matching method can be integrated into a simultaneous localization and mapping (SLAM) system for indoor mapping.


Introduction
The mapping of indoor environments or underground spaces has become increasingly important in recent years. However, the lack of availability of GPS signals inside buildings and underground spaces makes indoor modeling and mapping a challenging task. Recently, numerous works have appeared to solve this issue. The most popular solution may be the simultaneous localization and mapping (SLAM) technique, which is widely applied in robotics. Compared with vision-based SLAM systems, laser-based ones can provide more accurate indoor maps and models.
The core of laser-based SLAM, scan matching, is a technique to recover the relative position and orientation of two laser scans. It estimates a rigid transformation to project one laser scan so that the projected laser scan aligns with the other one. The ICP algorithm [1,2] and its variants (to name a few [3,4]) are used pervasively in laser scan matching. However, the performance of ICPs is unstable. When the good initial guess of the transformation is unavailable, the ICPs may not converge to a correct solution. Thus, ICP-based SLAM systems usually fuse additional highly expensive sensor data for guaranteeing robustness, such as those from inertial measurement unit (IMU) and odometry. This will reduce the power of these methods for practical applications because of the very expensive hardware systems.
In vision-based SLAM systems, visual features (such as SIFT [5], SURF [6], ORB [7], and BRISK [8]) are adapted for relative pose recovery. Keyframe image matching typically consists of four major stages: keypoint detection, keypoint description, keypoint matching, and transformation estimation. In the first stage, salient and stable interest points are extracted. The keypoint is then described based on its photometric neighborhoods, such as local gradients. The third stage calculates the distances between the descriptor vectors to recognize reliable correspondences (inliers). Finally, the transformation matrix is computed based on RANSAC [9]. This pipeline has no assumption regarding the initial guess of the transformation.
Currently, 3D indoor mapping is drawing increasing attention. It can directly produce dense point clouds and does not rely on the prior of planar floors. However, 2D indoor mapping is still very important. The 2D indoor maps can be used for emergencies, evacuation and localization. For example, Google indoor maps are produced by a 2D indoor mapping system called Cartographer. To obtain 3D indoor models, another two 2D lasers can be integrated. Moreover, 2D lasers are much cheaper than 3D ones, and the processing of 2D laser scans is much more efficient than that of 3D scans.
In this paper, we carefully design a feature-based framework for 2D laser scan matching. Like keyframe image matching, there are also four basic stages in our framework. First, distinct feature points are detected based on an extended 1D SIFT, and line features are extracted via a modified Split-and-Merge [10,11] algorithm. The point and line features are then described by a distance histogram; the pairs that achieve the best matching scores are accepted as potential correct correspondences. Finally, we estimate the relative positions and orientations of two laser scans based on a robust cost function. Extensive experiments on real data demonstrate that the proposed method is almost as accurate as ICPs and is initial free. We also show that our scan matching method can be integrated into a SLAM system for indoor mapping and indoor modeling. The contributions of our work are summarized as follows: (1) We propose a new initial-free 2D laser scan matching method by combining point and line features, which is a very effective technique for indoor mapping and modeling. (2) We carefully design a framework for the detection of point and line feature correspondences from laser scan pairs. We also give an effective strategy to discard unreliable features. Thus, our detected feature correspondences are distinct, reliable, and invariant to rotation changes. (3) We propose a new relative pose estimation method that is robust to outliers. We use the lq-norm (0 < q < 1) metric in this approach, in contrast to classic optimization methods whose cost function is based on the l 2 -norm of residuals. Unlike the conventional RANSAC-based [9] strategy, strategy, there is no gross error detection stage in our pose estimation algorithm. In addition, our pose estimation algorithm is more robust to noise than the RANSAC-based one. (4) We make an honest attempt to present our work to a level of detail allowing readers to re-implement the method.

Related Work
Many works on 2D laser scan matching have been presented, so we do not give here a comprehensive study. We will review only a few most relevant and recent works, including the most well-known ICP families, feature-based methods, and other techniques.
The idea of the ICP algorithm [1,2] is intuitive and simple: it alternates between finding closest point correspondences under an initial transformation and fitting the transformation with the correspondences until convergence. The classical ICP algorithm is to minimize a function of point-to-point distance. Lu and Milios [12] proposed two ICP variants for scan registration. The first one alternates between translation estimation and rotation computation. Given a fixed rotation, the translation is optimized by least squares. The translation is then fixed, and the rotation is searched by the global-section method [13]. The second is an iterative dual correspondence (IDC) method that adapts two types of correspondences, i.e., Euclidean distance and range angular distance. Minguez et al. [14] presented a new metric distance for ICP, called MbICP, which considers the configuration space of the sensor and combines the rotation and translation errors of the sensor. As shown in their paper, MbICP improves the correspondence building stage and the convergence rate of the classical ICP. Censi [4] describes a point-to-line metric-based ICP (PLICP). In contrast to other point-to-line methods [2], PLICP develops a closed-form solution for the planar case, just like 2D laser scans. PLICP also improves the convergence of classical ICP from linear to quadratic. ICPs can achieve very high accuracy in scan matching. However, as mentioned earlier, their high dependence on a good initial guess will reduce their power for practical applications.
Diosi and Kleeman [15,16] developed a point-to-point matching method called polar scan matching (PSM). It directly matches the range measurement of two laser scans in the native polar coordinate system. PSM uses the matching bearing rule to associate the range measurements of the target scan with the reference scan and minimizes a weighted range residual cost function. This method is much faster than ICPs because it avoids searching for correspondences. As noted in [17], PSM can also improve the ability to converge to an optimal solution from a laser range initial guess.
Montesano et al. [18] give a probabilistic formulation of scan registration called probabilistic scan matching (Probabilistic SM). Like classical ICP, it also consists of two stages, i.e., correspondence location and transformation estimation. In the first stage, the uncertainty in both relative transformation and range measurement is modeled based on Gaussian distributions. In addition, Probabilistic SM allows these correspondences to be found by probability integration over all possible associations between the range measurements of the two scans. The probabilistic modeling of uncertainty is more suitable for real sensor data compared with pure geometrical approaches.
Another probabilistic-based method, called correlative scan matching (CSM) [19], uses cross-correlation to register two laser scans, which achieves real-time performance with little loss of accuracy. CSM maximizes the probability of having observed range measurements. That is, it searches for an optimal rigid transformation that overlaps the two laser scans as much as possible. To avoid a local maximum, CSM searches over the entire parameter space of plausible rigid-body transformations. To detect the plausible region, some additional information should be provided, such as that from visual odometry, wheel odometry, or commanded motion.
These techniques also rely on a good initial guess or prior information. To eliminate the dependence on assumptions, feature-based scan matching methods have appeared. Ramos et al. [20] propose a novel feature-based approach based on conditional random fields (CRF). In their CRF-Matching, a joint estimation over all observations in a laser scan is performed, and shape information is considered to reject outliers. The CRF model could integrate multiple features, such as shape features and appearance features. The maximum posteriori of the rigid transformation is estimated by loopy belief propagation. The main limitations are the high computational complexity and the unreliability for partially overlapping laser scan pairs. Tipaldi and Arras [21,22] presented a method called FLIRT. In their work, they studied three types of laser feature detectors (range-based, normal-based, and curvature-based detectors) and two feature descriptors (linear local shape context descriptor and β-Grids descriptor. Based on the comprehensive analysis of these detectors and descriptors, they combine the best detector with the best descriptor to form FLIRT. They show that FLIRT can be adapted in laser SLAM systems and that impressive results can be achieved. However, their descriptor in FLIRT is very slow, which prevents it from being widely used in real SLAM applications because the number of scans in a SLAM process is usually huge. In addition, there is no outlier removal section in their literature.

Scan Matching
We first define some notations for this task. The right subscription k, k  Z + is used to indicate the frame of laser scans. Sk, Pk and Lk represent the laser scan measurements, point features, and line segment features of the frame k. Let fp(k,i)(x(k,i),y(k,i)) be the i-th feature point in Pk and fj(k,i) be the i-th feature line in Lk. Thus, our scan matching problem can be described as: Problem: Given two consecutive 2D laser scans Sk-1 and Sk, recover the relative rigid transformation (relative position tk-1 and orientation rk-1) Tk-1 = {rk-1, tk-1} from these two laser scans based on point and line feature correspondences.

Feature Detection
Keypoint detection: The scale-invariant feature transform (SIFT) [5] is widely adapted in computer vision applications because of its robustness to image scale, rotation, illumination and viewpoint changes. In this stage, we propose an extended 1D SIFT keypoint detector so that it can be suitable for 1D range data ( Figure 1a). We extract the keypoints in scale space only based on the raw laser range information. The SIFT keypoint detector is based on the scale-space theory [23], which is a multiscale signal representation technique. The scale space uses a set of smoothed signals Sig(x, σi) to represent the original signal Sig(x), where σi is the size of the smoothing kernel, whose role is to suppress the fine scale structures. In our task, Sig(x) is the range information Sk(x) of the k-th frame laser scan. The formulation of the scale-space construction is as follows: where K(x, σi) is a scale-space kernel. In the original SIFT, the Gaussian function is adopted because it is the only possible choice for continuous signals, as evidenced by Lindeberg [23]. However, laser range scans are 1D discrete signals. A discrete kernel, maintaining the properties of the Gaussian function, is chosen: where Ix stands for the modified Bessel function. In SIFT, scale selection is performed to make detected feature points invariant to scale changes. However, the range data of different scans have the same scale. This means that the detected laser feature points are inherently invariant to scale. Thus, instead of extracting local extrema among three scale layers, we regard the local extrema of the Laplacian of the range signal at each scale layer The SIFT keypoint detector is based on the scale-space theory [23], which is a multiscale signal representation technique. The scale space uses a set of smoothed signals Sig(x, σ i ) to represent the original signal Sig(x), where σ i is the size of the smoothing kernel, whose role is to suppress the fine scale structures. In our task, Sig(x) is the range information S k (x) of the k-th frame laser scan. The formulation of the scale-space construction is as follows: where K(x, σ i ) is a scale-space kernel. In the original SIFT, the Gaussian function is adopted because it is the only possible choice for continuous signals, as evidenced by Lindeberg [23]. However, laser range scans are 1D discrete signals. A discrete kernel, maintaining the properties of the Gaussian function, is chosen: where I x stands for the modified Bessel function. In SIFT, scale selection is performed to make detected feature points invariant to scale changes. However, the range data of different scans have the same scale. This means that the detected laser feature points are inherently invariant to scale. Thus, instead of extracting local extrema among three scale layers, we regard the local extrema of the Laplacian of the range signal at each scale layer S k (x, σ i ) as feature keypoints. The Laplacian function is equivalent to the second derivative of S k (x,σ i ) for one-dimensional signals. The keypoints P k are the local peaks of Equation (3): For stability, we want to reject unreliable feature points whose local surfaces are roughly parallel to the laser beams (Figure 2a) or the points on the edges of occluded areas (Figure 2b). Point A is an unreliable feature point with low position accuracy because its local normal direction is almost perpendicular to the laser beam. Point C is also rejected because it may be occluded in the next laser scan. For example, if the laser of Figure 2b moves to the right side of the current position, point C will no longer exist. In detail, we calculate the local surface direction of each feature point and its distance to the neighboring points. If the difference between the local surface direction and laser beam is less than 10 degrees or one of the distances to the neighboring points is longer than 1 m, the feature point will be discarded. Sk(x, σi) as feature keypoints. The Laplacian function is equivalent to the second derivative of Sk(x,σi) for one-dimensional signals. The keypoints Pk are the local peaks of Equation (3): For stability, we want to reject unreliable feature points whose local surfaces are roughly parallel to the laser beams (Figure 2a) or the points on the edges of occluded areas (Figure 2b). Point A is an unreliable feature point with low position accuracy because its local normal direction is almost perpendicular to the laser beam. Point C is also rejected because it may be occluded in the next laser scan. For example, if the laser of Figure 2b moves to the right side of the current position, point C will no longer exist. In detail, we calculate the local surface direction of each feature point and its distance to the neighboring points. If the difference between the local surface direction and laser beam is less than 10 degrees or one of the distances to the neighboring points is longer than 1 m, the feature point will be discarded. Keyline detection: Line segments are a very important type of feature for indoor environments. The advantages of using line features are twofold: first, line features are more distinct for indoor scenes and are less sensitive to the noise in the laser range measurements; second, the orientation information can be easily extracted from 2D line correspondences, which can help us reject feature point outliers and provide a good initial guess for our transformation estimation stage. These advantages will be demonstrated in the following sections.
We detect keylines for each laser scan based on a slightly modified Split-and-Merge algorithm [10,11] (Figure 1b). First, consecutive laser points in laser scan Sk are segmented into clusters, which, because of the points in the same cluster, tend to belong to the same object. The distance of two consecutive laser points is chosen as the segmentation criterion. The two points will be classified into the same cluster if the distance is smaller than a threshold. Clusters with too few points are regarded as isolated points and discarded. Then, for a cluster, we fit a line segment and detect the point with maximum distance to the line. If the maximum distance is smaller than a given threshold, we advance to process the next cluster. Otherwise, we split the cluster into two subclusters at the detected point. This process is recursive. When all possible lines are extracted, we calculate their slopes. Most 2D range finders are time-of-flight lidars, so the adjacent lines can be easily extracted. If two adjacent lines are almost parallel, we compute the distance between their middle points along normal direction. The two adjacent lines will be merged into one single line segment as long as the corresponding middle point distance is very small. Finally, the line segments with small length or few points are removed as unreliable keylines. The details are summarized in Algorithm 1. Keyline detection: Line segments are a very important type of feature for indoor environments. The advantages of using line features are twofold: first, line features are more distinct for indoor scenes and are less sensitive to the noise in the laser range measurements; second, the orientation information can be easily extracted from 2D line correspondences, which can help us reject feature point outliers and provide a good initial guess for our transformation estimation stage. These advantages will be demonstrated in the following sections.
We detect keylines for each laser scan based on a slightly modified Split-and-Merge algorithm [10,11] (Figure 1b). First, consecutive laser points in laser scan S k are segmented into clusters, which, because of the points in the same cluster, tend to belong to the same object. The distance of two consecutive laser points is chosen as the segmentation criterion. The two points will be classified into the same cluster if the distance is smaller than a threshold. Clusters with too few points are regarded as isolated points and discarded. Then, for a cluster, we fit a line segment and detect the point with maximum distance to the line. If the maximum distance is smaller than a given threshold, we advance to process the next cluster. Otherwise, we split the cluster into two subclusters at the detected point. This process is recursive. When all possible lines are extracted, we calculate their slopes. Most 2D range finders are time-of-flight lidars, so the adjacent lines can be easily extracted. If two adjacent lines are almost parallel, we compute the distance between their middle points along normal direction. The two adjacent lines will be merged into one single line segment as long as the corresponding middle point distance is very small. Finally, the line segments with small length or few points are removed as unreliable keylines. The details are summarized in Algorithm 1.

Algorithm 1. Line Segment Extraction
1 input: a laser scan S k 2 output: line segments L k 3 begin 4 segment the laser scan S k and remove small segments to form clusters C k ; 5 for each cluster c i P C k do 6 fit a line f l to c i , compute the length d l of f l; 7 remove line if its length is small; add the line segment f l to L k ; 11 else 12 split c i into two subclusters sc i1 , sc i2 at p, then, perform Algorithm 1 for each subcluster; 13 end 14 end 15 computer the slope s f l pk,iq of each line segment in L k , find adjacent line pairs in L k ; 16 for each pair´f l pk,iq f l pk,jq¯d o 17 ifˇˇs f l pk,iq´s f l pk,jqˇă τ s pτ s " 3˝q then if d max ă τ d m pτ d m " 0.03q then 20 merge p f l pk,iq , f l pk,jq q into a single line, update L k ; 21 return line segment set L k ; 25 end

Feature Description
In this section, we use a distance histogram to describe point and line features. The distance histogram is different from the 2D gradient histogram because distance is invariant to rotation. Thus, our distance histogram descriptor is inherently invariant to rotation changes.
Keypoint description: For each feature point fp (k,i) P P k , we search its neighborhoods. If the distance between the searched point p (k,i) S k and feature point fp (k,i) is less than the search radius r, the searched point p (k,i) is considered to be a neighbor of the feature point fp (k,i) . These distances of the neighborhoods are then formed into an 8-bin histogram. The description of feature point fp (k,i) is the normalized distance histogram.
Keyline description: For each feature line fl (k,i) P L k , we find three points on the line (see Figure 3). These three points divide the feature line fl (k,i) into four segments with equal length. Each point is then described by an 8-bin normalized distance histogram just like the feature point description. The description of feature line fl (k,i) is a 24-bin normalized distance histogram that simply concatenates the three-point descriptor.

Feature Matching
Keyline matching: For two consecutive laser scans Sk-1 and Sk, we detect their feature line segments Lk−1 and Lk. For each feature line fl(k-1,i)  Lk-1, our goal is to search for its correspondence in Lk. We first compute the matching score between fl(k-1,i) and all lines in Lk: stands for the line descriptor; score is the matching score; and represents the Euclidean distance. The feature line fl(k,j)  Lk that achieves the best matching score will be accepted as the potential correspondence of fl(k-1,i). We then check the lengths of fl(k-1,i) and fl(k,j). They are discarded as outliers if their length difference is very large because the line correspondence pair in laser scans usually has similar Sk-1 lengths. Figure 4a shows a sample result. As seen, most feature lines are correctly matched. However, there are still some outliers, such as line correspondence 9.
Because the Lk relationship between laser scans and Sk is rigid transformation, the angles or rotations between the correct 2D line correspondences are almost the same. Thus, we first cluster the angles of line correspondence via the histogram technique. The average angle of the largest cluster is regarded as the correct relative rotation between Sk-1 and Sk, formatted as matrix ̅ −1 . The correspondences with angles that are inconsistent with ̅ −1 are rejected. We then rotate the line segments in and calculate the middle point distance between the remaining matches. The correspondences with middle point distances that are too different from others are also treated as outliers. The others are accepted as inliers (Figure 4b).

Feature Matching
Keyline matching: For two consecutive laser scans S k-1 and S k , we detect their feature line segments L k´1 and L k . For each feature line fl (k-1,i) P L k-1 , our goal is to search for its correspondence in L k . We first compute the matching score between fl (k-1,i) and all lines in L k : score " dlp f l pk´1,iq q´dlp f l pk,jq q s.t. f l pk,jq P L k (4) where dlp¨q stands for the line descriptor; score is the matching score; and ¨ represents the Euclidean distance. The feature line fl (k,j) P L k that achieves the best matching score will be accepted as the potential correspondence of fl (k-1,i) . We then check the lengths of fl (k-1,i) and fl (k,j) . They are discarded as outliers if their length difference is very large because the line correspondence pair in laser scans usually has similar S k-1 lengths. Figure 4a shows a sample result. As seen, most feature lines are correctly matched. However, there are still some outliers, such as line correspondence 9. Because the L k relationship between laser scans and S k is rigid transformation, the angles or rotations between the correct 2D line correspondences are almost the same. Thus, we first cluster the angles of line correspondence via the histogram technique. The average angle of the largest cluster is regarded as the correct relative rotation between S k-1 and S k , formatted as matrix r k´1 . The correspondences with angles that are inconsistent with r k´1 are rejected. We then rotate the line segments in and calculate the middle point distance between the remaining matches. The correspondences with middle point distances that are too different from others are also treated as outliers. The others are accepted as inliers (Figure 4b).
Keypoint matching: Suppose that we have detected the feature points P k-1 and P k of S k-1 and S k , respectively. As above, for each feature point fp (k,j) P P k-1 , we compute the matching score between fp (k,j) and all points in P k : score " dpp f p pk´1,iq q´dpp f p pk,jq q s.t. f p pk,jq P P k (5) where dpp¨q stands for the feature point descriptor. The feature point fp (k,j) P P k that achieves the best matching score will be accepted as the potential correspondence of fp (k,i) . As seen in Figure 5a, the points connected by a red line are potential correspondences. There are many false matches in the naive matching results. As we have obtained the rotation r k´1 between laser scans S k-1 and S k in the last section, we can eliminate the rotation changes between these potential correspondences. Thus, there is only translation differences between these transformed matching points. In fact, the correct discarded as outliers if their length difference is very large because the line correspondence pair in laser scans usually has similar Sk-1 lengths. Figure 4a shows a sample result. As seen, most feature lines are correctly matched. However, there are still some outliers, such as line correspondence 9.
Because the Lk relationship between laser scans and Sk is rigid transformation, the angles or rotations between the correct 2D line correspondences are almost the same. Thus, we first cluster the angles of line correspondence via the histogram technique. The average angle of the largest cluster is regarded as the correct relative rotation between Sk-1 and Sk, formatted as matrix . The correspondences with angles that are inconsistent with are rejected. We then rotate the line segments in and calculate the middle point distance between the remaining matches. The correspondences with middle point distances that are too different from others are also treated as outliers. The others are accepted as inliers (Figure 4b).
(a) (b)   Keypoint matching: Suppose that we have detected the feature points Pk-1 and Pk of Sk-1 and Sk, respectively. As above, for each feature point fp(k,j)  Pk-1, we compute the matching score between fp(k,j) and all points in Pk: score dp fp dp fp s t fp P where ( ) dp  stands for the feature point descriptor. The feature point fp(k,j)  Pk that achieves the best matching score will be accepted as the potential correspondence of fp(k,i). As seen in Figure 5a, the points connected by a red line are potential correspondences. There are many false matches in the naive matching results. As we have obtained the rotation between laser scans Sk-1 and Sk in the last section, we can eliminate the rotation changes between these potential correspondences. Thus, there is only translation differences between these transformed matching points. In fact, the correct point correspondences should have almost the same translations. We cluster the translations in x and y coordinates. The average translation ̅ of the largest cluster in x and y is considered as the true translation between Sk-1 and Sk. The correspondences that are inconsistent with the translation ̅ are then rejected as outliers. The cleaned matching points are shown in Figure 5b.

Transformation Estimation
The number of degrees of freedom of this equation is three, i.e., one for rotation and two for translation. Two correspondence pairs are enough to solve it. Classical approaches usually minimize the following nonlinear least-squares cost function via the Gauss-Newton [24] method: However, these methods are sensitive to outliers. They usually fail to obtain the correct solutions when the observations (feature correspondences) are corrupted by outliers. The current solutions usually combine minimal solvers with RANSAC-based [9] schemes to reject outliers

Transformation Estimation
Let C pk´1,kq ! p f p pk´1,iq P P k´1 , f l pk,iq P P k q ) be the feature point correspondence pairs; the relationship between pair p f p pk´1,iq P P k´1 , f l pk,iq P P k q can by modeled by a rigid transformation: The number of degrees of freedom of this equation is three, i.e., one for rotation and two for translation. Two correspondence pairs are enough to solve it. Classical approaches usually minimize the following nonlinear least-squares cost function via the Gauss-Newton [24] method: where p i " f p pk´1,iq represents the observations, andp i " r k´1¨f l pk,iq`tk´1 is the calculated value. p i andp i are introduced only for compactness of notation. n is the number of pairs in C pk´1,kq .
However, these methods are sensitive to outliers. They usually fail to obtain the correct solutions when the observations (feature correspondences) are corrupted by outliers. The current solutions usually combine minimal solvers with RANSAC-based [9] schemes to reject outliers during a preprocessing stage. Unfortunately, the minimal solvers with RANSAC-based schemes are sensitive to noise. In addition, RANSAC-based methods are very slow.
Outliers cannot be absolutely avoided in correspondences C pk´1,kq , so a good solver should have the ability to automatically segment the residual vector v " rv 1 , v 2 , ..., v n s into an inlier set Ipv i ||v i | « 0q and an outlier set Opv i ||v i | " 0q . Classical least-squares cost is based on a fundamental assumption that observations are subjected to a normal distribution and free of outliers. The results will be skewed. Recently, a sparsity-inducing norm, the lq-norm [25][26][27][28], has shown great potential and can achieve better performance than the l1-norm [29] and l0-norm [30]. Motivated by that, we reformulate the cost function based on the lq-norm metric: where ||¨|| q is an lq-norm (0 ă q ă 1) operator.
To optimize this problem, it is formed as an lq-norm penalized least-squares (lqLS) problem, which has been well studied by Marjanovic and Solo [28]. By introducing auxiliary variables M " rm 1 , m 2 , ..., m n s into Equation (8), the cost function is rewritten as: Using the augmented Lagrangian function, we can reformulate this constrained optimization function into an unconstrained one: where Λ " rλ 1 , λ 2 , ..., λ n s are Lagrange multipliers or dual variables and ρ ą 0 is a penalty parameter.
To simplify this problem, the alternating direction method of multipliers (ADMM) is employed to decompose the function into three subproblems: prob 2 :pr m`1 k´1 , t m`1 k´1 q :" argmin where the superscript m denotes the iteration counter. d i " l i ρ`pi`pi and g i " l i ρ`pi´m i are used only for compactness of notation. In prob 1, M is the only variable to be estimated, whereas the others are fixed. In prob 2, the rigid transformation T k-1 = {r k-1 ,t k-1 } is the only variable. The ADMM alternates among these three steps until convergence. Details about augmented Lagrangian methods and ADMM can be found in the literature [31]. prob 1 is an lq-norm penalized least squares (lqLS) problem. We adapt Marjanovic and Solo's lq Cyclic Descent (lqCD) algorithm [28] to solve it. More details about the lqLS problem may be seen in Marjanovic and Solo's literature [28]. prob 2 is a classical least-squares function, which can be easily solved by DLT. Our lq-norm solver can converge to the correct solution within only a few iterations because the very good initial guess has been obtained above.

SLAM System and Datasets
We design a hardware system for indoor mapping and modeling tasks. As shown in Figure 6, this SLAM system consists of three laser scanners, a panoramic camera, and two odometers. The 2D Sick laser scanner has a 360˝field of view, a 0.5˝-point measurement resolution, and a 10 Hz scanning rate. The horizontal laser is used for location (scan matching), and the other two oblique lasers are used for mapping 3D point clouds. The two odometers can provide a good initial guess for ICP scan matching. Sick laser scanner has a 360° field of view, a 0.5°-point measurement resolution, and a 10 Hz scanning rate. The horizontal laser is used for location (scan matching), and the other two oblique lasers are used for mapping 3D point clouds. The two odometers can provide a good initial guess for ICP scan matching. We collect two datasets using this hardware system. The first one is a market with an area of 8000 m 2 . We push the cart and walk. The length of the full trajectory is 1137 m. We sample this dataset into 2200 key laser scans. The second one is an underground parking garage with an area of 3200 m 2 . The length of the trajectory is 357 m. It is sampled in 1414 key laser scans. In both datasets, the moving speed of the cart is 1 m/s. Moreover, we also use two SLAM benchmarking datasets for evaluation [32]. The first dataset was collected at the Intel Research Lab, and the other one was collected at the MIT CSAIL Building. The ground truth of both datasets was provided.

Comparison with State-of-the-Art Methods
For our datasets, we randomly pick 4 laser scan pairs from each dataset and register these scan pairs by using PSM, PLICP, ICP without an initial guess (initial guess set to zero, denoted by ICP 1 ), the proposed method and ICP with a good initial guess (initial-guess obtained by the two odometers, denoted by ICP 2 ). The results of ICP 2 are regarded as the ground truth. For benchmarking datasets, we randomly pick 50 laser scan pairs. We compare our method with PSM, PLICP and ICP 1 . The ICP 1 uses a point-to-point metric and sets the maximum iterations to 100. The source codes of PSM and PLICP are obtained from the authors' websites. The source code of ICP used in this paper can be found in [33]. Figures 7-10 give the visual matching results. As seen, our method achieves similar results to ICP 2 or the ground truth. In all plots of our results, ICP 2 or the ground truth, the transformed laser scan of Sk (denoted by blue +) covers the first laser scan Sk-1 (denoted by red +). This means that both We collect two datasets using this hardware system. The first one is a market with an area of 8000 m 2 . We push the cart and walk. The length of the full trajectory is 1137 m. We sample this dataset into 2200 key laser scans. The second one is an underground parking garage with an area of 3200 m 2 . The length of the trajectory is 357 m. It is sampled in 1414 key laser scans. In both datasets, the moving speed of the cart is 1 m/s. Moreover, we also use two SLAM benchmarking datasets for evaluation [32]. The first dataset was collected at the Intel Research Lab, and the other one was collected at the MIT CSAIL Building. The ground truth of both datasets was provided.

Comparison with State-of-the-Art Methods
For our datasets, we randomly pick 4 laser scan pairs from each dataset and register these scan pairs by using PSM, PLICP, ICP without an initial guess (initial guess set to zero, denoted by ICP 1 ), the proposed method and ICP with a good initial guess (initial-guess obtained by the two odometers, denoted by ICP 2 ). The results of ICP 2 are regarded as the ground truth. For benchmarking datasets, we randomly pick 50 laser scan pairs. We compare our method with PSM, PLICP and ICP 1 . The ICP 1 uses a point-to-point metric and sets the maximum iterations to 100. The source codes of PSM and PLICP are obtained from the authors' websites. The source code of ICP used in this paper can be found in [33]. Figures 7-10 give the visual matching results. As seen, our method achieves similar results to ICP 2 or the ground truth. In all plots of our results, ICP 2 or the ground truth, the transformed laser scan of S k (denoted by blue +) covers the first laser scan S k-1 (denoted by red +). This means that both our method and ICP 2 can correctly align these laser scan pairs. PLICP and ICP 1 have similar performances; PSM performs better than PLICP and ICP 1 . However, PSM, PLICP and ICP 1 will fail in some cases. These methods are very sensitive to the initial guess, whereas our method is initial free. . Some scan matching results of dataset 1. The first row is the results of PSM; the second row is the results of PLICP; the third row is the results of ICP without an initial guess (ICP 1 ); the fourth row is our scan matching results; the last row is the results of ICP with a good initial guess (ICP 2 ). In all of these plots, the points denoted by red +, black +, and blue + represent the first laser scan Sk-1, the second laser scan Sk, and the transformed laser scan of Sk by using the estimated transformation matrix, respectively.  . Some scan matching results of dataset 1. The first row is the results of PSM; the second row is the results of PLICP; the third row is the results of ICP without an initial guess (ICP 1 ); the fourth row is our scan matching results; the last row is the results of ICP with a good initial guess (ICP 2 ). In all of these plots, the points denoted by red +, black +, and blue + represent the first laser scan S k-1 , the second laser scan S k , and the transformed laser scan of S k by using the estimated transformation matrix, respectively. Figure 7. Some scan matching results of dataset 1. The first row is the results of PSM; the second row is the results of PLICP; the third row is the results of ICP without an initial guess (ICP 1 ); the fourth row is our scan matching results; the last row is the results of ICP with a good initial guess (ICP 2 ). In all of these plots, the points denoted by red +, black +, and blue + represent the first laser scan Sk-1, the second laser scan Sk, and the transformed laser scan of Sk by using the estimated transformation matrix, respectively. . Some scan matching results of dataset 2. The first row is the results of PSM; the second row is the results of PLICP; the third row is the results of ICP without an initial guess (ICP 1 ); the fourth row is our scan matching results; the last row is the results of ICP with a good initial guess (ICP 2 ). In all of these plots, the points denoted by red +, black +, and blue + represent the first laser scan Sk-1, the second laser scan Sk, and the transformed laser scan of Sk by using the estimated transformation matrix, respectively. . Some scan matching results of dataset 2. The first row is the results of PSM; the second row is the results of PLICP; the third row is the results of ICP without an initial guess (ICP 1 ); the fourth row is our scan matching results; the last row is the results of ICP with a good initial guess (ICP 2 ). In all of these plots, the points denoted by red +, black +, and blue + represent the first laser scan S k-1 , the second laser scan S k , and the transformed laser scan of S k by using the estimated transformation matrix, respectively. Figure 8. Some scan matching results of dataset 2. The first row is the results of PSM; the second row is the results of PLICP; the third row is the results of ICP without an initial guess (ICP 1 ); the fourth row is our scan matching results; the last row is the results of ICP with a good initial guess (ICP 2 ). In all of these plots, the points denoted by red +, black +, and blue + represent the first laser scan Sk-1, the second laser scan Sk, and the transformed laser scan of Sk by using the estimated transformation matrix, respectively. . Some scan matching results of the Intel dataset. The first row is the results of PSM; the second row is the results of PLICP; the third row is the results of ICP without an initial guess (ICP 1 ); the fourth row is our scan matching results; the last row is the ground truth. In all these plots, the points denoted by red +, black +, and blue + represent the first laser scan Sk-1, the second laser scan Sk, and the transformed laser scan of Sk by using the estimated transformation matrix, respectively. . Some scan matching results of the Intel dataset. The first row is the results of PSM; the second row is the results of PLICP; the third row is the results of ICP without an initial guess (ICP 1 ); the fourth row is our scan matching results; the last row is the ground truth. In all these plots, the points denoted by red +, black +, and blue + represent the first laser scan S k-1 , the second laser scan S k , and the transformed laser scan of S k by using the estimated transformation matrix, respectively. Figure 9. Some scan matching results of the Intel dataset. The first row is the results of PSM; the second row is the results of PLICP; the third row is the results of ICP without an initial guess (ICP 1 ); the fourth row is our scan matching results; the last row is the ground truth. In all these plots, the points denoted by red +, black +, and blue + represent the first laser scan Sk-1, the second laser scan Sk, and the transformed laser scan of Sk by using the estimated transformation matrix, respectively. Figure 10. Some scan matching results of the MIT dataset. The first row is the results of PSM; the second row is the results of PLICP; the third row is the results of ICP without an initial guess (ICP 1 ); the fourth row is our scan matching results; the last row is the ground truth. In all these plots, the points denoted by red +, black +, and blue + represent the first laser scan Sk-1, the second laser scan Sk, and the transformed laser scan of Sk by using the estimated transformation matrix, respectively. Tables 1 and 2 show the quantitative evaluation results of our datasets. In the Tables, tx and ty represent the relative translation along the x-axis and y-axis between a laser scan pair, respectively; r represents the relative rotation angle between a scan pair. In this evaluation, we regard the results of ICP 2 as the approximate ground truth rigid transformation. PSM, PLICP, ICP 1 and our  Figure 10. Some scan matching results of the MIT dataset. The first row is the results of PSM; the second row is the results of PLICP; the third row is the results of ICP without an initial guess (ICP 1 ); the fourth row is our scan matching results; the last row is the ground truth. In all these plots, the points denoted by red +, black +, and blue + represent the first laser scan S k-1 , the second laser scan S k , and the transformed laser scan of S k by using the estimated transformation matrix, respectively. Tables 1 and 2 show the quantitative evaluation results of our datasets. In the Tables, tx and ty represent the relative translation along the x-axis and y-axis between a laser scan pair, respectively; rθ represents the relative rotation angle between a scan pair. In this evaluation, we regard the results of ICP 2 as the approximate ground truth rigid transformation. PSM, PLICP, ICP 1 and our method are compared with ICP 2 . From these tables, we can also draw similar conclusions as in the visual comparisons. As seen, our estimated transformations are almost the same as ICP 2 . The maximum angle difference between our method and ICP 2 is less than 0.01 rad (0.573˝). For most cases, the translation differences between our method and ICP 2 are smaller than 0.05 m. The maximum translation difference is 0.09 m, which is the result of scan pair 3 in dataset 1. We enlarge the visual results of scan pair 3 (see Figure 11) and find that our result is even better than that of ICP 2 . The success rates of PSM, PLICP and ICP 1 are 37.5%, 12.5% and 12.5%, respectively. Comparing ICP 1 with ICP 2 , only the second scan pair is correctly matched; the translation differences are 0.07 m and 0.04 m, which are less accurate than our method. Even the translation differences of the third scan pair in dataset 2 are 0.13 m and 0.13 m, and their rotation difference is up to 0.05 rad (2.865˝). ICP 1 can succeed only for scan pair 2 of dataset 1, which can be expected because the rotation between this pair is small. Thus, the proposed method is much more reliable when the initial guess is unavailable. maximum angle difference between our method and ICP is less than 0.01 rad (0.573°). For most cases, the translation differences between our method and ICP 2 are smaller than 0.05 m. The maximum translation difference is 0.09 m, which is the result of scan pair 3 in dataset 1. We enlarge the visual results of scan pair 3 (see Figure 11) and find that our result is even better than that of ICP 2 . The success rates of PSM, PLICP and ICP 1 are 37.5%, 12.5% and 12.5%, respectively. Comparing ICP 1 with ICP 2 , only the second scan pair is correctly matched; the translation differences are 0.07 m and 0.04 m, which are less accurate than our method. Even the translation differences of the third scan pair in dataset 2 are 0.13 m and 0.13 m, and their rotation difference is up to 0.05 rad (2.865°). ICP 1 can succeed only for scan pair 2 of dataset 1, which can be expected because the rotation between this pair is small. Thus, the proposed method is much more reliable when the initial guess is unavailable.
where tx gt and ty gt are the ground truth of translation; tθ gt is the ground truth of rotation; and n is the total scan pairs (in this experiment, n = 50). If e x and e y of a scan pair are smaller than 0.1 m and e θ is smaller than 0.03 rad, this estimated transformation matrix is regarded as a correct one. The success rate is the ratio of correctly matched scan pairs and total scan pairs. From this table, we can see that the translation error of our method is less than 2 cm and that the rotation error is approximately 0.1 rad. This is much better than those of PSM, PLICP and ICP 1 because the success rate of our method is 100%. Our method achieves 18%, 34% and 42% growth rates of the success rate for the Intel dataset and 20%, 50% and 56% growth rates for the MIT dataset, respectively. PSM, PLICP and ICP 1 are not very robust and stable because of the need for an initial guess.

Running Time
Scan matching is usually a front end of SLAM systems, in which time-consuming methods are unacceptable. We select four scan pairs for evaluation. The numbers of points in each scan of the four scan pairs are 180, 360, 720 and 1080. All experiments are run in Microsoft Visual Studio 2010 installed on a laptop PC with an Intel Core i5-3210M 2.5 GHz CPU and 8 GB of RAM. Note that we rewrite the ICP code in C++ for comparison. The results are given in Table 4. As seen, PSM is the fastest. It achieves 100 FPS when the number of points is 720. Our method is much faster than PLICP and ICP 1 because the numbers of keypoints and keylines are very small. Our method can achieve almost 30 FPS and 20 FPS when the numbers of points are 360 and 720, respectively. This is sufficient for 2D indoor scan matching because the number of points in a 2D laser scan is less than 1080 in most cases.

Application for SLAM
For each dataset, we first perform the proposed scan matching algorithm on all consecutive key laser scan pairs, obtaining the relative positions and orientations T k-1 = {r k-1 ,t k-1 } of these pairs. We choose the first key laser scan as the base on which to build a global map. The coordinate system of this global map is the same as the first key scan. We use Q k-1 to represent the points on the incremental global map, accumulated until the k-1-th laser scan. For a new laser scan S k , we then match S k with Q k-1 via ICP. The initial guess of ICP is provided by our method (T k-1 = {r k-1 ,t k-1 })). This transformation matrix T k-1 = {r k-1 ,t k-1 } is refined by this process, and more accurate relative positions and orientations are achieved. The global map is also updated by fusing the points of laser scan S k . This map matching technique has a very important advantage: it is a global matching method, which can largely reduce the drift caused by scan matching methods. After all laser scans are processed, a 2D indoor map and a trajectory can be produced. Finally, this trajectory (positions and orientations) can be further optimized by some global optimization approaches such as G2O [34] or some filtering techniques. In this experiment, we regard the trajectory refined by the global map as the initial guess needed by particle filter SLAM (Gmapping [35]) and obtain a more accurate indoor map. We can also generate the 3D point clouds of the indoor scenes based on the other two laser scanners and the optimized trajectory. Figures 12 and 13 show the indoor mapping results of dataset 1 and dataset 2, respectively. As seen, a smooth trajectory can be achieved by combining our scan matching approach with a map matching technique. There are no abrupt changes in the results, which indicates that there are no obvious false matches because the cart is moved at a near-constant speed. The unrefined trajectory has a similar shape to the global refined one. Its local accuracy is much closer to the global trajectory. The proposed method drifts over time, which could not be avoided by any scan matching method. From the second pictures in Figures 12 and 13, we can see that the 2D point clouds have very high local accuracy. The thickness of the point clouds is less than 0.1 m in most cases. The point clouds represent the outline of indoor walls whose thickness is almost 0 in the ideal case. However, the range noise of the Sick laser scanner is 0.02 m, and the location error cannot be avoided. Thus, a trajectory that can construct a map with 0.1 m local accuracy is sufficient to provide a good initial guess for global refinement methods, such as Gmapping. The third pictures of Figures 12 and 13 verify this conclusion. The thickness of the point clouds is less than 0.08 m in all cases. There is no "ghosting" in the maps, and the walls are straight. The constructed 2D maps are consistent with the real scenes. Figure 14 gives the final indoor maps of the Intel and MIT datasets. Figure 15 shows the 3D point clouds of dataset 1 reconstructed by the other two oblique lasers. and orientations are achieved. The global map is also updated by fusing the points of laser scan Sk. This map matching technique has a very important advantage: it is a global matching method, which can largely reduce the drift caused by scan matching methods. After all laser scans are processed, a 2D indoor map and a trajectory can be produced. Finally, this trajectory (positions and orientations) can be further optimized by some global optimization approaches such as G2O [34] or some filtering techniques. In this experiment, we regard the trajectory refined by the global map as the initial guess needed by particle filter SLAM (Gmapping [35]) and obtain a more accurate indoor map. We can also generate the 3D point clouds of the indoor scenes based on the other two laser scanners and the optimized trajectory. Figures 12 and 13 show the indoor mapping results of dataset 1 and dataset 2, respectively. As seen, a smooth trajectory can be achieved by combining our scan matching approach with a map matching technique. There are no abrupt changes in the results, which indicates that there are no obvious false matches because the cart is moved at a near-constant speed. The unrefined trajectory has a similar shape to the global refined one. Its local accuracy is much closer to the global trajectory. The proposed method drifts over time, which could not be avoided by any scan matching method. From the second pictures in Figures 12 and 13, we can see that the 2D point clouds have very high local accuracy. The thickness of the point clouds is less than 0.1 m in most cases. The point clouds represent the outline of indoor walls whose thickness is almost 0 in the ideal case. However, the range noise of the Sick laser scanner is 0.02 m, and the location error cannot be avoided. Thus, a trajectory that can construct a map with 0.1 m local accuracy is sufficient to provide a good initial guess for global refinement methods, such as Gmapping. The third pictures of Figures 12 and 13 verify this conclusion. The thickness of the point clouds is less than 0.08 m in all cases. There is no "ghosting" in the maps, and the walls are straight. The constructed 2D maps are consistent with the real scenes. Figure 14 gives the final indoor maps of the Intel and MIT datasets. Figure 15 shows the 3D point clouds of dataset 1 reconstructed by the other two oblique lasers.

Limitations
There are still some problems in our algorithm that must be resolved. First, our method is unsuitable for outdoor scenes. Outdoor scenes are more complicated than indoor environments. There are many moving objects and few line segments in outdoor scenes that will affect our method. This is an inherent drawback of our scan matching. Second, our scan matching method has only three degrees of freedom. It will fail if the floor of the indoor scene is not planar. Thus, if we want to build a map of a multiple-floor indoor environment, we must perform the process for each floor; all floors are then registered manually. This is a common drawback of 2D scan matching methods. In addition, our method highly relies on the feature detection stage. Our method needs at least one correct line correspondence and two correct point correspondences. If there are no line correspondences in a laser scan pair, we should discard one scan and obtain another one to build a new pair. This is the process of key laser scan selection. Alternatively, we can use only point correspondence for scan matching. If there are few 1D SIFT feature points, we should add other

Limitations
There are still some problems in our algorithm that must be resolved. First, our method is unsuitable for outdoor scenes. Outdoor scenes are more complicated than indoor environments. There are many moving objects and few line segments in outdoor scenes that will affect our method. This is an inherent drawback of our scan matching. Second, our scan matching method has only three degrees of freedom. It will fail if the floor of the indoor scene is not planar. Thus, if we want to build a map of a multiple-floor indoor environment, we must perform the process for each floor; all floors are then registered manually. This is a common drawback of 2D scan matching methods. In addition, our method highly relies on the feature detection stage. Our method needs at least one correct line correspondence and two correct point correspondences. If there are no line correspondences in a laser scan pair, we should discard one scan and obtain another one to build a new pair. This is the process of key laser scan selection. Alternatively, we can use only point correspondence for scan matching. If there are few 1D SIFT feature points, we should add other feature points, such as normal-based feature points, curve-based feature points, line intersection points, and so on.

Conclusions
In this paper, we propose a 2D laser scan matching method based on point and line features. Our method does not require an initial guess for the transformation. We carefully design a framework for the detection of point and line feature correspondences. The feature points are detected based on an extended 1D SIFT, and line features are extracted via a modified split-and-merge algorithm. Line segments are a very important type of features for indoor environments. The advantages of using line features are twofold: first, line features are more distinct for indoor scenes and are less sensitive to noise in the laser range measurements; second, the orientation information can be easily extracted from 2D line correspondences, which can help us reject feature point outliers and provide a good initial guess for our transformation estimation stage. The point and line features are then described by a distance histogram. The histogram cluster technique can help us filter outliers and provide an accurate initial value of the rigid transformation. We also propose a new relative pose estimation method that is robust to outliers. Unlike the conventional RANSAC-based strategy, there is no gross error detection stage in our pose estimation algorithm. In addition, our pose estimation algorithm is more robust to noise than the RANSAC-based one. Extensive experiments on real data demonstrate that the proposed method is almost as accurate as ICPs and is initial free. We also show that our scan matching method can be integrated into a simultaneous localization and mapping (SLAM) system for indoor mapping. There are also some limitations to our method. The main limitation is the three degrees of freedom. Our future work will be to extend the proposed method to six degrees of freedom.