Detection and Compensation of Degeneracy Cases for IMU-Kinect Integrated Continuous SLAM with Plane Features †

Among general geometric primitives, plane-based features are widely used for indoor localization because of their robustness against noise. However, a lack of linearly independent planes may leave the estimation problem under-constrained. This in turn can cause a degenerate state in which not all states can be estimated. To solve this problem, this paper first proposes a degeneracy detection method. A compensation method that fixes orientations by projecting an inertial measurement unit’s (IMU) information is then explained. Experiments were conducted using an IMU-Kinect v2 integrated sensor system, which is prone to degenerate cases owing to its narrow field-of-view. Results show that the proposed framework enhances map accuracy by successfully detecting and compensating for degenerate orientations.


Introduction
Plane features have been widely used for simultaneous localization and mapping (SLAM) of indoor environments due to the following two reasons: (1) there are abundant planes in man-made indoor spaces; and (2) sensor noise can be sufficiently reduced by conventional plane extraction algorithms.
However, the use of plane features can fall into degenerate cases when the linearly independent information from the detected planes is insufficient for pose estimation. In other words, if fewer than three independent planes are detected from two consecutive 3D poses, the relative geometric relationship cannot be fully estimated. For example, in a long and flat corridor, the amount of translation along the corridor's direction cannot be detected even with wide field-of-view (FoV) LiDAR sensors. With narrow-FoV sensors such as the Kinect (Microsoft), these degeneracies are frequently encountered, even in sufficiently complex indoor spaces [1].

Seamless 3D SLAM Framework
The proposed pipeline in Figure 1 has the conventional Graph SLAM structure with three main modules: data acquisition, front-end construction, and back-end optimization. In the first block (data acquisition), raw 3D point cloud data from the Kinect v2 and 6-DoF (degrees of freedom) measurements from the IMU are acquired. These data are passed to the second block, where plane features are extracted using the algorithms in [23,24]. Here, the i-th plane feature ($\Pi_i$) consists of a unit normal ($n_i \in \mathbb{R}^{3\times 1}$), which is orthogonal to the surface, and a perpendicular distance ($d_i$) from the origin to the surface.
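To make the plane parameterization concrete, the following minimal sketch fits one plane to a set of 3D points and returns the pair (n, d). It is a plain least-squares fit via SVD, not the extraction algorithms of [23,24]; the function name `fit_plane` and the sign convention (n · p = d with d ≥ 0) are assumptions made for this illustration.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit (illustrative helper, not the method of [23,24]).

    Returns (n, d): unit normal n and distance d such that n @ p = d
    for points p on the plane.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance of the centered points, i.e. the normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]
    d = float(n @ centroid)
    # Assumed convention: keep d non-negative so (n, d) is unique
    # (the sign stays ambiguous for a plane through the origin).
    if d < 0:
        n, d = -n, -d
    return n, d

# Noise-free points on the plane z = 2.
n, d = fit_plane([[0, 0, 2], [1, 0, 2], [0, 1, 2], [1, 1, 2], [2, 1, 2]])
```

For these sample points the fit recovers a normal along the z-axis and a distance of 2 from the origin.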
At this point, degeneracy can arise if fewer than three independent plane pairs are used. As these degeneracies leave clear evidence in the second moment matrix, the degeneracy detection algorithm (in Section 2.1, our first contribution) analyzes the matrix and pinpoints the degeneracy direction. Then, the pose compensation algorithm (in Section 2.2, our second contribution) compensates for the degeneracy by projecting the IMU estimate onto the degenerate direction. The remaining procedures are conventional loop detection and back-end optimization, adopted from the methods in [25,26], respectively.

Degeneracy Detection Algorithm
Relative pose estimation with plane features is illustrated in Figure 2, where sets of plane features are associated with each other. When there are three or more pairs of planes (as in Figure 2a), all 6-DoF parameters (3-DoF for rotation and 3-DoF for translation) can be estimated. However, when only one pair of planes with identical normals exists (as in Figure 2d), 1-DoF and 2-DoF degeneracies arise for rotation and translation, respectively. These two cases are outside the scope of this paper because noise (from motion as well as sensors) either does not significantly affect pose estimation or does not exist. In the case of 2-pair correspondence, however, noise can turn a 1-pair correspondence into a false 2-pair correspondence, as shown in Figure 2c. This subsection proposes an algorithm that discriminates false 2-pair correspondences from real ones.
For that purpose, we adopted the second moment matrix, in which the rank and ratio of eigenvalues indicate the number of linearly independent plane correspondences and distinctiveness among independent correspondences, respectively.
To detect the rank and the ratio of its eigenvalues, let us conduct the eigenvalue decomposition as ${}^{l}_{r}M = V \Lambda V^{\top}$, where $V$ is the orthonormal square matrix whose columns are the eigenvectors $v_i$ ($i = 1, 2, 3$) and $\Lambda$ is the diagonal matrix whose elements are the associated eigenvalues ($\lambda_i$). For convenient representation, eigenvectors are sorted according to their eigenvalues in descending order ($\lambda_1 \geq \lambda_2 \geq \lambda_3$). These eigenvalues are equal to or greater than 0 because ${}^{l}_{r}M$ is a positive semi-definite matrix. Now, let us exploit the ratio of eigenvalues ($\lambda_2/\lambda_1$), whose purpose is to discriminate the effective rank. Real and false 2-pair correspondences can be distinguished by evaluating this ratio, as its value tends to be close to 0 (i.e., $\lambda_1 \gg \lambda_2$) for false cases but higher than a certain threshold for real cases.
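The ratio test can be sketched as follows. The exact definition of the second moment matrix is not reproduced in this text, so the sketch assumes it is the mean of the outer products of the unit normals; this assumption reproduces the eigenvalues quoted for the 4° example. The names `second_moment` and `analyze_normals` are hypothetical, and the default threshold is the value quoted in the text for the Kinect v2.

```python
import numpy as np

RATIO_THRESHOLD = 1.2e-3  # threshold quoted in the text for Kinect v2

def second_moment(normals):
    """ASSUMED form: mean of outer products n n^T of the unit normals."""
    N = np.asarray(normals, dtype=float)
    N = N / np.linalg.norm(N, axis=1, keepdims=True)
    return N.T @ N / len(N)

def analyze_normals(normals, threshold=RATIO_THRESHOLD):
    """Eigen-analysis of the second moment matrix of plane normals."""
    M = second_moment(normals)
    w, V = np.linalg.eigh(M)           # eigenvalues in ascending order
    order = np.argsort(w)[::-1]        # re-sort to descending order
    lam = np.clip(w[order], 0.0, None) # remove tiny negative round-off
    V = V[:, order]
    ratio = lam[1] / lam[0]
    return {
        "eigenvalues": lam,
        "eigenvectors": V,             # column 0 is the dominant direction
        "ratio": float(ratio),
        "is_real_two_pair": bool(ratio > threshold),
    }

# Two clearly independent normals (90 degrees apart).
real_pair = analyze_normals([[1, 0, 0], [0, 1, 0]])

# Two normals only 1 degree apart, e.g. one plane observed with noise.
a = np.deg2rad(1.0)
false_pair = analyze_normals([[1, 0, 0], [np.cos(a), -np.sin(a), 0]])
```

Under this assumption the ratio for two normals separated by an angle θ equals tan²(θ/2), so the 1° pair falls well below the threshold while the 90° pair gives a ratio of 1.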
Finally, the remaining task is to select a proper threshold considering the sensor noise level. For the Kinect v2, it has been shown that depth distortion yields a fluctuation error within ±6 mm [27]. With this amount of distortion, the rotational covariance between poses is evaluated to be 4° for our plane-based method. In accordance with this empirical analysis, let us assume that there are two unit normals $n = [1\ \ 0\ \ 0]^{\top}$ and $n^{*} = [\cos 4^{\circ}\ \ {-\sin 4^{\circ}}\ \ 0]^{\top}$, whose included angle is 4°. Then, the second moment matrix for $n$ and $n^{*}$ can be calculated (Equation (3)); its two nonzero eigenvalues are approximately $9.988 \times 10^{-1}$ and $1.2 \times 10^{-3}$. Here, note that the dimensionality of the second moment matrix is reduced from 3 to 2 because the number of samples ($n$, $n^{*}$) is smaller than the dimension of the vector space ($\mathbb{R}^{3\times 1}$). Thus, using the eigenvalues in Equation (3), we set the threshold of the ratio to $1.2 \times 10^{-3}$ ($\approx 1.2 \times 10^{-3} / 9.988 \times 10^{-1}$).
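The threshold arithmetic can be checked by hand. The derivation below assumes the second moment matrix is the mean of the outer products of the two normals, which is an assumption about the missing definition; under it, the eigenvalues match the values used in the ratio above.

```latex
\begin{align}
M &= \tfrac{1}{2}\left(n\,n^{\top} + n^{*} n^{*\top}\right)
   = \tfrac{1}{2}
     \begin{bmatrix}
       1+\cos^{2}\theta & -\sin\theta\cos\theta & 0\\
       -\sin\theta\cos\theta & \sin^{2}\theta & 0\\
       0 & 0 & 0
     \end{bmatrix},
   \qquad \theta = 4^{\circ},\\
\lambda_{1,2} &= \tfrac{1}{2}\left(1 \pm \cos\theta\right),
   \qquad \lambda_{3} = 0,\\
\lambda_{1} &\approx 9.988\times10^{-1},
   \qquad \lambda_{2} \approx 1.218\times10^{-3},\\
\frac{\lambda_{2}}{\lambda_{1}} &= \tan^{2}\!\left(\tfrac{\theta}{2}\right)
   \approx 1.22\times10^{-3}.
\end{align}
```

The zero third eigenvalue reflects the fact that two samples span at most a two-dimensional subspace.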

Compensation Method for Degenerate Rotation
In the 1-pair correspondence case, as shown in Figure 3, the amount of rotation ($\gamma$) about the normal of the plane ($v$) cannot be estimated, where $v$ corresponds to the one effective eigenvector of the second moment matrix in Equation (2). Here, IMU measurements are used only to compensate for the ill-conditioned component $\gamma$ of the state, which cannot be estimated from the features. For the other, well-conditioned components, whose values can be estimated from the features, IMU measurements are not used because they are less accurate than the feature-based estimates. Before going into the mathematical details, let us introduce some quaternion definitions. A unit quaternion ($q \in \mathbb{R}^{4}$) for the relative rotation is represented by $q = [q_0,\ \mathbf{q}^{\top}]^{\top}$, where $q_0$ and $\mathbf{q}$ are the scalar and vector parts, respectively. The conjugate of $q$ is $q^{*} = [q_0,\ -\mathbf{q}^{\top}]^{\top}$. The product of two quaternions $p$ and $q$ is $p \otimes q = [\,p_0 q_0 - \mathbf{p}\cdot\mathbf{q},\ (p_0\mathbf{q} + q_0\mathbf{p} + \mathbf{p}\times\mathbf{q})^{\top}]^{\top}$, where $\cdot$ is the dot product and $\times$ is the cross product.
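The quaternion operations above can be sketched in a few lines. The sketch assumes scalar-first storage (q0, qx, qy, qz) and the standard product with a positive cross-product term; the helper names `q_conj`, `q_mul`, and `rotate` are chosen for this illustration only.

```python
import math

def q_conj(q):
    """Conjugate: keep the scalar part, negate the vector part."""
    q0, x, y, z = q
    return (q0, -x, -y, -z)

def q_mul(p, q):
    """Product p (x) q = [p0*q0 - p.q, p0*q + q0*p + p x q]."""
    p0, px, py, pz = p
    q0, qx, qy, qz = q
    return (
        p0 * q0 - (px * qx + py * qy + pz * qz),
        p0 * qx + q0 * px + (py * qz - pz * qy),
        p0 * qy + q0 * py + (pz * qx - px * qz),
        p0 * qz + q0 * pz + (px * qy - py * qx),
    )

def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q via q (x) [0, v] (x) q*."""
    w = q_mul(q_mul(q, (0.0, v[0], v[1], v[2])), q_conj(q))
    return w[1:]

# A 90-degree rotation about the z-axis maps the x-axis onto the y-axis.
q_z90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
x_rotated = rotate(q_z90, (1.0, 0.0, 0.0))

# A unit quaternion times its conjugate is the identity rotation.
identity = q_mul(q_z90, q_conj(q_z90))
```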
The main idea of this subsection is to set $\gamma$ by projecting the IMU's estimate so that it becomes parallel to the plane. In other words, the result is a hybrid of two different estimates. One is derived from the plane-based estimation: it keeps the plane constraint but has an uncertain $\gamma$ value. The other is derived from the IMU: it does not necessarily keep the plane constraint but has an accurate $\gamma$ value (at least over the given short time interval). Now, let us denote the IMU prediction by $q_{imu}$ and the rotation that projects $q_{imu}$ parallel to the plane by $\Delta q$. Here, $\Delta q$ can be derived from two normal vectors, $v_{imu}$ and $v_{reg}$, which are the normal vectors updated by the IMU ($q_{imu}$) and by the plane information ($q_{reg}$), respectively. $\Delta q$ is then the rotation that brings $v_{imu}$ parallel to $v_{reg}$ along the shortest arc.
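A minimal sketch of this compensation step is given below. Because the corresponding equations are not reproduced in this text, several points are assumptions: the updated normals are taken as the plane normal v rotated by q_imu and by q_reg, Δq is the shortest-arc rotation from v_imu to v_reg, and the compensated orientation is composed as Δq ⊗ q_imu. All function names are hypothetical.

```python
import math

def q_conj(q):
    return (q[0], -q[1], -q[2], -q[3])

def q_mul(p, q):
    p0, px, py, pz = p
    q0, qx, qy, qz = q
    return (
        p0 * q0 - (px * qx + py * qy + pz * qz),
        p0 * qx + q0 * px + (py * qz - pz * qy),
        p0 * qy + q0 * py + (pz * qx - px * qz),
        p0 * qz + q0 * pz + (px * qy - py * qx),
    )

def rotate(q, v):
    return q_mul(q_mul(q, (0.0, v[0], v[1], v[2])), q_conj(q))[1:]

def shortest_arc(a, b):
    """Unit quaternion rotating unit vector a onto unit vector b."""
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    # (1 + a.b, a x b) is proportional to (cos(t/2), sin(t/2) * axis).
    q = (1.0 + dot, cross[0], cross[1], cross[2])
    norm = math.sqrt(sum(c * c for c in q))
    if norm < 1e-12:
        raise ValueError("vectors are antiparallel; shortest arc is not unique")
    return tuple(c / norm for c in q)

def compensate(q_imu, q_reg, v):
    """ASSUMED composition: tilt from the plane, rotation about v from the IMU."""
    v_imu = rotate(q_imu, v)           # normal as predicted by the IMU
    v_reg = rotate(q_reg, v)           # normal as predicted by plane registration
    dq = shortest_arc(v_imu, v_reg)    # pulls the IMU estimate back onto the plane
    return q_mul(dq, q_imu)

# IMU estimate: 30 degrees about the normal (z) plus a 2-degree tilt error about x.
q_yaw = (math.cos(math.radians(15)), 0.0, 0.0, math.sin(math.radians(15)))
q_tilt = (math.cos(math.radians(1)), math.sin(math.radians(1)), 0.0, 0.0)
q_imu = q_mul(q_tilt, q_yaw)
q_reg = (1.0, 0.0, 0.0, 0.0)           # plane registration: normal unchanged
q_comp = compensate(q_imu, q_reg, (0.0, 0.0, 1.0))
```

In this example the compensated quaternion keeps the IMU's 30° rotation about the normal while the 2° tilt, which violates the plane constraint, is removed.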

Further Processes
Although the orientation in the degeneracy direction is compensated for by Equation (9), the translation in that direction remains degenerate. Pathak et al. [17] proposed a method that corrects uncertain translation by using the IMU's acceleration measurements. We adopted this method for translation.
For loop-closure, we used a 3D Gestalt descriptor based method proposed in [25]. Finally, for back-end optimization, we implemented IRLS (iteratively reweighted least squares) that excluded less accurate outliers as in [26].

Experiments
For performance validation, SLAM experiments were conducted with two different sensor systems, as shown in Figure 4. One (Figure 4a) is a low-cost sensor system with a Kinect v2 and an inexpensive IMU (CH-UM7: ±4° dynamic pitch/roll accuracy, ±8° dynamic yaw accuracy). The other (Figure 4b) is a high-cost sensor system [28] with a Velodyne LiDAR (HDL-32E) and a MicroStrain IMU (3DM-GX3-45: ±2° dynamic pitch/roll/yaw accuracy). The purpose of the high-cost system is to extract the ground truth. For data acquisition, an operator carried the system (as shown in Figure 5) through a small building with narrow pathways. The operator traveled 62 m, and a total of 410 place indices were generated. As a result, a graph was built in which 255, 145, and 9 edges were in 3-pair, 2-pair, and 1-pair plane correspondences, respectively.
Given this information, two methods were applied. One was a plane-IMU integrated (pl-IMU) method, an extension of the plane-based method [17] in which IMU information is embedded. The other was our proposed method, which was identical to the pl-IMU method except that 1-pair correspondence degeneracy was detected and compensated for. As shown in Figure 6, accuracy improved for all components except z and yaw, whose median values are similar in both methods. Figure 7 shows the error distances of the two methods relative to the ground truth for the 3D pose states. Average errors were significantly reduced by the proposed method, as shown in Table 1. Note that compensating rotation decreased both translation and rotation errors. This is a natural consequence, as graph optimization improves the overall map accuracy through the rotational updates.
The performance can also be verified by 3D mapping. The map created by the proposed method showed high-quality consistency (Figure 8a,b), similar to that shown by the ground truth (Figure 8c,d).

Conclusions
This paper proposed a degeneracy detection and compensation method for orientations, addressing a degeneracy that can arise when 3D SLAM algorithms use plane features. This degeneracy is induced when fewer than three plane pairs are detected for two consecutive poses. It correlates strongly with the narrowness of the sensor's field of view or of the target environment. Our experiment showed that, when a 550 m² indoor space was mapped with a Kinect v2, 37.7% of data acquisition poses were in a degenerate situation.
The proposed method detected degeneracy by using the rank of the second moment matrix constructed from plane normals. To compensate for orientations affected by the degeneracy, IMU orientations were projected in a direction orthogonal to the non-degenerate orientations.