State Transition for Statistical SLAM Using Planar Features in 3D Point Clouds

There is a large body of literature on solving the SLAM problem for various autonomous vehicle applications. A substantial part of the solutions is formulated based on using statistical (mainly Bayesian) filters such as Kalman filter and its extended version. In such solutions, the measurements are commonly some point features or detections collected by the sensor(s) on board the autonomous vehicle. With the increasing utilization of scanners with common autonomous cars, and availability of 3D point clouds in real-time and at fast rates, it is now possible to use more sophisticated features extracted from the point clouds for filtering. This paper presents the idea of using planar features with multi-object Bayesian filters for SLAM. With Bayesian filters, the first step is prediction, where the object states are propagated to the next time based on a stochastic transition model. We first present how such a transition model can be developed, and then propose a solution for state prediction. In the simulation studies, using a dataset of measurements acquired from real vehicle sensors, we apply the proposed model to predict the next planar features and vehicle states. The results show reasonable accuracy and efficiency for statistical filtering-based SLAM applications.


Introduction
Simultaneous Localization and Mapping (SLAM) is often considered as one of the main challenges in the field of robotics and autonomous vehicles [1,2]. The aim of SLAM is to build a map of an unknown environment while simultaneously determining the location of the vehicle within this map. Neither the map nor the vehicle position are known in advance. However, a kinematic model of the vehicle motion is assumed to be known a priori, and the unknown environment is populated with artificial or natural landmarks.
There is a large body of literature on solving the SLAM problem for various autonomous vehicle applications. Many of the solutions are formulated based on using a Bayesian filter that recursively propagates the distribution of the vehicle's dynamic states (and sometimes the map features). Davison [3] proposed a solution based on using Extended Kalman Filters (EKFs) to track the location of each map feature which was extracted from the sensor measurements. To improve the accuracy of this method, Thrun et al. [4] proposed the Sparse Extended Information Filter. planes) at time k + 1, in terms of the plane dimensions (plane area and vertex distance) and the plane orientation (normal vector).
The rest of the paper is organized as follows. Section 2 provides relevant theoretical background. Section 3 introduces the details of the proposed transition model which comprises two sub-models: the plane transition model and the vehicle transition model. Section 4 explains the detailed procedure for implementation of the proposed transition model. Section 5 provides verification of the proposed transition model by means of graphical and numerical simulation results. Section 6 concludes the paper.

Background
The proposed transition model is mainly designed to be used with a recently developed Random Finite Set (RFS) based Bayesian filter-the Labeled Multi-Bernoulli (LMB) filter [20]. In this section, we provide the theoretical background necessary for understanding this filter, followed by a brief introduction to the KITTI dataset [19] which was used for simulation studies in this work.

Labeled Multi-Bernoulli Filter
In this section, we introduce the Labeled Multi-Bernoulli (LMB) filter [20]. We adopt the same notation used in [20] where the single-object states are denoted by lower-case letters, e.g., x, x and multi-object states by upper-case letters, e.g., X, X. In order to distinguish between labeled and unlabeled states and their distributions, the labeled one is shown by bolded letters, e.g., x, X, etc., spaces by blackboard bold, e.g., X, L, C, etc., and the class of finite subsets of a space X by F (X). Following [20], throughout the paper, the standard inner product notation is used and denoted by f , g f (x)g(x)dx, the generalized Kronecker delta is denoted by and the inclusion function, a generalization of the indicator function, by The LMB RFS is completely described by its components π = {(r ( ) , p ( ) ) : ∈ L}. The LMB RFS density is given by π(X) = ∆(X)w(L(X)) [p] X , where p(x, ) = p ( ) (x) and w(L) = comprising a single component [20]. The LMB multi-target Bayes recursion propagates multi-target posterior density at each time step according to the Chapman-Kolmogorov and the Bayes rule.

Prediction
Assume that the prior and birth labeled multi-Bernoulli sets are modeled as follows: where with state space X and label space L + = B ∪ L and with the condition B ∩ L = ∅. The predicted multi-object distribution is then a LMB RFS and given by where where p S (·| ) is the survival probability of an object and f (x|·, ) is the single-object transition model. Thus, if the multi-target posterior density is an LMB RFS with parameter set π = {(r ( ) , p ( ) ) : ∈ L} with state space X and label space L and the birth model is also an LMB RFS with parameter set ) : ∈ B} with state space X and label space B then the predicted multi-target density is also an LMB RFS with state space X and label space L + = B ∪ L(B ∩ L = ∅) and it is given by where r ( ) p ( ) for more details see- [20]-proposition 2.

KITTI Dataset
The KITTI dataset [19] was recorded in and around Karlsruhe, Germany using a VW wagon equipped with various types of sensors including a 3D laser scanner, four video cameras and a GPS/IMU navigation system. The data was collected in both urban and suburban areas (city, rural, and highway) and information was gathered over several days. In this dataset, many of the scenes are dominated by large buildings with planar surfaces. The KITTI dataset has been widely used in mobile robotics and autonomous driving research.

Proposed Transition Model
As explained in Section 1, a transition model is required in SLAM filters to predict the states of the features at the next time step, based on the state information at the current time step. Specifically, in this study, we are interested in constructing a transition model for predicting the plane parameters at the next time step, using the current plane parameters estimated by a SLAM filter. Note that the planes involved in the transition model are expressed in the vehicle coordinate system (fixed to the vehicle), as the point cloud measurements are acquired by a laser scanner mounted on top of the experimental vehicle. Thus the plane parameters vary with time, as long as the experimental vehicle is in motion. On the other hand, if seen from the global coordinate system (fixed to the ground), the plane parameters are invariant since the planes are all static in this coordinate system.
We emphasize that the vehicle is assumed to move in a static environment. If this is not the case, we make the practical assumption that planar surfaces on other moving objects are small enough to be excluded from the set "plane observations" extracted from the 3D point cloud. That set is expected to include only relatively large planar surfaces such as walls of buildings along the road.

Plane Transition Model
To facilitate the design of the transition model, the two aforementioned coordinate systems are established, as shown in Figure 1. The global coordinate system OXYZ is fixed to the ground, and the vehicle coordinate system oxyz is attached to the vehicle Center of Mass and moves along with the vehicle. P is a point on a plane segmented from the 3D point cloud obtained from the on-board laser scanner. r P and r are the position vectors of P in the global and vehicle coordinate systems, and r V is the position vector of the vehicle centre of mann (i.e., origin o) in the global coordinate system. The global coordinates of P are invariant since r P is static as seen from the global coordinate system. However, the local coordinates of P, expressed in the vehicle coordinate system, vary with time due to the motion of the vehicle. As a result, the plane parameters expressed in the vehicle coordinate system also evolve with time, and a transition model is thus needed to predict the change of plane parameters for the purpose of accurate and effective SLAM. The plane equation, expressed in the vehicle coordinate system, is assumed to take the following form in this study: where x, y and z are the coordinates of a point on the plane, and a, b, c and d are the plane parameters.
Note that other forms of plane equations are also available in the literature [21], but all forms of plane equations can eventually lead to Equation (23) after simple algebraic manipulation. We may rewrite Equation (23) in the following vector form: where β = a b c d T and r = x y z 1 T .
Assuming that the vehicle is only performing a planar motion (which is normally the case in common flat-road urban driving scenarios), the coordinates of point P in the global coordinate system r P = X Y Z 1 T , and the coordinates of point P in the vehicle coordinate system r = x y z 1 T , are related according to the following equations describing the kinematics of the vehicle [22,23]: and where φ represents the heading angle of the vehicle (with respect to the X axis), and X V and Y V denote the coordinates of the vehicle centre of mass (origin o) in the global coordinate system (namely The vehicle states φ, X V and Y V are graphically shown in Figure 2. Equations (25) and (26) can be rewritten as follows, for time k: and where r k = x k y k z k 1 T . Similarly, we have the following equations for time k + 1: and where The vector r P is invariant with time, since it is the position vector of P in the global coordinate system. As a result, one can achieve the following equation by combining Equations (27) and (29): Note that Equation (24) can be rewritten as follows, for time k: where M k represents a 4-by-4 invertible matrix. We might as well let M k = R −1 k+1 R k and combine Equations (31) and (32), then we arrive at: The plane Equation (24) for time k + 1 can be expressed as follows: Combining Equations (33) and (34) leads to the following transition model for plane parameters: where This plane transition model indicates that the plane parameters at time k + 1 can be calculated based on the plane parameter information at time k and a vehicle-motion-dependent 4-by-4 matrix M −1 k . Note that the computation of matrix M −1 k = R −1 k R k+1 requires the inverse of R k . It can be easily proven that the matrix R k , expressed by Equation (28), has full rank and its inverse R −1 k exists at all times. Thus, the matrix M −1 k can be computed as long as R k+1 is available. However, this matrix R k+1 (as indicated by Equation (30)) requires the vehicle state information at the future time k + 1, which is not available in real-time SLAM applications. To tackle this issue, in the following section we will introduce a new vehicle transition model and demonstrate how the vehicle states at time k + 1 can be predicted.

Vehicle Transition Model
As mentioned above, the transformation matrix R k+1 requires the unknown information from the future time k + 1. In order to obtain matrix R k+1 , in this section a vehicle transition model is introduced to predict the future vehicle states at time k + 1, using the available information at time k.
In Bayesian filtering, the Constant Turn (CT) model [24][25][26] which describes the maneuver of the vehicle motion is commonly used. This model is formulated based on the assumption that the vehicle maneuvers with a (nearly) constant speed and a (nearly) constant angular rate [24]. Based on the CT model, an Extended Constant Turn (ECT) model is proposed in this work, which can be mathematically expressed as follows: with where X V,k , Y V,k , X V,k+1 and Y V,k+1 denote the vehicle longitudinal and lateral displacements (i.e., the coordinates of the vehicle centre of mass as introduced in Section 3.1) in the global coordinate system at time k and k + 1, φ k and φ k+1 represent the vehicle heading angles at time k and k + 1, u X , u Y and u ω are the vehicle velocity and yaw rate noise components, and T stands for the sampling period. By means of this vehicle transition model (36), the required vehicle state information at time k + 1 can be predicted and in turn the matrices R k+1 and M −1 k can be calculated. Note that the plane transition model proposed in Section 3.1 (i.e., Equation (35)), along with the above ECT vehicle transition model, constitutes a complete state transition model that propagates the state densities of the planar features to the next time. This state transition model plays a key role in the prediction step for prospective planar-feature-based SLAM filters.

Implementation
In this section, we introduce the Sequential Monte Carlo (SMC) implementation of the proposed transition model, for the recently developed LMB filter. The LMB filter has been used in many applications such as target tracking [27,28], SLAM [29], visual tracking [30], and sensor control [31,32]. In the SMC implementation, the LMB posterior for the map features (planes) at each time step are represented by a set of particles. These particles are propagated using the proposed transition model to obtain the LMB posterior at the next time step.
Assuming the following LMB posterior π k at time k: where represents the label of a plane, r ( ) k denotes the existence probability of the plane , p ( ) k is the spatial distribution of its parameters, and L stands for the label space. In an SMC implementation, the spatial distribution is represented by a set of weighted samples [28], namely: where N ( ) represents the number of particles, ω ( ) i denotes the weight of the i-th particle, and δ is the Dirac delta function.
at time k, we are able to calculate the corresponding T at time k + 1 based on the transition model proposed in Section 3. As a result, after the prediction stage, we will end up with a new set of particles with the same weights, namely: Then, based on the new predicted particles at time k + 1, the plane parameter estimateβ ( ) k+1 can be achieved from the distribution particles as the Expected A Posteriori (EAP) estimate (i.e., weighted average), or the Maximum A Posteriori (MAP) estimate (i.e., considering the particle with the largest weight as the estimate).
The above implementation procedure is illustrated step by step in Algorithm 1, in the form of a pseudo-code.

Algorithm 1 Step-by-Step Implementation of the Proposed Transition Model
Require: vehicle states X V,k , Y V,k , φ k , and particles ω representing the distribution of each plane with label , for all labels ∈ L 1: compute matrix R k use Equation (28) compute the EAP estimate 12: end for

Simulation Results
In this section, we demonstrate how well the proposed transition model performs, compared with the actually measured results from the KITTI dataset. Specifically, we generate the predicted planes at time k + 1 using the plane information at time k, by means of the proposed transition model, and we segment planes from the point cloud data in the KITTI dataset at time k + 1, by means of the Modified Selective Statistical Estimator (MSSE) [33]. Then, the predicted planes at time k + 1 are compared with the segmented planes at time k + 1, and both graphical and numerical results are demonstrated to show the closeness of the predicted planes and the segmented planes.
The point cloud data from the folder "2011_09_26_drive_0005_sync" in the KITTI dataset are employed in the simulation studies for plane segmentation. The details for accessing and using the KITTI dataset are available in [19]. The folder "2011_09_26_drive_0005_sync" includes 154 time steps (i.e., k ∈ [0, 153]), and the points obtained at k = 11, k = 12, k = 124 and k = 125 are used in the simulation. The number of points at k = 11, k = 12, k = 124 and k = 125 are 123,334, 123,316, 118,950 and 118,572 respectively.  Figure 4, on top of the corresponding planes. Figure 3 shows that by means of the transition model and the available information at time k = 124, three planes are predicted to exist at time k = 125. Also, three segmented planes are present in this figure based on the LIDAR measurements at time k = 125. It is clearly observed in Figures 3 that each predicted plane is associated with one segmented plane. Namely, these six planes constitute three pairs of planes, with each pair composed of one predicted plane and one segmented plane. Besides, we see that for each pair of orange and blue planes, the areas of planes are fairly close and the distances between each pair of plane vertexes are rather small ('Each pair of plane vertexes' is referred to one vertex on the predicted plane and its closest counterpart on the segmented plane).   Figure 4 demonstrates the three pairs of planes from different angles. Again, we observe the closeness of the predicted and segmented planes in terms of plane areas and vertex distances, when viewed from different angles. Figure 4c shows a 'side view' of the three pairs of planes, from which we clearly see that the predicted and segmented planes (and their normal vectors) almost coincide. Similar results are also observed in Figure 4d where a 'top view' of the planes is shown.

Graphical Results
In addition to k = 125, our simulation studies have included a large range of other time steps. Figures 5 and 6 provide the results of another time step k = 12. The rest of time steps present similar results and are not shown in this paper for the purpose of brevity. In the following section, the above graphical results will be further supplemented and clarified, by means of detailed numerical results.

Numerical Results
In Figures 3-6, we have graphically demonstrated the comparison results between the predicted and segmented planes, in terms of three important indicators -plane area, vertex distance and normal vector. In this section, we provide the results of our numerical simulation for 100 MC runs, in order to demonstrate 'how close' exactly these planes are.
The first two indicators, plane area and vertex distance, are mainly used to quantify the closeness in terms of plane dimensions (sizes). Table 1 shows the areas of the predicted and segmented planes for time step k = 125 (averaged from 100 MC runs), and Table 2 gives the distances between each pair of plane vertexes for time step k = 125 (averaged from 100 MC runs). We see in these tables that the areas of the predicted and segmented planes are close, and the distances between each pair of vertexes are small. These numerical results are consistent with the graphical results shown in Section 5.1.  The third indicator, normal vector, is employed to evaluate the closeness in terms of plane orientation. Table 3 shows the angles between each pair of normal vectors for time step k = 125 (averaged from 100 MC runs) ('Each pair of normal vectors' is referred to the normal vector on the predicted plane and its counterpart on the segmented plane). The small magnitudes of these angles imply that the predicted and segmented planes are very close in terms of orientation. This explains the coincidence of the predicted and segmented planes seen in Figure 4c,d. Besides the above three indicators for closeness evaluation, the execution time of the program is also recorded. The proposed transition model is implemented based on Algorithm 1, and the program is executed on a laptop equipped with an Intel i7-5600U CPU and an 8G RAM. For time step k = 125, the execution time is 0.0375 s (averaged from 100 MC runs).
Apart from k = 125, the numerical results for another time step k = 12 are also presented in Tables 4-6. Similarly, these results show that the predicted planes (obtained using the information at k = 11) are fairly close to the segmented planes at k = 12, in terms of plane areas, vertex distances and normal vectors. Besides, the execution time of the program for k = 12 is 0.0369 s (averaged from 100 MC runs). For the purpose of brevity, the numerical results for other time steps are omitted in this paper.

Conclusions
The majority of the current statistical SLAM solutions are based on using point features such as the representation of landmarks. The fast advancement of sensory technology makes it possible to utilize more sophisticated features in SLAM applications to achieve superior results while lowering computational cost. This paper presented the idea of using planar features for statistical SLAM, and proposed a stochastic transition model to propagate the plane parameters to the next time. A large range of simulation studies using real-world measurements have been conducted to evaluate the proposed transition model. Both graphical and numerical results show that the predicted planes generated by the proposed transition model closely match the segmented planes resulting from real-world point cloud measurements, in terms of three important indicators; plane area, vertex distance and normal vector. In the next step, we will look into other planar-feature-based stochastic transition models, and investigate the application of such transition models in statistical SLAM tasks.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: