Performance Improvement in Vslam Using Stabilized Feature Points

Simultaneous localization and mapping (SLAM) is the main prerequisite for the autonomy of a mobile robot. In this paper, we present a novel method that enhances the consistency of the map using stabilized corner features. The proposed method integrates template matching based video stabilization and Harris corner detector. Extracting Harris corner features from stabilized video consistently increases the accuracy of the localization. Data coming from a video camera and odometry are fused in an Extended Kalman Filter (EKF) to determine the pose of the robot and build the map of the environment. Simulation results validate the performance improvement obtained by the proposed technique.


INTRODUCTION
SLAM has been one of the key research areas for autonomous mobile robots.It is the process of building the map of an unknown environment and determining the location of the robot using this map concurrently.The inception of the SLAM problem occured at the 1986 IEEE Robotics and Automation Conference as reported in [1].After this time, probabilistic methods were incorporated into the robotics research where they were primarily being used in the guidance, navigation and control of robots.SLAM has been an active research area after the influential work of Smith et al. [2] about stochastic mapping.Ayache and Faugeras introduced the combination of visual work and navigation using stereo pairs [3].Experimental work of Moutarlier et al. [4] attracted the interest of the researchers to the SLAM problem.Lowe et al. used the Scale Invariant Feature Transform (SIFT) algorithm in vSLAM for a mobile robot [5].Davison et al. developed an EKF based monocular vSLAM in [6].
In this paper, we propose a performance improvement technique that extracts stabilized Harris corner features using template matching based stabilized video sequences.When a non-holonomic wheeled mobile robot (WMR) navigates in an unknown environment, some undesired phenomena such as vibrations on the mobile robot and the speed bump constructions in the environment might occur.With the proposed technique, these problems are eliminated, and as a result stabilized feature extraction is achieved.Stabilized keypoint extraction ensures both consistency in map building and localization of the mobile robot.
The rest of the paper is organized as follows: In section 2, the sensor fusion algorithm is introduced.In section 3, mathematical model of a non-holonomic mobile robot is described.EKF algorithm is summarized in section 4. Stabilized feature point extraction is detailed in section 5. Simulation results are presented in section 6, and finally, the paper is concluded in section 7.

SENSOR FUSION ARCHITECTURE
The sensor fusion architecture developed in this work is shown in Figure 1 and composed of several modules.Data generated by both the camera and the odometry are used in feature extraction (FE) and dead reckoning (DR) blocks, respectively.The output of FE is the observation, and the output of DR is the robot state prediction.In measurement prediction block, predicted states obtained from the robot model are used and the sensor measurement model is utilized to predict the measurements.In matching module, measurement predictions are subtracted from observations to calculate the innovation and innovation covariance.The output of the matching block is transferred to EKF update block to estimate the non-holonomic WMR states and build the map.

MATHEMATICAL MODEL OF THE MOBILE ROBOT
The non-holonomic WMR shown in Figure 2 includes two driving wheels and a back caster that are non deforming.The robot moves on the horizontal plane and the contact of the wheels with the ground is assumed to satisfy rolling without any skidding or slipping.

Kinematic Model
In the kinematic modeling of the non-holonomic WMR, orientation must be considered since it affects the robot movement along  and  directions based on the kinematic constraints of the system.The kinematic model of the NWMR is described by the following equations [7]: or, can be written in a more compact form as where  = , ,   is the pose (position and orientation) of the centre of mass of the mobile robot , with respect to world coordinate frame ,  = ,   is the control input vector, where  is the linear velocity and  is the angular velocity of the mobile robot, respectively.Using Euler's forward difference approximation for the derivative, the discrete form of the mobile robot kinematic model can be written as: or, in a more compact form as  + =    ,   (4) where  is the discrete time index,  is the sampling period and (  ,   ) is a nonlinear mapping [8].In order to implement EKF, this nonlinear system must be linearized.In [9], it is shown that applying the Taylor series approximation to the right-hand side of Eq. 2 and ignoring the higher order terms yields the following linear state-space model of the mobile robot: The state () and input () matrices are defined as follows:

Camera Sensor Model
Ideal pin hole camera model is used as a measurement model.Acquired measurements from the camera generate the measurement vector , where  is the number of the features observed at a particular time index .At the same time, all the observed image features build up the map of the environment.At any time , for one observed image feature camera model implies: where   is the focal length of the camera, (  ,   ) is the principal point of the image plane in pixels,   = [   ,    ,    ]  is the 3D location of the extracted feature with respect to the camera frame.3D location of the  ℎ feature with respect to the world coordinate frame is given as [10]: where   is the 3D location of the image feature in world frame,    is the rotation matrix that defines the orientation of the camera frame with respect to the world frame,  is the 3D translation vector from world frame to camera frame.A rotation matrix can be parameterized by three independent variables such as Euler angles.Due to the planar robot motion assumption, the orientation matrix will be just in terms of the yaw angle [13]: In Eq. 12,  (heading angle) is taken from the estimated states of the EKF that will be summarized in the next section.By rearranging Eq. 11, one can calculate the    as: where    is simply the transpose of the rotation matrix    .Plugging Eq. 13 into the measurement model yields the extracted feature location in image plane: The measurement Jacobian   is calculated by taking the derivative of the right hand side of the Eq. 14 with respect to the states of the mobile robot   .Thus, Observation and measurement prediction data are fused in EKF to calculate the innovation and innovation covariance.

EXTENDED KALMAN FILTER (EKF)
The mobile robot navigates in an unknown environment, without any a priori knowledge about the map, takes measurements to extract feature points and consequently localizes itself.External (camera) and internal (odometry) sensory data will be fused in EKF.The robot pose () and the locations of the extracted feature points (  ) with respect to the world frame can be stacked in a new state vector as: where  = [, , ]  defines position and orientation of the robot, and is governed by the following nonlinear model: where   and   are the process and the measurement noise, which are modeled as zero-mean, independent Gaussian distributions with covariance matrices   and   , respectively.
The second element of  is defined as where   = [  ,   ]  are the locations of the extracted features with respect to the world frame and added to the map at time  .Since the positions of the extracted features are not changed, they remain at the same locations during the navigation; i.e.
Linearization of Eqs.16 and 18 with respect to  imply new Jacobians for the process model [7]: where  Є  33 ,  1 Є  32 (zero matrix),  Є  2 2 (identity matrix),  ∈  32 and  2 Є  2 2 (zero matrix) with  being the number of features extracted at time .With this framework, the following algorithm summarizes the recursions involved in computing the EKF [11]: where   is the covariance matrix of the combined state .To initialize the filter,   and  0 are set to some arbitrary random values.

STABILIZED FEATURE POINT EXTRACTION
Extracting feature points accurately increases the performance of vSLAM algorithm since they are used in EKF measurement update.It provides improvement in both map building and localization of the mobile robot.In this section, stabilization of video sequences and Harris corner detection algorithm are detailed.
Video stabilization is one of the most crucial video processes that reduces the blurring level of image sequences and unwanted camera motions.Extracting point features from stabilized video frames improves the consistency of the static landmarks and provides robust matching between corresponding points.Proposed video stabilization method in this work is based on a template matching that uses the sum of absolute differences (SAD) algorithm: where  1 and  2 are two consecutive image frames. 1 (, ) and  2 ( + ,  + ) defines the pixel intensity values.In  1 , a window , e.g.size of (15 x 15), is generated around an interest point.Meanwhile, each pixel in the second video frame is scanned by shifting this window along horizontal () and vertical () directions.Note that the intensity values in the second window is subtracted from those values in the first window.The absolute values of all these pixel intensities in  are summed.If there is a correct match, the SAD function gives a near 0 value.Thus, a similar window is created in the second video frame [14].Scan process can be applied both over the entire image or just using a region of interest.In each subsequent video frame, SAD algorithm determines the camera motion relative to the previous frame.It uses this information to remove unwanted translational camera motions and generate a stabilized video.Feature extraction from consecutive images is one of the essential steps of vision based simultaneously localization and mapping applications.In this work, extracted image features are corners that are obtained via Harris corner detector.Due to space limitations, details about Harris corners are omitted here, and the interested reader may refer to [12].

SIMULATION RESULTS
In this section, the performance of the proposed technique is verified with simulation results.Ramp and circular inputs are used to generate the odometry data.Odometry and camera outputs are fused in EKF to estimate states of the mobile robot.Extended Kalman filter both estimates the mobile robot states and generates the map of the unknown environment.Inputs for the system are summarized in Table 1.

Type of Input Input
Ramp trajectory C. Şahin and M. Ünel In this table   ,   ,   indicate the reference pose of the mobile robot and   ,   denote reference linear and angular velocities of the mobile robot, respectively.Simulation results for the ramp trajectory is depicted for 120 seconds, and 1/50 is chosen for sampling time both for EKF and the camera.In Figure 3 (a), (b) and (c) robot pose estimation is shown.According to the leftmost and center graphs, x and y positions of the mobile robot increase as time increases.Given the control input that is shown in Table 1 for ramp input,  position coordinate of the mobile robot increases more rapidly than the  coordinate position.Initial robot pose as well as the initial camera frame are used as the reference coordinate system and all estimates are represented with respect to this frame.On the rightmost graph in Figure 3, , heading angle estimation is shown.
When mobile robot starts to navigate in the environment, it has a rotation at the beginning of the movement for trajectory tracking that is related to the ramp control input.As shown in Figure 3, the errors between reference and estimated pose states are less than 1%.respectively, mobile robot navigates in circular trajectory in the environment.In the leftmost graph, at 800 and 1300 time samples, there occurs some differences between reference and estimated states.The reason why these differences occur is the rapid increase in heading angle and hence decrease in the overlap area in consecutive image frames.Reduction in the overlapped area between consecutive frames gives rise decrease in stable feature point extraction and consequently higher noise in map building.
In our vSLAM algorithm the accuracy of the mobile robot localization is highly dependent on the map building.Errors in these regions are approximately 8% .However, between 800 and 1300 time samples, the reference and the estimated states are very close to each other, i.e. the error rate is below 1%.This promising result is obtained thanks to the stabilized extracted feature points and validates the performance of our proposed algorithm.On the rightmost graph in Figure 4, it is seen that heading angle increases continuously with time.

CONCLUSION
In this paper, we proposed a performance improvement technique for vSLAM problem of mobile robots.We incorporated video stabilization into vSLAM for feature extraction, correspondingly map building and localization.In vSLAM, the performance of the algorithm depends on both the accuracy of the map and localization of the robot.In this work, it is shown that consistent feature extraction technique both improves the accuracy of map building and localization of the mobile robot by neglecting unwanted sensor motion and the noises that are caused by the external factors.Simulation results verified that EKF state estimation performance is improved thanks to the utilization of stabilized landmarks in measurement update.
As a future work, we plan to suggest a new approach to vSLAM problem that will use vision only and eliminate some of the assumptions made in this work such as 2D visual features and planar motion.Instead of using odometry data in EKF, we will estimate robot ego-motion utilizing extracted 3D visual features.

Figure 2 .
Figure 2. Non-holonomic wheeled mobile robot Figure 3. x, y and  state (pose) estimations by EKF for ramp input In Figure 4 (a), (b) and (c) pose estimation of the NWMR is shown for the circular trajectory.The simulation for circular trajectory is performed for 30 seconds and 1/50 sampling time is chosen again for both EKF and the camera as in the ramp input.In the first and second graphs of the figure,  and  position estimates are depicted.Given constant linear and angular velocity inputs, 0.3 [m/s] and 0.6 [rad/s] Figure 4. x, y and  state (pose) estimations by EKF for circular input The most prominent result of the proposed technique is the accuracy improvement of visual simultaneous localization and map building algorithm using stabilized feature point extraction.Subsequent video frames are stabilized and Harris corner features are extracted from stabilized video sequences.In Figure 5 (a) and (b),landmark positions for ramp and circular inputs are shown.While mobile robot is travelling in the unknown environment with given control inputs, naturally located planar landmarks are extracted and used for measurement update in EKF.In vSLAM algorithms, generating consistent map is one of the most crucial processes to obtain accurate navigation results.Acquiring these naturally located features in a consistent way by neglecting unwanted camera motion and jitter, our technique builds a consistent map and improves the localization correctness as shown in Figures3 and 4 .