This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
This paper presents a sensor fusion strategy for Simultaneous Localization and Mapping (SLAM) in dynamic environments. The designed approach has two features: (i) the first is a fusion module that combines line segments obtained from a laser rangefinder with line features extracted from a monocular camera. This policy eliminates the pseudo segments that arise from momentary pauses of dynamic objects in the laser data. (ii) The second is a modified multisensor point estimation fusion SLAM (MPEF-SLAM) that incorporates two individual Extended Kalman Filter (EKF) based SLAM algorithms: monocular and laser SLAM. The localization error of the fused SLAM is reduced compared with that of each individual SLAM. Additionally, a new data association technique based on the homography transformation matrix is developed for monocular SLAM. This data association method reduces redundant computation. The experimental results validate the performance of the proposed sensor fusion strategy and data association method.
A crucial characteristic of an autonomous mobile robot is its ability to determine its whereabouts and make sense of its static and dynamic environments. The central question of perceiving its position in known and unknown worlds has received great attention in the robotics research community. Mapping, localization, and particularly their integration in the form of Simultaneous Localization and Mapping (SLAM) constitute the basic capability on which other advanced tasks, such as exploration and autonomous navigation, can be successfully built. Therefore, SLAM has been vigorously pursued in the mobile robot research field.
Monocular cameras have been widely used as low-cost sensors in numerous robotics applications in recent years. They provide an autonomous mobile robot with abundant information that facilitates intuitive interpretation and comprehension of the environment better than other scanning sensors. Algorithms based on monocular cameras perform reasonable visual SLAM procedures. Points/landmarks extracted from images are common map elements and are typically present in structured indoor scenes [
In this study, we present a sensor fusion strategy for line-based SLAM applied in dynamic environments. This approach fuses the sensor information obtained from a monocular camera and a laser rangefinder and includes two modules. One is a feature fusion module that integrates the lines extracted respectively from a monocular camera and a laser to remove the erroneous features corresponding to dynamic objects. The other is referred to as a modified multisensor point estimation fusion SLAM (MPEF-SLAM), which incorporates two separate EKF SLAM frameworks: monocular and laser SLAM. The modified MPEF-SLAM fuses the state variable and its covariance estimated from each individual SLAM procedure and propagates the fused values backward to each SLAM process to reduce the error of the robot pose and line feature positions. Another advantage of the modified MPEF-SLAM is that it rests on two SLAM processes running in parallel, which guards against unexpected events. For example, when one SLAM procedure fails because of a sensor fault, the other can still run normally. This manifests the idea of redundancy in comparison with a single SLAM framework. Additionally, for the monocular SLAM process we propose a new data association (DA) algorithm. It employs the homography transformation matrix [
The remainder of this paper is organized as follows: after reviewing the related work in Section 2, we elucidate the framework of monocular SLAM and the data association method in Section 3. Section 4 describes the sensor fusion strategy including the feature fusion and modified MPEFSLAM modules. We validate our proposed methods through the experiments in Section 5. Section 6 gives our conclusions and suggestions for future work.
Most indoor environments can simply and expediently be represented by line segments. In our previous work [
To the best of our knowledge, almost all of the methods mentioned above focus on visual SLAM in static spaces or in environments with few dynamic objects. In this study, we revisit the SLAM problem in a dynamic environment from a sensor fusion viewpoint. This approach incorporates the sensor information of a monocular camera and a laser rangefinder to remove the feature outliers related to dynamic objects and enhances localization accuracy.
Computer vision technology makes visual SLAM feasible, and related data association methodology is also an interesting area which has attracted much research attention. In addition to conventional data association algorithms such as Nearest Neighbor [
These data association methods using the SIFT technique improve robustness compared with the NCC and image-patch based approaches. Motivated by this advantage, we developed a data association method that does not apply SIFT descriptors as map features directly, but rather uses the descriptors to estimate the homography transformation matrix between any two images, and then employs the estimated homography transformation matrix to implement data association.
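As an illustrative sketch of this step, the homography between two views can be estimated from SIFT-matched point pairs with the direct linear transform (DLT). The function names and the plain least-squares formulation below are our assumptions; the paper's estimator additionally produces a covariance for the matrix, which is omitted here.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H mapping src -> dst (Nx2 arrays, N >= 4)
    with the direct linear transform (DLT). Illustrative sketch only."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # each correspondence contributes two rows of the DLT system A h = 0
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)        # null-space vector = homography entries
    return H / H[2, 2]              # fix the scale so that H[2, 2] = 1

def project(H, pts):
    """Apply homography H to Nx2 points (homogeneous projection)."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]
```

In the HTMDA context, `src`/`dst` would be the SIFT keypoint locations matched between the stored and current images.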
In this section the outline of the line-based monocular SLAM framework is discussed. We briefly present the camera/robot motion and line measurement models. After that, the homography transformation matrix based data association method for monocular SLAM is introduced.
The camera is fixed on the robot platform, which moves in a 2D plane, and its translational and rotational velocities are identical to those of the mobile robot. For convenience, as is shown in
The simplified camera motion model is:
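As an illustrative stand-in (an assumption, not necessarily the paper's exact equations), a common simplified planar unicycle model driven by the robot's translational and rotational velocities can be sketched as:

```python
import numpy as np

def predict_pose(x, y, theta, v, omega, dt):
    """One step of a simple planar (unicycle) motion model.
    Illustrative stand-in for the simplified camera/robot motion model."""
    if abs(omega) < 1e-9:  # straight-line motion when the turn rate is ~0
        return x + v * dt * np.cos(theta), y + v * dt * np.sin(theta), theta
    # circular-arc motion with radius v/omega
    x_new = x + v / omega * (np.sin(theta + omega * dt) - np.sin(theta))
    y_new = y - v / omega * (np.cos(theta + omega * dt) - np.cos(theta))
    return x_new, y_new, theta + omega * dt
```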
Line extraction is essentially an edge detection operation in image processing terminology. Most of the edge features represented in the related work are extracted using a first-order edge detection operator: the Canny operator [
The range of the ROI encapsulates the ground plane, since most static edges are present on the floor. In this ROI we consider only the horizontal static edges and do not focus on tracking the dynamic targets. After horizontal edge detection processing, it is clear that several edges corresponding to the dynamic objects (
We developed two sets of parameters for edge representation: one was used in the measurement model; the other was for data association and sensor fusion. In this subsection, we mainly discuss the parameters for the measurement model. For a line observed by the vision system, the minimal representation uses four parameters (e.g., Denavit-Hartenberg line coordinates) in 3D Euclidean space, but it may be ineffective in some robotic research topics. There are several non-minimal representations for a 3D line, such as the endpoints of the line [
Similar to [
Besides the locations of the line's endpoints used for the measurement model, we also considered several additional parameters: the position of the projected intersection point
Step 1: Preprocess the acquired image to filter out noise;
Step 2: Select the region of interest (ROI);
Step 3: In the ROI, detect the horizontal edges by the Sobel operator combined with thresholding;
Step 4:
Step 5: Remove the edges whose length is less than the length threshold;
Step 6: Implement
Step 7: Calculate the pixel coordinates of line endpoints and projected intersection point, and descriptors in Hough space.
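Steps 3 and 5 above can be sketched as follows; this is a minimal numpy illustration (the kernel choice and threshold value are assumptions), not the paper's implementation:

```python
import numpy as np

def horizontal_edges(img, thresh=100.0):
    """Detect horizontal edges with a Sobel y-gradient kernel plus
    thresholding. img: 2-D grayscale array; returns a boolean edge mask."""
    ky = np.array([[-1, -2, -1],
                   [ 0,  0,  0],
                   [ 1,  2,  1]], dtype=float)   # Sobel kernel for d/dy
    h, w = img.shape
    out = np.zeros((h, w))
    padded = np.pad(img.astype(float), 1, mode="edge")
    for i in range(h):
        for j in range(w):
            # correlate the 3x3 neighborhood with the Sobel kernel
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * ky)
    return np.abs(out) > thresh
```

Short spurious edges would then be discarded by the length threshold of Step 5 before the Hough-space processing.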
Sampling is considered very important in Nearest Neighbor data association methods. In the reference works [
After the estimations of
When estimating
Looking back at the implementation of the proposed HTMDA and comparing with the related work in Section 2: instead of directly applying SIFT descriptors as natural features for data association, we use the SIFT mechanism to determine matched points between any two images and then apply these matched points to estimate the homography transformation matrix and its related covariance. The data association is based on this estimated matrix and its covariance. Additionally, the main difference between our defined Mahalanobis distance in
Sometimes the predicted image line endpoints may lie outside the image range, so criterion (7) above cannot be used to determine associated features. For this special case, we employ the Hough space parameters presented in the previous subsection to handle the data association problem and adopt an alternative criterion: test whether the predicted endpoints lie on the observed image lines. The endpoints lie on the lines if and only if
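A point-on-line test of this kind can be written directly from the Hough-space line parameters (ρ, θ); the pixel tolerance below is an illustrative assumption, not the paper's threshold:

```python
import numpy as np

def lies_on_line(pt, rho, theta, tol=1.0):
    """Check whether pixel pt = (u, v) lies on the line given by its
    Hough parameters: u*cos(theta) + v*sin(theta) = rho.
    `tol` is a pixel tolerance (an assumed value)."""
    u, v = pt
    return abs(u * np.cos(theta) + v * np.sin(theta) - rho) <= tol
```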
It is impractical and time consuming to compute all
HTMDA algorithm.
// I: inputs — the current captured image and the stored images Img(k)
// O: output — the data association matrix DA
[desCur, locCur] = sift(CurrentImg); // Find SIFT keypoints for the current captured image.
                                     // desCur: keypoint descriptors; locCur: keypoint locations
for each stored image k
    [desK, locK] = sift(Img(k)); // Find SIFT keypoints for the k-th image.
    // Estimate the homography transformation matrix M(k) and its covariance
    [M(k), Σ_M(k)] = HomographyEstimation(locCur, locK, ... );
    // Observation prediction
    for each mapped line feature j
        EndsPred(j) = M(k) · EndsMap(j); // predicted endpoints of mapped line j
        for each observed image line i
            s_m = (EndsObs(i) − EndsPred(j))^T · (Σ_M(k))^{−1} · (EndsObs(i) − EndsPred(j));
            if s_m satisfies criterion (7)
                DA(i, j, k) = 1;
            elseif // the two predicted line endpoints lie on the same observed line, or one
                   // Mahalanobis distance value meets condition (7) and one predicted line
                   // point lies on the observed line
                DA(i, j, k) = 1;
            else
                DA(i, j, k) = 0;
            end
        end
    end
end
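The Mahalanobis gating step of the HTMDA procedure above can be sketched as follows; the chi-square threshold and array shapes are illustrative assumptions, not values from the paper:

```python
import numpy as np

def mahalanobis_gate(ends_obs, ends_pred, cov, gate=9.21):
    """Squared Mahalanobis distance between observed and predicted line
    endpoints, gated against a chi-square threshold (9.21 ~ 99% for
    2 degrees of freedom; an assumed value). Returns (s_m, accepted)."""
    d = np.asarray(ends_obs, float) - np.asarray(ends_pred, float)
    s_m = d @ np.linalg.inv(cov) @ d   # (obs - pred)^T Σ^{-1} (obs - pred)
    return s_m, s_m <= gate
```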
As was mentioned in Section 1, this study is a natural extension of our prior research [
The purpose of the feature fusion is to remove pseudo laser segments corresponding to dynamic objects. Before implementing feature fusion, it is necessary to identify the laser segments and image lines within the same sensor detection range. Since the horizontal field of view (HFOV) of the monocular camera has a limited visual angle, only the laser segments within this HFOV are extracted. That is, when a frame of raw laser data is received, the points located outside the HFOV are filtered out. For the remaining raw data, the robust regression model [
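Filtering the raw laser returns to the camera's HFOV can be sketched as follows; the HFOV value and the function signature are illustrative assumptions:

```python
import numpy as np

def filter_to_hfov(ranges, bearings, hfov_deg=45.0):
    """Keep only laser returns whose bearing falls inside the camera's
    horizontal field of view (centered on the optical axis).
    hfov_deg is an assumed value, not the actual camera spec."""
    bearings = np.asarray(bearings, float)
    half = np.deg2rad(hfov_deg) / 2.0
    keep = np.abs(bearings) <= half     # bearing relative to optical axis
    return np.asarray(ranges, float)[keep], bearings[keep]
```

The retained points would then be grouped into segments by the robust regression model before the fusion rule is applied.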
Suppose that
With rule (11), we validate all the laser segments located in the HFOV. If
In [
Modified MPEF-SLAM algorithm.
// Robot pose initialization
[ ... ]
[ ... ]
[ ... ]
Q = createQ( ... );
// Line feature initialization
SegC = HorizontalEdge(image); // Horizontal line extraction from the 1st captured image
[ ... ] // intrinsic parameters of the camera
SegL = LineExtraction(laserdata); // Segment extraction from the 1st frame of laser data
[ ... ]
// State variable and related covariance initialization
[ ... ]
// Fused robot pose initialization
[ ... ]
// Main loop
k = 1;
repeat
    u_k = getControl(k); // Obtain control variables
    [ ... ] // monocular EKF SLAM step
    [ ... ] // laser EKF SLAM step
    // Do MPEF procedure
    [ ... ]
    [ ... ]
    // Propagate the MPEF results backward to each individual SLAM
    [ ... ]
    // Update individual covariance
    [ ... ]
    [ ... ]
    k = k + 1;
end
In the modified MPEF-SLAM framework, the state variable and its covariance from each individual SLAM are first fused through a fusion-weighted matrix to obtain a fused state variable and covariance. The fused state variable and covariance are then propagated backward to each individual SLAM to update the individual state variable and its covariance. Under this updating scheme, the covariance matrices of the fused and individual state variables decrease, even though the fused estimate may not remain optimal. The details of the theoretical derivation of the modified MPEF-SLAM are described in
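A minimal sketch of this kind of fusion, using standard information (inverse-covariance) weighting as a stand-in for the paper's fusion-weighted matrix derivation, shows why the fused covariance shrinks relative to each individual one:

```python
import numpy as np

def fuse_estimates(x1, P1, x2, P2):
    """Fuse two state estimates with inverse-covariance weighting.
    A standard fusion rule used here as an illustrative sketch of the
    MPEF idea, not the paper's exact derivation."""
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)
    P_f = np.linalg.inv(I1 + I2)            # fused covariance
    x_f = P_f @ (I1 @ x1 + I2 @ x2)         # fused state variable
    return x_f, P_f
```

With equal covariances the fused covariance is halved, which mirrors the covariance decrease described above.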
We conducted extensive experiments in the corridor just outside the control laboratory of the Electrical Engineering Department. The mobile robot platform used for the experimental studies was a Pioneer 2DX mounted with a Canon VCC4 monocular camera, a SICK LMS200 laser rangefinder, and a 16-sonar array. The camera was calibrated with the Calibration Toolbox (available online:
In this experiment, a person stood in front of the robot for a few minutes, as shown in
Additionally, it can be seen that laser segment 3 correlated with image lines 4 and 5. This is because laser segment 3 was concurrently located in the angle intervals determined by image lines 4 and 5, respectively. However, it was related only to line 5 according to the fusion rule. We applied this feature fusion strategy throughout the EKF laser SLAM process, and the result after feature fusion is shown in
There are two reasons for this case: one is that these laser data are located outside the HFOV of the camera; the other is that the assumptions of the proposed Bayesian fusion rule are strict (
We first ran the two individual EKF SLAM procedures, monocular SLAM and laser SLAM, in parallel. The state variable and its covariance obtained from each individual SLAM were integrated to compute the fused state variable and related covariance of the MPEF-SLAM. Finally, the fused state variable and covariance were propagated back to the monocular and laser SLAM respectively, updating the individual state variables to improve localization accuracy.
In the defined ROI, we compute
In this paper, we propose a sensor fusion strategy comprising feature fusion and modified MPEF-SLAM modules for the SLAM task of autonomous mobile robots in dynamic environments. Our feature fusion policy combines the line features extracted by a monocular camera with the segments represented by a robust regression model from a laser sensor, the purpose of which is to remove the potential pseudo laser segments corresponding to moving objects. The modified MPEF-SLAM combines the state variable estimates obtained from the individual SLAM procedures (monocular and laser SLAM), and propagates the fused state variable backward to each SLAM process to reduce the covariance of the state variable of each individual SLAM and thereby further improve localization accuracy. Additionally, for the data association problem in monocular SLAM we present a new method based on the homography transformation matrix. It avoids redundant computational procedures compared with algorithms based on pixel-by-pixel computation. Experimental results verify the performance of the proposed sensor fusion strategy and data association algorithm. Planned future work includes improving the feature fusion module to exploit the laser data located outside the HFOV, and extending the sensor fusion modules with capabilities such as sensor management and active sensing. Another promising direction is an online implementation of the proposed HTMDA and MPEF-SLAM algorithms using embedded hardware and techniques.
The first author sincerely thanks the Research Funds of Jinan University Zhuhai Campus for Introduced Talented Personnel (grant No. 50462203) and the Fundamental Research Funds for the Central Universities (grant No. 21611382) for financial support.
The motion models for two SLAM procedures are identical and represented as:
Each individual EKF SLAM is:
For the fused EKF SLAM, there are similar formulas:
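For orientation, each individual EKF SLAM follows the standard EKF recursion below, written in our own notation as an assumption where the source's symbols are unavailable ($f$, $h$ the motion and measurement models, $F_k$, $H_k$ their Jacobians, $Q_k$, $R_k$ the noise covariances):

```latex
\begin{aligned}
\hat{x}_{k|k-1} &= f(\hat{x}_{k-1|k-1}, u_k), &
P_{k|k-1} &= F_k P_{k-1|k-1} F_k^{\top} + Q_k,\\
K_k &= P_{k|k-1} H_k^{\top}\left(H_k P_{k|k-1} H_k^{\top} + R_k\right)^{-1},\\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\left(z_k - h(\hat{x}_{k|k-1})\right), &
P_{k|k} &= (I - K_k H_k)\,P_{k|k-1}.
\end{aligned}
```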
According to
Multiplying
Substituting
Actually in each iteration,
By
It is necessary to note that
If the latest fused state estimate
To maintain the identity with [
Rewriting
Suppose that the initial values of the state variable vector and covariance for the fused and individual EKF SLAM are the same,
At step
Similar to
Substituting
In comparing with
On the other hand, with assumptions above we get the following equations by substituting
With replacement of the related item in
Comparing
It is obvious from
It is easy to prove that
It can be concluded that
The world and robot/camera coordinate references. Red and subscript W: world reference; black and subscript R: camera/robot reference.
(
Matched points determined by SIFT descriptors.
Local mapping result at the 33rd sample time. (
The laser SLAM results with the feature fusion. (
The estimated covariance of the fused and individual robot pose. Red line: estimated covariance in laser SLAM; green line: estimated covariance in monocular SLAM; blue lines: estimated covariance in MPEF-SLAM.
The estimated covariance of an endpoint of one line feature. Red line: estimated covariance in laser SLAM; green line: estimated covariance in monocular SLAM; blue lines: estimated covariance in MPEF-SLAM.
The endpoints and projected intersection points of the lines in the stored and captured images. Image 1 (left) was captured at the 57th sample time and image 2 (right) at the 58th sample time.
Errors between the actual and estimated locations of endpoint 2 from the 40th to the 60th sample time. The 99% confidence limit is shown in red dash-dot.
Intrinsic Parameters.
Focal length (pixels)      fc = [365.12674 365.02905]
Principal point (pixels)   cc = [145.79917 114.50956]
Skew factor                alpha_c = 0.000
Distortion coefficients    kc = [−0.22776, 0.36413, −0.00545, −0.00192, 0.000]
Pixel std. error           err = [0.10083 0.10936]
The hypothesis test of feature fusion.
Laser segments 1–5; an empty entry means that no fusion process is implemented.