This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Accurate localization of moving sensors is essential for many fields, such as robot navigation and urban mapping. In this paper, we present a framework for GPS-supported visual Simultaneous Localization and Mapping with Bundle Adjustment (BA-SLAM) using a rigorous sensor model for a panoramic camera. The rigorous model does not introduce systematic errors, thus representing an improvement over the widely used ideal sensor model. The proposed SLAM does not require additional restrictions, such as loop closing, or additional sensors, such as expensive inertial measurement units. In this paper, the problems of the ideal sensor model for a panoramic camera are analysed, and a rigorous sensor model is established. GPS data are then introduced for global optimization and georeferencing. Using the rigorous sensor model with the geometric observation equations of BA, a GPS-supported BA-SLAM approach that combines ray observations and GPS observations is established. Finally, our method is applied to a set of vehicle-borne panoramic images captured in a campus environment, and several ground control points (GCPs) are used to check the localization accuracy. The results demonstrate that our method can reach an accuracy of several centimetres.

Imagery from mono or stereo cameras has been the main data source for many applied science fields, such as robotics, computer vision and photogrammetry. Many research studies related to Simultaneous Localization And Mapping (SLAM) based on mono cameras [

An ideal geometric sensor model of a panoramic camera has one projection centre, and all of the light beams satisfy co-linearity conditions or a pin-hole model and project the real world onto a spherical surface. This is a perspective transformation but is not projected onto a plane as in a mono/stereo camera. Geyer and Daniilidis give detailed projective geometry for a catadioptric sensor and emphasize the duality [

Motion and structure estimations from a moving vehicle with a camera or several cameras have different applications in different research fields. In computer vision, this topic is called structure from motion (SFM); in robotics research, it is called SLAM (this term is used in this paper). Two common solutions to the SLAM problem are filtering and bundle adjustment (BA) [

Providing SLAM with global georeferencing information not only constrains the propagating uncertainties but also allows the extra spatial information to be used in more applications. Many researchers have studied SLAM with several types of geo-information, particularly maps and GPS. In [

Although the homography and 3DoF models without altitude information used in these articles greatly reduce the computation cost, they all presume a planar Earth surface and may introduce large errors in elevation. Furthermore, these methods all rely on filtering. To our knowledge, no article has studied GPS-supported BA-SLAM. However, GPS-supported BA-SLAM should achieve a higher accuracy than filter-based SLAM because of the theoretical rigor of BA itself. In this paper, we study a GPS-supported BA-SLAM method in which a 6DoF motion model is embedded, a rigorous sensor model is applied as the geometric projection model, and GPS data are combined with ray observations as additional constraints for global optimization and georeferencing. Finally, several ground control points (GCPs) are measured manually to check the absolute accuracy of the GPS-supported BA-SLAM method.

The paper is structured as follows: Section 2 introduces a common dioptric panoramic camera and establishes a camera model that is more rigorous than the ideal model. Section 3 presents a stereo co-planarity (or epipolar) constraint that is more rigorous than the ideal co-planarity. Section 4 addresses the bundle adjustment algorithm supported by GPS, and Section 5 presents the results of the mapping and localization experiments. All of these experiments were carried out using a vehicle platform that consists of a multi-camera rig and a GPS receiver. Finally, we present the conclusions and future work in Section 6.

As shown in the figure, a point on the panoramic sphere S is represented by the vector u_s = [x_s, y_s, z_s]^T in panoramic coordinates. Each separate camera has its own projection centre O_c, with a rotation R_c and a translation T_c relative to the panoramic centre O_s.

The common ideal panoramic sensor model assumes that a ground point A, its image point u_s on the panoramic sphere and the panoramic centre O_s all lie on one ray.

However, the actual ray does not pass through the panoramic centre O_s but through the centre O_c of the separate camera. The ideal ray O_s-u_s and the actual ray O_c-u_s are therefore slightly different, and treating them as identical introduces a small systematic error.

According to the analysis presented above, a rigorous sensor model should express the correct rays: the ray should pass through the separate camera centre O_c and the spherical image point u_s.

Here, T_c = [t_x, t_y, t_z]^T is the translation vector between O_c and O_s, and R_c is the rotation of the separate camera; the rigorous ray therefore connects O_c, the spherical point u_s and the ground point A.
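The difference between the two models can be sketched numerically. The minimal Python illustration below is our own, not the paper's implementation: `ideal_ray` returns the direction through the panoramic centre O_s (taken as the origin), while `rigorous_ray` returns the direction through the offset camera centre O_c = T_c; the function names and the 2 cm offset are illustrative assumptions.

```python
import numpy as np

def ideal_ray(u_s):
    # Ideal model: every ray passes through the panoramic centre O_s
    # (the origin of the panoramic frame), so the direction is u_s itself.
    return u_s / np.linalg.norm(u_s)

def rigorous_ray(u_s, T_c):
    # Rigorous model: the ray passes through the separate camera centre
    # O_c = T_c, so its direction is u_s - T_c in panoramic coordinates.
    # (R_c would first rotate a pixel ray of the separate camera into this frame.)
    d = u_s - T_c
    return d / np.linalg.norm(d)

# A 2 cm lens offset bends the ray slightly for a point on the unit sphere.
u_s = np.array([0.0, 0.0, 1.0])
T_c = np.array([0.02, 0.0, 0.0])
cos_a = np.clip(np.dot(ideal_ray(u_s), rigorous_ray(u_s, T_c)), -1.0, 1.0)
angle_deg = np.degrees(np.arccos(cos_a))
```

For close-range scenes the angular difference is small but nonzero, which is exactly the systematic error the rigorous model removes.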

In this paper, this rigorous sensor model is applied as the geometric projection model in BA.

Co-planarity, also called the epipolar constraint, is a well-known geometric relationship between stereo image pairs: the two camera positions and the corresponding image rays lie in one plane. As described above, extra velocity and angular velocity are not needed as parameters of a motion model because a filter framework is not used; BA only needs the initial position and orientation vectors as inputs. The co-planarity supplies sufficient parameters for the image association and the initial values for BA.

The baseline between the two panoramic centres is B = [B_X, B_Y, B_Z]^T, and the corresponding spherical image vectors are u_1 = [x_1, y_1, z_1] and u_2 = [x_2, y_2, z_2]. The vectors B, u_1 and u_2 must be co-planar, which gives the epipolar constraint u_1 · (B × u_2) = 0.
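For concreteness, the co-planarity of two rays and the baseline can be checked with a scalar triple product. The sketch below is our own illustration, not the paper's code: it verifies that the rays from two panoramic centres to the same ground point satisfy the constraint.

```python
import numpy as np

def coplanarity_residual(u1, B, u2):
    # Scalar triple product u1 . (B x u2): zero exactly when the two rays
    # and the baseline B lie in one plane (the panoramic epipolar constraint).
    return float(np.dot(u1, np.cross(B, u2)))

# Two panoramic centres separated by baseline B, observing one ground point P.
B = np.array([1.0, 0.0, 0.0])          # second centre relative to the first
P = np.array([3.0, 2.0, 5.0])          # a ground point (arbitrary example)
u1 = P / np.linalg.norm(P)             # ray from the first centre
u2 = (P - B) / np.linalg.norm(P - B)   # ray from the second centre
```

For correctly matched features the residual is near zero; mismatches produce large residuals, which is what makes the constraint useful for outlier rejection.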



We can see that when the panoramic centre O_s is used in place of the true camera centres O_c, the vectors u_1 and u_2 both contain errors. Because the monocular rigorous camera model is constructed in uniform panoramic coordinates, we construct the co-planarity in the same coordinates.

First, the actual corresponding rays u_1 and u_2 should pass through the projection centres of the separate cameras, as in the rigorous sensor model.


If the vectors u_1 and u_2 are calculated in this way, the rigorous co-planarity constraint follows.


In this paper, the rigorous panoramic co-planarity is used for the image association and to provide the initial values for BA.

This paper focuses on accurate global localization in large-scale outdoor environments using GPS-supported vehicle-borne panoramic imagery. The GPS-supported BA method has been used for aerial triangulation for many years but, to our knowledge, has not yet appeared in SLAM research; filtering has been the only method used to combine these two types of observations. In this paper, we combine GPS data with image observations in a BA framework, and three carefully designed steps (accurate data association, segmented BA and GPS-supported BA) form an integral workflow.

Data association is a key point in SLAM. The matched features must be verified so that all mismatches are eliminated and enough information remains. We introduce a three-step outlier elimination process to ensure that all of the remaining matched features are correct.

The data association begins with feature extraction and matching with GPU-SIFT [
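One of the elimination rounds is a histogram voting step. The paper does not give its exact voting scheme, so the sketch below is only one common form, under the assumption that votes are cast on the displacement direction of each match between consecutive frames; the bin count is our own choice.

```python
import numpy as np

def histogram_voting_filter(pts1, pts2, n_bins=36):
    # Vote on the displacement direction of each match and keep only the
    # matches that fall in the dominant bin.  Direction-only voting and
    # 36 bins (10 degrees each) are illustrative assumptions.
    d = pts2 - pts1
    ang = np.arctan2(d[:, 1], d[:, 0])                       # in (-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    dominant = np.bincount(bins, minlength=n_bins).argmax()
    return bins == dominant                                  # inlier mask
```

Matches rejected by such voting correspond to the red points described in the figures; a geometric (RANSAC) check before and a BA residual check after complete the three rounds.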


After the outlier elimination in the first two steps, the large errors are all removed correctly; these errors are shown as red points in the corresponding figure.

The biggest problem with large-scale BA for SLAM is the accumulation of position and orientation uncertainties caused by error propagation, which can prevent the iteration from converging because BA requires accurate initial values. In contrast, filtering methods, such as extended Kalman filtering and particle filtering, always yield a possible solution.

As described in several articles [

Here, Δλ_1, ΔR_1 and ΔT_1 represent the differences in the scale, rotation and translation parameters between the two blocks, which can be calculated as a well-known 3D similarity transformation from the corresponding landmarks in the two blocks:

Here, P_1 and P_2 represent the corresponding landmarks from the first and second blocks, respectively, which were obtained by multi-ray intersection. A larger set of correspondences P_1, P_2 provides a more robust solution, so we set adjacent blocks to overlap by five images. After all of the blocks have been connected, a globally connected result of the local optima is obtained. Like global BA, segmented BA cannot reduce the accumulation of uncertainties: the uncertainty of each block is propagated to the next according to the error propagation law.
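The 3D similarity transformation between adjacent blocks can be estimated in closed form from corresponding landmarks. The sketch below uses the standard SVD-based (Umeyama/Horn) least-squares solution; this is one common way to compute the scale, rotation and translation differences, not necessarily the paper's exact procedure.

```python
import numpy as np

def similarity_transform(P1, P2):
    # Least-squares estimate of (s, R, t) such that P2 ~= s * R @ p1 + t,
    # from corresponding 3-D landmarks P1, P2 (rows are points).
    c1, c2 = P1.mean(axis=0), P2.mean(axis=0)
    Q1, Q2 = P1 - c1, P2 - c2
    U, S, Vt = np.linalg.svd(Q2.T @ Q1)      # cross-covariance SVD
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:            # guard against reflections
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (Q1 ** 2).sum()
    t = c2 - s * R @ c1
    return s, R, t
```

With the recovered (s, R, t), every pose and landmark of one block can be transferred into the frame of the neighbouring block before the global adjustment.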

After the segmented BA-SLAM, GPS is introduced to provide georeferencing information and reduce the accumulated uncertainties. The 6DoF poses of all the images are transformed to global coordinates with a polynomial interpolation of the GPS values and taken as the initial values for GPS-supported BA. The GPS observations are preprocessed with CORS (Continuously Operating Reference Station)-supported [
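The polynomial interpolation of GPS positions to the image timestamps can be sketched per coordinate axis. The function name and the polynomial degree below are assumptions for illustration; the paper does not specify them.

```python
import numpy as np

def interpolate_gps(t_gps, xyz_gps, t_img, deg=2):
    # Fit a polynomial of the given degree to each GPS coordinate axis
    # over time, then evaluate it at the image timestamps to obtain
    # initial camera positions for the GPS-supported BA.
    return np.column_stack([
        np.polyval(np.polyfit(t_gps, xyz_gps[:, k], deg), t_img)
        for k in range(3)
    ])
```

A low-degree local fit is usually enough because the GPS rate (e.g. 1 Hz) is close to the image rate; long gaps would need piecewise fitting instead.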

Here, (X_G, Y_G, Z_G) are the GPS observations and (X_S, Y_S, Z_S) are the corresponding sensor positions estimated by SLAM.

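Conceptually, GPS-supported BA stacks the weighted GPS position residuals with the ray (reprojection) residuals into one least-squares system. The toy sketch below shows only this stacking and weighting in our own notation, not a full adjustment.

```python
import numpy as np

def combined_residuals(X_S, X_G, ray_res, w_gps=1.0, w_ray=1.0):
    # X_S: n x 3 camera positions estimated by SLAM.
    # X_G: n x 3 GPS observations of the same positions.
    # ray_res: flat array of ray (reprojection) residuals.
    # The weights reflect the relative precision of the two observation
    # types (illustrative scalars; a real adjustment would use full
    # covariance-derived weights).
    r_gps = (X_G - X_S).ravel() * w_gps
    return np.concatenate([r_gps, w_ray * np.asarray(ray_res, dtype=float)])
```

A nonlinear solver then iterates the poses and landmarks to drive this combined residual vector toward zero, so the GPS terms both georeference the block and bound the drift of the ray-only solution.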

To test the proposed rigorous sensor model and its application in GPS-supported SLAM on a vehicle platform, we use PGR's Ladybug system [

After the three rounds of outlier elimination, all of the remaining corresponding rays intersect correctly, as shown in the figure.

The blue line in the results figure represents the trajectory after GPS-supported BA-SLAM and shows a high level of accuracy.

Because GPS provides the only georeferencing information, the quality of the GPS observations is critical for the convergence of the union adjustment.

The second test examines the robustness of our method when gross errors/outliers exist in the GPS data. A gross error of 1 m (10 times the original GPS deviation) is introduced into selected GPS observations according to set rules, as shown in the figure.
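Automatic detection of such gross errors is left to future work in the paper; one simple approach, sketched below as our own illustration, is a robust residual test that flags GPS observations whose residual norm lies beyond k robust standard deviations of the others.

```python
import numpy as np

def flag_gps_outliers(residuals, k=3.0):
    # residuals: n x 3 GPS position residuals after adjustment.
    # Flag observations whose residual norm exceeds the median by more
    # than k times the MAD-based robust scale (k = 3 is a common choice;
    # this rule is illustrative, not the paper's method).
    norms = np.linalg.norm(residuals, axis=1)
    med = np.median(norms)
    scale = 1.4826 * np.median(np.abs(norms - med))
    return norms > med + k * scale
```

Flagged observations can be down-weighted or removed and the adjustment rerun, so a 1 m outlier among centimetre-level residuals stands out clearly.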

In this paper, we present a framework for GPS-supported BA-SLAM with a new rigorous sensor model for a panoramic camera. The test results show that our method can obtain a global localization accuracy of several centimetres when GPS observations are favourable, and they demonstrate that our rigorous sensor model is both correct and effective. The tests also show that our method is robust, providing an acceptable accuracy of several decimetres even when GPS observations are partially unavailable or contain large errors. The main contribution of this paper is that, to our knowledge, it is the first time a GPS-supported BA has been used in vehicle-based outdoor SLAM with a panoramic camera; this approach may complement mainstream filtering solutions. The second contribution is a new sensor model for panoramic cameras that is theoretically rigorous and accounts for the small offsets between the panoramic centre and the centres of the separate lenses, avoiding the slight but unnecessary systematic errors of the ideal sensor model.

Solutions based on BA may be more accurate than those using filters, but BA still has some shortcomings. BA requires accurate initial values to guarantee the convergence of the iteration. In our method, a three-step outlier elimination process is performed to guarantee that all of the tie-points are correct. Segmented BA has no trouble with good ray observations; in the global GPS-supported BA, however,

Future work will focus on SLAM accuracy. Two problems must be addressed. First, robust data association will be tested and improved in complicated environments, such as on a busy highway, where the large number of moving vehicles is the greatest challenge. Second, tall buildings in cities may block GPS signals for long periods. We will develop a reliable method to maintain the consistency of the local SLAM results with GPS and to detect gross errors in the GPS observations automatically.

This work was supported by the Chinese 973 Program (2012CB719902), the GRENE (Environmental Information) project of MEXT Japan (Ministry of Education, Culture, Sports, Science and Technology), and the opening project of the Key Laboratory of Xinjiang Uygur Autonomous Region (XJYS0205-2012-02).

An overview of the results of our proposed method. The blue line in the middle represents the trajectory through the Kashiwa campus of the University of Tokyo, and the nearby green circles are the tie-points. The separate images along the route are from the 5 mono-lenses. Green dots indicate correctly matched tie-points with good distributions, and red dots indicate mismatched points that have been excluded by the error detection steps. A 3D view of the results is shown in the top left corner; blue dots represent the position and posture after SLAM, and pink dots represent the GPS route. The two boxes in the bottom right show the zoomed area in which the GCPs are included. The light green corresponding rays intersect correctly in the right box, and the RMSEs of the tie-points and check points both reach an accuracy of several centimetres with a grid scale of 0.2 m (left box).

Representation of a panoramic camera consisting of five mono cameras. The dashed line on the left indicates an ideal ray, corresponding to the ideal sensor model, that passes through the panoramic centre O_s and the spherical image point u_s; the actual ray passes through the centre O_c of the separate camera, which is offset from O_s.

The main workflow of GPS-supported BA-SLAM.

A 3D view of successfully matched tie-points (green). The points excluded by RANSAC (the first outlier elimination step) and histogram voting (the second step) are shown in red, and those excluded by BA (the third step) are shown in blue. The blue rays represent features that cannot intersect precisely, such as feature 267. Feature 267 may be regarded as correctly matched (right-middle images), but the lack of information about features in the window may introduce a bias of one or more pixels, which causes a slight intersection error (left-middle image). In contrast, feature 245 has a better texture and intersects precisely.

Panoramic image and separate images captured by the Ladybug system.

Results of the segmented BA-SLAM and GPS-supported BA-SLAM methods. The yellow line is the trajectory of the unconstrained results after data association and block BA. The start point is located in the correct position, but the trajectory shows a large accumulation of uncertainty in angle and scale. The blue line represents the trajectory after the GPS-supported BA-SLAM method is applied and shows a high level of accuracy. All eight GCPs are located in the enlarged area.

The eight GCPs, with accuracy up to 2 cm, are used in the experiments to check the accuracy of the GPS-supported BA-SLAM.


Check errors of the eight GCPs (cm).

GCP | ε_X | ε_Y | ε_Z | ε_XYZ
---|---|---|---|---
K7 | 3.6 | 6.6 | 3.1 | 8.1
K5 | 3.8 | 6.0 | 1.7 | 7.3
K3 | 4.0 | 4.6 | 3.5 | 7.0
K8 | 4.4 | 4.2 | 2.3 | 6.5
K9 | 3.3 | 4.2 | 1.8 | 5.6
F2 | −1.7 | 6.9 | 0.5 | 7.1
G3 | −1.2 | 6.0 | 0.2 | 6.1
H2 | 2.4 | 5.2 | −0.6 | 5.7
Average (absolute) | 3.0 | 5.5 | 1.7 | 6.7