# GPS-Supported Visual SLAM with a Rigorous Sensor Model for a Panoramic Camera in Outdoor Environments


## Abstract


## 1. Introduction

## 2. Monocular Ideal Sensor Model vs. Rigorous Sensor Model of a Panoramic Camera

#### 2.1. Projection from Fish-Eye Lenses to Panoramic Camera

$T_s$ is the centre of the panoramic sphere. A two-step transformation establishes the relationship between the fish-eye cameras and the panoramic camera: in the first step, the fish-eye image coordinates are transformed to the ideal plane camera coordinates; in the second, the plane coordinates are transformed to the uniform panoramic coordinates. Equation (1) describes how an image point $u_c$ with coordinate vector **u** in a separate lens is projected to $u_s$ with coordinate vector **X** = [x, y, z]^T on the panoramic sphere:

$$k\mathbf{X} = \mathbf{R}_{c}\mathbf{K}_{c}\mathbf{u} + \mathbf{T}_{c} \tag{1}$$

**K**_{c} is the transformation matrix from the image coordinate **u** in the fish-eye camera c to the corresponding undistorted plane coordinate; it includes parameters such as radial distortion, tangential distortion and principal point offset [23]. **R**_{c} and **T**_{c} are the rotation matrix and translation vector from the coordinates of the ideal plane camera c to the panoramic coordinates, respectively. **K**_{c}, **R**_{c} and **T**_{c} are fixed values because of the calibration performed in advance, and k is the scale factor from the ideal plane to the panoramic sphere coordinates, which varies from point to point and can be calculated together with Equation (2), which states that **X** lies on the panoramic sphere of radius R:

$$x^{2} + y^{2} + z^{2} = R^{2} \tag{2}$$

It should be noted that the panoramic coordinate **X** of a given image point is the same in both the ideal and the rigorous sensor models.
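To make the two-step projection concrete, here is a minimal numpy sketch of Equations (1) and (2). It assumes, for illustration only, that `K_c` already absorbs the distortion correction and maps a homogeneous pixel coordinate to the ideal plane; the function and variable names are ours, not the paper's.

```python
import numpy as np

def project_to_sphere(u, K_c, R_c, T_c, R_sphere=1.0):
    """Eq. (1): k * X = R_c @ K_c @ u + T_c, with Eq. (2) forcing X onto
    the panoramic sphere of radius R_sphere (i.e. ||X|| = R_sphere)."""
    u_h = np.append(np.asarray(u, dtype=float), 1.0)  # homogeneous plane point
    ray = R_c @ (K_c @ u_h) + T_c                     # right-hand side of Eq. (1)
    k = np.linalg.norm(ray) / R_sphere                # scale factor k from Eq. (2)
    return ray / k                                    # X, lying on the sphere

# With an identity calibration and a lens at the panoramic centre, the
# result is simply the normalised viewing direction.
X = project_to_sphere([0.3, -0.2], np.eye(3), np.eye(3), np.zeros(3))
```

Because k is recovered from the sphere constraint rather than stored, the same routine works for every lens once its fixed **K**_c, **R**_c, **T**_c are known.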

#### 2.2. Ideal Panoramic Sensor Model

Equation (3) describes the straight line between a 3D point $p_s$ with coordinate vector **X**_{A} in the object space and the corresponding panoramic point $u_s$ with coordinate vector **X** obtained from Equation (1); the line passes through the panoramic centre $T_s$. **R** and **T** are the rotation matrix and translation vector, respectively, and λ is the scale from the panoramic coordinates to the object coordinates:

$$\mathbf{X}_{A} = \lambda\mathbf{R}\mathbf{X} + \mathbf{T} \tag{3}$$

In reality, the true ray $T_{c}u_{s}$, which passes through the centre of the separate lens (shown by the solid line in Figure 2), is treated as the ray $T_{s}u_{s}$, which passes through the panoramic centre (shown by the dashed line). This indicates that the ideal panoramic camera model is incorrectly constructed for the biased ray. The biased rays cause a second error: the real 3D point $p_c$ is translated to an incorrect position $p_s$. However, the projection centres of the separate fish-eye cameras and the panoramic centre are very close, and the angle between $T_{c}u_{s}$ and $T_{s}u_{s}$ is very small, which may ensure that the systematic errors stay below one pixel beyond a certain distance.
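The sub-pixel claim can be checked numerically. The sketch below uses hypothetical numbers (a 4 cm lens offset and an assumed angular resolution of 600 pixels per radian, neither taken from the paper) to measure the angle between the ideal ray from the panoramic centre and the true ray from the lens centre:

```python
import numpy as np

def bias_in_pixels(p, T_c, pixels_per_radian):
    """Angle between the true ray T_c -> p and the ideal ray T_s -> p
    (panoramic centre T_s at the origin), expressed in pixels."""
    d_true = (p - T_c) / np.linalg.norm(p - T_c)
    d_ideal = p / np.linalg.norm(p)
    angle = np.arccos(np.clip(d_true @ d_ideal, -1.0, 1.0))
    return angle * pixels_per_radian

T_c = np.array([0.04, 0.0, 0.0])   # lens centre 4 cm from the panoramic centre
near = bias_in_pixels(np.array([0.0, 5.0, 0.0]), T_c, 600.0)
far = bias_in_pixels(np.array([0.0, 50.0, 0.0]), T_c, 600.0)
# The bias shrinks roughly as 1/distance, dropping below one pixel
# for the far point in this configuration.
```

This matches the argument in the text: the angular bias scales with the ratio of the lens offset to the object distance, so distant points are nearly unaffected while close points are not.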

#### 2.3. Rigid Panoramic Sensor Model

A co-linearity equation constructed through **T**_{c} and $u_c$ in a separate camera coordinate system would be rigorous, but it loses the meaning of panoramic imaging. Thus, we construct the rigorous camera model under the uniform panoramic coordinates, which means that the co-linearity through the true ray $T_{c}u_{s}$ is constructed:

$$\mathbf{X}_{A} = \mathbf{R}\left[\lambda\left(\mathbf{X} - \mathbf{T}_{c}\right) + \mathbf{T}_{c}\right] + \mathbf{T} \tag{4a}$$

**T**_{c} = [T_{x}, T_{y}, T_{z}]^{T} is the translation vector between the projection centre of the separate camera and the panoramic centre **T**_{s}, and **X** represents the panoramic coordinate vector as in Equation (1). The vector λ(**X** − **T**_{c}) thus represents the true ray $T_{c}u_{s}$, but in the mono camera coordinate system. The coordinate origin of the ray is moved to the panoramic centre by adding the translation **T**_{c}. Now **X**_{A} represents the coordinates of the correct 3D point $p_c$. The rigid perspective model under the panoramic coordinates is then constructed after rotation and translation with **R** and **T**, respectively. Formulation (4b) is the algebraic form of (4a) in which the unknown λ is eliminated. Please note that the panoramic coordinate **X** obtained from Equation (1) must be consistent with **T**_{c}, which differs between mono-lenses.
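As a sanity check of Equation (4a), the sketch below implements the rigorous projection and confirms that it degenerates to the ideal model of Equation (3) when the lens offset **T**_c is zero, and departs from it otherwise:

```python
import numpy as np

def rigorous_project(X, lam, T_c, R, T):
    """Eq. (4a): X_A = R @ (lam * (X - T_c) + T_c) + T.
    lam * (X - T_c) is the true ray in the mono-camera frame; adding T_c
    moves the ray's origin to the panoramic centre before applying the
    global pose R, T."""
    return R @ (lam * (X - T_c) + T_c) + T

def ideal_project(X, lam, R, T):
    """Eq. (3): X_A = lam * R @ X + T."""
    return lam * (R @ X) + T

X = np.array([0.6, 0.0, 0.8])          # a point on the unit panoramic sphere
T_pose = np.array([1.0, 2.0, 3.0])
a = rigorous_project(X, 2.0, np.zeros(3), np.eye(3), T_pose)  # T_c = 0
b = ideal_project(X, 2.0, np.eye(3), T_pose)
c = rigorous_project(X, 2.0, np.array([0.04, 0.0, 0.0]), np.eye(3), T_pose)
```

With `T_c = 0` the two models agree exactly; the nonzero offset in `c` reproduces the landmark displacement the ideal model suffers from.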

## 3. Ideal Co-Planarity vs. Rigorous Co-Planarity of a Panoramic Camera

#### 3.1. Ideal Panoramic Co-Planarity

The baseline between the two panoramic centres is ${T}_{s}{T}_{s}^{\prime}$. We write **B** = [B_{X}, B_{Y}, B_{Z}]^{T}, the ray ${T}_{s}{u}_{s}$ as **V**_{1} = [X_{1}, Y_{1}, Z_{1}]^{T} and the ray ${T}_{s}^{\prime}{u}_{s}^{\prime}$ as **V**_{2} = [X_{2}, Y_{2}, Z_{2}]^{T}. The vectors **B**, **V**_{1} and **V**_{2} satisfy the epipolar constraint as follows:

$$\mathbf{V}_{1} \cdot \left(\mathbf{B} \times \mathbf{R}\mathbf{V}_{2}\right) = 0 \tag{5}$$

**R** is the rotation matrix between the two images. Expanding Equation (5) for given **B**, **V**_{1} and **R** yields Equation (6), which represents a 3D plane that passes through the coordinate origin. Combined with Equation (2), the panoramic sphere equation, we conclude that the epipolar line of ideal panoramic stereo images is a great circle through the projection centre. Equation (6) can be used as a geometric constraint for image matching and outlier elimination.
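Equation (5) is a scalar triple product: the baseline and the two rays span zero volume exactly when a match is geometrically consistent. A minimal sketch, whose residual can be thresholded directly as a matching constraint:

```python
import numpy as np

def coplanarity_residual(B, V1, V2, R):
    """Eq. (5): triple product of the baseline B, the ray V1 and the
    rotated ray R @ V2; zero when the three vectors are coplanar."""
    return float(np.dot(V1, np.cross(B, R @ V2)))

# A consistent configuration: both rays point at the same 3D point P,
# with the second centre displaced by the baseline B and R = I.
P = np.array([3.0, 1.0, 2.0])
B = np.array([1.0, 0.0, 0.0])
r_good = coplanarity_residual(B, P, P - B, np.eye(3))
r_bad = coplanarity_residual(B, P, P - B + np.array([0.0, 0.0, 0.5]), np.eye(3))
```

`r_good` vanishes while the perturbed match `r_bad` does not, which is exactly how the constraint separates inliers from outliers.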

#### 3.2. Rigorous Panoramic Co-Planarity

In reality, the rays do not pass through the centres ${T}_{s}$, ${T}_{s}^{\prime}$ of the panoramic spheres, but rather through the projection centres ${T}_{c}$, ${T}_{c}^{\prime}$ of the separate lenses. Thus, the vectors **B**, **V**_{1} and **V**_{2} all contain errors. Because the monocular rigorous camera model is constructed in the uniform panoramic coordinates, we construct the co-planarity in the same coordinates.

The rays **V**_{1} and **V**_{2} should pass through the projection centres of the separate cameras, as in Equations (7) and (8):

$$\mathbf{V}_{1} = \mathbf{X} - \mathbf{T}_{c} \tag{7}$$

$$\mathbf{V}_{2} = \mathbf{X}^{\prime} - \mathbf{T}_{c}^{\prime} \tag{8}$$

In addition, **B** should be the baseline between the separate lenses, but expressed in the uniform panoramic coordinates, as in Equation (9). When **B**, **V**_{1} and **V**_{2} are calculated as in Equations (7)–(9), Equation (5) becomes a rigorous model for stereo panoramic co-planarity. We can also calculate the epipolar line by expanding Equation (5); its constant term d is non-zero because **R** and the offsets between the panoramic centre and the projection centres of the separate cameras are not zero. Thus, the epipolar line is not a great circle on the panoramic sphere. However, d is typically a very small value, which makes the epipolar line very similar to a great circle.

The rigorous co-planarity can be used to solve for the baseline **B** and the orientation **R** between stereo images, and as a geometric constraint to eliminate outliers.
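The rigorous constraint can be sketched end-to-end. Note that the explicit form of Equation (9) is not recoverable from the text, so the baseline expression below (panoramic baseline plus the rotated second lens offset minus the first) is our assumption of one plausible form, not the paper's equation:

```python
import numpy as np

def rigorous_coplanarity_residual(X1, X2, T_c1, T_c2, B_pan, R):
    """Rigorous co-planarity in the uniform panoramic frame.
    Eqs. (7)-(8): the rays leave the separate lens centres,
        V1 = X1 - T_c1,  V2 = X2 - T_c2;
    assumed form of Eq. (9): B = B_pan + R @ T_c2 - T_c1, the baseline
    between the two lens centres expressed in the first panoramic frame."""
    V1 = X1 - T_c1
    V2 = X2 - T_c2
    B = B_pan + R @ T_c2 - T_c1
    return float(np.dot(V1, np.cross(B, R @ V2)))

# Consistency check: both lens-centred rays observe the same point P.
P = np.array([4.0, 2.0, 1.0])
T_c1, T_c2 = np.array([0.04, 0.0, 0.0]), np.array([0.0, 0.04, 0.0])
B_pan, R = np.array([2.0, 0.0, 0.0]), np.eye(3)
X1 = T_c1 + 0.25 * (P - T_c1)
X2 = T_c2 + 0.25 * ((P - B_pan) - T_c2)
r = rigorous_coplanarity_residual(X1, X2, T_c1, T_c2, B_pan, R)
```

The residual vanishes because the two lens centres and the point P form a closed triangle; with the ideal centres T_s, T_s' substituted instead, the same configuration would leave a small non-zero residual.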

## 4. GPS-Supported Visual SLAM with the Rigorous Camera Model

#### 4.1. Data Association

**R**_{i}, **T**_{i} represent the rotation and translation of the i-th image to the global coordinates, respectively, and **R**_{0}, **T**_{0} represent those of the first image.
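Data association yields each image's pose relative to its predecessor; chaining those increments produces **R**_i, **T**_i in the global frame anchored at the first image. A sketch (the composition convention below is our assumption, since the section does not spell it out):

```python
import numpy as np

def chain_poses(rel_poses):
    """Accumulate relative poses (R_rel, T_rel) into global poses
    (R_i, T_i), anchored at the first image: R_0 = I, T_0 = 0."""
    R_g, T_g = np.eye(3), np.zeros(3)
    out = [(R_g, T_g)]
    for R_rel, T_rel in rel_poses:
        T_g = R_g @ T_rel + T_g   # translate within the current global frame
        R_g = R_g @ R_rel         # then compose the rotation
        out.append((R_g, T_g))
    return out

# Two forward steps of 1 m, each followed by a 90-degree left turn.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
poses = chain_poses([(Rz, np.array([1.0, 0.0, 0.0]))] * 2)
```

After the first step the platform sits at (1, 0, 0); the second step, taken after the turn, moves it sideways in the global frame to (1, 1, 0), which is why uncompensated drift in any one relative pose propagates into all later global poses.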

#### 4.2. Segmented BA-SLAM

Δλ_{1}, Δ**R**_{1} and Δ**T**_{1} represent the differences in the scale, rotation and translation parameters between the two blocks, which can be calculated as a well-known 3D similarity transformation from the corresponding landmarks in the two blocks. **X**_{1}, **X**_{2} represent the corresponding landmarks from the first and second blocks, respectively, obtained by multi-ray intersection. A larger set of correspondences **X**_{1}, **X**_{2} provides a more robust solution, and we set adjacent blocks to overlap by five images. After all of the blocks have been connected, a global BA result composed of local optima is obtained. As with global BA, segmented BA cannot reduce the accumulation of uncertainties: as in Equation (13), the errors of **X**_{1} are propagated to **X**_{2} according to the error propagation law.
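The block-to-block connection is a standard absolute-orientation problem. Below is a sketch using the SVD-based closed-form solution, one common way to compute Δλ, Δ**R**, Δ**T** from corresponding landmarks; the paper does not name a specific algorithm, so this choice is ours:

```python
import numpy as np

def similarity_from_landmarks(X1, X2):
    """Closed-form 3D similarity (scale s, rotation R, translation T)
    such that X2 ~= s * R @ X1 + T, from corresponding landmarks.
    X1, X2: (n, 3) arrays of n >= 3 non-collinear corresponding points."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    A, B = X1 - m1, X2 - m2
    U, S, Vt = np.linalg.svd(B.T @ A)               # cross-covariance SVD
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt                                  # proper rotation (det = +1)
    s = np.trace(np.diag(S) @ D) / (A ** 2).sum()   # least-squares scale
    T = m2 - s * R @ m1
    return s, R, T

# Recover a known similarity transform from five landmarks.
X1 = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], float)
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
X2 = 2.0 * X1 @ R_true.T + np.array([1.0, 2.0, 3.0])
s, R, T = similarity_from_landmarks(X1, X2)
```

With noisy landmarks the same routine returns the least-squares fit, and more overlapping correspondences (the five-image overlap in the text) stabilise the estimate.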

#### 4.3. GPS-Supported BA-SLAM

$X_G$, $Y_G$, $Z_G$ and $X_S$, $Y_S$, $Z_S$ are the GPS observations and the camera positions at each exposure time, respectively. **R** is the rotation matrix, and [U, V, W]^{T} represents the translation between the camera projection centre and the antenna centre of the GPS receiver, which can be regarded as fixed because of the calibration performed in advance:

$$\begin{bmatrix} X_G \\ Y_G \\ Z_G \end{bmatrix} = \begin{bmatrix} X_S \\ Y_S \\ Z_S \end{bmatrix} + \mathbf{R}\begin{bmatrix} U \\ V \\ W \end{bmatrix} \tag{14}$$

When combined with (4b), the GPS-supported BA with the rigorous sensor model is obtained. **x** represents the unknowns of the features, and **t** represents the six translation and rotation parameters. **A** and **B** are the Jacobians of Equation (4b), **L** represents the constant terms, **C** is the Jacobian of Equation (14) and **L**_{G} represents the corresponding constants:

$$\mathbf{v} = \mathbf{A}\mathbf{t} + \mathbf{B}\mathbf{x} - \mathbf{L}, \qquad \mathbf{v}_{G} = \mathbf{C}\mathbf{t} - \mathbf{L}_{G} \tag{15}$$

**P** and **P**_{G} are the inverse matrices of the covariance matrices that describe the uncertainties of the ray observations and the GPS observations, respectively. The normal equations are then constructed as Equation (16):

$$\begin{bmatrix} \mathbf{A}^{T}\mathbf{P}\mathbf{A} + \mathbf{C}^{T}\mathbf{P}_{G}\mathbf{C} & \mathbf{A}^{T}\mathbf{P}\mathbf{B} \\ \mathbf{B}^{T}\mathbf{P}\mathbf{A} & \mathbf{B}^{T}\mathbf{P}\mathbf{B} \end{bmatrix}\begin{bmatrix} \mathbf{t} \\ \mathbf{x} \end{bmatrix} = \begin{bmatrix} \mathbf{A}^{T}\mathbf{P}\mathbf{L} + \mathbf{C}^{T}\mathbf{P}_{G}\mathbf{L}_{G} \\ \mathbf{B}^{T}\mathbf{P}\mathbf{L} \end{bmatrix} \tag{16}$$

Equation (16) contains two types of unknowns; typically, the unknown **x** is eliminated, and only **t** remains, as shown in Equation (17):

$$\left[\mathbf{A}^{T}\mathbf{P}\mathbf{A} + \mathbf{C}^{T}\mathbf{P}_{G}\mathbf{C} - \mathbf{A}^{T}\mathbf{P}\mathbf{B}\left(\mathbf{B}^{T}\mathbf{P}\mathbf{B}\right)^{-1}\mathbf{B}^{T}\mathbf{P}\mathbf{A}\right]\mathbf{t} = \mathbf{A}^{T}\mathbf{P}\mathbf{L} + \mathbf{C}^{T}\mathbf{P}_{G}\mathbf{L}_{G} - \mathbf{A}^{T}\mathbf{P}\mathbf{B}\left(\mathbf{B}^{T}\mathbf{P}\mathbf{B}\right)^{-1}\mathbf{B}^{T}\mathbf{P}\mathbf{L} \tag{17}$$

After Equation (17) is solved with a sparse Cholesky solver as in [33], **t** is substituted into Equation (16) to solve for **x**. It is time consuming to obtain an exact solution for **P** for every observation, particularly at a large scale, so **P** is typically set to an identity matrix under the assumption that all observation errors are Gaussian and independently distributed. **P**_{G} is then deduced from the accuracy of the GPS relative to the accuracy of the ray observations; in our test, **P**_{G} is between 0.1 and 1.
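For a toy-sized problem, the elimination of **x** can be written out densely. The sketch below follows the structure of Equations (15)–(17); a real implementation would exploit the block-sparsity of **B** and use a sparse Cholesky factorisation as in [33], so treat this as an illustration of the algebra only:

```python
import numpy as np

def solve_gps_ba_step(A, B, C, L, L_G, P, P_G):
    """One Gauss-Newton step of GPS-supported BA:
    ray residuals v = A t + B x - L (weight P) and GPS residuals
    v_G = C t - L_G (weight P_G). The feature unknowns x are eliminated
    by a Schur complement, the reduced system is solved for t, and t is
    back-substituted to recover x."""
    Ntt = A.T @ P @ A + C.T @ P_G @ C
    Ntx = A.T @ P @ B
    Nxx = B.T @ P @ B
    bt = A.T @ P @ L + C.T @ P_G @ L_G
    bx = B.T @ P @ L
    Nxx_inv = np.linalg.inv(Nxx)
    S = Ntt - Ntx @ Nxx_inv @ Ntx.T          # reduced normal matrix of Eq. (17)
    t = np.linalg.solve(S, bt - Ntx @ Nxx_inv @ bx)
    x = Nxx_inv @ (bx - Ntx.T @ t)           # back-substitution into Eq. (16)
    return t, x

# A tiny consistent system: residuals vanish at the true (t, x).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
B = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])
C = np.eye(2)
t0, x0 = np.array([1.0, 2.0]), np.array([3.0, -1.0])
L, L_G = A @ t0 + B @ x0, C @ t0
t, x = solve_gps_ba_step(A, B, C, L, L_G, np.eye(4), 0.5 * np.eye(2))
```

Scaling `P_G` relative to the identity `P` (here 0.5, within the 0.1–1 range quoted in the text) is what balances the trust placed in the GPS against the ray observations.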

## 5. Experiments

#### 5.1. Test Design

#### 5.2. BA Results without GPS

#### 5.3. BA Results with GPS

## 6. Conclusions and Future Work

## Acknowledgments

## References

1. Eade, E.; Fong, P.; Munich, M.E. Monocular graph SLAM with complexity reduction. Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Taipei, Taiwan, 18–22 October 2010; pp. 3017–3024.
2. Weiss, S.; Scaramuzza, D.; Siegwart, R. Monocular-SLAM-based navigation for autonomous micro helicopters in GPS-denied environments. J. Field Robot. 2011, 28, 854–874.
3. Senlet, T.; Elgammal, A. A framework for global vehicle localization using stereo images and satellite and road maps. Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011; pp. 2034–2041.
4. Lin, K.H.; Chang, C.H.; Dopfer, A.; Wang, C.C. Mapping and localization in 3D environments using a 2D laser scanner and a stereo camera. J. Inf. Sci. Eng. 2012, 28, 131–144.
5. Geyer, C.; Daniilidis, K. Catadioptric projective geometry. Int. J. Comput. Vis. 2001, 45, 223–243.
6. Barreto, J.P.; Araujo, H. Geometric properties of central catadioptric line images and their application in calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1327–1333.
7. Mei, C.; Benhimane, S.; Malis, E.; Rives, P. Efficient homography-based tracking and 3-D reconstruction for single-viewpoint sensors. IEEE Trans. Robot. 2008, 24, 1352–1364.
8. Kaess, M.; Dellaert, F. Probabilistic structure matching for visual SLAM with a multi-camera rig. Comput. Vis. Image Underst. 2010, 114, 286–296.
9. Paya, L.; Fernandez, L.; Gil, A.; Reinoso, O. Map building and Monte Carlo localization using global appearance of omnidirectional images. Sensors 2010, 10, 11468–11497.
10. Gutierrez, D.; Rituerto, A.; Montiel, J.M.M.; Guerrero, J.J. Adapting a real-time monocular visual SLAM from conventional to omnidirectional cameras. Proceedings of the 11th OMNIVIS at the IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 343–350.
11. Strasdat, H.; Montiel, J.M.M.; Davison, A.J. Visual SLAM: Why filter? Image Vis. Comput. 2012, 30, 65–77.
12. Artieda, J.; Sebastian, J.M.; Campoy, P.; Correa, J.F.; Mondragon, I.F.; Martinez, C.; Olivares, M. Visual 3-D SLAM from UAVs. J. Intell. Robot. Syst. 2009, 55, 299–321.
13. Davison, A.J. Real-time simultaneous localization and mapping with a single camera. Proceedings of the International Conference on Computer Vision (ICCV), Nice, France, 13–16 October 2003; pp. 1403–1410.
14. Zhang, X.; Rad, A.B.; Wong, Y.-K. Sensor fusion of monocular cameras and laser rangefinders for line-based simultaneous localization and mapping (SLAM) tasks in autonomous mobile robots. Sensors 2012, 12, 429–452.
15. Eade, E.; Drummond, T. Scalable monocular SLAM. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, NY, USA, 17–22 June 2006; pp. 469–476.
16. Sim, R.; Elinas, P.; Griffin, M.; Little, J.J. Vision-based SLAM using the Rao-Blackwellised particle filter. Proceedings of the IJCAI Workshop on Reasoning with Uncertainty in Robotics, Edinburgh, Scotland, 30 July 2005.
17. Sibley, G.; Mei, C.; Reid, I.; Newman, P. Vast-scale outdoor navigation using adaptive relative bundle adjustment. Int. J. Robot. Res. 2010, 29, 958–980.
18. Lim, J.; Pollefeys, M.; Frahm, J.M. Online environment mapping. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 20–25 June 2011; pp. 3489–3496.
19. Miller, I.; Campbell, M.; Huttenlocher, D. Map-aided localization in sparse global positioning system environments using vision and particle filtering. J. Field Robot. 2011, 28, 619–643.
20. Bergasa, L.M.; Ocana, M.; Barea, R.; Lopez, M.E. Real-time hierarchical outdoor SLAM based on stereovision and GPS fusion. IEEE Trans. Intell. Transp. Syst. 2009, 10, 440–452.
21. Dusha, D.; Mejias, L. Error analysis and attitude observability of a monocular GPS/visual odometry integrated navigation filter. Int. J. Robot. Res. 2012, 31, 714–737.
22. Berrabah, S.A.; Sahli, H.; Baudoin, Y. Visual-based simultaneous localization and mapping and global positioning system correction for geo-localization of a mobile robot. Meas. Sci. Technol. 2011, 22.
23. Kannala, J. A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1335–1340.
24. Tardif, J.P.; Pavlidis, Y.; Daniilidis, K. Monocular visual odometry in urban environments using an omnidirectional camera. Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS), Nice, France, 22–26 September 2008; pp. 2531–2538.
25. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
26. Sinha, S.N.; Frahm, J.M.; Pollefeys, M.; Genc, Y. GPU-based video feature tracking and matching. Proceedings of the Workshop on Edge Computing Using New Commodity Architectures (EDGE 2006), Chapel Hill, NC, USA, 23–24 May 2006.
27. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395.
28. Scaramuzza, D. 1-point-RANSAC structure from motion for vehicle-mounted cameras by exploiting non-holonomic constraints. Int. J. Comput. Vis. 2011, 95, 74–85.
29. Pinies, P.; Tardos, J.D. Scalable SLAM building conditionally independent local maps. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, USA, 29 October–2 November 2007; pp. 3466–3471.
30. Eade, E.; Drummond, T. Unified loop closing and recovery for real time monocular SLAM. Proceedings of the British Machine Vision Conference, Leeds, UK, 1–4 September 2008.
31. Snay, R.; Soler, T. Continuously operating reference station (CORS): History, applications, and future enhancements. J. Surv. Eng. 2008, 134, 95–104.
32. Meguro, J.; Hashizume, T.; Takiguchi, J.; Kurosaki, R. Development of an autonomous mobile surveillance system using a network-based RTK-GPS. Proceedings of the IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005; pp. 3096–3101.
33. Kuemmerle, R.; Grisetti, G.; Strasdat, H.; Konolige, K.; Burgard, W. g2o: A general framework for graph optimization. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011; pp. 3607–3613.
34. LADYBUG. Available online: http://www.ptgrey.com/products/spherical.asp (accessed on 15 September 2012).

**Figure 1.** An overview of the results of our proposed method. The blue line in the middle represents the trajectory through the Kashiwa campus of the University of Tokyo, and the nearby green circles are the tie-points. The separate images along the route are from the 5 mono-lenses. Green dots indicate correctly matched tie-points with good distributions, and red dots indicate mismatched points that have been excluded by the error detection steps. A 3D view of the results is shown in the top left corner; blue dots represent the position and posture after SLAM, and pink dots represent the GPS route. The two boxes at the bottom right show the zoomed area in which the GCPs are included. The light green corresponding rays intersect correctly in the right box, and the RMSEs of the tie-points and check points both reach an accuracy of several centimetres with a grid scale of 0.2 m (left box).

**Figure 2.** Representation of a panoramic camera consisting of five mono cameras. The dashed line on the left indicates an ideal ray, corresponding to the ideal sensor model, that passes through the panoramic centre $T_S$, a point on the panoramic sphere $u_s$ and the object $p_s$. In reality, $u_s$ is imaged by a mono camera whose projection centre is **T**_{c}; the real ray, represented by the solid line and corresponding to our rigorous sensor model, passes through **T**_{c}, $u_s$ and $p_c$. Two errors are introduced by the ideal model: one is the ray direction bias, and the other is the position offset of the landmarks.

**Figure 4.**A 3D view of successfully matched tie-points (green). The points excluded by RANSAC (the first outlier elimination step) and histogram voting (the second step) are shown in red, and those excluded by BA (the third step) are shown in blue. The blue rays represent features that cannot intersect precisely, such as feature 267. Feature 267 may be regarded as correctly matched (right-middle images), but the lack of information about features in the window may introduce a bias of one or more pixels, which causes a slight intersection error (left-middle image). In contrast, feature 245 has a better texture and intersects precisely.

**Figure 5.** Panoramic image and separate images captured by the Ladybug system. (**a**) Panoramic image. (**b**) Images from the six separate fish-eye lenses. The image aimed at the sky is not used in our SLAM.

**Figure 6.**Results of the segmented BA-SLAM and GPS-supported BA-SLAM methods. The yellow line is the trajectory of the unconstrained results after data association and block BA. The start point is located in the correct position, but the trajectory shows a large accumulation of uncertainty in angle and scale. The blue line represents the trajectory after the GPS-supported BA-SLAM method is applied and shows a high level of accuracy. All eight GCPs are located in the enlarged area and are shown in Figure 7.

**Figure 7.** The eight GCPs, with an accuracy of up to 2 cm, are used in the experiments to check the accuracy of the GPS-supported BA-SLAM.

**Figure 8.** (**a**) Check errors vs. the number of GPS observations used. "Distance interval n" on the X axis means that one GPS observation is selected every n metres. The check errors of all 8 GCPs increase when the number of GPS observations is reduced but remain below 0.35 m. (**b**) Check errors vs. the number of gross errors added to the GPS observations. On the X axis, "Distance interval n" means that a gross error is added to one GPS observation every n metres. The check errors of all 8 GCPs increase when more GPS observations are given gross errors but remain below 0.4 m.

**Figure 9.** (**a**) Comparison between the results using GPS observations at 50 m intervals and using all GPS observations. There are in fact two trajectories in different colours, blue and green, which cannot be distinguished at a scale of 20 m; only in the zoomed area, at a scale of 2 m, is the very slight difference visible. (**b**) Comparison between the results with gross errors introduced every 5 m and with all good GPS observations. As in (a), the difference between the trajectories is visible only in the zoomed area.

| ID | D_{X} (cm) | D_{Y} (cm) | D_{Z} (cm) | D_{XYZ} (cm) |
|---|---|---|---|---|
| K7 | 3.6 | 6.6 | 3.1 | 8.1 |
| K5 | 3.8 | 6.0 | 1.7 | 7.3 |
| K3 | 4.0 | 4.6 | 3.5 | 7.0 |
| K8 | 4.4 | 4.2 | 2.3 | 6.5 |
| K9 | 3.3 | 4.2 | 1.8 | 5.6 |
| F2 | −1.7 | 6.9 | 0.5 | 7.1 |
| G3 | −1.2 | 6.0 | 0.2 | 6.1 |
| H2 | 2.4 | 5.2 | −0.6 | 5.7 |
| Average | 3.0 | 5.5 | 1.7 | 6.7 |

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

Shi, Y.; Ji, S.; Shi, Z.; Duan, Y.; Shibasaki, R. GPS-Supported Visual SLAM with a Rigorous Sensor Model for a Panoramic Camera in Outdoor Environments. *Sensors* **2013**, *13*, 119–136. https://doi.org/10.3390/s130100119