# Non-Linearity Analysis of Depth and Angular Indexes for Optimal Stereo SLAM


## Abstract


## 1. Introduction

## 2. System Structure

The camera state vector X_{c} is composed of its 3D position in Cartesian coordinates, the camera orientation in terms of a quaternion, and the linear and angular speeds, which are necessary for the impulse motion model used to describe the camera movement.

These components are grouped into a vector (u_{x}, u_{y}, u_{z})^{t} in the following way:

#### 2.1. 3D Features

#### 2.2. Inverse Depth Features

Each inverse depth feature is composed of the camera position at which the feature was first observed, X_{ori}, the orientation of the ray passing through the image point (angles of azimuth θ and elevation φ) and the inverse of its depth, ρ. Figure 2 depicts the inverse depth point coding:
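This coding can be turned back into a Euclidean point by back-projecting along the stored ray. A minimal sketch, assuming the directional-vector convention of Civera et al. [8] (the function name and axis convention are illustrative, not taken from the paper):

```python
import numpy as np

def inverse_depth_to_3d(x_ori, theta, phi, rho):
    """Recover the 3D point encoded as (X_ori, theta, phi, rho): camera
    position at first observation plus the unit ray direction m(theta, phi)
    scaled by the depth 1/rho."""
    # Unit ray from azimuth theta and elevation phi (Civera et al. convention).
    m = np.array([np.cos(phi) * np.sin(theta),
                  -np.sin(phi),
                  np.cos(phi) * np.cos(theta)])
    return np.asarray(x_ori, dtype=float) + m / rho
```

For example, a feature first seen from the origin straight along the optical axis (θ = φ = 0) with ρ = 0.5 m⁻¹ decodes to the point (0, 0, 2).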

## 3. Non-Linearity Analysis of Depth and Angular Information

We characterise each measurement function f by a non-linearity index L_{f}, defined over an interval ΔZ around a point Z_{i}:

- If the non-linearity index L_{f} is equal to zero for a point Z_{i}, the function f is linear in the interval ΔZ.
- If the non-linearity index L_{f} takes values higher than zero, the function f is not linear in the interval ΔZ.
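Such an index can be approximated numerically. The sketch below assumes one common definition, the ratio between the second-order and the first-order Taylor terms of f over ΔZ; the paper's exact expression for L_{f} is given by its own equation and may differ by a constant factor:

```python
def nonlinearity_index(f, z, dz, h=1e-3):
    """Numerical non-linearity index of f on the interval [z, z + dz]:
    |f''(z) * dz / f'(z)|, i.e. how large the quadratic Taylor term is
    relative to the linear one (an assumed, illustrative definition)."""
    d1 = (f(z + h) - f(z - h)) / (2.0 * h)           # central-difference f'(z)
    d2 = (f(z + h) - 2.0 * f(z) + f(z - h)) / h**2   # central-difference f''(z)
    return abs(d2 * dz / d1)

# A linear function has index ~0; the inverse depth map 1/Z does not.
assert nonlinearity_index(lambda z: 3.0 * z + 1.0, 5.0, 0.5) < 1e-6
assert nonlinearity_index(lambda z: 1.0 / z, 2.0, 0.5) > 0.1
```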

#### 3.1. Depth Non-Linearity

The depth of a point reconstructed from a stereo pair is Z = f_{x}·B/d_{u}, where f_{x} is the horizontal focal length in pixels, d_{u} is the horizontal disparity in pixels and B is the baseline. The non-linearity index for the depth, as a function of the horizontal disparity, is computed as follows:
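As a quick numeric illustration of this relation, using the rig parameters reported later in the paper (f_{x} = 202 pixels, B = 15 cm):

```python
FX = 202.0   # horizontal focal length in pixels (the paper's rig)
B = 0.15     # stereo baseline in metres (15 cm)

def depth_from_disparity(d_u, fx=FX, b=B):
    """Stereo depth Z = fx * B / d_u for a horizontal disparity d_u in pixels."""
    return fx * b / d_u

# One pixel of disparity corresponds to 30.3 m of depth, while 30 pixels
# correspond to about 1 m: depth is highly non-linear in d_u.
```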

#### 3.2. Angular Non-Linearity

The angular non-linearity index L_{a} is computed considering the angles of azimuth and elevation.

#### 3.3. Optimal Depth Threshold

Given the focal length f_{x} and the image size (width (W), height (H)), we can estimate the stereo error from the maximum disparity d_{uMAX} = W − 1 (minimum depth) to the minimum disparity d_{uMIN} = 1, in incremental steps of 1 pixel, as:

The resulting error depends on f_{x}, B and the image size (W, H). Figure 3 depicts the depth accuracy for different stereo baselines B considering a fixed f_{x} = 202 pixels and image size (W = 320, H = 240).

Figure 4(a) depicts the depth and angular non-linearity indexes for different stereo baselines, considering a fixed f_{x} = 202 pixels and image size 320 × 240. Figure 4(b) depicts a zoomed version of the non-linearity graphs for our stereo rig configuration (f_{x} = 202 pixels, B = 15 cm, W = 320, H = 240).

From this analysis we obtain a depth threshold Z_{t} = 5.71 m as the optimal one for switching between both types of parametrization. Our result is quite similar to the one obtained by Paz et al. in [13], where they found, by means of an empirical analysis, a threshold of 5 m considering a baseline of 12 cm.
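The disparity sweep described above can be sketched as follows; the error estimate used here (the depth change caused by a one-pixel disparity step) is our reading of the procedure, since the paper's exact error expression accompanies its own equation:

```python
def depth_error_sweep(fx=202.0, b=0.15, w=320):
    """For each disparity from d_uMAX = w - 1 down to d_uMIN = 1, return
    (disparity, depth, error): the error is the depth gap between two
    consecutive integer disparities, i.e. the 1-pixel quantisation step."""
    curve = []
    for d in range(w - 1, 0, -1):
        z = fx * b / d
        err = fx * b / d - fx * b / (d + 1)
        curve.append((d, z, err))
    return curve

curve = depth_error_sweep()
# The last entry (d_u = 1) is the deepest and by far the least accurate point.
```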

## 4. EKF SLAM Overview

**Prediction Step**

$$\widehat{X}(k+1|k)=f(X(k|k))$$

$$\widehat{P}(k+1|k)=\frac{\partial f}{\partial X}(k|k)\cdot P(k|k)\cdot {\left(\frac{\partial f}{\partial X}(k|k)\right)}^{t}+Q(k)$$

**Update Step**

$$\widehat{X}(k+1|k+1)=\widehat{X}(k+1|k)+W(k+1)\cdot \eta {(k+1)}_{\mathit{tot}}$$

$$P(k+1|k+1)=P(k+1|k)-W(k+1)\cdot S(k+1)\cdot {(W(k+1))}^{t}$$

where W(k+1) is the Kalman gain and η(k+1)_{tot} is the innovation vector, i.e., the difference between the current measurement vector and the predicted one: η_{tot} = z_{tot} − h_{tot}.
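These two steps can be written compactly with NumPy. This is a generic EKF iteration matching the structure above, not the paper's specific Jacobians (function and variable names are illustrative):

```python
import numpy as np

def ekf_step(x, P, f, F, Q, z_tot, h_tot, H, R):
    """One EKF iteration.  f: motion model, F = df/dX evaluated at x,
    H = dh/dX, Q: process noise covariance, R: measurement noise."""
    # Prediction step
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update step
    eta = z_tot - h_tot                     # innovation vector
    S = H @ P_pred @ H.T + R                # innovation covariance
    W = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    return x_pred + W @ eta, P_pred - W @ S @ W.T
```

Note that in a real filter h_tot would itself be computed from x_pred; it is passed in here to keep the sketch close to the equations.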

#### 4.1. Motion Model

At each time step, unknown linear accelerations (a⃗^{W}) and angular accelerations (α⃗^{C}) cause impulses of linear (V⃗^{W}) and angular (Ω⃗^{W}) velocities. According to this, the noise vector n⃗ can be expressed as:

The motion model function f_{v} is defined assuming that the acceleration values a⃗_{i} are the same from one step to the next one. Assuming also that linear and angular speeds are independent, the covariance matrix of the noise vector n⃗ will be diagonal. Then, the process noise covariance Q can be computed via the corresponding Jacobian as follows:
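Under these independence assumptions, Q is assembled from a diagonal noise covariance and the motion model's noise Jacobian. A minimal sketch (the Jacobian is supplied by the caller; the σ values in the check are illustrative):

```python
import numpy as np

def process_noise_cov(J, sigma_v, sigma_omega):
    """Q = J * Pn * J^t, where Pn = diag(sigma_v^2, ..., sigma_omega^2) is
    the diagonal covariance of the noise vector n (linear and angular
    velocity impulses assumed independent) and J = df/dn is the
    motion-model Jacobian with respect to the noise vector."""
    Pn = np.diag([sigma_v**2] * 3 + [sigma_omega**2] * 3)
    return J @ Pn @ J.T
```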

#### 4.2. Measurement Model

The measurement vector of each feature comprises its projections in the left and right images, z_{i} = (u_{L}, v_{L}, u_{R}, v_{R})^{t}.

#### 4.2.1. Measurement Prediction

At each frame we predict the measurement vector h_{i} for each of the visible features. This vector is obtained by a coordinate frame change (from the world coordinate frame W to the camera coordinate frame C), followed by the projection of the resulting 3D vector into the image plane according to the camera calibration matrix K and the stereo-rig calibration parameters.
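A sketch of this prediction for one feature, assuming an ideal rectified rig where the right camera is displaced by the baseline along the x axis (the paper's full stereo-rig calibration would replace this simplification):

```python
import numpy as np

def predict_measurement(Y_w, R_cw, t_cw, K, baseline):
    """Predicted measurement h_i = (uL, vL, uR, vR)^t for a 3D world point:
    frame change W -> C, then pinhole projection with K."""
    Yc = R_cw @ np.asarray(Y_w, dtype=float) + t_cw     # world -> camera frame

    def project(p):
        q = K @ p
        return q[:2] / q[2]                             # perspective division

    uv_left = project(Yc)
    uv_right = project(Yc - np.array([baseline, 0.0, 0.0]))  # rectified right view
    return np.concatenate([uv_left, uv_right])
```

With f_{x} = 202, a principal point at (160, 120) and B = 0.15 m, a point 2 m ahead projects to (160, 120) in the left image and 15.15 pixels further left in the right image, i.e. a disparity of d_{u} = f_{x}·B/Z.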

#### 4.2.2. Measurement Search

To measure each predicted feature h_{i}, we define a search area around the predicted projections, limiting the search to a region with a high probability of containing a good measurement. This area is computed from the uncertainty of the feature's 3D position, expressed by the innovation covariance S_{i}. This covariance essentially depends on three terms: the camera state uncertainty P_{XX}, the feature position uncertainty P_{YY} and the measurement noise R_{i}. The expression for this covariance is obtained as follows:

S_{i} then needs to be transformed into the projection covariances for the left and right views, S_{iL} and S_{iR} respectively, which can be obtained easily from the S_{i} matrix. These two covariances define the elliptical search regions, obtained by taking a certain number of standard deviations (usually 3) of the 2D Gaussians.
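Given one of these 2×2 projection covariances, the 3σ ellipse axes follow from its eigendecomposition; a small sketch (the helper name is ours):

```python
import numpy as np

def search_ellipse(S2, n_sigma=3.0):
    """Semi-axes (in pixels) and major-axis angle of the elliptical search
    region for a 2x2 projection covariance (S_iL or S_iR), taken at
    n_sigma standard deviations of the 2D Gaussian."""
    vals, vecs = np.linalg.eigh(S2)              # principal variances / axes
    half_axes = n_sigma * np.sqrt(vals)          # ascending: minor, major
    angle = np.arctan2(vecs[1, 1], vecs[0, 1])   # orientation of major axis
    return half_axes, angle
```

For an axis-aligned covariance diag(4, 1) pixels², the 3σ region is an ellipse with semi-axes 6 and 3 pixels.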

#### 4.2.3. Filter Update

The Jacobians ∂h_{i}/∂X_{cam} and ∂h_{i}/∂Y_{i} are obtained from Equation 25 and, conveniently grouped, form the total Jacobian (∂h/∂X)_{tot}. Following the same procedure, the vector z_{tot} that contains all the measurements is formed as well.

#### 4.3. Feature Management

#### 4.4. Switching between Inverse Depth and 3D Features

## 5. 2D Homography Warping

The relationship between the point coordinates X_{1} and X_{2} in the two camera frames is defined by:

where X_{1} is a point on the plane defined by Equation 32:

where n⃗^{t} is the plane normal. According to this, the following relationship can be found:

The homography depends on the rotation matrix R^{RL} and the translation vector T^{RL} between both cameras. The values of these matrices are known accurately, since they are estimated in a previous stereo calibration process. Assuming an affine transformation between the left and right image patches, the affine transformation ${H}_{A}^{RL}$ can be expressed as:

In this expression the product T^{RL}·n⃗^{t} can be isolated. Denoting this product as X, it can be obtained as follows:

where R^{CO} and T^{CO} are the rotation and translation matrices between the current left camera position and the reference position at which the feature was initialized.
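For reference, the plane-induced homography at the heart of this warping can be sketched as below. We use the textbook form H = K(R − T·n⃗^{t}/d)K^{−1} for the plane n⃗^{t}X + d = 0 (Hartley and Zisserman); the paper's ${H}_{A}^{RL}$ is the affine specialisation of this map built from the stereo-rig calibration:

```python
import numpy as np

def plane_homography(K, R, T, n, d):
    """Homography mapping pixels between two views of the plane
    n^t X + d = 0, given rotation R and translation T between the views."""
    n = np.asarray(n, dtype=float).reshape(3, 1)
    T = np.asarray(T, dtype=float).reshape(3, 1)
    return K @ (R - (T @ n.T) / d) @ np.linalg.inv(K)

def warp_point(H, uv):
    """Apply a homography to a pixel in homogeneous coordinates."""
    q = H @ np.array([uv[0], uv[1], 1.0])
    return q[:2] / q[2]
```

Warping each patch pixel through H (or its affine approximation around the patch centre) produces the predicted appearance of the patch in the other view.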

## 6. Experiments in Indoor Environments

The sequences were processed without the inverse depth parametrization and with inverse depth thresholds Z_{t} = 10 m and Z_{t} = 5.7 m. The second sequence was a typical L sequence of dimensions 3 m along the X axis and 6 m along the Z axis. Finally, the last sequence was a loop of dimensions 4.8 m along the X axis and 5 m along the Z axis. Figure 6(a) depicts the evolution of the state vector size for some frames of the L sequence. As can be observed, the state vector considering an inverse depth parametrization with a threshold of 5.7 m is larger than in the rest of the cases, due to the computational overhead of the inverse depth parametrization (each inverse depth feature requires six parameters instead of three).

- **% Inverse Features**: the percentage of the total number of features in the map that were initialized with an inverse depth parametrization.
- **ɛ_{i}**: the absolute mean error, in m, for the cartesian coordinates (X, Z).
- **Mean P_{YY} Trace**: the mean trace of the covariance matrix P_{YY} over the features that compose the final map. This parameter is indicative of the uncertainty of the features, i.e., the quality of the map.

- **Case**: no patch transformation or 2D patch warping.
- **Sequence**: the test sequences for which we performed the comparison. We selected the corridor and L sequences. In the corridor sequence the changes in appearance are mainly due to changes in scale, whereas in the L sequence they are due to changes in both scale and viewpoint.
- **# Features Map**: the total number of features in the map at the end of the sequence.
- **# Total Attempts**: the total number of feature measurement attempts during the whole sequence.
- **# Successful Attempts**: the total number of successful feature measurement attempts during the whole sequence.
- **Ratio**: the ratio between the number of successful measurement attempts and the total number of attempts.

Table 4 shows the mean processing times per filter step, considering the inverse depth parametrization with Z_{t} = 5.7 m. The results were taken on a 2.0 GHz CPU. It can be observed that the most time-consuming steps are feature initialization, measurement and update. According to Table 4, the initialization of 15 features takes approximately 18 ms, since we have to run the Harris corner detector, find the correspondences between interest points in the left and right images by means of the epipolar search, obtain the 3D coordinates of each point and compute the normal of the plane for the 2D warping. However, this exhaustive initialization is only performed at the first frame; afterwards we track features and only initialize new ones when the number of visible features falls below a lower bound. Our system also imposes an upper bound on the number of features visible from a given camera pose, set to 15 for computational reasons, and we only try to measure those features that are predicted to be visible. The EKF update dominates the processing time, since as we add more landmarks to the filter the cost of the update grows as O(n^{3}), where n is the number of landmarks. For this reason, our system meets real-time constraints only in small environments with fewer than approximately 100 landmarks. For mapping larger environments, submapping strategies [21] or more efficient filtering methods such as [22] can be used.

## 7. Conclusions and Future Works

## Acknowledgments

## References

- Broida, T.; Chandrashekhar, S.; Chellappa, R. Recursive 3-D Motion Estimation from a Monocular Image Sequence. IEEE Trans. Aerosp. Electron. Syst
**1990**, 26, 639–656. [Google Scholar] - Broida, T.; Chellappa, R. Estimating the Kinematics and Structure of a Rigid Object from a Sequence of Monocular Images. IEEE Trans. Pattern Anal. Machine Intell
**1991**, 13, 497–513. [Google Scholar] - Mountney, P.; Stoyanov, D.; Davison, A.J.; Yang, G.Z. Simultaneous Stereoscope Localization and Soft-Tissue Mapping for Minimally Invasive Surgery. Proceedings of Medical Image Computing and Computer Assisted Intervention (MICCAI), Copenhagen, Denmark, October 1–6, 2006.
- Klein, G.; Murray, D. Parallel Tracking and Mapping for Small AR Workspaces. Proceedings of the 6th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Phoenix, AZ, USA, October 28–November 2, 2007.
- Schleicher, D.; Bergasa, L.M.; Barea, R.; López, E.; Ocaña, M.; Nuevo, J. Real-Time Wide-Angle Stereo Visual SLAM on Large Environments Using SIFT Features Correction. Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), San Diego, CA, USA, October 29–November 2, 2007.
- Schleicher, D.; Bergasa, L.M.; Barea, R.; López, E.; Ocaña, M. Real-Time Simultaneous Localization and Mapping with a Wide-Angle Stereo Camera and Adaptive Patches. Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Beijing, China, October 9–15, 2006.
- Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-Time Single Camera SLAM. IEEE Trans. Pattern Anal. Machine Intell
**2007**, 29, 1052–1067. [Google Scholar] - Civera, J.; Davison, A.J.; Montiel, J.M. Inverse Depth Parametrization for Monocular SLAM. IEEE Trans. Robotics
**2008**, 24, 932–945. [Google Scholar] - Walker, B.N.; Lindsay, J. Navigation Performance with a Virtual Auditory Display: Effects of Beacon Sound, Capture Radius, and Practice. Human Factors
**2006**, 48, 265–278. [Google Scholar] - Li, L.J.; Socher, R.; Li, F.F. Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, June 20–26, 2009.
- Oh, S.; Tariq, S.; Walker, B.; Dellaert, F. Map-Based Priors for Localization. Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sendai, Japan, September 28–October 2, 2004.
- Sáez, J.M.; Escolano, F.; Peñalver, A. First Steps towards Stereo-Based 6DOF SLAM for the Visually Impaired. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, June 20–26, 2005.
- Paz, L.M.; Piniés, P.; Tardós, J.D.; Neira, J. Large Scale 6DOF SLAM with Stereo-in-hand. IEEE Trans. Robotics
**2008**, 24, 946–957. [Google Scholar] - Paz, L.M.; Guivant, J.; Tardós, J.D.; Neira, J. Data Association in O(n) for Divide and Conquer SLAM. Proceedings of Robotics: Science and Systems, Atlanta, GA, USA, June 27–30, 2007.
- Harris, C.; Stephens, M. A Combined Corner and Edge Detector. Proceedings of the 4th Alvey Vision Conference, Manchester, UK, August 30–September 2, 1988; pp. 147–151.
- Eade, E.; Drummond, T. Monocular SLAM as a Graph of Coalesced Observations. Proceedings of International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, October 14–20, 2007.
- Liang, B.; Pears, N. Visual Navigation Using Planar Homographies. Proceedings of IEEE International Conference on Robotics and Automation (ICRA), Washington, DC, USA, May 11–15, 2002.
- Molton, N.; Davison, A.J.; Reid, I. Locally Planar Patch Features for Real-Time Structure from Motion. Proceedings of British Machine Vision Conference (BMVC), London, UK, September 7–9, 2004.
- Chum, O.; Pajdla, T.; Sturm, P. The Geometric Error for Homographies. Comput. Vision Image Underst
**2005**, 97, 86–102. [Google Scholar] - Documentation: Camera Calibration Toolbox for Matlab. 2007. Available online: http://www.vision.caltech.edu/bouguetj/calib_doc/ (accessed on 20 April 2010).
- Piniés, P.; Tardós, J.D. Large Scale SLAM Building Conditionally Independent Local Maps: Application to Monocular Vision. IEEE Trans. Robotics
**2008**, 24, 1094–1106. [Google Scholar] - Kaess, M.; Ranganathan, A.; Dellaert, F. iSAM: Incremental Smoothing and Mapping. IEEE Trans. Robotics
**2008**, 24, 1365–1378. [Google Scholar] - Agrawal, M.; Konolige, K.; Blas, M.R. CenSurE: Center Surround Extremas for Realtime Feature Detection and Matching. Proceedings of the 10th European Conference on Computer Vision (ECCV), Marseille, France, October 12–18, 2008.
- Schleicher, D.; Bergasa, L.M.; Ocaña, M.; Barea, R.; López, E. Real-Time Hierarchical Outdoor SLAM Based on Stereovision and GPS Fusion. IEEE Trans. Intell. Transp. Systems
**2009**, 10, 440–452. [Google Scholar] - Lowe, D. Distinctive Image Features from Scale-Invariant Keypoints. Intl. J. Comput. Vision
**2004**, 60, 91–110. [Google Scholar] - Angeli, A.; Filliat, D.; Doncieux, S.; Meyer, J.A. Fast and Incremental Method for Loop-Closure Detection Using Bags of Visual Words. IEEE Trans. Robotics
**2008**, 24, 1027–1037. [Google Scholar] - Cummins, M.; Newman, P. Highly Scalable Appearance-Only SLAM–FAB-MAP 2.0. Proceedings of Robotics: Science and Systems (RSS09), Seattle, WA, USA, June 29–July 1, 2009.
- Triggs, B.; McLauchlan, P.; Hartley, R.; Fitzgibbon, A. Bundle Adjustment—A Modern Synthesis. In Vision Algorithms: Theory and Practice; Triggs, W., Zisserman, A., Szeliski, R., Eds.; Springer Verlag: New York, NY, USA, 1999; pp. 298–375. [Google Scholar]
- Llorca, D.F.; Sotelo, M.A.; Parra, I.; Ocaña, M.; Bergasa, L.M. Error Analysis in a Stereo Vision-Based Pedestrian Detection Sensor for Collision Avoidance Applications. Sensors
**2010**, 10, 3741–3758. [Google Scholar]

**Figure 3.** (a) Absolute and (b) relative depth estimation errors for a stereo rig with focal length f_{x} = 202 pixels and image size 320 × 240, for different baselines.

**Figure 4.** Depth and angular non-linearity indexes with focal length f_{x} = 202 pixels and image size 320 × 240. (a) Different stereo rig baselines. (b) A zoomed version for our stereo rig configuration, B = 15 cm.

**Figure 6.** Inverse depth and 3D comparison. (a) Total state vector size; (b) without inverse depth parametrization, L sequence; (c) with inverse depth parametrization, Z_{t} = 5.7 m, L sequence; (d) without inverse depth parametrization, loop sequence; (e) with inverse depth parametrization, Z_{t} = 10 m, loop sequence; (f) with inverse depth parametrization, Z_{t} = 5.7 m, loop sequence.

**Table 1.** Optimal depth thresholds for different stereo baselines, with fixed focal length f_{x} = 202 pixels and image size 320 × 240.

Stereo Baseline (cm) | Depth Threshold (m)
---|---
15 | 5.71
20 | 6.69
30 | 8.35
40 | 9.81

Sequence | Case | % Inverse Features | ɛ_{X} | ɛ_{Z} | Mean P_{YY} Trace
---|---|---|---|---|---
Corridor | Without Inverse Par. | 0.00 | 0.9394 | 0.4217 | 0.1351
Corridor | With Inverse Par., Z_{t} = 10 m | 5.23 | 0.9259 | 0.4647 | 0.0275
Corridor | With Inverse Par., Z_{t} = 5.7 m | 24.32 | 0.7574 | 0.3777 | 0.0072
L | Without Inverse Par. | 0.00 | 0.5047 | 0.3985 | 0.1852
L | With Inverse Par., Z_{t} = 10 m | 7.85 | 0.5523 | 0.1017 | 0.0245
L | With Inverse Par., Z_{t} = 5.7 m | 19.21 | 0.5534 | 0.2135 | 0.0078
Loop | Without Inverse Par. | 0.00 | 0.4066 | 0.9801 | 0.2593
Loop | With Inverse Par., Z_{t} = 10 m | 5.27 | 0.3829 | 0.6303 | 0.0472
Loop | With Inverse Par., Z_{t} = 5.7 m | 12.36 | 0.2191 | 0.3778 | 0.0310

Case | Sequence | # Features Map | # Total Attempts | # Successful Attempts | Ratio %
---|---|---|---|---|---
No Patch Transformation | Corridor | 85 | 6,283 | 5,612 | 89.32
2D Warping | Corridor | 68 | 6,398 | 5,781 | 90.35
No Patch Transformation | L | 116 | 11,627 | 8,922 | 76.73
2D Warping | L | 105 | 10,297 | 9,119 | 88.71

Filter Step | Time (ms)
---|---
Feature Initialization (15) | 18.00
Prediction | 0.47
Measurement | 10.00
Update | 4.96

© 2010 by the authors; licensee MDPI, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license http://creativecommons.org/licenses/by/3.0/.

## Share and Cite

**MDPI and ACS Style**

Bergasa, L.M.; Alcantarilla, P.F.; Schleicher, D.
Non-Linearity Analysis of Depth and Angular Indexes for Optimal Stereo SLAM. *Sensors* **2010**, *10*, 4159-4179.
https://doi.org/10.3390/s100404159
