# Visual Tilt Estimation for Planar-Motion Methods in Indoor Mobile Robots

## Abstract


## 1. Introduction

#### 1.1. Related Works

#### 1.2. Contributions of This Study

## 2. Materials and Methods

#### 2.1. Image-Space Method

#### 2.1.1. Preliminary Calculations

#### 2.1.2. Edge Pixel Extraction

#### 2.1.3. Edge Pixel Processing

#### Panoramic Projections of Vertical Elements

#### Approximating Tilts through Image Shifts

#### Tilt Parameters from Vanishing Point Shifts

#### 2.1.4. Rejecting Incorrect Pixels

#### RANSAC Variant

#### Reject-Refit-Variant

#### 2.2. Vector-Consensus Method

#### 2.2.1. Orientation Estimation

#### 2.2.2. Vertical Edge Selection

#### 2.3. Image Database

#### 2.4. Experiment Design

## 3. Results

#### 3.1. Tilt Estimation Error

#### 3.2. Computation Time

## 4. Discussion

## 5. Conclusions

#### 5.1. Outlook

## Acknowledgments

## Conflicts of Interest

## References

1. Fleer, D.; Möller, R. Comparing Holistic and Feature-Based Visual Methods for Estimating the Relative Pose of Mobile Robots. Robot. Auton. Syst. **2017**, 89, 51–74.
2. Scaramuzza, D.; Fraundorfer, F. Visual Odometry [Tutorial]. Robot. Autom. Mag. **2011**, 18, 80–92.
3. Fraundorfer, F.; Scaramuzza, D. Visual Odometry: Part II: Matching, Robustness, Optimization, and Applications. Robot. Autom. Mag. **2012**, 19, 78–90.
4. Lowry, S.; Sünderhauf, N.; Newman, P.; Leonard, J.J.; Cox, D.; Corke, P.; Milford, M.J. Visual Place Recognition: A Survey. IEEE Trans. Robot. **2016**, 32, 1–19.
5. Fuentes-Pacheco, J.; Ruiz-Ascencio, J.; Rendón-Mancha, J.M. Visual simultaneous localization and mapping: A survey. Artif. Intell. Rev. **2015**, 43, 55–81.
6. Franz, M.O.; Schölkopf, B.; Mallot, H.A.; Bülthoff, H.H. Where did I take that snapshot? Scene-based homing by image matching. Biol. Cybern. **1998**, 79, 191–202.
7. Stürzl, W.; Mallot, H.A. Efficient visual homing based on Fourier transformed panoramic images. Robot. Auton. Syst. **2006**, 54, 300–313.
8. Franz, M.O.; Stürzl, W.; Hübner, W.; Mallot, H.A. A Robot System for Biomimetic Navigation—From Snapshots to Metric Embeddings of View Graphs. In Robotics and Cognitive Approaches to Spatial Mapping; Springer: Berlin/Heidelberg, Germany, 2007; pp. 297–314.
9. Booij, O.; Zivkovic, Z. The Planar Two Point Algorithm; IAS Technical Report IAS-UVA-09-05; University of Amsterdam, Faculty of Science, Informatics Institute: Amsterdam, The Netherlands, 2009.
10. Möller, R.; Krzykawski, M.; Gerstmayr, L. Three 2D-Warping Schemes for Visual Robot Navigation. Auton. Robot. **2010**, 29, 253–291.
11. Booij, O.; Kröse, B.; Zivkovic, Z. Efficient Probabilistic Planar Robot Motion Estimation Given Pairs of Images. In Robotics: Science and Systems VI; MIT Press: Cambridge, MA, USA, 2010; pp. 201–208.
12. Gerstmayr-Hillen, L.; Schlüter, O.; Krzykawski, M.; Möller, R. Parsimonious Loop-Closure Detection Based on Global Image-Descriptors of Panoramic Images. In Proceedings of the International Conference on Advanced Robotics (ICAR 2011), Tallinn, Estonia, 20–23 June 2011; pp. 576–581.
13. Nistér, D.; Naroditsky, O.; Bergen, J. Visual odometry for ground vehicle applications. J. Field Robot. **2006**, 23, 3–20.
14. Stewenius, H.; Engels, C.; Nistér, D. Recent developments on direct relative orientation. ISPRS J. Photogramm. Remote Sens. **2006**, 60, 284–294.
15. Lobo, J.; Dias, J. Relative pose calibration between visual and inertial sensors. Int. J. Robot. Res. **2007**, 26, 561–575.
16. Bazin, J.C.; Demonceaux, C.; Vasseur, P.; Kweon, I. Rotation estimation and vanishing point extraction by omnidirectional vision in urban environment. Int. J. Robot. Res. **2012**, 31, 63–81.
17. Coughlan, J.M.; Yuille, A.L. Manhattan World: Orientation and Outlier Detection by Bayesian Inference. Neural Comput. **2003**, 15, 1063–1088.
18. Košecká, J.; Zhang, W. Video compass. In European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2002; pp. 476–490.
19. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B **1977**, 1–38.
20. Denis, P.; Elder, J.H.; Estrada, F.J. Efficient edge-based methods for estimating manhattan frames in urban imagery. In European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2008; pp. 197–210.
21. Tardif, J.P. Non-iterative approach for fast and accurate vanishing point detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan, 29 September–2 October 2009; pp. 1250–1257.
22. Toldo, R.; Fusiello, A. Robust Multiple Structures Estimation with J-Linkage. In European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2008; Volume 1, pp. 537–547.
23. Bazin, J.C.; Kweon, I.; Demonceaux, C.; Vasseur, P. Rectangle extraction in catadioptric images. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 14–21 October 2007; pp. 1–7.
24. Hartley, R.I.; Kahl, F. Global optimization through rotation space search. Int. J. Comput. Vis. **2009**, 82, 64–79.
25. Schindler, G.; Dellaert, F. Atlanta world: An expectation maximization framework for simultaneous low-level edge grouping and camera calibration in complex man-made environments. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Washington, DC, USA, 27 June–2 July 2004; Volume 1, pp. I-203–I-209.
26. Tretyak, E.; Barinova, O.; Kohli, P.; Lempitsky, V. Geometric image parsing in man-made environments. Int. J. Comput. Vis. **2012**, 97, 305–321.
27. Antone, M.E.; Teller, S. Automatic recovery of relative camera rotations for urban scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head Island, SC, USA, 13–15 June 2000; Volume 2, pp. 282–289.
28. Illingworth, J.; Kittler, J. A Survey of the Hough Transform. Comput. Vis. Graph. Image Process. **1988**, 44, 87–116.
29. Lee, J.K.; Yoon, K.J. Real-time joint estimation of camera orientation and vanishing points. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1866–1874.
30. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. **2004**, 60, 91–110.
31. Gerstmayr, L.; Röben, F.; Krzykawski, M.; Kreft, S.; Venjakob, D.; Möller, R. A Vision-Based Trajectory Controller for Autonomous Cleaning Robots. In Autonome Mobile Systeme; Informatik Aktuell; Springer: Berlin/Heidelberg, Germany, 2009; pp. 65–72.
32. Gerstmayr-Hillen, L.; Röben, F.; Krzykawski, M.; Kreft, S.; Venjakob, D.; Möller, R. Dense topological maps and partial pose estimation for visual control of an autonomous cleaning robot. Robot. Auton. Syst. **2013**, 61, 497–516.
33. Möller, R.; Krzykawski, M.; Gerstmayr-Hillen, L.; Horst, M.; Fleer, D.; de Jong, J. Cleaning robot navigation using panoramic views and particle clouds as landmarks. Robot. Auton. Syst. **2013**, 61, 1415–1439.
34. Scaramuzza, D.; Martinelli, A.; Siegwart, R. A Toolbox for Easily Calibrating Omnidirectional Cameras. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Beijing, China, 9–15 October 2006; pp. 5695–5701.
35. Jähne, B.; Scharr, H.; Körkel, S. Principles of Filter Design. Handb. Comput. Vis. Appl. **1999**, 2, 125–151.
36. Weickert, J.; Scharr, H. A Scheme for Coherence-Enhancing Diffusion Filtering with Optimized Rotation Invariance. J. Vis. Commun. Image Represent. **2002**, 13, 103–118.
37. Bradski, G. The OpenCV library. Dr. Dobbs J. **2000**, 25, 120–126.
38. Murray, R.M.; Li, Z.; Sastry, S.S. A Mathematical Introduction to Robotic Manipulation; CRC Press: Boca Raton, FL, USA, 1994.
39. Davies, E.R. Image Space Transforms for Detecting Straight Edges in Industrial Images. Pattern Recognit. Lett. **1986**, 4, 185–192.
40. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM **1981**, 24, 381–395.
41. Aguilera, D.; Lahoz, J.G.; Codes, J.F. A New Method for Vanishing Points Detection in 3D Reconstruction From a Single View. Available online: http://www.isprs.org/proceedings/XXXVI/5-W17/pdf/6.pdf (accessed on 31 October 2017).
42. Wildenauer, H.; Vincze, M. Vanishing point detection in complex man-made worlds. In Proceedings of the IEEE International Conference on Image Analysis and Processing (ICIAP), Modena, Italy, 10–13 September 2007; pp. 615–622.
43. Guennebaud, G.; Jacob, B. Eigen v3, 2010. Available online: http://eigen.tuxfamily.org (accessed on 22 September 2017).
44. Tordoff, B.J.; Murray, D.W. Guided-MLESAC: Faster image transform estimation by using matching priors. IEEE Trans. Pattern Anal. Mach. Intell. **2005**, 27, 1523–1535.
45. Härdle, W.K.; Klinke, S.; Rönz, B. Introduction to Statistics; Springer: Berlin/Heidelberg, Germany, 2015.
46. Magee, M.J.; Aggarwal, J.K. Determining vanishing points from perspective images. Comput. Vis. Graph. Image Process. **1984**, 26, 256–267.
47. Bazin, J.C.; Seo, Y.; Demonceaux, C.; Vasseur, P.; Ikeuchi, K.; Kweon, I.; Pollefeys, M. Globally optimal line clustering and vanishing point estimation in manhattan world. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 638–645.
48. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the IEEE International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
49. Torr, P.H.; Zisserman, A. MLESAC: A new robust estimator with application to estimating image geometry. Comput. Vis. Image Understand. **2000**, 78, 138–156.
50. Chum, O.; Matas, J. Matching with PROSAC—progressive sample consensus. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 220–226.
51. Nistér, D. Preemptive RANSAC for live structure and motion estimation. Mach. Vis. Appl. **2005**, 16, 321–329.
52. Raguram, R.; Frahm, J.M.; Pollefeys, M. Exploiting uncertainty in random sample consensus. In Proceedings of the IEEE International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 2074–2081.

**Figure 1.** A robot tilted relative to the ground plane (gray tiles). The robot is an abstraction of our cleaning-robot prototype from Figure 2. Colored arrows illustrate the coordinate system of an untilted robot. Under the planar-motion assumption, movement is restricted to the x–y plane. Furthermore, rotations may only occur around the blue z-axis, which is orthogonal to the ground plane. This reduces the degrees of freedom from six to three. Here, the robot has been tilted by an angle $\alpha $ in the direction $\beta $, as shown by the robot’s tilted z-axis (black arrow). For reasons of legibility, this illustration shows an exaggerated $\alpha $.
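The tilt parameterization $(\alpha, \beta)$ can be expressed as a rotation about a horizontal axis. A minimal numpy sketch, under the assumption that $\beta$ is measured in the x–y plane and the rotation axis is the horizontal direction orthogonal to $\beta$:

```python
import numpy as np

def tilt_rotation(alpha, beta):
    """Rotation that tilts the robot's z-axis by alpha in direction beta.

    Illustrative sketch: we rotate by alpha about the in-plane axis
    orthogonal to the tilt direction beta (angles in radians)."""
    # Axis of rotation lies in the ground plane, orthogonal to the tilt direction.
    ax = np.array([-np.sin(beta), np.cos(beta), 0.0])
    # Skew-symmetric cross-product matrix of the (unit) axis.
    K = np.array([[0.0, -ax[2], ax[1]],
                  [ax[2], 0.0, -ax[0]],
                  [-ax[1], ax[0], 0.0]])
    # Rodrigues' formula for a rotation of alpha about `ax`.
    return np.eye(3) + np.sin(alpha) * K + (1.0 - np.cos(alpha)) * (K @ K)

# The tilted z-axis (black arrow in Figure 1) for a forward tilt:
alpha, beta = np.radians(4.15), 0.0
z_tilted = tilt_rotation(alpha, beta) @ np.array([0.0, 0.0, 1.0])
```

For $\beta = 0$ this tips the z-axis toward the x-axis, i.e., a forward tilt in the convention of Figure 1.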

**Figure 2.** The cleaning-robot prototype used to acquire the images in this work. Images were captured with the panoramic camera, highlighted in the center. In this picture, the robot is shown without its cover.

**Figure 3.** An overview of the tilt-estimation pipeline used in this work. We use one of two different methods, which are shown on the left and right, respectively. First, a single panoramic camera image is edge-filtered (**top**). Next, edge pixels corresponding to vertical elements are identified (**top left, right**, shown in red). For the image-space method (Section 2.1), we approximate the tilt as a shifting of the edge pixels within the image. We estimate the shift direction and shift magnitude by fitting a function (bottom left, red line) to two parameters derived from the edge pixels (black dots). These parameters are based solely on the edge pixels’ positions and gradient directions in the image space. The vector-consensus method (Section 2.2) determines a 3D normal vector for each edge pixel. Each of these normals is orthogonal to the direction of the vanishing point. We then estimate this direction from a consensus of the normal vectors (bottom right, blue normals are orthogonal to tilt direction). Finally, we compute the tilt parameters $(\alpha ,\beta )$ from the shift or vanishing-point direction (**bottom**).

**Figure 4.** An illustration of the vertical elements’ vanishing point. This untilted image from our database shows a typical office environment. A blue dot represents the predicted vanishing point for an untilted robot ${\overrightarrow{p}}_{c}$. The red, dashed lines represent vertical elements, which we manually extended to their vanishing point ${\overrightarrow{p}}_{n}$ (red ring, overlapping the blue dot). As expected, the predicted and actual vanishing point of the vertical elements are nearly identical.

**Figure 5.** The result of the Scharr operator applied to the example image from Figure 4. The operator was only applied to a bounding box containing the pixels above the camera horizon. Black and white correspond to a strong dark-bright or bright-dark edge, respectively; gray indicates a weak edge response. We artificially increased the contrast of these images to make the gradients more noticeable.

**Figure 6.** The same as Figure 4, but for a forward-tilted robot (${\alpha}_{T}=4.15°, {\beta}_{T}=0°$). Due to the tilt, the approximated vanishing point ${\overrightarrow{p}}_{a}$ (red dot) is shifted from ${\overrightarrow{p}}_{c}$ (blue dot). Close inspection reveals that the tilt caused a slight curvature in the vertical lines. Thus, the dashed red lines and the resulting ${\overrightarrow{p}}_{a}$ are merely an approximation of the true edges and vanishing point, respectively.

**Figure 7.** The effect of tilts on image edges. This illustration shows the outlines of several cuboids, as seen by our robot’s camera. In Figure 7a, the robot is not tilted. The vertical elements appear as straight lines oriented towards a vanishing point (red dot). Figure 7b shows the effect of an exaggerated tilt, where vertical elements appear as curves. These curves no longer point straight towards the expected vanishing point (blue dot). This point has been shifted from its untilted position (red dot). This figure also contains the untilted outlines in a light shade of red. In the image-space method, we assume that small tilts do not cause the distortions in Figure 7b. Instead, we model the effect of such tilts as a mere shift in the image. This approximation is shown in Figure 7c, where the vanishing point and outlines are shifted without distortion.

**Figure 8.** The camera model used in this work, which was first introduced by Scaramuzza et al. [34]. In Figure 8a, the point $\overrightarrow{X}$ (red circle) lies at a bearing of ${(x,y,z)}^{T}$ (red arrow) relative to the camera center $\overrightarrow{O}$ (black circle). A fisheye lens (light blue shape) projects ${(x,y,z)}^{T}$ onto the point ${(u,v)}^{T}$ (red circle) in an idealized sensor plane (gray square). This nonlinear projection is described by Equation (3) and the camera parameters ${a}_{k}$ from Equation (5). The distance between ${(u,v)}^{T}$ and the image center at ${(0,0)}^{T}$ is specified by $\rho $. Applying Equation (4) to ${(u,v)}^{T}$ gives us the corresponding pixel coordinates ${({u}^{\prime},{v}^{\prime})}^{T}$ in the actual digital image, as shown in Figure 8b. Here, the center of this image has the pixel coordinates ${({x}_{c}^{\prime},{y}_{c}^{\prime})}^{T}$.
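The back-projection $P^{-1}$ of this camera model maps a pixel to a 3D bearing via a polynomial in the distance $\rho$ from the image center. The sketch below assumes a simplified pixel-to-sensor mapping (pure translation by the image center) and uses hypothetical polynomial coefficients; real values of ${a}_{k}$, the center, and the affine part of Equation (4) come from the calibration toolbox [34]:

```python
import numpy as np

# Hypothetical calibration values, NOT the robot's actual calibration.
A_POLY = [-160.0, 0.0, 0.002]       # f(rho) = a0 + a1*rho + a2*rho^2
CENTER = np.array([320.0, 240.0])   # image center (x'_c, y'_c) in pixels

def pixel_to_bearing(px):
    """Back-project a pixel (u', v') to a unit 3D bearing vector,
    in the spirit of P^-1 from the Scaramuzza et al. model (Figure 8)."""
    u, v = px - CENTER                          # shift into the sensor plane
    rho = np.hypot(u, v)                        # distance from the image center
    z = sum(a * rho**k for k, a in enumerate(A_POLY))
    bearing = np.array([u, v, z])
    return bearing / np.linalg.norm(bearing)
```

A pixel at the image center back-projects straight along the optical axis; off-center pixels acquire a lateral component proportional to their sensor-plane offset.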

**Figure 9.** The parameters associated with an edge pixel k. In this illustration, the edge pixel is shown as a black square with position ${\overrightarrow{p}}_{k}$. This edge pixel has a gradient ${\overrightarrow{g}}_{k}$, represented by a black arrow. From this gradient, we calculate the gradient direction angle ${\phi}_{k}$. The blue line indicates a hypothetical straight edge that is orthogonal to ${\phi}_{k}$ and passes through ${\overrightarrow{p}}_{k}$. This line also passes through the approximated vanishing point at ${\overrightarrow{p}}_{a}$ (white disk). A tilt has shifted ${\overrightarrow{p}}_{a}$ from the untilted vanishing point ${\overrightarrow{p}}_{c}$ (gray disk). This shift is represented by a red line, and described by the angle ${\beta}^{\prime}$ and distance l. The distance between the blue line and ${\overrightarrow{p}}_{c}$ is the edge offset ${s}_{k}$ from Equation (17). Finally, the angle between the blue and red lines is ${\epsilon}_{k}=\frac{\pi}{2}+{\phi}_{k}-{\beta}^{\prime}$. Note that we model the effects of a tilt as a simple shift (Figure 7).

**Figure 10.** The edge pixels from the prefiltered set F, which we extracted from the Scharr-filtered images in Figure 5. Valid edge pixels are shown in color, and are superimposed on the camera image. Figure 10a shows the untilted image from Figure 5, while Figure 10b shows the same location with a forward tilt. We reject pixels with bearings of more than 45° above the horizon, or with a gradient intensity of ${I}_{k}<200$. Pixels with an edge offset $|{s}_{k}|>{s}_{\mathrm{max}}$ were also rejected. In this visualization, the pixel’s hue indicates its gradient direction angle ${\phi}_{k}$. The saturation represents the edge offset ${s}_{k}$, with full saturation and desaturation corresponding to ${s}_{k}=0$ and $|{s}_{k}|={s}_{\mathrm{max}}$, respectively. $({\alpha}_{T},{\beta}_{T})$ are the ground-truth tilt parameters calculated from the measured wheel heights (Section 2.3).

**Figure 11.** The gradient direction angle ${\phi}_{k}$ and edge offset ${s}_{k}$ of the edge pixels shown in Figure 10. Edge pixels are shown as a 2D $(\phi ,s)$ histogram, with darker bins containing more edge pixels. The red, dashed line shows the $(\phi ,s)$ cosine predicted from the ground-truth tilt according to Section 2.1.3. A thin, dashed black line corresponds to $s=0$. Due to noise and non-vertical elements in the environment, some edge pixels noticeably deviate from this ideal curve.
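With the shift model of Figure 9, the edge offset follows $s = l\cos(\phi - {\beta}^{\prime})$, which is linear in the unknowns $A = l\cos{\beta}^{\prime}$ and $B = l\sin{\beta}^{\prime}$. The cosine can therefore be fitted by ordinary least squares, as in this sketch (the exact estimator of the paper may differ in details):

```python
import numpy as np

def fit_shift(phi, s):
    """Least-squares fit of the (phi, s) cosine s = l*cos(phi - beta').

    Rewriting as s = A*cos(phi) + B*sin(phi) makes the problem linear,
    so a single lstsq solve recovers the shift magnitude l and the
    shift direction beta'."""
    M = np.column_stack([np.cos(phi), np.sin(phi)])
    (A, B), *_ = np.linalg.lstsq(M, s, rcond=None)
    l = np.hypot(A, B)           # shift magnitude (pixels)
    beta_p = np.arctan2(B, A)    # shift direction beta'
    return l, beta_p

# Synthetic check: noiseless edge pixels from a shift of l=30 px, beta'=40 deg.
phi = np.linspace(0.0, 2 * np.pi, 100)
s = 30.0 * np.cos(phi - np.radians(40.0))
l_est, beta_est = fit_shift(phi, s)
```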

**Figure 12.** The edge pixels from the image in Figure 13, visualized as in Figure 11. The blue line represents the $(\phi ,s)$ cosine fitted to the edge pixels using least squares. In Figure 12a, incorrect pixels were not rejected. There is a large error between the estimate (blue) and ground truth (red). Applying RANSAC with a threshold ${\delta}_{s,\mathrm{max}}=5$ pixels gives a much better result, shown in Figure 12b. Here, the histogram contains only the edge pixels remaining after RANSAC.
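The RANSAC variant can be sketched as follows: two sampled edge pixels determine a candidate $(\phi, s)$ cosine, and pixels whose offset lies within ${\delta}_{s,\mathrm{max}}$ of that curve count as inliers. The iteration count and the return of the raw inlier set (rather than a final refit) are our simplifications:

```python
import numpy as np

def ransac_shift(phi, s, delta_max=5.0, iters=100, rng=None):
    """RANSAC sketch for the (phi, s) cosine model s = A*cos(phi) + B*sin(phi).

    Two samples suffice to solve for (A, B); the candidate with the most
    inliers (offset within delta_max pixels of the curve) wins."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(phi), dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(len(phi), size=2, replace=False)
        M = np.array([[np.cos(phi[i]), np.sin(phi[i])],
                      [np.cos(phi[j]), np.sin(phi[j])]])
        if abs(np.linalg.det(M)) < 1e-9:
            continue  # degenerate sample (phi values pi apart), skip
        A, B = np.linalg.solve(M, np.array([s[i], s[j]]))
        inliers = np.abs(A * np.cos(phi) + B * np.sin(phi) - s) <= delta_max
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

In practice, one would refit the cosine by least squares over the returned inlier set, as in Figure 12b.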

**Figure 13.** Edge pixels after prefiltering, visualized as in Figure 10. This image includes numerous incorrect edge pixels. These are caused by near-vertical elements, such as parts of the curved chairs. If they are not rejected, these pixels cause errors in the least-squares tilt estimate. The camera image was captured with a forward tilt of ${\alpha}_{T}=4.15°, {\beta}_{T}=0°$.

**Figure 14.** Using the reject-refit scheme to reject incorrect edge pixels from Figure 13. Edge pixels are visualized through $(\phi ,s)$ histograms, as in Figure 12. Figure 14a shows the remaining edge pixels after the first two iterations. The error between the ground truth (red) and the curve fitted to the remaining pixels ${F}_{2}$ (blue) is reduced, compared to Figure 12a. Once the reject-refit scheme converges for $n=8$, the error shown in Figure 14b is even lower.
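The reject-refit scheme alternates between fitting the $(\phi, s)$ cosine by least squares and discarding the pixels with the largest residuals. Whether the fraction Q from Table 2 denotes the kept or the rejected share is not spelled out in this excerpt; the sketch below keeps the fraction `q` with the smallest residuals each iteration:

```python
import numpy as np

def reject_refit(phi, s, q=0.8, iters=8):
    """Reject-refit sketch: fit s = A*cos(phi) + B*sin(phi) by least
    squares, drop the worst-fitting pixels, and refit on the remainder."""
    keep = np.ones(len(phi), dtype=bool)
    for _ in range(iters):
        M = np.column_stack([np.cos(phi[keep]), np.sin(phi[keep])])
        (A, B), *_ = np.linalg.lstsq(M, s[keep], rcond=None)
        resid = np.abs(A * np.cos(phi) + B * np.sin(phi) - s)
        # Keep the fraction q of currently kept pixels with smallest residuals.
        thresh = np.quantile(resid[keep], q)
        keep &= resid <= thresh
    return A, B, keep
```

Because the gross outliers dominate the residuals of the first fit, they are rejected early, and later iterations converge toward the curve supported by the vertical-element pixels.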

**Figure 15.** Vertical elements in the environment and their appearance in the camera image. In Figure 15a, the orientations ${\overrightarrow{o}}_{k}$ for some elements are indicated by arrows. The robot’s movement plane is highlighted in red, with a black arrow representing the plane’s surface normal $\overrightarrow{n}$. Note that the ${\overrightarrow{o}}_{k}$ and $\overrightarrow{n}$ are parallel. Figure 15b shows the robot’s camera view in this location. The vertical elements from Figure 15a appear as straight edges. Colored circles represent the edge pixels that correspond to the vertical elements in Figure 15a. Each such pixel has a position ${\overrightarrow{p}}_{k}$ and gradient vector ${\overrightarrow{g}}_{k}$; the latter is shown by dashed lines. Since the camera image contains only the 2D projections of the 3D ${\overrightarrow{o}}_{k}$, we cannot determine these orientations directly.

**Figure 16.** An environment element with orientation $\overrightarrow{o}$ is projected onto an image plane (gray). This produces an edge pixel with position $\overrightarrow{p}$ and gradient $\overrightarrow{g}$. $\overrightarrow{v}={P}^{-1}\left(\overrightarrow{p}\right)$ is the 3D bearing vector associated with the image position $\overrightarrow{p}$, and similarly $\overrightarrow{u}={P}^{-1}(\overrightarrow{p}+\Delta \overrightarrow{g})$. The normal $\overrightarrow{m}$ specifies the orientation of the blue plane that contains both $\overrightarrow{v}$ and $\overrightarrow{o}$; this plane is only partially drawn as a triangle. We cannot fully determine $\overrightarrow{o}$ from the projection $\overrightarrow{p}$ and $\overrightarrow{g}$ alone. Any element in the blue plane—such as ${\overrightarrow{o}}^{\prime}$ or ${\overrightarrow{o}}^{\prime \prime}$—would result in the same edge pixel. All vectors in this illustration are relative to the camera reference frame. Note that we use a linear camera as an approximation of the actual camera model from Figure 8. This is plausible because the projection is approximately linear within a small radius $\Delta $ around the pixel $\overrightarrow{p}$: within this radius, $\rho \approx \mathrm{const}.$, and thus Equation (3) is a linear projection.
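The plane through the camera center and the image edge contains both $\overrightarrow{v}$ and $\overrightarrow{o}$, so its normal $\overrightarrow{m}$ is orthogonal to the element's orientation. Under the local pinhole approximation noted in the caption, this normal can be written directly from the pixel position and gradient; the paper's construction via the bearings $\overrightarrow{v}$ and $\overrightarrow{u}$ of the general camera model is equivalent in spirit, and `f` here is an assumed focal length:

```python
import numpy as np

def edge_plane_normal(p, g, f=1.0):
    """Normal of the plane through the camera center and the image edge.

    Pinhole sketch: the edge is the line through pixel p orthogonal to the
    gradient g (focal-normalized coordinates, image plane at z = f). Every
    bearing toward a point on that edge is orthogonal to the normal below,
    so for a vertical element the normal is orthogonal to the
    vanishing-point direction."""
    px, py = p
    gx, gy = g
    m = np.array([f * gx, f * gy, -(px * gx + py * gy)])
    return m / np.linalg.norm(m)
```

By construction, the returned vector is orthogonal to the bearing of $\overrightarrow{p}$ and to the bearing of any point on the edge line through $\overrightarrow{p}$.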

**Figure 17.** The normal vectors ${\overrightarrow{m}}_{k}$ of the edge pixels for an untilted robot. For the sake of clarity, only 500 randomly selected edge pixels are shown. Each edge pixel k is shown as an arrow, with the orientation representing the normal vector ${\overrightarrow{m}}_{k}$. The base of each arrow lies along the bearing ${\overrightarrow{v}}_{k}$ belonging to the pixel k. Figure 17a is based on the edge pixels extracted from the camera image in Figure 18a. As in Figure 18, we do not include edge pixels with a gradient intensity below ${I}_{\mathrm{min}}=200$. Figure 17b shows the result of prefiltering (${\alpha}_{\mathrm{max}}=7$°) and RANSAC (${\delta}_{o,\mathrm{max}}=3.5$°). As expected for an untilted robot, the remaining ${\overrightarrow{m}}_{k}$ are mostly orthogonal to the vertical axis.

**Figure 18.** Edge pixels corresponding to vertical elements, as identified through RANSAC. Edge pixels within the largest set ${C}_{\breve{i},\breve{j}}$ are marked in red. The camera images are the same as used in Figure 10, cropped to the area above the horizon. Edge pixels with a gradient intensity below ${I}_{\mathrm{min}}=200$ were discarded and are not shown. After prefiltering with ${\alpha}_{\mathrm{max}}=7$°, we applied RANSAC with ${\delta}_{o,\mathrm{max}}=3.5$°.
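Once a consensus set of normals has been selected, the vanishing-point direction is the unit vector most nearly orthogonal to all of them. One standard way to compute it, sketched here as a least-squares step (the paper's exact consensus computation may differ), is the singular vector of the stacked normals with the smallest singular value:

```python
import numpy as np

def consensus_direction(normals):
    """Estimate the vanishing-point direction from edge-pixel normals.

    Each normal m_k should satisfy m_k . d = 0 for the sought direction d,
    so we minimize sum_k (m_k . d)^2 over unit vectors d. The minimizer is
    the right singular vector with the smallest singular value."""
    M = np.asarray(normals)
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    d = vt[-1]                        # smallest-singular-value direction
    return d if d[2] >= 0 else -d     # resolve sign ambiguity (point upward)
```

For an untilted robot this direction coincides with the z-axis; its deviation from the z-axis yields the tilt parameters $(\alpha, \beta)$.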

**Figure 20.** The wheel layout of our cleaning-robot prototype. The panoramic camera (black circle) is mounted at the center of a ground plate (light gray). A red X marks the ground-contact points for each wheel, on which the robot rests. Relative to the center of the camera, the caster wheel’s (gray circle) contact point is at $(-{r}_{c},0)$. The contact points of the left and right main wheel (gray rectangles) lie at $(0,\pm {r}_{w})$, respectively. Here, we assume that the contact points are fixed and unaffected by tilts.
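Equation (42) itself is not reproduced in this excerpt, but the ground-truth tilt angle follows from the contact-point geometry of Figure 20: raise one contact point by h, fit the plane through the three points, and measure its normal against the vertical. The radii below are illustrative placeholders, not the prototype's measured dimensions:

```python
import numpy as np

# Illustrative contact-point radii in mm (hypothetical, not the prototype's).
R_C, R_W = 139.0, 156.0

def ground_truth_tilt(h_caster=0.0, h_left=0.0, h_right=0.0):
    """Tilt angle alpha_T (degrees) when wheels are raised by the given
    heights, in the spirit of the paper's Equation (42)."""
    pts = np.array([[-R_C, 0.0, h_caster],   # caster wheel contact
                    [0.0,  R_W, h_left],     # left main wheel contact
                    [0.0, -R_W, h_right]])   # right main wheel contact
    n = np.cross(pts[1] - pts[0], pts[2] - pts[0])  # ground-plate normal
    n /= np.linalg.norm(n)
    return np.degrees(np.arccos(abs(n[2])))  # angle between normal and vertical
```

Raising the caster by h gives $\alpha_T = \arctan(h/r_c)$; raising a main wheel tilts the plate about the axis through the two remaining contact points, which lies farther from the raised wheel, hence the smaller main-wheel angles in Table 1.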

**Figure 21.** The fraction of images for which the tilt-estimation error is $\epsilon \le {\epsilon}^{\prime}$. All curves were generated using the parameters from Table 3. For the sake of clarity, this figure was truncated to ${\epsilon}^{\prime}\le 5$. We note that high errors $\epsilon $ can occasionally occur for all methods.

**Figure 22.** The mean tilt-estimation error $\overline{\epsilon}$, depending on the true tilt angle ${\alpha}_{T}$. The pale, dotted or dash-dotted lines represent each method’s $\overline{\epsilon}$ across all images. As in Figure 21, we used the parameters from Table 3. The ground-truth tilt angle ${\alpha}_{T}$ was calculated using Equation (42).

**Figure 23.** The mean tilt-estimation error $\overline{\epsilon}$ for each of the six environments. The methods and parameters used were the same as in Table 3. The pale, dotted or dash-dotted lines represent each method’s $\overline{\epsilon}$ across all images. Although there is some variation, each method’s tilt-estimation results are broadly similar across the different environments.

**Figure 24.** The mean tilt-estimation error $\overline{\epsilon}$, plotted against the mean execution time. The time was measured on the modern desktop system, as described in Section 2.4. Each point represents one parameter combination from Table 2. The points with the lowest $\overline{\epsilon}$ from Table 3 are highlighted in black. As shown here, accepting a slightly higher $\overline{\epsilon}$ can sometimes notably reduce the execution time. We limit this figure to $\overline{\epsilon}\le 3$° and $t\le 10$ ms. This causes a few points to be omitted, but greatly improves legibility.

**Figure 25.** This variant of Figure 24 shows the results for the embedded system described in Section 2.4. Similar to Figure 24, we limit this figure to $\overline{\epsilon}\le 3$° and $t\le 100$ ms.

**Figure 26.** The effect of the tilt angle $\alpha $ on the orientation and bearing error in visual pose estimation. For the planar-motion Min-Warping method [10], the pose-estimation errors (blue lines) increase with the tilt angle. In contrast, the red lines show constant errors for a nonplanar method [14] with local visual features [48]. Gray lines mark the tilt angle beyond which the planar-motion error exceeds the nonplanar one; this occurs at about $\alpha >2$°. This figure is based on results from an earlier work, which contains additional details (Figure 19) [1].

**Figure 27.** Figure 27a shows a forward-tilted image (${\alpha}_{T}=4.15°, {\beta}_{T}=0°$) captured at the location in Figure 19c. As in Figure 10, the prefiltered edge pixels are highlighted in color. Few of these edge pixels correspond to vertical elements, while incorrect edge pixels are common. Similar to Figure 11, we visualize the parameters $({\phi}_{k},{s}_{k})$ of the edge pixels in Figure 27b. A dashed red line shows the expected relationship between $\phi $ and s for edge pixels from vertical elements. Unlike in Figure 11, few of the edge pixels actually lie close to this line. The solid blue line shows the $(\phi ,s)$ curve for the incorrect $(\alpha ,\beta )$ estimated by the reject-refit scheme. While it is a poor tilt estimate, the curve is a better fit for the $({\phi}_{k},{s}_{k})$ in the histogram. Thus, the poor tilt estimate is likely caused by the incorrect edge pixels. The RANSAC-based estimate suffers from a similar error, as illustrated by the dotted blue line.

**Figure 28.** The true and estimated tilt angle for corrected and uncorrected estimates. For each method, we used the parameters (Table 3) that minimize $\overline{\epsilon}$ after correction. This figure was generated using the factors $a={a}_{L}$ and ${a}^{\prime}={a}_{L}^{\prime}$ for each image location L, which we determined according to Section 2.4. For the uncorrected image-space method, we calculated $\alpha $ using Equation (26). A black line represents a perfect match between ${\alpha}_{T}$ and $\alpha $.

**Table 1.** The ground-truth tilt angle ${\alpha}_{T}$ when raising one wheel by a distance h, as calculated from Equation (42). Raising one of the main wheels results in ${\beta}_{T}=\mp 137$° for the left and right wheel, respectively (Equation (43)). ${\beta}_{T}=0$° when raising the caster wheel, which holds up the rear of the robot. Spacer refers to the metal spacers used to capture tilted images. We also included common objects from a domestic environment that may cause the robot to tilt. The ${\alpha}_{T}$ values in this table were calculated for a stationary robot. For a moving robot, driving-related forces or torques may affect the true tilt angle or direction.

Object | h | ${\mathit{\alpha}}_{\mathit{T}}$ (Main Wheel) | ${\mathit{\alpha}}_{\mathit{T}}$ (Caster)
---|---|---|---
Spacer (thin) | 5.0 mm | 1.38° | 2.06°
Spacer (thick) | 10.1 mm | 2.80° | 4.15°
Carpets (various) | 4.4 mm to 7.4 mm | 1.22° to 2.05° | 1.81° to 3.04°
Door threshold | 5.3 mm | 1.47° | 2.18°

**Table 2.** Parameter values evaluated in our experiments. Depending on the method and variant used, each value for ${I}_{\mathrm{min}}$ was tested with all other values for Q, ${\delta}_{s,\mathrm{max}}$, and ${\delta}_{o,\mathrm{max}}$. This results in $6\times 5=30$ combinations for $({I}_{\mathrm{min}},Q)$, $6\times 6=36$ for $({I}_{\mathrm{min}},{\delta}_{s,\mathrm{max}})$, and $6\times 10=60$ for $({I}_{\mathrm{min}},{\delta}_{o,\mathrm{max}})$.

Parameter | Values Tested
---|---
Gradient intensity threshold ${I}_{\mathrm{min}}$ | 100, 150, 200, 250, 300, 600
Rejection fraction Q | 0.5, 0.7, 0.8, 0.9
Random Sample Consensus (RANSAC) threshold ${\delta}_{s,\mathrm{max}}$ | 2.5, 5, 7.5, 10, 12.5, 15 pixels
RANSAC threshold ${\delta}_{o,\mathrm{max}}$ | 0.5°, 1°, 1.5°, 2°, 2.5°, 3°, 3.5°, 4°, 4.5°, 5°

**Table 3.** The mean tilt-estimation error $\overline{\epsilon}$ and standard deviation $\left({\sigma}_{\epsilon}\right)$ for the methods and variants being tested. The last line lists the vector-consensus results achieved when correcting the estimated tilt angle $\alpha $ using the factor ${a}_{L}^{\prime}$ (Section 2.4). For each method, we only list the results for the parameters given in the third column. Out of the possibilities from Table 2, these values gave the lowest $\overline{\epsilon}$. We also included the 50th percentile (median) and 95th percentile of $\epsilon $.

Method | Variant | Parameters | $\overline{\mathit{\epsilon}}$ (${\mathit{\sigma}}_{\mathit{\epsilon}}$) (°) | 50% | 95%
---|---|---|---|---|---
Image Space | RANSAC | ${I}_{\mathrm{min}}=150$, ${\delta}_{s,\mathrm{max}}=10.0$ | 1.17 (1.03) | 0.97 | 3.16
Image Space | reject-refit | ${I}_{\mathrm{min}}=200$, $Q=0.8$ | 0.85 (0.81) | 0.61 | 2.27
Vector Consensus | RANSAC | ${I}_{\mathrm{min}}=600$, ${\delta}_{o,\mathrm{max}}=5.0$° | 1.63 (0.93) | 1.48 | 3.38
Vector Consensus | RANSAC, corrected | ${I}_{\mathrm{min}}=300$, ${\delta}_{o,\mathrm{max}}=2.0$° | 1.03 (0.72) | 0.92 | 2.30

**Table 4.** The time required for tilt estimation, calculated according to Section 2.4. This table lists the mean, standard deviation ${\sigma}_{t}$ and 50th (median) and 95th percentile, in milliseconds. Times are given for the modern desktop CPU, as well as the embedded CPU carried by our robot prototype. Each method used the parameters listed in Table 3, which gave the lowest mean error $\overline{\epsilon}$. Note that the corrected vector-consensus method appears to be much slower than the uncorrected variant. This is not due to the correction step, which consumes little time. Instead, the corrected variant achieves its lowest $\overline{\epsilon}$ for different parameter values (Table 3). However, these values also lead to longer execution times. Compared to the uncorrected variant, the corrected variant actually gives better $\overline{\epsilon}$ in similar time (Figure 24 and Figure 25).

Method | Variant | Desktop Mean (${\mathit{\sigma}}_{\mathit{t}}$) | Desktop 50% | Desktop 95% | Embedded Mean (${\mathit{\sigma}}_{\mathit{t}}$) | Embedded 50% | Embedded 95%
---|---|---|---|---|---|---|---
Image Space | RANSAC | 3.8 (1.43) | 3.7 | 6.3 | 26.0 (6.89) | 25.2 | 41.8
Image Space | reject-refit | 2.5 (0.85) | 2.5 | 4.1 | 15.7 (3.02) | 15.3 | 20.7
Vector Consensus | RANSAC | 3.6 (1.22) | 3.5 | 5.7 | 22.9 (4.69) | 23.2 | 29.4
Vector Consensus | RANSAC, corrected | 6.2 (1.88) | 6.0 | 9.3 | 38.1 (9.72) | 36.9 | 51.8

© 2017 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Fleer, D.
Visual Tilt Estimation for Planar-Motion Methods in Indoor Mobile Robots. *Robotics* **2017**, *6*, 32.
https://doi.org/10.3390/robotics6040032
