Single-Camera Trilateration

Abstract: This paper presents a single-camera trilateration scheme which estimates the instantaneous 3D pose of a regular forward-looking camera from a single image of landmarks at known positions. Derived on the basis of the classical pinhole camera model and principles of perspective geometry, the proposed algorithm estimates the camera position and orientation successively. It provides a convenient self-localization tool for mobile robots and vehicles equipped with onboard cameras. Performance analysis has been conducted through extensive simulations with representative examples, which provides insight into how the input errors and the geometric arrangement of the camera and landmarks affect the performance of the proposed algorithm. The effectiveness of the proposed algorithm has been further verified through an experiment.


Introduction
This paper introduces a single-camera trilateration scheme that determines the instantaneous 3D pose (position and orientation) of a regular forward-looking camera, which can be moving, based on a single image of landmarks taken by the camera. It provides a convenient self-localization tool for mobile robots and other vehicles with onboard cameras.
Self-localization, i.e., how a mobile robot localizes itself in its environment, is a fundamental problem in mobile robotics, and tremendous effort has been made to study this topic. The approach of this paper is inspired by the principle of trilateration which, given the correspondence of external references, is the most widely adopted external-reference-based localization technique for mobile robots.
Trilateration refers to positioning an object based on the measured distances between the object and multiple references at known positions [1,2], i.e., locating an object by solving a system of equations in the form of:

| p 0 − p i | = r i , i = 1, . . . , N, (1)

where p 0 denotes the unknown position of the object, p i the known position of the ith reference point, and r i the measured distance between p 0 and p i . (Even when more than three references are involved in positioning, we still call it "trilateration" instead of "multilateration", because "multilateration" has been used to name the positioning process based on the difference in the distances from three or more references [3].) Equation (1) represents the classical ranging-based trilateration problem, which depends on measuring the distances between the target object and the references. Existing algorithms include closed-form solutions which have low time complexity, such as the algebraic algorithms proposed by Fang [4] and by Ziegert and Mize [5], which refer to the base plane, and that by Manolakis [6].

Compared with our preliminary work, the contributions of the current work are as follows. (1) The proposed single-camera trilateration algorithm is fully developed with a more systematic derivation and well-adopted notations. (2) The algorithm is developed with a high level of completion. In particular, in the preliminary work, the camera orientation estimation algorithm only estimated the directional vector of the camera optical axis; the current work largely enhances the camera orientation estimation by providing a complete estimation of the 3D rotation matrix of the camera frame relative to the environment frame. (3) A comprehensive performance analysis of the proposed single-camera trilateration algorithm is carried out.
In the preliminary work, the performance analysis of the algorithm considered mainly the effect of the landmark position errors in the environment and image; in the current work, a much more extensive performance analysis is carried out with the consideration of not only the effect of the landmark errors but also that of the errors in camera parameters. Moreover, along with the new camera orientation estimation algorithm, the corresponding performance indices are re-defined, and an enhanced performance analysis is carried out. In addition, the performance analysis in the current work is carried out with a much larger amount of simulation data. (4) An experiment is carefully set up and carried out to further verify the effectiveness of the proposed single-camera trilateration algorithm, and check the estimation error under the combined effect of various input errors. Our preliminary work was simulation-based. The experimental work tests the proposed algorithm in the physical environment, which is a significant addition to the simulation-based performance analysis.
The rest of this paper is laid out as follows. Section 2 will present the derivation of the proposed algorithm. Section 3 will present the error analysis of the proposed algorithm. Section 4 will present the experimental verification. Section 5 will summarize this work.

Notation and Assumptions
The proposed single-camera trilateration algorithm is derived on the basis of the classical pinhole camera model ( Figure 1). Here, X g Y g Z g denotes the global frame of reference attached to the environment in which the camera is moving, and a vector defined in X g Y g Z g is labeled with a superscript "g", e.g., g v; X c Y c Z c denotes the camera frame with the origin at the optical center p 0 and Z c axis pointing along the optical axis from p 0 towards the image center q 0 , and a vector defined in X c Y c Z c is labeled with a superscript "c", e.g., c v; UV denotes the image frame with the origin at the upper left corner of the image, and a vector defined in UV is labeled with a superscript "m", e.g., m v; UVW is the corresponding 3D extension of UV. The vectors in X g Y g Z g , X c Y c Z c and UVW are 3-dimensional, while those in UV are 2-dimensional. A position defined in X g Y g Z g or X c Y c Z c is in the unit of millimeters, while that defined in UV or UVW is in the unit of pixels.
In particular, p 0 denotes the optical center of the camera, and its positions defined in X g Y g Z g and X c Y c Z c are denoted by g p 0 and c p 0 , respectively. q 0 denotes the image center (also known as the principal point), i.e., the intersection between the optical axis and the image plane (which are perpendicular to each other), and its position defined in UV is denoted by m q 0 . The distance between p 0 and q 0 is known as the focal length f. Assuming that there are N landmarks captured by the camera (where N ≥ 3 for 3D trilateration according to the principle of trilateration), we denote the reference point of the ith landmark as p i , where i∈{1,2, . . . ,N}, and its positions defined in X g Y g Z g and X c Y c Z c are denoted by g p i and c p i , respectively. Moreover, q i denotes the projection of p i on the image plane, and its positions defined in X c Y c Z c and UV are denoted by c q i and m q i , respectively.
As a necessary input to the proposed single-camera trilateration algorithm, the set of intrinsic parameters of the involved camera can be retrieved from an off-line camera calibration process, a classic topic which has been addressed extensively in computer vision literature such as [59][60][61]. A well-adopted camera calibration toolbox can also be found at [62]. Using the calibration toolbox, we can determine the set of intrinsic camera parameters, including the focal length m f = [f u ,f v ] T , image center m q 0 = [u 0 ,v 0 ] T , and distortion coefficients, as well as their estimation uncertainty.
In particular, f u = fD u s u , f v = fD v , where f is the physical focal length (mm), D u the pixel size in the U direction (pixel/mm), D v the pixel size in the V direction, and s u the scale factor. It is known that f, D u , D v and s u are in general impossible to calibrate separately [60][61][62]. Therefore, instead of using f, D u , D v and s u separately, we use f u and f v in our algorithm.
The pinhole model is established on the basis of the calibrated camera parameters and corrected image. In the following derivation, we assume that the image distortions have been corrected according to the calibrated distortion coefficients. We also assume that at least three landmarks have been captured, segmented, identified and located at m q i in UV. Relevant image processing techniques can be found in existing literature [59,63,64]. Moreover, we assume that the positions of the involved landmarks in the global frame X g Y g Z g , g p i , have been retrieved from a map of known landmarks using existing methods [53][54][55][56][57].
The camera position is defined by the position of its optical center p 0 in X g Y g Z g , g p 0 , and the camera orientation is defined by the rotation matrix of X c Y c Z c with respect to X g Y g Z g , g R c . The proposed algorithm will estimate g p 0 and g R c , which will be discussed in the following two subsections respectively. The following relationships obtained from the pinhole model will be used in the derivation:

g p i = g p 0 + g R c c p i , (2)

c p i = ( c z i / f ) c q i , (3)

c q i = f [ (u i − u 0 )/f u , (v i − v 0 )/f v , 1 ] T , (4)

where u i and v i denote the U and V components of m q i respectively, and c z i denotes the Z c component of c p i .
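To make these pinhole relationships concrete, the following sketch (a hypothetical helper, not part of the proposed algorithm; all numerical values are assumed for illustration) projects a landmark given in the global frame onto the image:

```python
import numpy as np

def project_landmark(g_p_i, g_p0, g_R_c, fu, fv, u0, v0):
    """Project a landmark g_p_i (global frame, mm) onto the image (pixels).

    Transform into the camera frame, divide by the depth along Zc,
    and scale by the focal lengths fu, fv (pixels)."""
    c_p_i = g_R_c.T @ (g_p_i - g_p0)   # camera-frame position of the landmark
    x, y, z = c_p_i
    assert z > 0, "landmark must lie in front of the camera"
    u = u0 + fu * x / z                # U component of m_q_i
    v = v0 + fv * y / z                # V component of m_q_i
    return np.array([u, v])

# Assumed example: camera at the global origin with axes aligned to the
# global frame, so the camera frame equals the global frame.
m_q = project_landmark(np.array([100.0, 0.0, 1000.0]),
                       np.zeros(3), np.eye(3),
                       1600.0, 1600.0, 320.0, 240.0)
# m_q == [480., 240.]
```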

Camera Position Estimation
Assuming that N ≥ 3 landmarks are chosen for localizing the camera, we have a system of N independent trilateration equations in X g Y g Z g in the form of:

| g p 0 − g p i | = r i , i = 1, . . . , N, (5)

which is in fact Equation (1) defined in X g Y g Z g . In principle, given g p i and r i , one can estimate g p 0 by solving Equation (5). Representing the standard ranging-based trilateration problem, Equation (5) can be solved using one of the existing closed-form or numerical algorithms in the literature [2,[4][5][6][7][8][9][10][11][12][13][14][15][16]19].
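As an illustration of this ranging-based step, the sketch below implements one common closed-form approach, the difference-of-squares linearization, which is not necessarily any of the specific algorithms cited above. It uses four assumed non-coplanar references so that the linear system has a unique solution (with only three references, two mirror solutions about the landmark plane exist):

```python
import numpy as np

def trilaterate(ref_pts, r):
    """Solve ||p0 - p_i|| = r_i by subtracting the first equation from the
    others, which yields a linear system in p0 (needs >= 4 non-coplanar
    references for a unique 3D solution)."""
    p1, rest = ref_pts[0], ref_pts[1:]
    A = 2.0 * (rest - p1)                      # rows: 2 (p_i - p_1)
    b = (r[0]**2 - r[1:]**2
         + np.sum(rest**2, axis=1) - np.sum(p1**2))
    p0, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p0

# Assumed example geometry (not from the paper): four references.
refs = np.array([[0.0, 0.0, 0.0], [1000.0, 0.0, 0.0],
                 [0.0, 1000.0, 0.0], [0.0, 0.0, 1000.0]])
true_p0 = np.array([300.0, -200.0, 500.0])
ranges = np.linalg.norm(refs - true_p0, axis=1)
est_p0 = trilaterate(refs, ranges)             # recovers true_p0
```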
Here, g p i are known as the pre-mapped landmark positions, but r i are yet to be determined. In order to determine r i , we can write down a system of totally N(N − 1)/2 equations as:

r i ² + r j ² − 2 r i r j cos θ ij = r ij ² , 1 ≤ i < j ≤ N, (6)

each of which is defined in a triangle ∆p 0 p i p j in X g Y g Z g according to the law of cosines. Here, θ ij denotes the vertex angle associated with p 0 in ∆p 0 p i p j (which is also known as the visual angle that p i p j subtends at p 0 ), and r ij the distance between p i and p j . In principle, given r ij and cosθ ij , one can estimate r i by solving Equation (6). At least two general schemes can be adopted to obtain r i :

(1) Solving N equations for N variables: first, choose a subset of N independent equations from Equation (6) and reduce them to a single polynomial equation in r 1 , i.e., Equation (7); then, solve Equation (7) for r 1 , and substitute r 1 into the chosen N equations to solve for {r i |i = 2, . . . ,N} iteratively. When N = 3, (7) is a 4-degree polynomial equation which can be solved in the closed form. When N > 3, (7) is a polynomial equation of degree 8 or greater. Although it has been proven (by Niels Henrik Abel in 1824) that there cannot be any closed-form formula for such a polynomial equation, a number of numerical algorithms are available, such as Laguerre's method [65], the Jenkins-Traub method [66], the Durand-Kerner method [67], the Aberth method [68], the splitting-circle method [69] and the Dandelin-Gräffe method [70]. The roots of a polynomial can also be found as the eigenvalues of its companion matrix [71][72][73].

(2) Solving an associated optimization problem: a nonlinear least-squares problem can be formulated to estimate the optimal r i that minimizes the residue of Equation (6), i.e.,

r̂ = argmin over r of Σ 1≤i<j≤N ( r i ² + r j ² − 2 r i r j cos θ ij − r ij ² )², (8)

where r denotes the vector of {r i |i = 1, . . . ,N}.
Search-based optimization algorithms are commonly used to solve this category of problems, including both local optimization algorithms, e.g., the steepest descent method and the Newton-Raphson method (also known as Newton's method), and global optimization algorithms, e.g., simulated annealing and genetic algorithms [74][75][76][77][78].
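A minimal Newton-Raphson sketch for the N = 3 case of Equation (6) is given below; the landmark geometry, initial guess and tolerances are assumptions for illustration, and cosθ ij is assumed to be already available (its computation from the image is addressed next):

```python
import numpy as np

def solve_ranges(r_pair, cos_theta, r_init, iters=50):
    """Newton-Raphson on the N = 3 law-of-cosines system
    r_i^2 + r_j^2 - 2 r_i r_j cos(theta_ij) = r_ij^2 for the pairs
    (0,1), (0,2), (1,2); r_pair and cos_theta follow that pair order."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    r = r_init.astype(float).copy()
    for _ in range(iters):
        F = np.array([r[i]**2 + r[j]**2 - 2*r[i]*r[j]*c - d**2
                      for (i, j), c, d in zip(pairs, cos_theta, r_pair)])
        J = np.zeros((3, 3))                   # analytic Jacobian
        for k, ((i, j), c) in enumerate(zip(pairs, cos_theta)):
            J[k, i] = 2*r[i] - 2*r[j]*c
            J[k, j] = 2*r[j] - 2*r[i]*c
        step = np.linalg.solve(J, F)
        r -= step
        if np.max(np.abs(step)) < 1e-10:
            break
    return r

# Assumed test geometry: camera 3000 units above an equilateral triangle
# of landmarks inscribed in a circle of radius 500 on the XY plane.
p0 = np.array([0.0, 0.0, 3000.0])
P = np.array([[500.0, 0.0, 0.0],
              [-250.0, 433.0127, 0.0],
              [-250.0, -433.0127, 0.0]])
true_r = np.linalg.norm(P - p0, axis=1)
pairs = [(0, 1), (0, 2), (1, 2)]
r_pair = np.array([np.linalg.norm(P[i] - P[j]) for i, j in pairs])
cos_theta = np.array([np.dot(P[i] - p0, P[j] - p0) /
                      (true_r[i] * true_r[j]) for i, j in pairs])
est_r = solve_ranges(r_pair, cos_theta, np.full(3, 2500.0))
```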
To solve Equations (6)-(8), we need to know r ij and cosθ ij . Here, r ij = √( ( g p i − g p j ) T ( g p i − g p j ) ), but cosθ ij are yet to be determined.
In order to determine cosθ ij , we have the following equation in the triangle ∆p 0 q i q j in X c Y c Z c according to the law of cosines:

ρ ij ² = ρ i ² + ρ j ² − 2 ρ i ρ j cos θ ij , (9)

where ρ i = √( c q i T c q i ) denotes the distance between p 0 and q i , and ρ ij = √( ( c q i − c q j ) T ( c q i − c q j ) ) denotes the distance between q i and q j . Substituting Equation (4) into Equation (9), we obtain:

cos θ ij = a i T a j / ( |a i | |a j | ), (10)

where a i = [ (u i − u 0 )/f u , (v i − v 0 )/f v , 1 ] T . Since m f = [f u , f v ] T and m q 0 = [u 0 , v 0 ] T are determined through camera calibration and m q i are obtained by segmenting the visual landmarks from the image, cosθ ij can be calculated from Equation (10) directly.
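The computation of cosθ ij from pixel coordinates and intrinsic parameters can be sketched as follows; the intrinsic values and point positions are assumptions, and the example cross-checks the pixel-based value against the true visual angle of the underlying 3D points:

```python
import numpy as np

def cos_visual_angle(q_i, q_j, fu, fv, u0, v0):
    """cos(theta_ij) of the visual angle subtended at the optical center,
    computed purely from pixel coordinates and intrinsic parameters."""
    a_i = np.array([(q_i[0] - u0)/fu, (q_i[1] - v0)/fv, 1.0])  # ray i
    a_j = np.array([(q_j[0] - u0)/fu, (q_j[1] - v0)/fv, 1.0])  # ray j
    return a_i @ a_j / (np.linalg.norm(a_i) * np.linalg.norm(a_j))

# Consistency check with assumed intrinsics: project two camera-frame
# points and compare the pixel-based value with the true 3D angle.
fu, fv, u0, v0 = 1600.0, 1600.0, 320.0, 240.0
c_p1 = np.array([100.0, 0.0, 1000.0])
c_p2 = np.array([0.0, 200.0, 1500.0])
q1 = np.array([u0 + fu*c_p1[0]/c_p1[2], v0 + fv*c_p1[1]/c_p1[2]])
q2 = np.array([u0 + fu*c_p2[0]/c_p2[2], v0 + fv*c_p2[1]/c_p2[2]])
cos_pix = cos_visual_angle(q1, q2, fu, fv, u0, v0)
cos_3d = c_p1 @ c_p2 / (np.linalg.norm(c_p1) * np.linalg.norm(c_p2))
# cos_pix and cos_3d agree, since a_i is parallel to c_p_i
```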
Based on the above derivation, it is clear that the camera can be positioned by reversing the above steps. That is, first calculate cosθ ij from Equation (10), next estimate r i from Equation (6), and then estimate g p 0 from Equation (5).

Camera Orientation Estimation
Knowing g p 0 and the set of g p i , we obtain from Equation (2):

g ∆P = g R c c P, (11)

where g ∆P is a 3 × N matrix with the ith column as g ∆p i = g p i − g p 0 , and c P is a 3 × N matrix with the ith column as c p i . Here, c p i can be determined by substituting Equation (4) into Equation (3):

c p i = c z i [ (u i − u 0 )/f u , (v i − v 0 )/f v , 1 ] T , (12)

where c z i can be calculated as

c z i = r i / √( ((u i − u 0 )/f u )² + ((v i − v 0 )/f v )² + 1 ), (13)

since the distance between p 0 and p i is r i . Given g ∆P and c P, g R c can in principle be calculated from Equation (11) as

g R c = g ∆P c P T ( c P c P T ) −1 . (14)

However, due to input errors, such as those in reference positioning, camera calibration and image segmentation, the resulting g R c may not strictly be a rotation matrix. In particular, the orthonormality of the matrix may not be guaranteed. Thus, instead of using Equation (14), we go through the following process to obtain a valid rotation matrix.
We define an optimal approximation of g R c as the one that minimizes the residue of Equation (11), i.e.,

g R c = argmin Tr( ( g ∆P − g R c c P ) T ( g ∆P − g R c c P ) ), (15)

where Tr(M) denotes the trace of the matrix M. Since Tr(M 1 M 2 ) = Tr(M 2 M 1 ) for two matrices M 1 and M 2 , minimizing Equation (15) is equivalent to maximizing

Tr( g R c c P g ∆P T ) = Tr( g R c A D B T ) = Tr( B T g R c A D ), (16)

where ADB T is the singular value decomposition of the 3 × 3 matrix c P g ∆P T , in which A and B are orthogonal matrices and D is a diagonal matrix with non-negative real numbers on the diagonal. Since B T g R c A is an orthogonal matrix and D is a non-negative diagonal matrix, we must have:

Tr( B T g R c A D ) ≤ Tr( D ), (17)

where the equality holds, i.e., Equation (16) is maximized, when

B T g R c A = I. (18)
Then, we obtain from Equation (18):

g R c = B A T , (19)

which is guaranteed to be an orthogonal matrix. Based on the above derivation, it is clear that the camera orientation can be determined by first calculating c p i from Equation (12), next conducting the singular value decomposition of the matrix c P g ∆P T , and then getting g R c from Equation (19).
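The orientation step, back-projecting c p i and fitting g R c by singular value decomposition, can be sketched as follows; the landmark layout and camera pose are assumptions, and the reflection guard (a standard refinement for the degenerate det = −1 case, not discussed in the text) is an addition:

```python
import numpy as np

def estimate_orientation(g_P, g_p0, m_q, r, fu, fv, u0, v0):
    """Estimate g_R_c from known landmark positions g_P (N x 3), the camera
    position g_p0, pixel positions m_q (N x 2), and ranges r (from the
    position-estimation step), via the SVD-based orthogonal fit."""
    a = np.column_stack([(m_q[:, 0] - u0)/fu, (m_q[:, 1] - v0)/fv,
                         np.ones(len(m_q))])   # back-projected ray directions
    c_z = r / np.linalg.norm(a, axis=1)        # depth along Zc, Eq.-(13) style
    cP = (a * c_z[:, None]).T                  # 3 x N camera-frame points
    g_dP = (g_P - g_p0).T                      # 3 x N global offsets
    U, _, Vh = np.linalg.svd(cP @ g_dP.T)      # cP gdP^T = A D B^T
    R = Vh.T @ U.T                             # g_R_c = B A^T
    if np.linalg.det(R) < 0:                   # guard against a reflection
        Vh[-1] *= -1.0
        R = Vh.T @ U.T
    return R

# Assumed example: camera straight above the landmarks, looking down,
# using the intrinsic values reported later in the performance analysis.
fu, fv, u0, v0 = 1627.5609, 1629.9348, 333.9088, 246.3799
g_P = np.array([[500.0, 0.0, 0.0],
                [-250.0, 433.0127, 0.0],
                [-250.0, -433.0127, 0.0]])
g_p0 = np.array([0.0, 0.0, 3000.0])
R_true = np.diag([1.0, -1.0, -1.0])            # Zc points along -Zg
c_P = (R_true.T @ (g_P - g_p0).T).T
m_q = np.column_stack([u0 + fu*c_P[:, 0]/c_P[:, 2],
                       v0 + fv*c_P[:, 1]/c_P[:, 2]])
r = np.linalg.norm(g_P - g_p0, axis=1)
R_est = estimate_orientation(g_P, g_p0, m_q, r, fu, fv, u0, v0)
```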

Algorithm Summary
Following the discussions in the above two subsections, we summarize the proposed single-camera trilateration algorithm as Algorithm 1. It estimates the instantaneous 3D pose of a camera, which can be moving, from a single image of landmarks at known positions. Strictly speaking, Algorithm 1 provides a general framework. In particular, Equation (6) in Step (2) and Equation (5) in Step (3) can be solved by a variety of algorithms specific to those steps, as we pointed out.

Algorithm 1: Single-Camera Trilateration in P 3
Input: The camera parameters m q 0 (image center) and m f (focal length), the positions of a set of N ≥ 3 landmarks in X g Y g Z g , { g p i |i∈Z + , 1 ≤ i ≤ N}, and their corresponding projections in UV, { m q i |i∈Z + , 1 ≤ i ≤ N}.
Output: The position g p 0 and orientation g R c of the camera in X g Y g Z g .
(1) Calculate cosθ ij from Equation (10).
(2) Estimate r i from Equation (6).
(3) Estimate g p 0 from Equation (5).
(4) Calculate c p i from Equation (12).
(5) Conduct the singular value decomposition of c P g ∆P T .
(6) Obtain g R c from Equation (19).

Performance Analysis
The input to the proposed single-camera trilateration algorithm consists of the global positions of the involved landmarks g p i , their positions in the image m q i , the camera focal length m f and image center m q 0 . In practice, the uncertainty in camera calibration and inaccuracy of image segmentation cause errors in m f, m q 0 and m q i , and imperfect mapping brings errors to g p i . In this section, we analyze the effect of these input errors on the accuracy of the estimation output of the camera position g p 0 and orientation g R c .

Performance Indices
We define an input vector x which contains all the above input parameters, i.e., x = [ g p i T , m q i T , m f T , m q 0 T ] T . Then g p 0 is a function of x, i.e., g p 0 = g p 0 (x). Denoting the actual value and random error of x as x̄ and δx respectively, we have x = x̄ + δx. Correspondingly, the actual value and output estimate of g p 0 can be written as g p 0 (x̄) and g p 0 (x) = g p 0 (x̄ + δx), respectively, and the estimation error of g p 0 is

δ g p 0 = g p 0 (x̄ + δx) − g p 0 (x̄).

We denote the errors of the input quantities g p i , m q i , m f and m q 0 as δ g p i , δ m q i , δ m f and δ m q 0 respectively. Adopting the classical scheme of error analysis used in [2,6], we evaluate the effects of δ g p i , δ m q i , δ m f and δ m q 0 on δ g p 0 , respectively. For any of these input error vectors (generally denoted as δv), we assume that the vector components are zero-mean random variables, uncorrelated with one another, with a common standard deviation σ v . We adopt two well-accepted performance indices [2,6], the normalized total bias B p , which represents the sensitivity of the systematic position estimation error to the input error, and the normalized total standard deviation error S p , which represents the sensitivity of the position estimation uncertainty to the input error:

B p = |E(δ g p 0 )| / σ v , (20)

S p = √( Tr( Cov(δ g p 0 ) ) ) / σ v , (21)

where |a| denotes the norm of a vector a, Tr(M) denotes the trace of a matrix M, E(·) denotes the mean of a random variable, and Cov(·) denotes the covariance matrix of a random vector, whose trace is the sum of the component variances var(·). Similar to g p 0 , g R c is also a function of x, i.e., g R c = g R c (x). To make the presentation consistent, we discuss the orientation estimation error in the vector form, and correspondingly define an error metric of g R c as:

δ g θ c = [ δθ 1 , δθ 2 , δθ 3 ] T , (22)

where δθ i = cos −1 ( n i (x̄) T n i (x̄ + δx) ), in the unit of degrees. Here n i denotes the ith column of g R c , which corresponds to the x, y or z directional vector of the camera frame with respect to the global frame. As a result, δ g θ c shows the angular difference between the actual and erroneous camera orientations.
Similar to Equations (20) and (21), we define the corresponding performance indices, the normalized total bias B θ and the normalized total standard deviation error S θ , as:

B θ = |E(δ g θ c )| / σ v , (23)

S θ = √( Tr( Cov(δ g θ c ) ) ) / σ v , (24)

where B θ represents the sensitivity of the systematic orientation estimation error to the input error, and S θ represents the sensitivity of the orientation estimation uncertainty to the input error. In the following performance analysis, we will report the variation of B p , S p , B θ and S θ across the simulated spaces of g p 0 under the representative σ v values of δ g p i , δ m q i , δ m f and δ m q 0 .
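The two position indices can be estimated from Monte Carlo samples as sketched below; the output-error samples here are synthetic stand-ins with an assumed bias and noise level, not outputs of the trilateration algorithm:

```python
import numpy as np

def normalized_indices(delta_out, sigma_v):
    """Normalized total bias and normalized total standard deviation error
    of an output-error sample set delta_out (n_samples x dim)."""
    bias = np.linalg.norm(delta_out.mean(axis=0))      # |E(delta)|
    total_std = np.sqrt(delta_out.var(axis=0).sum())   # sqrt(Tr(Cov))
    return bias / sigma_v, total_std / sigma_v

# Synthetic demonstration: a known bias plus i.i.d. Gaussian noise.
rng = np.random.default_rng(0)
sigma_v = 25.0
true_bias = np.array([2.0, -1.0, 0.5])
samples = true_bias + rng.normal(0.0, 4.0, size=(100000, 3))
B_p, S_p = normalized_indices(samples, sigma_v)
# B_p is near |true_bias|/sigma_v (about 0.092);
# S_p is near sqrt(3)*4/sigma_v (about 0.277)
```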

Simulation Settings
A performance analysis has been conducted on the proposed single-camera trilateration algorithm with representative examples, simulated in Matlab, in which a moving camera is localized in P 3 based on three landmarks.
In the simulated global frame of reference, the landmarks (reference points) are placed on the XY plane such that they form an equilateral triangle inscribed in a circle centered at the origin of the global frame with a radius of r = 500 length units ( Figure 2).
Appl. Sci. 2019, 9, x FOR PEER REVIEW

Figure 3. B p (δ g p i ), S p (δ g p i ), B θ (δ g p i ) and S θ (δ g p i ) with σ p = 25 and R = 3000. (a) Surface plot and contour plot of B p (δ g p i ); (b) Surface plot and contour plot of S p (δ g p i ); (c) Surface plot and contour plot of B θ (δ g p i ); (d) Surface plot and contour plot of S θ (δ g p i ).
To study the effect of the geometric arrangement of the camera relative to the landmarks on the performance indices, we let the camera move on a data acquisition sphere S centered at the origin of the global frame with a radius of R, and keep the optical axis of the camera pointing towards the origin. We define a tilt angle θ as the angle between Z g and −n 3 , and a pan angle φ as the angle between X g and the orthogonal projection of −n 3 on the XY plane ( Figure 2). To study the effect of the observation distance on the performance indices, we test the algorithm with different values of R. To keep the inscribing circle of the landmarks in the camera field of view all the time, we use R ≥ 3000. This means that the required maximal angle of view is about 19° (corresponding to R = 3000 and θ = 0°), which is a mild requirement met by a large number of lenses. For a specific R, by changing the tilt angle θ and pan angle φ, we move the camera on the sphere, and check the variation of the performance indices on the sphere.
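The placement of the camera on the data acquisition sphere can be sketched as follows; since the text leaves the roll of the camera about its optical axis unspecified, the choice of the X c direction below is an assumption:

```python
import numpy as np

def camera_pose_on_sphere(R_sphere, theta, phi):
    """Camera position on the data acquisition sphere S and an orientation
    whose optical axis (n3, the Zc column of g_R_c) points at the origin.
    theta is the tilt angle between Zg and -n3; phi is the pan angle of the
    projection of -n3 on the XY plane from Xg (both in radians)."""
    g_p0 = R_sphere * np.array([np.sin(theta) * np.cos(phi),
                                np.sin(theta) * np.sin(phi),
                                np.cos(theta)])
    n3 = -g_p0 / np.linalg.norm(g_p0)      # optical axis toward the origin
    helper = np.array([0.0, 0.0, 1.0])     # reference for completing a
    if abs(n3 @ helper) > 0.99:            # right-handed frame; switch it
        helper = np.array([1.0, 0.0, 0.0]) # near the degenerate theta = 0
    n1 = np.cross(helper, n3)
    n1 /= np.linalg.norm(n1)
    n2 = np.cross(n3, n1)
    return g_p0, np.column_stack([n1, n2, n3])

g_p0, g_R_c = camera_pose_on_sphere(3000.0, np.radians(45.0),
                                    np.radians(30.0))
```

Sweeping theta and phi over a grid then yields the camera poses at which the performance indices are evaluated.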
The intrinsic camera parameters used in the simulation are obtained by calibrating a real camera (a MatrixVision BlueFox-120 640 × 480 CCD (Charge-Coupled Device) grayscale camera with a Kowa 12 mm lens) using the camera calibration toolbox in [62] and a printed black/white checkerboard pattern. In particular, m f = [1627.5609, 1629.9348] T and m q 0 = [333.9088, 246.3799] T .
The standard deviations of input errors are chosen mainly according to the calibration results of the above camera and lens. Specifically, using the camera calibration toolbox in [62] to calibrate the camera and lens, we obtained the estimation error for m f as [1.1486, 1.1069]. Since the estimation errors are approximately three times the standard deviation of δ m f [62], a representative standard deviation of the δ m f components can be taken as σ f = 0.4. Meanwhile, the estimation error for m q 0 was obtained as [1.9374, 1.5782], which means that a representative standard deviation of the δ m q 0 components can be taken as σ c = 0.6. In addition, we take σ p = 25 as a representative standard deviation of the δ g p i components, which corresponds to 5% of the base radius r. Besides, σ q = 0.5 is taken as a representative standard deviation of the δ m q i components based on a conservative estimation of the average image segmentation operation.
For performance analysis, we check the effect of δ g p i , δ m q i , δ m f and δ m q 0 separately on δ g p 0 and δ g θ c . For each of these input error vectors, we assume that the vector components are zero-mean random variables following the corresponding Gaussian distribution and uncorrelated with one another with the same standard deviation; we discretize the data acquisition sphere S with a set of equally spaced θ and φ; for each point (θ, φ) on S, 10,000 samples of the erroneous input are generated according to the random distribution; the corresponding erroneous output estimates are obtained using the proposed single-camera trilateration algorithm; by comparing the erroneous estimates with the correct camera pose corresponding to (θ, φ) on S, the statistics of the output error are generated, and then the performance indices are calculated.
In order to optimally estimate g p 0 and r i , we carry out the steps (2) and (3) of Algorithm 1 by solving the corresponding least-squares problems of Equations (5) and (6) respectively. The Newton-Raphson method [74] is used to search for the solutions, which has a sufficient rate of convergence. To avoid being trapped in local minima, for continuous camera position estimation, the previous camera position is used as the initial guess to estimate the current camera position, which guarantees the convergence of the solution to the actual camera position.

Trends of Performance Indices under Representative Input Error Standard Deviations
To evaluate the impact of δ g p i , δ m q i , δ m f and δ m q 0 on δ g p 0 and δ g θ c , B p , S p , B θ and S θ are first estimated with representative input error standard deviations across the data acquisition sphere S with R = 3000. Specifically, B p (δ g p i ), S p (δ g p i ), B θ (δ g p i ) and S θ (δ g p i ) are estimated with σ p = 25 ( Figure 3); B p (δ m f), S p (δ m f), B θ (δ m f) and S θ (δ m f) are estimated with σ f = 0.4 ( Figure 4); B p (δ m q 0 ), S p (δ m q 0 ), B θ (δ m q 0 ) and S θ (δ m q 0 ) are estimated with σ c = 0.6 ( Figure 5); and B p (δ m q i ), S p (δ m q i ), B θ (δ m q i ) and S θ (δ m q i ) are estimated with σ q = 0.5 ( Figure 6). Since S is vertically symmetrical to the base plane (XY plane) and horizontally symmetrical to the vertical planes containing the medians of the equilateral base triangle, we only need to display the variation of B p , S p , B θ and S θ across a patch of S constrained by 0° ≤ θ ≤ 90° and 0° ≤ φ ≤ 60°.

For the effect of the landmark position error δ g p i :

(1) As shown in Figure 3a, B p (δ g p i ) in general decreases as θ increases from 0° towards 90°, although its maximum may not take place exactly at θ = 0°. However, it forms a ridge when θ approaches 90°, which is lower than the crest near θ = 0°. When σ p = 25, the ratio between the two peaks is about 6.73.
(2) As shown in Figure 3b-d respectively, S p (δ g p i ), B θ (δ g p i ) and S θ (δ g p i ) in general decrease as θ increases from 0° to 90°, although their maxima may not be exactly at θ = 0°.
For the effect of the imaging-related errors δ m q i , δ m f and δ m q 0 : (1) As shown in Figures 4a and 5a, as θ increases from 0°, B p (δ m f) and B p (δ m q 0 ) increase at first to form a high peak, and then decrease generally as θ increases further towards 90°. However, they reach a second, local peak when θ approaches 90°, which is much lower than the first peak. Compared with the first peak, the second peak is far less significant: when σ f = 0.4, the ratio between the peaks of B p (δ m f) is about 1095.84; when σ c = 0.6, the ratio between the peaks of B p (δ m q 0 ) is about 1525.50. (2) As shown in Figures 4b-d, 5b-d and 6, as θ increases from 0°, B p (δ m q i ), S p (δ m f), S p (δ m q 0 ), S p (δ m q i ), B θ (δ m f), B θ (δ m q 0 ), B θ (δ m q i ), S θ (δ m f), S θ (δ m q 0 ) and S θ (δ m q i ) all increase at first to form a high peak, and then decrease as θ increases further towards 90°.
The general decreasing trend of these performance indices towards θ = 90 • implies that the camera pose estimation is less sensitive to the input errors as the observations of landmarks are made more parallel and closer to the base plane. Moreover, the non-circular patterns from the contour plots in Figures 3-6 reflect the influence of the geometric arrangement of the landmarks.

Variation of Performance Indices under Variation of Input Error Standard Deviations
To examine the variation of B p , S p , B θ and S θ under the variation of the input error standard deviations, we compare the values of these performance indices obtained with a set of different values of the input error standard deviations on the data acquisition sphere S with R = 3000. Specifically, B p (δ g p i ), S p (δ g p i ), B θ (δ g p i ) and S θ (δ g p i ) are estimated with σ p = {25, 50, 75} (Figure 7), and the indices associated with δ m f, δ m q 0 and δ m q i are estimated with correspondingly varied values of σ f , σ c and σ q (Figures 8-10). For the convenience of visualization, we compute the mean of each performance index across φ at each θ, plot the curve of variation for this mean value with respect to θ, and include the standard deviation across φ as the vertical error bar at each θ. Each vertical error bar has a length equal to the standard deviation both above and below the specific mean value at a specific θ. Moreover, to accommodate the large span between the maxima and minima in these performance indices, we plot the data with a logarithmic scale for the vertical axis.

For the effect of δ g p i , the simulation results show that, as σ p increases, (1) B p (δ g p i ), S p (δ g p i ) and S θ (δ g p i ) have lower peaks, resulting in flatter variation across S (Figure 7a,b,d); (2) B θ (δ g p i ) decreases across S (Figure 7c).
For the effect of δ m q i , δ m f and δ m q 0 , the simulation results show that, as σ q , σ f and σ c increase, (1) B p (δ m f), B p (δ m q 0 ), B p (δ m q i ), B θ (δ m f), B θ (δ m q 0 ) and B θ (δ m q i ) tend to have lower crests and decrease across the rest of S (Figures 8, 9 and 10a,c); (2) S p (δ m f), S p (δ m q 0 ), S p (δ m q i ), S θ (δ m f), S θ (δ m q 0 ) and S θ (δ m q i ) tend to have lower crests and are essentially independent of the variation of σ f , σ c and σ q for the rest of S (Figures 8, 9 and 10b,d).
The decrease in these performance indices across the whole or portion of S, corresponding to the increase in the input error standard deviations, means that the proposed single-camera trilateration algorithm has a reduced sensitivity to higher input uncertainty under the corresponding geometric arrangements of the camera and landmarks.

Variation of Performance Indices under Variation of Observation Distance

To examine the variation of Bp, Sp, Bθ and Sθ under the variation of R, we compare the values of these performance indices obtained with representative input error standard deviations on data acquisition spheres with different radii R = {3000, 4000, 5000} (Figures 11-14). For the convenience of comparison, we adopt the same scheme of data plotting as in Figures 7-10.

For the effect of δgpi, the simulation results show that, as R increases, (1) Bp(δgpi) and Sp(δgpi) increase across S (Figure 11a,b); (2) Bθ(δgpi) and Sθ(δgpi) change very little and can be considered essentially independent of the variation of R (Figure 11c,d). This means that a longer observation distance tends to increase the sensitivity of the positioning bias and uncertainty to the landmark position error, while it has little impact on the corresponding estimation bias and uncertainty of the camera orientation.

For the effect of δmqi, δmf and δmq0, the simulation results (Figures 12-14) show that a longer observation distance tends to increase the sensitivity of the positioning bias and uncertainty to the focal length error, and the sensitivity of the positioning and orienting bias and uncertainty to the landmark segmentation error, while, except for causing the peaks to shift, it has little impact on the orienting bias and uncertainty due to the focal length error, and on the positioning and orienting bias and uncertainty due to the image center error.

Error-Prone Regions in Performance Indices

Figures 4-6, 8-10 and 12-14 indicate that a sharp high peak consistently appears in Bp, Sp, Bθ and Sθ of δmf, δmqi and δmq0. Each performance index rises to a high peak as θ increases from 0°, and in general decreases as θ further increases towards 90°. (Though Bp(δmf) and Bp(δmq0) have a second peak later on, it is not as significant as the first.) The high peak stands out sharply from its neighborhood. In contrast, the crests and transitions in Figures 3, 7 and 11, which show the effect of landmark position errors on the performance indices, are relatively mild.

This high peak is of particular interest. It reflects two points: (1) the camera localization algorithm is more sensitive to the errors from the calibration of the camera parameters and the segmentation of the landmarks from the image than to the errors in positioning the landmarks in the environment; (2) the neighborhood of a high peak represents a range of camera orientations (relative to the base plane) in which the camera localization is highly sensitive to the errors in camera parameters and image segmentation, resulting in high estimation bias and uncertainty. Thus, improving the accuracy of camera calibration and image segmentation will help to improve the accuracy of camera localization.

Moreover, knowing the values of θ corresponding to the peaks will help to avoid those error-prone regions in practice. The simulation results reveal that: (1) given a specific geometric arrangement of the landmarks and a specific observation distance, the peaks of all these performance indices appear at the same θ value, as shown in Figures 8-10; (2) the θ value at which the peaks appear shifts as the observation distance changes. This means that the θ value corresponding to the peaks is mostly determined by the geometric arrangement of the landmarks and camera.
Although a closed-form estimation of the peak θ value is not available yet, a close approximation can be found through simulations. Specifically, in our example where the three reference points define an equilateral triangle, the simulation results show that the peaks occur when the camera is located above the inscribed circle of the base triangle. That is, the θ value corresponding to the peaks is well approximated by Equation (26), in which r denotes the base radius. This relationship has been verified by comparing the numerically estimated peak locations in these performance indices with the θ calculated from Equation (26) (Table 1). Since the θ value at which the peaks appear is consistent across Bp, Sp, Bθ and Sθ of δmf, δmqi and δmq0, only Bp(δmf) is reported here as a representative. Table 1 shows a high consistency between the numerically estimated θ and that calculated from Equation (26). Since the high peaks in these performance indices mean high bias and uncertainty in camera pose estimation, tracking the θ at which the observation is made can alert the localization process to the error-prone regions in practice.
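As an illustration of the geometric condition above (not a reproduction of Equation (26), whose exact form is not shown in this excerpt): if θ is taken as the angle from the vertical axis of the data-acquisition sphere of radius R, and the inscribed circle of an equilateral triangle with circumradius r has radius r/2, then the camera stands above the inscribed circle when R sin θ = r/2, giving the following sketch:

```python
import math

def peak_tilt_angle_deg(r, R):
    """Tilt angle theta (degrees, measured here from the sphere's vertical
    axis -- an assumption of this sketch) at which a camera on a sphere of
    radius R stands directly above the inscribed circle (radius r/2) of an
    equilateral base triangle with circumradius r."""
    return math.degrees(math.asin(r / (2.0 * R)))
```

For example, with r = 1000 and R = 3000 this gives θ ≈ 9.6°; the paper's Equation (26) should be consulted for the exact form.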
The above discussion shows that the singular cases represented by the sharp high peaks in the performance analysis plots are related both to the spatial relationship between the camera and landmarks and to the use of camera imaging as the input. Equation (26) provides an effective tool for predicting potentially singular observation positions. Ultimately, a closed-form representation of the performance indices would enable an analytical treatment of the singular cases. This will be explored in our future work.

Experimental Test
An experiment was also carried out to further verify the effectiveness of the proposed single-camera trilateration algorithm, and check the estimation error under the combined effect of various input errors.
A landmark pattern was designed as shown in Figure 15a. It consists of a 679 mm × 679 mm square checkerboard pattern and a circle of 679 mm diameter co-centered with the checkerboard. Twelve landmark dots are equally spaced on the circle, as shown in Figure 15b. The pattern was hung from the ceiling of an indoor experimental space with the base plane parallel to the flat floor. We used the base plane as the XY plane of the global frame of reference, with the Z axis pointing downwards and the center of the pattern as the origin of the frame. The camera, which was calibrated to provide the realistic values of the camera parameters for the above simulations, was installed on a Directed Perception PTU-D46-17 controllable pan-tilt unit on top of a Pioneer 3-DX mobile robot (Figure 16), such that the camera was kept on a plane parallel to the base plane at a distance of 1600 mm. We moved the camera (by moving the mobile robot and adjusting the pan-tilt unit) to a number of locations at different observation distances R (defined in Section 3.2), one location per distance. At each location (each R), we adjusted the tilt angle θ and pan angle φ (defined in Section 3.2) of the camera using the pan-tilt unit such that the center of the landmark pattern fell on the image center of the camera, thus keeping the camera oriented towards the landmark pattern. At each location, an image of the landmark pattern was taken by the camera and stored on the onboard laptop computer.
The collected landmark images were processed after the data acquisition process. Since our focus is on evaluating the proposed single-camera trilateration algorithm itself, we segmented the landmarks from each image manually after correcting the image distortion. Because the camera optical center is not a physical point on the camera, it is highly difficult, if not impossible, to directly measure the position of the camera optical center and the orientation of the camera optical axis. Thus, it is highly difficult to evaluate the localization accuracy of the proposed algorithm against the ground truth of the camera poses in the global frame. Instead, we chose to compare the camera localization results from the proposed single-camera trilateration algorithm with those obtained from camera calibration (using the camera calibration toolbox in [62]). The camera calibration process calibrated both the intrinsic parameters of the camera, which were input to the proposed algorithm for camera pose estimation, and the extrinsic parameters associated with each calibration image of the checkerboard pattern, which were used as the reference camera pose estimates to reflect the localization accuracy of the proposed algorithm.
To evaluate the accuracy of the proposed single-camera localization algorithm at different observation distances, we would like to obtain a collection of camera localization results corresponding to different pan angles φ at each observation distance R, so that we can derive the statistics of localization accuracy at each R. (Since we adjusted the pan/tilt unit to align the camera optical axis with the center of the landmark pattern at each R, the tilt angle θ is fixed at each R.) Due to the size limitation of the experimental space, it was difficult for us to pan the camera all the way around the Z axis of the global frame as we kept increasing R. Instead, we took advantage of the design of our landmark pattern. With 12 landmark dots equally spaced on the circle as shown in Figure 15b, we have 12 equally spaced landmark sets labeled as (1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12), (5, 9, 1), (6, 10, 2), (7, 11, 3), (8, 12, 4), (9, 1, 5), (10, 2, 6), (11, 3, 7) and (12, 4, 8). As mentioned above, at each R (and θ), we took one camera image of the landmark pattern. By using different 3-landmark sets from the same image, we in fact localize the camera corresponding to different φ. For example, using landmark set (2, 6, 10) is equivalent to panning the camera from landmark set (1, 5, 9) about the global Z direction by Δφ = 30°. In this way, instead of physically panning the camera, by using different landmark sets to localize the camera, we equivalently obtained the camera localization results corresponding to different φ. Correspondingly, at each R (and θ), the reference camera pose estimate for each landmark set was obtained by rotating the calibrated camera pose, which was calculated from the image of the checkerboard pattern at R (and θ) using the camera calibration toolbox and corresponds to landmark set (1, 5, 9), about the global Z axis by the proper Δφ.
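The landmark-set construction above can be expressed programmatically; a small sketch following the 12-dot pattern described in the text, where each set takes every fourth dot and shifting the starting dot by one position corresponds to a 30° pan:

```python
def landmark_sets(n_dots=12, spacing=4):
    """Enumerate 3-landmark sets from n_dots equally spaced dots, where the
    members of each set are `spacing` dots apart (e.g. (1, 5, 9)).
    Advancing the starting dot by one position is equivalent to panning the
    camera by 360/n_dots degrees about the global Z axis."""
    sets = []
    for start in range(1, n_dots + 1):
        s = tuple((start - 1 + k * spacing) % n_dots + 1 for k in range(3))
        sets.append(s)
    return sets

sets = landmark_sets()
delta_phi = 360.0 / len(sets)  # 30 degrees between consecutive sets
```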
The localization results from the proposed single-camera localization algorithm are compared with those from the camera calibration process in Table 2. The calibrated camera pose is used as the reference estimate, and the corresponding calibration error reported by the camera calibration toolbox is presented as the standard deviation of the camera pose calibration. The difference between the camera pose estimated using the proposed algorithm and the calibrated camera pose is recorded as Δp0 (position estimation difference) and Δθc (orientation estimation difference), and the mean and standard deviation of these differences are reported. The results in Table 2 show that, on average, the proposed algorithm attains a localization estimation very close to that attained with camera calibration. The average difference in position estimation is only about 6 mm at an observation distance of R = 4.3 m (corresponding to Rh = 4 m in Table 2), and is smaller at shorter observation distances; the average difference in orientation estimation remains below 0.3° at observation distances of R ≤ 4.3 m. While camera calibration cannot be used for real-time localization of a moving camera, the proposed algorithm, which provides a comparable estimation of the camera pose on the go, presents an effective online localization approach for many mobile robot applications. We also notice that the difference in camera position estimation between the two compared methods, and the associated standard deviation, increase as the observation distance increases. This is mainly because the effective size of the pixels increases as the distance increases.
Since the proposed algorithm depends on the segmented landmarks from a single image, the camera positioning error clearly increases as the observation distance, and thus the effective pixel size, increases; the camera position estimation error with camera calibration grows much more slowly, because the calibration process takes advantage of all the images taken at different distances and all the grid corners of the checkerboard pattern to average out the effect of the increasing effective pixel size. On the other hand, the difference in camera orientation estimation between the two compared methods, and the associated standard deviation, do not appear to be affected as the observation distance increases, due to the cancellation between the effect of the observation distance and that of the effective pixel size. Other factors contributing to the localization error of both methods include the calibration error of the camera intrinsic parameters, and the positioning errors of the landmarks and grid corners due to the printing inaccuracy of the landmark pattern and manual measurement inaccuracy.

Table 2. Camera pose estimation error: the proposed algorithm versus camera calibration.

In addition, to test the speed of the proposed single-camera trilateration algorithm, we implemented a compiled standalone executable of the algorithm on a laptop PC with a 1.66 GHz Intel Core 2 CPU for processing the experimental data. The average estimation time was about 0.06762 s. In our implementation, the main time-consuming steps were solving Equations (5) and (6) by using the Newton-Raphson method to search for the solutions of the associated least-squares optimization problems in order to obtain an optimal camera pose estimation.
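The comparison metrics Δp0 and Δθc can be computed along these lines; the rotation-angle definition of the orientation difference used here is one common choice and is an assumption of this sketch, as the paper's exact definition is not reproduced:

```python
import numpy as np

def pose_difference(p_est, R_est, p_ref, R_ref):
    """Position and orientation differences between an estimated camera pose
    (p_est, R_est) and a reference pose (p_ref, R_ref).

    The orientation difference is taken as the rotation angle (degrees) of
    the relative rotation R_est @ R_ref.T."""
    dp = np.linalg.norm(np.asarray(p_est, float) - np.asarray(p_ref, float))
    R_rel = np.asarray(R_est, float) @ np.asarray(R_ref, float).T
    # Rotation angle from the trace, clipped for numerical safety.
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    dtheta = np.degrees(np.arccos(cos_angle))
    return dp, dtheta

# Hypothetical check: estimate offset by (3, 4, 0) mm and rotated 90° about Z.
Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0]])
dp, dtheta = pose_difference((3.0, 4.0, 0.0), Rz90, (0.0, 0.0, 0.0), np.eye(3))
```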

Conclusions and Future Work
This paper has introduced an effective single-camera trilateration scheme for regular forward-looking cameras. Based on the classical pinhole model and principles of perspective geometry, the position and orientation of such a camera can be calculated successively from a single image of a few landmarks at known positions. The proposed scheme is an advantageous addition to the category of trilateration techniques: (1) The camera position is trilaterated from multiple simultaneously captured landmarks, which naturally resolves the common concern of synchronization among multiple distance measurements in ranging-based trilateration systems. (2) The estimation of the camera orientation is an integrated part of the proposed trilateration scheme, taking advantage of the geometric information contained in a single image of multiple landmarks.
(3) The instantaneous camera pose is estimated from a single image, which naturally resolves the issue of mismatching between the pair of images in stereovision-based trilateration.
The proposed single-camera trilateration scheme provides a convenient tool for the self-localization of mobile robots and other vehicles equipped with cameras. Since it depends on identifying the landmarks involved, the proposed algorithm targets environments with identifiable landmarks, such as indoor and outdoor environments with artificial landmarks, metropolitan environments, and natural environments with distinct landmarks.
The performance of the proposed algorithm has been analyzed in this work based on extensive simulations with a representative set of geometric arrangements of the landmarks and camera and a representative set of input errors. In practice, simulations with the settings of a targeted environment and the input errors of a targeted camera will provide an effective way to predict the algorithm's performance in that specific environment and system. The simulation results will provide valuable guidance for the implementation of the proposed algorithm, and help mobile robots achieve accurate self-localization by avoiding the sensitive regions of the performance indices.
Nevertheless, a closed-form solution would further reduce the time complexity of the proposed algorithm, and a closed-form estimation of the performance indices would enable more efficient performance analysis and prediction. As pointed out, existing closed-form solutions can be adopted in steps (2) and (3) of the proposed framework (Algorithm 1) to solve the 3-landmark case. However, it is challenging to find a closed-form solution that is globally optimal and applies to the general case of 3D trilateration based on more than three landmarks. This will be considered in our future work.
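For reference, the ranging-based formulation of Equation (1), generalized to more than three references, is commonly solved iteratively when no closed-form solution applies; a minimal Gauss-Newton sketch of that generic problem (not the paper's camera-based algorithm, which estimates pose from image measurements):

```python
import numpy as np

def trilaterate(anchors, dists, p0, iters=20):
    """Gauss-Newton refinement for the ranging-based trilateration problem
    ||p - p_i|| = r_i with n >= 3 references; a generic sketch, with
    hypothetical anchors and an initial guess p0 supplied by the caller."""
    p = np.asarray(p0, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    dists = np.asarray(dists, dtype=float)
    for _ in range(iters):
        diffs = p - anchors                      # (n, 3)
        ranges = np.linalg.norm(diffs, axis=1)   # predicted distances
        residuals = ranges - dists
        J = diffs / ranges[:, None]              # Jacobian of ranges w.r.t. p
        step, *_ = np.linalg.lstsq(J, residuals, rcond=None)
        p = p - step
    return p

# Hypothetical references and exact distances from the point (1, 2, 3):
anchors = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10)]
truth = np.array([1.0, 2.0, 3.0])
dists = [np.linalg.norm(truth - np.array(a)) for a in anchors]
p_hat = trilaterate(anchors, dists, p0=(5.0, 5.0, 5.0))
```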
Moreover, we target the development of an automatic localization system for mobile robot self-localization based on the proposed single-camera trilateration scheme. The experimental system shown in Figure 16 has provided the necessary onboard hardware for the proposed scheme. Our future work will focus on the development of the software system and supporting vision algorithms.
In addition, over the years, numerous localization algorithms and approaches in different categories have been proposed in the literature, from deterministic to probabilistic, and from ranging-based to image-based. The work of this paper is carried out within the scope of trilateration, with the intention of providing a convenient addition to this category of localization approaches. Meanwhile, the developments in other categories of localization approaches are highly significant, e.g., the approaches in the broad category of computer vision-based localization [79-81]. Our future work on localization may expand into those categories, and a comparative study of related approaches is expected.