A Novel Method for the Absolute Pose Problem with Pairwise Constraints

Absolute pose estimation is a fundamental problem in computer vision, and it is a typical parameter estimation problem, meaning that efforts to solve it will always suffer from outlier-contaminated data. Conventionally, for a fixed dimensionality d and the number of measurements N, a robust estimation problem cannot be solved faster than O(N^d). Furthermore, it is almost impossible to remove d from the exponent of the runtime of a globally optimal algorithm. However, absolute pose estimation is a geometric parameter estimation problem, and thus has special constraints. In this paper, we consider pairwise constraints and propose a globally optimal algorithm for solving the absolute pose estimation problem. The proposed algorithm has a linear complexity in the number of correspondences at a given outlier ratio. Concretely, we first decouple the rotation and the translation subproblems by utilizing the pairwise constraints, and then we solve the rotation subproblem using the branch-and-bound algorithm. Lastly, we estimate the translation based on the known rotation by using another branch-and-bound algorithm. The advantages of our method are demonstrated via thorough testing on both synthetic and real-world data


Introduction 1.Background
Camera pose estimation is a critical and fundamental problem in computer vision [6] and robotics [17].The problem of estimating the absolute pose of a calibrated camera given a certain number of correspondences between 3D world points and 2D image projection points is known as the Perspective-n-Point (PnP) problem [24].It arises as a subtask in many different applications (e.g., robot vision navigation [33] and camera localization [29]).
Mathematically, the absolute pose estimation problem, i.e., the problem of estimating the pose parameters (rotation and translation) given certain observations (3D points and 2D points), is a typical parameter estimation problem [13](also a fitting problem [11]).This problem has been studied for more than a century, and researchers have proposed many methods [34,30,9,5] of improving the solution speed, accuracy, and robustness to outliers.However, a recent study [10] has shown that fitting a model to data with outliers is an NP-hard problem.A somewhat promising result is that for a fixed dimensionality d, a robust estimation problem can be solved in polynomial time in the number of measurements N [12].However, this does not imply that a generalized robust estimation problem with outliers can be solved efficiently because a generalized robust fitting problem is a W [1]-hard problem in d dimensions [10] and, more specifically, it cannot be solved faster than O(N d ) [9,14].Furthermore, it is almost impossible to remove d from the exponent of the run time of a globally optimal algorithm [10].
In addition, we can show the "hardness" of the absolute pose estimation with corrupted data from the optimization perspective.Generally, a robust absolute pose estimation problem is always a nonconvex optimization problem [13,30].There are two reasons why robust absolute pose estimation must be formulated as a hard problem.One reason is that the objective function for robust estimation should be robust loss functions, which are always nonconvex functions.The other reason is that the robust estimation problem is optimized in SE (3), which corresponds to two totally different manifolds, rotation (R ∈ SO(3))) and translation (t ∈ R 3 )).Although there are already some solid theories regarding convex optimization in R 3 [37] and SO(3) [2] separately, robust estimation in SE(3) still seems to be a difficult problem [30].In addition, the dimensionality of SE(3) is six, which increases the hardness of the robust estimation problem.
In other words, when there are mismatches in the 2D-3D correspondences, the absolute pose estimation problem is a rather hard problem.However, in practical applications, outliers are inevitable and will lead to a significant decrease in accuracy for pose estimation [18].Fortunately, the absolute pose estimation problem is a geometric fitting problem and thus may be efficiently solved by considering geometric constraints.
In this paper, we decouple the rotation and translation subproblems using pairwise constraints, thus reducing the dimensionality of the original problem.Consequently, the original 6-degree-of-freedom (6-DoF) pose estimation problem is transformed into two 3-DoF subproblems, thereby significantly reducing the hardness of the pose estimation problem.As a result, we can efficiently obtain a global solution to the robust pose estimation problem.

Related work
The camera pose problem has been studied for more than a century, and there is a large body of literature on the absolute pose estimation problem [26].Here, we first review the PnP algorithm without mismatches.When the observations include no outliers, the PnP problem has a closed solution (n ≥ 3).To reduce the sensitivity to noise and consider a larger point set, the Efficient PnP (EPnP) [24], Optimal PnP (OPnP) [36], and Unified PnP (UPnP) [22] methods have been developed to produce accurate results with a linear complexity.These algorithms are applied in many related areas and can be regarded as state-of-the-art outlier-free PnP techniques.
When the observations includes outliers, the most commonly applied mechanism is RANdom SAmple Consensus (RANSAC) [16,28], which is a well-known algorithm for robust parameters estimation that is widely used for the camera pose estimation problem.However, as its name suggests, it is a nondeterministic heuristic algorithm, which means that RANSAC provides no guarantee regarding the optimality of its solution.The most recent advancement is to remove outliers before applying an outlierfree PnP method.In the Robust Efficient Procrustes PnP method(REPPnP) [15], the pose estimation problem is formulated as a low-rank homogeneous system, and outliers are iteratively removed under the assumption that the rank of the null space of the linear system should always be one.Re-weighting and 1-Point RANSAC-based PnP (R1PPnP) [37] uses a heuristic method of handling outliers by utilizing a soft reweighting mechanism and the 1-point RANSAC scheme.However, the outlier removal problem is as hard as the original problem.Nevertheless, although it is difficult to eliminate all outliers efficiently, the proportion of outliers in the observations can be reduced using these outlier removal methods.Moreover, in practical applications, it may be possible to obtain prior knowledge that can be used in outlier removal.For example, [7] presents an outlier filter that incorporates prior information on the viewing directions, [32] presents an approximate outlier rejection scheme with a known vertical direction, and the method proposed in [23] requires knowledge about the overall camera orientation with which to prune outliers.
In addition to the PnP methods discussed above, there is another class of methods for solving the robust pose estimation problem.In this body of work, the pose estimation problem is formulated as a robust optimization problem.Mestimator [26] is a classical robust estimation method, but it always solves to a local optimum because of the nonconvexity of the objective function.Therefore, more recent work on robust estimation has focused on obtaining globally optimal solutions.The most popular algorithm may be the branch-and-bound algorithm, which is always combined with convex relaxation [30,27,3] or geometric relaxation [13,19].However, the branch-and-bound-based algorithms devoted to pose estimation always suffer from a heavy computational burden for the obvious reason that the dimensionality of the feasible domain for pose estimation is six, and thus, the pose estimation problem perhaps cannot be regarded as a low-dimensional optimization problem from the perspective of using branch-and-bound approach.In other words, even if the branch-and-bound algorithm has tight bounds, it still needs considerable time to search the entire feasible space in SE(3).Moreover, the optimization is performed in two totally different manifolds, and it is not easy to calculate a tight bound for each branch.
Another topic that is closely related to the absolute pose estimation problem is point set registration [6,35].The only difference is that in the point set registration, there is no existing point correspondence.Similarly, the search for globally optimal solutions is a hot topic in the field of point set registration, and the branch-and-bound algorithm has also been broadly applied in recent related studies.One of the most successful algorithms for this purpose may be the algorithm proposed in [25] and its subsequent versions [6,35,4].These works are all based on rotation search theory, and for SE(3) optimization in particular, a more systematic scheme called the nested branch-and-bound is applied.Moreover, the decoupling methods presented in [31] improve efficiency, which inspires us to decouple the rotation and translation subproblems by means of pairwise constraints.

Contributions of this paper
In this paper, we introduce a novel robust and global solution to the absolute pose estimation problem, called Ro-bust and Global PnP (RGPnP).The contributions are threefold.(1) Our proposed method produces a globally optimal solution to the absolute pose problem.We apply the branchand-bound algorithm, which is a global optimization algorithm, to obtain the best solution.(2) We use novel pairwise constraints to decouple the rotation and translation subproblems, which can then be efficiently solved sequentially.(3) The proposed method is robust to outliers.We use a robust objective function, namely, consensus maximization, which can be regarded as a 0-1 loss function and has already been successfully used in many robust fitting problems.

Method 2.1. Problem formulation
In this paper, we formulate the absolute pose estimation problem as follows.Let the i-th 3D points in the world coordinate system be denoted by p i ∈ R 3 , i = 1, . . ., n.Similarly, let q i ∈ S 2 , i = 1, . . ., n be the i-th bearing vector with a unit norm, which corresponds to the i-th 2D point in the camera coordinate system.R ∈ SO(3) is the rotation and t ∈ R 3 is the translation.Given these definitions, the relationship for inlier observation is as follows: where λ i is the unknown depth of the i-th point.The objective of the absolute pose estimation problem is then to estimate the rotation and translation, given n pairs of points.Alternatively, to eliminate λ i , eq(1) can be reformulated as follows: where ∠(a, b) is the angle between vectors a and b.In this paper, we estimate the camera pose by maximizing the cardinality E of the inlier set S I : where is the inlier threshold.The function given in eq(4) is inherently robust to outliers since matched points are considered inliers only if their angular separation is below the inlier threshold .However, obtaining the global solution to eq(3) is a nontrivial problem.We propose the use of a set of novel pairwise constraints to obtain an equivalent but easier problem.

Eliminating translation by means of pairwise constraints
We consider two pairs of inlier correspondences (p i , q i ) and (p j , q j ).When they are aligned as shown in Fig. 1, the four points and the center of the camera must all lie in the same plane.v = q i × q j is the normal of that plane, and l = (Rp i + t) − (Rp j + t) = R(p i − p j ) is a vector in the plane.For simplicity, let u = p i − p j ; then, l = Ru.The relation between u and v is obvious: v ⊥ Ru or v T • Ru = 0, which is called a pairwise constraint in this paper (note that a similar equation was used in [21]).Such pairwise constraints provide an easy and elegant yet powerful means of decoupling the rotation and translation in eq(3); thus we can calculate the optimal rotation first by enforcing these pairwise constraints.Consequently, we define a new objective function with only rotation parameters as follows: where i = j and δ is a new inlier threshold.
We can also use (u, v) to rewrite eq(5) as follows: Here, • is a 0-1 function that returns a value of 1 if the condition is true and a value of 0 otherwise, and k = 1, . . ., m is the index of the (u, v) pairs.We find that there is only the rotation to be solved for in eq (7); thus, we have already successfully reduced the 6-DoF pose problem to a 3-DoF rotation estimation problem in SO(3).However, the number of input data increases from n to 0.5n(n−1).In [8], the authors pointed out that estimated parameters can be found as a solution on a subset of all the input data.Unfortunately, the number of (u, v) pairs is very large, and there are many different ways of choosing a subset from all samples.For a parameter estimation problem, all original input observations are expected to be involved in the estimation.Interestingly, we find that if each original correspondence is used once, then the number of (u, v) pairs decreases to 0.5n under our pairwise constraints.If the outlier ratio is not very large, we recommend using this 0.5n-subset as the input so that every original observation is involved in the estimation.However, if the outlier ratio is large, we recommend increasing the input size.

Global SO(3) search
In this section, we introduce a method based on the branch-and-bound algorithm for obtaining the global solution to eq (7).We summarize the proposed method in Algorithm 1.In brief, the branch-and-bound algorithm proceeds by recursively subdividing and pruning the rotation space until the global optimum is found.In this paper, the rotation space SO(3) is minimally parameterized with an angle-axis representation, and a 3D cube with a side length of 2π is used as the rotation domain.For more details about the angle-axis representation, please refer to [19].
Generally, the success of a branch-and-bound algorithm depends on the quality of its upper and lower bounds.In this paper, we present two different ways to calculate the bounds.The first pair of bounds is derived based on Hartley and Kahl s rotation search theory.To obtain the second pair of bounds, the rotation matrix is stacked into a 9 × 1 vector and eq (7) is reformulated as a linear system.Update the best solution so far: Q * (R * ) = max Q l i , i for all branches. 6: Remove the branches that Q u i < Q * , i for all branches.

7:
Update the highest priority cube B with upper bound Q u for the next loop.
terminate and return R * .

Bounds from Hartley and Kahl's theoty
Let us start with a famous equation that was proved in [19].Given a cube-shaped branch B of the rotation space, whose center is R 0 , for any u ∈ R 3 and any R ∈ B, the following holds: where σ is the half-side length of the cube B. According to the triangle inequality in a spherical geometry, for any From eq(9)-eq (12), for a given pair (u k , v k ), we can obtain Then, The lower bound can be easily calculated as follows: The proof for lower bound is obvious because no rotation in the branch can be no better than the optimum.

Bounds derived from a linear system formulation
From the equation v T • Ru = 0, we can obtain the linear homogeneous equation e T x = 0, where ) and e T = (v 1 u 1 , v 2 u 1 , . . ., v 3 u 3 ).Then, we have another orthogonal relation, ∠(e, x) = π 2 , and we can reformulate eq (7) as where τ is a different new inlier threshold.Notably, eq (17) is the outlier-robust form of the linear system Ex = 0, where E T = (e 1 , e 2 , . . ., e m ).
To derive the upper bound of eq (17), we introduce the famous lemma 2 in [19], which states that the angular distance between two rotations is less than the Euclidean distance between them in the angle-axis representation: where R 1 and R 2 are two rotations and r 1 and r 2 , respectively, are their angle-axis representations.Additionally, according to [20], Meanwhile, where x 1 and x 2 are the linear representations of R 1 and R 2 , respectively.Then Eq (25) establishes a relation between the angle-axis representation and the linear representation.Geometrically, a cube-shaped branch in the angle-axis representation can be relaxed to a continuous region in the linear representation, as shown in Fig. 2. Specifically, in a cube-shaped branch B whose center is R 0 (where x 0 and r 0 are the linear and angle-axis representations, respectively, of R 0 ), for any R ∈ B, where x and r are the linear and angle-axis representations, respectively, of R; σ is the half-side length of the cube B; and α denotes the upper bound of ∠(x, x 0 ).Similar to the first bound, we have Then, The upper bound can be derived as The lower bound can be estimated as shown in eq (33), which is similar to eq(16) Now, we have two types of bounds for objective function within a certain feasible domain.Because they have different formulations, it is very difficult to compare these two pairs of bounds theoretically.However, experiments show that the first formulation based on Hartley and Kahl s theory, is more efficient.

Global translation search
Once the optimal rotation has been obtained, the problem becomes a subproblem of robust absolute pose estimation with a known orientation [23].In this paper, we introduce an efficient method of solving the translation subproblem via three one-dimensional optimizations rather than one three-dimensional optimization.
First, we use the known rotation to reduce the outlier ratio.For a pair of correspondences, both correspondences will be considered outliers if they do not satisfy the pairwise constraint shown in eq (34).

|∠(q
Notably, when the input is the 0.5n-subset and an inlier and an outlier are paired, the inlier and outlier are both discarded.If the outlier ratio is small, we can still find the solution to the original problem from the remaining data.However, if the outlier ratio is large, we recommend increasing the input, e.g., pairing each correspondence with more than one other correspondence, to preserve as many inliers as possible.The reason is apparent: despite the discarding of inlier-and-outlier pairs, the same inliers are likely to be present in other pairs with other inliers.Theoretically, this step cannot remove all outliers, but it will significantly reduce the number of outliers . The next step is to calculate the translations from each pair constructed from the remaining correspondences.We will then have many translations, which will include some false results.Next, we must find the best translation among these translation results, for which the best solution can be obtained by voting based on the branch-and-bound algorithm.Moreover, a translation is defined by three independent variables and we can optimize those three variables independently.Consequently, the dimensionality of the problem decreases from three to one.For the onedimensional branch-and-bound method, we formulate the objective function as shown in eq( 35) where t s is the s-th solution and ε is the inlier threshold.The search domain is easily determined: t ∈ [min(t s ), max(t s )].Given the divided domain, whose center is t 0 and whose half-side length is µ, the upper and lower bounds are as follows:

Experiments
In this section, we report the results of evaluating our method on both synthetic and real-word data.To highlight the contributions of this study, all experiments were conducted with various outlier ratios, while outlier-free cases are not considered here.Based on the two different types of bounds derived in Sec 2.3, the two versions of the methods proposed in this paper are denoted by RGPnP H (Hartley and Kahls theory) and RGPnP L (linear system formulation).Here, the input set of pairs of correspondences is the 0.5n-subset, as described in Sec 2.2, for all experiments.The proposed methods were compared against several baseline approaches, including RANSAC+P3P with a maximum of 1000 trials (RNSC1000+P3P) and 5000 trials (RNSC5000+P3P), REPPnP [15], and R1PPnP [37], of which the latter two methods can be regarded as state-ofthe-art methods of handling the absolute pose estimation problem with outliers.All experiments were conducted using MATLAB 2018b on a computer equipped with a 3.2 GHz Intel Xeon E5 CPU.

Experiments with synthetic data
For synthetic experiments, we assumed a camera with an image size of 640 × 480 and a focal length of 1000 pixels.We randomly generated 1000 3D points in a cubic region of [0, 10] × [0, 10] × [5,15] and projected them onto the image to generate correct correspondences.Outliers were added to both the 3D points and 2D images to generate incorrect matches.Two different types of outliers were added, as follows: (1) Uniformly distributed 3D points were generated in the same cube as the data points ([0, 10] × [0, 10] × [5,15]), and each of them was assigned a correspondence to a randomly generated 2D points in the image.(2) Uniformly distributed 3D points were generated in a cubic region of [0, 1] × [0, 1] × [0, 1], different from the region of the data points, and each of them was assigned a correspondence to a randomly generated point in the image.The outlier ratio is defined as r outlier = N outlier N outlier +N inlier .We performed experiments with different outlier ratios, and for each ratio, 500 trials were run for each method.
To evaluate the estimation accuracy, we computed the rotation error in degrees between the ground-truth rotation R true and the estimated R as e rot = ∠(R true , R) and the translation error between the ground-truth translation t true and the estimated t as e trans = ttrue−t ttrue ×100%.We report the success rate, defined as the fraction of trials in which the correct pose was found, where an estimation was considered successful when e rot was less than 0.1 radius and e trans was less than 0.2.The success rates for 500 trials of each method for both types of outliers are plotted in Fig. 3.
As illustrated in Fig. 3, our methods performed well in all trials with both types of outliers, while R1PPnP handled the first type of outliers well but failed on the second type.The RANSAC-based methods found the correct pose in most trials with a small outlier ratio but failed in most trials when the outlier ratio was large.REPPnPs performance was un-  satisfactory for both types of outliers.
Global optimality.To demonstrate the global optimality of the proposed methods, we ran a trial with 25% outliers of the first type.We present the evolution of the upper and lower bounds, the number of branches and the remaining volume for each of our methods in Fig. 4. The upper and lower bounds converged after 775 iterations and 1731 iterations for RGPnP H and RGPnP L, respectively, indicating that the bounds derived from Hartley and Kahls theory are tighter than those derived from the linear system formula-tion.
Complexity and scalability.In this section, we study the run time of the proposed method with respect to different outlier ratios and different numbers of correspondences.In the first experiment, there were 1000 correspondences in total, and we ran the two versions of the methods 500 times under different outlier ratios with outliers of the first type.The median run time among the 500 trials is shown in Fig. 5(a).RGPnP H is faster than RGPnP L, and the median run time of RGPnP H is less than one second when the outlier ratio is no greater than 40%.Then, we experimentally investigated the scalability of the two methods.We ran each of the two methods 500 times with different numbers of correspondences and 10% outliers of the first type, and the results are presented in Fig. 5(b).Again, RGPnP H is faster than RGPnP L, and the run times of both methods increase linearly with respect to the number of correspondences.Even with 2000 correspondences, the median run time of RGPnP H is still less than 0.1 second.However, the run times are exponential in the outlier ratio, reflecting the hardness of the robust estimation problem.

Experiments with real-world data
This section reports an evaluation conducted on the DTU Robot Image Data Sets [1].The data consist of images of 60 scenes of different kinds of objects and materials, each of which was captured from 119 camera positions under 19 illumination situations.The 3D point clouds were obtained by means of structured light scanning.The calibration information is provided and the resolution of the images is 1600 × 1200.For the experiment reported in this paper, 24 scenes were used.For each of these 24 scenes, we selected 20 camera positions and 10 illumination situations for each position, which resulted in a total of 24 × 20 × 10 = 4800 2D images.For each combination of scene and illumination situation, the image No.25 was used as the reference image, and we matched SURF features between the reference image and the images from each of the 20 camera posi-  Note that the correspondences created in this way contained both outliers and noise and that the outlier ratio varied with scenes, camera positions and illumination situations.Then, we ran the six methods considered for comparison on all 4800 sets of 2D-3D correspondences and computed the estimation accuracy and the success rate as described in Sec 3.1.The results are presented in Fig. 6 and Fig. 7.Both versions of the proposed method achieved a 100% success rate for almost all scenes, camera positions and illumination situations.R1PPnP achieved results similar to those of our methods, while the other compared methods failed in most trials.These results indicate that our method produce the globally optimal solution and addresses outliers well.Fig. 7 shows several examples of the real-world image data: after recovering the camera pose, we reprojected the 3D inliers onto the image plane with an inlier threshold of 10 pixels.

Discussion and conclusion
In this paper, we propose a novel method of solving the absolute pose estimation problem.Our method is robust to outliers in the 2D-3D correspondences, and it solves the problem in a globally optimal way, which means that our method is able to produce a guaranteed best solution.Specifically, we reduce the dimensionality of the original problem from six to three, which makes the branch-andbound-based optimization process much faster.The 0.5nsubset can be used as the input when the outlier ratio is low; however, if the outlier ratio is high, which will greatly increase the run time, we recommend the common trick of applying a heuristic outlier removal method to significantly reduce the outlier ratio before using our globally optimal method.For our branch-and-bound algorithm for the rotation search, we propose two upper bounds: the first one is derived from Hartley and Kahl s rotation search theory and is more efficient for our problem, whereas the other is an original contribution that is more general and could be extended for application to other problems.

Figure 1 .
Figure 1.Geometric relations and pairwise constraints between a pair of 3D points (blue) and their corresponding 2D points (red).

Algorithm 1 8 d=1 . 4 :
Branch-and-bound algorithm for obtaining the rotation Require: Correspondence pairs {(v k , u k )} m k=1 and inlier threshold.1: Initialize B ← cube of side length 2π, and insert B intoa priority queue q. 2: while q is not empty do 3: Subdivide B into eight cubes {B d } For each B d calculate the upper and lower bounds Q u d ,

Figure 2 .
Figure 2. Geometric interpretation.(a) The geometric interpretation of the first bound: under the action of all possible rotations within a cube in the angle-axis representation, a unit vector may lie only on a spherical patch on the 3D unit sphere.(b) The geometric interpretation of the second bound: a cube in the angle-axis representation can be mapped to a continuous domain in S 8 .

Figure 3 .
Figure 3. Success rates for both types of outliers.

Figure 4 .
Figure 4.The optimality of RGPnP H RGPnP L. From left to right: the evolution of the upper and lower bounds, the number of branches and the remaining volume.

Figure 5 .
Figure 5.The complexity and scalability of RGPnP H and RGPnP L. (a).Median run time versus the outlier ratio (with 1000 correspondences).(b).Median run time versus the number of correspondences (with 10% outliers).

Figure 6 .
Figure 6.Success rates on the real-world data.

Figure 7 .
Figure 7. Examples of the real-world data.The red spots are all the correspondences detected based on SURF features, and the green circles are the inliers reprojected using RGPnP.First row: example images captured from different camera positions.Second row: example images captured under different illumination situations.Third and fourth rows: example images of different scenes.