A New Method for Absolute Pose Estimation with Unknown Focal Length and Radial Distortion

Estimating the absolute pose of a camera is one of the key steps in computer vision. In some cases, especially when using a wide-angle or zoom lens, the focal length and radial distortion also need to be considered. Therefore, in this paper, an efficient and robust method that yields a single solution is proposed to estimate the absolute pose of a camera with unknown focal length and radial distortion, using three 2D–3D point correspondences and a known camera position. The problem is decomposed into two sub-problems, which makes the estimation simpler and more efficient. The first sub-problem is to estimate the focal length and radial distortion. An important geometric characteristic of radial distortion, that the orientation of a 2D image point with respect to the center of distortion (i.e., the principal point in this paper) is unchanged under radial distortion, is used to solve this sub-problem. The focal length and up to four-order radial distortion can be determined with this geometric characteristic, and it can be applied to multiple distortion models. The values estimated with radial distortion ignored are used as the initial values, which are close to the global optimal solutions. Then, the sub-problem can be efficiently and accurately solved with these initial values. The second sub-problem is to determine the absolute pose with geometric linear constraints. After estimating the focal length and radial distortion, the undistorted image can be obtained, and then the absolute pose can be efficiently determined from the point correspondences and the known camera position using the undistorted image. Experimental results on synthetic data and real images demonstrate this method's accuracy and numerical stability for pose estimation with unknown focal length and radial distortion.


Introduction
Retrieving the absolute pose of a camera from n 2D-3D point correspondences is one of the key steps in computer vision and SfM (structure from motion) [1][2][3][4][5][6]. Many approaches have been proposed to solve this problem; they are named PnP solvers [7][8][9][10][11][12][13] when the intrinsic camera parameters are all known as prior knowledge. The number of available point correspondences determines both the approach and the number of parameters a PnP solver can estimate. When there is no prior knowledge except the intrinsic camera parameters, three 2D-3D point correspondences form the minimal subset; the corresponding methods are called P3P solvers [14][15][16], and they can solve all six degrees of freedom of the camera pose. In practical applications, some intrinsic camera parameters may be unknown, and accordingly, many methods have been proposed to handle these cases. When the focal length is unknown, a minimum of four 2D-3D point correspondences is required to estimate the absolute pose, and the corresponding methods are called P4Pf solvers [17]. Theoretically, one 2D-3D point correspondence gives two constraints, and hence eight unknown parameters can be estimated with four 2D-3D point correspondences. This means that one radial distortion parameter can additionally be determined when the focal length is solved; the corresponding methods are called P4Pfr solvers [18][19][20][21]. In this paper, the problem is decomposed into two sub-problems, which makes the estimation simpler and more efficient. The first sub-problem is to estimate the focal length and radial distortion. An important geometric characteristic of radial distortion, that the orientation of the image point with respect to the center of distortion (i.e., the principal point in this paper) is unchanged under radial distortion, is used to solve this sub-problem.
The focal length and up to four-order radial distortion can be determined iteratively with this geometric characteristic, and can work with multiple distortion models, such as the division model and the traditional model (Brown model). The values estimated without considering the radial distortion can be used as the initial values for iteration, and these are close to the global optimal solutions. Consequently, the sub-problem can be efficiently and accurately solved with these initial values. The second sub-problem is to determine the camera pose with geometric linear constraints after estimating the focal length and radial distortion. Since undistorted images can then be given, we can obtain valid 2D-3D point correspondences, and then camera pose estimation becomes simple.
The proposed method can be used for cases wherein a zoom lens or fisheye lens is used, and the imaging position is set at a distance from the center of the image. The experimental results indicate that our proposed method has higher accuracy and better numerical stability for pose estimation from synthetic data and real images when focal length and radial distortion are unknown.
This paper is organized as follows. Section 2 presents the new method for camera pose estimation when focal length and radial distortion are unknown. Section 3 presents the results on numerical stability and noise sensitivity in the synthetic data and real images. Section 4 presents the discussion. Section 5 presents conclusions.

Problem Statement
A standard pinhole camera model is used in this paper. This paper uses three 2D-3D point correspondences and the known camera position to estimate the absolute pose with an unknown focal length and radial distortion. Up to four-order radial distortion can be estimated efficiently, and our proposed method can work with both the division distortion model and the traditional distortion model. The geometric construction of our problem without radial distortion is illustrated in Figure 1.

In Figure 1, P_i (i = 1, 2, 3) is the known 3D control point and p_i is its 2D image projection without radial distortion. p_c is the principal point, which is the center of the image. O_c is the known camera position. Since radial distortion exists, we obtain only the distorted 2D image point p_i^d = (u_i^d, v_i^d), and the undistorted 2D image point p_i is unknown in real scenarios. In this paper, our core work is to estimate the camera pose from the 3D control points P_i and the distorted 2D image points p_i^d when the focal length and radial distortion are unknown.

Radial Distortion and Focal Length Estimation
Many distortion models have been proposed in the existing literature, and for most digital cameras, the main distortion is radial distortion. Two models are usually used for radial distortion: the division model and the traditional model. The division model is written as

p_i = p_i^d / (1 + k_1 r_i^2 + k_2 r_i^4 + ...).   (1)

The traditional model is written as

p_i = p_i^d (1 + k_1 r_i^2 + k_2 r_i^4 + ...).   (2)

Here, k_i is the radial distortion coefficient and r_i = ||p_i^d|| is the distance between the distorted 2D point p_i^d and the center of distortion. The radial distortion is mainly dominated by the first two terms [41], and we hence only consider these two terms in this paper. It can be seen that no matter which model is used, the orientation of the image point with respect to the center of distortion (i.e., the principal point in this paper) is unchanged under radial distortion, and only the distance changes. This important and key geometric characteristic of radial distortion has encouraged us to propose a new method to estimate the focal length and radial distortion. According to this characteristic, the detailed geometric construction of our problem with radial distortion is illustrated in Figure 2.
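The truncated two-parameter forms of the two models can be sketched in code. The snippet below is a minimal illustration (not the paper's implementation), using normalized pixel coordinates and assumed coefficient values; it also checks the orientation-preserving property the method relies on:

```python
import math

def undistort_division(u, v, k1, k2):
    """Division model: p = p_d / (1 + k1*r^2 + k2*r^4), with r = ||p_d||."""
    r2 = u * u + v * v
    s = 1.0 + k1 * r2 + k2 * r2 * r2
    return u / s, v / s

def undistort_traditional(u, v, k1, k2):
    """Traditional (Brown) model as used here: p = p_d * (1 + k1*r^2 + k2*r^4)."""
    r2 = u * u + v * v
    s = 1.0 + k1 * r2 + k2 * r2 * r2
    return u * s, v * s

# Both models only rescale the radius by a scalar, so the polar angle of the
# point about the distortion center is unchanged.
ud, vd = 0.30, 0.45                      # distorted point (assumed, normalized units)
for model in (undistort_division, undistort_traditional):
    u, v = model(ud, vd, -0.2, 0.05)     # assumed coefficients
    assert abs(math.atan2(vd, ud) - math.atan2(v, u)) < 1e-12
```

The assertion makes the geometric characteristic explicit: only ||p_d|| changes, never the orientation.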

In Figure 2, ∠p_1 O_c p_2, ∠p_2 O_c p_3 and ∠p_3 O_c p_1 are denoted as α_1, β_1 and γ_1, respectively, and can be computed with the triangles P_1 O_c P_2, P_2 O_c P_3 and P_3 O_c P_1 shown in Figure 1. ∠p_1 p_c p_2, ∠p_2 p_c p_3 and ∠p_3 p_c p_1 are denoted as α_2, β_2 and γ_2, respectively. Here, the principal point p_c and the distorted imaging points p_i^d are known, but the undistorted imaging points p_i are unknown.
Note that the orientation of the distorted image point with respect to the center of distortion is unchanged under radial distortion, and therefore α_2, β_2 and γ_2 can be computed with ∠p_1^d p_c p_2^d, ∠p_2^d p_c p_3^d and ∠p_3^d p_c p_1^d, respectively. In the next step of the derivation, only the distances between the undistorted image points and the center of distortion are used. These are unknown and need to be computed. Therefore, the derivation does not involve the distortion coefficients and does not solve for them directly.
Let ||p_c p_i|| = x_i and ||O_c p_i|| = y_i. Since the triangles p_1 p_c p_2 and p_1 O_c p_2 have the common edge p_1 p_2, an equation can be given by the law of cosines:

x_1^2 + x_2^2 - 2 x_1 x_2 cos α_2 = y_1^2 + y_2^2 - 2 y_1 y_2 cos α_1.   (3)

In Figure 2, ||O_c p_c|| = f is the focal length, which is perpendicular to the plane p_1 p_2 p_3. Hence, according to the triangle O_c p_c p_i, an equation can be given by the Pythagorean theorem:

y_i^2 = f^2 + x_i^2.   (4)

Substituting Equation (4) into Equation (3) and rewriting it gives

sqrt(f^2 + x_1^2) sqrt(f^2 + x_2^2) cos α_1 = f^2 + x_1 x_2 cos α_2.   (5)

Similarly, the other two equations can be given:

sqrt(f^2 + x_2^2) sqrt(f^2 + x_3^2) cos β_1 = f^2 + x_2 x_3 cos β_2,
sqrt(f^2 + x_3^2) sqrt(f^2 + x_1^2) cos γ_1 = f^2 + x_3 x_1 cos γ_2.

We set x_i / f equal to f_i; dividing each equation by f^2 then gives a system of equations in three variables:

sqrt(1 + f_1^2) sqrt(1 + f_2^2) cos α_1 = 1 + f_1 f_2 cos α_2,
sqrt(1 + f_2^2) sqrt(1 + f_3^2) cos β_1 = 1 + f_2 f_3 cos β_2,
sqrt(1 + f_3^2) sqrt(1 + f_1^2) cos γ_1 = 1 + f_3 f_1 cos γ_2.   (6)
The Levenberg-Marquardt (LM) algorithm [50] can be used to solve this system. It is an iterative solver, and good initial values for f_i are needed to reach the global optimal solution. Choosing the initial values is one of the key steps in this paper. Here, the initial values are chosen without considering the radial distortion, so they can be given by some existing methods [15,16]. These distortion-free initial values are used for Equation (6), and the f_i can then be computed iteratively. The proposed method with these initial values converges to the global optimal solution, which will be shown in Section 3.
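As a sketch, the system in Equation (6) can be solved with a few Newton iterations (the paper uses Levenberg-Marquardt; plain Newton is used here only to keep the example dependency-free, and all names are illustrative):

```python
import math

PAIRS = [(0, 1), (1, 2), (2, 0)]  # index pairs for (alpha, beta, gamma)

def residuals(f, cos1, cos2):
    """Residuals of sqrt(1+f_i^2)*sqrt(1+f_j^2)*cos1_k - 1 - f_i*f_j*cos2_k = 0."""
    return [math.sqrt(1 + f[i] ** 2) * math.sqrt(1 + f[j] ** 2) * cos1[k]
            - 1.0 - f[i] * f[j] * cos2[k]
            for k, (i, j) in enumerate(PAIRS)]

def jacobian(f, cos1, cos2):
    J = [[0.0] * 3 for _ in range(3)]
    for k, (i, j) in enumerate(PAIRS):
        si, sj = math.sqrt(1 + f[i] ** 2), math.sqrt(1 + f[j] ** 2)
        J[k][i] = f[i] / si * sj * cos1[k] - f[j] * cos2[k]
        J[k][j] = si * f[j] / sj * cos1[k] - f[i] * cos2[k]
    return J

def solve3(A, b):
    """3x3 Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            t = M[r][c] / M[c][c]
            for cc in range(c, 4):
                M[r][cc] -= t * M[c][cc]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def solve_fi(f0, cos1, cos2, iters=50):
    """Newton iteration starting from the distortion-free initial values f0."""
    f = list(f0)
    for _ in range(iters):
        step = solve3(jacobian(f, cos1, cos2),
                      [-r for r in residuals(f, cos1, cos2)])
        f = [fi + s for fi, s in zip(f, step)]
    return f
```

Here cos1 holds (cos α_1, cos β_1, cos γ_1), computed from the known 3D points and camera position, and cos2 holds (cos α_2, cos β_2, cos γ_2), computed from the distorted image points, since these angles are preserved under radial distortion.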
After obtaining the values of f_i, the focal length and radial distortion coefficients can be computed for each radial distortion model, i.e., the division model and the traditional model.
(1) The division model. This two-parameter model is given by the formula

p_i = p_i^d / (1 + k_1 r_i^2 + k_2 r_i^4).   (7)

Here, k_j (j = 1, 2) is the radial distortion coefficient and r_i = ||p_i^d|| is the distance between the distorted point p_i^d and the principal point. Then we can obtain

x_i = r_i / (1 + k_1 r_i^2 + k_2 r_i^4).   (8)

r_i is known, and with the f_i computed by the LM algorithm a system of polynomial equations can be obtained:

f_i f (1 + k_1 r_i^2 + k_2 r_i^4) = r_i,  i = 1, 2, 3.   (9)

For solving linearly, f, k_1 f and k_2 f are seen as the unknown parameters in Equation (9), and then a system of linear equations can be obtained as

A w = b,  w = (f, k_1 f, k_2 f)^T.

Here,

A = [ f_1  f_1 r_1^2  f_1 r_1^4
      f_2  f_2 r_2^2  f_2 r_2^4
      f_3  f_3 r_3^2  f_3 r_3^4 ],  b = (r_1, r_2, r_3)^T.

The system can be solved linearly using w = A^(-1) b. Then the focal length and the radial distortion coefficients are given respectively as

f = w_1,  k_1 = w_2 / w_1,  k_2 = w_3 / w_1.

(2) The traditional model. This two-parameter model is given by the formula

p_i = p_i^d (1 + k_1 r_i^2 + k_2 r_i^4),

so x_i = r_i (1 + k_1 r_i^2 + k_2 r_i^4) and f_i f - k_1 r_i^3 - k_2 r_i^5 = r_i. Similarly, a system of linear equations can be obtained as

A' w' = b,  w' = (f, k_1, k_2)^T.

Here,

A' = [ f_1  -r_1^3  -r_1^5
       f_2  -r_2^3  -r_2^5
       f_3  -r_3^3  -r_3^5 ],  b = (r_1, r_2, r_3)^T.

Then, the focal length and the radial distortion coefficients are computed linearly as w' = A'^(-1) b. In this section, when the LM algorithm is used to solve the intermediate variable f_i, our method does not involve the radial distortion coefficients. This means that no matter which radial distortion model is used, the focal length and distortion coefficients can be solved linearly after obtaining the intermediate variable f_i.
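For the division model, the linear recovery step can be sketched as follows (a toy example with normalized image coordinates and assumed coefficient values, not the paper's code); Cramer's rule is enough for a 3x3 system:

```python
def det3(M):
    """Determinant of a 3x3 matrix (list of rows)."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve3(A, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(A)
    out = []
    for c in range(3):
        Ac = [row[:] for row in A]
        for r in range(3):
            Ac[r][c] = b[r]
        out.append(det3(Ac) / d)
    return out

def recover_division(fi, r):
    """From f_i = x_i / f and distorted radii r_i, solve
    f_i*f + f_i*r_i^2*(k1*f) + f_i*r_i^4*(k2*f) = r_i  for (f, k1, k2)."""
    A = [[fi[i], fi[i] * r[i] ** 2, fi[i] * r[i] ** 4] for i in range(3)]
    w = solve3(A, [r[i] for i in range(3)])   # w = (f, k1*f, k2*f)
    return w[0], w[1] / w[0], w[2] / w[0]
```

With the traditional model the same 3x3 solve applies, with rows (f_i, -r_i^3, -r_i^5) and unknowns (f, k_1, k_2) directly. Working in normalized units keeps the system well conditioned, since r_i^4 in raw pixels would dwarf the other columns.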

Camera Pose Estimation
The positions of the undistorted points can be obtained from the radial distortion coefficients estimated in Section 2.2 and the positions of the distorted points in the original image. Then, valid 2D-3D point correspondences can be obtained, as illustrated in Figure 3.
In Figure 3, O_c-X_c Y_c Z_c is the original camera frame and O_w-X_w Y_w Z_w is the original world frame. Two 2D-3D point correspondences can be used to estimate the camera pose with a known camera position, focal length and radial distortion, and then a single solution can be obtained [37]. Alternatively, three 2D-3D point correspondences can be used to estimate the camera pose with a known focal length and radial distortion, and up to four solutions can be obtained [15]. In this paper, to obtain a single solution directly, we use two 2D-3D point correspondences to estimate the camera pose, i.e., the rotation matrix R_w_c and translation vector T_w_c, written in red in Figure 3.

Here, we define a new camera frame O_c-X_c2 Y_c2 Z_c2 and a new world frame O_c-X_w2 Y_w2 Z_w2. The new camera frame is defined as follows:

X_c2 = O_c p_1 / ||O_c p_1||,  Z_c2 = (O_c p_1 × O_c p_2) / ||O_c p_1 × O_c p_2||,  Y_c2 = Z_c2 × X_c2.

Here, O_c p_i = (u_i, v_i, f)^T, which can be obtained after the radial distortion and focal length have been estimated in Section 2.2. In the new camera frame, the X_c2 axis is along the vector O_c p_1, the Z_c2 axis is perpendicular to the plane O_c p_1 p_2, and the Y_c2 axis is defined by the right-handed coordinate system. Then, the point P_c in the original camera frame O_c-X_c Y_c Z_c can be transformed to the point P_c2 in the new camera frame O_c-X_c2 Y_c2 Z_c2 using

P_c2 = R_c2 P_c,  R_c2 = (X_c2, Y_c2, Z_c2)^T.

The new world frame is defined analogously. Its origin is the camera position O_c, which is known; the X_w2 axis is along the vector O_c P_1; the Z_w2 axis is perpendicular to the plane O_c P_1 P_2; and the Y_w2 axis is defined by the right-handed coordinate system. Then, the point P_w in the original world frame O_w-X_w Y_w Z_w can be transformed to the point P_w2 in the new world frame O_c-X_w2 Y_w2 Z_w2 using

P_w2 = R_w2 (P_w - O_c),  R_w2 = (X_w2, Y_w2, Z_w2)^T.

Obviously, the new camera frame and the new world frame coincide. Then, we assume that the point P_c in the original camera frame and the point P_w in the original world frame are the same point, and according to the definitions of the new camera frame and the new world frame, we can obtain the transformations between each pair of frames, as shown in Figure 4. Then, the rotation matrix R_w_c and translation vector T_w_c can be obtained from Figure 4, as follows:

R_w_c = R_c2^T R_w2,  T_w_c = -R_w_c O_c.

Pose estimation is thus finished.
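The frame-alignment construction described above can be sketched as a self-contained illustration (names are ours, not from a released implementation; two correspondences plus the known camera position suffice):

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def frame(a, b):
    """Rows: X along a, Z normal to plane(a, b), Y completing a right-handed frame."""
    x = normalize(a)
    z = normalize(cross(a, b))
    y = cross(z, x)
    return [x, y, z]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def pose_from_two_points(rays, P, Oc):
    """rays: image rays (u_i, v_i, f) of p_1, p_2 in the camera frame;
    P: the matching 3D points P_1, P_2; Oc: known camera position.
    Returns R, T with  P_c = R * P_w + T."""
    Fc = frame(rays[0], rays[1])                          # new camera frame
    Fw = frame([P[0][i] - Oc[i] for i in range(3)],       # new world frame
               [P[1][i] - Oc[i] for i in range(3)])
    R = matmul(transpose(Fc), Fw)
    T = [-sum(R[i][j] * Oc[j] for j in range(3)) for i in range(3)]
    return R, T
```

Because both orthonormal frames are built from corresponding directions, the rotation follows in closed form and the solution is unique, which matches the single-solution property claimed above.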


Experiments and Results
In this section, the numerical stability, noise sensitivity, computational speed and robustness to camera position noise of our proposed method, using the division model and the traditional model, respectively, are thoroughly tested on synthetic data and compared to the general solver of [21] (Josephson's method). Josephson's method is fast and numerically stable, and it was the first method to estimate camera pose with unknown focal length and radial distortion from four 2D-3D point correspondences. Then, real images are used to test the feasibility of our proposed method in real scenarios. From the experiments, we can see that the results of the division model and the traditional model are basically the same. As such, only the results of the division model are discussed in this section.

Synthetic Data
A virtual perspective camera with radial distortion is synthesized. Its image resolution is 1280 × 800 pixels and the center of the image is the principal point, i.e., the center of distortion in this paper. Then, the 3D points of synthetic data are randomly generated in a box of [−20, 20] × [−20, 20] × [180, 220], and the 2D image points of the synthetic data are generated by projecting the 3D points using the virtual camera. Now we can randomly generate 2D-3D point correspondences for testing the numerical stability, noise sensitivity, computational speed and robustness to camera position noise of our proposed method.
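A minimal sketch of such a synthetic generator follows (assuming, for illustration only, an axis-aligned camera at the origin, normalized units and a one-parameter division model; the paper's generator uses pixel units and also randomizes the camera pose):

```python
import math, random

def project_division(Pw, f, k1):
    """Pinhole projection (camera at origin, looking down +Z, principal point at
    the image center) followed by one-parameter division-model distortion.
    Returns (distorted point, undistorted point)."""
    X, Y, Z = Pw
    u, v = f * X / Z, f * Y / Z                  # undistorted image point
    ru = math.hypot(u, v)
    lo, hi = 0.0, 2.0                            # bisection for the distorted
    for _ in range(60):                          # radius: rd/(1 + k1*rd^2) = ru
        mid = 0.5 * (lo + hi)
        if mid / (1.0 + k1 * mid * mid) < ru:
            lo = mid
        else:
            hi = mid
    rd = 0.5 * (lo + hi)
    s = rd / ru if ru > 0 else 1.0               # orientation is preserved,
    return (u * s, v * s), (u, v)                # only the radius changes

random.seed(0)
f, k1 = 1.2, -0.1                                # assumed values
points = [(random.uniform(-20, 20), random.uniform(-20, 20),
           random.uniform(180, 220)) for _ in range(3)]
corrs = [(project_division(P, f, k1)[0], P) for P in points]  # 2D-3D pairs
```

The bisection inverts the distortion mapping, which has no closed form in this direction; undistorting the generated 2D point with the division model recovers the pinhole projection exactly.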

Numerical Stability
Three 2D-3D point correspondences without noise are randomly generated for our proposed method, and four are randomly generated for Josephson's method. 50,000 trials are performed independently, and the distributions of the log10 value of error in rotation, focal length, radial distortion and reprojection are reported, as shown in Figure 5.

From Figure 5, it can be inferred that the error distribution of our proposed method is more concentrated than that of Josephson's method. In terms of focal length, radial distortion and reprojection, our proposed method has better numerical stability than Josephson's method; in terms of rotation, its performance is almost the same.

Noise Sensitivity
Three 2D-3D point correspondences with zero-mean Gaussian noise are randomly generated for our proposed method, and four are randomly generated for Josephson's method. The noise standard deviation varies from 0 to 2 pixels. Then, 50,000 trials are performed independently, and the median errors in radial distortion, focal length, rotation and reprojection are reported, as shown in Figure 6.
Obviously, the errors of both the proposed method and Josephson's method grow as the noise increases. Overall, the proposed method is less sensitive to noise. In terms of focal length and rotation, the proposed method performs much better than Josephson's method; in terms of radial distortion and reprojection, it performs slightly better.


Computational Speed
We test our proposed method on a laptop with a 3.3 GHz dual-core CPU. Three 2D-3D point correspondences without noise are randomly generated for our proposed method, and four are randomly generated for Josephson's method. Then, 50,000 trials are performed independently; the median computational times of our proposed method and Josephson's method are 0.0768 s and 0.0743 s, respectively. Thus, our proposed method is about 3.4% slower than the general solver.

Robustness to Camera Position Noise
Compared to existing methods, the difference of our proposed method is that it uses the camera position as prior knowledge. The camera position is thus an important parameter, and it is necessary to analyze the effect of camera position noise on the performance of our proposed method. In this paper, the camera position is generally given by RTK or a total station, and the accuracy of both is better than 3 cm [56]. This section adds zero-mean Gaussian noise to the camera position, with the noise standard deviation varying from 0 to 3 cm. Then, 50,000 trials are performed independently, and the median relative errors in rotation, distortion and focal length, as well as the median reprojection error, are reported in Figure 7. It can be seen in Figure 7a that the relative errors in radial distortion and focal length are both close to zero. As described in Section 2, the problem is decomposed into two sub-problems, and the first sub-problem is to estimate the focal length and radial distortion. Hence, the camera position noise has almost no effect on the focal length and radial distortion estimation.
The second sub-problem is to estimate the camera pose (rotation), and we can see in Figure 7a that the relative error in rotation increases with the increase in camera position noise. This means that camera position noise has an effect on camera pose estimation. However, the maximum relative error of rotation is less than 0.9% when the camera position noise is 3 cm, which indicates that we can still obtain good results for camera pose even if there is camera position noise.
Furthermore, the relative error in rotation, radial distortion and focal length will further affect the error in reprojection. It can be seen in Figure 7b that the error in reprojection increases with the increase in camera position noise. From the previous analysis, we can see that this is mainly caused by the error in rotation. Although the camera position noise has an effect on the reprojection, the maximum error is less than 0.5 pixels, which indicates our proposed method still performs well, even though the camera position noise is present.

Real Images
The preceding section tested our proposed method on synthetic data, and this section will test our proposed method on real images. Two approaches are employed to show the performance of our proposed method. First, we use an image from the internet [57] that is widely used for camera calibration. This image has heavy distortion, as shown in Figure 8.
The image contains a checkerboard with many straight lines, which are bent under heavy distortion, as shown in Figure 8a. Then, three corners of the checkerboard are selected to estimate the camera pose, focal length and radial distortion with our proposed method. From the results of our proposed method, we can obtain the undistorted image shown in Figure 8b. Intuitively, it can be seen that these lines revert to straight lines, which indicates that our proposed method achieves good performance even under heavy distortion.
The first approach shows the performance intuitively, rather than quantitatively. Hence, another approach is employed to test our proposed method with a quantitative evaluation on real images. The real images are captured by two cameras (MV-CS016, with a Sony IMX296 CMOS sensor) with wide-angle lenses (LM6JC, focal length 6 mm), which gives the real images heavy distortion. Given that the further a point is from the center of the image, the heavier the distortion, some control points are placed near the edges of both images, so that their projections have heavy distortion. This case is useful for testing the performance of our proposed method under radial distortion. The images are illustrated in Figure 9.

The first approach shows the performance intuitively rather than quantitatively. Hence, a second approach is employed to evaluate our proposed method quantitatively on real images. The real images are captured by two cameras (MV-CS016, with a Sony IMX296 CMOS sensor) fitted with wide-angle lenses (LM6JC, focal length 6 mm), which gives the images heavy distortion. Since the further a point is from the center of the image, the heavier its distortion, some control points are placed near the edges of both images so that their projections are heavily distorted; this case is useful for testing the performance of our proposed method under strong radial distortion. The images are illustrated in Figure 9. Three 2D-3D point correspondences are selected to estimate the camera pose, focal length and radial distortion with our proposed method, from which we obtain the undistorted images and valid 2D-3D point correspondences. In real scenarios, the ground truths of the camera pose, focal length and radial distortion are all unknown, so we cannot test our proposed method on them directly. However, the ground-truth positions of the 3D control points, given by a total station (NTS-330R, measuring precision better than 0.5 cm), are known, and we use this information to test our proposed method indirectly. The measured positions of these control points are obtained by binocular vision, whose accuracy is determined by the camera pose, focal length and radial distortion estimated by our proposed method. Binocular vision uses two cameras to obtain three-dimensional coordinates and is a common three-dimensional measurement method; it solves the problem that a single camera cannot recover depth information [58]. Accordingly, the accuracy of the measured positions reflects the performance of our proposed method.
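The binocular-vision step described above can be sketched with a generic linear (DLT) triangulation from two views; this is a standard textbook formulation, not the paper's exact implementation, and the camera matrices and point below are purely illustrative.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2 : 3x4 projection matrices; x1, x2 : observed 2D pixel points.
    Returns the 3D point in the world frame.
    """
    # Each view contributes two linear constraints on the homogeneous point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Solution: right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Illustrative setup: two cameras 0.5 m apart looking down the z-axis.
K = np.array([[1000.0, 0, 640], [0, 1000.0, 480], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0], [0]])])

X_true = np.array([0.2, -0.1, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]

X_est = triangulate_dlt(P1, P2, x1, x2)
```

With noiseless correspondences the DLT recovers the 3D point exactly; in practice, the accuracy of `P1` and `P2` (i.e., of the estimated pose, focal length and distortion) governs the accuracy of the triangulated control points, which is exactly why this step serves as an indirect test.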
We compute all the control point positions using binocular vision with both our proposed method and Josephson's method, and obtain the errors between the measured values and the ground truths. The mean relative errors of our proposed method and Josephson's method are 0.27% and 0.34%, respectively, indicating that our proposed method performs better on real images. Moreover, the mean relative error of our proposed method is very low, showing that it obtains good results on real images.
Similarly, we obtain the mean reprojection error of the control points. The reprojection error is affected by the estimated focal length, radial distortion and absolute pose, so it also reflects the performance of our proposed method. The mean reprojection error is 0.21 pixels with our proposed method and 0.29 pixels with Josephson's method, which again indicates that our proposed method performs better. This is consistent with the results on synthetic data, showing that our proposed method performs well on both synthetic data and real images.
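The two evaluation metrics used above can be computed as in the following sketch; the camera matrix and point data are illustrative, and the function names are our own.

```python
import numpy as np

def mean_relative_error(measured, ground_truth):
    """Mean relative error between measured and ground-truth 3D points."""
    measured = np.asarray(measured, float)
    ground_truth = np.asarray(ground_truth, float)
    errs = np.linalg.norm(measured - ground_truth, axis=1)
    return np.mean(errs / np.linalg.norm(ground_truth, axis=1))

def mean_reprojection_error(P, points_3d, points_2d):
    """Mean pixel distance between projected 3D points and observed 2D points."""
    X_h = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    proj = (P @ X_h.T).T
    proj = proj[:, :2] / proj[:, 2:3]          # perspective division
    return np.mean(np.linalg.norm(proj - np.asarray(points_2d, float), axis=1))

# Illustrative check with a pinhole camera at the origin.
K = np.array([[1000.0, 0, 640], [0, 1000.0, 480], [0, 0, 1]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
pts3d = np.array([[0.2, -0.1, 4.0], [-0.3, 0.1, 5.0]])
pts2d = (P @ np.hstack([pts3d, np.ones((2, 1))]).T).T
pts2d = pts2d[:, :2] / pts2d[:, 2:3]
```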

Discussion
This paper proposes a new method for absolute camera pose estimation when the focal length and radial distortion are unknown, from only three 2D-3D point correspondences and a known camera position. Radial distortion up to fourth order can be estimated. The proposed method is especially suitable for cases wherein wide-angle and zoom lenses are used. The differences and advantages of the proposed method are discussed below.

Differences and Advantages
Estimating the camera pose and some intrinsic parameters (i.e., the focal length and radial distortion in this paper) from 2D-3D point correspondences is an important step in computer vision. For absolute pose estimation, the position of each 3D point in the world frame must be known first, which is difficult to obtain in practical applications. Hence, using fewer 2D-3D point correspondences is the aim of researchers, and is also the motivation for the work in this paper. Although it is difficult to obtain the absolute position of a 3D point in the world frame, it is easier to obtain the absolute position of the camera using positioning devices (e.g., IMU, RTK and total station). This is also why our proposed method can use fewer 2D-3D point correspondences than traditional methods.

Most traditional methods for determining the camera pose, focal length and distortion are based on the projection matrix, which is used to obtain a system of polynomial equations and estimate the unknown parameters directly. In addition, most of them use the division distortion model to simplify the system. The difference in this paper is that the system of polynomial equations is obtained from photogrammetric geometry rather than from the projection transformation used by traditional methods for radial distortion estimation. An important geometric characteristic of radial distortion, that the orientation of an image point with respect to the center of distortion (i.e., the principal point in this paper) is unchanged under radial distortion, is used to obtain the system of polynomial equations; this is the most interesting point of our proposed method. Lastly, the LM algorithm is used to solve for intermediate variables rather than for the radial distortion directly. This means that, regardless of the radial distortion model used, the focal length and distortion coefficients can be solved linearly once the intermediate variables are obtained.
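The geometric characteristic described above can be verified numerically: a purely radial model (polynomial or division) rescales only the radial distance from the distortion center, so the polar angle of the image point is preserved. A minimal check, with illustrative distortion coefficients:

```python
import math

def distort_polynomial(x, y, cx, cy, k1, k2):
    """Apply a two-coefficient polynomial radial distortion about (cx, cy)."""
    dx, dy = x - cx, y - cy
    r2 = dx * dx + dy * dy
    scale = 1 + k1 * r2 + k2 * r2 * r2   # purely radial scaling
    return cx + dx * scale, cy + dy * scale

cx, cy = 0.0, 0.0            # principal point taken as the distortion center
x, y = 0.3, 0.4              # undistorted point in normalized image coordinates
xd, yd = distort_polynomial(x, y, cx, cy, k1=-0.2, k2=0.05)

# The polar angle about the distortion center is unchanged by the distortion.
angle_before = math.atan2(y - cy, x - cx)
angle_after = math.atan2(yd - cy, xd - cx)
```

Because only the radial distance is scaled, the same check holds for the division model or any other purely radial model, which is what allows the resulting polynomial system to apply across multiple distortion models.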
It can be seen that the major difference between the proposed method and the traditional methods is that the former starts with geometry, and the latter starts with projection transformation.
Since the values with no radial distortion are used as the initial solutions, our proposed method returns only one solution, whereas Josephson's method returns up to four and thus needs an extra constraint to disambiguate the multiple solutions. In addition, the initial solutions of the LM algorithm in our proposed method are the values of the undistorted model; since these initial solutions are close to the true values, the algorithm is highly likely to converge to the globally optimal solution. The simulations and experiments in Section 3 confirm the feasibility of our proposed method.
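The role of the zero-distortion initial values can be illustrated with a one-dimensional analogue; this is not the paper's actual parameterization, but a hypothetical model in which the observed pixel radius is R = f·ru·(1 + k1·ru²) and a damped Gauss-Newton (LM-style) loop refines (f, k1) starting from a distortion-free fit.

```python
def lm_refine(ru, R, f0, k0, iters=50, lam=1e-3):
    """LM-style refinement of focal length f and distortion k1.

    Illustrative model: R = f * ru * (1 + k1 * ru**2), with ru the known
    undistorted normalized radius.  (f0, k0) are the initial values from
    the zero-distortion fit, as discussed in the text.
    """
    f, k = f0, k0
    for _ in range(iters):
        # Residuals and Jacobian entries of the model w.r.t. (f, k).
        r = [f * u * (1 + k * u * u) - Ri for u, Ri in zip(ru, R)]
        Jf = [u * (1 + k * u * u) for u in ru]
        Jk = [f * u ** 3 for u in ru]
        # Damped 2x2 normal equations, solved by Cramer's rule.
        a = sum(j * j for j in Jf) + lam
        b = sum(jf * jk for jf, jk in zip(Jf, Jk))
        c = sum(j * j for j in Jk) + lam
        gf = sum(j * ri for j, ri in zip(Jf, r))
        gk = sum(j * ri for j, ri in zip(Jk, r))
        det = a * c - b * b
        f -= (c * gf - b * gk) / det
        k -= (a * gk - b * gf) / det
    return f, k

# Synthetic observations from ground truth f = 800, k1 = -0.15.
ru = [0.1, 0.2, 0.3, 0.4, 0.5]
R = [800 * u * (1 - 0.15 * u * u) for u in ru]

# Initial values: least-squares focal length ignoring distortion, k1 = 0.
f0 = sum(Ri * u for Ri, u in zip(R, ru)) / sum(u * u for u in ru)
f_est, k_est = lm_refine(ru, R, f0, 0.0)
```

Even though the zero-distortion initial focal length is biased (the fit absorbs some of the distortion), it lies close enough to the truth that the iteration converges to the single global optimum, mirroring the single-solution behavior discussed above.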
Our proposed method uses some known extrinsic parameters, i.e., the camera position in this paper, which makes the camera position an essential factor for numerical stability and noise sensitivity. The camera position is given with high precision by RTK or a total station and hence has a low error of 0-3 cm. As described in Section 3.1.4, our proposed method is robust to camera position noise, and this robustness is one reason why it achieves better numerical stability and noise sensitivity. Furthermore, a good initial solution is utilized, given by a general PnP solver without distortion; such a solver alone cannot reach the globally optimal solution, but it provides initial values close to it. In this paper, this good initial solution is the main reason why our method has better numerical stability and noise sensitivity. Lastly, the solving process is mainly linear, except for the intermediate variables, which is another reason why our proposed method has lower error, as described in Section 3.1.2.
Since our proposed method achieves better performance in terms of numerical stability and noise sensitivity, and the camera position is given with high precision by a total station, we can obtain good results in the measurement of point position and reprojection for real images.
However, the major drawback of our proposed method is that it is 3.4% slower than the general solver. This is caused by the small iteration step size. If we increase the step size to raise the computational speed until our method only matches the general solver in accuracy, it becomes 17.6% faster than the general solver. In other words, our proposed method is slightly slower, as described in Section 3.1.3, because the small step size improves its accuracy. In practical applications, depending on the accuracy requirements, the iteration step size can be changed to increase or decrease the computational speed.

Future Work
In this paper, the iteration step size of the LM algorithm has a significant impact on computational speed and accuracy. Currently, our proposed method chooses the step size based on experimental experience. Hence, in future work we will establish a method for adapting the iteration step size, which will choose the most appropriate step size automatically to balance computational speed against accuracy.

Conclusions
We have proposed a new method to estimate the camera pose, focal length and radial distortion simultaneously using three 2D-3D point correspondences and a known camera position. This method has two key features that enable it to obtain a single solution efficiently and accurately. The first is the important geometric characteristic of radial distortion, that the orientation of an image point with respect to the center of distortion is unchanged under radial distortion, which is used to solve our problem. With this characteristic, the focal length and up to fourth-order radial distortion can be determined iteratively, and the approach applies to multiple distortion models. The second is that the values with no radial distortion are used as the initial values, which are close to the globally optimal solution, so our problem can be solved efficiently and accurately.
The experimental results indicate that our proposed method performs well in terms of numerical stability and noise sensitivity for synthetic and real data. It is particularly suitable for cases wherein a wide-angle or zoom lens with heavy distortion is used.