Robust and Accurate Vision-Based Pose Estimation Algorithm Based on Four Coplanar Feature Points

Vision-based pose estimation is an important application of machine vision. Currently, analytical and iterative methods are used to solve the object pose. The analytical solutions generally take less computation time. However, the analytical solutions are extremely susceptible to noise. The iterative solutions minimize the distance error between feature points based on 2D image pixel coordinates. However, the non-linear optimization needs a good initial estimate of the true solution, otherwise they are more time consuming than analytical solutions. Moreover, the image processing error grows rapidly with measurement range increase. This leads to pose estimation errors. All the reasons mentioned above will cause accuracy to decrease. To solve this problem, a novel pose estimation method based on four coplanar points is proposed. Firstly, the coordinates of feature points are determined according to the linear constraints formed by the four points. The initial coordinates of feature points acquired through the linear method are then optimized through an iterative method. Finally, the coordinate system of object motion is established and a method is introduced to solve the object pose. The growing image processing error causes pose estimation errors the measurement range increases. Through the coordinate system, the pose estimation errors could be decreased. The proposed method is compared with two other existing methods through experiments. Experimental results demonstrate that the proposed method works efficiently and stably.


Introduction
With the development of modern industrial technology, quickly and accurately determining the position and orientation between objects is becoming more and more important. This process is called pose estimation. Vision-based pose measurement technology, also known as perspective-n-point (PNP) problem, is to determine the position and orientation of a camera and a target with n feature points on the condition of knowing their world coordinates and 2D image pixel coordinates. It has the advantages of non-contact and high efficiency. It can be widely applied in robotics [1], autonomous aerial refueling [2], electro-optic aiming systems [3,4], virtual reality [5], etc.
The existing approaches to solve object poses can be divided into two categories: analytical algorithms and iterative algorithms. Analytical algorithms apply linear methods to obtain algebraic solutions: Hu et al. [6] uses four points to achieve a linear solution. Lepetit et al. [7] proposed a non-iterative solution named EPnP algorithm. Tang et al. [8] presented a linear algorithm on the condition of five points. Ansar et al. [9] presented a general framework to directly recover the rotation and translation. Duan et al. [10] introduced a new affine invariant of trapezium to realize pose estimation and plane measurement.
As for iterative algorithms, pose estimation is formulated as a nonlinear problem with the constraints, and then it is solved using nonlinear optimization algorithms, most typically, Levenberg-Marquardt method. DeMenthon et al. [11][12][13] proposed POSIT algorithm. It gets the initial value of the solution using a scaled orthographic model to approximate the perspective projection model. Zhang et al. [14,15] proposed a two-stage iterative algorithm. The iterative algorithm is divided into a depth recovery stage and an absolute orientation stage. Peng et al. [16] achieved the object pose non-linearly on the basis of five points using a least squares approach. Liu et al. [17] gets the object pose based on the corresponding geometrical constraints formed by four non-coplanar points. Zhang et al. [18] proposed an efficient solution for vision-based pose determination of a parallel manipulator. Fun et al. [19] proposed a robust and high accurate pose estimation method to solve the PNP problem in real time.
The analytical solutions generally take less computation time. However, they are sensitive to observation noise and usually obtain lower accuracy. As for iterative solutions, the pose estimation is translated into a nonlinear least squares problem through the distance constraints. Then the distance error between feature points is minimized based on 2D image pixel coordinates. However, the iterative solutions also have drawbacks: (1) non-linear optimization needs a good initial estimate of the true solution, (2) they are more time consuming than analytical solutions.
During the pose estimation process, the image processing errors emerge mainly because the perspective projection point of the feature marker center and the perspective projection image center of feature markers do not coincide. The current papers focus on the reduction of image processing errors through better extraction of the image center [20][21][22]. However, those methods are not powerful enough to eliminate the inconsistency between the perspective projection point of the feature marker center and the center of the corresponding perspective projection image, especially when the measurement range increases.
Based on the discussion above, in this paper a robust and accurate pose estimation method based on four coplanar points is proposed: In the first step, by utilizing the linear constraints formed by points, the coordinates of points in the camera coordinate system are solved analytically. The results obtained in the first step are then set as the initial values of an iterative solving process to ensure the accuracy and convergence rate of non-linear algorithm. The Levenberg-Marquardt optimization method is utilized to refine the initial values. In the second step, the coordinate system of object motion is established and the object pose is finally solved. The growing image processing error causes greater pose estimation errors with an increasing measurement range. Through the coordinate system, the pose estimation errors could be decreased.
The rest of the paper is organized as follows: Sections 2 and 3 propose a robust and accurate pose estimation method. Section 4 provides some experiments to examine the method. Section 5 gives the conclusion.

The Solving of Feature Point Coordinates in the Camera Coordinate System
The target pattern with four coplanar points is designed for pose estimation as shown in Figure 1. P 0 , P 1 , P 2 and P 3 form a trapezium. P 0 P 1 is parallel to P 2 P 3 . P 0 is set as the origin of coordinate system. The connecting line of P 0 and P 1 is Y axis.
In this way, the world coordinate system (target coordinate system) is constructed.
To achieve the solution of target position, the coordinates of points in the camera coordinate system need to be solved first. o c − x c y c z c is the camera coordinate system. o w − x w y w z w is the world coordinate system. The coordinates of each point in the camera coordinate system is P ci = (x ci , y ci , z ci ) T . The coordinates of each point in the world coordinate system are P wi = (x wi , y wi , z wi ) T . P wi is known before solving the pose. The corresponding ideal image coordinates are I ui = (x ui , y ui , 1) T (i = 0, 1, 2, 3). The relationship of P ci and I ui could be described as where (u 0 , v 0 ) is the center pixel of the computer image, s x is the uncertainty image factor, d x and d y are center to center distances between pixels in the row and column directions respectively, f is the focal length, and λ i is the projection depth of P i .
where (u0, v0) is the center pixel of the computer image, sx is the uncertainty image factor, dx and dy are center to center distances between pixels in the row and column directions respectively, f is the focal length, and i  is the projection depth of Pi. The coordinates of points in the camera coordinate system could be obtained by solving i  .
Since P0P1 is parallel to P2P3, two linear constraints are introduced. Then i  is solved (The solving process is in the Appendix).
The coordinates of points in the camera coordinates system is obtained by i  . So, the vector from the optical center Oc to each point could be calculated through Equation (3).
The length between the feature points could be calculated through Equation (4).
In order to maintain the planarity of the four points, the coplanar constraint (CO) expressed as Equation (5) should also be considered. The coordinates of points in the camera coordinate system could be obtained by solving λ i . Since P 0 P 1 is parallel to P 2 P 3 , two linear constraints are introduced. Then λ i is solved (The solving process is in the Appendix A).
The coordinates of points in the camera coordinates system is obtained by λ i . So, the vector from the optical center O c to each point could be calculated through Equation (3).
The length between the feature points could be calculated through Equation (4).
In order to maintain the planarity of the four points, the coplanar constraint (CO) expressed as Equation (5) should also be considered.
Through Equation (6), the Levenberg-Marquardt optimization method is then used to solve λ i . The value of λ i obtained in Equation (2) is used as the initial value to ensure the accuracy and convergence speed of the algorithm.

The Solving of Object Pose
The feature points coordinates in the camera coordinate system P c0 , P c1 , P c2 , P c3 are calculated through λ i . The following step is to solve the object pose. As shown in Figure 2, o m − x m y m z m is the rotation and translation coordinate system of target. The three free degrees in rotation are yaw (y m axis and α angle), pitch (x m axis and β angle), and roll (z m axis and γ angle).
Through Equation (6), the Levenberg-Marquardt optimization method is then used to solve i  .
The value of i  obtained in Equation (2) is used as the initial value to ensure the accuracy and convergence speed of the algorithm.

The Solving of Object Pose
The feature points coordinates in the camera coordinate system (2) The target rotates around xm axis. The images of target are captured in the locations of different rotation angles. N2 groups of spatial coordinates 0 c P can be obtained. With the least squares method, the N2 groups of spatial coordinates 0 c P are used to fit a plane: Ex + Fy + Gz + H = 0. The vector of the xm axis is, o m − x m y m z m is established according to the following steps: (1) The target rotates around y m axis. The images of the target are captured in the locations of different rotation angles. N 1 groups of spatial coordinates P c0 can be obtained. With the least squares method, the N 1 groups of spatial coordinates P c0 are used to fit a plane: Ax + By + Cz + D = 0. The vector of the y m axis is, v = r 4 r 5 r 6 (2) The target rotates around x m axis. The images of target are captured in the locations of different rotation angles. N 2 groups of spatial coordinates P c0 can be obtained. With the least squares method, the N 2 groups of spatial coordinates P c0 are used to fit a plane: Ex + Fy + Gz + H = 0. The vector of the x m axis is, (3) The point sets P c0 obtained in steps (1) and (2) share one rotation center. A sphere-fitting is adopted to describe the center. According to the following sphere-fitting equation, the sphere center could be calculated. The sphere center is the rotation center (the origin of o m − x m y m z m ).
(x om , y om , z om ) is the sphere center and r is the sphere radius.
The coordinates of feature points in o m − x m y m z m are represented with P mi . The positioning accuracy of the feature points during image processing have a direct impact on the pose estimation accuracy. The automatic identification of circular markers is more convenient. There are many available algorithms for the center positioning of circular markers. So in this paper, the center point of the circular markers is selected as the feature point. The nature of imaging is the perspective projection. It has the characteristic that the object is big when near and small when far. This causes the perspective projection point of the circular marker's center and the center of the corresponding perspective projection image (usually an ellipse) to not coincide, especially when the measurement range is greater [23]. This inconsistency causes image processing errors. Then image processing errors emerge, which may result in the pose estimation errors (The ideal image coordinates I ui = (x ui , y ui , 1) T in Equation (1) are obtained through these image processing steps). In order to improve measurement accuracy, the following method is introduced.
The point sets P c0 obtained in Section 2 represent the known typical position of a target in the moving space. It means N typical positions of a target. The pose of a target in the typical position is known. According to Equation (10), N P m0 are obtained. Then the coordinates of P 0 in o m − x m y m z m in the typical position j are represented with Q mj 0 . P mk i represents the coordinates of feature points in o m − x m y m z m in the location k. The nearest Q mj 0 to P mk 0 is Q mJ 0 as shown in Figure 3. P mk i is calculated as Each pose vector in the moving space corresponds to three angles. They are α (yaw angle), β (pitch angle), and γ (roll angle). They could be calculated as shown in Equation (12  Each pose vector in the moving space corresponds to three angles. They are α (yaw angle), β (pitch angle), and γ (roll angle). They could be calculated as shown in Equation (12).
α O m Q mJ i , β O m Q mJ i and γ O m Q mJ i are known quantities. α Q mJ i P mk i , β Q mJ i P mk i and γ Q mJ i P mk i could be calculated according to Equation (13). In this way, image processing error caused by the inconsistency mentioned above is significantly reduced, especially when the measurement range is greater. The pose sin α cos γ cos α sin γ − sin α sin β sin α cos γ − cos β sin γ sin β sin α sin γ + cos β cos γ sin β cos α cos β sin α cos γ + sin β sin γ cos β sin αSγ − sin β cos γ cos β cos α   

Experiment Results
The experimental system that consists of a target, a three-axis rotation stage, and two CCD cameras is shown in Figure 4. The rotation range of stage in yaw axis, pitch axis, roll axis are ±160 • , ±90 • , ±60 • respectively.

Experiment Results
The experimental system that consists of a target, a three-axis rotation stage, and two CCD cameras is shown in Figure 4. The rotation range of stage in yaw axis, pitch axis, roll axis are ±160°, ±90°, ±60° respectively.  Table 1 [24]. k1, k2 are the radial distortion coefficients. p1, p2 are the tangential distortion coefficients.   Table 1 [24]. k 1 , k 2 are the radial distortion coefficients. p 1 , p 2 are the tangential distortion coefficients. The images are captured with a single CCD camera in different locations. The feature point coordinates in the camera coordinate system are obtained in the current location. Then the rotation and translation coordinate system of target o m − x m y m z m is established. α O m Q mk i , β O m Q mk i and γ O m Q mk i which could be used to represent the target pose are solved.
The target pose measurement experiment includes three parts: computer simulation experiments, real image experiments for accuracy, and real image comparative experiments with other methods.

Computer Simulation Experiments
To validate accuracy and noise immunity of the algorithm in this paper, the algorithm proposed is compared with the Oberkampf POSIT algorithm [11][12][13] and the Duan algorithm [10] during the computer simulation experiment process. In this process, the pinhole imaging model of a camera is simulated, thus the points are transformed with perspective projection and the simulated image coordinates of points are acquired. Gaussian noise from 0 pixels to 4 pixels are added to the point's coordinates of images. The relative errors of estimated poses using the proposed algorithm, Duan algorithm, and Oberkampf POSIT algorithm are shown in Figure 5.
It can be seen that the errors are reduced after optimization and they increase with the noise level. It is noticeable that the method proposed produces better results than the other two methods, especially when the noise level is greater than one pixel. In addition, it is noted that the errors keep almost the same level under lower noise disturbance. This phenomenon can be explained as follows: owing to the fact that the Duan algorithm does not have an iterative solving process, this leads to a relatively high error. The Oberkampf POSIT method takes no account of both the coplanarity constraint and initial value of iteration process. With the noise level increasing, the factor of error changes from initial value of the iteration process to coplanarity. This results in a higher error than the proposed algorithm.  Figure 5.
It can be seen that the errors are reduced after optimization and they increase with the noise level. It is noticeable that the method proposed produces better results than the other two methods, especially when the noise level is greater than one pixel. In addition, it is noted that the errors keep almost the same level under lower noise disturbance. This phenomenon can be explained as follows: owing to the fact that the Duan algorithm does not have an iterative solving process, this leads to a relatively high error. The Oberkampf POSIT method takes no account of both the coplanarity constraint and initial value of iteration process. With the noise level increasing, the factor of error changes from initial value of the iteration process to coplanarity. This results in a higher error than the proposed algorithm.

The Measurement Experiments for Accuracy
The measurement range of yaw angle, pitch angle, and roll angle is set to −35°~35°. The measurement errors are shown in Figure 6. The measurement error is the absolute difference of the estimated angle to the true angle.

The Measurement Experiments for Accuracy
The measurement range of yaw angle, pitch angle, and roll angle is set to −35 •~3 5 • . The measurement errors are shown in Figure 6. The measurement error is the absolute difference of the estimated angle to the true angle.

The Measurement Experiments for Accuracy
The measurement range of yaw angle, pitch angle, and roll angle is set to −35°~35°. The measurement errors are shown in Figure 6. The measurement error is the absolute difference of the estimated angle to the true angle. Figure 6. The measurement error of yaw, pitch, and roll angle. Figure 6. The measurement error of yaw, pitch, and roll angle.

The Comparative Experiments for Accuracy
The algorithm proposed in this paper, Duan algorithm, and Oberkampf POSIT algorithm are used to calculate target poses. The root mean square (RMS) errors are displayed in Figure 7. Each point in the figure is the RMS error over multiple measurements. As to the Duan algorithm, when the rotation angle is greater, the RMS error of our method is less than that of Duan algorithm. The experiment results demonstrate that the analytical pose measurement process is sensitive to the observation noise. This error could be reduced by the iterative process in our method. At the same time, in our algorithm a good initial value is provided at the beginning of iteration to ensure pose estimation accuracy. As to the Oberkampf POSIT algorithm, experiment results demonstrate that the image processing error mentioned in Section 3 exists in the pose measurement process. Through the method in Section 3, the error could be eliminated successfully, especially when the measurement range is greater.
By comparing the results of our method and those of the Oberkampf POSIT and Duan algorithms, it is obvious that the measurement accuracy of our method is higher than those in the whole moving space.
As to the real applications, we first take images for the calibration board of the camera, as shown in Figure 8. The four corner circular markers (marked in red) of the calibration board are used to form a trapezium. The center points of the four circular markers are extracted from the images. Then the lengths of the four sides of the trapezium are calculated with the algorithm proposed in this paper, Duan algorithm, and Oberkampf POSIT algorithm. The measurement errors are shown in Table 2. Figure 9 shows the image sets of the calibration board. D 01 is the distance between P 0 and P 1 . D 02 is the distance between P 0 and P 2 . D 23 is the distance between P 2 and P 3 . D 13 is the distance between P 1 and P 3 . The measurement error is the absolute difference of the estimated length to the true length. It can be seen from Table 2 that the results of our algorithm are more close to the real value. The measurement error of our method is less than 0.2 mm. The algorithm proposed in this paper is more effective. the method in Section 3, the error could be eliminated successfully, especially when the measurement range is greater.
By comparing the results of our method and those of the Oberkampf POSIT and Duan algorithms, it is obvious that the measurement accuracy of our method is higher than those in the whole moving space. As to the real applications, we first take images for the calibration board of the camera, as shown in Figure 8. The four corner circular markers (marked in red) of the calibration board are used to form a trapezium. The center points of the four circular markers are extracted from the images. Then the lengths of the four sides of the trapezium are calculated with the algorithm  Table 2. Figure 9 shows the image sets of the calibration board. D01 is the distance between P0 and P1. D02 is the distance between P0 and P2. D23 is the distance between P2 and P3. D13 is the distance between P1 and P3. The measurement error is the absolute difference of the estimated length to the true length. It can be seen from Table 2 that the results of our algorithm are more close to the real value. The measurement error of our method is less than 0.2 mm. The algorithm proposed in this paper is more effective.  Figure 8. The image of P 0 , P 1 , P 2 , and P 3 on the calibration board. proposed in this paper, Duan algorithm, and Oberkampf POSIT algorithm. The measurement errors are shown in Table 2. Figure 9 shows the image sets of the calibration board. D01 is the distance between P0 and P1. D02 is the distance between P0 and P2. D23 is the distance between P2 and P3. D13 is the distance between P1 and P3. The measurement error is the absolute difference of the estimated length to the true length. It can be seen from Table 2 that the results of our algorithm are more close to the real value. The measurement error of our method is less than 0.2 mm. The algorithm proposed in this paper is more effective.   Then we test the proposed algorithm by experimentally detecting the pose information of a rotation stage in yaw angle, pitch angle, and roll angle. Figure 10 shows the image sets of the rotation stage with feature points. Through the target on the rotation stage as shown in Figure 10, its pose is calculated using the algorithm proposed in this paper, Duan algorithm, and Oberkampf POSIT algorithm. The measurement errors are shown in Table 3. The measurement error is the absolute difference of the estimated angle to the true angle. It can be seen from Table 3 that the results of our algorithm are more close to the real value. The measurement error of our method is less than 0.2 • . Using the same information extracted from the images, the proposed algorithm has an advantage to identify the pose of the rotation stage. Then we test the proposed algorithm by experimentally detecting the pose information of a rotation stage in yaw angle, pitch angle, and roll angle. Figure 10 shows the image sets of the rotation stage with feature points. Through the target on the rotation stage as shown in Figure 10, its pose is calculated using the algorithm proposed in this paper, Duan algorithm, and Oberkampf POSIT algorithm. The measurement errors are shown in Table 3. The measurement error is the absolute difference of the estimated angle to the true angle. It can be seen from Table 3 that the results of our algorithm are more close to the real value. The measurement error of our method is less than 0.2°. Using the same information extracted from the images, the proposed algorithm has an advantage to identify the pose of the rotation stage.   Finally, we estimated the head pose. The pose measurement system based on inertial technology is mounted on a helmet to calculate the head pose as shown in Figure 11. The results are taken as the real values. The corresponding images of the head are captured and shown in Figure 12. The vertices of the trapezium are the outer corners of the two eyes and the mouth. The image points of the four vertices on each image are located in a manual way. The head pose is calculated using the algorithm proposed in this paper, Duan algorithm, and Oberkampf POSIT algorithm. The first image (Frame 0) in Figure 12 is set as the initial position (zero position). The angle between the current positions (Frame 1, Frame 2, Frame 3, and Frame 4) and the initial position (Frame 0) are calculated. The measurement errors are shown in Table 4. The measurement error is the absolute difference of the estimated angle to the real angle.

D13 (mm)
image points of the four vertices on each image are located in a manual way. The head pose is calculated using the algorithm proposed in this paper, Duan algorithm, and Oberkampf POSIT algorithm. The first image (Frame 0) in Figure 12 is set as the initial position (zero position). The angle between the current positions (Frame 1, Frame 2, Frame 3, and Frame 4) and the initial position (Frame 0) are calculated. The measurement errors are shown in Table 4. The measurement error is the absolute difference of the estimated angle to the real angle.   image points of the four vertices on each image are located in a manual way. The head pose is calculated using the algorithm proposed in this paper, Duan algorithm, and Oberkampf POSIT algorithm. The first image (Frame 0) in Figure 12 is set as the initial position (zero position). The angle between the current positions (Frame 1, Frame 2, Frame 3, and Frame 4) and the initial position (Frame 0) are calculated. The measurement errors are shown in Table 4. The measurement error is the absolute difference of the estimated angle to the real angle.    It can be seen from Table 4 that the results of our algorithm are closer to the real value. The rough localization of the control points (the trapezium) in the image lead to the promotion of the measurement errors. However, our method could decrease these kinds of measurement errors compared with the other two methods. The errors of yaw, pitch, and roll are not independent. The errors of yaw, pitch, and roll influence each other for they constitute the rotation matrix together. When the head rotated in all the axes, as the errors of yaw, pitch, and roll influence each other, measurement errors are promoted.
The distance between object and camera is limited by the field of view. The corresponding experiment results (RMS errors) are shown in Table 5. When the camera is in different locations, the pose of the target is calculated. In Position 1 and Position 2, the camera is near to the target and the target is out of the field of view of the camera. In Position 5 and Position 6, the camera is far from the target and the target is out of the field of view of the camera. In Position 3 and Position 4, the target is in of the field of view of the camera. It can be seen as follows: if the object is out of the range of the field of view, the measurement accuracy is lower. If the object is in the range of the field of view, the measurement accuracy is higher and little influenced by the distance between object and camera. So in order to obtain a higher accuracy, it is needed to ensure that the object is in the range of the field of view of the camera.

Conclusions
A robust and accurate vision-based pose estimation algorithm based on four coplanar feature points is presented in this paper. By combining the advantages of the analytical methods and the iterative methods, the iterative process is given a preferable initial value acquired independently by an analytical method. At the same time, the anti-noise ability is strengthened and the result is more stable. The pose estimation errors depend on the feature extraction accuracy. When the measurement range is greater, the image processing errors become greater. In the algorithm proposed, the coordinate system of object motion is established to solve the object pose. In this way, the image processing error which may result in the pose estimation errors could be reduced.
According to both Equations (A4) and (A5), λ i is solved as shown in Equation (2) in the paper.