Simultaneous Calibration of Odometry and Head-Eye Parameters for Mobile Robots with a Pan-Tilt Camera

In the field of robot navigation, the odometric parameters, such as the wheel radii and wheelbase length, and the relative pose of the camera with respect to the robot are critical for accurate operation, so they must be estimated precisely. However, the odometric and head-eye parameters are typically estimated separately, which is inconvenient and lengthens the calibration time. Although several researchers have proposed simultaneous calibration methods that obtain both the odometric and head-eye parameters at once to reduce the calibration time, these methods apply only to mobile robots with a fixed camera, not to mobile robots equipped with a pan-tilt motorized camera system, which is a very common configuration widely used to widen the field of view. Previous approaches also could not provide the z-axis translation of the head-eye transformation on mobile robots equipped with a pan-tilt camera. In this paper, we present a fully simultaneous mobile robot calibration of the head-eye and odometric parameters that is appropriate for a mobile robot with a camera mounted on a pan-tilt motorized device. After a set of visual features obtained from a chessboard or a natural scene and the odometry measurements are synchronized and received, both the odometric and head-eye parameters are iteratively adjusted until convergence, before a nonlinear optimization method refines them for higher accuracy.


Introduction
Robot navigation is one of the key challenges in mobile robotics, because mobile robots must drive themselves through a given environment using information gathered from their sensors. These sensors include proprioceptive sensors, such as motor-speed sensors, wheel-load sensors, joint-angle sensors, battery-voltage sensors, and the inertial measurement unit (IMU), which produce odometric data. Additionally, these robots are fitted with exteroceptive sensors such as image-feature sensors, distance sensors, light-intensity sensors, sound-amplitude sensors, and global positioning system (GPS) receivers. However, robot localization is typically inaccurate due to the uncertainty associated with measurement errors in the robot configuration. Although the robot configuration data, such as the wheel radii and wheelbase length, can be obtained simply from the robot specifications or by manual measurement, the actual parameters can differ in practice. This is due to systematic errors, such as manufacturing errors, assembly errors, tire pressure variations, and load variations, that reduce the precision of the odometry and of the camera pose relative to the robot, particularly when the camera is mounted on a motorized neck. Section 4 presents our experimental results, and the conclusions are summarized in Section 5.

Robot Coordinate System
The mobile robot configuration in Figure 1a can be understood through the overview of its coordinate systems depicted in Figure 1b. It consists of the vehicle coordinate system (the robot's base), $O_{Vehicle}$; the neck coordinate system, $O_{Neck}$, which includes a pan-tilt joint; and the camera coordinate system, $O_{Cam}$, for the camera mounted on top. The relation between $O_{Vehicle}$ and $O_{Neck}$ is estimated using Denavit-Hartenberg (DH) parameters, depending on the mobile robot configuration. The rotation from the neck to the camera and the rotations between the robot's base poses are estimated in Section 3.2. The remaining parameters are calculated in Section 3.4.
Figure 1. Mobile robot configuration. (a) Robot having a pan-tilt neck equipped with a camera (front view); (b) coordinate system of the mobile robot configuration (side view).
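To make the role of the DH parameters concrete, the following is a minimal Python sketch of how $T^{V}_{N}$ could be assembled for a pan-tilt neck. The two-joint chain, the link offsets, and the joint ordering here are illustrative assumptions, not the robot's actual DH table.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Standard Denavit-Hartenberg link transform (4x4 homogeneous matrix)."""
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.,       sa,       ca,      d],
                     [0.,       0.,       0.,     1.]])

def T_vehicle_neck(pan, tilt, neck_height=0.8, tilt_offset=0.05):
    """Assumed pan-tilt chain: a pan joint about the vertical axis, then a
    tilt joint; neck_height and tilt_offset are hypothetical link lengths."""
    return dh_transform(pan, neck_height, 0.0, -np.pi / 2) @ \
           dh_transform(tilt, 0.0, tilt_offset, 0.0)
```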

Robot Wheel Parameters
In the field of robot navigation, one of the important parameter sets for mobile robot calibration is the wheel parameters: the radii of the left and right wheels and the baseline (the axle length between the left and right wheels), as shown in Figure 2. The robot kinematics can be expressed as

$$\dot{x} = \upsilon \cos\theta, \qquad \dot{y} = \upsilon \sin\theta, \qquad \dot{\theta} = \omega, \tag{1}$$

where $(x, y)$ is the robot's position and $\upsilon$, $\omega$, and $\theta$ are the velocity, angular velocity, and orientation of the mobile robot, respectively, as depicted in Figure 2. These quantities can be obtained using the following equation:

$$\upsilon = \frac{r_R\,\omega_R + r_L\,\omega_L}{2}, \qquad \omega = \alpha_R\,\omega_R + \alpha_L\,\omega_L, \tag{2}$$

where $\alpha_R = \frac{r_R}{b}$ and $\alpha_L = -\frac{r_L}{b}$. The wheel parameters $r_R$, $r_L$, and $b$ are the right and left wheel radii and the length of the baseline, respectively. $\omega_R$ and $\omega_L$ are the angular velocities calculated from the encoders on the right and left wheels. The ratios between the wheel radii and the wheelbase length are represented by the intermediate parameters, $(\alpha_R, \alpha_L)$, which are used in the calibration process instead of the real wheel parameters. The rotational angle of the mobile robot from frame $i$ to frame $j$, with $t_i = 0$ and $t_j = t$, can be obtained by integrating Equation (2) with respect to time as follows:

$$\theta(t) = \int_{0}^{t} \omega \, dt = \alpha_R\,\varphi_R(t) + \alpha_L\,\varphi_L(t), \tag{3}$$

where $\varphi_R(t)$ and $\varphi_L(t)$ are the encoder positions of the right and left wheels, respectively.
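As a concrete illustration of Equation (3), the sketch below computes the heading change from the wheel encoder positions; the nominal wheel radius and wheelbase values are assumed only for the example.

```python
import numpy as np

def robot_heading(phi_R, phi_L, alpha_R, alpha_L, theta0=0.0):
    """Heading of a differential-drive robot from encoder positions,
    following Equation (3): theta(t) = alpha_R*phi_R(t) + alpha_L*phi_L(t).

    phi_R, phi_L     : encoder positions (rad) of the right/left wheels
    alpha_R, alpha_L : intermediate parameters r_R/b and -r_L/b
    """
    return theta0 + alpha_R * np.asarray(phi_R) + alpha_L * np.asarray(phi_L)

# Example with assumed nominal values: r_R = r_L = 0.05 m, b = 0.30 m.
alpha_R, alpha_L = 0.05 / 0.30, -0.05 / 0.30
print(robot_heading(2.0, -2.0, alpha_R, alpha_L))  # in-place rotation
```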

Simultaneous Calibration for Mobile Robot with Pan-Tilt Camera
In this section, we describe the proposed calibration method in six parts. The closed-loop transformation from the camera and the robot base to the robot neck between any frames $i$ and $j$, where $i = 1, \dots, N-1$ and $j = i + 1$, is concisely described in Section 3.1. A set of captured images, $\{I_n\}_{n=1}^{N}$, and a calibration data set (the rotating angles of the wheels, $(\varphi_{R_i}, \varphi_{L_i})$, and the transformations from the robot base coordinate system to the robot's neck coordinate system, $T^{V_i}_{N_i}$) are obtained once as the calibration input data. Sections 3.2-3.4 explain how these data are used to obtain the head-eye rotation, the intermediate wheel parameters, and the head-eye translation together with the actual wheel parameters, respectively. These estimates are refined iteratively until all parameters converge, as described in Section 3.5. Finally, Section 3.6 describes the non-linear optimization method that increases the accuracy of the calibration results.

Closed-Loop Transformations
Let us now consider the homogeneous transformations between different vehicle poses from frame $i$ to frame $j$, as shown in Figure 3. The abbreviations $O_C$, $O_N$, and $O_V$ denote the camera, neck, and vehicle coordinate systems, respectively. The closed-loop diagram can be represented by Equation (4):

$$T^{V_i}_{V_j}\, T^{V_j}_{N_j}\, T^{N}_{C} = T^{V_i}_{N_i}\, T^{N}_{C}\, T^{C_i}_{C_j}, \tag{4}$$

where $T^{V}_{N}$ is the physical relationship between $O_{Neck}$ and $O_{Vehicle}$, and $T^{N}_{C}$ is the head-eye homogeneous transformation. When the camera is directed at the same target or feature, the camera motion is represented by $T^{C_i}_{C_j}$, and the robot's motion between frames $i$ and $j$ is represented by $T^{V_i}_{V_j}$. The homogeneous transformation in Equation (4) can be decomposed into rotational and translational terms as follows:

$$R^{V_i}_{V_j}\, R^{V_j}_{N_j}\, R^{N}_{C} = R^{V_i}_{N_i}\, R^{N}_{C}\, R^{C_i}_{C_j}, \tag{5}$$

$$R^{V_i}_{V_j}\left(R^{V_j}_{N_j}\, t^{N}_{C} + t^{V_j}_{N_j}\right) + t^{V_i}_{V_j} = R^{V_i}_{N_i}\, R^{N}_{C}\, t^{C_i}_{C_j} + R^{V_i}_{N_i}\, t^{N}_{C} + t^{V_i}_{N_i}, \tag{6}$$

where $R$ is a $3 \times 3$ rotation matrix and $t$ is a $3 \times 1$ translation vector. Equation (5) is used to obtain the head-eye rotation parameters, $R^{N}_{C}$, which have 3 degrees of freedom (DOF), $(\gamma_x, \gamma_y, \gamma_z)$. Equation (6) refers to the translation of the system and is used to estimate the head-eye translation $(t^{N}_{C,x}, t^{N}_{C,y}, t^{N}_{C,z})$ and the actual wheel parameters $(r_R, r_L, b)$. These parameters are calculated in Section 3.4.
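The closed loop of Equation (4) and its decomposition into Equations (5) and (6) can be checked numerically. The sketch below composes assumed transforms (the head-eye and neck poses are placeholders) and recovers the camera motion with its rotational and translational parts.

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t).ravel()
    return T

def Rz(theta):
    """Planar rotation about the z-axis (robot base motion)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Assumed placeholder transforms, for illustration only.
T_NC   = make_T(Rz(0.1), [0.02, 0.0, 0.15])            # head-eye transform
T_ViNi = T_VjNj = make_T(np.eye(3), [0.0, 0.0, 0.8])   # neck pose (from DH)
T_ViVj = make_T(Rz(0.3), [0.1, 0.05, 0.0])             # planar base motion

# Rearranging Equation (4) gives the camera motion between the two frames;
# its blocks are the rotation of Eq. (5) and the translation of Eq. (6).
T_CiCj = np.linalg.inv(T_ViNi @ T_NC) @ T_ViVj @ T_VjNj @ T_NC
R_CiCj, t_CiCj = T_CiCj[:3, :3], T_CiCj[:3, 3]
```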

Head-Eye Rotation Estimation
Head-eye calibration involves six parameters, three for rotation and three for translation, which must be obtained before operation. In this section, the three rotation parameters between the robot's neck and the camera are calculated precisely. A set of visual measurements and robot movement data is obtained by moving the robot and capturing images synchronously and continuously. The rotation between the camera and the robot's base was estimated accurately by Antonelli et al. [11], who obtained the rotation parameters between the robot's base and the camera using the equivalent angle-axis representation. However, their approach worked because their mobile robot had a camera mounted on a fixed neck, so the relation between the robot's base and the camera never changed. If that relation changes while the calibration data are being collected, the equivalent angle-axis method cannot solve the problem. While collecting the input calibration data for our mobile robot, both the robot and its neck move, which means the rotation between the base and the camera also changes.
In fact, whenever the robot's neck moves about the pan-tilt axes, the coordinate system of the camera mounted on that neck also moves significantly. However, the relationship between the camera and the robot's neck is static, which means the subscript $i$ of $R^{N_i}_{C_i}$ can be dropped, i.e., $R^{N_i}_{C_i} = R^{N}_{C}$ for all $i = 1, \dots, N$, where $N$ is the total number of input images. Moreover, the mobile robot rotates only in a plane; in other words, the mobile robot rotates around the z-axis only [11]. Hence, $R^{V_i}_{V_j}$ in Equation (5) can be replaced with $R_z(\theta)$. Assuming the rotation matrices are known, as thoroughly described in Section 4, Equation (5) can be rewritten in the form $AX = XB$, and the rotation $R^{N}_{C}$ can be obtained as in [8,9]:

$$\underbrace{\left(R^{V_i}_{N_i}\right)^{-1} R_z(\theta)\, R^{V_j}_{N_j}}_{A}\; X = X\; \underbrace{R^{C_i}_{C_j}}_{B}, \tag{7}$$

where the matrix $X$ is the estimated head-eye rotation, $R^{N}_{C}$. The camera rotations, $R^{C_i}_{C_j}$, correspond to the matrix $B$, with $i = 1, \dots, N-1$ and $j = 2, \dots, N$.
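For reference, a common way to solve a rotation-only $AX = XB$ system in a least-squares sense is the Kronecker-product formulation. The sketch below is one possible solver (the paper cites [8,9] for its own method), assuming the rotation pairs are already available.

```python
import numpy as np

def solve_AX_XB_rotation(A_list, B_list):
    """Least-squares rotation X solving A_i X = X B_i over all pairs.

    Uses the identity vec(A X) - vec(X B)
    = (I kron A - B^T kron I) vec(X), takes the SVD null vector,
    and projects the result onto SO(3)."""
    M = np.vstack([np.kron(np.eye(3), A) - np.kron(B.T, np.eye(3))
                   for A, B in zip(A_list, B_list)])
    _, _, Vt = np.linalg.svd(M)
    X = Vt[-1].reshape(3, 3, order='F')   # column-major vec convention
    if np.linalg.det(X) < 0:              # resolve the sign ambiguity
        X = -X
    U, _, Wt = np.linalg.svd(X)           # nearest rotation matrix
    return U @ np.diag([1.0, 1.0, np.linalg.det(U @ Wt)]) @ Wt
```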

Intermediate Wheel Parameters Estimation
In this section, we establish the linear relationship between the intermediate wheel parameters, $(\alpha_R, \alpha_L)$, and the rotational angle of the robot's movement from the previous position to the position at which image $j$ is captured, $\theta_j$, which is obtained with Equation (3) over the period of the robot's movement between the two positions, with $\theta(1) = 0$ without loss of generality. The change in the rotational angle of the robot's base from frame $i$ to frame $j$ in the plane, $\theta(t_j)$, is also re-estimated. Because the robot moves on a plane, the change in rotational angle about the z-axis of the robot's base coordinate system, which is assumed to be perpendicular to the floor, between a pair of consecutive frames is $R_z(\theta)$. In practical applications, the z-axes of the robot's base coordinate systems in consecutive frames, $V_i$ and $V_j$, may not be parallel because of errors in the estimated $R^{N}_{C}$ and because the floor planes at the two positions are not parallel. Therefore, the rotational angle about the z-axis of the robot's base coordinates between frames $i$ and $j$ can be calculated using the ZYX Euler angles as follows:

$$\theta = \operatorname{atan2}(r_{21}, r_{11}), \tag{8}$$

where $r_{21}$ and $r_{11}$ are the corresponding elements of $R^{V_i}_{V_j}$. Considering all $N$ images, Equation (3) can be used to obtain the parameters, $(\alpha_R, \alpha_L)$, similar to [11], as follows:

$$\theta = \varphi_\theta \begin{bmatrix} \alpha_R \\ \alpha_L \end{bmatrix}, \tag{9}$$

where each row $\varphi_{\theta_j} = \begin{bmatrix} \varphi_R(t_j) & \varphi_L(t_j) \end{bmatrix}$ is obtained from the rotational angles of both wheels from position $i$ to $j$ ($i = 1, \dots, N-1$ and $j = 2, \dots, N$), and $\varphi_\theta$ is a matrix with $N \times 2$ dimensions. The intermediate wheel parameters, $(\alpha_R, \alpha_L)$, can be calculated using the linear least-squares method as follows:

$$\begin{bmatrix} \alpha_R \\ \alpha_L \end{bmatrix} = \left(\varphi_\theta^{\top}\, \varphi_\theta\right)^{-1} \varphi_\theta^{\top}\, \theta. \tag{10}$$
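Equations (8)-(10) amount to a two-parameter linear least-squares fit. A minimal sketch, assuming the wheel angles and base rotation angles are stacked per frame pair:

```python
import numpy as np

def base_rotation_angle(R_ViVj):
    """Equation (8): planar rotation angle from the base-to-base rotation."""
    return np.arctan2(R_ViVj[1, 0], R_ViVj[0, 0])

def estimate_alphas(phi_R, phi_L, theta):
    """Equations (9)-(10): stack the wheel angles into the N x 2 matrix
    phi_theta and solve for (alpha_R, alpha_L) by linear least squares."""
    Phi = np.column_stack([phi_R, phi_L])
    alphas, *_ = np.linalg.lstsq(Phi, np.asarray(theta), rcond=None)
    return alphas  # (alpha_R, alpha_L)
```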

Head-Eye Translation and Wheel Parameters Estimation
The head-eye rotation and intermediate parameters have already been obtained in Sections 3.2 and 3.3. The remaining parameters estimated in this section are the head-eye translation vector, $t^{N}_{C}$, and the actual wheel parameters, $r_R$, $r_L$, and $b$. The translational and rotational components of the robot's base motion can be described using the mobile robot kinematic equations in Equation (11), where $i$ and $j$ denote the previous and current frames, respectively, and $\tau$ is the period of time between frames. Substituting the intermediate wheel parameters and Equation (2) into Equation (11) yields Equation (12). In fact, $t^{V_i}_{V_j}$ is the relative translation in the x and y directions, which can be rewritten as Equation (13). From Equation (13), the translation between two robot base positions can be substituted into Equation (6), giving the relationship between the coordinate systems in Equation (14), whose known terms are collected into the matrices $A$ and $B$, respectively. Thus, Equation (14) can be simplified into Equations (15) and (16). In Equation (16), the third component of the vector $\begin{bmatrix} \lambda_1 & \lambda_2 & 0 \end{bmatrix}^{\top}$ is zero, which also makes the third row zero; Equation (17) can then be derived, in which the matrix in the left term has $2N \times 4$ dimensions and the right term is a vector with $2N$ elements, where $N$ is the total number of frames. From Equation (18), the head-eye translation and the actual radius of the left wheel, $(t^{N}_{C,x}, t^{N}_{C,y}, t^{N}_{C,z}, r_L)$, are estimated using a linear least-squares method. The remaining parameters, $r_R$ and $b$, can be obtained with Equation (2). The wheel parameters, $(r_L, r_R, b)$, obtained in this section are then used to re-estimate the head-eye rotation, as previously described.
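Once $r_L$ is known, recovering the remaining physical parameters from the intermediate ones follows directly from the definitions $\alpha_R = r_R/b$ and $\alpha_L = -r_L/b$. A small sketch:

```python
def recover_wheel_parameters(r_L, alpha_R, alpha_L):
    """Recover the wheelbase and right wheel radius from the estimated
    left wheel radius and the intermediate parameters of Equation (2)."""
    b = -r_L / alpha_L    # from alpha_L = -r_L / b
    r_R = alpha_R * b     # from alpha_R =  r_R / b
    return r_R, b
```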

Linearly Iterative Estimation
Even though the previous approach [11] could estimate the odometric and head-eye parameters simultaneously, it could not provide the complete set of six parameters: the z-axis translation between the head-eye coordinate systems, $t^{N}_{C,z}$, was not obtained. The proposed method provides a full mobile robot calibration of the odometric and head-eye parameters, building on [14]. The head-eye rotation and translation, including $t^{N}_{C,z}$, and the odometric parameters are estimated simultaneously and precisely. If the parameter values are not estimated correctly before optimization, all parameters may converge to incorrect values or diverge. Therefore, this section explains our contribution: we apply an iteration-based estimation to obtain initial guesses of all parameters that lead them to converge rapidly to the correct values. The processes of Sections 3.2-3.4 are computed repeatedly until all parameters converge. The head-eye rotation, $R^{N}_{C}$, is estimated as in Section 3.2 and then used to calculate the intermediate wheel parameters, $(\alpha_R, \alpha_L)$, of both wheels as in Section 3.3. These are then used to compute the remaining parameters, as described in Section 3.4. The results are in turn used to re-calculate the head-eye rotation following Section 3.2, as shown in Algorithm 1, in which steps 3 to 5 are computed repeatedly until convergence.
Algorithm 1 Full algorithm simultaneous calibration for mobile robot with pan-tilt camera.
Step 1: for each pair of frames $(i, j)$ do: obtain $T^{C_i}_{C_j}$ using the chessboard's corners or natural features; end for
Step 2: Initialize $r_L$, $r_R$, $b$ with manual measurements and obtain $\theta$ with Equation (3)
while not converged do
    Step 3: Compute $R^{N}_{C}$ with Equation (7)
    Step 4: Compute $\alpha_R$, $\alpha_L$ with Equations (8)-(10)
    Step 5: Compute $t^{N}_{C}$, $r_L$, $r_R$, $b$ with Equation (18)
end while
Step 6: Refine $R^{N}_{C}$, $t^{N}_{C}$, $r_L$, $r_R$, $b$ with Equation (19)
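The outer loop of Algorithm 1 is a fixed-point iteration over the parameter vector. A generic, runnable sketch of the convergence logic, with the step function standing in for the chain of Sections 3.2-3.4:

```python
import numpy as np

def iterate_until_convergence(step_fn, x0, tol=1e-8, max_iter=100):
    """Fixed-point iteration in the spirit of steps 3-5 of Algorithm 1:
    step_fn re-estimates the full parameter vector (head-eye rotation,
    alphas, translation and wheel parameters) from its current value."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = np.asarray(step_fn(x), dtype=float)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x
```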

Non-Linear Optimization
Even though the parameters estimated in the previous section provide a good initial estimate, they are probably not yet the correct, accurate values. Therefore, a function-minimization method, Powell's method [16], is applied to fine-tune all parameters so that they are as close as possible to the ground truth. The variables, consisting of the Euler angles of the head-eye rotation $R^{N}_{C}$, $(\gamma_x, \gamma_y, \gamma_z)$, the head-eye translation, $(t^{N}_{C,x}, t^{N}_{C,y}, t^{N}_{C,z})$, and the wheel parameters, $(r_L, r_R, b)$, are refined using the following equation:

$$\underset{\gamma_x, \gamma_y, \gamma_z,\; t^{N}_{C},\; r_L, r_R, b}{\arg\min} \;\sum_{i=1}^{N-1} \sum_{k=1}^{K} \left\| Q_{i,k} - \left(T^{V_i}_{N_i}\, T^{N}_{C}\right)^{-1} T^{V_i}_{V_j}\, T^{V_j}_{N_j}\, T^{N}_{C}\, Q_{j,k} \right\|, \tag{19}$$

where $Q_{i,k}$ is the predicted 3D feature, used to transform any point $k$ from frame $j$, $j = i + 1$, to frame $i$. The 3D feature at frame $i$, represented by $Q_{i,k}$, and $T^{N}_{C}$ are constructed using the head-eye rotational and translational parameters, $(\gamma_x, \gamma_y, \gamma_z, t^{N}_{C,x}, t^{N}_{C,y}, t^{N}_{C,z})$. $T^{V}_{N}$ is calculated using the DH parameters and the pan-tilt data from the encoders, while $T^{V_i}_{V_j}$ is derived from $R_z(\theta)$, which is calculated using the wheel encoder data and the estimated wheel parameters, $(r_L, r_R, b)$. $N$ and $K$ are the total numbers of images and features, respectively.
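Powell's method is available in standard scientific libraries. Below is a hedged sketch of refining the nine-parameter vector with SciPy; the function `feature_pairs_fn` is a placeholder standing in for the back-projection machinery of Equation (19).

```python
import numpy as np
from scipy.optimize import minimize

def backprojection_error(params, feature_pairs_fn):
    """Cost in the spirit of Equation (19): mean Euclidean distance between
    observed 3D features and their counterparts transformed from the next
    frame. params = (gamma_x, gamma_y, gamma_z, t_x, t_y, t_z, r_L, r_R, b);
    feature_pairs_fn (a placeholder) returns matched (Q_obs, Q_pred) arrays."""
    Q_obs, Q_pred = feature_pairs_fn(params)
    return np.mean(np.linalg.norm(Q_obs - Q_pred, axis=-1))

# x0 is the initial guess from the linearly iterative estimation (Section 3.5).
# result = minimize(backprojection_error, x0, args=(feature_pairs_fn,),
#                   method='Powell')
```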

Experiments and Results
In our experiments, we used a mobile robot with two RGB cameras mounted on a pan-tilt motorized device, as shown in Figure 1. A chessboard and natural scenes were used as the calibration targets for the single camera and the pair of cameras (a stereo camera), respectively. The differential-drive mobile robot moved to a specified position, and the robot's neck moved about its pan and tilt axes, before each image was captured in sequence. The mobile robot moved and captured images repeatedly to obtain a set of images, $\{I_n\}_{n=1}^{N}$. The robot movement data, such as the rotational angles of both wheels, $(\varphi_{R_n}, \varphi_{L_n})$, for any image $n$ ($n = 1, \dots, N$), were obtained from the angular velocities, $(\omega_{R_n}, \omega_{L_n})$, and the movement period, $\tau_n$, as shown in Equation (3). The wheel angles at the starting position, $n = 1$, were set to $\varphi_{R_1} = \varphi_{L_1} = 0$. The transformations, including the rotation and translation between the robot's base and neck coordinate systems, $T^{V_n}_{N_n}$ ($n = 1, \dots, N$), were calculated from the rotational angles of the pan-tilt axes at the robot's neck and the Denavit-Hartenberg (DH) parameters.
In the case that used a chessboard as the calibration target, we captured a set of images using a single camera with a resolution of 320 × 240 pixels. The chessboard contained 10 × 7 black and white squares (56 corner points), and each square measured 5.4 × 5.4 cm. We extracted the feature points of the chessboard's corners manually using [17]. Once all feature points of the chessboard's corners were obtained, the transformation $T^{W}_{C_i}$ between the camera at position $i$ and the chessboard, which was taken as the world coordinate system, could be estimated with a plane-based transformation estimation [18]. Therefore, the transformation between any pair of camera positions $i$ and $j$, $T^{C_i}_{C_j}$, was simply estimated as $T^{C_i}_{C_j} = T^{C_i}_{W}\, T^{W}_{C_j}$. In the case of natural scenes, the natural features were observed from the real environment based on rectified images. The corresponding feature points of the stereo images were matched with SURF [19]. The transformations between any pair of camera positions, $T^{C_i}_{C_j}$, were obtained by a closed-form solution of the least-squares problem of absolute orientation using orthonormal matrices [20]. The result of the stereo matching is shown in Figure 4.
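For context, camera-to-chessboard poses of this kind can also be obtained with OpenCV. The sketch below is an assumed alternative (solvePnP rather than the plane-based method of [18]), with an 8 × 7 inner-corner pattern chosen to match the 56 corner points stated above.

```python
import cv2
import numpy as np

# Pattern with 8 x 7 = 56 inner corners and 5.4 cm squares; the world
# frame is anchored to the chessboard.
pattern_size = (8, 7)
square = 0.054  # metres
obj_pts = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
obj_pts[:, :2] = np.indices(pattern_size).T.reshape(-1, 2) * square

def camera_pose(image, K, dist):
    """Pose T_Ci_W of the chessboard (world) frame w.r.t. camera i,
    i.e., the transform mapping world points into the camera frame."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if not found:
        return None
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    ok, rvec, tvec = cv2.solvePnP(obj_pts, corners, K, dist)
    R, _ = cv2.Rodrigues(rvec)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T  # then T_Ci_Cj = T_i @ np.linalg.inv(T_j) for a frame pair
```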
The transformations ($T^{N}_{C}$ and $T^{V}_{C}$), the odometric parameters, and the 3D back-projection errors of the proposed method and Antonelli's method [11] are compared in Table 1. In the $T^{N}_{C}$ rows of the table, the head-eye rotation parameters $(\gamma_x, \gamma_y, \gamma_z)$ are the ZYX Euler angles corresponding to the rotation matrix $R^{N}_{C}$. The comparison of the head-eye transformation results indicates that the rotation and translation calibrated with the proposed method constitute a complete parameter estimation, while Antonelli's method could not obtain the transformation between the camera and the robot's neck because their mobile robot's neck could not move.
Furthermore, we also compared the results for the transformation between the robot's base and the camera, $T^{V}_{C}$, by calculating the transformation at the starting position, $T^{V_1}_{C_1}$, which was obtained from $T^{V_1}_{N_1}$ and $T^{N}_{C}$. The transformation matrix, $T^{V_1}_{C_1}$, of the proposed method was similar to Antonelli's except for the translation $t^{V}_{C,z}$, which Antonelli's method could not provide because it constrains the origin of the vehicle reference frame to the inertial x-y plane. The error is the 3D back-projection error, calculated as the average Euclidean distance, in mm, between all 3D feature points of one image and the transformed 3D feature points of the other images. Figure 5 also presents the reprojection results between frames for our method. Even though our method requires the iterative computation of Section 3.5, all parameters reach stability within just a few iterations, as shown in Figure 6. The calibration error after optimization using both the chessboard and natural scenes, with respect to the number of iterations, is shown in Figure 7. However, the 3D back-projection error of the calibration using natural scenes also depends on the accuracy of the stereo matching process. Although the back-projection error using natural features is significantly higher than that using chessboard features, both cases required only a few optimization iterations before the error became steady, reaching a back-projection error of 4.4239 mm, as reported in Table 1. The back-projection errors before optimization (iteration = 0) and after optimization for both the chessboard and natural scenes, as functions of the number of images, are presented in Figure 8. They show how the number of poses affects the calibration accuracy. The required number of input images is at least 30 when using a chessboard as the calibration target, while at least 35 input images are required for steady results when using natural features.

Conclusions
In this paper, we presented an approach for the simultaneous calibration of the head-eye and odometric parameters of a mobile robot equipped with a camera mounted on a motorized pan-tilt device. Our proposed approach completely estimates the wheel radii, the wheelbase length, and the rotation and translation of the head-eye transformation. Additionally, we obtained comprehensive results for the relative pose between the camera and the robot's base, showing that our proposed method can compute the translation in the z-axis, which previous studies could not. After the data from the visual features, either the chessboard's corners or natural scenes, and the odometry measurements were acquired, both the head-eye and wheel parameters were estimated simultaneously by iterative adjustment until all parameters converged; the experimental results showed that only a few iterations were necessary for convergence. Furthermore, a nonlinear optimization that minimizes the cost function was used to refine the results, making the method sufficiently accurate and appropriate for a mobile robot equipped with a pan-tilt camera.