Stereo Camera Head-Eye Calibration Based on Minimum Variance Approach Using Surface Normal Vectors

This paper presents a stereo camera-based head-eye calibration method that aims to find the globally optimal transformation between a robot’s head and its eye. This method is highly intuitive and simple, so it can be used in a vision system for humanoid robots without any complex procedures. To achieve this, we introduce an extended minimum variance approach for head-eye calibration using surface normal vectors instead of 3D point sets. The presented method considers both positional and orientational error variances between visual measurements and kinematic data in head-eye calibration. Experiments using both synthetic and real data show the accuracy and efficiency of the proposed method.


Introduction
Visual servoing is a fundamental skill that enables a robot to perform a variety of manipulation tasks, such as industrial automation, assistance in medical operations, and robotic grasping (i.e., service robots). It enables robot motion control by exploiting visual information extracted from a camera. Obviously, the transformation from the robot and the camera should be estimated in order to control the robot precisely. This problem, also known as hand-eye calibration, is a process that estimates the transformation between the robot end-effector and the camera mounted on it. In general, classical hand-eye calibration methods estimate the transformation between the robot end-effector and the camera coordinate systems by (1) moving the end-effector into different poses, (2) computing the transformation of the end-effector with respect to the robot base by robot kinematics, (3) detecting the calibration patterns from images, (4) calibrating the camera using a plane-based calibration method, (5) estimating the poses of the camera with respect to the calibration pattern, and (6) solving for the robot-sensor calibration.
Hand-eye calibration methods can be classified in two kinds of hardware configurations: eye-in-hand and eye-to-hand configuration. The former deals with the case where the camera is attached to a robot manipulator. The latter is for the case in which the camera is mounted on a place away from the robot. Additionally, many studies dealing with hand-eye calibration have typically been categorized in two ways: AX = XB and AX = YB problems (see details in Section 2).
In this paper, we propose a method to determine the transformation from a robot's head (neck) to the camera, as shown in Figure 1. It is similar to hand-eye calibration, but we call it head-eye calibration for convenience. The proposed method is an extended variance minimization (ExtMinVar) approach, which reduces the computational demand compared to the original minimum variance (MinVar) approach [1]. MinVar optimizes its solution over all 3D points in all views, which causes high amounts of computation per iteration that accumulates throughout the optimization process. The proposed ExtMinVar overcomes this issue by constructing a normal vector using 2 3D points to represent the 3D points in each view. By using only 2 3D points for each view, rather than large sets of 3D points, ExtMinVar is able to reduce the number of computations required for the coordinate transformation and covariance calculation in each iteration of the optimization, reducing the computation time while maintaining accuracy.
The main contributions of this paper are as follows: • proposal of a cost-effective head-eye calibration that is simpler than that of the previous work [1] based on the minimum variance (MinVar) method, • introduction of a novel cost function to minimize the covariance between surface normal vectors, rather than 3D point sets, and • validation of the proposed approach through simulations as well as real experiments.
The rest of the paper is organized as follows: Section 2 gives an overview of the studies related to hand-eye calibration. In Section 3, we first review head-eye calibration using the MinVar approach, and our proposed method is described in detail. Section 4 shows the experimental results using both synthetic and real datasets, and the conclusion is presented in Section 5.

Related Work
Hand-eye calibration methods can be classified in several ways. One way is to distinguish whether the mathematical representations of the given problem are in the AX = XB or AX = YB form. A survey report [2] gives an overview of the methods that solve calibration problems of the forms AX = XB and AX = YB, where X is the unknown 4 × 4 hand-eye transformation, Y is the unknown 4 × 4 transformation from target to camera, and A and B are known 4 × 4 transformations for visual measurements and robot poses, respectively.
Another way to classify hand-eye calibration methods is by determining if they are closed-form-based or iteration-based approaches. Closed-form-based approaches use linear algebra methods to obtain the solution for hand-eye calibration. Shiu and Ahmad [3] introduced a solution with a system of rigid transformations of the form AX = XB for finding the position and rotation of a sensor with respect to the robot's wrist. Tsai and Lenz [4] presented a similar but more efficient method for solving the hand-eye calibration problem by keeping the number of unknowns the same, regardless of the number of frames. Liang et al. [5] proposed a new method that used linear decomposition on A and B without calculating the screw vector. The closed-form based approach is simple and easy to implement, but it often does not give results that are sufficiently accurate due to noises from various sources during measurement.
On the other hand, many studies have been proposed to reduce calibration errors with iterative cost-minimization. Iteration-based approaches are robust to noise and calibration errors at all stages from sensing to calibration. Although the initial estimate is not optimal, nonlinear optimization can be applied to obtain an optimal solution by iteratively minimizing a cost function. Horaud and Dornaika [6] presented a different formulation of hand-eye calibration using 3 × 4 perspective matrices as well as the classical AX = XB. They also showed that the non-linear optimization method yielded more accurate results than the closed-form solution. Shi et al. [7] proposed an iterative approach that simultaneously solved both hand-eye and base-world calibrations using the quaternion representation of the rotations rather than the rotation matrix. Zhuang et al. [8] introduced a one-stage iterative method with the Levenberg-Marquardt algorithm, eliminating the propagation of rotation errors to position errors. Wei et al. [9] proposed a method of calibrating robotic hand cameras by active motion, which does not require metric control points nor the initial values of the unknowns in non-linear equations. Strobel and Hirzinger [10] presented a novel metric on the group of rigid transformation (SE(3)) to optimally estimate the parameters of the system model. It automatically adjusted the weights of the translational and rotational errors to perform an optimal estimation. In [7], an online hand-eye calibration method using motion selection was proposed, which not only prevented the degenerate cases in the calibration process, but also tried to increase calibration accuracy by choosing appropriate motions. Fassi and Legnai [11] presented the equation AX = XB of hand-to-sensor calibration as a geometrical interpretation. The form of AX i = X i B is of an overconstrained system, and the optimal approximation can be found with the least square method, which minimizes the errors. While most methods apply L2 optimization for hand-eye calibration, Zhao [12] leveraged a convex optimization without a starting value. Hartley et al. [13] showed that conjugate rotation averaging is related to hand-eye calibration, which computes the relationship between the robot and camera frames. Heller et al. [14] considered hand-eye calibration as the minimization of the -epipolar constraint and deployed the branch-and-bound (BnB) approach to minimize the objective function. They showed that their solution is guaranteed to be globally optimal with respect to the L ∞ -norm, which is based on the BnB search over the space of rotations in [15]. Li et al. [16] presented a probabilistic approach to find the hand-eye and robot-world transformation (AX = YB problem) even when noisy scrambled data are used as input due to the asynchrony of sensor streams.
At its core, hand-eye calibration works similarly to ICP (Iterative Closest Point), where two sets of 6 DoF (positional and orientational) data are matched via 2D-2D, 2D-3D, or 3D-3D registration, depending on the dimensions of the input data. Ishikawa et al. [17] presented a 2D-3D calibration between a LiDAR and a camera, based on hand-eye calibration. The LiDAR motions are estimated by an ICP algorithm and camera motions are computed by feature point matching. Limoyo et al. [18] proposed the self-calibration of a mobile manipulator by aligning two point clouds from a depth camera and a contact sensor using an ICP-based algorithm.
This work achieves head-eye calibration for settings that can be expressed in the AX = XB form using an iteration-based optimization. We propose a simpler and more efficient computation of the original MinVar algorithm, while also maintaining its robustness to input noise and accuracy.

Problem Statement and Formulation of Head-Eye Calibration
The goal of head-eye calibration is to estimate the transformation from the robot's head (neck) to the camera as given below: where R C r H ∈ SO (3) and t C r H ∈ R 3 are the rotation and translation from the right camera to the robot's head, respectively. Figure 2 shows the transformation between coordinate systems for head-eye calibration, where each transformation is denoted as follows: • T C r P : The transformation from the right camera to the pattern coordinate system, • T C r H : The unknown transformation from the right camera to the robot's head coordinate system, which will be estimated by this calibration method, •T C r H : An initial estimate of the T C r H transformation, • T HB : The transformation from the robot's head to the robot base coordinate system, • T BP : The transformation from the robot base to the pattern coordinate system.
To solve for head-eye calibration, three or more measurements obtained from different head poses using forward kinematics are needed. The pattern is observed from each of the different poses of the camera. Let us suppose that the robot's head has been turned through n poses with the camera measuring m feature locations on the pattern plane. Given the triangulated 3D points p ij B from the stereo camera and robot head rotation angles (R X Cr ij , R Y Cr ij ) for every pose i = 1, · · · , n and feature point j = 1, · · · , m, the unknown T C r H can be computed from an initial estimateT C r H with iterative optimization techniques. Here, we can also determine the transformation from the robot's head to the robot base coordinate system T HB through robot kinematics.

Robot Head-Eye Calibration Based on the MinVar Approach
We briefly review the MinVar method [1] for head-eye calibration. The basic idea of the MinVar method is illustrated in Figure 3, which enforces the constraint that the corresponding 3D points observed from n different views of a planar pattern must be mapped to the same location when transformed into the reference coordinate system. This makes it possible to find the optimal solution even in the presence of noisy data by minimizing the covariance of the 3D points.
As mentioned in Section 2, most previous works used transformations between robot end-effector motions, transformations between camera motions, and the relationship between them to find the rigid transformation of the robot's head to the camera. There may be several numerical errors in computing the transformation, such as rounding error during matrix inversion, error in pixel localization, and quantization error on each iteration. In order to reduce these errors, the MinVar method uses the 3D point sets as inputs, rather than transformations. This allows us to decrease the number of variables to be estimated from 6 × n to n variables when observed in n different views. By moving the robot's head in different directions, we obtain 3D points using m feature locations corresponding to left and right images of the stereo camera, as shown in Figure 3. Because each corresponding 3D point is obtained from different camera coordinate systems, all 3D points must be transformed to the reference coordinate system by Equation (2). The positional error can then be computed over n corresponding 3D points for each feature point.
where p ij B denotes the 3D position of the j-th point at the i-th robot pose with respect to the robot base coordinate system, and p ij C denotes the 3D position of the j-th point at the i-th robot pose with respect to the camera coordinate system. However, the 3D points transformed to the robot base coordinate system may not be coincident with each corresponding point due to measurement noise. The MinVar method of finding the solution is to minimize the covariance of positional errors between the corresponding 3D points: where λ k

Extended MinVar Approach Using Surface Normal Vectors
In this section, we describe an extended version of the MinVar approach for head-eye calibration that uses surface normal vectors as its input data instead of a 3D point set, as shown in Figure 4. The computational cost of the proposed method can be decreased by avoiding the transformation of the 3D point set at every optimization step and utilizing surface normal vectors instead, which are calculated from the start.
The calibration board is geometrically represented by a 3D plane so that a set of normal vectors in the 3D plane at n different orientations is ideally coincident with one normal vector when transforming to the robot base coordinate system. However, error between the estimated vectors may occur in practice due to noisy input data. We estimate the variance of the error, and obtain the optimal solution by minimizing it. Using the normal vector of a 3D plane is a more simple and efficient way to find the solution than using the 3D points of the 3D plane, since it requires less coordinate transformations. The flow chart of the proposed method consisting of the following two steps is shown in Figure 5. In the first step, we estimate transformations (T H j B ) from the robot's head to the robot base coordinate system using forward kinematics, and the surface normal vectors (n j C ) of the calibration plane from stereo images according to n robot poses. In the second step, the estimated surface normal vectors are projected to the robot base coordinate system, and then the solution is determined by iteratively minimizing the covariance of differences between the normal vectors. We describe the procedure of our proposed method in the following sections in more detail.

Calculate a Transformation from Robot to Head (Neck) Using Forward Kinematics
Our head-eye calibration uses the motion of the robot's head for kinematic measurements, such as the yaw and pitch angles of the robot's head taken from each joint encoder. The robot head is regularly rotated in planned directions to autonomously extract kinematic measurements as well as visual measurements from multiple viewpoints.

Corner Detection and Triangulation
In this paper, we use a stereo camera mounted on the robot's head rather than a monocular camera that is usually used for hand-eye calibration. By collecting 2D corner points from the images, the poses with respect to the pattern coordinate of the camera can be estimated by [19]. In contrast to this approach, a set of 3D points is exploited rather than 2D corner points in the calibration process. In fact, for visual measurements, we use surface normal vectors, which are calculated from the reconstructed 3D points through triangulation using the 2D feature points of the left and right images taken by the stereo camera. The stereo camera is assumed to be both internally and externally calibrated in this work.
We use a robust method [20] to automatically detect calibration markers in the chessboard pattern. This method involves the use of two symmetric properties of the chessboard pattern relevant to geometric and brightness distribution, and two concentric circles as a probe. In particular, it is comparatively robust to changes in pose, scale, and illumination.
As shown in Figure 6, a 3D point P can be calculated in homogeneous coordinates by triangulation using two corresponding 2D corner points (u l , v l ) and (u r , v r ): where B is the baseline between the left and right cameras and f is the camera focal length.

Surface Normal Vectors in a Camera Coordinate System
As mentioned earlier, we estimate the surface normal vector of each 3D plane, geometrically representing the calibration board, and use this feature for head-eye calibration. For each viewpoint, the surface normal vector is estimated from the 3D positions of 2D corner points on the calibration image using principal component analysis (PCA) [21]. The principal components can be obtained via singular value decomposition (SVD). In practical implementation, random sample consensus (RANSAC) is also employed to improve the robustness to noise of conventional PCA.
The construction of normal vectors used in this work is shown in Figure 7. The surface normal we use is a directed line segment in 3D space defined by two distinct 3D points. The starting point, s i , is located at the center of the calibration pattern plane and the ending point, e i , is located normal to and d distance above the pattern plane (i.e., e i = dn i + s i , where n i is the unit vector normal to the pattern plane). The distance d is set to be half the length of the longer distance between the corner feature points. This distance is defined to give weight to the actual metric of the measurement. The directed line segment is constructed by connecting from s i to e i , and is the normal vector used as input for the following steps of the algorithm.

Transform Normal Vectors from Camera to Robot Coordinate System
After calculating the normal vector of each plane in the camera coordinate system, the set of all starting points S C r and ending points E C r in the camera coordinate system are transformed into the robot base coordinate system by where s i B,Θ Cr H and s i C r are the i-th starting points with respect to the robot base and camera coordinate systems, respectively, and e i B,Θ Cr H and e i C r are the i-th ending points with respect to the robot base and camera coordinate systems, respectively. S B (Θ C r H ) and E B (Θ C r H ) are the sets of all starting and ending points in the robot base coordinates, respectively, parametrized by the transformation given by Θ C r H , which is the vectorized 6-DOF constituents of T C r H . T(·) is the conversion function that takes a 6-DOF vector form of a transformation and converts it into a 4 × 4 transformation matrix as follows: This allows all normal vectors to be coincident with each other by transforming the normal vector of each plane associated with different views to the same coordinate system. By transforming only one normal vector per view using the proposed method as opposed to transforming all 3D points in the view, ExtMinVar significantly reduces the computational cost compared to MinVar as the number of iterations for the 2nd step of the algorithm increases. Since ExtMinVar only uses a normal vector representation, only two 3D points must be transformed to the robot base coordinate system per view, as opposed to the m 3D points needed to be transformed per view for the MinVar approach. This also means that the covariance matrix computation and non-linear optimization in Section 4.5 only have to run over two 3D points rather than m 3D points per view, drastically reducing the amount of computation needed. If the initial estimate includes large errors, more iterations in the second step of this algorithm is required, so the computation time for ExtMinVar is much smaller than that of MinVar as the number of iterations increase asymptotically. The difference in required time, shown in the results in Section 5, verifies this experimentally.

Optimization Based on Normal Vector Variance Minimization
Ideally, all normal vectors transformed to the robot base coordinate system are necessarily coincident with each other (i.e., the sizes of S B (Θ C r H ) and E B (Θ C r H ) are each 1). Error between ideal and estimated normal vectors nevertheless occurs, due to measurement noise or various errors in the estimation process. The optimal solution is then found by a constrained minimization of this error with respect to the condition that all normal vectors are coincident with the ideal one. In general, the error between surface normal vectors is calculated as the Euclidean distance between vectors. This distance is uniformly weighted for all normal vectors, which does not accurately reflect the characteristics of the data distribution. Each normal vector must therefore be weighted by a covariance matrix that is calculated from the distribution of normal vectors about the x-, y-, and z-axis. The covariance matrix of all starting points, S B (Θ C r H ), is given by where Cov(S B (Θ C r H )) is the covariance matrix of all starting points with respect to the robot base coordinate system at the kth iteration and cov (X) denotes the covariance of X. This is also done to the ending points to calculate the covariance matrix Cov(E B (Θ C r H )) of all ending points E B (Θ C r H ) with respect to the robot base coordinate system. In Equation (7), each entry of the covariance matrix is calculated as: where E(·) denotes the expected mean of its argument. In addition, s i B x , s i B y , s i B z are respectively the x-, y-, zcomponents of the ith plane center and s B x , s B y , s B z are respectively the mean of the x, y, and z components of the ith plane center. The same formula is used for the ending points with the corresponding changes in input variables.
Normally, the initial estimateT C r H of T C r H is not optimal, and thus we must compute the optimal solution through a non-linear optimization that minimizes the error between normal vectors. This requires a measure that corresponds to the uncertainty of the covariance matrix Cov(S B (Θ C r H )) in Equation (7). We choose the Schatten p-norm of Cov(S B (Θ C r H )), which is known as a class of discriminative metrics of matrices [22], as an uncertainty measure. Formally, the Schatten norm for a matrix X ∈ R m×n can be represented as: where σ (X) is the singular value vector of matrix X.
Using the Schatten 2-norm metric, we define a cost function J that measures the averaged uncertainty of covariance matrices Cov(S B (Θ C r H )) and Cov(E B (Θ C r H )) of the projected normal vectors: where Cov(S B (Θ C r H )) and Cov(E B (Θ C r H )) are the covariance matrices of the set of starting and ending points, respectively, in the robot base coordinates parametrized by Θ C r H , and Θ C r H is the optimal solution of Θ C r H . After the cost function is defined as above, we can perform a nonlinear optimization to find the optimal solution Θ C r H in Equation (10). Most non-linear optimization methods can be divided into gradient searchand direct search-based optimizations. The former exploits the differentiation of the objective function to determine the direction of the optimization process. The latter is used for the case that does not require differentiation of the objective function, but uses instead the iteration memory during the optimization. For simplicity, the optimization in this paper is performed using Powell's method [23], which is one of the direct search methods.

Experimental Results
We carried out both simulated and real experiments to validate the performance of the proposed method. To measure the error of head-eye calibration, we computed the difference (∆T C r H = (T re f C r H ) −1 T C r H ) between the reference transformation (TT re f C r H ) and the estimated transformation (T C r H ), as in [24]. For ∆T C r H , the orientation error ∆R C r H was calculated in terms of angular error using Rodrigues's rotation formula [25], and the translation error was measured by ∆t C r H in terms of absolute length.
In this experiment, we compared the closed-form and the iteration-based methods. Four closed-form methods were used, those from Park et al. [26], Liang et al. [5], Horaud et al. [6], and Chou et al. [27], whereas three iterative methods were used, which were MinVar [1], ExtMinVar by PCA, and ExtMinVar by RANSAC PCA (the latter two methods are proposed in this paper). The experiments were run on a desktop computer equipped with an i7-4790 Intel CPU with 32 GB of memory running on Windows 8.1. MATLAB was used for implementing the closed-form-based methods, and Visual Studio C++ was used for developing the iteration-based methods.

Simulation Results
For the simulation experiments, we generated a set of data that consisted of synthetic feature locations on images obtained from different viewpoints. The patterns on the calibration board had 40 (8 × 5) points, and the grid size of each pattern was 50 × 50 mm. The stereo camera had a spatial resolution of 320 × 240 pixels for both left and right cameras.
Robustness analysis in the presence of varying noise levels was performed by adding Gaussian noise to the ground truth data of the feature locations. Two datasets with different noise levels were generated for 100 trials in each experiment. We computed the mean and standard deviation of the rotation and translation errors, as shown in Figures 8 and 9. As one might expect, both rotation and translation errors increased as the measurement noise increased. These experimental results demonstrated that the iteration-based methods showed better performances than those of the closed-form based methods. Furthermore, our proposed method showed similar results to those of MinVar, which outperformed other calibration methods. As mentioned before, PCA was used to compute the surface normal vectors in the proposed method, but it is known to be sensitive to outliers. The ExtMinVar by PCA method showed some degradation in accuracy compared to the MinVar method. However, improved accuracy could be obtained with the removal of contaminated data prior to the estimation of the normal vector. In practice, we removed outliers by applying a robust regression technique, namely RANSAC. Figure 8 shows the rotational and translational estimation errors at varying noise levels in the first dataset. We obtained results with an error of less than 0.5 degrees of rotation, and less than 1 mm in translation.
In Figure 9, the averaged errors did not exceed a maximum of 2 degrees rotation and 5 mm translation in the second data set, though the performance was slightly degraded due to the increased noise levels. One can expect that a more stable estimate of the head-eye calibration would be achieved as more poses are used. Figure 10 shows that the number of captured images also affected the measurement accuracy. It shows a clear trend: as more input data was provided, the more stable and accurate the results became. We also measured the elapsed time to execute the calibration procedure. It was more efficient to use surface normal vectors instead of utilizing 3D points when the calibration was performed, as shown in Figures 11 and 12. In both cases, the proposed method was at least 3 to 4 times faster than the MinVar method. The optimization process of the MinVar method at different noise levels are graphically illustrated in Figures 13 and 14, which are helpful for understanding how the variance of 3D points is minimized at each iteration. As expected, the variance of 3D points at convergence differed depending on the noise level.  In contrast with the previous result, it should be noted that the normal vectors in the final iteration were not coincident due to the inherent uncertainty in the output of PCA with high levels of noise.

Real Experimental Results with Humanoid Robot MAHRU-Z
For the real data experiment, we used a humanoid robot named MAHRU-Z, which can perform various tasks in a home environment, as an experimental platform. In particular, MAHRU-Z provides practical domestic services based on dexterous object manipulation. It is able to articulate with 6-DOF for each arm and 2-DOF for the neck. A bumblebee stereo camera was mounted on the head of the robot for visual information (Bumblebee2, PointGrey Research, 320 × 240 pixels). The intrinsic parameters of the camera were fully calibrated according to [19]: the focal length was 404.409 in pixels, the camera's principal point was (61.371, 119.874) in pixels, and the base line was 0.1199 m.
The reliability in grasping objects depends on an accurate calibration from the robot's head to the stereo camera. Figure 17 shows the experimental setup for the head-eye calibration from the robot's head to the stereo camera. We developed an automatic procedure to calibrate the relationship between the robot's head and its eye by moving the robot's head in 25 pre-planned directions, capturing left and right pattern images, and extracting the feature locations from them. Figure 18 shows the process of detecting feature points on the calibration pattern and finding stereo correspondences between the left and right pattern images.  Experiments were performed using the same data for five different initial estimates selected from a random interval from −20 to 20 degrees of rotation and from −60 to 30 cm of translation, and the results of the proposed method and the MinVar method were compared. The calculated transformations were all very similar, but as the initial estimation had a larger error, the processing time increased. This is due to the increase in computation time required by the optimization.
To compare the performance of the proposed method and the MinVar method, we computed the sum of variance for the 3D points transformed by each T C r H to the robot base coordinate system and measured its processing time, shown in Table 1. We also compared the results of our proposed method against the qhec (quaternion hand-eye calibration) and dqhec (dual quaternion hand-eye calibration) optimization methods obtained from the open source implementation provided by Heller et al. [28]. However, both qhec and dqhec methods implemented in MATLAB were excluded from the comparison of processing time. Similar results were shown for the MinVar method and the ExtMinVar by RANSAC PCA compared to the qhec and dqhec. Although the variance of the ExtMinVar by PCA method was slightly larger than the others, the proposed method outperformed, in terms of processing time, those obtained by using the MinVar method. After each calibration, we obtained the optimal transformations from the robot's head to its eye, which were T 1 C r H for MinVar, T 2 C r H for ExtMinVar by PCA, T 3 C r H for ExtMinVar by RANSAC PCA, T 4 C r H for qhec, and T 5 C r H for dqhec. Based on the results of the head-eye calibration, the robot accomplished various domestic tasks, such as pushing a button on a microwave, grasping a cup, and delivering toast, as shown in Figure 19.

Conclusions
We introduced an approach to head-eye calibration, which manages the problem of determining the geometric relationship between a robot's head (neck) and eye (camera). This head-eye calibration is a practical example of how hand-eye calibration could be applied to real-world problems.
The proposed method is an extension of the MinVar approach, which uses surface normal vectors to reduce the large amounts of computation that using sets of 3D points brings. A central concept in the proposed method is the application of the constraint that all normal vectors transformed to the robot base coordinate system are coincident with each other. The solution was obtained by minimizing the covariance between normal vectors, and could be optimized with respect to measurement errors by imposing the constraint on all normal vectors. This method eliminates the need to transform all 3D points, triangulated on the calibration plane, to the robot base coordinate system in different viewpoints at every iteration of the optimization. This allows calibration to be performed more efficiently than in previous methods because it requires far fewer coordinate transformations, covariance calculations, and optimization searches. Furthermore, this method provides robustness to noisy measurements by minimizing the variance instead. Non-linear optimization is used to iteratively minimize the cost function and determine the transformation from the robot's head to the stereo camera.
We showed that our proposed method provides effective and robust results on both synthetic and real data. Experimental results also showed that a humanoid robot could perform various tasks in a domestic environment when the calibration results of the proposed method were applied to visual servoing.