1. Introduction
Robot navigation is one of the key challenges in mobile robotics, since mobile robots must drive themselves through a given environment using information gathered from their sensors. These include proprioceptive sensors, such as motor-speed sensors, wheel-load sensors, joint-angle sensors, battery-voltage sensors, and the inertial measurement unit (IMU), which together produce odometric data. Robots are also fitted with exteroceptive sensors such as image-feature sensors, distance sensors, light-intensity sensors, sound-amplitude sensors, and global positioning system (GPS) receivers. However, robot localization is typically inaccurate because of the uncertainty introduced by errors in the measured robot configuration. Although configuration data such as the wheel radii and wheelbase length can be obtained from the robot specifications or by manual measurement, the actual parameters often differ in practice. This is due to systematic errors, such as manufacturing errors, assembly errors, tire pressure variations, and load variations, that reduce the precision of the mobile robot’s movement. It is therefore necessary to estimate these odometric parameters to improve the robot’s operational precision.
Antonelli’s calibration method [1] uses the least-squares technique to exploit a linear mapping between the unknowns and the measurements. It identifies a 4-parameter model, while the modified version [2] estimates the physical odometric parameters and yields a 3-parameter model without requiring a predefined path. Some researchers [3,4] have reduced the cumulative error in odometry by considering the coupled effect of wheel-diameter, wheelbase, and scaling errors using a popular odometry calibration method for wheeled mobile robots, developed at the University of Michigan and known as the University of Michigan Benchmark (UMBmark) test [5]. Other trials corrected odometry errors by equipping mobile robots with a static wireless sensor network and GPS devices [6,7]. However, this approach is disadvantageous in terms of cost-effectiveness.
Nowadays, mobile robots equipped with cameras that provide single color images, stereo images, or depth images are widely used in robot navigation, reconstruction, and mapping. Using this information, the robot can perform more precise and varied tasks. However, the 3D position and orientation of the camera relative to the robot is also important for accurate operation. Shiu and Ahmad [8] introduced a solution for the rigid transformation between the sensor and the robot, formulated as AX = XB. Hand-eye calibration, as proposed by Tsai and Lenz [9], is a similar, albeit more efficient, solution that does not depend on the number of images. For mobile robots, the 3D position and orientation of the camera relative to the robot’s base is considered instead. Kim et al. [10] found that the head-eye transformation between the robot’s coordinate system and the camera’s coordinate system can be estimated simply and accurately using the minimum variance technique, which is robust to noisy environments.
In practical applications, the odometric parameters, such as the wheel radii and wheelbase length, and the head-eye parameters are typically estimated separately, which lengthens the calibration time and adds inconvenience due to the redundancy of these methods. To avoid these disadvantages, Antonelli et al. [11] proposed a method that performs the odometry and head-eye calibrations simultaneously. Since the method requires only synchronized measurements of odometric data and visual features, it succeeded in reducing the calibration time and improving the efficiency of mobile robot calibration. However, their approach is only applicable to a mobile robot with a fixed camera, whereas recent mobile robots are equipped with pan-tilt motorized camera systems for a wide view. This caused an incomplete estimation of the head-eye parameters; that is, the method could not provide the z-axis translation parameter. Bi et al. [12] presented a version of [11] with improved accuracy, but still did not overcome this problem. Tang et al. [13] solved it by taking advantage of the planar constraints of the landmarks. Despite their accurate estimation of the head-eye and odometric parameters, there is a clear limitation in that a very constrained environment is needed and several recognizable landmarks must be premeasured and fixed.
In this paper, we present a full mobile robot calibration of the head-eye and odometric parameters, building on [14]. All six parameters (rotation and translation) of the head-eye calibration and the three parameters (wheel radii and wheelbase length) of the odometry calibration are estimated simultaneously. The mobile robot, equipped with a mono or stereo camera, moves while the camera mounted on the pan-tilt motorized device captures chessboards or natural scenes. After simply planned robot movements, the full mobile robot calibration algorithm is performed using both odometry measurements and visual features, such as the chessboard’s corners or natural feature points from a stereo camera obtained with Speeded-Up Robust Features (SURF) [15]. The head-eye and odometric parameters are iteratively adjusted to obtain a good starting point close to the ground truth, and then finally fine-tuned with a direct search-based optimization, Powell’s method [16]. The remainder of this paper is organized as follows: Section 2 describes our mobile robot configuration and the relationship between each joint. Section 3 proposes an iterative calibration method for a mobile robot with a camera mounted on a motorized neck. Section 4 presents our experimental results, and the conclusion is summarized in Section 5.
3. Simultaneous Calibration for Mobile Robot with Pan-Tilt Camera
In this section, we describe the proposed calibration method in six parts. The closed-loop transformation among the camera, the robot base, and the robot neck between any frames i and j, where i = 1, …, N − 1 and j = i + 1, is concisely depicted in Section 3.1. A set of captured images and the calibration data set (the rotating angles of the wheels and the transformation from the robot base coordinate system to the robot’s neck coordinate system) are obtained once as the calibration input data. Section 3.2, Section 3.3, and Section 3.4 explain how these data are used to obtain the head-eye rotation, the intermediate wheel parameters, and the head-eye translation together with the actual wheel parameters, respectively. These estimates are refined iteratively until all parameters converge, as described in Section 3.5. Finally, Section 3.6 describes the non-linear optimization that increases the accuracy of the calibration results.
3.1. Closed-Loop Transformations
Let us now consider the homogeneous transformation between different vehicle poses from frame i to frame j, as shown in Figure 3. The abbreviations denote the camera, neck, and vehicle coordinate systems, respectively. The closed-loop diagram can be represented by Equation (4), in which the first term is the physical relationship between the neck and vehicle frames, the head-eye homogeneous transformation relates the camera and neck frames, the camera motion is measured while the camera is directed at the same target or features, and the robot’s motion between frames i and j completes the loop. The homogeneous transformation of Equation (4) can be decomposed into rotational and translational terms, where R is a 3 × 3 rotation matrix and t is a 3 × 1 translation vector. Equation (5) is used to obtain the head-eye rotation parameters, which consist of 3 degrees of freedom (DOF). Equation (6) refers to the translation of the system and is used to estimate the head-eye translation and the actual wheel parameters. These parameters are calculated in Section 3.4.
3.2. Head-Eye Rotation Estimation
The six parameters of the head-eye calibration, three for rotation and three for translation, must be obtained before operation. In this section, the three rotation parameters between the robot’s neck and the camera are calculated precisely. A set of visual measurements and robot movement data is obtained by moving the robot and capturing images synchronously and continuously. The rotation between the camera and the robot’s base can be estimated accurately, as done by Antonelli et al. [11], who obtained the rotation parameters between the robot’s base and the camera using the equivalent angle-axis representation. However, their approach worked because their mobile robot’s camera was mounted on a fixed neck, so the relation between the robot’s base and the camera never changed. If that relation changes during the collection of the calibration data, the equivalent angle-axis method can no longer be used. While collecting the input calibration data for our mobile robot, both the robot and its neck move, which means the rotation between the base and the camera also changes.
In fact, whenever the robot’s neck moves around the pan-tilt axes, the coordinate frame of the camera mounted on that neck moves with it rigidly. Therefore, the relationship between the camera and the robot’s neck is static, which means the frame subscript i can be omitted: the camera-to-neck transformations are identical for all N input images, where N is the total number of input images. Moreover, the mobile robot moves only on a plane; in other words, it rotates about the z-axis only [11]. Hence, the base rotation in Equation (5) can be replaced with a rotation about the z-axis. Assuming the rotation matrices of the camera motion, the neck, and the base are known, as thoroughly described in Section 4, Equation (5) can be represented in the form AX = XB, and the rotation can be obtained as in [8,9], where the matrix X is the estimated head-eye rotation, the camera rotations correspond to matrix A, and the remaining variables on the right-hand side correspond to matrix B, with i = 1, …, N − 1 and j = 2, …, N.
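As a sketch of this step, the rotation part of AX = XB can be solved in closed form from the rotation axes of the motion pairs, in the spirit of [8,9]: the axis (log-map) vectors satisfy α_i = Xβ_i, so X is an orthogonal Procrustes solution over the stacked axes. The following is a minimal sketch under that assumption; the function names are ours, not from the paper:

```python
import numpy as np

def rot(axis, angle):
    # Rodrigues' formula: rotation matrix from an axis-angle pair.
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def rotation_log(R):
    # Axis-angle (log-map) vector of a 3x3 rotation matrix.
    c = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(c)
    if np.isclose(theta, 0.0):
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta * w / (2.0 * np.sin(theta))

def solve_rotation_ax_xb(As, Bs):
    # The axes satisfy alpha_i = X beta_i, so X is the orthogonal Procrustes
    # solution over the stacked axis vectors (needs >= 2 non-parallel axes).
    alphas = np.array([rotation_log(A) for A in As])
    betas = np.array([rotation_log(B) for B in Bs])
    M = betas.T @ alphas                      # sum of beta_i alpha_i^T
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # enforce det(X) = +1
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
```

With exact inputs the recovered matrix equals the ground-truth head-eye rotation; with noisy motions it is the least-squares fit over all frame pairs.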
3.3. Intermediate Wheel Parameters Estimation
In this section, we establish the linear relationship between the intermediate wheel parameters and the rotational angle of the robot’s movement from the previous position to the position at which image i is captured. This angle is obtained with Equation (3) using the duration of the robot’s movement between the previous and current positions, with the angle of the first frame set to zero without loss of generality. The change in the rotational angle of the robot’s base from frame i to frame j on the plane is also re-estimated. Since the robot moves on a plane, the change in rotational angle about the z-axis of the robot’s base coordinates, which is assumed to be perpendicular to the floor, is measured between each pair of consecutive frames. In practical applications, the z-axes of the robot’s base coordinates at consecutive frames may not be parallel, because of errors in the estimated rotation and because the floor planes at any pair of positions are not exactly parallel. Therefore, the rotational angle about the z-axis of the robot’s base coordinates between frames i and j can be calculated using the ZYX Euler angles as follows:
where the terms are generic elements of the rotation matrix. Considering all N images, the representation of Equation (3) can be used to obtain the intermediate wheel parameters, similar to [11], as follows: where the entries are obtained from the rotational angles of both wheels from position i to j (i = 1, …, N − 1 and j = 2, …, N), and the coefficient matrix has the corresponding dimensions. The intermediate wheel parameters can then be calculated using the linear least-squares method as follows:
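To make the least-squares step concrete, the sketch below assumes the standard differential-drive model, in which the heading change per step is Δθ = (r_R Δφ_R − r_L Δφ_L)/b, so the intermediate parameters r_L/b and r_R/b enter linearly. The variable names and synthetic values are illustrative, not the paper's:

```python
import numpy as np

# Hypothetical wheel geometry used only to generate synthetic data.
r_left, r_right, b = 0.098, 0.102, 0.35

rng = np.random.default_rng(0)
dphi = rng.uniform(-2.0, 2.0, size=(40, 2))  # wheel-angle increments (left, right)

# Differential-drive heading change per step:
#   dtheta = (r_right * dphi_right - r_left * dphi_left) / b
dtheta = (r_right * dphi[:, 1] - r_left * dphi[:, 0]) / b

# The intermediate parameters enter linearly:
#   dtheta = [-dphi_left, dphi_right] @ [r_left/b, r_right/b]
A = np.column_stack([-dphi[:, 0], dphi[:, 1]])
ratios, *_ = np.linalg.lstsq(A, dtheta, rcond=None)
```

Stacking one row per frame pair and solving with `lstsq` recovers the two ratios; the actual radii and wheelbase are disentangled later, as in Section 3.4.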
3.4. Head-Eye Translation and Wheel Parameters Estimation
The head-eye rotation and the intermediate parameters have already been obtained in Section 3.2 and Section 3.3. The remaining parameters estimated in this section are the head-eye translation vector and the actual wheel parameters. The translational and rotational components of the robot’s base coordinates can be described using the mobile robot kinematic equations, where i and j denote the previous and current frames, respectively, and the period of time between the frames is known. Substituting the intermediate wheel parameters and Equation (2) into Equation (11) yields the relative translation in the x and y directions, which can be rewritten as Equation (13). From Equation (13), the translation between two robot base positions can be substituted into Equation (6), which represents the relationship between the coordinate frames; the known left- and right-hand terms correspond to the matrices A and B, respectively. Thus, Equation (14) can be simplified. In Equation (16), the third component of the translation vector is zero, which also makes the third row zero; it can be derived as follows: considering Equation (17) over all frames, the final equation can be expressed as Equation (18), where the left matrix of the left term and the right-hand vector have dimensions determined by the total number of frames, N. From Equation (18), the head-eye translation and the actual radius of the left wheel are estimated using a linear least-squares method. The remaining parameters, the right wheel radius and the wheelbase b, can be obtained with Equation (2). The wheel parameters obtained in this section will be used to re-estimate the rotation between the robot’s head and neck, as previously described.
3.5. Linearly Iterative Estimation
Even though the previous approach [11] could estimate the odometric and head-eye parameters simultaneously, it could not provide all six head-eye parameters: the z-axis translation between the head-eye coordinate frames was not obtained. The proposed method presents a full mobile robot calibration of the odometric and head-eye parameters, building on [14]. The rotation and translation of the head-eye calibration, including the z-axis translation, and the odometric parameters are estimated simultaneously and precisely. If the parameter values are not initialized well before optimization, the parameters may converge to incorrect values or diverge. This section therefore explains our contribution: an iteration-based estimation that provides good initial guesses for all parameters, leading them to converge rapidly to the correct values. The processes of Section 3.2, Section 3.3, and Section 3.4 are repeated until all parameters converge. The head-eye rotation parameters are estimated as in Section 3.2 and then used to calculate the intermediate wheel parameters of both wheels as in Section 3.3. After that, they are used to compute the remaining parameters, as described in Section 3.4. These results are in turn used to recalculate the head-eye parameters following Section 3.2, as shown in Algorithm 1, in which steps 3 to 5 are computed repeatedly until convergence.
Algorithm 1 Full simultaneous calibration algorithm for a mobile robot with a pan-tilt camera.
Input:
Output:
for … do
  Step 1: Obtain the camera transformations between each pair of frames using the chessboard’s corners or natural features
end for
Step 2: Initialize with manual measurements and obtain with Equation (3)
while not converged do
  Step 3: Compute with Equation (7)
  Step 4: Compute with Equations (8)–(10)
  Step 5: Compute with Equation (18)
end while
Step 6: Refine with Equation (19)
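The alternating loop of steps 3 to 5 can be sketched as a generic fixed-point iteration. The three callbacks below stand in for the estimation stages of Sections 3.2 to 3.4 and are placeholders of ours, not the paper's implementation:

```python
import numpy as np

def iterate_calibration(step_rotation, step_ratios, step_translation,
                        x0, tol=1e-10, max_iter=100):
    # Repeat the three linear estimation stages (steps 3-5 of Algorithm 1)
    # until the full parameter vector stops changing.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = step_translation(step_ratios(step_rotation(x)))
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x
```

When each stage improves on the previous estimate (a contractive update), the loop settles within a few iterations, mirroring the convergence behavior reported in Figure 6.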
3.6. Non-Linear Optimization
Even though the parameters estimated in the previous section provide a good initial estimate, they are probably not yet fully accurate. Therefore, a function-minimization method, Powell’s method [16], is applied to fine-tune all parameters so that they are as close to the ground truth as possible. The variables, consisting of the Euler angles of the head-eye rotation, the head-eye translation, and the wheel parameters, are refined using the following equation: where the predicted 3D features are used to transform any point k from frame j to frame i. The 3D feature at frame i is constructed using the head-eye rotational and translational parameters. The neck transformation is calculated using the DH parameters and the pan-tilt data from the encoders, while the base motion is calculated using the wheel encoder data and the estimated wheel parameters. N and K are the total numbers of images and features, respectively.
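As an illustration of this refinement stage, SciPy's implementation of Powell's derivative-free method can minimize a residual of this shape. The toy residual and target values below are stand-ins of ours for the back-projection error of Equation (19), not the paper's actual cost function:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical ground-truth parameters; in the paper the vector would hold the
# head-eye Euler angles, the head-eye translation, and the wheel parameters.
target = np.array([0.10, -0.20, 0.05])

def backprojection_error(params):
    # Toy stand-in for Equation (19): a sum of squared residuals comparing
    # predicted and measured 3D features across frames.
    return float(np.sum((params - target) ** 2))

# Powell's method needs no gradients, matching the direct-search refinement.
res = minimize(backprojection_error, x0=np.zeros(3), method="Powell",
               options={"xtol": 1e-10, "ftol": 1e-10})
```

Because the method only evaluates the cost function, it tolerates the non-smooth dependence of the back-projection error on the calibration parameters, at the price of more function evaluations than a gradient-based optimizer.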
4. Experiments and Results
In our experiments, we used a mobile robot with two RGB cameras mounted on a pan-tilt motorized device, as shown in Figure 1. A chessboard and natural scenes were used as the calibration targets for the single camera and the pair of cameras (a stereo camera), respectively. The differential-drive mobile robot moved to each specified position, and the robot’s neck also moved about the pan-tilt axes before each image was captured. The robot moved and captured images repeatedly to obtain a set of images. The robot movement data, such as the rotational angles of both wheels at any image n (n = 1, …, N), were obtained from their angular velocities and the duration of the movement, as shown in Equation (3). The wheel angles at the starting position, n = 1, were set to zero. The transformation, including the rotation and translation between the robot’s base and the robot’s neck coordinate systems, was calculated from the rotational angles of the pan-tilt axes at the robot’s neck and the Denavit–Hartenberg (DH) parameters.
In the case where a chessboard was used as the calibration target, we captured a set of images using a single camera. The chessboard contained 10 × 7 black and white square grids (56 corner points). We extracted the feature points of the chessboard’s corners manually using [17]. Once all the corner points were obtained, the transformation between the camera at position i and the chessboard, which was taken as the world coordinate frame, could be estimated with a plane-based transformation estimation [18]. Therefore, the transformation between any pair of camera positions i and j was simply estimated by composing the camera-to-chessboard transformation at position i with the inverse of that at position j.
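The relative camera motion in this step follows from composing homogeneous transforms: if T_i and T_j map the chessboard (world) frame into camera frames i and j, the motion between the two camera poses is T_i composed with the inverse of T_j. A minimal sketch (function names ours):

```python
import numpy as np

def hom(R, t):
    # Build a 4x4 homogeneous transform from a rotation R and translation t.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def relative_motion(T_i, T_j):
    # Camera motion between two poses that share one world (chessboard) frame:
    # a point expressed in camera j maps into camera i via T_i @ inv(T_j).
    return T_i @ np.linalg.inv(T_j)
```

The inverse can also be formed analytically as [Rᵀ, −Rᵀt], which avoids a general matrix inversion for well-formed rigid transforms.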
In the case of natural scenes, the natural features were observed from the real environment using rectified images. The corresponding feature points of the stereo images were matched with SURF [19]. The transformations between any pair of camera positions were obtained by a closed-form solution of the least-squares problem of absolute orientation using orthonormal matrices [20]. The result of the stereo matching is shown in Figure 4.
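The closed-form alignment used here can be sketched with the common SVD-based formulation of the absolute-orientation problem (Arun-style; [20] solves the same least-squares problem via orthonormal matrices). The data and names below are illustrative:

```python
import numpy as np

def rigid_align(P, Q):
    # Find R, t minimizing sum_k ||R p_k + t - q_k||^2 for matched 3D point
    # sets P, Q of shape (K, 3), via the SVD of the cross-covariance.
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```

Applied to the triangulated stereo points of two camera positions, this yields the relative transformation between those positions in one closed-form step.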
The transformations, odometric parameters, and 3D back-projection errors of the proposed method and of Antonelli’s method [11] are compared in Table 1. The head-eye rotation parameters in the table were obtained as the ZYX Euler angles corresponding to the estimated rotation matrix. The comparison of the head-eye transformations indicates that the rotation and translation calibrated with the proposed method constitute a complete parameter estimation, whereas Antonelli’s method could not obtain the transformation between the camera and the robot’s neck because their mobile robot’s neck could not move. Furthermore, we also compared the transformations between the robot’s base and the camera by computing the transformation at the starting position, obtained by composing the base-to-neck and neck-to-camera transformations. The resulting transformation matrix of the proposed method was similar to Antonelli’s, except for the z-axis translation, which Antonelli’s method could not provide because the origin of the vehicle reference frame is constrained to the inertial x-y plane. The error is the 3D back-projection error, calculated as the average Euclidean distance between all 3D feature points of one image and the transformed 3D feature points of the other images, reported in mm.
Figure 5 also presents the reprojection results of our method between frames.
Even though our method requires the iterative computation of Section 3.5, all parameters stabilize within just a few iterations, as shown in Figure 6. The calibration error after optimization using both the chessboard and natural scenes, with respect to the number of iterations, is shown in Figure 7. The 3D back-projection error of the calibration using natural scenes also depends on the accuracy of the stereo matching process. Although the back-projection error using natural features is significantly higher than that using chessboard features, both cases required only a few optimization iterations before the error became steady, yielding a back-projection error of 4.4239 mm, as reported in Table 1. The back-projection errors before optimization (iteration = 0) and after optimization for both the chessboard and natural scenes, as functions of the number of images, are presented in Figure 8, which shows how the number of poses affects the calibration accuracy. Using a chessboard as the calibration target requires at least 30 input images for steady results, while using natural features requires at least 35.