Article

Robotic Hand–Eye Calibration Method Using Arbitrary Targets Based on Refined Two-Step Registration

by Zining Song, Chenglong Sun, Yunquan Sun * and Lizhe Qi *
Intelligent Robotics Research Institute, Fudan Academy for Engineering and Technology, Fudan University, Shanghai 200433, China
* Authors to whom correspondence should be addressed.
Sensors 2025, 25(10), 2976; https://doi.org/10.3390/s25102976
Submission received: 2 April 2025 / Revised: 27 April 2025 / Accepted: 28 April 2025 / Published: 8 May 2025
(This article belongs to the Section Sensors and Robotics)

Abstract

To simplify the structure and workflow of robotic 3D measurement systems, reduce their dependence on specific or high-precision calibration objects, and improve the versatility of system self-calibration, this paper proposes a robot hand–eye calibration algorithm based on arbitrary, irregular targets. The 3D structure of an object in space is captured from different positions; coarse alignment uses a random sample consensus evaluation based on the fast point feature histogram (FPFH), and an iterative closest point (ICP) registration algorithm augmented with a probabilistic model and covariance optimization then iteratively refines the spatial relationship between the point clouds. From these spatial relationships, a system of hand–eye calibration equations is constructed and solved for the camera's hand–eye matrix. In the experiments, arbitrary objects are used as targets to execute the hand–eye calibration algorithm and verify the effectiveness of the method.

1. Introduction

Robot-based 3D measurement systems have been widely applied in quality inspection [1,2,3,4,5] and reverse engineering [6,7]. Among these systems, hand–eye calibration is an essential initialization step, where its accuracy significantly impacts the final measurement precision [8].
Current 3D camera calibration methods primarily rely on specific targets, including 2D patterns with distinctive features and specially designed 3D targets [9]. For 2D targets, notable methods include Tsai’s calibration approach based on radial alignment constraints using planar targets with known 3D structures [10], Zhang’s flexible camera calibration technique using planar patterns [11], as well as target designs based on Zhang’s method such as ARToolkit [12], ARTag [13], and AprilTag [14] calibration boards. Regarding 3D targets, designs include standard spherical targets [15] and those with specific geometric features [16,17,18,19].
Although target-based calibration methods demonstrate excellent accuracy and stability, their performance depends heavily on the manufacturing precision of the targets themselves, requiring specific designs and preparations while being sensitive to environmental variations. Consequently, researchers have recently explored target-less calibration approaches. Reference [20] proposed a method using arbitrary free-form surfaces, estimating hand–eye parameters through feature matching between measurement data and design models. Reference [21] introduced a scene-feature-based approach that computes camera intrinsics and hand–eye relationships by extracting ORB features from consecutive scene images. However, this method depends on the features contained in depth images and imposes strict requirements on capture range and motion sequence continuity.
To enhance the robustness of robot hand–eye calibration and reduce dependence on high-precision targets and controlled environments, this paper presents a novel calibration method using unfixed 3D objects. Our approach employs arbitrary objects of appropriate size as 3D targets, utilizing a two-step improved ICP algorithm with RANSAC-based feature recognition. By performing local registration of the target from different viewpoints, we establish relative relationships between target positions in various camera frames. Through the comprehensive modeling of the hand–eye system and the introduction of a virtual camera coordinate origin, we transform target relationships into inter-camera transformations, constructing the parameters required by traditional calibration methods to solve for high-accuracy hand–eye matrices.
This research implements robot hand–eye calibration by establishing coordinate transformations of non-fixed 3D targets through registration based on RANSAC feature recognition and an improved ICP algorithm. Section 2.3 introduces the calibration methodology and core algorithms, and Section 2.4 describes the multi-step registration algorithm. Section 3 presents experiments using arbitrary targets, accuracy validation tests, and detailed error analysis. Section 4 summarizes the paper, discusses the limitations of the current study, and outlines potential future research directions.

2. Methods

2.1. Hand–Eye Robot System

Based on the position of the camera used to acquire calibration information (mounted on the robot's end-effector or external to the robot), robot hand–eye systems can be classified into two categories: Eye-in-Hand and Eye-to-Hand. The system designed in this paper includes both a structured light camera mounted on the robot's end-effector to capture target information and a binocular tracking camera to obtain the pose of the robot's end-effector; therefore, it does not fall cleanly into either of these two typical categories. As shown in Figure 1, the hand–eye system used in this paper consists of the following components:
  • Fixed binocular tracking camera (W): Used to track the robot’s end-effector;
  • Tracking target (T): Mounted on the robot’s end-effector, tracked by the binocular camera;
  • Structured light 3D scanner (C): Mounted on the robot’s end-effector, used to capture 3D information of the target;
  • Arbitrary 3D target (B): Used as the calibration object.
This system integrates both internal and external sensing capabilities, enabling robust and flexible calibration without relying on a fixed target or a single camera configuration.
In this system, the coordinate systems are defined as follows:
  • $\{O_b\}$: the coordinate system of the fixed arbitrary 3D target;
  • $\{O_t\}$: the tracking target coordinate system attached to the robot's end-effector;
  • $\{O_c\}$: the structured light 3D camera coordinate system;
  • $\{O_w\}$: the fixed world coordinate system of the tracking camera.
The transformation matrices are defined as follows:
  • $T_t^w$: the homogeneous transformation matrix from the tracking target coordinate system $\{O_t\}$ to the world coordinate system $\{O_w\}$, obtained by the tracking camera;
  • $T_c^t$: the homogeneous transformation matrix from the tracking target coordinate system $\{O_t\}$ to the structured light camera coordinate system $\{O_c\}$, which needs to be calibrated;
  • $T_b^c$: the homogeneous transformation matrix from the arbitrary 3D target coordinate system $\{O_b\}$ to the structured light camera coordinate system $\{O_c\}$.
When the measurement system is operational, a point $p_c^i$ measured in the structured light camera coordinate system is transformed into a point $p_w^i$ in the world coordinate system using the following transformation:

$p_w^i = T_{t_i}^{w} \, T_c^t \, p_c^i \qquad (1)$

Here, $T_{t_i}^{w}$ represents the pose of the rigid body composed of the structured light camera and its attached tracking target in the world coordinate system at the time of measurement. By unifying the measurement results from the structured light camera at different positions and orientations, the final complete measurement result is obtained.
This approach ensures that the measurements from the structured light camera are accurately mapped to the world coordinate system, enabling precise and consistent calibration and reconstruction.
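As a small illustration of Equation (1), the NumPy sketch below maps a point measured in the scanner frame into the tracker's world frame; the matrices `T_t_w` and `T_c_t` are placeholder values invented for the example, not calibration results from the paper.

```python
import numpy as np

def to_world(p_c, T_t_w, T_c_t):
    """Map a point from the structured light camera frame to the world frame.

    p_c   : (3,) point measured by the structured light camera
    T_t_w : (4, 4) pose of the tracking target in the world frame (from the tracker)
    T_c_t : (4, 4) hand-eye matrix (scanner frame -> tracking target frame)
    """
    p_h = np.append(p_c, 1.0)            # homogeneous coordinates
    return (T_t_w @ T_c_t @ p_h)[:3]     # Equation (1): p_w = T_t^w  T_c^t  p_c

# Example with placeholder transforms (identity hand-eye, pure translation of the target).
T_c_t = np.eye(4)
T_t_w = np.eye(4)
T_t_w[:3, 3] = [100.0, 0.0, 50.0]        # mm
print(to_world(np.array([1.0, 2.0, 3.0]), T_t_w, T_c_t))
```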

2.2. Hand–Eye Calibration Algorithm

As shown in Figure 2, the raw data for the calibration algorithm proposed in this paper include measurement data of non-specific 3D targets and the corresponding tracking data. The computation of the hand–eye matrix $T_c^t$ is completed in three steps: registration, equation construction, and solution.

2.3. Construction of Calibration Equation

As shown in Figure 3, the robot is used to drive the structured light camera to capture the point set $P_c = \{p_c^1, p_c^2, \ldots, p_c^j\}$ of the arbitrary 3D target in the structured light camera coordinate system $\{O_c\}$. Simultaneously, for each acquisition, the tracking camera obtains the pose $T_t^w$ of the tracking target $t$ in the world coordinate system $\{O_w\}$.
At positions $i$ and $j$, the homogeneous transformation matrices between the world coordinate system $\{O_w\}$ and the structured light camera coordinate system $\{O_c\}$ are denoted as $T_{c_i}^{w}$ and $T_{c_j}^{w}$, respectively. These matrices satisfy the following relationships:

$T_{c_i}^{w} = \left(T_w^{t_i}\right)^{-1} \cdot T_c^t, \qquad T_{c_j}^{w} = \left(T_w^{t_j}\right)^{-1} \cdot T_c^t \qquad (2)$
The homogeneous transformation matrix between the structured light camera coordinate systems at the two positions, denoted as $T_{c_j}^{c_i}$, satisfies the following:

$T_{c_j}^{c_i} = \left(T_{c_j}^{w}\right)^{-1} \cdot T_{c_i}^{w} \qquad (3)$
By combining Equations (2) and (3), we obtain the following:

$T_{c_j}^{c_i} = \left[\left(T_w^{t_j}\right)^{-1} \cdot T_c^t\right]^{-1} \cdot \left(T_w^{t_i}\right)^{-1} \cdot T_c^t = \left(T_c^t\right)^{-1} \cdot T_w^{t_j} \cdot \left(T_w^{t_i}\right)^{-1} \cdot T_c^t \qquad (4)$
Here, $T_{t_j}^{t_i}$ represents the registration result of the point cloud of the stereo calibration target at the two positions, and it satisfies the following:

$T_{t_j}^{t_i} = T_w^{t_i} \cdot \left(T_w^{t_j}\right)^{-1} = \left[T_w^{t_j} \cdot \left(T_w^{t_i}\right)^{-1}\right]^{-1} \qquad (5)$
Substituting Equation (5) into Equation (4), we obtain the following:

$T_{c_j}^{c_i} = \left(T_c^t\right)^{-1} \cdot \left(T_{t_j}^{t_i}\right)^{-1} \cdot T_c^t \qquad (6)$

$\left(T_{t_j}^{t_i}\right)^{-1} \cdot T_c^t = T_c^t \cdot T_{c_j}^{c_i} \qquad (7)$
In Equation (7), let $\left(T_{t_j}^{t_i}\right)^{-1} = A$, $T_{c_j}^{c_i} = B$, and $T_c^t = X$. For a pair of 3D calibration target measurements and their corresponding tracked target poses, the robot hand–eye matrix $T_c^t$ can then be formulated through the equation $AX = XB$.
As shown in Figure 4, this process is extended to perform 3D scanning of the stereo calibration target at multiple poses, obtaining multiple sets of point cloud data in the structured light camera coordinate system, as well as the corresponding tracked target poses $T_w^{t_i}$. From the absolute poses of the tracked target in the world coordinate system, $T_w^{t_i}$ and $T_w^{t_j}$, $n-1$ pairwise relative pose homogeneous matrices $T_{t_j}^{t_i}$ are obtained. Adjacent point clouds are then registered pairwise, yielding $n-1$ homogeneous matrices $T_{c_j}^{c_i}$ from $n$ sets of data. After obtaining multiple $AX = XB$ matrix equations, quaternion decomposition and the least squares method are used to solve the system of matrix equations, as sketched below.
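The following NumPy/SciPy sketch shows one standard way to solve the stacked $AX = XB$ system with quaternions and least squares, in the spirit described above: the rotation is recovered from the stacked quaternion constraint via SVD, and the translation from a linear least squares problem. It is an illustrative solver written for this article, not the authors' implementation, and the helper names are assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def _quat_wxyz(R_mat):
    # scipy returns quaternions as (x, y, z, w); reorder to (w, x, y, z)
    x, y, z, w = Rotation.from_matrix(R_mat).as_quat()
    q = np.array([w, x, y, z])
    return q if q[0] >= 0 else -q        # fix the sign so paired A/B quaternions stay consistent

def _left(q):    # left multiplication matrix L(q): q * p = L(q) p
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]])

def _right(q):   # right multiplication matrix R(q): p * q = R(q) p
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])

def solve_ax_xb(A_list, B_list):
    """Solve AX = XB for X from paired lists of 4x4 homogeneous matrices A_i, B_i."""
    # Rotation: q_A * q_X = q_X * q_B  =>  (L(q_A) - R(q_B)) q_X = 0 for every pair.
    M = np.vstack([_left(_quat_wxyz(A[:3, :3])) - _right(_quat_wxyz(B[:3, :3]))
                   for A, B in zip(A_list, B_list)])
    _, _, Vt = np.linalg.svd(M)
    q_x = Vt[-1]                          # right singular vector of the smallest singular value
    R_x = Rotation.from_quat([q_x[1], q_x[2], q_x[3], q_x[0]]).as_matrix()

    # Translation: (R_A - I) t_X = R_X t_B - t_A, stacked over all pairs.
    C = np.vstack([A[:3, :3] - np.eye(3) for A in A_list])
    d = np.concatenate([R_x @ B[:3, 3] - A[:3, 3] for A, B in zip(A_list, B_list)])
    t_x, *_ = np.linalg.lstsq(C, d, rcond=None)

    X = np.eye(4)
    X[:3, :3], X[:3, 3] = R_x, t_x
    return X
```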
In addition to the direct derivation of the transformation matrix equation system described above, this calibration process can also use the existing data and the spatial relationships they contain to convert the parameters into a homogeneous transformation matrix of the 3D calibration target in the structured light camera coordinate system, similar to the calibration method described previously. For any point on the 3D calibration target measured by the structured light camera at positions $i$ and $j$, giving coordinates $P_{c_i}$ and $P_{c_j}$, the transformation relationship $T_{t_i}$ between the point clouds after registration satisfies the following:

$P_{c_i} = T_{t_i} \cdot P_{c_j}$
For any point $P_w$ on the 3D calibration target expressed in the world coordinate system, its coordinates in the structured light camera coordinate system satisfy the following:

$P_{c_i} = T_{w_i} \cdot P_w$
Combining these equations, we obtain the following:

$T_{w_j} \cdot P_w = T_{t_i} \cdot T_{w_i} \cdot P_w$
Thus, the following holds:

$T_{w_j} = T_{t_i} \cdot T_{w_i}$
Here, $(T_{w_i}, T_{w_j})$ represent the poses of the camera in the world coordinate system. The transformation of the structured light camera's pose in space is the inverse of the point cloud transformation, expressed as follows:

$T_{w_j} = T_{t_i}^{-1}$
After indirectly obtaining the positional relationship of the camera in the world coordinate system through point cloud registration, the variables required by traditional hand–eye calibration can be constructed and the corresponding algorithm invoked. The inputs are the homogeneous matrix $T_{\mathrm{gripper2base}}$ from the robot flange to the world coordinate system and the homogeneous matrix $T_{\mathrm{target2cam}}$ from the target to the camera. The output is the hand–eye matrix.
In the system constructed in this work, the homogeneous matrix $T_t^w$ of the tracked target in the world coordinate system of the tracking camera takes the role of the homogeneous matrix $T_{\mathrm{gripper2base}}$ from the robot flange to the world coordinate system. The virtual camera coordinate system is defined as the coordinate system of the first stereo calibration target scan, so its matrix $T_{\mathrm{target2cam}}$ is the identity matrix $I$. For the $k$-th stereo calibration target scan, the homogeneous matrix $T_{\mathrm{target2cam}}$ relative to the virtual camera coordinate system is the registration result $T_{c_k}^{c_1}$ of the stereo calibration target point cloud, which can also be expressed as the product of the sequential registration results as follows:
$T_{\mathrm{target2cam}} = T_{c_k}^{c_1} = \prod_{i=2}^{k} T_{c_i}^{c_{i-1}}$
By constructing the virtual camera coordinate system from the relative relationships of the stereo calibration targets obtained through registration, the required homogeneous matrices can be assembled in a form compatible with existing hand–eye matrix calculation functions, as illustrated in the sketch below. This enables the robot hand–eye calibration calculation based on arbitrarily shaped 3D calibration targets as designed in this work.
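To make the "virtual camera" construction concrete, the hedged sketch below packs the tracker poses and the chained registration results into the inputs expected by a generic hand–eye routine, here OpenCV's `cv2.calibrateHandEye` (available in OpenCV 4.1 and later). The variable names and the exact frame conventions are assumptions for illustration and may need adapting to a specific setup.

```python
import numpy as np
import cv2

def hand_eye_from_registrations(T_t_w_list, T_ck_c1_list):
    """Hand-eye estimation from tracker poses and chained registration results.

    T_t_w_list   : list of 4x4 tracker poses of the end-effector target in the world frame
    T_ck_c1_list : list of 4x4 registrations of scan k into the frame of scan 1
                   (the 'virtual camera' frame); the first entry is the identity.
    """
    R_g2b = [T[:3, :3] for T in T_t_w_list]       # plays the role of gripper-to-base
    t_g2b = [T[:3, 3] for T in T_t_w_list]
    R_t2c = [T[:3, :3] for T in T_ck_c1_list]     # plays the role of target-to-camera
    t_t2c = [T[:3, 3] for T in T_ck_c1_list]

    R_x, t_x = cv2.calibrateHandEye(R_g2b, t_g2b, R_t2c, t_t2c,
                                    method=cv2.CALIB_HAND_EYE_TSAI)
    X = np.eye(4)
    X[:3, :3], X[:3, 3] = R_x, t_x.ravel()
    return X
```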

2.4. Multi-Step Registration Algorithm

The hand–eye calibration method proposed in this paper, based on arbitrary 3D targets, relies on scanning and registering the target's point cloud. It requires precise registration of point clouds with significant initial pose differences and partially non-overlapping regions to accurately determine the spatial relationship of the target's measured area, which in turn enables the determination of the spatial relationship of the camera.
As shown in Figure 5, to meet the requirements of the calibration algorithm, this paper designs a registration method that includes the following two steps: coarse registration based on local geometric feature calculation and matching, and fine registration based on an improved ICP algorithm, achieving the preliminary processing of the collected data.
The RANSAC (Random Sample Consensus) algorithm is used to evaluate the local feature distribution of the point clouds based on the FPFH (Fast Point Feature Histogram) [22], enabling registration of the source point cloud to the target point cloud. First, the voxel size for voxel filtering is set as the dimension for voxel downsampling, along with the radius for normal vector estimation and the search radius for FPFH feature extraction. For the downsampled voxel grid point cloud, the voxel size is used as the basic unit for the KD-tree search radius, and the nearest neighbors within each point's neighborhood are searched to estimate the normal vectors.
The estimated normal vector of each point is used to compute the SPFH (Simplified Point Feature Histogram) for that point, with the voxel size from the voxel downsampling set as the basic unit for the neighborhood point search radius. For a point pair $(p_t, p_s)$ and their estimated normal vectors $(n_t, n_s)$, a local coordinate system $O_p(u, v, w)$ is constructed, satisfying the following:
$u = n_s, \qquad v = \dfrac{p_t - p_s}{\lVert p_t - p_s \rVert} \times u, \qquad w = u \times v$
In this coordinate system, the pairwise features $\alpha$, $\phi$, and $\theta$ for the point pair are calculated as follows:

$\alpha = v \cdot n_t, \qquad \phi = \dfrac{u \cdot (p_t - p_s)}{d}, \qquad \theta = \arctan\!\left(w \cdot n_t,\; u \cdot n_t\right)$
where $d$ represents the Euclidean distance between the point pair:

$d = \lVert p_t - p_s \rVert_2$
Through the above calculations, for each point $p_s$ and its neighboring points $\{p_t\}$, the $n$ point pairs form the SPFH of that point, which can also be expressed as follows:

$SPFH(p_t) = \left[h_\alpha(p_t),\; h_\phi(p_t),\; h_\theta(p_t)\right]$
where $h_\alpha(p_t)$, $h_\phi(p_t)$, and $h_\theta(p_t)$ are histograms of the values of $\alpha$, $\phi$, and $\theta$.
For each point, the SPFH is weighted and summed, incorporating the information from neighboring points into the final feature of the point. For point $p_t$, its $FPFH(p_t)$ consists of two parts: the first is its own $SPFH(p_t)$, and the second is the weighted average of the SPFHs of its $k$ neighboring points, using the inverse distance as the weight. Thus, $FPFH(p_t)$ can be expressed as follows:

$FPFH(p_t) = SPFH(p_t) + \dfrac{1}{k} \sum_{i=1}^{k} \dfrac{1}{d(p_t, p_i)}\, SPFH(p_i)$
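A minimal NumPy sketch of the pair-feature and weighting computations above is given below; it mirrors the formulas for $(\alpha, \phi, \theta, d)$ and the inverse-distance FPFH weighting but omits the histogram binning details, so it is not a drop-in replacement for a library implementation such as [22].

```python
import numpy as np

def pair_features(p_s, n_s, p_t, n_t):
    """Darboux-frame pair features (alpha, phi, theta, d) for points p_s, p_t with normals n_s, n_t."""
    d_vec = p_t - p_s
    d = np.linalg.norm(d_vec)
    u = n_s
    v = np.cross(d_vec / d, u)
    w = np.cross(u, v)
    alpha = np.dot(v, n_t)
    phi = np.dot(u, d_vec) / d
    theta = np.arctan2(np.dot(w, n_t), np.dot(u, n_t))
    return alpha, phi, theta, d

def fpfh(spfh, neighbor_spfhs, dists):
    """Combine a point's SPFH with its k neighbors' SPFHs using inverse-distance weights.

    spfh           : (B,) histogram of the query point
    neighbor_spfhs : (k, B) histograms of its k nearest neighbors
    dists          : (k,) distances to those neighbors
    """
    k = len(dists)
    return spfh + (1.0 / k) * np.sum(neighbor_spfhs / dists[:, None], axis=0)
```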
Through the above calculations, the FPFH descriptors of the source point cloud and the target point cloud are obtained. $N$ feature point pairs $(P_s, P_t)$ whose FPFH descriptors match are then randomly selected from the source and target point clouds. First, SVD (Singular Value Decomposition) is used to estimate the rotation part of the rigid transformation for these point pairs. Centroid normalization is performed on these points as follows:
$P_t' = P_t - \dfrac{1}{N} \sum_{i=1}^{N} P_{t,i}, \qquad P_s' = P_s - \dfrac{1}{N} \sum_{i=1}^{N} P_{s,i}$
For the centroid-normalized point sets $P_s'$ and $P_t'$, the covariance matrix $H$ is calculated as follows:

$H = \sum_{i=1}^{N} P_s'[i] \otimes P_t'[i]$
where ⨂ denotes the outer product.
The covariance matrix $H$ is decomposed using SVD as follows:

$H = U S V^{T}$
From this, the rotation transformation $R$ between the point sets $P_s$ and $P_t$ is obtained as follows:

$R = V \cdot U^{T}$
Based on the rotation transformation $R$, the centroid displacement of the point sets $P_s$ and $P_t$ is calculated to obtain the translation transformation $t$ as follows:

$t = \dfrac{1}{N} \sum_{i=1}^{N} P_{t,i} - R \cdot \dfrac{1}{N} \sum_{i=1}^{N} P_{s,i}$
Combining $R$ and $t$ yields the homogeneous transformation matrix $T$, as follows:

$T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}$
Based on the initially calculated homogeneous transformation matrix T, the distance difference between the feature-matched corresponding points is computed. If the difference is below the threshold, the global feature registration is completed. If the difference exceeds the threshold, the above steps are repeated, and the final homogeneous transformation matrix T is obtained by cumulative multiplication and combination.
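The centroid normalization, covariance accumulation, SVD decomposition, and translation recovery described above amount to a Kabsch-style estimate for one sample of matched feature pairs. The NumPy sketch below illustrates that single estimation step (with an added reflection guard); the surrounding RANSAC sampling and threshold check are omitted.

```python
import numpy as np

def rigid_transform_from_pairs(P_s, P_t):
    """Estimate the rigid transform that maps matched source points P_s onto target points P_t.

    P_s, P_t : (N, 3) arrays of corresponding points (e.g., one RANSAC sample of FPFH matches).
    Returns a 4x4 homogeneous matrix.
    """
    c_s, c_t = P_s.mean(axis=0), P_t.mean(axis=0)   # centroids
    Ps, Pt = P_s - c_s, P_t - c_t                   # centroid normalization
    H = Ps.T @ Pt                                   # 3x3 covariance (sum of outer products)
    U, S, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T                                  # rotation R = V U^T
    if np.linalg.det(R) < 0:                        # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_t - R @ c_s                               # translation from centroid displacement
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T
```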
For the point cloud after coarse registration based on RANSAC and FPFH features, the optimized Iterative Closest Point (ICP) algorithm [23] is executed to further improve registration accuracy, achieving results that meet the precision requirements of the calibration process.
Using the registration result from RANSAC as the initial transformation, for each point $p_i$ in the source point cloud and its correspondence $p_j$ in the target point cloud, the Mahalanobis distance $d(p_i, p_j)$ is calculated as the error metric as follows:

$d(p_i, p_j) = (p_i - p_j)^{T} \cdot (C_i + C_j)^{-1} \cdot (p_i - p_j)$
where $C_i$ and $C_j$ are the covariance matrices of the source and target points within their neighborhoods. For any point $p_i$ in the point cloud, its covariance is the outer product of the coordinate differences between $p_i$ and its neighboring points $q_j$, expressed as follows:

$C_i = \dfrac{1}{n} \sum_{j=1}^{n} (q_j - p_i)(q_j - p_i)^{T}$
where n is the neighborhood size.
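A small sketch of the per-point neighborhood covariance used to weight the ICP error is shown below; the SciPy KD-tree query, the neighborhood size, and the regularization term are illustrative choices, not values prescribed by the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def point_covariances(points, k=20, eps=1e-6):
    """Per-point covariance of the k-nearest-neighbor offsets, used to weight the ICP error.

    points : (N, 3) point cloud
    Returns an (N, 3, 3) array of covariance matrices (regularized to stay invertible).
    """
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)          # first neighbor is the point itself
    covs = np.empty((len(points), 3, 3))
    for i, nbrs in enumerate(idx):
        diffs = points[nbrs[1:]] - points[i]      # offsets q_j - p_i
        covs[i] = diffs.T @ diffs / k + eps * np.eye(3)   # outer-product average + regularization
    return covs
```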
For the registration step, the target transformation matrix $T$ is as follows:

$T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \in SE(3)$
The Mahalanobis distance-based error $E(T)$ between the target point set $\{q\}$ and the source point set $\{p\}$ is expressed as follows:

$E(T) = \sum_{i=1}^{n} (q_i - T p_i)^{T} \cdot C_i^{-1} \cdot (q_i - T p_i)$
Expanding this gives the following:

$E(R, t) = \sum_{i=1}^{n} \left(q_i - (R p_i + t)\right)^{T} \cdot C_i^{-1} \cdot \left(q_i - (R p_i + t)\right)$
The Gauss–Newton method is used to perform a first-order Taylor expansion of the Mahalanobis distance error $E(T)$ for iterative optimization. First, the transformation matrix $T$ is parameterized as a vector $x$ as follows:

$x = [\omega^{T},\; t^{T}]^{T}$

where $\omega \in \mathbb{R}^{3}$ is the Lie algebra representation of $R \in SO(3)$.
For each point pair $(p_i, q_i)$, the corresponding residual $e_i(x)$ is as follows:

$e_i(x) = q_i - (R p_i + t)$
The weighted sum of squared residuals $E(x)$ is expressed as follows:

$E(x) = \sum_{i=1}^{n} e_i^{T} C_i^{-1} e_i$
The residual $e_i(x)$ is Taylor-expanded to approximate the nonlinear error as follows:

$e_i(x + \Delta x) \approx e_i(x) + J_i \Delta x$
where $\Delta x = [\Delta \omega^{T},\; \Delta t^{T}]^{T}$ is the parameter increment, and $J_i = \dfrac{\partial e_i}{\partial x} \in \mathbb{R}^{3 \times 6}$ is the Jacobian matrix of the error $e_i$ with respect to the parameters $x$. The rotation derivative $\dfrac{\partial e_i}{\partial \omega}$ and the translation derivative $\dfrac{\partial e_i}{\partial t}$ are expressed as follows:
$\dfrac{\partial e_i}{\partial \omega} = -\dfrac{\partial (R p_i)}{\partial \omega} = (R p_i)^{\wedge}, \qquad \dfrac{\partial e_i}{\partial t} = -I_{3 \times 3}$

where $(\cdot)^{\wedge}$ denotes the cross-product (skew-symmetric) matrix of the vector.
$E(x + \Delta x) \approx \sum_{i=1}^{n} (e_i + J_i \Delta x)^{T} C_i^{-1} (e_i + J_i \Delta x)$

$= \sum_{i=1}^{n} \left( e_i^{T} C_i^{-1} e_i + 2\, e_i^{T} C_i^{-1} J_i \Delta x + \Delta x^{T} J_i^{T} C_i^{-1} J_i \Delta x \right)$
Ignoring higher-order terms, it is approximated as follows:

$E \approx \sum_{i=1}^{n} e_i^{T} C_i^{-1} e_i + 2\, \Delta x^{T} \sum_{i=1}^{n} J_i^{T} C_i^{-1} e_i + \Delta x^{T} \left( \sum_{i=1}^{n} J_i^{T} C_i^{-1} J_i \right) \Delta x$
To minimize $E$, the derivative with respect to $\Delta x$ is taken and set to zero, yielding the Gauss–Newton equation, expressed as follows:

$\left( \sum_{i=1}^{n} J_i^{T} C_i^{-1} J_i \right) \Delta x = -\sum_{i=1}^{n} J_i^{T} C_i^{-1} e_i$
where $H = \sum_{i=1}^{n} J_i^{T} C_i^{-1} J_i$ is the Gauss–Newton Hessian matrix, and $g = \sum_{i=1}^{n} J_i^{T} C_i^{-1} e_i$ is the gradient vector.

After obtaining the linear system for $\Delta x$, an approximate solution is computed iteratively until $\lVert \Delta x_k \rVert$ falls below the threshold $\epsilon$. The accumulated increments $\Delta x$ correspond to $\Delta T$, yielding the final transformation matrix $T$, as sketched below.
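The sketch below illustrates a single covariance-weighted Gauss–Newton update as derived above: residuals and Jacobians are accumulated into the normal equations, the increment $\Delta x = [\Delta\omega, \Delta t]$ is solved for, and the rotation is updated through the exponential map. Correspondence search, convergence checks, and robustification are omitted, and this is not the exact implementation of [23].

```python
import numpy as np
from scipy.spatial.transform import Rotation

def skew(v):
    """Cross-product (hat) matrix of a 3-vector."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def gauss_newton_step(P, Q, C_inv, R, t):
    """One covariance-weighted Gauss-Newton update for e_i = q_i - (R p_i + t).

    P, Q  : (N, 3) corresponding source/target points
    C_inv : (N, 3, 3) inverse covariance weights C_i^{-1}
    R, t  : current rotation (3x3) and translation (3,)
    Returns the updated (R, t).
    """
    H = np.zeros((6, 6))
    g = np.zeros(6)
    for p, q, W in zip(P, Q, C_inv):
        Rp = R @ p
        e = q - (Rp + t)                        # residual
        J = np.hstack([skew(Rp), -np.eye(3)])   # [de/domega, de/dt] = [(Rp)^, -I]
        H += J.T @ W @ J                        # Gauss-Newton Hessian
        g += J.T @ W @ e                        # gradient term
    dx = np.linalg.solve(H, -g)                 # H dx = -g
    d_omega, d_t = dx[:3], dx[3:]
    R_new = Rotation.from_rotvec(d_omega).as_matrix() @ R   # left-multiplicative update
    t_new = t + d_t
    return R_new, t_new
```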

3. Experiment

3.1. Experimental Platform and Evaluation Criteria

3.1.1. Experimental Platform

We build the experimental hardware system shown in Figure 6. The experiment uses a JAKA Zu7 collaborative robot (repeatability ±0.02 mm) to drive the structured light camera and the tracking target. A PhoXi 3D Scanner S structured light camera is used for 3D data scanning, with a scanning range of 384–520 mm, a point spacing of 0.174 mm, and a calibration accuracy of 0.050 mm. An AITS D-Series Smart tracking camera tracks the target on the robotic arm to obtain the end-effector pose, with a tracking distance of 0.5–2.0 m and a tracking accuracy of 0.15–0.30 mm. Arbitrarily selected objects are used as the calibration targets for the hand–eye calibration experiments. After system calibration is completed, a ceramic standard sphere with a diameter of 25 ± 0.001 mm is used for the accuracy evaluation experiments, which include 3D reconstruction error experiments and distance measurement error experiments.

3.1.2. Evaluation Criteria

To evaluate the 3D reconstruction error, the robot is controlled to measure the point cloud of a standard sphere placed at a fixed position from different locations, and the sphere center position in the structured light camera coordinate system is obtained by fitting. Using the hand–eye matrix from the calibration experiment and the transformation matrix of the structured light camera in the world coordinate system obtained from the tracking camera, the sphere center positions scanned from the different locations are unified into the world coordinate system of the tracking camera. The dispersion of the fixed sphere's center positions in the world coordinate system across the different structured light camera positions is the metric used to evaluate the 3D reconstruction error of the robot measurement system.
To evaluate the distance measurement error, the robot system is kept stationary and a slide rail with a movement accuracy of 0.1 mm is used to move the standard sphere, whose point cloud is measured before and after the movement. After sphere center fitting and coordinate system transformation, the difference between the distance of the sphere center positions in the world coordinate system and the actual measured travel is the distance measurement error. Both metrics are straightforward to compute once the fitted centers are available, as sketched below.
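Given the fitted sphere centers and the corresponding tracker poses, both metrics reduce to simple vector arithmetic; the sketch below (with assumed variable names) shows one way to compute them.

```python
import numpy as np

def reconstruction_error(centers_cam, T_t_w_list, T_c_t):
    """Spread of a fixed sphere's center after mapping every scan into the world frame.

    centers_cam : (N, 3) fitted sphere centers in the structured light camera frame
    T_t_w_list  : list of N 4x4 tracker poses; T_c_t : 4x4 hand-eye matrix
    """
    centers_w = np.array([(T @ T_c_t @ np.append(c, 1.0))[:3]
                          for c, T in zip(centers_cam, T_t_w_list)])
    errors = np.linalg.norm(centers_w - centers_w.mean(axis=0), axis=1)
    return errors.mean(), errors.std()

def distance_error(center_w_before, center_w_after, rail_travel_mm):
    """Difference between the measured center-to-center distance and the slide rail travel."""
    return abs(np.linalg.norm(center_w_after - center_w_before) - rail_travel_mm)
```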

3.2. Experimental Procedures

According to the calibration process described in the methodology, the proposed calibration method requires experimental data consisting of a set of 3D point cloud scans of an object and a corresponding set of tracking poses. As shown in Figure 7, the robot-controlled scanning camera is used to scan a toolbox as the 3D target at different positions, resulting in 16 sets of corresponding point clouds and tracking poses.
According to the registration method proposed in this study, the number of iterations in the initial stage of global registration significantly influences the registration accuracy. As illustrated in Figure 8, during the registration of the first two sets of target point clouds, the impact of the number of RANSAC iterations on the RMSE is demonstrated. It can be observed from Figure 8 that a relatively small number of iterations is sufficient for achieving stable registration performance, with the RMSE reaching its optimal and stable value after approximately 8 iterations.
As shown in Figure 9, following the registration method described in Section 2.4, 15 pairwise registrations are performed on the 16 sets of point clouds, resulting in 15 transformation matrices $\{T_{t_1}^{t_i}\}$ that represent the relative poses of point clouds 2–16 with respect to point cloud 1.
Because the calibration method proposed in this paper involves the registration of the same object under significant pose variations, the standard ICP registration algorithm is prone to becoming stuck in local optima and failing to achieve correct registration due to large initial pose differences and small overlapping areas. However, since this paper introduces a global registration algorithm and a covariance-optimized ICP algorithm, it can effectively achieve point cloud registration in such scenarios. Figure 10 shows the registration of the first four pairs of point clouds in this experiment using the standard ICP method, and most of the point clouds failed to achieve correct registration.
To compare the computation time of the standard ICP method and the proposed registration method on this dataset, the experiment was conducted on a laptop equipped with an AMD Ryzen 7 4800U processor (Zen 2 architecture, eight physical cores and 16 threads, 1.8 GHz base clock, up to 4.2 GHz boost clock, and 8 MB of L3 cache). As shown in Figure 11, owing to the introduction of global registration and covariance optimization, the proposed method consistently keeps the computation time of each step at approximately 4 s. In contrast, standard ICP is prone to becoming trapped in local optima, which increases computation time. Overall, the proposed registration method reduces computation time by 81.87%.
The spatial poses contained in this set of data are illustrated in Figure 12. Simultaneously, during each scan, the tracking camera obtains the transformation matrices $\{T_w^{t_i}\}$ of the tracking target on the structured light camera in the world coordinate system.
The measured values are input into the hand–eye calibration algorithm proposed in Section 2, yielding the transformation matrix of the tracking target relative to the structured light camera coordinate system, i.e., the hand–eye matrix $T_c^t$, with the following value:

$T_c^t = \begin{bmatrix} 0.66942031 & 0.69956424 & 0.24997263 & 60.8915365 \\ 0.64794354 & 0.71441506 & 0.26415958 & 122.40444553 \\ 0.36338081 & 0.01486563 & 0.93152209 & 15.22424098 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

3.3. Three-Dimensional Reconstruction Error

This paper uses the three-standard-sphere calibration method as the control group. The robot hand–eye matrix is obtained by following the calibration process proposed in Section 2.2, using 16 sets of coordinate systems composed of three standard spheres. Figure 13 shows the visualization of the 3D reconstruction error validation for the ceramic standard sphere using the calibration result obtained in the previous subsection. The mean error of the 16 sets of sphere center coordinates is 1.53 mm, with a standard deviation of 0.42. In the control experiment on the same hardware system, as shown in Figure 14, the 3D reconstruction results of the ceramic standard sphere obtained with the proposed method are compared with those obtained using the three-sphere-center coordinate system method; the mean error of the 16 sets of sphere center coordinates for the latter is 3.79 mm, with a standard deviation of 0.99. The experimental results demonstrate that, compared with the three-sphere coordinate system method, the proposed method achieves a 59.62% reduction in mean error and a 57.54% reduction in standard deviation, while requiring lower precision for the calibration object and simpler execution steps.
A comparison of the 3D reconstruction position errors between the method proposed in this paper, the three-standard-sphere coordinate system method, and other research methods is shown in Table 1.
The 3D reconstruction data demonstrate that, among structured light camera calibrations based on arbitrary 3D objects, the proposed method achieves the smallest mean error; compared with the results obtained using laser scanner devices, however, the error is noticeably larger. While the evaluation metric used in this paper is the 3D reconstruction error, recent learning-based methods such as EasyHeC [27] and EasyHeC++ [28] report their performance in terms of translation error during hand–eye calibration. According to the original publications, EasyHeC (2023) achieves a mean translation error of 2.06 mm, and EasyHeC++ (2024) improves this to 1.35 mm, both under specific experimental setups involving fiducial targets and high-performance industrial cameras. In comparison, the 3D reconstruction error reported in this paper is 1.53 mm, which inherently incorporates the calibration error as one of its components. This indicates that the proposed method achieves overall performance comparable to or better than EasyHeC and EasyHeC++ in practical applications, despite using low-cost arbitrary 3D targets and a more accessible experimental setup. The results highlight the flexibility and applicability of the proposed method in scenarios where specialized hardware and fiducial targets are unavailable.

3.4. Distance Measurement Error

As shown in Figure 15, using the linear slide rail, the point clouds of the spherical surface obtained at rail positions of −10.0 mm, 0.0 mm, and 10.0 mm yield sphere center distances of 10.47 mm and 10.31 mm, respectively, giving an average distance measurement error of 0.39 mm. The sources of error mainly include the calibration error of the structured light camera (0.05 mm), the tracking error of the tracking camera (0.15 mm), potential positional errors in the sphere center fitting calculation, and cumulative errors that may arise during the computational process.

4. Conclusions

In robotic measurement systems, hand–eye calibration is an essential task. For the calibration of 3D cameras, it is often necessary to perform complex intrinsic camera calibration, followed by calibration using a calibration board with specific patterns or calibration objects with high machining precision, such as standard spheres. To simplify the calibration process, this paper proposes a hand–eye calibration method based on a highly adaptable and high-precision registration approach that uses arbitrary 3D targets. The proposed method employs arbitrary spatial objects as 3D targets, obtains the relative positions of the captured data sets through multi-step registration, and indirectly derives the spatial relationships between camera poses, thereby establishing a system of linear equations for the hand–eye relationship. By solving these equations, the hand–eye relationship of the camera system is obtained. Experimental results show that, compared with conventional calibration methods that use standard spheres to establish spatial coordinate systems, the proposed method achieves higher precision and greater spatial coverage in the 3D reconstruction error experiments, as well as higher accuracy in the distance measurement error experiments.
However, the proposed method still faces some robustness and precision issues, primarily due to the reliance of the registration step on the high precision and resolution of the 3D camera itself, the dependence on tracking accuracy during data collection, and the residuals generated by the hand–eye relationship solving algorithm when the spatial relationships covered by the raw data are limited. Developing registration algorithms with higher overall feature recognition capabilities or improving the performance of measurement hardware in the system could help enhance the accuracy of the calibration results and the overall precision of the measurement system. In this paper, the experimental data acquisition was performed using a manual robot control method, which is relatively labor-intensive and time-consuming. To improve the automation of this process, further research can be conducted on the arrangement of acquisition points and the automatic planning of calibration data collection. The 3D target required by the proposed method needs to satisfy non-centrosymmetric geometric features. Furthermore, due to the precision limitations imposed by the 3D camera’s resolution, using excessively small 3D targets may reduce the calibration accuracy. Developing registration algorithms with stronger spatial feature extraction capabilities could further enhance the applicability of this method.

Author Contributions

Conceptualization, Z.S., L.Q. and Y.S.; methodology, Z.S.; algorithm and experiments, Z.S. and C.S.; original draft preparation, Z.S.; review and editing, Z.S., L.Q. and Y.S.; funding acquisition, L.Q. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the key scientific and technological project of HBIS Group (No. HG2023243).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors thank HongPeng Li and ZeHao Ding for helping with solution design, conceptualization and data curation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, S.; Wang, Q.; Luo, Y. A review of applications of visual inspection technology based on image processing in the railway industry. Transp. Saf. Environ. 2019, 1, 185–204. [Google Scholar] [CrossRef]
  2. Li, W.L.; Xie, H.; Zhang, G.; Yan, S.-J.; Yin, Z.-P. Hand-eye calibration in visually-guided robot grinding. IEEE Trans. Cybern. 2016, 46, 2634–2642. [Google Scholar] [CrossRef] [PubMed]
  3. Xu, J.; Chen, R.; Chen, H.; Zhang, S.; Chen, K. Fast registration methodology for fastener assembly of large-scale structure. IEEE Trans. Ind. Electron. 2017, 64, 717–726. [Google Scholar] [CrossRef]
  4. Li, T.; Gao, L.; Li, P.; Pan, Q. An ensemble fruit fly optimization algorithm for solving range image registration to improve quality inspection of free-form surface parts. Inf. Sci. 2016, 367, 953–974. [Google Scholar] [CrossRef]
  5. Lin, H.W.; Tai, C.L.; Wang, G.J. A mesh reconstruction algorithm driven by an intrinsic property of a point cloud. Comput. Aided Des. 2004, 36, 1–9. [Google Scholar] [CrossRef]
  6. Chang, W.; Zwicker, M. Global registration of dynamic range scans for articulated model reconstruction. ACM Trans. Graph. 2011, 30, 26. [Google Scholar] [CrossRef]
  7. Jiang, J.; Luo, X.; Luo, Q.; Li, M. An overview of hand-eye calibration. Int. J. Adv. Manuf. Technol. 2022, 119, 77–97. [Google Scholar] [CrossRef]
  8. Salvi, J.; Armangué, X.; Batlle, J. A comparative review of camera calibrating methods with accuracy evaluation. Pattern Recognit. 2002, 35, 1617–1635. [Google Scholar] [CrossRef]
  9. Shiu, Y.C.; Ahmad, S. Calibration of wrist-mounted robotic sensors by solving homogeneous transform equations of the form AX=XB. IEEE Trans. Robot. Autom. 1989, 5, 16–29. [Google Scholar] [CrossRef]
  10. Tsai, R.Y.; Lenz, R.K. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Robot. Autom. 1989, 5, 345–358. [Google Scholar] [CrossRef]
  11. Zhang, Z.Y. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [Google Scholar] [CrossRef]
  12. Fiala, M. Comparing ARTag and ARToolkit plus Fiducial Marker Systems. In Proceedings of the IEEE International Workshop on Haptic Audio Visual Environments and Their Applications, Ottawa, ON, Canada, 1 October 2005; pp. 148–153. [Google Scholar] [CrossRef]
  13. Fiala, M. ARTag, A fiducial marker system using digital techniques. IEEE Conf. Comput. Vis. Pattern Recognit. 2005, 2, 590–596. [Google Scholar] [CrossRef]
  14. Wang, J.; Olson, E. AprilTag 2: Efficient and Robust Fiducial Detection. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 4193–4198. [Google Scholar] [CrossRef]
  15. Xie, Z.X.; Zong, P.F.; Yao, P.; Ren, P. Calibration of 6-DOF industrial robots based on line structured light. Optik 2019, 183, 1166–1178. [Google Scholar] [CrossRef]
  16. Cajal, C. A crenellated-target-based calibration method for laser triangulation sensors integration in articulated measurement arms. Robot. Comput. Integr. Manuf. 2011, 27, 282–291. [Google Scholar] [CrossRef]
  17. Wagner, M.; Reitelshofer, S.; Franke, J. Self-Calibration Method for a Robot Based 3D Scanning System. In Proceedings of the 2015 IEEE 20th Conference on Emerging Technologies & Factory Automation (ETFA 2015), Luxembourg, Luxembourg, 8–11 September 2015. [Google Scholar] [CrossRef]
  18. Lembono, T.S.; Suarez-Ruiz, F.; Pham, Q.C. SCALAR: Simultaneous calibration of 2-D laser and robot kinematic parameters using planarity and distance constraints. IEEE Trans. Autom. Sci. Eng. 2019, 16, 1971–1979. [Google Scholar] [CrossRef]
  19. Shibin, Y.; Yongjie, R.; Shenghua, Y. A vision-based self-calibration method for robotic visual inspection systems. Sensors 2013, 13, 16565–16582. [Google Scholar] [CrossRef]
  20. Xie, H.; Li, W.; Liu, H. General geometry calibration using arbitrary free-form surface in a vision-based robot system. IEEE Trans. Ind. Electron. 2021, 69, 5994–6003. [Google Scholar] [CrossRef]
  21. Xu, G.; Yan, Y. A scene feature based eye-in-hand calibration method for industrial robot. Mechatron. Mach. Vis. Pract. 2021, 4, 179–191. [Google Scholar] [CrossRef]
  22. Rusu, R.B.; Blodow, N.; Marton, Z.C.; Beetz, M. Aligning Point Cloud Views Using Persistent Feature Histograms. In Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, France, 22–26 September 2008; pp. 3384–3391. [Google Scholar] [CrossRef]
  23. Segal, A.V.; Haehnel, D.; Thrun, S. Generalized-ICP. In Proceedings of the Robotics: Science and Systems Conference, Seattle, WA, USA, 28 June–1 July 2009; pp. 347–354. [Google Scholar] [CrossRef]
  24. Wang, Z.; Fan, J.; Jing, F.; Deng, S.; Zheng, M.; Tan, M. An efficient calibration method of line structured light vision sensor in robotic eye-in-hand system. IEEE Sens. J. 2020, 20, 6200–6208. [Google Scholar] [CrossRef]
  25. Murali, P.K.; Sorrentino, I.; Rendiniello, A.; Fantacci, C.; Villagrossi, E.; Polo, A. In Situ Translational Hand-Eye Calibration of Laser Profile Sensors Using Arbitrary Objects. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA 2021), Xi’an, China, 30 May–5 June 2021; pp. 11067–11073. [Google Scholar] [CrossRef]
  26. Peters, A.; Knoll, A.C. Robot self-calibration using actuated 3D sensors. J. Field Robot. 2024, 41, 327–346. [Google Scholar] [CrossRef]
  27. Chen, L.; Qin, Y.; Zhou, X.; Su, H. EasyHeC: Accurate and Automatic Hand-eye Calibration via Differentiable Rendering and Space Exploration. IEEE Robot. Autom. Lett. 2023, 8, 7234–7241. [Google Scholar] [CrossRef]
  28. Hong, Z.; Zheng, K.; Chen, L. EasyHeC++: Fully Automatic Hand-Eye Calibration with Pretrained Image Models. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 14–18 October 2024; pp. 1234–1240. [Google Scholar] [CrossRef]
Figure 1. Measurement system.
Figure 2. Calibration algorithm flow.
Figure 3. Multi-pose calibration measurement.
Figure 4. Multi-pose measurement transformation matrix relationship.
Figure 5. Multi-step registration algorithm.
Figure 6. Experimental hardware system. (a) Experimental system, (b) six-axis robotic arm, (c) structured light camera, (d) ceramic standard ball, and (e) tracking camera.
Figure 7. Calibration data collection.
Figure 8. Influence of RANSAC iteration count on the RMSE of global registration.
Figure 9. Target pose registration.
Figure 10. Point cloud registration by standard ICP (first four sets).
Figure 11. Comparison of registration time.
Figure 12. Three-dimensional spatial distribution of pose data.
Figure 13. Three-dimensional reconstruction error.
Figure 14. Three-dimensional reconstruction error distribution.
Figure 15. Distance measurement error.
Table 1. Mean and distribution of 3D reconstruction errors for different methods.

Method | Camera Type | Target Type | Mean Error (mm) | Standard Deviation
Three-Ball System Method | Structured Light | Three Standard Spheres | (X: 2.77, Y: 1.49, Z: 1.47) 3.79 | 0.99
Zhe's Method [24] | Line Laser | 2D Calibration Board | (X: 1.334, Y: 0.511, Z: 0.925) 1.855 | -
Murali's Method [25] | Laser Profiler | Arbitrary 3D Object | (X: 0.701, Y: 0.443, Z: 0.366) 0.906 | -
Peter's Method [26] | Structured Light | Arbitrary 3D Object | 1.77 | -
Proposed Method | Structured Light | Arbitrary 3D Object | (X: 1.10, Y: 0.60, Z: 0.66) 1.53 | 0.42