Article

A Likelihood-Based Pose Estimation Method for Robotic Arm Repeatability Measurement Using Monocular Vision

Faculty of Land and Resources Engineering, Kunming University of Science and Technology, Kunming 650093, China
*
Author to whom correspondence should be addressed.
Sensors 2025, 25(22), 7089; https://doi.org/10.3390/s25227089
Submission received: 20 October 2025 / Revised: 14 November 2025 / Accepted: 20 November 2025 / Published: 20 November 2025
(This article belongs to the Section Sensing and Imaging)

Abstract

Repeatability accuracy is a key performance metric for robotic arms. To address limitations in existing monocular vision-based measurement methods, this study proposes a likelihood-based pose estimation approach. Our method first obtains initial pose estimates through optimized likelihood estimation, then iteratively refines depth information. By modeling the statistical characteristics of multiple observed poses, we derive a global theoretical pose. Within this framework, two-dimensional feature points are backprojected into three-dimensional space to form an observed point cloud. The error between this observed cloud and the theoretical feature point cloud is computed using the Iterative Closest Point (ICP) algorithm, enabling accurate quantification of repeatability accuracy. Based on 30 repeated trials at each of five target poses, the proposed method achieved repeatability positioning accuracy of 0.0115 mm, 0.0121 mm, 0.0068 mm, 0.0162 mm, and 0.0175 mm at the five poses, respectively, with a mean value of 0.0128 mm and a standard deviation of 0.0038 mm across the poses. Compared with two existing monocular vision-based methods, it demonstrates superior accuracy and stability, achieving average accuracy improvements of 0.79 mm and 1.06 mm, respectively, and reducing the standard deviation by over 85%.

1. Introduction

In the field of intelligent manufacturing and automated production, repeatability is a core metric for evaluating the performance of robotic arms and other equipment. Its measurement accuracy directly impacts product quality and production efficiency [1,2,3]. High-precision repeatability measurement is fundamental to ensuring the reliability of processes such as assembly, welding, and inspection.
Currently, robotic arm repeatability measurement methods are categorized into laser-based and vision-based approaches according to sensor type. Laser measurement techniques [4,5,6,7] capture coordinate data by detecting reflected laser spots on target surfaces and offer significant advantages in specific scenarios owing to their high measurement accuracy. However, this approach suffers from high equipment costs, poor environmental adaptability, and the need for specialized targets, and its complex installation and calibration procedures make it difficult to meet the practical demands of complex and dynamic industrial environments. In contrast, visual measurement methods offer superior scene adaptability, more convenient deployment, and lower cost. Within the visual measurement framework, pose estimation is a critical component for achieving high-precision measurement, and diverse pose estimation methods provide multiple technical pathways for visual repeatability measurement. Current research primarily focuses on three directions: camera calibration methods, non-iterative methods, and iterative methods. Camera calibration methods establish a transformation between the image coordinate system and the world coordinate system to measure the end-effector pose of a robotic arm. R.A. Boby et al. [8] proposed a phased monocular camera calibration method and applied it to robotic arm repeatability measurement. Wu et al. [9] introduced a vision-based robotic motion measurement approach, enhancing measurement accuracy and stability through monocular feature matrices and singular value decomposition. Non-iterative methods rely on analytical geometric constraints and solve the pose parameters directly. To enhance algorithmic efficiency and accuracy, ref. [10] proposed the EPnP model, and multiple improved variants were subsequently derived, including RPnP [11], DLS [12], OPnP [13], SRPnP [14], and the deep learning-based EPro-PnP model [15]. However, measuring robotic arm repeatability under monocular vision is a distinctive task paradigm: its core objective is to observe the same static scene multiple times and evaluate the consistency of the results. The task therefore emphasizes maximizing the measurement accuracy and stability of single-frame images rather than exploiting temporal sequence information, and the widely recognized baselines remain the classical PnP algorithms and their improved variants. Among these, Gao et al. [16] proposed an image-based target localization method for collaborative robots, using a monocular vision system combined with a PnP algorithm to measure robotic arm repeatability. Luo et al. [17] employed a coplanar P4P algorithm to determine target feature point coordinates in the camera coordinate system, integrating dual quaternions to solve the pose transformation matrix and thereby calculate repeatability. Iterative methods transform the PnP problem into an optimization objective and solve the pose through iterative steps. Chen et al. [18] proposed a robotic arm positioning measurement system combining monocular vision with the P4P and OI algorithms [19], in which the P4P result serves as the initial value for OI optimization. However, most existing visual measurement methods scarcely consider the statistical properties of the observational data and lack a global optimization mechanism for multi-frame data.
To address this, we propose a likelihood-based pose estimation method for high-precision repeatability measurement. The method first optimizes the single-frame pose estimate and then applies a likelihood-based estimation framework: building on maximum likelihood theory, it recasts the problem of solving PnP for a stationary target observed multiple times as that of observing a slightly moving target multiple times, via random signal modeling. The Iterative Closest Point (ICP) algorithm [20] then computes the displacement vectors between 3D–3D matched point pairs, and the minimum enclosing sphere radius is determined with a progressive farthest-point algorithm. This enables precise quantification of repeatability, overcoming the inability of traditional single-frame measurement methods to perform global optimization. Furthermore, by evaluating 3D–3D point cloud deviations, more accurate and stable repeatability results are obtained.

2. Materials and Methods

Repeatability accuracy measures the degree of pose dispersion across multiple responses to identical commands, with its core challenge lying in quantifying end-effector pose repeatability. Based on the aforementioned theory, this study employs the eye-in-hand model. By establishing a rigid mechanical connection between the visual sensor and the end-effector, a fixed coordinate system transformation relationship is constructed. Subsequently, by calculating the repeatability of the visual sensor’s pose, an indirect measurement of the end-effector’s repeatability accuracy is achieved. The overall process is illustrated in Figure 1.
The overall process comprises three modules: data acquisition, pose estimation, and repeatability calculation. The data acquisition module uses an industrial camera to capture image data, establishing the geometric basis for subsequent computations. The pose estimation module employs a weighted likelihood method to refine the initial pose estimates from PnP, mitigating the impact of observation outliers caused by image noise. Based on the optimized pose, a two-stage iterative algorithm solves for single-frame depth information, explicitly enhancing estimation stability in the depth direction. A random signal model combined with maximum likelihood theory then fuses the multi-frame optimized poses into a globally optimal solution, which serves as the theoretical pose. Within this theoretical pose framework, the iteratively optimized depth information is used to backproject the two-dimensional feature points into the world coordinate system, forming a three-dimensional observation point cloud as seen from the theoretical pose. The ICP algorithm then computes the registration error between the observation point cloud and the theoretical three-dimensional feature point cloud. Finally, the repeatability calculation module obtains the repeatability result by computing the minimum enclosing sphere radius of the multi-frame registration errors.

2.1. Initial Pose Estimation and Optimization

Within the data acquisition module, images of a calibration plate are captured by an industrial camera. The acquisition protocol follows the repeatability data collection method specified in the international standard ISO 9283 [21]. The calibration plate is positioned within the workspace of the robotic arm under test. Using the teach pendant, five distinct poses are programmed into the controller, designated P1, P2, P3, P4, and P5. The robotic arm is then commanded to traverse repeatedly between two points at normal operating speed, and image data are captured upon reaching each target pose. Each acquisition sequence is repeated 30 times. The detailed image acquisition workflow is illustrated in Figure 2.
To convert abstract image information into quantifiable pose data, the PnP algorithm is employed for initial camera pose estimation. Processing the images acquired from a single observation yields $n$ pairs of 2D–3D matched feature points. For the $k$-th observation, the coordinates of the $i$-th feature point in the world coordinate system are denoted $P_i^w(k) = [X_i^w(k), Y_i^w(k), Z_i^w(k)]^T$, and its pixel coordinates in the image coordinate system are $P_i^{img}(k) = [u_i^k, v_i^k]^T$, where $i = 1, 2, \ldots, n$. The transformation relationship can be expressed using the ideal pinhole camera model:

$$u_i^k = f_x \frac{r_1^k P_i^w(k) + t_x^k}{r_3^k P_i^w(k) + t_z^k} + c_x, \qquad v_i^k = f_y \frac{r_2^k P_i^w(k) + t_y^k}{r_3^k P_i^w(k) + t_z^k} + c_y \tag{1}$$

Here $Cam = \{f_x, f_y, c_x, c_y\}$ denotes the camera intrinsic parameters, which are fixed values obtained through calibration experiments. To quantify the calibration precision, we report the maximum re-projection error across all calibration images, which is 0.11 pixels; this low value confirms the accuracy and reliability of the estimated intrinsic parameters for the subsequent computations. $t_k = [t_x^k, t_y^k, t_z^k]^T$ is the translation vector and $R_k = [r_1^k, r_2^k, r_3^k]^T$ is the rotation matrix, where $k = 1, 2, \ldots, K$; $t_k$ and $R_k$ are the initial pose parameters to be estimated, characterizing the camera's pose during the $k$-th observation.
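For concreteness, Equation (1) can be sketched as a small projection routine; the intrinsic values and the test point below are illustrative placeholders rather than the paper's calibrated camera parameters.

```python
import numpy as np

def project(R, t, Pw, fx, fy, cx, cy):
    """Ideal pinhole projection of a 3D world point to pixel coordinates (Eq. 1)."""
    Pc = R @ Pw + t                       # world -> camera frame
    u = fx * Pc[0] / Pc[2] + cx           # perspective division, x axis
    v = fy * Pc[1] / Pc[2] + cy           # perspective division, y axis
    return np.array([u, v])

# A point on the optical axis maps to the principal point under identity pose.
uv = project(np.eye(3), np.zeros(3), np.array([0.0, 0.0, 2.0]),
             800.0, 800.0, 320.0, 240.0)
```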
As images in practical applications often contain noise, the two-dimensional feature point $P_i^{img}(k)$ observed at the $k$-th observation is typically corrupted by observation noise $\epsilon(k)$. Since the initial pose estimation is based on the PnP algorithm, which is sensitive to noise in the two-dimensional feature points, this noise biases the estimated initial pose $t_k$ and $R_k$, thereby affecting the subsequent theoretical pose estimation. To safeguard the accuracy of the theoretical pose estimation, nonlinear optimization is performed on the initial pose $t_k$ and $R_k$. Many nonlinear optimization-based VO systems explicitly assume that visual feature measurement noise is Gaussian white noise [22,23]. Accordingly, the observation noise $\epsilon(k)$ at the $k$-th observation is assumed to follow a Gaussian distribution with mean zero and variance $\sigma^2$, and the projection process can be expressed as
$$P_i^{img}(k) = \pi(R_k, t_k, P_i^w(k)) + \epsilon(k) \tag{2}$$
where $\pi(R_k, t_k, P_i^w(k))$ denotes the projection function of the ideal pinhole camera model. To account for uncertainty in feature point extraction at the $k$-th observation, a weight vector $w(k) = [w_1^k, w_2^k, \ldots, w_n^k]^T$ is introduced over the 2D feature points to mitigate the impact of outliers on the pose estimation results. This paper employs an improved IGG 3 function, which exhibits favorable robustness, to determine the weights. The weight of the $i$-th feature point can be expressed as
$$w_i^k = \begin{cases} 1, & |e_i'| \le k_0 \\[4pt] \dfrac{k_0}{|e_i'|} \left( \dfrac{k_1 - |e_i'|}{k_1 - k_0} \right)^2, & k_0 < |e_i'| < k_1 \\[4pt] 0, & |e_i'| \ge k_1 \end{cases} \tag{3}$$
where $k_0$ and $k_1$ are harmonic coefficients, and $e_i'$ is the standardized residual:

$$e_i' = \frac{e_i}{\sigma_0}, \qquad \sigma_0 = 1.4826 \,\mathrm{median}(|e_i|) \tag{4}$$

where $e_i$ is the reprojection error of each 2D feature point, $\sigma_0$ is the normalization factor, and $\mathrm{median}(|e_i|)$ is the median of the absolute residuals.
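The weighting in Equations (3) and (4) can be sketched directly; the harmonic coefficients $k_0 = 1.5$ and $k_1 = 3.0$ below are common textbook choices, assumed here for illustration rather than taken from the paper.

```python
import numpy as np

def igg3_weights(residuals, k0=1.5, k1=3.0):
    """Improved IGG 3 weights over standardized reprojection residuals (Eq. 3)."""
    e = np.abs(np.asarray(residuals, dtype=float))
    sigma0 = 1.4826 * np.median(e)            # robust scale (Eq. 4)
    e = e / sigma0                            # standardized residuals
    w = np.zeros_like(e)                      # |e'| >= k1 -> weight 0
    w[e <= k0] = 1.0                          # inliers keep full weight
    damp = (e > k0) & (e < k1)                # transition band is down-weighted
    w[damp] = (k0 / e[damp]) * ((k1 - e[damp]) / (k1 - k0)) ** 2
    return w

w = igg3_weights([0.10, 0.10, 0.20, 0.15, 5.0])   # last residual is an outlier
```

The gross outlier receives weight zero, so it is excluded from the weighted likelihood in Equation (7) rather than dragging the pose estimate.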
As the two-dimensional feature points are mutually independent, the joint probability density for n such points is expressed as in Equation (5). Taking the natural logarithm of Equation (5) yields the likelihood function, as shown in Equation (6). From Equation (6), it follows that maximizing the likelihood function is equivalent to minimizing the following function, thereby enabling the solution of the optimized camera pose, as depicted in Equation (7).
$$P = \prod_{i=1}^{n} P\!\left(P_i^{img}(k) \mid \pi(R_k, t_k, P_i^w(k)), w(k), \sigma^2\right) \tag{5}$$
$$\ln L\!\left(P_i^{img}(k) \mid \pi(R_k, t_k, P_i^w(k)), w(k), \sigma^2\right) = -\sum_{i=1}^{n} w_i^k \ln(2\pi\sigma^2) - \frac{1}{\sigma^2} \sum_{i=1}^{n} w_i^k \left\| P_i^{img}(k) - \pi(R_k, t_k, P_i^w(k)) \right\|^2 \tag{6}$$
$$[\tilde{R}_k, \tilde{t}_k] = \arg\min_{R_k, t_k} \sum_{i=1}^{n} w_i^k \left\| P_i^{img}(k) - \pi(R_k, t_k, P_i^w(k)) \right\|^2 \tag{7}$$
where $\tilde{R}_k = [\tilde{r}_1^k, \tilde{r}_2^k, \tilde{r}_3^k]^T$ denotes the optimized rotation matrix and $\tilde{t}_k = [\tilde{t}_x^k, \tilde{t}_y^k, \tilde{t}_z^k]^T$ the optimized translation vector. Iterating this minimization yields the optimized camera pose $\tilde{R}_k$ and $\tilde{t}_k$, completing the initial pose estimation and optimization.

2.2. Two-Stage Iterative Depth Estimation Optimization

Although the optimized camera pose filters out the influence of image noise on pose estimation, it cannot resolve the missing depth information inherent in monocular visual measurement. Therefore, using the optimized camera pose as the initial value, a two-stage iterative algorithm is employed to refine the depth information. In the ideal pinhole camera model, a ray emanating from the camera's optical center (the model's origin) passes through both the image feature point and the corresponding three-dimensional feature point. The unit ray vector $h_i^k$ can be expressed as
$$h_i^k = \frac{1}{\sqrt{(u_i^k - c_x)^2 + (v_i^k - c_y)^2 + f^2}} \begin{bmatrix} u_i^k - c_x \\ v_i^k - c_y \\ f \end{bmatrix} \tag{8}$$
According to the principles of perspective projection, the feature point's coordinates in the camera coordinate system, $P_i^c(k) = [X_i^c(k), Y_i^c(k), Z_i^c(k)]^T$, and the world point $P_i^w(k)$ both lie on the ray with unit vector $h_i^k$. The camera-frame coordinates can therefore be expressed as

$$P_i^c(k) = d_i^k h_i^k \tag{9}$$
where $d_i^k$ denotes the depth factor of feature point $i$ during the $k$-th observation. Applying the collinearity constraint yields

$$P_i^c(k) = (h_i^k)^T P_i^c(k)\, h_i^k = (h_i^k)^T \left(R_{cw}^k P_i^w(k) + t_{cw}^k\right) h_i^k \tag{10}$$
where $R_{cw}^k$ and $t_{cw}^k$ denote the rotation matrix and translation vector, respectively. Because of measurement errors between the camera coordinates and the true coordinates, the error function is formulated as

$$E(R_{cw}^k, t_{cw}^k) = \sum_{i=1}^{n} \left\| \left(R_{cw}^k P_i^w(k) + t_{cw}^k\right) - (h_i^k)^T \left(R_{cw}^k P_i^w(k) + t_{cw}^k\right) h_i^k \right\|^2 \tag{11}$$
Based on Equation (11), the rotation matrix $R_{cw}^k$ is initialized with $\tilde{R}_k$. With the rotation matrix fixed, setting the partial derivative with respect to the translation vector to zero yields the optimized translation vector $\tilde{t}_{cw}^k$ and the depth $d_{cw}^i(k)$:

$$\tilde{t}_{cw}^k = \left(nI - \sum_{i=1}^{n} h_i^k (h_i^k)^T\right)^{-1} \sum_{i=1}^{n} \left(h_i^k (h_i^k)^T - I\right) R_{cw}^k P_i^w(k) \tag{12}$$

$$d_{cw}^i(k) = (h_i^k)^T \left(R_{cw}^k P_i^w(k) + \tilde{t}_{cw}^k\right) \tag{13}$$
Using the estimated depth, the point cloud in the camera coordinate system is then reconstructed, yielding the reconstructed camera point cloud coordinates $\tilde{P}_i^c(k)$:

$$\tilde{P}_i^c(k) = d_{cw}^i(k)\, h_i^k \tag{14}$$
Given the optimized translation vector and depth, these are held fixed while the rotation matrix is optimized. Owing to the orthogonality of the rotation matrix, this optimization becomes a least-squares problem under orthogonal constraints, so the optimized rotation matrix $\tilde{R}_{cw}^k$ is solved with the Umeyama algorithm:

$$\tilde{R}_{cw}^k = \arg\min_{R_{cw}^k} \sum_{i=1}^{n} \left\| d_{cw}^i(k)\, h_i^k - \left(R_{cw}^k P_i^w(k) + \tilde{t}_{cw}^k\right) \right\|^2 \tag{15}$$
The error equation is constructed from the optimized pose and depth, with the maximum number of iterations set to 100 and the iteration threshold to 0.1 mm:

$$E(R_{cw}^k, t_{cw}^k) = \sum_{i=1}^{n} \left\| d_{cw}^i(k)\, h_i^k - \left(\tilde{R}_{cw}^k P_i^w(k) + \tilde{t}_{cw}^k\right) \right\|^2 \tag{16}$$
By alternately iterating Equations (12)–(16) until convergence, the high-precision depth and camera pose are obtained.
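As a concrete illustration, the alternation in Equations (12)–(16) can be sketched in a few lines of NumPy. This is a simplified sketch under our own naming, assuming known rays and 3D points and using an SVD-based orthogonal Procrustes step in place of the full Umeyama solution; it is not the authors' implementation.

```python
import numpy as np

def refine_pose(rays, Pw, R, iters=100, tol=1e-9):
    """Two-stage refinement sketch (Eqs. 12-16): closed-form t and depths
    with R fixed, then an SVD (Procrustes) re-solve of R."""
    n = len(rays)
    H = np.stack([np.outer(h, h) for h in rays])      # h h^T per point
    t, d = np.zeros(3), np.ones(n)
    for _ in range(iters):
        # Stage 1: closed-form translation (Eq. 12) and depths (Eq. 13)
        A = n * np.eye(3) - H.sum(axis=0)
        b = sum((H[i] - np.eye(3)) @ (R @ Pw[i]) for i in range(n))
        t = np.linalg.solve(A, b)
        d = np.array([rays[i] @ (R @ Pw[i] + t) for i in range(n)])
        Pc = d[:, None] * rays                        # reconstructed cloud (Eq. 14)
        # Stage 2: orthogonal Procrustes step for the rotation (Eq. 15)
        U, _, Vt = np.linalg.svd((Pc - t).T @ Pw)
        R_new = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
        converged = np.linalg.norm(R_new - R) < tol
        R = R_new
        if converged:
            break
    return R, t, d

# Synthetic check: exact rays generated from a known pose are recovered.
Pw = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.],
               [1., 1., 0.], [0.5, 0.3, 0.2]])
t_true = np.array([0.1, -0.2, 2.0])
rays = (Pw + t_true) / np.linalg.norm(Pw + t_true, axis=1, keepdims=True)
R_hat, t_hat, d_hat = refine_pose(rays, Pw, np.eye(3))
```

With noise-free rays and the true rotation as initial value, the closed-form stage recovers the exact translation and depths in a single pass.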

2.3. Repeatability Error Propagation Model

As shown in Figure 3, the repeatability positioning accuracy of a robotic arm is typically defined as the dispersion of the end-effector’s pose after multiple repetitive motions. In essence, it calculates the dispersion of spatial discrete points, which can be quantified using the minimum enclosing sphere radius of the point set. When the robotic arm is equipped with a camera, since the end-effector and the camera are rigidly connected, their poses are related only by a fixed hand-eye transformation matrix. However, as repeatability positioning accuracy reflects the dispersion of poses, the camera pose can be used as a substitute for the end-effector pose in repeatability calculations. The eye-in-hand model is illustrated in Figure 4.
Current methods for monocular vision-based repeatability accuracy measurement predominantly rely on single-frame PnP solutions. These approaches statistically analyze pose distributions from multiple independent single-frame estimations to quantify robotic arm repeatability. However, due to the inherent scale ambiguity in monocular vision, these methods suffer from significantly compromised accuracy along the Z-axis. To address this fundamental limitation, we develop an error model of repetitive motion. The assumption that the robotic arm’s repeatability error follows a normal distribution is widely accepted in the field [24,25]. Based on this premise, the error equation is constructed as follows:
$$s(k) = t + e(k) \tag{17}$$

where $s(k)$ represents the $k$-th observation, $t$ denotes the spatial reference position that the repetitive motion attempts to reach (constant regardless of the number of observations), and $e(k)$ is the error of a single observation, which follows a normal distribution. The spatial reference position $t$ can be estimated by solving this error equation. The formulation indicates that each observation consists of the reference position plus an observational error. A new hypothesis is then introduced: all observations are assumed to be taken at the spatial reference position $t$. Under this hypothesis, the observations containing the errors $e(k)$ are backprojected to generate three-dimensional feature points that inherently incorporate those errors. This integrates multi-frame information and transfers the repeatability error $e(k)$ onto the point cloud, making it explicitly quantifiable through point cloud registration.
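As a tiny numerical illustration of Equation (17): under zero-mean Gaussian errors, the least-squares (maximum likelihood) estimate of the reference position $t$ is simply the mean of the $K$ observations, and subtracting it makes the per-trial errors explicit. The numbers below are invented for illustration.

```python
import numpy as np

# s(k) = t + e(k): simulate 30 repeated observations around a hypothetical
# reference position (mm) and recover t as the sample mean.
rng = np.random.default_rng(1)
t_ref = np.array([120.0, -35.0, 410.0])            # assumed reference (mm)
s = t_ref + rng.normal(0.0, 0.01, size=(30, 3))    # 30 noisy observations
t_hat = s.mean(axis=0)                             # ML estimate of t
e = s - t_hat                                      # explicit errors e(k)
```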

2.4. Theoretical Pose Estimation

As the camera performs repetitive motion, under ideal conditions the optimized camera pose $\tilde{R}_{cw}^k$, $\tilde{t}_{cw}^k$ should be identical for every frame. In practice, however, each observation by the robotic arm introduces a random error $n(k)$ caused by factors such as motor control fluctuations, minute variations in transmission clearance, and environmental noise. A single observation can therefore be expressed as the sum of the observed value $m(\eta)$ at the theoretical pose $\eta$ and the error $n(k)$:

$$\tilde{P}_i^{img}(k) = m(\eta) + n(k) \tag{18}$$

$$\tilde{P}_i^{img}(k) = \pi(\tilde{R}_{cw}^k, \tilde{t}_{cw}^k, P_i^w(k)), \qquad m(\eta) = \pi(\eta, P_i^w(k)) \tag{19}$$

where $\tilde{P}_i^{img}(k)$ denotes the $k$-th observation and $n(k)$ the $k$-th error, which follows a Gaussian distribution with mean zero and variance $\delta^2$; the errors of different observations are mutually independent. The theoretical pose $\eta$ can be expressed as

$$\eta = [R_{true} \mid t_{true}], \qquad R_{true} = [R_1, R_2, R_3]^T, \qquad t_{true} = [T_x, T_y, T_z]^T \tag{20}$$
where the theoretical pose $\eta$ comprises a rotational component $R_{true}$ and a translational component $t_{true}$. Since the theoretical pose $\eta$ remains constant over the $K$ repetitive motions while the errors $n(k)$ arise from random factors, the $K$ observations can be modeled as a random signal. Because $n(k)$ is Gaussian, the maximum likelihood method can be used to estimate $\eta$: maximizing the likelihood function is equivalent to minimizing the sum of squared reprojection errors between all observations and the theoretical values $m(\eta)$ [26]. The global optimization problem for the theoretical pose $\eta$ can therefore be formulated as

$$\eta = \arg\min_{\eta} \sum_{k=1}^{K} \left\| \tilde{P}_i^{img}(k) - m(\eta) \right\|^2 = \arg\min_{\eta} \sum_{k=1}^{K} \left[\tilde{P}_i^{img}(k) - m(\eta)\right]^T \left[\tilde{P}_i^{img}(k) - m(\eta)\right] \tag{21}$$

where $\|\tilde{P}_i^{img}(k) - m(\eta)\|$ denotes the Euclidean distance between the $k$-th observation $\tilde{P}_i^{img}(k)$ and the theoretical value $m(\eta)$. Solving this nonlinear least-squares problem yields the theoretical pose $\eta$. The procedure fuses the information of all $K$ observations, mitigates the impact of random errors in individual observations, and estimates the theoretical pose $\eta$ by maximum likelihood.
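To make Equation (21) concrete, the toy sketch below fuses K = 30 noisy observations of a projected point pattern by minimizing the summed squared reprojection error with `scipy.optimize.least_squares`. For readability the pose is reduced to a translation-only parameter; the intrinsics, points, and noise level are invented for illustration and do not correspond to the paper's setup.

```python
import numpy as np
from scipy.optimize import least_squares

fx = fy = 800.0
cx, cy = 320.0, 240.0
Pw = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 2.0],
               [0.0, 0.5, 2.0], [0.5, 0.5, 2.5]])

def m(t):
    """Projection m(eta) for a translation-only pose eta = t."""
    Pc = Pw + t
    return np.column_stack([fx * Pc[:, 0] / Pc[:, 2] + cx,
                            fy * Pc[:, 1] / Pc[:, 2] + cy])

# K = 30 observations of the same pattern, each with Gaussian pixel noise.
rng = np.random.default_rng(0)
t_true = np.array([0.05, -0.03, 0.2])
obs = [m(t_true) + rng.normal(0.0, 0.2, size=(4, 2)) for _ in range(30)]

def residuals(t):
    """Stacked residuals over all K frames, as summed in Eq. (21)."""
    proj = m(t)
    return np.concatenate([(o - proj).ravel() for o in obs])

eta_hat = least_squares(residuals, x0=np.zeros(3)).x
```

Because the per-frame noise averages out across the 30 frames, the fused estimate lands much closer to the true pose than any single-frame solution with 0.2-pixel noise would.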

2.5. Cramér–Rao Bound

For parameter estimation problems, the choice of estimation criterion directly determines the performance of the estimate. To understand the potential performance of parameter estimation, the Cramér–Rao bound (CRB) is adopted as the evaluation benchmark. According to optimal estimation theory, the CRB represents the lower variance limit for all unbiased estimators of a parameter and can be expressed as
$$C(\eta) \ge C_{CR}(\eta) \tag{22}$$

where $C(\eta)$ represents the variance of the parameter estimation error and $C_{CR}(\eta)$ denotes the Cramér–Rao bound, obtained by inverting the Fisher information matrix $J$:

$$C_{CR}(\eta) = J^{-1} \tag{23}$$
Based on the aforementioned stochastic signal model, the normalized likelihood function can be derived according to maximum likelihood theory:

$$L(\eta) = -\ln[\det(Q)] - \frac{1}{K} \sum_{k=1}^{K} \left[\tilde{P}_i^{img}(k) - m(\eta)\right]^T Q^{-1} \left[\tilde{P}_i^{img}(k) - m(\eta)\right] \tag{24}$$

$$Q = \delta^2 I \tag{25}$$
where $\det(\cdot)$ denotes the matrix determinant and $Q$ is the covariance matrix. From this likelihood function, the general form of the Fisher information matrix is

$$[J]_{i,j} = E\!\left[\frac{\partial L(\eta)}{\partial \eta_i} \frac{\partial L(\eta)}{\partial \eta_j}\right] = K\, \mathrm{tr}\!\left[Q^{-1} \frac{\partial Q}{\partial \eta_i} Q^{-1} \frac{\partial Q}{\partial \eta_j}\right] + 2K \left[\frac{\partial m^T(\eta)}{\partial \eta_i} Q^{-1} \frac{\partial m(\eta)}{\partial \eta_j}\right] \tag{26}$$
In the pose estimation problem, $Q$ does not depend on the parameters to be estimated, so the first term of Equation (26) vanishes. With the noise assumed to be isotropic Gaussian and $Q$ the covariance matrix of the random variables, the Fisher information matrix reduces to

$$[J]_{i,j} = \frac{2K}{\delta^2} \left[\frac{\partial m^T(\eta)}{\partial \eta_i} \frac{\partial m(\eta)}{\partial \eta_j}\right] \tag{27}$$
Consequently, the Cramér–Rao bound of the parameters to be estimated can be expressed as

$$C_{CR}(\eta) = \frac{\delta^2}{2K} \left[\frac{\partial m^T(\eta)}{\partial \eta_i} \frac{\partial m(\eta)}{\partial \eta_j}\right]^{-1} \tag{28}$$

Equation (28) shows that the Cramér–Rao bound is inversely proportional to the number of observations $K$ and directly proportional to the noise power $\delta^2$.
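The bound in Equation (28) can be evaluated numerically once the Jacobian of $m(\eta)$ is available. The sketch below does so with a central-difference Jacobian for the same kind of toy translation-only pose; all numeric values are illustrative assumptions, not the paper's measured quantities.

```python
import numpy as np

fx = fy = 800.0
cx, cy = 320.0, 240.0
Pw = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 2.0],
               [0.0, 0.5, 2.0], [0.5, 0.5, 2.5]])

def m(t):
    """Stacked projection m(eta) for a translation-only pose."""
    Pc = Pw + t
    return np.column_stack([fx * Pc[:, 0] / Pc[:, 2] + cx,
                            fy * Pc[:, 1] / Pc[:, 2] + cy]).ravel()

def crb(t, delta2, K, h=1e-6):
    """C_CR = (delta^2 / 2K) (G^T G)^{-1} per Eq. (28), with numeric G = dm/deta."""
    G = np.column_stack([(m(t + h * e) - m(t - h * e)) / (2.0 * h)
                         for e in np.eye(3)])
    return delta2 / (2.0 * K) * np.linalg.inv(G.T @ G)

C30 = crb(np.zeros(3), delta2=0.04, K=30)
C60 = crb(np.zeros(3), delta2=0.04, K=60)   # doubling K halves the bound
```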

2.6. Repeatability Accuracy Calculation

In existing methods for calculating repeatability accuracy using monocular vision, the PnP algorithm is typically employed for camera pose estimation and optimization, with repeatability accuracy calculated directly from the optimized pose. Although the PnP algorithm can directly solve for poses using 2D–3D matched point pairs, the ICP algorithm exhibits lower estimation uncertainty and achieves more precise pose estimation when depth value information is precisely known [27]. Therefore, this paper introduces the ICP algorithm for calculating repeatability accuracy.
Because ICP is a point cloud registration algorithm, while only the pose of each observation and the theoretical pose are currently known, a new observation model must be constructed before ICP can be applied. This leads to the following hypothesis: assume the poses of all $K$ observations equal the theoretical pose $\eta$. If the 2D image point coordinates are to remain unchanged, the calibration board must be (conceptually) moved so that the relative pose stays constant; by this principle of constant relative pose, the depth information also does not change. Consequently, backprojecting the observations into three-dimensional space yields the coordinates of the three-dimensional feature points on the repositioned calibration plate, and the coordinates $\tilde{P}_i^w(k)$ and $P_i^w(k)$ form corresponding point pairs:
$$\tilde{P}_i^w(k) = \pi^{-1}\!\left(\tilde{P}_i^{img}(k), d_{cw}^i(k), \eta\right) \tag{29}$$

where $\pi^{-1}(\tilde{P}_i^{img}(k), d_{cw}^i(k), \eta)$ is the backprojection function of the ideal pinhole camera model. This transfers the robotic arm error from the arm's pose to the calibration plate's pose. The robotic arm error $n(k)$ can then be quantified by registering the corresponding point pairs with the point-to-point ICP algorithm:
$$n(k) = [\Delta R_k, \Delta t_k] = \arg\min_{\Delta R_k, \Delta t_k} \sum_{i=1}^{n} \left\| \left(\Delta R_k \tilde{P}_i^w(k) + \Delta t_k\right) - P_i^w(k) \right\|^2 \tag{30}$$

where $\Delta R_k$ denotes the rotational deviation and $\Delta t_k$ the translational deviation. The maximum number of iterations and the convergence threshold of the ICP algorithm are set to $10^5$ and $10^{-8}$, respectively. This transformation converts the problem of solving the pose of a stationary target by PnP over multiple observations into solving the relative pose of a slightly moving target via multiple ICP registrations of matched point clouds. Regarding the translational deviation $\Delta t_k$ obtained from each ICP registration as a discrete point in three-dimensional space, after $K$ repetitions of the motion the set of translational deviation points is

$$\tau = \{\Delta t_1, \Delta t_2, \ldots, \Delta t_K\} \tag{31}$$
By determining the radius of the minimum enveloping sphere for this point set, the repeatability accuracy of the robotic arm can be quantified.
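Because the correspondences in Equation (30) are known, a single point-to-point ICP update reduces to the closed-form SVD alignment below, and the repeatability value of Equation (31) is the radius of the minimum enclosing sphere of the translational deviations, approximated here with a simple farthest-point shrinking heuristic. This is our own hedged sketch of the two computations, not the authors' code.

```python
import numpy as np

def align(P, Q):
    """Closed-form rigid transform (dR, dt) minimizing sum ||dR P_i + dt - Q_i||^2."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((Q - cQ).T @ (P - cP))   # cross-covariance SVD
    dR = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
    return dR, cQ - dR @ cP

def min_sphere_radius(pts, iters=2000):
    """Approximate minimum enclosing sphere radius (Badoiu-Clarkson style)."""
    c = pts.mean(axis=0)
    for i in range(1, iters + 1):
        far = pts[np.argmax(np.linalg.norm(pts - c, axis=1))]
        c = c + (far - c) / (i + 1)        # step toward the farthest point
    return np.linalg.norm(pts - c, axis=1).max()
```

In the pipeline described above, `align` would be applied to the matched clouds of each observation to obtain $\Delta t_k$, and `min_sphere_radius` to the set $\tau$ of all $K$ deviations.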

3. Results

The experimental setup comprised a robotic arm, an industrial camera, and a circular calibration plate. The robotic arm was an Elfin-Pro E03-Pro collaborative robot featuring six axes of motion and weighing 18 kg, with a maximum payload of 3 kg and a working range of 590 mm. According to the manufacturer’s datasheet, it has a repeatability positioning accuracy of ±0.02 mm. All experiments were conducted in a temperature-controlled environment (23 ± 2 °C) to ensure that the robot reached thermal equilibrium, thereby minimizing potential accuracy drift due to temperature variations. The camera was an SVersion-HJ-RGBD-90 industrial camera with a resolution of 3120 × 4096 pixels. The calibration board was a CGB-020 5 × 4 dot calibration board, as illustrated in Figure 5. Our computer was configured with an Intel Core i7-12700K CPU and an NVIDIA GeForce RTX 3060 GPU. All image data in the experiments use pixels as the unit, while all vector data are expressed in millimeters (mm).
First, the algorithm’s efficacy was validated. The algorithm validation experiment primarily assessed initial pose optimization and theoretical pose estimation. This paper employed weighted maximum likelihood to optimize the initial pose. To validate the effectiveness of the improved IGG 3 function weight-allocation strategy, optimization was conducted using both maximum likelihood (ML) and weighted maximum likelihood (WML) under identical initial pose conditions. Performance was assessed by statistically evaluating the reprojection error, with the results presented below in Table 1.
The experimental results demonstrate that incorporating the improved IGG 3 function weight-allocation strategy effectively reduced the reprojection error. Specifically, the maximum error of WML decreased by 87.4% compared to ML, proving that the modified IGG 3 function dynamically adjusted weights to suppress outlier effects and mitigate the impact of inherent salt-and-pepper noise on pose estimation outcomes. The mean error and standard deviation of WML improved by 44.11% and 76.69%, respectively, compared to ML. This indicates that the introduced IGG 3 function weight-allocation strategy enables the optimization objective to focus more on high-confidence matching points, significantly enhancing the accuracy and consistency of pose estimation and demonstrating that the algorithm delivers stable gains under normal operating conditions while maintaining robustness in extreme scenarios.
This paper constructed a random signal model based on the statistical characteristics of errors and employed the maximum likelihood method to estimate theoretical poses. To validate the effectiveness of the maximum likelihood estimated theoretical poses and assess whether the estimation results can serve as the system’s theoretical poses, the theoretical boundary was calculated. The proposed method was compared with the Cramér–Rao bound using the mean reprojection error as the evaluation metric. The results are shown in Table 2 and Figure 6.
Experimental results demonstrate that across five independent replicate experiments, the accuracy of the maximum likelihood global optimization algorithm approached that of the Cramér–Rao bound. The maximum difference between this method and the Cramér–Rao bound was 0.0267 pixels, with the minimum difference being 0.0086 pixels. Both differences remained within 0.03 pixels, proving that the maximum likelihood method based on the random signal model can effectively estimate theoretical poses.
Second, simulation experiments analyzed the factors influencing the pose estimation module. Since the initial pose estimation employs the PnP algorithm, the proposed method was compared with classical improved PnP algorithms [10,11,12,13,14] and the orthogonal iteration algorithm [19] to identify the key factors affecting the initial pose optimization. The experiments simulated a scenario with known camera intrinsic parameters: the synthetic virtual camera resolution was 640 × 480 pixels with a fixed focal length of 800 pixels. Three-dimensional feature points were randomly generated on a plane with coordinates in [−2, 2] m × [−2, 2] m, and two-dimensional feature points were projected with an ideal pinhole camera model. Using a controlled variable method, the number of reference points N and the noise level ω were analyzed as study variables, with 500 simulation runs per variable value. Algorithm performance was evaluated via the mean and median of the rotational and translational errors. The variable parameter settings are detailed in Table 3, with the experimental results presented in Figure 7 and Figure 8.
The experimental results demonstrate that the error of all algorithms decreased as the number of reference points increased and rose with increasing noise level. Furthermore, at a fixed noise level, accuracy gradually converged once the number of reference points exceeded a certain threshold. As EPnP is a non-iterative linear algorithm, it is sensitive to noise, and its accuracy is difficult to guarantee when few feature points are available. DLS disregards geometric constraints, making it prone to singularities during computation and therefore unstable overall. Although OI is an iterative method, its accuracy and stability deteriorated when reference points were scarce or near-singular, frequently trapping it in local optima. RPnP, OPnP, and SRPnP all exhibited high precision; however, as RPnP is a suboptimal algorithm, its accuracy was marginally lower than that of OPnP and SRPnP. In contrast, the proposed method improves the extraction precision of the two-dimensional feature points through a weighted maximum likelihood optimization of the initial poses, and estimates the theoretical pose by maximum likelihood over a random signal model, preventing the estimate from becoming trapped in local optima. Consequently, it outperformed the comparison methods in both accuracy and stability.
Finally, a comparative analysis was conducted, encompassing both module-level comparisons and comparisons with similar methods. Within the robotic arm's workspace, an industrial camera mounted on the arm captured 150 calibration plate images. A portion of the dataset is shown in Figure 9.
These images show the calibration plate captured at five different poses, P1–P5. To quantitatively evaluate the impact of each module's algorithm on the final repeatability measurement accuracy, a comparative experiment was designed using the 30 datasets from pose P1. The comparison schemes were as follows:
  • Scheme A: Basic PnP (unoptimized);
  • Scheme B: A + WML (initial optimization);
  • Scheme C: B + two-stage iteration + ML (global optimization);
  • Scheme D: C + ICP (proposed method).
Scheme A performs the PnP solution directly after data acquisition, using the first frame's pose as the baseline to compute the relative positions of subsequent frames and hence the repeatability accuracy. Scheme B optimizes the PnP solution of Scheme A before calculating relative positions and repeatability accuracy. Scheme C builds upon Scheme B by employing a two-stage iterative algorithm to optimize depth and pose, and uses the maximum likelihood method based on the random signal model to estimate the theoretical pose; taking this theoretical pose as the reference, it calculates each frame's relative position to determine repeatability accuracy. Scheme D is the method proposed in this paper: building upon Scheme C, it backprojects a three-dimensional point cloud of corresponding feature points from the theoretical pose and the optimized depth field, enabling the ICP algorithm to compute the repeatability accuracy. The specific comparative experimental results are presented in Table 4.
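The core of Scheme D, lifting each 2D feature point into 3D with its optimized depth to form the observed point cloud that ICP then registers, can be sketched as follows (the intrinsics and sample pixels are illustrative assumptions):

```python
# Sketch of Scheme D's backprojection step: each 2D feature point is lifted
# into camera-frame 3D using its (optimized) depth, forming the observed point
# cloud later registered by ICP. Intrinsics and sample pixels are illustrative.
import numpy as np

def backproject(uv, depths, K):
    """Backproject pixels uv (N,2) with per-point depths (N,) through the
    intrinsic matrix K into camera-frame 3D points (N,3)."""
    K_inv = np.linalg.inv(K)
    uv1 = np.column_stack([uv, np.ones(len(uv))])  # homogeneous pixel coords
    rays = uv1 @ K_inv.T                           # normalized image rays (z = 1)
    return rays * depths[:, None]                  # scale each ray by its depth

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
uv = np.array([[320.0, 240.0], [400.0, 300.0]])
pts = backproject(uv, np.array([2.0, 4.0]), K)
print(pts[0])  # pixel at the principal point, depth 2 m -> [0. 0. 2.]
```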
The comparative experimental results demonstrate that the proposed method achieved a repeatability accuracy of 0.0115 mm, significantly outperforming the alternative schemes. The specific analysis is as follows. Scheme A exhibited a repeatability value of 17.9465 mm, markedly larger than the other schemes. This is primarily because single-frame PnP solutions are susceptible to observational noise, leading to substantial position estimation deviations and poor stability. If the inaccurate single-frame pose of the initial frame is used directly as the reference for computing relative poses, those relative poses incorporate gross errors, causing the repeatability measurement to fail. With the introduction of WML optimization, Scheme B improved repeatability accuracy by 93%. Although this scheme effectively mitigated the impact of gross errors by modeling the statistical characteristics of the observational noise, significantly enhancing single-frame pose stability, it did not resolve the inaccurate depth calculation under the coplanar configuration of monocular visual features, which compromised the accuracy of the three-dimensional repeatability results. Scheme C builds upon Scheme B with a two-stage iterative algorithm that optimizes depth and relative pose under collinearity constraints and estimates the theoretical pose via a maximum likelihood method grounded in the random signal model; this resolved the depth estimation inaccuracy and further improved repeatability accuracy. However, PnP methods that rely solely on 2D–3D matching face inherent limitations. The proposed method incorporates the ICP algorithm into Scheme C and achieves significantly higher accuracy than that scheme.
This demonstrates that ICP registration of the corresponding point clouds, backprojected from the theoretical pose η and the optimized depth field, effectively overcomes the projection-error sensitivity inherent in 2D–3D matching and yields a more precise repeatability measurement.
To demonstrate the superiority of the proposed method, we conducted a comparative analysis against existing monocular vision-based repeatability accuracy calculation methods. Given identical input image data, we compared the repeatability accuracy results of our method with those from [16,18]. The specific repeatability accuracy calculations and corresponding error analysis are presented below.
Analysis of the data in Table 5 and Table 6 shows that the proposed method significantly outperformed the comparative methods in repeatability positioning accuracy: all measured values were within 0.0175 mm, with a mean of 0.0128 mm, the same order of magnitude as the factory-calibrated repeatability (±0.02 mm) of the test robot. This indicates that the proposed method has minimal measurement error and can accurately capture the true performance of the robotic arm. Further analysis of the error statistics shows that the proposed method surpassed the comparative methods in both mean error and standard deviation, with small error fluctuations and no significant outliers. Specifically, the method in [16], based on the PnP algorithm, is sensitive to the feature point distribution and computes repeatability from single-frame pose estimates. Although it optimizes the extracted 2D feature points to avoid the influence of gross errors, giving relatively good stability, it does not consider the impact of depth estimation errors on the results. Since this algorithm is similar to Scheme B, its accuracy is limited to the millimeter level. The combined PnP + OI method of [18] improves overall accuracy through nonlinear optimization but still relies on 2D–3D matching PnP methods with their inherent limitations. In this paper, an improved IGG3 weighting function suppresses feature point outliers, a two-stage iterative strategy optimizes depth and pose, and a maximum likelihood method based on a random signal model estimates the theoretical pose. The theoretical pose is then used to backproject 3D points as observations, and, combined with the ICP algorithm, this breaks through the accuracy limits of 2D–3D matching.
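For reference, a repeatability figure like those in Table 5 can be computed from repeated attained positions following the ISO 9283 definition [21], RP = l̄ + 3S_l, where l_j is the distance of trial j from the cluster barycenter. The sketch below uses synthetic positions purely for illustration:

```python
# How a repeatability figure like those in Table 5 can be computed from
# repeated attained positions, following the ISO 9283 definition [21]:
# RP = mean(l_j) + 3 * std(l_j), where l_j is the distance of trial j from
# the barycenter of all trials. The sample positions below are synthetic.
import numpy as np

def repeatability_rp(positions):
    """ISO 9283 positional repeatability from an (n,3) array of attained
    positions, in the same units as the input."""
    bary = positions.mean(axis=0)
    l = np.linalg.norm(positions - bary, axis=1)   # per-trial deviations l_j
    return l.mean() + 3.0 * l.std(ddof=1)          # sample std, (n-1) denominator

rng = np.random.default_rng(1)
# 30 simulated trials scattered by ~0.005 mm around a target position (in mm).
positions = np.array([10.0, 20.0, 30.0]) + rng.normal(0.0, 0.005, (30, 3))
print(round(float(repeatability_rp(positions)), 4))
```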
The introduction of the ICP algorithm increased computational cost by an average of 8 s, as reported in Table 6. This trade-off is justified by the algorithmic complexity and the fundamental objectives of this work. The complexity of ICP is dominated by the nearest-neighbor search: for point clouds of sizes N and M, the per-iteration complexity using a KD-tree is O(N log M), giving an overall complexity of O(kN log M) for k iterations, a manageable polynomial cost. Considering that the primary objective of this study is high-accuracy 3D repeatability assessment rather than high-speed online control, and that traditional monocular methods suffer from inherent depth ambiguity, the fixed computational cost is warranted. The ICP registration elevates the measurement from the 2D image space to a metric 3D space, fundamentally resolving the scale ambiguity issue, which is a critical advancement for measurement tasks where precision is paramount.
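A minimal point-to-point ICP iteration matching this complexity discussion is sketched below. For brevity the nearest-neighbor step is a brute-force O(N·M) scan, which a KD-tree would reduce to the O(N log M) per iteration quoted above; the data, iteration count, and tolerances are illustrative, not the paper's implementation:

```python
# Minimal point-to-point ICP sketch matching the complexity discussion above.
# For brevity the nearest-neighbor step is a brute-force O(N*M) scan; a KD-tree
# would reduce it to the O(N log M) per iteration quoted in the text.
# Data, iteration count, and tolerances are illustrative.
import numpy as np

def icp(src, dst, iters=20):
    """Rigidly align src (N,3) to dst (M,3); return aligned cloud and RMS residual."""
    cur = src.copy()
    for _ in range(iters):
        # Nearest-neighbor correspondences (brute force).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        # Kabsch/SVD: closed-form optimal rotation and translation.
        mu_s, mu_d = cur.mean(0), matched.mean(0)
        H = (cur - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
        R = Vt.T @ D @ U.T
        cur = (cur - mu_s) @ R.T + mu_d
    d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    return cur, float(np.sqrt(d2.min(axis=1).mean()))

rng = np.random.default_rng(2)
dst = rng.uniform(-1.0, 1.0, (100, 3))
src = dst + np.array([0.05, -0.03, 0.02])   # same cloud, small rigid offset
aligned, rms = icp(src, dst)
print(round(rms, 6))
```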

4. Conclusions

This paper addresses critical issues in monocular vision-based robotic arm repeatability measurement, including the absence of statistical assumptions on the observational data, non-global pose estimation, and insufficient depth estimation accuracy, and proposes a quantitative method based on likelihood pose estimation. By optimizing single-frame PnP pose estimates, constructing a random signal model, and employing maximum likelihood estimation to fuse multi-frame observations into a theoretical pose, the approach overcomes the accuracy limitations of traditional 2D–3D matching through optimized depth information and the ICP algorithm. Experimental validation demonstrates repeatability accuracy consistently within 0.0068–0.0175 mm, with a standard deviation of 0.0038 mm, representing improvements of 85% and 87% over the comparative methods in precision and stability. The method provides a novel theoretical and methodological pathway for applying visual measurement techniques to robotic arm performance evaluation. Although it performs well in controlled environments with stable lighting and no occlusions, variable illumination and partial occlusion in industrial settings remain critical practical challenges. Future work will focus on enhancing robustness, including illumination-invariant feature descriptors and robust estimation algorithms capable of identifying and rejecting outlier observations caused by occlusions, with the ultimate goal of deploying the measurement system in more complex industrial field environments.

Author Contributions

Conceptualization, P.Z. and J.L. (Jiatian Li); Methodology, P.Z. and J.L. (Jiatian Li); Software, P.Z.; Validation, P.Z. and Y.J.; Writing—original draft, P.Z.; Writing—review and editing, P.Z., J.L. (Jiatian Li), J.L. (Jiayin Liu) and F.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant number 41561082.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Slamani, M.; Nubiola, A.; Bonev, I. Assessment of the positioning performance of an industrial robot. Ind. Robot. Int. J. 2012, 39, 57–68.
  2. Deng, F.; Li, K.; Hu, X.; Jiang, H.; Huang, F. Life calculation of angular contact ball bearings for industrial robot RV reducer. Ind. Lubr. Tribol. 2019, 71, 826–831.
  3. Yu, Z.; Wan, J.; Hao, Z.; Kou, L. Measuring the pose repeatability accuracy of the industrial robot end-effector based on the ISSA-IGCF-IHT method. Meas. Sci. Technol. 2024, 35, 115022.
  4. Ma, N.; Su, C.; Zhang, X. Research on fast field calibration method of industrial robot. In Proceedings of the 2015 Chinese Automation Congress (CAC), Wuhan, China, 27–29 November 2015; IEEE: New York, NY, USA, 2015; pp. 443–447.
  5. Tian, W.; Xu, M.; Zhang, X.; Guo, X.; Wang, L.; Huang, T. Repeatability prediction of 6-DOF hybrid robot based on equivalent error model of actuated joint. Measurement 2023, 207, 112377.
  6. Yin, F.; Wang, L.; Tian, W.; Zhang, X. Kinematic calibration of a 5-DOF hybrid machining robot using an extended Kalman filter method. Precis. Eng. 2023, 79, 86–93.
  7. Wang, Z.; Zhang, R.; Keogh, P. Real-time laser tracker compensation of robotic drilling and machining. J. Manuf. Mater. Process. 2020, 4, 79.
  8. Boby, R.A.; Saha, S.K. Single image based camera calibration and pose estimation of the end-effector of a robot. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; IEEE: New York, NY, USA, 2016; pp. 2435–2440.
  9. Wu, L.; Deng, X.; Wang, Y.; Du, X.; Xiong, X.; Jiang, B. Research on vision-based robot planar motion measurement method. J. Braz. Soc. Mech. Sci. Eng. 2023, 45, 216.
  10. Lepetit, V.; Moreno-Noguer, F.; Fua, P. EPnP: An accurate O(n) solution to the PnP problem. Int. J. Comput. Vis. 2009, 81, 155–166.
  11. Li, S.; Xu, C.; Xie, M. A robust O(n) solution to the perspective-n-point problem. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1444–1450.
  12. Hesch, J.A.; Roumeliotis, S.I. A direct least-squares (DLS) method for PnP. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: New York, NY, USA, 2011; pp. 383–390.
  13. Zheng, Y.; Kuang, Y.; Sugimoto, S.; Astrom, K.; Okutomi, M. Revisiting the PnP problem: A fast, general and optimal solution. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2344–2351.
  14. Wang, P.; Xu, G.; Cheng, Y.; Yu, Q. A simple, robust and fast method for the perspective-n-point problem. Pattern Recognit. Lett. 2018, 108, 31–37.
  15. Chen, H.; Wang, P.; Wang, F.; Tian, W.; Xiong, L.; Li, H. EPro-PnP: Generalized end-to-end probabilistic perspective-n-points for monocular object pose estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 47, 2781–2790.
  16. Gao, H.; Zhang, H.; Jiang, Y.; Sun, J.; Yu, J. A localization method of manipulator towards achieving more precision control. Comput. Intell. 2024, 40, e12600.
  17. Luo, Z.; Cai, Z.; Jiang, W.; Yang, L. Endoscopic attitude measurement method for industrial robot based on monocular vision. Modul. Mach. Tool Autom. Manuf. Tech. 2020, 11, 60–64.
  18. Chen, G.; Xu, D.; Yang, P. High precision pose measurement for humanoid robot based on PnP and OI algorithms. In Proceedings of the 2010 IEEE International Conference on Robotics and Biomimetics, Tianjin, China, 14–18 December 2010; IEEE: New York, NY, USA, 2010; pp. 620–624.
  19. Lu, C.P.; Hager, G.D.; Mjolsness, E. Fast and globally convergent pose estimation from video images. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 610–622.
  20. Zhang, Z. Iterative point matching for registration of free-form curves and surfaces. Int. J. Comput. Vis. 1994, 13, 119–152.
  21. ISO 9283; Manipulating Industrial Robots: Performance Criteria and Related Testing Methods. ISO: Geneva, Switzerland, 1998.
  22. Leutenegger, S.; Lynen, S.; Bosse, M.; Siegwart, R.; Furgale, P. Keyframe-based visual–inertial odometry using nonlinear optimization. Int. J. Robot. Res. 2015, 34, 314–334.
  23. Mourikis, A.I.; Roumeliotis, S.I. A multi-state constraint Kalman filter for vision-aided inertial navigation. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Rome, Italy, 10–14 April 2007; IEEE: New York, NY, USA, 2007; pp. 3565–3572.
  24. Zhang, D.; Liang, H.; Li, X.a.; Jia, X.; Wang, F. Kinematic calibration of industrial robot using Bayesian modeling framework. Reliab. Eng. Syst. Saf. 2025, 253, 110543.
  25. Nubiola, A.; Bonev, I.A. Absolute calibration of an ABB IRB 1600 robot using a laser tracker. Robot. Comput.-Integr. Manuf. 2013, 29, 236–245.
  26. Qu, Y.; Zhang, C.; Liu, Y. Maximum likelihood pose estimation using machine vision. J. Appl. Opt. 2019, 40, 253–258.
  27. Ma, X.; Liang, X. Analysis on pose estimation uncertainty and observation parametrization for RGB-D cameras. Robot 2021, 43, 54–65.
Figure 1. Algorithm flowchart.
Figure 2. Image acquisition flowchart.
Figure 3. Repeatability positioning accuracy.
Figure 4. Eye-in-hand model.
Figure 5. Experimental setup.
Figure 6. Error comparison chart.
Figure 7. Variations in reference point count.
Figure 8. Variations in noise level.
Figure 9. Calibration plate image data.
Table 1. Effectiveness analysis results.

|     | Minimum/Pixels | Maximum/Pixels | Mean/Pixels | Standard Deviation/Pixels |
| --- | --- | --- | --- | --- |
| WML | 0.0937 | 0.4940 | 0.2835 | 0.1402 |
| ML  | 0.1395 | 3.9224 | 0.5072 | 0.6016 |

Table 2. Performance evaluation findings.

|     | 1/Pixels | 2/Pixels | 3/Pixels | 4/Pixels | 5/Pixels |
| --- | --- | --- | --- | --- | --- |
| Ours | 0.0383 | 0.0363 | 0.0417 | 0.0473 | 0.0432 |
| Cramér–Rao bound | 0.0116 | 0.0277 | 0.0346 | 0.0336 | 0.0193 |

Table 3. Variable parameter settings.

|     | Minimum | Maximum | Step |
| --- | --- | --- | --- |
| N | 4 | 20 | 1 |
| ω/pixels | 0 | 5 | 0.5 |

Table 4. Module comparative analysis results.

|     | A | B | C | D |
| --- | --- | --- | --- | --- |
| Repeatability/mm | 17.9465 | 1.2730 | 0.9792 | 0.0115 |

Table 5. Repeatability positioning accuracy calculation results.

|     | P1/mm | P2/mm | P3/mm | P4/mm | P5/mm |
| --- | --- | --- | --- | --- | --- |
| [16] | 1.0569 | 1.0898 | 1.0303 | 1.0702 | 1.1041 |
| [18] | 0.8049 | 0.8088 | 0.7567 | 0.8107 | 0.8331 |
| Our method | 0.0115 | 0.0121 | 0.0068 | 0.0162 | 0.0175 |

Table 6. Error and computational efficiency analysis.

|     | Mean/mm | Standard Deviation/mm | Time Cost/s |
| --- | --- | --- | --- |
| [16] | 1.0703 | 0.0257 | 3.3454 |
| [18] | 0.8028 | 0.0286 | 7.1082 |
| Our method | 0.0128 | 0.0038 | 11.1390 |