Visual Predictive Control for Robotics with RBF-EKF Coupled State-Disturbance Estimation and Task-Oriented K-Means Clustering

Ji, Peng; Wang, Hongyu; Ren, Weina; Han, Youngjoon; Cao, Maoyong

doi:10.3390/s26031046


by Peng Ji ¹, Hongyu Wang ¹, Weina Ren ², Youngjoon Han ³ and Maoyong Cao ¹,*

¹ School of Information and Automation Engineering, Shandong Key Laboratory of Key Technologies and Systems for Humanoid Robots, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China

² Department of Electrical and Automation, Shandong Labor Vocational and Technical College, Jinan 250300, China

³ School of AI Convergence, Soongsil University, Seoul 06978, Republic of Korea

* Author to whom correspondence should be addressed.

Sensors 2026, 26(3), 1046; https://doi.org/10.3390/s26031046

Submission received: 20 December 2025 / Revised: 20 January 2026 / Accepted: 2 February 2026 / Published: 5 February 2026

(This article belongs to the Section Sensors and Robotics)


Abstract

Image-Based Visual Servoing (IBVS) systems often suffer from instability due to measurement noise, modeling errors, and external disturbances. To address these issues, this study proposes a Visual Predictive Control framework integrating Radial Basis Function (RBF) and Extended Kalman Filter (EKF) coupled state-disturbance estimation and task-oriented K-means clustering. First, a feedback linearization Model Predictive Control (MPC) law is designed to handle system nonlinearities and physical constraints. Second, a coupled estimation mechanism is established where the EKF suppresses noise while the RBF network learns lumped disturbances. Crucially, to optimize network efficiency, a task-oriented K-means clustering method is introduced to select RBF centers based on the nominal IBVS path. Lyapunov analysis confirms the Uniformly Ultimately Bounded (UUB) stability. Simulation results demonstrate that the proposed method significantly reduces estimation errors and improves tracking accuracy compared to traditional schemes. Ultimately, this approach enhances the robustness and engineering practicality of robotic visual servoing through the deep coordination of control and estimation.

Keywords:

Image-Based Visual Servoing (IBVS); Model Predictive Control (MPC); Radial Basis Function (RBF) neural network; Extended Kalman Filter (EKF); disturbance estimation; K-means clustering

1. Introduction

Image-Based Visual Servoing (IBVS) uses image feature feedback directly to control robot motion, which removes the need for precise hand-eye calibration and complex environmental modeling. In unstructured and dynamic environments, such as high-speed assembly lines and collaborative workspaces where precise calibration is impractical, IBVS is therefore an attractive solution. However, its engineering value depends heavily on maintaining sub-pixel accuracy under strict safety constraints, which remains a critical hurdle for widespread deployment [1,2,3]. For a robotic IBVS system, stable operation requires the integration of precise control, reliable estimation, and efficient disturbance compensation. The IBVS controller must adapt to physical constraints and system nonlinearities; state estimation must extract reliable image feature information from noisy measurements; and disturbance compensation must offset friction, load changes, and modeling errors in real time. The synergy of these three factors directly determines the engineering practicality of the robotic IBVS system [4,5,6].

However, in actual engineering scenarios, the IBVS system faces a three-level progressive bottleneck of “nonlinearity and constraint coupling, noise and disturbance interference, and insufficient disturbance observation performance”, which seriously restricts its high-precision application. Firstly, at the control layer, there is a fundamental conflict between nonlinearity decoupling and physical constraint handling. While feedback linearization effectively decouples system dynamics [7], it inherently ignores actuator limits, posing a risk of hardware damage or saturation. Conversely, linear MPC handles constraints explicitly but suffers from model mismatch when applied to the highly nonlinear IBVS dynamics, leading to tracking degradation [8]. Secondly, at the estimation layer, there is the problem of noise and disturbance coupling interference. Image sensors are susceptible to Gaussian noise and sudden changes in illumination, resulting in distortion of feature measurement signals. The lumped disturbances formed by joint friction, load fluctuations, and Jacobian modeling errors during the robot movement will further distort the system state information. This coupling induces a vicious cycle of degradation: measurement noise propagates into the disturbance observer, causing estimation divergence, while uncompensated disturbances conversely distort the state prediction. This mutual interference makes it extremely challenging to simultaneously suppress high-frequency noise and accurately learn low-frequency lumped disturbances [9,10]. Thirdly, there is the problem of optimizing key parameters for disturbance observation. Due to its high local approximation accuracy and fast learning speed [11], the Radial Basis Function (RBF) neural network is a mainstream disturbance observation tool in current IBVS systems [12,13]. Recently, to further mitigate computational burdens and enhance convergence speed, advanced neural control architectures have been developed. 
For instance, ref. [14] proposed a velocity-free adaptive neural-fuzzy control strategy that achieves predefined-time convergence for spacecraft attitude tracking with reduced computational complexity. Similarly, ref. [15] designed a fast fixed-time distributed neural disturbance observer for UAVs, which guarantees rapid disturbance estimation under limited resources. However, IBVS systems usually construct the image feature vector from multiple feature points, which gives the RBF network a high-dimensional input space. Traditional center selection methods then face the “curse of dimensionality”: global clustering generates many redundant centers, violating the real-time requirements of high-frequency servoing, whereas random sampling fails to cover critical task regions, resulting in unacceptable approximation errors [16,17]. Balancing estimation fidelity against computational efficiency therefore remains an open problem.

To overcome these bottlenecks, researchers have worked along three directions: IBVS control, disturbance estimation, and RBF center optimization. In control, Caradonna et al. [18] applied feedback linearization to the dynamics of continuum soft robots, linearizing the system to handle nonlinear coupling, but did not consider joint physical constraints. Sauvée et al. [19] combined IBVS with a Nonlinear Model Predictive Control (NMPC) architecture, explicitly incorporating joint angle limits, actuator torque saturation, and target visibility constraints (ensuring that image features always remain within the image boundaries) into the optimization problem, but its real-time performance leaves considerable room for improvement. Allibert et al. [20] explored a combined “feedback linearization + MPC” scheme but did not include a disturbance compensation module, which limits it to ideal, disturbance-free environments.

In state estimation and disturbance observation, the Extended Kalman Filter (EKF) has become the mainstream choice because of its strong noise suppression and moderate computational cost [21,22,23,24]. However, the traditional EKF treats unknown disturbances as process noise, which must be accommodated by inflating the process noise covariance matrix and therefore reduces state estimation accuracy. The RBF neural network can approximate nonlinear disturbances through online learning, but feeding noisy image features directly into the network may cause its parameters to oscillate or diverge. Esfandiari et al. [25] used an EKF to serially update RBF network weights to adapt to external disturbances and achieved robot trajectory tracking. In this one-way design, however, the disturbance estimate is not fed back to correct the EKF state prediction, so the cooperative advantages of the two are not exploited. The extended-state EKF can estimate both state and disturbance [26,27], but it enlarges the state vector and thus increases computation relative to the traditional EKF, making it difficult to meet the real-time control requirements of the IBVS system.

In RBF center optimization, Wurzberger et al. [28] noted that centers are often sampled at random from the training data set, which is simple to implement and fast, and suits simple tasks with small data volumes, uniform distributions, and low precision requirements. Other studies [29,30] used K-means clustering to generate RBF centers offline and thereby improve disturbance approximation accuracy. In the IBVS field, combining the selection of RBF centers with the IBVS task path and the robot kinematic constraints can avoid introducing redundant or unreachable centers.

Inspired by the above research, this paper proposes an integrated solution of “feedback linearization IBVS-MPC control, EKF-RBF bidirectional coupled estimation, and task-oriented optimization of RBF centers for IBVS trajectories”. Through the closed-loop design of “decoupling nonlinearity and constraints at the control layer, cooperatively suppressing noise and disturbances at the estimation layer, and optimizing RBF center selection at the tool layer”, the three-level bottleneck is systematically broken through.

The main research contents and innovations of this paper are as follows:

1. A coupled RBF-EKF bidirectional estimation mechanism is developed. Specifically, the EKF filters noisy visual measurements to provide refined state estimates, which serve as high-quality inputs for the RBF network. Simultaneously, the RBF network, configured with task-oriented centers, learns lumped disturbances online. This disturbance estimate is then fed back into the EKF’s prediction step to compensate for model deviations. This interaction establishes a synergistic closed loop of “state estimation, disturbance learning, and predictive correction.”
2. A task-oriented RBF center selection method based on K-means clustering is designed. By coupling the IBVS-MPC control law, robot forward kinematics, and a camera projection model, a task-oriented nominal image feature sequence covering the path from the initial posture to the target posture is iteratively generated. K-means clustering then yields a compact center set that closely fits the task path, which ensures the kinematic reachability of the centers and bounds their number, balancing disturbance approximation accuracy against real-time performance.
3. The Uniformly Ultimately Bounded (UUB) stability of the EKF-RBF coupled state-disturbance estimation system is proven using Lyapunov stability theory. The convergence bounds of the state and disturbance estimation errors are established, which ensures convergence of the time-varying disturbance observation and avoids network divergence caused by noisy signals.
4. Manipulator simulation experiments verify that the proposed method significantly improves the estimation accuracy of states and disturbances while meeting real-time control requirements, which supports its engineering practicality.

The remainder of this paper is organized as follows. Section 2 establishes the image feature motion model and the robot kinematics models of the IBVS system, describes the challenges it faces, and states the problem. Section 3 details the design of the integrated solution, including the feedback linearization IBVS-MPC control law, the EKF-RBF coupled state-disturbance estimation method, and the task-oriented RBF center selection method based on K-means clustering. Section 4 analyzes the stability of the EKF-RBF coupled state-disturbance estimation. Section 5 verifies the effectiveness of the method through simulation experiments and compares it with traditional methods. Section 6 concludes the paper and outlines future research directions.

2. Problem Description

2.1. Introduction to IBVS System

Image-Based Visual Servoing (IBVS) is a technology that directly uses two-dimensional (2D) image feature information (such as feature point coordinates, line segment lengths, regional centroids, etc.) collected by image sensors to construct a feedback closed loop, realizing precise motion control of the end effector. Its core idea is different from Position-Based Visual Servoing (PBVS)—it does not need to convert image features into three-dimensional (3D) spatial coordinates, nor does it rely on precise hand-eye calibration and robot dynamics models. By directly minimizing the error between “current image features” and “target image features”, the robot is driven to move to the target posture, which has stronger robustness to model uncertainties and environmental disturbances.

For the robot system, the control objective of IBVS can be described as: given a target image (containing a preset target feature vector

s^{*}

), by real-time collecting images near the end effector (containing the current feature vector s), a visual servo controller is designed to generate control commands, so that the end effector moves along the optimal trajectory, and finally satisfies

\lim_{t \to \infty} \| s - s^{*} \| = 0,

(1)

realizing high-precision positioning or trajectory tracking.

2.2. Mathematical Models

2.2.1. IBVS Image Feature Motion Model

Firstly, we define the image feature vector as a set of 2D coordinates of

n_{1}

feature points on the target object:

s = {[u_{1}, v_{1}, u_{2}, v_{2}, \dots, u_{n_{1}}, v_{n_{1}}]}^{T} \in R^{2 n_{1}},

(2)

where

u_{i}, v_{i}

are the coordinates of the i-th feature point in the image pixel coordinate system (u is the horizontal axis, v is the vertical axis).

The movement of the end effector (position

p \in R^{3}

, posture

θ \in R^{3}

) will cause changes in the position of image feature points. The dynamic relationship between them is described by the image Jacobian matrix

L (s) \in R^{2 n_{1} \times 6}

, which is essentially the partial derivative matrix of the feature vector s with respect to the generalized velocity

v = {[{\dot{p}}^{T}, {\dot{θ}}^{T}]}^{T} \in R^{6}

of the end effector (linear velocity

\dot{p}

, angular velocity

\dot{θ}

), i.e.,:

\dot{s} = L (s) v .

(3)

For the camera model in the Eye-in-Hand mode, the Jacobian matrix sub-block

L_{i} (s) \in R^{2 \times 6}

corresponding to a single feature point

(u_{i}, v_{i})

is [31]:

L_{i} (s) = [\begin{matrix} - \frac{f_{x}}{Z_{i}} & 0 & \frac{u_{i} - u_{0}}{Z_{i}} & \frac{(u_{i} - u_{0}) (v_{i} - v_{0})}{f_{x}} & - \frac{f_{x}^{2} + {(u_{i} - u_{0})}^{2}}{f_{x}} & v_{i} - v_{0} \\ 0 & - \frac{f_{y}}{Z_{i}} & \frac{v_{i} - v_{0}}{Z_{i}} & \frac{f_{y}^{2} + {(v_{i} - v_{0})}^{2}}{f_{y}} & - \frac{(u_{i} - u_{0}) (v_{i} - v_{0})}{f_{x}} & - (u_{i} - u_{0}) \end{matrix}],

(4)

where

f_{x}

is defined as the ratio of the camera physical focal length f to the pixel size

σ_{u}

in the u-axis direction of the pixel coordinate system (

f_{x} = f / σ_{u}

),

f_{y}

is defined as the ratio of the camera physical focal length f to the pixel size

σ_{v}

in the v-axis direction of the pixel coordinate system (

f_{y} = f / σ_{v}

),

(u_{0}, v_{0})

are the camera principal point coordinates, and

Z_{i}

is the depth (3D spatial distance) from the i-th feature point to the camera optical center, which can be obtained by measurement with a depth camera.

The entire image Jacobian matrix

L (s)

is a stack of

n_{1}

feature point sub-blocks:

L (s) = [\begin{matrix} L_{1} (s) \\ ⋮ \\ L_{n_{1}} (s) \end{matrix}] .

(5)

To constrain the 6-Degrees of Freedom (DOF) pose of the object, at least three feature points are required [32]. With exactly three points, however, the image Jacobian matrix can become singular, and four indistinguishable global minima exist at which the feature error vanishes. Therefore, the IBVS system in this paper uses four feature points to build the image Jacobian matrix, i.e.,

(n_{1} = 4)

.
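The stacking in Equations (4) and (5) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function names, camera intrinsics, pixel coordinates, and depths are hypothetical values chosen only for the example, and pixel coordinates are measured relative to the principal point in every entry.

```python
import numpy as np

def point_jacobian(u, v, Z, fx, fy, u0, v0):
    """2x6 sub-block L_i for one feature point, in the form of Equation (4)."""
    ub, vb = u - u0, v - v0  # pixel coordinates relative to the principal point
    return np.array([
        [-fx / Z, 0.0, ub / Z, ub * vb / fx, -(fx**2 + ub**2) / fx, vb],
        [0.0, -fy / Z, vb / Z, (fy**2 + vb**2) / fy, -ub * vb / fx, -ub],
    ])

def image_jacobian(points, depths, fx, fy, u0, v0):
    """Stack the per-point blocks into L(s), as in Equation (5)."""
    blocks = [point_jacobian(u, v, Z, fx, fy, u0, v0)
              for (u, v), Z in zip(points, depths)]
    return np.vstack(blocks)

# Hypothetical setup: four feature points (n_1 = 4) at 1 m depth,
# square pixels (fx = fy), principal point at the image center.
points = [(220.0, 140.0), (420.0, 140.0), (420.0, 340.0), (220.0, 340.0)]
depths = [1.0, 1.0, 1.0, 1.0]
L = image_jacobian(points, depths, fx=800.0, fy=800.0, u0=320.0, v0=240.0)

# Feature velocity for a pure translation along the optical axis, Equation (3).
v_cam = np.array([0.0, 0.0, 0.1, 0.0, 0.0, 0.0])
s_dot = L @ v_cam
print(L.shape, s_dot)
```

With four points the stacked matrix has eight rows and six columns, matching the dimension 2n_1 × 6 stated in the text.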

2.2.2. Robotic Kinematics Model

To implement precise control and state prediction for the 6-DOF manipulator (UR5 is employed in the experimental section), establishing a rigorous kinematic model is essential. The Direct Kinematics (DK) method determines the end-effector’s pose relative to the base frame based on joint angles, which constitutes the mathematical foundation for the RBF center selection strategy (Section 3.3).

The kinematic chain is modeled using the standard Denavit-Hartenberg (D-H) convention. For the i-th joint, the homogeneous transformation matrix

{}^{i - 1}T_{i}

relating frame

i - 1

to frame i is expressed as:

{}^{i - 1}T_{i} = [\begin{matrix} cos θ_{i} & - sin θ_{i} cos α_{i} & sin θ_{i} sin α_{i} & a_{i} cos θ_{i} \\ sin θ_{i} & cos θ_{i} cos α_{i} & - cos θ_{i} sin α_{i} & a_{i} sin θ_{i} \\ 0 & sin α_{i} & cos α_{i} & d_{i} \\ 0 & 0 & 0 & 1 \end{matrix}],

(6)

where

a_{i}, d_{i}, α_{i}

and

θ_{i}

represent the link length, link offset, twist angle, and joint angle, respectively. The standard D-H parameters for the UR5 manipulator are detailed in Table 1.

By sequentially multiplying the transformation matrices from the base to the end-effector, the forward kinematics equation is obtained:

T_{e b} (q) = \prod_{i = 1}^{6} {}^{i - 1}T_{i} = [\begin{matrix} R_{e b} (q) & P_{e b} (q) \\ 0_{1 \times 3} & 1 \end{matrix}],

(7)

where

R_{e b} (q) \in R^{3 \times 3}

denotes the rotation matrix and

P_{e b} (q) \in R^{3}

denotes the position vector of the end-effector. This explicit kinematic mapping

T_{e b} (q)

corresponds to the nonlinear function

F_{k} (\cdot)

cited later in Equation (28), ensuring the accuracy of the trajectory prediction.
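As a minimal sketch of Equations (6) and (7), the chain of D-H transforms can be multiplied out numerically. The D-H values below are commonly published nominal UR5 parameters and are included here as an assumption; the values in Table 1 take precedence if they differ. The function names are hypothetical.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform from frame i-1 to frame i, Equation (6)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

# Assumed nominal UR5 parameters (meters, radians); not taken from Table 1.
UR5_A = [0.0, -0.425, -0.39225, 0.0, 0.0, 0.0]
UR5_D = [0.089159, 0.0, 0.0, 0.10915, 0.09465, 0.0823]
UR5_ALPHA = [np.pi / 2, 0.0, 0.0, np.pi / 2, -np.pi / 2, 0.0]

def forward_kinematics(q):
    """Product of the six joint transforms, Equation (7)."""
    T = np.eye(4)
    for theta, d, a, alpha in zip(q, UR5_D, UR5_A, UR5_ALPHA):
        T = T @ dh_transform(theta, d, a, alpha)
    return T

q = np.array([0.1, -1.2, 1.0, -0.5, 0.7, 0.2])  # arbitrary joint angles
T_eb = forward_kinematics(q)
R_eb, P_eb = T_eb[:3, :3], T_eb[:3, 3]
print(R_eb, P_eb)
```

The upper-left 3 × 3 block is the rotation matrix R_eb(q) and the last column holds the position P_eb(q), as in Equation (7).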

2.2.3. Robotic Velocity Kinematics Model

Based on the kinematic model above, the robotic velocity kinematics model is further derived to transform velocities from the end-effector space to the joint space. Within this model, the Jacobian matrix serves as the fundamental bridge connecting joint space motion to end-effector Cartesian motion. It is also the key link establishing the correlation between image feature variations and robotic movement. Essentially, the Jacobian matrix defines the linear mapping between the generalized velocity of the end-effector and the joint velocity:

v = J (q) \dot{q},

(8)

where

q = {[q_{1}, \dots, q_{n_{2}}]}^{T} \in R^{n_{2}}

denotes the joint vector of the

n_{2}

-axis robot.

\dot{q}

is the joint velocity vector, which directly reflects the motion rate of each joint.

J (q) \in R^{6 \times n_{2}}

stands for the robot velocity Jacobian matrix, which acts as a local linear mapping operator from joint velocity to end-effector generalized velocity. The calculation formula [33] is:

J (q) = [J_{1} (q), J_{2} (q), \dots, J_{n_{2}} (q)],

(9)

where the i-th column

J_{i} (q)

describes the contribution to the end effector generalized velocity when only the i-th joint moves, and its expression is derived from the Denavit-Hartenberg (D-H) parameters of the robot:

J_{i} (q) = [\begin{matrix} z_{i - 1} \times r_{i - 1, e} \\ z_{i - 1} \end{matrix}],

(10)

where

z_{i - 1}

represents the unit vector of the

i - 1

-th joint coordinate axis,

r_{i - 1, e}

represents the position vector from the origin of the

i - 1

-th joint coordinate system to the origin of the end effector coordinate system, and × represents the vector cross product operation. Due to the change in the robot joint state q, the elements of

J (q)

will be dynamically adjusted. We need to update

J (q)

in real time to maintain the velocity mapping accuracy within the local range corresponding to the current q. Through

J (q)

, we can directly realize the bidirectional mapping of “joint velocity to end effector velocity” and “end effector velocity to joint velocity”, which becomes an indispensable link in the IBVS system.
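The column formula of Equation (10) can be sketched for an arm with revolute joints only. The three-joint D-H parameters below are hypothetical and serve only to keep the example small; the same routine applies to a six-joint arm.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Standard D-H transform from frame i-1 to frame i."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def geometric_jacobian(q, d, a, alpha):
    """Return J(q) (6 x n) and the end-effector position for revolute joints."""
    n = len(q)
    T = np.eye(4)
    origins = [T[:3, 3].copy()]  # origin of frame 0
    z_axes = [T[:3, 2].copy()]   # z axis of frame 0
    for i in range(n):
        T = T @ dh_transform(q[i], d[i], a[i], alpha[i])
        origins.append(T[:3, 3].copy())
        z_axes.append(T[:3, 2].copy())
    p_e = origins[-1]
    J = np.zeros((6, n))
    for i in range(n):
        # Column i uses the axis and origin of frame i-1 (list index i).
        J[:3, i] = np.cross(z_axes[i], p_e - origins[i])  # linear part
        J[3:, i] = z_axes[i]                              # angular part
    return J, p_e

# Hypothetical 3-joint arm, used only for illustration.
d = [0.3, 0.0, 0.0]
a = [0.0, 0.4, 0.3]
alpha = [np.pi / 2, 0.0, 0.0]
q = np.array([0.3, -0.6, 0.9])

J, p_e = geometric_jacobian(q, d, a, alpha)
q_dot = np.array([0.1, -0.2, 0.05])
v = J @ q_dot  # generalized end-effector velocity, Equation (8)
print(J.shape, v)
```

Because J depends on q, the function is re-evaluated at every control step, which is the real-time update referred to in the text.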

2.3. Challenges Faced by IBVS

Although the previously derived image feature motion equation and velocity Jacobian matrix provide the fundamental mapping from image feature changes to joint motion, this model is strictly valid only under ideal scenarios. In real application environments such as industrial assembly and intelligent sorting, the IBVS system is subject to complex environmental interference, modeling errors, and measurement noise. These factors undermine the assumptions of the nominal model, directly compromising control accuracy and stability.

In the nominal IBVS model, the image Jacobian matrix is assumed to be completely known and accurate, but there are two primary sources of error in practical scenarios. First, measurement noise arises from environmental interference. Factors such as abrupt illumination changes, dust occlusion, and surface reflections can introduce random noise into the feature point coordinates (image feature vector s) captured by sensors. Additionally, the feature point depth

Z_{i}

—a core parameter of the image Jacobian—may not be directly measured and needs to be obtained through indirect estimation, and errors are inevitably introduced in the estimation process. Furthermore, camera calibration inaccuracies (e.g., deviations between the actual values and calibrated values of principal point coordinates and focal length) and incomplete lens distortion compensation prevent the acquisition of an exact image Jacobian. Consequently, only a nominal image Jacobian matrix

L_{0} (s)

can be obtained through theoretical derivation or calibration. Therefore, the nominal image feature motion equation must be modified to include error terms:

\{\begin{matrix} \dot{x} = L_{0} (s) v + d_{1} \\ y = x + n_{s} \end{matrix},

(11)

where

\dot{x}

is the rate of change of the image feature state x = s, i.e., the image feature velocity

\dot{s}

, and

d_{1} \in R^{2 n_{1}}

denotes uncertainties such as modeling uncertainty and external disturbances.

y \in R^{2 n_{1}}

represents the measurement value containing noise.

n_{s} \in R^{2 n_{1}}

is the image measurement noise.

Similarly, the nominal robot velocity kinematics model relies on accurate D-H parameters and joint states q to calculate the velocity Jacobian matrix

J (q)

. However, errors are unavoidable in reality. While nominal D-H parameters (e.g., link length, twist angle) are provided by manufacturers, assembly tolerances and thermal deformation can cause discrepancies between the real and nominal parameters. Moreover, the joint angle q is obtained via encoders, which are subject to random measurement errors caused by circuit noise and mechanical vibration. Unmodeled dynamics, such as joint friction and link flexibility, further aggravate mapping errors. Thus, the nominal velocity Jacobian mapping Equation (8) must be revised as follows:

v = J_{0} (q_{0}) \dot{q} + d_{2},

(12)

where

J_{0} (q_{0})

represents the nominal velocity Jacobian matrix calculated from the nominal D-H parameters and the measured joint state

q_{0}

.

d_{2} \in R^{6}

denotes the mapping error term at the end-effector caused by parameter deviations, measurement errors, and unmodeled dynamics, which directly affects the accuracy of the generalized velocity v.

By substituting Equation (12) into Equation (11), it can be obtained that:

\{\begin{matrix} \dot{x} = J_{t o t a l} (s, q_{0}) \dot{q} + D \\ y = x + n_{s} \end{matrix},

(13)

where

J_{t o t a l} (s, q_{0}) = L_{0} (s) J_{0} (q_{0}) \in R^{2 n_{1} \times n_{2}}

represents the total nominal Jacobian matrix of the IBVS system, describing the approximate mapping from joint velocity

\dot{q}

to image feature velocity

\dot{s}

in real scenarios.

D = L_{0} (s) d_{2} + d_{1} \in R^{2 n_{1}}

represents the lumped disturbance of the system, which integrates Jacobian mapping errors, modeling errors, and external disturbances. Crucially, this term exhibits nonlinearity, time-variability, and uncertainty.
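For reference, the substitution of Equation (12) into Equation (11) can be written out step by step, using only the symbols already defined:

```latex
\begin{aligned}
\dot{x} &= L_{0}(s)\,v + d_{1} \\
        &= L_{0}(s)\bigl(J_{0}(q_{0})\,\dot{q} + d_{2}\bigr) + d_{1} \\
        &= \underbrace{L_{0}(s)\,J_{0}(q_{0})}_{J_{total}(s,\,q_{0})}\,\dot{q}
           \;+\; \underbrace{L_{0}(s)\,d_{2} + d_{1}}_{D}.
\end{aligned}
```

The kinematic mapping error d_2 therefore enters the image space through L_0(s), while d_1 enters directly.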

The existence of the lumped disturbance D significantly undermines the precise mapping of the nominal model. On one hand, random noise distorts the feature velocity

\dot{s}

, affecting the accuracy of the inverse mapping based on “control command-feature error.” On the other hand, the time-varying lumped disturbance causes the system’s dynamic characteristics to deviate from the nominal model. Consequently, traditional IBVS control laws based on precise models (such as proportional control) struggle to achieve high-precision trajectory tracking and may even induce system oscillation. Therefore, effectively suppressing measurement noise and accurately estimating system states and the lumped disturbance D have become core challenges for improving IBVS performance. This challenge serves as the direct motivation for the RBF-EKF coupled state-disturbance estimation method proposed in this paper.

3. Control Methods

To address the critical issues identified in Section 2—specifically reduced mapping accuracy, increased trajectory tracking error, and insufficient control stability caused by the lumped disturbance D (comprising image measurement noise, Jacobian modeling errors, and external environmental interference)—this section proposes an integrated solution combining “high-precision estimation” and “control compensation.” The core strategy involves accurately acquiring system states and lumped disturbance information by integrating advanced control strategies with intelligent estimation methods, and subsequently introducing a disturbance feedforward compensation term into the control law. This approach effectively suppresses uncertainties, thereby improving the overall control accuracy and stability of the IBVS system.

To fully realize this design philosophy, the section content is structured logically, progressing from control law design to estimation method construction, and finally to parameter optimization. Section 3.1 begins by considering the motion constraints and trajectory tracking requirements of the robot. An IBVS control law based on feedback linearization and Model Predictive Control (MPC) is designed, which enhances the system’s adaptability to physical constraints—such as joint angles and velocities—through rolling optimization and constraint handling capabilities. Section 3.2 subsequently addresses the complexity of the lumped disturbance D and the interference of state measurement noise by proposing a coupled state-disturbance estimation method combining the Extended Kalman Filter (EKF) and Radial Basis Function (RBF) neural network. The EKF is employed to suppress measurement noise and achieve high-precision estimation of image features. Simultaneously, using the estimated state as input, the RBF neural network utilizes its local approximation characteristics to learn the lumped disturbance D online. This estimation result is fed back to the EKF to correct state predictions, establishing a closed-loop cooperative mechanism of “state estimation–disturbance learning–prediction correction.” Section 3.3 focuses on the core prerequisite for accurate RBF estimation: the reasonable selection of high-dimensional network centers. Traditional methods often suffer from redundancy due to undifferentiated coverage, physical infeasibility due to kinematic constraint violations, or difficulties in balancing accuracy with real-time performance. To overcome these challenges, an RBF center selection method based on the IBVS nominal path is proposed. 
By leveraging the coupled iteration of IBVS-MPC, robot kinematics, and the camera projection model, this method achieves task-oriented coverage of key image feature areas and guarantees the kinematic reachability of centers, thus balancing estimation accuracy with system real-time performance.
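The idea of clustering a nominal feature path into RBF centers can be illustrated with a deliberately simplified sketch. It is not the procedure of Section 3.3: a proportional law on the integrator model stands in for the IBVS-MPC, kinematics, and projection loop, K-means is implemented as plain Lloyd iterations, and all numbers (features, gain, sampling period, number of centers) are hypothetical.

```python
import numpy as np

def nominal_feature_path(x0, x_star, gain=1.0, Ts=0.05, steps=120):
    """Generate a nominal image feature sequence from x0 toward x_star."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        u0 = -gain * (x - x_star)  # stand-in for the MPC virtual control
        x = x + Ts * u0            # integrator model of the linearized system
        path.append(x.copy())
    return np.array(path)

def kmeans(data, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        # Distance of every sample to every center, shape (N, k).
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

# Hypothetical initial and target features for four points (8-dimensional).
x0 = np.array([150.0, 100.0, 330.0, 90.0, 350.0, 280.0, 160.0, 300.0])
x_star = np.array([220.0, 140.0, 420.0, 140.0, 420.0, 340.0, 220.0, 340.0])

path = nominal_feature_path(x0, x_star)
centers = kmeans(path, k=6)
print(centers.shape)
```

Because every center is either a path sample or the mean of path samples, the centers stay inside the region the task actually visits, which is the property the task-oriented selection relies on.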

Through these interconnected components, the effective suppression of lumped disturbance and a significant improvement in control performance are achieved, providing solid theoretical and methodological support for the subsequent experimental verification.

3.1. Feedback Linearization IBVS-MPC Control

A fundamental challenge in IBVS systems is their inherent nonlinearity, which arises from the time-varying nature of both the image Jacobian matrix and the robot velocity Jacobian matrix. Consequently, traditional linear control methods, such as proportional control, are ineffective for ensuring control accuracy in large-scale trajectory tracking scenarios. Furthermore, the robot joint positions and velocities are subject to strict physical constraints. Failure to account for these constraints in the control law may lead to system oscillation or hardware damage. To address these issues, this section proposes a robust control law that combines the constraint-handling capabilities of Model Predictive Control (MPC) with the nonlinear decoupling capacity of feedback linearization. Crucially, linear MPC offers low computational cost and high real-time performance. Its optimization process can be transformed into a convex Quadratic Programming (QP) problem, allowing for rapid solutions that meet the high-frequency control requirements of IBVS systems.

However, linear MPC is strictly applicable only to linear systems, whereas the IBVS system is typically nonlinear. Direct application of linear MPC would result in model mismatch, leading to increased tracking errors and instability. Therefore, it is necessary to decouple and linearize the nonlinear IBVS system via feedback linearization to establish the foundation for linear MPC application.

Feedback linearization is a nonlinear control method based on accurate modeling. Its core principle involves transforming a nonlinear system into a fully controllable linear system through nonlinear state feedback and coordinate transformation, thereby facilitating the application of mature linear control strategies. Unlike local linearization methods (e.g., Taylor expansion), feedback linearization achieves accurate linearization within the global scope, effectively retaining system dynamic characteristics and avoiding large-scale tracking errors. Additionally, the system’s lumped disturbance D can be explicitly separated during this process, providing an interface for subsequent feedforward compensation via EKF+RBF estimation, thus enhancing system robustness.

First, we design the following nonlinear state feedback control law:

\dot{q} = J_{t o t a l}^{†} (s, q_{0}) (u_{0} - \hat{D}),

(14)

where

J_{t o t a l}^{†} (s, q_{0}) \in R^{n_{2} \times 2 n_{1}}

represents the pseudoinverse of the total Jacobian matrix.

u_{0} \in R^{2 n_{1}}

represents the virtual control quantity, i.e., the control input of linear MPC.

\hat{D} \in R^{2 n_{1}}

is the estimated value of the lumped disturbance D, which will be generated by the EKF+RBF coupled estimation introduced later.

By substituting this feedback control law into Equation (13), under the premise that

\hat{D} \approx D

, the linearized system can be obtained:

\dot{x} = u_{0} .

(15)

At this stage, the nonlinear IBVS system has been converted into a simple linear integral system, eliminating the nonlinear coupling of the original system and providing a nominal linear model for MPC design.
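A small numerical check of Equations (14) and (15) is sketched below. It assumes a square, well-conditioned total Jacobian so that J_total J_total† = I holds exactly; when J_total has more rows than columns, the product is only a projector onto its range. The matrix, disturbance, and input are random placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6  # hypothetical size with 2*n_1 == n_2, so J_total is square

# Placeholder Jacobian: identity plus a small perturbation, hence invertible.
J_total = np.eye(n) + 0.1 * rng.normal(size=(n, n))
D = rng.normal(size=n)   # true lumped disturbance
u0 = rng.normal(size=n)  # virtual control from the linear MPC

def closed_loop_feature_velocity(D_hat):
    """Apply Equation (14), then evaluate the disturbed plant model."""
    q_dot = np.linalg.pinv(J_total) @ (u0 - D_hat)  # Equation (14)
    return J_total @ q_dot + D                      # x_dot = J_total q_dot + D

# Perfect estimate: the loop reduces to the integrator x_dot = u0.
x_dot = closed_loop_feature_velocity(D_hat=D)
residual = np.linalg.norm(x_dot - u0)

# Imperfect estimate: the estimation error D - D_hat appears in x_dot.
D_hat_biased = 0.8 * D
x_dot_biased = closed_loop_feature_velocity(D_hat=D_hat_biased)
print(residual, x_dot_biased - u0)
```

The second case shows why the premise on the disturbance estimate matters: any estimation error passes straight through to the linearized system as an additive input.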

Next, the linearized system Equation (15) is discretized to obtain the discrete state equation:

x (k + 1) = A_{D} x (k) + B_{D} u_{0} (k),

(16)

where

A_{D} \in R^{2 n_{1} \times 2 n_{1}}

represents the discrete system matrix,

B_{D} \in R^{2 n_{1} \times 2 n_{1}}

represents the discrete input matrix,

x (k + 1) \in R^{2 n_{1}}

represents the system state at time

k + 1

, and

u_{0} (k) \in R^{2 n_{1}}

represents the control input at time k.

To prevent system instability or hardware damage, the strict physical limits of the robot must be respected. This paper converts these physical constraints into constraints on the system state x (image features), the MPC virtual control quantity

u_{0}

, its increment

Δ u_{0}

, and the joint velocity

\dot{q}

, as follows:

\{\begin{matrix} x_{m i n} \leq x \leq x_{m a x} \\ U_{m i n} \leq u_{0} \leq U_{m a x} \\ Δ U_{m i n} \leq Δ u_{0} \leq Δ U_{m a x} \end{matrix},

(17)

where

x_{m i n} \in R^{2 n_{1}}

and

x_{m a x} \in R^{2 n_{1}}

represent the feasible region of image features within the pixel coordinate system.

U_{m i n} = J_{t o t a l} (s, q) {\dot{q}}_{m i n} \in R^{2 n_{1}}

and

U_{m a x} = J_{t o t a l} (s, q) {\dot{q}}_{m a x} \in R^{2 n_{1}}

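represent the lower and upper bounds of the MPC virtual control quantity, respectively. They are obtained by mapping the joint velocity limits through the total Jacobian, so that bounding the virtual control quantity also enforces the robot joint velocity range.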

Δ U_{m i n} \in R^{2 n_{1}}

and

Δ U_{m a x} \in R^{2 n_{1}}

represent the minimum and maximum constraints on the control input increment.

We define the control input sequence

U (k)

and state prediction sequence

X (k)

at time k:

\{\begin{matrix} U (k) = {[u_{0} {(k | k)}^{T}, u_{0} {(k + 1 | k)}^{T}, \dots, u_{0} {(k + N_{p} - 1 | k)}^{T}]}^{T} \\ X (k) = {[x {(k + 1 | k)}^{T}, x {(k + 2 | k)}^{T}, \dots, x {(k + N_{p} | k)}^{T}]}^{T} \end{matrix},

(18)

where

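x (k + i | k) \in R^{2 n_{1}}

represents the prediction of the system state at time

k + i

based on the state at time k,

u_{0} (k + i | k) \in R^{2 n_{1}}

represents the planned control input, and

N_{p}

represents the prediction horizon (denoted by N in Equations (19)–(21)), so that both stacked vectors in Equation (18) have dimension

2 n_{1} N_{p}

.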

The prediction model can be derived as follows:

X (k) = M x (k) + K U (k),

(19)

where

M = {[A_{D}^{T}, {(A_{D}^{2})}^{T}, \dots, {(A_{D}^{N})}^{T}]}^{T}

,

K = [\begin{matrix} B_{D} & O_{2 n_{1} \times 2 n_{1}} & \dots & O_{2 n_{1} \times 2 n_{1}} \\ A_{D} B_{D} & B_{D} & \dots & O_{2 n_{1} \times 2 n_{1}} \\ A_{D}^{2} B_{D} & A_{D} B_{D} & \dots & O_{2 n_{1} \times 2 n_{1}} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ A_{D}^{N - 1} B_{D} & A_{D}^{N - 2} B_{D} & \dots & B_{D} \end{matrix}]

.
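The stacked matrices of Equation (19) can be assembled mechanically. The sketch below is an illustration under the assumption that the integrator of Equation (15) is discretized with a zero-order hold, which gives an identity system matrix and an input matrix equal to the sampling period times the identity. The helper name `prediction_matrices` and the small dimensions are hypothetical.

```python
import numpy as np

def prediction_matrices(A_d, B_d, horizon):
    """Build M and K such that X = M @ x + K @ U over `horizon` steps."""
    n, m = B_d.shape
    # M stacks A_d, A_d^2, ..., A_d^horizon.
    M = np.vstack([np.linalg.matrix_power(A_d, i) for i in range(1, horizon + 1)])
    # K is block lower triangular with block (row, col) = A_d^(row-col) @ B_d.
    K = np.zeros((n * horizon, m * horizon))
    for row in range(horizon):
        for col in range(row + 1):
            block = np.linalg.matrix_power(A_d, row - col) @ B_d
            K[row * n:(row + 1) * n, col * m:(col + 1) * m] = block
    return M, K

dt, n, horizon = 0.01, 2, 5                 # hypothetical sizes for the demo
A_d, B_d = np.eye(n), dt * np.eye(n)        # assumed zero-order-hold integrator
M, K = prediction_matrices(A_d, B_d, horizon)

rng = np.random.default_rng(1)
x0 = rng.standard_normal(n)
U = rng.standard_normal(n * horizon)
X_stacked = M @ x0 + K @ U                  # prediction via Eq. (19)

# Cross-check against a step-by-step rollout of Eq. (16).
x, rollout = x0.copy(), []
for i in range(horizon):
    x = A_d @ x + B_d @ U[i * n:(i + 1) * n]
    rollout.append(x)
X_simulated = np.concatenate(rollout)
print(np.allclose(X_stacked, X_simulated))
```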

The reference state sequence is defined as:

X_{r e f} (k) = {[x_{d} {(k + 1 | k)}^{T}, x_{d} {(k + 2 | k)}^{T}, \dots, x_{d} {(k + N | k)}^{T}]}^{T},

(20)

where

X_{r e f} (k) \in R^{2 n_{1} N}

represents the reference state sequence, and

x_{d} (k + i | k) \in R^{2 n_{1}}

is the desired state at time

k + i

, serving as the target trajectory.

A cost function is designed to balance tracking accuracy with control smoothness:

J_{l} (U (k)) = {(X (k) - X_{r e f})}^{T} Q (X (k) - X_{r e f}) + U {(k)}^{T} R U (k),

(21)

where

Q = diag {Q_{c}, Q_{c}, \dots, Q_{c}} \in R^{2 n_{1} N \times 2 n_{1} N}

and

R = diag {R_{c}, R_{c}, \dots, R_{c}} \in R^{2 n_{1} N \times 2 n_{1} N}

are the weight diagonal matrices for the state and input, respectively.

Q_{c} \in R^{2 n_{1} \times 2 n_{1}}

represents the tracking-error weight matrix, and

R_{c} \in R^{2 n_{1} \times 2 n_{1}}

represents the control input weight matrix.

e (k + i | k) = x (k + i | k) - x_{d} (k + i | k)

represents the projected tracking error vector within the prediction horizon.

The core objective of MPC is to find a feasible solution—the optimal control sequence

U^{*}

—such that the cost function

J_{l}

is minimized. This is achieved by solving the following convex optimization problem under the constraints defined in Equation (17):

U {(k)}^{*} = arg min_{U} J_{l} (U (k)) = arg min_{U} (\frac{1}{2} U {(k)}^{T} H U (k) + f^{T} U (k) + S),

(22)

where

H = 2 (K^{T} Q K + R)

,

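f = 2 K^{T} Q (M x (k) - X_{r e f})

, and

S = {(M x (k) - X_{r e f})}^{T} Q (M x (k) - X_{r e f})

is a constant term that does not depend on the decision variable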

.
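To make the link between the cost (21) and the quadratic program (22) concrete, the following sketch expands the cost numerically. It writes the linear term as the column vector 2 K^T Q (M x − X_ref), so that its inner product with U is well defined, and it solves only the unconstrained problem; the constraints of Equation (17) would require a QP solver, which is not shown. All numbers and the function names `qp_terms` and `tracking_cost` are assumptions made for illustration.

```python
import numpy as np

def qp_terms(M, K, Q, R, x, X_ref):
    """Return H, f, S such that J = 0.5 U^T H U + f^T U + S."""
    offset = M @ x - X_ref                 # free response minus reference
    H = 2.0 * (K.T @ Q @ K + R)
    f = 2.0 * K.T @ Q @ offset             # column-vector form of the linear term
    S = offset @ Q @ offset                # constant, independent of U
    return H, f, S

def tracking_cost(M, K, Q, R, x, X_ref, U):
    """Cost of Eq. (21) evaluated directly from the prediction model."""
    error = M @ x + K @ U - X_ref
    return error @ Q @ error + U @ R @ U

n, horizon, dt = 2, 3, 0.1                 # hypothetical demo dimensions
# Assumed integrator model: A_D = I, B_D = dt * I.
M = np.kron(np.ones((horizon, 1)), np.eye(n))
K = dt * np.kron(np.tril(np.ones((horizon, horizon))), np.eye(n))
Q = 10.0 * np.eye(n * horizon)
R = 1.0 * np.eye(n * horizon)
x = np.array([1.0, -2.0])
X_ref = np.zeros(n * horizon)

H, f, S = qp_terms(M, K, Q, R, x, X_ref)
U_star = np.linalg.solve(H, -f)            # unconstrained minimizer: H U + f = 0

U_test = np.linspace(-1.0, 1.0, n * horizon)
quadratic_form = 0.5 * U_test @ H @ U_test + f @ U_test + S
print(np.isclose(quadratic_form, tracking_cost(M, K, Q, R, x, X_ref, U_test)))
```

Because R is positive definite, H is positive definite as well, which is what makes the problem convex.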

Finally, the MPC controller employs a receding horizon strategy. Only the first element of the optimal input sequence

U^{*}

is applied as the auxiliary control quantity

u_{0}

in the control law (14). At the next moment

k + 1

, the state

x (k + 1 | k + 1)

is re-measured, and the prediction and optimization process is repeated. This real-time correction compensates for errors caused by uncertainties and ensures system robustness.
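The receding horizon strategy can be sketched as a short closed-loop simulation. This is a simplified illustration, not the paper's implementation: it assumes perfect disturbance cancellation (so the plant is the nominal integrator of Equation (15)), solves the unconstrained quadratic program at each step, and uses simple saturation of the first input as a crude stand-in for the constrained optimization of Equation (17). The weights and horizon follow the values reported in Section 5, while the initial state, the input bound, and the function name `mpc_first_input` are hypothetical.

```python
import numpy as np

def mpc_first_input(x, x_target, dt, horizon, q_weight, r_weight, u_max):
    """Solve the unconstrained QP and return only the first control block."""
    n = x.size
    M = np.kron(np.ones((horizon, 1)), np.eye(n))
    K = dt * np.kron(np.tril(np.ones((horizon, horizon))), np.eye(n))
    Q = q_weight * np.eye(n * horizon)
    R = r_weight * np.eye(n * horizon)
    X_ref = np.tile(x_target, horizon)     # constant reference over the horizon
    H = 2.0 * (K.T @ Q @ K + R)
    f = 2.0 * K.T @ Q @ (M @ x - X_ref)
    U_star = np.linalg.solve(H, -f)
    # Saturation replaces the constrained QP here (a simplification).
    return np.clip(U_star[:n], -u_max, u_max)

x = np.array([50.0, -30.0])                # hypothetical initial feature error
x_target = np.zeros(2)
dt = 0.01
initial_error = np.linalg.norm(x - x_target)

for _ in range(500):                       # 5 s of simulated time
    u0 = mpc_first_input(x, x_target, dt, horizon=10,
                         q_weight=10.0, r_weight=1.0, u_max=100.0)
    x = x + dt * u0                        # nominal plant x_dot = u0
    # The state is "re-measured" and the optimization repeats on the next pass.

final_error = np.linalg.norm(x - x_target)
print(initial_error, final_error)
```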

3.2. EKF + RBF Coupled Estimation Method

The effectiveness of the feedback linearization IBVS-MPC strategy proposed in Section 3.1 relies on a critical premise: the estimated value of the lumped disturbance

\hat{D}

must closely approximate the actual lumped disturbance D. The accuracy of feedback linearization depends entirely on the effective compensation of these disturbances. If a significant discrepancy between

\hat{D}

and D exists, the control law fails to cancel out the nonlinear coupling and original system disturbances. Worse, it may introduce additional errors into the linearized system, causing a mismatch between the MPC optimization model and actual system dynamics. Ultimately, this leads to degraded feature point tracking accuracy and system oscillation.

In practical scenarios, however, the lumped disturbance D exhibits significant complexity and uncertainty, making it difficult to address with standard techniques. On one hand, D aggregates deterministic deviations caused by Jacobian modeling errors and external environmental disturbances, which are difficult to describe accurately using traditional analytical modeling. On the other hand, relying on a single estimation method makes it difficult to balance the dual requirements of “noise suppression” and “disturbance approximation.” The standalone Extended Kalman Filter (EKF), while effective at suppressing Gaussian measurement noise, has limited capacity to approximate complex, unmodeled disturbances. Conversely, while the Radial Basis Function (RBF) neural network possesses strong nonlinear approximation capabilities, it is highly sensitive to input quality; inputting noisy measurement signals directly often leads to parameter oscillation and divergence, rendering the estimation unstable.

To address these challenges, this section proposes a joint estimation method based on the bidirectional cooperation of the EKF and RBF neural network. The core design logic leverages the optimal estimation characteristics of the EKF to filter noisy measurement signals first, thereby achieving high-precision estimation of image features. This process provides high-quality, denoised inputs for the RBF neural network. Subsequently, the RBF network utilizes its local approximation capability to learn the lumped disturbance D online. Crucially, this disturbance estimation is fed back into the EKF’s state prediction stage to correct deviations caused by unmodeled dynamics. This establishes a closed-loop cooperative mechanism characterized by “EKF state estimation–denoised input–RBF disturbance learning–EKF prediction correction.”

The remainder of this section details the parameter update rules and the bidirectional cooperation process. This ensures the proposed method outputs high-precision state and disturbance estimates, providing reliable support for the control strategy outlined in Section 3.1.

For the IBVS system, the EKF designed in this paper is as follows:

\dot{\hat{x}} = J_{t o t a l} (\hat{x}, q_{0}) \dot{q} + \hat{D} + K_{k} (y - \hat{x}),

(23)

where

K_{k} \in R^{2 n_{1} \times 2 n_{1}}

represents the Kalman gain. The estimated value of the lumped disturbance

\hat{D}

is provided by the RBF neural network described below, which serves to correct the state prediction deviation caused by system disturbances, as shown in Figure 1.

Although the EKF is formulated in continuous time to describe the physical dynamics, the actual implementation employs a discrete-time approximation using the first-order forward Euler method. The state prediction step is computed as

{\hat{x}}_{k + 1} = {\hat{x}}_{k} + {\dot{\hat{x}}}_{k} Δ t

with a fixed sampling interval of

Δ t = 0.01 s

. This method was selected to minimize computational overhead while maintaining sufficient accuracy for the 100 Hz control loop. The measurement update is triggered discretely upon the acquisition of each new image frame. It is noted that the choice of sampling time is critical; the selected 10 ms interval ensures that the discretization error inherent in the Euler method remains negligible, providing stable estimation without imposing the heavy computational load associated with higher-order integration methods.
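A single forward Euler step of the estimator in Equation (23) might look as follows. The sketch treats the Kalman gain as a given input, since its computation (covariance propagation) is not detailed in this section, and all numerical values as well as the function name `observer_euler_step` are assumptions for illustration.

```python
import numpy as np

def observer_euler_step(x_hat, y, J_total, q_dot, D_hat, K_k, dt=0.01):
    """One forward Euler step of Eq. (23) with sampling interval dt."""
    innovation = y - x_hat
    x_hat_dot = J_total @ q_dot + D_hat + K_k @ innovation
    return x_hat + dt * x_hat_dot

# Hypothetical two-dimensional example.
x_hat = np.array([0.0, 0.0])
y = np.array([1.0, 2.0])               # noisy measurement of the features
J_total = np.eye(2)                    # placeholder Jacobian
q_dot = np.array([1.0, -1.0])
D_hat = np.array([0.5, 0.5])           # disturbance estimate from the RBF network
K_k = 2.0 * np.eye(2)                  # gain assumed to be supplied by the EKF

x_hat_next = observer_euler_step(x_hat, y, J_total, q_dot, D_hat, K_k)
print(x_hat_next)
```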

The RBF neural network possesses adaptive approximation capabilities for high-dimensional nonlinear disturbances via its radial basis functions. In this study, the RBF network is employed to estimate the lumped disturbance D. Its structure is defined as:

\hat{D} = {\hat{W}}^{T} Φ (\hat{x}),

(24)

where the input vector

\hat{x}

is obtained from the a posteriori state estimation of the EKF at the previous step.

\hat{W} \in R^{N \times 2 n_{1}}

represents the weight estimation matrix (where N denotes the number of hidden layer nodes), and

Φ (\hat{x}) = {[ϕ_{1} (\hat{x}), \dots, ϕ_{N} (\hat{x})]}^{T} \in R^{N}

represents the radial basis function vector. Given that Gaussian functions exhibit desirable characteristics—such as smoothness, differentiability, and universal approximation capabilities [34]—they are adopted as the activation functions in this study. The expression for the radial basis function

ϕ_{i} (\hat{x})

of the i-th node in the hidden layer is:

ϕ_{i} (\hat{x}) = exp (- \frac{| | \hat{x} - c_{i} {| |}^{2}}{2 σ_{i}^{2}}),

(25)

where

c_{i} \in R^{2 n_{1}}

represents the coordinate vector of the selected center, and

σ_{i} \in R

represents the width of the radial basis function.
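Equations (24) and (25) translate directly into a few lines of array code. The following sketch is illustrative only: the centers, widths, and weights are random placeholders rather than values from the paper, and the function names are hypothetical.

```python
import numpy as np

def gaussian_rbf(x_hat, centers, widths):
    """Eq. (25): one Gaussian activation per hidden node."""
    sq_dist = np.sum((centers - x_hat) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * widths ** 2))

def rbf_disturbance(W_hat, x_hat, centers, widths):
    """Eq. (24): D_hat = W_hat^T Phi(x_hat)."""
    return W_hat.T @ gaussian_rbf(x_hat, centers, widths)

rng = np.random.default_rng(0)
n_nodes, dim = 3, 8                                    # 8-D features, 3 hidden nodes (demo)
centers = rng.uniform(0.0, 640.0, size=(n_nodes, dim)) # placeholder centers
widths = np.full(n_nodes, 100.0)                       # placeholder widths
W_hat = rng.standard_normal((n_nodes, dim))            # placeholder weights

x_hat = centers[0].copy()              # evaluate exactly at the first center
phi = gaussian_rbf(x_hat, centers, widths)
D_hat = rbf_disturbance(W_hat, x_hat, centers, widths)
print(phi[0], D_hat.shape)
```

Evaluating at a center gives an activation of exactly one for that node, while nodes whose centers are far from the input contribute almost nothing, which is the local approximation property the center selection in Section 3.3 relies on.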

Based on the Lyapunov stability analysis method, the adaptive law for the RBF neural network can be derived as:

\dot{\hat{W}} = Γ Φ (\hat{x}) {(y - \hat{x})}^{T} P,

(26)

where

Γ \in R^{N \times N}

represents a positive definite symmetric learning rate matrix.

P \in R^{2 n_{1} \times 2 n_{1}}

is a matrix introduced in the Lyapunov stability proof, and its derivation is detailed in Section 4 (Equation (50)). Finally, the output of the bidirectional cooperation between the EKF and the RBF neural network is

{[\hat{x}, \hat{D}]}^{T}

.
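For a discrete-time implementation, the adaptive law (26) has to be integrated numerically. The sketch below applies a forward Euler step to it, which is an assumption: the paper states the Euler scheme only for the EKF prediction. The learning rate matrix, the matrix P, and the function name `rbf_weight_step` are placeholders.

```python
import numpy as np

def rbf_weight_step(W_hat, phi, innovation, Gamma, P, dt=0.01):
    """Euler step of Eq. (26): W_dot = Gamma @ Phi @ (y - x_hat)^T @ P."""
    W_dot = Gamma @ np.outer(phi, innovation) @ P
    return W_hat + dt * W_dot

# Hypothetical example with 2 hidden nodes and a 2-D feature vector.
W_hat = np.zeros((2, 2))
phi = np.array([1.0, 0.0])             # only the first node is active
innovation = np.array([2.0, -1.0])     # y - x_hat
Gamma = np.eye(2)                      # placeholder learning rate matrix
P = np.eye(2)                          # placeholder Lyapunov matrix

W_next = rbf_weight_step(W_hat, phi, innovation, Gamma, P)
print(W_next)
```

Only the row belonging to the active node changes, so weights attached to centers that the trajectory never visits are left essentially untouched.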

Remark 1.

Although this study validates the proposed method within a simulation environment, the framework is explicitly designed to mitigate the uncertainties inherent in physical visual sensors. Specifically, practical challenges such as depth estimation deviations and camera calibration inaccuracies are structurally treated as components of the system’s lumped disturbance. By learning these unknown terms online via the RBF neural network, the controller can maintain precision without relying on perfect parameter identification. Furthermore, the predictive nature of the Extended Kalman Filter combined with the receding horizon strategy of Model Predictive Control provides inherent robustness against temporary visual occlusions and sensor latency, allowing the system to maintain stable operation during short-term signal loss or processing delays.

3.3. Task-Oriented K-Means Clustering

A critical prerequisite for the accurate estimation of lumped disturbances by the RBF neural network (Section 3.2) is the appropriate selection of network centers. As defined in Section 2.2.1, the selection of four feature points results in an 8-dimensional input space for the RBF neural network; consequently, the network centers must also be 8-dimensional vectors. This high dimensionality presents significant challenges for center selection. The nonlinear approximation performance of RBF neural networks relies on the Universal Approximation Theorem [35]. A key corollary of this theorem is that the approximation error bound depends on the density of center distribution within the input space

Ω

. To minimize the approximation error with fewer neurons, the centers should be concentrated and uniformly distributed across the effective variation region of the disturbances.

However, traditional center selection methods typically employ a strategy of “indiscriminate coverage” in high-dimensional space. This approach tends to generate a large number of invalid centers (e.g., corresponding to feature points outside the robot workspace or the task path), leading to low approximation efficiency. Furthermore, it is difficult for traditional methods to balance estimation accuracy with the real-time performance of the IBVS system. While offline clustering with a large dataset can improve accuracy, the excessive number of centers results in high computational delays; conversely, random sampling supports real-time performance but suffers from insufficient accuracy due to the scattered distribution of centers.

To address the core limitations of traditional RBF center selection methods in disturbance estimation—specifically the redundancy caused by indiscriminate coverage, the kinematic infeasibility of centers falling outside robot constraints, and the conflict between estimation accuracy and real-time performance—we propose a task-oriented K-means clustering method based on the nominal IBVS path. The process proceeds as follows. First, at time step

k_{2}

, the current image is captured via the camera, and the 8-dimensional initial image feature vector is extracted. Next, based on the IBVS-MPC control law (Equation (14)) and assuming ideal conditions with no system disturbances (Equations (3) and (8)), the control input for the robot (i.e., the joint velocity) is computed. Subsequently, the joint velocity is integrated to predict the joint position at the next time step,

q (k_{2} + 1) \in R^{n_{2}}

, as follows:

q (k_{2} + 1) = q (k_{2}) + \dot{q} (k_{2}) Δ t,

(27)

where

Δ t

denotes the sampling period of the control system. The detailed process is illustrated in Figure 2.

The predicted joint position is then substituted into the robot forward kinematics model to calculate the end-effector pose in the base coordinate system for the next time step:

T_{e b} (k_{2} + 1) = F_{k} (q (k_{2} + 1)),

(28)

where

T_{e b} (k_{2} + 1)

denotes the homogeneous transformation matrix of the end-effector with respect to the base frame at time

k_{2} + 1

, and

F_{k} (\cdot)

represents the robot forward kinematics function.

Chaining the transformations between the coordinate frames then gives the pose of the camera in the world coordinate system:

T_{c w} (k_{2} + 1) = T_{b w}^{- 1} (k_{2} + 1) \cdot T_{e b} (k_{2} + 1) \cdot T_{c e} (k_{2} + 1),

(29)

where

T_{c w} (k_{2} + 1)

denotes the camera pose within the world coordinate system.

T_{b w} (k_{2} + 1)

represents the fixed transformation from the robot base to the world frame, while

T_{c e} (k_{2} + 1)

signifies the constant transformation between the end-effector and the camera, which is determined via hand-eye calibration.

By applying the camera projection model, we can predict the nominal, disturbance-free image feature vector

s_{1}

at the next time step

k_{2} + 1

:

s_{1} = K \cdot [I_{3 \times 3} | O_{3 \times 1}] \cdot T_{c w} (k_{2} + 1) \cdot P_{w},

(30)

where K denotes the camera intrinsic parameter matrix, determined via Zhang’s calibration method [36].

P_{w}

represents the 3D world coordinates of the feature points.
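The projection step can be illustrated with the pinhole intrinsics reported in Section 5. The sketch assumes that the 4 × 4 transform passed to the function maps world coordinates into the camera frame, and it divides by the third homogeneous component to obtain pixel coordinates, a normalization that Equation (30) leaves implicit. The 3D points and the function name `project_points` are made up for the example.

```python
import numpy as np

def project_points(K_cam, T_world_to_cam, P_w):
    """Project (n, 3) world points to (n, 2) pixel coordinates."""
    P_h = np.hstack([P_w, np.ones((P_w.shape[0], 1))])       # homogeneous points
    proj = K_cam @ np.hstack([np.eye(3), np.zeros((3, 1))])  # K [I | 0], 3 x 4
    uvw = (proj @ T_world_to_cam @ P_h.T).T
    return uvw[:, :2] / uvw[:, 2:3]                          # divide by depth

# Intrinsics from Section 5: fx = fy = 800, cx = 320, cy = 240.
K_cam = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
T_world_to_cam = np.eye(4)             # assumption: camera frame equals world frame

# Four hypothetical feature points in front of the camera.
P_w = np.array([[0.0, 0.0, 1.0],
                [0.1, -0.05, 1.0],
                [0.2, 0.0, 2.0],
                [-0.1, 0.1, 2.0]])

pixels = project_points(K_cam, T_world_to_cam, P_w)
s = pixels.reshape(-1)                 # 8-dimensional feature vector
print(s)
```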

By iteratively executing the steps outlined above, a continuous sequence of nominal image features is generated, covering the entire trajectory from the initial pose to the target pose:

S = {s_{0}, s_{1}, \dots, s_{N_{2}}} \in R^{8 \times (N_{2} + 1)},

(31)

where

N_{2} \in \mathbb{N}

represents the number of iterations, i.e., the length of the planned path under ideal conditions.

Subsequently, K-means clustering is performed on the sequence S to obtain the final center set

{c_{1}, c_{2}, \dots, c_{N_{3}}} \in R^{8 \times N_{3}}

for the RBF neural network.
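The clustering step itself can be sketched with a plain Lloyd-type K-means. In the example below the nominal feature sequence is replaced by a straight-line interpolation between two hypothetical 8-dimensional feature vectors, purely as a stand-in for the rollout of Equations (27)–(31). Samples are stored as rows rather than columns, and the initialization from randomly chosen samples is an assumption; the paper does not specify one.

```python
import numpy as np

def kmeans(samples, n_clusters, n_iter=50, seed=0):
    """Basic Lloyd iteration; samples has shape (n_samples, dim)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(samples), size=n_clusters, replace=False)
    centers = samples[start].copy()
    for _ in range(n_iter):
        d2 = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(n_clusters):
            members = samples[labels == j]
            if len(members) > 0:           # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment consistent with the returned centers.
    d2 = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

# Stand-in nominal path: linear interpolation between two made-up feature vectors.
s_start = np.array([100.0, 100.0, 500.0, 100.0, 500.0, 400.0, 100.0, 400.0])
s_goal = np.array([220.0, 140.0, 420.0, 140.0, 420.0, 340.0, 220.0, 340.0])
alphas = np.linspace(0.0, 1.0, 201)[:, None]
S = s_start + alphas * (s_goal - s_start)      # shape (N2 + 1, 8) with N2 = 200

centers, labels = kmeans(S, n_clusters=20)     # 20 centers, as in Section 5
print(centers.shape)
```

Since every center is either a path sample or the mean of path samples, all centers stay within the region swept by the nominal trajectory, which is the property the task-oriented selection is designed to exploit.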

The center selection strategy proposed in this paper generates a nominal trajectory feature sequence through the coupled iteration of the IBVS-MPC control law and robot forward kinematics. This approach achieves task-oriented, compact coverage of key disturbance regions: all feature points are strictly aligned with the complete task path, from the initial pose to the target pose, while satisfying camera constraints. Consequently, it avoids the spatial redundancy typical of traditional random sampling or unconstrained interpolation. Furthermore, the strategy ensures the kinematic reachability of each feature point through joint velocity solving and limit checking, thereby eliminating estimation blind spots caused by physically infeasible centers. By employing task-oriented K-means clustering on these continuous trajectory features, the method enhances approximation accuracy while maintaining a sparse and efficient set of RBF centers. This effectively balances RBF estimation accuracy with the real-time requirements of the Visual Predictive Control system, laying a solid foundation for the practical application of coupled state-disturbance estimation.

Remark 2.

It is worth noting that the performance of the proposed strategy depends on a trade-off between approximation accuracy and computational efficiency regarding the number of RBF centers. An insufficient number of centers may fail to capture the spatial complexity of the lumped disturbance (under-fitting), whereas an excessive number increases the computational load of the neural network, potentially compromising the real-time performance of the control loop. In this study, the number of centers was empirically determined to ensure sufficient disturbance approximation while strictly satisfying the control frequency requirements.

4. Stability Proof of EKF-RBF Coupled Estimation

The previous sections have completed the design of the feedback linearization IBVS-MPC control law, the EKF-RBF coupled state-disturbance estimation method, and the RBF center optimization method based on the IBVS ideal path, forming an integrated technical framework of “control–estimation–optimization”. To ensure the reliability and convergence of the proposed method at the theoretical level and avoid system oscillation or even instability caused by the EKF-RBF coupled state-disturbance estimation mechanism, this section conducts a rigorous analysis of the stability of the system based on Lyapunov stability theory. The core proof goal is to clarify the Uniformly Ultimately Bounded (UUB) stability conditions of the system state error (image feature observation error) and disturbance estimation error, verify the coupled compatibility between the noise suppression capability of EKF, the disturbance approximation performance of RBF, and the constraint optimization characteristics of MPC, and provide a solid theoretical basis for subsequent experimental verification and engineering implementation.

First, we make the following assumptions:

Assumption 1.

The system state x and state estimation

\hat{x}

are bounded, i.e.,

x, \hat{x} \in E

, where E is a compact set (a compact set in

R^{2 n_{1}}

is a bounded closed set).

Assumption 2.

The system lumped disturbance D is bounded, i.e.,

| | D | | \leq D_{m a x}

; the measurement noise

n_{s}

is bounded, i.e.,

| | n_{s} | | \leq v_{m a x}

and has finite variance.

Assumption 3.

The ideal weight

W^{*}

of RBF exists and satisfies the following boundedness condition:

| | W^{*} {| |}_{F} = \sqrt{tr (W^{* T} W^{*})} \leq W_{m a x},

(32)

so that the system lumped disturbance can be expressed as:

D = W^{* T} Φ (x) + ϵ,

(33)

where

ϵ (t)

represents the approximation error, and the approximation error is bounded, i.e.,

| | ϵ (t) | | \leq ϵ_{m a x}

.

Assumption 4.

The radial basis function

Φ (x)

is bounded, i.e.,

| | Φ (x) | | \leq Φ_{m a x}

.

Assumption 5.

The radial basis function

Φ (x)

is Lipschitz continuous on the compact set E, i.e., there exists a constant

L_{Φ}

such that:

| | Φ (x_{1}) - Φ (x_{2}) | | \leq L_{Φ} | | x_{1} - x_{2} | |, \forall x_{1}, x_{2} \in E .

(34)

Assumption 6.

The Kalman gain

K_{k}

is bounded, i.e.,

| | K_{k} | | \leq K_{m a x}

.

Remark 3.

The assumptions are grounded in practical system constraints. Assumptions A1 and A2 hold because the finite workspace of the UR5 manipulator and the fixed camera resolution inherently bound the system states, while actuator torque limits restrict the magnitude of lumped disturbances [31]. Assumptions A3–A5 follow standard RBF properties: the Universal Approximation Theorem [35] guarantees the existence of ideal weights, and Gaussian basis functions ensure Lipschitz continuity. Finally, Assumption A6 aligns with EKF stochastic stability theory [37], which ensures bounded Kalman gains under the condition of uniform observability.

Based on these assumptions, the stability of the proposed system is summarized in the following theorem:

Theorem 1.

Consider the IBVS system described by Equation (13) with the control law (14) and the coupled estimator Equations (23) and (24). Under Assumptions A1–A6, the state estimation error

\tilde{x}

and disturbance estimation error

\tilde{D}

are Uniformly Ultimately Bounded (UUB).

Proof of Theorem 1.

We define the state error:

\tilde{x} = x - \hat{x} .

(35)

By substituting Equations (14) and (23) into Equation (35) and differentiating with respect to time, it can be obtained that:

\dot{\tilde{x}} = J_{t o t a l} (x, q_{0}) \dot{q} - J_{t o t a l} (\hat{x}, q_{0}) \dot{q} + D - \hat{D} - K_{k} \tilde{x} - K_{k} n_{s} .

(36)

We use the first-order Taylor linearization [38] of

J_{t o t a l} (x, q_{0}) \dot{q}

at

\hat{x}

:

J_{t o t a l} (x, q_{0}) \dot{q} = J_{t o t a l} (\hat{x}, q_{0}) \dot{q} + G \tilde{x} + Δ J,

(37)

where

G = \frac{\partial (J_{t o t a l} \dot{q})}{\partial x} |_{\hat{x}}

is the Jacobian matrix of

J_{t o t a l} (x, q_{0}) \dot{q}

, and

Δ J

is a higher-order term. By assuming that the higher-order term can be ignored, the following relationship is found to exist:

J_{t o t a l} (x, q_{0}) \dot{q} - J_{t o t a l} (\hat{x}, q_{0}) \dot{q} = G \tilde{x} .

(38)

By substituting Equation (38) into Equation (36), it can be obtained that:

\dot{\tilde{x}} = (G - K_{k}) \tilde{x} + \tilde{D} - K_{k} n_{s} .

(39)

where the disturbance estimation error is defined as:

\tilde{D} = D - \hat{D} .

(40)

By combining the expressions of the system lumped disturbance Equations (24) and (33), it can be obtained that:

\tilde{D} = W^{* T} Φ (x) + ϵ - {\hat{W}}^{T} Φ (\hat{x}) .

(41)

We define the weight error:

\tilde{W} = W^{*} - \hat{W} .

(42)

Through derivation, Equation (41) can be written as:

\tilde{D} = W^{* T} (Φ (x) - Φ (\hat{x})) + ϵ + {\tilde{W}}^{T} Φ (\hat{x}) .

(43)

By combining the definition (40) and substituting Equation (43) into Equation (39), it can be obtained that:

\dot{\tilde{x}} = (G - K_{k}) \tilde{x} + {\tilde{W}}^{T} Φ (\hat{x}) + δ,

(44)

where

δ = W^{* T} (Φ (x) - Φ (\hat{x})) + ϵ - K_{k} n_{s}

, which contains all uncertain terms. Through Assumptions 2, 3, 5 and 6, we know that:

| | δ (t) | | \leq Δ | | \tilde{x} | | + \bar{δ},

(45)

where

Δ = L_{Φ} W_{m a x}

,

\bar{δ} = ϵ_{m a x} + K_{m a x} v_{m a x}

. We design the following form of Lyapunov function V:

V = \frac{1}{2} {\tilde{x}}^{T} P \tilde{x} + \frac{1}{2} tr ({\tilde{W}}^{T} Γ^{- 1} \tilde{W}) .

(46)

By differentiating V with respect to time, it can be obtained that:

\dot{V} = {\tilde{x}}^{T} P \dot{\tilde{x}} + tr ({\tilde{W}}^{T} Γ^{- 1} \dot{\tilde{W}}) .

(47)

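Since the ideal weight is constant, the RBF neural network weight update law (26) and definition (42), with the innovation approximated by the state error (i.e., neglecting the measurement noise in the adaptive law), give: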

\dot{\tilde{W}} = - \dot{\hat{W}} = - Γ Φ (\hat{x}) {\tilde{x}}^{T} P .

(48)

By substituting Equations (44) and (48) into the derivative expression of the Lyapunov function (47) and through derivation, it can be obtained that:

\dot{V} = {\tilde{x}}^{T} P (G - K_{k}) \tilde{x} + {\tilde{x}}^{T} P δ (t) .

(49)

By reasonably selecting the parameters of EKF, we can ensure that

G - K_{k}

is Hurwitz, then the following relationship exists:

{(G - K_{k})}^{T} P + P (G - K_{k}) = - Q_{v}, Q_{v} ≻ 0,

(50)

where

Q_{v}

is a positive definite symmetric matrix. Therefore, Equation (49) can be written as:

\dot{V} = - \frac{1}{2} {\tilde{x}}^{T} Q_{v} \tilde{x} + {\tilde{x}}^{T} P δ (t) .

(51)

The following inequality relationship exists:

λ_{m i n} (Q_{v}) | | \tilde{x} {| |}^{2} \leq {\tilde{x}}^{T} Q_{v} \tilde{x} \leq λ_{m a x} (Q_{v}) | | \tilde{x} {| |}^{2} .

(52)

By combining Equations (45) and (52), it is known that:

\dot{V} \leq - a | | \tilde{x} {| |}^{2} + b | | \tilde{x} | |,

(53)

where

a = \frac{1}{2} λ_{m i n} (Q_{v}) - Δ | | P | | > 0

,

b = \bar{δ} | | P | | > 0

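. The requirement that a be positive holds only if

λ_{m i n} (Q_{v}) > 2 Δ | | P | |

, which restricts the admissible choice of the EKF gain and of

Q_{v}

. When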

| | \tilde{x} | | > b / a

,

\dot{V} < 0

. Therefore, there exists a spherical domain

Ω = {\tilde{x} ∣ | | \tilde{x} | | \leq b / a}

. When

\tilde{x} \notin Ω

, V decreases until the state error enters

Ω

, and the radius of the spherical domain

Ω

is only related to system parameters, not to the initial error. Therefore,

\tilde{x}

is Uniformly Ultimately Bounded (UUB) [39], and the final bound is

b / a

.
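The ultimate bound b/a can be evaluated numerically once the error dynamics matrix and the weighting matrix are fixed. The sketch below solves the Lyapunov equation (50) by vectorization with Kronecker products and then evaluates a and b from Equation (53). Every numerical value (the matrix standing in for G − K_k, the weighting matrix, and the constants Δ and the bound on δ) is a made-up placeholder chosen only so that a is positive.

```python
import numpy as np

def solve_lyapunov(A, Q_v):
    """Solve A^T P + P A = -Q_v for P via row-major vectorization."""
    n = A.shape[0]
    I = np.eye(n)
    # vec(A^T P) = (A^T kron I) vec(P) and vec(P A) = (I kron A^T) vec(P).
    lhs = np.kron(A.T, I) + np.kron(I, A.T)
    P = np.linalg.solve(lhs, -Q_v.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)             # remove round-off asymmetry

A = np.array([[-5.0, 1.0],
              [0.0, -4.0]])            # placeholder for G - K_k (Hurwitz)
Q_v = np.eye(2)                        # placeholder positive definite weight
P = solve_lyapunov(A, Q_v)

Delta = 1.0                            # placeholder for L_Phi * W_max
delta_bar = 0.2                        # placeholder for eps_max + K_max * v_max
P_norm = np.linalg.norm(P, 2)          # spectral norm of P

a = 0.5 * np.min(np.linalg.eigvalsh(Q_v)) - Delta * P_norm
b = delta_bar * P_norm
ultimate_bound = b / a                 # meaningful only when a > 0
print(a, b, ultimate_bound)
```

The example also shows the trade-off hidden in the condition on a: a larger Δ or a larger norm of P shrinks a and therefore inflates the ultimate bound.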

Since V is positive definite, the following relationship exists:

V \geq \frac{1}{2} λ_{m i n} (P) | | \tilde{x} {| |}^{2} + \frac{1}{2} λ_{m i n} (Γ^{- 1}) | | \tilde{W} {| |}_{F}^{2} .

(54)

When

| | \tilde{x} | | > b / a

,

\dot{V} < 0

, V has an upper bound. Let us define:

V^{*} = sup_{t \geq 0} V (t) < \infty .

(55)

By combining Equation (54), the following relationship exists:

\frac{1}{2} λ_{m i n} (Γ^{- 1}) | | \tilde{W} {| |}_{F}^{2} \leq V \leq V^{*},

(56)

therefore:

| | \tilde{W} {| |}_{F} \leq \sqrt{\frac{2 V^{*}}{λ_{m i n} (Γ^{- 1})}},

(57)

That is,

\tilde{W}

is also UUB. Through the expression of

\tilde{D}

Equation (43), combined with Assumptions A3–A5, we can derive the following relationship:

| | \tilde{D} | | \leq | | \tilde{W} | | Φ_{m a x} + ϵ_{m a x} + W_{m a x} L_{Φ} | | \tilde{x} | | .

(58)

Since

\tilde{W}

and

\tilde{x}

on the right side of Equation (58) are both UUB,

\tilde{D} (t)

is UUB.

In summary, we have strictly proved based on Lyapunov stability theory that the system state observation error

\tilde{x}

, RBF weight error

\tilde{W}

, and system lumped disturbance estimation error

\tilde{D}

are Uniformly Ultimately Bounded (UUB). This provides theoretical support for the stability of the proposed method and lays a solid theoretical foundation for subsequent simulation verification. □

Remark 4.

It is important to address the system behavior during the initial transient phase when the RBF estimator has not yet converged (i.e.,

∥ \tilde{D} ∥ \neq 0

). During this period, the estimation error

\tilde{D}

acts as a bounded uncertainty acting on the linearized system. The robustness of the proposed strategy in this phase is guaranteed by two factors. First, as proven in Theorem 1, the errors are Uniformly Ultimately Bounded (UUB) regardless of the initial state, ensuring that the system state does not diverge even before convergence. Second, the MPC framework explicitly incorporates image visibility constraints (Equation (17)) into the optimization problem. Unlike classical feedback linearization, which might generate aggressive control inputs based on an inaccurate model, the MPC solver seeks a feasible control sequence that strictly satisfies the feature constraints (

x_{m i n} \leq x \leq x_{m a x}

). Consequently, while tracking accuracy may be temporarily lower during the first few iterations of the transient phase, the system explicitly prevents the loss of visual features, ensuring task safety until the RBF estimator converges to the true disturbance.

5. Experiments

To accurately quantify the performance gains of the two core innovations proposed in this paper—RBF-EKF coupled state-disturbance estimation and task-oriented K-means clustering for center selection—this section presents four groups of controlled experiments designed under the “single-variable control” principle. The simulation environment was implemented in MATLAB (The MathWorks, Inc., Natick, MA, USA) R2021b on a computer equipped with an Intel Core i9-13900HX CPU and 64 GB of RAM. The experiments utilize a UR5 manipulator (Universal Robots, Odense, Denmark) (official D-H parameters; joint limits:

[- π, π]

; velocity limits:

[- 1.57 rad / s, 1.57 rad / s]

) to evaluate two key metrics: image feature tracking accuracy and lumped disturbance estimation precision.

The consistency of core parameters was strictly maintained to ensure a fair comparison: Visual Predictive Control parameters were set as

N_{p} = 10

,

Q_{c} = diag ([10] * 8)

,

R_{c} = diag ([1] * 6)

. The EKF used process noise covariance

Q_{e k f} = 10^{2} \times I

and measurement noise covariance

R_{e k f} = 10^{4} \times I

. The vision system utilized a pinhole model (

f_{x} = f_{y} = 800, c_{x} = 320, c_{y} = 240

) tracking an 8-dimensional feature vector. To simulate the measurement setup, Gaussian noise with a standard deviation of 0.1 was added to the image features. These noisy measurement data are directly processed by the EKF to generate filtered state estimates, which are then used by the RBF network for disturbance learning and by the MPC for trajectory planning. Lumped disturbances, including end-effector load variations and Jacobian modeling errors, were introduced as:

D (t) = 20 [\begin{matrix} sin (0.8 t) + 1 \\ cos (0.6 t) + 2 \\ 1.5 sin (1.2 t) \\ 0.25 cos (0.9 t) \\ ⋮ \end{matrix}] .

(59)

5.1. Evaluation of Coupled Estimation and RBF Center Selection Strategy

This subsection evaluates the performance of the proposed estimation mechanism. The experimental groups are defined as follows:

Group 1: Uncoupled Benchmark (Uncoupled-TOC). This group employs a serial “EKF-filtering followed by RBF-observation” structure. RBF centers (20 in total) are selected using the task-oriented K-means clustering proposed in this paper.
Group 2: Random Center Control (Coupled-Random). This group utilizes the RBF-EKF coupled estimation mechanism, but with 20 centers randomly distributed across the 8-dimensional image space ([0, 640] × [0, 480]).
Group 3: Global Clustering Control (Coupled-Global). This group uses the coupled mechanism with centers derived from global K-means clustering on a dataset of 2,000 points sampled from the entire image space, rather than the task trajectory.
Group 4: Proposed Method (Coupled-TOC). This group integrates both the RBF-EKF coupled mechanism and the task-oriented K-means clustering (using the same 20 centers as Group 1).

By comparing Group 1 and Group 4, the superiority of the coupled feedback estimation over the serial structure is verified. By comparing Groups 2, 3, and 4, the advantages of task-oriented RBF center selection—specifically in reducing computational redundancy and improving approximation accuracy—are independently quantified.

Through four groups of control experiments, the time-series data and statistical metrics for image feature tracking errors and disturbance observation errors were obtained. Accordingly, the image feature trajectories, disturbance observation error curves, and error bar charts (including variance) were generated.

Figure 3 illustrates the motion trajectories of the image features in the

u - v

pixel plane for the four experimental groups. The ‘dot’, ‘star’, and ‘cross’ markers denote the initial, desired, and final positions of the feature points, respectively, and the zoomed-in insets in the top-right corners highlight the terminal convergence details. The trajectory of the Coupled-TOC group (the core method of this paper) is noticeably smoother throughout the experiment: from the initial point to the desired point, the curve maintains a stable convergence trend without obvious fluctuations or oscillations. In contrast, the trajectories of the other three groups all show varying degrees of instability. This difference stems from the error correction effect of the coupled state-disturbance estimation mechanism. Group 4 corrects the EKF prediction deviation by feeding back the RBF disturbance estimate, effectively offsetting the coupled interference of noise and disturbance. The one-way structure of Group 1 (Uncoupled-TOC) cannot use the disturbance observation result to improve the state estimation, which makes it difficult to reject the disturbance and results in small-scale oscillations. Group 2 (Coupled-Random) exhibits severe trajectory oscillations and significant deviations in the control output because the randomly selected RBF centers cannot effectively approximate task-related disturbances. The trajectory of Group 3 (Coupled-Global) is relatively stable, but the redundancy of the global clustering centers increases the computational burden.

The smooth trajectory of the Coupled-TOC group validates that the task-oriented K-means clustering ensures the RBF network focuses on the feature space most relevant to the mission, thereby maximizing approximation accuracy with minimal centers. Combined with the RBF-EKF coupled mechanism, the system maintains robust Visual Predictive Control performance even under complex lumped disturbances, fulfilling the requirements for high-precision robotic tasks.
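As a rough illustration of why clustering along the nominal path yields better-placed centers than random selection, the following sketch runs plain Lloyd's K-means on samples of a hypothetical straight-line feature path and compares the coverage with uniformly random centers. The 2-D path, the 640x480 image size, and the helper names `kmeans` and `mean_nearest_distance` are assumptions for illustration; the paper's feature space is higher-dimensional and its exact clustering procedure is described in Section 3.3.

```python
import numpy as np

def kmeans(points: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns the (k, dim) array of cluster centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every sample to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:          # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def mean_nearest_distance(path: np.ndarray, centers: np.ndarray) -> float:
    """Average distance from each path sample to its closest center."""
    dists = np.linalg.norm(path[:, None, :] - centers[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

# Hypothetical nominal path of one feature point in a 640x480 image.
t = np.linspace(0.0, 1.0, 200)[:, None]
start, goal = np.array([100.0, 400.0]), np.array([320.0, 240.0])
nominal_path = (1.0 - t) * start + t * goal

rng = np.random.default_rng(1)
task_centers = kmeans(nominal_path, k=20)                            # task-oriented
random_centers = rng.uniform([0.0, 0.0], [640.0, 480.0], (20, 2))    # random baseline

d_task = mean_nearest_distance(nominal_path, task_centers)
d_random = mean_nearest_distance(nominal_path, random_centers)
print(f"task-oriented: {d_task:.1f} px, random: {d_random:.1f} px")
```

With the same budget of 20 centers, the task-oriented set sits much closer to the states the system actually visits, which is the property the paragraph above relies on.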

Figure 4a–h display the time-series tracking performance across all eight dimensions of the lumped disturbance vector $D(t)$. In each subplot, the blue solid line represents the ground truth (‘Real’) disturbance, which is characterized by sinusoidal fluctuations. By comparing the estimation curves of the different groups against this ground truth, the specific tracking capability of each method is revealed. Figure 4i is a bar chart of the mean and variance of disturbance observation errors, quantifying the accuracy and stability of disturbance estimation.

In terms of disturbance estimation capability: Group 1 (Uncoupled-TOC) exhibits a maximum peak error of 62.41 with violent fluctuations. This is attributed to the fact that the uncoupled EKF lacks a disturbance compensation term, leading to delayed responses, noisy RBF inputs, and significant state deviations, which collectively degrade estimation accuracy. The curve for Group 2 (Coupled-Random) remains nearly flat with high error levels throughout the experiment. This phenomenon occurs because the local approximation performance of the RBF network depends on the spatial alignment between the centers and the input data. Randomly selected centers fail to be effectively activated by the task-specific trajectory, leaving the network under-activated and incapable of capturing the time-varying characteristics of the lumped disturbances. Group 3 (Coupled-Global) shows a relatively stable curve, but its convergence lags 1–2 s behind Group 4 (Coupled-TOC). This delay is inherent to global clustering: the network integrates responses from a vast number of centers across the entire image space rather than focusing on the core feature regions relevant to the task. Consequently, the disturbance estimates are over-smoothed, making the network insensitive to rapid transients. Furthermore, the redundant centers increase computational complexity, where the weight update process is hampered by irrelevant features. In contrast, Group 4 (Coupled-TOC) demonstrates superior rapid response capability. By leveraging the RBF-EKF coupled mechanism, the EKF actively offsets disturbances in the state-space model in real-time. Simultaneously, the task-oriented K-means clustering ensures the centers are highly aligned with the actual trajectory, enabling the network to track time-varying disturbances with high fidelity and zero redundant oscillations.
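The under-activation argument for randomly placed centers can be made concrete with a Gaussian RBF vector. The sketch below assumes the common form $\phi_j(x) = \exp(-\|x - c_j\|^2 / (2\sigma^2))$; the width of 20 px and the center positions are hypothetical values, not parameters taken from the paper.

```python
import numpy as np

def rbf_activations(x: np.ndarray, centers: np.ndarray, width: float) -> np.ndarray:
    """Gaussian RBF vector: phi_j(x) = exp(-||x - c_j||^2 / (2 * width^2))."""
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

x = np.array([200.0, 300.0])   # hypothetical current feature position (pixels)
# Centers close to the current state, as task-oriented clustering would place them.
on_path = x + np.array([[5.0, 0.0], [0.0, -8.0], [10.0, 10.0]])
# Centers far from the visited region, as random selection can produce.
off_path = x + np.array([[150.0, 0.0], [0.0, 200.0], [-180.0, 120.0]])

phi_on = rbf_activations(x, on_path, width=20.0)
phi_off = rbf_activations(x, off_path, width=20.0)
print("near centers:", np.round(phi_on, 3))
print("far centers: ", phi_off)
```

Because the network output is a weighted sum of these activations, centers whose activation is numerically zero contribute nothing to the estimate, which is consistent with the nearly flat curve reported for Group 2.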

To quantitatively evaluate the performance, the statistical metrics regarding disturbance observation error (Mean and Variance) and the average execution time are summarized in Table 2.

As presented in Table 2, the proposed Group 4 (Coupled-TOC) achieves the lowest mean error (12.46) and variance (25.79) of the four strategies. Compared with the Uncoupled benchmark (Group 1, mean error 21.71), the proposed method reduces the mean estimation error by 42.6%, confirming the effectiveness of the coupled state-disturbance estimation mechanism. Regarding computational requirements, the average execution time of the proposed method is 2.87 ms. Although slightly higher than that of the Uncoupled strategy (2.46 ms) due to the feedback mechanism, it remains well within the system’s sampling period of 10 ms (100 Hz) and is far below the 8.08 ms required by the Coupled-Global strategy. This confirms that the proposed task-oriented K-means clustering effectively controls the network scale to balance high-precision estimation with the real-time performance required for engineering applicability.
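The relative figures behind this comparison can be recomputed directly from Table 2. The short script below only restates the tabulated values; the variable names and the notion of a "time budget" (execution time divided by the 10 ms sampling period) are introduced here for illustration.

```python
# Values copied from Table 2: (mean error, error variance, avg. time in ms).
table2 = {
    "Uncoupled-TOC": (21.71, 34.31, 2.46),
    "Coupled-Random": (98.49, 45.12, 2.86),
    "Coupled-Global": (18.41, 27.16, 8.08),
    "Coupled-TOC": (12.46, 25.79, 2.87),
}
SAMPLE_PERIOD_MS = 10.0  # 100 Hz control loop

proposed_mean = table2["Coupled-TOC"][0]
for name, (mean_err, _, time_ms) in table2.items():
    # Reduction in mean error achieved by the proposed method relative to this group.
    reduction = 100.0 * (mean_err - proposed_mean) / mean_err
    # Share of the sampling period consumed by one control cycle.
    budget = 100.0 * time_ms / SAMPLE_PERIOD_MS
    print(f"{name:15s} error reduction by proposed: {reduction:5.1f}%  "
          f"time budget used: {budget:4.1f}%")
```

The Coupled-Global strategy uses roughly four fifths of the sampling period per cycle, while the proposed method uses under one third.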

5.2. Comparative Analysis of Control Strategies

To validate the proposed control framework against established methods, this section presents a comparative study of four control strategies. The objective is to separate and quantify the contributions of the MPC constraints and the RBF disturbance compensation to the final tracking performance. The four experimental groups are defined as follows:

Group 1: Classical IBVS Control (IBVS). The standard image-based visual servoing controller using a proportional control law without physical constraints or disturbance compensation.
Group 2: Compensated IBVS Control (IBVS + Comp.). The classical IBVS controller augmented with the proposed RBF-EKF disturbance feedforward compensation.
Group 3: Standard Model Predictive Control (MPC). The model predictive controller that handles constraints but relies on the nominal model without the RBF-estimated disturbance compensation term ($\hat{D} = 0$).
Group 4: Proposed Method (MPC + Comp.). The complete framework proposed in this paper, integrating both the constrained MPC optimization and the RBF-EKF coupled disturbance compensation.
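For reference, the proportional law underlying Group 1 can be sketched with the standard point-feature interaction matrix. This is a generic textbook form, $v = -\lambda L^{+}(s - s^{*})$, written in normalized image coordinates and producing a camera twist; the paper works in pixel coordinates and maps to joint velocities through $J(q)$, so the gain, depths, and target geometry below are assumptions for illustration only.

```python
import numpy as np

def interaction_matrix(points: np.ndarray, depths: np.ndarray) -> np.ndarray:
    """Stack the 2x6 point-feature interaction matrices into a (2n, 6) matrix.

    points: (n, 2) normalized image coordinates (x, y); depths: (n,) depths Z.
    """
    rows = []
    for (x, y), z in zip(points, depths):
        rows.append([-1.0 / z, 0.0, x / z, x * y, -(1.0 + x * x), y])
        rows.append([0.0, -1.0 / z, y / z, 1.0 + y * y, -x * y, -x])
    return np.array(rows)

def ibvs_velocity(s: np.ndarray, s_des: np.ndarray, depths: np.ndarray,
                  gain: float = 0.5) -> np.ndarray:
    """Proportional IBVS law v = -gain * pinv(L) @ (s - s_des)."""
    error = (s - s_des).reshape(-1)
    L = interaction_matrix(s, depths)
    return -gain * np.linalg.pinv(L) @ error

# Hypothetical four-point target (normalized coordinates) at 1 m depth.
s_des = np.array([[-0.1, -0.1], [0.1, -0.1], [0.1, 0.1], [-0.1, 0.1]])
s = s_des + np.array([[0.05, 0.02], [0.04, 0.03], [0.05, 0.01], [0.06, 0.02]])
depths = np.ones(4)

v = ibvs_velocity(s, s_des, depths)
print("camera twist (vx, vy, vz, wx, wy, wz):", np.round(v, 4))
```

The law has a single fixed gain and no mechanism for constraints or disturbance rejection, which is the baseline behavior that the other three groups are measured against.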

To rigorously test robustness, a time-varying sinusoidal lumped disturbance (consistent with Equation (59)) was introduced to the system. Figure 5 illustrates the error convergence trajectories and the statistical performance metrics for the four groups.

As shown in Figure 5a, the Classical IBVS (blue line) exhibits the slowest convergence rate. More critically, due to the lack of constraint handling and disturbance rejection capability, it suffers from significant oscillations (visible around iteration 700) when the external disturbance intensifies, failing to achieve a stable steady state. The IBVS + Comp. strategy (orange line) improves convergence speed and stability by compensating for the disturbance, yet it is still limited by the fixed gain of the proportional controller.

The Standard MPC (yellow line) outperforms the IBVS groups in terms of convergence speed due to its receding horizon optimization. However, because it optimizes based on a nominal model that ignores the lumped disturbance D, model mismatch occurs, leading to a steady-state offset.

Finally, the Proposed Method (purple line) demonstrates the best performance. By explicitly incorporating the estimated disturbance $\hat{D}$ into the feedback linearization loop (Equation (14)), it recovers the linear system dynamics, allowing the MPC to generate control inputs that remain accurate even under heavy disturbances.
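The link between disturbance compensation and steady-state offset can be seen in a scalar stand-in for the error dynamics. The sketch below assumes $\dot{e} = u_0 + D - \hat{D}$ with a simple stabilizing virtual input $u_0 = -k e$ in place of the MPC, an assumed biased sinusoidal disturbance, and $\hat{D} = 0.9\,D$ as an imperfect estimate; none of these values come from the paper.

```python
import numpy as np

def simulate(compensate: bool, steps: int = 3000, T: float = 0.01,
             gain: float = 2.0) -> float:
    """Return the mean |error| over the final second for e_dot = u0 + D - D_hat."""
    e = 100.0                                    # assumed initial pixel error
    tail = []
    for k in range(steps):
        d_true = 20.0 + 10.0 * np.sin(2.0 * np.pi * 0.2 * k * T)  # assumed disturbance
        d_hat = 0.9 * d_true if compensate else 0.0   # imperfect estimate vs. none
        u0 = -gain * e                                # simple stabilizing virtual input
        e = e + T * (u0 + d_true - d_hat)             # forward-Euler step
        if k >= steps - 100:
            tail.append(abs(e))
    return float(np.mean(tail))

offset_nominal = simulate(compensate=False)
offset_compensated = simulate(compensate=True)
print(f"no compensation:   {offset_nominal:.2f} px")
print(f"with compensation: {offset_compensated:.2f} px")
```

Without compensation the error settles around $D/k$ rather than zero, mirroring the steady-state offset attributed to the nominal-model MPC; with compensation only the residual $D - \hat{D}$ drives the error.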

The quantitative results in Figure 5b further confirm this analysis. The proposed method achieves the lowest mean tracking error (171.18), representing a 46.6% reduction compared to Classical IBVS (321.03) and a 23.5% reduction compared to Standard MPC (223.96). These results empirically validate that the integration of predictive constraint handling and active disturbance compensation is essential for high-precision visual servoing.
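The reported reductions follow from the mean tracking errors by the usual relative-change formula, $100 \cdot (e_{\text{baseline}} - e_{\text{proposed}}) / e_{\text{baseline}}$. The check below uses only the three values quoted in the text; the function name is introduced here for illustration.

```python
# Mean tracking errors quoted in the text for Figure 5b.
mean_error = {"IBVS": 321.03, "MPC": 223.96, "MPC + Comp.": 171.18}

def reduction_percent(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

vs_ibvs = reduction_percent(mean_error["IBVS"], mean_error["MPC + Comp."])
vs_mpc = reduction_percent(mean_error["MPC"], mean_error["MPC + Comp."])
print(f"vs. Classical IBVS: {vs_ibvs:.2f}%")
print(f"vs. Standard MPC:   {vs_mpc:.2f}%")
```

The computed values agree with the reported 46.6% and 23.5% to within the last digit.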

6. Conclusions

To enhance the stability and accuracy of Visual Predictive Control for robotics under complex disturbances, this paper presents an integrated strategy featuring RBF-EKF coupled state-disturbance estimation and task-oriented K-means clustering for RBF center selection. Comparative experiments demonstrate that this synergistic approach significantly outperforms traditional methods, specifically reducing the mean disturbance observation error by 42.6% (from 21.71 to 12.46) and significantly lowering peak errors compared to serial structures. An in-depth analysis of these results reveals that the bidirectional coupled structure effectively eliminates the “response lag” observed in uncoupled methods by feeding the disturbance estimate back to correct the EKF’s state prediction. Furthermore, the task-oriented K-means clustering algorithm addresses the efficiency-accuracy trade-off by overcoming the “curse of dimensionality” in high-dimensional feature spaces. By concentrating computational resources strictly on the valid task path, the method achieves superior steady-state stability with a variance of 25.79 using only 20 compact centers, maximizing approximation accuracy while minimizing the redundancy typical of global or random clustering strategies.

Despite these contributions, the current study has limitations that must be acknowledged. The offline RBF center selection relies on the similarity between the actual trajectory and the nominal path; thus, if extreme external disturbances cause significant deviation, the pre-selected centers may fail to cover the new state space, potentially degrading estimation accuracy. Additionally, the validation is currently restricted to simulation environments, which may not fully capture real-world complexities such as variable lighting and hardware latency. Future research will focus on addressing these issues by introducing an online dynamic center update strategy to improve adaptability and validating the system on physical platforms. We also plan to investigate robust mechanisms to handle partial or full feature occlusion by leveraging the trajectory prediction capabilities inherent in the MPC framework.

Author Contributions

Conceptualization, P.J.; methodology, P.J. and H.W.; software, P.J. and H.W.; validation, H.W.; formal analysis, P.J. and H.W.; investigation, H.W.; resources, M.C.; data curation, H.W.; writing—original draft preparation, H.W.; writing—review and editing, P.J., M.C., and Y.H.; visualization, H.W.; supervision, P.J. and M.C.; project administration, M.C. and W.R.; funding acquisition, M.C. and Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Jinan Industrial Innovation Carrier Project under Project No. 202333001, the Youth Innovation Science and Technology Support Plan of Colleges in Shandong Province under Grant No. 2021KJ025, and the Pilot Projects for the Integration of Science, Education, and Industry under Grant 2023CGZH-02.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations and Nomenclature

The following abbreviations and nomenclature are used in this manuscript:

Abbreviations
IBVS	Image-Based Visual Servoing
MPC	Model Predictive Control
RBF	Radial Basis Function
EKF	Extended Kalman Filter
QP	Quadratic Programming
UUB	Uniformly Ultimately Bounded
PBVS	Position-Based Visual Servoing
DOF	Degrees of Freedom
D-H	Denavit-Hartenberg
NMPC	Nonlinear Model Predictive Control
2D	Two-Dimensional
3D	Three-Dimensional

Nomenclature
Symbol	Definition
System Variables
x	True state vector (image feature vector), $x \in R^{2 n_{1}}$
$\hat{x}$	Estimated state vector obtained by EKF
$\tilde{x}$	State estimation error, defined as $\tilde{x} = x - \hat{x}$
D	True lumped disturbance vector (including modeling errors and external disturbances)
$\hat{D}$	Estimated lumped disturbance vector generated by RBFNN
$\tilde{D}$	Disturbance estimation error, defined as $\tilde{D} = D - \hat{D}$
y	Measurement vector containing noise
$n_{s}$	Measurement noise vector
Kinematics and Control
s	Image feature vector in pixel coordinates
$q, \dot{q}$	Robot joint position and velocity vectors
$L (s)$	Image Jacobian matrix
$J (q)$	Robot velocity Jacobian matrix
$J_{total}$	Total Jacobian matrix of the IBVS system ($L (s) J (q)$)
$u_{0}$	Virtual control input for the linearized system
$N_{p}$	MPC prediction horizon
Estimation Parameters
$\Phi (\hat{x})$	Radial Basis Function (RBF) vector
$W^{*}, \hat{W}$	Ideal RBF weight matrix and its estimated value
$\tilde{W}$	Weight estimation error matrix ( $\tilde{W} = W^{*} - \hat{W}$ )
P	Positive definite matrix used in Lyapunov analysis


Figure 1. Block diagram of the proposed Visual Predictive Control framework with RBF-EKF coupled state-disturbance estimation.

Figure 2. Flowchart of RBF center selection based on task-oriented K-means clustering.

Figure 3. Trajectory of image feature tracking in four control groups. (a) Uncoupled-TOC. (b) Coupled-Random. (c) Coupled-Global. (d) Coupled-TOC (proposed method).

Figure 4. 8-dimensional temporal curves of disturbance observation and statistical analysis. (a–h) Disturbance observation curves for four control groups. (i) Bar chart of mean and variance of disturbance observation error.

Figure 5. Comparison of control performance under time-varying disturbances. (a) Time-history of average pixel error convergence. The zoomed-in view highlights the steady-state behavior under disturbance. (b) Statistical comparison of the mean tracking error (absolute error sum) for the four strategies.

Table 1. Standard D-H parameters of the UR5 manipulator.

Joint (i)	$θ_{i}$ (rad)	$d_{i}$ (m)	$a_{i}$ (m)	$α_{i}$ (rad)
1	$θ_{1}$	0.089459	0	$π / 2$
2	$θ_{2}$	0	−0.425	0
3	$θ_{3}$	0	−0.39225	0
4	$θ_{4}$	0.10915	0	$π / 2$
5	$θ_{5}$	0.09465	0	$- π / 2$
6	$θ_{6}$	0.0823	0	0

Table 2. Statistical comparison of disturbance observation errors and computational costs across four experimental groups.

Group	Method Strategy	Mean Error	Error Variance	Avg. Time (ms)
Group 1	Uncoupled-TOC	21.71	34.31	2.46
Group 2	Coupled-Random	98.49	45.12	2.86
Group 3	Coupled-Global	18.41	27.16	8.08
Group 4	Coupled-TOC (Proposed)	12.46	25.79	2.87

Note: The computational time represents the average execution time per control cycle.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

