Article

Adaptive Cooperative Control of Dual-Arm Robots Using RBF-ADP with Event-Triggering Mechanism

School of Mechanical Engineering, Northwestern Polytechnical University, Xi’an 710072, China
Symmetry 2026, 18(3), 437; https://doi.org/10.3390/sym18030437
Submission received: 31 January 2026 / Revised: 21 February 2026 / Accepted: 28 February 2026 / Published: 3 March 2026
(This article belongs to the Special Issue Symmetry/Asymmetry and Autonomous Robotics)

Abstract

High-precision cooperative control of dual-arm manipulators faces significant challenges arising from complex dynamic coupling, parametric uncertainties, and external disturbances. Furthermore, in networked control scenarios, communication bandwidth and computational resources are inevitably constrained. To address these issues, this paper proposes a novel composite control framework that integrates adaptive dynamic programming (ADP) with active disturbance rejection control (ADRC) under a static event-triggering mechanism (SETM). First, to handle model uncertainties and external perturbations, a smooth nonlinear extended state observer (ESO) based on continuous fractional-power functions is developed. This observer guarantees finite-time convergence of the disturbance estimation without inducing the high-frequency chattering inherent in conventional sliding-mode observers. Second, leveraging the disturbance-compensated dynamics, a radial basis function (RBF) neural network-based ADP controller is designed to learn the optimal control policy online, thereby minimizing a quadratic performance index without requiring accurate model knowledge. Third, to improve resource utilization, a static event-triggering strategy is introduced to schedule control updates based on the system state and tracking error. Extensive simulation studies on a 3-DoF dual-arm system demonstrate that the proposed scheme achieves superior trajectory tracking accuracy and disturbance robustness while significantly reducing the communication frequency compared to time-triggered approaches.

1. Introduction

With the rapid development of robotic applications, dual-arm collaborative robot systems have been increasingly deployed in complex assembly, hazardous-material handling, and telemedicine, among other application scenarios [1,2]. Compared with single manipulators, dual-arm systems exhibit enhanced operational flexibility and payload capability; however, their closed-chain kinematic constraints and dynamic coupling characteristics impose severe challenges on control system design. In practical applications, manipulators often suffer from parametric uncertainty (e.g., friction and payload variations) and external disturbances in unstructured environments, rendering traditional model-based control methods that rely on accurate dynamics insufficient to guarantee high-precision trajectory-tracking performance [3,4].
To address model uncertainty, adaptive control and neural-network (NN)-based approximation control have been extensively investigated [5,6]. Nevertheless, such approaches typically emphasize stability and error convergence while often overlooking performance optimization during control execution (e.g., minimizing control energy). In recent years, data-driven control methods represented by reinforcement learning (RL) and adaptive dynamic programming (ADP) have attracted substantial attention because they can approximate optimal control policies via online learning without requiring an accurate model [7,8]. For instance, Nohooji et al. [9] incorporated an actor–critic framework into PID parameter tuning and significantly improved robot adaptability. Despite their theoretical optimality, purely data-driven methods commonly encounter slow convergence in early training and sensitivity to large-magnitude disturbances, and their direct application to high-precision robotic control still raises safety concerns [10].
The core idea of active disturbance rejection control (ADRC) is to employ an extended state observer (ESO) to estimate and compensate the “total disturbance” (including internal uncertainties and external perturbations) in real time, thereby transforming a nonlinear plant into a standard cascaded-integrator model [11,12]. This “model normalization” property substantially reduces the learning difficulty for RL/ADP algorithms. Zheng et al. [13] and Kong et al. [14] attempted to combine RL with ADRC and verified the advantages of this composite architecture in disturbance suppression and tracking-accuracy improvement. However, conventional ESOs often employ nonsmooth sign functions to pursue fast convergence, which may induce high-frequency chattering in the control signal [15]. To overcome this issue, finite-time control theory based on the non-singular terminal sliding mode (NTSM) and continuous fractional-power functions provides a promising alternative [16,17,18], enabling finite-time convergence while effectively suppressing chattering [19,20].
On the other hand, in modern networked robotic control systems, communication bandwidth and computational resources are typically constrained. Conventional time-triggered control updates the control input at every sampling instant, which incurs substantial redundant transmissions and computational overhead as the system approaches steady state [21]. To improve resource utilization, event-triggered mechanisms (ETMs) have been introduced into control system design [22]. Unlike fixed-period sampling, ETMs update the control input only when the system state or error violates a prescribed condition.
Motivated by the above, for dual-arm cooperative control in the presence of model uncertainty, external disturbances, and limited resources, this paper proposes a composite control scheme integrating RBF-ADP, a smooth nonlinear ESO, and a static event-triggering mechanism. The main contributions are summarized as follows:
1. An online learning control framework that integrates RBF-ADP with a smooth ESO is proposed. By compensating the total disturbance in real time using the ESO, the complex manipulator dynamics are simplified into a form that is amenable to ADP learning, while achieving both optimality and strong robustness.
2. A smooth nonlinear ESO based on continuous fractional-power functions is developed. The observer not only guarantees finite-time accurate disturbance estimation but also fundamentally eliminates the chattering issue inherent to conventional approaches, making it more suitable for high-precision servo control.
3. A static event-triggering mechanism is introduced between the controller and the actuator. By evaluating the triggering condition online against the real-time system state and tracking error, the proposed mechanism significantly reduces the update frequency and communication burden of control commands while maintaining trajectory-tracking accuracy.
The remainder of this paper is organized as follows. Section 2 presents the designed observer and controller and details the adopted event-triggering implementation. Section 3 reports the simulation experiments, where the proposed algorithm is validated on a 3-DoF simulation case study. Section 4 concludes this paper and proposes the future research scope.

2. Method

This section considers the cooperative control problem of dual-arm 3-DoF manipulators and presents a control framework that integrates radial basis function (RBF) adaptive dynamic programming (ADP), a smooth nonlinear extended state observer, and an event-triggered mechanism. The main idea is as follows: the master arm generates the desired joint trajectory, and the slave arm, in the presence of uncertainties and external disturbances, tracks the master joint motion via an online-learning RBF–ADP control law coordinated with a disturbance observer; meanwhile, an event-triggered mechanism is employed to reduce the number of control-command updates, thereby saving communication and computational resources as shown in Figure 1.

2.1. Master–Slave Cooperative Structure and Error Definition

Consider a pair of structurally identical 3-DoF manipulators, referred to as the master arm and the slave arm, respectively.
In this configuration, the master arm acts as a trajectory generator and the slave arm as a tracking follower, which reduces the master–slave setup to a rigorous error-tracking problem whose sole objective is trajectory synchronization. Based on offline planning or a higher-level planner, the master arm provides a desired joint trajectory
$q_m^{\mathrm{ref}}(k) \in \mathbb{R}^3, \quad k = 0, 1, \ldots, N,$
and the actual joint angles of the slave arm are denoted by
$q_s(k) \in \mathbb{R}^3$.
The master–slave joint-space synchronization error is defined as
$e(k) = q_m^{\mathrm{ref}}(k) - q_s(k)$.
The control objective is to design a control law $u(k) \in \mathbb{R}^3$ such that $e(k)$ remains bounded throughout the task and converges toward zero in the presence of model uncertainty and external disturbances. Meanwhile, to reduce communication/computation cost, an event-triggered mechanism is employed so that the control command is updated only when necessary.
Within this framework, the slave arm can be regarded as an unknown nonlinear discrete-time system, whose internal dynamics and disturbance structure need not be explicitly modeled in the controller; instead, compensation and approximation are achieved via online-learning RBF–ADP and an extended state observer.

2.2. Structure of the RBF–ADP Controller

To approximate the unknown optimal control policy and value function, an RBF neural network is introduced with the joint-space error e ( k ) as the input. Define the error vector
$e(k) = [\, e_1(k) \;\ e_2(k) \;\ e_3(k) \,]^{\top} \in \mathbb{R}^3,$
and construct $M$ radial basis functions $\{\phi_j(\cdot)\}_{j=1}^{M}$ with centers $c_j \in \mathbb{R}^3$ and widths $\sigma_j > 0$. The $j$-th Gaussian radial basis function is defined as
$\phi_j(e) = \exp\!\left(-\dfrac{\|e - c_j\|_2^2}{2\sigma_j^2}\right), \quad j = 1, \ldots, M.$
Stacking all basis functions yields the feature vector
$\Phi(e) = [\, \phi_1(e) \;\ \phi_2(e) \;\ \cdots \;\ \phi_M(e) \,]^{\top} \in \mathbb{R}^M.$
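For concreteness, the Gaussian feature construction above can be sketched in a few lines of Python; the function name `rbf_features` and the sample centers and widths are illustrative choices, not values from the paper.

```python
import numpy as np

def rbf_features(e, centers, widths):
    """Gaussian RBF feature vector Phi(e) for a joint-space error e in R^3.

    centers: (M, 3) array of centers c_j; widths: (M,) array of sigma_j > 0.
    Returns Phi(e) in R^M with phi_j(e) = exp(-||e - c_j||^2 / (2 sigma_j^2)).
    """
    e = np.asarray(e, dtype=float)
    sq_dist = np.sum((centers - e) ** 2, axis=1)   # ||e - c_j||_2^2 for each j
    return np.exp(-sq_dist / (2.0 * widths ** 2))

# At a basis-function center, the corresponding feature equals 1.
centers = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]])
widths = np.array([0.5, 0.5])
phi = rbf_features([0.0, 0.0, 0.0], centers, widths)   # phi[0] == 1.0
```

The vectorized form evaluates all $M$ basis functions in one pass, which is how such feature vectors are typically computed online.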
The RBF–ADP controller consists of an actor network and a critic network. The actor network is used to approximate the optimal control law u * ( e ) , whose output is
$u_{\mathrm{nom}}(k) = W_a^{\top} \Phi(e(k)) \in \mathbb{R}^3,$
where $W_a \in \mathbb{R}^{M \times 3}$ is the actor weight matrix. The critic network is used to approximate the error-related value function $V(e)$, whose output is
$V(e(k)) = W_c^{\top} \Phi(e(k)) \in \mathbb{R},$
where $W_c \in \mathbb{R}^{M}$ is the critic weight vector. With this structure, the actor and critic share the same set of RBF basis functions while using different output-layer weights.
In implementation, the centers { c j } are selected as representative points within the expected error operating region, and the widths { σ j } can be set as constants approximately covering the error region or as functions of the inter-center spacing. Specific numerical parameters will be provided in the experimental section.
To characterize the tradeoff between tracking error and control energy, a quadratic instantaneous cost function is introduced:
$U(e(k), u_{\mathrm{nom}}(k)) = e(k)^{\top} Q\, e(k) + u_{\mathrm{nom}}(k)^{\top} R\, u_{\mathrm{nom}}(k),$
where $Q \in \mathbb{R}^{3 \times 3}$ and $R \in \mathbb{R}^{3 \times 3}$ are symmetric positive definite weighting matrices that adjust the relative importance of the error and the control input. The discount factor $\gamma \in (0, 1)$ ensures the attenuation of future costs.
Define the infinite-horizon discounted performance index as
$J(e(0)) = \sum_{k=0}^{\infty} \gamma^{k}\, U(e(k), u_{\mathrm{nom}}(k)).$
The RBF critic network $V(e) = W_c^{\top} \Phi(e)$ is used to approximate the optimal value function $V^{*}(e)$, which satisfies
$V^{*}(e(0)) \approx J^{*}(e(0)) = \inf_{u(\cdot)} J(e(0)).$
Between discrete times k and k + 1 , define the temporal-difference (TD) error as
$\delta(k) = U(e(k), u_{\mathrm{nom}}(k)) + \gamma\, V(e(k+1)) - V(e(k)).$
The TD error quantifies the mismatch between the value-function estimate produced by the current critic network and the quantity “instantaneous cost + discounted next-step value estimate.” To minimize the squared TD error, the critic weights are updated via gradient descent:
$W_c(k+1) = W_c(k) + \alpha_c\, \delta(k)\, \Phi(e(k)),$
where α c > 0 is the critic learning rate. This update law depends only on the current error e ( k ) , the next-step error e ( k + 1 ) , and the current control input u nom ( k ) , without requiring explicit model information.
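A minimal Python sketch of this semi-gradient TD update, assuming the cost matrices and learning rate are supplied by the caller (the function name `critic_td_update` and the test values are illustrative):

```python
import numpy as np

def critic_td_update(W_c, phi_k, phi_k1, e_k, u_k, Q, R, gamma, alpha_c):
    """One semi-gradient TD step for the critic weights W_c in R^M.

    U = e^T Q e + u^T R u is the instantaneous cost; V = W_c^T Phi.
    Returns the updated weights and the TD error delta(k).
    """
    U = e_k @ Q @ e_k + u_k @ R @ u_k
    delta = U + gamma * (W_c @ phi_k1) - (W_c @ phi_k)
    W_c_new = W_c + alpha_c * delta * phi_k
    return W_c_new, delta

# Example step from zero initial weights: delta reduces to U = e^T Q e = 1.
W_c, delta = critic_td_update(np.zeros(2), np.array([1.0, 0.0]),
                              np.array([0.0, 1.0]), np.array([1.0, 0.0, 0.0]),
                              np.zeros(3), np.eye(3), np.eye(3), 0.9, 0.1)
```

As the text notes, the update touches only $e(k)$, $e(k+1)$, and $u_{\mathrm{nom}}(k)$; no model information enters the step.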
To exploit the critic information for actor updating, the gradient of the value function with respect to the error is required. From the RBF structure,
$V(e) = \sum_{j=1}^{M} W_{c,j}\, \phi_j(e),$
where $W_{c,j}$ is the $j$-th element of $W_c$. Taking the partial derivative with respect to $e$ yields
$\dfrac{\partial V(e)}{\partial e} = \sum_{j=1}^{M} W_{c,j}\, \dfrac{\partial \phi_j(e)}{\partial e},$
and the derivative of a Gaussian RBF is
$\dfrac{\partial \phi_j(e)}{\partial e} = \phi_j(e)\, \dfrac{c_j - e}{\sigma_j^2}.$
Therefore, the value-function gradient vector is
$\nabla_e V(e) = \sum_{j=1}^{M} W_{c,j}\, \phi_j(e)\, \dfrac{c_j - e}{\sigma_j^2} \in \mathbb{R}^3.$
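The closed-form gradient above translates directly to vectorized code; the following Python sketch (illustrative naming, single-center example) reuses the Gaussian features:

```python
import numpy as np

def value_gradient(e, W_c, centers, widths):
    """grad_e V(e) = sum_j W_{c,j} * phi_j(e) * (c_j - e) / sigma_j^2, in R^3."""
    e = np.asarray(e, dtype=float)
    diff = centers - e                                    # rows: c_j - e
    phi = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * widths ** 2))
    return (W_c * phi / widths ** 2) @ diff

# With a single basis function centered at the origin and a positive weight,
# the gradient vanishes at the center and points toward it elsewhere.
c = np.array([[0.0, 0.0, 0.0]])
g0 = value_gradient([0.0, 0.0, 0.0], np.array([1.0]), c, np.array([0.5]))
g1 = value_gradient([0.1, 0.0, 0.0], np.array([1.0]), c, np.array([0.5]))
```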
Assume the slave-arm error evolution can be described by an unknown nonlinear mapping
$e(k+1) = f(e(k), u(k)),$
and in ADP implementation, local linearization or a constant Jacobian matrix can be used to approximate the sensitivity of e ( k + 1 ) with respect to the control input u ( k ) . Let
$J_u = \dfrac{\partial e(k+1)}{\partial u(k)}$ (a constant matrix).
For the velocity-resolved kinematic model adopted here, the error propagates through the discrete-time integration step as $e(k+1) = q_m^{\mathrm{ref}}(k+1) - \bigl(q_s(k) + (u(k) + d(k))\, T_s\bigr)$, so the sensitivity matrix evaluates exactly to $J_u = -T_s I_{3 \times 3}$. This provides exact gradient guidance for the policy optimization without contradicting the assumption that the broader unmodeled dynamics remain nonlinear. Then, by the chain rule, the gradient of the value function with respect to the control input is approximated as
$\nabla_{u(k)} V(e(k+1)) \approx J_u^{\top}\, \nabla_e V(e(k+1)).$
Define the one-step performance index
$J_k = U(e(k), u_{\mathrm{nom}}(k)) + \gamma\, V(e(k+1)),$
whose gradient with respect to u nom ( k ) can be written as
$\nabla_{u(k)} J_k = 2 R\, u_{\mathrm{nom}}(k) + \gamma\, \nabla_{u(k)} V(e(k+1)).$
Combining the above expressions yields
$\nabla_{u(k)} J_k \approx 2 R\, u_{\mathrm{nom}}(k) + \gamma\, J_u^{\top}\, \nabla_e V(e(k+1)).$
Since the actor output satisfies
$u_{\mathrm{nom}}(k) = W_a^{\top} \Phi(e(k)),$
we obtain
$\dfrac{\partial u_{\mathrm{nom}}(k)}{\partial W_a} = \Phi(e(k)),$
and thus an approximation of the gradient of the performance index with respect to the actor weights is
$\nabla_{W_a} J_k = \Phi(e(k))\, \bigl(\nabla_{u(k)} J_k\bigr)^{\top}.$
Based on the policy-gradient idea, the actor weights can be updated via gradient descent:
$W_a(k+1) = W_a(k) - \alpha_a\, \Phi(e(k))\, \bigl(\nabla_{u(k)} J_k\bigr)^{\top},$
where α a > 0 is the actor learning rate.
Moreover, to improve convergence speed and robustness in the early learning stage, a linear reference control law can be introduced:
$u_{\mathrm{tar}}(k) = K\, e(k),$
where K R 3 × 3 is a proportional gain matrix. This reference law reflects the design experience of conventional PD/proportional control. By constructing the “policy error”
$e_u(k) = u_{\mathrm{tar}}(k) - u_{\mathrm{nom}}(k),$
a supervised-learning-like corrective update can be incorporated as
$\Delta W_a^{\mathrm{sup}}(k) = \alpha_a\, \Phi(e(k))\, e_u(k)^{\top}.$
By combining the policy-gradient term and the reference-policy guidance term, the final actor update law is given by
$W_a(k+1) = W_a(k) - \alpha_a\, \Phi(e(k))\, \bigl(\nabla_{u(k)} J_k\bigr)^{\top} + \alpha_a\, \Phi(e(k))\, e_u(k)^{\top}.$
Here, the first term adjusts the policy along an optimal direction using the value-function gradient information provided by the critic, whereas the second term embeds conventional proportional-control experience into the policy space to accelerate convergence and enhance stability.
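The combined actor step can be sketched as follows. The helper name `actor_update` and all numeric values are illustrative; `J_u` stands for whichever constant input-sensitivity matrix is adopted (its sign and scale depend on the model), and it plays no role in the test case below because the value gradient there is zero.

```python
import numpy as np

def actor_update(W_a, phi_k, grad_e_V_next, u_nom, u_tar, R, J_u, gamma, alpha_a):
    """One actor step: policy-gradient term plus supervised (reference-policy) term.

    grad_u J_k ~= 2 R u_nom + gamma * J_u^T grad_e V(e(k+1));
    W_a <- W_a - alpha_a * phi_k grad_u_Jk^T + alpha_a * phi_k e_u^T.
    """
    grad_u_Jk = 2.0 * R @ u_nom + gamma * (J_u.T @ grad_e_V_next)
    e_u = u_tar - u_nom              # policy error toward the reference law K e(k)
    return (W_a - alpha_a * np.outer(phi_k, grad_u_Jk)
                + alpha_a * np.outer(phi_k, e_u))

# Example step: both terms pull the first weight row in the same direction.
W_new = actor_update(np.zeros((2, 3)), np.array([1.0, 0.0]),
                     np.zeros(3), np.array([1.0, 0.0, 0.0]),
                     np.zeros(3), np.eye(3), -0.01 * np.eye(3), 0.9, 0.1)
```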
It must be noted that the implementation of RBF-ADP controllers inherently encounters computational bottlenecks when utilizing kernel methods, particularly as the number of basis functions M increases to cover high-dimensional state spaces. Operations involving the RBF feature vector scale in computational complexity, potentially degrading online learning efficiency and numerical stability. Recent advancements in randomized numerical linear algebra, specifically the utilization of truncated randomized singular value decomposition (SVD) for mesh-free kernel methods [23], provide scalable solutions to construct low-rank approximations of kernel systems. This approach significantly reduces memory and processing overhead without substantial loss of accuracy, positioning SVD-based kernel truncation as a critical enhancement for the future high-dimensional scaling of the proposed RBF-ADP framework.

2.3. Smooth Nonlinear Extended State Observer

To suppress the influence of external disturbances and model uncertainties, a nonlinear extended state observer (ESO) is introduced to estimate the total disturbance acting on the slave arm online. For each joint, two-dimensional observer states are introduced:
$z_1(k) \in \mathbb{R}^3, \quad z_2(k) \in \mathbb{R}^3,$
where z 1 is used to approximate the slave joint angles and z 2 is used to approximate the lumped disturbance. The observer inputs are the control command u ( k ) and the joint-angle measurement q s ( k ) , and the disturbance estimate is given by
$\hat{d}(k) = z_2(k).$
Define the observation error as
$e_{\mathrm{obs}}(k) = z_1(k) - q_s(k).$
To avoid the chattering induced by the conventional sign function when the error approaches zero, a smooth sign function is adopted:
$\mathrm{sgn}_{\epsilon}(e) = \dfrac{e}{|e| + \epsilon},$
where ϵ > 0 is a small smoothing factor and the operation is element-wise. Based on e obs , construct nonlinear power-function terms:
$\sigma_{\alpha}(e_{\mathrm{obs}}) = |e_{\mathrm{obs}}|^{\alpha} \odot \mathrm{sgn}_{\epsilon}(e_{\mathrm{obs}}),$
$\sigma_{\beta}(e_{\mathrm{obs}}) = |e_{\mathrm{obs}}|^{\beta} \odot \mathrm{sgn}_{\epsilon}(e_{\mathrm{obs}}),$
$\sigma_{2\alpha-1}(e_{\mathrm{obs}}) = |e_{\mathrm{obs}}|^{2\alpha-1} \odot \mathrm{sgn}_{\epsilon}(e_{\mathrm{obs}}),$
$\sigma_{2\beta-1}(e_{\mathrm{obs}}) = |e_{\mathrm{obs}}|^{2\beta-1} \odot \mathrm{sgn}_{\epsilon}(e_{\mathrm{obs}}),$
where $\alpha \in (0, 1)$ and $\beta > 1$ are fractional-power exponents, and $\odot$ denotes element-wise multiplication.
In continuous time, the smooth nonlinear ESO can be written as
$\dot{z}_1(t) = u(t) + z_2(t) - l_1\, \sigma_{\alpha}(e_{\mathrm{obs}}(t)) - k_1\, \sigma_{\beta}(e_{\mathrm{obs}}(t)),$
$\dot{z}_2(t) = -l_2\, \sigma_{2\alpha-1}(e_{\mathrm{obs}}(t)) - k_2\, \sigma_{2\beta-1}(e_{\mathrm{obs}}(t)),$
where l 1 , l 2 , k 1 , k 2 > 0 are observer gains designed according to the desired convergence speed and noise robustness. This structure combines gradient-like terms and fractional-power terms: when the observation error is large, the higher-order power terms accelerate convergence; when the error is small, the fractional-power terms together with the smooth sign function suppress chattering while maintaining fast convergence.
Unlike conventional sliding-mode observers, which rely on discontinuous signum functions and thereby provoke high-frequency chattering, the continuous fractional-power functions defined above guarantee smooth transitions near the origin. This structural continuity avoids exciting unmodeled high-frequency dynamics in the physical actuation layer, eliminating the primary source of operational chattering.
In practice, the observer is discretized at a fixed sampling period, and the above differential equations are numerically integrated using the Euler method so that z 1 ( k ) , z 2 ( k ) are updated at each sampling instant. The discrete form and parameter values will be provided in the experimental section.
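As a sketch, one Euler-discretized update of the smooth ESO might look as follows. The gains, exponents, and smoothing factor below are placeholder values, not the paper's settings, and the sign conventions follow the standard error-correcting observer structure.

```python
import numpy as np

def smooth_sign(e, eps=1e-3):
    """Element-wise smooth sign: e / (|e| + eps)."""
    return e / (np.abs(e) + eps)

def eso_step(z1, z2, u, q_s, Ts, l1, l2, k1, k2, alpha=0.7, beta=1.5, eps=1e-3):
    """One Euler step of the smooth nonlinear ESO (all vectors in R^3).

    z1 tracks the joint angles, z2 the lumped disturbance; d_hat(k) = z2(k).
    """
    e_obs = z1 - q_s
    s = smooth_sign(e_obs, eps)
    sig = lambda p: np.abs(e_obs) ** p * s          # |e_obs|^p (element-wise) sgn_eps
    z1_dot = u + z2 - l1 * sig(alpha) - k1 * sig(beta)
    z2_dot = -l2 * sig(2 * alpha - 1) - k2 * sig(2 * beta - 1)
    return z1 + Ts * z1_dot, z2 + Ts * z2_dot

# With zero observation error, only the control input propagates into z1.
z1, z2 = eso_step(np.zeros(3), np.zeros(3), np.ones(3), np.zeros(3),
                  0.01, 1.0, 1.0, 1.0, 1.0)
```

Note that $2\alpha - 1 > 0$ must hold (here $\alpha = 0.7$ gives exponent $0.4$) so the fractional powers stay well defined at the origin.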

2.4. Event-Triggered Mechanism

The integration of ADRC and ADP targets a specific operational gap: pure ADP policies may fail to converge under strong, time-varying unmatched disturbances, while standalone ADRC imposes no energy-optimization constraint over long horizons. Combining the two paradigms yields optimized control under intense perturbation. Concurrently, the event-triggering strategy in this study adopts a fixed static threshold. This static event-triggering mechanism (SETM) offers a direct, computationally efficient way to balance control performance against communication frequency, addressing the bandwidth bottlenecks intrinsic to networked multi-agent configurations.
To reduce the update frequency of control commands, an event-triggered mechanism (ETM) is introduced. During sampling intervals without triggering, the slave-arm actuator holds the previous control command via a zero-order hold.
In addition, the zero-order hold of the static event-triggered mechanism filters out high-frequency control updates, acting as a secondary, active chattering-mitigation measure at the actuator execution level.
Let
$u_{\mathrm{calc}}(k) = u_{\mathrm{nom}}(k) - \hat{d}(k)$
denote the instantaneous control input after disturbance compensation, and let u last ( k ) be the most recently transmitted control command applied to the actuator. Define the triggering error as
$r(k) = \| u_{\mathrm{calc}}(k) - u_{\mathrm{last}}(k) \|_2 .$
Given a constant threshold σ > 0 , the static event-triggering law is defined as
$\chi(k) = \begin{cases} 1, & \text{if } k = 0 \ \text{or}\ r(k) > \sigma, \\ 0, & \text{otherwise}, \end{cases}$
where χ ( k ) = 1 indicates that a control update is triggered at time k, and χ ( k ) = 0 indicates no triggering. Accordingly, the actual control command applied by the actuator is updated as
$u(k) = \begin{cases} u_{\mathrm{calc}}(k), & \chi(k) = 1, \\ u(k-1), & \chi(k) = 0, \end{cases}$
and at triggering instants,
$u_{\mathrm{last}}(k+1) = u(k).$
To evaluate the communication-saving effect of the event-triggered mechanism, the triggering rate can be defined as
$\rho = \dfrac{1}{N} \sum_{k=0}^{N-1} \chi(k),$
where N is the total number of sampling steps. The quantity ρ reflects the fraction of sampling instants at which control updates actually occur during the entire task; a smaller ρ indicates more bandwidth savings while maintaining control performance, as shown in Figure 2.
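The triggering law and the triggering-rate statistic can be prototyped directly; the function names and the threshold value below are illustrative:

```python
import numpy as np

def event_trigger(u_calc, u_last, k, sigma):
    """Static law: update iff k == 0 or ||u_calc - u_last||_2 > sigma."""
    return k == 0 or np.linalg.norm(u_calc - u_last) > sigma

def apply_trigger(u_calc_seq, sigma):
    """Actuator-side logic: zero-order hold between events, count triggers."""
    u_last, trig_count, applied = None, 0, []
    for k, u_calc in enumerate(u_calc_seq):
        if event_trigger(u_calc, u_last, k, sigma):
            u_last = u_calc
            trig_count += 1
        applied.append(u_last)              # hold the last transmitted command
    rho = trig_count / len(u_calc_seq)      # triggering rate over N steps
    return applied, rho

# Three steps, threshold sigma = 0.02: triggers at k = 0 and k = 2.
seq = [np.zeros(3), np.zeros(3), np.array([0.05, 0.0, 0.0])]
applied, rho = apply_trigger(seq, 0.02)
```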

3. Experiments and Results

This section validates the proposed cooperative control method on a given dual-arm 3-DoF manipulator simulation platform. First, the system structure and simulation settings are introduced. Then, the specific parameters related to the neural networks, observer, and event triggering are provided, followed by analysis of the simulation results.

3.1. Simulation Platform and System Setup

The experiments are implemented in MATLAB (R2025b) and its Robotics Toolbox for 3D geometric modeling and visualization of the master and slave manipulators. The two manipulators share the same link parameters and joint types, described by standard DH parameters, and all joints are revolute. The master base pose is set at the origin of the reference coordinate frame, while the slave base is translated by a certain distance along the X-axis to visualize the master–slave cooperative trajectories within the same workspace, as shown in Figure 3.
To evaluate the tracking performance of the proposed method under complex spatial motion, the master end-effector executes a 3D spiral trajectory in task space. The trajectory exhibits circular motion in the horizontal plane while ascending vertically at a constant speed, and its analytical expression can be written as
$x_d(t) = x_c + R \cos(\omega t),$
$y_d(t) = y_c + R \sin(\omega t),$
$z_d(t) = z_c + k_z\, t,$
where R is the circle radius, ω is the angular velocity, k z is the vertical ascent rate, and ( x c , y c , z c ) denotes the center of the spiral. Using inverse kinematics, the above end-effector trajectory is discretized under a given sampling period to obtain the master joint reference trajectory q m ref ( k ) . The slave arm is initialized with a certain deviation relative to the master reference trajectory to emulate initial alignment error. The initial condition can be expressed as
$q_s(0) = q_m^{\mathrm{ref}}(0) + \Delta q_0,$
where $\Delta q_0 \in \mathbb{R}^3$ is a prescribed initial offset vector.
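Sampling the spiral reference above is straightforward; the numeric center, radius, angular velocity, and ascent rate in this sketch are placeholder values, not the paper's settings.

```python
import numpy as np

def spiral_reference(t, center=(0.6, 0.0, 0.2), R=0.15, omega=0.5, k_z=0.02):
    """Task-space spiral: circular motion in XY, constant-rate ascent in Z.

    t may be a scalar or an array of sample times; returns points of shape (..., 3).
    """
    xc, yc, zc = center
    t = np.asarray(t, dtype=float)
    return np.stack([xc + R * np.cos(omega * t),
                     yc + R * np.sin(omega * t),
                     zc + k_z * t], axis=-1)

# Discretize over 10 s at a sampling period of Ts = 0.01 s.
traj = spiral_reference(np.arange(0.0, 10.0, 0.01))
```

In the paper's pipeline, each sampled task-space point would then pass through inverse kinematics to yield the master joint reference $q_m^{\mathrm{ref}}(k)$.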
To ensure reproducibility and rigorously validate the control synthesis, the physical and algorithmic boundary conditions are explicitly defined. The mechanical structure of the manipulators is described by standard Denavit–Hartenberg (DH) parameters: link offset $d_1 = 0.5$ m, link lengths $a_2 = 0.8$ m and $a_3 = 0.6$ m, and twist angle $\alpha_1 = \pi/2$ rad. The motion of the tracking robot is governed by a velocity-resolved kinematic integrator model with a fixed sampling period of $T_s = 0.01$ s. This model represents standard industrial manipulators with closed high-gain internal velocity loops, where the outer-loop position derivative responds directly to the kinematic command interface, isolating the controller evaluation from lower-level current dynamics. The multi-frequency external disturbances have amplitudes $A_1 = 0.5$, $A_2 = 0.3$, and $A_3 = 0.4$ rad/s at distinct frequencies. This profile deliberately exposes the system simultaneously to alternating harmonic behavior in the X–Y plane and a monotonic tracking requirement along the Z-axis, preventing purely periodic overfitting. For the RBF neural network, $M = 25$ hidden nodes are used; the basis-function centers $c_j$ are uniformly distributed across the expected error operating region $[-0.5, 0.5]$ rad, and the widths are held constant at $\sigma_j = 0.5$. The event-triggering mechanism enforces a static threshold of $\sigma = 0.02$, establishing an upper bound on the control-update latency equal to the discrete sampling period.

3.2. Simulation Results and Analysis

3.2.1. Periodic Paths Validation

Figure 4 compares the 3D task-space end-effector trajectories of the master and slave arms. It can be seen that the master end-effector completes the task along the prescribed 3D spiral trajectory, and under the proposed control method, the slave end-effector trajectory closely follows that of the master.
At the initial stage, due to the deviation between the slave initial joint configuration and the master reference trajectory, an obvious discrepancy in end-effector positions is observed. With the online weight adaptation of the ADP controller and the disturbance compensation provided by the ESO, the trajectory mismatch gradually decreases. Eventually, the slave end-effector trajectory almost overlaps the master trajectory, indicating that the proposed scheme achieves satisfactory task-space cooperative tracking under disturbances and uncertainties.
Figure 5 depicts the time evolution of the joint tracking errors for the three slave joints. It can be observed that the error magnitude is relatively large at the beginning due to the combined effects of the initial offset and disturbances; as the RBF–ADP policy is learned online and the ESO continuously compensates disturbances, the errors of all three joints decay rapidly and remain within a small range with mild fluctuations during the middle and late stages of the task.
From the shapes of the error curves, the errors enter a steady regime after a brief transient, with small fluctuation amplitudes, indicating that the control law maintains stable closed-loop performance in the presence of disturbances. If desired, quantitative indices such as the root-mean-square (RMS) error or maximum absolute error can be further computed from the recorded simulation data for a more precise performance evaluation.
To validate the effectiveness of the smooth nonlinear ESO, Figure 6 compares the true disturbances and the observer estimates for the three joints. It can be seen that throughout the task, the disturbance estimates d ^ i ( t ) produced by the ESO closely follow the variation trends of the true disturbances d i ( t ) , with small amplitude and phase errors.
In intervals where the disturbance varies rapidly, a certain lag of the estimated signal relative to the true disturbance is observed, which is mainly attributed to the phase delay induced by fixed observer gains and discrete implementation. Nevertheless, the observer estimates are sufficient to compensate for most of the disturbance, enabling the ADP controller to optimize its policy over an approximately “disturbance-free” equivalent plant, thereby improving overall tracking performance and robustness.
Figure 7 shows the time histories of the master and slave end-effector positions along the X, Y, and Z axes. It can be observed that the slave trajectories track the master trajectories well in all three directions, and particularly in the steady state, the curves almost overlap.
Along the ascending Z direction, since the trajectory is monotonic, the disturbance influence is more readily estimated and compensated for by the ESO, resulting in a smoother tracking error. In the XY plane, since the trajectory is periodic, the slave must frequently switch the control input between forward and reverse directions; during this process, the RBF–ADP controller adapts its policy online and maintains the error within a small range.
Figure 8a presents a staircase plot of the event-triggering indicator χ ( k ) over time, and Figure 8b illustrates the relationship between the joint-1 tracking error and the triggering instants. It can be observed that triggering events mainly occur at the initial stage of the task and at time instants when the error varies rapidly; during phases where the error is small and varies slowly, only a small number of triggering events occur.
The measured triggering rate ρ is far below 100 % , indicating that only a small fraction of sampling instants require control-command updates during the entire simulation, whereas at the remaining instants the actuator can simply hold the previous control value. Compared with the fully updated (time-triggered) case, the event-triggered mechanism significantly reduces communication and computational burden with almost no sacrifice in tracking performance, demonstrating its practical value in networked and resource-constrained control scenarios.

3.2.2. Non-Periodic Paths Validation

Figure 9 compares the 3D task-space end-effector trajectories of the master and slave arms under non-periodic paths.
Figure 10 depicts the time evolution of the joint tracking errors for the three slave joints under non-periodic paths. It can be observed that the error magnitude is relatively large at the beginning due to the combined effects of the initial offset and disturbances. As the RBF–ADP policy is learned online to adapt to the non-periodic reference signal and the ESO continuously compensates disturbances, the errors of all three joints decay rapidly and remain within a small range during the middle and late stages of the task.
To validate the effectiveness of the smooth nonlinear ESO under non-periodic paths, Figure 11 compares the true disturbances and the observer estimates for the three joints. It can be seen that throughout the task, the disturbance estimates produced by the ESO closely follow the variation trends of the true disturbances, ensuring that the non-periodic motion is not compromised by external uncertainties.
Figure 12 shows the time histories of the master and slave end-effector positions along the X, Y, and Z axes. It can be observed that the slave trajectories track the master trajectories well in all three directions, accurately capturing the unique characteristics of each axis.
Along the X direction, the system successfully tracks the asymptotic growth and subsequent linear drift. In the Y direction, the slave arm precisely follows the damped oscillatory transient without overshooting during the amplitude decay. In the Z direction, the logarithmic climb is handled smoothly. These results demonstrate that the RBF–ADP controller adapts its policy online to maintain high precision across diverse motion profiles.
Overall, the simulation results show that the proposed RBF–ADP controller can achieve accurate tracking of the master reference trajectory by the slave arm in the presence of an unknown plant and external disturbances, with small errors in both joint space and task space. The smooth nonlinear ESO can effectively estimate the lumped disturbance; when its output is used to compensate the control input, the system robustness is substantially improved and a simpler equivalent plant is provided for the ADP controller. Meanwhile, the event-triggered mechanism markedly reduces the number of control-command updates while maintaining tracking performance, with a triggering rate far below the 100 % of time-triggered control, thereby achieving efficient resource utilization.

4. Conclusions

For the cooperative control problem of dual-arm manipulators subject to model uncertainty, time-varying external disturbances, and limited communication resources, this paper proposed a composite control framework integrating reinforcement learning and active disturbance rejection control. The scheme combines radial-basis-function-based adaptive dynamic programming (RBF-ADP), a smooth nonlinear extended state observer (ESO), and a static event-triggered mechanism (SETM). Theoretical analysis and numerical simulations demonstrate that the designed smooth nonlinear ESO provides fast, accurate, and chatter-free estimation of the "total disturbance" encompassing unmodeled dynamics and external perturbations, thereby reducing the dependence of conventional model-based methods on accurate dynamic parameters. On this basis, the RBF-ADP controller achieves high-precision asymptotic tracking of the master reference trajectory by the slave arm via online learning. Moreover, the static event-triggered mechanism balances control performance against communication load: while ensuring closed-loop stability, it significantly reduces the demands on communication bandwidth and computational resources, indicating strong potential for networked robotic applications. Future work will focus on experimental validation on a physical dual-arm platform, Sim-to-Real transfer based on meta-learning or transfer learning, and the design of attack-resilient event-triggered secure control strategies grounded in networked security control theory to further enhance system reliability.
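For readers less familiar with ADP, the online-learning component summarized above can be pictured as an RBF critic whose weights are adapted from a temporal-difference-style error. The sketch below shows a generic TD(0) update for a value approximator V(x) ≈ Wᵀφ(x) with Gaussian features; it is not the paper's exact HJB-based critic law, and the centers, width, discount factor, and learning rate are illustrative assumptions.

```python
import numpy as np

def rbf_features(x, centers, width):
    # Gaussian radial basis functions evaluated at state x.
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))

def critic_update(W, x, x_next, r, centers, width, gamma=0.95, lr=0.05):
    """One temporal-difference update of an RBF critic V(x) = W @ phi(x).
    Generic TD(0) sketch, not the paper's exact HJB-based adaptation law."""
    phi = rbf_features(x, centers, width)
    phi_next = rbf_features(x_next, centers, width)
    td = r + gamma * (W @ phi_next) - W @ phi   # temporal-difference error
    W = W + lr * td * phi                        # gradient-style correction
    return W, td
```

Repeating this update along closed-loop trajectories drives the TD error toward zero, which is the mechanism by which the critic, and hence the learned policy, improves without requiring an explicit dynamic model.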

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

Figure 1. Overall block diagram of the control system.
Figure 2. Illustration of the event-triggered mechanism.
Figure 3. Configuration of the master–slave 3-DoF manipulators.
Figure 4. Comparison of the 3D spiral end-effector trajectories of the master and slave arms (master: solid; slave: dashed).
Figure 5. Joint-space tracking error curves for the three joints.
Figure 6. Comparison of the true disturbances and the ESO-estimated disturbances for the three joints.
Figure 7. Comparison of the X/Y/Z position components of the master and slave end-effector trajectories.
Figure 8. (a) Staircase plot of the event-trigger indicator sequence; (b) joint-1 error curve with triggering instants marked.
Figure 9. Comparison of the 3D non-periodic end-effector trajectories of the master and slave arms under non-periodic paths.
Figure 10. Joint-space tracking error curves for the three joints under non-periodic paths.
Figure 11. Comparison of the true disturbances and the ESO-estimated disturbances for the three joints under non-periodic paths.
Figure 12. Comparison of the position components showing exponential approach, damped oscillation, and logarithmic growth under non-periodic paths.

Share and Cite

MDPI and ACS Style

Dai, Y. Adaptive Cooperative Control of Dual-Arm Robots Using RBF-ADP with Event-Triggering Mechanism. Symmetry 2026, 18, 437. https://doi.org/10.3390/sym18030437

