1. Introduction
In recent years, robotic technologies have been increasingly applied in industrial production, particularly in domains such as welding and assembly [1,2]. However, the level of automation remains constrained when facing a wide range of unstructured environments and flexible manufacturing tasks [3]. To address this issue, the concept of human–robot collaboration (HRC) has been proposed, aiming to integrate human decision-making capabilities with the high payload capacity of robots, thereby demonstrating substantial application potential [4].
A key challenge in the development of HRC technologies lies in the lack of efficient human–robot interaction interfaces, which hampers accurate transmission of human motion intent. Currently, physical interaction, primarily based on force sensors, remains the dominant mode of HRC [5]. However, since force sensors acquire signals based on contact-induced strain, it is often difficult to distinguish between forces originating from the human operator and those resulting from interactions with the external environment [6,7]. This significantly limits the accuracy of capturing actual human-applied forces.
With the continuous integration of interdisciplinary research, scholars have increasingly recognized the strong correlation between EMG signals, commonly used in the medical field to assess human motion, and muscle force output [8]. Early arm force estimation methods primarily relied on muscle dynamics models, such as the Hill-type model and its variants [9]. However, constructing these models requires precise acquisition of individual-specific anatomical parameters, including muscle and skeletal structures. As the dimensionality of force estimation increases, the complexity and computational burden of these models rise sharply, presenting significant challenges for practical deployment [10].
With the advancement of computational technologies, researchers have increasingly turned to machine learning algorithms to directly model EMG signals, enabling force estimation without depending on musculoskeletal structural parameters. Several EMG-to-force mapping approaches have been proposed, including Artificial Neural Networks (ANNs) [11], Long Short-Term Memory networks (LSTMs) [12], Principal Component Interaction (PCI) modeling [13], and Convolutional Neural Networks (CNNs) [14].
However, current machine learning models often emphasize unidirectional force estimation, which limits their effectiveness in complex industrial applications [15,16]. To enhance model generalizability, Su et al. proposed decomposing arm force output into three orthogonal directions in Cartesian space and modeling each direction independently using a Deep Convolutional Neural Network (DCNN) [17]. Building on this idea, Zhang et al. introduced a Bayesian approach that first identifies the dominant force direction and then applies parallel LSTM networks for direction-specific force estimation [18]. While these methods improve multidimensional force prediction, they overlook a key physiological reality: in real-world tasks, muscle groups typically operate in a coordinated, multi-directional fashion, resulting in complex interdependencies and mutual interference. As a result, modeling each direction separately may oversimplify the intricate dynamics of muscle coordination and limit the model’s ability to capture the underlying biomechanical interactions.
As demand grows for multidimensional dynamic arm force estimation, researchers have increasingly recognized the importance of arm posture in modeling. During human movement, joint angles reflect physiological factors such as muscle length variations and contraction types. These not only affect the amplitude dynamics of EMG signals but also modulate inter-muscle coordination, thereby significantly influencing the generalization ability of force estimation models [19,20]. To address this, Mobasser et al. proposed a Fast Orthogonal Search (FOS)-based elbow joint force estimation model that incorporates both EMG and joint angle information. The method demonstrated its effectiveness under isometric, isobaric, and low-load conditions [21].
Although incorporating joint angles improves model accuracy, increasing movement dimensionality and complexity introduces greater challenges for training and fitting. To address this, Xie et al. proposed a method that takes both joint angles and EMG signals as inputs and constructs a Deep Long Short-Term Memory (Deep-LSTM) network for dynamic arm force estimation [22]. This approach is considered one of the most effective techniques for improving dynamic force estimation accuracy. However, current methods lack explicit modeling of how posture affects muscle synergy. Instead, these approaches rely on the model to implicitly learn the complex nonlinear coupling between EMG features and joint configurations from data, which undoubtedly increases the learning burden and may limit generalization under multi-task conditions. Therefore, achieving accurate multidimensional dynamic arm force estimation remains a substantial challenge.
Beyond perception of human intent, another core issue in HRC lies in the design of effective collaboration control strategies [23]. To enhance the responsiveness and compliance of robotic systems during collaboration, mainstream approaches commonly utilize impedance or admittance control frameworks, which dynamically adjust control parameters to adapt to interaction forces [24]. For instance, Bednarczyk et al. proposed a variable impedance control method that integrates human-applied forces into the control model to optimize robot interaction performance [25]. Yao introduced an adaptive admittance control method using an admittance filter, which adjusts parameters based on estimated joint torques, thereby improving the applicability of rehabilitation robots in assisted motion [26].
Meanwhile, data-driven strategies have gradually been integrated into variable impedance frameworks due to advancements in machine learning, significantly enhancing the adaptability of control models. For example, impedance parameter optimization via iterative learning has been successfully applied in industrial tasks such as polishing and assembly [27,28]. Yang and Anand introduced reinforcement learning (RL) into controller design, enabling adaptive adjustment of impedance parameters through policy optimization, thus substantially improving compliant control performance [29,30]. However, despite notable success in improving flexibility and precision, these methods still struggle to meet real-world demands in highly dynamic and uncertain HRC environments, due to limitations of static models and single-strategy optimization [31].
In recent years, imitation learning has emerged as a promising approach for real-time HRC. By learning from human demonstrations, robots can participate in collaborative tasks more naturally [32,33]. Liao et al. restructured the Dynamic Movement Primitives (DMP) formulation using Riemannian metrics, incorporating position, orientation, and stiffness information to enable the transfer and reproduction of humanoid multi-space skills [34]. Additionally, traditional machine learning techniques such as Gaussian Mixture Models, Random Forests, and Hidden Markov Models have laid solid foundations for imitation learning [35,36]. However, since traditional imitation learning typically relies on predefined trajectories and is more suited for offline behavior cloning, it faces limitations in processing high-dimensional data and handling dynamic real-time interactions, which restricts its effectiveness in collaborative scenarios [37].
Notably, reinforcement learning has demonstrated outstanding performance even in the absence of precise dynamic models, making it well-suited for complex and frequently changing collaborative environments [38]. For example, Zhang proposed using RL to directly extract force-related information from EMG signals, enabling robot motion control without relying on traditional control models, and successfully validated this approach in a human–robot collaborative sawing task [39]. The organic integration of imitation learning and reinforcement learning is expected to become a promising direction for novel skill acquisition and the exploitation of advanced collaborative control models. Currently, imitation reinforcement learning has been predominantly applied in domains such as autonomous driving and robotic manipulation [32,40]. However, research specifically targeting human skill acquisition and the enhancement of human–robot collaborative control through this paradigm remains relatively limited.
To address the aforementioned challenges, an EMG-driven dynamic arm force estimation model and an imitation reinforcement learning-based collaborative control framework are proposed to enhance performance in human–robot collaborative assembly. The overall research framework is illustrated in Figure 1.
First, joint angle information and EMG signals collected from the human body are processed to extract features for arm force estimation. To account for the influence of joint angle variations on force estimation, a Temporal Graph Neural Network incorporating an Angle-Guided Attention mechanism (AGA-TGNN) is introduced to capture changes in inter-muscle synergies and estimate three-dimensional (3D) arm force. The estimated human arm force is then used to extract key indicators of human skill from demonstration data. Specifically, this skill refers to the ability to adjust speed in response to changes in external collaborative forces. The estimated arm force also serves as an interface for human–robot interaction.
Based on the skill indicators extracted from human demonstrations, an expert reward function is constructed. A performance-optimized control model is then trained using an imitation reinforcement learning algorithm, enhanced with a fuzzy rule-based experience replay partitioning strategy. The proposed control framework adopts a human-in-the-loop design, converting human interaction forces directly into robot motion commands. By leveraging human perceptual and decision-making abilities, the robot is guided to collaborate effectively with the operator. In cases of trajectory deviations, the operator can intuitively correct the robot’s motion by adjusting the applied force. This approach avoids rigid trajectory replication, enabling more flexible and autonomous human–robot interaction.
2. Construction of an EMG Signals-Based Dynamic Arm Force Estimation Model
To meet the requirements for interaction force smoothness and accuracy in human–robot collaborative control, this section focuses on the acquisition and processing of arm motion signals and the construction of the arm force estimation model.
2.1. Acquisition and Processing of Arm Motion Signals
2.1.1. Methods for Acquiring Arm Motion Signals
Figure 2 presents the human arm motion signal acquisition platform, which supplies training data for the EMG-based dynamic force estimation model. A goniometer is positioned on the outer side of the elbow to record one-dimensional elbow joint angle data. An IMU, attached to the inner side of the upper arm, captures the shoulder’s three-dimensional Euler angles. Six EMG electrodes are placed on key muscle groups, including the biceps/triceps brachii, anterior/posterior deltoids, pectoralis major, and infraspinatus. Arm-generated forces are simultaneously recorded using a six-axis force sensor.
During data collection, a human operator manually guides the robot by applying forces with varying intensities and directions, enabling free movement. At the same time, EMG intensity, joint angles, and force measurements are recorded. The operator is oriented toward the YZ-plane of the robot’s base frame, and the applied forces are decomposed into three orthogonal components (X, Y, and Z) relative to the robot’s coordinate system.
2.1.2. Feature Extraction from EMG Signals
It is necessary to extract features that correlate with the arm force from these time-varying signals. Given that the amplitude of the EMG signal is strongly correlated with arm force, a moving root mean square (RMS) filter is applied to the raw EMG signal to extract amplitude features. The processing is described by the following equation:
where $E_{\mathrm{raw}}[m]$ denotes the $m$-th denoised EMG signal and $E_{\mathrm{RMS}}[i]$ represents the $i$-th signal after processing by the RMS filter with a sliding window of size $N$.
As shown in Figure 3, even after RMS filtering, the feature signal still exhibits considerable fluctuations. Therefore, a simple discrete low-pass filter is applied to smooth the extracted features. The smoothing process can be described as follows:
where $E_{\mathrm{LPF}}[n]$ denotes the $n$-th output of the low-pass filter and $\xi$ is the filter’s difference weight.
According to Ref. [41], the sliding window size $N$ and the filter’s difference weight $\xi$ are set to 64 and 0.00628, respectively, which provides a good balance between signal smoothness and responsiveness.
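The two-stage feature extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the RMS window is taken to be causal and shortened during start-up, and the low-pass filter is assumed to be the first-order recursion $E_{\mathrm{LPF}}[n] = E_{\mathrm{LPF}}[n-1] + \xi\,(E_{\mathrm{RMS}}[n] - E_{\mathrm{LPF}}[n-1])$, which is one common reading of a "difference weight". The function names `moving_rms`, `low_pass`, and `emg_features` are hypothetical.

```python
import numpy as np


def moving_rms(e_raw, window=64):
    """Causal moving RMS over the last `window` samples.

    Assumption: for the first window-1 samples the window is shortened to
    the samples available so far (the source does not specify start-up).
    """
    e_raw = np.asarray(e_raw, dtype=float)
    # Prefix sums of squares let every window be evaluated in O(1).
    csum = np.concatenate(([0.0], np.cumsum(e_raw ** 2)))
    end = np.arange(1, e_raw.size + 1)      # exclusive end index of each window
    start = np.maximum(end - window, 0)     # shorter window during start-up
    mean_sq = (csum[end] - csum[start]) / (end - start)
    return np.sqrt(np.maximum(mean_sq, 0.0))  # clip tiny negative round-off


def low_pass(e_rms, xi=0.00628):
    """Assumed first-order smoother: y[n] = y[n-1] + xi * (x[n] - y[n-1])."""
    e_rms = np.asarray(e_rms, dtype=float)
    out = np.empty_like(e_rms)
    if e_rms.size == 0:
        return out
    out[0] = e_rms[0]                       # initialise with the first sample
    for n in range(1, e_rms.size):
        out[n] = out[n - 1] + xi * (e_rms[n] - out[n - 1])
    return out


def emg_features(e_raw, window=64, xi=0.00628):
    """RMS amplitude extraction followed by low-pass smoothing."""
    return low_pass(moving_rms(e_raw, window), xi)
```

With $\xi = 0.00628$ the smoother reacts slowly by design, so a larger $\xi$ trades smoothness for responsiveness.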
2.2. A 3D Arm Force Estimation Model Based on an Angle-Guided Attention Temporal Graph Neural Network (AGA-TGNN)
In EMG-based arm force estimation, muscle activation patterns are significantly influenced by joint angles. The contribution of individual muscles varies with joint angle; for instance, the biceps brachii exerts more force at smaller flexion angles, whereas during extension its contribution diminishes and the triceps brachii takes a dominant role. Moreover, muscle coordination patterns change with joint angle variations; alterations in shoulder angles, for example, can modify the synergistic behavior between the pectoralis major and the anterior deltoid. These angle-dependent dynamics make it challenging to accurately estimate arm force using traditional methods.
Existing approaches such as CNNs are effective at extracting local spatial features from EMG signals but fall short in capturing the global inter-muscle cooperative relationships. Although LSTM networks are well-suited for temporal modeling, they lack the capability to capture the spatial dependencies among muscles, making them less adaptable to the variations in muscle coordination under different joint angles. Therefore, a model that can simultaneously capture the topological structure of muscle groups and their temporal variations is needed to more accurately model muscle coordination patterns under varying joint angles.
Graph Neural Networks (GNNs) provide an effective approach to modeling the topological structure of EMG signals by employing an adjacency matrix to represent inter-muscle relationships. However, conventional GNNs typically rely on a fixed adjacency matrix, which fails to capture the dynamic changes in muscle coordination induced by variations in joint angles. To overcome this limitation, an Angle-Guided Attention mechanism is introduced, enabling the adaptive adjustment of the adjacency matrix based on joint angle information. This enhancement significantly improves the model’s generalization capability across diverse motion conditions.
To further strengthen the temporal dependency modeling of EMG signals, an LSTM is integrated following the GNN, forming a Temporal Graph Neural Network (TGNN). In this architecture, the GNN extracts spatial features related to muscle topology, while the LSTM models the time dependencies of the EMG signals. The combination of GNN and LSTM enables the model to capture both the spatial cooperation among muscles and the temporal patterns, thereby improving the accuracy of arm force estimation. Based on these considerations, the AGA-TGNN is proposed.
(a) Motivation of EMG Signals Modulated by Joint Angles
Human arm output force arises from the coordinated activation of multiple muscle groups, governed by both muscular dynamics and skeletal biomechanics. According to physiological coordination dynamics, muscle-generated tension is first transformed into joint torques through a moment matrix $R(\theta)$ and then mapped to the arm output force via the Jacobian matrix $J(\theta)$. The dynamic formulation of the estimated arm force $F_{\mathrm{estm}}$ is given by the following:
where $F_{\mathrm{msc}}$ is the muscle force, and both $R(\theta)$ and $J(\theta)$ are functions of the joint angles $\theta$.
This formulation illustrates that the resulting arm output force is influenced not only by muscle activity (as measured by EMG) but also by the arm’s current posture. In this paper, the muscle force $F_{\mathrm{msc}}$ is modeled using a neural network, leading to an updated force estimation expression:
Given the complexity of directly measuring muscle forces and modeling the analytical forms of $R(\theta)$ and $J(\theta)$, joint angle information is incorporated as a prior into the neural network to guide the learning process. This allows the model to learn the topological coordination pattern among muscle groups. Accordingly, the dynamic force estimation model consists of two submodules as follows:
where $E_{\mathrm{adj}}$ denotes the EMG amplitudes modulated by joint angles; $f_{\mathrm{NN1}}$ represents a Temporal Graph Neural Network (TGNN) that estimates 3D arm force based on EMG signals; and $f_{\mathrm{NN2}}$ is the Angle-Guided Attention (AGA) module that learns a posture-dependent adjacency matrix to guide the TGNN. By integrating joint angle information as a prior, this mechanism enables the network to dynamically capture inter-muscle coordination patterns under varying postures.
(b) Angle-Guided Attention Module
Traditional attention mechanisms, such as the self-attention mechanism in Transformers, adjust weights by computing the “importance” or “relevance” between elements. Inspired by this principle, the Angle-Guided Attention mechanism is developed to dynamically capture changing muscle coordination patterns. Specifically, a three-layer multilayer perceptron (MLP) is used to transform the joint angle information $\theta[t]$ into an adjacency matrix $A'[t]$:
where $\theta[t]$ is a four-dimensional vector comprising the joint angle of the elbow and the three Euler angles of the shoulder; $W_1$, $W_2$, and $W_3$ are the weight matrices of the MLP; and $b_1$, $b_2$, and $b_3$ are the bias vectors of the MLP.
Since the adjacency matrix must satisfy a row-normalization constraint (i.e., the sum of the connection probabilities for each node equals 1), $A'[t]$ is normalized using the softmax function to obtain the final adjacency matrix $A[t]$:
where $A_{i,j}[t]$ represents the connection weight between EMG channels $i$ and $j$ at time $t$. In this study, six muscles were utilized for arm force estimation. Accordingly, to represent the interactions between all pairs of muscle groups, the adjacency matrix was designed with a dimension of 6 × 6.
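The mapping from joint angles to a row-normalized adjacency matrix can be sketched as below. This is an illustrative sketch only: the hidden width (16), the ReLU activation, and the random initialization are assumptions that the text does not specify, and `init_aga_params` and `angle_guided_adjacency` are hypothetical names.

```python
import numpy as np


def init_aga_params(rng, n_angles=4, hidden=16, n_muscles=6):
    """Random weights for a three-layer MLP (layer sizes are assumptions)."""
    sizes = [(hidden, n_angles), (hidden, hidden), (n_muscles * n_muscles, hidden)]
    params = []
    for out_dim, in_dim in sizes:
        params.append(rng.normal(0.0, 0.1, size=(out_dim, in_dim)))  # W_k
        params.append(np.zeros(out_dim))                              # b_k
    return params


def angle_guided_adjacency(theta, params, n_muscles=6):
    """Map joint angles theta[t] to a row-stochastic adjacency matrix A[t]."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.maximum(W1 @ theta + b1, 0.0)   # ReLU is an assumed activation
    h2 = np.maximum(W2 @ h1 + b2, 0.0)
    a_raw = (W3 @ h2 + b3).reshape(n_muscles, n_muscles)  # A'[t]
    # Row-wise softmax; subtracting the row maximum avoids overflow.
    a_raw = a_raw - a_raw.max(axis=1, keepdims=True)
    weights = np.exp(a_raw)
    return weights / weights.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
params = init_aga_params(rng)
# Illustrative posture: elbow angle followed by three shoulder Euler angles.
theta = np.deg2rad([90.0, 10.0, -5.0, 30.0])
A = angle_guided_adjacency(theta, params)
```

Because the softmax is applied per row, each muscle's outgoing connection weights are positive and sum to one regardless of posture.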
(c) Temporal Graph Neural Network Model
The GNN employs a graph convolution operation based on the Angle-Guided Attention-derived adjacency matrix to extract spatial dependencies among EMG signals. The resulting spatial features are then fed into an LSTM network for sequential modeling. The LSTM’s hidden state is passed through a fully connected (FC) layer to obtain the final force estimation. This process is illustrated in the following network structure:
where $H[t]$ represents the GNN input consisting of $d_1$ time-series $E_{\mathrm{LPF}}$ from six muscle groups (with $d_1 = 6$ in this paper); $W_g$ is the weight matrix of the GNN; $G[t]$ is the GNN’s feature matrix at time $t$; $h[t]$ denotes the LSTM’s hidden state at time $t$; and $F_{\mathrm{pred}}[t]$ represents the estimated arm force at time $t$.
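The GNN, LSTM, and FC pipeline described above can be sketched in plain NumPy as follows. This is a hedged sketch rather than the paper's network: it assumes the graph convolution takes the form $G[t] = \mathrm{ReLU}(A[t]\,H[t]\,W_g)$, that each of the six nodes carries a feature vector of length `d_in`, and that a single standard LSTM cell is used. All layer sizes and function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def graph_conv(A, H, Wg):
    """Assumed graph convolution: mix node features via A, then project."""
    return np.maximum(A @ H @ Wg, 0.0)


def lstm_step(x, h, c, W, U, b):
    """One standard LSTM cell step; gates stacked as [input, forget, cell, output]."""
    n = h.size
    z = W @ x + U @ h + b
    i = sigmoid(z[:n])
    f = sigmoid(z[n:2 * n])
    g = np.tanh(z[2 * n:3 * n])
    o = sigmoid(z[3 * n:])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c


def tgnn_forward(A_seq, H_seq, p):
    """Run GNN -> LSTM -> FC over a sequence and return one 3D force per step."""
    hidden = p["U"].shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    forces = []
    for A, H in zip(A_seq, H_seq):
        G = graph_conv(A, H, p["Wg"])                 # spatial features G[t]
        h, c = lstm_step(G.ravel(), h, c, p["W"], p["U"], p["b"])
        forces.append(p["Wfc"] @ h + p["bfc"])        # FC layer -> F_pred[t]
    return np.stack(forces)


def init_tgnn_params(rng, n_muscles=6, d_in=1, d_out=4, hidden=8):
    s = 0.1
    return {
        "Wg": rng.normal(0, s, (d_in, d_out)),
        "W": rng.normal(0, s, (4 * hidden, n_muscles * d_out)),
        "U": rng.normal(0, s, (4 * hidden, hidden)),
        "b": np.zeros(4 * hidden),
        "Wfc": rng.normal(0, s, (3, hidden)),
        "bfc": np.zeros(3),
    }


rng = np.random.default_rng(0)
T = 20
p = init_tgnn_params(rng)
H_seq = rng.random((T, 6, 1))             # placeholder smoothed EMG features
A_seq = np.full((T, 6, 6), 1.0 / 6.0)     # placeholder uniform adjacency
F_pred = tgnn_forward(A_seq, H_seq, p)    # shape (T, 3)
```

In practice `A_seq` would come from the Angle-Guided Attention module rather than a uniform placeholder, and the weights would be trained rather than random.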
(d) Loss Function and Regularization
To minimize the estimation error, the mean squared error (MSE) between the true arm force $F_{\mathrm{true}}[t]$ and the estimated arm force $F_{\mathrm{pred}}[t]$ is used as the loss function:
where $N_L$ is the number of samples.
Additionally, to prevent overfitting and to ensure that the adjacency matrix $A[t]$ does not undergo extreme variations, a regularization term based on the Frobenius norm is added:
where $\lambda$ is the regularization coefficient. The regularization parameter $\lambda$ is selected from a commonly used range between $1 \times 10^{-5}$ and $1 \times 10^{-3}$. In this paper, preliminary experimental results show that $\lambda = 1 \times 10^{-4}$ achieves the best balance between training accuracy and generalization performance, yielding the lowest validation error and minimal overfitting.
The overall loss function thus combines the MSE and the regularization term, guiding the model to learn a more accurate and stable arm force estimation.
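The combined objective can be sketched as below. Two details are assumptions, since the exact expressions are not reproduced here: the MSE is taken as the squared Euclidean norm of the 3D force error averaged over the $N_L$ samples, and the regularizer as $\lambda$ times the squared Frobenius norm of $A[t]$ averaged over time steps. The name `total_loss` is hypothetical.

```python
import numpy as np


def total_loss(f_true, f_pred, A_seq, lam=1e-4):
    """Return (total, mse, reg) for force arrays (N_L, 3) and adjacency (N_L, 6, 6)."""
    f_true = np.asarray(f_true, dtype=float)
    f_pred = np.asarray(f_pred, dtype=float)
    A_seq = np.asarray(A_seq, dtype=float)
    # Assumed MSE: mean over samples of the squared 3D error norm.
    mse = np.mean(np.sum((f_true - f_pred) ** 2, axis=1))
    # Assumed regulariser: lambda times the mean squared Frobenius norm of A[t].
    reg = lam * np.mean(np.sum(A_seq ** 2, axis=(1, 2)))
    return mse + reg, mse, reg
```

Since every row of $A[t]$ sums to one, its squared Frobenius norm lies between 1 (uniform rows) and 6 (one-hot rows) for a 6 × 6 matrix, so this penalty discourages sharply peaked connection patterns.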
4. Experimental Analysis and Discussion
4.1. Experimental Setup
Figure 8 illustrates the schematic of the human–robot collaborative assembly experimental platform. The platform is built using a GSK 8 kg industrial robot (Guangzhou GSK Technology Co., Ltd., Guangzhou, China). EMG signals and elbow joint angles are collected using sensors and goniometers from Biometrics Ltd. (Cwmfelinfach, UK), while the shoulder joint Euler angles are acquired through an inertial measurement unit (IMU) from Hipnuc Electronics Technology Co., Ltd. (Beijing, China). A six-axis force/torque sensor from Me-Systeme GmbH (Stuttgart, Germany) is mounted at the robot’s end-effector; however, it is not involved in the control loop. Instead, it is used to collect ground-truth force data for evaluating the performance of the force estimation model.
During the experiment, the robot adjusts its position by following the human arm force, enabling collaborative assembly between the human and the robot. The initial offset between the peg and the hole is set to Lx = 300 mm, Ly = 200 mm, and Lz = 40 mm along the X, Y, and Z axes, respectively. The assembly peg has a diameter of 9.98 mm, while the hole diameter is 11.03 mm.
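For orientation, the geometry implied by these numbers can be worked out directly. The short calculation below is only arithmetic on the stated values; the variable names are hypothetical.

```python
import math

PEG_DIAMETER_MM = 9.98
HOLE_DIAMETER_MM = 11.03
INITIAL_OFFSET_MM = (300.0, 200.0, 40.0)   # (Lx, Ly, Lz)

# Total gap across the diameter and the gap on each side when centred.
diametral_clearance = HOLE_DIAMETER_MM - PEG_DIAMETER_MM
radial_clearance = diametral_clearance / 2.0

# Straight-line distance the peg must travel from its initial offset.
initial_distance = math.sqrt(sum(v * v for v in INITIAL_OFFSET_MM))

print(f"diametral clearance: {diametral_clearance:.2f} mm")
print(f"radial clearance:    {radial_clearance:.3f} mm")
print(f"initial distance:    {initial_distance:.1f} mm")
```

The peg therefore has about 1.05 mm of diametral clearance (roughly 0.5 mm per side) and starts roughly 363 mm from the hole, which indicates the positioning precision the collaboration must achieve in the XY plane.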
4.2. Ablation Study of the Proposed Arm Force Estimation Model
To validate the effectiveness of the AGA module, two ablation models are constructed for comparison:
Fixed-Adjacency Matrix TGNN: This model constructs inter-muscle connections using a static adjacency matrix. The model uses the same EMG and joint angle inputs as the full AGA-TGNN, enabling a fair comparison and highlighting the limitations of Fixed-Adjacency Matrix structures under dynamic tasks.
LSTM-only (TGNN without GNN): In this version, the GNN is removed entirely, and only an LSTM is used to process EMG sequences. Joint angle inputs and spatial topology information are excluded, allowing assessment of the importance of spatial topology and joint angle information in modeling muscle coordination.
The results are summarized in Figure 9 and Table 2. The AGA-TGNN model achieves superior performance across all key metrics, including median, mean, and maximum estimation errors. The performance gains are primarily attributed to the following factors:
The Fixed-Adjacency Matrix TGNN lacks adaptability to joint angle (arm posture) variations, as its connectivity structure remains constant regardless of joint configuration. This limits its ability to capture joint-angle-dependent muscle coordination patterns.
The LSTM-only model fails to incorporate spatial topology and joint angle information, resulting in poor representation of the nonlinear coupling between muscle activations and arm output force.
In conclusion, the Angle-Guided Attention module effectively integrates joint angle information to dynamically adjust muscle interaction modeling, significantly improving the accuracy of force estimation in dynamic scenarios.
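The three summary metrics used in this comparison can be computed per force axis as sketched below. The sketch assumes that the error is the absolute difference between measured and estimated force, which the text does not state explicitly; `error_stats` is a hypothetical name.

```python
import numpy as np


def error_stats(f_true, f_pred):
    """Median, mean, and maximum absolute error for each axis (X, Y, Z).

    Inputs are arrays of shape (n_samples, 3); each output has shape (3,).
    """
    err = np.abs(np.asarray(f_true, dtype=float) - np.asarray(f_pred, dtype=float))
    return {
        "median": np.median(err, axis=0),
        "mean": err.mean(axis=0),
        "max": err.max(axis=0),
    }
```

Reporting the maximum alongside the mean and median helps expose occasional large deviations that an average alone would hide.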
4.3. Validation of Arm Force Estimation Model
This study selects several representative models that have demonstrated strong performance in force estimation tasks. The specific configurations are as follows:
Comparison Model 1 (FOS Model): The FOS model constructs an arm force fitting model based on joint angles and the amplitude of EMG signals using an orthogonal search strategy. It represents a traditional analytical approach to force estimation.
Comparison Model 2 (LSTM Model): This model employs LSTM networks to model the temporal sequences of EMG signals. As a widely adopted method for force estimation, it exhibits strong capabilities in learning temporal dependencies.
Comparison Model 3 (Deep-LSTM Model): Based on Deep-LSTM, this model performs feature extraction and modeling of EMG signals. With its powerful spatial pattern recognition and nonlinear modeling capabilities, it is one of the most competitive end-to-end force estimation approaches to date.
The experimental results of the three comparison models are shown in Figure 10, with the corresponding error distribution box plots presented in Figure 11. The statistical performance metrics are listed in Table 3.
As observed from Table 3, Figure 10, and Figure 11, the proposed AGA-TGNN model outperforms existing methods in terms of median error, maximum error, and mean error across all three force components. Compared with the Deep-LSTM model, the proposed model achieves reductions in mean estimation error of 10.38%, 8.33%, and 11.20% in the X, Y, and Z directions, respectively, demonstrating a significant performance advantage.
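Percentage reductions of this kind are presumably computed relative to the baseline error. A minimal sketch of that calculation follows; the function name is hypothetical, and the numbers in the usage line are purely illustrative placeholders, not values from Table 3.

```python
def relative_reduction_percent(baseline_error, proposed_error):
    """Percentage by which proposed_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Illustrative placeholder values only (not taken from the paper's tables).
example = relative_reduction_percent(2.0, 1.5)
```

A positive result means the proposed model has the lower error; a negative result would indicate the baseline performs better.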
The performance differences can be attributed to the following factors:
First, FOS is a traditional model based on analytical expressions. While it can rapidly construct predictive functions in low-dimensional spaces, it lacks sufficient expressive power to capture the highly nonlinear relationship between EMG signals and arm forces. Furthermore, its fitting process is sensitive to signal fluctuations and lacks the ability to model dynamic features, resulting in unstable estimation performance. As the number of input muscles increases, the computational complexity of FOS grows exponentially without a corresponding improvement in fitting accuracy, limiting its performance in multi-dimensional dynamic scenarios.
Second, while LSTM and Deep-LSTM are both deep learning approaches with strong temporal modeling and feature extraction capabilities, respectively, neither considers the dynamic variation in inter-muscle coordination arising from changing joint states. Specifically, both models directly couple EMG and angle signals without explicitly accounting for the underlying physiological and physical interaction mechanisms. By completely relying on neural networks to decouple the relationship between EMG signals and joint angles, they fail to capture the complex, dynamic interactions, leading to insufficient generalization ability.
In contrast, the proposed method introduces an Angle-Guided Attention mechanism, which dynamically adjusts the adjacency matrix of the GNN based on joint angles. This approach effectively decouples the dynamic relationships between muscle groups and joint angles, enhancing the physiological plausibility of the input signal structure. By improving the model’s sensitivity to key muscle coordination patterns, this mechanism significantly enhances the robustness and generalization ability of the network, enabling stable and accurate force estimation across varying motion conditions.
4.4. Training of the Imitation Reinforcement Learning Model
As illustrated in Figure 4, the imitation reinforcement learning framework employed in this work comprises an Actor–Critic (AC) network and an experience replay pool. Based on the reward mechanism defined in Equation (17), the model is trained online with the goal of directly translating human arm forces into robot control commands. This approach aims to overcome the parameter tuning challenges inherent in conventional impedance control strategies.
However, as the initial imitation reinforcement learning model is untrained, it cannot immediately perform collaborative assembly tasks. To ensure the safety of both the experimental equipment and human participants, the training process is divided into two phases: pre-training and task training.
In the pre-training phase, a kinesthetic teaching setup is adopted, where the human operator physically guides the robot without engaging in actual assembly. It is assumed that human motor adaptation skills in response to external collaborative forces are generally similar across different directions. Therefore, to reduce training complexity, motion coordination is learned along a single axis, the X-direction, as a representative case. The human operator repetitively moves the robot along this axis to help the model acquire basic motion alignment and compliance characteristics.
Following pre-training, the imitation reinforcement learning model gains an initial ability to coordinate with human motion, namely aligning the robot’s movement direction with the direction of human-applied forces while maintaining a degree of compliance. The model then enters the task training phase, which takes place in the collaborative assembly scenario shown in Figure 8. To prevent excessive contact forces during training, a safety mechanism is implemented: if the force between the peg and assembly hole exceeds 50 N, the current cycle is terminated, and the robot returns to its initial position before restarting. This task training stage aims to further improve motion smoothness and collaborative comfort by optimizing the policy with respect to the expert-defined reward.
During training, the robot operates on a 50 ms control cycle. In each cycle, the system performs arm force estimation, sub-motion classification, policy update via imitation reinforcement learning, and robot control command generation.
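The four steps performed in each 50 ms cycle can be arranged as in the sketch below. It only illustrates the ordering and the fixed-rate scheduling; every callable (`read_sensors`, `estimate_force`, `classify_submotion`, `update_policy`, `send_command`) is a hypothetical placeholder for a component described in the text, not an actual interface.

```python
import time

CYCLE_PERIOD_S = 0.050  # 50 ms control cycle


def control_cycle(read_sensors, estimate_force, classify_submotion,
                  update_policy, send_command):
    """Run one cycle: estimate force, classify sub-motion, update policy, command."""
    emg, angles = read_sensors()
    force = estimate_force(emg, angles)        # arm force estimation
    submotion = classify_submotion(force)      # sub-motion classification
    command = update_policy(force, submotion)  # policy update, returns an action
    send_command(command)                      # robot control command
    return command


def run_fixed_rate(step, n_cycles, period_s=CYCLE_PERIOD_S,
                   clock=time.monotonic, sleep=time.sleep):
    """Call step() every period_s seconds; return the number of overrun cycles."""
    deadline = clock()
    overruns = 0
    for _ in range(n_cycles):
        step()
        deadline += period_s
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)      # wait out the rest of the cycle
        else:
            overruns += 1         # computation took longer than the period
    return overruns
```

Scheduling against absolute deadlines rather than sleeping a fixed 50 ms after each step keeps the average rate stable when the per-cycle computation time varies.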
Figure 12a shows the reward curve during the pre-training stage. It can be observed that, under the guidance of the penalty mechanism, the robot achieves alignment between its movement direction and the human’s applied force direction after approximately 2000 training iterations, demonstrating good following behavior. However, due to the limited diversity of movement in long-term reciprocating training, this phase tends to result in high robot velocities even in response to small human forces. Therefore, once the robot has acquired basic following ability, further optimization through task training becomes necessary.
In Figure 12b, the blue curve represents the total reward for each task cycle. Significant fluctuations in the early training stages suggest that the control model at that time did not fully conform to the expert demonstration trajectories. The red curve shows a fitted trend for the task rewards, indicating an overall upward trend with fluctuations, eventually converging to a stable value. After multiple rounds of training, the model’s performance surpasses that of the demonstration (the reward value of the expert demonstration is 2). This outcome may be attributed to the challenges of collecting human data, which is often limited by the degree of coordination and synergy between the leader and follower.
4.5. Comparative Analysis of Control Models
To evaluate the collaboration performance of the proposed imitation reinforcement learning-based control strategy, a comparative study is conducted against two state-of-the-art control approaches: the Adaptive Admittance Control (ADC) model [18] and the Gaussian Process Regression-based Imitation Learning (GPRIL) model [26].
To ensure a fair comparison within the experimental context of this study, necessary adaptations are made to each of the comparison models.
For the ADC model, the algorithm proposed in Ref. [21] is considered, which employs joint stiffness estimation to adaptively tune PD parameters. This approach is particularly effective in addressing the trade-off between precise trajectory tracking and compliant assistance. However, since this study does not involve joint stiffness estimation, it is substituted with the estimated arm output force. Under this modification, the PD parameters are adjusted as follows:
where $k_d$ is the derivative term parameter, $k_p$ is the proportional term parameter, $k_p^{\max}$ and $k_p^{\min}$ indicate the maximum and minimum values of the parameter $k_p$, and $F^{\max}$ and $F^{\min}$ are the maximum and minimum arm output forces from the operator during the experiment.
Regarding the GPRIL model, the multi-model Gaussian Process Regression technique proposed in Ref. [26] is employed to model the relationship between the leader’s force and the follower’s speed during demonstrations.
To further evaluate the effectiveness of the proposed fuzzy experience replay mechanism, an ablation model is constructed as an additional comparison model by removing this component from the original imitation reinforcement learning framework.
In this study, each control model is evaluated through 30 repeated experiments to assess the collaborative comfort, smoothness, task completion speed, and accuracy under different control methods. The performance statistics are summarized in
Table 4, while
Figure 13 illustrates the operator’s output force and the robot’s movement speed across the three control models. The methods used to calculate collaborative comfort and smoothness are detailed in Equations (13) and (16). The evaluation criteria for accuracy and task completion speed are outlined as follows:
(a) Assembly Accuracy
Assembly error is a commonly used metric for evaluating collaboration precision. After the robot reaches the target position, the deviation between the robot’s final position and the desired position in the XY plane is used to assess the accuracy of the collaborative assembly. The assembly deviation can be expressed as follows:
$$e = \sqrt{e_x^{2} + e_y^{2}}$$
where $e_x$ and $e_y$ denote the assembly errors in the X and Y directions, respectively.
(b) Success Rate
The success rate of assembly serves as a direct indicator of collaboration effectiveness. To avoid collisions during the process, a safety threshold is set: if the contact force between the peg and hole surpasses 50 N, the robot stops instantly, and the attempt is deemed unsuccessful. Likewise, if the assembly duration exceeds 30 s, the task is also classified as a failure.
(c) Efficiency
The time from task initiation to the moment the assembly peg is inserted more than 10 mm into the assembly hole is defined as the assembly time. In collaborative scenarios, human operators typically prefer shorter assembly times to improve task efficiency.
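The three criteria above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the planar deviation is the Euclidean norm of the X and Y errors, and the function names, the sample format, and the example values are hypothetical. The 50 N, 30 s, and 10 mm thresholds are the ones stated in the text.

```python
import math


def assembly_deviation(e_x, e_y):
    """Planar deviation (assumed Euclidean norm), in the unit of the inputs."""
    return math.hypot(e_x, e_y)


def assembly_time(samples, depth_threshold_mm=10.0):
    """Return the first timestamp at which insertion depth exceeds the threshold.

    `samples` is an iterable of (time_s, depth_mm) pairs in chronological order.
    Returns None if the threshold is never exceeded.
    """
    for time_s, depth_mm in samples:
        if depth_mm > depth_threshold_mm:
            return time_s
    return None


def is_success(peak_contact_force_n, time_s, force_limit_n=50.0, time_limit_s=30.0):
    """A trial fails if the peg is never inserted, the contact force exceeds
    the limit, or the assembly time exceeds the limit."""
    if time_s is None:
        return False
    return peak_contact_force_n <= force_limit_n and time_s <= time_limit_s


def success_rate(trials):
    """Fraction of successful trials; `trials` holds (peak_force_n, time_s) pairs."""
    trials = list(trials)
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(is_success(force, t) for force, t in trials) / len(trials)


# Hypothetical log of one trial: (time in s, insertion depth in mm).
log = [(0.0, 0.0), (1.0, 6.0), (2.0, 10.0), (3.0, 12.5)]
t_assembly = assembly_time(log)
```

Treating a trial in which the peg never reaches the required depth as a failure is a design choice of this sketch; the text only names the force and time limits explicitly.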
As shown in
Table 4, the GPRIL model achieves collaborative comfort and smoothness values close to one. However, due to its limited learning capacity, it struggles to surpass expert-level performance.
In the ADC model, PD parameters are adjusted proportionally to the output force: higher forces lead to increased PD gains, while lower forces reduce them. This adjustment enables the robot to perform precise control when the operator exerts low force and to execute rapid movements under high force. However, auxiliary motions during collaborative tasks often require coordination across three dimensions, and not all directions are suitable for high-speed execution. As a result, frequent acceleration and deceleration occur, as illustrated by the orange and blue curves in
Figure 13f.
As shown in
Figure 13, the robot’s motion under the proposed imitation reinforcement learning-based control model is noticeably smoother than that of the comparison models. According to
Table 4, the proposed method improves on the ADC model across all evaluation metrics. Specifically, collaborative comfort increases by 2.75% and collaborative smoothness improves by 8.42%, reflecting enhanced physical interaction quality. In terms of task efficiency, the assembly time is reduced by 1.63%, while the assembly error decreases by 17.65%, indicating higher precision. Finally, the assembly success rate improves by 7.2%.
In addition, as shown in
Table 4, removing the fuzzy experience replay mechanism results in significant performance degradation. Without sub-motion-based categorization, experiences from different motion phases interfere with each other during training, leading to decreased learning efficiency, reduced task execution accuracy, and visibly diminished coordination between human and robot during collaboration. These findings underscore the critical role of fuzzy experience replay in mitigating motion interference, promoting balanced learning across diverse sub-motion patterns, and ultimately enhancing the stability and responsiveness of human–robot collaboration under dynamic conditions.
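To illustrate why partitioning experiences by sub-motion can reduce interference, the following Python sketch keeps one bounded queue per sub-motion and draws training batches evenly across the non-empty queues. It is a simplified illustration under stated assumptions, not the fuzzy experience replay mechanism of this study: the membership degrees are taken as given inputs, each transition is stored under its highest-membership label, and the class name, sub-motion labels, and values are hypothetical.

```python
import random
from collections import deque


class PartitionedReplayBuffer:
    """Replay buffer that keeps one bounded queue per sub-motion label."""

    def __init__(self, labels, capacity_per_label=1000, seed=None):
        self._buffers = {label: deque(maxlen=capacity_per_label) for label in labels}
        self._rng = random.Random(seed)

    def add(self, memberships, transition):
        """Store a transition under the label with the highest membership degree.

        `memberships` maps each label to a degree in [0, 1] (assumed to come
        from fuzzy rules that are outside the scope of this sketch).
        """
        label = max(memberships, key=memberships.get)
        self._buffers[label].append(transition)
        return label

    def sample(self, batch_size):
        """Draw a batch spread as evenly as possible over non-empty partitions."""
        active = [buf for buf in self._buffers.values() if buf]
        if not active:
            return []
        # Cycle through the partitions so no sub-motion dominates the batch.
        return [self._rng.choice(active[i % len(active)]) for i in range(batch_size)]

    def sizes(self):
        """Number of stored transitions per label."""
        return {label: len(buf) for label, buf in self._buffers.items()}


# Hypothetical sub-motion labels and transitions, for illustration only.
buffer = PartitionedReplayBuffer(["approach", "insert"], capacity_per_label=2, seed=0)
buffer.add({"approach": 0.8, "insert": 0.2}, "t1")
buffer.add({"approach": 0.1, "insert": 0.9}, "t2")
batch = buffer.sample(4)
```

With a single shared buffer, a long motion phase would contribute most of the stored transitions and therefore most of each batch; sampling per partition keeps shorter phases represented.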
To further validate the robustness of the proposed method across different individuals, two healthy adult participants are recruited to perform both the calibration of the arm force estimation model and the human–robot collaborative assembly tasks. The corresponding performance metrics are summarized in
Table 5. The experimental results indicate that both participants successfully complete the assembly tasks with high accuracy, which further supports the effectiveness of the proposed method.
Notably, inter-subject variability in EMG signals remains a significant challenge. The proposed dynamic force estimation model is designed as a learning framework rather than a universal solution. Different individuals require personalized calibration following the outlined training procedure. Applying a model trained on one subject directly to another leads to a marked decline in estimation accuracy and overall collaborative performance. Future work will explore transfer learning techniques with the aim of reducing the training time required to adapt force estimation models across different users.
5. Conclusions
Inspired by the remarkable human ability to coordinate force and speed in collaborative tasks, this study proposes an innovative framework addressing two key challenges: three-dimensional arm force estimation and human–robot collaborative control.
For force estimation, an AGA-TGNN algorithm is designed, which significantly enhances the model’s capacity to capture dynamic interactions between muscle coordination patterns and joint states. Experimental results demonstrate that the proposed model outperforms representative comparison methods (FOS, LSTM, and Deep-LSTM) across multiple metrics, including median error, mean error, and maximum error. Specifically, compared to the Deep-LSTM model, estimation errors in the three directions are reduced by 10.38%, 8.33%, and 11.20%, respectively.
In the domain of collaborative control, to overcome the limitations of traditional methods in compliance and generalization, an expert reward mechanism is introduced alongside a fuzzy rule-based experience replay partitioning strategy. A novel imitation reinforcement learning algorithm is developed, which requires no explicit control model and exhibits strong adaptability. This approach enables the robot to respond flexibly to operator motion intentions by perceiving arm forces in real time, thereby enhancing the comfort and smoothness of human–robot collaboration.
The proposed method is validated in a human–robot collaborative assembly scenario. Experimental results demonstrate that, compared with the representative GPRIL and ADC models, the proposed approach achieves substantial improvements across key performance metrics, including collaborative comfort, motion stability, task efficiency, and assembly accuracy. Specifically, compared to the ADC model, collaborative comfort increases by 2.75%, collaborative smoothness improves by 8.42%, assembly time is reduced by 1.63%, assembly error decreases by 17.65%, and the assembly success rate improves by 7.2%. These findings support the feasibility of the proposed approach in practical human–robot collaboration.
The current arm force estimation model only predicts three-dimensional Cartesian forces and does not account for torque components. As a result, the human–robot collaborative assembly system lacks the capability to perform posture adjustments, which limits its effectiveness in tasks requiring precise rotational control. To address this, future work will explore the correlation between EMG signals and arm torques, with the goal of enabling posture-aware collaboration.