Article

Continuous Estimation of sEMG-Based Upper-Limb Joint Angles in the Time–Frequency Domain Using a Scale Temporal–Channel Cross-Encoder

by Xu Han 1, Haodong Chen 2, Xinyu Cheng 1 and Ping Zhao 1,*
1 School of Mechanical Engineering, Hefei University of Technology, Hefei 230009, China
2 Seattle Children’s Hospital, Seattle, WA 98144, USA
* Author to whom correspondence should be addressed.
Actuators 2025, 14(8), 378; https://doi.org/10.3390/act14080378
Submission received: 20 June 2025 / Revised: 29 July 2025 / Accepted: 30 July 2025 / Published: 31 July 2025

Abstract

Surface electromyographic (sEMG) signal-driven joint-angle estimation plays a critical role in intelligent rehabilitation systems, as its accuracy directly affects both control performance and rehabilitation efficacy. This study proposes a continuous elbow-joint-angle estimation method based on time–frequency domain analysis. Raw sEMG signals were processed using the Short-Time Fourier Transform (STFT) to extract time–frequency features. A Scale Temporal–Channel Cross-Encoder (STCCE) network was developed, integrating temporal and channel attention mechanisms to enhance feature representation and establish the mapping from sEMG signals to elbow joint angles. The model was trained and evaluated on a dataset comprising approximately 103,000 samples collected from seven subjects. On the single-subject test sets, the proposed STCCE model achieved an average Mean Absolute Error (MAE) of 2.96° ± 0.24°, Root Mean Square Error (RMSE) of 4.41° ± 0.45°, Coefficient of Determination (R²) of 0.9924 ± 0.0020, and Correlation Coefficient (CC) of 0.9963 ± 0.0010. It achieved an MAE of 3.30°, RMSE of 4.75°, R² of 0.9915, and CC of 0.9962 on the multi-subject test set, and an average MAE of 15.53° ± 1.80°, RMSE of 21.72° ± 2.85°, R² of 0.8141 ± 0.0540, and CC of 0.9100 ± 0.0306 on the inter-subject test set. These results demonstrate that the STCCE model enables accurate joint-angle estimation in the time–frequency domain, contributing to better motion-intent perception for upper-limb rehabilitation.

1. Introduction

According to the World Stroke Organization (WSO), stroke is the second leading cause of death and disability worldwide [1]. Upper-limb hemiparesis is one of the most common motor impairments following stroke [2], and regaining upper-limb function is critical for restoring patients’ independence in daily life [3]. Intelligent rehabilitation systems [4,5,6,7,8], particularly upper-limb rehabilitation robots [9,10,11,12], have shown great potential in promoting neuroplasticity and functional recovery by delivering intensive, repetitive, and quantifiable training [13,14,15,16]. Achieving efficient and natural human–robot interactions is central to improving rehabilitation outcomes, and accurately perceiving the user’s motion intention is fundamental to this goal [17].
sEMG, as an electrophysiological representation of muscle activity on the skin surface, contains rich information about limb movement intention [18,19]. Its non-invasive and wearable characteristics, along with its ability to reflect movement intention at an early stage, make it particularly valuable in fields such as rehabilitation robotics, prosthetic control, and human–machine interaction [20,21]. In particular, for upper-limb exoskeletons and rehabilitation robotic systems, sEMG signals are widely regarded as a crucial input source for achieving natural and compliant control [22,23]. Among these applications, continuously estimating joint angles is critical for assisting in the execution of various functional movements [24].
In recent years, significant progress has been made in estimating joint motion information using sEMG. Early studies primarily focused on gesture recognition and discrete state estimation [25,26,27]. With the advancement of research, continuous joint-angle estimation has attracted increasing attention due to its ability to provide more refined motion control information [28]. In terms of feature extraction, researchers have explored various approaches to obtain more expressive and task-relevant features. For example, Xiao et al. extracted multiple time-domain features, including Mean Absolute Value (MAV), Zero Crossing (ZC), Waveform Length (WL), Slope Sign Changes (SSC), and Difference Absolute Standard Deviation Value (DASDV), and demonstrated their effectiveness in joint-angle estimation tasks [29]. Raj et al. used Integrated EMG (IEMG) and ZC as model inputs to estimate elbow joint displacement and velocity [30]. Time–frequency features, by simultaneously analyzing the variations of signals in both the time and frequency domains, can provide a more comprehensive description of the dynamic characteristics of sEMG signals, making them particularly suitable for decoding tasks involving non-stationary and continuous movements [31]. Several comparative studies have demonstrated that the incorporation of time–frequency information significantly improves classification accuracy and robustness in continuous motion estimation tasks [32]. Wen et al. extracted latent motion information from multi-scale time–frequency features using Variational Mode Decomposition (VMD) and Wavelet Packet Transform (WPT) and significantly improved continuous angle estimation performance through a Bidirectional LSTM (BiLSTM) network [33]. Alazrai et al. employed the Discrete Wavelet Transform (DWT) to construct time–frequency representations of sEMG signals and extracted time–frequency features to estimate the joint angles of the wrist and fingers [34]. Jiang et al. combined raw time-domain signals with frequency-domain features to achieve the high-precision continuous estimation of multi-joint angles, providing a promising solution for myoelectric prosthesis control [35]. Overall, the evolution of sEMG feature extraction, from basic time-domain descriptors to time–frequency analysis and multi-domain feature fusion, has greatly enhanced our ability to capture motion-related information embedded in the signals, laying a solid foundation for continuous joint-angle estimation.
In terms of estimation methods, advancements in machine learning and deep learning have significantly improved the accuracy of sEMG-based joint-angle estimation. Nonlinear models have demonstrated stronger fitting capabilities. For example, Zhang et al. employed the Whale Optimization Algorithm (WOA) to optimize a Support Vector Regression (SVR) model, reducing the RMSE of elbow-joint-angle estimation to 10.86° [36]. Artificial Neural Networks (ANNs) have also been widely explored to establish nonlinear mappings between sEMG features and joint angles [37,38]. Subsequently, deep learning models have shown better performance. In particular, Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) [39] and BiLSTM [40], have proven highly effective in modeling temporal dependencies, making them well-suited for processing temporally correlated signals like sEMG. Ruan et al. applied an LSTM-based model using multi-channel time-domain sEMG features to simultaneously estimate elbow and wrist joint angles, outperforming conventional neural networks in both accuracy and stability [41]. To address the frequent synchronization issues between sEMG and joint-angle data during real-world acquisition, Ma et al. employed a BiLSTM model to estimate continuous shoulder and elbow joint movements under weakly synchronized conditions [42]. Convolutional Neural Networks (CNNs) have been used to extract spatial and temporal features from multi-channel sEMG signals. Hajian et al. proposed a Two-stream multi-scale Convolutional Neural Network (TS-CNN) architecture, which directly extracts and fuses hierarchical features from raw high-density EMG signals using convolutional kernels of different scales, enabling the simultaneous estimation of elbow joint angle and velocity [43]. Furthermore, due to variability in sEMG signals across individuals, various strategies have been explored to improve generalization. For example, some studies have explored fusing sEMG with EEG signals for multimodal learning [44] and employed domain adaptation methods to capture subject-invariant features [45]. Generally, deep learning models have gradually become the prevailing method for continuous sEMG-based joint-angle estimation and have shown significant advantages in terms of accuracy.
This study aims to develop a continuous elbow-joint-angle decoding method based on the time–frequency features of sEMG signals. A dataset is constructed using time–frequency features extracted by STFT. A Transformer-based [46] regression model is then built, consisting of two key modules: an independent per-channel temporal attention encoder, responsible for capturing the time–frequency dynamics of each sEMG channel; and a cross-channel attention encoder, designed to model the spatial relationships among multiple channels. Additionally, to accelerate model convergence, a feature scaling module is introduced to amplify the input feature magnitudes. The proposed method is evaluated on a dataset including seven subjects and a total of about 103,000 samples. Evaluations are conducted by training and testing in three settings: single-subject datasets, a mixed multi-subject dataset, and leave-one-subject-out datasets. The main contributions of this work can be summarized as follows:
  • This study constructs an sEMG–elbow-angle dataset consisting of over 100,000 samples collected from seven healthy subjects, providing a valuable data resource for continuous joint-angle estimation research.
  • A fixed Input Scaling operation is applied to amplify the time–frequency features, which accelerates model convergence and improves the accuracy of angle estimation.
  • We propose a novel STCCE, built upon a Transformer architecture, that integrates multi-scale temporal and channel attention mechanisms to effectively model the mapping from time–frequency sEMG features to joint angles.
This paper is organized as follows: Section 2 provides a detailed description of the data acquisition platform, participant information, data collection procedures, and the steps for data trimming and preprocessing. Section 3 describes the construction of the dataset, the proposed STCCE model structure, and the implementation and training details, as well as the evaluation metrics. Section 4 comprehensively shows and analyzes the test results of the model under single-subject, multi-subject, and inter-subject scenarios, and compares these results to those from other related studies. Section 5 concludes this work and discusses its current limitations and future research opportunities.

2. Experiment Setup and Data Collection

2.1. Experiment Platform

Figure 1a,b shows the experimental platform, including the sEMG and joint-angle sensors used in this study. The Myo armband (Thalmic Labs Inc., Kitchener, ON, Canada) contains eight sensors evenly distributed around its circumference. The sEMG signals are wirelessly transmitted to a computer, and real-time data at a sampling rate of 200 Hz can be obtained using the Myo Software Development Kit (SDK) v1.0 (2014). For joint-angle collection, a measurement device was developed using 3D printing, as shown in Figure 1c. The length of the device components can be adjusted according to the subject’s arm length to accommodate different individuals. Furthermore, a single-axis angle sensor (JY-ME02-CAN, WIT Motion, Shenzhen, China; shown in Figure 1b) was mounted at the elbow joint to measure flexion–extension angles. The sampling frequency of this sensor was also set to 200 Hz. In the experiment, the device was fastened to the participant’s arm with an elastic strap. The length of the adjustable components was set by aligning multiple fixed connection holes and sliding slots, ensuring that the angle sensor remained properly aligned with the elbow joint throughout the movement. Once aligned, the length was locked in place with bolts. This adjustment procedure was repeated for each participant.

2.2. Participants

In this study, seven healthy participants (three females and four males, 23–27 years old) were recruited. None of the participants had a history of neurological or muscular disorders. Each participant was assigned a numerical ID from 1 to 7, and their information is summarized in Table 1. Before the experiment, all participants were fully briefed on the experimental procedure and signed written informed consent forms. All experimental procedures were approved by the Institutional Review Board (IRB) of Hefei University of Technology on 19 June 2025, with Protocol No. HFUT20250619001H.

2.3. Data Acquisition

None of the subjects engaged in strenuous exercise during the 6 h before data collection. All subjects were seated and instructed to remain relaxed to avoid muscle tension that could interfere with the sEMG signals. The measurement device was first adjusted to an appropriate length to ensure that the rotational center of the participant’s elbow joint was aligned coaxially with the angle sensor. The Myo armband was worn on the upper arm, with Channel 4 positioned directly over the biceps and Channel 1 aligned with the medial head of the triceps. The specific movement process for each subject is described as follows and is illustrated in Figure 2.
  • Preparation state: The forearm hangs naturally with the palm facing forward, and the elbow flexion angle is approximately 10°–20°.
  • Start mark state: The forearm is extended to 0°.
  • Repeated flexion-extension: Starting from the Start mark state, perform the arm flexion and extension k times repeatedly. The maximum flexion angle is approximately 140°–150°; the minimum angle is 0° (some participants showed a brief hyperextension of the arm, with an actual angle less than 0°; however, we still treated it as 0°, because such a condition does not occur during rehabilitation exercises).
  • End mark state: The last forearm extension to 0° during the repetition process.
  • Restore to preparation state: The subject relaxes, and the elbow flexion angle is maintained at approximately 10°–20°.
Each completion of the above experiment process by a subject is defined as one record, with k = 12; therefore, each record contains 13 (i.e., k + 1) mark states. Each subject was required to perform 20 such records, with a 1 min rest interval between consecutive records to avoid muscle fatigue. Notably, we did not require participants to perform strict speed control; instead, participants were instructed to perform forearm flexion and extension naturally at a comfortable speed, focusing on the feeling of muscle activation. As a result, the amount of data collected from each participant is not strictly the same.

2.4. Data Trimming

Since the sEMG signals and joint-angle data are acquired independently and are not synchronized, trimming is required. For each record, we fit a sum of Gaussian kernel functions to the absolute value of the sEMG signal from one channel (corresponding to the triceps), as shown in Equation (1). Here, $K$ is the number of Gaussian kernel functions, which is set to 13 to match the number of mark states observed in the experiment. Each component is parameterized by its amplitude $A_k$, center $\mu_k$, and standard deviation $\sigma_k$, representing the weight, position, and spread of the $k$-th Gaussian kernel, respectively. The parameters $\{A_k, \mu_k, \sigma_k\}_{k=1}^{K}$ are estimated using a nonlinear least squares method that minimizes the error between the fitted curve $\{(x_i, f(x_i))\}_{i=1}^{N}$ and the observed data $\{(x_i, y_i)\}_{i=1}^{N}$, as shown in Equation (2). Based on the identified start and end mark state indices, all irrelevant data before the first start mark and after the last end mark are deleted.
$$f(x) = \sum_{k=1}^{K} A_k \cdot \exp\left(-\frac{(x - \mu_k)^2}{2\sigma_k^2}\right) \quad (1)$$
$$\min_{\{A_k, \mu_k, \sigma_k\}} \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2 \quad (2)$$
For angle data processing, we identified the indices of the 13 local minima. Similar to the sEMG processing, data outside the first and last minima are trimmed to match the sEMG signals. It should be noted that what we have achieved is not a strict temporal synchronization, but rather an alignment in terms of effect. That is, when the forearm was fully extended, the activation of the triceps muscle reached its peak.
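For illustration, the sketch below implements this trimming step with scipy.optimize.curve_fit for the nonlinear least squares of Equation (2). The initial guesses, the maxfev limit, and the helper names are our own assumptions rather than the authors’ implementation.

```python
import numpy as np
from scipy.optimize import curve_fit

K = 13  # number of Gaussian kernels, matching the 13 mark states per record

def gaussian_mixture(x, *params):
    # params = [A_1, mu_1, sigma_1, ..., A_K, mu_K, sigma_K], as in Eq. (1)
    y = np.zeros_like(x, dtype=float)
    for k in range(K):
        A, mu, sigma = params[3 * k:3 * k + 3]
        y += A * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
    return y

def find_mark_centers(triceps_channel):
    """Fit Eq. (1) to the rectified triceps channel; return sorted kernel centers."""
    y = np.abs(np.asarray(triceps_channel, dtype=float))
    x = np.arange(len(y), dtype=float)
    # Initial guesses (illustrative): centers spread evenly, uniform amplitude/width.
    mu0 = np.linspace(0.0, len(y) - 1.0, K)
    p0 = np.ravel([[y.max(), mu, len(y) / (4.0 * K)] for mu in mu0])
    params, _ = curve_fit(gaussian_mixture, x, y, p0=p0, maxfev=20000)
    return np.sort(params[1::3])  # the centers mu_k locate the 13 extension peaks
```

The first and last centers then bound the span of valid data; samples outside them are discarded.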

2.5. Data Preprocessing

The sEMG signals collected from the Myo armband are internally processed and normalized to the range of [−1, 1], with a built-in 50 Hz notch filter to suppress power line interference. Subsequently, to remove low-frequency noise and baseline drift, a 20 Hz high-pass filter (fourth-order Butterworth) is applied to the signal, thereby preserving the high-frequency sEMG features that are more physiologically meaningful. The angle data collected in this study nominally share the same sampling frequency as the sEMG signals, but the two streams are trimmed independently. Therefore, to ensure precise alignment between them, we applied interpolation to the angle data; no additional processing is applied.
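A minimal sketch of this filtering step with scipy.signal follows. Whether the authors used zero-phase (filtfilt) or causal filtering is not stated, so the zero-phase choice here is an assumption.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200.0  # Myo sampling rate (Hz)

# Fourth-order Butterworth high-pass at 20 Hz (Section 2.5).
b, a = butter(N=4, Wn=20.0, btype="highpass", fs=FS)

def highpass_semg(raw_semg):
    """raw_semg: (num_samples, 8) array in [-1, 1] from the armband."""
    return filtfilt(b, a, raw_semg, axis=0)  # zero-phase filtering (an assumption)
```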

3. Method

3.1. Dataset Construction

Considering the non-stationarity of sEMG signals in the time domain, we do not rely solely on time-domain features. Instead, STFT is applied to each of the 8 sEMG channels individually to capture the time–frequency characteristics of the signals, as shown in Equations (3) and (4).
$$X(m, i) = \sum_{n=0}^{N-1} x[n + mH] \cdot w[n] \cdot e^{-j \frac{2\pi}{N} i n} \quad (3)$$
$$A(m, i) = \left| X(m, i) \right| \quad (4)$$
The parameters are defined as follows: $x[n]$ denotes the raw sEMG signal, $w[n]$ is the window function, $N$ is the window length (the number of sampling points per frame), and $H$ is the frame shift, representing the sliding step size between adjacent frames. $m$ denotes the time frame index, and $i$ is the frequency index. $X(m, i)$ represents the complex spectral coefficient at the $i$-th frequency in the $m$-th frame. In this study, the Hanning window is used as the window function, with a window length of 40 data points. To ensure the spectral continuity of the frequency-domain features, a 75% overlap is applied between frames, resulting in a frame shift of 10 data points. We then extract the magnitude of each frequency component after the STFT as the features, and a sequence of 7 consecutive windows is used as the input. Each input has a shape of $[W, C, A]$, where $W$ is the window dimension, $C$ is the channel dimension, and $A$ is the frequency magnitude dimension. The corresponding output is a single angle value aligned with the end timestamp of the input EMG segment.
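A sketch of this feature extraction under the stated parameters is given below. Note that a 40-point real FFT yields 21 one-sided bins, while Table 2 lists a frequency dimension of 17, so a subset of bins was presumably retained; that selection is not specified in the paper and is omitted here.

```python
import numpy as np

N_WIN, HOP, W_FRAMES = 40, 10, 7  # window length, frame shift (75% overlap), input windows
window = np.hanning(N_WIN)

def stft_magnitude(x):
    """A(m, i) of Eqs. (3)-(4) for one channel; returns (n_frames, N_WIN // 2 + 1)."""
    n_frames = 1 + (len(x) - N_WIN) // HOP
    frames = np.stack([x[m * HOP:m * HOP + N_WIN] * window for m in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def build_samples(emg, angles):
    """emg: (num_samples, 8) filtered sEMG; angles: interpolated, sample-aligned."""
    spec = np.stack([stft_magnitude(emg[:, c]) for c in range(emg.shape[1])], axis=1)
    for m in range(W_FRAMES, spec.shape[0] + 1):
        x = spec[m - W_FRAMES:m]              # input of shape [W, C, A]
        t_end = (m - 1) * HOP + N_WIN - 1     # end timestamp of the EMG segment
        yield x, angles[t_end]
```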
It should be noted that, although the number of actions collected from each subject is the same, the speed of the actions varied. Therefore, the signal lengths of each subject are different, and, consequently, the sizes of their datasets are also different, as shown in Table 2. All datasets are split into training, validation, and test sets in a ratio of 0.7:0.15:0.15. To eliminate temporal dependencies within the data, a random partitioning strategy is adopted. In addition, we construct a mixed dataset containing data from all subjects, as well as leave-one-subject-out (LOSO) datasets for each subject, to evaluate the proposed method’s performance in both multi-subject and inter-subject scenarios.

3.2. Proposed Model

Considering the significant non-stationarity and distributional differences of sEMG signals across both temporal and channel dimensions, this study proposes an STCCE model for the continuous estimation of upper-limb elbow joint angles. The model fully leverages the time–frequency features of sEMG and the inter-channel correlations, integrating the global modeling capability of the Transformer Encoder architecture with the adaptive properties of attention mechanisms. The overall structure is illustrated in Figure 3, and the main modules are described as follows.

3.2.1. Input Scaling

We visualized the distribution of all input features from the constructed dataset after flattening, as shown in Figure 4. In this figure, the horizontal axis is divided into many equal-width bins, each representing a range of amplitude values of the time–frequency features. The vertical axis represents the ratio of the frequency to bin width. The frequency distribution histogram illustrates that the time–frequency amplitudes of the sEMG signals are highly concentrated within a narrow range around zero. Such numerical imbalance may adversely affect gradient propagation and model optimization, especially in the early training stages. To address this issue, a fixed linear scaling operation is applied to the raw time–frequency features extracted by STFT. This operation preserves the original distribution shape of the features while amplifying their magnitude distribution, thereby enhancing the numerical expressiveness of the inputs. As a result, the Input Scaling facilitates more stable gradient propagation and accelerates model convergence during training.
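As a sketch, the operation amounts to a single fixed multiplication (the factor of 50 is given in Section 3.3), applied identically at training and test time; it is not a learned or data-dependent normalization.

```python
SCALE = 50.0  # fixed factor; the value used in this study is given in Section 3.3

def scale_inputs(stft_features):
    # Amplifies the magnitudes while preserving the shape of their distribution.
    return stft_features * SCALE
```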

3.2.2. Per-Channel Temporal Attention Encoder

Considering the relative independence of sEMG signals across different channels, the model first performs separate encoding for each channel. Specifically, for the time-series input of each channel, a linear layer is applied to map the input into a high-dimensional representation space, followed by positional encoding to retain temporal order information. The encoded sequence is then modeled using a stack of Transformer Encoder layers. To more effectively extract information from key frames, an attention-based temporal weighted pooling mechanism is further introduced to adaptively model the importance of different time frames, thereby compressing the temporal sequence into a single channel-level feature vector.

3.2.3. Cross-Channel Attention Encoder

The high-dimensional representations of all channels obtained in the per-channel temporal attention encoder are concatenated to form a parallel channel feature, which is then fed into the second-stage cross-channel attention encoder. This module is designed to model the synergistic relationships among multiple sEMG channels and further extract fused features along the channel dimension. To more precisely control the weight of each channel during the fusion stage, a channel attention pooling mechanism, similar to the temporal one described earlier, is employed. This mechanism adaptively integrates channel features by computing attention-based channel importance weights.

3.2.4. Regression Head

The fused channel features are fed into a regression head, which consists of two fully connected layers for nonlinear transformation, ultimately producing a one-dimensional angle estimation output. A ReLU activation function and a dropout mechanism are applied between the two linear layers to prevent overfitting and enhance the model’s generalization capability.
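To make the architecture of Sections 3.2.1–3.2.4 concrete, the PyTorch sketch below assembles the four modules. The layer counts, feature dimension (512), dropout rate (0.2), and scaling factor (50) follow Section 3.3; the regression-head hidden size (128), the number of attention heads (8), the learned positional encoding, and the default feedforward width are our own assumptions, as the paper does not report them.

```python
import torch
import torch.nn as nn

class AttnPool(nn.Module):
    """Attention-weighted pooling over a sequence axis (time frames or channels)."""
    def __init__(self, d_model):
        super().__init__()
        self.score = nn.Linear(d_model, 1)

    def forward(self, x):                      # x: (batch, seq, d_model)
        w = torch.softmax(self.score(x), dim=1)
        return (w * x).sum(dim=1)              # (batch, d_model)

class STCCE(nn.Module):
    def __init__(self, n_channels=8, n_freq=17, n_frames=7, d_model=512,
                 n_layers=2, n_heads=8, dropout=0.2, scale=50.0):
        super().__init__()
        self.scale = scale                     # fixed Input Scaling factor (Section 3.3)

        def encoder():
            layer = nn.TransformerEncoderLayer(d_model, n_heads, dropout=dropout,
                                               batch_first=True)
            return nn.TransformerEncoder(layer, n_layers)

        # Per-channel temporal encoders with independent parameters (Section 3.3).
        self.proj = nn.ModuleList([nn.Linear(n_freq, d_model) for _ in range(n_channels)])
        # Learned positional encoding; the paper does not specify the type (assumption).
        self.pos = nn.Parameter(torch.zeros(1, n_frames, d_model))
        self.temporal = nn.ModuleList([encoder() for _ in range(n_channels)])
        self.t_pool = nn.ModuleList([AttnPool(d_model) for _ in range(n_channels)])
        # Cross-channel attention encoder and channel attention pooling.
        self.cross = encoder()
        self.c_pool = AttnPool(d_model)
        # Regression head: two linear layers with ReLU and dropout in between.
        self.head = nn.Sequential(nn.Linear(d_model, 128), nn.ReLU(),
                                  nn.Dropout(dropout), nn.Linear(128, 1))

    def forward(self, x):                      # x: (batch, W=7, C=8, A=17)
        x = x * self.scale                     # Input Scaling (Section 3.2.1)
        chans = []
        for c in range(x.shape[2]):
            h = self.proj[c](x[:, :, c, :]) + self.pos   # (batch, W, d_model)
            h = self.temporal[c](h)
            chans.append(self.t_pool[c](h))              # (batch, d_model)
        h = torch.stack(chans, dim=1)                    # (batch, C, d_model)
        h = self.cross(h)
        return self.head(self.c_pool(h)).squeeze(-1)    # (batch,) angle estimates
```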

3.3. Implementation and Training

The proposed STCCE network architecture was implemented in a PyTorch 2.7.1-based environment, with all model training and analysis conducted on an NVIDIA RTX 3080 Ti GPU (NVIDIA, Santa Clara, CA, USA). In terms of model design, the Input Scaling factor is fixed at 50. The Per-Channel Temporal Attention Encoder consists of two stacked Transformer Encoder layers, with independent parameters for each channel. Similarly, the Cross-Channel Attention Encoder employs two stacked Transformer Encoder layers to enhance inter-channel modeling. Each encoder uses a feature dimension of 512 to improve the model’s representational capacity and nonlinear fitting ability. Additionally, dropout operations with a dropout rate of 0.2 are applied throughout all encoder layers and the MLP regression head to prevent overfitting.
During model training, the network weights are updated using the Adaptive Moment Estimation (Adam) optimizer through backpropagation, with a fixed learning rate of 1 × 10⁻⁵ and no learning rate decay strategy applied. The exponential decay rates for the first and second moment estimates in Adam were set to β₁ = 0.9 and β₂ = 0.999, respectively. The batch size is set to 64, and the number of training epochs is set to 500.
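The training loop reduces to a standard supervised regression setup, sketched below. The paper does not name its loss function, so mean squared error is assumed here, and train_loader is a hypothetical DataLoader yielding the [W, C, A] inputs in batches of 64.

```python
import torch

model = STCCE()  # the sketch from Section 3.2
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, betas=(0.9, 0.999))
criterion = torch.nn.MSELoss()  # loss not named in the paper; MSE assumed

for epoch in range(500):
    model.train()
    for x, y in train_loader:  # hypothetical DataLoader, batch size 64
        optimizer.zero_grad()
        loss = criterion(model(x.float()), y.float())
        loss.backward()
        optimizer.step()
```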
To comprehensively evaluate the performance of the STCCE model, we first construct individualized models for each subject by training separately on their respective datasets in order to assess intra-subject performance. Next, we combine the datasets from all subjects to train a unified model, thereby evaluating the performance in the multi-subject setting. Finally, to evaluate the model’s inter-subject generalization capability, we employ a leave-one-subject-out strategy: all data from one subject are reserved as the validation and test sets (1:1), while the data from the other subjects are combined to form the training set.

3.4. Evaluation Metric

To evaluate the proposed method, 4 commonly used regression performance metrics are adopted: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Coefficient of Determination (R²), and Pearson Correlation Coefficient (CC), as defined in Equations (5)–(8). MAE and RMSE reflect the overall error magnitude; R² measures the goodness of fit of the regression, with a range of [0, 1], where a value closer to 1 indicates stronger explanatory power of the model. CC quantifies the linear correlation between the predicted and true data, ranging from −1 to 1, where 1 indicates positive correlation, 0 indicates no correlation, and −1 indicates negative correlation.
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - x_i \right| \quad (5)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - x_i \right)^2} \quad (6)$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (x_i - y_i)^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \quad (7)$$
$$\mathrm{CC} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{N} (y_i - \bar{y})^2}} \quad (8)$$
where $x_i$ denotes the ground-truth angle of the $i$-th sample, $y_i$ denotes the corresponding predicted value, $\bar{x}$ and $\bar{y}$ represent the mean values of all ground-truth and predicted values, respectively, and $N$ denotes the total number of samples in the test set.
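For reference, a direct NumPy transcription of Equations (5)–(8):

```python
import numpy as np

def regression_metrics(x, y):
    """x: ground-truth angles, y: predictions, per Eqs. (5)-(8)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mae = np.mean(np.abs(y - x))
    rmse = np.sqrt(np.mean((y - x) ** 2))
    r2 = 1.0 - np.sum((x - y) ** 2) / np.sum((x - x.mean()) ** 2)
    cc = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
        np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
    return mae, rmse, r2, cc
```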

4. Results and Discussion

In this section, we present the results of elbow-joint-angle estimation using the proposed method under three scenarios: single-subject, multi-subject, and inter-subject. In the multi-subject scenario, we further discuss the effect of the Input Scaling module on the training performance of the STCCE. Finally, the results are quantitatively compared and analyzed using the evaluation metrics, and the model’s performance is benchmarked against some existing methods.

4.1. Single-Subject

To evaluate the model’s ability to learn the mapping between sEMG signals and elbow joint angles within individual subjects, experiments were first conducted in the single-subject training and testing scenario. Figure 5 illustrates the training and validation loss curves for the seven individual models. The black line represents the training loss, while the dark blue line represents the validation loss. It can be observed that the models converge rapidly without signs of overfitting. Figure 6 illustrates the trends of the MAE and RMSE on the validation set during training for the individually trained models of the seven subjects. It can be observed that the errors for all subjects decrease rapidly in the early training stages and gradually stabilize. Although there are slight differences in initial error levels and convergence speeds among subjects, both MAE and RMSE remain at low levels.
Table 3 shows the evaluation results under this setting. Across the seven subjects, the model achieved an average MAE of 2.96° ± 0.24°, RMSE of 4.41° ± 0.45°, R² of 0.9924 ± 0.0020, and CC of 0.9963 ± 0.0010 on the test set. These results demonstrate that the proposed STCCE model achieves high estimation accuracy and consistency in the single-subject scenario.

4.2. Multi-Subject

To further evaluate the model’s performance in multi-subject scenarios, a training experiment using data from all subjects was conducted. Similar to the single-subject training process, both the training and validation losses converged rapidly, and the validation error eventually stabilized without signs of overfitting, as shown in Figure 7. Table 4 presents the evaluation metrics under the multi-subject training scenario, with an MAE of 3.30°, RMSE of 4.75°, R² of 0.9915, and CC of 0.9962 on the test set. The results are similar to those obtained in the single-subject scenario, indicating that the proposed method maintains high estimation accuracy even in multi-subject settings. To further visualize the elbow-angle estimation performance, a segment of continuous data is randomly selected for comparison, as shown in Figure 8. It can be observed that the estimated values closely match the ground truth, demonstrating the model’s excellent accuracy and stability in angle estimation.
To evaluate the effectiveness of the Input Scaling module during training, this section compares the training and validation losses with and without the module. As shown in Figure 9, the model incorporating Input Scaling converges more rapidly in the early stages of training, exhibits a more stable validation loss throughout, and ultimately achieves a lower final loss than the model without the module. Notably, in the later stages of training, the model without Input Scaling continues to show a decreasing training loss while its validation loss has already converged, resulting in an increasing gap between the two. In contrast, the model with Input Scaling maintains consistent trends in both training and validation losses, with both converging to lower values. These findings indicate that the Input Scaling module not only accelerates convergence but also improves the model’s regression performance and enhances its generalization capability.

4.3. Inter-Subject

In the inter-subject scenario, the LOSO cross-validation strategy is adopted, where data from one subject are entirely excluded for validation and testing in each round, while the remaining subjects’ data are used for training. This setting is intended to evaluate the model’s generalization ability to unseen individuals. To avoid overfitting, an early stopping mechanism was employed, and the number of training epochs was limited to fewer than 15.
Figure 10 presents the test results of the model corresponding to L3. As shown in Table 5, compared to the single-subject and multi-subject tasks, the estimation accuracy in the inter-subject scenario declined, with an average MAE of 15.53° ± 1.80°, RMSE of 21.72° ± 2.85°, R² of 0.8141 ± 0.0540, and CC of 0.9100 ± 0.0306 on the test set. Specifically, RMSE and MAE increased significantly, and R² and CC exhibited greater fluctuations. These results indicate that substantial differences exist in the distribution of sEMG signals across subjects, and that the current model has not yet fully captured subject-invariant representations within the time–frequency feature space.
The notable decline in inter-subject performance is primarily attributed to the differences in sEMG signals across individuals, including variations in amplitude distribution, time–frequency patterns, and movement execution. These discrepancies cause the model to learn subject-dependent features during training, which limits its generalization ability when tested on unseen individuals. To address this issue, future work may consider incorporating strategies with stronger generalization capabilities to enhance model robustness to unseen subjects. On one hand, normalization methods (e.g., z-score normalization) can help reduce inter-subject amplitude variance. On the other hand, domain adaptation techniques (e.g., Transfer Component Analysis (TCA) [47], Domain-Adversarial Neural Network (DANN) [48]) may be employed to align feature distributions across subjects.

4.4. Compared to Other Methods

To enable a rigorous and consistent comparison, we reimplemented and evaluated the commonly used LSTM and BiLSTM models on our dataset using identical input features and evaluation protocols. Table 6 summarizes the average performance under three experimental scenarios: single-subject, multi-subject, and inter-subject estimation. All methods were trained and tested under the same settings.
As shown in Table 6, the proposed STCCE model achieves the best overall performance across all evaluation metrics, including MAE, RMSE, R 2 , and CC, under the three scenarios. In the single-subject and multi-subject settings, STCCE exhibits significant improvements over LSTM and BiLSTM, with notably lower estimation errors and stronger consistency with the ground truth, reflecting superior accuracy and stability. Although the inter-subject scenario presents greater challenges due to individual variability, STCCE still maintains a consistent advantage, outperforming the baselines across all metrics. However, the margin of improvement is less pronounced in this case, suggesting that, while the model generalizes well across subjects, inter-subject differences remain a limiting factor. These results collectively confirm the robustness and generalization ability of STCCE across diverse experimental conditions.
To further verify the statistical significance of the performance improvement, we conducted paired t-tests between STCCE and the baseline models based on MAE, as shown in Table 7. It is worth noting that, in the single-subject and inter-subject scenarios, one MAE value is computed per model for each subject, and these paired values are used to perform the t-test. In contrast, the multi-subject results are obtained through five-fold cross-validation, and the averaged MAE of each fold is used as a sample, resulting in five paired values per comparison. The results demonstrate that STCCE significantly outperforms both LSTM and BiLSTM across all three evaluation scenarios. In the single-subject and multi-subject settings, very high t-statistics (T > 15) and p-values below 1 × 10⁻⁵ indicate strong statistical significance. Even in the more challenging inter-subject scenario, where individual variability is higher, the improvements of STCCE over LSTM (T = 3.718, p = 0.00494) and BiLSTM (T = 3.73, p = 0.00487) remain statistically significant. These statistical results further confirm the reliability and robustness of the proposed STCCE model and its superiority over the baselines under various testing conditions.
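A minimal sketch of such a paired test with scipy.stats follows. The STCCE per-subject MAEs are taken from Table 3; the LSTM values are placeholders for illustration only, since the paper reports only scenario-level averages for the baselines.

```python
from scipy.stats import ttest_rel

# Per-subject MAE values (degrees): STCCE from Table 3;
# the LSTM list is hypothetical, for illustration of the procedure.
mae_stcce = [2.85, 2.88, 2.99, 3.51, 2.77, 2.94, 2.79]
mae_lstm = [10.2, 10.6, 10.1, 11.0, 10.3, 10.5, 10.1]

t_stat, p_value = ttest_rel(mae_lstm, mae_stcce)
print(f"T = {t_stat:.3f}, p = {p_value:.5f}")
```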

5. Conclusions

This study focuses on the continuous estimation of upper-limb elbow joint angles driven by sEMG signals and proposes a regression method based on time–frequency domain features. To address the non-stationarity and multi-channel distribution variability of sEMG signals, STFT is employed to extract time–frequency features. A multi-scale encoder network, termed STCCE, is designed by integrating temporal and channel attention mechanisms. The model includes a per-channel temporal modeling module and a cross-channel modeling module, enabling effective feature fusion across both temporal and spatial dimensions. Additionally, an input scaling module is introduced to enhance training efficiency and generalization performance.
Extensive experiments are conducted on a dataset comprising over 100,000 samples collected from seven subjects. The proposed method demonstrated strong performance in three representative scenarios: single-subject, multi-subject, and inter-subject. In the single-subject setting, the model achieved an average MAE of 2.96°, RMSE of 4.41°, R² of 0.9924, and CC of 0.9963. In the multi-subject training scenario, the model maintained high estimation accuracy, achieving an MAE of 3.30°, RMSE of 4.75°, R² of 0.9915, and CC of 0.9962. Although the estimation accuracy declined in the inter-subject scenario due to individual variability, the model still outperformed some existing methods, with an average MAE of 15.53°, RMSE of 21.72°, R² of 0.8141, and CC of 0.9100.
Although the proposed model demonstrates high accuracy under controlled conditions, there are some limitations regarding its real-world applicability. Since sEMG signals are sensitive to electrode placement, deviations in electrode positioning may lead to distribution shifts in the input features, thereby affecting the model’s performance. While the elastic design of the armband helps maintain a relatively consistent sensor layout, small variations in orientation or muscle contact may still introduce signal variability. These factors highlight the need for further robustness studies and adaptation mechanisms to ensure stable performance across different wearing conditions.
Moreover, inter-subject generalization remains a significant challenge. Differences in physiological factors, such as muscle geometry and skin impedance, can lead to substantial variability in sEMG patterns across subjects. These factors contribute to the performance gap observed when applying a model trained on some individuals to unseen ones. To mitigate this issue, potential solutions include transfer learning approaches that fine-tune the model using a small amount of data from the target subject and subject normalization techniques that reduce inter-subject variability at the feature level. In addition, lightweight calibration procedures or pretraining on large, diverse subject datasets may help improve robustness and facilitate practical deployment in real-world rehabilitation scenarios.
In conclusion, the proposed method effectively fuses the time–frequency features of sEMG signals and achieves continuous estimation of elbow joint angles. This contributes to more accurate and natural motion-intention perception in upper-limb rehabilitation systems. Future research will aim to address the challenge of inter-subject generalization by exploring a range of learning frameworks, such as transfer learning, meta-learning, and domain generalization. Our approach will consider both data-level and model-level strategies: at the data level, we will investigate normalization and alignment methods to reduce individual differences; at the model level, we will explore techniques that promote the learning of subject-invariant and generalizable representations.

Author Contributions

Conceptualization, X.H. and P.Z.; methodology, X.H. and H.C.; software, X.H.; validation, X.H. and X.C.; investigation, X.H. and X.C.; data curation, X.C.; writing—original draft preparation, X.H.; writing—review and editing, X.H. and P.Z.; visualization, X.H.; supervision, P.Z. and H.C.; funding acquisition, P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by the Fundamental Research Funds for the Central Universities of China (Grant No. PA2025GDSK0060)—Anhui Province Key Laboratory of Digital Design and Manufacturing. All findings and results presented in this paper are those of the authors and do not represent the funding agencies.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of Hefei University of Technology (Protocol No. HFUT20250619001H from 19 June 2025).

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study. Written informed consent was obtained from the participants to publish this paper.

Data Availability Statement

The data are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.

Abbreviations

The following abbreviations are used in this manuscript:
sEMG      Surface electromyographic
STFT      Short-Time Fourier Transform
STCCE     Scale Temporal–Channel Cross-Encoder
ZC        Zero Crossing
MAV       Mean Absolute Value
WL        Waveform Length
SSC       Slope Sign Changes
DASDV     Difference Absolute Standard Deviation Value
IEMG      Integrated EMG
VMD       Variational Mode Decomposition
WPT       Wavelet Packet Transform
DWT       Discrete Wavelet Transform
WOA       Whale Optimization Algorithm
SVR       Support Vector Regression
ANNs      Artificial Neural Networks
LSTM      Long Short-Term Memory
BiLSTM    Bidirectional LSTM
CNNs      Convolutional Neural Networks
TS-CNN    Two-stream multi-scale Convolutional Neural Network
SDK       Software Development Kit
LOSO      Leave-One-Subject-Out
MAE       Mean Absolute Error
RMSE      Root Mean Square Error
R²        Coefficient of Determination
CC        Pearson Correlation Coefficient
TCA       Transfer Component Analysis
DANN      Domain-Adversarial Neural Network

References

  1. Feigin, V.L.; Brainin, M.; Norrving, B.; Martins, S.; Sacco, R.L.; Hacke, W.; Fisher, M.; Pandian, J.; Lindsay, P. World Stroke Organization (WSO): Global Stroke Fact Sheet 2022. Int. J. Stroke 2022, 17, 18–29. [Google Scholar] [CrossRef]
  2. Ersoy, C.; Iyigun, G. Boxing Training in Patients with Stroke Causes Improvement of Upper Extremity, Balance, and Cognitive Functions but Should It Be Applied as Virtual or Real? Top. Stroke Rehabil. 2021, 28, 112–126. [Google Scholar] [CrossRef]
  3. Anwer, S.; Waris, A.; Gilani, S.O.; Iqbal, J.; Shaikh, N.; Pujari, A.N.; Niazi, I.K. Rehabilitation of Upper Limb Motor Impairment in Stroke: A Narrative Review on the Prevalence, Risk Factors, and Economic Statistics of Stroke and State of the Art Therapies. Healthcare 2022, 10, 190. [Google Scholar] [CrossRef] [PubMed]
  4. Zhang, Y.; Zhao, P.; Li, X.; Zhang, L.; Zhou, Y.; Wang, S. Design of MMSD Six-Bar Rehab Device toward the Realization of Multiple Gait Trajectories with One Adjustable Parameter. IEEE/ASME Trans. Mechatron. 2024, 29, 4309–4319. [Google Scholar] [CrossRef]
  5. Song, W.; Zhao, P.; Li, X.; Zhang, Y.; Wang, S. Data-Driven Design of a Six-Bar Lower-Limb Rehabilitation Mechanism Based on Gait Trajectory Prediction. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 31, 109–118. [Google Scholar] [CrossRef] [PubMed]
  6. Zhao, P.; Zhang, Y.; Guan, H.; Li, X.; Wang, S. Design of a Single-Degree-of-Freedom Immersive Rehabilitation Device for Clustered Upper-Limb Motion. J. Mech. Robot. 2021, 13, 031006. [Google Scholar] [CrossRef]
  7. Zhao, P.; Zhu, L.; Zi, B.; Zhang, Y.; Wang, S. Design of Planar 1-DOF Cam-Linkages for Lower-Limb Rehabilitation via Kinematic-Mapping Motion Synthesis Framework. J. Mech. Robot. 2019, 11, 041006. [Google Scholar] [CrossRef]
  8. Chen, H.; Zhu, H.; Teng, Z.; Xie, L.; Song, A. Design of a Robotic Rehabilitation System for Mild Cognitive Impairment Based on Computer Vision. J. Eng. Sci. Med. Diagn. Ther. 2020, 3, 021108. [Google Scholar] [CrossRef]
  9. Inoue, Y.; Kuroda, Y.; Yamanoi, Y.; Okajima, Y.; Tsuji, T. Development of Wrist Separated Exoskeleton Socket of Myoelectric Prosthesis Hand for Symbrachydactyly. Cyborg Bionic Syst. 2024, 5, 0141. [Google Scholar] [CrossRef]
  10. Kuroda, Y.; Yamanoi, Y.; Jiang, H.; Inoue, Y.; Tsuji, T. Toward Cyborg: Exploring Long-Term Clinical Outcomes of a Multi-Degree-of-Freedom Myoelectric Prosthetic Hand. Cyborg Bionic Syst. 2025, 6, 0195. [Google Scholar] [CrossRef]
  11. Hu, K.; Ma, Z.; Zou, S.; Zhu, Y.; Tao, B.; Zhang, D. Impedance Sliding-Mode Control Based on Stiffness Scheduling for Rehabilitation Robot Systems. Cyborg Bionic Syst. 2024, 5, 0099. [Google Scholar] [CrossRef]
  12. Chen, W.; Song, W.; Chen, H.; Xie, L.; Wang, S. Motion Synthesis for Upper-Limb Rehabilitation Motion with Clustering-Based Machine Learning Method. In Proceedings of the ASME International Mechanical Engineering Congress and Exposition, Salt Lake City, UT, USA, 8–14 November 2019; American Society of Mechanical Engineers: New York, NY, USA, 2019; Volume 59407, p. V003T04A066. [Google Scholar]
  13. Qassim, H.M.; Wan Hasan, W.Z. A Review on Upper Limb Rehabilitation Robots. Appl. Sci. 2020, 10, 6976. [Google Scholar] [CrossRef]
  14. Colombo, R.; Pisano, F.; Micera, S.; Mazzone, A.; Delconte, C.; Carrozza, M.C.; Dario, P.; Minuco, G. Robotic Techniques for Upper Limb Evaluation and Rehabilitation of Stroke Patients. IEEE Trans. Neural Syst. Rehabil. Eng. 2005, 13, 311–324. [Google Scholar] [CrossRef] [PubMed]
  15. Ghai, S.; Ghai, I.; Lamontagne, A. Virtual Reality Training Enhances Gait Poststroke: A Systematic Review and Meta-Analysis. Ann. N. Y. Acad. Sci. 2020, 1478, 18–42. [Google Scholar] [CrossRef] [PubMed]
  16. Ghai, S.; Ghai, I. Effects of (Music-Based) Rhythmic Auditory Cueing Training on Gait and Posture Post-Stroke: A Systematic Review & Dose-Response Meta-Analysis. Sci. Rep. 2019, 9, 2183. [Google Scholar]
  17. Zhang, T.; Sun, H.; Zou, Y. An Electromyography Signals-Based Human-Robot Collaboration System for Human Motion Intention Recognition and Realization. Robot. Comput.-Integr. Manuf. 2022, 77, 102359. [Google Scholar] [CrossRef]
  18. Zhang, X.; Qu, Y.; Zhang, G.; Wang, Z.; Chen, C.; Xu, X. Review of sEMG for Exoskeleton Robots: Motion Intention Recognition Techniques and Applications. Sensors 2025, 25, 2448. [Google Scholar] [CrossRef]
  19. Khairuddin, I.M.; Sidek, S.N.; Majeed, A.P.P.A.; Razman, M.A.M.; Puzi, A.A.; Yusof, H.M. The Classification of Movement Intention through Machine Learning Models: The Identification of Significant Time-Domain EMG Features. PeerJ Comput. Sci. 2021, 7, e379. [Google Scholar] [CrossRef]
  20. Li, Z.Y.; Zhao, X.G.; Zhang, B.; Ding, Q.C.; Zhang, D.H.; Han, J.D. Review of sEMG-Based Motion Intent Recognition Methods in Non-Ideal Conditions. Acta Autom. Sin. 2021, 47, 955–969. [Google Scholar]
  21. Li, L.L.; Cao, G.Z.; Liang, H.J.; Zhang, Y.P.; Cui, F. Human Lower Limb Motion Intention Recognition for Exoskeletons: A Review. IEEE Sens. J. 2023, 23, 30007–30036. [Google Scholar] [CrossRef]
  22. Liu, H.; Tao, J.; Lyu, P.; Tian, F. Human-Robot Cooperative Control Based on sEMG for the Upper Limb Exoskeleton Robot. Robot. Auton. Syst. 2020, 125, 103350. [Google Scholar] [CrossRef]
  23. Kiguchi, K.; Hayashi, Y. An EMG-Based Control for an Upper-Limb Power-Assist Exoskeleton Robot. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2012, 42, 1064–1071. [Google Scholar] [CrossRef]
  24. Aung, Y.M.; Al-Jumaily, A. Estimation of Upper Limb Joint Angle Using Surface EMG Signal. Int. J. Adv. Robot. Syst. 2013, 10, 369. [Google Scholar] [CrossRef]
  25. Ding, Z.; Yang, C.; Tian, Z.; Yi, C.; Fu, Y.; Jiang, F. sEMG-Based Gesture Recognition with Convolution Neural Networks. Sustainability 2018, 10, 1865. [Google Scholar] [CrossRef]
  26. Zhang, L.; Liu, G.; Han, B.; Wang, Z.; Zhang, T. sEMG-Based Human Motion Intention Recognition. J. Robot. 2019, 2019, 3679174. [Google Scholar] [CrossRef]
  27. Wei, W.; Wong, Y.; Du, Y.; Hu, Y.; Kankanhalli, M.; Geng, W. A Multi-Stream Convolutional Neural Network for sEMG-Based Gesture Recognition in Muscle-Computer Interface. Pattern Recognit. Lett. 2019, 119, 131–138. [Google Scholar] [CrossRef]
  28. Wei, Z.; Zhang, Z.Q.; Xie, S.Q. Continuous Motion Intention Prediction Using sEMG for Upper-Limb Rehabilitation: A Systematic Review of Model-Based and Model-Free Approaches. IEEE Trans. Neural Syst. Rehabil. Eng. 2024, 32, 1487–1504. [Google Scholar] [CrossRef] [PubMed]
  29. Xiao, F.; Wang, Y.; Gao, Y.; Zhu, Y.; Zhao, J. Continuous Estimation of Joint Angle from Electromyography Using Multiple Time-Delayed Features and Random Forests. Biomed. Signal Process. Control 2018, 39, 303–311. [Google Scholar] [CrossRef]
  30. Raj, R.; Sivanandan, K.S. Comparative Study on Estimation of Elbow Kinematics Based on EMG Time Domain Parameters Using Neural Network and ANFIS NARX Model. J. Intell. Fuzzy Syst. 2017, 32, 791–805. [Google Scholar] [CrossRef]
  31. Karheily, S.; Moukadem, A.; Courbot, J.B.; Abdeslam, D.O. sEMG time–frequency features for hand movements classification. Expert Syst. Appl. 2022, 210, 118282. [Google Scholar] [CrossRef]
  32. Adzkia, M.; Setiawan, A.W.; Arland, F. Comparation Classification of EMG Signals in the Time Domain and Time-Frequency Domain. In Proceedings of the 2023 International Conference on Electrical Engineering and Informatics (ICEEI), Bandung, Indonesia, 10–11 October 2023; IEEE: New York, NY, USA, 2023; pp. 1–5. [Google Scholar]
  33. Wen, L.; Xu, J.; Li, D.; Pei, X.; Wang, J. Continuous Estimation of Upper Limb Joint Angle from sEMG Based on Multiple Decomposition Feature and BiLSTM Network. Biomed. Signal Process. Control 2023, 80, 104303. [Google Scholar] [CrossRef]
  34. Alazrai, R.; Alabed, D.; Alnuman, N.; Khalifeh, A.; Mowafi, Y. Continuous Estimation of Hand’s Joint Angles from sEMG Using Wavelet-Based Features and SVR. In Proceedings of the 4th Workshop on ICTs for Improving Patients Rehabilitation Research Techniques, Lisbon, Portugal, 13–14 October 2016; pp. 65–68. [Google Scholar]
  35. Jiang, H.; Yamanoi, Y.; Chen, P.; Wang, X.; Chen, S.; Xu, Y.; Li, G.; Yokoi, H.; Jing, X. TF2AngleNet: Continuous Finger Joint Angle Estimation Based on Multidimensional Time–Frequency Features of sEMG Signals. Biomed. Signal Process. Control 2025, 107, 107833. [Google Scholar] [CrossRef]
  36. Zhang, L.; Wang, J.; Liu, J.; Chen, W. Estimation of Joint Angle Using sEMG Based on WOA-SVR Algorithm. In Proceedings of the 2023 IEEE 18th Conference on Industrial Electronics and Applications (ICIEA), Ningbo, China, 18–22 August 2023; IEEE: New York, NY, USA, 2023; pp. 1674–1679. [Google Scholar]
  37. Aung, Y.M.; Al-Jumaily, A. sEMG Based ANN for Shoulder Angle Prediction. Procedia Eng. 2012, 41, 1009–1015. [Google Scholar] [CrossRef]
  38. Li, D.; Zhang, Y. Artificial Neural Network Prediction of Angle Based on Surface Electromyography. In Proceedings of the 2011 International Conference on Control, Automation and Systems Engineering (CASE), Singapore, 30–31 July 2011; IEEE: New York, NY, USA, 2011; pp. 1–3. [Google Scholar]
  39. Graves, A. Long Short-Term Memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45. [Google Scholar]
  40. Graves, A.; Schmidhuber, J. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Netw. 2005, 18, 602–610. [Google Scholar] [CrossRef] [PubMed]
  41. Ruan, Z.; Ai, Q.; Chen, K.; Ma, L.; Liu, Q.; Meng, W. Simultaneous and Continuous Motion Estimation of Upper Limb Based on sEMG and LSTM. In Proceedings of the 14th International Conference on Intelligent Robotics and Applications (ICIRA 2021), Yantai, China, 22–25 October 2021; Springer: Cham, Switzerland, 2021. Part I. pp. 313–324. [Google Scholar]
  42. Ma, C.; Lin, C.; Samuel, O.W.; Guo, W.; Zhang, H.; Greenwald, S.; Xu, L.; Li, G. A Bi-Directional LSTM Network for Estimating Continuous Upper Limb Movement from Surface Electromyography. IEEE Robot. Autom. Lett. 2021, 6, 7217–7224. [Google Scholar] [CrossRef]
  43. Hajian, G.; Morin, E. Deep Multi-Scale Fusion of Convolutional Neural Networks for EMG-Based Movement Estimation. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 30, 486–495. [Google Scholar] [CrossRef]
  44. Silva-Acosta, V.C.; Román-Godínez, I.; Torres-Ramos, S.; Salido-Ruiz, R.A. Automatic Estimation of Continuous Elbow Flexion–Extension Movement Based on Electromyographic and Electroencephalographic Signals. Biomed. Signal Process. Control 2021, 70, 102950. [Google Scholar] [CrossRef]
  45. Li, H.; Guo, S.; Wang, H.; Bu, D. Subject-Independent Continuous Estimation of sEMG-Based Joint Angles Using Both Multisource Domain Adaptation and BP Neural Network. IEEE Trans. Instrum. Meas. 2022, 72, 1–10. [Google Scholar] [CrossRef]
  46. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
  47. Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q. Domain Adaptation via Transfer Component Analysis. IEEE Trans. Neural Netw. 2010, 22, 199–210. [Google Scholar] [CrossRef]
  48. Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-Adversarial Training of Neural Networks. J. Mach. Learn. Res. 2016, 17, 1–35. [Google Scholar]
Figure 1. Experiment platform setting: (a) Schematic of data acquisition. (b) Myo armband and single-axis angle sensor. (c) Angle measurement device.
Figure 2. Movement process for acquiring data.
Figure 3. Overview of proposed STCCE model.
Figure 4. Distribution of STFT feature values before scaling.
Figure 5. Single-subject model training and validation loss.
Figure 6. Single-subject model validation MAE and RMSE during training.
Figure 7. Training and validation performance in multi-subject scenarios.
Figure 8. Comparison between ground truth and estimation values.
Figure 9. Effect of the Input Scaling on model training and validation loss. Input Scaling: multiply the input data by a fixed scale.
Figure 10. L3 inter-subject angle estimation results.
Table 1. Information of the subjects.

Subject   Gender   Age   Height (m)   Weight (kg)
1         male     27    1.86         72
2         male     23    1.80         68
3         male     24    1.68         68
4         male     26    1.75         70
5         female   27    1.70         58
6         female   26    1.63         55
7         female   24    1.64         56
Table 2. The size of datasets for each subject.

Subject   Training Set (70%)   Validation Set (15%)   Test Set (15%)
1         11,089               2376                   2377
2         8354                 1790                   1791
3         9696                 2078                   2078
4         10,371               2222                   2223
5         9695                 2078                   2078
6         10,977               2352                   2353
7         12,019               2576                   2576

Sample shape (all subjects): Input [7, 8, 17]; Output [1].
Table 3. Results of the single-subject test set.

Subject   MAE (°)   RMSE (°)   R²       CC
1         2.85      4.55       0.9928   0.9965
2         2.88      4.07       0.9926   0.9964
3         2.99      4.64       0.9910   0.9956
4         3.51      5.33       0.9884   0.9942
5         2.77      3.95       0.9933   0.9968
6         2.94      4.36       0.9933   0.9969
7         2.79      3.99       0.9951   0.9977
Table 4. Results of the multi-subject test set.

Dataset         MAE (°)   RMSE (°)   R²       CC
Multi-subject   3.30      4.75       0.9915   0.9962
Table 5. Results of the inter-subject test set.

Subjects   MAE (°)   RMSE (°)   R²       CC
L1         14.69     21.45      0.8397   0.9178
L2         13.85     18.20      0.8542   0.9314
L3         13.64     19.38      0.8445   0.9408
L4         16.99     24.33      0.7572   0.8909
L5         17.51     25.13      0.7315   0.8569
L6         18.14     24.97      0.7799   0.8861
L7         13.87     18.55      0.8915   0.9464

Li means leaving out the dataset of subject i (i = 1–7) for validation and testing.
Table 6. Comparison of estimation performance with other methods.

Research     Method   Scenario         MAE (°)   RMSE (°)   R²       CC
[44]         LSTM     Single-subject   10.40     14.08      0.9221   0.9600
                      Multi-subject    10.54     14.94      0.9165   0.9574
                      Inter-subject    18.47     23.79      0.8046   0.8951
[33]         BiLSTM   Single-subject   10.41     14.06      0.9219   0.9601
                      Multi-subject    10.64     15.03      0.9154   0.9568
                      Inter-subject    18.48     23.76      0.7797   0.8964
This paper   STCCE    Single-subject   2.96      4.41       0.9924   0.9963
                      Multi-subject    3.30      4.75       0.9915   0.9962
                      Inter-subject    15.53     21.72      0.8141   0.9100
Table 7. Paired t-test results based on MAE.

Methods            Scenario         T-Statistic   p-Value
LSTM vs. STCCE     Single-subject   15.41         2.36 × 10⁻⁶
                   Multi-subject    87.02         5.23 × 10⁻⁸
                   Inter-subject    3.718         0.00494
BiLSTM vs. STCCE   Single-subject   15.88         1.98 × 10⁻⁶
                   Multi-subject    158.93        4.70 × 10⁻⁹
                   Inter-subject    3.73          0.00487
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
