1. Introduction
In biomedical instrumentation, electromyography (EMG) signals have been widely used to control myoelectric prostheses due to the muscle information they contain [
1,
2]. Given the success of these applications in prosthetics, there is an opportunity to extend the use of EMG signals to other assistive devices, such as muscle–computer interfaces.
Therefore, the acquisition of muscle signals and the use of autoencoders to reconstruct them, followed by classification with recurrent neural networks, appear to be a promising approach for developing muscle–computer interfaces.
Various studies have developed algorithms capable of interpreting these signals and converting them into precise commands for prosthetic manipulation [
3,
4].
The electromyography (EMG) technique records electrical activity during muscle contraction and produces EMG signals [
5], which can be processed to identify patterns and thus classify data [
6]. Furthermore, the use of autoencoders enables a compact, low-dimensional representation of EMG signals via a latent vector [
7]. This representation reduces data dimensionality and can improve the model’s accuracy [
8]. By integrating these approaches, neural networks can learn to classify EMG signals more efficiently and use them in assistive devices with muscle–computer control.
Therefore, it is essential to develop a robust system for acquiring, filtering, processing, and classifying EMG signals that, together with a standardized acquisition protocol, guarantees high precision in the identification of muscle movements and their translation into control commands, thus ensuring accurate and reliable control. The correct classification of signals is essential to ensure efficient, safe control that adapts to the user’s needs, enabling them to operate the assistive device.
In contrast to traditional EMG classification methods that require manually engineered features, the proposed approach uses an autoencoder to learn latent representations directly from the signal, reducing reliance on feature design and preserving key morphological information. However, the core challenge in EMG-based gesture classification lies in the high variability of EMG signals across users and recording conditions, which limits the robustness and generalization of existing systems. Traditional approaches relying on handcrafted features or direct classification of raw signals often struggle to capture consistent representations under such variability.
To address this challenge, the proposed framework employs an autoencoder-based approach that learns compact latent representations of EMG signals in an unsupervised manner. By enforcing signal reconstruction prior to classification, the model is encouraged to preserve physiologically meaningful temporal and amplitude-related characteristics, while reducing redundancy and noise that negatively affect classification performance.
The contributions of this work are as follows:
A public EMG dataset acquired from 20 healthy participants using a low-complexity setup with three surface EMG sensors and a standardized acquisition protocol.
An autoencoder-based framework that performs EMG signal reconstruction and automatic latent feature extraction, reducing dependence on manually engineered features commonly used in traditional approaches.
A comparative analysis between a general autoencoder and sensor-specific autoencoders, showing that per-sensor models significantly improve reconstruction fidelity.
An evaluation of subject-wise and user-adaptive data partitioning strategies, clarifying the trade-off between cross-subject generalization and personalized EMG-based control systems.
The document is structured as follows.
Section 2 reviews previous research on EMG signal acquisition and classification.
Section 3 reviews the theoretical foundation. The implemented methodology is detailed in
Section 4.
Section 5 details the experiments and their results, while
Section 6 provides an in-depth analysis of the findings. Finally, the study is concluded in
Section 7.
2. Related Works
Various methodologies have been presented for the acquisition and classification of EMG signals, varying in the muscles involved, electrode type, filters, extracted features, and neural network models used.
In [
9], a bidirectional LSTM-based recurrent network was trained to classify 18 hand gestures from preprocessed EMG signals using envelope features. The samples of each hand movement were divided into training (50%), validation (16.7%), and test (33.3%) sets. The test accuracy was 86.7% using 12 sensors.
Later, Ref. [
10] proposed an ANN to classify seven finger movements from the EMG signal of the superficial flexor muscle of the fingers, reaching an accuracy of 95.52% with 85% of the data used for training and 15% for testing. Although it proved effective with a simple architecture, the database was limited to six participants and a single sensor, which limits the model’s generalization.
Subsequently, in ref. [
11], FFNN, LSTM, and GRU models were compared using two databases: DualMyo (1 subject, 8 gestures) and NinaPro DB5 (10 participants, 8 gestures). Accuracies of 96.6% and 90.8%, respectively, were achieved. Although recurrent models offered shorter training times, the need for optimization for implementation in embedded systems was noted, as was the need to expand the database with more participants.
Later, in ref. [
12], RNN variants were applied to forearm EMG signals, achieving up to 99.6% accuracy. The experiments were conducted on two databases, each recorded with eight forearm sensors, covering seven and four gestures, respectively. The data were split into 90% for training and 10% for validation. However, the number of participants was not reported, which limits the interpretation of the reported accuracy.
A proposal applied to the control of a hand prosthesis is presented in [
3], where four surface electrodes and an LSTM-RNN network were used to recognize five hand gestures from a single subject, achieving an accuracy of 90.7% in validation and 87.2% in testing.
The overview of the related works is presented in
Table 1.
Recent advances in biomedical signal processing have expanded beyond traditional recurrent neural network approaches, incorporating improved autoencoder variants and modern fusion strategies to enhance classification robustness and feature extraction. For instance, semi-supervised and denoising autoencoder models have been applied to sEMG gesture recognition, achieving greater noise resilience and improved classification performance by combining latent representations with supervised learning frameworks [
13]. Similarly, variational autoencoder approaches have demonstrated the ability to generate structured latent spaces that improve generalization to unseen motion patterns, thereby addressing one of the major challenges in EMG classification [
14].
Studies have explored more advanced deep learning architectures for surface EMG analysis. Ref. [
15] proposed a transformer-based framework that combines temporal modeling and feature fusion for hand gesture recognition using sEMG signals, reporting improved performance compared to conventional recurrent models, particularly when large training datasets are available. This work highlights the potential of attention mechanisms for capturing long-range temporal dependencies in EMG signals. Similarly, Ref. [
16] investigated the use of vision transformers for hand gesture recognition by transforming high-density sEMG signals into image-like representations. Their approach demonstrates that transformer-based architectures can effectively model spatial–temporal patterns in sEMG data, although it relies on high-density sensor configurations and increased computational complexity. In addition, ref. [
17] presented a comprehensive review of deep learning techniques applied to EMG-based human–machine interaction. The authors analyzed convolutional, recurrent, and hybrid architectures, emphasizing that while deep models achieve high accuracy, challenges such as inter-subject variability, data requirements, and real-time deployment remain open issues.
In contrast to these approaches, the present study focuses on autoencoder-based latent feature extraction combined with recurrent models, aiming to reduce feature engineering requirements and sensor complexity while maintaining robust performance under realistic acquisition conditions.
Specifically, the present study focuses on recurrent neural networks, namely LSTM and GRU architectures. These models were selected because they remain widely adopted, computationally efficient, and well-suited for capturing temporal dependencies in EMG signals without requiring extensive optimization or substantial computational resources. Therefore, this work aims to evaluate the capability of autoencoder-based feature extraction combined with recurrent neural networks within a controlled methodological scope, rather than providing a broad comparative analysis against all emerging state-of-the-art approaches.
4. Methodology
This section details the phases for developing acquisition and classification systems. These include creating the database, signal processing, the autoencoder, and selecting hyperparameters for classification.
4.1. Windowing
According to [
18], segmentation into short-duration windows is adequate to estimate muscle movements, since this time period is sufficient to capture the relevant EMG signal patterns associated with a specific movement. The use of overlapping windows increases the density of information, as it generates more data segments from the same set of signals [
26,
27].
The EMG signals were acquired at a sampling rate of 1000 Hz, resulting in 1000 samples per second. Accordingly, a window length of 200 ms corresponds to 200 samples at the selected sampling rate.
First, the EMG recordings were organized by movement class. A total of 2000 signals (trials) were obtained across all movements, corresponding to 400 repetitions per movement. For each signal, a central segment was extracted by selecting samples between indices 2500 and 6999, resulting in 4500 samples per trial. The discarded initial and final portions correspond to resting states before and after the contraction, ensuring that the analysis focused exclusively on steady muscle activation associated with the intended movement.
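The trimming step described above can be sketched as follows. This is a minimal illustration, assuming each single-sensor trial is held as a Python list; the names `trim_trial`, `START`, and `STOP`, as well as the 9500-sample length of the synthetic recording, are hypothetical and not taken from the acquisition software.

```python
# Indices taken from the text: keep samples 2500..6999 (inclusive).
START = 2500
STOP = 7000  # Python slices exclude the end index, so 7000 keeps index 6999


def trim_trial(trial):
    """Return the central contraction segment of one single-sensor trial."""
    return trial[START:STOP]


# Synthetic stand-in for a raw recording (its length is an assumption).
raw_trial = list(range(9500))
segment = trim_trial(raw_trial)
print(len(segment), segment[0], segment[-1])  # 4500 2500 6999

# Trial count implied by the text: 20 participants x 20 repetitions x 5 movements.
n_trials = 20 * 20 * 5
print(n_trials)  # 2000
```

Because the slice end is exclusive, writing `7000` rather than `6999` is what yields exactly 4500 samples per trial.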
Different levels of overlap were tested for EMG signal segmentation, starting at 70%. Each window was shifted by 60 ms, resulting in 72 windows per signal. Then, with 75%, the shift was 50 ms with 87 windows per signal; and finally, an overlap of 80% was tested, reducing the shift to 40 ms and generating 113 windows.
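The window counts for the 70% and 75% overlaps follow from the trial length, window length, and shift. The sketch below assumes that only full windows are kept and that no padding is applied; the function names are illustrative. It does not attempt to reproduce the 113 windows reported for the 80% overlap, which may depend on boundary handling that is not described here.

```python
FS_HZ = 1000          # sampling rate stated in the text
WINDOW_MS = 200       # window length stated in the text
TRIAL_SAMPLES = 4500  # trimmed trial length stated in the text


def ms_to_samples(duration_ms, fs_hz=FS_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return duration_ms * fs_hz // 1000


def count_windows(n_samples, window, step):
    """Number of full windows that fit when no padding is applied."""
    if n_samples < window:
        return 0
    return (n_samples - window) // step + 1


def sliding_windows(signal, window, step):
    """Return every full window of `signal` as a list of lists."""
    last_start = len(signal) - window
    return [signal[s:s + window] for s in range(0, last_start + 1, step)]


window = ms_to_samples(WINDOW_MS)
for shift_ms in (60, 50):
    step = ms_to_samples(shift_ms)
    overlap = 100 * (window - step) / window
    n = count_windows(TRIAL_SAMPLES, window, step)
    print(f"shift {shift_ms} ms -> overlap {overlap:.0f}%, {n} windows")
# shift 60 ms -> overlap 70%, 72 windows
# shift 50 ms -> overlap 75%, 87 windows
```

With the 50 ms shift, 2000 trials of 87 windows each give 174,000 windows in total.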
These overlap percentages were chosen because overlapping windows capture information continuously during signal analysis, which is essential for real-time classification systems [
26,
28].
Finally, with the 75% overlap, a total of 174,000 windows were obtained, each containing 200 time points for each of the 3 sensors.
4.2. Autoencoder
The database was divided into three sets: training, validation, and testing. Two autoencoder models were created. The General model was trained on a vector containing data from all three sensors simultaneously. Because this model could not accurately reconstruct the signals from the three sensors, a separate model was trained for each sensor: the Individual model comprised three autoencoders, each trained on data from a single sensor.
The autoencoder architecture consists of two main components: an encoder and a decoder, with detailed configurations for both the general and individual models summarized in
Table 3.
The encoder is designed to capture the temporal structure of EMG signals and compress the input time series into a compact latent representation. Stacked recurrent layers (LSTM and GRU) are employed to model temporal dependencies and muscle activation dynamics, followed by a dense layer that performs dimensionality reduction. This latent vector represents the most informative features of the EMG signal while reducing redundancy.
The decoder mirrors the encoder structure and reconstructs the original EMG signal from the latent representation. By progressively expanding the latent vector through recurrent layers, the decoder preserves the temporal and amplitude characteristics of the original signal. This reconstruction objective ensures that the learned latent space retains physiologically meaningful information relevant for classification.
The architecture of each model begins with an input layer that receives signals from 200 points from a single sensor. To facilitate learning and stabilize training, a normalization layer was applied to the input signal, ensuring that the values had a mean of zero and a standard deviation of one.
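As a rough illustration of this normalization, the sketch below standardizes values to zero mean and unit standard deviation. It assumes the statistics are estimated on training data only and then reused unchanged for other sets; the function names and the sample values are hypothetical, and the actual normalization layer of the framework is not reproduced here.

```python
from statistics import fmean, pstdev


def fit_standardizer(training_values):
    """Estimate mean and standard deviation from training data only."""
    mean = fmean(training_values)
    std = pstdev(training_values)
    if std == 0:
        std = 1.0  # avoid division by zero for a constant signal
    return mean, std


def standardize(values, mean, std):
    """Apply previously fitted statistics to any window."""
    return [(x - mean) / std for x in values]


# Illustrative amplitudes (not measured data).
train_values = [0.10, 0.40, 0.35, 0.80, 0.55, 0.20]
mean, std = fit_standardizer(train_values)
normalized = standardize(train_values, mean, std)
print(round(fmean(normalized), 6), round(pstdev(normalized), 6))
```

Fitting the statistics once and reusing them keeps validation and test data from influencing the normalization.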
EarlyStopping with a patience of 200 epochs and ModelCheckpoint were used for each model to save the best models during training.
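The interaction between patience-based early stopping and best-model checkpointing can be illustrated with a framework-agnostic sketch. It replays a list of per-epoch validation losses instead of training a network; the function name and the loss values are hypothetical, and the code is not the callbacks' actual implementation.

```python
def replay_early_stopping(val_losses, patience):
    """Replay validation losses and report (best_epoch, best_loss, last_epoch).

    The best epoch plays the role of the checkpointed model; training stops
    once `patience` consecutive epochs bring no improvement.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # "checkpoint" the best model
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, best_loss, epoch
    return best_epoch, best_loss, len(val_losses) - 1


# Illustrative loss curve (not measured data).
losses = [0.90, 0.60, 0.45, 0.47, 0.46, 0.50, 0.44]
print(replay_early_stopping(losses, patience=3))    # (2, 0.45, 5)
print(replay_early_stopping(losses, patience=200))  # (6, 0.44, 6)
```

A large patience, as in the second call, lets training continue through temporary plateaus while the checkpoint still keeps the best epoch.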
The autoencoder architecture was selected to balance representational capacity and computational efficiency. Fully connected recurrent-based autoencoders are well-suited for EMG signals due to their ability to preserve temporal structure while maintaining a compact latent representation. Unlike convolutional or transformer-based architectures, this design does not require large datasets or extensive hyperparameter optimization, making it suitable for scenarios with limited data and offering potential for future optimization toward real-time assistive and muscle–computer interface applications.
After training the autoencoders, each encoder was used as a fixed feature extractor. For each EMG window, the latent vector for each sensor was independently obtained and concatenated to form a single feature vector per sample, which was used as input to the classifier.
Importantly, the data were partitioned into training, validation, and test sets prior to feature extraction. The autoencoders were trained exclusively using the training set, and their weights were frozen before extracting latent representations from the validation and test sets. Consequently, no information from the test set was used to train or tune the autoencoders, the normalization layers, or the classifier.
To ensure that previously acquired knowledge was not altered, the encoder layers were frozen, preventing their weights from being updated during training of the new models, thereby maintaining the efficiency and accuracy of the latent representations.
The new models were constructed using the latent layer as the final output, rather than the reconstructed signals. With these new models, latent vectors were extracted from both the training set, on which the classifier’s neural network was trained, and the test set, on which the model’s performance was evaluated.
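The per-sensor feature extraction described in this subsection can be sketched as follows. The encoders here are hypothetical stand-ins (simple chunk averaging) used only to show the data flow; they are not the trained recurrent encoders, and the latent size of 8 is an assumption rather than the value listed in Table 3.

```python
LATENT_DIM = 8  # assumption for illustration only


def make_stub_encoder(latent_dim):
    """Hypothetical stand-in for a trained, frozen encoder.

    It averages equal-sized chunks of the window to produce a fixed-length
    vector, mimicking only the input and output shapes of a real encoder.
    """
    def encode(window):
        chunk = len(window) // latent_dim
        return [sum(window[i * chunk:(i + 1) * chunk]) / chunk
                for i in range(latent_dim)]
    return encode


def extract_features(sensor_windows, encoders):
    """Encode each sensor's window independently, then concatenate."""
    if len(sensor_windows) != len(encoders):
        raise ValueError("one encoder is required per sensor")
    features = []
    for window, encode in zip(sensor_windows, encoders):
        features.extend(encode(window))
    return features


encoders = [make_stub_encoder(LATENT_DIM) for _ in range(3)]
# One sample: three sensors, 200 time points each (synthetic values).
sample = [[float(i % 7) for i in range(200)] for _ in range(3)]
feature_vector = extract_features(sample, encoders)
print(len(feature_vector))  # 24
```

The classifier then receives one concatenated vector per window, whose length is the number of sensors times the latent size.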
4.3. EMG Classifier
For the classifier, a neural network with a dense architecture was developed for multi-class signal classification. The same model architecture was used across experiments, with variations only in the data partitioning. The model’s input was a one-dimensional vector containing EMG data from the three sensors. Before feeding the data into the model, a normalization layer was applied to improve training stability and efficiency by reducing feature variance.
In the first classifier model, a partition was made based on the participants in the database. Data from 16 randomly selected people were assigned to the training set, 3 to the validation set, and the last 3 to the test set.
In the random partition strategy, the dataset was split into 72% for training, 8% for validation, and 20% for testing. This configuration corresponds to an initial 80/20 train–test split, where 10% of the training data was reserved for validation.
Once the training, validation, and test sets were obtained, a deep neural network was designed and trained for signal classification. Input data normalization was applied to reduce inter-sample amplitude variability, which improved training stability and facilitated model convergence.
The hyperparameters were adjusted using the validation data, and
Table 4 presents the details of the hyperparameters used to train the classification model with both data partitions.
These two partitioning strategies were designed to evaluate different application scenarios. The subject-wise partition assesses the model’s ability to generalize to unseen users, while the random partition represents a user-adaptive scenario in which the system benefits from prior exposure to subject-specific EMG patterns. This distinction allows a clearer interpretation of the scope and limitations of the proposed approach.
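The difference between the two strategies can be made concrete with a short sketch. The function names, the seed, and the toy set of ten subject IDs are assumptions for illustration; group sizes are passed as parameters, and only the 80/20 split with 10% of the training portion held out for validation comes from the text.

```python
import random


def subject_wise_split(subject_ids, n_val, n_test, seed=0):
    """Assign whole participants to each set so no subject is shared."""
    ids = sorted(set(subject_ids))
    random.Random(seed).shuffle(ids)
    test_ids = set(ids[:n_test])
    val_ids = set(ids[n_test:n_test + n_val])
    train_ids = set(ids[n_test + n_val:])
    return train_ids, val_ids, test_ids


def random_split(n_samples, test_frac=0.20, val_frac_of_train=0.10, seed=0):
    """Shuffle window indices, hold out the test set, then carve out validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_frac)
    n_val = round((n_samples - n_test) * val_frac_of_train)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


# Toy example with ten hypothetical subject IDs.
train_ids, val_ids, test_ids = subject_wise_split(range(1, 11), n_val=2, n_test=2)
print(len(train_ids), len(val_ids), len(test_ids))  # 6 2 2

# Random partition over all 174,000 windows: 72% / 8% / 20%.
train, val, test = random_split(174_000)
print(len(train), len(val), len(test))  # 125280 13920 34800
```

In the subject-wise case every window of a participant lands in the same set, whereas the random split can place windows from the same participant, and even from the same trial, in different sets.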
The selection of the autoencoder and classifier architectures was guided by the characteristics of EMG signals and the objectives of this study. Recurrent-based architectures were chosen due to their ability to model temporal dependencies and preserve the sequential structure of muscle activation patterns. The size of the hidden layers was selected empirically to ensure sufficient representational capacity for accurate signal reconstruction while avoiding excessive overfitting.
Hyperparameters such as learning rate, batch size, and number of epochs were selected based on preliminary experiments that ensured stable convergence and consistent performance across participants. Rather than optimizing for minimal model size, the focus was placed on evaluating the feasibility and effectiveness of latent-space representations for EMG reconstruction and classification.
All experiments were conducted using a fixed random seed. Models were trained for a maximum of 500 epochs, with early stopping and a patience of 20 epochs to prevent overfitting. Training was performed on a CPU-based system using an AMD Ryzen 7 processor and 16 GB of RAM.
5. Results
This section presents the analysis of the methodology’s results.
5.1. Signal Processing
Each trimmed trial contained 4500 samples per sensor, recorded from 20 participants with 20 repetitions per movement. For processing, however, the data must be segmented and windowed to enable accurate analysis and reduce computational load.
The EMG signals from the three sensors were trimmed to remove the rest periods and segmented into windows of 200 samples, allowing detailed analysis of the signals’ evolution over time and facilitating their use in training neural network models.
Figure 4 shows a comparison between the 4500 ms EMG signal and the 200 ms signal for the five hand movements across the three sensors. The y-axis corresponds to the signal voltage, while the x-axis represents time. This segmenting reduces visual saturation and makes temporal activation patterns and amplitude variations more distinguishable.
In the graphs of the movement data, the complete 4500 ms signal is saturated with information, making it difficult to clearly distinguish the signal voltages over time. In contrast, when analyzing the segmented 200 ms signal, the waveforms and muscle activation patterns of each sensor are more clearly visible.
A window length of 200 ms was selected as a trade-off between temporal resolution and signal stability, a commonly used approach in EMG analysis to capture muscle activation patterns while maintaining sufficient responsiveness. To further improve dataset representativeness and preserve temporal continuity between consecutive segments, a 75% overlap was applied. This configuration increased the number of training samples while maintaining computational efficiency. As a result, 87 windows were obtained per trial, with a 50 ms time shift between consecutive windows.
The windowing of the signals from the three sensors can be seen in
Figure 5.
Finally, the database contains 174,000 windows of 200 time points, with 34,800 windows for each of the five classes: fist gesture, thumb flexion, rest, hand extension, and three-finger flexion, in that order.
Segmentation and windowing of the EMG signals enabled us to obtain more data from the same set. This approach resulted in a higher information density, thereby improving the representativeness of the data used to train and validate the machine learning model.
5.2. Autoencoder
Two approaches were evaluated: one in which the combined signals of the three sensors were processed with a single encoder–decoder model, and another in which three independent encoder–decoder models were used, one for each sensor.
When it was observed that the General model could not accurately reconstruct signals from all three sensors simultaneously, individual models were adopted. The comparison between the two approaches demonstrated that using a dedicated encoder and decoder for each sensor yielded better performance with lower loss.
Separation of sensors allowed each model to specialize in extracting features from its own signal, leading to improved reconstruction quality and lower loss than the joint model. An advantage of the separate approach was the ability to perform computational optimization. By training the models independently, the batch size was adjusted for each sensor, allowing better use of computational resources without compromising training quality.
The EarlyStopping and ModelCheckpoint callbacks retained the best-performing weights of each model, yielding latent representations that improved reconstruction and the performance of the final classifier.
Table 5 shows the results of both models.
The reconstruction error reached its minimum for the best-performing sensor-specific autoencoder (Table 5), indicating high-fidelity reconstruction of the original EMG signals.
Figure 6,
Figure 7,
Figure 8,
Figure 9 and
Figure 10 compare the predictions from the three sensors obtained by both autoencoder models. In all graphs, the blue signal represents the true signal, and the red signal represents the model-predicted signal.
Figure 6a,b show the signal from the common flexor muscle of the fingers.
Figure 6c,d correspond to the flexor muscle of the thumb, while
Figure 6e,f show the activity of the extensor muscle of the fingers.
For the fist movement in the General Model, the predicted signal better matches the real signal in sensors 1 and 3, though it struggles to reproduce high-voltage values. In sensor 2, where the voltages are lower, the model does not adequately replicate the original pattern. In contrast, the Individual Model reproduces the signals from sensors 1, 2, and 3 more accurately, with sensor 1 achieving the best fit.
In the thumb flexion movement, the
General Model continues to show limitations: in
Figure 7a, sensor 1 fails to reproduce the low voltage values, although it partially follows the general trend. The same occurs in sensors 2 (
Figure 7c) and 3 (
Figure 7e), where the predicted signal deviates from the original. In contrast, in the
Individual Model, the same real signal windows are observed, and the predicted signal faithfully replicates the three sensors as shown in
Figure 7b,d,f.
For the resting hand signal, the voltages are significantly lower, with a maximum near 0.1 V, reflecting low electrical activity. In this case, the
General Model fails to reproduce some relevant peaks and troughs in the signal as shown in
Figure 8a,c,e. In the
Individual Model, sensor 3 (
Figure 8f) achieves the highest accuracy, while sensor 1 (
Figure 8b) shows more noticeable differences between predicted and real data than for the other movements.
Hand extension is the movement that produces the highest voltage in the extensor muscle sensor. For this reason, in the
General Model, sensor 3 (
Figure 9e) follows the trend of the original signal more closely, although not closely enough to yield a reliable reconstruction. Sensor 2 (
Figure 9c) and sensor 1 (
Figure 9a) fail to reproduce the signal, with sensor 1 being the most difficult case. The
Individual Model has no difficulty replicating the signals from the three sensors, as shown in
Figure 9b,d,f.
For the last movement, flexion of the three middle fingers, the behavior is similar to that observed for the previous movements. In the
General Model, sensors 1 (
Figure 10a) and 3 (
Figure 10e) manage to partially approximate the original signal, while sensor 2 (
Figure 10c) shows difficulties in reaching the actual voltage values. In contrast, the
Individual Model is highly accurate at replicating the signals from the three sensors, as shown in
Figure 10b,d,f.
This analysis was replicated for different windows and participants, obtaining results consistent with those presented in this paper.
The General Model performs better with high-voltage signals than with low-voltage signals. Its best performance is observed in the hand extensor muscle sensor, while its poorest performance is observed in the thumb flexor muscle sensor. However, none of the replicas generated by this model achieves the accuracy achieved by the Individual Model.
In contrast, higher reconstruction accuracy was consistently observed when using autoencoders trained independently for each sensor. This indicates that the encoder’s latent features effectively capture sensor-specific characteristics of muscle activation.
The higher reconstruction accuracy and lower MSE values demonstrate that the autoencoder preserves both the temporal structure and amplitude modulation of the EMG signals, which are directly related to muscle activation intensity and contraction dynamics. Furthermore, classes with higher sensitivity exhibit more consistent activation patterns across sensors, whereas lower-sensitivity classes are associated with gestures that show greater inter-subject variability or overlapping muscle recruitment.
5.3. EMG Classifier
To assess the classifier’s performance and generalization, two data partitioning strategies were evaluated. The first approach used a participant-based partition, while the second employed a random partition of the entire dataset.
During training and validation, the model trained on the participants’ partition achieved 92% accuracy; however, when evaluated on the test set, the accuracy dropped significantly to 60%. This showed that the classifier fed with this database configuration did not generalize well to previously unseen participants. This behavior is consistent with the known inter-subject variability of EMG signals and highlights the difficulty of achieving subject-independent generalization without additional normalization or adaptation mechanisms.
With the random partition of the database, a classification accuracy of 94.3% was obtained on the training set, 88.98% on the validation set, and 88.81% on the test set.
Figure 11 shows the evolution of the accuracy over time for the training and validation sets. In the early training epochs, both curves increase rapidly, indicating that the neural network is learning relevant patterns. Subsequently, the validation accuracy stabilizes at around 89%, while the training accuracy continues to increase, reaching values close to 96%. This difference suggests that the model begins to fit the training data more closely than the validation data. Because the gap between the two curves remains moderate, the model retains useful classification capability.
Figure 12 shows the loss as a function of epochs for both sets. Both loss curves decrease during the first epochs, reflecting efficient learning. However, starting at approximately epoch 100, the validation loss stabilizes, while the training loss continues to decline steadily. About 200 epochs later, a slight increase in validation loss is observed, reinforcing the presence of slight overfitting.
Together,
Figure 11 and
Figure 12 show the normal behavior of a well-trained neural network, with signs of overfitting in the final stages of training, which are mitigated by early stopping techniques. These data suggest that the trained model has reached its maximum performance before falling into overfitting.
The confusion matrix was obtained to evaluate the performance of the multi-class model.
Figure 13 compares the model’s predictions with the true classes, enabling a detailed analysis of classification errors and successes.
The main diagonal of the matrix shows the cases in which the model correctly classified, while the values off the diagonal represent classification errors. Class 4, corresponding to the extension movement of the fingers, was the most accurately classified, and the flexion movement of the thumb, class 2, was the least accurately classified.
In the first row, corresponding to class 1, the model correctly classified 5737 instances. Class 1 is most often confused with class 5, since both movements, the fist gesture and three-finger flexion, mainly activate the common flexor muscle.
The thumb flexion movement was the least sensitive: class 2 had 5961 correct predictions. It is confused with class 3, the resting state. A likely explanation is that the thumb flexor, the smallest muscle sensed, reaches a maximum voltage of 1 V, so its signals can resemble those obtained during the resting state.
In class 3, a total of 6462 correct classifications were obtained, with only 5 errors in predicting class 1. The fact that this is the class with the fewest errors can be attributed to the low-voltage signals obtained when the muscles are at rest.
For the finger extension movement, class 4, 6520 correct predictions were achieved. This is the highest number of correct predictions, which may be explained by several factors: the common extensor muscle opposes the two flexors, it reaches the highest voltages, and this is the only movement in which it participates, which facilitates its differentiation.
Finally, class 5 had 6226 correct classifications. The biggest classification problem in this category is with class 1, as also seen for the fist gesture, suggesting that the two classes share similar patterns in the data representation because the same muscle is activated for both actions.
Based on the confusion matrix, true and false positives and negatives were obtained for each of the five classes using a one-vs.-rest strategy. These values were used to compute class-wise accuracy, sensitivity, and specificity according to Equations (
1)–(
3). The reported mean accuracy, sensitivity, and specificity correspond to macro-averages across the five classes.
Then, the class-wise values were averaged to obtain the mean accuracy (95.52%), mean sensitivity (88.79%), and mean specificity (97.20%) reported in
Table 6.
Finally, the global efficiency value of 93.84% was computed as the arithmetic mean of the averaged accuracies, sensitivities, and specificities, providing a single indicator of overall classification performance.
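The metric computation described above can be sketched as follows, assuming the standard one-vs.-rest definitions of accuracy, sensitivity, and specificity and that every class appears at least once. The 3-class confusion matrix is a toy example and does not reproduce the results in Figure 13; only the last lines use the mean values reported in the text.

```python
def one_vs_rest_metrics(cm):
    """Per-class metrics from cm[i][j] = samples of true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp     # other classes, predicted k
        tn = total - tp - fn - fp
        metrics.append({
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        })
    return metrics


def macro_average(metrics, key):
    """Unweighted mean of one metric across classes."""
    return sum(m[key] for m in metrics) / len(metrics)


# Toy 3-class confusion matrix (illustrative values only).
toy_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
per_class = one_vs_rest_metrics(toy_cm)
print(round(macro_average(per_class, "sensitivity"), 4))  # 0.8

# Global efficiency as described in the text: the arithmetic mean of the
# reported mean accuracy, sensitivity, and specificity (in percent).
reported_means = (95.52, 88.79, 97.20)
global_efficiency = sum(reported_means) / len(reported_means)
print(round(global_efficiency, 2))  # 93.84
```

Class-wise accuracy counts true negatives as correct, which is why the mean accuracy is higher than the mean sensitivity in a multi-class setting.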
6. Discussion
This study explored the use of an autoencoder to replicate EMG signals and extract latent features to improve the classification of hand movements. Unlike other studies that use raw signals directly or only after prior digital processing, this methodology provides a compact, unsupervised representation of the relevant information in EMG signals, preserving essential temporal and amplitude-related characteristics of muscle activity.
The obtained results support this design choice. The low reconstruction error indicates that the learned latent space effectively preserves essential muscle activation patterns, which contributes to more stable classification performance. The observed differences between subject-wise and random partitioning further highlight the impact of inter-subject variability, a well-known limitation in EMG-based systems. In this context, the proposed approach is particularly suitable for user-adaptive scenarios, where subject-specific calibration allows the latent representations to be exploited more effectively.
To analyze the classifier’s performance and generalizability, two data partitioning strategies were evaluated.
Subject-wise partitioning guarantees full independence between sets, preventing any information transfer between individuals. In this configuration, the model reached 92% accuracy during training and validation, but its performance dropped to 60% in the test set. This decline indicates that the model captures person-specific traits rather than subject-independent patterns. Variations in amplitude, noise level, muscle tone, and activation shape between participants shift the signal distribution and affect the latent representation of the autoencoders. To mitigate this behavior, it is necessary to incorporate mechanisms that reduce or eliminate subject dependence during training. It is important to emphasize that achieving full inter-subject generalization in EMG-based systems is inherently challenging due to physiological differences among users, including muscle anatomy, electrode placement, contraction strategies, and noise characteristics. As a result, many practical EMG control systems rely on a calibration or adaptation phase tailored to each user rather than a fully subject-independent model.
The observed drop in test accuracy indicates that the learned representations are strongly influenced by subject-specific EMG characteristics, which is a known limitation in EMG-based classification.
By contrast, the random partition represents a user-adaptive scenario, in which an initial calibration using subject-specific data is assumed. Accordingly, the results obtained with the random partition are not intended to demonstrate cross-subject generalization, but rather to assess the feasibility of adaptive and personalized EMG-based control systems.
Overall, the results show that the subject-wise partition provides an assessment of inter-subject generalization capability, revealing the strong influence of subject-specific EMG patterns. In contrast, the random partition reflects a more realistic, user-centered scenario for adaptive systems, assuming an initial calibration phase. This configuration aligns with the objective of this study, as the model benefits from previously observed anatomy-specific patterns, thereby improving classification performance and customization capabilities. Consequently, the random partition forms the basis for the practical implementation of personalized EMG control systems.
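The difference between the two partitioning strategies can be made concrete with a minimal sketch. The function names, the 20% test fraction, and the 50 windows per subject are assumptions chosen for illustration; only the number of subjects (20) comes from the study, and this is not the authors' implementation.

```python
import random


def subject_wise_split(subject_ids, test_fraction=0.2, seed=0):
    """Hold out whole subjects, so no subject appears in both sets."""
    rng = random.Random(seed)
    subjects = sorted(set(subject_ids))
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


def random_split(subject_ids, test_fraction=0.2, seed=0):
    """Shuffle individual windows, so a subject can appear in both sets."""
    rng = random.Random(seed)
    indices = list(range(len(subject_ids)))
    rng.shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def shared_subjects(subject_ids, train_idx, test_idx):
    """Subjects that contribute windows to both the training and test sets."""
    train_subjects = {subject_ids[i] for i in train_idx}
    test_subjects = {subject_ids[i] for i in test_idx}
    return train_subjects & test_subjects


# Hypothetical layout: 20 subjects, 50 windows each (window count is assumed).
subject_ids = [s for s in range(20) for _ in range(50)]

sw_train, sw_test = subject_wise_split(subject_ids)
rd_train, rd_test = random_split(subject_ids)

# Subject-wise: the overlap is empty by construction.
print(len(shared_subjects(subject_ids, sw_train, sw_test)))
# Random: the test subjects have also been seen during training.
print(len(shared_subjects(subject_ids, rd_train, rd_test)))
```

With the subject-wise split the classifier is tested on people it has never seen, which corresponds to the cross-subject setting; with the random split each test window comes from a person whose other windows were available for training, which corresponds to the calibrated, user-adaptive setting.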
To contextualize the scope of the experimental evaluation, the objective was not to perform an extensive benchmark against every existing classifier, but to evaluate whether a latent-space-based representation and per-sensor specialization improve reconstruction fidelity and classification consistency in EMG signals.
Therefore, the comparison was intentionally focused on models that share the same learning paradigm (temporal deep models capable of sequence representation and reconstruction), ensuring methodological coherence rather than an exhaustive algorithmic competition.
It is acknowledged that overlapping windows may introduce correlations between samples; however, this effect is inherent to continuous EMG processing and does not compromise the validity of the personalized-use evaluation. More restrictive partitioning strategies at the repetition or session level, as well as leave-one-subject-out validation, are considered important directions for future work.
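To illustrate the two points above, the sketch below shows how much neighboring windows share under a sliding-window segmentation, and how leave-one-subject-out folds could be generated. The window length (200 samples), step (50 samples), and all function names are hypothetical values chosen for the example, not the settings used in this study.

```python
def sliding_windows(n_samples, window, step):
    """Return (start, end) sample ranges of fixed-length windows."""
    return [(s, s + window) for s in range(0, n_samples - window + 1, step)]


def overlap_fraction(a, b):
    """Fraction of window a's samples that also belong to window b."""
    shared = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return shared / (a[1] - a[0])


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Assumed segmentation: 1000-sample recording, 200-sample windows, 50-sample step.
windows = sliding_windows(n_samples=1000, window=200, step=50)
print(len(windows))                              # 17
print(overlap_fraction(windows[0], windows[1]))  # 0.75
print(overlap_fraction(windows[0], windows[4]))  # 0.0

# Toy example: three subjects with two windows each.
folds = list(leave_one_subject_out([0, 0, 1, 1, 2, 2]))
print(folds[0])  # (0, [2, 3, 4, 5], [0, 1])
```

In this example adjacent windows share 75% of their samples, which is the source of the correlation mentioned above; splitting by repetition, session, or subject keeps such neighboring windows on the same side of the partition.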
In [
11], a test accuracy of 96.65% was achieved using the DualMyo database, which includes only one subject, eight gestures, and sixteen sensors. Similarly, Ref. [
12] reported an accuracy of 99.6% using LSTM networks with eight sensors and seven gestures; however, the number of participants was not reported. Although these studies are useful as proofs of concept, they do not address inter-subject variability, which remains a major challenge for both clinical and commercial applications.
In contrast, Ref. [
9] evaluated a larger data set comprising 40 participants, 18 movements, and 12 sensors, achieving a test accuracy of 86.7%. Similarly, Ref. [
11] achieved a test accuracy of 90.82% based on data from ten participants, eight gestures, and sixteen sensors. Although these evaluations involved more participants and gestures, the high number of sensors helped maintain this level of accuracy; even so, the performance barely exceeds 90%.
In comparison, the present work employs a database collected from 20 different individuals, introducing substantial physiological variability and therefore representing a more realistic and challenging scenario for practical implementations. Despite this added complexity, the proposed model achieved competitive average accuracy in the user-adaptive (random-partition) setting, indicating that it adapts well to diverse users once subject-specific data are available.
Furthermore, studies such as [
10] used databases with only six participants and seven movements, often relying on a single high-density EMG sensor or non-portable configurations. By contrast, the present study used a simple acquisition setup with only three surface EMG sensors, which was sufficient to capture relevant muscle activity while reducing hardware complexity compared to high-density or larger multi-sensor configurations. This simplified sensing setup facilitates portability at the acquisition level, while the proposed learning framework serves as a proof of concept for effective latent representation and classification of EMG signals.
Finally, Table 7 summarizes a comparison between previous studies and the present work, highlighting the number of participants, gestures, sensors, and the maximum reported accuracy in each case.
Compared to studies that use deep neural networks for EMG classification with raw signals, this methodology, based on autoencoders and dense-layer classifiers, reduces the data required for classifier training while maintaining competitive performance.
While the proposed approach demonstrates strong reconstruction fidelity and competitive classification performance, it has some limitations. First, the subject-wise evaluation revealed a notable decrease in accuracy, highlighting the persistent challenge of inter-subject variability in EMG signals. This indicates that the learned latent representations remain partially influenced by subject-specific characteristics. Second, the experimental protocol was conducted under controlled conditions, and real-time performance or robustness under varying electrode placements and muscle fatigue was not explicitly evaluated. Finally, the study focused on recurrent-based architectures and did not explore alternative deep learning paradigms or extensive ablation analyses, which may further improve performance or computational efficiency.
Despite these limitations, the proposed methodology offers clear strengths, including reduced reliance on handcrafted features, improved reconstruction accuracy via per-sensor autoencoders, and a structured framework suitable for adaptive, personalized EMG-based control systems. These characteristics make the approach a promising foundation for future extensions addressing cross-subject generalization and real-world deployment.
Recent advances, such as attention mechanisms, transformer-based architectures, multimodal learning, and advanced autoencoder variants, offer alternative strategies for addressing variability and improving robustness in biomedical time-series analysis. However, evaluating all these paradigms is beyond the scope of this study, which focuses on assessing the effectiveness of recurrent models supported by autoencoder-derived latent features.
As future work, the methodology presented here may be extended toward more advanced architectures, including attention-based neural networks, transformer-inspired models, hybrid deep learning frameworks, and multi-sensor data fusion strategies. Additionally, autoencoder variants such as variational or denoising autoencoders could be explored to improve robustness against noise and inter-subject variability, potentially enhancing performance in real-world EMG-based control scenarios. Alternative deep learning architectures were not included in this study because the primary objective was not to conduct an exhaustive comparison of network designs but to assess the effectiveness of autoencoder-based feature extraction combined with temporal models within a controlled, reproducible framework. Although architectures such as CNNs or transformer-based models may offer advantages in certain applications, they often require larger datasets and greater computational resources. Investigating these approaches, along with systematic ablation studies, represents a relevant direction for future work.