Article

A Novel Hand Motion Intention Recognition Method That Decodes EMG Signals Based on an Improved LSTM

1 School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China
2 School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin 150001, China
3 Weihai Sunfull Electronics Group Co., Ltd., Weihai 264200, China
4 School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
5 Provincial Key Laboratory for Research and Translation of Kidney Deficiency-Stasis-Turbidity Disease, Hangzhou 310018, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(10), 1587; https://doi.org/10.3390/sym17101587
Submission received: 13 August 2025 / Revised: 10 September 2025 / Accepted: 15 September 2025 / Published: 23 September 2025
(This article belongs to the Section Computer)

Abstract

Electromyography (EMG) signals reflect hand motion intention and exhibit a certain degree of amplitude symmetry. Nowadays, recognition of hand motion intention based on EMG has seen burgeoning adoption in various applications, such as rehabilitation, prostheses, and intelligent supply chains. For instance, the motion intentions of humans can be conveyed to logistics equipment, thereby improving the level of intelligence in a supply chain. To enhance the recognition accuracy of multiple hand motion intentions, this paper proposes a hand motion intention recognition method that decodes EMG signals based on improved long short-term memory (LSTM). Firstly, we preprocessed the EMG signals and segmented them using overlapping sliding windows. Secondly, we chose LSTM and improved it to capture features and predict hand motion intention. Specifically, we used a genetic algorithm (GA) to find the optimal combination of key hyperparameters for the LSTM model. We found that our proposed method achieved relatively high accuracy in detecting hand motion intention, with average accuracies of 92.0% (five gestures) and 89.7% (seven gestures), while the highest accuracy reached 100.0% (seven gestures). Our paper may provide a way to predict the motion intention of the human hand for intention communication.

1. Introduction

Electromyography (EMG) signals reflect muscle activity and are superpositions of many motor unit action potentials (MUAPs) in time and space. The fluctuations of raw time-domain EMG signals are relatively symmetrical, and their long-term average approaches zero; that is, the amplitudes of EMG signals are symmetrical relative to the time axis. Surface EMG (sEMG) signals can be recorded using electrodes on the skin. sEMG signals are widely applied in multiple fields, including robot-assisted rehabilitation, prosthesis movement, industrial robot operation, intelligent supply chains, prevention of sports injuries, and other human–machine interaction (HMI) systems [1,2].
Rehabilitation robots are able to provide rehabilitation training for patients after strokes or spinal cord injuries through sEMG signals acquired from limb muscles. This assisted rehabilitation accelerates the rehabilitation process and improves the quality of patients’ lives [3,4]. For amputees, sEMG signals from their residual limb muscles can be acquired and decoded. As a result, prostheses are precisely controlled to perform daily activities, such as picking up items, operating tools, and even engaging in fine handicraft activities [5,6]. During the sorting of industrial products, sEMG signals are used to improve the flexibility and accuracy of robots. Especially in logistics handling and warehousing operations, by monitoring the muscle activities of an operator, motion intention can be conveyed to logistics equipment to make it ready in advance. This improves the efficiency and synergy of intelligent supply chains [7]. During some HMI tasks, sEMG signals are used to control virtual targets or certain devices. Users can interact with computers, intelligent prostheses, or other electronic devices directly through sEMG signals. This provides more autonomy for users while also offering new possibilities for the development of virtual reality (VR) and augmented reality [8,9,10,11]. For example, a bone conduction integrated biosignal acquisition system can be used to obtain sEMG signals simultaneously through electrodes integrated into bone conduction headphones. This system can transmit instructions to computers or machines without the need for arm movement. The technology is not only an interface for healthy people but also an interface for disabled persons with quadriplegia [12].
In EMG decoding, improving the recognition accuracy of hand motion intention is of vital importance, as it can enhance the stability and efficiency of HMI tasks significantly. In terms of traditional feature extraction and classification methods, Qing et al. [13] collected sEMG signals at a sample rate of 2000 Hz from four specific muscles. They selected the root mean square (RMS), waveform length (WL), zero crossing (ZC), and slope sign change (SSC) as features and constructed a classification model using linear discriminant analysis (LDA) and a probabilistic neural network (PNN). The results implied that the average decoding accuracy was about 95%. Fatimah et al. [14] presented a Fourier decomposition method (FDM) for hand motion intention recognition. This method decomposed sEMG signals into Fourier intrinsic band functions (FIBFs) and calculated the entropy, kurtosis, and L1 norm of each FIBF. Statistically relevant features were determined using the Kruskal–Wallis test and used to train machine learning models like a support vector machine (SVM), K-nearest neighbor (KNN), ensemble bagged trees, and ensemble subspace discriminant. The recognition rate reached 93.53% on the NinaPro DB5 dataset, which is composed of 16-channel sEMG signals with a sampling frequency of 200 Hz. Rani et al. [15] introduced an efficient sEMG acquisition band and a new Hjorth secant line (HSL) for feature extraction. They chose the NinaPro DB3 dataset, which contains 12-channel sEMG signals. Using a random forest (RF) classifier, this study achieved a classification accuracy of 82.16%. In order to further improve feature separability, various dimensionality reduction techniques have also been introduced into the EMG recognition system. Junior et al. [16] compared feature selection and dimensionality reduction extensively in hand gesture classification based on sEMG signals collected by an eight-channel arm band. Principal component analysis (PCA), LDA, an Isomap, manifold charting, an autoencoder, t-distributed stochastic neighbor embedding, and large margin nearest neighbor (LMNN) were used for dimensionality reduction. Seven classifiers were used afterwards, aiming to recognize six gestures. Five features and an extreme learning machine classifier were utilized, and an average accuracy of 89.4% was obtained. Considering 40 dimensions, an average accuracy of 94.0% was obtained based on a combination using an SVM with a Gaussian kernel and an LMNN technique. Karheily et al. [17] focused on time–frequency-domain feature extraction and various linear and nonlinear dimensionality reduction methods. The discrete orthogonal Stockwell transform (DOST) and multidimensional scaling (MDS) were applied. The presented methods were evaluated on Exercise A of the NinaPro DB2 dataset (12 channels) using classical classifiers. The accuracies reached 90.05%, 89.92%, and 90.96% using the short-time Fourier transform, continuous wavelet transform, and Stockwell transform, respectively.
With the burgeoning advancement of deep learning, EMG decoding based on deep neural networks has become a research hotspot. Shen et al. [18] constructed a new deep learning-based model. The input layer consisted of 16-channel sEMG data, and a stacked structure of six parallel convolutional layers was used to extract features step by step. Then, a four-layer fully connected network was selected to map high-dimensional features to low dimensions, and the output layer performed classification. The results indicated that the accuracy of motor intention recognition was approximately 90% when using sEMG data from a single subject or all subjects. Xu et al. [19] proposed a novel squeeze-and-excite convolutional neural network (SE-CNN) attention mechanism, which re-calibrated the feature weights of the output of the convolutional layer by introducing temporal squeeze-and-excitation blocks into a simple CNN. The results showed that the recognition accuracies of this algorithm were 77.61% for the NinaPro DB4 dataset and 87.42% for the NinaPro DB5 dataset. Luo et al. [20] introduced the InRes-ACNet model, which integrated a multi-scale module and a self-attention mechanism to improve gesture recognition performance by enhancing the ability to extract channel feature information in sparse sEMG signals. The results for the NinaPro DB1 and NinaPro DB5 datasets implied that the recognition accuracies reached 87.94% and 87.04%, respectively. Sehat et al. [21] utilized the convolutional layer in a CNN to learn the features of sEMG signals and employed a genetic algorithm (GA) to optimize the key structural parameters of the model (the number of convolutional layers, the number of kernels in each convolutional layer, and the number of neurons in the fully connected layer).
Notwithstanding this progress in sEMG-based hand gesture recognition, several challenges remain: Manually extracted features and hand-picked machine learning methods rely on expert experience, which is time-consuming and labor-intensive, and for large-scale datasets, the computational and storage requirements become limiting. Parameters of deep learning models also tend to rely on manual experience, and optimization of key hyperparameters, such as the learning rate and batch size, is often not incorporated. These issues cause overfitting and poor generalization, which decrease the recognition rate of hand motion intention. To address these problems, this paper presents a novel hand motion intention recognition method that improves the long short-term memory (LSTM) model by using a genetic algorithm to optimize its key hyperparameters. Section 2 introduces an overall schematic diagram and relevant theories, including experimental data, preprocessing, and model improvement. Section 3 describes the results step by step, demonstrating the effectiveness of the proposed method. Section 4 discusses the results, and Section 5 summarizes the full text.

2. Materials and Methods

2.1. Experimental Paradigm

The BandMyo dataset was recorded with a Myo armband worn on the forearm. The data consist of 15 static gestures, including finger gestures, wrist gestures, and other gestures. “Thumb”, “Index”, “Middle”, “Ring”, and “Pinky” are finger gestures and only involve finger movements. “Flex”, “Extend”, “Adduct”, “Abduct”, “Sup.”, and “Pro.” are six wrist gestures. The subjects needed to rotate their whole hands around their wrist joints to finish these gestures. “Palm”, “Spread”, “Fist”, and “Point” are four other gestures. The subjects needed to use multiple fingers simultaneously to finish these gestures. Six subjects were recruited, including four males (age: 21–26) and two females (age: 23–25). During the experiment, the subjects performed all 15 gestures while following video guidance, and their sEMG signals were recorded synchronously. The sampling rate was 200 Hz. After completing all 15 gestures, the subjects removed the equipment and took a short rest. Then, the subjects wore the equipment again and repeated the procedure. This process was conducted eight times in total [22].
Based on practicability and necessity, we first selected five gestures, “Spread”, “Fist”, “Point”, “Flex”, and “Extend”, which are the most common actions in daily life. After comparing the hand motion intention recognition performance by means of our proposed method and other conventional methods, we added “Adduct” and “Abduct”, which are relatively common in daily life, for motion intention recognition to further confirm the effectiveness of our method.

2.2. Preprocessing and Manual Feature Extraction

A Myo armband is a portable device that is able to acquire sEMG signals. There are eight ST 78589 operational amplifiers (one for each electrode) in a Myo armband. Usually, the acquired sEMG signals are contaminated by various types of noise, such as low-frequency interference and power line interference. We selected a fourth-order Butterworth high-pass filter (cutoff frequency: 3 Hz; bandwidth: 1 Hz) to eliminate low-frequency interference that did not align with the EMG bandwidth. Subsequently, an infinite impulse response (IIR) notch filter was used to eliminate 50 Hz power line interference.
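For illustration, the following is a minimal sketch of this filtering stage in Python, assuming SciPy and a signal array of shape (samples, channels). The filter types and cutoffs follow the text; the notch quality factor Q = 30 and the use of zero-phase filtering (filtfilt) are our assumptions, not the authors' exact implementation.

from scipy.signal import butter, filtfilt, iirnotch

FS = 200.0  # BandMyo sampling rate (Hz)

def preprocess_emg(emg, fs=FS):
    """Remove low-frequency drift and 50 Hz power line interference.

    emg: float array of shape (n_samples, n_channels).
    """
    # Fourth-order Butterworth high-pass with a 3 Hz cutoff (per the text).
    b_hp, a_hp = butter(4, 3.0, btype="highpass", fs=fs)
    emg = filtfilt(b_hp, a_hp, emg, axis=0)

    # IIR notch at 50 Hz; the quality factor Q = 30 is an assumption.
    b_n, a_n = iirnotch(50.0, 30.0, fs=fs)
    return filtfilt(b_n, a_n, emg, axis=0)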
Because the numbers of sampling points in different repeated experiments differed, we truncated excessively long sequences and extended shorter ones using linear interpolation. Considering the temporal characteristics of the gestures (a gesture remained relatively stable during a certain period, with its main features concentrated in the first half of the sequence), truncation was primarily adopted. The number of sampling points in most repeated experiments was about 628, and the sampling rate of the BandMyo dataset was 200 Hz, so each gesture was considered to be completed within about three seconds. We truncated a small number of sampling points at the end so as to obtain a uniform EMG signal length of 624 sampling points for the repeated experiments.
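A short sketch of this length normalization, assuming NumPy; the target length of 624 samples follows the text, while the helper name normalize_length is illustrative.

import numpy as np

def normalize_length(emg, target_len=624):
    """Truncate long trials; stretch short ones by linear interpolation.

    emg: array of shape (n_samples, n_channels).
    """
    n = emg.shape[0]
    if n >= target_len:
        return emg[:target_len]  # truncate the tail
    old_t = np.linspace(0.0, 1.0, n)
    new_t = np.linspace(0.0, 1.0, target_len)
    # Interpolate each channel independently onto the new time grid.
    return np.stack([np.interp(new_t, old_t, emg[:, c])
                     for c in range(emg.shape[1])], axis=1)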
Later, we made use of overlapping sliding windows and split the sEMG signals into segments. sEMG signals are time-series data, and sliding windows are capable of capturing the temporal dependence and tendency of the data [23]. Dividing sEMG signals into multiple overlapping subsequences helps models learn and predict the behavior of time-series data better. Given the inherent capability of LSTM in processing temporal signals, this study employed computationally efficient time-domain features as the primary features. Features like the mean absolute value (MAV), SSC, and ZC are often used for gesture recognition and have proved effective [24,25]. We calculated the MAV, SSC, and ZC for each sliding window to form a feature group and used conventional classifiers (SVM [26], KNN [27], etc.) for gesture classification for comparison. These features were calculated as follows:
MAV = \frac{1}{N} \sum_{i=1}^{N} \left| x_i \right|

SSC = \sum_{i=2}^{N-1} \mathbb{1}\!\left[ (x_i - x_{i-1})(x_{i+1} - x_i) < 0 \right]

ZC = \sum_{i=1}^{N-1} \mathbb{1}\!\left[ \mathrm{sgn}(x_i) \neq \mathrm{sgn}(x_{i+1}) \right]

where N is the length of the sliding window, x_i is the amplitude of the i-th data point, \mathbb{1}[\cdot] is the indicator function (equal to 1 when the condition holds and 0 otherwise), and sgn(\cdot) denotes the sign function of the signal.
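The windowing and these three features can be computed per channel as in the following sketch (NumPy assumed); the window length of 156 samples and 50% overlap selected later in Section 3.1 are shown here as defaults.

import numpy as np

def sliding_windows(emg, win_len=156, overlap=0.5):
    """Split a (n_samples, n_channels) trial into overlapping windows."""
    step = int(win_len * (1.0 - overlap))  # 78 samples at 50% overlap
    return [emg[s:s + win_len]
            for s in range(0, emg.shape[0] - win_len + 1, step)]

def mav(w):
    # Mean absolute value per channel.
    return np.mean(np.abs(w), axis=0)

def ssc(w):
    # Slope sign changes: samples where (x_i - x_{i-1})(x_{i+1} - x_i) < 0.
    return np.sum((w[1:-1] - w[:-2]) * (w[2:] - w[1:-1]) < 0, axis=0)

def zc(w):
    # Zero crossings: adjacent samples with opposite signs.
    return np.sum(w[:-1] * w[1:] < 0, axis=0)

# One feature group per window, concatenated over all channels:
# features = [np.concatenate([mav(w), ssc(w), zc(w)]) for w in sliding_windows(emg)]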

2.3. Construction of Deep Learning Model

We utilized a genetic algorithm (GA)-based LSTM model and conducted hyperparameter optimization for gesture recognition. The framework of our proposed model is illustrated in Figure 1. It comprises four main components: sEMG acquisition, data preprocessing, model optimization, and hand motion intention recognition.
LSTM learns the long-term and short-term dependencies of data sequences through a storage unit c, which has a self-connection to store the temporal state of the network. Each LSTM unit processes information through three inputs: the input of the current time step (x_t), the output of the previous LSTM unit (h_{t-1}), and the cell state of the previous unit (c_{t-1}). An LSTM unit is controlled by three gates, as shown in Figure 2 [21,28,29,30,31,32]. Further details can be found in [32].
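For reference, the standard formulation of these gates is as follows (consistent with [28,32]), where \sigma is the logistic sigmoid, \odot denotes element-wise multiplication, W and b are learned weights and biases, and [h_{t-1}, x_t] is the concatenation of the previous output and the current input:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)   (forget gate)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)   (input gate)
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)   (candidate state)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t   (cell state update)
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)   (output gate)
h_t = o_t \odot \tanh(c_t)   (unit output)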

2.4. Improving LSTM Model Using GA via Optimal Key Hyperparameter Combination

The number of LSTM units directly affects the memory ability of the model and determines its capacity and representation ability. A higher number of units allows the model to learn more complex time-series patterns but may lead to overfitting; too few units may not capture complex relationships in the data. Choosing the right number of units balances the training ability of the model against overfitting and ensures that the model performs well on both training and testing data. A systematic analysis by Greff et al. [33] shows that one to two LSTM layers are often sufficient to capture most patterns in a time series; therefore, we used two LSTM layers and optimized their unit counts (LSTM1 units and LSTM2 units). The dropout rate is also significant in the LSTM model: it determines the proportion of neurons that are randomly discarded during training to prevent overfitting of the training data. Hence, a moderate dropout rate is needed to balance the training effect against regularization, and we treated the Dropout1 rate and Dropout2 rate as hyperparameters to optimize. The learning rate controls the magnitude of weight updates: if it is too high, training may become unstable or diverge; if it is too low, training is slow and may settle in local optima. Hence, the learning rate needs to be adapted, and we adjusted it dynamically during the optimization process. The batch size determines the number of data samples used in each parameter update. Larger batches can speed up training and may yield more stable gradient estimates but require more computational resources; smaller batches can improve model generalization but may lengthen training. Therefore, when improving the LSTM model, we selected six key hyperparameters: the LSTM1 units (the number of units in the first LSTM layer), LSTM2 units (the number of units in the second LSTM layer), Dropout1 rate (the dropout rate of the first dropout layer), Dropout2 rate (the dropout rate of the second dropout layer), learning rate, and batch size. A sketch of a candidate network under these hyperparameters is given below.
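As an illustration, one candidate network under these six hyperparameters might be built as follows; this is a sketch assuming a Keras-style API with an 8-channel input window of 156 samples and a softmax output, not the authors' exact implementation. The sixth hyperparameter, the batch size, enters later as an argument to model.fit().

import tensorflow as tf

def create_lstm_model(lstm1_units, lstm2_units, dropout1_rate, dropout2_rate,
                      learning_rate, win_len=156, n_channels=8, n_classes=7):
    """Build one candidate two-layer LSTM from a six-gene chromosome."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(win_len, n_channels)),
        tf.keras.layers.LSTM(lstm1_units, return_sequences=True),
        tf.keras.layers.Dropout(dropout1_rate),
        tf.keras.layers.LSTM(lstm2_units),
        tf.keras.layers.Dropout(dropout2_rate),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model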
Each of these six hyperparameters was encoded as one gene, and the entire set of genes was represented as one chromosome (individual) in the GA, so each individual represented a specific hyperparameter combination. The GA was used to search for the optimal combination of these hyperparameters, evolving the population toward optimal solutions; the optimal individual was obtained through selection, crossover, and mutation. After multiple iterations, the optimal LSTM model was confirmed. The optimization process of our proposed model (shown in Figure 3) was as follows:
(1)
Read the sEMG time series.
(2)
Preprocess the data, including EMG normalization and segmentation, using overlapping sliding windows.
(3)
Generate a certain number of individuals randomly, initialize the population, and create an LSTM parameter model.
(4)
Train and validate the LSTM model for each individual, and calculate their fitness values.
(5)
Select excellent individuals based on their fitness values, and generate new populations by way of crossover and mutation.
(6)
Repeat the genetic operations. The individual with the highest fitness converges gradually after multiple iterations.
(7)
Select the individual with the highest fitness, and train the final model by means of its LSTM network configuration.
The specific settings were as follows: Considering the balance between the amount of feature information captured by the model and the sampling rate of the dataset, a range of LSTM units from 64 to 256 was reasonable. Systematic experiments on various recurrent neural network (RNN) variants (including LSTM) show that dropout ratios between 0.1 and 0.4 perform well in language modeling and time-series tasks, and this range can be expanded appropriately if fitness improves slowly in earlier generations [34]. Combined with the pre-experiment, we set the dropout rate from 0.1 to 0.5. The learning rate was initialized in the range of 0.0001 to 0.01, which is commonly used in deep learning tasks, and was adjusted dynamically during training using the Adam optimizer. The batch size was treated as a discrete parameter and adjusted according to the hardware constraints and dataset size, balancing convergence speed against model generalization; a sketch of this search-space encoding is given below. In Appendix A, some key parameters used during the genetic algorithm optimization process are listed (Table A1). Additionally, the pseudocode of our algorithm is provided in Table A2.
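A minimal sketch of how such a chromosome and initial population could be encoded under the stated ranges; the discrete set of batch sizes is our assumption, since the text only says it was hardware-dependent.

import random

# Search ranges from Section 2.4; the batch-size set is an assumption.
UNIT_RANGE = (64, 256)
DROPOUT_RANGE = (0.1, 0.5)
BATCH_SIZES = [16, 32, 64, 128]

def random_individual():
    """One chromosome: [LSTM1 units, LSTM2 units, Dropout1, Dropout2, lr, batch]."""
    return [random.randint(*UNIT_RANGE),
            random.randint(*UNIT_RANGE),
            random.uniform(*DROPOUT_RANGE),
            random.uniform(*DROPOUT_RANGE),
            10 ** random.uniform(-4, -2),  # learning rate, log-uniform in [1e-4, 1e-2]
            random.choice(BATCH_SIZES)]

population = [random_individual() for _ in range(20)]  # POPULATION_SIZE = 20 (Table A1)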

3. Results

3.1. Selection of Overlapping Sliding Window Parameters

Taking Subject 000 as an example, Table 1 presents the recognition rates of five gestures using overlapping sliding windows with different parameters, i.e., the window length (the number of sample points in a window) and the overlap rate. All recognition rates were calculated using LSTM. When the window length was 78 and the overlap rate was 25% or 75%, the sliding step size (58.5 or 19.5 samples, respectively) was not an integer; as a consequence, we ruled these combinations out.
When the sliding window length was 156, the recognition rate peaked at 96.4% with an overlap rate of 50%; both higher and lower overlap rates degraded gesture recognition. With an overlap rate of 50% and a window length of 78, the classification accuracy dropped to 88.3%. When the window length rose to 312, the highest accuracy was 93.8% (overlap rate = 75%). Moreover, an excessive window length reduced the number of samples, which was not conducive to model training. Thus, sEMG data that underwent noise removal and normalization, segmented with sliding windows (window length = 156 and overlap rate = 50%), were adopted as the input for our proposed method and the controlled deep learning architectures in further experiments.

3.2. Optimal Hyperparameter Combination Using GA

For the sake of evaluating the performance of the optimized LSTM model, we conducted experiments on sEMG data from different subjects. The data included sEMG signals from Subjects 000 to 005. We first classified five gestures, namely “Spread”, “Fist”, “Point”, “Flex”, and “Extend”. Table 2 provides information about the classification accuracies of our proposed method for these five gestures. The average recognition rate based on the conventional LSTM model was compared with the rate based on our proposed method, as shown in Table 3.
In Table 2, it is evident that the motion intention of “Extend” was fully predicted for most subjects (all except Subject 001). The other motion intentions were fully predicted for some subjects as well. The classification accuracies of the five gestures showed the largest fluctuations for Subject 001. In Table 3, it is clear that our proposed method achieved higher classification accuracies for all subjects. Specifically, the accuracies of Subjects 001, 002, and 003 were raised by approximately 3.6%, 1.8%, and 1.7%, respectively, compared with conventional LSTM. The results show that GA optimization improved the performance of the LSTM model effectively: by optimizing the hyperparameter settings, our method obtained higher classification accuracy in EMG-based gesture classification. For the subjects with high accuracies (000, 004, and 005), GA optimization maintained the highest level of classification performance; for the subjects with lower accuracies (001 and 002), it improved classification performance significantly.
Subsequently, we classified seven gestures, namely “Spread”, “Fist”, “Point”, “Flex”, “Extend”, “Adduct”, and “Abduct”, and the results are displayed in Table 4.
Compared with the five-category tasks, the accuracies of the seven-category classification tasks show that our proposed method improves recognition performance significantly for more complex hand motion intentions: the improvement our method brought in the five-category tasks was even more pronounced in the seven-category tasks. Introducing GA-based hyperparameter optimization of the LSTM model in the seven-category task thus verified its effectiveness and advantages in complex classification tasks. Subsequently, we plotted the confusion matrix of seven-class gesture recognition based on our proposed method, as shown in Figure 4, and calculated three evaluation indices, i.e., sensitivity, specificity, and the F1-score, as enumerated in Table 5. Taking Subject 000 as an example, we also plotted the convergence curve of the genetic algorithm for seven-gesture recognition (Figure A1 in Appendix B). We then conducted a sensitivity analysis to explore the impact of each hyperparameter (the number of LSTM units, the learning rate, the dropout rate, and the batch size). Specifically, starting from our optimal hyperparameter combination, we varied each hyperparameter over a grid (increasing its value within a set range by a fixed step size) and observed the change in classification accuracy while keeping the other hyperparameters fixed at their optimal values; a sketch of this procedure is given below. The results are shown in Figure 5, where the horizontal axis represents the step size of the hyperparameter and the accuracy value of 93.7% (our proposed baseline) denotes the seven-class recognition rate of Subject 000.
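A sketch of this one-at-a-time sensitivity sweep; sensitivity_sweep and train_and_score are hypothetical helper names, and the training routine itself is assumed to be supplied by the caller.

def sensitivity_sweep(best, param_index, values, train_and_score):
    """Grid over one hyperparameter while the others stay at their GA optima.

    best: the optimal chromosome found by the GA; train_and_score: a
    caller-supplied function that trains a model for one chromosome and
    returns its classification accuracy.
    """
    results = []
    for v in values:
        candidate = list(best)
        candidate[param_index] = v  # vary only this hyperparameter
        results.append((v, train_and_score(candidate)))
    return results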
We also selected conventional classifiers (KNN, an SVM, and a decision tree) based on the MAV, SSC, and ZC and principal deep learning architectures like a gated recurrent unit (GRU) and a CNN to recognize the seven gestures. The classification results obtained by 5-fold cross-validation are displayed in Figure 6. Except for Subject 001, the gesture recognition accuracies of all subjects using our method and the controlled deep learning methods tended to be higher than those achieved by traditional classifiers with manually extracted features. Taking Subject 004 as an example, the gesture recognition rate reached 100.0%, which was higher than those of KNN (77.0%), the SVM (86.7%), the decision tree (70.7%), the GRU (91.1%), and the CNN (97.5%). In addition, the recognition accuracy of LSTM was 98.7%, and it rose to 100.0% via our optimization.
The average classification accuracies of our proposed method and the controlled methods for seven-category gesture recognition are presented in Figure 7. The average accuracy of our method was 89.7%, which was 2.3%, 11.1%, 3.5%, 8.2%, 9.7%, and 3.2% higher than those of LSTM, KNN, the SVM, the decision tree, the GRU, and the CNN, respectively.
Eventually, we compared the gesture recognition performance of our proposed method with the spatial–temporal feature-based gesture recognition (STF-GR) method presented by Zhang et al. [22] using the same dataset. The STF-GR method first uses the multivariate empirical mode decomposition (MEMD) technique to decompose a non-stationary multi-channel sEMG signal into a series of stationary subsignals. Then, a convolutional recurrent neural network (CRNN) is used to automatically learn and fuse the spatial–temporal features of the decomposed sEMG signals to predict gesture categories. We used 15 gestures from all subjects, with repetitions 1, 3, 5, and 7 of each gesture as the training set and repetitions 2, 4, 6, and 8 as the test set, matching the experimental conditions described by Zhang et al. The results are listed in Table 6. SVM_1 and RF_1 used the RMS as a feature; SVM_2 and RF_2 used the MAV, mean absolute value slope (MAVS), ZC, SSC, and WL as features. SVM and RF stand for the support vector machine and random forest classifiers, respectively. On the same dataset, our proposed method achieved an accuracy of 71.9%, a large advantage over SVM_1, RF_1, SVM_2, and RF_2 and a slight advantage over STF-GR. Considering the algorithm complexity and feature calculation, our proposed method performs better.

4. Discussion

Biosignals of muscle activity are often time-dependent, and LSTM specializes in processing sequential data. In this case, LSTM can capture time-dependent relationships in EMG signals effectively. LSTM is able to integrate timing information from different channels and capture coordinated activity among multiple muscle groups. The use of overlapping sliding windows not only increases the number of samples after noise removal but also ensures the effectiveness of feature extraction, allowing more details to be captured and helping to enhance the accuracy of classification.
After improving LSTM through GA-based key hyperparameter optimization, the gesture classification rates of most subjects undeniably rose. Taking Subject 000 as an example, the seven-category recognition rate was enhanced by 6.4%, the largest increase. In addition, it is worth noting that the classification rate of Subject 001 was always lower than those of the other subjects, regardless of the number of gestures and the processing method. We believe the sEMG signals from Subject 001 may contain specific characteristics that affected the performance of both our proposed method and the controlled deep learning architectures. To be specific, the standard electrode positions are based on average anatomy and may not be entirely applicable to all individuals; there is still subjective error at the millimeter level when placing the electrodes, and the influence of this error can be amplified by muscle morphology, fat thickness, and fascia tissue. Additionally, physiological differences such as hairiness and the degree of sweating may cause the electrode–skin contact impedance of some subjects to remain consistently high. These factors decreased the signal quality of Subject 001. It can also be observed that, for this subject, the accuracies obtained using the deep learning frameworks were lower than those obtained using traditional manual features. From our perspective, deep learning frameworks rely on a large number of high-quality samples and are highly sensitive to distribution differences during end-to-end learning, whereas traditional algorithms utilize statistical or time-domain features that can “mitigate” such individual differences. Introducing quality weighting, combining deep features with manual features, or performing data augmentation on the deep learning model’s input would increase the data diversity for Subject 001; these approaches may ensure relatively high recognition performance even when the signal quality is poor or there are individual differences. The recognition rates of five gestures for Subject 004 based on LSTM and our method both reached 100.0%, and when the number of gestures rose to seven, our method still obtained an accuracy of 100.0%, while LSTM made some errors. There is no denying that, by optimizing key hyperparameters in GA-based LSTM, our proposed method is able to predict multiple hand motion intentions more accurately.
For certain subjects, the hand motion intention prediction performance of different methods was consistent. The gesture prediction performance of Subjects 004 and 005 was always high, while that of Subject 001 always showed relatively low accuracy. The standard deviation of the five-class gesture recognition rates obtained via our method was 8.9%, which was 1.4% lower than that obtained via LSTM. Similarly, the standard deviation of the seven-class gesture recognition rates obtained using our method was 12.0%, which was 0.8% lower than that obtained using LSTM. Our proposed method results in less fluctuation and ensures consistency. Based on the comparative experiments, the SVM is generally unsuitable for large-scale datasets, while KNN suffers from higher inference times as the dataset becomes larger. Moreover, both KNN and the decision tree have limited capabilities in modeling the temporal dynamics of sEMG signals. The GRU is a simplified version of LSTM and consists of only two parts: a reset gate and an update gate. There is no independent “memory unit” in the GRU. Although the CNN plays an important role in image processing, its performance is inferior to LSTM when processing EMG sequences or simple feature sequences. Therefore, the overall accuracies of these two deep learning architectures were lower for gesture recognition. In contrast, the LSTM architecture is inherently better at capturing temporal dependencies in EMG data, which leads to higher recognition accuracy. However, the performance of LSTM may be unstable due to its manually set hyperparameters.
In our experiment, the average training time for standard LSTM was 28.26 s. After introducing the GA for the hyperparameter search, the total training time increased to 1409.70 s. The main time cost was due to the repeated training and validation processes of candidate hyperparameter configurations. Although the computational cost increases significantly, the advantage of our proposed method is that it lowers the standard deviation in the same data partition during experiments on different subjects, demonstrating more stable generalization performance. Moreover, the introduction of the GA effectively reduces the reliance on manual repetitive experiments and hyperparameter adjustments, and the optimal hyperparameter combination can be found in a single search process. This optimal configuration can be used repeatedly in the long term, thereby reducing the parameter adjustment costs in new experiments and improving the automation and scalability of the experimental process. Once the training of our model is completed, the inference cost remains approximately the same as that of standard LSTM. Overall, our improved LSTM has notable advantages.

5. Conclusions

In this paper, an improved model was built by introducing GA-based hyperparameter optimization to LSTM for hand motion intention recognition. After sEMG preprocessing, we compared the gesture prediction performance under different window lengths and overlap rates, aiming to ensure an adequate sample number and extract effective features. After confirming the window length and overlap rate, we used the GA to find the optimal combination of crucial hyperparameters (the LSTM1 units, LSTM2 units, Dropout1 rate, Dropout2 rate, learning rate, and batch size) for the conventional LSTM model. After hyperparameter optimization, the model could predict hand motion intention more accurately. The highest recognition rate for seven gestures (100.0%) and the average recognition rate for all subjects (89.7%) were superior to the performance based on manually extracted features and conventional classifiers. Our proposed method may provide a channel for intention communication in human–computer interaction tasks, such as rehabilitation training, industrial operation, and supply chains.
Since the hand motion intention recognition performance differed among the subjects, our future work will generalize the model by means of transfer learning. Additionally, we will recruit stroke patients or amputees, record their sEMG signals, and try to predict their hand motion intentions. Meanwhile, we aim to identify differences in EMG patterns between patients and healthy people.

Author Contributions

Conceptualization, T.-A.C. and Z.C.; Methodology, H.Z.; Software, Y.D. (Yanyun Dai) and M.F.; Validation, L.J., Y.D. (Yiwei Dai) and J.T.; Formal Analysis, H.Z. and C.W.; Investigation, L.J. and J.T.; Resources: C.W.; Data Curation, H.Z. and Y.D. (Yiwei Dai); Writing—Original Draft Preparation, T.-A.C. and H.Z.; Writing—Review and Editing, T.-A.C., Z.C. and J.T.; Visualization, M.F.; Supervision, T.-A.C., Z.C. and J.T.; Project Administration, T.-A.C. and Z.C.; Funding Acquisition, T.-A.C. and Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was sponsored by the “Pioneer” R&D Program of Zhejiang (No. 2025C01088), the Foundation of the Zhejiang Educational Committee (No. Y202456686), the Natural Science Foundation of Zhejiang Province (No. 25222260-D), and the Foundation of Zhejiang Sci-Tech University (No. 23222218-Y).

Data Availability Statement

The publicly available dataset presented in this study can be found in an online repository. The name of the repository and the accession number can be found at https://github.com/Agire/BandMyo-Dataset (accessed on 12 August 2024).

Conflicts of Interest

Author Tian-Ao Cao was employed by the company Weihai Sunfull Electronics Group Co., Ltd., Weihai, Shandong, China. The remaining authors declare that this research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Appendix A

Appendix A.1

Table A1. The key parameters for genetic algorithm optimization.

Parameter Type          Variable Name              Value
Population parameters   POPULATION_SIZE            20
                        ELITE_SIZE                 3
Evolution parameters    GENERATIONS                10
                        CROSSOVER_PROB             0.8
                        MUTATION_PROB              0.1
Training parameters     TRAIN_EPOCHS               8
                        EARLY_STOPPING_PATIENCE    3

Appendix A.2

Table A2. Pseudocode of the improved LSTM model using the GA via the optimal key hyperparameter combination.

Genetic Algorithm-Optimized LSTM Pseudocode

// 1. Initialization parameters
POPULATION_SIZE
GENERATIONS
CROSSOVER_PROB
MUTATION_PROB

// 2. Fitness function
FUNCTION evaluate_fitness(individual):
  lstm_units1, lstm_units2 = individual[0], individual[1]
  dropout1_rate, dropout2_rate = individual[2], individual[3]
  learning_rate, batch_size = individual[4], individual[5]

  // Train the LSTM model
  model = create_lstm_model(...)
  history = model.fit(epochs=TRAIN_EPOCHS, ...)

  // Calculate fitness: reward peak validation accuracy, penalize unstable training
  best_accuracy = max(history.val_accuracy)
  stability_penalty = std(history.val_accuracy) * STABILITY_WEIGHT
  fitness = best_accuracy - stability_penalty

  RETURN fitness
END FUNCTION

// 3. Main loop
FOR generation = 1 TO GENERATIONS:
  // Assess the population
  FOR each individual IN population:
    individual.fitness = evaluate_fitness(individual)
  END FOR
  // Elite retention
  hall_of_fame.update(population)
  // Selection, crossover, mutation (with adaptive mutation intensity)
  parents = tournament_selection(population, size)
  offspring = two_point_crossover_and_gaussian_mutate(parents, CROSSOVER_PROB, MUTATION_PROB)
  // Update the population
  population = select_best(offspring + hall_of_fame, POPULATION_SIZE)
END FOR

RETURN get_best_hyperparameters(hall_of_fame)

Appendix B

Figure A1. The convergence curve of the genetic algorithm for Subject 000 when using our proposed method for seven-gesture recognition.

References

1. Atzori, M.; Gijsberts, A.; Castellini, C.; Caputo, B. Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Sci. Data 2014, 1, 140053.
2. Vogel, J.; Hagengruber, A.; Iskandar, M.; Quere, G. EDAN: An EMG-controlled daily assistant to help people with physical disabilities. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 4183–4190.
3. Liu, Y.; Guo, S.; Yang, Z.; Hirata, H.; Tamiya, T. A home-based bilateral rehabilitation system with sEMG-based real-time variable stiffness. IEEE J. Biomed. Health Inform. 2020, 25, 1529–1541.
4. Sun, R.; Song, R.; Tong, K.Y. Complexity analysis of EMG signals for patients after stroke during robot-aided rehabilitation training using fuzzy approximate entropy. IEEE Trans. Neural Syst. Rehabil. Eng. 2013, 22, 1013–1019.
5. Hye, N.M.; Hany, U.; Chakravarty, S.; Akter, L.; Ahmed, I. Artificial intelligence for sEMG-based muscular movement recognition for hand prosthesis. IEEE Access 2023, 11, 38850–38863.
6. Al-Timemy, A.H.; Khushaba, R.N.; Bugmann, G.; Escudero, J. Improving the performance against force variation of EMG controlled multifunctional upper-limb prostheses for transradial amputees. IEEE Trans. Neural Syst. Rehabil. Eng. 2015, 24, 650–661.
7. Meattini, R.; Benatti, S.; Scarcia, U.; De Gregorio, D.; Benini, L.; Melchiorri, C. An sEMG-based human–robot interface for robotic hands using machine learning and synergies. IEEE Trans. Compon. Pack. Manuf. Technol. 2018, 8, 1149–1158.
8. Oskoei, M.A.; Hu, H. Myoelectric control systems—A survey. Biomed. Signal Process. Control 2007, 2, 275–294.
9. Saponas, T.S.; Tan, D.S.; Morris, D.; Balakrishnan, R.; Turner, J.; Landay, J.A. Enabling always-available input with muscle-computer interfaces. In Proceedings of the 22nd Annual ACM Symposium on User Interface Software and Technology, Victoria, Canada, 4–7 October 2009; pp. 167–176.
10. Farina, D.; Jiang, N.; Rehbaum, H.; Holobar, A.; Graimann, B.; Dietl, H.; Aszmann, O.C. The extraction of neural information from the surface EMG for the control of upper-limb prostheses: Emerging avenues and challenges. IEEE Trans. Neural Syst. Rehabil. Eng. 2014, 22, 797–809.
11. Lin, M.; Huang, J.; Fu, J.; Sun, Y.; Fang, Q. A VR-based motor imagery training system with EMG-based real-time feedback for post-stroke rehabilitation. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 31, 1–10.
12. Jo, H.N.; Park, S.W.; Choi, H.G.; Han, S.H.; Kim, T.S. Development of an Electrooculogram (EOG) and surface Electromyogram (sEMG)-based human computer interface (HCI) using a bone conduction headphone integrated bio-signal acquisition system. Electronics 2022, 11, 2561.
13. Qing, Z.; Lu, Z.; Cai, Y.; Wang, J. Elements influencing sEMG-based gesture decoding: Muscle fatigue, forearm angle and acquisition time. Sensors 2021, 21, 7713.
14. Fatimah, B.; Singh, P.; Singhal, A.; Pachori, R.B. Hand movement recognition from sEMG signals using Fourier decomposition method. Biocybern. Biomed. Eng. 2021, 41, 690–703.
15. Rani, P.; Pancholi, S.; Shaw, V.; Atzori, M.; Kumar, S. Enhancing gesture classification using active EMG band and advanced feature extraction technique. IEEE Sens. J. 2023, 24, 5246–5255.
16. Junior, J.J.A.M.; Freitas, M.L.B.; Siqueira, H.V.; Lazzaretti, A.E.; Pichorim, S.F.; Stevan, S.L. Feature selection and dimensionality reduction: An extensive comparison in hand gesture classification by sEMG in eight channels armband approach. Biomed. Signal Process. Control 2020, 59, 101920.
17. Karheily, S.; Moukadem, A.; Courbot, J.B.; Abdeslam, D.O. sEMG time–frequency features for hand movements classification. Expert Syst. Appl. 2022, 210, 118282.
18. Shen, S.; Gu, K.; Chen, X.; Wang, R. Motion classification based on sEMG signals using deep learning. In Proceedings of the Machine Learning and Intelligent Communications: 4th International Conference (MLICOM 2019), Nanjing, China, 24–25 August 2019; pp. 563–572.
19. Xu, Z.; Yu, J.; Xiang, W.; Zhu, S.; Hussain, M.; Liu, B.; Li, J. A novel SE-CNN attention architecture for sEMG-based hand gesture recognition. CMES 2023, 134, 157–177.
20. Luo, X.; Huang, W.; Wang, Z.; Li, Y.; Duan, X. InRes-ACNet: Gesture recognition model of multi-scale attention mechanisms based on surface Electromyography signals. Appl. Sci. 2024, 14, 3237.
21. Sehat, K.; Shokouhyan, S.M.; Abdallah, N.K.; Khalafet, K. Deep network optimization using a genetic algorithm for recognizing hand gestures via EMG signals. Preprints 2023, 2023010075.
22. Zhang, Y.; Chen, Y.; Yu, H.; Yang, X.; Lu, W. Learning effective spatial–temporal features for sEMG armband-based gesture recognition. IEEE Internet Things J. 2020, 7, 6979–6992.
23. Lin, C.; Cui, Z.; Chen, C.; Liu, Y.; Chen, C.; Jiang, N. A fast gradient convolution kernel compensation method for surface electromyogram decomposition. J. Electromyogr. Kinesiol. 2024, 76, 102869.
24. Graupe, D.; Cline, W.K. Functional separation of EMG signals via ARMA identification methods for prosthesis control purposes. IEEE Trans. Syst. Man Cybern.-Syst. 1975, SMC-5, 252–259.
25. Reddy, N.P.; Gupta, V. Toward direct biocontrol using surface EMG signals: Control of finger and wrist joint models. Med. Eng. Phys. 2007, 29, 398–403.
26. Ghaemi, A.; Rashedi, E.; Pourrahimi, A.M.; Kamandar, M.; Rahdari, F. Automatic channel selection in EEG signals for classification of left or right hand movement in Brain Computer Interfaces using improved binary gravitation search algorithm. Biomed. Signal Process. Control 2017, 33, 109–118.
27. Zhang, S.; Li, X.; Zong, M.; Zhu, X.; Cheng, D. Learning k for kNN classification. ACM Trans. Intell. Syst. Technol. 2017, 8, 1–19.
28. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
29. Kim, S.; Lee, S.P. A BiLSTM–Transformer and 2D CNN architecture for emotion recognition from speech. Electronics 2023, 12, 4034.
30. Kumar, P.; Kumar, R. A hybrid framework for time series trends: Embedding social network’s sentiments and optimized stacked LSTM using evolutionary algorithm. Multimed. Tools Appl. 2024, 83, 34691–34714.
31. Khademi, Z.; Ebrahimi, F.; Kordy, H.M. A transfer learning-based CNN and LSTM hybrid deep learning model to classify motor imagery EEG signals. Comput. Biol. Med. 2022, 143, 105288.
32. Cao, K.; Zhang, T.; Huang, J. Advanced hybrid LSTM-transformer architecture for real-time multi-task prediction in engineering systems. Sci. Rep. 2024, 14, 4890.
33. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2222–2232.
34. Jozefowicz, R.; Zaremba, W.; Sutskever, I. An empirical exploration of recurrent network architectures. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2342–2350.
Figure 1. The framework of our proposed model for hand motion intention recognition.
Figure 2. A schematic diagram of LSTM.
Figure 3. The optimization process of our proposed model.
Figure 4. Confusion matrix for seven-class gesture recognition using our proposed method.
Figure 5. The model’s performance changes when varying one hyperparameter while keeping the others fixed at their optimal values.
Figure 6. A comparison of our proposed method, the traditional classifiers, the GRU, and the CNN in the classification of the seven gestures for the different subjects.
Figure 7. Average accuracies of the proposed and controlled methods.
Table 1. Five-category recognition rates of Subject 000 using overlapping sliding windows with different parameters.

Overlap Rate    Window Length 78    Window Length 156    Window Length 312
25%             /                   93.3%                87.5%
50%             88.3%               96.4%                85.0%
75%             /                   92.5%                93.8%
Table 2. The classification accuracies of our proposed method for five gestures (%).

Subject    Flex     Extend   Spread   Fist     Point
000        100.0    100.0    100.0    90.9     90.0
001        90.9     71.4     60.0     100.0    60.0
002        72.7     100.0    80.0     72.7     100.0
003        100.0    100.0    100.0    100.0    70.0
004        100.0    100.0    100.0    100.0    100.0
005        100.0    100.0    100.0    90.9     100.0
Table 3. Average recognition rates of five gestures using LSTM and our proposed method (%).

Subject       LSTM           Proposed Method
000           96.4           96.4
001           73.2           76.8
002           83.9           85.7
003           92.9           94.6
004           100.0          100.0
005           98.2           98.2
Mean ± Std    90.8 ± 10.3    92.0 ± 8.9
Table 4. Seven-class gesture recognition rates using LSTM and our proposed method (%).

Subject       LSTM           Proposed Method
000           87.3           93.7
001           63.3           67.1
002           89.9           89.9
003           87.3           88.6
004           98.7           100.0
005           97.5           98.7
Mean ± Std    87.3 ± 12.8    89.7 ± 12.0
Table 5. Evaluation indices for seven-gesture recognition using our proposed method.

Evaluation Index    Flex     Extend   Adduct   Abduct   Spread   Fist     Point
Sensitivity         0.814    0.855    0.902    0.890    0.908    0.920    0.920
Specificity         0.961    0.977    0.980    0.649    0.838    1.000    0.939
F1-score            0.882    0.912    0.940    0.751    0.872    0.959    0.929
Table 6. Comparison of all gesture classification methods based on the BandMyo dataset (%).

Method      SVM_1    RF_1    SVM_2    RF_2    STF-GR    Proposed Method
Accuracy    56.1     63.9    59.4     62.2    71.7      71.9