sEMG-Based Continuous Hand Action Prediction by Using Key State Transition and Model Pruning

Conventional classification of hand motions and continuous joint angle estimation based on sEMG have been widely studied in recent years. The classification task focuses on discrete motion recognition and shows poor real-time performance, while continuous joint angle estimation evaluates the real-time joint angles by the continuity of the limb. Few researchers have investigated continuous hand action prediction based on hand motion continuity. In our study, we propose the key state transition as a condition for continuous hand action prediction and simulate the prediction process using a sliding window with long-term memory. Firstly, the key state modeled by GMM-HMMs is set as the condition. Then, the sliding window is used to dynamically look for the key state transition. The prediction results are given while finding the key state transition. To extend continuous multigesture action prediction, we use model pruning to improve reusability. Eight subjects participated in the experiment, and the results show that the average accuracy of continuous two-hand actions is 97% with a 70 ms time delay, which is better than LSTM (94.15%, 308 ms) and GRU (93.83%, 300 ms). In supplementary experiments with continuous four-hand actions, over 85% prediction accuracy is achieved with an average time delay of 90 ms.


Introduction
The surface electromyography (sEMG) signals, as electrical signals reflecting neuromuscular function, can produce 30-150 ms earlier than the movement of hand action [1]. It contains useful information about human motion intention and is extensively used to control prosthetic hands [2][3][4]. There has been considerable interest in recognizing human motion intention by sEMG signals in recent years.
Two main aspects of applications have been implemented: discrete motion classification and continuous joint motion estimation [5]. Discrete motion classification is used to recognize specific gestures of the upper limb [6][7][8][9] and lower limb [10][11][12][13] without strict constraints on the extent of motion.
Relevant works on motion classification emphasize using appropriate features and classifiers to enhance recognition accuracy. Narayan et al. [14] used a discrete wavelet transform (DWT) to extract the time-frequency features of sEMG signals. SVM was applied to classify six gestures with the features. The results showed that 95.8% accuracy could be achieved. Liu et al. [15] used the autoregressive power spectrum to recognize 13 different gestures, and the average accuracy was up to 95%. In another research study, Yang et al. [16] proposed that RMS values could be used to build feature vectors. After the division of substates for HMM, a GMM-HMMs model was proposed to distinguish six hand motions; the classification accuracy was more than 95% in eight-channel sEMG. Deep learning methods have also been widely studied [17,18]. Shen et al. [19] proposed a method to complete hand motion classification based on a convolutional neural network and stacking ensemble learning. The experimental results on the NinaPro DB5 dataset [20] outperformed other methods. The average accuracy was 11.5%, 13.6%, and 10.1% higher than LDA, SVM, and LCNN, respectively. Hu et al. [21] used an attention-based hybrid CNN and RNN architecture to capture better temporal properties of electromyogram signals for gesture recognition problems. A new sEMG image representation method that enabled deep learning architectures to extract implicit correlations between different channels for sparse multichannel electromyogram signals was proposed. The recognition accuracies on different datasets were higher than the state-of-the-art performance.
Although discrete motion classification is widely investigated in the field of sEMGbased pattern recognition, it is still hard to satisfy the real-time requirements efficiently. One reason is that it does not consider the factors which affect realistic scenarios, such as the discontinuity of the discrete motion. Generally, human-hand action is a continuous process for practical applications, and continuous motion estimation is more applicable for hand action recognition.
Currently, continuous joint motion estimation is the primary research direction in continuous motion estimation, which realizes intention recognition by establishing a mapping relationship between sEMG signals and a continuous variable such as joint angle or joint torque. Motion estimation plays an essential role in stable control for robots [22][23][24][25]. In recent studies, two methods have mainly been used: one is based on the biomechanical model [26], and the other is concentrated on the regression model [27]. Research based on the biomechanical model has been widely considered [28][29][30]. For example, Ao et al. [31] proposed a method based on combining the Hill muscle model and a linear scale model to control the ankle exoskeleton robot. The method improved the performance of human-robot cooperation control for an ankle power-assisted exoskeleton robot. Durandau et al. [32] used person-specific neuromechanical models to estimate biological ankle joint torques. The method provided a new solution for systematically adopting wearable robots. Although the biomechanical model has been widely validated in the control of robots, it still cannot satisfy the requirements for robot joint flexibility and precise positioning in terms of recognition accuracy and real-time performance. In order to realize a real-time and precise estimation, regression models, such as neural network models, multivariate linear models, and polynomial models have been proposed [33][34][35]. Gui et al. [36] and Artemiadis et al. [22] confirmed that regression models implemented the real-time control of the limb exoskeleton.
These methods for continuous joint motion estimation only consider the continuous motion process of the upper and lower limbs. Similar to the upper and lower limb movements, hand actions are also continuous, and it is necessary to explore the continuous process mechanism of hand actions. Furthermore, the accurate prediction of continuous hand actions based on this process mechanism can recognize the subsequent gesture in advance, which is conducive to the coherent execution of continuous hand actions.
In our study, a method for sEMG-based continuous hand action prediction is proposed. The overview of the motion prediction is described in Figure 1. Our method was based on the GMM-HMMs model, which assumed the time-varying properties of the Markov chain. Six states were divided to explain the process of continuous hand actions. The key state transition was chosen as the prediction condition to realize the advanced prediction of hand actions. Sliding windows with long-term memory were used to simulate the prediction process. Eight subjects were recruited to perform twelve continuous two-hand actions and four continuous four-hand actions, and every continuous two-hand action was modeled. As a result, twelve models needed to be used in the prediction task. Considering the processing time and mutual interference of multimodel recognition, we set four different groups while each group containing three models starting with the same hand gesture. Then, model pruning was used to bind the specific group based on the degree of model-fitting. Implementing model pruning could decrease the number of models and reduce crosstalk between models. Compared with previous studies on discrete motion classification and continuous joint motion estimation, our study supplemented the research on sEMG-based continuous hand action prediction and provided a new idea for future work.
The main contributions of this paper are as follows: • A new supplement to the research of sEMG-based motion intention recognition. • A modified Viterbi algorithm of GMM-HMMs which can build long-term memory for the prediction process. • A Model pruning which can expand the number of participating hand gestures for continuous multihand action prediction.
Section 2 describes the GMM-HMMs model, where the key state transition, sliding windows with long-term memory, and the model pruning are presented elaborately. Section 3 introduces experimental setups for twelve continuous two-hand actions and four continuous four-hand actions. Experimental results and discussions of continuous twohand actions, continuous four-hand actions, and baseline results are presented in Section 4. Section 5 concludes the paper.

GMM-HMMs
Different from discrete motion classification, the prediction of continuous hand actions needs more information on time dependence. The hidden Markov model is applicable for time series modeling [37]. By assuming a Markov random process, the relationship between hidden states and observation can be linked, and the continuous connection can be further developed. In the study, we propose a model based on GMM-HMMs to analyze the continuous two-hand actions. RMS values were extracted as features for GMM-HMMs and divided into 6 parts. Different GMM-HMMs of continuous two-hand actions were built. An HMM can be defined as: where A is the transition probability matrix, B is the observation probability matrix, and π is the initial state distribution. The hidden states can be defined as s = [s 1 ,s 2 ,. . . ,s N ] where N is the number of hidden states. In our work, N = 6, and s 1 refers to hand transition (rising), s 2 refers to hand stabilization, s 3 refers to hand transition (declining), s 4 refers to hand transition (rising), s 5 refers to hand stabilization, and s 6 refers to hand transition (declining). A = a ij defines the probability of transferring from state i at time t to state j at time t + 1.
where O t is a vector of feature values at time t. The initial state distribution, π = {π i }, defines the probability of being in state i at the start time point. The states of HMMs were modeled by a Gaussian mixture model.

Key State Transition and the Marginalization of Sliding Windows
The results should be achieved before hand actions are completed for the ideal prediction of continuous hand actions. Typically, a robust and reliable judging condition for prediction is essential. Continuous two-hand actions were divided into six states, and the fourth state, which played a vital role in linking the front and rear gestures, could be used as a key state transition for prediction. During continuous hand actions, the period of hand transition is much shorter than that of hand stabilization, and the overall real-time performance can be improved if the correct prediction results can be obtained during the period of hand transition. Relevant research has shown that prediction results can be obtained during hand transition [38]. It indicates that the application of key state transition can predict the subsequent hand gesture in advance.
After providing the judging condition for continuous hand action prediction, RMS values were sequentially input into the trained models, and then the path of the key state transition could be backtracked by maximum state probability [39]. Many factors still constrained the prediction process, especially its processing time. Although we preprocessed the sEMG signals, the features were still too long to input into the model. Moreover, continuous hand action prediction aims to recognize the next gesture in advance, instead of recognizing the whole process. Unlike classification tasks that used complete features to recognize hand actions, our study utilized a sliding window method to update features for a dynamic prediction. Generally, the sliding window is mainly used to limit the time steps of RMS values and simulates the prediction process. After inputting RMS values in a sliding window to the models, the path of states in these steps could be backtracked by the Viterbi algorithm [37]. Since the sliding window could only record the state path information of the current window, we used the marginalization of sliding windows to establish long-term memory between the front and rear sliding windows. Moreover, the sliding window method could realize dynamic prediction through the relational information between windows. Overall, the specific implementation was based on the Viterbi algorithm. The Viterbi algorithm, which can decode the key state transition, is expressed as follows: where δ t (i) denotes the maximum probability of state paths with the state i at time t; λ denotes the setting of GMM-HMMs; O t denotes the observation at time t; and N denotes the number of states.
To derive the global state path, the state recursion was expressed as follows: With the description of the Viterbi algorithm, we provide a modified Viterbi algorithm to realize the marginalization; the specific marginalization based on the sliding windows was expressed as: The maximum probability of the state in the first sliding window was calculated as: The maximum probability of the state in the subsequent sliding windows was modified as: When t is equal to 1, the last state probability distribution of the previous sliding windows was used to initialize the subsequent sliding windows.
The following formula could backtrack the path of the state transition: where b i (o t ) denotes the observation probability density in state i at time t; a ji represents the transition probability from state j to state i; π i is the initial state probability distribution; and T is the length of the sliding window. Generally, the marginalization is based on the inheritance of constraints on the state probability distribution of previous sliding windows. By utilizing δ t (i) of the last time step in the previous window as the initialization of the subsequent window, long-term memory between sliding windows was realized. The specific illustration is shown in Figure 1.

Model Pruning
The number of models is proportional to that of hand gestures in GMM-HMMs-based gesture recognition [40]. For real-time prediction tasks, inputting features into multiple models is bound to be time-consuming. In addition, many irrelevant models may also interfere with the prediction results. In order to reduce the mutual interference of multiple models on recognition results and real-time performance, we propose a method for model pruning. It was noted that model pruning could be used in continuous multigesture action prediction.
After constructing a set of GMM-HMMs on continuous two-hand actions, we could find the most probable model by using the maximum likelihood criterion with the distribution of the observation sequence [16]. During the configuration of model pruning, we applied that criterion to locate pruning models. The process of model pruning could be generalized into three steps. In order to facilitate further description, we set the front gesture in continuous two-hand actions as h f and the other as h r . Four different groups corresponding to four hand gestures were set, and each group contained three models starting with the same hand gesture.
Firstly, RMS values in the sliding window were input into multiple models, and then the state path with the maximum probability could be derived. It was noted that the maximum likelihood varied cumulatively using the marginalization of sliding windows. With the movement of the sliding window, the maximum likelihood would vary based on the degree of model-fitting. We searched for the key state transition in the model which had the maximum likelihood.
Secondly, the next hand action h r could be predicted based on the best-fitting model and the key state transition. We could bind the associated models by searching for the trained models, which were modeled by the h r group. Selected models could be determined from the whole model, and the number of models in the prediction task could be reduced.
Finally, the key state, which was the fourth state of the h f group, could be used as the first state of the h r group. In order to collaborate with this setting, the new starting point of the sliding window was also updated to the location of the key state. Furthermore, we could realize better predictions with fewer models. The detail of model pruning is shown in Figure 2 and Algorithm 1 is described as follows.

Experiment Setup
The datasets from previous works [20,41,42] only include the sEMG of discrete hand gestures, which are not applicable to continuous hand action prediction. Our study created a new dataset which was composed of the sEMG of continuous two-hand actions and continuous four-hand actions. The new dataset could be used for continuous hand action prediction.
Eight right-handed subjects (aged 24 to 30) participated in the experiment. Four distinct hand gestures were chosen to form twelve types of continuous two-hand actions, and each type of continuous two-hand action was formed by the continuous action of two different hand gestures. Figure 3 shows all twelve continuous two-hand actions, and four hand gestures are listed as follows: (1) making a fist, (2) fingers spread, (3) wrist extension, and (4) wrist flexion. The GForcePro+ armband system was used to record sEMG signals. Every continuous two-hand action was performed 100 times per person. Every continuous two-hand action lasted about 5 s, of which the front and rear gestures each lasted 2.5 s. In order to reduce the negative impact on muscle fatigue, participants were given a 5 s rest between every two repetitions. Besides, participants had to relax for 5 min after completing one continuous two-hand action with twenty repetitions. The recorded sEMG signals were collected at a frequency of 1000 Hz and analyzed using MATLAB 2020b. The configuration of the dataset is shown in Table 1.

Data Preprocessing
Collected sEMG signals were preprocessed using a bandpass filter (20 Hz-150 Hz) to remove unusable noise and a notch filter to remove 50 Hz power disturbances. An energybased active segmentation algorithm was used to segment sEMG signals for accurate data acquisition [16]. On the filtered sEMG signals of continuous two-hand actions, we applied overlapping sliding windows to extract RMS values. To fully use the latent information in sEMG signals and reduce the processing time of long sEMG segment length, we set a 100 ms window length with a 50 ms window increment for sliding windows.

Training and Prediction Setup
In the experiment, a six-state fully connected left-to-right structure HMM was used. According to the setting of continuous two-hand actions, six states were divided into two parts, while each of them contained three states: transitive state (rising), steady state, and transitive state (declining). Since all the hand actions started from state one, the initial state probability of state one was set to 1, and those of other states were uniformly set to 0. We adopted a single-component GMM to represent the substate of HMMs, as Yang et al. [16] indicated that it was less time-consuming, and it could obtain a high recognition accuracy during modeling. For 12 continuous two-hand actions, every continuous two-hand action was modeled using 100 samples per person, and a total of 1200 samples were used to construct the whole model. Every model adopted 80% samples as training data and 20% as validation data. To avoid overfitting, the training and validation data were randomly selected and modeled ten times. The final prediction result was the mean of the ten prediction results.
In order to validate the feasibility of the setting on the key state transition, the data before states 4 and 5 were split for prediction. A sliding window with a length of 20 frames and an increment of 20 frames was used. As part of continuous multi-gesture actions, continuous four-hand actions were selected as being representative for prediction. Since continuous four-hand actions had plenty of combinations, we chose four different types for the experiment. For continuous four-hand actions, we still used RMS values as the input of the models. It was noted that the acquisition time needed to be limited to 10 s. We collected 30 samples for each type of continuous four-hand actions to predict.
The proposed method was used to predict continuous hand actions in advance; the special partitioning and modeling of continuous states made it better than other methods. To show the robustness of our method, we used conventional LSTM and GRU models, which have been widely utilized in discrete gesture prediction for comparison. In order to ensure the uniformity of the comparison, we also used the sliding window strategy on LSTM and GRU to simulate the prediction process. A sliding window with a length of 20 frames and an increment of 20 frames was used. Figure 4 demonstrates the prediction accuracy. We found that the prediction accuracy of state four and state five were all higher than 95%, which indicated that utilizing state four and state five as the key state transitions could predict hand actions correctly. Although the prediction accuracy of state five as the key state transition was higher than state four, the occurrence of state four was earlier than that of state five. It was more efficient to use state four as the key state transition to make the advanced prediction.

Validation on the Marginalization of Sliding Windows
We investigated the effect of different sliding window settings on the prediction results. Figure 5 illustrates the prediction accuracy of four different sliding window settings. It could be observed that all four different settings achieved over 92% recognition accuracy, which indicated that different settings of the sliding window had little effect on the results and proved that the marginalization of sliding windows could help us build long-term memory between sliding windows, which guaranteed the prediction accuracy. Furthermore, it allowed us to choose the appropriate sliding window setting based on demand instead of a fixed setting. We also analyzed the reasons for the different prediction results of four settings. In our study, we gave the prediction results as soon as the key state transition was found. Influenced by the size of the sliding window, the predicted time point of finding the key state transition was different, resulting in the difference between the four settings. We also assumed that selecting a reasonable size of the sliding window based on the length of RMS values would reduce the time-consuming for prediction.

Comparison with Other Methods
The normalized confusion matrix of our method is shown in Figure 6. In order to lift the limitation of LSTM and GRU [38] on the input dimension, we fixed the input starting point and used a sliding window to expand the data successively, in which the insufficient data were filled with zeros. The results are depicted in Figure 7. Table 2 shows the average prediction accuracy and the processing time for all subjects. As seen in Table 2, our proposed method achieved a better prediction accuracy and was less time-consuming than other methods. LSTM and GRU had a good prediction performance but unacceptable processing time. Although the same sliding-window strategy was used in LSTM and GRU, the real-time performance of these methods was poor. The main reason was that no matter whether the sliding window was updated, the input dimension was always limited to the maximum. Compared with these methods, our method only utilized the updated data from the sliding window as input for the prediction, which guaranteed less processing time.

Estimation of Continuous Four-Hand Actions
Continuous four-hand action prediction was tested to evaluate the performance of model pruning. The prediction results could also validate the feasibility of continuous multigesture action prediction. Since there were many combinations of continuous fourhand actions, we selected four types of them which are listed as follows: (1): making a fist-fingers spread-wrist flexion-making a fist; (2): wrist flexion-wrist extension-making a fist-fingers spread; (3): fingers spread-making a fist-wrist flexion-wrist extension; (4): wrist extension-fingers spread-making a fist-wrist flexion. Table 3 shows the comparison between model pruning and no model pruning. The average prediction accuracy when using model pruning was over 92.9%, while that without model pruning was only over 20%. The processing time with no model pruning was too poor to satisfy the real-time performance requirement, while model pruning could tremendously reduce the processing time. The prediction results of subjects are also illustrated in Table 4. These results indicated that our proposed model pruning method achieved a significant prediction accuracy and better real-time performance. Since our trained models were modeled by continuous two-hand actions and depended on the accumulation of relevant state probabilities, they could only be used to predict continuous two-hand actions. Surprisingly, when the RMS values exceeded the period, the accumulation of state probabilities was irrelevant to the subsequent hand gesture and would affect the subsequent prediction process, which resulted in a terrible performance without model pruning. These findings suggested that model pruning was necessary and should be utilized in the range of the period. We updated the starting point of state probability accumulation and selected the best-fitting models to predict the subsequent gesture, which could also reduce the processing time of the model prediction.  The prediction accuracy of continuous four-hand actions indicated that model pruning could apply to our work. Notably, as a part of continuous multigesture action prediction, the excellent prediction accuracy of continuous four-hand actions further confirmed the feasibility of continuous multigesture action prediction with our method.

Limitations
Currently, our study focuses on predicting the combinations of four distinctive hand gestures. The prediction task of the combinations of fine-hand actions has not been carried out yet. Moreover, subject-independent problems still exist in our method, and every subject must be modeled separately in the experiment. Our future work will concentrate on predicting continuous fine-hand actions and providing a robust universal model which can solve subject-independent problems.

Conclusions
In this paper, we proposed a continuous hand action prediction method based on the continuity of hand actions. In order to predict the subsequent hand gesture in advance, our method set the key state transition as the prediction condition. The GMM-HMMs model was utilized to model the states of continuous two-hand actions, and state four was set as the key state transition. The marginalization of sliding windows was used to build long-term memory for windows, and a dynamic sliding window was used to simulate the prediction process. Compared with LSTM and GRU models, our method achieved a higher prediction accuracy and a lower processing time for continuous two-hand actions. In the extended task of continuous multigesture action prediction, the GMM-HMMs models built by continuous two-hand actions could not predict continuous hand actions with more than two hand gestures. We proposed a model-pruning method which selected the best-fitting models and updated the starting point of state probability accumulation to overcome the limitation. The prediction results also confirmed the feasibility of the method on continuous multigesture action prediction.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data sharing is not applicable to this article. Please contact the authors for further requests.

Conflicts of Interest:
The authors declare no conflict of interest.