SEMG Feature Extraction Based on Stockwell Transform Improves Hand Movement Recognition Accuracy

Feature extraction, as an important method for extracting useful information from surface electromyography (SEMG), can significantly improve pattern recognition accuracy. Time and frequency analysis methods have been widely used for feature extraction, but these methods analyze SEMG signals only from the time or frequency domain. Recent studies have shown that feature extraction based on time-frequency analysis methods can extract more useful information from SEMG signals. This paper proposes a novel time-frequency analysis method based on the Stockwell transform (S-transform) to improve hand movement recognition accuracy from forearm SEMG signals. First, the time-frequency analysis method, S-transform, is used for extracting a feature vector from forearm SEMG signals. Second, to reduce the amount of calculations and improve the running speed of the classifier, principal component analysis (PCA) is used for dimensionality reduction of the feature vector. Finally, an artificial neural network (ANN)-based multilayer perceptron (MLP) is used for recognizing hand movements. Experimental results show that the proposed feature extraction based on the S-transform analysis method can improve the class separability and hand movement recognition accuracy compared with wavelet transform and power spectral density methods.


Introduction
Electromyography (EMG) signals are divided into two main groups: needle and surface EMG (SEMG) signals. SEMG signals have been widely used in upper-limb prosthesis control because of their advantages of being convenient and noninvasive [1][2][3][4][5]. However, EMG classification is very complex because of its properties of nonlinearity, nonstationarity and subject dependency [6,7]. There are three main steps for EMG classification: preprocessing, feature extraction, and classifier design. The difficulty in EMG signal classification is extracting a feature vector that is able to classify several motions because the EMG signals are subject-dependent and have noise interference. This step is important because the feature vector has a direct impact on the accuracy of the classifier. This research focuses on SEMG feature extraction, which is applicable to hand movement recognition.
Many studies have attempted to extract a feature vector from SEMG signals using time or frequency analysis methods. For example, the EMG amplitude, root mean square, zero-crossing, autoregressive coefficients, Fourier transform coefficients and cepstrum coefficients have been used as components of the feature vector [8][9][10][11][12][13]. Angkoon et al. introduced thirty-seven state-of-the-art

SEMG Data Acquisition
In this research, forearm SEMG signals were collected with a BioRadio (wireless physiological collector from Great Lakes NeuroTechnologies company). Eight subjects, consisting of three amputees and five healthy subjects, participated in this experiment (see Figure 2). The basic information of the subjects is shown in Table 1. Two EMG acquisition channels (CH1 and CH2) were used to collect the SEMG signals. The two channels of surface electrodes were placed on the flexor carpi radialis muscle (CH1) and extensor carpi radialis longus muscle (CH2). For the amputees, the two electrode placements are the remnants of the two muscles. Each channel has two pairs of surface electrodes, and each electrode was separated from the other by 2 cm.
Before the data acquisition began, each subject was given a thorough written and oral explanation of the experiment itself, including the associated risks, the subjects would then sign an informed consent form. The experiment was approved by the Local Ethics Committee of Peking University.  Generally, the dominant energy of SEMG is concentrated in the range of 20-500 Hz, and its amplitude is limited to 0-10 mV. Therefore, the sampling frequency was set at 1000 Hz, and a bandpass filter of 20-500 Hz bandwidth was used. Each subject performed six hand motions: wrist flexion, wrist extension, hand open, hand close, key grip and rest state. The amputees imagined performing the same motions with the amputation side as with the sound side. The tests were repeated six times for each subject, resulting in 10 s of EMG signals per subject for each motion. The first three times were used for the learning procedures, and the remaining three times were used for the evaluation of recognition performance. Fatigue of the subjects was avoided with a 2-minute resting period after each exercise. In Figure 3, the raw surface EMG data of six motions in a 256-point window from subject 1 collected by CH1 are shown. The window size of the EMG sample was set to 250 ms (256 samples) for real-time control to guarantee a response time of less than 300 ms, which allows the users to operate the prosthetic hands without perceiving a time delay.

Algorithm Description
The classification of hand motions through the SEMG signal includes three main steps: preprocessing, feature extraction and pattern recognition. The preprocessing of SEMG signals was performed using hardware. The novel feature extraction with the S-transform was proposed, and four selected features were proposed for one channel after the S-transform. To speed up the computation, the PCA method was adopted to reduce the dimensions. The details of the design process of the ANN-based MLP were introduced, which include the structure of the ANN model and loss function calculation method.

Feature Extraction and Reduction
The S-transform is developed from the short-time Fourier transform and wavelet transform, and it was proposed by Stockwell in 1996 [21]. The S-transform adopts a Gaussian window function, and the window width is proportional to the inverse of the frequency, which avoids the defect of a fixed window width. In addition, the extracted feature used in the S-transform method is insensitive to noise. The S-transform of signal h(t) is where t and f represent the time and the frequency, respectively. i is the imaginary unit, and τ is the center of the Gaussian window function. Equation (1) shows that the width and height of the Gaussian window vary with frequency. Let From Equation (2), the S-transform of signal h(t) can also be shown as the following equation: C(α, f ) is a Fourier transform of S(τ, f ) (from τ to α), and A(α, f ) and B(α, f ) are the Fourier transforms of a(t, f ) and b(τ, f ) (from τ to α ), according to the convolution theorem: H(α + f ) is the Fourier transform of a(t, f ), the exponential term is the Fourier transform of b(τ, f ), and the S-transform can be calculated via inverse Fourier transform by Equation (5) (from α to τ): Let f → n/NT and τ → jT; the discrete S-transform can be shown as the following equations: where j, m, and n = 0, 1, ..., N − 1. As shown in Equation (8), the H[k] is discrete transform of time series h(m). From Equation (7), the results of the discrete S-transform is a two-dimensional matrix, which is named the S-matrix. The raw elements of the S-matrix represent the frequency, and the column elements represent the sampling time. The S-matrix can be calculated according to Equation (7) using the fast Fourier transform and the inverse Fourier transform. The raw signals were segmented into a 256-point window, and a signal with 256 points was defined as a segmented signal. Therefore, a matrix of segmented signals (2 × 256 matrix) can be one input for the control system of the prosthetic hand. The raw EMG signal can be collected as a two-dimensional matrix of time amplitude, and the S-transform can transform it into a three-dimensional matrix of time-frequency-amplitude.
The calculations of four features extracted from the S-matrix are studied in this research: (1) Maximum frequency in one segment signal (256 points) F1; (2) The amplitude of the maximum frequency in one segment signal (256 points) F2; (3) The median frequency in one segment signal F3; (4) The energy of one segment signal F4. The other channel of the feature vector is defined as F(F5, F6, F7, F8). Due to the two channels of EMG signals, the feature vector becomes eight dimensional (see Figure 4).
Next, the PCA method is used for feature reduction, which can improve the speed of the classifier. As shown in Figure 4, PCA preforms dimensionality reduction for two channel features, and after reduction by PCA, the number of features is reduced from eight to three. PCA has the advantage of automatically ranking the importance of features in the projection space. Then, the three features were used as an input vector for pattern recognition.

Classification
Finally, an ANN is used as the classifier. The ANN has one hidden layer with 10 neurons (see Figure 5). The network structure was determined by trial and error. The forward-propagating order is: linear function, ReLU function, linear function and softmax function. This process can be represented by the following equations: where x (i) is the input vector, W (1) and W (2) are weight parameters, and b [1](i) and b [2][i] are bias parameters. σ represents the softmax regression function, and k represents the labels. i = 0, 1, ..., 10, k = 0, 1, ..., 6. y a represents the probability of the kth motion. The cost function J can be computed as follows: whereŷ b is the truth value of the ith category and y b is the prediction results of the output. The advanced optimization method is used for the backpropagation of the neural network, and our final target is minimizing the cost function L. The PCA-reduced feature vector is used as the input vector, and after training the neural network, the optimized parameter can be determined. Each label maps one hand motion, and the labels turn to one-hot encoding for the data processing (see Figure 5). This classifier uses the adaptive moment estimation (Adam) optimization method, which is an adaptive learning rate method. This method can be described by the following equations for l = 1, ..., L: It calculates an exponentially weighted average of past gradients and stores it in variables v and v corrected . Variables s and s corrected were used to store the average weights of the squares of past gradients. Here, • t counts the number of steps taken by the Adam optimizer • L is the number of layers • β 1 and β 2 are hyperparameters that control the two exponentially weighted averages; generally, β 1 = 0.9, and β 2 = 0.999 • α is the learning rate • ε is a very small number to avoid dividing by zero; generally, ε =10E−8 ! " !"!"!"!" "! !"!"!"!" "! " ! !"!"!" "! " !"! !"!" "! " !"!"! !" "! " !"!"!"! The ANN model parameters were determined by the training data when the cost function J reached the minimum value, and the accuracy of the classifier in the test set was obtained, which can evaluate the performance of feature extraction methods.

Results
This research compared three feature extraction methods: power spectral density, wavelet packet transform and the proposed S-transform. Power spectral density and wavelet packet transform are popular advanced methods in EMG feature extraction. From the statistical criterion method and classifier measurement, the best feature selection method can be obtained.
Power spectral density is a frequency-domain analysis method. The mean frequency, median frequency and peak frequency are selected as features via the power spectral density. Each feature was calculated by 256 points. The mean frequency F a , median frequency F b , and peak frequency F c are shown as the following equations: where f j is the frequency of the spectrum at frequency bin j, P j is the EMG power spectrum at frequency bin j, and M is the length of the frequency bin. After the PCA method, the six features from the two channels are reduced to three. The PCA-reduced features are used as an input vector for the ANN-based MLP. The wavelet packet transform is a time-frequency analysis method. Englehart. K. used the symmlet mother wavelet of order five as the wavelet packet basis function, and the results showed that it has a good effect on the EMG signal process [19]. Three levels of wavelet decomposition and symmlet mother wavelet of order five were adopted in this research. The original signal is S, and a decomposition of S into binary-tree-structured subspaces using the wavelet packet transform is shown in Figure 6. In this study, the maximum wavelet packet coefficient was chosen as the EMG feature. This was explained by the following equation: where S i are wavelet coefficients after the decomposition, level = 3, and F represents the feature vector. Finally, the features were reduced by the PCA method in the same manner as the S-transform.

Evaluation of the S-transform Method
To evaluate the classification performance, the scatter plots of EMG features using the S-transform method are shown in Figure 7a, which are compared with the scatter plots of EMG features using the wavelet transform (see Figure 7b) and power spectral density (see Figure 7c). There are three features after feature reduction. The scatter plots of each of the two features are shown from six movements in three methods. Each movement uses a specific symbol and color with 100 sample points. Class separability can evaluate the performance of EMG features. The good quality in class separability indicates that the highest separation between classes is obtained. Generally, the statistical criterion method was used to evaluate the distance between two scatter groups and directly address the variation of features in the same group.
The RES index [31] uses the Euclidean distance as a distance function and the standard deviation as a dispersion measure. Euclidean distance is calculated as the root of square differences between coordinates of a pair of objects, and standard distance is the most robust and widely used measure of the variability. The best extraction performance is obtained when the Euclidean distance value is high and the standard deviation value is low.
The RES index uses the Euclidean distance as the numerator and the standard deviation as the denominator. Thus, a larger RES index signifies a higher class separation of each class. The RES(p, q) index of two motions (p and q) can be expressed as F norm = r + min(r) max(r + min(r) s where p and q are the feature means of two motions from six hand motions with two channels. r is the feature of the w th (w = 100) window of N w , and σ is the feature mean of all windows. σ is the average between the standard deviation of two motions (p and q) with two channels. In addition, the EMG features are normalized before calculating the RES index. The normalization of features F norm is performed, which can be expressed as Equation (22). The average of RES(p, q) from fifteen possible combinations of six hand movements was defined as the RES index. From the results, the RES indices of features using the proposed S-transform method are 9.14, 6.98, and 7.16, which are higher than the RES indices using wavelet transform and power spectral density, which means that the S-transform has the best quality of class separability. The results of the evaluation indicate that the S-transform is an effective feature extraction method.

Results of the Classifier
With the training data, the parameters of the ANN-based MLP were determined in the learning procedure. The recognition accuracies of the classifier using different feature extraction methods with test data are shown in Figure 8. The S-transform features provide 97.46%, 97.63%, 97.98%, 98.83%, 98.65%, 98.21%, 98.3%, and 97.91% mean accuracy in each subject. The wavelet packet transform provides 96.82%, 97.32%, 97.45%, 98.21%, 97.96%, 97.88%, 97.76%, and 97.44% mean accuracy, and the power spectral density provides 93.62%, 94.23%, 94.86%, 96.71%, 96.9% ,96.88%, 97.11%, and 97.21% mean accuracy in each subject. The results that used the proposed feature extraction method with four subjects showed higher accuracy than the wavelet transform method and power spectral density method.
In Figure 9, the red line represents the S-transform, the blue line represents the wavelet packet transform, and the green line represents the power spectral density. The average accuracy was improved by increasing the number of features with the three methods. When the number of features is equal to three, the speed of the increase becomes very slow. Comprehensively considering the performance and application, the three features will be more effective for the EMG classification method. With different numbers of features, the S-transform is superior to the wavelet transform and power spectral density. Thus, the S-transform can improve the class separability and classification accuracy.

Discussion
In this paper, the authors have proposed the S-transform as an efficient method for SEMG feature extraction. To the authors' knowledge, this is the first study in which this algorithm has applied for hand movement recognition. Feature extraction is an important step in myoelectric pattern recognition, and the proposed method shows excellent performance in extracting features and classifying the SEMG data for different hand movements compared with other methods.
Previous studies have shown method based on the wavelet transform and the power spectral density have achieved competitive accuracy for classifying hand gestures compared with other existing methods [3,14]. Although its performance appeared to be increased by only a few percent (see Figure 8), the S-transform method was superior to the wavelet transform and power spectral density methods under the controlled conditions of this study, as proven using statistical methods. In many studies, the number of electrodes is increased to improve the classification accuracy [32][33][34]. However, the usage of multiple EMG sensors on a prosthetic hand is not practical. First, the weight of the socket will increase, causing an amputee to feel discomfort when wearing the SEMG sensor array. Second, as the number of electrodes increases, more dimensions of data must be processed, and the amount of computation will also increase. Thus, commercial advanced prosthetics typically have only one or two electrodes [35]. This paper therefore considers two channels, consistent with typical prosthetic hand applications.
Many studies have identified more than 10 distinct hand movements and have achieved up to 80-90% accuracy. In this paper, the authors use 2 electrodes to distinguish 6 hand movements with up to 95-98% under controlled conditions. The ability to recognize 6 hand movements for a prosthetic hand is superior to the capabilities of current commercial intelligent prostheses. Recognizing a larger number of hand movements would hinder practical application due to issues related to computational complexity and practicality.
In this paper, a three-layer ANN algorithm has been adopted as the classifier to evaluate the quality achieved with different numbers of features. In designing the classifier, a linear softmax model is used here to learn a good distribution of factorized features. Although a convolutional neural network (CNN) could offer improved accuracy for hand movement classification [36], a large amount of data would be required for pretraining. Because a CNN has many layers, its calculation time is long. Therefore, an ANN was instead adopted in this paper to evaluate the proposed feature extraction method. The proposed time-frequency method requires feature reduction due to the high dimensionality of the data; therefore, choosing an optimal number of features is very important. As seen from Figure 8, there are considerable changes after dimensionality reduction to between 1 dimension and three dimensions, whereas the change in accuracy with more than three dimensions is stable. Therefore, 3-dimensional features are chosen in this paper.
After more than 70 years of development of EMG signal control methods, the main approaches are still dominated by threshold control. The main reason is that EMG signals are nonlinear and random, making it very difficult to extract characteristic quantities. The superiority of the S-transform has been verified offline in this paper. In the future, this method could be used to control various movements of prosthetic limbs in an online manner. In practical applications, the characteristics of different movements may change with muscle fatigue; in such a case, the parameters of the neural network will need to be relearned to adjust to these changes. If the classifier can learn autonomously, then the algorithm proposed in this paper, combined with an artificial intelligence algorithm, will be very effective in clinical application.

Conclusions
A method was proposed to extract forearm EMG signal features in the time-frequency domain using the S-transform. The dimension of the S-transform features was then reduced by PCA. Subsequently, an ANN-based MLP was proposed to classify the six hand motions. According to the results of the RES index, the proposed method has the best overall performance compared with current popular feature extraction methods. The experimental results show that the S-transform can offer more information about the EMG signal, which can improve the class separability and classification accuracy. In future work, this method could be used for prosthetic hand control, which could improve the accuracy of hand motion classification.

Conflicts of Interest:
The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript: