Feature Extraction and Selection for Myoelectric Control Based on Wearable EMG Sensors

Specialized myoelectric sensors have been used in prosthetics for decades, but, with recent advancements in wearable sensors, wireless communication and embedded technologies, wearable electromyographic (EMG) armbands are now commercially available for the general public. Due to physical, processing, and cost constraints, however, these armbands typically sample EMG signals at a lower frequency (e.g., 200 Hz for the Myo armband) than their clinical counterparts. It remains unclear whether existing EMG feature extraction methods, which largely evolved based on EMG signals sampled at 1000 Hz or above, are still effective for use with these emerging lower-bandwidth systems. In this study, the effects of sampling rate (low: 200 Hz vs. high: 1000 Hz) on the classification of hand and finger movements were evaluated for twenty-six different individual features and eight sets of multiple features using a variety of datasets comprising both able-bodied and amputee subjects. The results show that, on average, classification accuracies drop significantly (p < 0.05) when using the lower sampling rate, by 2% to 56% depending on the evaluated feature, and especially for transradial amputee subjects. Importantly, for these subjects, no number of existing features can be combined to compensate for this loss in higher-frequency content. From these results, we identify two new sets of recommended EMG features (along with a novel feature, L-scale) that provide better performance for these emerging low-sampling rate systems.


Introduction
As many amputees and individuals with impaired motor function have difficulty using traditional user interfaces (e.g., joysticks, mice, and keyboards) or assistive and rehabilitative devices, more advanced hands-free human-computer interfaces are desirable. Recognition of human muscle activity, facilitated using surface electromyographic (EMG) signals generated during muscular contractions, has been seen as one of the more promising solutions [1,2]. To create EMG-based human-computer interfaces to be used in an everyday context, they should be simple and non-invasive, such as a watch, an armband, jewelry, or concealed beneath clothing [3]. With the advances in wearable sensors, wireless communication and embedded computing technologies, we can indeed now obtain EMG data unintrusively using wearable EMG armbands (for a review, see [4]). These EMG armbands typically include multiple EMG sensors positioned radially around the circumference of a flexible band, allowing ease of donning and wear in daily life. Arguably the most widely known EMG armband is the Myo armband by Thalmic Labs, a low-cost consumer-grade EMG device integrating an ARM Cortex-M4 based microcontroller unit, a set of eight dry EMG electrodes, frequency was chosen exclusively for comparison as the sampling rate used by the predominant commercially available consumer-grade wireless myoelectric armband. The second purpose of this study was to identify a set of features that is more accurate and robust for these new systems.
The current study extends the prior literature by considering 26 individual commonly used and newly proposed features, as well as eight different state-of-the-art multi-feature sets. Since reducing sampling rate decreases the number of data points, several large outlying data values (e.g., spurious background spikes) could highly influence features computed using conventional measures (e.g., mean, standard deviation, and variance) for an analysis window with a small number of data points [29]. Therefore, in this study, we also proposed to use a robust measure of statistical dispersion based on L-moment [30], which is unaffected by small numbers of outliers. Furthermore, four datasets from previous studies were used to investigate the confounding effects of sampling rate and practical robustness issues: (1) change in limb position; (2) change in forearm orientation; (3) variation in contraction intensity; and (4) amputee or transradial amputation.
Ultimately, this study highlights the differences in efficacy of EMG features as a result of lower sampling rates found in emerging wearable EMG technology, and informs the design and selection of correspondingly resilient features.

EMG Datasets
Surface EMG data obtained from four different datasets comprising forty subjects (31 able-bodied subjects and 9 transradial amputees) were analyzed in this study. These datasets were collected independently at different institutes and are publicly available [26][27][28][31]. Because performance of EMG features can vary depending on experimental differences between datasets, multiple datasets are advantageous when examining the robustness and generalization of research findings [32,33]. The first three datasets were collected using high-resolution EMG recording systems and were used in this study to investigate practical robustness issues, while the last EMG dataset was collected using the Myo armband and was used in this study to identify a set of recommended EMG features for low sampling rate myoelectric control systems.
The experiment from which the last EMG dataset resulted was divided into three exercises [34]. Exercise A consists of 12 basic movements of the fingers (flexions and extensions). Exercise B consists of 8 isometric and isotonic hand configurations and 9 basic movements of the wrist (adduction/abduction, flexion/extension, and pronation/supination). Exercise C consists of 23 grasping and functional movements. It should be noted that all evaluated datasets were collected with an EMG armband placed around the circumference of the subject's forearm. More details about subjects, experiments, and data acquisition are reported in Table 1. Prior to analysis, EMG data were notch (50 Hz) and band-pass (20-500 Hz) filtered using a digital Butterworth infinite impulse response (IIR) filter of order 4 to remove any power-line interference, motion artifact, and high-frequency random noise, which could possibly affect the performance of EMG features [35,36]. Data analyses in this study were conducted in accordance with the University of New Brunswick Research Ethics Review Board (REB 2008-083) while the ethics approvals and consent to participate for each dataset can be found in the original works [26][27][28][31].

Feature Extraction
Twenty-six feature extraction methods in time domain (24) and frequency domain (2) were selected from four functional EMG feature groups: the signal amplitude and power feature group, the nonlinear complexity and frequency information feature group, the time-series modeling feature group, and the unique feature group, to cover all types of meaningful information for EMG signal classification [32]. All feature extraction methods were performed using a window size of 250 ms with an increment of 125 ms (50% overlap), which has been shown to be suitable for use in real-time on an embedded system [37].
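The segmentation described above can be illustrated with a minimal sketch. The function names (`window_starts`, `segment`) and the use of plain Python lists are assumptions made for illustration; they are not taken from the original study.

```python
def window_starts(n_samples, fs_hz, window_ms=250, increment_ms=125):
    """Return (window_length, start_indices) for overlapping analysis windows."""
    win = int(round(fs_hz * window_ms / 1000))     # samples per window
    inc = int(round(fs_hz * increment_ms / 1000))  # samples between window starts
    starts = list(range(0, n_samples - win + 1, inc))
    return win, starts


def segment(signal, fs_hz, window_ms=250, increment_ms=125):
    """Split a 1-D sequence into overlapping windows (lists of samples)."""
    win, starts = window_starts(len(signal), fs_hz, window_ms, increment_ms)
    return [signal[s:s + win] for s in starts]


# One second of dummy data at each sampling rate considered in the study.
for fs in (1000, 200):
    windows = segment(list(range(fs)), fs)
    print(fs, "Hz:", len(windows), "windows of", len(windows[0]), "samples")
```

With these settings a 250 ms window holds 250 samples at 1000 Hz but only 50 samples at 200 Hz, which is the reduction in data points per window that motivates the robustness concerns raised later in the paper.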

The Signal Amplitude and Power Feature Group
Eleven features were selected from the first functional feature group, which is used to estimate signal magnitude and power. These features comprise six commonly used features: integrated absolute value (IAV), mean absolute value (MAV), root mean square (RMS), variance (VAR), waveform length (WL), and log detector (LD) [12]; four recently proposed features: difference absolute mean value (DAMV), difference absolute standard deviation value (DASDV), difference variance value (DVARV), and the mean value of the square root (MSR) [38,39]; and a newly proposed feature in the present study, referred to as L-scale (LS). This novel feature for EMG is far less sensitive to outliers as compared to standard deviation because it uses the concept of L-moments (a linear function of the expected order statistics) [30]. If X is a real-valued random variable, the rth L-moment of X can be defined as

$$\lambda_r = \frac{1}{r} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} E\left[X_{r-k:r}\right]$$

where r is set at 2, $X_{k:n}$ denotes the kth order statistic of a random sample of size n, and E denotes the expected value. For r = 2, this reduces to $\lambda_2 = \frac{1}{2} E\left[X_{2:2} - X_{1:2}\right]$, i.e., half the expected difference between the larger and the smaller of two random draws. For an extended coverage of the LS and L-moment, the reader is encouraged to consult the original work [30].
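As an illustration of how LS could be computed on one analysis window, the sketch below uses the standard unbiased sample estimator of the second L-moment, a weighted sum of the sorted samples. The function name `l_scale` and the toy data are assumptions for illustration, and the exact estimator used in the study is not stated in this text.

```python
import statistics


def l_scale(window):
    """Sample L-scale (second L-moment) of a window of EMG samples."""
    x = sorted(window)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    # l2 = sum_{i=1..n} (2i - n - 1) * x_(i) / (n * (n - 1)), x_(i) = i-th smallest sample
    total = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return total / (n * (n - 1))


# Toy data (made up): the same window without and with one spurious spike.
clean = [0.1, -0.2, 0.15, -0.05, 0.3, -0.25, 0.05, -0.1]
spiky = clean[:-1] + [5.0]

for name, w in (("clean", clean), ("spiky", spiky)):
    print(name, "LS =", round(l_scale(w), 4), "SD =", round(statistics.stdev(w), 4))
```

In this toy example the single spike inflates both measures, but LS grows by a smaller factor than the standard deviation, which is the behaviour that makes it attractive for short windows.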

The Nonlinear Complexity and Frequency Information Feature Group
The second functional feature group can be divided into two sub-groups: three nonlinear and complex features and five frequency information features. The first sub-group consists of maximum fractal length (MFL), detrended fluctuation analysis (DFA) and sample entropy (SampEn) [24,40]; and the second sub-group comprises zero crossing (ZC), slope sign change (SSC), Willison amplitude (WAMP), median frequency (MDF), and mean frequency (MNF) [12].
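For readers unfamiliar with these features, a minimal sketch of four of them (ZC, SSC, WAMP, and MFL) is given below. The formulas follow commonly used definitions; the threshold handling (strict versus non-strict inequalities, default values) is an assumption here, and the precise definitions used in the study are those of [12] and [24,40].

```python
import math


def zero_crossings(x, threshold=0.0):
    """ZC: sign changes between consecutive samples whose step is at least the threshold."""
    return sum(
        1
        for a, b in zip(x, x[1:])
        if a * b < 0 and abs(a - b) >= threshold
    )


def slope_sign_changes(x, threshold=0.0):
    """SSC: interior samples where the product of adjacent slopes exceeds the threshold."""
    return sum(
        1
        for prev, cur, nxt in zip(x, x[1:], x[2:])
        if (cur - prev) * (cur - nxt) > threshold
    )


def willison_amplitude(x, threshold):
    """WAMP: consecutive-sample differences whose magnitude exceeds the threshold."""
    return sum(1 for a, b in zip(x, x[1:]) if abs(a - b) > threshold)


def max_fractal_length(x):
    """MFL: log10 of the square root of the summed squared first differences.

    Undefined (raises ValueError) for a constant window, where the sum is zero.
    """
    return math.log10(math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x[1:]))))
```

All four depend on sample-to-sample differences, so their values change when the spacing between samples changes; this gives some intuition for why this group is sensitive to down-sampling.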

The Time-Series Modelling Feature Group
The third functional feature group includes the coefficients of time-varying linear predictive models: autoregressive coefficients (AR) and cepstrum coefficients (CC) [12]. Three different orders, namely 4, 6, and 9, of these features were employed [41][42][43]. It should be noted that these feature extraction techniques provided more than one feature value, and CC was derived from the AR model [43].
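To make the AR feature concrete, the sketch below fits the coefficients of one window by ordinary least squares on lagged samples. This is only one of several estimation methods and is an assumption here; the text does not state which estimator was used, and the sign convention for the coefficients may differ from that in [12]. The helper names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is a list of rows)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; window may be too short or constant")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ar_coefficients(x, order):
    """Least-squares fit of x[t] ~ sum_{k=1..order} a[k-1] * x[t - k]."""
    rows = [[x[t - k] for k in range(1, order + 1)] for t in range(order, len(x))]
    targets = [x[t] for t in range(order, len(x))]
    # Normal equations: (R^T R) a = R^T y
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(order)] for i in range(order)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(order)]
    return solve_linear(gram, rhs)
```

Under this formulation a window of N samples yields N minus the model order equations, so a ninth-order fit on a 50-sample window (250 ms at 200 Hz) rests on only 41 equations, compared with 241 for a 250-sample window at 1000 Hz.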

The Unique Feature Group
This last functional feature group was named unique, as it contains many varying feature extraction techniques [32]. Although features in this group are unique, capturing different kinds of information from the EMG signals, most of the features in this group could be considered as an extension of features in other groups and the individual discriminant power of features in this group is lower than features in other groups [32]. However, in this study, histogram (HIST) with nine data bins was chosen as a representative feature for this group based on its popularity in the literature [14]. It should be noted that HIST could be considered as an extension of the ZC and WAMP features.

Multi-Feature Sets
In addition to the individual features, eight different, previously proposed, multi-feature sets were evaluated in this study:

Feature Evaluation and Selection
For comparison, EMG signals from the first three datasets (Datasets 1-3) were down-sampled from the original sampling rate (Table 1) to 1000 Hz and 200 Hz using an anti-aliasing FIR low-pass filter. The dependence of the classification performance on the reduced sampling rate was investigated using a 10-fold cross-validation classification rate obtained from a support vector machine (SVM) classifier with a linear kernel [45]. Specifically, all data were randomly partitioned into 10 equally sized sub-datasets and a single sub-dataset was retained as testing data while the remaining 9 sub-datasets were used as training data for the classification model. The cross-validation process was then repeated for each of the 10 sub-datasets, and a single classification rate was computed by averaging from 10 results. For a common dataset, higher classification rates imply a higher degree of class separability or improved repeatability, both of which are desirable. The classifiers were also trained and tested independently on data from each subject. It should be noted that SVMs were employed because, in the literature, this classifier has shown a better classification performance than other commonly used classifiers such as linear discriminant analysis (LDA) and artificial neural networks [34,46,47]. In previous work [48], SVMs were tested and provided similar trends as other quantitative measures of feature space quality involving LDA, Davies-Bouldin index, and separability index [14,49]. This classification approach was also used to evaluate the classification performance of the EMG features for myoelectric control based on wearable EMG sensors (Dataset 4).
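The 10-fold procedure described above can be sketched as follows. To keep the example dependency-free, a nearest-centroid rule stands in for the classifier; this is explicitly not the linear-kernel SVM used in the study, and all function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k folds of (nearly) equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Average classification rate over k folds, each used once as test data."""
    folds = k_fold_indices(len(labels), k, seed)
    rates = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        rates.append(hits / len(test_idx))
    return sum(rates) / k


# Stand-in classifier (NOT the SVM used in the study): nearest class centroid.
def fit_centroids(xs, ys):
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))
```

In the study this loop would be run separately for each subject, feature (or feature set), and sampling rate, with the resulting classification rates then compared across the two sampling rates.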
Paired-sample t-tests were used to test for differences between means of classification rates and results were considered significant for p < 0.05. The resulting p-values were adjusted using a Holm-Bonferroni method to maintain a family-wise alpha of 0.05 for tests on all single features and multi-feature sets. The size (or meaningfulness) of differences observed was measured using Cohen's effect size, d, defined as the difference between two group means divided by a standard deviation [50]. For interpretation and consistency, an effect size of 0.2 equates to a small effect, 0.5 equates to a medium effect, 0.8 equates to a large effect, 1.2 equates to a very large effect, and larger than 2.0 equates to a huge effect [50,51].
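The two statistical adjustments described above can be sketched as follows. The Holm-Bonferroni function implements the standard step-down adjustment; for Cohen's d the text only specifies "a standard deviation", so the pooled sample standard deviation used here is an assumption. Function names are hypothetical.

```python
import statistics


def holm_bonferroni(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


def cohens_d(group_a, group_b):
    """Difference of means divided by the pooled sample standard deviation (assumed)."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
```

An adjusted p-value below 0.05 then corresponds to a significant difference at a family-wise alpha of 0.05, and the returned d can be read against the thresholds listed above (0.2 small, 0.5 medium, 0.8 large, and so on).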
To examine whether the rank of features used to create a subset of multi-features that best predict the motion classes is the same for the two different sampling rates, a sequential forward selection (SFS) algorithm was performed independently for each case (Dataset 3) [52,53]. This feature selection approach was also used to identify a set of recommended EMG features that provide better performance for low-sampling rate systems (Dataset 4). EMG feature sets were selected using 70% of the data (a training set) and the performance of the selected features was examined using the remaining (30%) of the data (the testing set). The SFS method was applied across subjects using the 10-fold cross-validation method of within-subject classification rates on the training set.
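A generic sketch of sequential forward selection is given below. The scoring function is left as a parameter: in the study it would be the cross-validated within-subject classification rate on the training set, whereas the toy scoring function and the feature names A, B, and C shown here are made up purely for illustration.

```python
def sequential_forward_selection(candidates, score, max_features=None):
    """Greedy SFS: repeatedly add the candidate that maximizes score(selected + [candidate]).

    Returns a list of (feature, score_after_adding) pairs in selection order.
    """
    remaining = list(candidates)
    selected, history = [], []
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((best, score(selected)))
    return history


# Toy scoring function (hypothetical numbers, for illustration only).
gains = {"A": 0.5, "B": 0.3, "C": 0.1}


def toy_score(subset):
    return sum(gains[f] for f in subset)


print(sequential_forward_selection(["C", "A", "B"], toy_score))
```

Returning the score after each addition makes it straightforward to inspect the selection curve afterwards, for example to locate the first local maximum beyond which additional features bring no meaningful improvement.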

Results
For the first aim of this study, the classification performance of the eight multi-feature sets and twenty-six single features was examined for the two sampling rates using an SVM classifier with 10-fold cross validation. Classifiers were trained and tested using EMG features from all conditions for each of the practical issues, and the results are shown in Tables 2 and 3, respectively. In addition to the quantitative measures, visual inspection of the graphs was performed. Figure 1 shows a comparison of surface EMG signals when down-sampled from 1000 Hz to 200 Hz in both the time and frequency domains. According to the Nyquist theorem, the sampling rate must be at least twice the highest frequency of interest in a signal, and thus the highest frequency component correctly represented for EMG signals sampled at 200 Hz and 1000 Hz is 100 Hz and 500 Hz, respectively.
The substantial loss of relevant signal power is visible in each domain. As a further illustration of the changes of EMG feature patterns due to down-sampling, representative scatter plots of the ZC features extracted from two different EMG channels of an amputee subject are shown in Figure 2. The rank and classification performance of the multi-feature sets, as selected using SFS (from sets of one to twenty-six features), are shown in Figure 3. To illustrate the effect of training strategies, Figure 4 shows the classification performance of the most commonly used multi-feature set, Hudgins' TD (MS1), under two different training strategies. The first strategy is to train a classifier with the EMG features from all the conditions, except the one being tested on. The second strategy is to train a classifier with the EMG features from each of the conditions individually while testing the trained classifier on unseen data from all possible conditions.
For the second aim of this study, the classification performance of the twenty-six single features, the eight previously proposed, and two newly proposed multi-feature sets was examined using EMG data recorded from the Myo armband, and the results are shown in Figures 5 and 6, respectively. The newly proposed multi-feature sets were determined using the SFS method. The first was chosen by selecting the first local maximum after which no meaningful improvement was found (d < 0.2). This occurred when between four and five features were included, and resulted in classification accuracies of 88.6% for Experiment A, 82.0% for Experiment B, and 77.

Discussion
The first aim of this study was to examine the influence of two different common sampling rates (low: 200 Hz vs. high: 1000 Hz) on the classification of different hand and finger movements in able-bodied and amputee subjects. The present results show that the classification performance of all eight of the tested state-of-the-art multi-feature sets dropped significantly (p < 0.05) for all datasets and known practical issues (Table 2) when dropping the sampling rate from 1000 Hz to 200 Hz. These findings suggest that the lower sampling rate could not preserve sufficient control information for accurate classification of six-to-seven classes of hand and finger motions using 6-8 EMG channels. Although the dominant energy of surface EMG signals is in the range of 50-150 Hz, it is reaffirmed here that important signal energy exists at higher frequencies. The accuracy in identifying multiple classes of hand and finger motions clearly benefits from these high-frequency components, especially in the case of transradial amputees (a more than 10% difference). Wilson et al. [18] even suggested that removal of the 20-120 Hz frequency band actually increased the classification accuracy of nine classes of motion significantly, and thus they recommend using the combination of sampling rate and high-pass cut-off frequency of 1000 Hz and 120 Hz to maximize classification performance in the presence of power line noise and motion artifacts. The present results agree with and extend upon our preliminary study [48] (which used a different EMG dataset of 20 able-bodied subjects, and considered another practical robustness issue, i.e., changes in the EMG signal itself over time) as well as those in [15][16][17].
The results of this investigation also included a comparison of twenty-six individual features (Table 3), which showed that the classification performance of all evaluated features decreased significantly (p < 0.05) with the reduced sampling rate. It is interesting to note, however, that the signal amplitude and power features incurred less of a reduction than those in other feature groups. Other than the MFL feature, the effect size of the differences in the nonlinear complexity and frequency information features could be considered as very large or huge (d = 1.5-5.5). The drastic results found for this feature group may be explained by the loss of high-frequency content in the signal, and their corresponding power and complexity information, observed in both the time domain and frequency domain (Figure 1). In addition, although time-series modeling features (i.e., AR and CC) obtained good accuracies (89-91% on average) when using a 1000 Hz sampling rate, they suffered the largest decreases in accuracy when dropping to the 200 Hz sampling rate (over 35% on average; d = 7.5-8.8) (Table 3). This suggests that their benefit lies greatly in their use of the higher frequency content.
The changes in two EMG feature patterns were also examined visually to confirm the quantitative results (Figure 2). Based on two channels of EMG, ZC features extracted from EMG data sampled at 1000 Hz can mostly discriminate between thumb flexion and index flexion motions (Figure 2). These two classes, however, cannot be separated using the ZC features extracted from the same EMG data down-sampled to 200 Hz. This is demonstrated by the increased intra-class variability and consequent overlap of both motion classes.
Decreased classification performance of some features (such as AR, CC, DFA, and SampEn) could also be due to an insufficient number of data points in each analysis window, i.e., a window size of 250 ms in this study contains 250 data points when using a 1000 Hz sampling rate but only 50 data points when using the 200 Hz sampling rate. For instance, Yentes et al. [54] suggested that SampEn is extremely sensitive to parameter choices in very short datasets (<200 data points) and recommended the number of data points to be larger than 200, and as large as possible with respect to the practical constraints of the application. Due to real-time constraints, however, the total response time for myoelectric control, which includes both the window size and processing delay, should not exceed 300 ms [2,37]. Consequently, algorithms that require longer windows of data may not be suitable for the new generation of low-sampling rate wearable EMG devices. In addition, several studies have computed EMG features with a smaller window size (50-200 ms) than is used in this study, and showed acceptable classification accuracies (>90%) [2]. However, these studies investigated the dependence of features on window size in the classification of EMG signals sampled at 1000 Hz or above. Using a 200 Hz sampling rate, a window size of 50 ms contains only 10 data points, and so few data points would reduce not only the classification performance of features, but also the robustness of the systems. One should thus exercise caution when working on data segmentation (window size and window incrementation).
As a result of the relatively poor 200 Hz performance of features from all functional feature groups (with perhaps the exception of the signal amplitude and power feature group), a combination of features across these functional feature groups did not provide meaningful improvement in performance for the low-sampling rate myoelectric control systems. For example, MAV features yielded an overall classification rate of 76.4% for amputee subjects while the MS1 feature set consisting of MAV, WL, ZC and SSC yielded an overall classification rate of 78.4% for the same subjects (only a 2% increase). In contrast, the same comparison yielded an overall increase of 9% when using the 1000 Hz sampling rate. Similarly, and importantly, the recruitment of more (or even all) features could not compensate for the loss of information due to the lower sampling rate in amputee subjects. In fact, as can be observed in Figure 3, the set of AR9 and MFL computed using data sampled at 1000 Hz drastically outperformed all combinations of 200 Hz features.
Another interesting and important finding is that the feature sets selected by the SFS algorithm were vastly different for the two sampling rates (Figure 3). Using 1000 Hz, the classification rate approached the first local maximum value when four features (AR9, MFL, MSR, and DASDV) were employed, and remained relatively constant when more features were added (i.e., no significant differences between consecutively selected added features). In contrast, when using 200 Hz, the first four features selected consisted of LS, MFL, VAR, and DVARV. The classification rates continuously increased and approached a maximum value when nine features were used, and decreased when more features were added. Only the MFL feature was common to the best sets of four features, and the differences between the two sets increased with the number of features included. These results suggest that previous feature selection works based on 1000 Hz sampling rates may not be applicable to lower sampling rate myoelectric control systems.
Losing useful high frequency control information becomes an even more serious problem for myoelectric control when considering how to reduce the training burden, i.e., when EMG data are acquired under one practical condition when training the classifier (Figure 4). It has been shown [24][25][26][27][28] that the performance of EMG classifiers drops significantly when tasked with identifying EMG patterns from unseen conditions. Consequently, advanced training protocols have been proposed as a way of introducing added variability during training [55]. These results are reinforced here as, for instance, using the MS1 feature set at 200 Hz, a 78.4% classification rate was found when training with data from all three contraction intensities (Table 2). When training with only two and one contraction intensities and testing with unseen contraction intensities, however, the corresponding classification rates dropped to 56.4% and 46.0%, respectively (Figure 4).
The second aim of the present study was to identify a set of recommended EMG features that provide better performance for myoelectric control based on low sampling rate wearable EMG sensors (here the Myo armband). This is the first study, to our knowledge, to experimentally perform a comprehensive investigation of the classification performance of a wide range of EMG features for myoelectric control based on this, or any other comparable, system. A similar trend was observed using the Myo (Figure 5) as when down-sampling the data collected from higher-sampling rate EMG devices (Table 3), for all tested features. It is important to note that, due to the low resolution of the A/D converter used (8 bits), many data points in this dataset are zero-valued, and thus the LD feature cannot be computed (as the logarithm of zero is undefined; for a mathematical definition, see [12]). By adding a positive value of 1 to each data point, the feature could be computed, but the classification results obtained remained less accurate than other signal amplitude and power features. As the LD feature can be replaced by other features in the same group [32], we recommend avoiding the LD feature with the Myo device.
As a standalone feature, the LS feature, proposed in this study, offers the highest classification rates (84.5% for Experiment A, 74.9% for Experiment B, and 69.1% for Experiment C), followed closely by the commonly used EMG amplitude estimators (i.e., IAV, MAV, RMS, and WL) and the more recently proposed features in the signal amplitude and power feature group (i.e., DAMV, DASDV, and MSR). As the newly proposed feature, LS, is less sensitive to outliers in data than conventional measures and yields a better classification performance, this novel feature is recommended to be used instead of the common features in this group. For other functional feature groups, MFL could be used to extract nonlinear and complexity information while WAMP could be used to extract frequency information in EMG signals.
In this study, the newly proposed multi-feature set TD4 selected using SFS consists of LS and MSR from the signal amplitude and power feature group and MFL and WAMP from the nonlinear complexity and frequency information feature group. This selection of features from complementary feature groups is consistent with our previous findings using topological data analysis [32].
The newly proposed EMG feature sets, TD4 and TD9, outperformed other state-of-the-art multi-feature sets developed based on EMG signals sampled at 1000 Hz or above (Figure 6). For instance, on average, the TD4 feature set improved classification accuracies by 1.8-3.4% (p < 0.05) as compared to the most commonly used Hudgins' TD feature set (MS1) with the same number of features. Furthermore, on average, the TD9 feature set improved classification accuracies by 3.5-5.5% (p < 0.05) as compared to the best of the eight state-of-the-art multi-feature sets (MS3), with a lower feature vector dimension. Therefore, these two newly proposed feature sets are recommended to be used for myoelectric control based on the new generation of lower-sampling rate wearable EMG sensors.

Limitations and Future Studies
The advent of non-invasive wearable EMG bands yields the promise of improved and more convenient human-computer interfaces for amputees and able-bodied individuals alike. Nevertheless, to improve their practicality and reliability, further studies are required that examine the effects of dynamic and confounding factors such as changes in EMG signal over time, limb position, forearm orientation, contraction intensity, and electrode locations. Other related factors in surface EMG measurement including but not limited to resolution of A/D, types of electrode, impedance, and cross talk, could be explored to better understand the performance of EMG feature extraction and pattern recognition techniques in emerging wearable EMG technology.
The current findings clearly showed that high-frequency content (>100 Hz) is critical to the performance of myoelectric control systems. New and emerging EMG armbands should therefore be developed with due consideration for the trade-offs between design considerations, such as power consumption and memory, and performance, such as accuracy and robustness. Further research investigating the wide range of potential sampling issues of EMG signals in motion recognition is required.
Finally, in this contribution, we focused on surface EMG signals. However, the loss of high frequency components could also degrade the classification of similar types of feature extraction methods in other biological signals used as prosthetic control signals [56]. The methods described here could be applied to evaluate other types of biological signals for a better understanding of the sampling rate effect in other areas of prosthetic control.

Conclusions
This investigation clearly showed that sampling rate has a significant impact on classification performance in identifying different classes of hand and finger movements. Using a 200 Hz sampling rate, as is used in the predominant commercially available wearable EMG armband, instead of a 1000 Hz sampling rate, results in a drastic reduction in discriminative information for use in myoelectric control. We further present two purposely engineered sets of EMG features, TD4 and TD9, including a novel feature, LS, for use in myoelectric control using lower-bandwidth wearable EMG sensors.
Author Contributions: A.P. and E.S. conceived and designed the experiments; A.P., R.N.K. and E.S. contributed reagents/materials/analysis tools; A.P. and E.S. analyzed the data; and A.P. wrote the paper.