Regulating Grip Forces through EMG-Controlled Prostheses for Transradial Amputees

Abstract: This study aims to evaluate different combinations of features and algorithms to be used in the control of a prosthetic hand wherein both the configuration of the fingers and the gripping forces can be controlled. This requires identifying machine learning algorithms and feature sets to detect both intended force variation and hand gestures in EMG signals recorded from upper-limb amputees. However, despite decades of research into pattern recognition techniques, each new problem requires researchers to find a suitable classification algorithm, as there is no such thing as a universal 'best' solution. Considering different techniques and data representations is a fundamental practice for achieving maximally effective results. To this end, we employ a publicly available database recorded from amputees to evaluate different combinations of features and classifiers. Analysis of data from 9 different individuals shows that, both for classic features and for time-dependent power spectrum descriptors (TD-PSD), the proposed logarithmically scaled version of the current window plus previous window achieves the highest classification accuracy. Using linear discriminant analysis (LDA) as a classifier and applying a majority-voting strategy to stabilize the individual window classification, we obtain 88% accuracy with classic features and 89% with TD-PSD features.


Introduction
Paraphrasing the famous German philosopher Kant: "The hand is considered as an extension of the human brain to the outside", hence it has vital importance in daily life activities. The human hand is a prehensile organ necessary to carry out working, recreational and communicative activities in our daily lives. Therefore, upper limb amputees experience many different obstacles in their lives.
Upper limb amputations are not only due to accidents. Diseases such as obesity, diabetes, arthritis or vascular problems can lead to amputations, and because these diseases are becoming more prevalent, the number of amputees has increased. Around 30,000-40,000 amputations were performed in the U.S. in 2019 and around 27,000 amputations were reported in England between 2015 and 2018 [1]. These values point to an increasing demand for arm prosthetics, the market for which was valued at 697 million USD in 2019 and is projected to reach 1408 million USD by the end of 2023, growing at a compound annual growth rate (CAGR) of 12.43% [2].
To overcome these adversities, prosthetic devices can be an effective solution. But as with any other assistive device, natural interaction between the human and the machine is essential. An ideal prosthetic hand should be intuitive to control and should quickly and reliably detect the intended gesture of the user [3]. But even this is not enough. Imagine carrying an egg in one hand and a glass bottle in the other. If we apply the same force on the egg and on the bottle, one of them will break. A reliable prosthetic hand must not only recognize intended hand posture, it must also successfully decode the user's intended force when grasping objects.
An attractive approach to controlling prosthetic devices is to command the device through the activation of remaining muscles, but this is a challenging endeavour. As the remaining muscles of the user's forearm differ considerably among users, the generalization of any decoding algorithm based on measured activation in remnant muscles is not trivial. Machine learning techniques can be seen as the tool that will translate and adapt each individual's remaining muscle response into a concrete command [4,5]. Indeed, feeding a time series of raw EMG directly to a classifier can be tempting, considering the Deep Learning algorithms that are currently trending in the literature [6], but these techniques require a large number of inputs to train a model. We therefore take the more traditional route: sEMG signals arrive in windows or batches, features are extracted from those windows and those features are used to train a conventional classifier.
As our goal is to control a self-contained prosthetic device, we have chosen to focus on time-domain features, from amongst the wide variety of features that have been proposed [7], because their computational cost is lower than that of frequency-domain features. We also consider that good results in experiments with able-bodied subjects may nevertheless yield poor results when transferred to amputees. Thus, we refer to the work of Campbell and colleagues [8], who found that classic features (mean absolute value, root mean square, zero crossing rate, slope sign changes and waveform length) and the time-dependent power spectrum descriptors (TD-PSD) proposed in [9,10] remain functionally coherent when shifting from able-bodied participants to amputees. In our analysis we therefore prioritize these features over others proposed in [7].
Analogous to the best feature subset identification, there are also many works in the literature comparing classifiers (see e.g., [11][12][13]), but few present a systematic quantitative comparison of features [14] and only in [9,10] were the EMG signals obtained from amputees. More importantly, detecting intended applied force has received little attention, with most studies relying on subjects with intact limbs. We therefore set out to extend the state-of-the-art by examining what machine-learning methodologies might be used to control grip force.
We have therefore chosen to use the database created by [10] to compare decoding strategies for gesture + force detection, this being, to our knowledge, the only publicly available database of labelled data recorded from amputees that includes different force contraction levels. Although our study relies heavily on the previously published works [9,10], we build on these studies in three significant ways. First and foremost, although the original authors' main objective was to improve recognition of different hand gestures despite variations in the level of muscle activation, our main goal is to allow the user to voluntarily modulate the gripping forces applied by the fingers through EMG control in a real-life scenario. Second, in pursuing that objective, we select a different set of features than those chosen by [9,10], based on the analysis reported in Campbell et al. [8]. Finally, we propose a novel look-back technique that logarithmically scales and concatenates two consecutive windows to improve reliability.
This document is structured as follows: First, we present the database to be analysed. Then, features and filters to be used are described. Finally, we show the results obtained and we conclude with recommendations for applications and further work.

EMG Database
For this investigation, we use the database collected by Al-Timemy et al. [10]. They recruited nine trans-radial amputees with unilateral amputation to participate in their study, in which six movements including different grip and finger movements were recorded. The gestures available in this database are: thumb flexion, index flexion, fine pinch, tripod grip, hook grip and spherical grip (Figure 1). Each of these gestures was performed at 3 different force levels by the users: low, medium and high.

EMG Features
We use two different feature groups, which we call classic and TD-PSD features.

• Classic Features: We extracted the following classic state-of-the-art features from the filtered data: Mean Absolute Value (MAV), Root Mean Square (RMS), Zero Crossing Rate (ZC), Slope Sign Changes (SSC) and Waveform Length (WL). We refer the reader to [15] for a more extended description of these well-known features.

A single feature vector was constructed for each of these feature groups and used as input to a classifier (described below) to identify the user's intent.
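As an illustration of the five classic features named above, the following is a minimal NumPy sketch for a single channel of a single window. It is not the authors' implementation: the function name `classic_features` and the `threshold` dead-band parameter are assumptions introduced here (the source does not give threshold values), and ZC and SSC are returned as raw counts per window rather than normalized rates.

```python
import numpy as np


def classic_features(window, threshold=0.0):
    """Return [MAV, RMS, ZC, SSC, WL] for one channel of one EMG window.

    `threshold` is a hypothetical noise dead-band; its value is an assumption.
    """
    x = np.asarray(window, dtype=float)
    diff = np.diff(x)

    mav = np.mean(np.abs(x))        # mean absolute value
    rms = np.sqrt(np.mean(x ** 2))  # root mean square
    wl = np.sum(np.abs(diff))       # waveform length (cumulative absolute change)

    # Zero crossings: neighbouring samples of opposite sign whose jump exceeds the threshold.
    zc = np.sum((x[:-1] * x[1:] < 0) & (np.abs(diff) > threshold))

    # Slope sign changes: consecutive differences of opposite sign (a local extremum),
    # with at least one of the two slopes larger than the threshold.
    ssc = np.sum(
        (diff[:-1] * diff[1:] < 0)
        & ((np.abs(diff[:-1]) > threshold) | (np.abs(diff[1:]) > threshold))
    )

    return np.array([mav, rms, zc, ssc, wl], dtype=float)
```

For a multi-channel recording, one would presumably apply the function to each channel and concatenate the results into one vector per window.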

Windowing and Look-Back Filtering
The selected features were extracted from small windows (150 ms) to simulate the real life scenario where signals arrive in batches. We windowed each trial of the data with overlaps of 50 ms. Instead of randomly associating each individual time window to either the training or test sets, however, we assigned all windows from a given trial to one or the other. This allows for specific processing of consecutive windows as would be the case in real-life scenarios where the user must first train and then use the device. A novelty of this investigation is that not only are the features extracted from the 'current' window employed; a combination of present and past windows is also considered.
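The windowing step can be sketched as follows. This is an illustrative example under stated assumptions: the sampling rate `fs_hz` is a parameter because this section does not state it, and the phrase "overlaps of 50 ms" is read here as a 50 ms step between window starts (`hop_ms=50`), which is the reading consistent with the one-second voting delay quoted in the Discussion. If a 50 ms overlap is meant instead, `hop_ms` would be 100.

```python
import numpy as np


def sliding_windows(trial, fs_hz, win_ms=150, hop_ms=50):
    """Split one trial (samples x channels) into overlapping windows.

    Returns an array of shape (n_windows, window_samples, channels).
    Raises ValueError if the trial is shorter than one window.
    """
    trial = np.asarray(trial)
    win = int(round(win_ms * fs_hz / 1000))  # window length in samples
    hop = int(round(hop_ms * fs_hz / 1000))  # step between window starts in samples
    starts = range(0, trial.shape[0] - win + 1, hop)
    return np.stack([trial[s:s + win] for s in starts])
```

Because whole trials are assigned to either the training or the test set, this function would be called once per trial, which keeps consecutive windows of a trial together.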
For each feature set (classic and TD-PSD) we applied the classification algorithms to feature vectors computed from a combination of the current and previous windows as follows:

• Present (P): Feature vector extracted from only the current EMG window $n$: $v_n = [x_1, x_2, \dots, x_K]$. For the classic features we have $K = 5$ and $v_n = [\mathrm{MAV}, \mathrm{RMS}, \mathrm{ZC}, \mathrm{SSC}, \mathrm{WL}]$, while for the TD-PSD features we have $K = 6$ and $v_n = [m_0, m_2, m_4, S, I_{irr}, R_{WL}]$.
• Log version of Present (PL): Logarithmically scaled version of the previous vector, i.e., $\log(x^2)$ applied to each element: $v_n^{PL} = [\log(x_1^2), \log(x_2^2), \dots, \log(x_K^2)]$.
• Present-Past (PPa): Feature vector from the current window combined with that of the previous window. The values of the feature vector of the previous window are concatenated with the current one as follows: $v_n^{PPa} = [v_n, v_{n-1}]$.
• Log version of Past and Present (PPaL): Logarithmically scaled, $\log(x^2)$, version of the previous vector: $v_n^{PPaL} = [v_n^{PL}, v_{n-1}^{PL}]$.

Whereas the individual techniques reported here have been used previously in other combinations, to our knowledge combining the log transformation with look-back (PPaL) is new.

All these algorithms were used with their default hyper-parameter values, except the ensemble algorithms, for which we set the number of estimators to 100. At this stage of the research, no hyper-parameter tuning was performed.
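The four feature-vector variants (P, PL, PPa and PPaL) can be sketched as below. This is a minimal example rather than the authors' code. The small constant `EPS` is an assumption added to avoid `log(0)` for features that can be exactly zero (such as a zero-crossing count), the concatenation order (current window first) is likewise assumed, and the first window of a trial is dropped in the look-back modes because it has no predecessor.

```python
import numpy as np

EPS = 1e-12  # assumed guard against log(0); not part of the source


def lookback_features(feats, mode="PPaL"):
    """Build P, PL, PPa or PPaL vectors from per-window features.

    feats: array of shape (n_windows, K), windows of ONE trial in time order.
    """
    if mode not in ("P", "PL", "PPa", "PPaL"):
        raise ValueError(f"unknown mode: {mode}")
    feats = np.asarray(feats, dtype=float)

    if mode in ("PL", "PPaL"):
        feats = np.log(feats ** 2 + EPS)  # elementwise log(x^2)

    if mode in ("P", "PL"):
        return feats

    # Look-back: concatenate each window with its predecessor.
    # Row i of the result corresponds to window i + 1 of the input.
    return np.hstack([feats[1:], feats[:-1]])
```

Applying the function per trial, rather than to the whole recording at once, avoids pairing the first window of one trial with the last window of another.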
In addition to the filtering of the feature vectors as described above, we also implemented a majority voting stabilizer that takes the last N = 20 outputs of the classifier in the queue and provides the most repeated class as the output.
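A majority-voting stabilizer of this kind can be sketched with a fixed-length queue. The class name `MajorityVoter` is hypothetical, and the tie-breaking rule (the tied label that appears earliest in the queue wins) is an assumption, since the source does not say how ties are resolved.

```python
from collections import Counter, deque


class MajorityVoter:
    """Return the most frequent label among the last `n` classifier outputs."""

    def __init__(self, n=20):
        self.queue = deque(maxlen=n)  # oldest vote is discarded automatically

    def update(self, label):
        self.queue.append(label)
        # most_common(1) yields the label with the highest count; among tied
        # labels, the one encountered first (oldest in the queue) is returned.
        return Counter(self.queue).most_common(1)[0][0]
```

In use, each new window's predicted class is passed to `update`, and the returned value is what would be sent to the prosthetic device.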
As stated earlier, each of the six gestures (shown in Figure 1) was performed at three different force levels by the users: low, medium and high. The original authors' main objective was to improve the performance of gesture recognition despite force/amplitude variations in EMG signals recorded from amputees. Our objective, however, is to allow the user to voluntarily modulate the gripping forces applied by the fingers of a prosthetic device. We therefore treated the six gestures × three force levels as if they were independent gestures, i.e., as 18 classes. To ease comparison, we mimicked the train/test split used in [10], where 3 trials are used for training each gesture+force state and the rest are used for testing. As five to eight trials were recorded for each gesture+force condition, this means 54 trials were used for training and from 42 to 99 trials were used for testing, depending on the user. The database contains 1027 total trials (486 for training and 541 for testing), and the minimum number of windows a trial contains is 80.
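The 18-class labelling and the trial-wise split can be illustrated as follows. The integer encoding `gesture * 3 + force` and the choice of the first three trials of each class for training are assumptions made for this sketch; the source states only that 3 trials per gesture+force state are used for training.

```python
N_GESTURES = 6
N_FORCES = 3


def encode_class(gesture, force):
    """Map (gesture 0..5, force 0..2) to a single class id 0..17 (assumed encoding)."""
    return gesture * N_FORCES + force


def split_trials(trial_labels, n_train=3):
    """Assign whole trials to train or test.

    trial_labels: class id of each trial, in recording order.
    The first `n_train` trials of every class go to training (an assumption),
    all remaining trials of that class go to testing.
    """
    seen = {}
    train_idx, test_idx = [], []
    for i, label in enumerate(trial_labels):
        seen[label] = seen.get(label, 0) + 1
        (train_idx if seen[label] <= n_train else test_idx).append(i)
    return train_idx, test_idx


# 18 classes x 3 training trials = 54 training trials per user, as stated in the text.
TRAIN_TRIALS_PER_USER = N_GESTURES * N_FORCES * 3
```

Splitting by trial rather than by window keeps every window of a test trial unseen during training.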

Evaluation Methodology
Many measures have been proposed in the literature to evaluate the 'goodness' of the results, the most popular being accuracy rate, F-measure, area under the ROC curve and time spent on classification (see [16]). We use accuracy so that our results can be compared directly with those of other investigations. As the specifics of the amputation differ for each participant, and the remaining muscular activity can differ as well, we analysed each user independently, as well as through ensemble averages. We assembled the outcomes of the different combinations of feature vector, classifier and filtering into tabular form and made initial assessments through visual inspection. We then used multi-factor ANOVA to quantitatively compare the efficacy of the most promising solutions.

Results
The tables in this section show the results obtained by averaging the results of the nine users for each independent feature group (classic or TD-PSD) and the different filtering methods, before (S) and after applying the majority vote (MV) to stabilize the class obtained for each individual window. For readability, mean values lower than 50% accuracy are omitted. Values shown in color indicate accuracy equal to or above 85%. We further illustrate the results from the most promising algorithms (the same for both feature groups) as bar plots overlaid with data points from individual users. Table 1 summarizes the results obtained for the algorithms tested with classic features and feature combinations, and Figure 2 shows each user's result around the mean for the most promising algorithm-feature combination.

• Linear models: The block of linear models in Table 1 contains a large number of blank cells. Notably, the Stochastic Gradient Descent and Passive Aggressive classifiers do not achieve more than 50% accuracy for any of the feature combinations as applied to this dataset. The Logistic Regression algorithm achieves greater than 50% accuracy only after majority voting has been applied. Linear Discriminant Analysis appears to provide the best results out of all classifiers: 78% accuracy for single-window analysis and 88% with majority voting.
• Non-linear models: Neither Decision Trees, the polynomial or RBF versions of Support Vector Machines, nor the Naive Bayes classifier achieve 80% accuracy for the 18 classes. Only the linear version of the Support Vector Machine reaches 82% accuracy after majority voting with the PPaL feature combination. Accuracy is lower than with LDA in all conditions.
• Ensemble models: The Ada Boost classifier does not provide results better than 50% in any of its configurations. Although the other three proposed classifiers (Bagged Decision Trees, Random Forest, Extra Trees) provide better results, none surpassed 73% even with majority voting.
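Since LDA gives the best results here, a compact NumPy sketch of the technique may help clarify what the classifier computes. This is not the library implementation used in the study (which the source does not name). It is a textbook formulation with a pooled within-class covariance, and the small ridge term added to the covariance is an assumption for numerical stability.

```python
import numpy as np


class SimpleLDA:
    """Minimal linear discriminant analysis with a shared covariance matrix."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        n, d = X.shape

        means = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        priors = np.array([(y == c).mean() for c in self.classes_])

        # Pooled within-class covariance: centre every sample on its class mean.
        centered = X - means[np.searchsorted(self.classes_, y)]
        cov = centered.T @ centered / (n - len(self.classes_))
        cov += 1e-6 * np.eye(d)  # assumed ridge term, keeps cov invertible

        # Linear discriminant: score_k(x) = x . (cov^-1 mu_k) - 0.5 mu_k . (cov^-1 mu_k) + log prior_k
        self.coef_ = np.linalg.solve(cov, means.T).T
        self.intercept_ = -0.5 * np.sum(means * self.coef_, axis=1) + np.log(priors)
        return self

    def predict(self, X):
        scores = np.asarray(X, dtype=float) @ self.coef_.T + self.intercept_
        return self.classes_[np.argmax(scores, axis=1)]
```

Prediction reduces to one matrix-vector product per window, which is part of what makes LDA attractive for an embedded controller.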

Individual User Performance
In Figure 2 one can observe that User 5 shows the poorest results in all the algorithm-feature combinations (their overall accuracy is lower than 60%, whereas the rest of the users cluster around 80%). Additionally, User 9 outperforms the rest of the users in terms of accuracy, almost achieving perfection with the LDA classifier for both the PL and PPaL feature combinations with majority voting.

TD-PSD Features
There is a small overall improvement of the obtained accuracies when using TD-PSD features compared to the classic features (i.e., fewer white blocks in Table 2). We describe the results for each algorithm block, as for the classic features above.

• Linear models: The LDA classifier is once again the algorithm that outperforms the rest of the models in this block. With the TD-PSD features, the PL and PPaL feature combinations demonstrate the highest accuracy: 87% and 89%, respectively, after majority voting, and 72% and 77%, respectively, for individual windows.
• Non-linear models: The tendency in the non-linear models when using TD-PSD features is similar to that observed with classic features. Although in this case we obtain higher overall accuracies, only the SVM with a linear kernel comes near the self-imposed 85% threshold.
• Ensemble models: As in the case of classic features, Bagging Classifiers, Extra Trees and Random Forest are the ones providing the higher accuracy results. Ada Boost again failed to correctly classify more than half of the instances.

Individual User Performance
Similar tendencies are observed for Users 5 and 9 with this feature group (Figure 3). Again, User 9 outperforms the rest of the users in terms of accuracy in most of the combinations, and User 5 falls below 60% in most of the combinations.

Statistical Comparison
To quantitatively compare methodologies we performed a five-factor ANOVA with repeated measures, with accuracy as the sole dependent variable and with Feature Vector (Classic or TD-PSD), Look-Back Filter Method (P, PPa), Transform (Linear or Log), Majority Voting (Simple or with MV) and Classifier (LDA, SVM, BAG and RF) as independent factors. We chose to concentrate on these 4 classifiers as they appear to give more promising results than the other options. Figure 4 shows the relative performance between methods. The ANOVA shows a marginally significant main effect of the choice of feature vector (p < 0.019) and significant main effects for the other four independent factors (p < 0.001), suggesting that (a) the TD-PSD performs better than the Classic feature vector (NB see post-hoc analysis below), (b) LDA gives the best performance amongst the different classifiers, (c) logarithmic transformation of feature components has a significant positive effect on accuracy and (d) both look-back filtering and majority voting generally increase performance. Applying Tukey's HSD test to identify homogeneous groups, LDA coupled with the log-transformed combination of the present and past samples (PPaL) and majority voting stands out as the best combination, with no significant difference between the choices of feature vector (Classic or TD-PSD). Figure 5, which shows the results for only the log-transformed feature elements, illustrates this observation. Nevertheless, one can observe in Figure 6 that the accuracy rates appear to vary less across subjects (smaller range of values) for the Classic vs. TD-PSD features.

Discussion
What novel insights can one draw from our analyses? First, despite showing an overall advantage in the analysis (statistical main effect of Feature Vector), we find no significant advantage of the TD-PSD feature vector over the more classical features when looking specifically at the optimal combination, in which log transform, look-back and majority voting are all applied. This is not entirely surprising. The spectral moments $m_0$, $m_2$ and $m_4$ in the TD-PSD vector are related to the power (RMS), zero crossings (ZC) and slope sign changes (SSC) of the Classic feature vector, and both vectors include a measure of waveform length (WL and $R_{WL}$). The TD-PSD vector includes additional parameters related to Sparseness ($S$) and Irregularity factor ($I_{irr}$), but since these two measures are derived from $m_0$, $m_2$ and $m_4$, they do not necessarily bring any new information that can be exploited by the classifier. The TD-PSD features may nevertheless be attractive in a practical case because they rely less on empirical constants (e.g., thresholds) that could vary from user to user. Note, however, that the TD-PSD features include a λ factor that is also empirical (see [9]). In terms of robustness, it would presumably be better if the machine learning algorithm could find the optimal value for this parameter. With a new database in hand, we would propose to perform a grid search over different λ values. This could be done by methodically evaluating the changes in accuracy for different λ values and choosing the one that performs best across users.
Second, our systematic comparison across different classifiers reveals that LDA is reliably better than the three other methods subjected to our statistical analysis (and presumably better than the others as well). One should note, however, that the SVM, BAG and RF methods are not terribly far behind. If they are easier to compute, they could be considered as well.
Third, combining the present and past sample (PPa) and applying majority voting (MV) each individually improved performance; however, there is little improvement gained by combining present and past when majority voting is applied. This makes sense, as both of these methods represent a means to look further back in time. MV is another form of low-pass filtering that will reject spurious classifications when neighbouring values converge to a different categorization. Like low-pass filtering of continuous signals, however, PPa and MV will add latency to the detection of actual state changes, in the MV case depending on the number of values used in the voting. We use N = 20 which, for windowing at 150 ms with overlap of 50 ms, leads to a potential delay of up to 1 full second before a state change in the intention of the user is output to the prosthetic device. For the state of the art in which prostheses move slowly, this may be acceptable, but a more detailed analysis of the optimal value for N is also warranted. And as for the PPa look-back filtering, one might also ask if simply widening the time windows or low-pass filtering might also improve accuracy without the added complexity of implementing a voting scheme. Which of these to apply will probably also depend on the ease of implementation (PPa is easier to compute than a moving majority vote) and on the desired response latency. Finally, one can consider the consistency across subjects. Consistently lower accuracy rates are obtained for one user than for all the others, regardless of the methodology. Possible explanations include low remaining muscle activity, a thicker fat layer, incorrect electrode location, etc. Removing this subject (see Figure 6), one might conclude that the Classic feature vector is more robust with respect to inter-user differences, although the data reported here are far too few to state this conclusion with confidence.
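The latency figure discussed above can be made concrete with a small calculation. This sketch assumes that a new window, and hence a new vote, arrives every 50 ms (`hop_ms`), which is the reading under which N = 20 votes correspond to one second. Both function names are hypothetical, and the second function describes an idealized case in which every window after the change of intention is classified correctly.

```python
def queue_flush_ms(n_votes, hop_ms):
    """Time until the voting queue holds only post-change votes (worst case)."""
    return n_votes * hop_ms


def majority_flip_ms(n_votes, hop_ms):
    """Time until post-change votes form a strict majority of the queue.

    Idealized: assumes every window after the change is classified correctly.
    """
    return (n_votes // 2 + 1) * hop_ms
```

With N = 20 and a 50 ms step, the queue is fully renewed after 1000 ms, matching the "up to 1 full second" quoted above, while under the idealized assumption the output could switch after roughly 550 ms. Lowering N shortens both figures at the cost of less smoothing.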

Comparison with Related Studies
The experimental procedure proposed in this investigation can be compared with experimental Scheme 3 of the creators of the database [10], i.e., training the classifiers with all force levels and testing with a single level of force at a time, and with the results of the feature fusion proposed in [9]. Al-Timemy et al. [10] report an average error rate of 17.42% with the LDA classifier when testing with three forces, which corresponds to a mean accuracy of 82.58%. They tried other classifiers, but LDA is the one providing the best results in their case, too. As look-back filtering, instead of the PPaL proposed in this investigation, they use the cosine similarity feature vector $f_i = -2 a_i b_i / (a_i^2 + b_i^2)$, where $a_i$ is the TD-PSD of the $i$th window and $b_i$ is the logarithmically scaled version of $a_i$. In [9], graphs suggest an error rate of 18%, which translates to 82% accuracy. With our majority voting approach and the logarithmically scaled combination of the current and past windows (PPaL) we achieve 88% accuracy for classic features and 89% for TD-PSD features, outperforming previous attempts despite endeavouring to distinguish between different levels of muscular effort, rather than trying to ignore those variations. This suggests that when the only goal is to identify the gesture, independent of effort, it may nevertheless be more effective to treat different effort/gesture combinations as separate cases in the training process, and then collapse across efforts in the decoding phase, rather than expecting the decoder to ignore variations in overall muscular effort. For our purposes, however, where we would like to offer the user the ability to control grip forces via changes in muscular effort, these results indicate that this increased level of sophistication in EMG control is indeed achievable.
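The idea of training on 18 gesture+force classes and then collapsing across force levels at decoding time can be sketched as follows. The class encoding `gesture * 3 + force` is an assumption made for this example, and the function names are hypothetical; the source reports no gesture-only accuracy computed this way.

```python
import numpy as np

N_FORCES = 3  # low, medium, high


def to_gesture(class_ids):
    """Collapse 18-class ids to 6 gesture ids (assumes id = gesture * 3 + force)."""
    return np.asarray(class_ids) // N_FORCES


def gesture_accuracy(y_true, y_pred):
    """Fraction of windows whose gesture is correct, ignoring the force level."""
    return float(np.mean(to_gesture(y_true) == to_gesture(y_pred)))
```

A prediction with the right gesture but the wrong force level counts as an error in the 18-class score and as correct in the collapsed score, so the collapsed accuracy can never be lower than the 18-class accuracy.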

Result Transfer to Real World
The present study is a theoretical analysis performed on signals captured in a laboratory environment under favourable circumstances. When transferring the current results to a real-life scenario, one must take into account that:

• muscles act under the effects of gravity due to the weight of the prosthetic device, and that interference should be quantified,
• the final user should be able to place the prosthetic on his/her arm without specialized help, which could lead to problems in the EMG readings that do not occur in controlled laboratory conditions,
• the final trained model and the feature extraction per window should be light enough to run in real time on a microcontroller embedded in a prosthetic hand, and
• the choice of N should be made according to the desired latency and the capabilities of the EMG capturing system.
Our current investigation points the direction to follow, even if additional work is needed before achieving a robust product.

Conclusions
In this paper we present a systematic approach to compare the effects of different choices of feature sets, filtering and classifiers on the accuracy of hand-gesture detection from surface EMG in amputees, where, in contrast to most studies, the gripping force intended by the user is also to be decoded. For this objective, at least, Linear Discriminant Analysis proved to be the best classifier compared to a number of linear and non-linear alternatives, although other classification methods came in a close second. We find no clear advantage or disadvantage of using TD-PSD vs. more classical features to characterize the incoming EMG signals, so the choice for a given application may depend more on factors not tested here, such as long-term stability and user-to-user variability. Logarithmic scaling and filtering appear to improve the accuracy of the classification, as does a scheme of majority voting, albeit at the expense of increased latency to detect state changes. These results show that simultaneous decoding of intended gesture and grip force can be achieved with surface EMG signals from the remnant muscles of an amputated arm. The comparison of feature vectors, classifier methods and filtering presented here provides important insights that may be used to create effective, intuitive and robust control schemes to meet the needs of transradial amputees.

Institutional Review Board Statement: Data from human users was obtained from a published, on-line database collected under the responsibility of the originating authors [10].
Informed Consent Statement: Data from human users was obtained from a published, on-line database collected under the responsibility of the originating authors [10].

Data Availability Statement:
The data used for this investigation can be found at https://www.rami-khushaba.com/electromyogram-emg-repository.html, accessed on 21 October 2021.