
sEMG-Based Gesture Recognition with Convolution Neural Networks

Zhen Ding, Chifu Yang, Zhihong Tian, Chunzhi Yi, Yunsheng Fu and Feng Jiang
1 School of Mechatronics Engineering, Harbin Institute of Technology, Harbin 150000, China
2 Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510000, China
3 Institute of Computer Application, China Academy of Engineering Physics, Mianyang 621000, China
4 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
* Author to whom correspondence should be addressed.
Sustainability 2018, 10(6), 1865; https://doi.org/10.3390/su10061865
Submission received: 20 March 2018 / Revised: 28 May 2018 / Accepted: 28 May 2018 / Published: 4 June 2018

Abstract

Traditional classification methods for limb motion recognition based on sEMG have been researched in depth and have shown promising results. However, information loss during feature extraction reduces the recognition accuracy. To obtain higher accuracy, a deep learning method was introduced. In this paper, we propose a parallel multiple-scale convolution architecture. Compared with state-of-the-art methods, the proposed architecture fully considers the characteristics of the sEMG signal: it adopts larger kernel filter sizes than commonly used in other CNN-based hand gesture recognition methods, and it takes muscle independence into account in its design. All classification methods were evaluated on the NinaPro database. The results show that the proposed architecture achieves the highest recognition accuracy and indicate that a parallel multiple-scale convolution architecture with larger kernel filters that respects muscle independence can significantly increase classification accuracy.

1. Introduction

Surface electromyographic (sEMG) signals, which are generated by the electrical activity of muscle fibers, can be detected noninvasively by surface electrodes. These signals reflect muscle activity and provide limb movement information. Under the assumption that the patterns of the sEMG signal are repeatable for the same movement and distinguishable for different movements [1], the recognition of limb motions based on surface electromyographic signals has been widely used in many man–machine interfaces [2,3] such as upper-limb prostheses [4]. However, there are some gaps between application and research [5]. In practical applications, conditions such as low power consumption [6], portability, space constraints [7] and extensive sEMG data with multiple channels and a high sample rate [8] must be considered. Beyond these, sEMG-based classification techniques themselves have been extensively researched [9].
The quality of the sEMG signal and the processing method are the main factors affecting classification accuracy. Correct electrode locations, an appropriate choice of channels and a proper selection of hand gestures improve the signal quality and lead to high classification accuracy [10,11,12]. When processing the signals, the raw sEMG is rarely used directly to recognize limb motions, since it is easily disturbed by environmental noise, electrode location shifts and loose electrode–skin contacts, causing inaccuracy in the recognition of limb movements. To mitigate this issue and improve the accuracy, traditional methods usually consist of four phases: preprocessing, windowing, feature extraction and classification [13]. Feature extraction converts the sEMG signals into a compact and informative set of features. These features are usually hand-crafted by human experts, and the extraction methods can be categorized into those operating in the time domain [13,14,15,16], the frequency domain [17,18] and the time–frequency domain (TFD) [19]. For example, the time-domain features usually include the Mean Absolute Value (MAV), Root Mean Square (RMS), Mean Absolute Value Slope (MAVSlope), Waveform Length (WL), Slope Sign Changes (SSC), Zero Crossings (ZC) and the EMG Histogram (HIST), which is an extension of Zero Crossings. Characteristic frequency-domain features include the Median Frequency (MDF), the 3rd Spectral Moment (SM3) and the Median Amplitude Spectrum (MDA). All these features are designed by human experts, and some have a strong correlation with muscle function. For example, RMS is related to constant-force, non-fatiguing contraction, and ZC is related to muscle fatigue. As for the classification phase, machine learning algorithms assign the extracted features to the class (gesture) to which they most probably belong. In the past decade, optimal methods for classifying EMG signal patterns have been extensively researched [20,21]. Different classifiers have been introduced, such as k-Nearest Neighbors (KNN) [22], neural networks [14,23], the Bayesian classifier [17,24], linear discriminant analysis (LDA) [25], the support vector machine (SVM) [26,27] and Random Forests (RF) [28,29,30,31]. Moreover, the combination of multiple classifiers is also a desirable way to improve classification accuracy; Al-Ani et al. proposed a new dynamic channel selection method that combines multiple classifiers (LDA, SVM, quadratic discriminant analysis, Bayes classifier and extreme learning machine) in one algorithm [32]. Both phases (feature extraction and classification) affect the classification accuracy, especially the choice of feature set. Hence, to obtain higher classification accuracy, some researchers have focused on methods to obtain an appropriate feature set, such as principal component analysis (PCA) of TFD features, the nonnegative matrix factorization (NMF) algorithm and the nonlinear multiscale Maximal Lyapunov Exponent [9,19,33]. Despite their promising performance, the greatest disadvantage of these traditional methods is that useful information may be discarded during feature extraction.
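To make the hand-crafted feature step concrete, the following is a minimal sketch (Python/NumPy) of how a few of the time-domain features listed above can be computed for one analysis window of a single channel; the dead-band threshold used for ZC and SSC is an illustrative assumption, since its value varies between implementations.

import numpy as np

def time_domain_features(x, eps=1e-8):
    """Classical time-domain features for one sEMG window of a single channel.

    x   : 1-D array holding one analysis window.
    eps : dead-band threshold for ZC/SSC (an illustrative choice).
    """
    mav = np.mean(np.abs(x))                       # Mean Absolute Value
    rms = np.sqrt(np.mean(x ** 2))                 # Root Mean Square
    wl = np.sum(np.abs(np.diff(x)))                # Waveform Length
    # Zero Crossings: sign changes whose amplitude step exceeds the threshold
    zc = np.sum((x[:-1] * x[1:] < 0) & (np.abs(x[:-1] - x[1:]) > eps))
    # Slope Sign Changes: direction reversals of the first difference
    d = np.diff(x)
    ssc = np.sum((d[:-1] * d[1:] < 0) &
                 (np.maximum(np.abs(d[:-1]), np.abs(d[1:])) > eps))
    return {"MAV": mav, "RMS": rms, "WL": wl, "ZC": int(zc), "SSC": int(ssc)}

# Example: one 100 ms window of a single channel sampled at 2 kHz (200 samples)
window = np.random.randn(200) * 1e-4
print(time_domain_features(window))
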
Inspired by the recent success of deep learning, which has been widely used in speech recognition and computer vision [34], Atzori et al. introduced a new method based on the Convolutional Neural Network (CNN) to decode sEMG signals [35]. Along the time sequence, the sEMG signals from the different electrodes were regarded as sEMG images. Unlike traditional methods, a CNN can extract features without any additional information or manually designed feature extractor. Four convolutional layers with three different kernel sizes (3 × 3, 5 × 5 and 9 × 1) and two pooling layers were adopted. The results of [35] indicate that classical machine learning classification methods are slightly inferior to a convolutional neural network with a simple architecture. The architecture of the CNN has a significant influence on the classification accuracy. Geng et al. [36] and Du et al. [37] used the same ConvNet architecture, consisting of four convolutional layers and two fully connected layers, to recognize hand gestures from instantaneous sEMG images. Regarding the choice of kernel sizes, each of the first two convolutional layers consists of 64 filters of 3 × 3, while each of the last two convolutional layers consists of 64 non-overlapping filters of 1 × 1. The results show a significant improvement in accuracy over classical classifiers. With an accuracy of 76.1% on single frames of sEMG signals and 77.8% using simple majority voting over a 200 ms window on DB2 of NinaPro [36], that architecture outperforms Atzori's method. Côté-Allard et al. [38] also adopted small convolution kernel sizes (3 × 3 and 4 × 3) to process myoelectric information; however, they calculated spectrograms of the raw sEMG data and delivered the spectrograms to the CNN. Zhai et al. [39] proposed an improved method based on the spectrogram of the sEMG. After the calculation of the spectrogram, principal component analysis (PCA) is performed to reduce the dimensionality. The CNN model used in [38] only contains one convolutional layer with a 5 × 5 kernel size. After this series of processing steps, Zhai's method achieved 78.71% classification accuracy. All these previous results show that the CNN is an effective method for electromyographic signal pattern recognition. However, in current methods, the size of the kernel filter is usually the same as that commonly used in computer vision, which might not be suitable for sEMG signals. The choice of kernel filter size should consider the characteristics of the EMG signal itself. Meanwhile, considering the non-stationary and noisy nature of myoelectric signals, existing architectures may not be complex enough to obtain an appropriate sEMG feature set.
In this study, to better adapt to the characteristics of sEMG signals and achieve higher classification accuracy, we propose a parallel multiple-scale convolution architecture that can extract features without any additional information or manually designed feature extractors. The characteristics of sEMG signals are considered in the design of the network architecture. Unlike the kernel filters commonly used in computer vision, our architecture utilizes larger kernel filter sizes. In addition, the proposed architecture neither fuses the information from different sEMG channels at the very beginning, nor analyzes each channel separately and fuses them only at the result level. Instead, the characteristics of the sEMG signal are fully considered: in view of muscle independence, each sEMG channel is processed independently at the front end, eliminating errors that may be caused by premature fusion, and then the information from all channels is fused and analyzed jointly. To evaluate the proposed parallel architecture, several reference experiments, including a classical method, were implemented.

2. The Proposed sEMG-Based Gesture Recognition

2.1. Database

The data used in our work are the second database (DB2) of the NinaPro project, which is publicly accessible and has previously been used for hand gesture recognition. In DB2, 40 intact subjects (28 males, 12 females; 34 right-handed, 6 left-handed; age 29.9 ± 3.9 years) were instructed to perform 50 types of hand, wrist, functional and grasping movements, organized in three distinct sets of exercises (referred to as Exercises B, C and D in [40]). Each movement was repeated six times with a 3 s rest posture in between.
Twelve Trigno wireless electrodes were used to record the sEMG signals. Eight electrodes were located around the forearm at the height of the radiohumeral joint, two were placed on the flexor digitorum superficialis and the extensor digitorum superficialis, and two were placed on the biceps and triceps. The raw sEMG signals were sampled at a rate of 2 kHz with a baseline noise of less than 750 nV RMS. Before the raw data could be used, the signals were processed in several steps, such as filtering with a Hampel filter (removing the 50 Hz power-line interference), synchronization and relabeling. The details can be found in [40].
In this study, the 17 hand and wrist movements of Exercise B (eight isometric and isotonic hand configurations and nine basic wrist movements) were considered. Approximately two thirds of the movement repetitions (Repetitions 1, 3, 4 and 6) were used as the training set, and the other two repetitions (Repetitions 2 and 5) were used as the testing set.
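As an illustration of this repetition-based split, the sketch below (Python) loads one DB2 recording and separates it by repetition. The .mat field names ('emg', 'restimulus', 'rerepetition') follow the NinaPro documentation, and the file name is only a placeholder; adjust both if your local files differ.

import numpy as np
from scipy.io import loadmat

mat = loadmat("S1_E1_A1.mat")           # placeholder path to one Exercise B file
emg = mat["emg"]                        # (T, 12) raw sEMG samples
gesture = mat["restimulus"].ravel()     # relabelled movement id per sample
repetition = mat["rerepetition"].ravel()

# Repetitions 1, 3, 4 and 6 for training; Repetitions 2 and 5 for testing
train_mask = np.isin(repetition, [1, 3, 4, 6])
test_mask = np.isin(repetition, [2, 5])
emg_train, y_train = emg[train_mask], gesture[train_mask]
emg_test, y_test = emg[test_mask], gesture[test_mask]
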

2.2. Data Analysis and Processing

The classification procedure is similar to that of Englehart et al. [13] and consists of windowing, feature extraction and classification. No preprocessing procedure, such as low-pass filtering [36], the fast Fourier transform (FFT) [39] or standardization [4,41], was implemented in our algorithm. On the one hand, preprocessing such as the FFT and low-pass filtering may cause a loss of useful information. On the other hand, preprocessing such as low-pass filtering introduces time latency, which is not conducive to real-time control.

2.2.1. Windowing

Before feeding the sEMG signals to the classification algorithm, the data should be processed to match the input dimension of the algorithm. For each channel, the sEMG signals were segmented using a sliding window with a length of L milliseconds (2L samples). The increment of the sliding window was set to 10 ms (20 samples). Figure 1 presents the segmentation and combination of the sEMG signals. The sEMG signals were converted into a series of 12 × 2L sEMG images for each subject, where 12 is the number of electrodes.
The window length represents a compromise between time latency and classification accuracy. As described in [14], to satisfy the requirements of real-time control, the time latency should be less than 300 ms. Longer windows lead to higher controller delays as well as increased classification accuracy [42,43,44]. In previous works [13,40,45], L was greater than 200 ms to obtain higher classification accuracy. To test the performance of the proposed algorithm, L = 100 ms was chosen in this study. Ultimately, the sEMG signals from the 12 electrodes were converted into sEMG images of size 12 × 200.
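The segmentation described above can be sketched as follows (Python/NumPy); the shapes assume the 12-electrode, 2 kHz DB2 recordings and the 100 ms window with a 10 ms increment used in this study.

import numpy as np

def segment_semg(emg, window=200, step=20):
    """Slice a multi-channel sEMG recording into overlapping sEMG images.

    emg    : array of shape (12, T), 12 electrodes with T samples at 2 kHz.
    window : window length in samples (200 samples = 100 ms here).
    step   : window increment in samples (20 samples = 10 ms).
    Returns an array of shape (n_windows, 12, window).
    """
    n_channels, n_samples = emg.shape
    starts = range(0, n_samples - window + 1, step)
    return np.stack([emg[:, s:s + window] for s in starts])

# Example: 5 s of simulated 12-channel sEMG at 2 kHz -> 12 x 200 images
emg = np.random.randn(12, 10000)
images = segment_semg(emg)
print(images.shape)   # (491, 12, 200)
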

2.2.2. Feature Extraction and Classification

We employed a deep convolutional network to classify the hand gestures without any additional information or manually designed feature extractors. Figure 2 shows the architecture of the proposed deep convolutional network, named Convolution with two Parallel Blocks (C-B1PB2), which consists of two parts separated by the red dotted line: a feature extractor and a classifier.
The feature extractor selects an appropriate feature representation of the sEMG and reduces the input dimension of the classifier. It is composed of two blocks, delimited by the black dotted lines.
In Block 1, five convolution layers and two maximum pooling layers are employed. The first three convolution layers contain 40 2D filters of 1 × 13 with a stride of 1 and zero padding of 0. The last two convolution layers are similar to the preceding layers except for the first dimension of the kernel filter; in these two layers, the information from the different electrodes is mixed to detect the relevance of each electrode. Two maximum pooling layers with filters of 1 × 2 follow the first and second convolution layers, respectively. Pooling is included to improve the robustness of the algorithm, so that local disturbances of the sEMG signal caused by noise do not affect the classification results.
Compared with Block 1, Block 2 differs in its first three convolution layers, which adopt a bigger filter kernel size: they contain 40 2D filters of 1 × 57 with a stride of 1 and zero padding of 0. The following two convolution layers are the same as the last two convolution layers of Block 1. No pooling layers are used in Block 2.
The two blocks are parallel and do not influence one another when extracting features. Their outputs are concatenated and then delivered to the classifier.
The classifier is composed of three fully connected layers and a softmax layer. The input layer consists of 520 units, corresponding to the features extracted by the two blocks. The first and second hidden layers consist of 260 and 130 units, respectively. The output layer has 17 units, equal to the number of hand gestures.
In both blocks, batch normalization is applied between each convolution layer and its activation function. In the classifier, dropout with a probability of 0.5 is applied after the first and second fully connected layers.
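The following PyTorch sketch reproduces our reading of Figure 2 and the layer descriptions above. The electrode-mixing kernel heights of the last two convolution layers in each block (7 and 6 here) are not stated explicitly in the text and are inferred from the 520-unit classifier input, so they should be treated as assumptions rather than the exact published configuration.

import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, k):
    # Batch normalization sits between the convolution and its activation.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=k),
                         nn.BatchNorm2d(out_ch),
                         nn.ReLU())

class CB1PB2(nn.Module):
    """Sketch of the parallel two-block network (C-B1PB2).

    Input: (batch, 1, 12, 200) sEMG images (12 electrodes x 100 ms at 2 kHz).
    """
    def __init__(self, n_classes=17):
        super().__init__()
        # Block 1: small temporal kernels (1 x 13), pooling after the first
        # two layers, then two electrode-mixing layers (inferred 7 and 6 high).
        self.block1 = nn.Sequential(
            conv_bn_relu(1, 40, (1, 13)), nn.MaxPool2d((1, 2)),
            conv_bn_relu(40, 40, (1, 13)), nn.MaxPool2d((1, 2)),
            conv_bn_relu(40, 40, (1, 13)),
            conv_bn_relu(40, 40, (7, 13)),
            conv_bn_relu(40, 40, (6, 13)))          # -> (40, 1, 5) = 200 features
        # Block 2: larger temporal kernels (1 x 57), no pooling.
        self.block2 = nn.Sequential(
            conv_bn_relu(1, 40, (1, 57)),
            conv_bn_relu(40, 40, (1, 57)),
            conv_bn_relu(40, 40, (1, 57)),
            conv_bn_relu(40, 40, (7, 13)),
            conv_bn_relu(40, 40, (6, 13)))          # -> (40, 1, 8) = 320 features
        self.classifier = nn.Sequential(
            nn.Linear(520, 260), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(260, 130), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(130, n_classes))              # softmax applied by the loss

    def forward(self, x):
        f1 = self.block1(x).flatten(1)              # 200-dim features
        f2 = self.block2(x).flatten(1)              # 320-dim features
        return self.classifier(torch.cat([f1, f2], dim=1))

# Quick shape check on a dummy batch of sEMG images
model = CB1PB2()
logits = model(torch.randn(8, 1, 12, 200))
print(logits.shape)   # torch.Size([8, 17])
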

3. Experiments and Results

As described above, the proposed method has several distinguishing features, such as the parallel blocks and the size of the convolution kernels. Several reference experiments were conducted to evaluate the performance of C-B1PB2 with respect to these distinctive features.
● Classical Classification (CC):
For each channel, all data were standardized to have zero mean and unit standard deviation [39]. The length of the sliding window was 100 ms (200 samples) and its increment was 10 ms (20 samples). The selected signal features include the Mean Absolute Value (MAV), Waveform Length (WL), Zero Crossings (ZC), Histogram (HIST) and marginal Discrete Wavelet Transform (mDWT) [14,43,46]. The HIST requires a predefined number of bins, and the mDWT decomposes the signals in terms of a basis function (i.e., the wavelet) at different levels of resolution, resulting in a high-dimensional time–frequency representation [46]. The predefined number of bins and the parameters of the wavelet are listed in Table 1. Random forests (RF) were used to recognize the hand gestures; a minimal sketch of this pipeline is given after this list.
● Convolution with two parallel Block 1 (C-2B1):
As shown in Figure 3a, C-2B1 is composed of two parallel copies of Block 1, which is described in Figure 2. The input layer of the classifier has 400 units to match the features extracted by the two Block 1s, while the remaining layers of the classifier are the same as in Figure 2.
● Convolution with two parallel Block 2 (C-2B2):
Figure 3b shows the architecture of C-2B2, which consists of two parallel copies of Block 2. The input layer of the classifier is replaced by one with 640 units, and the remaining layers are unchanged.
● Convolution with a different kernel (C-DK):
As represented in Figure 3c, the structure is the same as in Figure 2 except for the first dimension of the filter in each convolution layer. For the upper block of the feature extractor, the first four convolution layers have 40 2D filters of 3 × 13, while the last convolution layer has 40 2D filters of 4 × 13, with a stride of 1 and zero padding of 0. The maximum pooling layers are the same as in Block 1 and follow the first and second convolution layers. For the lower block of the feature extractor, the first three convolution layers have 40 2D filters of 3 × 57, while the last two convolution layers are identical to the last two convolution layers of the upper block. The remaining parameters are the same as in C-B1PB2.
● Convolution with the small kernel (C-SK):
As shown in Figure 3d, the architecture of C-SK contains two identical parallel CNN blocks. The kernel filter size is smaller than in the previous comparison experiments but similar to the state-of-the-art methods [35,36,37,38,39]. The first four convolution layers contain 40 2D filters of 3 × 3 with a stride of 1 and zero padding of 0. The last convolution layer is similar to the preceding layers except for the first dimension of the kernel filter; it consists of 40 2D filters of 4 × 3. Two maximum pooling layers with filters of 1 × 2 follow the first and second convolution layers, respectively.
● Convolution with the small kernel 2 (C-SK2):
As represented in Figure 3e, the architecture of C-SK2 is similar to that of C-SK except for the first dimension of the kernel filter in each convolution layer. Compared with the C-B1PB2 method, the main difference is the second dimension of the kernel filter, which corresponds to the sampling points: C-SK2 adopts smaller kernel filters. The first three convolution layers are composed of kernel filters of 1 × 3, while the last two layers use kernel filters of 7 × 3 and 6 × 3, respectively.
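Below is a minimal sketch of the Classical Classification (CC) pipeline referenced in the first item of the list above, assuming scikit-learn for the Random Forest. Only MAV, WL and ZC are computed, and the HIST and mDWT features with their Table 1 parameters are omitted for brevity, so it illustrates the workflow rather than reproducing the exact feature set.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def cc_features(window, eps=1e-8):
    # Per-channel MAV, WL and ZC for one 12 x 200 window (HIST and mDWT omitted).
    feats = []
    for x in window:                                          # loop over 12 channels
        feats.append(np.mean(np.abs(x)))                      # MAV
        feats.append(np.sum(np.abs(np.diff(x))))              # WL
        feats.append(np.sum((x[:-1] * x[1:] < 0) &
                            (np.abs(x[:-1] - x[1:]) > eps)))  # ZC
    return np.asarray(feats, dtype=float)

# Placeholder arrays standing in for the standardized, segmented training and
# test sets (shape: n_windows x 12 x 200) with per-window gesture labels.
windows_train = np.random.randn(300, 12, 200)
y_train = np.random.randint(0, 17, size=300)
windows_test = np.random.randn(60, 12, 200)

X_train = np.stack([cc_features(w) for w in windows_train])
X_test = np.stack([cc_features(w) for w in windows_test])

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
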
By comparing the results of C-SK2, C-2B1, C-2B2 and C-B1PB2, the influence of different kernel filter sizes on classification accuracy can be assessed. The results of C-DK and C-B1PB2, or of C-SK and C-SK2, reveal the effect of considering the sEMG signal characteristics on classification accuracy. Moreover, we evaluated C-B1PB2 on all hand gestures of NinaPro DB2 (Exercises B, C and D) to verify the effectiveness of the proposed classification algorithm.
The Classical Classification method, which consists of preprocessing, windowing, feature extraction and classification, was implemented in MATLAB. The other experiments were implemented in PyTorch.
Table 2 gives the average classification accuracy for each experiment. The first seven rows show the average classification accuracy of each method on NinaPro DB2 Exercise B, while the last row shows the result on all hand movements of NinaPro DB2 (Exercises B, C and D).
Across all experiments, the proposed C-B1PB2 obtains the best performance on NinaPro DB2 Exercise B, while C-SK gets the lowest classification accuracy. Except for C-SK, which uses small kernel filters and combines the information from the different channels in every convolution layer, the CNN-based methods achieve higher accuracy than the classical method. C-2B2, with its larger filter kernel size, obtained higher classification accuracy than the C-2B1 and C-SK2 methods; indeed, as the size of the kernel filter increases (C-SK2, C-2B1, C-2B2), the classification accuracy also increases. Among the CNN-based methods, C-SK and C-DK, both of which ignore muscle independence, achieve the lowest classification accuracies.
Figure 4 shows the average confusion matrix, which details the classification and misclassification of the hand gestures for the proposed C-B1PB2. Movements 1, 2, 6, 12, 14 and 15, corresponding to thumb up, extension of the index and middle fingers, fingers flexed together in a fist, wrist pronation, wrist extension and wrist radial deviation, respectively, are classified more accurately than the remaining movements. Movements 1–8 are finger movements, while the remaining movements are wrist movements.
The C-B1PB2 method achieves an average increase in accuracy of 5.83% compared with the CC method, which demonstrates the effectiveness of CNNs in sEMG-based hand gesture recognition. In addition to the disparities caused by the framework, the most significant difference between C-B1PB2 and CC is the input data: before windowing, the latter was filtered to remove interference and standardized to have zero mean and unit standard deviation, while the former was not preprocessed. The preprocessing of sEMG signals would influence the performance in real-time control of upper-limb prostheses.
Compared with the existing convolution architectures applied to sEMG-based hand gesture recognition [35,36,37,38,39,47], the parallel multiple-scale convolutional layers and the filter kernel size are the most significant differences of C-B1PB2. As described in Section 2.2, the first dimension of the filter kernel corresponds to the electrodes and the second dimension corresponds to the sampling points. The size of the filter kernel determines the size of the receptive field. In this paper, a larger filter kernel means a larger size in the second dimension of the kernel.
As listed in Table 2, the average classification accuracy of C-2B2 is 0.33% higher than that of C-2B1 and 1.19% higher than that of C-SK2, which indicates that a larger receptive field yields a slight increase in classification accuracy. The 3 × 3 kernel filter most commonly used in the state-of-the-art CNN-based hand gesture recognition methods [35,36,37,38,39] is therefore not optimal. This may be because, for the same kernel size, the higher the sampling rate of the sEMG signal, the less information the kernel covers. Meanwhile, the average classification accuracy of C-B1PB2 is 1.3% higher than that of C-2B2, which indicates that mining the information at different scales yields better performance than a single scale.
The average classification accuracy of C-DK is only slightly higher than that of the CC method, while C-SK is even slightly lower. Compared with the other CNN-based reference methods, the biggest difference of C-DK and C-SK lies in how the information from the different electrodes (the first dimension of the filter kernel) is handled in each convolution layer. The C-DK and C-SK methods combine information from different electrodes in every convolution layer, whereas the other convolution methods process the signal of each electrode independently in the first three convolution layers and combine the information from the different electrodes only in the last two convolution layers. As described in [11], the sEMG signals from neighboring muscles are statistically independent; these muscles change their activation intensity and sequence, leading to different gestures. Our proposed algorithm first processes the signals from each channel independently and then fuses the information of all channels to produce the features of the sEMG image. This architecture not only fully considers the correlation of the muscles but also ensures the independence of each muscle. Compared with the other fusion strategies, it takes more sEMG characteristics into account and leads to better results. The results indicate that considering the characteristics of sEMG signals, such as muscle independence, is essential when designing the architecture.
The C-B1PB2 method was applied to all hand gestures of DB2 to verify the effectiveness of the proposed classification algorithm against state-of-the-art methods. As listed in Table 3, the average recognition accuracy reached 78.86%. Atzori et al. [35] achieved a recognition accuracy of 60.27% with their CNN method and 75.27% with a classical classification method (Random Forests with all features), using low-pass Butterworth filter (first order, 1 Hz) preprocessing and a 200 ms window. Zhai et al. [39] achieved a recognition accuracy of 78.71% with their CNN method, using preprocessing (normalization and FFT) and a 200 ms window. Even without a long window and without preprocessing, the result of our method is comparable to the state-of-the-art methods on DB2. These results further confirm the effectiveness of our architecture.
The selection of hand gestures affects the classification accuracy [11,12,33]. The confusion matrix shows that the more similar a hand gesture is to the rest posture, the lower its classification rate. As for the wrist movements, the accuracies of the extension and supination movements are higher than those of the corresponding flexion and pronation movements. This may be caused by the joints involved: the extension and supination movements may produce a higher level of muscle activation.

4. Conclusions

In the past few years, to obtain higher accuracy in sEMG-based gesture recognition, much research has focused on manual feature extraction. Since the CNN was introduced to this field, the results have demonstrated its powerful ability to extract features, and the architecture of the CNN has a significant influence on classification accuracy. The state-of-the-art classification algorithms usually adopt a simple architecture with small kernel filters, regardless of the sEMG characteristics. In this paper, we proposed a parallel multiple-scale convolution architecture with different receptive field sizes (C-B1PB2) for hand gesture recognition. The proposed algorithm, which employs larger kernel filters and considers muscle independence, produces more accurate results than the classical classification method and state-of-the-art methods on NinaPro database 2. The results show that a larger kernel filter yields a slight increase in classification accuracy, and that combining different kernel filter sizes in parallel blocks yields better performance than a single size. Furthermore, the results also indicate that considering sEMG characteristics such as muscle independence when designing the algorithm is necessary, since it can significantly increase the recognition accuracy.
Future work will focus on reducing computational complexity and the real-time control of prostheses.

Author Contributions

This research was designed, carried out and written principally by Z.D. C. Yang and C. Yi commented on and contributed mainly to the methodology section. Z.T. and Y.F. commented on and contributed mostly to the introduction. F.J. commented on and contributed mostly to the experiment and conclusion sections. All authors were involved in the finalization of the submitted manuscript. All authors read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Graupe, D.; Salahi, J.; Kohn, K.H. Multifunctional prosthesis and orthosis control via microcomputer identification of temporal pattern differences in single-site myoelectric signals. J. Biomed. Eng. 1982, 4, 17–22. [Google Scholar] [CrossRef]
  2. Park, G.; Kim, H. Low-Cost Implementation of a Named Entity Recognition System for Voice-Activated Human-Appliance Interfaces in a Smart Home. Sustainability 2018, 10, 488. [Google Scholar] [CrossRef]
  3. Rho, S.; Yeo, S.-S. Bridging the semantic gap in multimedia emotion/mood recognition for ubiquitous computing environment. J. Supercomput. 2013, 65, 274–286. [Google Scholar] [CrossRef]
  4. Electromyogram Pattern Recognition for Control of Powered Upper-Limb Prostheses: State of the Art and Challenges for Clinical Use–ProQuest. Available online: https://search.proquest.com/openview/c52c612e950984f56fb0d21d8aa23b11/1?pq-origsite=gscholar&cbl=48772 (accessed on 12 January 2018).
  5. Rho, S.; Chen, Y. Social Internet of Things: Applications, architectures and protocols. Future Gener. Comput. Syst. 2018, 82, 667–668. [Google Scholar] [CrossRef]
  6. Chen, B.W.; Ji, W. Intelligent Marketing in Smart Cities: Crowdsourced Data for Geo-Conquesting. IT Prof. 2016, 18, 18–24. [Google Scholar] [CrossRef]
  7. Cyber Physical Systems Technologies and Applications–Science Direct. Available online: https://www.sciencedirect.com/science/article/pii/S0167739X15003325 (accessed on 19 March 2018).
  8. Clustering Algorithm for Internet of Vehicles (IoV) Based on Dragonfly Optimizer (CAVDO)|Springer Link. Available online: https://link.springer.com/article/10.1007/s11227-018-2305-x (accessed on 19 March 2018).
  9. Naik, G.R.; Nguyen, H.T. Nonnegative Matrix Factorization for the Identification of EMG Finger Movements: Evaluation Using Matrix Analysis. IEEE J. Biomed. Health Inform. 2015, 19, 478–485. [Google Scholar] [CrossRef] [PubMed]
  10. Naik, G.R.; Kumar, D.K.; Palaniswami, M. Signal processing evaluation of myoelectric sensor placement in low-level gestures: Sensitivity analysis using independent component analysis. Expert Syst. 2014, 31, 91–99. [Google Scholar] [CrossRef]
  11. Naik, G.R.; Kumar, D.K.; Weghorn, H.; Palaniswami, M. Subtle Hand Gesture Identification for HCI Using Temporal Decorrelation Source Separation BSS of Surface EMG. In Proceedings of the 9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications (DICTA 2007), Glenelg, Australia, 3–5 December 2007; pp. 30–37. [Google Scholar]
  12. Naik, G.R.; Acharyya, A.; Nguyen, H.T. Classification of finger extension and flexion of EMG and Cyberglove data with modified ICA weight matrix. In Proceedings of the 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Chicago, IL, USA, 26–30 August 2014; pp. 3829–3832. [Google Scholar]
  13. Englehart, K.; Hudgins, B. A robust, real-time control scheme for multifunction myoelectric control. IEEE Trans. Biomed. Eng. 2003, 50, 848–854. [Google Scholar] [CrossRef] [PubMed]
  14. Hudgins, B.; Parker, P.; Scott, R.N. A new strategy for multifunction myoelectric control. IEEE Trans. Biomed. Eng. 1993, 40, 82–94. [Google Scholar] [CrossRef] [PubMed]
  15. Zardoshti-Kermani, M.; Wheeler, B.C.; Badie, K.; Hashemi, R.M. EMG feature evaluation for movement control of upper extremity prostheses. IEEE Trans. Rehabil. Eng. 1995, 3, 324–333. [Google Scholar] [CrossRef]
  16. Gu, Y.; Yang, D.; Huang, Q.; Yang, W.; Liu, H. Robust EMG pattern recognition in the presence of confounding factors: Features, classifiers and adaptive learning. Expert Syst. Appl. 2018, 96, 208–217. [Google Scholar] [CrossRef]
  17. Englehart, K.; Hudgin, B.; Parker, P.A. A wavelet-based continuous classification scheme for multifunction myoelectric control. IEEE Trans. Biomed. Eng. 2001, 48, 302–311. [Google Scholar] [CrossRef] [PubMed]
  18. Lucas, M.-F.; Gaufriau, A.; Pascual, S.; Doncarli, C.; Farina, D. Multi-channel surface EMG classification using support vector machines and signal-based wavelet optimization. Biomed. Signal Process. Control 2008, 3, 169–174. [Google Scholar] [CrossRef]
  19. Kakoty, N.M.; Hazarika, S.M.; Gan, J.Q. EMG Feature Set Selection through Linear Relationship for Grasp Recognition. J. Med. Biol. Eng. 2016, 36, 883–890. [Google Scholar] [CrossRef]
  20. Chowdhury, R.; Reaz, M.; Ali, M.; Bakar, A.; Chellappan, K.; Chang, T. Surface Electromyography Signal Processing and Classification Techniques. Sensors 2013, 13, 12431–12466. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  21. Shin, S.; Tafreshi, R.; Langari, R. A performance comparison of hand motion EMG classification. In Proceedings of the 2nd Middle East Conference on Biomedical Engineering, Doha, Qatar, 17–20 February 2014; pp. 353–356. [Google Scholar]
  22. Geethanjali, P.; Ray, K.K.; Shanmuganathan, P.V. Actuation of prosthetic drive using EMG signal. In Proceedings of the TENCON 2009 IEEE Region 10 Conference, Singapore, 23–26 November 2009; pp. 1–5. [Google Scholar]
  23. Zhang, Y.; Na, S.; Niu, J.; Jiang, B. The Influencing Factors, Regional Difference and Temporal Variation of Industrial Technology Innovation: Evidence with the FOA-GRNN Model. Sustainability 2018, 10, 187. [Google Scholar] [CrossRef]
  24. Chen, B.-W.; Abdullah, N.N.B.; Park, S.; Gu, Y. Efficient multiple incremental computation for Kernel Ridge Regression with Bayesian uncertainty modeling. Future Gener. Comput. Syst. 2018, 82, 679–688. [Google Scholar] [CrossRef] [Green Version]
  25. Naik, G.R.; Al-Timemy, A.H.; Nguyen, H.T. Transradial Amputee Gesture Classification Using an Optimal Number of sEMG Sensors: An Approach Using ICA Clustering. IEEE Trans. Neural Syst. Rehabil. Eng. 2016, 24, 837–846. [Google Scholar] [CrossRef] [PubMed]
  26. Liu, Y.-H.; Huang, H.-P.; Weng, C.-H. Recognition of electromyographic signals using cascaded kernel learning machine. IEEE ASME Trans. Mechatron. 2007, 12, 253–264. [Google Scholar] [CrossRef]
  27. Chen, B.W.; Chen, C.Y.; Wang, J.F. Smart Homecare Surveillance System: Behavior Identification Based on State-Transition Support Vector Machines and Sound Directivity Pattern Analysis. IEEE Trans. Syst. Man Cybern. Syst. 2013, 43, 1279–1289. [Google Scholar] [CrossRef]
  28. Liarokapis, M.V.; Artemiadis, P.K.; Katsiaris, P.T.; Kyriakopoulos, K.J.; Manolakos, E.S. Learning human reach-to-grasp strategies: Towards EMG-based control of robotic arm-hand systems. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA, 14–18 May 2012; pp. 2287–2292. [Google Scholar]
  29. Yao, D.; Yang, J.; Zhan, X. A Novel Method for Disease Prediction: Hybrid of Random Forest and Multivariate Adaptive Regression Splines. J. Comput. 2013, 8. [Google Scholar] [CrossRef]
  30. Gokgoz, E.; Subasi, A. Comparison of decision tree algorithms for EMG signal classification using DWT. Biomed. Signal Process. Control 2015, 18, 138–144. [Google Scholar] [CrossRef]
  31. Robinson, C.P.; Li, B.; Meng, Q.; Pain, M.T.G. Pattern Classification of Hand Movements Using Time Domain Features of Electromyography; ACM Press: New York, NY, USA, 2017; pp. 1–6. [Google Scholar]
  32. Al-Ani, A.; Koprinska, I.; Naik, G.R.; Khushaba, R.N. A dynamic channel selection algorithm for the classification of EEG and EMG data. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 4076–4081. [Google Scholar]
  33. Guo, Y.; Naik, G.R.; Huang, S.; Abraham, A.; Nguyen, H.T. Nonlinear multiscale Maximal Lyapunov Exponent for accurate myoelectric signal classification. Appl. Soft Comput. 2015, 36, 633–640. [Google Scholar] [CrossRef]
  34. Chen, B.W.; Wang, J.C.; Wang, J.F. A Novel Video Summarization Based on Mining the Story-Structure and Semantic Relations among Concept Entities. IEEE Trans. Multimedia 2009, 11, 295–312. [Google Scholar] [CrossRef]
  35. Atzori, M.; Cognolato, M.; Müller, H. Deep Learning with Convolutional Neural Networks Applied to Electromyography Data: A Resource for the Classification of Movements for Prosthetic Hands. Front. Neurorobot. 2016, 10. [Google Scholar] [CrossRef] [PubMed]
  36. Geng, W.; Du, Y.; Jin, W.; Wei, W.; Hu, Y.; Li, J. Gesture recognition by instantaneous surface EMG images. Sci. Rep. 2016, 6. [Google Scholar] [CrossRef] [PubMed]
  37. Du, Y.; Jin, W.; Wei, W.; Hu, Y.; Geng, W. Surface EMG-Based Inter-Session Gesture Recognition Enhanced by Deep Domain Adaptation. Sensors 2017, 17, 458. [Google Scholar] [CrossRef] [PubMed]
  38. Côté-Allard, U.; Fall, C.L.; Campeau-Lecours, A.; Gosselin, C.; Laviolette, F.; Gosselin, B. Transfer Learning for sEMG Hand Gestures Recognition Using Convolutional Neural Networks. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017. [Google Scholar]
  39. Zhai, X.; Jelfs, B.; Chan, R.H.M.; Tin, C. Self-Recalibrating Surface EMG Pattern Recognition for Neuroprosthesis Control Based on Convolutional Neural Network. Front. Neurosci. 2017, 11. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  40. Atzori, M.; Gijsberts, A.; Castellini, C.; Caputo, B.; Hager, A.-G.M.; Elsig, S.; Giatsidis, G.; Bassetto, F.; Müller, H. Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Sci. Data 2014, 1, 140053. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  41. Gijsberts, A.; Atzori, M.; Castellini, C.; Muller, H.; Caputo, B. Movement Error Rate for Evaluation of Machine Learning Methods for sEMG-Based Hand Movement Classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2014, 22, 735–744. [Google Scholar] [CrossRef] [PubMed]
  42. Smith, L.H.; Hargrove, L.J.; Lock, B.A.; Kuiken, T.A. Determining the Optimal Window Length for Pattern Recognition-Based Myoelectric Control: Balancing the Competing Effects of Classification Error and Controller Delay. IEEE Trans. Neural Syst. Rehabil. Eng. 2011, 19, 186–192. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  43. Kuzborskij, I.; Gijsberts, A.; Caputo, B. On the challenge of classifying 52 hand movements from surface electromyography. In Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), San Diego, CA, USA, 28 August–1 September 2012; pp. 4931–4937. [Google Scholar]
  44. Menon, R.; Caterina, G.D.; Lakany, H.; Petropoulakis, L.; Conway, B.A.; Soraghan, J.J. Study on Interaction Between Temporal and Spatial Information in Classification of EMG Signals for Myoelectric Prostheses. IEEE Trans. Neural Syst. Rehabil. Eng. 2017, 25, 1832–1842. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  45. Fougner, A.; Scheme, E.; Chan, A.D.; Englehart, K.; Stavdahl, Ø. A multi-modal approach for hand motion classification using surface EMG and accelerometers. In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC, Boston, MA, USA, 30 August–3 September 2011; pp. 4247–4250. [Google Scholar]
  46. Gijsberts, A.; Caputo, B. Exploiting accelerometers to improve movement classification for prosthetics. In Proceedings of the 2013 IEEE International Conference on Rehabilitation Robotics (ICORR), Seattle, WA, USA, 24–26 June 2013; pp. 1–5. [Google Scholar]
  47. Xia, P.; Hu, J.; Peng, Y. EMG-Based Estimation of Limb Movement Using Deep Learning with Recurrent Convolutional Neural Networks: Emg-Based Estimation of Limb Movement. Artif. Organs 2017. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Converting the sEMG signals to sEMG images with a sliding window. P(a, b) represents a segment of the sEMG signal from electrode b at time a; P_a represents the sEMG signals from all 12 electrodes at time a.
Figure 2. Schematic of C-B1PB2 used on the sEMG signals.
Figure 3. Schematics of the five reference experiments: (a) convolution with two parallel Block 1 (C-2B1); (b) convolution with two parallel Block 2 (C-2B2); (c) convolution with a different kernel (C-DK); (d) convolution with the small kernel (C-SK); and (e) convolution with the small kernel 2 (C-SK2).
Figure 4. Confusion Matrix.
Table 1. The parameters of HIST and mDWT.

Feature                                        Parameter
Histogram (HIST)                               10 bins along with a 3σ threshold
marginal Discrete Wavelet Transform (mDWT)     db7 wavelet, 3 levels
Table 2. The results of classification methods.

Method                            Classification Accuracy
C-B1PB2                           83.79%
CC                                77.96%
C-2B1                             82.16%
C-2B2                             82.49%
C-DK                              79.23%
C-SK                              75.82%
C-SK2                             81.30%
C-B1PB2 (all gestures of DB2)     78.86%
Table 3. Performance comparison with state-of-the-art methods on all hand gestures of DB2.

Study                  Method                 Accuracy
Atzori et al. [35]     CNN                    60.27%
Atzori et al. [35]     RF                     75.27%
Geng et al. [36]       CNN (Single Frame)     76.1%
Geng et al. [36]       CNN (20 Frames)        77.8%
Zhai et al. [39]       KNN                    70%
Zhai et al. [39]       CNN                    78.71%
Zhai et al. [39]       SVM                    77.44%
Proposed               CNN                    78.86%
