Support Vector Machine-Based EMG Signal Classification Techniques: A Review

Abstract: This paper gives an overview of the different research works related to electromyographic (EMG) signal classification based on Support Vector Machines (SVM). The article summarizes the techniques used to perform the classification in each reference. Furthermore, it includes the obtained accuracy, the number of signals or channels used, the way the authors built the feature vector, and the type of kernels used. This article also includes a compilation of the bands used to filter signals, the number of signals recommended, the most commonly used sampling frequencies, and certain features that can form the feature vector. This research gathers articles related to different kinds of SVM-based classification and other tools for signal processing in the field.


Introduction
In recent decades, biomedical signals have been used for communication in Human-Computer Interfaces (HCI) for medical applications. One example is the myoelectric signal (MES), which is generated in the muscles of the human body as a unidimensional pattern. Because of this, the methods and algorithms developed for pattern recognition in signals can be applied to its analysis once the signal has been sampled and turned into an electromyographic (EMG) signal. Additionally, in recent years, many researchers have dedicated their efforts to studying prosthetic control by means of EMG signal classification, that is, by logging a set of MES in a proper range of frequencies to classify the corresponding EMG signals.
The EMG signals are obtained from sensors placed on the skin surface and can help retrieve muscular information during contractions when flexing or extending an articulation. There are also implants placed under the skin that facilitate the signal acquisition, but these are not commonly used.
Regarding the pattern recognition problem for myoelectric control systems, its success depends mostly on the classification accuracy [1] because myoelectric control algorithms are capable of detecting movement intention; therefore, they are mainly used to actuate prostheses for amputees [2].
With the aim of carrying out the pattern recognition for myoelectric applications, a series of features is extracted from the myoelectric signal for classification purposes. The feature classification can be carried out on the time domain or by using other domains such as the frequency domain (also known as the spectral domain), time scale, and time-frequency, amongst others [3].
One of the main methods used for pattern recognition in myoelectric signals is the Support Vector Machines (SVM) technique whose primary function is to identify an n-dimensional hyperplane to separate a set of input feature points into different classes. This technique has the potential to recognize complex patterns [4] and on several occasions it has proven its worth when compared to other classifiers such as Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA) and Particle Swarm Optimization (PSO) [5][6][7][8]. The key concepts underlying the SVM are: (a) the hyperplane separator; (b) the kernel function; (c) the optimal separation hyperplane; and (d) a soft margin (hyperplane tolerance).
A compilation of the most outstanding works that combine different techniques based on SVM is presented in this paper. It also includes a list of those features most commonly used in the time, frequency, time-frequency, and spatial domains for pattern recognition. Finally, other applications of the SVM-based classifier are included in the last section.

EMG Signals
Pattern recognition-based myoelectric signal classification consists of logging a specific time interval of EMG signals coming from the muscles while many repetitions of different movements are performed, and segmenting them later. The classification is performed by extracting the features in each interval of the signal to recognize the characteristic information of each movement. From these data, the training is accomplished (its details vary according to the method) and, in this way, it is possible to classify the type of movement.
However, myoelectric signal acquisition is not a simple procedure. EMG signals have a high noise content because they are not extracted directly from the muscles; the pulse generated by the muscle must pass through the different layers of skin before reaching the electrode. In addition, the signal acquisition instruments themselves introduce noise through parasitic power-line frequencies.

Feature extraction from an EMG signal
A parameter of an EMG signal is a stable variable or a value from a mathematical or physical model ideally associated with the generation or detection of an MES process, such as length and depth of a fiber, electrode surface or distance between them, coefficients of the auto-regressive model, etc. [28]. After the signal acquisition stage, a processing stage extracts a series of parameters for the analysis of the EMG signal.
After filtering the signal and digitalizing it, since the vector components do not have any meaning individually but only as a whole, it is necessary to characterize the vector representation of the signal. As a result, it is required to extract the features from the vector that represents the signal, and, from it, implement the vector classification.
A feature of an EMG signal is a unique property, which can be observed or described qualitatively, such as being big or small, fast or slow, and sharp or smooth. An EMG variable is a physical amount that can be computed, reported and transmitted in a numeric form, and that can change as a function of time, such as voltage, frequency, velocity, and delay, amongst others. The variable is estimated during a finite time interval known as an epoch [28].
Nevertheless, when the purpose of extracting the signal features is to control a certain device, it is necessary to obtain more information from each channel of the EMG signal or to assign a control function to a specific combination from the multi-channel system, which is the particular purpose of extracting characteristics from the signals [27].
For the extraction of features, the signals can be processed in the time domain; they can also be transformed into the frequency domain, or represented in the time-frequency space or in time scale. This process consists in assembling a feature vector with different parameters of the signal. Choosing the proper parameters that will form the feature vector correctly is of vital importance since this is the starting point from which classification is made.
That is, an MES is a time function, and, thus, it can be described in terms of its amplitude, frequency, or phase. Hence, for its study, the extracted features lie in different domains, such as time or frequency, and some of their variants. A description of the most common features is given in Appendix A.

Time Domain (TD)
MESs have a very particular structure during muscle contraction, which varies according to the movement performed by the extremity. For this reason, MES classification can be used to actuate a prosthesis. By processing signals in the time domain, there is an increase in the available time for analysis since there is no need for the time-consuming task of transforming the signal to a different domain [27].
Since the signals are usually sampled in the time domain, it is more common to extract features in that domain, since they do not need to be converted and can be processed directly. These time-domain signals are studied in depth and used by researchers from the medical and engineering fields.
Time-domain features are more natural and simpler to extract since they are calculated from the sampled MES time series (the EMG signal) without any intermediate transformation [23,27]. However, time-domain EMG features also present some disadvantages, which stem from the non-stationary nature of the MES, whose statistical properties vary with time. Nonetheless, features in this domain are widely used because they exhibit very little noise during classification and require less processing time than features in the frequency domain and time scale.
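As an illustration of how directly such features follow from the sampled signal, the sketch below computes MAV, RMS, WL, ZC and SSC for one analysis window. It uses common textbook definitions of these features; the exact formulas in Appendix A may differ, and the `threshold` parameter (a noise dead-band for ZC and SSC) as well as all function names are assumptions made for this example.

```python
import math

def mav(x):
    """Mean absolute value of the window."""
    return sum(abs(v) for v in x) / len(x)

def rms(x):
    """Root mean square of the window."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def waveform_length(x):
    """Cumulative length of the waveform (sum of absolute sample differences)."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def zero_crossings(x, threshold=0.0):
    """Count sign changes between consecutive samples whose step is at least threshold."""
    return sum(
        1 for a, b in zip(x, x[1:])
        if a * b < 0 and abs(a - b) >= threshold
    )

def slope_sign_changes(x, threshold=0.0):
    """Count local extrema whose slope on either side is at least threshold."""
    count = 0
    for prev, cur, nxt in zip(x, x[1:], x[2:]):
        is_extremum = (cur - prev) * (cur - nxt) > 0
        big_enough = abs(cur - prev) >= threshold or abs(cur - nxt) >= threshold
        if is_extremum and big_enough:
            count += 1
    return count

def td_feature_vector(window, threshold=0.0):
    """Assemble one [MAV, ZC, SSC, WL] feature vector for a single channel window."""
    return [
        mav(window),
        zero_crossings(window, threshold),
        slope_sign_changes(window, threshold),
        waveform_length(window),
    ]
```

For a multi-channel recording, one such vector would typically be computed per channel and the results concatenated before classification.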

Frequency Domain (FD)
Spectral analysis, also known as representation in the frequency domain, is instrumental in studying muscle fatigue. The spectrum is influenced by the firing rate of the motor unit at frequencies below 40 Hz and by the morphology of the muscle fiber action potential at frequencies above 40 Hz [56].
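A minimal sketch of two spectral descriptors often used in fatigue studies, the mean frequency (MNF) and the median frequency, is given below. It assumes the standard power-weighted definitions and uses a naive O(n^2) DFT from the standard library purely to stay self-contained; in practice an FFT routine would be used. The function names and the sampling-rate argument `fs` are illustrative assumptions.

```python
import cmath
import math

def power_spectrum(x, fs):
    """Return (freqs, power) of the one-sided spectrum using a naive DFT."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(acc) ** 2)
    return freqs, power

def mean_frequency(freqs, power):
    """Power-weighted average frequency."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)

def median_frequency(freqs, power):
    """Frequency at which the cumulative power first reaches half of the total."""
    half_total = sum(power) / 2.0
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half_total:
            return f
    return freqs[-1]
```

A downward drift of either descriptor over successive windows is the usual spectral indicator of fatigue.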

Time-Frequency Domain (TFD)
TFD features are more sophisticated computationally than time-domain features. However, there are fast algorithms with which the characteristics can be implemented in TFD in order that real-time requirements necessary for MES classification are still met [9,11,50,57].

Spatial Domain (SD)
Spatial-domain features improve the discrimination between postures and between MES force levels, since they provide information about the spatial distribution of the motor unit action potential (MUAP) and the load sharing between muscles [58].
In theory, a classifier must be able to determine, from the input values, the class to which an input belongs. An MES is, in essence, a one-dimensional pattern, so the methods and algorithms developed for pattern recognition can be applied to its analysis. The information extracted from an MES, represented in a feature vector, is chosen to minimize the control error. The feature set should be the one that separates the desired output classes as much as possible [27].

Myoelectric Signal Classification
There are many classifiers in the literature, such as Simple Logistic Regression (SLR), Artificial Neural Networks (ANN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), K-nearest neighbor (KNN), Nonlinear Logistic Regression (NLR), Multi-Layer Perceptron (MLP), and Support Vector Machines (SVM), among others. However, in some cases, such as the ones shown below, the classification of MES with SVM has demonstrated improved performance in terms of accuracy.
Recently, Dhindsa et al. [8] evaluated the performance of several classifiers, using EMG signals to predict five levels of knee angles. They used 15 features for each of the four measured muscles, combining time and frequency features with four auto-regressive (AR) coefficients. The evaluated classifiers were LDA, NB, KNN, and SVM with different kernels, of which the SVM with a quadratic kernel achieved the best classification accuracy, 93.07 ± 3.84%. In addition, in the study, the EMG signals were segmented in five different window sizes with various overlapped window schemes, and the best result was achieved with the overlapping of 500 ms and 250 ms window sizes.
Furthermore, EMG signals can be used for finger movement recognition. Purushothaman and Vikas [6] compared SVM against LDA and NB in the classification of 15 different finger movements from 15 subjects. They utilized Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) as feature selection algorithms. MAV, ZC, SSC, and WL were the features extracted (see Appendix A). Remarkably, they achieved more than 95% effectiveness without feature selection using the SVM and 4% less with 16 features considered using PSO and ACO.
EMG signals also allow the diagnosis of muscular dystrophy disorder. Kehri and Awale [25] compared ANN with SVM in the classification of EMG signals to identify this disorder by using a wavelet-based decomposition technique. The results show the classification accuracy of an available clinical EMG database, of 140 samples, with 95% effectiveness using a polynomial kernel of fifth order in an SVM.
A low-cost mechatronics platform for the design and development of robotic hands was proposed by Geethanjali [14], by comparing the SVM with other classifiers, such as ANN, LDA, and SLR.
For the database, the subjects performed six different hand movements, and from those signals, time-domain features were extracted, namely MAV, ZC, SSC, WL, MAVS, VAR, RMS, WAMP, and fourth-order AR coefficients (see Appendix A). With these features, the authors assembled five groups to demonstrate their influence. Finally, they applied different kernel functions in the SVM classification and reached the best result, 92.8%, with a linear kernel and normalized data in the group with MAV, SSC, WL, ZC and fourth-order AR coefficients.

Support Vector Machines
SVM are often used as classification algorithms for body movements, images, sound, etc. An SVM constructs an optimal separating hyperplane in a high-dimensional feature space, into which the inputs are mapped using non-linear functions, to distinguish between two (as depicted in Figure 1) or more types of objects. This theory was introduced by Cortes and Vapnik in 1995 [59]. For the non-linearly separable problem, the input space is mapped into a high-dimensional feature space and the separating hyperplane is found in this new space. The optimal hyperplane needs to discriminate different categories correctly, so the hyperplane with the maximum margin between the classes should be found, i.e., the hyperplane that best separates the classes.
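In SVM, the training algorithm is reformulated as a Quadratic Programming (QP) problem, whose solution is global and unique. Consider input training data $(x_1, y_1), \ldots, (x_m, y_m) \in \mathbb{R}^N \times \{-1, +1\}$, where $x_i$ corresponds to the input value and $y_i$ to the class ($-1$ or $+1$) to which it belongs. If these data are not linearly separable, they are mapped by a non-linear transformation $\varphi : \mathbb{R}^N \to \mathbb{R}^M$ into a new feature space $\mathbb{R}^M$ where the transformed data will be linearly separable. In this way, the hyperplane that separates the object types can be written as

$$\omega \cdot \varphi(x) + b = 0,$$

where $\omega \in \mathbb{R}^M$ and $b \in \mathbb{R}$. The QP problem builds an optimal hyperplane with maximum separation and a bounded error $\xi = (\xi_1, \ldots, \xi_m)$ in the training algorithm, that is, we aim to

$$\min_{\omega, b, \xi} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m} \xi_i$$

subject to

$$y_i \left( \omega \cdot \varphi(x_i) + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, m,$$

where $C > 0$ is the constant that penalizes training errors. If the data points are too close, that is, if it is difficult to separate them directly, it is possible to use a kernel function $K$ to separate them. That is,

$$\max_{\alpha} \; \sum_{j=1}^{m} \alpha_j - \frac{1}{2} \sum_{j=1}^{m} \sum_{k=1}^{m} \alpha_j \alpha_k y_j y_k K(x_j, x_k)$$

subject to

$$0 \leq \alpha_j \leq C, \quad \sum_{j=1}^{m} \alpha_j y_j = 0,$$

where $\alpha_j$ are the Lagrange multipliers and $K(x_j, x_k) = \varphi(x_j) \cdot \varphi(x_k)$ is the kernel function, which can be a Radial Basis Function (RBF), a Gaussian, a polynomial, etc. A polynomial kernel can be linear, quadratic, cubic or of any degree $d$ [6], which can be described as

$$K(x_j, x_k) = (x_j \cdot x_k + c)^d,$$

with $c \geq 0$ a constant offset. The RBF of two samples, which are feature vectors, is defined by [60]:

$$K(x_j, x_k) = \exp\left(-\gamma \|x_j - x_k\|^2\right).$$

The Gaussian function is written as [6]

$$K(x_j, x_k) = \exp\left(-\frac{\|x_j - x_k\|^2}{2\sigma^2}\right),$$

where $\gamma = 1/(2\sigma^2)$ and $\sigma$ is the standard deviation.

When SVM are used to classify more than two classes, two strategies can be adopted: One Against One (OAO) and One Against All (OAA). The first one discriminates between classes pair by pair, that is, each category is compared against only one other category at a time, while the second separates each class from all the rest.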
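The kernels and the two multi-class strategies described above can be sketched in a few lines. This is only an illustration under stated assumptions: the polynomial kernel is written in the common form with an additive offset (`coef0`, a name chosen here), the Gaussian kernel is expressed through the RBF using γ = 1/(2σ²), and the helper names are hypothetical. No training is performed; the last two functions only enumerate the binary problems each strategy would require.

```python
import math
from itertools import combinations

def dot(u, v):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree=2, coef0=1.0):
    """Polynomial kernel of the given degree (1 = linear, 2 = quadratic, 3 = cubic)."""
    return (dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def gaussian_kernel(u, v, sigma):
    """Gaussian kernel, i.e. the RBF kernel with gamma = 1 / (2 * sigma^2)."""
    return rbf_kernel(u, v, gamma=1.0 / (2.0 * sigma ** 2))

def one_against_one_pairs(classes):
    """OAO: one binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))

def one_against_all_tasks(classes):
    """OAA: one binary problem per class, against all remaining classes."""
    return [(c, [o for o in classes if o != c]) for c in classes]
```

For k classes, OAO yields k(k-1)/2 binary classifiers and OAA yields k, so a six-movement problem needs 15 classifiers in OAO mode and 6 in OAA mode.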

SVM-Based Myoelectric Signal Classification
Many researchers have extensively discussed resolution methods for pattern-based classification for control applications. This review only includes works related to SVM, since this method is widely recommended by several authors; this is largely because it is very flexible and can be combined with other methods, which allows improving the classification accuracy.
For example, in [61], the shapes of Motor Unit Potentials (MUP) in a Motor Unit Potential Train (MUPT) are evaluated to determine whether they represent a single motor unit. The authors obtained 95.6% accuracy with this method.
By taking advantage of technological advances, in [32], the Myo Armband device from Thalmic Labs was placed on 26 subjects to perform a series of four hand gestures, and a linear kernel was used, obtaining an average accuracy of 94.9% with eight electrodes and 72% with four.
Similarly, in [54], a DL-3100 system was used to measure MES as a new user authentication method for mobile devices. The SVM were trained with four feature values (maximum, minimum, and the times of the maximum and minimum values), and the user can choose between five hand gestures to unlock the mobile device. In contrast, in [20], only two channels were used to classify five classes of hand movements, and a method for normalizing the signals was implemented, reaching 95% accuracy with an expert user.
In [13], two different kernels (Gaussian and RBF) were used to classify five different leg movements, through four MES channels, by combining MAV, WL, ZC, and SSC in such a way that in total 16 different vectors are introduced to the SVM. With this method (MKL-SVM), the authors obtained more than 90% accuracy. Similarly, in [12,51], an RBF kernel was used for the SVM classifier, and the error was estimated using Leave-One-Out Cross Validation (LOOCV), to separate between six different walking movements; however, the former used 31 electrodes placed on different leg and buttock muscles to form 16 bipolar signals, while the latter only used 9 electrodes. Both extracted the MAV, ZC, WL, and SSC features, but, in [12], the authors also extracted RMS, AR1, AR2, and AR3. In both studies, 95% precision was obtained.
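The LOOCV error estimate mentioned above can be sketched as follows: each sample is held out once, the classifier is trained on the remaining samples, and the held-out sample is predicted. The sketch is not the procedure of [12,51]; to stay self-contained it uses a 1-nearest-neighbor rule as a stand-in for the SVM, and `loocv_accuracy`, `fit_predict` and `nearest_neighbor` are hypothetical names.

```python
def loocv_accuracy(features, labels, fit_predict):
    """Leave-one-out accuracy.

    fit_predict(train_X, train_y, test_x) must train on the given data
    and return the predicted label for test_x.
    """
    correct = 0
    for i in range(len(features)):
        # Hold out sample i and train on all the others.
        train_X = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_X, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)

def nearest_neighbor(train_X, train_y, x):
    """Stand-in classifier (1-NN); an SVM would be plugged in here instead."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    best = min(range(len(train_X)), key=lambda j: sq_dist(train_X[j], x))
    return train_y[best]
```

With m samples the classifier is trained m times, which is affordable for the small data sets typical of EMG studies but grows costly for large ones.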
In the time domain, Alkan and Günay [49] combined discriminant analysis with SVM to distinguish between four arm movements. They built the feature vector by extracting MAV and AR from windows of 32 samples at a sampling frequency of 1 kHz, and obtained an average accuracy of 99%.
Additionally, Liu [22] used AR6, MAV, ZC, WL and SSC to classify six different movements from signals recorded on the forearm, by means of an incremental adaptive learning algorithm for SVM, which incorporated useful information from the test data into a self-correction mechanism to suppress erroneous classifications, with 96.6% average accuracy.
In addition, some studies use MES to support people with disabilities. For example, Ishii et al. [45] studied the navigation of an Electric WheelChair (EWC). Four MES channels, placed on the cheek, neck, and shoulder, control the seven EWC movements; the iMES of each channel was calculated to form the feature vector. The average classification accuracy was 89.7%.
In the same manner, Rossi et al. [53] worked directly with time-domain signals and therefore did not use any feature extraction method. They combined the information about signal history provided by Hidden Markov Models (HMM) with the advantages of time-independent SVM classification, forming an HMM-SVM classification algorithm with 91.8% accuracy to distinguish six different arm movements. Wang et al. [24] implemented visual feedback from a virtual prosthetic hand system to improve classification accuracy and achieved a mean of 98.79%. The authors distinguished between eight movements, with three pairs of sensors, using the SVM classifier and compared the results obtained with RMS, MAV, VAR, WL, WAMP, IAV and SSC as different feature vectors.
Sometimes, authors combine features in the time domain with those in the frequency domain. Sasaki et al. [47] developed and tested a tongue interface to detect six motions, including saliva swallowing, from the surface of the suprahyoid muscles at the underside of the jaw. They combined RMS from the time domain with CC features from the frequency domain and achieved 95.1 ± 1.9% classification accuracy. In addition, Cai et al. [36] performed a classification of eight facial expressions. They used 74 features for each expression using only six channels, and achieved an overall accuracy of 99.6% with a cubic kernel in the SVM classifier. Among the features used, they included the mean value and the RMS value of the mean values of all channels.
Nevertheless, other authors prefer to work in the time-frequency domain. Lucas et al. [11] used a feature-vector representation space based on the Discrete Wavelet Transform (DWT), with an unconstrained parametrization of the mother wavelet. With this method, they obtained an average classification error of 4.7 ± 3.7% for six hand movements recorded through eight electrodes. Using the same method, Lin et al. [43] utilized a pair of features after applying DWT to their raw MES data, from eight subjects, to identify the movement intention of the patients. They found the best choice to be the fifth level of the DWT decomposition combined with the MAV and Max features, which achieved 100% accuracy with the SVM classifier.
In addition, Too et al. [21] classified 17 hand and wrist movements from MES acquired from the NinaPro database. The feature vector was composed of the RMS extracted from the DWT and the average energy of the spectrogram at each frequency bin, after applying Principal Component Analysis (PCA) and retaining the first three components. By applying SVM, the highest classification accuracy was 95% and 71.3% for normally-limbed and amputee subjects, respectively. Moreover, Ahlawat et al. [17] used PCA for dimensionality reduction with a quadratic kernel in the SVM classifier, where kurtosis, skewness, SSC, MAV and AR1 in TD were the features extracted. The overall mean classification accuracy was 99.04% for the two activities performed.
At the other extreme, Omari and Liu [39] proposed an algorithm called GAPSO-SVM, which combines a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) with SVM. The algorithm selected optimal parameters for the RBF kernel and an optimal decomposition level, also making use of a mother wavelet function, to classify the energy of the wavelet coefficients obtained from forearm signals. In addition, they implemented PCA, and obtained 98.7% accuracy in the classification. As in the previous study, Sui et al. [46] used PSO, but they combined it with an improved SVM (by removing slack variables, as the bias value from the decision function), calling their method PSO-ISVM, which effectively identified six types of upper limb movements with an average recognition rate of 90.66%. For the feature vector, the Wavelet Packet Transform (WPT) was used to extract the variance and energy of the wavelet packet coefficients, and the RBF was used as the kernel.
In the same manner, Xing et al. [50] used the energy node of WPT coefficients as MES signal characteristic. They also used the Non-parametric Weighted Feature Extraction (NWFE) to reduce the dimensionality of features vector that enters the SVM, from which they obtained 98.39% precision in OAA mode, and 98.214% in OAO mode. After the analysis and comparison between different mother wavelet functions in discrete and continuous time, Too et al. [37] achieved 98.74% using MAV feature in Symlet 4 function in level 2, 98.49% by applying WL feature in Coif 3 function in level 4, both from DWT. For CWT, 98.56% was achieved with MAV by employing Sym 6, scale 16 and 98.64% with WL by using Mexh, scale 32. The authors classified 10 different hand movements with four channels.
Another approach to the dimensionality problem is that of Erkilinc and Sahin [48], who applied the FFT to four MES signals for camera control (whose movements are up, down, right, left and neutral) and then performed a component reduction by Sparse Principal Component Analysis (SPCA) before feeding the values to the SVM. The authors also implemented a technique that is not widely used, namely dividing the data into Kaiser windows, and obtained 81% accuracy. Similarly, Goen and Tiwari [5] employed SPCA with the lasso to produce modified principal components with sparse loadings. In addition, they used an SVM ensemble to classify seven different arm movements, combined with a window length of 256 ms and a 50% overlap. The classification accuracy obtained by the authors reached 98%.
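The segmentation into fixed-length windows with 50% overlap, as reported for [5], can be illustrated with the short sketch below. The sampling frequency is not stated in this passage, so the 1 kHz used in the example is an assumption, and `sliding_windows` is a hypothetical helper rather than the authors' implementation.

```python
def sliding_windows(signal, fs_hz, window_ms, overlap_fraction):
    """Split a sampled signal into overlapping analysis windows.

    fs_hz            sampling frequency in Hz
    window_ms        window length in milliseconds
    overlap_fraction fraction of each window shared with the next (0 <= f < 1)
    """
    window_len = int(round(window_ms * fs_hz / 1000.0))
    step = int(round(window_len * (1.0 - overlap_fraction)))
    if window_len <= 0 or step <= 0:
        raise ValueError("window length and step must be positive")
    # Only complete windows are kept; a trailing partial window is discarded.
    return [
        signal[start:start + window_len]
        for start in range(0, len(signal) - window_len + 1, step)
    ]

# Example: 256 ms windows with 50% overlap, assuming a 1 kHz sampling rate.
samples = list(range(1024))
windows = sliding_windows(samples, fs_hz=1000, window_ms=256, overlap_fraction=0.5)
```

Each resulting window would then be passed to the feature extraction stage, so a larger overlap produces more decisions per second at the cost of more computation.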
Unlike other authors, Kouchaki et al. [62] simulated, in the laboratory of the University of Waterloo, MES signals by incorporating statistical and morphological properties. They utilized SVM to discriminate between different neuromuscular diseases (neuropathy and myopathy), and, after simulating signals, decomposed them by using an Empirical Mode Decomposition (EMD) as well as the Kolmogorov complexity and other informative features to reveal the number of irregularities within each subspace formed by EMD. The accuracy obtained was 91.11%.
Combining time and frequency domain features, Yoshikawa et al. [35] used ZC in the time domain and mean MES, CC, and DCC (Delta Cepstrum Coefficients) in the frequency domain. The last one is defined as a dynamic feature, and it is the difference between two CC. The features were used to classify seven hand movements, and they obtained a precision of 91-94.4%, depending on the test subject. This classification was used for digital robotic arm control, with 62.5 Hz delay. In addition, Doulah et al. [33] presented a method for automatic detection of posture transition and used it for a knee-ankle-foot orthosis. Furthermore, they used PCA for dimensionality reduction of the eleven features extracted from ten subjects with 14 sensors (MAV, SSC, STD, entropy, coefficient of variation, maximum, minimum, median, maximum to RMS ratio, RMS to mean ratio, and fractal dimension). The obtained precision was 92.94% for the detection of the sit-to-stand posture transition.
In the same way, Bian et al. [15] combined the IEMG, STD and RMS features in the time domain with MPF and MNF in the frequency domain. Besides using a linear kernel, the dimensionality reduction was done by PCA. Starting from seven vectors, the obtained results were higher than 92.25% for classifying eight movements, whose signals were recorded using the eight channels of the Myo armband device. Furthermore, Roldan-Vasco et al. [7] combined five time-domain features (LOG, DASDV, VAR, ZC and MYOP) with two frequency-domain features (MNF and FR) to record the activity of 47 healthy subjects when swallowing water, yogurt and saliva, using an SVM with an RBF kernel for classification, and obtained 92.03% accuracy.
Moreover, without feature extraction tools, Luo et al. [19] extracted the synergistic patterns of myoelectric activity by non-negative matrix factorization (NMF), with five healthy subjects, to classify five different movements (hand open and close, key pinch, palm valgus and grasp cylindrical tool). They applied two different filters, one analog and one digital, to the signals recorded from six muscles. With their method, using the muscle synergy patterns as a feature vector matrix achieved a mean classification rate of 96.08%.
Table 5 lists the classification accuracy reported by each of the authors mentioned above who worked with time-domain features. The number of classes and the number of channels are in the second and third columns, respectively. The features extracted from the EMG signals are in the fourth column and, in the last column, the classification accuracy obtained is presented. Table 6 summarizes those who worked in the time-frequency domain, while Table 7 presents the authors who combined features in different domains.
Table 5. EMG signals classification methods with time-domain features.

Other Applications of SVM-Based Classifiers
To identify abnormal changes in Mental Workload (MWL) and thus prevent accidents due to work overload, Yin and Zhang [63] classified overload levels (low, medium and high) using EEG-PSD to form the feature vector. To reduce its dimensionality, they used the Locally Linear Embedding (LLE) technique, and subsequently classified with combined techniques of Support Vector Clustering (SVC) and Support Vector Data Description (SVDD), obtaining 79.54% accuracy. In the same year, Yin and Zhang [64] used LS-SVM to differentiate between the states of high mental load and fatigue, by combining EEG, ECG, and Electrooculogram (EOG) signals, with feature reduction by Recursive Feature Elimination (RFE) and keeping the RBF kernel. This procedure improved the accuracy to 92.67% compared with their previous study.
The classification of other types of signals has also been utilized as a guide for some doctors and therapists for the detection of different diseases, or the diagnosis of disorders in the motor system. In [65], an adaptive system for SVM, called ASVM, is proposed to diagnose diseases through the blood, using data on diabetes and breast cancer. In this method, the bias value of the SVM is adjusted by a feedback mechanism, which allows the classification to be done more quickly and with higher precision than the alternatives against which it was evaluated, obtaining 67.22-97.39% accuracy.
Similarly, features in the time and frequency domains were utilized in [66] to detect muscular fatigue of the lower extremities in order to prevent falls and injuries; with the aid of six cameras, the authors differentiated between the fatigued and non-fatigued states using SVM with linear and RBF kernels, obtaining 96% accuracy with both kernels. In the same manner, in [67], six cameras with a 200 Hz sample rate were used to differentiate between assisted walking of patients with arm support and unassisted walking. The authors also used time-domain features, combining the Non-dominated Sorting Genetic Algorithm II (NSGAII) and Genetic Algorithms (GA) with SVM to choose from among 30 gait parameters and conduct the classification, obtaining 99.31% precision.
Other authors combined different signal types, such as EEG, ECG, EOG, EMG and videotaping, for sleep evaluation, i.e., distinguishing between the waking and sleeping states of different people. The features used by Park et al. [68] were Proportional Integration Mode (PIM), Zero Crossings Mode (ZCM) and FFT, to which RBF is applied and then classified with the SVM, obtaining a precision of 88.94%.
Some authors have also reduced the number of features to improve the classification accuracy and to increase the processing speed by reducing the dimensionality of the feature vector; for instance, in [69], a random forest classification approach to the diagnosis of lymphatic diseases is proposed. In the first stage, the authors performed a feature reduction using different methods, obtaining as the best result 92.2% accuracy in the distinction of four states of the patient, including normal, malignant lymph or fibrosis, with the reduction from 18 to 6 features using genetic algorithms.
Khazaee and Ebrahimzadeh [70] used ECG signals to differentiate between five kinds of arrhythmias. They used a database offered by MIT-BIH. The procedure was performed in three different stages. The first one consisted in feature extraction by Non-Parametric Power Spectral Density (NPPSD). In the second, the classification using SVM with Gaussian Radial Basis Function was performed. Finally, they accomplished the SVM parameter optimization by a GA. The classification accuracy obtained was 96%.
Raj and Ray [42] differentiated arrhythmias using PCA to reduce features in time-frequency space. These features were obtained from ECG by means of the Discrete Orthonormal Stockwell Transform (DOST) concatenated with morphological features. They also employed the PSO technique to adjust the SVM parameters with an RBF kernel. These combined methods, PSO and SVM, reached 98.82% classification accuracy to differentiate among 16 types of arrhythmic events that are produced more frequently in the heart.
Dobrowolski et al. [71] used the SVM for neuromuscular disorder diagnosis based on the analysis of scalograms formed from MES extracted from the deltoid muscle. Then, the SVM analysis was implemented to subsequently reduce to a single decision parameter. The error probability of this method was 0.5%.
Another medical application where an SVM is used for classification is the differentiation of the four main classes into which proteins are divided. In [72], a method is proposed to discriminate between both classes and protein structures by the incorporation of pseudo average chemical shift along with an SVM. This method was used on four different databases, obtaining 84.2%, 85%, 86.4%, and 89.2%, respectively, in classification accuracy.
In the case of hyperspectral image classification, in [73], a guided filter is incorporated into the SVM classifier. This originates from the fusion of spectral and spatial features with the help of the PCA method. The authors classified more than nine classes with the spatial features of the SVM and achieved an average 98.92% classification accuracy.
Digital mammograms have also been classified, in search of microcalcifications for disease prevention. For instance, El-Naqa et al. [74] used a database of 76 digital mammograms, and the accuracy obtained with a polynomial kernel in the SVM was 94%.
The authors of [39,60] used GA combined with SVM to classify images obtained by fusing multifrequency RADARSAT-2 synthetic aperture radar and Thaichote multispectral images. The results provided high classification accuracy at over 95%.
The Content-Based Image Retrieval (CBIR) technique, which is a developing trend in digital image processing, aims at recovering a queried image from a large database. In this field, Sugamya et al. [75], in the search for an image, extracted color, form and texture from an image, and later they used SVM for the classification, obtaining 76.6% accuracy with the help of the standardized Euclidean metric.
Moreover, in addition to voltage signals, images can also be extracted from the brain. Alam et al. [76] differentiated between healthy individuals and those who suffer from Alzheimer's disease. They used structural Magnetic Resonance Imaging (sMRI) data, from which features were extracted with the aid of Voxel-Based Morphometry (VBM) and then reduced with PCA. The resulting classification accuracy was 84.17%.
Sharma and Srivastava [77] used SVM to classify character strings, i.e., text classification. The fact that such character strings did not have the proper format to be classified was solved with the help of the Stemmers-Stemming algorithm. For classification, they used an RBF kernel with LibSVM to distinguish between phrases related to shopping and food, obtaining a correct classification rate of 64.86%.
Other signals frequently used are those of digital modulation. For example, Zhou [78] used second-, fourth- and sixth-order cumulants as signal features, and also employed an RBF kernel combined with a cross-validated grid search for parameter selection to improve the SVM-based classification accuracy, reaching 92.2%.

Conclusions
Pattern classification is used in certain areas of knowledge. Depending on the needs and characteristics of each system, a specific tool may be selected. SVMs offer high classification accuracy, and they can be combined with other pattern classification methods to pursue further objectives of the classification task besides a high accuracy percentage. In other words, they allow the incorporation of tools that either transform the input data fed to the SVM or solve the SVM itself.
The mentioned authors combined methods to improve the classification accuracy in different applications, although the main purpose of this article is to compile those studies that used myoelectric signals as the input vector. In addition, other applications of SVM-based classification are listed to give the reader a broader idea of the different fields of study in which the SVM can be applied. Four points can be concluded:
• The most common kernel used was RBF, followed by linear and Gaussian.
• PCA is the most common tool for dimensionality reduction.
• Most of the published papers seek an advantage in the feature extraction area, but only certain researchers reported an algorithm combined directly with the SVM. An example of this is the combination with GA [39,67,70,79]. Therefore, this is an area of opportunity, as there are several algorithms in artificial intelligence that can be tested and combined with SVM.
• Another little-studied approach is, instead of using feature reduction algorithms such as PCA or SPCA, to decrease the number of input vectors when many features have been extracted; in such a case, feature extraction algorithms can be used. Currently, these algorithms are mainly used in data mining.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Average Amplitude Value (AAV)
AAV is the average amplitude of an EMG signal [80]. The principal problem with this feature is that the amplitude of the signal also varies with sweat or any other factor that changes skin conductance [81]. For a segment of N samples, it is given by
$$\mathrm{AAV} = \frac{1}{N}\sum_{k=1}^{N} x_k,$$
where $x_k$ is the k-th sample.

Mean Absolute Value (MAV)
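MAV is one of the most used statistics in signal analysis [27], and it can be described as
$$\mathrm{MAV} = \frac{1}{N}\sum_{k=1}^{N} |x_k|.$$

Modified Mean Absolute Value type 1 (MMAV1)
This is a MAV generalization [82] in which a weighted window function $w_k$ is used in the MAV equation to make it more robust; it is computed as
$$\mathrm{MMAV1} = \frac{1}{N}\sum_{k=1}^{N} w_k |x_k|,$$
where
$$w_k = \begin{cases} 1, & 0.25N \le k \le 0.75N, \\ 0, & \text{otherwise}. \end{cases}$$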

Modified Mean Absolute Value type 2 (MMAV2)
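This is another MAV generalization, such as MMAV1 [82], where the weight function $w_k$ is continuous, which improves the smoothness of the weighted function. It is given by
$$\mathrm{MMAV2} = \frac{1}{N}\sum_{k=1}^{N} w_k |x_k|,$$
where
$$w_k = \begin{cases} 1, & 0.25N \le k \le 0.75N, \\ 4k/N, & k < 0.25N, \\ 4(N-k)/N, & \text{otherwise}. \end{cases}$$

Simple Square Integral (SSI)
This feature corresponds to the signal energy [38]; that is, SSI is the sum of the squared values of the EMG signal amplitude, which can be expressed as
$$\mathrm{SSI} = \sum_{k=1}^{N} x_k^2.$$

Variance (VAR)
This statistic is another power index [52], defined as the mean of the squared deviations of the variable from its mean; it can be expressed as
$$\mathrm{VAR} = \frac{1}{N-1}\sum_{k=1}^{N} (x_k - \bar{x})^2,$$
where $\bar{x}$ is the mean of the segment.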
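As a minimal sketch of how the amplitude features in this part of the appendix could be computed, the following Python functions evaluate MAV, a generic weighted MAV, SSI and VAR on a list of samples. The function names, the toy segment and the trapezoidal weight assumed for MMAV2 (equal to 1 on the central half of the segment, with linear ramps at both ends) are illustrative assumptions, not the implementation of any cited work.

```python
def mav(x):
    """Mean absolute value of a segment."""
    return sum(abs(v) for v in x) / len(x)


def weighted_mav(x, weight):
    """MAV with a weight w_k applied to sample k (k = 1, ..., N)."""
    n = len(x)
    return sum(weight(k, n) * abs(v) for k, v in enumerate(x, start=1)) / n


def mmav2_weight(k, n):
    """Continuous trapezoidal weight assumed here for MMAV2."""
    if k < 0.25 * n:
        return 4.0 * k / n
    if k > 0.75 * n:
        return 4.0 * (n - k) / n
    return 1.0


def ssi(x):
    """Simple square integral: the energy of the segment."""
    return sum(v * v for v in x)


def var(x):
    """Variance around the segment mean (divides by N - 1)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / (len(x) - 1)


# Toy segment for illustration only; these are not real EMG samples.
segment = [0.5, -1.0, 2.0, -2.0, 1.0, -0.5, 0.0, 1.0]
features = {
    "MAV": mav(segment),
    "MMAV2": weighted_mav(segment, mmav2_weight),
    "SSI": ssi(segment),
    "VAR": var(segment),
}
print(features)
```

Passing the weight as a function keeps the sketch open to other window shapes, so a step-shaped weight could be supplied in the same way.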

Zero Crossings (ZC)
A simple measure of frequency can be obtained by counting the number of times that the waveform crosses zero [27]. To reduce the noise effects, a lower bound for the absolute difference can be included as a threshold L. Given two consecutive samples $x_k$ and $x_{k+1}$, ZC is the count of the zero crossings, that is,
$$\mathrm{ZC} = \sum_{k=1}^{N-1} f(x_k, x_{k+1}),$$
where
$$f(x_k, x_{k+1}) = \begin{cases} 1, & x_k x_{k+1} < 0 \ \text{and} \ |x_k - x_{k+1}| \ge L, \\ 0, & \text{otherwise}. \end{cases}$$
This algorithm will fail to log a zero crossing if the absolute difference between two consecutive samples of opposite sign is lower than the selected threshold.
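The effect of the threshold L on the zero-crossing count can be illustrated with a short Python sketch. The function name and the toy samples are assumptions made for illustration only.

```python
def zero_crossings(x, threshold=0.0):
    """Count sign changes between consecutive samples whose absolute
    difference is at least `threshold` (the lower bound L)."""
    count = 0
    for a, b in zip(x, x[1:]):
        if a * b < 0 and abs(a - b) >= threshold:
            count += 1
    return count


# Toy samples: the third crossing (0.02 -> -0.01) is a small, noise-like one.
signal = [0.4, -0.3, 0.02, -0.01, 0.5, -0.6]
print(zero_crossings(signal))       # all sign changes are counted
print(zero_crossings(signal, 0.1))  # crossings smaller than L are skipped
```

With a non-zero threshold, the small crossing is discarded, which is the failure mode described in the text above when L is chosen too large.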

Slope Sign Changes (SSC)
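This feature provides another description of the frequency content; it is the number of times that a slope sign change occurs [27]. Again, an adequate threshold can be chosen to reduce the induced noise. Given three consecutive samples $x_{k-1}$, $x_k$ and $x_{k+1}$, the SSC count is increased if
$$\{x_k > x_{k-1} \ \text{and} \ x_k > x_{k+1}\} \ \text{or} \ \{x_k < x_{k-1} \ \text{and} \ x_k < x_{k+1}\},$$
where, additionally,
$$|x_k - x_{k+1}| \ge L \ \text{or} \ |x_k - x_{k-1}| \ge L.$$

Waveform Length (WL)
WL provides information about the complexity of the waveform in each segment [27]; it is simply the cumulative length of the waveform over the time segment, defined as
$$\mathrm{WL} = \sum_{k=1}^{N-1} |x_{k+1} - x_k|.$$

Average Amplitude Change (AAC)
AAC is the mean value of the absolute difference between two consecutive samples [80], that is,
$$\mathrm{AAC} = \frac{1}{N}\sum_{k=1}^{N-1} |x_{k+1} - x_k|.$$

Willison Amplitude (WAMP)
WAMP is a measure of the frequency information of the EMG signal. It is the number of times that the absolute difference between two contiguous samples exceeds a preset lower bound L. It is related to the firing of motor unit action potentials and to the muscle contraction force. It is defined as
$$\mathrm{WAMP} = \sum_{k=1}^{N-1} f(|x_k - x_{k+1}|),$$
where
$$f(x) = \begin{cases} 1, & x \ge L, \\ 0, & \text{otherwise}. \end{cases}$$

Root Mean Square (RMS)
RMS is modeled as a Gaussian random process with modulated amplitude, which is related to a constant force and contraction without fatigue; it is very similar to the calculation of the standard deviation since it is given by
$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{k=1}^{N} x_k^2}.$$

Integrated EMG (IEMG)
Some features, such as the IEMG, are also used in EMG recognition that is not based on patterns. It is used as an onset (initial) detection index and is related to the firing point of the EMG signal sequence. It is defined as the sum of the absolute values of the EMG samples, and it can be expressed as [83]:
$$\mathrm{IEMG} = \sum_{k=1}^{N} |x_k|.$$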
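The time-domain features described above (SSC, WL, AAC, WAMP, RMS and IEMG) can be sketched in a few lines of Python. The function names, the toy segment and the choice of dividing AAC by N are assumptions made for illustration; the SSC test follows the local-extremum form with a threshold on either neighbouring difference.

```python
import math


def ssc(x, threshold=0.0):
    """Slope sign changes: local extrema whose step to a neighbour is >= threshold."""
    count = 0
    for prev, cur, nxt in zip(x, x[1:], x[2:]):
        is_extremum = (cur > prev and cur > nxt) or (cur < prev and cur < nxt)
        big_enough = abs(cur - nxt) >= threshold or abs(cur - prev) >= threshold
        if is_extremum and big_enough:
            count += 1
    return count


def wl(x):
    """Waveform length: cumulative absolute difference of consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def aac(x):
    """Average amplitude change (waveform length divided by N)."""
    return wl(x) / len(x)


def wamp(x, threshold):
    """Willison amplitude: number of consecutive differences >= threshold."""
    return sum(1 for a, b in zip(x, x[1:]) if abs(a - b) >= threshold)


def rms(x):
    """Root mean square of the segment."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def iemg(x):
    """Integrated EMG: sum of absolute sample values."""
    return sum(abs(v) for v in x)


# Toy segment for illustration only; these are not real EMG samples.
segment = [0.0, 1.0, -1.0, 2.0, 2.0, -2.0]
print(ssc(segment), wl(segment), aac(segment))
print(wamp(segment, 2.5), rms(segment), iemg(segment))
```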

Auto-Regressive Coefficients (AR)
The AR coefficients form a prediction model that describes each EMG signal sample $x_k$ as a linear combination of the P previous samples $x_{k-1}, \dots, x_{k-P}$ plus a white noise error term $w_k$; this model is expressed as
$$x_k = \sum_{p=1}^{P} a_p x_{k-p} + w_k,$$
where $a_p$ are the AR coefficients and P is the order of the AR model. The most common P used is 6 [10,22,23,28]; nevertheless, some authors [12] used orders 1, 2 and 3, while Villarejo Mayor et al. [26] used P = 5.
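One common way to estimate AR coefficients, which the reviewed works do not necessarily share, is to solve the Yule-Walker equations with the Levinson-Durbin recursion. The sketch below assumes the sign convention in which each sample is predicted as a weighted sum of the previous P samples; the function names and the synthetic AR(2) test signal are illustrative assumptions.

```python
import random


def autocorrelation(x, lag):
    """Biased autocorrelation estimate at the given lag."""
    return sum(x[k] * x[k - lag] for k in range(lag, len(x))) / len(x)


def ar_coefficients(x, order):
    """Yule-Walker estimate of a_1..a_P in x_k = sum_p a_p * x_{k-p} + w_k,
    solved with the Levinson-Durbin recursion."""
    r = [autocorrelation(x, lag) for lag in range(order + 1)]
    a = []      # coefficients of the model of the current order
    err = r[0]  # prediction error power
    for m in range(1, order + 1):
        acc = r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / err  # reflection coefficient
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a


# Synthetic AR(2) process (not EMG data): x_k = 0.6 x_{k-1} - 0.3 x_{k-2} + w_k
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(5000):
    x.append(0.6 * x[-1] - 0.3 * x[-2] + rng.gauss(0.0, 1.0))

estimated = ar_coefficients(x, order=2)
print(estimated)  # should lie close to the generating values 0.6 and -0.3
```

For feature extraction, the P estimated coefficients of each segment would be appended to the feature vector.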

Standard Deviation (STD)
STD is defined as the square root of the variance, that is,
$$\mathrm{STD} = \sqrt{\frac{1}{N-1}\sum_{k=1}^{N} (x_k - \bar{x})^2},$$
where $\bar{x}$ is the AAV.

Difference of Absolute Standard Deviation (DASDV)
DASDV is similar to RMS since it is the standard deviation of the wavelength [84]; it is defined by
$$\mathrm{DASDV} = \sqrt{\frac{1}{N-1}\sum_{k=1}^{N-1} (x_{k+1} - x_k)^2}. \tag{A16}$$

Log-Detector (LOG)
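This feature provides an estimate of the strength of muscle contraction [85]. However, its definition is changed so that the non-linear detector is based on the logarithm [82]. LOG can be calculated as:
$$\mathrm{LOG} = \exp\left(\frac{1}{N}\sum_{k=1}^{N} \log |x_k|\right).$$

Myopulse Percentage Rate (MYOP)
MYOP is the average value of the myopulse output, which is defined as 1 when the absolute value of the EMG signal exceeds a determined lower bound L [81]. Mathematically, it is calculated by:
$$\mathrm{MYOP} = \frac{1}{N}\sum_{k=1}^{N} f(x_k),$$
where
$$f(x_k) = \begin{cases} 1, & |x_k| \ge L, \\ 0, & \text{otherwise}. \end{cases}$$

Absolute Value of the Third, Fourth and Fifth Temporal Moments (TM3, TM4 and TM5)
The temporal moment is a statistical analysis proposed by Saridis and Gootee [86]. The first and the second temporal moments are the absolute values of AAV and VAR, and the next three are defined as
$$\mathrm{TM3} = \left|\frac{1}{N}\sum_{k=1}^{N} x_k^3\right|, \quad \mathrm{TM4} = \left|\frac{1}{N}\sum_{k=1}^{N} x_k^4\right|, \quad \mathrm{TM5} = \left|\frac{1}{N}\sum_{k=1}^{N} x_k^5\right|.$$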
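A minimal Python sketch of the log-detector, the myopulse percentage rate and the temporal moments is given below. The function names and the toy segment are assumptions, and the small constant `eps` is an added safeguard against taking the logarithm of a zero sample; it is not part of the definitions in the cited works.

```python
import math


def log_detector(x, eps=1e-12):
    """Exponential of the mean log-magnitude; eps (an assumption) guards log(0)."""
    return math.exp(sum(math.log(abs(v) + eps) for v in x) / len(x))


def myop(x, threshold):
    """Fraction of samples whose absolute value reaches the lower bound L."""
    return sum(1 for v in x if abs(v) >= threshold) / len(x)


def temporal_moment(x, order):
    """Absolute value of the temporal moment of the given order."""
    return abs(sum(v ** order for v in x) / len(x))


# Toy segment for illustration only; these are not real EMG samples.
segment = [1.0, -2.0, 4.0, -0.5]
print(log_detector(segment))
print(myop(segment, 1.0))
print([temporal_moment(segment, p) for p in (3, 4, 5)])
```

In this sketch the log-detector is simply the geometric mean of the absolute sample values, which makes it less sensitive to isolated large peaks than MAV.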

Histogram (HIST)
This feature divides the elements of the EMG signal into M segments, which have to be equally spaced. HIST then returns the number of signal elements in each segment [82]. A suggested number for M is 9.

V-Order (V)
V is a non-linear detector that implicitly estimates the strength of a muscular contraction. It is defined from a functional mathematical model of EMG signal generation, given by the expression
$$x_i = \gamma\, m_i^{\alpha}\, n_i,$$
where $x_i$ is the EMG signal, $m_i$ is the muscle contraction force, γ and α are constants and $n_i$ is a class of the ergodic Gaussian processes; α is theoretically 0.5, but a value between 1 and 1.75 was obtained experimentally [87]. An optimal value for the order v has been reported as 2 [85], which leads to the same definition as RMS. The mathematical definition of V is
$$V = \left(\frac{1}{N}\sum_{k=1}^{N} |x_k|^{v}\right)^{1/v}.$$

Mean Absolute Value Slope (MAVSLP)
MAVSLP is derived from MAV; it computes the differences between the MAVs of adjacent segments [88]. Its equation is given by
$$\mathrm{MAVSLP}_i = \mathrm{MAV}_{i+1} - \mathrm{MAV}_i,$$
for i = 1, . . . , I − 1, where I is the number of segments covering the EMG signal. However, with a higher number of segments, its definition approaches that of a traditional MAV feature. In the study of Miller [88], I is set to 3.
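The three features above can be sketched in Python as follows. The function names are illustrative, and two choices are assumptions of this sketch: the histogram bins span the range between the minimum and maximum sample, and MAVSLP uses non-overlapping segments of equal length.

```python
def hist(x, bins=9):
    """Counts of samples in `bins` equally spaced amplitude intervals
    spanning [min(x), max(x)] (the range is an assumption of this sketch)."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant signals
    counts = [0] * bins
    for v in x:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def v_order(x, order=2):
    """V-order detector; with order 2 it coincides with the RMS."""
    return (sum(abs(v) ** order for v in x) / len(x)) ** (1.0 / order)


def mavslp(x, segments=3):
    """Differences between the MAVs of adjacent, equal-length segments."""
    size = len(x) // segments
    mavs = [
        sum(abs(v) for v in x[i * size:(i + 1) * size]) / size
        for i in range(segments)
    ]
    return [b - a for a, b in zip(mavs, mavs[1:])]


# Toy data for illustration only; these are not real EMG samples.
ramp = [float(v) for v in range(9)]
burst = [1.0, -1.0, 2.0, -2.0, 4.0, -4.0]
print(hist(ramp))
print(v_order([3.0, -4.0]))
print(mavslp(burst))
```

With I segments, `mavslp` returns I − 1 slopes, one for each pair of adjacent segments.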

Multiple Hamming Windows (MHW)
MHW segments the raw EMG signal by a series of Hamming windows [89]. The MHW features are computed using the energy of each window, which can be expressed as
$$\mathrm{MHW}_i = \sum_{k=1}^{N} \left(x_k\, w_{k-k_i}\right)^2,$$
for i = 1, . . . , I, where w is the Hamming windowing function and $k_i$ denotes the offset of the i-th window. Similar to MAVSLP, it is recommended to have a small I; for example, in [90], it is set to 3 and the authors suggested using a 30% overlap.

Multiple Trapezoidal Windows (MTW)
MTW is similar to MHW since it also uses the energy contained inside a window but, instead of Hamming windows, it uses trapezoidal windows and is described by
$$\mathrm{MTW}_i = \sum_{k=1}^{N} \left(x_k\, w_{k-k_i}\right)^2,$$
for i = 1, . . . , I, where w is now the trapezoidal windowing function. As in the case of MHW, I is set to 3 with the same overlapped windows [90].
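The windowed-energy features MHW and MTW differ only in the window shape, so a single routine can serve both. The following Python sketch assumes three windows with 30% overlap that together span the segment, and a trapezoid whose ramps cover the first and last quarter of each window; these layout details and all function names are assumptions made for illustration.

```python
import math


def hamming(length):
    """Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def trapezoid(length, ramp=0.25):
    """Trapezoidal window with linear ramps over the first and last
    `ramp` fraction of the window (the ramp size is an assumption)."""
    r = max(1, int(ramp * length))
    return [min(1.0, (n + 1) / r, (length - n) / r) for n in range(length)]


def windowed_energies(x, window, count=3, overlap=0.3):
    """Energy of the signal under `count` overlapping windows."""
    n = len(x)
    length = int(n / (1 + (count - 1) * (1 - overlap)))
    step = int(length * (1 - overlap))
    w = window(length)
    energies = []
    for i in range(count):
        start = i * step
        seg = x[start:start + length]
        energies.append(sum((s * wk) ** 2 for s, wk in zip(seg, w)))
    return energies


# Constant toy signal: every window should then hold the same energy.
x = [1.0] * 100
mhw = windowed_energies(x, hamming)
mtw = windowed_energies(x, trapezoid)
print(mhw)
print(mtw)
```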

Median Differential Value (MDV)
It is the mean value of the differential values of all peak values of the EMG signal; its defining expression is given in [36].

Amplitude of the First Burst (AFB)
AFB is defined as the first maximum point extracted from the resulting time function [90]. To calculate it, the raw EMG signal is first squared and passed through a moving average FIR filter with a Hamming windowing function, which yields the low-frequency components of the EMG signal. The maximum value of the first burst is used as a feature [82].

Mean Frequency (MNF)
MNF is the mean frequency, calculated as the sum of the products of the MES power spectrum and the frequency, divided by the total spectrum intensity [23]. It is also known as the central frequency or spectral center of gravity [38]:
$$\mathrm{MNF} = \frac{\sum_{l=1}^{M} f_l P_l}{\sum_{l=1}^{M} P_l},$$
where $f_l$ is the spectral frequency at frequency bin l, $P_l$ is the EMG signal power spectrum at frequency bin l, and M is the number of frequency bins.

Median Frequency (MDF)
It is the frequency at which the spectrum is divided into two regions with the same amplitude; that is to say, MDF marks half of the total power [23]. It can be expressed as the number MDF with the property that
$$\sum_{l=1}^{\mathrm{MDF}} P_l = \sum_{l=\mathrm{MDF}}^{M} P_l = \frac{1}{2}\sum_{l=1}^{M} P_l,$$
where $P_l$ is the EMG signal power spectrum at frequency bin l, and M is the number of frequency bins.

Peak Frequency (PKF)
It is the frequency at which the maximum power occurs, described by
$$\mathrm{PKF} = \max(P_l), \quad l = 1, \dots, M.$$

Total Power (TTP)
It is defined as the aggregate of the EMG signal frequency spectrum, or as the zero-order spectral moment, and is given by
$$\mathrm{TTP} = \sum_{l=1}^{M} P_l.$$

Frequency Ratio (FR)
FR can be used to distinguish between muscle contraction and relaxation [91]; it is the quotient of the low-frequency EMG components divided by the high-frequency ones. The equation that defines it is:
$$\mathrm{FR} = \frac{\sum_{l=\mathrm{LLC}}^{\mathrm{ULC}} P_l}{\sum_{l=\mathrm{LHC}}^{\mathrm{UHC}} P_l},$$
where ULC and LLC are the upper and lower cutoff frequencies of the low-frequency band, and UHC and LHC are those of the high-frequency band, respectively. The threshold separating low and high frequencies can be determined in two different ways. The first was suggested by Han et al. [92], who used a range of 30-250 Hz for the low frequencies and 250-1000 Hz for the high frequencies; these values were obtained experimentally. The second was suggested by Oskoei and Hu [91], who proposed obtaining the frequency bands based on the MNF value.

Power Spectrum Ratio (PSR)
The PSR can be regarded as an extended version of the PKF and the FR [93]. It is the quotient between $P_0$, the energy close to the maximum value of the MES power spectrum, and P, the total energy of the MES power spectrum. It is defined mathematically by
$$\mathrm{PSR} = \frac{P_0}{P},$$
where $P_0 = \sum_{l=f_0-n}^{f_0+n} P_l$ and $P = \sum_{l=-\infty}^{\infty} P_l$, with $f_0$ the PKF value and n the MES integral limit. Qingju and Zhizeng [93] set n = 20 and evaluated P between 10 and 500 Hz.
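The frequency-domain features above (MNF, MDF, PKF, FR and PSR) all operate on the power spectrum of a segment. The following Python sketch computes a one-sided periodogram with a direct DFT, which is adequate only for short segments, and then evaluates the features on a synthetic two-tone signal. The function names, the test signal, the expression of the FR bands in Hz and the interpretation of n as a number of bins are assumptions made for illustration.

```python
import cmath
import math


def power_spectrum(x, fs):
    """One-sided periodogram via a direct DFT (fine for short segments)."""
    n = len(x)
    freqs, power = [], []
    for l in range(n // 2 + 1):
        coef = sum(x[k] * cmath.exp(-2j * math.pi * l * k / n) for k in range(n))
        freqs.append(l * fs / n)
        power.append(abs(coef) ** 2 / n)
    return freqs, power


def mean_frequency(freqs, power):
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


def median_frequency(freqs, power):
    """First frequency at which the cumulative power reaches half the total."""
    half, acc = sum(power) / 2.0, 0.0
    for f, p in zip(freqs, power):
        acc += p
        if acc >= half:
            return f
    return freqs[-1]


def peak_frequency(freqs, power):
    return freqs[max(range(len(power)), key=power.__getitem__)]


def frequency_ratio(freqs, power, low_band, high_band):
    """Power in the low band divided by power in the high band (bands in Hz)."""
    low = sum(p for f, p in zip(freqs, power) if low_band[0] <= f < low_band[1])
    high = sum(p for f, p in zip(freqs, power) if high_band[0] <= f < high_band[1])
    return low / high


def power_spectrum_ratio(power, n=20):
    """Power within n bins of the spectral peak divided by the total power."""
    peak = max(range(len(power)), key=power.__getitem__)
    return sum(power[max(0, peak - n):peak + n + 1]) / sum(power)


# Synthetic two-tone signal (not EMG data): 50 Hz plus a weaker 300 Hz tone.
fs, n_samples = 1000.0, 200
signal = [math.sin(2 * math.pi * 50 * k / fs) + 0.5 * math.sin(2 * math.pi * 300 * k / fs)
          for k in range(n_samples)]
freqs, power = power_spectrum(signal, fs)
print(mean_frequency(freqs, power), median_frequency(freqs, power))
print(peak_frequency(freqs, power), power_spectrum_ratio(power))
print(frequency_ratio(freqs, power, (30, 250), (250, 1000)))
```

For segments of realistic length, an FFT-based periodogram would normally replace the direct DFT used here.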

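The First, Second and Third Spectral Moments (SM1, SM2 and SM3)
Another statistic available for the analysis of the MES power spectrum is the spectral moment. The first three moments are the most important [38]. They are calculated by
$$\mathrm{SM1} = \sum_{l=1}^{M} P_l f_l, \quad \mathrm{SM2} = \sum_{l=1}^{M} P_l f_l^2, \quad \mathrm{SM3} = \sum_{l=1}^{M} P_l f_l^3. \tag{A34}$$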

Variance of Central Frequency (VCF)
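It is defined by means of the spectral moments and is written as
$$\mathrm{VCF} = \frac{1}{\mathrm{SM0}}\sum_{l=1}^{M} P_l (f_l - f_c)^2 = \frac{\mathrm{SM2}}{\mathrm{SM0}} - \left(\frac{\mathrm{SM1}}{\mathrm{SM0}}\right)^2,$$
where $f_c$ is the central frequency, $\mathrm{SM0} = \sum_{l=1}^{M} P_l$ is the zero-order spectral moment, and SM1 and SM2 are the first and second spectral moments.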
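The relation between the spectral moments and the VCF can be checked numerically with a short Python sketch. The function names and the three-bin toy spectrum are assumptions made for illustration.

```python
def spectral_moment(freqs, power, order):
    """Spectral moment of the given order: sum of P_l * f_l ** order."""
    return sum(p * f ** order for f, p in zip(freqs, power))


def vcf(freqs, power):
    """Variance of central frequency from the moments of order 0, 1 and 2."""
    sm0 = spectral_moment(freqs, power, 0)
    sm1 = spectral_moment(freqs, power, 1)
    sm2 = spectral_moment(freqs, power, 2)
    return sm2 / sm0 - (sm1 / sm0) ** 2


# Toy spectrum, symmetric around 20 Hz (not EMG data).
freqs = [10.0, 20.0, 30.0]
power = [1.0, 2.0, 1.0]
print([spectral_moment(freqs, power, k) for k in (1, 2, 3)])
print(vcf(freqs, power))
```

The ratio of the first moment to the zero-order moment is the central frequency, so the VCF is the power-weighted variance of the frequencies around it.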

Power Spectral Density (PSD)
It is defined as the distribution of the energy or power of the signal over the different frequencies of which it is composed. Mathematically, it is defined as
$$\mathrm{PSD}(f) = |X(f)|^2,$$
where X(f) is the Fourier transform of x(t); the integral of this function over the whole f axis is the total energy of the signal x(t).

Frequency Median Density (FMD)
FMD splits the PSD into two equal parts and is defined as
$$\mathrm{FMD} = \frac{1}{2}\sum_{l=1}^{M} \mathrm{PSD}_l.$$

Frequency Mean Density (FMN)
FMN is the mean frequency of the PSD and is described by
$$\mathrm{FMN} = \frac{\sum_{l=1}^{M} f_l\, \mathrm{PSD}_l}{\sum_{l=1}^{M} \mathrm{PSD}_l}.$$

Cepstral Coefficients (CC)
CC is the inverse Fourier transform of the logarithmic power spectrum of a signal,
$$c(n) = \mathcal{F}^{-1}\left\{\log \left|\mathcal{F}\{x(n)\}\right|^2\right\}.$$
Low-order coefficients are often used in speech recognition classification [35]. If $F_l^k(p)$, for k = 0, . . . , N − 1, is defined as the Fourier transform of the n-th measured sample of the signal $x_l(n)$ from the l-th source, the Fourier transform is evaluated with g as the window function, k as the time sample and m as the frequency bin.

Continuous Wavelet Transform (CWT)
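CWT is the wavelet transform of a continuous signal x,
$$\mathrm{CWT}(a, t) = \frac{1}{\sqrt{|a|}}\int_{-\infty}^{\infty} x(\tau)\, \psi^{*}\!\left(\frac{\tau - t}{a}\right) d\tau,$$
with t as the translation parameter, a as the scale parameter and ψ as the mother wavelet function (the asterisk denotes complex conjugation).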

Discrete Wavelet Transform (DWT)
DWT divides the signal into approximation and detail coefficients by passing it through complementary low-pass and high-pass filters. The approximation coefficients are then divided into second-level approximation and detail coefficients. By repeating the process, a signal is divided into many components of lower resolution.
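The repeated splitting of the approximation coefficients can be illustrated with the Haar wavelet, the simplest choice of mother wavelet; the reviewed works may use other wavelet families. The Python sketch below uses illustrative function names and assumes that the signal length is divisible by two at every level.

```python
import math


def haar_step(x):
    """One DWT level with the Haar wavelet: low-pass and high-pass
    filtering followed by downsampling by two."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def haar_dwt(x, levels):
    """Return [cA_n, cD_n, ..., cD_1] by repeatedly splitting the approximation."""
    coeffs = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs


# Toy signal for illustration only; length 8 allows a 2-level decomposition.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_dwt(signal, levels=2)
print(coeffs)
```

Because the Haar filters are orthonormal, the energy of the coefficients equals the energy of the signal, which gives a quick sanity check on the decomposition.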

Wavelet Packet Transform (WPT)
WPT is a generalized version of the DWT in which the decomposition is applied to both the low-pass results (approximations) and the high-pass results (details).

Stationary Wavelet Transform (SWT)
SWT does not decimate the signal at each stage, which avoids the non-linear distortion problem of the DWT and the WPT. Hakonen et al. [58] pointed out that wavelet transforms allow focusing on specific frequency bands because they use subsets of coefficients, which could make the system more robust than TD and DF features. Additionally, the TFD features generate a high-dimensional feature vector; therefore, dimensionality reduction is usually necessary so that processing with this vector is faster.

Experimental Variogram
The variogram function describes the spatial correlation between observations. The variogram definition is based on the idea that the value of x depends on the location z where it is observed:
$$\gamma(h) = \frac{1}{2N(h)}\sum_{i=1}^{N(h)} \left[x(z_i) - x(z_i + h)\right]^2,$$
where h is the distance vector, $x(z_i)$ is the measurement at location $z_i$ and N(h) is the number of pairs h units apart in the direction of the vector h.
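For a regularly sampled one-dimensional series, where the lag h reduces to an integer number of samples, the experimental variogram can be sketched in Python as follows. The function name and the toy series are assumptions made for illustration.

```python
def experimental_variogram(values, max_lag):
    """gamma(h) for h = 1..max_lag on a regularly sampled 1-D series:
    half the mean squared difference of all pairs h samples apart."""
    gamma = {}
    for h in range(1, max_lag + 1):
        pairs = [(values[i], values[i + h]) for i in range(len(values) - h)]
        gamma[h] = sum((a - b) ** 2 for a, b in pairs) / (2.0 * len(pairs))
    return gamma


# Toy linear trend (not EMG data): the variogram then grows as h**2 / 2.
series = [0.0, 1.0, 2.0, 3.0, 4.0]
print(experimental_variogram(series, max_lag=3))
```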