Research on Lower Limb Motion Recognition Based on Fusion of sEMG and Accelerometer Signals

Since surface electromyograghic (sEMG) signals are non-invasive and capable of reflecting humans’ motion intention, they have been widely used for the motion recognition of upper limbs. However, limited research has been conducted for lower limbs, because the sEMGs of lower limbs are easily affected by body gravity and muscle jitter. In this paper, sEMG signals and accelerometer signals are acquired and fused to recognize the motion patterns of lower limbs. A curve fitting method based on median filtering is proposed to remove accelerometer noise. As for movement onset detection, an sEMG power spectral correlation coefficient method is used to detect the start and end points of active signals. Then, the time-domain features and wavelet coefficients of sEMG signals are extracted, and a dynamic time warping (DTW) distance is used for feature extraction of acceleration signals. At last, five lower limbs’ motions are classified and recognized by using Gaussian kernel-based linear discriminant analysis (LDA) and support vector machine (SVM) respectively. The results prove that the fused feature-based classification outperforms the classification with only sEMG signals or accelerometer signals, and the fused feature can achieve 95% or higher recognition accuracy, demonstrating the validity of the proposed method.


Introduction
According to data released by the China Disabled Persons' Federation, there were about 85.2 million disabled people in China in 2010, among which the physically disabled make up the largest proportion, up to about 29.07%[1].However, the data released on webpages for the disabled show that the actual prosthetic fitting rate is not high enough, and the demand for prosthetic and other auxiliary equipment is great [2].Nowadays, research on lower limbs auxiliary equipment is more focused on how to realize autonomous control for patients and make auxiliary motion more natural and convenient.An electromyograghic (EMG) signal is the result of neuromuscular excitement and the release of bioelectricity during the human body's autonomic movement; therefore, it can be used to identify the state of muscles and perceive movement intention.As it is noninvasive and portable, surface EMG (sEMG) signals have become the ideal signal source for rehabilitation equipment and prosthetics to achieve autonomous control [3,4].
In the past 10 years, research of sEMG-based human movement pattern recognition, especially movement identification of upper limbs, was widely conducted.The studies employed plentiful classification models and achieved good recognition effects [5,6], and the results were used to control humanoid machinery and artificial prosthesis [7].However, research methods for the lower limbs are still immature, because the lower limb gait generation mechanism and adjustment methods are more Symmetry 2017, 9, 147 2 of 19 complex [8].However, similarly to the study of upper limbs, in order to realize the accurate identification of the lower limbs' movements, the whole system is divided into several parts: preprocessing, feature extraction, pattern recognition, and post-processing.The research of feature extraction and pattern recognition is especially important.
Al-Ani et al. [9] extracted time-domain (TD) features (root mean square (RMS), mean absolute value, integrated absolute value, zero crossing, slope sign change (SSC), and waveform length) and a time-domain power spectrum descriptors (TDPSD) feature of sEMG, combined with five different classifiers (support vector machine (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis, Bayes classifier, and extreme learning machine) to distinguish six movements including different grip and finger movements.Arjunan et al. [10] successfully identified and analyzed the force of muscle contraction and muscle fatigue by computing six different features of sEMG signals, including the normalized spectral index (NSM5), median frequency (MF), RMS, waveform length (WL), normalized root mean square (NRMS), and increase in synchronization (IIS).Besides, the independence in channel index (ICI) between multi-channel sEMG signals was used to measure the loss of motor units [11].Some blind source separation (BSS) methods, such as independent component analysis (ICA) [12] and non-negative matrix factorization (NMF) were also used for sEMG signals to classify different motion modes.Recently, Xi et al. [13] analyzed and evaluated 15 feature extraction methods of sEMG signals (including time, frequency, time/frequency domain, and entropy) and five classification methods for the identification of seven activities of daily living, providing a sufficient experimental analysis for existing methods.
In addition to sEMG signals, some kinematics signals are usually applied to identify different motions of the lower limbs.The accelerometer signals are usually analyzed by using multiple principal component analysis (MSPCA) [14] or wavelet PCA [15] to classify different ways of walking.Also, the blind source separation method ICA could be used for accelerometer signals to distinguish the toe walking gait from normal gait in Idiopathic Toe Walkers (ITW) children [16].In addition, the accelerometer signals were always used for physical activity monitoring [17] and fall detection [18,19].
In practical applications, the sEMG signals of the lower limbs are affected by the ground environment and gravity center shift, therefore the recognition effect is not ideal by using sEMG alone.On the other hand, using kinematics signals (such as accelerometer signals) alone can obtain a better recognition effect for different motions of the lower limbs.However, action segment detection and onset prediction are difficult to achieve, but it is much easier to achieve them with sEMG signals [20,21].Besides, for the motions with a similar trend, the recognition effect is also not satisfactory by using accelerometer signals alone.Researches have indicated that better recognition results can be achieved when sEMG is combined with some kinds of kinematics signal sources [20,22,23].
Miller et al. [24] used LDA and SVM classifiers respectively to identify seven kinds of gait, and these two classifiers have equivalent performance during experiments.Huang et al. [25,26] divided a gait into four step phases according to plantar pressure changes, and proposed a method based on different phases, which realized the recognition of seven kinds of gait.The features of sEMG and plantar mechanic signals were analyzed and compared, and the results showed that if the two signal sources were fused, the recognition effect was obviously better than using sEMG or using mechanical signals alone, with a recognition rate of about 95%.Wu et al. [27] extracted the approximate entropy and basic-scale entropy of sEMG, and then combined pressure signals based on an analysis of the change rule of plantar pressure.Finally, fall detection was implemented successfully by SVM decision with a fusion of the two signal sources.Juan et al. [28] realized the identification and detection of lower limb movements by an integration of acceleration and sEMG signals, and achieved good results.Liu et al. [29] fused the sEMG and the hip angle and acceleration signals of the lower limbs, achieving the identification of lower limb motor patterns.Besides, Tikkanen et al. [30] combined EMG, heart rate, and accelerometer signals to estimate energy expenditure in four different motion modes on the treadmill.Even some devices were developed to acquire sEMG signals and kinematics signals synchronously [31].
Given that the requirement of an acceleration signal device is simple and portable, and the accelerometer signals contain rich gait information, two sources of signals-sEMG and acceleration-will be used in this paper for lower limb motion recognition.In current related research, the preprocessing of accelerometer signals and movement onset detection from continuous motion signals are always overlooked.Besides, the feature extraction methods for extracting features of signals with a higher separation degree, as well as more efficient classification methods, are still remaining to be further developed for lower limb motion recognition systems.All the above factors will affect the accuracy and stability of the final classification results, and only those recognition systems with high accuracy and good stability can be more effectively used for the actual control of intelligent devices.In this paper, sEMG signals and accelerometer signals are acquired and fused to recognize motion patterns of lower limbs.Firstly, a curve fitting method based on median filtering is proposed to remove accelerometer noise.As for movement onset detection, a sEMG power spectral correlation coefficient method is used to detect the start and end points of active signals.Next, time-domain features and wavelet coefficients of sEMG signals are extracted, and a dynamic time warping (DTW) distance is used for feature extraction of acceleration signals.Finally, five lower limbs' motions are classified and recognized by using Gaussian kernel-based LDA and SVM, respectively.The results show that the fused feature-based classification outperforms the classification with only sEMG signals or accelerometer signals, and the fused feature can achieve 95% or higher recognition accuracy, demonstrating the validity of the proposed method.

Signal Acquisition
In the experiment, the surface EMG sensor and the analysis system produced by Biometrics (UK) are used to acquire sEMG signals (4 channels, 1000 Hz), and the Mini Attitude and Heading Reference System (AHRS) gesture board is used to acquire acceleration signals (3 axes, 100 Hz).
Five daily activities of lower limbs were studied in this paper, including squatting, standing, walking forward, walking upstairs, and walking downstairs.These actions are all daily activities, so it is easy to complete them in a laboratory environment without other additional devices.Acquiring sEMG signals from multiple channels will improve the recognition effect of different motions, and four to eight channels are selected in many studies.Many more channels may reduce the recognition effect because of the crosstalk between different channels of sEMG signals.Combining human anatomy and lots of experimental results, the biceps femoris, semitendinosus, rectus femoris, and vastus lateralis muscles were selected as the sticking points of electrodes in order to obtain effective sEMG signals.
After the electrodes are pasted, the subject will do some simple actions of the lower limbs to test whether the signals of each channel are clear or whether there is obvious noise, so as to ensure that the paste positions of electrodes are correct and effective.In the experiment, each mode of motions requires multiple sets of data used for the analysis of the training process and the testing process.When acquiring data samples, each mode of motions should last for 15 s.In order to avoid the effects of muscle fatigue, a 2 min test is needed after finishing a motion.
The participants include four healthy subjects aged between 23 and 35 years old, and one amputated volunteer with a below-knee amputation.The amputated subjects completed the five lower limb motions by wearing artificial limbs in the experiment.All subjects gave their informed consent for inclusion before they participated in the study.The study was conducted with the approval of the Wuhan University of Technology.

Signal Preprocessing
One of the most important aspects of EMG preprocessing is the selection of the size of the analysis window.Signals need to be processed with the window, and a current real-time signals segment is continuously obtained as the window moves.Huang et al. [27] noted in a gait study that the recognition rate would be higher, but the delay was too large when taking the signals of an entire gait circle as the input.It was advisable to split the signals with a smaller analysis window.There are two kinds of commonly used windows: non-overlapping analysis window and overlapping analysis window.In addition, the overlapping window can make the delay smaller and ensure the full utilization of the signals' information.Phinyomark et al. [32] have studied different combinations of analysis window length and overlapping distance.The accuracy was the best when the window length was 500 ms and the sliding distance was 125 ms.When the analysis window gradually increased from 125 ms to 250 ms, the recognition rate was significantly improved.When the analysis window increased from 250 ms to 500 ms, the increase in recognition rate became flat.So, when the analysis window is too large, it is necessary to consider whether the delay is too long.
As shown in Figure 1, a sliding window analysis is used for signal preprocessing in this study.'Window' refers to the length of the analysis window, and once the previous analysis is complete, the window slides forward at a distance of "Increase".Studies have shown that increasing the length of the analysis window will increase the accuracy of the classification results, but will also increase the processing delay of the system [32,33].In this study, different combinations of analysis window length and sliding distance were studied and analyzed by experiments, so as to ensure a better recognition effect and a smaller processing delay.The analysis window was increased from 200 ms to 250 ms, and the sliding distance was increased from 50 ms to 100 ms [34].Finally, we set an analysis window length of 250 ms, and a sliding distance of 80 ms.
Symmetry 2017, 9, 147 4 of 18 two kinds of commonly used windows: non-overlapping analysis window and overlapping analysis window.In addition, the overlapping window can make the delay smaller and ensure the full utilization of the signals' information.Phinyomark et al. [32] have studied different combinations of analysis window length and overlapping distance.The accuracy was the best when the window length was 500 ms and the sliding distance was 125 ms.When the analysis window gradually increased from 125 ms to 250 ms, the recognition rate was significantly improved.When the analysis window increased from 250 ms to 500 ms, the increase in recognition rate became flat.So, when the analysis window is too large, it is necessary to consider whether the delay is too long.As shown in Figure 1, a sliding window analysis is used for signal preprocessing in this study.'Window' refers to the length of the analysis window, and once the previous analysis is complete, the window slides forward at a distance of "Increase".Studies have shown that increasing the length of the analysis window will increase the accuracy of the classification results, but will also increase the processing delay of the system [32,33].In this study, different combinations of analysis window length and sliding distance were studied and analyzed by experiments, so as to ensure a better recognition effect and a smaller processing delay.The analysis window was increased from 200 ms to 250 ms, and the sliding distance was increased from 50 ms to 100 ms [34].Finally, we set an analysis window length of 250 ms, and a sliding distance of 80 ms.

Movement Onset Detection of sEMG
When lower limbs are active, EMG signals are changed from the resting state to the active state.Movement onset detection is important for motion recognition [35].EMG signal fluctuations increase as the muscles begin to be active, compared to the resting state [36].
The amplitude is low when the muscles are inactive.It is found by experiments that the correlation coefficient of the power spectrum corresponding to two neighboring sliding windows is small, in which situation the occasional fluctuations in sEMG signals can be almost completely filtered out.The amplitude increases when the muscles are active, and the correlation coefficient of the power spectrum corresponding to two neighboring sliding windows becomes big.Thus, the following method is used for movement onset detection.


Use a sliding window to segment the whole data of each channel into n epochs and calculate the power spectrum ().


Calculate the sum of the maximum number of correlation coefficients corresponding to each channel,   ().


Compare   () with a predefined threshold .If   () is greater than , EMG signals are recognized as in an active state.


For the selection method of the threshold, Cheng et al. [28] found by experiment that the method for selecting the threshold is relatively stable and has a good repeatability according to the percentage of maximal voluntary contraction.The method is a reference in this study.Through experiments, 1% of the maximal voluntary contraction is set as the threshold of the moving average method and Teager-Kaiser operator, and 0.1% of the maximal voluntary contraction is set as the threshold of the power spectral correlation coefficient method.

Movement Onset Detection of sEMG
When lower limbs are active, EMG signals are changed from the resting state to the active state.Movement onset detection is important for motion recognition [35].EMG signal fluctuations increase as the muscles begin to be active, compared to the resting state [36].
The amplitude is low when the muscles are inactive.It is found by experiments that the correlation coefficient of the power spectrum corresponding to two neighboring sliding windows is small, in which situation the occasional fluctuations in sEMG signals can be almost completely filtered out.The amplitude increases when the muscles are active, and the correlation coefficient of the power spectrum corresponding to two neighboring sliding windows becomes big.Thus, the following method is used for movement onset detection.

•
Use a sliding window to segment the whole data of each channel into n epochs and calculate the power spectrum P f (n) • Calculate the correlation coefficient P cor (n) between P f (n) and P f (n + 1).

•
Calculate the sum of the maximum number of correlation coefficients corresponding to each channel, PC sum (n).

•
Compare PC sum (n) with a predefined threshold TH.If PC sum (n) is greater than TH, EMG signals are recognized as in an active state.

•
For the selection method of the threshold, Cheng et al. [28] found by experiment that the method for selecting the threshold is relatively stable and has a good repeatability according to the percentage of maximal voluntary contraction.The method is a reference in this study.Through experiments, 1% of the maximal voluntary contraction is set as the threshold of the moving average method and Teager-Kaiser operator, and 0.1% of the maximal voluntary contraction is set as the threshold of the power spectral correlation coefficient method.

De-Nosing of Accelerometer Signals
Due to the situation that error bits occur along with the signal transmission process in Mini AHRS, there will be a lot of glitches or continuous segments of noise existing in the collected accelerometer signals.As for the de-noising of such noise, the method based on a median filter is often used, the main process of which is to select a length of point neighborhood and replace the original points with the median of the neighborhood.The key factor of the median filter is the window size of the neighborhood.The more obvious fluctuations will be eliminated with a too-large window, while a too-small window will not achieve good results for the continuous noise segment.In practical application, the window size is selected by experiments and comparison while gradually increasing the window size.The method can eliminate the isolated noise points and the calculation is relatively simple and quick.Nevertheless, a lot of details are also filtered out when removing the signal noise, resulting in a too-smooth signal curve.On this basis, a median curve fitting-based accelerometer signal de-noising method is proposed in this study.For the signal in a point neighborhood window, the sample points are sorted by size.Only a few intermediate points are selected, and the points of the largest and the smallest value on both sides are removed, since such points are more likely to be noise.By doing a curve fitting calculation for these intermediate points, glitch noise can be removed, and the signal can be smooth as well.
Curve fitting is to use a function curve to approximate the known measurement data points, and then use the curve to calculate the value of other unknown points.When the form of the curve function is uncertain, the polynomial curve is generally used to fit.Assume that there are N sample points (x i , y i ), which are expressed by an m-th-order polynomial.
where b m is the fitting coefficient, and its matrix form can be written as: As a result, Y = XB, B = (X X) −1 X Y, and we can see that:

Feature Extraction
Feature extraction is a very important part of pattern recognition, and a good feature has higher resolution, which has a great influence on the recognition results [37].

Feature Extraction of EMG Signals
Nowadays, many methods of feature extraction have been proposed, among which time-domain features and the wavelet coefficient are most commonly used [38][39][40].The most commonly used time-domain features are Root Mean Square (RMS), Zero Crossing (ZC), Willison Amplitude (WAMP), SSC and Wave Length (WL).

•
RMS can reflect the amplitude of signals: where n is the number of samples in an analysis window.

•
ZC is the number of times the waveform passes through the zero point, which reflects the frequency characteristics of the signals.If the zero point is set to be a range rather than a value, the noise introduced by the zero point can be reduced.When we select a threshold thr, the calculation formula of ZC is shown in Formula ( 5): where f (x) = sgn(x); i f |x| > thr 0; otherwise • WAMP can reflect the relative fluctuation amplitude of a sample point and the next one.

•
SSC can be used as the supplementary information of the signal frequency.
where ω = 0.001 in this paper.

•
WL not only reflects the amplitude information of the signals, but also contains the fluctuation frequency information of the signals.
The information contained in a single time-domain parameter is not abundant enough, therefore, in much research, multiple time-domain parameters are combined to obtain a more efficient feature vector [41].

Wavelet Transform
Wavelet transform (WT) is a time-frequency domain transformation with the ability to simultaneously locate the time and frequency.It mainly replaces the original trigonometric function basis e −jω in the Fourier transform, and selects the wavelet basis Ψ a,b (t) which satisfies the corresponding allowable conditions.The basis of the Fourier transform is infinite, but the wavelet basis is damped and the length is finite.There is only one frequency variable ω in the Fourier transform, while the wavelet transform has two variables: scale a and translation b, where the scale corresponds to different frequency variables, and the translation corresponds to the positioning on time.
For the signal f (t), a continuous wavelet transform can be expressed as: where function system Ψ a,b (t) can be expressed as: The scale a controls the scaling of the wavelet function, and the large value corresponds to the low frequency, with the corresponding Ψ a,b (t) becoming wider.The translation b controls the translation of the wavelet function, which corresponds to different time scale.Besides, |a| −1/2 is the normalizing factor.In essence, the wavelet transform of the signal f (t) is actually the inner product of f (t) and Ψ a,b (t), and the continuous wavelet transform quantitatively expresses the degree of correlation between the signal and each wavelet in the wavelet function system.
When a wavelet transform is used to decompose the EMG signal, what needs to be considered firstly is the mother wavelet and the decomposition scale.The mother wavelets mainly used in EMG processing are: Daubechies, Coiflets, Symlets, et al.Decomposition scales should also be chosen according to the actual situation.In this paper, multiple experiments with different mother wavelets and decomposition scales were carried out.Fisher's Index (FI) was used to measure the separation degree of feature vectors.The larger the FI is, the higher the separation degree of the feature sets is, which is more conducive to a subsequent pattern classification.Experiments showed that the "sym4" mother wavelet with four wavelet decomposition had a higher FI, so it was chosen for the wavelet transform of this study.
The dimension of wavelet coefficients is very large, and a dimension reduction process is needed before it can be used as a feature [42][43][44].Therefore, we extract the principal components of each layer wavelet coefficient as the feature.When a wavelet transform is used to decompose the EMG signal, what needs to be considered firstly is the mother wavelet and the decomposition scale.The mother wavelets mainly used in EMG processing are: Daubechies, Coiflets, Symlets, et al.Decomposition scales should also be chosen according to the actual situation.In this paper, multiple experiments with different mother wavelets and decomposition scales were carried out.Fisher's Index (FI) was used to measure the separation degree of feature vectors.The larger the FI is, the higher the separation degree of the feature sets is, which is more conducive to a subsequent pattern classification.Experiments showed that the "sym4" mother wavelet with four wavelet decomposition had a higher FI, so it was chosen for the wavelet transform of this study.

Feature Extraction of Accelerometer Signals
The dimension of wavelet coefficients is very large, and a dimension reduction process is needed before it can be used as a feature [42][43][44].Therefore, we extract the principal components of each layer wavelet coefficient as the feature.Euclidean distance is a commonly used calculating distance, but sometimes it may not be suitable.It is assumed that there are two time sequences : [1,1,12,5,8,9] and : [1,12,5,5,8,9], then we can get  = 170 by using the computational formula of Euclidean distance:  = ∑([] − [] 2 ).In fact, the two matrix sequences are very similar.The DTW algorithm can be a good solution to this problem [45].

Feature Extraction of Accelerometer Signals
The DTW algorithm is most commonly used in speech signal processing.Due to the high randomness of speech signals, even if the same speaker is speaking the same word, the time length is different, namely the amplitude is different, and there will be uneven bending distortion along the Euclidean distance is a commonly used calculating distance, but sometimes it may not be suitable.It is assumed that there are two time sequences M : [1, 1, 12, 5, 8, 9] and N : [1, 12, 5, 5, 8, 9], then we In fact, the two matrix sequences are very similar.The DTW algorithm can be a good solution to this problem [45].
The DTW algorithm is most commonly used in speech signal processing.Due to the high randomness of speech signals, even if the same speaker is speaking the same word, the time length is different, namely the amplitude is different, and there will be uneven bending distortion along the time axis.It is similar to accelerometer signals.The time and distance can be combined by the DTW algorithm, and then the distance between the vectors of sequences with different length is compared.
Assume that the length of the test template is M frames and the length of reference template is N frames.Then, make the test template's each frame number m = 1-M and set the horizontal axis of the two-dimensional grid to one-to-one correspondence, and mark the reference template's frame number n = 1-N on the vertical axis.The corresponding vertical and horizontal lines can be drawn with the integer frame numbers, in which case each intersection in the grid represents the matching point between a frame in the test template and a frame in the training template.The DTW algorithm has two main steps, one is to calculate the distance between each frame of the two templates, that is to find the matching distance matrix; and the next step is to find the shortest path in the matching distance matrix.
In this paper, we restrict the research path, and the constraint conditions are shown in Formula (11).
where D(m, n) is the cumulative distance from the starting point to the intersection (m, n), and d m,n is the distance between the test template T m and the reference template point R n .For each of the five motions, we select a signal with a certain length to create a reference template.Since the length of accelerometer signals in a 250 ms analysis window is approximately 25, in order to further reduce the computations, an averaged value for each three points is added to the test template, and the length of each test template is limited to eight dimensions.When making the reference template, the median is also taken from every three points.
To determine the start frame and the end frame, the test template T is sliding and matching on the reference template R. When the distance between the first frame T(1) and the k-th frame T(k) is smaller than the set threshold value TH, the k-th frame on R is determined to be the start of the matching distance calculation.Then, the frame is found which meets the condition of d(T(8), R(k + n)) < TH within the range of R(k + 5) to R(k + 10), and determine it as the end frame.If not found, the test template continues sliding backwards until the end of the reference template.The length of the reference template is about two gait cycles.

Recognition and Classification Methods
As for motion recognition based on sEMG, another key point is selection of classifiers, and the recognition effect will be very different when combining different feature vectors and classifiers [25,46].Due to the development of pattern recognition, a lot of classifiers can be chosen, among which the most commonly used classifiers are LDA [47], ANN [48], and SVM [49].In this paper, we will mainly use the LDA classifier and the SVM classifier to classify and validate the features, and make a comparative analysis.

LDA Classifier Based on Kernel Function
The basic idea of LDA is to project the samples of high-dimensional space into low-dimensional space.For example, if the sample space is H-dimensional, we can project the samples to a straight line along a certain direction, which is a one-dimensional space.For the H-dimensional vectors which Symmetry 2017, 9, 147 9 of 19 are inseparable and overlapping with each other, the principle is to find a suitable direction "w" in projection, so that the different classes are separated as far as possible while the same class is gathered as much as possible.In this process, the Fisher criterion is usually used to find the direction.
For multi-class problems, the one-many portioning strategy on the basis of two-class problems is used.That is to successively regard the patterns belonging to the ω class and the patterns not belonging to the ω class as two classes, and then make discrimination.Because there may be points not considered to belong to the ω class every time, in order to avoid the phenomenon of unclassified, we regard the class from its closest point as the output.
LDA is simple, robust, and not prone to over-fitting.It is widely used and especially effective for linear problems.However, it is limited in nonlinear data processing.For this reason, the data can be projected nonlinearly into a feature space F, and then the Fisher discriminant criterion is calculated in F, which is called the kernel function-based LDA method.∅ is used to represent the nonlinear function mapping X to F, that is ∅ : X → F .ω i refers to the i-th class to be classified.Then, the input sample x 1 , x 2 , • • • , x N can be represented as ∅(x 1 ), ∅(x 2 ), • • • , ∅(x N ) in the feature space.The mean value of the samples is: The within-class scatter matrix S i ∅ , the total within-class scatter matrix S w ∅ , and the inter-class scatter matrix S b ∅ can be expressed in the feature space as: The Fisher discriminant function in F is: Finally, the projection of the final feature space ∅ on w can be transformed into the projection of k(•, x) on the direction α.
By combining Equations ( 13) and ( 14), the discriminant function can be written as: where x n ∈ ω j , and 1 Nj is a matrix whose all items are 1/N j .
In the choice of kernel function, the Gauss kernel function is used in this paper as the kernel function because of its good function approximation performance.

SVM Classifier Based on Grid Search Optimization
The SVM is proposed for small sample problems, and it has great advantages for nonlinear and high-dimensional pattern recognition.Besides, it has good ability to solve global optimization problems and will not fall into the local minimum [50].SVM is a binary classification model, and the learning strategy is to find the optimal hyper-plane as the classification boundary to maximize the interval.
For the selection of SVM kernel functions, the commonly used functions are linear kernel, polynomial kernel, radial basis kernel, and sigmoid kernel.Many studies have shown that the Radial Basis Function (RBF) has better performance, and the linear kernel function is essentially a special case of the RBF.Compared with the polynomial kernel, RBF requires fewer parameters, and the number of the kernel function's parameters directly affects the complexity of the function.As for the sigmoid kernel, it has similar performance to the RBF for some parameters.In the case that the feature dimension is not large, the RBF is selected in this paper.Meanwhile, since SVM is a binary classification model and multiple SVMs need to be set for multi-classification problems, we use the "one to other" approach in this paper.
The learning strategy of SVM is to find the optimal hyper-plane as the classification boundary to maximize the interval.The classification hyper-plane of samples is expressed as f (x) = w T x + b = 0, and the interval of the hyper-plane is defined as δ = y i (w•x + b) − 1 ≥ 0. The maximum interval is an important concept of SVM.It is difficult to achieve complete data separation because of the influence of noise in reality.Over-fitting occurs easily when mapping the data to a very high dimension by a strong kernel function.Therefore, the system is expected to have a certain fault tolerance for noise and outliers, ensuring the robustness of the system.If the slack variable is added at this time, the original constraint equation can be changed to: The constraint allows some points to appear at intervals smaller than 1, that is, outliers.The accurate classification will lead to some losses when discarding these outliers, but a bigger classification interval will be acquired, that is, more robust classification performance.Adding the loss to the target function, that is, penalty factor C, then the optimization problem can be written as: The penalty factor C of the support vector machine (SVM) and the parameter g of the kernel function, that is, the variance in the radial basis, need to be selected appropriately.The basic method of selecting C and g is to search for the parameters through a grid, that is, to find the optimal solution in a certain sense by searching and comparing one by one, which is easy to implement.Grid search is to let C and g select discrete values in a certain range.Test results will be acquired taking the training set as input one by one.Final parameters will be determined according to the best result.In the search process, if multiple sets of results correspond to the highest recognition rate, the minimum C and the first set of g are selected because over-fitting occurs easily when C is too large.
In this paper, the method of two-level grid search is chosen.The step of each grid is gradually reduced.The first-level grid step is (3,3).The best combination of C and g (Cbest1, gbest1) will be determined after the first level-grid search.Then, exact parameters will be selected with a smaller step.Experiments show that it is most appropriate that the step of C and g is set as (0.3, 0.3).At this point, the recognition rate will not be significantly improved with a smaller grid step, but the search time will increase.When the step increases, the time reduces, but the recognition rate decreases significantly.Therefore, the second-level grid is set at the size of 3 × 3 based on the first best grid point (Cbest1, gbest1) as the center, and the search step is set at (0.3, 0.3) for an accurate search.
When testing the training set, we adopt k-fold cross-validation in this paper to make the result more convincing.In addition, it is inevitable that there will be signal errors due to muscle jitter or errors of the system's discrimination in the recognition results.In order to improve the accuracy and stability of the system, the Majority Vote (MV) algorithm is used to make decisions on the recognition results [51].

Analysis of EMG Signal Onset Detection
In order to verify the effectiveness of the proposed movement onset detection method, the moving average method and the Teager-Kaiser energy method were used to compare the detection results.The start and end points of the original signal were confirmed by observation as shown in Figure 3.
When testing the training set, we adopt k-fold cross-validation in this paper to make the result more convincing.In addition, it is inevitable that there will be signal errors due to muscle jitter or errors of the system's discrimination in the recognition results.In order to improve the accuracy and stability of the system, the Majority Vote (MV) algorithm is used to make decisions on the recognition results [51].

Analysis of EMG Signal Onset Detection
In order to verify the effectiveness of the proposed movement onset detection method, the moving average method and the Teager-Kaiser energy method were used to compare the detection results.The start and end points of the original signal were confirmed by observation as shown in Figure 3.The three methods were employed for movement onset detection, respectively, and the results are shown in Table 1.The misjudged points are marked in bold with a short underline.
It can be seen that the moving average method has some mistakes in movement segment detection, but the two error points happen to be between the previous action start point and the next end point, thus the final segmentation will not be affected.The Teager-Kaiser operator can also get the start and end of the signal, but the resulting active segment will be shortened compared to the two other methods.The proposed power spectral correlation coefficient method can accurately determine the starting and ending points of the movement segment, and the signal length will not be The three methods were employed for movement onset detection, respectively, and the results are shown in Table 1.The misjudged points are marked in bold with a short underline.
It can be seen that the moving average method has some mistakes in movement segment detection, but the two error points happen to be between the previous action start point and the next end point, thus the final segmentation will not be affected.The Teager-Kaiser operator can also get the start and end of the signal, but the resulting active segment will be shortened compared to the two other methods.The proposed power spectral correlation coefficient method can accurately determine the starting and ending points of the movement segment, and the signal length will not be lost.Besides, there is no misjudgment point for the signal, which indicates a good fault tolerance.In addition, the other signal segments are also processed as above, and the results also show that the proposed method can achieve good movement segment detection.

Analysis of Signal De-Noising Results
Error bits often occur along with the signal transmission.As shown in Figure 4, there are a lot of glitches or continuous segments of noise existed in the original accelerometer signals.
Symmetry 2017, 9, 147 11 of 18 lost.Besides, there is no misjudgment point for the signal, which indicates a good fault tolerance.In addition, the other signal segments are also processed as above, and the results also show that the proposed method can achieve good movement segment detection.

Analysis of Signal De-Noising Results
Error bits often occur along with the signal transmission.As shown in Figure 4, there are a lot of glitches or continuous segments of noise existed in the original accelerometer signals.The commonly used median filtering method was used to deal with the noise, and the neighborhood window was set as  = 9 .Then, the glitch noise could almost be filtered out completely, and the results are shown in Figure 5.However, many details of the accelerometer signals were also filtered out, resulting in a lack of information.The commonly used median filtering method was used to deal with the noise, and the neighborhood window was set as win = 9.Then, the glitch noise could almost be filtered out completely, and the results are shown in Figure 5.However, many details of the accelerometer signals were also filtered out, resulting in a lack of information.The commonly used median filtering method was used to deal with the noise, and the neighborhood window was set as  = 9 .Then, the glitch noise could almost be filtered out completely, and the results are shown in Figure 5.However, many details of the accelerometer signals were also filtered out, resulting in a lack of information.Then, we adopted the curve fitting method based on median filtering proposed in this paper to deal with the noise, and set the neighborhood window as  = 5 through experiment contrasts, resulting in better performance.The results are shown in Figure 6.Then, we adopted the curve fitting method based on median filtering proposed in this paper to deal with the noise, and set the neighborhood window as win = 5 through experiment contrasts, resulting in better performance.The results are shown in Figure 6.The commonly used median filtering method was used to deal with the noise, and the neighborhood window was set as  = 9 .Then, the glitch noise could almost be filtered out completely, and the results are shown in Figure 5.However, many details of the accelerometer signals were also filtered out, resulting in a lack of information.Then, we adopted the curve fitting method based on median filtering proposed in this paper to deal with the noise, and set the neighborhood window as  = 5 through experiment contrasts, resulting in better performance.The results are shown in Figure 6.From Figures 5 and 6, we can see that, compared with the traditional median filtering method, adding the curve fitting process can remove the glitch noise while preserving more details of the original signal.

Spatial Distribution of Features
In this paper, four-channel EMG signals were collected, so the time-domain feature dimension is 4, the number of wavelet coefficients is 4 × 5 = 20D, and the DTW distance feature dimension of the accelerometer signals is equal to the number of lower limb motion classes, 5.In order to visually observe the spatial distribution of these features in two-dimensional space, the PCA method was used to reduce the dimension of these features to two dimensions.
After analyzing the features of accelerometer signal and sEMG, the two signal sources were merged, and the feature can be represented as F = [F sEMG ; F acc ].F sEMG and F acc represent the features of sEMG and accelerometer signals respectively, and the two features are synchronous.
Figures 7-9 show the spatial distribution of features from healthy subjects.Figure 10 shows the spatial distribution of an amputated volunteer.(ds, ups, walk, up, sit represent walking downstairs, walking upstairs, walking forward, standing and squatting, respectively.) From the spatial distribution in Figures 7-10, we can see that these features all have some degree of separation.The separating degree of the accelerometer signal features of the five subjects is obvious, which proves the effectiveness of the proposed feature extraction method based on DTW distance.For the healthy subjects, the separating degree of the time-domain union feature is greater than that of a single time-domain feature, and the separating degree of the principal components of the wavelet coefficients is also significant.For the amputated subject, there is a certain degree of separation between the various features, in which the separating degree of the accelerometer signal features is most obvious.For this judgment, the method proposed in this paper has the prospect of applying to people with limb disabilities.
Figures 7-9 show the spatial distribution of features from healthy subjects.Figure 10 shows the spatial distribution of an amputated volunteer.(ds, ups, walk, up, sit represent walking downstairs, walking upstairs, walking forward, standing and squatting, respectively.) From the spatial distribution in Figures 7-10, we can see that these features all have some degree of separation.The separating degree of the accelerometer signal features of the five subjects is obvious, which proves the effectiveness of the proposed feature extraction method based on DTW distance.For the healthy subjects, the separating degree of the time-domain union feature is greater than that of a single time-domain feature, and the separating degree of the principal components of the wavelet coefficients is also significant.For the amputated subject, there is a certain degree of separation between the various features, in which the separating degree of the accelerometer signal features is most obvious.For this judgment, the method proposed in this paper has the prospect of applying to people with limb disabilities.Figures 7-9 show the spatial distribution of features from healthy subjects.Figure 10 shows the spatial distribution of an amputated volunteer.(ds, ups, walk, up, sit represent walking downstairs, walking upstairs, walking forward, standing and squatting, respectively.) From the spatial distribution in Figures 7-10, we can see that these features all have some degree of separation.The separating degree of the accelerometer signal features of the five subjects is obvious, which proves the effectiveness of the proposed feature extraction method based on DTW distance.For the healthy subjects, the separating degree of the time-domain union feature is greater than that of a single time-domain feature, and the separating degree of the principal components of the wavelet coefficients is also significant.For the amputated subject, there is a certain degree of separation between the various features, in which the separating degree of the accelerometer signal features is most obvious.For this judgment, the method proposed in this paper has the prospect of applying to people with limb disabilities.In addition, from Figures 7-10, we can find that there is a lot of overlapping between the various features, so it is difficult to obtain a satisfactory recognition effect by classifying features in linear space.

Analysis of Classification and Recognition Results
Firstly, the original LDA classifier and the Gaussian kernel-based LDA classifier are respectively used to classify different features in this study.For the time-domain features, multiple features are selected to be combined in much of the literature, and it can be seen from Figure 8 that the separating degree of the time-domain union feature is greater than that of a single time-domain feature.Therefore, only the results of the time-domain union feature are counted in the table.In this paper, the k-fold verification method is used to verify the classification effect of the used classifier.Here, k = 3, and the mean value of the recognition accuracy is calculated as the final recognition result for each subject.The classification results of the two kinds of LDA classifiers are shown in Tables 2 and 3, where Subjects 1-4 are healthy subjects, and Subject 5 is the amputated subject.In addition, from Figures 7-10, we can find that there is a lot of overlapping between the various features, so it is difficult to obtain a satisfactory recognition effect by classifying features in linear space.

Analysis of Classification and Recognition Results
Firstly, the original LDA classifier and the Gaussian kernel-based LDA classifier are respectively used to classify different features in this study.For the time-domain features, multiple features are selected to be combined in much of the literature, and it can be seen from Figure 8 that the separating degree of the time-domain union feature is greater than that of a single time-domain feature.Therefore, only the results of the time-domain union feature are counted in the table.In this paper, the k-fold verification method is used to verify the classification effect of the used classifier.Here, k = 3, and the mean value of the recognition accuracy is calculated as the final recognition result for each subject.The classification results of the two kinds of LDA classifiers are shown in Tables 2 and 3, where Subjects 1-4 are healthy subjects, and Subject 5 is the amputated subject.It can be seen from Tables 2 and 3 that the matching distance feature of the accelerometer signals based on the DTW algorithm proposed in this paper has the best performance compared to the time-domain union feature and wavelet feature of the sEMG signals.Meanwhile, because of the variation of the amplitude of sEMG signals in the movements of lower limbs, the recognition accuracy is low when using original LDA to classify the commonly used sEMG features, as little as 50-60% for healthy subjects, which further confirms that it is difficult to obtain good classification results from these features in the linear space, and it is necessary to classify the features in nonlinear space.Comparing Tables 2 and 3, we can see that the classification accuracy of the Gaussian kernel-based LDA is improved by the nonlinear mapping of feature space, and the effect is obviously better than that of the original LDA.
The SVM is proposed for small sample problems, and it has great advantages for nonlinear and high-dimensional pattern recognition.As a result, the SVM classifier is also adopted to classify the different motion modes, where the kernel function is selected as the RBF kernel function, and the parameters C and g are selected by using the grid search optimization method discussed in Section 2.4.2.The classification results of the five subjects with SVM are shown in Table 4.It can be seen from Tables 3 and 4 that, for both the Gaussian kernel-based LDA classifier and the SVM classifier, the matching distance feature of accelerometer signals based on the DTW algorithm proposed in this paper has the best performance among these single features.Meanwhile, SVM has a higher accuracy and better stability of recognition results than the Gaussian kernel-based LDA for these single features.
However, the above motion recognition result is only up to 90% (by the accelerometer feature), which cannot meet the needs of the application.Additionally, the features of accelerometer signals and sEMG signals can be combined to achieve more satisfactory classification results (above 95%), as shown in Table 5.For the single features and the fused features, the classification results of the Gaussian kernel-based LDA and SVM are shown in Figures 11 and 12, respectively.It can be seen from the above tables and figures that for the lower limb motion recognition of the five subjects, the recognition effect with fused features of sEMG and accelerometer signals is better than the recognition effect with sEMG signals or accelerometer signals alone.In addition, for the healthy subjects, the wavelet feature of sEMG signals combined with an accelerometer feature will achieve a better recognition effect.For the amputated subject, a better recognition effect can be obtained by a Gaussian kernel-based LDA with the fused features of an sEMG time-domain union feature and an accelerometer feature, by SVM classifier; however, the results with the fused features of an sEMG wavelet feature and an accelerometer feature seem to be better.In general, the recognition effect of the by using a sliding window analysis in signal processing, making our approach more advantageous in the real-time control of intelligent devices.
However, the signal acquisition of this study is mainly from healthy subjects, and the study has not been applied to the real-time control of intelligent prosthesis yet.In a follow-up study, we will carry out experiments on more subjects with limb disabilities by cooperating with a rehabilitation hospital, so that our research has better application prospects in the intelligent control of prosthesis.

Figure 1 .
Figure 1.Settings of analysis window and sliding window.

Figure 1 .
Figure 1.Settings of analysis window and sliding window.

Figure 2
Figure 2 shows the change of x-axis and z-axis accelerometer signals when walking downstairs.It can be seen that in the same gait period, the accelerometer signals are sequential and periodic.Therefore, it is feasible to match the accelerometer signals of unknown motion patterns with a template built based on training samples, and the matching distance can be used to measure the acceleration change of unknown motions and find a more similar motion with the training samples.

Figure 2
Figure 2 shows the change of x-axis and z-axis accelerometer signals when walking downstairs.It can be seen that in the same gait period, the accelerometer signals are sequential and periodic.Therefore, it is feasible to match the accelerometer signals of unknown motion patterns with a template built based on training samples, and the matching distance can be used to measure the acceleration change of unknown motions and find a more similar motion with the training samples.

Figure 2 .
Figure 2. Change of accelerometer signals when walking down stairs.

Figure 2 .
Figure 2. Change of accelerometer signals when walking down stairs.
get D = 170 by using the computational formula of Euclidean distance: D

Figure 3 .
Figure 3. Marks of start and end points of original signals.

Figure 3 .
Figure 3. Marks of start and end points of original signals.

Figure 6 .
Figure 6.De-noising results of curve fitting method based on median filtering.

Figure 6 .
Figure 6.De-noising results of curve fitting method based on median filtering.Figure 6. De-noising results of curve fitting method based on median filtering.

Figure 6 .
Figure 6.De-noising results of curve fitting method based on median filtering.Figure 6. De-noising results of curve fitting method based on median filtering.

Table 1 .
Results of three methods for signal onset detection.

Table 1 .
Results of three methods for signal onset detection.

Table 5 .
Classification results of fused features of sEMG and accelerometer signals (%).