Applying Improved Multiscale Fuzzy Entropy for Feature Extraction of MI-EEG

Electroencephalography (EEG) is considered the output of the brain, and it is a bioelectrical signal with multiscale and nonlinear properties. Motor Imagery EEG (MI-EEG) not only correlates closely with human imagination and movement intention but also contains a large amount of physiological and disease information. As a result, it has been studied extensively in the field of rehabilitation. To correctly interpret MI-EEG signals and accurately extract their features, many nonlinear dynamic methods based on entropy, such as Approximate Entropy (ApEn), Sample Entropy (SampEn), Fuzzy Entropy (FE), and Permutation Entropy (PE), have been proposed and exploited in recent years. However, these entropy-based methods can only measure the complexity of MI-EEG on a single scale and therefore fail to account for the multiscale property inherent in MI-EEG. To solve this problem, Multiscale Sample Entropy (MSE), Multiscale Permutation Entropy (MPE), and Multiscale Fuzzy Entropy (MFE) were developed by introducing a scale factor. However, MFE has not been widely used in the analysis of MI-EEG, and the same parameter values are employed when the MFE method calculates the fuzzy entropy values on multiple scales. In fact, each coarse-grained MI-EEG carries the characteristic information of the original signal on a different scale factor, so it is necessary to optimize the MFE parameters to discover more feature information. In this paper, the parameters of MFE are optimized independently for each scale factor, and the improved MFE (IMFE) is applied to the feature extraction of MI-EEG. Based on the event-related desynchronization (ERD)/event-related synchronization (ERS) phenomenon, IMFE features from multiple channels are fused to construct the feature vector. Experiments are conducted on a public dataset using a Support Vector Machine (SVM) as the classifier. The results of 10-fold cross-validation show that the proposed method yields relatively high classification accuracy compared with other entropy-based and classical time–frequency–space feature extraction methods. A t-test is used to verify the effectiveness of the improved MFE.


Introduction
Stroke is a disease that can cause lethal damage to human health. Stroke patients often experience motor dysfunction, and it is critical to help them restore their motor function. Motor Imagery Electroencephalography (MI-EEG) is a bioelectrical signal that carries a large amount of physiological and disease information. As a result, much attention has been paid to its application in the rehabilitation field. The active rehabilitation of patients can be realized by identifying MI-EEG, and accurate feature extraction of MI-EEG is the key to its successful application [1,2]. MI-EEG is a nonlinear and non-stationary signal, and many researchers have devoted their efforts to exploring its feature extraction from the perspective of the time, frequency, and spatial domains.
There are three main kinds of classical feature extraction methods: the Autoregressive (AR) model, the Wavelet Transform (WT), and the Common Spatial Pattern (CSP). The basic idea of the AR model is to approximate a real EEG signal with an AR process and then use the AR model coefficients as the features of the EEG signal. This method is simple and has good real-time performance, but it is a time domain analysis method intended for stationary signals, and the length of the data segment determines the resolution and accuracy of parameter estimation [3]. The WT method uses scale and shift operations to perform multiscale decomposition and time-frequency localization, effectively obtaining the time-frequency information of signals. Thus, the analysis of EEG signals can benefit from WT [4]. However, recent studies do not support the use of wavelet features for the discrimination of EEG signals because of the redundant and irrelevant information contained in wavelet coefficients [5]. The CSP method finds directions that maximize variance for one class and minimize variance for the opposite class by using the theory of simultaneous matrix diagonalization [6]. The performance of CSP is closely related to its operational frequency band, and setting a broad frequency range in CSP generally yields poor classification accuracy [7]. To overcome this problem, the Common Spatio-Spectral Pattern (CSSP) [8], Sub-band Common Spatial Pattern (SBCSP) [9], and Filter Bank Common Spatial Pattern (FBCSP) [10] have been proposed on the basis of CSP and widely applied to the feature extraction of EEG signals.
With the development of nonlinear dynamics, it has been shown that the brain is a nonlinear dynamic system, and EEG can be considered as the output of that system. To obtain better classification results, some researchers have used various complexity measures, such as dimensions and entropies, to extract the features of EEG signals. However, their calculation frequently faces the problem of insufficient data points. Moreover, since all recorded signals are polluted by noise in some way, most defined dimensions and entropies cannot be estimated accurately from experimental data. To address the insufficient and noisy data problems in physiological signals, Pincus [11] put forward Approximate Entropy (ApEn), which can measure the complexity of time series. Since its introduction, ApEn has been widely used on physiological signals such as EEG [12,13] and has shown advantages over most complexity measures, for instance the correlation dimension and the Lyapunov exponent. Nevertheless, it lacks relative consistency and its result relies heavily on the data length, both of which are caused by self-matching.
To tackle these problems, Richman [14] presented Sample Entropy (SampEn), in which there is no self-matching. Since then, SampEn has seen some application in the feature extraction of EEG [15,16]. Zhou et al. calculated the SampEn of the MI-EEG signal, and the classification accuracy was between 50% and 87.8% with a Linear Discriminant Analysis (LDA) classifier [15]; Wang et al. used SampEn as the feature of MI-EEG, and the classification rate was between 75.48% and 78.68% using a Support Vector Machine (SVM) optimized by a Genetic Algorithm (GA) [16]. These applications indicate that SampEn possesses relative consistency and is less dependent on data length. However, the Heaviside function is used to measure the similarity of reconstructed vectors in the computation of both ApEn and SampEn, and the abrupt jump of the Heaviside function leaves both statistical measures without continuity. To address this disadvantage, Chen et al. developed a new statistic, Fuzzy Entropy (FE), which can evaluate the self-similarity of time series [17]. Compared with the calculation procedure of ApEn and SampEn, FE replaces the Heaviside function with a fuzzy membership function. It not only has stronger relative consistency and is less dependent on data length, but also achieves continuity and more resistance to noise. FE has been widely applied in EEG analysis.
Permutation Entropy (PE) [20] can also detect dynamic complexity changes in time series, and it has been widely applied to the analysis of EEGs [21,22]. However, PE has some limitations. It is unable to extract the complexity information from data with spiky features or abrupt changes in magnitude, and it easily ignores the information contained in small-probability events. Subsequently, Weighted-Permutation Entropy (WPE) [23] and Permutation Rényi Entropy (PEr) [24] were introduced to improve the performance of PE and were exploited for the feature extraction of EEG.
However, ApEn, SampEn, PE, and FE are all single-scale measures and therefore fail to account for the multiple scales inherent in brain electrical activity. Costa et al. therefore proposed Multiscale Entropy (MSE) by introducing a scale factor on the basis of SampEn [25,26]. MSE can measure the complexity of time series over multiple scales instead of a single scale and has been used on EEG signals for sleep staging and driver fatigue detection [27,28]. Motivated by the merits of PE and MSE, Aziz and Arif put forward Multiscale Permutation Entropy (MPE) [29]. Ouyang et al. extracted the features of EEG by calculating its MPE, and the classification accuracy was 90.6% with an LDA classifier [30]. Furthermore, Morabito et al. proposed Multivariate Multi-Scale Permutation Entropy (MMPE) to analyze multi-channel data simultaneously as a single block and applied it to a complexity analysis of Alzheimer's disease EEGs [31]. Zheng et al. proposed Multiscale Fuzzy Entropy (MFE) by combining FE with a scale factor and used it for rolling bearing fault type recognition [32]. At present, MFE is mainly applied to fault diagnosis and has shown its superiority over most complexity measures such as ApEn, SampEn, FE, and PE. Recently, Azami et al. proposed the refined composite multivariate multiscale fuzzy entropy (RCmvMFE) based on MFE and applied it to feature extraction on intracranial EEG data and Fantasia data; the average classification accuracies on the two datasets were 96% and 75%, respectively, with an SVM classifier [33].
However, there are few reports on the application of MFE to MI-EEG signal analysis. In addition, the MFE method employs the same parameter values to calculate FE on all scales. From the perspective of signal processing, the coarse-graining of a time series amounts to sampling the signal after low-pass filtering, so each coarse-grained time series carries the characteristic information of the original signal on a different scale factor and has its own complexity. Therefore, it is necessary to optimize the parameters separately and use different parameters in the calculation of MFE on different scale factors. This makes the measurement of signal complexity more reasonable and enhances the adaptability of MFE. In this paper, MFE is improved by an independent optimization strategy for the parameters on different scale factors, and the improved MFE (IMFE) is applied to the feature extraction of MI-EEG.
The paper is organized as follows. In Section 2, the basic principles of FE, MFE, and SVM are briefly introduced. Section 3 describes the working process of IMFE in detail. In Section 4, extensive experiments are conducted on a publicly available dataset. Section 5 concludes the paper.

Fuzzy Entropy
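Fuzzy Entropy (FE) is defined to measure the complexity and irregularity of a time series. The computation process of FE is as follows [18,19]:
1. Assume that a time series is denoted as $X = \{x(i) : 1 \le i \le N\}$, where $N$ is the length of the time series. The mean $x_0(i)$ of $m$ consecutive $x(i)$ values is calculated as
$$x_0(i) = \frac{1}{m}\sum_{j=0}^{m-1} x(i+j) \tag{1}$$
where the parameter $m$ is called the embedding dimension and is a positive integer. The $m$-dimensional vector $X_i^m$ is then constructed as
$$X_i^m = \{x(i), x(i+1), \ldots, x(i+m-1)\} - x_0(i), \quad i = 1, 2, \ldots, N-m+1 \tag{2}$$
2. Suppose that $d_{ij}^m$ ($i, j = 1 \sim N-m$; $j \ne i$) denotes the maximum distance between $X_i^m$ and $X_j^m$. Then $d_{ij}^m$ is calculated according to Equation (3):
$$d_{ij}^m = \max_{k \in \{0, \ldots, m-1\}} \left| \left(x(i+k) - x_0(i)\right) - \left(x(j+k) - x_0(j)\right) \right| \tag{3}$$
3. $\mu(d_{ij}^m, n, r)$ denotes a fuzzy function:
$$\mu(d_{ij}^m, n, r) = \exp\left(-\frac{(d_{ij}^m)^n}{r}\right) \tag{4}$$
where exp(·) denotes the exponential function, the parameter $n$ is the boundary gradient, and $r$ is the boundary width. The similarity degree $D_{ij}^m$ between $X_i^m$ and $X_j^m$ is then given as
$$D_{ij}^m = \mu(d_{ij}^m, n, r) \tag{5}$$
4. $\Phi^m(n, r)$ is obtained from Equation (6):
$$\Phi^m(n, r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\left(\frac{1}{N-m-1}\sum_{j=1,\, j \ne i}^{N-m} D_{ij}^m\right) \tag{6}$$
5. Repeat Steps (1)-(4) to obtain the $(m+1)$-dimensional vectors $X_i^{m+1}$; $\Phi^{m+1}(n, r)$ can then be described as
$$\Phi^{m+1}(n, r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\left(\frac{1}{N-m-1}\sum_{j=1,\, j \ne i}^{N-m} D_{ij}^{m+1}\right) \tag{7}$$
6. The FE of the time series $\{x(i) : 1 \le i \le N\}$ is calculated as
$$FE(X, m, n, r) = \lim_{N \to \infty}\left[\ln \Phi^m(n, r) - \ln \Phi^{m+1}(n, r)\right] \tag{8}$$
where ln(·) denotes the natural logarithm function. If $N$ is finite, $FE(X, m, n, r)$ can be expressed as
$$FE(X, m, n, r) = \ln \Phi^m(n, r) - \ln \Phi^{m+1}(n, r) \tag{9}$$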
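The six steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper (which was written in Matlab): the function name `fuzzy_entropy` is hypothetical, and the membership function is assumed to take the form exp(−dⁿ/r); some formulations in the literature use exp(−(d/r)ⁿ) instead, and the source does not make the exact form recoverable here.

```python
import math


def fuzzy_entropy(x, m=2, n=2, r=0.1):
    """Finite-N fuzzy entropy estimate: ln(Phi^m) - ln(Phi^(m+1))."""
    N = len(x)
    if N <= m + 1:
        raise ValueError("series too short for embedding dimension m")
    if r <= 0:
        raise ValueError("boundary width r must be positive")

    def phi(dim):
        # Use N - m vectors for both dim = m and dim = m + 1.
        count = N - m
        vectors = []
        for i in range(count):
            segment = x[i:i + dim]
            mean = sum(segment) / dim
            vectors.append([v - mean for v in segment])  # remove local mean
        total = 0.0
        for i in range(count):
            for j in range(count):
                if i == j:
                    continue  # no self-matching
                # Maximum (Chebyshev) distance between the two vectors.
                d = max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
                # ASSUMED membership form exp(-(d^n) / r).
                total += math.exp(-(d ** n) / r)
        return total / (count * (count - 1))

    return math.log(phi(m)) - math.log(phi(m + 1))
```

The double loop is quadratic in the series length, which is acceptable for segments of a few hundred samples such as those used in this paper; a vectorized version would be preferable for longer recordings.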

Multiscale Fuzzy Entropy
Multiscale Fuzzy Entropy (MFE) is defined to measure the complexity and irregularity of a time series on multiple scale factors. A brief description of MFE is as follows [32]:
1. Assume that a time series is denoted as $X = \{x(i) : 1 \le i \le N\}$, where $N$ is the length of the time series. The coarse-grained time series $\{y(\tau)\}$ is constructed as $\{y_1(\tau), y_2(\tau), \ldots, y_{N/\tau}(\tau)\}$, where $\tau$ is a positive integer. $y_j(\tau)$ is computed based on Equation (10):
$$y_j(\tau) = \frac{1}{\tau}\sum_{i=(j-1)\tau+1}^{j\tau} x(i), \quad 1 \le j \le N/\tau \tag{10}$$
For $\tau = 1$, the time series $\{y(1)\}$ is the original time series. The length of each coarse-grained time series equals the length of the original time series divided by the scale factor $\tau$. The coarse-grained procedure is shown in Figure 1.
2. The FE of each coarse-grained time series can be computed according to Equations (1)-(9), and MFE is expressed by Equation (11) as a function of the scale factor $\tau$:
$$MFE(X, \tau, m, n, r) = FE(y(\tau), m, n, r) \tag{11}$$
This procedure is called MFE analysis.
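The coarse-graining step is simply a non-overlapping block average. The sketch below illustrates it; the function name `coarse_grain` is hypothetical, and samples left over when the length is not a multiple of τ are assumed to be dropped.

```python
def coarse_grain(x, tau):
    """Average consecutive, non-overlapping blocks of length tau."""
    if tau < 1:
        raise ValueError("scale factor tau must be a positive integer")
    n_out = len(x) // tau  # trailing samples that do not fill a block are dropped
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_out)]
```

For example, `coarse_grain([1, 2, 3, 4, 5, 6, 7], 2)` averages the pairs (1, 2), (3, 4), and (5, 6) and discards the final sample, while τ = 1 returns the original values unchanged.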

Support Vector Machine
The theory of Support Vector Machines (SVM) has received much attention in recent years. The basic idea of SVM is as follows. First, it maps the input points to a high-dimensional feature space by a nonlinear transformation; it then finds an optimal classification hyperplane in this space by maximizing the margin between the two classes. In this paper, SVM is chosen as the classifier to recognize MI-EEG, and the radial basis function is selected as the kernel function. Furthermore, the parameters of SVM, including the kernel parameter and the error penalty factor, are optimized using a traversal search method.

Description of Feature Extraction
Based on the idea of independent optimization of parameters, the standard MFE is improved, and the improved MFE (IMFE) method is applied to the feature extraction of MI-EEG. The specific steps can be summarized as follows:
1. The optimal selection of time interval for MI-EEG
Suppose that the original MI-EEG signal of the Lth channel in a trial is $X_L^0 = \{x_L(1), x_L(2), \ldots, x_L(K)\}$, $L = 1, 2, \ldots, C$, where $C$ and $K$ are the number of channels and the number of sampling points per trial, respectively. The FE time series of every channel of MI-EEG is calculated for each training sample of the different imaginary tasks. To obtain the mean FE time series, these series are superimposed and then averaged for every imaginary task. The optimal sampling interval is chosen so that there is a significant difference between the mean fuzzy entropies of MI-EEG on two channels for every imaginary task. A new EEG signal $X_L^1$ is formed by selecting the data points in the optimal sampling interval from the original EEG signal $X_L^0$, and it can be expressed as $X_L^1 = \{x_L(d), x_L(d+1), \ldots, x_L(e)\}$, where $d$ is the first selected point from $X_L^0$, $e$ is the last selected point from $X_L^0$, and $N = e - d + 1$ is the number of selected data points.

2. The coarse-grained procedure of MI-EEG
The coarse-grained MI-EEG signals of $X_L^1$ on multiple scale factors $\tau = 1, 2, \ldots, \tau_{\max}$ can be obtained according to Section 2.2 and denoted as $\{X_{L,1}^1, X_{L,2}^1, \ldots, X_{L,\tau_{\max}}^1\}$, where $\tau_{\max}$ is the maximum of scale factor $\tau$ and $X_{L,j}^1$ represents the coarse-grained MI-EEG of the Lth channel for the jth scale.

3. The calculation of MFE
The FE of each coarse-grained MI-EEG signal can be calculated according to Section 2.1. The FE of the coarse-grained sequence $X_{L,j}^1$ is denoted as $FE_{L,j}^1$. Thus, the MFE of the Lth channel MI-EEG is given by Equation (12):
$$MFE_L^1 = \{FE_{L,1}^1, FE_{L,2}^1, \ldots, FE_{L,\tau_{\max}}^1\} \tag{12}$$
4. The parameters' independent optimization of MFE for different scale factors τ
On different scale factors, the parameters, including the embedding dimension $m$, boundary gradient $n$, and boundary width $r$, directly influence the MFE value. To obtain feature vectors that are beneficial to classification, the relevant parameters are optimized independently. For each scale factor τ, two of the parameters $m$, $n$, and $r$ are held fixed while the third is varied, and the curves of the average and standard deviation of the MFE of MI-EEG against the varied parameter are calculated for each imaginary task. The optimal parameter values are determined by considering the fluctuation of the error line for each imaginary task, the degree of overlap of the error lines, and the difference of means between the tasks. After the independent optimization of the parameters, the MFE of the Lth channel MI-EEG is expressed as
$$IMFE_L = \{IFE_{L,1}, IFE_{L,2}, \ldots, IFE_{L,\tau_{\max}}\} \tag{13}$$
where $IFE_{L,\tau}$ represents the improved FE of the Lth channel MI-EEG for scale factor τ.
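Steps 2 to 4 can be combined into one short routine that takes a separate parameter set for every scale factor. The sketch below is illustrative only: the names `imfe` and `params_by_scale` are hypothetical, the membership function is assumed to be exp(−dⁿ/r), and the boundary width is taken as a multiple of the standard deviation of the coarse-grained series at each scale, which is how the experimental section of this paper distinguishes IMFE from standard MFE.

```python
import math
import statistics


def coarse_grain(x, tau):
    """Non-overlapping block average with block length tau."""
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(len(x) // tau)]


def fuzzy_entropy(x, m, n, r):
    """Finite-N fuzzy entropy with ASSUMED membership exp(-(d^n) / r)."""
    count = len(x) - m
    if count < 2 or r <= 0:
        raise ValueError("series too short or non-positive boundary width")

    def phi(dim):
        vecs = []
        for i in range(count):
            seg = x[i:i + dim]
            mean = sum(seg) / dim
            vecs.append([v - mean for v in seg])
        total = 0.0
        for i in range(count):
            for j in range(count):
                if i != j:
                    d = max(abs(a - b) for a, b in zip(vecs[i], vecs[j]))
                    total += math.exp(-(d ** n) / r)
        return total / (count * (count - 1))

    return math.log(phi(m)) - math.log(phi(m + 1))


def imfe(x, params_by_scale):
    """Return one FE value per scale, each with its own (m, n, r_factor)."""
    features = []
    for tau in sorted(params_by_scale):
        m, n, r_factor = params_by_scale[tau]
        y = coarse_grain(x, tau)
        # Boundary width follows the SD of the coarse-grained series,
        # not the SD of the original signal.
        r = r_factor * statistics.pstdev(y)
        features.append(fuzzy_entropy(y, m, n, r))
    return features
```

A call such as `imfe(signal, {1: (2, 2, 0.1), 2: (2, 2, 0.1), 3: (2, 2, 0.1), 4: (2, 2, 0.1)})` yields a four-element feature list for one channel; passing different tuples per scale is what the independent optimization permits.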

5. The construction of feature vector
The improved multiscale fuzzy entropies of MI-EEG signals on all channels are fused serially to construct a feature vector; alternatively, based on the characteristics of MI-EEG, the improved multiscale fuzzy entropies on the relevant channels are fused. Only the feature vector after serial fusion is given here, by Equation (14):
$$F = \{IMFE_1, IMFE_2, \ldots, IMFE_C\} \tag{14}$$
where $F$ represents the feature vector of MI-EEG in a trial.

Data Source
The experimental data were derived from dataset III of Brain Computer Interface (BCI) Competition II provided by the BCI Lab, Graz University of Technology in Graz, Austria (http://www.bbci.de/competition/ii/). The dataset was obtained by collecting the EEG signals of a healthy adult female while she was imagining left hand or right hand movement. The dataset was composed of 280 trials, of which 140 were used for training and 140 were used for testing. Both the training set and the testing set included 70 trials of imagined left hand movement and 70 trials of imagined right hand movement. Each trial lasted for 9 s, and the timing diagram of the experiment is shown schematically in Figure 2.
As shown in Figure 2, for the first two seconds the subject remained quiet and relaxed; when the time reached 2 s, a short beep indicated the start of the trial and the '+' cursor appeared on the monitor simultaneously. When the time was 3-9 s, the visual cue (left-right arrow) was displayed as the direction of motor imagery. At the same time, the subject imagined the hand movement according to the direction indicated by the arrow. The data were sampled at 128 Hz. The three channels, C3, Cz, and C4, were used to acquire EEG with Ag/AgCl electrodes, and the placement of the electrodes is shown in Figure 3.

Optimal Selection of Time Interval for MI-EEG
The MI-EEG signals on channels C3 and C4 from the 280 trials were selected as the experimental data. Based on the event-related desynchronization (ERD)/event-related synchronization (ERS) phenomenon associated with actual or imagined hand movement, the optimal sampling interval of MI-EEG is chosen so that there is a significant difference between the fuzzy entropies corresponding to the two motor imagery tasks. First, for the 140 trials of imaginary left hand movement on channel C3, the FE time series of each MI-EEG was obtained using a sliding time window, where the window length was 1 s, the step was one sampling point, and the parameters m, n, and r were set to 2, 2, and 0.1 SD, respectively. Next, the 140 FE time series of imaginary left hand movement on channel C3 were superimposed and averaged to obtain their mean FE time series. In a similar way, the mean FE time series of the 140 trials of imaginary left hand movement on channel C4 was calculated. The mean FE time series of the 140 trials of imaginary right hand movement on channels C3 and C4 were obtained as well. The experimental results are shown in Figure 4. The solid magenta line represents the mean FE time series of 140 trials of MI-EEG on channel C3 for each imaginary task, and the green dotted line represents the mean FE time series of 140 trials of MI-EEG on channel C4 for each imaginary task.
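The sliding-window averaging described above can be sketched as follows. The helper names `sliding_feature` and `mean_series` are hypothetical, and the `feature` argument stands in for whatever per-window statistic is used (FE in this paper); the sketch only shows the windowing and the trial averaging.

```python
def sliding_feature(signal, window, feature, step=1):
    """Apply `feature` to each window of `window` samples, moving by `step`."""
    last_start = len(signal) - window
    return [feature(signal[s:s + window]) for s in range(0, last_start + 1, step)]


def mean_series(series_list):
    """Average several equally long series point by point across trials."""
    length = len(series_list[0])
    if any(len(s) != length for s in series_list):
        raise ValueError("all series must have the same length")
    return [sum(values) / len(series_list) for values in zip(*series_list)]
```

With a 9 s trial sampled at 128 Hz (1152 samples) and a 1 s window of 128 samples moved one sample at a time, each trial yields 1152 − 128 + 1 = 1025 window values.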
As seen in Figure 4, for either imaginary task the mean FE on channels C3 and C4 changes with the sampling point. When the sampling interval is [451, 900], the difference between the mean FE values on channels C3 and C4 is remarkable. Therefore, this sampling interval is used in the following feature extraction of MI-EEG.

Multiscale Analysis
The entropy of time series is usually used to characterize its complexity, but the entropy variation of some sequences may be inconsistent on different scale factors.If the majority of scales' entropy values are higher for one time series than for another, the former is considered more complex than the latter.
We randomly selected two trials of MI-EEG signals on channel C3, one derived from imaginary left hand movement and the other from imaginary right hand movement. Then, we calculated the MFE values of the two trials. Here, $\tau_{\max}$ was equal to 4, and in the calculation of MFE values on different scale factors, the parameters m, n, and r were set to 2, 2, and 0.1 SD, respectively, where SD was the standard deviation of the original MI-EEG signal. The experimental result is shown in Figure 5.

From Figure 5, we can see that the FE of the MI-EEG signal for imaginary left hand movement is smaller than that for imaginary right hand movement when τ = 1. This means that the latter is more complex than the former. However, when τ = 2, 3, and 4, the FE values of the MI-EEG signal for imaginary left hand movement are all higher than those for imaginary right hand movement at the same scale factor, which shows that the former is more complex than the latter. So, it is unreasonable to analyze the complexity of a time series on a single scale with FE. In addition, it can be seen that the coarse-grained MI-EEG signal on each scale factor contains important information related to the imaginary task. To obtain more information, it is necessary to conduct multiscale analysis of MI-EEG.

Construction of Feature Vector
After performing multiscale analysis for MI-EEG on all channels, the feature vector can be constructed in a variety of forms. If the multiscale fuzzy entropies of MI-EEG on all channels are fused serially, the feature vector is obtained by Equation (15):
$$F_1 = \{MFE_{C3}^1, MFE_{C4}^1, MFE_{Cz}^1\} \tag{15}$$
where each term contains the FE values for $\tau = 1, 2, \ldots, \tau_{\max}$; $\tau_{\max}$ is the maximum of τ; and $MFE_{C3}^1$, $MFE_{C4}^1$, and $MFE_{Cz}^1$ are calculated by Equation (12) and stand for the MFE of MI-EEG on channels C3, C4, and Cz, respectively.
Considering the ERD/ERS phenomenon of MI-EEG on channels C3 and C4, we can also select only the MFE values of MI-EEG on those channels and combine them through a specific operation to construct a feature vector that guarantees a sharp distinction between the two imaginary tasks. The resulting feature vector $F_2$ is given by Equation (16). In the calculation of fuzzy entropy on different scale factors, the parameters m, n, and r were set to 2, 2, and 0.1 SD, respectively; SD was the standard deviation of the original MI-EEG signal, and $\tau_{\max}$ was equal to 4.
To find the best means of feature vector construction, experiments were conducted on a public dataset using SVM as the classifier. In addition, to eliminate contingency in the feature extraction process and increase the objectivity of feature evaluation, 10-fold Cross-Validation (CV) was employed. This means that the data, comprising 280 trials, were randomly divided into 10 subsets, each of which was used in turn as the validation set while the remaining subsets were used for training. The experiment environment was as follows: Win7 operating system, 4 GB of memory, and Matlab R2014a as the programming language. The results of 10-fold CV are listed in Table 1. Table 1 shows that the feature vector $F_2$ constructed by Equation (16) has certain advantages over $F_1$ constructed by Equation (15); its highest classification accuracy and average classification rate with 10-fold CV were 100% and 90.36%, respectively. This suggests that the feature vector $F_2$ captures more of the feature information contained in the MI-EEG signal. Therefore, feature vector $F_2$ is employed in the following experiments.
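The 10-fold partition described above can be illustrated with a short sketch. The generator name `k_fold_indices` and the fixed seed are assumptions for illustration; the paper does not state how its random split was produced, and no stratification by class is assumed here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment of trials to folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for f, fold in enumerate(folds) if f != i for idx in fold]
        yield train, validation
```

For the 280 trials used here, each of the 10 folds holds 28 validation trials and the remaining 252 trials form the training set for that fold; the reported average classification rate is the mean accuracy over the 10 folds.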

The Parameters' Independent Optimization of MFE
In the course of calculating MFE, four parameters, i.e., the scale factor τ, embedding dimension m, boundary gradient n, and boundary width r, should be determined in advance. If the scale factor τ is too large, the calculation of MFE suffers from insufficient data points, while a scale factor that is too small does not give access to the deeper information of MI-EEG. From Section 4.2 we can see that the change of FE is significant when the time range is [451, 900]. Meanwhile, to ensure that the calculation of FE is not affected by the data length, each time series should contain at least 100 points. As a result, $\tau_{\max}$ was set to 4. The remaining three parameters were determined by experiments. First, with τ, m, n, and r given fixed values, we calculated the fuzzy entropy of the 140 trials of MI-EEG on channels C3 and C4 for imaginary left hand movement.
Then, we calculated the 140 differences between the FE on channel C3 and the FE on channel C4, defined as $FED = FE_{C3} - FE_{C4}$, where $FE_{C3}$ and $FE_{C4}$ stand for the FE of MI-EEG on channels C3 and C4, respectively. Finally, we calculated the mean and standard deviation of the 140 FED values, denoted as $M_{FED}$ and $SD_{FED}$, respectively. For a given τ, we could obtain the variation curves of $M_{FED}$ and $SD_{FED}$ with any one parameter while the others were kept constant. When the scale factor τ was 1, we obtained three curves, which are presented in Figure 6. The solid pink line represents imaginary left hand movement. Similarly, the results associated with imaginary right hand movement are displayed with a green dotted line. Figure 6a gives the variation curves of $M_{FED}$ and $SD_{FED}$ with parameter m when τ = 1, n = 2, and r = 0.1 SD, where SD is the standard deviation of the original MI-EEG. When m equals 1, $SD_{FED}$ is small for both imaginary tasks, which means the MI-EEG features are tightly clustered within each task, but the $M_{FED}$ values corresponding to the two imaginary tasks are quite close, which means the two tasks are poorly separated. As m increases, $M_{FED}$ first increases and then remains stable for both imaginary tasks. The larger m is, the more accurate the calculation of FE is and the more detailed the information it captures; however, the computation also becomes more complex and requires more data points. Taking into account the constraints of the experimental dataset, the parameter m is set to 2.
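The two summary statistics used for parameter selection can be computed as in the sketch below. The function name `fed_statistics` is hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def fed_statistics(fe_c3, fe_c4):
    """Return (M_FED, SD_FED) for paired per-trial FE values on C3 and C4."""
    if len(fe_c3) != len(fe_c4):
        raise ValueError("both channels need one FE value per trial")
    fed = [a - b for a, b in zip(fe_c3, fe_c4)]  # FED = FE_C3 - FE_C4
    # ASSUMPTION: sample standard deviation; pstdev would give the population form.
    return statistics.mean(fed), statistics.stdev(fed)
```

Evaluating this function once per candidate parameter value and per imaginary task gives the points of one curve in Figure 6; a good parameter value keeps the two tasks' means far apart while keeping each standard deviation moderate.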
Figure 6b displays the variations of $M_{FED}$ and $SD_{FED}$ with parameter n when τ = 1, m = 2, and r = 0.1 SD, where SD is the standard deviation of the original MI-EEG. When parameter n is 1, the $M_{FED}$ values corresponding to the two imaginary tasks are quite different, which means the tasks are well separated, but the $SD_{FED}$ values are also very large for the two tasks, which means the MI-EEG features are too scattered within each task. When parameter n equals 3 or 4, $SD_{FED}$ is small for both imaginary tasks, but the $M_{FED}$ values corresponding to the two imaginary tasks are very close. When parameter n equals 2, $SD_{FED}$ is moderate for both imaginary tasks, while the $M_{FED}$ values corresponding to the two imaginary tasks are quite different. To sum up, the parameter n is set to 2. Figure 6c shows the variations of $M_{FED}$ and $SD_{FED}$ with parameter r when τ = 1, m = 2, and n = 2. When parameter r is relatively small, the $M_{FED}$ values corresponding to the two imaginary tasks are quite different, but the $SD_{FED}$ values for the two tasks are both large. As r increases, the difference between the $M_{FED}$ values of the two imaginary tasks becomes smaller. In conclusion, the parameter is selected as r = 0.1 SD, where SD is the standard deviation of the original MI-EEG.
In summary, when the scale factor τ is 1, the values of parameters m, n, and r have a significant influence on the FE of MI-EEG for the two imaginary tasks, and this directly affects the quality of the FE features. Therefore, it is necessary to further optimize the parameters of FE when the scale factor τ equals 2, 3, and 4. The $M_{FED}$ and $SD_{FED}$ values for different parameters m, n, and r were obtained using a similar computation process, and their variations are shown in Figures 7-9, respectively. Figures 7-9 were analyzed in the same way as Figure 6. It can be seen that the parameters m = 2, n = 2, and r = 0.1 SD are also suitable for the classification of MI-EEG when τ = 2, 3, and 4. Note that here SD is the standard deviation of the coarse-grained MI-EEG corresponding to each scale factor τ, and not the standard deviation of the original MI-EEG.
To demonstrate the necessity of parameter optimization, a comparison between IMFE and MFE was carried out on a public dataset, with SVM chosen as the classifier. In both computations, r = 0.1SD was selected, but SD was defined differently in the two methods. In the IMFE method, SD was the standard deviation of the coarse-grained MI-EEG corresponding to each scale factor τ, i.e., SD varied with τ. In the MFE method, however, SD was the standard deviation of the original MI-EEG and was therefore constant across scale factors. The experimental results are listed in Table 2. As seen from Table 2, when IMFE is employed to extract the features of MI-EEG, the average classification rate with 10-fold CV increases by 1.78 percentage points, from 90.36% to 92.14%, compared with MFE, and the experimental results of IMFE are more stable than those of MFE. This demonstrates that the independent optimization of the MFE parameters is beneficial for enhancing the accuracy and adaptability of the feature extraction method.
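The difference between the two ways of choosing SD can be illustrated with a small sketch. It assumes the commonly used fuzzy entropy definition with the exponential membership function exp(-(d^n)/r), baseline-removed template vectors, and Chebyshev distance; the function names (`coarse_grain`, `fuzzy_entropy`, `multiscale_fe`), the `per_scale_sd` flag, and the synthetic white-noise signal are illustrative assumptions and are not taken from the paper's implementation.

```python
import numpy as np


def coarse_grain(x, tau):
    """Average consecutive, non-overlapping windows of length tau."""
    x = np.asarray(x, dtype=float)
    n_seg = len(x) // tau
    return x[: n_seg * tau].reshape(n_seg, tau).mean(axis=1)


def _phi(x, dim, count, n, r):
    """Mean fuzzy similarity between `count` templates of length `dim`."""
    idx = np.arange(count)[:, None] + np.arange(dim)[None, :]
    templ = x[idx]
    templ = templ - templ.mean(axis=1, keepdims=True)  # remove local baseline
    # Chebyshev distance between every pair of templates
    dist = np.abs(templ[:, None, :] - templ[None, :, :]).max(axis=2)
    sim = np.exp(-(dist ** n) / r)  # exponential fuzzy membership (assumed form)
    np.fill_diagonal(sim, 0.0)      # exclude self-matches
    return sim.sum() / (count * (count - 1))


def fuzzy_entropy(x, m=2, n=2, r=0.1):
    """Fuzzy entropy: ln(phi_m) - ln(phi_{m+1})."""
    x = np.asarray(x, dtype=float)
    count = len(x) - m  # same number of templates for both dimensions
    return np.log(_phi(x, m, count, n, r)) - np.log(_phi(x, m + 1, count, n, r))


def multiscale_fe(x, scales=(1, 2, 3, 4), m=2, n=2, r_factor=0.1, per_scale_sd=True):
    """FE of each coarse-grained series.

    per_scale_sd=True  -> r uses the SD of each coarse-grained series (IMFE-style)
    per_scale_sd=False -> r uses the SD of the original series (MFE-style)
    """
    x = np.asarray(x, dtype=float)
    sd_original = x.std()
    values = []
    for tau in scales:
        y = coarse_grain(x, tau)
        sd = y.std() if per_scale_sd else sd_original
        values.append(fuzzy_entropy(y, m=m, n=n, r=r_factor * sd))
    return np.array(values)


# Synthetic white noise stands in for one MI-EEG channel (not real data).
rng = np.random.default_rng(0)
signal = rng.standard_normal(600)
imfe = multiscale_fe(signal, per_scale_sd=True)
mfe = multiscale_fe(signal, per_scale_sd=False)
print("per-scale SD:", imfe)
print("original SD: ", mfe)
```

At τ = 1 the coarse-grained series is the original series, so the two variants coincide; they can only differ for τ > 1, where averaging changes the standard deviation of the series and hence the tolerance r.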

Comparison of Multi-Feature Extraction Methods
To compare IMFE with the nonlinear dynamic methods and the classical feature extraction methods, some experiments were conducted on a public dataset using SVM as a classifier.

Comparison with Multiple Nonlinear Dynamic Methods
The proposed IMFE and the other nonlinear dynamic methods, including ApEn, SampEn, FE, PE, WPE, MSE, MPE, and MFE, were used to extract the features of MI-EEG. The experimental results are given in Figure 10. As seen from Figure 10, the classification results of ApEn and SampEn are relatively poor, because they use the Heaviside function to define the similarity of reconstructed vectors. FE replaces the Heaviside function with a fuzzy membership function, and the recognition rate improves. The classification accuracy of WPE is higher than that of PE, because WPE contains amplitude information in addition to the order structure of MI-EEG. Compared with ApEn, SampEn, FE, and PE, the classification rates of MSE, MPE, and MFE are greatly improved. This is because ApEn, SampEn, FE, and PE can only estimate the complexity of a time series on a single scale, whereas MSE, MPE, and MFE measure the complexity of a time series on multiple scale factors. IMFE adds independent parameter optimization to MFE and can adaptively extract more and deeper information, so the classification accuracy is further improved. In addition, its standard deviation (±2.1) is the smallest among the nonlinear dynamic methods, which means that the improved MFE method is more stable.
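For readers who want to reproduce the form of these results, the following sketch shows how a mean accuracy and its standard deviation can be obtained from 10-fold CV with an SVM using scikit-learn. The feature matrix is synthetic, and the feature dimension (12), the RBF kernel, the standardization step, and the use of the sample standard deviation are all assumptions made for illustration; the paper's actual feature vectors and SVM settings are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class features standing in for entropy features (hypothetical).
rng = np.random.default_rng(0)
n_per_class, n_features = 70, 12
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, n_features)),  # class 0: "left hand"
    rng.normal(1.0, 1.0, size=(n_per_class, n_features)),  # class 1: "right hand"
])
y = np.repeat([0, 1], n_per_class)

# Standardize inside the pipeline so each fold is scaled on its training part only.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

# Stratified folds keep the class balance in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)  # one accuracy per fold

mean_acc = scores.mean()
sd_acc = scores.std(ddof=1)  # sample standard deviation over the 10 folds
print(f"10-fold CV accuracy: {100 * mean_acc:.2f}% (SD {100 * sd_acc:.2f})")
```

Keeping the scaler inside the pipeline avoids leaking test-fold statistics into training, which would otherwise make the reported accuracy optimistic.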

Comparison with Multiple Classical Feature Extraction Methods
In this section, experiments were carried out to compare the proposed IMFE with classical feature extraction methods, including AR, WT, CSP, CSSP, and FBCSP. In the experiments, the parameter values of AR and WT were the same as in references [3] and [4], respectively. The parameter values of CSP, CSSP, and FBCSP were the same as in reference [34]. The average classification accuracies with 10-fold CV are shown in Figure 11.
The classification results of AR, WT, CSP, CSSP, and FBCSP are not as good as those of the IMFE method. This is mainly because these classical feature extraction methods take into account information in only one domain (time, frequency, or space), and they even rest on the premise that MI-EEG is a linear signal. In fact, MI-EEG is a typical nonlinear signal. IMFE matches the nonlinear property of the signal, and the independent optimization of the MFE parameters is advantageous for accurately extracting and correctly interpreting the characteristic information of MI-EEG. Furthermore, the minimal standard deviation of IMFE (±2.1) indicates strong stability, which better meets the requirements of a real application.

Comparison of Multiple Recognition Methods
In this section, a comparative study was performed on the same public dataset to prove the effectiveness of the recognition method, i.e., the combination of IMFE and SVM.
First, the combined recognition method of IMFE and SVM was compared in several respects with the top three methods in BCI Competition II [35]. The details are given in Table 3. Note: "-" means that the average classification rate with 10-fold CV is not given in the reference.
From Table 3, it can be seen that the highest recognition rate of 100% is achieved by using IMFE feature extraction and the SVM classifier, a significant increase over the top three methods. Furthermore, the average recognition rate with 10-fold CV is higher than the highest recognition rates of the other three methods.
From Table 4, we can see that the proposed recognition method has the highest classification rate (100%), and its average recognition rate with 10-fold CV (92.14%) is higher than the highest recognition rates of the other methods, except those in references [4,40,49].

Computation Time
The computation time reflects the complexity of a method and is closely related to its applicability in a real system. Figure 12 presents the feature extraction time for one trial using the proposed IMFE method and the conventional feature extraction methods (AR, WT, CSP, CSSP, FBCSP, ApEn, SampEn, FE, PE, WPE, MSE, and MPE). AR, WT, CSP, CSSP, and ApEn consume less time, but their feature extraction performance is not ideal, as shown by the above analysis. The time consumption of SampEn, PE, WPE, MSE, and MPE is at a medium level. FBCSP, FE, and IMFE need more time, especially IMFE, which means that IMFE has a relatively higher computational complexity than the other methods. This is mainly because of the exponential membership function in IMFE. However, it can basically satisfy the requirements of a BCI system.

Statistical Analysis
The IMFE is developed in this paper on the basis of MFE. It is necessary to analyze the differences between IMFE and MFE statistically. In the following, a paired t-test is applied to identify whether there is a significant difference when they are used for feature extraction of MI-EEG.
Suppose that $IFE^{lh}_{L,\tau}$ and $IFE^{rh}_{L,\tau}$ stand for the improved fuzzy entropy of the Lth channel MI-EEG for scale factor τ corresponding to imaginary left hand and right hand movements, respectively. Similarly, $FE^{lh}_{L,\tau}$ and $FE^{rh}_{L,\tau}$ denote the normal fuzzy entropy of the Lth channel MI-EEG for scale factor τ corresponding to imaginary left hand and right hand movements, respectively. Define
$$D_{IMFE} = IFE^{lh}_{L,\tau} - IFE^{rh}_{L,\tau}, \qquad D_{MFE} = FE^{lh}_{L,\tau} - FE^{rh}_{L,\tau}, \qquad D = D_{IMFE} - D_{MFE},$$
and calculate D for each of the channels C3, C4, and Cz and each of the scale factors τ = 2, 3, and 4. Then, we test D as a sample from a normal population $N(\mu_D, \sigma_D^2)$. The null hypothesis is $H_0: \mu_D \le 0$; the alternative hypothesis is $H_1: \mu_D > 0$. The one-tailed paired t-test was chosen (α = 0.05). The decision rule is to reject $H_0$ if
$$t = \frac{\bar{d}}{s_D / \sqrt{n}} \ge t_\alpha(n-1) \qquad (17)$$
or
$$p = P\{t > t_\alpha(n-1)\} \le 0.05, \qquad (18)$$
where $\bar{d}$ and $s_D$ denote the mean and standard deviation of sample D, respectively, and n is the number of elements in sample D. The t-test results are shown in Table 5. From Table 5, we see that all the p values are less than 0.05, so the null hypothesis $H_0$ is rejected at the 0.05 significance level. Therefore, the fuzzy entropy values obtained by IMFE and MFE are significantly different, and IMFE outperforms MFE in discriminating between the two imaginary tasks.
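As a worked illustration of the decision rule, the sketch below computes the one-tailed paired t statistic from the mean and sample standard deviation of D and compares it with a tabulated critical value. The numbers in `d_imfe` and `d_mfe` are synthetic placeholders, not values from Table 5, and the function name `paired_t_statistic` is hypothetical; the critical value 1.895 is the standard table entry for α = 0.05 with 7 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(d_imfe, d_mfe):
    """Return (t, n) for the paired differences D = D_IMFE - D_MFE."""
    diffs = [a - b for a, b in zip(d_imfe, d_mfe)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)   # mean of sample D
    s_d = statistics.stdev(diffs)    # sample SD of D (n - 1 denominator)
    return d_bar / (s_d / math.sqrt(n)), n


# Synthetic left/right entropy differences for 8 paired observations (hypothetical).
d_imfe = [0.32, 0.28, 0.35, 0.30, 0.29, 0.31, 0.33, 0.27]
d_mfe = [0.20, 0.20, 0.20, 0.20, 0.20, 0.20, 0.20, 0.20]

t_stat, n = paired_t_statistic(d_imfe, d_mfe)

# One-tailed critical value t_0.05(7) from a standard t table.
T_CRIT_ALPHA_005_DF_7 = 1.895
reject_h0 = t_stat >= T_CRIT_ALPHA_005_DF_7

print(f"n = {n}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Rejecting H0 in this setting means that the mean of D is significantly greater than zero, i.e., the left/right entropy difference is larger with the improved parameters than with the fixed ones.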

Conclusions
To address the high nonlinearity and multiscale property of MI-EEG, MFE is introduced and improved to measure its complexity. In particular, with the independent parameter optimization strategy, all the parameters of MFE are optimized for each scale factor in sequence, so the MFE of each coarse-grained MI-EEG on a different scale factor is calculated with its own parameter values. This makes IMFE a more accurate multiscale analysis method and helps to reveal the nature of a nonlinear signal in more detail. The improved MFE is applied to the feature extraction of MI-EEG and yields relatively higher classification accuracy than the existing nonlinear dynamic methods and the conventional time, frequency, or spatial domain analysis methods. The statistical results of a paired t-test further illustrate that IMFE has significant advantages over MFE. These results lay the foundation for expanding the application of nonlinear dynamic methods to EEG and even to other bioelectrical signals. However, IMFE requires relatively more computation time than some other methods, mainly because of the exponential fuzzy membership function in MFE. In future work, we will address this problem by simplifying the fuzzy membership function and improving the implementation.
the time series $\{y^{(1)}\}$ is the original time series. The length of each coarse-grained time series equals the length of the original time series divided by scale factor τ. The coarse-grained procedure is shown in Figure 1.

Figure 1. The coarse-grained process of time series for scale factor τ.

(http://www.bbci.de/competition/ii/). The dataset was obtained by collecting the EEG signals of a healthy adult female while she was imagining left hand or right hand movement. The dataset was composed of 280 trials, of which 140 were used for training and 140 were used for testing. The training set and the testing set each included 70 trials imagining left hand movement and 70 trials imagining right hand movement. Each trial lasted for 9 s, and the timing diagram of the experiment is shown schematically in Figure 2.

Figure 2. The timing diagram of the collection experiment.

Figure 4. (a) The mean FE time series of MI-EEG on channels C3 and C4 for imaginary left hand movement; (b) the mean FE time series of MI-EEG on channels C3 and C4 for imaginary right hand movement.

Figure 5. The MFE variation curves with scale factor τ. The pink dotted line and blue solid line represent the MFE of MI-EEG on channel C3 corresponding to imaginary left hand and right hand movement, respectively.

Figure 6. For τ = 1, the mean and standard deviation of FED curves for input variables (a) embedding dimension m; (b) boundary gradient n; and (c) boundary width r.

Figure 6a gives the variation curves of

Figure 8. For τ = 3, the mean and standard deviation of FED curves for input variables (a) embedding dimension m; (b) boundary gradient n; and (c) boundary width r.

Figure 10. The average classification accuracy and standard deviation performed by 10-fold CV for IMFE and multiple nonlinear dynamic methods.

Figure 11. The average classification accuracy and standard deviation performed by 10-fold CV for IMFE and multiple classical feature extraction methods.

Figure 12. A comparison of time consumption using the proposed IMFE and conventional feature extraction methods.
$D_{IMFE} = IFE^{lh}_{L,\tau} - IFE^{rh}_{L,\tau}$, $D_{MFE} = FE^{lh}_{L,\tau} - FE^{rh}_{L,\tau}$, and $D = D_{IMFE} - D_{MFE}$, and calculate D for each of the channels C3, C4, and Cz and each of the scale factors τ = 2, 3, and 4. Then, we tested that D is a sample from a normal population $N(\mu_D, \sigma_D^2)$
Tian et al. extracted the features of MI-EEG signals based on FE, and the average classification accuracy was 87.22% with an LDA classifier [18]; Xu et al. made use of FE to extract attention level features from EEG signals, and the average identification rate reached 81% with an SVM classifier [19]. In addition, Permutation Entropy (PE), which was introduced by Bandt et al. in 2002

Table 1. Comparison of feature vector construction.

Table 2. The influence of parameter optimization in MFE on recognition rate.

Table 3. Comparison with the top three recognition methods in BCI competition II.

Table 4. Comparison with other recognition methods. "represents the combination or optimization of methods for feature extraction or classification; "-" means that average classification rate with 10-fold CV is not given in the reference. BP: Back propagation, WPT: Wavelet Packet Transform, BPR: Biomimetic Pattern Recognition, ICA: Independent Component Analysis, PSD: Power Spectral Density, WE: Wavelet Entropy, MOWT: Maximum Overlap Wavelet Transform, SSE: Singular Spectrum Entropy, KNN: K-Nearest Neighbor, EMD: Empirical Mode Decomposition, FCM: Fuzzy C-means, PSO: Particle Swarm Optimization, PCA: Principal Component Analysis, GHSOM: Growing Hierarchical Self-organizing Map.