Motor Imagery EEG Classification for Patients with Amyotrophic Lateral Sclerosis Using Fractal Dimension and Fisher’s Criterion-Based Channel Selection

Motor imagery is based on the volitional modulation of sensorimotor rhythms (SMRs); however, the sensorimotor processes in patients with amyotrophic lateral sclerosis (ALS) are impaired, leading to degenerated motor imagery ability. Thus, motor imagery classification in ALS patients has been considered challenging in the brain–computer interface (BCI) community. In this study, we address this critical issue by introducing the Grassberger–Procaccia and Higuchi’s methods to estimate the fractal dimensions (GPFD and HFD, respectively) of the electroencephalography (EEG) signals from ALS patients. Moreover, a Fisher’s criterion-based channel selection strategy is proposed to automatically determine the best patient-dependent channel configuration from 30 EEG recording sites. An EEG data collection paradigm is designed to collect the EEG signal of resting state and the imagination of three movements, including right hand grasping (RH), left hand grasping (LH), and left foot stepping (LF). Five late-stage ALS patients without receiving any SMR training participated in this study. Experimental results show that the proposed GPFD feature is not only superior to the previously-used SMR features (mu and beta band powers of EEG from sensorimotor cortex) but also better than HFD. The accuracies achieved by the SMR features are not satisfactory (all lower than 80%) in all binary classification tasks, including RH imagery vs. resting, LH imagery vs. resting, and LF imagery vs. resting. For the discrimination between RH imagery and resting, the average accuracies of GPFD in 30-channel (without channel selection) and top-five-channel configurations are 95.25% and 93.50%, respectively. When using only one channel (the best channel among the 30), a high accuracy of 91.00% can still be achieved by the GPFD feature and a linear discriminant analysis (LDA) classifier. The results also demonstrate that the proposed Fisher’s criterion-based channel selection is capable of removing a large amount of redundant and noisy EEG channels. The proposed GPFD feature extraction combined with the channel selection strategy can be used as the basis for further developing high-accuracy and high-usability motor imagery BCI systems from which the patients with ALS can really benefit.


Introduction
BCI enables individuals with severe motor disabilities to control devices or communicate with others by means of their brain activity. Brain activity can be measured by different modalities. Commonly used include EEG [1], functional magnetic resonance imaging (fMRI) [2], and near-infrared spectroscopy (NIRS) [3][4][5][6]. Among these, EEG is the most frequently used because of many advantages such as lower cost, better portability, and higher temporal resolution. Many EEG BCIs have been 30, 5, and 1, where the 30-channel accuracy (i.e., without channel selection) is used as a baseline to examine the accuracy drop due to the channel reduction. Moreover, K-nearest neighbor (K-NN) and linear discriminant analysis (LDA) classifiers have been widely used in BCI studies [44][45][46]. In this study, we also compare the two based on the FD feature in different cases, including different tasks and different EEG configurations.

Participants
Five patients with ALS were recruited from the Taiwan Motor Neuron Disease Association and participated in this study. All gave informed consent. The ALS functional rating scale-revised (ALSFRS-R) [47] was recorded for each patient. The ALSFRS-R questionnaire assesses the bulbar, limb, and respiratory functions, and quantifies the overall physical function by a total score from 0 (worst) to 48 (normal). The stages of the patients were also evaluated based on the staging criteria for ALS [48,49]. The stages of ALS are defined as the following: stage 1 (symptom onset), stage 2A (diagnosis), stage 2B (involvement of second region), stage 3 (involvement of third region), stage 4A (need for gastrostomy), and stage 4B (need for non-invasive ventilation). A patient can belong to stage 4A and 4B simultaneously. The patients belonging to stage 4A and/or 4B can be considered as late-stage ALS. All patients' data are listed in Table 1. Prior to this study, the patients did not receive any training related to the regulation of the brain rhythms including the SMR training by motor imagery.

EEG Recording and Setting
EEG signals were recorded by a 33-channel electro-cap. The positions of the Ag/AgCl electrodes followed the international 10-20 system (see Figure 1). Electrode gel was applied to the electrodes to keep the impedance below 10 K Ohm. The ground channel was at the forehead, and the reference channels were at the mastoids (A1 and A2). The remaining 30 channels were all used to record EEG signals for analysis. Electrooculography (EOG) signals were recorded by the electrodes attached at the right side of the right eye and above of the left eye, respectively. Both EEG and EOG were amplified, band-pass filtered (0.5-100 Hz), and converted to digital signals with a sampling rate of 500 Hz using the NuAmp amplifier produced by NeuroScan Inc. The ocular artifacts were then removed from the EEG signals using the artifact removal software tool provided by the NeuroScan. The processed EEG signals were all stored in a personal computer for further offline analysis. the patients were consistent. For example, the movement of left foot stepping was decomposed to two motions: standing then stepping across the line with the left foot, as shown in Figure 2.  The motor imagery tasks include left hand grasping, right hand grasping, and stepping across a black line in front of the patient with left foot. The three pictures were presented during the preparation stage so that the patients could practice the designated motor imagery following the movements shown in the pictures.

EEG Data Collection
The motor-imagery EEG data collection session consisted of three runs separated by 1-to 3-min breaks, each run had 40 motor-imagery trials. The motor imagery tasks to be performed in the three runs are LH, RH, and LF, respectively. Prior to the session, there was a resting period of 90 s during which the patients were asked to keep relaxed, try to gaze at the cross centered at the screen and try not to think anything on purpose ( Figure 3). The resting EEG signals of 90 s were segmented into 40 epochs of 3-s length, and there was a 1-s overlap between two consecutive EEG epochs.
Each trial consisted of a 5-s ready-to-start period and a 3-s motor imagery period. During the ready-to-start period, both visual and auditory cues were presented to the patients. Taking the first run as an example, the auditory cue was the sentence: please imagine left hand grasping after hearing an auditory trigger. Its text content was also shown on the screen as a visual cue. In the beginning of the motor imagery period, there was an auditory trigger. The patients started to imagine the designated movement after they perceived this auditory cue, and they were asked to gaze at the Figure 1. Electrode placement. Thirty-three channels are used, among which two are references (A1, A2), one is the ground (GND), and the remaining are used for EEG recording. The positions of the electrodes follow the international 10-20 system.

Preparation
The first four patients sat in a wheelchair (patients Sub1 and Sub3) or normal armchair (Sub2 and Sub4) while facing a 20-inch computer screen. The patient Sub5 reclined in the bed. We projected the computer screen on the wall synchronously by a projector so that the instructions presented during the experiment could be clearly perceived by the patient Sub5.
Three kinds of motor imagery tasks were designed in this study to test the classification accuracy of each motor imagery and no imagery (i.e., resting), including grasping an object by left hand (LH), grasping an object by right hand (RH), and stepping across a line in front of the patient with left foot (LF). All patients were provided with three pictures showing the three kinds of movements, respectively ( Figure 2). They were asked to practice the imagination of the motions during the preparation stage (the stage of applying electro-gel to the 30 electrodes). They were also informed of the motion decomposition in each movement in order to ensure the motor imagery contents across the patients were consistent. For example, the movement of left foot stepping was decomposed to two motions: standing then stepping across the line with the left foot, as shown in Figure 2.   The motor imagery tasks include left hand grasping, right hand grasping, and stepping across a black line in front of the patient with left foot. The three pictures were presented during the preparation stage so that the patients could practice the designated motor imagery following the movements shown in the pictures.

EEG Data Collection
The motor-imagery EEG data collection session consisted of three runs separated by 1-to 3-min breaks, each run had 40 motor-imagery trials. The motor imagery tasks to be performed in the three runs are LH, RH, and LF, respectively. Prior to the session, there was a resting period of 90 s during

EEG Data Collection
The motor-imagery EEG data collection session consisted of three runs separated by 1-to 3-min breaks, each run had 40 motor-imagery trials. The motor imagery tasks to be performed in the three runs are LH, RH, and LF, respectively. Prior to the session, there was a resting period of 90 s during which the patients were asked to keep relaxed, try to gaze at the cross centered at the screen and try not to think anything on purpose ( Figure 3). The resting EEG signals of 90 s were segmented into 40 epochs of 3-s length, and there was a 1-s overlap between two consecutive EEG epochs.  Resting and motor imagery EEG data collection. The motor imagery data collection consisted of three runs separated by 1-to 3-min breaks, and each run had 40 trials. One trial started with a 5-s ready-start period followed by a 3-s motor imagery period during which the participants performed the designated motor imagery task according to both visual and auditory cues presented in the ready-to-start period.

GPFD
The GP algorithm estimates the correlation dimension (D2) or FD of an attractor in phase-space domain. Let x = x(1), x(2), … , x( ) denote the EEG signal of 3 s, where N is the length of the signal ( = 500 Hz 3 s = 1500). The EEG signal can be reconstructed as a set of M-dimensional vectors as where is the time delay and is the embedding dimension. The time delay is a free parameter which needs to be determined experimentally. The correlation integral stands for the probability that the set of the distance between two different reconstructed vectors ( ) and ( ) (i.e., | ( ) − ( )|, ∀ ) fall into the cell of size at a given embedding dimension , where H is a Heaviside function defined as ( ) = 1 if 0; ( ) = 0 otherwise. The correlation dimension is given by The slope of log C( ) versus log at a given can be estimated over the region where log C( ) is approximately linear in log by linear least-squares fitting. The slope of the line is the value at the given . Moreover, the value of gradually increases with increase of , and then achieves saturation. The saturation value is defined as the FD [29]. However, in numerical computation, the saturation value would not maintain at a constant, but varies slightly as the value further increases. Accordingly, the FD was determined based on the steps as follows, Resting and motor imagery EEG data collection. The motor imagery data collection consisted of three runs separated by 1-to 3-min breaks, and each run had 40 trials. One trial started with a 5-s ready-start period followed by a 3-s motor imagery period during which the participants performed the designated motor imagery task according to both visual and auditory cues presented in the ready-to-start period.
Each trial consisted of a 5-s ready-to-start period and a 3-s motor imagery period. During the ready-to-start period, both visual and auditory cues were presented to the patients. Taking the first run as an example, the auditory cue was the sentence: please imagine left hand grasping after hearing an auditory trigger. Its text content was also shown on the screen as a visual cue. In the beginning of the motor imagery period, there was an auditory trigger. The patients started to imagine the designated movement after they perceived this auditory cue, and they were asked to gaze at the fixation cross on the screen while imaging the movement. The trials repeated 40 times in each run. Thus, the motor imagery EEG collection session lasted to about 20 min (8 s × 40 trials × 3 runs + 2 min × 2 breaks between runs). For each patient participant, forty EEG signals of 3 s were obtained for each task (resting, LH imagery, RH imagery, LF imagery). where τ is the time delay and M is the embedding dimension. The time delay is a free parameter which needs to be determined experimentally. The correlation integral

Feature Extraction
stands for the probability that the set of the distance between two different reconstructed vectors y(i) and y(j) (i.e., |y(i) − y(j)|, ∀i = j) fall into the cell of size r at a given embedding dimension M, where H is a Heaviside function defined as H(y) = 1 if y > 0; H(y) = 0 otherwise. The correlation dimension d c is given by The slope of log C(r) versus log r at a given M can be estimated over the region where log C(r) is approximately linear in log r by linear least-squares fitting. The slope of the line is the d c value at the given M. Moreover, the value of d c gradually increases with increase of M, and then achieves saturation. The saturation value is defined as the FD [29]. However, in numerical computation, the saturation value would not maintain at a constant, but varies slightly as the M value further increases. Accordingly, the FD was determined based on the steps as follows, Step 1. initialize M = 2; Step 2. M = M + 1; Step 3. calculate the d c value at the given M; Step 4. repeat step 2 and step 3 until where is a positive number (we set = 0.001). An excessively large value of epsilon could result in an M value for which the corresponding d c value has not yet saturated (e.g., M < 10). On the contrary, an excessively small value of epsilon could lead to the problem that no M value can be obtained. In this study, the value was decided by trial and error. By using the setting of = 0.001, the FDs of all EEG signals were obtained, and their M values were within the range of M > 10.
Examples of the d c − M plots are shown in Figure 4, where each curve was calculated based on a 3-s EEG signal of a mental task. The four curves were based on the signals of resting, LH imagery, RH imagery, and LF imagery, respectively (all from patient Sub1 and recorded at F7). It can be observed from Figure 4 that the M values corresponding to the FDs (the d c values marked in black square) for the four different tasks are different (M = 23, 20, 21, 13 for the four tasks). According to our experiment, the GPFDs of all the EEG signals were found within the range of M = 10 to M = 30. Also, the M values associated with the GPFDs for different trials and different participants were not necessarily the same, depending the result by the aforementioned steps for the FD calculation. a 3-s EEG signal of a mental task. The four curves were based on the signals of resting, LH imagery, RH imagery, and LF imagery, respectively (all from patient Sub1 and recorded at F7). It can be observed from Figure 4 that the values corresponding to the FDs (the values marked in black square) for the four different tasks are different ( = 23, 20, 21, 13 for the four tasks). According to our experiment, the GPFDs of all the EEG signals were found within the range of = 10 to = 30. Also, the M values associated with the GPFDs for different trials and different participants were not necessarily the same, depending the result by the aforementioned steps for the FD calculation.

HFD
The Higuchi method estimates the FD of a time series directly in the time domain without reconstructing the strange attractor, and gives a reasonable FD estimate for a short-time segment [32,39]. For an EEG signal of N samples ： = y(1), y(2), … , y( ) , k new signals are reconstructed by where the integers and stand for the initial time point and the time interval, respectively, and (•) denotes the integer function. The length of each curve is computed as

HFD
The Higuchi method estimates the FD of a time series directly in the time domain without reconstructing the strange attractor, and gives a reasonable FD estimate for a short-time segment [32,39]. For an EEG signal of N samples: y = [y(1), y(2), . . . , y(N)], k new signals are reconstructed by where the integers m and k stand for the initial time point and the time interval, respectively, and int(·) denotes the integer function. The length of each curve y k m is computed as The length L m (k) represents the normalized sum of absolute values of differences between two consecutive points of distance k. The length of the curve for each time interval k, is computed as the average of the m curves: The signal has a fractal dimension of FD if for different values of k. The FD value can be obtained by applying the least-squares linear fitting to calculate the slope of log(L(k)) against log(1/k). The slope of the line is the FD of the EEG signal. The maximum k value, k max was set as 100 in this study. The FD value ranges from 1 (deterministic) to 2 (stochastic signal). Thus, the higher the value of the FD is, the higher the complexity of the signal is. Figure 5 shows the example of log-log plots of L(k) versus 1/k of four different mental tasks.
for different values of . The FD value can be obtained by applying the least-squares linear fitting to calculate the slope of log ( ) against log(1 ⁄ ). The slope of the line is the FD of the EEG signal. The maximum value, was set as 100 in this study. The FD value ranges from 1 (deterministic) to 2 (stochastic signal). Thus, the higher the value of the FD is, the higher the complexity of the signal is. Figure 5 shows the example of log-log plots of ( ) versus 1 ⁄ of four different mental tasks.

Channel Selection Based on Fisher's Criterion
The aim of channel selection is to select the most discriminative EEG channels from the original 30 channels by applying the Fisher's criterion. The FD features from the 30 channels are concatenated to a feature vector (called data hereafter) of dimension , where = 30. Let ∈ denotes the jth training data of class , where = 1, … , , is the mean of the training data of the ith class, and is the mean of the training data from all classes. In this study, we focused on the classification

Channel Selection Based on Fisher's Criterion
The aim of channel selection is to select the most discriminative EEG channels from the original 30 channels by applying the Fisher's criterion. The FD features from the 30 channels are concatenated to a feature vector (called data hereafter) of dimension q, where q = 30. Let x ij ∈ R q denotes the jth training data of class i, where i = 1, . . . , c, m i is the mean of the training data of the ith class, and m is the mean of the training data from all classes. In this study, we focused on the classification problem in a motor imagery task (resting vs. motor imagery). Thus, c = 2. The within-class S w and the between-class S b scatter matrices are calculated by and respectively, where n i denotes the number of data in the ith class, the superscript 't' stands for the transpose of a matrix, and P i = n i / c ∑ j=1 n j is the prior probability of the ith class. The class separability for the f th feature ( f = 1, . . . , q) can be represented by the Fisher score F( f ) as where S b ( f ) and S w ( f ) are the f th diagonal elements of S b and S w , respectively. The higher the value of F( f ) is, the better the class separability of the feature extracted from the f th channel is. Based on the class separability criterion, we can select d top channels among the 30 ones. The d features from the selected channels are concatenated to a feature vector.

Classification and Parameter Optimization
Both K-NN and linear discriminant analysis (LDA) classifiers were used for testing the two-class classification accuracies of RH imagery vs. resting (RH vs. R), LH imagery vs. resting (LH vs. R), and LF imagery vs. resting (LF vs. R). The classification accuracy was computed by leave-one-out cross validation (LOO-CV). For each two-class classification problem, there are totally 80 EEG data (40 from resting, and 40 from a motor imagery). One of the 80 data is used for testing, and the remaining ones were used for training the classifier. This procedure repeated until every data served as the test data once.
Parameters of the methods were optimized by the cross validation procedure combined with grid searching method. In this study, only the GPFD method involves free parameter, i.e., the time delay τ (in sample points). We searched for the optimum value of time delay in the set τ = {20, 30, 40, 50, 60, 70}; namely there were 6 parameter grids. The LOO-CV 30-channel classification accuracies for different time delay values and tasks are shown in Figure 6. It can be observed from Figure 6 that the best accuracies for both RH-R and LH-R classification tasks occurred at τ = 50. The best τ value for LF-R classification (94.50%) was 30, which was only slightly higher than the accuracy (92.25%) of τ = 50. Therefore, we set τ = 50 for all classification tasks. The results reported in the following are the highest LOO-CV classification accuracies.

Results
The average K-NN classification accuracies over the five patient participants among different feature extraction methods are listed in Figure 7a-c, where the results shown in the three subplots are based on all 30 EEG channels (without channel selection), the top five channels, and the best channel, respectively. We used the Wilcoxon rank sum test to statistically examine the difference of classification accuracy between different feature extraction methods, because the one-sample Kolmogorov Smirnov test rejected the null hypothesis of normal distribution of the data (p < 0.05). Tables 2-4 list the top five channels and the corresponding Fisher values in RH-R, LH-R, and LF-R classification conditions. Different subbands in beta band may have different levels of beta rebound induced by a motor imagery [25,50]. Therefore, we divided the mu and beta bands into five subbands, and computed the BP of the five subbands for the purpose of comparison.

Results
The average K-NN classification accuracies over the five patient participants among different feature extraction methods are listed in Figure 7a-c, where the results shown in the three subplots are based on all 30 EEG channels (without channel selection), the top five channels, and the best channel, respectively. We used the Wilcoxon rank sum test to statistically examine the difference of classification accuracy between different feature extraction methods, because the one-sample Kolmogorov Smirnov test rejected the null hypothesis of normal distribution of the data (p < 0.05). Tables 2-4 list the top five channels and the corresponding Fisher values in RH-R, LH-R, and LF-R classification conditions. Different subbands in beta band may have different levels of beta rebound induced by a motor imagery [25,50]. Therefore, we divided the mu and beta bands into five subbands, and computed the BP of the five subbands for the purpose of comparison.     Among the BP features, the mu band performs the worst (only 60-70%) for all tasks and channel configurations. Overall, the high beta band of 20-28 Hz performs the best. Taking the RH-R classification task as an example, the BP of 24-28 Hz achieves high accuracy of 88% in 30-channel configuration (Figure 7a), and accuracy of 90.75% in the configuration with five optimal channels (Figure 7b). When using one channel, the BP of 20-24 Hz can still give high RH-R classification accuracy of 87.5%. In contrast, the GPFD method provides a higher accuracy than all the BPs and HFD, and the averaged classification accuracy of GPFD is significantly higher than that of the mu-band BP in all tasks and all channel configurations (p < 0.05). Also, GPFD is still capable of maintaining a high accuracy even if the number of channels is largely reduced from 30 to 5, and even from 5 to 1. Taking RH-R classification as an example, the accuracies achieved by GPFD in 30, 5, and 1 channel configurations are 95.25%, 93.25%, and 90.50%, respectively. The performance drop is very limited, and the accuracy remains to be above 90% even if only one single channel (i.e., top channel) is used (Figure 7c).
The above results have shown that the top five channels are able to achieve high accuracy. However, it is unclear whether the channels with the smallest Fisher scores are really noisy. Therefore, we further conducted an experiment to examine whether the Fisher's criterion-based channel selection can remove noisy/redundant channels. Table 5 shows the comparison of average K-NN classification accuracies between the top-five-channel (listed in Tables 2-4) and the worst-five-channel configurations, where the worst five EEG channels are the ones having the lowest Fisher score values among the 30 channels. The results show that most accuracies of the worst-five-channel configurations are close to chance level, and the accuracies of the top five channels are much higher. Taking RH-R classification as an example, the BP of 20-24 Hz only gives accuracy of 62.50% by the worst-five-channel configuration, while its accuracy achieves 86.50% when using the top five channels. Similarly, the difference of RH-R classification accuracy between the two configurations achieves almost 30% (93.25% − 65.50% = 27.75%) when the GPFD feature was used. The comparisons indicate that some brain regions are close to useless while some are very helpful for the discrimination between resting and motor imagery. Figure 8 shows two patients' GPFD topoplots of resting and RH imagery states. For patient Sub1, no significant changes can be observed in some areas, such as frontal and occipital. For patient Sub3, significant difference between the two states is observed from the area around C4, while some other regions such as occipital and the right prefrontal (FP2) show almost no difference. Figure 8c and f show that the top-five-channel configurations for the two patients are different even if the features used are the same (i.e., GPFD). The above results and observations demonstrate that (1) channel selection is necessary, (2) best EEG channel configuration is patient-dependent, and (3) the best patient-dependent five-channel configuration obtained by the Fisher's criterion is able to achieve high classification accuracy. Table 5. Comparison of average K-NN classification accuracies between the best-five-channel and the worst-five-channel configurations in different features and tasks (in %). Here, "Ave." means average over the subjects.  The results listed in Figure 7 have shown that the GPFD method is superior to other methods. However, the classification accuracies in Figure 7 are based on the simple classifier K-NN. We conducted another experiment to see if the GPFD-based motor imagery classification performance in ALS can be improved by using the more sophisticated classifier LDA. According to Table 6, LDA performs better than K-NN in most cases. However, the improvement is very limited. The maximum improvement is only 2.25% (from 88.25% of K-NN to 90.50% of LDA), occurring in the LH-R classification using 30 channels. One possible reason is that when a simple K-NN classifier is used, The results listed in Figure 7 have shown that the GPFD method is superior to other methods. However, the classification accuracies in Figure 7 are based on the simple classifier K-NN. We conducted another experiment to see if the GPFD-based motor imagery classification performance in ALS can be improved by using the more sophisticated classifier LDA. According to Table 6, LDA performs better than K-NN in most cases. However, the improvement is very limited. The maximum improvement is only 2.25% (from 88.25% of K-NN to 90.50% of LDA), occurring in the LH-R classification using 30 channels. One possible reason is that when a simple K-NN classifier is used, the resulting classification accuracy based on GPFD feature is already very high (88.25%). Therefore, the space of the accuracy that can be improved by other more advanced classifiers like LDA is limited (only 100% − 88.25% = 11.75%), which might explain why LDA outperforms K-NN by only 2.5%. However, the error reduction rate is high, because 11.75% − 9.50%/11.75% = 21.28%. From the error-reduction point of view, LDA improved a lot.

Discussion
No previous studies have been focused on the FD-based motor imagery classification for patients with ALS. Nevertheless, our results have shown the success of the use of FD (particularly the GPFD) as the feature for classification. As mentioned in Section 1, the previous works on motor imagery classification studies involving ALS patients ( [9,11]) were all based on the SMR feature, i.e., the mu and beta BPs of EEG signals recorded at the sensorimotor area. To compare with the previous approach, we calculated the features (BPs of five subbands within the alpha and beta bands, GPFD, HFD) of the EEG signals from five sensorimotor channels (C3, Cz, C4, CP3, CP4). The comparison of accuracies between the sensorimotor area and the top-five-channel configuration (listed in Tables 2-4) is shown in Figure 9, where K-NN is used as the classifier, and the accuracies in the top five channels are the same as those in Figure 7b. According to Figure 9, it is clear that the BPs from the sensorimotor area do not provide satisfactory average classification accuracy (<80% in all cases). Our results are consistent with the ones of earlier study [9] and the latest study [11], where the accuracies are all below 80%. In [9], the five patients' classification accuracies of hand motor imagery vs. resting are inside of the range of 70-80%, and the accuracy is lower than 60% for a patient without receiving SMR training. In [11], the average accuracy is around 60%. The results of our study and the previous studies [9,11] show that the BPs from the sensorimotor area are not robust features for the motor imagery classification in ALS. On the other hand, our RH-R and LF-R classification results show that the classification accuracy of high beta BPs (20-24 Hz and 24-28 Hz) from the sensorimotor area is higher than 70%, which is also much higher than the accuracies of around 60% reported in [9,11] in which the EEG features are also the BPs from the sensorimotor area. Such difference could be due to the fact that the classification in our study was processed offline while these studies [9,11] calculated classification using online feedbacks of motor imagery. Since offline analysis operates in a synchronous (cue-based) mode, the offline accuracy is often higher than the accuracy of online testing (i.e., asynchronous or self-paced mode).
The results in Figure 9 also reveal that the top-five-channel configuration is more robust than the five channels within the sensorimotor area. For all features (BP, GPFD, HFD) and all classification tasks (RH-R, LH-R, LF-R), the top-five-channel configuration achieves a better average classification accuracy than the five channels within the sensorimotor area. For the BP feature of 12-16 Hz, the LH-R classification accuracy of top five channels is significantly higher than the accuracy of the five sensorimotor channels (p < 0.05). For the BP feature of 24-28 Hz, the top-five-channel configuration also achieves a significantly higher accuracy in RH-R classification than the five sensorimotor channels (p < 0.05). Moreover, we can see from Tables 2-4 that most of top five channels do not fall within the sensorimotor area. Taking the patient Sub1 as an example (BP of 20-24 Hz, RH-R classification), the patient's top five channels (TP7, T4, FP2, T3, O2) does not contain any from the sensorimotor area.
All these comparison results suggest that the searching of best EEG recording sites for ALS patients should not be restricted within the sensorimotor area. In this study, we have compared the performances between two FD methods (GPFD and HFD) with BPs of different subbands. The mean classification accuracies by the two FD methods were all different for all tasks, which was due to the fact that the two methods estimate the FD by different approaches. Moreover, GPFD achieved higher mean accuracies than HFD for all classification tasks and all channel configurations. One possible reason is that Higuchi's method is very sensitive to noise [51]. Such high noise sensitivity could limit the discriminating ability of the FD features estimated by Higuchi's method, because in this study the inputs were signal-trial raw EG signals which had low signal-to-noise ratio.
We have compared the GPFD and FD with BP features of different subbands. In BCI studies, common spatial pattern (CSP) [19,50,52,53] and autoregressive model (AR) [54] have been also widely used. CSP transforms a filtered multi-channel EEG ( , where denotes the number of channels) to new EEG signal ( ) by a transform : = , where the signal needs to be spectrally filtered in advance [52]. The training phase of CSP is to obtain the matrix (see the steps summarized in [50]). The normalized log powers of the first p and the last p rows of the new signal matrix are optimal for the discrimination between two classes. Therefore, a CSP feature vector consists of 2p features, where p is a free parameter of CSP method. AR is a time-domain feature extraction method which involves only one parameter, the order of the model. Given an order value, the ARM finds the best fit to the real EEG signal using the least-square method. Supposing that the order is set as , then for each channel's EEG signal there are AR coefficients extracted. The AR coefficients from each of the channels are concatenated to a feature vector of dimension. The searching values for include 2, 3, 4, 5, 6, and 7. In our experiment, the optimal CSP parameter and the AM parameter were all determined by the LOO-CV procedure determined in Section 2.7. According to Figure 7, GPFD is superior to other features. Therefore, here we only compared the GPFD with CSP and AR features, and the accuracies of other features (BP and HFD) are already shown in Figure 7. The classification results based on K-NN classifier are listed in Table 7. According to Table 7, the best RH-R classification accuracy in the three channel configurations (30, top five, and top one) are 95.25%, 93.25%, and 90.50%, respectively, which are all obtained by GPFD. However, beta-CSP, i.e., raw EEG signals were band-pass filtered by a 3-order Butterworth filter (13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30) and then sent into CSP for feature extraction, performs better than mu-CSP in two conditions (30 and top five channels). Also, beta-CSP is better than GPFD for LH-R classification when 30 channel were used. Overall, CSP and GPFD are comparable to each other in the condition of 30 channels. However, the results show that GPFD was much better than CSP in top-five-channel condition, especially for LF-R classification, where GPFD achieved a high accuracy of 90.50% while the accuracies of both mu-CSP and beta-CSP were all below 80%. On the other hand, CSP is based on the decomposition of the signal parameterized by a matrix that projects the signals in the original recording space to the ones in the Figure 9. Comparison of the accuracies between the top-five-channel configuration and the five channels (C3, Cz, C4, CP3, CP4) within the sensorimotor area. * indicates significant difference between EEG configurations (top five channels and the five channels on the sensorimotor area) for each method at p < 0.05.
In this study, we have compared the performances between two FD methods (GPFD and HFD) with BPs of different subbands. The mean classification accuracies by the two FD methods were all different for all tasks, which was due to the fact that the two methods estimate the FD by different approaches. Moreover, GPFD achieved higher mean accuracies than HFD for all classification tasks and all channel configurations. One possible reason is that Higuchi's method is very sensitive to noise [51]. Such high noise sensitivity could limit the discriminating ability of the FD features estimated by Higuchi's method, because in this study the inputs were signal-trial raw EG signals which had low signal-to-noise ratio.
We have compared the GPFD and FD with BP features of different subbands. In BCI studies, common spatial pattern (CSP) [19,50,52,53] and autoregressive model (AR) [54] have been also widely used. CSP transforms a filtered multi-channel EEG E (N c × N, where N c denotes the number of channels) to new EEG signal E N (N c × N) by a transform W: E N = WE, where the signal E needs to be spectrally filtered in advance [52]. The training phase of CSP is to obtain the matrix W (see the steps summarized in [50]). The normalized log powers of the first p and the last p rows of the new signal matrix E N are optimal for the discrimination between two classes. Therefore, a CSP feature vector consists of 2p features, where p is a free parameter of CSP method. AR is a time-domain feature extraction method which involves only one parameter, the order of the model. Given an order value, the ARM finds the best fit to the real EEG signal using the least-square method. Supposing that the order is set as d, then for each channel's EEG signal there are d AR coefficients extracted. The AR coefficients from each of the N c channels are concatenated to a feature vector of N c × d dimension. The searching values for d include 2, 3, 4, 5, 6, and 7. In our experiment, the optimal CSP parameter and the AM parameter were all determined by the LOO-CV procedure determined in Section 2.7. According to Figure 7, GPFD is superior to other features. Therefore, here we only compared the GPFD with CSP and AR features, and the accuracies of other features (BP and HFD) are already shown in Figure 7. The classification results based on K-NN classifier are listed in Table 7. According to Table 7, the best RH-R classification accuracy in the three channel configurations (30, top five, and top one) are 95.25%, 93.25%, and 90.50%, respectively, which are all obtained by GPFD. However, beta-CSP, i.e., raw EEG signals were band-pass filtered by a 3-order Butterworth filter (13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30) and then sent into CSP for feature extraction, performs better than mu-CSP in two conditions (30 and top five channels). Also, beta-CSP is better than GPFD for LH-R classification when 30 channel were used. Overall, CSP and GPFD are comparable to each other in the condition of 30 channels. However, the results show that GPFD was much better than CSP in top-five-channel condition, especially for LF-R classification, where GPFD achieved a high accuracy of 90.50% while the accuracies of both mu-CSP and beta-CSP were all below 80%. On the other hand, CSP is based on the decomposition of the signal parameterized by a matrix that projects the signals in the original recording space to the ones in the surrogate sensor space. However, this decomposition relies on at least two EEG channels; otherwise the spatial covariance matrix of CSP cannot be decomposed (see [55] for detailed formulation). In other words, at least two channels are required to record EEG signals when using the CSP method, which is its limitation in use. Therefore, the CSP method cannot be applied in top-one-channel condition, as shown in Table 7 where N/A mean "not available". This is one advantage of GPFD over CSP, and GPFD can still achieve high accuracy (90.50% in RH-R classification) even if only one channel was used. AR and mu-CSP gives similar accuracies. However, beta-CSP outperforms AR greatly. The comparison results listed in Table 7 indicate that GPFD is more suitable in motor imagery classification for ALS patients, especially when there are only few channels used for EEG recording. Although the classifier design is beyond the scope of our study, it is expected that the classification accuracy can be further improved by using advanced classifiers, the convolutional neural networks (CNN) [56] for example. Table 7. Comparison between GPFD, AR, and CSP features using K-NN classifier (in %). The successful implementation of GPFD feature extraction and Fisher's criterion-based channel selection strategy has several implications for further development of motor imagery BCI systems for patients with ALS. First, the current study tested the accuracies for three different classification tasks (RH vs. R, LH vs. R, and LF vs. R). Each of the binary classifications can be the basis for further developing a two-state asynchronous BCI [57] by which the ALS patients can respond to their family members' questions (express "yes" for example) by performing a motor imagery, which can increase their quality of life. Such simple function is useful for late-stage ALS patients whose speaking function are severely impaired (for example the patient Sub5 in this study), and is extremely useful for ALS patients who are in the completely locked-in state. Currently, our methods were tested on the EEGs from the late-stage ALS patients who are still able to communicate (but very slow). It is unknown whether it is possible to successfully transfer the methods (including the EEG data collection paradigm) to completely locked-in ALS patients. However, a recent study has evidenced that cognitively intact patients ALS patients produced motor imagery decoding accuracies better than cognitively impaired patients [9]. Moreover, in our data collection stage (Figure 3), a voice instruction is presented before every motor imagery period. Thus, the locked-in patients may be able to follow the instruction to perform the instructed motor imagery tasks even if they are unable to watch the computer screen. Therefore, it is believed that there exists such possibility as long as their cognition process is still normal (can hear, and follow the voice instructions). Nevertheless, it should be further justified in the future. Second, this study introduced the Fisher's separability measure to automatically select a set of discriminating channels which are patient-dependent. This can be applied to effectively and efficiently calibrate the best EEG recording sites for different users/patients before training the motor imagery classification model. Although the results have demonstrated that the best channel configuration is capable of achieving high accuracy, the accuracy is not necessarily optimal in practice. In other words, the best channel configuration is not necessarily the best. The main reason is that the Fisher's separability criterion calculates the Fisher scores of the features (or channels) separately, and does not consider the joint class separability of multiple features [42]. In other words, some features that do not have relatively high Fisher scores may provide high or even higher classification accuracy when these features are used together as the input to the classifier. To deal with this drawback, some advanced methods could be applied, such as the joint mutual information [58] and the evolutionary computation algorithms like particle swarm optimization (PSO) [59]. Last, our results have shown that high accuracy can be achieved by the use of low-density electrodes (5 or 1) with GPFD feature. However, the results of the methods were based on a commercialized EEG amplifier, which is very expensive. To improve the motor imagery BCI's applicability, there is a need to develop a low-cost and portable EEG acquisition and preprocessing module so that the BCI system based on the proposed methods can really be used to help patients with ALS.

Conclusions
Motor imagery BCI has shown success on able-bodied individuals. Yet, there is still limited efficacy in late-stage ALS, which may be due to the fact that typical features, spectral information in the mu and beta bands, may not be as discriminative for ALS patients. To deal with this problem, we have proposed in this paper a method by combining the GPFD feature and Fisher criterion-based channel selection strategy. Although the two separate methods have already been used in the BCI community, the use of GPFD and its combination with channel selection is novel for ALS patients' motor imagery classification. Our results have demonstrated that the proposed use of GPFD is superior to other features that have been used in previous studies involving ALS patients, and is able to achieve high accuracy (~90%) even when there is only one channel (the one selected by the proposed automatic subject-independent channel selection strategy). In summary, based on the results, it can be concluded that the contribution of the proposed method is three-fold. First, the use of the GPFD feature can provide motor imagery BCIs for ALS patients with higher accuracy. Second, the combination of the GPFD feature and the Fisher's criterion-based channel selection strategy is able to automatically determine the best patient-dependent channel configuration from 30 EEG recording sites. Based on only a few selected channels, the GPFD can still maintain high average accuracy, which greatly improves the usability of the BCI for ALS patients. Finally, our analysis results suggested that most of the discriminating channels for ALS motor imagery classification do not fall on the sensorimotor area. To our knowledge, this is a new finding in the EEG-based BCI community, and also supports the fMRI evidence that there is a spatial shift of function in the motor area in ALS patients [18].