Selective Feature Generation Method Based on Time Domain Parameters and Correlation Coefficients for Filter-Bank-CSP BCI Systems

This paper presents a novel motor imagery (MI) classification algorithm using filter-bank common spatial pattern (FBCSP) features based on MI-relevant channel selection. In contrast to existing channel selection methods based on global CSP features, the proposed algorithm utilizes the Fisher ratio of time domain parameters (TDPs) and correlation coefficients: the channel with the highest Fisher ratio of TDPs, named the principle channel, is selected, and a supporting channel set consisting of the channels that are highly correlated with the principle channel is generated. The proposed algorithm, which uses the FBCSP features generated from this supporting channel set, significantly improved the classification performance. The performance of the proposed method was evaluated using BCI Competition III Dataset IVa (18 channels) and BCI Competition IV Dataset I (59 channels).


Introduction
Brain-computer interfaces (BCIs) enable the translation of neural signals related to a user's intention into control signals in the absence of muscle movements, and have drawn considerable attention in various research fields, including rehabilitation and engineering [1][2][3]. Owing to recent advances in practicality and portability, BCIs have also been applied to the entertainment and educational fields [4,5].
Most current BCI systems use the electroencephalogram (EEG) because of its high temporal resolution and non-invasiveness [6]. EEG-based BCI studies show that, when a user imagines movement of a body part, the EEG signals from the associated regions of the cerebral cortex show decreased and increased power in the sensorimotor and beta rhythms, called event-related desynchronization (ERD) and event-related synchronization (ERS), respectively [7,8]. Thus, motor imagery (MI)-based BCIs are widely studied by identifying ERD/ERS patterns.
However, EEG signals suffer from low signal-to-noise ratios (SNR) and are highly correlated due to the volume conduction effect [9]. Consequently, they are susceptible to strong artifacts [10,11]. The common spatial pattern (CSP) approach is perhaps the most popular method for extracting ERD/ERS-related features, resolving these difficulties and thus improving the performance of MI-based BCI [12][13][14][15]. The CSP approach designs spatial filters that maximize the variance for one MI-task while simultaneously minimizing the variance for the other task, and extracts discriminative ERD/ERS-related features based on the spatially filtered EEG signals.
Various studies have examined ways to improve CSP algorithms in three categories: frequency optimization, regularization, and channel selection. Frequency range optimization for CSP has been proposed using filter-banks. The filter-bank CSP (FBCSP) approach, described in [16,17], performs the CSP for each frequency band and selects the discriminative frequency bands by comparing the mutual information of the CSP features of each frequency band. The FBCSP approach overcomes the frequency range problem of CSP and shows improved performance for MI-classification.
The regularized CSP (R-CSP) [18] approach considers regularization of CSP to overcome the sensitivity thereof to noise and overfitting. However, the performance of R-CSP is highly dependent on optimization of the regularization parameters, which requires exhaustive cross-validation. A frequency range-optimized version of R-CSP, filter-bank regularized CSP (FBRCSP), has also been proposed [19].
Since the CSP approach uses all available channels, including noisy and task-irrelevant channels, the selection of MI-related channels is important for improving the performance of CSP-based algorithms. The sparse CSP (SCSP) approach, described in [20], removes MI-irrelevant channels via sparse CSP filters based on the $\ell_1/\ell_2$-norm constraint, and applies the CSP to the remaining channels. The CSP-rank for multiple frequency band (CSP-R-MF) approach, described in [21], generates CSP outputs for each frequency band based on the channels with significant conventional CSP filter coefficients, and selects features from the multi-band CSP outputs using the least absolute shrinkage and selection operator (LASSO) algorithm [22]. The experimental results presented in this paper show that these channel selection approaches yield better performance than frequency optimization and regularization.
Although the channel selection CSP approaches improve performance markedly, they share a fundamental limitation: SCSP and CSP-R-MF select MI-relevant channels based on the global CSP, which might already be corrupted by the MI-irrelevant channels. Hence, a CSP-independent method for determining MI-relevant channels is desired.
In this paper, we propose a novel MI-relevant channel selection method for FBCSP. We utilize time domain parameters (TDPs) [23] and correlation coefficients of EEG channel pairs to determine MI-relevant channels. TDPs are extracted from wide frequency band EEG time domain signals and were originally used as features to classify MI in [24]. In [24], three types of TDPs are introduced, namely the variance of the signal, the variance of the first derivative, and the variance of the second derivative, which represent the activity, mobility, and complexity of the signal, respectively. We consider the channel with the highest Fisher ratio [25] of TDPs as the most discriminative channel for MI-tasks and refer to it as the principle channel. We form a supporting channel set for the principle channel from the channels that have high correlation coefficients with the principle channel. Finally, we extract the FBCSP features from the supporting channel set and use them as the input to the support vector machine (SVM) classifier [26]. The performance of the proposed method was evaluated using BCI Competition III Dataset IVa and BCI Competition IV Dataset I. Comparison with existing CSP-based methods demonstrates a significant improvement in classification performance. The rest of this paper is structured as follows. Section 2 presents the proposed method. Section 3 provides the experimental results and discussion. Finally, the conclusion is drawn in Section 4.

Methods
In this paper, we consider binary MI-classification. First, let us consider $K$-channel EEG signals. We denote the $k$th channel EEG signal at time point $n$ as $x^{(k)}(n)$, where $k = 1, 2, \ldots, K$, $n = 1, 2, \ldots, N$, and $N$ is the number of time samples per channel. Assume that $I$ trials of the MI-EEG signal are available, indexed as $x_i^{(k)}(n)$, $i = 1, 2, \ldots, I$. We denote $I_1$ and $I_2$ as the trial index sets of the two MI classes ($I_1 \cup I_2 = \{1, 2, \ldots, I\}$).
The block diagram in Figure 1 depicts the proposed algorithm. We first introduce the TDPs, and Fisher ratio of TDPs to measure the MI-relevance of each channel. The channel with the highest Fisher ratio of TDPs is referred to as the principle channel. A supporting channel set for the principle channel is constituted from channels that have correlation coefficients with the principle channel exceeding a certain threshold. The FBCSP features are extracted from the supporting channel set to improve the MI-classification performance.

Principle Channel Selection
The type $p$ ($p = 0, 1, 2$) time domain parameter (TDP) for the $i$th trial of the $k$th channel EEG signal, denoted by $T^{(k)}_{(i,p)}$, is computed, following [23], from the variance of the $p$th-order derivative of $x_i^{(k)}(n)$ with respect to $n$. The first type ($p = 0$) represents signal power, the second type ($p = 1$) represents the EEG pattern for mean frequency, and the third type ($p = 2$) represents the EEG pattern for frequency change [23].
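As an illustration of the TDP computation, the following is a minimal NumPy sketch under two assumptions that are not taken from the paper: the $p$th derivative is approximated by the $p$th-order finite difference, and the TDP is the plain variance with no further transformation. The function name and array layout are hypothetical.

```python
import numpy as np

def time_domain_parameters(trial, orders=(0, 1, 2)):
    """Return a (K, len(orders)) array of TDPs for one trial.

    trial: array of shape (K, N), i.e. K channels by N time samples.
    Assumption: the p-th derivative is approximated by the p-th
    finite difference along the time axis (np.diff with n=p).
    """
    trial = np.asarray(trial, dtype=float)
    feats = []
    for p in orders:
        deriv = np.diff(trial, n=p, axis=1)   # n=0 returns the signal itself
        feats.append(np.var(deriv, axis=1))   # variance over time, per channel
    return np.stack(feats, axis=1)

# Illustrative data only: 18 channels, 200 samples (2 s at 100 Hz).
rng = np.random.default_rng(0)
trial = rng.standard_normal((18, 200))
tdp = time_domain_parameters(trial)           # shape (18, 3)
ramp_tdp = time_domain_parameters(np.arange(10.0)[None, :])
```

A linear ramp has a constant first difference, so its $p = 1$ and $p = 2$ parameters are zero, which is a quick sanity check on the finite-difference approximation.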
The Fisher ratio is widely used to measure the class-discriminative property of a parameter by projecting high-dimensional parameters into one-dimensional space [27]. The Fisher ratio of the three types of TDP for channel $k$, denoted by $F^{(k)}$, is computed from the class-wise statistics of $T^{(k)}_{(i,p)}$ over the index sets $I_c$, where $|I_c|$ denotes the size of the index set $I_c$. Postulating that the channel with the highest Fisher ratio discriminates most strongly between the MI tasks, we select that channel, $k_p = \arg\max_k F^{(k)}$, and refer to it as the principle channel.
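The selection step can be sketched as follows. The exact expression for $F^{(k)}$ is not reproduced here, so this sketch assumes one common form of the Fisher ratio: the squared difference of class means divided by the sum of class variances, summed over the three TDP types. The function names and the `(I, K, P)` array layout are hypothetical.

```python
import numpy as np

def fisher_ratio_per_channel(tdp, labels):
    """tdp: (I, K, P) array of trials x channels x TDP types.
    labels: (I,) array with values 1 or 2.
    Assumed form: sum over p of (m1 - m2)^2 / (var1 + var2)."""
    tdp = np.asarray(tdp, dtype=float)
    labels = np.asarray(labels)
    c1, c2 = tdp[labels == 1], tdp[labels == 2]
    between = (c1.mean(axis=0) - c2.mean(axis=0)) ** 2   # (K, P)
    within = c1.var(axis=0) + c2.var(axis=0)             # (K, P)
    return (between / within).sum(axis=1)                # (K,)

def select_principle_channel(tdp, labels):
    """Index of the channel with the highest Fisher ratio."""
    return int(np.argmax(fisher_ratio_per_channel(tdp, labels)))

# Synthetic check: only channel 3 differs between the two classes.
rng = np.random.default_rng(1)
labels = np.repeat([1, 2], 20)
tdp = rng.standard_normal((40, 6, 3))
tdp[labels == 2, 3, :] += 5.0
k_p = select_principle_channel(tdp, labels)
```

On this synthetic input the class shift is confined to channel 3, so that channel is returned as the principle channel.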

Supporting Channel Set for the Principle Channel
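To extract features based on FBCSP, we need a sufficient number of channels. For this, we use channels that are highly correlated with the principle channel. The (sample) correlation coefficient between the principle channel $k_p$ and a channel $q$ for the $i$th trial EEG signals is given by:

$$\rho_i^{(k_p,q)} = \frac{\sum_{n=1}^{N}\left(x_i^{(k_p)}(n)-\bar{x}_i^{(k_p)}\right)\left(x_i^{(q)}(n)-\bar{x}_i^{(q)}\right)}{\sqrt{\sum_{n=1}^{N}\left(x_i^{(k_p)}(n)-\bar{x}_i^{(k_p)}\right)^2}\,\sqrt{\sum_{n=1}^{N}\left(x_i^{(q)}(n)-\bar{x}_i^{(q)}\right)^2}}, \quad q = 1, 2, \ldots, K, \; q \neq k_p,$$

where $\bar{x}_i^{(k)} = \frac{1}{N}\sum_{n=1}^{N} x_i^{(k)}(n)$, and the mean correlation coefficient for class $c$, denoted as $\bar{\rho}_c^{(k_p,q)}$, is given by:

$$\bar{\rho}_c^{(k_p,q)} = \frac{1}{|I_c|}\sum_{i \in I_c} \rho_i^{(k_p,q)}.$$

After calculating the mean correlation coefficients between the principle channel and the other channels, we define the supporting channel set for the principle channel, denoted as $S$, as those channels whose mean correlation coefficient exceeds a given threshold $\rho_{\mathrm{thr}}$.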
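A minimal sketch of the supporting channel set construction is given below. Two details are assumptions made for illustration and are not stated in the text: a channel is kept only if its class-wise mean correlation exceeds the threshold for both classes, and the principle channel itself is always included in $S$. The function name and array layout are hypothetical.

```python
import numpy as np

def supporting_channel_set(trials, labels, k_p, rho_thr):
    """trials: (I, K, N) array; labels: (I,) with values 1 or 2.
    Returns the sorted channel indices forming the supporting set."""
    trials = np.asarray(trials, dtype=float)
    labels = np.asarray(labels)
    # Per-trial sample correlation of every channel with channel k_p.
    centered = trials - trials.mean(axis=2, keepdims=True)
    ref = centered[:, k_p:k_p + 1, :]
    num = (centered * ref).sum(axis=2)
    den = np.sqrt((centered ** 2).sum(axis=2) * (ref ** 2).sum(axis=2))
    rho = num / den                                       # (I, K)
    # Class-wise mean correlation coefficients, one row per class.
    rho_bar = np.stack([rho[labels == c].mean(axis=0) for c in (1, 2)])
    # Assumption: the threshold must be exceeded in both classes.
    keep = (rho_bar > rho_thr).all(axis=0)
    keep[k_p] = True   # assumption: the principle channel belongs to S
    return np.flatnonzero(keep)

# Synthetic check: channels 0-2 share a common source, 3-4 are independent.
rng = np.random.default_rng(2)
n_trials, n_channels, n_samples = 20, 5, 200
base = rng.standard_normal((n_trials, 1, n_samples))
trials = rng.standard_normal((n_trials, n_channels, n_samples))
trials[:, :3, :] = base + 0.3 * trials[:, :3, :]
labels = np.repeat([1, 2], n_trials // 2)
S = supporting_channel_set(trials, labels, k_p=0, rho_thr=0.5)
```

Raising `rho_thr` shrinks the set toward the principle channel alone, while lowering it approaches the full channel set, which is why the threshold is tuned by cross-validation in the experiments.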

FBCSP Applied to the Supporting Channel Set
The FBCSP approach is then applied to the supporting channel set to extract the MI-relevant features. Let $X_{i,m}^{(S)} \in \mathbb{R}^{|S| \times N}$ denote the output of the $m$th filter-bank of the supporting channel set $S$ at the $i$th trial, and let $E_{i,m}^{(S)} \in \mathbb{R}^{|S| \times |S|}$ denote its sample covariance matrix. The normalized mean sample covariance matrix for class $c$, denoted as $\bar{E}_{c,m}^{(S)}$, is obtained by averaging the normalized $E_{i,m}^{(S)}$ over the trials $i \in I_c$. Let $p^{(m)}$ be a spatial filter applied to $X_{i,m}^{(S)}$. The common spatial pattern (CSP) algorithm [12][13][14] finds the spatial filters that maximize or minimize the averaged variance ratio $J$ between the two classes:

$$J\left(p^{(m)}\right) = \frac{p^{(m)\top}\, \bar{E}_{1,m}^{(S)}\, p^{(m)}}{p^{(m)\top}\, \bar{E}_{2,m}^{(S)}\, p^{(m)}}.$$

The CSP feature vector for the $i$th trial data for supporting channel set $S$ in the $m$th frequency band is formed from the variances of the spatially filtered signals $p^{(m)\top} X_{i,m}^{(S)}$.
For each trial $i$, the FBCSP approach selects feature vectors in discriminative frequency bands among the $M$ filter-banks by using the mutual information based individual feature (MIBIF) algorithm [17]. The MIBIF algorithm computes the mutual information between the feature vectors and their class labels to select the discriminative filter-banks. By selecting the best two frequency bands, e.g., $m_1$ and $m_2$, we obtain the FBCSP feature vector $v_i$ for the $i$th trial data by combining the CSP feature vectors of these two bands. In the training phase, $\{v_i\}$ and their corresponding known class label vector are fed to the SVM classifier.
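The CSP step for a single frequency band can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' implementation: it assumes trace normalization of each trial covariance, one pair of spatial filters (`n_pairs=1`), and log-normalized variance features, none of which are specified in the text. All function names are hypothetical.

```python
import numpy as np

def normalized_mean_cov(trials):
    """trials: (I, C, N). Mean of trace-normalized trial covariances
    (trace normalization is an assumption)."""
    covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
    return np.mean(covs, axis=0)

def csp_filters(trials_1, trials_2, n_pairs=1):
    """Return a (C, 2 * n_pairs) matrix whose columns are spatial filters."""
    e1 = normalized_mean_cov(trials_1)
    e2 = normalized_mean_cov(trials_2)
    # Whiten the composite covariance e1 + e2.
    d, u = np.linalg.eigh(e1 + e2)
    whiten = (u / np.sqrt(d)).T                 # D^{-1/2} U^T
    # Diagonalize the whitened class-1 covariance (eigenvalues ascending).
    _, v = np.linalg.eigh(whiten @ e1 @ whiten.T)
    filters = whiten.T @ v
    # First columns minimize, last columns maximize the class-1 variance share.
    idx = np.r_[np.arange(n_pairs), np.arange(-n_pairs, 0)]
    return filters[:, idx]

def csp_features(trial, filters):
    """Log of the normalized variances of the spatially filtered trial."""
    var = np.var(filters.T @ trial, axis=1)
    return np.log(var / var.sum())

# Synthetic check: class 1 is strong on channel 0, class 2 on channel 1.
rng = np.random.default_rng(3)
trials_1 = rng.standard_normal((30, 4, 200))
trials_2 = rng.standard_normal((30, 4, 200))
trials_1[:, 0, :] *= 3.0
trials_2[:, 1, :] *= 3.0
W = csp_filters(trials_1, trials_2)
f1 = csp_features(trials_1[0], W)
f2 = csp_features(trials_2[0], W)
```

In a filter-bank setting, the same routine would be run once per band on the channels in $S$, and the feature vectors of the two selected bands combined before classification.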

Results and Discussion
We evaluated the proposed method on MI classification using the publicly available BCI Competition III Dataset IVa and BCI Competition IV Dataset I. The classification performance of the proposed method was compared with TDP [24], FBCSP [17], FBRCSP [19], SCSP [20] and its filter-bank version denoted as FBSCSP, and CSP-R-MF [21].

BCI Competition III Dataset IVa
The BCI Competition III Dataset IVa [28] was recorded from five healthy subjects, denoted as "al", "aa", "av", "aw", and "ay". The five subjects each performed 140 trials involving right hand and right foot MI, divided into training and test sets. Table 1 shows the number of training and test trials for the five subjects. Each subject performed the MI for 3.5 s after the visual cue, and relaxed for a random length of time (1.75-2.25 s) thereafter. A total of 118 EEG channels corresponding to the positions of the extended international 10/20-system were used for recording, with a sampling rate of 100 Hz. The EEG data were bandpass-filtered between 0.05 and 200 Hz. In this experiment, we selectively used 18 channels (K = 18) chosen based on the homunculus theory [29], as shown in Figure 2. For TDP extraction, the EEG signals in the time segment 0.5-2.5 s after presentation of the visual cue were bandpass-filtered using a fourth-order Butterworth filter operating at 0.5-40 Hz. For FBCSP, eight filter-banks covering the frequency range 4-36 Hz, divided evenly at 4-Hz intervals, were used.

BCI Competition IV Dataset I
The BCI Competition IV Dataset I was recorded from four healthy subjects ("a", "b", "f", and "g") and contains two MI EEG signal classes [30]. Fifty-nine EEG channels (K = 59) were used to record the data, with a sampling rate of 100 Hz; these were bandpass-filtered between 0.05 Hz and 200 Hz. For each subject, the dataset consisted of 100 trials per class. This experiment used the EEG signals from 0.5 to 2.5 s after the cue. For TDP extraction, the EEG signals were bandpass-filtered using a fourth-order Butterworth filter operating at 0.5-40 Hz. For FBCSP, eight filter-banks covering the frequency range 4-36 Hz, divided evenly at 4-Hz intervals, were used.
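Both datasets use the same filter-bank layout: eight 4-Hz bands covering 4-36 Hz at a 100 Hz sampling rate. The sketch below builds such a bank with SciPy. The Butterworth order per band and the use of zero-phase filtering (`sosfiltfilt`) are assumptions, since the text specifies the filter type only for the 0.5-40 Hz TDP preprocessing; note also that `butter` with `btype="bandpass"` returns a filter of twice the requested order. The names `FS`, `BANDS`, and `filter_bank` are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 100.0                                           # sampling rate in Hz
BANDS = [(lo, lo + 4) for lo in range(4, 36, 4)]     # (4, 8), ..., (32, 36)

def filter_bank(trial, fs=FS, bands=BANDS, order=4):
    """trial: (K, N) array. Returns an (M, K, N) array, one slice per band.
    Assumption: Butterworth band-pass filters applied forward and backward."""
    out = []
    for lo, hi in bands:
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        out.append(sosfiltfilt(sos, trial, axis=1))
    return np.stack(out)

# Illustrative input: a 10 Hz sine on 18 channels, 2 s long.
t = np.arange(200) / FS
trial = np.tile(np.sin(2 * np.pi * 10.0 * t), (18, 1))
banked = filter_bank(trial)
band_power = banked.var(axis=2)[:, 0]                # power per band, channel 0
```

For the 10 Hz test tone, the power concentrates in the second band (8-12 Hz), which gives a simple check that the band edges are laid out as intended.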

Experiment Results
The threshold $\rho_{\mathrm{thr}}$ used for configuring the supporting channel set $S$ plays an important role in classification performance. We determined the optimal threshold value by cross-validation. The threshold can be set as a constant for all subjects or individually for each subject. Although optimizing the threshold for each individual subject performed better, this paper presents the experimental results obtained under both settings.
Tables 2 and 3 list the classification results for the BCI Competition III Dataset IVa. Table 2 shows the classification performance of the CSP approach and its variants, and of the TDP algorithm. Table 3 compares the frequency-optimized CSP variants using filter-banks. The existing frequency-optimized channel selection approaches, FBSCSP and CSP-R-MF, yielded better performance (87.76% and 87.11%, respectively) than the FBCSP and FBRCSP approaches. The proposed method, i.e., frequency-optimized channel selection based on TDP, achieved the best mean classification accuracy, 89.13%, among all compared algorithms. Table 4 lists the classification performance in terms of the threshold value ($\rho_{\mathrm{thr}}$) and the corresponding size of the supporting channel set (in parentheses).
Table 5 shows the classification performance of the proposed method for BCI Competition IV Dataset I. In this experiment, we tested the algorithms using 5 × 5 cross-validation. The performance of the proposed method was compared with the frequency-optimized channel selection approaches: FBSCSP, the filter-bank version of the sparse CSP (SCSP) [20], and CSP-R-MF [21]. As shown in Table 5, the proposed method yielded the highest mean classification accuracy. Because BCI Competition III Dataset IVa uses the training and test split specified by the BCI Competition III organizers, a performance evaluation based on Dataset IVa alone might be overfitted to that split. However, the cross-validated evaluation on BCI Competition IV Dataset I supports the high performance of the proposed method for an arbitrary training set. Table 6 lists the classification performance in terms of the threshold value and the corresponding size of the supporting channel set (in parentheses) for BCI Competition IV Dataset I.
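The 5 × 5 cross-validation protocol can be sketched as an index generator. This sketch assumes that "5 × 5" means five repetitions of five-fold cross-validation with a fresh random shuffle per repetition; the function name, seed, and lack of class stratification are illustrative choices rather than details from the paper.

```python
import numpy as np

def repeated_kfold_indices(n_trials, n_folds=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold CV.
    Assumption: each repetition reshuffles the trials; no stratification."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        perm = rng.permutation(n_trials)
        folds = np.array_split(perm, n_folds)
        for f in range(n_folds):
            test_idx = folds[f]
            train_idx = np.concatenate(
                [folds[g] for g in range(n_folds) if g != f]
            )
            yield train_idx, test_idx

# 200 trials per subject (100 per class) as in BCI Competition IV Dataset I.
splits = list(repeated_kfold_indices(200))
```

To avoid information leakage, the principle channel, the supporting channel set, the CSP filters, and the threshold would normally be estimated from the training indices of each split only.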

Conclusions
In this paper, we presented a motor imagery (MI) classification algorithm using FBCSP features based on an MI-relevant channel selection. The proposed algorithm uses the Fisher ratio of TDPs and correlation coefficients to obtain a set of channels supporting the principle channel. The FBCSP features generated from the supporting channel set significantly improved the classification performance over existing methods, as evaluated using the BCI Competition III Dataset IVa and BCI Competition IV Dataset I.