Simultaneously Spatiospectral Pattern Learning and Contaminated Trial Pruning for Electroencephalography-Based Brain Computer Interface

: Electroencephalography (EEG)-based brain computer interfaces (BCIs) translate motor imagery commands into the movements of an external device (e.g., a robotic arm). The automatic design of spectral and spatial ﬁlters is a challenging task, as the frequency bands of the spectral ﬁlters must be predeﬁned by previously published studies and given that they may be a ﬀ ected during trials by artifacts and improper motor imagery (MI). This study aimed to eliminate the contaminated trials automatically during classiﬁer training, and to simultaneously learn the spectral and spatial patterns without the need for predeﬁned frequency bands. Compared with previous studies that measured the discriminative power of a frequency band based on mutual information, this study determined the di ﬀ erence of the class conditional probability density function between two MI classes. This information was further shared to measure the contamination level of the trial that simpliﬁed the computation. A particle-based approximation technique iteratively constructed a ﬁlter bank that extracted discriminative features, and simultaneously removed potentially contaminated trials. The particle weight was estimated by an analysis of variance F-test instead of mutual information as commonly used in previous studies. The experimental results of a publicly available dataset revealed that the proposed method outperformed the other BCI in terms of the classiﬁcation accuracy. Asymmetrical spatial patterns were found on left- versus right-hand MI classiﬁcations. The learnt spectral and spatial patterns were consistent with prior neurophysiological knowledge.


Introduction
A brain computer interface (BCI) provides direct communication between the brain and an external device that translates neuronal signals into user commands without the use of the peripheral nerve and muscle pathways. Many research groups have created electroencephalography (EEG)-based BCIs that offer sufficient time resolution, usability, and are associated with lower brain surgery risks [1][2][3][4] for medical and industrial applications. The analysis of EEG signals induced by motor imagery (MI), i.e., the imagery of body-part movements, has received considerable attention because of its widespread application in motor tasks, such as wheelchairs and mouse cursor controls [1]. Although many data-driven approaches have been developed to automatically design BCIs, low signal-to-noise ratios of EEG signals and the highly complex connectivity of the human brain make the decoding of the user's intention difficult.
The EEG signal possesses task-specific features in the spectral and spatial domains during the MI task [5]. The spectral features are extracted by a spectral filter with a specific frequency band. The spectral-filtered signals are spatially transformed by a spatial filter. The spatially transformed signals are expected to be maximally discriminative between the two MI classes. Therefore, the identification of spectral and spatial filters constitutes the main BCI design issue.
In the spectral domain, task-related changes in the EEG band power induced by MI tasks, commonly referred to as event-related desynchronization (ERD) or event-related synchronization (ERS), have attracted attention in the analysis of EEG signals. A power decrease in the µ-rhythm (8)(9)(10)(11)(12)(13)(14), and a power increase in the β-rhythm (14-30 Hz) of the motor and somatosensory cortices, respectively, indicate ERD and ERS during the execution of MI tasks. Therefore, the selection of frequency bands that possess discriminative power among different MI tasks is the key issue of the BCI design. However, ERD/ERS patterns vary greatly across subjects and even across trials [6]. This makes the selection of frequency bands difficult. The spectral filter is often either manually selected based on visual inspection or is usually designed for broadband frequency operations, including both µand β-rhythms [6][7][8][9]. Several research studies have attempted to determine the optimal frequency bands and spatial patterns related to ERD/ERS using the techniques of signal processing and pattern recognition [7,10]. However, these approaches cannot find the optimal frequency bands in a general analytic manner.
For spatial filter optimization, machine learning has been considered as the main tool because of the increasing development of computational hardware and statistical learning theory [7]. A common spatial pattern (CSP) algorithm, a data-driven method, has been extensively used in the design of spatial filters for two-class MI problems [6,11]. This CSP identifies the spatial filter and an optimal projection direction that maximizes the variance of the data conditioned on one class. Simultaneously, the CSP minimizes the variance of the data conditioned on the other class. Raw EEG signals are then projected to this direction which makes MI classification easier. However, the CSP is highly sensitive to the spectral filter and the EEG signal. Improper spectral filters or misleading EEG signals may lead to degraded BCI performances [7,10,12]. Therefore, simultaneously optimizing spectral and spatial filters is desirable in most EEG-based BCIs.
A CSP augments spatial signals with time-delayed signals, thus resulting in double channels [13]. However, the first-order, finite impulse response, filter optimization approach, restricts the adaptability of this method. Accordingly, the time-delay parameter must be tuned by experience. The adaptability of this work was extended by introducing arbitrary order finite impulse response filters that allow the extraction of spectral features with a higher degree for the CSP [14]. However, the filter coefficients were strongly correlated with the initial parameters.
Numerous iterative procedures have been proposed to jointly optimize both the spectral and spatial filters. One approach uses a convex optimization to automatically learn feature extraction, selection, and combine them in the spectral and spatial domains [15]. An iterative spatiospectral pattern learning approach was proposed to learn spatial filters based on the optimized spectral filters and the classifier in the preceding iteration [16].
Numerous research groups developed a bank of bandpass filters to extract spectral features in various frequency bands. Filter-bank CSP (FBCSP) has been considered as the most computationally efficient algorithm that divides a broadband frequency range into small nonoverlapping frequency bands with fixed bandwidths. FBCSP then independently employs an individual CSP module to each individual frequency band to extract spatial features from EEG signals. A feature set possessing discriminative power was selected by a maximal mutual information criterion [17]. Conversely, a coefficient decimation method was developed to select a subject-specific CSP discriminative filter bank [18]. However, these methods determine optimal filters for unimodal multivariate normal distributions only. An optimal spatiospectral filter network further optimizes the spatiospectral filters by estimating the mutual information between the features and the class labels, and can thus be applied to complex data structures [19].
The frequency bands of the filter bank usually cover the µand β-rhythms, and the spatial filter was optimized in each frequency band which led to improved classification accuracy [17,20]. However, these methods need to predefine the frequency bands according to the neurophysiological knowledge [18,19]. A Bayesian spatiospectral filter optimization (BSSFO) was proposed to simultaneously design spectral and spatial filters without predefined frequency bands [21]. Nevertheless, the features extracted from EEG signals contaminated by noise and artifacts may not be related to the motor imagery tasks, and may thus lead to inaccurate learning of spectral and spatial patterns.
The EEG signals are easily contaminated by background noise and eye-blink artifacts which lead to low signal-to-noise ratios. Conversely, the EEG signals may be contaminated by improper motor imagery wherein the subjects do not perform the imagery tasks properly, or performed the wrong mental tasks. Data-driven BCI classifiers were sensitive to these noisy and atypical observations (contaminated trials). The performance of BCI was degraded when the contaminated trials participated in classifier training. Therefore, eliminating the contaminated trials is desired during classifier training to achieve reliable MI classification. The contaminated trials induced by muscle and eye-blinks can either be removed by independent component analysis (ICA), or rejected by threshold methods [22]. However, ICA requires a visual inspection to select the artificial components that make the implementation of an automatic BCI system nearly impossible. In addition, both the ICA and threshold methods cannot detect the noise caused by improper imagery.
A relevant dimensionality estimation-based method was proposed [23] to detect contaminated trials caused by improper motor imagery. Wang et al. proposed a trial pruning method that used a Gaussian mixture model and a genetic algorithm to eliminate the trials contaminated by artifacts and improper imagery [12]. These approaches extracted features with predefined frequency bands along with CSP. Therefore, the trial pruning method may be improved further when the spectral and spatial patterns can be automatically learnt without predefined frequency bands.
This study aims to tackle the challenges associated with the design of filter banks without predefined frequency bands and with contaminated trials for the EEG-based BCIs. This study proposes Parzen Windows-based spatiospectral patterns with trial pruning (PWSPTP) to achieve the research goal. The main contributions of the present study are summarized as follows:

1.
A particle-based approximation technique was developed to iteratively construct a filter band and detect potential contamination trials. The spectral and spatial features were learnt as the contaminated trials were eliminated during classifier training that led to improved MI classification accuracy; 2.
The discriminative power of a feature and the contamination level of a trial were simultaneously estimated by the difference of the class conditional probability density function (pdf ) instead of using mutual information. The class conditional pdf was estimated by the Parzen window density estimator; 3.
The importance weight (particle weight) of each feature was estimated by analysis of variance (ANOVA) F-tests with the use of the class conditional pdf that can be simply implemented.
The remaining parts of this study are organized as follows: Section 2 illustrates the proposed PWSPTP and provides a better understanding of the relationship between PWSPTP and other methods, including similarities and differences. Section 3 demonstrates the performance of the PWSPTP for MI classification. Finally, Section 4 summarizes the present study.

Parzen Windows-Based Spatiospectral Patterns with Trial Pruning
This study aimed to design an EEG-based BCI that could automatically eliminate the contaminated trials during classifier training, and to simultaneously learn the spectral and spatial patterns without predefined frequency bands and negative effects of the contaminated trials. Inspired by a particle-based approximation technique, this study constructs a filter bank with overlapping frequency bands. The discriminative power of each frequency band is estimated by the Parzen window density estimator. Compared with BSSFO, the present study measured the discriminative power of the frequency bands based on the difference of the class conditional pdf of two MI classes instead of using the mutual information. Furthermore, the particle weight was estimated by analysis of variance (ANOVA) F-tests instead of mutual information. Conversely, inspired by the trial pruning method proposed by [12], the present study eliminated the contaminated trials during classifier training. The trials potentially contaminated by artifacts and improper imagery are detected by the class conditional pdf. This study combines the merits of the particle-based approximation technique and trial pruning method to simultaneously measure the discriminative power of the frequency band and eliminate the contaminated trials. This innovation overcame the limitation of the trial pruning method that adopted fixed nonoverlapping frequency bands and overcame the negative effects of the contaminated trials when learning the spectral and spatial patterns.
PWSPTP mainly consisted of three parts: (a) estimation of the discriminative power of the feature, (b) estimation of the contamination level of a trial, and (c) composition of a filter bank without contaminated trials. Figure 1 shows the plot of the PWSPTP. In the training phase, a set of EEG signals corresponding to two MI classes were processed by a set of spectral filters (filter bank) and spatial filters (CSP). The discriminative power of spatiospectral features and contamination level of a trial were quantified by a Parzen window density estimator. A filter bank was constructed by the particle filter whose particle weight was estimated by an ANOVA F-test. A spectrally weighted classifier was trained according to the learnt spectral and spatial features without contaminated trials. In the testing phase, the classifier made an MI prediction based on the features extracted by the learnt spatiospectral filters.

Spatiospectral Filter Optimization
Consider a single-EEG trial x ∈ R τ of a MI task, where τ denotes the number of electrode channels. This study considered a binary classification problem whose class label ω was positive (+) or negative (−), e.g., left-or right-hand motor imagery. The features of x are usually extracted by the following three steps: (1) spectral filtering: z = h ⊗ x; (2) spatial filtering: y = W T z; (3) feature extraction: f = log(var(y)); where h represents the spectral filter, W represents the spatial filter, z represents the spectrally filtered EEG, y represents the spatially filtered signal, and var represents the variance function. Each spectral filter is specified by the start b s and the end b e of the frequency band, whereby b s < b e . The goal was to find a set of spectral filters whose features could be correctly discriminated between two MI tasks. The function p(b) of the random variable b = b s b e T defines the probability of the spatially filtered signal and its correct classification between the two MI tasks. Given K trials of EEGs X = {x k } K k=1 and their corresponding class labels Ω = {ω k } K k=1 , ω ∈ {+, −}, a posterior pdf was defined as p( b| X, Ω). As the spectrally filtered EEG z, spatially filtered signal y, and the feature f were computed deterministically, p( b| X, Ω) could be directly obtained by a posterior pdf p f X, Ω . and spatial filters (CSP). The discriminative power of spatiospectral features and contamination level of a trial were quantified by a Parzen window density estimator. A filter bank was constructed by the particle filter whose particle weight was estimated by an ANOVA F-test. A spectrally weighted classifier was trained according to the learnt spectral and spatial features without contaminated trials. In the testing phase, the classifier made an MI prediction based on the features extracted by the learnt spatiospectral filters.  (a) Training phase. The spectral and spatial filters and the classifier were trained after the elimination of contaminated trials. The MI class label was adopted to optimize the spatial filters and classifier. (b) Testing phase. The learnt classifier made an MI prediction based on a single-trial EEG.
Inspired by the particle-based approximation [21], a weighted particle-set B old = b old n , π n , M n N n=1 was generated to approximate p f X, Ω , where π denotes the importance weight of the particle, N denotes the number of particles, b n denotes a particle (a single-frequency band), and M n = m k n K k=1 denotes a set of noisy trial marks with m k n ∈ {1, 0}. Each particle represents a frequency band of a spectral filter and its corresponding noisy trial mark. An individual CSP module was independently performed based on the features that correspond to one spectral filter to obtainŴ n [8]. The features f n N n=1 were extracted by these spectral filters (particles) and their corresponding spatial filters. The importance weight of each particle π n was computed by a class conditional pdf (detailed in Section 2.2). The noisy trial mark m k n was updated by the class conditional pdf as well (detailed in Section 2.3). For the next sampling iteration, a new set of particles was generated by an effective prior B old , as defined in the previous iteration. A particle b old j was randomly chosen from B old by roulette wheel selection with the use of π j . In other words, the chance for the selection of b old j was proportional to π j . A detailed description of roulette wheel selection could be found in [24]. The chosen b old j was copied as the new particle b new n for the next iteration. If b old j has been chosen multiple times, a diffusion method was applied to avoid identical copies of the particle and was implemented by adding noise to the particle as follows, b new where σ is a Gaussian noise with a zero mean and a unit variance. This ensured that the particle with a large discriminative power could be reserved and was prevented from converging to a local optimum [25]. . Iterative sampling of the particle around the current particle with the importance weight helped the particles converge to p f X, Ω . The resulted particles constructed a weighted filter bank whose spectral filter bandwidths may be different from those of other particles and may overlap Symmetry 2020, 12, 1387 6 of 15 each other. This overcame the limitation of the predefinition of the frequency bands according to accumulated neurophysiological knowledge [17,19].

Importance Weight Update Rule for Particles
The importance weight of each particle was updated based on the evaluation of the discriminative power of the features. Compared with the BSSFO [21], the discriminative power of a feature was estimated based on the difference of the class conditional pdf instead of mutual information. Furthermore, the class conditional pdf was also adopted to measure the contamination level of a trial.
Inspired from the F-score for feature selection [26], ANOVA [27] in conjunction with class conditional pdf was used to measure the discriminative power of a feature (particle). It has been known that the trials with low signal-to-noise ratios and the trials contaminated by improper motor imagery may reduce the discriminative power of a feature. The trials with low signal-to-noise ratios possessed low class conditional probabilities. The trials contaminated by improper motor imagery possessed higher class conditional probability for the improper class than the class assigned in the MI task. Therefore, a difference of the class conditional pdf for the trial of the class + was defined as while that for the trial of the class − was defined as The discriminative power of a feature was large when e k n was large, and vice versa. Furthermore, e k n was small for the trial contaminated by noise and artifacts. Thus, this value could be adopted to detect contaminated trials as well (detailed in Section 2.2). The class conditional pdf p f k n ω = c was determined using a Parzen window density estimator [28,29] as follows: where where c ∈ {+, −}, K c denotes the number of trials belonging to class c, λ denotes the width of the window function, Σ denotes a covariance matrix, and r denotes the dimension of the feature. Note that a multivariate Gaussian window function ψ(·) was adopted. The importance weight of each particle π n was defined as follows, where J = ∪ k m k n = 1 , k ∈ {1, 2, . . . , K} represents a set of trials that are not marked as noisy trials. In other words, the importance weight was calculated exclusively for noisy trials to ensure the discriminative power of a feature (particle) could be carefully measured without the effect of noisy trials. The numerator denotes the discrimination between two MI classes, while the denominator Symmetry 2020, 12, 1387 7 of 15 denotes the discrimination within each MI class based on the use of the difference of the class conditional pdf.

Trial Pruning Process
Trial pruning aimed to prune trials contaminated by noise, artifacts, and improper motor imagery in the training data. The contamination level of the trial was measured by the class conditional pdf that was also adopted in the importance weight-update rule of the particles. The trial contaminated by noise and artifacts were identified as a type-1 noisy trial and had low p f k n ω = c for either the + or − classes, and led to small sums p f k n + + p f k n − . Therefore, the sum of the class conditional pdf was used as a type-1 noisy indicator and was defined as follows, Conversely, the trials contaminated by improper motor imagery were identified as type-2 noisy trials. These trials had lower p f k n + values than p f k n − when the assigned class label was +, and vice versa. This led to negative e k+ n or e k− n values. Therefore, the difference of the class conditional pdf, e k n , was used as a type-2 noisy indicator, while it was also used to update the importance weight of each particle.
Once the contamination level of each trial has been measured, q k n was sorted in order as follows, where u + (·) and u − (·) denote the sorted indices of the trials in classes + and −, respectively, and K + and K − denote the numbers of trials in classes + and −, respectively. The first ε 1 * K + and ε 1 * K − noisy trial marks of m k n were set to zero for classes + and −, where ε 1 denote a predefined percentage of the type 1 noisy trials and · represent the "flooring" operation. In other words, type-1 noisy trials are detected as follows, For type-2 noisy trials, e k+ n and e k− n were sorted according to the following order, where v + (·) and v − (·) denote the sorted indices of the trials in classes + and −, respectively. The first ε 2 * K + and ε 2 * K − noisy trial marks of m k n were set to zero for classes + and −, respectively, where ε 2 denotes a predefined percentage of the type-2 noisy trials. In other words, type-2 noisy trials were detected as follows, Note that the spatiospectral filter (corresponding to one particle) may filter out the noise that leads to a high q k n value. Therefore, the noisy trials in one particle may be different from those in another particle. These contaminated trials were iteratively pruned when the CSP was executed at the instant at which the importance weight of the particle was updated. The impact of the contaminated trials could then be reduced during the design of the filters and classifier in the training phase. As a result, a set of spectral and spatial filters could be iteratively constructed by simultaneously estimating the and spatial filters were optimized by the PWSPTP and the contaminated trials were removed during the training process. The output of the mixture of expert models was

Performance Evaluation
To demonstrate the efficiency of discriminative feature extraction of the PWSPTP, this study considered the evaluation protocol of paired binary classification, i.e., left-versus right hand, left hand versus foot, left hand versus tongue, right hand versus foot, right hand versus tongue, and

Mixture of Expert Classifiers
It has been determined that a mixture of expert classifiers could improve the classification accuracy [30]. The present study linearly combined multiple classifiers to predict the MI class. Each classifier s n played the role of one expert and was trained by the features extracted from one spectral filter with bandwidth b n and one spatial filterŴ n . The mixing coefficient of s n was the importance weight π n . A support vector machine (SVM) that possessed a strong classification performance [31] was used as the classifier. Note that these classifiers were trained after the spectral Symmetry 2020, 12, 1387 9 of 15 and spatial filters were optimized by the PWSPTP and the contaminated trials were removed during the training process. The output of the mixture of expert models waŝ π n ·s n f n (14)

Performance Evaluation
To demonstrate the efficiency of discriminative feature extraction of the PWSPTP, this study considered the evaluation protocol of paired binary classification, i.e., left-versus right hand, left hand versus foot, left hand versus tongue, right hand versus foot, right hand versus tongue, and foot versus tongue. This protocol evaluated the discriminative power of the features extracted by the learnt frequency bands when the contaminated trials were eliminated during the training process.
The classification accuracy of the evaluation protocol was the percentage of correctly classified trials. All experiments were conducted with MATLAB (Version R2019a, MathWorks, Natick, MA, USA), with an Intel (R) Core (TM) i7-6900K central processing unit (CPU) @3.2 GHz, a 64 GB random access memory (RAM), and the Microsoft Windows 10 operating system.

Experimental Results and Analysis
This section introduced the dataset used in the experiment and then presented the setting of the proposed PWSPTP. The classification results of the PWSPTP were presented when FBCSP and BSSFO are compared. Finally, a discussion is provided to analyze the experimental results.

Dataset
BCI Competition IV Datasets 2a provided by the Institute for Knowledge Discovery were used for the evaluation of the proposed PWSPTP [32]. Nine subjects performed four MI tasks (left hand, right hand, both feet, and tongue). One training session and one testing session were recorded with 22 EEG channels and were sampled with a frequency of 250 Hz for each subject. Each session consisted of 288 trials with 72 trials for each MI task. The EEG signals in the dataset were originally bandpass filtered (0.5 Hz-100 Hz) and further processed by a 50 Hz notch filter to suppress line noise. This study bandpass filtered (4 Hz-40 Hz) the EEG signals to cover the µ (8-14 Hz) and β (14-30 Hz) rhythms. Although prior neurophysiological knowledge was adopted, there were no predefined frequency bands in the filter bank. A small Laplacian derivation [33] was applied to each electrode using anterior, posterior and both lateral electrodes to increase the signal-to-noise ratio. Following the same protocol as that adopted by [34], the time segment with the length of 2 s between 0.5 s and 2.5 s after visual cue onset in each trial was adopted for MI classification in both training and testing sessions. Detailed descriptions of the BCI Competition IV Datasets 2a could be found in [32].

Setting of PWSPTP
The initial prior distribution which was adopted to generate a set of particles was set as a mixture of Gaussian functions according to prior neurophysiology knowledge where µ and β represent the mean frequencies of the µ-(8-14 Hz) and β-rhythms (14-30 Hz), respectively, and the covariance Σ was set to be an identical matrix. Therefore, the start b s and the end b e of the frequency band (particle) were initially sampled from the mixture of Gaussians. The half number of the spatial patterns, d, was set to one. A Gaussian kernel-based SVM was used in the present study owing to its superior classification performance of BCI studies [7,21,35].

Effects of Numbers of Particles
The number of particles (N) was the hyperparameter of the PWSPTP. To investigate the effects of the hyperparameter to the MI classification accuracy, PWSPTP used various N values were implemented. Figure 3 shows the experimental results with various N values on the left-and right-hand MI classification accuracy for subject 2. The classification accuracy was improved as N increased, but was not monotonically improved when N was larger than 35. Therefore, the PWSPTP was used for the construction of the filter bank with 35 frequency bands.

Effects of Pruning Percentages
The pruning percentages ( 1 and 2 ) determined the numbers of type-1 and -2 noisy trials and were designed empirically. The previous study revealed that the trial pruning method improved the classification accuracies of the subjects that were associated with low-MI classification accuracies [12]. The left-hand versus right-hand classification of subject 2 has been known as a strongly confused MI binary classification [12,21,36]. The left-and right-hand MI classifications of subject 2 were conducted to analyze the effects of various 1 and 2 , as presented in Figure 4.
Increasing the values of 1 and 2 improved the classification accuracy when 1 and 2 were smaller than 3% and 7%, respectively. The classification accuracy decreased when 1 and 2 were larger than 3% and 7%, respectively. Therefore, 1 and 2 were empirically set to 3% and 7%, respectively, to remove two types of noisy trials during the training phase.

Effects of Pruning Percentages
The pruning percentages (ε 1 and ε 2 ) determined the numbers of type-1 and -2 noisy trials and were designed empirically. The previous study revealed that the trial pruning method improved the classification accuracies of the subjects that were associated with low-MI classification accuracies [12]. The left-hand versus right-hand classification of subject 2 has been known as a strongly confused MI binary classification [12,21,36]. The left-and right-hand MI classifications of subject 2 were conducted to analyze the effects of various ε 1 and ε 2 , as presented in Figure 4.

Effects of Pruning Percentages
The pruning percentages ( 1 and 2 ) determined the numbers of type-1 and -2 noisy trials and were designed empirically. The previous study revealed that the trial pruning method improved the classification accuracies of the subjects that were associated with low-MI classification accuracies [12]. The left-hand versus right-hand classification of subject 2 has been known as a strongly confused MI binary classification [12,21,36]. The left-and right-hand MI classifications of subject 2 were conducted to analyze the effects of various 1 and 2 , as presented in Figure 4.
Increasing the values of 1 and 2 improved the classification accuracy when 1 and 2 were smaller than 3% and 7%, respectively. The classification accuracy decreased when 1 and 2 were larger than 3% and 7%, respectively. Therefore, 1 and 2 were empirically set to 3% and 7%, respectively, to remove two types of noisy trials during the training phase.

Visualization of Spatial Patterns and Distribution of Discriminative Frequency Band
The most significant spatial patterns learnt along with the spectral filter (particle) with maximum importance weight value of subject 1 are shown in Figure 5. These spatial patterns are distinguishable among six MI pairs, and were consistent with the neurophysiological plausibility Increasing the values of ε 1 and ε 2 improved the classification accuracy when ε 1 and ε 2 were smaller than 3% and 7%, respectively. The classification accuracy decreased when ε 1 and ε 2 were larger than 3% and 7%, respectively. Therefore, ε 1 and ε 2 were empirically set to 3% and 7%, respectively, to remove two types of noisy trials during the training phase.

Visualization of Spatial Patterns and Distribution of Discriminative Frequency Band
The most significant spatial patterns learnt along with the spectral filter (particle) with maximum importance weight value of subject 1 are shown in Figure 5. These spatial patterns are distinguishable among six MI pairs, and were consistent with the neurophysiological plausibility [5]. The learnt spatial patterns identified the attenuations and increases in the localized neural rhythmic activities and helped improve the MI classification.
Symmetry 2020, 12, x FOR PEER REVIEW 11 of 15 [5]. The learnt spatial patterns identified the attenuations and increases in the localized neural rhythmic activities and helped improve the MI classification. The distribution of the discriminative frequency band learnt by the proposed PWSPTP is presented in Figure 6. The probability was computed within an interval of 0.5 Hz. Most optimized particles distributed around the µ-rhythm. This indicates that these discriminative frequency bands matched the neurophysiological knowledge. Figure 6. Visualization of the distribution of the discriminative frequency band wherein the optimized particles were marked as black circles.

Classification Performance
The classification performance of the PWSPTP was averaged over ten independent runs owing to its stochastic properties. The experimental results of six paired binary MI classifications among nine subjects are presented in Table 1. The classification accuracies varied from subject to subject. Specifically, subjects 1 and 3 obtained higher accuracies, while subject 5 and 6 obtained lower The distribution of the discriminative frequency band learnt by the proposed PWSPTP is presented in Figure 6. The probability was computed within an interval of 0.5 Hz. Most optimized particles distributed around the µ-rhythm. This indicates that these discriminative frequency bands matched the neurophysiological knowledge.
Symmetry 2020, 12, x FOR PEER REVIEW 11 of 15 [5]. The learnt spatial patterns identified the attenuations and increases in the localized neural rhythmic activities and helped improve the MI classification. The distribution of the discriminative frequency band learnt by the proposed PWSPTP is presented in Figure 6. The probability was computed within an interval of 0.5 Hz. Most optimized particles distributed around the µ-rhythm. This indicates that these discriminative frequency bands matched the neurophysiological knowledge. Figure 6. Visualization of the distribution of the discriminative frequency band wherein the optimized particles were marked as black circles.

Classification Performance
The classification performance of the PWSPTP was averaged over ten independent runs owing to its stochastic properties. The experimental results of six paired binary MI classifications among nine subjects are presented in Table 1. The classification accuracies varied from subject to subject. Specifically, subjects 1 and 3 obtained higher accuracies, while subject 5 and 6 obtained lower

Classification Performance
The classification performance of the PWSPTP was averaged over ten independent runs owing to its stochastic properties. The experimental results of six paired binary MI classifications among nine subjects are presented in Table 1. The classification accuracies varied from subject to subject. Specifically, subjects 1 and 3 obtained higher accuracies, while subject 5 and 6 obtained lower accuracies. For the paired MI classification, the PWSPTP achieved better performance in the MI of the left hand vs. foot, left hand vs. tongue, and right hand vs. tongue (in this order). The PWSPTP was compared with the CSP outcomes [6] to evaluate the effects of the spatiospectral filter optimization integrated in the trial pruning method. The experimental results in Table 2 revealed that the proposed PWSPTP outperformed the CSP. The trial pruning improved the classification accuracy based on the removal of the contaminated trials during the optimization of the spatiospectral filter.

Effect of Number of Particles on the Classification Accuracy
When the number of particles increased, the classification accuracy was not monotonically improved. As the number of particles was small, only few spectral and spatial patterns could be extracted by these particles (frequency bands) that may not possess sufficient discriminative power for the MI tasks. Conversely, as the number of particles was too large, some particles may be resampled from the same particle with high discriminative power. These particles may possess high overlapping portions of frequency bands compared with the other particles. Although these particles possess high discriminative power, these frequency bands may result in highly similar spectral and spatial patterns that do not provide diverse features for training the BCI. Furthermore, the BCI may overfit the training data that possess large dimensional features. Therefore, an appropriate number of particles could not only extract a sufficient number of spectral and spatial patterns, but prevent too many particles being almost identical.

Effect of Pruning Percentages on the Training Phase
Continuous increases of the ε 1 and ε 2 values may prune more trials that may not be contaminated trials. When too many trials are eliminated, the PWSPTP could not learn accurately the spatiospectral filters owing to the small amount of trials available for the training phase. Insufficient data amounts may lead to overfitting issues in the training of the classifier. These results suggested that approximately 3% and 7% of the trials were respectively contaminated by background noise/artifacts and improper MI for subject 2.

Asymmetrical Spatial Patterns on Left-versus Right-Hand MI Classification
The learnt spatial patterns of the left-hand versus right-hand MI classification addressed neurophysiologically plausible areas. The spatial patterns of the left-and right-hand MIs focus on the right and left motor cortices, respectively. These asymmetrical spatial patterns on left-versus right-hand MI classifications are consistent with the neurophysiological plausibility [5]. This suggested that the PWSPTP could automatically learn MI-related spatial patterns which match neurophysiological evidence. From the distribution of the discriminative frequency band, the PWSPTP could automatically learn the discriminative frequency bands related to the µ-rhythm without predefining them so that the discriminative spectral features associated with MI could be effectively extracted for classification.

Effect of Trial Pruning on Paired Binary MI Classification Performance
It has been revealed that the paired binary MI classification of the left-hand vs. foot was better than the left hand vs. right hand among nine subjects. When the subject performed four MI tasks, the likelihood of performing improper left-/right-hand MI may be higher than that of left-hand/foot MI. Therefore, the left-/right-hand MI may result in more type-2 noisy trials than the left-hand/foot MI. Although PWSPTP could prune the contaminated trials, a few type-2 noisy trials may not be eliminated owing to the use of fixed pruning percentages. Thus, the classification accuracy of the left-hand/foot MI was higher than that of the left-hand/right-hand MI.
The CSP is a data-driven method that is sensitive to contaminated trials. The contaminated trials may increase the likelihood of the MI classifier learning spectral-spatial features irrelevant to the MI task, and could thus hamper the MI classification accuracy. The PWSPTP not only integrated trial pruning in the spatiospectral filter optimization, but shared the information of the discriminative power of spectral features to measure the contamination levels of the trials. Comparing with the CSP, the PWSPTP extracted the spatiospectral features without any noted effects on the outcomes from trials conducted in the presence of noise, and then improved the classification accuracy.

Conclusions and Future Work
The present study proposed a PWSPTP that constructed a filter bank without predefined frequency bands and without contaminated trials during the training phase. A Parzen window density estimator was applied to measure both the discriminative power of the spatiospectral features and contamination levels of the trials. A set of discriminative spectral filters was learnt by a particle-based approximation technique. Meanwhile, the use of ANOVA F-tests makes the computation of an importance weight of each spectral filter simpler. A set of potentially contaminated trials were removed during the training of classifiers without the effects of artifacts and improper motor imagery. Six paired binary MI classifications among nine subjects were conducted to evaluate the effectiveness of the learnt filter bank and trial pruning. The PWSPTP outperformed the CSP in terms of classification accuracy. The learnt spatial patterns and discriminative frequency bands were approximately consistent with the neurophysiological plausibility. The number of spectral filters and the number of percentages of pruned trials were hyperparameters and should be defined before running the PWSPTP. Future work will be devoted to the adjustment of the number of spectral filters and the number of percentages of pruned trials in the training phase.