An Adaptive Task-Related Component Analysis Method for SSVEP Recognition

Steady-State Visual Evoked Potential (SSVEP) recognition methods use a subject’s calibration data to differentiate between brain responses, hence, providing the SSVEP-based brain–computer interfaces (BCIs) with high performance. However, they require sufficient calibration EEG trials to achieve that. This study develops a new method to learn from limited calibration EEG trials, and it proposes and evaluates a novel adaptive data-driven spatial filtering approach for enhancing SSVEP detection. The spatial filter learned from each stimulus utilizes temporal information from the corresponding EEG trials. To introduce the temporal information into the overall procedure, a multitask learning approach, based on the Bayesian framework, is adopted. The performance of the proposed method was evaluated into two publicly available benchmark datasets, and the results demonstrated that our method outperformed competing methods by a significant margin.


Introduction
A Brain Computer Interface (BCI) system is a device that it translates human brain activity into artificially generated control signals, providing us with an alternative communication medium, other than physical communication, which can be used to help people with motor disabilities [1], to augment communication abilities of healthy individuals, [2], for entertainment [2], and, for neuromarketing purposes [3].Brain activity can be measured with various specialized devices such as MRI scanners and electroencephalograms (EEG)-based devices, where, EEG devices is widely used since the required equipment is simple and inexpensive.EEG-based BCI systems utilize various brain responses such as motor imagery and visual responses, from which, the use of Steady State Visual Evoked Potentials (SSVEPs) have attracted the attention of many researchers due to their lower training requirements for the end-user and higher information transfer rates [4,5].When an individual is looking into a visual stimulus, which is flashing as a fixed frequency, then a brain response is revealed in occipital and occipital -parietal areas of individual's brain which is called SSVEP response [6].A SSVEP response contains sinusoidal components which are related to the fundamental frequency of the visual stimulus as well as its harmonics.The overarching goal of a SSVEP BCI system is to detect the different frequency components corresponding to the visual stimuli and translate them into commands, by using an EEG-based pattern recognition algorithms.
The recognition of SSVEP responses involves the use of Machine Learning (ML) algorithms.Linear classifiers such as Support Vector Machines (SVMs) and the Linear Discriminant Analysis (LDA) have been used to detect SSVEPs [7].In addition, in [8] the use of Multivariate Linear Regression (MLR) was proposed to learn discriminative features for improving SSVEP classification, while, in [9] kernel -based extensions of MLR were proposed using SSVEP-related kernels as an integral part of the Sparse Bayesian Learning (SBL) framework.Furthermore, Deep Learning (DL) approaches using Convolutional Neural Networks (CNN), based on time frequency analysis, are used to discriminate SSVEP responses [10,11].
However, SSVEP responses present specific frequency and spatial characteristics, hence methods utilizing these characteristics have been proposed.More specifically, methods based on Power Spectrum Density Analysis (PSDA) were widely used for frequency detection [12], where the target frequency is assigned to the frequency corresponding to maximum value of PSD.However, the efficient calculation of PSD requires a relatively large time window, and it is sensitive to noise [13,14].To avoid the above shortcomings of PSDA, spatial filtering approaches have been proposed.In [14] the Minimum Energy Combination (MEC) method has been proposed while in [13] the Canonical Correlation Analysis (CCA) method was introduced.Both methods use sinusoids waves as reference templates and they solve an optimization problem, based on multi-channel SSVEP data, in order to obtain optimal spatial filters.Finally, extensions of CCA have been proposed in [15,16,17,18,19,20], extracting the subject-specific and task-related information from the individ-ual calibration data and reduce the effect of spontaneous background EEG activities.
From spatial filtering methods, the task-related component analysis (TRCA)based method [18] shows great potential since it has achieved superior performance among various spatial filtering methods.The core idea of TRCA is to acquire the spatial filters by strengthening the task-related SSVEP components and suppressing the noise.TRCA-based methods are followed by the target detection step where the similarity between the filtered test signal and the filtered template is calculated via the correlation coefficient.All spatial filtering -based methods are based on the basic (generalized) eigenvalue problem [21,22].However, difference between various approaches can be observed and these difference are reflected to the way that the matrices, involved in the eigenvalue problem, are constructed [22].In [23] Correlated Component Analysis (CORCA)assumes that the task related component is shared among the subjects by adopting a transfer learning procedure in the construction of covariance matrices.While, in [24] a task discriminant component analysis was applied which involves the construction of within and between SSVEP targets covariance matrices.
One defect of TRCA is that it can only deal with limited noise components.For other kinds of noise, such as locally occurring noises that have the same profile, the TRCA-based method is powerless [25,26].Additionally, its performance deteriorate drastically if the calibration trials are insufficient.To deal with more general noises and the number of trials, we introduce a novel adaptive time-domain filter resulting in more reliable similarity measurement.By introducing the temporally-based filter into the objective function of the TRCA-based method we construct a time filter that acts together with the spatial filter to suppress more general noises.Furthermore, the filter adapts to the statistical properties of SSVEP trials.
The rest of this paper is organized as follows.In section 2, first, we provide a short description of CCA and TRCA concentrating on the mathematical formulation of them, and then, we describe our approach for SSVEP recognition.Section 3, we describe the SSVEP datasets that are used in our study, and then we provide details about our experiments and the performance of our method.Also, a comparison with competing methods is provided.Finally, a short discussion and some concluding remarks are are provided in section 4.

Problem Description
When a SSVEP experiment is taken place, the subject is seated in front of a screen where visual stimuli are flashing in different frequencies, During the experiment raw EEG data are collected in order to calibrate the overall system.The segmentation of raw EEG data (using event triggers), results into a set of trials for each visual stimulus (or class).Using these EEG trials the experimenter can calibrated the BCI system (for example, by training the classifier).Let us assume that the SSVEP dataset is a collection of multichannel EEG trials M } Ns s=1 for each participant, where M is the number of trials of a SSVEP target, (s) is the index of the SSVEP target, N s is the number of SSVEP targets (or classes).Each X where N ch is the number of channels and N t the number of samples.Additionally, we assume that the multi-channel EEG signals are centralized since in practise the EEG trials are bandpass filtered or detrended.

Canonical Correlation Analysis (CCA)
Spatial filtering attempts to maximize the SNR between the raw EEG data and the spatial filtered version of them.In typical cases, such as bipolar combination or laplacian filtering, the spatial filters are determined manually.However, this approach does not take into account any prior knowledge about SSVEPs or any subject-specific information.One of the first approach that take into consideration the structure of SSVEPs was based on Canonical Correlation Analysis (CCA) [13].The CCA is a multivariate statistical method attempting to discover underlying correlations between two sets of data [13,27].These two sets of data is assumed to be only a different view (or representation) of the same original (hidden) data.More specifically, CCA finds a linear projection for each set such that these two set are maximally correlated in the hidden (dimensionality-reduced) space.
In the SSVEP problem these two views is the test EEG trial X (s) m and the reference templates for s-th stimulus Y (fs) , where Since ρ s is invariant to the scaling of w s and v s the above optimization problem can be also formulated as the following generalized eigenvalue problem: where λ s is the eigenvalue corresponding to the eigenvector w s .
In order to find the stimulus of test EEG trial X (s) m , that the subject intends to select, we find ρ s for all available stimuli and the stimulus-target, c, is then identified by finding the index of the maximum feature among N s features: c = arg max s {ρ s }.It must be observed here that there is no need for training (or calibration) since the templates Y fs are artificially generated.

Task Related Component Analysis (TRCA)
Task-related component analysis (TRCA) enhances reproducibility of SSVEPs across multiple trials and the intuition of TRCA is to maximize the reproducibility of SSVEP target-related components after spatial filtering.More specifically, the TRCA method find the spatial filters w s by solving a generalized eigenvalue problem which is described by the following equation: where m , and B is a concatenated matrix contains all trials of s-th stimulus In order to find the target of the test trial, X test we apply the following discriminant function: where corr(•, •) denotes the Pearson's correlation coefficient.

Adaptive Task-Related Component Analysis (adTRCA)
In our work we propose a new generalized eigenvalue problem for SSVEP detection which is described by the following equation: where C and D are "filtering" matrix that acts on the time dimension of trials.The matrices C and D can be defined using various approaches and their goal is to remove noise in time domain.
In our study we make some critical assumptions about the generation model of SSVEP responses, which affect the data analysis procedure.More specifically, SSVEP responses contains strong sinusoids components [13], hence the SSVEP signal in each channel is modeled as a linear combination of sinusoids described the following matrix: Additionally, SSVEP responses belonging to the same visual stimulus share common components.From the above we can observe that the generation of SSVEP responses can be modeled as multiple regression tasks that share common information.
EEG trials from the s-th stimulus are collected in matrix Ns) , where each column of B contains the data from one channel or each column of B contains the data from one task.Hence we have Each learning task can be described by the following linear regression model: where w i 2N s N h × 1 vector of weights (or parameters), and, e i N t × 1 vector of noise coming from a zero mean Gaussian random variable with unknown precision (inverse variance) a 0 .We can observe that each of the mapping yields a corresponding regression task, and performing multiple such learning tasks has been referred to as multitask learning [28], which aims at sharing information effectively among multiple related tasks.In a more abstract view of our problem we can see that each learning task is a linear regression problem, and sinusoids components from one regression task affect the fitting procedure of another regression task.
The likelihood function for parameters w i and a 0 is given by: The parameters of a regression task, w i , are assumed to be drawn from a product of zero-mean Gaussian distributions that are shared by all tasks.Letting w i,j be the j-th parameters for i-th task then we have: where the hyperparameters a = {a j } j=1,2,••• ,2NsN h are shared among N ch N s regression tasks, hence, data from all regression tasks contribute to learning these hyperparameters.To promote sparsity over parameters, we place Gamma priors over hyperparameters a [29,28].Also, the same type of prior is placed over noise precision a 0 .
In addition, we can observed here, that noise properties are shared among different tasks (i.e. the noise vectors in Eq. ( 6) are drawn from the same Gaussian distribution).Finally, it must be noted that we have an hierarchical model, and these types of models are natural to be "dealt" within the bayesian framework.
Given hyperparameters a and noise precision a 0 , we can apply Bayes theorem to find the posterior distribution over w i , which is a Gaussian distribution: where and In order to find hyperparameters a and promote sparsity in parameters, the type-II Maximum Likelihood procedure is adopted [29,30], where the objective is to maximize the marginal likelihood (or its logarithm).Also, a similar procedure is followed for the noise precision.The marginal likelihood L(a, a 0 ) is given by: where C i = a −1 0 I + ΦAΦ T Differentiating L(a, a 0 ) with respect to a and a 0 and setting the results into zero [29,30,28] (after some algebraic manipulations) we obtain: Σ i,(j,j) a where µ i,j the j-th element of µ i and Σ i,(j,j) the j-th diagonal element of covariance matrix Σ i .The above analysis suggests an iterative algorithm that iterates between Eqs. ( 12), ( 13), ( 15) and ( 16), until a convergence criterion is satisfied.Also, the same algorithm can be derived by adopting the EM framework and treating parameters w i as hidden variables [29].Finally, based on the above bayesian formulation, we can derive a fast version of the above algorithm.The fast version provides an elegant treatment of feature vectors by constructing adaptively the matrix Φ through three basic operators: addition, deletion and re-estimation.More information on this subject can be found in [29,28].Now, SSVEP components in each task can be represented as: m a 0 ΦΣ i Φ .Due to filtered trials we find the spatial filters w s by solving the following generalized eigenvalue problem: where m , and B f is a concatenated matrix contains all trials of s-th stimulus, The above generalized eigenvalue problem can be connected by that of Eq. 5.After some algebraic manipulations Eq. 17 can be written as: max ws w s ACA w s w s BDB w s (18) where We can observe an interesting connection between the proposed method and the TRCA.
When the C = I, where I is the unitary matrix, the proposed approach degraded to the TRCA method.We see that the TRCA method is a limiting case of the proposed method.In addition, we can observe that matrices C and D acts on the time dimension of the EEG trials, hence time samples are differently treated according to the time dimension rather than equally weighted.Also, we can observe that filters, represented by matrix C, are adapted to the statistical properties of EEG trials.Finally, after finding the spatial filters, to find the target of the test trial, X f test we apply the following discriminant function:

Ensemble case
According to the previous discriminant rule, described by Eq. 19, we can observe that to calculate the similarity of the test trial to the stimulus s first we apply spatial filter w s .However, since we have enough calibration trials we can obtained N s spatial filters for each stimulus [18,23].Hence, we can extend our method using an ensemble approach, similar to [18,23], where all spatial filters for stimulus s are concatenated to create an ensemble spatial filter W s ∈ R N ch ×Ns .Now, in order to find the target of the test trial, X f test we apply the following discriminant function: where, now, the function corr(•) depicts correlation between matrices.

Results
In our study two, publicly available, SSVEP datasets are used to evaluate our method.We call these datasets, the Speller dataset and the EPOC dataset, and we provide a description of them in the following paragraphs.
Speller dataset [31]: The Speller dataset [31] contains 40-target visual stimuli were presented on a 23.6-in LCD monitor.Thirty five healthy subjects (with normal or corrected-to-normal vision) were recruited for the SSVEP experiment.Furthrmore, eight of them had experience with using a SSVEPbased BCI speller.EEG signals were recorded with 64 channels according to an extended 10-20 system.In our experiments we have used electrodes from the occipital and parietal-occipital areas (P z , P O 5 , P O 3 , P O z , P O 4 , P O 6 , O 1 , O z , and O 2 ) .During the experiment, each subject completes 6 blocks, where in each block the subject was looking at one of the visual stimuli, indicated by the stimulus program, in a randomg order for 5s and he/she complete 40 trials corresponding to all 40 targets.Using event triggers, EEG trials were extracted and down-sampled to 250Hz.Futhermore, there trials have been band-pass filterd to 7-90Hz with an infinite impulse response (IIR) filter using the filtfilt function from MATLAB.Additionally, a delay of 140ms in the visual system was considered in our experiments [31].
EPOC dataset [7]: EEG trials, from 11 subjects executing a SSVEPbased experimental protocol, were acquired.The frequencies of visual stimulation were: 6.66Hz, 7.50Hz, 8.57Hz, 10.00Hz and 12.00Hz.EEG trials were recorded using the Emotiv Epoc device, with 14 wireless channels and a sampling rate of 128 Hz.Each subject was asked to gaze at one of the visual stimuli, indicated by the stimulus program, in a random order, and complete 20 trials for each of the five targets.The EEG data have been band-pass filtered from 5Hz to 45Hz.More information about this dataset can be found in [7] and https://physionet.org/content/mssvepdb/1.0.0/.The visual latency is an important factor in a SSVEP BCI system.An accurate estimation of the visual latency ensures that the extracted data epochs only contain SSVEP responses to the stimulation.In the Speller dataset the visual latency is defined using the classification accuracy [31], while, in the EPOC dataset an alignment procedure is used to extract data epochs containing the SSVEP responses.

Performance metrics
To calculate the performance metrics we adopt a leave-one-out crossvalidation scheme where we use B − 1 blocks for training and 1 block for testing, similar to [17,8,31,9].The performance of the examined algorithms was calculated by using the following metrics: Classification Accuracy: Classification Accuracy is the ratio between the number of correctly classified trials to the total number of trials.
Information Transfer Rate (ITR): When a BCI system is designed, we are interested, besides the accuracy, to the number of classes offered, as well as, to the time required for the classification process.These two factors are particularly important in the design of BCI systems since they determine the amount of information that can be transmitted.Hence, we use an additional metric called Information Transfer Rate (ITR).ITR is used to quantify the rate of information transmitted by the BCI system and is measured by the following equation: with K being the number of classes, P the classification accuracy, and T the time in seconds required for the classification process to complete.We compare the proposed spatial filtering method with two spatial filtering approaches: the CCA [13] and the TRCA [18], and with two ML methods: the MLR approach [8] and the Graph-based Sparse Representations Classification (MLR-SRC) [32].We have calculate the above metrics with respect the time window of the test EEG trial, the number of EEG channels and the number of training trials.

Performance Comparison versus the Time Window Length
In the first series of experiments we follow a typical analysis of SSVEP datasets by examine the performance of methods with respect to the time window.More specifically, the performance of methods are evaluated for variable time from 0.5sec to 4sec with step of 0.5sec.In addition for the Speller dataset we use the 9 channels from occipital and parietal -occipital areas (Pz, PO5, PO3, POz, PO4, PO6, O1, Oz, O2), while, for the EPOC dataset we use all available 14 channels, covering the entire brain.In Fig. 1 we provide the obtained results for all comparative methods for the two datasets using the basic channel configuration.We can observed the adTRCA and TRCA methods provides better results from all other methods in both datasets.Also, we can observed that between the adTRCA and TRCA, the adTRCA method has marginally better detection accuracy in the Speller dataset, and significant better detection accuracy in the EPOC dataset.Similar conclusions can be drawn with respect to the ITR (see Tables 1 and 2).We can observed that for most TWs, the adTRCA methods provides the best ITR among all methods.However, we must point out, that for the Speller dataset the best ITR is achieved at TW=0.5s from the adTRCA method, while, for the EPOC dataset, the best ITR value is achieved at TWs=0.5s from the TRCA method.

Performance Comparison using the minimal number of EEG channels
In the second series of experiments we have used the minimal number of EEG channels to evaluate the performance of methods.These EEG channels covering the occipital area depending of the EEG device that had been used in each dataset.More specifically, for the Speller dataset O1, O2 and Oz EEG channels are used, while, for the EPOC dataset, we use O1 and O2 channels.This study could correspond to cases where we are not able to use  high density EEG devices such as in BCI applications outside a controlled environment.
In Fig. 2 we provide the obtained results for all comparative methods for the two datasets using the minimal channels' configuration.We can observed the adTRCA and TRCA methods provides better results from all other methods in both datasets.Also, we can observed that between the adTRCA and TRCA, the adTRCA method has better significant better detection accuracy in both datasets.In Tables 3 and 4, we provide the results with respect to the ITR measure for all methods.We can observed that for all TWs, besides ones, the adTRCA methods provides the best ITR among all methods.However, while for the Speller dataset the best ITR is achieved at TW=0.5s from the adTRCA method, for the EPOC dataset, the best ITR value is achieved at the same TW from the TRCA method.

Performance Comparison versus the Number of training trials
In the last series of experiments, we investigate how the performance of TRCA and adTRCA are affected by the number of training trials when the time window is 1sec.The obtain results are provided in Tables 5 and 6.The proposed method clearly has better accuracy from TRCA.Additionally, the proposed scheme achieves the best performance among them and the margin provided by it is more distinct when a smaller number of training blocks are utilized.Especially, in the case Speller dataset, when only three training blocks are utilized, the proposed scheme achieves a classification accuracy of 63%, while the TRCA method achieve an accuracy of 61%.Furthermore, in the case of EPOC dataset, when three training blocks are used the proposed scheme achieves a classification accuracy of 33%, while the TRCA method achieve an accuracy of 22% (slighlty above the random guess).

Ensemble case -Experiments
In the previous subsection we have presented experiments and we performed comparisons using the basic version of our method.In the current  subsection, we provide the obtained results of the ensemble version of our method (see 2.5), and also, we perform a comparison with ensemble TRCA [18].More specifically, in Fig. 3 we provide the obtained results for the ensemble TRCA (EnsembleTRCA) and ensemble Adaptive TRCA (Ensemble adTRCA) methods.The comparison between the two methods is performed with respect to the number of channels and the datasets.For the Speller dataset we can observe that in the case of 9 channels the two methods present similar performance.However, in the case of 3 channels the Ensemble adTRCA method provides significantly better performance than the EnsembleTRCA method.Additionally, when we use the EPOC dataset, the Ensemble adTRCA method provides significantly better performance, either using all 14 channels of the EPOC device or the two channels covering the occipital lobe.

Discussion and Conclusions
Enhancing the performance of SSVEP recognition is a significant issue for BCI applications.In this study, we develop a multitask learning scheme   to strengthen the TRCA method.The idea behind the proposed learning scheme is to develop a adaptive time -domain filter which can be used to a more general eigenvalue problem than the corresponding problem of TRCA method.The proposed method is able to deal with more general noises and with reduced number of trials as the experiments have shown.However, this increase in performance from our method has an increase also in the computation time of the overall procedure since an iterative method is used to find the adaptive time domain filters.A significant part of our study is the use of two SSVEP datasets to evaluate our method.Most of SSVEP studies use the Speller dataset to evaluate the proposed methods.However, this dataset was created into a controlled environment with high cost EEG equipment which make difficult to replicate An effect which can be observe by comparing the performance of adTRCA and TRCA on both datasets.We can see that adTRCA method provides much better performance than the TRCA in the case of EPOC dataset.In the Speller dataset the adTRCA method has around 1% better accuracy than the TRCA, while, in the EPOC dataset this difference is increase to 5%.Additionally, the ensemble version of our method presents better performance than the ensemble version of TRCA in both datasets, especially when we have a limited number of channels.
The spatial filters and the SSVEP templates play important roles in the target recognition methods.When the spatial filters and the SSVEP templates can not be accurately computed, e.g. in the case of small calibration data or noisy EEG recordings, the resulting recognition performance will be dramatically decreased.Hence, to this challenge the key is how to estimate reliable spatial filters.In this study, we present a novel spatial filtering approach to recognize SSVEP signals.Our method use the multi-task idea to construct adaptive time -domain filters resulting into a generalized eigenvalue problem from where the final spatial filters are obtained.Extensive experiments, using two SSVEP datasets, have been shown the usefulness of our method.The proposed method significantly outperformed the TRCA, the CCA, the MLR and the SRC methods in terms of classification accuracy and ITR.Finally, future extensions of our approach could include transfer learning approaches utilizing the data from all subjects to construct the recognition model.

Figure 1 :
Figure 1: Average Classification over all subjects (a) for the Speller dataset and (b) the EPOC dataset 14 using the basic configuration with respect to the EEG channels.In both cases, the time window ranges from 0.5s to 4s (0.5s interval).* indicates statistically significant difference between the TRCA and adTRCA methods, using paired sample t-test for Speller dataset and Wilcoxon signed rank test for the EPOC dataset (p < 0.05).

Figure 2 :
Figure 2: Average Classification over all subjects (a) for the Speller dataset (b) for the EPOC dataset using the EEG channels covering the occipital areas.In both cases, the time window ranges from 0.5s to 4s (0.5s interval).* indicates statistically significant difference between the TRCA and adTRCA methods, using paired sample t-test for Speller dataset and Wilcoxon signed rank test for the EPOC dataset (p < 0.05).

Figure 3 :
Figure 3: Average Classification over all subjects by using for the Speller dataset with (a) 9 channels and (b) 3 channels and for the EPOC dataset with (c) 14 channels and (d) 2 channels, respectively.In both cases, the time window ranges from 0.5s to 4s (0.5s interval).* indicates statistically significant difference between the two methods using paired sample t-test for Speller dataset and Wilcoxon signed rank test for the EPOC dataset (p < 0.05).
f s is the frequency of s-th stimulus Typically, CCA methods maximize the linear correlation between the projections w T s X m and v T s Y fs , where w s ∈ R N ch and v s ∈ R Nt .At the end, we solve the following optimization problem: max ρ s = max ws,vs

Table 5 :
Classification accuracy (%) on Speller dataset with respect to the number of training trials -9 channels

Table 6 :
Classification accuracy (%) on EPOC dataset with respect to the number of training trials -2 channels the study for real BCI applications with low cost equipment and in very noisy environment.Hence the aforementioned methods that are evaluated on the Speller dataset tend to underestimate the noise part of SSVEP EEG trials.