Steady-State Visual Evoked Potential Classification Using Complex Valued Convolutional Neural Networks

The steady-state visual evoked potential (SSVEP), which is a kind of event-related potential in electroencephalograms (EEGs), has been applied to brain–computer interfaces (BCIs). SSVEP-based BCIs currently achieve the highest information transfer rate (ITR) among the various BCI implementation methods. Canonical correlation analysis (CCA) or spectrum estimation, such as the Fourier transform, and their extensions have been used to extract features of SSVEPs. However, these feature extraction methods restrict the usable stimulation frequencies; thus, the number of commands is limited. In this paper, we propose a complex valued convolutional neural network (CVCNN) to overcome this limitation of SSVEP-based BCIs. The experimental results demonstrate that the proposed method overcomes the limitation on the stimulation frequency, and it outperforms conventional SSVEP feature extraction methods.


Introduction
The brain-computer interface (BCI) or brain-machine interface (BMI) is a direct communication pathway for controlling external devices by discriminating brain signals [1,2]. BCIs have been widely researched and applied for many practical systems [3][4][5][6]. There are various methods for acquiring brain signals for BCIs such as electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI). In this paper, we focus on EEG-based BCI because EEG is a noninvasive and simple signal acquisition method for BCIs [7].
There are various ways to issue commands for BCIs. For example, event-related potentials (ERPs) such as P300 [8], sensorimotor rhythms (SMRs) such as µ-rhythm [9,10], the steady-state visual evoked potential (SSVEP), and auditory or tactile evoked responses [11,12] have been used for BCIs. A recent review of BCI technologies and applications appeared in [13].
SSVEP is a response to periodic flicker stimulation with a frequency of 5-30 Hz that is usually presented by an LED or LCD [14,15]. The response EEG has peak frequencies at the fundamental and harmonic frequencies of the flicker stimulation. An SSVEP-based BCI is realized by assigning commands or letters to flickers having different frequencies [16,17]. BCIs can determine the intended command of the user by detecting the peak frequency in the observed EEG. SSVEP-based BCIs achieve the highest information transfer rate (ITR) among state-of-the-art BCI implementations [16][17][18]. There are many extensions and hybrid BCIs for SSVEP-based BCIs. For example, [19,20] extended the SSVEP-based BCI by utilizing the phase of the stimulus in addition to frequency information.
The command of SSVEP-based BCIs is determined by a frequency analysis of the observed EEGs. In the early days, Fourier analysis or spectrum estimation methods were used [21,22]. Then, canonical correlation analysis (CCA)-based approaches were developed [23]. Various extensions of CCA-based methods have been developed for determining the command of SSVEP-based BCIs. Multi-way CCA (mwayCCA) was developed for tensor EEG data [24]. Phase-constrained CCA (p-CCA) was proposed for phase-modulated SSVEP BCIs [25]. Filter bank canonical correlation analysis (FBCCA) decomposes EEGs by using a sub-band filter bank and then applies CCA to each sub-band signal [26]. The individual template-based CCA (IT-CCA) integrates the standard CCA and the canonical correlation between individual templates and test data [27].
Although the SSVEP-based BCI performs well among state-of-the-art BCI implementations, it and its signal processing methods still have drawbacks to be addressed. An LCD is often used to present SSVEP stimuli, and it normally has a fixed refresh rate of 60 Hz. Thus, the stimulus frequency is essentially limited to sub-multiples of the refresh rate. Although an approximation method was proposed for generating a stimulus whose frequency is not a sub-multiple of the monitor refresh rate [28], it is not a fundamental solution, and it requires a longer analysis time window. Another problem stems from Fourier analysis or CCA-based feature extraction: these methods extract not only the stimulus flicker frequency but also its harmonic components. Therefore, they cannot discriminate two flicker frequencies when one is twice (or three times) the other, and such stimulus frequency pairs are not available for current SSVEP-based BCIs. These two problems prevent an increase in the number of BCI commands and the ITR.
To address these problems, we propose a complex valued convolutional neural network (CVCNN) for SSVEP classification. To the best of our knowledge, this paper is the first to apply a CVCNN to an SSVEP classification problem, and it shows that the CVCNN overcomes the problem of stimulus frequency limitation. The combination of feature extraction using the discrete Fourier transform (DFT) and convolutional neural networks (CNNs) has been applied to SSVEP classification and has shown high ITR performance [29]. However, the biggest advantage of CNNs is the data-driven automatic learning of optimal feature extraction, called representation learning [30]. Feature extraction using the DFT may not be optimal for SSVEP data, and the combination of DFT and CNN cannot extract optimal features from EEG data directly. Any DFT coefficient is represented by the complex inner product between an input signal and a discrete Fourier basis vector; hence, we propose using a complex valued neural network for feature extraction. The DFT is a fixed basis for representing an input signal; in contrast, a complex weight vector of a neural network is a variable parameter that is optimized with respect to an appropriate criterion represented by a cost or loss function. In other words, by virtue of complex valued convolutional neural networks and representation learning, the network can learn feature extraction that includes the DFT as a special case. We introduce an activation function using the complex absolute value, which is equivalent to calculating the amplitude spectrum when the complex weights form a fixed Fourier basis [31].
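To make the connection between complex weights and the DFT concrete, the following NumPy sketch shows that a complex dense layer whose rows equal the discrete Fourier basis vectors, followed by the absolute activation, reproduces the amplitude spectrum. The signal, the 7 Hz frequency, and the sizes are illustrative assumptions, not values from the paper.

```python
import numpy as np

# A DFT coefficient is the complex inner product between a signal and a
# discrete Fourier basis vector.  A complex dense layer whose rows happen to
# equal those basis vectors, followed by |.|, therefore reproduces the
# amplitude spectrum; trainable complex weights generalize this fixed basis.
fs, n = 512, 512                        # sampling rate (Hz) and window length
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 7.0 * t)         # toy "SSVEP" at 7 Hz

k = np.arange(n)
W_dft = np.exp(-2j * np.pi * np.outer(k, k) / n)      # fixed Fourier "weights"

amp = np.abs(W_dft @ x)                 # complex dense layer + absolute activation
assert np.allclose(amp, np.abs(np.fft.fft(x)))        # identical to the DFT amplitude spectrum
print(amp[: n // 2].argmax() * fs / n)  # -> 7.0, the stimulus frequency
```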
We conducted an SSVEP-based BCI experiment to demonstrate the performance of the proposed CVCNN and compared it with conventional methods. In our experiment, the flicker stimuli included pairs in which one frequency was twice the other. We also report experimental results obtained using an open SSVEP-based BCI dataset. The experiments demonstrate that the proposed method can discriminate such harmonic command pairs, and it outperforms the conventional methods for SSVEP-based BCIs.
The rest of the paper is organized as follows. Section 2 explains complex valued neural networks and their learning procedure. Section 3 describes the experimental setting of our SSVEP-based BCI and the open dataset. Section 4 shows the experimental results, comparing the proposed method with conventional methods. Section 5 concludes the paper.

Complex Valued Neural Networks
Unlike ordinary real valued neural networks (RVNNs), the activation function of complex valued neural networks (CVNNs) has to be chosen carefully because complex functions are subject to a differentiability restriction expressed by the Cauchy-Riemann equations. The activation functions of CVNNs are categorized into two groups: the split type $f(z) = \phi_1(\Re(z)) + j\phi_1(\Im(z))$ and the amplitude-phase type $f(z) = \phi_2(|z|)\exp(j \arg z)$, where $j = \sqrt{-1}$ is the imaginary unit, and $\Re(z)$ and $\Im(z)$ are the real and imaginary parts of $z \in \mathbb{C}$, respectively. For the split type, the rectified linear unit (ReLU) function $\phi_1(x) = \max(x, 0)$ or the hyperbolic tangent function $\phi_1(x) = \tanh(x)$ is often used. For the amplitude-phase type, $\phi_2(x) = \tanh(x)$ is used [39,40]. A review of CVNNs and their applications in signal processing can be found in [41].
Phase information in EEGs or other signals is often unnecessary because it depends on the onset of the data. Some SSVEP-based BCIs utilize absolute or relative phase information to increase the number of commands; however, in these methods, the BCI system must carefully synchronize EEG recording and stimulus presentation. Here, we consider discrimination of the stimulus frequency; therefore, phase information should be removed. For this purpose, the authors previously proposed using the complex absolute value function $\phi(z) = |z|$ as the activation function [31]. In the proposed method, we use three activation functions: the amplitude-phase and split types, and the absolute function. The structure of the proposed network is shown in Table 1. The first block performs filtering over the EEG channels (1-4 in Table 1), and the second block performs filtering along the time index (5-8 in Table 1). Then, the data are flattened and pass through the complex dense layer with the absolute activation function. Finally, the output is given by the softmax layer.
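As a concrete illustration, the following NumPy sketch implements the three activation function types named above (split ReLU, amplitude-phase tanh, and the absolute function). It is a minimal elementwise sketch, not the paper's implementation.

```python
import numpy as np

def split_relu(z):
    """Split-type activation: ReLU applied to the real and imaginary parts separately."""
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

def amplitude_phase_tanh(z):
    """Amplitude-phase type: squash the magnitude with tanh, keep the phase."""
    return np.tanh(np.abs(z)) * np.exp(1j * np.angle(z))

def absolute(z):
    """Absolute activation: keep only the magnitude, discarding the phase."""
    return np.abs(z)

z = np.array([3 - 4j, -1 + 2j])
print(split_relu(z))            # [3.+0.j, 0.+2.j]
print(amplitude_phase_tanh(z))  # magnitudes tanh(5) and tanh(sqrt(5)); phases unchanged
print(absolute(z))              # [5.0, 2.236...]
```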

Forward and Backward Propagation
Let $W^{(l)}$ be the weight matrix and $o^{(l)}$ be the output vector of the $l$th layer of the network. Then, the forward propagation of an $L$-layer CVCNN is calculated by the following equation:

$$ o^{(l+1)} = \phi^{(l)}\!\left( W^{(l)} o^{(l)} \right), $$

where $\phi^{(l)}$ is the element-wise activation function of the $l$th layer and $o^{(1)}$ denotes the complex valued input vector of the network. The set of weight matrices $W = \{W^{(l)}\}_{l=1,\ldots,L}$ is optimized to minimize a loss function by back-propagation (BP) for CVCNNs. Square-error loss, logistic loss, or softmax loss is often used for classification problems. Let $E_n(W)$ be the loss function for the $n$th training sample. Each weight matrix is iteratively updated by

$$ W^{(l)} \leftarrow W^{(l)} - \eta \left( \frac{\partial E_n(W)}{\partial \Re\!\left(W^{(l)}\right)} + j \frac{\partial E_n(W)}{\partial \Im\!\left(W^{(l)}\right)} \right), $$

where $\eta > 0$ is the learning rate.
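For intuition, here is a minimal NumPy sketch of the forward pass described above: complex dense layers with a split-type activation, followed by the absolute activation and a softmax output. The layer sizes and random weights are illustrative assumptions, not the architecture of Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_relu(z):
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def complex_init(shape):
    # Random complex weights as placeholders for the trainable parameters.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(shape[1])

def forward(weights, activations, o):
    """Propagate a complex input through o^{(l+1)} = phi^{(l)}(W^{(l)} o^{(l)})."""
    for W, phi in zip(weights, activations):
        o = phi(W @ o)
    return o

weights = [complex_init((16, 64)), complex_init((10, 16))]
activations = [split_relu, lambda z: softmax(np.abs(z))]   # absolute activation before softmax

o1 = rng.standard_normal(64) + 1j * rng.standard_normal(64)  # complex input o^{(1)}
probs = forward(weights, activations, o1)
print(probs.sum())  # -> 1.0: class probabilities from the softmax output layer
```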

Split-Type Activation Function
Suppose that we use a split-type activation function. Let $u^{(l)} = W^{(l)} o^{(l)}$ denote the pre-activation of the $l$th layer. Then, applying the chain rule to the real and imaginary parts independently, we have

$$ \frac{\partial E_n}{\partial \Re\!\left(w_{ij}^{(l)}\right)} + j \frac{\partial E_n}{\partial \Im\!\left(w_{ij}^{(l)}\right)} = \delta_i^{(l)} \, \overline{o_j^{(l)}}, $$

where $\overline{z}$ denotes the complex conjugate of $z$, $o_j^{(l)}$ is the $j$th element of $o^{(l)}$, and $\delta_i^{(l)}$ is the error term of the $i$th unit of the $l$th layer, obtained by back-propagating the loss through $\phi_1$ applied to $\Re(u_i^{(l)})$ and $\Im(u_i^{(l)})$ separately.

Amplitude-Phase Type
In the case of the amplitude-phase type activation function $f(z) = \tanh(|z|)\exp(j \arg z)$, the absolute value and phase of the entries of $W^{(l)}$ are independently updated by the following rule [42,43]:

$$ \left| w_{ij}^{(l)} \right| \leftarrow \left| w_{ij}^{(l)} \right| - \eta \, \frac{\partial E_n}{\partial \left| w_{ij}^{(l)} \right|}, \qquad \arg w_{ij}^{(l)} \leftarrow \arg w_{ij}^{(l)} - \eta \, \frac{\partial E_n}{\partial \arg w_{ij}^{(l)}}, $$

where the partial derivatives with respect to the amplitude and phase are obtained by the chain rule in the same manner as above.

Absolute Activation Function
Since the output of the absolute activation function φ(z) = |z| is real-valued, the layers after the absolute activation function are treated as ordinary RVNNs. Suppose that the activation function of the lth layer is the absolute function; then, the chain rule is given by Equations (6) and (7).
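As a quick sanity check of how the chain rule passes through $\phi(z) = |z|$, the sketch below numerically verifies the standard partial derivatives of the magnitude with respect to the real and imaginary parts. This is an illustrative check in our own notation, not part of the original derivation.

```python
import numpy as np

# The chain rule through phi(z) = |z| only needs the partial derivatives of the
# magnitude with respect to the real and imaginary parts of z:
#   d|z|/dRe(z) = Re(z)/|z|,   d|z|/dIm(z) = Im(z)/|z|.
# Central finite-difference check on an arbitrary complex number.
z, eps = 3.0 - 4.0j, 1e-6
d_re = (abs(z + eps) - abs(z - eps)) / (2 * eps)
d_im = (abs(z + 1j * eps) - abs(z - 1j * eps)) / (2 * eps)
print(np.allclose([d_re, d_im], [z.real / abs(z), z.imag / abs(z)]))  # True
```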

Experimental Setting and Dataset
We used two datasets: (i) our original SSVEP dataset (original dataset) and (ii) an open SSVEP benchmark dataset (open dataset) [44]. The original dataset was designed to demonstrate the discriminability of harmonic stimulus flicker frequencies. The stimulus flicker frequencies were 6.0, 6.5, 7.0, 7.5, and 8.0 Hz, and their second harmonics were 12, 13, 14, 15, and 16 Hz. With CCA-based methods, it is difficult to classify these harmonic pairs.

Original Dataset
For stimulus presentation, we used a 27-inch LCD monitor with a refresh rate of 60 Hz and Psychtoolbox-3 [45]. For EEG recording, we used the MATLAB data acquisition toolbox, g.tec active EEG (g.GAMMAcap2, g.GAMMAbox, and g.LADYbird), a TEAC BA1008 amplifier, and a Contec AI-1664 LAX-USB A/D converter. The sampling frequency was 512 Hz. A reference electrode was placed on the right earlobe, and the ground electrode was placed on FPz. We used nine electrodes for recording: O1, O2, Oz, P3, P4, Pz, PO3, PO4, and POz.
Five healthy subjects (four males and one female, 26.8 ± 7.43 y.o.) voluntarily participated in the experiment. The experimental procedure was approved by the ethics committee of the University of Electro-Communications, and it was conducted in accordance with the approved research procedure and with the relevant guidelines and regulations. Informed consent was obtained from all the subjects. Figure 2 depicts the stimulus presentation setting of our experiment. The distance from the eyes of the subject to the monitor was 70-80 cm. The target cue was randomly displayed for two seconds, and the flicker stimuli were then presented for five seconds. Each stimulus was a box flickering between white and black. One trial consisted of ten randomly ordered target stimuli, and ten trials were conducted for each subject. The subjects took a rest between trials as needed. The subjects were asked not to blink during the stimulus presentation. Due to the latency of the SSVEP, we extracted EEG from 0.14 s after the start of stimulation. The length of the EEG data for one target was five seconds. We then segmented the data into non-overlapping epochs of 1000 ms and 500 ms. For each subject, the numbers of epochs were 500 and 1200 for the 1000 ms and 500 ms lengths, respectively. We applied band-pass filtering from 4 Hz to 45 Hz. The classification accuracy and ITR were evaluated using ten-fold cross validation.
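As a rough illustration of this preprocessing, the sketch below band-pass filters a single trial, discards the 0.14 s SSVEP latency, and cuts the remaining signal into non-overlapping epochs. The array name `raw`, the use of SciPy, and the filter order are our own assumptions, not the authors' code.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 512                                            # sampling rate (Hz)
b, a = butter(4, [4 / (fs / 2), 45 / (fs / 2)], btype="band")  # 4-45 Hz band-pass

def make_epochs(raw, epoch_len_ms=1000, latency_s=0.14):
    """Filter one trial and split it into non-overlapping epochs."""
    x = filtfilt(b, a, raw, axis=-1)
    x = x[:, int(latency_s * fs):]                  # discard SSVEP latency
    step = int(epoch_len_ms / 1000 * fs)
    n_epochs = x.shape[-1] // step
    return np.stack([x[:, i * step:(i + 1) * step] for i in range(n_epochs)])

raw = np.random.randn(9, int(5.14 * fs))            # 9 channels, 5 s of stimulation + latency
print(make_epochs(raw, 1000).shape)                 # (5, 9, 512)
print(make_epochs(raw, 500).shape)                  # (10, 9, 256)
```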

Open Dataset
To utilize the phase modulation of the open dataset, we generated training and test data as follows: (i) applied band-pass filtering; (ii) extracted the data with offset t_off from the onset of the stimulus for each trial, with the data length being 500 ms or 1000 ms; (iii) generated training/test data by five-fold cross-validation, i.e., for each target, 12 trials for training data and 3 trials for test data; and (iv) calculated the classification accuracy and ITR and averaged them over the offset values t_off. For the 500 ms data, we used t_off ∈ {0, 500, . . . , 3500} ms, and for the 1000 ms data, we used t_off ∈ {0, 1000, . . . , 3000} ms.
For the CCA and combined CCA, we used the fundamental and second harmonics of the stimulus flicker frequency for the reference signal. We implemented them using Python 3.6.8, Numpy 1.19.1, and Scikit-learn 0.23.1.
For the CCNN, we used the same network structure as in [29]; we applied the DFT with zero-padding, and the complex spectrum from 3 Hz to 35 Hz was then input to the network in a similar manner to [29]. The frequency resolution was set to 0.293 Hz. For the CCNN and the proposed CVCNN, the network parameters were either fixed or determined by the grid search listed in Table 2, and the networks were implemented using TensorFlow 1.14.0 and Keras 2.2.4 in addition to the software above.
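For reference, a minimal sketch of this DFT-based input is shown below. The zero-padding length is chosen to approximate the stated 0.293 Hz resolution, and the sampling rate and array shapes are illustrative assumptions rather than the exact baseline configuration.

```python
import numpy as np

# Zero-padded FFT of each channel, keeping the complex coefficients between
# 3 and 35 Hz at roughly 0.293 Hz spacing, as input features for the CCNN baseline.
fs, resolution = 512, 0.293
nfft = int(round(fs / resolution))                  # zero-padded FFT length

def complex_spectrum(epoch, f_lo=3.0, f_hi=35.0):
    spec = np.fft.fft(epoch, n=nfft, axis=-1)
    freqs = np.fft.fftfreq(nfft, d=1 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[:, band]                            # complex features fed to the network

epoch = np.random.randn(9, 512)                     # 9 channels, 1000 ms at 512 Hz
print(complex_spectrum(epoch).shape)                # (9, 109) complex coefficients
```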

Results
The classification performance of the SSVEP-based BCI was evaluated in terms of the classification accuracy and ITR by using cross validation for each subject. The ITR $I$ (bit/min) was obtained by the following equation:

$$ I = \frac{60}{T} \left( \log_2 M + P \log_2 P + (1 - P) \log_2 \frac{1 - P}{M - 1} \right), $$

where $T$ (s) is the time to output a command, $P$ is the mean classification accuracy, and $M$ is the number of targets [2]. We assumed that the latency of the subject's attention shift to the target stimulus was 0.5 s and included it in $T$ when computing the ITR.
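As a quick reference, the sketch below evaluates this ITR formula in Python. The example values are illustrative, not the reported results, and accuracies at or below chance level are clamped to zero bits by the usual convention.

```python
import numpy as np

def itr(P, M, T):
    """Information transfer rate (bit/min) for M targets, accuracy P, and T seconds per command."""
    if P >= 1.0:
        bits = np.log2(M)            # perfect accuracy transfers log2(M) bits per selection
    elif P <= 1.0 / M:
        bits = 0.0                   # at or below chance level, clamp to zero by convention
    else:
        bits = np.log2(M) + P * np.log2(P) + (1 - P) * np.log2((1 - P) / (M - 1))
    return 60.0 / T * bits

# Example: 10 targets, 90% accuracy, a 1 s analysis window plus the assumed 0.5 s latency.
print(itr(P=0.90, M=10, T=1.0 + 0.5))   # ~101 bit/min
```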

Original Dataset
We excluded subject 2 from the analysis because some electrodes were not set up properly and the recorded EEG data were therefore unusable. Figures 3 and 4 show the classification accuracies and ITRs. The proposed method exhibited the best classification accuracy and ITR for all data lengths. Since the stimulus flicker frequencies included harmonic pairs, the CCA-based methods performed worse than the CNN-based methods.
To examine the classification performance for the harmonic pairs, Figure 5 shows, for the 1000 ms data length, the rate at which each second-harmonic target was misclassified as its fundamental-frequency counterpart. For example, the bar at 12 Hz represents the rate at which the 12 Hz target was misclassified as the 6 Hz target. The figure shows that the CCA-based methods had higher misclassification rates than the CNN-based methods.

Open Dataset
Figures 6 and 7 show the classification accuracy and ITR for the open dataset. Although this dataset did not include harmonic flicker frequency stimuli, the proposed method exhibited the best classification accuracy and ITR for the 500 ms data length. For the 1000 ms data, the proposed method showed results comparable to those of the state-of-the-art method, the combined CCA, because the accuracy saturated at 100% for some subjects.

Conclusions
We proposed a complex valued convolutional neural network (CVCNN) structure with an absolute activation function for the classification of SSVEP-based BCI signals. The SSVEP-based BCI is often realized with an LCD monitor; however, the available stimulation frequencies are limited by the refresh rate. Moreover, the conventional CCA-based SSVEP classification methods (e.g., [23,27]) have the disadvantage that it is difficult to discriminate harmonic frequency stimuli (e.g., 6 Hz and 12 Hz). These problems hinder performance improvement of the SSVEP-based BCI in terms of the number of available commands and the ITR. The proposed method overcomes these problems and outperforms state-of-the-art SSVEP classification methods by combining frequency-domain feature extraction with complex valued networks and the representation learning of convolutional neural networks.
Our original dataset is based on a small number of participants (five subjects, four of whom provided usable data); therefore, further investigation is needed. Future work includes extending the proposed method to phase-modulated SSVEP applications and self-paced BCI applications.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest:
The authors declare no conflict of interest.