4-Class MI-EEG Signal Generation and Recognition with CVAE-GAN

Since the electroencephalogram (EEG) is known to measure the real-time electrodynamics of the human brain, signal processing techniques, particularly deep learning, can not only provide novel solutions for learning from EEG signals but also optimize robust representations of them. Considering the limited data collection and the inadequate concentration of subjects during testing, it is essential to obtain sufficient training data and useful features for a potential end-user of a brain–computer interface (BCI) system. In this paper, we combine a conditional variational auto-encoder network (CVAE) with a generative adversarial network (GAN) to learn latent representations from EEG brain signals. By updating the fine-tuned parameters fed into the resulting generative model, we can synthesize EEG signals of a specific category. We employ an encoder network to obtain the distributed samples of the EEG signal and apply an adversarial learning mechanism to continuously optimize the parameters of the generator, discriminator, and classifier. The CVAE is adopted to make the synthetic samples approximate the real sample class more closely. Finally, we demonstrate that our approach takes advantage of both statistic and feature matching to make the training process converge faster and more stably, and that it addresses the problem of small-scale datasets in deep learning applications for motor imagery tasks through data augmentation. The augmented training datasets produced by the proposed CVAE-GAN method significantly enhance the performance of MI-EEG recognition.


Introduction
The electroencephalogram (EEG) records the electric potential variations generated by pyramidal neurons in the cortical layers; it is therefore recognized as a reflection of brain activity and can be used to study mental processes [1][2][3]. Although EEG has proven to be a critical tool in many domains, it still suffers from a few limitations that hinder its effective analysis and processing. Because the brain activity is buried under multiple environmental noise sources, EEG has a quite low signal-to-noise ratio (SNR) [4,5]. Consequently, various filtering and noise reduction techniques, including deep learning (DL) [6] methods, have been used to minimize the impact of these noise sources and extract true brain activity from the recorded signals. Meanwhile, the DL framework has shown outstanding performance in processing complex data such as text, audio signals, and images [7], and plays an ever-increasing role in industrial applications. Given sufficient training data, DL can build computational models and learn hierarchical representations of input data through successive non-linear transformations [8,9], which makes the size of the available training data the main restriction on the performance of recognition models in brain–computer interfaces (BCI) [10,11].
The EEG-based BCI decoding process involves pre-processing, feature extraction, and pattern recognition (classification or regression) [12]. The goal of pre-processing and feature extraction is to extract the target band and channel information from the raw EEG and represent it in a compact and relevant manner that is conducive to classification. Regression and classification then map the extracted feature vector to a probability value or an n-category classification result. Figure 1 shows the general paradigm of a BCI system, which receives brain signals and maps them into control commands for robotic equipment. The system includes several key components, including a decoding part and a control part. The brain signals are collected from humans and sent to the preprocessing component for denoising and enhancement. Then, discriminating features are extracted from the processed signals and sent to the classifier, which recognizes the signals and converts them into external device commands. Generative adversarial networks (GAN) allow the synthesis of data from latent representations of samples [13], but they rely heavily on optimizing generative models and suffer from training instability [14].
In this work, we employed a GAN combined with a conditional variational autoencoder (CVAE) [15] to synthesize EEG signals. By varying the motor imagery category label fed into the resulting generative model, we could generate EEG of a specific category from random noise on a latent attribute vector. We adopted an encoder network to learn the temporal and spectral representations of real EEG samples while simultaneously training a GAN to enforce the learning of these representations. Our proposed approach thereby makes two main contributions. First, learning the distribution of latent representations allows us to generate highly realistic imitations of real EEG based on its task features, which can be used for data augmentation. Second, benefiting from the adversarial learning network, the proposed model also performs well in motor imagery (MI) task classification using the temporal-spectrum images transformed from the raw EEG. We experimented with public and private MI-EEG data, demonstrating the model's robust capability of generating realistic and diverse samples with motor imagery labels. Finally, we compared different methodological choices of the models using evaluation metrics, thus revealing the optimal results obtained in this study.
Our approach has two novel aspects. First, we adopted the CVAE as the generative model for the feature sub-space and for synthetic EEG reconstruction, exploiting the class information of MI-EEG samples. Second, we constructed generative adversarial patterns over the feature sub-space, taking advantage of both statistic and pairwise feature matching to make the training process converge faster and more robustly, thus paving the way toward improving the efficiency of EEG-based BCI systems.

Related Work
Conventional generative models, including principal component analysis (PCA) [16], the Gaussian mixture model (GMM) [17], and independent component analysis (ICA) [18], assume a simple form of the data set. Later models, such as restricted Boltzmann machines (RBMs) [19] and Markov random fields (MRFs) [20], were hindered by their lack of effective latent representations.
Different from these generative models, the GAN architecture consists of two opposing networks trying to outperform each other through a min-max game. A GAN is capable of modeling complicated, high-dimensional distributions, and thus trains the learning model to generate data more similar to real samples. Nevertheless, the GAN model faces convergence problems in the training stage, which lead to wide discrepancies between generated samples and natural ones [21]. Meanwhile, a GAN cannot accurately represent the intrinsic characteristics of normal samples because its latent manifold provides little useful information [22]. To improve the quality of GAN-based detection, the mean and covariance feature matching GAN [23] provides mean and covariance feature matching, restricting the range of the parameters to reduce discrimination error. The loss-sensitive GAN [24] tries to learn a loss function quantifying the quality of generated samples and uses this loss to generate high-quality data. In this paper, we combine a GAN and a VAE to learn both spatial and temporal latent representations of real EEG. Due to imperfect measures such as the squared error and the injected noise, the signal samples generated by a VAE are often distorted, which is viewed as a disadvantage of the VAE [25]; this can be compensated nicely through repeated adjustment by the GAN model.
Our model aims to decode MI-EEG by simultaneously training a conditional VAE and a GAN, enforcing the learning of the EEG data representations. Besides, we utilized statistic feature fusion to make the training converge more stably and smoothly.

Methods
In this work, we first employed a data preprocessing step based on the short-time Fourier transform (STFT) to convert the MI-EEG signals (C3, C4, and Cz electrodes) into a set of time-spectrum images. Then, a CVAE-GAN architecture was proposed for EEG signal generation and classification from a latent representation obtained through an adversarial learning process. Considering that images generated by a CVAE are decent but fuzzy, while those from a CGAN are clear but vary significantly, the CVAE-GAN is an appropriate compromise. For example, when generating motor imagery EEG, the CVAE-GAN can compute the class probability of the EEG data with its additional classification part. It can therefore make use of the latent class information in the training data to generate samples regularized by Gaussian distributions with learnable statistics. The proposed CVAE-GAN architecture for EEG mainly consists of three parts: (1) the encoder network, which maps the EEG sample x to a latent representation z through a convolutional neural network (CNN); (2) the generator network, which generates fake EEG signals from a latent vector; and (3) the recognition network, which is trained to distinguish between real and fake EEG data and to estimate the class probability of both real and generated inputs. All three parts are then jointly guided to adjust themselves.
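To make the three components concrete, the following is a minimal PyTorch sketch of the encoder, generator, and recognition networks as described above. The 32 × 32 input size and the 20/40 kernel counts follow the text, while the three input channels (C3, Cz, C4), the 5 × 5 kernels, and the latent size of 64 are illustrative assumptions rather than the exact published configuration.

import torch
import torch.nn as nn

N_CLASSES, LATENT = 4, 64  # 4 MI classes; latent size is an assumption

class Encoder(nn.Module):
    """Maps an EEG power image x (with its tiled class label) to mu, log-variance of z."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3 + N_CLASSES, 20, 5, padding=2), nn.BatchNorm2d(20),
            nn.ReLU(), nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(20, 40, 5, padding=2), nn.BatchNorm2d(40),
            nn.ReLU(), nn.MaxPool2d(2),                      # 16x16 -> 8x8
        )
        self.mu = nn.Linear(40 * 8 * 8, LATENT)
        self.logvar = nn.Linear(40 * 8 * 8, LATENT)

    def forward(self, x, c_map):          # c_map: one-hot label tiled to 32x32
        h = self.conv(torch.cat([x, c_map], 1)).flatten(1)
        return self.mu(h), self.logvar(h)

class Generator(nn.Module):
    """Restores a time-spectrum image from (z, label) by bilinear upsampling."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT + N_CLASSES, 40 * 8 * 8)
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='bilinear'),    # 8x8 -> 16x16
            nn.Conv2d(40, 20, 5, padding=2), nn.BatchNorm2d(20), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode='bilinear'),    # 16x16 -> 32x32
            nn.Conv2d(20, 3, 5, padding=2), nn.Sigmoid(),
        )

    def forward(self, z, c):
        h = self.fc(torch.cat([z, c], 1)).view(-1, 40, 8, 8)
        return self.up(h)

class Recognizer(nn.Module):
    """Shared trunk with a real/fake head (discriminator) and an MI-class head (classifier)."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 20, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(20, 40, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.disc = nn.Linear(40 * 8 * 8, 1)          # real-vs-fake logit
        self.clf = nn.Linear(40 * 8 * 8, N_CLASSES)   # 4-class MI logits

    def forward(self, x):
        h = self.trunk(x)
        return self.disc(h), self.clf(h)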

Datasets and Conversion Based on STFT
We evaluated our approach on a private EEG dataset collected by ourselves and on a public dataset (BCI Competition IV Data sets 2a). The data were repeatedly split into a 70% training set and a 30% test set for cross-validation. The details of the data are shown in Table 1. Throughout the experiments, subjects were seated in front of a computer screen and instructed to perform motor imagery tasks, including imagining the movement of the right or left hand. Each trial took 8 s, followed by an inter-trial resting period of the same length. After executing an MI task, the energy in the mu band (8–13 Hz) observed over the motor cortex of the brain decreases, which is called event-related desynchronization (ERD), while the energy increase in the beta band (17–30 Hz) is called event-related synchronization (ERS). MI tasks were found to cause ERD and ERS on the right and left sides of the motor cortex, respectively, affecting the EEG signals at the C4 and C3 electrodes; Cz is mainly affected by tongue and feet MI tasks. Consequently, the datasets used in this work included recordings from three electrodes (C3, Cz, and C4) of left/right hand MI tasks, and the short-time Fourier transform (STFT) was applied to convert each 2 s sequential EEG trial into a temporal-spectrum power image. A set of time-frequency power images was generated by a short time window sliding along the time axis of the sequential EEG signal. The STFT can be defined as:

$$\mathrm{STFT}\{x\}(\tau, \omega) = \sum_{t=-\infty}^{\infty} x(t)\, h(t-\tau)\, e^{-j\omega t}$$

where h(t) is a window with a limited number of nonzero points and τ is the window position on the temporal axis. For the 250 Hz signal, the 2 s trial corresponded to 500 samples. The window size of the STFT was set to 64 samples and the number of overlapping points between segments to 50. The STFT was computed over 32 windows for all samples, leading to spectrum images of size 257 × 32, where 257 is the number of sample frequencies and 32 is the number of time segments. Finally, the mu and beta frequency bands of the spectrum images were extracted. The input size of the normalized image for each of the mu and beta bands was empirically chosen as 32 × 32.
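As an illustration of this conversion, the following sketch uses scipy.signal.stft with the parameters stated above; nfft = 512 is an assumption inferred from the reported 257 frequency bins (nfft/2 + 1 = 257), and the trial data are placeholders.

import numpy as np
from scipy.signal import stft

fs = 250                          # sampling rate (Hz)
trial = np.random.randn(500)      # 2 s of one EEG channel (placeholder data)

# 64-sample window, 50-point overlap -> 32 segments over 500 samples
f, t, Z = stft(trial, fs=fs, nperseg=64, noverlap=50, nfft=512,
               boundary=None, padded=False)
power = np.abs(Z) ** 2            # f.shape == (257,), t.shape == (32,)

mu_img = power[(f >= 8) & (f <= 13)]     # mu band (8-13 Hz)
beta_img = power[(f >= 17) & (f <= 30)]  # beta band (17-30 Hz)
# In the paper, each band image is then normalized and resized to 32 x 32.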

CVAE-GAN Construction
A CVAE models a generator as a pair of encoder and decoder networks. Compared with a VAE, a CVAE is able to generate data conditioned on certain attributes. The encoder learns a latent representation z from the data x conditioned on the label y of a specific category, while the decoder aims to reconstruct the data from the learned representation z under the same category condition. The generated data will be distorted owing to deficiencies between input and output. Fortunately, the generation quality improves when the CVAE is combined with a discriminator. On the other hand, the generator of a plain GAN is isolated from the real EEG, so introducing the encoder makes the GAN more stable. The whole CVAE-GAN architecture is shown in Figure 2.
Considering that the generating process should be controlled by the specified category, the label data were provided as an extra input to the encoder and decoder. We feed the training samples and their corresponding labels to the encoder, then concatenate the hidden representation with the corresponding label and feed it to the decoder to train the network. After training, we can generate data with a specified label by feeding the decoder with noise sampled from the Gaussian distribution together with the assigned label. The CVAE objective can therefore be formulated as:

$$\mathcal{L}_{\mathrm{CVAE}} = -\mathrm{KL}\big(q(z \mid x, y)\,\|\,p(z)\big) + \mathbb{E}_{q(z \mid x, y)}\big[\log p(x \mid z, y)\big]$$

We employed a CNN as the implementation architecture of the encoder and decoder; the mapping details are illustrated in Table 2. In a convolution layer, the input images are convolved with the trainable kernels and passed through an activation function to produce the output feature maps [26]. In a given layer, the k-th feature map can be obtained as:

$$h^{(k)} = f\big(W^{(k)} * x + b^{(k)}\big)$$

where * denotes convolution, f is the activation function, and 20 and 40 kernels were employed in the first and second layers, respectively. Each convolutional layer is followed by a max-pooling layer, which down-sizes the feature maps into smaller samples. The network is fed with the labeled training set, and the error E is computed by comparing the desired output with the actual output of the network. Subsequently, the stochastic gradient descent algorithm [27] was applied to minimize this error and thus optimize the network weights and the filters in the convolutional layers. The encoder forms the vector z through a two-layer CNN, which extracts the temporal, spectral, and spatial features from the 3D power image representation of the MI-EEG. The encoding vector h is mapped into two feature compression layers in order to calculate the mean µ and variance σ of the latent distribution. The latent representation z is calculated as:

$$z = \mu + \sigma \odot \varepsilon$$

where ε is randomly sampled from the standard normal distribution N(0, 1) and ⊙ denotes the element-wise product. The covariance matrix of z is diagonal because we expect the components of the latent vector z to be mutually independent, which makes them informative. After every convolution operation, we adopted batch normalization to improve training efficiency and nonlinearity, and a 50% dropout rate was used to prevent overfitting. The generator mainly consists of deconvolution and bilinear interpolation upsampling operations that restore the time-spectrum image of the signal from the learned features under the specified category condition. The recognition network, composed of the discriminator and the classifier, was employed to distinguish real from generated power image signals and to classify the MI task results.
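The sampling of z above is the standard reparameterization trick, which keeps the sampling step differentiable. A minimal sketch follows, assuming the encoder outputs the log-variance (a common numerical-stability choice) rather than σ directly.

import torch

def reparameterize(mu, logvar):
    """z = mu + sigma * eps with eps ~ N(0, I), element-wise,
    so gradients flow through mu and sigma during training."""
    eps = torch.randn_like(mu)               # sample from N(0, 1)
    return mu + torch.exp(0.5 * logvar) * eps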

Optimization of Training Process
The encoder E maps the input data x_r to a latent representation z through a learned distribution P(z | x_r, c), where c represents the category of the data. The generative network G generates EEG data from a learned distribution P(x_f | z, c). The functions of the generator and discriminator are similar to those in a standard generative adversarial network (GAN): the generator tries to model the real data distribution according to the gradients given by the discriminator, which learns to distinguish between real and fake samples. In the encoder of the CVAE framework, the loss L_KL is employed to indicate whether the distribution of the latent variable is close to the expected prior:

$$\mathcal{L}_{\mathrm{KL}} = \mathrm{KL}\big(P(z \mid x_r, c)\,\|\,\mathcal{N}(0, I)\big) = -\frac{1}{2}\sum_{i}\big(1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2\big)$$

Together with the reconstruction loss of the decoder, this enforces a meaningful representation z with respect to c. However, as is universally known, this alone does not yield perfect generation in practice, so we applied the GAN to tackle this problem. The GAN consists of the generator G (the CVAE decoder) and a discriminator D, whose losses can be written respectively as:

$$\mathcal{L}_{G} = -\mathbb{E}_{z}\big[\log D\big(G(z, c)\big)\big]$$

$$\mathcal{L}_{D} = -\mathbb{E}_{x_r}\big[\log D(x_r)\big] - \mathbb{E}_{x_f}\big[\log\big(1 - D(x_f)\big)\big]$$

The feature-matching effect taken from the discriminator can be denoted as:

$$\mathcal{L}_{GD} = \frac{1}{2}\,\big\|\,\mathbb{E}_{x_r}\big[f_D(x_r)\big] - \mathbb{E}_{x_f}\big[f_D(x_f)\big]\,\big\|_2^2$$

where f_D denotes the features of an intermediate layer of the discriminator. In the classifier part, the classification loss can be denoted as:

$$\mathcal{L}_{C} = -\mathbb{E}_{x_r}\big[\log P(c \mid x_r)\big]$$

The goal of the CVAE-GAN is to minimize the total loss:

$$\mathcal{L} = \mathcal{L}_{\mathrm{KL}} + \mathcal{L}_{\mathrm{rec}} + \mathcal{L}_{G} + \mathcal{L}_{D} + \mathcal{L}_{C} + \mathcal{L}_{GD}$$

where L_rec is the reconstruction loss between the real and generated samples. The whole training pipeline of the proposed algorithm is shown in Algorithm 1.
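A hedged PyTorch sketch of these loss terms is given below, assuming that the encoder outputs log-variance and that the discriminator and classifier heads return logits; the equal weighting of the terms is an assumption, as the weights are not stated here.

import torch
import torch.nn.functional as F

def kl_loss(mu, logvar):
    # L_KL between q(z | x_r, c) and the standard normal prior
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

def d_loss(d_real, d_fake):
    # L_D: push real logits toward 1 and fake logits toward 0
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

def g_loss(d_fake):
    # L_G: the generator tries to make the discriminator output 1 on fakes
    return F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

def c_loss(c_logits, labels):
    # L_C: cross-entropy of the MI-class head on real samples
    return F.cross_entropy(c_logits, labels)

def feature_matching(f_real, f_fake):
    # L_GD: mean feature matching on intermediate discriminator features
    return F.mse_loss(f_fake.mean(0), f_real.mean(0).detach())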

Algorithm 1 Training Process of the Proposed CVAE-GAN
Initialization: m is the batch size, n is the number of categories, and epoch is the number of iterations. Initialize the networks (E, G, D) with random weights.
1: Sample a batch {x_r} ∼ p_r of m trials from the real EEG data.
2: Map to the feature parameters µ, σ ← E(x_r, c_r) using the encoder network.
3: Resample the feature parameters to obtain the latent representation z.
4: Generate conditional samples x_g ← G(z, c_r) through the generator network.
5: Feed the real and generated samples into the recognition network for authenticity identification.
Output: End until L has converged and save all network parameters.
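The following sketch turns Algorithm 1 into a single PyTorch training step, reusing the components and loss functions sketched earlier in this section; the alternating discriminator-then-encoder/generator update order and the Adam learning rate are illustrative assumptions.

import torch
import torch.nn.functional as F

E, G, R = Encoder(), Generator(), Recognizer()
opt_d = torch.optim.Adam(R.parameters(), lr=2e-4)
opt_eg = torch.optim.Adam(list(E.parameters()) + list(G.parameters()), lr=2e-4)

def train_step(x_r, c_r):                       # x_r: real batch, c_r: int labels
    c_onehot = F.one_hot(c_r, N_CLASSES).float()
    c_map = c_onehot[:, :, None, None].expand(-1, -1, 32, 32)

    # --- update discriminator/classifier on real vs. generated samples (step 5) ---
    with torch.no_grad():
        mu, logvar = E(x_r, c_map)
        x_g = G(reparameterize(mu, logvar), c_onehot)
    d_real, c_real = R(x_r)
    d_fake, _ = R(x_g)
    loss_r = d_loss(d_real, d_fake) + c_loss(c_real, c_r)
    opt_d.zero_grad(); loss_r.backward(); opt_d.step()

    # --- update encoder and generator (steps 2-4) ---
    mu, logvar = E(x_r, c_map)                  # step 2: encode
    z = reparameterize(mu, logvar)              # step 3: resample
    x_g = G(z, c_onehot)                        # step 4: generate
    d_fake, _ = R(x_g)
    loss_eg = kl_loss(mu, logvar) + F.mse_loss(x_g, x_r) + g_loss(d_fake)
    opt_eg.zero_grad(); loss_eg.backward(); opt_eg.step()
    return loss_r.item(), loss_eg.item()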

Feature Extraction and Convergence
To better illustrate the spatial feature extraction, Figure 3 shows topographic maps of the single-task average feature power of the real and the generated EEG over the 16 electrode positions. We can observe a high spatial match between the real and generated EEG power, together with apparent variation characteristics of the ERD. Figure 4 shows the training curves of the proposed model for one participant. The generator loss gradually decreases, reducing the distance between the distributions of the generated and real samples. After approximately 6 epochs, the losses of both networks converge to roughly constant values and the whole training becomes stable. The training results of the other subjects show similar trends. Figure 5 plots the averages of the real and generated (labeled as fake) samples of each individual subject (S1–S3) from D1, with the blue and green colors representing the real and the artificial fake data, respectively. A high match between them can be observed. Furthermore, we can see diversity in the samples generated for an individual, which suggests that the proposed model learns complicated temporal variations from real samples rather than repeatedly copying the same ones. This result paves a smooth way for exploring the practical application of generating EEG samples that respect individual diversity.

Generation Evaluation
We evaluated the proposed CVAE-GAN-based approach against the following frameworks: (1) CVAE, (2) convolutional encoder (CNN), and (3) conditional GAN (CGAN) [28]. For each dataset, 70% of the normal samples were randomly selected to constitute the training set; the testing set was composed of all abnormal samples and the remaining normal samples. Table 3 shows the metric results for the different architectures. The inception score (IS) is a common measure of the quality of the trained generator. The Fréchet inception distance (FID) is also used to better evaluate the quality of the generated samples by calculating the distance between the distributions of the real and generated samples. The sliced Wasserstein distance (SWD) [29] approximates the Wasserstein distance by computing projections of the two distributions. These metrics give evidence of the quality of the generative model to some extent. The test accuracy of the classifier used for calculating the IS and FID was 86.14%. Clearly, the CVAE-GAN performs best on the IS and SWD metrics, while the CNN outperforms on the FID but is relatively worse on the other metrics. Overall, CVAE-GAN and CGAN are the better architecture choices for class-conditional EEG generation.
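As an illustration of the SWD metric, the following is a minimal NumPy estimator that averages 1-D Wasserstein distances over random unit projections; it assumes equally sized, flattened sample sets and is not necessarily the exact implementation used in [29].

import numpy as np

def sliced_wasserstein(real, fake, n_proj=128, seed=0):
    """Approximate the Wasserstein distance between two sample sets
    by averaging 1-D W1 distances over random projection directions.
    real, fake: (n_samples, n_features) arrays of flattened images."""
    rng = np.random.default_rng(seed)
    d = real.shape[1]
    dirs = rng.standard_normal((n_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # unit directions
    total = 0.0
    for v in dirs:
        p, q = np.sort(real @ v), np.sort(fake @ v)       # 1-D projections
        # closed-form 1-D W1 for equally sized sorted samples
        total += np.mean(np.abs(p - q))
    return total / n_proj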

Classifier Performance
We utilized the same classifier to compare the recognition performance before and after data augmentation. The augmented part (generated samples) accounted for a quarter of the real training sets. Figure 6 shows the test classification performance for all subjects of D1 and D2. There was an obvious increase in classification accuracy after data augmentation on both D1 and D2. Moreover, we found evidence of overfitting in D2-S4 and reduced deviations of accuracy on D2, indicating an improvement in robustness across subjects with appropriate data augmentation. Figure 7 shows the classification results of the different models on D2. The proposed CVAE-GAN is superior to the other methods and achieves the best accuracy on D1 and D2, manifesting the superiority of adversarial models with a latent generation code and their ability to automatically extract latent features across subject diversity. We can also see that the adversarial model CGAN achieves better results, but not for all subjects. This suggests that latent representation encoding can be helpful for subject-invariant representations, inspiring us to explore subject-invariant features further.

Figure 8 shows the confusion matrices and average accuracies of the four classes over all subjects of D2, compared with two other methods: CGAN, which applies conditional adversarial processing for EEG signal discrimination, and CNN, which applies convolutional processing to recognize the 2D time-frequency MI-EEG images transformed by the STFT. The lower-right value corresponds to the overall accuracy, the bottom row depicts the sensitivity, and the rightmost column lists the precision, which indicates class-specific classification characteristics. The CVAE-GAN model shows an obvious improvement in the four-class MI-task classification, outperforming the CNN trained without appended generated data. This result also demonstrates the enhancement of MI-EEG-based recognition obtained by appending data generated with the proposed framework. Figure 9 illustrates the confusion matrices for recognition with real and with synthetic data. Recognition on real EEG achieved an accuracy about 4 percentage points higher than on the synthetic source. We also note the relatively high sensitivity and precision for the tongue and feet movement imagery tasks on the synthetic EEG data, revealing that the proposed generative architecture has advantages in single-channel feature learning, since the discriminating information for tongue and feet is mainly contained in the Cz channel.
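For reference, the sensitivity, precision, and overall accuracy read off these confusion matrices can be computed as in the following sketch, which assumes integer-coded true and predicted labels for the four MI classes (the label arrays shown are placeholders).

import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
y_true = rng.integers(0, 4, size=200)          # placeholder true labels
y_pred = rng.integers(0, 4, size=200)          # placeholder predictions

cm = confusion_matrix(y_true, y_pred, labels=range(4))
sensitivity = cm.diagonal() / cm.sum(axis=1)   # per-class recall (bottom row)
precision = cm.diagonal() / cm.sum(axis=0)     # per-class precision (right column)
overall = cm.trace() / cm.sum()                # overall accuracy (lower-right value)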

Efficiency Analysis
Generally, deep learning algorithms require substantial time to execute, which limits their suitability for BCI applications that typically require close to real-time performance. For instance, the practical deployment of a BCI system could be limited by its recognition time-delay if it takes two minutes to recognize the user's intent. In this section, we focus on the running time of our approach and compare it to the widely used baselines. As shown in Figure 10, the CNN required the least training time as a result of its simple framework and small number of weights. Furthermore, employing the CVAE as the generative model effectively reduces the time consumed during adversarial training. However, training is a one-off operation; for practical purposes, the execution time of an algorithm during testing is what matters most. The testing time of our approach is less than ten seconds, similar to the other baselines.

Discussion
We demonstrated that an unsupervised model using a CVAE to learn the statistical structure of the MI-EEG input data can effectively augment and generate EEG data, thereby addressing the problem of small-scale datasets in deep learning applications for MI tasks. The public BCI Competition IV dataset and a private dataset collected in our lab were used to evaluate the method.
The generative task is meant to synthesize 2D-format EEG that follows the existing statistical distribution. First, we tested three kinds of data generating methods. Then, we analyzed and compared the generated and real samples across the different data sources and subjects. Although some generation differences and randomness still exist, the experimental results revealed an approximately matching variation tendency and distribution, guiding the goals and direction of our future exploration of improved generation and synthesis methods.
As the second contribution, the MI-EEG synthesized by our model also proved useful for data augmentation and for training a better-performing recognition model. From the experimental results, we clearly recognize the importance of data augmentation and the challenge of capturing useful features in insufficient-data situations. Furthermore, applying the CVAE as the generative model improves the robustness of adversarial training, taking advantage of both statistic and feature matching to make the training process converge faster and more stably.

Conclusions and Future Work
This paper studied the application of the CVAE-GAN to class-conditional motor imagery EEG generation, which has various prospective uses in BCI systems, for instance, reconstruction of corrupted data and non-homologous data augmentation. In order to achieve high-quality generation of 4-class conditional MI-task EEG, a combination of a latent representation and an adversarial network is proposed to learn subject-invariant representations and to make the generated samples approach the real ones. Compared with other generative models on the public and private datasets, the proposed approach generated EEG samples for different subjects according to their MI tasks, and the experimental results show the effectiveness of the CVAE-GAN. In the experimental evaluation, training based on data augmentation with the proposed framework enhances the performance of MI-EEG recognition.
In the future, this study should be continued to evaluate how the generated EEG data affect the performance of the BCI system, and to explore transfer learning approaches that develop shared structure in the class-conditional EEG data.