Auto-Denoising for EEG Signals Using Generative Adversarial Network

The brain–computer interface (BCI) has many applications in various fields. In EEG-based research, signal denoising is an essential step. In this paper, a generative adversarial network (GAN)-based denoising method is proposed to denoise multichannel EEG signals automatically. A new loss function is defined to ensure that the filtered signal retains as much of the original effective information and energy as possible. The model can imitate and integrate artificial denoising methods, which reduces processing time; hence, it can be used for large-scale data processing. Compared to other neural network denoising models, the proposed model has an additional discriminator, which continually judges whether the noise has been filtered out, while the generator continually adjusts its denoising strategy. To ensure that the GAN model generates EEG signals stably, a new normalization method, sample entropy threshold and energy threshold-based (SETET) normalization, is proposed to detect abnormal signals and limit the range of the EEG signals. Once the denoising system is established, it can still be applied to new subjects' data even though the denoising model was trained on different subjects' data. The experiments discussed in this paper employ the public HaLT dataset. Correlation and root mean square error (RMSE) are used as evaluation criteria. Results reveal that the proposed automatic GAN denoising network achieves the same performance as the manual hybrid artificial denoising method, while making the denoising process automatic and thereby significantly reducing the time required.


Introduction
An electroencephalogram (EEG) is a brain signal acquired by a non-implantable (non-invasive) brain-computer interface (BCI). Obtained by attaching electrodes to the scalp, the EEG signal can reflect a variety of brain activities. However, EEG signals are easily affected by unexpected noise such as eye blinking and the heartbeat. These noise signals may carry higher energy than the original EEG signal, making EEG-based research difficult. Thus, denoising is an important area of EEG research.
Head muscle movements, blinking, eye movements, and the heartbeat are the common factors that generate noise. Traditionally, the commonly used denoising methods are wavelet transform (WT) denoising, independent component analysis (ICA) denoising, and empirical mode decomposition (EMD) denoising. References [1,2] used wavelet denoising based on soft or hard threshold selection. First, the signal was normalized, and then it was decomposed by a wavelet. The researchers used soft and hard thresholds to remove the noise from the decomposed sub-signals, and the filtered signal was then reconstructed. Compared to [1], Reference [2] used different threshold selection methods and different wavelets to decompose the signal, which were then tested separately with different noises. In addition, an adaptive filtering method was employed, which effectively removed the noise. Even when a subject's data were not involved in the training of the model, the model could still extract the noisy features of the EEG signal and remove the noise.
Another problem is that it is challenging to generate EEG signals using the GAN model. When using a GAN to generate images, the range of the image data is normally between 0 and 1, while the range of the EEG signal is uncertain. Some channels' amplitude values are high, above 50 µV, while others are low, between −10 µV and 10 µV. When subjects blink their eyes, move their facial muscles, or when other effects occur, the amplitude can suddenly extend into the hundreds of microvolts. Thus, without any normalization, the range of the generated EEG is not controllable, meaning that the GAN is also unstable. Although the wave shapes of some signals generated by the GAN without any normalization are similar to the target signal, some signal information can still be lost. Therefore, it is very important to control the range of the input EEG data before using the GAN model. We propose a sample entropy threshold and energy threshold-based normalization method to overcome these problems. This method can detect the abnormal parts of the signal and limit the range of the EEG signals, which is particularly helpful for generating EEG signals with a GAN.
In summary, this paper makes two significant contributions. Firstly, a sample entropy and energy threshold-based data normalization method is introduced. The purpose is to apply the idea of image restoration to EEG signal denoising. However, EEG differs from image filtering in that the amplitude of the EEG signal has no fixed range. The GAN model may generate abnormal signals, such as a signal with a strange shape that has lost the original EEG characteristics. Thus, before using the GAN model, the EEG signal should be normalized. Using the proposed normalization method, the GAN model can stably generate the EEG signal because the value of the EEG data is limited to a small range, so that the data can be treated like an image. Compared to other normalization functions, such as the tanh function, the proposed method avoids information loss because the noisy parts of the EEG signals are selected and processed before being input into the generative model. Thus, during model training, the performance is not influenced by sudden-change signals. The proposed normalization method thereby helps the GAN model stably generate EEG signals.
Secondly, a GAN-based blind denoising method for removing noise is proposed. The GAN model can replace artificial denoising while obtaining the same filtering performance. Compared to the artificial denoising method, our model is automatic, which reduces the computation time. In addition, even when our model is trained on certain subjects' data, it can still be applied to denoise other subjects' data. Compared to deep network denoising models, the GAN model has an additional discriminator, which continually judges whether the noise has been filtered out, while the generator continually adjusts its denoising strategy. During this process, the discriminator and generator update their parameters to improve the filtering performance.

Methodology
The proposed GAN-based denoising system is shown in Figure 1. There are three steps:
1. The EEG data are pre-processed, including frequency filtering, re-referencing, and baseline removal;
2. The processed data are normalized to a fixed range, using the energy threshold and entropy threshold to select the noisy parts of the EEG data;
3. The original EEG data are input into the trained generator of the GAN model to obtain the cleaned EEG data.

Data Pre-Processing and Normalization
The raw EEG data are band-pass filtered from 1 to 40 Hz because the main features of EEG exist in this frequency band [22]. This also removes some high-frequency noise, such as 50 Hz or 60 Hz power line noise. Re-referencing involves subtracting the mean value of all channels from the raw EEG data. The function of this step is to bring the reference electrode close to zero.
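As an illustration of these two pre-processing steps, the sketch below uses a simple FFT-mask band-pass filter; this is an assumption for illustration only, since the paper does not specify the filter design, and the function names (`bandpass_fft`, `re_reference`) are ours:

```python
import numpy as np

def bandpass_fft(x, fs, low=1.0, high=40.0):
    """Keep only the low-high Hz band by zeroing FFT bins outside it.
    x: array (..., samples); fs: sampling frequency in Hz."""
    X = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    X[..., (freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(X, n=x.shape[-1], axis=-1)

def re_reference(x):
    """Subtract the mean of all channels at each time point.
    x: array (channels, samples)."""
    return x - x.mean(axis=0, keepdims=True)

# Example: a 10 Hz component survives, a 50 Hz line-noise component is removed.
fs = 200
t = np.arange(2 * fs) / fs
sig = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 50 * t)
clean = bandpass_fft(sig, fs)
```

In practice, a dedicated filter (e.g., a zero-phase IIR band-pass) would typically replace the FFT mask, but the effect on the 1-40 Hz band is the same in spirit.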
Next, we develop the sample entropy threshold and energy threshold-based normalization method, as shown in Figure 2. We aim to normalize the original EEG signal to limit its range. If the subject blinks, the signal of a channel near the eye can experience a sudden and significant fluctuation. However, if we normalize the data by dividing the signal by its maximum value, the overall energy can become particularly small. Thus, we need a way to locate the noisy positions in the original signal and process the bad parts before normalizing the data. First, sample entropy detection is used [23]. Entropy is a measure of signal irregularity. If a signal is abnormal, its sample entropy should be very small. Thus, we define a sample entropy threshold k. When the sample entropy of a signal is greater than k, the signal is defined as normal. When the sample entropy is smaller than k, there may be a sudden change in the signal, and the signal needs to be processed. The value of k is selected based on experience. This threshold is used to check whether the signal fluctuates sharply, and the value selection is not sensitive. It is related to the signal length. Usually, for EEG signals, the range of sample entropy is from 0 to 1.5. When the signal is seriously affected by eye movement, the sample entropy should be very small because the signal does not contain much EEG information but only a significant fluctuation. Generally, a sample entropy of less than 0.2 means that the data have lost a lot of information.
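For reference, the sample entropy used for this check can be computed as in the sketch below; the parameter choices m = 2 and tolerance r = 0.2·σ are common defaults assumed here, not values stated in the paper:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r) of a 1-D signal.
    Low values indicate a signal dominated by a single large fluctuation
    (e.g., an eye-blink artifact) rather than rich EEG activity."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()          # common tolerance choice (assumption)
    n = len(x)

    def matches(mm):
        # count template pairs whose Chebyshev distance is within r
        templates = np.array([x[i:i + mm] for i in range(n - mm)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b, a = matches(m), matches(m + 1)
    return float('inf') if a == 0 or b == 0 else -np.log(a / b)
```

A smooth, regular oscillation yields a small value while broadband noise yields a large one, which is exactly the contrast the threshold k exploits.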
We can manually select some data that are slightly affected by eye movement and calculate the mean sample entropy SE_a using all channels' data. We then select some frontal channels' signals to calculate the mean sample entropy SE_f, and some central channels' signals to calculate the mean sample entropy SE_c. The data of the channels close to the eyes may be significantly affected by eye blinking; therefore, their mean sample entropy should be small. The data in the central area of the brain are ordinarily stable; therefore, their sample entropy should be large. If there is not much difference among the three sample entropies, SE_f can be used as the threshold k. If the three entropies are quite different, a value between SE_a and SE_c can be selected as the threshold k.
Next, we calculate the entropy of the signal for each channel and select the signals that need to be processed. The noise may appear at a certain time point or over a short period. Therefore, we first divide the signal into n segments. For each segment, the mean energy is calculated; we use the L1 norm of the signal as the energy. The energy of a sudden-change signal should be higher; therefore, we use the same means to set an energy-based threshold e to select the signal segments that need to be processed. When the mean energy is less than e, the signal is normal. The segment may contain a sudden fluctuation when the energy is greater than e; when calculating the normalization coefficient, this part should be ignored. Usually, the energy of the EEG signal should be below 30. When artefact noise occurs, the mean energy should be over 50. Thus, 50 can be used as a reference for the threshold e, which is applicable to most EEG signals.
Finally, we calculate the maximum absolute value of the normal segments of the data as the normalization coefficient m. Then, the whole signal is divided by m to achieve normalization: S = X/m, where X is the original data and S is the normalized data. We record the maximum value of each group of data and use the parameter m to reconstruct the original signal range after generating a new signal. After normalization, the processed signal is fed into the GAN network to train the model. The selected threshold is not related to the subjects because the selection of the threshold depends on the mean sample entropy. For different subjects, the mean sample entropy is also different. Thus, the threshold is not fixed; it changes dynamically according to the different subjects.
The purpose of the threshold is to select the affected data segments and pick them out for processing. If the signal is abnormal, the calculated sample entropy will be very small. Suppose the threshold is set as the smallest candidate SE_a; then any value smaller than this threshold must indicate an abnormal signal segment. In the other situation, suppose the threshold is set as the largest candidate SE_c; although some normal data could also be filtered out, these data can still be filtered back by the energy threshold in the next step.
In the worst case, if both thresholds are set too large and the normal data are filtered out, then these data will be divided by a small value during normalization, and the signal will still not lose too much information. Thus, the threshold is not very sensitive within the selection range, and the selection range is determined automatically based on the mean sample entropy of the EEG signal.
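Putting the two thresholds together, the normalization coefficient can be computed as in the following sketch; the segment count `n_seg = 10` is an illustrative assumption, and the entropy pre-check from Figure 2 is reduced here to the energy test for brevity:

```python
import numpy as np

def setet_normalize(x, e=50.0, n_seg=10):
    """SETET-style normalization of one channel (sketch).
    Segments whose mean L1 energy exceeds the threshold e are treated as
    artifact-affected and ignored when computing the coefficient m."""
    segments = np.array_split(np.asarray(x, dtype=float), n_seg)
    normal = [s for s in segments if np.mean(np.abs(s)) <= e]
    # fall back to the whole signal if every segment looks abnormal
    m = np.max(np.abs(np.concatenate(normal))) if normal else np.max(np.abs(x))
    return x / m, m   # keep m to restore the original range later
```

Because m comes only from the normal segments, a blink that drives the amplitude into the hundreds of microvolts no longer shrinks the rest of the signal toward zero; the original range is recovered by multiplying the generator output by the stored m.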

Construct Generative Adversarial Network
In this paper, the generative adversarial network (GAN) is used to denoise the EEG signal. There are two parts: the generator and the discriminator. For the generator, the input is an unfiltered signal and the output is a clean signal. Therefore, we need to obtain some clean signals (e.g., 30% of each subject's data) to serve as the training data before using this network. Thus, we take some of the data and perform artificial filtering on them. The filtering method is mainly based on ICA; we also use the wavelet denoising method [1] and the EMD denoising method [4]. Firstly, we use the ICLabel technique to process the raw signal and obtain the estimated probability of each noise type. We select the kinds of noise with high probability for further analysis. We use the EEGLAB toolkit [24] to plot the spectrum of each independent component and the distribution of brain energy. Then, we determine whether an independent component is a noise component based on the signal waveform, spectrum, and energy distribution. After that, the noise components are removed to obtain the filtered signal. After artificial denoising by ICA, the EMD [4] and wavelet transform filtering methods [1] are used for secondary filtering. Eventually, a relatively clean EEG signal is obtained. Thus, this denoising method can be seen as a hybrid artificial method.

Build Generator and Discriminator Structures
In the network, the generator first compresses the original multi-dimensional signal and then extracts its main features. After that, it reconstructs the signal according to the feature vector. Aside from the generator, we also need to define a discriminator, which is used to determine whether the generated signal is a clean EEG signal. Training the model amounts to finding the Nash equilibrium at which the generator can generate clean signals and the discriminator can classify noisy and clean signals. The designed generator and discriminator network structures are shown in Figures 3 and 4, respectively. The suggested parameters are tabulated in Tables 1 and 2.

GAN Model Training
The training process is shown in Figures 5 and 6. Referring to Figure 5, the noisy EEG data S are input into the generator, and the output is the generated EEG data G(S). The generated data can be denoised data, or they can still contain noise. Next, the generated data G(S) and the clean data y are input into the discriminator; the purpose is to give the discriminator the ability to classify noisy and clean data. The discriminator produces the outputs D(y) and D(G(S)), which can be regarded as variables reflecting whether the output of the generator is clean EEG data or not. The discriminator can be seen as a binary classifier. The loss L_D of the discriminator depends on these two outputs: a large output value means that the data are clean; otherwise, the data are noisy. For the generator (shown in Figure 6), the loss L_G depends on the feedback from the discriminator. We assume that the discriminator can accurately classify the clean data y and the generated data G(S). Then, the purpose of the generator is to generate cleaner data so that the output value D(G(S)) from the discriminator becomes very large. At the same time, the mean square error between the clean data y and the generated data G(S) should also be considered as another updating factor because the generator should ensure that the generated data G(S) do not lose too much of the original information. Therefore, the mean square error and the output of the discriminator are jointly used to calculate the loss L_G of the generator.
The output value of the discriminator can be regarded as the degree of denoising level. When the clean data are input into the discriminator, the output value is expected to be large. When the noisy data are input into the discriminator, the output value is expected to be small, which means that the discriminator has the ability to classify the clean data and noisy data. When the input of the discriminator is from the generator, a large output value of the discriminator is returned, which implies that the generated data are clean. Otherwise, the generated data are noisy when a small value of the output of the discriminator is returned. Based on this condition, this value will become large when the generator is well trained. It means that the trained generator has the ability to remove the noise. Thus, for the generator, the output value from the discriminator is expected to be as large as possible if the noise is successfully removed.
Referring to Equation (3), the loss function of the generator has two items. The first item is the mean output of discriminator D when the generated data G(S) are used as input: when the generator's output is fed into the discriminator, the discriminator's output value should be as large as possible. The second item is the mean square error between the generated data G(S) and the target data y: the signal output by the generator should be as close as possible to the artificially filtered signal, so that the filtered signal keeps as much original information as possible.
Referring to Equation (2), the loss function of the discriminator also has two items: the mean output of discriminator D when the target data y are used as input, and the mean output when the generated data G(S) are used as input. The discriminator should have the ability to distinguish whether a signal is an artificially filtered signal or a generated signal. When the artificially filtered signal comes, the output of the discriminator should be as large as possible; when the generated filtered signal comes, the output of the discriminator should be as small as possible.
The traditional GAN network uses the Jensen-Shannon (JS) divergence to measure the distance between two distributions [25,26]. Multi-dimensional signals are like images: they are both high-dimensional data; therefore, the target distribution and the real distribution are difficult to overlap in space, and such data are difficult to train with the traditional GAN. Therefore, the generative model in this paper is based on the Wasserstein generative adversarial network (WGAN), in which the distance between the two distributions is measured by the earth mover's distance [27]. Then, we combine the conditions mentioned above with the loss of the WGAN. The final loss functions are defined in Equations (2) and (3):

L_D = E[D(G(S))] − E[D(y)], (2)

L_G = −α E[D(G(S))] + β MSE(G(S), y), (3)

where α and β are used to balance the loss of the generator. Because the GAN model is composed of two networks, the learning process of the two networks should be balanced.
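Assuming the standard WGAN form, Equations (2) and (3) can be written in code as below; α = 0.1 and β = 1 follow the experiment section, while the function names and the use of plain arrays in place of network outputs are illustrative assumptions:

```python
import numpy as np

def discriminator_loss(d_fake, d_real):
    """Eq. (2): L_D = E[D(G(S))] - E[D(y)].
    Minimizing this drives D(y) up and D(G(S)) down."""
    return np.mean(d_fake) - np.mean(d_real)

def generator_loss(d_fake, g_out, y, alpha=0.1, beta=1.0):
    """Eq. (3): L_G = -alpha * E[D(G(S))] + beta * MSE(G(S), y).
    The first term asks the discriminator to rate the output as clean;
    the second keeps the output close to the artificially filtered target."""
    mse = np.mean((g_out - y) ** 2)
    return -alpha * np.mean(d_fake) + beta * mse
```

In a real training loop, d_fake and d_real would be the discriminator's outputs for G(S) and y, and the RMSProp optimizer from the experiment section would minimize each loss in alternation.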
In other words, the ability of the discriminator to determine whether noise is present should be neither too strong nor too weak; two balanced networks can improve their performance together. The ability of the generator reaches its limit when both the mean square error and the output of the discriminator are almost unchanged. If we continue to train the model, the performance of the discriminator may change, resulting in the performance of the two networks becoming unbalanced, which in turn results in poor generator performance. Therefore, we should stop training the network when both the mean square error and the output of the discriminator no longer change or decrease only slightly. We calculate the mean square error between the output of the generator and the clean EEG data in each iteration during the updating process:

MSE(U, V) = (1/(K N)) Σ_{k=1}^{K} Σ_{c=1}^{N} (U_kc − V_kc)^2, (4)

where U_kc is the c-th channel of the k-th trial of the clean data, V_kc is the corresponding output of the generator, N is the total number of channels in each trial, K is the total number of trials, and MSE(U, V) is the mean square error of U and V.
When the error no longer decreases, the network cannot further improve its denoising ability, and training stops. The trained generator can then be used as a noise filter: the processed and normalized EEG signal is fed into the generator to perform the denoising operation.

Dataset
In this study, the HaLT dataset, published in 2018 [28], is used to demonstrate the performance of the proposed denoising method. There are 12 subjects, and the number of trials for each participant in each category is about 160; in total, there are approximately 950 trials per participant across six imagined motion states. There are 22 channels of data. However, channels 11 and 12 are reference channels, while channel 22 is the pulse signal channel; thus, there are only 19 valid data channels. Participants perform the corresponding imagery behaviors according to random prompts on the screen. The sampling frequency of this dataset is 200 Hz, and each experiment collects data for only about one second. To facilitate the experiment, we cut the data of each trial to a length of 200 samples. The data for this period are motor imagery data obtained when the subject sees the instructions; they are pure motor imagery data, excluding the rest and preparation data. The motor imagery tasks include left hand, right hand, left leg, right leg, and tongue imagery movements. The number of trials for each subject in the HaLT dataset is tabulated in Table 3.

Experiment
There are two groups of experiments. The first experiment denoises the data using a network trained on single-subject data; the second denoises the data using a network trained on multiple subjects' data. The experiments are mainly based on the HaLT dataset, for which the data size is 19 × 200: the channel number is 19, and the data length is 200. The dataset has a total of 12 subjects. In the experiments, the sample entropy of signals with sudden, large fluctuations is below 0.2, and the sample entropy of signals with slightly abnormal fluctuations is between 0.2 and 0.3; thus, the sample entropy threshold k is set to 0.3. When a sudden-change signal occurs, the energy is above 50, while the energy of normal signals is between 20 and 30; hence, the energy threshold e is set to 50. The processed EEG data are then normalized. The optimizer is root mean squared propagation (RMSProp). The learning rates η of both networks are 0.0002. In the loss function, the balance parameter α is set to 0.1 and β is set to 1.

Denoise Using Single-Subject EEG Data
We perform the GAN denoising method on the HaLT dataset. Firstly, we artificially denoise 30% of the data of every single subject, using the original signal as the input. The remaining 70% of the data of every single subject are used as the testing set. The artificially denoised signal is used as the real sample to train the generator and discriminator. After 500 epochs of training, the GAN model is obtained, and the test samples are then input into the generator.
We use two evaluation methods. The first is the mean correlation. The filtered signal should be as close to the original signal as possible because it should lose as little of the original information as possible; the higher the similarity, the more information the filtered signal retains from the original signal. The second evaluation method is to calculate the RMSE. Compared to the signal before filtering, the filtered signal loses the noise components; thus, the energy may decrease, but not by too much, because the filtered data should retain as much original information as possible. When the correlation is higher and the RMSE is lower, the filtering performance is better. The mean correlation and the root mean square error are shown in Equations (5) and (6) below.
Corr(X, X̂) = (1/(K N)) Σ_{k=1}^{K} Σ_{c=1}^{N} cov(X_kc, X̂_kc) / (σ(X_kc) σ(X̂_kc)), (5)

where X is the original signal and X̂ is the denoised signal; X_kci is the i-th point of the c-th channel in the k-th trial of signal X; σ is the standard deviation; K is the total number of trials; and N is the total number of channels in each trial. The second evaluation measures the RMSE.
RMSE(X, X̂) = sqrt( (1/(K N)) Σ_{k=1}^{K} Σ_{c=1}^{N} mean_i (X_kci − X̂_kci)^2 ), (6)

where X_kc is the c-th channel of the k-th trial of signal X; K is the total number of trials; N is the total number of channels in each trial; and RMSE(X, X̂) is the root mean square error of X and X̂. The results are tabulated in Table 4. Some graphical results are shown in Figures 7-9. They can be divided into three groups: the EEG data slightly affected, moderately affected, and seriously affected by noise signals. In each figure, the top shows the original EEG signal, while the bottom shows the result after filtering.
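As a sketch of the two evaluation criteria, the code below computes the mean correlation and the RMSE; the array layout (trials, channels, samples) and the function names are our assumptions:

```python
import numpy as np

def mean_correlation(X, Xd):
    """Mean Pearson correlation between original X and denoised Xd,
    averaged over all trials and channels. Shape: (K, N, samples)."""
    K, N, _ = X.shape
    corrs = [np.corrcoef(X[k, c], Xd[k, c])[0, 1]
             for k in range(K) for c in range(N)]
    return float(np.mean(corrs))

def rmse(X, Xd):
    """Root mean square error between X and Xd over all points."""
    return float(np.sqrt(np.mean((X - Xd) ** 2)))
```

A denoised signal identical to the original gives a correlation of 1 and an RMSE of 0; good filtering keeps the correlation high while the RMSE stays small.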

Denoise Using Multiple Subject EEG Data
We merge five subjects' data and select 30% of the data as training data to train the GAN model. Then, we use the remaining data of these five subjects and the other seven subjects' data as test sets to test the denoising network separately. The five subjects are A, B, C, G, and H because these subjects together contain all the types of noisy data: EEG data slightly, moderately, and seriously affected by noise. Therefore, the network can learn the features of the different types of noise. The purpose of this experiment is to show that the GAN model can be used not only on the EEG data of the training subjects, but also on data from subjects that have never been seen before. In this experiment, the training data pertain to the five subjects; the remaining data of the five subjects and the other seven subjects' data have not been used in the network before. The results are tabulated in Table 5.

Discussion
The first experiment uses single-subject data to train the model and tests the model using each subject's data. From Table 4, it can be seen that most of the noise is successfully filtered out. The two evaluation indicators show that most of the filtered signals maintain most of the original information. However, some signals do not obtain good filtering results. For example, the similarity for subject C is worse because subject C's recordings contain a large amount of seriously affected data: some channels are close to the ears and eyes, and the amplitude in these channels is volatile. Therefore, the RMSE between the filtered and original data is relatively large.
The second experiment employs multiple subjects' data to train the model and tests it using each subject's data. From Table 5, training on the merged data performs better than training on each participant's data alone, especially for subjects C and G. Because the number of training samples is increased, the samples contain different types of noise signals from different subjects. With more noisy signals, the network can learn different forms of denoising. Therefore, in practical applications, a variety of noise signals from different participants can be selected to train the generative model, so that the trained network can be used to denoise various types of EEG signals.
We are more concerned with whether the denoised signal still retains the original data information after the denoising step. We can analyze the data in the time and frequency domains. The RMSE is used to measure the degree of information retention of the original signal in the time domain; we have already evaluated the data with this indicator in the previous experiments, and the results show that little time-domain information is lost. Thus, we continue to analyze the data in the frequency domain. The data are motor imagery signals, so the useful frequency information should exist in the alpha and beta bands. Thus, we focus on whether the denoised signal retains the original frequency information in the 5-25 Hz band.
There are four main types of noise in EEG signals: eye movement artifact noise, EMG noise, heartbeat noise, and power line noise. Eye movement noise is in the low-frequency range, normally 1-4 Hz, and appears as a sudden fluctuation over a short period. EMG noise is a signal over 30 Hz with significant fluctuation. Power line noise is a 50 Hz sinusoidal signal; since the signal is filtered at 1-40 Hz in the pre-processing stage, we do not need to consider this type of noise. Heartbeat noise is difficult to distinguish accurately in the frequency domain; however, the heartbeat is a regular noise, so it can be easily identified in the time domain. Moreover, the data used in this paper are short-time data; therefore, the heartbeat noise can also be ignored. Figure 10 presents a group of signals with eye movement artifact noise in the time domain. In the figure, the signals have significant sudden fluctuations in the frontal channels. After denoising, these sudden fluctuations are removed, as seen in Figure 11. The shape of the signal in the other channels is almost unchanged in the time domain. We also draw the mean frequency spectrum of these frontal channels in Figure 12.
We compare the frequency spectra of the original data and the denoised data. The energy in the low-frequency band is decreased, while the power in the other frequency bands is almost unchanged. Thus, our model can remove the eye movement artifact noise while retaining most of the original data information. A group of signals affected by EMG noise before and after denoising is shown in Figures 13 and 14. We also draw the mean frequency spectrum of the motor imagery-related channels in Figure 15. Comparing the frequency spectra of the original and denoised data, the energy over 30 Hz is decreased. In the other frequency bands, the signals may be slightly affected; however, most of the frequency information is retained. Nevertheless, we cannot ensure that the EMG noise is accurately and completely removed because we do not have the ground truth. We can only say that this denoising method can reduce the EMG noise while retaining most of the original information.
Our purpose is to use the GAN model to replace the artificial denoising method; thus, we also compare the GAN-based denoising method with the artificial denoising method. The correlation and RMSE results are tabulated in Table 6. From the results, it can be seen that most of the GAN filtering effects are close to the artificial filtering performance, and some GAN filtering effects are better than the artificial filtering results. GAN filtering is better than artificial filtering in terms of time and labor costs because the GAN model is automatic; therefore, the proposed GAN network can reduce the processing time. Artificial filtering can be seen as combining ICA with manual selection by the human eye, threshold-based EMD, and wavelet methods. Thus, the generative model can also be seen as a way of combining multiple effective filtering methods.
Compared to other networks, the GAN has both a generator and a discriminator; thus, it can always judge whether the generated signal is still noisy. If there is noise, the model keeps training. Other models cannot judge whether the generated signal is clean; they simply output the filtered signal. We apply the GAN to repair the multi-dimensional EEG signal. A new loss function is defined specifically for EEG filtering, which ensures that the filtered signal retains the effective information and energy. Since the EEG is a multi-dimensional, unstable signal, the normalization method is used to process the EEG signal before it is input into the network. The processed EEG signal allows the GAN model to better extract the EEG signal features. Once the denoising system is established, the model can still denoise a new subject's signal even though that signal was never involved in the training of the model.
Due to the large amount of data, there is no guarantee that the artificial method can always filter the data very well, because humans need to manually select the destructive components according to personal experience, and there are no fixed evaluation criteria for removing the destructive components. It can also be seen from the results that some artificial filtering results are not very good. Therefore, the denoising performance could be improved if a cleaner signal were used as the learning target.
Using traditional ICA, we need to manually filter each subject's data, including analyzing the frequency spectrum, the energy distribution map, and the signal waveform. This takes a lot of time because all the steps involve manual selection. With the proposed method, after the model is well trained, we just need to feed the target signal into the model, and the model outputs the denoised signal almost immediately. In addition, the well-trained model can be used for other subjects. Thus, the automatic model greatly reduces the processing time compared to manual methods.
This experiment consequently demonstrates that the GAN can replace the artificial denoising method. After training, it can filter new noisy signals and achieve the same performance as the artificial filtering method. The experimental simulations are implemented using MATLAB R2020b on a Windows personal computer with a Core i7-8665U 2.11 GHz CPU and 16.00 GB of RAM. The mean denoising execution time for each trial is 0.039 s. In practical applications, if the target dataset is large, some effective filtering methods should first be used to filter a portion of the EEG signals, after which the GAN network can be trained to filter the remaining signals automatically. This saves time and achieves the effect of combining multiple methods.

Conclusions
In this paper, we proposed a GAN-based denoising method. For most noise effects, the GAN denoising model obtains good filtering performance. With the proposed GAN model, the correlation between the original signal and the filtered signal is 0.7771, and the RMSE is 0.0757. Because it is difficult to generate EEG signals using a generative model, we also introduced a sample entropy and energy-based normalization method that processes the EEG signal before it is fed into the model; this helps the generator produce more realistic EEG signals. Once the denoising system is established, new subjects' data can still be filtered even though the EEG signals from these subjects were never involved in the training of the model. The proposed GAN-based denoising model achieves performance similar to that of the artificial denoising method while making the whole process automatic, which saves much time. Accordingly, the GAN-based denoising method can be used for real-time and online EEG analysis.

Conflicts of Interest:
The authors declare no conflict of interest.