The Study of Influence of Sound on Visual ERP-Based Brain Computer Interface

The performance of the event-related potential (ERP)-based brain–computer interface (BCI) declines when applying it into the real environment, which limits the generality of the BCI. The sound is a common noise in daily life, and whether it has influence on this decline is unknown. This study designs a visual-auditory BCI task that requires the subject to focus on the visual interface to output commands and simultaneously count number according to an auditory story. The story is played at three speeds to cause different workloads. Data collected under the same or different workloads are used to train and test classifiers. The results show that when the speed of playing the story increases, the amplitudes of P300 and N200 potentials decrease by 0.86 μV (p = 0.0239) and 0.69 μV (p = 0.0158) in occipital-parietal area, leading to a 5.95% decline (p = 0.0101) of accuracy and 9.53 bits/min decline (p = 0.0416) of information transfer rate. The classifier that is trained by the high workload data achieves higher accuracy than the one trained by the low workload if using the high workload data to test the performance. The result indicates that the sound could affect the visual ERP-BCI by increasing the workload. The large similarity of the training data and testing data is as important as the amplitudes of the ERP on obtaining high performance, which gives us an insight on how make to the ERP-BCI generalized.


Introduction
The brain-computer interface (BCI) is a system to control the machine without human peripheral neuromuscular system [1,2]. This can effectively enhance the physical ability of the user [3][4][5]. Event-related potential (ERP) is an evoked potential recorded from the surface of the scalp when a person performs cognitive processing (e.g., attention, memory, thinking) with a particular stimulus, which reflects the neurophysiological changes of brain during cognitive processes [6,7]. Less training and high performance make the ERP-based BCI system widely used, e.g., underwater manipulator [8], consciousness detection [9], paradigm research [10,11].
The ERP-based BCI meets a common problem: the performance of the BCI decreases when applying it into the real environment [12,13]. The sound widely exists in the daily life. When the subject uses the BCI to output commands, he/she is unavoidably affected by the sound [14], such as talking voice, robot walking step, the sound of the machine, and so on. Whether the sound affects the performance of the visual ERP-based BCI is unclear, since there are two different views on the influence of the sound. On one hand, some researchers have shown that using the visual and auditory tasks Mental workload is the ratio between the brain resources required by the task and the available brain resources of the operator [27,28]. Since the visual BCI requires the subject to be with a suitable mental workload [29] to reach a high performance, this study assumes that the sound might influence the BCI performance by having an effect on the mental workload. This study proposes to use the speed of sound to adjust the mental workload. The speed of sound represents the number of words played per minute, which affects the amount of information received by the brain. The increase of information input would increase the load of processing them in the brain, which might reduce the efficiency of dealing with the BCI task, since the limited workload of the brain cannot handle the sound with high speed and BCI task simultaneously. Compared with the researches studying the auditory BCI, the novelty of this study is to propose a dual-task that includes both the visual BCI task and auditory task. The dual-task has advantages in activating both the visual and auditory pathways in the brain and is helpful for analyzing the interaction of the two pathways when the subject is under a BCI system.
It is a problem whether the speed auditory task decreases the performance of the visual BCI when the speed of auditory task occupies more brain resources. And whether the auditory task has an inhibitory effect on the visual pathway is also a problem. In order to test the effect of the sound on the visual ERP-based BCI, this study designs a dual-task experiment that requires the subject to do the visual BCI task and auditory task simultaneously. The auditory tasks are designed as three levels to simulate the daily cases in which the subject is affected by the sound in different indexes. National Aeronautics and Space Administration-Task Load Index (NASA-TLX) is used to indicate the subjective index of the workloads. We evaluate the effect by the amplitude, accuracy rate, and information transfer rate. In addition, we use the data collected under the same or the different workloads to train and test the classifier. The results show that the amplitudes of N200 and P300 potentials decreased from 2.59 µV to 1.90 µV and from 3.20 µV to 2.43 µV in occipital area when the workload increased.
The results indicate that auditory noise has interference in the occipital area when inducing ERP. High workload also leads to low accuracy and information transfer rate. The accuracy and information transfer rate (ITR) of the high workload are 90.83% and 27.68 bits/min, which are higher than the low workload. After the experiment, we found that as the brain resource becomes less, the visually-induced ERP is reduced. The decrease of the ERP amplitude makes the classification accuracy and ITR decrease, which causes the visual BCI performance to decline. Besides, a large difference between the training data and testing data also leads to a decrease of the performance. These findings imply that the auditory task could make the N200 and P300 potentials different from the training phase to the testing phase by increasing the subject's workload, which influences the visual ERP-based BCI performance. This finding is helpful for further designing a more robust visual ERP-based BCI.
Section 1 is the introduction that describes the reason why ERP-BCI is used widely and the existing problems in ERP-BCI. Section 2 introduces the design of the experiment and the index of evaluation. Section 3 describes the comparison of subjective fatigue values and the ERP amplitudes, as well as the accuracy and information transmission rates. Section 4 discusses the factor why these values have differences in different auditory tasks. Finally, a brief conclusion is given in Section 5.

Participants and Data Collection
Ten subjects (nine males) from the Hebei University of Technology with a mean age of 22 years (range from 19 to 24) participated in the electroencephalography (EEG) session of this study. All participants had normal or corrected-to-normal vision and were informed about the risk of seizures in epileptics due to flicker stimulation. They reported not to have ever suffered from epilepsy and gave their written informed consent. One was familiar with this experiment, and the rest had no experience. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the ethics committee of the Hebei University of Technology (HEBOThMEC2019001).
All electrode impedances were reduced to 10 kΩ before data recording. EEG signals were sampled at 1000 Hz (SynAmps2, Neuroscan, USA). All 36 channels were grounded between Fpz and Fz channel and referenced to the binaural mastoid. Figure 1 shows the interface and protocol developed in the OpenViBE environment. There are 12 robotic static images in the stimulus interface, which is a 3 × 4 matrix. Each image represents a robotic arm behavior. To encode robotic arm behavior, we define row i and column j in the matrix as location (i, j), as shown in Table 2. We use this interface to activate the visual stimuli. In this oddball paradigm-based interface [30], visual images flash randomly one by one. When a visual image is presented, a black square with a white solid circle shields the others. The image that the subject focuses on is the "target", and the image ignored is defined as "non-target". At the beginning of the experiment, the subjects watch the interface. Twelve pictures flash randomly, and the time each picture flashes is 150 ms, pausing for 75 ms after the flashing; the remaining 11 pictures flash randomly. The stimulus onset asynchrony is set to 225 ms. A process in which the interface flashes each stimulus once is defined as a "repetition". In our study, 10 repetitions constitute a "trial", and the target in the trial stays the same while the other stimuli are non-targets; the interval of every trial is 500 ms. One experiment session consists of 12 trials, and each subject conducts three sessions in this study.

Visual-Auditory BCI Tasks
The auditory material is an excerpt of audio story "Like a Flowing River". The time length of the material is 8 min.
The visual-auditory BCI tasks are categorized into three types. In each type, the subjects need to conduct 36 trials. Every trial is composed of 10 repetitions. Every 12 trials consist of a sub-experiment in which a certain auditory material is played when the sub-experiment starts and is stopped when the sub-experiment ends. In each sub-experiment, the subject is required to not only focus on the target stimulus in each trial, but also silently count the number of "De" noticed from the auditory material. The differences between the three tasks are the speeds of playing the material: 0-auditory task (0-T) represents the task in which there is no auditory material played; 0.5-auditory task (0.5-T) represents the task in which the auditory material is played in half of the ordinary speed; 1-auditory task (1-T) represents the task in which the auditory material is played in an ordinary speed. The numbers of the "De" in the three tasks are 0, 26 and 53. The procedure of the visual-auditory BCI tasks is shown in Figure 2.

The NASA-TLX Scale
Subjects complete the fatigue scale after the experiment, and the average value the subjects get from the scale can determine the difficulty of the auditory task; the difficulty has three levels: low, medium and high. The most commonly used subjective fatigue scale is the NASA-TLX scale developed by NASA, which has good internal consistency and structural validity as a subjective fatigue assessment for BCI system operations [27]. In Figure 3a, we can get the scores of M 1 to M 6 . In Figure 3b, we can calculate N 1 to N 6 from the 15 pairs comparisons. The larger the total score, the greater the mental workload.  Its score expression is: where F is the mental workload value of the subject, and N i is the number of i-th dimensions that participants considered to have more contribution to the overall mental workload than other dimensions. The C 2 6 means the number of pair-wise comparisons of all factors. M i is the score of the i-th dimension.

Feature Extraction and Classification
The EEG data from 0 ms to 600 ms post-stimulus is cut for each stimulus. Firstly, a 3rd-order Butterworth band-pass filter (0.5-10 Hz) is used to filter each epoch. Secondly, the data of 50 ms pre-stimulus is used as the baseline to make a baseline correction. Thirdly, all epochs are down-sampled to 40 Hz to reduce the data size. At last, this experiment uses 36 channels as feature channels to output a 864-dimension feature vector.
The fisher linear discriminant analysis (FLDA) is used for the binary classification problem because they have excellent classification performance in ERP-based BCI systems to solve identification problems [31,32].
The discriminant function of the Fisher classifier is: w T is the best projection direction, x is the data sample, and ω 0 is the central value of the two types of data; the samples are classified by the value of y: the samples is discriminated as a target when y> 0 and as a non-target when y< 0.
The mental workload of different people is affected by many factors, so the fatigue of each subject after an auditory task has three states which are low workload state (LD), medium workload state (MD) and high workload state (HD). The rule of division is as follows: where V j is the fatigue value of the j-th subject after completing the visual-auditory BCI task.

Evaluation Criteria
The classification accuracy in this article refers to the ratio of the number of correctly classified "Trial" and the total number of "Trial".
where P is the accuracy, T r is the correctly classified number of "Trial", and T t is the total number of "Trial", which is 36 in this research. The information transfer rate (ITR) was originally used for the communication and computational rates of measurement systems in the communications field and was introduced in the BCI field by the Wolpaw et al. [32,33]. Due to the three basic performance indicators of optional target number, target recognition accuracy, and single target selection time, ITR has become one of the most commonly used comprehensive evaluation indicators for evaluating system communication rates in BCI research. Before calculating the ITR, we first need to calculate the amount of information transmitted by a single target selection, that is the bitrate (B). The formula of B is: The unit of B is bits/selection, the N represents the number of possible targets (12 with the 3 × 4 matrix), and the P represents the accuracy that are correctly classified (average accuracy).
The formula of ITR is: The n represents the repetition number of trials, the t r represents the time of a repetition which is 2700 ms, the t i represents the interval time between each trial, which is 500 ms.

Behavior Data
From Table 3, the average number of "De" that 10 participants count is close to the actual number in the auditory story. The average recognition error of all the participants fluctuates within an acceptable range. It can be considered that the subjects completed the auditory task during the experiment. All the participants completed the auditory tasks well, indicating that the difficulty of the auditory tasks is designed reasonably. In addition, as the difficulty of the auditory task increases, the error fluctuation increases from 1 to 3 on average. This indicates that high-speech auditory tasks will increase the probability of misidentification, and this auditory task is more difficult. Additional cognitive tasks are more difficult, and the speed of auditory material is the factor that affects the difficulty of the task itself. We cross verify the subjective reports by using Ura's variation of Scheffe's method [20]. Table 3 shows the difficulty of each sound. The rated difficulty is analyzed by the analysis of variance that contains factors of the average of ratings, the individual difference of the ratings and so on. It reveals that the significant main effects come from the average of ratings (p < 0.01). The fatigue values of three auditory tasks about all subjects are provided in Table 4. We use the mean as the fatigue value of the three auditory tasks [34][35][36]. The fatigue values of 0-T, 0.5-T and 1-T are 34.98, 50.09 and 54.87. According to the fatigue values of three auditory tasks, the auditory task 0-T is defined as low difficulty task, 0.5-T is defined as medium difficulty task (p < 0.05, p = 0.0282), and 1-T is defined as high difficulty task (p < 0.05, p = 0.0073). The behavioral results (the error number of the "De") and the NASA-TLX scale scores prove that controlling the difficulty of the auditory task is achieved. The mental workload of the subjects in the process of performing the BCI task has reached different status. We can conclude that the subjects can produce a stronger fatigue response after experiencing a faster speaking rate auditory task. This reaction can be seen from the change of fatigue scale values. Different subjects think the auditory task is different in difficulty. The fatigue values of S2, S6 and S8 do not decrease with the increased difficulty of the auditory task. The subjects' different fatigue values imply that individual differences also exists in the subjects' feeling on the tasks. The addition of auditory task to the subject's visual task while getting stimulated also induces ERP. In Section 2, we defined the LD, MD and HD according to the fatigue value that is calculated after the subject makes the experiment, so we take the EEG data of three statuses after the experiment to analyze.

The Analysis of ERP
The off-line analysis extracted features of the ERP. Brain signals within a 600 ms window, from 0 ms pre-stimulus to 600 ms post-stimulus, were selected and pre-processed using the following steps. First, a digital filter with a bandwidth of 0.5-10 Hz filtered the signals. Second, a baseline corrected the signals by subtracting the mean value of the data from 300 ms pre-stimulus to the time point when the stimulus appeared. Finally, an algorithm averaged the epochs induced by the same type of stimuli (target or nontarget) to extract the ERP's patterns. The grand averaged ERP waveform for PO4, POZ, PO3, P6, CP6, P7, O1, OZ, and O2 are depicted in Figure 4. The common ERP components for oddball paradigms like N200 and P300 can be observed from the waveform. In Figure 4, the ordinate represents the magnitude of the ERP amplitude. The abscissa is the time after the target stimulation. The black solid line represents the amplitude changes under the LD. The blue dashed line represents the amplitude change under the MD, and the red dashed line represents the amplitude change under the HD. The latency and amplitude of some channels is shown in Table 5. The latency of N200 and P300 under higher workload state is delayed about 6 ms compared to lower workload state. The P300 amplitude of parietal and occipital areas decreased with the increase of additional auditory difficulty tasks. The possible reason may be that working memory occupies the mental resources that generate P300.  The average amplitude of P300 decreased by 0.74 µV (p = 0.0104) under MD compared with LD, and decreased by 0.12 µV (p = 0.0489) under HD compared with MD in occipital area. The average amplitude of N200 decreased by 0.56 µV (p = 0.0055) under MD compared with LD, and the N200 amplitude induced decreased by 0.13 µV (p = 0.0383) under HD compared with MD in the occipital area. In addition, the amplitude of N200 in central and parietal areas also decreased with the increase of additional auditory difficulty tasks. The changes of N200 and P300 amplitude between MD and LD have more obvious differences than that between HD and MD. Both the amplitudes of N200 and P300 in parietal area decrease with the workload state increase. The P300 amplitude in parietal area has the fastest decline under MD compared with LD, and the N200 amplitude in the parietal area has the fastest decline. This indicates that the higher workload the subject has, the lower the ERP amplitude in occipital and parietal areas. The amplitude of the P300 of occipital area in HD is decreased by 0.86 µV (p = 0.0239) compared to the LD, which indicates the subjective and objective data are consistent.
Next, we observe the changes in the region of the brain related to the vision from the brain topographic map. Figure 5 shows patterns of brain activities induced by the target stimulus at the post-stimulus time of 250 ms under LD, MD and HD. The colors of the topographies represent the amplitudes of the 10 subjects' average brain signals. The scale in Figure 5 is 6 µV. The amplitude in the occipital region of the brain increase at 250 ms after receiving the visual stimulus. When the subject is in a high mental workload state, the amplitude of the occipital area gradually becomes the least. In Figure 5, the amplitude under LD in the occipital area is the maximum, MD is next, and the amplitude of the occipital area under HD is the minimum. The topographic map changes corresponding to the amplitude changes in Table 4. It has the most obvious amplitude decrease in the occipital region of the brain, with the increase of the mental workload of the subject. This indicates that when the additional auditory tasks become more difficult, the brain is in a higher fatigue state, and the magnitudes of N200 and P300 in the occipital region of brain decrease.

Accuracy and ITR
The average classification accuracy and the ITR of all subjects are in Figure 6, where the abscissa represents the repetition number of a Trial. The ordinate represents the classification accuracy and the ITR respectively. The same color of line indicates the same test set. The same mark of line indicates the same train set. As the number of repetitions increases, the classification accuracy increases and the ITR decreases. After 10 repetitions, the curves L-L (90.83%, 5.65 bits/min), M-L (86.94%, 5.15 bits/min) and H-L (85.41%, 4.96 bits/min) are almost the highest, which means the accuracy and ITR reach the highest when using the data of LD as the test set. In Figure 6, L-L (90.83%, 16.43 bits/min) and M-L (86.94%, 5.15 bits/min) are the highest, followed by L-M (83.05%, 4.68 bits/min) and M-M (86.11%, 5.05 bits/min), and L-H (76.39%, 3.96 bits/min) and M-H (77.49%, 4.07 bits/min) are the lowest. This indicates that the classification accuracy is the highest with the data of LD for testing, followed by the data of MD for testing, and the lowest with the data of HD for testing except using the data of HD as the training set.

Discussions
Many BCI systems perform well in the lab environment, but the performance becomes poor when applying it into the real environment. It is necessary to find the reason that causes this decline to retain a high performance. Since there are many potential factors influencing the BCI performance in the real environment and the sound is a common noise, this study mainly analyzes the influence of the sound.

The Auditory Task Increases the Mental Workload
We can see that most of the subjects regard the task with higher speed as harder. Higher speed makes the subjects receive and deal with more information within unit time, which causes an increase of mental workload. When the amount of information is too much, the subjects might ignore some information, which leads to decision mistakes, such as counting the number of "De" wrongly, or forgetting to pay attention to the target stimulus when it appears. This can be seen from the behavior results, where as the speed increases, the rate of counting correctly falls.
On the other hand, as the difficulty increases, the ERP decreases. The average amplitude of P300 is decreased by 0.86 µV under HD compared with LD in occipital area (p = 0.0239). The average amplitude of N200 is decreased by 0.69 µV under HD compared with LD in occipital area (p = 0.0158). The subjects allocate less resources to the visual stimulus because he/she pays parts of attentions to the auditory task, which leads to lower ERPs when stimulated by the stimulus.

The Factors of Performance Decline
When applying the BCI into the environment where the speed of the sound is higher, the performance of it declines. The accuracy or the ITR in the LD conditions are all higher than the accuracy or ITR of the HD (MD & LD: p = 0.0282; HD & LD: p = 0.0073) conditions. The accuracy is also decreased by 5.95% in HD compared with LD (p = 0.0101). Finding the reason is important for using the BCI in daily life. This study uses the data sets under the same or different data sets to explore the factors influencing the detection performance. The nine conditions can be categorized as two types: The type that uses the same data sets to train and test the classifier, such as L-L, M-M, and H-H; and the type that uses the different data sets to train and test the classifier, such as L-M, L-H, M-L, M-H, H-L, and H-M. We assume that the performance of the classification in the BCI is influenced by two factors.
Firstly, the amplitude of the ERP has an influence on the performance. In Figure 6, as the speed of the auditory task increases, the performance of the classifiers of the first type decreases. The descending order of the three classifiers is L-L, M-M, H-H. This order is in accordance with the order of the amplitude under different tasks. Since the only difference between these classifiers is the amplitude, the amplitude affects classification. Higher amplitude could produce more obvious features, which leads to higher classification accuracy rate.
Secondly, the difference between training and testing data sets plays important roles on the performance. When using the same data to test the classifiers that are trained with different data sets, the results show that the L-L outperforms the M-L and H-L, the M-M outperforms the L-M and H-M, and the H-H outperforms the M-H and L-H. The accuracy and ITR under H-H are higher than the accuracy and ITR under H-M; the accuracy and ITR are the highest under L-L. The ITR is decreased from 16.43 bits/min under H-H to 12.36 bits/min under H-M. This indicates that the classifier performs best if the training and testing data are from the same task, even though the amplitude is not the highest. This finding indicates that if the testing environment is too noisy, the training process should take the noise into consideration to obtain better performance. When the training set data is inconsistent with the test set data, the classification accuracy and information transmission rate will decrease. When the subjects are in a low brain workload, or the classifier is trained and tested using the same state data, the classifier can obtain better classification and information transmission rate.

Conclusions
This study uses a dual-task to explore the influence of the speed of the sound on the visual BCI. The P300 and N200 both decline by 0.86 µV (p = 0.0239) and 0.69 µV (p = 0.0158) when the speed of the sound increases, which leads to a substantial decrease of accuracy (5.95%, p = 0.0101) and ITR (9.53 bits/min, p = 0.0416) of the visual BCI. The result demonstrates that increasing the speed of the sound has an effect on the allocation of the workload for the visual BCI task, which indicates that the auditory and visual pathways are related when dealing with the BCI. Besides, we find that the performance of the BCI also depends on the difference between the training and testing data. Large differences would lead to bad performance even though the classifier is trained well under the training environment; therefore, we should take the noise of the testing environment into consideration when training a classier.
The subject could hear the sound of servos of the robotic arm when conducting the on-line experiments. Whether this sound has an effect on the BCI performance needs to be studied in our future work. In addition, the visual BCI should take more kinds of visual images, such as the subject's arm, to explore the way of improving the BCI performance. Whether the user's familiarity with the sound or the memory cognitive mechanism of the human will influence the visual BCI performance needs to be studied in our future work.