1. Introduction
In the fields of healthcare and welfare, human–machine interfaces (HMIs) that use bio-signals such as EEG, electromyography (EMG), and electrooculography (EOG) are essential [1,2]. Among these, interfaces based on EEG signals are known as brain–computer interfaces (BCIs). For individuals with severe conditions such as amyotrophic lateral sclerosis (ALS), brainstem infarction, cerebral palsy, and spinal cord injury, BCIs can be the only means of controlling a computer [3,4]. BCIs can be categorized as invasive or non-invasive according to how the EEG is measured. Examples of invasive BCIs include electrocorticography (ECoG), which places electrodes on the brain surface, and local field potentials (LFPs), which require electrodes to be implanted directly in the brain [5]. These methods require surgery to open the skull and dura mater for electrode placement, imposing a significant burden on the user. However, because they measure brain signals directly, they offer high temporal and spatial resolution, which enables real-time multi-input control [6,7,8]. In contrast, non-invasive BCIs commonly use scalp EEG [9], in which electrodes are placed on the scalp [10,11,12]. This approach does not require surgery, making it more accessible for daily use, but the signal is attenuated as it passes through the skull and scalp. Consequently, a single EEG epoch is insufficient for a reliable control input. Instead, non-invasive BCIs typically analyze 40–120 EEG epochs, yielding one input command approximately every 30–60 s, which limits real-time control. Although non-invasive BCIs generally perform worse than invasive methods, their ease of use and non-surgical nature make scalp EEG-based BCIs promising for practical applications [13].
Scalp EEG-based BCIs can be divided into two categories: those that detect periodic bio-signals, known as “baseline rhythms”, and those that capture event-related potentials (ERPs), which are transient signal peaks generated when the brain recognizes specific sensory stimuli [14]. Among ERPs, P300 is the only waveform that has been successfully applied in practical scalp EEG-based BCI systems. P300 is an EEG peak that appears as a positive deflection approximately 300 ms after a person attends to a particular sensory stimulus presented in random order at a fixed rate [15,16]. Leveraging this characteristic, Farwell and Donchin developed a P300-based speller interface in which 26 letters flash in turn on a screen. By relating the timing of the letter flashes to the ERP responses, they inferred the letter on which the user was focusing, creating a keyboard-input BCI known as the P300 speller [17]. Many studies have since extended the P300 speller. One line of research has improved stimulus presentation to enhance users’ recognition of the target stimulus. For example, Kirasirova and colleagues found that flashes of letters surrounding the target letter could degrade the P300 peak for the target; by limiting the visual field, they improved detection accuracy [18]. Similarly, Kaufmann et al. achieved higher recognition accuracy by replacing letter flashes with images of human faces, which users recognize more readily [19]. Another line of research has focused on refining methods for extracting the P300 peak. The foundation for P300 analysis, proposed by Galambos and Sheatz in 1964 [20], is as follows. First, EEG data are segmented into epochs from 1 s before to 1 s after stimulus onset, and these epochs are grouped by stimulus type. Next, baseline correction is applied, setting the most stable EEG point within each epoch as the zero-voltage reference (EEG baseline). Typically, either the amplitude at a specific pre-stimulus time (e.g., 100 ms before onset; single-time-point baseline correction) or the average amplitude over a pre-stimulus period (e.g., 0–200 ms before onset; time-range-averaged baseline correction) is used as the zero point, and all EEG data in the epoch are shifted accordingly. Finally, the averaged waveform for each stimulus type is calculated. Because baseline rhythms are not phase-locked to the stimulus, their phases differ across epochs, so averaging cancels the baseline rhythms (around 50 µV) that obscure small P300 peaks (around 5 µV) [21,22,23].
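The averaging procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes one sample per millisecond and epochs spanning 1000 ms before to 500 ms after onset (the window used in the analyses below), and the `synthetic_epoch` generator (a 50 µV, 10 Hz rhythm with a random phase plus a 5 µV Gaussian "P300" at 300 ms) is a hypothetical test signal. One function covers both conventional corrections: a one-point window gives single-time-point correction, and a 0–200 ms window gives time-range-averaged correction.

```python
import math
import random

# Assumed layout: one sample per ms, epochs from -1000 ms to +500 ms,
# so list index PRE_MS corresponds to stimulus onset (0 ms).
PRE_MS, POST_MS = 1000, 500


def baseline_correct(epoch, far_ms, near_ms):
    """Shift an epoch so its mean voltage from far_ms to near_ms before onset is 0.

    far_ms == near_ms -> single-time-point correction (e.g. 100, 100).
    far_ms=200, near_ms=0 -> 0-200 ms time-range-averaged correction.
    """
    lo, hi = PRE_MS - far_ms, PRE_MS - near_ms
    ref = sum(epoch[lo:hi + 1]) / (hi - lo + 1)
    return [v - ref for v in epoch]


def grand_average(epochs):
    """Point-by-point mean across equally long epochs."""
    return [sum(col) / len(epochs) for col in zip(*epochs)]


def synthetic_epoch(rng, target):
    """Hypothetical signal: 50 uV 10 Hz rhythm (random phase) + 5 uV 'P300' if target."""
    phase = rng.uniform(0.0, 2.0 * math.pi)
    samples = []
    for i in range(PRE_MS + POST_MS + 1):
        t = i - PRE_MS  # time relative to onset in ms
        v = 50.0 * math.sin(2.0 * math.pi * 10.0 * t / 1000.0 + phase)
        if target:
            v += 5.0 * math.exp(-((t - 300) ** 2) / (2.0 * 50.0 ** 2))
        samples.append(v)
    return samples


rng = random.Random(0)
corrected = [baseline_correct(synthetic_epoch(rng, True), 200, 0) for _ in range(100)]
avg = grand_average(corrected)
print(f"averaged amplitude at +300 ms: {avg[PRE_MS + 300]:.1f} uV (single-epoch rhythm: 50 uV)")
```

Because the rhythm's phase is random across epochs, the phase-locked 5 µV component survives averaging while the 50 µV rhythm is greatly reduced; the exact printed value depends on the random phases.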
Thus, P300 detection accuracy improves with longer experiments and more epochs for averaging. However, under continuous sensory presentation, such as the P300 speller with 200 ms intervals, a stable baseline reference may be difficult to obtain, potentially reducing detection accuracy. To address this, Tanner and Norton proposed improving baseline correction by simultaneously recording EEG and magnetoencephalography (MEG) during ERP-evoking stimuli; taking MEG as the accurate reference, they identified effective high-pass filter settings for baseline correction [24].
Additionally, Krusienski et al. explored using EEG sites (PO7, PO8, and Oz) beyond the standard P300 sites (Fz, Cz, and Pz), improving baseline reference detection across multiple channels [25]. In recent years, several studies have also applied machine learning based on convolutional neural networks (CNNs) to P300 detection as an extension of the P300 speller. Kilani et al. demonstrated that a model trained directly on the EEG waveforms of target and non-target stimuli, without conventional baseline correction or averaging, can detect the presence or absence of the P300 component in raw EEG signals [26]. However, achieving high classification accuracy requires subject-specific training (fine-tuning) because EEG waveform characteristics differ between individuals. In contrast, Li et al. proposed a machine learning-based P300 detection method that does not require fine-tuning [27]. They trained on EEG waveforms after time-range-averaged baseline correction and averaging, achieving high classification performance without subject-specific training. This suggests that baseline correction and averaging effectively reduce inter-individual variability in EEG waveforms, highlighting the critical role of these preprocessing steps in P300 detection.
Although P300-based BCIs present sensory stimuli continuously, for example at 400 ms intervals, the characteristics of the EEG baseline potential under such conditions have not been thoroughly analyzed. Moreover, the effectiveness of conventional baseline correction methods, such as single-time-point and time-range-averaged baseline correction, has not been quantitatively validated under these conditions. Therefore, in this study, we conducted a P300 evoked potential experiment with continuous visual stimuli presented at 400 ms intervals and analyzed EEG data from 0 ms to 1000 ms before stimulus onset at 1 ms resolution. Three types of baseline correction (single-time-point, time-range-averaged, and multi-time-point) were applied to analyze P300 peaks, and we sought a baseline method for calculating EEG reference potentials that allows effective P300 detection under continuous stimulation.
In Experiment 1, we performed single-time-point baseline correction at every time point between 0 ms and 1000 ms before the visual stimuli, analyzed all resulting waveforms, and examined which characteristics of single-time-point baseline correction allow P300 to be detected effectively. In Experiment 2, we applied time-range-averaged baseline correction with baseline windows of different durations ranging from 0 ms to 1000 ms before stimulus onset, analyzed all resulting waveforms, and examined the corresponding characteristics of time-range-averaged baseline correction. In Experiment 3, we tested our proposed multi-time-point baseline correction, which makes a provisional P300 judgment at every point in a specific time range using single-time-point baseline correction and then evaluates these provisional judgments comprehensively to reach a final judgment. We then analyzed all resulting waveforms to verify that multi-time-point baseline correction can detect P300 effectively.
3. Analysis 1: Characteristics of Conventional Single-Time-Point Baseline Correction for P300 Peak Detection
Single-time-point baseline correction, particularly using the point 100 ms before visual stimulus presentation, is the most common method for P300 peak detection in previous studies, although other studies have used points 200 ms or even 1000 ms before the stimulus. We extracted EEG data from 1000 ms before to 500 ms after each visual stimulus and applied single-time-point baseline correction at every 1 ms point from 0 to 1000 ms before the stimulus. The baseline-corrected EEG data for each stimulus were then averaged, and the maximum amplitude within a specific post-stimulus time range was extracted. By comparing the maximum amplitudes obtained for each stimulus, we evaluated the effectiveness of this method for P300 peak detection, aiming to identify the most effective baseline point for single-time-point baseline correction.
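A minimal sketch of this sweep, under stated assumptions: `avg_by_stim` is a hypothetical mapping from stimulus label to that stimulus's averaged (uncorrected) waveform, sampled at 1 ms from -1000 to +500 ms, and a baseline point counts as correct when the target's maximum in the 200–400 ms window exceeds the maximum of every non-target. Because baseline correction subtracts a constant and averaging is linear, correcting each epoch and then averaging gives the same waveform as averaging first and subtracting the averaged waveform's value at the baseline point, which keeps the 1001-point sweep cheap.

```python
def sweep_single_point(avg_by_stim, target, pre_ms=1000, win=(200, 400)):
    """Judge P300 detection at every single-time-point baseline 0..pre_ms ms before onset.

    avg_by_stim: {stimulus label: averaged waveform, one sample per ms from
    -pre_ms ms onward}. A point is correct when the target's baseline-corrected
    maximum in the post-stimulus window exceeds every non-target's maximum.
    Returns (list of per-point booleans indexed by baseline ms, accuracy rate).
    """
    correct = []
    for b in range(pre_ms + 1):
        maxima = {}
        for stim, avg in avg_by_stim.items():
            ref = avg[pre_ms - b]  # value b ms before onset becomes 0 V
            window = avg[pre_ms + win[0]: pre_ms + win[1] + 1]
            maxima[stim] = max(window) - ref  # max(x - c) == max(x) - c
        best_nontarget = max(v for s, v in maxima.items() if s != target)
        correct.append(maxima[target] > best_nontarget)
    return correct, sum(correct) / len(correct)


# Toy example: a 5 uV target peak at +300 ms; one non-target dips to -10 uV
# exactly 100 ms before onset, so only the 100 ms baseline point fails.
n = 1501
t_wave = [0.0] * n
t_wave[1300] = 5.0
nt_dip = [0.0] * n
nt_dip[900] = -10.0
flags, accuracy = sweep_single_point({1: t_wave, 2: nt_dip, 3: [0.0] * n}, target=1)
print(flags[100], round(accuracy, 4))
```

The toy data show how a single unstable pre-stimulus point can flip the judgment even when the target peak is clear.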
3.1. Feature Analysis of the Maximum Amplitude in P300 Peak Detection Based on Single-Time-Point Baseline Correction
To facilitate comparison, we first focused on five representative baseline points used in single-time-point baseline correction (100, 200, 300, 400, and 500 ms before visual stimulus presentation) and calculated the P300 peak detection results based on each baseline point.
Figure 4a shows the post-stimulus EEG waveforms after averaging 100 trials for representative participants classified as “Recognition Difficulty: Low”, “Recognition Difficulty: Moderate”, and “Recognition Difficulty: High”. The waveform of S5 represents a “Recognition Difficulty: Low” outcome, in which a clear P300 peak appeared for the target stimulus and the waveform stabilized near zero for the non-target stimuli. In contrast, for S10 (“Recognition Difficulty: Moderate”) and S13 (“Recognition Difficulty: High”), some baseline points produced higher amplitudes for the non-target stimuli than for the target stimulus, making P300 peak detection difficult. This suggests that conventional single-time-point baseline correction may not provide a stable “EEG baseline voltage” at certain points.
However, this issue could also have arisen from inadequate suppression of the background rhythm due to insufficient averaging. To investigate further, we applied single-time-point baseline correction at 1 ms intervals from 0 to 1000 ms before the visual stimulus and calculated the maximum amplitude within the 200–400 ms post-stimulus range (300 ± 100 ms) (Figure 4b). In Figure 4b, the x-axis represents the baseline point and the y-axis the maximum amplitude. A baseline point was judged correct if the maximum amplitude for the target stimulus was higher than that for the non-target stimuli. The lower part of each graph shows the correct/incorrect classification results together with the accuracy rate. Results are presented for 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 averaged trials.
In Figure 4b, the top row shows the “Recognition Difficulty: Low” condition, the middle row the “Recognition Difficulty: Moderate” condition, and the bottom row the “Recognition Difficulty: High” condition. In all conditions, the non-target maximum amplitudes remained high with fewer than 20 averages, indicating residual background rhythm; from 40 averages onward, the influence of the background rhythm diminished. In the “Recognition Difficulty: Low” condition (S5), the target P300 peak amplitudes were clearly distinguishable at all baseline points. However, in the “Recognition Difficulty: Moderate” (S10) and “Recognition Difficulty: High” (S13) conditions, the non-target amplitudes remained higher even with 40 or more averages, and stable P300 peak detection was achieved only with 90 or more averages. Thus, although averaging 40 or more trials reduces background rhythm interference, weak target P300 peaks can still lead to incorrect judgments because the EEG baseline voltage is unstable at certain baseline points, producing higher maximum amplitudes for the non-target stimuli.
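The residual-rhythm effect can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, a single 50 µV background rhythm whose phase is independent and uniformly random in every epoch; averaging N such sinusoids leaves a sinusoid whose RMS amplitude is 50/√N µV, i.e. about 16 µV at N = 10, 8 µV at N = 40, and 5 µV at N = 100. The last figure is of the same order as a typical 5 µV P300 peak, which is consistent with the observation that weak P300 peaks still require many averages.

```python
import math
import random


def residual_rms(amplitude_uv, n_epochs, reps=1000, seed=0):
    """Monte Carlo RMS amplitude of the rhythm left after averaging n_epochs epochs.

    Each epoch carries amplitude_uv * sin(w*t + phase) with an independent,
    uniformly random phase. The average of such sinusoids is again a sinusoid
    of amplitude amplitude_uv * |mean(exp(1j * phase))|.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        c = s = 0.0
        for _ in range(n_epochs):
            phase = rng.uniform(0.0, 2.0 * math.pi)
            c += math.cos(phase)
            s += math.sin(phase)
        total += (c * c + s * s) / n_epochs ** 2  # |mean phasor|^2
    return amplitude_uv * math.sqrt(total / reps)


for n in (10, 20, 40, 100):
    print(f"N={n:3d}: simulated {residual_rms(50.0, n):5.2f} uV, theory {50.0 / math.sqrt(n):5.2f} uV")
```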
Finally, for each of the 22 participants, we applied single-time-point baseline correction at 1 ms intervals from 0 to 1000 ms before the visual stimulus and calculated the maximum amplitude within the 200–400 ms post-stimulus range (300 ± 100 ms) from the EEG data averaged over 100 trials. Figure 4c shows the resulting maximum amplitudes for the target (one stimulus) and non-target (eight stimuli) stimuli as box plots. The median maximum amplitude for the target stimulus exceeded that for the non-target stimuli in all participants, although higher non-target maximum amplitudes were observed at certain baseline points, which could lead to misclassification. This underscores the importance of identifying a stable EEG baseline voltage point within 0 to 1000 ms before the visual stimulus.
3.2. Feature Analysis of the Time Range for Determining Maximum Amplitude in the Single-Time-Point Baseline Correction for P300 Peak Detection
As shown in Figure 5a, the P300 peak did not occur exactly 300 ms after visual stimulus onset; its latency varied between individuals. In addition, the P300 peak is approximately 100–200 ms wide, so an appropriately defined detection time range (P300 peak detection time range) is needed to capture individual differences in P300 peak position. We therefore analyzed the EEG data of the 22 participants using 10 time ranges centered on 300 ms post-stimulus, from 300 ± 10 ms to 300 ± 100 ms in 10 ms steps. For the averaged EEG data of the target (one stimulus) and non-target stimuli (eight stimuli combined), we evaluated both the maximum amplitude, used as the indicator for P300 peak detection, and its time position. The results are presented as box plots in Figure 5b,c. The median maximum amplitude (Figure 5b) increased from 300 ± 10 ms to around 300 ± 60 ms and stabilized from 300 ± 70 ms to 300 ± 100 ms. With 80 or more averaged trials, the time position of the maximum amplitude (Figure 5c) stabilized, particularly for the ranges from 300 ± 70 ms to 300 ± 100 ms, suggesting that 300 ± 70 ms is the most suitable of the 10 time ranges.
Another notable feature is that the maximum amplitude for the target stimulus tended to occur later than 300 ms, whereas that for the non-target stimuli tended to occur earlier than 300 ms. This characteristic can also serve as a useful indicator for P300 peak detection. Considering these two features (the 300 ± 70 ms time range and the tendency of the target maximum amplitude to occur after 300 ms), we estimate 300–370 ms to be the most effective time range for detecting the maximum amplitude of the target stimulus.
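The window analysis of this section can be sketched with two small helpers, assuming the same 1 ms sampling and -1000 ms epoch start as the study's epochs (the helper names are illustrative, not from the study): `peak_in_window` returns the maximum amplitude and its latency inside a post-stimulus range, and `sweep_windows` applies it to the ten symmetric ranges 300 ± 10 ms to 300 ± 100 ms. The asymmetric 300–370 ms range is simply `peak_in_window(avg, 300, 370)`.

```python
def peak_in_window(avg, lo_ms, hi_ms, pre_ms=1000):
    """Maximum amplitude and its latency (ms after onset) within [lo_ms, hi_ms].

    avg holds one sample per ms starting pre_ms ms before onset. On ties the
    earliest sample wins, because max() returns the first maximal element.
    """
    seg = avg[pre_ms + lo_ms: pre_ms + hi_ms + 1]
    k = max(range(len(seg)), key=seg.__getitem__)
    return seg[k], lo_ms + k


def sweep_windows(avg, centre_ms=300, half_widths=range(10, 101, 10), pre_ms=1000):
    """Peak amplitude/latency for each symmetric window centre_ms +/- h."""
    return {h: peak_in_window(avg, centre_ms - h, centre_ms + h, pre_ms) for h in half_widths}


# Toy averaged waveform: a 4 uV peak at +340 ms and a 3 uV bump at +250 ms.
wave = [0.0] * 1501
wave[1000 + 340] = 4.0
wave[1000 + 250] = 3.0
for h, (amp, lat) in sweep_windows(wave).items():
    print(f"300 +/- {h:3d} ms -> {amp:.1f} uV at {lat} ms")
print("300-370 ms ->", peak_in_window(wave, 300, 370))
```

Narrow windows miss the late peak entirely, while the widest windows risk catching early non-target-like bumps; reporting latency alongside amplitude makes the "target later than 300 ms" tendency usable as a second feature.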
3.3. Detailed Analysis of P300 Peak Detection in the 300–370 ms Time Range for Single-Time-Point Baseline Correction
The previous section suggested that detecting the P300 peak as the maximum amplitude within 300–370 ms after the visual stimulus is the most effective approach. In this section, therefore, we performed single-time-point baseline correction at each 1 ms point from 0 to 1000 ms before the visual stimulus for each of the 22 participants and calculated the maximum amplitude within 300–370 ms post-stimulus for each baseline-corrected, averaged EEG dataset. A detailed statistical analysis of the maximum amplitudes for the target (one stimulus) and non-target stimuli (eight stimuli) was then conducted to identify the time points within the 0–1000 ms pre-stimulus range that provide a stable EEG baseline voltage.
First, Figure 6a shows, for each baseline time point (horizontal axis), the ratio of the maximum amplitude for the non-target stimuli to that for the target stimulus (set to 100%), which makes the influence of the non-target stimuli easy to observe. The upper section of the figure displays the results for 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 averaged trials, and the lower section shows the correct/incorrect classification at each baseline time point together with the accuracy rates. As in Section 3.1, results are presented for three representative participants: S5, S10, and S13. The findings agree with the trends noted in Section 3.1; however, a key observation is that with 90 or more averaged trials, all three participants showed lower non-target maximum amplitude ratios relative to the target within the 0–200 ms pre-stimulus range, indicating a stable EEG baseline voltage there.
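The ratio plotted in Figure 6a can be computed as in the sketch below. It rests on two assumptions not stated in the text: the non-target value is the largest of the eight non-target maxima, and a non-positive target maximum is reported as infinity (no valid P300). Values below 100% mean the target maximum exceeds every non-target maximum, i.e. a correct judgment.

```python
import math


def nontarget_ratio_percent(avg_by_stim, target, b_ms, win=(300, 370), pre_ms=1000):
    """Largest non-target maximum as a percentage of the target maximum.

    Each averaged waveform is corrected at the single baseline point b_ms before
    onset, then its maximum within the post-stimulus window win (ms) is taken.
    """
    maxima = {
        s: max(a[pre_ms + win[0]: pre_ms + win[1] + 1]) - a[pre_ms - b_ms]
        for s, a in avg_by_stim.items()
    }
    if maxima[target] <= 0:
        return math.inf  # assumption: no positive target peak -> undefined ratio
    worst = max(v for s, v in maxima.items() if s != target)
    return 100.0 * worst / maxima[target]


tgt = [0.0] * 1501
tgt[1320] = 5.0
non = [0.0] * 1501
non[1310] = 4.0
curve = [nontarget_ratio_percent({1: tgt, 2: non}, 1, b) for b in range(1001)]
print(curve[100])  # -> 80.0 (flat pre-stimulus data, so every baseline point gives 80%)
```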
Next, Figure 6b,c present statistical results across all 22 participants, showing the ratio of the non-target to the target maximum amplitude with the target amplitude set to 100%. Figure 6b displays the mean and standard deviation, and Figure 6c shows the mean together with the maximum and minimum values as shaded areas. In both figures, the non-target maximum amplitude ratio is lower within three pre-stimulus ranges: “0–200 ms”, “400–600 ms”, and “800–900 ms”. Because stimuli were presented every 400 ms, each of these ranges occupies the same position in the stimulation cycle as the 0–200 ms range, that is, the period shortly before an earlier stimulus onset.
This suggests that using a stable EEG baseline voltage, such as one derived from the 0–200 ms range, rather than relying on a single-time-point baseline could enable more effective baseline correction. Using this stable range as the baseline is therefore likely to yield higher P300 peak detection accuracy.
3.4. Summary of Single-Time-Point Baseline Correction for P300 Peak Detection
We demonstrated that using the 0–200 ms pre-stimulus period as a relatively stable EEG baseline for baseline correction, combined with searching for the maximum P300 peak amplitude within 300–370 ms after stimulus onset, enables effective discrimination between target and non-target stimuli by comparing the maximum amplitudes of their averaged EEG data. This approach improves P300 detection accuracy and provides key insights for further improvements.
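The combination summarized above can be written as a short decision rule. In this sketch, which assumes averaged waveforms sampled at 1 ms from -1000 ms and a hypothetical `avg_by_stim` mapping, each waveform is referenced to its mean over 0–200 ms before onset, and the stimulus with the largest maximum in the 300–370 ms window is taken as the attended one. The study judged correctness by comparing target and non-target maxima; the argmax rule here is an illustrative way to turn that comparison into a selection.

```python
def detect_target(avg_by_stim, far_ms=200, near_ms=0, win=(300, 370), pre_ms=1000):
    """Pick the stimulus with the largest baseline-corrected peak in win (ms).

    Baseline: mean of the averaged waveform from far_ms to near_ms before onset
    (time-range-averaged correction; 0-200 ms by default).
    """
    def score(avg):
        base = avg[pre_ms - far_ms: pre_ms - near_ms + 1]
        ref = sum(base) / len(base)
        return max(avg[pre_ms + win[0]: pre_ms + win[1] + 1]) - ref

    return max(avg_by_stim, key=lambda s: score(avg_by_stim[s]))


# Nine hypothetical averaged waveforms. Stimulus 3 has a 5 uV peak at +330 ms.
# Stimulus 7 has a larger raw peak (6 uV) but sits on a +3 uV offset during
# the 0-200 ms baseline, so after correction its peak is only 3 uV.
waves = {s: [0.0] * 1501 for s in range(1, 10)}
waves[3][1330] = 5.0
waves[7][1330] = 6.0
for i in range(800, 1001):
    waves[7][i] = 3.0
print(detect_target(waves))
```

Moving the baseline window away from the offset (for example to 900–1000 ms before onset) would make stimulus 7 win instead, which is the kind of baseline dependence analyzed in this section.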
6. Discussion
Detailed analyses were performed for two conventional methods: single-time-point baseline correction and time-range-averaged baseline correction. The results show, first, that single-time-point baseline correction is unstable because the amplitude of the fundamental rhythm at the baseline point depends on the chosen time point, that is, on the rhythm's phase at that point. Second, time-range-averaged baseline correction uses a single averaged amplitude within a specified time range; if waveforms with periods longer than the averaging window are present, it may fail to estimate the waveform’s center axis accurately, resulting in an inappropriate baseline. We therefore proposed a multi-time-point baseline correction approach that comprehensively evaluates waveform characteristics within a specified time range. This method achieved performance comparable to that of time-range-averaged baseline correction. Moreover, because the proposed method can be regarded as a multivariate analysis of waveform characteristics within a time range, future improvements in evaluating waveforms within specific time windows may further improve its performance relative to traditional average-amplitude baseline correction.
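The failure mode described for time-range-averaged correction can be checked numerically. The sketch below, an illustration with assumed values rather than data from the study, computes over all phases the worst-case mean of a 50 µV sinusoid across the 0–200 ms pre-stimulus window. For a 10 Hz rhythm the window spans whole periods and the mean stays below 1 µV; for a hypothetical 2 Hz oscillation (500 ms period, longer than the window) the mean can exceed 30 µV, shifting the baseline far from the waveform's center axis.

```python
import math


def window_mean(freq_hz, phase, amp_uv=50.0, far_ms=200, near_ms=0):
    """Mean of amp_uv*sin(2*pi*f*t + phase) sampled every 1 ms from -far_ms to -near_ms."""
    ts = range(-far_ms, -near_ms + 1)
    total = sum(amp_uv * math.sin(2.0 * math.pi * freq_hz * t / 1000.0 + phase) for t in ts)
    return total / len(ts)


def worst_case_offset(freq_hz, n_phases=360, **kwargs):
    """Largest |baseline mean| over evenly spaced phases, i.e. the worst baseline error."""
    return max(
        abs(window_mean(freq_hz, 2.0 * math.pi * k / n_phases, **kwargs))
        for k in range(n_phases)
    )


for f in (10.0, 5.0, 2.0):
    print(f"{f:4.1f} Hz: worst-case 0-200 ms baseline offset {worst_case_offset(f):5.1f} uV")
```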
However, a major drawback of the multi-time-point baseline approach is its higher computational cost. In this study, single-time-point correction, averaging, and P300 peak detection were performed at every 1 ms point from 200 ms to 0 ms before stimulus onset, so its computational cost was expected to exceed those of conventional single-time-point and time-range-averaged baseline correction. To quantify these costs, we conducted additional experiments comparing the computation times of the three baseline methods. Because PC conditions can influence computation time, each P300 analysis was performed ten times.
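A sketch of the multi-time-point procedure follows. This section states only that provisional judgments from single-time-point corrections at every 1 ms point between 200 ms and 0 ms before onset are evaluated comprehensively; the plurality vote used here to combine them is an assumption made for illustration, as are the helper names and data layout. The loop also shows where the extra cost comes from: one full detection per baseline point, about 200 detections instead of one.

```python
from collections import Counter


def _peak_after_point_baseline(avg, b_ms, win, pre_ms):
    """Peak in win (ms after onset) after single-time-point correction at b_ms before onset."""
    return max(avg[pre_ms + win[0]: pre_ms + win[1] + 1]) - avg[pre_ms - b_ms]


def multi_time_point_detect(avg_by_stim, far_ms=200, near_ms=0, win=(300, 370), pre_ms=1000):
    """Provisional detection at every baseline point, then a plurality vote (assumed rule).

    Returns (winning stimulus, fraction of baseline points that voted for it).
    """
    votes = Counter()
    for b in range(near_ms, far_ms + 1):
        provisional = max(
            avg_by_stim,
            key=lambda s, b=b: _peak_after_point_baseline(avg_by_stim[s], b, win, pre_ms),
        )
        votes[provisional] += 1
    winner, count = votes.most_common(1)[0]
    return winner, count / sum(votes.values())


# Stimulus 4 carries a 5 uV peak at +320 ms; stimulus 6 has a brief -10 uV dip
# 50 ms before onset, which fools the single-point baseline at that one point only.
waves = {s: [0.0] * 1501 for s in range(1, 10)}
waves[4][1320] = 5.0
waves[6][950] = -10.0
print(multi_time_point_detect(waves))
```

A single-time-point correction at 50 ms would select stimulus 6 here, whereas the vote absorbs the isolated faulty point. Other aggregation rules, such as summing per-point score margins, would fit the same loop.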
Table 1 presents the computational costs of P300 peak detection with each of the three baseline methods as a function of the number of averaged epochs. Defining a single trial as the analysis of the EEG data for P300 detection following the visual presentation of one target stimulus and eight non-target stimuli, the computation time per trial was approximately 0.85 ms for single-time-point baseline correction, 1.00 ms for time-range-averaged baseline correction, and 150.0 ms for multi-time-point baseline correction. Multi-time-point baseline correction thus entails a substantially greater computational cost. Nevertheless, because each trial lasts 3.6 s (nine stimuli each presented sequentially for 400 ms), the 150 ms multi-time-point baseline correction can still be executed concurrently with stimulus presentation. It is therefore feasible to develop a real-time analysis algorithm that outputs the final trial result approximately 150 ms after the last stimulus.
In conclusion, real-time P300 detection remains achievable even with detailed waveform analysis within specific time windows, which highlights the importance of improving processing within these windows beyond simple averaging approaches such as traditional time-range-averaged baseline correction.
This study focuses on P300 interfaces for welfare applications. The primary motivation is that, for individuals with complete quadriplegia (such as patients with locked-in syndrome), who cannot move any motor organs and can receive information only through sensory inputs processed in the brain, BCIs represent the only viable means of daily-life support [32]. Among BCIs, non-invasive BCIs that require no surgery have advanced significantly, and P300 interfaces, which use the P300 event-related potential measured by EEG, are among the most practically realized technologies to date [33]. The P300 speller is a representative system based on eliciting P300 event-related potentials with visual stimuli. In its standard configuration, multiple characters are displayed on a screen and flash one at a time in random order at 400 ms intervals. By focusing their gaze on a specific target stimulus, users generate a P300 response that can be analyzed to identify the intended selection [17]. Previous studies have not only employed simple character stimuli but also explored various visual designs to enhance practical usability [34]. Other studies have compared P300 detection performance across variations in flash intensity, color, shape, and duration [35,36,37,38]. Furthermore, efforts have been made to improve the efficiency of the P300 signal analysis itself by modifying the baseline correction and signal averaging procedures traditionally based on stimulus timing. Specifically, methods that omit baseline correction and averaging in favor of frequency-domain analysis [25], as well as machine learning-based approaches for direct P300 peak detection [39], have been proposed.
However, in frequency-domain approaches, P300 peaks are difficult to identify accurately because variations in the EEG baseline reference can cause low-frequency components to be misinterpreted as signal peaks. Similarly, machine learning-based methods have been reported to perform better when applied after baseline correction [40], underscoring the importance of identifying an appropriate “EEG baseline reference” for reliable P300 detection. Moreover, in welfare applications, EEG must often be measured in office environments rather than in the shielded rooms typically used in neuroscience laboratories, which makes the measurements more susceptible to commercial power line noise. Establishing a baseline reference suitable for real-world measurement environments is therefore crucial [41].
Against this background, the present study aims to clarify the relationship between the continuous visual stimuli used in conventional P300 spellers and the EEG baseline reference. One distinctive feature of the P300 speller is that more than 80 visual stimulus flashes are needed to determine a single selection. To reduce user fatigue and maintain concentration, the flash interval is shortened to 200–400 ms, which is shorter than the intervals generally used in conventional neuroscience studies. This raises concerns about the stability of EEG signals under such continuous stimulation. Previous research has typically used either the amplitude at a single time point (100 ms or 200 ms before stimulus onset) or the average amplitude over the window from 0 to 200 ms before stimulus onset as the baseline reference. However, no study has quantitatively validated the effectiveness of these baseline correction methods under continuous stimulation. In this study, therefore, experiments were conducted with 21 participants using nine visual stimuli that flashed in random order at 400 ms intervals, and P300 detection performance was compared across the following three baseline correction methods:
A method based on the amplitude at each individual time point;
A method based on the average amplitude over specific time windows;
A novel method proposed in this study, which individually corrects the baseline at every time point from 0 to 1000 ms before stimulus onset and then evaluates the results comprehensively.
The experimental results revealed that baseline correction based solely on a single-point amplitude is highly susceptible to fluctuations in basic EEG rhythms, leading to unstable P300 detection performance. In contrast, for the method based on the average amplitude over a time window, the average amplitude between 0 and 200 ms before stimulus onset provided the most stable baseline reference and yielded high P300 detection performance. However, the stability of this baseline reference deteriorated when artifacts with periods longer than 200 ms were present.
Moreover, the proposed method of multi-time-point baseline correction with comprehensive evaluation achieved comparably high P300 detection performance when focusing on the 0–200 ms pre-stimulus window. These results suggest that, under continuous visual stimulation at 400 ms intervals, baseline correction using the average amplitude over the 0–200 ms pre-stimulus period is most appropriate. Baseline correction based on a single-point amplitude is considered unsuitable because fluctuations in basic EEG rhythms increase the risk of erroneous P300 detection. Future work should assess the waveform characteristics in the 0–200 ms pre-stimulus window beforehand. If only basic rhythms are present, calculating the central axis of the waveform may be appropriate. If artifacts with longer periods are detected, it may be preferable to either expand the time window or use the baseline reference obtained from the immediately preceding stimulus presentation. Nonetheless, because the optimal time window for evaluating a stable EEG baseline may vary with the visual stimulation interval, further detailed analyses of the relationship between the EEG baseline and P300 peaks under different continuous stimulation conditions are necessary.
7. Conclusions
In this study, we analyzed the performance of baseline correction methods for P300 analysis in order to improve the detection accuracy of EEG-based P300 event-related potentials under continuous sensory stimulation at 400 ms intervals, as used in welfare applications. In the experiment, nine numbers arranged on a screen were illuminated one at a time, in random order, at 400 ms intervals, and participants were asked to attend to one number, which was presented 100 times. EEG data from 1000 ms before to 500 ms after each visual stimulus were extracted and classified, and three baseline correction methods (single-time-point, time-range-averaged, and multi-time-point baseline correction) were applied. Each method was characterized by its P300 peak detection accuracy after averaging. The results show that, for all three baseline correction methods, the P300 peak detection rate was highest when EEG data from 0 ms to 200 ms before visual stimulation were used as the baseline. We also found that the maximum amplitude for the target stimulus statistically occurred between 300 ms and 370 ms, whereas that for the non-target stimuli appeared before 300 ms. Furthermore, we confirmed that single-time-point baseline correction yields variable detection accuracy because different baseline points sample the basic rhythm at different phases, and that the position of the maximum P300 peak after averaging tends to fluctuate. Because time-range-averaged baseline correction averages over a specified time range, it estimates the center voltage of the fundamental rhythm and stabilizes the position of the maximum P300 peak after averaging; this center voltage can serve as an appropriate EEG reference potential.
However, although time-range-averaged baseline correction is effective for basic rhythms in the 0–200 ms range, it does not provide an appropriate EEG reference potential when oscillations with periods longer than 200 ms are present, which reduces P300 peak detection accuracy. Finally, we confirmed that the newly proposed multi-time-point baseline correction method, which evaluates 200 provisional P300 peak estimates obtained by applying single-time-point baseline correction at 200 time points from 0 ms to 200 ms before the visual stimulation, performs similarly to time-range-averaged baseline correction. Because this method evaluates the amplitude characteristics within a specified time range using more variables than a single average value, future refinements could lead to more accurate calculation of EEG reference potentials.