Article

Multi-Scale Feature Extraction to Improve P300 Detection in Brain–Computer Interfaces

1 International Ph.D. Program in Innovative Technology of Biomedical Engineering and Medical Devices, Ming Chi University of Technology, New Taipei City 243, Taiwan
2 Department of Electronic Engineering, National Taipei University of Technology, Taipei City 10608, Taiwan
* Author to whom correspondence should be addressed.
Electronics 2025, 14(3), 447; https://doi.org/10.3390/electronics14030447
Submission received: 16 December 2024 / Revised: 15 January 2025 / Accepted: 22 January 2025 / Published: 23 January 2025
(This article belongs to the Section Computer Science & Engineering)

Abstract

P300 detection is a difficult task in brain–computer interface (BCI) systems due to the low signal-to-noise ratio (SNR). In BCI systems, P300 waves are generated in electroencephalogram (EEG) signals using various oddball paradigms. Convolutional neural networks (CNNs) have previously shown excellent results for P300 detection compared to different machine learning models. However, current CNN architectures limit P300 detection accuracy because these models usually only extract single-scale features. Aiming to enhance P300 detection accuracy, an inception module-based CNN architecture, namely Inception-CNN, is introduced. Inception-CNN effectively learns discriminative features from both spatial and temporal information to reduce overfitting and computational complexity. Furthermore, it can extract multi-scale features, which effectively improves P300 detection accuracy and increases character spelling accuracy. To analyze the effect of the inception layer, two additional models are proposed: Inception-CNN-S, which uses the inception layer with a spatial convolution layer, and Inception-CNN-T, which uses the inception layer with a temporal convolution layer. The proposed model was evaluated on dataset II of BCI Competition III and dataset IIb of BCI Competition II. The experimental results show that Inception-CNN provides a promising solution for improving the accuracy of P300 detection, with F1 scores of 47.14%, 55.28%, and 78.94% for dataset II of BCI Competition III (Subject A and Subject B) and dataset IIb of BCI Competition II, respectively.

1. Introduction

Brain–computer interface (BCI) systems are communication and control systems that enable real-time interaction between the human brain and external devices [1]. The user’s intention is reflected in the electroencephalogram (EEG) signal, which the BCI system converts into the desired output form. As a result, people can communicate with the external world independently of the brain’s normal output pathways of peripheral nerves and muscles [2,3,4]. BCI technology was originally developed for biomedical applications. As the global population ages, the number of people with disabilities has increased due to various accidents and diseases, especially severe neuromuscular damage and neurodegenerative diseases such as amyotrophic lateral sclerosis (ALS) [5,6]. BCI systems are used to help these individuals overcome the resulting communication and control limitations.
The P300 is one of the widely used event-related potential (ERP) signals and was first discovered by Sutton in 1965 [7]. The P300 wave is a positive deflection in ERP that occurs upon the recognition of a rare stimulus within a series of frequent stimuli. According to previous research, the BCI system based on the P300 speller is a real-time communicative system and performs outstandingly well in character spelling applications. In general use, this system must have high character recognition accuracy. Through multi-modal integration, traditional machine learning models, e.g., Support Vector Machine (SVM) classification algorithms, have been widely used to improve the performance of P300 classification in the commonly used BCI application of the P300 speller [8,9,10,11,12,13]. Some methods based on linear discriminant analysis (LDA) [14,15], Fisher’s linear discriminant (FLD) [16], group-sparse Bayesian linear discriminant analysis (gsBLDA) [17], and the Gradient Boosting Algorithm [18] are somewhat complex in operation, and their training times also increase significantly. An ensemble of a weighted artificial neural network (EWANN) based on a sparse autoencoder (SAE) and a stacked sparse autoencoder (SSAE) has been proposed for P300 detection to overcome the variation among different classifiers [19].
In recent years, deep learning (DL) has become a research hotspot in the field of image processing and pattern recognition. In the field of EEG analysis, deep learning architectures can automatically extract features of EEG signals, demonstrating a strong ability to express data features [20]. Lawhern et al. used a compact CNN (EEGNet) for the classification of four types of EEG data, including P300 [21]. To overcome overfitting problems in CNN models for P300 detection, a batch normalization (BN) layer and dropout (DP) layer were used [22]. Furthermore, combining CNNs with feature selection methods, such as Fisher ratio-based selection, followed by classification using Extreme Support Vector Machines (ESVMs), has shown promising results [23]. Recent advancements in P300 detection have yielded significant progress. Notably, the P300MCNN model optimizes performance by incorporating precise separable convolutions, adaptive activation functions, and tailored learning rates. Similarly, the WE-SPSQ-CNN model enhances classification accuracy and the signal-to-noise ratio by employing a weighted ensemble of spatio-sequential architectures [24,25].
CNNs provide an intuitive and easily understood method for handling the spatial relationship of natural space so that filtering and classification can be combined in one framework. Previous methods for the classification of P300 signals using CNNs typically employed fixed-size filters to extract single-scale features [21,22,23]. However, since the amplitude and latency of P300 signals vary across individuals and trials, these filters may fail to capture all relevant information. To address this limitation, we propose a method for P300 classification that utilizes inception layers within a CNN, enabling multi-scale feature extraction. The basic structure of the CNN for P300 classification in previous techniques includes batch normalization, a convolutional layer for spatial filtering, a convolutional layer for temporal filtering, and a dropout layer [22,23]. In previous studies, inception-based CNN models were employed for P300 detection, utilizing consecutive inception layers followed by convolution and classification layers [26], while IENet expanded on this concept with an ensemble of InceptionEEG models featuring multi-scale 1D convolutions to extract diverse EEG features and enhance adaptability across BCI paradigms [27]. However, our method incorporates an inception layer before each convolutional layer, providing a distinct mechanism for capturing diverse and rich features.
The motivation behind our approach stems from the need to improve the adaptability of P300 classification across diverse datasets and individuals. Variations in signal characteristics require a more flexible feature extraction framework, which traditional CNN architectures struggle to provide. By introducing inception layers before each convolutional layer, our method captures both fine-grained and high-level features at multiple scales. Our contributions include the following: (1) We present a CNN architecture that integrates inception layers before both spatial and temporal convolutional layers, thereby enhancing the ability to extract multi-scale features effectively. (2) The inception layers allow for the simultaneous extraction of fine and coarse details, leading to better adaptability to variations in EEG signals across individuals and sessions. (3) We conduct a thorough comparative evaluation with existing state-of-the-art methods, demonstrating significant improvements in classification accuracy and robustness. (4) We present the impact of the inception layers on both the spatial and temporal convolutional stages, highlighting their individual and combined contributions to overall model performance. The advantages of our approach lie in its improved adaptability to signal variability, efficient feature extraction, and better classification performance. The inception layer, originally part of the GoogLeNet architecture, facilitates efficient feature extraction and dimensionality reduction for image recognition tasks with CNNs [28]. It enhances the discriminative power of CNNs for the P300 detection task, improving both classification and character recognition accuracy. Our findings indicate that utilizing the inception layer before each spatial and temporal convolutional layer significantly enhances the CNN’s ability to detect P300 waves over existing methods. These improvements demonstrate the potential of our approach to advance BCI systems.
This paper is organized as follows. Section 2 provides an overview of the P300 wave, the oddball paradigm, and the associated classification challenges. Section 3 details the dataset utilized, the preprocessing steps applied, and the proposed CNN architecture. Section 4 presents the experimental results, Section 5 discusses them in relation to prior work, Section 6 concludes the paper, and Section 7 outlines limitations and future directions.

2. Background

2.1. P300 Speller

P300 is an ERP recorded in the EEG signal that responds to external events or stimuli with a relatively low probability of occurrence [29]. As the name suggests, P300 refers to a positive peak that appears approximately 300 milliseconds after the stimulus event occurs in the upper area of the brain, as shown in Figure 1.
Currently, many character spelling systems are based on the P300 signal. Among these systems, the most widely used and successful strategy is the row-column paradigm, which was designed and later improved by Farwell and Donchin [30,31]. In this paradigm, the subject’s task is to focus on a given character in the matrix shown in Figure 2. The character matrix then randomly flashes all characters in a row or column at a frequency of 5.7 Hz. When the subject focuses on the row or column containing the target character as it flashes, the brain cortex produces a P300 potential. Ideally, without interference, P300 should not be induced when other rows or columns flash. In the 6 × 6 matrix, only a specific row and column contain the character to be focused on, so the probability of P300 being induced by the system is 1/6, conforming to the characteristics of a low-probability stimulus.

2.2. Classification Problems

The classification problem in BCI systems based on P300 involves two types of problems: P300 detection in EEG signals and character recognition performed by the subject.

2.2.1. P300 Detection

P300 detection is a binary classification problem with two categories: one indicating the presence of P300 and the other indicating non-P300. Accurately detecting P300 can be challenging due to possible interference that may prevent the subject from producing P300 at the expected moment. Furthermore, the low signal-to-noise ratio (SNR) of the ERP significantly impacts the accuracy of P300 detection.

2.2.2. Character Recognition

The main task of the P300 speller-based BCI is to identify the specific character the subject is focusing on, which is determined by combining the results of P300 detection. In the row-column paradigm of the character matrix, each character is defined by a pair of coordinates (x, y). Under ideal conditions, only 12 flashes (six rows and six columns) are sufficient to recognize the target character. However, EEG signals are often contaminated by various types of noise, so each character experiment must be repeated several times to improve the SNR and enhance the accuracy of P300 detection. The row and column with the highest cumulative probability of P300 across the horizontal and vertical flashes, respectively, are identified from the detection results, and the character at their intersection is taken as the one the subject is looking at. Since there are 36 characters in the stimulus paradigm, character recognition is modeled as a 36-class problem.

3. Experimental Data and Setup

3.1. Data

This study utilized dataset II of BCI Competition III (Subject A and Subject B) [32] and dataset IIb of BCI Competition II [33], abbreviated as BCI IIIA, BCI IIIB, and BCI II for convenience. The datasets were provided by the Wadsworth Center of the New York State (NYS) Department of Health. In this experiment, subjects were presented with a 6 × 6 matrix (see Figure 2) containing 36 characters: [A–Z], [1–9], and [_]. The subjects’ task was to focus on the characters of a predefined word in a fixed order. The six rows and six columns of this matrix were flashed randomly at 5.7 Hz. The character to be spelled was defined by its row and column. Therefore, 2 of the 12 flashing rows/columns contained the target character, i.e., 2 of the 12 flashes were expected to produce a P300 response. During the experiment, the matrix remained blank for a period of 2.5 s. Then, a row or column in the matrix was randomly intensified for 100 ms. After the flashing of the row/column, the whole matrix was blank for 75 ms. A set of 12 intensifications, in which every row and column was intensified once, constituted one repetition; this set was repeated 15 times for each character. So, in a single-character experiment, 30 possible P300 responses were expected to be detected.
In this experiment, signals were collected from 64 different channels according to the international 10–20 electrode specification. EEG signals were band-pass filtered at 0.1–60 Hz and digitized at 240 Hz [20]. For BCI IIIA and BCI IIIB, the training dataset contained 85 characters, and the test dataset contained 100 characters. In the training dataset, the positive samples (containing the P300 signal) were 85 × 2 × 15 = 2550, and the negative samples (excluding the P300 signal) were 85 × 10 × 15 = 12,750. In the test set, positive and negative samples numbered 3000 and 15,000, respectively.
For BCI II, the training dataset contained 42 characters, and the test dataset contained 31 characters. The positive samples (containing the P300 signal) in the training set were 42 × 2 × 15 = 1260 , and the negative samples (excluding the P300 signal) were 42 × 10 × 15 = 6300 . Since the ratio of positive to negative samples in the training set was 1:5, an imbalanced dataset would cause the model to predict most samples as negative samples. To solve this problem, all positive samples of the training set were duplicated four times to achieve a balance of positive and negative samples, as shown in Table 1.
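To make the balancing step concrete, the minimal sketch below (our own illustration; the function and array names are assumptions, not the authors' code) duplicates the positive epochs four extra times and reshuffles the training set. With the 1260 positive and 6300 negative BCI II training epochs, this yields 6300 positives, matching Table 1.

```python
import numpy as np

def balance_by_duplication(X, y, n_copies=4):
    """Duplicate positive (P300) epochs n_copies extra times so that the
    original 1:5 positive-to-negative ratio becomes roughly 1:1.

    X: (n_trials, n_samples, n_channels) EEG epochs; y: 0/1 labels.
    """
    pos = X[y == 1]
    X_bal = np.concatenate([X] + [pos] * n_copies, axis=0)
    y_bal = np.concatenate([y, np.ones(len(pos) * n_copies, dtype=y.dtype)])
    # Shuffle so the duplicated epochs are not grouped together during training.
    idx = np.random.permutation(len(y_bal))
    return X_bal[idx], y_bal[idx]
```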

3.2. Data Preprocessing

To remove the irrelevant parts of the EEG signal and improve the SNR, it was necessary to preprocess the collected EEG data. EEG signals typically contain noise from different frequency bands, such as power-line interference and random noise, whereas the P300 appears only in a specific range of frequencies. Therefore, the EEG data were passed through a low-pass or band-pass filter to preserve the active P300 frequency components and remove noise from unrelated frequency bands. In this paper, a 10th-order FIR band-pass filter with a 0.1–20 Hz passband was used to filter the signal.
To ensure the accurate detection of the P300 potential, known to have a typical latency of around 300 ms after stimulation, a specific time window of 0–600 ms was employed for trimming the data. The EEG signal was digitized and downsampled into the form of a matrix. Let $X_{i,j}$ denote a signal sample, where $0 \le i < N_t$ and $0 \le j < N_{elec}$. Here, $N_{elec}$ is the number of electrodes (i.e., 64), and $N_t = T \times F_S$ is the number of temporal samples in a window of duration $T$ at sampling frequency $F_S$. The number of sample points per channel was 144 ($0.600 \times 240 = 144$). The signal was then downsampled by half to reduce the dimensionality of the data, resulting in a signal sampled at 120 Hz with 72 samples per channel ($0.600 \times 120 = 72$).
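As an illustration of this pipeline, a minimal preprocessing sketch in Python/SciPy is given below; the function name, the FIR design call, and the epoching details are our assumptions rather than the original implementation:

```python
import numpy as np
from scipy.signal import firwin, filtfilt

FS = 240          # original sampling rate (Hz)
WINDOW_S = 0.600  # post-stimulus window length (s)

def preprocess_epoch(eeg, stim_onset):
    """Band-pass filter (0.1-20 Hz), cut a 0-600 ms window, and downsample to 120 Hz.

    eeg: (n_samples, 64) continuous recording; stim_onset: sample index of the flash.
    Returns an array of shape (72, 64).
    """
    b = firwin(numtaps=11, cutoff=[0.1, 20.0], pass_zero=False, fs=FS)  # 10th-order FIR band-pass
    filtered = filtfilt(b, [1.0], eeg, axis=0)
    epoch = filtered[stim_onset: stim_onset + int(WINDOW_S * FS)]       # 144 samples per channel
    return epoch[::2]                                                    # keep every 2nd sample -> 72
```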

3.3. Inception Module

Inception, introduced as part of the GoogLeNet architecture, is a CNN classification model proposed by Google in 2014 [28]. The inception module is the core component of the GoogLeNet architecture. The main objective of the inception module is to reduce overfitting problems and computational complexity. In GoogLeNet, a total of nine inception modules are stacked. There are two versions of the inception module, as described below.

3.3.1. Inception Module: Naïve Version

The basic structure of the naïve version of the inception module consists of parallel 1 × 1, 3 × 3, and 5 × 5 convolutional layers, along with a 3 × 3 max-pooling layer [28]. In the final layer, the outputs from each convolutional layer are combined. The inception module utilizes convolution kernels of different scales simultaneously to extract more comprehensive features.

3.3.2. Inception Module: Improved Version

A key challenge addressed by the improved inception module is the problem of excessive convolution, which significantly increases computational demands. Deep neural networks inherently require substantial computing resources. To mitigate the computational burden and improve the efficiency of the model, an additional 1 × 1 convolutional layer is strategically inserted before the subsequent 3 × 3 and 5 × 5 convolutional layers. This preliminary 1 × 1 convolutional layer effectively reduces the number of input channels, thus reducing the computational overhead [26]. Furthermore, another 1 × 1 convolutional layer is added after the 3 × 3 max-pooling layer, which uses a stride of 1 × 1. This approach contributes to the network’s capacity to capture essential features and patterns in the data while preserving spatial resolution. The combined architecture aims to strike a balance between computational efficiency and feature representation in deep neural networks. A detailed diagram of the improved inception module is shown in Figure 3.
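A Keras-style sketch of such an improved inception module is given below, assuming 2D convolutions with "same" padding and ReLU activations; the helper name and argument layout are our own and are not taken from [28]:

```python
from tensorflow.keras import layers

def inception_block(x, f1x1, f3x3_reduce, f3x3, f5x5_reduce, f5x5, f_pool):
    """Improved inception module: 1x1 reductions before the 3x3 and 5x5
    convolutions, and a 1x1 convolution after 3x3 max pooling with stride 1."""
    b1 = layers.Conv2D(f1x1, (1, 1), padding="same", activation="relu")(x)

    b2 = layers.Conv2D(f3x3_reduce, (1, 1), padding="same", activation="relu")(x)
    b2 = layers.Conv2D(f3x3, (3, 3), padding="same", activation="relu")(b2)

    b3 = layers.Conv2D(f5x5_reduce, (1, 1), padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f5x5, (5, 5), padding="same", activation="relu")(b3)

    b4 = layers.MaxPooling2D((3, 3), strides=(1, 1), padding="same")(x)
    b4 = layers.Conv2D(f_pool, (1, 1), padding="same", activation="relu")(b4)

    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])
```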

3.4. Proposed Inception-CNN Model

Typical CNN architectures consist of convolutional layers for feature extraction, pooling layers to reduce feature dimensionality, and fully connected layers to perform classification. The convolutional layers identify important patterns, the pooling layers simplify the data representation, and the fully connected layers integrate these features for accurate predictions. In this research, our key idea is to employ the inception layer in a CNN architecture, similar to previous methods. The basic structure of these techniques includes a spatial convolutional layer, a temporal convolutional layer, batch normalization, and dropout. The proposed model, called Inception-CNN, utilizes an inception layer before each spatial and temporal convolutional layer. The Inception-CNN model is composed of seven layers, as shown in Figure 4. The details of each layer are described below.
$L_0$: This is the first layer and performs batch normalization on the input signal with a shape of $N_t \times N_{elec}$, which is given as
$$X_{i,j}^{BN} = \mathrm{BN}(X_{i,j})$$
where $0 \le i < N_t$ and $0 \le j < N_{elec}$. $N_t$ is the number of temporal samples, which equals 72, and $N_{elec}$ denotes the number of electrodes, which is 64.
$L_1$: The second layer is an inception layer consisting of four parallel layers: $L_{AS}$, $L_{BS}$, $L_{CS}$, and $L_{DS}$. $L_{AS}$ is a convolutional layer with a kernel size of 1 × 1, which is used to capture low-level features. $L_{BS}$ and $L_{CS}$ both start with a 1 × 1 convolutional layer applied to the input signal, followed by a 3 × 3 convolutional layer in $L_{BS}$ and a 5 × 5 convolutional layer in $L_{CS}$. These layers help capture patterns at different scales and are optimized to extract high-level features. $L_{DS}$ starts with a 3 × 3 max-pooling layer with a stride of 1 × 1 on the input signal, followed by a 1 × 1 convolutional layer. Finally, the outputs of all four layers are concatenated to form the output of the inception layer ($O_{inception}$):
$$O_{inception} = \mathrm{concatenate}(L_{AS}, L_{BS}, L_{CS}, L_{DS})$$
$L_2$: The third layer serves as a spatial filter. This layer is composed of 16 convolution kernels of size 1 × 64, with a stride of 1 and ReLU activation.
$L_3$: This layer is identical to $L_1$, except for the use of a different number of filters in $L_{AT}$, $L_{BT}$, $L_{CT}$, and $L_{DT}$.
$L_4$: This layer functions as a temporal filter and comprises 16 convolution kernels of size 36 × 1. Each kernel operates with a stride of 1 and uses ReLU activation. Additionally, an average pooling operation of size 2 × 1 is applied to further reduce the spatial dimensions and enhance feature extraction.
$L_5$: In this layer, flattening is performed to transform the tensor into a one-dimensional array, and then it is fed to a dense layer with 64 neurons. A dropout rate of 0.4 is applied to prevent overfitting [34].
$L_6$: The seventh layer is a fully connected (FC) output layer with a softmax activation function, containing two neurons representing the two classes: P300 and non-P300 signals. The softmax function computes the probabilities for these classes based on the outputs of the two neurons, denoted as $X_1$ and $X_0$, respectively. The predicted output is determined as follows:
$$O(X) = \begin{cases} 1, & X_1 > X_0 \\ 0, & \text{otherwise} \end{cases}$$
where $X$ represents an input signal and $O$ is the classifier. If $O(X) = 1$, the classification result is P300, and if $O(X) = 0$, the predicted label is non-P300.
A detailed overview of the Inception-CNN architecture is provided in Table 2. The increase in trainable parameters in our model is primarily due to the inclusion of two inception layers, which capture multi-scale features by applying multiple filter sizes within each layer. These inception layers use filter sizes of 1 × 1, 3 × 3, and 5 × 5, and combinations of these, allowing the model to capture fine-grained details. For instance, in $L_1$ (the first inception layer), the multiple filter combinations contribute 18,072 parameters. In $L_2$ (the spatial convolutional layer), the larger parameter count of 81,936 arises from applying the 16 kernels of size 1 × 64 across the 80 feature maps produced by the first inception layer. While this leads to a higher parameter count and increased computational complexity, it significantly improves the model’s ability to learn richer and more diverse representations [26]. Additionally, to analyze the effect of the inception layer when used separately with either the spatial or temporal convolutional layers, two separate configurations are proposed: Inception-CNN-S and Inception-CNN-T. Inception-CNN-S utilizes the inception layer before the spatial convolutional layer only, resulting in fewer parameters, as shown in Table 2. In contrast, Inception-CNN-T employs the inception layer before the temporal convolutional layer, which leads to a higher parameter count due to the complexity of temporal features. The architectures of Inception-CNN-S and Inception-CNN-T are depicted in Figure 5, where $L_3$ and $L_1$ are excluded in each configuration, respectively. The filter details utilized in both inception layers are provided in Table 3. To facilitate our experiments, we employed a pre-trained Transformer model, EEGPT, as a readily available tool for EEG data [35]. The model was used without further fine-tuning or modification, focusing solely on its direct application to our dataset.
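Under these assumptions, the layer stack $L_0$–$L_6$ can be sketched as follows, reusing the inception_block helper from Section 3.3.2 and the filter counts of Table 3; details not stated in the text (e.g., the ReLU activation on the dense layer) are our assumptions rather than the authors' released code:

```python
from tensorflow.keras import layers, models

def build_inception_cnn(n_samples=72, n_channels=64):
    """Sketch of the Inception-CNN (layers L0-L6) with the filter counts of Table 3."""
    inp = layers.Input(shape=(n_samples, n_channels, 1))
    x = layers.BatchNormalization()(inp)                          # L0
    x = inception_block(x, 24, 24, 16, 24, 24, 16)                # L1: 24+16+24+16 = 80 feature maps
    x = layers.Conv2D(16, (1, n_channels), activation="relu")(x)  # L2: spatial filter (1 x 64)
    x = inception_block(x, 24, 16, 8, 8, 8, 16)                   # L3: 24+8+8+16 = 56 feature maps
    x = layers.Conv2D(16, (36, 1), activation="relu")(x)          # L4: temporal filter (36 x 1)
    x = layers.AveragePooling2D(pool_size=(2, 1))(x)              #     average pooling 2 x 1
    x = layers.Flatten()(x)                                       # L5
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dropout(0.4)(x)
    out = layers.Dense(2, activation="softmax")(x)                # L6: P300 vs. non-P300
    return models.Model(inp, out)
```

Counting the parameters of this sketch reproduces the totals in Table 2 (e.g., 81,936 for $L_2$ and 18,496 for the dense layer).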

3.5. Training

First, the Keras Tuner was utilized to select the appropriate number of filters in the inception layers, the number of neurons in the dense layer, the dropout rate, the optimizer, and other hyperparameters [36]. Then, the model was trained using K-fold cross-validation with four splits. The model training was conducted using the Tesla K80 GPU provided by Google Colab. Google Colab offers a cloud-based platform with free access to GPUs, enabling efficient model training without requiring local high-end hardware. Additionally, it facilitates seamless integration with libraries like TensorFlow and Keras, streamlining the development and experimentation process. The Mean Square Error (MSE) was utilized as the loss function, and Root Mean Square Propagation (RMSProp) was used as the optimizer with a learning rate of $1 \times 10^{-4}$. The batch size used during training was set to 64. Ten percent of the training data was reserved as validation data. The pseudocode in Table 4 outlines the methods and procedures for EEG-based P300 detection and character recognition, including the training of the proposed CNN architectures.
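A minimal training sketch consistent with these settings is shown below; the number of training epochs, the use of stratified folds, and the use of each fold's held-out split for validation are our assumptions:

```python
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold

def train_kfold(X, y, n_splits=4):
    """Train one model per fold with MSE loss, RMSprop (lr = 1e-4), and batch size 64."""
    trained = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, val_idx in skf.split(X, y):
        model = build_inception_cnn()
        model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
                      loss="mse", metrics=["accuracy"])
        model.fit(X[train_idx], tf.keras.utils.to_categorical(y[train_idx], 2),
                  validation_data=(X[val_idx], tf.keras.utils.to_categorical(y[val_idx], 2)),
                  batch_size=64, epochs=50, verbose=0)
        trained.append(model)
    return trained
```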

4. Results

Our objective was to use the proposed model to develop an integrated P300 speller system and assess its performance in P300 detection and character recognition using appropriate evaluation standards.

4.1. P300 Detection

The following metrics are commonly used for P300 detection: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP denotes the number of correctly predicted positive samples, TN denotes the number of correctly predicted negative samples, FP denotes the number of negative samples incorrectly identified as positive, and FN denotes the number of positive samples incorrectly identified as negative. The recognition accuracy (Reco.) of P300 detection is defined as
$$Reco. = \frac{TP + TN}{TP + TN + FP + FN}$$
Moreover, some other widely used metrics to measure the quality of classification results, such as recall, precision, and F1 score, are given as
$$Recall = \frac{TP}{TP + FN}, \qquad Precision = \frac{TP}{TP + FP}$$
$$F1\ score = \frac{2 \times Recall \times Precision}{Recall + Precision}$$
Precision represents the proportion of true positive samples among the samples identified as positive (P300), while recall represents the proportion of actual positive samples that are correctly detected. Precision and recall are interconnected; an ideal detection algorithm yields high values for both. In practice, however, there is often a trade-off, with one being high while the other is low, making it difficult to optimize both at the same time. Therefore, the F1 score can be used as their combined measure. A higher F1 score indicates greater detection accuracy for both P300 and non-P300.
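For reference, these metrics can be computed directly from the confusion counts; the short example below reproduces the Inception-CNN figures reported for BCI II in Table 5:

```python
def detection_metrics(tp, tn, fp, fn):
    """Recognition accuracy, recall, precision, and F1 score from confusion counts."""
    reco = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    return reco, recall, precision, f1

# Inception-CNN on BCI II (Table 5): approximately (0.9256, 0.8366, 0.7474, 0.7895)
print(detection_metrics(778, 4387, 263, 152))
```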
Table 5 presents the results of P300 detection on the BCI IIIA, BCI IIIB, and BCI II datasets. The proposed Inception-CNN, Inception-CNN-S, and Inception-CNN-T models were compared with CNN-1, MCNN-1, MCNN-3, and BN3 [20,22]. The experimental results show that Inception-CNN achieved the highest F1 scores of 47.14%, 55.28%, and 78.94% on the BCI IIIA, BCI IIIB, and BCI II datasets, respectively. Using the pre-trained Transformer model EEGPT yielded F1 scores of 10.13%, 12.54%, and 7.68% on the BCI IIIA, BCI IIIB, and BCI II datasets, respectively, illustrating its limited out-of-the-box performance for P300 classification. However, as no additional customization was performed, these results might not represent its full potential.

4.2. Character Recognition Accuracy

Character recognition accuracy is an important indicator for evaluating the performance of the P300 spelling system. The character recognition accuracy is associated with the set number of experimental repeats. The higher the number of repeats, the greater the accuracy of character recognition; however, this also increases the experiment time. The P300 detection probabilities can be used to find the target character. In this experiment, the number of epochs is set to 15. Let $O$ represent the output from the classifier, and $P_{i,j}$ denote the pattern of the epoch corresponding to the flash in the $j$th column ($1 \le j \le 6$) or the $(j-6)$th row ($7 \le j \le 12$) during the $i$th epoch. The cumulative probabilities $v(j)$ for P300 detection are given by
$$v(j) = \sum_{i=1}^{n} O(P_{i,j}), \quad 1 \le j \le 12$$
The coordinates of the character in row ($r$) and column ($c$) are defined as
$$r = \arg\max_{7 \le j \le 12} v(j) - 6$$
$$c = \arg\max_{1 \le j \le 6} v(j)$$
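A small decoding sketch based on these equations is shown below; the assumed array layout is classifier scores for the six columns followed by the six rows, accumulated over repetitions:

```python
import numpy as np

def decode_character(p300_scores):
    """Decode the attended character from classifier outputs.

    p300_scores: array of shape (n_epochs, 12) holding O(P_{i,j}) for the
    6 columns (j = 1..6) followed by the 6 rows (j = 7..12) of each repetition.
    Returns the 1-based (row, column) coordinates of the predicted character.
    """
    v = p300_scores.sum(axis=0)     # cumulative scores v(j), j = 1..12
    c = int(np.argmax(v[:6])) + 1   # best column, 1..6
    r = int(np.argmax(v[6:])) + 1   # best row, 1..6 (argmax over j = 7..12, minus 6)
    return r, c
```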
Table 6 summarizes the character recognition accuracy for Subjects A and B (BCI IIIA and BCI IIIB datasets). The three proposed models were compared with WE-SPSQ CNN, CM-CW-CNN-ESVM, SSAE-EWANN, BN3, CNN-1, MCNN-1, MCNN-3, and E-SVM [9,19,20,22,23,24]. The results show that Inception-CNN achieved 99% accuracy for both subjects after 15 epochs. Inception-CNN ranked highest at 12 epochs and 5 epochs for Subjects A and B (BCI IIIA and BCI IIIB datasets), respectively. The average accuracy for both subjects with Inception-CNN over 15 epochs was higher than that of WE-SPSQ CNN, CM-CW-CNN-ESVM, SSAE-EWANN, BN3, CNN-1, MCNN-1, and MCNN-3 by 5.3%, 3.9%, 4.3%, 2.5%, 6.8%, 6.1%, and 6.6%, respectively. Additionally, across nine epochs, the average accuracy was 4% higher than that of E-SVM. For BCI IIIB, the accuracy at four epochs was identical to that of previously reported methods. A comparison of the character recognition accuracy results shows that implementing the inception layer before both the spatial and temporal convolutional layers in Inception-CNN yielded higher accuracy than implementing the inception layer before only one of the convolutional layers in Inception-CNN-S or Inception-CNN-T. This finding indicates that the simultaneous utilization of the inception layer to capture spatial and temporal features improves the model’s capacity to learn discriminative features and thus achieve higher accuracy.
A comparison of Inception-CNN for Subjects A and B (datasets BCI IIIA and BCI IIIB) with previously reported techniques [11,16,17,24,37,38,39] is shown in Table 7. The results show that the proposed method achieved higher accuracy after 10 and 15 epochs. A paired t-test was performed to examine the difference in character recognition accuracy between Inception-CNN and other methods based on the results in Table 6 ( n = 30 pairs for WE-SPSQ CNN, CM-CW-CNN-ESVM, SSAE-EWANN, BN3, CNN-1, MCNN-1, and MCNN-3; n = 18 pairs for E-SVM). The results are shown in Table 8. The p-values for Inception-CNN against all methods were less than 0.05 , which indicates that Inception-CNN achieved significantly higher character recognition accuracy than other methods.
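As a sketch, the paired test can be reproduced with SciPy by pairing the accuracies position-wise across subjects and epochs (the exact pairing used by the authors is our assumption):

```python
import numpy as np
from scipy import stats

def compare_methods(acc_inception, acc_other):
    """Paired t-test over matched (subject, epoch) accuracy pairs, as in Table 8.

    Both inputs have shape (n_subjects, n_epochs); pairs are formed position-wise,
    e.g., 2 subjects x 15 epochs = 30 pairs.
    """
    t_stat, p_value = stats.ttest_rel(np.ravel(acc_inception), np.ravel(acc_other))
    return t_stat, p_value
```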
The results in Table 6 were used to calculate the Information Transfer Rate (ITR) in bits per minute (bpm). The ITR was calculated to demonstrate the speed of the BCI system. It is defined as follows:
$$ITR = \frac{60 \left( \log_2(N) + P \log_2(P) + (1 - P) \log_2 \frac{1 - P}{N - 1} \right)}{T}$$
where $P$ denotes the probability of recognizing a character, $N$ is the number of classes ($N = 36$), and $T$ is the time taken to detect a character. According to the speller paradigm, each row/column is intensified for 100 ms followed by a pause of 75 ms, with a pause of 2.5 s between character epochs. $T$ can therefore be defined as $T = 2.5 + 2.1n$, $1 \le n \le 15$, with $n$ representing the number of epochs. A comparison of the ITR of the proposed method with that of other models is shown in Figure 6. The results show that the ITR of the proposed method peaks at epoch 2, indicating that the model reaches a favorable speed–accuracy trade-off after only a few stimulus repetitions, which makes it well suited for constructing practical speller systems.
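The ITR computation can be written compactly as a small helper (a sketch; the handling of the P = 1 edge case is our addition):

```python
import math

def itr_bpm(p, n_epochs, n_classes=36):
    """Information Transfer Rate in bits/min for the row-column speller.

    p: probability of recognizing a character; T = 2.5 s pause + 2.1 s per repetition.
    """
    t = 2.5 + 2.1 * n_epochs
    if p >= 1.0:
        bits = math.log2(n_classes)
    else:
        bits = (math.log2(n_classes) + p * math.log2(p)
                + (1 - p) * math.log2((1 - p) / (n_classes - 1)))
    return 60 * bits / t
```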
The character recognition accuracy of Inception-CNN on the BCI II dataset was compared with that of previous methods [11,13,15,19,22,23] across six epochs, as shown in Table 9. The results show that the proposed method achieved 100% accuracy in the fourth epoch and lower error rates in the first three epochs compared to the other techniques. The predicted characters are detailed in Table 10. From the results in Table 9, the ITR was calculated for the first three epochs. Figure 7 demonstrates that the proposed model achieved a higher ITR during the first three epochs.

5. Discussion

BCI systems based on P300 signals have gained significant attention for their role in assistive communication technologies. Accurate and efficient character recognition remains a critical challenge in developing robust P300-based speller systems. Traditional methods, including ensemble classifiers [9], PCA combined with SVMs [11], and Bayesian channel selection [17], have demonstrated notable improvements. Advanced feature extraction techniques, such as HOSRDA [16] and discrete wavelet transform (DWT) [15], have further refined ERP detection processes.
Deep learning models have introduced transformative approaches, enabling end-to-end learning and robust feature extraction. Techniques such as sparse autoencoders [19] and CNNs [22] have achieved substantial performance gains. Despite these advancements, enhancing P300 signal detection accuracy remains essential for improving character recognition in speller-based BCI systems.
Recent studies have introduced methodologies like converting EEG signals into visually interpretable images [40], addressing cross-participant variability with an Attention Domain Adversarial Neural Network (OADANN) [41], optimizing preprocessing and transfer learning using a CNN-based classifier, P3CNET [42], and minimizing stimulus reliance with a spatial-temporal neural network (STNN) [43]. However, these studies were excluded from our comparisons due to the use of different datasets. The current comparison focuses on studies utilizing dataset II of BCI Competition III (Subjects A and B) and dataset IIb of BCI Competition II.
In this study, an Inception-CNN architecture was employed to facilitate multi-scale feature extraction from P300 signals. By utilizing convolutional filters of varying sizes, the architecture effectively captured diverse frequency and temporal characteristics, which are crucial for distinguishing between P300 and non-P300 signals. As shown in Table 11, the performance of various methods on the BCI IIIA and BCI IIIB datasets was evaluated using two metrics: the mean F1 score and the character recognition/ITR [20]. Regarding the mean F1 score, calculated by averaging the results across both datasets, Inception-CNN surpassed BN3, CNN-1, MCNN-1, and MCNN-3 by 1.96%, 4.21%, 4.74%, and 3.94%, respectively, in P300 classification. Similarly, in terms of the character recognition rate, Inception-CNN achieved the highest or comparable performance in most scenarios. Notably, it excelled on the BCI IIIA dataset at the 5th and 10th epochs and on the BCI IIIB dataset at the 5th epoch. Furthermore, when considering the average recognition rate across both datasets, Inception-CNN consistently outperformed other methods [19,20,22,23,24] at the 5th and 10th epochs.
Table 12 summarizes the results of P300 classification and character recognition on the BCI II dataset. The proposed method achieved an F1 score of 0.7894, exceeding that of BN3 by 15.53%. For character recognition, the model consistently outperformed others in the first three epochs, achieving high recognition and ITRs across various scenarios compared to previous methods [11,13,15,19,22,23]. The inception layer’s multi-scale feature extraction capability improved classification accuracy and character recognition by effectively capturing the distinct frequency and temporal characteristics of P300 signals.
The performance of the Inception-CNN model was evaluated using test data to simulate real-time conditions (Table 13). The P300 detection time, measured in seconds, was within an acceptable range, making it suitable for real-time applications [44]. However, this evaluation was conducted in an offline setup. Further work is needed to implement and validate the algorithm in a fully real-time environment, including integration with data acquisition systems. Conducting fully real-time tests would also require Institutional Review Board (IRB) approval for live subject testing, and ethical and procedural considerations would also need to be addressed. These aspects, which are beyond the scope of the current study, will be addressed in future research to advance the practical applicability of the algorithm.

6. Conclusions

Currently, the performance of deep learning and machine learning methods for P300-based brain–computer interface (BCI) systems is limited. To address this issue, we propose a method called Inception-CNN, which incorporates an inception layer into a CNN to extract multi-scale features of P300. We also introduce two models, namely Inception-CNN-S and Inception-CNN-T, to evaluate the impact of the inception layer when used with spatial and temporal convolutional layers separately. The experimental results demonstrate that Inception-CNN outperforms other state-of-the-art methods in terms of P300 detection accuracy. Specifically, our proposed model leverages multi-scale filters to learn discriminative features of the P300 signal, leading to enhanced character recognition accuracy compared to existing methods.

7. Limitations and Future Directions

Inception-CNN demonstrates impressive performance, particularly in P300-based speller systems, by effectively capturing multi-scale features for accurate character recognition. The dual inception modules in Inception-CNN enable it to extract richer representations of EEG signals, contributing to better performance when compared to the alternative configurations, Inception-CNN-S and Inception-CNN-T, which utilize only a single inception layer. Despite these advantages, the inclusion of two inception layers introduces considerable computational complexity, leading to higher resource consumption and longer training times. This can limit the model’s real-time deployment, especially in resource-constrained environments. However, these trade-offs are often justified when high accuracy is paramount. In particular, Inception-CNN’s ability to leverage multi-scale features is superior to the configurations with only one inception layer, highlighting its effectiveness for P300 detection. Moreover, despite the increase in computational complexity, the P300 detection time, measured in seconds, is within an acceptable range, making it suitable for real-time applications. The strength of Inception-CNN lies in its ability to extract meaningful features from EEG signals, although its interpretability remains a challenge. The model’s feature-learning process is not yet fully understood, and integrating explainable AI techniques could provide valuable insights, as well as optimize the architecture and further improve performance in future applications. The use of a pre-trained Transformer model in this study served as a baseline to demonstrate its applicability in P300 classification. Future research could explore fine-tuning or adapting the model for enhanced performance.
Inception-CNN was evaluated on datasets like BCI Competition III (dataset II) and BCI Competition II (dataset IIb). While further testing is needed, its adaptive design and potential for fine-tuning suggest it could achieve robust performance across various ERP paradigms. Expanding its application to additional datasets and real-time testing could enhance its accuracy and practical utility.

Author Contributions

Conceptualization, C.-L.L.; Methodology, M.U.; Software, M.U.; Investigation, C.-L.L. and Y.-T.C.; Data curation, M.U.; Writing—original draft, M.U.; Writing—review & editing, C.-L.L. and Y.-T.C.; Visualization, M.U.; Supervision, Y.-T.C.; Funding acquisition, C.-L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets are available online. The code and trained weights can be provided upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shih, J.J.; Krusienski, D.J.; Wolpaw, J.R. Brain-Computer Interfaces in Medicine. Mayo Clin. Proc. 2012, 87, 268–279. [Google Scholar] [CrossRef] [PubMed]
  2. Bi, L.; Fan, X.A.; Liu, Y. EEG-Based Brain-Controlled Mobile Robots: A Survey. IEEE Trans. Hum.-Mach. Syst. 2013, 43, 161–176. [Google Scholar] [CrossRef]
  3. Ramadan, R.A.; Vasilakos, A.V. Brain Computer Interface: Control Signals Review. Neurocomputing 2017, 223, 26–44. [Google Scholar] [CrossRef]
  4. Wolpaw, J.R. Brain–Computer Interfaces as New Brain Output Pathways. J. Physiol. 2007, 579, 613–619. [Google Scholar] [CrossRef] [PubMed]
  5. Birbaumer, N.; Cohen, L.G. Brain–computer interfaces: Communication and restoration of movement in paralysis. J. Physiol. 2007, 579, 621–636. [Google Scholar] [CrossRef]
  6. Birbaumer, N.; Ghanayim, N.; Hinterberger, T.; Iversen, I.; Kotchoubey, B.; Kübler, A.; Perelmouter, J.; Taub, E.; Flor, H. A spelling device for the paralyzed. Nature 1999, 398, 297–298. [Google Scholar] [CrossRef]
  7. Sutton, S.; Tueting, P.; Zubin, J.; John, E.R. Information delivery and the sensory evoked potential. Science 1967, 155, 1436–1439. [Google Scholar] [CrossRef] [PubMed]
  8. Li, Y.; Ma, Z.; Lu, W.; Li, Y. Automatic removal of the eye blink artifact from EEG using an ICA-based template matching approach. Physiol. Meas. 2006, 27, 425. [Google Scholar] [CrossRef] [PubMed]
  9. Rakotomamonjy, A.; Guigue, V. BCI competition III: Dataset II-ensemble of SVMs for BCI P300 speller. IEEE Trans. Biomed. Eng. 2008, 55, 1147–1154. [Google Scholar] [CrossRef]
  10. Gu, Z.; Yu, Z.; Shen, Z.; Li, Y. An online semi-supervised brain–computer interface. IEEE Trans. Biomed. Eng. 2013, 60, 2614–2623. [Google Scholar] [PubMed]
  11. Kundu, S.; Ari, S. P300 detection with brain–computer interface application using PCA and ensemble of weighted SVMs. IETE J. Res. 2018, 64, 406–414. [Google Scholar] [CrossRef]
  12. Li, Y.; Pan, J.; Wang, F.; Yu, Z. A hybrid BCI system combining P300 and SSVEP and its application to wheelchair control. IEEE Trans. Biomed. Eng. 2013, 60, 3156–3166. [Google Scholar]
  13. Kaper, M.; Meinicke, P.; Grossekathoefer, U.; Lingner, T.; Ritter, H. BCI competition 2003—Data set IIb: Support vector machines for the P300 speller paradigm. IEEE Trans. Biomed. Eng. 2004, 51, 1073–1076. [Google Scholar] [CrossRef] [PubMed]
  14. Blankertz, B.; Lemm, S.; Treder, M.; Haufe, S.; Müller, K.R. Single-trial analysis and classification of ERP components—A tutorial. NeuroImage 2011, 56, 814–825. [Google Scholar] [CrossRef]
  15. Bostanov, V. BCI competition 2003—Data sets Ib and IIb: Feature extraction from event-related brain potentials with the continuous wavelet transform and the t-value scalogram. IEEE Trans. Biomed. Eng. 2004, 51, 1057–1061. [Google Scholar] [CrossRef] [PubMed]
  16. Idaji, M.J.; Shamsollahi, M.B.; Sardouie, S.H. Higher order spectral regression discriminant analysis (HOSRDA): A tensor feature reduction method for ERP detection. Pattern Recognit. 2017, 70, 152–162. [Google Scholar] [CrossRef]
  17. Yu, T.; Yu, Z.; Gu, Z.; Li, Y. Grouped automatic relevance determination and its application in channel selection for P300 BCIs. IEEE Trans. Neural Syst. Rehabil. Eng. 2015, 23, 1068–1077. [Google Scholar] [CrossRef] [PubMed]
  18. Hoffmann, U.; Garcia, G.; Vesin, J.M.; Diserens, K.; Ebrahimi, T. A boosting approach to P300 detection with application to brain-computer interfaces. In Proceedings of the 2nd International IEEE EMBS Conference on Neural Engineering, Arlington, VA, USA, 16–19 March 2005; pp. 97–100. [Google Scholar]
  19. Kundu, S.; Ari, S. A deep learning architecture for P300 detection with brain-computer interface application. IRBM 2020, 41, 31–38. [Google Scholar] [CrossRef]
  20. Cecotti, H.; Graser, A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 433–445. [Google Scholar] [CrossRef]
  21. Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 2018, 15, 056013. [Google Scholar] [CrossRef]
  22. Liu, M.; Wu, W.; Gu, Z.; Yu, Z.; Qi, F.; Li, Y. Deep learning based on batch normalization for P300 signal detection. Neurocomputing 2018, 275, 288–297. [Google Scholar] [CrossRef]
  23. Kundu, S.; Ari, S. P300-based character recognition using convolutional neural network and support vector machine. Biomed. Signal Process. Control 2020, 55, 101645. [Google Scholar] [CrossRef]
  24. Shukla, P.K.; Cecotti, H.; Meena, Y.K. Towards Effective Deep Neural Network Approach for Multi-Trial P300-based Character Recognition in Brain-Computer Interfaces. arXiv 2024, arXiv:2410.08561. [Google Scholar]
  25. Liu, M.; Shi, W.; Zhao, L.; Beyette, F.R., Jr. Best performance with fewest resources: Unveiling the most resource-efficient convolutional neural network for P300 detection with the aid of Explainable AI. Mach. Learn. Appl. 2024, 16, 100542. [Google Scholar] [CrossRef]
  26. Santamaria-Vazquez, E.; Martinez-Cagigal, V.; Vaquerizo-Villar, F.; Hornero, R. EEG-Inception: A Novel Deep Convolutional Neural Network for Assistive ERP-Based Brain-Computer Interfaces. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 2773–2782. [Google Scholar] [CrossRef] [PubMed]
  27. Du, Y.; Liu, J. IENet: A Robust Convolutional Neural Network for EEG-Based Brain-Computer Interfaces. J. Neural Eng. 2022, 19, 036031. [Google Scholar] [CrossRef]
  28. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  29. Luck, S.J. An Introduction to the Event-Related Potential Technique; MIT Press: Cambridge, MA, USA, 2014. [Google Scholar]
  30. Donchin, E.; Spencer, K.M.; Wijesinghe, R. The mental prosthesis: Assessing the speed of a P300-based brain-computer interface. IEEE Trans. Rehabil. Eng. 2000, 8, 174–179. [Google Scholar] [CrossRef]
  31. Farwell, L.A.; Donchin, E. Talking off the top of your head: Toward a mental prosthesis utilizing event-related brain potentials. Electroencephalogr. Clin. Neurophysiol. 1988, 70, 510–523. [Google Scholar] [CrossRef] [PubMed]
  32. Krusienski, D.; Schalk, G. Documentation Wadsworth BCI Dataset (P300 Evoked Potentials) Data Acquired Using BCI2000’s P3 Speller Paradigm. BCI Competition III Challenge. 2004, pp. 1–8. Available online: https://www.bbci.de/competition/iii/desc_II.pdf (accessed on 15 December 2024).
  33. Blankertz, B.; Krusienski, D.; Schalk, G. Documentation second Wadsworth BCI dataset (P300 evoked potentials) data acquired using BCI2000 P300 Speller Paradigm. BCI Classification Contest November (2002).
  34. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  35. Wang, G.; Liu, W.; He, Y.; Xu, C.; Ma, L.; Li, H. EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals. In Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 9–15 December 2024. [Google Scholar]
  36. O’Malley, T.; Bursztein, E.; Long, J.; Chollet, F.; Jin, H.; Invernizzi, L. Keras Tuner 2019. Available online: https://github.com/keras-team/keras-tuner (accessed on 15 December 2024).
  37. Lee, Y.R.; Kim, H.N. A data partitioning method for increasing ensemble diversity of an eSVM-based P300 speller. Biomed. Signal Process. Control 2018, 39, 53–63. [Google Scholar] [CrossRef]
  38. Salvaris, M.; Sepulveda, F. Wavelets and ensemble of FLDs for P300 classification. In Proceedings of the 2009 4th International IEEE/EMBS Conference on Neural Engineering, Antalya, Turkey, 29 April–2 May 2009; pp. 339–342. [Google Scholar]
  39. Tomioka, R.; Müller, K.R. A regularized discriminative framework for EEG analysis with application to brain–computer interface. NeuroImage 2010, 49, 415–432. [Google Scholar] [CrossRef] [PubMed]
  40. Ail, B.E.; Ramele, R.; Gambini, J.; Santos, J.M. An intrinsically explainable method to decode p300 waveforms from EEG signal plots based on convolutional neural networks. Brain Sci. 2024, 14, 836. [Google Scholar] [CrossRef] [PubMed]
  41. Li, S.; Daly, I.; Guan, C.; Cichocki, A.; Jin, J. Inter-participant transfer learning with attention based domain adversarial training for P300 detection. Neural Netw. 2024, 180, 106655. [Google Scholar] [CrossRef] [PubMed]
  42. Daǧ, I.; Dui, L.G.; Ferrante, S.; Pedrocchi, A.; Antonietti, A. Leveraging deep learning techniques to improve P300-based brain computer interfaces. IEEE J. Biomed. Health Inform. 2022, 26, 4892–4902. [Google Scholar]
  43. Zhang, Z.; Yu, X.; Rong, X.; Iwata, M. Spatial-temporal neural network for P300 detection. IEEE Access 2021, 9, 163441–163455. [Google Scholar] [CrossRef]
  44. Zhang, J.; Wang, B.; Zhang, C.; Xiao, Y.; Wang, M.Y. An EEG/EMG/EOG-Based Multimodal Human-Machine Interface to Real-Time Control of a Soft Robot Hand. Front. Neurorobot. 2019, 13, 7. [Google Scholar] [CrossRef]
Figure 1. Comparison of P300 and non-P300 waveforms.
Figure 2. Character matrix in the row-column paradigm.
Figure 3. Detailed structure of the improved inception module. The four parallel layers, denoted as $L_A$, $L_B$, $L_C$, and $L_D$, are concatenated at the end to form a unified feature representation.
Figure 4. Inception-CNN architecture, illustrating the inception layer, convolutional layer, and fully connected (FC) layer, along with the input and output feature maps (FMs). The circles in the FC layer represent neurons performing weighted sums followed by activation functions. The concatenated outputs of the inception layer before the spatial convolutional layer are denoted as $L_{AS}$, $L_{BS}$, $L_{CS}$, and $L_{DS}$, while those before the temporal convolutional layer are denoted as $L_{AT}$, $L_{BT}$, $L_{CT}$, and $L_{DT}$.
Figure 5. (a) The structural composition of Inception-CNN-S, without the $L_3$ layer, and (b) the configuration of Inception-CNN-T, without the $L_1$ layer.
Figure 6. Comparison of ITR of Inception-CNN with that of different models on the BCI IIIA and BCI IIIB datasets.
Figure 7. Comparison of the ITR of the proposed Inception-CNN model with that of previously reported techniques on the BCI II dataset [13].
Table 1. Number of samples in the training and test sets for BCI IIIA, BCI IIIB, and BCI II.

| Subject | Train P300 | Train Non-P300 | Test P300 | Test Non-P300 |
| BCI IIIA | 12,750 | 12,750 | 3000 | 15,000 |
| BCI IIIB | 12,750 | 12,750 | 3000 | 15,000 |
| BCI II | 6300 | 6300 | 930 | 4650 |
Table 2. Overview of the Inception-CNN architecture. The architecture consists of seven layers, denoted as $L_0$, $L_1$, $L_2$, $L_3$, $L_4$, $L_5$, and $L_6$. The kernel size, input size, number of features, and parameters in each layer are provided.

| Layer | Kernel Size | Input Size | Number of Features | Parameters |
| $L_0$ | - | 72 × 64 | - | 4 |
| $L_1$ | (1 × 1), (1 × 1)(3 × 3), (1 × 1)(5 × 5), (1 × 1) | 72 × 64 | 80 | 18,072 |
| $L_2$ | 1 × 64 | 72 × 64 | 16 | 81,936 |
| $L_3$ | (1 × 1), (1 × 1)(3 × 3), (1 × 1)(5 × 5), (1 × 1) | 72 × 1 | 56 | 3856 |
| $L_4$ | 36 × 1 | 72 × 1 | 16 | 32,272 |
| $L_5$ | - | 18 × 1 | 64 | 18,496 |
| $L_6$ | - | 64 | - | 130 |
| Total | - | - | - | 154,766 |
Table 3. Parameter details of the inception layer.

| Layer | 1 × 1 | 3 × 3 | 5 × 5 |
| $L_{AS}$ | 24 | - | - |
| $L_{BS}$ | 24 | 16 | - |
| $L_{CS}$ | 24 | - | 24 |
| $L_{DS}$ | 16 | - | - |
| $L_{AT}$ | 24 | - | - |
| $L_{BT}$ | 16 | 8 | - |
| $L_{CT}$ | 8 | - | 8 |
| $L_{DT}$ | 16 | - | - |
Table 4. Steps in the experimental setup, along with the descriptions and pseudocode for the BCI IIIA, BCI IIIB, and BCI II datasets.

| Step | Description | Pseudocode |
| - | Procedure start | - |
| 1 | Data Preprocessing | Load EEG data → Filter (0.1–20 Hz) → Epoch Extraction (0–600 ms) → Downsample (240 Hz to 120 Hz) |
| 2 | Define CNN | Define CNN (Inception-CNN, Inception-CNN-S, Inception-CNN-T) |
| 3 | Train Model | Use Keras Tuner to optimize hyperparameters (e.g., filter sizes, learning rate); train each model using K-fold cross-validation; evaluate models |
| 4 | P300 Detection | Predict P300/Non-P300 |
| 5 | Probability Calculation | Aggregate epoch probabilities |
| 6 | Character Mapping | Identify maximum probability row/column → Decode |
| 7 | Character Recognition | Map (row, column) → Display character |
| 8 | Accuracy Evaluation | Calculate F1 score and other metrics (P300/Non-P300); character recognition accuracy; compare with other methods |
| - | End of procedure | - |
Table 5. P300 detection results of the proposed methods compared with those of previously reported techniques on the BCI IIIA, BCI IIIB, and BCI II datasets. The results in bold represent the highest values.

| Dataset | Method | TP | TN | FP | FN | Reco. | Recall | Precision | F1 Score |
| BCI IIIA | Inception-CNN | 1598 | 12,819 | 2181 | 1402 | 0.8009 | 0.5326 | 0.4228 | 0.4714 |
| | Inception-CNN-S | 1515 | 12,953 | 2047 | 1485 | 0.8037 | 0.5051 | 0.4253 | 0.4617 |
| | Inception-CNN-T | 1563 | 12,598 | 2402 | 1437 | 0.7867 | 0.5210 | 0.3941 | 0.4488 |
| | BN3 [22] | 1910 | 11,615 | 3385 | 1090 | 0.7513 | 0.6367 | 0.3607 | 0.4605 |
| | CNN-1 [20] | 2021 | 10,645 | 4355 | 979 | 0.7037 | 0.6737 | 0.3170 | 0.4311 |
| | MCNN-1 [20] | 2071 | 10,348 | 4652 | 929 | 0.6899 | 0.6903 | 0.3080 | 0.4260 |
| | MCNN-3 [20] | 2023 | 10,645 | 4355 | 977 | 0.7038 | 0.6743 | 0.3172 | 0.4314 |
| BCI IIIB | Inception-CNN | 1712 | 13,519 | 1481 | 1288 | 0.8461 | 0.5706 | 0.5361 | 0.5528 |
| | Inception-CNN-S | 1795 | 13,007 | 1993 | 1205 | 0.8223 | 0.5983 | 0.4738 | 0.5288 |
| | Inception-CNN-T | 1847 | 12,820 | 2180 | 1153 | 0.8148 | 0.6156 | 0.4586 | 0.5256 |
| | BN3 [22] | 2084 | 12,139 | 2861 | 916 | 0.7902 | 0.6947 | 0.4214 | 0.5246 |
| | CNN-1 [20] | 2035 | 12,039 | 2961 | 965 | 0.7037 | 0.6783 | 0.4073 | 0.5090 |
| | MCNN-1 [20] | 2202 | 11,453 | 3547 | 798 | 0.6899 | 0.7340 | 0.3830 | 0.5034 |
| | MCNN-3 [20] | 2077 | 11,997 | 3003 | 923 | 0.7038 | 0.6923 | 0.4089 | 0.5141 |
| BCI II | Inception-CNN | 778 | 4387 | 263 | 152 | 0.9256 | 0.8365 | 0.7473 | 0.7894 |
| | Inception-CNN-S | 749 | 4213 | 437 | 181 | 0.8892 | 0.8053 | 0.6315 | 0.7079 |
| | Inception-CNN-T | 733 | 3944 | 706 | 197 | 0.8381 | 0.7881 | 0.5093 | 0.6188 |
| | BN3 [22] | 752 | 3960 | 690 | 178 | 0.8444 | 0.8086 | 0.5215 | 0.6341 |
Table 6. Character recognition accuracy of the proposed methods compared with that of previously reported techniques on the BCI IIIA and BCI IIIB datasets. The results in bold represent the highest accuracy.

| Subject | Method | Number of Epochs (1–15) |
| | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| BCI IIIA | Inception-CNN | 22 | 47 | 56 | 68 | 70 | 78 | 84 | 86 | 89 | 90 | 94 | 94 | 96 | 97 | 99 |
| | Inception-CNN-S | 20 | 42 | 52 | 58 | 72 | 73 | 76 | 82 | 87 | 92 | 91 | 90 | 92 | 91 | 95 |
| | Inception-CNN-T | 24 | 37 | 51 | 61 | 64 | 74 | 79 | 78 | 84 | 88 | 87 | 86 | 87 | 90 | 92 |
| | WE-SPSQ CNN [24] | 19 | 29 | 56 | 67 | 75 | 70 | 76 | 77 | 83 | 86 | 88 | 91 | 94 | 93 | 98 |
| | CM-CW-CNN-ESVM [23] | 22 | 32 | 55 | 59 | 64 | 70 | 74 | 78 | 81 | 86 | 86 | 90 | 91 | 94 | 99 |
| | SSAE-EWANN [19] | 21 | 35 | 54 | 64 | 67 | 72 | 74 | 76 | 83 | 89 | 89 | 93 | 92 | 95 | 98 |
| | BN3 [22] | 22 | 39 | 58 | 67 | 73 | 75 | 79 | 81 | 82 | 86 | 89 | 92 | 94 | 96 | 98 |
| | CNN-1 [20] | 16 | 33 | 47 | 52 | 61 | 65 | 77 | 78 | 85 | 86 | 90 | 91 | 91 | 93 | 97 |
| | MCNN-1 [20] | 18 | 31 | 50 | 54 | 61 | 68 | 76 | 76 | 79 | 82 | 89 | 92 | 91 | 93 | 97 |
| | MCNN-3 [20] | 17 | 35 | 50 | 55 | 63 | 67 | 78 | 79 | 84 | 85 | 91 | 90 | 92 | 94 | 97 |
E-SVM [9]1632526072839497
BCI IIIBInception-CNN46 64 71 75 84 88 91 919394 95 9596 98 99
Inception-CNN-S435464718184868889929192919394
Inception-CNN-T415965728183879090929294939394
WE-SPSQ CNN [24]435966 77 7883898989939292919291
CM-CW-CNN-ESVM [23]375870728086868993 95 95 97 97 98 99
SSAE-EWANN [19]395966687680858791939393959598
BN3 [22] 47 59707376828491 94 959595949495
CNN-1 [20]355259687981828992919190919292
MCNN-1 [20]39556264777986 92 9192 95 95959494
MCNN-3 [20]345660687480828990909188909192
E-SVM [9]3553626875919696
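The per-epoch accuracies above are obtained by accumulating classifier outputs over repetitions before selecting the most probable row and column (Steps 5–7 in Table 4). A minimal sketch of that aggregation is given below; the variable names are ours, and the 6 × 6 speller matrix and stimulus-code convention (1–6 for columns, 7–12 for rows) are assumed to follow the standard P300 speller paradigm of the competition datasets.

```python
# Hypothetical sketch of Steps 5-7 in Table 4: accumulate P300 scores over the
# first `n_epochs` repetitions, pick the most probable row and column, and map
# the pair to a character in the 6x6 speller matrix.
import numpy as np

MATRIX = np.array([list("ABCDEF"), list("GHIJKL"), list("MNOPQR"),
                   list("STUVWX"), list("YZ1234"), list("56789_")])

def decode_character(scores, stim_codes, n_epochs):
    """scores: P300 probability per flash; stim_codes: flashed row/column id
    (assumed 1-6 = columns, 7-12 = rows)."""
    col_sum, row_sum = np.zeros(6), np.zeros(6)
    n_flashes = 12 * n_epochs                    # 12 flashes per repetition
    for p, code in zip(scores[:n_flashes], stim_codes[:n_flashes]):
        if code <= 6:
            col_sum[code - 1] += p
        else:
            row_sum[code - 7] += p
    return MATRIX[np.argmax(row_sum), np.argmax(col_sum)]
```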
Table 7. Character recognition accuracy of the proposed Inception-CNN model compared with that of previously reported techniques on the BCI IIIA and BCI IIIB datasets. The numbers in bold represent the best performance.
Dataset | Epochs | PCA-EWSVM [11] | HOSRDA + LDA [16] | gsBLDA [17] | Data Partition-ESVM [37] | EFLD [38] | TSR [39] | WE-SPSQ CNN [24] | Inception-CNN
BCI IIIA | 10 | 82.0 | 84.0 | 88.0 | 85.0 | 82.0 | 86.0 | - | 90.0
BCI IIIA | 15 | 99.0 | 96.0 | 99.0 | 97.0 | 93.0 | 99.0 | 98.0 | 99.0
BCI IIIB | 10 | 93.0 | 94.0 | 91.0 | 92.0 | 93.0 | 93.0 | - | 94.0
BCI IIIB | 15 | 97.0 | 97.0 | 95.0 | 95.0 | 97.0 | 95.0 | 91.0 | 99.0
Avg | 10 | 87.5 | 89.0 | 89.5 | 88.5 | 87.5 | 87.5 | - | 92.0
Avg | 15 | 98.0 | 96.5 | 97.0 | 96.0 | 95.0 | 97.0 | 94.5 | 99.0
Table 8. Results of paired t-test comparing the character recognition accuracy of Inception-CNN with that of other methods on the BCI IIIA and BCI IIIB datasets, including test-statistic (t-stat) and p-value.
Inception-CNN Paired with | t-Stat | p-Value
WE-SPSQ CNN | 5.650 | 4.1671 × 10⁻⁶
CM-CW-CNN-ESVM | 5.193 | 1.4835 × 10⁻⁵
SSAE-EWANN | 7.602 | 2.2137 × 10⁻⁸
BN3 | 4.508 | 9.9161 × 10⁻⁵
CNN-1 | 9.731 | 1.2195 × 10⁻¹⁰
MCNN-1 | 7.977 | 8.4854 × 10⁻⁹
MCNN-3 | 11.692 | 1.6978 × 10⁻¹²
ESVM | 5.233 | 1.0111 × 10⁻⁴
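The values in Table 8 are consistent with a two-tailed paired t-test over the per-epoch character recognition accuracies of both subjects (30 pairs drawn from Table 6); the scipy sketch below reproduces the BN3 comparison to within rounding, which suggests, though does not guarantee, that this is the pairing the authors used.

```python
# Illustrative paired t-test (scipy) between Inception-CNN and a competing
# method, pairing per-epoch character recognition accuracies from Table 6
# (Subject A followed by Subject B, 15 epochs each).
from scipy.stats import ttest_rel

inception = [22, 47, 56, 68, 70, 78, 84, 86, 89, 90, 94, 94, 96, 97, 99,   # A
             46, 64, 71, 75, 84, 88, 91, 91, 93, 94, 95, 95, 96, 98, 99]   # B
bn3       = [22, 39, 58, 67, 73, 75, 79, 81, 82, 86, 89, 92, 94, 96, 98,   # A
             47, 59, 70, 73, 76, 82, 84, 91, 94, 95, 95, 95, 94, 94, 95]   # B

res = ttest_rel(inception, bn3)  # two-tailed by default
print(f"t = {res.statistic:.3f}, p = {res.pvalue:.4e}")  # ~4.51, ~9.9e-05
```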
Table 9. Character recognition accuracy of the proposed Inception-CNN model compared with that of previously reported techniques on the BCI II dataset. The values in bold represent the highest values.
Method | 1 | 2 | 3 | 4 | 5 | 6
(Columns give the number of epochs; cell values are correctly recognized characters out of the 31 in the BCI II test set.)
Inception-CNN | 26 | 29 | 30 | 31 | 31 | 31
SSAE-EWANN | 23 | 26 | 29 | 30 | 31 | 31
CM-CW-CNN-ESVM | 25 | 28 | 29 | 31 | 31 | 31
BN3 | 24 | 23 | 27 | 28 | 29 | 30
PCA-EWSVM | 25 | 27 | 29 | 31 | 31 | 31
Vladimir | 20 | 26 | 29 | 30 | 30 | 31
Kaper et al. [13] | 20 | 22 | 26 | 30 | 31 | 31
Table 10. The predicted characters in the first 4 epochs and the error (in %) of the proposed Inception-CNN model on the BCI II dataset. The characters in bold represent incorrectly predicted characters.
Epoch | Predicted Characters | Error
1 | FOODMOOTHAMPIECAKETUHAZYAOTVAZ7 | 16.2%
2 | FOODMOOTHAMPIECAKETUNAZYGOTF5Y7 | 6.4%
3 | FOODMOOTHAMPIECAKETUNAZSGOT4567 | 3.2%
4 | FOODMOOTHAMPIECAKETUNAZYGOT4567 | 0%
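Because the epoch-4 prediction contains no errors, the last row doubles as the 31-character target string for the BCI II test set; the snippet below shows how the error column can be recomputed by character-wise comparison (the helper function is ours).

```python
# Recompute the error column of Table 10 by character-wise comparison with the
# fully correct epoch-4 prediction (the 31-character BCI II target string).
TARGET = "FOODMOOTHAMPIECAKETUNAZYGOT4567"

def error_rate(predicted: str, target: str = TARGET) -> float:
    wrong = sum(p != t for p, t in zip(predicted, target))
    return 100.0 * wrong / len(target)

print(error_rate("FOODMOOTHAMPIECAKETUNAZYGOTF5Y7"))  # ~6.4 (epoch 2)
print(error_rate("FOODMOOTHAMPIECAKETUNAZSGOT4567"))  # ~3.2 (epoch 3)
```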
Table 11. Performance comparison of various methods on the BCI IIIA (Subject A) and BCI IIIB (Subject B) datasets in terms of mean F1 score and character recognition/ITR for epochs 5, 10, and 15. The best results in each column are highlighted in bold. A dash (-) indicates that the value was not reported in the respective paper.
Method | Mean F1 Score | BCI IIIA 5 | BCI IIIA 10 | BCI IIIA 15 | BCI IIIB 5 | BCI IIIB 10 | BCI IIIB 15 | Mean 5 | Mean 10 | Mean 15
Inception-CNN | 0.5121 | 70/12.69 | 90/10.69 | 99/8.89 | 84/17.15 | 94/11.58 | 99/8.89 | 77.00/14.92 | 92.00/11.14 | 99.00/8.89
WE-SPSQ CNN [24] | - | 75/14.20 | 86/9.87 | 98/8.69 | 78/15.14 | 93/11.35 | 91/7.54 | 76.50/14.67 | 89.50/10.61 | 94.50/8.12
CM-CW-CNN-ESVM [23] | - | 64/10.99 | 86/9.87 | 99/8.89 | 80/15.79 | 95/11.81 | 99/8.89 | 72.00/13.39 | 90.50/10.84 | 99.00/8.89
SSAE-EWANN [19] | - | 67/11.83 | 89/10.48 | 98/8.69 | 76/14.51 | 93/11.35 | 98/8.69 | 71.50/13.17 | 91.00/10.92 | 98.00/8.69
BN3 [22] | 0.4925 | 73/13.59 | 86/9.87 | 98/8.69 | 76/14.51 | 95/11.81 | 95/8.17 | 74.50/14.05 | 90.50/10.84 | 96.50/8.43
CNN-1 [20] | 0.47 | 61/10.18 | 86/9.87 | 97/8.51 | 79/15.47 | 91/10.91 | 92/7.69 | 70.00/12.82 | 88.50/10.39 | 94.50/8.10
MCNN-1 [20] | 0.4647 | 61/10.18 | 82/9.11 | 97/8.51 | 77/14.83 | 92/11.13 | 94/8.00 | 69.00/12.50 | 87.00/10.12 | 95.50/8.26
MCNN-3 [20] | 0.4727 | 63/10.71 | 85/9.68 | 97/8.51 | 74/13.89 | 90/10.69 | 92/7.69 | 68.50/12.30 | 87.50/10.19 | 94.50/8.10
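The number after each slash in Tables 11 and 12 is the information transfer rate (ITR) in bits per minute. The sketch below gives the widely used Wolpaw formula for bits per selection in an N-class speller, from which bits per minute follow once the time per selection (set by the flash timing and the number of repetitions, which are not restated here) is fixed; it is the standard definition rather than a restatement of the authors' exact computation.

```python
# Wolpaw information transfer rate: bits conveyed per selection for an N-class
# speller with selection accuracy p, and the conversion to bits per minute
# given the time needed for one selection (a paradigm-dependent placeholder).
from math import log2

def bits_per_selection(p: float, n_classes: int = 36) -> float:
    if p <= 0 or p >= 1:
        return log2(n_classes) if p == 1 else 0.0
    return (log2(n_classes)
            + p * log2(p)
            + (1 - p) * log2((1 - p) / (n_classes - 1)))

def itr_bits_per_minute(p: float, seconds_per_selection: float) -> float:
    return bits_per_selection(p) * 60.0 / seconds_per_selection

# Example: 83.87% accuracy in a 36-character speller gives ~3.7 bits/selection.
print(bits_per_selection(0.8387))
```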
Table 12. Summary of P300 classification and character recognition results on the BCI II dataset. The table highlights the performance of the proposed Inception-CNN model, including the F1 score, character recognition rate, and ITR, compared to existing methods. The best results in each column are highlighted in bold. A dash (-) indicates that the value was not reported in the respective paper.
Method | F1 Score | BCI II (1 epoch) | BCI II (2 epochs) | BCI II (3 epochs)
Inception-CNN | 0.7894 | 83.87/48.32 | 93.54/40.24 | 96.77/32.71
PCA-EWSVM [11] | - | 80.64/45.23 | 87.09/35.40 | 93.54/30.64
CM-CW-CNN-ESVM [23] | - | 80.64/45.23 | 90.32/37.74 | 93.54/30.64
SSAE-EWANN [19] | - | 74.19/39.42 | 83.87/33.18 | 93.54/30.64
BN3 [22] | 0.6341 | 77.41/42.27 | 74.19/27.06 | 87.09/26.95
Vladimir [15] | - | 64.51/31.45 | 83.87/33.18 | 93.54/30.64
Kaper et al. [13] | - | 64.51/31.45 | 70.96/25.17 | 83.87/25.26
Table 13. Performance evaluation of Inception-CNN for P300 detection with five repetitions. The tests were conducted on a Tesla K80 GPU provided by Google Colab, and the average detection time across all repetitions is reported alongside the individual times.
Dataset | t1 | t2 | t3 | t4 | t5 | Mean
BCI IIIA | 3.90 | 3.74 | 3.74 | 3.76 | 3.86 | 3.80
BCI IIIB | 3.89 | 3.63 | 3.65 | 3.61 | 3.74 | 3.70
BCI II | 3.62 | 3.72 | 3.60 | 3.61 | 3.59 | 3.63
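One way to produce per-repetition timings such as those in Table 13 is to wrap the trained model's inference over the preprocessed test epochs in a wall-clock timer; the sketch below assumes a Keras model object and a NumPy array of test epochs, both of which are placeholders rather than the authors' exact setup.

```python
# Hypothetical timing sketch: measure P300 detection (inference) time over the
# test epochs several times and report the mean, as in Table 13. `model` and
# `x_test` stand for the trained Inception-CNN and the preprocessed test data.
import time
import numpy as np

def time_inference(model, x_test, repetitions: int = 5):
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        model.predict(x_test, verbose=0)      # P300 / non-P300 probabilities
        durations.append(time.perf_counter() - start)
    return durations, float(np.mean(durations))
```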
