Peer-Review Record

Multi-Scale Feature Extraction to Improve P300 Detection in Brain–Computer Interfaces

Electronics 2025, 14(3), 447; https://doi.org/10.3390/electronics14030447
by Muhammad Usman 1, Chun-Ling Lin 2,* and Yao-Tien Chen 1
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 16 December 2024 / Revised: 15 January 2025 / Accepted: 22 January 2025 / Published: 23 January 2025
(This article belongs to the Section Computer Science & Engineering)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper introduces an Inception module-based Convolutional Neural Network (Inception-CNN) architecture, which effectively learns discriminative features from both spatial and temporal information to enhance P300 detection accuracy while reducing overfitting and computational complexity. The manuscript points out that current CNN architectures limit P300 detection accuracy because these models usually only extract single-scale features. Inception-CNN is a seven-layer convolutional neural network that incorporates an inception layer before spatial and temporal convolution layers, combined with batch normalization and dropout techniques, enabling multi-scale feature extraction to enhance detection accuracy and generalization ability. The experimental results demonstrate that Inception-CNN provides a promising solution for improving the accuracy of P300 detection and achieves better performance compared to other methods across different datasets. However, this paper requires significant revisions before it can be considered for publication. Below are some comments and suggestions from the reviewer aimed at enhancing the clarity, rigor, and overall quality of the manuscript.

 

1. Please provide a detailed explanation of the main contributions and innovations of this article.

 

2. Please add a detailed description of the advantages and motivation of the proposed method in this article.

 

3. The results indicate that the proposed method does not outperform other comparison methods on all indicators, and the quantitative and qualitative analysis of the experimental results is insufficient. The authors are requested to provide additional explanations for this observation and discuss the limitations.

 

4. The related work section is not sufficiently comprehensive, and the comparison algorithms in the article lack comparisons with state-of-the-art methods. Please cite the latest literature and provide a thorough discussion.

 

5. The use of formulas and parameters in the article is very confusing, which greatly interferes with reading. Please carefully check the use of parameters in each formula and ensure that the parameters in the formula are consistent with the parameters explained afterwards, such as in formula (3) and formula (7). Please check in detail, for example, the "N" in line 303 versus the "N" in line 306.

 

6. Please review each figure and table, ensuring that necessary legends are added to enhance readability, and fully explain the specific meaning of each parameter and function used in the image and the meaning represented by the graph, such as the meaning of "FM" in Figure 9, the meaning represented by the circle in the image, the meaning of "t Stat" in Table 7, etc.

 

7. Please demonstrate the computational complexity of the method.

 

8. The introduction about the proposed method is vague, which may cause some confusion for readers. Please add a pseudocode table to explain the proposed method.

 

9. The readability of this article is poor. The English and grammar of the entire manuscript must be revised and improved. For example:

1) The “which” in line 298 refers to something unclear.

2) The “Where” in line 303 should be in lowercase letters.

 

10. Please review the citation format of the references for accuracy and consistency.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Comment 1: Please provide a detailed explanation of the main contributions and innovations of this article.

Response 1: Thank you for pointing it out. We agree with this comment. Therefore, we have added the contribution of our work. The revisions can be found in Section 1, on pages 2 and 3, lines 83–91. [Our contributions include: (1) We present a CNN architecture that integrates inception layers before both spatial and temporal convolutional layers, enhancing the ability to extract multi-scale features effectively. (2) The inception layers allow simultaneous extraction of fine and coarse details, leading to better adaptability to variations in EEG signals across individuals and sessions. (3) We conduct a thorough comparative evaluation with existing state-of-the-art methods, demonstrating significant improvements in classification accuracy and robustness. (4) We present the impact of inception layers on both spatial and temporal convolution stages, highlighting their individual and combined contributions to overall model performance. The advantages of our approach lie in its improved adaptability to signal variability, efficient feature extraction, and better classification performance.]

Comment 2: Please add a detailed description of the advantages and motivation of the proposed method in this article.

Response 2: Thank you for pointing it out. We agree with this comment. Therefore, we have added the motivation of the proposed method. The changes can be found in Section 1, on page 2, lines 78–82. [The motivation behind our approach stems from the need to improve the adaptability of P300 classification across diverse datasets and individuals. Variations in signal characteristics require a more flexible feature extraction framework that traditional CNN architectures struggle to provide. By introducing inception layers before each convolutional layer, our method captures both fine-grained and high-level features across multiple scales.]

Advantages are provided in Section 1, on page 3, lines 91–99. [The advantages of our approach lie in its improved adaptability to signal variability, efficient feature extraction, and better classification performance. The Inception layer, originally part of the GoogLeNet architecture, facilitates efficient feature extraction and dimensionality reduction for image recognition tasks in a CNN [28]. It enhances the discriminative power of the CNN for the P300 detection task, improving both classification and character recognition accuracy. Our findings indicate that utilizing the inception layer before each spatial and temporal convolutional layer significantly enhances the CNN's ability to detect P300 compared to existing methods. These improvements demonstrate the potential of our approach to advance BCI systems.]

Comment 3: The results indicate that the proposed method does not outperform other comparison methods on all indicators, and the quantitative and qualitative analysis of the experimental results is insufficient. The authors are requested to provide additional explanations for this observation and discuss the limitations.

Response 3: Thank you for pointing it out. We agree with this comment. Previous studies on P300 speller-based BCI systems have compared their results with benchmark papers, primarily focusing on F1-Score for the classification of P300 and non-P300, character recognition accuracy, and ITR. Therefore, in our study, we have also included these metrics for comparison. Additionally, we have elaborated on the results comparison in the discussion section, which can be found on page 14, lines 389–398. [An Inception-CNN architecture was employed to facilitate multi-scale feature extraction from P300 signals. Convolutional filters of varying sizes effectively captured diverse frequency and temporal characteristics, crucial for distinguishing between P300 and non-P300 signals. The Inception-CNN achieved F1-scores of 47.14%, 55.28%, and 78.94% on datasets BCI IIIA, BCI IIIB, and BCI II, respectively, surpassing previous methodologies [20,22]. Improved character recognition accuracy was observed, requiring 11 epochs for BCI IIIA and 5 epochs for BCI IIIB, outperforming earlier approaches [9,19,20,22,23,24]. At 10 and 15 epochs, the Inception-CNN exceeded previous performance benchmarks for BCI IIIA and BCI IIIB [11,16,17,24,36,37,38]. For dataset BCI II, lower error rates were achieved during the initial three epochs, along with higher information transfer rates (ITR) at the second epoch, demonstrating superior single-trial accuracy [11,19,22,23,39,40]. The multi-scale feature extraction capability of the inception layer significantly enhanced classification accuracy and character recognition performance by capturing distinct frequency and temporal characteristics of P300 signals. This integration underscores the potential of Inception-CNN for practical P300-based speller systems.]
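
For reference, the ITR values cited in this comparison are conventionally computed from Wolpaw's definition (a standard formula in the BCI literature, restated here rather than quoted from the manuscript), where N is the number of selectable symbols (36 for the speller), P is the probability of a correct selection, and T is the time per selection in seconds:

```latex
B = \log_2 N + P \log_2 P + (1 - P)\,\log_2\!\left(\frac{1 - P}{N - 1}\right)
\qquad
\mathrm{ITR} = B \times \frac{60}{T}\ \text{bits/min}
```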

We have also added the limitations on page 15, lines 410–430. [Limitations and Future Directions: Inception-CNN demonstrates impressive performance, particularly in P300-based speller systems, by effectively capturing multi-scale features for accurate character recognition. However, its dual Inception modules introduce considerable computational complexity, resulting in higher resource consumption and longer training times, which may limit real-time deployment in resource-constrained environments. Despite these trade-offs, the computational cost is often justified where high accuracy is paramount. EEG-Inception's strength lies in its ability to extract meaningful features from EEG signals, though its interpretability remains a challenge. The model’s feature-learning process is not yet fully understood, and the integration of explainable AI techniques could provide valuable insights, optimizing its architecture and improving performance in future applications. Currently, Inception-CNN has been evaluated on specific datasets, such as dataset II of BCI Competition III (subjects A and B) and dataset IIb of BCI Competition II. However, it has yet to be tested on other ERP paradigms, including miniature asymmetrical or motion visual evoked potentials, which present distinct neural response patterns. These paradigms may challenge the model’s generalization. Nevertheless, the adaptive design of Inception-CNN, with fine-tuning and additional training, shows potential to overcome these challenges and deliver robust performance across a range of ERP types. Expanding the model’s application to a broader range of datasets and paradigms, along with integrating adaptive learning techniques and real-time testing, could significantly enhance its flexibility, accuracy, and practical applicability in real-world scenarios.]

Comment 4: The related work section is not sufficiently comprehensive, and the comparison algorithms in the article lack comparisons with state-of-the-art methods. Please cite the latest literature and provide a thorough discussion.

Response 4: Thank you for pointing it out. We agree with this comment. Therefore, we have added some recent studies in the Introduction (Section 1), on page 2, lines 59–61. Additionally, reference paper [24] has been included in Table 6 on page 11 and Table 7 on page 12 for comparison with our findings. [Similarly, the WE-SPSQ-CNN model enhances classification accuracy and signal-to-noise ratio by employing a weighted ensemble of spatio-sequential architectures [24,25].

24. Shukla, P.K.; Cecotti, H.; Meena, Y.K. Towards Effective Deep Neural Network Approach for Multi-Trial P300-based Character Recognition in Brain-Computer Interfaces. arXiv Preprint 2024, arXiv:2410.08561.

25. Liu, M.; Shi, W.; Zhao, L.; Beyette, F.R., Jr. Best performance with fewest resources: unveiling the most resource-efficient convolutional neural network for P300 detection with the aid of Explainable AI. Machine Learning with Applications 2024, 16, 100542.]

Related work has also been added in the Discussion (Section 5), page 14, lines 375–382. [Recent studies introduced methodologies like converting EEG signals into visually interpretable images [41], addressing cross-participant variability with an Attention Domain Adversarial Neural Network (OADANN) [42], optimizing preprocessing and transfer learning using a CNN-based classifier, P3CNET [43], and minimizing stimulus reliance with a spatial-temporal neural network (STNN) [44]. However, these studies were excluded from the comparison due to the use of different datasets. The current comparison focuses on studies utilizing dataset II of BCI Competition III (subjects A and B) and dataset IIb of BCI Competition II.

41. Ail, B.E.; Ramele, R.; Gambini, J.; Santos, J.M. An intrinsically explainable method to decode P300 waveforms from EEG signal plots based on convolutional neural networks. Brain Sciences 2024, 14, 836.

42. Li, S.; Daly, I.; Guan, C.; Cichocki, A.; Jin, J. Inter-participant transfer learning with attention-based domain adversarial training for P300 detection. Neural Networks 2024, 180, 106655.

43. Daǧ, I.; Dui, L.G.; Ferrante, S.; Pedrocchi, A.; Antonietti, A. Leveraging deep learning techniques to improve P300-based brain-computer interfaces. IEEE Journal of Biomedical and Health Informatics 2022, 26, 4892–4902.

44. Zhang, Z.; Yu, X.; Rong, X.; Iwata, M. Spatial-temporal neural network for P300 detection. IEEE Access 2021, 9, 163441–163455.]

 

Comment 5: The use of formulas and parameters in the article is very confusing, which causes great interference to reading work. Please carefully check the use of parameters in each formula and ensure that the parameters in the formula are consistent with the parameters explained later, such as formula (3) and formula (7). Please check in detail, for example, the "N" in line 303 and the "N" in line 306.

Response 5: Thank you for pointing it out. We agree with this comment. Formula 3 has been rewritten on page 8 for clarity. Additionally, Formula 7 has been revised on page 10, and the description on lines 313–317 has been updated to ensure better clarity and understanding.

Regarding N: it represents the number of classification problems, as discussed in Section 2.2, page 4, lines 120–142. The first task is a binary classification aimed at distinguishing between P300 and non-P300 signals. The second task utilizes the P300 predictions to identify rows and columns in a P300 speller, which consists of 36 characters. Therefore, N = 36 corresponds to the total number of classification tasks required to recognize the specific character on which the subject is focused.
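
For illustration, a minimal sketch of this second task follows. The 6 × 6 character layout is the standard Farwell–Donchin speller matrix used in the BCI Competition datasets; the probability values in the usage example are invented for the demonstration:

```python
import numpy as np

# Standard 6x6 P300 speller matrix (N = 36 characters).
MATRIX = [list("ABCDEF"), list("GHIJKL"), list("MNOPQR"),
          list("STUVWX"), list("YZ1234"), list("56789_")]

def decode_character(row_probs, col_probs):
    """Return the character at the intersection of the row and column
    flashes with the highest aggregated P300 probability."""
    return MATRIX[int(np.argmax(row_probs))][int(np.argmax(col_probs))]

# Example: row 1 and column 2 score highest -> MATRIX[1][2] == 'I'.
print(decode_character([0.1, 0.8, 0.2, 0.1, 0.1, 0.1],
                       [0.1, 0.1, 0.7, 0.2, 0.1, 0.1]))
```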

Comment 6: Please review each figure and table, ensuring that necessary legends are added to enhance readability, and fully explain the specific meaning of each parameter and function used in the image and the meaning represented by the graph, such as the meaning of “FM” in Figure 9, the meaning represented by the circle in the image, the meaning of "t Stat" in Table 7, etc.

Response 6: Thank you for pointing it out. We agree with this comment. The captions of Figure 3 (page 6), Figure 4 (page 7), Table 3 (page 9), and Table 8 (page 12) have been updated.

Comment 7: Please demonstrate the computational complexity of the method.

Response 7: Thank you for pointing that out. The limitations, including those related to computational complexity, are addressed in Section 7, on page 15, lines 411–416. [Inception-CNN demonstrates impressive performance, particularly in P300-based speller systems, by effectively capturing multi-scale features for accurate character recognition. However, its dual Inception modules introduce considerable computational complexity, resulting in higher resource consumption and longer training times, which may limit real-time deployment in resource-constrained environments. Despite these trade-offs, the computational cost is often justified where high accuracy is paramount.]
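
As a supplement to this response, one minimal way to quantify the stated complexity trade-off is to report the model's parameter count and measured inference latency. The sketch below assumes a compiled Keras model; the function and variable names are illustrative, not the authors' code:

```python
import time
import tensorflow as tf

def report_complexity(model: tf.keras.Model, sample_batch) -> None:
    """Print parameter count and per-batch inference latency."""
    print(f"Total parameters: {model.count_params():,}")
    start = time.perf_counter()
    model.predict(sample_batch, verbose=0)
    print(f"Inference latency: {time.perf_counter() - start:.3f} s/batch")
```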

Comment 8: The introduction about the proposed method is vague, which may cause some confusion for readers. Please add a pseudocode table to explain the proposed method.

Response 8: Thank you for pointing that out. To improve the clarity of the proposed method, a new table (Table 1) has been added on page 5, and additional descriptions have been included in Section 3 on page 4, lines 144–151, to make the method more understandable. [Table 1 outlines the steps involved in the process for BCI IIIA, BCI IIIB, and BCI II, including descriptions and pseudocode. The process begins with data preprocessing, which involves EEG data loading, filtering, epoch extraction, and downsampling. Next, a CNN is proposed and trained to detect P300 signals. The system then aggregates epoch probabilities for P300 detection and maps the maximum-probability rows and columns to decode characters. Finally, accuracy evaluation is performed using the F1-score, precision, and recall to comprehensively assess the model's effectiveness and reliability in detecting P300 signals and character recognition.]
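
To make those pipeline steps concrete, here is a hedged sketch of the preprocessing and evaluation stages. The filter band, filter order, epoch window, and downsampling factor are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate
from sklearn.metrics import f1_score, precision_score, recall_score

def preprocess(raw, fs, onsets, band=(0.1, 20.0), factor=2, win=0.65):
    """Band-pass filter raw EEG (channels x samples), extract
    stimulus-locked epochs at the given onset samples, downsample."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, raw, axis=1)
    n_samples = int(win * fs)
    epochs = np.stack([filtered[:, s:s + n_samples] for s in onsets])
    return decimate(epochs, factor, axis=2)  # (trials, channels, time)

def evaluate(y_true, p300_probs, threshold=0.5):
    """F1-score, precision, and recall for binary P300 detection."""
    y_pred = (p300_probs >= threshold).astype(int)
    return (f1_score(y_true, y_pred),
            precision_score(y_true, y_pred),
            recall_score(y_true, y_pred))
```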

Comment 9: The readability of this article is poor. The entire manuscript must be revised in English. The grammar of the entire manuscript must be improved. For example:

1) The “which” in line 298 refers to something unclear.

2) The “Where” in line 303 should be in lowercase letters.

Response 9: Thank you for pointing that out.

The issue regarding the comment ("The 'which' in line 298 refers to something unclear.") has been resolved by integrating it into the paragraph. It is now clear in Section 4.2, on page 11, line 341.

The issue regarding the comment ("The 'Where' in line 303 should be in lowercase letters.") was already addressed and is corrected in Section 4.2, on page 12, line 346. 

Comment 10: Please review the citation format of the references for accuracy and consistency.

Response 10: Thank you for your comment. We have carefully reviewed all citations and made their format consistent with the journal's requirements.

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for sharing your work. The article clearly outlines the limitations of current methods for BCIs based on P300 and effectively presents the Inception-CNN model as a solution. The explanation of the two complementary models—Inception-CNN-S and Inception-CNN-T—and their purpose enhances the clarity of the research approach. I appreciate how the experimental results are highlighted, demonstrating the Inception-CNN's superiority in detection accuracy and character recognition. Overall, this article makes the methodology and findings easy to understand and emphasizes the significance of the proposed improvements.

 

Here are some suggestions to improve the article:

 

i. Please arrange the keywords in alphabetical order.

ii. Add a sentence in the introduction that outlines how the paper is structured in the following sections.

iii. Including 2-3 more related works from the past five years would be beneficial.

iv. Please provide basic information about Google Colab by adding 2-3 sentences.

v. Please consider including a Convolutional Neural Network (CNN) architecture diagram and essential information before introducing Figure 3, which shows the Inception-CNN architecture.

vi. Please make sure to mention the training and testing split ratio.

vii. Please discuss the limitations of the proposed technique.

viii. Please proofread the entire article for clarity and correctness.

Comments on the Quality of English Language

Please proofread the entire article; thanks.

Author Response

Comment 1: Please arrange the keywords in alphabetical order.

Response 1: Thank you for pointing it out. We have made the revisions on page 1, lines 18–19. [Keywords: Brain-Computer Interface; Convolutional Neural Networks; Event-Related Potential; Inception; Multi-Scale; P300 detection]

Comment 2: Add a sentence in the introduction that outlines how the paper is structured in the following sections.

Response 2: Thank you for pointing it out. We have added the revisions in Section 1, on page 3, lines 100–103. [This paper is organized as follows: Section 2 provides an overview of the P300 wave, the oddball paradigm, and the associated classification challenges. Section 3 details the dataset utilized, the preprocessing steps applied, and the proposed CNN architecture. Finally, Section 4 presents the experimental results.]

Comment 3: Including 2-3 more related works from the past five years would be beneficial.

Response 3: Thank you for pointing it out. We agree with this comment. Therefore, we have added some recent work in the Introduction (Section 1), page 2, lines 59–61. [Similarly, the WE-SPSQ-CNN model enhances classification accuracy and signal-to-noise ratio by employing a weighted ensemble of spatio-sequential architectures [24,25].

24. Shukla, P.K.; Cecotti, H.; Meena, Y.K. Towards Effective Deep Neural Network Approach for Multi-Trial P300-based Character Recognition in Brain-Computer Interfaces. arXiv Preprint 2024, arXiv:2410.08561.

25. Liu, M.; Shi, W.; Zhao, L.; Beyette, F.R., Jr. Best performance with fewest resources: unveiling the most resource-efficient convolutional neural network for P300 detection with the aid of Explainable AI. Machine Learning with Applications 2024, 16, 100542.]

Related work has also been added in the Discussion (Section 5), page 14, lines 375–382. [Recent studies introduced methodologies like converting EEG signals into visually interpretable images [41], addressing cross-participant variability with an Attention Domain Adversarial Neural Network (OADANN) [42], optimizing preprocessing and transfer learning using a CNN-based classifier, P3CNET [43], and minimizing stimulus reliance with a spatial-temporal neural network (STNN) [44]. However, these studies were excluded from the comparison due to the use of different datasets. The current comparison focuses on studies utilizing dataset II of BCI Competition III (subjects A and B) and dataset IIb of BCI Competition II.

41. Ail, B.E.; Ramele, R.; Gambini, J.; Santos, J.M. An intrinsically explainable method to decode P300 waveforms from EEG signal plots based on convolutional neural networks. Brain Sciences 2024, 14, 836.

42. Li, S.; Daly, I.; Guan, C.; Cichocki, A.; Jin, J. Inter-participant transfer learning with attention-based domain adversarial training for P300 detection. Neural Networks 2024, 180, 106655.

43. Daǧ, I.; Dui, L.G.; Ferrante, S.; Pedrocchi, A.; Antonietti, A. Leveraging deep learning techniques to improve P300-based brain-computer interfaces. IEEE Journal of Biomedical and Health Informatics 2022, 26, 4892–4902.

44. Zhang, Z.; Yu, X.; Rong, X.; Iwata, M. Spatial-temporal neural network for P300 detection. IEEE Access 2021, 9, 163441–163455.]

 

Comment 4: Please provide basic information about Google Colab by adding 2-3 sentences.

Response 4: Thank you for pointing it out. We have added the revisions in Section 3.5, on page 9, lines 273–277. [Google Colab offers a cloud-based platform with free access to GPUs, enabling efficient model training without requiring local high-end hardware. Additionally, it facilitates seamless integration with libraries like TensorFlow and Keras, streamlining the development and experimentation process.]

Comment 5:  Please consider including a Convolutional Neural Network (CNN) architecture diagram and essential information before introducing Figure 3, which shows the Inception-CNN architecture.

Response 5: Thank you for pointing it out. Figures 4 and 5 illustrate the Convolutional Neural Network (CNN), which follows a basic CNN architecture with the addition of an inception layer. This architecture is explained layer by layer in Section 3.4. To provide readers with a clearer understanding of the fundamental CNN structure, we have made some revisions in Section 3.4, on page 7, lines 223–226. [Typical CNN architectures consist of convolutional layers to perform feature extraction, pooling layers to reduce feature dimensionality, and fully connected layers for final classification. The convolutional layers identify important patterns, pooling layers simplify the data representation, and fully connected layers integrate these features for accurate predictions.]
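
To complement this description, here is a hedged Keras sketch of an inception-style block placed before a spatial convolution, in the spirit of the architecture in Figure 3. The filter counts, kernel sizes, and the 64-channel × 240-sample input shape are assumptions for illustration, not the paper's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, filters=8):
    """Parallel convolutions at several kernel sizes, concatenated so
    the next layer sees multi-scale features."""
    b1 = layers.Conv2D(filters, (1, 1), padding="same", activation="relu")(x)
    b2 = layers.Conv2D(filters, (1, 3), padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, (1, 5), padding="same", activation="relu")(x)
    b4 = layers.MaxPooling2D((1, 3), strides=1, padding="same")(x)
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])

inputs = tf.keras.Input(shape=(64, 240, 1))  # channels x time x 1 (assumed)
x = inception_block(inputs)                  # multi-scale features
x = layers.Conv2D(16, (64, 1), activation="relu")(x)  # spatial convolution
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.5)(x)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)    # P300 vs. non-P300
model = tf.keras.Model(inputs, outputs)
```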

Comment 6: Please make sure to mention the training and testing split ratio.

Response 6: Thank you for pointing it out. However, we did not split the data into training and testing sets ourselves. The data is already pre-divided into training and testing sets in dataset II of BCI Competition III (Subject A and Subject B) and dataset IIb of BCI Competition II.

Comment 7: Please discuss the limitations of the proposed technique.

Response 7: Thank you for pointing this out. We have added the limitations on page 15, lines 410–430. [Limitations and Future Directions: Inception-CNN demonstrates impressive performance, particularly in P300-based speller systems, by effectively capturing multi-scale features for accurate character recognition. However, its dual Inception modules introduce considerable computational complexity, resulting in higher resource consumption and longer training times, which may limit real-time deployment in resource-constrained environments. Despite these trade-offs, the computational cost is often justified where high accuracy is paramount. EEG-Inception's strength lies in its ability to extract meaningful features from EEG signals, though its interpretability remains a challenge. The model’s feature-learning process is not yet fully understood, and the integration of explainable AI techniques could provide valuable insights, optimizing its architecture and improving performance in future applications. Currently, Inception-CNN has been evaluated on specific datasets, such as dataset II of BCI Competition III (subjects A and B) and dataset IIb of BCI Competition II. However, it has yet to be tested on other ERP paradigms, including miniature asymmetrical or motion visual evoked potentials, which present distinct neural response patterns. These paradigms may challenge the model’s generalization. Nevertheless, the adaptive design of Inception-CNN, with fine-tuning and additional training, shows potential to overcome these challenges and deliver robust performance across a range of ERP types. Expanding the model’s application to a broader range of datasets and paradigms, along with integrating adaptive learning techniques and real-time testing, could significantly enhance its flexibility, accuracy, and practical applicability in real-world scenarios.]

Comment 8: Please proofread the entire article for clarity and correctness.

Response 8: Thank you for your suggestion. The article has been thoroughly proofread for clarity and correctness.

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript is well-structured and presents advancements with the Inception-CNN model for P300 detection, but could benefit from clearer objectives, updated references, discussion on generalization, practical implications, and inclusion of computational efficiency metrics.


Some specific comments include:
- The introduction mentions the limitations of current models but does not clearly articulate how Inception-CNN addresses these limitations
- Numbers in tables should be aligned to the right
- Mention limitations and future directions
- List of references seems outdated, with the most recent one being from 2022
- Consider providing the code or trained model weights to encourage reproducibility

Since the data is temporal, wouldn't it make more sense to use a transformer instead of CNN-based models? Transformers can process multi-channel EEG.

Author Response

Comment 1: The introduction mentions the limitations of current models but does not clearly articulate how Inception-CNN addresses these limitations.

Response 1: Thank you for pointing it out. We agree with this comment. Therefore, we have added a description in Section 1, pages 2–3, lines 78–99: [The motivation behind our approach stems from the need to improve the adaptability of P300 classification across diverse datasets and individuals. Variations in signal characteristics require a more flexible feature extraction framework that traditional CNN architectures struggle to provide. By introducing inception layers before each convolutional layer, our method captures both fine-grained and high-level features across multiple scales. Our contributions include: (1) We present a CNN architecture that integrates inception layers before both spatial and temporal convolutional layers, enhancing the ability to extract multi-scale features effectively. (2) The inception layers allow simultaneous extraction of fine and coarse details, leading to better adaptability to variations in EEG signals across individuals and sessions. (3) We conduct a thorough comparative evaluation with existing state-of-the-art methods, demonstrating significant improvements in classification accuracy and robustness. (4) We present the impact of inception layers on both spatial and temporal convolution stages, highlighting their individual and combined contributions to overall model performance. The advantages of our approach lie in its improved adaptability to signal variability, efficient feature extraction, and better classification performance. The Inception layer, originally part of the GoogLeNet architecture, facilitates efficient feature extraction and dimensionality reduction for image recognition tasks in a CNN [28]. It enhances the discriminative power of the CNN for the P300 detection task, improving both classification and character recognition accuracy. Our findings indicate that utilizing the inception layer before each spatial and temporal convolutional layer significantly enhances the CNN's ability to detect P300 compared to existing methods. These improvements demonstrate the potential of our approach to advance BCI systems.]

Comment 2: Numbers in tables should be aligned to the right.

Response 2: Thank you for pointing that out. After reviewing the MDPI formatting guidelines, we found that table content, including numbers, should be center-aligned. We have applied center alignment to all tables accordingly.

Comment 3: Mention limitations and future directions.

Response 3: Thank you for pointing this out. We have added the limitations on page 15, lines 410–430. [Limitations and Future Directions: Inception-CNN demonstrates impressive performance, particularly in P300-based speller systems, by effectively capturing multi-scale features for accurate character recognition. However, its dual Inception modules introduce considerable computational complexity, resulting in higher resource consumption and longer training times, which may limit real-time deployment in resource-constrained environments. Despite these trade-offs, the computational cost is often justified where high accuracy is paramount. EEG-Inception's strength lies in its ability to extract meaningful features from EEG signals, though its interpretability remains a challenge. The model’s feature-learning process is not yet fully understood, and the integration of explainable AI techniques could provide valuable insights, optimizing its architecture and improving performance in future applications. Currently, Inception-CNN has been evaluated on specific datasets, such as dataset II of BCI Competition III (subjects A and B) and dataset IIb of BCI Competition II. However, it has yet to be tested on other ERP paradigms, including miniature asymmetrical or motion visual evoked potentials, which present distinct neural response patterns. These paradigms may challenge the model’s generalization. Nevertheless, the adaptive design of Inception-CNN, with fine-tuning and additional training, shows potential to overcome these challenges and deliver robust performance across a range of ERP types. Expanding the model’s application to a broader range of datasets and paradigms, along with integrating adaptive learning techniques and real-time testing, could significantly enhance its flexibility, accuracy, and practical applicability in real-world scenarios.]

 

Comment 4: List of references seems outdated, with the most recent one being from 2022.

Response 4: Thank you for pointing it out. We agree with this comment. Therefore, we have added some recent work in the Introduction (Section 1), page 2, lines 59–61. [Similarly, the WE-SPSQ-CNN model enhances classification accuracy and signal-to-noise ratio by employing a weighted ensemble of spatio-sequential architectures [24,25].

24. Shukla, P.K.; Cecotti, H.; Meena, Y.K. Towards Effective Deep Neural Network Approach for Multi-Trial P300-based Character Recognition in Brain-Computer Interfaces. arXiv Preprint 2024, arXiv:2410.08561.

25. Liu, M.; Shi, W.; Zhao, L.; Beyette, F.R., Jr. Best performance with fewest resources: unveiling the most resource-efficient convolutional neural network for P300 detection with the aid of Explainable AI. Machine Learning with Applications 2024, 16, 100542.]

Related work has also been added in the Discussion (Section 5), on page 14, lines 375–382. [Recent studies introduced methodologies like converting EEG signals into visually interpretable images [41], addressing cross-participant variability with an Attention Domain Adversarial Neural Network (OADANN) [42], optimizing preprocessing and transfer learning using a CNN-based classifier, P3CNET [43], and minimizing stimulus reliance with a spatial-temporal neural network (STNN) [44]. However, these studies were excluded from the comparison due to the use of different datasets. The current comparison focuses on studies utilizing dataset II of BCI Competition III (subjects A and B) and dataset IIb of BCI Competition II.

41. Ail, B.E.; Ramele, R.; Gambini, J.; Santos, J.M. An intrinsically explainable method to decode P300 waveforms from EEG signal plots based on convolutional neural networks. Brain Sciences 2024, 14, 836.

42. Li, S.; Daly, I.; Guan, C.; Cichocki, A.; Jin, J. Inter-participant transfer learning with attention-based domain adversarial training for P300 detection. Neural Networks 2024, 180, 106655.

43. Daǧ, I.; Dui, L.G.; Ferrante, S.; Pedrocchi, A.; Antonietti, A. Leveraging deep learning techniques to improve P300-based brain-computer interfaces. IEEE Journal of Biomedical and Health Informatics 2022, 26, 4892–4902.

44. Zhang, Z.; Yu, X.; Rong, X.; Iwata, M. Spatial-temporal neural network for P300 detection. IEEE Access 2021, 9, 163441–163455.]

 

Comment 5: Consider providing the code or trained model weights to encourage reproducibility.

Response 5: The code and trained weights can be provided to anyone on request.

 

Comment 6: Since the data is temporal, wouldn't it make more sense to use a transformer instead of CNN-based models? Transformers can process multi-channel EEG.

Response 6: Thank you for pointing it out. We explored the use of Transformers in our study, but they did not yield significant improvements for our specific problem. This is because our task involves more than just detecting P300 signals; it also involves utilizing the predicted P300 responses to accurately identify the target character in the speller matrix.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors did not provide point-by-point responses to my review comments. In addition, the revised manuscript did not highlight the revised content, and it is hard for me to follow the revised sections. I cannot recommend this manuscript for publication. For example, the authors did not respond to my concerns as follows.

 

1. The qualitative and quantitative analysis of the experimental results in the article does not meet the requested revision criteria.

2. The expression of parameters in the article is incorrect. Does the parameter N described in line 394 refer to the number of classifications mentioned earlier, or to the parameter 'n' in the expression for T?

3. The author should demonstrate the computational complexity of the entire algorithm.

4. The author should present a pseudocode table of the methods mentioned in the article.

5. The algorithm proposed in the article did not demonstrate real-time performance.

 

 

 

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Minor comments:

- The word "significant" and its derivatives should only be used in the context of differences validated by statistical analysis
- The fact that "The code and trained weights can be provided to anyone on request" should be mentioned in the paper
- The experiments made with transformers should also be briefly mentioned in the paper

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

I have no comments.
