Article

BCINetV1: Integrating Temporal and Spectral Focus Through a Novel Convolutional Attention Architecture for MI EEG Decoding

1 School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
2 Modern Educational Technology Center, Hainan Medical University, Haikou 571199, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(15), 4657; https://doi.org/10.3390/s25154657
Submission received: 9 June 2025 / Revised: 16 July 2025 / Accepted: 24 July 2025 / Published: 27 July 2025
(This article belongs to the Special Issue Advanced Biomedical Imaging and Signal Processing)

Abstract

Motor imagery (MI) electroencephalograms (EEGs) are pivotal recordings of the cortical activity elicited during imagined motor actions and are widely leveraged for brain–computer interface (BCI) system development. However, effective decoding of MI EEG signals is often hindered by flawed signal-processing pipelines, deep learning methods that lack clinical interpretability, and highly inconsistent performance across datasets. We propose BCINetV1, a new framework for MI EEG decoding that addresses these challenges. BCINetV1 comprises three innovative components: a temporal convolution-based attention block (T-CAB) and a spectral convolution-based attention block (S-CAB), both driven by a new convolutional self-attention (ConvSAT) mechanism that identifies key non-stationary temporal and spectral patterns in the EEG signals, and a squeeze-and-excitation block (SEB) that intelligently combines the identified tempo-spectral features for accurate, stable, and contextually aware MI EEG classification. Evaluated on four diverse datasets containing 69 participants in total, BCINetV1 consistently achieved the highest average accuracies of 98.6% (Dataset 1), 96.6% (Dataset 2), 96.9% (Dataset 3), and 98.4% (Dataset 4). This research demonstrates that BCINetV1 is computationally efficient, extracts clinically vital markers, effectively handles the non-stationarity of EEG data, and shows a clear advantage over existing methods, marking a significant step forward for practical BCI applications.

1. Introduction

A recent World Health Organization (WHO) report revealed that neurological conditions affect over one-third of the global population, exceeding 3 billion people, with stroke alone impacting approximately 7.8 million individuals worldwide [1]. Such conditions frequently lead to motor impairments that severely compromise the daily functioning of affected individuals. Brain–computer interfaces (BCIs) present a promising alternative to potentially restore motor capabilities by translating neural activity directly into control signals for external devices, thereby bypassing damaged neuromuscular pathways. Consequently, BCIs hold transformative potential in neurorehabilitation, assistive technology, and as communication aids for individuals with motor disabilities [2]. To harness these neural signals, BCI systems employ a range of neuroimaging modalities. These are broadly categorized into invasive methods, which require surgical implantation, such as electrocorticography (ECoG), stereoelectroencephalography (sEEG), and microelectrode arrays (MEAs). Conversely, non-invasive techniques acquire brain signals externally, including functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), functional near-infrared spectroscopy (fNIRS), and electroencephalography (EEG) [3].
Among these, EEG is particularly prominent for real-time BCI applications due to its high temporal resolution, portability, affordability, and inherent safety. Within the EEG-based BCI landscape, common paradigms include event-related potentials (ERPs), steady-state visually evoked potentials (SSVEPs), and motor imagery (MI). While ERP and SSVEP paradigms necessitate external stimuli to elicit brain responses, MI is distinguished by its intuitive engagement of the sensorimotor cortex, enabling more naturalistic control without reliance on external stimuli [4]. The term ’MI EEG signals’ refers to the time-series electrical potentials recorded from the scalp that reflect this underlying neural activity during imagined movements. Despite often exhibiting higher performance variability, MI-based BCIs are especially advantageous for motor rehabilitation because they can promote neuroplasticity and necessitate active user participation, both of which are crucial for functional recovery [5].
Decoding MI EEG signals is inherently difficult due to their complex and dynamic nature. These signals are fundamentally nonlinear, meaning the brain’s electrical response is not a simple, direct reflection of the mental task but rather emerges from the complex dynamics of underlying neural networks [6]. Furthermore, these signals are non-stationary, meaning their statistical properties shift unpredictably over time. This is coupled with a strong time dependency, where each data point is influenced by those that came before it. Compounding these issues is the high degree of variability not only between individuals but also across different recording sessions for the same person. To navigate these complexities and distill meaningful features, researchers utilize techniques like wavelet transform (WT) [7], short-time Fourier transform (STFT) [8], empirical mode decomposition (EMD) [9], empirical wavelet transform (EWT) [10], variational mode decomposition (VMD) [11], complex variational mode decomposition (CVMD) [3], and empirical Fourier decomposition (EFD) [12]. These methods are effective in analyzing the frequency-domain characteristics and extracting features from MI EEG signals.
In the past decade, deep learning (DL) has emerged as a formidable force in automated MI task detection, showcasing its impressive potential. The growing appeal of DL algorithms among researchers is attributed to their ability to discern nonlinear characteristics that elude human detection and mathematical description. Among the diverse architectures of deep learning algorithms, convolutional neural networks (CNNs) are widely used for MI EEG signal classification tasks because their shared, hierarchical filters automatically learn local spatial–temporal features with far fewer parameters than simple fully connected deep neural networks. For instance, Zhang et al. [13] proposed a new algorithm named CNN-LSTM, combining convolutional neural networks and long short-term memory networks to simultaneously extract spatial and temporal features from MI EEG signals, achieving an accuracy of 83% on dataset 2a from BCI competition IV [14]. Their proposed method demonstrated a 33–40% relative improvement in kappa score over traditional LDA and FBCSP baselines. Dai et al. [15] introduced a mixed convolutional kernel and proposed a hybrid scale CNN (HS-CNN) architecture, utilizing data augmentation methods for MI EEG signal classification. This approach achieved classification accuracies of 91.57% and 87.6% on two datasets, respectively. Furthermore, they reportedly achieved a 9.5% relative accuracy improvement over FBCSP by using hybrid convolution scales and data augmentation.
A foundational benchmark in the CNN domain is EEGNet, a compact convolutional neural network proposed by Lawhern et al. [16]. Unlike deeper, more complex architectures, EEGNet was designed for generalizability across different BCI paradigms (P300, ERP, MI) and for computational efficiency, making it suitable for on-device and embedded applications. Its architecture is notable for emulating traditional signal processing steps within a CNN framework. It begins with a temporal convolution to act as a learnable filter bank, followed by a depthwise convolution to learn spatial filters for each temporal feature map, akin to the Common Spatial Pattern (CSP) algorithm. Finally, a separable convolution block efficiently summarizes and mixes the features for classification. Due to its robustness, efficiency, and strong performance, EEGNet serves as a common baseline and a structural inspiration for many subsequent models in the field. For instance, Deng et al. [17] delved into the relationship between the 1D convolution of EEGNet and discrete wavelet transform (DWT) and proposed the TSGL-EEGNet model based on the temporally constrained sparse group lasso (TSGL) algorithm. Their model achieved average accuracies of 81.34% and 88.89% on the datasets 2a and IIIa from BCI competition IV [14], respectively. Their method showed a 20.1% relative accuracy improvement over FBCSP.
With the profound understanding and analysis of the characteristics of MI EEG signals, temporal convolutional networks (TCNs) have been widely applied for the classification of time-series signals. While a TCN is a type of convolutional neural network, it is set apart by a specific architecture designed for sequence modeling. First, TCNs use causal convolutions, ensuring that the output at time step t depends only on inputs from the past (at or before time t). This preserves the temporal order of the data. Second, they employ dilated convolutions, where the dilation factor increases exponentially with the depth of the network. This allows the receptive field to expand exponentially, enabling the network to efficiently capture very long-range dependencies without requiring an excessive number of layers. Finally, TCNs are typically structured using residual blocks to stabilize the training of these deeper networks. This combination of causality, an efficient large receptive field, and stable training makes the TCN a specialized and powerful architecture for sequential data. Ingolfsson et al. [18] proposed a novel EEG-TCNet, by integrating a TCN with the baseline EEGNet, achieving a classification accuracy of 83.84% on a four-class MI EEG dataset [14]. The EEG-TCNet yielded a 12.1% relative improvement in accuracy over a Riemannian geometry-based classifier. Furthermore, Musallam et al. [19] combined EEGNet with EEG-TCNet to introduce the TCNet-fusion model, achieving an accuracy of 94.41% on the high gamma dataset [20]. Their method demonstrated a 13.6% relative accuracy improvement compared to a standard FBCSP baseline.
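To make the notions of causal and dilated convolutions concrete, the following minimal PyTorch sketch (our own illustration, not taken from EEG-TCNet or TCNet-fusion) shows a TCN-style residual block in which causality is enforced by left-padding and the dilation factor grows with depth:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalDilatedBlock(nn.Module):
    """Illustrative TCN-style residual block (not taken from any cited model).

    Causality is enforced by left-padding followed by a dilated convolution;
    stacking blocks with growing dilation expands the receptive field
    exponentially over the EEG time axis.
    """

    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation      # pad only on the left (causal)
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.pad(x, (self.left_pad, 0))                # no access to future samples
        out = self.relu(self.conv(out))
        return x + out                                    # residual connection


# Dilations 1, 2, 4, 8 give a receptive field of 1 + 2*(1+2+4+8) = 31 samples.
tcn = nn.Sequential(*[CausalDilatedBlock(8, dilation=2 ** d) for d in range(4)])
print(tcn(torch.randn(1, 8, 250)).shape)                  # torch.Size([1, 8, 250])
```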
In recent years, the attention mechanism has begun to play a significant role in the classification tasks of MI EEG signals [21]. In the algorithmic architecture of deep learning, the attention mechanism serves as a method to mimic human cognitive systems. It enables neural networks to automatically focus on the contextual information when processing the input data, thereby enhancing the model’s performance and generalization capabilities. A particularly powerful variant is self-attention, where a signal’s representation is refined by relating its different positions to one another. This makes it exceptionally effective at capturing the long-range dependencies and complex internal relationships found in EEG data [21]. Owing to its advantages such as capturing long-range dependencies, improving information processing efficiency, and robustness, various attention-based classification methods have been introduced in the EEG domain. For instance, Zhang et al. [22] proposed a convolutional recurrent model based on graphs and attention networks to capture spatial and temporal information from MI EEG signals. Amin et al. [23] combined the attention mechanism with LSTM to introduce a novel lightweight deep learning model that achieves high accuracy with fewer parameters and reduced computational time, obtaining average accuracies of 82.8% and 97.1% on dataset 2a from BCI competition IV [14] and the high gamma dataset [20], respectively. Their method resulted in a 21.8% relative accuracy improvement over FBCSP while using fewer parameters. Additionally, Zhang et al. [22] and Amin et al. [23] provided empirical validation, where attention mechanisms consistently boosted accuracy by 19.02–23.62% and 2.6–3.8%, respectively, compared to identical architectures without attention. Furthermore, Huang et al. [24] proposed a novel convolutional neural network combined with the attention mechanism. Their experiments demonstrated that both the Fully Convolutional Network with Temporal Attention (FCN-TA) and Fully Convolutional Network with Spatial Attention (FCN-SA) models outperform the attention-less FCN model, further supporting the enhancing effect of the attention mechanism on neural network performance.
The above literature reveals that various research efforts have been made to propose solutions that are both accurate and focused on significant biomarkers in EEG signals. For a model to be clinically significant or explainable, its decision-making process must be interpretable and validated against known neurophysiological markers, such as event-related desynchronization/synchronization (ERD/ERS) patterns. This stands in contrast to ‘black box’ models where high accuracy might be achieved by learning from dataset-specific artifacts or noise, rather than genuine brain activity. Despite the precision of some methods, significant drawbacks in achieving this clinical relevance persist. We have identified three main categories of challenges in the existing models, as follows.
  • Challenge 1: The reliance on signal decomposition methods to address EEG signal non-stationarity. These methods often suffer from high parametric dependence and intrinsic flaws, including mode mixing, excessive decompositions, boundary issues, difficulty in determining the appropriate number of modes, sensitivity to noise, and high computational costs.
  • Challenge 2: Deep learning approaches, particularly those using CNN-based methods for end-to-end decoding, effectively capture local patterns but struggle to maintain the global time-varying features in non-stationary EEG signals.
  • Challenge 3: Attention-based methods have recently made strides in detecting long-range patterns within signals, yet the precise localization of event time stamps and important spectral variations (ERS/ERD) has not been investigated. While some studies have focused on directly decoding features across EEG’s temporal, spectral, or spatial domains, they have not successfully localized the task-relevant neural signatures.
To overcome these challenges, we propose the Brain–Computer Interface Network: Version One (BCINetV1), a novel approach for EEG signal decoding. The BCINetV1 employs a convolutional attention-based mechanism, meticulously designed to bypass the requirements for non-stationary signal decomposition, focusing instead on pinpointing event-related time stamps and spectral shifts within the EEG signal. The utility of our framework lies in its clinically informed decision-making ability, considering local and global contexts of the EEG data, minimal parametric dependency, independence from specific preprocessing algorithms, and generalization across varied datasets under distinct protocols. The major contributions of our study are as follows:
  • First, we propose BCINetV1, an end-to-end framework featuring a simple yet effective parallel dual branch structure. It comprises three new modules: a temporal convolution-based attention block (T-CAB), a spectral convolution-based attention block (S-CAB), and a squeeze-and-excitation block (SEB). Together, these modules are designed to precisely identify, focus on, and fuse critical tempo-spectral patterns directly from raw EEG signals without manual preprocessing.
  • Second, at the core of the T-CAB and S-CAB modules, we introduce Convolutional Self-Attention (ConvSAT), a novel attention mechanism. ConvSAT innovatively integrates 1D convolution operations into the self-attention framework to synergize the local feature extraction strengths of CNNs with the global contextual modeling of attention. This mechanism performs the heavy lifting, enabling the network to effectively capture both local details and long-range dependencies in non-stationary EEG signals.
  • Third, we conduct a rigorous set of subject-specific experiments on four diverse public datasets, demonstrating that BCINetV1 consistently achieves state-of-the-art performance and stability. Furthermore, by visualizing the learned attention patterns, we show that the model’s decision-making is clinically interpretable and grounded in the identification of established neurophysiological markers like ERD/ERS.
The proposed BCINetV1 addresses the aforementioned challenges in the following ways:
  • To address Challenge 1, BCINetV1 employs an end-to-end architecture where the convolutional layers within the T-CAB and S-CAB modules learn a task-relevant feature decomposition directly from the data, thereby eliminating the need for a separate, parameter-sensitive pre-processing pipeline like VMD or EMD.
  • To overcome Challenge 2, BCINetV1 integrates a novel ConvSAT mechanism that explicitly models long-range temporal dependencies across the entire trial duration, allowing the model to capture the global, time-varying context of the EEG signal that standard, locality-biased CNNs inherently miss.
  • To resolve Challenge 3, BCINetV1 provides direct, interpretable evidence of its decision-making. The attention masks generated by the T-CAB and S-CAB modules explicitly localize critical temporal and spectral events, successfully isolating well-established neurophysiological markers like event-related synchronization (ERS) and bridging the gap between deep learning and clinical interpretability.
The rest of this study is structured as follows: Section 2 outlines the foundational elements of the BCINetV1 model. Section 3 narrates the utilized EEG datasets. Section 4 details the experimental setup of the framework, along with the results and discussions. Section 5 focuses on the limitations and future directions, while Section 6 offers the conclusion of this study.

2. Methodology—BCINetV1 Overview

The BCINetV1 is a streamlined dual-branch network designed to decode MI EEG signals. Its decision-making process adopts an innovative approach, which selectively amplifies task-relevant temporal and spectral instances within the data. These enhanced instances are then extracted as potent features for classification, optimizing the network’s analytical precision. Traditional deep learning approaches often process the entire EEG signal, extracting features indiscriminately and overlooking the specific local and global contexts within the tempo-spectral dimensions. In contrast, the BCINetV1 model introduces a three-tiered architecture comprising the temporal convolution-based attention block (T-CAB), spectral convolution-based attention block (S-CAB), and the squeeze-and-excitation block (SEB). The T-CAB and S-CAB modules meticulously process the input signals while selectively enhancing task-related tempo-spectral responses and minimizing noise interference. This enhanced processing is achieved through the integration of a specifically engineered convolution self-attention (ConvSAT) block within the pipeline, adept at extracting both local and global contextual information from the EEG signal. Subsequently, the adjusted data from both T-CAB and S-CAB are fused in the squeeze-and-excitation block (SEB), which reorganizes the information to synergize the tempo-spectral responses effectively. The resultant features are then refined through a selection process, where less significant ones are discarded, leaving only the most relevant features for the final classification phase.
The operational details of the model will be elaborated in the Results section, which will demonstrate how the T-CAB and S-CAB modules scale the tempo-spectral information in the EEG signal. The specific parameter settings and architecture of the BCINetV1 model are illustrated in Figure 1. A comprehensive breakdown of each processing block within the model is provided as follows.

2.1. Module 1: Temporal Convolution-Based Attention Block (T-CAB)

To amplify the task-related time stamps and attenuate the impact of non-significant instances in the non-stationary EEG signals, we developed the T-CAB module using advanced deep learning techniques. This module comprises six deep learning blocks, each tailored to autonomously pinpoint the time intervals critical to the MI task at hand. First, a temporal depth-wise (DW) convolution block uses Fs/2 1D kernels of size 32 to extract temporal features along individual frequency vectors. The depth-wise operation is favorable because it keeps the number of kernels flexible as a function of the number of frequency components (Fs) in the signal. Next, we normalize the entire batch of extracted features to reduce internal covariate shift in the data. Subsequently, a point-wise convolution operation is performed to compress all frequency components into a single temporal vector (since we strictly want to preserve the time information in the signal). This compressed temporal vector is then forwarded to a novel ConvSAT block, which combines conventional self-attention with a superimposed convolution operation. The significance of the ConvSAT block lies in its ability to combine the utilities of both convolution and self-attention.
The convolution operation plays a crucial role in extracting locally connected features and preserving the spatial hierarchy of signals. However, it falls short in capturing long-range dependencies in time-varying signals. This is where the self-attention mechanism comes into play, enhancing the convolution by deriving contextual information from extensive data ranges. In the ConvSAT block, the process begins by generating Query (Q), Key (K), and Value (V) vectors from the input feature map $\mathrm{Feat}$ (output from the preceding temporal depth-wise convolution and normalization). This is achieved using separate 1D convolutional kernels ($W_q^i$, $W_k^i$, and $W_v^i$, respectively), each of size $k_t$, as shown in Equations (1)–(3):
$$Q(t,1) = \sum_{i=0}^{k_t-1} W_q^i \times \mathrm{Feat}(t+i,\,1) \qquad (1)$$
$$K(t,1) = \sum_{i=0}^{k_t-1} W_k^i \times \mathrm{Feat}(t+i,\,1) \qquad (2)$$
$$V(t,1) = \sum_{i=0}^{k_t-1} W_v^i \times \mathrm{Feat}(t+i,\,1) \qquad (3)$$
Here, $Q(t,1)$, $K(t,1)$, and $V(t,1)$ represent the query, key, and value vectors generated at a specific temporal position $t$ based on a window of the input $\mathrm{Feat}$. These operations effectively create sequences of Q, K, and V vectors.
The core of the self-attention mechanism then involves calculating attention weights. First, a similarity score is computed between each query vector and all key vectors. This is typically accomplished by taking the dot product of a query $Q_i$ (the $i$-th vector in the query sequence) with the transpose of a key $K_j$ (the $j$-th vector in the key sequence). These similarity scores are then normalized using a softmax function to produce the attention weights $\mathrm{AttentionWeights}_{ij}$, indicating how much attention query $i$ should pay to key/value $j$:
$$\mathrm{AttentionWeights}_{ij} = \frac{\exp\left(Q_i \times K_j^{T}\right)}{\sum_{k=1}^{T} \exp\left(Q_i \times K_k^{T}\right)}$$
This softmax function normalizes the scores across all keys for a given query, effectively turning them into probabilities that sum to 1.
Finally, these attention weights are used to compute a weighted sum of the value vectors ($V_j$). This produces the output of the self-attention layer, often called the Attention Score, for each query position $i$. This process ensures that the most pertinent information from the value sequence, as indicated by the attention weights, is emphasized in the final representation:
$$\mathrm{AttentionScore}_{i} = \sum_{j=1}^{T} \mathrm{AttentionWeights}_{ij} \times V_j$$
The significance of this ConvSAT block lies in its ability to combine the local feature extraction strengths of convolutions (in generating Q, K, and V) with the global contextual understanding of self-attention, all while using 1D kernels which can be more parameter-efficient than the 2D weight matrices found in some conventional self-attention mechanisms.
After normalization, these attention scores are multiplied with the base signal to scale the tempo-spectral signal within the data, emphasizing significant time stamps while diminishing the less relevant ones. The output of the T-CAB block is thus a refined tempo-spectral signal with enhanced temporal features.
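For concreteness, the following minimal PyTorch sketch illustrates the ConvSAT computation described above: Q, K, and V are produced by separate 1D convolutions as in Equations (1)–(3), and combined through softmax attention. The class name, default kernel size (the text does not fix $k_t$), and single-channel layout are our own illustrative assumptions, not the released BCINetV1 code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvSAT1D(nn.Module):
    """Sketch of the convolutional self-attention (ConvSAT) described above.

    Q, K and V are produced by separate 1D convolutions (Equations (1)-(3))
    and then combined through softmax attention. The class name, default
    kernel size, and single-channel layout are illustrative assumptions.
    """

    def __init__(self, channels: int = 1, kernel_size: int = 31):
        super().__init__()
        padding = kernel_size // 2           # odd kernel keeps the sequence length
        self.w_q = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.w_k = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.w_v = nn.Conv1d(channels, channels, kernel_size, padding=padding)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (batch, channels, T) -- e.g., the compressed temporal vector
        q, k, v = self.w_q(feat), self.w_k(feat), self.w_v(feat)
        # Similarity of every query position with every key position,
        # normalized by a softmax over the key axis (attention weights).
        scores = torch.matmul(q.transpose(1, 2), k)       # (B, T, T)
        weights = F.softmax(scores, dim=-1)
        # Weighted sum of the value vectors -> attention score per position.
        attn = torch.matmul(weights, v.transpose(1, 2))   # (B, T, C)
        return attn.transpose(1, 2)                       # (B, C, T)


# Toy usage: a single-channel temporal vector of 400 samples.
out = ConvSAT1D()(torch.randn(8, 1, 400))
print(out.shape)   # torch.Size([8, 1, 400])
```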

2.2. Module 2: Spectral Convolution-Based Attention Block (S-CAB)

Following the accurate identification of relevant time stamps using the T-CAB block, we developed an S-CAB module to pinpoint the crucial frequency variations specific to the task at hand. This phase holds clinical significance, as MI activities predominantly manifest within the μ (8–13 Hz) and β (13–30 Hz) bands of the EEG frequency spectrum, necessitating precise frequency localization. During motor imagery of the left/right hand/foot, a noticeable reduction in the spectral power of μ and β rhythms occurs, signifying event-related desynchronization (ERD) within specific regions of the sensorimotor cortex. This reduction is promptly succeeded by a rise in spectral power within these frequency bands, a process known as event-related synchronization (ERS), which marks the motor cortex’s return to its baseline state. Consequently, the accurate detection of these frequency changes is essential for the precise identification of MI EEG tasks.
The S-CAB module initiates its process with a point-wise convolution along the spectral dimension. This is important for extracting Fs/2 local features from the spectral dimension of the signal. Following this, the extracted features undergo batch normalization, and the temporal dimension is compacted through global average pooling. This step is vital as it aims to retain only the spectral information, effectively averaging out the temporal dimension. After this initial processing, the tensor is reshaped from (Fs/2, 1, 1) to (1, Fs/2, 1) to facilitate subsequent operations in the ConvSAT phase. This reshaped tensor is then fed into the spectral ConvSAT block, which is designed to localize the contextual information within the spectral domain. The spectral ConvSAT operates in a manner similar to its temporal counterpart, with the primary difference being its focus on the frequency aspects of the EEG signal rather than the temporal elements. Following the processing in the spectral ConvSAT block, its output is reshaped back to (Fs/2, 1, 1) to align the dimensions with the original signal. Subsequently, a multiplication is performed between the spectral attention scores and the original tempo-spectral signal, adjusting the frequency magnitudes accordingly. This operation ensures that significant frequencies are accentuated, while less relevant ones are attenuated in the final signal, which now encapsulates enhanced spectral information.
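A simplified sketch of the S-CAB data flow is given below. It reuses the ConvSAT1D class from the previous sketch; the exact tensor shapes, the sigmoid normalization of the attention scores, and the choice to re-weight the Fs/2 spectral feature maps are assumptions made for illustration, not the exact BCINetV1 implementation.

```python
import torch
import torch.nn as nn
# Assumes ConvSAT1D from the sketch in Section 2.1 is in scope.


class SCAB(nn.Module):
    """Sketch of the spectral attention path (S-CAB) described above."""

    def __init__(self, fs: int):
        super().__init__()
        half = fs // 2
        self.pointwise = nn.Conv2d(1, half, kernel_size=1)   # Fs/2 spectral feature maps
        self.bn = nn.BatchNorm2d(half)
        self.gap = nn.AdaptiveAvgPool2d((1, 1))               # compact the temporal axis
        self.convsat = ConvSAT1D(channels=1, kernel_size=7)   # spectral ConvSAT

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1, Fs, Ts) -- the 2D tempo-spectral image
        feat = self.bn(self.pointwise(x))                     # (B, Fs/2, Fs, Ts)
        squeezed = self.gap(feat)                             # (B, Fs/2, 1, 1)
        spec = squeezed.view(x.size(0), 1, -1)                # reshape to (B, 1, Fs/2)
        attn = torch.sigmoid(self.convsat(spec))              # spectral attention scores
        attn = attn.view(x.size(0), -1, 1, 1)                 # back to (B, Fs/2, 1, 1)
        return feat * attn                                    # accentuate relevant frequencies


# Toy usage: Fs = 100 frequency bins, Ts = 350 time stamps.
print(SCAB(fs=100)(torch.randn(2, 1, 100, 350)).shape)        # torch.Size([2, 50, 100, 350])
```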

2.3. Module 3: Squeeze and Excitation Block (SEB)

The final block is the SEB, which receives the scaled temporal and spectral representations from the T-CAB and S-CAB modules. The SEB block’s primary function is to eliminate non-essential elements from these scaled tempo-spectral representations and to compress the data’s dimensionality, thereby streamlining it for the final classification task. At first, the data from the T-CAB and S-CAB modules are depth-wise concatenated. Next, the channels in the (Fs, Ts, 1) tensor are rearranged so that the first Fs/2 channels and the last Fs/2 channels are interleaved (the first channel is followed by the (Fs/2 + 1)-th channel, and so on).
This reordering is crucial for aligning the scaled temporal and spectral representations corresponding to the same channel indices. Without this alignment, subsequent feature extraction processes might not accurately retrieve the relevant information. Next, a groupwise convolution operation is performed by grouping consecutive temporal and spectral channels, which helps extract refined features from the interleaved channels. Lastly, a max pooling operation retains the amplified significant features and discards the attenuated ones. This step also reduces the feature dimension by half, which significantly reduces the training time and the risk of the curse of dimensionality. Finally, the resultant features are fed into the classification block for final classification.
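The channel interleaving and grouped convolution described above can be sketched as follows; the kernel size, pooling window, and input shapes are illustrative assumptions rather than the exact BCINetV1 configuration.

```python
import torch
import torch.nn as nn


class SEB(nn.Module):
    """Sketch of the squeeze-and-excitation block (SEB) described above.

    T-CAB and S-CAB outputs are concatenated, interleaved so that matching
    temporal/spectral channels sit next to each other, fused with a grouped
    convolution, and condensed by max pooling.
    """

    def __init__(self, half_channels: int):
        super().__init__()
        fs = 2 * half_channels
        # One group per (temporal, spectral) channel pair.
        self.group_conv = nn.Conv2d(fs, half_channels, kernel_size=(1, 3),
                                    padding=(0, 1), groups=half_channels)
        self.pool = nn.MaxPool2d(kernel_size=(1, 2))   # halves the feature dimension

    def forward(self, t_out: torch.Tensor, s_out: torch.Tensor) -> torch.Tensor:
        # t_out, s_out: (B, Fs/2, H, W) representations from T-CAB and S-CAB
        x = torch.cat([t_out, s_out], dim=1)           # (B, Fs, H, W)
        b, fs, h, w = x.shape
        # Interleave channels: channel i is followed by channel i + Fs/2.
        x = x.view(b, 2, fs // 2, h, w).transpose(1, 2).reshape(b, fs, h, w)
        x = self.group_conv(x)                         # fuse each temporal/spectral pair
        return self.pool(x)                            # keep the strongest responses


# Toy usage with Fs/2 = 4 channels.
print(SEB(4)(torch.randn(2, 4, 10, 20), torch.randn(2, 4, 10, 20)).shape)  # torch.Size([2, 4, 10, 10])
```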

3. Materials and Experimental Protocols

3.1. Datasets

This study utilizes four distinct EEG datasets, encompassing a total of 69 participants, to assess the BCINetV1 model’s adaptability.
  • Dataset 1: BCI Competition III dataset IVa [25] involved five healthy subjects performing two motor imagery tasks (right hand/right foot) across 280 trials each. EEG data was acquired using 118 electrodes (10/20 system) at 1000 Hz, subsequently down-sampled to 100 Hz.
  • Dataset 2: the GigaDB dataset [25] expanded the participant pool significantly with 52 individuals performing binary class (right/left hand) motor imagery tasks. The data was recorded with 64 Ag/AgCl electrodes (10/10 system) over 100–120 trials per task at a 512 Hz sampling rate.
  • Dataset 3: BCI Competition III dataset V [26] introduced more complex tasks with three-class mental imagery (left-hand, right-hand movements, and word association) from three participants. The EEG signals were accumulated using 32 electrodes (10/20 system) sampled at 512 Hz across three sessions.
  • Dataset 4: BCI Competition IV dataset 2a [14] featured nine healthy subjects performing four types of motor imagery (left hand, right hand, feet, and tongue movements), with 288 trials per subject recorded from 22 Ag/AgCl electrodes at 250 Hz over two separate training and testing sessions.
The detailed descriptions of these datasets are omitted here due to space limitations. For further information, readers are encouraged to refer to the referenced studies.

3.2. Experimental Protocols

All the experiments in this study were conducted on a laboratory computer equipped with an Intel® Core™ i9-13900K 5.80 GHz CPU (24 cores, 32 threads) and 32 GB of RAM. The deep learning computations were accelerated by an NVIDIA GeForce RTX 4090 GPU with 24 GB of GDDR6X memory and 16,384 CUDA cores for parallel processing. The model was implemented in Python 3.10 using the PyTorch 2.7.1 deep learning framework.
All experiments followed a 5-fold cross-validation strategy, dividing each EEG dataset into five equal segments. Four segments (80% of the data) were combined to create the training set, while one segment (20% of the data) was held out for model validation. This process was repeated five times so that every segment served once as the validation set. Through empirical assessment, we selected an optimal set of training hyperparameters for the comprehensive analysis: the ADAM optimizer, a learning rate of 1 × 10−4, 10 epochs, and a batch size of 32. In a later part of this study, we present a detailed empirical analysis of the selection of these hyperparameters, derived from exhaustive experimentation with a range of settings. We assessed several loss functions, such as mean squared error (MSE), binary cross-entropy (BCE) loss, hinge loss, Kullback–Leibler (KL) divergence, and Huber loss. However, the classification performance remained consistent across all loss functions, showing minimal variation. Due to its simplicity and efficient processing, we selected the MSE loss function for this study.
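A condensed sketch of this 5-fold protocol with the selected hyperparameters is shown below; `model_fn` is a hypothetical factory returning a fresh network instance, and one-hot labels are assumed because the MSE loss is used.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset
from sklearn.model_selection import KFold


def run_five_fold(model_fn, X, y, device="cpu"):
    """5-fold protocol with the hyperparameters reported above (sketch)."""
    dataset = TensorDataset(X, y)          # X: tempo-spectral inputs, y: one-hot labels
    accuracies = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model = model_fn().to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
        criterion = torch.nn.MSELoss()
        train_loader = DataLoader(Subset(dataset, train_idx.tolist()), batch_size=32, shuffle=True)
        val_loader = DataLoader(Subset(dataset, val_idx.tolist()), batch_size=32)

        model.train()
        for _ in range(10):                                   # 10 training epochs
            for xb, yb in train_loader:
                xb, yb = xb.to(device), yb.to(device)
                optimizer.zero_grad()
                criterion(model(xb), yb).backward()
                optimizer.step()

        model.eval()
        correct = total = 0
        with torch.no_grad():
            for xb, yb in val_loader:
                preds = model(xb.to(device)).argmax(dim=1).cpu()
                correct += (preds == yb.argmax(dim=1)).sum().item()
                total += yb.size(0)
        accuracies.append(correct / total)
    return sum(accuracies) / len(accuracies)
```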
The 1D MI EEG input sequences must be converted into a 2D tempo-spectral image before being fed into the BCINetV1 network. To achieve this, we employed the continuous wavelet transform (CWT), a well-established and robust method, to convert the 1D time-domain signal into a time-frequency (TF) representation. Nevertheless, this approach is neither mandatory nor a focal point of our contribution; alternative methods for TF conversion can also be used. We explored results using other well-established methods, such as the short-time Fourier transform (STFT), empirical mode decomposition (EMD), empirical wavelet transform (EWT), variational mode decomposition (VMD), and empirical Fourier decomposition (EFD). These methods consistently yielded stable classification performance without significant deviations, underscoring that BCINetV1 is largely invariant to the specific 2D TF conversion technique employed.
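As an illustration of this TF conversion step, the following sketch uses the PyWavelets package to obtain a CWT magnitude image from a single-channel trial; the Morlet mother wavelet and the scale grid are our own choices, since the paper does not prescribe them.

```python
import numpy as np
import pywt


def eeg_to_tf_image(trial: np.ndarray, fs: int = 100, wavelet: str = "morl") -> np.ndarray:
    """Convert a 1D EEG trial into a 2D time-frequency image via the CWT.

    The mother wavelet and the scale grid are illustrative choices; the
    scales are set so the pseudo-frequencies roughly span 1 Hz to fs/2 Hz.
    """
    freqs = np.arange(1, fs // 2 + 1)                       # target frequencies in Hz
    scales = pywt.central_frequency(wavelet) * fs / freqs   # corresponding CWT scales
    coeffs, _ = pywt.cwt(trial, scales, wavelet, sampling_period=1.0 / fs)
    return np.abs(coeffs)                                   # (n_freqs, n_samples) magnitude image


# Example: a 3.5 s single-channel trial sampled at 100 Hz.
print(eeg_to_tf_image(np.random.randn(350)).shape)          # (50, 350)
```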
The predictive proficiency of the model was assessed using four benchmark metrics: accuracy, recall, F-score, and the kappa coefficient. Accuracy reflects the model’s capability to correctly classify instances across all classes, providing a comprehensive performance overview. Recall showcases the model’s sensitivity to true positives by capturing its capacity to identify all positive instances. The F-score combines recall with precision, which indicates how many positively predicted instances were actually positive, into a single balanced measure. Lastly, the kappa coefficient accounts for random agreement in predictions, providing a more nuanced measure of model performance beyond chance expectations. The mathematical relations for these metrics are as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{F\text{-}Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
$$\mathrm{Kappa} = \frac{\mathrm{Accuracy} - \gamma_{Acc}}{1 - \gamma_{Acc}}$$
where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative outcomes, respectively. Further, $\gamma_{Acc}$ is the random (chance-level) accuracy, i.e., the baseline level of agreement that would occur by random chance alone, without any predictive power from the model.
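For reference, the sketch below computes these four metrics directly from the confusion-matrix counts for a binary task; the estimation of the chance-level accuracy $\gamma_{Acc}$ from the marginal class frequencies (as in Cohen's kappa) is our assumption, since the paper does not spell it out.

```python
import numpy as np


def classification_metrics(y_true: np.ndarray, y_pred: np.ndarray, positive: int = 1):
    """Compute accuracy, recall, F-score, and kappa for a binary task (sketch)."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * recall / (precision + recall)

    # Chance-level agreement (gamma_Acc) from the marginal class frequencies.
    n = len(y_true)
    gamma_acc = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - gamma_acc) / (1 - gamma_acc)
    return accuracy, recall, f_score, kappa


# Quick check on toy labels.
print(classification_metrics(np.array([1, 0, 1, 1, 0]), np.array([1, 0, 0, 1, 0])))
```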

3.3. State-of-the-Art Comparison Models

A critical aspect of validating any novel neural network architecture is its comparison against existing, well-regarded models. In this context, BCINetV1, which integrates convolutional, multiscale, and attention principles, is compared with a diverse set of state-of-the-art EEG decoding models. These benchmark models, summarized in Table 1, encompass both foundational and advanced designs within the realms of CNNs and attention-based architectures. The subsequent table provides an overview of their working mechanisms, significance, and specific relevance as comparators for BCINetV1.

4. Experimental Results and Discussions

4.1. Results and Discussions for Dataset 1

To maintain a clean and concise initial analysis, we first conduct an in-depth evaluation of the BCINetV1 model exclusively using EEG data from the Dataset 1 cohort. Subsequent sections will then present comparative classification results for Datasets 2, 3, and 4. It is important to note that the detailed findings and trends observed with Dataset 1 are equally representative of the model’s performance across the other datasets.

4.1.1. Five-Fold Classification Performance

This section details the classification efficacy of the BCINetV1 model, assessed via a 5-fold cross-validation on Dataset 1. Figure 2 illustrates the average 5-fold outcome for all subjects of Dataset 1.
Upon detailed examination, the classification performance of the BCINetV1 model presents notable outcomes. The average accuracy rates for subjects AA, AL, AV, AW, and AY stand at 98.2%, 98%, 98.8%, 99%, and 99.4%, respectively. When it comes to recall, the subjects exhibit mean rates of 97.9%, 97.8%, 98.4%, 98.3%, and 97.2%. The f-scores also reflect high performance, averaging at 98.5%, 98%, 97.94%, 98.9%, and 98.6% for the respective subjects. Lastly, the kappa statistics maintain a strong showing with mean values of 95.4%, 95.5%, 97.3%, 97.6%, and 97.8%. The figures indicate a positive correlation among all performance metrics, implying that an improvement in accuracy corresponds to enhancements in recall, f-score, and kappa, suggesting an unbiased classification by the model. The results indicate not only the model’s robustness in correctly identifying and classifying EEG patterns associated with different MI tasks but also its reliability and consistency across various subjects. The exceptional performance metrics highlight the BCINetV1 model’s promise as a valuable asset in BCI research and its practical applications.

4.1.2. Comparison of BCINetV1 with State-of-the-Art Methods

In this section, we delve into a detailed comparative analysis of our BCINetV1 framework. The comparative analysis, as presented in Table 2, is segmented into two primary categories: (1) hybrid signal processing methods that often combine traditional feature extraction with machine learning classifiers, and (2) benchmark deep learning methods, as given in Table 1, which learn features end-to-end. The performance evaluation considers individual subject accuracy, overall average accuracy, and standard deviation.
When compared to hybrid signal processing methods, BCINetV1 demonstrates a clear advantage. The top-performing method in this category, MEWT+JIA+MLP by [43], achieved an average accuracy of 97.0% with a standard deviation of 2.70%. BCINetV1 surpasses this by 1.6% in average accuracy and exhibits significantly better consistency (0.50% std. dev. vs. 2.70%). Other notable hybrid methods, such as LRFS+TSD by [42] (95.2% avg. acc.) and WPD+HOS+SVM by [40] (91.7% avg. acc.), are also outperformed by a considerable margin, highlighting BCINetV1’s superior ability to discern relevant patterns without extensive manual feature engineering.
Next, we compare BCINetV1 to the deep learning models given in Table 1. To ensure a fair comparison, we meticulously reproduced the architectures and simulated the performance of several deep learning models for which specific results on this dataset were not readily available in the existing literature. Within the deep learning methods category, BCINetV1 again takes the lead. Its closest competitor is the LSTM+multi-head Attention method by [47], which reported an impressive average accuracy of 98.2% with a standard deviation of 0.72%. While the average accuracy margin is 0.4%, BCINetV1’s lower standard deviation (0.50% vs. 2.72%) indicates greater reliability across different subjects. Other deep learning approaches like MSFCNN (95.6% avg. acc.), DeepEnsembleNet by [46] (93.6% avg. acc.), and TCANet (93.1% avg. acc.) also fall short of BCINetV1’s performance. Compared to the benchmark EEGNet (60.7% avg. acc.), BCINetV1 achieves a substantial gain of 37.9%, demonstrating the significant advancements incorporated into our model. Furthermore, to formally validate these performance improvements, a comprehensive statistical analysis was conducted, the details of which are provided in the Supplementary Materials (Section S1. Statistical Analysis). This analysis employed the non-parametric Wilcoxon signed-rank test for pairwise comparisons, with p-values adjusted for multiple comparisons using the Benjamini–Hochberg procedure for false discovery rate control. The tests confirm that the superior accuracy of BCINetV1 compared to the top-performing benchmark models is statistically significant (p < 0.05) across all four datasets. This robustly demonstrates that the observed advantages of our proposed framework are not due to chance.
The combination of leading average accuracy and the lowest standard deviation is a key differentiator for BCINetV1. This indicates not only high predictive power but also strong stability, generalization capability, and adaptability, making it less prone to overfitting and more reliable across diverse data scenarios compared to methods with higher performance variance.

4.1.3. Effect of Variational Electrodes Combinations

In this section, we explore the impact of varying the number of EEG channels on the performance of our BCINetV1 model. The careful selection of channels is pivotal in MI EEG classification. This is because motor imagery tasks primarily modulate brain activity in the sensorimotor cortex, a region spanning the primary motor and somatosensory areas. The EEG electrodes located directly over this critical strip, primarily those designated as Central (C) and Centro-Parietal (CP), are known to capture the most discriminative event-related desynchronization/synchronization patterns. Consequently, incorporating channels beyond this key area may introduce extraneous data, potentially confounding the model and increasing computational demands. We evaluated the BCINetV1 model with four distinct channel configurations [25]: a minimal set of 3 channels (C3, CZ, and C4), an expanded set of 8 channels (F3, F4, T7, C3, C4, CZ, T8, and PZ), a broader array of 18 channels (C5, C3, C1, C2, C4, C6, CP5, CP3, CP1, CP2, CP4, CP6, P5, P3, P1, P2, P4, and P6), and the full complement of 118 channels available in the dataset.
As depicted in Figure 3, a clear trend emerges: accuracy improves significantly as more channels are used for training. For instance, the minimal 3-channel configuration results in the lowest accuracies (ranging from 78.9% to 88.6%), while the comprehensive 118-channel setup achieves near-perfect performance, with accuracies consistently exceeding 99% for all subjects. Meanwhile, the 18-channel configuration also delivers strong performance, with accuracy figures ranging between 95.7% and 98.1%, trailing closely behind the 118-channel results. The improvement in accuracy when using 118 channels compared to 18 channels for training is slight, yet the training duration for the full set is six times longer than for 18 channels. Additionally, the 18-channel data are not only time-efficient but also clinically significant, as they are derived exclusively from the motor cortex region. In light of these results, the 18-channel configuration has been selected as the most efficient. This also highlights the BCINetV1 model’s resilience in handling the potential noise from the extraneous information in the 118 channels, maintaining exceptional performance.

4.1.4. Statistical Analysis of Features

In this section, we conduct a statistical analysis of the features extracted by the proposed BCINetV1 model to assess the relevance of the extracted hallmarks and better interpret the model. In this regard, we considered the input EEG data from subject AA of Dataset 1 and extracted 2400 deep features with the BCINetV1 model. Figure 4 illustrates five statistical measures, namely the mean, median, standard deviation, first quartile, and third quartile, for the class 1 and class 2 features. Such statistical metrics are important for assessing the central tendency, distribution, dispersion, and outliers of the features, all of which are vital for class segregation analysis. For better visualization, the statistical values of the class 1 features are arranged in ascending order, and the same ordering is applied to the class 2 features for a fair comparison. It can be seen that the mean and median values of the hallmarks from the two classes are diametrically opposed. This validates that the features from different classes originate from distinct distributions with well-separated central tendency measures. Similarly, the standard deviation depicts an inversely proportional relationship between the two classes, which is yet another indicator that our model is sensitive to the variability in the data, potentially augmenting its predictive accuracy and generalization capability. Furthermore, the analysis of the first and third quartiles for the two classes reveals a consistent pattern of separation, indicating that not only the central values but also the spread and range of the data differ significantly between classes. This separation in the lower and upper quartiles underscores the model’s ability to distinguish between the classes across the entire distribution spectrum, enhancing its classification robustness and reliability.
To further evaluate the extracted features, we performed a pairwise analysis of variance (ANOVA) test to determine the statistical significance (p-value) of each feature pair. Figure 5a shows the correlation map of ANOVA p-values, where the pairwise test is conducted for every combination of the 2400 features, giving a total of 2,878,800 tested combinations. Following common practice, the significance threshold for rejecting the null hypothesis was set to 0.1. For better representation, tests with p-values above the threshold were set to 1 (non-significant, white pixel), and those with p-values below the threshold were set to 0 (significant, black pixel), producing a binary image of black and white pixels. It is apparent from Figure 5a that the map is predominantly black, clearly suggesting that a substantial proportion of the feature pairs exhibit statistically significant differences and highlighting the potential for these features to contribute meaningfully to the predictive modeling process. Next, in Figure 5b, we further quantify the occurrences of significant pairs in the form of a histogram. It is evident that the number of p-value instances between 0 and 0.1 is the highest (>95%), while the non-significant instances total <5%. The combined statistical tests, incorporating the mean, median, standard deviation, first and third quartiles, and the ANOVA test, reveal that the BCINetV1 model produces statistically meaningful features that enhance the class separability of the MI EEG tasks and improve the predictive accuracy of the model.
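In principle, the significance map of Figure 5a can be reproduced with a loop such as the one below (shown on a small toy feature matrix, since the full 2400-feature analysis involves roughly 2.9 million tests); the helper name and the use of scipy.stats.f_oneway are our own choices, not the authors' code.

```python
import numpy as np
from scipy.stats import f_oneway


def pairwise_anova_map(features: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Binary significance map over all feature pairs, as in Figure 5a (sketch).

    `features` has shape (n_trials, n_features); each pair of feature columns
    is compared with a one-way ANOVA, and the p-value is binarised at `alpha`
    (0 = significant/black pixel, 1 = non-significant/white pixel).
    """
    n_features = features.shape[1]
    sig_map = np.ones((n_features, n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            _, p = f_oneway(features[:, i], features[:, j])
            sig_map[i, j] = sig_map[j, i] = 0.0 if p < alpha else 1.0
    return sig_map


# Toy example with 40 features (the full analysis uses 2400 features, ~2.9M pairs).
print(pairwise_anova_map(np.random.randn(100, 40)).mean())
```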

4.1.5. Topographical Maps and Features Representation

To illustrate the significance of the features extracted by BCINetV1 for MI EEG tasks, topographical representations are provided in Figure 6. These topo-plots are generated by averaging the features extracted from each of the 18 sensorimotor cortex channels across all subjects in Dataset 1. The electrodes were methodically selected to ensure uniform coverage across both hemispheres: on the left hemisphere, the channels C5, C3, C1, CP5, CP3, CP1, P1, P3, and P5 are positioned, whereas on the right hemisphere, the channels C6, C4, C2, CP6, CP4, CP2, P6, P4, and P2 are located, demonstrating a symmetric distribution for comprehensive spatial analysis. The figure reveals a consistent pattern indicating that the features extracted by BCINetV1 effectively discriminate between hand and foot MI EEG tasks. Notably, an increase in model activations is observed in the left hemisphere of the motor cortex during hand MI tasks and conversely in the mid-right hemisphere for foot MI tasks.
This underscores BCINetV1’s robust capability to pinpoint task-specific information across the relevant channels. The topographical plot further illustrates that the intensity of activity, both positive and negative, fluctuates across different channels, suggesting that the imagined actions are detected irrespective of the spatial positioning of the channels, highlighting the model’s precision in capturing MI tasks. Furthermore, to illustrate the separability of the extracted features, we used the t-distributed stochastic neighbor embedding (t-SNE) method to project the high-dimensional features into a two-dimensional scatter plot, as shown in Figure 7. The plots are shown for all four datasets used in this study, each with a different number of classes. Initially, the pre-training features appear intermixed, lacking discernible patterns. However, post-training, the features segregate into distinct clusters for each class, showcasing the BCINetV1 model’s robust capability for feature separation. This evidence confirms that our model not only achieves high accuracy but also extracts clinically relevant features for the motor imagery tasks at hand.
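A minimal sketch of the t-SNE projection used for visualizations such as Figure 7 is given below; the perplexity, random seed, and plotting details are illustrative assumptions, as the paper does not report the exact t-SNE settings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE


def plot_feature_tsne(features: np.ndarray, labels: np.ndarray, title: str) -> None:
    """Project high-dimensional features to 2D with t-SNE and plot them (sketch)."""
    embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
    for cls in np.unique(labels):
        idx = labels == cls
        plt.scatter(embedded[idx, 0], embedded[idx, 1], s=8, label=f"class {cls}")
    plt.title(title)
    plt.legend()
    plt.show()


# Toy usage: random stand-ins for 2400-dimensional features of a two-class problem.
plot_feature_tsne(np.random.randn(200, 2400), np.random.randint(0, 2, 200), "t-SNE of deep features")
```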

4.1.6. Ablation Study

In this section, we conduct an ablation study to assess the individual contributions of the BCINetV1 model’s components to its overall performance. Ablation studies are crucial for dissecting the model’s architecture, thereby improving its transparency, gaining deeper understanding of the learning mechanisms, and refining the parameter settings for optimal performance. Table 3 displays the classification performance and the statistical significance of features derived from various combinations of modules within the BCINetV1 framework. The accuracy figures provided are the average results for the five subjects from Dataset 1. It has been noted that excluding the ConvSAT block from the combination of T-CAB and S-CAB modules leads to diminished classification effectiveness and less significant features. This is likely because the ConvSAT block acts as a central processing element for both T-CAB and S-CAB, capturing essential local and global contexts within the EEG signals. In its absence, the T-CAB and S-CAB modules struggle to pinpoint the critical information in the signals, thereby impairing the overall classification performance.
Furthermore, it is evident that the SEB module becomes ineffective without channel reordering. This ineffectiveness is attributed to the fact that channel reordering facilitates the adjacent pairing of corresponding temporal and spectral channels, which are then efficiently processed by the group convolution layer for enhanced feature extraction. Omitting this reordering step causes a separation of relevant temporal and spectral channels, leading to the SEB block’s inability to detect shared patterns within the data. Continuing the analysis, the SEB, T-CAB, and S-CAB modules independently achieve classification accuracies of 72.45%, 75.98%, and 85.09%, respectively. This progressive improvement in accuracy reflects each module’s effectiveness in extracting MI task-relevant patterns from the EEG data. However, it is noteworthy that none of these modules reach their maximum accuracy potential when operating independently. This underlines that the true strength of the BCINetV1 model lies in the combined operation of its modules, emphasizing that an integrated approach is crucial for achieving the best results in MI task classification.
To illustrate the complete workflow of the BCINetV1 framework, Figure 8 presents a schematic diagram that traces the journey from data input to SEB output, mapping out how the T-CAB, S-CAB, and SEB modules interact within the network architecture. For better visualization, we have chosen a sample signal with apparent ERS/ERD signatures in the time-frequency domain; however, the BCINetV1 model is equally robust across a diverse range of MI EEG signals. As seen in Figure 8a, the initial 1D temporal signal offers little visual insight, and it is practically impossible to draw any clinical interpretation from it directly. In Figure 8b, this signal is transformed into a 2D tempo-spectral representation, with time stamps along the x-axis and frequency on the y-axis. This transformation yields a signal that provides insightful information about spectral variations over time. However, the resulting 2D signal is marred by excessive noise and contains an overwhelming amount of extraneous information.
In Figure 8c,e, we can observe that the attention maps/masks generated by the T-CAB and S-CAB modules are localized in the time and frequency domains, respectively. The resulting outputs in Figure 8d,f clearly showcase the precise temporal and spectral instances captured by the T-CAB and S-CAB modules, amplifying the task-related instances and attenuating the non-significant ones. Lastly, Figure 8g gives the final output yielded by the SEB module. The SEB module successfully integrates the responses from the T-CAB and S-CAB modules, as demonstrated by this final output, which provides a clear clinical indication of the ERS/ERD in the signal, highlighted by three distinct red spikes within the β-rhythm. This integration, combined with the spatial origin of the signal, facilitates a straightforward clinical determination of the specific MI task represented in the EEG data.

4.1.7. Computational Complexity Analysis

This section delves into the computational complexity of the BCINetV1 model, focusing on how variations in input dimensions affect its processing time. As mentioned previously, the network’s input is a two-dimensional image, with time stamps along the x-axis and frequency on the y-axis. Our aim is to assess the impact of altering these dimensions on the model’s training and testing durations. Figure 9 presents this analysis, showing the changes in x and y dimensions of the input image across a range of values. We employed curve fitting techniques to accurately model and understand the computational complexity (C.C) of the BCINetV1 model under these varying conditions. Figure 9a examines how the computational time for a single-trial changes with the x-dimension (time stamps) of the input signal, ranging from 50 to 1000, while maintaining a constant frequency (y-dimension) of 100 Hz.
The findings indicate a consistent increase in both training and testing times as the number of time stamps grows. Through curve fitting analysis, a cubic polynomial, O ( n 3 ) , was determined to represent the computational complexity (C.C) in relation to the x-dimension increase. Conversely, Figure 9b shows a linear increase in C.C, represented by O ( n ) , as the frequency of the input signal varies from 150 to 1000 Hz, with the time stamps held steady at 50. Consequently, the overall computational complexity of the system is established at O ( n 3 ) , which is considered reasonable for an efficient deep learning model. Additionally, the analysis revealed that the maximum time taken for processing a single trial is under 0.4 s, underscoring the model’s capability for rapid classification. In summary, the BCINetV1 model demonstrates a manageable computational complexity, scaling predictably with input size, and maintains a swift processing rate, making it a viable and efficient choice for real-time classification tasks in EEG analysis.
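The curve-fitting step used to characterize these complexity trends can be sketched as a simple polynomial fit of runtime against input size; the timing values below are synthetic placeholders, not measurements from Figure 9.

```python
import numpy as np


def fit_complexity(sizes: np.ndarray, times: np.ndarray, degree: int = 3) -> np.poly1d:
    """Fit a polynomial trend to (input size, runtime) pairs, as for Figure 9.

    degree=3 corresponds to the cubic O(n^3) trend along the time-stamp axis;
    degree=1 would model the linear O(n) trend along the frequency axis.
    """
    return np.poly1d(np.polyfit(sizes, times, degree))


# Synthetic placeholder timings growing roughly cubically with the time stamps.
sizes = np.arange(50, 1001, 50)
times = 1e-9 * sizes.astype(float) ** 3 + 0.01 * np.random.rand(sizes.size)
print(fit_complexity(sizes, times, degree=3))
```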
Figure 10 presents a multi-dimensional comparative analysis between BCINetV1 and other benchmark models, illustrating the number of trainable parameters, the achieved accuracy, and the inference speed. It is important to note that the number of trainable parameters varies depending on dataset-specific characteristics such as the sampling rate and signal length. The values reported in Figure 10 are therefore based on the configuration for Dataset 1. However, we have confirmed that while the absolute parameter counts may shift slightly for the other datasets, the relative efficiency and the overall trend depicted here remain consistent. As illustrated in Figure 10, BCINetV1 is positioned in the top-left quadrant, signifying its ability to deliver state-of-the-art performance with a remarkably low number of trainable parameters. Specifically, BCINetV1 achieves a leading accuracy of 98.6% with only 19,648 parameters. In stark contrast, many contemporary models that achieve high accuracy, such as DeepConvNet (92.8%), EEG-Conformer (88.3%), and TCANet (93.1%), occupy the right side of the plot, requiring far more parameters (164k, 158k, and 307k, respectively). This highlights a significant computational expense for comparatively lower performance. When compared to models with a similar parameter count, such as MSCNN (19,396 parameters), BCINetV1 demonstrates a clear advantage by achieving a substantial 3% gain in accuracy. This analysis robustly demonstrates that the architectural design of BCINetV1, which integrates feature extraction and attention in a streamlined manner, successfully balances high predictive power with computational parsimony. This establishes it as a highly efficient and effective solution for MI EEG decoding, particularly for practical applications where computational resources may be limited.

4.1.8. Analysis with Model Parameters

In this section, our objective is to fine-tune the primary training parameters of the BCINetV1 model to strike an optimal balance between precision and computational efficiency. The key training parameters in deep learning models include the learning rate (lr), the number of training epochs, and the selection of an optimal optimizer function. We aim to identify the most effective combination of these parameters for the five subjects in Dataset 1 and subsequently apply the same optimized settings to Datasets 2 through 4. Figure 11 illustrates the empirical outcomes for these parameters across various settings, providing a comprehensive view of their performance impact.
Figure 11a examines the learning rate's (lr) impact on training accuracy for the various subjects. The x-axis spans lr values from 1 × 10⁻¹ to 1 × 10⁻⁶ in six logarithmic increments, illustrating how the lr influences how quickly the loss function is minimized and, consequently, how classification accuracy is maximized. An excessively high lr may prevent convergence, leading to poor accuracy, whereas an excessively low lr slows convergence and extends the training period unnecessarily. Our analysis shows that the higher initial lrs, specifically 1 × 10⁻¹ and 1 × 10⁻², do not converge and yield low accuracy. A noticeable improvement occurs at an lr of 1 × 10⁻³, although it remains inconsistent across subjects. Accuracy stabilizes and maintains a consistent trend from 1 × 10⁻⁴ to 1 × 10⁻⁶, with a minor dip at 1 × 10⁻⁶, indicating slower convergence and a need for more extended training. Consequently, based on the experimental evidence, an lr of 1 × 10⁻⁴ was identified as optimal and was applied uniformly in subsequent experiments throughout this research.
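The convergence behaviour described above can be illustrated on a toy problem: the sketch below runs plain gradient descent on a simple quadratic loss at several step sizes, showing divergence for an overly large rate and very slow progress for an overly small one. This is a didactic example under assumed settings, not the BCINetV1 training procedure.

```python
# Gradient descent on the quadratic loss f(w) = w^2 to illustrate learning-rate
# effects: too large a step diverges, too small a step converges very slowly.
def run_gd(lr: float, steps: int = 100, w0: float = 1.0) -> float:
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w          # gradient of w^2 is 2w
    return abs(w)                  # distance from the optimum at w = 0

for lr in (1.5, 1e-1, 1e-3, 1e-6):
    print(f"lr={lr:g}: |w - w*| after 100 steps = {run_gd(lr):.3e}")
# lr=1.5 overshoots and diverges, 1e-1 converges quickly,
# 1e-3 converges slowly, and 1e-6 barely moves from the starting point.
```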
In our second analysis, we assess how the number of training epochs affects accuracy for each subject. Training epochs refer to the number of complete passes the neural network makes through the entire training dataset. Too few epochs can lead to underfitting, while too many may cause the model to overfit and unnecessarily extend the training time. Figure 11b shows that the untrained model (zero epochs) yields poor accuracy, but a significant improvement appears after just one pass through the dataset. The accuracy for all subjects peaks after about 10 epochs, with little to no further change. Based on these observations, we determined that 10 epochs provide an optimal balance for training the BCINetV1 model, highlighting the model's rapid adaptability to the training data and its ability to achieve peak accuracy with minimal training time.
Finally, we explored how different optimizers affect the accuracy of our subjects. Optimizers govern how the network weights are updated during backpropagation to minimize the loss, and a well-chosen one can lead to quicker convergence, enhanced generalization, and improved model stability. In our study, we evaluated three optimizers widely used in EEG research: adaptive moment estimation (ADAM), root mean square propagation (RMSProp), and stochastic gradient descent with momentum (SGDM). The results, depicted in Figure 11c, demonstrate that the ADAM optimizer markedly surpasses RMSProp and SGDM in classification accuracy. ADAM's superior performance is attributed to its effective moment estimation, bias correction, adaptive learning rates, and robustness to hyperparameter variations. Consequently, the ADAM optimizer was selected for subsequent experiments within this research, based on its standout empirical performance.
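For readers who wish to reproduce a comparable optimizer sweep, the sketch below trains a tiny surrogate network on random data shaped like 18-channel EEG trials with ADAM, RMSProp, and SGDM at the settings selected above (lr = 1 × 10⁻⁴, 10 epochs); the surrogate network, the data shape, and the PyTorch implementation are assumptions for illustration and do not reproduce BCINetV1 itself.

```python
import torch
import torch.nn as nn

def make_model() -> nn.Module:
    # Tiny surrogate CNN standing in for BCINetV1; 18 input channels, 2 classes.
    return nn.Sequential(
        nn.Conv1d(18, 8, kernel_size=7, padding=3), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 2),
    )

# Random stand-in data: 64 trials of 18 channels x 350 samples, binary labels.
x = torch.randn(64, 18, 350)
y = torch.randint(0, 2, (64,))

optimizers = {
    "ADAM":    lambda params: torch.optim.Adam(params, lr=1e-4),
    "RMSProp": lambda params: torch.optim.RMSprop(params, lr=1e-4),
    "SGDM":    lambda params: torch.optim.SGD(params, lr=1e-4, momentum=0.9),
}

loss_fn = nn.CrossEntropyLoss()
for name, make_opt in optimizers.items():
    torch.manual_seed(0)                   # identical initialisation for each run
    model = make_model()
    opt = make_opt(model.parameters())
    for _ in range(10):                    # 10 epochs, as selected above
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    print(f"{name}: training loss after 10 epochs = {loss.item():.4f}")
```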

4.2. Analysis with Other BCI EEG Datasets

This section evaluates the performance and generalizability of BCINetV1 on three additional, distinct BCI EEG datasets. The results presented herein for each dataset represent the average performance across all subjects; however, detailed subject-wise classification outcomes are comprehensively provided in the supplementary document accompanying this manuscript. The comparative analyses in the following subsections are conducted in terms of accuracy, recall, f-score, and kappa to offer a thorough assessment of BCINetV1 against state-of-the-art methods.
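For reference, the sketch below shows one common way to compute these four metrics with scikit-learn from true and predicted labels; the label vectors are illustrative placeholders, and macro averaging is an assumed choice for the multi-class datasets.

```python
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             cohen_kappa_score)

# Illustrative true and predicted labels for a two-class MI task
# (e.g., 0 = left hand, 1 = right hand); replace with real model outputs.
y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("f-score  :", f1_score(y_true, y_pred, average="macro"))
print("kappa    :", cohen_kappa_score(y_true, y_pred))
```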
As indicated by Table 4, the BCINetV1 demonstrated all-around performance on Dataset 2, achieving a high average accuracy of 96.6%, recall of 96.6%, an f-score of 96.5%, and a kappa score of 91.9%, all with very consistent results across the 52 subjects. When compared to hybrid signal processing methods, BCINetV1 was clearly superior; for example, its accuracy was 4.6% higher than the DR+ICA+SVM method by Tyler et al. [48] (92.0% accuracy), and it significantly surpassed the EFD-CNN method by Binwen et al. [49], being 6.7% more accurate (EFD-CNN had an accuracy and f-score of 89.9% and a kappa of 79.8%). Against the benchmark deep learning models, BCINetV1 also maintained a strong lead across all metrics. For instance, the ST-Transformer model achieved 93.7% accuracy (making BCINetV1 2.9% higher), 94.3% recall, a 93.3% f-score, and an 87.4% kappa, all of which are noticeably lower than BCINetV1’s scores. Similarly, models like TCANet (92.5% accuracy, meaning BCINetV1 was 4.1% higher; 93.7% f-score; 84.9% kappa) and EEG-Conformer (84.9% accuracy, resulting in an 11.7% lead for BCINetV1; 85.6% f-score; 69.8% kappa) lagged behind BCINetV1 in these measures. Simpler deep learning architectures such as DeepConvNet and ShallowConvNet showed even larger performance gaps across all these indicators.
For the three-class Dataset 3 (given in Table 5), BCINetV1 demonstrated stable performance, achieving an average accuracy of 97.2%, recall of 96.9%, an f-score of 97.5%, and a kappa of 95.3%, all with a low standard deviation of 0.3%. Among the hybrid signal processing methods, only CADMMI-SDI by Sadiq et al. [50] (99.3% accuracy) exceeded BCINetV1, by 2.1%. This difference is marginal, and CADMMI-SDI belongs to the category of hybrid signal processing methods, which carry the inherent limitations discussed in Section 1. Against the remaining hybrid methods, BCINetV1 shows a considerable advantage; for instance, its accuracy was 3.4% higher than the EFD-CNN method by Binwen et al. [49] (93.8% accuracy) and 14% higher than the PCA-based RF model by Siuly et al. [51] (83.2% accuracy). Against other deep learning models, BCINetV1 also performed impressively. While EEG-Conformer matched BCINetV1's accuracy at 97.2% (with 98.3% recall, 98.0% f-score, and 95.7% kappa), BCINetV1 achieved its results with remarkable consistency, as indicated by its extremely low standard deviation. Other strong deep learning models such as TCANet (96.0% accuracy, 1.2% below BCINetV1; 95.5% f-score; 94.0% kappa) and ST-Transformer (94.7% accuracy, a 2.5% deficit; 95.3% f-score; 92.1% kappa) did not reach BCINetV1's combination of accuracy and stability. Simpler architectures such as EEGNet (83.1% accuracy, 14.1% below BCINetV1; 83.5% f-score; 74.6% kappa) and ShallowConvNet (84.8% accuracy, a 12.4% gap; 86.0% f-score; 77.1% kappa) showed considerably lower performance on all metrics.
Table 4. Comparison chart of Dataset 2 with state-of-the-art methods. To facilitate comparison, studies are ordered by increasing average classification accuracy.

| Category | Authored By | Year | Method | Accuracy (%) | Recall (%) | F-score (%) | Kappa (%) |
|---|---|---|---|---|---|---|---|
| Hybrid signal processing methods | Kumar et al. [52] | 2019 | CSP+LSTM | 68.1 ± 9.06 | 83.3 ± NA | - | 65.0 ± NA |
| | Kumar et al. [53] | 2021 | OPTICAL+ | 69.5 ± NA | - | - | 39.8 ± NA |
| | Hossain et al. [54] | 2021 | SVM+LR+NB+KNNFFS | 71.1 ± 6.77 | 70.2 ± 5.60 | 79.0 ± 10.08 | - |
| | Park et al. [55] | 2023 | 3D-EEGNet | 81.3 ± 7.27 | - | - | - |
| | Yu et al. [25] | 2021 | EFD | 83.8 ± NA | - | 83.8 ± NA | - |
| | Binwen et al. [49] | 2022 | EFD-CNN | 89.9 ± NA | - | 89.9 ± NA | 79.8 ± NA |
| | Fan et al. [56] | 2023 | TFTP+3D-CNN | 91.9 ± NA | - | - | - |
| | Tyler et al. [48] | 2021 | DR+ICA+SVM | 92.0 ± NA | - | - | - |
| Deep Learning methods | Simulated | 2025 | DeepConvNet | 67.7 ± 0.46 | 67.7 ± 0.46 | 67.7 ± 0.46 | 35.3 ± 0.91 |
| | Simulated | 2025 | ShallowConvNet | 70.7 ± 4.54 | 70.6 ± 5.13 | 70.0 ± 4.40 | 41.3 ± 9.20 |
| | Simulated | 2025 | FBCNet | 75.9 ± 2.92 | 76.8 ± 3.01 | 76.1 ± 2.41 | 51.7 ± 5.86 |
| | Simulated | 2025 | CCNet | 78.8 ± 4.88 | 80.0 ± 5.84 | 79.7 ± 5.64 | 57.5 ± 9.78 |
| | Simulated | 2025 | EEG-Conformer | 84.9 ± 2.99 | 83.9 ± 2.45 | 85.6 ± 3.25 | 69.8 ± 5.98 |
| | Simulated | 2025 | TSception | 85.3 ± 4.76 | 84.5 ± 5.08 | 85.0 ± 5.16 | 69.9 ± 9.52 |
| | Simulated | 2025 | MSCNN | 90.6 ± 2.95 | 90.9 ± 3.52 | 90.3 ± 2.19 | 81.1 ± 5.92 |
| | Simulated | 2025 | ATCNet | 91.7 ± 1.07 | 90.9 ± 0.11 | 91.4 ± 0.25 | 83.3 ± 2.15 |
| | Simulated | 2025 | TCANet | 92.5 ± 3.00 | 92.7 ± 2.74 | 93.7 ± 3.59 | 84.9 ± 6.02 |
| | Simulated | 2025 | ST-Transformer | 93.7 ± 1.77 | 94.3 ± 1.83 | 93.3 ± 1.56 | 87.4 ± 3.54 |
| | This Study | 2025 | BCINetV1 | 96.6 ± 1.70 | 96.6 ± 1.73 | 96.5 ± 1.41 | 91.9 ± 2.51 |
For the four-class Dataset 4 given in Table 6, BCINetV1 again demonstrated superior performance across all metrics, achieving an average accuracy of 98.4%, recall of 98.4%, an f-score of 98.4%, and a kappa of 97.8%, with a low standard deviation of 0.6%. Compared with hybrid signal processing methods, BCINetV1's advantage was substantial; for example, its accuracy was a remarkable 15.9% higher than the ESVL method by Luo et al. [58] (82.5% accuracy; 65% kappa). Against other deep learning models, BCINetV1 also showed a significant lead. The attention-based AMSTCNet by Zhang et al. [59] achieved an accuracy of 87.5% (88% f-score; 83% kappa), meaning BCINetV1 was 10.9% more accurate. Similarly, ATCNet by Altaheri et al. [31] (85.4% accuracy; 81% kappa) was outperformed by BCINetV1 by 13%. Even robust general benchmarks such as EEG-Conformer by Song et al. [33] (78.6% accuracy; 71.55% kappa) and EEGNet (76.4% accuracy; 68.6% kappa) showed considerably lower accuracies, with BCINetV1 being 19.8% and 22% higher, respectively.
In summary, the comprehensive analysis across these diverse BCI EEG datasets underscores the stability and generalizability of BCINetV1. Many contemporary deep learning models exhibit significant performance fluctuations when faced with different data characteristics or task complexities: ST-Transformer, for example, performed well on Dataset 2 (93.7% accuracy) and Dataset 3 (94.7% accuracy) but dropped to 57.7% on the more complex Dataset 4, while EEG-Conformer achieved 97.2% on Dataset 3 but only 78.6% on Dataset 4. In stark contrast, BCINetV1 consistently delivered exceptionally high accuracy, recall, f-score, and kappa values across all tested datasets, regardless of the number of classes, subjects, or specific recording paradigms. This consistent, high-level performance strongly suggests that BCINetV1's architecture effectively captures robust and discriminative features, making it a highly reliable and dataset-independent solution for MI EEG decoding.

5. Limitations and Future Work

The BCINetV1 model shows promising potential for accurate and clinically meaningful classification of EEG signals in BCI systems; nonetheless, there is room for improvement in several areas: (1) While the model effectively processes non-stationary temporal and spectral information, it lacks a mechanism for spatial information analysis. Currently, the 18 motor cortex electrodes are selected manually, a step that could be automated by integrating a spatial information processing block into the model. (2) The model could be further improved by adding a module that extracts common features across subjects in an end-to-end manner, making it better suited to cross-subject classification tasks. (3) As the name implies, BCINetV1 is the first version of a planned series of networks. Future developments will focus on creating a more generalized framework capable of handling not only MI EEG but also other BCI paradigms such as P300, ERP, and SSVEP, thereby broadening the model's applicability and effectiveness.

6. Conclusions

This paper introduced BCINetV1, a new deep learning model for decoding motor imagery from EEG signals. The model comprises three main components: a temporal convolution-based attention block (T-CAB), a spectral convolution-based attention block (S-CAB), and a squeeze-and-excitation block (SEB). The T-CAB and S-CAB modules, driven by a novel convolutional self-attention (ConvSAT) mechanism, identify important time-based and frequency-based patterns in the EEG data, with ConvSAT capturing both local details and broader context. The SEB module then combines these identified patterns for classification. Experiments were conducted on four different MI EEG datasets covering both binary and multi-class tasks. BCINetV1 consistently achieved high average classification accuracies of 98.6% (Dataset 1), 96.6% (Dataset 2), 96.9% (Dataset 3), and 98.4% (Dataset 4), along with strong results for recall, f-score, and kappa. When compared to several existing methods, BCINetV1 generally showed better accuracy and more stable performance across these diverse datasets. The study also indicated that BCINetV1 is computationally efficient, extracts clinically meaningful features, and addresses the non-stationary nature of EEG signals without needing complex pre-processing steps. In summary, BCINetV1 offers an effective and interpretable approach for MI EEG decoding, suitable for brain–computer interface applications.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/s25154657/s1, Section S1: Statistical Analysis for Dataset 1; Section S2: Detailed Results for Dataset 2; Section S3: Detailed Results for Dataset 3; Section S4: Detailed Results for Dataset 4.

Author Contributions

Conceptualization, M.Z.A. and X.Y.; methodology, M.Z.A., X.Y., X.G. and Z.F.; software, M.Z.A. and X.G.; validation, M.Z.A., X.H. and B.H.; formal analysis, M.Z.A. and X.Y.; writing—original draft preparation, M.Z.A.; visualization, M.Z.A. and X.G.; supervision, X.Y.; project administration, X.Y.; funding acquisition, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Natural Science Foundation of Guangdong Province (Grant No. 2025A1515011449), Aviation Science Foundation Project (Grant No. 20240001053007), and National Natural Science Foundation of China (Grant No. 62220106006).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in this article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. World Health Organization. Over 1 in 3 People Affected by Neurological Conditions: The Leading Cause of Illness and Disability Worldwide. WHO News. 2024. Available online: https://www.who.int/news/item/14-03-2024-over-1-in-3-people-affected-by-neurological-conditions--the-leading-cause-of-illness-and-disability-worldwide (accessed on 9 May 2025).
  2. Daly, J.J.; Huggins, J.E. Brain-computer interface: Current and emerging rehabilitation applications. Arch. Phys. Med. Rehabil. 2015, 96, S1–S7. [Google Scholar] [CrossRef]
  3. Yu, X.; Aziz, M.Z.; Sadiq, M.T.; Jia, K.; Fan, Z.; Xiao, G. Computerized multidomain EEG classification system: A new paradigm. IEEE J. Biomed. Health Inform. 2022, 26, 3626–3637. [Google Scholar] [CrossRef]
  4. Lee, M.H.; Kwon, O.Y.; Kim, Y.J.; Kim, H.K.; Lee, Y.E.; Williamson, J.; Fazli, S.; Lee, S.W. EEG dataset and OpenBMI toolbox for three BCI paradigms: An investigation into BCI illiteracy. GigaScience 2019, 8, giz002. [Google Scholar] [CrossRef]
  5. Chaudhary, U.; Birbaumer, N.; Ramos-Murguialday, A. Brain–computer interfaces for communication and rehabilitation. Nat. Rev. Neurol. 2016, 12, 513–525. [Google Scholar] [CrossRef]
  6. Lotte, F.; Bougrain, L.; Cichocki, A.; Clerc, M.; Congedo, M.; Rakotomamonjy, A.; Yger, F. A review of classification algorithms for EEG-based brain–computer interfaces: A 10 year update. J. Neural Eng. 2018, 15, 031005. [Google Scholar] [CrossRef]
  7. Saeidi, M.; Karwowski, W.; Farahani, F.V.; Fiok, K.; Taiar, R.; Hancock, P.A.; Al-Juaid, A. Neural decoding of EEG signals with machine learning: A systematic review. Brain Sci. 2021, 11, 1525. [Google Scholar] [CrossRef]
  8. Li, D.; Wang, J.; Xu, J.; Fang, X.; Ji, Y. Cross-Channel Specific-Mutual Feature Transfer Learning for Motor Imagery EEG Signals Decoding. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 13472–13482. [Google Scholar] [CrossRef]
  9. Moctezuma, L.A.; Molinas, M. Classification of low-density EEG for epileptic seizures by energy and fractal features based on EMD. J. Biomed. Res. 2020, 34, 180. [Google Scholar] [CrossRef]
  10. Akbari, H.; Sadiq, M.T.; Rehman, A.U. Classification of normal and depressed EEG signals based on centered correntropy of rhythms in empirical wavelet transform domain. Health Inf. Sci. Syst. 2021, 9, 9. [Google Scholar] [CrossRef]
  11. Pandey, P.; Seeja, K. Subject independent emotion recognition from EEG using VMD and deep learning. J. King Saud Univ.—Comput. Inf. Sci. 2022, 34, 1730–1738. [Google Scholar] [CrossRef]
  12. Thanigaivelu, P.; Sridhar, S.; Sulthana, S.F. OISVM: Optimal Incremental Support Vector Machine-based EEG Classification for Brain-computer Interface Model. Cogn. Comput. 2023, 15, 888–903. [Google Scholar] [CrossRef]
  13. Zhang, R.; Zong, Q.; Dou, L.; Zhao, X. A novel hybrid deep learning scheme for four-class motor imagery classification. J. Neural Eng. 2019, 16, 066004. [Google Scholar] [CrossRef]
  14. Brunner, C.; Leeb, R.; Müller-Putz, G.; Schlögl, A.; Pfurtscheller, G. BCI Competition 2008—Graz Data Set A; Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology: Graz, Austria, 2008; Volume 16, pp. 1–6. [Google Scholar]
  15. Dai, G.; Zhou, J.; Huang, J.; Wang, N. HS-CNN: A CNN with hybrid convolution scale for EEG motor imagery classification. J. Neural Eng. 2020, 17, 016025. [Google Scholar] [CrossRef]
  16. Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 2018, 15, 056013. [Google Scholar] [CrossRef]
  17. Deng, X.; Zhang, B.; Yu, N.; Liu, K.; Sun, K. Advanced TSGL-EEGNet for motor imagery EEG-based brain-computer interfaces. IEEE Access 2021, 9, 25118–25130. [Google Scholar] [CrossRef]
  18. Ingolfsson, T.M.; Hersche, M.; Wang, X.; Kobayashi, N.; Cavigelli, L.; Benini, L. EEG-TCNet: An accurate temporal convolutional network for embedded motor-imagery brain–machine interfaces. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 2958–2965. [Google Scholar]
  19. Musallam, Y.K.; AlFassam, N.I.; Muhammad, G.; Amin, S.U.; Alsulaiman, M.; Abdul, W.; Altaheri, H.; Bencherif, M.A.; Algabri, M. Electroencephalography-based motor imagery classification using temporal convolutional network fusion. Biomed. Signal Process. Control 2021, 69, 102826. [Google Scholar] [CrossRef]
  20. Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420. [Google Scholar] [CrossRef]
  21. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  22. Zhang, D.; Chen, K.; Jian, D.; Yao, L. Motor imagery classification via temporal attention cues of graph embedded EEG signals. IEEE J. Biomed. Health Inform. 2020, 24, 2570–2579. [Google Scholar] [CrossRef]
  23. Amin, S.U.; Altaheri, H.; Muhammad, G.; Abdul, W.; Alsulaiman, M. Attention-inception and long-short-term memory-based electroencephalography classification for motor imagery tasks in rehabilitation. IEEE Trans. Ind. Inform. 2021, 18, 5412–5421. [Google Scholar] [CrossRef]
  24. Huang, Y.C.; Chang, J.R.; Chen, L.F.; Chen, Y.S. Deep neural network with attention mechanism for classification of motor imagery EEG. In Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), San Francisco, CA, USA, 20–23 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1130–1133. [Google Scholar]
  25. Yu, X.; Aziz, M.Z.; Hou, Y.; Li, H.; Lv, J.; Jamil, M. An extended computer aided diagnosis system for robust BCI applications. In Proceedings of the 2021 IEEE 9th International Conference on Information, Communication and Networks (ICICN), Xi’an, China, 25–28 November 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 475–480. [Google Scholar]
  26. Blankertz, B.; Muller, K.R.; Krusienski, D.J.; Schalk, G.; Wolpaw, J.R.; Schlogl, A.; Pfurtscheller, G.; Millan, J.R.; Schroder, M.; Birbaumer, N. The BCI competition III: Validating alternative approaches to actual BCI problems. IEEE Trans. Neural Syst. Rehabil. Eng. 2006, 14, 153–159. [Google Scholar] [CrossRef]
  27. Kumar, S.; Sharma, A. A new parameter tuning approach for enhanced motor imagery EEG signal classification. Med. Biol. Eng. Comput. 2018, 56, 1861–1874. [Google Scholar] [CrossRef]
  28. Yang, G.; Liu, J. A novel multi-scale fusion convolutional neural network for EEG-based motor imagery classification. Biomed. Signal Process. Control 2024, 96, 106645. [Google Scholar] [CrossRef]
  29. Ding, Y.; Robinson, N.; Zhang, S.; Zeng, Q.; Guan, C. TSception: Capturing temporal dynamics and spatial asymmetry from EEG for emotion recognition. IEEE Trans. Affect. Comput. 2022, 14, 2238–2250. [Google Scholar] [CrossRef]
  30. Liu, X.; Shi, R.; Hui, Q.; Xu, S.; Wang, S.; Na, R.; Sun, Y.; Ding, W.; Zheng, D.; Chen, X. TCACNet: Temporal and channel attention convolutional network for motor imagery classification of EEG-based BCI. Inf. Process. Manag. 2022, 59, 103001. [Google Scholar] [CrossRef]
  31. Altaheri, H.; Muhammad, G.; Alsulaiman, M. Physics-informed attention temporal convolutional network for EEG-based motor imagery classification. IEEE Trans. Ind. Inform. 2022, 19, 2249–2258. [Google Scholar] [CrossRef]
  32. Fu, T.; Chen, L.; Fu, Z.; Yu, K.; Wang, Y. CCNet: CNN model with channel attention and convolutional pooling mechanism for spatial image steganalysis. J. Vis. Commun. Image Represent. 2022, 88, 103633. [Google Scholar] [CrossRef]
  33. Song, Y.; Zheng, Q.; Liu, B.; Gao, X. EEG conformer: Convolutional transformer for EEG decoding and visualization. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 31, 710–719. [Google Scholar] [CrossRef]
  34. Liu, S.; An, L.; Zhang, C.; Jia, Z. A spatial-temporal transformer based on domain generalization for motor imagery classification. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, HI, USA, 1–4 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 3789–3794. [Google Scholar]
  35. Belwafi, K.; Gannouni, S.; Aboalsamh, H.; Mathkour, H.; Belghith, A. A dynamic and self-adaptive classification algorithm for motor imagery EEG signals. J. Neurosci. Methods 2019, 327, 108346. [Google Scholar] [CrossRef]
  36. Barmpas, K.; Panagakis, Y.; Adamos, D.A.; Laskaris, N.; Zafeiriou, S. BrainWave-Scattering Net: A lightweight network for EEG-based motor imagery recognition. J. Neural Eng. 2023, 20, 056014. [Google Scholar] [CrossRef]
  37. Singh, A.; Lal, S.; Guesgen, H.W. Reduce calibration time in motor imagery using spatially regularized symmetric positives-definite matrices based classification. Sensors 2019, 19, 379. [Google Scholar] [CrossRef]
  38. Singh, A.; Lal, S.; Guesgen, H.W. Small sample motor imagery classification using regularized Riemannian features. IEEE Access 2019, 7, 46858–46869. [Google Scholar] [CrossRef]
  39. Dai, M.; Wang, S.; Zheng, D.; Na, R.; Zhang, S. Domain transfer multiple kernel boosting for classification of EEG motor imagery signals. IEEE Access 2019, 7, 49951–49960. [Google Scholar] [CrossRef]
  40. Abougharbia, J.; Attallah, O.; Tamazin, M.; Nasser, A. A novel BCI system based on hybrid features for classifying motor imagery tasks. In Proceedings of the 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA), Istanbul, Turkey, 6–9 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar]
  41. Hekmatmanesh, A.; Wu, H.; Jamaloo, F.; Li, M.; Handroos, H. A combination of CSP-based method with soft margin SVM classifier and generalized RBF kernel for imagery-based brain computer interface applications. Multimed. Tools Appl. 2020, 79, 17521–17549. [Google Scholar] [CrossRef]
  42. Wijaya, A.; Adji, T.B.; Setiawan, N.A. Logistic Regression based Feature Selection and Two-Stage Detection for EEG based Motor Imagery Classification. Int. J. Intell. Eng. Syst. 2021, 14, 134–146. [Google Scholar] [CrossRef]
  43. Sadiq, M.T.; Yu, X.; Yuan, Z.; Zeming, F.; Rehman, A.U.; Ullah, I.; Li, G.; Xiao, G. Motor imagery EEG signals decoding by multivariate empirical wavelet transform-based framework for robust brain–computer interfaces. IEEE Access 2019, 7, 171431–171451. [Google Scholar] [CrossRef]
  44. Liu, S.; Zhang, J.; Wang, A.; Wu, H.; Zhao, Q.; Long, J. Subject adaptation convolutional neural network for EEG-based motor imagery classification. J. Neural Eng. 2022, 19, 066003. [Google Scholar] [CrossRef]
  45. Miao, M.; Hu, W.; Yin, H.; Zhang, K. Spatial-frequency feature learning and classification of motor imagery EEG based on deep convolution neural network. Comput. Math. Methods Med. 2020, 2020, 1981728. [Google Scholar] [CrossRef]
  46. Mehtiyev, A.; Al-Najjar, A.; Sadreazami, H.; Amini, M. Deepensemble: A novel brain wave classification in MI-BCI using ensemble of deep learners. In Proceedings of the 2023 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 6–8 January 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5. [Google Scholar]
  47. Sharma, N.; Upadhyay, A.; Sharma, M.; Singhal, A. Deep temporal networks for EEG-based motor imagery recognition. Sci. Rep. 2023, 13, 18813. [Google Scholar] [CrossRef]
  48. Grear, T.; Jacobs, D. Classifying EEG motor imagery signals using supervised projection pursuit for artefact removal. In Proceedings of the 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2952–2958. [Google Scholar]
  49. Huang, B.; Xu, H.; Yuan, M.; Aziz, M.Z.; Yu, X. Exploiting asymmetric EEG signals with EFD in deep learning domain for robust BCI. Symmetry 2022, 14, 2677. [Google Scholar] [CrossRef]
  50. Sadiq, M.T.; Yu, X.; Yuan, Z.; Aziz, M.Z. Identification of motor and mental imagery EEG in two and multiclass subject-dependent tasks using successive decomposition index. Sensors 2020, 20, 5283. [Google Scholar] [CrossRef]
  51. Siuly, S.; Zarei, R.; Wang, H.; Zhang, Y. A new data mining scheme for analysis of big brain signal data. In Proceedings of the Databases Theory and Applications: 28th Australasian Database Conference, ADC 2017, Brisbane, QLD, Australia, 25–28 September 2017; Proceedings 28. Springer: Berlin/Heidelberg, Germany, 2017; pp. 151–164. [Google Scholar]
  52. Kumar, S.; Sharma, A.; Tsunoda, T. Brain wave classification using long short-term memory network based OPTICAL predictor. Sci. Rep. 2019, 9, 9153. [Google Scholar] [CrossRef]
  53. Kumar, S.; Sharma, R.; Sharma, A. OPTICAL+: A frequency-based deep learning scheme for recognizing brain wave signals. Peerj Comput. Sci. 2021, 7, e375. [Google Scholar] [CrossRef]
  54. Hossain, M.Y.; Sayeed, A. A comparative study of motor imagery (mi) detection in electroencephalogram (eeg) signals using different classification algorithms. In Proceedings of the 2021 International Conference on Automation, Control and Mechatronics for Industry 4.0 (ACMI), Rajshahi, Bangladesh, 8–9 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6. [Google Scholar]
  55. Park, D.; Park, H.; Kim, S.; Choo, S.; Lee, S.; Nam, C.S.; Jung, J.Y. Spatio-temporal explanation of 3D-EEGNet for motor imagery EEG classification using permutation and saliency. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 4504–4513. [Google Scholar] [CrossRef]
  56. Fan, C.; Yang, B.; Li, X.; Zan, P. Temporal-frequency-phase feature classification using 3D-convolutional neural networks for motor imagery and movement. Front. Neurosci. 2023, 17, 1250991. [Google Scholar] [CrossRef]
  57. Sadiq, M.T.; Yu, X.; Yuan, Z.; Aziz, M.Z.; Siuly, S.; Ding, W. A matrix determinant feature extraction approach for decoding motor and mental imagery EEG in subject-specific tasks. IEEE Trans. Cogn. Dev. Syst. 2020, 14, 375–387. [Google Scholar] [CrossRef]
  58. Luo, J.; Gao, X.; Zhu, X.; Wang, B.; Lu, N.; Wang, J. Motor imagery EEG classification based on ensemble support vector learning. Comput. Methods Programs Biomed. 2020, 193, 105464. [Google Scholar] [CrossRef]
  59. Zhang, Y.; Li, P.; Cheng, L.; Li, M.; Li, H. Attention-Based Multiscale Spatial-Temporal Convolutional Network for Motor Imagery EEG Decoding. IEEE Trans. Consum. Electron. 2023, 70, 2423–2434. [Google Scholar] [CrossRef]
  60. Altaheri, H.; Muhammad, G.; Alsulaiman, M.; Amin, S.U.; Altuwaijri, G.A.; Abdul, W.; Bencherif, M.A.; Faisal, M. Deep learning techniques for classification of electroencephalogram (EEG) motor imagery (MI) signals: A review. Neural Comput. Appl. 2023, 35, 14681–14722. [Google Scholar] [CrossRef]
  61. Aghaei, A.S.; Mahanta, M.S.; Plataniotis, K.N. Separable common spatio-spectral patterns for motor imagery BCI systems. IEEE Trans. Biomed. Eng. 2015, 63, 15–29. [Google Scholar] [CrossRef]
  62. Zhao, D.; Tang, F.; Si, B.; Feng, X. Learning joint space–time–frequency features for EEG decoding on small labeled data. Neural Netw. 2019, 114, 67–77. [Google Scholar] [CrossRef]
  63. Sakhavi, S.; Guan, C.; Yan, S. Learning temporal information for brain-computer interface using convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5619–5629. [Google Scholar] [CrossRef]
  64. Mahamune, R.; Laskar, S.H. An automatic channel selection method based on the standard deviation of wavelet coefficients for motor imagery based brain–computer interfacing. Int. J. Imaging Syst. Technol. 2023, 33, 714–728. [Google Scholar] [CrossRef]
  65. Mane, R.; Chew, E.; Chua, K.; Ang, K.K.; Robinson, N.; Vinod, A.P.; Lee, S.W.; Guan, C. FBCNet: A multi-view convolutional neural network for brain-computer interface. arXiv 2021, arXiv:2104.01233. [Google Scholar]
  66. Altuwaijri, G.A.; Muhammad, G. Electroencephalogram-based motor imagery signals classification using a multi-branch convolutional neural network model with attention blocks. Bioengineering 2022, 9, 323. [Google Scholar] [CrossRef]
  67. Ma, W.; Xue, H.; Sun, X.; Mao, S.; Wang, L.; Liu, Y.; Wang, Y.; Lin, X. A novel multi-branch hybrid neural network for motor imagery EEG signal classification. Biomed. Signal Process. Control. 2022, 77, 103718. [Google Scholar] [CrossRef]
Figure 1. Detailed structure of BCINetV1 model with its constituent T-CAB, S-CAB, ConvSAT, and SEB blocks. Throughout the figure, the notation k @ m × n is used to specify a convolutional layer with k kernels of size m × n .
Figure 2. Five-fold classification performance evaluation of the BCINetV1 model, depicting (a) accuracy, (b) recall, (c) f-score, and (d) kappa across five subjects (AA, AL, AV, AW, and AY). The bar heights denote the mean outcomes, while the whiskers represent the standard deviation.
Figure 3. Illustration of classification accuracy for five subjects from Dataset 1 using different channel combinations. The x-axis indicates the number of electrode channels used for training (3, 8, 18, and 118 channels), while the y-axis shows the corresponding test accuracy. The channel combinations in the figure legend illustrate the channels used as the test data.
Figure 4. Statistical quantification of BCINetV1 for Dataset 1.
Figure 5. (a) Pairwise statistical significance (p-values) of features using ANOVA test. (b) Histogram of ANOVA p-values.
Figure 6. Topographical maps illustrating the model activations in different regions corresponding to the imagined activity. These maps represent the average response from the SEB module, calculated over the entire 3.5 s trial duration for each task. Dark red indicates elevated activity, while blue shades represent decreased activation.
Figure 7. BCINetV1 2D t-SNE embeddings for feature separability in Datasets 1–4.
Figure 8. Illustration of the working mechanism of T-CAB, S-CAB, and SEB modules in BCINetV1.
Figure 9. Variations in computational durations by altering the (a) time stamps; (b) frequency of the input MI EEG signals.
Figure 10. Multi-dimensional comparison of model performance on Dataset 1. Classification accuracy (y-axis) is plotted against the number of trainable parameters (x-axis, log scale). The size of each bubble denotes the model's inference time: larger bubbles indicate longer inference times. * denotes the model with the highest classification accuracy.
Figure 11. Impact of varying the model hyperparameters, (a) learning rate, (b) number of epochs, and (c) optimizers, on the classification outcomes.
Table 1. Overview of benchmark EEG decoding models for comparison.

| Sr. # | Model Name | Architecture/Working Mechanism | Significance | Relevance to BCINetV1 |
|---|---|---|---|---|
| CNN-based Models | | | | |
| 1. | EEGNet [16] | Compact; temporal convolutions for frequency filters, followed by depthwise separable convolutions for efficient spatial filtering across EEG channels. Extracts time-domain dynamic features and their spatial distributions. | Highly efficient, generalizable benchmark for various EEG paradigms, good with limited data/resources. | BCINetV1 also employs convolutional layers; EEGNet serves as a foundational compact CNN baseline. |
| 2. | DeepConvNet [20] | Deeper architecture; standard convolutional and pooling layers for hierarchical feature learning from raw EEG. Extracts increasingly complex temporal and spatial features. | Early successful deep learning model for EEG, showed CNN capability for time-series brain data. | Represents a standard deep CNN approach; BCINetV1 builds upon and refines convolutional strategies. |
| 3. | ShallowConvNet [20] | Simpler, shallow architecture; temporal convolution followed by a spatial convolution. Extracts fundamental temporal patterns (e.g., band-power) and spatial distributions. | Strong, simpler baseline capturing basic EEG features with good interpretability. | Offers a contrast to deeper models and highlights efficiency; BCINetV1 aims for both depth and efficiency in its convolutional components. |
| 4. | FBCNet [27] | Integrates filter banks (frequency sub-bands) with CSP-like spatial filters learned within a CNN. Extracts frequency-specific spatial patterns. | Effectively combines traditional signal processing (filter banks, CSP) with deep learning. | BCINetV1 aims to learn spectral features directly; FBCNet provides a benchmark for hybrid spectral–spatial feature extraction. |
| 5. | MSCNN [28] | Parallel convolutional pathways with varying kernel sizes/receptive fields for multiscale processing. Extracts features from short/long duration events and localized/broader spatial activities. | Captures richer features by considering information from different resolutions, robust to signal variations. | BCINetV1 incorporates multiscale principles in its design; MSCNN is a direct benchmark for multiscale convolutional approaches. |
| 6. | TSception [29] | Multiscale temporal convolutions (inception-inspired) followed by spatial feature learning. Extracts diverse temporal features from various receptive fields for short/long-term dependencies. | Specialized for rich, multiscale temporal information extraction, crucial for dynamic brain states. | Aligns with BCINetV1’s focus on temporal feature extraction at multiple scales through its convolutional design. |
| Attention-based Models | | | | |
| 7. | TCANet [30] | TCN backbone enhanced with temporal self-attention. Extracts long-range temporal dependencies and weights important time points. | Strong capability for sequential EEG modeling, highlights critical temporal segments. | BCINetV1 uses attention (ConvSAT); TCANet benchmarks attention specifically on temporal sequences learned by TCNs. |
| 8. | ATCNet [31] | Combines TCNs with attention across temporal, spatial, or feature dimensions. Extracts adaptively weighted temporal sequences and salient channel interactions. | Enhances TCN feature learning with dynamic, data-driven focus. | BCINetV1’s attention mechanism also aims for dynamic focus; ATCNet offers a comparison for TCNs augmented with attention. |
| 9. | CCNet [32] | Explicitly models inter-channel correlations using specialized convolutions/graphs with attention. Extracts spatial dependencies and connectivity patterns. | Focuses on complex spatial relationships in multi-channel EEG, vital for distributed brain activity. | While BCINetV1’s primary attention is tempo-spectral, CCNet provides context for attention on spatial/channel correlations. |
| 10. | EEG-Conformer [33] | Hybrid: CNNs for local feature extraction, then Conformer blocks (Transformer-style self-attention and convolution) for global dependencies. Extracts local, fine-grained features and global contextual relationships. | Effectively leverages CNN strengths for local patterns and Transformers for global interactions. | BCINetV1 combines convolution and attention; EEG-Conformer is a benchmark for hybrid CNN-Transformer (attention) architectures. |
| 11. | ST-Transformer or ST-DG [34] | Transformer-based; jointly models spatial (inter-channel) and temporal dependencies using factorized/specialized attention. Extracts integrated spatio-temporal dynamics. | Explicit, unified approach to capturing complex interplay between spatial and temporal aspects. | BCINetV1’s ConvSAT addresses tempo-spectral attention; ST-Transformer benchmarks pure Transformer-based spatio-temporal attention. |
Table 2. State-of-the-art comparison for MI EEG classification on Dataset 1, featuring BCINetV1. All performance metrics are reported in percentages (%). To facilitate comparison, studies are ordered by increasing average classification accuracy.

| Category | Authored By | Year | Method | AA | AL | AV | AW | AY | Avg. | Std. |
|---|---|---|---|---|---|---|---|---|---|---|
| Hybrid signal processing methods | Belwafi et al. [35] | 2019 | DSAA | 69.5 | 96.3 | 60.5 | 70.5 | 78.6 | 81.9 | 14.18 |
| | Barmpas et al. [36] | 2023 | Brain-wave scattering Net | 78.9 | 92.2 | 60.3 | 86.7 | 85.7 | 80.7 | 11.07 |
| | Singh et al. [37] | 2019 | SR-MDRM | 79.4 | 100 | 73.4 | 89.2 | 88.4 | 86.1 | 10.15 |
| | Amardeep et al. [38] | 2019 | R-MDRM | 81.3 | 100 | 76.5 | 87.1 | 91.2 | 87.2 | 8.13 |
| | Dai et al. [39] | 2019 | DTMKB | 91.9 | 96.4 | 75.5 | 81.2 | 92.8 | 87.6 | 7.89 |
| | Jaidaa et al. [40] | 2019 | WPD+HOS+SVM | 89.6 | 99.3 | 77.9 | 97.5 | 94.3 | 91.7 | 7.66 |
| | Amin et al. [41] | 2020 | DFBCSP+DSLVQ+SSVM/GRBF | 93.5 | 98.5 | 81.8 | 93.6 | 96.1 | 92.7 | 5.77 |
| | Wijaya et al. [42] | 2021 | LRFS+TSD | 93.9 | 92.1 | 98.5 | 94.6 | 96.7 | 95.2 | 2.51 |
| | Sadiq et al. [43] | 2019 | MEWT+JIA+MLP | 95.0 | 95.0 | 95.0 | 100 | 100 | 97.0 | 2.70 |
| Deep learning methods | Simulated | 2025 | EEGNet | 61.1 | 70.3 | 55.3 | 57.9 | 58.9 | 60.7 | 5.15 |
| | Simulated | 2025 | ATCNet | 71.1 | 77.1 | 73.5 | 76.5 | 77.7 | 75.7 | 2.51 |
| | Simulated | 2025 | CCNet | 79.0 | 76.1 | 85.5 | 86.9 | 86.9 | 82.9 | 4.47 |
| | Liu et al. [44] | 2022 | SACNN | 91.0 | 92.0 | 77.0 | 77.0 | 79.0 | 83.2 | 6.82 |
| | Simulated | 2025 | ShallowConvNet | 77.5 | 81.8 | 88.3 | 89.8 | 89.9 | 85.2 | 4.80 |
| | Simulated | 2025 | EEG-Conformer | 89.7 | 86.1 | 86.7 | 87.9 | 90.9 | 88.3 | 1.80 |
| | Miao et al. [45] | 2020 | Spatial Frequency+CNN | 97.2 | 90.0 | 90.0 | 90.0 | 80.0 | 90.0 | 7.10 |
| | Simulated | 2025 | TSception | 93.6 | 89.3 | 93.6 | 89.4 | 93.6 | 91.9 | 2.07 |
| | Simulated | 2025 | DeepConvNet | 92.1 | 94.5 | 92.1 | 89.7 | 95.7 | 92.8 | 2.09 |
| | Simulated | 2025 | TCANet | 94.1 | 94.7 | 89.7 | 89.6 | 97.4 | 93.1 | 3.03 |
| | Mehtiyev et al. [46] | 2023 | DeepEnsembleNet | 96 | 96.6 | 88.7 | 90.6 | 96 | 93.6 | 3.26 |
| | Simulated | 2025 | MSCNN | 91.8 | 98.7 | 92.1 | 97.8 | 97.4 | 95.6 | 2.97 |
| | Sharma et al. [47] | 2023 | LSTM+multi-head Attention | 97.5 | 98.3 | 99.5 | 97.6 | 98.4 | 98.2 | 2.72 |
| | This study | 2025 | BCINetV1 | 98.2 | 98 | 98.7 | 99.0 | 99.4 | 98.6 | 0.50 |
Table 3. Ablation study on the impact of individual and combined processing modules on BCINetV1’s classification performance. The p-values are derived from a one-way ANOVA test performed on the features extracted by each model configuration, assessing the statistical significance of the separability between the features of different classes. Here, (*) indicates p > 0.1, (**) denotes p < 0.1, and (***) signifies p < 0.05.

| Experimental Blocks | Accuracy (%) | Extracted Features ANOVA Test p-Values |
|---|---|---|
| T-CAB w/o Temporal ConvSAT | 52.06 | * |
| SEB w/o channels reordering | 53.09 | * |
| S-CAB w/o Spectral ConvSAT | 55.08 | * |
| SEB | 72.45 | ** |
| T-CAB | 75.98 | ** |
| S-CAB | 85.09 | ** |
| T-CAB + SEB | 88.87 | ** |
| S-CAB + SEB | 90.67 | ** |
| T-CAB + S-CAB + SEB | 98.68 | *** |
Table 5. Comparison chart of Dataset 3 with state-of-the-art methods. To facilitate comparison, studies are ordered by increasing average classification accuracy.

| Category | Authored By | Year | Method | Accuracy (%) | Recall (%) | F-Score (%) | Kappa (%) |
|---|---|---|---|---|---|---|---|
| Hybrid signal processing methods | Siuly et al. [51] | 2017 | PCA based RF Model | 83.2 ± 8.33 | - | - | - |
| | Sadiq et al. [57] | 2020 | 20-order Matrix Determinant+FFNN | 91.8 ± 2.58 | - | - | - |
| | Binwen et al. [49] | 2022 | EFD-CNN | 93.8 ± NA | - | 93.7 ± NA | 86.6 ± NA |
| | Sadiq et al. [50] | 2020 | CADMMI-SDI | 99.3 ± 2.60 | - | - | - |
| Deep Learning methods | Simulated | 2025 | EEGNet | 83.1 ± 4.44 | 84.2 ± 4.83 | 83.5 ± 4.96 | 74.6 ± 6.69 |
| | Simulated | 2025 | ShallowConvNet | 84.8 ± 1.82 | 85.0 ± 0.97 | 86.0 ± 1.96 | 77.1 ± 2.74 |
| | Simulated | 2025 | DeepConvNet | 84.9 ± 3.73 | 85.8 ± 4.16 | 84.8 ± 3.91 | 77.2 ± 5.60 |
| | Simulated | 2025 | CCNet | 86.5 ± 6.94 | 86.5 ± 6.81 | 87.0 ± 6.61 | 79.8 ± 10.43 |
| | Simulated | 2025 | FBCNet | 87.9 ± 1.55 | 87.4 ± 1.63 | 88.7 ± 2.14 | 81.7 ± 2.34 |
| | Simulated | 2025 | MSCNN | 88.9 ± 3.12 | 88.7 ± 3.77 | 88.3 ± 3.73 | 83.3 ± 4.68 |
| | Simulated | 2025 | TSception | 91.8 ± 1.79 | 91.3 ± 1.13 | 90.8 ± 1.38 | 87.6 ± 2.70 |
| | Simulated | 2025 | ATCNet | 93.4 ± 1.80 | 92.5 ± 1.64 | 92.2 ± 2.57 | 90.1 ± 2.71 |
| | Simulated | 2025 | ST-Transformer | 94.7 ± 2.55 | 93.6 ± 2.12 | 95.3 ± 1.95 | 92.1 ± 3.84 |
| | Simulated | 2025 | TCANet | 96.0 ± 1.93 | 95.6 ± 1.62 | 95.5 ± 1.57 | 94.0 ± 2.91 |
| | Simulated | 2025 | EEG-Conformer | 97.2 ± 1.95 | 98.3 ± 2.63 | 98.0 ± 1.07 | 95.7 ± 2.94 |
| | This Study | 2025 | BCINetV1 | 97.2 ± 0.30 | 96.9 ± 0.547 | 97.5 ± 0.52 | 95.3 ± 0.90 |
Table 6. Comparison chart of Dataset 4 with state-of-the-art methods. To facilitate comparison, studies are ordered by increasing average classification accuracy.

| Category | Authored By | Year | Method | Accuracy (%) | Recall (%) | F-score (%) | Kappa (%) |
|---|---|---|---|---|---|---|---|
| Hybrid signal processing methods | Agha et al. [61] | 2016 | SCSSP | 63.8 ± 0.49 | - | - | - |
| | Zhao et al. [62] | 2019 | WaSF+ConvNet | 69 ± NA | - | - | 58 ± NA |
| | Sakhavi et al. [63] | 2018 | C2CM | 74.4 ± NA | - | - | 65.9 ± NA |
| | Mahamune et al. [64] | 2022 | stdWC+CSP+CNN | 75.0 ± 0.67 | - | - | - |
| | Luo et al. [58] | 2020 | ESVL | 82.5 ± NA | - | - | 65 ± NA |
| Deep Learning methods | Liu et al. [34] | 2023 | ST-Transformer | 57.7 ± 0.01 | - | - | - |
| | Schirrmeister et al. [20] | 2017 | DeepConvNet | 70.9 ± NA | - | - | - |
| | Schirrmeister et al. [20] | 2017 | ShallowConvNet | 73.7 ± NA | - | - | - |
| | Simulated | 2025 | CCNet | 76.7 ± NA | 74.5 ± NA | 75.8 ± NA | 68.9 ± NA |
| | Simulated | 2025 | EEGNet | 76.4 ± 14.6 | - | - | 68.6 ± 19.5 |
| | Song et al. [33] | 2022 | EEG-Conformer | 78.6 ± 0.26 | - | - | 71.55 ± NA |
| | Mane et al. [65] | 2020 | FBCNet | 79.0 ± NA | - | - | - |
| | Altuwaijri et al. [66] | 2022 | MB-EEG-CBAM | 82.8 ± 11.3 | 83.2 ± 11.3 | 83 ± 0.15 | 77.1 ± 11.3 |
| | Ma et al. [67] | 2022 | MB-HNN | 83.9 ± 9.09 | - | - | 78 ± 12 |
| | Altaheri et al. [31] | 2023 | ATCNet | 85.4 ± 9.1 | - | - | 81 ± 12 |
| | Liu et al. [30] | 2022 | TCANet | 86.8 ± 10.3 | - | - | - |
| | Yang et al. [28] | 2024 | MSFCNNnet | 87.1 ± 8.85 | - | 87 ± 9.01 | 82 ± 12 |
| | Zhang et al. [59] | 2023 | AMSTCNet | 87.5 ± 8.04 | 87.4 ± NA | 88 ± NA | 83 ± 0.11 |
| | Simulated | 2025 | TSception | 88.5 ± NA | 88.5 ± NA | 86.2 ± NA | 84.7 ± NA |
| | This Study | 2025 | BCINetV1 | 98.4 ± 0.60 | 98.4 ± 0.61 | 98.4 ± 0.60 | 97.8 ± 0.81 |