AI
  • Review
  • Open Access

1 February 2026

Efficient Feature Extraction for EEG-Based Classification: A Comparative Review of Deep Learning Models

1 Division of Engineering, Saint Mary’s University, Halifax, NS B3H 3C3, Canada
2 Sobey School of Business, Saint Mary’s University, Halifax, NS B3H 3C3, Canada
3 Department of Psychology and Neuroscience, Dalhousie University, Halifax, NS B3H 4R2, Canada
* Author to whom correspondence should be addressed.
AI 2026, 7(2), 50; https://doi.org/10.3390/ai7020050
This article belongs to the Topic Theoretical Foundations and Applications of Deep Learning Techniques

Abstract

Feature extraction (FE) is an important step in electroencephalogram (EEG)-based classification for brain–computer interface (BCI) systems and neurocognitive monitoring. However, the dynamic and low-signal-to-noise nature of EEG data makes achieving robust FE challenging. Recent deep learning (DL) advances have offered alternatives to traditional manual feature engineering by enabling end-to-end learning from raw signals. In this paper, we present a comparative review of 88 DL models published over the last decade, focusing on EEG FE. We examine convolutional neural networks (CNNs), Transformer-based mechanisms, recurrent architectures including recurrent neural networks (RNNs) and long short-term memory (LSTM), and hybrid models. Our analysis focuses on architectural adaptations, computational efficiency, and classification performance across EEG tasks. Our findings reveal that efficient EEG FE depends more on architectural design than model depth. Compact CNNs offer the best efficiency–performance trade-offs in data-limited settings, while Transformers and hybrid models improve long-range temporal representation at a higher computational cost. Thus, the field is shifting toward lightweight hybrid designs that balance local FE with global temporal modeling. This review aims to guide BCI developers and future neurotechnology research toward efficient, scalable, and interpretable EEG-based classification frameworks.

1. Introduction

Electroencephalography (EEG) is a non-invasive and affordable technique for studying brain activity across clinical, cognitive, engineering, and security domains [1]. The classification of different brain states or conditions, such as motor imagery (MI), emotional responses, seizure detection, or cognitive workload, is a major focus in EEG research [2]. EEG is also utilized for biometric authentication and identification, treating neural signals as unique signatures to verify or differentiate individuals [3,4]. Success in these tasks depends on feature extraction (FE) to transform raw, noisy, and variable EEG signals into informative and robust representations for machine learning (ML) algorithms [5].

Traditional EEG FE combined common spatial patterns (CSP), wavelet transforms (WTs), and spectral power analysis (SPA) techniques with shallow ML methods like support vector machines (SVMs) for classification. While effective in controlled settings, these pipelines are sensitive to noise, intrasubject variability, and the high-dimensional nature of EEG signals [6,7,8]. They also require domain expertise and extensive preprocessing. In contrast, modern deep learning (DL) models learn features directly from raw or minimally processed EEG data and have been applied in classification tasks and user authentication [9,10,11]. Although promising, they face challenges including large computational requirements, limited training data, and difficulty generalizing across subjects and sessions.

Many studies have recently focused on designing DL architectures that balance representational power with efficiency. Researchers have adapted CNNs, recurrent networks, Transformer-based models, and hybrid designs to capture temporal, spatial, and frequency features while reducing the computational complexity for EEG signals [12,13]. Efficient FE is key to improving the classification accuracy (Acc.) and enabling practical implementation in BCIs and mobile neurotechnology systems [14]. In this review, we survey and compare DL models developed for efficient FE in EEG-based classification between 2015 and 2025. We examine their design principles, computational trade-offs, and reported performance across various EEG applications. The contributions of this paper are as follows:
  • A comprehensive review of 114 DL-based EEG classification papers, including systematic reviews, CNN-based models, Transformer-based models, CNN–Transformer hybrids, and other recurrent-based hybrids.
  • An evaluation and discussion of 88 DL-based EEG models, covering the most common network architectures, along with an analysis of efficiency and performance challenges.
  • An in-depth trade-off analysis using different evaluation approaches to cover a wider spectrum of possible trade-offs.
  • The identification of current challenges in DL-based EEG classification and potential directions to inform future research.
The remainder of this review is organized as follows. We first describe the methodology and paper collection process. Next, we provide a brief background for EEG-based classification and existing surveys. We then survey DL models for efficient FE, grouped by architectural family—namely convolutional, Transformer-based, recurrent-based, and hybrid networks. Comparative trade-off insights across these models are highlighted, followed by a discussion of emerging trends, future directions, and recommendations.

2. Methods

To ensure transparency in conducting our comparative review, we followed a procedure inspired by the PRISMA reporting principles (Figure 1), without claiming a formal systematic review. For a comprehensive review, we collected a total of 114 papers about DL-based EEG classification. The initial screening was based on titles and abstracts. Then, we removed duplicate and irrelevant studies. The remaining papers were reviewed in full to assess their methodologies, FE techniques, DL architectures, and performance metrics. We categorized key information from each study to structurally compare FE strategies.

2.1. Identification

In the selection process used to identify relevant studies, we implemented a title- and abstract-based search strategy across major academic research databases, namely IEEE Xplore, Scopus, ScienceDirect, PubMed, and Google Scholar. These databases were chosen to cover engineering, biomedical, ML, and interdisciplinary venues where EEG-based DL studies are published. The IEEE Xplore database consists of articles in engineering and interdisciplinary fields. The Scopus and ScienceDirect databases consist of articles in engineering, computing, and scientific research. PubMed comprises bioengineering and neuroscience articles. Google Scholar indexes papers across multiple disciplines and publishers, including engineering, computer science, neuroscience, and biomedicine. For EEG-based DL classification, which spans these domains, Google Scholar helps to capture studies that may not be indexed in a single database. We considered only papers published between 2015 and 2025 to ensure the inclusion of recent and rapid advancements in DL architectures. Using relevant keywords including “EEG”, “feature extraction”, “deep learning”, “classification”, and their combinations, we retrieved an initial pool of 893 papers. The keywords were selected based on common usage in the literature. The acronym “EEG” was used because it is a widely adopted form in paper titles and abstracts across relevant fields, which made it effective for identification during the search phase. The full term “deep learning” was used instead of its acronym to avoid ambiguity, as certain acronyms have different meanings across disciplines. The full term is also commonly used in article titles, which improved the search accuracy.

2.2. Screening

The collected papers were subjected to a multistage screening process. First, titles and abstracts were examined to identify studies relevant to DL-based EEG classification. We removed duplicate records, non-English publications, and studies unrelated to our scope. A total of 256 papers were retained for abstract and full-text examination. Studies were considered eligible if they met at least one of the following inclusion criteria: (1) they were surveys on EEG-based classification, (2) they employed EEG-based DL classification models, or (3) they included DL-based design strategies to improve feature extraction. Surveys and studies lacking methodological detail were excluded at this stage.
Figure 1. Workflow model of our methodology.

2.3. Inclusion

The full-text assessment resulted in 114 papers that we kept for an in-depth analysis. Instead of applying inclusion/exclusion rules typical of systematic reviews, we organized the studies based on their methodological focus. The papers were categorized as follows: (1) 40 papers on CNN-based EEG classification, (2) 15 papers on Transformer-based EEG classification, (3) 22 papers on CNN–Transformer hybrid architectures, (4) 11 papers using miscellaneous deep learning approaches, and (5) 26 surveys on EEG-based classification.

2.4. Analysis

For each study, input representations, feature extraction techniques, network architectures, and reported performance and efficiency metrics were extracted and compared to identify trends, strengths, and challenges in the field. This structured grouping allowed us to perform a fair comparison of DL strategies without imposing the constraints of a formal systematic review protocol.

4. Feature Extraction for EEG-Based Classification

4.1. Efficiency in CNN-Based Models

Efficient FE aims to boost the classification performance (Acc., robustness) while reducing the overall cost (fewer parameters, fast inference, simple models). Over the past decade, many studies (Table 2) have proposed CNN architectures to improve the efficiency of EEG FE. In this section, we analyze a selection of studies to show how each approach enhances FE and impacts both performance and efficiency. Most works focus on EEG-based biometric mechanisms to address traditional challenges. For clarity, we group the studies by their main methodological focus, although overlap across categories exists.

4.1.1. Applications to Raw EEG

Initial studies of CNNs showed that temporal and spatial features could be directly extracted from minimally preprocessed signals, eliminating the need for manually engineered features. The work by Ma et al. [70] showed that CNNs could automatically extract useful patterns from raw resting-state EEG signals and create reliable “brain fingerprints” to identify individuals. Their end-to-end pipeline was jointly optimized using gradient descent. It uses two convolutional layers to extract invariant temporal patterns, followed by two average pooling layers to reduce the dimensionality and computational cost (Comp. Cost), and ends with a fully connected layer for classification. This approach achieved 88% Acc. in a 10-class identification task and maintained strong results with very low-frequency bands (0–2 Hz) and short temporal segments (<200 ms), reaching 76% Acc. with only 62.5 ms of data. Following this direction, Mao et al. [71] used a CNN to learn discriminative EEG features for person identification. Their pipeline processed raw EEG signals from a large-scale driving fatigue experiment. The CNN architecture utilized three convolutional layers with ReLU and max pooling modules after each, followed by two fully connected layers and a softmax output for classification. This method achieved 97% Acc. from 14,000 testing epochs and was trained in only 0.3 h for over 100 K epochs, outperforming traditional shallow classifiers. Schirrmeister et al. [13] proposed shallow and deep ConvNets for raw EEG decoding. The shallow network, inspired by Filter Bank Common Spatial Patterns (FBCSP), combined temporal convolutions, spatial filtering, squaring nonlinearity, mean pooling, and logarithmic activation to extract band power features. The deep network captured hierarchical spatiotemporal modulations, splitting the first block into temporal and spatial layers for regularization. Cropped training with sliding windows augmented the dataset and reduced overfitting. Although training was slower than for FBCSP, the predictions were efficient. Batch normalization and ELU activation improved its performance. Visualizations revealed meaningful frequency modulation, demonstrating robust decoding and interpretable feature learning. This approach outperformed FBCSP pipelines, achieving higher mean Acc. of 84.0%, compared to 82.1%. Similarly, Schöns et al. [72] presented a deep CNN-based biometric system trained on raw EEG. Recordings were segmented into 12 s overlapping windows to expand the training data. The network, composed of convolution, ReLU activation, pooling, and normalization layers, processed the signals, and classification layers were later discarded. CNN outputs served as compact feature vectors for verification. The system achieved an equal error rate (EER) of 0.19%, with the gamma band providing the most discriminative signal. Sliding-window augmentation and the shallow CNN enabled both scalability and near-perfect identification Acc. Di et al. [73] also proposed a CNN-based EEG biometric identification system. EEG signals were preprocessed and segmented before being input into the network. Temporal convolutions captured frequency patterns, while spatial convolutions modeled correlations across electrodes. Deeper layers extracted subject-specific representations to distinguish individual participants. Zhang et al. [74] designed HDNN and CNN4EEG for EEG-based event classification. 
HDNN divides EEG epochs into sub-epochs processed by child DNNs with shared weights, boosting the training speed and cutting the memory usage, while improving the Acc. over standard DNNs. CNN4EEG uses custom convolutional filters adapted to EEG’s spatial and temporal structure to effectively capture its spatiotemporal patterns. CNN4EEG outperformed all baselines, achieving 13% higher Acc. than shallow methods, 9% higher than that of a canonical CNN, and 6% higher than that of a DNN. These works demonstrate feasibility, but the systems’ performance depends on large models and extensive training data.
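To make the typical raw-EEG pipeline concrete, the sketch below shows a minimal CNN in the spirit of the early end-to-end identification models discussed above: two temporal convolutions, average pooling to cut dimensionality, and a fully connected classifier. All layer sizes, the sampling rate, and the class count are illustrative assumptions rather than any author's exact configuration.

```python
import torch
import torch.nn as nn

class RawEEGCNN(nn.Module):
    """Minimal raw-EEG identification CNN: two temporal convolutions,
    two average-pooling stages, and a fully connected classifier.
    Layer sizes are illustrative, not a published configuration."""
    def __init__(self, n_channels=64, n_samples=160, n_subjects=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),  # temporal filters
            nn.ReLU(),
            nn.AvgPool1d(4),            # reduce temporal dimension and compute
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AvgPool1d(4),
        )
        self.classifier = nn.Linear(64 * (n_samples // 16), n_subjects)

    def forward(self, x):               # x: (batch, channels, time)
        z = self.features(x)
        return self.classifier(z.flatten(1))

# Example: one batch of 1 s segments sampled at an assumed 160 Hz
logits = RawEEGCNN()(torch.randn(8, 64, 160))   # -> (8, 10)
```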

4.1.2. Frequency- and Spectral-Domain Approaches

Further applications have transformed EEG signals into compact representations to highlight spectral discriminants. To adapt CNNs to consumer-grade EEG, González et al. [75] proposed a 1D CNN operating on power spectral density (PSD) estimates computed from 6 s EEG windows with Welch’s method. The PSD inputs offered compact frequency-domain representations, reducing the dimensionality while retaining discriminative information. Convolution and max pooling layers operated like wavelet decomposition by extracting coarse spectral features; combined with downsampling, subsequent layers captured finer temporal–spectral patterns. An inception block further enabled multiscale feature learning. This approach achieved 94% Acc., surpassing SVM baselines and showing that PSD-based CNNs can efficiently extract subject-specific features from low-cost hardware. For SSVEP classification, Waytowich et al. [76] compared traditional FE methods such as canonical correlation analysis (CCA) with the compact EEGNet [12]. The compact CNN extracted frequency, phase, and amplitude features directly from raw EEG. This contrasts with CCA, which requires prior knowledge of the stimulus signals and performs well in synchronous paradigms but fails in asynchronous ones. Using temporal convolutions as bandpass filters and depthwise spatial convolutions for spatial filtering, its compact design reduces the parameters and supports training on small datasets. It achieved 80% cross-subject Acc., outperforming traditional methods and demonstrating robust, calibration-free performance for BCI applications. Expanding this line of work, Yu et al. [77] focused on low-frequency SSVEP components (<20 Hz) as discriminative features for user authentication. These signals were marked by high intersubject and low intrasubject variability. They were isolated using a Chebyshev low-pass filter to suppress less informative high-frequency oscillations. For classification, the authors extended Schirrmeister’s shallow ConvNet [13] to a multiclass variant (M-Shallow) by introducing parallel temporal filters and additional layers to improve the scalability without increasing complexity. Despite its lightweight design of 30 k parameters, it preserved fast training and inference and delivered high Acc. across multiple tasks. This integration yielded 97% cross-day authentication Acc. on eight subjects and shows that shallow architectures can be adapted to complex EEG biometrics with high efficiency. Compared with raw-signal CNNs, frequency-domain CNNs balance interpretability and efficiency by focusing on known neural rhythms. They draw on neuroscience principles to simplify models and enhance their robustness. These models show that embedding spectral knowledge into the network is more effective than increasing its depth.
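As an illustration of the PSD-based input strategy, the following sketch computes Welch PSD features per channel from a single EEG window, as a 1D CNN in this family might consume them. The sampling rate, window length, and frequency cutoff are assumptions for the example, not the settings of [75].

```python
import numpy as np
from scipy.signal import welch

def psd_features(eeg, fs=128, fmax=40.0):
    """Compute Welch log-PSD estimates per channel for one EEG window.

    eeg: array of shape (n_channels, n_samples). Frequencies above fmax are
    dropped so the CNN sees a compact frequency-domain representation.
    Sampling rate, sub-window length, and fmax are illustrative assumptions.
    """
    nperseg = fs * 2                              # 2 s sub-windows (50% overlap by default)
    freqs, pxx = welch(eeg, fs=fs, nperseg=nperseg, axis=-1)
    keep = freqs <= fmax
    return freqs[keep], np.log10(pxx[:, keep] + 1e-12)   # log scale stabilizes magnitudes

# Example: a 6 s window from 16 consumer-grade channels at 128 Hz
eeg = np.random.randn(16, 6 * 128)
freqs, X = psd_features(eeg)                      # X feeds a 1D CNN over the frequency axis
```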

4.1.3. EEG Representation and Topology Strategies

Other EEG strategies focus on how signals are structured before entering CNNs. They range from spatial electrode mappings to graph-based connectivity encodings to exploit the spatial organization and network properties of brain activity. Lai et al. [78] examined how EEG input representation affects CNN-based biometrics. The matrix of amplitude vs. time presents raw EEG amplitude values in their default channel order. It preserves spatial patterns, minimizes the preparation time, and achieves high identification Acc. Converting data into images introduces slight information loss but reduces storage, proving an acceptable trade-off for handling large datasets. Normalizing energy in an image of energy vs. time stabilizes large power values, simplifies CNN calculations, and improves FE. Rearranging channels by correlation degrades performance by disrupting the 2D spatial patterns that are essential for CNN feature learning. The study shows that CNN performance depends on input data structuring. Wang et al. [79] represented EEG signals as functional connectivity (FC) graphs generated using the phase locking value (PLV) in the beta and gamma bands, producing fused representations. Applying graph CNNs (GCNNs) to these graphs resulted in higher correct recognition rates (CRRs), robust generalization, reduced training time, and efficient transfer learning. In a related approach, 1D EEG data were converted into 2D layouts and processed with 3D CNNs for the simultaneous extraction of spatial and temporal features. This 3D method outperformed 2D CNNs, reduced the input dimensionality, and captured complex spatiotemporal patterns in ERP classification. Graph-based connectivity methods [80] and 3D convolutional representations [81] extended this further, capturing interchannel relationships and depth. Wang et al. [80] evaluated three EEG biometric methods—RHO + CNN, UniFeatures + CNN, and Raw + CNN—in terms of efficiency and performance. Firstly, RHO + CNN integrates FC with a CNN. The FC module calculates beta-band synchronization using the RHO index to generate 2D maps, which the CNN processes for FE and classification. Training converges in about 10 epochs, and the model achieves a CRR of 99.94% with low equal error rates (EERs). Its success comes from stable identity information in FC. Secondly, UniFeatures + CNN extracts univariate features such as autoregressive (AR) coefficients, fuzzy entropy (FuzzyEn), and PSD. While efficient, it yields lower CRRs and higher EERs in cross-state authentication, due to the state dependence of these features. Lastly, Raw + CNN directly processes raw EEG, requiring around 25 epochs for training and producing the lowest CRRs and highest EERs. It is highly sensitive to mental states, noise, and artifacts, resulting in poor generalization. Overall, RHO + CNN is the fastest, most accurate, and robust method, effectively handling mental state variabilities. Zhang et al. [81] applied 3D convolution to high-density EEG by modeling the electrode layout as a 2D grid with time as a third dimension to jointly learn scalp topology-aware spatial filters and temporal dynamics. The model’s computation was optimized through kernel and dimensionality choices. It outperformed 2D CNNs in tasks using spatial patterns like frontal–occipital activity and achieved strong results in ER and seizure detection. These representation strategies emphasize spatial locality and generalize better across cognitive tasks in comparison with frequency-based models.
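The snippet below sketches how a phase locking value (PLV) connectivity matrix of the kind used in these graph-based approaches can be computed from a band-filtered EEG segment; the resulting matrix can serve as the adjacency input to a graph CNN. It is a generic PLV computation, not the authors' published pipelines.

```python
import numpy as np
from scipy.signal import hilbert

def plv_matrix(eeg):
    """Phase locking value between every pair of channels.

    eeg: (n_channels, n_samples) array, assumed already bandpass-filtered to
    the band of interest (e.g., beta or gamma). Returns a symmetric
    (n_channels, n_channels) connectivity matrix usable as a graph adjacency.
    """
    phase = np.angle(hilbert(eeg, axis=-1))       # instantaneous phase per channel
    n = eeg.shape[0]
    plv = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            diff = phase[i] - phase[j]
            plv[i, j] = plv[j, i] = np.abs(np.mean(np.exp(1j * diff)))
    return plv

adjacency = plv_matrix(np.random.randn(8, 512))   # 8 channels, 512 samples
```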

4.1.4. ERP- and VEP-Based CNN Models

In contrast to continuous EEG, discrete approaches leverage stimulus-locked potentials as discriminative signatures, embedding spatial and temporal filtering directly into CNN architectures. Das et al. [82] investigated visual evoked potentials (VEPs) as stable biometric signatures. Data from 40 subjects across two sessions underwent preprocessing, including common average referencing (CAR), bandpass filtering, downsampling, z-score normalization, and detrending, before generating averaged VEP templates. Template averaging enhanced the signal-to-noise ratio and reduced the input dimensionality for efficient training. These inputs were processed by stacked convolutional and max pooling layers to extract spatiotemporal features and improve the robustness to subject variability. The system achieved 98.8% rank-1 identification Acc. and strong temporal stability across sessions performed a week apart. Cecotti et al. [83] conducted a systematic evaluation of CNN architectures of varying depth for single-trial ERP detection. They showed that well-designed CNNs can efficiently extract discriminative ERP features with minimal preprocessing. Integrating convolutional spatial filtering and shift-invariant temporal convolution achieved an average area under the curve (AUC) of 0.905 across subjects, while weight sharing reduced the parameters and accelerated training. Training multisubject classifiers eliminated subject-specific calibration and enhanced the efficiency and practicality of ERP-based BCI systems. Although the initial training was computationally intensive, the resulting models offered robustness and scalability beyond traditional linear methods. Extending this work, Cecotti et al. [84] evaluated 1D, 2D, and 3D CNNs for single-trial ERP detection. They introduced a volumetric representation by remapping 64 EEG sensors into a 2D scalp layout and adding time as a third dimension. These multidimensional convolutions simultaneously captured robust spatiotemporal features, improved generalization, and reduced the sensitivity to subject variability. The best 3D CNN required fewer inputs and achieved a mean AUC of 0.928, demonstrating superior efficiency and scalability. Chen et al. [85] proposed the Global Spatial and Local Temporal Filter CNN (GSLT-CNN). It integrates global spatial convolutions, which capture interchannel dependencies, with local temporal convolutions that extract fine dynamics. Trained directly on raw EEG signals, it achieved 96% Acc. across 157 subjects and up to 99% in rapid serial visual presentation (RSVP)-based cross-session tasks. The model was trained on 279,000 epochs in under 30 min, while remaining lightweight, robust, and highly adaptable. This demonstrates strong potential for EEG biometrics and broader BCI applications. These studies show how CNNs can be adapted to task-specific temporal alignments, in contrast to generic time–frequency approaches.
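A minimal example of the stimulus-locked template averaging underlying these VEP/ERP systems is sketched below; the epoch boundaries, sampling rate, and normalization step are illustrative choices, not the preprocessing of [82].

```python
import numpy as np

def vep_template(eeg, onsets, fs=256, pre_s=0.1, post_s=0.5):
    """Average stimulus-locked epochs into a VEP template.

    eeg: (n_channels, n_samples) continuous recording; onsets: sample indices
    of stimulus events. Averaging across trials raises the SNR and shrinks the
    input a CNN has to process. All window lengths here are assumptions.
    """
    pre, post = int(pre_s * fs), int(post_s * fs)
    epochs = [eeg[:, t - pre:t + post] for t in onsets
              if t - pre >= 0 and t + post <= eeg.shape[1]]
    template = np.mean(epochs, axis=0)                # (n_channels, pre + post)
    # z-score each channel so raw amplitude differences do not dominate training
    return (template - template.mean(axis=1, keepdims=True)) / (
        template.std(axis=1, keepdims=True) + 1e-12)

template = vep_template(np.random.randn(17, 256 * 60),
                        onsets=np.arange(300, 15000, 700))
```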

4.1.5. Multiscale and Temporal Modeling Strategies

Other strategies have expanded the receptive fields and decomposed EEG into finer sub-bands. This allows CNNs to capture both short- and long-range dynamics efficiently. Salami et al. [86] introduced EEG-ITNet, an Inception-based temporal CNN, for efficient MI classification in BCI systems. FE was performed through inception modules and causal convolutions with dilation. These modules separated multichannel EEG into informative sub-bands using parallel convolutional layers with different kernel sizes. Depthwise convolutions combined electrode information. The temporal convolution block employed dilated causal convolutions in residual layers to increase the receptive field. This hierarchical design efficiently integrated spectral, spatial, and temporal information with fewer parameters and enhanced interpretability compared to other architectures. EEG-ITNet achieved 76.74% mean Acc. in within-subject and 78.74% in cross-subject scenarios and also generalized well on the OpenBMI dataset. Bai et al. [87] leveraged temporal convolutional networks (TCNs), which apply causal and dilated convolutions to efficiently capture long-range temporal features from sequential EEG data. Dilated convolutions enabled exponentially expanding receptive fields that could model dependencies far into the past, while residual connections stabilized training as the architecture deepened. TCNs processed sequences in parallel to avoid the bottlenecks of recurrent networks, reduce memory usage during training, and allow faster inference. Compared to LSTM and gated recurrent units (GRUs), TCNs delivered superior performance across tasks. They improved FE and the predictive Acc., making TCNs a scalable and powerful alternative for EEG sequence modeling. For MI decoding in BCIs, Riyad et al. [88] introduced Incep-EEGNet, a deep ConvNet architecture. The pipeline processed raw EEG signals through bandpass filtering and trial segmentation before end-to-end FE and classification. Incep-EEGNet applied filters similar to EEGNet [12], followed by an Inception block. This block integrated parallel branches with varying convolutional kernels, pointwise convolution, and average pooling to extract richer temporal features. Depthwise convolution reduced the Comp. Cost, while pointwise convolution served as a residual connection. The model achieved 74.07% Acc. and a kappa of 0.654, outperforming traditional methods. Liu et al. [89] also proposed a parallel spatiotemporal self-attention CNN for four-class MI classification. The spatial module assigns higher weights to motor-relevant channels and reduces artifacts. The temporal module emphasizes MI-relevant sampling points and encodes continuous temporal changes. The model achieved 78.51% Acc. for intrasubject and 74.07% for intersubject classification. This lightweight design supports real-time BCI applications and outperformed both traditional and DL models. Multiscale strategies were further refined by Zhu et al. [90] in RAMST-CNN, a residual and multiscale spatiotemporal convolutional network for EEG-based personal identification. The model combined residual connections, multiscale grouping convolutions, global average pooling, and batch normalization. Parallel convolutions extracted both coarse and fine temporal patterns, while residual links facilitated gradient flow. Despite being lightweight, RAMST-CNN achieved 99.96% Acc. Similarly, Lakhan et al. [91] introduced a broadband network.
It processes multiple EEG frequency bands in parallel through convolution branches tailored to distinct ranges. By integrating multiband FE into a single model, it achieved higher Acc. than single-band approaches. It maintained lower parameter counts than training separate models for each band. Ding et al. [92] introduced Tsception, a multiscale CNN for ER from raw EEG. It consisted of three layers: a dynamic temporal layer, an asymmetric spatial layer, and a high-level fusion layer. The temporal layer used 1D kernels of varying lengths scaled to the EEG sampling rate to capture both long-term low-frequency and short-term high-frequency dynamics. The spatial layer used global and hemisphere-specific kernels to exploit EEG asymmetry and model interhemispheric relationships. Features were then integrated by the fusion layer for classification. Tsception achieved higher Acc. and F1 scores than other methods while remaining compact with fewer parameters, supporting online BCI use. These approaches trade off efficiency for robustness. They capture dynamic EEG patterns beyond fixed time windows, situating them between raw CNNs and sequential models.
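The core building block behind the TCN-style models above is the dilated causal convolution wrapped in a residual connection. The sketch below shows one such block and how stacking blocks with doubling dilation grows the receptive field exponentially; channel counts and kernel size are illustrative assumptions, not the settings of [87] or [94].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedBlock(nn.Module):
    """One TCN-style residual block: dilated causal convolutions give an
    exponentially growing receptive field over the EEG time axis."""
    def __init__(self, channels=32, kernel_size=4, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation            # left-pad only => causal
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.act = nn.ELU()

    def forward(self, x):                                   # x: (batch, channels, time)
        y = self.act(self.conv1(F.pad(x, (self.pad, 0))))
        y = self.act(self.conv2(F.pad(y, (self.pad, 0))))
        return self.act(y + x)                              # residual keeps gradients flowing

# Stack blocks with doubling dilation to cover long EEG histories cheaply
tcn = nn.Sequential(*[CausalDilatedBlock(dilation=2 ** i) for i in range(4)])
out = tcn(torch.randn(8, 32, 500))                          # same temporal length out
```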

4.1.6. Compact and Lightweight Architectures

Further CNN approaches prioritize efficiency, reducing parameter counts while preserving discriminative power. A key milestone was set by Lawhern et al. [12], who introduced EEGNet, a compact CNN for general-purpose BCI applications. The network processed raw EEG trials directly, using temporal convolutions for frequency filters, depthwise convolutions for spatial patterns, and a separable convolution to summarize temporal dynamics. This design performed well with limited data and no augmentation. EEGNet’s efficiency and interpretable features made it robust and practical for both ERP- and oscillatory-based BCIs. Subsequent variants [93,94] optimized EEGNet [12]. Salimi et al. [93] applied EEGNet [12] to extract cognitive signatures for biometric identification. They combined the N-back task cognitive protocol with an optimized EEGNet model to elicit and capture robust EEG patterns. This approach reduced the recording time and Comp. Cost. Using single EEG segments of only 1.1 s, the lightweight network achieved up to 99% identification Acc. Ingolfsson et al. [94] also developed an EEGNet [12] and TCN-based [87] model optimized for edge devices. EEG-TCNet used temporal convolutions with depthwise separable filters and dilation to capture EEG dynamics. It had only 4272 parameters and 6.8 M multiply–accumulate operations (MACs) per inference, but it achieved 83.8% Acc. on four-class MI, matching larger networks. Its low memory and compute footprint supports real-time on-device BCI processing, indicating that high Acc. is achievable with very few parameters. The model also generalized well across 12 datasets, surpassing prior results on the MOABB benchmark. Similarly, Kasim et al. [95] showed that even simple 1D CNNs could achieve competitive performance with minimal parameter counts. Their model extracted temporal features from each EEG channel using small convolution kernels, fused them efficiently, and avoided 2D operations, resulting in a lightweight network with only 10 K parameters. It matched the Acc. of more complex models in ER and MI tasks while being faster and easier to train. Lightweight convolutional strategies continued with Wu et al. [96]. They introduced MixConv CNNs by embedding varying convolution kernels within each layer to emulate an FBCSP approach. Different kernel lengths targeted frequency bands from delta to gamma to allow simultaneous temporal filtering. Mixed-FBCNet used 0.45 M parameters and only 10 EEG channels yet achieved high Acc. and low EERs. Building on this compactness, Altuwaijri et al. [97] developed a multibranch CNN for MI. EEG signals were split into three branches, processed by EEGNet [12] modules, and recombined using squeeze and excitation (SE) attention blocks, which reweighted features to improve the discriminability and efficiency. This compact architecture achieved 70% Acc. on challenging four-class MI tasks, outperforming the single-branch EEGNet [12]. Autthasan et al. [98] proposed MIN2Net, a multitask learning model for subject-independent MI EEG classification, to eliminate calibration for new users. EEG signals were bandpass-filtered and encoded into latent vectors by a multitask autoencoder. Deep metric learning (DML) with triplet loss refined these embeddings by clustering same-class samples and separating different-class samples, and a supervised classifier performed the final prediction. This design reduced preprocessing and kept the model small.
MIN2Net improved the F1 score by 6.7% on SMR-BCI and 2.2% on OpenBMI, with ablation studies confirming the contribution of the DML module. Latent feature visualizations showed clearer feature clustering than baselines, and its consistent performance across binary and multiclass tasks demonstrates its suitability for calibration-free online BCI applications. Bidgoly et al. [99] used Siamese and triplet CNNs for EEG biometrics. Their model learned compact embeddings of EEG segments for subject matching, rather than multiclass classification. This approach avoided large output layers, ensuring scalability as the number of subjects increased, and achieved a 98.04% CRR with a 1.96% error rate on 105 subjects from the PhysioNet EEG dataset. Alsumari et al. [100] also studied EEG-based identification under controlled recording conditions as a matching task between distinct brain states (open vs. closed eyes). Using a CNN to differentiate individuals, their approach achieved a 99.05% CRR with only a 0.187% error rate on PhysioNet. As in the work of Bidgoly et al. [99], the model was compact, with small output layers. The results indicated robust FE for efficient and reliable identification. These studies emphasize deployment and robustness in resource-limited environments. Networks like EEGNet [12] and its variants [93,94] show that, for EEG tasks, efficiency and inductive biases matter more than large architectures. Unlike computer vision, where depth means richer features, EEG’s low SNR and small datasets make shallow models more effective. This is why EEGNet [12] has become a baseline for research and applied settings.
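The sketch below captures the EEGNet-style recipe that most of these compact models share: a temporal convolution acting as a learned frequency filter, a depthwise spatial convolution over electrodes, and a separable convolution that summarizes temporal dynamics. The hyperparameters follow the general pattern of [12] but are illustrative assumptions, and the resulting parameter count stays in the low thousands.

```python
import torch
import torch.nn as nn

class CompactEEGNet(nn.Module):
    """EEGNet-style compact CNN: temporal convolution (frequency filtering),
    depthwise spatial convolution (per-filter spatial patterns), and a
    separable convolution summarizing temporal dynamics. Sizes are illustrative."""
    def __init__(self, n_channels=22, n_classes=4, F1=8, D=2, F2=16):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(1, F1, (1, 64), padding=(0, 32), bias=False),         # temporal filters
            nn.BatchNorm2d(F1),
            nn.Conv2d(F1, F1 * D, (n_channels, 1), groups=F1, bias=False),  # depthwise spatial
            nn.BatchNorm2d(F1 * D),
            nn.ELU(),
            nn.AvgPool2d((1, 4)),
            nn.Dropout(0.25),
            nn.Conv2d(F1 * D, F1 * D, (1, 16), padding=(0, 8),
                      groups=F1 * D, bias=False),                           # separable: depthwise...
            nn.Conv2d(F1 * D, F2, (1, 1), bias=False),                      # ...then pointwise
            nn.BatchNorm2d(F2),
            nn.ELU(),
            nn.AvgPool2d((1, 8)),
            nn.Dropout(0.25),
        )
        self.classify = nn.LazyLinear(n_classes)   # infers flattened size on first call

    def forward(self, x):                          # x: (batch, 1, channels, time)
        return self.classify(self.block(x).flatten(1))

model = CompactEEGNet()
logits = model(torch.randn(4, 1, 22, 500))                     # (4, 4)
print(sum(p.numel() for p in model.parameters()))              # on the order of a few thousand
```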

4.1.7. Benchmarking and Next-Generation CNN Designs

Comparative benchmarks highlight trade-offs across CNN families. Recently, Yap et al. [101] benchmarked CNN architectures such as GoogLeNet, InceptionV3, ResNet50/101, DenseNet201, and EfficientNet-B0 for EEG-based classification. EEG signals were converted to time–frequency images as input to these networks. They found that these CNNs could achieve high Acc. on EEG tasks, although trade-offs exist between model complexity and speed. EfficientNet-B0 matched the Acc. of deeper models while using fewer parameters, making it faster. ResNet101 and DenseNet201 offered only slight Acc. gains at the cost of a higher Comp. Cost and memory. The study concluded that balancing performance with computational efficiency is crucial and highlighted the potential of lightweight transfer learning for EEG. Chen et al. [102] introduced EEGNeX, a novel CNN that leverages the latest DL advances. The model combines expanded receptive fields, attention mechanisms, depthwise convolutions, bottleneck layers, residual connections, and optimized convolutional blocks to enhance spatiotemporal FE in EEG signals. As with EEGNet [12] and prior lightweight models, this model achieved high Acc. across multiple EEG classification tasks. Shakir et al. [103] proposed a CNN-based EEG authentication system with three convolutional layers, max pooling, two ReLU-activated dense layers, dropout, and a softmax output optimized using root mean square propagation (RMSprop). Feature selection used Gram–Schmidt orthogonalization (GSO) to identify three key channels (Oz, T7, Cz). For FE, the CNN was a “fingerprint model”, with the penultimate layer’s vectors compared against stored templates using cosine similarity (CS). Supporting both single-task and multitask FE, the system’s Acc. improved from 71% to 95% after optimizing the final dense layer to 30 neurons. This makes it a robust approach for practical EEG authentication. These works illustrate that simplified vision architectures can be adapted for EEG. This reinforces that EEG benefits from shallow designs rather than from scaled ones.
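As a hedged illustration of the lightweight transfer-learning setup benchmarked in [101], the snippet below replaces the classification head of a torchvision EfficientNet-B0 so it can be fine-tuned on EEG time–frequency images; the input size, weight choice, freezing strategy, and class count are assumptions, not the study's exact protocol.

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical fine-tuning setup: EEG trials are first converted to 3-channel
# time-frequency images (e.g., spectrograms resized to 224x224), and only the
# classifier head is replaced for the EEG task.
n_classes = 4
backbone = models.efficientnet_b0(weights=None)        # ImageNet weights could be loaded instead
backbone.classifier[1] = nn.Linear(backbone.classifier[1].in_features, n_classes)

# Optionally freeze the feature extractor to cut training cost
for p in backbone.features.parameters():
    p.requires_grad = False

images = torch.randn(8, 3, 224, 224)                   # batch of spectrogram images
logits = backbone(images)                              # (8, n_classes)
```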

4.1.8. Fusion and Hybrid Feature Extraction Strategies

Beyond unimodal pipelines, fusion strategies integrate complementary FE pathways, like spatial, temporal, spectral, or multimodal information, for improved robustness and efficiency. Wu et al. [104] fused EEG with blinking features for multitask authentication. Using an RSVP paradigm, the system evoked stable EEG and electrooculography (EOG) signals. Hierarchical EEG features were extracted by selecting discriminative channels and time intervals via pointwise biserial correlation, averaging, and forming spatiotemporal maps for a CNN. Blinking signals were processed into time-domain morphological features through a backpropagation network. The two modalities were fused at the decision level using least squares, achieving 97.6% Acc. Subsampling and weight sharing ensured computational efficiency. Özdenizci et al. [105] incorporated adversarial training into CNN FE by adding an auxiliary domain discriminator to enforce subject-invariant EEG features. This approach reduced the need for calibration and improved cross-subject generalization, with only minor computational overhead. Musallam et al. [106] advanced hybrid feature fusion by introducing TCNet-Fusion, an enhanced version of EEG-TCNet [94] with a fusion layer combining features from multiple depths. Shallow outputs from an initial EEGNet [12] module were concatenated with deep temporal features from a TCN [87]. This multilevel fusion improved the MI classification Acc. compared to the base EEG-TCNet [94] while adding a negligible Comp. Cost. The model was small due to the efficiency of EEGNet [12] and the TCN [87]. Mane et al. [107] introduced FBCNet, a CNN designed to handle limited training data, noise, and multivariate EEG for MI classification in BCIs. The network processed raw EEG through multiview spectral filtering and isolated MI-relevant frequency bands. The spatial convolution block captured discriminative spatial patterns for each band. The variance layer extracted temporal features by computing the variance in non-overlapping windows, emphasizing activity related to event-related desynchronization/synchronization (ERD/ERS). These strategies simplified the processing of complex EEG, enhanced the robustness, and reduced overfitting. The compact FBCNet performed well across diverse MI datasets, including stroke patients. These implementations show CNNs’ flexibility, transitioning from standalone classifiers to modular components in multimodal or sequential pipelines. They function as efficient feature extractors when paired with TCNs or RNNs or integrated into adversarial/fusion strategies.
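One concrete fusion-oriented component is FBCNet's variance layer, which converts a filtered feature map into per-window variances that track band-power changes. The sketch below is a minimal stand-alone version; the window count is an illustrative choice, not the published configuration.

```python
import torch
import torch.nn as nn

class VarianceLayer(nn.Module):
    """FBCNet-style temporal variance pooling: split the time axis into
    non-overlapping windows and keep the per-window variance, which reflects
    band-power (ERD/ERS-related) changes. Window count is illustrative."""
    def __init__(self, n_windows=4):
        super().__init__()
        self.n_windows = n_windows

    def forward(self, x):                        # x: (batch, features, time)
        b, f, t = x.shape
        t_trim = (t // self.n_windows) * self.n_windows
        x = x[:, :, :t_trim].reshape(b, f, self.n_windows, -1)
        return x.var(dim=-1)                     # (batch, features, n_windows)

pooled = VarianceLayer()(torch.randn(8, 32, 500))    # -> (8, 32, 4)
```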

4.1.9. Cross-Integration Overlaps

Compact CNNs [12,93] also process raw EEG inputs (Section 4.1.1); we classify them as compact designs (Section 4.1.6) to emphasize their efficiency contributions rather than input modalities. Other compact CNNs [94,95,96] also adopt spectral representations (Section 4.1.2). We also categorize these as compact designs (Section 4.1.6), since architectural parsimony is their primary focus. Some methods [78,79] combine frequency decomposition (Section 4.1.2) with topological electrode mapping. We discuss them in Section 4.1.3, since spatial encoding is their focus. Temporal CNNs [88,90,92] are also hybrid architectures that incorporate sequential models. We include them in Section 4.1.5 to emphasize their CNN-specific contributions, while broader hybridization is covered in Section 4.1.8. Benchmark studies [101,102] include compact and deeper architectures. We discuss them under benchmarking Section 4.1.7 to highlight comparative findings, rather than compactness (Section 4.1.6). Table 2 below presents a chronological overview of the studies.
Table 2. CNN-based architectures for EEG classification.
Ref. | Author/Year | Model | Protocol | Samples | Channels | Inspiration Basis | Acc.
[70] | Ma et al. 2015 | CNN-based | Resting | 10 | 64 | CNN | 88.00
[71] | Mao et al. 2017 | CNN-based | VAT | 100 | 64 | CNN | 97.00
[75] | González et al. 2017 | ES1D | AVEPs | 23 | 16 | 1D CNN [Inception] | 94.01
[82] | Das et al. 2017 | CNN-based | MI, VEPs | 40 | 17 | CNN | 98.80
[83] | Cecotti et al. 2017 | CNN1-6 | RSVP | 16 | 64 | CNN | 83.10–90.50
[11] | Schirrmeister et al. 2017 | ConvNets | MI | 9/54 | 22/54/3 | ResNet, Deep/Shallow ConvNet | 81.00–85.20
[87] | Bai et al. 2018 | TCN | – | – | – | CNN | 97.20–99.00
[12] | Lawhern et al. 2018 | EEGNet | ERPs, ERN, SMR, MRCP | 15/26/13/9 | 64/56/22 | CNN | 0.91 [AUC]
[104] | Wu et al. 2018 | CNN-based | RSVP | 10/15 | 16 | CNN | 97.60
[72] | Schöns et al. 2018 | CNN-based | Resting | 109 | 64 | CNN | 99.00
[73] | Di et al. 2018 | CNN-based | ERPs | 33 | 64 | CNN | 99.30–99.90
[74] | Zhang et al. 2018 | HDNN/CNN4EEG | RSVP | 15 | 64 | CNN | 89.00
[76] | Waytowich et al. 2018 | Compact EEGNet | SSVEP | 10 | 8 | EEGNet | 80.00
[78] | Lai et al. 2019 | CNN-based | Resting | 10 | 64 | CNN | 83.21/79.08
[85] | Chen et al. 2019 | GSLT-CNN | ERPs, RSVP | 10/32/157 | 28/64 | CNN | 97.06
[79] | Wang et al. 2019 | CNN-based | SSVEP | 10 | 8 | CNN | 99.73
[77] | Yu et al. 2019 | M-Shallow ConvNet | SSVEP | 8 | 9 | CNN | 96.78
[84] | Cecotti et al. 2019 | 1/2/3D CNN | RSVP | 16 | 64 | CNN | 92.80
[80] | Wang et al. 2019 | CNN-based | Resting | 109/59 | 64/46 | Graph CNN | 99.98/98.96
[105] | Özdenizci et al. 2019 | Adversarial CNN | RSVP | 3/10 | 16 | CNN [Adversarial] | 98.60
[93] | Salimi et al. 2020 | N-Back-EEGNet | N-back | 26 | 28 | EEGNet | 95.00
[94] | Ingolfsson et al. 2020 | EEG-TCNet | MI | 9 | 22 | EEGNet-TCN | 77.35–97.44
[88] | Riyad et al. 2020 | Incep-EEGNet | MI | 9 | 22 | Inception-EEGNet | 74.08
[89] | Liu et al. 2020 | PSTSA-CNN | MI | 9/14 | 22/44 | CNN-S Attention | 74.07–97.68
[95] | Kasim et al. 2021 | 1DCNN | Photic stimuli | 16 | 16 | CNN | 97.17
[90] | Zhu et al. 2021 | RAMST-CNN | MI | 109 | 64 | CNN [ResNet] | 96.49
[106] | Musallam et al. 2021 | TCNet-Fusion | MI | 9/14 | 22/44 | EEG-TCNet | 83.73–94.41
[107] | Mane et al. 2021 | FBCNet | MI | 9/54/37/34 | 22/20/27 | CNN | 74.70–81.11
[86] | Salami et al. 2022 | EEG-ITNet | MI | 54/9 | 20/22 | Inception-TCN | 76.19/78.74
[81] | Zhang et al. 2022 | 3D CNN | VEPs | 70 | 16 | CNN | 82.33
[99] | Bidgoly et al. 2022 | CNN-based | Resting | 109 | 64/32/3 | CNN | 98.04
[96] | Wu et al. 2022 | Mixed-FBCNet | MI | 109/9/10 | 64/22/10 | FBCNet | 98.89–99.48
[97] | Altuwaijri et al. 2022 | MBEEG-SE | MI | 9 | 22 | EEGNet-S Attention | 82.87–96.15
[98] | Autthasan et al. 2022 | MIN2Net | MI | 9/14/54 | 20/15 | AE [CNN] | 72.03/68.81
[92] | Ding et al. 2023 | TSception | EMO | 32/27 | 32/32 | GoogleNet | 61.27/63.75
[100] | Alsumari et al. 2023 | CNN-based | Resting | 109 | 3 | CNN | 99.05
[101] | Yap et al. 2023 | GoogleNet, ResNet, EfficientNet, DenseNet, Inception | ERPs | 30 | 14 | CNN | 80.00
[102] | Chen et al. 2024 | EEGNeX | ERPs, MI, SMR, ERN | 1/54/6/26 | 14/20/22/56 | EEGNet | 78.81–93.81
[103] | Shakir et al. 2024 | STFE/MTFE-R-CNN | MI | 109 | 64 | CNN | 89.00/95.00
[91] | Lakhan et al. 2025 | EEG-BBNet | MI, ERPs, SSVEP | 54 | 62/14/8 | CNN–Graph CNN | 99.26
Acc.: Accuracy, AE: Autoencoder, Adversarial: Adversarial CNN, AVEP: Auditory Visual Evoked Potential, Deep/Shallow ConvNet: Deep/Shallow Convolutional Network, EMO: Emotion Protocol, ERN: Error-Related Negativity, ERP: Event-Related Potential, Inception: Inception Variant, MI: Motor Imagery, MRCP: Movement-Related Cortical Potential, MHA: Multihead Attention, N-Back Memory: N-Back Memory Task, Photic Stimuli: Light Stimulation Protocol, ResNet: Residual Neural Network, Resting: Resting State Protocol, RSVP: Rapid Serial Visual Presentation, S Attention: Self-Attention, SMR: Sensory Motor Rhythm, SSVEP: Steady-State Visual Evoked Potential, TN: Tensor Network, T Encoder: Transformer Encoder, VAT: Visual Attention Task, VEPs: Visual Evoked Potential, Y [X]: Indicates that model Y has components from model X.

4.2. Efficiency in Transformer-Based Models

Transformers provide a flexible and scalable framework for EEG FE (Table 3). By leveraging SA, they capture complex temporal dynamics, cross-channel spatial relationships, and long-range dependencies. Innovations such as generative pretraining, modular designs, and ensemble strategies enhance their robustness, discriminative power, and computational efficiency. Their ability to process raw signals, integrate multiple feature domains, and adapt across tasks has established them as a versatile tool in EEG analysis.

4.2.1. Applications to Raw EEG

Several studies have focused on Transformers that directly process raw EEG to eliminate manual feature engineering while achieving high Acc. and efficiency. Arjun et al. [38] adapted Vision Transformer (ViT) [34] for EEG ER, comparing CWT-based scalograms with raw multichannel signals. The model used multihead self-attention (MHSA), with the raw-input ViT treating each time window as a patch to capture transient emotional patterns. The raw model outperformed the CWT version, achieving 99% Acc. on DEAP. Shorter windows (6 s) boosted the performance by emphasizing local dynamics and increasing the sample size. A design with six layers, 512-D embeddings, and eight heads made the ViT two to three times smaller than its NLP counterparts. The study showed that end-to-end attention-based learning can surpass CNN/LSTM baselines for applied use. Siddhad et al. [41] applied a pure Transformer model with four stacked Transformer encoders and MHSA. Positional encoding integrated temporal order and channel information to enable the joint learning of spatiotemporal patterns. The model size was tuned per dataset to reduce overfitting and the Comp. Cost, simplify preprocessing, and improve efficiency. This approach removed the need for features like PSD or entropy. It achieved >95% Acc. for mental workload and >87% for age/gender classification, matching state-of-the-art results.
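The following sketch illustrates the raw-EEG tokenization idea shared by these models: short multichannel time windows become tokens, a standard Transformer encoder applies MHSA over them, and a pooled representation is classified. The embedding size, depth, patch length, and class count are assumptions, and positional encoding is omitted for brevity.

```python
import torch
import torch.nn as nn

class RawEEGTransformer(nn.Module):
    """Raw-EEG Transformer sketch: each short time window of the multichannel
    signal becomes one token, MHSA mixes tokens, and a mean-pooled
    representation is classified. Sizes are illustrative, not from [38] or [41].
    Positional encoding is omitted to keep the sketch short."""
    def __init__(self, n_channels=32, patch_len=32, d_model=128, n_heads=8,
                 n_layers=4, n_classes=2):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(n_channels * patch_len, d_model)   # patch -> token
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                        # x: (batch, channels, time)
        b, c, t = x.shape
        n = t // self.patch_len
        patches = x[:, :, :n * self.patch_len].reshape(b, c, n, self.patch_len)
        tokens = self.embed(patches.permute(0, 2, 1, 3).reshape(b, n, -1))
        z = self.encoder(tokens)                 # self-attention over time windows
        return self.head(z.mean(dim=1))          # mean-pool tokens, then classify

logits = RawEEGTransformer()(torch.randn(4, 32, 768))    # e.g., 6 s at 128 Hz
```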

4.2.2. Generative and Self-Supervised Foundation Models

Transformers have also been used as foundational models by using self-supervised or generative learning to extract robust representations, synthesize EEG signals, and support cross-task generalization. Dosovitskiy et al. [34] introduced ViT. It replaces convolutional feature extractors by dividing images into fixed-size patches, embedding them, and applying MHSA. This design removes CNNs’ inductive biases and enables flexible and generalizable feature learning. ViT models (Base, Large, Huge) were pretrained on large datasets like ImageNet-21k and JFT-300M. MHA encodes diverse interactions in parallel, and pretrained representations allow high Acc. on smaller datasets. ViT showed that pretrained Transformers can match or surpass CNNs in FE while offering faster, more adaptable performance. This approach inspired EEG-specific Transformers like EEGPT [46]. Omair et al. [44] developed the Generative EEG Transformer (GET), a GPT-style [31] model that learns long-range EEG features through self-supervised signal generation. By predicting future EEG samples, GET forces attention layers to extract oscillations and distant dependencies. Pretraining on diverse EEG datasets rendered GET a foundational model. This improved subsequent tasks like epilepsy detection and BCI control while generating robust synthetic EEG for data augmentation, surpassing generative adversarial network (GAN) methods. Once pretrained, GET reduces the reliance on manual preprocessing and customized design. Lim et al. [45] proposed EEGTrans, a two-stage generative Transformer that integrates a vector-quantized autoencoder (VQ-VAE) with a Transformer decoder for EEG synthesis. The VQ-VAE compresses EEG into discrete latent tokens while filtering noise. The Transformer decoder models these tokens autoregressively, reconstructing realistic EEG with preserved spectral characteristics. This efficient separation of local and global modeling reduces the sequence length and allows a focus on informative features. EEGTrans resulted in realistic data augmentation, surpassing GAN-based methods. It also demonstrated strong cross-dataset generalization and robust unsupervised pretraining. Wang et al. [46] introduced EEGPT, a Transformer pretrained with masked self-supervised learning. EEG signals were segmented along time and channels, with large portions masked (50% of time segments, 80% of channels). Then, an encoder was trained to reconstruct missing data while aligning latent embeddings via a momentum encoder. This strategy forced EEGPT to learn spatiotemporal dependencies to produce robust representations. With 10 M parameters and minimal fine tuning, EEGPT achieved high performance across diverse EEG tasks. The masking strategy doubled as data augmentation and improved generalization while avoiding task-specific model design. These models provide scalable, reusable EEG representations that reduce the fine tuning costs.
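A minimal sketch of masked self-supervised pretraining in this spirit is shown below: a fraction of time-patch tokens is hidden, and an encoder plus a small decoder are trained to reconstruct them. The masking ratio, token dimensions, and toy modules are assumptions, not the published EEGPT or GET recipes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_pretrain_step(encoder, decoder, tokens, mask_ratio=0.5):
    """One masked self-supervised step: hide a fraction of tokens, encode the
    corrupted sequence, and train the model to reconstruct the hidden tokens.
    `encoder`/`decoder` are any token-to-token modules; this is a generic
    sketch, not a published recipe. tokens: (batch, n_tokens, d_model)."""
    b, n, d = tokens.shape
    mask = torch.rand(b, n) < mask_ratio                    # True = hidden token
    corrupted = tokens.clone()
    corrupted[mask] = 0.0                                   # zero out masked patches
    reconstructed = decoder(encoder(corrupted))             # predict all tokens
    return F.mse_loss(reconstructed[mask], tokens[mask])    # loss only on hidden ones

# Toy modules standing in for a Transformer encoder/decoder pair
enc = nn.TransformerEncoder(nn.TransformerEncoderLayer(64, 4, batch_first=True), 2)
dec = nn.Linear(64, 64)
loss = masked_pretrain_step(enc, dec, torch.randn(8, 40, 64))
loss.backward()
```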

4.2.3. Modular and Dual-Branch Spatiotemporal Transformers

Some approaches model spatial and temporal EEG features using attention to improve the interpretability, discriminative power, and efficiency for tasks like ER and MI. Song et al. [39] proposed S3T, a two-branch Transformer for EEG MI that separates spatial and temporal FE. A spatial filtering module (CSP) improves the signal quality, followed by an encoder to capture dependencies between brain regions via SA. A Transformer then focuses on sequential time steps to extract evoked patterns. With only three MHA layers, this modular design directs each encoder to a specific signal aspect to improve efficiency and reduce overfitting. Trained end-to-end on BCI Competition IV-2a, S3T achieved 82–84% Acc. Its shallow design and integrated spatial filtering enhanced its generalization without large datasets. Du et al. [42] proposed ETST, a dual-encoder Transformer that separately models EEG features for person identification. The Temporal Transformer Encoder (TTE) treats each time point as a token to capture correlations, rhythms, and ERPs, while the Spatial Transformer Encoder (STE) treats each channel as a token to model connectivity. Preprocessing with bandpass filtering, artifact removal, and z-score normalization reduced noise. Trained without pretraining or augmentation, ETST achieved high Acc. The results also showed that modular attention aligned with the EEG structure can yield rich discriminative representations in a compact model. Hu et al. [108] presented HASTF, a hybrid spatiotemporal attention network for EEG-based ER. EEG was filtered into five frequency bands, and differential entropy (DE) features were extracted from 1 s windows. These features were arranged into 3D patches to reflect the scalp topology. Spatial features were extracted by the Spatial Attention Feature Extractor (SAFE), which combines a U-shaped convolutional fusion module, skip connections, and parameter-free spatial attention. The Temporal Attention Feature Extractor (TAFE) applied positional embeddings and SA to model temporal dynamics. HASTF achieved high Acc. (>99%). Ablation studies showed temporal attention to be the most impactful component. Muna et al. [109] introduced SSTAF, the Spatial–Spectral–Temporal Attention Fusion Transformer, for upper-limb MI classification. EEG undergoes bandpass (8–30 Hz) and notch filtering, common average referencing, segmentation, and normalization. A Short-Time Fourier Transform (STFT) produces 4D time–frequency features. These rhythms were processed by spectral and spatial attention modules and an encoder to model temporal and channel interactions. SSTAF achieved 76.83% on EEGMMIDB and 68.30% on BCI Competition IV-2a, outperforming prior CNN and Transformer models. Ablation studies highlighted the role of the encoder in capturing temporal dynamics and of the attention modules in improving the Acc. Wei et al. [110] further refined dual-encoder modeling for EEG ER by separating temporal and spatial dependencies. A time step attention encoder extracted sequence-level features per channel, while a channel attention encoder captured interchannel relationships. A weighted fusion module combined these representations to improve the discriminative power and eliminate redundant computation. The approach achieved 95.73% intrasubject and 87.38% intersubject Acc.
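The dual-branch decomposition common to these designs can be sketched as two small Transformer encoders, one attending over time steps and one over channels, whose pooled outputs are fused before classification. The dimensions below are illustrative assumptions rather than the S3T or ETST configurations.

```python
import torch
import torch.nn as nn

class DualBranchEEGTransformer(nn.Module):
    """Dual-encoder sketch: one encoder attends over time steps, the other over
    channels, and their pooled outputs are concatenated for classification."""
    def __init__(self, n_channels=64, n_samples=256, d_model=64, n_classes=2):
        super().__init__()
        self.time_proj = nn.Linear(n_channels, d_model)     # one token per time step
        self.chan_proj = nn.Linear(n_samples, d_model)      # one token per channel
        def make():
            return nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), 2)
        self.temporal_enc, self.spatial_enc = make(), make()
        self.head = nn.Linear(2 * d_model, n_classes)

    def forward(self, x):                                   # x: (batch, channels, time)
        t_tokens = self.temporal_enc(self.time_proj(x.transpose(1, 2)))  # (b, time, d)
        s_tokens = self.spatial_enc(self.chan_proj(x))                   # (b, chan, d)
        fused = torch.cat([t_tokens.mean(1), s_tokens.mean(1)], dim=-1)
        return self.head(fused)

logits = DualBranchEEGTransformer()(torch.randn(4, 64, 256))
```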

4.2.4. Ensemble and Multidomain Transformers

Ensembles and specialized Transformers capture spectral and temporal patterns to enhance robustness and overall performance. Zeynali et al. [43] proposed a Transformer ensemble for EEG classification, where each model focuses on a different feature domain. A temporal Transformer captures waveform dynamics and short-term dependencies from raw EEG. A spectral Transformer processes PSD inputs to extract frequency-domain features and cross-frequency relationships. Their combination produced rich representations that led to strong results (F1 98.9% for cognitive workload). PSD preprocessing and targeted attention reduced noise and emphasized discriminative patterns. This fusion increased the computation, but the lightweight models mitigated overfitting, sped up convergence, and generalized across tasks without heavy pretraining. Ghous et al. [111] proposed a Transformer-based model for ER. Training proceeded in two stages: attention-enhanced base model development (AE-BMD) on SEED-IV and cross-dataset fine-tuning adaptation (CD-FTA) on SEED-V and MPED for generalization. Preprocessing was performed using Kalman and Savitzky–Golay (SG) filtering. FE included mel-frequency cepstral coefficients (MFCCs), gammatone frequency cepstral coefficients (GFCCs), PSD, DE, Hjorth parameters, band power, and entropy measures. Class imbalance was addressed using the Synthetic Minority Oversampling Technique (SMOTE). Spectral and temporal attention, positional encoding, MHA, and RNN/MLP-RNN layers captured precise patterns. The model achieved Acc. of 84% (SEED-IV), 90% (SEED-V), and 79% (MPED).
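A simple way to realize this kind of multidomain ensemble is decision-level fusion of a temporal and a spectral model, as sketched below; the equal weighting and the toy stand-in models are assumptions, not the fusion schemes of [43] or [111].

```python
import torch
import torch.nn as nn

def fused_prediction(temporal_model, spectral_model, raw_eeg, psd_feats, w=0.5):
    """Hypothetical decision-level fusion: one model sees raw EEG windows, the
    other sees PSD features, and their softmax outputs are averaged."""
    p_time = torch.softmax(temporal_model(raw_eeg), dim=-1)
    p_freq = torch.softmax(spectral_model(psd_feats), dim=-1)
    return w * p_time + (1.0 - w) * p_freq           # (batch, n_classes) probabilities

# Toy stand-ins for the two domain-specific models
temporal = nn.Sequential(nn.Flatten(), nn.Linear(32 * 256, 3))
spectral = nn.Sequential(nn.Flatten(), nn.Linear(32 * 40, 3))
probs = fused_prediction(temporal, spectral,
                         torch.randn(8, 32, 256), torch.randn(8, 32, 40))
```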

4.2.5. Specialized Attention Mechanisms and Dual Architectures

Mechanisms like gating, capsules, or regularization stabilize training, capture long-term dependencies, and refine EEG feature representations. Tao et al. [40] proposed a gated Transformer (GT) for EEG decoding. They integrated GRU/LSTM-inspired gating mechanisms into SA blocks to preserve the relevant EEG context, suppress noise, and stabilize FE over long sequences. The gating provides implicit regularization that improves convergence and maintains session stability. Trained from scratch without pretraining or augmentation, the model efficiently learns discriminative features for tasks such as continuous EEG decoding or ERP analysis. Wei et al. [112] introduced TC-Net, a Transformer capsule network, for EEG-based ER. They segmented EEG signals into non-overlapping windows and then processed them via a temporal Transformer module. A novel EEG PatchMerging strategy was used to balance global and local representations. Features were refined using an emotion capsule module that captured interchannel relationships before classification. TC-Net achieved strong performance on the DEAP and DREAMER datasets. It combines global context modeling, localized feature merging, and capsule refinement to improve the discriminative power, reduce redundancy, and enable efficient, robust recognition.

4.2.6. Cross-Integration Overlaps

Transformer-based EEG models have conceptual overlaps, where raw, generative, modular, ensemble, and hybrid approaches aim to improve efficiency, generalization, and scalability. All models [34,38,39,40,41,42,43,44,45,46,108,109,110,111,112] use SA mechanisms for FE. However, they differ in how they exploit it for efficiency. We classify the system in [46] under foundation models (Section 4.2.2) since its primary focus is masked self-supervised pretraining, despite combining it with spatiotemporal decomposition principles (Section 4.2.3) through time-channel segmentation. EEGTrans [45] incorporates a generative reconstruction mechanism within a two-stage architecture like modular Transformers (Section 4.2.3), but we place it under generative models (Section 4.2.2) since unsupervised representation learning is its central innovation. Models like those in [39,42] share parallel attention modules (Section 4.2.4) and dual-encoder hierarchies (Section 4.2.5). However, we place them under modular spatiotemporal designs (Section 4.2.3) to demonstrate their efficient decomposition, rather than dual fusion. Moreover, Refs. [43,111] utilize spectral and temporal attention (Section 4.2.3) within ensemble frameworks, but we group these models into multidomain ensembles (Section 4.2.4) because they focus on cross-domain integration. The works in [40,112] employ gating and capsule mechanisms, which can be considered extensions of generative (Section 4.2.2) or modular Transformers (Section 4.2.3), but we classify them as specialized and hybrid attention mechanisms (Section 4.2.5) to highlight their focus on efficiency and model stability, rather than representational design.
Table 3. Transformer-based architectures for EEG classification.
Ref. | Author/Year | Model | Protocol | Samples | Channels | Inspiration Basis | Acc.
[38] | Arjun et al. 2021 | ViT-CWT, ViT-Raw EEG | EMO-VAT | 32 | 32 | T Encoder (ViT) | 97.00/95.75, 99.40/99.10
[34] | Dosovitskiy et al. 2021 | ViT (Base/Large/Huge) | – | – | – | T Encoder (ViT) | 77.63–94.55
[39] | Song et al. 2021 | S3T | MI | 9/9 | 22/3 | T Encoder [CNN] | 82.59/84.26
[40] | Tao et al. 2021 | Gated Transformer | MI, VAT | 109/6 | 64/128 | T Encoder [GRU] | 61.11/55.40
[42] | Du et al. 2022 | ETST | Resting | 109 | 64 | T Encoder | 97.29–97.90
[43] | Zeynali et al. 2023 | Ensemble Transformer | VEPs | 8 | 64 | T Encoder | 96.10
[112] | Wei et al. 2023 | TC-Net | EMO, AVP | 32/23 | 48/15 | T Encoder [CapsNet, ViT] | 98.59–98.82
[41] | Siddhad et al. 2024 | Transformer-based | Resting | 60/48 | 14 | T Encoder | 95.28
[44] | Omair et al. 2024 | GET | MI/Alpha EEG | 9/20 | 3/16 | Transformer | 85.00
[46] | Wang et al. 2024 | EEGPT | ERPs, MI, SSVEP, EMO | 9–2383 | 58/3–128 | Transformer [BERT, ViT] | 58.46–80.59
[108] | Hu et al. 2024 | HASTF | EMO | 32/15 | 32/62 | Transformer [BERT] | 98.93/99.12
[110] | Wei et al. 2025 | Fusion Transformer | EMO | – | 62 | T Encoder | 87.38/95.73
[45] | Lim et al. 2025 | EEGTrans | MI | 1/7/9/14 | 3/22/59/128 | Transformer | 80.69–90.84
[111] | Ghous et al. 2025 | AE-BMD, CD-FTA | EMO | 15/20/23 | 62 | T Encoder [RNN, MLP] | 79.00–95.00
[109] | Muna et al. 2025 | SSTAF | MI | 103/9 | 64/22 | Transformer | 68.30/76.83
Acc.: Accuracy, AVP: Audiovisual Evoked Potential, BERT: Bidirectional Encoder Representations from Transformers, CapsNet: Capsule Network, EMO: Emotion Protocol, EMO-VAT: Emotion with Visual Attention Task, ERP: Event-Related Potential, GRU: Gated Recurrent Unit, MI: Motor Imagery, MI/Alpha EEG: Motor Imagery with Alpha EEG, MLP: Multilayer Perceptron, Resting: Resting Protocol, RNN: Recurrent Neural Network, SSVEP: Steady-State Visual Evoked Potential, T Encoder: Transformer Encoder, Transformer: Transformer Model, VAT: Visual Attention Task, ViT: Vision Transformer, VEP: Visual Evoked Potential.

4.3. Efficiency in CNN–Transformer-Based Hybrids

Recent advances in EEG decoding have combined Transformer-based architectures with CNNs or TCNs to capture local and global features efficiently (Table 4). Key ideas have included hybrid CNN–Transformer designs, multibranch networks for parallel FE, temporal convolution modules, and self-supervised pretraining for richer feature maps. Many models have also used spatiotemporal attention and hierarchical encoding like patching or multistage FE to improve the Acc. and efficiency. The following sections review these models.

4.3.1. Sequential Pipelines

Sequential pipelines apply CNNs first to extract localized spatiotemporal representations. These features are then fed into Transformer encoders, which extract long-range relationships. These designs preserve the hierarchical nature of EEG while improving efficiency through convolutional prefiltering. Sun et al. [113] combined CNN-based spectral and temporal filtering with Transformer attention. Although detailed architectural specifications were not disclosed, the hybrid model demonstrated significant Acc. gains over CNN or RNN baselines. Omair et al. [114] proposed ConTraNet, a hybrid CNN–Transformer architecture for both EEG and electromyography (EMG). CNN layers extracted local patterns, and the Transformer modeled long-range dependencies. Designed to generalize across modalities and tasks, ConTraNet achieved top performance on 2–10-class datasets. With limited data, CNN filtering improved efficiency by reducing overfitting and helping the Transformer to focus on key global patterns. Wan et al. [115] developed EEGFormer. They employed a depthwise 1D CNN frontend to extract channel-wise features before feeding them into a Transformer encoder for spatial and temporal SA. The depthwise CNN reduced the parameters, allowing end-to-end training on raw EEG. EEGFormer achieved high performance across multiple tasks (emotion, depression, SSVEP). Ma et al. [116] proposed a hybrid CNN–Transformer for MI classification. Preprocessing included 4–40 Hz bandpass filtering, z-score normalization, and one versus rest (OVR)-CSP. A two-layer CNN extracted local spatiotemporal features, complemented by MHA across channels and frequency bands. The model achieved 83.91% Acc. on BCI-IV 2a. It outperformed CNN-only (58.10%) and Transformer-only (46.68%) baselines, showing efficient joint local–global FE. Zhao et al. [117] introduced CTNet, a CNN–Transformer hybrid for MI EEG. CNN layers extracted spatial and temporal features, followed by Transformer attention across channels and time. CTNet showed improvements of 2–3% over prior hybrids (82.5% BCI-IV 2a, 88.5% BCI-IV 2b). Liu et al. [118] developed ERTNet, an interpretable CNN–Transformer framework for emotion EEG. Temporal convolutions isolated important frequency bands, spatial depthwise convolutions captured the channel topology, and a Transformer fused abstract spatiotemporal features. Achieving 74% on DEAP and 67% on SEED-V, the model provided attention-based interpretability and reduced feature dimensionality. These models reduce overfitting and memory use and perform well in MI and ER.
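The following minimal PyTorch sketch illustrates the sequential pattern described above: a convolutional frontend performs temporal and spatial (channel) filtering, and a small Transformer encoder then models long-range dependencies over the resulting tokens. All layer sizes (d_model, kernel lengths, pooling) are illustrative assumptions rather than any specific published configuration.

```python
# Minimal sketch of a sequential CNN -> Transformer pipeline for raw EEG.
import torch
import torch.nn as nn

class ConvTransformerEEG(nn.Module):
    def __init__(self, n_channels=22, n_classes=4, d_model=40, n_heads=4):
        super().__init__()
        # Convolutional frontend: temporal filtering, then spatial (channel) mixing.
        self.frontend = nn.Sequential(
            nn.Conv2d(1, d_model, kernel_size=(1, 25), padding=(0, 12)),
            nn.Conv2d(d_model, d_model, kernel_size=(n_channels, 1), groups=d_model),
            nn.BatchNorm2d(d_model), nn.ELU(), nn.AvgPool2d((1, 4)),
        )
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                 # x: (batch, channels, time)
        z = self.frontend(x.unsqueeze(1)) # -> (batch, d_model, 1, time/4)
        z = z.squeeze(2).transpose(1, 2)  # -> (batch, tokens, d_model)
        z = self.encoder(z)               # long-range temporal dependencies
        return self.head(z.mean(dim=1))   # pooled token -> class logits

print(ConvTransformerEEG()(torch.randn(4, 22, 1000)).shape)  # torch.Size([4, 4])
```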

4.3.2. Parallel and Multibranch Blocks

These hybrids process different feature types (temporal, spatial, or spectral) in dedicated branches before global fusion via attention or Transformer blocks. This allows simultaneous multiscale analysis with reduced redundancy. Li et al. [119] introduced a dual-branch CNN where one branch learned spatiotemporal features from raw EEG and the other processed scalograms. These features were fused and passed to a Transformer. This network achieved 96.7% Acc. on SEED and outperformed over ten previous methods. Xie et al. [120] introduced CTrans, a CNN–Transformer hybrid with dedicated spatial and temporal attention branches and optimized positional encodings. Certain variants embed EEG signals with CNNs before applying SA across channels or time. By separating attention dimensions, the model reduced the complexity, achieved 83% Acc. on two-class MI, and demonstrated efficient intersubject generalization. Si et al. [121] proposed TBEM, an ensemble combining a pure CNN and a CNN–Transformer hybrid. The CNN captured robust local features, and the hybrid model extracted global attention-based representations. Each block in the ensemble is lightweight, which enhances feature reliability. Averaging predictions improved Acc. and generalization, winning an IEEE EEG-ER competition. Yao et al. [122] introduced EEG ST-TCNN, a parallel spatiotemporal Transformer–CNN network. Separate Transformers modeled channels and time, with CNN fusion combining the outputs. This separation reduced Complex., and it achieved 96–96.6% Acc. on SEED/DEAP. Lu et al. [123] developed CIT-EmotionNet, a CNN–interactive Transformer for emotion EEG. Raw signals were converted to spatial–frequency maps; a parallel CNN extracted local features; and a Transformer captured global dependencies. Iterative interaction between the CNN and Transformer feature maps fused global and local information efficiently. The model achieved 98.57% on SEED and 92.09% on SEED-IV using a compact 10-layer network. These models improve performance due to their modular design and shared parameters, but the fusion stage slightly increases the computational overhead.
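A minimal sketch of the parallel-branch pattern is given below: one branch applies temporal convolutions, another applies pointwise spatial mixing, and the concatenated features are fused with a lightweight attention stage. Branch widths, pooling sizes, and the fusion head are assumptions for illustration only.

```python
# Minimal sketch of a dual-branch (parallel) EEG network with attention fusion.
import torch
import torch.nn as nn

class DualBranchEEG(nn.Module):
    def __init__(self, n_channels=32, n_classes=3, d=32):
        super().__init__()
        # Branch 1: temporal convolutions applied across time.
        self.temporal = nn.Sequential(
            nn.Conv1d(n_channels, d, kernel_size=15, padding=7),
            nn.ELU(), nn.AdaptiveAvgPool1d(16),
        )
        # Branch 2: pointwise (spatial) mixing across channels.
        self.spatial = nn.Sequential(
            nn.Conv1d(n_channels, d, kernel_size=1),
            nn.ELU(), nn.AdaptiveAvgPool1d(16),
        )
        self.fusion = nn.MultiheadAttention(2 * d, num_heads=4, batch_first=True)
        self.head = nn.Linear(2 * d, n_classes)

    def forward(self, x):                                      # x: (batch, channels, time)
        f = torch.cat([self.temporal(x), self.spatial(x)], dim=1)  # (batch, 2d, 16)
        f = f.transpose(1, 2)                                  # tokens: (batch, 16, 2d)
        fused, _ = self.fusion(f, f, f)                        # global fusion via attention
        return self.head(fused.mean(dim=1))

print(DualBranchEEG()(torch.randn(2, 32, 512)).shape)          # torch.Size([2, 3])
```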

4.3.3. Integrated and Hierarchical Attention

These techniques integrate CNN and Transformer components within attention modules. Hierarchical attention captures spatial, spectral, and temporal dependencies across scales, yielding rich multilevel representations. Bagchi et al. [124] designed a ConvTransformer for single-trial visual EEG classification, integrating temporal CNN filters and MHSA within a single hybrid block. This design avoids the need for deep networks by capturing local temporal patterns and global cross-channel relationships simultaneously. The model outperformed prior CNN architectures on five visual tasks, highlighting its efficiency in extracting informative EEG features. Song et al. [125] proposed EEG Conformer, a compact CNN–Transformer hybrid inspired by audio conformers. Shallow CNNs extract local spatiotemporal features, and six Transformer layers capture the global context. The model achieved strong MI decoding performance while providing interpretable attention maps. Si et al. [126] developed MACTN, a hierarchical CNN–Transformer for ER. A temporal CNN captures local patterns, and sparse attention learns temporal dependencies, while channel attention highlights key electrodes. This mixed attention strategy achieved top Acc. on DEAP datasets and won the 2022 BCI ER Challenge. Gong et al. [127] designed ACTNN, a CNN–Transformer hybrid for multivariate emotion EEG. Spatial and spectral attention emphasize relevant channels and frequency bands, convolutional layers extract local features, and a Transformer encodes the temporal context. ACTNN efficiently focused on emotion-relevant brain regions while providing interpretable attention visualizations. It achieved 98.47% SEED and 91.90% SEED-IV Acc. Such models yield high Acc. but at a moderate cost in complexity compared to sequential hybrids (Section 4.3.1).

4.3.4. TCN-Enhanced Models for Lightweight Temporal Modeling

TCNs complement attention mechanisms by efficiently modeling long-range temporal dependencies with dilated convolutions that reduce the Transformer depth and improve latency. Altaheri et al. [128] proposed ATCNet, integrating MHSA with a TCN [87] and CNN spatial filters for MI EEG. Dilated depthwise separable convolutions in the TCN efficiently modeled temporal patterns, while attention focused on discriminative features. The model achieved 84–87% Acc. on BCI-IV 2a datasets with low latency. This shows that shallow attention blocks with lightweight TCN and CNN modules yield high-performance EEG decoding. Nguyen et al. [129] implemented EEG-TCNTransformer by combining a TCN [87] with a Transformer for MI EEG. Dilated convolutions in the TCN capture long-range patterns, while a Transformer models residual global dependencies. The model achieved 83.41% on BCI-IV 2a without bandpass filtering, learning optimal frequency filters internally with a low Comp. Cost. Cheng et al. [130] proposed MSDCGTNet, a fully integrated emotion EEG framework combining multiscale dynamic 1D CNNs, a GT encoder, and a TCN [87]. The CNN extracted spatial–spectral features directly from raw signals, avoiding spectrogram transformations. The GT used MHA with a GLU to capture global dependencies, while the TCN modeled the temporal context via dilated causal convolutions. The model achieved 98–99.7% across DEAP, SEED, and SEED-IV, maintaining low processing times and parameters per sample. These efficient models can maintain high Acc. without much computing power, even when using shallow attention mechanisms.
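The sketch below shows the core TCN ingredient discussed here: dilated causal 1D convolutions whose dilation doubles per layer, so the receptive field grows exponentially while the parameter count stays small. Kernel size, depth, and channel width are illustrative assumptions.

```python
# Minimal sketch of a dilated causal (TCN-style) temporal block.
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation        # left padding keeps causality
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                              # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))        # pad only the past
        return self.conv(x)

class TCNBlock(nn.Module):
    def __init__(self, channels=32, kernel_size=4, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [CausalConv1d(channels, kernel_size, dilation=2 ** i) for i in range(n_layers)]
        )
        self.act = nn.ELU()

    def forward(self, x):
        for layer in self.layers:
            x = x + self.act(layer(x))                 # residual keeps training stable
        return x                                       # same length as the input

x = torch.randn(4, 32, 1000)
print(TCNBlock()(x).shape)                             # torch.Size([4, 32, 1000])
```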

4.3.5. Pretrained and Self-Supervised Transformers

To learn EEG-transferable representations across tasks, datasets, and modalities, the following approaches have been used. Kostas et al. [131] developed BENDR, a self-supervised EEG pretraining framework. The hierarchical encoder had multiple 1D convolutional blocks with grouped convolutions, compressing raw EEG into latent representations. A Transformer encoder with MHSA operated on these sequences. Pretraining via contrastive self-supervision enabled cross-task generalization. Convolutional position encodings reduced the memory requirements, allowing a 10-M-parameter Transformer to train on limited data. Yang et al. [132] introduced ViT2EEG, fine-tuning a ViT [34] pretrained on ImageNet to process EEG represented as channel × time patches with a CNN embedding. Transfer learning from vision tasks improved the regression performance on EEG. This approach efficiently extracted spatiotemporal features without training a large Transformer from scratch, showing the value of visual priors for EEG. Jiang et al. [133] proposed LaBraM, a foundation Transformer model for cross-dataset EEG representation. EEG signals were segmented into channel-wise patches and tokenized via a vector quantized (VQ) neural codebook. A masked Transformer predicted missing patch codes, pretraining on 2500 h of multi-dataset EEG. LaBraM generalized across emotion, anomaly, and gait tasks. It also reduced the sequence length, enabling compact fine tuning without retraining new models for each dataset. Li et al. [134] proposed a Multitask Learning Transformer (MTL-Transformer) with an auxiliary EEG reconstruction head. The model was trained to reconstruct EEG alongside the main task, and this auxiliary objective acted as a regularizer that improved the learned representations. This improved downstream tasks like eye-tracking regression. These frameworks shift efficiency from model compression to data efficiency, achieving better generalization with minimal fine tuning. However, building them requires a large amount of upfront computing power.
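The following sketch illustrates the masked-patch pretraining idea in its simplest form: random EEG patches are hidden, a small Transformer encodes the remaining context, and the loss is computed only on the masked positions. It is a generic illustration under assumed patch and model sizes, not the exact BENDR, LaBraM, or EEGPT recipe.

```python
# Minimal sketch of masked-patch self-supervised pretraining on EEG segments.
import torch
import torch.nn as nn

patch_len, d_model = 50, 64
to_token = nn.Linear(patch_len, d_model)          # patch embedding
enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
to_signal = nn.Linear(d_model, patch_len)         # reconstruction head

x = torch.randn(8, 20, patch_len)                 # 8 segments, 20 patches each
mask = torch.rand(8, 20) < 0.4                    # hide ~40% of the patches
tokens = to_token(x * (~mask).unsqueeze(-1))      # masked patches become zeros
recon = to_signal(encoder(tokens))                # predict the original patches

loss = ((recon - x) ** 2)[mask].mean()            # loss only on masked positions
loss.backward()                                   # pretraining step (no labels needed)
print(float(loss))
```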

4.3.6. Cross-Integration Overlaps

All these models [113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134] integrate convolutional and attention mechanisms for EEG feature extraction. Sequential pipeline models [115,116,118] also employ convolutional frontends that compress raw EEG and facilitate cross-task generalization, aligning them with those in Section 4.3.5. Architectures like those in [120,122,123] (Section 4.3.2) integrate hierarchical attention mechanisms (Section 4.3.3) within their fusion stages that improve selective spatiotemporal feature processing and reduce redundancy. TCN-enhanced models [128,130] (Section 4.3.4) incorporate hierarchical attention (Section 4.3.3). Ref. [128] uses multiscale attention to emphasize temporal features and reduce the Transformer depth. Ref. [130] combines multiscale CNNs with a GT encoder and attention, fusing spatial, spectral, and temporal features for richer feature integration. The models in [131,132,133,134] utilize convolutional or tokenization frontends like sequential pipelines (Section 4.3.1), supporting efficient attention computation and compact feature representation. These overlaps reduce computation and enhance feature efficiency and generalization.
Table 4. Hybrid CNN–Transformer architectures for EEG classification.
Ref. | Author/Year | Model | Protocol | Samples | Channels | Inspiration Basis | Acc.
[113] | Sun et al. 2021 | Fusion-CNN-Trans | MI | 109 | 64 | CNN–T Encoder | 87.80
[131] | Kostas et al. 2021 | BENDR | Raw EEG | >10,000 | 20 | CNN–T Encoder | 86.70
[124] | Bagchi et al. 2022 | EEG-ConvTransformer | VEPs | 10 | 128 | CNN–T Encoder (MHA) | 89.64
[120] | Xie et al. 2022 | CTrans | MI | 109 | 64 | CNN–T Encoder | 83.31
[128] | Altaheri et al. 2023 | ATCNet | MI | 9 | 22 | EEGNet–T Encoder (MHA)–TCN | 70.97–85.38
[132] | Yang et al. 2023 | ViT2EEG | Raw EEG | 27 | 128 | EEGNet–T Encoder (ViT) | 55.40–61.70
[119] | Li et al. 2023 | Dual-TSST | MI, EMO | 9/15 | 22/3/62 | CNN–T Encoder | 96.65
[125] | Song et al. 2023 | EEG Conformer | MI | 9/9/15 | 22/3/62 | CNN–T Encoder (MHA) | 78.66/95.30
[115] | Wan et al. 2023 | EEGFormer | SSVEP | 70/15/12 | 64/62/6 | CNN–T Encoder–CNN | 92.75
[121] | Si et al. 2023 | TBEM | EMO | 80/6 | 30/30 | CNN–T Encoder–CNN (HybridNet–PureConvNet) | 42.50
[127] | Gong et al. 2023 | ACTNN | EMO | 15 | 62 | CNN–T Encoder | 95.30
[116] | Ma et al. 2023 | CNN-Transformer | MI | 9 | 22 | CNN–Transformer | 83.91
[114] | Omair et al. 2024 | ConTraNet | MI | 9/105 | 3/64 | CNN–T Encoder | 86.98
[126] | Si et al. 2024 | MACTN | EMO | 80/32 | 30/28 | CNN–T Encoder (MHA) | 67.80
[117] | Zhao et al. 2024 | CTNet | MI | 9 | 22/3 | CNN–T Encoder | 83.11–97.81
[133] | Jiang et al. 2024 | LaBraM | Resting, MI, Raw EEG | >140 | 19–64 | CNN–T Encoder | 82.58
[118] | Liu et al. 2024 | ERTNet | EMO-AAT | 32/16 | 32/62 | CNN–T Encoder | 74.23
[122] | Yao et al. 2024 | EEG ST-TCNN | EMO | 15/32 | 62/32 | T Encoder–CNN | 95.73–96.95
[134] | Li et al. 2024 | MTL-Transformer | 1–2 EEG, Eye tracking | 356 | 128 | ViT2EEG–CNN | –
[123] | Lu et al. 2024 | CIT-EmotionNet | EMO | 15 | 62 | ResNet II–T Encoder | 92.09/98.57
[129] | Nguyen et al. 2024 | EEG-TCNTransformer | MI | 9 | 22 | EEG-TCNet–T Encoder (MHA) | 83.41
[130] | Cheng et al. 2024 | MSDCGTNet | EMO | 32/15 | 32/62 | CNN–T Encoder–TCN | 98.85/99.67
Acc.: Accuracy, AAT: Auditory Attention Task, EEG: Electroencephalogram, EEGNet: EEG-Specific Neural Network, HybridNet: Hybrid Neural Network, MHA: Multihead Attention, MI: Motor Imagery, Raw EEG: Raw EEG Signals, ResNet: Residual Neural Network, SSVEP: Steady-State Visual Evoked Potential, TCN: Temporal Convolutional Network, Transformer: Standard Transformer Model, T Encoder: Transformer Encoder, ViT: Vision Transformer, VEP: Visual Evoked Potential, ViT2EEG: Vision Transformer Adapted to EEG.

4.4. Efficiency in Recurrent Deep Learning Models

Beyond CNN–Transformer hybrids, other DL architectures also excel in classifying EEG signals. Recurrent models are popular for their ability to process sequential EEG data. While effective, their computational efficiency has varied. Newer hybrid models, such as LSTM-CNN and LSTM–Transformer combinations, balance temporal modeling with spatial feature extraction. They improve the Acc. at the cost of computation. This section examines efficiency-oriented designs among recurrent hybrid models (Table 5). We focus on how attention, convolutional integration, temporal alignment, and multimodal fusion influence the Comp. Cost and deployment feasibility in real-time or resource-constrained applications.

4.4.1. Attention-Based Architectures

Attention mechanisms have been paired with frequency-domain decomposition to isolate discriminative EEG features in low-frequency bands. Zhang et al. [135] proposed an attention-based encoder–decoder RNN with XGBoost (XGB) for EEG-based person identification. Delta-band EEG was isolated and passed through an RNN with attention to emphasize informative channels. The resulting features were classified with XGB, achieving high Acc. across single-trial, multi-trial, and public datasets. Training was computationally intensive, but inference took less than 1 s, allowing real-world deployment. The method performed well with limited training data and across various EEG setups. Zhang et al. [136] developed a multimodal authentication system using EEG and gait data, combining an attention-based RNN with a one-class SVM and a nearest neighbor (NN) classifier. Delta-band EEG and gait sequences were modeled with an LSTM and attention to extract temporal features. An EEG-based 1D filter first rejected impostors before gait data were processed. With a 0% FAR and 1% FRR, DeepKey achieved strong security with 0.39 s latency, despite its longer setup time. Its architecture scaled to new users without retraining, supporting high-security applications. Balci [137] proposed DM-EEGID, a hybrid model combining an attention-based LSTM-MLP with random forest (RF)-based feature selection. EEG signals were decomposed into sub-bands, and delta was identified as the most distinctive pattern. RF feature selection determined optimal electrode subsets, reducing the channel count while maintaining high Acc. The LSTM attention mechanism focused on salient signal segments, and the MLP finalized classification. The system reached 99.96% Acc. for eyes-closed and 99.70% for eyes-open data. Its reliability with fewer electrodes shows a practical balance between efficiency and performance. These attention-based designs highlight efficiency through selective focus and data reduction rather than network size. Despite heavier training, their low latency and reduced channels make them practical for user-specific and real-time applications.
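A minimal sketch of the attention-over-time pattern used by these recurrent models is shown below: an LSTM encodes the sequence, a learned score weights each time step, and the weighted sum forms the feature passed to the classifier. The hidden size, channel count, and class count are illustrative assumptions, not a specific published pipeline.

```python
# Minimal sketch of an attention-weighted LSTM for EEG classification/identification.
import torch
import torch.nn as nn

class AttentiveLSTM(nn.Module):
    def __init__(self, n_channels=14, hidden=64, n_classes=8):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)          # one relevance score per time step
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (batch, time, channels)
        h, _ = self.lstm(x)                        # (batch, time, hidden)
        w = torch.softmax(self.score(h), dim=1)    # attention over time steps
        context = (w * h).sum(dim=1)               # weighted temporal pooling
        return self.head(context)

x = torch.randn(4, 128, 14)                        # e.g. 1 s windows, 14 channels
print(AttentiveLSTM()(x).shape)                    # torch.Size([4, 8])
```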

4.4.2. CNN–Recurrent Hybrids

Joint spatial–temporal representation learning has been achieved through the integration of convolutional and recurrent layers to encode EEG dynamics. Wilaiprasitporn et al. [138] presented a cascaded CNN-RNN architecture, evaluating both CNN-LSTM and CNN-GRU for EEG-based person identification using the DEAP dataset. EEG signals were converted into sequences of 2D spatial meshes, which CNN layers processed for spatial pattern learning, followed by RNN layers to model temporal dynamics. This two-stage pipeline provided expressive features with low latency. CNN-GRU variants trained faster, reached up to 100% CRRs, and remained effective with as few as five electrodes, showing practicality for portable systems. Sun et al. [139] introduced a 1D convolutional LSTM that combined 1D CNN layers with LSTM units for spatiotemporal EEG modeling. After preprocessing, CNN layers extracted localized features from segmented EEG signals, and LSTM layers captured temporal dependencies. Despite its complexity, the model reached 99.58% rank-1 Acc. with only 16 channels, outperforming deeper CNN and LSTM baselines. It remained robust across mental states and tasks, and the small number of channels makes it suitable for real-time use. Chakravarthi et al. [140] proposed a hybrid DL model for EEG-based ER, combining a CNN-LSTM framework with ResNet152 to address the limitations of traditional methods in PTSD-related applications. Using the SEED-V dataset (happiness, disgust, fear, neutral, and sadness), EEG signals were normalized and bandpass-filtered (1–75 Hz). FE used MFCCs from FP1, FP2, FC6, and F3, along with the sample entropy, Hurst exponent (R/S analysis), and average power from the alpha, beta, gamma, and theta bands. These features were converted into topographic maps and fed into the model for training using categorical cross-entropy and the Adam optimizer. The model achieved 98% Acc., with a low MSE, outperforming SVM and ANN baselines. Its integration of spatial, temporal, and spectral features supported the reliable recognition of non-verbal emotional cues, pointing out potential for emotion-aware interfaces. CNN-RNN hybrids improve efficiency by distributing the computational load; CNNs handle low-cost spatial abstraction, while RNNs capture essential temporal dependencies. This supports faster training and robust inference when electrode counts or sampling durations are constrained.
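The cascaded CNN–RNN idea can be sketched as follows: a small CNN encodes each 2D electrode mesh frame, and a GRU models the dynamics across the frame sequence. The 9 × 9 mesh, layer widths, and class count are assumptions for illustration, not the exact configuration of [138].

```python
# Minimal sketch of a cascaded CNN -> GRU model over 2D electrode mesh frames.
import torch
import torch.nn as nn

class MeshCNNGRU(nn.Module):
    def __init__(self, n_classes=32, d=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, d, kernel_size=3, padding=1), nn.ELU(),
            nn.AdaptiveAvgPool2d(1),               # one d-dim vector per mesh frame
        )
        self.gru = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, n_classes)

    def forward(self, x):                          # x: (batch, frames, H, W)
        b, t, h, w = x.shape
        f = self.cnn(x.reshape(b * t, 1, h, w)).reshape(b, t, -1)
        _, last = self.gru(f)                      # temporal dynamics across frames
        return self.head(last.squeeze(0))

x = torch.randn(4, 32, 9, 9)                       # 32 mesh frames per trial
print(MeshCNNGRU()(x).shape)                       # torch.Size([4, 32])
```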

4.4.3. Stimulus-Locked Models

Temporally aligned brain responses to visual or auditory stimuli have been used to extract consistent EEG features. Puengdang et al. [141] utilized a personalized LSTM model for person authentication with dual stimuli: 7.5 Hz SSVEP and ERP components evoked by target images. Preprocessed EEG signals were shaped into fixed-length time series and fed into individual-specific LSTM networks trained on authorized and impostor data. Using only seven channels, the model reached 91.44% verification Acc., balancing performance and setup efficiency. Dual stimulation improved user specificity, although Acc. varied across individuals, revealing a need for further personalization. Zheng et al. [142] developed an ERP-guided LSTM framework for EEG-based visual classification. ERPs were averaged across trials to enhance signal quality and reduce noise. Then, they were fed into an LSTM encoder to learn temporal patterns, followed by softmax. The model reached 66.81% Acc. for six-class and 27.08% for 72-class classification, outperforming several raw EEG baselines. These designs improve efficiency through temporal alignment, reducing signal variability and feature redundancy. By leveraging evoked responses, they achieve compact yet reliable representations, although subject dependence limits scalability.

4.4.4. Multimodal and Parameter-Efficient Hybrids

Cross-modal fusion frameworks have combined EEG with other biometric cues using contrastive learning or joint embedding strategies to improve classification. Jin et al. [143] introduced the Convolutional Tensor-Train Neural Network (CTNN), a hybrid model integrating CNNs with tensor-train (TT) networks. EEG signals were segmented into 1 s trials and processed by depthwise separable CNNs. Features were then transformed into high-order tensors by the TT layer, capturing multilinear dependencies with up to 800 times fewer parameters than fully connected layers. This low-rank representation reduced memory use in multitask EEG-based brain print recognition. CTNN achieved over 99% Acc., making it suitable for real-world applications. Kumar et al. [144] proposed a bidirectional LSTM-based (BLSTM-NN) authentication framework by combining EEG and dynamic signatures. DFT was used to derive angular features from signatures. Separate BLSTM models were trained independently for each modality and fused at the decision level using the Borda count and max rule. This efficient modular design avoided complex feature alignment and achieved 98.78% Acc. in security-critical scenarios. It even maintained a low FAR (3.75%) and HTER (1.87%) with noisy individual modalities. Chakladar et al. [145] proposed a multimodal Siamese neural network (mSNN) for user verification by fusing EEG and offline signatures. The architecture used parallel CNN and LSTM encoders for FE from both modalities. The embeddings were compared via contrastive loss, enabling one-shot learning with minimal data. The model achieved 98.57% Acc. and a 2.14% FAR. This demonstrates the robustness of multimodal fusion against single-trait spoofing attempts. These multimodal and tensorized designs achieve efficiency through shared representations and modular training. Compression and late fusion reduce memory and retraining, offering scalable solutions for multisensor or privacy-sensitive EEG systems.
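Decision-level fusion of the kind described above can be illustrated with a few lines of NumPy; the per-class scores below are made up, and the max rule and Borda count follow their textbook definitions rather than any specific implementation.

```python
# Minimal sketch of decision-level fusion for two modalities (e.g. EEG and signature).
import numpy as np

eeg_scores = np.array([0.55, 0.30, 0.15])   # class probabilities from modality 1
sig_scores = np.array([0.20, 0.50, 0.30])   # class probabilities from modality 2

# Max rule: take the larger score per class, then pick the best class.
max_fused = np.maximum(eeg_scores, sig_scores)
print("max rule ->", int(np.argmax(max_fused)))      # -> 0

# Borda count: each modality ranks the classes; rank points are summed as votes.
def borda(scores):
    order = np.argsort(scores)              # worst ... best
    points = np.empty_like(order)
    points[order] = np.arange(len(scores))  # 0 points for worst, n-1 for best
    return points

votes = borda(eeg_scores) + borda(sig_scores)
print("Borda count ->", int(np.argmax(votes)))       # -> 1
```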

4.4.5. Cross-Integration Overlaps

The work in [136] (Section 4.4.1) overlaps with that in Section 4.4.4 as its attention–RNN was fused with gait-based classifiers to improve efficiency through shared feature representations. Balci [137] in Section 4.4.1 intersects with Section 4.4.2 by combining an attention–LSTM with MLP and RF selection, achieving CNN-like compression through selective sub-band and electrode optimization. Ref. [140], discussed in Section 4.4.2, also relates to Section 4.4.4. It integrates ResNet152 with spectral features in a fusion-like design to enhance efficiency through feature reuse and pretrained transfer. While we include [141] in Section 4.4.3, it shares traits with Section 4.4.1 due to its temporal focus on stimulus-aligned epochs to mimic selective attention for efficiency. Lastly, Ref. [143] (Section 4.4.4) aligns with Section 4.4.2. It uses CNN feature extraction coupled with tensor-train compression, using low-rank representation for scalable and efficient learning.
Table 5. Other hybrid architectures for EEG classification.
Ref. | Author/Year | Model | Protocol | Samples | Channels | Inspiration Basis | Acc.
Pure Architectures
[144] | Kumar et al. 2019 | BLSTM-NN | VEPs | 33/58 | 14/16 | LSTM | 97.57
[141] | Puengdang et al. 2019 | LSTM-based | SSVEP, ERPs | 20 | 6 | LSTM | 91.44
[142] | Zheng et al. 2020 | ERP-LSTM | VEP | 10 | 128 | LSTM | 66.81
[145] | Chakladar et al. 2021 | mSNN | MI | 70 | 14 | SNN | 98.57
Other Hybrids
[138] | Wilaiprasitporn et al. 2015 | CNN-LSTM, CNN-GRU | ERPs, EMO | 32/40 | 5/32 | CNN-LSTM/GRU | 99.17–99.90
[135] | Zhang et al. 2018 | MindID | Resting | 8 | 14/64 | Attention RNN | 98.20–99.89
[139] | Sun et al. 2019 | 1DCNN-LSTM | Resting | 109 | 64/32/4 | 1D CNN-LSTM | 94.34–99.58
[136] | Zhang et al. 2020 | DeepKey | Relaxing | 7 | 14 | Attention RNN | 99.00
[143] | Jin et al. 2021 | CTNN | Resting, MI, EEG | 105/20/32 | 64/32/7 | CNN-TN | 99.50
[140] | Chakravarthi et al. 2022 | ResNet152-LSTM | Resting | 20 | 4 | ResNet-LSTM | 98.00
[137] | Balci et al. 2023 | DM-EEGID | Resting | 109 | 48 | Attention LSTM | 99.97–99.70
Acc.: Accuracy, Attention: Selective Neural Focus Mechanism, Attention RNN: Attention-Based Recurrent Neural Network, GRU: Gated Recurrent Unit, TN: Tensor Network, ERP: Event-Related Potential, EMO: Emotion Protocol, LSTM: Long Short-Term Memory, MI: Motor Imagery, Resting: Resting Protocol, ResNet: Residual Network, RNN: Recurrent Neural Network, SNN: Siamese Neural Network, SSVEP: Steady-State Visual Evoked Potential, VEP: Visual Evoked Potential.

5. Comparative Analysis of Efficiency Trade-Offs

5.1. Proxy Metric Development

In our method, for each DL architectural category (CNNs, Transformers, CNN–Transformers), we performed the following:
  • We collected any metrics reported by the authors based on four axes: (1) accuracy (Acc.) to represent the overall performance of the model—we kept the highest reported value; (2) computational resources to represent system costs, such as the architectural cost (parameters), compute cost (FLOPs/MACs), memory footprint, and inference latency; (3) operational costs for acquisition costs like the epoch length (EEG segment) and channel count—for the channels and sample size, we kept the minimum reported values; (4) training costs such as training time and the GPU/TPU/cloud environment used for training. Thus, we were able to approximate the total cost to train and deploy these systems.
  • Due to the inconsistent and incomplete reporting of system metrics, we employed a mixed approach to create four proxy metrics, each scored on a scale from 1 (best/lowest cost) to 5 (worst/highest cost). This calculation was based on evaluating a set of quantitative metrics (white background in tables) and the authors’ qualitative claims (orange background in tables) to define a unified cost dimension across all models (a small scoring sketch follows Table 9).
  • Complex. Proxy (Complex.): This metric reflects the size and depth of the architecture using the number of parameters (k/M) and EEG channels used. It quantifies the memory cost and architectural burden. Low Complex. is ideal for edge devices (Table 6).
Table 6. Complexity Proxy Development Rationale.
Scale | Rationale
1 (Very Low) | Very simple or lightweight design. Lowest parameter counts. Designed for few channels. Minimal depth.
2 (Low) | Small or simplified models. Low parameter counts. Low channel count. Uses efficient blocks.
3 (Medium) | Standard deep learning models. Moderate parameter count. Low channel count.
4 (High) | Deep, complex, or specialized models. High parameter count. Applied to high-channel datasets.
5 (Very High) | Very complex models. Very high parameter count. Requires full channel count.
  • Computational Cost Proxy (Comp. Cost): This metric reflects the hardware resources required for a single prediction (inference) after training is complete. It is measured by MACs/FLOPs and the model parameters. It quantifies the processing cost. A low cost is crucial for real-time BCIs (Table 7).
Table 7. Computational Cost Proxy Development Rationale.
Scale | Rationale
1 (Minimal) | Very low FLOPs/MACs. Designed for mobile/embedded focus.
2 (Low) | Low parameter count. Claims to be more efficient/faster than standard models.
3 (Medium) | Standard operational load (for most desktop/GPU systems). Claims of optimization but lacks high efficiency.
4 (High) | High operational load (a powerful discrete GPU is required). High resource use optimized for Acc.
5 (Very High) | Very high FLOPs/MACs. Model’s Complex. indicates high computational demand.
  • Operational Cost Proxy (Oper. Cost): This metric reflects the time required for a model to generate a prediction (latency) and the data requirements. It quantifies the real-world usability and user burden needed for the system. This is critical for systems needing seamless and immediate user interaction (Table 8).
Table 8. Operational Cost Proxy Development Rationale.
Scale | Rationale
1 (Real-Time) | Reported latency is in sub-seconds. Uses few channels, short segments, and raw signals. No calibration.
2 (Near Real-Time) | Latency is slightly longer (suitable for basic BCI control). Minimal preprocessing.
3 (Acceptable) | Latency is half to a few seconds (adequate for user-paced, non-immediate tasks). Uses data transformation.
4 (Slow) | Latency exceeds seconds (unsuitable for immediate interaction). Extensive pretraining and fine-tuning.
5 (Very Slow) | Relies on a high number of channels or long segments. Large-scale pretraining or high input dimensions.
  • Training Cost Proxy (Train. Cost): This metric reflects the training time and the number of training epochs required to train the model from scratch. It quantifies the initial development and time investment before deployment. A low cost is desirable for research and rapid development (Table 9).
Table 9. Training Cost Proxy Development Rationale.
Scale | Rationale
1 (Instant) | Extremely low epoch count. Training time reported in minutes. Efficient transfer learning for new users.
2 (Fast) | Low to moderate epoch counts (meaning quick convergence). No high-cost hardware.
3 (Standard) | Typical training time for most deep learning models on one GPU. Common epoch count for the domain.
4 (Long) | Extended training time on one GPU (suggesting a larger dataset or deeper model). High epoch count.
5 (Very Long) | Extensive training time (hours/days) or complex iterative process on high GPUs. Very high epoch count.
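As an illustration of how reported metrics can be mapped onto these 1–5 proxies, the sketch below scores architectural complexity from the parameter and channel counts. The thresholds are hypothetical examples chosen for readability, not the exact cut-offs used to populate the tables in this review.

```python
# Illustrative sketch of turning reported metrics into a 1-5 Complexity Proxy.
# The thresholds below are hypothetical examples, not the review's exact rules.
def complexity_proxy(n_params, n_channels):
    """Smaller models on fewer channels receive lower (better) scores."""
    if n_params is None:
        return 3                                  # fall back to a medium score
    score = 1
    if n_params > 50e3:   score += 1
    if n_params > 1e6:    score += 1
    if n_params > 50e6:   score += 1
    if n_channels > 32:   score += 1
    return min(score, 5)

# Example: a compact 4.27 K-parameter model on 22 channels vs. a 101 M-parameter one.
print(complexity_proxy(4.27e3, 22))    # -> 1
print(complexity_proxy(101e6, 64))     # -> 5
```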

5.1.1. Efficiency Insights: CNNs

The collected data (Table 10) reveal clear trade-offs between metrics across studies.
  • Acc. vs. Complex. and Comp. Cost: Compact architectures can maintain strong performance with a limited Comp. Cost. Ref. [12] achieved 91% Acc. using only 1.066 K parameters. In [94], the results showed suitability for low-power deployment, with 4.27 K parameters, 6.8 M MACs, and 77.35% Acc. Ref. [106] used 17.58 K parameters and 20.69 M MACs to achieve 83.73% Acc., although the authors noted that the increased MACs might limit use on lightweight devices for a modest Acc. gain.
  • Acc. vs. Oper. Cost: Reducing the number of channels lowers the setup complexity and hardware requirements without necessarily reducing Acc. With only three channels, Refs. [99,100] achieved 98.04% and 99.05% Acc., respectively. Ref. [100] also used 95 s epochs, balancing Acc. and recording efficiency. The epoch length is another important factor for real-time operation. Ref. [70] achieved only 76% Acc. with 62.5 ms segments, showing a speed–Acc. trade-off. In contrast, Refs. [79,80] reached around 99.9% Acc. using 1 s segments, trading speed for better performance. Low latency is critical for online BCI and authentication systems. Ref. [97] reported 1.79 ms latency with 96.15% Acc., suitable for real-time use. Ref. [94] showed 197 ms latency, representing a slower but computationally efficient design.
  • Acc. vs. Train. Cost: Training requirements affect deployment feasibility. Ref. [11] reported 24.77 min training time, compared to 33 s for the FBCSP baseline, reflecting the high Comp. Costs of deep models for a moderate Acc. gain (85.20%). Refs. [79,80] required only 2–4 fine-tuning epochs (<1 min) to adapt to new users, reducing the calibration time. Ref. [96] reported 8–18 min enrollment time per subject, showing the time cost of subject-specific model adaptation.

5.1.2. Efficiency Insights: Transformers

From Table 11, we can observe trade-offs between Acc., Comp. and Oper. costs.
  • Acc. vs. Complex. and Comp. Cost: Model size influences both Acc. and computational efficiency. The ViT and EEGPT models [34,46] lack inductive biases useful for small datasets, which resulted in lower Acc. compared to ResNet. Ref. [34] used large-scale pretraining with 14–300 M images to overcome this trade-off. In [46], performance was scaled with up to 101 M parameters, but this increased the model size and memory usage. In contrast, Ref. [39] achieved 84.26% Acc. with only 6.50–8.68 K parameters, showing that compact models can maintain competitive performance. Architectural choices also affect computational efficiency. Sequential RNNs/LSTMs have long training times for long sequences. Transformer-based models [40,42] address this by using faster parallel attention mechanisms. Expanding the input signal window provides a richer context and better performance but increases the computational demands and memory usage. Ref. [44] mitigated this by projecting the input into a lower-dimensional latent space (dimension 100). Ref. [45] had to limit its codebook to fit GPU constraints during training.
  • Acc. vs. Oper. Cost: Transformer-based models can operate on raw or minimally processed signals without heavy FE pipelines, which reduces the operational complexity. Using raw signals in [38] led to 99.4% Acc. with lower Oper. Costs compared to the CWT variant (97% Acc.), meaning that reduced preprocessing can maximize performance. However, the raw data approach in [41] yielded lower performance compared to traditional feature engineering. In [43], the simple temporal Transformer had a low Oper. Cost, while the spectrotemporal ensemble model had a high Oper. Cost but provided 96.1% Acc.
Table 10. CNN-based models’ metrics.
Ref. | Author/Year | Parameters (K/M) | MACs/FLOPs (M/G) | Latency (s) | Training Time/Epochs (s/m/h) | Memory Footprint | Epoch Length (Segment) (ms/s) | GPU/TPU/Cloud | Acc. (%) | Sample Size | Channels | Complex. | Comp. Cost | Oper. Cost | Train. Cost
[70]Ma et al. 2015-----62.5 ms-88.0010643323
[71]Mao et al. 2017---0.3 h---97.00100643332
[75]Gonzalez et al. 2017---1 M iterations-1 s-94.0123162334
[82]Das et al. 2017---50 epochs-600 ms-98.8040172232
[83]Cecotti et al. 2017-----800 msNVIDIA
GTX 1080
90.5016642335
[11]Schirrmeister et al. 2017---24 m 46 s-4 sNVIDIA GeForce
GTX 980
85.20933434
[87]Bai et al. 201870 K--Fast convergenceLow1 s-99.00--3232
[12]Lawhern et al. 20181.066 K--500 epochs-4 s NVIDIA Quadro
M6000 GPU
91.009221233
[104]Wu et al. 2018--7s500 epochs-2 s-97.6010162443
[72]Schons et al. 2018-----1 s-99.00109643333
[73]Di et al. 2018-----1 sGPU99.9033644333
[74]Zhang et al. 2018---Long training10x1.25 s-89.0015642334
[76]Waytowish et al. 2018-----1s-80.001082223
[78]Lai et al. 2019---30 repetitions---83.2110642232
[85]Chen et al. 2019---0.5 h--NVIDIA GeForce
GTX TITAN X
97.0610284232
[79]Wang et al. 2019---2–4 epochs (fine-tune)-1 s-99.731082221
[77]Yu et al. 2019---50 iterations---96.78892232
[84]Cecotti et al. 2019-----800 ms-92.8016644333
[80]Wang et al. 2019---<1 min, 0 epochs-1s-99.9859462231
[105]Özdenizci et al. 2019---100 epochs-0.5 s-98.603163332
[93]Salimi et al. 2020---100 epochs-1.1 sNVIDIA
Tesla K80
95.0026281232
[94]Ingolfsson et al. 20204.27 K6.8 M197 ms750 epochs396 kB4 sNVIDIA GTX
1080 Ti GPU
97.449221144
[88]Riyad et al. 2020---180 epochs-4 sNVIDIA
P100 GPU
74.089223342
[89]Liu et al. 2020-----4 sNVIDIA RTX
2080Ti GPU
97.689224443
[95]Kasim et al. 2021---1200 epochs-3 s-97.1716163335
[90]Zhu et al. 2021-----1 s-96.49109644333
[106]Musallam et al. 202117.58 K20.69 M-1000 epochs1188 kB4.5 sTensorFlow94.419223444
[107]Mane et al. 2021---600/1500 epochs-4 s-81.119202345
[86]Salami et al. 20223 K--500 epochs-4 s-78.749201243
[81]Zhang et al. 2022---Hour level-4 s-82.3370164545
[99]Bidgoly et al. 2022---30 epochs-1 s-98.0410932212
[96]Wu et al. 2022450.626 K-4 s8–10 m (enrollment)-4 s-99.489104422
[97]Altuwaijri et al. 202210.17 K-1.79 ms1000 epochs-4.5 sGoogle Colab96.159222144
[98]Autthasan et al. 202255.232 K-0.1–0.3 s0.47–1.36 s/epoch-2 sNVIDIA Tesla
V100 GPU
72.039153232
[92]Ding et al. 202312.56 K--500 epochs-2–4 s-63.7527322333
[100]Alsumari et al. 202374.071 K--20 epochs-5 sGoogle Colab99.0510932311
[101]Yap et al. 20235–45 M-2 s30 epochs-4.5 sGTX 1080 Ti80.0030142124
[102]Chen et al. 2024-----2 s-93.811564333
[103]Shakir et al. 2024-----1 s-95.0010932213
[91]Lakhan et al. 2025---20 epochs--NVIDIA
Tesla V100GPU
99.265484321
Table 11. Transformer-based models’ metrics.
Ref. | Author/Year | Parameters (K/M) | MACs/FLOPs (M/G) | Latency (s) | Training Time/Epochs (s/m/h) | Memory Footprint | Epoch Length (Segment) (ms/s) | GPU/TPU/Cloud | Acc. (%) | Sample Size | Channels | Complex. | Comp. Cost | Oper. Cost | Train. Cost
[38] | Arjun et al. 2021 | - | - | - | - | - | 6 s | - | 99.40 | 32 | 32 | 1 | 1 | 1 | 2
[34] | Dosovitskiy et al. 2021 | - | - | - | - | - | 14–300 M image patches | TPU v3 core days | 94.55 | - | - | 3 | 2 | 3 | 4
[39] | Song et al. 2021 | 6.50–8.68 K | - | - | - | - | Small segments | - | 84.26 | 9 | 3 | 1 | 1 | 1 | 1
[40] | Tao et al. 2021 | - | - | - | - | - | 20–460 ms | - | 61.11 | 6 | 64 | 3 | 2 | 2 | 2
[41] | Siddhad et al. 2024 | - | - | - | - | - | - | - | 95.28 | 48 | 14 | 2 | 2 | 1 | 2
[42] | Du et al. 2022 | - | - | - | - | - | 1 s | - | 97.90 | 109 | 64 | 2 | 2 | 1 | 2
[43] | Zeynali et al. 2023 | - | - | - | 1000 epochs | - | - | - | 96.10 | 8 | 64 | 4 | 3 | 3 | 5
[44] | Omair et al. 2024 | - | - | - | - | Latent dim. 100 | 150 time stamps | - | 85.00 | 9 | 3 | 3 | 3 | 2 | 2
[45] | Lim et al. 2025 | (Embed. size 256) | - | - | <1 day | >24 GB | 2 s | RTX 4090 GPU | 90.84 | 1 | 3 | 5 | 5 | 4 | 4
[46] | Wang et al. 2024 | 10–101 M | - | - | 200 epochs | - | 4 s | 8 NVIDIA 3090 GPUs | 80.59 | 9 | 3 | 5 | 5 | 5 | 5
[108] | Hu et al. 2024 | - | - | - | 100 epochs | - | - | NVIDIA Tesla T4 Tensor Core GPU | 99.12 | 15 | 32 | 4 | 3 | 2 | 3
[109] | Muna et al. 2025 | - | - | - | 20 epochs | - | - | CUDA Cloud | 76.83 | 9 | 22 | 3 | 3 | 3 | 1
[111] | Ghous et al. 2025 | - | - | - | 50 epochs | - | - | - | 95.00 | 15 | 62 | 4 | 4 | 4 | 3

5.1.3. Efficiency Insights: CNN–Transformer Hybrids

The data (Table 12) highlight many trade-offs between the different costs.
  • Acc. vs. Complex. and Comp. Cost: Models with high parameter counts deliver higher accuracy at the expense of computational efficiency. For instance, architectures with up to 23.55 M [131] or even 369 M [133] parameters achieved strong performance but required more computation. In contrast, Ref. [117] achieved very high Acc. (97.81%) with a very low parameter count (24.9–25.7 K), indicating high architectural efficiency. Similarly, Ref. [128] is noted for its small memory footprint and low parameter count (115.2 K), making it suitable for resource-constrained applications or embedded BCI applications. The models in [120,126] show that longer EEG data segments or window lengths generally increase Acc. but also raise the computational complexity.
  • Acc. vs. Complex.: Pure Transformer models lack inductive bias, which necessitates a large amount of data to prevent overfitting. Hybrid models [114,117] reduce this data dependency and complexity by incorporating CNN FE modules for improved training efficiency and performance stability.
  • Oper. Cost vs. Train. Cost: Some models optimize real-time usability at the expense of training overhead. The model in [130] showed a low latency of 0.0043 s with high Acc. (99.67%), showing a strong design for real-time operation (low Oper. Cost). Others [129] had up to 5000 epochs, indicating a high Train. Cost.

5.1.4. Efficiency Insights: Recurrent Hybrids

Based on Table 13, the following are key efficiency trade-offs.
  • Acc. vs. Complex. and Comp. Cost: Multimodal EEG systems increase user complexity yet yield high security. Ref. [144] boosted the Acc. from 97.57% (unimodal) to 98.78% (EEG and signature). Similarly, Ref. [136] achieved 99.57% overall Acc. by combining EEG and gait. Advanced models can reduce the computational burden without compromising Acc. Ref. [143] used tensor-train decomposition for computational efficiency gains. It required only 1.6 K parameters for classification, compared to a traditional model at 1.28 M parameters. This led to a reduced memory footprint with 99.50% Acc.
  • Acc. vs. Oper. Costs: Reducing the number of channels increases efficiency and user practicality while maintaining high Acc. Refs. [139,140] demonstrated high Acc. (99.58% and 98.00%, respectively) using only four channels. Ref. [138] achieved a 100% CRR with 32 channels and still maintained a 99.17% CRR when reduced to five, making the system practical and efficient. Ref. [137] found the optimal efficiency–Acc. balance at 48 channels out of 64. An operational burden during data collection is sometimes accepted when FE gains improve both the signal-to-noise ratio and Acc. Ref. [142] accepted a very high Oper. Cost, requiring over 50,000 trials, for an enhanced feature space. This resulted in a 30.09% improvement in classification Acc. over comparable methods.
  • Acc. vs. Train. Cost: Many high-performing systems [135,137,139] exhibit long training times. For example, Ref. [139] achieved 99.58% Acc. by accepting a longer training time but balanced this with a fast-testing latency of 0.065 s.
  • Oper. Cost vs. Train. Cost: A few authors have accepted high Train. Costs in exchange for very low Oper. Costs. Ref. [135] had a long training time in exchange for less than 1 s latency for a better authentication decision and practical deployment. Ref. [139] achieved lower batch testing latency of 0.065 s. Ref. [145] used one-shot learning training on as few as six pairs. It resulted in a reduced initial Oper. Cost for user enrollment despite a long total training time (870 min).
Table 12. CNN–Transformer hybrids’ metrics.
Ref. | Author/Year | Parameters (K/M) | MACs/FLOPs (M/G) | Latency (s) | Training Time/Epochs (s/m/h) | Memory Footprint | Epoch Length (Segment) (ms/s) | GPU/TPU/Cloud | Acc. (%) | Sample Size | Channels | Complex. | Comp. Cost | Oper. Cost | Train. Cost
[113] | Sun et al. 2021 | - | - | - | - | - | - | - | 87.80 | 109 | 64 | 3 | 3 | 2 | 3
[131] | Kostas et al. 2021 | - | Quadratic | - | - | - | - | - | 86.70 | >10,000 | 20 | 5 | 5 | 4 | 3
[124] | Bagchi et al. 2022 | 4.56–23.55 M | Quadratic | - | 35–80 epochs | - | - | - | 89.64 | 10 | 128 | 5 | 5 | 4 | 3
[120] | Xie et al. 2022 | - | - | - | - | - | - | - | 83.31 | 109 | 64 | 3 | 3 | 3 | 3
[128] | Altaheri et al. 2023 | 115.2 K | - | - | - | Small | - | - | 85.38 | 9 | 22 | 1 | 1 | 1 | 2
[132] | Yang et al. 2023 | 86 M | - | - | 15 epochs | - | - | - | 61.70 | 27 | 128 | 4 | 4 | 3 | 2
[119] | Li et al. 2023 | - | - | - | - | - | - | - | 96.65 | 9 | 3 | 4 | 3 | 3 | 3
[125] | Song et al. 2023 | - | - | 0.27 | - | - | - | GPU | 95.30 | 9 | 3 | 3 | 3 | 2 | 2
[115] | Wan et al. 2023 | Avoids huge complexity | - | - | - | - | - | - | 92.75 | 12 | 6 | 3 | 2 | 2 | 3
[121] | Si et al. 2023 | Hybrid: slightly lower | - | - | - | - | 14 s | - | 42.50 | 6 | 30 | 4 | 3 | 3 | 3
[127] | Gong et al. 2023 | - | - | - | - | - | - | - | 95.30 | 15 | 62 | 3 | 3 | 3 | 3
[116] | Ma et al. 2023 | - | - | - | 200 epochs | - | - | - | 83.91 | 9 | 22 | 3 | 3 | 3 | 4
[114] | Omair et al. 2024 | - | - | - | 100 epochs | - | - | - | 86.98 | 9 | 3 | 3 | 3 | 3 | 3
[126] | Si et al. 2024 | - | - | - | - | - | 14 s (Optimal) | - | 67.80 | 32 | 28 | 3 | 4 | 3 | 3
[117] | Zhao et al. 2024 | 24.9–25.7 K | - | - | - | - | - | RTX3090 | 97.81 | 9 | 3 | 2 | 2 | 2 | 3
[133] | Jiang et al. 2024 | 5.8–369 M | - | - | Fine tuning costly | Costly | 1 s | - | 82.58 | >140 | 19 | 5 | 5 | 5 | 5
[118] | Liu et al. 2024 | - | - | - | - | - | - | - | 74.23 | 16 | 32 | 2 | 2 | 2 | 3
[122] | Yao et al. 2024 | - | - | - | - | - | 3 s | - | 96.95 | 15 | 32 | 3 | 3 | 2 | 3
[134] | Li et al. 2024 | - | - | - | 15 epochs | - | - | RTX4090 | - | 356 | 128 | 3 | 2 | 2 | 2
[123] | Lu et al. 2024 | - | - | - | - | - | - | - | 98.57 | 15 | 62 | 4 | 3 | 3 | 3
[129] | Nguyen et al. 2024 | - | - | - | Up to 5000 epochs | - | - | - | 83.41 | 9 | 22 | 3 | 3 | 3 | 5
[130] | Cheng et al. 2024 | Linear complexity | - | 0.0043 | 200–300 epochs | - | 2–17 s | 2080Ti | 99.67 | 15 | 32 | 2 | 1 | 1 | 4
Table 13. Recurrent-based models’ metrics.
Ref. | Author/Year | Parameters (K/M) | MACs/FLOPs (M/G) | Latency (s) | Training Time/Epochs (s/m/h) | Memory Footprint | Epoch Length (Segment) (ms/s) | GPU/TPU/Cloud | Acc. (%) | Sample Size | Channels | Complex. | Comp. Cost | Oper. Cost | Train. Cost
[144] | Kumar et al. 2019 | - | - | - | - | - | - | - | 97.57 | 33 | 14 | 4 | 2 | 4 | 2
[141] | Puengdang et al. 2019 | - | - | - | 28.5 m, 30–50 epochs | - | - | - | 91.44 | 20 | 6 | 3 | 2 | 3 | 1
[142] | Zheng et al. 2020 | - | - | - | - | - | - | - | 66.81 | 10 | 128 | 3 | 4 | 5 | 3
[145] | Chakladar et al. 2021 | - | - | - | 870 m, 150 epochs | - | - | - | 98.57 | 70 | 14 | 4 | 3 | 3 | 5
[138] | Wilaiprasitporn et al. 2015 | - | - | - | Fast | - | - | - | 99.90 | 32 | 5 | 4 | 2 | 2 | 2
[135] | Zhang et al. 2018 | - | - | <1 | Increased | - | - | - | 99.89 | 8 | 14 | 4 | 1 | 3 | 4
[139] | Sun et al. 2019 | - | - | 0.065 | Long | - | - | GPU | 99.58 | 109 | 4 | 4 | 1 | 1 | 4
[136] | Zhang et al. 2020 | - | - | 0.39 | - | - | - | - | 99.00 | 7 | 14 | 5 | 2 | 5 | 3
[143] | Jin et al. 2021 | 1.6 K (vs. 1.28 M) | - | - | - | Reduced | - | - | 99.50 | 20 | 7 | 5 | 1 | 2 | 2
[140] | Chakravarthi et al. 2022 | - | - | - | - | - | - | - | 98.00 | 20 | 4 | 4 | 3 | 1 | 3
[137] | Balci et al. 2023 | - | - | - | Long | - | - | - | 99.97 | 109 | 48 | 4 | 3 | 4 | 4

5.2. Comprehensive Weighted Sum Model

We used the previously collected metrics and cost proxies to implement three weighted sensitivity analysis (WSA) scenarios (Table 14) and provide comparative rankings under specific operational priorities. We applied five criteria (Acc. vs. all four cost proxies) to ensure a full picture for each ranking. We focus on three scenarios to illustrate the overall classification efficiency.
Table 14. WSA Scenarios.
We used these five criteria weights (sum to 100) for all DL categories (Table 15).
Table 15. WSA Criteria Weights.
We used the normalized five criteria to calculate the WSA utility score using the following formula:
$$S_i = \sum_{j=1}^{5} w_j \cdot n_{ij}$$
where $S_i$ is the utility score for model $i$, $w_j$ is the weight of criterion $j$, and $n_{ij}$ is the normalized value of model $i$ on criterion $j$.
We then plotted the WSA utility scores using heatmaps to show which models are the most robust to changing weights. The heatmaps illustrate the strengths of the networks across the three scenarios. Lighter green colors indicate higher efficiency, and darker blue colors indicate lower efficiency.
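A minimal sketch of the WSA computation is given below: criteria are min–max normalized, cost criteria are inverted so that higher always means better, and the weighted sum yields one utility score per model. The example models, metric values, and weights are made up for illustration and are not the values behind the reported heatmaps.

```python
# Minimal sketch of the WSA utility score S_i = sum_j w_j * n_ij.
import numpy as np

# Columns: Acc. (benefit), Complex., Comp. Cost, Oper. Cost, Train. Cost (1-5 cost proxies).
metrics = np.array([
    [99.7, 2, 2, 2, 1],   # model A (made-up values)
    [91.0, 1, 2, 3, 3],   # model B
    [85.4, 1, 1, 1, 2],   # model C
], dtype=float)
weights = np.array([40, 15, 15, 15, 15]) / 100.0   # criteria weights summing to 100
benefit = np.array([True, False, False, False, False])

# Min-max normalize each criterion; invert cost criteria so higher is better.
lo, hi = metrics.min(axis=0), metrics.max(axis=0)
norm = (metrics - lo) / np.where(hi > lo, hi - lo, 1.0)
norm[:, ~benefit] = 1.0 - norm[:, ~benefit]

scores = norm @ weights                            # S_i for each model
print(scores.round(3))                             # higher = more efficient overall
```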

5.2.1. CNN Heatmap Analysis

Across 40 studies (Figure 5), a few studies demonstrate consistently high efficiency by scoring well in all three scenarios. These models [79,80] show the highest scores and are the highest ranked for the S1 and S3 scenarios. Their versatile performance is driven by very high accuracy and a near-perfect score for the Train. Cost (very fast fine tuning/convergence). The studies in [99,100] also rank highly in all scenarios. They excel because of their low Oper. Costs (three channels) and high Acc. Ref. [100] stands out with the highest score for S2. The S2 column highlights models prioritized under hardware constraints. These models [12,93] jump in rank compared to the others due to their low Complex. (tiny size). Ref. [94] is an example of structural efficiency specialization. Despite the overall low scores, it has a distinctively higher score in S2 compared to S1 and S3. This shows a clear trade-off, sacrificing speed and Acc. for model compactness. Models clustered at the bottom [81,92,106] show low scores across all columns. This indicates a poor balance among the five criteria, often due to low Acc. ([92] at only 63.75%) or very high resource requirements, such as long Train. Costs, long epoch lengths, or high computational demands.
Figure 5. Heatmap depicting CNNs’ WSA scores [11,12,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107].
The WSA analysis quantifies the trade-offs of DL model design for EEG tasks.
  • Acc. vs. Oper. Cost: In S1, a high weight on Oper. Cost means accepting constraints like low channel counts or very short epoch lengths. The most balanced models [79,80] achieved very high Acc. with very short epoch lengths (1 s) and efficient fine tuning. Ref. [104] scores highly in S1, but its 7 s latency shows a poor speed/operational trade-off, reducing its score despite its good Acc.
  • Acc. vs. Complex.: In S2, a high weight for Complex. means sacrificing peak Acc. or longer training. The model in Ref. [12] is designed to be ultra-compact (1.066 k parameters), resulting in a high score for S2 but with modest Acc. of 91.00% and a high number of training epochs (500).
  • Acc. vs. Training Cost: A heavy Train. Cost is often required to obtain a well-performing model, but it does not guarantee efficiency. Studies like [81,83] have the lowest scores due to long training, which means that a high Train. Cost does not ensure high operational efficiency. However, models like that in [97], with a high number of training epochs (1000), achieved excellent latency, showing a trade-off between training effort and post-deployment speed.

5.2.2. Transformer Heatmap Analysis

The heatmap (Figure 6) represents the overall efficiency of the 13 Transformer-based models across the three scenarios. The consistent leader [38] remains the top-ranked model across all scenarios, confirming its robust efficiency due to high Acc. and minimal resource requirements. The models in [39,41,42], ranked second, third, and fourth, are consistently the most efficient. For Refs. [41,42], the high scores arise from using raw EEG and tiny architectures, resulting in high Acc., low Complex., and low Oper. Costs. Ref. [42] scores highest in the S1 scenario, highlighting its speed. Ref. [39] scores highest in the S2 scenario, pointing out its small parameter count. The lowest-scoring models [45,46] are significantly less efficient. This is due to large parameter counts (101 M for [46]) or high memory requirements (>24 GB for [45]). Overall, the relative rankings are highly consistent across all three scenarios, which means that the underlying efficiency trade-offs of the models are fundamental to their designs and less sensitive to the specific weight distribution.
Figure 6. Heatmap depicting Transformers’ WSA scores [34,38,39,40,41,42,43,44,45,46,108,109,111].
The WSA rankings reveal critical trade-offs between model Acc. and resource costs.
  • Acc. vs. Oper. Cost: This trade-off is critical in S1, as it determines whether an Acc. gain justifies added preprocessing Complex. and slowness. Some studies [38,41,42] use raw EEG signals and achieve the best overall performance with a minimal Oper. Cost. Ref. [38] best embodies an efficient lightweight solution that balances Complex., Oper., and Comp. Costs. In contrast, Ref. [43] combined raw temporal and spectral (PSD) features, which resulted in a slight increase in Acc. but incurred a higher Oper. Cost due to the added computation.
  • Acc. vs. Complex.: This trade-off is central for S2, where the model size and computational load are minimized. The tiny model in [39] had few parameters and obtained the best Complex. This shows that attention mechanisms can be effective at low parameter counts. This model trades Acc. (84.26%) for very high efficiency. On the other hand, Ref. [46] demonstrates a large-model penalty, with millions of parameters and high Complex. While its size boosts its performance via scalability, the resulting high Comp. Cost cancels out any Acc. benefits.
  • Acc. vs. Training Cost: This trade-off reflects the computational and time resources required to train a model, heavily influencing the all-rounder efficiency. The studies in [43,45] represent the resource-heavy end of the spectrum, both requiring 1000 training epochs. This high demand for training time indicates that these models need a high development cost to reach high Acc. In contrast, Ref. [109] achieved fast convergence, completing training in only 20 epochs. This reflects a highly efficient training process, balancing a low development cost with moderate Acc. (76.83%).

5.2.3. CNN–Transformer Heatmap Analysis

The heatmap (Figure 7) demonstrates that a model’s efficiency determines its overall rank, rather than slight variations in the criteria weights. The models in [118,130] are consistently the most efficient, with the highest scores (ranking first and second, respectively). They offer the best overall efficiency trade-offs, as their low Complex., Comp. Cost, and Oper. Cost offset any reduction in performance. Thus, they are the most practical choices for resource-limited applications. Notably, Ref. [118] ranks first across all scenarios, demonstrating its superior and robust efficiency trade-off. Ref. [130] is consistently the second-best performer across all scenarios. Conversely, the least efficient models, including [125,133] and the two large-scale models [115,131] with the lowest scores, rank 17th–21st. Irrespective of their potential for high Acc., these models are resource-intensive and are poorly suited for edge environments or real-time systems. The stability in the relative rankings of the models across all three weighting scenarios means that small and fast models are consistently preferred in this efficiency-focused analysis. Finally, models such as those in [114,126] occupy the mid-range, offering a moderate trade-off. This makes them good general-purpose options when computational constraints are not a defining factor.
Figure 7. Heatmap depicting CNN–Transformer hybrids’ WSA scores [113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133].
The WSA analysis points out the following trade-offs.
  • Acc. vs. Total Cost Efficiency: The model in [118] demonstrates the maximum trade-off in favor of efficiency. It achieves good but not the best raw Acc. of 85.38%, which is compensated for by a zero-cost profile across Complex., Comp. Cost, and Oper. Cost. The model’s minimal resource footprint makes it the ideal edge solution.
  • Acc. vs. Oper. Cost: The model in [130] represents the most desirable trade-off. It provides peak performance across the dataset with the highest raw Acc. of 99.67%, while simultaneously demonstrating minimal latency due to its zero-cost Oper. Cost profile. This secures its status as the optimal real-time solution.
  • Acc. vs. Complex. and Oper. Cost: The model in [129] clearly prioritizes performance, as shown by its near-best raw Acc. of 98.08%. However, this high performance comes at the cost of its resource profile, with only average performance in Complex. and Oper. Cost. It can be chosen only when high Acc. is mandatory and the system can tolerate its high resource utilization. Overall, it is less efficient than the two top models [118,130].
  • Acc. vs. Complex. and Comp. Cost: The model in [133] demonstrates a poor trade-off. It yields modest raw Acc. of 70.11% with the maximum resources across the Complex. and Comp. Costs. It proves the diminishing returns of architectural scaling that do not translate into superior performance. This is the least practical choice in constrained environments.

5.2.4. Recurrent Hybrid Heatmap Analysis

The heatmap (Figure 8) shows the following insights for 11 recurrent hybrid models. The system in [139] is the top performer for both the S1 and S3 scenarios due to its high Acc., minimal channel use, and fast latency. Ref. [141] is the best for S2, driven by its excellent Complex. and Train. Cost scores. Systems like that in [143] score highly in S1 but low in S2 due to high Complex. scores. Ref. [142] consistently shows the lowest scores due to its very low Acc. and high Oper. Cost (128 channels). The models in [136,137] rank poorly (10 and 8, respectively) despite their high Acc. This is due to their heavily penalized Complex. and Oper. Costs in the WSA.
Figure 8. Heatmap depicting recurrent hybrids’ WSA scores [135,136,137,138,139,140,141,142,143,144,145].
The WSA reveals specific efficiency trade-offs among the top-ranked systems.
  • Acc. vs. Oper./Comp. Cost: In S1, the model in [139] ranks first and achieves 99.58% Acc. by minimizing resource use during operation. It secures efficiency by using only four channels and has the fastest testing latency at 1 s. This performance comes at the cost of a long training time due to its complex recurrent architecture. Ref. [140] ranks second and achieves 98.00% Acc. and uses four channels, but its Comp. Cost score is lower because of its latency of 1.7 s.
  • Acc. vs. Complex./Train. Cost: In S2, the model in [141] ranks first despite having the lowest Acc. among the top tier (91.44%), because it is the most resource-efficient system, requiring only minutes of training time. In contrast, Ref. [139] ranks second and maintains 99.58% Acc., but its recurrent architecture results in high Complex., increasing its memory footprint.
  • Acc. vs. Total System Cost: In S3, Ref. [139] ranks first, achieving 99.58% Acc. and combining this with superior operational efficiency (four channels and 1 s latency). Its score confirms that its fast low-channel operation outweighs the penalty of its long initial training time. Ref. [138] ranks second and achieves high Acc. (99.17%) and reliable performance across all cost metrics, with six channels and 1.25 s latency. This allows it to avoid the extreme cost trade-offs of the rank-1 models.

5.3. The Efficiency Frontier

This section provides visual insight into the performance vs. efficiency trade-off and identifies architectural designs that can serve as efficiency benchmarks. These non-dominated models, in the Pareto-optimal sense, achieve the best performance-to-cost ratios. We plotted the models' Acc. against their WSA S3 scores. For this comparison, we selected the five top-performing models in the all-rounder scenario from each DL architectural category, while highlighting whether a model is the highest scorer in any scenario (S1, S2, S3).

5.3.1. Trade-off Analysis

The scatterplot (Figure 9) offers key insights into the trade-offs between model Acc. and the all-rounder WSA scores. The analysis focuses on the concept of Pareto optimality, which is represented by the solid black line connecting the non-dominated models. These models achieve the best possible combination of high Acc. and high S3 utility compared to all other models. Their WSA scores cannot be improved without decreasing their Acc. and vice versa. The most efficient models are those lying directly on or very close to the black line. The model in [38], with the highest WSA score, and the CNN model in [79], with a slightly lower WSA score, define the extremes of the most efficient set. The hybrid CNN–Transformer models [118,130] and the recurrent hybrid [139] also lie on this frontier. The curve generally slopes upward and to the right, illustrating the positive correlation between the two metrics: higher Acc. generally comes with a higher S3 score. However, moving along the frontier reveals the increasing cost of improvement. The frontier flattens out as we move toward 100% Acc. (approached by [80] at 99.98%), indicating that achieving the final few percentage points of Acc. requires significant sacrifices or yields diminishing returns in the S3 score. All points that fall below the Pareto frontier are dominated. This means that there is at least one model on the line that is superior in both Acc. and the WSA score or equal in one and superior in the other. For instance, the CNN model in [99] is dominated by the model in [79]. These dominated models are inherently suboptimal choices for an all-rounder system.
Figure 9. Acc. vs. S3 WSA scores.
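To make the dominance criterion concrete, the short Python sketch below filters a set of (Acc., WSA S3) pairs down to its non-dominated members. It is a minimal illustration only; the model labels and values are hypothetical placeholders, not numbers taken from Figure 9.

# A minimal sketch (not the authors' code) of how non-dominated models can be
# identified from (accuracy, WSA S3 score) pairs; higher is better for both.
def pareto_front(models):
    """Return the models not dominated in both accuracy and WSA score."""
    front = []
    for name, acc, wsa in models:
        dominated = any(
            (a >= acc and w >= wsa) and (a > acc or w > wsa)
            for n, a, w in models if n != name
        )
        if not dominated:
            front.append((name, acc, wsa))
    return front

if __name__ == "__main__":
    candidates = [
        ("model_a", 99.4, 0.92),   # hypothetical values
        ("model_b", 99.9, 0.85),   # hypothetical values
        ("model_c", 98.1, 0.80),   # dominated by model_a in both criteria
    ]
    print(pareto_front(candidates))  # -> model_a and model_b remain

Any model removed by such a filter corresponds to a point lying below the solid black line in Figure 9.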

5.3.2. Architectural Insights

The plot reveals performance clusters among the different architectural groups. The Transformer architecture shows the strongest potential for S3 utility, with the absolute highest WSA score [38]. This shows that, while maintaining high Acc., these models are particularly efficient across the WSA criteria (latency, energy efficiency). The CNN group dominates the high-Acc., mid-range utility segment. The highest Acc. (99.98%) belongs to the CNN model in [80]. This architecture is a solid choice when pure Acc. is the priority, even if the S3 score for [80] is slightly lower than that of the top Transformer [38]. The CNN–Transformer and recurrent hybrid groups show models scattered across the mid-to-high Acc. range. Both offer competitive, non-dominated representatives on the Pareto frontier ([118,130] for CNN–Transformers and [139] for recurrent hybrids). The CNN–Transformer models balance the strengths of their constituent architectures.

5.3.3. Scenario Performance Insights

The markers indicate the models' statuses across the three scenarios (S1, S2, S3). The top-ranked performer in all three scenarios is the Transformer in [38], making it the choice for systems requiring multi-criteria excellence. The model in [118] is another multi-scenario champion. The CNN in [79] achieves the highest Acc. among these scenario champions; it performs well in S1 and S3, but not S2. The recurrent model in [139] also shares this champion status.

5.4. The Pareto Frontier

In this step, we carried forward the three true Pareto champions from the previous step. Since these three models came from only two architectural categories, we added the best-performing representatives of the two remaining architectural groups to give better comparative insights. The analysis is based on a scaled score, where 4 is the best outcome (highest Acc./lowest cost) and 1 is the worst outcome (lowest Acc./highest cost). The models are sorted by the calculated polygon area, used as an overall utility score (Figure 10).
The model in [38] achieves the highest overall utility (largest area) by dominating the cost metrics. It scores the maximum in three out of four cost metrics (Complex., Comp. Cost, and Oper. Cost). This means that it is the least costly model in these areas. Its strong cost-effectiveness is tempered only by a moderate score (2) in the Train. Cost. Its Acc. score is the lowest of the top three models (slightly above 3), but this dip in performance is outweighed by its operational efficiency. In applications where the runtime cost (Complex., computation, and operation) is the primary constraint, the model in Ref. [38] is the optimal choice as it delivers high Acc. with minimal operational expenses. The models in [79,80] trade higher Oper. Costs for marginal gains in Acc. The model in [80] has the highest Acc. (4, representing 99.98%), but scores 1 for its Oper. Cost, making it the most expensive to run among this set. Its costs are unbalanced, with a low Train. Cost but high Complex. and Oper. Costs. The model in [79] is a compromise between those in [38,80]. It maintains very high Acc. and a low Train. Cost like [80], but it slightly improves in its Complex., Comp. Cost, and Oper. Cost, scoring 3 in each. The two models in [130,139] are dominated by other models along key axes, resulting in lower overall utility. Ref. [130] achieves high scores in three metrics (Acc., Comp. Cost, Oper. Cost) but is penalized by its score of 1 for the Train. Cost. This model is best only if the Train. Cost can be completely ignored, which is rarely the case. The model in [139] has the worst Complex. score, although it achieves the best Comp. Cost and Oper. Cost scores (4). Its low Complex. and Train. Cost scores decrease its overall area, positioning it as the option with the lowest overall utility. The cost-efficiency of the Transformer architecture [38] contrasts with the peak Acc. of the CNN architectures [79,80], highlighting the architectural trade-off that exists at the top of the performance frontier.
Figure 10. Comparison of Pareto frontier optimum models.
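The paper does not give an explicit formula for the polygon area, so the following is a minimal sketch under the assumption that it is the standard shoelace area of the radar polygon spanned by the five scaled scores (Acc., Complex., Comp. Cost, Oper. Cost, Train. Cost) on equally spaced axes; the scores in the example call are hypothetical.

# A minimal sketch of a radar-chart polygon area as an overall utility score.
import math

def radar_polygon_area(scores):
    # Shoelace area of the polygon formed by plotting each score on its own
    # axis, with the axes spaced at equal angles around the origin.
    n = len(scores)
    wedge = math.sin(2 * math.pi / n)  # angle between neighbouring axes
    return 0.5 * wedge * sum(scores[i] * scores[(i + 1) % n] for i in range(n))

# Hypothetical model scoring 3 on Acc. and 4 (best) on each of the four cost axes.
print(radar_polygon_area([3, 4, 4, 4, 4]))  # larger area = higher overall utility

One caveat of such area-based scores is that they depend on the ordering of the axes, which is a further reason to read them as relative rather than absolute utilities.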

5.5. Performance vs. Total Real Cost Analysis

In this step, we performed an Acc. vs. total real cost analysis. The total real cost is defined as follows:
Total Real Cost = (Complex. + Comp. Cost + Oper. Cost + Train. Cost) / 4
where Complex., Comp. Cost, Oper. Cost, and Train. Cost are defined in Section 5.1.
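As a purely illustrative example (not a reported configuration), a hypothetical model with cost scores of 1, 1, 1, and 2 on these four dimensions would obtain a Total Real Cost of (1 + 1 + 1 + 2)/4 = 1.25.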
The plot below (Figure 11) maps the performance against the resource consumption for all 85 models. The objective is to maximize Acc. while minimizing the cost value. The ideal region is the top-left corner (high Acc., low cost). The data points are skewed towards the high Acc. range (90–100%) across costs between 1.5 and 4.0. This implies that achieving high Acc. is generally feasible, but doing so with the minimum cost is difficult. CNNs form the largest group, with a wide distribution, achieving some of the lowest costs and highest Acc. Recurrent hybrids and Transformers cluster within the high-Acc. band near or within the Pareto frontier region, indicating strong performance-to-cost ratios. CNN–Transformers have a higher average cost, with the densest cluster around cost = 3.25.
Figure 11. Acc. vs. total real cost and Pareto frontier.
The Pareto frontier optimum (black dashed line) represents the set of optimal, non-dominated solutions. Any model below this line is inferior to at least one point on the frontier in terms of both cost and Acc. The first steep segment of the frontier (cost of 1.0–2.0) defines the most critical trade-off. The Transformer in [38], with a 1.25 cost and nearly perfect Acc. (99.40%), presents an excellent balance. The CNN in [80] at cost = 2.00 sets the performance ceiling at 99.98%. The CNN–Transformer in [130], with a 2.0 cost and 99.67% Acc., comes close to this ceiling at a similar cost. The analysis shows that a cost greater than 2.00 does not yield higher Acc. Beyond a cost of 2.00, the Pareto frontier flattens out almost completely, forming a dense optimal plateau in the 99.0–99.9% range. Models like the CNN in [73] at cost = 3.25 (99.90%) and the recurrent hybrid in [137] at cost = 3.75 (99.97%) are technically on the frontier (as they are the best options at their specific cost points), but they demonstrate a clear case of diminishing, even negative, returns. These models offer no significant Acc. advantage over the CNN at cost = 2.00 but require up to twice the resource investment. In summary, the most effective strategy is to target models in the cost range of 1.25 to 2.00, as this provides the highest performance return per unit of cost.

7. Findings and Discussion

To give a pictorial summary of the research insights from the reviewed papers, we plot an alluvial diagram, shown in Figure 14. The vertical volume represents the weight of each category among the reviewed studies. Its sections cover the years of publication, architectural categories (CNN, Transformer, CNN–Transformer, recurrent hybrids), domains of application (Bio, MI, EP, ER, GF), performance (very low–very high), and efficiency (very low–very high). The diagram depicts the research trends across these variables, making insights and gaps visually apparent. In this section, we discuss these insights and future directions.
Figure 14. A recapitulative alluvial diagram.

7.1. Domains of Application

The table below (Table 16) summarizes the domains and tasks that the reviewed papers focus on.
Table 16. Domains of application.

7.1.1. Trends over Time—Architectures

CNNs dominated, compared with recurrent hybrids, in the 2015–2018 and 2019–2020 periods. In 2021–2022, there was a shift toward more diverse and complex architectures (Transformers, CNN–Transformer hybrids), and these architectures have become a major focus in 2023–2025. The inclusion of recurrent-based models has become less frequent, as they have largely been replaced by newer architectures. Despite the rise of Transformers, CNNs remain relevant and frequently used across all time periods, reconfirming their foundational utility.

7.1.2. Architecture–Domain Mapping

CNNs are versatile and show the broadest application across all domains, including Bio, EP, MI, and GF. The Bio domain accounts for the highest number of studies across all architectures and time periods, followed closely by EP and MI. Both ER and GF emerge only in the 2021–2022 and 2023–2025 periods and are the major foci of Transformer and CNN–Transformer models.

7.1.3. Performance–Efficiency Trade-Offs

Most of the research across all years, architectures, and domains focuses on achieving very high or high performance, indicating that maximizing Acc. is the primary goal. A large portion of the very high-performing studies is linked to low or very low efficiency (mainly in 2019–2020). This highlights the common trade-off whereby complex and computationally expensive models are needed for top performance. The newer Transformer architectures (particularly in the 2021–2022 period) frequently link very high performance with very low or low efficiency, implying that early attempts to use them for high performance came at a high Comp. Cost.

7.1.4. Notable Trends in the Latest Period

The 2023–2025 period shows a promising trend of achieving high or very high performance and efficiency with either Transformer or CNN–Transformer models [45,46,119,123]. This reveals that these architectures have matured to be both powerful and efficient. In the same period, CNNs have frequently been associated with very low efficiency despite their very high performance [93,100,103]. This indicates that even established architectures are being pushed to their limits in terms of performance, sometimes sacrificing efficiency. Moreover, in the 2023–2025 period, Transformer and CNN–Transformer models [111,119,123] have successfully balanced very high performance with high efficiency within the ER domain.

7.2. Future Directions

Based on the alluvial diagram, below are our recommendations for future research and underexplored development directions in the EEG field.

7.2.1. Bridging the Performance–Efficiency Gap

The data consistently show a trade-off where the highest-performing models often have low efficiency. Future work should focus on the following:
  • Developing novel Transformer and CNN–Transformer architectures that maintain very high performance while improving efficiency beyond the medium level. This could be achieved by (1) exploring knowledge distillation from complex and less efficient models to smaller and faster ones (a minimal distillation sketch follows this list); (2) implementing sparsity techniques in Transformer layers; and (3) researching hardware-aware network designs specific to BCI/EEG applications.
  • Given the stability of foundational CNNs, revising and optimizing lightweight and high-performing CNN variants that are deployable on low-power devices.
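As a concrete illustration of the distillation route mentioned above, the sketch below shows a generic response-based knowledge distillation loss (temperature-scaled KL divergence blended with cross-entropy), assuming PyTorch; teacher and student stand for any large and compact EEG classifier, respectively, and are not tied to a specific reviewed model.

# A minimal sketch of response-based knowledge distillation for EEG classifiers.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend a soft KL term (teacher knowledge) with the usual cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                      # rescale so the soft term matches the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage inside a training step (teacher frozen, student trainable):
# with torch.no_grad():
#     t_logits = teacher(eeg_batch)
# loss = distillation_loss(student(eeg_batch), t_logits, labels)
# loss.backward()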

7.2.2. Expanding Domain Specialization and Generalization

A strong focus remains on Bio, EP, and MI. Future directions are as follows:
  • Increase research attention to the ER domain, which has emerged in the most recent period. The latest models show promising high performance/high efficiency in this area, revealing that it is an impactful research domain.
  • Invest in the GF domain, using Transformer and CNN–Transformer architectures to reduce the need for domain-specific models. The goal should be to build models that can achieve high performance across multiple domains (Bio, MI, EP) without extensive retraining.
  • Explore novel and niche BCI domains outside personalized medicine with newer architectures to see if the performance gains translate to these areas.

7.2.3. Deeper Analysis of Architecture Components

The emergence of hybrid models signifies that the combination of components leads to efficient and well-performing models. Future directions include the following:
  • Perform ablation studies on hybrid CNN–Transformers by isolating the contributions of the CNN part vs. the Transformer part across different domains. This will determine the optimal split to maximize performance and efficiency.
  • Standardize performance and efficiency metrics, since the use of low-to-high scales is relative. Future research should adopt quantitative, standardized metrics for reporting to allow the rigorous and fair comparison of architectures.

7.2.4. Longitudinal Studies and Reproducibility

  • It is necessary to conduct studies to track architectural lifecycles (how long a design remains relevant)—for instance, investigate whether the efficiency gains seen in early CNNs can be replicated with modern training techniques on new architectures.
  • Since the initial focus has been on peak performance, future work should prioritize measuring robustness and generalization. Very high performance is less valuable if it is not reproducible by other researchers.
  • It is important to develop foundational models that are pretrained on low-density EEG to support robust edge deployment.

7.2.5. Generative Models as the Next Frontier

New generative models are primarily used for data augmentation in EEG [46]. However, their encoder components may yield compact latent-space embeddings for more efficient FE and classification [45]. These models (Section 4.2.2 and Section 4.3.5) address the data scarcity and subject-specific calibration challenges, thereby improving efficiency. Pretraining models for long periods on multi-dataset EEG, as in [131,133], will create transferable representations. This shifts the computational burden from the end-user to the large-scale model developer. This practice is not yet well established.

7.3. Limitations

Our review of 114 DL EEG classification studies shows consistent methodological gaps in reporting (see the blue-shaded areas in Table 6, Table 7, Table 8 and Table 9). Most papers focus on Acc. and neglect key details like computational complexity, memory use, and latency. Without this information, it is challenging to compare the performance and efficiency of different models or to assess their practicality on wearables or embedded systems. Only a few studies, such as [46,94,106,132], have reported these metrics. To ensure a unified comparison despite these reporting gaps, we implemented specific data extraction decisions, such as choosing the highest reported accuracy and lowest channel counts. Our proxy metrics (Complex., Comp. Cost, Oper. Cost, and Train. Cost) represent normalized approximations based on a mixed evaluation of quantitative data and qualitative author claims regarding model efficiency and architectural design. These scores should be viewed as expert-estimated cost dimensions rather than absolute hardware benchmarks, reflecting the constraints of non-standardized reporting. To address these gaps and support meaningful reproducibility and deployable EEG systems, we recommend that future research report the following parameters (a machine-readable sketch of such a reporting record follows the list):
  • Performance metrics—Acc., EER, AUC, CRR, etc.;
  • Model size—parameters in K or M;
  • Computational complexity—MACs or FLOPs per inference window;
  • Memory footprint—memory usage at inference, including weights and activations;
  • Inference latency—measured on embedded, mobile, desktop, or server systems;
  • Training details—training time, number of epochs, batch size, and hardware used;
  • Validation protocol—within/cross-subject or cross-session and number of subjects;
  • Operational setup—number of EEG channels, epoch duration, and calibration/enrollment time per subject/session.
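To make this checklist easier to adopt, the sketch below shows one possible machine-readable reporting record in Python; the field names and example values are illustrative suggestions, not an established community standard.

# A minimal sketch of a reporting record covering the parameters recommended above.
from dataclasses import dataclass

@dataclass
class EEGModelReport:
    accuracy_pct: float                # or EER/AUC/CRR where appropriate
    params_millions: float             # model size
    macs_per_window_millions: float    # computational complexity per inference window
    memory_mb_inference: float         # weights + activations at inference
    latency_ms: float                  # measured inference latency
    latency_hardware: str              # e.g., embedded, mobile, desktop, or server
    train_time_hours: float
    epochs: int
    batch_size: int
    train_hardware: str
    validation_protocol: str           # within-/cross-subject or cross-session
    num_subjects: int
    num_channels: int
    epoch_duration_s: float
    calibration_time_s: float = 0.0    # per subject/session, if any

# Hypothetical example values (not taken from any reviewed study):
report = EEGModelReport(
    accuracy_pct=95.0, params_millions=0.05, macs_per_window_millions=12.0,
    memory_mb_inference=1.5, latency_ms=20.0, latency_hardware="embedded",
    train_time_hours=0.5, epochs=100, batch_size=64, train_hardware="single GPU",
    validation_protocol="cross-subject", num_subjects=9, num_channels=22,
    epoch_duration_s=4.0,
)
print(report)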

8. Conclusions

Recent research shows that careful architectural design may improve FE, advancing EEG-based classification. Across CNNs, RNNs, Transformers, and hybrid designs, the surveyed studies reveal progress toward models that are both accurate and computationally practical. CNNs are strong at extracting spatiotemporal features, but they struggle with long-term temporal dependencies. Transformers are better at capturing these dependencies but require large datasets and incur a higher Comp. Cost. Hybrid networks are more promising since they attempt to capture the best of both, using CNNs for efficient local FE and Transformers for powerful global context modeling. A recurring theme is that design choices—namely temporal–spatial fusion, attention mechanisms, and lightweight architectures—can influence both efficiency and performance. Despite these gains, challenges persist in reproducibility, cross-subject generalization, and adapting models to noisy or resource-limited environments. Addressing these issues will require larger and more diverse datasets, standardized evaluation protocols, and alignment between model design and practical deployment. Integrating approaches such as self-supervised learning, multimodal fusion, adaptive architectures, or generative models may help create EEG systems that are accurate, scalable, interpretable, and practical for real-world applications.
The objective of this paper was to contribute to research on EEG-based classification by comparing DL models and their FE strategies. We reviewed 114 papers overall and synthesized 88 studies from the past decade, focusing on trade-offs between Acc. and computational efficiency. The distinctive contribution of this review is its efficiency-oriented, scenario-based comparative perspective. Unlike prior surveys focusing mainly on classification accuracy for system evaluation, we applied WSA to assess models under different computational and deployment constraints. This approach identified the optimal architectures for real-time interaction vs. edge deployment environments. In doing so, we demonstrate how design choices such as input representation, model architectures, fusion techniques, and training strategies influence performance. Our aim is to provide researchers with insights to develop lightweight but reliable EEG pipelines. We also hope that this work encourages future efforts toward creating standardized, explainable, and robust approaches that can help to move EEG-based systems from the lab to practical BCIs and neurotechnology.

Author Contributions

Conceptualization, L.H.; methodology, L.H.; software, L.H.; validation, L.H. and J.R.; formal analysis, L.H.; investigation, L.H.; resources, L.H.; data curation, L.H.; writing—original draft preparation, L.H.; writing—review and editing, L.H., J.R. and A.N.; visualization, L.H.; supervision, J.R.; project administration, J.R. and R.V.; funding acquisition, R.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/included tables. Further inquiries can be directed to the corresponding author(s).

Acknowledgments

This research was undertaken, in part, thanks to funding from the David Sobey Retailing Centre, Sobey School of Business. We acknowledge the support of the NeuroCognitive Imaging Lab (NCIL) at Dalhousie University. This research was conducted as part of Saint Mary’s University Computer Engineering Research Lab (CERL).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
Acc.: Accuracy
ANN: Artificial Neural Network
AR: Autoregressive
AUC: Area Under the Curve
BCI: Brain–Computer Interface
BLSTM-NN: Bidirectional Long Short-Term Memory Neural Network
Borda Count: Rank-Based Aggregation Method
CAR: Common Average Referencing
CCA: Canonical Correlation Analysis
CNN: Convolutional Neural Network
CRR: Correct Recognition Rate
CS: Cosine Similarity
CSP: Common Spatial Patterns
CWT: Continuous Wavelet Transform
DBN: Deep Belief Network
DE: Differential Entropy
DL: Deep Learning
DML: Deep Metric Learning
ECG: Electrocardiography
EEG: Electroencephalography
EER: Equal Error Rate
EOG: Electrooculography
FAR: False Acceptance Rate
FBCSP: Filter Bank Common Spatial Patterns
FC: Fully Connected Layer
FE: Feature Extraction
FLOPs: Floating-Point Operations
FRR: False Rejection Rate
FuzzyEn: Fuzzy Entropy
GAN: Generative Adversarial Network
GAT: Graph Attention Network
GET: Generative EEG Transformer
GFCC: Gammatone Frequency Cepstral Coefficient
GNN: Graph Neural Network
GRU: Gated Recurrent Unit
GSO: Gram–Schmidt Orthogonalization
GSR: Galvanic Skin Response
GT: Gated Transformer
HTER: Half Total Error Rate
ITR: Information Transfer Rate
LSTM: Long Short-Term Memory
MACs: Millions of Multiply–Accumulate Operations
Max Rule: A Decision Fusion Strategy Selecting the Maximum Score
MFCC: Mel-Frequency Cepstral Coefficient
MI: Motor Imagery
ML: Machine Learning
MLP: Multilayer Perceptron
MSE: Mean Squared Error
NN: Nearest Neighbor
PLV: Phase Locking Value
PSD: Power Spectral Density
RF: Random Forest
RMSprop: Root Mean Square Propagation
RNN: Recurrent Neural Network
R/S Analysis: Rescaled Range Analysis
SA: Self-Attention
SAFE: Spatial Attention Feature Extractor
SE: Squeeze-and-Excitation
SMOTE: Synthetic Minority Oversampling Technique
SNN: Siamese Neural Network
SPA: Spectral Power Analysis
STE: Spatial Transformer Encoder
STFT: Short-Time Fourier Transform
TTNN: Tensor-Train Neural Network
TAFE: Temporal Attention Feature Extractor
TTE: Temporal Transformer Encoder
TCN: Temporal Convolutional Network
ViT: Vision Transformer
WT: Wavelet Transform
XGB: XGBoost
The following dataset abbreviations are used in this manuscript:
BCI Competition IV-2a: Motor Imagery EEG Benchmark Dataset (22 channels, 9 subjects).
CD FTA: Cross-Dataset Fine-Tuning Adaptation. Benchmarking for adapting EEG models across datasets.
DEAP: Dataset for Emotion Analysis using EEG and peripheral Physiological signals during video watching.
DREAMER: Dataset for emotion analysis using EEG and ECG while subjects watched affective videos.
EEGMMIDB: Large-scale PhysioNet dataset; EEG Motor Movement/Imagery Database.
HGD: High Gamma Dataset.
ImageNet: Large-scale visual dataset (1.2 M images, 1000 categories), widely used for pretraining.
ImageNet 21k: Extended ImageNet with 21,000 categories for large-scale vision pretraining.
JFT 300 M: Google's large-scale proprietary dataset of 300 M images, used for pretraining.
MOABB: Mother of All BCI Benchmarks. A standardized benchmarking framework for EEG datasets.
MPED: Multimodal Physiological Emotion Database. EEG and physiological modalities for emotion recognition.
PhysioNet EEG: Collection of EEG datasets on PhysioNet for various clinical and cognitive studies.
SEED-IV: SJTU Emotion EEG Dataset (four-class emotion recognition for 15 participants).
SEED-V: SJTU Emotion EEG Dataset (five-class emotion recognition for 16 participants).
SMR BCI: Sensorimotor Rhythm Brain–Computer Interface Dataset. Longitudinal MI dataset (600 h, 62 participants).

References

  1. Teplan, M. Fundamentals of EEG Measurement. Meas. Sci. Rev. 2002, 2, 1–11. Available online: https://www.measurement.sk/2002/S2/Teplan.pdf (accessed on 15 September 2025).
  2. Craik, A.; He, Y.; Contreras-Vidal, J.L. Deep Learning for Electroencephalogram (EEG) Classification Tasks: A Review. J. Neural Eng. 2019, 16, 031001. [Google Scholar] [CrossRef] [PubMed]
  3. Lin, F.; Cho, K.W.; Song, C.; Xu, W.; Jin, Z. Brain Password: A Secure and Truly Cancelable Brain Biometrics for Smart Headwear. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services, Munich, Germany, 10–15 June 2018; ACM: New York, NY, USA, 2018; pp. 296–309. [Google Scholar] [CrossRef]
  4. Yu, Y.-C.; Wang, S.; Gabel, L.A. A Feasibility Study of Using Event-Related Potential as a Biometrics. In Proceedings of 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 16–20 August 2016; IEEE: New York, NY, USA, 2016; pp. 4547–4550. [Google Scholar] [CrossRef]
  5. Singh, A.K.; Krishnan, S. Trends in EEG Signal Feature Extraction Applications. Front. Artif. Intell. 2023, 5, 1072801. [Google Scholar] [CrossRef] [PubMed]
  6. Ramoser, H.; Muller-Gerking, J.; Pfurtscheller, G. Optimal Spatial Filtering of Single Trial EEG during Imagined Hand Movement. IEEE Trans. Rehab. Eng. 2000, 8, 441–446. [Google Scholar] [CrossRef]
  7. Subasi, A. EEG Signal Classification Using Wavelet Feature Extraction and a Mixture of Expert Model. Expert Syst. Appl. 2007, 32, 1084–1093. [Google Scholar] [CrossRef]
  8. Roy, Y.; Banville, H.; Albuquerque, I.; Gramfort, A.; Falk, T.H.; Faubert, J. Deep Learning-Based Electroencephalography Analysis: A Systematic Review. J. Neural Eng. 2019, 16, 051001. [Google Scholar] [CrossRef] [PubMed]
  9. Takahashi, S.; Sakaguchi, Y.; Kouno, N.; Takasawa, K.; Ishizu, K.; Akagi, Y.; Aoyama, R.; Teraya, N.; Bolatkan, A.; Shinkai, N.; et al. Comparison of Vision Transformers and Convolutional Neural Networks in Medical Image Analysis: A Systematic Review. J. Med. Syst. 2024, 48, 84. [Google Scholar] [CrossRef]
  10. Sun, C.; Mou, C. Survey on the Research Direction of EEG-Based Signal Processing. Front. Neurosci. 2023, 17, 1203059. [Google Scholar] [CrossRef]
  11. Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep Learning with Convolutional Neural Networks for EEG Decoding and Visualization. Hum. Brain Mapp. 2017, 38, 5391–5420. [Google Scholar] [CrossRef]
  12. Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A Compact Convolutional Neural Network for EEG-Based Brain–Computer Interfaces. J. Neural Eng. 2018, 15, 056013. [Google Scholar] [CrossRef]
  13. Vafaei, E.; Hosseini, M. Transformers in EEG Analysis: A Review of Architectures and Applications in Motor Imagery, Seizure, and Emotion Classification. Sensors 2025, 25, 1293. [Google Scholar] [CrossRef]
  14. Zhang, X.; Yao, L.; Wang, X.; Monaghan, J.; McAlpine, D.; Zhang, Y. A Survey on Deep Learning-Based Non-Invasive Brain Signals: Recent Advances and New Frontiers. J. Neural Eng. 2021, 18, 031002. [Google Scholar] [CrossRef]
  15. Fu, J. A Comparison of CNN and Transformer in Continual Learning. Master’s Thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2023. Available online: https://kth.diva-portal.org/smash/get/diva2:1820229/FULLTEXT01.pdf (accessed on 16 September 2025).
  16. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015; Available online: https://ora.ox.ac.uk/objects/uuid:60713f18-a6d1-4d97-8f45-b60ad8aebbce (accessed on 27 September 2025).
  17. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; IEEE: New York, NY, USA, 2015; pp. 1–9. [Google Scholar] [CrossRef]
  18. Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017; pp. 5987–5995. [Google Scholar] [CrossRef]
  19. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017; pp. 2261–2269. [Google Scholar] [CrossRef]
  20. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
  21. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 10–15 June 2019; pp. 6105–6114. Available online: http://proceedings.mlr.press/v97/tan19a.html (accessed on 16 September 2025).
  22. Radosavovic, I.; Kosaraju, R.P.; Girshick, R.; He, K.; Dollar, P. Designing Network Design Spaces. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 10425–10433. [Google Scholar] [CrossRef]
  23. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: New York, NY, USA, 2022; pp. 11966–11976. [Google Scholar] [CrossRef]
  24. Chen, G.; Zhang, X.; Zhang, J.; Li, F.; Duan, S. A Novel Brain-Computer Interface Based on Audio-Assisted Visual Evoked EEG and Spatial-Temporal Attention CNN. Front. Neurorobot. 2022, 16, 995552. [Google Scholar] [CrossRef] [PubMed]
  25. Mai, N.-D.; Hoang Long, N.M.; Chung, W.-Y. 1D-CNN-Based BCI System for Detecting Emotional States Using a Wireless and Wearable 8-Channel Custom-Designed EEG Headset. In Proceedings of the 2021 IEEE International Conference on Flexible and Printable Sensors and Systems (FLEPS), Manchester, UK, 20–23 June 2021; IEEE: New York, NY, USA, 2021; pp. 1–4. [Google Scholar] [CrossRef]
  26. Huang, J.; Wang, C.; Zhao, W.; Grau, A.; Xue, X.; Zhang, F. LTDNet-EEG: A Lightweight Network of Portable/Wearable Devices for Real-Time EEG Signal Denoising. IEEE Trans. Consum. Electron. 2024, 70, 5561–5575. [Google Scholar] [CrossRef]
  27. Borra, D.; Magosso, E. Deep Learning-Based EEG Analysis: Investigating P3 ERP Components. J. Integr. Neurosci. 2021, 20, 791–811. [Google Scholar] [CrossRef]
  28. Ramakrishnan, K.; Groen, I.I.A.; Smeulders, A.W.M.; Scholte, H.S.; Ghebreab, S. Deep Learning for EEG-Based Brain Mapping. bioRxiv 2017, 178541. [Google Scholar] [CrossRef]
  29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Available online: https://papers.nips.cc/paper/7181-attention-is-all-you-need (accessed on 23 September 2025).
  30. Lin, T.; Wang, Y.; Liu, X.; Qiu, X. A Survey of Transformers. arXiv 2021, arXiv:2106.04554. [Google Scholar] [CrossRef]
  31. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. OpenAI. 2018. Available online: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (accessed on 17 September 2025).
  32. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
  33. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67. Available online: https://jmlr.org/papers/v21/20-074.html (accessed on 22 September 2025).
  34. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021; Available online: https://openreview.net/forum?id=YicbFdNTTy (accessed on 19 September 2025).
  35. Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. PaLM: Scaling Language Modeling with Pathways. J. Mach. Learn. Res. 2023, 24, 1–113. Available online: https://jmlr.org/papers/volume24/22-1144/22-1144.pdf (accessed on 21 September 2025).
  36. Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models. Meta AI. 2023. Available online: https://ai.meta.com/research/publications/llama-open-and-efficient-foundation-language-models (accessed on 25 September 2025).
  37. Google Developers/Google DeepMind. Introducing Gemini: Google’s Most Capable AI Model Yet. Google Blog 6 December 2023. Available online: https://blog.google/technology/ai/google-gemini-ai/ (accessed on 20 September 2025).
  38. Arjun, A.; Rajpoot, A.S.; Raveendranatha, P.M. Introducing Attention Mechanism for EEG Signals: Emotion Recognition with Vision Transformers. In Proceedings of 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Guadalajara, Mexico, 1–5 November 2021; IEEE: New York, NY, USA, 2021; pp. 5723–5726. [Google Scholar] [CrossRef]
  39. Song, Y.; Jia, X.; Yang, L.; Xie, L. Transformer-Based Spatial-Temporal Feature Learning for EEG Decoding. arXiv 2021, arXiv:2106.11170. [Google Scholar] [CrossRef]
  40. Tao, Y.; Sun, T.; Muhamed, A.; Genc, S.; Jackson, D.; Arsanjani, A.; Yaddanapudi, S.; Li, L.; Kumar, P. Gated Transformer for Decoding Human Brain EEG Signals. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Guadalajara, Mexico, 1–5 November 2021; IEEE: New York, NY, USA, 2021; pp. 125–130. [Google Scholar] [CrossRef]
  41. Siddhad, G.; Gupta, A.; Dogra, D.P.; Roy, P.P. Efficacy of Transformer Networks for Classification of EEG Data. Biomed. Signal Process. Control 2024, 87, 105488. [Google Scholar] [CrossRef]
  42. Du, Y.; Xu, Y.; Wang, X.; Liu, L.; Ma, P. EEG Temporal–Spatial Transformer for Person Identification. Sci. Rep. 2022, 12, 14378. [Google Scholar] [CrossRef]
  43. Zeynali, M.; Seyedarabi, H.; Afrouzian, R. Classification of EEG Signals Using Transformer Based Deep Learning and Ensemble Models. Biomed. Signal Process. Control 2023, 86, 105130. [Google Scholar] [CrossRef]
  44. Omair, A.; Saif-ur-Rehman, M.; Metzler, M.; Glasmachers, T.; Iossifidis, I.; Klaes, C. GET: A Generative EEG Transformer for Continuous Context-Based Neural Signals. arXiv 2024, arXiv:2406.03115. [Google Scholar] [CrossRef]
  45. Lim, J.-H.; Kuo, P.-C. EEGTrans: Transformer-Driven Generative Models for EEG Synthesis. In Proceedings of the 13th International Conference on Learning Representations (ICLR 2025), Singapore, 24–28 April 2025; Available online: https://openreview.net/forum?id=ydw2l8zgUB (accessed on 18 September 2025).
  46. Wang, G.; Liu, W.; He, Y.; Xu, C.; Ma, L.; Li, H. EEGPT: Pretrained Transformer for Universal and Reliable EEG Representation Learning. In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 10–15 December 2024; pp. 1–12. [Google Scholar]
  47. Saeidi, M.; Karwowski, W.; Farahani, F.V.; Fiok, K.; Taiar, R.; Hancock, P.A.; Al-Juaid, A. Neural Decoding of EEG Signals with Machine Learning: A Systematic Review. Brain Sci. 2021, 11, 1525. [Google Scholar] [CrossRef] [PubMed]
  48. Li, X.; Zhang, Y.; Tiwari, P.; Song, D.; Hu, B.; Yang, M.; Zhao, Z.; Kumar, N.; Marttinen, P. EEG Based Emotion Recognition: A Tutorial and Review. ACM Comput. Surv. 2023, 55, 1–57. [Google Scholar] [CrossRef]
  49. Prabowo, D.W.; Nugroho, H.A.; Setiawan, N.A.; Debayle, J. A Systematic Literature Review of Emotion Recognition Using EEG Signals. Cogn. Syst. Res. 2023, 82, 101152. [Google Scholar] [CrossRef]
  50. Vempati, R.; Sharma, L.D. A Systematic Review on Automated Human Emotion Recognition Using Electroencephalogram Signals and Artificial Intelligence. Results Eng. 2023, 18, 101027. [Google Scholar] [CrossRef]
  51. Mohammed, S.A.; Jasim, S.S.; Thamir, B.A.; Alabdel Abass, A. A Survey on EEG Signal Analysis Using Machine Learning. UTJES 2025, 15, 89–106. [Google Scholar] [CrossRef]
  52. Gatfan, K.S. A Review on Deep Learning for Electroencephalogram Signal Classification. J. AI-Qadisiyah Comp. Sci. Math. 2024, 16, 137–151. [Google Scholar] [CrossRef]
  53. Suhaimi, N.S.; Mountstephens, J.; Teo, J. EEG-Based Emotion Recognition: A State-of-the-Art Review of Current Trends and Opportunities. Comput. Intell. Neurosci. 2020, 2020, 1–19. [Google Scholar] [CrossRef] [PubMed]
  54. Rahman, M.M.; Sarkar, A.K.; Hossain, M.A.; Hossain, M.S.; Islam, M.R.; Hossain, M.B.; Quinn, J.M.W.; Moni, M.A. Recognition of Human Emotions Using EEG Signals: A Review. Comput. Biol. Med. 2021, 136, 104696. [Google Scholar] [CrossRef] [PubMed]
  55. Khare, S.K.; Blanes-Vidal, V.; Nadimi, E.S.; Acharya, U.R. Emotion Recognition and Artificial Intelligence: A Systematic Review (2014–2023) and Research Recommendations. Inf. Fusion 2024, 102, 102019. [Google Scholar] [CrossRef]
  56. Jafari, M.; Shoeibi, A.; Khodatars, M.; Bagherzadeh, S.; Shalbaf, A.; García, D.L.; Gorriz, J.M.; Acharya, U.R. Emotion Recognition in EEG Signals Using Deep Learning Methods: A Review. Comput. Biol. Med. 2023, 165, 107450. [Google Scholar] [CrossRef] [PubMed]
  57. Ma, W.; Zheng, Y.; Li, T.; Li, Z.; Li, Y.; Wang, L. A Comprehensive Review of Deep Learning in EEG-Based Emotion Recognition: Classifications, Trends, and Practical Implications. PeerJ Comput. Sci. 2024, 10, e2065. [Google Scholar] [CrossRef]
  58. Gkintoni, E.; Aroutzidis, A.; Antonopoulou, H.; Halkiopoulos, C. From Neural Networks to Emotional Networks: A Systematic Review of EEG-Based Emotion Recognition in Cognitive Neuroscience and Real-World Applications. Brain Sci. 2025, 15, 220. [Google Scholar] [CrossRef] [PubMed]
  59. Al-Saegh, A.; Dawwd, S.A.; Abdul-Jabbar, J.M. Deep Learning for Motor Imagery EEG-Based Classification: A Review. Biomed. Signal Process. Control 2021, 63, 102172. [Google Scholar] [CrossRef]
  60. Ko, W.; Jeon, E.; Jeong, S.; Phyo, J.; Suk, H.-I. A Survey on Deep Learning-Based Short/Zero-Calibration Approaches for EEG-Based Brain–Computer Interfaces. Front. Hum. Neurosci. 2021, 15, 643386. [Google Scholar] [CrossRef]
  61. Pawan; Dhiman, R. Machine Learning Techniques for Electroencephalogram Based Brain-Computer Interface: A Systematic Literature Review. Meas. Sens. 2023, 28, 100823. [Google Scholar] [CrossRef]
  62. Saibene, A.; Ghaemi, H.; Dagdevir, E. Deep Learning in Motor Imagery EEG Signal Decoding: A Systematic Review. Neurocomputing 2024, 610, 128577. [Google Scholar] [CrossRef]
  63. Moreno-Castelblanco, S.R.; Vélez-Guerrero, M.A.; Callejas-Cuervo, M. Artificial Intelligence Approaches for EEG Signal Acquisition and Processing in Lower-Limb Motor Imagery: A Systematic Review. Sensors 2025, 25, 5030. [Google Scholar] [CrossRef]
  64. Wang, X.; Liesaputra, V.; Liu, Z.; Wang, Y.; Huang, Z. An In-Depth Survey on Deep Learning-Based Motor Imagery Electroencephalogram (EEG) Classification. Artif. Intell. Med. 2024, 147, 102738. [Google Scholar] [CrossRef]
  65. Hassan, J.; Reza, S.; Ahmed, S.U.; Anik, N.H.; Khan, M.O. EEG Workload Estimation and Classification: A Systematic Review. J. Neural Eng. 2025, 22, 051003. [Google Scholar] [CrossRef] [PubMed]
  66. de Bardeci, M.; Ip, C.T.; Olbrich, S. Deep Learning Applied to Electroencephalogram Data in Mental Disorders: A Systematic Review. Biol. Psychol. 2021, 162, 108117. [Google Scholar] [CrossRef]
  67. Nwagu, C.; AlSlaity, A.; Orji, R. EEG-Based Brain-Computer Interactions in Immersive Virtual and Augmented Reality: A Systematic Review. Proc. ACM Hum.-Comput. Interact. 2023, 7, 1–33. [Google Scholar] [CrossRef]
  68. Dadebayev, D.; Goh, W.W.; Tan, E.X. EEG-Based Emotion Recognition: Review of Commercial EEG Devices and Machine Learning Techniques. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 4385–4401. [Google Scholar] [CrossRef]
  69. Klepl, D.; Wu, M.; He, F. Graph Neural Network-Based EEG Classification: A Survey. arXiv 2023, arXiv:2310.02152. [Google Scholar] [CrossRef]
  70. Ma, L.; Minett, J.W.; Blu, T.; Wang, W.S.-Y. Resting State EEG-Based Biometrics for Individual Identification Using Convolutional Neural Networks. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; IEEE: New York, NY, USA, 2015; pp. 2848–2851. [Google Scholar] [CrossRef]
  71. Mao, Z.; Yao, W.X.; Huang, Y. EEG-Based Biometric Identification with Deep Learning. In Proceedings of the 2017 8th International IEEE/EMBS Conference on Neural Engineering (NER), Shanghai, China, 25–28 May 2017; IEEE: New York, NY, USA, 2017; pp. 609–612. [Google Scholar] [CrossRef]
  72. Schons, T.; Moreira, G.J.P.; Silva, P.H.L.; Coelho, V.N.; Luz, E.J.S. Convolutional Network for EEG-Based Biometric. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications; Mendoza, M., Velastín, S., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 10657, pp. 601–608. [Google Scholar] [CrossRef]
  73. Di, Y.; An, X.; Liu, S.; He, F.; Ming, D. Using Convolutional Neural Networks for Identification Based on EEG Signals. In Proceedings of the 2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China, 25–26 August 2018; IEEE: New York, NY, USA, 2018; pp. 119–122. [Google Scholar] [CrossRef]
  74. Zhang, F.Q.; Mao, Z.J.; Huang, Y.F.; Xu, L.; Ding, G.Y. Deep Learning Models for EEG-Based Rapid Serial Visual Presentation Event Classification. J. Inf. Hiding Multimed. Signal Process. 2018, 9, 177–187. [Google Scholar]
  75. Gonzalez, P.A.; Katsigiannis, S.; Ramzan, N.; Tolson, D.; Arevalillo-Herrez, M. ES1D: A Deep Network for EEG-Based Subject Identification. In Proceedings of the 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE), Washington, DC, USA, 23–25 October 2017; IEEE: New York, NY, USA, 2017; pp. 81–85. [Google Scholar] [CrossRef]
  76. Waytowich, N.; Lawhern, V.J.; Garcia, O.; Cummings, J.; Faller, J.; Sajda, P.; Vettel, J.M. Compact Convolutional Neural Networks for Classification of Asynchronous Steady-State Visual Evoked Potentials. J. Neural Eng. 2018, 15, 066031. [Google Scholar] [CrossRef]
  77. Yu, T.; Wei, C.-S.; Chiang, K.-J.; Nakanishi, M.; Jung, T.-P. EEG-Based User Authentication Using a Convolutional Neural Network. In Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), San Francisco, CA, USA, 20–23 March 2019; IEEE: New York, NY, USA, 2019; pp. 1011–1014. [Google Scholar] [CrossRef]
  78. Lai, C.Q.; Ibrahim, H.; Abdullah, M.Z.; Abdullah, J.M.; Suandi, S.A.; Azman, A. Arrangements of Resting State Electroencephalography as the Input to Convolutional Neural Network for Biometric Identification. Comput. Intell. Neurosci. 2019, 2019, 1–10. [Google Scholar] [CrossRef] [PubMed]
  79. Wang, M.; El-Fiqi, H.; Hu, J.; Abbass, H.A. Convolutional Neural Networks Using Dynamic Functional Connectivity for EEG-Based Person Identification in Diverse Human States. IEEE Trans. Inform. Forensic Secur. 2019, 14, 3259–3272. [Google Scholar] [CrossRef]
  80. Wang, M.; Hu, J.; Abbass, H. Stable EEG Biometrics Using Convolutional Neural Networks and Functional Connectivity. Aust. J. Intell. Inf. Process. Syst. 2019, 15, 19–26. [Google Scholar]
  81. Zhang, R.; Zeng, Y.; Tong, L.; Shu, J.; Lu, R.; Li, Z.; Yang, K.; Yan, B. EEG Identity Authentication in Multi-Domain Features: A Multi-Scale 3D-CNN Approach. Front. Neurorobot. 2022, 16, 901765. [Google Scholar] [CrossRef]
  82. Das, R.; Maiorana, E.; Campisi, P. Visually Evoked Potential for EEG Biometrics Using Convolutional Neural Network. In Proceedings of the 2017 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, 28 August–2 September 2017; IEEE: New York, NY, USA, 2017; pp. 951–955. [Google Scholar] [CrossRef]
  83. Cecotti, H. Convolutional Neural Networks for Event-Related Potential Detection: Impact of the Architecture. In Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Seogwipo, Republic of Korea, 11–15 July 2017; IEEE: New York, NY, USA, 2017; pp. 2031–2034. [Google Scholar] [CrossRef]
  84. Cecotti, H.; Jha, G. 3D Convolutional Neural Networks for Event-Related Potential Detection. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; IEEE: New York, NY, USA, 2019; pp. 4160–4163. [Google Scholar] [CrossRef]
  85. Chen, J.X.; Mao, Z.J.; Yao, W.X.; Huang, Y.F. EEG-Based Biometric Identification with Convolutional Neural Network. Multimed. Tools Appl. 2020, 79, 10655–10675. [Google Scholar] [CrossRef]
  86. Salami, A.; Andreu-Perez, J.; Gillmeister, H. EEG-ITNet: An Explainable Inception Temporal Convolutional Network for Motor Imagery Classification. IEEE Access 2022, 10, 36672–36685. [Google Scholar] [CrossRef]
  87. Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar] [CrossRef]
  88. Riyad, M.; Khalil, M.; Adib, A. Incep-EEGNet: A ConvNet for Motor Imagery Decoding. In Image and Signal Processing; El Moataz, A., Mammass, D., Mansouri, A., Nouboud, F., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12119, pp. 103–111. [Google Scholar] [CrossRef]
  89. Liu, X.; Shen, Y.; Liu, J.; Yang, J.; Xiong, P.; Lin, F. Parallel Spatial–Temporal Self-Attention CNN-Based Motor Imagery Classification for BCI. Front. Neurosci. 2020, 14, 587520. [Google Scholar] [CrossRef]
  90. Zhu, Y.; Peng, Y.; Song, Y.; Ozawa, K.; Kong, W. RAMST-CNN: A Residual and Multiscale Spatio-Temporal Convolution Neural Network for Personal Identification with EEG. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2021, E104.A, 563–571. [Google Scholar] [CrossRef]
  91. Lakhan, P.; Banluesombatkul, N.; Sricom, N.; Sawangjai, P.; Sangnark, S.; Yagi, T.; Wilaiprasitporn, T.; Saengmolee, W.; Limpiti, T. EEG-BBNet: A Hybrid Framework for Brain Biometric Using Graph Connectivity. IEEE Sens. Lett. 2025, 9, 1–4. [Google Scholar] [CrossRef]
  92. Ding, Y.; Robinson, N.; Zhang, S.; Zeng, Q.; Guan, C. TSception: Capturing Temporal Dynamics and Spatial Asymmetry from EEG for Emotion Recognition. IEEE Trans. Affect. Comput. 2023, 14, 2238–2250. [Google Scholar] [CrossRef]
  93. Salimi, N.; Barlow, M.; Lakshika, E. Towards Potential of N-back Task as Protocol and EEGNet for the EEG-based Biometric. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, 1–4 December 2020; pp. 1718–1724. [Google Scholar] [CrossRef]
  94. Ingolfsson, T.M.; Hersche, M.; Wang, X.; Kobayashi, N.; Cavigelli, L.; Benini, L. EEG-TCNet: An Accurate Temporal Convolutional Network for Embedded Motor-Imagery Brain–Machine Interfaces. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; IEEE: New York, NY, USA, 2020; pp. 2958–2965. [Google Scholar] [CrossRef]
  95. Kasim, Ö.; Tosun, M. Biometric Authentication from Photic Stimulated EEG Records. Appl. Artif. Intell. 2021, 35, 1407–1419. [Google Scholar] [CrossRef]
  96. Wu, B.; Meng, W.; Chiu, W.-Y. Towards Enhanced EEG-Based Authentication with Motor Imagery Brain-Computer Interface. In Proceedings of the 38th Annual Computer Security Applications Conference, Austin, TX, USA, 5–9 December 2022; ACM: New York, NY, USA, 2022; pp. 799–812. [Google Scholar] [CrossRef]
  97. Altuwaijri, G.A.; Muhammad, G.; Altaheri, H.; Alsulaiman, M. A Multi-Branch Convolutional Neural Network with Squeeze-and-Excitation Attention Blocks for EEG-Based Motor Imagery Signals Classification. Diagnostics 2022, 12, 995. [Google Scholar] [CrossRef]
  98. Autthasan, P.; Chaisaen, R.; Sudhawiyangkul, T.; Rangpong, P.; Kiatthaveephong, S.; Dilokthanakul, N.; Bhakdisongkhram, G.; Phan, H.; Guan, C.; Wilaiprasitporn, T. MIN2Net: End-to-End Multi-Task Learning for Subject-Independent Motor Imagery EEG Classification. IEEE Trans. Biomed. Eng. 2022, 69, 2105–2118. [Google Scholar] [CrossRef] [PubMed]
  99. Bidgoly, A.J.; Bidgoly, H.J.; Arezoumand, Z. Towards a Universal and Privacy Preserving EEG-Based Authentication System. Sci. Rep. 2022, 12, 2531. [Google Scholar] [CrossRef] [PubMed]
  100. Alsumari, W.; Hussain, M.; Alshehri, L.; Aboalsamh, H.A. EEG-Based Person Identification and Authentication Using Deep Convolutional Neural Network. Axioms 2023, 12, 74. [Google Scholar] [CrossRef]
  101. Yap, H.Y.; Choo, Y.-H.; Mohd Yusoh, Z.I.; Khoh, W.H. An Evaluation of Transfer Learning Models in EEG-Based Authentication. Brain Inf. 2023, 10, 19. [Google Scholar] [CrossRef]
  102. Chen, X.; Teng, X.; Chen, H.; Pan, Y.; Geyer, P. Toward Reliable Signals Decoding for Electroencephalogram: A Benchmark Study to EEGNeX. Biomed. Signal Process. Control 2024, 87, 105475. [Google Scholar] [CrossRef]
  103. Shakir, A.M.; Bidgoly, A.J. Task-Independent EEG-Based Authentication. J. Tianjin Univ. Sci. Technol. 2024, 57, 1–14. [Google Scholar] [CrossRef]
  104. Wu, Q.; Zeng, Y.; Zhang, C.; Tong, L.; Yan, B. An EEG-Based Person Authentication System with Open-Set Capability Combining Eye Blinking Signals. Sensors 2018, 18, 335. [Google Scholar] [CrossRef]
  105. Ozdenizci, O.; Wang, Y.; Koike-Akino, T.; Erdogmus, D. Adversarial Deep Learning in EEG Biometrics. IEEE Signal Process. Lett. 2019, 26, 710–714. [Google Scholar] [CrossRef]
  106. Musallam, Y.K.; AlFassam, N.I.; Muhammad, G.; Amin, S.U.; Alsulaiman, M.; Abdul, W.; Altaheri, H.; Bencherif, M.A.; Algabri, M. Electroencephalography-Based Motor Imagery Classification Using Temporal Convolutional Network Fusion. Biomed. Signal Process. Control 2021, 69, 102826. [Google Scholar] [CrossRef]
  107. Mane, R.; Chew, E.; Chua, K.; Ang, K.K.; Robinson, N.; Vinod, A.P.; Lee, S.-W.; Guan, C. FBCNet: A Multi-View Convolutional Neural Network for Brain-Computer Interface. arXiv 2021, arXiv:2104.01233. [Google Scholar] [CrossRef]
  108. Hu, F.; Wang, F.; Bi, J.; An, Z.; Chen, C.; Qu, G.; Han, S. HASTF: A Hybrid Attention Spatio-Temporal Feature Fusion Network for EEG Emotion Recognition. Front. Neurosci. 2024, 18, 1479570. [Google Scholar] [CrossRef] [PubMed]
  109. Muna, U.M.; Shawon, M.M.H.; Jobayer, M.; Akter, S.; Sabuj, S.R. SSTAF: Spatial-Spectral-Temporal Attention Fusion Transformer for Motor Imagery Classification. arXiv 2025, arXiv:2504.13220. [Google Scholar] [CrossRef]
  110. Wei, C.; Zhou, G. EEG Emotion Recognition Based on Attention Mechanism Fusion Transformer Network. In Proceedings of the 2024 11th International Conference on Biomedical and Bioinformatics Engineering, Osaka, Japan, 8–11 November 2024; ACM: New York, NY, USA, 2024; pp. 146–150. [Google Scholar] [CrossRef]
  111. Ghous, G.; Najam, S.; Alshehri, M.; Alshahrani, A.; AlQahtani, Y.; Jalal, A.; Liu, H. Attention-Driven Emotion Recognition in EEG: A Transformer-Based Approach with Cross-Dataset Fine-Tuning. IEEE Access 2025, 13, 69369–69394. [Google Scholar] [CrossRef]
  112. Wei, Y.; Liu, Y.; Li, C.; Cheng, J.; Song, R.; Chen, X. TC-Net: A Transformer Capsule Network for EEG-Based Emotion Recognition. Comput. Biol. Med. 2023, 152, 106463. [Google Scholar] [CrossRef]
  113. Sun, J.; Xie, J.; Zhou, H. EEG Classification with Transformer-Based Models. In Proceedings of the 2021 IEEE 3rd Global Conference on Life Sciences and Technologies (LifeTech), Nara, Japan, 9–11 March 2021; IEEE: New York, NY, USA, 2021; pp. 92–93. [Google Scholar] [CrossRef]
  114. Omair, A.; Saif-ur-Rehman, M.; Glasmachers, T.; Iossifidis, I.; Klaes, C. ConTraNet: A Hybrid Network for Improving the Classification of EEG and EMG Signals with Limited Training Data. Comput. Biol. Med. 2024, 168, 107649. [Google Scholar] [CrossRef]
  115. Wan, Z.; Li, M.; Liu, S.; Huang, J.; Tan, H.; Duan, W. EEGformer: A Transformer–Based Brain Activity Classification Method Using EEG Signal. Front. Neurosci. 2023, 17, 1148855. [Google Scholar] [CrossRef]
  116. Ma, Y.; Song, Y.; Gao, F. A Novel Hybrid CNN-Transformer Model for EEG Motor Imagery Classification. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; IEEE: New York, NY, USA, 2022; pp. 1–8. [Google Scholar] [CrossRef]
  117. Zhao, W.; Jiang, X.; Zhang, B.; Xiao, S.; Weng, S. CTNet: A Convolutional Transformer Network for EEG-Based Motor Imagery Classification. Sci. Rep. 2024, 14, 20237. [Google Scholar] [CrossRef]
  118. Liu, R.; Chao, Y.; Ma, X.; Sha, X.; Sun, L.; Li, S.; Chang, S. ERTNet: An Interpretable Transformer-Based Framework for EEG Emotion Recognition. Front. Neurosci. 2024, 18, 1320645. [Google Scholar] [CrossRef] [PubMed]
  119. Li, H.; Zhang, H.; Chen, Y. Dual-TSST: A Dual-Branch Temporal-Spectral-Spatial Transformer Model for EEG Decoding. arXiv 2024, arXiv:2409.03251. [Google Scholar] [CrossRef]
  120. Xie, J.; Zhang, J.; Sun, J.; Ma, Z.; Qin, L.; Li, G.; Zhou, H.; Zhan, Y. A Transformer-Based Approach Combining Deep Learning Network and Spatial-Temporal Information for Raw EEG Classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 30, 2126–2136. [Google Scholar] [CrossRef]
  121. Si, X.; Huang, D.; Sun, Y.; Huang, S.; Huang, H.; Ming, D. Transformer-Based Ensemble Deep Learning Model for EEG-Based Emotion Recognition. Brain Sci. Adv. 2023, 9, 210–223. [Google Scholar] [CrossRef]
  122. Yao, X.; Li, T.; Ding, P.; Wang, F.; Zhao, L.; Gong, A.; Nan, W.; Fu, Y. Emotion Classification Based on Transformer and CNN for EEG Spatial–Temporal Feature Learning. Brain Sci. 2024, 14, 268. [Google Scholar] [CrossRef] [PubMed]
  123. Lu, W.; Xia, L.; Tan, T.P.; Ma, H. CIT-EmotionNet: Convolution Interactive Transformer Network for EEG Emotion Recognition. PeerJ Comput. Sci. 2024, 10, e2610. [Google Scholar] [CrossRef] [PubMed]
  124. Bagchi, S.; Bathula, D.R. EEG-ConvTransformer for Single-Trial EEG-Based Visual Stimulus Classification. Pattern Recognit. 2022, 129, 108757. [Google Scholar] [CrossRef]
  125. Song, Y.; Zheng, Q.; Liu, B.; Gao, X. EEG Conformer: Convolutional Transformer for EEG Decoding and Visualization. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 710–719. [Google Scholar] [CrossRef]
  126. Si, X.; Huang, D.; Liang, Z.; Sun, Y.; Huang, H.; Liu, Q.; Yang, Z.; Ming, D. Temporal Aware Mixed Attention-Based Convolution and Transformer Network for Cross-Subject EEG Emotion Recognition. Comput. Biol. Med. 2024, 181, 108973. [Google Scholar] [CrossRef]
  127. Gong, L.; Li, M.; Zhang, T.; Chen, W. EEG Emotion Recognition Using Attention-Based Convolutional Transformer Neural Network. Biomed. Signal Process. Control 2023, 84, 104835. [Google Scholar] [CrossRef]
  128. Altaheri, H.; Muhammad, G.; Alsulaiman, M. Physics-Informed Attention Temporal Convolutional Network for EEG-Based Motor Imagery Classification. IEEE Trans. Ind. Inf. 2023, 19, 2249–2258. [Google Scholar] [CrossRef]
  129. Nguyen, A.H.P.; Oyefisayo, O.; Pfeffer, M.A.; Ling, S.H. EEG-TCNTransformer: A Temporal Convolutional Transformer for Motor Imagery Brain–Computer Interfaces. Signals 2024, 5, 605–632. [Google Scholar] [CrossRef]
  130. Cheng, Z.; Bu, X.; Wang, Q.; Yang, T.; Tu, J. EEG-Based Emotion Recognition Using Multi-Scale Dynamic CNN and Gated Transformer. Sci. Rep. 2024, 14, 31319. [Google Scholar] [CrossRef] [PubMed]
  131. Kostas, D.; Aroca-Ouellette, S.; Rudzicz, F. BENDR: Using Transformers and a Contrastive Self-Supervised Learning Task to Learn from Massive Amounts of EEG Data. Front. Hum. Neurosci. 2021, 15, 653659. [Google Scholar] [CrossRef]
  132. Yang, R.; Modesitt, E. ViT2EEG: Leveraging Hybrid Pretrained Vision Transformers for EEG Data. arXiv 2023, arXiv:2308.00454. [Google Scholar] [CrossRef]
  133. Jiang, W.; Zhao, L.; Lu, B. LaBraM: Large Brain Model for Learning Generic Representations with Tremendous EEG Data. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024; pp. 1–22. Available online: https://proceedings.iclr.cc/paper_files/paper/2024/file/47393e8594c82ce8fd83adc672cf9872-Paper-Conference.pdf (accessed on 23 September 2025).
  134. Li, W.; Zhou, N.; Qu, X. Enhancing Eye-Tracking Performance Through Multi-Task Learning Transformer. In Augmented Cognition; Schmorrow, D.D., Fidopiastis, C.M., Eds.; Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2024; Volume 14695, pp. 31–46. [Google Scholar] [CrossRef]
  135. Zhang, X.; Yao, L.; Kanhere, S.S.; Liu, Y.; Gu, T.; Chen, K. MindID: Person Identification from Brain Waves through Attention-Based Recurrent Neural Network. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2018, 2, 1–23. [Google Scholar] [CrossRef]
  136. Zhang, X.; Yao, L.; Huang, C.; Gu, T.; Yang, Z.; Liu, Y. DeepKey: A Multimodal Biometric Authentication System via Deep Decoding Gaits and Brainwaves. ACM Trans. Intell. Syst. Technol. 2020, 11, 1–24. [Google Scholar] [CrossRef]
  137. Balcı, F. DM-EEGID: EEG-Based Biometric Authentication System Using Hybrid Attention-Based LSTM and MLP Algorithm. Trait. Du Signal 2023, 40, 65–79. [Google Scholar] [CrossRef]
  138. Wilaiprasitporn, T.; Ditthapron, A.; Matchaparn, K.; Tongbuasirilai, T.; Banluesombatkul, N.; Chuangsuwanich, E. Affective EEG-Based Person Identification Using the Deep Learning Approach. IEEE Trans. Cogn. Dev. Syst. 2020, 12, 486–496. [Google Scholar] [CrossRef]
  139. Sun, Y.; Lo, F.P.-W.; Lo, B. EEG-Based User Identification System Using 1D-Convolutional Long Short-Term Memory Neural Networks. Expert Syst. Appl. 2019, 125, 259–267. [Google Scholar] [CrossRef]
  140. Chakravarthi, B.; Ng, S.-C.; Ezilarasan, M.R.; Leung, M.-F. EEG-Based Emotion Recognition Using Hybrid CNN and LSTM Classification. Front. Comput. Neurosci. 2022, 16, 1019776. [Google Scholar] [CrossRef] [PubMed]
  141. Puengdang, S.; Tuarob, S.; Sattabongkot, T.; Sakboonyarat, B. EEG-Based Person Authentication Method Using Deep Learning with Visual Stimulation. In Proceedings of the 2019 11th International Conference on Knowledge and Smart Technology (KST), Phuket, Thailand, 23–26 January 2019; IEEE: New York, NY, USA, 2019; pp. 6–10. [Google Scholar] [CrossRef]
  142. Zheng, X.; Cao, Z.; Bai, Q. An Evoked Potential-Guided Deep Learning Brain Representation for Visual Classification. In Neural Information Processing; Yang, H., Pasupa, K., Leung, A.C.-S., Kwok, J.T., Chan, J.H., King, I., Eds.; Communications in Computer and Information Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 1333, pp. 54–61. [Google Scholar] [CrossRef]
  143. Jin, X.; Tang, J.; Kong, X.; Peng, Y.; Cao, J.; Zhao, Q.; Kong, W. CTNN: A Convolutional Tensor-Train Neural Network for Multi-Task Brainprint Recognition. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 103–112. [Google Scholar] [CrossRef] [PubMed]
  144. Kumar, P.; Saini, R.; Kaur, B.; Roy, P.P.; Scheme, E. Fusion of Neuro-Signals and Dynamic Signatures for Person Authentication. Sensors 2019, 19, 4641. [Google Scholar] [CrossRef] [PubMed]
  145. Chakladar, D.D.; Kumar, P.; Roy, P.P.; Dogra, D.P.; Scheme, E.; Chang, V. A Multimodal-Siamese Neural Network (mSNN) for Person Verification Using Signatures and EEG. Inf. Fusion 2021, 71, 17–27. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.