A Novel Spatiotemporal Framework for EEG-Based Visual Image Classification Through Signal Disambiguation

Fares, Ahmed

doi:10.3390/asi8050121

by Ahmed Fares ¹,²

¹ Department of Computer Science and Engineering, Egypt-Japan University of Science and Technology (E-JUST), Alexandria 21934, Egypt

² Department of Electrical Engineering, Faculty of Engineering at Shoubra, Benha University, Cairo 11629, Egypt

Appl. Syst. Innov. 2025, 8(5), 121; https://doi.org/10.3390/asi8050121

Submission received: 15 July 2025 / Revised: 9 August 2025 / Accepted: 15 August 2025 / Published: 25 August 2025

(This article belongs to the Special Issue Advancements in Deep Learning and Its Applications)

Abstract

This study presents a novel deep learning framework for classifying visual images based on brain responses recorded through electroencephalogram (EEG) signals. The primary challenge in EEG-based visual pattern recognition lies in the inherent spatiotemporal variability of neural signals across different individuals and recording sessions, which severely limits the generalization capabilities of classification models. Our work specifically addresses the task of identifying which image category a person is viewing based solely on their recorded brain activity. The proposed methodology incorporates three primary components: first, a brain hemisphere asymmetry-based dimensional reduction approach to extract discriminative lateralization features while addressing high-dimensional data constraints; second, an advanced channel selection algorithm utilizing Fisher score methodology to identify electrodes with optimal spatial representativeness across participants; and third, a Dynamic Temporal Warping (DTW) alignment technique to synchronize temporal signal variations with respect to selected reference channels. Comprehensive experimental validation on a visual image classification task using a publicly available EEG-based visual classification dataset, ImageNet-EEG, demonstrates that the proposed disambiguation framework substantially improves classification accuracy while simultaneously enhancing model convergence characteristics. The integrated approach not only outperforms individual component implementations but also accelerates the learning process, thereby reducing training data requirements for EEG-based applications. These findings suggest that systematic spatiotemporal disambiguation represents a promising direction for developing robust and generalizable EEG classification systems across diverse neurological and brain–computer interface applications.

Keywords: EEG; spatiotemporal signal alignment; brain asymmetry analysis; Fisher score feature selection; dynamic temporal warping; visual stimulus classification

1. Introduction

This paper addresses the challenge of classifying visual images based on human brain activity recorded through electroencephalogram (EEG) signals—a pattern recognition task where the goal is to determine what image category a person is viewing based solely on their neural responses. The electroencephalogram (EEG) signal, gathered from the scalp, directly reflects human brain activity over time. It is widely used as a non-invasive and easy-to-implement technique for classifying brain activity [1]. EEG-based classifications have increasingly drawn the interest of brain science and neuroscience research teams and have been investigated in various areas such as emotion recognition [2,3], disease detection [4,5], brain–computer interface [6,7], semantic analysis [8], etc.

In visual pattern recognition from brain signals, subjects view images from different categories (such as animals, objects, or scenes) and their EEG responses are recorded. The fundamental question we address is the following: Can we accurately predict which image category a person is viewing based only on their brain signals? This capability has significant implications for brain–computer interfaces, assistive technologies for individuals with motor disabilities, and understanding the neural basis of visual perception.

Recent advancements in EEG acquisition equipment, signal processing techniques, and machine learning methods have significantly promoted research on image classification from brain signals. These advancements have shown that using EEG signals for content understanding, pattern recognition, and classification is feasible across a broad range of tasks. However, achieving robust image classification from EEG signals faces two critical challenges: (i) most EEG datasets are relatively small due to the high cost of data gathering and the unique characteristics of EEG signals, and (ii) EEG data are high-dimensional, which complicates EEG-based classification. For instance, training deep models for EEG-based brain activity classification is difficult because such models require large amounts of training data. To address the issue of limited EEG data, methods such as data augmentation [9], dimensionality reduction [10], and database merging [11] have been reported as potential solutions.

On the other hand, scalp-recorded EEG signals exhibit spatiotemporal differences, comprising spatial and temporal perception differences. Specifically, the spatial perception difference arises from variations in the size and shape of subjects’ heads and from variations in the structural and functional organization of their brains. During EEG recording, each subject wears an electrode cap chosen from a few standard sizes (small, medium, or large). However, the exact size and shape of each subject’s head are unique, so the same electrode on an EEG cap cannot be guaranteed to sit at the same scalp position for every subject. In addition, the structural and functional differences between subjects’ brains mean that the same electrode may capture different functional information in different subjects. This spatial deviation is commonly ignored, and EEG signals gathered from different subjects are processed with a unified methodology. Moreover, scalp-recorded EEG signals contain temporal perception differences between subjects and sessions. Different subjects have different response speeds and response cycles for the same stimuli. For a single subject, the response speeds and cycles to the same stimuli may also differ across sessions because of changes in physical and mental state. The temporal perception difference increases the difficulty of feature extraction and feature selection on EEG data collected from multiple subjects [12] and reduces the generalization capability of EEG-based classifiers across individuals and tasks [13]. To address this issue, a few studies have attempted to align EEG signals in the time domain and lessen the impact of temporal perception differences using time-warping methods [1,14].

To address the limitations of EEG signals and develop a unified spatiotemporal EEG-based classification model, we propose a comprehensive deep framework featuring context-based spatiotemporal disambiguation for EEG signal classification. Our contributions include: (i) utilizing dimensionality reduction based on brain asymmetry to mitigate sample size and dimensionality constraints; (ii) implementing a feature selection method to choose spatially representative channels that capture the principal components for the current task across all subjects; (iii) applying a time-warping method to align signals in the time domain effectively; and (iv) conducting extensive experiments to demonstrate that our deep framework outperforms existing state-of-the-art methods.

2. Related Work

Electroencephalographic signal analysis fundamentally operates through two distinct computational phases: feature extraction and pattern recognition [15,16]. The initial phase involves deriving meaningful characteristics from neural recordings through sophisticated signal processing and analytical methodologies. The subsequent phase encompasses the examination of extracted feature patterns utilizing appropriate pattern recognition algorithms. Before the widespread adoption of deep learning methodologies, electroencephalographic signal analysis predominantly relied on time-frequency domain characteristics, including power spectral density measurements and differential entropy calculations, which were obtained through conventional signal processing approaches. Classical pattern recognition and machine learning frameworks encompassed artificial neural networks, Naive Bayes classification algorithms, and support vector machine implementations. The emergence of deep learning technologies has catalyzed a paradigm shift, with numerous research groups within neuroscience and brain science disciplines actively exploring the implementation of advanced deep learning methodologies for electroencephalographic data comprehension and analysis. Contemporary research efforts are increasingly focused on developing comprehensive end-to-end architectures that seamlessly integrate feature extraction processes with classification or clustering operations, thereby eliminating the traditional separation between these computational stages and enabling more sophisticated neural signal interpretation capabilities.

Recent investigations have demonstrated the substantial potential of deep learning architectures in electroencephalographic signal analysis across diverse neurological applications. Kulasingham et al. implemented dual deep learning methodologies, specifically deep belief networks and stacked autoencoder architectures, for detecting P300 event-related potentials within guilty knowledge test paradigms [17]. Recent research by Zhou et al. (2025) [18] investigated cognitive load recognition using EEG signals during complex simulated flight missions, addressing the gap between simplistic laboratory tasks and real-world operational environments. Their study employed the Multi-Attribute Task Battery (MATB) to induce three levels of cognitive load (low, medium, high) across multiple sessions, collecting both EEG data and behavioral metrics from 36 participants. The researchers compared traditional machine learning approaches (PSD features with SVM) against several convolutional neural network architectures, including shallow CNN, deep CNN, EEGNet, EEGNex, and EEGTCN. Their findings revealed that simpler CNN models, particularly the shallow CNN, achieved superior performance (up to 83% accuracy) in within-subject cognitive load classification compared to more complex architectures [18]. Wang et al. presented an innovative electroencephalographic motor imagery classification system based on long short-term memory network architectures, incorporating one-dimensional aggregate approximation techniques to achieve enhanced signal representation capabilities [19]. Furthermore, Gao et al. constructed a specialized electroencephalographic spatial-temporal convolutional neural network (ESTCNN) that strengthens temporal dependency modeling for individual electrode channels while simultaneously improving spatial feature extraction mechanisms, resulting in substantial performance improvements for driver fatigue detection applications [20].

Liu et al. addressed the critical issue of reliability assessment in automated EEG-based epileptic seizure detection by proposing an Evidential Multi-view Learning (EML) framework. Recognizing that traditional seizure detection methods focus primarily on accuracy while neglecting decision reliability, their approach integrates multiple feature representations (temporal, spectral, and temporal-spectral views) to mitigate noise effects inherent in EEG signals. By dynamically weighting views based on their confidence levels during the fusion process and incorporating a multi-view common graph to maintain cross-view consistency, the method achieved 99.05% accuracy on the CHB-MIT dataset, outperforming state-of-the-art approaches [21]. Li et al. demonstrated the clinical utility of transfer learning methodologies in constructing convolutional neural network models, establishing their effectiveness for objective, precise, and expeditious diagnosis of mild depressive disorders [22]. Zhang et al. proposed innovative cascade and parallel convolutional recurrent neural network architectures for precise detection of voluntary human movements through effective spatiotemporal feature extraction from unprocessed electroencephalographic signals [23]. Additionally, Tan et al. engineered a hybrid brain–computer interface rehabilitation support system utilizing combined convolutional neural network and recurrent neural network methodologies for electroencephalographic signal classification, integrating video-based electroencephalographic analysis with optical flow processing techniques [24].

Deep learning has significantly enhanced EEG-based emotion recognition in recent years, a crucial aspect of EEG analysis. Numerous deep learning techniques have been introduced to accelerate progress and expand the application of this field. Wang et al. introduced a hierarchical spatial learning transformer (HSLT) model for EEG-based emotion recognition that addresses the limitations of conventional approaches in capturing long-range spatial dependencies across electrodes and brain regions. Their method organizes EEG electrodes into nine anatomically based brain region clusters and employs a two-stage transformer architecture: electrode-level encoders that integrate information within individual brain regions, followed by brain-region-level encoding that captures inter-regional dependencies. Through subject-independent experiments on the DEAP and MAHNOB-HCI databases, they achieved accuracies of 65.75% and 66.51% for arousal and valence classification, respectively, on DEAP, with comparable results on MAHNOB-HCI [25]. Li et al. developed a bi-hemisphere domain adversarial neural network (BiDANN) model featuring a global and two local domain discriminators that adversarially interact with a classifier to extract distinct emotional features for each hemisphere, achieving leading performance on the SEED emotion recognition database [26]. Luo et al. introduced a novel Wasserstein generative adversarial network domain adaptation (WGANDA) framework to address domain shift issues in cross-subject EEG-based emotion recognition, demonstrating that this framework significantly surpasses existing domain adaptation methods on two public EEG datasets for emotion recognition [27].

In addition, deep learning methods are becoming increasingly vital in EEG-based multimedia content analysis [28,29,30,31]. Spampinato et al. introduced an RNN-based approach to learn descriptors for visual stimuli-evoked EEG data, mapping CNN image features to EEG features to predict image classes accurately [32]. Xue et al. developed a hybrid local–global neural network architecture for EEG-based visual classification that processes raw signals without requiring handcrafted frequency-domain features. Their framework introduces several innovative components: a reweight module that adaptively learns electrode importance across subjects rather than relying on fixed spatial coordinates, a local feature extraction module combining one-dimensional temporal convolutions with residual connections to capture both simple and complex signal patterns, and a global transformer block for modeling long-range temporal dependencies. For high sampling rate scenarios, they incorporated a feature fusion module to handle the increased dimensionality effectively. The model was extensively validated across five public datasets at multiple sampling rates (62.5 Hz, 125 Hz, and 250 Hz), achieving state-of-the-art performance including 55.93% accuracy on the EEG72 6-class task and 32.24% on the 72-class task, surpassing previous methods that relied on combined spectral-temporal features [33]. Research has demonstrated the potential for generating multimedia content information from EEG data. Kavasidis et al. proposed a method for creating images from visually evoked brain signals recorded via EEG [34], using variational autoencoders (VAE) and generative adversarial networks (GAN) to produce semantically coherent images. Additionally, Tirupattur et al. utilized adversarial learning to reveal that EEG signals encode cues from thoughts, generating semantically relevant visualizations [35].

The advancement of deep learning technology has contributed to the progress of EEG classification, particularly in tasks like EEG-based visual content analysis. However, compared to the progress seen in computer vision [36,37,38,39], EEG analysis still faces several limitations. These include EEG data’s limited availability, EEG datasets’ high dimensionality, and the spatiotemporal variations in EEG signals. Further research is needed to address these issues.

3. Methodology

Based on the comprehensive survey of previous studies, a deep framework driven by spatiotemporal disambiguation (SD) for classifying visual images based on EEG recordings (SD-BiLSTM) is proposed. Our approach enables accurate prediction of which image category from ImageNet a subject is viewing based on their brain signals. The structure of the proposed Spatiotemporal Disambiguation BiLSTM (SD-BiLSTM) framework, along with the training and testing phases, is depicted in Figure 1. Our approach comprises three stages:

Dimensionality Reduction Based on Brain Asymmetry: This step reduces data complexity and addresses sample size limitations.
Feature Selection of Spatially Efficient Channels: We select channels that effectively represent principal components for the task.
Time-Warping: This aligns signals in the time domain for consistency.

Figure 1. Architecture of the proposed Spatiotemporal Disambiguation BiLSTM (SD-BiLSTM) deep framework.

Multi-channel EEG signals are recorded while subjects view images, and the data are divided into training and test subsets. During the training phase, the classifier uses the training data to generate the model. In the testing phase, this model predicts the visual labels of the test subset.

3.1. Brain Asymmetry-Based Dimensionality Reduction

In information theory, machine learning, and statistics, dimensionality reduction involves reducing data dimensions to obtain a set of principal features. This helps with data compression, visualization, redundancy elimination, simplification, and model training and testing optimization. When dealing with multi-channel EEG signals, reducing the number of channels can not only remove noise [40] but also lessen the impact of the EEG data shortage, thus enhancing the generalization capability of EEG-based models [10].

In this study, we implement dimensionality reduction on EEG signals based on brain asymmetry. Brain asymmetry refers to the brain’s selective specialization of certain neural activities or cognitive processes in either the right or left hemisphere [41]. Research has shown that brain asymmetry can provide valuable features for EEG-based tasks. For example, Ahmed et al. used statistical features from EEG data of the left and right brains to explore the correlation between brain hemispheres and emotions [42]. Aris et al. proposed the Asymmetry Score feature to investigate brain activity in relaxed and non-relaxed states [43]. Additionally, Koolen et al. investigated the quantification of interhemispheric synchrony (IHS) in neonatal EEG using the activation synchrony index (ASI), addressing the limitations of traditional visual assessment methods that lack objective, quantifiable definitions [44].

Our approach performs dimensionality reduction by exploiting brain asymmetry. Specifically, we retain the signals from electrodes at the centerline position and, for each pair of symmetric electrodes, subtract the right-hemisphere signal from its left-hemisphere counterpart. For example, using the 128-channel EEG cap of the international 10–20 system, we keep eight centerline electrodes (Fpz, Fz, Cz, CPz, Pz, POz, Oz, and Iz) and sixty pairs of brain-symmetric electrodes, which yields 8 + 60 = 68 features, as illustrated in Figure 2.
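
The asymmetry-based reduction can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function name `asymmetry_reduce`, the dictionary layout of a sample, and the toy electrode pairs (F3/F4, C3/C4) are assumptions made only for illustration. The sketch keeps the centerline signals unchanged and replaces each symmetric pair with its left-minus-right difference.

```python
def asymmetry_reduce(sample, midline, pairs):
    """Reduce a multi-channel EEG sample using hemispheric differences.

    sample  : dict mapping channel name -> list of values over time
    midline : names of centerline electrodes that are kept unchanged
    pairs   : (left, right) electrode-name pairs (hypothetical pairing)
    Returns a dict holding the midline signals plus one
    left-minus-right difference signal per pair.
    """
    reduced = {name: list(sample[name]) for name in midline}
    for left, right in pairs:
        reduced[f"{left}-{right}"] = [
            l - r for l, r in zip(sample[left], sample[right])
        ]
    return reduced


# Toy sample with made-up values: 2 centerline electrodes, 2 symmetric pairs.
sample = {
    "Fz": [1.0, 2.0, 3.0],
    "Cz": [0.5, 0.5, 0.5],
    "F3": [2.0, 4.0, 6.0], "F4": [1.0, 1.0, 1.0],
    "C3": [0.0, 1.0, 0.0], "C4": [1.0, 0.0, 1.0],
}
reduced = asymmetry_reduce(sample, ["Fz", "Cz"], [("F3", "F4"), ("C3", "C4")])
print(len(reduced))      # 6 input channels become 4 features
print(reduced["F3-F4"])  # left minus right for the F3/F4 pair
```

With the full cap, the same function would take the eight centerline electrodes and the sixty symmetric pairs as arguments and return 68 signals per sample.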

3.2. Channel Selection for Spatial Representation

This investigation utilizes channel selection methodologies to determine the most informative EEG electrodes that efficiently encapsulate the fundamental characteristics of the experimental paradigm across all participants. As depicted in Figure 1, the training dataset encompasses EEG recordings from all subjects. The implementation employs the Fisher score feature selection approach to identify spatially informative channels from the training dataset and corresponding class labels. The Fisher score represents a feature selection methodology designed to determine the most discriminative attributes for classification problems [45]. This technique assesses each feature dimension through computation of the between-class and within-class variance ratio. The primary goal involves selecting attributes that maximize inter-class separation while minimizing intra-class dispersion.

Between-class Variance ( $Σ_{b}$ ): Quantifies the variance across different classes, representing the degree of class separation.
Within-class Variance ( $Σ_{w}$ ): Quantifies the variance within individual classes, indicating class cohesion.

The Fisher score for the jth dimensional attribute is expressed as

$$F(j) = \frac{Σ_{b}^{(j)}}{Σ_{w}^{(j)}} \tag{1}$$

An elevated Fisher score value indicates that the attribute possesses substantial discriminative capability, establishing it as an optimal candidate for class differentiation within the dataset. Through ranking attributes according to their Fisher score values, one can identify the most efficient attributes for enhancing classification performance. The between-class variance is computed as follows:

$$Σ_{b}^{(j)} = \sum_{l = 1}^{L} \frac{N_{l}}{N} {(μ_{l}^{(j)} - \bar{μ}^{(j)})}^{2} \tag{2}$$

where L represents the number of classes in the training dataset, N denotes the total sample count, $N_{l}$ indicates the sample count for class l, $μ_{l}^{(j)}$ represents the mean value for class l, and $\bar{μ}^{(j)}$ denotes the global mean. The within-class variance is determined as

$$Σ_{w}^{(j)} = \frac{1}{N} \sum_{l = 1}^{L} \sum_{z \in L_{l}} (z^{(j)} - μ_{l}^{(j)}) {(z^{(j)} - μ_{l}^{(j)})}^{T} \tag{3}$$

where $z^{(j)}$ represents the value of the jth dimension for sample z. A dimension exhibiting robust classification capability should demonstrate elevated between-class variance and reduced within-class variance. Consequently, a higher Fisher score indicates stronger classification correlation for that dimension. To select multiple dimensions, one can rank the Fisher score and select those with superior values.
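
As a worked illustration of Equations (1) to (3), the following Python sketch computes the Fisher score of each feature dimension and ranks the dimensions. It rests on several assumptions: each sample is already summarized as a flat list of D feature values (how channel signals are turned into such dimensions is not specified here), the names `fisher_scores` and `top_k` are hypothetical, and a zero within-class variance is mapped to an infinite score.

```python
def fisher_scores(X, y):
    """Per-dimension Fisher score: between-class over within-class variance.

    X : list of N samples, each a list of D feature values
    y : list of N class labels
    """
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(d):
        col = [row[j] for row in X]
        global_mean = sum(col) / n
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mu = sum(vals) / len(vals)
            between += len(vals) / n * (mu - global_mean) ** 2  # Eq. (2)
            within += sum((v - mu) ** 2 for v in vals) / n      # Eq. (3)
        # Assumption of this sketch: guard against a zero denominator.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring dimensions, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: dimension 0 separates the two classes, dimension 1 does not.
X = [[0.0, 1.0], [0.2, 3.0], [1.0, 1.0], [1.2, 3.0]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
print(scores)
print(top_k(scores, 1))
```

In the toy data the class means of dimension 1 coincide, so its between-class variance is zero and dimension 0 is ranked first.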

3.3. Time-Warping

In temporal signal processing, temporal alignment (time-warping) is a frequently used approach for synchronizing peaks in two spectra through adjustments of the time axis [46]. We apply temporal alignment to compensate for temporal perception variations between sessions and subjects by synchronizing EEG signals that exhibit temporal deviations. Specifically, we use the spatially informative channels identified through feature selection on the training dataset as the reference signals for temporal alignment. For each electrode within an EEG sample, we apply the temporal alignment algorithm to synchronize the electrode’s signal with the nearest spatially informative channel. Note that the spatially informative channels identified from the training dataset are used directly as the alignment reference for the testing dataset. Our methodology employs the Dynamic Temporal Warping (DTW) algorithm to synchronize the EEG signals; a concise overview of this algorithm follows.

Dynamic Temporal Warping (DTW) is a well-established algorithm for evaluating the similarity between two temporal sequences that may vary in speed or duration [47]. It does so by determining an optimal correspondence between the sequences, termed the warping trajectory.

We consider two time series, $P = \{p_{1}, p_{2}, \dots, p_{|P|}\}$ and $Q = \{q_{1}, q_{2}, \dots, q_{|Q|}\}$, with lengths $|P|$ and $|Q|$. DTW attempts to identify the warping trajectory $T$ that minimizes the accumulated distance between corresponding points in these sequences.

The warping trajectory $T$ comprises index pairs $t_{r} = (u, v)$, where $1 \leq r \leq R$, and R represents the trajectory length, satisfying

$$T = t_{1}, t_{2}, \dots, t_{R}, \qquad \max(|P|, |Q|) \leq R < |P| + |Q| \tag{4}$$

Each pair $t_{r} = (u, v)$ matches point $p_{u}$ in $P$ with point $q_{v}$ in $Q$. The trajectory begins at $t_{1} = (1, 1)$ and terminates at $t_{R} = (|P|, |Q|)$, ensuring complete traversal of both sequences. The indices u and v progress monotonically, preserving sequence ordering.

The accumulated distance along $T$, designated as $Cost(T)$, is calculated as

$$Cost(T) = \sum_{r = 1}^{R} Cost(t_{r u}, t_{r v}) \tag{5}$$

Here, $Cost(t_{r u}, t_{r v})$ denotes the distance between corresponding points $p_{u}$ and $q_{v}$, commonly computed using the Euclidean distance.

To establish the optimal warping trajectory, DTW minimizes $Cost(T)$. This is typically accomplished through dynamic programming, which iteratively computes the accumulated distance $D(u, v)$ for subsequences of $P$ and $Q$ up to indices u and v:

$$\begin{aligned} D(u, v) &= Cost(u, v) + \min[D(u - 1, v), D(u, v - 1), D(u - 1, v - 1)], \\ D(1, 1) &= 0 \end{aligned} \tag{6}$$

The value $D(|P|, |Q|)$ represents the minimal distance along the optimal trajectory, effectively quantifying the similarity between the time series.
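
The recursion in Equation (6) can be sketched in a few lines of Python. The sketch is illustrative only and makes three assumptions that are not taken from this study: the table is padded with an extra row and column so that the recursion starts from a zero entry at index (0, 0), which is a common convention and means the cost of the first pair is included; the point-wise cost is the absolute difference of scalar values; and the hypothetical helper `warp_to_reference` produces the aligned signal by averaging all values that the trajectory maps to the same reference index.

```python
def dtw(p, q):
    """Return (accumulated cost, warping trajectory) for sequences p and q.

    The trajectory is a list of 1-based index pairs (u, v).
    """
    n, m = len(p), len(q)
    inf = float("inf")
    # Padded table: row 0 and column 0 are boundaries, D[0][0] = 0.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for u in range(1, n + 1):
        for v in range(1, m + 1):
            cost = abs(p[u - 1] - q[v - 1])  # Euclidean distance in 1-D
            D[u][v] = cost + min(D[u - 1][v], D[u][v - 1], D[u - 1][v - 1])
    # Backtrack from (|P|, |Q|) to (1, 1), always taking the cheapest predecessor.
    path, u, v = [], n, m
    while (u, v) != (1, 1):
        path.append((u, v))
        _, u, v = min(
            (D[u - 1][v - 1], u - 1, v - 1),
            (D[u - 1][v], u - 1, v),
            (D[u][v - 1], u, v - 1),
        )
    path.append((1, 1))
    path.reverse()
    return D[n][m], path


def warp_to_reference(signal, reference):
    """Resample `signal` onto the time axis of `reference` along the trajectory."""
    _, path = dtw(signal, reference)
    buckets = [[] for _ in reference]
    for u, v in path:
        buckets[v - 1].append(signal[u - 1])
    # Assumption: average the signal values matched to each reference index.
    return [sum(b) / len(b) for b in buckets]


reference = [0.0, 1.0, 2.0, 1.0, 0.0]
signal = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same shape, delayed by one step
cost, path = dtw(signal, reference)
warped = warp_to_reference(signal, reference)
print(cost, path)
print(warped)
```

In this toy case the delayed signal matches the reference exactly after warping, and the trajectory length satisfies the bound in Equation (4).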

4. Experimental Results

The performance assessment of our proposed spatiotemporal disambiguation methodology is conducted using a publicly accessible electroencephalogram dataset specifically designed for visual stimulus classification tasks [32,48]. This comprehensive dataset encompasses 12,000 neural signal recordings collected from six participants during visual perception experiments. The experimental protocol involved presenting participants with carefully curated image stimuli selected from ImageNet, organized into 40 distinct categorical classes, with each class containing 50 representative images.

Neural activity acquisition was accomplished using a high-density electrode array configuration consisting of 128 recording channels equipped with active, low-impedance sensing elements (actiCAP 128Ch system). The data acquisition system operated at a sampling frequency of 1000 Hz to ensure adequate temporal resolution for capturing rapid neural dynamics. Following standard preprocessing procedures and artifact removal protocols to eliminate contaminated signal segments, the resulting dataset contains individual trial recordings with a duration of 440 milliseconds each.

The complete dataset was partitioned into three subsets to facilitate robust model evaluation: the training partition comprises 80% of the available data (9191 samples), while the validation and testing partitions each contain 10% (1127 samples each). To ensure experimental integrity and prevent data leakage, a subject-wise and stimulus-wise splitting strategy was implemented, whereby all neural responses recorded from individual participants for specific visual stimuli were exclusively assigned to a single data partition.
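
One way to realize a leakage-free split of this kind is to assign whole stimuli, rather than individual recordings, to partitions. The Python sketch below is a hedged illustration, not the procedure used in this study: it assumes that grouping by image identifier is what keeps all responses to one stimulus in a single partition, and the record layout `(subject_id, image_id)`, the function name `split_by_stimulus`, and the seed are hypothetical.

```python
import random


def split_by_stimulus(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split records so that every recording of one image shares a partition.

    records : list of (subject_id, image_id) tuples (hypothetical layout)
    Returns three lists: train, validation, test.
    """
    image_ids = sorted({image for _, image in records})
    random.Random(seed).shuffle(image_ids)
    n_train = int(fractions[0] * len(image_ids))
    n_val = int(fractions[1] * len(image_ids))
    train_ids = set(image_ids[:n_train])
    val_ids = set(image_ids[n_train:n_train + n_val])
    train, val, test = [], [], []
    for rec in records:
        if rec[1] in train_ids:
            train.append(rec)
        elif rec[1] in val_ids:
            val.append(rec)
        else:
            test.append(rec)
    return train, val, test


# Nominal layout: 6 subjects x 2000 images (40 classes x 50 images).
records = [(s, i) for s in range(6) for i in range(2000)]
train, val, test = split_by_stimulus(records)
print(len(train), len(val), len(test))
```

The toy counts correspond to the nominal 12,000 recordings; the partition sizes reported above are smaller because contaminated trials were removed during preprocessing.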

To evaluate our model’s performance, we used the study by Spampinato et al. [32] as a reference, which utilized the same dataset and LSTM encoder. Our research investigates the effectiveness of various model configurations by comparing classification performance. Our classifier is a BiLSTM model, and we compared its results with the LSTM model reported by Spampinato et al. [32]. We also examined the impact of different data processing methods, such as dimensionality reduction based on brain asymmetry, selection of spatially representative channels, and time-warping with spatiotemporal disambiguation. Additionally, we compared our time-warping algorithm, DTW, with the Fast Parametric Time Warping (Fast-PTW) algorithm proposed by Wehrens et al. [49] for aligning chromatograms.

In the following sections, we validate the effectiveness of spatiotemporal disambiguation in two steps. First, we quantify the contributions of three key processing stages: dimensionality reduction, feature selection, and time-warping. Then, we quantify the contribution of the proposed spatiotemporal disambiguation framework to EEG-based visual classification. Our classifier is a BiLSTM network with five layers: the input layer, BiLSTM layer, dense fully connected layer, softmax layer, and output layer. We follow the specific parameter settings of Spampinato et al. [32]. All statistical experiments are repeated five times, and the average results are reported.

4.1. Evaluations of Three Key Stages

In this phase of experiments, we first compared the precision rates of EEG-based visual classification using LSTM and BiLSTM models before and after dimensionality reduction based on brain asymmetry. The experimental results are reported in Table 1. The precision rate of the original 128-channel dataset on LSTM was 83.41%. We found that dimension reduction based on brain asymmetry not only reduced the data dimension and removed redundancies but also yielded effective asymmetry features and improved the performance of visual classification.

After dimensionality reduction, the input data dimension was reduced to 68 as seen in Figure 2, and the precision rates of LSTM and BiLSTM improved to 88.82% and 91.75% respectively, which are better than the results of the original 128-channel data on LSTM and BiLSTM (83.41% and 87.93%). Additionally, we observed that the classification precision rates on BiLSTM were better than those on LSTM, indicating that EEG signals contain effective bidirectional features that can help improve classification accuracy.

In order to investigate how effectively selecting specific spatially representative channels impacts performance, we conducted experiments on LSTM and BiLSTM models. To achieve this, we employed the Fisher score feature selection method to rank the 128 channels according to their classification ability. The distribution of the Fisher score for each channel is illustrated in Figure 3. This figure reveals variations in the visual classification capabilities of each channel, as well as functional differences between the left and right hemispheres. For instance, channels FC4 and C6 on the right hemisphere have higher Fisher scores than those on the left hemisphere.

In this phase of the experiments, we selected the top 10, 32, and 96 high-efficiency spatially representative channels based on the Fisher score sorting as the input for the classifiers. The precision rates for visual classification are provided in Table 2. We achieved the highest accuracy of 89.53% using LSTM with 96 channels and 90.06% using BiLSTM with 32 channels, which are higher than the results obtained with the original 128-channel data on LSTM and BiLSTM (83.41% and 87.93%, respectively).

In this phase of the experiments, we investigate the impact of the time-warping method on the original 128-channel data using LSTM and BiLSTM. After sorting the channels by Fisher score, we selected the top 32 and the top 64 spatially representative channels as two reference configurations. Subsequently, for each channel in an EEG sample, we applied the DTW algorithm to align its signal with the nearest spatially representative channel. The resulting warped 128-channel EEG data were then used as input for the models. Table 3 shows the precision rates of visual classification for the warped 128-channel data using LSTM and BiLSTM. The results demonstrate improved precision rates for visual classification after time warping. Specifically, the best accuracy of 92.10% was achieved on BiLSTM with time warping using 64 spatially efficient channels, in comparison to the accuracy of 83.41% for LSTM and 87.93% for BiLSTM using the original 128-channel data.

4.2. Experimental Results of Spatiotemporal Disambiguation

The preceding experiments evaluated three key methods individually: dimension reduction based on brain asymmetry, selection of spatially representative channels, and time warping. Our findings demonstrate that, in most cases, BiLSTM-based models yield better results than LSTM-based models. In this phase of the experiments, we combine these three stages into a novel spatiotemporal disambiguation deep framework for EEG-based brain activity classification.

Initially, we extracted 68-dimensional asymmetry features from the original 128-channel signals using dimension reduction based on brain asymmetry. We then used the Fisher score to select 32 spatially representative channels from these 68 asymmetry features. Subsequently, we applied the DTW algorithm to align the signals of each channel with the nearest spatially representative channel. The aligned 68-channel features serve as input for the BiLSTM model. Additionally, in this phase of experiments, we compared the effectiveness of the Fast-PTW algorithm with the DTW algorithm. The experimental results are shown in Table 4.
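One part of this pipeline that benefits from a concrete sketch is the assignment of each channel to its nearest spatially representative channel before warping. This section does not state how "nearest" is measured, so the example below assumes Euclidean distance between electrode coordinates; the function name `nearest_reference` and the coordinate array are hypothetical.

```python
import numpy as np

def nearest_reference(positions, selected):
    """Map every channel to its closest selected (reference) channel.

    positions: (n_channels, d) electrode coordinates (assumed available).
    selected:  indices of the spatially representative channels.
    Returns an array of length n_channels holding one reference index each.
    """
    positions = np.asarray(positions, dtype=float)
    selected = np.asarray(selected)
    # Pairwise distances between all channels and the selected channels.
    diff = positions[:, None, :] - positions[selected][None, :, :]
    dist = np.linalg.norm(diff, axis=2)  # shape: (n_channels, n_selected)
    return selected[np.argmin(dist, axis=1)]
```

A selected channel is at distance zero from itself, so it maps to itself and is left unchanged by the subsequent alignment. Each remaining channel would then be warped toward the signal of its assigned reference.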

Table 4 summarizes the classification precision of the proposed deep framework using two time-warping algorithms. The findings indicate that spatiotemporal disambiguation substantially enhances the performance of EEG-based visual classification. The precision rates achieved by the spatiotemporal disambiguation method on the BiLSTM model surpass the best results obtained when the three key steps are used independently (90.06% for channel selection, 91.75% for asymmetry-based dimension reduction, and 92.10% for time warping). By applying spatiotemporal disambiguation with the DTW algorithm, we achieved the highest precision rate of 94.23%, which exceeds the result obtained with Fast-PTW (93.35%).

Furthermore, we found that the proposed spatiotemporal disambiguation framework accelerates model convergence during training. The LSTM model converges faster after spatiotemporal disambiguation with the DTW algorithm than without it (Figure 4). For the BiLSTM model, convergence is faster with spatiotemporal disambiguation than without it, whether the DTW algorithm or the Fast-PTW algorithm is used. These results indicate that the proposed method reduces inconsistency in the data, making the EEG signals more consistent in both the temporal and spatial dimensions. This not only improves classification performance but also accelerates learning and reduces the amount of training data required for EEG-related tasks.

In addition to the classification results, we also provide an analysis based on feature selection. As previously described, we obtained 68-dimensional features from the original 128-channel signals by reducing dimensionality based on the human brain’s lateralization effect. We then used the Fisher score to identify 32 spatially representative channels. Are these selected channels consistent across subjects? To examine this, Figure 5 shows the physical locations of the top 32 channels for each subject. Figure 5a–f correspond to the individual subjects, while Figure 5g shows the locations selected by the Fisher score on the combined EEG data from all subjects. The selected channels are consistent across subjects. This indicates that different subjects share common spatial characteristics and that our method effectively captures this similarity.
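The visual comparison in Figure 5 could be complemented by a simple numerical overlap measure. The sketch below uses the Jaccard index between per-subject channel sets; this metric and the function names are suggestions for illustration and are not part of the reported analysis.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two channel selections (1.0 means identical sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_jaccard(selections):
    """Average Jaccard index over all pairs of subjects.

    selections: list with one collection of channel indices per subject
                (at least two subjects are required).
    """
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Applied to the six per-subject top-32 sets, a value close to 1 would support the consistency observed in the figure, whereas a value close to 0 would indicate largely subject-specific selections.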

4.3. Comparative Analysis with State-of-the-Art Methods

To comprehensively evaluate the efficacy of our proposed spatiotemporal disambiguation framework, extensive comparative experiments against established benchmark methods in EEG-based visual classification were conducted. This comparative analysis serves to validate the practical advantages of our approach by directly measuring classification performance against existing state-of-the-art methodologies.

Our experimental protocol utilized the same publicly available ImageNet-EEG dataset [32,48] employed throughout this study, ensuring consistent evaluation conditions across all methods. The comparative analysis encompasses seven representative approaches from the literature: the foundational RNN-based model proposed by Spampinato et al. [32]; two Transformer-based methods, namely a Vision Transformer (ViT) adapted for EEG [50] and a Transformer with positional encoding [51]; an advanced Siamese network architecture [48]; a multimodal integration network [30]; the CogniNet framework [52]; and the traditional RS-LDA methodology [53].

Table 5 presents the comparative classification accuracies achieved by our spatiotemporal disambiguation framework against these established benchmarks. Our proposed SD-BiLSTM framework, incorporating the complete spatiotemporal disambiguation pipeline, achieves a classification accuracy of 94.23%, demonstrating substantial improvements over existing approaches.

The experimental results reveal several noteworthy observations. Contemporary deep learning approaches such as Siamese networks (93.70%) and multimodal networks (94.10%) achieve competitive performance, yet our spatiotemporal disambiguation framework still attains the highest accuracy (94.23%), with margins of 0.53 and 0.13 percentage points, respectively. This improvement can be attributed to the systematic addressing of fundamental EEG signal challenges through our three-stage approach: brain asymmetry-based dimensionality reduction effectively captures lateralization features while managing data complexity, Fisher score-based channel selection identifies spatially informative electrodes across subjects, and DTW-based temporal alignment resolves inter-subject and inter-session timing variations.

The substantial performance gap between our method and the baseline RNN approach (82.90%) underscores the importance of addressing spatiotemporal variability in EEG signals. Traditional methods that process raw EEG data without explicit disambiguation suffer from the inherent signal inconsistencies across subjects and sessions. Similarly, the limited performance of classical machine learning approaches like RS-LDA (13.00%) highlights the necessity of deep learning architectures for capturing complex neural patterns in high-dimensional EEG data.

While Vision Transformer adapted for EEG signals achieves competitive performance (91.85%), it falls short of our method. The standard Transformer with positional encoding performs slightly better (92.18%), likely due to better temporal modeling. However, both Transformer variants struggle with the limited training data available in EEG datasets, as they typically require larger datasets to reach optimal performance.
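To put the comparisons above on a common footing, the gaps can be computed directly from the accuracies reported in Table 5. The short sketch below does only this arithmetic; the helper name `margins` is hypothetical.

```python
# Accuracies (%) as reported in Table 5.
proposed = 94.23
baselines = {
    "RNN-based Framework": 82.90,
    "Vision Transformer (ViT) based EEG": 91.85,
    "Transformer with Positional Encoding": 92.18,
    "Siamese network": 93.70,
    "Multimodal-based Framework": 94.10,
    "CogniNet": 89.60,
    "RS-LDA": 13.00,
}

def margins(proposed_acc, baseline_accs):
    """Percentage-point gap between the proposed model and each baseline."""
    return {name: round(proposed_acc - acc, 2) for name, acc in baseline_accs.items()}

gaps = margins(proposed, baselines)
# List the baselines from the narrowest to the widest gap.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{name}: +{gap:.2f} points")
```

Sorting by gap makes the spread visible, from narrow margins over the strongest deep learning baselines to the very wide gap over RS-LDA.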

These comparative results confirm that our spatiotemporal disambiguation framework not only achieves state-of-the-art classification accuracy but also provides a principled approach to handling the fundamental challenges in EEG signal analysis. The consistent improvements over all compared methods suggest that our methodology enhances the generalization capability of EEG-based classification systems, making them more robust for practical applications in brain–computer interfaces and neurological assessment tasks.

5. Conclusions

Electroencephalographic signal analysis for classification applications possesses a strong neurophysiological foundation. Nevertheless, this research domain encounters substantial obstacles, including insufficient dataset availability, excessive feature dimensionality, and inherent spatiotemporal signal variability. To overcome these limitations, we developed a comprehensive neural signal classification framework that prioritizes spatiotemporal signal disambiguation techniques. Our approach systematically addresses data complexity through hemisphere-based asymmetry analysis for dimensional reduction, effectively mitigating complications arising from high-dimensional feature spaces and constrained sample sizes. Subsequently, we implemented an advanced channel selection methodology to isolate the most discriminative electrodes that effectively capture spatial signal characteristics across participants. Finally, we deployed a sophisticated temporal alignment algorithm that synchronizes signal sequences derived from these optimally selected channels, thereby resolving temporal inconsistencies observed across different recording sessions and individual subjects. Comprehensive experimental validation conducted on an established electroencephalographic visual classification benchmark dataset confirmed that our spatiotemporal disambiguation approach significantly improves classification accuracy. The systematic evaluation reveals that each methodological component contributes meaningfully to overall performance enhancement while simultaneously facilitating accelerated model convergence during the training phase.

Future research directions encompass extending this framework to diverse real-world applications in neural signal interpretation and brain activity analysis tasks. Furthermore, promising opportunities exist for advancing the spatiotemporal disambiguation methodology through multimodal integration approaches, specifically combining electroencephalographic recordings, which provide superior temporal resolution, with functional magnetic resonance imaging modalities that offer enhanced spatial precision. Such integration could potentially yield more comprehensive and robust neural signal analysis capabilities.

Funding

The author thanks the Academy of Scientific Research and Technology (ASRT, Egypt) for funding [Grant 25761]. This work was supported by the National Natural Science Foundation of China (NSFC) [Grant W2412099].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable. This study utilized only publicly available data from the PeRCeiVe Laboratory, for which ethics approval was obtained by the original data collectors.

Data Availability Statement

The dataset used and analysed during the current study is a publicly available EEG dataset for brain imaging classification, provided by the Pattern Recognition and Computer Vision (PeRCeiVe) Laboratory and hosted at http://www.perceivelab.com/datasets (accessed on 9 August 2025).

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Yamauchi, T.; Xiao, K.; Bowman, C.; Mueen, A. Dynamic time warping: A single dry electrode EEG study in a self-paced learning task. In Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), Xi’an, China, 21–24 September 2015; pp. 56–62.
2. Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. DEAP: A Database for Emotion Analysis; Using Physiological Signals. IEEE Trans. Affect. Comput. 2012, 3, 18–31.
3. Lan, Y.T.; Jiang, W.B.; Zheng, W.L.; Lu, B.L. CEMOAE: A Dynamic Autoencoder with Masked Channel Modeling for Robust EEG-Based Emotion Recognition. In Proceedings of the ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 1871–1875.
4. Yuan, Y.; Xun, G.; Jia, K.; Zhang, A. A novel wavelet-based model for EEG epileptic seizure detection using multi-context learning. In Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 13–16 November 2017; pp. 694–699.
5. Hassan, K.M.; Zhao, X.; Sugano, H.; Tanaka, T. Detection of Epileptic Seizures in Long Eeg Recordings Using an Anomaly Detector with Artifact Rejection. In Proceedings of the ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 2230–2234.
6. Yuan, H.; He, B. Brain–Computer Interfaces Using Sensorimotor Rhythms: Current State and Future Perspectives. IEEE Trans. Biomed. Eng. 2014, 61, 1425–1435.
7. Hu, L.; Zhu, J.; Chen, S.; Zhou, Y.; Song, Z.; Li, Y. A Wearable Asynchronous Brain-Computer Interface Based on EEG-EOG Signals With Fewer Channels. IEEE Trans. Biomed. Eng. 2024, 71, 504–513.
8. Hanouneh, S.; Amin, H.U.; Saad, N.M.; Malik, A.S. The correlation between EEG asymmetry and memory performance during semantic memory recall. In Proceedings of the 2016 6th International Conference on Intelligent and Advanced Systems (ICIAS), Kuala Lumpur, Malaysia, 15–17 August 2016; pp. 1–4.
9. Fahimi, F.; Dosen, S.; Ang, K.K.; Mrachacz-Kersting, N.; Guan, C. Generative Adversarial Networks-Based Data Augmentation for Brain–Computer Interface. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4039–4051.
10. Zhang, Z.; Guo, Y.; Tang, F. Dimension selection for EEG classification in the SPD Riemannian space based on PSO. Knowl.-Based Syst. 2023, 279, 110933.
11. Bashivan, P.; Rish, I.; Yeasin, M.; Codella, N. Learning representations from EEG with deep recurrent-convolutional neural networks. In Proceedings of the International Conference on Learning Representations ICLR, San Juan, Puerto Rico, 2–4 May 2016.
12. Macaš, M.; Vavrecka, M.; Gerla, V.; Lhotská, L. Classification of the emotional states based on the EEG signal processing. In Proceedings of the 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 4–7 November 2009; pp. 1–4.
13. Das Chakladar, D.; Roy, P.P. Cognitive workload estimation using physiological measures: A review. Cogn. Neurodynamics 2024, 18, 1445–1465.
14. Aarabi, A.; Kazemi, K.; Grebe, R.; Moghaddam, H.A.; Wallois, F. Detection of EEG transients in neonates and older children using a system based on dynamic time-warping template matching and spatial dipole clustering. NeuroImage 2009, 48, 50–62.
15. Alghamdi, A.M.; Ashraf, M.U.; Bahaddad, A.A.; Almarhabi, K.A.; Al Shehri, W.A.; Daraz, A. Cross-subject EEG signals-based emotion recognition using contrastive learning. Sci. Rep. 2025, 15, 28295.
16. Cheng, C.; Liu, W.; Feng, L.; Jia, Z. Emotion recognition using hierarchical spatial–temporal learning transformer from regional to global brain. Neural Netw. 2024, 179, 106624.
17. Kulasingham, J.P.; Vibujithan, V.; De Silva, A.C. Deep belief networks and stacked autoencoders for the P300 Guilty Knowledge Test. In Proceedings of the 2016 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES), Kuala Lumpur, Malaysia, 4–8 December 2016; pp. 127–132.
18. Zhou, Y.; Xu, X.; Zhang, D. Cognitive load recognition in simulated flight missions: An EEG study. Front. Hum. Neurosci. 2025, 19, 1542774.
19. Wang, P.; Jiang, A.; Liu, X.; Shang, J.; Zhang, L. LSTM-Based EEG Classification in Motor Imagery Tasks. IEEE Trans. Neural Syst. Rehabil. Eng. 2018, 26, 2086–2095.
20. Gao, Z.; Wang, X.; Yang, Y.; Mu, C.; Cai, Q.; Dang, W.; Zuo, S. EEG-Based Spatio–Temporal Convolutional Neural Network for Driver Fatigue Evaluation. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2755–2763.
21. Liu, Y.; Xu, C.; Wen, Z.; Dong, Y. Trust EEG epileptic seizure detection via evidential multi-view learning. Inf. Sci. 2025, 694, 121699.
22. Zhu, J.; Jiang, C.; Chen, J.; Lin, X.; Yu, R.; Li, X.; Hu, B. EEG based depression recognition using improved graph convolutional neural network. Comput. Biol. Med. 2022, 148, 105815.
23. Zhang, D.; Yao, L.; Zhang, X.; Wang, S.; Chen, W.; Boots, R. Cascade and parallel convolutional recurrent neural networks on EEG-based intention recognition for brain computer interface. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, LA, USA, 2–7 February 2018.
24. Tan, C.; Sun, F.; Zhang, W.; Chen, J.; Liu, C. Multimodal Classification with Deep Convolutional-Recurrent Neural Networks for Electroencephalography. In Proceedings of the Neural Information Processing; Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S.M., Eds.; Springer: Cham, Switzerland, 2017; pp. 767–776.
25. Wang, Z.; Wang, Y.; Hu, C.; Yin, Z.; Song, Y. Transformers for EEG-Based Emotion Recognition: A Hierarchical Spatial Information Learning Model. IEEE Sens. J. 2022, 22, 4359–4368.
26. Li, Y.; Zheng, W.; Zong, Y.; Cui, Z.; Zhang, T.; Zhou, X. A Bi-Hemisphere Domain Adversarial Neural Network Model for EEG Emotion Recognition. IEEE Trans. Affect. Comput. 2021, 12, 494–504.
27. Luo, Y.; Zhang, S.Y.; Zheng, W.L.; Lu, B.L. WGAN Domain Adaptation for EEG-Based Emotion Recognition. In Proceedings of the Neural Information Processing; Cheng, L., Leung, A.C.S., Ozawa, S., Eds.; Springer: Cham, Switzerland, 2018; pp. 275–286.
28. Fares, A.; Zhong, S.h.; Jiang, J. Brain-media: A Dual Conditioned and Lateralization Supported GAN (DCLS-GAN) towards Visualization of Image-evoked Brain Activities. In Proceedings of the MM ’20: Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1764–1772.
29. Jiang, J.; Fares, A.; Zhong, S.H. A Brain-Media Deep Framework Towards Seeing Imaginations Inside Brains. IEEE Trans. Multimed. 2021, 23, 1454–1465.
30. Jiang, J.; Fares, A.; Zhong, S.H. A Context-Supported Deep Learning Framework for Multimodal Brain Imaging Classification. IEEE Trans. Hum.-Mach. Syst. 2019, 49, 611–622.
31. Fares, A.; Zhong, S.h.; Jiang, J. EEG-based image classification via a region-level stacked bi-directional deep learning framework. BMC Med. Inform. Decis. Mak. 2019, 19, 1–11.
32. Spampinato, C.; Palazzo, S.; Kavasidis, I.; Giordano, D.; Souly, N.; Shah, M. Deep Learning Human Mind for Automated Visual Classification. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4503–4511.
33. Xue, S.; Jin, B.; Jiang, J.; Guo, L.; Liu, J. A hybrid local-global neural network for visual classification using raw EEG signals. Sci. Rep. 2024, 14, 27170.
34. Kavasidis, I.; Palazzo, S.; Spampinato, C.; Giordano, D.; Shah, M. Brain2Image: Converting Brain Signals into Images. In Proceedings of the MM ’17: 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1809–1817.
35. Tirupattur, P.; Rawat, Y.S.; Spampinato, C.; Shah, M. ThoughtViz: Visualizing Human Thoughts Using Generative Adversarial Network. In Proceedings of the MM ’18: 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 950–958.
36. Serikawa, S.; Lu, H. Underwater image dehazing using joint trilateral filter. Comput. Electr. Eng. 2014, 40, 41–50.
37. Lu, H.; Li, Y.; Mu, S.; Wang, D.; Kim, H.; Serikawa, S. Motor Anomaly Detection for Unmanned Aerial Vehicles Using Reinforcement Learning. IEEE Internet Things J. 2018, 5, 2315–2322.
38. Lu, H.; Li, Y.; Chen, M.; Kim, H.; Serikawa, S. Brain Intelligence: Go beyond Artificial Intelligence. Mob. Netw. Appl. 2018, 23, 368–375.
39. Lu, H.; Wang, D.; Li, Y.; Li, J.; Li, X.; Kim, H.; Serikawa, S.; Humar, I. CONet: A Cognitive Ocean Network. IEEE Wirel. Commun. 2019, 26, 90–96.
40. Li, M.; Yu, P.; Shen, Y. A spatial and temporal transformer-based EEG emotion recognition in VR environment. Front. Hum. Neurosci. 2025, 19, 1517273.
41. Lee, A.C.; Robbins, T.W.; Pickard, J.D.; Owen, A.M. Asymmetric frontal activation during episodic memory: The effects of stimulus type on encoding and retrieval. Neuropsychologia 2000, 38, 677–692.
42. Ahmed, M.A.; Loo, C.K. Emotion recognition based on correlation between left and right frontal EEG assymetry. In Proceedings of the 2014 10th France-Japan/8th Europe-Asia Congress on Mecatronics (MECATRONICS2014-Tokyo), Tokyo, Japan, 27–29 November 2014; pp. 99–103.
43. Mohd Aris, S.A.; Taib, M.N.; Sulaiman, N. Classification of frontal alpha asymmetry using k-Nearest Neighbor. In Proceedings of the 2012 International Conference on Biomedical Engineering (ICoBE), Penang, Malaysia, 27–28 February 2012; pp. 74–78.
44. Koolen, N.; Dereymaeker, A.; Räsänen, O.; Jansen, K.; Vervisch, J.; Matic, V.; De Vos, M.; Van Huffel, S.; Naulaers, G.; Vanhatalo, S. Interhemispheric synchrony in the neonatal EEG revisited: Activation synchrony index as a promising classifier. Front. Hum. Neurosci. 2014, 8, 1030.
45. Gu, Q.; Li, Z.; Han, J. Generalized Fisher score for feature selection. In Proceedings of the UAI’11: Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, Barcelona, Spain, 14–17 July 2011; pp. 266–273.
46. Eilers, P.H.C. Parametric Time Warping. Anal. Chem. 2004, 76, 404–411.
47. Fang, C. From Dynamic Time Warping (DTW) to Hidden Markov Model (HMM) Final project report for ECE 742 Stochastic Decision; University of Cincinnati: Cincinnati, OH, USA, 2009.
48. Palazzo, S.; Spampinato, C.; Kavasidis, I.; Giordano, D.; Schmidt, J.; Shah, M. Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 3833–3849.
49. Wehrens, R.; Bloemberg, T.G.; Eilers, P.H. Fast parametric time warping of peak lists. Bioinformatics 2015, 31, 3063–3065.
50. Arjun, A.; Rajpoot, A.S.; Raveendranatha Panicker, M. Introducing Attention Mechanism for EEG Signals: Emotion Recognition with Vision Transformers. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Mexico, 1–5 November 2021; pp. 5723–5726.
51. Tao, Y.; Sun, T.; Muhamed, A.; Genc, S.; Jackson, D.; Arsanjani, A.; Yaddanapudi, S.; Li, L.; Kumar, P. Gated Transformer for Decoding Human Brain EEG Signals. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Guadalajara, Mexico, 1–5 November 2021; pp. 125–130.
52. Mukherjee, P.; Das, A.; Bhunia, A.K.; Roy, P.P. Cogni-Net: Cognitive Feature Learning Through Deep Visual Perception. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 4539–4543.
53. Kaneshiro, B.; Perreau Guimaraes, M.; Kim, H.S.; Norcia, A.M.; Suppes, P. A Representational Similarity Analysis of the Dynamics of Object Processing Using Single-Trial EEG Classification. PLoS ONE 2015, 10, e0135697.

Figure 2. The 68-dimensional asymmetry features and their corresponding labels, from standard EEG channel placements for 128-channel EEG signals, achieved through dimension reduction based on brain asymmetry.

Figure 3. Illustration of 128-channel distribution of the Fisher score.

Figure 4. Comparison of model convergence speed during training before and after spatiotemporal disambiguation.

Figure 5. Physical electrode locations of the top 32 channels for each subject (a–f) and for all subjects combined (g).

Table 1. Average precision rates of dimensionality reduction based on brain asymmetry on LSTM and BiLSTM.

Dimension	LSTM	BiLSTM
128	83.41%	87.93%
68	88.82%	91.75%

Table 2. Average precision rates of spatially representative channels on LSTM and BiLSTM.

Selected Channel Subset	LSTM	BiLSTM
96 channels	89.53%	81.29%
32 channels	39.10%	90.06%
10 channels	21.23%	47.74%

Table 3. Average precision rates of time-warping on LSTM and BiLSTM.

Time Warping Algorithm	Selected Channel Subset	LSTM	BiLSTM
DTW	32 channels	91.84%	92.01%
DTW	64 channels	84.03%	92.10%

Table 4. Average precision rates of Spatiotemporal Disambiguation BiLSTM (SD-BiLSTM).

Time Warping Algorithm	SD-BiLSTM
DTW	94.23%
Fast-PTW	93.35%

Table 5. Comparative classification performance between the proposed spatiotemporal disambiguation framework and existing benchmark methods.

Model	Accuracy
Proposed SD-BiLSTM Framework	94.23%
RNN-based Framework [32]	82.90%
Vision Transformer (ViT) based EEG [50]	91.85%
Transformer with Positional Encoding [51]	92.18%
Siamese network [48]	93.70%
Multimodal-based Framework [30]	94.10%
CogniNet [52]	89.60%
RS-LDA [53]	13.00%

© 2025 by the author. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
