1. Introduction
Biometric systems have become essential tools for secure identification [1]. However, traditional methods such as fingerprint [2], facial recognition [3], and sound recognition systems suffer from various vulnerabilities [4]. In contrast, EEG signals provide a promising alternative due to their inherent complexity, uniqueness, and resistance to external manipulation. As EEG signals reflect internal neural activities, they are extremely difficult to forge or replicate, making them a reliable biometric modality [5]. The intrinsic variability and noise characteristics of EEG signals also make them robust against adversarial attacks while offering high discrimination potential for individual identification tasks [6].
Machine learning (ML) techniques have been increasingly applied to EEG signal analysis to extract meaningful patterns and improve classification performance [7]. Traditional EEG feature extraction methods, such as the Fast Fourier Transform (FFT) [8], Power Spectral Density (PSD) [9], and the Wavelet Transform [10], are commonly used to capture frequency-domain and time–frequency-domain features. However, these techniques rely on hand-crafted parameters and assumptions, limiting their ability to generalize across subjects and tasks. Beyond these traditional feature extraction methods, researchers have studied pattern recognition approaches to identify distinctive characteristics in various forms of EEG signals [11,12].
Convolutional neural networks (CNNs) have emerged as a powerful tool for automated feature extraction from EEG signals, offering significant advantages over traditional manual feature engineering methods [13]. Thanks to their hierarchical learning capabilities, CNNs can capture complex and discriminative patterns directly from EEG data [14], making them particularly valuable for tasks such as brain–computer interfacing [15], neurological disorder diagnosis [16], and biometric identification [17]. Ozdenizci et al. [18] demonstrated the effectiveness of CNNs in analyzing multi-channel EEG data, where the network successfully identified localized features in raw signals, time–frequency representations, and topographic scalp projections. On the other hand, recurrent neural networks (RNNs), especially long short-term memory (LSTM) networks and gated recurrent units (GRUs), have proven highly effective in modeling the temporal dynamics inherent in EEG signals, which often exhibit non-stationary and time-varying properties [19]. Additionally, unsupervised approaches such as autoencoders (AEs) have been explored for EEG feature extraction [20]. As shown in [21], AEs can learn compact, informative representations of EEG signals, enabling efficient personal identification.
Despite this progress, most EEG-based studies still rely on 1D signal vectors or 2D time–frequency projections, which fail to fully preserve the spatial, temporal, and spectral relationships in the data. As stated by Shah et al., a 3D representation of EEG data encompasses rich spectral, spatial, and temporal information [22]. The novelty of our study, therefore, lies in the design and use of a 3D topographic cube representation that stacks time-resolved EEG topographic maps to capture rich spatio-temporal dependencies. This 3D cube bridges raw EEG data and deep learning models in a more biologically and spatially interpretable format. Furthermore, this study leverages a self-referencing convolutional autoencoder (CAE) to automatically extract latent representations from these volumetric EEG cubes.
To the best of our knowledge, this is the first work that combines 3D EEG topographic cubes with a self-supervised CAE framework for personal identification. This combination enables a more discriminative and generalizable feature space without reliance on manual feature engineering. Soysal et al. [23] demonstrated that features extracted using autoencoder-based architectures outperform conventional statistical features in EEG-based classification tasks. This evidence highlights the strength of unsupervised representation learning in capturing subject-specific neural signatures. Building on that foundation, the present work introduces a CAE applied to 3D EEG topographic cubes, enabling the model to effectively learn spatio-temporal dependencies under varying stimulus conditions. The significance of this work lies in its proposal of a scalable and transferable framework that enhances EEG-based personal identification through automated feature extraction from rich volumetric representations.
In this study, to ensure diverse neural activation and to enrich the training dataset, EEG signals were collected under three types of stimuli: resting state, cognitive brain activity, and sound stimuli. Resting-state EEG reflects the brain’s baseline activity and offers insights into stable, intrinsic neural patterns unique to individuals [24]. Cognitive tasks stimulate decision-making and motor planning areas, eliciting dynamic brain activity that varies significantly across subjects [25]. Sound stimulates the auditory cortex and emotional processing regions, providing an additional modality of brain engagement [26].
The permanence of EEG signals presents a significant challenge for biometric applications, as neural patterns exhibit considerable variability in response to tasks, mental states, and experimental conditions recorded over time. Arnau-González et al. designed the BED dataset with EEG signals collected across three separate sessions, enabling a more realistic cross-session biometric evaluation [27]. Note that models evaluated only on within-session data may overestimate performance [25,26,27].
Topographic map representations of EEG signals offer a spatial view of neural activity by projecting channel-wise data onto a 2D scalp model [28]. This spatial encoding preserves the relationships between electrode positions and supports visualization of region-specific brain activity. By stacking these 2D topographic frames over consecutive time windows, a 3D EEG cube can be constructed. This volumetric representation captures both spectral and temporal dependencies, offering a richer context for feature learning. The 3D EEG cube serves as a bridge between raw EEG signals and machine-learnable input formats, making it ideal for convolution-based learning models and facilitating the extraction of spatio-temporal patterns.
In this study, various machine learning classifiers were evaluated to classify the features extracted from EEG signals. K-nearest neighbors (KNN) is a simple yet effective distance-based classifier that works well with low-dimensional data [29]. Random forest (RF), a tree-based ensemble method, provides robustness against overfitting and handles high-dimensional features efficiently [30]. Support vector machines (SVMs) are known for their ability to create optimal decision boundaries in high-dimensional spaces using kernels [31]. Artificial neural networks (ANNs) offer flexible nonlinear modeling and can capture complex data patterns [32]. These classifiers were assessed for their performance in distinguishing between individuals based on the features derived from the CAE.
This paper aims to explore the effectiveness of a self-referencing convolutional autoencoder as an automated feature extraction method, along with the use of a spatio-temporal data cube representation for EEG-based personal identification. The rest of this paper is organized as follows: the ‘Previous Work’ section reviews the relevant literature and previous work in the field. Section 2 describes the proposed methods in detail. Section 3 outlines the experimental setup and the data used in this study. Section 4 presents the results and performance evaluation. Finally, Section 5 discusses outcomes and potential future directions.
Previous Work
EEG-biometric studies often utilize visually evoked potentials (VEPs) and other event-related EEG recordings. Subjects are presented with stimuli that elicit brain responses, and the resulting EEG waveforms are used as biometric signatures. This is clearly realized in the study carried out by Das et al., who performed a longitudinal study using multiple visual stimuli and reported consistent person-specific EEG patterns [33]. Similarly, Koike-Akino et al. showed that P300 components from a rapid serial visual presentation paradigm can yield very high identification accuracy [34].
Convolutional neural networks have also been applied to EEG signals to automatically learn discriminative features. Ozdenizci et al. presented the use of CNNs across EEG channels to capture local and spatial dependencies in raw EEG data, spectrograms, and topographic projections [18]. Similarly, recurrent neural networks such as long short-term memory and gated recurrent units have been used to learn temporal dependencies of EEG signals [19]. Banee et al. [21] showed the utility of AEs for feature extraction in EEG-based personal identification.
Resting-state recordings refer to a period when an individual is awake but not actively engaged in any specific task or mental effort [35]. Resting-state recordings are particularly valuable in EEG-based identification, not only because they are easy to acquire but also because they capture the brain’s intrinsic activity patterns, which are stable and distinctive enough for biometric recognition. Fraschini et al. used the eigenvector centrality of functional connectivity graphs computed from resting EEG as a brainprint signature [36]. Moreover, CNNs have been trained on resting EEG spectrograms or raw signals for identification [18]. The authors claimed that features extracted from resting-state responses provide stable biometric patterns without the need for a specific stimulus.
Marcos et al. explored EEG-based biometric identification using both resting-state and task-induced EEG with visual/auditory stimuli. Time–frequency features were extracted via the Continuous Wavelet Transform (CWT) across five bands: Delta (0.5–4 Hz), Theta (4–8 Hz), Alpha (8–13 Hz), Beta (13–30 Hz), and Gamma (30–50 Hz). They reported that cross-session identification accuracy dropped to ~72–85% from a peak of ~99% obtained when the training and testing datasets came from the same session. The results highlight the strength of stimulus-evoked EEG for biometric applications but underscore the challenge of longitudinal consistency [37]. As noted earlier in Section 1, this problem has also been pointed out by several other researchers.
2. Methods
This section describes the proposed framework for EEG-based identification. The framework consists of four key modules: preprocessing, EEG data cube generation from topo-maps, feature extraction using CAE, and subject identification using SVM, RF, KNN, and ANN models.
Figure 1 illustrates the overall pipeline.
2.1. Preprocessing
In the preprocessing stage, notch filters were employed to suppress power line interference and its harmonics commonly present in EEG signals. A notch filter, also known as a band-stop filter [22], is designed to attenuate a narrow frequency band while allowing frequencies outside that range to pass. Specifically, notch filters were applied at 60 Hz, 120 Hz, 180 Hz, and 240 Hz to target the fundamental frequency of electrical line noise and its first three harmonics. Figure 2 shows an example of the resulting notch-filtered EEG data. In addition to noise removal, the EEG signals were band-pass filtered to isolate eight specific frequency bands: Delta (1–4 Hz), Theta (4–8 Hz), Alpha (8–13 Hz), Beta (13–32 Hz), Delta–Beta (De2Be, 1–32 Hz), Theta–Beta (Th2Be, 4–32 Hz), Gamma (32–125 Hz), and an All band covering frequencies above 1 Hz. Among these, hybrid bands such as Delta–Beta and Theta–Beta, which span broader neural oscillations across multiple adjacent frequency ranges, were specifically investigated. These hybrid bands enable the capture of more complex and comprehensive brain activity patterns that may be overlooked when analyzing traditional single bands alone. Prior to this step, a high-pass filter with a cutoff frequency of 0.5 Hz was applied to eliminate slow baseline drifts and low-frequency physiological artifacts.
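As a minimal sketch of this filtering stage, the following SciPy-based example applies the 0.5 Hz high-pass filter, the 60/120/180/240 Hz notch filters, and a band-pass filter at the sampling rate stated in Section 3. The function name, filter orders, and Q factor are illustrative assumptions and do not reproduce the exact implementation used in this work.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 1000  # sampling frequency in Hz (1 kHz amplifier)

def preprocess_channel(x, band=(4.0, 32.0)):
    """High-pass, notch (60 Hz + harmonics), and band-pass filtering of one EEG channel."""
    # High-pass at 0.5 Hz to remove slow baseline drifts and low-frequency artifacts
    b, a = butter(4, 0.5, btype="highpass", fs=FS)
    x = filtfilt(b, a, x)

    # Notch filters at 60, 120, 180, 240 Hz to suppress line noise and its harmonics
    for f0 in (60, 120, 180, 240):
        b, a = iirnotch(w0=f0, Q=30.0, fs=FS)
        x = filtfilt(b, a, x)

    # Band-pass filter to isolate the requested band (e.g., Th2Be: 4-32 Hz)
    b, a = butter(4, list(band), btype="bandpass", fs=FS)
    return filtfilt(b, a, x)

# Example: filter a 3 s, 24-channel stream (3000 samples per channel, placeholder data)
stream = np.random.randn(24, 3000)
filtered = np.vstack([preprocess_channel(ch, band=(4.0, 32.0)) for ch in stream])
```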
2.2. Data Cube Generation
Following the preprocessing, a data cube was generated from 24-channel EEG streams. A data cube is a stack of topographic maps. The data cube was created in two main steps: preparation of streams and projection of streams onto 2D frames.
Figure 3 demonstrates the topographic data cube generation steps.
The length of each stream is 3 s, producing 3000 EEG data points per channel. Due to minor data loss at the end of some trials, only the first 2880 data points were used for further processing. Each stream was down-sampled to reduce the temporal resolution and to smooth out high-frequency noise. The down-sampling was performed by taking the median value of every 10 consecutive data points, resulting in 288 time points per stream. Each down-sampled stream was divided into 9 non-overlapping segments. Each session includes 35, 10, and 10 sets of streams from 24 channels for the resting state, sound, and cognitive stimuli, respectively. Therefore, 315 (9 × 35), 90 (9 × 10), and 90 (9 × 10) EEG cubes were generated per subject for the resting state, music stimuli, and cognitive task, respectively. Each cube is composed of 32 (= 288/9) frames, as shown in Figure 3b.
In creating a topographic map grid for each EEG cube, the 3D coordinates of the electrodes were projected onto 2D planes using an azimuthal projection method. Next, the frames of the EEG cubes were mapped to a 32 × 32 grid using bilinear interpolation (as shown in Figure 3c), which estimates each pixel value as a weighted average of the four nearest neighbors. The colors in the topographic maps represent brain activity levels, where red indicates areas of high neural response and blue indicates regions with lower or minimal activity.
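A rough sketch of the cube-construction procedure is given below: a filtered 24-channel stream is median-downsampled by a factor of 10, split into 9 segments of 32 frames, and each frame is interpolated onto a 32 × 32 grid from projected 2D electrode positions. The electrode-coordinate array is a placeholder, and SciPy's piecewise-linear `griddata` interpolation is used here as an approximation of the bilinear mapping described above; the actual azimuthal projection is not reproduced.

```python
import numpy as np
from scipy.interpolate import griddata

def build_cubes(stream, coords2d, grid=32):
    """stream: (24, 2880) filtered EEG; coords2d: (24, 2) projected electrode positions in [0, 1]."""
    # Median downsampling: every 10 consecutive samples -> 288 time points per channel
    down = np.median(stream.reshape(24, 288, 10), axis=2)

    # Split into 9 non-overlapping segments of 32 frames each: (9, 32, 24)
    segments = down.reshape(24, 9, 32).transpose(1, 2, 0)

    # Regular 32 x 32 grid covering the projected scalp area
    gx, gy = np.meshgrid(np.linspace(0, 1, grid), np.linspace(0, 1, grid))

    cubes = np.empty((9, 32, grid, grid))
    for s in range(9):
        for t in range(32):
            # Piecewise-linear interpolation of the 24 channel values onto the grid
            cubes[s, t] = griddata(coords2d, segments[s, t], (gx, gy),
                                   method="linear", fill_value=0.0)
    return cubes  # 9 cubes of shape (32 frames, 32, 32) per stream

# Example with hypothetical electrode coordinates and placeholder EEG data
coords = np.random.rand(24, 2)
cubes = build_cubes(np.random.randn(24, 2880), coords)
```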
2.3. Automated Feature Extraction
In this study, a 3D CAE is utilized to process the EEG data cubes. A three-dimensional CAE is well suited for capturing both spatial and temporal dependencies. Formally, an autoencoder consists of an encoder function

$\mathcal{Z} = f_{\theta}(\mathcal{X})$ (1)

with an input tensor $\mathcal{X}$ and a latent feature tensor $\mathcal{Z} \in \mathbb{R}^{d}$ (where $d$ denotes the size of the latent tensor), and a decoder function

$\hat{\mathcal{X}} = g_{\phi}(\mathcal{Z})$ (2)

with an output tensor $\hat{\mathcal{X}}$ of the same size as $\mathcal{X}$; both functions employ non-linear activation functions such as ReLU, sigmoid, and tanh. The encoder and decoder have symmetric network structures, ensuring that the input size and the reconstructed output size remain the same. Training of the CAE is optimized to minimize the reconstruction loss function

$\mathcal{L}(\mathcal{X}, \hat{\mathcal{X}}) = \frac{1}{N} \sum_{i=1}^{N} \lVert \mathcal{X}_i - \hat{\mathcal{X}}_i \rVert_2^2$ (3)

which measures the mean squared error between the input and its reconstruction by the decoder. This optimization enables the extraction of a distinct and compact representation of the input domain while eliminating the need for manual feature engineering.
Figure 4 illustrates the general mechanism of an autoencoder applied to a 3D input space.
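A minimal Keras sketch of such a symmetric 3D convolutional autoencoder is shown below, assuming a single-channel 32 × 32 × 32 input cube (frames × height × width). The layer count, filter sizes, and latent dimensionality are illustrative placeholders; the actual values were selected by the hyperparameter search described in Section 2.5.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cae(input_shape=(32, 32, 32, 1), latent_filters=16):
    """Symmetric 3D convolutional autoencoder trained with an MSE reconstruction loss."""
    inputs = keras.Input(shape=input_shape)

    # Encoder f_theta: strided Conv3D blocks compress the cube into a latent tensor Z
    x = layers.Conv3D(32, 3, strides=2, padding="same", activation="relu")(inputs)
    z = layers.Conv3D(latent_filters, 3, strides=2, padding="same", activation="relu")(x)

    # Decoder g_phi: mirror of the encoder, reconstructing a cube of the original size
    x = layers.Conv3DTranspose(32, 3, strides=2, padding="same", activation="relu")(z)
    outputs = layers.Conv3DTranspose(1, 3, strides=2, padding="same", activation="sigmoid")(x)

    autoencoder = keras.Model(inputs, outputs, name="cae")
    encoder = keras.Model(inputs, z, name="encoder")
    autoencoder.compile(optimizer="adam", loss="mse")  # Equation (3): mean squared error
    return autoencoder, encoder

# Self-referencing training: each cube serves as both input and reconstruction target, e.g.
# autoencoder.fit(train_cubes, train_cubes, epochs=50, batch_size=16, validation_split=0.1)
```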
2.4. Identification
A 3D CAE extracts compact latent features from the EEG data cubes. These features represent the essential spatial and temporal patterns within the EEG signals and serve as input for a subsequent classification task. This two-stage method uses the autoencoder’s unsupervised learning to generate meaningful inputs for supervised classification. Four classification algorithms are employed for subject identification: KNN, ANN, SVM, and RF. These classifiers were selected to evaluate the generalizability and robustness of the learned feature representations across diverse algorithmic paradigms—ensemble learning, instance-based learning, deep learning, and margin-based classification. Moreover, for the longitudinal analysis of brain signals, each classifier is trained with the data from Session 1 and tested with the data from Session 2.
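The two-stage procedure can be sketched as follows: latent tensors produced by the trained encoder are flattened into feature vectors, classifiers are fitted on Session 1 features only, and evaluation uses Session 2 features. The feature dimensionality, labels, and classifier settings shown here are placeholders (the tuned values are reported later in Table 3), and random arrays stand in for the encoder outputs.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)

# Placeholders for flattened CAE latent features (e.g., 8 x 8 x 8 x 16 = 8192 values per cube);
# in practice these come from encoder.predict(...) on Session 1 and Session 2 cubes.
X_train, y_train = rng.normal(size=(315, 8192)), rng.integers(0, 7, 315)  # Session 1
X_test, y_test = rng.normal(size=(315, 8192)), rng.integers(0, 7, 315)    # Session 2

classifiers = {
    "SVM": SVC(kernel="rbf", C=1.0, probability=True),
    "RF": RandomForestClassifier(n_estimators=100),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)              # train on Session 1 only
    pred = clf.predict(X_test)             # evaluate on Session 2 (longitudinal protocol)
    proba = clf.predict_proba(X_test)
    print(f"{name}: ACC={accuracy_score(y_test, pred):.3f}, "
          f"AUC={roc_auc_score(y_test, proba, multi_class='ovr'):.3f}")
```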
2.5. Hyperparameter Tuning
To determine the optimal configurations for both the autoencoder and the classification algorithms, Keras Tuner with Bayesian optimization was utilized as a unified strategy for hyperparameter tuning. This approach efficiently explores complex and high-dimensional parameter spaces by balancing exploration and exploitation, leading to the automatic selection of configurations that enhance model performance and generalization. For the autoencoder, the search space included critical architectural parameters such as the number of convolutional layers, the number of units in the layers, kernel sizes, kernel initializer, activation functions, and optimizer. The objective was to minimize the reconstruction loss while preserving a compact and discriminative latent representation of the EEG input.
Similarly, to ensure optimal classification performance, Bayesian optimization was employed to systematically tune each classifier’s hyperparameters with the objective of maximizing validation accuracy. This includes tuning the number of trees and the maximum depth for the RF, selecting the appropriate number of neighbors and distance metric for KNN, and optimizing architectural choices such as the number of hidden layers, the number of units per layer, the weight initializer, and the optimizer for the ANN. For the SVM, hyperparameters such as the kernel type, the regularization parameter C, and kernel-specific parameters like gamma (in the case of RBF kernels) were fine-tuned.
Thanks to this automated optimization process across both the feature extraction and classification stages, the entire pipeline—from raw EEG cubes to subject identification—was built on robust and well-tuned components.
Table 1 provides a comprehensive overview of the hyperparameter search spaces and value ranges used during this process.
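As an illustration of this tuning strategy, the sketch below uses Keras Tuner’s Bayesian optimization to search over a few CAE architectural choices. The search ranges shown are shortened examples and do not reproduce the full spaces listed in Table 1; the hypermodel and trial budget are assumptions for demonstration only.

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_cae_hp(hp):
    """Hypermodel: a CAE whose depth, filters, kernel, activation, and optimizer are tuned."""
    inputs = keras.Input(shape=(32, 32, 32, 1))
    x = inputs
    n_layers = hp.Int("conv_layers", 1, 2)
    for i in range(n_layers):
        x = layers.Conv3D(hp.Choice(f"filters_{i}", [16, 32, 64]),
                          hp.Choice("kernel", [3, 5]), strides=2, padding="same",
                          activation=hp.Choice("activation", ["relu", "tanh"]))(x)
    for i in reversed(range(n_layers)):          # symmetric decoder mirrors the encoder
        x = layers.Conv3DTranspose(hp.get(f"filters_{i}"), hp.get("kernel"), strides=2,
                                   padding="same", activation=hp.get("activation"))(x)
    outputs = layers.Conv3D(1, 3, padding="same", activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=hp.Choice("optimizer", ["adam", "rmsprop"]), loss="mse")
    return model

tuner = kt.BayesianOptimization(
    build_cae_hp,
    objective="val_loss",        # minimize reconstruction error on held-out cubes
    max_trials=20,
    directory="tuning",
    project_name="cae_search",
)
# tuner.search(train_cubes, train_cubes, validation_split=0.1, epochs=20)
# best_model = tuner.get_best_models(1)[0]
```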
3. Experimental Setup and Data
The EEG dataset was collected from 7 subjects, all college students, after receiving institutional review board (IRB) approval from the university. The EEG data were collected in two sessions in spring 2023, held 10 days apart. We used the mBrain train amplifier with a 24-channel headcap and Neuro Behavioral Systems’ Presentation (version 2.4) software for the collection of brain signals. The sampling frequency of the amplifier was 1 kHz. The headcap electrode locations followed the 10–20 system. Each recording was saved under a designated subject identity. The subject IDs are as follows: sb106, sb328, sb330, sb381, sb455, sb717, and sb768. A longitudinal study requires that the test dataset be collected after the training dataset. To ensure the validity of the longitudinal evaluation, the models were trained exclusively on the data from Session 1 and tested on the data from Session 2. This approach complies with the standard requirement for temporal separation between training and test datasets in longitudinal studies.
Table 2 presents the hardware configuration used to perform all computations in this study, including training and testing of the autoencoder and classification models. It also provides the total train and test computation time for a specific band and stimuli (Gamma–bk_pic_EC).
Three types of stimuli were used to record EEG responses: resting state, cognitive, and auditory. During the resting-state condition, participants were instructed to close their eyes and remain at rest for 3 s without engaging in any task. In the cognitive task, participants were asked to perform inner speech, silently repeating the word “evergreen” for 3 s. For the auditory stimulus, participants listened to the sound of a musical instrument, the conga, for a duration of 3 s. Resting-state EEG data were collected over three trials per session, resulting in a total of 35 resting-state signals. Cognitive and auditory responses were collected over 10 trials per session, yielding 10 EEG recordings for each condition. All EEG segments were recorded for a consistent 3 s duration per trial. Note that the dataset utilized in this study will be made available upon request.
4. Results and Discussion
The hyperparameter search determined the optimized CAE architecture for each stimulus type, as shown in Figure 5. For resting-state recordings, the optimized CAE generated a set of 16 latent feature tensors of size 8 × 8 × 8 each. In contrast, the size of each latent feature extracted from the sound and cognitive recordings was determined to be 16 × 16 × 16, with 32 network units at the last layer of the encoder module. This finding shows that the representation space obtained from resting-state patterns is more compact than that of the other stimulus conditions. Comparing the most discriminative EEG bands, the highest scores were observed in the Gamma band for the resting state and in the Th2Be (4–32 Hz) range for both the sound and cognitive stimuli.
Table 3 provides a summary of the optimal parameters selected for both the autoencoder and classifier networks within specific band ranges.
Regarding computational feasibility, it is important to acknowledge that the training of the convolutional autoencoder (CAE) model required approximately 49 min per configuration on a high-performance GPU (NVIDIA A100 40GB). While this level of computational demand is manageable in research and development settings, it may pose challenges for real-time or resource-constrained applications. However, it is important to note that CAE training is conducted offline, and once the model is trained, the feature extraction and classification processes are relatively lightweight and can be performed efficiently on standard hardware. This makes deployment in real-world applications feasible, especially in scenarios where models are trained centrally and deployed on edge devices for inference. Further optimization—such as model pruning, quantization, or use of more compact architectures—can also be explored to reduce the computational footprint without significantly sacrificing performance.
Figure 6 presents the AUC score distribution across different classifiers and EEG bands for the three stimulus conditions: resting state, auditory task, and cognitive task. The violin plots present the performance distribution for four classifiers (KNN, SVM, RF, and ANN) across eight frequency bands: Delta, Theta, Alpha, Beta, De2Be, Th2Be, Gamma, and All. The violin plots show the distribution of AUC values across 5-fold cross-validation scores. For the resting-state patterns, the Gamma band (32–125 Hz) demonstrated the highest discriminative power, achieving 90.23% AUC across all classifiers. In contrast, for the sound and cognitive tasks, the Th2Be range (4–32 Hz) emerged as the most discriminative frequency band, consistently showing superior classification performance.
A clear enhancement in classification performance is observed across the broader frequency ranges, particularly in the 13–32 Hz, 8–32 Hz, 4–32 Hz, and 32–125 Hz bands compared to narrower bands like Delta, Theta, and Alpha. Regarding classifier performance, SVM and RF consistently emerge as the leading models across most EEG bands and stimulus conditions, demonstrating superior and more stable AUC distributions compared to KNN and ANN. The violin plots show that these two classifiers not only achieve higher median performance but also exhibit less variability across the cross-validation folds, indicating more reliable and robust classification outcomes across different frequency bands and experimental paradigms.
Figure 7 presents subject identifiability through AUC scores obtained from pairwise SVM-based classification of the Gamma band patterns extracted from the resting state EEG responses. The results showed a significant inter-subject variability in identifiability performance. Most subjects (sb328, sb330, sb381, sb717, and sb768) demonstrated high and consistent identifiability, with AUC distributions concentrated around 0.95. In contrast, subjects sb106 and sb455 exhibited significantly lower identifiability scores, with a wider score range of approximately 0.42 to 0.99, possibly due to intra-class variability and inter-class similarity of these subjects or weaker correlation structures in extracted features, which may reflect inconsistent cognitive or physiological states during data acquisition.
Figure 8 illustrates the performance metrics of the SVM-based classifier applied to Gamma band EEG signals recorded during the resting state. The results show that removing less identifiable subjects leads to a substantial improvement in the overall model performance.
In addition to the seven-subject and five-subject classification tasks, the SVM-based model was also evaluated on pairwise identification scenarios.
Figure 9 presents the classification results for all subject pairs. Although the model exhibited lower performance when distinguishing between the less identifiable subject pair (sb106 and sb455), it achieved high accuracy in other pairings involving these subjects. For instance, the pair sb106 and sb330 was classified with an accuracy of 96.83%, indicating that sb106 remains distinguishable when paired with certain other subjects.
Figure 10 illustrates the impact of subject variability for the SVM-based classifier on the resting-state Gamma band EEG data cubes. Figure 10a presents the confusion matrix for seven subjects; the number of misclassified samples from sb106 and sb455 highlights the challenge that the framework faced during identification. We investigated the effect of weakly identifiable (highly similar) subjects by removing them from the datasets; Figure 10b presents the confusion matrix without these subjects. Compared to the seven-subject case, the classification performance is notably improved. Figure 10c,d represent the worst-performing and best-performing subject pairs, respectively. While the framework achieved only 44.6% accuracy (ACC) for the least identifiable subject pair (sb106 and sb455), it attained a significantly higher accuracy of 96.98% when identifying the two most distinguishable subjects (sb328 and sb330). Figure 10e compares identification scores across two-, five-, and seven-subject groups in terms of AUC and ACC. The model achieves near-perfect scores for the two-subject case and maintains high performance with five subjects. However, the performance drops significantly in the seven-subject setting, reflecting the challenges introduced by the inclusion of less discriminable subjects.
While signal permanence is referenced in this work to indicate stable subject-specific patterns, a more explicit treatment of intra-subject variability is crucial for practical biometric applications, because temporal fluctuations in EEG signals within the same individual present a significant challenge. The proposed method addresses this variability implicitly through autoencoder-based feature extraction, which aims to capture robust latent representations that generalize across temporal variations. Additionally, the classification models are trained on multiple EEG segments to improve resilience to within-subject fluctuations.
The superior performance of the SVM and Random Forest (RF) classifiers compared to the Artificial Neural Network (ANN) and k-Nearest Neighbor (KNN) models can be attributed to several factors. First, SVM and RF are well suited for high-dimensional, low-sample-size data scenarios—common in EEG-based biometrics—where they effectively manage overfitting and capture relevant discriminative patterns. SVM excels at finding optimal decision boundaries, especially when class separation is complex and training data are not abundant. RF, as an ensemble method, enhances robustness through feature bagging and decision tree aggregation, reducing variance and improving generalization. In contrast, ANN models typically require larger datasets to fully leverage their learning capacity, and KNN is sensitive to noise and feature scaling, which can degrade performance in high-dimensional EEG feature spaces.
Comparison with the State of the Art
Using the same-session data or mixed-session data for both model training and testing is a common mistake that results in misleadingly high accuracy scores in EEG-based biometric research [27]. In the assessment of a biometric system, a longitudinal study must be performed, as pointed out by Nakamura et al. [38]. Pluciska et al. highlighted this challenge by demonstrating a 20% drop in accuracy scores depending on cross-session data usage [39]. Similarly, Kostilek et al. reported a 10% drop in accuracy when test and training data came from separate sessions [40]. In another study, cross-session classification using an SVM classifier on the publicly available SEED database yielded an accuracy of 79.34% [41]. Several pattern classifiers have been applied for subject identification, achieving accuracies ranging between 82 and 97% [42]. Das et al. demonstrated a pipeline using event-related potential (ERP) features that achieved 95% accuracy when trained on one session of data and tested on data from a different recording session [33]. Similar to the above studies, Ref. [11] presented a graph convolutional neural network for EEG-based human identification.
In this study, we report a maximum subject identification accuracy of 97.46% under a strict longitudinal evaluation protocol, with training and testing performed on data collected ten days apart. To the best of our knowledge, this is the first study to employ spatio-temporal autoencoder-based feature learning on 3D EEG topographic data cubes for personal identification while conducting a true longitudinal design. This work, therefore, makes a distinctive contribution to the literature by combining methodological thoroughness with state-of-the-art performance.
Table 4 presents a comparative overview of various EEG-based personal identification methods. Ozdenizci et al. utilized deep learning with quadratic discriminant analysis (QDA) on 10 subjects, achieving 72% accuracy [18]. Kostilek et al. applied autoregressive (AR) features with distance-based classification (DBC) for nine subjects, reaching 77% accuracy. Maiorana et al. reported a notably low equal error rate (EER) of 2% using AR features and hidden Markov models (HMMs) across 45 subjects [40]. Gonzales et al. combined AR, fractal complexity coefficients (FCCs), and Power Spectral Density (PSD) features with multiple classifiers, including SVM, KNN, AdaBoost, and MLP, achieving up to 73% accuracy on 15 subjects [41]. The proposed method in this study employs autoencoders (AEs) for feature extraction and various classifiers, demonstrating superior performance with Area Under the Curve (AUC) scores ranging from 90.53% to 99.89% across experiments involving two, five, and seven subjects.
5. Conclusions
This study demonstrated the effectiveness of a 3D convolutional autoencoder (CAE) framework for extracting discriminative features from EEG data cubes that capture both spectral and temporal brain dynamics. The proposed method was evaluated using EEG recordings collected under three different stimulus conditions: resting state, sound stimuli, and cognitive tasks. Features extracted by the CAE were classified using four different machine learning models: KNN, SVM, RF, and ANN. Classification performance was assessed across various subject group sizes, achieving AUC scores ranging from 90.53% to 99.89%, depending on the number of subjects and the presence of highly similar (weakly distinguishable) individuals. The experimental design followed a longitudinal setup, where training and testing data were collected in different sessions, ensuring session independence and enabling robust analysis of temporal stability.
Frequency band analysis revealed that, for resting-state EEG, the Gamma band (32–125 Hz) consistently exhibited the highest discriminative power across all classifiers. In contrast, the Theta-to-Beta (Th2Be) range (4–32 Hz) yielded superior results for the sound and cognitive task conditions. These findings underscore the adaptability of the proposed framework across different mental states and frequency bands.
Another conclusion to be drawn from this study is that the number of subjects should be increased. Although seven subjects provide an indication of the classification capability of the classifiers, the variability in subject identifiability is higher than expected and requires further analysis. Therefore, the performance trends observed here should be interpreted as preliminary (a proof of concept). The inter-subject variability observed—particularly among less identifiable individuals—indicates the need for further investigation using larger and more diverse populations to draw stronger general conclusions.
It should be noted that EEG acquisition systems, although non-invasive, are generally bulky and not user-friendly. Reducing the number of EEG channels would be one solution for making the approach feasible for real-world applications [44,45]. However, in this preliminary study, since we focused specifically on analyzing spatio-temporal patterns, we intentionally considered all EEG channels. Improving performance while decreasing the number of channels still needs to be studied.
As part of future work, we plan to develop an attention-based mechanism to better address weakly identifiable subjects. We also aim to explore temporal modeling techniques that capture sequential dependencies between consecutive EEG frames, which may further enhance subject-specific feature representations. Additionally, the integration of multi-stimulus fusion systems—combining EEG responses from different tasks—will be investigated to enrich the biometric signature. To further evaluate the generalizability and robustness of our proposed framework, we intend to validate it using publicly available EEG datasets such as BED and SEED. Furthermore, as the subject pool expands in future studies, we will investigate clustering techniques that group subjects based on feature similarity and adapt the model accordingly to improve identification performance within subpopulations.