Decoding Subject-Driven Cognitive States from EEG Signals for Cognitive Brain–Computer Interface

In this study, we investigated the feasibility of using electroencephalogram (EEG) signals to differentiate between four distinct subject-driven cognitive states: resting state, narrative memory, music, and subtraction tasks. EEG data were collected from seven healthy male participants while performing these cognitive tasks, and the raw EEG signals were transformed into time–frequency maps using continuous wavelet transform. Based on these time–frequency maps, we developed a convolutional neural network model (TF-CNN-CFA) with a channel and frequency attention mechanism to automatically distinguish between these cognitive states. The experimental results demonstrated that the model achieved an average classification accuracy of 76.14% in identifying these four cognitive states, significantly outperforming traditional EEG signal processing methods and other classical image classification algorithms. Furthermore, we investigated the impact of varying lengths of EEG signals on classification performance and found that TF-CNN-CFA demonstrates consistent performance across different window lengths, indicating its strong generalization capability. This study validates the ability of EEG to differentiate higher cognitive states, which could potentially offer a novel BCI paradigm.


Introduction
Brain-computer interfaces (BCIs) are advanced technologies that establish a direct connection between the human brain and external devices [1][2][3]. BCIs can interpret users' intentions directly from their brain signals, enabling control of external computers or devices. Currently, BCI devices have been used to assist individuals with motor disabilities in interacting with the outside world, for example by restoring movement or sensation through tools such as mechanical arms [4][5][6]. Based on the method of acquiring neural signals, BCI devices fall into two main categories: invasive and non-invasive [7]. The invasive approach uses a microelectrode array surgically implanted into the cerebral cortex to record the action potentials of individual neurons, as well as the local field potentials of small, dense clusters of neurons and synapses. This method captures neural information with high precision but carries a relatively high risk of damage to the brain. In contrast, non-invasive BCIs place signal acquisition devices on the scalp, making them more readily accepted by the general population [8]. Among non-invasive devices, EEG-based BCIs are the most widely researched owing to their low cost, ease of subject recruitment, and high temporal resolution.
The purpose of the time-frequency transformation is to display the characteristics of brain signals in both the time and frequency domains, thereby better revealing the patterns and properties of brainwave activity. Each EEG channel is transformed into a time-frequency map illustrating that channel's activity at different time points and frequencies. Each time-frequency map is grouped, segmented by frequency range, and decomposed into RGB channels corresponding to the frequency ranges 0-15 Hz, 15-30 Hz, and 30-45 Hz. Next, the time-frequency maps of all channels are overlaid in the RGB dimension. Finally, the overlaid time-frequency maps are input into the
network for training to decode and classify the EEG signals corresponding to the four cognitive states. We chose convolutional neural networks (CNNs) as the primary tool for our research because CNNs exhibit significant advantages in processing image data. They can effectively analyze images and extract features from them, making them particularly proficient at extracting time-frequency features from EEG signals and enabling precise classification of brainwave data.
In this study, to further enhance the neural network's processing of EEG signals, we introduce a channel and frequency attention (CFA) module. This module is designed to boost the network's focus on different EEG signal channels, allowing it to concentrate on the feature information of key channels and thereby improve the accuracy and robustness of the classification task. By incorporating the CFA module, we expect to strengthen the network's ability to identify important channels within the EEG signals, enhancing the precision and efficiency of classification.

Experimental Paradigm Design
The experimental paradigm design is illustrated in Figure 2. Each participant is required to complete three sessions of experiments, with each session consisting of 5 blocks. In each block, the participant sequentially performs four tasks: a resting state, a narrative memory task, a music lyrics task, and a subtraction task. The resting state is always performed first, while the order of the three cognitive tasks is counterbalanced. For the resting task, participants let their minds wander without focusing on anything specific. For the memory task, participants recall events from the moment they woke up until the current time. In the music task, participants mentally sing their favorite song lyrics. In the subtraction task, participants count backwards from 5000 in steps of 3. Participants are instructed to maintain an open-eyed, self-driven cognitive state throughout.
In each block, a 6 s display of "+" on the screen signals the participant to prepare for the experiment, during which the participant is required to gaze at the center of the screen where the "+" is located without moving their body. Subsequently, cue words indicating the four states are displayed on the screen, prompting the participant to use their imagination for 60 s. During this period, the participant must concentrate, avoid head and body movements, and minimize blinking. Finally, a 24 s rest period follows, allowing the participant to relax with minimal body movement as the screen remains blank. A one-minute rest interval separates two consecutive blocks, while a five-minute break is provided between sessions. At the end of the experiment, each participant has 15 min of EEG experimental data for each task state.
Figure 2. Experimental paradigm design. Each participant is required to complete three experimental sessions, each comprising 5 blocks. Within each block, participants engage in four types of mental imagery: resting, memory, music, and subtraction. In each block, a "+" signal appears on the screen to indicate the beginning of the task. Cue words for the four mental states are then displayed for 60 s, followed by a 24 s rest period. There is no break between the 5 blocks within a session, but a five-minute rest interval separates each session. At the conclusion of the experiment, each participant has 15 min of EEG data recorded for each mental imagery task state.
Seven healthy male participants aged 22 to 28 years took part in the experiment, all with normal or corrected-to-normal vision and no history of electrical or drug therapy in the 30 days prior to the experiment. All were right-handed and had no mental illnesses. Participants refrained from staying up late the night before the experiment, ensuring good rest, and abstained from alcohol, smoking, and tea. Detailed explanations of the experimental tasks and procedures were provided before the experiment, and all participants signed written informed consent forms before taking part. This study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Xiangya Hospital of Central South University.
During data collection, participants sat comfortably in a chair 0.5 m from the monitor, with their hands resting naturally on the armrests. The experiments were conducted in a quiet, well-lit, and temperature-controlled laboratory. Before the formal experiment commenced, participants completed a 30 min training task designed to familiarize them with the experimental tasks and procedures by practicing each of the four cognitive states (resting, memory, music, and subtraction) five times.

Data Preprocessing
In our study, we employed the EEGLAB toolbox in MATLAB to preprocess the raw EEG signals. The data preprocessing workflow, as illustrated in Figure 4, involves several key steps. Initially, we applied band-pass filtering (0.1-45 Hz) to the collected EEG signals to retain information within the frequency range of interest while reducing noise interference, thereby enhancing signal quality for clearer analysis and interpretation. Because the electrodes on the cap were numerous and evenly distributed, we used the average reference method to obtain a reference and performed offline re-referencing of the EEG data. Subsequently, the EEG data were decomposed into independent components using Independent Component Analysis (ICA). Components reflecting eye movements, heartbeats, muscle activity, and other non-neural artifacts were identified and removed to preserve authentic brainwave signals.

The data were then reorganized to eliminate artifact effects and retain genuine brainwave signals. Bad channels were repaired by interpolation from normal channels using the commonly used spherical spline method. Following this, the data were segmented according to the experimental task labels for the four tasks (resting, memory, music, and subtraction), with an additional baseline correction using the 6000 ms of resting time extracted before the start of each task.
Each participant provided 60 min of EEG data, 15 min for each of the four states. To further analyze the data, we segmented the EEG data for each state into non-overlapping three-second intervals. We then applied the continuous wavelet transform (CWT) to generate corresponding color-coded time-frequency maps. These maps display changes in signal energy intensity across frequencies over time, with the X-axis representing time (0-3000 ms) and the Y-axis representing frequency (0-45 Hz). The color mapping offers a visualization of the distribution of signal energy across frequencies: warm hues, such as red and orange, signify high signal power, whereas cooler tones like blue and purple indicate low power, capturing temporal and spectral fluctuations in signal intensity.
Nonetheless, it is crucial to recognize that this color-coded depiction constitutes an esthetic variation; fundamentally, the information conveyed is consistent with that of grayscale time-frequency images, differing only in visual presentation.
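As a rough illustration of this step, the numpy sketch below computes a time-frequency map for one synthetic EEG channel. The study used MATLAB's CWT, so the wavelet choice (complex Morlet with centre parameter w = 6), the sampling rate, and the frequency grid are assumptions of this sketch, not the paper's exact settings.

```python
import numpy as np

def morlet_cwt(signal, fs, freqs, w=6.0):
    """Minimal continuous wavelet transform with a complex Morlet wavelet.

    Returns |coefficients| with shape (len(freqs), len(signal)):
    one row per analysis frequency, one column per time sample.
    """
    n = len(signal)
    t = (np.arange(n) - n // 2) / fs              # wavelet support centred at 0
    power = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        s = w / (2 * np.pi * f)                   # Gaussian width for frequency f
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * s**2))
        wavelet /= np.sqrt(s)                     # crude energy normalisation
        conv = np.convolve(signal, np.conj(wavelet)[::-1], mode="same")
        power[i] = np.abs(conv)
    return power

# 3 s of a synthetic 10 Hz "EEG" channel sampled at an assumed 250 Hz
fs = 250
t = np.arange(0, 3, 1 / fs)
x = np.sin(2 * np.pi * 10 * t)
freqs = np.linspace(1, 45, 45)                    # 1-45 Hz, matching the band-pass range
tfmap = morlet_cwt(x, fs, freqs)
print(tfmap.shape)                                # (45, 750): frequency x time
```

The resulting array can then be rendered with any colormap; as noted above, the colouring is purely a visualization choice on top of the same power values.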
These time-frequency images were grouped, segmented by frequency range, and decomposed into RGB channels corresponding to the frequency ranges 0-15 Hz, 15-30 Hz, and 30-45 Hz. Each channel shows the energy distribution within a different frequency range: the red channel corresponds to low-frequency energy, the green channel to mid-frequency energy, and the blue channel to high-frequency energy. The magnitude of the grayscale values reflects the power within each frequency band, with higher values indicating higher power. By overlaying the grayscale values of all three channels, the original colored image is obtained, showing the signal strength across all frequencies.
After compressing each single-channel time-frequency map to a size of 60 × 100, we combined the 59 channels' time-frequency maps along the RGB dimension to create composite images with 177 channels (59 × 3 RGB planes). Finally, these composite images were used as input data for neural network training, with dimensions (32, 177, 60, 100), where 32 denotes the batch size, 177 the number of channels in the time-frequency maps, 60 the height, and 100 the width of the time-frequency maps.
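The stacking described above can be sketched as follows. The arrays are random stand-ins for real time-frequency images; only the shapes follow the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for 59 electrodes' colour time-frequency images, each already
# compressed to 60 x 100 with 3 RGB planes (red ~ 0-15 Hz, green ~ 15-30 Hz,
# blue ~ 30-45 Hz energy).
n_electrodes, height, width = 59, 60, 100
images = rng.random((n_electrodes, height, width, 3))

# Stack every electrode's three colour planes along one channel axis:
# 59 electrodes x 3 planes = 177 input channels for the network.
composite = images.transpose(0, 3, 1, 2).reshape(-1, height, width)
print(composite.shape)                       # (177, 60, 100)

# A batch of 32 such composites matches the paper's input dimension
# (32, 177, 60, 100); broadcast_to gives a memory-free view for the demo.
batch = np.broadcast_to(composite, (32, 177, 60, 100))
print(batch.shape)                           # (32, 177, 60, 100)
```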

Network Architecture
In this study, we propose a custom convolutional neural network model, TF-CNN-CFA, designed for EEG signal classification tasks. The model comprises several key components, including convolutional layers, pooling layers, and fully connected layers. Specifically, it consists of four convolutional layers (Conv1, Conv2, Conv3, and Conv4), with Conv4 followed by a dropout layer and a ReLU non-linear activation layer to enhance generalization and non-linear feature representation. The model also includes three pooling layers to reduce feature dimensions and extract salient features. Notably, it integrates a channel and frequency attention (CFA) module aimed at enhancing the network's focus on different EEG signal channels to capture crucial features and improve classification accuracy. Finally, a fully connected layer (FC) learns feature representations and performs the 4-class classification. The network parameters are detailed in Table 1.
The input layer N1 is sized (32, 177, 60, 100). A convolution with a 3 × 3 kernel (conv1) produces N2, keeping the feature map size at (32, 177, 60, 100), after which the CFA module is applied without changing the size. Next, a 5 × 5 pooling operation (pool1) on N2 yields N3 with a feature map size of (32, 177, 12, 20). A 5 × 5 convolution (conv2) on N3 generates N4 with size (32, 128, 12, 20), and a subsequent 2 × 2 pooling operation (pool2) yields N5 with size (32, 128, 6, 10). N5 then undergoes a 5 × 5 convolution (conv3) to produce N6 with size (32, 128, 6, 10), followed by another 5 × 5 convolution (conv4) to obtain N7 with size (32, 64, 6, 10). After a dropout operation that preserves its size, a 2 × 2 pooling operation (pool4) on N7 produces N8 with size (32, 64, 3, 5). N8 is flattened into a one-dimensional vector N9 of size (32, 960). Finally, a fully connected layer (fc) maps N9 to N10, performing the 4-class classification.
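The N1-N10 sizes above follow from standard convolution/pooling size arithmetic, assuming 'same'-padded convolutions and non-overlapping pooling (stride equal to kernel size), which is what reproduces the reported sizes. A small pure-Python check:

```python
def conv_same(h, w):
    # 'same'-padded convolution: spatial size unchanged
    return h, w

def pool(h, w, k):
    # non-overlapping pooling with kernel = stride = k
    return h // k, w // k

h, w = 60, 100                       # N1 input: (32, 177, 60, 100)
h, w = conv_same(h, w)               # conv1 + CFA: (32, 177, 60, 100)
h, w = pool(h, w, 5)                 # pool1: (32, 177, 12, 20)
assert (h, w) == (12, 20)
h, w = conv_same(h, w)               # conv2 -> 128 channels: (32, 128, 12, 20)
h, w = pool(h, w, 2)                 # pool2: (32, 128, 6, 10)
assert (h, w) == (6, 10)
h, w = conv_same(h, w)               # conv3: (32, 128, 6, 10)
h, w = conv_same(h, w)               # conv4 -> 64 channels (+ dropout, ReLU)
h, w = pool(h, w, 2)                 # pool4: (32, 64, 3, 5)
assert (h, w) == (3, 5)
flat = 64 * h * w                    # flatten N8 -> N9
print(flat)                          # 960, matching the (32, 960) vector fed to fc
```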

Channel and Frequency Attention (CFA) Module
In our study, we introduce a CFA module to enhance the neural network's performance in classifying EEG signals. The module receives an input tensor of dimensions (32, 177, 60, 100), where the batch size is 32, the number of channels in the time-frequency maps is 177 (derived from 59 RGB three-channel time-frequency maps), and the height and width of the time-frequency maps are 60 and 100, respectively. The module dynamically adjusts the weights of different channels, enabling the network to focus on key channel information and thereby improving model performance and accuracy.
The specific operational flow is as follows. Initially, global max-pooling and global average-pooling operations reduce each channel's features to a spatial dimension of 1 × 1, giving an output of size (32, 177, 1, 1). A fully connected layer (fc1) then reduces the number of channels to 59, with ReLU activation applied for a non-linear transformation. Another fully connected layer (fc2) restores the number of channels to 177, again followed by ReLU activation. Finally, the channel weights are normalized using the Sigmoid activation function and multiplied with the input feature map to produce an output feature map of size (32, 177, 60, 100). The network structure of the CFA module is shown in Table 2.
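The flow above can be sketched in numpy as below. The text does not state how the max- and average-pooled vectors are combined before fc1; summing them, as in CBAM-style channel attention, is an assumption of this sketch, as are the random weights and the reduced batch size of 4 (the paper uses 32).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cfa_forward(x, w1, b1, w2, b2):
    """Channel-attention pass following the operational flow described above."""
    mx = x.max(axis=(2, 3))                  # global max-pool  -> (batch, 177)
    av = x.mean(axis=(2, 3))                 # global avg-pool  -> (batch, 177)
    pooled = mx + av                         # assumed combination (CBAM-style)
    h = np.maximum(pooled @ w1 + b1, 0.0)    # fc1: 177 -> 59, ReLU
    h = np.maximum(h @ w2 + b2, 0.0)         # fc2: 59 -> 177, ReLU
    weights = sigmoid(h)                     # normalise channel weights to (0, 1)
    return x * weights[:, :, None, None]     # reweight each input channel

x = rng.random((4, 177, 60, 100))            # batch of 4 for the demo
w1, b1 = rng.normal(size=(177, 59)) * 0.1, np.zeros(59)
w2, b2 = rng.normal(size=(59, 177)) * 0.1, np.zeros(177)
out = cfa_forward(x, w1, b1, w2, b2)
print(out.shape)                             # (4, 177, 60, 100)
```

Because the sigmoid output lies in (0, 1), each channel of the output is a down-weighted copy of the corresponding input channel, with the learned weights deciding how much each of the 177 channels contributes.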
By integrating the CFA module, the neural network can better learn and exploit key channel features, focusing on important channels and thereby improving performance and accuracy when processing EEG signals.

Training Process and Strategies
To ensure the stability of the model, we employed five-fold cross-validation during training. Given the limited dataset, each participant's 1200 data samples were divided into an 80% training set (960 samples) and a 20% test set (240 samples). The 960 training samples per participant were then randomly split into 5 equal parts; four parts formed the training set, while the remaining part served as the validation set for model verification. After training, the model was evaluated on the test set to obtain the classification accuracy for each fold, and the average accuracy across the five folds was taken as the model's classification result.
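The split described above might look like this in numpy (the sample counts come from the paper; the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 1200                                  # per-participant sample count
idx = rng.permutation(n_samples)
train_idx, test_idx = idx[:960], idx[960:]        # 80% train / 20% held-out test

# Split the 960 training samples into 5 equal folds; each fold serves once
# as the validation set while the other four are used for training.
folds = np.array_split(rng.permutation(train_idx), 5)
for k in range(5):
    val = folds[k]
    train = np.concatenate([folds[j] for j in range(5) if j != k])
    assert len(val) == 192 and len(train) == 768
    assert np.intersect1d(val, train).size == 0   # no leakage between splits
print(len(train_idx), len(test_idx))              # 960 240
```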
In the training process, we used the Adam optimizer, a widely adopted optimization algorithm known for its simplicity, computational efficiency, and low memory usage. With bias correction applied, the optimizer keeps the effective learning rate of each iteration within a specific range, contributing to parameter stability. The loss function was the cross-entropy loss, well suited to multi-class networks, as given in Equation (1). Cross-entropy measures the disparity between two probability distributions over the same random variable, here the difference between the actual and the predicted probability distribution:

H(p, q) = -\sum_{i=1}^{n} p_i(x) \log q_i(x),   (1)

where x represents a single data sample, n denotes the total number of categories to be classified, p_i(x) is the i-th component of the target probability distribution, and q_i(x) is the i-th component of the predicted probability distribution. A smaller cross-entropy loss indicates better predictive performance.
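A small numeric example of the cross-entropy loss for one sample with n = 4 classes (the class probabilities are illustrative, not taken from the experiments):

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i(x) * log(q_i(x)) for one sample x."""
    return -np.sum(p * np.log(q))

# One sample, n = 4 classes (resting, memory, music, subtraction).
target = np.array([0.0, 1.0, 0.0, 0.0])         # true class: memory

confident = np.array([0.05, 0.85, 0.05, 0.05])  # good prediction
uncertain = np.array([0.25, 0.25, 0.25, 0.25])  # uninformative prediction

print(round(cross_entropy(target, confident), 4))  # 0.1625
print(round(cross_entropy(target, uncertain), 4))  # 1.3863
```

As the reconstruction of the loss suggests, the confident, correct prediction incurs a much smaller loss than the uniform one.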
The training configuration involved 50 epochs with a batch size of 32. To improve training outcomes, the network employed dropout with probability P = 0.5 to mitigate overfitting. Additionally, the ReLU activation function was used to introduce non-linearity and alleviate vanishing gradients, as given in Equation (2):

f(x) = \max(0, x).   (2)

Evaluation Metrics
In assessing model performance, we use several metrics: accuracy (Acc), precision (P), recall (R), F-score, and the Kappa coefficient (P_k).
Accuracy (Acc): Accuracy is the proportion of correctly classified samples over the total number of samples in the test set, as shown in Equation (3):

Acc = (TP + TN) / (TP + TN + FP + FN),   (3)

where TP denotes true positives (correctly predicted positive instances), TN true negatives (correctly predicted negative instances), FP false positives (negative instances incorrectly predicted as positive), and FN false negatives (positive instances erroneously predicted as negative).
Precision (P) and Recall (R): Precision is the ratio of correctly classified positive samples to all samples classified as positive, while recall is the ratio of correctly classified positive samples to all actual positive samples, as given in Equation (4):

P = TP / (TP + FP),  R = TP / (TP + FN).   (4)

F-score: The F-score combines precision and recall into a single metric that balances both, as computed in Equation (5):

F = 2PR / (P + R).   (5)

Kappa Coefficient (P_k): The Kappa coefficient is used for consistency testing and to gauge chance-corrected classification accuracy, as expressed in Equation (6):

P_k = (P_0 - P_e) / (1 - P_e),   (6)

where P_0 is the observed agreement (the overall accuracy) and P_e is the agreement expected by chance.
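All five metrics can be computed from a confusion matrix. The matrix below is purely illustrative, not the paper's results; in the multi-class setting, precision, recall, and F-score are computed per class in a one-vs-rest fashion.

```python
import numpy as np

# Illustrative 4-class confusion matrix: rows = true class,
# columns = predicted class (0 resting, 1 memory, 2 music, 3 subtraction).
cm = np.array([[48,  4,  5,  3],
               [ 5, 43,  7,  5],
               [ 4,  6, 44,  6],
               [ 3,  5,  6, 46]], dtype=float)

total = cm.sum()
acc = np.trace(cm) / total                      # correctly classified / all samples

tp = np.diag(cm)
precision = tp / cm.sum(axis=0)                 # TP / (TP + FP), per class
recall    = tp / cm.sum(axis=1)                 # TP / (TP + FN), per class
f_score   = 2 * precision * recall / (precision + recall)

p0 = acc                                        # observed agreement
pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total**2  # chance agreement
kappa = (p0 - pe) / (1 - pe)

print(round(acc, 4), round(kappa, 4))           # 0.7542 0.6722
```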

Model Performance Analysis
We evaluated the overall performance of the model with the EEG signal time length L set to 3 s, using five-fold cross-validation on a self-collected dataset. The experimental results are presented in Table 3. Overall, the results demonstrate high accuracy and a certain level of consistency. The average accuracy is 76.14%, with a standard deviation of 9.13%, indicating some variability across participants.
Looking at the results for the seven participants, accuracy exceeds 70% for all individuals except S3 and S5. Notably, S1 and S7 exhibit exceptional accuracy rates of 86.61% and 90.89%, respectively, both above 85%, marking them as the top-performing participants and showcasing the model's outstanding recognition capability for specific individuals.
To gain a deeper understanding of each participant's classification performance, we generated ROC curves for the seven participants, as displayed in Figure 5. The results indicate that participant 7 exhibited outstanding performance, with an AUC value as high as 0.94, signifying the model's exceptional ability to distinguish between the four states and its strong category discrimination capability. Participant 1 also achieved notable results, with an AUC value of 0.91, further confirming the model's strong performance. However, participant 3 showed the lowest AUC value at 0.78, highlighting fluctuations in classification performance across participants.
The overall average AUC of 0.84, with a standard deviation of 0.06, suggests relatively stable classification performance across participants. This implies that the model performs well in multi-class tasks, displaying good overall performance despite minor individual differences. Considering the collective performance of all participants, the model is effective and robust in recognizing the resting, memory, music, and subtraction states.
To assess the recognition capability for each state, we present the average confusion matrix for all participants in Figure 6. The "Resting" state is relatively easier to identify than "Memory", "Music", and "Subtraction", with the highest classification accuracy of 80%. Moreover, the classification accuracies for all four states exceed 72%. The accuracies for the four categories are thus quite close, suggesting good model performance with effective classification for each category.
The algorithm demonstrates a low false positive rate for the "Resting" state, indicating its ability to correctly identify instances of this state. For the "Memory" and "Music" states, the algorithm shows low misclassification rates, with both low false positives and false negatives, highlighting its accuracy in differentiating these states. Conversely, the higher false positive rate for the "Subtraction" state suggests room for improvement in classifying this state.
Figure 6. An average confusion matrix of the 7 participants using the TF-CNN-CFA network model on the self-collected dataset. The horizontal axis represents predicted labels, while the vertical axis denotes true labels. The numbers "0", "1", "2", and "3" represent "Resting", "Memory", "Music", and "Subtraction", respectively. The element (i, j) indicates the probability of samples of class i being classified as class j.
However, the classification accuracy for the "Memory" state is relatively lower, with misclassification probabilities as "Resting", "Music", and "Subtraction" of 8%, 11%, and 9%, respectively. This may be attributed to the complexity and diversity inherent in memory processes. Memory involves various cognitive processes and brain regions, producing differences in brain activity patterns that potentially increase the difficulty of identification. From a physiological perspective, memory entails the coordinated action of multiple brain regions, such as the hippocampus and frontal cortex; this complexity may result in diverse and challenging-to-capture features of memory states within EEG signals.
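The row-normalized confusion matrix described above (element (i, j) = fraction of class-i samples predicted as class j) can be computed directly from label vectors. A minimal NumPy sketch with toy labels (not the paper's data):

```python
import numpy as np

def normalized_confusion_matrix(y_true, y_pred, n_classes=4):
    """Row-normalized confusion matrix: entry (i, j) is the fraction of
    true-class-i samples predicted as class j, as in Figure 6."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.where(row_sums == 0, 1, row_sums)

# Toy labels: 0 = Resting, 1 = Memory, 2 = Music, 3 = Subtraction
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 0, 1, 1, 1, 0, 2, 2, 2, 3, 2]
cm = normalized_confusion_matrix(y_true, y_pred)
print(cm[0, 0])  # fraction of Resting trials classified correctly -> 0.75
```

Each row sums to 1, so the diagonal entries are the per-class accuracies quoted in the text.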

Ablation Analysis
This study proposed a channel and frequency attention (CFA) module aimed at extracting the channels and frequencies that contribute most to the classification task, thereby enhancing classification performance for the four subject-driven higher cognitive states. The experimental results are illustrated in Figure 7.
The experimental outcomes reveal that incorporating the CFA module improved the model's overall classification accuracy. With the CFA module, the average classification accuracy across participants was 0.7614, compared to 0.7005 without it. Particularly noteworthy is the marked improvement for participant 2 and participant 7, who initially exhibited poorer classification results. These results suggest that introducing the CFA module effectively boosts the classification accuracy of EEG signals.
Moreover, the proposed CFA module also enhances the model's interpretability and visualization. The CFA module highlights the importance weight of each signal channel in classifying the different states, making the model's decision-making process more transparent and understandable. Figure 8 illustrates the distribution of the average channel and frequency attention weight coefficients for the seven participants on brain topographic maps. Channels such as C5, FP1, FP2, C4, PO4, TP7, CP6, T7, F1, and F2 in the low-frequency range (0-15 Hz); FT8, FC4, FP1, F8, C4, PO7, POz, P5, FC3, and CP1 in the mid-frequency range (15-30 Hz); and AF8, C3, Fp2, FC4, CP2, PO8, C5, CP1, FC6, C4, and PO7 in the high-frequency range (30-45 Hz) play a significant role in classifying the four cognitive states.
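The paper's actual CFA architecture is specified in Table 2; the following is only an illustrative squeeze-and-excitation-style sketch of channel-and-frequency attention, with hypothetical tensor shapes and random (untrained) weights, not the published module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cfa_attention(x, w1, w2):
    """Squeeze-and-excitation-style channel/frequency attention sketch.
    x: (C, F, T) time-frequency features (C channels, F frequency bands).
    Squeeze: average over time -> (C, F); excite with a small MLP; the
    sigmoid gate reweights each (channel, band) feature map."""
    squeeze = x.mean(axis=2)                 # (C, F)
    hidden = np.maximum(squeeze @ w1, 0.0)   # ReLU bottleneck
    gate = sigmoid(hidden @ w2)              # (C, F) weights in (0, 1)
    return x * gate[:, :, None], gate

rng = np.random.default_rng(0)
C, F, T = 62, 3, 128                         # hypothetical: 62 channels, 3 bands
x = rng.standard_normal((C, F, T))
w1 = 0.1 * rng.standard_normal((F, 2))       # illustrative random weights,
w2 = 0.1 * rng.standard_normal((2, F))       # normally learned during training
out, gate = cfa_attention(x, w1, w2)
print(out.shape, gate.shape)
```

The learned `gate` values are what can be averaged across trials and projected onto scalp topographies, as in Figure 8.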

Comparative Analysis
We compared TF-CNN-CFA with EEGNet_4_2 [43], EEGNet_8_2 [43], EEGNex [44], ResNet18 [45], VGG16 [46], LeNet [47], and TF-CNN. EEGNet_4_2, EEGNet_8_2, and EEGNex take raw signals as input, while TF-ResNet18, TF-VGG16, TF-LeNet, TF-CNN, and TF-CNN-CFA take time-frequency maps as input after feature representation. The accuracy, precision, recall, F-score, Kappa, and AUC values for the different models on the 3000 ms data are presented in Table 4.
Table 4. A comparison of the different models on the 3000 ms data (five-fold cross-validation). The best results for each evaluation metric are highlighted in bold. The first three network models take raw signals as input, while the following five take time-frequency maps as input after feature representation.
The results demonstrate that the TF-CNN-CFA model has significant advantages across multiple key performance metrics. First, its accuracy reaches 0.76 ± 0.09, notably higher than that of the other models, indicating strong performance in correctly predicting sample categories. Second, its precision of 0.79 ± 0.07 surpasses that of the other models, signifying a high proportion of true positives among all predicted positive instances. Furthermore, in terms of recall, TF-CNN-CFA also performs well at 0.76 ± 0.09, successfully identifying most actual positive instances. The F1 score, which balances precision and recall, is 0.76 ± 0.10, showing good balanced performance in the classification task. Additionally, the Kappa coefficient of 0.68 ± 0.12 indicates substantial agreement between the model's predictions and the true labels beyond what would be expected by chance.
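Cohen's kappa is the observed agreement corrected for chance agreement, κ = (p_o − p_e)/(1 − p_e). A self-contained sketch with toy labels:

```python
import numpy as np

def cohens_kappa(y_true, y_pred, n_classes=4):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement with the true labels and p_e is agreement expected by chance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    p_o = np.mean(y_true == y_pred)
    p_e = sum((np.sum(y_true == k) / n) * (np.sum(y_pred == k) / n)
              for k in range(n_classes))
    return (p_o - p_e) / (1.0 - p_e)

y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3]
print(round(cohens_kappa(y_true, y_pred), 3))  # -> 0.833
```

A kappa of 0 means chance-level agreement and 1 means perfect agreement, which is why it complements raw accuracy for a balanced four-class task.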

Lastly, in terms of AUC, TF-CNN-CFA achieves 0.84 ± 0.06, demonstrating a large area under the ROC curve and superior overall performance. In conclusion, a comprehensive comparison across performance metrics shows that the TF-CNN-CFA model excels on this dataset, displaying strong classification and generalization abilities. The TF-CNN-CFA model therefore performs remarkably well in this task and merits further research and application.
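AUC can be read as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. A sketch of this rank formulation with toy scores (a multi-class macro-AUC would average it one-vs-rest over the four states):

```python
import numpy as np

def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count half), via pairwise comparison."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]  # toy classifier scores
print(binary_auc(labels, scores))         # 8 of 9 pairs ranked correctly
```

This pairwise-ranking definition is equivalent to the area under the empirical ROC curve plotted in Figure 5.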
We further compared the performance of the top four neural network architectures on our self-collected dataset: EEGNex, TF-LeNet, TF-CNN, and TF-CNN-CFA. Table 5 provides a detailed analysis of the accuracy (Acc) and Kappa coefficient values of these models across participants.
The results show that the TF-CNN-CFA model excels across the entire dataset, with an average accuracy of 0.7614, significantly higher than the other models. Particularly noteworthy is its performance for participant S7, where it achieves an accuracy of 0.9089 and a Kappa coefficient of 0.8784. This indicates robust and reliable overall performance on the dataset. Furthermore, the TF-CNN model performs well on participants S1 and S4, with accuracies of 0.8295 and 0.7366 and Kappa coefficients of 0.7724 and 0.6491, respectively, demonstrating strong performance on specific participants and decent generalization. While the TF-LeNet and EEGNex models also perform relatively well on some participants, their overall performance slightly lags behind that of TF-CNN-CFA and TF-CNN.
Considering the overall averages, the TF-CNN-CFA model significantly outperforms the others in accuracy and Kappa coefficient, reaching 0.7614 and 0.6818, respectively. The standard deviations reveal that the TF-CNN-CFA model's performance has low variance, indicating high stability and reliability on the dataset. This suggests that the model not only excels on individual participants but also generalizes well across the dataset.
To analyze the effects of the proposed model on EEG recognition for each state, we computed the average confusion matrix for participant 7 across the four models. As depicted in Figure 11, the horizontal axis represents the predicted EEG state categories, while the vertical axis denotes the actual categories. Diagonal values give the proportion of correct classifications; off-diagonal elements give misclassification proportions. In this study, we combined an attention module with a CNN to process EEG data; by extracting features from a global perspective and generating channel and frequency attention maps, we aimed to reduce errors in classifying the four brain states. For participant 7, the accuracies for the four states reached 88%, 90%, 91%, and 95%, respectively. Compared to the EEGNex, TF-LeNet, and TF-CNN models, the TF-CNN-CFA model demonstrated superior performance. The minimal differences in classification accuracy among categories further suggest stable and excellent performance.

Figure 11. Confusion matrix for participant 7 with the different models. The vertical axis represents the true labels of the four cognitive state categories, and the horizontal axis the predicted labels. "0", "1", "2", and "3" correspond to "Resting", "Memory", "Music", and "Subtraction", respectively. The element (i, j) represents the probability of a class-i sample being classified as class j.

Time Length Impact
In brain-computer interfaces (BCIs), a crucial issue is reducing the EEG signal length L while retaining sufficient information for robust classification; in other words, the aim is to achieve high classification accuracy with shorter EEG signal lengths.
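The sample counts used below imply a 256 Hz sampling rate (384 samples = 1.5 s). Under that assumption, the segmentation of a recording into non-overlapping windows of length L can be sketched as follows (the 62-channel montage is illustrative):

```python
import numpy as np

FS = 256  # sampling rate implied by L = 384 samples <-> 1.5 s

def segment_nonoverlapping(eeg, win_samples):
    """Split a (channels, samples) recording into non-overlapping windows;
    returns (n_windows, channels, win_samples), dropping any remainder."""
    n_ch, n_samp = eeg.shape
    n_win = n_samp // win_samples
    trimmed = eeg[:, :n_win * win_samples]
    return trimmed.reshape(n_ch, n_win, win_samples).transpose(1, 0, 2)

# 15 minutes of one cognitive state, hypothetical 62-channel montage
eeg = np.zeros((62, 15 * 60 * FS))
for L in (384, 512, 640, 768, 896, 1024, 1152, 1280):
    print(L, segment_nonoverlapping(eeg, L).shape[0])
```

Shorter windows yield more training samples per state (e.g. 600 windows at L = 384 versus 180 at L = 1280 for 15 min of data), which is the trade-off summarized in Table 6.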
To this end, we performed classification using EEG signals of different time lengths: L = 384 (1.5 s), 512 (2 s), 640 (2.5 s), 768 (3 s), 896 (3.5 s), 1024 (4 s), 1152 (4.5 s), and 1280 (5 s). The size of each dataset varies accordingly; Table 6 lists the number of training and testing samples. Following the established procedure, we trained the network with 5-fold cross-validation and 20 epochs per fold, and compared the results with the EEGNet_4_2, EEGNet_8_2, EEGNex, TF-ResNet18, TF-VGG16, TF-LeNet, and TF-CNN methods. We then compared the classification performance of TF-LeNet, TF-CNN, and our TF-CNN-CFA model across the different time lengths L. Table 7 presents the results for 1.5 s through 5 s; these correspond to the average validation accuracy from K-fold cross-validation (K = 5) for each of the four classes and the overall average.

Discussion
The average classification accuracy of the seven participants across the four cognitive states reached 76.14% ± 0.09, significantly outperforming several comparative methods.
This study introduces a deep learning method based on time-frequency maps that effectively distinguishes between these four states, validating that higher cognitive activities can serve as a potential BCI paradigm. This paradigm offers several advantages. First, it is simple and easy to implement, reducing the impact of individual differences on the results; participants do not require extensive training to acquire the skills, saving time and cost. Second, the diverse and engaging design of the four states alleviates cognitive load, enhancing operability and efficiency. This article represents the first attempt at EEG decoding based on high-level cognitive activities spanning several cognitive domains, including auditory processing, memory, and reasoning, and confirms the feasibility of utilizing such activities in BCIs. This study aims to translate brain thoughts into commands, thereby expanding the range of instructions available for BCIs.
The TF-CNN-CFA model proposed in this study significantly enhances the classification performance of BCI tasks. According to Table 4, direct classification of raw EEG signals yielded poor results, with accuracies of 0.27 ± 0.02 for EEGNet_4_2, 0.26 ± 0.03 for EEGNet_8_2, and 0.36 ± 0.08 for EEGNex. To address this, the raw EEG signals were transformed into time-frequency maps via continuous wavelet transform. These maps were then grouped and segmented by frequency range and decomposed into three-channel RGB images corresponding to 0-15 Hz, 15-30 Hz, and 30-45 Hz, each channel displaying the energy distribution within its frequency range. The time-frequency maps were then fed into the network for training. Table 4 shows improved classification after this transformation, with average accuracies of 0.43 ± 0.08 for TF-LeNet, 0.70 ± 0.08 for TF-CNN, and 0.76 ± 0.09 for TF-CNN-CFA. Converting the data into time-frequency maps is essential for advanced cognitive tasks: the experimental findings suggest that such activities may engage brainwave signals across various frequency ranges throughout the entire brain, making raw EEG signals ineffective as input. These activities are more intricate than simple imagery, rendering conventional EEG processing methods unsuitable. Furthermore, the TF-CNN-CFA model showed minimal differences in classification accuracy among categories, indicating stable performance. Introducing the CFA module significantly enhanced classification: as shown in Figure 7, the average accuracy across all subjects increased from 70.05% to 76.14%, and subjects 2 and 7 improved by 11.69% and 18.39%, respectively, after processing with the CFA module.
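A minimal NumPy sketch of the transformation described above (a complex-Morlet CWT followed by averaging into the three band channels). The 256 Hz sampling rate, wavelet parameters, and normalization are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

FS = 256  # assumed sampling rate

def morlet_cwt_power(signal, freqs, fs=FS, n_cycles=5):
    """Magnitude of a simple complex-Morlet CWT at the given frequencies."""
    power = np.empty((len(freqs), len(signal)))
    for i, f in enumerate(freqs):
        sigma = n_cycles / (2 * np.pi * f)            # Gaussian width (s)
        t = np.arange(-3 * sigma, 3 * sigma, 1 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma**2))
        wavelet /= np.abs(wavelet).sum()              # crude amplitude norm
        power[i] = np.abs(np.convolve(signal, wavelet, mode="same"))
    return power

def to_band_channels(power, freqs):
    """Average the map into the three 'RGB' bands: 0-15, 15-30, 30-45 Hz."""
    bands = [(0, 15), (15, 30), (30, 45)]
    return np.stack([power[(freqs >= lo) & (freqs < hi)].mean(axis=0)
                     for lo, hi in bands])

freqs = np.arange(2.0, 45.0)                            # analysis frequencies
sig = np.sin(2 * np.pi * 10 * np.arange(0, 3, 1 / FS))  # 3 s test tone at 10 Hz
rgb = to_band_channels(morlet_cwt_power(sig, freqs), freqs)
print(rgb.shape)  # three band channels over 3 s of samples
```

For a 10 Hz test tone, nearly all energy lands in the first (0-15 Hz) channel, mirroring how each RGB channel of the input images carries one frequency band's energy.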
Despite these relatively good results, some limitations warrant discussion. First, we can learn from the research approach of Veeranki et al. [48], who effectively applied non-linear signal processing to emotion detection via EDA; adopting such a non-linear approach could refine our EEG-based BCI system and enhance its ability to decipher complex cognitive processes. Second, while the multi-layer architecture and attention mechanism help capture relevant features efficiently, they also increase the number of parameters, raising computational cost and training time. Third, with this added complexity, the algorithm faces convergence challenges during training; optimizing such a network requires careful tuning of hyperparameters, including learning rates, dropout rates, and the number of epochs, which can be a painstaking and computationally expensive process. Fourth, the limited sample size in this study may lead to overfitting, which could impact the model's ability to generalize to unseen EEG data; research on cognitive state classification with larger-scale datasets is underway. Fifth, classifier performance depends heavily on EEG denoising; this paper uses the common ICA method for preprocessing, but more robust denoising methods proposed in recent years [49][50][51][52][53] may further improve classification, and we will evaluate the effect of EEG noise on classification performance in future studies. In addition, cognitive activity is heavily subject-dependent and varies widely between subjects; stable feature representation for cognitive BCI remains a direction for further exploration.
Interest in cognitive BCI has grown in recent years. Vansteensel et al. [54] documented the successful manipulation of a computer cursor toward a target through the modulation of gamma-band electrocorticography (ECoG) activity within the left dorsolateral prefrontal cortex (DLPFC). Ryun et al. [55] demonstrated that two distinct motor actions (hand grasping and elbow bending) could be predicted prior to execution by analyzing prefrontal ECoG signals. These invasive investigations illuminated the neural mechanisms supporting cognitive BCIs and bolstered the practical potential of this technology. Simultaneously, efforts to create non-invasive cognitive BCIs in humans using electroencephalography (EEG) [56,57] and near-infrared spectroscopy (NIRS) [58] have gained momentum, validating the practicality of non-invasive cognitive BCI systems.

Conclusions
This study explores a novel EEG-BCI paradigm involving four subject-driven cognitive state tasks: resting, narrative memory, music, and subtraction. Employing the TF-CNN-CFA model with a channel and frequency attention module to classify these four cognitive states yielded an average classification accuracy of 76.14% across seven participants, significantly outperforming other EEG-BCI classification methods. This study also enriches the set of cognitive brain-computer interface paradigms.

Figure 1 .
Figure 1. The overall framework of the proposed TF-CNN-CFA model. The network consists of a feature extraction module, a channel and frequency attention (CFA) module, a convolution module, and a fully connected module.

Figure 2 .
Figure 2. Experimental paradigm design. Each participant completed three experimental sessions, each comprising 5 blocks. Within each block, participants engage in four types of mental imagery: resting, memory, music, and subtraction. In each block, a "+" signal appears on the screen to indicate the beginning of the task. Cue words for the four mental states are then displayed for 60 s, followed by a 24 s rest period. There is no break between the 5 blocks within a session, but a five-minute rest interval separates sessions. By the end of the experiment, each participant has 15 min of EEG data recorded for each mental imagery task state.

Figure 4 .
Figure 4. Data preprocessing flowchart. The preprocessing procedure involved band-pass filtering (0.1-45 Hz), average referencing, Independent Component Analysis (ICA), interpolation repair, data segmentation, and baseline correction. Continuous wavelet transform was then used to create time-frequency images, which were organized by frequency range and decomposed into RGB channels representing 0-15 Hz, 15-30 Hz, and 30-45 Hz.


Figure 5 .
Figure 5. ROC curves. The horizontal axis represents the false positive rate (FPR), and the vertical axis the true positive rate (TPR). Each participant has an individual ROC curve, with the thick blue line depicting the average ROC curve across all participants. The shaded area indicates the standard deviation, and the diagonal line represents chance-level classification.


Figure 6 .
Figure 6. An average confusion matrix of the 7 participants using the TF-CNN-CFA network model on the self-collected dataset. The horizontal axis represents predicted labels, while the vertical axis denotes true labels. The numbers "0", "1", "2", and "3" represent "Resting", "Memory", "Music", and "Subtraction", respectively. The element (i, j) indicates the probability of samples of class i being classified as class j.

Figure 7 .
Figure 7. Channel and frequency attention (CFA) module performance analysis. The horizontal axis represents participants, and the vertical axis accuracy.


Figure 8 .
Figure 8. Brain channel and frequency attention weight distribution maps for the seven participants (normalized). Blue indicates weak brainwave signals in an area, while darker shades of red indicate stronger signals in that region.

3.1.2. Qualitative Analysis (Visual Analysis)
Following the quantitative evaluations, we conducted a qualitative assessment using t-distributed Stochastic Neighbor Embedding (t-SNE) [42] to evaluate the discriminative capability of the feature vectors. t-SNE is widely used for projecting high-dimensional data onto a two-dimensional scatter plot. All experiments were based on our self-collected dataset, with individual participant classification experiments conducted under the same training strategy. In Figure 9, the labels of all categories of participant 7's original EEG data are evenly distributed in the t-SNE plot. Different colors represent the labels of EEG signals for the four brain states (resting, memory, music, and subtraction). The confusion matrix of participant 7 after classification and its corresponding t-SNE plot are presented in Figure 10. The t-SNE plot and confusion matrix exhibit similar trends: under high classification accuracy, the t-SNE scatter plot shows closer clustering within the same category and clearer separation between categories, while misclassifications appear as overlap between the scatter points of different categories.


Table 1 .
Specific network structure.

Table 2 .
Channel and frequency attention (CFA) module's network structure.

Table 5 .
Performance comparison of the four best neural network architectures on the self-collected dataset (3000 ms). The best accuracy and Kappa coefficient values for each subject are highlighted in bold.

Table 6 .
The number of training and testing samples obtained by segmenting the original measurement data into different non-overlapping lengths.