Article

Evaluating the Impact of Demographic Factors on Subject-Independent EEG-Based Emotion Recognition Approaches

by Nathan Douglas 1,†, Maximilien Oosterhuis 1,† and Camilo E. Valderrama 1,2,*

1 Department of Applied Computer Science, University of Winnipeg, Winnipeg, MB R3B 2E9, Canada
2 Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB R3B 2E9, Canada
* Author to whom correspondence should be addressed.
† Shared first authorship.
Diagnostics 2026, 16(1), 144; https://doi.org/10.3390/diagnostics16010144
Submission received: 27 August 2025 / Revised: 9 December 2025 / Accepted: 26 December 2025 / Published: 1 January 2026
(This article belongs to the Special Issue 3rd Edition: AI/ML-Based Medical Image Processing and Analysis)

Abstract

Background: Emotion recognition using electroencephalography (EEG) offers a non-invasive means of measuring brain responses to affective stimuli. However, since EEG signals can vary significantly between subjects, developing a deep learning model capable of accurately predicting emotions is challenging. Methods: To address that challenge, this study proposes a deep learning approach that fuses EEG features with demographic information, specifically age, sex, and nationality, using an attention-based mechanism that learns to weigh each modality during classification. The method was evaluated using three benchmark datasets: SEED, SEED-FRA, and SEED-GER, which include EEG recordings of 31 subjects of different demographic backgrounds. Results: We compared a baseline model trained solely on the EEG-derived features against an extended model that fused the subjects’ EEG and demographic information. Including demographic information improved the performance, achieving 80.2%, 80.5%, and 88.8% for negative, neutral, and positive classes. The attention weights also revealed different contributions of EEG and demographic inputs, suggesting that the model learns to adapt based on subjects’ demographic information. Conclusions: These findings support integrating demographic data to enhance the performance and fairness of subject-independent EEG-based emotion recognition models.

1. Introduction

This article extends our earlier conference paper [1] by incorporating additional analyses, including a demographic ablation study and an attention-weight interpretation, providing deeper insights into how age, sex, and nationality influence subject-independent EEG-based emotion recognition.
Emotions are regulated by the hippocampus, amygdala, and prefrontal cortex (PFC) [2]. These brain regions interact to associate emotional stimuli with memories and generate appropriate responses [3,4]. The interaction between the hippocampus, amygdala, and PFC can be measured indirectly using electroencephalography (EEG) [5]. This recorded electrical activity can be used to develop machine learning models to predict emotions [6,7,8,9]. Recent advances in artificial intelligence and multimodal learning have significantly improved EEG-based emotion recognition, with deep spatiotemporal fusion models demonstrating state-of-the-art results [10,11,12,13,14,15,16,17].
To develop EEG-based emotion recognition models, two approaches can be used: subject-dependent and subject-independent [18]. The subject-dependent approach uses EEG signals from the same subjects for both the training and testing phases, so each subject's data appears in both sets. In contrast, the subject-independent approach builds a model from a group of subjects' EEG signals and tests it on a different individual not included in the training phase. Of these two approaches, the subject-independent one is more practical, as it yields a model that does not require recalibration for a new user. However, given the high variability of EEG signals across individuals, models developed under the subject-independent approach tend to yield lower classification performance than those trained using the subject-dependent approach [19].
The problem affecting subject-independent approaches is known as the domain-shift problem [20], in which the assumption that the training and test set features come from the same distribution is violated. In the case of EEG, as signals from different subjects exhibit different distributions, the features extracted from those signals also display different distributions. As a result, the patterns learned by the model from the training set may not fully apply to new individuals, thereby reducing the predictive performance of the models [21,22].
To address the domain-shift problem, one approach is to include the demographics of the subjects alongside EEG signals to train the model [5]. For instance, Li et al. showed that including the age and biological sex of the subjects in the last layer can improve emotion prediction [23]. Similarly, Peng et al. compared a model trained and tested using same-sex subjects with a model trained on subjects of the same sex and tested on subjects of the opposite sex (i.e., cross-sex model), reporting that same-sex models outperformed the cross-sex model for predicting emotions [24]. This strategy was also used in [25], where models trained with subjects from the same nationality were compared with cross-nationality models. Again, the models trained and tested on individuals from the same nationality yielded a higher predictive performance, thus suggesting that nationality influences the patterns extracted by the model for making the predictions. Finally, in [26,27], it was shown that including the subjects’ biological information and nationality can increase the likelihood of correctly predicting emotions.
Previous studies have demonstrated the potential of demographic information for improving the performance of subject-independent emotion recognition approaches. Incorporating factors such as sex, age, or nationality can enable the extraction of group-specific patterns that enhance prediction accuracy [28,29,30,31]. However, existing works present two main limitations. First, Li et al. [23] only considered age and sex within a dataset of subjects from the same nationality, thereby neglecting the influence of cultural background on emotion perception, a factor that previous research has identified as critical when comparing interdependent and individualistic societies [32,33]. Second, the comparisons made in [24,25] between same-sex and same-nationality models and their cross-sex and cross-nationality counterparts limit practical applicability, as demographic group membership may not always be known in advance. Moreover, this strategy requires training and maintaining multiple deep learning models tailored to each demographic subgroup, which is impractical in scenarios with limited data availability for specific populations. Therefore, further research is needed to develop a unified approach that integrates demographic information with EEG signals, enabling models to generalize across diverse populations irrespective of sex, age, or cultural background. Table 1 summarizes prior studies integrating demographic factors into EEG-based emotion recognition and highlights the methodological limitations that motivate our proposed approach.
To address these limitations, the present study proposes a deep learning framework that fuses EEG signals with demographic information to improve emotion recognition under a subject-independent setting. Specifically, we evaluate our approach using three benchmark emotion recognition datasets (SEED, SEED-FRA, and SEED-GER), which include EEG data from 31 participants of different sexes, ages, and nationalities. The main contributions of this work are as follows:
  • We propose a unified deep learning framework that integrates EEG features with three key demographic variables (sex, age, and nationality) for emotion recognition.
  • We analyze the relative influence of each demographic variable on recognition performance, providing insights into their individual and combined effects.
  • We show that fusing demographic information enhances the generalization capability of subject-independent emotion recognition models across diverse populations.

2. Materials and Methods

2.1. Dataset

The study used three benchmark EEG datasets commonly used for emotion recognition [5]: SEED (SJTU Emotion EEG Dataset) [34], SEED-FRA [25], and SEED-GER [25]. Table 2 describes these datasets. All datasets include signals recorded from 62 EEG channels using a 10–20 system EEG cap.
SEED contains EEG signals from 15 subjects who watched a series of film clips in Chinese. These clips were labeled as positive, neutral, and negative to capture emotional states. All subjects participated in three sessions, watching 15 clips each (5 per emotion). As a result, a total of 45 EEG recordings were collected for each subject.
SEED-FRA and SEED-GER are extensions of SEED, each consisting of 8 subjects who were shown clips in French and German, respectively. In SEED-FRA, each subject watched 21 videos per session across three sessions, resulting in a total of 63 EEG recordings per subject. In SEED-GER, each subject viewed 18 videos, yielding 54 EEG recordings per subject.
All datasets were class-balanced, with the same number of videos per emotion. Specifically, the total number of videos per emotion across the three sessions was 15, 21, and 18 for SEED, SEED-FRA, and SEED-GER, respectively.

2.2. EEG Preprocessing

As EEG signals are prone to noise, we filtered them using a Butterworth filter within the range of 1–75 Hz. This range was selected to ensure the inclusion of the brain frequency bands (delta (δ): 1–4 Hz, theta (θ): 4–8 Hz, alpha (α): 8–12 Hz, beta (β): 12–30 Hz, and gamma (γ): 30–50 Hz) during the feature extraction process. Thus, this filter removed higher-frequency components associated with eye blinking and motor functions.

2.3. Differential Entropy

The EEG signals were segmented using a four-second window. To calculate the differential entropy (DE) of each window, the EEG signals were assumed to follow a Gaussian distribution [35]. Under this assumption, the DE was calculated as follows:
$$DE_{ch,b} = \frac{1}{2}\ln\left(2\pi e\,\sigma_{ch,b}^{2}\right),$$
where $ch$ denotes the EEG channel (e.g., FP1, O2, T1), $b$ denotes the frequency band ($\delta$, $\theta$, $\alpha$, $\beta$, or $\gamma$), and $\sigma_{ch,b}^{2}$ is the power spectral density (PSD) for channel $ch$ and band $b$. The PSD was computed through the discrete Fourier transform (DFT) as:
$$\sigma_{ch,b}^{2} = \frac{1}{N_b}\sum_{k=0}^{N_b} \left| X[k] \right|^{2},$$
where $X$ is the DFT of the 4-s EEG window and $N_b$ is the number of frequency components in the $b$-th band.
Since DE was computed for each EEG channel and frequency band, this procedure resulted in a feature matrix of dimension 62 × 5 for each of the 4-s segments.
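For concreteness, the following Python sketch shows one way to compute the DE feature matrix for a single 4-s window under the Gaussian assumption above. The sampling rate, band limits, and function names are illustrative assumptions and not taken from the original implementation.

```python
import numpy as np

# Band limits (Hz) as defined in the preprocessing section
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 50)}

def de_features(window, fs=200):
    """Compute differential entropy per channel and band for a 4-s EEG window.

    window: array of shape (n_channels, n_samples), e.g., (62, 4 * fs)
    Returns an array of shape (n_channels, n_bands), i.e., 62 x 5.
    """
    n_channels, n_samples = window.shape
    spectrum = np.fft.rfft(window, axis=1)            # DFT per channel
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)    # frequency of each bin
    de = np.zeros((n_channels, len(BANDS)))
    for b, (lo, hi) in enumerate(BANDS.values()):
        mask = (freqs >= lo) & (freqs < hi)
        # Average power in the band, used as the Gaussian variance estimate
        power = np.mean(np.abs(spectrum[:, mask]) ** 2, axis=1)
        de[:, b] = 0.5 * np.log(2 * np.pi * np.e * power)
    return de

# Example: random stand-in for one 4-s, 62-channel window sampled at 200 Hz
de_matrix = de_features(np.random.randn(62, 800))
print(de_matrix.shape)  # (62, 5)
```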

2.4. Standardization

Before model training, both EEG signals and demographic data were standardized. EEG signals were individually standardized for each session per subject using a z-score transformation:
$$z = \frac{x - \mu}{\sigma},$$
where $x$ represents the DE feature vector, and $\mu$ and $\sigma$ are the mean and standard deviation computed per session per subject.
For the demographic data, one-hot encoding was applied to the categorical variables nationality and sex. All demographic features were then standardized using the same z-score transformation to ensure consistency across variables.
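As a minimal sketch of these steps (assuming the DE features of one subject-session are stored as a NumPy array and using an illustrative encoding of the demographic record), the standardization and encoding could look as follows; the paper's exact encoding of sex and the subsequent z-scoring of the demographic columns may differ.

```python
import numpy as np

def zscore(features):
    """Per-session z-score standardization (rows = 4-s windows, columns = DE features)."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-8   # avoid division by zero for flat features
    return (features - mu) / sigma

def encode_demographics(age, sex, nationality):
    """Encode the demographic record as 4 values: age, sex, and 2 nationality bits."""
    sex_bit = {"male": 0.0, "female": 1.0}[sex]
    nat_bits = {"Chinese": (0.0, 0.0), "French": (1.0, 0.0), "German": (0.0, 1.0)}[nationality]
    return np.array([age, sex_bit, *nat_bits])

session_de = np.random.randn(500, 310)        # stand-in for 500 windows x 310 DE features
x_eeg = zscore(session_de)
x_demo = encode_demographics(23, "female", "French")
```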

2.5. Deep Learning Approaches

Two deep learning approaches were used to train the models. Approach 1 served as a baseline, trained solely on the DE features, and was used to establish a benchmark for the second approach. Approach 2 built on the first by incorporating demographic information: sex (male or female), age, and nationality (Chinese, French, or German). The baseline model was used to evaluate the impact of adding such demographic data on the training process and overall performance.
All approaches were conducted on a workstation equipped with an AMD Ryzen Threadripper PRO 5975WX 32-core CPU, 503 GB of RAM, and three NVIDIA RTX A6000 GPUs (48 GB VRAM each) running CUDA 12.8.

2.5.1. Approach 1: The Baseline Model

To derive the most appropriate deep learning model for Approach 1, we evaluated four deep learning models in terms of their effectiveness in capturing temporal and contextual dependencies in the EEG data. This systematic evaluation provided insight into their strengths and weaknesses, ultimately leading to the selection of the most suitable approach for emotion recognition. First, we used a convolutional neural network (CNN) [36] to extract spatial features. Second, we used a graph neural network (GNN) [37] to relate the DE features of the channels based on the proximity between them. Since the first two models focused mainly on spatial relationships, we also explored models that capture temporal variations. Thus, in our third model, we equipped the GNN with a long short-term memory (LSTM) network [38] to improve sequential modeling. Finally, to further capture long-range dependencies, we used a model composed of a GNN and a Transformer [39].
Each model (CNN, GNN, GNN + LSTM, and GNN + Transformer) was implemented with comparable representational capacity. The CNN consisted of two convolutional layers followed by a fully connected output layer. The first convolutional layer used 32 filters, 3 × 3 kernel, and a stride of 1, followed by max-pooling with a size of (2, 1) and a dropout of 0.2. The second convolutional layer used 64 filters, with the same kernel and pooling configuration. The output from the final convolution layer was flattened and passed through a fully connected layer that produced the three output classes.
The standalone GNN model consisted of a single graph convolutional layer using a normalized adjacency matrix to define node relationships, with five input features and eight output features. A ReLU activation function was applied after this layer, followed by a dropout rate of 0.2 and a fully connected layer mapping to the three output classes.
The GNN + LSTM model used the same GNN described above. The resulting spatial features were flattened and fed into a bidirectional LSTM with 32 hidden units to capture temporal dynamics across windows. The concatenated forward and backward states were passed through a dropout layer and a fully connected layer.
In the GNN + Transformer model, each EEG window (62 nodes × 5 features) was first processed by the same GNN layer described previously, producing eight features per node with SiLU activation, dropout, and layer normalization. The resulting node embeddings were flattened into tokens of size 62 × 8 = 496 and combined with a learnable positional embedding. These tokens were passed through a single Transformer encoder layer with 4 attention heads, a feed-forward layer with a dimension of 992 (2 × 496), and a dropout rate of 0.3. The encoded sequence was mean-pooled, followed by dropout 0.3 and a fully connected layer that produced the three output classes. Table 3 summarizes the architectures of the used models.
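To make the baseline architecture concrete, below is a minimal PyTorch sketch of the GNN + Transformer model. The adjacency construction, activation, normalization, dropout placement, and maximum number of windows are simplified assumptions for illustration rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Single graph convolution: adjacency times node features, then a linear map."""
    def __init__(self, in_feats=5, out_feats=8, n_nodes=62):
        super().__init__()
        self.linear = nn.Linear(in_feats, out_feats)
        # Placeholder adjacency; the paper uses a normalized adjacency built from electrode proximity.
        self.register_buffer("adj", torch.eye(n_nodes))

    def forward(self, x):                      # x: (batch, nodes, in_feats)
        return torch.relu(self.linear(self.adj @ x))

class GNNTransformer(nn.Module):
    """Baseline sketch: graph layer per window, then a Transformer encoder over windows."""
    def __init__(self, n_nodes=62, g=8, n_classes=3, max_windows=64):
        super().__init__()
        self.gnn = GraphConv(out_feats=g, n_nodes=n_nodes)
        d_model = n_nodes * g                  # 62 * 8 = 496 token size
        self.pos = nn.Parameter(torch.zeros(1, max_windows, d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                               dim_feedforward=2 * d_model,
                                               dropout=0.3, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                      # x: (batch, windows, nodes, feats)
        b, w, n, f = x.shape
        tokens = self.gnn(x.reshape(b * w, n, f)).reshape(b, w, -1)
        tokens = tokens + self.pos[:, :w]      # learnable positional embedding
        encoded = self.encoder(tokens)
        return self.head(encoded.mean(dim=1))  # mean-pool over windows

logits = GNNTransformer()(torch.randn(2, 10, 62, 5))
print(logits.shape)  # (2, 3)
```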

2.5.2. Approach 2: The Extended Model

Approach 2 extends Approach 1 by combining its output with demographic variables. Three techniques were explored to perform this task: concatenation, CNN fusion, and attention fusion [15]. Because attention fusion achieved the highest performance, we selected it for Approach 2, as shown in Figure 1.
The architecture was an extension of the GNN + Transformer model: it appended the demographic features (age, sex, and nationality) after the Transformer mean pooling. The architecture consisted of two branches: one processed the 310 EEG-derived features using the same deep neural model from Approach 1, while the other encoded the demographic attributes. Each branch was passed through a fully connected layer, reducing the EEG output to 16 units ($O_1$) and expanding the demographic input from 4 to 16 units ($O_2$). The resulting embeddings ($O_1$ and $O_2$) were then fused using an attention mechanism. Specifically, each of them was multiplied element-wise with a shared 16-unit weight vector $A$, and the attention weights were computed as $\omega_1 = \frac{\exp(A \cdot O_1)}{\exp(A \cdot O_1) + \exp(A \cdot O_2)}$ and $\omega_2 = 1 - \omega_1$. The final fused features were computed as $O_{fusion} = \omega_1 O_1 + \omega_2 O_2$.
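The fusion step can be sketched in PyTorch as follows. Here the attention weights are computed per dimension (consistent with the 16 per-dimension weights reported later for Figure 8); the classifier head and input dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Sketch of the two-branch attention fusion: EEG embedding and demographics, 16 units each."""
    def __init__(self, eeg_dim=496, demo_dim=4, fused_dim=16, n_classes=3):
        super().__init__()
        self.eeg_proj = nn.Linear(eeg_dim, fused_dim)    # produces O1
        self.demo_proj = nn.Linear(demo_dim, fused_dim)  # produces O2 (the expansion layer L)
        self.A = nn.Parameter(torch.randn(fused_dim))    # shared 16-unit attention vector
        self.classifier = nn.Linear(fused_dim, n_classes)

    def forward(self, eeg_embedding, demographics):
        o1 = self.eeg_proj(eeg_embedding)                       # (batch, 16)
        o2 = self.demo_proj(demographics)                       # (batch, 16)
        s1, s2 = o1 * self.A, o2 * self.A                       # element-wise products with A
        w1 = torch.exp(s1) / (torch.exp(s1) + torch.exp(s2))    # per-dimension weight for EEG
        w2 = 1.0 - w1                                           # per-dimension weight for demographics
        fused = w1 * o1 + w2 * o2                               # O_fusion
        return self.classifier(fused), (w1, w2)

logits, (w1, w2) = AttentionFusion()(torch.randn(8, 496), torch.randn(8, 4))
```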

2.6. Hyperparameter Selection

To ensure a fair and unbiased comparison across all models and approaches, we applied the same hyperparameter search strategy to each of them. Specifically, each model underwent an identical grid search over a range of values for the learning rate and weight decay, ensuring that no model received preferential tuning. This strategy allows each architecture to be evaluated near its optimal configuration, enabling an unbiased comparison across architectures.
The search covered the learning rate, weight decay, and dropout rate, which were tuned by evaluating performance on the training data and selecting the values that yielded the highest accuracy. While early stopping can prevent overfitting, we chose to train each model for a fixed number of epochs to maintain a consistent number of training epochs across all subjects in our leave-one-subject-out cross-validation (LOSOCV). This approach ensured consistency across subjects in the evaluation process while leveraging the individually optimized hyperparameters.
Table 4 summarizes the selected hyperparameters. All training approaches used a batch size of 32. To select optimal learning rates and weight decays, we performed a grid search over the learning rate values $\{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}\}$ and weight decay values $\{0, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}$, selecting the configuration that produced the best overall performance for each model. The simpler CNN and GNN models were trained using stochastic gradient descent (SGD), whereas the larger GNN + LSTM and GNN + Transformer models used the AdamW optimizer with heavy regularization (weight decay of 0.1).
Importantly, tuning each model within the same search space ensures that the comparison reflects the models' actual capabilities rather than arbitrary hyperparameter choices.
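A schematic of the shared search loop is shown below; `build_model` and `train_and_score` are placeholders for the model constructor and training routine, and the value grids are those listed above.

```python
import itertools

LEARNING_RATES = [1e-5, 1e-4, 1e-3, 1e-2]
WEIGHT_DECAYS = [0.0, 1e-4, 1e-3, 1e-2, 1e-1]

def grid_search(build_model, train_and_score):
    """Run the identical grid for every architecture and return the best (lr, wd) pair."""
    best_config, best_score = None, -1.0
    for lr, wd in itertools.product(LEARNING_RATES, WEIGHT_DECAYS):
        score = train_and_score(build_model(), lr, wd)   # e.g., training accuracy after fixed epochs
        if score > best_score:
            best_config, best_score = (lr, wd), score
    return best_config, best_score
```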

2.7. Performance Metrics

To evaluate model performance in a subject-independent manner, we again employed LOSOCV using the model configured with the best hyperparameters. This approach involved iterating over each of the 31 subjects, using the data from the selected subject as the test set while training the deep learning model on data from the remaining 30 subjects. Consequently, the LOSOCV process yielded a separate performance metric for each subject.
Since the datasets were balanced across the emotion classes, we used accuracy to evaluate the performance of the deep learning models. Accuracy was calculated in two ways: per-subject performance and per-emotion performance. Per-subject performance was computed separately for each subject to evaluate how well the model performed across different users. This approach accounted for potential variations due to demographic imbalances and provided insight into the model’s ability to generalize. Per-emotion accuracy was used to measure the model’s effectiveness in distinguishing between different emotional states, helping to identify whether certain emotions were more easily classified than others. Additionally, we computed the macro F1-score and the macro area under the receiver operating characteristic curve (ROC-AUC).
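A minimal sketch of the LOSOCV loop with per-subject accuracy follows; `train_fn` and `predict_fn` are placeholders for the model training and inference routines.

```python
import numpy as np

def losocv_accuracy(subject_ids, features, labels, train_fn, predict_fn):
    """Leave-one-subject-out cross-validation returning a per-subject accuracy dictionary."""
    per_subject = {}
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject                    # held-out subject
        model = train_fn(features[~test_mask], labels[~test_mask])
        predictions = predict_fn(model, features[test_mask])
        per_subject[subject] = float(np.mean(predictions == labels[test_mask]))
    return per_subject
```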

2.8. Impact of Demographics

To evaluate whether incorporating demographic information improved prediction accuracy, we calculated the performance difference for each subject between the two approaches. For each demographic variable, we compared performance differences across their respective categories: female and male for biological sex; Chinese, French, and German for nationality; and younger than or equal to 23 years versus older than 23 years for age. To determine whether these differences were statistically significant, we conducted a two-sided paired Wilcoxon signed-rank test. The null hypothesis posited that the performance difference between the baseline and extended approaches was symmetric around zero, while the alternative hypothesis assumed an asymmetric distribution of differences around zero. The hypothesis test provided p-values, where a lower p-value suggested statistically significant differences.
To avoid false positives due to multiple comparisons, we adjusted p-values using Bonferroni correction. For reference, we interpreted p-values between 0.1 and 0.05 as weak evidence, between 0.05 and 0.01 as moderate evidence, and less than 0.01 as strong evidence of a significant difference.
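The group-wise comparison can be sketched with SciPy as follows; the grouping masks and variable names are illustrative.

```python
import numpy as np
from scipy.stats import wilcoxon

def compare_groups(baseline_acc, extended_acc, group_masks):
    """Two-sided paired Wilcoxon signed-rank test per group, with Bonferroni-adjusted p-values."""
    diffs = np.asarray(extended_acc) - np.asarray(baseline_acc)   # per-subject accuracy difference
    raw = {name: wilcoxon(diffs[mask], alternative="two-sided").pvalue
           for name, mask in group_masks.items()}
    n_tests = len(raw)
    return {name: min(p * n_tests, 1.0) for name, p in raw.items()}

# Example grouping by sex for 31 subjects (the masks here are illustrative):
# adjusted = compare_groups(baseline_acc, extended_acc, {"female": is_female, "male": ~is_female})
```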

2.9. Demographic Ablation Analysis

To evaluate the contribution of demographic information to model performance, we conducted an ablation study in which each demographic variable was removed individually. The goal of this analysis was to identify which variable caused the greatest performance reduction when excluded. A significant decrease in performance indicates that the variable plays a crucial role in emotion prediction and should therefore be carefully considered when developing subject-independent emotion recognition models.

2.10. Demographic Feature Importance

To analyze the extent to which the extended model (Approach 2) relied on EEG-derived features and demographic information, we visualized the average weights ($\omega_1$ and $\omega_2$) obtained across the 31 subjects. This analysis allowed us to determine whether the extended model focused more on one of these two inputs or if both inputs contributed equally to the emotion predictions. Additionally, we measured the importance of the demographic variables by considering the expanded feature representation shown in Figure 1, which was defined as:
$$O_2 = f(X_2) = X_2 L,$$
where $X_2$ is the matrix containing the original demographic features of dimension $N \times 4$, $L$ is the weight matrix of the linear layer used for expansion, and $O_2$ is the expanded demographic data of dimension $N \times 16$. As the contribution of the expanded demographics was given by $\omega_2 O_2$, the contribution for the $j$-th dimension of $O_2$ was:
$$\omega_{2,j} O_2^{(j)} = \omega_{2,j} X_2 L^{(j)} = X_2 \left( \omega_{2,j} L^{(j)} \right),$$
where $\omega_{2,j}$ is the $j$-th entry of the $\omega_2$ vector and $L^{(j)}$ is the $j$-th column of the matrix $L$. Thus, the contribution of each original demographic feature corresponds to the rows of the matrix $\omega_2 L$.
To aggregate the contributions of each feature, we first computed the absolute value of $\omega_2 L$. We then calculated the average across columns, resulting in a vector of four entries. These vectors were normalized per subject, and the mean across subjects was subsequently calculated to identify the demographic features that were most relevant across the cohort.
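A short NumPy sketch of this aggregation, assuming the demographic-branch attention weights and the expansion-layer weight matrix have already been extracted from the trained model:

```python
import numpy as np

def demographic_importance(omega2, L):
    """Per-feature importance from the attention weights and the expansion layer.

    omega2: array of shape (16,), attention weights of the demographic branch
    L: array of shape (4, 16), weight matrix of the linear expansion layer
    Returns a length-4 vector (age, sex, nationality bits) normalized to sum to 1.
    """
    contrib = np.abs(omega2 * L)          # scale column j of L by omega2[j], take absolute value
    importance = contrib.mean(axis=1)     # average across the 16 expanded dimensions
    return importance / importance.sum()  # per-subject normalization
```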

3. Results

3.1. Model Selection

Table 5 and Table 6 present the performance of the models evaluated under Approach 1. The CNN achieved the lowest performance, likely due to its limited ability to capture temporal dependencies and its assumption of a grid-like data structure. To address spatial relationships more effectively, a GNN was employed, which improved the results. Building on this, we combined the GNN with an LSTM to model sequential dependencies, leading to an overall accuracy of 80%. Finally, replacing the LSTM with a transformer further enhanced the ability to capture both temporal and spatial patterns, yielding a performance of 82%. Consequently, the GNN+Transformer architecture was selected as the baseline model.

3.2. Model Performance

Figure 2 illustrates the average training and testing accuracy of the extended model across 80 epochs. The model shows stable convergence, with training accuracy steadily increasing toward 100%. Test accuracy, however, plateaued around 84% after approximately 25 epochs, indicating a generalization gap of roughly 16%. Early stopping was also evaluated, but it did not substantially change the observed accuracy trends. Importantly, the test curve remained stable and did not decrease, suggesting that although the model overfitted to some extent, as expected for deep models trained on EEG data, it did not exhibit harmful divergence or collapse.
Table 7 presents the average, standard deviation, and 95% confidence intervals for the recall, precision, and macro F1-score across the 31 subjects. The extended model achieved a balanced performance across the three classes, as reflected by F1-scores ranging from 82.3% to 88.5%. To further examine the class-wise distribution, Figure 3 displays the confusion matrix obtained via LOSOCV. For the negative samples, 9.8% were misclassified as neutral, while 7.4% of the positive samples were classified as negative. Nevertheless, the diagonal values indicate that the model consistently achieved performance above 82% for all three classes.

3.3. Comparison Between Baseline and Extended Models

Figure 4 shows the difference, per subject, between the extended and baseline models in predicting negative, neutral, positive, and overall emotions. The overall accuracy of the extended model decreased for 15 subjects, increased for another 15, and showed no change for one subject. The extended model improved performance for negative and positive emotions by 3.37% and 7.33%, respectively. It also achieved an average overall improvement of 2.02% across all subjects.
Regarding overall improvement, Table 8 shows the 95% confidence intervals for the difference between the extended and baseline models after including demographic variables. For negative and positive emotions, the confidence intervals indicated significant improvements of 1.1–7.9% and 0.9–5.9%, respectively. These effects were reflected in the overall prediction, where the 95% CI for the improvement across the 31 subjects was 1.6–6.1%.

3.4. Performance by Demographic Group

Each subplot (a–d) in Figure 5 displays a specific emotion within each nationality group. A significant improvement was observed in the German group for negative emotions (two-sided Wilcoxon signed-rank test; p-value = 0.04). For neutral emotions, the Chinese group showed a near-significant improvement (p-value = 0.07), while the French group showed a significant improvement for positive emotions (p-value = 0.04). Overall accuracy showed suggestive improvements in the Chinese (p-value = 0.07) and German (p-value = 0.08) groups.
Figure 6 shows the performance difference between approaches by sex groups. We observed a near-significant improvement for males in recognizing negative emotions (p-value = 0.07). For neutral emotions, a near-significant improvement was observed for females (p-value = 0.08), while for positive emotions, a significant improvement was observed in the female group (p-value = 0.02). Overall, males showed a suggestive improvement in accuracy (p-value = 0.08), whereas females showed a significant improvement (p-value = 0.01).
Figure 7 shows the comparison across age groups. For negative emotions, there was a significant improvement for subjects younger than 23 (p-value = 0.05). For neutral emotions, there were no significant improvements for either age group (p-value = 0.35 and p-value = 0.23). Both groups showed near-significant improvements for positive emotions. Finally, for the overall predictions, there was weak evidence of improvement for those over 23 (p-value = 0.07), while those younger than 23 showed a significant improvement (p-value = 0.02).

3.5. Demographic Ablation Analysis Results

Table 9 shows the results of an ablation study for evaluating the performance of the extended model when each demographic variable was individually removed. Among the variables, nationality had the greatest impact, reducing the overall classification accuracy by 0.9%.

3.6. Fusion Weights

Figure 8 shows the learned attention weights assigned to each of the 16 fused dimensions for both EEG-derived features and demographic data. The variation in weights across dimensions indicates that the model does not treat all features equally during fusion. Notably, dimensions 9 and 11 exhibit contrasting patterns of modality dominance: dimension 9 assigns a low weight to EEG-derived features and a high weight to demographics, whereas dimension 11 assigns nearly equal weights (approximately 0.5) to both modalities. Other dimensions, such as dimension 15, demonstrate a stronger emphasis on EEG-derived features, with 60% of the weight assigned to this type. However, across the 16 dimensions, most provide roughly equal weighting to both feature types (between 45% and 55% for each), suggesting a balanced reliance on both feature types for emotion prediction.

3.7. Feature Importance

Figure 9 shows the attention weights assigned to each demographic variable. Among age, sex, and nationality, the model placed greater emphasis on the dummy-coded nationality variables (Chinese ‘00’, French ‘10’, German ‘01’), with weights exceeding 0.3 across all emotion categories. Among the two nationality variables, the first bit of the dummy variable received the highest overall weight, suggesting that the distinction between being French or not consistently played a prominent role in emotion classification. The variable sex was less important for predicting neutral emotions and had a greater influence on predicting positive emotions. Finally, the variable age showed higher influence for negative emotions and less for positive emotions.

4. Discussion

4.1. Impact of Demographics on Emotion Recognition

Our results indicate that demographic factors influence the prediction of emotions. This suggests that each demographic group processes emotions differently, and the model therefore identified different patterns based on the combination of demographic features. These differences are supported by the hypothesis tests, which revealed significant improvements across demographic groups (see Figure 5, Figure 6 and Figure 7).
The predictive improvement obtained by the extended approach highlights the importance of incorporating demographic information into emotion recognition models. Including demographics not only enhances subject-independent approaches but also accounts for group-specific variations in emotional processing, leading to improved overall performance. Therefore, the demographics allow deep learning models to extract relevant patterns for each demographic group, thereby supporting the development of personalized and equitable emotion recognition systems.
Regarding demographics, nationality was the most influential variable in our predictions, as indicated by the attention-weight pattern (see Figure 9). The feature representing French participants (Nationality 1) received the highest attention weights across all emotions, suggesting that nationality plays a particularly significant role in emotion classification. This finding aligns with cross-cultural research indicating that emotional expression and perception are influenced by culturally specific display rules and norms, including variations in expressiveness [40,41,42,43]. Therefore, it is essential to include a cultural background component when conducting EEG-based emotion recognition studies. By considering cultural context, researchers can gain more accurate insights into how emotional expressions and perceptions differ across nationalities and avoid potential biases that may arise from ignoring these factors. This inclusion will enhance the robustness of emotion classification models and foster a deeper understanding of the interplay between culture and emotional processing.

4.2. Comparison with Other Works

Previous models also reported the effect of including demographics to improve subject-independent EEG-based emotion recognition. For instance, Li et al. [23] achieved a maximum improvement of 4.92% for valence and 6.25% for arousal compared to models without demographic input on the DEAP dataset [14]. Even greater gains were observed on the DREAMER dataset [44], with improvements of 8.94% and 7.24% for valence and arousal, respectively. While our model demonstrated a more modest 2.02% overall increase in classification accuracy, it was evaluated on more demographically and culturally diverse datasets (SEED, SEED-FRA, SEED-GER), supporting its stronger generalizability across populations. Moreover, unlike the concatenation-based fusion used in [23], our model employs an attention-based mechanism that dynamically weights EEG and demographic inputs. This approach not only improves interpretability but also reveals that different characteristics contribute unequally across dimensions, highlighting the nuanced role of demographic context in emotional processing. This adaptive fusion strategy provides a more flexible and fine-grained integration of modalities.
Other works, such as Peng et al. and Liu et al., have emphasized the impact of sex and nationality on EEG patterns, often relying on demographic-specific models [24,25]. While effective, such methods require separate training for each group. In contrast, our model handles multiple demographic factors within a single architecture, improving scalability and fairness. Our results suggest that incorporating nationality alongside age and sex not only improves model performance, but also promotes a more inclusive and representative understanding of emotional processing across populations. Finally, our findings align with the call for ethical and inclusive AI in Sheoran & Valderrama, reinforcing the value of demographic-aware emotion recognition models [26]. In addition, Table 10 compares models on the same datasets and shows that incorporating all three demographics yields performance comparable to or better than models using only a single demographic factor.

4.3. Limitations

One of the primary limitations of this study is the scarcity of publicly available EEG datasets. Existing datasets are not only limited in number but also relatively small in size. This poses a challenge when using Transformer-based architectures, which typically require large amounts of data to effectively learn complex patterns and long-range dependencies. As a result, the limited training data likely contributed to reduced model accuracy and generalizability.
Another significant limitation pertains to the demographic information available in the datasets. For instance, all participants fall within a narrow age range of 19 to 29 years. This 10-year span is insufficient to capture meaningful differences in how emotional processing may vary across distinct developmental stages, such as adolescence, adulthood, and older adulthood. Furthermore, the datasets include only three nationalities, which introduces a demographic bias. Consequently, the model might underperform when applied to individuals from underrepresented or unrepresented national backgrounds.
These limitations in demographic representation raise important ethical considerations. A key objective of incorporating demographic features into subject-independent models is to enhance their fairness and applicability across diverse populations. However, if the training data lacks diversity, the model may inherit and even amplify existing biases. This can compromise the model’s reliability when deployed in real-world settings involving more varied demographic groups. It is crucial to ensure that such models do not inadvertently marginalize individuals outside the demographic scope of the training data.
Finally, we note that we used a single controlled backbone (GNN + Transformer) for both baseline and demographic-extended models to isolate the effect of demographic covariates on interpretability and performance; comparison with large pretrained/foundation encoders (e.g., wav2vec2.0, HuBERT, or Transformer-only EEG models) is left to future work because such models require large-scale pretraining/fine-tuning and complicate interpretability analyses.

4.4. Future Work

To improve the model’s performance and generalization, future work could explore data augmentation techniques [55]. Since collecting large-scale, demographically diverse EEG datasets is challenging, data augmentation can increase the effective training sample size and introduce variability to enhance model robustness. For example, adding low-amplitude Gaussian noise can simulate natural variability in EEG recordings without distorting key patterns [55]. Time-window slicing, already used in our pipeline, can be expanded by treating it as a flexible, tunable process. Varying the window duration, adjusting the overlap rate, or even introducing randomized start times can create diverse temporal views of each trial while preserving the underlying emotional dynamics. Other augmentation techniques, such as temporal jittering (slightly shifting signals in time) or frequency-domain transformations that modify power in specific EEG bands (e.g., alpha, beta), can simulate subject-specific variations in brain activity. Future work could explore and evaluate combinations of these techniques to improve model performance, especially in the context of subject-independent emotion recognition.
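As an illustration of the augmentation ideas above, the following sketch adds low-amplitude Gaussian noise and generates overlapping window slices; the noise level, sampling rate, and window parameters are arbitrary placeholders rather than tuned values.

```python
import numpy as np

def add_gaussian_noise(window, noise_std=0.05, rng=None):
    """Add low-amplitude Gaussian noise to an EEG window of shape (channels, samples)."""
    rng = rng or np.random.default_rng()
    return window + rng.normal(0.0, noise_std, size=window.shape)

def sliding_windows(signal, fs=200, win_s=4.0, step_s=2.0):
    """Yield overlapping windows; window length, overlap, and start offset are tunable."""
    win, step = int(win_s * fs), int(step_s * fs)
    for start in range(0, signal.shape[1] - win + 1, step):
        yield signal[:, start:start + win]
```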
Addressing the limited demographic diversity in the SEED datasets is an important direction for future research. Collaborations to develop or access more diverse datasets, along with the use of demographic-aware evaluation metrics, could enhance model fairness and generalizability. Specifically, future studies will aim to include participants from broader age ranges, additional nationalities and cultural backgrounds, and balanced sex distributions to reduce demographic bias. In the short term, techniques such as synthetic data generation and domain adaptation may help investigate the impact of demographic variability on model performance. Cross-dataset validation will also be explored to assess the model’s robustness and generalizability across different EEG datasets.
Finally, future work may investigate alternative or hybrid deep learning architectures and transfer learning from larger EEG datasets to improve model generalization and capture complex EEG patterns, even in scenarios with limited training data. Together, these approaches aim to enhance the practical applicability, fairness, and robustness of subject-independent EEG-based emotion recognition models.

5. Conclusions

This study presents a deep learning approach that combines EEG signals with demographic information to improve emotion recognition in subject-independent scenarios. To that end, we developed and tested our model using three benchmark datasets: SEED, SEED-FRA, and SEED-GER. Incorporating three demographic variables (nationality, biological sex, and age) significantly enhanced emotion prediction. These findings suggest that including demographic data can improve emotion recognition across different individuals, leading to fairer and more accurate results in diverse populations. This underscores the importance of demographic-aware modeling in developing personalized and equitable emotion recognition systems.

Author Contributions

N.D., M.O. and C.E.V. designed the study and methodology. N.D. and M.O. implemented the deep learning models and analyzed the results. C.E.V. supervised the work and reviewed the findings. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NSERC under Discovery Grant RGPIN-2024-05575.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on the SEED project website at https://bcmi.sjtu.edu.cn/home/seed/ (accessed on 14 March 2025), reference number 16.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Douglas, N.; Oosterhuis, M.; Valderrama, C.E. Evaluating the Impact of Demographic Factors on Subject-Independent EEG-Based Emotion Recognition Approaches. In Proceedings of the International Conference on Advancement in Healthcare Technology and Biomedical Engineering (AHTBE 2025), Vancouver, BC, Canada, 29–30 August 2025. [Google Scholar]
  2. Dalgleish, T.; Power, M.J. Cognition and emotion: Future directions. In Handbook of Cognition and Emotion; Wiley: Chichester, UK, 1999; pp. 799–805. [Google Scholar]
  3. Hajjawi, O.S. Human brain biochemistry. Am. J. Biosci. 2014, 2, 122–134. [Google Scholar] [CrossRef]
  4. Power, J.D.; Petersen, S.E. Control-related systems in the human brain. Curr. Opin. Neurobiol. 2013, 23, 223–228. [Google Scholar] [CrossRef]
  5. Li, X.; Zhang, Y.; Tiwari, P.; Song, D.; Hu, B.; Yang, M.; Zhao, Z.; Kumar, N.; Marttinen, P. EEG based emotion recognition: A tutorial and review. ACM Comput. Surv. 2022, 55, 1–57. [Google Scholar] [CrossRef]
  6. Valderrama, C.E.; Ulloa, G. Spectral analysis of physiological parameters for emotion detection. In Proceedings of the 2012 XVII Symposium of Image, Signal Processing, and Artificial Vision (STSIVA), Medellin, Colombia, 12–14 September 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 275–280. [Google Scholar]
  7. Valderrama, C.E.; Gomes Ferreira, M.G.; Mayor Torres, J.M.; Garcia-Ramirez, A.R.; Camorlinga, S.G. Machine learning approaches to recognize human emotions. Front. Psychol. 2024, 14, 1333794. [Google Scholar] [CrossRef] [PubMed]
  8. Valderrama, C.E.; Sheoran, A. Identifying relevant EEG channels for subject-independent emotion recognition using attention network layers. Front. Psychiatry 2025, 16, 1494369. [Google Scholar] [CrossRef]
  9. Niaki, M.; Dharia, S.Y.; Chen, Y.; Valderrama, C.E. Bipartite Graph Adversarial Network for Subject-Independent Emotion Recognition. IEEE J. Biomed. Health Inform. 2025, 29, 7234–7247. [Google Scholar] [CrossRef]
  10. Hu, F.; He, K.; Wang, C.; Zheng, Q.; Zhou, B.; Li, G.; Sun, Y. STRFLNet: Spatio-Temporal Representation Fusion Learning Network for EEG-Based Emotion Recognition. IEEE Trans. Affect. Comput. 2025, 16, 1–16. [Google Scholar] [CrossRef]
  11. Hu, F.; He, K.; Qian, M.; Liu, X.; Qiao, Z.; Zhang, L.; Xiong, J. STAFNet: An adaptive multi-feature learning network via spatiotemporal fusion for EEG-based emotion recognition. Front. Neurosci. 2024, 18, 1519970. [Google Scholar] [CrossRef]
  12. Hazmoune, S.; Bougamouza, F. Using transformers for multimodal emotion recognition: Taxonomies and state of the art review. Eng. Appl. Artif. Intell. 2024, 133, 108339. [Google Scholar] [CrossRef]
  13. Dharia, S.Y.; Valderrama, C.E.; Camorlinga, S.G. Multimodal deep learning model for subject-independent EEG-based emotion recognition. In Proceedings of the 2023 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Regina, SK, Canada, 24–27 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 105–110. [Google Scholar]
  14. Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. Deap: A database for emotion analysis; using physiological signals. IEEE Trans. Affect. Comput. 2011, 3, 18–31. [Google Scholar] [CrossRef]
  15. Liu, W.; Qiu, J.L.; Zheng, W.L.; Lu, B.L. Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition. IEEE Trans. Cogn. Dev. Syst. 2021, 14, 715–729. [Google Scholar] [CrossRef]
  16. Jiang, W.B.; Liu, X.H.; Zheng, W.L.; Lu, B.L. Seed-vii: A multimodal dataset of six basic emotions with continuous labels for emotion recognition. IEEE Trans. Affect. Comput. 2024, 16, 969–985. [Google Scholar] [CrossRef]
  17. Xu, X.; Shen, X.; Chen, X.; Zhang, Q.; Wang, S.; Li, Y.; Li, Z.; Zhang, D.; Zhang, M.; Liu, Q. A Multi-Context Emotional EEG Dataset for Cross-Context Emotion Decoding. Sci. Data 2025, 12, 1142. [Google Scholar] [CrossRef] [PubMed]
  18. Al Machot, F.; Elmachot, A.; Ali, M.; Al Machot, E.; Kyamakya, K. A deep-learning model for subject-independent human emotion recognition using electrodermal activity sensors. Sensors 2019, 19, 1659. [Google Scholar] [CrossRef]
  19. Maswanganyi, R.C.; Tu, C.; Owolawi, P.A.; Du, S. Statistical evaluation of factors influencing inter-session and inter-subject variability in eeg-based brain computer interface. IEEE Access 2022, 10, 96821–96839. [Google Scholar] [CrossRef]
  20. Quinonero-Candela, J.; Sugiyama, M.; Schwaighofer, A.; Lawrence, N.D. Dataset Shift in Machine Learning; MIT Press: Cambridge, MA, USA, 2008. [Google Scholar]
  21. Samek, W.; Meinecke, F.; Muller, K.R. Single-trial EEG classification of motor imagery using common spatial patterns and support vector machines. J. Neural Eng. 2013, 10, 055008. [Google Scholar]
  22. Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420. [Google Scholar] [CrossRef]
  23. Li, R.; Ren, C.; Li, C.; Zhao, N.; Lu, D.; Zhang, X. SSTD: A novel spatio-temporal demographic network for EEG-based emotion recognition. IEEE Trans. Comput. Soc. Syst. 2022, 10, 376–387. [Google Scholar] [CrossRef]
  24. Peng, D.; Zheng, W.L.; Liu, L.; Jiang, W.B.; Li, Z.; Lu, Y.; Lu, B.L. Identifying sex differences in EEG-based emotion recognition using graph convolutional network with attention mechanism. J. Neural Eng. 2023, 20, 066010. [Google Scholar] [CrossRef]
  25. Liu, W.; Zheng, W.L.; Li, Z.; Wu, S.Y.; Gan, L.; Lu, B.L. Identifying similarities and differences in emotion recognition with EEG and eye movements among Chinese, German, and French People. J. Neural Eng. 2022, 19, 026012. [Google Scholar] [CrossRef]
  26. Sheoran, A.; Valderrama, C.E. Impact of sex differences on subject-independent EEG-based emotion recognition models. Comput. Biol. Med. 2025, 190, 110036. [Google Scholar] [CrossRef]
  27. Sheoran, A.; Valderrama, C. Evaluating Cultural Impact on Subject-Independent EEG-Based Emotion Recognition Across French, German, and Chinese Datasets. In Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics 2025, Atlanta, GA, USA, 26–29 October 2025. [Google Scholar]
  28. Weiss, E.; Siedentopf, C.; Hofer, A.; Deisenhammer, E.; Hoptman, M.; Kremser, C.; Golaszewski, S.; Felber, S.; Fleischhacker, W.; Delazer, M. Sex differences in brain activation pattern during a visuospatial cognitive task: A functional magnetic resonance imaging study in healthy volunteers. Neurosci. Lett. 2003, 344, 169–172. [Google Scholar] [CrossRef]
  29. Knyazev, G.G.; Savostyanov, A.N.; Volf, N.V.; Liou, M.; Bocharov, A.V. EEG correlates of spontaneous self-referential thoughts: A cross-cultural study. Int. J. Psychophysiol. 2012, 86, 173–181. [Google Scholar] [CrossRef]
  30. Gan, L.; Liu, W.; Luo, Y.; Wu, X.; Lu, B.L. A cross-culture study on multimodal emotion recognition using deep learning. In Proceedings of the Neural Information Processing: 26th International Conference, ICONIP 2019, Sydney, NSW, Australia, 12–15 December 2019; Proceedings, Part IV 26. Springer: Cham, Switzerland, 2019; pp. 670–680. [Google Scholar]
  31. Gong, X.; Chen, C.P.; Zhang, T. Cross-cultural emotion recognition with EEG and eye movement signals based on multiple stacked broad learning system. IEEE Trans. Comput. Soc. Syst. 2023, 11, 2014–2025. [Google Scholar] [CrossRef]
  32. Hutchison, A.N.; Gerstein, L.H. The impact of gender and intercultural experiences on emotion recognition. Rev. Cercet. Interv. Soc. 2016, 54, 125. [Google Scholar]
  33. Schunk, F.; Trommsdorff, G.; König-Teshnizi, D. Regulation of positive and negative emotions across cultures: Does culture moderate associations between emotion regulation and mental health? Cogn. Emot. 2022, 36, 352–363. [Google Scholar] [CrossRef] [PubMed]
  34. Zheng, W.L.; Lu, B.L. Investigating Critical Frequency Bands and Channels for EEG-based Emotion Recognition with Deep Neural Networks. IEEE Trans. Auton. Ment. Dev. 2015, 7, 162–175. [Google Scholar] [CrossRef]
  35. Duan, R.; Zhu, J.; Lu, B.L. Differential Entropy Feature for EEG-Based Emotion Classification. In Proceedings of the 6th International IEEE EMBS Conference on Neural Engineering (NER), San Diego, CA, USA, 6–8 November 2013; pp. 81–84. [Google Scholar] [CrossRef]
  36. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  37. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017. [Google Scholar]
  38. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  39. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 5998–6008. [Google Scholar]
  40. Bos, D.O. EEG-based emotion recognition. Influ. Vis. Audit. Stimuli 2006, 56, 1–17. [Google Scholar]
  41. Wu, S.Y.; Schaefer, M.; Zheng, W.L.; Lu, B.L.; Yokoi, H. Neural patterns between Chinese and Germans for EEG-based emotion recognition. In Proceedings of the 2017 8th International IEEE/EMBS Conference on Neural Engineering (NER), Shanghai, China, 25–28 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 94–97. [Google Scholar]
  42. Matsumoto, D. Cultural similarities and differences in display rules. Motiv. Emot. 1990, 14, 195–214. [Google Scholar] [CrossRef]
  43. Jack, R.E.; Garrod, O.G.B.; Yu, H.; Caldara, R.; Schyns, P.G. Cultural confusions show that facial expressions are not universal. Curr. Biol. 2012, 22, R149–R150. [Google Scholar] [CrossRef]
  44. Katsigiannis, S.; Ramzan, N. DREAMER: A database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices. IEEE J. Biomed. Health Inform. 2017, 22, 98–107. [Google Scholar] [CrossRef]
  45. Suykens, J.A.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300. [Google Scholar] [CrossRef]
  46. Li, Y.; Zheng, W.; Cui, Z.; Zhang, T.; Zong, Y. A novel neural network model based on cerebral hemispheric asymmetry for EEG emotion recognition. In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018; pp. 1561–1567. [Google Scholar]
  47. Song, T.; Zheng, W.; Song, P.; Cui, Z. EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Trans. Affect. Comput. 2018, 11, 532–541. [Google Scholar] [CrossRef]
  48. Song, T.; Zheng, W.; Lu, C.; Zong, Y.; Zhang, X.; Cui, Z. MPED: A multi-modal physiological emotion database for discrete emotion recognition. IEEE Access 2019, 7, 12177–12191. [Google Scholar] [CrossRef]
  49. Song, T.; Liu, S.; Zheng, W.; Zong, Y.; Cui, Z. Instance-Adaptive Graph for EEG Emotion Recognition. Proc. AAAI Conf. Artif. Intell. 2020, 34, 2701–2708. [Google Scholar] [CrossRef]
  50. Zhong, P.; Wang, D.; Miao, C. EEG-based emotion recognition using regularized graph neural networks. IEEE Trans. Affect. Comput. 2020, 13, 1290–1301. [Google Scholar] [CrossRef]
  51. Li, Y.; Wang, L.; Zheng, W.; Zong, Y.; Qi, L.; Cui, Z.; Zhang, T.; Song, T. A novel bi-hemispheric discrepancy model for EEG emotion recognition. IEEE Trans. Cogn. Dev. Syst. 2020, 13, 354–367. [Google Scholar] [CrossRef]
  52. Song, T.; Zheng, W.; Liu, S.; Zong, Y.; Cui, Z.; Li, Y. Graph-embedded convolutional neural network for image-based EEG emotion recognition. IEEE Trans. Emerg. Top. Comput. 2021, 10, 1399–1413. [Google Scholar] [CrossRef]
  53. Li, Y.; Chen, J.; Li, F.; Fu, B.; Wu, H.; Ji, Y.; Zhou, Y.; Niu, Y.; Shi, G.; Zheng, W. GMSS: Graph-based multi-task self-supervised learning for EEG emotion recognition. IEEE Trans. Affect. Comput. 2022, 14, 2512–2525. [Google Scholar] [CrossRef]
  54. Zhu, L.; Yu, F.; Huang, A.; Ying, N.; Zhang, J. Instance-representation transfer method based on joint distribution and deep adaptation for EEG emotion recognition. Med. Biol. Eng. Comput. 2024, 62, 479–493. [Google Scholar] [CrossRef] [PubMed]
  55. Rommel, C.; Paillard, J.; Moreau, T.; Gramfort, A. Data augmentation for learning predictive models on EEG: A systematic comparison. J. Neural Eng. 2022, 19, 066020. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The GNN and Transformer reduce 310 EEG-derived DE features to a 16-dimensional vector ($O_1$). Demographic features (age, sex, nationality) are projected from 4 to 16 dimensions ($O_2$). An attention module learns weights $\omega_1$ and $\omega_2$ to fuse $O_1$ and $O_2$ before classification.
Figure 2. Training and testing accuracy averaged over all subjects per-epoch across 80 epochs.
Figure 3. Confusion matrix for the extended model.
Figure 4. Performance differences between the extended and baseline models in predicting negative, neutral, positive, and overall emotions. Blue bars represent improvements in prediction accuracy after incorporating demographic information, while red bars indicate a reduction in performance.
Figure 5. Performance difference between the extended and baseline models across nationality groups for predicting: (a) negative, (b) neutral, (c) positive, and (d) overall emotions (* p-value < 0.1; ** p-value < 0.05).
Figure 6. Performance difference between the extended and baseline models across sex groups for predicting: (a) negative, (b) neutral, (c) positive, and (d) overall emotions (* p-value < 0.1; ** p-value < 0.05; *** p-value < 0.01).
Figure 7. Performance difference between the extended and baseline models across age groups for predicting: (a) negative, (b) neutral, (c) positive, and (d) overall emotions (* p-value < 0.1; ** p-value < 0.05).
Figure 8. Average weights ($\omega_1$ for EEG features and $\omega_2$ for demographic features) for the 16 dimensions of the attention fusion model.
Figure 9. Average attention weights (scaled by $\omega_2$) assigned to the four demographic variables across emotion categories. Higher values (red tones) indicate greater relative importance of the demographic feature in emotion prediction.
Table 1. Summary of related work on demographic-aware EEG-based emotion recognition.
Study | Dataset/Subjects | Demographic Factors | Method/Experimental Design | Key Findings
Li et al. (2022) [23] | SEED (same nationality) | Age, Sex | Demographic variables appended at final layer of deep network under subject-independent setting | Age + sex improved recognition performance; cultural factors not considered
Peng et al. (2023) [24] | SEED (Chinese subjects) | Sex | Compared same-sex vs. cross-sex training/testing workflows | Same-sex models outperformed cross-sex models; demographic mismatch degrades performance
Liu et al. (2022) [25] | SEED, SEED-GER, SEED-FRA | Nationality | Compared within-nationality vs. cross-nationality deep learning pipelines | Models trained/tested on same nationality achieved highest accuracy; cultural background affects emotion-related EEG patterns
Sheoran et al. (2025a) [26] | SEED-family | Sex, Age, Nationality | Demographic-informed model using auxiliary metadata fusion | Including sex, age, nationality increased likelihood of correct prediction
Sheoran et al. (2025b) [27] | SEED-family | Sex, Age, Nationality | Evaluated demographic-dependent architectures under subject-independent setup | Biological sex and nationality strongly affect model generalization
Table 2. Descriptions of the SEED, SEED-FRA, and SEED-GER datasets. For each dataset, the total number of subjects, number of EEG recordings per subject, nationality, male/female ratio, and average subject age are provided.
Dataset | Subjects | EEG per Subject | Nationality | Male/Female | Average Age
SEED [34] | 15 | 45 | Chinese | 7/8 | 23.27
SEED-FRA [25] | 8 | 63 | French | 5/3 | 22.50
SEED-GER [25] | 8 | 54 | German | 7/1 | 22.25
Table 3. Model architecture summary. Abbrev.: B = batch, W = windows, N = 62 nodes, F = 5 features, g = graph feat dim, h = LSTM hidden.
Model | Layers/Modules | Input → Output | Key Dims
CNN | 2 × Conv2D + MaxPool + FC | (B, 1, N, F) → (B, 3) | 32/64 filters; pool 2 × 1; drop 0.2
GNN | GraphConv + FC | (B, N, F) → (B, 3) | g = 8; ReLU + LayerNorm; drop 0.3
GNN + LSTM | GraphConv + BiLSTM + FC | (B, W, N, F) → (B, 3) | g = 8; h = 32 (bi, concat); drop 0.3
GNN + Transf. (no demo) | GraphConv + Transformer + FC | (B, W, N, F) → (B, 3) | g = 8; heads = H; 1 layer
GNN + Transf. (demo) | GraphConv + Transformer + Fusion + FC | (B, W, N, F) + (B, 3) → (B, 3) | g = 8; heads = H; 1 layer; fusion: Attn(16)
Table 4. Hyperparameters for the CNN, GNN, GNN + LSTM, and GNN + Transformer.
Model | Epochs | Learning Rate | Optimizer | Weight Decay
CNN | 30 | 0.001 | SGD | 0
GNN | 50 | 0.01 | SGD | 0
GNN + LSTM | 80 | 0.001 | AdamW | 0.1
GNN + Transformer | 80 | 0.001 | AdamW | 0.1
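The GNN + Transformer row of Table 4 translates directly into an optimizer configuration. The sketch below is a minimal, hypothetical training loop under those settings (the model and data loader are placeholders), not the study's actual training code.

```python
import torch
import torch.nn as nn

# Stand-in model; the real GNN + Transformer is summarized in Table 3.
model = nn.Sequential(nn.Linear(310, 16), nn.ReLU(), nn.Linear(16, 3))

# Optimizer settings from the GNN + Transformer row of Table 4.
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.1)
criterion = nn.CrossEntropyLoss()

def train(loader, epochs: int = 80):
    """One cross-entropy training pass per epoch over (features, labels) batches."""
    model.train()
    for _ in range(epochs):
        for features, labels in loader:   # features: (B, 310) DE vectors; labels: (B,)
            optimizer.zero_grad()
            loss = criterion(model(features), labels)
            loss.backward()
            optimizer.step()
```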
Table 5. Performance per class of the CNN, GNN, GNN + LSTM, and GNN + Transformer for Approach 1.
Model | Negative (%) | Neutral (%) | Positive (%)
CNN | 59 | 51 | 69
GNN | 78 | 53 | 52
GNN + LSTM | 73 | 87 | 80
GNN + Transformer | 79 | 80 | 86
Table 6. Overall performance of the CNN, GNN, GNN + LSTM, and GNN + Transformer for Approach 1.
Model | Accuracy (%) | Macro F1-Score (%) | Macro AUC (%)
CNN | 60 | 62 | 80
GNN | 61 | 63 | 81
GNN + LSTM | 80 | 80 | 91
GNN + Transformer | 82 | 82 | 93
Table 7. Macro recall, precision, and F1-score for the extended model. For each metric, the mean, standard deviation (SD), and 95% confidence interval (CI) are provided.
Metric | Recall (%) | Precision (%) | Macro F1-Score (%)
Mean | 85.5 | 86.2 | 85.4
SD | 8.3 | 8.0 | 8.4
Lower CI | 82.4 | 83.3 | 82.3
Upper CI | 88.5 | 89.2 | 88.5
Table 8. 95% confidence intervals for the difference between the extended and baseline models for negative, neutral, positive, and overall emotion prediction. * indicates p < 0.05.
Type | Lower CI (%) | Upper CI (%) | p-Value
Negative | 1.1 | 7.9 | 0.033 *
Neutral | −0.9 | 8.4 | 0.103
Positive | 0.9 | 5.9 | 0.013 *
Overall | 1.6 | 6.1 | 0.004 *
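Table 8 reports 95% confidence intervals and p-values for the accuracy differences between the extended and baseline models. One standard way to obtain such values, assuming a paired t-test over per-subject accuracies (an assumption about the procedure, not a restatement of it), is sketched below.

```python
import numpy as np
from scipy import stats

def paired_difference_ci(extended_acc, baseline_acc, alpha=0.05):
    """95% CI and p-value for per-subject accuracy differences (extended - baseline).

    Both inputs are arrays with one accuracy per subject, in the same subject order.
    """
    extended_acc = np.asarray(extended_acc, dtype=float)
    baseline_acc = np.asarray(baseline_acc, dtype=float)
    diff = extended_acc - baseline_acc
    # t-distribution interval around the mean difference.
    lower, upper = stats.t.interval(1 - alpha, df=diff.size - 1,
                                    loc=diff.mean(), scale=stats.sem(diff))
    p_value = stats.ttest_rel(extended_acc, baseline_acc).pvalue
    return lower, upper, p_value
```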
Table 9. Ablation study assessing the impact of removing each demographic variable from the extended approach. The Receiver Operating Characteristic Area Under the Curve (ROC-AUC) is reported for each emotion. A minus sign indicates that the feature was excluded.
Model | Negative (%) | Neutral (%) | Positive (%) | Overall (%)
All features | 92.2 | 94 | 95.7 | 94.0
- age | 93.1 | 94.5 | 95.1 | 94.3
- nationality | 91.7 | 93.7 | 93.8 | 93.1
- sex | 92.4 | 94.4 | 95.6 | 94.1
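The ROC-AUC values in Table 9 correspond to standard one-vs-rest scoring per emotion class plus a macro average. The sketch below uses scikit-learn and assumes per-window class probabilities are available; this is an assumption about the evaluation pipeline rather than the authors' code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

def emotion_auc(y_true, y_score, classes=(0, 1, 2)):
    """One-vs-rest ROC-AUC per emotion class and the macro (overall) average.

    y_true: (N,) integer labels; y_score: (N, 3) class probabilities, with the
    class order assumed to be negative, neutral, positive.
    """
    y_bin = label_binarize(y_true, classes=list(classes))   # (N, 3) indicator matrix
    per_class = [roc_auc_score(y_bin[:, k], y_score[:, k]) for k in range(len(classes))]
    overall = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
    return per_class, overall
```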
Table 10. Concise comparison of prior methods and our approach on the SEED, SEED-FRA, and SEED-GER datasets. Performance reported as accuracy (Acc.) and standard deviation (SD).
Demographic Variable | Reference | Dataset | Performance (Acc./SD)
Nationality | KNN [25] | SEED | 54.1/8.7
Nationality | KNN [25] | SEED-FRA | 37.2/6.8
Nationality | KNN [25] | SEED-GER | 41.0/7.4
Nationality | SVM [25] | SEED | 72.6/10.5
Nationality | SVM [25] | SEED-FRA | 50.1/10.3
Nationality | SVM [25] | SEED-GER | 55.6/12.2
Nationality | LR [25] | SEED | 68.4/11.7
Nationality | LR [25] | SEED-FRA | 47.2/12.2
Nationality | LR [25] | SEED-GER | 50.4/10.9
Nationality | DNN [25] | SEED | 82.8/7.5
Nationality | DNN [25] | SEED-FRA | 64.2/8.6
Nationality | DNN [25] | SEED-GER | 65.9/10.1
Nationality | DL-LR [27] | SEED | 77.2/5.3
Nationality | DL-LR [27] | SEED-FRA | 73.0/5.0
Nationality | DL-LR [27] | SEED-GER | 65.6/6.0
Sex | SVM [45] | SEED | 83.2/9.6
Sex | BiDANN [46] | SEED | 83.2/9.6
Sex | DGCNN [47] | SEED | 79.9/9.0
Sex | A-LSTM [48] | SEED | 72.1/10.8
Sex | IAG [49] | SEED | 86.3/6.9
Sex | RGNN [50] | SEED | 85.3/6.7
Sex | BiHDM [51] | SEED | 85.4/7.5
Sex | GECNN [52] | SEED | 82.4/-
Sex | BiHDM w/o DA [53] | SEED | 81.5/9.7
Sex | GMSS [53] | SEED | 86.5/6.2
Sex | JD-IRT [54] | SEED | 83.2/-
Sex | Graph-LSTM [8] | SEED | 79.3/5.8
Sex | DL-LR [26] | SEED | 81.5/7.8
Sex, Age, Nationality | Ours | SEED | 85.2/5.6
Sex, Age, Nationality | Ours | SEED-FRA | 78.2/7.4
Sex, Age, Nationality | Ours | SEED-GER | 72.1/6.6
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
