FCAN–XGBoost: A Novel Hybrid Model for EEG Emotion Recognition

In recent years, artificial intelligence (AI) technology has promoted the development of electroencephalogram (EEG) emotion recognition. However, existing methods often overlook the computational cost of EEG emotion recognition, and there is still room for improvement in the accuracy of EEG emotion recognition. In this study, we propose a novel EEG emotion recognition algorithm called FCAN–XGBoost, which is a fusion of two algorithms, FCAN and XGBoost. The FCAN module is a feature attention network (FANet) that we have proposed for the first time, which processes the differential entropy (DE) and power spectral density (PSD) features extracted from the four frequency bands of the EEG signal and performs feature fusion and deep feature extraction. Finally, the deep features are fed into the eXtreme Gradient Boosting (XGBoost) algorithm to classify the four emotions. We evaluated the proposed method on the DEAP and DREAMER datasets and achieved a four-category emotion recognition accuracy of 95.26% and 94.05%, respectively. Additionally, our proposed method reduces the computational cost of EEG emotion recognition by at least 75.45% for computation time and 67.51% for memory occupation. The performance of FCAN–XGBoost outperforms the state-of-the-art four-category model and reduces computational costs without losing classification performance compared with other models.


Introduction
Emotion is a series of reactions that organisms have in response to internal and external stimuli [1]. It can reflect the current psychological and physiological state of human beings and affect daily activities such as cognition, perception, and rational decision-making [2]. Emotion recognition has broad application prospects in fields such as artificial intelligence (AI), intelligent healthcare, remote education, and virtual reality (VR) games [3,4]. Accurately recognizing human emotions is one of the most urgent issues in the brain-computer interface [5].

The main contributions of this study are summarized as follows:
• The differential entropy (DE) and power spectral density (PSD) features of four EEG frequency bands were extracted, and a parallel feature processing network was proposed to perform further feature extraction and fusion on the extracted DE and PSD features. We demonstrated the importance of feature fusion in EEG emotion recognition;
• A novel feature attention network (FANet) was proposed to assign different weights to features of varying importance levels. This enhances the expressive ability of the features and was proven to improve the accuracy of EEG emotion recognition;
• A novel FCAN-XGBoost hybrid EEG emotion classification network was proposed. This network was shown to consume fewer computing resources while still possessing strong accuracy and robustness in EEG emotion recognition;
• Extensive four-class classification experiments were conducted on the DEAP and DREAMER public datasets. The experimental results demonstrate that the FCAN-XGBoost hybrid model is superior to existing models and significantly reduces the computational cost of emotion recognition.

Different Features for EEG Emotion Recognition
EEG can reflect the electrophysiological activity of brain nerve cells in the cerebral cortex or on the scalp surface [27]. Human emotion changes and brain nerve activity are closely related, and EEG records the state changes of brain nerve cells during emotion changes in real time; this signal is very realistic and has a high temporal resolution. Therefore, the results of emotion recognition by EEG are more accurate and reliable [15]. Typically, time-domain features, frequency-domain features, time-frequency features, nonlinear features, or a combination of these features are extracted from EEG signals for this purpose [14,15]. Mehmood et al. [16] employed the Hjorth parameter to extract EEG signal features and utilized random forests for the binary classification of emotions. Their study encompassed binary classification experiments on the DEAP, SEED-IV, DREAMER, SELEMO, and ASCERTAIN datasets, with corresponding accuracy rates of 69%, 76%, 85%, 59%, and 87%. Tripathi et al. [17] extracted nine features, comprising the mean, median, maximum, minimum, standard deviation, variance, value range, skewness, and kurtosis, from the DEAP EEG signals. They employed deep neural networks (DNNs) and convolutional neural networks (CNNs) for binary classification and attained superior results. Gao et al. [18] extracted fuzzy entropy (FE) and PSD from high-frequency EEG signals and applied multi-order detrended fluctuation analysis (MODFA) to classify emotions. Their study achieved an accuracy rate of 76.39% in the three-category task. Bai et al. [19] extracted DE features from EEG signals of the DEAP dataset and utilized a residual network with deep convolution and point convolution for binary classification, with an accuracy rate of 88.75%. Fraiwan et al. [3] used multiscale entropy (MSE) to extract features from EEG, principal component analysis (PCA) for feature dimension reduction, and, finally, artificial neural networks (ANNs) to predict the enjoyment of museum pieces, obtaining a high accuracy of 98.0%.

Fusion Features for EEG Emotion Recognition
Extracting multiple features of EEG and fusing them with different fusion strategies often results in better emotion recognition than single features [20]. Multi-band feature fusion has proven particularly effective in enhancing the accuracy of emotion recognition [28]. An et al. [29] proposed an EEG emotion recognition algorithm based on 3D feature fusion and a convolutional autoencoder (CAE), which extracted DE from different frequency bands and fused them into 3D features. Using the CAE for emotion classification, the recognition accuracy rates of the valence and arousal dimensions on the DEAP dataset were 89.49% and 90.76%, respectively. Gao et al. [30] developed a method of fusing power spectrum and wavelet energy entropy to classify three emotions (neutral, happy, and sad) using a support vector machine (SVM) and a relevance vector machine (RVM). The experimental results showed that the fusion of the two features was superior to a single feature. Zhang et al. [31] proposed GC-F-GCN, a multi-band feature fusion method based on Granger causality (GC) and a graph convolutional neural network (GCN), for emotion recognition from EEG signals. The GC-F-GCN method demonstrated recognition performance superior to the state-of-the-art GCN method in the binary classification task, achieving average accuracies of 97.91%, 98.46%, and 98.15% for arousal, valence, and arousal-valence classification, respectively. Parui et al. [32] extracted various features, including frequency-domain features, wavelet-domain features, and Hjorth parameters, and used the XGBoost algorithm to perform binary tasks on the DEAP dataset. The accuracy rates of valence and arousal reached 75.97% and 74.206%, respectively. These findings suggest that the use of multiple features and their fusion through appropriate strategies can significantly enhance the recognition accuracy of emotions using EEG signals.

Hybrid Model for EEG Emotion Recognition
In addition to the technique of feature fusion, the application of hybrid models has been proven effective in improving the accuracy of emotion recognition [36][37][38]. Various studies have explored this approach and achieved promising results. For example, Chen et al. [39] proposed cascaded and parallel hybrid convolutional recurrent neural networks (CRNNs) for binary classification of EEG signals using spatiotemporal EEG features extracted from the PSD of the signals. The proposed hybrid networks achieved classification accuracies of over 93% on the DEAP dataset. Similarly, Yang et al. [40] developed a hybrid neural network that combined a CNN and a recurrent neural network (RNN) to classify emotions in EEG sequences. They converted chain-like EEG sequences into 2D frame sequences to capture the channel-to-channel correlation between physically adjacent EEG signals, achieving average accuracies of 90.80% and 91.03% for valence and arousal classification, respectively, on the DEAP dataset. Furthermore, Wei et al. [42] proposed a transformer capsule network (TCNet) that consisted of an EEG Transformer module for feature extraction and an emotion capsule module for feature refinement and classification of emotional states. On the DEAP dataset, their proposed TCNet achieved average accuracies of 98.76%, 98.81%, and 98.82% for binary classification of the valence, arousal, and dominance dimensions, respectively. These studies demonstrate the potential of hybrid models in enhancing the performance of emotion recognition.

Multi-Category EEG Emotion Recognition
Compared to research focusing solely on binary emotions, multi-classification research on emotions has promising prospects [42][43][44][45]. Hu et al. [46] introduced a hybrid model comprising a CNN, a bidirectional long short-term memory network (BiLSTM), and a multi-head self-attention mechanism (MHSA), which transforms EEG signals into time-frequency maps for emotion classification. The model achieved an accuracy rate of 89.33% for the four-category task using the DEAP dataset. Similarly, Zhao et al. [47] proposed a 3D convolutional neural network model to automatically extract spatiotemporal features from EEG signals, achieving an accuracy rate of 93.53% for the four-category task on the DEAP dataset. Singh et al. [48] utilized an SVM to classify emotions by extracting the different features of EEG average event-related potentials (ERPs) and average ERPs, achieving accuracy rates of 75% and 76.8%, respectively, for the four-classification tasks on the DEAP dataset. Gao et al. [49] proposed a new strategy for EEG emotion recognition that utilized Riemannian geometry. Wavelet packets were used to extract the time-frequency features of EEG signals to construct a matrix for emotion recognition, achieving an accuracy rate of 86.71% for the four-category task on the DEAP dataset.
In conclusion, there remains ample opportunity to enhance the precision of EEG-based emotion recognition. To this end, we present FCAN-XGBoost, a novel hybrid model for four-category EEG emotion recognition using EEG fusion features, aimed at improving accuracy while minimizing computational costs. Figure 1 shows the overall framework and flow of the proposed model. FCAN-XGBoost comprises three modules, namely, the feature extraction module, the FCAN module, and the classifier. The feature extraction module is tasked with extracting pertinent features from EEG signals across various frequency bands, whereas the FCAN module is responsible for the comprehensive processing and fusion of features. The function of the classifier, in turn, is to perform the emotion classification. A detailed account of the framework and process of the proposed model is provided in subsequent sections.
DEAP is a multimodal dataset consisting of 32 participants watching 40 one-minute music videos. The dataset consists of a range of physiological signals that include galvanic skin response, EEG, EMG, electrooculogram (EOG), skin temperature, blood volume pressure, and respiration rate. The EEG signals were recorded from 32 electrodes in accordance with the international 10-20 system at a sampling rate of 512 Hz. Additionally, each participant used the self-assessment manikin (SAM) to rate their emotional arousal, valence, liking, and dominance for every trial. The participants provided numerical scores between 1 and 9 to indicate their emotional states.
DREAMER is a multimodal dataset that encompasses 23 participants, each of whom underwent 18 distinct trials. EEG signals were acquired using a wearable, low-cost EEG acquisition device comprising 14 EEG channels, which were sampled at a frequency of 128 Hz. Similar to the DEAP dataset, participants' emotional states were evaluated via a continuous emotion model, and each participant was asked to rate their emotions on three dimensions (arousal, valence, and dominance) using the SAM scale, ranging from 1 to 5, for each trial.

Feature Extraction Module
The function of the feature extraction module in the emotion recognition model is primarily to extract EEG signal features and engage in the data processing. Previous studies have demonstrated the efficacy of DE and PSD features in EEG-based emotion recognition [29,39]. The PSD of EEG signals reflects the distribution of EEG signal power in different frequency bands [52]. The DE [53] of an EEG signal is an extension of Shannon entropy on continuous variables. For a specific length of EEG signal that approximately follows a Gaussian distribution, its DE is equal to the logarithm of its energy spectrum in a specific frequency band. Notably, PSD and DE features are the most widely used features in the field of EEG emotion recognition. As a result, in this study, we extracted DE and PSD features of EEG signals from the DEAP and DREAMER public datasets for the subsequent emotion recognition process.
Equation (1) delineates the calculation of the DE of an EEG signal segment over the interval [a, b] that closely conforms to a Gaussian distribution N(µ, σ²):

DE = −∫_a^b (1/√(2πσ²)) e^(−(x−µ)²/(2σ²)) log((1/√(2πσ²)) e^(−(x−µ)²/(2σ²))) dx = (1/2) log(2πeσ²).  (1)

Assuming an EEG signal of length M, denoted as x(t) with t = 0, 1, …, M − 1, the PSD of the signal can be determined using Equation (2):

P(ω_k) = Σ_{t=−(M−1)}^{M−1} γ(t) e^(−jω_k t),  (2)

where P(ω_k) is the power spectral density; γ(t) denotes the autocorrelation function of x(t); k = −(M − 1), −(M − 2), …, 0, 1, …, M − 1; ω_k is the angular frequency; and t denotes time.
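As a concrete illustration, the two features can be estimated per segment roughly as follows. This is an illustrative sketch, not the authors' code: the closed-form DE relies on the segment being approximately Gaussian, and the PSD here uses a simple FFT periodogram rather than the autocorrelation form of Equation (2); the sampling rate and band edges are placeholders.

```python
import numpy as np

def differential_entropy(segment):
    # For an approximately Gaussian segment, DE = 0.5 * log(2*pi*e*sigma^2)
    return 0.5 * np.log(2 * np.pi * np.e * np.var(segment))

def band_psd(segment, fs=128, band=(4.0, 8.0)):
    # Mean periodogram power within a frequency band
    n = len(segment)
    spectrum = np.abs(np.fft.rfft(segment)) ** 2 / (fs * n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return spectrum[mask].mean()
```

For a unit-variance Gaussian segment, `differential_entropy` returns a value close to 0.5·log(2πe) ≈ 1.42, matching the closed form in Equation (1).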

FCAN Module
The feature processing module encompasses four individual sub-modules, namely, FCN1, FANet, the feature fusion module, and FCN2.

FCN1
The fundamental architecture of FCN1 is a multi-layered fully connected neural network (FCN) [54] comprising an input layer, a hidden layer, and an output layer. Assuming that the input of the i-th neuron in the l-th layer is x_i^l and its output is y_i^l gives the following:

x_i^l = Σ_{j=1}^{n} w_ij^l y_j^(l−1) + b_i^l,   y_i^l = σ(x_i^l),

where n is the number of neurons in the (l − 1)-th layer; w_ij^l is the weight between the i-th neuron in the l-th layer and the j-th neuron in the (l − 1)-th layer; b_i^l is the bias of the i-th neuron in the l-th layer; and σ is the ReLU [54] activation function.
The training procedure of FCN primarily consists of two fundamental steps: forward propagation and backpropagation [55]. During forward propagation, the network computes the output values based on the input data. In contrast, backpropagation aims to minimize the error between the predicted and actual output by adjusting the network's weight and bias parameters. This iterative process involves computing the gradient of the loss function with respect to the network's parameters and updating them, accordingly, to minimize the objective function.
Suppose the training data set is D = {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}, where x_i is the input data and y_i is the corresponding label. The objective of the FCN is to minimize the difference between the output value generated by the network and the actual value. This can be achieved by minimizing the loss function L, which is defined as follows:

L = (1/m) Σ_{i=1}^{m} l(y_i, ŷ_i),

where L denotes the training loss; l(y_i, ŷ_i) denotes the loss per prediction; and y_i and ŷ_i are the actual and predicted values, respectively. Fully connected neural networks offer high flexibility, adaptivity, interpretability, scalability, and robustness in processing EEG data, and the model structure can be flexibly adjusted, through regularization and by increasing the number of fully connected layers or nodes, to adapt to more complex EEG data and tasks. Thus, two different fully connected neural networks, FCN1 and FCN2, were used in this study to process the features.
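The forward pass and the mean training loss described above can be sketched in plain NumPy. This is an illustrative sketch only; the layer shapes, weights, and squared-error per-sample loss are placeholders, not the actual FCN1/FCN2 parameters.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def fcn_forward(x, weights, biases):
    # Forward propagation: y^l = ReLU(W^l y^(l-1) + b^l), layer by layer
    a = x
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

def mean_loss(y_true, y_pred):
    # L = (1/m) * sum_i l(y_i, y_hat_i), here with a squared-error per-sample loss
    return np.mean((y_true - y_pred) ** 2)
```

Backpropagation would then compute the gradient of this loss with respect to each W and b and update them iteratively, as outlined in the preceding paragraph.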
The FCN1 network consists of two fully connected networks with the same structure, which are mainly responsible for the initial processing of DE features and PSD features; its structure is shown in Figure 2, and the detailed parameters are shown in Table 1.
The FCN1 module, as illustrated in Figure 1, consists of two fully connected neural networks with identical structures running in parallel. They are responsible for separately processing the DE and PSD features extracted from the feature extraction module. The input shape of the network that processes DE features in FCN1 is (128, ), while the input shape of the network responsible for processing PSD features is (56, ). Both DE and PSD features processed by FCN1 produce an output shape of (128, ). The preliminary processing of DE and PSD features by the FCN1 module can yield a more profound representation of these features.

FANet
Motivated by the channel attention mechanism proposed in prior works [56], we aimed to enhance the descriptive potential of features by introducing the FANet into the feature processing module after FCN1. The architecture of the FANet is presented in Figure 3, while the corresponding layer-wise parameter configurations are provided in Table 2.
By imparting distinct weights to the processed DE and PSD features, the FANet amplifies the salient features while attenuating the less significant ones. This leads to an improved recognition ability in the emotion classifier. FANet consists of two parallel submodules with the same structure, one for processing the DE features from FCN1 and the other for processing the PSD features from FCN1. Both submodules output a shape of (128, ).

FCN2
FCN2 comprises a multi-layer FCN. The fusion of the two features results in a doubling of their dimensions. However, high-dimensional EEG features are typically unsuitable for emotion classification. As a result, prior to feeding the classifier, dimensionality reduction is necessary. In this research, the aim was to attain both the effect of feature dimensionality reduction and the acquisition of emotionally expressive features. Therefore, conventional algorithms such as the stacked autoencoder (SAE) [5] or principal component analysis (PCA) [3] were not employed to reduce the dimensionality of the fusion features. Instead, an FCN2 network with an output dimension of four was introduced after the feature fusion module to obtain low-dimensional features with a more pronounced emotional expression capacity. Figure 4 depicts the structure of the FCN2 network, while Table 3 shows the detailed parameters.

Feature Fusion Module
The most frequently used feature fusion approaches in deep learning-based algorithms are feature vector addition, multiplication, and concatenation [30]. This research investigates the influence of different feature fusion strategies on the efficacy of emotion recognition through experimental analysis. Following a meticulous comparison of the results, the method of feature concatenation was ultimately selected for implementation. The three strategies can be defined using Equations (5)-(7):

X_add = X_DE + X_PSD,  (5)
X_mult = X_DE ⊙ X_PSD,  (6)
X_con = [X_DE, X_PSD],  (7)

where the DE features are denoted by X_DE, the PSD features are denoted by X_PSD, and ⊙ denotes element-wise multiplication. The final set of features obtained after the fusion of these individual feature sets is denoted by X_con. As described in Equation (7), the feature fusion module fuses the DE and PSD features from FANet, and the fusion strategy it employs is concatenation. The DE and PSD features of shape (128, ) from the FANet module are processed by the feature fusion module and fused into a new feature vector of shape (256, ).
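Under the concatenation strategy selected here, the fusion step itself is a single operation. The following is a minimal sketch; the (128, ) input shapes follow the FANet outputs described earlier.

```python
import numpy as np

def fuse_concat(x_de, x_psd):
    # X_con = [X_DE, X_PSD]: stacking the two vectors doubles the feature dimension
    return np.concatenate([x_de, x_psd], axis=-1)
```

Two (128, ) vectors yield one (256, ) fused vector, which is then passed to FCN2 for dimensionality reduction.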

Classifier
The classification algorithm we chose was XGBoost. XGBoost is an ensemble learning algorithm based on decision trees and is an enhanced version of the gradient boosting algorithm. The algorithm's fundamental principle is to construct a robust classifier by combining multiple weak classifiers. At each iteration, XGBoost refines the weights of each weak classifier based on the current model's performance, thereby enhancing the subsequent iteration's capacity to fit the data optimally. The specific mathematical expressions of the XGBoost algorithm are presented below:

ŷ_i^(t) = ŷ_i^(t−1) + f_t(x_i),
Obj^(t) = Σ_i l(y_i, ŷ_i^(t)) + Ω(f_t),

where ŷ_i^(t) denotes the predicted value at the t-th iteration, ŷ_i^(t−1) denotes the predicted value at the (t − 1)-th iteration, and f_t(x_i) denotes the prediction of the t-th decision tree, which is integrated into the model. The objective function, Obj, is incorporated into the model to assess the degree of fitting of the model. Additionally, a regularization term, Ω(f_t), is included to regulate the complexity of the model, where f_t denotes the t-th decision tree used in the model.
XGBoost significantly improves modeling efficiency compared with the general gradient boosting decision tree (GBDT) [57]: it is more than twice as fast as the random forest (RF) [58] model and ten times faster than GBDT. Consequently, we selected XGBoost as the classifier to enhance the efficiency of emotion recognition.
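The additive update ŷ^(t) = ŷ^(t−1) + f_t(x_i) can be illustrated with a toy boosting loop over depth-1 regression stumps. This is a didactic sketch of the boosting principle only, not XGBoost itself, which adds second-order gradients, the regularization term Ω(f_t), and many engineering optimizations via the xgboost package.

```python
import numpy as np

def fit_stump(x, residual):
    # Find the single-threshold split that best fits the current residual
    best_err, best = np.inf, None
    for s in np.unique(x)[:-1]:
        left, right = residual[x <= s], residual[x > s]
        pl, pr = left.mean(), right.mean()
        err = ((left - pl) ** 2).sum() + ((right - pr) ** 2).sum()
        if err < best_err:
            best_err, best = err, (s, pl, pr)
    return best

def boost(x, y, rounds=30, lr=0.25):
    # Additive model: y_hat^(t) = y_hat^(t-1) + lr * f_t(x),
    # where each f_t is fit to the residual of the previous round
    y_hat = np.zeros_like(y, dtype=float)
    for _ in range(rounds):
        s, pl, pr = fit_stump(x, y - y_hat)
        y_hat += lr * np.where(x <= s, pl, pr)
    return y_hat
```

Each round shrinks the residual, so the ensemble of weak stumps converges toward the target, which is the essence of the gradient boosting family that XGBoost belongs to.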
The emotions were categorized into four categories: high valence high arousal (HVHA), high valence low arousal (HVLA) [43], low valence high arousal (LVHA), and low valence low arousal (LVLA). HVHA corresponds to excitement, which occurs when the participant is in a high valence and high arousal state during the experiment. HVLA represents calmness or relaxation, which occurs when the participant is in a high valence and low arousal state. LVHA corresponds to anger or depression, which occurs when the participant is in a low valence and high arousal state. LVLA represents sadness and dejection, which occurs when the participant is in a low valence and low arousal state.
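This quadrant scheme reduces to a simple threshold rule. The following is an illustrative sketch; the 0-3 label numbering and the per-dataset thresholds (5 for DEAP, 2.5 for DREAMER) follow the dataset processing described below.

```python
def quadrant_label(valence, arousal, threshold=5.0):
    # Map (valence, arousal) ratings to the four quadrants:
    # 0 = HVHA, 1 = HVLA, 2 = LVHA, 3 = LVLA
    high_v = valence > threshold
    high_a = arousal > threshold
    if high_v and high_a:
        return 0  # excitement
    if high_v:
        return 1  # calmness/relaxation
    if high_a:
        return 2  # anger/depression
    return 3      # sadness/dejection
```

For DREAMER, the same function would be called with `threshold=2.5` to match its 1-5 rating scale.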

DEAP Dataset Processing
When processing EEG data from the DEAP dataset, a cut-off point of five was utilized, whereby labels below five were assigned a value of zero, and those above five were given a value of one. Four distinct emotions were classified, including HVHA, HVLA, LVHA, and LVLA, which were labeled as 0, 1, 2, and 3, respectively. The EEG signals used in the experiment underwent downsampling to 128 Hz, while EOG artifacts were removed, and a band-pass filter ranging from 4 Hz to 45 Hz was applied for filtering. The EEG signal of each participant was decomposed into four frequency bands: theta, alpha, beta, and gamma, after which the EEG signal was intercepted with a 2 s [60] non-overlapping time window.
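The windowing step can be sketched as follows. This is illustrative only; the theta/alpha/beta/gamma boundaries shown are the conventional ranges within the 4-45 Hz filter band and are an assumption, since the exact band edges are not restated here.

```python
import numpy as np

# Assumed conventional band edges within the 4-45 Hz filtered signal
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def segment_trial(signal, fs=128, win_sec=2):
    # Cut one EEG channel into non-overlapping 2 s windows
    win = fs * win_sec
    n = len(signal) // win
    return signal[: n * win].reshape(n, win)
```

A 60 s trial sampled at 128 Hz yields 30 windows of 256 samples each, and DE/PSD features are then computed per window and per band.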

DREAMER Dataset Processing
When processing EEG data from the DREAMER dataset, a cut-off point of 2.5 was employed, with labels below 2.5 assigned a value of 0 and those above 2.5 assigned a value of 1. Four distinct emotions were classified in this experiment, namely, HVHA, HVLA, LVHA, and LVLA, which were labeled as 0, 1, 2, and 3, respectively.

Baseline Model
To ascertain the efficacy of the proposed classification model, a comparative analysis was conducted between the classification performance of the proposed FCAN-XGBoost model and two other models, namely FCAN-SVM and FCAN-LSTM. To ensure an equitable evaluation, the same data processing techniques and experimental setup were employed for all baseline models.

FCAN-SVM
In previous studies on EEG emotion recognition, the SVM [61] has been widely utilized as a classification model with promising results [18,29,47]. In this study, we introduce a novel approach by fusing the FCAN module with the SVM algorithm to propose the FCAN-SVM algorithm. The FCAN-SVM algorithm was subsequently utilized as one of the baseline models for the experiment.

FCAN-LSTM
Long short-term memory (LSTM) [62] is a variant of the recurrent neural network (RNN) [63] architecture. It was first introduced by Hochreiter and Schmidhuber in 1997 and has undergone various optimizations and improvements by researchers over the years. LSTM is particularly adept at learning long-term dependencies, making it a suitable algorithm for processing and predicting time series data. Many researchers have utilized the LSTM in their studies, including EEG emotion recognition studies [39,46]. In this study, we present the FCAN-LSTM model, which fuses the LSTM algorithm with the FCAN module. The LSTM component of our model comprises two LSTM layers and a connection layer. We also employ this architecture as a benchmark model for our experiments.

Performance Evaluation Metrics
To evaluate the classification performance of the model comprehensively and objectively, we treated the four classification tasks as separate binary classification problems and used the following evaluation metrics: accuracy, precision, recall, and F1-score. Accuracy represents the proportion of samples correctly predicted by the model. Precision refers to the fraction of correctly predicted positive samples among all samples that the model predicts as positive, while recall denotes the fraction of correctly predicted positive samples among all actual positive samples. The F1-score is a performance metric that takes into account both precision and recall in its calculation. These four metrics are defined below:

Accuracy = (TP + TN) / (TP + TN + FP + FN),
Precision = TP / (TP + FP),
Recall = TP / (TP + FN),
F1-score = (2 × Precision × Recall) / (Precision + Recall),

where TP, FN, TN, and FP denote true positives, false negatives, true negatives, and false positives, respectively. In addition, to assess the internal consistency and reliability of the measures or scales used in the study, we evaluated the results of our experiments using Cronbach's alpha, a statistical indicator of the internal consistency of a measurement instrument. Typically, it takes on a value between 0 and 1, with larger values representing higher reliability of the measurement instrument.
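These definitions translate directly into code. The following is a minimal sketch computing all four metrics from the confusion-matrix counts of one class treated as positive.

```python
def binary_metrics(tp, fn, tn, fp):
    # Accuracy, precision, recall, and F1-score from confusion-matrix counts
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1
```

For the four-category task, these metrics are computed once per class (one-vs-rest) and can then be averaged.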

Results and Discussion
We trained the model on an NVIDIA GTX 1080ti GPU. The learning rates for FCAN and XGBoost were set to 0.001 and 0.25, respectively, and a dynamic learning rate adjustment mechanism was used during model training. The optimization function was Adam, and the loss function was cross-entropy. In our experiments, we divided the data into training and test sets at a ratio of 8:2.

Ablation Experiments
To objectively verify the classification effect of our model, we conducted three kinds of ablation experiments on the DEAP and the DREAMER. The first experiment was to explore the influence of different feature fusion methods on the accuracy of emotion recognition; the second was to verify the effect of the FANet module on emotion recognition; the third was to explore the influence of the position of the fully connected layer in the FCN2, where the XGBoost algorithm is located in the classification module, on the experimental results.

Feature Fusion Ablation Experiments
The experiments were performed on the two datasets for emotion classification using only DE features, using only PSD features, and using the addition, multiplication, and concatenation of DE and PSD features. The results of the experiments are shown in Tables 4 and 5. In Tables 4 and 5, DE represents emotion recognition using DE features only, PSD represents emotion recognition using PSD features only, X_add represents emotion classification by adding DE and PSD features, X_mult represents emotion classification by multiplying DE and PSD features, and X_con represents emotion classification by concatenating DE and PSD features. Tables 4 and 5 show that the accuracy of emotion classification using only DE or PSD features was lower than the classification achieved by fusing the two features. Furthermore, compared to emotion classification using the multiplication of the two features, the addition of DE and PSD features did not lead to higher accuracy. In particular, the best results were obtained by concatenating DE and PSD features, with the DEAP and DREAMER datasets achieving the highest accuracies of 95.26% and 94.05%, respectively. These results demonstrate that the concatenation of DE and PSD features can significantly improve the accuracy of EEG-based emotion recognition. The Cronbach's alpha values in both Tables 4 and 5 were 0.99, showing the high internal consistency and reliability of the measures used in this study.

FANet Module Ablation Experiments
The impact on emotion classification was investigated by including or excluding the FANet module in the feature processing module and by placing the FANet module at different positions; the experimental results are shown in Tables 6 and 7. In these tables, FCAN-XGBoost denotes the model with the FANet module, FCN-XGBoost denotes the model without it, and AF denotes the results with the FANet module placed after the feature fusion module. Tables 6 and 7 show that, on the DEAP and DREAMER datasets, emotion recognition accuracy with the FANet module improved by 0.48 and 1.35 percentage points, respectively, over the model without it. This indicates that including the FANet module in the feature processing module helped to improve the emotion classification performance of the model. The Cronbach's alpha values in Tables 6 and 7 are 0.96 and 0.99, respectively, indicating high measurement consistency and reliability.
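The Cronbach's alpha reliability coefficient reported throughout these tables can be computed as α = k/(k−1) · (1 − Σσ²ᵢ / σ²ₜ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₜ the variance of the summed scores. A minimal sketch (the function name and toy data are illustrative, not from the paper):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_observations, k_items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the summed scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Three identical items are perfectly consistent, so alpha is exactly 1.0
base = np.arange(10, dtype=float)
scores = np.column_stack([base, base, base])
print(round(cronbach_alpha(scores), 2))  # 1.0
```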

Impact of XGBoost Algorithm at Different Positions in FCN2
As described in Section 3.4.3, FCN2 in FCAN-XGBoost contains five fully connected layers. We fed the outputs of the different fully connected layers in FCN2 into the XGBoost classifier for emotion classification; the results are shown in Table 8. In Table 8, No_FC denotes the case in which the fused features were fed directly into the XGBoost classifier without passing through the FCN2 network, while IFC1 through IFC5 denote the cases in which the outputs of the first through fifth fully connected layers in FCN2 were used for emotion classification, respectively. The Cronbach's alpha value of 0.97 in Table 8 also demonstrates good consistency and reliability of the measurements. The FCN2 network reduces the dimensionality of the fused features and enhances their emotional expressiveness, leading to improved emotion classification performance. Our experimental results validate the effectiveness of the proposed FCAN-XGBoost algorithm.
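The overall pattern of this stage, deep features from an intermediate fully connected layer passed to a boosted-tree classifier, can be sketched as below. This is an assumption-laden illustration: the feature dimension, sample counts, and classifier hyperparameters are placeholders, and scikit-learn's GradientBoostingClassifier is used here as a stand-in for the XGBoost library the paper actually employs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in for the paper's pipeline: deep features taken from an intermediate
# fully connected layer (e.g. IFC3) are classified by a boosted-tree model.
# All dimensions below are illustrative, not taken from the paper.
rng = np.random.default_rng(0)
n_samples, n_deep_features, n_emotions = 200, 64, 4

deep_features = rng.standard_normal((n_samples, n_deep_features))
labels = rng.integers(0, n_emotions, size=n_samples)  # four emotion classes

# GradientBoostingClassifier substitutes for XGBoost in this sketch
clf = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
clf.fit(deep_features, labels)
preds = clf.predict(deep_features)
print(preds.shape)  # (200,)
```

In the actual model, `deep_features` would be the activations of one of the five FCN2 layers, which is what the IFC1–IFC5 ablations vary.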

Comparative Experiments
We compared the proposed emotion classification model with two baseline models and with state-of-the-art emotion classification models.
Tables 9 and 10 report the time and memory consumed by each model for emotion recognition. Specifically, "Time" is the duration each model took to perform emotion recognition on the test set, and "Memory" is the memory space each model occupied while executing that task.
Notably, our proposed model outperformed the two baseline models on the DEAP and DREAMER datasets, achieving higher emotion recognition accuracy while requiring fewer computational resources, as evidenced by its lower memory usage and shorter computation time. These findings indicate that the proposed FCAN-based emotion recognition model exhibits significantly superior classification performance to the two baseline models.
Furthermore, we compared the proposed model with the state-of-the-art models; the results are shown in Table 11. Table 11 presents the comparative analysis of our model against other existing models [16,42–48,50,51] and illustrates that our proposed model exhibits exceptional performance in recognizing emotions. We conducted the four-category emotion task on two widely used datasets, DEAP and DREAMER, attaining accuracy rates of 95.26% and 94.05%, respectively. These results demonstrate the efficacy of our FCAN-XGBoost emotion classification model for four-category emotion recognition.

Conclusions
This paper proposes a novel emotion recognition model named FCAN-XGBoost. A feature fusion strategy was employed to obtain fusion features. Motivated by the channel attention mechanism, we first proposed FANet to assign different weights to features of different importance levels to improve the classification performance of the model. To further improve accuracy, the FCAN and XGBoost algorithms were fused for emotion recognition. Results obtained from experiments conducted on two datasets, DEAP and DREAMER, demonstrate that the proposed model outperforms existing state-of-the-art models. Specifically, the proposed model achieved an accuracy of 95.26% and 94.05% on the four-class classification task for the DEAP and DREAMER datasets, respectively. Additionally, on the DEAP dataset, our model reduced memory consumption by approximately 92.78% and computing time by 76.70% compared to FCAN-SVM and reduced memory consumption by approximately 70.80% and computing time by 93.47% compared to FCAN-LSTM. On the DREAMER dataset, our model reduced memory consumption by approximately 94.43% and computing time by 75.45% compared to FCAN-SVM and reduced memory consumption by approximately 67.51% and computing time by 81.87% compared to FCAN-LSTM. This indicates that the proposed model significantly reduces computational costs while improving classification accuracy for EEG-based emotion recognition. Furthermore, the proposed model can be generalized to other multi-channel physiological signals for classification and recognition tasks, such as those for motor imagery, fatigue driving detection, and gesture recognition based on physiological signals.

Data Availability Statement:
The data for this study were obtained from publicly available datasets. The DEAP dataset is available at http://www.eecs.qmul.ac.uk/mmv/datasets/deap/index.html (accessed on 30 November 2022), and the DREAMER dataset is available at https://zenodo.org/record/546113# (accessed on 6 January 2023).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: