Emotion Classification from Multi-Band Electroencephalogram Data Using Dynamic Simplifying Graph Convolutional Network and Channel Style Recalibration Module

Because of its ability to objectively reflect people’s emotional states, electroencephalogram (EEG) has been attracting increasing research attention for emotion classification. The classification method based on spatial-domain analysis is one of the research hotspots. However, most previous studies ignored the complementarity of information between different frequency bands, and the information in a single frequency band is not fully mined, which increases the computational time and the difficulty of improving classification accuracy. To address the above problems, this study proposes an emotion classification method based on dynamic simplifying graph convolutional (SGC) networks and a style recalibration module (SRM) for channels, termed SGC-SRM, with multi-band EEG data as input. Specifically, first, the graph structure is constructed using the differential entropy characteristics of each sub-band and the internal relationship between different channels is dynamically learned through SGC networks. Second, a convolution layer based on the SRM is introduced to recalibrate channel features to extract more emotion-related features. Third, the extracted sub-band features are fused at the feature level and classified. In addition, to reduce the redundant information between EEG channels and the computational time, (1) we adopt only 12 channels that are suitable for emotion classification to optimize the recognition algorithm, which can save approximately 90.5% of the time cost compared with using all channels; (2) we adopt information in the θ, α, β, and γ bands, consequently saving 23.3% of the time consumed compared with that in the full bands while maintaining almost the same level of classification accuracy. Finally, a subject-independent experiment is conducted on the public SEED dataset using the leave-one-subject-out cross-validation strategy. 
According to experimental results, SGC-SRM improves classification accuracy by 5.51–15.43% compared with existing methods.


Introduction
In recent years, with the development of artificial intelligence, emotion classification has presented important application prospects in human-computer interaction, disease monitoring, artificial intelligence education, intelligent transportation, and other fields. For example, if drivers' emotions can be recognized, intervention measures may be taken to avoid accidents when drivers' concentration is severely disturbed [1]. Emotion classification methods are generally divided into two categories according to the type of signal analyzed: one is based on non-physiological signals, such as text, audio, facial expression, and body language; the other is based on physiological signals, such as electroencephalogram (EEG), electrocardiogram, galvanic skin response (GSR), and photoplethysmogram (PPG). Progress has been made in both types of methods; for example, Li et al. proposed a semi-supervised deep facial expression recognition method.
Based on the above analysis, this study proposes an emotion classification method based on a dynamic simplifying graph convolutional (SGC) network and a channel style recalibration module (SRM), termed SGC-SRM, with multi-band EEG data as input. The main contributions are as follows: (1) A multilayer SGC network is built, which extracts sub-band features in parallel, updates the adjacency matrix through backpropagation, and realizes dynamic learning of EEG topology. (2) A convolution layer based on the SRM is introduced to recalibrate the channel features of each sub-band to improve emotion-related feature extraction. (3) The features of four sub-bands are fused to achieve more accurate emotion classification, and 12 channels suitable for emotion classification are selected to reduce time consumption.
The remaining sections of this paper are arranged as follows: Section 2 gives an overview of SGC-SRM-related technologies. Section 3 describes the architecture and implementation of the proposed SGC-SRM model. In Section 4, experiments are conducted and the model performance and results are analyzed. Section 5 summarizes the main achievements of this study and highlights the future research direction.

EEG Emotion Classification Based on Spatial-Domain
EEG-based emotion classification frequently uses CNNs to extract spatial information from EEG (e.g., EmotionNet [12]). Nevertheless, there is considerable redundant information among multi-channel signals, which not only increases time consumption but also reduces classification accuracy. To compensate for the shortcomings of CNNs, some researchers extract the relationships between different EEG signal channels through capsule networks. For example, Kumari et al. used the short-term Fourier transform to convert raw one-dimensional EEG signals into two-dimensional spectrogram images and implemented a capsule network to process the spatio-temporal characteristics of EEG signals; the average accuracies of valence, arousal, and dominance on the DEAP dataset are 77.50%, 78.44%, and 79.38%, respectively [13]. Deng et al. used a capsule network to extract the spatial features of EEG channels, combined with an attention mechanism to adaptively assign different weights to each EEG channel, and used an LSTM to extract the temporal features of EEG sequences; the average accuracies of valence, arousal, and dominance on the DEAP dataset are 97.17%, 97.34%, and 96.50%, respectively [14]. However, the dynamic routing operation of capsule networks requires significant computational overhead; thus, more efficient solutions are needed.
Graph neural networks (GNNs) were introduced in 2009 by Scarselli et al. to deal with graph data [15]. The improved graph CNN (GCNN) method combines CNNs with spectral theory and provides an effective way to describe the intrinsic relationships between different nodes of a graph [16]. However, the spatial location connections between channels in EEG-based emotion classification do not necessarily represent the functional connections between them. Song et al. proposed a dynamic graph CNN (DGCNN), which used a Gaussian kernel function to initialize the adjacency matrix and dynamically learned the internal relationships between different EEG channels represented by the adjacency matrix; on the SEED dataset, subject-independent accuracy reaches 79.95% [9]. Subsequently, GNNs have been extensively used for EEG-based emotion classification. For example, Song et al. proposed a graph-embedded CNN (GECNN) that extracts distinctive local features and captures global features to identify EEG emotions [17]. Jin et al. proposed a graph convolutional network (GCN) with learnable electrode relations that learns the adjacency matrix automatically in a goal-driven manner, using the two-dimensional distribution of electrodes as the initial adjacency matrix (0 indicates that two electrodes are not adjacent; 1 indicates that they are adjacent); the subject-dependent recognition accuracy of DE features on SEED was 94.72% [18]. Li et al. proposed a self-organizing GNN (SOGNN) for cross-subject emotion classification, where the graph structure is dynamically constructed by a self-organized module for each signal [19]. Zhang et al. proposed a sparse DGCNN (SparseD) that imposes a sparseness constraint on the weighted graph to improve EEG-based emotion classification performance [20].

Channel Selection and Sub-Band Feature Extraction
EEG signals contain rich brain activity information distributed across different frequency bands [21]. Wang et al. found that emotional characteristics are mainly related to the high-frequency bands; e.g., alpha-band activity is concentrated in the right occipital lobe and parietal lobe, beta-band activity in the parietal and temporal lobes, and gamma-band activity in the left frontal lobe and right temporal lobe [22]. Therefore, the distribution of emotion-related information differs across sub-bands. Zhu et al. proposed an EEG-based emotion classification network based on attention fusion of multi-channel band features, which combined multiple frequency bands through feature addition, multiplication, and attention; the highest accuracy achieved on SEED was 96.45% [23]. Therefore, if the information of each sub-band is extracted in parallel and the emotion-related information of each sub-band is then fully mined through importance-based fusion, classification performance can be improved.

SRM
SRM was proposed to adaptively recalibrate intermediate feature maps using their style information [24]. First, an intermediate style representation T is extracted from each channel of the feature map X through style pooling; then, the per-channel recalibration weight G is estimated through channel-independent style integration. Finally, the input features X are combined with G to obtain the calibrated feature map X̂.
As depicted in Figure 1, SRM is mainly composed of style pooling and style integration: (1) style pooling calculates the average and standard deviation of each feature map to extract the style features T; (2) style integration is composed of a channel-wise fully connected (FC) layer, a batch normalization (BN) layer, and a sigmoid activation function. Inspired by image style transfer, SRM was originally used to extract image style and incorporate relatively important style features into feature maps. Zhang et al. designed a style discriminator with an SRM to capture seasonal style features in remote sensing images [25]. Lu et al. performed target detection on video-induced EEG signals, extracted EEG spatio-temporal features with graph convolution, and improved the SRM to select features with larger contributions [26]. Bao et al. added an SRM to a CNN to extract deep features highly correlated with emotion and obtained improved results in a subject-dependent experiment on the SEED dataset (95.08%) [27]. Therefore, introducing the SRM to adaptively recalibrate the intermediate features learned from sub-bands can incorporate them into the feature maps, thereby minimizing the loss of information and improving the feature extraction ability of the network [27].
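As a concrete illustration, the style pooling and style integration steps described above can be sketched in NumPy as follows. This is a minimal sketch with hypothetical parameter names; in a real network, `w_fc` and the BN parameters would be learned during training.

```python
import numpy as np

def srm_recalibrate(x, w_fc, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    """Style Recalibration Module (SRM) sketch, following [24].

    x      : feature map of shape (N, C, H, W)
    w_fc   : per-channel weights of shape (C, 2) for the channel-wise FC
             (one weight for the mean, one for the std of each channel)
    bn_*   : batch-norm parameters of shape (C,) (assumed pre-trained)
    """
    # Style pooling: mean and standard deviation of each channel's feature map
    mu = x.mean(axis=(2, 3))                      # (N, C)
    sigma = x.std(axis=(2, 3))                    # (N, C)
    t = np.stack([mu, sigma], axis=-1)            # style features T, (N, C, 2)

    # Style integration: channel-wise FC -> BN -> sigmoid
    z = (t * w_fc).sum(axis=-1)                   # (N, C)
    z = bn_gamma * (z - bn_mean) / np.sqrt(bn_var + eps) + bn_beta
    g = 1.0 / (1.0 + np.exp(-z))                  # recalibration weights G in (0, 1)

    # Recalibrate: scale each channel of X by its weight G
    return x * g[:, :, None, None]
```

Because G lies in (0, 1), each channel is attenuated in proportion to how informative its style statistics are judged to be.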

Methodology
To effectively utilize the topological information of EEG signals in both the frequency and spatial domains, the SGC-SRM model is proposed in this study. First, the DE features extracted from each sub-band of an EEG signal are used as input. Second, considering that different emotional states show different degrees of activation in different frequency bands [6], we extract the features of each sub-band separately and then fuse them according to the importance of the frequency bands, which better mines the information in the frequency domain. Given that EEG signals contain topological information, we improve the dynamic SGC to learn the relationships between channels. When extracting sub-band features, we add an SRM-based convolution layer to adaptively learn the intermediate feature map and recalibrate the channel features, emphasizing emotion-related information while suppressing the rest. Finally, we use a fully connected layer and softmax for the triple classification (positive, neutral, negative).

Construction of Adjacency Matrix
The EEG graph structure is represented by G = (V, E, A), where V denotes the set of nodes of the graph with |V| = C; E represents the set of edge connections between different nodes; and A represents a symmetric adjacency matrix with A ∈ R^(C×C) and A_ii = 1, whose elements a_ij represent the edge weights between nodes v_i and v_j, i.e., the relationships between EEG channels. Salvador et al. found that local brain correlations typically decay as the Euclidean distance between the centroids of regions increases, and this nonlinear relationship can be approximately described by an inverse square law [28]. We refer to the adjacency matrix definition of the regularized GNN (RGNN) [29]:

A_ij = min(1, δ / d_ij²),

where d_ij denotes the physical distance calculated from the 3D coordinates of channels i and j on the device that collects EEG signals, and δ = 5, ensuring that approximately 20% of the relationships between channels are not ignored [29].
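Under this RGNN-style initialization, the adjacency matrix can be computed directly from the 3D electrode coordinates. The following NumPy sketch is illustrative; `build_adjacency` and its argument names are not from the paper.

```python
import numpy as np

def build_adjacency(coords, delta=5.0):
    """Initialize the adjacency matrix as in RGNN [29]:
    A_ij = min(1, delta / d_ij^2), with A_ii = 1,
    where d_ij is the physical distance between electrodes i and j.

    coords : (C, 3) array of 3D electrode coordinates.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)                 # squared distances d_ij^2
    with np.errstate(divide="ignore"):
        a = np.minimum(1.0, delta / d2)           # inverse-square decay, capped at 1
    np.fill_diagonal(a, 1.0)                      # self-connections: A_ii = 1
    return a
```

Electrode pairs closer than sqrt(δ) receive the maximum weight 1, while distant pairs decay toward 0, matching the inverse-square observation of [28].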

SGC Network
SGC (simplifying GCN) is a variant of the GCN. GCNs were first proposed by Kipf et al. [30]. Like CNNs and multilayer perceptrons, a GCN first learns the feature vectors of each node through multilayer networks and then feeds these vectors to a linear classifier. The difference between GCNs and multilayer perceptrons is that the hidden representation of each node is averaged with that of its neighbors at the beginning of each layer. A graph convolutional layer contains three strategies for updating node representations: feature propagation, linear transformation, and nonlinear activation. The propagation between GCN layers is represented as follows:

H^(k) = σ(S H^(k−1) Θ^(k)), with S = D̃^(−1/2) Ã D̃^(−1/2) and Ã = A + I,

where S represents the normalization of the adjacency matrix A; I represents the identity matrix; D̃ represents the degree matrix of Ã; H^(k) represents the features of the k-th layer, with H^(0) = X for the input layer; σ represents the nonlinear activation function; and K is an integer representing the number of layers. To reduce the excessive complexity of GCNs, Wu et al. proposed SGC, which removes the nonlinearities between GCN layers and collapses the resulting function into a single linear transformation. Experiments show that SGC is more computationally efficient than GCNs while achieving comparable or even better performance [31]. The propagation of SGC can be expressed as follows:

Ŷ = softmax(S^K X Θ),

where X represents the input; S^K = SS...S represents the repeated multiplication of the normalized adjacency matrix S collapsed into a single matrix; and Θ = Θ^(1) Θ^(2) ... Θ^(K) means that the weights are reparameterized into a single matrix.
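The SGC propagation rule above can be sketched in NumPy for a single EEG sample. Here `theta` stands for the collapsed weight matrix Θ, which would be learned in practice; the softmax classifier is omitted for brevity.

```python
import numpy as np

def sgc_forward(a, x, theta, k=2):
    """Simplified Graph Convolution (SGC) sketch, following [31]:
    output = S^K X Theta, with S = D~^(-1/2) (A + I) D~^(-1/2).

    a     : (C, C) adjacency matrix
    x     : (C, D) node features (one sample: C channels, D features per channel)
    theta : (D, F) collapsed weight matrix
    """
    a_tilde = a + np.eye(a.shape[0])              # add self-loops: A~ = A + I
    d = a_tilde.sum(axis=1)                       # degree of each node
    d_inv_sqrt = 1.0 / np.sqrt(d)
    s = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization

    h = x
    for _ in range(k):                            # repeated propagation: S^K X
        h = s @ h
    return h @ theta                              # single linear transformation
```

Because all nonlinearities between layers are removed, the K propagation steps collapse into one matrix power, which is the source of SGC's efficiency over a full GCN.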

Improved Dynamic SGC
Studies have shown that DE features have stronger discriminative power in emotion recognition than other features [7,9,19]. Therefore, the DE features of EEG signals are used as the input of the model in this study. For X ~ N(µ, σ²), the DE feature is calculated as follows:

h(X) = ½ log(2πeσ²).

The input of the model is denoted as X ∈ R^(N×C×B×D) with labels Y ∈ Z^N, where N represents the number of samples; C denotes the number of channels; B denotes the number of frequency bands; and D denotes the feature dimension. The proposed model first initializes an adjacency matrix A_b for the DE features X_b of each sub-band (see the descriptions in Section 3.1.1), where b ∈ {δ, θ, α, β, γ} or b ∈ {θ, α, β, γ}. Subsequently, the adjacency matrix A_b is taken as the input of the dynamic SGC layer. We note that in the dynamic SGC layer, the SGC (see Section 3.1.2) is performed twice. Each time, K is set to one (i.e., the feature representation of a node is derived by information aggregation from its neighbor nodes), the size of the features is changed by each convolution operation, and an activation function is added to the last layer.
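Under the Gaussian assumption, the DE feature of a signal window reduces to a one-line computation on the sample variance. This sketch omits the band-pass filtering that a real pipeline would apply first to isolate each sub-band.

```python
import numpy as np

def differential_entropy(x):
    """DE of a signal segment assumed Gaussian, X ~ N(mu, sigma^2):
    h(X) = 0.5 * log(2 * pi * e * sigma^2)
    """
    var = np.var(x)                               # sample estimate of sigma^2
    return 0.5 * np.log(2 * np.pi * np.e * var)
```

For a unit-variance segment this evaluates to 0.5·ln(2πe) ≈ 1.42 nats, and it grows logarithmically with the band power, which is why DE is often described as a logarithmic energy feature.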
The model constructs a graph structure for each sub-band, extracts the sub-band features separately (in parallel), and performs fusion; that is, an adjacency matrix is constructed for each frequency band. Finally, the cross-entropy loss function of the RGNN [29] is improved. Specifically, if the total number of frequency bands is B, then B adjacency matrices are constructed and the improved loss function is as follows:

Loss = CrossEntropy(Y, Ŷ) + a × Σ_{b=1}^{B} ||A_b||₁,

where B represents the total number of frequency bands; A_b represents the adjacency matrix constructed from the data of band b; and a represents the L1 regularization strength of the adjacency matrices, set to 0.01 (see Section 4.3.5 for the analysis of a). CrossEntropy represents the cross-entropy loss function:

CrossEntropy(p, q) = −Σ_i p(x_i) log q(x_i),

where p(x_i) represents the true one-hot encoding vector and q(x_i) represents the predicted encoding vector. Because a fixed graph structure cannot simulate the states of different subjects under different emotions, the adjacency matrix A is dynamically updated through backpropagation by computing the gradient of the loss function with respect to A, thereby dynamically learning the relationships between channels:

A = A − lr × ∂Loss/∂A,

where lr represents the learning rate.
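A minimal sketch of the L1 penalty and the dynamic adjacency update is given below. This is illustrative only: `grad_ce` stands for the cross-entropy gradient with respect to A, which an autograd framework would compute during backpropagation; here the L1 subgradient a·sign(A) is added explicitly.

```python
import numpy as np

def l1_penalty(adjs, alpha=0.01):
    """L1 regularization term: alpha * sum_b ||A_b||_1 over the B sub-band
    adjacency matrices, as in the improved loss above."""
    return alpha * sum(np.abs(a).sum() for a in adjs)

def update_adjacency(a, grad_ce, alpha=0.01, lr=1e-3):
    """One dynamic update step, A <- A - lr * dLoss/dA, combining the
    cross-entropy gradient (grad_ce, assumed given) with the L1 subgradient."""
    return a - lr * (grad_ce + alpha * np.sign(a))
```

The L1 term drives weak channel connections toward zero, so the learned graphs stay sparse while the cross-entropy gradient reshapes them toward emotion-discriminative topology.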

Layer 2: SRM Convolution Layer
The input of the SRM convolutional layer [24] is a reshaped version of the dynamic SGC output of Section 3.1.3, which is encoded into the feature space using three convolutional layers and two SRM layers. The specific process is depicted in Figure 3 (see Table 1 for the parameters); this layer assigns large weights to important features in each sub-band and small weights to features weakly correlated with emotion.

Layer 3: Fusion and Classification Layer
During feature fusion, the model adaptively learns the weights of the feature maps extracted from different sub-bands, thereby improving the classification ability of the model. Assume that the weights of the B frequency bands are learnable parameters W_i with Σ_{i=1}^{4} W_i = 1. The information of the four sub-bands is fused to obtain the following:

Band_f = W₁ × Band_θ + W₂ × Band_α + W₃ × Band_β + W₄ × Band_γ,

where Band_f represents the fused feature, and Band_θ, Band_α, Band_β, and Band_γ denote the DE-based features on the θ, α, β, and γ bands, respectively. Finally, a fully connected layer and softmax are used for emotion classification.
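The weighted fusion can be sketched as follows. One assumption is made explicit here: the constraint Σ W_i = 1 is enforced with a softmax over learnable logits, which the text states as a constraint but does not specify how to enforce.

```python
import numpy as np

def fuse_bands(band_feats, logits):
    """Weighted fusion of sub-band features.

    band_feats : list of B feature arrays, one per sub-band (theta, alpha, beta, gamma)
    logits     : (B,) unnormalized learnable weight parameters
    """
    w = np.exp(logits - logits.max())             # numerically stable softmax
    w = w / w.sum()                               # W_i >= 0 and sum_i W_i = 1
    return sum(wi * f for wi, f in zip(w, band_feats))
```

With equal logits the fusion reduces to a plain average; during training, the logits shift weight toward the sub-bands whose features contribute most to classification.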

Note (Table 1): with 62 channels, out1 = 128, out2 = 256, and out3 = 8; with 12 channels, out1 = 32, out2 = 64, and out3 = 8. N represents the mini-batch size; C represents the number of channels. "-" indicates that the parameter is not provided.

Dataset
The experiment was conducted on the public SEED dataset, an EEG emotion dataset released by Shanghai Jiaotong University. Fifteen subjects participated, each completing three sessions in which they watched 15 movie clips, yielding 675 samples in total [7,32]. In this study, the DE features provided with the dataset are used as model input. The dataset provider uses a non-overlapping Hamming window with a window length of 1 s and the short-time Fourier transform to extract five frequency bands of the EEG signals (delta: 1-3 Hz; theta: 4-7 Hz; alpha: 8-13 Hz; beta: 14-30 Hz; gamma: 31-50 Hz), from which the DE features are calculated. To unify the sequence length, we follow the self-organized GNN (SOGNN) model and zero-pad the SEED data to 265 windows when fewer than 265 are available [19].

Experimental Setup
Generally, verification strategies for EEG-based emotion classification take two forms: subject-dependent and subject-independent. On the public benchmark dataset SEED, this study uses leave-one-subject-out (LOSO) cross-validation to evaluate the performance of the model, a subject-independent strategy that assesses the model's ability to recognize the emotions of unseen subjects. Specifically, the DE features of 14 subjects are used as the training set, whereas the data of the remaining subject are used as the test set; fifteen folds are conducted, each holding out a different subject. After the model converges, the average of the last 10 epochs is taken as the result of each fold [33]. The final evaluation result of the model is the average accuracy (ACC) and standard deviation (STD) over all folds. Before training, we normalize the data of each subject, that is, subtract the mean of each subject's features and divide by their STD [19].
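The LOSO protocol with per-subject normalization can be sketched as follows. This is an illustrative helper; `features_by_subject` is a hypothetical list of per-subject feature arrays, not an interface from the paper.

```python
import numpy as np

def loso_splits(features_by_subject):
    """Leave-one-subject-out splits with per-subject z-score normalization.

    features_by_subject : list of (N_s, D) arrays, one per subject.
    Yields (train_x, test_x) for each fold; each subject is normalized by its
    own mean and standard deviation before being pooled, as in [19].
    """
    normed = []
    for x in features_by_subject:
        normed.append((x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8))
    for i in range(len(normed)):
        train = np.concatenate([x for j, x in enumerate(normed) if j != i])
        yield train, normed[i]
```

Normalizing each subject independently removes per-subject offset and scale differences before the held-out subject is ever seen, which is what makes the evaluation subject-independent.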
The model is trained on an NVIDIA GeForce GTX 1080 Ti. The initial learning rate is 0.001 and is dynamically adjusted with exponential decay. The number of training epochs is 50, the optimizer is Adam, and the batch size is 64.

Scheme Validation
According to the structure of the SGC-SRM model, the experiments need to verify the following: (1) whether the performance of multi-bands is better than that of a single band; (2) whether the features of fused sub-bands are better than those of all bands extracted directly.

(2) Performance of multi-bands. Because four of the five frequency bands of EEG signals are closely related to human emotions [34,35], and considering that many studies use all five common frequency bands, we develop two data input methods: Four bands = {θ, α, β, γ}, indicating that the DE feature data extracted from the θ, α, β, and γ bands are used as input; All bands = {δ, θ, α, β, γ}, indicating that the DE feature data extracted from all five bands are used as input.
As shown in Table 2, when the data of four bands or all bands are used as the input of the model Band-SGC-SRM, the classification result (ACC ± STD) is 94.07% ± 4.11% or 93.78% ± 4.23%, respectively, which outperforms the single-band performance in (1).

(3) Performance of sub-band feature fusion. The data of four bands or all bands are used as the input of the SGC-SRM model, an adjacency matrix is constructed for each sub-band, and the features of each sub-band are extracted in parallel. Fusion classification is then performed under the LOSO cross-validation strategy, and the results are shown in the last two rows of Table 2. When the four bands are used as input, denoted Fusion (θ, α, β, γ), the average accuracy is 94.77% with an STD of 4.48%; when all bands are used as input, denoted Fusion (All bands), the average accuracy is 94.90% with an STD of 3.94%. This indicates that the sub-band feature fusion strategy proposed in this study performs better than not fusing sub-bands, as in (2).

Channel Selecting Performance
As shown in Figure 4, the 12 channels "FT7", "T7", "TP7", "P7", "C5", "CP5", "FT8", "T8", "TP8", "P8", "C6", and "CP6" are selected in this study [7]. Using the same settings as in Section 4.3.1, we conduct experiments on sub-band fusion and on direct extraction of features from all bands, based on four and five frequency bands, respectively. As presented in Table 2, we can conclude the following: (1) the results with 12 channels are consistent with those with 62 channels; (2) the performance of applying four- and five-band features is comparable; (3) fusing the features of the sub-bands is better than directly extracting the features of all bands.
To further evaluate the classification performance of different multi-band and multichannel methods, we conduct experiments and display the box diagram in Figure 5.
In the middle of Figure 5, SOGNN [19] is introduced as a baseline using the DE features of five frequency bands in the SEED dataset, with the box plot obtained by LOSO cross-validation. The figure shows that the maximum, median (orange horizontal line), average (green triangle), minimum, upper quartile (upper edge of the rectangle, indicating that 25% of the values are greater), and lower quartile (lower edge of the rectangle, indicating that 25% of the values are smaller) of all fold results of the SGC-SRM model are better than those of SOGNN. From the comparison between four- and five-sub-band feature fusion in Figure 5, we can see that (i) the maximum, minimum, upper quartile, and lower quartile are the same when using 62 and 12 channels; and (ii) the average when using 12 channels is slightly better than that when using 62 channels.

Time Consumption
To more intuitively observe the impact of the number of channels and frequency bands on the calculation time, one fold of the cross-validation experiment is randomly selected, and the times required for different channel numbers with four and five sub-bands are compared, as shown in Figure 6. For the 62 channels, the total time required with five sub-bands is 17,394.64 s; in contrast, the total time required with four sub-bands is 11,525.26 s, which is 33.74% lower. For the 12 channels, the total times required with five and four sub-bands are 1494.39 and 1146.19 s, respectively; the latter reduces the operation time by 23.30%. Moreover, compared with using 62 channels, applying five sub-bands with 12 channels saves 91.41% of the total time, and applying four sub-bands with 12 channels saves 90.5% of the total time.

(1) The time for feature fusion when using five sub-bands (the average of three groups of experiments) is significantly higher than that when using four sub-bands, but the performance is equivalent. Specifically, using five bands is slightly better than using four bands, with an accuracy difference of 0.13% for 62 channels and 0.16% for 12 channels (see Table 2).
(2) The accuracy of using 12 channels is slightly better than that of using 62 channels.
From the box plot in Figure 5 (the green triangle is the average value), the average accuracy of using 12 channels is slightly higher than that of using 62 channels. Compared with the box plot of SOGNN, the SGC-SRM model is better in terms of the maximum, minimum, average, and median. (3) In terms of time consumption, using 12 channels is significantly better than using 62 channels (see Figure 6).
Therefore, selecting the four-sub-band DE features of 12 channels is suitable for emotion classification because of its better performance and lower time consumption.

Effectiveness of DE Characteristics
To verify the effectiveness of the DE features, we use the optimized 12 channels and four sub-bands as the input of the SGC-SRM model and obtain 15 folds of cross-validation results using DE and PSD features, as depicted in Figure 7. The average accuracy and standard deviation obtained with PSD features are 91.90% ± 4.78%, whereas those of DE features are 95.22% ± 3.61%. The results show that DE features achieve higher accuracy and lower standard deviation than traditional PSD features, which is consistent with the conclusion of [8].

Analysis of Parameter a

The classification results with different values of a are shown in Figure 8, where a = 0.01 exhibits the best performance. To further verify the effectiveness of the improved loss function, the term a × Σ_b ||A_b||₁ is removed from the loss function, so that the adjacency matrix is fixed and does not change dynamically; "NA" in the figure denotes this setting.
The results show that dynamically updating the adjacency matrix through backpropagation can better capture the spatial relationships of EEG signals.
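The DE feature discussed above has a simple closed form when each band-filtered EEG segment is treated as approximately Gaussian: DE = ½ ln(2πeσ²), where σ² is the segment variance. The following minimal sketch illustrates this computation (the 200 Hz segment length is only an illustrative choice, not a detail taken from this paper's preprocessing code):

```python
import numpy as np

def differential_entropy(x: np.ndarray) -> float:
    """DE of a band-filtered EEG segment, assuming the samples are
    approximately Gaussian: DE = 0.5 * ln(2 * pi * e * sigma^2)."""
    var = np.var(x)
    return 0.5 * np.log(2 * np.pi * np.e * var)

# Toy check: for unit-variance Gaussian noise the DE should be
# close to 0.5 * ln(2 * pi * e), i.e., about 1.4189.
rng = np.random.default_rng(0)
segment = rng.standard_normal(200)  # e.g., 1 s of data at 200 Hz
print(differential_entropy(segment))
```

In practice, one such DE value is computed per channel and per sub-band (θ, α, β, γ), giving the node features fed into the graph model.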

Comparative Experiment
To further evaluate the overall performance of the SGC-SRM model, we conduct a series of experiments on the public SEED dataset. The relevant baseline methods are described below, and the results are summarized in Table 3.

Table 3. Leave-one-subject-out emotion recognition accuracy (mean ± standard deviation) on SEED. Note: "/" indicates that the value is not provided in the literature.

DGCNN [9]: Multi-channel EEG-based emotion classification method based on DGC-NNs that initializes the adjacency matrix and trains the adjacency matrix dynamically through backpropagation.
GECNN [17]: A deep learning method used for EEG emotion recognition, where the CNN is used to extract different depth local features, and then dynamic graph filtering is used to explore the internal relationship between different EEG regions.
BiDANN-S [36]: A deep learning method used for EEG-based emotion classification, where the original EEG features extracted from each cerebral hemisphere are used to extract differentiated depth features, and domain discriminators are used to alleviate domain differences between the source and target domains.
BiHDM [10]: A bi-hemispheric discrepancy model learns asymmetrical differences between two hemispheres, using four recurrent neural networks to capture information from EEG electrodes in each hemisphere from horizontal and vertical streams.
RGNN [29]: A regularized GNN for EEG-based emotion classification, extended SGC, which uses the adjacency matrix to model channel relationships in EEG signals. To effectively deal with cross-subject EEG variations and noisy labels, node-wise domain adversarial training and emotion-aware distribution learning are proposed.
SOGNN [19]: A self-organizing GNN for cross-subject emotion classification of EEG, which dynamically constructs the graph structure according to the corresponding EEG features of the input and processes the graph structure by three graph convolution layers to extract the local and global connection features for emotion recognition.
SparseD [20]: A sparse DGCNN model, which introduces sparse constraints into the graph representation to improve the DGCNN.
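As an illustration of the simplifying graph convolution (SGC) propagation that our model and the RGNN baseline build on, the following NumPy sketch applies the K-step propagation S^K X, where S is the symmetrically normalized adjacency with self-loops. The function name and the toy 12-node graph are illustrative only, not taken from any of the cited implementations:

```python
import numpy as np

def sgc_features(adj: np.ndarray, x: np.ndarray, k: int = 2) -> np.ndarray:
    """K-step SGC propagation S^K X, with S the symmetrically
    normalized adjacency (self-loops added)."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degree^(-1/2)
    s = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = x
    for _ in range(k):
        out = s @ out                               # one propagation step
    return out

# 12 electrode nodes, 4 DE features each (one per sub-band).
rng = np.random.default_rng(1)
adj = (rng.random((12, 12)) > 0.7).astype(float)
adj = np.maximum(adj, adj.T)                        # symmetric graph
feats = sgc_features(adj, rng.standard_normal((12, 4)), k=2)
print(feats.shape)  # (12, 4)
```

In the learnable variants discussed here, the adjacency entries themselves are parameters updated by backpropagation rather than a fixed matrix.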
As shown in Table 3, adopting the LOSO cross-validation strategy with the DE features of four frequency bands as input, the average accuracy of the SparseD model is 89.71% (62 channels applied), and that of SGC-SRM is 95.22% (12 channels applied). When DE features of five frequency bands are used as input, the average accuracies of the DGCNN, GECNN, BiDANN-S, BiHDM, RGNN, SOGNN, and SparseD models using 62 channels are 79.95%, 82.46%, 84.14%, 85.40%, 85.30%, and 86.81%, respectively. In contrast, the average accuracy of the SGC-SRM model using 12 channels is 95.38%, which indicates that the proposed model can obtain better classification accuracy using fewer channels (i.e., a smaller amount of data).
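The LOSO (leave-one-subject-out) protocol used in these comparisons trains on all subjects but one and tests on the held-out subject, yielding one fold per subject. A minimal sketch of the split logic (subject counts chosen to mirror SEED's 15 subjects; the trial count per subject here is only illustrative):

```python
import numpy as np

def loso_splits(subject_ids):
    """Leave-one-subject-out: one split per subject, with that
    subject's trials held out as the test set."""
    subject_ids = np.asarray(subject_ids)
    for subj in np.unique(subject_ids):
        test_mask = subject_ids == subj
        yield np.where(~test_mask)[0], np.where(test_mask)[0]

# 15 subjects (as in SEED), 3 trials each -> 15 folds.
ids = np.repeat(np.arange(15), 3)
folds = list(loso_splits(ids))
print(len(folds))        # 15
print(len(folds[0][1]))  # 3 held-out trials for the first subject
```

Because no data from the test subject appears in training, the reported accuracy measures subject-independent generalization.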

Ablation Experiments
To verify the effectiveness of the important modules in the SGC-SRM model, we conducted a series of ablation experiments on the SEED dataset, including the following:
(1) Verify the effectiveness of the global connection on 62 channels or 12 channels.
(2) Verify the effectiveness of the convolution layer without SRM on 62 channels or 12 channels.
(3) Verify the effectiveness when removing global connections and SRM-based convolutional layers simultaneously.
The results are shown in Table 4. If the global connection is removed, the average accuracy of the model decreases by 0.99% on 62 channels and 0.43% on 12 channels; please see the comparison between the first two rows of Table 4. After removing the SRM-based convolutional layer, the average accuracy decreases by 4.24% on 62 channels and 1.77% on 12 channels; see the first and third rows of Table 4. When both global connections and SRM-based convolutional layers are removed, the average accuracy drops by 4.73% on 62 channels and 1.74% on 12 channels; see the first and fourth rows of Table 4. The results show that global connection can enhance the information of learning asymmetric channels and improve the performance of the model. Introducing a convolutional layer based on SRM and recalibrating the channel features of sub-bands can effectively improve the ability of emotion-related feature extraction, thereby improving the model's classification accuracy.
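The channel recalibration ablated above can be pictured as a gating step: per-channel style statistics (mean and standard deviation) are combined channel-wise and squashed into a gate that rescales each channel's features. The sketch below follows the general SRM idea; the random weights and the omission of batch normalization are simplifications for illustration, not details of the paper's implementation:

```python
import numpy as np

def srm_recalibrate(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Style-recalibration sketch: style pooling (per-channel mean
    and std), a channel-wise linear combination with weights w of
    shape (C, 2), and sigmoid gating. x has shape (C, T)."""
    mu = x.mean(axis=1)
    sigma = x.std(axis=1)
    style = np.stack([mu, sigma], axis=1)  # (C, 2) style vector
    z = (style * w).sum(axis=1)            # channel-wise fully connected
    g = 1.0 / (1.0 + np.exp(-z))           # per-channel gate in (0, 1)
    return x * g[:, None]

rng = np.random.default_rng(2)
x = rng.standard_normal((12, 200))         # 12 channels, 200 samples
w = rng.standard_normal((12, 2))
y = srm_recalibrate(x, w)
print(y.shape)  # (12, 200)
```

Channels whose style statistics produce a gate near 1 pass through almost unchanged, while channels gated toward 0 are suppressed, which is how the module emphasizes emotion-related channels.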

Confusion Graph of SGC-SRM
To further verify the performance of the SGC-SRM model, the confusion matrices of LOSO are shown in Figure 9. The horizontal axis represents the predicted label, and the vertical axis represents the actual label. The three category labels are negative, neutral, and positive from left to right and from top to bottom. The SGC-SRM model exhibited good results in identifying negative, neutral, and positive emotions in the SEED dataset; the accuracies obtained by LOSO cross-validation were 95%, 95%, and 97%, respectively. Among the misclassified samples, those labeled neutral are most often misidentified as negative; the probability of this error is 4% under LOSO cross-validation. Positive emotions are more easily identified than neutral and negative emotions.
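The layout described for Figure 9 (rows = actual label, columns = predicted label) can be reproduced with a few lines of counting code. The toy labels below are illustrative only and show one neutral trial misread as negative, the dominant error pattern reported above:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows = actual label, columns = predicted label,
    ordered negative, neutral, positive as in Figure 9."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy example: 0 = negative, 1 = neutral, 2 = positive.
true = [0, 1, 1, 2]
pred = [0, 0, 1, 2]  # one neutral trial predicted as negative
cm = confusion_matrix(true, pred)
print(cm)
# [[1 0 0]
#  [1 1 0]
#  [0 0 1]]
```

Dividing each row by its sum gives the per-class recognition rates quoted above (e.g., 95% for negative).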

Conclusions
In this study, we propose an EEG-based emotion classification method based on multi-band dynamic SGC and channel feature recalibration. A multilayer SGC is constructed to learn sub-band features in parallel, and a convolution layer based on SRM is introduced to recalibrate channel features. In addition, 12 channels suitable for emotion classification are selected to reduce the time cost. Furthermore, the performance of single-band input, multi-band direct input, and sub-band feature fusion is compared, and the results show that the proposed sub-band feature fusion achieves high-accuracy emotion classification. Ablation experiments further verify the effectiveness of the important layers in our model. In the future, the SGC-SRM model will be applied to other physiological signals or fused with various non-physiological signals to improve the accuracy of emotion classification.