Transfer EEG Emotion Recognition by Combining Semi-Supervised Regression with Bipartite Graph Label Propagation

Abstract: Individual differences often appear in electroencephalography (EEG) data collected from different subjects due to its weak, nonstationary, and low signal-to-noise ratio properties. This causes many machine learning methods to have poor generalization performance because the independent identically distributed assumption is no longer valid in cross-subject EEG data. To this end, transfer learning has been introduced to alleviate the data distribution differences between subjects. However, most of the existing methods have focused only on domain adaptation and failed to achieve effective collaboration with label estimation. In this paper, an EEG feature transfer method combined with semi-supervised regression and bipartite graph label propagation (TSRBG) is proposed to realize the unified joint optimization of EEG feature distribution alignment and semi-supervised joint label estimation. Through cross-subject emotion recognition experiments on the SEED-IV data set, the results show that (1) TSRBG has significantly better recognition performance in comparison with state-of-the-art models; (2) the EEG feature distribution differences between subjects are significantly minimized in the learned shared subspace, indicating the effectiveness of domain adaptation; (3) the key EEG frequency bands and channels for cross-subject EEG emotion recognition are identified by investigating the learned subspace, which provides more insights into the study of EEG emotion activation patterns.


Introduction
In 1964, Michael Beldoch first introduced the idea of Emotional Intelligence (EI) in [1], which examined three modes of communication (i.e., vocal, musical, and graphic) to identify nonverbal emotional expressions. In 1990, Salovey and Mayer formally put forward the concept of EI and considered emotional intelligence an important component of artificial intelligence in addition to logical intelligence [2]. The key to EI is that machines can recognize the emotional state of humans automatically and accurately. Endowing machines with EI is indispensable to natural human-machine interaction, which makes machines more humanized in communication [3,4]. In addition, endowing machines with EI has great impact in many fields such as artificial intelligence emotional nursing, human health, and patient monitoring [5]. Emotion is a state that integrates people's feelings, thoughts, and behaviors. It includes not only people's psychological response to the external environment or self-stimulation, but also the physiological response accompanying this psychological response [6]. Compared with the widely used data modalities such as image, video, speech, and text [7][8][9], EEG has its unique advantages such as high time resolution. In addition, EEG is difficult to camouflage in emotion recognition since it is directly generated from the neural activities of the central nervous system [10]. Therefore, EEG is widely used in emotion recognition research. For example, the information of EEG data was encoded into a bi-dimensional map, which was further used to perform knowledge transfer by characterizing the propagation patterns from one channel to the others [35].
Although transfer learning has been widely used in EEG-based emotion recognition to align the EEG data from different subjects [36], most existing studies simply place the emphasis on domain-invariant feature learning and recognition accuracy. Therefore, it is necessary to jointly optimize the recognition process in combination with domain-invariant feature learning. In [22], neural networks were used to simultaneously minimize the recognition error on source data and force the latent representations of source and target data to be similar. Ding et al. constructed an undirected graph to characterize the source and target sample connections, based on which the transfer feature distribution alignment process is optimized together with the graph-based semi-supervised label propagation task [37]. However, this graph was constructed from the original space data and is not dynamically updated during the model optimization; therefore, it cannot well describe the sample connections between the two domains. Beyond recognition accuracy, most existing studies only visualized the aligned distributions of source and target EEG data and did not sufficiently investigate the properties of the learned shared subspace in emotion expression [22,38,39].
In view of the above shortcomings, this paper proposes an EEG transfer emotion recognition method combining semi-supervised regression with bipartite-graph label propagation. Compared with the existing studies, the present work makes the following contributions.
• The semi-supervised label propagation method based on a sample-feature bipartite graph and the semi-supervised regression method are combined to form a unified framework for joint common subspace optimization and emotion recognition. We first achieve better data feature distribution alignment through EEG feature transfer, based on which we then construct a better sample-feature bipartite graph and sample-label mapping matrix to promote the estimation of the EEG emotional state in the target domain;
• The EEG emotional state in the target domain is estimated by a bi-model fusion strategy. First, a sample-feature bipartite graph is constructed based on the premise that similar samples have similar feature distributions. This graph is used to characterize the sample-feature connections between the source and the target domain for label propagation, as shown by the 'Bi-graph label propagation' part of Figure 1. Furthermore, a semi-supervised regression is used to learn a mapping matrix that describes the intra-domain connections between samples and labels, which aims to estimate the EEG emotional state of the target domain. By fusing both models, the EEG emotional state of the target domain is estimated under the premise that samples from the same emotional state should share similar feature distributions;
• We explore the EEG emotion activation patterns from the learned common subspace shared by the source and target domains, based on the rationale that the subspace should retain the common features of the source and the target domain and inhibit the non-common features. We measure the importance of each EEG feature dimension by the normalized ℓ2-norm of each row of the projection matrix. Based on the coupling correspondence between EEG features and the frequency bands and channels, the importance of frequency bands and brain regions in EEG emotion recognition is quantified.
Notations. In this paper, the EEG frequency bands are represented by Delta, Theta, Alpha, Beta, and Gamma. Greek letters such as α, λ represent the model parameters. Matrices and vectors are denoted by boldface uppercase and lowercase letters, respectively. The ℓ2,1-norm of a matrix A ∈ R^{r×c} is defined as ‖A‖_{2,1} = Σ_{i=1}^{r} ‖a^i‖_2, where a^i is the i-th row of A.
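As a concrete illustration (a minimal NumPy sketch, not part of the paper's implementation), the ℓ2,1-norm of a matrix is the sum of the Euclidean norms of its rows:

```python
import numpy as np

def l21_norm(A):
    """l2,1-norm: sum of the Euclidean norms of the rows of A."""
    return np.sum(np.linalg.norm(A, axis=1))

A = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])
print(l21_norm(A))  # row norms are 5, 0, 13, so the l2,1-norm is 18
```

The row-wise structure is what makes this norm useful for feature selection: driving entire rows of a projection matrix toward zero suppresses the corresponding feature dimensions.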

Methodology
In this section, we first introduce the formulation of the proposed TSRBG model and then its optimization algorithm.

Problem Definition
Suppose that the labeled EEG samples from one subject, {X_s, Y_s} = {x_si, y_si}_{i=1}^{n_s}, define the source domain D_s, and the unlabeled EEG samples from another subject, X_t = {x_tj}_{j=1}^{n_t}, form the target domain D_t, where X_s ∈ R^{d×n_s}, X_t ∈ R^{d×n_t}, and Y_s ∈ R^{n_s×c}. x_si ∈ R^d and x_tj ∈ R^d are, respectively, the i-th and j-th samples in the source and target domains. y_si ∈ R^{1×c} is the label vector of the i-th source sample, encoded as a one-hot vector, d is the feature dimension, c is the number of emotional states, n_s and n_t are the numbers of samples in the source and target domains, respectively, and n = n_s + n_t is the total number of samples from both domains. The feature space and label space of both domains are the same, i.e., X_s = X_t and Y_s = Y_t; however, their marginal distributions and conditional distributions are different due to the individual differences of EEG, i.e., P_s(X_s) ≠ P_t(X_t) and P_s(Y_s|X_s) ≠ P_t(Y_t|X_t).
As shown in Figure 1, we propose a joint method for EEG emotion recognition. The model consists of two parts: domain adaptation and semi-supervised joint label estimation. Below, we introduce them in detail.

Domain Alignment
Suppose that the distribution differences of source and target EEG data can be minimized in their subspace representations. We measure the marginal and conditional distribution differences between the source and target domain subspace data through the Maximum Mean Discrepancy (MMD) criterion [40]. In detail, we project the source and target domain data into their respective subspaces by two matrices; that is, P_s ∈ R^{d×p} is the projection matrix of the source domain and P_t ∈ R^{d×p} is that of the target domain, where p (p ≪ d) is the subspace dimensionality. The projected data of the two domains can then be represented as P_s^T X_s and P_t^T X_t, respectively. Marginal distribution alignment is achieved by minimizing the distance between the sample means of the two domains, that is,

M_dist = ‖ (1/n_s) Σ_{i=1}^{n_s} P_s^T x_si − (1/n_t) Σ_{j=1}^{n_t} P_t^T x_tj ‖_2^2.    (1)

Similarly, conditional distribution alignment aims to minimize the distance between the sample means belonging to the same class in the two domains, that is,

C_dist = Σ_{k=1}^{c} ‖ (1/n_s^k) Σ_{y_si=k} P_s^T x_si − (1/n_t^k) Σ_{j=1}^{n_t} f_jk^t P_t^T x_tj ‖_2^2,    (2)

where n_s^k and n_t^k denote the numbers of samples belonging to the k-th (k = 1, …, c) emotional state in the source and target domains, respectively, 1_{n_s} ∈ R^{n_s} and 1_{n_t} ∈ R^{n_t} are all-one column vectors, and f_jk^t denotes the probability that the j-th target domain sample belongs to the k-th emotional state category. N_s (N_t) is the diagonal matrix whose k-th diagonal element is 1/n_s^k (1/n_t^k). However, the label information of the target domain data is not available. Here, we utilize the probabilistic class-adaptive formula [37] to estimate the target domain labels, denoted by F_t ∈ R^{n_t×c}.
For simplicity, we combine M_dist and C_dist with the same weight. Thus, the joint distribution alignment is formulated as

J_dist = M_dist + C_dist.    (3)

For clarity, we rewrite (3) in matrix form as

J_dist = ‖ P_s^T X_s Ȳ_s N̄_s − P_t^T X_t F̄_t N̄_t ‖_F^2,    (4)

where H_{s/t} = I_{n_{s/t}} − (1/n_{s/t}) 1_{n_{s/t}} 1_{n_{s/t}}^T is the centralization matrix, I_{n_{s/t}} ∈ R^{n_{s/t}×n_{s/t}} is the identity matrix, Ȳ_s = [1_{n_s}, Y_s] ∈ R^{n_s×(c+1)} and F̄_t = [1_{n_t}, F_t] ∈ R^{n_t×(c+1)} are the extended label matrices, and N̄_{s/t} = diag(1/n_{s/t}, N_{s/t}) ∈ R^{(c+1)×(c+1)}. Additionally, to avoid too much divergence between the source and target projections, we minimize the distance between them by min_{P_s,P_t} ‖P_s − P_t‖_{2,1}.
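To make the marginal alignment term concrete, the following sketch evaluates M_dist for randomly generated stand-ins of X_s, X_t, P_s, and P_t (all sizes and data here are hypothetical placeholders, not SEED-IV data or learned projections):

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, ns, nt = 310, 40, 200, 180          # feature dim, subspace dim, sample counts
Xs = rng.standard_normal((d, ns))         # stand-in source domain data
Xt = rng.standard_normal((d, nt))         # stand-in target domain data
Ps = rng.standard_normal((d, p))          # stand-in source projection matrix
Pt = rng.standard_normal((d, p))          # stand-in target projection matrix

# Marginal MMD term: squared distance between the projected domain means
mu_s = (Ps.T @ Xs).mean(axis=1)
mu_t = (Pt.T @ Xt).mean(axis=1)
M_dist = np.sum((mu_s - mu_t) ** 2)
print(M_dist)
```

In the actual model, P_s and P_t are optimization variables, so this quantity is driven down during training rather than merely measured.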

Label Estimation
We reduce the divergence between the source and the target domain by Equation (4) and simultaneously expect that better target labels can be calculated. In order to describe the target domain label estimation process from two aspects, we use a bi-model fusion method. On the one hand, a semi-supervised label propagation method based on the sample-feature bipartite graph is used for emotional state estimation; the graph is constructed by characterizing the connections between EEG features and samples. On the other hand, a semi-supervised regression method is used to estimate the EEG emotional state in the target domain. The two models are adaptively balanced to achieve more accurate target domain label estimation.

Bipartite Label Propagation
The semi-supervised label propagation method based on a sample-feature bipartite graph is used to estimate the labels of the target domain samples, and is formulated as

min_{G, F_t, F_d} Tr(Ȳ^T L Ȳ) + λ ‖S − A‖_F^2,  s.t. G 1_p = 1_n, G ≥ 0,    (6)

where A = [0_n, B; B^T, 0_p] ∈ R^{(n+p)×(n+p)} is the bipartite graph similarity matrix, 0_n ∈ R^{n×n} and 0_p ∈ R^{p×p} are all-zero matrices, and B ∈ R^{n×p} is the sample-feature similarity matrix determined by both source and target data in their subspace representations. Based on B, we expect to learn a better bipartite graph similarity matrix G ∈ R^{n×p}, from which we form the corresponding matrix S = [0_n, G; G^T, 0_p] ∈ R^{(n+p)×(n+p)} in analogy to A. λ is a regularization parameter, Ȳ = [Y_s; F_t; F_d] ∈ R^{(n+p)×c} is the label matrix, which stacks the sample label matrix F = [Y_s; F_t] ∈ R^{n×c} and the feature label matrix F_d ∈ R^{p×c} of the subspace features, L = D − S ∈ R^{(n+p)×(n+p)} is the graph Laplacian matrix with degree matrix D, and s_ij is the element in row i and column j of S. Tr(·) is the trace of a matrix.
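The block structure of S and its graph Laplacian can be sketched as follows (toy sizes; G here is a random row-stochastic stand-in for the learned similarity matrix, not the one produced by the model):

```python
import numpy as np

n, p = 6, 4                               # samples, subspace features (toy sizes)
rng = np.random.default_rng(1)
G = rng.random((n, p))
G /= G.sum(axis=1, keepdims=True)         # each row sums to 1 (simplex constraint)

# Bipartite similarity matrix S and its graph Laplacian L = D - S
S = np.block([[np.zeros((n, n)), G],
              [G.T, np.zeros((p, p))]])
D = np.diag(S.sum(axis=1))
L = D - S
print(np.allclose(L, L.T))                # Laplacian of an undirected graph is symmetric
```

The zero diagonal blocks are what make the graph bipartite: samples connect only to subspace features and vice versa, so label information propagates between the two node sets through G.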

Semi-Supervised Regression
For the semi-supervised regression method in target domain label estimation, we have

min_{W,b} ‖X_new W + 1_n b − F‖_F^2 + γ ‖W‖_{2,1}^2,    (7)

where W ∈ R^{p×c} is the sample-label mapping matrix, γ is a regularization parameter, X_new ∈ R^{n×p} is the subspace data, and b ∈ R^{1×c} is the offset variable. ‖·‖_{2,1}^2 represents the squared ℓ2,1-norm.

Fused Label Estimation Model
Based on the above analysis in Sections 2.3.1 and 2.3.2, we combine the two models in (6) and (7) and obtain the fused objective function for target domain label estimation as

min_{G,F_d,F_t,W,b} α(Tr(Ȳ^T L Ȳ) + λ‖S − A‖_F^2) + β(‖X_new W + 1_n b − F‖_F^2 + γ‖W‖_{2,1}^2),  s.t. G 1_p = 1_n, G ≥ 0, F_t 1_c = 1_{n_t}, F_t ≥ 0,    (8)

where α and β are the regularization parameters balancing the two models, and 1_p, 1_n, 1_c, 1_{n_t} are the all-one column vectors with dimensions R^{p×1}, R^{n×1}, R^{c×1}, and R^{n_t×1}, respectively.

Overall Objective Function
As stated previously, we jointly optimize domain adaptation and semi-supervised joint label estimation. On the one hand, domain adaptation effectively reduces the differences in EEG feature distributions among subjects and provides well-aligned data for joint label estimation; on the other hand, better target domain labels promote the alignment of the conditional distributions of the source and target domains. Therefore, we combine them in a unified framework and finally obtain the objective function of TSRBG as

min_{P_s,P_t,G,F_d,F_t,W,b} J_dist + ‖P_s − P_t‖_{2,1} + α(Tr(Ȳ^T L Ȳ) + λ‖S − A‖_F^2) + β(‖X_new W + 1_n b − F‖_F^2 + γ‖W‖_{2,1}^2),  s.t. G 1_p = 1_n, G ≥ 0, F_t 1_c = 1_{n_t}, F_t ≥ 0,    (9)

where α, β, γ, and λ are the regularization parameters.

Optimization
There are seven variables in Equation (9): the mapping matrix W, the offset vector b, the source domain projection matrix P_s, the target domain projection matrix P_t, the sample-feature similarity matrix G, the feature label matrix F_d, and the target domain label matrix F_t. We propose to update one variable while fixing the others. The detailed updating rule for each variable is derived below.
• Update W. The objective function in terms of variable W is

min_{W,b} ‖X_new W + 1_n b − F‖_F^2 + γ‖W‖_{2,1}^2.    (10)

There are four variables, P, W, b, and F_t, in Equation (10) (note that X_new and F depend on P and F_t, respectively). We need to initialize these variables apart from W. For the target domain label matrix F_t, we utilize the probabilistic class-adaptive formula [37] to estimate the target domain labels, and the initial value of each element is 1/c, where c is the number of emotional state categories. For the subspace projection matrices P = [P_s, P_t], we initialize them by applying Principal Component Analysis (PCA) [41] to the original EEG data.
Taking the derivative of Equation (10) w.r.t. b and setting it to zero, we have

b = (1/n) 1_n^T (F − X_new W).    (11)

By substituting Equation (11) into (10), we obtain

min_W ‖H X_new W − H F‖_F^2 + γ‖W‖_{2,1}^2,    (12)

where H = I_n − (1/n) 1_n 1_n^T ∈ R^{n×n} is the centralization matrix, I_n ∈ R^{n×n} is the identity matrix, and 1_n ∈ R^n is an all-one column vector.
Constructing the Lagrange function of W based on Equation (12), we have

L(W) = ‖H X_new W − H F‖_F^2 + γ Tr(W^T Q W),    (13)

where Q ∈ R^{p×p} is a diagonal matrix whose i-th diagonal element is q_ii = ‖W‖_{2,1} / (‖w^i‖_2 + ε), ε is a fixed minimal constant, w^i ∈ R^{1×c} is the i-th row vector of W, and ‖·‖_2^2 represents the squared ℓ2-norm.
Taking the derivative of Equation (13) w.r.t. W and setting it to zero, we obtain

W = (X_new^T H X_new + γQ)^{−1} X_new^T H F.    (15)

• Update P. The objective function in terms of variable P consists of the joint distribution alignment term in (4), the projection divergence term ‖P_s − P_t‖_{2,1}, and the regression loss terms involving X_new. First, we need to convert the ℓ2,1-norm into its trace form. Similar to matrix Q, we define a diagonal matrix M whose i-th diagonal element is m_ii = 1 / (2(‖(P_s − P_t)^i‖_2 + ε)), where (P_s − P_t)^i is the i-th row vector of (P_s − P_t) and ‖·‖_2^2 represents the squared ℓ2-norm. With M fixed, we construct the Lagrangian function in terms of variable P. Taking the derivative of Equation (19) w.r.t. P and setting it to zero yields a linear matrix equation of the Sylvester type (Equation (20)). We solve Equation (20) as a Sylvester equation [42] and then obtain the source domain projection matrix P_s and the target domain projection matrix P_t.
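The P-update reduces to a Sylvester-type equation. As an illustrative sketch (with random, well-conditioned stand-ins for the actual coefficient matrices of Equation (20)), SciPy's solver can be used directly:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(2)
d, p = 8, 3
A = rng.standard_normal((d, d))
A = A @ A.T + d * np.eye(d)               # symmetric positive definite stand-in
B = rng.standard_normal((p, p))
B = B @ B.T + p * np.eye(p)               # symmetric positive definite stand-in
C = rng.standard_normal((d, p))

P = solve_sylvester(A, B, C)              # solves A P + P B = C
print(np.allclose(A @ P + P @ B, C))      # True
```

A unique solution exists whenever A and −B share no eigenvalue; the positive definite construction above guarantees this for the toy example.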

• Update G. The corresponding objective function is the part of Equation (8) that depends on G, namely Tr(Ȳ^T L Ȳ) + λ‖S − A‖_F^2 subject to G 1_p = 1_n and G ≥ 0 (Equation (21)). We propose to solve G in a row-wise manner. Accordingly, we convert Equation (21) into n independent subproblems, one for each row g^i of G; completing the squared form of g^i, Equation (21) is equivalent to

min_{g^i} ‖g^i − d^i‖_2^2,  s.t. g^i 1_p = 1, g^i ≥ 0,    (23)

where d^i collects the terms of Equation (21) that do not depend on g^i. This defines a Euclidean distance on a simplex [43].
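The row-wise G-subproblem is a Euclidean projection onto the probability simplex. One standard sorting-based routine (a generic algorithm, not code from the paper) is:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex {g : g >= 0, sum(g) = 1}."""
    u = np.sort(v)[::-1]                  # sort entries in descending order
    css = np.cumsum(u)
    # largest index rho such that u[rho] > (css[rho] - 1) / (rho + 1)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

g = project_simplex(np.array([0.5, 1.2, -0.3]))
print(g, g.sum())  # [0.15 0.85 0.  ] 1.0
```

Each row of G would be projected this way after the unconstrained quadratic step, which enforces the simplex constraint exactly rather than approximately.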
• Update F_d. The objective function in terms of F_d is the label smoothness term of Equation (8), which can be decomposed into blocks corresponding to the sample label matrix F and the feature label matrix F_d (Equation (25)). Constructing the Lagrangian function of Equation (25), taking its derivative w.r.t. F_d, and setting it to zero, we obtain the closed-form update of F_d given in Equation (27).
• Update F_t. By some linear algebra transformations, the first term of Equation (28) can be reformulated into trace form, where H_t = I_{n_t} − (1/n_t) 1_{n_t} 1_{n_t}^T. By constructing the Lagrangian function based on Equations (28)-(30), taking its derivative w.r.t. F_t, and setting it to zero, we obtain Equation (32). To simplify the notation, we define Z_t^+ and Z_s^+, in which all negative elements of Z_t and Z_s are replaced by zero; similarly, in Z_t^− and Z_s^−, all positive elements of Z_t and Z_s are replaced by zero and the negative elements take their absolute values.
Based on the Karush-Kuhn-Tucker (KKT) condition Φ ⊙ F_t = 0 (where ⊙ is the Hadamard product), we obtain the multiplicative update rule of F_t in Equation (34). We summarize the optimization procedure of the proposed TSRBG model in Algorithm 1.

Algorithm 1 The procedure for TSRBG framework
Input: Data and labels of the source domain {X_s, Y_s}; data of the target domain X_t; subspace dimension p; parameters α, λ, γ, and β;
Output: Sample-label mapping matrix W; source domain projection matrix P_s; target domain projection matrix P_t; sample-feature similarity matrix G; feature label matrix F_d; target domain label matrix F_t.
1: Initialize P_s, P_t with PCA; target domain label matrix F_t = (1/c) · 1_{n_t×c}; feature label matrix F_d = (1/c) · 1_{p×c};
2: while not converged do
3:   Compute W by Equation (15) and then update Q;
4:   Compute the subspace projection matrix P by solving the Sylvester equation (20), split it into the source and target domain projection matrices, and then update M;
5:   Update the sample-feature similarity matrix G by optimizing Equation (23), and then update S and the Laplacian matrix L = D − S;
6:   Compute the feature label matrix F_d by Equation (27);
7:   Compute the target domain label matrix F_t by Equation (34);
8: end while

Computational Complexity
We assume that the complexity of operations between individual matrix elements is O(1). The computational complexity of TSRBG consists of the following parts. We need O(pn²) to calculate W and O(pc) to update Q. When updating P, solving the Sylvester equation needs O(d³p³ + d²p²), and O(dp) is then used to update M. For i ∈ {1, …, n}, updating g^i costs O(p), so the complexity of updating G is O(np). For the label indicator matrices, F_d costs O(p²c + pnc) and F_t costs O(n_t²c + n_tc² + n_tc + n_tpc). As a result, the overall computational complexity of TSRBG is O(T(pn² + d³p³ + n_t²c)), where T is the number of iterations.

Dataset
SEED-IV [44] is a video-evoked emotional EEG dataset provided by the Center for Brain-like Computing and Machine Intelligence, Shanghai Jiao Tong University. In SEED-IV, 72 movie clips with obvious emotional tendencies were used to evoke four emotional states (happiness, sadness, fear, and neutrality) in 15 subjects, and each subject had three sessions. In each session, each subject was asked to watch 24 movie clips; that is, every six movie clips correspond to one emotional state. EEG data were recorded by the ESI NeuroScan System with a 62-channel cap at a sampling frequency of 1000 Hz. To reduce the computational burden, the data were then down-sampled to 200 Hz. After band-pass filtering the EEG data to 1-50 Hz, the Differential Entropy (DE) feature was extracted from five EEG frequency bands: Delta (1-3 Hz), Theta (4-7 Hz), Alpha (8-13 Hz), Beta (14-30 Hz), and Gamma (31-50 Hz). The DE feature is defined as

h(X) = −∫ p(x) log p(x) dx,    (35)

where X is a random variable and p(x) is the corresponding probability density function.
Assuming that the collected EEG signals obey the Gaussian distribution N(µ, σ²), the DE feature can be calculated by

h(X) = (1/2) log(2πeσ²).    (36)

The data format provided by SEED-IV is 62 × n × 5, where n is the number of EEG samples in each session. Specifically, there are 851, 832, and 822 samples in the three sessions, respectively. We reshape the DE features into 310 × n by concatenating the 62 values of the 5 frequency bands into a vector and then normalize them into [−1, 1] by row.
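Under the Gaussian assumption, the DE feature reduces to a closed form in the band-wise variance. A minimal sketch (with synthetic data standing in for a band-pass filtered EEG segment):

```python
import numpy as np

def differential_entropy(signal):
    """DE of a band-pass filtered EEG segment under the Gaussian assumption:
    h = 0.5 * log(2 * pi * e * sigma^2)."""
    sigma2 = np.var(signal)
    return 0.5 * np.log(2 * np.pi * np.e * sigma2)

rng = np.random.default_rng(4)
x = rng.normal(0.0, 2.0, 10_000)          # synthetic segment with sigma = 2
print(differential_entropy(x))            # close to 0.5*log(2*pi*e*4), about 2.11
```

In the SEED-IV pipeline, one such value is computed per channel and per frequency band, yielding the 62 × 5 = 310-dimensional feature vector per sample.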

Experimental Settings
We set up a cross-subject EEG emotion recognition task based on SEED-IV. For each session, the samples and labels from the first subject form the labeled source domain, and the samples from each of the other subjects form one target domain. Therefore, for each session, we have 14 cross-subject tasks.
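The task construction can be sketched as follows (shapes and data are synthetic placeholders; in the real setting each array would hold the 310-dimensional DE features and labels of one subject in one session):

```python
import numpy as np

n_subjects = 15
rng = np.random.default_rng(3)
# Hypothetical per-subject arrays: data[i] has shape (310, n_i), labels[i] shape (n_i,)
data = [rng.standard_normal((310, 851)) for _ in range(n_subjects)]
labels = [rng.integers(0, 4, 851) for _ in range(n_subjects)]

# Subject 1 is the labeled source domain; each remaining subject is one target task
tasks = [(data[0], labels[0], data[t]) for t in range(1, n_subjects)]
print(len(tasks))  # 14 cross-subject tasks per session
```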
To evaluate the performance of TSRBG, we compare it with several methods including four non-deep transfer learning methods (Joint Distribution Adaptation (JDA) [45], Graph Adaptation Knowledge Transfer (GAKT) [37], Maximum Independent Domain Adaptation (MIDA) [24], Feature Selection Transfer Subspace Learning (FSTSL) [46]), one semi-supervised classification method (Structured Optimal Bipartite Graph learning (SOBG) [47]), and two deep learning methods (DGCNN [48] and LRS [22]). DGCNN is a deep learning method which uses the graph structure to depict the relationship of EEG channels. LRS is a deep transfer method to minimize the discrepancies of latent representations of source and target EEG data.

Recognition Results and Analysis
The recognition accuracies of the above eight models on the cross-subject EEG emotional state recognition tasks in the three sessions are shown in Tables 1-3, respectively. In these tables, 'sub2' indicates that the samples from the first subject were used as the labeled source domain data while the samples from the second subject were used as the unlabeled target domain data, and so on; 'AVG.' represents the average accuracy over all 14 cross-subject cases in the session. We mark in bold the highest recognition accuracy of each emotion recognition case (each row of the tables). According to the results in Tables 1-3, we draw the following observations.
• TSRBG achieves better EEG emotional state recognition accuracy than the other compared models in most cases. The highest recognition accuracy, 88.58%, is obtained on the 15th subject of session 2. The average recognition accuracies of the three sessions, 72.83%, 76.49%, and 77.50%, respectively, are better than those of the other seven models. On the whole, this verifies that the proposed TSRBG model is effective.
• By comparing the average recognition accuracy of the eight models over the three sessions, we find that jointly optimizing semi-supervised EEG emotional state estimation and EEG feature transfer alignment in a tightly coupled way obtains better recognition accuracy. Setting GAKT and TSRBG as control groups, we find that the accuracy of TSRBG is significantly better than that of GAKT, and the main difference between them is the semi-supervised EEG emotion state estimation process. GAKT constructs an undirected graph based on the unaligned original data, and this graph is not updated as the data distributions are aligned. In the double-projection feature alignment subspace, it fails to describe well the sample associations between the two domains. As a result, it cannot accurately estimate the EEG emotion states in the target domain, which degrades the alignment of the conditional distributions. In contrast, TSRBG estimates the EEG emotional states of the target domain by a bi-model fusion method. One model constructs a sample-feature bipartite graph to characterize inter-domain associations for label propagation; the initialized graph is dynamically updated based on the subspace data representations. The other model is the semi-supervised regression, which effectively builds the connection between the subspace data representations and the label indicator matrix.
In order to describe the recognition performance advantages of our proposed model in more detail, we use the Friedman test [49] to judge whether the eight models have the same performance on the cross-subject EEG emotion state recognition tasks. The underlying assumption is that "the performance of all models is the same". We rank the performance of the compared models in each group of cross-subject emotion state recognition experiments (in our experiment, the higher the recognition accuracy, the higher the ranking), and calculate the average ranking r_i of each model. Assuming that there are K models and N data sets, we calculate the variable τ_{χ²} as

τ_{χ²} = (12N / (K(K+1))) ( Σ_{i=1}^{K} r_i² − K(K+1)²/4 ),    (37)

which follows the χ² distribution with K − 1 degrees of freedom. In our work, there are 8 comparative models and 42 groups of cross-subject EEG emotion state recognition tasks; that is, K = 8 and N = 42. Then, we calculate the variable τ_F as

τ_F = (N − 1) τ_{χ²} / (N(K − 1) − τ_{χ²}),    (38)

which obeys the F distribution with K − 1 and (K − 1)(N − 1) degrees of freedom. According to the recognition results of the different models in Tables 1-3, we calculate that their average rankings are [3.79, 3.36, 4.81, 4.5, 6.19, 5.14, 6.79, 1.29]. Based on (37) and (38), we obtain τ_F = 35.682. At significance level α = 0.05, the critical value of the Friedman test is 2.0416, which can be obtained through the MATLAB expression icdf('F', 1 − α, K − 1, (K − 1)*(N − 1)) [49]. Since 35.682 is far greater than 2.0416, the assumption "the performance of all models is the same" is rejected. It is necessary to further distinguish the algorithms through the Nemenyi post-hoc test. The results are shown in Figure 2. The models are sorted by their average ranking r_i, and models with higher ranking are closer to the top of the figure. The length of the vertical line associated with each model is called the critical distance (CD), whose value of 1.620 is calculated by

CD = q_α √(K(K+1) / (6N)),    (39)

where the critical value q_α is 3.031 when α = 0.05.
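The Friedman statistic and the Nemenyi critical distance reported above can be reproduced from the average ranks with a few lines of NumPy:

```python
import numpy as np

K, N = 8, 42                               # models, cross-subject tasks
r = np.array([3.79, 3.36, 4.81, 4.5, 6.19, 5.14, 6.79, 1.29])  # average ranks

tau_chi2 = 12 * N / (K * (K + 1)) * (np.sum(r ** 2) - K * (K + 1) ** 2 / 4)
tau_F = (N - 1) * tau_chi2 / (N * (K - 1) - tau_chi2)
CD = 3.031 * np.sqrt(K * (K + 1) / (6 * N))  # Nemenyi critical distance, q_0.05 = 3.031

print(round(tau_F, 3), round(CD, 3))       # approximately 35.682 and 1.620
```

Two models are judged significantly different when the gap between their average ranks exceeds CD, which is the comparison visualized in Figure 2.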
We can judge whether there are significant differences between models by whether the vertical lines corresponding to the models in Figure 2 overlap. For example, the rank value of TSRBG is 1.29 while that of GAKT is 3.36; the gap between them is 2.07, which is greater than the CD value of 1.620, so there is no overlap between their corresponding vertical lines. Therefore, TSRBG is significantly better than GAKT on the cross-subject EEG emotion recognition tasks. Similar analyses can be performed on the other models. Further, the average recognition results of these models are reorganized into confusion matrices to analyze the recognition performance of each model on each emotional state. The results are shown in Figure 3. We find that TSRBG has a high average recognition accuracy of 82.48% for the neutrality state, which is the highest among the four emotional states. The neutral EEG samples were wrongly classified as sadness, fear, and happiness in 6.90%, 6.56%, and 4.06% of cases, respectively.
Compared with the other models, the recognition accuracies of the sadness and neutrality states were significantly improved by TSRBG. For example, the recognition rate of the sad EEG emotional state was improved by at least 16%.

Subspace Analysis and Mining
In this work, the process of EEG feature transfer is to seek dual subspaces, which are expected to reduce distribution differences between the source and the target domain data as much as possible. For each domain, subspace data representation is obtained by projecting the original data with a projection matrix. In order to intuitively reflect the alignment effect of two domain data in the subspace, we use the t-SNE method [50] to visualize two groups of experimental data before and after alignment. As shown in Figure 4, we see that the data distributions of source and target domain in the subspace have been effectively aligned.
The subspace feature dimension is p. In order to determine a subspace dimension suitable for data distribution alignment, we show in Figure 5 how the model recognition accuracy changes as the subspace dimension is adjusted. It is observed that TSRBG is generally insensitive to the subspace dimension. When the subspace dimension is adjusted within the interval [30, 60], TSRBG generally achieves satisfactory recognition accuracies.
From the perspective of transfer learning, the subspace should preserve the common information and exclude the non-common information between subjects; that is, in the learned subspace, the common components between the source and the target domain should be preserved while the subject-dependent components should be excluded. The subject-independent common components are considered the intrinsic components of emotion that do not change between subjects, whereas the subject-dependent non-common components are considered the external information unique to different subjects. From the perspective of EEG features, the subject-independent common EEG features should have larger weights and contribute more to cross-subject emotion recognition; by contrast, the subject-dependent non-common EEG features should have smaller weights and contribute less. If we can quantify the importance of the different EEG feature dimensions, then, according to the correspondence between EEG feature dimensions and frequency bands [51], the common EEG activation patterns in cross-subject emotion recognition can be explored. We assume that θ_si and θ_ti are the importance measurement factors of the i-th feature dimension of the source and target domain, respectively. Based on the ℓ2,1-norm feature selection theory [52], θ_si and θ_ti can be obtained by calculating the normalized ℓ2-norm of the i-th row vector of the subspace projection matrix of the source and target domain, respectively. That is,

θ_(s/t)i = ‖p_(s/t)^i‖_2 / Σ_{j=1}^{d} ‖p_(s/t)^j‖_2,

where p_(s/t)^i is the i-th row vector of the subspace projection matrix. Then, we can quantitatively calculate the importance of the a-th frequency band through ω(a) = θ_{(a−1)·62+1} + θ_{(a−1)·62+2} + · · · + θ_{a·62}, where a = 1, 2, 3, 4, 5 denotes the Delta, Theta, Alpha, Beta, and Gamma frequency bands, respectively; the importance of the l-th channel is obtained analogously by summing θ over the five bands, where l = 1, · · · , 62 denotes the 62 channels, FP1, FPZ, · · · , CB2.
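The importance computation can be sketched as follows (a NumPy example with a random matrix standing in for the learned projection; the 310 = 5 × 62 band-channel layout follows the feature ordering described in the dataset section):

```python
import numpy as np

rng = np.random.default_rng(5)
d, p = 310, 40                             # 5 bands x 62 channels, subspace dim
Ps = rng.standard_normal((d, p))           # stand-in for a learned projection matrix

# Importance of each feature dimension: normalized l2-norm of the matrix rows
row_norms = np.linalg.norm(Ps, axis=1)
theta = row_norms / row_norms.sum()

# Aggregate over the band/channel layout: features (a-1)*62+1..a*62 form band a
band_importance = theta.reshape(5, 62).sum(axis=1)     # omega(a), a = Delta..Gamma
channel_importance = theta.reshape(5, 62).sum(axis=0)  # summed over the five bands
print(band_importance.sum())               # 1.0, since theta is normalized
```

With the actual learned P_s or P_t in place of the random stand-in, `band_importance` and `channel_importance` are the quantities visualized in Figures 7 and 8.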
In SEED-IV, the DE features are extracted from five frequency bands and 62 channels. Therefore, the correspondence between the feature importance measurement and the different frequency bands (channels) can be established, as shown in Figure 6. As shown in Figure 7, we quantify the importance of the different EEG frequency bands in cross-subject emotion recognition according to the above analysis. Figure 7a presents the results obtained by analyzing the source projection matrix P_s in the three sessions, together with their average. Figure 7b displays the results obtained by analyzing the target projection matrix P_t in the three sessions, together with their average. Figure 7c presents the average results of the source and target domains in the three sessions, and the overall average of both across all sessions. From a data-driven pattern recognition perspective, we believe that the Gamma frequency band is the most important one for cross-subject EEG emotion recognition.
Furthermore, we calculated the importance of the different EEG channels, as shown in Figure 8. In Figure 8a, we show the importance of each brain region in the form of a brain topographic map. We observed that the left prefrontal region had high weights in all results and believe that this brain region has higher importance in cross-subject EEG emotion recognition. The top 10 most important channels of each session and of the overall average are quantitatively analyzed in Figure 8b. We believe that FP1, PO6, PO5, O1, P4, and P8 are more important for cross-subject EEG emotion recognition. Considering that the model performs well on the sadness and neutrality EEG emotional states, the above brain region and channels might be more closely related to these two emotional states.

Conclusions
In this paper, we proposed a new model termed TSRBG for cross-subject emotion recognition from EEG, whose main merits are summarized as follows. (1) Feature domain adaptation and target domain label estimation were effectively realized in a unified framework. Better-aligned source and target data improve the target domain label estimation performance; in turn, more accurately estimated target domain labels better facilitate the modeling of the conditional distribution, leading to better domain adaptation performance. (2) The intra- and inter-domain connections were investigated based on the subspace-aligned data, which formulated a bi-model fusion strategy for target domain label estimation, leading to significantly better recognition accuracy. (3) The learned subspace of TSRBG provided a quantitative way to explore the key EEG frequency bands and channels in emotional expression. The experimental results on the SEED-IV data set demonstrated that (1) the joint learning mode in TSRBG effectively improved the cross-subject EEG emotion state recognition performance, and (2) the Gamma frequency band and the prefrontal brain region are identified as the more important components in emotion expression.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board (or Ethics Committee) of Shanghai Jiao Tong University (protocol code 2017060).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Not applicable.