Multi-Scale Frequency Bands Ensemble Learning for EEG-Based Emotion Recognition

Emotion recognition has a wide range of potential applications in the real world. Among the data sources for emotion recognition, electroencephalography (EEG) signals record the neural activities across the human brain, providing us with a reliable way to recognize emotional states. Most existing EEG-based emotion recognition studies directly concatenate features extracted from all EEG frequency bands for emotion classification. This strategy implicitly assumes that all frequency bands are equally important; however, it cannot always obtain the optimal performance. In this paper, we present a novel multi-scale frequency bands ensemble learning (MSFBEL) method to perform emotion recognition from EEG signals. Concretely, we first re-organize all frequency bands into several local scales and one global scale. Then we train a base classifier on each scale. Finally, we fuse the results of all scales with an adaptive weight learning method which automatically assigns larger weights to more important scales to further improve the performance. The proposed method is validated on two public data sets. For the "SEED IV" data set, MSFBEL achieves average accuracies of 82.75%, 87.87%, and 78.27% on the three sessions under the within-session experimental paradigm. For the "DEAP" data set, it obtains an average accuracy of 74.22% for four-category classification under 5-fold cross validation. The experimental results demonstrate that the scale of frequency bands influences the emotion recognition rate, and that the global scale, which directly concatenates all frequency bands, cannot always guarantee the best emotion recognition performance. Different scales provide complementary information to each other, and the proposed adaptive weight learning method can effectively fuse them to further enhance the performance.


Introduction
Developing automatic and accurate emotion recognition technologies has gained more and more attention due to its wide range of potential applications. In engineering, it makes human-machine interaction friendlier, as machines might understand our emotions and interact with us accordingly [1,2]. In medical research, it is beneficial for diagnosing and treating various mental diseases, such as depression and autism spectrum disorders [3,4]. In the education field, it helps to track and improve the learning efficiency of students [5,6]. EEG signals record the neural activities of the human cerebral cortex and reflect emotional states, providing an objective and reliable way to perform emotion recognition [7][8][9]. Besides, the advantages of EEG, such as being noninvasive, fast, and inexpensive in data acquisition, make it a preferred medium for emotion recognition [10]. A popular video-evoked EEG-based emotion recognition system is shown in Figure 1, which generally consists of the following stages: emotional video clips are collected and subjects are recruited before the experiments, and then EEG signals are recorded while the subjects watch the clips.

In the past decade, many feature extraction methods and classifiers were proposed for EEG-based emotion recognition [10]. Basically, EEG features can be divided into two types: time-domain features and frequency-domain features. The time-domain features aim to extract temporal information from EEG, e.g., the fractal dimension feature [11], the Hjorth feature [12], and the higher order crossing feature [13]. For the frequency-domain features, researchers usually first filter EEG signals into several frequency bands, and then extract EEG features from each frequency band.
The frequency interval of interest is 1–50 Hz, which is usually partitioned into five frequency bands: Delta (1–4 Hz), Theta (4–8 Hz), Alpha (8–14 Hz), Beta (14–31 Hz), and Gamma (31–50 Hz). The frequency-domain features mainly include the differential entropy (DE) feature [14], the power spectral density (PSD) feature [15], the differential asymmetry (DASM) feature [11], the rational asymmetry (RASM) feature [16], and so on. Lu et al. made a detailed comparison among these features, and found that DE was more stable and accurate for emotion recognition than the others [14,17]. Therefore, the DE feature is adopted in this paper. Regarding classifiers, many machine learning methods were proposed for EEG-based emotion recognition [18,19]. Peng et al. designed a discriminative manifold extreme learning machine (DMELM) method for emotion recognition, and found that the Beta and Gamma frequency bands were more relevant to emotional state transitions than the others [20]. Li et al. proposed a hierarchical convolutional neural network (HCNN) to classify emotional states, and their experimental results also indicated that the high frequency bands (Beta and Gamma) performed better than the low frequency bands [21]. Moreover, Zheng et al. introduced deep belief networks (DBNs) to construct EEG-based emotion recognition models, and their results demonstrated that combining all frequency bands together performed better than using individual bands [22]. Yang et al. drew a similar conclusion by designing a continuous convolutional neural network [23]. The potential reason is that multiple bands can provide complementary information to each other.
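As an aside on how the DE feature is computed: for a band-filtered EEG segment that is approximately Gaussian, the differential entropy reduces to a simple function of the signal variance. The sketch below illustrates this under that Gaussian assumption; the window length and sampling rate are illustrative, not taken from the cited data sets.

```python
import numpy as np

def differential_entropy(band_signal):
    """DE of a band-filtered EEG segment, assuming it is roughly
    Gaussian: DE = 0.5 * ln(2 * pi * e * sigma^2)."""
    variance = np.var(band_signal)
    return 0.5 * np.log(2 * np.pi * np.e * variance)

# Toy example: one channel, one 4-second window at 200 Hz (simulated).
rng = np.random.default_rng(0)
segment = rng.normal(loc=0.0, scale=2.0, size=800)
de_value = differential_entropy(segment)
```

In practice, one such value is computed per channel and per frequency band, which yields the channel-by-band feature layout described later.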
Although the methods mentioned above have achieved improvements in EEG-based emotion recognition, there still exists a problem. Their way of combining frequency bands is to directly concatenate all frequency bands together, which is termed the global scale in this paper and depicted in Figure 2a. However, such a strategy cannot always achieve the best results since it essentially assumes that all frequency bands share the same importance. In this paper, we extend the way of combining frequency bands with what we term local scales, shown in Figure 2b. Taking the face recognition task as an example, we illustrate the rationality of such a multi-scale setting: human faces manifest distinct characteristics and structures when observed at different scales, and different scales provide complementary information to each other [24]. Similarly, we assume that different scales of frequency bands hold different characteristics of emotion and complement each other. In each scale, we combine adjacent frequency bands into patches. For example, when the scale is 2, patches are formed by combining 2 adjacent frequency bands, and there are 4 patches in total. It should be noted that we only combine adjacent frequency bands into a patch because the change of frequency bands from Delta to Gamma reflects the conscious mind going from weak to active, which is a continuous process [10]. Therefore, it is reasonable to combine adjacent frequency bands. For each scale, we train a base classifier to obtain the single-scale classification result. After that, the critical step is how to fuse the results of all scales to enhance the overall performance. This essentially constitutes an ensemble learning task [25][26][27], which combines the results of a set of base classifiers to perform better.
In this paper, we design an adaptive weight learning method to combine all scales, which considers the classifier on each scale as a base classifier and learns the weight of each scale to fuse multi-scale results.
Based on the above, we propose a novel multi-scale frequency bands ensemble learning (MSFBEL) method for EEG-based emotion recognition. Generally, the main contributions of this work are summarized as follows.

• We extended the way of combining different frequency bands into four local scales and one global scale, and then performed emotion recognition on every scale with a single-scale classifier.
• We proposed an effective adaptive weight learning method to ensemble multi-scale results, which can adaptively learn the respective weights of different scales according to the maximal margin criterion, and whose objective can be formulated as a quadratic programming problem with a simplex constraint.
• We conducted extensive experiments on benchmark emotional EEG data sets, and the results demonstrated that the global scale, which directly concatenates all frequency bands, cannot always guarantee the best emotion recognition performance. Different scales provide complementary information to each other, and the proposed method can effectively combine this information to further improve the performance.
The rest of this paper is organized as follows. Section 2 presents the proposed method in detail. Section 3 displays the emotional EEG data sets, experiments, and results of the proposed method. Section 4 concludes the whole paper and presents the future work.
In this paper, vectors are written as boldface lowercase letters, and matrices are written as boldface uppercase letters. The trace of matrix A is represented by Tr(A). For a vector a, its l2-norm is denoted by ||a||_2 = (a^T a)^{1/2}, where a^T is the transpose of a. a ≥ 0 indicates that every element of vector a is larger than or equal to zero. 1 and I represent a column vector of all ones and an identity matrix, respectively. I(·) denotes an indicator function which takes the value 1 when the condition is true, and 0 otherwise.

Method
In this section, we present the proposed method MSFBEL (Figure 3) in detail, which mainly contains two stages. First, we re-arrange every EEG sample into different scales as shown in Figure 2, and then perform emotion classification on each scale by a single-scale classifier, called single-scale frequency band ensemble learning (SSFBEL). Second, the results of all scales are fused by the adaptive weight learning method to further improve the performance.

Single-Scale Frequency Band Ensemble Learning
In this paper, we use the DE feature to model emotion information from EEG signals. Without loss of generality, supposing that DE features are extracted from s frequency bands, we re-organize them into s scales. In each scale j (j = 1, 2, ..., s), we combine j adjacent frequency bands into patches and thus obtain p_j = s − j + 1 patches for this scale. Figure 2 displays an example with s = 5, where DE features are extracted from 5 frequency bands (Delta, Theta, Alpha, Beta, and Gamma).
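The patch construction described above can be sketched as follows. The channel count (62) and band count (5) mirror the "SEED IV" setting, while the flat channel-by-band layout of the sample vector is our own assumption for illustration.

```python
import numpy as np

def make_patches(sample, n_channels=62, n_bands=5, scale=2):
    """Re-arrange a flat DE sample (channels x bands) into the patches of
    one scale: each patch concatenates `scale` adjacent frequency bands,
    giving n_bands - scale + 1 patches (a sketch of Figure 2)."""
    bands = sample.reshape(n_channels, n_bands)      # column j = band j
    n_patches = n_bands - scale + 1
    return [bands[:, i:i + scale].reshape(-1) for i in range(n_patches)]

sample = np.arange(62 * 5, dtype=float)              # dummy DE sample, d = 310
patches = make_patches(sample, scale=2)              # 4 patches, each 124-dim
```

With scale = 5 the same function returns the single global-scale patch containing all 310 features.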
SSFBEL is proposed to perform emotion classification on each scale of frequency bands. Figure 4 displays its architecture with an example of s = 5 and j = 2. Below we explain each step.
• First, given an unlabeled DE feature-based sample y ∈ R^d, we divide it into a set of patches y_i ∈ R^{d_j} (i = 1, 2, ..., p_j). Here d is the feature dimension of EEG samples. For example, if we use the DE-based EEG feature representation, d equals the product of the number of channels and the number of frequency bands. Similarly, d_j denotes the feature dimension of DE patches under scale j, that is, d_j equals the product of the number of channels and the number of frequency bands in a patch.
• Second, these patches are respectively fed into base classifiers, and the corresponding predicted labels {z_1, z_2, ..., z_{p_j}} are obtained.
• Finally, the predicted labels of all patches are combined by simple majority voting [28] to generate the final label r_j for the sample y under scale j.

In SSFBEL, collaborative representation based classification (CRC) [29] is used as the base classifier. CRC represents a test sample with an over-complete dictionary formed by training samples, whose representation coefficient vector is regularized with an l2-norm to improve its computational efficiency. Once the representation coefficient vector is obtained, the test sample is categorized into the class which yields the minimum reconstruction error. In the current study, for each patch, we construct the corresponding dictionary according to the principle that its combination and order of frequency bands are consistent with those of the patch.

Suppose that we have a patch y_i and the corresponding dictionary formed by training samples X_i = [X_{i1}, X_{i2}, ..., X_{ic}] ∈ R^{d_j × n}, where c, d_j, and n are the number of classes, the feature dimension of DE patches under scale j, and the number of training samples, respectively. X_{ik} ∈ R^{d_j × n_k} (k = 1, 2, ..., c) is the collection of samples from the k-th class in which each column is a sample, where n_k denotes the number of samples in the k-th class. For the patch y_i, its representation coefficient α_i ∈ R^n can be obtained by solving the following objective

min_{α_i} ||y_i − X_i α_i||_2^2 + λ ||α_i||_2^2,    (2)

where λ is a regularization parameter. Obviously, the optimal representation coefficient to objective (2) has the closed-form solution α_i^* = (X_i^T X_i + λI)^{−1} X_i^T y_i. Let α_{ik}^* ∈ R^{n_k} represent the vector whose only nonzero entries are the entries of α_i^* associated with class k. The sample y_i can be reconstructed by the training samples of class k as y_{ik} = X_{ik} α_{ik}^*. The label of y_i is determined as the class which yields the minimum reconstruction error

z_i = arg min_k ||y_i − X_{ik} α_{ik}^*||_2.    (3)

In our experiments, to further improve the classification accuracy, we divided the above reconstruction error by ||α_{ik}^*||_2, because it can bring some discriminative information for classification [29]. Finally, the single-scale result r_j is obtained by combining all patches' results z_i (i = 1, 2, ..., p_j) using majority voting:

r_j = arg max_k Σ_{i=1}^{p_j} I(z_i = k).
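A minimal sketch of the CRC base classifier follows, using the standard ridge-style closed form for the representation coefficients and the ||α_k||-normalized residual mentioned in [29]; the variable names and the toy dictionary are our own.

```python
import numpy as np

def crc_predict(y, X, labels, lam=0.01):
    """Collaborative representation classification (sketch).

    X is the d x n dictionary of training samples (columns), labels the
    length-n class ids. The l2-regularized coefficients have the closed
    form alpha = (X^T X + lam * I)^{-1} X^T y; the sample is assigned to
    the class with the smallest ||alpha_k||-normalized residual.
    """
    n = X.shape[1]
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    best_class, best_score = None, np.inf
    for k in np.unique(labels):
        idx = labels == k
        residual = np.linalg.norm(y - X[:, idx] @ alpha[idx])
        score = residual / (np.linalg.norm(alpha[idx]) + 1e-12)
        if score < best_score:
            best_class, best_score = k, score
    return best_class

# Toy dictionary: class 0 lives on the first axis, class 1 on the second.
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
labels = np.array([0, 0, 1, 1])
```

Solving one n-by-n linear system, rather than the iterative optimization of sparse coding, is what makes CRC computationally cheap.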

Adaptive Weight Learning
Assuming that different scales of frequency bands might have complementary information to each other, we combine the classification results of all scales obtained by SSFBEL to enhance the emotion recognition performance. The whole architecture of the proposed MSFBEL is shown in Figure 3.
Obviously, the current task is how to determine the optimal weights of different scales in the ensemble stage. In this work, we propose an adaptive weight learning method by maximizing the ensemble margin. Suppose the ensemble weight vector is w = [w_1; w_2; ...; w_s] ∈ R^s, where Σ_{j=1}^{s} w_j = 1, and we learn it from data. Specifically, we select m samples from the total n training samples, whose data and labels can be respectively represented as {(x_p, o_p)}_{p=1}^{m}, where x_p (p = 1, 2, ..., m) represents the p-th DE-based EEG sample and o_p is the corresponding ground-truth label. We divide a sample x_p into s scales and feed them into SSFBEL to obtain the classification result of each scale, r_{pj} (j = 1, 2, ..., s). We define the decision matrix D = [d_{pj}] ∈ R^{m×s} as

d_{pj} = 1 if r_{pj} = o_p, and d_{pj} = −1 otherwise.    (4)

Then, the ensemble margin of sample x_p is defined as

θ_p = Σ_{j=1}^{s} w_j d_{pj}.    (5)

To get the optimal weight vector w, we should make the ensemble margin in (5) as large as possible. Based on the studies of [30,31], margin maximization can be transformed into a loss minimization problem. To be specific, the ensemble loss function can be formulated as

min_w ||1_m − Dw||_2^2,  s.t. w^T 1_s = 1, w ≥ 0,    (6)

where 1_m = [1; 1; ...; 1] ∈ R^m is a column vector. By expanding the squared norm and dropping the constant term, objective (6) is equivalent to optimizing

min_w w^T D^T D w − 2·1_m^T D w,  s.t. w^T 1_s = 1, w ≥ 0,    (7)

where 1_s = [1; 1; ...; 1] ∈ R^s is a column vector. By denoting M = D^T D and b = 2 D^T 1_m, we can rewrite objective (7) as

min_w w^T M w − b^T w,  s.t. w^T 1_s = 1, w ≥ 0.    (8)

Since w is a column vector, b^T w = w^T b, so objective (8) is equivalent to optimizing

min_w w^T M w − w^T b,  s.t. w^T 1_s = 1, w ≥ 0.    (9)

To make the above objective separable, we introduce an auxiliary variable v with respect to w, and then get

min_{w,v} w^T M v − w^T b,  s.t. w = v, w^T 1_s = 1, w ≥ 0.    (10)

According to the augmented Lagrangian multiplier (ALM) method [32,33], objective (10) can be rewritten as

min_{w,v} w^T M v − w^T b + β^T (w − v) + (μ/2)||w − v||_2^2,  s.t. w^T 1_s = 1, w ≥ 0,    (11)

where μ is a quadratic penalty parameter and β ∈ R^s is the Lagrangian multiplier. Accordingly, an alternating optimization method is applied to solving problem (11); the details are given in Appendix A. Finally, we obtain the optimal solution w^* to problem (11), based on which we can make a prediction on the test sample y as l = arg max_{k∈[c]} Σ_{j=1}^{s} w_j^* I(r_j = k).
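The weight learning step amounts to minimizing the quadratic ensemble loss over the probability simplex. The sketch below solves the same problem with projected gradient descent instead of the ALM procedure of Appendix A (a simpler stand-in, not the authors' solver); M and b follow the definitions above, and the toy decision matrix is our own.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

def learn_weights(D, n_iter=500):
    """Minimize w^T M w - b^T w on the simplex, with M = D^T D and
    b = 2 D^T 1 (the ensemble loss ||1 - D w||^2 up to a constant).
    Projected gradient descent stands in for the ALM solver here."""
    m, s = D.shape
    M = D.T @ D
    b = 2 * D.T @ np.ones(m)
    step = 1.0 / (2 * np.linalg.norm(M, 2) + 1e-12)  # step <= 1/L
    w = np.full(s, 1.0 / s)
    for _ in range(n_iter):
        grad = 2 * M @ w - b
        w = project_simplex(w - step * grad)
    return w

# Toy decision matrix: scale 0 is always right (+1), scale 1 always wrong (-1),
# so the learned weight should concentrate on scale 0.
D = np.array([[1.0, -1.0]] * 4)
w = learn_weights(D)
```

Any solver that handles the simplex constraint (ALM, projected gradient, or an off-the-shelf QP solver) should recover the same weights, since the objective is convex.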
The procedure of MSFBEL framework is outlined in Algorithm 1.

Algorithm 1
The procedure for MSFBEL framework.
Input: Number of scales s; number of classes c; training data X ∈ R^{d×n}; training data labels O ∈ R^{c×n}; a subset of the training data X_sub ∈ R^{d×m}; the labels of the subset O_sub ∈ R^{c×m}; testing data y ∈ R^d.
Output: The label of the testing data: l.

Experiments and Results
In this section, we first describe two emotional EEG data sets used in the experiments, including their data collection and feature extraction. Then, the experimental settings are given based on which we perform EEG-based emotion recognition to evaluate the effectiveness of MSFBEL.

Algorithm 2
The procedure for SSFBEL framework.
Output: The label of the testing data under scale j: r_j.
1: Compute α_i^* by Equation (2);
2: Compute z_i by Equation (3);

Algorithm 3
The algorithm to solve problem (11).

Input: M and b;
Output: The weight vector w ∈ R^s.
1: Initialize v, β, μ, and ρ;
2: while not converged do
3:   Update v by Equation (A1);
4:   Update w by solving problem (A3) via Algorithm 4;
5:   Update β = β + μ(w − v);
6:   Update μ = ρμ;
7: end while

SEED IV

The "SEED IV" is a publicly available emotional EEG data set [34]. The EEG signals were collected from 15 healthy subjects while they watched emotion-eliciting videos. In the EEG data collection experiment, 24 two-minute video clips were played to each subject. There are four types of emotional video clips, sadness, fear, happiness, and neutral, and each emotion has 6 video clips. While watching the video clips, the subjects' EEG signals were recorded at a 1000 Hz sampling rate by the 62-channel ESI NeuroScan system (https://compumedicsneuroscan.com/ (accessed on 6 August 2020)). Every subject was asked to complete the experiment three times on different days, and therefore we obtained three sessions of EEG signals for each subject.
In our experiments, we used the "EEG_feature_smooth" EEG data recordings downloaded from the "SEED IV" web site (http://bcmi.sjtu.edu.cn/home/seed/seed-iv.html (accessed on 20 December 2019)). Preprocessing and feature extraction of the EEG data had already been conducted, including downsampling all data to 200 Hz, filtering out noise and artifacts by a linear dynamic system (LDS) [35], and extracting DE features from 5 frequency bands (Delta, Theta, Alpha, Beta, and Gamma) with a four-second time window without overlapping. The dimensions of the DE features were 62 × W_1 × 5 (format: # channels × # samples × # frequency bands), where W_1 was the number of samples of one subject in each trial. We concatenated the 62 channel values of each of the 5 frequency bands, and then obtained DE features with the shape of 310 × W_1. Since the time durations of different video clips in each session were slightly different, the total number of samples for each subject in each session was approximately 830.
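The reshaping step can be sketched as below. The exact ordering of the 310-dimensional vector (channel-major, bands contiguous per channel) is an assumption for illustration, since the paper does not specify it.

```python
import numpy as np

# Hypothetical shapes mirroring the "SEED IV" DE features:
# (channels, windows, bands) -> (channels * bands, windows).
W1 = 7                                    # dummy number of 4-second windows
rng = np.random.default_rng(0)
de = rng.normal(size=(62, W1, 5))         # as downloaded: 62 x W1 x 5
flat = de.transpose(0, 2, 1).reshape(62 * 5, W1)  # 310 x W1 feature matrix
```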

DEAP
Another emotional EEG data set used to validate our proposed method is "DEAP" [36]. It is a music video evoked EEG data set. There were 32 subjects invited to watch 40 one-minute music video clips. At the end of each video, subjects were asked to make a self-assessment of their levels of arousal, valence, liking, and dominance. During the experiment, the EEG signals were recorded by the Biosemi ActiveTwo system (http://www.biosemi.com (accessed on 6 August 2020)) with 32 electrodes placed according to the international 10-20 system. The sampling rate was 512 Hz.
In our experiments, we utilized the "Data_preprocessed_matlab" EEG data recordings downloaded from the "DEAP" web site (http://www.eecs.qmul.ac.uk/mmv/datasets/deap/index.html (accessed on 20 December 2019)). The EEG signals were down-sampled to 128 Hz. EOG artefacts were removed by using a blind source separation technique (http://www.cs.tut.fi/gomezher/projects/eeg/aar.htm (accessed on 6 August 2020)). A bandpass frequency filter from 4.0 to 45.0 Hz was applied. Then, we extracted DE features from 4 frequency bands (Theta, Alpha, Beta, and Gamma) with a one-second window size without overlapping. The dimensions of the DE features were 32 × W_2 × 4, where W_2 was the number of samples in each trial. We concatenated the 32 channel values of the 4 frequency bands and then obtained DE features with the shape of 128 × W_2. We got 63 samples for each trial, in which the first 3 samples were baseline signals and the last 60 samples were trial signals. Following the study of Yang et al. [37], which showed that the baseline signals are useful for emotion recognition, we further processed the data as they did: calculating the deviation between every trial sample and the average of the 3 baseline samples as the final input. Therefore, we obtained 60 samples in each trial, and there were 2400 samples in total for each subject.
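The baseline-correction step borrowed from Yang et al. [37] can be sketched as follows, with random data standing in for real DE features.

```python
import numpy as np

# Per trial: 63 one-second DE samples; the first 3 are baseline samples,
# the remaining 60 are trial samples (feature dimension d = 128 here).
rng = np.random.default_rng(0)
trial = rng.normal(size=(63, 128))
baseline_mean = trial[:3].mean(axis=0)
inputs = trial[3:] - baseline_mean        # 60 baseline-corrected samples
```

Repeating this for all 40 trials yields the 2400 samples per subject mentioned above.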

Experimental Settings
We evaluated the methods on every subject of the two data sets. For the "SEED IV" data set, we adopted the within-session experimental paradigm as in [34]. Specifically, for every subject in each session, we utilized the last 8 trials as the test data, which not only contained all emotional states but also guaranteed that each emotional state had exactly 2 trials, and the remaining 16 trials as the training data. On the "DEAP" data set, we split every subject's data as in [36], choosing 5 as the threshold to divide all samples into four categories according to the different levels of valence and arousal: high valence and high arousal (HVHA), high valence and low arousal (HVLA), low valence and high arousal (LVHA), and low valence and low arousal (LVLA). Then we performed 5-fold cross validation on each subject's data. The recognition accuracy and standard deviation were used as evaluation metrics. The average accuracy and standard deviation over all subjects represent the final performance of a method.
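The four-quadrant labeling of "DEAP" samples can be sketched as below; the numeric class indices are our own convention, not taken from the paper.

```python
def quadrant_label(valence, arousal, threshold=5.0):
    """Map self-assessed valence/arousal ratings to the four DEAP
    quadrants: 0 = HVHA, 1 = HVLA, 2 = LVHA, 3 = LVLA
    (index order is an illustrative choice)."""
    high_v = valence > threshold
    high_a = arousal > threshold
    if high_v and high_a:
        return 0  # HVHA
    if high_v:
        return 1  # HVLA
    if high_a:
        return 2  # LVHA
    return 3      # LVLA
```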
We compare MSFBEL with the support vector machine (SVM), K nearest neighbors (KNN), and SSFBEL. A linear kernel was used in SVM, and the regularization parameter C was determined by grid search from {10^−5, 10^−4, ..., 10^5}. For KNN, the Euclidean distance measure was used and the number of neighboring samples K was searched from {10, 20, 30, ..., 150}. The regularization parameter λ in objective (2) of SSFBEL was fine-tuned by grid search from {10^−5, 10^−4, ..., 10^−1}. The parameter μ in objective (11) of MSFBEL was fine-tuned by grid search from {10^−5, 10^−4, ..., 10^−1}, and β and ρ were initialized with [1; 1; ...; 1] ∈ R^s and 1.1, respectively. To learn the optimal weights corresponding to different scales, we need to obtain the decision matrix D defined in Equation (4). Therefore, we divided the training data into two parts: one was used for training and the other for validation. For the "SEED IV" data set, each part contained 2 trials of each emotional state. For the "DEAP" data set, each part contained half of the samples of each emotional category. Then we calculated the decision matrix D based on the ground-truth labels and the estimated labels of the samples in the validation set.

The Effect of Different Scales
In this section, we compare the performance of the global scale and local scales of frequency bands for emotion recognition by SSFBEL. First, we perform classification on different scales for every subject's data from the "SEED IV" and "DEAP" data sets, and the results are shown in Tables 1-4, respectively. The best results are highlighted in boldface. From these tables, we can observe that not every subject gets the best results on the global scale (scale = 5 for "SEED IV", and scale = 4 for "DEAP"). For example, for subject #1 in Table 1, SSFBEL gets the best accuracy of 84.85% when scale = 1, which exceeds the accuracy of scale = 5 by 1.17%. For subject #6, it obtains the optimal result of 79.25% when scale = 2, which outperforms the result of scale = 5 by 12.58%. We can find similar results in the other three tables. Second, we calculate the average accuracy of all subjects in each scale of frequency bands. The average accuracies of each scale on the three sessions of the "SEED IV" data set are shown in Figure 5. From this figure, we can find that the global scale cannot always achieve the optimal accuracy on every session. For example, in session 1, SSFBEL achieves the highest average accuracy of 81.65% on scale = 1, which is higher than that of scale = 5 by 1.46%. These results are consistent with our proposed idea that the global scale of frequency bands, which directly concatenates all frequency bands together, cannot always achieve the best performance. That is, sometimes local scales can achieve higher emotion recognition accuracy than the global scale. Therefore, it is reasonable for us to re-organize all frequency bands into different scales, which provides more potential for improving the performance of emotion recognition. However, there still exists a problem: the optimal scale of frequency bands varies across subjects.
For example, in Table 1, some subjects (#1, #3, #4, #7, #9, #10, #12, and #15) get the best result on scale = 1, some subjects (#2, #6, #9, and #14) obtain the optimal result on scale = 2, and so on. Besides, some subjects (#2 and #9) achieve the best result on several scales at the same time. Similar phenomena can be found in the other three tables. The uncertainty of the optimal scale may be influenced by the characteristics of the EEG signals, which not only have a low signal-to-noise ratio (SNR) but also exhibit significant differences across subjects [38]. Therefore, it is necessary to fuse all scales to reduce the impact of these factors. In this paper, MSFBEL is proposed to fuse different scales, in which the adaptive weight learning fuses all scales' results by automatically assigning larger weights to more important scales to further enhance the performance. The effectiveness of MSFBEL will be evaluated in Section 3.3.2. Table 3. The accuracies (%) of different scales of frequency bands on session 3 of "SEED IV" by using SSFBEL.

Table 4. The accuracies (%) of different scales of frequency bands on the "DEAP" data set by using SSFBEL.

The Performance of MSFBEL
In this part, we compare MSFBEL with SVM, KNN, and SSFBEL to show its effectiveness. In this experiment, SVM and KNN take the global-scale frequency bands as input. As for SSFBEL, we choose the best frequency band scale for every data set as input. Specifically, according to the average accuracies shown in Tables 1-4, we choose the results of scale = 1, 3, and 5 for sessions 1, 2, and 3 of the "SEED IV" data set, and we select the results of scale = 4 for the "DEAP" data set. The MSFBEL method takes all frequency band scales as input.
For the "SEED IV" data set, emotion recognition accuracies of the four methods in the three sessions are shown in Table 5. From the experimental results, we observe that MSFBEL achieves the best average recognition rates of 82.75%, 87.87%, and 78.27% in the three sessions, respectively. When compared with SVM, MSFBEL respectively achieves 11.53%, 10.27% and 6.51% improvements in the three sessions. As for KNN, MSFBEL respectively obtains 11.78%, 7.69% and 9.33% improvements than it in the three sessions. Moreover, MSFBEL exceeds SSFBEL by 1.10%, 2.12%, and 1.57% corresponding to the three sessions. For the "DEAP" data set, accuracies of the four methods are displayed in Table 6. From this table, we can find that MSFBEL achieves the average accuracy of 74.23%, which obtains 20.63% and 11.5% improvements in comparison with SVM and KNN. In addition, the performance of MSFBEL is better than SSFBEL by 2.03% in terms of average accuracy. Besides, Figure 6 presents the overall performance of the four methods. We observe that MSFBEL achieves better performance than the other three methods on both data sets. The underlying reason may be the combination of different scales of frequency bands by assigning larger weights to important scales by MSFBEL, while the other three methods only conduct classification on one scale. Therefore, we declare that there are complementary information among different scales, and the proposed adaptive weight learning method effectively fuses these information to enhance classification performance.   Besides the comparison of average accuracy of these four methods, we perform the Friedman test [39] to illustrate the statistical significance among them. The Friedman test is a non-parametric statistical test, which is used to detect differences of multiple methods across multiple test results. The null-hypothesis is "all the methods have the same performance". 
If the null hypothesis is rejected, the Nemenyi test is utilized to further distinguish whether the performances of any two of the methods are significantly different. We analyze the difference in performance among the four methods, and the results are shown in Figure 7. In the figure, the solid circle represents the average rank of each method, and the vertical line represents the critical distance (CD) of the Nemenyi test, which is defined as

CD = q_α · sqrt(k(k + 1) / (6N)),

where k denotes the number of methods, N denotes the number of result groups, and q_α is the critical value at the significance level α, which defaults to 0.05 [40]. In this paper, we set k = 4 because there are four methods in total. For the "SEED IV" data set, N = 45 because there are 15 subjects and each subject has 3 sessions. For the "DEAP" data set, N = 32 since there are 32 subjects. If two vertical lines do not overlap, the corresponding methods have statistically different performance. As shown in Figure 7, on both the "SEED IV" and "DEAP" data sets, SSFBEL and MSFBEL do not overlap with SVM and KNN, which indicates that our proposed methods differ significantly in performance from these two compared methods. Furthermore, Figures 8 and 9 display the confusion matrices of the four methods on the "SEED IV" and "DEAP" data sets, respectively. First, we can obtain the recognition accuracy of each emotional state. In Figure 8, the average accuracies of the "neutral" emotional state classified by SVM, KNN, SSFBEL, and MSFBEL are 80.48%, 79.31%, 82.36%, and 87.73%, respectively. In Figure 9, the average accuracies of the "HVHA" emotional state classified by these four methods are 72.45%, 76.7%, 85.74%, and 86.89%, respectively. Second, we can get the misclassification rate of each emotional state. For example, from the confusion matrix of MSFBEL on the "SEED IV" data set (Figure 8d), 87.73% of the EEG samples are correctly recognized as the "neutral" state, while 4.59%, 1.19%, and 6.5% of them are incorrectly classified as the "sad", "fear", and "happy" states, respectively.
From the confusion matrix of MSFBEL on the "DEAP" data set (Figure 9d), 86.89% of the EEG samples are correctly classified as the "HVHA" state, while 3.13%, 6.26%, and 3.36% of them are misclassified as the "HVLA", "LVHA", and "LVLA" states, respectively. Third, compared with the other three methods, MSFBEL shows improvement on each of the four emotional states. For instance, as shown in Figure 8, MSFBEL exceeds the accuracies of SVM, KNN, and SSFBEL on the "neutral" state by 7.25%, 8.42%, and 5.37%, respectively. Further, we notice that the "neutral" state always gets the highest accuracy with all four methods, from which we can deduce that it is the easiest emotional state to identify. As displayed in Figure 9, MSFBEL exceeds the accuracies of the other three methods on the "HVHA" state by 14.44%, 10.19%, and 1.15%, respectively.
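The critical distance used in Figure 7 can be computed directly from its definition. The tabulated value q_α = 2.569 for k = 4 methods at α = 0.05 comes from standard Nemenyi critical-value tables and is an assumption about the exact constant used in the paper.

```python
import math

def nemenyi_cd(k, N, q_alpha=2.569):
    """Critical distance of the Nemenyi post-hoc test:
    CD = q_alpha * sqrt(k * (k + 1) / (6 * N)).
    q_alpha = 2.569 is the tabulated value for k = 4 at alpha = 0.05
    (an assumed constant, see lead-in)."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * N))

cd_seed = nemenyi_cd(k=4, N=45)   # "SEED IV": 15 subjects x 3 sessions
cd_deap = nemenyi_cd(k=4, N=32)   # "DEAP": 32 subjects
```

Two methods whose average ranks differ by more than this CD are declared significantly different, which is what the non-overlapping vertical lines in Figure 7 visualize.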

Conclusions and Future Work
In this paper, a new frequency bands ensemble method (MSFBEL) was proposed to recognize emotional states from EEG data. The main advantages of MSFBEL are that (1) it re-organizes all frequency bands into different scales (several local scales and one global scale) for feature extraction and classification, and (2) it combines the results of different scales by learning adaptive weights to further improve emotion classification performance. Extensive experiments were conducted on the "SEED IV" and "DEAP" data sets to evaluate the performance of MSFBEL. The results demonstrate that the scale of frequency bands influences the emotion recognition rate, and that the global scale, which directly concatenates all frequency bands, cannot always guarantee the best emotion recognition performance. Moreover, the results also illustrate that different scales of frequency bands provide complementary information to each other, and the proposed adaptive weight learning method can effectively fuse this information. The results indicate the effectiveness of our proposed MSFBEL model in the emotion recognition task.
In the future, we will focus on the following problem not covered in this paper: we will explore the cross-subject and cross-session domain adaptation problems, so that we may try to find the optimal scale of frequency bands for EEG-based emotion recognition.

Appendix A

where γ and η ∈ R^s are the Lagrangian multipliers. Suppose that w^* is the optimal solution, and γ^* and η^* are the corresponding Lagrangian multipliers. According to the KKT conditions [42], we get the equations for every j (j = 1, 2, ..., s) as follows. The first equation of Equation (A5) can be reformulated into the following form