Multiple Transferable Recursive Feature Elimination Technique for Emotion Recognition Based on EEG Signals

Abstract: Feature selection plays a crucial role in analyzing huge-volume, high-dimensional EEG signals in human-centered automation systems. However, classical feature selection methods pay little attention to transferring cross-subject information for emotions. To perform cross-subject emotion recognition, a classifier that can use EEG data to train a general model suitable for different subjects is needed. However, existing methods are imprecise due to the fact that the affective responses of individuals are personalized. In this work, cross-subject emotion recognition models on both binary and multiple affective states are developed based on the newly designed multiple transferable recursive feature elimination (M-TRFE). M-TRFE manages not only a stricter feature selection over all subjects to discover the most robust features but also a unique subject selection to decide the most trusted subjects for certain emotions. Via a least square support vector machine (LSSVM), the overall multiclass (joy, peace, anger and depression) classification accuracy of the proposed M-TRFE reaches 0.6513, outperforming all other methods used or referenced in this paper.


Introduction
Emotions are known as a group of intrinsic cognitive states of the human mind. They add meaning to human activities and play a vital role in human communication, intelligence, and perception [1]. An emotion can be triggered by a specific feeling and will eventually lead to a change in behavior [2]. Since emotions are closely associated with human activities and psychophysiological states, establishing intelligent emotion recognition is integral to achieving adaptive human-machine interaction (HCI). One preparatory task for emotion recognition is target emotion tagging, a process that assigns proper emotional labels to improve the efficiency of annotation and the final classification performance [3]. Previous literature has proposed several emotion models. Some of them, such as Ekman's and Parrot's, are widely adopted but are limited in the number of emotions they cover (six emotions). The wheel of emotions by Plutchik and the recently proposed 3D hourglass of emotions model are able to capture complex emotions (more than 20 emotions in total). As the mainstream choice in DEAP-based studies, Russell's valence-arousal (V-A) model is used in this work [4][5][6][7]. A V-A plane with the arousal score as the horizontal axis and the valence score as the vertical axis can be set up, in which each emotional state has an arousal dimension and a valence dimension [8]. The arousal score ranges from inactivity to activity of a participant, and the valence score measures the participant's level of pleasure. The V-A plane is then ready to divide the target emotions (see Section 3.2).
Emotions can be expressed in both verbal and nonverbal manners. Therefore, it is important to build an HCI system that can recognize emotions by identifying the facial or voice expressions of users [9]. The corresponding affective computing system must contain multifaceted processes. First, the HCI system should detect whether a specific emotion is expressed, and thus correctly label the emotional class (e.g., happiness or sadness) [10]. However, neither facial nor voice indicators are always reliable. Past studies utilizing these indicators for HCI emotion recognition show that subjects often exaggerate their tones or manners to achieve a satisfactory performance [11]. Thus, emotion recognition via recording and analyzing physiological signals becomes a promising alternative [12]. Particularly, electroencephalography (EEG), a non-invasive technique that readily yields input data for emotion classifiers, is becoming a preferred indicator [13]. EEG signals are immense in volume and high in dimension. For example, a single participant provides 8064-dimensional original data in the DEAP database, which are impossible to handle directly. Another significant problem with emotion recognition via EEG is that the response of each individual varies upon receiving the same affective stimuli. This is because emotions are personal, and the evaluation should use an individual-specific assessment model. Since a data distribution discrepancy exists between subjects, a long period of time is inevitably required to train the classifiers. Furthermore, EEG signals can also be distributed differently across days due to their non-stationarity. A model trained on the EEG data of a specific individual may not adapt well to novel users; therefore, feasible feature selection methods are imperative to transfer useful information among individuals.
Thus, the machine learning approach is adopted to extract useful information as clues for emotion recognition.
This paper focuses on the importance of selecting salient EEG indicators. All the algorithms mentioned below can be used for cloud services or non-cloud services. To examine high-dimensional EEG features, recursive feature elimination (RFE) combined with the least square support vector machine (LSSVM) was developed. RFE-LSSVM has the capability to rank EEG features and select the most relevant variables [14]. LSSVM was chosen over the support vector machine (SVM) because LSSVM shows less computational consumption [15]. Considering the need for cross-subject emotion recognition, it is reasonable to modify traditional RFE into the transferable recursive feature elimination (TRFE) [16]. This approach eliminates the EEG indicators that are not generic for all users and forms a set of robust EEG indicators that are steadily distributed among all training subjects and the specific novel testing subject.
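The RFE ranking idea can be sketched as follows with scikit-learn, where a linear SVM stands in for LSSVM (scikit-learn does not ship an LSSVM) and the data are synthetic placeholders rather than DEAP features.

```python
# Sketch of SVM-based recursive feature elimination (RFE).
# LinearSVC stands in for LSSVM, and the data are synthetic placeholders.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))            # 200 trials x 20 features
y = (X[:, 3] + X[:, 7] > 0).astype(int)   # only features 3 and 7 matter

# Eliminate one feature per iteration, ranked by the weight magnitudes
selector = RFE(LinearSVC(dual=False, max_iter=10000),
               n_features_to_select=2, step=1)
selector.fit(X, y)
print(sorted(np.where(selector.support_)[0].tolist()))
```

With labels driven by two features only, the two informative columns should survive the elimination loop.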
With TRFE, the classifier does not necessarily require a specific training set for the novel testing subject. By processing the reusable historical data collected from all other subjects, the training dataset is identified and produced. Following this concept, two developments of TRFE, single TRFE (S-TRFE) and multiple TRFE (M-TRFE), are proposed. Both algorithms are based on a novel transferring set that contains the most trustful features from other subjects. While the S-TRFE algorithm directly adds the transferring set to the entire training set of one subject, M-TRFE removes some of the worst features from the entire training set of one subject and replaces them with the given transferring set to improve classification performance. In addition, M-TRFE also selects the most trusted subjects, i.e., those with better performance in cross-subject emotion recognition. Thus, the more a subject is trusted, the more he donates to the transferring set. This process can be described as getting rid of the outliers who do not react as most people do.
Based on this M-TRFE algorithm, the expected cross-subject classifier should have better performance. The accuracies given by this classifier are expected to be higher than others. Throughout this entire process, we exploit the DEAP database as the working resource.
To be concise, in the rest of the paper, TRFE will be used as a collective name that encompasses a series of RFE based cross-subject schemes. The original TRFE algorithm will be renamed as general TRFE (G-TRFE) to make a distinction. The newly proposed M-TRFE algorithm will be compared against all strategies previously mentioned, as well as the subject-specific (SS) case on both binary and multiclass emotion recognition.
The rest of the paper consists of several sections. Section 2 is dedicated to the summary of related works that inspired this work, and Section 3 provides a short description of the DEAP dataset, the EEG preprocessing scheme, and the feature extraction methods on DEAP; it also demonstrates the workflow of LSSVM and the detailed process of M-TRFE. Section 4 covers binary and multiclass emotion recognition, where different cross-subject or subject-specific methods are expounded, tried, and compared. The last two sections focus on the result analysis, the main contribution, the implications of this work, its limitations, and its potential.

Related Works
Emotion recognition is utilized in many fields. He et al. proposed a convolutional neural network (CNN) that recognizes emotion from an image by combining a binary classification network and a deep network [17]. In addition, a facial recognition system has been applied to evaluate the quality of distance teachings [18]. In speech analysis, emotion recognition is implemented by using the extreme learning machine (ELM) [19]. Music, in which emotions are expressed, can be analyzed to tell the difference between contemporary commercial music and classical singing techniques [20]. Classification performances of speech and music recognition systems are not ideal (around 50%), but facial and voice recognition systems have achieved high accuracies of 0.8170 and 0.8753, respectively.
Aside from facial and vocal features, physiological features have also been widely used in emotion recognition. To be more specific, EEG signals are also investigated via machine learning based classifiers. For instance, gender recognition with entropy measurement is achieved in Hu et al. [21]. The connection between mental fatigue and aging has been studied, and the recognition of mental fatigue was found to be efficient when adaptive weights switch in deep belief networks (DBN) [22,23]. Even though DBN has also been applied to emotion recognition, more studies use SVM combined with feature smoothing or selection methods, such as canonical correlation analysis (CCA) and principal component analysis (PCA) [24][25][26]. An end-to-end model based on CNN has been used to reduce the cost of designing the feature set, and as a result, an average accuracy of 0.7548 was reported [27]. In a recent study, Tong et al. [28] combined the International Affective Picture System (IAPS), which sorted eight valence levels with similar arousal values, with nonlinear feature based SVM. These EEG-based emotion recognition methods are very encouraging and have gained widespread attention.
Several studies have already been done and shown the efficacy of various feature selection methods and their use in EEG based emotion recognition. Zhang et al. combined feature extraction methods of empirical mode decomposition and sample entropy [29]. Atkinson and Campos' work integrated mutual information based on EEG feature selection and kernel classifiers [30]. A novel feature termed as DE-PCCM proposed by Li et al. has good outcomes when a differential entropy (DE) feature extraction was employed [31].
RFE approaches are of particular interest to us due to previous work demonstrating their applicability to emotion recognition. SVM-RFE detecting scalp spectral dynamics of interest (SSDOIs) has promising clinical applications [32]. Another modification of the RFE approach, D-RFE (dynamical RFE), was proposed to improve the inter-class discrimination [33]. In a series of previous works, we investigated OFS classification using LSSVM based on RFE [34]. Motivated by these studies, we adopted the supervised learning methodology, the 2D V-A plane to target four emotions, and the M-TRFE algorithm for EEG feature selection.

EEG Datasets for Affective Modeling
In this study, the DEAP (a database for emotion analysis using physiological signals) database was used to validate the proposed machine learning-based feature selector. A total of 32 subjects (average age 26.9, 50% female) took part in the experiment for physiological data acquisition. The International 10-20 System was implemented for recording EEG signals, in which 32-channel data were collected at a sampling rate of 256 Hz [35]. There were 40 video clips (i.e., 40 trials, each lasting about 1 min) prepared for each participant as the emotional stimuli. The physiological responses were recorded while the participant was watching the videos. All subjects accomplished a self-assessment after each trial in which arousal, valence, dominance, liking, and familiarity scales were labeled. With the exception of the familiarity scale (range 1-5), the four remaining rating scales ranged from 1 to 9. The V-A model was then used to determine the target emotion classes [36].

Feature Extraction and the Target Emotion Classes
A Butterworth filter with cutoff frequencies of 4.0 and 45.0 Hz was used to filter the noise in the EEG data [37]. Then, an independent component analysis (ICA) was employed to eliminate muscular artifacts [38]. In each trial, the continuous EEG signal was split into three segments: a 3-s baseline segment, a 6-s (10%) validating segment, and a 54-s working segment. The validating segment was used to rank features and avoid overfitting in the M-TRFE model, while the working segment was used to select features and perform the classifier training and testing. The baseline segment was discarded in this work because it was collected before the subject watched the video.
In this work, 11 of the 32 channels were picked: F3, F4, Fz, C3, C4, Cz, P3, P4, Pz, O1 and O2. This choice of channels follows the channel employment in the previous work of Zhang et al. [39]. Overall, 137-dimensional EEG features were extracted, consisting of 60 frequency domain features and 77 time domain features. The frequency features (44 power features, 16 power difference features) were prepared by using a fast Fourier transform. In each channel, the power features were computed on four frequency bands, i.e., theta (4-8 Hz), alpha (8-12 Hz), beta (12-30 Hz) and gamma (30-45 Hz). Power difference features were employed to detect the variation in cerebral activity between the left and right cortical areas. Four channel pairs, F4-F3, C4-C3, P4-P3 and O2-O1, were used for power difference extraction, with each pair contributing four features on the four bands. For each channel, seven temporal features were computed: mean, variance, zero crossing rate, Shannon entropy, spectral entropy, kurtosis, and skewness. All features were standardized to mean = 0 and s.d. = 1. Detailed descriptions of the features are shown in Table 1.
Emotion classification based on supervised learning requires predetermined emotion labels. In the DEAP database, participants used self-assessment manikins to rate the valence and arousal levels in the range from 1 (lowest) to 9 (highest). A threshold is conventionally set up to separate the high/low valence or arousal classes. The threshold here was determined in a participant-generic manner: using the rating values of all 32 subjects, the mean values of both valence and arousal indexes over all subjects and trials were calculated.
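The band-power and temporal features described above can be sketched on one synthetic channel as below; the 128 Hz sampling rate, the Welch PSD estimator, and the test signal are illustrative assumptions, and two of the seven temporal indexes (the entropies) are omitted for brevity.

```python
# Sketch of the per-channel features: band powers from a Welch PSD plus
# temporal statistics. Sampling rate and signal are assumptions.
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis, skew

fs = 128.0                                   # assumed sampling rate
t = np.arange(0, 54, 1 / fs)                 # a 54-s working segment
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.normal(size=t.size)

# Frequency domain: power in the theta/alpha/beta/gamma bands
f, pxx = welch(x, fs=fs, nperseg=256)
bands = {"theta": (4, 8), "alpha": (8, 12), "beta": (12, 30), "gamma": (30, 45)}
power = {}
for name, (lo, hi) in bands.items():
    mask = (f >= lo) & (f < hi)
    power[name] = np.trapz(pxx[mask], f[mask])   # integrate the PSD

# Time domain: five of the seven temporal indexes (entropies omitted)
zcr = np.mean(np.abs(np.diff(np.signbit(x).astype(int))))
temporal = [x.mean(), x.var(), zcr, kurtosis(x), skew(x)]
print(max(power, key=power.get))   # the 10 Hz tone falls in the alpha band
```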
For every subject's 40 arousal ratings a_1, a_2, . . . , a_40 (a_i ∈ R), the arousal threshold point c_1 is computed as the grand mean over all 32 subjects and 40 trials:
c_1 = (1 / (32 × 40)) Σ_{s=1}^{32} Σ_{i=1}^{40} a_i^(s). (1)
The same process using the valence ratings was used to compute the valence threshold point c_2. It was found that c_1 = 5.2543 and c_2 = 5.1567 were the threshold values for the arousal and valence dimensions. Ratings above c_1 were assigned to the state of high arousal, and ratings above c_2 to the state of high valence.
The entire V-A plane was split into four parts: HVHA (high valence high arousal), HVLA (high valence low arousal), LVHA (low valence high arousal), and LVLA (low valence low arousal). This is illustrated in Figure 1. Finally, the four emotions of joy, peace, depression and anger were assigned to the respective quadrants of the V-A plane.
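The participant-generic thresholding and quadrant split can be sketched as below; the ratings are random placeholders, so the computed c1 and c2 will not match the values reported above.

```python
# Participant-generic thresholds: c1/c2 are grand means of the arousal and
# valence ratings over all subjects and trials (random placeholder ratings).
import numpy as np

rng = np.random.default_rng(1)
arousal = rng.uniform(1, 9, size=(32, 40))   # 32 subjects x 40 trials
valence = rng.uniform(1, 9, size=(32, 40))

c1, c2 = arousal.mean(), valence.mean()      # grand-mean thresholds
ha = arousal > c1                            # high arousal
hv = valence > c2                            # high valence

# Each trial falls into one of the four V-A quadrants
quadrant = np.where(hv & ha, "HVHA",
           np.where(hv, "HVLA",
           np.where(ha, "LVHA", "LVLA")))
print(quadrant.shape)
```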


Multiple Transferable Feature Elimination Based on LSSVM
M-TRFE was developed via LSSVM due to its merits of faster training and better performance in avoiding overfitting. The principle for selecting feature instances is as follows. Given the training set D = {(x_i, y_i) | i = 1, 2, . . . , l} with input data x_i ∈ R^n and the corresponding output labels y_i ∈ {+1, −1}, the nonlinear mapping ϕ(x) is used to generate a higher-dimensional feature space in which the optimal decision function is sought:
y(x) = w^T ϕ(x) + b. (2)
In Equation (2), w stands for the weight vector of the classification separating hyperplane and y(x) is the linear estimation function in the feature space. To achieve minimization of the structural risk, the scheme is carried out as:
min J(w, ζ) = (1/2)||w||^2 + (γ/2) Σ_{i=1}^{l} ζ_i^2, s.t. y_i (w^T ϕ(x_i) + b) = 1 − ζ_i, i = 1, . . . , l, (3)
where γ is the regularization parameter adjusting the punishment degree of the training error, ||w||^2 controls the complexity of the model, and the last term Σ_{i=1}^{l} ζ_i^2 is the empirical error on the training set; the slack variable ζ_i is introduced in case the instances of the two classes are not linearly separable. The Lagrangian function can be constructed with the kernel function K(x_i, x_j) = ϕ(x_i)·ϕ(x_j) to reduce the problem to a system of linear equations. Applying the least square method, a nonlinear prediction model is exposed via the kernel function K:
y(x) = sign( Σ_{i=1}^{l} α_i y_i K(x, x_i) + b ). (4)
According to the equations above, M-TRFE measures whether a feature is salient by checking the classification margin and the loss of margin when the kth feature is eliminated, i.e.,
∆Φ(k) = | ||w||^2 − ||w(k)||^2 |. (5)
In Equation (5), w(k) is the weight vector of the classification plane with the kth feature eliminated. If the elimination of a particular feature leads to the largest ∆Φ, the corresponding feature is considered the most influential one.
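As a numerical sketch of the LSSVM solution above: the Lagrangian conditions reduce to one linear system in (b, α), which can be solved directly. The toy data, the linear kernel, and the γ value are illustrative assumptions.

```python
# The LSSVM dual reduces to a single linear system in (b, alpha);
# sketch with a linear kernel, toy data, and an assumed gamma.
import numpy as np

def lssvm_fit(X, y, gamma=10.0):
    K = X @ X.T                            # linear kernel K(x_i, x_j)
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * K + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)          # solve for [b, alpha]
    return sol[0], sol[1:]

def lssvm_predict(X, y, alpha, b, Xnew):
    # y(x) = sign(sum_i alpha_i * y_i * K(x, x_i) + b), Equation (4)
    return np.sign((alpha * y) @ (X @ Xnew.T) + b)

X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
b, alpha = lssvm_fit(X, y)
pred = lssvm_predict(X, y, alpha, b, X)
print(pred)   # should recover the training labels
```

Solving one (l+1)-dimensional linear system instead of a quadratic program is the source of LSSVM's lower computational cost mentioned in the Introduction.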
The goal of M-TRFE is to determine a set of best indicators among a group of participants. It is noted that the binary LSSVM is not capable of the four-class classification task. Here, the one-against-one (OvO) ensemble of multiclass classifiers is utilized to fulfill the task. With the OvO structure, every two emotion classes are tackled as a pair via an M-TRFE-LSSVM model. The details are as follows.
Given a sample set D = {(x_i, y_i) | i = 1, 2, . . . , l}, initialize the feature set S = {1, 2, . . . , D}, the feature-ranking set R = [], and the feature ranking vector p = []. Combining every two training samples as a pair eventually generates l(l−1)/2 novel training samples, on which the classifiers are built. In the first step, the obtained x_j is used to train an LSSVM model with the computed weight vector w_j (j = 1, 2, . . . , l). Then the sorting criterion score of the kth feature is calculated as c_k = w_k^2, the standard SVM-RFE criterion. The lowest-scoring feature p is eliminated: the feature ranking set is updated as R = [p, R] and the feature is deleted from S. This process repeats until S = [].
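The ranking loop just described can be sketched as follows, with a linear SVM standing in for LSSVM, w_k^2 as the sorting criterion, and synthetic placeholder data.

```python
# Sketch of the recursive ranking loop: train, score features by w_k**2,
# prepend the lowest-scoring feature to R, and repeat until S is empty.
# LinearSVC stands in for LSSVM; the data are synthetic.
import numpy as np
from sklearn.svm import LinearSVC

def rfe_ranking(X, y):
    S = list(range(X.shape[1]))            # surviving feature set
    R = []                                 # ranking, best feature first
    while S:
        clf = LinearSVC(dual=False, max_iter=10000).fit(X[:, S], y)
        scores = clf.coef_[0] ** 2         # sorting criterion c_k = w_k^2
        p = S[int(np.argmin(scores))]      # least influential feature
        R.insert(0, p)                     # R = [p, R]
        S.remove(p)
    return R

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 6))
y = (X[:, 2] > 0).astype(int)
rank = rfe_ranking(X, y)
print(rank[0])   # feature 2 should be ranked most relevant
```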
In M-TRFE, credible training data contain the best feature instances from others. At the same time, the selected training data eliminate some of the worst performing features from the original training set. M-TRFE will also rate the pick of only a few subjects to take part in the building of this set. The influence of the variation of the training set of the TRFE concept is illustrated in Figure 2. The construction of the M-TRFE novel training set is unfolded in Figure 3a.
Notably, for the multiclass use of M-TRFE, given the training set D = {(x_i, y_i) | i = 1, 2, . . . , n} with multiclass labels y_i ∈ {1, 2, 3, 4}, several separate binary classifiers robustly analyze each emotion and encode the label into binary values y_i ∈ {−1, +1}. This gives each emotion a feature ranking and a subject selection. To fulfill the multiclass subject-generic emotion feature selection, a mutual feature-ranking list is generated based on the four separate rankings of joy, peace, anger and depression, to detect the best features and most trusted subjects for each emotion. A subject is considered more trusted if he achieves better cross-subject classification accuracy. The more a subject is trusted, the more he contributes to the transferring training set, while the least trusted ones stop contributing to the set, as shown in Figure 3b. The weighted score of each feature is averaged over all ranking lists.
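Averaging the per-emotion ranking indexes into one mutual weighted score can be sketched as below; the four ranking lists are made-up placeholders, each ordered from worst to best feature as in the text.

```python
# Mutual weighted score: average each feature's ranking index r over the
# four per-emotion lists (r = 0 for a list's worst feature). The four
# ranking lists below are made-up placeholders.
import numpy as np

def weighted_scores(rankings, n_features):
    W = np.zeros(n_features)
    for rank in rankings:                 # one list per emotion, worst -> best
        for r, feat in enumerate(rank):
            W[feat] += r
    return W / len(rankings)              # average over the ranking lists

rankings = [[2, 0, 1, 3], [2, 1, 0, 3], [0, 2, 1, 3], [2, 0, 3, 1]]
W = weighted_scores(rankings, 4)
print(int(np.argmax(W)))   # feature 3 has the highest (most favorable) W
```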
In Equation (8), if a feature is determined to be the worst feature in a ranking list, its ranking index r is 0; the second worst feature receives r = 1, and so on, so the weighted score of the kth feature is
W(k) = (1/4) Σ_{e=1}^{4} r_e(k), (8)
where r_e(k) is the ranking index of feature k for emotion e. By choosing the least trusted 10 features, rankings for the different emotions are given. In effect, the W value evaluates how favorable a feature is. Moreover, instead of using separate feature ranking arrays, the mutual array can be calculated in the proposed multiclass M-TRFE model. For the kth feature of the ith subject, we define the Euclidean distance between the original EEG feature set and a novel transferring set as
d_k^(i) = || x_k^(i) − x̃_k ||_2, (9)
where x̃_k denotes the kth feature in the transferring set. The H-value can then be calculated as
H = λ_1 D(r) + λ_2 ∆w(r), (10)
where D(r) is the distance difference and ∆w(r) is the LSSVM margin loss. The pseudo codes of the algorithm are given in Tables 2 and 3.
Table 2. Pseudo codes of the algorithm for M-TRFE initialization.

1   Start initialization
2   for i = 1:s
3       for j = 1:f
4           Define V_i = {x_k, y_k} using the f-th validating segment of subject i
5           Define ...
            Select the model and the regularization parameter γ
9           for j1 = ...
                Define cross-subject data V_j1 = {x_k, y_k} from the working segment of subject j1
15              Define J(w, b, ζ_k) = (1/2)||w||^2 + (1/2)(γ_0^(i) · Σ_{k=1}^{v_j1} ζ_k) and train the model
16              Test the model with the validating segment V_i = {x_k1, y_k1} from subject i
17              Create the subject ranking vector A_H = A_i ∪ A_H
18          end for
19          Rank the most trusted subjects through A_H
20      end for
21  End initialization

There are several details that need to be explained in Table 2. s in line 2 stands for the number of subjects that took part in the trials, while the f value in line 4 controls the number of folds when the L-fold cross-validation technique is applied. In this study, s = 32 and f = 10. j1 from line 9 starts the subject ranking. A_i records the cross-subject performance of subject i.
In Table 3, the high state of emotion is taken as an example; the low state of emotion follows the same pseudo codes. Several parameters need to be explained. L = 137 in line 10 represents the dimensionality of the feature set. Since L is a prime number in this work, the step length of each elimination iteration has to be taken as 1. In line 15, d_H (and d_L) quantifies the distance between the original set and the transferring set for the high (and low) class, and the distance difference D(r_1) is considered equally influential as the LSSVM margin loss w(r_1) by taking λ_1 = λ_2 = 0.5.
An auxiliary function f_a has also been used in the pseudo codes, introduced to simplify the representation of the algorithm. To evaluate the classification performance of the proposed feature selection model, several assessment metrics are introduced: accuracy, F1 score, and kappa value.
Table 3. Pseudo codes of the algorithm for M-TRFE feature ranking.

1   Start feature ranking
2   for i = 1:s
3       Load ...
        Calculate V_H for a certain emotion and create a blank space
        end if
10      for j = 1:L
11          Build O_i = V_i ∪ S_i used for the transferring task
12          Define ...
            Find the support vector w = Σ_{k=1}^{|O_i|} α_k y_k x_k
14          for r = 1: ...
            end for
17          Create a blank feature ranking set R = ∅
18          Eliminate R from the feature set S
20      end for
21      Return the feature ranking set S = ∪_{j=1}^{L} R(j)
22  End feature ranking

Results
All the experiments and following results were carried out via Matlab R2016b, on a computer running the Windows 10 operating system with an Intel® Core™ i5-7200U CPU @ 2.50 GHz and 8 GB RAM.

Data Split and Cross-Validation Technique
In this subsection, several strategies for data splits based on different cross-validation techniques were tested. Since our feature extraction had enlarged the feature space, we took different proportions of data randomly (not orderly) from the working segment as training sets. As the training/test split was completely different between repetitions, the subject-specific emotion recognition was run five times to test the stability of the random use of the working segment. The accuracy values are shown in Figure 4, which correspond to random data splits under hold-out or 10-fold cross-validation conditions. According to Figure 4, hold-out cross-validation only yields accuracy values around 50%; it would therefore be impractical for evaluating generalization capability. To tackle this issue, we used 10-fold cross-validation, which achieved an acceptable classification performance (AVG arousal accuracy = 0.6549, AVG valence accuracy = 0.6865). In contrast to hold-out cross-validation, 10-fold cross-validation divides the validating segment equally into ten small folds, with each fold estimating the accuracy, and was found to be the better model for enhancing classification performance. Thus, the analyses in the rest of the paper all employ 10-fold cross-validation. Figure 4 also shows that the classification results across repetitions do not fluctuate significantly, which indicates the random use of the working segment is feasible.
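The hold-out versus 10-fold comparison can be sketched as follows; the classifier, data, and split sizes are placeholders rather than the paper's actual setup.

```python
# Hold-out vs 10-fold cross-validation; classifier and data are
# placeholders, not the paper's EEG features.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold-out: one random split yields a single, noisier estimate
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
holdout = SVC().fit(Xtr, ytr).score(Xte, yte)

# 10-fold: ten rotating folds, each fold estimating the accuracy once
cv = cross_val_score(SVC(), X, y, cv=10)
print(round(float(cv.mean()), 2))
```

Averaging over ten folds uses every sample for both training and testing, which is why its estimate is more stable than a single hold-out split.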


Cross-Subject Feature Selection and Binary Classification
There are several strategies to realize binary cross-subject emotion recognition. In the strategy of S-TRFE, the entire training set from one subject remains, while different amounts of relevant features from other subjects are added to create a novel training set. We gradually increased the use of the transferring set, but despite the alterations to the set, the classification performance was generally unchanged, and the F1-score did not vary significantly either. On the valence dimension, accuracy ranged from 0.6865 to 0.6875; for the arousal dimension, however, the accuracy dropped from 0.6549 to 0.6470.
In the case of M-TRFE, two key factors of the paradigm are given in Tables 3 and 4. We labeled the participant who reached the highest classification accuracy in the direct cross-subject scheme as the most trusted subject, and the assigned ranks are listed in Table 4. In this direct scheme, the RFE based feature selection was not performed; instead, all subjects were directly involved in training the classifier. The AVG arousal and valence accuracies reached 0.5089 and 0.4961. The unsatisfactory performance of the direct cross-subject scheme confirmed the notion that cross-subject emotion recognition faces tough obstacles.
The worst features given by the M-TRFE feature ranking were ranked and are presented in Table 5 with their corresponding physiological significance. The PSD features from the beta band were disfavored in the binary classification.
For all 32 subjects, the mean arousal and valence accuracies peaked when the worst feature was eliminated, reaching the highest values of 0.6531 for the arousal and 0.6867 for the valence dimension. Figure 5 reveals the variation of the classification performance when different numbers of features were excluded. Although all these metrics slightly improved compared to SS, the enhancement was not significant. The M-TRFE paradigm uses a certain amount of relevant features from the other most trusted subjects to replace an equal amount of the least relevant features eliminated from one specific subject. With the single most trusted subject contributing to the transferring set and with the most relevant feature of this subject employed, we found that the classification accuracies on the binary emotional dimensions peak. The classification performance of SS, RFE, S-TRFE and M-TRFE for each subject is illustrated in Figure 6. M-TRFE overtakes the other cross-subject methods, yet remains inferior to SS.
However, across all the cross-subject methods analyzed, the recognition performances were actually close. The average accuracies of all 32 participants are listed in Table 6. With regard to S-TRFE, altering the number of transferring features did not influence the classification performance, and all the indexes remained the same, except that the F1 score of arousal decreased progressively as the number of features increased. Thus, S-TRFE shows little impact from the transferring features, and the specific subject himself dominates the classification performance. M-TRFE, meanwhile, performed best when only the most trusted subject donated features, and its performance surpassed S-TRFE. We also tested the G-TRFE algorithm, which was proposed by Yin in 2017 and inspired this work; however, it proved unsuitable for our feature extraction. Its arousal accuracy was 0.5580 and its valence accuracy was 0.5860, both worse than M-TRFE. Note: all classification performances listed above adopted an optimal number of transferring features. The subject-specific average value of accuracy is shown, and the values in brackets are the corresponding standard deviations.
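The core transfer step described above can be sketched in a few lines. This is a minimal illustration only, assuming samples are plain lists of feature values and rankings list feature indices from most to least relevant; the function name and data layout are hypothetical, not the authors' implementation.

```python
def m_trfe_transfer(target, trusted, rank_target, rank_trusted, n):
    """Drop the n least relevant features of the target subject and append
    the n most relevant features of the trusted subject, so that the
    feature dimension stays constant (hypothetical sketch of M-TRFE)."""
    worst = set(rank_target[-n:])   # the target's n worst feature indices
    best = rank_trusted[:n]         # the trusted subject's n best indices
    fused = []
    for t_row, s_row in zip(target, trusted):
        kept = [v for j, v in enumerate(t_row) if j not in worst]
        fused.append(kept + [s_row[j] for j in best])
    return fused

# With 3 features, rank_target = [0, 1, 2] (feature 2 worst) and
# rank_trusted = [2, 0, 1] (feature 2 best), n = 1 swaps one column:
fused = m_trfe_transfer([[1, 2, 3], [4, 5, 6]],
                        [[9, 8, 7], [6, 5, 4]],
                        [0, 1, 2], [2, 0, 1], 1)
# fused == [[1, 2, 7], [4, 5, 4]] -- dimension preserved at 3
```

Keeping the dimension fixed is the point of the swap: the testing set needs no change, unlike the dimension-raising construction of S-TRFE discussed later.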

Multiclass Cross-Subject Emotion Recognition
As illustrated in Figure 1, a total of four emotional states were extracted. Specifically, by converting the multiclass task into several binary classifications, we used four separate binary classifiers to identify the worst feature for each emotional state and the most trusted subject for the transferring feature set; each emotion therefore has its own most trusted subject. The least credible features and the most trusted subjects were ranked and used to implement M-TRFE, and they are presented separately in Tables 7 and 8. According to Equation (8), the overall most trusted subjects and most credible features could be calculated and titled as "mutual", aiming to identify the worst features and most trusted subjects in multi-classification for the OvO structure. The corresponding physiological significances of this mutual ranking were also given. We applied several of the previous methods, including S-TRFE, M-TRFE, and G-TRFE, in this subsection. With M-TRFE implemented, the classification accuracies given by the separate classifiers were all above 0.7, and the overall accuracy (OA) reached 0.7538. Peace was the best-performing emotion, with an OA of 0.8932 and an F1 score of 0.8025. Notably, for subject 3, the classification accuracy of the class joy reached 100%, and perfect recognition of the emotion anger was found in subjects 23 and 26.
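One simple way to fuse the four per-emotion rankings into a single "mutual" ranking is to sum each feature's rank position across the binary classifiers. Equation (8) is not reproduced in this section, so the summation rule below is an assumed stand-in for illustration, and the feature names are hypothetical.

```python
def mutual_ranking(per_emotion_rankings):
    """Aggregate per-emotion feature rankings (best first) into one mutual
    ranking by summing rank positions; a lower total means more credible
    overall. The summation rule is an assumed stand-in for Equation (8)."""
    scores = {}
    for ranking in per_emotion_rankings:
        for pos, feat in enumerate(ranking):
            scores[feat] = scores.get(feat, 0) + pos
    return sorted(scores, key=lambda f: scores[f])

# Four binary classifiers (joy, peace, anger, depression), each ranking
# three hypothetical features from most to least relevant:
rankings = [["theta_psd", "beta_psd", "alpha_psd"],
            ["theta_psd", "alpha_psd", "beta_psd"],
            ["alpha_psd", "theta_psd", "beta_psd"],
            ["theta_psd", "beta_psd", "alpha_psd"]]
mutual = mutual_ranking(rankings)
# mutual == ["theta_psd", "alpha_psd", "beta_psd"]
```

The same aggregation applied to subjects instead of features would yield the mutual ranking of most trusted subjects.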
The results of the three different strategies are illustrated in Figure 7. G-TRFE was the worst performer (OA = 0.5390). S-TRFE performed better (OA = 0.6811) but was still inferior to M-TRFE (OA = 0.7538) for all emotions. However, it should be noted that these results are each specific to one particular emotion and are not the actual results of multiclass classification. It should also be mentioned that, according to M-TRFE, only one worst feature was removed in the binary-state case. During multiclass classification using the OvO structure, more feature instances and more trusted subjects were available for M-TRFE to develop. As previously stated, the transferring set for the OvO structure was decided by the mutual rankings given by the separate binary classifiers. The results of feature elimination are depicted in Figure 8a,b, where the corresponding numbers of eliminated features are also shown. Since the maximum number of eliminated features was limited to eighteen, the maximum number of other subjects employed was also limited to eighteen; under that condition, every subject employed contributed his or her most relevant feature to the transferring set. Notably, M-TRFE performed best when the two most trusted subjects contributed to the transferring set, as can be seen in Figure 8c.
The results of M-TRFE, SS, S-TRFE and G-TRFE are all listed in Table 9. With the highest values on the performance indexes, including the kappa value and OA, M-TRFE remained the best choice for cross-subject emotion recognition. The kappa value suggests a moderate level of agreement between the actual class and the predicted class for M-TRFE, similar to SS, whereas the other two strategies only reached a fair level of agreement. The p-value of the one-way ANOVA between the cross-subject schemes and SS revealed that significant variation did exist. M-TRFE also had a balanced performance across all indexes. It is reasonable to conclude that M-TRFE is feasible and excellent in multi-classification using the OvO structure.
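The kappa statistic cited above can be computed from the actual and predicted labels using the standard Cohen's kappa formula (observed agreement corrected for chance agreement); the conventional Landis-Koch bands place 0.21-0.40 as "fair" and 0.41-0.60 as "moderate" agreement. The sketch below is a generic implementation, not tied to the paper's data.

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance from the
    marginal label frequencies."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    ct, cp = Counter(y_true), Counter(y_pred)
    p_e = sum(ct[c] * cp.get(c, 0) for c in ct) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example with four predictions on two classes:
k = cohens_kappa([0, 0, 1, 1], [0, 0, 1, 0])  # k == 0.5, "moderate"
```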

Discussion
Due to the uncertainty in recognizing human emotion through EEG, there is currently no sufficient knowledge for finding the optimal machine learning method for feature selection. In this paper, a classical LSSVM-based feature selection algorithm was developed to address the existing cross-subject emotion classification problem, and a novel EEG feature set was extracted from the DEAP database to meet the cross-subject need. For physiological signals such as EEG, different participants can have distinct reactions to the same stimuli. In this study, TRFE itself is described as a paradigm for transferring historical data, and several other cross-subject algorithms, both based on and independent of TRFE, were tested for comparison. These algorithms all demand a delicate balance between an individual and the other individuals when compiling a novel training set. The proposed M-TRFE was designed precisely to offset this individual variation: it introduces the transfer learning principle, which retains the information shared by a group of individuals. In other words, M-TRFE emphasizes the common ground in human emotion.
The feature extraction of this work is unique. The labels of the DEAP database are rated for the entire duration of the video clips, but we extended the labels to every two-second segment. Since the subjects were all informed of the video contents before the trials, it is reasonable to believe that the emotion remains consistent throughout the entire course. This expansion of the feature set brings more emotion samples into the experiments and exerts a stronger influence on the results.
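The label-extension step can be stated concretely. DEAP trials are 60 s long, so replicating one per-trial rating over every two-second segment multiplies the sample count by 30. The function below is a sketch of that replication, not the authors' code.

```python
def expand_labels(trial_label, trial_seconds=60, segment_seconds=2):
    """Replicate a single per-trial emotion label over every fixed-length
    segment of the trial (sketch of the label-extension step)."""
    return [trial_label] * (trial_seconds // segment_seconds)

# One 60 s trial rated "joy" yields 30 two-second samples:
labels = expand_labels("joy")  # len(labels) == 30
```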
Through many experiments covering both binary and multiclass classification, M-TRFE proved to be a particularly outstanding cross-subject method. Unlike other RFE-based algorithms, M-TRFE requires a careful selection of the trusted subjects who are allowed to contribute to the novel training set. To identify these subjects, several dedicated steps were designed. In binary classification, we searched for those who performed better in the direct scheme; in the multiclass case, four separate binary classifiers reduce the number of labels from four to two and then form a mutual ranking. Accordingly, the PSD of the beta band appears to be the least contributive in terms of physiological significance in multiclass classification, a conclusion the binary classifiers also reached. Meanwhile, OvO is a classical classifier ensemble that prevents overfitting and produced the final results of the multiclass classification.
For the binary affective states, the allocation of the training/testing sets and the 10-fold cross-validation scheme were determined and used throughout the experiments. Compared to the subject-specific results, all the cross-subject schemes seem stagnant in their development. This is mostly because the feature ranking and elimination permit only the single worst feature to be eliminated, which leaves no room for M-TRFE to develop. On the other hand, multiclass classification gives the transferring training set a sufficient fusion, although the number of trusted subjects that M-TRFE chooses is still limited: M-TRFE attains its best performance when only two subjects are involved. This might be partly because the divergence of human emotion is enlarged when affective computing recruits more individuals for the cross-subject task. Even so, M-TRFE remains the preferred paradigm on most of the indexes, and even exceeds SS when using the OvO structure.
Furthermore, M-TRFE offered not only better classification accuracy but also a faster running speed. The running period of MTRFE-LSSVM was 86.97% shorter than that of GTRFE-LSSVM; the latter proved inordinately time-consuming due to its vast feature set design, costing 4784.80 s per training period versus 623.22 s for MTRFE-LSSVM. Moreover, M-TRFE efficiently reduces resource waste by selecting the best features from other individuals and putting them to use, which is exactly the resource efficiency that the transfer learning principle expects. On the other hand, since the training set of S-TRFE is built by raising the dimension, its testing set dimension must also be raised, which actually runs contrary to the concept of RFE. M-TRFE corrects this flaw by maintaining the dimension while reinforcing the performance. All the results thus lead to the conclusion that M-TRFE is far superior in cross-subject emotion recognition.
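The quoted 86.97% reduction follows directly from the two training periods given above:

```python
g_trfe_s, m_trfe_s = 4784.80, 623.22   # seconds per training period
reduction = (g_trfe_s - m_trfe_s) / g_trfe_s
# reduction is roughly 0.8697, i.e. the M-TRFE run is ~86.97% shorter
```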
Our work is also compared to other recent studies on the DEAP dataset in Table 10, where our model achieves top performance in the classification categories shown. The OA of binary classification was calculated as the average accuracy of the arousal and valence dimensions. The abbreviations in the table are: SVD = singular value decomposition, mRMR = minimum redundancy maximum relevance, EC = evolutionary computation, FAWT = flexible analytic wavelet transform.

Conclusions
Distinguishing individual human emotion is an intractable and challenging task for cross-subject emotion recognition. However, the generality of human beings and their emotions guarantees the potential for automated systems to perform cross-subject recognition. In this paper, cross-subject emotion recognition was carried out with the novel M-TRFE feature selection method on both binary and multiclass classification problems. M-TRFE manages not only the selection of feature instances but also the selection of individuals. By choosing participants who react closer to the common reaction, M-TRFE performs similarly to subject-specific recognition on the binary affective states and prevails over all methods in multiclass classification. Throughout the work, LSSVM was applied to accomplish the selection. The binary classification rates reached 0.6494 and 0.6898 on the arousal and valence dimensions, and in the multiclass case the OA reached 0.6513. These results outperform all other methods applied in this paper and most of the recently reported studies on the DEAP database. In general, M-TRFE has made cross-subject emotion recognition more efficient and precise with less resource waste.
In future work, we will look for more stable and more accurate classifiers to improve emotion recognition. We will also combine other physiological signals, such as eye gaze, GSR and blood pressure, with the EEG signals. In addition, compound emotions such as anxiety can be a new direction for further research under the V-A or other emotion models. Although M-TRFE has proven to be an excellent solution to cross-subject emotion classification, there are still flaws that need further investigation. More complex multiclass tasks with larger numbers of emotions and participants will challenge the performance of M-TRFE, and if too many subjects are involved, detecting the most trusted subjects will be difficult. The improvement of M-TRFE will also be a topic for future research.