Maximum Marginal Approach on EEG Signal Preprocessing for Emotion Detection

Emotion detection is an important research issue in electroencephalogram (EEG) analysis. Signal preprocessing and feature selection are parts of feature engineering, which determines the performance of emotion detection and reduces the training time of deep learning models. To select efficient features for emotion detection, we propose a maximum marginal approach on EEG signal preprocessing. The approach selects the least similar segments between two EEG signals as features that can represent the difference between EEG signals caused by emotions. The method defines signal similarity as the distance between two EEG signals and uses it to find the features. The frequency domain of EEG is calculated by using a wavelet transform, which exploits a wavelet to calculate EEG components at different frequencies. We conducted experiments using the selected features from real EEG data recorded from 10 college students. The experimental results show that the proposed approach outperforms other feature selection methods by 17.9% on average in terms of accuracy. The maximum marginal approach-based models outperform the models without feature selection by 21% on average in terms of accuracy.


Introduction
An electroencephalogram (EEG) is a biosignal that reflects brain activity. In the context of artificial intelligence, the analysis of EEG is an important research area. It can help medical staff perform intelligent diagnosis of conditions such as epilepsy and Alzheimer's disease [1][2][3][4]. Emotion recognition is an important research issue in EEG, since emotions are reflected in EEG [5,6]. For example, fear and tension produce different waveforms on the EEG. These waveform differences are not obvious, but they play an important role in brain-computer interface (BCI) and emotion recognition research [7][8][9]. By utilizing BCI technologies, people can observe different emotions through computers and intelligent machines. This can help people understand emotions in some situations, such as at social events [10].
Signal preprocessing and feature selection play an important role in emotion detection: they remove noise from the EEG signal and select features correlated with emotions, which improves the performance of deep learning models [11,12] and reduces their training time. Existing methods for selecting features from EEG signals to detect emotions mainly include univariate, multivariate, filter, wrapper, and built-in methods. Univariate methods input a single feature to the model, whereas multivariate methods group features together as inputs to train the classifiers [13,14]. Filter methods evaluate the correlation between features and emotions and filter out features that are not useful for emotion detection [15,16]. Wrapper methods select features based on classification accuracy [17]: if an input feature increases the accuracy, it is selected. Built-in methods are embedded in the classifiers, such as deep learning models [18], and select features by observing the weight of each feature; features with low weights are dropped because they are not useful for emotion detection. All of these approaches address the central difficulty of EEG-based emotion detection, which is discovering the correlation between features and emotions from EEG signals.
To select valuable features, we propose an EEG signal preprocessing approach. The selected features are the least similar segments between two classes of EEG signals, where signal similarity is defined as follows.
Definition 1 (Signal similarity). The signal similarity between two signals $d_i$ and $d_j$ is described by their distance, formulated as $D(d_i, d_j) = \lVert d_i - d_j \rVert_2$. The signal similarity is inversely related to the distance: the larger the distance, the less similar the signals. Figure 1 shows the segments of two EEG signals with different emotions on two time intervals. The similarity between the two signals in the first time interval is smaller than in the second time interval, so the segments in the first time interval are selected as the features for detecting emotions. The proposed method selects the features from the frequency domain of EEG, which is helpful for preprocessing and analyzing EEG. The time-domain EEG signal is hard to analyze because it is usually mixed with noise caused by different behaviors. To study the EEG produced by a specific behavior, we have to remove this noise. Since the frequency of the noise differs from that of regular brain waves, the noise can be detected in the frequency domain.
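A minimal sketch of Definition 1, assuming each signal is a 1-D NumPy array and the two signals have equal length. The helper names `signal_distance` and `interval_distances` and the toy values are illustrative, not taken from the original code. It compares two signals interval by interval, as in Figure 1, and flags the interval with the largest distance (the smallest similarity).

```python
import numpy as np

def signal_distance(d_i, d_j):
    """Euclidean distance D(d_i, d_j) = ||d_i - d_j||_2 (Definition 1)."""
    return float(np.linalg.norm(np.asarray(d_i, dtype=float) - np.asarray(d_j, dtype=float)))

def interval_distances(d_i, d_j, boundaries):
    """Distance between two signals on each [start, end) interval in `boundaries`."""
    return [signal_distance(d_i[s:e], d_j[s:e]) for s, e in boundaries]

# Hypothetical toy signals: they disagree on the first interval, agree on the second
sig_a = np.array([1.0, 2.0, 3.0, 0.5, 0.5, 0.5])
sig_b = np.array([-1.0, -2.0, -3.0, 0.5, 0.5, 0.4])
intervals = [(0, 3), (3, 6)]
dists = interval_distances(sig_a, sig_b, intervals)
least_similar = intervals[int(np.argmax(dists))]  # larger distance = less similar
print(dists, least_similar)
```

Here the first interval is picked, matching the selection rule described for Figure 1.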
Wavelet transform is a popular tool for time-frequency analysis [19,20]. It overcomes a shortcoming of the short-time Fourier transform, whose window size does not change with frequency. It utilizes a wavelet to decompose EEG into an approximate component and a detail component. The approximate component contains the low-frequency band of EEG, and the detail component contains the high-frequency band. Because the differences between EEG signals are reflected in the high-frequency band, the features are selected from the detail component.
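A one-level decomposition can be sketched with PyWavelets (`pywt.dwt`). The Daubechies-4 wavelet (`'db4'`), the 256 Hz sampling rate, and the synthetic test signal are assumptions for illustration; the text does not state which wavelet was used.

```python
import numpy as np
import pywt

fs = 256                                  # hypothetical sampling rate (samples per second)
t = np.arange(fs) / fs                    # one second of samples
slow = np.sin(2 * np.pi * 2 * t)          # 2 Hz: low-frequency content
fast = 0.5 * np.sin(2 * np.pi * 100 * t)  # 100 Hz: upper half of the band
x = slow + fast

# One-level DWT: approximate (low-frequency) and detail (high-frequency) coefficients
cA, cD = pywt.dwt(x, 'db4')

# The decomposition is invertible: idwt rebuilds the original signal
x_rec = pywt.idwt(cA, cD, 'db4')[: len(x)]

# The slow component alone leaves almost no energy in the detail coefficients
cA_slow, cD_slow = pywt.dwt(slow, 'db4')
print(len(cA), len(cD), np.sum(cD_slow ** 2) / np.sum(cA_slow ** 2))
```

The near-zero detail energy of the slow sine illustrates why differences carried by high frequencies show up in the detail component.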
The main contributions of this study are as follows.
• We propose a maximum marginal approach (MM) on EEG signal preprocessing for emotion detection. It defines the similarity between signals of two classes and selects features in the frequency domain. The results show that MM performs better than other feature selection methods on emotion detection.
• We conducted experiments on real EEG data by applying the selected features to a bidirectional long short-term memory (BiLSTM) model. The results show that the MM-based models performed better than the models without signal preprocessing, and that the MM-based BiLSTM achieved the best performance among the compared models.
The remainder of this paper is organized as follows. In Section 2, studies related to emotion detection and feature selection are described. In Section 3, the maximum marginal feature selection method is detailed. In Section 4, the BiLSTM model is described. In Section 5, the experimental results are presented. Finally, in Section 6, some concluding remarks regarding this study are provided.

Related Work
Deep learning is a popular supervised learning approach, and several studies have applied it to EEG classification. Ni et al. [21] used a BiLSTM to classify students' brain activity. They presented course videos to ten college students and classified whether the students felt confused about the course content. They found that the high-frequency component of EEG is the most important for identifying the emotion. Chao et al. [22] pointed out that existing methods for EEG-based emotion recognition ignore spatial information. They constructed a multi-band feature matrix that records the frequency-domain and spatial information of EEG, and used capsule networks to learn features from the matrix for classifying emotions. Li et al. [18] constructed a neural network that captures the spatial and temporal relationships among EEG electrodes. Drawing on neuroscience, they noted that brain regions play different roles when a person feels different emotions.
Feature selection is an important step in EEG signal preprocessing, and existing methods are either machine-learning-based or statistical. Sun et al. [23] applied an unsupervised learning method to extract features from EEG. They noted that supervised learning methods lead to a decrease in EEG classification performance, and they used an echo state network to construct a recurrent autoencoder for extracting features from EEG. Rahman et al. [24] combined principal component analysis (PCA) and t-statistics for feature extraction: PCA was used to reduce the dimensionality of EEG, and t-statistics were used to select valuable features. In their experiments, the neural network performed the best among the compared models. Alyasseri [25] selected features from the frequency domain by using a wavelet transform for an identification system. Hong et al. [26] used a combination of EEG and functional near-infrared spectroscopy (fNIRS) features to identify patients with cognitive and motor impairments. Their experimental results show that the average fNIRS signal and the highest frequency band of EEG are valuable for diagnosis.
Converting the signal from the time domain to the frequency domain is a common EEG signal preprocessing method. Wavelet transform is a general tool for decomposing EEG into the frequency domain, and many studies have applied it to analyze EEG signals. Bhattacharyya et al. [27] applied wavelet transform to determine the amplitudes and frequencies of seizure patients' EEG from an adaptive frequency band. They used the wavelet transform to calculate the frequency domain of EEG and a sliding window to extract patterns from the frequency components; these patterns are used to detect seizures. Gupta et al. [28] used a wavelet transform based on the Fourier-Bessel series expansion to obtain the frequency domain of EEG and a least-squares support vector machine to classify epilepsy, so that their system detects epilepsy automatically. Follis et al. [29] applied the maximal overlap discrete wavelet transform to decompose EEG signals and used the Kruskal-Wallis test to assess differences in the wavelet variances between seizure and non-seizure segments. Their experimental results show that no pattern could be established for detecting seizures. Jiao et al. [30] used wavelet transform to extract the frequency domain from EEG and electro-oculogram signals, and a long short-term memory (LSTM) model learned these features to detect driver sleepiness.

Maximum Marginal Approach
This section describes the maximum marginal approach on EEG signal preprocessing, which selects the segments where the detail components of EEG signals are least similar. These segments represent the difference in brain waves caused by emotions, so they are used to detect emotions.

Detail Components Set Construction
Wavelet transform decomposes EEG according to frequency. It utilizes a scalable and shiftable wavelet to decompose the EEG and calculate its frequency components. A narrow scale corresponds to a high frequency, while a wide scale corresponds to a low frequency. The wavelet transform is formulated as follows:
$$W(a, \tau) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - \tau}{a}\right) dt$$
where $x(t)$ is the EEG signal, $\psi$ is the wavelet (with $\psi^{*}$ its complex conjugate), $a$ indicates the scale, $\tau$ indicates the translation, and $t$ is time. A discrete wavelet transform uses a wavelet to decompose the EEG signal $N$ times. Figure 3 shows an example of a two-level wavelet transform for decomposing an EEG signal, where $a_n$ and $d_n$ indicate the approximate and detail components at level $n$, respectively. The approximate and detail components at level 1 are calculated by the first decomposition of EEG. The detail component at level 2 is calculated by applying the wavelet transform to the approximate component at level 1.
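The two-level cascade in Figure 3 can be checked with PyWavelets: `pywt.wavedec` returns the list `[a_N, d_N, ..., d_1]`, and its level-2 detail equals the detail obtained by running `pywt.dwt` again on the level-1 approximate component. The `'db4'` wavelet, the random stand-in signal, and the helper name `detail_at_level` are illustrative assumptions.

```python
import numpy as np
import pywt

def detail_at_level(x, level, wavelet='db4'):
    """Return the detail component d_level of a multilevel DWT."""
    coeffs = pywt.wavedec(x, wavelet, level=level)  # [a_level, d_level, ..., d_1]
    return coeffs[1]

rng = np.random.default_rng(0)
x = rng.standard_normal(256)        # stand-in for one EEG channel

# Manual cascade: decompose x, then decompose the level-1 approximation again
a1, d1 = pywt.dwt(x, 'db4')
a2, d2 = pywt.dwt(a1, 'db4')

coeffs = pywt.wavedec(x, 'db4', level=2)
print([len(c) for c in coeffs])     # lengths shrink roughly by half per level
```

The halving of coefficient lengths at each level is one way to see why fine-grained features become harder to isolate at deeper levels.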

Feature Selection
To find the time interval where the segments of the two classes of signals are least similar, we calculate the dot product of the averaged signals of the two classes. We assume that there are two classes of EEG signals with different emotions, and we denote the averages of the two classes' sets of detail components by $M_1$ and $M_2$. Figure 4 shows the dot product of the two signals $M_1$ and $M_2$ at each time point $t \in [0, T]$, where the x-axis is time, the y-axis is the dot product between the two signals, and the 0-axis is the horizontal line on which the dot product is 0. The EEG is divided into four time intervals based on the areas enclosed between the dot-product curve and the 0-axis. The similarity at each time interval is formulated as
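A sketch of the interval search, under stated assumptions: $M_1$ and $M_2$ are taken as the per-class means of the detail components, the time axis is split wherever the pointwise product $M_1(t) M_2(t)$ changes sign (the regions bounded by the 0-axis in Figure 4), and, because the per-interval similarity formula is not reproduced here, each interval is scored with the Definition 1 distance $D$. The function names `split_by_sign` and `max_marginal_interval` are hypothetical.

```python
import numpy as np

def split_by_sign(product):
    """Split time indices into contiguous [start, end) runs where `product` keeps one sign."""
    sign = np.sign(product)
    change = np.nonzero(sign[1:] != sign[:-1])[0] + 1  # first index after each sign change
    bounds = np.concatenate(([0], change, [len(product)]))
    return [(int(s), int(e)) for s, e in zip(bounds[:-1], bounds[1:])]

def max_marginal_interval(class1, class2):
    """class1, class2: detail components with shape (n_signals, T) for the two emotions."""
    m1 = class1.mean(axis=0)  # M_1: average detail component of class 1
    m2 = class2.mean(axis=0)  # M_2: average detail component of class 2
    product = m1 * m2         # per-time-point product plotted in Figure 4
    intervals = split_by_sign(product)
    # Assumed scoring: Definition 1 distance on each interval (larger = less similar)
    distances = [float(np.linalg.norm(m1[s:e] - m2[s:e])) for s, e in intervals]
    best = intervals[int(np.argmax(distances))]
    return best, intervals, distances

# Toy example with one signal per class
c1 = np.array([[1.0, 1.0, -1.0, -1.0, 2.0, 2.0]])
c2 = np.array([[1.0, 1.0, 1.0, 1.0, 2.0, 2.0]])
best, intervals, distances = max_marginal_interval(c1, c2)
print(best, intervals, distances)
```

In the toy example the product is negative only on the middle interval, where the classes point in opposite directions, and that interval is returned as the least similar one.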

BiLSTM Network
BiLSTM is used to detect emotions from the selected features. It constructs a bidirectional recurrent network from LSTM units [31], which can capture both past and future information through non-linear functions to detect emotions.
BiLSTM consists of a forward LSTM and a backward LSTM. The main idea of LSTM is to forget information that is not useful for detecting emotions and to pass valuable information on to future time points.
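To make the forward/backward structure concrete, here is a NumPy sketch of a BiLSTM encoder built from the standard LSTM cell (sigmoid forget, input, and output gates, with tanh for the candidate and cell output). The random initialization, hidden size, and function names are illustrative assumptions; a real model learns its weights by training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_dim, hidden, rng):
    """Random weights W and biases b for the f, i, o gates and the candidate c."""
    return {g: (rng.normal(scale=0.1, size=(hidden, hidden + input_dim)), np.zeros(hidden))
            for g in "fioc"}

def lstm_step(x_t, h_prev, c_prev, p):
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f = sigmoid(p["f"][0] @ z + p["f"][1])      # forget gate
    i = sigmoid(p["i"][0] @ z + p["i"][1])      # input gate
    o = sigmoid(p["o"][0] @ z + p["o"][1])      # output gate
    c_hat = np.tanh(p["c"][0] @ z + p["c"][1])  # candidate cell state
    c = f * c_prev + i * c_hat                  # new cell state C_t
    h = o * np.tanh(c)                          # new hidden state h_t
    return h, c

def run_lstm(xs, p, hidden):
    h, c = np.zeros(hidden), np.zeros(hidden)   # first unit effectively sees only x_0
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
    return h

def bilstm_encode(xs, p_fwd, p_bwd, hidden):
    """Concatenate the final states of a forward and a backward LSTM."""
    return np.concatenate([run_lstm(xs, p_fwd, hidden),        # past -> future
                           run_lstm(xs[::-1], p_bwd, hidden)])  # future -> past

rng = np.random.default_rng(0)
xs = rng.standard_normal((50, 1))  # 50 time steps of a 1-D feature sequence
hidden = 8
features = bilstm_encode(xs, init_params(1, hidden, rng), init_params(1, hidden, rng), hidden)
print(features.shape)
```

The concatenated vector holds one summary from each reading direction; a classifier head on top of it would produce the emotion probability.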
As shown in Figure 5, each LSTM unit consists of a forget gate, an input gate, and an output gate. Information flow through the LSTM unit is controlled by the forget gate, which receives the previous hidden state $h_{t-1}$ and the current input $x_t$ to determine which information is not useful for detecting emotions. The input gate, together with the previous cell state $C_{t-1}$, the previous hidden state $h_{t-1}$, and the current input $x_t$, determines the current cell state $C_t$. The output gate determines which part of the cell state is exposed as the hidden state $h_t$, from which the probabilities of the two emotions are computed. For each LSTM unit, the gates at time $t$ are calculated as
$$f_t = \delta_g(W_f [h_{t-1}, x_t] + b_f)$$
$$i_t = \delta_g(W_i [h_{t-1}, x_t] + b_i)$$
$$o_t = \delta_g(W_o [h_{t-1}, x_t] + b_o)$$
$$\tilde{C}_t = \delta_c(W_c [h_{t-1}, x_t] + b_c)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$h_t = o_t \odot \delta_c(C_t)$$
where $\delta_g$ is the sigmoid function, $\delta_c$ is the hyperbolic tangent function, $C$ denotes the cell states, $h$ denotes the hidden states, $\odot$ is element-wise multiplication, and $f$, $o$, and $i$ denote the forget, output, and input gates. The weights and biases of each gate are denoted $W$ and $b$. The input of each function is calculated from the current signal value $x_t$ and the previous hidden state $h_{t-1}$; the first LSTM unit receives only the current signal value.
We denote the weights of BiLSTM as $W = \{W_f, W_i, W_o, W_c\}$. The loss function of BiLSTM is based on cross-entropy:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$
where $y_i$ indicates the real emotion, $\hat{y}_i$ indicates the detected emotion, and $N$ indicates the number of EEG signals. Training the model means finding the weights $W$ and biases $b$ that minimize the loss function $L$. The gradient descent algorithm updates the parameters as $W \leftarrow W - \eta \frac{\partial L}{\partial W}$ and $b \leftarrow b - \eta \frac{\partial L}{\partial b}$, where $\eta$ is the learning rate [32]. The initial parameters are set randomly. To update the parameters $W$ and $b$ at each layer, the backpropagation algorithm is used, which applies the chain rule to calculate the partial derivatives of compound functions [33].
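The training objective and update rule can be sketched in PyTorch. `nn.LSTM(..., bidirectional=True)` implements standard LSTM gates in both directions, `nn.BCEWithLogitsLoss` combines a sigmoid with binary cross-entropy, and `torch.optim.SGD` applies $W \leftarrow W - \eta\,\partial L/\partial W$ after `loss.backward()` runs backpropagation. The class name, hidden size, learning rate, epoch count, and synthetic data are assumptions for illustration, not the paper's settings.

```python
import torch
from torch import nn

class BiLSTMClassifier(nn.Module):
    """Bidirectional LSTM followed by a linear layer that outputs one logit."""
    def __init__(self, input_dim=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                       # x: (batch, T, input_dim)
        _, (h_n, _) = self.lstm(x)              # h_n: (2, batch, hidden) for one layer
        h = torch.cat([h_n[0], h_n[1]], dim=1)  # forward and backward final states
        return self.head(h).squeeze(1)          # one logit per signal

def train(model, x, y, lr=0.1, epochs=100):
    loss_fn = nn.BCEWithLogitsLoss()                  # sigmoid + binary cross-entropy
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # plain gradient descent
    losses = []
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()                               # backpropagation via the chain rule
        opt.step()                                    # W <- W - lr * dL/dW, b <- b - lr * dL/db
        losses.append(loss.item())
    return losses

torch.manual_seed(0)
x = torch.randn(32, 40, 1)            # 32 synthetic signals, 40 time steps each
y = (x.mean(dim=(1, 2)) > 0).float()  # synthetic binary labels
model = BiLSTMClassifier()
losses = train(model, x, y)
print(losses[0], losses[-1])
```

Working with logits and `BCEWithLogitsLoss` is numerically safer than applying a sigmoid and then a separate log; `torch.sigmoid(model(x))` gives the probabilities.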

Experimental Results
This section analyzes the experimental results of the maximum marginal approach on EEG signal preprocessing. The selected features are validated by applying them to emotion detection. We extracted the frequency components of EEG signals using the PyWavelets (pywt) library, a wavelet transform module for Python, and constructed the classifiers using the torch and scikit-learn libraries. The code and dataset are available in our GitHub repository (https://github.com/ligen0423/EEG-based-emotion-detection.git).

Dataset
The dataset is collected from the study [34] and contains EEG data recorded from 10 college students while they watched massive open online course (MOOC) clips. The EEG data are labeled according to whether or not the students were confused about the video content. The brainwaves were recorded with MindSet equipment. The authors of [34] prepared 20 videos, each 2 min long, and each student was shown ten of them while the equipment recorded their brainwaves. Most of the students were 24 years old; eight were Han Chinese and eight were male.

Results and Analysis
The maximum marginal approach is used to select the features, and we apply the selected features to the BiLSTM model for detecting emotions. K-nearest neighbor (KNN), convolutional neural network (CNN), LSTM, and neural network (NN) models are selected as baselines to validate the proposed approach. KNN is a machine learning model that detects emotions based on the classes of the nearest neighbors. NN uses non-linear functions to extract features from the EEG signal for detecting emotions, but it does not consider the temporal order of EEG signals. CNN uses convolutional kernels for local sampling of the input. LSTM is a recurrent neural network that uses forget gates to drop information that is not useful for detecting emotions. Table 1 shows the comparison results for the discrete wavelet transform at level 1. According to the results, MM-based BiLSTM achieves the highest accuracy, precision, and F1-score, because the BiLSTM model considers the bidirectional temporal information of EEG signals. Because precision and recall trade off against each other, the recall of MM-based BiLSTM is lower than that of LSTM. Overall, MM-based classifiers perform better than the classifiers without signal preprocessing, which indicates that the proposed maximum marginal approach can improve the performance of emotion detection.
Figure 7 shows the performance of MM-based BiLSTM at different discrete wavelet transform levels, where the x-axis represents the level and the y-axis represents the value of each evaluation metric. The proposed approach achieves the best performance at level 1, with an accuracy of 0.86. As the level increases, accuracy, precision, and F1-score show a downward trend, and when the level is greater than 6, these three metrics tend to stabilize. As the level increases, small-scale features become harder to select, so the emotion detection metrics decrease.
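The four metrics reported in Table 1 and Figure 7 can be computed with scikit-learn. The labels below are made-up toy values (1 = confused, 0 = not confused, an assumed encoding); they show how a classifier can have perfect precision but low recall, the trade-off noted for MM-based BiLSTM versus LSTM.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical ground truth and predictions (1 = confused, 0 = not confused)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]   # conservative classifier: few positive calls

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),    # (TP + TN) / all = 6/8
    "precision": precision_score(y_true, y_pred),  # TP / (TP + FP) = 2/2
    "recall": recall_score(y_true, y_pred),        # TP / (TP + FN) = 2/4
    "f1": f1_score(y_true, y_pred),                # harmonic mean of precision and recall
}
print(metrics)
```

Predicting fewer positives removes false positives (raising precision) at the cost of missed positives (lowering recall), which F1 balances.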
When the level is greater than 6, the recall of the method decreases. Table 2 shows the comparison results between different feature selection methods. We used the wrapper, built-in, filter, and Fisher score methods for comparison. For the wrapper method, we input the EEG signal at each time point to the classifier and selected the features that gave the best performance. The built-in method selects features according to the weights of the BiLSTM model. The Fisher score method selects features by comparing the between-class and within-class variances of the two classes of EEG signals. The proposed maximum marginal approach achieves the best accuracy, precision, and F1-score, although its recall is lower than that of the wrapper and Fisher score methods.
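The Fisher score baseline can be sketched per time point with the common two-class form $F(t) = (\mu_1(t) - \mu_2(t))^2 / (\sigma_1^2(t) + \sigma_2^2(t))$, that is, between-class separation divided by within-class variance. This exact formula, the `eps` guard, and the top-k selection helper are assumptions; the paper does not spell out its Fisher score variant.

```python
import numpy as np

def fisher_scores(class1, class2, eps=1e-12):
    """Per-time-point Fisher score for two classes of shape (n_signals, T)."""
    mu1, mu2 = class1.mean(axis=0), class2.mean(axis=0)
    var1, var2 = class1.var(axis=0), class2.var(axis=0)
    return (mu1 - mu2) ** 2 / (var1 + var2 + eps)  # eps avoids division by zero

def select_top_k(scores, k):
    """Indices of the k time points with the highest Fisher score."""
    return np.argsort(scores)[::-1][:k]

class1 = np.array([[1.0, 2.0], [3.0, 2.0]])
class2 = np.array([[5.0, 2.0], [7.0, 4.0]])
scores = fisher_scores(class1, class2)
print(scores, select_top_k(scores, 1))
```

Unlike the maximum marginal approach, this scores each time point independently instead of comparing contiguous intervals of the averaged signals.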

Conclusions
In this paper, we propose a maximum marginal approach to EEG signal preprocessing for emotion detection. The approach selects the features that are the least similar segments between EEG signals from the detail components, and it includes a time interval division method based on the dot product between two signals. Our experiments verify the performance of different models based on the selected features for emotion detection. The experimental results show that the proposed approach outperforms other feature selection methods by 17.9% on average in terms of accuracy, and that the MM-based models outperform the models without feature selection by 21% on average in terms of accuracy. We also evaluated the performance at different levels of the discrete wavelet transform. The first level achieves the best performance on emotion detection tasks; as the level increases, accuracy, precision, and F1-score exhibit a downward trend, and when the level exceeds 6, the performance tends to stabilize. In the future, we plan to apply the proposed approach to select and study bispectrum features from EEG signals for diagnosis [35,36]. The time interval division method can also be applied to other time-series domains, such as financial and meteorological series.