Comprehensive Analysis of Feature Extraction Methods for Emotion Recognition from Multichannel EEG Recordings

Advances in signal processing and machine learning have expedited electroencephalogram (EEG)-based emotion recognition research, and numerous EEG signal features have been investigated to detect or characterize human emotions. However, most studies in this area have used relatively small, monocentric datasets and focused on a limited range of EEG features, making it difficult to compare the utility of different EEG feature sets for emotion recognition. This study addressed this gap by comparing the classification accuracy (performance) of a comprehensive range of EEG feature sets for identifying emotional states in terms of valence and arousal. The classification accuracy of five EEG feature sets was investigated, including statistical features, fractal dimension (FD), Hjorth parameters, higher order spectra (HOS), and features derived using wavelet analysis. Performance was evaluated using two classifiers, support vector machine (SVM) and classification and regression tree (CART), across five independent and publicly available datasets linking EEG to emotional states: MAHNOB-HCI, DEAP, SEED, AMIGOS, and DREAMER. The FD-CART feature-classification method attained the best mean classification accuracy for valence (85.06%) and arousal (84.55%) across the five datasets. The stability of these findings across the five datasets also indicates that FD features derived from EEG data are reliable for emotion recognition. These results may enable the development of an online feature extraction framework and, in turn, a real-time EEG-based emotion recognition system.


Introduction
Emotions have a complex and fundamental role in cognition and behavior, influencing how we interact with and interpret our daily life experiences. Technology that can help recognize and measure emotions is highly desirable, as this can facilitate research and development in areas such as healthcare, education, psychology, robotics, marketing, and entertainment. Emotion recognition technology can also offer individuals (or clinicians) tools to aid emotion regulation and intervention. However, despite years of interest in psychology and affective computing, the development of reliable and generalizable emotion detection techniques is still a challenge. To that end, this study provides a comprehensive analysis of electroencephalogram (EEG) measures of emotional states, categorized in terms of valence (positive vs. negative) and arousal (high vs. low).
Numerous experiments on emotion recognition have been undertaken in recent years utilizing both physiological signals (e.g., electrocardiogram (ECG), galvanic skin resistance (GSR), electromyogram (EMG), respiration rate (RR), electrodermal activity (EDA) and EEG signals) [1,2] and behavioral data (e.g., facial expression images, body gestures, speech and voice signals) [3,4]. Behavioral data can provide useful measures of emotion-related processes; however, they can also be easily biased due to their subjective and controllable nature. In comparison, physiological signals are relatively automatic and difficult to control consciously, making them a more objective basis for emotion recognition. In this study, we compare EEG feature sets across five independent public emotion datasets: DEAP [14], DREAMER [15], MAHNOB-HCI [16], AMIGOS [17] and SEED [18]. The feature sets that are explored include statistical, fractal dimension (FD), Hjorth parameters, higher order spectra (HOS), and wavelet transform features. As in most machine learning studies (e.g., [7,12,13]), classification accuracy serves as the main performance metric in this investigation; however, given that accuracy can vary across classification techniques [7,12,13], we also test the performance of two common classifiers, specifically, support vector machine (SVM) and classification and regression tree (CART). In this way, we aim to recommend the most useful and generalizable EEG feature-classification technique for detecting emotional states and to guide the future development of emotion recognition systems.
The key contributions of the current study are as follows: we (i) evaluated performance using five independent, public EEG emotion datasets and (ii) identified the optimal feature set for reliable EEG-based emotional state recognition. To our knowledge, this is one of the first studies to utilize five independent public datasets to identify the optimal EEG feature set for discriminating emotions. The rest of the paper is organized as follows. The scalp EEG datasets and the details of the method are explained in Section 2. Experimental results and discussion are presented in Section 3. Finally, Section 4 covers the conclusions. Figure 1 shows the methodological framework of the EEG and machine learning techniques used in the present study. For each EEG dataset (described in Section 2.1), the raw EEG data were subjected to (1) preprocessing, (2) feature extraction, and (3) emotional state classification based on ground-truth self-report data reflecting emotional valence and/or arousal.

Emotion-Related EEG Datasets
This study utilizes emotion-related EEG signals from the five most widely used public datasets, namely MAHNOB-HCI, DEAP (Dataset for Emotion Analysis using Physiological signals), SEED (SJTU Emotion EEG Dataset), AMIGOS (A dataset for Mood, personality, and affect research on Individuals and GrOupS), and DREAMER. Table 1 summarizes the core details of these datasets that are relevant to the present research, including sample characteristics and EEG parameters. The datasets include EEG recordings from 15-40 (M = 27.4, SD = 9.4) young adult participants (55% male overall), recorded using different EEG systems with 14-62 scalp channels. The specifics of each dataset are described in the subsequent paragraphs.
The MAHNOB-HCI dataset was developed by Soleymani and colleagues [16] and comprises 32-channel EEG recordings and other peripheral nervous system (PNS) signals. The signals were obtained from 27 participants as they watched 20 video clips lasting from 34.9 s to 117 s. Participants rated their levels of valence, arousal, dominance, and predictability after watching each clip. The DEAP emotion dataset is a multimodal dataset created by [14], which comprises 32-channel EEG signals and other PNS signals. These signals were collected from 32 healthy subjects while they watched 40 one-minute music video clips (i.e., 40 trials in total). After each video/trial, the participants rated their arousal, valence, dominance, like/dislike, and familiarity using self-assessment reports. The data for each video consist of a 60-s EEG recording and 3 s of baseline data, collected at a sampling frequency (Fs) of 512 Hz. The SEED dataset [18] comprises EEG and eye movement signals from 15 participants across three different emotional states, namely positive, negative, and neutral. Each participant completed three experimental sessions on different days. In each session, fifteen four-minute videos were used to evoke the required emotions; therefore, across the three sessions, there are 45 trials in the database, with the same fifteen videos used in all three sessions. The EEG signals were collected from 62 channels with Fs = 1000 Hz and downsampled to 200 Hz. After each session, the participants labeled each video according to its content: −1 for negative, 0 for neutral, and 1 for positive. In this study, we employed only the recordings with positive and negative labels, so that our results could be compared with other emotion datasets that use binary classifiers. The AMIGOS dataset [17] includes 14 channels of EEG data, 2 channels of ECG data, galvanic skin response, and frontal video.
The dataset was prepared from recordings of 40 participants while they viewed 16 film clips, each lasting no longer than 250 s. After watching each clip, participants self-assessed their levels of arousal, valence, dominance, liking, familiarity, and seven basic emotions (happiness, disgust, surprise, fear, anger, sorrow, and neutral). As stated in [19], eight participants (IDs: 9, 12, 21, 22, 23, 24, 29, and 33) had missing data in their physiological signals, so we excluded them from our study. Some participants (IDs: 5, 11, 28, and 30) were missing either valence or arousal affective state values, so we excluded their data as well. The DREAMER dataset was developed by [15] and comprises 14-channel EEG and 2-channel ECG signals. These signals were collected from 23 healthy participants (aged between 22 and 33 years) as they watched 18 video clips with lengths between 65 and 393 s. After every video clip, the participants rated their arousal, valence, and dominance using the self-assessment manikin (SAM). In addition, 60-second baseline signals were recorded before each clip. EEG signals were captured with an Emotiv EPOC wireless neuroheadset at Fs = 128 Hz.
In this study, only the raw EEG and self-report data reflecting emotional valence and arousal were extracted for analyses. Furthermore, to be consistent across datasets, only data from the first session were used from sets comprising multiple sessions (i.e., AMIGOS and SEED). Across all datasets, and in this study, emotional valence and arousal (Figure 2) were analyzed as two orthogonal dimensions [19][20][21], consistent with popular circumplex models of emotion (e.g., [22]). The self-report scales used to rate valence and arousal differed across datasets; for DEAP, MAHNOB-HCI, and AMIGOS, each dimension was rated on a scale of 1 to 9, whereas for DREAMER, they were rated from 1 to 5, with lower numbers reflecting more negative valence or lower arousal, respectively. To test and validate EEG classification of emotion, EEG data were first categorized as either low or high valence and arousal relative to the midpoint of the respective self-report scale (e.g., DREAMER data with a valence score <2.5 were classed as low valence and ≥2.5 as high valence). For the SEED dataset, trials were already labeled in terms of positive (labeled 1) and negative (labeled 0) emotion categories; hence, further categorization was not necessary.
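As a minimal sketch, the low/high categorization step described above reduces to simple thresholding at the scale midpoint (the function name and interface here are illustrative, not from the original study):

```python
import numpy as np

def binarize_ratings(ratings, threshold):
    """Split self-report ratings into low (0) / high (1) classes at the
    given scale midpoint, e.g. threshold=2.5 for DREAMER's 1-5 scale."""
    ratings = np.asarray(ratings, dtype=float)
    return (ratings >= threshold).astype(int)  # scores >= threshold -> high class
```

For DEAP, MAHNOB-HCI, and AMIGOS, the same function would be called with the midpoint of their 1 to 9 scale.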

EEG Signal Preprocessing
EEG signal preprocessing, feature extraction, and emotional state classification were performed in Python (v3.7.1) and MATLAB (vR2020b). The average number of EEG trials across datasets was 569.6 (SD = 423.4), including 540 for MAHNOB-HCI (20 trials × 27 participants), 1280 for DEAP (40 trials × 32 participants), 150 for SEED (10 trials × 15 participants), 464 for AMIGOS (16 trials × 28 participants), and 414 for DREAMER (18 trials × 23 participants). EEG trial data were filtered using a 50/60 Hz notch filter and a 1 Hz high-pass Butterworth filter (4th order) to remove electrical mains interference and DC artefact. Data were then downsampled to 128 Hz to match the sample rates across datasets, re-referenced to the common average, and segmented into 2-second non-overlapping epochs. Epochs were then subjected to automatic artefact rejection to remove eye-blinks and other electrical artefacts by excluding segments with data exceeding ±100 µV. On average, 1046 (SD = 411) epochs for valence and 1036 (SD = 438) epochs for arousal per participant were accepted for further analysis.
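The preprocessing chain above can be sketched in Python with SciPy as follows. The function name, and the assumption that `data` is a channels × samples array in microvolts, are illustrative; this is a simplified sketch, not the authors' exact pipeline:

```python
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt, resample_poly

def preprocess_eeg(data, fs, mains_freq=50, target_fs=128, epoch_sec=2,
                   reject_uv=100.0):
    """Notch + 1 Hz high-pass (4th-order Butterworth), downsample to 128 Hz,
    common-average reference, 2 s non-overlapping epochs, +/-100 uV rejection."""
    # 50/60 Hz notch filter for mains interference
    b, a = iirnotch(mains_freq, Q=30.0, fs=fs)
    data = filtfilt(b, a, data, axis=1)
    # 4th-order Butterworth high-pass at 1 Hz to remove DC drift
    b, a = butter(4, 1.0, btype='highpass', fs=fs)
    data = filtfilt(b, a, data, axis=1)
    # Downsample to the common 128 Hz rate
    data = resample_poly(data, target_fs, fs, axis=1)
    # Re-reference to the common average
    data = data - data.mean(axis=0, keepdims=True)
    # Segment into non-overlapping epochs and reject large artefacts
    n = target_fs * epoch_sec
    epochs = [data[:, i:i + n] for i in range(0, data.shape[1] - n + 1, n)]
    return [e for e in epochs if np.abs(e).max() <= reject_uv]
```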

EEG Feature Extraction
Feature extraction refers to the process of transforming raw data into numerical features that can be processed while preserving the information in the original dataset; it typically yields better results than applying machine learning directly to the raw data. In EEG-based emotion recognition, feature extraction is a crucial step, as the quality of the extracted features directly affects the accuracy of the emotion classification. In this study, the feature extraction and analysis aimed to identify the salient EEG characteristics that can distinguish or classify emotional states. To that end, we compared the classification performance of ten EEG feature sets that have shown reliable performance in previous emotion recognition studies [10][11][12][13]23]: statistical, wavelet, fractal dimension, Hjorth parameter, higher order spectra, spectral power, entropy, nonlinear, connectivity, and graph metric features. For brevity, only the top five performing feature sets are reported in this article (statistical, wavelet, fractal dimension, Hjorth parameter, and higher order spectra features), as described in Table 2. All feature sets were extracted from each channel and epoch of the preprocessed EEG data. Descriptive statistical measures of EEG time-series data have been used for emotion recognition in previous studies [10,11,13]. In this study, the statistical feature set includes the mean (µ_X), median, standard deviation (σ_X), mean of the absolute values of the 1st difference (δ_X) and 2nd difference (γ_X), and the normalized 1st (δ_X/σ_X) and 2nd (γ_X/σ_X) differences measured from the time-series data at each channel, across epochs; these features were calculated as indicated in Equations (1)-(7), where X(t) denotes the time-series EEG signal and T represents the total number of EEG samples. In addition, we also extracted skewness and kurtosis features from the EEG data.
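As a sketch, the seven statistical features (Equations (1)-(7)) for a single channel epoch can be computed as follows; the dictionary keys are illustrative names:

```python
import numpy as np

def statistical_features(x):
    """Statistical feature set for one channel epoch x(0..T-1): mean, median,
    SD, mean absolute 1st/2nd differences, and their sigma-normalized versions.
    (Skewness and kurtosis are extracted separately.)"""
    x = np.asarray(x, dtype=float)
    sigma = x.std()
    delta = np.abs(np.diff(x)).mean()        # mean |X(t+1) - X(t)|
    gamma = np.abs(x[2:] - x[:-2]).mean()    # mean |X(t+2) - X(t)|
    return {'mean': x.mean(), 'median': np.median(x), 'std': sigma,
            'delta': delta, 'gamma': gamma,
            'delta_norm': delta / sigma,     # normalized 1st difference
            'gamma_norm': gamma / sigma}     # normalized 2nd difference
```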

Wavelet Analysis
Wavelet transform is a popular time-frequency (TF) decomposition technique that divides the EEG signal into several levels of approximation and detail wavelet coefficients corresponding to various EEG frequency ranges, while conserving the time information of the signal. Previous studies have used wavelet analysis to measure the EEG TF distribution related to emotions [13,[24][25][26]. Here, the continuous wavelet transform (CWT) was applied using the Morlet window function to obtain wavelet coefficients of the EEG bands. This mother wavelet was chosen for its near-optimal TF representation characteristics [27] and because it is widely used in EEG-based emotion recognition studies [28,29]. For a sampling rate of 128 samples/s, we obtained 18 scales and extracted CWT coefficients from the first 12 scales, as these correspond to frequencies above 1.25 Hz. The CWT is defined in Equation (8) as CWT(a, b) = (1/√a) ∫ x(t) ψ*((t − b)/a) dt, where x(t) denotes the time-series EEG signal, ψ is the mother wavelet, a is the scaling parameter, and b is the shifting parameter. Since the coefficients extracted from this frequency range are related to emotion [27][28][29], we computed the average of the absolute values of the wavelet coefficients at each scale as the wavelet features, as defined in Equation (9): W_k = (1/L) Σ_l |C(k, l)|, where C(k, l) denotes the lth wavelet coefficient at the kth decomposition level, L is the number of coefficients at that level, and k = 1, 2, 3, . . . , N indexes the decomposition levels.
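A self-contained numpy sketch of the Morlet-based wavelet feature extraction is given below. The dyadic scale spacing and the ω0 = 6 Morlet parameter are assumptions for illustration; the paper's exact 18-scale scheme is not specified in detail:

```python
import numpy as np

def morlet_cwt_features(x, fs=128, n_scales=12, w0=6.0):
    """Morlet CWT of one channel epoch, returning the mean absolute
    coefficient per scale (cf. Equation (9)).  Scale spacing is an
    assumed dyadic placeholder."""
    x = np.asarray(x, dtype=float)
    t = (np.arange(len(x)) - len(x) / 2) / fs   # time axis centered on the epoch
    feats = []
    for k in range(1, n_scales + 1):
        a = 2 ** (k / 2.0) / fs                 # scale parameter (assumed spacing)
        # complex Morlet mother wavelet, scaled and 1/sqrt(a)-normalized
        psi = np.exp(1j * w0 * t / a) * np.exp(-(t / a) ** 2 / 2) / np.sqrt(a)
        coeffs = np.convolve(x, np.conj(psi)[::-1], mode='same')
        feats.append(np.abs(coeffs).mean())     # mean |C(k, l)| over shifts l
    return np.array(feats)
```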

Fractal Dimension
FD features approximate the complexity (or fractality) of the EEG time-series data, providing an indication of the level of self-similarity of the EEG signal across time scales. Previously, FD features have shown promise for EEG-based emotion recognition [13,23,30,31]. In this study, we considered several FD algorithms commonly used for EEG signal analysis, namely Katz [32], Petrosian [33], and Higuchi [34]; these algorithms are explained below.
Katz's fractal dimension (KFD): Katz proposed computing the FD of a waveform as a planar curve [35], defined in Equation (11) as FD = log10(L/a)/log10(d/a), where d is the diameter of the curve (the maximum distance from the first point), L is the total curve length, and both are normalized by the mean distance a between successive points. Substituting n = L/a gives the KFD as shown in Equation (12): KFD = log10(n)/(log10(n) + log10(d/L)), where n is determined by the number of time samples N in the EEG epoch.
Petrosian fractal dimension (PFD): This algorithm converts the time-series EEG signal into a binary sequence [35]. The PFD is calculated as shown in Equation (13): PFD = log10(N)/(log10(N) + log10(N/(N + 0.4 N_δ))), where N is the number of samples and N_δ denotes the number of adjacent segment pairs in the binary sequence that are not identical (i.e., sign changes in the signal's first derivative).
Higuchi's fractal dimension (HFD): Higuchi developed a method for estimating FD directly from the original time series of N samples, X(n) = X(1), X(2), X(3), . . . , X(N). New sub-series are generated by selecting every jth sample starting from sample i, defined as Equation (14): X_i^j = X(i), X(i + j), X(i + 2j), . . ., where i = 1, 2, 3, . . . , j. Here, i represents the initial time, j represents the interval time, and N represents the total number of samples. For each i, the length of the curve, L_i(j), is computed as in Equation (15), and then averaged over the j values of L_i(j).
The HFD method is based on the notion that the curve under consideration is fractal-like if L(j) ∝ j^(−FD), where FD denotes the fractal dimension; FD is then measured as the slope given in Equation (16).
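Minimal numpy implementations of the three FD estimators described above are sketched below (the k_max default for Higuchi is an assumed choice):

```python
import numpy as np

def katz_fd(x):
    """Katz FD from the total curve length L and the curve diameter d."""
    x = np.asarray(x, dtype=float)
    n = len(x) - 1                                  # number of steps
    L = np.abs(np.diff(x)).sum()                    # curve length
    d = np.abs(x[1:] - x[0]).max()                  # max distance from 1st point
    return np.log10(n) / (np.log10(n) + np.log10(d / L))

def petrosian_fd(x):
    """Petrosian FD from the number of sign changes in the 1st derivative."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    diff = np.diff(x)
    n_delta = np.sum(diff[1:] * diff[:-1] < 0)      # derivative sign changes
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_delta)))

def higuchi_fd(x, k_max=8):
    """Higuchi FD: slope of log L(k) versus log(1/k) over curve lengths L(k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lengths = []
    for k in range(1, k_max + 1):
        Lk = []
        for m in range(k):                          # initial time m, interval k
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            Lm = np.abs(np.diff(x[idx])).sum() * (n - 1) / (len(idx) - 1) / k
            Lk.append(Lm / k)
        lengths.append(np.mean(Lk))
    slope, _ = np.polyfit(np.log(1.0 / np.arange(1, k_max + 1)),
                          np.log(lengths), 1)
    return slope
```

A pure sinusoid should yield FD estimates close to 1, whereas noisy signals approach 2 (Higuchi).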

Hjorth Parameters
Hjorth parameters are statistical descriptors of the EEG signal characteristics in the time domain, which have also been successfully used in emotion recognition from EEG signals [10,36]. They comprise two main measures, namely mobility (h_1) and complexity (h_2) [37,38], defined according to Equations (17) and (18) as h_1 = σ_d/σ_x(t) and h_2 = (σ_dd/σ_d)/(σ_d/σ_x(t)), where x(t) represents the time-series EEG signal with a length of N, σ_x(t) is the standard deviation (SD) of the EEG signal (σ²_x(t) its variance), σ_d is the SD of the 1st derivative of x(t), and σ_dd is the SD of the 2nd derivative of x(t). Mobility estimates the mean frequency of the signal, while complexity estimates its bandwidth.
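The two Hjorth measures (together with activity, the variance from which they are derived) can be sketched as:

```python
import numpy as np

def hjorth_parameters(x):
    """Hjorth activity, mobility, and complexity of one channel epoch.
    mobility = sigma(dx)/sigma(x); complexity = mobility(dx)/mobility(x)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                                  # 1st derivative (discrete)
    ddx = np.diff(dx)                                # 2nd derivative
    activity = x.var()
    mobility = np.sqrt(dx.var() / x.var())
    complexity = np.sqrt(ddx.var() / dx.var()) / mobility
    return activity, mobility, complexity
```

For a pure sinusoid, the derivative is a sinusoid of the same frequency, so complexity is approximately 1.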

Higher Order Spectra
Higher order spectra (HOS) are a spectral representation of higher order statistics that can retain information related to deviations from Gaussianity and the degree of nonlinearity in the time-series EEG signal. Among the group of HOS features, the bispectrum (Bis) is regarded as an effective feature for recognizing emotion from EEG signals [10,12,24]. The bispectrum depicts the Fourier Transform (FT) of the third order moment of the signal [39], calculated as shown in Equation (19): Bis(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)],
where X(f) is the FT of the given signal X(t), * represents the complex conjugate, and E[·] denotes the expectation operation. In this study, the bispectrum features extracted from the EEG segments [40], computed as in Equations (19)-(22), were: the mean bispectrum magnitude (Bis_Mag); the sum of the logarithmic bispectrum amplitudes (H1); the sum of the logarithmic amplitudes of the diagonal elements of the bispectrum (H2); and the 1st-order spectral moment of the amplitudes of the diagonal elements of the bispectrum (H3). Here, N is the total number of points within the principal domain region, Ω.
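A direct (FFT-based) sketch of the bispectrum estimate and the four features is shown below. The segment length and the exact forms of H1-H3 are assumptions for illustration; published definitions of the moment features vary:

```python
import numpy as np

def bispectrum_features(x, seg_len=128):
    """Estimate Bis(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)] by averaging over
    non-overlapping segments, then derive magnitude-based features.
    Assumes len(x) >= seg_len."""
    x = np.asarray(x, dtype=float)
    n_seg = len(x) // seg_len
    nf = seg_len // 2
    f = np.arange(nf)
    bis = np.zeros((nf, nf), dtype=complex)
    for s in range(n_seg):
        X = np.fft.fft(x[s * seg_len:(s + 1) * seg_len])
        # accumulate X(f1) X(f2) X*(f1 + f2) on the f1 x f2 grid
        bis += (X[f][:, None] * X[f][None, :]
                * np.conj(X[(f[:, None] + f[None, :]) % seg_len]))
    mag = np.abs(bis) / n_seg
    diag = np.diag(mag)
    bis_mag = mag.mean()                      # mean bispectrum magnitude
    h1 = np.log(mag + 1e-12).sum()            # sum of log amplitudes
    h2 = np.log(diag + 1e-12).sum()           # log amplitudes of the diagonal
    h3 = (f * np.log(diag + 1e-12)).sum()     # 1st moment of diagonal (one common variant)
    return bis_mag, h1, h2, h3
```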

Emotion and EEG Feature-Classification Techniques
Two classification techniques, SVM and CART, were applied and evaluated for emotional valence and arousal recognition using each EEG feature set described above, as well as a combination of all feature sets; the specific combination of a feature set (e.g., statistical) and a classifier (e.g., SVM) is considered a unique feature-classification technique that can be tested against other combinations. In terms of the classifiers, SVM forms a decision boundary between two classes (e.g., low vs. high valence) and attempts to maximize the distance of each class from the decision boundary [12]. The kernel function takes data as input and transforms it into the required form, and different SVM algorithms use different types of kernel functions. In the current study, the Gaussian radial basis function (RBF) SVM (GSVM) is used due to its excellent learning performance [41] in many applications, including EEG-based emotion recognition [12,42,43]. CART classifiers build binary decision trees using a minimum cost-complexity pruning technique [44]; for numeric attributes, each test can consist of a linear combination of attribute values, so the output tree represents a hierarchy of linear models [44]. We compared the performance of four classifiers that have shown reliable classification performance in previous EEG-based emotion recognition studies [5,13]: CART, GSVM, random forest (RF), and k-nearest neighbor (KNN). For brevity, only the top two performing classifiers, CART and GSVM, are reported in this paper.
We applied a Bayesian optimization technique to tune the hyperparameters of the GSVM and CART classifiers in each inner fold. For GSVM, we optimized two hyperparameters, namely the box constraint and the kernel scale. For CART, we optimized the number of learning cycles, the learn rate, and the minimum leaf size. In addition, we used random undersampling boosting (RUSBoost) ensembles to handle imbalanced data effectively, and standardized the predictor data.
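A scikit-learn sketch of the two tuned classifiers is shown below. Grid search over illustrative parameter ranges stands in here for the Bayesian optimization used in the study, and the RUSBoost ensemble step is omitted; predictors are standardized as described above:

```python
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV

def build_tuned_classifiers(inner_cv=3):
    """Return inner-fold-tuned GSVM and CART models (illustrative grids)."""
    gsvm = GridSearchCV(
        make_pipeline(StandardScaler(), SVC(kernel='rbf')),
        {'svc__C': [0.1, 1, 10, 100],           # analogue of the box constraint
         'svc__gamma': ['scale', 0.01, 0.1]},   # analogue of the kernel scale
        cv=inner_cv)
    cart = GridSearchCV(
        DecisionTreeClassifier(),
        {'min_samples_leaf': [1, 5, 20],        # analogue of minimum leaf size
         'ccp_alpha': [0.0, 0.001, 0.01]},      # cost-complexity pruning strength
        cv=inner_cv)
    return gsvm, cart
```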

EEG Feature-Classification Accuracy
The accuracy of each EEG feature set-classification technique was evaluated using 4-fold cross-validation. In this approach, each participant's data was divided into 4 folds (i.e., four equal, non-overlapping subsets); 3 folds were used for classifier training and the remaining fold was used as the final test of accuracy and validation. This process was performed four times so that each fold served once as the test set, yielding four classifier accuracy scores for each feature-classification method and participant. The mean accuracy across the 4 folds gives the final feature-classification accuracy per participant. This procedure was applied separately for each dataset. To evaluate overall emotion feature-classification performance, the mean and SD of the final accuracy scores were computed across all participants.
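The per-participant evaluation loop can be sketched as follows (the shuffling and random seed are illustrative choices):

```python
import numpy as np
from sklearn.model_selection import KFold

def fourfold_accuracy(clf, X, y):
    """Per-participant 4-fold cross-validated accuracy: train on 3 folds,
    test on the held-out fold, and average the four test accuracies."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=4, shuffle=True,
                                     random_state=0).split(X):
        clf.fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))
    return np.mean(scores)
```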

Statistical Analysis: Comparing Feature-Classification Performance between EEG Feature Sets
Two-tailed paired-sample t-tests were used to evaluate whether emotion classification performance differed between feature sets. Cliff's Delta was also computed as an additional effect-size measure of the difference between sets; it is a non-parametric statistic that quantifies the degree of difference between two groups of data (in this case, FD versus each other feature set) beyond what p-values convey. Cliff's Delta ranges between −1 and 1: values of −1 or 1 indicate no overlap between the two groups, whereas 0 indicates complete overlap (no difference between feature sets). Statistical significance was defined as p < 0.05, and the p-values were corrected for multiple comparisons using the Holm-Bonferroni method.
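The two statistics above can be sketched together as follows (function names are illustrative; the Holm-Bonferroni step is applied afterwards to the collected p-values across all comparisons):

```python
import numpy as np
from scipy import stats

def cliffs_delta(a, b):
    """Cliff's Delta: (#(a_i > b_j) - #(a_i < b_j)) / (n_a * n_b), in [-1, 1]."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diffs = a[:, None] - b[None, :]
    return (np.sum(diffs > 0) - np.sum(diffs < 0)) / (len(a) * len(b))

def compare_feature_sets(acc_fd, acc_other):
    """Two-tailed paired t-test plus Cliff's Delta between per-participant
    accuracy vectors of two feature sets."""
    t, p = stats.ttest_rel(acc_fd, acc_other)
    return t, p, cliffs_delta(acc_fd, acc_other)
```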

EEG Scalp Topography Related to Emotion Processing
The topographic distribution of the most significant feature sets was visually inspected to consider the spatial distributions associated with high/low valence and arousal. To improve visual comparison, the features from each dataset were standardized (z-scored) and only common channels that were shared by all datasets (i.e., 14 channels) were plotted.

Experimental Results and Discussion
This section presents the classification accuracy of different feature sets and classifiers for each public EEG dataset. Higher accuracy scores are indicative of feature-classification methods that are more reliable for EEG emotion recognition. Tables 3 and 4 display the mean classification accuracy for emotional valence and arousal, respectively. Accuracy scores are shown for each feature-classification technique, including the combination of all feature sets (i.e., Combined-ALL); the highest accuracy scores within and across each dataset (i.e., Average) are highlighted in bold.
The majority of EEG feature-classification methods performed reasonably well, with average classification accuracies ≥77.78% and ≥77.59% for valence and arousal, respectively. This is interesting as it suggests a complex relationship between 2D emotional states and many properties of the EEG signal, and it is consistent with the successful application of these features across previous emotion recognition studies [7,10,13]. As demonstrated by the average classification accuracy across datasets in Tables 3 and 4, the performance of the EEG FD feature set was higher for classifying high/low emotional valence and arousal relative to the other features when using either the GSVM or CART classifier. These results are broadly consistent with previous research highlighting the value of FD features for detecting implicit emotional states [31,45,46]. Furthermore, the FD feature set delivered classification results with the lowest SD of accuracy, showing that it performs more consistently than the other techniques in this study; this is a valuable property, suggesting greater stability or reliability of this feature set for applied emotion recognition. This outcome is also consistent with prior research showing that the intraclass correlation coefficient (ICC) of FD features is higher for emotional state classification relative to other methods, supporting their reliability for categorizing valence and arousal [46]. Another important finding of the present study is that CART classifiers performed better for EEG emotion recognition than GSVM. Using the FD feature set with the CART classifier (hereafter named FD-CART), we achieved the highest mean classification accuracies (averaged across the five datasets) of 85.06% for valence and 84.55% for arousal. This was the case across all datasets utilized in this study and is in line with previous research supporting the utility of CART for emotion recognition [47,48].
For that reason, we focus on reporting feature set outcomes using the CART classifier in subsequent sections. Figure 3 shows box plots of the accuracies of the top three feature sets (fractal dimension, wavelet transform, and statistical features) using CART on each dataset. The plots display the distribution of classification accuracies across subjects and illustrate the reliability of the selected EEG feature sets for applied emotion recognition. Table 5 summarizes the statistical results of the two-tailed paired t-tests comparing CART classification performance between the different feature sets, together with the Cliff's Delta effect sizes. The p-values show that the FD results are statistically different (p < 0.05) from all other feature sets listed in the table, including the combination of features, confirming the descriptive observations in Tables 3 and 4. The Cliff's Delta values further indicate that, across the five datasets, emotional state classification with the FD feature set is more accurate on average, although it is not always the most accurate in each individual case.
The present research utilized a data-driven approach to identify the EEG features that are optimal for emotion detection; thus, while FD clearly demonstrates the best performance, it is currently unclear why this feature set is the most effective. Further research is needed to investigate this matter; however, considering our results and the prior literature, we speculate that methodological (technical) and/or functional reasons could explain why FD features are most effective for emotion recognition. In terms of methodology, FD features are nonlinear complexity estimators that can be calculated over short time periods, are robust to noise, and do not require any prior transformation of the time series [46,49]. This differs from other methods (e.g., wavelet, statistical) and is beneficial for emotion recognition. At a functional level, fractality indicates whether the EEG signal is synchronous or repetitive over different time scales (i.e., similar patterns occur over shorter and longer intervals), representing the nonlinear complexity of the underlying brain activity [50]. As explained by Zappasodi et al. [50], complexity is considered to reflect efficient neuronal functioning, varying between randomness and constant periodicity, with the extremes related to dysfunction and difficulty shifting between brain states. From this viewpoint, we can speculate that different emotional states are associated with distinct levels of signal complexity, with high/low valence and arousal leading to meaningful shifts in network complexity along a spectrum. This is consistent with the idea that emotions can drive mental (and neuronal) states associated with more or less lability and/or cognitive flexibility (e.g., [51]). FD may provide a relevant and effective means to model those functional differences, which are not captured by other EEG measures of 2D emotional states. The topography of FD features (i.e., KFD, PFD, and HFD) associated with high or low valence is plotted in Figure 4.
The grand mean (GM) head maps calculated across datasets for KFD suggest that higher (i.e., more positive) valence was associated with less complexity (fractality) at frontal electrode sites, particularly over the left hemisphere, relative to periods of low valence. This pattern is somewhat consistent with the GM topography of PFD, which suggests higher valence is related to lower complexity at frontal, temporal, and occipital electrode sites. GM HFD indicates a slightly different pattern, with higher valence linked to relatively higher complexity at the most frontal EEG channels, but lower complexity over left frontocentral regions. In general, these results suggest that states of higher valence are related to less EEG complexity over frontal regions. However, this is not always consistent within datasets, and given the limited sites these topographic findings should be considered tentatively.
The topography of FD features (i.e., KFD, PFD, and HFD) associated with high or low arousal is plotted in Figure 5. The GM headmaps for KFD suggest that higher arousal is related to lower complexity at frontal and temporal sites over the left hemisphere. GM PFD shows a similar spatial distribution for high and low arousal, with lower complexity at left frontocentral and temporoparietal sites relative to other scalp regions, and this pattern is stronger during periods of high arousal. GM HFD shows the opposite pattern to PFD. These GM topographic distributions are somewhat consistent with those shown for valence, with higher arousal broadly associated with lower complexity over the left hemisphere. However, it is important to note that these topographic interpretations are based only on visual inspection with limited channels. It is also apparent that these GM spatial distributions of FD features are not completely consistent across all datasets. For that reason, these topographic results should only be used as a tentative guide for research interested in FD distribution relative to emotional states or in the optimal electrode locations for EEG emotion recognition. For more definitive outcomes, future research involving more EEG channels is needed. Table 6 provides a comparison with other studies in the literature that have utilized more than one dataset to validate their methods. As the AMIGOS and DREAMER emotion datasets were only recently released, there are only limited comparative studies; hence, for comparison, baseline evaluation work is also included in Table 6. Siddharth et al. [52] utilized RGB topographic maps computed from power spectral density (PSD) features using bicubic interpolation and assessed binary (low/high) classification of valence and arousal on DEAP, DREAMER, MAHNOB, and AMIGOS. They achieved 71.09-83.02% for valence and 72.58-80.42% for arousal emotion recognition. In another study, Li et al.
[53] suggested an approach that generates spatial maps from EEG signals and combined a graph regularized extreme learning machine (GRELM) with SVM for recognizing emotions. They obtained accuracies of 62.005-88.00% for valence on the DEAP and SEED emotion datasets. In a recent study, Topic and Russo [19] demonstrated a hybrid deep learning approach using holographic and topographic feature maps for EEG-based emotion recognition. In this approach, they introduced EEG topography, utilizing the spatial and spectral information, and performed classification of valence and arousal on the DEAP, DREAMER, AMIGOS, and SEED datasets, reporting 76.61-88.45% for valence and 77.72-90.54% for arousal. The authors of the AMIGOS emotion dataset [17] achieved classification accuracies of 57.60% for valence and 59.20% for arousal using power spectral density (PSD) EEG features. Similarly, the researchers of the DREAMER dataset [15] achieved emotion recognition accuracies of 62.49% and 62.17% for valence and arousal, respectively. From all these studies, we can see that the identified FD feature set consistently performs better than comparable methods previously reported, for both affective states, across all five datasets. This demonstrates the effectiveness of fractal dimension features combined with the CART classifier for emotion recognition using EEG signals.

Conclusions
In this work, we presented a comparative analysis of different feature extraction methods applied to multichannel EEG recordings for building a reliable emotional state recognition system. A comprehensive set of features (statistical, FD, Hjorth parameters, HOS, and wavelet transform features) was extracted from the EEG signals. We conducted a quantitative comparison of feature extraction techniques with two different classifiers, SVM and CART. Five emotion EEG datasets, namely DEAP, DREAMER, MAHNOB-HCI, AMIGOS, and SEED, were used to assess performance. The findings revealed that the FD feature set is the most sensitive feature metric for distinguishing emotions categorized in terms of high/low valence and arousal. The FD-CART feature-classification method tested in this study achieved an overall best mean accuracy of 85.06% and 84.55% for binary classification of valence and arousal, respectively, using all features in the FD set. Our results suggest that the fractality of the EEG time-domain data plays a substantial role and is more reliable for emotional state recognition. This might enable the creation of an effective online framework for extracting EEG features and the development of a real-time human-computer interactive system for emotional state recognition.
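The FD-CART combination itself is straightforward to reproduce. The sketch below is only illustrative: it pairs FD feature vectors with scikit-learn's `DecisionTreeClassifier` (a CART implementation) under 10-fold cross-validation, with randomly generated stand-in features and labels in place of the real per-trial FD matrices, and its hyperparameters are assumptions rather than the settings used in this study.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 200 trials, each with 3 FD features (KFD, PFD, HFD)
# per channel over a hypothetical 32-channel montage -> 96 features per trial.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 96))           # would be the real FD feature matrix
y = rng.integers(0, 2, size=200)         # binary high/low valence labels

# CART classifier: Gini impurity splits, as in classic CART.
cart = DecisionTreeClassifier(criterion="gini", random_state=0)
scores = cross_val_score(cart, X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With random features the accuracy hovers near chance; substituting the actual FD matrices per dataset reproduces the evaluation setup described above.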
The study has two limitations. First, it would be interesting to explore deep learning classifiers as alternatives to CART and SVM. In recent years, convolutional deep neural networks have proven successful in EEG-based emotion classification [54,55]. It was not feasible to explore this approach here due to the lack of data. However, integrating deep learning with the present research may be a fruitful direction for further work in EEG emotion recognition. Second, although a subject-dependent cross-validation approach was carried out, building a truly subject-independent (e.g., leave-one-subject-out) system would be more reliable and scalable. In the future, we will extend this approach to subject-independent cross-validation with emotional state categorization in three-dimensional space, i.e., the valence-arousal-dominance emotional model. In addition, we intend to investigate the FD-CART feature-classification method on the combined emotion EEG datasets for training, validation, and evaluation purposes.

Acknowledgments:
The authors would like to thank the research teams who collected the datasets, made them publicly available, and granted access to them.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: