Predicting Exact Valence and Arousal Values from EEG

Recognition of emotions from physiological signals, and in particular from electroencephalography (EEG), is a field within affective computing gaining increasing relevance. Although researchers have used these signals to recognize emotions, most of them only identify a limited set of emotional states (e.g., happiness, sadness, anger, etc.) and have not attempted to predict exact values for valence and arousal, which would provide a wider range of emotional states. This paper describes our proposed model for predicting the exact values of valence and arousal in a subject-independent scenario. To create it, we studied the best features, brain waves, and machine learning models that are currently in use for emotion classification. This systematic analysis revealed that the best prediction model uses a KNN regressor (K = 1) with Manhattan distance, features from the alpha, beta and gamma bands, and the differential asymmetry from the alpha band. Results, using the DEAP, AMIGOS and DREAMER datasets, show that our model can predict valence and arousal values with a low error (MAE < 0.06, RMSE < 0.16) and a strong correlation between predicted and expected values (PCC > 0.80), and can identify four emotional classes with an accuracy of 84.4%. The findings of this work show that the features, brain waves and machine learning models, typically used in emotion classification tasks, can be used in more challenging situations, such as the prediction of exact values for valence and arousal.


Introduction
Emotions play an undeniably important role in human lives. They are involved in a plethora of cognitive processes such as decision-making, perception, social interactions and intelligence [1]. Thus, the identification of a person's emotional state has become a necessity. Let us consider a scenario where we want to identify the emotional state of subjects from their EEG signals. However, we do not just want to identify whether a person is feeling positive or negative, or whether they are feeling a certain discrete emotion (e.g., happiness or disgust). We want more than that: we want to know the exact valence and arousal values that the person is feeling. This offers a wider range of emotional states and has the advantage that it can later be converted into discrete emotions if we wish.
There are several works that identify the emotional state of a person from EEG, as we discuss in Section 2, but the vast majority identify a small number of states, such as high/low valence and high/low arousal, or one of the quadrants of the circumplex model of affect (HAHV, HALV, LALV and LAHV, where H, L, A and V stand for high, low, arousal and valence, respectively). Thus, while several approaches for identifying discrete emotions have been proposed in the recent years, little attention has been paid to the prediction of exact values for valence and arousal (see Figure 1).
With this in mind, in this paper, we seek to answer the following research questions: RQ1) Can EEG be used to predict the exact values of valence and arousal? RQ2) Are the typical features, brain waves and machine learning models used for classification of emotions suitable for the prediction of exact valence and arousal values? RQ3) Are the predicted valence and arousal values suitable for classification tasks with good accuracy? To that end, we analyzed features from different domains (time, frequency and wavelet) extracted from the EEG signal, brain waves and machine learning methods for regression. For this purpose, we used three datasets (DEAP [2], AMIGOS [3] and DREAMER [4]) containing EEG signals collected during emotion elicitation experiments, together with the self-assessment of valence and arousal performed by the participants. We extracted time, frequency and wavelet features from EEG, considering the alpha, beta and gamma bands, namely the three Hjorth parameters (activity, mobility and complexity), Spectral Entropy, Wavelet Energy and Entropy, and IMF Energy and Entropy, as we describe in Section 3.
Experimental results, using a subject-independent setup with 10-fold cross-validation technique, show that our proposed model can predict valence and arousal values with a low error and a strong correlation between predicted and expected values (Section 5.2). Furthermore, in two subject-independent classification tasks (two classes and four classes), our model surpasses the state-of-the-art (Section 5.3).
Our main contributions can be summarized as follows: • A systematic study of the best features, brain waves and machine learning models for predicting exact valence and arousal values (Section 4); • Identification of the two best machine learning regressors (KNN and RF), out of seven, for predicting values for valence and arousal (Section 4.3); • Combination and study of features from the time, frequency and wavelet domain, complemented with asymmetry features, for valence and arousal prediction (Sections 4.4 and 4.5); • A model able to predict exact values for valence and arousal with a low error, which can also predict emotional classes with the highest accuracy among state-of-the-art methods (Section 5).

Background and Related Work
To properly understand emotion recognition systems from EEG, we need to know: (1) the set of emotions to be detected and how they are modeled; (2) how EEG signals are related to emotions; (3) which brain waves and features best describe emotional changes in people; (4) which machine learning methods are most appropriate for emotion recognition.

Emotions
Emotions are generated whenever a perception of an important change in the environment or in the physical body appears. There are two main scientific ways of explaining the nature of emotions. According to the cognitive appraisal theory, emotions are judgments about the extent to which the current situation meets our goals or favors our personal wellbeing [5]. Alternatively, James and Lange [6,7] have argued that emotions are perceptions of changes in our body, such as heart rate, breathing rate, perspiration and hormone levels.
Either way, emotions are conscious experiences characterized by intense mental activity and a certain degree of pleasure or displeasure.
There are two perspectives to represent emotions: discrete and dimensional. In the discrete perspective, all humans are thought to have an innate set of basic emotions that are cross-culturally recognizable. A popular example is Ekman's six basic emotions (anger, disgust, fear, happiness, sadness and surprise) [8]. In the dimensional perspective, emotions are represented by the valence, arousal and dominance dimensions [9]. Valence, as used in psychology, means the intrinsic attractiveness or aversion of an event, object or situation, varying from negative to positive. Arousal is the physiological and psychological state of being awake or having the sensory organs stimulated to a point of perception, ranging from sleepy to excited. Dominance corresponds to the strength of the emotion. Dimensional continuous models are more accurate in describing a broader range of spontaneous, everyday emotions when compared to categorical models of discrete emotions [10]. For example, while the latter can only describe happiness, the dimensional representation can discriminate between several emotions near happiness, such as aroused, astonished, excited, delighted, etc. (Figure 1).

Physiological Signals and EEG
The use of physiological responses to characterize people's emotional state has gained increasing attention. There are several physiological signals that can be used for this purpose, namely the electrical activity of the heart (ECG), galvanic skin response (GSR), electromyography (EMG), respiration rate (RR), functional magnetic resonance imaging (fMRI) or electroencephalography (EEG).
The latter provides great time resolution and fast data acquisition while being non-invasive and inexpensive, making it a good candidate to measure people's emotional state. The frequency of EEG measurements ranges from 1 to 80 Hz, with amplitudes of 10 to 100 microvolts [11]. Brain waves are usually categorized into five frequency bands: Delta (δ) 1-4 Hz; Theta (θ) 4-7 Hz; Alpha (α) 8-13 Hz; Beta (β) 13-30 Hz; and Gamma (γ) > 30 Hz, each one being more prominent in certain states of mind. Delta waves are the slowest, being most pronounced during non-rapid eye movement (NREM) sleep. Theta waves are associated with subconscious activities, such as dreaming, and are present in meditative states of mind. Alpha waves appear predominantly during wakeful relaxation mental states with the eyes closed, and are most visible over the parietal and occipital lobes [12]. Beta wave activity, on the other hand, is related to an active state of mind, more prominent in the frontal cortex during intense focused mental activity [12]. Lastly, Gamma rhythms are thought to be associated with intense brain activity for the purpose of running certain cognitive and motor functions. According to the literature, there is also a strong correlation between these waves and different affective states [1].

Brain Waves and Features
One early decision when working with EEG for emotion recognition is related to the number of electrodes to use. In the literature, this number varies from only 2 electrodes [13,14] to a maximum of 64 electrodes [15,16], with the most common value revolving around 32 [2,17-20]. Usually, the placement of the electrodes on the scalp is done according to the international 10-20 system.
Another decision is related to the use of monopoles or dipoles. The former record the potential difference relative to a neutral electrode connected to an ear lobe or mastoid, while the latter collect the potential difference between two paired electrodes, thus allowing for the extraction of asymmetry features [16,19]. The asymmetry concept has been used in many experiments, and states that the difference in activity between the hemispheres reflects emotional positivity (valence): higher activity in the left hemisphere is related to a positive emotion (high valence), while higher activity in the right hemisphere is related to a negative emotion (low valence). According to the literature, the electrodes positioned in the frontal and parietal lobes are the most used because they have produced the best results. Regarding brain waves, most researchers use the set comprised of theta, alpha, beta and gamma. Some also use delta [15,21] or a custom set of EEG frequencies [22,23], while Petrantonakis et al. [24,25] used only the alpha and beta frequencies, as these had produced the best results in previous works. The same applies to Zhang et al. [14,26], who used only the beta frequencies.
The emotion classification problem has been addressed in one of three ways: (i) identification of discrete emotions such as happiness, fear or disgust [24,27,34,40-42]; (ii) distinction between high/low arousal and high/low valence [2-4,19,29,31,43]; and (iii) finding the quadrant in the valence/arousal space [13,14,19,21,44,45]. In the last two cases, researchers create two classifiers, one to discern between high/low valence and the other for high/low arousal. Although binary classification is the most common, there are works in which researchers have performed multi-class classification [23,32]. There are also some works that included all positive emotions in one class and all negative emotions in another, sometimes with the addition of a neutral class [15,16]. Table 1 summarizes the main characteristics of a subset of the reviewed papers, including the database used, brain waves utilized, features extracted, classifiers employed and the set of emotions recognized. These works were chosen according to their relevance and novelty.

Materials and Methods
For creating our model, we explored different brain waves and features, and trained multiple regressors, using annotated datasets. Here, we describe all of them, plus the metrics used to evaluate the quality of the prediction.

Datasets
For our study, we used the AMIGOS [3], DEAP [2] and DREAMER [4] datasets, whose main characteristics are shown in Table 2.
The data used from the AMIGOS dataset corresponds to the scenario where the 40 participants were alone while watching 16 short videos: four from each quadrant of the circumplex model of affect. The EEG signals were recorded using the Emotiv EPOC Neuroheadset, with 14 electrode channels. The DEAP dataset contains data collected using 40 music videos, 10 from each quadrant. The EEG signal was recorded using 32 active AgCl electrodes with the Biosemi ActiveTwo system. The DREAMER dataset contains EEG signals recorded using the Emotiv EPOC Neuroheadset. Signals were collected from 23 participants while they watched 18 film clips selected to elicit nine emotions (amusement, excitement, happiness, calmness, anger, disgust, fear, sadness and surprise).
In the three datasets, participants performed a self-assessment of their perceived arousal, valence and dominance values using the Self-Assessment Manikin (SAM) [46]. In the case of DEAP and AMIGOS, participants also selected the basic emotion (neutral, happiness, sadness, surprise, fear, anger and disgust) they were feeling at the beginning of the study (before receiving any stimulus), and then, after visualizing each video.

Brain Waves
As we presented in the related work section, there is no consensus on which brain waves to use. However, considering the published results, we can see that the best accuracy is attained when using alpha, beta and/or gamma waves. Therefore, we studied only these three types of brain waves.

Features
The features analyzed in our work were selected based on their effectiveness, simplicity and computational speed, according to prior works, as described in Section 2. We studied the Hjorth parameters, Spectral Entropy, Wavelet Energy and Entropy and IMF energy and entropy.

Hjorth Parameters
The Hjorth parameters [47] are obtained by applying signal processing techniques in the time domain, giving an insight into the statistical properties of the signal. The three Hjorth parameters are: activity, mobility and complexity (Equations (1)-(3)).
Activity gives a measure of the squared standard deviation of the amplitude of the signal x(t), indicating the surface of the power spectrum in the frequency domain. That is, the activity value is large if the higher frequency components are more common, and low otherwise. Activity corresponds to the variance of the signal.
Mobility represents the mean frequency or the proportion of standard deviation of the power spectrum. This is defined as the square root of the activity of the first derivative of the signal divided by the activity of the signal.
Complexity indicates how the shape of a signal is similar to a pure sine wave, and gives an estimation of the bandwidth of the signal. It is defined as the ratio between the mobility of the first derivative and the mobility of the signal.
To summarize, the three parameters can be referred as the average power, the average power of the normalized derivative and the average power of the normalized second derivative of the signal, respectively.
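The definitions above can be sketched as follows (a minimal NumPy illustration, not the implementation used in this work; derivatives are approximated by finite differences):

```python
import numpy as np

def hjorth_parameters(x):
    """Activity, mobility and complexity of a 1-D signal, with
    derivatives approximated by finite differences."""
    x = np.asarray(x, dtype=float)
    dx, ddx = np.diff(x), np.diff(x, n=2)             # first and second derivatives
    var_x, var_dx, var_ddx = np.var(x), np.var(dx), np.var(ddx)
    activity = var_x                                   # variance of the signal
    mobility = np.sqrt(var_dx / var_x)                 # mobility of the signal
    complexity = np.sqrt(var_ddx / var_dx) / mobility  # mobility of x' over mobility of x
    return activity, mobility, complexity

# a pure sine has a single frequency component, so its complexity is close to 1
t = np.linspace(0, 1, 1000, endpoint=False)
act, mob, comp = hjorth_parameters(np.sin(2 * np.pi * 10 * t))
```

The sine-wave check mirrors the description of complexity: the closer the signal is to a pure sine, the closer this parameter is to one.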

Spectral Entropy
Entropy is a concept related to uncertainty or disorder. The Spectral Entropy of a signal is based on Shannon's entropy [48] from information theory, and it measures the irregularity or complexity of digital signals in the frequency domain. After performing a Fourier Transform, the signal is converted into a power spectrum, and the information entropy of the latter represents the Power Spectral Entropy of the signal [49]. Consider x_i to be a random variable and p(x_i) its respective probability; the Shannon Entropy can then be calculated as H = -Σ_i p(x_i) log2 p(x_i). The Spectral Entropy treats the signal's normalized power distribution in the frequency domain as a probability distribution, and calculates its Shannon Entropy. Therefore, the Shannon Entropy in this context is the Spectral Entropy of the signal if we take p(x_i) = Psd(x_i) / Σ_j Psd(x_j), where Psd(x_i) is the power spectral density, equal to the squared magnitude of the signal's Discrete Fourier Transform.
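The computation can be sketched as follows (a minimal NumPy illustration; the normalization by log2 of the number of bins, which scales the result to [0, 1], is a common convention and our own choice here, not taken from this work):

```python
import numpy as np

def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    psd = np.abs(np.fft.rfft(x)) ** 2       # power spectrum estimate
    p = psd / psd.sum()                     # treat it as a probability distribution
    p = p[p > 0]                            # skip empty bins to avoid log(0)
    return -np.sum(p * np.log2(p)) / np.log2(len(psd))

t = np.linspace(0, 1, 1000, endpoint=False)
h_sine = spectral_entropy(np.sin(2 * np.pi * 10 * t))                   # concentrated spectrum
h_noise = spectral_entropy(np.random.default_rng(0).normal(size=1000))  # flat spectrum
```

A pure tone concentrates its power in one bin (entropy near 0), whereas white noise spreads power across all bins (entropy near 1), which matches the interpretation of Spectral Entropy as a measure of spectral irregularity.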

Wavelet Energy and Entropy
Wavelet transformation is a spectral analysis technique in which any function can be represented as an infinite series of wavelets. The main idea behind this analysis is to represent a signal as a linear combination of a particular set of functions. This set is obtained by shifting and dilating a single prototype wavelet ψ(t), called the mother wavelet [50]. This is realized by considering all possible integer translations of ψ(t), and dilation is obtained by multiplying t by a scaling factor, which is usually a factor of two [51]. Equation (6) shows how wavelets are generated from the mother wavelet, ψ_j,k(t) = 2^(j/2) ψ(2^j t − k), where j indicates the magnitude and scale of the function (dilation) and k specifies the translation in time.
The Discrete Wavelet Transform (DWT) is derived from the continuous wavelet transform with a discrete input. It analyses the signal in several frequency bands, with different resolutions, decomposing the signal both in a rough approximation and detailed information. For this, it applies consecutive scaling and wavelet functions. Scaling functions are related to low-pass filters and the wavelet to high-pass filters [50].
The first application of the high-pass and low-pass filters produces the detailed coefficient D1 and the approximation coefficient A1, respectively. Then, the first approximation A1 is decomposed again (into A2 and D2) and the process is repeated, taking into consideration the frequency components of the signal we want to isolate [51]. Given that, in this work, we only consider alpha, beta and gamma frequencies, the number of decomposition levels used is three (D1-D3). Thus, D1 corresponds to gamma, D2 corresponds to beta and D3 to alpha. The mother wavelet chosen was db4, since it had already proven to generate good results in similar works.
Finally, after obtaining the detailed coefficients of the desired bands (decomposition levels) the Wavelet Energy can be computed by summing the square of the absolute value of these coefficients. The wavelet entropy can be calculated in a similar way to the Spectral Entropy.
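The energy and entropy step can be sketched as follows (a minimal NumPy illustration; in the actual pipeline the detail coefficients D1-D3 would come from the 3-level db4 decomposition, e.g., via PyWavelets' `pywt.wavedec`, which we do not reproduce here):

```python
import numpy as np

def wavelet_energy_entropy(detail_coeffs):
    """Energy and Shannon entropy of one set of DWT detail coefficients."""
    d = np.asarray(detail_coeffs, dtype=float)
    energy = np.sum(np.abs(d) ** 2)          # sum of squared absolute coefficients
    p = np.abs(d) ** 2 / energy              # normalized energy distribution
    p = p[p > 0]                             # avoid log(0)
    entropy = -np.sum(p * np.log2(p))
    return energy, entropy

e_conc, h_conc = wavelet_energy_entropy([2.0, 0.0, 0.0, 0.0])  # energy in one coefficient
e_flat, h_flat = wavelet_energy_entropy([1.0, 1.0, 1.0, 1.0])  # energy evenly spread
```

As with the Spectral Entropy, concentrated coefficient energy yields low entropy and evenly spread energy yields high entropy.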

IMF Energy and Entropy
Empirical Mode Decomposition (EMD) is a data-driven method for processing nonstationary, nonlinear, stochastic signals, which makes it ideal for the analysis and processing of EEG signals [52]. The EMD algorithm decomposes a signal x(t) into a finite set of AM-FM oscillating components c(t) called Intrinsic Mode Functions (IMFs) with specific frequency bands. Each IMF satisfies two conditions: (i) the number of local extrema (maxima and minima) and the number of zero crossings differ by at most one; (ii) the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero [52]. The general workflow of the EMD algorithm to decompose a signal is described in Algorithm 1.

Algorithm 1 EMD decomposition steps.
1: Identify all extrema (maxima and minima)
2: Create the upper u(t) and lower l(t) envelopes by connecting the maxima and minima separately with a cubic spline curve
3: Find the mean of the envelopes as m(t) = (u(t) + l(t)) / 2
4: Take the difference between the data and the mean: d(t) = x(t) − m(t)
5: Check whether d(t) is an IMF or not by checking the two basic conditions described above and the stoppage criterion
6: If d(t) is not an IMF, repeat steps 1-5 on d(t) as many times as needed until it satisfies the conditions
7: If d(t) is an IMF, assign it to an IMF component c(t)
8: Repeat steps 1-7 on the residue, r(t) = x(t) − c(t), as input data
9: The process stops when the residue contains no more than one extremum

After decomposition by EMD, the original signal x(t) is a linear combination of N IMF components c_i(t) and a final residual part r_N(t), as shown in Equation (7).
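The sifting procedure of Algorithm 1 can be sketched as follows (a toy NumPy/SciPy implementation with a fixed number of sifting iterations as the stoppage criterion; production EMD code uses more careful boundary handling and stopping rules):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def emd(x, max_imfs=5, sift_iters=10):
    """Toy EMD following Algorithm 1."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    imfs, residue = [], x.copy()
    for _ in range(max_imfs):
        d = residue.copy()
        for _ in range(sift_iters):                        # steps 1-6: sifting
            maxima = np.where((d[1:-1] > d[:-2]) & (d[1:-1] > d[2:]))[0] + 1
            minima = np.where((d[1:-1] < d[:-2]) & (d[1:-1] < d[2:]))[0] + 1
            if len(maxima) < 2 or len(minima) < 2:
                break
            upper = CubicSpline(maxima, d[maxima])(t)      # step 2: envelopes
            lower = CubicSpline(minima, d[minima])(t)
            d -= (upper + lower) / 2                       # steps 3-4: subtract envelope mean
        imfs.append(d)                                     # step 7: accept as IMF
        residue = residue - d                              # step 8: new residue
        n_extrema = ((residue[1:-1] - residue[:-2])
                     * (residue[1:-1] - residue[2:]) > 0).sum()
        if n_extrema <= 1:                                 # step 9: residue is monotone
            break
    return imfs, residue

t = np.linspace(0, 1, 512)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs, residue = emd(x)
```

Because each residue is obtained by subtraction, the sum of the extracted IMFs and the final residue reconstructs the original signal, as in Equation (7).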
EMD works as an adaptive high-pass filter, which isolates the fastest changing components first. Thus, the first IMFs contain information in the high frequency spectrum, while the last IMFs contain information within the lowest frequency spectrum. Since each component is band-limited, they reflect the characteristics of the instantaneous frequencies [53]. For this work, we focused on the first IMF, which roughly contains information within the gamma frequency range, the second IMF that contains the beta frequency spectrum, and the third IMF, which contains the alpha band [54]. To obtain the energy and entropy of the IMFs, we used the methods described in the previous sections.

Regression Methods
Since we intended to identify continuous valence and arousal values, the machine learning methods to be used should be regressors rather than classifiers. We studied seven methods: Linear Regression (LR), Additive Regression (AR), Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF) and Support Vector Machines for Regression (SVR), with two kernels. These were chosen based on the analysis of the related work (see Section 2).

Metrics
To evaluate the regressors' accuracy, we used three measures: the mean absolute error (MAE), the Pearson correlation coefficient (PCC) and the root-mean-square error (RMSE) [55]. MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. RMSE also measures the average magnitude of the error, but gives a relatively high weight to large errors. In our case, both metrics express the average model prediction error from 0 to 1. PCC measures the linear correlation between the ground-truth and predicted values. For MAE and RMSE, the lower the value the better, while for PCC, the closer to 1 the better. The formulas for each of the described measures are presented in Equations (8)-(10), where y represents a series of N ground-truth samples, and ŷ a series of N predicted values.
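The three measures can be computed as follows (a straightforward NumPy sketch):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error: average error magnitude, direction ignored."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root-mean-square error: like MAE, but penalizes large errors more."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def pcc(y, y_hat):
    """Pearson correlation between ground-truth and predicted values."""
    return np.corrcoef(y, y_hat)[0, 1]

y_true, y_pred = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 4.0]
```

On this toy example, the single error of magnitude 1 gives MAE = 0.25 but RMSE = 0.5, illustrating how RMSE weighs larger errors more heavily.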

Proposed Model
In this section, we describe both the process that led to the creation of the feature vectors, as well as the analysis performed to create our model.

Feature Vector
To compute the feature vector from the EEG signal, we started by performing a pre-processing step (Figure 2). Here, we first detrended the signal and eliminated the 50 Hz power line frequency by applying a notch filter. Then, to remove artifacts, we applied adaptive filtering techniques (for ECG artifacts) and wavelet thresholding (for EOG artifacts). To extract the alpha, beta and gamma bands, we applied three FIR band-pass filters to the EEG signal. Then, we computed the Hjorth parameters and Spectral Entropy for each of the bands. Wavelet and IMF-based features were calculated from the signal obtained after the pre-processing step, since these decompositions already isolate the frequency bands.
We used a 4 s epoch, with 50% overlap, and computed the selected features for this window of the EEG. These values were selected based on the literature and after some preliminary tests.
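The epoching step can be sketched as follows (a minimal NumPy illustration; the 128 Hz sampling rate in the example is only illustrative):

```python
import numpy as np

def epoch_signal(x, fs, epoch_s=4.0, overlap=0.5):
    """Split a signal into fixed-length epochs with fractional overlap."""
    x = np.asarray(x)
    win = int(epoch_s * fs)                   # samples per epoch
    step = int(win * (1 - overlap))           # hop size (50% overlap -> half a window)
    starts = range(0, len(x) - win + 1, step)
    return np.stack([x[s:s + win] for s in starts])

# e.g., 60 s of signal at an assumed 128 Hz sampling rate
epochs = epoch_signal(np.zeros(60 * 128), fs=128)
```

With a 4 s window and 50% overlap, a 60 s recording at 128 Hz yields 29 epochs of 512 samples each, and the selected features are then computed per epoch.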

Methodology
To create our model for valence and arousal prediction, we used the DEAP dataset as ground-truth, since it has been widely used and validated by the research community.
To that end, we performed an analysis of multiple factors (regressors, brain asymmetry, waves and features) to identify the best configurations for our prediction models. The steps followed are represented in Figure 3 and described below:
1. Regressors Selection: In this step, we compared the accuracy of the seven regressors selected for our analysis. For that, we used a feature vector composed of all the features computed for all electrodes and all waves (but without the asymmetry features);
2. Brain Asymmetry: After identifying the best regressors, we checked whether the use of brain asymmetry could improve the results, and if so, which type of asymmetry and waves produced the best results;
3. Features by Waves: We verified the accuracy achieved using each feature (for each brain wave) individually. To perform feature selection, we used forward selection (a wrapper method) [56], where we started by ranking the features based on their PCC values, and then added them one by one to the model, until the results no longer improved;
4. Regressor Optimization: Finally, after identifying the best features, waves and regressors, we optimized the parameters of the selected regressors.
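The forward-selection step can be sketched as follows (a minimal NumPy illustration; the `evaluate` callback stands in for the cross-validated regressor score used in this work, and the demo data and scoring function are purely illustrative):

```python
import numpy as np

def forward_select(features, y, evaluate):
    """Wrapper-style forward selection: rank features by |PCC| with the
    target, then add them one by one while the score keeps improving."""
    ranked = sorted(features, key=lambda f: -abs(np.corrcoef(features[f], y)[0, 1]))
    selected, best = [], -np.inf
    for name in ranked:
        score = evaluate(selected + [name])
        if score > best:                       # keep the feature only if it helps
            selected, best = selected + [name], score
        else:
            break                              # stop once results no longer improve
    return selected

# illustrative demo: one informative feature, one pure-noise feature
rng = np.random.default_rng(0)
y = rng.normal(size=200)
feats = {"good": y + 0.1 * rng.normal(size=200), "noise": rng.normal(size=200)}
score = lambda s: abs(np.corrcoef(sum(feats[f] for f in s), y)[0, 1])
chosen = forward_select(feats, y, score)
```

In the demo, adding the noise feature degrades the score, so selection stops with only the informative feature.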

To train the several models on each step, we assigned to the feature vector extracted from each EEG epoch the self-reported values of valence and arousal present in the DEAP dataset. Since these values were reported for the overall video, all epochs from a video were annotated with the same valence and arousal values.

Selection of Best Regressors
In all the experiments, we considered all the participants, and used a subject-independent setup through a 10-fold cross-validation. We randomly divided the dataset into 10 folds, then in turn, nine folds were used for training and the other one for testing. We report accuracy in terms of the three metrics (PCC, MAE, RMSE) as the average from the 10 iterations.
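The fold construction can be sketched as follows (a minimal NumPy illustration; the seed and sample count are illustrative):

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Randomly split n sample indices into k (near-)equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(1000)
# in each of the 10 iterations, one fold is held out for testing
test_idx = folds[0]
train_idx = np.concatenate(folds[1:])
```

Each iteration then trains on nine folds and tests on the held-out one, and the reported metrics are averaged over the ten iterations.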

Regressors Selection
We started our analysis by identifying the best regression methods among the seven enumerated in Section 3.4: Additive Regression (AR), Decision Tree (DT), K-Nearest Neighbors (KNN), Linear Regression (LR), Random Forest (RF) and Support Vector Machine for Regression (SVR). The latter was tested with two different kernels, linear and Radial Basis Function (RBF). We used the versions of these machine learning algorithms present in the Weka 3.8 software [57], with their default parameters.
We performed three tests for each regressor, one for each frequency band (alpha, beta and gamma). The feature vector used had a dimension of 256, composed of 8 features for each of the 32 channels: the three Hjorth parameters (H1, H2, H3), Spectral Entropy (SE), Wavelet Energy (WP), Wavelet Entropy (WE), IMF Energy (IMFP) and IMF Entropy (IMFE).
As shown in Table 3, the RF and KNN regressors achieved the best results overall for the three bands. Although the DT presents better results than the KNN for the gamma band, it is worse for the other two bands. SVR, AR and LR present the worst results for all bands. As such, we selected RF and KNN (with K = 1) to be used in the remaining tests.

Asymmetry Features
The brain asymmetry concept has been widely used for emotion recognition, particularly for valence classification. Here, we tested two types of asymmetry: differential and rational. The former was calculated as the difference in feature values between homologous channels of opposing hemispheres (e.g., F3-F4, F7-F8, etc.). The rational asymmetry was calculated by dividing the feature values along the same homologous channels. The resulting feature vector, for both asymmetries, had a dimension of 112 (8 features for 14 asymmetry channels). We also combined the two feature vectors into a single one.
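The two asymmetry types can be sketched as follows (a minimal NumPy illustration; the channel pairs shown are an illustrative subset of the 14 homologous pairs available with the DEAP montage):

```python
import numpy as np

# illustrative subset of homologous 10-20 channel pairs (left, right)
PAIRS = [("F3", "F4"), ("F7", "F8"), ("C3", "C4"), ("P3", "P4"), ("O1", "O2")]

def asymmetry_features(feat, pairs=PAIRS):
    """Differential (left - right) and rational (left / right) asymmetry
    of a per-channel feature; `feat` maps channel name -> feature value."""
    dasm = np.array([feat[l] - feat[r] for l, r in pairs])
    rasm = np.array([feat[l] / feat[r] for l, r in pairs])
    return dasm, rasm

dasm, rasm = asymmetry_features({"F3": 2.0, "F4": 1.0}, pairs=[("F3", "F4")])
```

The same computation is repeated for every feature, giving the 8 features × 14 pairs = 112-dimensional asymmetry vectors described above.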
From Table 4, we can see that the differential asymmetry of the alpha waves produced the best results for the prediction of valence, using both regressors. The best results for arousal were achieved using a combination of both asymmetries of the gamma waves, using RF, and the rational asymmetry of the beta spectrum, using KNN. In both predictions (valence and arousal) the RF regressor achieved the best results.

Features by Waves
To identify the best features per wave (i.e., the wave-feature pairs), we analyzed each pair individually, by training and testing a model for valence and another for arousal. Figure 4 shows the PCC values for the eight features per wave and for the two regressors. As we can see, the Wavelet Energy (WP) and the first Hjorth parameter (H1-Activity) produced the best results. On the other hand, the Spectral Entropy (SE) and the third Hjorth parameter (H3-Complexity) generated the worst results. Overall, beta and gamma features yield the best results for both regressors.
To identify the set of wave-feature pairs that produced the best values, we first ranked them by PCC value, and then added them one by one, starting from those with higher PCC values, and stopping when the inclusion of a new feature did not improve the results any further. We did this for each band separately and for band combinations, as well as for both regressors. The resulting set of waves and features for each combination is presented in Table 5. Considering each wave alone, gamma and beta exhibit the best results in either regressor, for both valence and arousal. In general, KNN required more features than RF to achieve similar results, when considering single waves.
The combination of features from several waves improved the results, both for valence and arousal. For KNN, the combination of the three waves generated the best results, while for RF the addition of the alpha waves with other waves did not bring any improvement.
As we have seen in Table 4, the alpha differential asymmetry yielded the best results for valence prediction. Thus, we studied it to identify its best features. In Table 5, we can observe that the alpha differential asymmetry (α DA ) generated much better results for valence than for arousal, as expected according to the literature.
Finally, we joined the alpha differential asymmetry with the best combination of waves. As we can see, we achieved the best results for arousal with this set of features. For valence, the values of PCC did not improve, but the MAE value for the KNN regressor was the smallest.
Before we selected the best features for each model, we performed a small test with the AMIGOS dataset. The test revealed that the combination of the alpha differential asymmetry with features from other waves yielded better results than using the asymmetry features only. Consequently, to attain a more generalized model, that is, one that would be accurate for all datasets with similar EEG characteristics and stimuli elicitation as the ones tested, we chose the following models for valence and arousal prediction: • KNN: All features except the Spectral Entropy (SE) from the three bands, plus the alpha differential asymmetry, with all features except the third Hjorth parameter (H3). This yields a feature vector of dimension 770 for DEAP (7 features × 3 waves × 32 channels + 7 features from alpha waves × 14 asymmetry channels) and 343 for AMIGOS and DREAMER (7 features × 3 waves × 14 channels + 7 features from alpha waves × 7 asymmetry channels).
• RF: First Hjorth parameter (H1) and the Wavelet Energy (WP) from the beta and gamma waves, plus the alpha differential asymmetry from the first Hjorth parameter (H1), the Wavelet Energy (WP) and Wavelet Entropy (WE). The resulting feature vector has a dimension of 170 for DEAP (2 features × 2 waves × 32 channels + 3 features from alpha waves × 14 asymmetry channels) and 77 for AMIGOS and DREAMER (2 features × 2 waves × 14 channels + 3 features from alpha waves × 7 asymmetry channels).

Optimizing the Models
After identifying the best configuration for the four models, we performed hyperparameter tuning to optimize the regressors. For KNN, we tested several K values (1, 3, 5, 7, 11 and 21), and used the Manhattan distance in the neighbor search instead of the Euclidean distance. For RF, we tested 50, 500, 750 and 1000 trees (instead of the default 100).
From Table 6, we can see that KNN yielded the best results when K = 1. For RF, the results for 500, 750 and 1000 trees are equal for MAE and RMSE, and show a very small difference for PCC. Therefore, we opted for 500 trees, since this option has a lower computational cost.
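The selected KNN configuration (K = 1 with the Manhattan distance) can be sketched as follows (a minimal NumPy illustration rather than the Weka implementation used in this work):

```python
import numpy as np

def knn1_manhattan_predict(X_train, y_train, X_test):
    """1-nearest-neighbour regression with the Manhattan (L1) distance."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    preds = []
    for x in np.asarray(X_test, dtype=float):
        dists = np.sum(np.abs(X_train - x), axis=1)  # L1 distance to each training sample
        preds.append(y_train[np.argmin(dists)])      # copy the closest sample's value
    return np.array(preds)

X_tr = [[0.0, 0.0], [10.0, 10.0]]
y_tr = [0.2, 0.8]  # e.g., self-reported valence values
pred = knn1_manhattan_predict(X_tr, y_tr, [[1.0, 0.0], [9.0, 10.0]])
```

With K = 1, each prediction is simply the annotation of the single closest training epoch in feature space.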

Experimental Evaluation
To assess the quality and generalization of the two models identified in the previous section, we conducted two experiments. One to evaluate the accuracy of the predicted values of valence and arousal, and another to assess the classification accuracy using the predicted values.

Setup
We conducted the evaluation using three datasets, DEAP, AMIGOS and DREAMER. For each dataset, we created the models using the settings identified in the previous section. In this way, we can understand how generalizable these settings are. We assessed the quality of the proposed models for two tasks: prediction and classification. In the former, we evaluated the models' ability to predict the valence and arousal values, while in the latter, we measured the accuracy in identifying two classes (low/high arousal, low/high valence) and four classes (quadrants), using the estimated values to perform the classification tasks. In all the experiments, we used a subject-independent setup with a 10-fold cross-validation approach.
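The 10-fold cross-validation setup can be sketched as follows, again assuming scikit-learn and using synthetic data in place of the real EEG features:

```python
# Minimal sketch of the subject-independent 10-fold cross-validation setup.
# Out-of-fold predictions mean every trial is predicted by a model that
# never saw it during training.
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))            # placeholder feature vectors
y_valence = rng.uniform(0, 1, size=100)   # placeholder valence ratings

model = KNeighborsRegressor(n_neighbors=1, p=1)  # K=1, Manhattan distance
cv = KFold(n_splits=10, shuffle=True, random_state=1)

# One predicted valence value per trial, collected across the 10 folds.
y_pred = cross_val_predict(model, X, y_valence, cv=cv)
```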

Prediction Results
As we can see in Table 7, both models (KNN and RF) achieved very good results for the three datasets, with PCC values greater than 0.755 and MAE values smaller than 0.158. In fact, although the best models were found using the DEAP dataset, the prediction quality for the two unseen datasets (AMIGOS and DREAMER) is even better than for DEAP. This shows that the two identified models are generic enough to deal with new data.
Overall, results show that these models can predict valence and arousal with low error and a strong correlation with the expected values. The KNN presents the lowest errors (MAE) in all situations, while for PCC and RMSE both regressors present very similar values.
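The three figures of merit used throughout (MAE, RMSE and PCC) can be computed directly from the predicted and expected values; toy arrays stand in for model output here:

```python
# Mean absolute error, root mean squared error, and Pearson correlation
# coefficient between predicted and expected valence/arousal values.
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pcc(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])

y_true = np.array([0.2, 0.5, 0.8, 0.9])   # expected ratings (toy data)
y_pred = np.array([0.25, 0.45, 0.75, 0.95])  # predicted ratings (toy data)
```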

Classification Results
The final step of our evaluation consisted of evaluating the models in two subject-independent emotion classification tasks: one where we distinguish between low/high arousal and low/high valence (two classes), and another where we identify the quadrant in the valence/arousal space (four classes). To that end, we used the pair of predicted valence and arousal values.

Arousal and Valence Binary Classification
In this classification task, we computed the accuracy rate for arousal and valence by averaging the classification rates for their low and high classes. We obtained these values for both regressors (KNN and RF) using the three datasets (see Table 8). As we can see, the KNN model achieved the best results for two datasets (DEAP and AMIGOS), while RF was slightly better than KNN on the DREAMER dataset. Thus, overall, we consider the KNN model to be the best one. In Table 9, we compare the results achieved by the best identified model (KNN) with several recent works using the DEAP dataset. As we can see, our model achieved the highest classification rate, with values around 89.8% for both valence and arousal.
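Deriving the binary labels and the per-class averaged accuracy can be sketched as below. The 0.5 threshold assumes ratings normalized to [0, 1]; both the threshold and the helper names are illustrative assumptions, not the paper's exact procedure:

```python
# Turn continuous predictions into low/high labels, then average the
# classification rates of the two classes, as described above.
import numpy as np

def to_binary(values, threshold=0.5):
    # 0 = low, 1 = high; threshold assumes ratings normalized to [0, 1].
    return (np.asarray(values) >= threshold).astype(int)

def averaged_accuracy(true_cls, pred_cls):
    # Mean of the per-class (low and high) classification rates.
    rates = [np.mean(pred_cls[true_cls == c] == c) for c in (0, 1)]
    return float(np.mean(rates))

true_arousal = to_binary([0.2, 0.7, 0.9, 0.4])    # toy expected ratings
pred_arousal = to_binary([0.3, 0.8, 0.6, 0.35])   # toy predicted ratings
```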

Arousal and Valence Quadrants Classification
In the second classification task, we identified the quadrant where the pair of predicted valence and arousal values was located. The classification results for KNN, RF and a mix of both (KNN for arousal and RF for valence, due to the PCC values in Table 6) for the three datasets are shown in Figure 5. We present the results in the form of confusion matrices, where rows represent the true quadrant and columns represent the predicted quadrant. It can be seen that the KNN-based models generated the best results for all datasets. This was foreseeable due to the small MAE values that these models displayed earlier (Table 6). We achieved better accuracy results for the two unseen datasets (AMIGOS and DREAMER), which shows that the features, brain waves and machine learning methods identified are generic enough to be used on unknown data. Finally, we compared the classification results of the best identified model with some recent approaches that perform a four-class classification using the DEAP dataset. As we can see in Table 10, our best model (KNN) presents the best result, achieving an accuracy of 84.4%.
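Mapping a predicted (valence, arousal) pair to a quadrant is a simple thresholding step. The sketch below again assumes ratings in [0, 1] split at 0.5; the quadrant numbering (Q1 = high valence/high arousal, counter-clockwise) is a common convention for the circumplex model, not necessarily the paper's exact labeling:

```python
# Map a predicted (valence, arousal) pair to a quadrant of the
# valence/arousal space, splitting both axes at the midpoint.

def quadrant(valence, arousal, mid=0.5):
    if valence >= mid and arousal >= mid:
        return 1  # high valence, high arousal
    if valence < mid and arousal >= mid:
        return 2  # low valence, high arousal
    if valence < mid and arousal < mid:
        return 3  # low valence, low arousal
    return 4      # high valence, low arousal

# e.g. a predicted pair of (0.8, 0.3) falls in quadrant 4.
```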

Discussion
The main goal of this work was to study the features, brain waves, and regressors that would ensure the creation of accurate and generalizable models for identifying emotional states from EEG, through the prediction of exact values for valence and arousal.
Our search for the best prediction models started with the comparison of several machine learning approaches, chosen based on their regular use and overall effectiveness and efficiency. RF and KNN achieved the highest PCC values and the lowest errors (MAE and RMSE), when compared to the remaining ones. Additionally, these regressors are relatively fast, making them good options for interactive applications where results should be produced in real-time.
The analysis of the features revealed that the first Hjorth parameter (H1), Wavelet Energy (WP) and IMF power (IMFP) generated the best accuracies on all frequency bands tested. These are features heavily correlated with power spectrum density. The other features, although not as relevant, also proved to be significant, as their inclusion in the KNN-based models improved the results. The only exception was the Spectral Entropy (SE), which deteriorated the results whenever it was included. The beta- and gamma-based features generated the best accuracies, which is consistent with the state-of-the-art.
The inclusion of the differential asymmetry of the alpha spectrum (αDA) considerably improved the valence prediction, as shown in Table 5. This corroborates the valence hypothesis, which states that the left hemisphere is dominant for processing positive emotions, while the right is dominant for negative ones [63].
After identifying the best features, we optimized the machine learning regressors by testing different values for their parameters (number of trees for RF, and K for KNN). For RF, we identified 500 trees, and for KNN, K = 1. We also changed the KNN distance metric to the Manhattan distance, which improved the results.
To compare our results with previous approaches, we transformed the predicted valence and arousal values into high/low arousal and high/low valence (two classes) and the corresponding quadrant of the circumplex model of affect (four classes). In both classification scenarios, the identified KNN model achieved the highest accuracy, obtaining a value of 89.8% for two classes and 84.4% for four classes. These results are even more encouraging if we consider that they were obtained by predicting the arousal and valence values, rather than directly from a classifier trained to identify classes (as the related work did). This means that we can accurately assess the emotional level of individuals by predicting arousal and valence values, and, if necessary, we can also identify discrete emotional classes.
From the achieved results, we can conclude that EEG can be used for predicting exact valence and arousal values (RQ1), and that the typical features, brain waves and machine learning models used for classification of emotions can be used for predicting exact valence and arousal values (RQ2). Finally, the two classification scenarios where we converted the predicted valence and arousal values into classes showed that our proposed model produces high quality results in classification tasks (RQ3).

Conclusions
In this work, we investigated the best combination of features, brain waves and regressors to build the best possible model to predict the exact valence and arousal values. We identified KNN and RF as the best machine learning methods for regression. In general, the features extracted within the beta and gamma frequencies were the most accurate, and the brain asymmetry concept of the alpha band proved to be useful for predicting valence. In the end, the KNN-based model, using all features except the Spectral Entropy, achieved the best accuracy for arousal and valence prediction, as well as for classification. A comparison with previous works, using the DEAP dataset, shows that the identified model presents the highest accuracies for two and four classes, achieving 89.8% and 84.4% respectively.
As future work, one can explore the use of these features and regressors in the analysis and classification of other physiological signals, since, according to [64], entropy features combined with RF showed good results for analyzing ECG signals.