Analysis of Personality and EEG Features in Emotion Recognition Using Machine Learning Techniques to Classify Arousal and Valence Labels

Abstract: We analyzed the contribution of electroencephalogram (EEG) data, age, sex, and personality traits to emotion recognition processes, through the classification of arousal, valence, and discrete emotion labels, using feature selection techniques and machine learning classifiers. EEG traits and age, sex, and personality traits were retrieved from a well-known dataset (AMIGOS), and two sets of traits were built to analyze the classification performance. We found that age, sex, and personality traits were not significantly associated with the classification of arousal, valence, and discrete emotions using machine learning. The added EEG features increased the classification accuracies (compared with the original report) for arousal and valence labels. Classification of arousal and valence labels performed above chance level; however, accuracy did not exceed 70% in the different tested scenarios. For discrete emotions, the mean accuracies and the mean area under the curve scores were higher than chance; however, F1 scores were low, implying that several false positives and false negatives were present. This study highlights the performance of EEG traits, age, sex, and personality traits in emotion classifiers. These findings could help to understand the relationship between these traits at a technological and data level for personalized human-computer interaction systems.


Introduction
Emotions influence how people process information and make decisions, and they shape their behavior when they interact with their surroundings. When interactions between humans and systems occur, physical, cognitive, and social connections are integrated, including empathetic interactions to enhance users' experience in varied fields [1]. For new human-computer interaction (HCI) paradigms, in which systems are in constant contact with the users, it is important to identify and recognize users' emotional states to improve interactions between digital systems and the users with high recognition accuracy and provide a more personalized experience [2].
From an HCI perspective, it is important to find new ways in which systems can be more personalized to the user and to achieve better cooperation in fields like assistive and companion computing using physiological signals like electroencephalograms (EEG), a useful tool that describes how cognition and emotional behavior are related at a physiological level [3][4][5]. Owing to the development of new technology and portable devices to measure EEG, the research is expanding beyond medical applications to areas like e-learning, commerce, and entertainment.
Research in emotion recognition using EEG as the main source of information has focused on how to achieve better performance and accuracy in the emotion identification and classification

HCI and Personality Traits
Personality is a behavior pattern that is maintained over time and context, differentiating one person from another [23]. To measure and identify an individual's personality, there is a wide variety of psychometric tests that correspond to various psychological theories about behavioral patterns and innate characteristics. One of these theories is Eysenck's personality model, which posits three independent factors to describe personality: psychoticism, extraversion, and neuroticism (PEN) [24]. Another widely used personality model is the five factor model (FFM). Paul Costa and Robert McCrae [25] devised the FFM and the Neuroticism-Extraversion-Openness Five-Factor Inventory questionnaire. They posited that there are five personality traits: extraversion (social vs. reserved), agreeableness (compassionate vs. dispassionate and suspicious), conscientiousness (dutiful vs. easy-going), neuroticism or emotional stability (nervous vs. confident), and openness to experience (curious vs. cautious). These traits describe the frequency or intensity of feelings, thoughts, or behaviors of an individual compared with other people. In this model, an individual's personality is described through these traits in varying degrees. In contrast, Jeffrey Alan Gray built the Bio-psychological Theory of Personality (behavioral inhibition system (BIS)/behavioral activation system (BAS)) [26]. This is a model of the general biological processes relevant to human psychology, behavior, and personality. This model describes the existence of two brain-based systems for controlling a person's interactions with their environment: the BIS and BAS.
The FFM is considered a standard in science; however, based on statistical analysis of the data, some researchers argue that the FFM should be expanded to include a sixth trait: honesty-humility. The HEXACO model of personality conceptualizes human personality considering this sixth dimension: honesty-humility (H), emotionality (E), extraversion (X), agreeableness (A), conscientiousness (C), and openness to experience (O). Although the HEXACO model has received growing support from scientists, the necessity of this sixth trait is still a matter of debate [27].
Emotion-personality relationships have been studied in psychology for a long time; however, their relationship in HCI is still under discussion. Some studies used varied kinds of behavioral data to identify emotional states, for example, using participants' text and annotation [28,29], identifying personal characteristics in voice and speech [30][31][32], body language [33,34], and digital footprints [35,36]. In the next section, we present some works that aimed to include personality information in different HCI applications.

Related Works
In [30], an approach to detect a user's interaction style in spoken conversation was presented. It combined emotional labeling of a conversation-based affective speech corpus of 53 students of both sexes with the International Personality Item Pool to measure personality for intelligent speech-based HCIs. Callejas-Cuervo and colleagues [37] proposed a system architecture where videogames can stimulate participants to extract characteristics that can correlate with information from emotion and personality traits, using electrocardiogram (ECG), galvanic skin response (GSR), electromyography (EMG) signals, and the PEN model together with Russell's affective model. Furthermore, in [28], the standard cognitive appraisal model (OCC emotion model) and the FFM personality model were combined in a natural language processing tool to analyze language for affect. In [38], researchers proposed an intensity-based affective model that incorporates the FFM for personality and the OCC model for affect, from predetermined answers related to images and labels; then, they performed personality processing and modeling to predict emotion.
Guo and Ma [35] proposed a personality modeling system based on big data from different sources, including participants' location, heart rate, and browser data, to describe a person's conditions and activity and accurately identify participants' personality. They proposed a human model of four layers: state, pattern related to daily activity, emotions, and personality. Wei and colleagues [39] focused on "apparent personality analysis", using short human-centered video sequences and developing an algorithm for recognizing personality traits from those videos using deep modal regression. In [40], emotion recognition and varied characteristics (i.e., personality traits, age, and sex) were used to create a car interface that takes actions when it identifies an emotional reaction (neutrality, panic/fear, frustration/anger, and boredom/sleepiness) in participants using GSR, temperature, and heart rate. At the time of the publication, one major concern was that the system could not identify the scenarios in which the emotion occurred. They noted that the incorporation of demographic characteristics and personality traits can enhance and increase the accuracy of emotion recognition; however, no data involving these were provided.
Robot interaction with people is another vast field of interest for emotion recognition and personality traits. Anzalone and colleagues [41] examined extraversion in human-humanoid interactions using nonverbal behavior (i.e., upper-body movements and interaction duration). Additionally, Bhin and colleagues discussed building an automated psychophysical personality data acquisition system for human-robot interaction under the premise that, to build more natural interaction between humans and robots, systems need the ability to recognize the psychological state (i.e., personality) of users [42]. They proposed a system for personality recognition using nonverbal cues: a commercial webcam to record participants' body movement and facial expression, a microphone to record speech signals, a wristband (Microsoft Band or Empatica E4) to obtain physiological signals such as heart rate and body temperature, and the FFM (BFI-K-44) to measure personality.
The influence of personality traits in works regarding affective computing and HCI has been expanding in recent years; these works benefit from examining emotion, mood, and personality by employing physiological data, facial and audio recognition, body movement, etc. Works that use EEG signals as a main source of information have also been increasing in recent years. The following section summarizes their findings.

EEG-Related Works
Cai and colleagues [43] evaluated the behavior and personality of 42 participants using physiological data from wearable devices that measured heart rate, respiration, and EEG while participants watched a 20-minute video or gave an 8-minute presentation. Their main objective was not to predict personality or emotion, but to correlate these two characteristics and analyze the relationship between personality traits and behavior through the influence of emotional states. They used Pearson's correlation coefficient to determine the relationship between respiration rate and personality traits, and Spearman's rank correlation coefficient to determine the relationship between facial expressions and personality traits under different emotion states. They found evidence of said correlations; however, their results were not definitive.
Rukavina and colleagues [2] examined personality, sex, age, and gender roles to improve emotion recognition accuracy. Age and personality dimensions were correlated with all extracted features during each of the five affective states or core affects from the valence, arousal, and dominance space: for each core affect, they presented two blocks of 20 pictures, 10 pictures in a 2-minute time window (20-second fixation cross, 20-second presentation of pictures belonging to the same core affect, and 20-second fixation cross), using a total of 100 pictures. First, they performed a correlation analysis to consider only meaningful variables for the classification analysis. They concluded that sex and age were significantly correlated with affective states; however, they did not find a correlation between personality traits and affective states. One possible explanation was the stringent significance level imposed by the Bonferroni correction (p < 0.007). However, their experiment was limited because the timing and methods of presenting the emotional stimuli to the participants can affect the outcome.
Miranda-Correa and Patras [44] proposed a multi-task cascade deep learning approach and performed binary classification for emotional states (arousal and valence) and personal factors (personality, mood, and social interaction) from EEG signals. Forty participants watched short affective videos and thirty-seven participants watched long videos (ranging from 51 seconds to nearly 24 minutes), in individual or group sessions. The researchers analyzed the time and frequency domain features from EEG data through segments (20-second time window) to obtain the affective levels (arousal and valence) using convolutional neural networks (CNNs) and recurrent neural networks (RNNs); then, they estimated the Big Five factors and their relationship with mood, i.e., the Positive Affect and Negative Affect Schedule (PANAS), from n consecutive segments' affective levels using a second deep network with a recurrent layer of eight units and a "sigmoid" output function. Using the fusion affect sub-network (from CNNs and RNNs), they achieved F1-scores of 0.59 and 0.61 (p < 0.001) for valence and arousal recognition, respectively.
Mittermeier and colleagues [45] studied whether there is an emotion-specific neural correlation between positive and negative auditory emotional stimuli and attention through auditory-evoked potentials (AEPs), and whether there is a specific relationship between AEPs evoked by emotional stimuli and the personality dimension extraversion-introversion. Differing from the other studies, this work focuses on auditory emotional stimuli to analyze the correlation between reaction times to the stimuli, evoked potentials, and personality (extraversion). They found that extraversion correlated with the EPN 170 amplitude in the emotional paradigms. Compared to participants in the introverted subgroup, extroverted persons exhibited significantly higher EPN 170 amplitudes in the P3 channel for emotional paradigms (syllables (Pz channel) and words (P3 channel)) but not in the tones task.
Subramanian and colleagues [46] built a multimodal database from 58 participants for implicit personality and affect recognition using commercial physiological sensors to understand the relationship between emotional attributes and personality traits and characterize both by physiological responses. The paper described the influence of personality differences on users' affective behavior using the ASCERTAIN database to understand the relationship between emotional attributes from an arousal/valence model and the Big Five personality model by measuring users' physiological responses. Their main goal was to assess personality traits via affective physiological responses instead of questionnaires. They compiled valence and arousal ratings reflecting the user's affective impression: a seven-point scale was used with a -3 (very negative) to 3 (very positive) scale for valence, and a 0 (very boring) to 6 (very exciting) scale for arousal. Ratings concerning engagement (did not pay attention to totally attentive), liking (I hated it to I loved it), and familiarity (never seen it before to remember it very well) were also acquired with the five traits from the FFM. They found that arousal was moderately correlated with extraversion, while valence correlated strongly with liking (0.68, p < 0.05). GSR features obtained higher recognition performance for both arousal and valence (0.68 with the Naïve Bayes (NB) classifier), while ECG features obtained the worst recognition performance (0.56 for valence and 0.57 for arousal using a Support Vector Machine (SVM)). EEG features had better performance recognizing arousal (0.61) as compared to valence. GSR, ECG, and EMG achieved better recognition of valence. Peripheral (ECG+GSR) features performed better than unimodal features for arousal recognition, while the best multimodal F1-score (0.71) was obtained for valence. Finally, comparing the two employed classifiers, NB achieved better recognition performance than linear SVM for arousal (0.69 using peripheral signals) and valence (0.68 for the GSR signal).
Mueller and Kuchinke [47] examined individual differences in the implicit processing of emotional words (happy, neutral, and fear-related) in a lexical decision task, i.e., deciding via a button press whether a letter string is a correct word or a non-word. They argued that several participant-specific variables (personality traits and neurological foundation) are known to modulate processing of emotional information. The main task comprised 35 trials in pseudo-randomized order displaying faces of five individuals in each of different emotional expressions. A correlation analysis was performed between happy and neutral, happy and fear-related, and fear-related and neutral. Difference scores were calculated for response times (RTs), error rates (ERs), and drift rates (DRs), which correlated with all nine variables of emotion processing (RTs, ERs, and DRs for Happy-Fear, Happy-Neutral, and Fear-Neutral). Additionally, they performed three multiple linear regression analyses for RTs, ERs, and DRs as dependent variables to predict individual emotional effects. Results revealed that BAS-Drive was the variable that explained most of the variance regarding Happy to Fear RT (H-F RT) differences. RTs for happy words were generally shorter than RTs for fear-related words, resulting in negative difference scores on average. The negative relationship between H-F RT differences and BAS-Drive scores revealed that participants with larger BAS-Drive scores showed greater H-F RT differences. In contrast, BAS-Drive scores were positively correlated with Fear to Neutral RTs.
Although the literature shows a relationship between emotion, mood, affective states, and personality [48][49][50], how to effectively use demographic characteristics and personality traits to ascertain emotion recognition remains unclear. There is no standard for choosing emotion or personality models in recognition techniques, and variables and classification approaches differ between studies, thus yielding inconsistent results. This is understandable given how new this field is, and each outcome offers novel insight into new approaches that can be developed. The presented papers do not provide conclusive information about a strong correlation between emotion classification and demographic characteristics and personality traits. It is still not clear how individual personality traits can be measured from psychological signals and emotion, even though the literature indicates that they correlate at a biological level [24].
For this study, we aim to test the hypothesis that age, sex, and personality traits can improve the classification accuracies for arousal and valence levels when they are used alongside EEG data for emotion recognition processes by machine learning algorithms. Using the information from the AMIGOS dataset [51], we analyzed (1) the contribution of the different EEG traits, demographic characteristics, and personality traits in the classification process of arousal, valence, and discrete emotion labels using varied machine learning techniques, (2) the contribution of the demographic characteristics and personality traits in emotion classification, as relevant information related to behavior and individual characteristics, and (3) the performance of simple classification models with new EEG traits that were not considered in the AMIGOS study.

Materials and Methods
Emotion recognition from EEG signals commonly follows the brain-computer interface cycle. We followed the basic phases proposed in [52] and used across different works [53][54][55] to analyze the performance of the different classifiers implemented. The first phases (implementing the experiment in which participants are exposed to the emotional stimuli, recording the EEG signals, and preprocessing the raw data) were retrieved from the AMIGOS dataset. The phases related to feature extraction and classifier implementation were performed by the authors of this work.

AMIGOS Dataset Experiment
AMIGOS is a dataset to study the relationship between affect, personality, and mood [51]. The dataset consists of multimodal recordings of participants and their responses to fragments of emotional videos. Participants took part in two experimental setups while watching long and short videos: first, in an individual scenario, and, second, in a group scenario with other participants. While watching the videos, EEG, ECG, GSR, frontal high-definition video, and both RGB and depth full body videos were recorded. Personality (Big-Five), mood (PANAS), internal annotation (participants' self-assessment affective levels), and external annotation (off-line annotations by three annotators; valence and arousal scales) scores were obtained. The participants read and signed a consent form to take part in the study.
From the AMIGOS dataset, we used information from the individual short-videos scenario, in which 40 participants (male = 27, female = 13, aged 21-40 years, mean age = 28.3 years) watched 16 videos (duration < 250 s), four from each combination of high and low arousal-valence emotional levels: high arousal and high valence (HAHV), high arousal and low valence (HALV), low arousal and high valence (LAHV), and low arousal and low valence (LALV). The experiment consisted of an initial self-assessment session for arousal, valence, and dominance scores, as well as a selection of basic emotions (neutral, happiness, sadness, surprise, fear, anger, and disgust) that participants felt before any stimuli were shown. Next, 16 videos were presented in a random order in 16 trials, each consisting of (1) a five-second baseline recording showing a fixation cross; (2) the display of one video; and (3) self-assessment of arousal, valence, dominance, mood, liking, and familiarity, as well as the selection of basic emotions. After the 16 trials, the recording session ended.

AMIGOS Features
For the input features, we used the 14 EEG signals from the Emotiv EPOC Neuroheadset as the information source, which were recorded at a 128-Hz sampling rate and 14-bit resolution (electrode distribution is shown in Figure 1). We also used the demographic characteristics (age and sex) and personality traits, which were acquired before the experiment using an online form.

• We also utilized age, sex, and the Big Five personality traits [56] (i.e., 7 features).
A total of 112 features from the AMIGOS dataset were used in this study.

Added EEG Features
For this study, we calculated the fractal dimension (FD) and the differential entropy (DE) for each of the electrodes in the five frequency bands mentioned before. Moreover, the rational asymmetry (RASM) and differential asymmetry (DASM) for each of the seven pairs of electrodes in the five bands were calculated (70 features). In previous literature [57][58][59], these EEG traits are related to participants' emotional responses, and reports about EEG emotion recognition used the same kind of features to obtain classification above the chance level [14]. We therefore included these EEG traits to analyze whether they can improve the classification performance in contrast with the ones used in the AMIGOS work.
• FD is a measure of signal complexity. Because EEG signals are nonlinear and chaotic, an FD model can be applied in EEG data analysis [60]. We computed FD using the Higuchi algorithm for each of the 14 EEG signals (14 features).
• DE can be defined as the entropy of a continuous random variable and is used to measure its complexity [61]. DE is equivalent to the logarithm of the energy spectrum (ES) in a certain frequency band for a fixed-length EEG sequence [62]. We calculated ES as the average energy of the EEG signals in the five frequency bands for each electrode and applied the logarithm to obtain DE (70 features).
• DASM and RASM were calculated as the differences and ratios between the DE of the seven pairs of asymmetric electrodes (35 features for each trait).
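The feature definitions above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the band edges in `BANDS`, the channel order in `CHANNELS`, the left/right pairing in `PAIRS`, and the Higuchi parameter `k_max = 8` are not given in this section and are chosen here for the example. ES is approximated by the mean FFT power inside each band.

```python
import numpy as np

FS = 128  # sampling rate stated in the text (Hz)

# ASSUMPTION: illustrative band edges in Hz; this section does not list them.
BANDS = {"theta": (4, 8), "slow_alpha": (8, 10), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

# ASSUMPTION: row order of the 14 channels and the seven left/right pairs.
CHANNELS = ["AF3", "F7", "F3", "FC5", "T7", "P7", "O1",
            "O2", "P8", "T8", "FC6", "F4", "F8", "AF4"]
PAIRS = [("AF3", "AF4"), ("F7", "F8"), ("F3", "F4"), ("FC5", "FC6"),
         ("T7", "T8"), ("P7", "P8"), ("O1", "O2")]


def higuchi_fd(signal, k_max=8):
    """Higuchi fractal dimension: slope of log L(k) against log(1/k)."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)                 # x[m], x[m+k], x[m+2k], ...
            dist = np.abs(np.diff(x[idx])).sum()     # curve length of the sub-series
            norm = (n - 1) / ((idx.size - 1) * k)    # Higuchi normalisation factor
            lengths.append(dist * norm / k)
        log_inv_k.append(np.log(1.0 / k))
        log_len.append(np.log(np.mean(lengths)))
    slope, _ = np.polyfit(log_inv_k, log_len, 1)
    return slope


def differential_entropy(signal, fs, band):
    """DE as the logarithm of the average spectral energy inside one band."""
    x = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    mask = (freqs >= band[0]) & (freqs < band[1])
    return np.log(power[mask].mean())


def extract_added_features(eeg, fs=FS, k_max=8):
    """Return FD (14), DE (70), DASM (35) and RASM (35) as one vector of 154."""
    eeg = np.asarray(eeg, dtype=float)               # shape: (14, n_samples)
    fd = np.array([higuchi_fd(ch, k_max) for ch in eeg])
    de = np.array([[differential_entropy(ch, fs, b) for b in BANDS.values()]
                   for ch in eeg])                   # shape: (14, 5)
    left = [CHANNELS.index(l) for l, _ in PAIRS]
    right = [CHANNELS.index(r) for _, r in PAIRS]
    dasm = de[left] - de[right]                      # differences of DE
    rasm = de[left] / de[right]                      # ratios of DE
    return np.concatenate([fd, de.ravel(), dasm.ravel(), rasm.ravel()])
```

Calling `extract_added_features` on one trial of shape `(14, n_samples)` yields the 154 added values per observation; a straight line gives an FD of 1, while white noise gives a value close to 2.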
In total, we added 154 features from EEG signals to complement the ones already obtained in the AMIGOS base experiment, thus, 266 features were used in the emotion classification models.
We applied feature selection methods [58] to analyze how the different features are related to the classification labels and to obtain a reduced set of features (from the total of 266 features), in order to analyze the improvement in classification performance. We applied feature importance to analyze what percentage each feature contributes to predicting the different label scenarios. We also implemented univariate selection and recursive feature elimination with cross-validation to select the features that improve the classification rates and built a second set of features.

Classifiers
We focused our analysis on two case studies. In the first, we analyzed the classification performance of different machine learning algorithms using all 266 features to classify the emotional stimuli video labels (arousal and valence levels). In this case, our motivation was to analyze which features can predict the videos' emotional labels based on participants' personal data. This would help us identify to what degree it is possible to classify emotions when the self-assessment arousal and valence scores from the participants are not available. In the second, we analyzed the classification performance using only EEG data and participants' sex, age, and personality traits to classify the self-assessed emotional answers, which were collected using self-assessment manikins [63] and the seven basic categorical emotions and reported by the participants at the end of each video. Our motivation in this second case was to analyze the performance of different classifiers when using only information related to EEG signals and characteristics like age, sex, and personality.
For the first case study (Figure 2a), we tested three different classification scenarios with two sets of input features. For the classification scenarios, we considered the labels from the videos used as emotional stimuli: first, we combined valence-arousal space labels (HAHV, HALV, LAHV, and LALV); second, we considered arousal labels (HA and LA); and third, we considered valence space labels (HV and LV). To transform the arousal and valence responses into classification labels, we used a threshold of 5.0 to convert the response values into binary labels and obtain categorical data. For the input features, we considered a first set of features with only EEG data, demographic characteristics, and personality traits (266 features) and a second set of features with EEG data, demographic characteristics, and personality traits reduced using feature selection. From the 640 AMIGOS short-video observations (16 videos × 40 participants), we excluded the observations that had missing personality and EEG data.
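The thresholding step can be illustrated with a short sketch. The threshold of 5.0 comes from the text; whether a score of exactly 5.0 counts as high or low is not stated, so the strict comparison below is an assumption, as are the function names.

```python
import numpy as np

THRESHOLD = 5.0  # threshold stated in the text


def binarize(scores, threshold=THRESHOLD):
    """Return 1 (high) where score > threshold, else 0 (low).

    ASSUMPTION: a score equal to the threshold is treated as low.
    """
    return (np.asarray(scores, dtype=float) > threshold).astype(int)


def quadrant_labels(arousal, valence, threshold=THRESHOLD):
    """Combine binary arousal and valence into HAHV / HALV / LAHV / LALV."""
    a = binarize(arousal, threshold)
    v = binarize(valence, threshold)
    names = np.array(["LALV", "LAHV", "HALV", "HAHV"])
    return names[2 * a + v]  # index 0..3 encodes (arousal, valence)
```

For example, an arousal score of 7 with a valence score of 3 maps to `HALV`.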
For the second case study (Figure 2b), we tested nine different classification scenarios corresponding to the different self-assessment traits related directly to emotions (arousal and valence labels and the seven emotions). To transform the arousal and valence responses into classification labels, we used a threshold of 5.0 to convert the response values into binary labels (HA and LA; HV and LV) and obtain categorical data. For the input features, we considered a first set of features with EEG data, demographic characteristics, and personality traits (266 features) and a second set of features with EEG data, demographic characteristics, and personality traits reduced using feature selection. From the 640 AMIGOS short-video observations (16 videos × 40 participants), we excluded the observations that had missing personality and EEG data. The classifiers were chosen to test and compare the emotion recognition accuracy using simple machine learning models.

• SVM is a linear model that uses a linear decision boundary (a line, a plane, or a hyperplane) to separate two classes. It is fitted with two parameters: the regularization or margin maximization parameter (C) and the kernel. C determines the strength of the regularization. Higher values of C correspond to less regularization, so the model tries to fit each individual training point as closely as possible. With lower values of C, the algorithm tries to adjust to the majority of the data points. Kernels are mathematical functions that take data as input and transform it into the required form (e.g., linear or radial basis function).
• Naïve Bayes is faster than linear models because it looks at each feature individually, collecting simple per-class statistics from each feature.
• Random Forest is a collection of decision trees, where each tree is slightly different from the others. With many trees (estimators), it is possible to reduce overfitting by averaging the results of each tree. Greater tree depth allows more splits, capturing more information about the data.
• An artificial neural network is a multi-layer, fully connected network that consists of an input layer, multiple hidden layers with units, and an output layer. Each layer has an activation function to discriminate the data (e.g., relu, sigmoid).
Our goal was to identify whether the accuracy improved in any of the scenarios using different feature sets compared with the accuracy reported in the AMIGOS work. The analysis was implemented in Python using the pandas library. For the combined valence-arousal space label scenario, we applied SVM with a linear kernel (C = 100) and a radial basis function (RBF) kernel (C = 100, gamma = 0.1). For the other scenarios, we applied SVM with a linear kernel (C = 100) and an RBF kernel (C = 100, gamma = 0.1), Naïve Bayes, Random Forest (estimators = 2000, max_depth = 300), and an artificial neural network (ANN) with one hidden layer of 134 units and a "relu" activation function, and an output layer with a "sigmoid" activation function (optimizer = "rmsprop", batch size = 32, epochs = 100). Parameters were tuned using grid search with cross-validation. To evaluate the classifier accuracy, we obtained the mean accuracy, mean F1, and mean area under the curve (AUC) scores using a 10-fold cross-validation approach over the training set of features (75% of the whole dataset).
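A minimal sketch of this evaluation protocol for a binary scenario, written with scikit-learn, is given below. The hyperparameters are the ones quoted above; everything else is an assumption made for illustration: the choice of scikit-learn, `GaussianNB` as the Naïve Bayes variant, the stratified split, the random seeds, and the synthetic stand-in data. The ANN is left out because the text does not name the framework used for it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def build_classifiers(n_estimators=2000):
    """Classifiers with the hyperparameters quoted in the text."""
    return {
        "svm_linear": SVC(kernel="linear", C=100),
        "svm_rbf": SVC(kernel="rbf", C=100, gamma=0.1),
        "naive_bayes": GaussianNB(),  # ASSUMPTION: Gaussian variant
        "random_forest": RandomForestClassifier(
            n_estimators=n_estimators, max_depth=300, random_state=0),
    }


def evaluate(X, y, classifiers, folds=10):
    """Mean accuracy, F1 and AUC from k-fold CV on a 75% training split."""
    X_train, _, y_train, _ = train_test_split(
        X, y, train_size=0.75, stratify=y, random_state=0)
    metrics = ("accuracy", "f1", "roc_auc")
    results = {}
    for name, clf in classifiers.items():
        scores = cross_validate(clf, X_train, y_train, cv=folds, scoring=metrics)
        results[name] = {m: scores[f"test_{m}"].mean() for m in metrics}
    return results


# Synthetic stand-in data; the real input would be the 266-feature matrix.
X_demo, y_demo = make_classification(n_samples=200, n_features=20, random_state=0)
# Fewer trees than the 2000 quoted in the text, only to keep the demo fast.
results = evaluate(X_demo, y_demo, build_classifiers(n_estimators=50))
```

`results` maps each classifier name to its three mean scores, which makes it straightforward to compare the full and reduced feature sets side by side.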

Results
Feature construction and feature selection are key steps in the data analysis process, in most cases conditioning the success of any machine learning endeavor [64]. Previous works have shown that applying a feature selection process in emotion recognition tasks using EEG traits [57,58] increases the performance of the classifiers while reducing the computational cost. For the purpose of this work, we performed a feature selection process to reduce the number of features, preventing overfitting and improving the classification process.
Feature selection methods can generally be divided into filter and wrapper methods. While wrapper methods select features based on interaction with a classifier, filter methods are model-independent [58]. Filter techniques assess the relevance of features by looking only at the intrinsic properties of the data. Advantages of filter techniques are that they easily scale to very high-dimensional datasets, they are computationally simple and fast, and they are independent of the classification algorithm. In this case, feature selection needs to be performed only once, and then different classifiers can be evaluated [65].
For feature selection, we used different feature selection filter approaches to understand how they affect the overall emotion classification process. We analyzed the percentage contribution of each feature to predicting the different label scenarios using feature importance selection. We also performed univariate selection and recursive feature elimination (RFE) with cross-validation to select the features to build the second set of features [66].

Feature Importance
Feature importance [66] provides a percentage score for each feature of the dataset: the higher the score, the more important or relevant the feature is to the output variable. It uses forests of trees to evaluate the importance of features in a classification task and to identify the features most related to each of the labels. Each EEG trait contributes around 0.3%-0.5% in each of the different scenarios. In contrast, the demographic characteristics and personality traits have the lowest importance scores. Table 1 shows the scores of the demographic characteristics and personality traits for the different scenarios, which do not exceed 0.32%, implying that they are not relevant to the classification process.
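To make the percentage scores concrete, the sketch below computes impurity-based importances with a forest of trees in scikit-learn. The estimator (`ExtraTreesClassifier`), the number of trees, and the synthetic data are assumptions for illustration; the text only states that forests of trees were used. Because the importances sum to 100%, an even split across 266 features would give each about 100/266 ≈ 0.38%, which is a useful baseline when reading the scores above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def importance_percent(X, y, n_estimators=250, seed=0):
    """Impurity-based feature importances expressed as percentages."""
    forest = ExtraTreesClassifier(n_estimators=n_estimators, random_state=seed)
    forest.fit(X, y)
    return 100.0 * forest.feature_importances_  # normalised to sum to 100


# Synthetic stand-in data, not the AMIGOS features.
X_demo, y_demo = make_classification(n_samples=300, n_features=30, random_state=0)
pct = importance_percent(X_demo, y_demo, n_estimators=100)
ranking = np.argsort(pct)[::-1]  # feature indices, most important first
```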

Univariate Selection
When a finite training sample is provided, the relevance of a feature is assessed by performing a statistical test with the null hypothesis "H0: the feature is individually irrelevant"; that is, X and Y are statistically independent. Feature selection based on individual feature relevance is called univariate [67]. In univariate selection, each feature is considered separately, with the aim of selecting the single variables most strongly associated with the target variable according to a statistical test. The advantage of this technique is that it is fast and scalable; however, it ignores feature dependencies. Higher scores and lower p-values indicate that the variable is associated with the target and is consequently useful for predicting it [68].
Using the univariate feature selection algorithm provided in [66], we obtained the best features based on an analysis of variance (F-test scores and p-values) of the features for the three arousal and valence label scenarios, selecting 10% of the significant features [68]. Inspecting the features, we found that, for the valence-arousal scenario, only one EEG trait was selected by the algorithm: the PSD from EEG channel AF4 in the theta band. For the other scenarios, no features were selected.
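The univariate procedure can be sketched as follows. This is a hedged example, assuming scikit-learn's `SelectPercentile` with the ANOVA F-test (`f_classif`) on synthetic data; the additional p-value threshold of 0.05 is an assumption made here for illustration and is not a value stated in this work.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif

# Synthetic stand-in for the real feature matrix (assumption: NOT AMIGOS data).
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, n_redundant=0,
    random_state=0,
)

# Score every feature independently with the ANOVA F-test and keep the top 10%.
selector = SelectPercentile(score_func=f_classif, percentile=10)
X_reduced = selector.fit_transform(X, y)

selected = selector.get_support(indices=True)
for idx in selected:
    print(f"feature {idx}: F = {selector.scores_[idx]:.2f}, "
          f"p = {selector.pvalues_[idx]:.3g}")

# Optional extra filter (assumed threshold): keep only features whose
# p-value rejects the null hypothesis of individual irrelevance.
significant = [idx for idx in selected if selector.pvalues_[idx] < 0.05]
print("significant at 0.05:", significant)
```

With such a filter, a scenario in which no feature reaches significance yields an empty selection, which is one possible reading of the scenarios for which no features were selected.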

RFE with Cross-Validation
RFE with cross-validation is RFE with automatic tuning of the number of features selected; it returns the most suitable features based on an SVM classifier with a linear kernel. In RFE, the SVM is retrained several times with a decreasing number of features [64,68]. The features selected differed from those identified by the feature importance and univariate selection algorithms because, in RFE, an external estimator assigns weights to the features. This estimator is trained on the initial set of features, and the importance of each feature is obtained from a coefficient attribute; the least important features are then pruned from the current set. The procedure is repeated recursively on the pruned set until the desired number of features is reached [66].
Performing RFE with personal and EEG traits, we obtained 3 features for the valence-arousal label, 15 features for the arousal label, and 1 feature for the valence label. In this case, neither demographic characteristics nor personality traits were selected by the algorithm. Finally, for the second set of features, we built one dataset combining the results of the univariate selection and the RFE feature selection to determine how the classifiers perform in contrast to the original sets of traits.
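The recursive elimination described above can be sketched with scikit-learn's `RFECV` wrapped around a linear-kernel SVM. The data are synthetic, and the number of folds, the step size, and the scoring metric are assumptions chosen for illustration, not settings reported in this work.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the real feature matrix (assumption: NOT AMIGOS data).
X, y = make_classification(
    n_samples=150, n_features=15, n_informative=3, n_redundant=0,
    random_state=0,
)

# The linear kernel exposes coef_, which RFE uses as the feature weights.
estimator = SVC(kernel="linear")

# Drop one feature per iteration; cross-validation picks how many to keep.
selector = RFECV(
    estimator, step=1, cv=StratifiedKFold(n_splits=5), scoring="accuracy",
)
selector.fit(X, y)

print("number of features kept:", selector.n_features_)
print("selected mask:", selector.support_)
print("ranking (1 = kept):", selector.ranking_)
```

Features with rank 1 form the selected subset; higher ranks indicate features that were eliminated earlier in the recursion.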

Feature Importance
In Figure 3, we show the features that exceeded 0.5% of importance for each of the classification labels. The red bars are the feature importances of the forest, along with their inter-tree variability. In Table 2, we show the notation for the EEG channels and pairs of electrodes used in Figure 3. On the x-axis, the list of features is represented; on the y-axis, the importance of each feature in percentage is shown.
For the arousal label, age, agreeableness, emotional stability, openness, extraversion, and conscientiousness were selected as important features. For the sadness label, sex, extraversion, openness, and emotional stability were selected as important features. For the neutral label, conscientiousness was selected as the important feature. For the disgust and surprise labels, emotional stability was selected as an important feature. However, the contribution is still under 0.5%, which is too low compared with the other traits.

Univariate Selection
In Figure 4, we show the ratio for the most significant features. In Table 2, we show the notation for the EEG channels and pairs of electrodes used in Figure 4. Figure 4 shows that the following demographic characteristics and personality traits were selected: arousal (openness), sadness (sex, extraversion, and openness), fear (openness), surprise (extraversion and emotional stability), disgust (agreeableness and emotional stability), and neutral (conscientiousness and emotional stability).
For the arousal label, the EEG traits selected were: PSD in the theta (O2, P8), slow alpha (O2, T8), and alpha (O2, T8) bands; the PSA index for FC5/FC6 and T7/T8 in the theta, slow alpha, and alpha bands, for O1/O2 in the beta band, and for P7/P8 and O1/O2 in the gamma band; DE in the theta (O2) and gamma (CH14) bands; and DASM in the theta, slow alpha, and alpha bands for FC5/FC6, in the beta band for O1/O2, and in the gamma band for P7/P8 and O1/O2. For the valence label, the important EEG features selected were DE for AF3 and F7 in the theta band.

RFE with Cross-Validation
In this case, EEG traits were selected for the nine different scenarios. No demographic characteristics or personality traits were selected by the algorithm. Finally, we built a dataset combining the results of the univariate selection and the RFE feature selection to determine how the classifiers perform in contrast to the original sets of traits.
In general, PSD, PSA, DE, and DASM features were selected for the arousal labels. Diverse EEG information was retrieved for the valence labels. PSD, PSA, and DE at the temporal (T7, T8) and occipital (O1, O2) regions of the scalp were selected for sadness. For happiness, PSD and DE features were selected. PSD, PSA, DASM, and RASM were the features selected for the surprise label. For disgust, PSD, PSA, DE, DASM, and RASM at the frontal (AF3, AF4, F3, F4, FC5, FC6, F7, F8) and parietal (P7, P8) regions of the scalp were selected.

EEG Data, Sex, Age, and Personality Traits to Predict Video Emotional Labels
We tested the different machine learning classification models with 10-fold cross-validation for two feature sets, defined as follows: a first set of features with EEG data, sex, age, and personality traits; and a second set of features with EEG data, sex, age, and personality traits reduced using feature selection. In Table 3, the mean accuracy, mean F1, and mean AUC scores are shown for the different sets, classifiers, and scenario labels. For the valence-arousal scenario, the first set of features performed best with the SVM with linear kernel for the HAHV (accuracy 0.61, F1 0.14, AUC 0.61), LAHV (accuracy 0.64, F1 0.15, AUC 0.54), and LALV (accuracy 0.61, F1 0.15, AUC 0.55) labels. With the second set of features, the SVM with linear kernel scored higher than chance in accuracy and AUC for the HALV label (accuracy 0.74, F1 0.00, AUC 0.51). For the arousal and valence labels, the best results were obtained with the second set of features: for arousal, the best classifier was the SVM with linear kernel (accuracy 0.52, F1 0.49, AUC 0.51); for valence, the best classifier was the ANN (accuracy 0.51, F1 0.67, AUC 0.56). For the arousal labels, the worst performance was obtained with random forest and the first set of features. For the valence labels, the worst performance was obtained with Naïve Bayes and the first set of features.
We used receiver operating characteristic (ROC) curves to describe the performance of the best classifiers obtained from each scenario. In Figure 5, we show the 10-fold cross-validation ROC curves for each of the valence-arousal labels when using the first and second feature sets with the best accuracy scores. For the HAHV, LAHV, and LALV labels, the first set of features, containing EEG data, sex, age, and personality traits without feature selection, obtained the best accuracy score. For the HALV label, the second set of features, containing EEG data, sex, age, and personality traits with feature selection, had the best classification accuracy.
The curves show that the classification performance is higher than chance; however, the F1 scores were low, indicating that the classifiers achieved neither good precision (the proportion of items identified as positive that are actually positive) nor good recall (the proportion of actual positive items that are correctly identified as positive); i.e., the predictions are not relevant in these cases. Figure 6 shows the 10-fold cross-validation ROC curves for the arousal label scenario and the valence label scenario with the best accuracy scores. The curves show that the best classification performance for the arousal label was obtained using EEG traits with feature reduction and the SVM with linear kernel classifier (accuracy of 0.52, with an AUC score higher than chance). For the valence scenario, the second set of features and the ANN classifier had the best accuracy; in this case, the curve shows that the classification performance was slightly higher than chance.
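To make the gap between accuracy and F1 concrete, the following worked example computes the four scores from a confusion matrix in plain Python. The label counts are hypothetical (100 samples with 26 positives), chosen only to show how an accuracy of 0.74 can coexist with an F1 of 0.00 when a classifier never predicts the positive class; it is not a reconstruction of the predictions obtained in this work.

```python
def binary_scores(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Precision: share of predicted positives that are actually positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were predicted positive.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall (0 when both are 0).
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical imbalanced label set: 26 positives, 74 negatives.
y_true = [1] * 26 + [0] * 74
# A degenerate classifier that always predicts the negative class.
y_pred = [0] * 100

accuracy, precision, recall, f1 = binary_scores(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

In such an imbalanced setting, accuracy mostly reflects the share of the majority class, which is why the F1 score is reported alongside it.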

EEG Data, Sex, Age, and Personality Traits to Predict Self-Assessed Traits Labels
In Table 4, the mean accuracy, mean F1, and mean AUC scores are shown for the different classifiers and scenario labels. For the arousal scenario, the first set of features (EEG data, sex, age, and personality traits without reduction) performed better when we used the SVM with RBF kernel (accuracy 0.68, F1 0.67, AUC 0.71). For the valence scenario, the second set of features (EEG data, sex, age, and personality traits with reduction) performed better when we used the SVM with linear kernel (accuracy 0.61, F1 0.65, AUC 0.62). In these cases, we noticed that neither demographic characteristics nor personality traits were selected in the reduced set of features; i.e., the improvement in the classification accuracies was owing to the EEG traits selected. For the arousal labels, the worst performance was obtained with the ANN in both sets of features. For the valence labels, the worst performance was obtained with Naïve Bayes and the ANN in both sets of features.
When we compared the classifier performance for the discrete emotions, we identified that the accuracy and the AUC scores yielded good results in some of the cases; however, the F1 scores were low, indicating that the classifiers achieved neither good precision (the proportion of items identified as positive that are actually positive) nor good recall (the proportion of actual positive items that are correctly identified as positive), so the predictions were not relevant in these cases. We used ROC curves to describe the performance of the best classifiers obtained. Figure 7 shows the 10-fold cross-validation ROC curves for the arousal and valence labels. The best accuracy scores were 0.68 for arousal and 0.61 for valence. For the discrete emotions, we decided not to show the ROC curves owing to the low F1 scores obtained in each case.
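The evaluation protocol behind Tables 3 and 4 (10-fold cross-validation reporting mean accuracy, F1, and AUC) can be sketched as below. This is an illustrative example on synthetic data, assuming scikit-learn's `cross_validate` with an RBF-kernel SVM; the feature standardization step, the stratified shuffled folds, and the default hyperparameters are assumptions and are not taken from this work.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real feature matrix (assumption: NOT AMIGOS data).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0,
)

# Scaling inside the pipeline is refit on each training fold (assumed step).
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# 10 stratified folds; shuffling and the seed are illustrative choices.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
metrics = ("accuracy", "f1", "roc_auc")
scores = cross_validate(model, X, y, cv=cv, scoring=metrics)

# Average each metric over the 10 held-out folds.
for metric in metrics:
    fold_scores = scores[f"test_{metric}"]
    print(f"mean {metric}: {fold_scores.mean():.2f}")
```

Swapping the final estimator for another classifier (for example, a linear-kernel SVM or a random forest) leaves the rest of the protocol unchanged, which makes the per-classifier comparison straightforward.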

Discussion
The results obtained in this work revealed that neither age, sex, nor personality was correlated with the arousal and valence labels from the emotional stimuli. However, with the self-assessed emotional labels, some demographic characteristics and personality traits were chosen by the feature selection for arousal and for some of the discrete emotions; this might be because the self-assessed responses relied on participants' subjective emotion assessment. If so, demographic characteristics and personality traits would correlate more with the self-assessed emotion responses than with the emotional labels from the stimuli videos. In terms of classification performance, relevant results were obtained only for the arousal and valence labels from the self-assessed answers (owing to the low F1 scores for the discrete emotions). Feature selection improved the classification scores only for the valence label; neither demographic characteristics nor personality traits were selected by the feature selection process, which shows that age, sex, and personality traits did not improve classification performance for the selected labels.
It is known from previous works that sex and age can be correlated with these emotional labels and can improve the emotion recognition process [2]; however, it is still unclear how personality can be used to obtain better emotion recognition models. We believe that one of the reasons why sex, age, and personality were not chosen by the feature selection algorithms is the nature of the data. If we adjusted the values to a categorical and binary codification, the feature selection algorithms could select these kinds of features (as age was selected in the work of Rukavina and colleagues [2]). We decided to work with the continuous data because they reflect the real description of the population. Another possible limitation is related to the distribution of personalities among the participants: the sample is relatively small to cover a wide distribution of the five personality traits assessed, and the reported scores are close to each other, implying that the participants exhibit the same type of personality [51]. It is therefore difficult to obtain data that describe all the possible outcomes needed to design classifiers.
Works such as [69] aimed to create more complex deep learning models in which personality information can increase the accuracy of emotion recognition by 10%. Although the literature shows a strong relationship between emotion, mood affective states, and personality, the papers presented here and the information from the AMIGOS dataset analysis still do not provide conclusive information about whether there is a strong correlation between emotion stimuli, emotional states, and personality. We believe this is owing to how the information from the personality questionnaires is fitted as a feature for the classifiers and how the classifiers are designed for emotion recognition. New deep learning techniques could possibly integrate this kind of information in a more suitable way to achieve personalized emotion recognition models. There is also a need for a behavioral metric that can identify differences between how people perceive and manifest emotions. Behavior changes and emotional reactions can vary from person to person owing to past experiences, memories, and context.
Comparing the results, it is still difficult, using traditional machine learning models or basic deep learning models, to obtain higher classification accuracies using EEG traits when different variables need to be considered (the number of participants in an experiment, number of EEG channels, EEG signals and traits, etc.). Furthermore, it is important to consider the dynamics of the emotional stimuli and how the participants perceive these stimuli; pictures, videos, interactive interfaces, and virtual environments come with different variables. There is still the need to analyze how time, familiarity, interaction, and so on affect individuals' emotion recognition processes, and how the EEG features are correlated with these variables to describe individuals' emotional behavior when interacting with stimuli material. Classification accuracies vary between the different EEG traits used in the classification process and the number of participants in the experiment [11,14]. How to obtain good classification accuracies in cross-participants experiments, which can allow researchers to have more freedom in using different stimuli methods and degrees of interaction with systems to identify emotional states from EEG signals, remains unknown. The literature provides hints about how behavioral cues can be described as digital data to use in emotion recognition when there is an interaction between a person and a machine.
One of the most important physiological signals used in emotion recognition is EEG, owing to the number of features that can identify emotional behavior in the brain and the idea of integrating emotion into BCI systems; however, it is still difficult to achieve higher accuracies for emotion classification using only EEG signals. To face this challenge, multimodal approaches are being implemented because they are robust and increase the accuracy of emotion classification, in contrast to systems that rely on only one information source. Signals like ECG, EMG, and GSR are also considered in these studies because they provide relevant information about individuals' emotional and behavioral state.
Perceived emotions may be due to exposure to the emotional stimuli (video in this case); however, the chosen dataset did not have information about arousal-valence scores related to the video time traces. In the scope of this analysis, we did not try to trace the changes in emotional response related to the emotional stimuli over time; instead, we wanted to determine how EEG data, age, sex, and personality traits performed while classifying emotions compared with the AMIGOS dataset, in which the classification was made by averaging the time window. Consequently, we used different machine learning techniques and compared the results with the AMIGOS dataset. To analyze emotion recognition over time, techniques such as recurrent neural networks (RNN) or long short-term memory (LSTM) networks are recommended, which are beyond the scope of this study.
In future research, it is important to address specific challenges such as access to a wider and more diverse population in which participants exhibit different demographic characteristics, personality traits, and behavioral cues; the nature of the emotional stimuli, whether they are passive or active; the data gathered and their evaluation during stimuli exposure time; and the type of interaction that the participant can experience while using HCI systems. For personalized HCI, it is important to analyze not only intrinsic characteristics such as demographic or personality traits, but also the behavioral cues that manifest when using HCI systems and their context. In future work, we would like to focus our approach on capturing and analyzing behavioral cues, together with physiological signals, related to the use of a specific technology or the task being performed.