Emotion Classification Based on Biophysical Signals and Machine Learning Techniques

Emotions constitute an indispensable component of our everyday life. They consist of conscious mental reactions towards objects or situations and are associated with various physiological, behavioral, and cognitive changes. In this paper, we propose a comparative analysis between different machine learning and deep learning techniques, with and without feature selection, for binarily classifying the six basic emotions, namely anger, disgust, fear, joy, sadness, and surprise, into two symmetrical categorical classes (emotion and no emotion), using the physiological recordings and subjective ratings of valence, arousal, and dominance from the DEAP (Dataset for Emotion Analysis using EEG, Physiological and Video Signals) database. The results showed that the maximum classification accuracies for each emotion were: anger: 98.02%, joy: 100%, surprise: 96%, disgust: 95%, fear: 90.75%, and sadness: 90.08%. In the case of four emotions (anger, disgust, fear, and sadness), the classification accuracies were higher without feature selection. Our approach to emotion classification has future applicability in the field of affective computing, which includes all the methods used for the automatic assessment of emotions and their applications in healthcare, education, marketing, website personalization, recommender systems, video games, and social media.


Introduction
Emotions influence our quality of life and how we interact with others. They determine the thoughts we have, the actions we take, our subjective perceptions of the world, and our behavioral responses. According to Scherer's theory [1], emotions consist of five synchronized processes, namely cognitive appraisal, bodily symptoms (physiological reactions in the central and autonomic nervous systems), action tendencies (the motivational component that drives us to react or take action), facial or vocal expressions, and feelings (inner experiences, unique to each person). Affective computing is the study of systems or devices that can identify, process, and simulate emotions. This domain has applicability in education, medicine, social sciences, entertainment, and so on. The purpose of affective computing is to improve user experience and quality of life, and this is why various emotion models have been proposed over the years and efficient mathematical models applied in order to extract, classify, and analyze emotions.
Feature selection has been performed using the Fisher score, principal component analysis (PCA), and sequential forward selection (SFS). This work is a continuation of the research presented in [8], wherein, using the same techniques, we classified fear by considering it to be of low valence, high arousal, and low dominance. Similarly, classification has been based on the physiological recordings and subjective ratings from the DEAP database. In the current approach, we extend our study of emotion classification by including all six basic emotions from Ekman's theory. Our research has an impact on the field of affective computing, as we can better understand the physiological characteristics underlying various emotions. This could lead to the development of effective computational systems that can recognize and process emotional states in the fields of education, healthcare, psychology, and assistive therapy [9,10].

Emotion Models
Various theoretical models of emotions have been developed and most of them have been used for automatic emotion recognition.
Paul Ekman initially considered a set of 6 basic emotions, namely sadness, happiness, disgust, anger, fear and surprise [2]. This model is known as the discrete model of emotions. Later, he expanded the list to 15 emotions: amusement, anger, contempt, contentment, disgust, embarrassment, excitement, fear, guilt, pride in achievement, relief, sadness/distress, satisfaction, sensory pleasure and shame [11]. In 2005, Cohen claimed that empirical evidence does not support the framework of basic emotions and that autonomic responses and pan-cultural facial expressions provide no basis for thinking that there is a set of basic emotions [12].
In contrast to the discrete model, the dimensional model provides ways to express a wide range of emotional states. Using this model, an emotion is described using two or three fundamental features and the affective states are expressed in a multi-dimensional space [13][14][15]. Russell's circumplex model is an early model, in which affective states are arranged in a circle within a two-dimensional bipolar space [15]. The proposed dimensions are pleasure and arousal. Pleasure (valence) reflects the positive or negative emotional states, and a value close to zero means a neutral emotion. Arousal expresses the active or passive emotion component. In this space, 28 affective states are represented: happy, delighted, excited, astonished, aroused, tense, alarmed, angry, afraid, annoyed, distressed, frustrated, miserable, sad, gloomy, depressed, bored, droopy, tired, sleepy, calm, relaxed, satisfied, at ease, content, serene, glad, and pleased.
Whissell also used a bi-dimensional space with activation and evaluation as dimensions [14]. Later, she refined her model and proposed the wheel of emotions as follows: quadrant I (positive valence, positive arousal), quadrant II (negative valence, positive arousal), quadrant III (negative valence, negative arousal), and quadrant IV (positive valence, negative arousal). Examples of emotional states and their positions in the wheel are as follows: joy, happiness, love, surprise, and contentment in QI; anger, disgust, and fear in QII; sadness, boredom, and depression in QIII; and relaxation and calm in QIV [16].
Plutchik developed a componential model in which a complex emotion is a mixture of fundamental emotions. The fundamental emotions considered by Plutchik are joy, trust, fear, surprise, sadness, anticipation, anger, and disgust [13].
A three-dimensional model, called the pleasure-arousal-dominance (PAD) model or the valence-arousal-dominance (VAD) model, was introduced by Mehrabian and Russell [3,17-19]. In the PAD model, there are three independent dimensions: pleasure (valence), which ranges from unhappiness to happiness and expresses the pleasant or unpleasant feeling about something; arousal, the level of affective activation, ranging from sleep to excitement; and dominance, which reflects the level of control of the emotional state, from submissive to dominant. Figure 1 presents the distribution of Ekman's basic emotions within the dimensional emotional space, spanned by the valence, arousal, and dominance axes of the VAD model [20], with ratings taken from Russell and Mehrabian [3]. Russell and Mehrabian provided in [3] a correspondence between the VAD model and the discrete model of emotions. The values for the six basic emotions, in terms of emotion dimensions, are presented in Table 1.

Emotions and the Nervous System
In everyday life, each of us is caught up in the flow of emotions, an important component of behavior. Attempts to define and characterize emotions date back to ancient times, but since the 19th century, research has been scientifically documented. It is well known that there is a close correlation between brain functions and emotions. In particular, the limbic system (hypothalamus, thalamus, amygdala, and hippocampus), the paralimbic system, the vegetative nervous system, and the reticular activating system are involved in processing and controlling emotional reactions. Particular importance is given to the prefrontal cortex, anterior cingulate cortex (ACC), nucleus accumbens, and insula.
The limbic system categorizes our emotions into pleasant and unpleasant (valence). Depending on this, chemical neuro-mediators (noradrenaline and serotonin) increase or decrease, influencing the activity of different regions of the body (posture, mimicry, gestures), in response to different emotional states.
The amygdala, a structure that gives an emotional connotation to events and memories, is located deep within the right and left anterior temporal lobes of the brain [21]. The amygdala is a neural switch for fear, anxiety, and panic.
The hypothalamus is responsible for processing the incoming signals in response to internal mental events such as pain or anger. The hypothalamus triggers the corresponding visceral physiological effects, such as a raised heart rate, blood pressure, or galvanic skin response [22].
The insula, the part of the limbic system located deep in the lateral sulcus (of Sylvius), is part of the primary gustatory cortex. Regarding the perception of emotions, the feeling of disgust, which comes as a variant of an unpleasant taste, is perceived in this region. The experience of disgust protects us from the consumption of spoiled or poisonous foods [23,24].
The hippocampus reminds us of the actions responsible for certain emotional states. Hippocampus abnormalities are associated with mood and anxiety disorders [25]. The reticular activating system controls arousal, attention, sleep, wakefulness, and reflexes [26].

The Six Basic Emotions and Their Corresponding Physiological Reactions
Happiness is an emotional state associated with well-being, pleasure, joy, and full satisfaction. This state is characterized by a facial expression in which the mouth corners are raised [27]. Happiness activates the right frontal cortex, the left amygdala, the precuneus, and the left insula, involving connections between the awareness centers (the frontal cortex and the insula) and the center of feeling, the amygdala [28].
Sadness, the opposite of happiness and different from depression, is an emotion associated with feelings of regret, weakness, mental pain, and melancholy. This state is characterized by a facial expression in which the corners of the mouth are lowered, the inner corners of the upper eyelids are lifted, and the eyebrows are raised and drawn together. The upward-pointing angle between the inner corners of the eyebrows is a relevant sign of sadness [27]. At the brain level, sadness is associated with increased activity of the hippocampus, amygdala, right occipital lobe, left insula, and left thalamus [28].
Fear is an innate emotion, considered an evolutionary mechanism of adaptation for survival, that appears in response to a concrete or anticipated danger. This emotion is controlled by the autonomic nervous system, which brings the body into a fight-or-flight state. Fear is characterized by increased heart rate and respiratory frequency, peripheral vasoconstriction, perspiration, hyperglycemia, etc. At the brain level, fear activates the bilateral amygdala, which communicates with the hypothalamus, the left frontal cortex, and other parts of the limbic system [28].
Anger is an intense primary emotional state that is part of the fight-or-flight mechanism, manifested in response to threats or provocations. During the anger state, as a result of the stimulation of the sympathetic vegetative system, adrenaline and noradrenaline discharges rise, followed by an elevation of blood pressure and increases in heart rate and respiratory frequency [27]. Anger activates the right hippocampus, the amygdala, the left and right parts of the prefrontal cortex, and the insular cortex [28].
Disgust is often associated with avoidance. Unlike other emotions, in the case of disgust the heart rate decreases. At the facial level, disgust is characterized by raising of the upper lip, wrinkling of the nose bridge, and raising of the cheeks [27]. Disgust implies an activation of the left amygdala, left inferior frontal cortex, and insular cortex [28].
Surprise is the hardest emotion to capture, being an unexpected and short-lived experience. After the surprise passes, it turns into fear, anger, disgust, or amusement. When someone experiences surprise, the bilateral inferior frontal gyrus and the bilateral hippocampus are activated. The person tends to arch their brows, open their eyes widely, and drop their jaw. The hippocampus is activated because it is strongly associated with memory and with experiences one has or has not had before [28,29].

Biophysical Data
Electroencephalography (EEG) is a method of exploring the electrical potentials of the brain. The electroencephalogram is the graph obtained from the recording of electric fields at the scalp level. EEG is efficient for detecting affective states and has good temporal resolution. There are five types of waves commonly recorded in humans.
Delta waves have high amplitude and low frequency (0.5-3 Hz). They are characteristic of psychosomatic relaxation states, being recorded in deep sleep phases. They can also be encountered in anesthetic states, following the blocking of nerve transmission through the reticular activating system. They can appear in any cortical region, but predominate in the frontal area [30]. The theta waves have a frequency of 3-8 Hz. This rhythm occurs during low brain activities, sleep, drowsiness or deep meditation. An excess of theta waves is related to artistic, creative, or meditative states. The alpha waves, the so-called 'basic rhythm', are oscillations of small amplitude with average frequencies around 8-12 Hz. Under normal conditions, their amplitude increases and decreases regularly, the waves being grouped into characteristic spindles. They appear in the occipital cortex and indicate a normal wakeful state when the human subjects are relaxed or have their eyes closed. The beta waves are characterized by a frequency of 12-30 Hz. Unlike the alpha rhythm, beta waves are highly irregular and signify a desynchronization of cortical neural activity. Their maximum incidence is in the anterior parietal and posterior frontal regions of the brain. This wave is associated with active thinking or concentration and is related to consciousness, brain activities and motor behaviors. The gamma waves are the fastest brainwaves (30-42 Hz), usually found during conscious perception and related to the emotions of happiness and sadness [31]. During memorization tasks, the activation of the gamma waves is visible in the temporal lobe. The predominance of these waves has been associated with the installation of anxiety, stress, or arousal states [32].
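The band boundaries above translate directly into a relative band-power computation. The following sketch (function name and parameters are illustrative, not from the paper) estimates the share of signal power in each band from a plain FFT periodogram; a production pipeline would normally add windowing and segment averaging:

```python
import numpy as np

# EEG frequency bands as described above (Hz); exact cut-offs
# vary across the literature, these follow the text.
BANDS = {"delta": (0.5, 3), "theta": (3, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 42)}

def band_powers(signal, fs):
    """Relative power per EEG band for a single-channel signal,
    computed from an unwindowed FFT periodogram (illustrative)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2
    total = psd[(freqs >= 0.5) & (freqs < 42)].sum()
    return {name: float(psd[(freqs >= lo) & (freqs < hi)].sum() / total)
            for name, (lo, hi) in BANDS.items()}
```

For example, a pure 10 Hz sine wave should place nearly all of its relative power in the alpha band.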
Galvanic skin response (GSR), or electrodermal activity (EDA), is a handy and relatively noninvasive tool used to study bodily reactions to various stimuli, being a successful indicator of physiological and psychological arousal. The autonomic control regulates the internal environment and ensures the body's homeostasis [33]. The skin is considered an organ that responds preponderantly to the action of the sympathetic nervous system through the eccrine sweat glands [34]. For this reason, data acquired from the skin can offer information about the body's "fight or flight" reactions. Skin conductance is quantified by applying an electrical potential between two contact points on the skin and measuring the current flow between them. EDA has a background component, namely the skin conductance level (SCL), resulting from the interaction between the tonic discharges of the sympathetic innervations and local factors [35], and a fast component, the skin conductance response (SCR), which results from phasic sympathetic neuronal activity [36]. A high level of SCL indicates a high degree of anxiety [37].
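The tonic/phasic split described above can be approximated very simply by subtracting a moving-average baseline from the raw trace. This is a deliberately minimal sketch (the 4 s window is an assumed parameter, and dedicated EDA decomposition methods are preferred in practice):

```python
import numpy as np

def decompose_eda(signal, fs, window_s=4.0):
    """Split an EDA trace into a slow tonic component (an SCL estimate)
    and a fast phasic component (an SCR estimate) by subtracting a
    moving-average baseline. Illustrative only."""
    win = max(1, int(window_s * fs))
    kernel = np.ones(win) / win
    tonic = np.convolve(signal, kernel, mode="same")  # slow baseline
    phasic = signal - tonic                            # fast residue
    return tonic, phasic
```

By construction, the two components always sum back to the original signal, and for a slowly drifting trace the phasic residue away from the edges is near zero.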
Heart rate (HR) and HR variability (HRV) are other parameters used to assess human emotions. They have good temporal resolution and can monitor variations or trends of emotions. HRV is associated with cerebral blood flow in the amygdala and in the ventromedial prefrontal cortex [38]. Individuals with high HRV tend to better regulate their emotions [39].
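One standard time-domain HRV index, the root mean square of successive RR-interval differences (RMSSD), is simple enough to sketch directly (the function name is illustrative; the paper does not compute HRV features itself):

```python
import numpy as np

def rmssd(rr_ms):
    """RMSSD: root mean square of successive differences of RR
    intervals (in ms), a standard time-domain HRV index. Larger
    values indicate greater beat-to-beat variability."""
    rr = np.asarray(rr_ms, dtype=float)
    return float(np.sqrt(np.mean(np.diff(rr) ** 2)))
```

A perfectly regular heartbeat gives RMSSD = 0; RR intervals alternating by 10 ms give RMSSD = 10.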
Respiration is an important function for maintaining the homeostasis of the internal environment. Respiratory regulation is achieved by correlating the respiratory centers and the brainstem, the limbic system, and the cerebral cortex. Breathing rate also changes according to emotional responses [40].

Machine Learning Techniques for Emotion Classification
The interest in the field of automatic recognition of emotions is constantly increasing. The data used in emotion recognition systems is primarily extracted from voice, face, text, biophysical signals and body motion [41]. In this section, we performed a brief analysis of the machine learning techniques involved in automatic emotion recognition systems using biophysical data.
Three binary classifications have been performed in [4]: low or high arousal, low or high valence, and low or high liking. The authors used the Gaussian naïve Bayes algorithm for classification, alongside Fisher's linear discriminant for feature selection and leave-one-out cross validation for classification assessment. To measure the performance of the proposed scheme, F1 and average accuracies (ACC) were used. To draw a final conclusion, a decision fusion method was adopted.
Atkinson and Campos [42] used the minimum-redundancy maximum-relevance (mRMR) method for feature selection and Support Vector Machine for binary classification into low/high valence and arousal. The reported accuracy rates were: 73.14% for valence and 73.06% for arousal. The study was performed by extracting and processing the EEG features from the DEAP database. Yoon and Chung [43] used the Pearson correlation coefficient for feature extraction and a probabilistic classifier based on the Bayes theorem for resolving the binary classification problem of low/high valence and arousal discrimination, with an accuracy of about 70% for both: 70.9% for valence and 70.1% for arousal. For the three-level classification (low/medium/high), the accuracy for high arousal was 55.2% and for valence, 55.4%. Similarly, emotion recognition has been performed based on the EEG data from the DEAP dataset.
A similar approach is presented in Naser and Saha [44], where the SVM algorithm led to accuracies of 66.20%, 64.30%, and 28.90% for classifying arousal, valence, and dominance into low and high groups. Dual-tree complex wavelet packet transform (DT-CWPT) was used for feature selection.
In [45], two classifiers, linear discriminant analysis (LDA) and SVM, were used for the two-level classification of valence and arousal. The results showed that SVM produces higher accuracies for arousal, while the LDA classifier is better in the case of valence. By applying the SVM technique to the EEG features, classification accuracies of 62.4% and 69.40% were achieved during a music-induced affective state evaluation experiment in which the users were required to rate their currently perceived emotion in terms of valence and arousal. In the case of the LDA classifier, the accuracies were 65.6% for valence and 62.4% for arousal. Liu et al. [46] conducted two experiments in which visual and audio stimuli were used to evoke emotions. The SVM algorithm, having as input fractal dimension (FD), statistical, and higher order crossings (HOC) features extracted from the EEG signals, provided the best accuracy for recognizing two emotions: 87.02% in the case of the audio database and 76.53% in the case of the visual database. The authors provided a comparison between the performance of the proposed strategies applied on their databases and on a benchmark database, DEAP. With DEAP as the data source, the mean accuracy was 83.73% for recognizing two emotions and 53.7% for recognizing eight emotions, namely happy, surprised, satisfied, protected, angry, frightened, unconcerned, and sad. A comparative study of four machine learning methods (k-nearest neighbors, SVM, regression tree, and Bayesian networks (BNT)) showed that SVM offered the best average accuracy, at 85.8%, followed by regression tree with 83.5%, for the classification of five types of emotions, namely anxiety, boredom, engagement, frustration, and anger, into three categories: low, medium, and high [47].
In the case of the two-class classification of arousal, valence, and like/dislike ratings from EEG signals, the average accuracy rates were 55.7%, 58.8%, and 49.4% with SVM. With the peripheral physiological responses as input features, the average classification accuracies were 58.9%, 54.2%, and 57.9% [48].
Based on the MAHNOB dataset and using the SVM algorithm with various kernels, Wiem et al. [49] reached a classification accuracy between 57.34% and 68.75% for valence and between 60.83% and 63.63% for arousal when discriminating into low/high groups, and between 46.36% and 56.83% (valence) and between 50.52% and 54.73% (arousal) for classification into three groups. The features were normalized and a level feature fusion (LFF) algorithm was used. The most relevant features were the electrocardiogram and the respiration volume.
In [50], a deep learning method based on the long short-term memory (LSTM) algorithm was used for classifying low/high valence, arousal, and liking based on the raw EEG data from the DEAP dataset [4], with accuracies of 85.45%, 85.65%, and 87.99%, respectively. Jirayucharoensak et al. [51] used a deep learning network implemented with a stacked autoencoder based on the hierarchical feature learning approach. The input features were the power spectral densities of the EEG signals from the DEAP database, which were selected using the principal component analysis (PCA) algorithm. Covariate shift adaptation (CSA) was applied to reduce the non-stationarity of the EEG signals. The ratings from 1 to 9 were divided into three levels and mapped to "negative", "neutral", and "positive" for valence and to "passive", "neutral", and "active" for arousal. A leave-one-out cross-validation scheme was used to evaluate the performance. The classification accuracy was 49.52% for valence and 46.03% for arousal.
A 3D convolutional neural network-based scheme was applied on the DEAP dataset in [52] for a two-level classification of valence and arousal. The authors increased the number of training samples through an augmentation process, adding noise to the original EEG signals. The network consisted of six layers: an input layer, middle layers (two pairs of convolution and max-pooling layers), and a fully-connected output layer. In both convolution layers, the rectified linear unit (RELU) is used as the activation function. The recorded accuracies for the proposed method were 87.44% for valence and 88.49% for arousal. Random forest is not a very common technique for emotion recognition. In [53], the authors reported a 74% overall accuracy rate for classifying emotions into amusement, grief, anger, fear, and a baseline state using the random forest classifier. Leave-one-subject-out cross-validation was used for evaluating the classifier.
In Table 2, we present the performance of the ML techniques used for emotion recognition. Table 3 presents the intervals of valence, arousal, and dominance assigned to each of the six basic emotions, inspired by the values of Mehrabian and Russell's model from Table 1. The ratings of valence and arousal from the DEAP database have been assigned to a larger interval: low ([1;5)) or high ([5;9]). Dominance was the emotion dimension that fluctuated within a smaller interval. Thus, an emotion is characterized by low or high valence/arousal and some degree of dominance spanned across a narrower interval. Table 4 presents the intervals corresponding to Condition 0, no emotion (or the lack of emotion), and Condition 1, the existence of a certain degree of emotion.
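The interval-based labelling can be sketched as a small predicate that checks each rating against one row of Table 3. The exact intervals are not reproduced here; the `FEAR` row below uses the low-valence/high-arousal/low-dominance characterization of fear from the text, with placeholder bounds (the paper's actual dominance intervals are narrower):

```python
def condition_for(valence, arousal, dominance, row):
    """Return Condition 1 if all three ratings fall inside the
    emotion's intervals (one row of Table 3), else Condition 0.
    Bounds are treated as inclusive here for simplicity."""
    for value, (lo, hi) in zip((valence, arousal, dominance), row):
        if not (lo <= value <= hi):
            return 0
    return 1

# DEAP ratings span 1-9; low = [1;5), high = [5;9].
# Placeholder intervals, for illustration only:
FEAR = ((1.0, 5.0), (5.0, 9.0), (1.0, 5.0))
```

A low-valence, high-arousal, low-dominance rating would then map to Condition 1 for fear, and anything outside those bounds to Condition 0.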
Four input feature sets have been generated after extracting and labelling the data from DEAP: (1) 32-channel raw EEG values and the peripheral recordings: hEOG, vEOG, zEMG, tEMG, GSR, respiration, PPG, and temperature; (2) Petrosian fractal dimensions of the 32 EEG channels and the peripheral recordings mentioned at (1); (3) Higuchi fractal dimensions of the 32 EEG channels and the peripheral recordings mentioned at (1); (4) approximate entropy of the 32 EEG channels and the peripheral recordings mentioned at (1). The DEAP database contains 40 valence/arousal/dominance ratings for each of the 32 subjects. For the emotion of anger, there were 28 ratings in the database for Condition 1 (anger) and 239 ratings for Condition 0 (no anger). In order to have a balanced distribution of responses for classification, we used 28 ratings for Condition 1 and 28 ratings for Condition 0, i.e., the minimum of the two counts. Every physiological recording had a duration of 60 s. In order to obtain a larger training database, we divided the 60 s long recordings into 12 segments, each 5 s long. Thus, for anger we obtained a training dataset of 672 entries that was fed to the classification algorithms. Table 5 presents, for each emotion, the number of entries for Conditions 0 and 1 and the total number of 5-s long segments that have been fed as input data to the classification algorithms.
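The segmentation step above (60 s → 12 segments of 5 s) can be sketched as follows; the function name and the channels-by-samples layout are illustrative assumptions, not the paper's code:

```python
import numpy as np

def segment_trial(recording, fs, seg_s=5.0):
    """Cut one trial (shape: channels x samples) into consecutive,
    non-overlapping segments of seg_s seconds each; a 60 s trial
    yields 12 segments of 5 s."""
    seg_len = int(seg_s * fs)
    n_seg = recording.shape[1] // seg_len
    trimmed = recording[:, :n_seg * seg_len]
    # result shape: (n_seg, channels, seg_len)
    return trimmed.reshape(recording.shape[0], n_seg, seg_len).swapaxes(0, 1)
```

With 40 channels sampled at 128 Hz for 60 s, this produces 12 segments of shape 40 × 640, so 56 balanced trials become 56 × 12 = 672 training entries.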
For binarily classifying the emotion ratings into Condition 1 (emotion) and Condition 0 (lack of emotion), we applied four machine and deep learning algorithms, with and without feature selection, similarly to the experiment described in [8], where we classified the emotion of fear. Our input features were the EEG data (raw values/approximate entropy/Petrosian fractal dimension/Higuchi fractal dimension) and the peripheral signals: hEOG, vEOG, zEMG, tEMG, GSR, respiration rate, PPG, and temperature. We constructed models based on four deep neural networks (DNN1-DNN4) with various numbers of hidden layers and neurons per layer. The machine learning techniques employed were SVM, RF, LDA, and kNN. As feature selection algorithms, we used Fisher selection, PCA, and SFS.
The Higuchi fractal dimension (HFD) is a non-linear method widely used in the analysis of biological signals. It originates from chaos theory and has been used for 30 years as a way of measuring signal dynamics and complexity. It has been used for detecting hidden information contained in biophysical time series with the help of fractals, which, despite scaling, preserve the structure and shape of complex signals. There are many methods for calculating fractal dimensions, such as Katz's, Petrosian's, or Higuchi's [54-56]. Approximate entropy (ApEn) is a measure of regularity in the time domain which determines the predictability of a signal by comparing the number of matching sequences of a given length with the number of matching sequences one increment longer [57]. In a regular data series, knowing the previous values enables the prediction of the subsequent ones. A high value of ApEn is associated with random and unpredictable variation, while a low value correlates with regularity and predictability in a time series [58].
DNN1 has one input layer, three hidden layers with 300 neurons per layer, and one output layer.
The input layer contains 40 neurons, corresponding to the 32 EEG features (raw values/Petrosian fractal dimensions/Higuchi fractal dimensions/approximate entropy) and the 8 peripheral features (hEOG, vEOG, zEMG, tEMG, GSR, respiration rate, PPG, and temperature). The Petrosian fractal dimensions, Higuchi fractal dimensions, and approximate entropy have been computed using the functions from the PyEEG library [59]. The output layer generates two possible results: 0 or 1. In the output layer, we used the binary cross-entropy loss function and the sigmoid activation function. The model uses the Adam gradient descent optimization algorithm and the rectified linear unit (RELU) activation function on each hidden layer. The network is organized as a multi-layer perceptron. The input data has been standardized to zero mean and unit variance. The Keras classifier [60] had 1000 epochs for training and a batch size of 20. Cross-validation has been performed using the k-fold method with k = 10 splits and the leave-one-out method, which takes each sample in turn as the test set and keeps the remaining samples in the training set; the leave-one-out method is, however, more computationally demanding than k-fold. The model has been trained and cross-validated 10 times, and we calculated the average accuracy and F1 score across these 10 iterations. Each time, the input data has been shuffled before being divided into the training and test datasets.
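Since the Higuchi fractal dimension features described above are central to the feature sets, a minimal NumPy version of Higuchi's algorithm can be sketched as follows (the paper itself uses the PyEEG implementation; this is an independent illustrative sketch):

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Higuchi fractal dimension of a 1-D signal: estimate the
    normalized curve length L(k) at several time scales k and take
    the slope of log L(k) versus log(1/k). Returns values near 1
    for smooth signals and near 2 for very irregular ones."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_inv_k, log_l = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)     # sub-series x[m], x[m+k], ...
            n_diff = len(idx) - 1
            if n_diff < 1:
                continue
            # normalized curve length for the sub-series starting at m
            lengths.append(np.abs(np.diff(x[idx])).sum()
                           * (n - 1) / (n_diff * k * k))
        log_inv_k.append(np.log(1.0 / k))
        log_l.append(np.log(np.mean(lengths)))
    slope, _ = np.polyfit(log_inv_k, log_l, 1)
    return float(slope)
```

A straight line yields a dimension of exactly 1, while Gaussian white noise yields a value close to 2, matching the interpretation of HFD as a complexity measure.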
DNN2 has 3 hidden layers with 150 neurons/layer, DNN3 has 6 hidden layers with 300 neurons/layer, and DNN4 has 6 hidden layers with 150 neurons/layer. Their configuration and method of training and cross-validation are similar to those of DNN1. Feature selection was not necessary for the DNNs, as the dropout regularization technique prevents overfitting.
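The two cross-validation schemes used for the networks (k-fold with k = 10, and leave-one-out as the limiting case where each sample is the test set once) can be sketched as a single index generator; the function name is illustrative:

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train, test) index arrays for shuffled k-fold
    cross-validation; with k == n_samples this reduces to the
    leave-one-out scheme."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)       # shuffle before splitting
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

Every sample appears in exactly one test fold, which is why leave-one-out (k = n) is the more expensive scheme: it requires n model fits instead of k.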
For the SVM method, we used the radial basis function (RBF) kernel. For Random Forest, the number of trees in the forest has been set to 10 (the default value of the n_estimators parameter in the RandomForestClassifier method from the scikit-learn library [61]). The function that measures the quality of a split was "entropy", which splits based on information gain. For kNN, the number of neighbors has been set to 7. For SVM, LDA, RF, and kNN, the input data has been divided into 70% training and 30% test using the train_test_split method from the scikit-learn library. This function ensures that, each time, the data is shuffled before being divided into the training and test datasets. The input data has also been standardized to zero mean and unit variance.
These classification methods have been trained and cross-validated 10 times, both without feature selection and with the Fisher, PCA, and SFS feature selection methods. As with the DNNs, we calculated the average accuracy and F1 score across these 10 iterations. The Fisher score has been calculated on the training dataset, and then the 20 most relevant features have been selected. A machine learning model (SVM/RF/LDA/kNN) has then been constructed and cross-validated based on these relevant features. The PCA algorithm retains 99% of the data variance (the n_components parameter of the PCA method from scikit-learn has been set to 0.99). The SFS classifier selects the best feature combination containing between 3 and 20 features.
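The Fisher-score ranking step can be illustrated with a compact, numpy-only sketch on synthetic data. Note that the exact normalization of the Fisher score varies between implementations; this version uses the common two-class form (mu1 - mu0)^2 / (var0 + var1), which is an assumption, not necessarily the formula used in the paper.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score for a two-class problem."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X1.mean(axis=0) - X0.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12   # guard against zero variance
    return num / den

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 40))
y = rng.integers(0, 2, size=100)
X[y == 1, 3] += 2.0                      # make feature 3 strongly discriminative

scores = fisher_scores(X, y)
top20 = np.argsort(scores)[::-1][:20]    # keep the 20 highest-scoring features
print("best feature:", int(np.argmax(scores)))   # → best feature: 3
```

The top-ranked features would then be passed to the downstream classifier, as described above.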

Results
The cross-validation results obtained after training and testing on the data using the machine and deep learning methods, with k-fold cross-validation, for each basic emotion, are presented in Tables 6-11. The numbers written in bold correspond to the maximum F1 scores and accuracies. Table 12 presents the most important features for each of the six basic emotions, based on the Fisher score and the SFS algorithm. The accuracies obtained using the leave-one-out cross-validation method are 5-10% lower, but the hierarchy of results is preserved, leaving the classification ranking unaffected. Figure 3 presents the decision tree obtained for classifying anger using RF with raw EEG data and peripheral features, without feature selection. Table 13 presents the best classification F1 scores for each emotion, with and without feature selection. Without feature selection, kNN has been selected in 13 cases, followed by Random Forest (seven times) and SVM (four times). For anger, the highest classification accuracy has been obtained with Petrosian and Higuchi fractal dimensions, using SVM (98.02%). For joy, the highest classification accuracy has been achieved by kNN using Petrosian values (87.9%). For surprise, the best result has been obtained by kNN with raw EEG values (85.01%); for disgust, by kNN with Petrosian values (95%); for fear, by kNN with raw EEG values (90.75%); and for sadness, by SVM with Higuchi fractal dimensions (90.8%).

Discussion
With feature selection, kNN has been selected in 12 cases, Random Forest in seven, SVM in five, and LDA in one. SFS has been selected two times and the Fisher score 14 times. For anger, the highest classification accuracy has been obtained for raw data using kNN and Fisher (97.52%). For joy, the highest classification accuracy has been achieved by LDA and SFS using raw values (100%). For surprise, the best result has been obtained by SVM and SFS with raw EEG values (96%); for disgust, by Random Forest and Fisher with Higuchi fractal dimensions (90.23%); for fear, by kNN and Fisher with Higuchi fractal dimensions (83.39%); and for sadness, by SVM and Fisher with Higuchi fractal dimensions (86.43%). For anger, disgust, fear, and sadness, the classification accuracies have been higher without feature selection. The SFS feature selection algorithm led to higher accuracies for joy and surprise.
According to Table 12, the most important features for anger were tEMG and respiration. This result is physiologically plausible: in conditions of anger, anxiety, and stress, besides intensified breathing, tension accumulates in the muscles located between the forehead and the shoulders (the "tension triangles"). Thus, the corrugator muscles are responsible for forehead frowning, the masseter and temporalis muscles for jaw clenching, and the trapezius muscles for neck tightening and shoulder raising.
The most important features for joy were GSR and zEMG. Dynamic facial expressions of joy produce intense activity of the zygomatic muscle, which pulls the corners of the lips upward and laterally into a smile. High skin conductance entropy indicates bodily arousal.
The most important features for surprise are GSR and FC1. Although surprise is an emotion with neutral valence, it is frequently associated with increased GSR.
According to existing studies, disgust suppresses attention, in order to minimize the visual contact with the threatening agent. This explains the movement of the eyeballs horizontally and vertically (vEOG and hEOG), which are the most important features for disgust (Table 12).
Fear is characterized by widening the eyes and rotating the eyeballs horizontally and vertically for danger identification (vEOG, hEOG), stretching the mouth (zEMG), and flaring the nostrils for better tissue oxygenation. Activation of the frontal cortex (FC1, F4) serves to stimulate the motor areas and prepare the body for fight or flight.
Sadness involves increased activity, mostly in the left prefrontal cortex and in the structures of the limbic system, as reflected by our most frequently selected features: FC1 and FP1 (Table 12).
Liu [46] achieved a classification accuracy of 53% for recognizing eight emotions using fractal dimension features with SVM, while we obtained accuracies of over 83% using Higuchi fractal dimensions and kNN. Our results are comparable to those obtained by Liu [47], who reached accuracies of 85% with SVM and 83% using a regression tree for classifying anxiety, boredom, engagement, frustration, and anger into three categories, namely low, medium, and high.

Conclusions
This paper presented a comparative analysis between various machine learning and deep learning techniques for classifying the six basic emotions from Ekman's model [2], namely anger, disgust, fear, joy, sadness, and surprise, using physiological recordings and the valence/arousal/dominance ratings from the DEAP database. DEAP is the most well-known and exhaustive multimodal dataset for analyzing human affective states, containing data from 32 subjects who watched 40 one-minute-long excerpts of music videos. Using the three-dimensional VAD model of Mehrabian and Russell [3], each of the six basic emotions has been defined as a combination of valence/arousal/dominance intervals [63]. We then classified them into two classes, 0 (lack of emotion) and 1 (emotion), by training and cross-validating various machine learning and deep learning techniques, with and without feature selection. For anger, the highest classification accuracy has been obtained with Petrosian and Higuchi fractal dimensions, using SVM and no feature selection (98.02%). For joy, the highest classification accuracy has been achieved by LDA and SFS using raw EEG values (100%). For surprise, the best result has been obtained by SVM and SFS with raw EEG values (96%); for disgust, by kNN with Petrosian values and no feature selection (95%); for fear, by kNN with raw EEG values and no feature selection (90.75%); and for sadness, by SVM with Higuchi fractal dimensions and no feature selection (90.8%). In the case of four emotions (anger, disgust, fear, and sadness), the classification accuracies were higher without feature selection.
Our approach to emotion classification has applicability in the field of affective computing [64]. The domain includes all the techniques and methods used for the automatic recognition of emotions and their applications in healthcare, education, marketing, website personalization, recommendation systems, video games, and social media. In essence, human feelings are made accessible to computers, which can then recognize and express them. The identification of the six basic emotions can be used for developing assistive robots, such as those that detect and process the affective states of children suffering from autism spectrum disorder [65]; intelligent tutoring systems that use automatic emotion recognition to improve learning efficiency and adapt learning contents and interfaces in order to engage students [66]; virtual reality games or immersive virtual environments that act as real therapists in anxiety disorder treatment [9,10]; recommender systems that know the users' mood and adapt the recommended items accordingly [67,68]; public sentiment analysis regarding different events or economic and political decisions [69]; and assistive technology [70][71][72].
Emotions play a central role in explainable artificial intelligence (AI), where there is a pressing need for human-AI interaction and human-AI interfaces [73]. As future research directions, we intend to classify the six basic emotions into three classes, namely negative, neutral, and positive, and to develop emotion-based applications starting from the results presented in this paper, in the emerging field of explainable AI.

Conflicts of Interest:
The authors declare no conflict of interest.