Machine Learning Models for Classification of Human Emotions Using Multivariate Brain Signals

Abstract: Humans can display expressions that contradict their emotional state of mind, so it is difficult to judge a person's real emotional state from physical appearance alone. Although researchers are working on facial expression analysis, voice recognition, and gesture recognition, the accuracy of such analysis is low and the results are not reliable. Hence, a realistic emotion detector becomes vital. Electroencephalogram (EEG) signals remain neutral to the external appearance and behavior of the human and help in ensuring accurate analysis of the state of mind. EEG has therefore gained attention over time as a means of obtaining accurate results for the classification of emotional states in human beings, both for human–machine interaction and for designing a program with which an individual could perform a self-analysis of their emotional state. In the proposed scheme, we extract power spectral densities (PSDs) of multivariate EEG signals from different sections of the brain, and the EEG signals from electrodes in different scalp regions are studied for performance. From the extracted PSD, the features that provide better classification are selected and classified using long short-term memory (LSTM) and bi-directional long short-term memory (Bi-LSTM) networks. A 2-D emotion model is considered, and classification is studied for the frontal, parietal, temporal, and occipital regions. The region-based classification is performed by considering positive and negative emotions. The accuracies of our previous models, namely artificial neural network (ANN), support vector machine (SVM), K-nearest neighbor (K-NN), and LSTM, were compared with the proposed model, and an accuracy of 94.95% was obtained using Bi-LSTM with four prefrontal electrodes.


Introduction
Emotion recognition is the task of recognizing human emotion. People's ability to identify the emotions of others varies tremendously. The use of technology to aid people in recognizing emotions is a relatively new subject of research [1]. Studies using the self-assessment manikin (SAM) found a positive correlation between arousal and dominance and between arousal and liking, and a moderate positive correlation between familiarity and both liking and valence. The scales of valence and arousal are not independent but have low positive correlations [2].
Generally, the technology works best when a variety of modalities are used. To date, there has been a lot of focus on automating the interpretation of nonverbal cues from video, spoken expressions from audio, communicative context from text, and physiology as measured by sensors [3][4][5]. However, describing a facial expression as emblematic of a specific emotion can be challenging even for humans. According to studies, different people identify different emotions in the same facial expression. The task is significantly more difficult for artificial intelligence (AI). When audio-based emotion recognition is considered, the emotions conveyed are purely individual. Generalizing individual characteristics and producing an algorithm for the identification of such features tend to be unreliable [6]. Another disadvantage of these features is that they are voluntary, i.e., individuals may be able to mask their real emotions, which makes the features very unreliable. Hence, recognizing an emotion using brain signals tends to be more reliable, as these signals are involuntary and more uniform across individuals [7].
For a computer to recognize emotions, the emotions must be classified. Emotions are usually defined from either a categorical perspective or a dimensional perspective [8]. The category-based definition segregates emotion into numerous types, for example the six basic emotions mentioned by Paul Ekman: fear, happiness, anger, disgust, sadness, and surprise [9]. Later, Robert Plutchik extended Ekman's basic emotions to a "Wheel of Emotions", suggesting that diverse emotions could be created by modifying some basic feelings [10]. Dimensional models aim to reduce emotions to fundamental characteristics that describe the similarities and contrasts among experiences. One such circumplex model was created by Russell, who described emotions in terms of arousal and valence [11]. Arousal is defined as "The degree of sensory stimulation that happens when a feeling is stimulated," [12] whereas valence is defined as "The cheerfulness of a feeling spanning from favorable or pleasurable to passive or undesirable" [13]. Dominance can also be considered for the analysis, which adds a third dimension to the definition of emotions. For the work discussed in this paper, the valence-arousal model is considered for the classification of emotions.
The dimensional presentation of human emotions using a valence-arousal scale can be seen in Figure 1. Emotions are separated into four quadrants based on valence and arousal as seen in Figure 1. Any emotion that has high valence is considered positive and emotions with low valence are considered negative.
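The quadrant split described above can be illustrated with a short sketch. It assumes ratings on the 1 to 9 SAM scale used later in the paper and a midpoint threshold of 5; the threshold, the function name `classify_emotion`, and the quadrant labels are illustrative assumptions and are not taken from the paper.

```python
def classify_emotion(valence, arousal, midpoint=5.0):
    """Map SAM ratings (1-9) to a valence-arousal quadrant and a polarity.

    The midpoint of 5.0 is an assumed threshold, not one stated in the paper.
    """
    if not (1 <= valence <= 9 and 1 <= arousal <= 9):
        raise ValueError("ratings are expected on the 1-9 SAM scale")

    high_valence = valence > midpoint
    high_arousal = arousal > midpoint

    # Quadrant labels: H/L = high/low, V = valence, A = arousal.
    quadrant = {
        (True, True): "HVHA",
        (False, True): "LVHA",
        (False, False): "LVLA",
        (True, False): "HVLA",
    }[(high_valence, high_arousal)]

    # High valence is treated as positive, low valence as negative.
    polarity = "positive" if high_valence else "negative"
    return quadrant, polarity
```

For example, a calm but pleasant rating such as valence 7.5 and arousal 3.0 falls in the high-valence, low-arousal quadrant and is labelled positive.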
Computers 2022, 11, x FOR PEER REVIEW 2 of 17
Various functions of the vagus nerve make it an attractive target in treating psychiatric and gastrointestinal disorders [14]. The effect of the pandemic on the mental health of healthy individuals has been prominent over the past two years. The vagus nerve functions as a vital connection between the central nervous system (CNS) and the entire body and carries most of the parasympathetic nervous system. Stress, depression, and anxiety, which have become a day-to-day part of life in this era, are the main factors that hinder the healthy functioning of the vagus nerve [15]. A patient's mental state can also affect the functioning of medications for any disease. Hence, understanding and recognizing patients' emotions at an early stage may assist in further medication procedures. Classifying emotions into positive and negative using brain signals requires an accurate system that extracts the required features and utilizes an efficient classification methodology.
Therefore, in this paper, we provide a system that extracts features from the PSD of EEG signals, selects appropriate characteristics, and uses a convolutional neural network (CNN) as a classifier model that provides reliable results for positive and negative classification of emotions using the valence-arousal model. The main aim of the proposed scheme is to compare the accuracies obtained from the PSD of EEG signals with and without the removal of outlier samples using LSTM and Bi-LSTM techniques.
Most of the initial research on emotion recognition was carried out on observable verbal or non-verbal emotional expressions (facial expressions, patterns of body gestures). However, tone of voice and facial expression can be deliberately hidden or over-expressed, and in some cases a person who is physically disabled or introverted may not be able to express emotions through these channels. This makes such methods less reliable for measuring emotions. In contrast, emotion recognition methods based on physiological signals, such as EEG, ECG, EMG, and GSR, are more reliable because humans cannot intentionally control them. Among these objective physiological signals, the EEG signal is generated directly by the CNS, which is closely related to the human emotional state.
In this paper, an emotion recognition system based on the EEG signal is proposed. The results indicate that the extracted PSD features are promising for recognizing human emotions. The performance of deep learning models, namely LSTM and Bi-LSTM classifiers, is studied based on the PSD features of the EEG signal from different scalp regions: all 32 electrodes, and the frontal, occipital, temporal, and parietal region electrodes separately. Bi-LSTM provides the highest classification accuracy with the frontal region electrodes (Fp1, F3, F4, Fp2) compared with all other scalp regions and with the K-NN, SVM, ANN, and LSTM classifiers. This improved accuracy enables the system to be used in different applications, such as biofeedback for monitoring stress, wearable sensor design, and psychological wellbeing. Finally, performance is evaluated with a particular focus on the frontal electrodes.
The rest of the paper is organized as follows: Section 2 presents the related work, based on the categorization of emotion into two emotion models. Section 3 describes the proposed methodology. Experiments and results are given in Section 4, and concluding remarks are given in Section 5, followed by a declaration and references.

Related Work
Emotion recognition has gained substantial attention due to its direct link to psychological, physiological, and human-machine interface aspects [16]. Machine learning techniques have been widely used for emotion recognition using EEG [17]. Previous studies have classified emotions using various models. The feature extraction and classification methodologies considered vary from machine learning to deep learning techniques. Multi-target modelling was suggested by Guozhen et al. [18] as a method for identifying several continuous positive emotions that display statistical interdependence. Results revealed that (1) using LSTM as the unit regressor in an ensemble of regressor chains (ERC) obtained the best regression results on EEG features alone, with the lowest RMSE = 8.325, the highest R^2 = 0.346, and the best Kendall rank correlation coefficient (0.165), and (2) specific features from the alpha frequency bands of EEG signals could represent various positive emotions. D Acharya et al. [19] utilized the fast Fourier transform as a feature and compared the performances of LSTM and CNN models; LSTM performed better, with 88.6% accuracy compared to 87.7% for the CNN model. Niranjana et al. [20] obtained an accuracy of 93.25% considering the frontal electrodes. They also suggested using time-frequency features to obtain better results. Joshi VM et al. [21] adopted a Bi-LSTM classifier with a single channel, prefrontal channels, and 32 channels.
Shashi Kumar GS et al. [22] used a backpropagation neural network classifier to classify happy and sad emotions. Joshi VM et al. [23] extracted modified differential entropy (MD-DE) features to classify positive, negative, and neutral emotions, achieving 89.66% and 76.29% for subject-dependent and subject-independent cases, respectively, using Bi-LSTM. Veltmeijer EA et al. [24] investigated automatic group emotion recognition and provided a comprehensive overview of group emotion estimation that covers a wide range of subjects, from group types and emotion models to performances. Methodological improvements rely on improving the real-world applicability of current methods. The authors of [25] developed a hybrid model using different deep learning models in combination with LSTM as classifiers. The common machine learning techniques used for emotion recognition using EEG are SVM [26,27], K-NN [28][29][30], and decision trees [31,32].
Payal et al. [33] used maximum relevance and minimum redundancy (MRMR) to reorganize features, while principal component analysis (PCA) was used to reduce the extracted features. Compared with K-NN, the precision of adaptive particle swarm optimization (PSO) was about 10% higher. Mashael et al. [34] considered the PSD of the EEG signal as one of the features when contrasting conventional classifiers with the deep transfer learning (DTL) model they created. Random forest (RF) achieved prediction performance comparable to DTL on the test set of the database for emotion analysis using physiological signals (DEAP).
Vaishali M Joshi et al. [35] used linear formulation of differential entropy (LF-DE) feature extractor and Bi-LSTM [36] network decoder to create a regime for recognizing emotions through EEG signals. The prediction accuracy of emotion categorization has been enhanced by 4.12% for target-dependent strategies, 4.5% for noncontingent strategies, and 1.3% for inter-dependent initiatives, according to the SEED database. In comparison to the DEAP dataset, the average result of participant noncontingent trials increased by 7.04%. Rahul Sharma et al. [37] utilized discrete wavelet transform to study the rhythm of EEG signals and reduced the irrelevant data using the PSO technique. This study was carried out using DEAP, which produced 82.01% average classification accuracy corresponding to four label classifications of emotions.
An integrated model was developed by Yongqiang et al. [38] that applies multiple graph convolution neural network models to obtain graph domain features. LSTM cells memorize the transition among two channels over time and retrieve temporal features. A dense layer achieves the results of emotional classification. In subject-dependent studies, researchers were able to classify valence and arousal with an average classification accuracy of 90.45% and 90.60%, compared to 84.81% and 85.27% in subject-independent tests.
The effectiveness of classifiers for subject-independent and subject-dependent models was considered separately by Debarshi Nath et al. [39]. The subject-dependent model produced the best outcomes, with precision on the valence and arousal scales of 94.69% and 93.13%, respectively. SVM was the subject-independent model's top performer, with accuracy rates of 72.19% on the valence scale and 71.25% on the arousal scale. The author of [40] also suggests scope for developing an efficient headgear model for real-time monitoring of emotions using the LSTM technique, considering the band power of the EEG signal as the feature. Their model showed a significant average increment of 18% in arousal and 16% in valence compared to other classifiers.
A novel method was presented by Li, Zhenqi, et al. [41] that builds an LSTM network as the classifier to investigate the temporal correlations of EEG signals and employs rational asymmetry (RASM) as the attribute to define the frequency-space domain features of EEG signals. Its mean accuracy of 76.67% was compared to a multitude of pertinent studies on DEAP. Garg et al. [42] used the discrete wavelet transform as a feature for classification, along with its statistical data to capture the trends and variations in the dataset, and considered a merged LSTM model as a classifier. They achieved the highest accuracy for the classification of valence, with 84.89%. Alhagry S et al. [43] proposed a scheme in which they used raw EEG signals as the input for the LSTM model and obtained 85.65% and 85.45% accuracy for arousal and valence, respectively. The authors of [44][45][46] developed hybrid models using different deep learning models in combination with LSTM as classifiers, obtaining accuracies ranging from 81.10% to 98.21%, the latter obtained by combining CNN with LSTM. For ECG, features [47] were extracted and then classified using a Bi-LSTM network model to establish an ECG-based emotion recognition model with an accuracy of 76.65% in the valence dimension and 70.15% in the arousal dimension. The best performance obtained using a single modality with our method is 82.63% and 74.88% for valence and arousal, which is competitive but better than 78.75%.
Divya Acharya et al. [48] classified negative emotions, considering four negative emotions. Their LSTM-based deep learning model achieved classification accuracies of 81.63%, 84.64%, 89.73%, and 92.84%. Yang J et al. [49] used differential entropy as a feature and classified emotions into happy, sad, fear, and neutral using the Bi-LSTM classifier, obtaining 84.21% accuracy. Ramzan M et al. [50] developed a hybrid model using different deep learning models. Iyer A et al. [51] proposed a multi-channel rhythm-specific CNN-based approach for the automatic detection of emotion. They obtained significant results for comparing low and high valence and low and high arousal. Zhu M et al. [52] used leave-one-subject-out (LOSO) and 10-fold cross-validation (CV) strategies to carry out experiments on the SEED and DEAP datasets. The experimental results show that the accuracy of the proposed method can reach 89.42% (SEED) and 77.34% (DEAP).
The internationally accepted 10-20 electrode arrangement for electrode placement is generally followed while placing the electrodes atop the scalp to cover the brain lobes. From nasion to inion, measurements are performed in the median and transverse planes. Electrode placement locations are measured by dividing the transverse and median planes by 10-20% of the distance interval, as shown in Figure 2. The numbers 10 and 20 indicate the distance between adjacent electrodes (10% or 20% of the total front-back or right-left distance of the skull). Each site has a letter to identify the lobe and a number to identify the hemisphere location. F stands for Frontal, T for Temporal, C for Central (although there is no central lobe, C letter is used for identification purposes), P for Parietal, and O for Occipital. z (zero) refers to an electrode placed on the midline. Even numbers refer to electrode positions on the right hemisphere, while odd numbers refer to the left one.
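The naming rule described above (a letter for the lobe, then a number or z for the hemisphere) can be sketched in a few lines. This is an illustrative helper, not part of the authors' pipeline: the function name `describe_electrode` is hypothetical, the `Fp` prefix is mapped to "prefrontal" following the paper's description of Fp1 and Fp2, and intermediate 10-20 labels such as AF3 or FC5 are deliberately not handled.

```python
import re

# Lobe letters as listed in the text; "Fp" is added for the prefrontal sites
# (Fp1, Fp2) that the paper uses. Other prefixes are out of scope here.
LOBES = {
    "Fp": "prefrontal",
    "F": "frontal",
    "T": "temporal",
    "C": "central",
    "P": "parietal",
    "O": "occipital",
}


def describe_electrode(label):
    """Return (lobe, hemisphere) for a simple 10-20 label such as 'F3' or 'Cz'."""
    match = re.fullmatch(r"(Fp|F|T|C|P|O)(z|\d+)", label)
    if match is None:
        raise ValueError(f"unsupported electrode label: {label!r}")

    prefix, position = match.groups()
    if position == "z":
        hemisphere = "midline"   # z (zero) marks the midline
    elif int(position) % 2 == 0:
        hemisphere = "right"     # even numbers: right hemisphere
    else:
        hemisphere = "left"      # odd numbers: left hemisphere
    return LOBES[prefix], hemisphere
```

Applied to the four electrodes used in this work, the helper places Fp1 and F3 on the left hemisphere and Fp2 and F4 on the right.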
The research of Fakhruzzaman MN et al. [53] indicates that Emotiv EPOC is a possible option but is not recommended for implementing motor imagery applications. Comparing all of the literature reviewed above, we can infer that Bi-LSTM performs better among widely used deep learning models and that PSD can be a better-suited feature. Hence, in the proposed research work, we use Bi-LSTM as the classifier to classify emotions into negative and positive. We conducted our research using the DEAP dataset. We considered and tested two cases: all 32 electrodes, and four electrodes from the frontal and pre-frontal region (Fp1, F3, F4, and Fp2), shown in Figure 2. The electrode placements over the dorsolateral prefrontal cortex and the orbital frontal cortex are F3, F4 and Fp1, Fp2, respectively [54]. Using the PSD of the EEG signal as the feature, the proposed work provides a comparison of the LSTM model with the Bi-LSTM model.
Adjabi I et al. [55] developed a 2-D facial recognition that is still open to future technical and material developments for the acquisition of images to be analyzed. The attention of researchers is increasingly attracted by 3-D facial recognition. Huang H et al. [56] proposed a brain-computer interface (BCI) for patients with disorders of consciousness (DOC), such as coma, vegetative state, minimally conscious state and emergence of minimally conscious state, suffering from a motor impairment, which generally cannot provide adequate emotion expressions. The authors conclude that the BCI system could be a promising tool to detect the emotional states of patients with DOC. El Morabit S et al. [57] compared some popular and off-the-shelf CNN architectures. Most of the used architectures achieved significantly better results compared to many state-of-the-art methods.

Methodology
In this study, machine learning techniques are adopted to classify emotional states. Based on Russell's 2-dimensional emotional model, states of emotion are classified for each subject using EEG data. The PSD of the EEG signal for each video and every participant is extracted using MATLAB code. The PSD is given as the input to the model, which then classifies the emotions into positive and negative classes.
The EEG signals from various electrodes in different scalp regions, namely frontal, parietal, temporal, and occipital, are studied. The region-based classification is performed by considering each scalp region separately. Among all other scalp region electrodes, the frontal region electrodes performed better and gave the highest classification accuracy. The results indicate that the use of a set of frontal electrodes (Fp1, F3, F4, Fp2) for emotion recognition can simplify the acquisition and processing of EEG data.

Dataset
The dataset used for this work is obtained from the online data source called DEAP. DEAP is a multi-modal database comprising electroencephalography and other physiological signals collected from 32 participants while they viewed selected music video clips, with the aim of understanding human emotional responses. The dataset contains 32-channel EEG signals as well as eight additional physiological signals. The stimuli are 40 one-minute music clips, which were labelled based on their ability to evoke feelings. The 32 EEG channels were recorded while participants watched the 40 selected clips, using 10-20 electrode positions and a 512 Hz sampling rate.
After watching each clip, participants completed a self-assessment manikin (SAM) evaluation, rating five dimensions from 1 to 9: arousal, valence, dominance, liking, and familiarity, as can be seen in Figure 2. The valence range extends from unsatisfied to satisfied (i.e., negative to positive), whereas the arousal range extends from inactive to active (i.e., from non-excited to excited). Dominance and liking are not represented in the 2-D model.

Proposed Algorithm
The human brain generates an immense number of neural signals that control all bodily functions, and it stores the emotional experiences accumulated throughout a person's life. By effectively tapping into brainwave signals, we can analyze a person's visceral reactions when subjected to certain situations. These data can help establish whether a person is physically fit or suffering from mental illness [26]. Hence, using EEG signals for emotion recognition provides reliable data for analysis. The EEG data considered in the proposed scheme are acquired from the DEAP dataset, in the version pre-processed for MATLAB. Pre-processing is required to improve the signal-to-noise ratio of EEG data, whose major artefacts are eye blinks, facial and neck muscle activity, and body movements. To address these artefacts, the authors of the database pre-processed the EEG signals before making them publicly accessible: the EEG data were downsampled from 512 Hz to 128 Hz, EOG artefacts were removed, and a bandpass filter from 4.0 Hz to 45.0 Hz was applied. The data were averaged to the common reference and segmented into 60 s trials to obtain the EEG signal for each video. For the proposed scheme, we consider only the EEG signals, taken from all 32 electrodes in the international 10-20 placement and from the frontal region [49]. The case scenarios used in the proposed algorithm are shown in Table 1. These electrode configurations are later tested using LSTM and Bi-LSTM as classifiers, and their results are compared. The PSD of the EEG signal for each video and every participant is extracted using MATLAB code.
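Two of the pre-processing steps mentioned above lend themselves to a small worked example: common average referencing and the size of a 60 s trial at 128 Hz. The sketch below is written in plain Python purely for illustration; the paper relies on the pre-processed DEAP files and MATLAB, and the function names are hypothetical. Downsampling, EOG removal, and band-pass filtering are not reproduced here.

```python
def common_average_reference(channels):
    """Subtract the across-channel mean from every channel at each sample.

    `channels` is a list of equal-length lists, one list per electrode.
    """
    n_channels = len(channels)
    n_samples = len(channels[0])
    if any(len(ch) != n_samples for ch in channels):
        raise ValueError("all channels must have the same number of samples")

    # Mean over electrodes at each time index.
    mean = [sum(ch[t] for ch in channels) / n_channels for t in range(n_samples)]
    return [[ch[t] - mean[t] for t in range(n_samples)] for ch in channels]


def samples_per_trial(sampling_rate_hz=128, duration_s=60):
    """Number of samples in one trial segment."""
    return sampling_rate_hz * duration_s
```

With the values stated in the text, each 60 s trial at 128 Hz contains 128 × 60 = 7680 samples per electrode.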
One of the extracted PSD images shown in Figure 3 is in 1-D. The 2-D plot for happy and sad emotions in the frontal electrode is shown in Figures 4 and 5.
The PSD for each of the test cases is extracted using the pre-processed dataset for MATLAB in the DEAP dataset. The PSD is given as the input to the Bi-LSTM model which later classifies the emotions into positive and negative classes. In the proposed scheme, the PSD of the EEG signal is extracted using the discrete Fourier transform (DFT) of the signal.
The DFT of the EEG signal is calculated using the equation:

$$X(k) = \sum_{n=0}^{N-1} x_n \, e^{-j 2 \pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$

where x_n is a finite duration data sequence recording for any electrode and N is the number of samples in the sequence.
The PSD of the EEG signal is calculated using the Welch method:

$$P(f) = \frac{1}{L U} \left| \sum_{n=0}^{L-1} x_n \, w(n) \, e^{-j 2 \pi f n} \right|^2$$

where U is a normalization factor for the power in the window function, L is the length of the segment, w(n) is the window function, and f is the normalized frequency.
The flow chart for the proposed emotion classification scheme is as seen in Figure 6. The EEG data were downsampled from 512 Hz to 128 Hz, and EOG artefacts were removed. The PSD of EEG signals for each video and every participant is extracted. The PSD is given as the input to the Bi-LSTM model which later classifies the emotions into positive and negative classes based on the valence-arousal scale. Valence equal to 1 indicates positive emotion. Otherwise, it indicates negative emotion.
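To make the DFT and Welch steps concrete, the following is a minimal plain-Python sketch of both. It is not the authors' MATLAB code: the Hann window, the 50% overlap used in the example, and the choice U = (1/L) Σ w(n)² are assumptions, since the paper does not state which window or overlap was used. A direct O(N²) DFT is used for clarity rather than speed.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def hann(length):
    """Symmetric Hann window (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def welch_psd(x, segment_length, overlap):
    """Average the modified periodograms of overlapping, windowed segments."""
    if len(x) < segment_length or not 0 <= overlap < segment_length:
        raise ValueError("need len(x) >= segment_length > overlap >= 0")

    window = hann(segment_length)
    # U normalizes for the power in the window function.
    u = sum(w * w for w in window) / segment_length
    step = segment_length - overlap

    psd = [0.0] * segment_length
    n_segments = 0
    for start in range(0, len(x) - segment_length + 1, step):
        segment = [x[start + n] * window[n] for n in range(segment_length)]
        spectrum = dft(segment)
        for k in range(segment_length):
            psd[k] += abs(spectrum[k]) ** 2 / (segment_length * u)
        n_segments += 1
    return [value / n_segments for value in psd]
```

As a quick check, a sinusoid that completes four cycles per 32-sample segment, passed through `welch_psd(signal, 32, 16)`, produces its largest value at frequency bin 4 in the lower half of the spectrum.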
The schematic representation of the proposed emotion classification block diagram is shown in Figure 7.

LSTM is a class of recurrent neural networks (RNNs) that can learn long-term dependencies and use them to solve sequence classification problems. Unlike networks that handle only individual data points such as pictures, LSTM includes feedback connections, so it can process an entire sequence of data. This is effective in areas such as voice recognition and translation software. The pictorial representation of a single LSTM cell is shown in Figure 8, where:

C_t: cell state
H_t: hidden state
X_t: input data (PSD of each electrode for every video)
tanh: hyperbolic tangent function
σ: sigmoid function
×: pointwise multiplication
+: pointwise addition

In a single LSTM cell, 't' represents the new iteration and 't − 1' the previous iteration. Considering Figure 8, from the left, the forget gate decides which bits carried over from the previous state are kept, given the new input data. In the new-memory stage, the goal is to determine which new information should be added to the network by combining the previous state and the new input data. The output gate ensures that only the necessary information is given as output.
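The gate behaviour described above corresponds to the standard LSTM update, which can be sketched as follows. This is a generic textbook cell, not the authors' MATLAB network: the parameter names (`Wf`, `Wi`, `Wg`, `Wo` and the matching biases) and the function names are hypothetical and chosen only for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vec):
    """Compute weights @ vec + bias for list-of-lists weights."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM iteration: returns the new hidden state H_t and cell state C_t."""
    z = h_prev + x_t  # concatenate previous hidden state and new input
    f = [sigmoid(a) for a in affine(params["Wf"], params["bf"], z)]    # forget gate
    i = [sigmoid(a) for a in affine(params["Wi"], params["bi"], z)]    # input gate
    g = [math.tanh(a) for a in affine(params["Wg"], params["bg"], z)]  # new memory
    o = [sigmoid(a) for a in affine(params["Wo"], params["bo"], z)]    # output gate
    # Pointwise multiply and add: keep part of the old cell state, add new memory.
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t


# Usage with one hidden unit and one input feature; all-zero weights make every
# gate equal to 0.5 and the candidate memory equal to 0.
zero_params = {name: [[0.0, 0.0]] for name in ("Wf", "Wi", "Wg", "Wo")}
zero_params.update({name: [0.0] for name in ("bf", "bi", "bg", "bo")})
h_t, c_t = lstm_step([0.3], [0.0], [2.0], zero_params)
```

With the zero weights in the usage lines, the forget gate halves the previous cell state, which makes the role of each pointwise operation easy to check by hand.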
In the proposed scheme, the single LSTM cells are combined and the LSTM architecture is built, as shown in Figure 9. The necessary information is retrieved from the features using the LSTM cells. Their output is passed to a fully connected layer, which is in turn connected to the softmax layer. The softmax layer separates the values according to the trained labels, and emotions are classified into two labels, namely positive and negative emotions.
Bidirectional LSTM (BiLSTM) is a recurrent neural network used primarily in natural language processing. Unlike a standard LSTM, the input flows in both directions, so the network can use information from both sides of the sequence. In summary, BiLSTM adds one more LSTM layer in which the input sequence flows backward. The outputs from both LSTM layers can then be combined in several ways.
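The bidirectional idea can be illustrated with a small sketch. To keep the example short, a scalar tanh recurrent cell stands in for the LSTM cell (an assumption of this sketch, not the paper's model), and the two directions are combined by concatenation, which is one common choice alongside summation and averaging. All function names are hypothetical.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h):
    """Stand-in recurrent cell (scalar tanh RNN); an LSTM cell would slot in here."""
    return math.tanh(w_x * x_t + w_h * h_prev)


def run_direction(seq, w_x, w_h):
    """Run the cell over seq in the given order and collect every hidden state."""
    h = 0.0
    states = []
    for x_t in seq:
        h = rnn_step(x_t, h, w_x, w_h)
        states.append(h)
    return states


def bidirectional(seq, fwd_weights, bwd_weights):
    """Pair each time step's forward state with its backward state."""
    fwd = run_direction(seq, *fwd_weights)
    # The backward layer reads the reversed sequence; its outputs are then
    # reversed again so that index t refers to the same time step in both lists.
    bwd = run_direction(seq[::-1], *bwd_weights)[::-1]
    return list(zip(fwd, bwd))


# Usage: only the first sample is non-zero.
out = bidirectional([1.0, 0.0, 0.0], (1.0, 1.0), (1.0, 1.0))
```

In the usage example, the forward state at the last step still carries information about the first sample, while the backward state at that step has seen nothing yet; at the first step the backward state has already seen the whole sequence.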
The BiLSTM model classifies the emotion as positive or negative using the sums of valence and dominance and of arousal and liking. If the sum of valence and dominance is greater than that of arousal and liking, the emotion is said to be positive. Otherwise, the emotion is negative. The schematic representation of the proposed BiLSTM architecture is shown in Figure 10.
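Read literally, the labelling rule above amounts to a single comparison. The sketch below assumes the four DEAP self-assessment ratings are available as numbers on a common scale; the function name `label_emotion` and the example ratings are made up for illustration.

```python
def label_emotion(valence, arousal, dominance, liking):
    """Positive if valence + dominance exceeds arousal + liking, else negative."""
    if valence + dominance > arousal + liking:
        return "positive"
    return "negative"


# Usage with made-up ratings.
first = label_emotion(valence=7.0, arousal=3.0, dominance=6.0, liking=4.0)
second = label_emotion(valence=2.0, arousal=8.0, dominance=3.0, liking=5.0)
tie = label_emotion(valence=5.0, arousal=5.0, dominance=5.0, liking=5.0)
```

Because the rule uses a strict inequality, a tie between the two sums falls into the negative class.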
To detect emotions from EEG data, a deep learning BiLSTM network is used. To adopt a pure subject-independent strategy, the model is trained and tested on the DEAP database. All 32 electrodes and four frontal EEG electrodes (Fp1, Fp2, F3, and F4) are chosen from DEAP. Positive and negative emotions are classified based on the valence rating. The DEAP database is labeled in accordance with the valence and arousal scale, i.e., if valence is equal to 1, it indicates positive emotion; otherwise, it indicates negative emotion.
In the proposed approach, PSD features are extracted from the EEG signal. The normalized PSD features are used for training the LSTM and BiLSTM architectures. The BiLSTM outputs are passed through a fully connected layer and a softmax layer. The softmax layer generates the positive or negative emotion status.
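The role of the softmax layer can be shown with a short sketch: it turns the two scores produced by the fully connected layer into class probabilities, and the larger probability gives the emotion label. The function names, the label order, and the example scores are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels=("negative", "positive")):
    """Return the most probable label together with the class probabilities."""
    probs = softmax(scores)
    return labels[probs.index(max(probs))], probs


# Usage with made-up fully connected layer outputs.
label, probs = classify([0.2, 1.5])
```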

Results and Discussion
The proposed classifier technique is applied to all the electrode configurations shown in Table 1. Figure 11 shows the best fit accuracy versus epoch and loss versus epoch graphs.

The optimum learning rate is 0.01, and the last layer, a fully connected layer, uses the softmax activation function. The number of epochs was set to 50 in the model. The scheme was applied to all the configurations mentioned in Table 1, first using all the data and then after removing the outlier samples with the MATLAB function rmoutliers. All the results obtained for the test cases are discussed in the next section.
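The text does not say which detection method was passed to rmoutliers. As a rough illustration, the sketch below assumes the function's default behaviour, which flags values lying more than three scaled median absolute deviations (MAD) from the median. It handles a single list of values, and the name `remove_outliers` and the sample data are hypothetical.

```python
import statistics

# Scale factor that makes the MAD consistent with the standard deviation
# for normally distributed data (approximately 1.4826).
MAD_SCALE = 1.4826


def remove_outliers(values, threshold=3.0):
    """Drop values more than `threshold` scaled MADs away from the median."""
    med = statistics.median(values)
    scaled_mad = MAD_SCALE * statistics.median([abs(v - med) for v in values])
    return [v for v in values if abs(v - med) <= threshold * scaled_mad]


# Usage with made-up feature values; 10.0 is far from the rest.
cleaned = remove_outliers([1.0, 1.1, 0.9, 1.05, 0.95, 10.0])
```

If more than half of the values are identical, the scaled MAD is zero and every value that differs from the median is dropped, so this simple version is best suited to continuous-valued features.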
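A range of key evaluation metrics is used for our models. The following measures were used to test model reliability: accuracy, precision, recall, and F1-score, calculated from Equations (3)-(6), respectively.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (3)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (4)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (5)$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (6)$$

where TP (true positives) and TN (true negatives) are the correct predictions, and FP (false positives) and FN (false negatives) are the incorrect predictions.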
Confusion matrix: A confusion matrix is a square matrix that summarizes how the model's predictions compare with the true classes. The diagonal values indicate the correct predictions (both true positives and true negatives), and the off-diagonal values indicate the misclassified instances (both false positives and false negatives).
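The four measures can be computed directly from the entries of a two-class confusion matrix, as in the sketch below. The counts in the usage lines are made up for illustration and are not results from the paper; the function name `binary_metrics` is likewise hypothetical.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Usage with illustrative counts only.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```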
In this paper, KNN, SVM, ANN, LSTM, and Bi-LSTM were used as state-of-the-art classifiers since they are known to give good results. The performance of these models is summarized in Tables 2 and 3. The size of the dataset, i.e., the extracted PSD, varies with the electrode configuration; hence, the number of iterations for each test case varies. For all the test cases, 50 epochs were considered and a learning rate of 0.01 was maintained throughout the experiment, using a single CPU. The confusion matrices of LSTM and Bi-LSTM are given below: Figure 12 presents the confusion matrix for LSTM, and Figure 13 that for Bi-LSTM. In the confusion matrix, the diagonal elements represent 'True prediction' and the rest are 'False prediction'.
In [53], the researchers reported a highest accuracy of 89.66% using Bi-LSTM. In comparison, Bi-LSTM in our proposed scheme reaches a better test accuracy of 94.95% with a 70-30 train-test split; the models were evaluated using the five-fold validation method. The comparison between the works covered in the literature review and the results obtained from the proposed scheme can be seen in Table 4.

Table 4.
Reference | Features | Classifier | Accuracy
[42] | Statistical data | LSTM | 84.89%
Yang J et al. [49] | Differential Entropy | BiLSTM | 84.21%
Alhagry Salma et al. [43] | Raw EEG Data | LSTM | 85.65%
D. Acharya et al. [19] | | |
Joshi VM et al. [ | | |

The proposed method considers the 2-D model (arousal and valence) and a minimal set of electrodes. The experimental results demonstrate that the region-based classifications provide higher accuracy compared to selecting all 32 electrodes.
Recently, several neurophysiological studies have reported a correlation between EEG signals and emotions. These studies showed that the frontal scalp seems to store more emotional activation than other regions of the brain, such as the temporal, parietal, and occipital regions. The experimental results of this study agree: among the different brain regions, the frontal region gives the highest accuracy in classifying positive and negative emotions.
Emotional conditions can be inferred from different modalities, including speech, face, gestures, etc. The physiology behind emotional states is associated with the limbic system, which is a major part of the brain. EEG signals, which originate directly from the brain, carry a significant signature of emotional states.

Conclusions
The main aim of our work was to use the PSD of EEG signals to classify emotions into positive and negative emotions, which we have successfully achieved. By using Bi-LSTM, we were able to improve accuracy. Both classifiers gave better accuracy with the frontal electrodes than with all 32 electrodes, and Bi-LSTM performed the best amongst the classifiers utilized in our work. In the proposed scheme, we have achieved an average accuracy of 94.95%.
The classification of positive and negative emotion using Bi-LSTM, with and without removing the outlier samples of the EEG signal, is performed for all scalp regions, i.e., frontal, parietal, occipital, and temporal, and for all 32 electrodes. Among these, the frontal region electrodes (Fp1, F3, F4, and Fp2) and all 32 electrodes showed the highest classification accuracy. Without removing the outlier samples of the EEG signal, the classification accuracy obtained is 90.25% and 91.25% for all 32 electrodes and the four frontal electrodes, respectively. After removing the outlier samples, the classification accuracy increased to 92.15% and 94.95% for all 32 electrodes and the four frontal electrodes, respectively.
For further development, electrode-specific work, further identification of particular emotions, and integration of a greater number of features can provide insight into the specific electrodes that contribute to analyzing the change in electrical signals based on the emotions induced by the stimuli. The improved accuracy makes it possible to use this system in different applications, e.g., wearable sensor design and biofeedback applications for monitoring stress and psychological wellbeing.