An Ensemble Learning Approach for Electrocardiogram Sensor Based Human Emotion Recognition

Recently, researchers in the area of biosensor based human emotion recognition have used different types of machine learning models for recognizing human emotions. However, most of these models still cannot recognize human emotions with high classification accuracy when only a limited number of biosensors is used. In the domain of machine learning, ensemble learning methods have been successfully applied to real-world problems that require improved classification accuracy. Building on that, this research suggests an ensemble learning approach for developing a machine learning model that can recognize four major human emotions, namely anger, sadness, joy, and pleasure, from electrocardiogram (ECG) signals. For feature extraction, this analysis combines four ECG signal based techniques, namely heart rate variability, empirical mode decomposition, with-in beat analysis, and frequency spectrum analysis. The first three feature extraction methods are well-known ECG based feature extraction techniques mentioned in the literature, and the fourth technique is a novel method proposed in this study. The machine learning procedure of this investigation evaluates the performance of a set of well-known ensemble learners for emotion classification and further improves the classification results using feature selection as a prior step to ensemble model training. Compared to the best performing single biosensor based model in the literature, the developed ensemble learner achieves an accuracy gain of 10.77%. Furthermore, the developed model outperforms most of the multiple biosensor based emotion recognition models with a significantly higher classification accuracy.


Introduction
Human-Computer Interaction (HCI) research is focused on making interaction with computers more productive and interactive. One of the methods used for improving the interaction between humans and computers is to provide emotional intelligence to computing systems. Such systems are capable of adapting to the emotional state of the user. Examples of such systems include entertainment systems, healthcare systems, adaptive learning systems and computer games.
Previous studies have investigated different methods to provide emotional intelligence to computers. Among those methods, facial image based emotion recognition is the most widely used method since this method can recognize a wide range of emotion types [1,2]. Another approach is the speech signal analysis that determines the human emotion by analysing the patterns in the speech signal [3,4]. In addition, direct examination of the person is one of the most sophisticated ways for human emotion recognition [5,6]. These methods use different types of bio-signals to continuously

Related Work
In recent years, there has been an increasing amount of literature on human-computer interaction methods to provide emotional intelligence to computers. Emotional intelligence is widely used to develop emotionally-aware healthcare monitoring systems, computer games and entertainment systems and safe driving systems. In computer games, emotional intelligence can be used to evaluate the player's affective state for dynamic game content generation [5,21,22]. Similarly, in vehicle safety systems, emotion recognition models are used to monitor the affective state of the driver while operating [23,24]. Furthermore, in health care systems, emotional intelligence is employed to monitor the emotional state of patients [6,25,26].
Rattanyu and Mizukawa [9] discuss speech analysis, facial feature analysis and bio-signal processing as the primary methods for emotion recognition. Firstly, speech based emotion recognition methods determine the emotion by analysing a given speech signal. The main drawback of this method is that the user needs to speak continuously for the system to determine the emotional state. Secondly, facial image based recognition is another widely used method for emotion recognition [1,2]. Although it provides accurate predictions for emotions, the main problem with this method is that some people tend to mask their emotional states (social masking) while being observed [7,8]. Finally, bio-signal processing methods use different types of bio-signals to predict emotions. A bio-signal based method is an adequate solution for recognizing emotions compared to the other methods. Because bio-signals cannot be masked, they improve the prediction accuracy compared to facial image based recognition methods [8]. In addition, since bio-signals are available continuously, unlike speech signals [3,4], the system can continuously identify the emotion level.
Numerous studies have attempted to use different types of bio-signals for detecting emotions [9,10,27]. Kim and André [8] developed an emotion recognition model incorporating four different biosensors: electrocardiogram, skin conductivity, electromyogram and respiration. In their investigation, they achieved 70% accuracy for person-independent emotion classification. The developed model was able to classify a given set of bio-signal patterns into four emotion classes: anger, sadness, pleasure and joy. In another major study, Rattanyu and Mizukawa [9] developed an emotion recognition model using ECG signals with a classification accuracy of 61%. In a study that set out to develop a neural network based emotion recognition model, Yoo et al. [28] developed a model incorporating ECG signals and skin conductivity. Their classification model reached 80.1% accuracy. In another major investigation, Ayata et al. [29] developed an emotion recognition model to classify arousal and valence using galvanic skin response. That model incorporates features from empirical mode decomposition and statistical analysis methods. Among multi-sensor based emotion recognition models, the model developed by Nazos et al. [30] can recognize sadness, anger, surprise, fear, frustration and amusement with up to 83% accuracy. Furthermore, the recent investigation by Gouizi et al. [31] reports an accuracy of 83% for recognizing six emotions by using six different biosensors. More information on biosensor based emotion recognition can be found in the recent state-of-the-art reviews by Jerritta et al. [32] and Egger et al. [33].
Murugappan et al. [34] discuss the challenges and limitations in multi-sensor based emotion recognition. One of the major challenges is the increased computational complexity due to multiple sensor data streams and algorithmic requirements. The other factor is the limitation to subjects' freedom (movements) due to multiple sensor probes, wires, etc. By building on the concept of simplicity, they have been able to develop an emotion recognition model with a classification accuracy of 66.48% by only using ECG signals. Considering all of these factors, the selected method should be able to provide high emotion recognition accuracy with a minimum number of sensors.
A number of studies have examined the use of ECG signals for emotion recognition [7,10,27,34,35]. An ECG based method is an adequate solution for four important reasons. Firstly, the ECG signal is a result of activities in the heart, which has nerve endings from the autonomic nervous system that governs the behaviour of each emotion [11]. Secondly, ECG sensors can be used as a wearable device [36]. Thirdly, it is convenient to use because ECG signals can be captured from different parts of the body [37]. Finally, the ECG signal has a high amplitude compared to other bio-signals [9].
To date, various methods have been developed and introduced to extract features from ECG signals. One commonly used method is the heart rate variability analysis [8,28,38]. HRV analysis is a broadly used method in biomedical engineering applications [39]. The method developed by Ferdinando et al. [38] using HRV analysis had an accuracy of around 59% for identifying arousal and valence state of a person. Similarly, another study that developed an emotion recognition model incorporating ECG signals and skin resistance had 80% accuracy for recognizing the four quadrants of the discrete emotional model [40].
Another widely used method is empirical mode decomposition (EMD). EMD is one of the well-structured approaches to analysing non-stationary and nonlinear data [41]. Furthermore, according to investigations done by Manjula and Sarma [42], EMD performs better than wavelets when extracting spectral power based features. Foteini et al. [43] point out that each of the first six intrinsic mode functions generated by the EMD method relates to a specific activity in the heart. Building on that, a number of studies have used empirical mode decomposition for analysing ECG signals [7,27]. Jerritta et al. [7] investigated the use of the Hilbert-Huang transform (HHT) for EMD based feature extraction and came up with a classification model with 54% accuracy for identifying six emotions in the discrete emotion model [40].
With-in beat analysis is another method for ECG based emotion recognition that has a high emotion recognition accuracy compared to EMD and HRV methods. This method was introduced by Rattanyu and Mizukawa [9] for recognising six emotions in the emotional spectrum. Their model was able to identify an emotion with up to 61% accuracy using ECG signals.
Some of the studies have used discrete Fourier transform (DFT) to extract frequency domain features from the ECG signal. Jerritta et al. [10] discuss the advantages of using frequency domain features compared to EMD based features. They claim that, unlike EMD features that provide an idea about the local properties of the ECG wave, the DFT method provides information about the frequency content of the signal. In their study, they have achieved 54% accuracy for recognizing neutral, happiness, sadness, fear, surprise and disgust emotions from ECG signals utilizing DFT based features of ten intrinsic mode functions derived from the EMD method.
Collectively, most of the studies have used different analysis methods to extract features from ECG signals. In HRV analysis, the HRV time series is generated only by considering the R-R interval variations of the ECG wave. However, the features extracted from this method represent features from both the time domain and the frequency domain of the HRV wave. Similarly, the EMD technique decomposes the signal into a set of oscillating signals. The features extracted from this method also correspond to a set of fragmented features that has correlations to an ECG wave. However, compared to these two methods, the within beat method analyses the raw ECG wave in the time domain. In addition, compared to frequency domain based features extracted by EMD and HRV methods, the DFT method provides an overview of the frequency domain of the raw ECG wave. Each of these approaches has its own advantages, and the features generated by all of these techniques represent a broad range of features that correspond to different domains and spaces of the ECG wave. However, most of the ECG based feature extraction methods in the literature have emotion recognition capability around 55% for different types of classification requirements.
Together, these studies highlight the need for an accurate emotion recognition model with a minimum number of biosensors. The studies presented thus far provide evidence that ECG is the best method for capturing bio-signals because ECG signals contain emotion-related information. In addition, considering the accuracies gained from the classification models, there is a need for higher classification accuracy. However, the methods represented in the literature extract a wide range of features from the ECG wave, and they are sophisticated methods for examining time-varying data. Although a number of studies have investigated different approaches for ECG based emotion recognition, none has so far investigated the feasibility of combining well-known ECG based feature extraction methods to select an optimal set of features that gives higher emotion classification accuracy.
Considering most of the studies mentioned in the literature, it is apparent that the majority of them have used traditional single learner algorithms as the prediction model. The considered algorithms include support vector machines, K-nearest neighbour, Fisher analysis and artificial neural networks. Even though most of the mentioned algorithms are well-established techniques, a majority of them cannot recognize emotions with high classification accuracy. Recently, ensemble learning methods have been used to improve the classification accuracy of various problems in different domains, with significant accuracy improvements [15,16,19]. Furthermore, research in the domain of biomedical signal analysis has also used these ensemble techniques to improve model performance [12,17,18].
Even though an extensive amount of research has been conducted on defining primal human emotions, studies have selected different target emotions when developing prediction models [44]. This investigation is based on the 2D emotional model proposed by J. A. Russell [45], where the emotions are placed in a 2D arousal and valence space. To be broad in the selection of emotions, this study considers the primal emotion of each emotional quadrant as a target emotion. This selection improves the diversity of the predictions made in the study. Furthermore, similar studies have used the same set of target emotions, and their reported classification accuracies are useful for benchmarking the developed model. Therefore, the analysis of this study is focused on recognizing four primal emotions, namely anger, sadness, pleasure, and joy. Additionally, this study presents the classification results of two models developed incorporating two additional emotions in the emotional spectrum. A complete overview of the emotions and their organization in the arousal-valence space is given in Section 3.1.3.
The main objective of this paper is to evaluate the capability of ensemble learners for biosensor based human emotion recognition that requires higher prediction accuracies. This research combines four ECG based feature extraction methods, namely: HRV; EMD; WIB; and DFT based. The first two techniques are the most widely used methods in the literature, and this study uses the with-in beat method because of its high emotion recognition accuracy. Additionally, this study introduces a novel method that extracts a set of frequency-domain features from ten frequency bands of the ECG wave employing discrete Fourier transform (named as TFB features). As an additional step for the ensemble learning procedure, the machine learning analysis of this paper selects a set of optimal features by combining the mentioned feature extraction methods for recognising anger, sadness, joy and pleasure.

Material and Methods
As discussed earlier, the principal objective of this research is to suggest an ensemble learning approach for ECG based emotion recognition by combining four ECG feature extraction methods. The selected feature extraction methods can be listed as follows: HRV, EMD, WIB and TFB. Section 3.1 of the methodology describes the 2D emotion model and ECG signal acquisition. Moving forward, Section 3.2 addresses the signal pre-processing algorithms. After that, Section 3.3 describes selected feature extraction methods in detail. Finally, Section 3.4 of the methodology illustrates the machine learning process.

Experiment for ECG Data Collection
This section describes the process conducted to acquire data from subjects to develop a machine learning model. The first section of this subsection provides insight into selecting a suitable ECG sensor for capturing ECG data. The second section explains the ECG data capturing algorithm developed. Moving forward, the following section talks about the adapted discrete emotional model. The last section explains the ECG data collecting experiment in detail.

ECG Sensor
As various types of hardware can record ECG signals, the hardware was selected based on a few factors. Firstly, the subject should not feel restricted while wearing the hardware, as this has a direct impact on the comfort of the subject. Secondly, recorded ECG signals should not be too noisy, as too much noise makes the processing difficult. Finally, the hardware should be financially affordable. After considering all of these factors, a Spiker-Shield Heart and Brain sensor was selected, as it is affordable and has an inbuilt noise sensor. The subject only needs to have a few plasters on their hand in order to record the ECG signals, which keeps the disturbance to the subject low. Figure 1 shows an image of the wearable sensor used.

Signal Collection
A simple algorithm was developed to acquire signals from subjects by communicating with an Arduino microcontroller. The sampling rate of the signal was set to 1000 Hz, and the baud rate of the serial communication unit was adjusted to 115,200 bps. Since the emotion-related changes can be observed in 3-15 s of the ECG signal, the length of the ECG signal frame was set to 20 s [32]. The algorithm sends the captured 20 s ECG frame to the data collection node via a Node.js asynchronous function interface. Then, the data collection node captures the data and writes the values to a data file (.txt) indicating the subject ID, captured time and the emotion. Later, this information is used to filter the emotion elicited data frame from the data space.

Discrete Emotional Model
As shown in Figure 2, the discrete 2D emotional model [45] places all human emotions on two axes, namely arousal and valence. The first quadrant of the emotional model includes high-arousal, high-valence emotions. This quadrant holds joy as its primal emotion and other emotions such as excited, astonished, and delighted as secondary emotions. The second quadrant of the emotional model, which represents low-arousal, high-valence emotions, includes emotions like pleasure and calm. Next, the third quadrant of the emotional model incorporates emotions such as sadness and disgust. Finally, the fourth quadrant of the emotional model includes anger, fear and annoyance, which represent the low-valence, high-arousal scenarios.

Experiment
The designed experiment captures six emotions in the discrete emotional model, namely joy, sadness, pleasure, anger, fear, and neutral. A majority of these emotions represent the primal emotions of the emotion spectrum, and those are the emotions that were intended to be identified in this study using ECG signals. The other emotions (fear and neutral) were chosen to conduct the comparative analysis with the literature.
As for the design of the experiment, firstly, a set of videos were collected by consulting with domain experts to elicit selected target emotions. Each of these videos was 3-10 minutes in length.
Subjects were invited into a disturbance-free environment and sensors were fixed on to them in order to record ECG signals. Then, every video was shown to the subjects with two-minute breaks in between. At the end of each video, subjects were asked to write down their emotional experience throughout the video on a pre-designed feedback paper, highlighting points of emotional climax. If the subject's emotional climaxes matched the target emotion, the attempt was marked as successful (i.e., a hit; see Table 1). These climaxes were synchronized with the ECG data collection unit, and later this information was used to filter the emotion-related ECG signals. This was achieved by recording the ECG signals with time information, and then matching the time of the emotional climax with the signal time. Table 1 lists the selected video clips and their durations. Even though each video was marked as eliciting a specific emotion, subjects were not aware of the intended emotion of each. To further reduce bias in the results, the order in which the videos were played was also changed. Compared to the other three primal emotions, the anger video has the lowest hit rate, and this phenomenon was also observed in previous studies [46]. Despite that, the other selected videos had a significantly better chance of eliciting the target emotions. Figure 3 shows the experimental setup and the environment while two subjects were going through the designed experiment.
A total of 25 subjects of both genders, between 22 and 26 years of age, participated in the experiment. Out of these 25, the ECG signals of three participants had to be removed due to noise issues and signal anomalies. The rest of the ECG signals were used for the subsequent work. The final filtered data set contains 488 ECG waves of 20 s each, comprising 105 anger waves, 110 sadness waves, 174 joy waves and 99 pleasure waves. Furthermore, it comprises 165 data frames of the fear emotion and 103 of the neutral emotion.

Signal Pre-Processing
The pre-processing procedure of the signal consists of three main steps: filtering, de-trending and smoothing. First, a Butterworth bandpass filter with a frequency range of 0.05-100 Hz was used to remove the noise from the ECG signal [8]. The resulting signal shows a trended pattern as a result of filtering near the DC component. Therefore, Algorithm 1 was used to stabilize the signal.
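The filtering step can be sketched as follows. This is only an illustrative sketch, not the implementation used in the study: the 0.05-100 Hz passband and the 1000 Hz sampling rate come from the text, while the filter order, the zero-phase (forward-backward) application, the function name bandpass_ecg and the synthetic test signal are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1000.0  # sampling rate in Hz, as stated in the signal collection section


def bandpass_ecg(x, fs=FS, low_hz=0.05, high_hz=100.0, order=2):
    """Zero-phase Butterworth bandpass filter.

    The passband follows the text; the order and the use of
    forward-backward filtering are assumptions of this sketch.
    """
    # Second-order sections are numerically safer for very low cut-offs.
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


# Synthetic 20 s frame: a 10 Hz component (inside the passband)
# plus 300 Hz interference (outside the passband).
t = np.arange(0, 20.0, 1.0 / FS)
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
filtered = bandpass_ecg(raw)
```

Second-order sections are used here because a 0.05 Hz cut-off at a 1000 Hz sampling rate can make transfer-function coefficients ill-conditioned.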

Algorithm 1 De-Trending
Algorithm 1 describes the de-trending procedure used to stabilize the signal. The algorithm takes a signal x[n] with N samples and stabilizes it by dividing it into K small segments. First, the algorithm fits each signal segment to a second order polynomial. After that, the trend of the segment is estimated by evaluating the fitted polynomial over the time axis of the segment. Then, the predicted trend, which is a second order polynomial in time, is subtracted from the original signal. After de-trending, the resulting signal was further smoothed using a Gaussian kernel, and it was made sure that the smoothing procedure preserves the vital information of the wave. Figure 4 illustrates the resulting signals after conducting each step.
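The de-trending and smoothing steps described above can be sketched as follows. This is a minimal sketch under stated assumptions: the number of segments K, the Gaussian kernel width and the function names are hypothetical, since the text does not give them.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def detrend_piecewise(x, n_segments=10, poly_order=2):
    """Subtract a per-segment polynomial trend (K = n_segments, assumed value)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for idx in np.array_split(np.arange(x.size), n_segments):
        # A normalised time axis keeps the least-squares fit well conditioned.
        t = np.linspace(-1.0, 1.0, idx.size)
        coeffs = np.polyfit(t, x[idx], poly_order)
        out[idx] = x[idx] - np.polyval(coeffs, t)
    return out


def smooth_gaussian(x, sigma_samples=3.0):
    """Gaussian smoothing; the kernel width is an assumed value."""
    return gaussian_filter1d(x, sigma=sigma_samples)


# Synthetic example: a 5 Hz oscillation riding on a slow quadratic drift.
t = np.arange(0, 20.0, 0.001)
drifting = np.sin(2 * np.pi * 5 * t) + 0.02 * (t - 10.0) ** 2
stabilised = smooth_gaussian(detrend_piecewise(drifting))
```

A small kernel width relative to the QRS duration is chosen in this sketch so that sharp peaks are not flattened; the value used in the study is not stated.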

Feature Extraction Methods
This subsection discusses the selected feature extraction methods in detail. The first section explains the PQRST detection algorithm. The second section describes the process of generating the HRV time series and the HRV analysis that is based on it. The remaining sections cover the with-in beat analysis technique, the empirical mode decomposition based features and, finally, the novel frequency band based feature extraction technique used for ECG based feature extraction.

PQRST Detection
As Figure 5 shows, the ECG signal pattern is a result of a series of waves associated with the activities of the heart [47]. The ECG pattern consists of the P wave, followed by the QRS complex and then the T wave. Each of these waves corresponds to a specific activity in the heart (repolarization or depolarization).
To detect PQRST wave positions in an ECG signal, first, a simple algorithm was designed to find the R peak locations of each QRS complex. Then, the identified R peak locations were used to segment out QRS complexes from the ECG signal. After that, since the ECG signal has a specific pattern, local minima and local maxima detection methods were employed to figure out PQST wave locations from each segmented QRS complex. Figure 6 illustrates the PQRST positions discovered using a PQRST detection algorithm. These computed locations were later employed in HRV analysis and WIB analysis. In HRV analysis, only R peak locations are used to compute a set of diverse features, whereas all statistical features of WIB analysis are computed considering different peak-to-peak intervals (PR, RS, QRS, etc.).
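The R peak search and the local-minimum search around each R peak can be sketched as follows. The study's own detection algorithm is not given, so this sketch swaps in SciPy's generic peak finder; the amplitude threshold, the minimum R-R distance, the search window and the synthetic beat shape are all assumptions.

```python
import numpy as np
from scipy.signal import find_peaks


def detect_r_peaks(ecg, fs=1000.0, min_rr_s=0.3, height_frac=0.6):
    """Return sample indices of R peaks (both thresholds are assumed values)."""
    peaks, _ = find_peaks(
        ecg, height=height_frac * np.max(ecg), distance=int(min_rr_s * fs)
    )
    return peaks


def detect_q_and_s(ecg, r_peaks, fs=1000.0, window_s=0.08):
    """Locate Q (before R) and S (after R) as local minima in a short window."""
    w = int(window_s * fs)
    q_idx, s_idx = [], []
    for r in r_peaks:
        lo, hi = max(r - w, 0), min(r + w, ecg.size - 1)
        q_idx.append(lo + int(np.argmin(ecg[lo:r + 1])))
        s_idx.append(r + int(np.argmin(ecg[r:hi + 1])))
    return np.array(q_idx), np.array(s_idx)


# Synthetic beats: a tall R bump with small negative Q and S deflections.
fs = 1000.0
t = np.arange(0, 10.0, 1.0 / fs)
beat_times = np.arange(0.5, 10.0, 0.8)


def bump(center, width, amp):
    return amp * np.exp(-0.5 * ((t - center) / width) ** 2)


ecg = sum(
    bump(b, 0.010, 1.0) + bump(b - 0.04, 0.010, -0.15) + bump(b + 0.04, 0.010, -0.25)
    for b in beat_times
)
r_peaks = detect_r_peaks(ecg, fs)
q_points, s_points = detect_q_and_s(ecg, r_peaks, fs)
```

P and T waves could be located in the same way by searching for local maxima in wider windows before Q and after S.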

Heart Rate Variability Analysis
Heart rate variability (HRV) analysis is one of the most commonly used methods for ECG feature extraction [8,38]. To compute the HRV time series, first, the ECG signal was processed using the PQRST position detection algorithm. After that, the detected R positions (R peaks) were used to compute the R-R interval variations of the ECG wave. Finally, the R-R intervals and the corresponding cumulative sums of R-R intervals were used to compute the interpolated HRV time series. Figure 7 shows a computed HRV time series for a selected subject. Table 2 presents an overview of extracted features and their respective domains. The standard deviation of NN intervals (sdnn), the mean value of NN intervals (m_nn), the root mean square of successive NN interval differences (rmssd) and the maximum value of NN intervals (m_nn) are taken as statistical features. The number of pairs of neighbouring NN intervals varying by more than 50 ms (nn50) and the NN50 over the total number of NN intervals (pNN50) are extracted as additional time-domain features. Here, NN refers to the Normal-to-Normal interval, which corresponds to the R-R interval adopted in this study. Spectral powers of LF (lf), VLF (vlf) and HF (hf), the power ratio LF/HF (lf_hf), the low-frequency power in normalized units (lfnu) and the total power (total_power) are used as frequency-domain features. The Poincaré plots transfer the R-R intervals to a different geometric domain, and the sd1 and sd2 features are calculated as the geometric deviations between consecutive R-R intervals. A detailed explanation of each method and the respective feature notations can be found in [8].
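The time-domain and Poincaré features listed above can be sketched as follows. This is an illustrative sketch rather than the study's code: it uses the commonly quoted relations SD1 = sqrt(0.5)·SDSD and SD2 = sqrt(2·SDNN² − 0.5·SDSD²), sample standard deviations, and a hypothetical function name; the frequency-domain features are omitted.

```python
import numpy as np


def hrv_time_features(r_peaks, fs=1000.0):
    """Time-domain and Poincare HRV features from R peak sample indices."""
    nn = np.diff(r_peaks) / fs * 1000.0   # NN (R-R) intervals in ms
    diff = np.diff(nn)                    # successive NN differences in ms
    sdnn = np.std(nn, ddof=1)
    sdsd = np.std(diff, ddof=1)
    nn50 = int(np.sum(np.abs(diff) > 50.0))
    return {
        "m_nn": float(np.mean(nn)),
        "sdnn": float(sdnn),
        "rmssd": float(np.sqrt(np.mean(diff ** 2))),
        "nn50": nn50,
        # Following the text: NN50 over the total number of NN intervals.
        "pnn50": nn50 / nn.size,
        "sd1": float(np.sqrt(0.5) * sdsd),
        "sd2": float(np.sqrt(max(2.0 * sdnn ** 2 - 0.5 * sdsd ** 2, 0.0))),
    }


# Hypothetical R peak positions (samples at 1000 Hz) for illustration only.
example_peaks = np.array([0, 800, 1600, 2460, 3260, 4000])
features = hrv_time_features(example_peaks)
```

For these example peaks the NN intervals are 800, 800, 860, 800 and 740 ms, so three of the four successive differences exceed 50 ms.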

With-in Beat Features of the ECG Signal
This section implements the method proposed by Rattanyu and Mizukawa [9] for emotion recognition from ECG signals using with-in beat features. With-in beat information of ECG signal includes PR interval, ST interval and QRS interval (i.e., QS). Unlike the HRV method, this method considers the variation of inner pulses of the ECG signal. The with-in beat method computes five different statistical features from each interval. They are mean (mean), maximum (max), minimum (min), median (median) and standard deviation (sd).
To compute with-in beat features, first, the ECG signal was sent to the PQRST detection algorithm. The algorithm returns the locations of identified PQRST positions, and those locations are then used to compute the considered intervals. Table 3 illustrates the computed features and their corresponding notations. The features corresponding to an interval (IN) take the form of (1):
[min_IN, max_IN, sd_IN, mean_IN, median_IN]. (1)
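The five statistics per interval can be sketched as follows. The feature names follow the notation of (1); the helper name wib_features, the use of the sample standard deviation and the example interval values are assumptions made for illustration.

```python
import numpy as np


def wib_features(start_idx, end_idx, name, fs=1000.0):
    """Five statistics of one with-in beat interval (e.g. PR, ST or QRS).

    start_idx and end_idx are per-beat sample indices of the two wave
    positions that delimit the interval.
    """
    interval_ms = (np.asarray(end_idx) - np.asarray(start_idx)) / fs * 1000.0
    return {
        f"min_{name}": float(np.min(interval_ms)),
        f"max_{name}": float(np.max(interval_ms)),
        f"sd_{name}": float(np.std(interval_ms, ddof=1)),  # sample SD (assumed)
        f"mean_{name}": float(np.mean(interval_ms)),
        f"median_{name}": float(np.median(interval_ms)),
    }


# Hypothetical Q and S positions (samples at 1000 Hz) for five beats.
q_points = np.array([460, 1260, 2060, 2860, 3660])
s_points = np.array([550, 1360, 2155, 2970, 3765])
qrs_features = wib_features(q_points, s_points, "QRS")
```

Applying the same helper to the PR and ST intervals would give the 15 with-in beat features implied by three intervals and five statistics each.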

Empirical Mode Decomposition Based Features
Empirical mode decomposition decomposes a given signal to a finite number of signals called intrinsic mode functions (IMF). This study computes four different features from each IMF. They are spectral power of IMF in the time domain, the spectral power of IMF in the frequency domain, instantaneous frequency of IMF and spectral power of the instantaneous frequency spectrum of the IMF. Collectively, this EMD based feature extraction method extracts 24 features from ECG signal in both time domain and frequency domain. Figure 8 illustrates the first six IMFs generated from the EMD procedure with the original ECG wave.
Welch's method [48] was used to compute the spectral power in the frequency domain (spec_pf), and the Hilbert transform was adopted to estimate the instantaneous frequency (mean_if) of the IMF. The spectral power of the instantaneous frequency spectrum (spec_pf) was calculated using (2). The feature vector computed using IMF i (i ∈ [1, 2, ..., 6]) is in the form of Label (3),
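The per-IMF computations can be sketched as follows for a single oscillatory component. The decomposition itself is not shown; a pure sinusoid stands in for an IMF. The function name, the dictionary keys, the mean-square definition of time-domain power and the Welch segment length are assumptions of this sketch, not the study's definitions.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.signal import hilbert, welch


def imf_features(imf, fs=1000.0):
    """Illustrative features of one intrinsic mode function."""
    # Instantaneous frequency from the unwrapped phase of the analytic signal.
    phase = np.unwrap(np.angle(hilbert(imf)))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)

    # Power spectral density via Welch's method, integrated over frequency.
    freqs, psd = welch(imf, fs=fs, nperseg=min(256, imf.size))

    return {
        "power_time": float(np.mean(imf ** 2)),   # assumed definition
        "spec_pf": float(trapezoid(psd, freqs)),
        "mean_if": float(np.mean(inst_freq)),
    }


# A 5 Hz sinusoid stands in for an IMF (20 s at 1000 Hz).
t = np.arange(0, 20.0, 0.001)
stand_in_imf = np.sin(2 * np.pi * 5 * t)
imf_feats = imf_features(stand_in_imf)
```

For a clean sinusoid the mean instantaneous frequency recovers the oscillation frequency, which is a quick sanity check before applying the function to real IMFs.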

Ten Frequency Band Analysis
A number of studies have explored the use of different frequency band based features for emotion classification [8,34]. Elaborating on that, this subsection presents ten frequency band (TFB) analysis for emotion recognition. As shown in Figure 9, the frequency range of the ECG signal falls between 0 and 100 Hz. The developed method divides the ECG frequency range into ten sub-bands of 10 Hz bandwidth each and computes the spectral power of each sub-band. This analysis provides a different set of frequency domain features compared to HRV time series based features and EMD based features. The selection of ten frequency bands was based on an empirical study of different spectral power bands within the range of 0-100 Hz. This selection criterion does not consider the physiological aspects of each frequency band, and the choice of spectral power as the measure is based on related studies that used spectral power as one of the signal features. First, a smaller analysis was conducted to find the optimal sub-band width, varying it from 1 to 20 Hz, while making sure the resulting number of bands provides an adequate number of feature values for classification. Then, the setting that showed the highest ensemble based classification accuracy was chosen (in this case, 10 bands of 10 Hz each).
First, Welch's method was used to compute the frequency power spectrum of the ECG signal. This method computes the frequency power spectrum by splitting the signal into a set of overlapping segments and taking the average squared magnitude of each frequency component (f ∈ [0, ..., f_s/2]).
Welch's method takes a discrete signal with N samples (x[n]), the sampling frequency of the signal (f_s), the length of a segment (l_seg), the overlapping length of a segment (l_over) and the window function (w[n]) as input arguments, and then returns the power spectrum (P_x) and the frequency distribution (F_x) of the given signal. Algorithm 2 describes an overview of the steps followed in Welch's method.

Algorithm 2 Welch's Method
A more optimized version of Welch's method in the SciPy API [49] was used to compute the frequency power spectrum of the ECG signal by setting l_seg to 256, l_over to 128 and w[n] to the Hanning window.
After computing the frequency power spectrum of the ECG wave, the 0-100 Hz range of the spectrum was extracted and divided into ten sub-bands. Then, trapezoidal integration was used to compute the spectral power of each sub-band. The derived ten spectral power features can be expressed as following Label (4):
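The ten frequency band computation can be sketched as follows. The Welch parameters (segment length 256, overlap 128, Hann window), the 1000 Hz sampling rate and the trapezoidal integration come from the text; the function name tfb_features, the treatment of band edges and the synthetic test tone are assumptions of this sketch.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.signal import welch


def tfb_features(x, fs=1000.0, n_bands=10, band_hz=10.0):
    """Spectral power of ten 10 Hz sub-bands between 0 and 100 Hz."""
    freqs, psd = welch(x, fs=fs, window="hann", nperseg=256, noverlap=128)
    powers = []
    for k in range(n_bands):
        lo, hi = k * band_hz, (k + 1) * band_hz
        # Assumption: a frequency bin belongs to the band if it lies within
        # the closed interval [lo, hi]; no interpolation at the band edges.
        mask = (freqs >= lo) & (freqs <= hi)
        powers.append(float(trapezoid(psd[mask], freqs[mask])))
    return np.array(powers)


# Synthetic 20 s frame with a single 25 Hz tone, which falls in band index 2.
t = np.arange(0, 20.0, 0.001)
tone = np.sin(2 * np.pi * 25 * t)
tfb = tfb_features(tone)
```

With a 256-sample segment at 1000 Hz the frequency resolution is roughly 3.9 Hz, so each 10 Hz band contains only two or three spectral bins; this is a property of the stated parameters rather than of the sketch.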

Emotion Recognition Model
This section describes the machine learning process that is used to develop a machine learning model for identifying four major emotions: anger, sadness, pleasure and joy. The pre-processed data contains 488 data frames and 63 features for each data frame. The machine learning procedure of this study is divided into two parts: (1) the ensemble learner based machine learning process, and (2) the ensemble learner and feature selection based machine learning process.
The first strategy of this analysis is based on the traditional way of ensemble learning, where the ensemble learner chooses the features and dynamically derives a set of diverse learners. The second technique is inspired by recent studies related to feature selection before ensemble learning [13,14,19]. Both strategies employ six popular ensemble learning methods that cover most of the modern bagging and boosting techniques. Furthermore, the adopted feature selection methods represent a set of diverse techniques for machine learning feature selection (statistical, search based and algorithmic).
Each of these procedures is followed by a model parameter optimization process and a model evaluation process. The Grid Search algorithm [50] was used as the model parameter optimizer, and traditional 10-fold cross-validation was used to evaluate the model. Prior to the machine learning process, the data was normalized by a robust scaler, and all of the algorithms used in this section were taken from the Python scikit-learn API [50].
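The optimization and evaluation loop can be sketched with scikit-learn as follows. This is a sketch under stated assumptions: the data is synthetic (only the 63-feature, four-class shape mirrors the study), the parameter grid is hypothetical, and placing the scaler inside a pipeline is a design choice of this sketch rather than something the text specifies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for the feature matrix: 63 features, 4 classes.
X, y = make_classification(
    n_samples=200, n_features=63, n_informative=15, n_classes=4, random_state=0
)

# Scaling inside the pipeline means it is re-fitted on each training fold.
pipe = Pipeline([
    ("scale", RobustScaler()),
    ("clf", ExtraTreesClassifier(random_state=0)),
])

# Hypothetical grid; the grid used in the study is not given in the text.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [None, 10],
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Other ensemble learners can be evaluated by swapping the "clf" step and its grid while keeping the same cross-validation object.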

Results and Discussion
The results of this paper are presented in four sections: Section 4.1 presents the results from ensemble methods without feature selection, Section 4.2 presents the results from ensemble methods with feature selection, Section 4.3 gives a results overview, and Section 4.4 analyses the computational requirements of the combined features. The first two sections present the data gathered from ensemble learning without and with prior feature selection, and investigate whether feature selection is a worthwhile step for ensemble learning algorithms. The results overview section then compares the final results with different classification models in the literature by discussing emotion elicitation methods, experimental procedures and limitations. The final section describes the computational requirements of each adopted feature extraction method, and then provides reasons for selecting combined analysis with ensemble learning as an optimal method for ECG based emotion recognition. It should be noted that the results mentioned in this section are for recognizing four major emotions in the 2D emotional model.
Table 4 illustrates the results obtained for the different ensemble learning techniques, together with the model parameter optimization results obtained using the Grid Search algorithm. According to the results, the Extra Tree Classifier shows the highest classification accuracy, 70.09% with a standard deviation of 3.34%. The ensemble model developed using the Random Forest classifier has the second-highest classification accuracy, slightly lower than that of the Extra Tree Classifier. Among the other techniques, the Gradient Boost classifier also shows adequate performance for emotion classification. However, the AdaBoost ensemble with different base learners shows relatively lower prediction accuracies.
Table 5 illustrates the classification accuracies obtained after selecting features with different feature selection techniques. The table only presents the accuracies of the three best performing models observed in the previous section. In general, different models respond differently to the various feature selection methods. For instance, the Random Forest Classifier shows the highest accuracy with Model based feature selection, whereas the Gradient Boost classifier provides better results with the Recursive Feature Elimination technique. Examining all results, it is apparent that the feature selection procedure improved the individual accuracies by a significant margin. As an example, it raised the accuracy of the Extra Tree ensemble from 70.09% to 80.00%. Furthermore, Recursive Feature Elimination and the Feature Selection by Model methods provide better results than the chi-square test with a chi statistic greater than 2.0. To summarise the results, the best performing ensemble learner for classifying the four major emotions is the Extra Tree Classifier with the selected features listed in Table 6. These features were selected using the Model based feature selection approach, with the Extra Tree Classifier as the feature selector.
Table 5. Ensemble methods with feature selection.

Method | RF | ETC | GB

Table 6. Feature selection results.

Method | N | D | Features
EMD | 6 | 24 | T: spec_p_1, spec_p_2, spec_p_3, spec_p_4, spec_p_5, spec_p_6, ins_p_1, ins_p_2, ins_p_3, ins_p_4, ins_p_5, ins_p_6; F: spec_pf_1, mean_if_1, spec_pf_2, mean_if_2, spec_pf_3, mean_if_3, spec_pf_4, mean_if_4, spec_pf_5, mean_if_5, spec_pf_6, mean_if_6
HRV | 3 | 14 | T: sdnn, mn_nn, pnn50, m_nn, rmssd, nn50; F: lf, ffnu, lf_hf, total_power, hfnu, vlf; G: sd1, sd2
WIB | 6 | 15 | T: median_pr, mean_pr, max_pr, sd_pr, min_pr, median_qrs, mean_qrs, max_qrs, min_qrs, sd_qrs, median_st, max_st, min_st, mean_st, sd_st

According to the results presented in Table 6, most of the features from the TFB method were chosen as optimal features for emotion recognition. Compared to the TFB based features, however, EMD and HRV based features show less capability for emotion recognition. Most of the HRV features selected in the analysis are statistical features of R-R interval variations. These features are quite similar to the with-in beat (WIB) analysis based features introduced in the literature, which also show good capability (nearly one-third of them were selected). The only difference is that HRV analysis computes statistical features of the intervals between beats, whereas WIB analysis computes statistical features of the intervals within a beat of the ECG signal. Moreover, this further supports the view that features based on raw ECG patterns are more efficient for emotion recognition than features based on other analyses.
Table 7 illustrates the results gathered from training different models with the adopted emotions. Model A* is the principal model of this study and it can recognize four major emotions in the discrete emotion model with up to 80% accuracy. As mentioned, the experiment procedure also collected some additional emotions to demonstrate the effectiveness of ensemble learning, and also for benchmarking purposes.
Those models and their capabilities are also listed in the table, and the following paragraphs compare those classification accuracies with the literature. Additionally, the table depicts the individual gains of the Ten Frequency Band (TFB) analysis method introduced in this study. Even though those features do not reflect a direct physiological aspect of human emotions, they tend to perform better than the others. Therefore, further investigation should be conducted to evaluate the physiological aspects of those features. As Table 7 shows, there is a significant accuracy improvement from combining the selected feature extraction methods with an ensemble learning process. The accuracy for identifying four major emotions is 80.00%, which is significantly better than the results reported in the literature.
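The Model based feature selection step discussed earlier in this section, in which an Extra Tree Classifier acts as the feature selector before the final ensemble is trained, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: the data is synthetic, the importance threshold is scikit-learn's default (the mean importance), and the selector is placed inside the cross-validation pipeline, which is a design choice of this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for the 488 x 63 feature matrix with four emotion classes.
X, y = make_classification(n_samples=488, n_features=63, n_informative=15,
                           n_classes=4, random_state=0)

# Keep the features whose importance is at least the mean importance
# (the default threshold of SelectFromModel for tree ensembles).
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0))

# Selection is done inside the pipeline so each fold selects on training data only.
pipeline = make_pipeline(
    RobustScaler(),
    selector,
    ExtraTreesClassifier(n_estimators=100, random_state=0),
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")

# Fit once on all data to inspect how many features survive the selection.
pipeline.fit(X, y)
n_selected = int(pipeline.named_steps["selectfrommodel"].get_support().sum())
print(n_selected, round(scores.mean(), 4), round(scores.std(), 4))
```

Recursive Feature Elimination or a chi-square filter could be compared in the same way by replacing the selector step with `RFE` or `SelectKBest`, keeping the rest of the pipeline unchanged.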
The method developed by Kim and André [8] combined four different sensors for detecting four primal emotions in the emotional spectrum and achieved an accuracy of 70%. In contrast, the findings of this study show that the same classifier can be developed using a single sensor with an accuracy of 80.00%. In another investigation, Yoo et al. [28] developed a neural network based emotion classification model that achieved a recognition accuracy of 80% for identifying the four emotion quadrants using ECG and skin resistance. They considered six subjects, and the bio-signals were captured at different times of the day for a week. The classifier presented in this investigation, however, includes ECG patterns from 22 different subjects, with a slightly lower accuracy of 78.12%. Furthermore, this study achieves a higher emotion recognition accuracy than the investigation by Maaoui et al. [51], who developed an emotion classifier to identify amusement, contentment, disgust, fear, neutral, and sadness from five biosensors. The accuracy of their model was 46.5%, whereas the method proposed in this investigation can identify six emotions with up to 75.11% accuracy. The accuracy gained from this investigation also outperforms several other studies that investigated multi-sensor based emotion recognition methods [52,53].
In another study, Murugappan et al. [34] developed an emotion recognition model for detecting five emotions (disgust, sadness, joy, fear, neutral), which had an accuracy of 66.48%. They had 20 subjects in the experiment, and some of the features they considered include wavelet transformation based features. The method developed in this investigation has a higher accuracy of 77.25% with a larger subject pool. The emotion recognition method developed by Rattanyu and Mizukawa [9] using with-in beat based features had an accuracy of 61.44% for detecting six emotions, namely: anger; fear; sadness; joy; neutral; and disgust. They had a smaller subject pool than this study, and the only difference between their model and model C produced in this investigation is the disgust emotion. Disgust falls into the same emotion quadrant as sadness, whereas pleasure is in a different quadrant of the emotion model. The model developed in this study, which replaced disgust with pleasure, had a recognition accuracy of 75.11% involving 22 subjects. Examining recent studies, the emotion recognition model developed by Guo et al. [54] can recognize anger, sadness, fear, joy and relaxation with up to 56.9% accuracy. In their investigation, HRV analysis was used for feature extraction and an SVM was used as the machine learning model. By definition, the relaxed emotion in their study can be seen as the pleasure emotion adopted in this study, and in that sense their model is similar to model B developed in this study. Nevertheless, the model developed in this research outperforms their model by a 20% accuracy gain. Furthermore, combining all features from the different domains yields higher accuracies than emotion recognition studies that investigated EMD based feature extraction methods [7,10].
Considering everything, it is apparent that the results obtained in this study outperform most of the methods mentioned in the literature. The next section describes additional perspectives of the proposed methodology, such as emotion data collection methods, experiment procedures and the accuracy gains of the TFB method introduced.
Table 8 compares the final results with the models developed in the literature. According to the data shown in the table, the combined analysis outperforms all ECG signal based emotion recognition models and a majority of the models that use multiple biosensors for recognising emotions. Compared to emotion recognition models that employ multiple biosensors, the frequency spectrum analysis technique introduced in this study has an accuracy enhancement of up to 24.13%. Furthermore, in contrast with the best performing ECG based emotion recognition model in the literature [34], the introduced ensemble model has an accuracy gain of 6.38%. Therefore, considering this accuracy improvement, the TFB method on its own appears to be a highly effective method for emotion recognition using ECG signals.

Results Overview
Considering the combined analysis results, combining other broadly used techniques with the TFB method introduced has improved the prediction accuracy by a significant margin. For instance, after incorporating features from the other analyses into the TFB based model, the accuracy of the six emotion recognition model improved by 4.48% (see Table 7). Furthermore, compared to the best ECG based emotion recognition model mentioned in the literature, the model developed by combining all four methods has improved the accuracy by 10.77%. Additionally, in contrast to the best performing multiple biosensor based emotion recognition model [28], the combined analysis based model has a similar accuracy with a significantly larger subject pool.
As shown in Table 8, studies in the literature have used different methods to elicit emotions in their experimental procedures. The majority have used audio or video based methods, and most of them achieved high emotion recognition accuracy, as did this study, which used video clips as the emotion elicitation method. It should be noted that, unlike picture based methods, such arrangements require a sophisticated procedure to isolate the emotion elicitation climaxes in the ECG signal space. Moreover, the videos should be picked carefully with the help of domain experts. Therefore, the emotion-related data filtering protocol followed in this investigation is a reasonably fair approach for obtaining the ECG signal based emotion climaxes recorded in the data (signal space). Regarding the number of subjects, most single biosensor based studies have adopted a reasonably high number of subjects, whereas multiple biosensor based studies have conducted experiments on a smaller number of subjects. However, as mentioned, the number of subjects involved in an experiment has a direct impact on the accuracy of the model. The subject count in this analysis, 22, is relatively high compared to other studies, and the model produced in this research still outperforms most of them.
Collectively, the ensemble learning based model developed in this study holds a higher capability compared to other studies. Firstly, the combined analysis model comprises ECG emotional data from 22 subjects within the age range of 22-26. Secondly, the developed model is able to identify the emotion of a person from a 20 s ECG wave. Thirdly, the model uses a single biosensor for recognising emotions.
Table 9 illustrates the computation times and the time and space complexities of the selected algorithms (N indicates the number of samples in the signal). According to the table, the EMD method takes 95.97% of the feature extraction time in the combined analysis. Comparing the additional space complexity of the employed techniques, most of them use O(N) additional space while computing. In addition, the feature extraction methods considered compute time domain and frequency domain features, and the time complexity of the majority of the methods is O(N log N). Since the machine learning model takes a 20 s ECG wave, the combined computation time (0.23 s) will not affect the real-time nature of the system. At the same time, combining the methods raises the model prediction accuracy by a significant amount.

Computational Requirement Analysis
Besides the computational requirements of the feature extraction methods, several aspects related to model prediction complexity and transmission time should also be considered when implementing real-life devices. In general, the prediction complexity of an ensemble tree based classifier is O(f_n × n_tree), where f_n is the number of features and n_tree is the number of trees in the ensemble. Furthermore, the transmission time of the signal depends solely on the transmission method itself (i.e., whether it is wired or wireless). The method used in this study, which is wired communication, might not be efficient in real scenarios. However, there is an extensive amount of ongoing research in the domain of wearable computing that can be adopted to develop real-life applications [55,56].
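To make the dependence of prediction cost on the number of trees concrete, the following sketch counts how many tree nodes a single prediction touches as the ensemble grows. It is only an illustration on synthetic data of the same shape as the study's feature matrix; the helper `nodes_visited`, the ensemble sizes and the random seeds are assumptions of this example, and node counts are used as a proxy for work done rather than as a measurement of wall-clock time.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in data: 488 frames, 63 features, four classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(488, 63))
y = rng.integers(0, 4, size=488)


def nodes_visited(n_trees):
    """Total number of tree nodes traversed when predicting one sample."""
    model = ExtraTreesClassifier(n_estimators=n_trees, random_state=0).fit(X, y)
    # decision_path returns a sparse indicator of the nodes the sample passes through.
    indicator, _ = model.decision_path(X[:1])
    return int(indicator.sum())


visits = {n: nodes_visited(n) for n in (10, 100, 400)}
print(visits)
```

The count grows roughly in proportion to the number of trees, which is the n_tree factor in the complexity expression above; the per-tree cost depends on tree depth and is not controlled in this sketch.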

Conclusions
The initial objective of this research was to evaluate the capability of ensemble learners for human emotion classification, a task that requires improved classification accuracy. According to the results presented, the combined features and the selected ensemble learners provide better performance than the single learner models presented in the literature. Furthermore, the results of the feature selection based machine learning process show that feature selection is a worthwhile step even for ensemble learners that rely on diversity.
Even though this study is not a review of ECG based emotion recognition, the results overview section provides an extensive review of ECG based feature extraction methods, emotion elicitation methods, experiment procedures and the evolution of ECG based human emotion recognition.
Findings from this research make several contributions to the current literature. Firstly, this research introduces the TFB analysis, which is a simple but efficient way of extracting ECG based features, and the individual method has a 6.38% accuracy gain compared to the best performing model in the literature. Even though the selected method does not have a physiological implication, the extracted features tend to perform better than other signal based features. Secondly, findings from this research confirm that ECG signal processing is an efficient method for bio-signal based emotion recognition. Furthermore, the feature selection results of this analysis provide insight into the capability of features based on raw ECG signal patterns compared to features based on other analyses. As the final contribution, this research provides empirical evidence on whether feature selection prior to ensemble learning is an appropriate step. Taken together, models derived from the selected features outperform all ECG signal based emotion recognition models mentioned in the literature with a classification accuracy gain of up to 28.61%.

Future Work
Further research needs to be done on emotion recognition for different age ranges. In addition, more research is required to explore the capability of neural network based emotion recognition, because neural networks learn their own features, whereas traditional machine learning models require pre-extracted features.
As mentioned, emotional intelligence can be applied to various situations where the interaction between the human and the machine (such as a computer or a smartphone) needs to be improved and personalized. The analysis conducted in this study is based on a wired wearable device that can transfer data at a higher rate. This wired setup might not be feasible in real situations, and users might feel uncomfortable wearing these kinds of devices. Recently, however, research has found ways of transmitting ECG data wirelessly from wearable devices [55,56]. Some of these designs can even be integrated into clothes by using techniques such as capacitively coupled ECG [36,56,57]. Therefore, given that the computational complexity is low, this model could be used in a real-life application even on a smartphone.
Author Contributions: T.D. proposed the approach to use ensemble learners to enhance the classification performance. T.D. also suggested using feature selection as an additional strategy for improving the ensemble learner classification accuracy. T.D. developed all the feature extraction methods and proposed the novel TFB method. T.D. did the complete machine learning analysis of the research. T.D. wrote the full article and rendered all the figures. T.D. and Y.R. designed and conducted the experiment for ECG data collection. Both R.R. and I.N. provided assistance on this work. All authors approved the final draft.