Multimodal Emotion Evaluation: A Physiological Model for Cost-Effective Emotion Classification

Emotional responses are associated with distinct bodily alterations and are crucial to foster adaptive responses, well-being, and survival. Emotion identification may improve people’s emotion regulation strategies and their interaction with multiple life contexts. Several studies have investigated emotion classification systems, but most of them are based on the analysis of only one, a few, or isolated physiological signals. Understanding how informative the individual signals are and how their combination works would enable the development of more cost-effective, informative, and objective systems for emotion detection, processing, and interpretation. In the present work, electrocardiogram, electromyogram, and electrodermal activity signals were processed in order to find a physiological model of emotions. Both a unimodal and a multimodal approach were used to analyze which signal, or combination of signals, best describes an emotional response, using a sample of 55 healthy subjects. The method comprised: (1) signal preprocessing; (2) feature extraction; (3) classification using random forest and neural networks. Results suggest that the electrocardiogram (ECG) signal is the most effective for emotion classification. Yet, the combination of all signals provides the best emotion identification performance, with all signals providing crucial information for the system. This physiological model of emotions has important research and clinical implications, by providing valuable information about the value and weight of physiological signals for emotion classification, which can critically drive effective evaluation, monitoring, and intervention regarding emotional processing and regulation in multiple contexts.


Introduction
Emotions are adaptive responses to environmental stimuli, involving alterations in subjective experience, as well as in cognitive, motivational, physiological, and behavioral domains [1]. These responses interact strongly with decision-making, perception, and learning [2] and play a pivotal role in survival and well-being, by providing the resources necessary to deal with everyday opportunities and threats [3]. Every emotion serves a functional role and empowers people with the ability to respond to challenging situations.
Notwithstanding the adaptive value of emotions, when persistent and/or exaggerated, they may be associated with significant discomfort and suffering, with deficits in multiple areas of functioning. The classification algorithm used was the support vector machine, resulting in a recognition rate of 85% for different emotional states.
Several studies have investigated emotion classification systems, but most of them are based on the analysis of only one or a few physiological signals. Knowing how informative the individual signals are and how their combination works would enable the development of more cost-effective, informative, and objective systems for emotion detection, processing, and interpretation.
In the present work, our aims are twofold: (i) to evaluate which signal/feature combination best describes an emotion; (ii) to analyze which isolated signal best describes an emotion. To accomplish these goals, data were collected from volunteers who watched emotional videos eliciting happiness, fear, and nothing in particular (neutral emotion condition), while their physiological signals (ECG, EDA, and zygomatic and medial frontalis EMG) were recorded. The most informative five-minute window was selected on the basis of information quantity. Classifiers were designed following a multimodal approach, and different signal combinations were tested and evaluated. Thus, this work presents a modular emotion identification system for emotion stratification and description that can operate independently of the signal source (among those available in the study design).
The development of automatic systems for emotion identification and modelling has the potential to greatly support research and practice in areas where emotional data can be decisive for effectively predicting individuals' emotions, cognition, and behavior, such as mental health settings. In fact, the Global Burden of Diseases, Injuries, and Risk Factors Study of 2017 [19] indicates that some mental disorders, such as depressive and anxiety disorders, are among the leading causes of years lived with disability. This has become even more relevant given the new challenges associated with the COVID-19 pandemic, including the increase in anxiety, stress, and depression rates at a global level [20,21]. Access to complementary information of this nature (physiological activation) would greatly enrich monitoring and intervention in these situations and could even foster the development and improvement of prevention programs.
Based on these premises, the present work is organized as follows: Section 2 describes the experiment and the methods used; Section 3 presents the results, which are discussed in Section 4.

Materials and Methods
In order to study the physiological component of emotions, a carefully designed protocol was implemented to observe and collect the physiological responses of volunteers to video stimuli.

The Physiological Database
All study procedures were conducted in accordance with the guidelines of the Declaration of Helsinki and the American Psychological Association. The study was also submitted to and approved by the Ethics and Deontology Council of the University of Aveiro, Portugal. Before the experiment began, participants were given all necessary information describing the study and were informed that they could withdraw from it at any time. Each participant signed an informed consent form confirming that participation was voluntary and that they understood all study procedures.
This study evaluates the response to happiness, fear, and neutral emotional conditions. To this end, participants completed three sessions at least one week apart; this temporal separation avoids emotional contagion between sessions. The order of exposure to the emotional conditions was counterbalanced. Each participant was instructed to sit comfortably in front of a computer, to be as attentive as possible to the task, and to place his/her chin on a chin rest. Before and after the emotional induction, participants rated their emotional state in terms of happiness, fear, anxiety, and stress on four 100-point Visual Analogue Scales (VAS).
Sensors 2020, 20, 3510 4 of 13
In order to evaluate the physiological response to emotions, an emotion-eliciting video was displayed. This video was preceded by a documentary excerpt of about 5 min, which was considered the baseline since no emotion was induced. Emotional elicitation was performed with three sets of 8-12 movie excerpts, completing a 30-min film presentation. Fear was induced by excerpts of horror movies, happiness by excerpts of comedy movies, and the neutral condition by documentary excerpts (please refer to the work of Ferreira and colleagues [12] for a similar procedure). The emotional content of the excerpts was evaluated in a previous pilot study, in which several movie excerpts were presented to participants who rated their emotional levels; this allowed the effectiveness of the excerpts in inducing happiness, fear, and neutral emotions to be inferred.
Fifty-five volunteers participated in this study, 18 males and 37 females, aged between 18 and 28 years (21 ± 2.62). According to the inclusion criteria, only participants with normal or corrected-to-normal visual acuity were included. Participants could not be taking any medication or have any disease that could influence cardiac functioning (e.g., tricyclic antidepressants or cardiac arrhythmia, respectively), or report any psychiatric or neurological diagnosis. The sample comprised forty-seven right-handed, six left-handed, and two ambidextrous participants. Thirty-six participants had completed secondary education, fifteen had a bachelor's degree, and four had a master's degree.

Signal Recording
Figure 1 represents the process of signal recording. The electrocardiogram (ECG), electrodermal activity (EDA), and zygomatic (EMGZ) and medial frontalis (EMGMF) muscle activity of each participant were recorded using the BIOPAC MP160 data acquisition system and AcqKnowledge 5 software (BIOPAC Systems, Inc. in Goleta, CA, USA), with a sampling rate of 1000 Hz. Ag/AgCl disposable vinyl electrodes (EL503; BIOPAC Systems, Inc.) and the corresponding conductive gel were used to acquire the signals, with the ECG recorded in the Lead II configuration [22]. EDA was collected at the medial phalanx of the index and middle fingers of the left hand. For EMG data collection, two electrodes were positioned on the zygomatic major muscle, 1 cm apart, and two on the frontalis muscle, one at the center and the other 1 cm away, slightly toward the right side.
Before the task, the experimenter cleaned the electrode sites with ethyl alcohol (70%). After electrode placement, the participant waited at least 10 min before starting the experiment to allow the psychophysiological signals to stabilize.

Signal Preprocessing
The participants were instructed to remain as quiet as possible during signal acquisition. Nevertheless, the signals are always affected by noise. Therefore, to increase signal quality, a low-pass Butterworth filter with a cut-off frequency of 40 Hz was applied to the ECG. For the EMG, a band-pass finite impulse response (FIR) filter with cut-off frequencies of 20 Hz and 450 Hz was used. The EDA was filtered with a low-pass Butterworth filter with a 5 Hz cut-off frequency.
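The three filters can be sketched with SciPy as follows. This is a minimal sketch, not the authors' code. The cut-off frequencies and the 1000 Hz sampling rate come from the text. The Butterworth order (4), the FIR length (201 taps with `firwin`'s default Hamming window), and the use of zero-phase forward-backward filtering (`sosfiltfilt`/`filtfilt`) are all assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, firwin, sosfiltfilt

FS = 1000  # sampling rate reported for the recordings (Hz)


def filter_ecg(x, fs=FS, order=4):
    """Low-pass Butterworth at 40 Hz; order 4 and zero-phase filtering are assumptions."""
    sos = butter(order, 40, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def filter_eda(x, fs=FS, order=4):
    """Low-pass Butterworth at 5 Hz; order 4 and zero-phase filtering are assumptions."""
    sos = butter(order, 5, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def filter_emg(x, fs=FS, numtaps=201):
    """Band-pass FIR between 20 and 450 Hz; 201 Hamming-windowed taps is an assumption."""
    taps = firwin(numtaps, [20, 450], pass_zero=False, fs=fs)
    return filtfilt(taps, [1.0], x)


# Example: a 5 Hz component survives the ECG filter while a 100 Hz component is removed.
t = np.arange(5 * FS) / FS
ecg_like = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 100 * t)
ecg_clean = filter_ecg(ecg_like)
```

Forward-backward filtering avoids phase delay, which matters if the filtered signals are later aligned in time. A causal single-pass filter would also match the description in the text.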
The level of physiological activation of participants is not constant across the 30-min film. Moreover, some data segments may be redundant and lead to misleading results. Therefore, as a preliminary step, the information quantity of the signals was calculated second by second for each participant and each stimulus. To accomplish this, an extended alphabet finite context model (xaFCM) was applied to each collected signal. An xaFCM estimates the probability of the next sequence of d > 0 symbols of the information source (depth-d) using the k > 0 immediately preceding symbols (order-k context). For each sequence of length k found, it counts the number of times each sequence of d symbols appeared right after it. The purpose was to use this model to approximate the number of bits that a compressor would generate [23].
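The per-second information estimate can be illustrated with a small adaptive finite-context model. This sketch rests on several assumptions that the text does not specify: uniform quantization of each signal into 8 symbols, k = 3, d = 2, and an additive (Laplace-style) probability estimator. The helper names `quantize` and `xafcm_bits_per_second` are hypothetical.

Each d-symbol block costs -log2 of its estimated probability, which is the ideal code length a compressor driven by this model would spend. These costs are then summed per second.

```python
import math
from collections import defaultdict

import numpy as np


def quantize(signal, levels=8):
    """Map a real-valued signal onto integer symbols 0..levels-1 (uniform bins; an assumption)."""
    x = np.asarray(signal, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros(len(x), dtype=int)
    idx = np.floor((x - lo) / (hi - lo) * levels).astype(int)
    return np.clip(idx, 0, levels - 1)


def xafcm_bits_per_second(symbols, fs, k=3, d=2, alphabet_size=8, alpha=1.0):
    """Estimated bits per second from an adaptive order-k, depth-d finite-context model."""
    symbols = [int(s) for s in symbols]
    counts = defaultdict(lambda: defaultdict(int))  # context -> following d-block -> count
    totals = defaultdict(int)                        # context -> number of blocks seen after it
    n_blocks = alphabet_size ** d                    # size of the extended alphabet
    n_seconds = len(symbols) // fs
    bits = np.zeros(n_seconds)
    for i in range(k, len(symbols) - d + 1, d):
        ctx = tuple(symbols[i - k:i])
        block = tuple(symbols[i:i + d])
        # Additive estimate of P(block | ctx); unseen blocks keep a non-zero probability.
        p = (counts[ctx][block] + alpha) / (totals[ctx] + alpha * n_blocks)
        second = i // fs  # a block is credited to the second in which it starts
        if second < n_seconds:
            bits[second] -= math.log2(p)
        # Update the counts after "coding" the block, as an adaptive compressor would.
        counts[ctx][block] += 1
        totals[ctx] += 1
    return bits
```

Usage would be `xafcm_bits_per_second(quantize(filtered_signal), fs=1000)`, which yields one value per second of recording. Repetitive segments become cheap to code, while novel segments cost more bits.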
The previously calculated information yields a time series representing the amount of information over the 30-min session. Our goal was to identify the 5-min interval with the highest information quantity, in order to increase the information input to the classifier.
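Selecting the 5-min interval can be done with a sliding sum over the per-second information series. This is a minimal sketch. It assumes the chosen interval is the contiguous 300-s span with the largest total information, since the exact selection criterion is not detailed in the text. `best_window` is a hypothetical helper.

```python
import numpy as np


def best_window(info_per_second, window_s=300):
    """Return (start, end) in seconds of the contiguous window with the largest total information."""
    x = np.asarray(info_per_second, dtype=float)
    if len(x) < window_s:
        raise ValueError("series shorter than the requested window")
    csum = np.concatenate(([0.0], np.cumsum(x)))
    window_sums = csum[window_s:] - csum[:-window_s]  # sum of x[s:s + window_s] for every start s
    start = int(np.argmax(window_sums))
    return start, start + window_s
```

Using a cumulative sum makes the search linear in the series length (1800 values for a 30-min session), rather than re-summing every candidate window.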
Intuitively, the four collected signals should behave similarly in terms of information quantity, since they all depend on the participant's response to the stimuli. Consequently, we used cross-correlation to find the optimal lag, that is, the lag yielding the highest correlation value. In most cases, the optimal lag was zero, meaning that the time series were already synchronized. Nevertheless, some participants presented a non-zero lag. In those cases, the series was shifted by the absolute value of the lag, to the left in time if the optimal lag was positive and to the right if it was negative, so that the time series became synchronized, as shown in Figure 2. Note that time shifts were observed in only a few participants and were always small. After this procedure, all signals were synchronized in terms of information quantity. They therefore relate to the segment of the session that most activated the participant, while redundant information within the 5-min window is minimized.
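The lag search and shift can be sketched with NumPy's `np.correlate`. This sketch makes three assumptions the text does not state: both information series have equal length, they are z-normalized before correlating, and only the overlapping samples are kept after shifting. A positive lag means the second series is delayed relative to the reference, so it is shifted left, following the rule described above. `align_to_reference` is a hypothetical helper.

```python
import numpy as np


def align_to_reference(ref, sig):
    """Find the lag maximising cross-correlation and return (lag, ref_part, sig_part)."""
    ref = np.asarray(ref, dtype=float)
    sig = np.asarray(sig, dtype=float)
    if len(ref) != len(sig):
        raise ValueError("series must have the same length")
    # z-normalise so the peak reflects shape rather than offset or scale (non-constant series assumed)
    r = (ref - ref.mean()) / ref.std()
    s = (sig - sig.mean()) / sig.std()
    # corr[j] = sum_n s[n + lag] * r[n], with lag = j - (len(r) - 1)
    corr = np.correlate(s, r, mode="full")
    lag = int(np.argmax(corr)) - (len(r) - 1)
    n = len(ref)
    if lag > 0:   # sig is delayed: shift it left by lag
        return lag, ref[:n - lag], sig[lag:]
    if lag < 0:   # sig leads: shift it right by |lag|
        return lag, ref[-lag:], sig[:n + lag]
    return 0, ref, sig
```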

Feature Extraction and Selection
Feature extraction is the process of obtaining useful information from existing data. This is an important step, since the data available for classification depend directly on it. All features were extracted from the signals with Neurokit, a Python module that provides high-level integrative functions. Several features were extracted from each signal, as shown in Table 1.
Notwithstanding, not all of these features are relevant, so in the next step the features that best represented the analyzed conditions were selected. To accomplish this, we first used the correlation threshold method, which identifies highly correlated features and keeps only one of them to reduce data redundancy. A variance threshold was also applied, with the threshold adapted to each feature space; this method retains only the features with high information quantity (variance above the threshold). After these two steps, backward elimination was used to select the features that minimize the performance error. Table 2 presents the final features selected for the classification process. Figure 3 illustrates one of the chosen features, the ECG T wave amplitude. The box plot shows differences between the three conditions. Fear is associated with increased variability, which may be related to the sudden variations observed in the time series (Figure 2, right chart). The remaining selected features similarly differentiate between the conditions.
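The three selection stages can be approximated with pandas and scikit-learn. This is a sketch under assumptions, not the authors' pipeline. The correlation cut-off (0.9), the variance threshold, the number of retained features, and the random forest used to score backward elimination are all illustrative choices; the text adapts the variance threshold per feature space and does not report the other values. `SequentialFeatureSelector` with `direction="backward"` removes one feature at a time and keeps the subset with the best cross-validated score.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector, VarianceThreshold


def drop_correlated(X, threshold=0.9):
    """Keep only one feature of every pair whose absolute Pearson correlation exceeds threshold."""
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # each pair counted once
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)


def select_features(X, y, corr_threshold=0.9, var_threshold=1e-3, n_keep=5, seed=0):
    """Correlation filter, then variance filter, then backward elimination (illustrative values)."""
    X = drop_correlated(X, corr_threshold)
    vt = VarianceThreshold(threshold=var_threshold).fit(X)
    X = X.loc[:, vt.get_support()]
    sfs = SequentialFeatureSelector(
        RandomForestClassifier(n_estimators=50, random_state=seed),
        n_features_to_select=min(n_keep, X.shape[1] - 1),
        direction="backward",
        cv=3,
    )
    sfs.fit(X, y)
    return list(X.columns[sfs.get_support()])
```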

Emotion Classification
The selected features were the input of two machine learning methods: random forest and neural networks (multilayer perceptron with backpropagation), two classifiers popular in emotion classification given their theoretical characteristics. Random forest is an ensemble of decision trees, so its output aggregates the outputs of the individual trees. Such methods are usually efficient on datasets with high variability. Neural networks can find dependencies and relations between attributes that are not apparent in a first analysis. Since there is no established physiological description of emotions, the relations between variables are neither defined nor known.
The classifiers' parameters were found by grid search. For the random forest, the classifiers using each isolated signal (ECG, zygomatic electromyogram (EMGZ), medial frontalis electromyogram (EMGMF), and EDA) and the combination of the two EMG signals used 8000 trees; the classifier using all signals as input used 10,000 trees.
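A grid search over the number of trees can be sketched with scikit-learn's `GridSearchCV`, using the 10-iteration shuffle split with a 20% validation set described below. The candidate grid is hypothetical: the text reports only the selected values (8000 and 10,000 trees), not the values searched. Scoring by macro-averaged F1 is also an assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ShuffleSplit


def tune_random_forest(X, y, tree_grid=(2000, 5000, 8000, 10_000), seed=0):
    """Grid-search the number of trees (hypothetical grid) with a 10 x 20% shuffle split."""
    cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=seed)
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={"n_estimators": list(tree_grid)},
        scoring="f1_macro",  # macro-averaged F1 over the three emotions (an assumption)
        cv=cv,
    )
    search.fit(X, y)  # refits the best configuration on all of X by default
    return search.best_estimator_, search.best_params_
```

With thousands of trees and 10 splits per candidate, this search is expensive. A small grid or `n_jobs` parallelism on the estimator keeps it tractable.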
For the neural network, different structures were used depending on the input. In all cases, the last layer is the output layer and has 3 nodes, corresponding to the 3 emotions. The neural network classifiers were configured as follows:
• ECG, EMGZ, or EDA as input: 4 layers were used. The first layer has 12 nodes, the second 9 nodes, and the third 6 nodes;
• EMGMF as input: 3 layers were used. The first layer has 8 nodes and the second 9 nodes;
• both EMG signals as input: 3 layers were used. The first layer has 10 nodes and the second 24 nodes;
• all signals as input: 2 layers were used. The first layer has 28 nodes.
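The configurations above can be expressed with scikit-learn estimators. In this sketch, each "n layers" entry is read as n − 1 hidden layers plus the 3-node output layer. `MLPClassifier` adds that output layer itself, with one unit per class, and trains by backpropagation. Several choices are assumptions: the input-set keys, the `build_models` helper (hypothetical), feature standardization, the solver (scikit-learn's default), and `max_iter`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hidden layers per input set; the 3-unit output layer is added by MLPClassifier from the labels.
HIDDEN_LAYERS = {
    "ECG": (12, 9, 6),
    "EMGZ": (12, 9, 6),
    "EDA": (12, 9, 6),
    "EMGMF": (8, 9),
    "EMGZ+EMGMF": (10, 24),
    "ALL": (28,),
}


def build_models(input_set, seed=0):
    """Return (random_forest, mlp) configured for one input set (hypothetical helper)."""
    n_trees = 10_000 if input_set == "ALL" else 8000  # tree counts reported in the text
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    # Standardization and max_iter are assumptions; the default solver trains by backpropagation.
    mlp = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=HIDDEN_LAYERS[input_set], max_iter=2000, random_state=seed),
    )
    return rf, mlp
```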
It is known that the physiological emotional response is not instantaneous: there is a delay before the body reacts to the emotion. Moreover, because the emotional information is embedded in the physiological signal, the frame duration the methods need to capture it is not known. Therefore, two frame durations were used and analyzed in this work: 30 s and 60 s.
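Splitting the selected 5-min window into 30 s or 60 s frames can be sketched as a reshape. This assumes consecutive, non-overlapping frames, which the text does not state explicitly. At 1000 Hz, a 300-s window gives 10 frames of 30 s or 5 frames of 60 s. `make_frames` is a hypothetical helper.

```python
import numpy as np


def make_frames(window, fs=1000, frame_s=30):
    """Split a 1-D window into consecutive, non-overlapping frames of frame_s seconds."""
    x = np.asarray(window)
    frame_len = fs * frame_s
    n_frames = len(x) // frame_len  # any incomplete trailing frame is dropped
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)
```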
Emotion recognition can be subject-independent (no data from a new subject are present in the training dataset) or subject-dependent (the algorithm already knows the participant and their emotional responses). In both settings, the data in the training and test datasets are different. To study both settings, three approaches were designed. In the first approach, the division was made by participant: 12 of the 55 participants were randomly selected for the test dataset, and no data from these participants were included in the training dataset. In the second approach, 12 of the 55 participants were again selected for the test dataset, and, for these participants, one of the three emotions was randomly chosen to be part of the training dataset. The test dataset was designed to avoid unbalanced data, guaranteeing the same number of samples per emotion in both the training and test datasets. Again, no test data appeared in the training dataset. The third approach is similar to the second; however, 30% of the data from each emotion of the test participants were randomly chosen and added to the training dataset, giving the model more information about each participant and leading to a subject-dependent analysis. In all three approaches, the random selection of the test dataset was repeated over 10 iterations to ensure that the results are not biased by a particular sample. Furthermore, in the training phase, a shuffle split algorithm was used to split the training data into training and validation datasets. Shuffle split is a random permutation cross-validator; in this study, 20% of the data were assigned to the validation set and 10 iterations were performed.
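The three train/test designs can be sketched as index splits over a set of frames, each tagged with a participant id and an emotion label. The helper `split_indices` is hypothetical, and two details are assumptions.

- In design (b), the emotion moved to training is spread evenly across test participants. This is one way to keep the classes balanced, as the text requires.
- In design (c), 30% of each test participant's frames per emotion are moved to training, rounded to the nearest frame.

```python
import numpy as np


def split_indices(subjects, labels, approach, n_test_subjects=12, frac=0.3, seed=None):
    """Return (train_idx, test_idx) for design 'a', 'b' or 'c' (sketch)."""
    rng = np.random.default_rng(seed)
    subjects, labels = np.asarray(subjects), np.asarray(labels)
    emotions = np.unique(labels)
    test_subjects = rng.choice(np.unique(subjects), size=n_test_subjects, replace=False)
    # Assumption for (b): each emotion is moved to training for the same number of test subjects.
    moved = rng.permutation(np.resize(emotions, n_test_subjects))
    train = list(np.flatnonzero(~np.isin(subjects, test_subjects)))
    test = []
    for subject, moved_emotion in zip(test_subjects, moved):
        for emotion in emotions:
            idx = np.flatnonzero((subjects == subject) & (labels == emotion))
            if approach == "a":        # subject-independent: all of this subject's data in test
                test.extend(idx)
            elif approach == "b":      # one whole emotion of this subject goes to training
                (train if emotion == moved_emotion else test).extend(idx)
            elif approach == "c":      # a random fraction of every emotion goes to training
                idx = rng.permutation(idx)
                n_train = int(round(frac * len(idx)))
                train.extend(idx[:n_train])
                test.extend(idx[n_train:])
            else:
                raise ValueError(f"unknown approach: {approach!r}")
    return np.sort(np.array(train, dtype=int)), np.sort(np.array(test, dtype=int))
```

Calling the function with 10 different seeds mirrors the 10 random test-set selections. The training indices can then be passed to a shuffle-split validator.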
The classification methods used, random forest and neural network, are both reported in the emotion identification literature. However, the literature focuses mostly on unimodal systems, using one physiological signal at a time. The performance metrics used to evaluate both models were:
• F1 score: the harmonic mean of precision and recall, summarizing the balance between the two in a single value;
• sensitivity: the probability that the method assigns an emotion to a class when it actually belongs to that class;
• specificity: the probability that the method does not assign an emotion to a class when it does not belong to that class.
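Per-class sensitivity and specificity follow directly from a confusion matrix, treating each emotion in turn as the positive class (one-vs-rest). The sketch below uses scikit-learn's `confusion_matrix` and `f1_score`. Reporting a macro-averaged F1 is an assumption, since the text does not state how the F1 score was averaged over the three emotions. `per_class_metrics` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score


def per_class_metrics(y_true, y_pred, labels):
    """Return per-class sensitivity and specificity dicts plus the macro-averaged F1 score."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true class, columns: predicted
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)   # TP / (TP + FN)
    specificity = tn / (tn + fp)   # TN / (TN + FP)
    f1 = f1_score(y_true, y_pred, labels=labels, average="macro")
    return dict(zip(labels, sensitivity)), dict(zip(labels, specificity)), f1
```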
Figure 4 represents the implemented process. Data were acquired from the different physiological signals. In the data preparation and preprocessing block, the signals were filtered and the 5-min window of maximum information was selected. Over this 5-min window, features were extracted, and those that best describe the physiological process were selected. The data were then split into training and test datasets. The test dataset was set aside for later use, and the training dataset was further divided into training and validation sets. The model was then trained, validated on the validation dataset, and tested on the test dataset. Ten iterations of this procedure were performed. Finally, the model results were analyzed and the model was deployed.

Results
For emotion recognition, two machine learning algorithms were used: random forest and neural network. To compare the two methods fairly, the same training and test datasets were used for both. In this context, shuffle split with 10 iterations and a 20% validation split was used. Table 3 presents, for both classifiers and both frame durations, the F1 score (corresponding to the best classifiers) obtained over the shuffle-split iterations. Three conditions were tested: (a) no data are shared between the training and test datasets (participants in one dataset are not in the other); (b) the training and test data are disjoint, but one emotion (condition) of the participants in the test sample is in the training dataset and the other two conditions are in the test dataset; (c) 30% of the data in each condition from the participants in the test sample were randomly selected and added to the training dataset.
Table 3. F1 score obtained for the two classifiers; (a) corresponds to the subject-independent evaluation, (b) corresponds to the subject-dependent evaluation, and (c) corresponds to the emotion-dependent evaluation.
In (a), a subject-independent approach was evaluated; in (b) and (c), a subject-dependent approach was evaluated. Since emotions are idiosyncratic, the algorithms are expected to need some information about each participant and their emotional responses to better describe the emotional context and response.

Random Forest (%) Neural Network (%)
Figure 5 presents the sensitivity and specificity obtained in the classification process (corresponding to the best classifiers). In most cases, the classifiers correctly predict the classes, with the exception of EMGZ and EMGMF: EMGZ classifies happiness better than fear, and EMGMF classifies fear better than happiness. Tables 4 and 5 present the global results for the 30 s and 60 s data frames (evaluating all splits), reported as mean F1 score, standard deviation, and maximum and minimum F1 score. The maximum and minimum are reported because shuffle split is a random iteration procedure: some splits may include more informative data, which leads to variation in the final F1 scores.

Discussion
This study aimed to evaluate how well different signals describe the physiological response to emotional stimuli, as well as which signal or combination of signals best describes this response. To accomplish these aims, user-independent and user-dependent emotion recognition methods were presented. Of the three approaches tested, the one in which 30% of each emotion's data from the test participants was included in the training dataset outperformed the other two. This supports the need for more information about the participant, since emotion classification depends on the amount of information known by the algorithm.
Regarding the comparison between 30 s and 60 s frames, no conclusions can be drawn: there is no clear difference between the two frame durations, and the shuffle split may have provided more useful information in one of the test cases.
Considering the physiological representation of emotion, we observed that the best combination for describing emotion uses all the signals with all the selected features. With random forest, an accuracy of around 88% was obtained for the 30 s frame and around 87% for the 60 s frame. With the neural network, an accuracy of 77% was obtained for the 30 s frame and around 54% for the 60 s frame. Nevertheless, two signals obtained good results when considered in isolation: ECG and EDA. The sensitivity and specificity results show that ECG produced more correct classifications than EDA. The major drawback of this signal is its susceptibility to noise, since even a small hand movement leads to high noise levels. On the other hand, the ECG electrodes are easy to hide, although three electrodes are required.
EMG also achieved good results when the two collected signals (EMGZ and EMGMF) were combined. Neither signal alone provides a complete description of the emotional response, but together they complement each other and the emotional description becomes more accurate: EMGZ captures more information about happiness, and EMGMF captures more information about fear. The good discriminative value of the EMG signals for emotion stratification is in line with previous research on facial expression evaluation, since the EMG signals were collected on the face. Placing electrodes on the face is more intrusive than camera-based facial recognition. However, facial recognition requires the camera to be well focused on the face, which is not always possible in real conditions. Moreover, facing a camera is not easy for some people. Therefore, EMG can be a solution for monitoring facial reactions. Nevertheless, we are aware that this solution may only be viable in laboratory experiments, since real-life data collection would require less intrusive, portable sensors.
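One common way to let two signals complement each other, as EMGZ and EMGMF do here, is feature-level (early) fusion: the feature vectors extracted from each signal over the same time window are concatenated into a single vector before classification. The section does not detail how the signals were combined, so the sketch below is an assumption; the function `fuse_features` and the numeric values are hypothetical and are not the study's selected features.

```python
from typing import Dict, List, Sequence


def fuse_features(features: Dict[str, Sequence[float]],
                  signals: Sequence[str]) -> List[float]:
    """Early fusion: concatenate the feature vectors of the chosen signals,
    in the given order, into one multimodal feature vector."""
    fused: List[float] = []
    for name in signals:
        if name not in features:
            raise KeyError(f"missing features for signal {name!r}")
        fused.extend(features[name])
    return fused


# Hypothetical per-signal features for one time window (illustrative values only).
window = {
    "EMGZ": [0.42, 0.10],
    "EMGMF": [0.05, 0.31],
}
print(fuse_features(window, ["EMGZ", "EMGMF"]))  # [0.42, 0.1, 0.05, 0.31]
```

Keeping the signal order fixed matters: every window must map its features to the same positions, otherwise the classifier would see inconsistent inputs.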
Overall, random forest and neural network performed differently across the three conditions described. When the test set included participants unknown to the classifiers, and when it included one emotion from each participant, the neural network performed better. When 30% of the emotions were supplied to the training dataset, the random forest performed better. Therefore, the results indicate that the neural network better describes the emotional context in subject- or emotion-independent tests, while the random forest better describes the subject- and emotion-dependent context.
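The difference between the test conditions lies in how the test set is built. The sketch below illustrates two of them under stated assumptions: holding out whole participants (subject-independent) and holding out one emotion per participant. It assumes each time window is a hypothetical `(participant_id, emotion_label)` record; the function names are illustrative, and the 30% variant is not reproduced because its exact construction is not given in this section.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical record format: (participant_id, emotion_label) per time window.
Sample = Tuple[str, str]


def subject_independent_split(samples: List[Sample], test_fraction: float,
                              seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Hold out whole participants, so no test participant is seen in training."""
    participants = sorted({pid for pid, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(participants)
    n_test = max(1, round(len(participants) * test_fraction))
    test_ids = set(participants[:n_test])
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test


def emotion_held_out_split(samples: List[Sample],
                           seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """For each participant, move every window of one randomly chosen emotion to the test set."""
    rng = random.Random(seed)
    emotions: Dict[str, Set[str]] = {}
    for pid, emo in samples:
        emotions.setdefault(pid, set()).add(emo)
    held_out = {pid: rng.choice(sorted(emos)) for pid, emos in sorted(emotions.items())}
    train = [s for s in samples if s[1] != held_out[s[0]]]
    test = [s for s in samples if s[1] == held_out[s[0]]]
    return train, test


samples = [(p, e) for p in ("p1", "p2", "p3", "p4")
           for e in ("fear", "happiness", "neutral")]
train, test = subject_independent_split(samples, test_fraction=0.25)
print(len(train), len(test))  # 9 3
```

In the first split the classifier never sees the test participants; in the second it knows every participant but not the held-out emotion, which is why the two conditions probe different kinds of generalization.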
It is also important to note that we used the time interval with the highest information content, which may not be available in all data collection protocols, nor throughout an entire experiment. Nevertheless, this choice guarantees equal initial conditions for all evaluated signals, which is important when identifying each signal's contribution to emotion recognition.
Knowledge about emotional responses, and particularly the possibility of obtaining accurate, non-redundant, and objective emotional data using automatic systems for emotion identification and modelling, has important implications. For instance, such a system would be crucial when a real-world setup must be planned and ecological validity is very important. Moreover, it is important to know the consequences of selecting one signal at the expense of another.
Our results suggest that combining more than one signal produces better emotion classification; however, this may not always be possible. Due to financial, material, or methodological constraints, it is frequently not possible to measure multiple signals at once. Therefore, knowing how informative a signal or a combination of signals is helps to design more cost-effective set-ups. This study provides important guidelines about which signals are most informative for classifying fear, happiness, and neutral conditions, as well as about possible errors when only one of these signals is used.
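Choosing a cost-effective set-up amounts to comparing signal subsets, from unimodal to fully multimodal. The sketch below enumerates every non-empty combination of the four signals discussed here and picks the cheapest one that reaches a target score. The scoring and cost functions are hypothetical placeholders the reader would supply (for example, a cross-validated F1 score and a per-sensor cost); the sketch assumes nothing about the values obtained in this study.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple

Setup = Tuple[str, ...]


def candidate_setups(signals: Sequence[str]) -> List[Setup]:
    """Every non-empty subset of signals, from unimodal to fully multimodal."""
    return [combo for k in range(1, len(signals) + 1)
            for combo in combinations(signals, k)]


def cheapest_adequate(setups: Sequence[Setup],
                      score: Callable[[Setup], float],
                      cost: Callable[[Setup], float],
                      min_score: float) -> Optional[Setup]:
    """Return the lowest-cost setup whose score reaches min_score, or None."""
    adequate = [s for s in setups if score(s) >= min_score]
    return min(adequate, key=cost) if adequate else None


setups = candidate_setups(["ECG", "EDA", "EMGZ", "EMGMF"])
print(len(setups))  # 15
```

With four signals there are only 2^4 - 1 = 15 candidate set-ups, so an exhaustive comparison is cheap; the expensive part is evaluating each one.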
Finally, this study has important applications. For instance, in mental health, where objective data about patients' emotional responses are frequently lacking, a system able to detect, process, and interpret the physiological responses associated with emotional states would add value to diagnosis, monitoring, and intervention. Furthermore, such systems could be integrated into cost-effective, non-invasive, and portable devices with the potential to foster efficient monitoring of emotional responses, emotion regulation strategies, and communication with caregivers and health professionals. Beyond mental health, these systems could be used in other contexts, including monitoring disorders associated with somatic alterations or populations with emotional processing and communication difficulties, such as autism spectrum disorders [24], as well as situations where this information is highly informative about an individual's cognition and behavior (e.g., monitoring long-distance drivers).

Conclusions
This research analyzed the physiological component of emotion under different emotional conditions (fear, happiness, and neutral) using automatic systems for emotion identification. Our results suggest that the ECG signal is the most informative for emotion stratification. The use of facial EMG for emotion classification depends on monitoring two (or more) muscles, which allows facial expression changes to be identified through the corresponding muscular contractions. Nevertheless, when all signals are used for emotion identification, a higher accuracy is achieved, since each signal carries different information. This physiological model of emotions has important research and clinical implications, by providing valuable information about the value and weight of physiological signals for emotional classification, which can critically drive effective evaluation, monitoring, and intervention regarding emotional processing and regulation across multiple contexts.