Emotional Decision-Making Biases Prediction in Cyber-Physical Systems

This article addresses the challenge of discovering trends in decision-making by capturing emotional data and the influence of possible external stimuli. We conducted an experiment with a significant sample of the workforce and used machine-learning techniques to model the decision-making process. We studied the trends introduced by the emotional status and by the external stimulus that makes these personnel act or report to the supervisor. The main result of this study is a model capable of predicting the bias to act in a specific context. We studied the relationship between emotions and the probability of acting or correcting the system. The main interest of these findings is the ability to influence personnel in advance to make their work more efficient and productive. This opens a new line of research for the future.


Introduction
Research in cyber-physical systems integrates the principles of science in the disciplines of computing and engineering to develop new technology. In industrial practice, many engineering systems have been designed by decoupling the control system from the hardware/software implementation details. After the control system is designed and verified by an extensive simulation, ad hoc adjustment methods have been used to address the modeling. The integration of various subsystems, while keeping the system functional and operational, has been time-consuming and costly. The increasing complexity of components and the use of advanced technologies for sensors and actuators, wireless communications, and multicore processors pose a major challenge for building the next generation of control systems.
An important part of cyber-physical systems, which is not commonly considered, is the human one. People are in continuous interaction with the system, and their decisions condition its productivity. We have been looking for a new paradigm that links the physical world to the human world so that both work as one. We found tools from other knowledge areas that may be useful for the goal we pursue. The management of the emotions present in the workforce of any operations team is an excellent example for the subject of our study. In several research articles [1][2][3][4][5][6][7][8] we found the different relationships within the psychological part of emotional decision-making, and how these emotions can be redirected by an external stimulus trigger. In fact, our main driver is to find and use the link between psychology and biometrics, traditional telemetry, or system signals, using machine-learning techniques to obtain a single view of the overall system's behavior and to be able to interact with it. We found different examples to build our use case [9,10]. A self-explanatory quotation from Dr. Heinström (2010) [11] is: "We are genetically programmed for a stronger and more instinctive reaction towards negative stimuli than towards positive or neutral ones. This is a survival mechanism developed through evolution, just like the automatic reaction to pain which protects us from physical suffering. The processing of negative stimuli is therefore automatic and immediate, and requires little cognitive capacity [10]. Consequently, every one of us reacts instinctively and intuitively to signals of danger, although some may react more strongly than others".
This work evaluates, from the contextualized point of view of a cyber-physical system, the influence of stimuli and emotions on decision-making in the operations management tasks of a data center.
The study was carried out from two points of view: a general one and a specific-purpose one. The general study used the entire dataset and produced a random forest model capable of predicting, with 85% accuracy, whether a potential operator inside a data center acts or does not act. The specific-purpose model focused on the correlation between the external stimulus, the emotion after that stimulus (obtained through the questionnaires), and the subject's final decision (acting/not acting). For this case, another random forest model was used, which also predicted the subject's action (or inaction) with 85% accuracy.

Materials and Methods
To implement the experiment, we selected a significant sample of people working in technology who must make decisions every day in their jobs. We used statistical tools to check whether the sample is relevant for the study. Then, we decided which questions we needed them to answer, and evaluated the ability of those questions to trigger actions or decisions based on emotions. We relied on the psychology studies referenced throughout this article to build the questionnaire and to make the right assumptions for the experiment.
After that, we decided on the best procedure or tool to determine the predictive model for decision-making, based on the information that we had. We used Python to develop the models with machine-learning techniques. Finally, we contrasted the results from the machine-learning techniques with other statistical procedures and tools to confirm the hypothesis, looking for a new way of contrasting findings from other disciplines, such as humanities, social sciences, or health, e.g., psychology.

About the Sample
In the quest to close the actuation loop on emotion awareness and influence, we decided to run an experiment with people of different sexes and ages. The sample comprises 100 people, and to make sure that it was representative of the universe under study, we made some calculations. The calculation of the sample size is one of the aspects to be specified in the early phases of commercial research, and it determines the degree of credibility that we will grant to the results obtained.
For the sample size estimation, n, we assumed a Gaussian distribution with a confidence level of 95.5% [12,13]. The symbols defined below correspond to the standard finite-population expression:

$$n = \frac{k^{2}\, p\, q\, N}{e^{2}\,(N - 1) + k^{2}\, p\, q}$$

N: the size of the population or universe (total number of possible respondents).
k: a constant that depends on the level of confidence that we assign (see Table 1). The level of confidence indicates the probability that the results of our research are true: a 95.5% confidence is the same as saying that we can be wrong with a probability of 4.5%.
e: the desired sampling error. The sampling error is the difference that can exist between the result obtained by asking a sample of the population and the result that would be obtained by asking all of it. In our experiment, if the survey indicates that 100 people would take an action and the sampling error is 5%, then between 95 and 105 people would act.
p: the proportion of individuals in the population who possess the characteristic under study. This value is generally unknown, and it is usually assumed that p = q = 0.5, which is the safest option.
q: the proportion of individuals who do not possess this characteristic, i.e., 1 − p.
n: the size of the sample (number of surveys).
However, practical criteria based on experience or simple logic are usually used to calculate the sample size. Some of the most used criteria are the following:

• The budget that we have available for research.
• Experience in similar studies.
• The representation of each group considered: choose from each of them a sufficient number of respondents so that the results are indicative of the opinion of that group.

Calculations for our study: we contrast the total number of active workers in a country with the number of people working in data center management activities. The population of the country is 47 million people, of whom 19,564,600 are registered active workers (N). We estimate that 5% of this population is related to this specialty (p = 0.05 and q = 0.95), we want a confidence level of 95.5%, which determines that k = 2, and we are willing to assume a sampling error of 5% (e). Under these conditions, we would need a sample of at least 76 people to be representative. In our specific study, n = 76 is therefore the minimum sample size, and we have 100 participants.
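The arithmetic above can be checked with a short script. This is a minimal sketch that assumes the standard finite-population formula n = k²pqN / (e²(N − 1) + k²pq), which matches the symbols defined in this section and reproduces the reported minimum of 76; the function name `sample_size` is only illustrative.

```python
import math


def sample_size(N: int, k: float, p: float, e: float) -> float:
    """Finite-population sample size: n = k^2 p q N / (e^2 (N - 1) + k^2 p q)."""
    q = 1.0 - p
    return (k**2 * p * q * N) / (e**2 * (N - 1) + k**2 * p * q)


# Values reported in the text for this study.
n_raw = sample_size(N=19_564_600, k=2, p=0.05, e=0.05)
n_min = math.ceil(n_raw)  # round up: a fractional respondent is not possible
print(n_min)
```

With a population this large the result is governed almost entirely by k, p, q, and e, so the required sample changes very little if N is revised.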

About the Questionnaire
We built a questionnaire following the example of previous articles on decision-making techniques [7,[14][15][16] and added some control variables. The objective of the questionnaire and the stimulus is to verify whether, at the end of the study, the subject takes action on the problem presented in the stimulus. This action can be either to solve the problem on their own or to alert a superior.
During this phase, we first asked the participants to describe themselves in terms of mood/emotion (Angry: Emotion = 1, Happiness: Emotion = 2, Neutral: Emotion = 3, Sadness: Emotion = 4, Surprise: Emotion = 5) before answering 10 questions related to general decision-making. Then we introduced an external stimulus, consisting of a piece of bad or good news, and asked them again to self-assess their dominant emotion/mood. Finally, we asked the individuals whether they were willing to act or raise the problem to the chain of command. With this study we therefore obtained 24 variables:

• Emotion A: The initial emotion/mood recorded at the beginning of the experiment. It is encoded as a number from 1 to 5 that corresponds to the emotions specified above.
• Question 1A: When I make an important decision, for me, it is essential to overcome doubtful aspects. Questions with the suffix A are asked before any stimulus has been introduced. Every question is evaluated on a scale of 1 to 9 (1 being the minimum score).
• Question 2A: When I make an important decision, for me, it is essential to organize the actions depending on the time.
• Question 3A: When I make an important decision, for me, it is essential to define the desired goals.
• Question 4A: When I make an important decision, for me, it is essential to accept responsibility for the decision.
• Question 5A: When I make an important decision, for me, it is essential to be motivated to make the decision.
• Question 6A: When I make an important decision, for me, it is essential to generate emotions that will help me decide.
• Question 7A: When I make an important decision, for me, it is essential to reflect on the need to make the decision.
• Question 8A: When I make an important decision, for me, it is essential to plan the actions to be performed.
• Question 9A: When I make an important decision, for me, it is essential to make decisions without external pressure.
• Question 10A: When I make an important decision, for me, it is essential to take the goals of the business into account.
• Stimulus: The external news. It is a binary variable, as the news can be positive/good (0) or negative/bad (1).
• Emotion B: After the news, the individuals are asked again to state the predominant emotion/mood they feel. Just like Emotion A, the emotion is encoded as a number from 1 to 5.
• Question 1B: When I make an important decision, for me, it is essential to overcome doubtful aspects. Questions with the suffix B are asked after the stimulus (news) has been presented.
• Question 2B: When I make an important decision, for me, it is essential to organize the actions depending on the time.
• Question 3B: When I make an important decision, for me, it is essential to define the desired goals.
• Question 4B: When I make an important decision, for me, it is essential to accept responsibility for the decision.
• Question 5B: When I make an important decision, for me, it is essential to be motivated to make the decision.
• Question 6B: When I make an important decision, for me, it is essential to generate emotions that will help me decide.
• Question 7B: When I make an important decision, for me, it is essential to reflect on the need to make the decision.
• Question 8B: When I make an important decision, for me, it is essential to plan the actions to be performed.
• Question 9B: When I make an important decision, for me, it is essential to make decisions without external pressure.
• Question 10B: When I make an important decision, for me, it is essential to take the goals of the business into account.
• Decision: After the stimulus and the ten B questions, we ask the subjects about their willingness to act and raise the problem according to their situation. It is a binary variable: if they decide to act or raise the problem to the chain of command, it takes the value 1, and if they do not, it takes the value 0.
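The 24 variables and their encodings listed above can be written down as a small schema. The following is only a sketch: the column names (`emotion_a`, `q1a`, ...) and the `validate` helper are hypothetical, chosen for illustration, while the value ranges are the ones stated in the list.

```python
# Encodings as defined in the questionnaire description.
EMOTIONS = {1: "Angry", 2: "Happiness", 3: "Neutral", 4: "Sadness", 5: "Surprise"}
STIMULUS = {0: "positive/good news", 1: "negative/bad news"}
DECISION = {0: "does not act", 1: "acts or raises the problem"}

# Hypothetical column names; the source only names the variables informally.
COLUMNS = (
    ["emotion_a"]
    + [f"q{i}a" for i in range(1, 11)]
    + ["stimulus", "emotion_b"]
    + [f"q{i}b" for i in range(1, 11)]
    + ["decision"]
)


def validate(record: dict) -> None:
    """Raise ValueError if a record violates the ranges stated in the text."""
    missing = set(COLUMNS) - set(record)
    if missing:
        raise ValueError(f"missing variables: {sorted(missing)}")
    for name in COLUMNS:
        value = record[name]
        if name.startswith("emotion"):
            ok = value in EMOTIONS
        elif name == "stimulus":
            ok = value in STIMULUS
        elif name == "decision":
            ok = value in DECISION
        else:  # questionnaire items, scored from 1 to 9
            ok = 1 <= value <= 9
        if not ok:
            raise ValueError(f"{name}={value!r} is out of range")


# A made-up record used only to exercise the checks.
example = {name: 5 for name in COLUMNS}
example.update(emotion_a=3, emotion_b=4, stimulus=1, decision=1)
validate(example)
print(len(COLUMNS))
```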

Machine-Learning Techniques
No single algorithm dominates when choosing a machine-learning model. Some perform better with large data sets and some perform better with high-dimensional data. Thus, it is important to assess a model's effectiveness for our particular data set. In this article, we give a high-level overview of how random forest works and discuss the real-world advantages and drawbacks of this model in Appendix A.
Neural networks can also be used with small datasets, depending on the classes we are trying to separate. For instance, if we only try to classify black versus white images, we need very few training examples. Moreover, some problems, such as predicting tsunamis, do not allow us to have many training examples, so neural networks are valid for small datasets (as long as the networks are not overtrained). However, because our data have classes that are not clearly separated and few variables, neural networks are not a sensible choice for this problem. For this reason, we considered random forest.
Random forest is a good model if we want high performance with less need for interpretation [17]. This classification technique was introduced by Breiman [18] and has been used in numerous articles [19][20][21][22][23][24], but its application in psychology [9,10] and decision-making fields is scarce.
The outcome of this experiment is explained in the next section, Results.

The Final Results
As mentioned before, the study was carried out from two points of view: a general one and a specific-purpose one. Both are based on the same data, which consist of 100 samples (the stimulus is perfectly balanced, with 50 positive and 50 negative news items, and the split between action and no action is 51 to 49).
The approach of this study is to develop two different models. The first predicts the capacity of action (or inaction) of a subject from all the variables obtained in our study. The second captures the relation between the stimulus, the emotion, and the decision-making after the stimulus, establishing a connection with the final action of the subject. For that purpose, we developed two random forest models in Python. We studied the correlation between variables and their p-values, rejecting the variables that were not statistically significant at the p = 0.05 level. We ran different studies and tests with the models to find the hyperparameters of the algorithm that best fit our problem, obtaining the final models detailed below.
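The training procedure described above can be sketched as follows. This is not the authors' code: it assumes scikit-learn (the source only states that Python was used), a synthetic stand-in for the 100-subject dataset, and an arbitrary hyperparameter grid and train/test split chosen purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the 100-subject dataset (NOT the study data).
n = 100
stimulus = rng.integers(0, 2, size=n)            # 0 = good news, 1 = bad news
emotion_b = rng.integers(1, 6, size=n)           # emotion codes 1..5
questions_b = rng.integers(1, 10, size=(n, 10))  # ten items scored 1..9
X = np.column_stack([stimulus, emotion_b, questions_b])
# Toy label loosely tied to the stimulus so there is something to learn.
y = np.where(rng.random(n) < 0.8, stimulus, 1 - stimulus)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed grid; the source does not report which hyperparameters were tuned.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 4, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, round(accuracy, 2))
```

Because the data here are synthetic, the printed accuracy says nothing about the 85% reported for the study; the sketch only shows the shape of the workflow (split, grid search with cross-validation, held-out evaluation).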
In this way, the general study was developed for the entire dataset (formed by the 24 variables, except those rejected because of their p-value), obtaining a random forest model capable of predicting with 85% accuracy whether a potential operator inside a data center acts or does not act (the metrics obtained for this model are shown in Table 2). The specific-purpose model focused on the correlation between the external stimulus, the emotion after this stimulus (obtained through the questionnaires), and the final decision (acting/not acting) of the subject. In this case, we used another random forest model, which predicted the action (or inaction) of the subject with 85% accuracy (see Table 2 for the details of the metrics obtained for this model). This specific random forest model was built with only part of the dataset (formed by the variables corresponding to the stimulus, the emotion after the stimulus, the B questions, and the decision variable). The first model was able to predict, from the whole data flow (the emotion and decision-making answers to the questionnaires, the stimulus, and the emotion and decision-making answers after the stimulus), the probability of acting or reporting to the chain of command. The importance of the model variables is shown in Figure 1.
We ran two processes for evaluating the data. The first was to check the correlation between the variables and, whenever two variables were highly correlated (higher than 95%), to reject one of them (the correlation matrix of the variables is shown in Figure 2). The second was a p-value analysis to reject the variables that were not statistically significant at the p = 0.05 level.
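The first of these two processes, the correlation filter, can be illustrated in a few lines. This sketch assumes Pearson correlation and a rule that keeps the first variable of each highly correlated pair; the source does not state which correlation measure or tie-breaking rule was used, and the toy columns are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Constant columns (zero variance) are not handled in this sketch.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.95):
    """Keep the first variable of every pair whose |r| exceeds the threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy columns (hypothetical values, not the study data).
toy = {
    "q1b": [1, 2, 3, 4, 1, 2],
    "q2b": [2, 4, 6, 8, 2, 4],  # exactly 2 * q1b, so it is perfectly correlated
    "q3b": [9, 1, 4, 8, 2, 6],
}
print(drop_correlated(toy))
```

In this toy input `q2b` is removed because it duplicates the information in `q1b`, while `q3b` is kept.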
As shown in Figure 2, no pair of variables has a correlation higher than 95%, but some variables were rejected after the p-value analysis. The distribution and frequencies of the selected variables after processing are shown in Figures 3 and 4, respectively.
As we can see in Figure 3, the variables Emotion B (emotion reported after the stimulus) and Stimulus have a clear distribution: Emotion B = 2 (happiness) tends toward not acting, whereas Stimulus = 1 (negative news) produces an active response, driving subjects to action.

Specific Random Forest Model
Apart from having a general model capable of predicting the action of a subject in a data center, we were particularly interested in the importance of stimuli and emotions in decision-making. For this reason, a second model was developed. The variables selected for the analysis were Stimulus, Emotion B, and the answers to the decision-making questions after the stimulus. The main goal is to establish the probability of action according to an emotion and a stimulus, which is the reason we developed this experiment.
The procedure was identical to that of the general model. We evaluated the correlation and the p-value of the variables with respect to the predicted label (Decision to act). The correlation matrix of this specific model is shown in Figure 5, and the distribution and frequencies of the selected variables (after p-value processing, p = 0.05) are shown in Figure 6 and Figure 7, respectively. We obtained Table 3 by evaluating the probability of the participants' willingness to act according to their input stimulus. As we can see in Table 3, a negative stimulus always increases the probability of acting, and the emotions that drive us to act are sadness and surprise.
We used Weka [25], a machine-learning software package (also used for data mining), to check the accuracy and prediction of other models. In particular, we used J48 trees, random forest, and random trees. The results were almost identical, but our own models and processing achieved better outcomes (see Appendix B for more details). It is important to note that these models come from the collection of machine-learning algorithms already provided by Weka. Weka did the computation and chose the final model that best fits the dataset, but we did not apply any of our preprocessing or hyperparameter tuning there (as we did in Python for our random forest models).
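A table of the kind reported in Table 3, the probability of acting for each combination of stimulus and post-stimulus emotion, can be tabulated as relative frequencies. This is one simple way to do it and is only a sketch: the source does not describe how Table 3 was computed, and the records below are hypothetical, not the study data.

```python
from collections import defaultdict

EMOTIONS = {1: "Angry", 2: "Happiness", 3: "Neutral", 4: "Sadness", 5: "Surprise"}


def action_probabilities(records):
    """Estimate P(decision = 1 | stimulus, emotion_b) as relative frequencies."""
    counts = defaultdict(lambda: [0, 0])  # (stimulus, emotion) -> [acted, total]
    for stimulus, emotion_b, decision in records:
        cell = counts[(stimulus, emotion_b)]
        cell[0] += decision
        cell[1] += 1
    return {key: acted / total for key, (acted, total) in counts.items()}


# Hypothetical records (stimulus, emotion_b, decision); not the study data.
records = [
    (1, 4, 1), (1, 4, 1), (1, 5, 1), (1, 5, 0),
    (0, 2, 0), (0, 2, 0), (0, 2, 1), (0, 3, 0),
]
table = action_probabilities(records)
for (stimulus, emotion), prob in sorted(table.items()):
    print(f"stimulus={stimulus} emotion={EMOTIONS[emotion]:<9} P(act)={prob:.2f}")
```

With only 100 subjects spread over up to ten stimulus and emotion cells, some cells may contain very few observations, so such frequencies should be read with care.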

Discussion
In this research study we highlighted the importance of selecting the right information and the appropriate modeling techniques for the different data sources. We used machine-learning techniques to solve the problem of predicting the willingness to act of the users of a cyber-physical system in a specific context.
Building on the relationships within the psychological part of the emotional decision-making process, and on how these emotions can be redirected using an external stimulus trigger, this work evaluates, from the contextualized point of view of a cyber-physical system, the influence of stimuli and emotions on decision-making in the operations management tasks of a data center.
The study was able to predict the action of a potential operator inside a data center with 85% accuracy. The results obtained about stimulus and actions reinforce the theories from other articles [11,26] about negative stimuli and how humans are genetically programmed for a stronger and more instinctive reaction towards negative stimuli than towards positive or neutral ones. The processing of negative stimuli is therefore automatic and immediate, and requires little cognitive capacity [27,28]. Fear is also an important factor in negative stimuli, as it increases risk perception [4], so it increases the probability of acting or escalating an issue to the chain of command compared with the participants who received a positive stimulus.
We have found an important field of work for the future, using emotional variables. We can translate subjective information into technical information and act on it using the most common predictive modeling techniques. We found tools from other knowledge areas that are useful for the goal we pursue: to establish a direct relationship between psychological or subjective variables and technical or numerical variables.
There is a clear path to closing the loop and helping operations personnel enhance their skills and automate the most common tasks. This is how we can be more productive, especially in changing environments such as Smart Cities, IoT, and Edge Datacenters. This will be the main research area for the authors of this article.

The future of Telecommunications predictors. These predictors will consistently be chosen at the top level of the trees, so the trees will have very similar structures. In other words, the trees would be highly correlated.
In summary, random forests are bagged decision tree models in which each split considers only a random subset of the features.
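The two ingredients named in this summary, bagging and random feature subsets, can be illustrated with a deliberately tiny forest. This is a teaching sketch, not a production implementation: each "tree" is a one-split stump, so drawing a feature subset per tree is the same as drawing it per split, and the toy rows of (stimulus, emotion B) are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Best single-threshold split among the given candidate features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            # Each side predicts its majority class; count training errors.
            l_cls = Counter(left).most_common(1)[0][0]
            r_cls = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_cls for v in left) + sum(v != r_cls for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_cls, r_cls)
    if best is None:  # no valid split: fall back to the majority class
        cls = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), cls, cls)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]          # bootstrap sample (bagging)
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    votes = [l_cls if row[f] <= t else r_cls for f, t, l_cls, r_cls in forest]
    return Counter(votes).most_common(1)[0][0]            # majority vote


# Hypothetical rows of (stimulus, emotion_b) with a toy decision label.
X = [(0, 2), (0, 3), (0, 2), (1, 4), (1, 5), (1, 4), (0, 1), (1, 3)]
y = [0, 0, 0, 1, 1, 1, 0, 1]
forest = fit_forest(X, y)
print(predict(forest, (1, 4)), predict(forest, (0, 2)))
```

Restricting each stump to a random feature subset prevents a single strong predictor from dominating every tree, which is what reduces the correlation between trees.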
Whether we have a regression or a classification task, random forest is an applicable model. It can handle binary, categorical, and numerical features. Very little preprocessing is needed: the data do not need to be re-scaled or transformed.
Random forests are parallelizable, meaning that we can split the process across multiple machines. This results in faster computation. Boosted models, in contrast, are sequential and would take longer to compute.
Each tree is faster to train than a standard decision tree because only a subset of the features is considered, so we can easily work with hundreds of features. Prediction is significantly faster than training because we can save the generated forests for future use. Random forest handles outliers by essentially binning them. It is also indifferent to non-linear features.
It has methods for balancing error in data sets with unbalanced class populations. Random forest tries to minimize the overall error rate, so when we have an unbalanced data set, the larger class will get a low error rate while the smaller class will have a larger error rate.
Each decision tree has high variance but low bias. Because we average all the trees in a random forest, we average the variance as well, so that we obtain a model with low bias and moderate variance.
Random forest also has drawbacks. Model interpretability: random forest models are not very interpretable; they behave like black boxes. For very large data sets, the trees can take up a lot of memory. The model can tend to overfit, so we should tune the hyperparameters.
N: 19,564,600 (total active workers in Spain, 19 March 2019); k: 2 (for a 95.5% confidence level); e: 5%; p: 0.05 (proportion of people that would match the desired profile); q: 0.95 (proportion of the rest of the population).

Figure 7. Frequencies of variables. Paying attention to Figure 6, Emotion B = (2, 3) (2: happiness and 3: neutral) tends toward not acting, and Stimulus = 1 (negative news) drives the subject to action.

Table 1. The most used k values and their confidence levels.

Table 3. Probabilities of action according to Stimulus and Emotion.