I2E: A Cognitive Architecture Based on Emotions for Assistive Robotics Applications

Abstract: Emotions and personality play an essential role in human behavior and decision-making. Humans infer emotions from several modals and merge them all; emotions are an interface between a subject's internal state and the external world. This paper presents the design, implementation, and testing of the Inference of an Emotional statE (I2E): a cognitive architecture based on emotions for assistive robotics applications, which takes as inputs the emotions previously recognized by four affective modals and infers the emotional state of an assistive robot. Unlike solutions that classify emotions from a single signal, the architecture proposed in this article merges four sources of information about emotions into one. For this inference to be closer to that of a human being, a Mamdani fuzzy system was used to infer the user's personality, and a MultiLayer Perceptron (MLP) was used to infer the robot's personality. The hypothesis tested in this work was based on the Mehrabian studies, complemented by the input of three expert psychologists. The I2E architecture proved to be quite efficient for identifying an emotion with various types of input.


Introduction
Creating a cognitive architecture for robots requires knowledge from different research fields, such as social psychology, affective computing, computer science, and AI, which influence the design of the underlying control structure. Cognitive architectures, or systems of cognition, specify the structural basis for an intelligent system. Several architectures have already been proposed, as shown by the work of Rickel [1], Kim [2], and Arnellos [3]. Many of these cognitive architectures are special-purpose: they were developed to solve a specific problem. However, there are other architectures known as general-purpose architectures. Two architectures are widely used in this category: Adaptive Control of Thought-Rational (ACT-R), developed by John Anderson and his team since the 1970s [4], and SOAR (State, Operator And Result), developed by John Laird since the 1980s [5].
Some architectures of intelligent cognitive systems, like those described above, try to reproduce the brain's functioning and are based on the stimulus-response concept, or based only on rationalization. No feeling or emotion is considered in the decision making of these architectures. On the other hand, most of the main topics in psychology, and all the major problems humanity faces, usually involve emotion. According to [6], "Psychology and humanity can progress without considering emotion, as fast as someone running on one leg".
Assistive robotics (AR) seeks to recognize natural forms of language and human behavior through recognition-based technologies [20]. Seeking to reproduce the way human beings interact with the world, researchers in AR at the Robotics Laboratory of the Electrical Engineering Department of the Federal University of Bahia are developing a robotic device called HiBot (roBOT for Human Interaction). HiBot aims to be a platform for human-robot interaction (HRI) experiments, more specifically, assisting in medical treatments, such as for autistic individuals, in pedagogical support, and in other forms of social interaction. HiBot can behave as a mediator between therapists and the users under care. This device will interact using emotional protocols to make the social interaction process with the user as close as possible to that between human beings.
The HiBot architecture adopts three basic levels of abstraction: (i) cognitive level, (ii) associative level, and (iii) reactive level. The cognitive level is responsible for choosing the robot's emotional behavior. The associative level is responsible for communicating the robot's internal processes and synchronizing with external events. The reactive level is responsible for the perception and execution of the robot's actions. The HiBot project uses several affective modals to recognize the user's emotion and, in the future, to be able to make decisions from these emotions. Currently, the emotion recognition modules are being developed in parallel, and each of the modals that make up the set of affective sensors uses different techniques and algorithms to accomplish this task.
HiBot, as illustrated in Figure 1, has a set of affective actuators and sensors, organized into modules. The affective actuators, Figure 1b, aim to promote interaction through social protocols (facial and vocal expressions). For that, two modules were defined: (i) Voice Synthesis Module (VSM) and (ii) Facial Expression Module (FEM). The affective sensors, Figure 1a, aim to obtain affective cues transmitted by different modals, such as face, body, voice, and electroencephalography (EEG). Thus, HiBot has four affective sensor modules: (i) Facial Expression Recognition Module (FERM), by video; (ii) Body Expression Recognition Module (BERM), by video/Kinect; (iii) Voice Expression Recognition Module (VERM), by voice; and (iv) EEG Recognition Module (EEGRM). All four of these affective modules have been developed in parallel by the laboratory researchers. Camada [21], for example, developed the BERM module, which is responsible for recognizing the affective state from the stereotypes of a person's gestures, using a Kinect device as the camera sensor.
Thus, this work presents the Inference of an Emotional statE (I2E). This architecture merges all the emotions previously recognized by four of HiBot's affective modals and infers a single emotion for the user. Unlike the solutions presented above, which classify emotion, the architecture proposed in this article infers the emotion to be demonstrated by a robot from the emotions classified by HiBot's affective modals. In order for this inference to be closer to that of a human being, a Mamdani fuzzy system [22] was implemented to infer the user's personality and a MultiLayer Perceptron (MLP) [23] to infer the robot's personality. It is known that there are more sophisticated techniques, such as Deep Learning or type-2 fuzzy systems. However, I2E will be embedded on a grid of low-processing-capacity boards, such as the BeagleBone Black or Raspberry Pi, so the solution had to be adapted to the hardware on which it will run. Designing an embedded system is a complex task. It involves concerns such as portability, limiting energy consumption without loss of performance, low memory availability, the need for security and reliability, and the possibility of adding modules with each update. For these reasons, we chose to design I2E with less sophisticated techniques, which have already been extensively explored by the scientific community and present very satisfactory results.
The rest of this paper is organized as follows. The I2E proposed architecture is presented in Section 2. Section 3 presents the results of the simulations. Finally, the final comments and future works are presented in Section 4.

I2E Architecture
The I2E architecture is responsible for inferring the assistive robot's emotional state. I2E does not recognize emotions; it processes previously recognized emotions. Unlike solutions that classify emotions from a single signal, the architecture proposed in this article merges four emotions into one; that is, I2E infers a single emotion for the user. This emotion is then used to infer the emotion that HiBot will demonstrate to the user. This is similar to human behavior: humans infer emotions from several modals by merging all of them.
This emotional state is based on emotion and the user's personality. Figure 2 shows the overview of I2E. It is subdivided into three modules:
Module 01: responsible for inferring the user's personality.
Module 02: responsible for inferring the robot's personality.
Module 03: responsible for inferring the robot's emotion.
The inference of the robot's emotion follows three steps. Module 01 infers the user's personality profile (UserOcean) from the inference of the user's emotion (UserEmotion), which results from the combination of HiBot's four emotion modals. The personality model adopted was the Big Five Model (BFM), because it is widely used and there is extensive literature relating emotions to the BFM. Module 02, based on the personality traced for the user (UserOcean), infers the robot's personality (RobotOcean), such that it is empathic with the user's personality, as befits an assistive robot. After the inference of the robot's personality, Module 03 chooses the emotion represented by that personality, which will be demonstrated by HiBot.

Module 01-User's Personality
Module 01 is responsible for inferring the user's personality. Personality is the set of characteristics that stand out in a person and represent the pattern of personal and social individuality of each individual. It also influences the tendency to value specific ideals, such as freedom, or dispositions to act, such as honesty [24].
Module 01 is illustrated in Figure 3. It uses as inputs the emotions previously recognized by HiBot's four affective modals: (i) Facial Expression Recognition Module (FERM), by video; (ii) Body Expression Recognition Module (BERM), by video/Kinect; (iii) Voice Expression Recognition Module (VERM), by voice; and (iv) EEG Recognition Module (EEGRM). The set of affective modals is highlighted within the red dotted lines in Figure 3. Each affective modal comprises six attributes representing the membership degree of each of the six emotions: anger, disliking, fear, pity, relief, and happy for. After reading all the attributes, the median of the three values of each attribute was calculated. The median was adopted because it captures the central tendency of skewed numerical distributions. After processing the input values, the next step is the configuration of the User Fuzzy System. This process can be described in four steps: obtaining the OCEAN's values, OCEAN's discretization, FCL rules of the fuzzy system, and defuzzification.
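The median-based fusion step can be sketched as follows. This is a minimal illustration, not HiBot's actual interface: the function and attribute names, and the sample readings, are placeholders.

```python
import statistics

# The six emotion attributes reported by each affective modal
# (illustrative names; the real module interfaces may differ).
EMOTIONS = ["anger", "disliking", "fear", "pity", "relief", "happy_for"]

def fuse_modals(readings):
    """Fuse per-modal emotion memberships by taking the median
    of each attribute across the available modal readings."""
    return {
        emotion: statistics.median(modal[emotion] for modal in readings)
        for emotion in EMOTIONS
    }

# Example: three modal readings for the same instant (made-up values).
ferm = {"anger": 0.9, "disliking": 0.1, "fear": 0.0, "pity": 0.0, "relief": 0.0, "happy_for": 0.0}
berm = {"anger": 0.7, "disliking": 0.2, "fear": 0.1, "pity": 0.0, "relief": 0.0, "happy_for": 0.0}
verm = {"anger": 0.8, "disliking": 0.5, "fear": 0.0, "pity": 0.0, "relief": 0.1, "happy_for": 0.0}

fused = fuse_modals([ferm, berm, verm])
# fused["anger"] is the median of 0.9, 0.7, 0.8, i.e. 0.8
```

The median keeps one outlier modal (e.g., a misclassifying sensor) from dominating the fused value, which is the robustness property the text attributes to it.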

Get OCEAN'S Values
Let us consider the PAD values used by [14] to define the emotions shown in Table 1. Equation (1), described by [14], was used to transform the emotions into the pattern of the Big Five Model; in this article, these values are called OCEAN's values.
As a result of these transformations, numerical values representing the user's personality for each of the input emotions are obtained. The values that represent each emotion used in this article are shown in Table 1. However, a problem emerged: how should these numerical values be represented? Fuzzy systems can represent these kinds of values using linguistic variables. Fuzzy logic was initially proposed by Lotfi Zadeh in 1965 [25]. It relaxes the rigid notion of membership in propositional logic and classical set theory by creating the concept of membership degrees: rather than true or false (belongs or does not belong), there is a continuous spectrum of values between 0 and 1 to denote the degree of membership. Thus, it becomes possible to represent imprecise symbolic knowledge.
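Equation (1) is not reproduced here; as one hedged illustration of the mechanics only, if the PAD dimensions are expressed as linear combinations of the five factors, an OCEAN estimate can be recovered from a PAD triple with a pseudo-inverse. The coefficient matrix and the input triple below are illustrative placeholders, not Mehrabian's published regression weights or Table 1 values.

```python
import numpy as np

# ILLUSTRATIVE coefficient matrix expressing P, A, D as linear
# combinations of the Big Five factors (rows: P, A, D; columns:
# O, C, E, A, N). These are NOT the published regression weights.
M = np.array([
    [0.00, 0.00, 0.21, 0.59, -0.19],
    [0.15, 0.00, 0.00, 0.30,  0.57],
    [0.25, 0.17, 0.60, -0.32, 0.00],
])

def pad_to_ocean(pad):
    """Map a PAD triple to OCEAN values via the Moore-Penrose
    pseudo-inverse. The system is under-determined (3 equations,
    5 unknowns), so this returns the minimum-norm solution."""
    return np.linalg.pinv(M) @ np.asarray(pad)

# An illustrative PAD triple for an anger-like state.
anger_pad = (-0.51, 0.59, 0.25)
print(np.round(pad_to_ocean(anger_pad), 3))
```

Because the pseudo-inverse yields an exact solution of the under-determined system, mapping the result back through the matrix reproduces the original PAD triple.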

OCEAN's Discretization
This step is responsible for discretizing the OCEAN's values in order to find a suitable number of output linguistic variables for the fuzzy system.
To discretize the OCEAN's values, tests were performed subdividing the range from −1 to 1 into 3, 5, 7, and 9 parts. Discretization is the process of assigning values to classes so that there is a limited number of possible states. In our tests, we used the WEKA environment (WEKA is a tried and tested open-source machine learning tool that can be accessed through a graphical user interface, standard terminal applications, or the Java API: https://www.cs.waikato.ac.nz/ml/weka/) and an unsupervised pre-processing filter called Discretize. Each test was then also validated in WEKA, using the J48 algorithm, which implements a decision tree. The test results are shown in Table 2, where the error rate means the percentage of information loss relative to the values of the Mehrabian article shown in Table 1. After all the tests, it was found that nine linguistic variables would be ideal for discretizing the OCEAN's values. The fuzzy system was then configured as follows. Input: the input emotions were mapped into membership functions, as shown in Figure 4. Output: the OCEAN's values obtained by discretization were mapped into triangular membership functions, as shown in Figure 5. The relations between these inputs and outputs can be seen in Table 3. The information in Table 3 was translated into FCL rules [26], and the rules that compose the knowledge base of the fuzzy system can be seen in Figure 6. The defuzzification function used was the centroid, and the resulting OCEAN's output values lie between −1 and 1, as can be seen in Figure 7; from here on, they are called UserOcean. Figure 7 shows the fusion of the input emotions 25% FEAR and 50% PITY, resulting in trapezoids that represent the membership of each personality factor of OCEAN. Defuzzification using the centroid function resulted in the following UserOcean values: −0.2526, −0.2688, −0.5313, −0.2688, and 0.2947, respectively.
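The output side of such a Mamdani system can be sketched as below: nine evenly spaced triangular sets over [−1, 1] (matching the chosen discretization), max-aggregation of the clipped sets, and centroid defuzzification. This is a minimal sketch; the actual FCL rule base of Figure 6 and the membership shapes of Figures 4 and 5 are not reproduced.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet at a and c, peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Nine evenly spaced triangular output sets over [-1, 1], as selected
# by the discretization tests (set centers are illustrative).
centers = np.linspace(-1.0, 1.0, 9)
width = centers[1] - centers[0]

def defuzzify_centroid(activations, resolution=1001):
    """Mamdani-style aggregation (max of the clipped output sets)
    followed by centroid defuzzification over the [-1, 1] universe."""
    x = np.linspace(-1.0, 1.0, resolution)
    agg = np.zeros_like(x)
    for center, level in zip(centers, activations):
        clipped = np.minimum(tri(x, center - width, center, center + width), level)
        agg = np.maximum(agg, clipped)
    if agg.sum() == 0.0:
        return 0.0
    return float((x * agg).sum() / agg.sum())

# E.g., two rules fire: 0.5 on the third set (center -0.5) and
# 0.25 on the fourth (center -0.25); the centroid lands between them.
acts = [0, 0, 0.5, 0.25, 0, 0, 0, 0, 0]
print(round(defuzzify_centroid(acts), 4))
```

The clipped-and-aggregated shape is exactly the trapezoid profile visible in the defuzzification charts of Figure 7, and the centroid collapses it to a single UserOcean value per factor.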

Module 02-Robot's Personality
Module 02, illustrated in Figure 8, is responsible for inferring the robot's personality (RobotOcean) from the user's personality (UserOcean). The input of module 02 is the output of module 01, that is, the UserOcean. The output of module 02 was generated following two steps: (1) consultation with experts and (2) neural network design.

1. Consultation with experts: Three expert psychologists were consulted and, based on the emotions in Table 1, asked to relate the personalities they considered basic, and those most commonly presented by users, with the personality the robot must have in order to react with empathy to each personality presented by the user. This query resulted in the database represented in Table 4.
2. Neural network design: To generate the robot's personality, a MultiLayer Perceptron (MLP) [23] neural network was used, with five neurons in the input layer, ten neurons in the hidden layer, and five neurons in the output layer. The activation function used for the hidden layer was the logistic sigmoid and, for the output layer, the hyperbolic tangent; this variation in activation functions better represents outputs with values between −1 and 1. Since only a limited amount of data was available, all data were used as the training set, obtaining network convergence with training errors below 0.000001.

Table 4. Equivalence between emotion/incoming personality profile and emotion/outgoing personality profile.
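The forward pass of the 5-10-5 topology described above can be sketched as follows. The weights here are random placeholders for illustration; in the paper, they result from training on the expert-built table (Table 4).

```python
import numpy as np

rng = np.random.default_rng(0)

# 5-10-5 MLP matching the described topology: logistic sigmoid in the
# hidden layer, hyperbolic tangent at the output so the RobotOcean
# values fall within (-1, 1). Random PLACEHOLDER weights, untrained.
W1, b1 = rng.normal(0, 0.5, (10, 5)), np.zeros(10)
W2, b2 = rng.normal(0, 0.5, (5, 10)), np.zeros(5)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def robot_ocean(user_ocean):
    """Forward pass: UserOcean (5 values in [-1, 1]) -> RobotOcean."""
    h = sigmoid(W1 @ np.asarray(user_ocean) + b1)   # hidden layer
    return np.tanh(W2 @ h + b2)                      # bounded output

# The UserOcean example produced by module 01 in the text.
user = [-0.2526, -0.2688, -0.5313, -0.2688, 0.2947]
out = robot_ocean(user)
print(np.round(out, 3))
```

The tanh output layer is what guarantees the (−1, 1) range required for the OCEAN factors, regardless of the hidden-layer activations.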

Module 03: Robot's Emotion
This module is responsible for choosing the robot's emotion from the personality suggested by the OCEAN values resulting from the neural network of module 02, called RobotOcean. The robot's emotion is selected as the emotion corresponding to the personality with the lowest Euclidean distance (d) between RobotOcean and the values in Table 1. An example of the emotion selection in module 03 can be seen in Table 5: the emotion love was selected from all the emotions in Table 1, as it has the shortest Euclidean distance (0.0646) to RobotOcean (0.1324, 0.1304, 0.1999, 0.2050, and −0.0865).
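Module 03's selection rule amounts to a nearest-neighbor lookup in OCEAN space. The sketch below illustrates it; the per-emotion OCEAN vectors are placeholders, not the actual Table 1 values.

```python
import math

# OCEAN vectors per emotion (ILLUSTRATIVE placeholder values; the
# paper uses the vectors derived from Mehrabian's table, Table 1).
EMOTION_OCEANS = {
    "love":      (0.15, 0.14, 0.22, 0.25, -0.05),
    "joy":       (0.20, 0.12, 0.30, 0.20, -0.20),
    "anger":     (-0.30, -0.10, 0.10, -0.40, 0.40),
    "gratitude": (0.10, 0.20, 0.15, 0.35, -0.10),
}

def closest_emotion(robot_ocean):
    """Pick the emotion whose OCEAN vector has the smallest
    Euclidean distance to RobotOcean (module 03's selection rule)."""
    return min(
        EMOTION_OCEANS,
        key=lambda e: math.dist(EMOTION_OCEANS[e], robot_ocean),
    )

robot = (0.1324, 0.1304, 0.1999, 0.2050, -0.0865)  # RobotOcean from the text
print(closest_emotion(robot))  # with these illustrative vectors: "love"
```

Ties are theoretically possible but unlikely with real-valued OCEAN vectors; `min` simply returns the first of any equally distant emotions.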

Simulation and Performance Evaluation
The research group that works with HiBot is moving towards the construction of a database in which each emotion classified by one of the modals is merged into just one. It is important to note that the objective is to obtain a database that considers all possible intensities of each of these emotions. Since this database is still under construction, an alternative became necessary for visualizing the data, one that examines the intensities of the six input emotions. It is known that pleasurable emotions demonstrate similar personalities; the same occurs with emotions of low pleasure, whose OCEAN values are likewise analogous. Therefore, it is expected that the fusion of analogous emotions, obtained after processing by the fuzzy system of module 01, will be grouped into a single emotion with similar OCEANs, and the intensity of each OCEAN factor is what differentiates which emotion is inferred by modules 02 and 03.
To verify the behavior of the emotion groups given the inputs, a cloud of particles was generated, equally distributed in the representation space of the set of entries. Each particle of the cloud is made up of six values between 0 and 1, corresponding to the membership degrees of the incoming emotions: anger, disliking, fear, pity, relief, and happy for. Using a particle cloud to represent the data universe of this work is not new; it was inspired by Thrun's paper, which used a particle map for mobile robot navigation [27].
To generate the cloud of particles, the space representing each entry was divided into equal parts, using the spacings 0.25, 0.20, 0.125, 0.1, 0.0625, and 0.05, resulting in six sets of tests. Because the same input emotion can generate different output emotions from module 01, and different intensities of emotions affect the selection of different personalities, some results obtained with test set 1, which uses 0.25 as the spacing, will be discussed. Due to the complexity of observing the values that represent personalities (OCEAN's values), and to improve the visualization and validation of the tests performed, the graphs show the emotions inferred from each personality.
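A sketch of such a grid-based particle cloud is shown below; the paper does not give its generation code, so this is an assumed implementation of "equally distributed" particles with a given spacing.

```python
import itertools

def particle_cloud(step):
    """Generate particles covering the 6-D input space: each particle
    holds one membership value in [0, 1] per input emotion, sampled
    on a regular grid with the given spacing."""
    levels = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    return list(itertools.product(levels, repeat=6))

cloud = particle_cloud(0.25)   # test set 1: spacing 0.25
print(len(cloud))              # 5 levels per axis -> 5**6 = 15625 particles
```

The particle count grows steeply with finer spacing (e.g., spacing 0.05 gives 21 levels per axis, about 85.8 million particles), which explains why the coarser test sets are the ones discussed in detail.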
The graphs in Figures 10-15 show the results of the inference and the error of each OCEAN value for the Anger, Disliking, Fear, Happy For, Pity, and Relief emotions, respectively. The orange lines represent the values inferred by Module 01, and the blue lines are the reference values used by Mehrabian [14]. A further experiment uses anger and fear as input emotions: the orange-colored area represents the intensity of anger in each personality factor, the blue-colored area represents the intensity of fear, the pink-colored area represents the merge of the two in each personality factor, and the green area is the neutral area. The graph shown in Figure 21 illustrates the emotions and the percentages inferred from the UserOcean personality. Recall that module 01 performs a "fusion" of the six emotions classified by each of the affective modals. The module 01 system therefore considers the nuances of these emotions. For example, if modal 01 indicates anger with 0.9 membership and modal 02 indicates disliking with 0.5 membership, both emotions must be considered in the inference of the user's emotion. Therefore, the emotions inferred in module 01 may differ from the input emotions, as they represent the nuances of the emotions; the inference mechanism takes into account the relevance of each emotion. Of the six inputs, four emotions are said to be "of low pleasure"; accordingly, at the output of module 01 the disliking emotion also had the highest representation, with 55.5%. Table 6 shows the emotions classified to be demonstrated by the robot, confirming that the robot's posture remained empathic and demonstrating the neural network's prediction efficiency. That the particle cloud covers the most significant number of emotions can be verified in Figure 22, which shows the same input emotion generating different output emotions.
For example, the disliking emotion had love, joy, hope, and liking as its outputs; that is, the input emotion's name was the same, but the profiles of each personality were different. To better understand this inference, consider another example: the pairs ANGER-GRATITUDE and ANGER-LIKING. The histogram charts in Figure 23 show the general map of the ANGER emotion, while the histogram charts in Figures 24 and 25 show the maps of the ANGER-LIKING and ANGER-GRATITUDE pairs, respectively, in which it can be observed that small variations in the input emotions influence the emotion expressed by the robot. For example, the predominance of anger in Figure 24 and the absence of the emotions happy for, relief, and pity (the light blue bars indicate zero, that is, no example of that emotion was found) led to the inference of the liking emotion. In Figure 25, the merging of the emotions disliking, fear, and anger resulted in the emotion gratitude as output.

Final Comments
Emotion and personality are both critical in the process of human decision making in real life. In the Robotics Laboratory of the Electrical Engineering Department of the Federal University of Bahia, there is a device called HiBot (roBOT for Human Interaction). A cognitive architecture must therefore be implemented in HiBot to make the social interaction process with the user as close as possible to that between human beings.
Usually, the works that classify emotions present their results using a confusion matrix with known input/output pairs, based on supervised learning systems. However, as mentioned earlier, since we do not yet have a known set of input/output pairs, we used a cloud of particles to simulate all the possible inputs, fusing emotions from the affective modals and grouping them according to the Big Five Model.
In this work, I2E was presented: an architecture that infers a robot's emotional state from a multimodal system's affective sensors and that will compose a system embedded in HiBot. I2E is the first step towards creating a cognitive architecture for HiBot. These are also the first steps towards the creation of a database for classifying affective modals into emotions.
The I2E architecture infers the emotion to be demonstrated by a robot from the affective modals. For this inference to be closer to that of a human being, in module 01 a Mamdani fuzzy system infers the user's personality, and in module 02 an MLP neural network infers the robot's personality. Finally, in module 03, the emotion to be demonstrated by HiBot is inferred from the robot's personality.
Since the recognition of emotions by all affective modals is still under development, we do not yet have a database that includes all modals for the same person. Because this database is still under construction, an alternative way to visualize the data became necessary, one that analyzes the intensities of the six input emotions. The experiments with the particle cloud show that the combinations of input emotions generated different user personalities (OCEANs), that is, different values for each factor of the Big Five, which in some cases approximate the same emotion. However, although different OCEANs can be associated with the same emotion, they generate different personalities for the robot, as seen in the discussion of the output pairs ANGER-GRATITUDE and ANGER-LIKING. These associations demonstrate that the intensity of each input emotion, and the combination of these emotions, directly affect the inference of the output emotion.
The experiments also demonstrated that using a machine learning technique to determine the best number of linguistic variables in the discretization of the OCEAN's values, so as to avoid loss of knowledge representation, was highly relevant for capturing the nuances of the user's personality factors. In this work, a neural network proved efficient as a function for describing empathic personalities for a robot based on the user's personality.
The use of a Mamdani system allows the use of linguistic variables and the visualization of the fusion of emotions in defuzzification charts. This form of data visualization became an essential tool for the validation of the experiments by specialists. Despite being relatively simple machine learning techniques, the fuzzy system and the MLP proved to be an efficient choice, and they can easily be embedded and replicated on other platforms.
For psychology, I2E provided an environment for observing the inference of one emotion from another. With the tests, it is possible to verify how an individual's emotion and personality influence the generation of emotion. A contribution to psychology is that, based on these observations of the creation of emotions, protocols to classify emotions can be created based on current theories. With the addition of technology, subjectivity could be removed from the identification and measurement of emotions.
As future work, to improve the accuracy of the architecture, we suggest going deeper into each of the groups of emotions resulting from the fusions and, with the help of experts, creating new relationships between personality profiles, in order to find the values of the five personality factors (OCEANs) at various levels of the basic emotions, since this work used only the basic emotions, without their variations. Instead of several distinct emotions, it would then be necessary to know the various levels of the same basic emotions; with several levels of combinations of emotions, the efficiency of the system's learning could be validated.
The hypothesis tested in this work was based on the Mehrabian studies, complemented by the validation of three experts. Another future work is testing I2E with real people and having the actual output validated by psychology experts.
The I2E architecture proved to be quite efficient for identifying an emotion with various input types. With these results, a further step has been taken in the construction of the HiBot brain because, with emotions identified, emotion can now be included in the robot's decision process, bringing it closer to the goals of assistive robotics.