1. Introduction
Artificial intelligence is moving beyond simple information processing toward systems that require social awareness and adaptive behavior. As robots increasingly operate as collaborators in human-centered settings, affective interaction capability has become an important factor in system acceptability and interaction quality [1]. Recent advances in generative AI, including large language models (LLMs), have greatly improved linguistic interaction. However, the ability to generate natural dialogue does not by itself guarantee socially appropriate behavior. Affective interaction becomes meaningful when affective state is reflected in decision making, not only in verbal responses.
Since the 1980s, psychological research has clarified the close relationship among emotion, cognition, and behavior [2,3], and this line of work has inspired computational affect models such as EMotion and Adaptation (EMA) [4], WASABI (Affect Simulation for Agents with Believable Interactivity) [5], and Activating event–Beliefs–Consequences Emotional BDI (ABC-EBDI) [6]. Yet two structural limitations remain when such models are deployed in practical systems.
First, many computational affect architectures already incorporate emotion into behavioral decision making, but the linkage logic is typically embedded within appraisal, coping, conduct, or designer-authored domain-specific rules and plan structures. The problem, therefore, is not that emotion is absent. Rather, the process through which emotion is translated into behavior remains buried in internal logic instead of being separated as an explicit intermediate layer. As a result, when the same emotion should be realized differently across contexts and roles, developers often have to add or adjust internal rules and plans on a case-by-case basis. This makes the emotion-to-behavior translation structure difficult to represent explicitly and can limit scalability and explainability.
Second, recent end-to-end deep-learning systems provide flexibility, but their black-box nature makes behavioral consistency difficult to guarantee. Neuro-symbolic approaches have been proposed to address this issue by combining neural models with logical structure, yet they remain focused mainly on physical task planning and still face limitations in explaining the logical causality through which affective state is converted into behavior.
In this study, we frame the shared limitation of these two approaches as an architectural design problem in which the linkage between internal state and external action remains implicit or tightly coupled to model-specific rules. The Mechanism Gap in this paper does not mean that emotion and behavior are unconnected. Rather, it refers to the lack of an explicitly formalized, independent, and reusable intermediate layer. Existing models provide sophisticated mechanisms for emotion generation and intensity computation, but the transformation from generated emotion to behavior is still implemented differently from model to model.
To address this issue, we propose an Emotional BDI framework that places Frijda’s action tendency within a Belief–Desire–Intention (BDI) architecture as an explicit intermediate layer between affective state and concrete behavior. Rather than mapping emotion directly to behavior, the framework first converts affective state into a directional tendency and then lets the BDI Executor realize that tendency according to role and context. Thus, the same anger state may be realized as Inhibition in a Caregiver role and as Rejection in a Gamer role. The framework also separates a Reactive Pathway from a Deliberative Pathway, with the Action Tendency Layer serving as the mediation layer in the latter.
Action tendency has also been discussed in prior work, but this paper repositions it as an architectural mediation interface between affective-state representation and BDI-based behavior realization. In this sense, the theoretical contribution lies not in introducing action tendency itself, but in formalizing its role as an explicit intermediate layer within an Emotional BDI architecture. The affective model in the current prototype should therefore be understood not as an empirically validated model of human emotion, but as a prototype implementation used to examine the feasibility of this architecture.
The specific contributions of this study are as follows.
First, we propose an Emotional BDI architecture in which the Action Tendency Layer functions as an explicit intermediate interface between the Affective Core and the BDI Executor.
Second, we present a two-stage translation structure in which emotion is transformed into abstract behavioral directionality and then concretized at the BDI level according to role and context, making the mediation process more interpretable and traceable.
Third, we provide a current lookup-table (LUT) prototype together with an exploratory user evaluation on the social robot Buddy to examine the feasibility of the proposed structure and its initial implications for user experience.
In the current prototype, personality, mood, and emotion are modeled by combining the Pleasure–Arousal–Dominance (PAD) space with discrete emotion categories, and the architecture supports both the Reactive Pathway and the Deliberative Pathway. The Action Tendency Layer is implemented as a modular interface so that the current deterministic LUT can later be replaced by richer reasoning engines.
The remainder of the paper is organized as follows. Section 3 presents the proposed framework, Section 4 describes the current system implementation, and Section 5 reports the exploratory user evaluation.
3. Proposed Framework
This section presents the architectural design proposed to address the problem outlined above. By introducing action tendency as an intermediate representation layer between internal affective state and external action, the framework makes the emotion-to-behavior mediation process more explicit than direct mapping approaches.
3.1. Architectural Overview
Conceptually, the framework adopts a three-stage processing structure consisting of the Affective Core, the Action Tendency Layer, and the BDI Executor.
Figure 1 shows the interaction structure among the main modules.
Figure 1 presents the overall structure of the framework, centered on the interaction among the BDI system, the Affective Core, and the Action Tendency Layer. The BDI system on the left maintains memory, context, social norms, and goals; the Affective Core on the right is responsible for primary and secondary emotion generation; and the Action Tendency Layer in the middle converts affective state into abstract behavioral directionality. The diagram summarizes how internal affect processing and BDI-based behavioral execution are connected within a single architecture.
Stage 1, the Affective Core, receives external stimuli through perception, converts personality into the PAD space, establishes a mood baseline in Plutchik’s eight emotion categories, and then integrates that baseline with stimulus-driven emotion increments to produce an internal affective state. The output of this stage is the affective state represented in Plutchik’s eight emotion categories, from which the dominant emotion is selected for subsequent action tendency decision making.
Stage 2, the Action Tendency Layer, takes the output of the Affective Core together with the agent’s current context beliefs and generates abstract tendencies such as Avoidance, Approach, Submission, Rejection, Inhibition, and Attending. This is the key layer that mediates between emotion and behavior. Rather than mapping emotion directly to concrete behavior, it provides an intermediate representation of behavioral directionality, allowing appropriate tendencies to be derived according to role and context.
Stage 3, the BDI Executor, integrates action tendency into the BDI system to form intentions and executable plans, and then realizes them as concrete action through the Action Module shown in Figure 1. In this stage, rational deliberation within BDI is responsible for final action selection, while the Action Module functions as the terminal interface that realizes the result externally.
The framework supports a dual-pathway architecture. The Reactive Pathway triggers immediate behavior in response to sudden threats, whereas the Deliberative Pathway supports more careful decisions through cognitive appraisal in complex situations. By positioning the Action Tendency Layer within the deliberative route, the framework makes explicit how affective impulses are translated into behavioral intention through a mediated reasoning process. The detailed operation of these pathways is described in Section 3.4.
3.2. Affective Core
The Affective Core generates and maintains the agent’s internal affective state. The PAD transformation, mood-baseline construction, and use of Plutchik’s eight emotion categories described in this subsection are modeling choices adopted in the current prototype and should be distinguished from the conceptual contribution of the Action Tendency Layer itself. The module uses two representational spaces. Personality is converted into the PAD (Pleasure–Arousal–Dominance) space and forms the basis of mood, while subsequent mood and emotion computation is carried out in Plutchik’s eight emotion categories. These categories are used instead of the continuous PAD space for emotion computation because explicit emotion labels are directly used in the later action tendency decision stage.
The agent’s personality is specified on the basis of the Big Five model and converted into the PAD space using the transformation proposed by Mehrabian [36]. In this transformation, E, A, N, O, and C denote Extraversion, Agreeableness, Neuroticism, Openness, and Conscientiousness in the Big Five model, respectively, and P, A, and D denote the Pleasure, Arousal, and Dominance dimensions of the PAD space (the Big Five A and the PAD A are distinct quantities). Through this transformation, personality traits are handled in the same dimensional space as mood and emotion.
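As an illustration of this conversion step, the following minimal sketch maps a Big Five profile to a PAD triple. The coefficients are the values commonly quoted for Mehrabian's mapping in the affective-computing literature; they are an assumption here and are not confirmed as the prototype's own values. The class name `BigFive`, the function name `big_five_to_pad`, and the [-1, 1] trait scaling are likewise hypothetical, and the sign convention of the third trait (neuroticism versus emotional stability) varies between sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BigFive:
    """Big Five traits, each assumed to be scaled to [-1, 1]."""
    extraversion: float
    agreeableness: float
    neuroticism: float
    openness: float
    conscientiousness: float


def big_five_to_pad(p: BigFive) -> tuple[float, float, float]:
    """Return (pleasure, arousal, dominance) for a personality profile.

    The coefficients are commonly quoted Mehrabian values and are an
    assumption of this sketch, not the prototype's confirmed parameters.
    """
    pleasure = (0.21 * p.extraversion + 0.59 * p.agreeableness
                + 0.19 * p.neuroticism)
    arousal = (0.15 * p.openness + 0.30 * p.agreeableness
               - 0.57 * p.neuroticism)
    dominance = (0.25 * p.openness + 0.17 * p.conscientiousness
                 + 0.60 * p.extraversion - 0.32 * p.agreeableness)
    return pleasure, arousal, dominance
```

Because the mapping is linear, a profile that is high only in extraversion contributes only to pleasure and dominance, which makes the resulting mood baseline easy to inspect by hand.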
The mood baseline is determined from the personality PAD values. Specifically, PAD sign combinations are used in the current prototype to initialize Plutchik-oriented mood-category seeds rather than to define a one-to-one PAD-to-emotion conversion. For transparency, Table 2 lists the sign-based seeding rule used after the Big Five-to-PAD transformation. This rule should be understood as a prototype-level discretization heuristic for mood initialization, not as an empirically established PAD-to-Plutchik mapping. Mood is influenced not only by internal personality factors but also by external factors such as time and weather, and it transitions without cognitive appraisal.
The framework supports dual-process emotion generation. The Primary Emotion Generator responds immediately to external stimuli without cognitive appraisal and is used in the reactive pathway. The Secondary Emotion Generator produces emotion through multilevel appraisal of stimuli. Multilevel appraisal refers to the process of subjectively evaluating a situation in terms of factors such as user intention (utterance intent, action, facial expression), system goals, current state, task history, and social norms.
Figure 2 illustrates how secondary emotion is determined. The mood baseline derived from personality PAD values is combined with the emotion increments induced by external stimuli, and the category with the highest resulting intensity is selected as the dominant emotion. The emotion threshold specifies the criterion at which an emotion increment exceeds the mood baseline and begins to affect behavioral decision making.
Emotion computation is performed in Plutchik’s eight discrete categories: joy, trust, fear, surprise, sadness, disgust, anger, and anticipation. Enhanced emotions are defined as transient emotional spikes generated by stimuli and added to the mood baseline. Mood is a persistent and foundational affective state derived from personality, and the category with the highest value after these components are combined is chosen as the dominant emotion.
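The selection of the dominant emotion can be sketched as follows. This is a minimal illustration, assuming that mood and stimulus increments are stored as per-category intensities and that the emotion threshold is applied to the increments. The function name `dominant_emotion`, the dictionary layout, and the "neutral" fallback label are hypothetical and not taken from the prototype.

```python
PLUTCHIK = ("joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation")


def dominant_emotion(mood, increments, threshold=0.0):
    """Combine the mood baseline with stimulus increments and pick a label.

    mood, increments: dicts keyed by Plutchik category; missing keys count
    as 0. Returns (label, combined). The label is "neutral" when no
    increment exceeds the threshold, which is one hypothetical reading of
    the emotion threshold described in the text.
    """
    combined = {c: mood.get(c, 0.0) + increments.get(c, 0.0) for c in PLUTCHIK}
    if all(increments.get(c, 0.0) <= threshold for c in PLUTCHIK):
        return "neutral", combined
    # The category with the highest combined intensity becomes dominant.
    label = max(PLUTCHIK, key=lambda c: combined[c])
    return label, combined
```

Note that a stimulus increment above the threshold does not automatically win: if the mood baseline of another category is still higher after combination, that category remains dominant.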
The emotion increment generated by an external stimulus does not persist indefinitely; once the stimulus disappears, it decays over time. To implement affective homeostasis, the framework models the restorative tendency by which the agent’s affective state returns toward its underlying mood. The temporal change of the emotional state is defined abstractly as follows:
$$E(t + \Delta t) = f\big(E(t),\, M(t),\, \Delta t\big)$$
Here, $E(t)$ denotes the emotional state at time $t$, $M(t)$ denotes the current mood state, and $\Delta t$ denotes the time interval. The function $f$ models how emotion evolves over time relative to mood. This abstract formulation provides architectural flexibility by accommodating different decay mechanisms such as linear, exponential, and logarithmic forms. In the current implementation, a linear decay model is adopted for computational efficiency. Specifically, the discrete emotion increment $\Delta E_i$ decays as follows:
$$\Delta E_i(t + \Delta t) = \Delta E_i(t) - \lambda_i\, \Delta t$$
Here, $\lambda_i$ is the decay coefficient for emotion $i$. In the current prototype, it is set in proportion to the arousal level of the agent’s personality. This is a design assumption, based on the interpretation that the PAD arousal dimension may relate to differences in affective responsiveness, and it still requires empirical refinement. The linear form is chosen for computational efficiency and simulation simplicity. It does not fully capture the nonlinear complexity of human emotion, but it follows a practical affective-computing approach in which emotional change is approximated by a linear system during early architectural design and logic verification [37]. The linear decay should therefore be understood as a pragmatic assumption of the current prototype rather than an empirically established law of affect. The final emotional state is obtained by taking mood as the baseline and adding the remaining emotion increment after decay: $E_i(t) = M_i(t) + \Delta E_i(t)$. Feedback from emotion is also reflected in mood, forming a feedback loop that affects later action tendencies and decision making. Concretely, part of the intensity of the dominant emotion is accumulated into the mood baseline, and this cumulative effect is described in the short-term timeline analysis below. This dimensional integration provides the affective representation used by the current prototype before the state is passed to the Action Tendency Layer.
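The decay and feedback steps described above can be sketched as a single update function. This is a minimal illustration under stated assumptions: the increment is floored at zero so that it never drops below the mood baseline, and the mood feedback is taken as a fixed fraction of the dominant category's remaining increment. The names `decay_increment`, `step`, and the `feedback` parameter are hypothetical.

```python
def decay_increment(delta, rate, dt):
    """Linearly decay one emotion increment; the floor at zero is an assumption."""
    return max(0.0, delta - rate * dt)


def step(mood, increments, rates, dt, feedback=0.0):
    """Advance the affective state by one time step of length dt.

    Returns (new_mood, remaining_increments, emotion), where emotion is the
    mood baseline plus the remaining increment for each category.
    """
    remaining = {k: decay_increment(v, rates.get(k, 0.0), dt)
                 for k, v in increments.items()}
    categories = sorted(set(mood) | set(remaining))
    emotion = {k: mood.get(k, 0.0) + remaining.get(k, 0.0) for k in categories}
    new_mood = dict(mood)
    if emotion:
        dominant = max(categories, key=lambda k: emotion[k])
        # Hypothetical feedback rule: a fraction of the dominant category's
        # remaining increment is accumulated into the mood baseline.
        new_mood[dominant] = (mood.get(dominant, 0.0)
                              + feedback * remaining.get(dominant, 0.0))
    return new_mood, remaining, emotion
```

Calling `step` repeatedly without new stimuli drives every increment to zero, so the emotion values converge to the (slightly raised) mood baseline, which mirrors the homeostatic behavior described in the text.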
3.3. Action Tendency Layer
The Action Tendency Layer is the explicit mediation layer that connects affective state to BDI-based behavioral decision making in the proposed framework. Action tendency is defined here as an abstract intermediate representation positioned between emotion and concrete action planning. The purpose of this layer is not to generate behavior directly from emotion, but to represent the directionality of behavioral readiness formed by the current affective state. Through this separation, the same emotion can be concretized differently according to role, and the transformation from emotion to tendency to intention to action can be described more transparently. For example, anger may first be transformed into an abstract tendency such as opposition or inhibition, and the BDI Executor then realizes that tendency as concrete behavior according to the agent’s role, goals, and context.
Whereas many prior affective architectures reviewed above tended to integrate emotion-to-behavior mediation inside appraisal, coping, conduct, or decision rules, the proposed framework externalizes that mediation as an independent representation at the level of directionality. At the interface level, action tendency (AT) can be defined as an intermediate representation that takes as input an emotion vector (E, the dominant category selected in Plutchik’s eight emotion categories) together with contextual information and produces one abstract behavioral direction from a finite set {Avoidance, Approach, Submission, Rejection, Inhibition, Attending, …}.
The theoretical basis of this layer lies in Frijda’s action tendency theory, which is operationalized here as an intermediate interface between affective state and BDI-based behavioral decision making.
Interface design is central to this module. The input consists of the emotion vector and context beliefs, including the agent’s current role, goals, and contextual information. The output is an abstract tendency such as Avoidance, Approach, Submission, Rejection, Inhibition, or Attending. This interface is designed as a standardized slot that can later be coupled with richer reasoning modules.
In the current prototype, this interface is implemented through a role-specific Criteria Library using a lookup table (LUT). The term Criteria Library refers to this current implementation. It consists of mapping sets organized by the agent’s role, and each AT label is defined with reference to Frijda’s [10] taxonomy of action readiness, including Approach, Avoidance, Rejection, Inhibition, Submission, and Attending. For example, the Caregiver Set maps positive emotions such as Joy and Anticipation to Approach, more passive emotions such as Trust and Sadness to Submission, and Anger to Inhibition so that the caregiving role goal is prioritized. By contrast, the Gamer Set provides different mappings, such as converting Anger into Rejection in an entertainment context. This library reflects the design choices of the current prototype and illustrates that the same emotion can lead to different action tendencies according to role and context. In this sense, the current prototype does not eliminate hand-authored mappings; rather, it relocates them from dispersed rule logic into a single inspectable intermediate layer. The full Criteria Library mappings are included in Appendix A.
Figure 3 shows how BDI components and affective processing are linked by the Action Tendency Criteria Library. Personality, context, goals, and appraisal results converge on the criteria sets and the action tendency decision stage, which then feed into evaluation, decision making, and coping activities. For example, in the Gamer Set, Joy may be converted into Approach while Anger may be converted into Rejection.
Action tendency decision determines the final action tendency on the basis of the selected set and the agent’s current emotion. This process can be expressed as follows:
$$AT = S(E_{\mathrm{dom}})$$
Here, $AT$ denotes the agent’s internal action tendency, $S$ denotes the currently active criteria set, and $E_{\mathrm{dom}}$ denotes the dominant emotion. Importantly, this output does not represent the final behavioral decision; it represents the primary direction that the system is inclined to take in the current affective state. The current implementation derives this directionality from role-specific criteria, while preserving an interface that can later be extended with richer contextual reasoning modules.
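The role-specific lookup can be sketched as a nested dictionary. This minimal example contains only the mappings named in the text (the Caregiver and Gamer examples); the full mapping sets are those of Appendix A and are not reproduced here. The names `CRITERIA_LIBRARY` and `action_tendency`, the lowercase keys, and the `None` return for unlisted pairs are assumptions of this sketch.

```python
# Only the mappings stated in the text are listed; all other entries of the
# Criteria Library are intentionally left out rather than guessed.
CRITERIA_LIBRARY = {
    "caregiver": {
        "joy": "Approach",
        "anticipation": "Approach",
        "trust": "Submission",
        "sadness": "Submission",
        "anger": "Inhibition",
    },
    "gamer": {
        "joy": "Approach",
        "anger": "Rejection",
    },
}


def action_tendency(role, dominant_emotion):
    """Look up the abstract tendency for a role and dominant emotion.

    Returns None when the pair is not listed in this partial library.
    """
    return CRITERIA_LIBRARY.get(role, {}).get(dominant_emotion)
```

With this layout, `action_tendency("caregiver", "anger")` yields Inhibition while `action_tendency("gamer", "anger")` yields Rejection, so switching roles changes only which inner dictionary is consulted.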
3.4. Cognitive Integration and Decision Pathways
Action tendency is integrated into the BDI system and translated into final behavior. We refer to this process as Tendency-to-Intention Translation, meaning the logical mapping by which an abstract behavioral tendency becomes a concrete intention and plan within BDI. For example, an abstract tendency such as avoidance may be translated into specific plans such as stepping back or terminating a conversation.
Figure 4 shows the BDI-based task-management process in which sensor input proceeds through perception, belief revision, options, filtering, and planning before producing final action output. In the proposed framework, the belief-revision stage and the subsequent option, filter, and plan-formation stages can reflect the action tendency produced earlier, and intention and plan are then formed on that basis to generate actual behavior.
In particular, both the process and the result of action tendency generation are explicitly logged within the agent, making it possible to trace which emotional state led the agent to select which tendency at a given moment. This behavioral traceability makes it easier to describe the relationship among emotional state E, tendency AT, intention I, and action a, and provides a basis for later explanation of agent behavior.
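One way to realize such a trace is to record each decision as a structured log entry. The sketch below is illustrative only: the record fields, the `DecisionTrace` and `log_trace` names, and the JSON-line format are hypothetical and do not describe the prototype's actual log format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionTrace:
    """One step of the chain emotion -> tendency -> intention -> action."""
    timestamp: float
    emotion: str
    tendency: str
    intention: str
    action: str


def log_trace(log, trace):
    """Append the trace to a list-based log as one JSON line and return it."""
    line = json.dumps(asdict(trace), sort_keys=True)
    log.append(line)
    return line
```

Keeping all four elements in one record means that a later explanation component can answer "why this action?" by reading a single entry instead of correlating several module logs.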
The BDI Integration Interface defines how action tendency interacts with the BDI system. In the current architecture, the generated action tendency is reflected in belief update and intention formation and influences later plan filtering and behavior selection. Through this mechanism, the agent does not simply enact affective impulse directly, but can revise an existing plan or select a new behavioral intention according to the situation. For example, when the agent detects a user’s negative response, it may suspend its current service plan and shift to a new plan, such as ending the conversation or apologizing, in line with an Avoidance tendency.
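Tendency-to-Intention Translation can be sketched as a plan lookup that falls back to the current plan. This is a simplified illustration and not the JAM implementation: the plan names are paraphrased from the examples in the text, and the `PLAN_LIBRARY` layout, the `translate` function, and the `is_applicable` filter are hypothetical.

```python
# Keys are (role, tendency); a role of None marks a role-independent entry.
PLAN_LIBRARY = {
    ("caregiver", "Inhibition"): ["continue_service_with_concerned_expression"],
    ("gamer", "Rejection"): ["refuse_service"],
    (None, "Avoidance"): ["apologize", "end_conversation", "step_back"],
}


def translate(role, tendency, current_plan, is_applicable=lambda plan: True):
    """Turn an abstract tendency into a concrete plan for the given role.

    Role-specific candidates are tried first, then role-independent ones.
    The first candidate accepted by is_applicable (a stand-in for BDI plan
    filtering) is returned; otherwise the current plan is kept.
    """
    candidates = (PLAN_LIBRARY.get((role, tendency))
                  or PLAN_LIBRARY.get((None, tendency), []))
    for plan in candidates:
        if is_applicable(plan):
            return plan
    return current_plan
```

The fallback to `current_plan` reflects the idea that a tendency informs plan revision but does not force it: when no candidate passes filtering, the agent simply continues what it was doing.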
The framework supports two decision pathways. The Deliberative Pathway is used when careful decisions are required through cognitive appraisal in complex situations. Its flow proceeds through Belief, Multilevel Appraisal, Emotion Generation, action tendency, and Belief Update & Plan Revision. In this pathway, action tendency functions as an intermediate representation that passes the result of affective evaluation to BDI reasoning and may lead to either revision of an existing plan or generation of a new one. In this way, the framework explicitly describes how affective evaluation is translated into behavioral intention through belief update and plan revision.
The Reactive Pathway, shown as the gray dashed route in Figure 5, is a reflexive path that bypasses complex BDI reasoning and triggers immediate behavior in response to sudden threats or unambiguous stimuli. Its flow proceeds through Perception, Primary Emotion, and Immediate Action. This pathway provides rapid response in urgent situations or simple interactions and supports the agent’s liveliness and responsiveness.
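The choice between the two pathways can be sketched as a small dispatcher. The stage names follow the flows given in the text; the stimulus flags `sudden_threat` and `unambiguous` and the function names are hypothetical, since the text does not specify how the prototype detects these conditions.

```python
REACTIVE_STAGES = ("Perception", "Primary Emotion", "Immediate Action")
DELIBERATIVE_STAGES = ("Belief", "Multilevel Appraisal", "Emotion Generation",
                       "Action Tendency", "Belief Update & Plan Revision")


def select_pathway(stimulus):
    """Return "reactive" for sudden threats or unambiguous stimuli.

    stimulus is a dict of boolean flags; how such flags would be computed
    from perception is outside the scope of this sketch.
    """
    if stimulus.get("sudden_threat") or stimulus.get("unambiguous"):
        return "reactive"
    return "deliberative"


def stages_for(stimulus):
    """Return the ordered processing stages for the selected pathway."""
    if select_pathway(stimulus) == "reactive":
        return REACTIVE_STAGES
    return DELIBERATIVE_STAGES
```

Only the deliberative stage list contains the Action Tendency step, which matches the placement of the Action Tendency Layer in the deliberative route.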
Figure 5 shows the current emotion-informed decision-making process, in which emotion generation, action tendency decision, belief update, and plan filtering are linked through the interaction between the Emotion Agent and the BDI reasoning module. As the BDI layer on the left interacts with the emotion-generation and action tendency decision modules on the right, the figure visualizes how affective evaluation shapes final plan selection and behavioral output.
As shown in Figure 5, the Task Manager inside the BDI Executor integrates the generated action tendency with the current intention to form concrete intentions and behavioral plans. In this process, action tendency informs belief updating and intention formation, influences plan filtering and behavior selection, and may trigger revision of an existing plan or generation of a new one when needed. Feedback from executed plans and changes in internal state then update the agent’s beliefs for subsequent decisions.
In this way, Figure 5 summarizes how action tendency, as an abstract intermediate representation, is combined with BDI reasoning and carried through to final action selection.
4. System Implementation
This section describes how the proposed architecture was implemented in a working system to demonstrate its engineering feasibility. The current prototype combines the social-robot platform Buddy with the BDI-based Java Agent Model (JAM) framework and uses Robot Operating System (ROS) middleware for real-time communication across heterogeneous modules. The design choices described below should therefore be understood as implementation choices of the current prototype rather than as the conceptual contribution itself.
4.1. Hardware & Software Environment
To validate the proposed architecture in a physical environment, we adopted our in-house social-robot platform Buddy as the embodied agent. Buddy is a desktop social robot designed for affective interaction and daily support services, and it provides the hardware needed to express internal state and perform physical behavior. The high-resolution display mounted in the robot’s head serves as the primary output device for affective expression, showing the facial expression of a 2D avatar in real time according to the dominant emotion computed in Plutchik’s eight emotion categories. A front-facing camera and microphone collect the user’s facial and vocal information for affect recognition and intention understanding, while a built-in speaker supports verbal communication through text-to-speech (TTS) output. The platform also includes motors controlling neck rotation and vertical arm movement, allowing action tendencies to be expressed through nonverbal gestures.
The system adopts a layered integration of a Java-based intelligence framework for high-level cognitive processing and ROS middleware for heterogeneous hardware control and data communication. At the highest level are a Task Management module equipped with a JAM agent and an Affective Management module. This layer generates emotions on the basis of BDI logic, derives Action Tendencies, and establishes final action plans. Actual communication with hardware is handled through ROS middleware. Vision and speech data received from the edge device (NVIDIA Jetson Orin) and robot sensors are transformed into a format understood by the Java layer, while abstract behavior commands decided at the upper layer are translated into concrete robot-control commands and sent to the actuators. Communication between Java (JAM) and Python (ROS), which operate in different language environments, is implemented through a TCP/IP socket-based bridge. This loosely coupled design helps prevent computationally heavy processing in the cognitive layer from blocking the real-time loop of the lower control layer and can facilitate maintenance and extension.
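A socket bridge of this kind needs a message framing convention so that both sides agree where one message ends. The sketch below shows the Python side of one possible convention, newline-delimited JSON. The framing, the message fields, and the function names are assumptions of this sketch; the text does not specify the prototype's wire format. The demo uses a local connected socket pair in place of a real TCP connection to the Java process.

```python
import json
import socket


def send_message(sock, message):
    """Serialize a dict as one newline-terminated JSON line and send it."""
    sock.sendall((json.dumps(message) + "\n").encode("utf-8"))


def recv_message(reader):
    """Read one JSON line from a file-like reader; return None at end of stream."""
    line = reader.readline()
    if not line:
        return None
    return json.loads(line)


def demo():
    """Round-trip one hypothetical percept message over a local socket pair."""
    sender, receiver = socket.socketpair()
    with sender, receiver, receiver.makefile("r", encoding="utf-8") as reader:
        send_message(sender, {"type": "percept", "user_expression": "frown"})
        return recv_message(reader)
```

Line-based framing keeps the Java side simple as well, since each message can be read with a buffered line reader and parsed independently of the next one.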
4.2. Current Prototype Implementation
Because the Affective Core and the Action Tendency Layer involve nontrivial numerical computation, they were implemented as Java classes linked to the JAM framework. Based on data received from ROS, the Java modules update the agent’s mood state, compute the dominant emotion in Plutchik’s eight emotion categories, and determine the action tendency. The PAD mapping, discrete emotion categories, linear decay, and LUT-based Criteria Library used below are practical design choices of the current prototype adopted to verify the logical flow of the architecture; they should not be interpreted as conceptual constraints of the architecture itself.
The system also adopts an externally configured implementation in which agent identity and behavioral logic are separated. Personality, goals, and the Action Tendency Criteria Library are therefore managed in external JSON (JavaScript Object Notation) files rather than embedded directly in source code. At initialization, the system parses these JSON files and loads the agent’s personality parameters, goals, and action tendency mapping rules into memory. This structure makes it comparatively easy to switch across roles and configurations and shows that the Action Tendency Layer functions not as one fixed rule bundle inside the code base, but as a configurable intermediate layer.
This externalized mapping structure thus serves as the current implementation of a standardized interface that can later be coupled with more sophisticated reasoning engines.
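Loading such an external configuration can be sketched as follows. The JSON schema shown here (the `role`, `personality`, `goals`, and `criteria_library` keys), the personality values, and the function names are hypothetical; only the Caregiver and Gamer mappings for Joy and Anger come from the text.

```python
import json

# Hypothetical configuration; the prototype's actual JSON schema is not given.
EXAMPLE_CONFIG = """
{
  "role": "caregiver",
  "personality": {"extraversion": 0.4, "agreeableness": 0.6,
                  "neuroticism": -0.2, "openness": 0.3,
                  "conscientiousness": 0.5},
  "goals": ["provide_care"],
  "criteria_library": {
    "caregiver": {"joy": "Approach", "anger": "Inhibition"},
    "gamer": {"joy": "Approach", "anger": "Rejection"}
  }
}
"""

REQUIRED_KEYS = {"role", "personality", "goals", "criteria_library"}


def load_agent_config(text):
    """Parse the JSON text and check that the expected top-level keys exist."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing configuration keys: {sorted(missing)}")
    return config


def active_criteria(config, role=None):
    """Return the criteria set for the given role, defaulting to the configured role."""
    role = role or config["role"]
    try:
        return config["criteria_library"][role]
    except KeyError:
        raise ValueError(f"no criteria set for role {role!r}") from None
```

Switching roles then amounts to calling `active_criteria(config, "gamer")` or editing the `role` field in the file, without touching the affect-processing code.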
4.3. Operation of the Affective Model
This subsection analyzes the internal behavior of the proposed affective model through simulation. By observing changes in emotion, mood, and action tendency over short- and long-term timelines, we examine how affective factors are reflected in the agent’s decision-making process.
On the short-term timeline, the agent’s emotion and mood respond immediately to external events. When a particular event occurs, the corresponding emotion rises rapidly and combines with the existing mood state to produce an amplified affective response. This emotion then decays and gradually converges toward the mood baseline. For example, the intensities of joy (happy) and anger increase sharply at specific moments and then decline over time, while earlier emotional responses feed back into mood and raise the baseline. This shows that affective response can influence mood and, in turn, modulate later emotion generation.
On the long-term timeline, emotion changes are transient, whereas mood changes gradually according to factors such as personality, weather, and time of day. The environmental parameters of the present model were set with reference to prior work suggesting that the relation between weather and mood may vary with season and outdoor-activity conditions [38], and that subjective energy and arousal may fluctuate with circadian rhythms [39]. The results below should therefore be interpreted not as evidence of a universal law, but as simulation patterns that reflect these design assumptions. In the simulation, an agent with an active and positive personality showed relatively stronger positive mood components such as joy and trust under clear weather and during the morning period. By contrast, an agent with a calm and negative personality exhibited a comparatively narrower range of mood fluctuation, while negative mood components remained more pronounced. Under clear weather, positive mood tended to be relatively reinforced, whereas under rainy or cloudy weather negative mood tended to be maintained or increase. In terms of time-of-day effects, the circadian design produced a pattern in which arousal became relatively higher from the morning toward noon.
Figure 6 visualizes affective change over the short-term timeline. In Figure 6a, joy (happy) and anger rise sharply at specific moments and then decay. In Figure 6b, previously generated emotion feeds back into mood, producing a cumulative elevation of the baseline. This illustrates how short-term affective response and mood feedback operate together.
Figure 7 shows how personality and environment affect mood trajectories over the long-term timeline. Figure 7a,b visualize mood changes over a day under different personality settings, while Figure 7c,d illustrate how weather conditions exert different effects on positive and negative dispositions. These simulations show that long-term variability in the affective model is structurally shaped by personality and the external environment.
4.4. Service Robot Scenario
To illustrate the practical applicability of the proposed framework, we present a concrete interaction scenario using the home service robot Buddy. This scenario shows how the current prototype uses affective cues to make context-appropriate decisions. The robot provides both speech and visual feedback and includes emotion-expression capabilities. The agent employs symbolic processing, and the BDI system is implemented with JAM.
The scenario centers on Buddy as a domestic service robot. In the morning, a user asks Buddy for weather and news information. Buddy displays facial expressions on its screen according to the type and intensity of its affective state and responds to the user’s utterances by providing information. During initialization, when the user begins the interaction by greeting Buddy, the robot sets its initial affective state on the basis of its personality and the current environment, such as the weather and time of day. In this process, anticipation increases and the final state is set to neutral.
During service-request execution, when the user asks for weather information, Buddy analyzes its affective state and the situation and selects a service-oriented response. The robot then provides the requested information, and Figure 8 shows an example of the internal log recorded at that moment. The label "toward" in the log is a legacy internal implementation label corresponding to a service-oriented Approach tendency in the taxonomy used in this paper.
Later, when the user provides negative feedback about the inaccuracy of the information, this stimulus triggers a transient increase in the anger component of the internal affective system. This internal response to a negative evaluative stimulus gradually converges back toward neutral over time. When the user subsequently requests additional news service, the BDI system evaluates the current anger state, selects an Inhibition tendency, and converts it into behavior consistent with the caregiving goal. The robot continues to provide the service while suppressing internal tension and displaying a serious, concerned expression. This illustrates how the functional disposition associated with the Caregiver role regulates even negative affect in a way that remains aligned with the care objective.
Figure 9 summarizes the subsequent negative-feedback episode in the home-care scenario as a human-readable decision trace from user stimulus to observed action.
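The transient rise in anger and its gradual return to neutral can be illustrated with a minimal sketch. It assumes a scalar anger intensity in [0, 1], a stimulus increment of 0.6, and a decay rate of 0.1 per time step. These values and the function names are hypothetical and are not taken from the prototype, for which the paper states only that decay is linear.

```python
def apply_stimulus(intensity: float, delta: float) -> float:
    """Add a stimulus-driven change and clamp the result to [0, 1]."""
    return min(1.0, max(0.0, intensity + delta))


def decay_step(intensity: float, rate: float) -> float:
    """Move a non-negative intensity linearly toward 0 (neutral), never below it."""
    return max(0.0, intensity - rate)


anger = 0.0                          # neutral starting point (assumed)
anger = apply_stimulus(anger, 0.6)   # negative feedback raises anger (assumed size)

trace = [round(anger, 2)]
for _ in range(8):                   # eight time steps at an assumed rate of 0.1
    anger = decay_step(anger, 0.1)
    trace.append(round(anger, 2))
```

In this sketch the intensity falls by a fixed amount per step until it reaches neutral, so a request that arrives shortly after the negative feedback is evaluated against a still-elevated anger value.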
Buddy can also express these internal-state changes through screen-based facial expressions.
Figure 10 presents representative examples that visualize a neutral state and anger-related internal tension.
If Buddy had a different personality or different goals, the same stimulus could produce different emotions and different behaviors. For example, a robot with a less sensitive personality might respond neutrally rather than experiencing anger in response to negative feedback. Likewise, even under the same anger state, a robot in the Gamer role could select a Rejection tendency and refuse service.
Table 3 presents a contrasting example in which the same anger state is converted into different action tendencies depending on role. This shows that, because behavioral directionality is abstracted, changing role requires only replacement of the LUT mapping set rather than modification of the affect-processing mechanism itself.
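The role-dependent conversion described above can be sketched as a lookup keyed by role and emotion. Only the two anger entries (Caregiver to Inhibition, Gamer to Rejection) come from the text. The identifiers, the dictionary layout, and the fallback tendency are assumptions made for illustration and do not reproduce the prototype's actual LUT.

```python
from enum import Enum


class Tendency(Enum):
    APPROACH = "Approach"
    INHIBITION = "Inhibition"
    REJECTION = "Rejection"


# One mapping set per role. Only the anger entries are taken from the paper.
ROLE_LUTS = {
    "Caregiver": {"anger": Tendency.INHIBITION},
    "Gamer": {"anger": Tendency.REJECTION},
}

# Assumed fallback for emotions a role does not list explicitly.
DEFAULT_TENDENCY = Tendency.APPROACH


def select_tendency(role: str, emotion: str) -> Tendency:
    """Map (role, emotion) to an abstract action tendency via the role's LUT."""
    return ROLE_LUTS[role].get(emotion, DEFAULT_TENDENCY)


caregiver_choice = select_tendency("Caregiver", "anger")
gamer_choice = select_tendency("Gamer", "anger")
```

Changing the role only changes which mapping set is consulted; the code that produces the anger state is not touched, which is the property the paragraph above attributes to abstracting behavioral directionality.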
This scenario shows that the proposed architecture, including the Action Tendency Layer, can be implemented in a real physical environment and that explicit mediation between emotion and behavior can be operationalized within a robot system.
5. Exploratory User Evaluation
This section presents the results of the user evaluation of the proposed architecture. The central objective of the experiment was to examine how the Action Tendency Agent differs from the Emotion-Driven Agent and the Cooperative Agent across four user-perceived indicators: satisfaction, trust, interaction quality, and appropriateness. More specifically, the evaluation examined how these three agent conditions shaped user impressions across the broader user experience.
5.1. Experimental Setup
The research question of the experiment was as follows: does the Action Tendency Agent lead to different user evaluations than the Emotion-Driven Agent or the Cooperative Agent?
A total of 30 responses were initially collected online. Prior to analysis, four cases were excluded because of invalid response patterns or missing data, resulting in a final analytic sample of 26 participants (aged 27–42 years; mean age 31.5 years). Most participants had little prior knowledge of affective agents or social robots. We used a within-subject repeated-measures design in which each participant viewed videos from three conditions presented in random order and then completed questionnaires for each agent. To compare the three behavioral conditions under matched scenarios and stimulus conditions, we adopted a controlled video-based human–robot interaction (HRI) design. Although this approach does not fully replace live interaction, it has been used in HRI research as an exploratory method for comparing user impressions and responses across conditions [40,41]. This was an intentional design choice made to control exogenous variables such as mechanical noise, communication delay, and recognition errors, and to isolate differences attributable to the agents’ decision-making structures. The user study reported in this paper was classified as minimal-risk research and received exemption from review by the KIST Institutional Review Board (IRB No. KIST-202411-HR-007, 22 April 2025). Written informed consent was obtained from all participants.
The experiment was conducted on the basis of two contrasting domain scenarios: elderly home care and entertainment gaming. In each scenario, the agent was designed to respond to the user’s request according to one of the following three logics.
Condition A, the Cooperative Agent (Passive Compliance Model), was designed to comply with user requests consistently regardless of internal affective state or role. Because it contains no affective processing module, behavior selection is not influenced by emotion. This condition represents a typical rule-based service agent implementation without emotion-based behavioral modulation. In the home-care scenario, it performs the service consistently without visible facial change regardless of internal state; in the entertainment scenario, it accepts a rematch irrespective of provocative stimuli.
Condition B, the Emotion-Driven Agent (Direct Mapping Model), directly connects emotion to behavior without an intermediate abstraction layer. Once an internal emotion is generated, it is linked immediately to action through predefined domain rules. This condition represents a common affective-computing implementation in which emotion functions as a behavioral trigger but no separate role-sensitive mediation mechanism is provided. In the home-care scenario, the agent refuses service when anger is triggered by the predefined rule. In the entertainment scenario, the same anger rule leads it to refuse a rematch or terminate the game.
Condition C, the Action Tendency Agent, implements the proposed architecture and performs action tendency transformation based on Frijda’s theory of action readiness [10]. Once emotion is generated, it is transformed into an abstract action tendency through the role-specific Criteria Library (LUT) and then provided to BDI reasoning. As a result, the same emotion can be concretized differently according to role and context. In the home-care scenario, anger leads to an Inhibition tendency, allowing the agent to continue service with a serious and concerned expression. In the entertainment scenario, anger leads to a Rejection tendency, causing the agent to decline a rematch.
5.2. Quantitative Results
User evaluation employed four core indicators reconstructed for the purposes of this study on the basis of the Trustworthy and Acceptable HRI (TA-HRI) checklist. Each indicator consisted of 4–5 questionnaire items (5 for satisfaction, 5 for trust, 4 for interaction, and 5 for appropriateness; 19 items in total), all rated on a five-point Likert scale (1 = strongly disagree, 5 = strongly agree). Internal consistency was evaluated with Cronbach’s α.
Satisfaction assessed whether the robot’s interaction and service met user expectations. Representative items included “Overall, were you satisfied with interacting with this robot?” and “Did the service provided by this robot meet your expectations?”
Trust assessed whether the robot behaved in a trustworthy manner across situations. Representative items included “Did you feel that this robot responded in a trustworthy way in a given situation?” and “Was the robot’s behavior predictable and consistent?” Because the α value of the trust scale falls below the commonly used threshold of 0.70, trust-related results should be interpreted cautiously. This is consistent with prior work suggesting that trust in robots is a multidimensional construct influenced by performance, agent attributes, and environmental factors [42], and that short scenario-based exposure may not activate all of its subdimensions evenly [43].
Interaction quality assessed the robot’s ability to provide active and natural interaction. Representative items included “Did you feel that this robot interacted with the user actively?” and “Do you think this robot provided natural interaction based on emotion?”
Appropriateness assessed whether the robot’s behavior was appropriate in light of its goals, affective state, and situational context. Representative items included “Did you feel that the robot’s behavior was appropriate to the situation?” and “Do you think the robot made appropriate decisions on the basis of its goals and emotional state?”
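For readers unfamiliar with the reliability coefficient used for these scales, the following sketch computes Cronbach’s α from a participants-by-items score matrix using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The ratings in the example are invented for illustration and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item-score lists."""
    k = len(responses[0])                      # number of items in the scale
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of each item across participants.
    item_vars = [variance(column) for column in zip(*responses)]
    # Sample variance of each participant's summed scale score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical five-point ratings: four participants, three items.
example_ratings = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
]
example_alpha = cronbach_alpha(example_ratings)
```

Values below 0.70 are conventionally read as limited internal consistency, which is the criterion applied to the trust scale above.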
Given the within-subject repeated-measures design, a repeated-measures ANOVA was conducted. For indicators that showed a significant main effect, post-hoc comparisons were performed using paired t-tests with Bonferroni correction. Table 4, Table 5, and Figure 11 summarize the descriptive statistics, post-hoc comparisons, and visual comparison results for each condition.
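As a hedged illustration of the analysis named above, the sketch below computes the F statistic of a one-way repeated-measures ANOVA from a participants-by-conditions matrix, together with partial eta squared as one common effect-size measure. The choice of partial eta squared is an assumption, the data are invented, and sphericity corrections are omitted. Obtaining a p-value would additionally require the F distribution, for example the survival function of scipy.stats.f.

```python
def rm_anova(scores):
    """One-way repeated-measures ANOVA.

    scores: one row per participant, one column per condition.
    Returns (F, df_condition, df_error, partial_eta_squared).
    """
    n = len(scores)                 # participants
    k = len(scores[0])              # conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(col) / n for col in zip(*scores)]
    subj_means = [sum(row) / k for row in scores]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    # Residual variation after removing condition and participant effects.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    partial_eta_sq = ss_cond / (ss_cond + ss_error)
    return f_value, df_cond, df_error, partial_eta_sq


# Invented scores for three participants under three conditions.
toy_scores = [
    [1, 2, 3],
    [2, 3, 5],
    [3, 5, 6],
]
toy_result = rm_anova(toy_scores)
```

Removing the between-participant sum of squares from the error term is what distinguishes this design from a between-subjects ANOVA on the same numbers.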
The repeated-measures ANOVA results (Table 4, Figure 11) revealed statistically significant differences among the three conditions for satisfaction, trust, and appropriateness, with all effect sizes falling in the large range. By contrast, no significant difference was observed in interaction quality.
The Bonferroni-corrected post-hoc comparisons (Table 5) showed that the Action Tendency Agent scored significantly higher than the Emotion-Driven Agent in satisfaction, trust, and appropriateness. Between the Action Tendency Agent and the Cooperative Agent, a significant difference was observed only in satisfaction; trust and appropriateness showed higher mean values for the Action Tendency Agent but did not reach statistical significance. Between the Cooperative Agent and the Emotion-Driven Agent, a significant difference was observed in appropriateness, suggesting that even passive compliance that ignores affect can be perceived as more appropriate than unregulated direct expression of emotion.
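The Bonferroni correction applied to these pairwise comparisons can be sketched as follows: each raw p-value is multiplied by the number of comparisons and capped at 1. The p-values in the example are invented and are not the values reported in Table 5; with three conditions there are three pairwise comparisons per indicator.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values for one family of comparisons."""
    m = len(p_values)                       # number of comparisons in the family
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for the three pairwise contrasts of one indicator.
raw_p = [0.01, 0.02, 0.5]
adjusted_p = bonferroni(raw_p)
significant = [p < 0.05 for p in adjusted_p]
```

Because every p-value is scaled by the family size, a contrast that is significant before correction can become non-significant afterward, as the second hypothetical contrast does here.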
A brief supplementary nonparametric check (Friedman tests and Bonferroni-corrected Wilcoxon signed-rank tests) broadly supported the main patterns for satisfaction and appropriateness, whereas trust remained comparatively weaker.
5.3. Discussion of Results
The results suggest that an explicit mediation structure including the Action Tendency Layer may be perceived by users as more coherent and more appropriate than the Emotion-Driven Agent. The clearest pattern is that the Action Tendency Agent received higher ratings than the Emotion-Driven Agent in satisfaction and appropriateness. A difference was also observed in trust, but this result should be interpreted cautiously and treated only as supplementary because of the limited internal consistency of the trust scale. In particular, the large effect size for appropriateness suggests that passing through an intermediate tendency layer may be perceived as more reasonable than connecting emotion directly to concrete behavior.
The comparison with the Cooperative Agent showed a similar pattern. The Action Tendency Agent scored higher in satisfaction, whereas trust and appropriateness did not differ significantly. This suggests that role- and context-grounded regulation need not reduce user experience and may be valuable not because it always complies, but because it produces responses perceived as better suited to the situation.
The comparatively low performance of the Emotion-Driven Agent suggests that this direct-mapping condition may be perceived as less coherent and less appropriate. The higher appropriateness rating of the Cooperative Agent over the Emotion-Driven Agent points in the same direction. By contrast, the Action Tendency Agent transformed the same anger state into restrained performance in home care and firm refusal in gaming, illustrating how an intermediate tendency layer can differentiate behavior across role and context.
It is also important that no significant difference was observed among the three models in interaction quality. This suggests that the three models produced broadly similar overall interaction experiences and that the differences observed in satisfaction and appropriateness cannot be reduced simply to differences in liveliness or naturalness. Trust-related interpretation should likewise remain cautious because the internal consistency of the trust scale was relatively low.
Taken together, the experiment suggests that the proposed structure may be perceived more favorably than the Emotion-Driven Agent, particularly in satisfaction and appropriateness. Relative to the Cooperative Agent, it showed an advantage in satisfaction without clear disadvantages in the other measured dimensions. These findings should nevertheless be interpreted as exploratory given the small sample size (N = 26), the video-based design, the limited trust reliability, and the absence of a dedicated order/carryover model.
6. General Discussion
This section interprets the results in relation to the structural characteristics of the architecture and discusses scalability and limitations.
6.1. Architectural Implications of Explicit Emotion-to-Behavior Mediation
The main architectural implication of these results is that emotion-to-behavior mediation can be treated as an explicit and reusable design layer rather than being buried inside local rules. In the proposed framework, this layer allows the same affective state to be translated into different behaviors according to role and context while keeping the mediation logic structurally visible.
The comparison with the Cooperative Agent further suggests that such regulation need not reduce user satisfaction; the value of the proposed structure may lie less in always complying than in producing responses perceived as more fitting to the situation.
Finally, the absence of a significant difference in interaction quality and the relatively low internal consistency of the trust scale both call for caution in interpretation. Even so, the present findings suggest that the proposed structure may support a more appropriate user-experience pattern than the Emotion-Driven Agent and may offer a satisfaction advantage relative to the Cooperative Agent. Future work should examine these tendencies more rigorously in live HRI settings, with larger samples and richer reasoning modules.
6.2. Scalability: From Deterministic Baseline to Data-Driven Reasoning
The currently implemented Action Tendency Layer uses a deterministic lookup table (LUT). The core contribution of the present work lies not in this particular implementation, but in the architectural separation of a standardized intermediate interface connecting affective state to BDI-based behavior selection. The current LUT is simply a first-step instantiation of that structure, designed as a modular slot that can later be replaced through standardized input and output interfaces.
This interface structure also opens a concrete path for future integration with small language models (SLMs) or other data-driven reasoning modules. In future implementations, such modules could be used within the Action Tendency Layer itself to infer a context-sensitive tendency from affective state, role information, and current beliefs, rather than relying on a deterministic LUT. In this way, the proposed architecture separates the structural role of action tendency from any single implementation method and allows the current LUT baseline to be replaced by richer inference modules while preserving the same intermediate interface. From this perspective, the proposed framework should be understood not only as a deterministic baseline, but also as a structural scaffold for richer neuro-symbolic affective agents.
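The replaceable-slot idea described above can be sketched with a small interface. Everything here is an assumption made for illustration: the request fields, the class names, and the string-valued tendencies are hypothetical, and the adapter merely stands in for a future data-driven module without implementing one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol, Tuple


@dataclass(frozen=True)
class TendencyRequest:
    """Standardized input: affective state, role, and current beliefs."""
    emotion: str
    role: str
    beliefs: Tuple[str, ...] = ()


class ActionTendencyModule(Protocol):
    """Anything that can fill the Action Tendency Layer slot."""
    def infer(self, request: TendencyRequest) -> str: ...


class LutModule:
    """Deterministic baseline backed by a (role, emotion) lookup table."""
    def __init__(self, table: Dict[Tuple[str, str], str], default: str) -> None:
        self._table = table
        self._default = default

    def infer(self, request: TendencyRequest) -> str:
        return self._table.get((request.role, request.emotion), self._default)


class CallableModule:
    """Adapter that lets any function, such as a future learned model, fill the slot."""
    def __init__(self, fn: Callable[[TendencyRequest], str]) -> None:
        self._fn = fn

    def infer(self, request: TendencyRequest) -> str:
        return self._fn(request)


def tendency_for_bdi(module: ActionTendencyModule, request: TendencyRequest) -> str:
    """The caller depends only on the interface, not on the implementation."""
    return module.infer(request)


baseline = LutModule(
    {("Caregiver", "anger"): "Inhibition", ("Gamer", "anger"): "Rejection"},
    default="Approach",  # assumed fallback
)
request = TendencyRequest(emotion="anger", role="Caregiver")
baseline_tendency = tendency_for_bdi(baseline, request)
```

Under these assumptions, swapping the LUT for another module means passing a different object to the same call site, while the request and return types stay fixed.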
6.3. Limitations
The present study has several limitations.
First, the sample size (N = 26) is small, and the empirical findings should therefore be interpreted as exploratory. The marginal significance observed in some post-hoc comparisons may reflect limited statistical power, and the current data do not justify claims of generalized superiority.
Second, the internal consistency of the trust scale falls below the commonly accepted threshold of 0.70. Trust-related findings should therefore be interpreted with particular caution, which is also why this paper does not use trust as a strong primary claim.
Third, participants did not engage in live face-to-face interaction with the robot; instead, they watched videos and then completed questionnaires. This video-based design is advantageous for comparing the decision-making structures of the three conditions under matched presentation and behavior sequences [44], but it cannot fully capture the bidirectionality and real-time responsiveness of actual interaction. The ecological validity of the study is therefore limited.
Fourth, although the three conditions were presented in random order and an exploratory check reflecting presentation order did not indicate a clear order-related difference, order effects and carryover effects were not tested using a separate statistical model. Their influence therefore cannot be fully ruled out.
Fifth, the comparison conditions used in the experiment were not exact reproductions of specific prior systems. Rather, they were designed to capture the core characteristics of the direct-mapping condition and the passive-compliance condition. The present findings should therefore be understood as an exploratory comparison with simplified comparison models, not as a direct superiority claim over systems such as FAtiMA, WASABI, or ABC-EBDI.
Sixth, the current framework was validated on only one robot platform (Buddy) and in only two domains (home care and gaming), and the results from the two domains were reported in aggregated form. As a result, generalizability across platforms and domains was not directly established, and future work should include domain-separated analyses as well as domain × condition interaction analyses.
Seventh, the current Action Tendency Layer is implemented through a deterministic LUT, and the PAD sign-based mapping, discrete emotion categories, and linear decay are also prototype-specific implementation choices. The current version places researcher-designed mappings inside an explicit intermediate interface, and the contribution lies in showing at what structural level such choices can be represented rather than in claiming that these mappings are themselves empirically settled.
7. Conclusions
This study treated the translation from emotion to behavior as an architectural design problem and proposed an Emotional BDI framework in which Frijda’s action tendency functions as an explicit mediation layer within BDI. By placing this directional layer between affect generation and behavioral realization, the framework allows the same emotion to be concretized differently according to role and context.
In an exploratory user evaluation with 26 participants, the Action Tendency Agent received more favorable ratings than the Emotion-Driven Agent in satisfaction and appropriateness and showed a satisfaction advantage over the Cooperative Agent. These findings suggest that responses grounded in role and context may improve user experience even when the agent does not always comply with user requests.
The current LUT implementation is only a first-step realization of the proposed architecture, and the main contribution of this work lies in formalizing an explicit interface for emotion-to-behavior mediation rather than in the specific implementation itself. Future work should extend and test this structure more rigorously through larger samples, live interaction settings, domain-separated analysis, and data-driven reasoning modules guided by theoretical and symbolic structure.