Design and Implementation of a Companion Robot with LLM-Based Hierarchical Emotion Motion Generation

Lim, Yoongu; Cho, Jaeuk; Lee, Duk-Yeon; Choi, Dongwoon; Lee, Dong-Wook

doi:10.3390/app152312759

Open AccessArticle

Design and Implementation of a Companion Robot with LLM-Based Hierarchical Emotion Motion Generation

by

Yoongu Lim

^1,2,

Jaeuk Cho

²,

Duk-Yeon Lee

²

,

Dongwoon Choi

²

and

Dong-Wook Lee

^1,2,*

¹

Robotics Engineering Department, Korea University of Science and Technology (UST), Ansan 15588, Republic of Korea

²

Human-Centric Robotics R&D Department, Korea Institute of Industrial Technology (KITECH), Ansan 15588, Republic of Korea

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(23), 12759; https://doi.org/10.3390/app152312759

Submission received: 23 October 2025 / Revised: 25 November 2025 / Accepted: 28 November 2025 / Published: 2 December 2025

Download

Browse Figures

Versions Notes

Abstract

Recently, human–robot interaction (HRI) with social robots has attracted significant attention. Among them, companion robots, which exhibit pet-like behaviors and interact with people primarily through non-verbal means, particularly require the generation of appropriate gestures. This paper presents the design and implementation of a companion cat robot, named PEPE, with a large language model (LLM)-based hierarchical emotional motion generation method. To design the cat-like companion robot, an analysis of feline emotional behaviors was conducted to identify the body parts and motions essential for effective emotional expression. Based on this analysis, the required degrees of freedom (DoFs) and structural configuration for PEPE were derived. To generate expressive gestures efficiently and reliably, a hierarchical LLM-based emotional motion generation method was proposed. The process defines the robot’s structural features, establishes a gesture generation code format, and incorporates emotion-based guidelines grounded in feline behavioral analysis to mitigate LLM hallucination and ensure physical feasibility. The proposed method was implemented on the physical robot, and eight emotional gestures were generated—Happy, Angry, Sad, Fearful, Joyful, Excited, Positive Feedback, and Negative Feedback. A user study with 15 participants was conducted to validate the system. The high-arousal gestures—Angry, Joyful, and Excited—were rated significantly above the neutral clarity threshold (p < 0.05), demonstrating clear user recognition. Meanwhile, low-arousal gestures exhibited neutral-level perceptions consistent with their subtle motion profiles. These results confirm that the proposed LLM-based framework effectively generates expressive, physically executable gestures for a companion robot.

Keywords:

companion robots; human–robot interaction (HRI); large language model (LLM); emotional gestures; prompting techniques

1. Introduction

A social robot is designed to support people in achieving social or emotional goals [1]. Recently, studies on human–robot interaction involving social robots have attracted significant attention, and researchers have explored various approaches—such as vision and voice—to realize effective HRI for human interaction [2]. Different embodiments of social robots lead to different user experiences, as the shape and appearance of a robot not only affect how it is perceived but also influence its overall performance in human–robot interaction. The physical form plays a decisive role in shaping user perception and the overall interaction experience, as shown in a recent study comparing humanoid and telepresence robots [3]. In addition, object properties such as size, shape, form, texture, and color are critical factors that shape the responses of the user [4]. Human-like social robots are particularly susceptible to the uncanny valley effect, in which increasing human resemblance can provoke discomfort or unease in users [5]. In response to this issue and with the aim of improving HRI, research has increasingly focused on pet-like or zoomorphic companion robots in various application areas [6,7].

In this paper, we introduce PEPE, a cat-like companion robot with an animal-inspired appearance and emotional gestures informed by feline behavior analysis. The aim is to alleviate the uncanny valley effect by fostering comfort and a sense of friendliness in human–robot interaction. The design of PEPE was informed by a previous study on domestic cat emotional behavior, which employed the Facial Action Coding System (FACS) to identify relevant Action Units (AUs) and Action Descriptors (ADs) [8]. Analysis revealed that, across friendly and non-friendly contexts, the most active AUs and ADs were linked to the ears and eyes. These findings guided the design of PEPE’s facial features. For PEPE’s body design, we incorporated insights from online resources on feline body language, focusing on how cats express emotions through posture and movement in various situations. Based on these resources, we concluded that the body parts most relevant to emotional expression are the tail, ears, and legs [9,10,11,12,13]. These insights guided the design of PEPE’s body, and neck movement was additionally incorporated to support interaction.

For effective HRI, the behaviors of social robots must be carefully designed; otherwise, they may be misunderstood or misinterpreted, causing users to perceive weirdness, unfriendliness, or even danger [14,15]. Previous studies have introduced a rule-based approach, including sentiment-driven models [14], to address the problem; however, this method remains limited in flexibility and expressiveness.

Recent advancements in Large Language Models (LLMs) have shown potential for robot motion and task planning, particularly in generating expressive or context-aware actions [16,17]. Manually designing an action sequence and executing the motion onto the robot is a complex task that requires sustained effort, an area where LLMs could play a significant role. Prior work has shown that LLMs can be effective in generating gestures for human-like social robots [18]. The application of LLMs, and evidence of their effectiveness, has the potential to simplify emotional gesture generation and provide greater flexibility and expressiveness. While the usage of LLMs increases, prompting techniques and methods have also advanced. Because LLMs are trained on large datasets, they may often posses the general capabilities required for the task given [19]. However, to complete the task accordingly, the user must specify the necessary information and provide proper instructions for the desired answer. Previous studies have sought to address this problem by applying various prompting techniques [20], including a more specific method such as Chain-of-Thought (CoT) [21]. Instead of emphasizing the prompting method itself, this study focuses on how an LLM can be guided to produce emotionally coherent and physically feasible motion sequences for a companion robot.

This study presents an LLM-based hierarchical motion generation framework for companion robots. The approach enables the model to synthesize emotional gestures by integrating robot structural data, motion code formats, and emotion-specific guidelines in a stepwise manner. Through this structured process, emotional motions can be produced efficiently and aligned with the robot’s physical design. The process employs a progressive instructional sequence to provide the model with robot-specific context, motion data structures, and emotion principles. The architecture of the method consists of (1) providing the companion robot’s structural features, (2) defining the gesture-generating code format, and (3) specifying emotion-based guidelines. The first prompt outlines the mechanical design and structural details of the robot, including actuator locations, body part placement, and actuator angle thresholds. The second prompt defines the required output format, which is subsequently executed on the robot for motion evaluation. The third prompt specifies the emotions that each segment of the generated code is intended to represent. By utilizing the proposed LLM-based method, it becomes possible to generate emotional gestures for companion robots. The generated motions were implemented on the developed companion robot, PEPE, and a Likert-scale questionnaire survey is conducted to validate the implemented motions.

The key contributions of this paper are as follows:

Development of PEPE, a cat-like companion robot with multiple degrees of freedom (DoFs) inspired by feline emotional behaviors.
Proposal of a progressive instructional prompting technique for LLMs to generate emotional gestures.
Implementation on PEPE and evaluation through a Likert-scale questionnaire survey.

The remainder of this paper is organized as follows: Section 2 presents an analysis of feline emotional expressions to identify the degrees of freedom necessary for efficient HRI in the development of companion robots. Section 3 describes the structure of the cat-like companion robot, PEPE, designed based on this analysis. Section 4 introduces the LLM-based motion generation framework for producing emotional gestures in PEPE. Section 5 reports the evaluation of the proposed method through a user study. Finally, Section 6 concludes the paper.

2. Analysis of Feline Emotional Expressions

Designing a pet-like robot to express emotional gestures through non-verbal communication and enhance HRI is a challenging task. Prior studies have highlighted how embodiment and gesture recognition play a central role in human–robot communication [22] and how multimodal expression through motion, color, and sound can add complexity to design [23]. Building on these insights, we propose a design of a cat-like companion robot, PEPE, which leverages multiple degrees of freedom derived from domestic cat behaviors [8] to improve the richness of emotional gestures and enhance the overall user experience.

Our companion robot is designed to visually resemble and behaviorally mimic a domestic cat. To guide the development of its emotional motion, we drew on prior research by other researchers analyzing feline facial signals using the Cat Facial Action Coding System (catFACS) [8,24]. These studies demonstrated how specific facial muscle movements are active during different affiliative and non-affiliative social interactions, identifying which facial regions, such as the ears, eyes, and whiskers, are most expressive in emotional contexts. Table 1 presents the top five most active AUs, with the ear identified as the most frequently involved body part. In addition to facial expression studies, we also referred to online resources [9,10,11,12,13] on feline body language to capture how cats display emotions through posture and movement. These sources provide practical insights into how domestic cats use their heads, feet, bodies, tails, and overall body stance to communicate emotions. By combining this behavioral knowledge, we were able to identify key non-verbal cues, as shown in Table 2.

For instance, relaxed ear and tail movements often indicate positive emotions such as happiness, while rigid or rapid motions may signal heightened arousal states such as anger, fear, or excitement. Conversely, lowered ears and a crouched body posture typically reflect negative affect, such as sadness or fear. Table 2 summarizes these distinctive behavioral markers across a range of emotional situations.

To determine which emotional categories to incorporate, we referred to earlier works on basic emotions in Drosophila, which identified three fundamental affective states: happy, angry, and sad [25]. Building on this foundation, we extended the set to include fearful, joyful, and excited, as these states are commonly associated with expressive animal behaviors and are relevant for enriching emotional diversity in companion robots. In addition, we incorporated positive and negative feedback as interaction-oriented categories, enabling PEPE to respond directly to user actions. This expanded set of eight emotions provides a balanced framework that combines biologically grounded basic emotions with socially meaningful interaction states, guiding both the behavioral design and motion generation processes.

The identified features were subsequently taken into account in the design of each body part.

3. Design and Implementation of PEPE

3.1. Mechanical and Structural Design of PEPE

The cat-like form of PEPE was chosen to enhance approachability and foster emotional engagement in HRI. Prior research [6,7,26] has shown that zoomorphic or pet-like robots can improve user comfort and acceptance by evoking familiarity and empathy. In addition, we adopted a neutral sitting-cat posture to convey stability and provide users with a reassuring impression of the robot. Maintaining the original esthetic design constrained the available space for integrating mechanical features that enable expressive motions. With respect to prior research [8,9,10,11,12,13], we provided DoFs to ears, neck, front legs, and tail for expressing emotional gestures. Figure 1 shows the external and internal design of PEPE.

The external covers of the head, ears, body, legs, and tail were produced through 3D printing with filament. The head cover was later fitted with artificial fur to enable future interaction scenarios, such as users touching the robot’s head or providing tactile feedback. The internal structural framework that supports the robot is constructed of aluminum alloy. The skeletal structure and degrees of freedom are shown in Figure 2. Further details are provided in the following paragraphs.

The ear module consists of two XC330-T288-T(ROBOTIS, Seoul, South Korea) actuators (ROBOTIS), with each actuator controlling one ear independently, as shown in Figure 2b. The ears move along the pitch axis and can operate separately. Based on Table 2, when the ears reach the forward threshold, they tilt downward, conveying a sad or negative impression to the user. When positioned at the center, the ears are oriented forward, representing a neutral state. At the backward threshold, the ears are drawn back to express anger or stress.

The head module integrates all major electronic components, including a motor controller, a single-board computer (Raspberry Pi 5, Raspberry Pi Ltd., Cambridge, UK), a WIFI router, two LCD displays (T-Display-S3 AMOLED, AMOLED, LilyGO, Shenzhen, China), a camera (Arducam Mini 16 MP IMX519, Arducam, Chengdu, China), two microphones (I2S MEMS INMP441, InvenSense, San Jose, CA, USA), and a mini speaker (MAX 98347A, Analog Devices, Inc., San Jose, CA, USA). The motor controller, equipped with a relay module, provides low-level control of the actuators and is used to execute the emotional motion codes. The single-board computer performs high-level control, managing communication with the camera. The LCD displays function as the robot’s eyes, enabling the visualization of emotional states during gestures. The camera, microphones, and speaker were included to support multimodal interaction between the robot and the user; however, these were not utilized in this study in order to focus on the evaluation of motion-based emotional gestures. Details of the control system are presented in the next section.

The neck module consists of two XM430-W350-T (ROBOTIS, Seoul, Republic of Korea) actuators (ROBOTIS), which control the pitch and yaw axes, as presented in Figure 2c. More powerful actuators were selected for this module because they support the head, which houses multiple electronic components, the ear module, and the external head cover. The inclusion of both pitch and yaw axes provides multiple degrees of freedom, enabling the robot to orient its head toward the user or other objects during interaction. In addition, based on Table 2, this configuration allows the head to perform motions—such as shaking side to side or facing downward—to convey emotional states.

The body module consists of two XM430-W350-T actuators (ROBOTIS), each controlling the pitch axis of a front leg, as illustrated in Figure 2d. These actuators were selected to accommodate the sitting design of the robot, in which the majority of the weight is supported by the front. The purpose of the front legs is not mobility, but rather the expression of specific emotions; therefore, their motion is designed to reflect the feline behaviors identified in our analysis. The internal structure houses six lithium-polymer batteries, enabling the robot to function independently when lifted or moved to different locations. For reasons of weight and space, mobility was not implemented in the back legs. Instead, a passive mechanism was introduced, allowing them to move independently of the body and giving the appearance of a four-legged creature rather than a rigid or immobile structure.

The tail module consists of two XC330-T288-T actuators (ROBOTIS), each controlling the pitch and roll axes, as detailed in Figure 2e. At least two degrees of freedom are required for the tail to reproduce the emotional movements identified in Table 2. The tail was designed to be lightweight, ensuring safe interaction while preserving natural motion.

3.2. Electronic and Control System Design of PEPE

As illustrated in Figure 3, the workflow of PEPE is structured around modular subsystems interconnected through the internet and coordinated via an MQTT broker. Emotional gesture codes are generated offline using a LLM and uploaded to the robot. The ESP32 modules handle the primary control functions: one ESP32 manages the actuators for motion execution, another drives the left and right LCD eye displays to visualize emotional states, and a third supports the microphone and speaker for potential audio interaction. The specific eye displays used for each emotion are shown in Figure 4, adding an additional layer of visual expressiveness to the gestures. The actuators, including those in the ears, neck, legs, and tail, execute the predefined gestures under ESP32 control. A Raspberry Pi 5 is dedicated exclusively to camera processing, separating vision tasks from low-level motor control. This distributed architecture ensures clear functional separation, lightweight communication through MQTT, and flexible expansion for future multimodal interaction.

Overall, the integration of structural modules, actuators, and electronic components enabled PEPE to resemble a cat-like companion robot capable of expressing emotions through non-verbal gestures. The following section presents the LLM-based framework using a progressive instructional prompting technique developed to generate and evaluate these emotional motions.

4. Progressive Instructional Prompting Technique

LLMs are capable of generating answers ranging from simple factual responses to detailed descriptions or specific tasks. However, the quality of the generated output strongly depends on the user’s ability to formulate clear and optimal prompts. The emerging paradigm of “pre-train, prompt, and predict” highlights this dependency, as effective prompt design is essential to leverage pre-trained models across diverse domains [27,28]. In addition, recent advances show that prompt-based frameworks are not limited to language but extend to graph-based reasoning and structured domains [29]. Parallel to these developments, prompting techniques have also been applied in motion-related tasks, where multimodal prompts (e.g., text, image, motion) enable conversational and interactive motion generation, bridging natural language with embodied control [30,31,32].

Prompting techniques have been widely studied as a means to improve the reliability and task alignment of LLMs. Prior survey studies categorize a broad range of strategies, including zero-shot and few-shot prompting, chain-of-thought reasoning, and multimodal prompting [33,34,35]. These findings suggest that tailored prompting techniques are necessary when adapting LLMs to domains beyond their pretrained data. The companion robot used in this study was developed recently and would not have been included in the datasets. Therefore, we suggest a progressive instructional prompting method that informs the companion robot’s information and generates the code, aiming to minimize errors, reduce hallucinations, and increase robustness for motion generation in companion robots.

In prior research, persona prompting has been adopted to steer LLM behavior by assigning a consistent role. However, systematic evaluations show mixed outcomes: in many objective tasks, persona-based system prompts yield negligible or negative gains in task performance [36], and model behavior can vary depending on how the persona is phrased or contextualized [37]. These findings motivate careful persona design and intermediate checks in sequential prompt pipelines.

In this study, a progressive instructional prompting method was employed to generate emotional gestures. ChatGPT served as the LLM responsible for generating emotional motion sequences for PEPE. The prompting process followed a hierarchical and sequential structure to ensure coherent and physically feasible outputs. As illustrated in Figure 5, the overall prompting flow consists of three main instructional prompts: robot external description, output code structure, and emotion guidelines. This structured prompting framework was designed to progressively refine the model’s understanding of the task—from general context to specific motion generation—while minimizing inconsistencies and undesired behavior in the output.

The designed persona and basic rules are as follows:

You are a helpful assistant for creating robotic motion codes.
I will provide instructions on how to develop emotional codes for a cat-like companion robot.
Three sets of instructions will be given sequentially: robot information, code information, and emotion information.
Do not generate the code until explicitly instructed to do so via a prompt.
If you have understood the assignment, respond with “Yes”.

The first instruction includes PEPE’s structural and external description. Despite being trained on a vast dataset, ChatGPT lacks essential information specific to our robot. To clearly convey this information, we primarily provided a structured description of the robot’s actuators, degrees of freedom, and angle thresholds for each body part (Appendix A), while these thresholds and specifications form the core of the prompt, we additionally supplied an image of the robot (Figure 1a) as a supporting feature to enhance contextual understanding. The actuators are located in the ears, neck, front feet, and tail, and these details—rather than the photo—serve as the essential basis for describing PEPE’s structure.

The second instruction outlines the fundamental structure of the code that needs to be generated. The robot’s control mechanism operates by receiving velocity, acceleration data, and actuator angular position information at each time step, which together define the temporal dynamics of motion execution. ChatGPT’s task is to infer and assign these parameters in accordance with the designated rules, ensuring that the generated code adheres to safe and feasible actuation. To guide this process, we provided a template of the expected code format in Appendix A, which specifies the required fields and their arrangement. This template acted as a structural reference, allowing ChatGPT to align its outputs with the defined conventions while also reducing the likelihood of missing parameters or generating inconsistent sequences. An illustrative example was included within the instructions to demonstrate how the conditions should be applied in practice, serving as a few-shot reference to stabilize generation quality.

The third and final instruction contains the principle governing the emotions that the robot should express. To generate distinct emotional gestures, we deliberately provided only the essential information for each emotion, rather than prescribing detailed movement trajectories. This fundamental information (outlined in Appendix A) consists of three components: (1) the target emotion to be conveyed, (2) the motion intensity, defined as whether the movement should be vibrant or scarce, and (3) the intended pace of the motion. By constraining the description to these elements, we encouraged ChatGPT to generalize motion features while maintaining consistency with the robot’s physical limits. Furthermore, we emphasized that outsourcing emotional features was acceptable in the generation process, thereby allowing the model to integrate both structured instructions and externally learned knowledge.

After instructing each guideline, ChatGPT answered with “Yes” and provided a summary for each given instruction. We divided this process into three instructions because, when presented as a single prompt, ChatGPT often omitted critical details-such as matching the lower bound of timestamps-produced incomplete or inconsistent code structures, and occasionally generated logically mismatched outputs (e.g., specifying 10 timesteps but providing only 9). By structuring the instructions step-by-step and requiring intermediate summaries, we aimed to make the system more robust in generating emotional gestures. During experimentation, however, we observed that including explicit parameter values in examples sometimes biased the output. For instance, if a sample prompt used a 500 ms timestamp interval, the generated code would rigidly adopt this interval rather than exploring alternative values. This behavior reflects the well-documented tendency of LLMs to overfit to demonstrated patterns. To mitigate this issue, we explicitly instructed the model not to replicate previous parameter settings and to generate values independently instead. This refinement increased the flexibility of the generated code while maintaining adherence to the structural rules of the prompt.

Before executing the final prompt to generate emotional gestures, a verification task was conducted to evaluate the effectiveness of the progressive instructional prompting technique. The objective of this task was to determine whether the model could accurately follow given instructions and generate a motion sequence as intended. The verification task required moving each body part individually to its maximum and minimum thresholds for a second each. Once the code was generated, we tested it on the robot and confirmed that it accurately managed to execute the task.

After completing the verification sequence, the final execution prompt was issued, successfully generating eight distinct emotional gestures. Three representative examples—Happy, Angry, and Joyful—are shown in Figure 6, illustrating the diversity of expressions produced through the sequential prompting technique. The corresponding motion codes are provided in Appendix B. Each gesture was subsequently tested on the physical robot to validate its performance and confirm that the motions operated as intended. Details of the validation procedure are presented in the following section.

5. Performance Validation

5.1. Validation Method

To evaluate the appropriateness of each emotional gesture, a questionnaire survey using a five-point Likert scale was conducted. Previous studies have also assessed the perception and acceptance of robotic gestures through similar survey-based methods [18,26]. Since emotional expression is inherently subjective, this approach is considered appropriate for performance validation.

A total of 15 participants (ages 25–35; 4 females and 11 males) were recruited to take part in the study. All participants observed the robot in person under controlled laboratory conditions. Prior to participation, the purpose of the experiment and evaluation guidelines were explained, and all participants provided informed consent. No personally identifiable information was collected, and all responses were analyzed anonymously.

The questionnaire consisted of nine items. The first eight questions evaluated whether each emotional gesture accurately represented its corresponding emotion (Happy, Angry, Sad, Fearful, Joyful, Excited, Positive Feedback, and Negative Feedback), while the ninth question assessed the overall impression of all gestures combined. Each question was rated on a five-point Likert scale, where 1 indicated very inappropriate, 2 indicated inappropriate, 3 indicated neutral, 4 indicated appropriate, and 5 indicated very appropriate, yielding a maximum total score of 45 points. The survey questions are presented in Table A1 (Appendix C).

During the evaluation, each emotional gesture was demonstrated individually, with participants observing from a fixed, safe distance of approximately two meters. Participants could request additional repetitions if needed. Importantly, participants were not informed that the gestures were generated by ChatGPT to avoid potential bias in their evaluations. They were told that PEPE is a cat-like companion robot designed to enhance human–robot interaction and were encouraged to provide honest assessments based solely on their perceptions of the robot’s movements.

All evaluations were conducted under defined and controlled conditions. Each participant experienced the exact same procedure: the same laboratory space, identical lighting and background environment, constant noise level, and the same robot motion sequences executed from the same initial pose. The gestures were presented in a fixed demonstration order, and the robot operated with the same hardware configuration and motion code for all trials. Participants observed from a fixed distance of approximately one meter, and no interaction with the robot was permitted during the evaluation. These standardized conditions ensured that all participants evaluated the gestures under equivalent and fully reproducible circumstances.

5.2. Quantitative Evaluation of Emotional Gestures

The questionnaire survey evaluated how 15 participants perceived the appropriateness of PEPE’s emotional gestures. Participants were also encouraged to provide written feedback for each gesture when possible.

As illustrated in Figure 7a, the average Likert-scale ratings revealed varying levels of acceptance across the eight emotional gestures. The Happy gesture averaged 2.8, Angry 3.8, Sad 3.0, Fearful 3.2, Joyful 3.8, Excited 3.5, Positive Feedback 3.1, and Negative Feedback 2.6. The overall mean rating was 3.0, indicating a neutral level of acceptance for the generated gestures. Among all emotions, Angry and Joyful were rated the highest, suggesting that these gestures were perceived as the most expressive and appropriate, whereas Negative Feedback and Happy received the lowest scores.

The error bars in Figure 7a represent the standard deviations for each emotion, showing the level of participant agreement. The Happy gesture (SD = 0.68) exhibited relatively consistent responses below the neutral threshold, while Angry (SD = 0.74) and Joyful (SD = 0.83) showed moderate agreement around the “appropriate” range. The Sad gesture (SD = 0.85) and Fearful gesture (SD = 0.94) indicated more dispersed responses, suggesting mixed perceptions among participants. The Excited gesture (SD = 0.83) also showed moderate variability, whereas Positive Feedback (SD = 0.83) remained neutral overall. Finally, the Negative Feedback gesture (SD = 0.74) demonstrated relatively consistent disagreement, indicating that most participants did not associate the motion with the intended emotion.

As shown in Figure 7b, the gestures were further grouped into positive (Happy, Joyful, Excited, and Positive Feedback) and negative (Angry, Sad, Fearful, and Negative Feedback) emotion categories to examine broader trends in perception. The positive emotions achieved an average rating of 3.3 ± 0.87, indicating general agreement among participants that these gestures were appropriate and recognizable. In contrast, the negative emotion category averaged 3.1 ± 0.92, reflecting slightly lower scores and greater variability in participants’ evaluations. This suggests that while PEPE’s positive gestures were interpreted more consistently, the negative expressions elicited more diverse responses, possibly due to subtler or less distinguishable motion features.

To complement the descriptive results, a one-sided Wilcoxon signed-rank test—summarized in Table 3—was performed to determine whether each gesture’s rating exceeded the neutral midpoint (3). The results showed that three high-arousal emotions—Angry, Joyful, and Excited—were rated significantly above the neutral threshold (p < 0.05), corresponding to the Clear recognition category in the table. In contrast, Sad, Fearful, and Positive Feedback exhibited average ratings near the neutral point and did not differ significantly from the midpoint (p ≥ 0.05), leading to their classification as Neutral recognition. The gestures Happy and Negative Feedback, which received lower mean scores and non-significant results, were categorized as Subtle recognition, reflecting their low-arousal or understated motion characteristics. A grouped analysis further supported this trend: the positive emotion set (Happy, Joyful, Excited, Positive Feedback) was rated significantly above neutral (W = 490.0, p = 0.0037), whereas the negative emotion set (Angry, Sad, Fearful, Negative Feedback) did not reach significance (W = 435.0, p = 0.085). These findings collectively suggest that PEPE’s high-intensity emotions were reliably recognized, while low-intensity emotions produced milder or more neutral impressions, consistent with their design intent.

In summary, the evaluation results indicate that participants generally perceived PEPE’s emotional gestures at a moderate level, with clearer recognition for high-arousal expressions. The descriptive ratings and Wilcoxon analysis consistently showed that Angry, Joyful, and Excited gestures were interpreted as appropriate and expressive, reflecting their dynamic and high-intensity motion designs. In contrast, low-arousal emotions such as Happy, Sad, Fearful, Positive Feedback, and Negative Feedback tended to cluster around the neutral midpoint and did not reach statistical significance, which aligns with their subtle and low-amplitude behaviors. When grouped, the positive emotions collectively surpassed the neutral threshold, whereas the negative set showed greater variability and did not achieve significance, suggesting that certain negative expressions require clearer motion differentiation. Overall, these findings confirm that PEPE effectively conveys high-intensity emotions while providing milder expressions for low-arousal states, with room for refinement in enhancing the clarity and distinctiveness of subtler emotional gestures.

5.3. Qualitative Feedback Analysis

In addition to the quantitative ratings, participants provided written feedback for each emotional gesture to describe their impressions of PEPE’s movements. Overall, the comments suggested that while the gestures were generally understandable, several required refinement to appear more expressive and natural. Participants also noted that a clearer distinction between positive and negative emotional gestures would improve the robot’s overall expressiveness.

Feedback on the Happy gesture, which received a relatively low acceptance score, indicated that the motion appeared “vague,” “slow,” and “unnatural.” One participant remarked that “the movements were not fluent and lacked diversity,” suggesting that the gestures should include more dynamic transitions. The Angry gesture, which received one of the highest ratings, was described as “appropriate” and “relatable.” Participants commented that the motion effectively conveyed tension and matched the expected behavior of a cat expressing anger. Feedback on the Sad gesture reflected a mix of opinions. Positive comments mentioned that “the movement showed a relatable image of a cat being sad,” while others noted that it was “vague” and “should include more sensible movement.” These responses suggest that, although the gesture conveyed the intended emotion to some extent, its expressiveness could be enhanced through more pronounced or fluid motion. The Fearful gesture received the most divided feedback. Some participants described it as “acceptable,” while others felt “it looked more like anger.” One participant pointed out that the robot’s mechanical limitations-such as the number of degrees of freedom (DoFs)-restricted its ability to fully express fear. This indicates that hardware constraints may have affected the clarity of this emotion. For the Joyful gesture, which showed moderate agreement in ratings, most participants responded positively, stating that “it gave the vibe of a joyful cat.” A few participants, however, suggested that the gesture “needed more diverse actions” to appear more vivid and energetic. The Excited gesture was also reviewed favorably, though one participant mentioned that it “should be more distinguishable from the Joyful gesture,” implying a need for clearer motion differentiation between these two similar emotions. Feedback on the Positive Feedback gesture suggested that participants found it appropriate for scenarios in which the robot communicates a positive or encouraging response to the user. In contrast, participants described the Negative Feedback gesture as “vague,” and some noted that “it might have been more expressive if the cat had shown a form of rejection or rebellion toward the user.”

Overall, the qualitative feedback indicates that, while PEPE’s emotional gestures were generally acceptable and recognizable, several require refinement to enhance distinctiveness, fluidity, and emotional depth. Participants emphasized the importance of a clearer separation between positive and negative motions and suggested that additional expressive features—such as increased degrees of freedom, sound effects, or changes in eye display—could further improve emotional realism and user engagement.

6. Discussion

This study examined the potential of large language models to generate emotionally expressive gestures for the cat-like companion robot PEPE through a progressive instructional prompting technique. The survey results and participant feedback together provide insight into both the strengths and limitations of this approach.

Overall, the LLM-generated gestures were successful in conveying the intended emotions. Joyful and Angry gestures received the highest recognition scores, indicating that movements characterized by clear dynamics and higher intensity were more easily interpreted by observers. This observation was supported by the Wilcoxon signed-rank analysis, which confirmed that Angry, Joyful, and Excited gestures were rated significantly above the neutral midpoint (p < 0.05), corresponding to Clear recognition in the evaluation. In contrast, the remaining low-arousal gestures—such as Fearful, Sad, and Positive Feedback—displayed mean ratings near the neutral point and larger variability, indicating Neutral recognition rather than strong emotional identification. The Happy and Negative Feedback gestures showed slightly lower mean values and non-significant results (p ≥ 0.05), reflecting Subtle recognition consistent with their understated motion profiles. Group-level analysis also revealed that positive emotions (M = 3.3 ± 0.87) were perceived more consistently than negative emotions (M = 3.1 ± 0.92), and the positive emotion set as a whole was statistically above neutral (p = 0.0037). This asymmetry aligns with previous findings in human–robot interaction studies, where positive and energetic expressions are generally recognized with greater accuracy than restrained or defensive ones [18,26].

Earlier research on robot emotion expression has emphasized the importance of motion amplitude, speed, and synchrony with other modalities for clear affective communication. Consistent with these observations, participants in our study responded favorably to gestures that contained evident amplitude changes and temporal variation. Compared with rule-based or manually designed gestures, the LLM-generated motions achieved comparable levels of user acceptance while requiring substantially less design time. These findings demonstrate that prompt-driven generation can serve as a viable alternative to handcrafted emotional motion libraries.

Feedback indicated that several gestures appeared “vague” or “unnatural,” primarily due to mechanical limitations in PEPE’s degrees of freedom and the absence of supporting cues, such as facial expressions or sounds. The Fearful gesture, in particular, was often interpreted as Angry, revealing the need for clearer motion segmentation and timing control. Moreover, some gestures lacked smooth transitions between poses, implying that prompt-based generation should include temporal-continuity constraints or post-processing filters. The small number of participants (n = 15) and single-modality evaluation also limit the generalization of the results; larger and more diverse user groups may yield deeper insights into cultural or demographic differences in emotion recognition.

The results highlight promising directions for enhancing LLM-based motion generation. Future studies will refine the progressive instructional prompting framework by introducing additional physical parameters—such as velocity envelopes, phase timing, and amplitude scaling—to achieve smoother and more lifelike gestures. Integrating multimodal features, including sound cues or animated eye displays, could further strengthen emotional clarity. Reinforcement or imitation learning techniques may also be combined with LLM-generated base motions to enable adaptive, user-responsive behavior. Together, these improvements could advance the realism and social expressiveness of companion robots like PEPE.

7. Conclusions

This study presented the design and implementation of PEPE, a cat-like companion robot developed to enhance emotional expressiveness and human–robot interaction. The robot was designed based on feline behavioral analysis, providing multiple degrees of freedom in the ears, neck, legs, and tail to support diverse non-verbal emotional gestures.

Building upon this design, an LLM-based hierarchical framework employing a progressive instructional prompting technique was applied to generate motion sequences. The framework allowed ChatGPT to produce eight distinct motion sequences representing different emotional states, which were evaluated through a user study involving 15 participants. Quantitative results showed that Joyful and Angry gestures were perceived as the most appropriate and expressive, while Fearful and Negative Feedback received lower ratings. The Wilcoxon signed-rank analysis further confirmed that Angry, Joyful, and Excited gestures were rated significantly above the neutral clarity threshold (p < 0.05), corresponding to clear recognition, whereas the remaining low-arousal gestures showed neutral or subtle recognition, remaining statistically indistinguishable from the midpoint—consistent with their mild and understated design. Qualitative feedback further revealed that participants generally recognized the intended emotions, but found some gestures to be vague or lacking fluidity. These findings demonstrate that the proposed prompting approach can generate recognizable emotional motions while minimizing manual design effort.

The study also identified several challenges in improving robot expressiveness. Limited mechanical degrees of freedom, the absence of multimodal cues—such as sound or facial animation—and subtle motion transitions were key factors that affected emotional clarity. In addition, the current design is limited by the small eye-display area and the lack of multimodal outputs such as sound or mouth animation. A next-generation version of PEPE is currently in development, incorporating an expanded visual display and additional multimodal cues to enhance emotional expressiveness. Future work will focus on enhancing the prompting framework by integrating physical motion constraints, increasing robot articulation, and incorporating multimodal feedback mechanisms. Additionally, recent LLM models—such as updated ChatGPT variants, Google’s Gemini models, Meta’s LLaMA family, and other open-source alternatives—have continued to advance, offering opportunities to explore more diverse and expressive gesture-generation capabilities in future work. Combining the LLM-based generation method with reinforcement or imitation learning may further enable adaptive and user-responsive motion behavior. Overall, this research provides a foundational step toward data-efficient and scalable methods for emotional motion generation in companion robots, contributing to more natural and affect-aware human–robot interactions.

Author Contributions

Conceptualization, Y.L., J.C. and D.-W.L.; methodology, Y.L.; hardware, Y.L., D.C. and D.-Y.L.; software, Y.L. and D.-Y.L.; formal analysis, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, Y.L., J.C. and D.-W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Ministry of Trade, Industry and Energy (MOTIE) in the year 2022 Robot Industrial Technology Project “Development of companion robot technologies capable of emotional connection based on Human–Robot physical and cognitive interaction.” (No. 20018513).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data generated during the robot motion experiments are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Prompt Design

The progressive instructional prompting template is provided below:

Appendix A.1. Persona

Appendix A.2. Step 1: Robot Information

Appendix A.3. Step 2: Code Information

Appendix A.4. Step 3: Emotion Information

Appendix B. Generated Motion Codes

Representative generated codes are shown below. For brevity, only the full code for *Happy*, *Angry*, and *Joyful* gestures are included.

Appendix B.1. Happy Gesture

Appendix B.2. Angry Gesture

Appendix B.3. Joyful Gesture

Appendix C. Survey Questionnaire

Table A1 presents the full content of the questionnaire used to evaluate the appropriateness of the emotional gestures. Participants rated each question on a five-point Likert scale (1: very inappropriate, 2: inappropriate, 3: neutral, 4: appropriate, 5: very appropriate). The questionnaire was self-developed based on prior studies that investigated gesture perception and robot acceptability through survey-based evaluation methods [18,26].

Table A1. Survey questions for evaluating emotional gesture appropriateness.

No.	Question
1	The motion representing “Happy” appropriately conveys the emotion.
2	The motion representing “Angry” appropriately conveys the emotion.
3	The motion representing “Sad” appropriately conveys the emotion.
4	The motion representing “Fearful” appropriately conveys the emotion.
5	The motion representing “Joyful” appropriately conveys the emotion.
6	The motion representing “Excited” appropriately conveys the emotion.
7	The motion representing “Positive Feedback” appropriately conveys the emotion.
8	The motion representing “Negative Feedback” appropriately conveys the emotion.
9	Overall, the emotional gestures are appropriate and expressive.

References

Siciliano, B.; Khatib, O. Springer Handbook of Robotics; Springer: Berlin/Heidelberg, Germany, 2016; pp. 1349–1360. [Google Scholar]
Dou, X.; Yan, L.; Wu, K.; Niu, J. Effects of Voice and Lighting Color on the Social Perception of Home Healthcare Robots. Appl. Sci. 2022, 12, 12191. [Google Scholar] [CrossRef]
Mårell-Olsson, E.; Bensch, S.; Hellström, T.; Alm, H.; Hyllbrant, A.; Leonardson, M.; Westberg, S. Navigating the Human–Robot Interface—Exploring Human Interactions and Perceptions with Social and Telepresence Robots. Appl. Sci. 2025, 15, 1127. [Google Scholar] [CrossRef]
Urakami, J.; Seaborn, K. Nonverbal Cues in Human–Robot Interaction: A Communication Studies Perspective. ACM Trans. Hum. Robot Interact. 2023, 12, 22. [Google Scholar] [CrossRef]
Mori, M.; MacDorman, K.F.; Kageki, N. The Uncanny Valley [From the Field]. IEEE Robot. Autom. Mag. 2012, 19, 98–100. [Google Scholar] [CrossRef]
Pike, J.; Picking, R.; Cunningham, S. Robot Companion Cats for People at Home with Dementia: A qualitative case study on companotics. Dementia 2021, 20, 1300–1318. [Google Scholar] [CrossRef]
Ihamäki, P.; Heljakka, K. Robot Dog Intervention with the Golden Pup: Activating Social and Empathy Experiences of Elderly People as Part of Intergenerational Interaction. In Proceedings of the 54th Hawaii International Conference on System Sciences (HICSS), Maui, HI, USA, 5–8 January 2021; pp. 1888–1897. [Google Scholar]
Scott, L.; Florkiewicz, B.N. Feline faces: Unraveling the Social Function of Domestic Cat Facial Signals. Behav. Process. 2023, 213, 104959. [Google Scholar] [CrossRef]
How to Decode Your Cat’s Behavior. Available online: https://www.rd.com/list/how-to-decode-your-cats-behavior/ (accessed on 13 October 2025).
Decoding Cat Body Language. Available online: https://catcaresociety.org/decoding-cat-body-language/ (accessed on 13 October 2025).
Understanding Body Language in Cats. Available online: https://thevets.com/resources/pet-health-care/understanding-body-language-in-cats/ (accessed on 13 October 2025).
Cat Body Language: What Your Feline Friend Is Trying to Tell You. Available online: https://www.sciencefocus.com/nature/cat-body-language (accessed on 13 October 2025).
How Do Cats Communicate With Each Other? Available online: https://www.petmd.com/news/view/cat-language-101-how-do-cats-talk-each-other-37620 (accessed on 13 October 2025).
Nakata, T.; Sato, T.; Mori, T. Expression of Emotion and Intention by Robot Body Movement. Intell. Auton. Syst. 1998, 5, 352–359. [Google Scholar]
Bretan, M.; Hoffman, G.; Weinberg, G. Emotionally Expressive Dynamic Physical Behaviors in Robots. Int. J. Hum. Comput. Stud. 2015, 78, 1–16. [Google Scholar] [CrossRef]
Ding, Y.; Zhang, X.; Paxton, C.; Zhang, S. Task and Motion Planning with Large Language Models for Object Rearrangement. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 2086–2092. [Google Scholar]
Wang, S.; Han, M.; Jiao, Z.; Zhang, Z.; Wu, Y.N.; Zhu, S.; Liu, H. LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; pp. 12086–12092. [Google Scholar]
Huang, P.; Hu, Y.; Nechyporenko, N.; Kim, D.; Talbott, W.; Zhang, J. EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning. IEEE Robot. Autom. Lett. 2025, 10, 7699–7706. [Google Scholar] [CrossRef]
Schulhoff, S.; Ilie, M.; Balepur, N.; Kahadze, K.; Liu, A.; Si, C.; Li, Y.; Gupta, A.; Han, H.; Schulhoff, S.; et al. The Prompt Report: A Systematic Survey of Prompt Engineering Techniques. arXiv 2025, arXiv:2406.06608. [Google Scholar]
Amatriain, X. Prompt Design and Engineering: Introduction and Advanced Methods. arXiv 2024, arXiv:2401.14423. [Google Scholar] [CrossRef]
Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.H.; Le, Q.V.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv 2023, arXiv:2201.11903. [Google Scholar]
Fiorini, L.; D’Onofrio, G.; Sorrentino, A.; Cornacchia Loizzo, F.G.; Russo, S.; Ciccone, F.; Giuliani, F.; Sancarlo, D.; Cavallo, F. The Role of Coherent Robot Behavior and Embodiment in Emotion Perception and Recognition During Human-Robot Interaction: Experimental Study. JMIR Hum. Factors 2024, 11, e45494. [Google Scholar] [CrossRef] [PubMed]
Löffler, D.; Schmidt, N.; Tscharn, R. Multimodal Expression of Artificial Emotion in Social Robots Using Color, Motion and Sound. In Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interacti (HRI), Chicago, IL, USA, 5–8 March 2018; pp. 334–343. [Google Scholar]
Caeiro, C.C.; Burrows, A.M.; Waller, B.M. Development and application of CatFACS: Are human cat adopters influenced by cat facial expressions? Appl. Anim. Behav. Sci. 2017, 189, 66–78. [Google Scholar] [CrossRef]
Gu, S.; Wang, F.; Patel, N.P.; Bourgeois, J.A.; Huang, J.H. A Model for Basic Emotions Using Observations of Behavior in Drosophila. Front. Psychol. 2019, 10, 781. [Google Scholar] [CrossRef]
Nunez, E.; Hirokawa, M.; Suzuki, K. Design of a Huggable Social Robot with Affective Expressions Using Projected Images. Appl. Sci. 2021, 8, 2298. [Google Scholar] [CrossRef]
Liang, J.; Huang, W.; Xia, F.; Hausman, K.; Ichter, B.; Florence, P.; Zeng, A. Code as Policies: Language Model Programs for Embodied Control. arXiv 2023, arXiv:2209.07753. [Google Scholar] [CrossRef]
Wang, G.; Xie, Y.; Jiang, Y.; Mandlekar, A.; Xiao, C.; Zhu, Y.; Fan, L.; Anandkumar, A. Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv 2023, arXiv:2305.16291. [Google Scholar] [CrossRef]
Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. arXiv 2021, arXiv:2107.13586. [Google Scholar] [CrossRef]
Wu, X.; Zhou, K.; Sun, M.; Wang, X.; Liu, N. A Survey of Graph Prompting Methods: Techniques, Applications, and Challenges. arXiv 2023, arXiv:2303.07275. [Google Scholar] [CrossRef]
Jiang, B.; Chen, X.; Yin, F.; Li, Z.; YU, G.; Fan, J. MotionChain: Conversational Motion Controllers via Multimodal Prompts. arXiv 2024, arXiv:2404.01700. [Google Scholar] [CrossRef]
Wu, Q.; Zhao, Y.; Liu, X.; Tai, Y.; Tang, C. Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs. arXiv 2025, arXiv:2405.17013. [Google Scholar]
Sahoo, P.; Singh, A.K.; Saha, S.; Jain, V.; Mondal, S.; Chadha, A. A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv 2025, arXiv:2402.07927. [Google Scholar]
Vatsal, S.; Dubey, H. A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks. arXiv 2024, arXiv:2407.12994. [Google Scholar] [CrossRef]
Chang, K.; Xu, S.; Wang, C.; Luo, Y.; Liu, X.; Xiao, T.; Zhu, J. Efficient Prompting Methods for Large Language Models: A Survey. arXiv 2024, arXiv:2404.01077. [Google Scholar] [CrossRef]
Zheng, M.; Pei, J.; Logeswaran, L.; Lee, M.; Jurgens, D. When “A Helpful Assistant” Is Not Really Helpful: Personas in System Prompts Do Not Improve Performances of Large Language Models. arXiv 2024, arXiv:2311.10054. [Google Scholar]
Lutz, M.; Sen, I.; Ahnert, G.; Rogers, E.; Strohmaier, M. The Prompt Makes the Person(a): A Systematic Evaluation of Sociodemographic Persona Prompting for Large Language Models. arXiv 2025, arXiv:2507.16076. [Google Scholar] [CrossRef]

Figure 1. Structural design and internal composition of the proposed companion robot, PEPE. (a) Frontal view of the robot in its neutral sitting posture. (b) Internal arrangement of the head module, showing the ESP32 motor controller, wiring, and ear actuators and internal configuration of the body module, revealing the actuator assembly, embedded mechanical components, and battery placement.

Figure 2. Mechanical structure and expressive degrees of freedom of PEPE. (a) Overall upper-body assembly showing integration of ear, head, leg, and tail modules. (b) Ear pitch actuation and rotation axes. (c) Neck yaw–pitch mechanism for head orientation. (d) Front-leg pitch joints used to modulate body posture. (e) Tail actuation module showing pitch and roll motion directions.

Figure 3. Workflow of the proposed system. Emotional gesture code is generated offline using a Large Language Model (ChatGPT 4) and deployed to the robot subsystems. The ESP32 modules manage motor control, LCD eye displays, and audio I/O, while the Raspberry Pi 5 is dedicated to camera processing. All components communicate through an LTE network (WiFi router) using MQTT, and the actuators execute the specified emotional motions.

Figure 4. Eye display variations for emotional expression. (a) Eye display used for Happy, Joyful, Excited, and Positive Feedback gestures. (b) Eye display used for Angry and Negative Feedback gestures. (c) Eye display used for the Sad gesture. (d) Eye display used for the Fearful gesture.

Figure 5. Workflow of the proposed progressive instructional prompting technique. The process begins with three sequential prompts (Prompt 1: robot structure, Prompt 2: code structure, Prompt 3: emotion information) provided to ChatGPT. Based on these prompts, the model generates motion codes corresponding to eight emotional categories (Happy, Sad, Angry, Fearful, Joyful, Excited, Positive Feedback, and Negative Feedback). The generated codes are transmitted through the ESP32 motor controller to the actuators, where the emotional gestures are physically executed.

Figure 6. Sequential motion snapshots of PEPE performing three representative emotional gestures generated by ChatGPT: (a) Happy (8 s), (b) Angry (4.5 s), and (c) Joyful (6.3 s). Each sequence (left → right) presents the initial, peak, and final poses of the corresponding motion, where t = 0 s denotes the start of movement.

Figure 7. Evaluation results of emotional gesture appropriateness. (a) Mean Likert-scale ratings (1–5) for each emotional gesture, with error bars representing standard deviations (n = 15). (b) Comparison between grouped positive (Happy, Joyful, Excited, Positive Feedback) and negative (Angry, Sad, Fearful, Negative Feedback) emotions. The dashed horizontal line (3.0) indicates the neutral rating threshold.

Table 1. Frequency of each AU in 53 adult domestic cats and Proportion of observation out of 2628 coded AUs [8].

AU Code	AU Description	Frequency	Proportion of Obs.
EAD104	Ear Rotator	399	15.18%
EAD102	Ear Adductor	277	10.54%
EAD101	Ears Forward	222	8.45%
AU5	Upper Lid Raiser	218	8.30%
EAD103	Ear Flattener	184	7.00%

Table 2. Typical feline behaviors associated with different emotional states, based on online resources [9,10,11,12,13]. The table highlights characteristic movements and postures (e.g., ear orientation, tail dynamics, body posture) that are commonly interpreted as indicators of cats’ underlying emotion state.

	Happy	Angry	Sad	Fearful
Ear	Faced Forward	Upright to Appear Larger	Pointed Downward	Pointed Downward
Neck	Stable Posture	Shakes Side to Side	Tilted Downward	Alters between Facing Down and Facing Straight Ahead
Front Feet	Stable Posture	Moves Quickly	Minimal Movement	Minimal Movement
Tail	Occasionally Sway Upward	Raised Stiffly	Occasionally Sway Side to Side	Raised Stiffly
	Joyful	Excited	Positive Feedback	Negative Feedback
Ear	Faced Forward	Faced Forward	Sway Side to Side	Pointed Downward
Neck	Moves Actively Side to Side	Fixed in a Specific Direction	Faced Target	Faced Downward
Front Feet	Reaching Toward Target	Reaching Eagerly Toward Target	Stable Posture	Minimal Movement
Tail	Frequently Sway	Continuous Movement	Gently Sway	Limp

Table 3. Wilcoxon signed-rank test results comparing each emotion’s Likert rating to the neutral midpoint (3).

	W Statistic	p-Value	Mean	Interpretation
Happy	8.0	0.8716	2.800	Subtle
Angry	85.0	0.0015	3.867	Clear
Sad	14.0	0.5000	3.000	Neutral
Fearful	35.0	0.2026	3.200	Neutral
Joyful	45.0	0.0031	3.867	Clear
Excited	45.0	0.0261	3.467	Clear
Pos. Feedback	33.0	0.2635	3.133	Neutral
Neg. Feedback	3.5	0.9711	2.600	Subtle

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lim, Y.; Cho, J.; Lee, D.-Y.; Choi, D.; Lee, D.-W. Design and Implementation of a Companion Robot with LLM-Based Hierarchical Emotion Motion Generation. Appl. Sci. 2025, 15, 12759. https://doi.org/10.3390/app152312759

AMA Style

Lim Y, Cho J, Lee D-Y, Choi D, Lee D-W. Design and Implementation of a Companion Robot with LLM-Based Hierarchical Emotion Motion Generation. Applied Sciences. 2025; 15(23):12759. https://doi.org/10.3390/app152312759

Chicago/Turabian Style

Lim, Yoongu, Jaeuk Cho, Duk-Yeon Lee, Dongwoon Choi, and Dong-Wook Lee. 2025. "Design and Implementation of a Companion Robot with LLM-Based Hierarchical Emotion Motion Generation" Applied Sciences 15, no. 23: 12759. https://doi.org/10.3390/app152312759

APA Style

Lim, Y., Cho, J., Lee, D.-Y., Choi, D., & Lee, D.-W. (2025). Design and Implementation of a Companion Robot with LLM-Based Hierarchical Emotion Motion Generation. Applied Sciences, 15(23), 12759. https://doi.org/10.3390/app152312759

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Design and Implementation of a Companion Robot with LLM-Based Hierarchical Emotion Motion Generation

Abstract

1. Introduction

2. Analysis of Feline Emotional Expressions

3. Design and Implementation of PEPE

3.1. Mechanical and Structural Design of PEPE

3.2. Electronic and Control System Design of PEPE

4. Progressive Instructional Prompting Technique

5. Performance Validation

5.1. Validation Method

5.2. Quantitative Evaluation of Emotional Gestures

5.3. Qualitative Feedback Analysis

6. Discussion

7. Conclusions

Author Contributions

Funding

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A. Prompt Design

Appendix A.1. Persona

Appendix A.2. Step 1: Robot Information

Appendix A.3. Step 2: Code Information

Appendix A.4. Step 3: Emotion Information

Appendix B. Generated Motion Codes

Appendix B.1. Happy Gesture

Appendix B.2. Angry Gesture

Appendix B.3. Joyful Gesture

Appendix C. Survey Questionnaire

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI