Article

Classification of Properties in Human-like Dialogue Systems Using Generative AI to Adapt to Individual Preferences

Graduate School of System Informatics, Kobe University, 1-1, Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3466; https://doi.org/10.3390/app15073466
Submission received: 17 February 2025 / Revised: 13 March 2025 / Accepted: 20 March 2025 / Published: 21 March 2025

Featured Application

A design approach that includes individual preferences could make human-like dialogue systems more comfortable for users.

Abstract

As the linguistic capabilities of AI-based dialogue systems improve, their human-likeness increases, and their behavior is no longer evaluated uniformly by all users. To adapt better to users, individual preferences must be considered. In this study, the relationships between the properties of a human-like dialogue system and dialogue evaluations were investigated using hierarchical cluster analysis for individual subjects. The dialogue system, driven by generative AI, communicated with subjects in natural language via voice and featured a facial expression function. Subjective evaluations of the system and the dialogues were collected through a questionnaire. Based on the analysis results, the system properties were classified into two types: generally and individually relational to a positive evaluation of the dialogue. The former included inspiration, a sense of security, and collaboration; the latter included a sense of distance, personality, and seriousness. Equipping systems with the former properties is expected to improve dialogues for most users, whereas the latter properties should be adjusted to individuals, since they are evaluated according to individual preferences. A design approach that accounts for individuality could make human-like dialogue systems more comfortable for users.

1. Introduction

In recent years, various human–computer dialogue systems, commonly called chatbots, have come into practical use. As the relevant technologies develop, dialogue systems are predicted to build longer-term, closer personal relationships with users [1]. They are also expected to act as actuators that influence and assist users through conversation in next-generation AI agents [2], covering everyday domains such as healthcare, education, and whole lifestyles [3,4,5].
Generative AI based on large language models is a key enabling technology, making dialogues with systems more human-like: systems can generate new sentences as if they had minds of their own. For example, ChatGPT-4 (OpenAI) demonstrated advanced linguistic abilities, correctly answering a theory-of-mind test that included a false-belief scenario, matching the performance of six-year-old children [6], and it was judged to be human in a Turing test [7]. Although advanced linguistic abilities do not show that dialogue systems have minds, they do show that systems can behave in a human-like way, as if they understood people’s views and minds.
Human-like conversation has been thought to improve relationships between users and systems, and some conversational software applications have been made more human-like by introducing emotions, personas, or personal stories [8,9,10]. However, human-like performance is sometimes perceived as unpleasant [11]. The evidence for the uncanny valley hypothesis, which concerns the eerie feelings that arise as human-likeness increases, is inconclusive [12], and user feelings toward human-like systems, including dialogue systems, remain a complex issue.
Previous research on chatbot responses identified characteristics necessary for positive user experiences, yet when survey participants chose their preferred chatbot response to each question, the answers were mixed: the best response differed from person to person [13]. In practical interpersonal relationships, people likewise differ in how they prefer to interact with and evaluate others; as is often remarked, relationship compatibility is a key factor in personal relationships. For systems, personality adaptiveness can therefore become a valuable feature [14]. Although systems have not equaled humans, they have become human-like enough to elicit similarly varied preferences in user–system relationships.
Systems can demonstrate human-like value as communication partners. The media equation, which holds that people treat interactive systems as they treat other people [15], is widely recognized. Conversations with chatbots have been shown to have positive psychological effects similar to conversations with humans [16], and apologies from robots have value for people [17]. In parallel, the behavior expected of systems has been shown to differ from the behavior expected of humans: users behave differently toward humans and chatbots [18,19], and productivity is the primary expectation for chatbots [20].
To develop dialogue systems that are favorable partners, it is necessary to identify the properties of human-like dialogue systems that influence evaluations based on individual preferences. In our previous research, subjective evaluations of dialogues with a human-like system were investigated in terms of social interaction structure to better adapt to users [21]. Since displaying social behavior in interactions is important for humans to empathetically recognize non-living things as living [22,23,24], dialogues with the system were evaluated across progressively developing social interaction structures. This paper focuses on individual preferences regarding system behavior to improve the compatibility of personal relationships between users and systems.
In this study, the links between the subjective evaluations of the system’s impression and of the dialogue quality were analyzed for each subject using hierarchical cluster analysis. The dialogue system, developed with generative AI for the experiment, communicated with the subjects in natural language via voice and featured a facial expression function. The dialogue system and the dialogues in three types of interaction settings were evaluated through a questionnaire. Based on the results of the cluster analysis, the dialogue system’s properties were classified into two types: generally and individually relational to a positive evaluation of the dialogue. The former included inspiration, a sense of security, and collaboration, while the latter included a sense of distance, personality, and seriousness. Since the latter properties were evaluated according to individual user preferences, they should be adjusted to each user.
The remainder of this paper is structured as follows. Section 2, Methods, describes the dialogue system, the questionnaire, and the experimental conditions. Section 3, Results, presents, first, the responses to the questionnaire; second, the aggregated results of the analysis of those responses; and, third, the classification of the system evaluation items based on these results. Section 4, Discussion, considers the properties of the dialogue system evaluated through these items. Section 5, Conclusions, summarizes the findings and describes future work.

2. Methods

2.1. Dialogue System

A laptop PC was used as the dialogue system for the experiment. The subjects spoke into the PC microphone. The dialogue system replied with speech, reading texts aloud through the PC speakers, and displayed facial expression images on the screen. The PC screen was masked except for the facial expressions. An overview of the dialogue system is shown in Figure 1.
The dialogue was conducted in a question-and-answer format. The subject switched on the microphone, spoke, and then switched it off, and the dialogue system responded. The dialogue was not continuous across turns; for example, the system did not refer to what the subject had said in the previous turn. ChatGPT (gpt-3.5-turbo-0613 model, OpenAI) was used to generate the sentences for the dialogue system’s voice output.
In the experiment, the dialogue system was a human-like communication partner. To help the subjects feel this, a facial expression function was developed. It is known that emotional information can be conveyed to humans through expressions by machine systems, and using a combination of different forms of expression is effective [25]. The face image, displayed on the screen of the dialogue system, changes the shape of the mouth and the face color, as shown in Figure 2. There are five levels of emotional expression, with the image on the left side used to express more negative emotions and the image on the right side used to express more positive emotions. To select the face image related to a dialogue, ChatGPT (gpt-3.5-turbo-0613 model, OpenAI) was used. This feature did not classify emotions in detail but simply determined whether the emotion was positive or negative. It is known that cultural differences exist in emotional expression and classification between Eastern and Western cultures. However, “happy” and “sad” are commonly used for positive and negative emotions [26]. Therefore, the prompts for emotional evaluation included the degree of happiness and sadness.

2.2. Settings of Dyadic and Triadic Interactions

The dialogues were conducted in three types of interaction settings modeled on stages of child development. During the development of communication in children, relationships begin as one-to-one dyadic interactions, between oneself and a partner or between oneself and an object of interest. Relationships then progress to a triadic interaction involving oneself, a partner, and an object of common interest, as shown in Figure 3. This is called joint attention, an essential development for social relationships that emerges in human communication at around nine months of age [27]. These dyadic and triadic interaction settings were applied to the dialogue system in the experiment, and the system behaved differently in each setting [21]. The object of common interest in interpersonal communication corresponded to a topic in the settings. Figure 4 shows the interaction structures between the subject, the dialogue system, and the topic. The dialogue system played the role of the child in Figure 3 and worked to construct the structures enclosed by the dotted lines in Figure 4 according to each setting.

2.3. Topic

The topics in the experiment were “Which is better: orange or apple, airplane or bullet train, summer or winter?”, which carried no strong meaning or emotion for the subjects. When subjects were asked to talk about emotional topics in pilot experiments, the results showed differences in the degree of feeling depending on the chosen topics, and some subjects were reluctant to talk. Although exchanging emotions is important in communication, researching emotional conversations and conversation content was not the purpose of this study. Therefore, topics with strong meanings or emotions were avoided to keep the experimental conditions equal.

2.4. General Algorithms

The prompts and the processing for the dialogue system in each setting, together with examples of the interaction, are shown in Table 1. Along the vertical axis of Table 1, the actions of the subject and the system are listed in the order in which the experiment progressed. Along the horizontal axis, the experimental conditions, i.e., the three types of interaction settings, are arranged. The top row shows the features of the system’s behavior in each setting.
To generate responses, ChatGPT (gpt-3.5-turbo-0613 model, OpenAI) was prompted in its system role to respond to a subject’s input in a polite tone with approximately 25 words, and in its assistant role to provide a reply to the subject’s input, either freely or on a specified topic.
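As an illustration, the sketch below shows how such a response-generation call might look with the legacy OpenAI Python SDK (v0.x), which was contemporaneous with the gpt-3.5-turbo-0613 model. The prompt wording paraphrases Table 1, and the system/user role split is a common-pattern assumption rather than the authors’ exact setup.

```python
import openai  # legacy 0.x SDK interface, contemporaneous with gpt-3.5-turbo-0613

openai.api_key = "YOUR_API_KEY"  # placeholder


def generate_reply(subject_input: str, topic: str | None = None) -> str:
    """Generate a polite, roughly 25-word reply, optionally held to a topic.

    The instruction text paraphrases Table 1; it is not the authors'
    verbatim prompt.
    """
    instruction = "Respond to the user in a polite tone with approximately 25 words."
    if topic is not None:
        instruction += f" Respond topically, staying on the topic: {topic}."
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": subject_input},
        ],
    )
    return response["choices"][0]["message"]["content"]
```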
To evaluate emotions, ChatGPT (gpt-3.5-turbo-0613 model, OpenAI) was prompted in its assistant role to rate the degree of happiness and the degree of sadness, each on a 100-point scale, for the combination of a subject’s input and the response generated for it. The difference between the happiness and sadness degrees was converted into five levels with equal intervals, and these levels were used to select a face image.
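The conversion from the two ratings to a face-image level can be sketched as follows; the equal-width bin edges are an assumption, since the paper states only that the difference was divided into five equal intervals.

```python
def face_level(happiness: int, sadness: int) -> int:
    """Map a happiness/sadness rating pair (each 0-100) to one of five
    face-image levels: 0 = most negative, 4 = most positive.
    """
    diff = happiness - sadness                  # ranges over [-100, 100]
    # Five equal-width bins of width 40: [-100, -60), [-60, -20), ..., [60, 100]
    level = int((diff + 100) // 40)
    return min(level, 4)                        # clamp diff == 100 into the top bin


assert face_level(90, 10) == 4                  # strongly positive exchange
assert face_level(50, 50) == 2                  # neutral midpoint
assert face_level(10, 90) == 0                  # strongly negative exchange
```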
Unity (Unity 2022.3.0, Unity Technologies) was used to integrate and present these functions.

2.5. Questionnaire

A Japanese post-dialogue questionnaire was created for the experiment. For the dialogue system evaluation, items were developed with reference to research on communication partners in Japan. The items drawn from research on the attractiveness factors of a friend [28] were “Independent”, “Sincere”, “Inspiring”, and “Easy to talk to”. The items drawn from research on impressions of a communication robot [29] were “Awkward”, “Has personality”, and “Inhuman”. Additionally, “Friendly” was included with the expectation that a friendly system would be evaluated positively as a dialogue partner. For the dialogue evaluation, some subjects were asked in advance what types of dialogue they considered good, and the dialogue evaluation items were developed from their answers: “Fun”, “It listens”, “Can have a smooth conversation”, and “Want to talk about other topics too”. The subjects responded to each item on a six-point visual analog scale from disagree to agree.

2.6. Subjects

The subjects were 13 healthy adults (5 males, 8 females, age 35.8 ± 14.4 years) including university students and working adults. The contents of the experiment were explained to the subjects in advance. Informed consent was obtained from all subjects involved in this study.

2.7. Experimental Procedure

  • The subject speaks an arbitrary phrase, such as “hello”, into the system.
  • The dialogue system announces a topic and starts a dialogue.
  • After five minutes, the system announces the end of a dialogue and asks the subject to fill out the post-dialogue questionnaire.
These steps were repeated for the three types of system settings. When three dialogues were completed, the dialogue system announced the end of the experiment.
The subjects were told only that there were three types of partners and were given no prior information about the respective settings. When the experimenter explained how to talk with the dialogue system, the subjects practiced talking with the system for up to two turns. Considering that impressions could depend on the order of the settings, the order was counterbalanced across subjects. To avoid the influence of others, the experiment was conducted in a room containing only the subject, and the contents of the dialogues were not saved. A time series of the number of characters spoken during the experiment was saved and used to verify that a dialogue had taken place.

3. Results

All the subjects interacted with the system and responded to the questionnaire. First, the responses are presented in terms of overall trends and individual subjects. Second, hierarchical cluster analysis was conducted for each subject, and the analysis results were aggregated to investigate the cumulative results of all subjects’ evaluations; the process is illustrated using the responses and dendrograms of one subject as an example. Third, the system evaluation items were classified according to the cumulative results.

3.1. Questionnaire Results

The average and standard deviation of the results for the post-dialogue questionnaire are shown in Figure 5. The dialogue system in the dyadic interaction (with a topic) setting ignored the subjects in the dialogue as shown in Figure 4, and the evaluations were negative on average. In the dyadic interaction (with a subject) setting and in the triadic interaction setting, dialogues took place between the subjects and the system, and the evaluations were relatively positive on average. On the other hand, each of the subjects gave various evaluations of the system and dialogues across the three settings. The evaluation results for one subject are shown in Table 2 as an example, and the evaluation results for all subjects are included in the Supplementary File.

3.2. Analysis of Links Between Evaluations of the System and the Dialogue

The post-dialogue questionnaire results were analyzed for each subject to investigate the individual evaluations. Hierarchical cluster analysis with the Ward method was used to examine the links between the evaluations for items about the system and the evaluation for each item about the dialogue. Figure 6 shows the dendrograms from the analysis for one subject’s evaluations (from Table 2) as an example.
The strength of the links was quantified by hierarchical closeness on the dendrograms. System evaluation items that merged with a dialogue evaluation item at the same hierarchical level of the dendrogram were assigned a step number of 0, items that merged one hierarchical level higher were assigned a step number of 1, those two levels higher were assigned a step number of 2, and so forth. A smaller step number means a stronger link. The step numbers are shown in brackets after each item in Figure 6.
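The step-number computation can be sketched as follows, assuming SciPy’s Ward linkage over a subject’s item-by-setting score matrix; the paper does not name its analysis software, so the library choice and data layout are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def step_numbers(scores: np.ndarray, labels: list[str], dialogue_item: str) -> dict[str, int]:
    """Step number of every other item relative to one dialogue item.

    scores: (n_items, n_settings) matrix of one subject's ratings, with rows
    ordered as in labels. Items that join the dialogue item at its first
    (lowest) dendrogram junction get step 0, at the next junction up step 1,
    and so on; smaller numbers mean stronger links.
    """
    root = to_tree(linkage(scores, method="ward"))
    target = labels.index(dialogue_item)

    # Walk from the root down to the target leaf, recording its ancestors.
    path, node = [], root
    while not node.is_leaf():
        path.append(node)
        node = node.left if target in node.left.pre_order() else node.right
    path.reverse()  # lowest ancestor (first junction above the leaf) first

    steps, seen = {}, {target}
    for k, ancestor in enumerate(path):
        for leaf in set(ancestor.pre_order()) - seen:
            steps[labels[leaf]] = k  # this item joins at the k-th junction
        seen.update(ancestor.pre_order())
    return steps
```

Applied to the ratings in Table 2, for instance, scores would hold one row of three setting scores per questionnaire item, and the function would be called once per dialogue evaluation item.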
For the subject, as shown in Figure 6, the following system evaluations were most strongly linked to each dialogue evaluation. When the system was “Easy to talk to” and “Inspiring”, the dialogues were “Fun”. When the system was “Easy to talk to”, “Inspiring”, and “Friendly”, the dialogues were “It listens”. When the system was “Friendly”, the dialogues were “Can have a smooth conversation” and “Want to talk about other topics too”. The other system evaluations were linked more weakly.
The analysis results for all thirteen subjects were aggregated to create the graphs in Figure 7, which show the distribution of the subjects for each number of steps. These graphs indicate the number of subjects who linked each system evaluation to each dialogue evaluation and the strength of these links.
In the remainder of the Results and in the Discussion, a link with a strength of 0 or 1 step is considered relational, and a link with a strength of 2 or more steps non-relational. A step number of 0 means that the evaluation values for the two items were completely connected, and a step number of 1 means that the values were not identical but close; a link of 2 or more steps is weak by comparison. According to this distinction, the system evaluation items strongly linked to a dialogue evaluation were treated as relational items, and the others as non-relational items. For example, in the “Friendly” graph of the “Fun” section, four subjects linked “Friendly” for the system to “Fun” for the dialogue with a strength of 0 or 1 step, and the other nine subjects linked them with a strength of 2 or more steps. This graph indicates that “Friendly” for the system and “Fun” for the dialogue were relational for four subjects and non-relational for the other nine.
As shown in Figure 7, the distributions of subjects in the “Awkward” and “Inhuman” graphs were all at 2 or more steps, except for one subject in each item. While “Awkward” and “Inhuman” functioned as evaluation items for the system, as shown in Figure 5, the analysis did not show distinct relationships between them and the dialogue evaluations. The likely reason is that these items are negative expressions for a dialogue partner and were therefore difficult to link with the positive dialogue evaluations in the questionnaire. These results also rule out a positive evaluation of awkward or inhuman behavior. In addition, since the impressions of words cannot easily be reversed, reverse-scored values were not used. One exceptional result for “Awkward” was one subject at 0 steps in the “Fun” section; this subject may have perceived awkwardness as humorous and linked it to “Fun”. One exceptional result for “Inhuman” was one subject at 1 step in the “Can have a smooth conversation” section; this subject may have perceived inhumanness as neutral and linked it to “Can have a smooth conversation”. Although such relationships are possible, these exceptions were too few, so “Awkward” and “Inhuman” were excluded from the following classification as items bearing no relationship to the dialogue evaluation.

3.3. The Classification of the System Evaluation Items

The system evaluation items are classified according to the types of relationships they have with each dialogue evaluation. The classification comprises four conditions, based on whether an item is relational or non-relational and whether the relationship is general or individual, derived from the characteristics of the subject distributions in Figure 7. The shape of the distribution in each graph of Figure 7 has one or two peaks, and the peak positions indicate the evaluation tendencies of the subject groups distributed around them. When there is one peak, the subjects form one group with the same evaluation tendency; when there are two peaks, the subjects form two groups with different tendencies. A peak at 0 or 1 step means that the corresponding subject group tends to relate the system evaluation to the dialogue evaluation, whereas a peak at 2 or more steps means that the group tends not to relate the two evaluations. Therefore, if one peak is at 0 or 1 step and the other is at 2 or more steps, one subject group tends to relate their evaluations of the system and the dialogue while the other group does not.
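The rules in the preceding paragraph can be condensed into a short sketch; the local-maximum peak test and the handling of boundary ties are simplified assumptions beyond what the paper specifies.

```python
def classify_item(hist: list[int]) -> str:
    """Classify one system evaluation item against one dialogue evaluation.

    hist[s] = number of subjects whose link has a strength of s steps.
    """
    # Local maxima of the distribution (plateaus mark every member as a peak).
    peaks = [s for s, v in enumerate(hist) if v > 0
             and (s == 0 or v >= hist[s - 1])
             and (s == len(hist) - 1 or v >= hist[s + 1])]
    near, far = sum(hist[:2]), sum(hist[2:])    # totals at 0-1 vs. 2+ steps
    # Equal counts at 1 and 2 steps straddle the boundary; as in Section 3.3,
    # assign that peak to whichever side has the larger total.
    if 1 in peaks and 2 in peaks and hist[1] == hist[2]:
        peaks.remove(1 if far > near else 2)
    near_peak = any(s <= 1 for s in peaks)
    far_peak = any(s >= 2 for s in peaks)
    if near_peak and far_peak:
        return "individually relational"
    return "generally relational" if near_peak else "non-relational"
```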
The classification results for the six evaluation items, excluding “Awkward” and “Inhuman”, are shown in Table 3. Items with one peak at 0 or 1 step are placed in the first column, and items with one peak at 2 or more steps in the fourth column. Items with two peaks, one at 0 or 1 step and the other at 2 or more steps, are placed in the second column, and items with two peaks both at 2 or more steps in the third column. The distribution of subjects for “Inspiring” in the “It listens” section has equal values at 1 and 2 steps; the peak is determined to be at 2 steps because the total number of subjects at 2 or more steps is larger than that at 0 or 1 step. The distribution for “Easy to talk to” in the “It listens” section also has equal values at 1 and 2 steps; here the peak is determined to be at 1 step because the total at 0 or 1 step is larger than that at 2 or more steps.
The items in the first column of Table 3 are generally relational items. The subjects formed one group and tended to relate the system evaluations for the items to the dialogue evaluation. The items in the second column are individually relational items. The subjects formed two groups, the evaluation tendencies differed across subjects for the items, and some subjects related the evaluations. The items in the third and fourth columns are non-relational items. In this study, the items in the third column are treated as simply non-relational, along with the items in the fourth column. Although there were two groups of subjects and they might differ in their evaluations for the items in the third column, neither group related them. Since the aim is to investigate how the subjects related the evaluations, how they did not relate them is not addressed. Future research on negative evaluations, such as indifference or negative impacts, could provide additional findings.

4. Discussion

In this experiment, the subjects communicated with the human-like dialogue system driven by generative AI, and the system behaved differently in the three interaction settings. The subjective evaluations of the system and the dialogues varied among individual subjects. Focusing on the individual evaluations, a cluster analysis of the links between the evaluations was conducted. Based on the analysis results, the cumulative results of all subjects’ evaluations were organized, as shown in Figure 7, and the system evaluation items were classified according to these cumulative results, as shown in Table 3. The “Awkward” and “Inhuman” items were excluded from the classification since they were not related to the positive dialogue evaluations. The irrelevance of “Inhuman” indicates that the human-likeness of the system was preferred in its dialogues.
To identify the properties that influence evaluations, the items in Table 3 are discussed. The subjects’ interpretations of the item words are considered with the premise that the system is not human.
The system evaluation items in Table 3:
  • The “Inspiring” item was developed to evaluate an attractiveness factor of a friend, that of providing positive inspiration [28]. The “Inspiring” item was related only to “Fun” among the dialogue evaluations; in other words, dialogues with an inspiring partner were fun. This is a reasonable result under a literal interpretation of “Inspiring”. The property of inspiration was evaluated through the item “Inspiring”.
  • The “Easy to talk to” item was developed to evaluate a sense of security. An interpersonal relationship with a sense of security means a relaxed relationship without worries or barriers. As shown in previous research on friendships, a sense of security is the most fundamental attractiveness factor of a friend, and the survey item “Easy to talk to” best reflects it [28]. The “Easy to talk to” item was related to three dialogue evaluations, all except “Can have a smooth conversation”. Of all the items, it was the most frequently and most generally related to positive evaluations across subjects. It is natural that a sense of security is a fundamental element of the relationship with the system, just as in interpersonal relationships. The property of a sense of security was evaluated through the item “Easy to talk to”.
  • The “Friendly” item was developed with the expectation that a friendly system would be evaluated positively as a dialogue partner. The “Friendly” item was related to “Can have a smooth conversation”, which indicates dialogue capabilities, and “Want to talk about other topics too”, which indicates a wish to continue the relationship, but not to “Fun” or “It listens”, which involve feelings. It seems that the “Friendly” item was interpreted as being collaborative, which refers to the system’s dialogue capability to make dialogue easy for the subjects. Compared with being friendly, being collaborative is a more plausible expectation for the system. The property of collaboration was evaluated through the item “Friendly”.
  • The “Independent” item was developed to evaluate a sense of distance, an attractiveness factor of a friend that maintains a moderate distance [28]. The “Independent” item was related to “Fun”, “It listens”, and “Want to talk about other topics too” among some subjects. In other words, dialogues with an independent partner made some subjects feel good and caused a wish to continue. This is a reasonable result for an interpretation of “Independent” as maintaining a moderate distance, similar to the case of human friends. The property of a sense of distance was evaluated through the item “Independent”.
  • The “Has personality” item was developed with reference to a survey about impressions of robots [29]. The words “Has personality” are reasonably interpreted literally regarding systems. The “Has personality” item was related to three dialogue evaluations, except for “Can have a smooth conversation”, among some subjects. It is natural that some subjects preferred a partner with a personality. The property of personality was evaluated through the item “Has personality”.
  • The “Sincere” item was developed to evaluate the attractiveness factor of a friend. Sincerity is an important factor in lasting friendships [28]. However, the “Sincere” item was not related to “Want to talk about other topics too”, which indicates a wish to continue the relationship. It was related to “Can have a smooth conversation” and “It listens” among some subjects, but not to “Fun”. When the dialogue system listened well and responded fluently, the system was evaluated as sincere. Unlike in interpersonal relationships, “Sincere” in the system was interpreted in terms of reliability rather than trust. Considering that an attentive attitude and conversational ability were evaluated individually through the word sincere, the “Sincere” item is referred to as seriousness. The property of seriousness was evaluated through the item “Sincere”.
The properties of the system related to the positive dialogue evaluations included inspiration, a sense of security, collaboration, a sense of distance, personality, and seriousness.
The properties are divided into two types, generally relational and individually relational, corresponding to the first and second columns of Table 3. The three properties of inspiration, a sense of security, and collaboration generally received positive evaluations. The other three properties of a sense of distance, personality, and seriousness affected individuals differently. To develop favorable dialogue systems, the former should be equipped as standard, while the latter should be adjusted to individuals. In particular, to accommodate individual preferences, personalizing the latter properties would be effective, for example, by tuning the level of interference with users’ actions, the degree of characterization, and the frequency of repeated confirmations (a sketch of such personalization follows this paragraph). For general use, on the other hand, modest designs for the latter properties are predicted to reduce the risk of the system being disliked. This prediction is consistent with previous research. Although there are individual differences, the settings supported by the largest number of people involve avoiding small talk, maintaining a formal tone, identifying the system as a bot, and asking follow-up questions [13]; that is, maintaining a formal distance, having less personality, and expressing a serious attitude. A dialogue system in a high-seriousness setting, with a single focused topic, receives both positive and negative evaluations, while a system in a lower-seriousness setting, with no focused topic, generally receives positive evaluations [21]. Patients are also more likely to self-disclose to an anonymous virtual agent, which has less personality, than to a human specialist in conversations about certain diagnostic topics [30].
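As a concrete illustration of such personalization, the sketch below folds three hypothetical per-user knobs, corresponding to a sense of distance, personality, and seriousness, into the kind of response instruction used in Table 1. All names, scales, and thresholds here are illustrative assumptions, not part of the authors’ system.

```python
from dataclasses import dataclass


@dataclass
class PersonaPreferences:
    """Hypothetical per-user settings for the three individually relational
    properties identified in this paper (all fields 0-4)."""
    distance: int     # 0 = casual and familiar ... 4 = formal, keeps distance
    personality: int  # 0 = neutral, bot-like ... 4 = strongly characterized
    seriousness: int  # 0 = playful small talk ... 4 = single-topic, attentive


def build_system_prompt(p: PersonaPreferences) -> str:
    """Fold the per-user settings into the polite-response instruction
    (paraphrased from Table 1)."""
    tone = "a casual, familiar" if p.distance <= 1 else "a polite, formal"
    character = ("Show a distinct personality in your word choice."
                 if p.personality >= 3 else "Keep a neutral character.")
    focus = ("Stay strictly on the announced topic and confirm understanding."
             if p.seriousness >= 3 else "Allow brief small talk around the topic.")
    return (f"Respond to the user in {tone} tone with approximately 25 words. "
            f"{character} {focus}")
```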
In this experiment, the sample size of 13 subjects was small relative to the analysis method, so the reliability of the results is limited. The findings of this study can explain participants’ responses reported in previous research; however, further verification is needed. In a future study, a test of a system personalized with respect to the properties described in this paper is planned to verify its effectiveness. In addition, the risks associated with AI dialogue systems must be noted. AI technology has been discussed in terms of social and ethical risks that may undermine people’s autonomy, privacy, and equity, and systems that include AI technology need to be developed ethically and safely for users [31,32,33]. Human-like AI dialogue systems adapted to individual preferences could become comfortable partners in daily life; however, they also involve these risks, so their development should be ethical.

5. Conclusions

Human-like behavior is preferred in AI-based dialogue systems, and such systems are required to adapt to users’ individual preferences. This study showed that system properties can be classified into two types: those that influence users’ dialogue evaluations generally and those that do so individually. The former, such as inspiration, a sense of security, and collaboration, are expected to improve dialogues for most users. The latter, such as a sense of distance, personality, and seriousness, affect users according to individual preferences, and adjusting them contributes to the personalization of systems.
The classification of human-like system properties according to the presence of individual preferences could be a helpful approach to improving relationship compatibility between users and systems. In future work, a system personalized with the properties described in this paper will be examined. Additionally, interactive systems targeting specific users, such as children and the elderly, are needed and should be researched [34,35]. Results different from those in this study could emerge for specific groups. In terms of addressing individuals, handling emotions with affective computing [36] should be included. Further research concerning these aspects is necessary.

Supplementary Materials

The following supporting information can be downloaded at: www.mdpi.com/article/10.3390/app15073466/s1, Table S1: Questionnaire results for all subjects.

Author Contributions

Conceptualization, K.A. and Z.L.; data curation, K.A.; formal analysis, K.A.; investigation, K.A.; methodology, K.A.; project administration, Z.L.; resources, C.Q., S.C. and Z.L.; software, K.A.; supervision, Z.L.; validation, Z.L.; visualization, K.A.; writing—original draft, K.A.; writing—review and editing, C.Q., S.C. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Research Ethics of the Graduate School of System Informatics at Kobe University, which adheres to the principles of the Declaration of Helsinki. The experimental protocol was approved by the Graduate School of System Informatics at Kobe University (29 November 2023).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article and the Supplementary File. Further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to express our sincere gratitude to all the subjects and related parties who cooperated in conducting this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Følstad, A.; Skjuve, M.; Brandtzæg, P.B. Different chatbots for different purposes: Towards a typology of chatbots to understand interaction design. Internet Sci. 2019, 11551, 145–156. [Google Scholar] [CrossRef]
  2. Durante, Z.; Huang, Q.; Wake, N.; Gong, R.; Park, J.S.; Sarkar, B.; Taori, R.; Noda, Y.; Terzopoulos, D.; Choi, Y.; et al. Agent AI: Surveying the horizons of multimodal interaction. arXiv 2024, arXiv:2401.03568. [Google Scholar]
  3. Zhang, J.; Oh, Y.J.; Lange, P.; Yu, Z.; Fukuoka, Y. Artificial Intelligence Chatbot Behavior Change Model for Designing Artificial Intelligence Chatbots to Promote Physical Activity and a Healthy Diet: Viewpoint. J. Med. Internet Res. 2020, 22, e22845. [Google Scholar] [CrossRef]
  4. Ayedoun, E.; Hayashi, Y.; Seta, K. Adding Communicative and Affective Strategies to an Embodied Conversational Agent to Enhance Second Language Learners’ Willingness to Communicate. Int. J. Artif. Intell. Educ. 2019, 29, 29–57. [Google Scholar] [CrossRef]
  5. Herrmann-Werner, A.; Festl-Wietek, T.; Junne, F.; Zipfel, S.; Madany Mamlouk, A. “Hello, my name is Melinda”—Students’ views on a digital assistant for navigation in digital learning environments; A qualitative interview study. Front. Educ. 2021, 5, 541839. [Google Scholar] [CrossRef]
  6. Kosinski, M. Evaluating large language models in theory of mind tasks. Proc. Natl. Acad. Sci. USA 2024, 121, e2405460121. [Google Scholar] [CrossRef]
  7. Jones, C.R.; Bergen, B.K. People cannot distinguish GPT-4 from a human in a Turing test. arXiv 2024, arXiv:2405.08007. [Google Scholar]
  8. Zhou, H.; Huang, M.; Zhang, T.; Zhu, X.; Liu, B. Emotional chatting machine: Emotional conversation generation with internal and external memory. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 730–738. [Google Scholar]
  9. Nißen, M.; Rüegger, D.; Stieger, M.; Flückiger, C.; Allemand, M.; v Wangenheim, F.; Kowatsch, T. The effects of health care chatbot personas with different social roles on the client-chatbot bond and usage intentions: Development of a design codebook and web-based study. J. Med. Internet Res. 2022, 24, e32630. [Google Scholar] [CrossRef] [PubMed]
  10. Miyanishi, T.; Hirayama, J.; Kanemura, A.; Kawanabe, M. Answering mixed type questions about daily living episodes. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 4265–4271. [Google Scholar] [CrossRef]
  11. Rapp, A.; Curti, L.; Boldi, A. The human side of human-chatbot interaction: A systematic literature review of ten years of research on text-based chatbots. Int. J. Hum. Comput. Stud. 2021, 151, 102630. [Google Scholar] [CrossRef]
  12. Kätsyri, J.; Förger, K.; Mäkäräinen, M.; Takala, T. A review of empirical evidence on different uncanny valley hypotheses: Support for perceptual mismatch as one road to the valley of eeriness. Front. Psychol. 2015, 6, 390. [Google Scholar]
  13. Svenningsson, N.; Faraon, M. Artificial intelligence in conversational agents: A study of factors related to perceived humanness in chatbots. In Proceedings of the 2019 2nd Artificial Intelligence and Cloud Computing Conference, Kobe, Japan, 21–23 December 2019; pp. 151–161. [Google Scholar] [CrossRef]
  14. Ahmad, R.; Siemon, D.; Gnewuch, U.; Robra-Bissantz, S. Designing personality-adaptive conversational agents for mental health care. Inf. Syst. Front. 2022, 24, 923–943. [Google Scholar] [CrossRef]
  15. Reeves, B.; Nass, C. The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places; Cambridge University Press: Cambridge, UK, 1996. [Google Scholar]
  16. Ho, A.; Hancock, J.; Miner, A.S. Psychological, relational, and emotional effects of self-disclosure after conversations with a chatbot. J. Commun. 2018, 68, 712–733. [Google Scholar] [CrossRef]
  17. Okada, Y.; Kimoto, M.; Iio, T.; Shimohara, K.; Shiomi, M. Two is better than one: Apologies from two robots are preferred. PLoS ONE 2023, 18, e0281604. [Google Scholar] [CrossRef]
  18. Hill, J.; Ford, W.R.; Farreras, I.G. Real conversations with artificial intelligence: A comparison between human-human online conversations and human-chatbot conversations. Comput. Hum. Behav. 2015, 49, 245–250. [Google Scholar] [CrossRef]
  19. Mou, Y.; Xu, K. The media inequality: Comparing the initial human-human and human-AI social interactions. Comput. Hum. Behav. 2017, 72, 432–440. [Google Scholar] [CrossRef]
  20. Brandtzæg, P.B.; Følstad, A. Why people use chatbots. Internet Sci. 2017, 10673, 377–392. [Google Scholar] [CrossRef]
  21. Abe, K.; Quan, C.; Cao, S.; Luo, Z. Subjective evaluation of dialogues with dyadic and triadic interactions using a ChatGPT-based dialogue system. In Proceedings of the 2024 Joint 13th International Conference on Soft Computing and Intelligent Systems and 25th International Symposium on Advanced Intelligent Systems (SCIS&ISIS), Himeji, Japan, 9–12 November 2024; pp. 1–6. [Google Scholar] [CrossRef]
  22. Gao, T.; McCarthy, G.; Scholl, B.J. The Wolfpack Effect: Perception of Animacy Irresistibly Influences Interactive Behavior. Psychol. Sci. 2010, 21, 1845–1853. [Google Scholar] [CrossRef]
  23. Kanakogi, Y.; Okumura, Y.; Inoue, Y.; Kitazaki, M.; Itakura, S. Rudimentary Sympathy in Preverbal Infants: Preference for Others in Distress. PLoS ONE 2013, 8, e65292. [Google Scholar] [CrossRef]
  24. Imaizumi, T.; Takahashi, K.; Ueda, K. Influence of appearance and motion interaction on emotional state attribution to objects: The example of hugging shimeji mushrooms. Comput. Hum. Behav. 2024, 161, 108383. [Google Scholar] [CrossRef]
  25. Loeffler, D.; Schmidt, N.; Tscharn, R. Multimodal expression of artificial emotion in social robots using color, motion and sound. In Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, Chicago, IL, USA, 5–8 March 2018; pp. 334–343. [Google Scholar] [CrossRef]
  26. Jack, R.E.; Garrod, O.G.B.; Yu, H.; Caldara, R.; Schyns, P.G. Facial expressions of emotion are not culturally universal. Proc. Natl. Acad. Sci. USA 2012, 109, 7241–7244. [Google Scholar] [CrossRef]
  27. Tomasello, M. Becoming Human: A Theory of Ontogeny; Belknap Press of Harvard University Press: Cambridge, MA, USA, 2021. [Google Scholar]
  28. Nishiura, M.; Daibo, I. The relationships between attraction of same-sex friend and relationship-maintenance motivation in the light of personal importance. Jpn. J. Interpers. Soc. Psychol. 2010, 10, 115–123. (In Japanese) [Google Scholar] [CrossRef]
  29. Chidori, H.; Matsuzaki, G. Impressions of a robot whose purpose is communication. In Proceedings of the 59th Annual Conference of Japanese Society for the Science of Design, Sapporo, Japan, 22–24 June 2012. (In Japanese) [Google Scholar] [CrossRef]
  30. Yokotani, K.; Takagi, G.; Wakashima, K. Advantages of virtual agents over clinical psychologists during comprehensive mental health interviews using a mixed methods design. Comput. Hum. Behav. 2018, 85, 135–145. [Google Scholar]
  31. Ortega-Bolaños, R.; Bernal-Salcedo, J.; Germán Ortiz, M.; Galeano Sarmiento, J.; Ruz, G.A.; Tabares-Soto, R. Applying the ethics of AI: A systematic review of tools for developing and assessing AI-based systems. Artif. Intell. Rev. 2024, 57, 110. [Google Scholar] [CrossRef]
  32. Laranjo, L.; Dunn, A.G.; Tong, H.L.; Kocaballi, A.B.; Chen, J.; Bashir, R.; Surian, D.; Gallego, B.; Magrabi, F.; Lau, A.Y.S.; et al. Conversational agents in healthcare: A systematic review. J. Am. Med. Inform. Assoc. 2018, 25, 1248–1258. [Google Scholar] [CrossRef] [PubMed]
  33. Sundar, S.S. Rise of Machine Agency: A Framework for Studying the Psychology of Human–AI Interaction (HAII). J. Comput. Mediat. Commun. 2020, 25, 74–88. [Google Scholar] [CrossRef]
  34. Sano, T.; Horii, T.; Abe, K.; Nagai, T. Temperament estimation of toddlers from child–robot interaction with explainable artificial intelligence. Adv. Robot. 2021, 35, 1068–1077. [Google Scholar] [CrossRef]
  35. Ring, L.; Shi, L.; Totzke, K.; Bickmore, T. Social support agents for older adults: Longitudinal affective computing in the home. J. Multimodal User Interfaces 2015, 9, 79–88. [Google Scholar] [CrossRef]
  36. Wang, Y.; Song, W.; Tao, W.; Liotta, A.; Yang, D.; Li, X.; Gao, S.; Sun, Y.; Ge, W.; Zhang, W.; et al. A systematic review on affective computing: Emotion models, databases, and recent advances. Inf. Fusion 2022, 83–84, 19–52. [Google Scholar] [CrossRef]
Figure 1. An overview of the dialogue system.
Figure 2. Face images displayed on the screen for five levels of emotional expression.
Figure 3. Triadic interaction structures between a child, a partner, and an object of interest.
Figure 4. Interaction structures between the subject, the dialogue system, and the topic, according to the settings of the dialogue system.
Figure 5. Results for the post-dialogue questionnaire.
Figure 6. Dendrograms from hierarchical cluster analysis for one of the subjects, with the step number in brackets after each item.
Figure 7. The distribution of the subjects for each number of steps from the dendrograms.
Table 1. Prompts to ChatGPT and the processing of the dialogue system. Examples in this table are for the topic “which is better, orange or apple”.

Features of the system’s behavior in each setting:
  • Dyadic interaction (with subject): The system responds to what a subject says; if the subject goes off-topic, the system follows the subject. Ex. Subject: “What do you like?” System: “One of my absolute favorite hobbies is traveling. Exploring new places and experiencing different cultures brings me so much joy!”
  • Dyadic interaction (with topic): The system reads out prepared sentences one at a time and does not respond to what a subject says. Ex. Subject: “What do you like?” System: “Both oranges and apples have their own wonderful qualities!”
  • Triadic interaction: The system responds topically to what a subject says; if the subject goes off-topic, the system sometimes points it out. Ex. Subject: “What do you like?” System: “I enjoy both oranges and apples, but my choice depends on my mood or the season. They’re both nutritious and delicious fruits, don’t you think?”

Dialogue progression:
  • Start: A neutral-level facial expression image is displayed, and a standard phrase is announced, e.g., “Hello. The first topic is which is a better fruit, orange or apple?”
  • Subject turn: Some voice input, e.g., “I like an orange. I think an orange is better”.
  • Processing, dyadic interaction (with subject): Prompt for a response: respond to a subject input in a polite tone with approximately 25 words. Prompt for emotional evaluation: rate happiness and sadness on a scale of 100 for “a subject input and a generated response”. The difference between the happiness and sadness degrees is converted into five levels and used to select a face image.
  • Processing, dyadic interaction (with topic): All response texts and face image selections are prepared in advance. The generated sentences about the topic are cut into appropriate lengths, and face images are selected in the same way as in the other settings, except without subject inputs.
  • Processing, triadic interaction: Prompt for a response: respond topically to a subject input in a polite tone of approximately 25 words on a topic. Prompt for emotional evaluation and face image selection: the same as in the dyadic interaction (with subject) setting.
  • System turn: The selected face image is displayed, and the response text is read out.
  • Repeat: The Subject turn, Processing, and System turn are repeated for 5 min.
  • End: A standard phrase is announced: “Thank you. That’s all for this topic. Please fill out the questionnaire. Please talk to me after you’ve finished”.
Table 2. Questionnaire results for one of the subjects.

Evaluation item | Dyadic interaction (with subject) | Dyadic interaction (with topic) | Triadic interaction
System: Friendly | 5 | 3 | 4
System: Awkward | 3 | 5 | 3
System: Independent | 4 | 4 | 4
System: Sincere | 4 | 4 | 5
System: Inspiring | 5 | 2 | 5
System: Has personality | 5 | 2 | 2
System: Inhuman | 2 | 5 | 4
System: Easy to talk to | 5 | 2 | 5
Dialogue: Fun | 4 | 2 | 5
Dialogue: It listens | 4 | 2 | 6
Dialogue: Can have a smooth conversation | 4 | 2 | 4
Dialogue: Want to talk about other topics too | 4 | 2 | 4
Table 3. The classification of the system evaluation items.

Dialogue evaluation | Generally relational | Individually relational | Non-relational (two peaks) | Non-relational (one peak)
Fun | Inspiring, Easy to talk to | Independent, Has personality | Friendly | Sincere
It listens | Easy to talk to | Independent, Sincere, Has personality | Friendly | Inspiring
Can have a smooth conversation | Friendly | Sincere | Easy to talk to | Independent, Inspiring, Has personality
Want to talk about other topics too | Friendly, Easy to talk to | Independent, Has personality | Sincere | Inspiring
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
