Submit to this Journal Review for this Journal Propose a Special Issue

Article Menu

Share Help Cite Discuss in SciProfiles

Open AccessFeature PaperArticle

Peer-Review Record

Classification of Properties in Human-like Dialogue Systems Using Generative AI to Adapt to Individual Preferences

Appl. Sci. 2025, 15(7), 3466; https://doi.org/10.3390/app15073466

by Kaori Abe^*

, Changqin Quan, Sheng Cao

and Zhiwei Luo

Reviewer 1: Anonymous

Reviewer 2:

Paulo Tasinaffo

Appl. Sci. 2025, 15(7), 3466; https://doi.org/10.3390/app15073466

Submission received: 17 February 2025 / Revised: 13 March 2025 / Accepted: 20 March 2025 / Published: 21 March 2025

(This article belongs to the Special Issue Human-Computer Interaction: Challenges, Opportunities and Emerging Developments, 2nd Edition)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Positive aspects:

Interesting topic. Potentially significant educational applications.

However, the study has significant methodological problems that the authors must explain and improve the manuscript.

Weaknesses:

The study has several serious methodological shortcomings that impact the overall conclusion and validity of the manuscript.

There is no background information about the survey participants. Are they members of the general population or a specific group? The distribution of their age is quite large 14.4 years vs. 35.8 years average (assuming normal distribution). This is very negative if the group is not homogenous. The factors affecting the social and demographic homogeneity of this group must be explained in detail.

The research is based on a survey of only N = 13 individuals. Also, the survey questionnaire contains only 12 questions. Therefore, in total the study is based only on 13 * 12 = 156 datapoints. This is very low. How reliable are these results? The authors must justify how 156 datapoints can be used to make comprehensive and far-reaching conclusions.

This is not clear from the data, and the study appears to be irrelevant. Therefore, the authors must prove the reliability of their conclusions using appropriate statistical metrics.

Table 2 (page 7) presents questionnaire results for one of the subjects. What is the significance of this specific table for the overall quality of the manuscript, given that the same table can be displayed for all N = 13 subjects?

On what basis was this subject selected? Is he/she a typical representative in some way or an outlier? The text (rows 174-175) explains nothing about these important issues. The authors must either explain the reason for Table 2, show Table 2 for all subjects or delete the table.

Again, Figure 6 shows hierarchical cluster analysis for one of the users. The same questions remain as for Table 2 and must be answered.

The Discussion paragraph presents only a qualitative and descriptive analysis of the survey results. A quantitative analysis of the robustness of the results is missing and must be included in the revised version.

Finally, the authors provide only 16 references, which is completely inadequate. The reference section must be significantly expanded to include additional relevant research.

Author Response

Dear reviewer,

Thank you very much for taking the time to review our manuscript. We appreciate these valuable comments. Please find the detailed responses below and the corresponding revisions and corrections highlighted in the re-submitted files.

Comments 1: There is no background information about the survey participants. Are they members of the general population or a specific group? The distribution of their age is quite large 14.4 years vs. 35.8 years average (assuming normal distribution). This is very negative if the group is not homogenous. The factors affecting the social and demographic homogeneity of this group must be explained in detail.

Response 1: Thank you for pointing this out. We would like to explain that the subject group includes university students and working adults, whose ages range from their twenties to sixties. The subjects participated in the experiment designed to investigate evaluations based on individual preferences as healthy adults. We have added the following text to provide more information about the group in Section 2-5:

(page 6, line 183) “The subjects were 13 healthy adults (5 males, 8 females, age 35.8 ± 14.4 years) including university students and working adults.”

Comments 2: The research is based on a survey of only N = 13 individuals. Also, the survey questionnaire contains only 12 questions. Therefore, in total the study is based only on 13 * 12 = 156 datapoints. This is very low. How reliable are these results? The authors must justify how 156 datapoints can be used to make comprehensive and far-reaching conclusions.

This is not clear from the data, and the study appears to be irrelevant. Therefore, the authors must prove the reliability of their conclusions using appropriate statistical metrics.

Response 2: Thank you for pointing this out. The study is at the beginning stage of research on relationships between humans and emerging human-like systems. We would like to expand the survey and accumulate more results. We have confirmed the limitation in the discussion section and added the following text to provide more information about future work in the conclusion section:

(page 14, line 424) “In future work, a system personalized with the properties described in this paper will be examined. Additionally, interactive systems targeting specific users, such as children and the elderly, are needed and researched [34, 35]. Different results from this study could be shown in specific groups. In terms of addressing individuals, handling emotions with affective computing [36] should be included. Further research considering these aspects is necessary.”

Comments 3: Table 2 (page 7) presents questionnaire results for one of the subjects. What is the significance of this specific table for the overall quality of the manuscript, given that the same table can be displayed for all N = 13 subjects?

Response 3: Thank you for pointing this out. We would like to explain that the purpose of Table 2 is to provide a concrete example of the questionnaire results to illustrate the analysis method. The results from this subject were selected because these were relatively suitable for illustration. The results for all subjects are not included in the main text for conciseness. They are available in the supplementary file. Therefore, we have added the following text to explain: (page 7, line 217) “as an example”

Comments 4: Again, Figure 6 shows hierarchical cluster analysis for one of the users. The same questions remain as for Table 2 and must be answered.

Response 4: The purpose of Figure 6 is to provide a concrete example of the analysis results to illustrate the aggregation method from dendrograms to create the graphs in Figure 7. The selection and omission reasons are the same as above. Therefore, we have added the following text to explain: (page 8, line 231) “as an example”

Comments 5: The Discussion paragraph presents only a qualitative and descriptive analysis of the survey results. A quantitative analysis of the robustness of the results is missing and must be included in the revised version.

Response 5: Thank you for pointing this out. We have added the following text to enhance the explanation of analysis in the discussion section:

(page 12, line 321) “Focusing on the individual evaluations, cluster analysis of the links in the evaluations was conducted. Based on the analysis results, the cumulative results of all subjects’ evaluations were organized, as shown in Figure 7. According to the cumulative results, the system evaluation items were classified, as shown in Table 3. Based on the cluster analysis results of links in the evaluations, the system evaluation items were classified, as shown in Table 3.”

Comments 6: Finally, the authors provide only 16 references, which is completely inadequate. The reference section must be significantly expanded to include additional relevant research.

Response 6: Thank you for pointing this out. We have added the following text to the relevant descriptions, increased the bibliographic references, and renumbered the references:

(page 1, line 33) “including everyday issues such as healthcare, education or the whole lifestyle [3–5].”

(page 1, line 39) “, and it was recognized as a human in a Turing test [7].”

(page 2, line 44) “, introducing emotions, personas, or personal stories [8–10].”

(page 2, line 45) “The uncanny valley hypothesis, which concerns eerie feelings arising as hu-man-likeness increases, is inconclusive [12], and that is still a complex issue with user feelings for human-like systems including dialogue systems.”

(page 2, line 53) “For systems, personality adaptiveness can become a valuable feature [14].”

(page 2, line 56) “Systems can demonstrate human-like value as communication partners. The media equation, which suggests that interactive systems can be perceived as equally human [15], is widely recognized. It is shown that conversations with chatbots have positive psychological effects similar to conversations with humans [16] and that apologies from robots have value for people [17].”

(page 2, line 67) “Since indicating social behavior in interactions is important for humans to recognize non-living things as living with empathy [22–24], dialogues with the system were evaluated by progressively developing the social interaction structures.”

(page 13, line 410) “and safely for users [31–33].”

(page 14, line 425) “Additionally, interactive systems targeting specific users, such as children and the elderly, are needed and researched [34, 35].”

(page 14, line 427) “In terms of addressing individuals, handling emotions with affective computing [36] should be included.”

Sincerely,

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

This article focuses on individual preferences in system behavior to improve personal relationship compatibility between users and systems. In this study, the relations between the properties of a human-like dialogue system and the dialogue evaluations were investigated using hierarchical cluster analysis for individual subjects.

The article is well-written and well-founded. Therefore, I favor publishing it in the Applied Science Journal, as long as the authors make the small but important improvements mentioned below in this article. Here are the suggested improvements:

End Section “1. Introduction” with a brief paragraph stating what will be developed in the rest of the article. For example, “In Section e it is described... In Section 3, it is developed...”
The article is weak in bibliographic citations. I suggest performing a bibliographic search in the cloud and increasing the bibliographic references from 16 to at least 35. To this end, add Section “2. Related Works” to this article. All this is because of the richness of the subject addressed.
In line 48, replace “haven’t” with “have not”. Avoid abbreviated English terms in scientific articles.
Reference [9] cited in the Introduction of the article (line 57) is a bit strange since it is listed as “unpublished” in the bibliographic references at the end of the article.
When you talk about the originality of your article in the last paragraph of the Introduction, it would be interesting to list all of these originalities. Also, make it very clear how original this article is in relation to reference [9], which you also authored.
In line 82, when you cite Table 1 for the first time, dedicate a generous paragraph explaining this Table. How are the items in this Table organized?
There is a missing comma in the caption of Figure 3.
This is very important! Create Section “2.6 General Algorithms”. In this Section, you should include a computational algorithm explaining the method you developed in detail. The text contained in Section 2.5 is attached to a style of writing typical of psychology and is not written in the style of Computer Engineering. This point of view is necessary because a scientific article must be reproducible by anyone who reads it anywhere in the world.
Start Section “3. Results” with a brief paragraph explaining what will be done in it. Do not go directly to Section 3.1 without writing anything before this.
A full stop is missing from the captions of Tables 2 and 3.
Write the titles of Sections 3.1, 3.2, and 3.3 with all words beginning with a capital letter, as was done in Sections 2.1 and 2.2. Maintain a standard writing style.
The Conclusion of the article can still be improved. For example, future improvements to this work, which are described at the end of Section 4, can be moved to the end of the Conclusion.

Sincerely,

The Reviewer.

Author Response

Dear reviewer,

Comments 1: End Section “1. Introduction” with a brief paragraph stating what will be developed in the rest of the article. For example, “In Section e it is described... In Section 3, it is developed...”

Response 1: Thank you for pointing this out. We have added the following text to the end of the introduction section:

(page 2, line 83) “The remainder of this paper is structured as follows. Section 2, Methods, describes the dialogue system, questionnaire, and experimental conditions. Section 3, Results, presents, first, responses to the questionnaire, second, the aggregated results from analysis of the responses, third, the classification of system evaluation items based on the results. Section 4, Discussion, considers the properties of the dialogue system evaluated through these items. Section 5, Conclusions, summarizes the findings and describes future work.”

Comments 2: The article is weak in bibliographic citations. I suggest performing a bibliographic search in the cloud and increasing the bibliographic references from 16 to at least 35. To this end, add Section “2. Related Works” to this article. All this is because of the richness of the subject addressed.

Response 2: Thank you for pointing this out. We have added the following text to the relevant descriptions, increased the bibliographic references, and renumbered the references:

(page 1, line 33) “including everyday issues such as healthcare, education or the whole lifestyle [3–5].”

(page 1, line 39) “, and it was recognized as a human in a Turing test [7].”

(page 2, line 44) “, introducing emotions, personas, or personal stories [8–10].”

(page 2, line 53) “For systems, personality adaptiveness can become a valuable feature [14].”

(page 13, line 410) “and safely for users [31–33].”

(page 14, line 425) “Additionally, interactive systems targeting specific users, such as children and the elderly, are needed and researched [34, 35].”

(page 14, line 427) “In terms of addressing individuals, handling emotions with affective computing [36] should be included.”

Comments 3: In line 48, replace “haven’t” with “have not”. Avoid abbreviated English terms in scientific articles.

Response 3: Thank you for pointing this out. We have corrected this as follows: (page 2, line 54) “have not”

Comments 4: Reference [9] cited in the Introduction of the article (line 57) is a bit strange since it is listed as “unpublished” in the bibliographic references at the end of the article.

Response 4: We agree with this comment. We would like to explain that reference [9] (now renumbered as [21]), which is our previous study report, is currently under review for another Special Issue. We will update the information or remove it from the list when the process is completed.

Comments 5: When you talk about the originality of your article in the last paragraph of the Introduction, it would be interesting to list all of these originalities. Also, make it very clear how original this article is in relation to reference [9], which you also authored.

Response 5: Thank you for pointing this out. We have added the following text in the introduction section to distinguish the originality:

Comments 6: In line 82, when you cite Table 1 for the first time, dedicate a generous paragraph explaining this Table. How are the items in this Table organized?

Response 6: Thank you for pointing this out. We have added the explanation of Table 1 in Section 2-4, General Algorithms, as Response 8. Thus, the content about Table 1 has been organized by removing the following unnecessary sentences from Sections 2-1 and 2-2:

(page 3, line 102) “system’s voice output. The prompts and processing are shown in Table 1.”

(page 3, line 112) “is positive or negative. The prompts and processing are shown in Table 1.”

(page 4, line 134) “according to each setting. The operation of the dialogue system in the settings is shown in Table 1.”

Comments 7: There is a missing comma in the caption of Figure 3.

Response 7: Thank you for pointing this out. We have corrected this as follows: (page 4, line 136) “a partner, and an object of interest.”

Comments 8: This is very important! Create Section “2.6 General Algorithms”. In this Section, you should include a computational algorithm explaining the method you developed in detail. The text contained in Section 2.5 is attached to a style of writing typical of psychology and is not written in the style of Computer Engineering. This point of view is necessary because a scientific article must be reproducible by anyone who reads it anywhere in the world.

Response 8: Thank you for pointing this out. We have added Section 2-4, General Algorithms, and provided an explanation of Table 1 in this section as follows:

(page 5, line 149)

2-4. General Algorithms

The prompts and the processing for the dialogue system in each setting and examples of the interaction are shown in Table 1. On the vertical axis of Table 1, actions of the subject and system are listed with the progression of the experiment. On the horizontal axis, the experimental conditions, which are three types of interaction settings, are arranged. The top columns show the features of the system’s behavior in each setting.

In the response generation prompts, ChatGPT (gpt-3.5-turbo-0613 model, Open-AI) is prompted in system role to respond to a subject input in a polite tone of ap-proximately 25 words, and in assistant role to provide a reply to a subject input or to provide a reply to a subject input on a specified topic.

In the emotional evaluation prompts, ChatGPT (gpt-3.5-turbo-0613 model, Open-AI) is prompted in assistant role to provide a happiness degree and a sadness degree on a scale of 100 evaluating the emotions present in a combination of a subject input and a generated response to this. The difference in numbers between happiness and sadness degrees is converted into five levels with equal intervals, and these levels are used to select a face image.

Unity (Unity Technologies) was used to integrate and present these functions.

Comments 9: Start Section “3. Results” with a brief paragraph explaining what will be done in it. Do not go directly to Section 3.1 without writing anything before this.

Response 9: Thank you for pointing this out. We have added the following text to explain in the results section:

(page 6, line 203) “All the subjects interacted with the system and responded to the questionnaire. First, the responses are presented from the perspective of overall trends and individual subjects. Second, hierarchical cluster analysis was conducted on each subject. The analysis results were aggregated to investigate the cumulative results of all subjects’ evaluations. The process is illustrated using the response and dendrograms of one subject as an example. Third, the system evaluation items were classified according to the cumulative results.”

Comments 10: A full stop is missing from the captions of Tables 2 and 3.

Response 10: Thank you for pointing this out. We have corrected this by adding full stops to the captions of Tables 2 and 3, (page 8, line 225) and (page 11, line 317).

Comments 11: Write the titles of Sections 3.1, 3.2, and 3.3 with all words beginning with a capital letter, as was done in Sections 2.1 and 2.2. Maintain a standard writing style.

Response 11: Thank you for pointing this out. We have corrected this by capitalizing the titles of Sections 3.1, 3.2, and 3.3 as follows:

(page 7, line 209) “3-1. Questionnaire Result”

(page 8, line 226) “3-2. Analysis of Links Between Evaluations to the System and the Dialogue”

(page 11, line 281) “3-3. Classification of the System Evaluation Items”

Comments 12: The Conclusion of the article can still be improved. For example, future improvements to this work, which are described at the end of Section 4, can be moved to the end of the Conclusion.

Response 12: Thank you for pointing this out. We have added the following text to the conclusion section:

Sincerely,

Article Menu

Classification of Properties in Human-like Dialogue Systems Using Generative AI to Adapt to Individual Preferences

Further Information

Guidelines

MDPI Initiatives

Follow MDPI