Article

Effects of Speech Level Shift Tested by a Non-Task-Oriented Dialog System on Text Chat Dialog with Users in Japanese: A Pilot Study

by Nozomu Nagai 1,†, Tomoki Miyamoto 2,*,† and Daisuke Katagami 1,*

1 Graduate School of Engineering, Tokyo Polytechnic University, Iiyama-Minami 45-1, Atsugi-shi 105-0123, Kanagawa, Japan
2 Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu 182-8585, Tokyo, Japan
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2025, 15(7), 3897; https://doi.org/10.3390/app15073897
Submission received: 14 November 2024 / Revised: 14 March 2025 / Accepted: 21 March 2025 / Published: 2 April 2025
(This article belongs to the Section Robotics and Automation)

Abstract

Recently, interaction between humans and dialog systems has become increasingly common and sophisticated. Humans establish good relationships with others using various linguistic considerations (e.g., politeness and speech level) in dialog. However, the effect of linguistic considerations used by dialog systems has remained unclear. This study examines the effects of the speech level shift used by a text chat dialog system in Japanese-language user dialog. We designed a rule-based, non-task-oriented text dialog system that controls formal and informal speech levels for Japanese dialog; the effects of a shift in the speech level used by the dialog system were verified through psychological experiments using text chats (n = 134). The speech level control method was constructed with reference to statistical information from the BTSJ Japanese natural conversation corpus and knowledge of linguistic considerations (politeness). The results of the experiment showed that 41.3% of the participants who interacted with the dialog system that shifted the speech level also shifted their speech levels in response. Moreover, a subjective evaluation revealed that participants who noticed the speech level shift of the dialog system felt that this system paid attention to its relationship with the participants. The experimental results suggest that a dialog system can realize dialog in which the user and system both adjust their relationship through dynamic speech level shifts. This study identifies the importance of dynamic speech level shifts, an important factor for expressing politeness, in interactions between humans and dialog systems.

1. Introduction

Research on dialog systems has attracted significant attention in recent years. Dialog systems can be categorized into task-oriented dialog systems, which support a specific user task, and non-task-oriented dialog systems, which are designed for chatting. As an example of a task-oriented dialog system, Siri [1] primarily performs specific tasks such as smartphone operation and information retrieval. Rinna [2], a chatbot designed to carry out casual conversations with users, is classified as a non-task-oriented dialog system. Rinna is an AI dialog system that was developed with the persona of a "Japanese high school girl" and mainly speaks in the style used among Japanese friends (i.e., an informal style; e.g., using "dayo" as a sentence-ending form that marks informal speech in Japanese). The speech level, a major feature of Japanese, allows the formality of an utterance to be adjusted. Because the user is a customer receiving a service, the Japanese version of Siri mainly uses honorifics (i.e., a formal style; e.g., using "desu" as a sentence-ending form that marks formal speech in Japanese) as its speech level.
Users of languages with different speech levels generally apply these levels depending on the contents of the dialog and the social (psychological) distance between themselves and the other party [3]. For example, speakers commonly use the informal speech level when chatting among friends, whereas they use the formal style when meeting someone for the first time. Another dialog strategy that is used by Japanese speakers is the expression of their intention to reduce the psychological distance by dynamically switching their speech level from formal to informal during a particular dialog. This dialog strategy of dynamically manipulating the speech level is known as a “speech level shift” in the field of sociolinguistics [4]. In this paper, we define the act of shifting speech levels in real-time within a single conversation as a “dynamic speech level shift”. Conversely, speech level shifts that occur across conversations (for example, from one day to the next) are defined as “static speech level shifts”. These definitions are used throughout the paper as required. Several studies on the politeness of utterances in dialog systems have referred to knowledge obtained through sociolinguistics [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]. Here, the word “politeness” refers to dialog strategies for decreasing conflict in human relationships (i.e., linguistic considerations) and not in the general sense (i.e., formality as a linguistic form). This is based on Brown and Levinson’s politeness theory [21], a typical framework for modeling interpersonal communication in sociolinguistics and pragmatics. An overview of this politeness theory is presented in Section 2.1. In sociolinguistics and pragmatics, the usage of honorifics in Japanese (including speech level shifts), a crucial factor of politeness, has been discussed by a wide range of researchers, including those conducting empirical research based on human dialog [22,23] and those conducting theoretical research advocating for improvements to Brown and Levinson’s politeness theory [24].
Previous studies on honorifics and politeness in dialog systems have been dominated by psychological experiments in which robots or virtual agents apply specific strategies based on the linguistic considerations of politeness theory, followed by an examination of the effects of these strategies [5,6,7,8,9,10,11,12,13]. Studies of text-based chatbots have included research examining the effects of static speech level shifts across days [25] and psychological experiments examining the effects of linguistic considerations from a broad perspective that encompasses politeness theory [16]. Recent engineering research has proposed a politeness-aware dialog system model based on neural network-based machine learning [18], as well as several methods for predicting and classifying politeness in conversation using transformer-based large language models [26,27,28,29,30]. Additionally, previous work has suggested that ChatGPT may be able to take advantage of the dialog strategies defined in politeness theory [31], increasing our knowledge of how to use politeness in human-machine interactions [32]. However, despite numerous studies on politeness in dialog systems from both psychological and engineering viewpoints, the impact of speech level shifting, a fundamental strategy that uses linguistic considerations to adjust the psychological distance to the user, remains unclear in interactions with dialog systems.
This study examined the effects of a dynamic speech level shift by a text chat dialog system on dialog with users in Japanese. We developed a non-task-oriented text dialog system that dynamically controls the formal and informal speech levels in Japanese dialog and verified its effectiveness through psychological experiments. We adopted the formal speech level ratio in the utterances of the user (the rate of using the formal style) as a trigger for a speech level shift. Furthermore, with reference to politeness theory [21], which is a framework of linguistic considerations in human dialog, we designed dialog situations that take into account information regarding the speaker. Three experimental conditions were set: a shifting speech level, a formal style only, and an informal style only. We performed analyses of these three conditions based on a subjective evaluation by the users and the dialog logs between the user and the dialog system. This study provides valuable insights into the design of dialog systems by elucidating the effects of dynamic speech level shifts. Such shifts are crucial for conveying politeness in linguistic communication, including dialog between dialog systems and humans. It is important to note that the focus of this research is to investigate the effects of dynamic speech level shifts in dialog systems on user interaction through psychological experiments. The study does not claim to have developed novel engineering methods or evaluated their usefulness.
The remainder of this paper is organized as follows: Section 2 outlines related studies. The non-task-oriented dialog system for controlling the speech level is described in Section 3. The evaluation experiments and their results are explained in detail in Section 4. The overall results of the evaluation experiments are discussed in Section 5. Finally, a summary of this research is presented in Section 6.

2. Related Work

In this section, we introduce the key concepts in this paper, namely politeness theory and speech level shifts, and review previous chatbot studies that mentioned the effects of speech level shifts.

2.1. Politeness Theory and Speech Level Shift

Politeness theory [21] is a framework that systematizes the linguistic considerations that people use to build strong relationships with their dialog partners. The key concepts in politeness theory are the desire to be liked by others, which is known as positive face, and the desire not to be disturbed or imposed upon by others, which is known as negative face. The degree of face-threatening (FT) is determined by the social distance between the speaker and hearer, their relative power relations, and the burden of FT acts within the culture. The FT degree is determined by Equation (1).
W_x = D(S, H) + P(S, H) + R_x.    (1)
As indicated in Equation (1), the FT level (W_x) is obtained as the sum of the social distance (D) between the speaker (S) and hearer (H), the relative power relationship (P), and the degree of burden of act x in the culture of concern (R_x). An example of a specific situation and speaker attributes relating to D, P, and R_x is presented in Table 1. In this study, the speaker attributes from Table 1 were considered when setting the speech level of the utterances of the dialog system; that is, we applied this specific aspect of politeness theory to the implementation of the system.

It is important to acknowledge that, within the humanities, Brown and Levinson's politeness theory has been the subject of extensive critique and has spurred the development of alternative theoretical frameworks. For instance, some researchers have pointed out that Brown and Levinson's theory cannot comprehensively express social conventions outside of Western culture and cannot be applied to Japanese-language contexts [24,33]. Specifically, Ide [24] contends that Brown and Levinson's theory fails to capture the nuances of Japanese politeness, particularly the use of honorifics and context-dependent politeness strategies. Similarly, Matsumoto [33] criticizes the universality of Brown and Levinson's concepts of "face" and "politeness", arguing that they are culturally specific and do not adequately reflect Japanese cultural norms. These criticisms suggest that Equation (1) is an oversimplified model of interpersonal relationships. Conversely, Brown and Levinson's theory [21] is supported by many politeness researchers as a universal theory of language usage. Usami [34] offers a reasoned critique of work that strongly criticizes Brown and Levinson's theory, while also supporting the theory itself and proposing partial improvements to it. This debate is ongoing, and numerous papers on the subject are still being published; however, a detailed discussion is outside the scope of this research.

Here, we introduce typical findings that focus on the use of honorifics in Japanese. Notably, in Japanese, honorifics are used based on social conventions (pragmatic constraints) rather than as a strategy for interaction [24]. However, few studies have statistically investigated the use of honorifics based on actual dialog data. In previous studies that analyzed the use of honorifics based on relatively small-scale dialog data (12 conversations) [22,23], the subject was first-meeting dialog between ordinary working adults, and D and R_x in Equation (1) were controlled by the experimental design. The results suggested that honorifics were used at a similar rate regardless of the value of P, and this finding was supported by later work [34]. Furthermore, it was shown that a shift in speech level from formal to informal is more likely to occur when the interlocutor is younger or of lower social status. In this study, we implemented a method for speech level shifting in the dialog system, as described in Section 3, considering the findings of previous studies based on a limited amount of dialog data, such as those in [35].
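To make Equation (1) concrete, the short sketch below computes W_x for two hypothetical situations. The 0-1 scales and the specific values of D, P, and R_x are illustrative placeholders, not values taken from Table 1 or from the paper.

```python
# Illustrative computation of Equation (1): W_x = D(S, H) + P(S, H) + R_x.
# The 0-1 scales and values below are hypothetical placeholders, not taken from the paper.

def face_threat_weight(distance: float, power: float, burden: float) -> float:
    """Return W_x, the estimated weight of a face-threatening act."""
    return distance + power + burden

# First meeting between status-equal speakers making a small request.
w_small = face_threat_weight(distance=0.7, power=0.0, burden=0.1)

# Same speakers, but the request imposes a larger burden (higher R_x).
w_large = face_threat_weight(distance=0.7, power=0.0, burden=0.8)

print(w_small, w_large)  # a larger W_x calls for a more polite (more formal) strategy
```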

2.2. Effects of Speech Level and Politeness on Chatbot

Nagai et al. [37] developed a dialog system that automatically generates utterances based on the Seq2Seq model. They conducted experiments on the generation of formal- and informal-style utterances and evaluated the impressions these utterances produced. The experimental results revealed that an informal utterance style improves the enjoyment of the dialog for the user, whereas a formal utterance style tends to give a polite impression. The objective of that study was to automatically generate utterances at different speech levels; the impact of shifting between these speech levels was not examined. A module for the automatic generation of polite utterances was also developed in a recent study focusing on English [19].
In a study on a dialog system that switches the speech level from formal to informal in a static manner, Kageyama et al. [25] used an example-based method that determines the response of the dialog system according to its similarity with previously collected dialogs. The authors developed a dialog system that uses both speech levels in the collected example data. They conducted a user study in which users interacted with the example-based dialog system continuously for three days, and the speech level of the dialog system was switched statically, starting from a formal style on the first day and gradually shifting toward an informal style over the following days. The experimental results demonstrated that the dialog system that performed a static speech level shift improved the users' evaluations of satisfaction and friendliness with each passing day. In particular, the effect of a gradual change in speech level was more pronounced than that of the condition in which the speech level did not change. Furthermore, it was suggested that even if the user was unaware of the change in speech level, they might subconsciously develop a positive impression. This prior study suggests that static speech level shifting in dialog systems is beneficial. However, the effect of a dialog system that performs dynamic speech level shifting, which humans naturally perform within a dialog, is unknown. Other chatbot studies have examined the effects of gratitude and apologies [20], as well as the relationship between the user's gender, age, and personality and the effects of polite utterances [17].
Furthermore, studies on a robot that assists in vehicle driving by providing spoken utterances have used politeness theory as a reference for utterance design [5]. These works reported the effect of informal-style support on the perceived friendliness of the robot and found that the ratio of users who prefer an informal style, a formal style, or neither is 4:1:1. In prior research on politeness in dialog systems targeting Japanese [5,36], politeness strategies from politeness theory (e.g., apologizing, joking, and praising) have been applied to utterance design in conjunction with speech levels [8,37,38,39,40]. In addition, multimodal dialog using embodied robots and virtual characters is a relatively frequent research subject [8,10,11,12,13,38]. In contrast, this study aimed to verify the impact of the speech level alone in a text-based dialog system. In dialog where text is the only modality, even a change in speech level can significantly affect the interaction between the user and the system.

3. Methods

This section describes the experimental dialog system and dialog rules that we constructed to investigate the effects of dynamic speech level shifts in dialog systems through psychological experiments. The experiments were approved by the Research Ethics Committee of Tokyo Polytechnic University. All the experiments were carried out in accordance with the guidelines of the Research Ethics Committee of Tokyo Polytechnic University.

3.1. Dialog System with Speech Level Control

We describe a non-task-oriented dialog system that provides dynamic speech level control in dialog. A schematic of the dialog system is presented in Figure 1. The dialog system receives the text of the user's utterance and performs morphological analysis using MeCab (https://taku910.github.io/mecab/, accessed on 31 March 2025) [41]. The speech level of the user's utterance is identified based on the results of this analysis: the dialog system determines that the speech level is the formal style when the user's utterance contains "desu" or "masu". Figure 1 shows an example case in which the user's utterance is given in the formal style. The results of the morphological analysis are recorded in the user dialog log along with the utterance ID, utterance content, and formal style rate. The formal style rate (T_formal / T_utterance) is the proportion of utterances made in the formal style by the user during the dialog; it is calculated by counting the number of times the user used the formal style (T_formal) and dividing this by the total number of user utterances (T_utterance). The formal style rate is updated each time the user speaks.

The speech level of the next utterance of the dialog system is determined based on the most recent formal style rate. Specifically, if the most recent formal style rate is greater than or equal to a threshold (θ), the dialog system's utterance is set to the formal style; otherwise, it is set to the informal style. Based on the chosen speech level, the next specific utterance is chosen from the dialog system's utterance list. This system is rule-based (scenario-based), and the dialog progresses according to a single scenario that is set in advance. We created a dialog scenario based on the dialog situation (see Table 2) that was adopted in the Situation Track of the Dialog System Live Competition 2 [42]. For each round of dialog, we prepared a list of dialog system utterances, with one formal-style and one informal-style candidate for each turn, and one candidate was selected based on the chosen speech level. The utterance that is finally selected by the dialog system is output to the user. The user then inputs a reply to the new utterance from the dialog system, and the dialog proceeds in the same manner.
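The turn-handling flow described above can be summarized in the following sketch. It is a minimal reconstruction of the rule-based loop in Figure 1, assuming the mecab-python3 binding for morphological analysis; the class name, helper functions, and the simplified token check are illustrative and are not the authors' implementation.

```python
# Minimal sketch of the speech level control flow in Figure 1 (not the authors' code).
# Assumes the mecab-python3 package with an installed dictionary; names are illustrative.
import MeCab

THETA = 0.296  # threshold derived from Equation (2), see below

tagger = MeCab.Tagger()

def is_formal(utterance: str) -> bool:
    """Simplified check: treat an utterance as formal style if it contains 'desu' or 'masu'."""
    surfaces = [line.split("\t")[0]
                for line in tagger.parse(utterance).splitlines() if "\t" in line]
    return any(s in ("です", "ます") for s in surfaces)

class SpeechLevelController:
    def __init__(self, scenario):
        # scenario: list of (formal_utterance, informal_utterance) pairs, one pair per turn
        self.scenario = scenario
        self.t_formal = 0      # T_formal: user utterances in the formal style so far
        self.t_utterance = 0   # T_utterance: total user utterances so far
        self.turn = 0

    def respond(self, user_utterance: str) -> str:
        self.t_utterance += 1
        if is_formal(user_utterance):
            self.t_formal += 1
        formal_rate = self.t_formal / self.t_utterance  # updated after every user utterance
        formal_reply, informal_reply = self.scenario[self.turn]
        self.turn += 1
        # Reply in the formal style if the latest formal style rate reaches the threshold.
        return formal_reply if formal_rate >= THETA else informal_reply
```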
The threshold value (θ) was determined based on the formal style rate in the BTSJ Japanese natural conversation corpus [35]. Among the data in the corpus, we adopted as the threshold the formal style rate of first-meeting dialogs between university students of equal standing. Specifically, the formal style rate was calculated from the target conversation data using Equation (2):
Formal style rate = (number of sentences containing "desu" or "masu") / (total number of sentences).    (2)
The result was 29.6%, which was used as the threshold in this paper. This threshold was calculated based on data from natural conversations between humans and thus has a certain degree of reliability.
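For reference, Equation (2) amounts to the following short function over the target corpus sentences; the sentence list and the simple substring check are placeholders for the actual BTSJ corpus preprocessing, which the paper does not detail.

```python
# Sketch of Equation (2) over a list of corpus sentences (placeholder input, not the BTSJ data).
def formal_style_rate(sentences: list[str]) -> float:
    formal = sum(1 for s in sentences if "です" in s or "ます" in s)
    return formal / len(sentences)

# theta = formal_style_rate(first_meeting_sentences)  # reported as 29.6% in the paper
```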
In this study, the states in which users interacted with the dialog system were set as follows:
  • The user and dialog system are meeting for the first time and are of the same generation.
  • D and P are equal.
Figure 1. Flow of dialog system with speech level control.
Table 2. Example of experimental dialog conditions.

System profile: A dialog system developed for people aged 20 to 30.
User profile: Age 20 to 30; occupation: office worker.
Place/time: The user's own home, during free time.
Topic: Travel.
Background: The dialog system is a communication system that runs on an anonymous text chat service. The user likes to travel and one day finds a dialog system on the chat service that is also interested in travel, and decides to speak to the system regarding this common interest.
An example of dialog from the dialog system is depicted in Figure 2. The speech balloon on the left (white) represents a dialog system utterance, whereas that on the right (green) represents a user utterance. The dialog system says “Nice to meet you” as its first utterance, and the system uses a formal style. Thereafter, the speech level of the dialog system changes dynamically from a formal style to an informal style (bold part) according to the speech level (informal style) of the user.

3.2. Experimental Dialog Rules

The dialog system design achieves human-like speech level shifting based on human-to-human dialog data. However, as this study aimed to verify the effects of speech level shifting by a dialog system, it was essential for the system to perform speech level shifting in the experiments. In addition, as it is natural for Japanese people to use formal-style utterances in their first conversation, we were concerned that users would not voluntarily perform the speech level shift if the dialog system did not actively perform the speech level shift. Therefore, we designed independent speech level shift rules for the speech level shift condition in the experiments. Specifically, the first, twelfth, thirteenth, fifteenth, and sixteenth utterances of the dialog system were set to the formal style, whereas the fourth and fifth utterances were fixed to the informal style. This rule was applied regardless of the formal style rate in the user utterances so that all experimental participants who were assigned to the speech level shift condition could experience the speech level shift of the dialog system.
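A minimal sketch of this experimental override is shown below. The fallback behavior for turns that are not listed (reusing the threshold rule from Section 3.1) is an assumption, since the paper specifies only the fixed turns.

```python
# Sketch of the speech level rule for the shifting condition (turns are 1-indexed).
# Turns not listed are assumed to fall back to the formal style rate rule of Section 3.1.
FIXED_FORMAL_TURNS = {1, 12, 13, 15, 16}
FIXED_INFORMAL_TURNS = {4, 5}

def speech_level_for_turn(turn: int, formal_rate: float, theta: float = 0.296) -> str:
    if turn in FIXED_FORMAL_TURNS:
        return "formal"
    if turn in FIXED_INFORMAL_TURNS:
        return "informal"
    return "formal" if formal_rate >= theta else "informal"
```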

4. Experiments

This section describes the experiments and their results.

4.1. Experimental Outline

The experiments aimed to examine the effects of the shift in the speech level of the dialog system. During the experiments, the participants and dialog system engaged in dialog, following which the participants evaluated their impressions of the dialog system through a questionnaire. Furthermore, we analyzed the formal style rate of the participant utterances and dialog logs.
Three experimental conditions were investigated with a between-subject design: a dialog system that only used a formal style (the formal style condition), a dialog system that only used an informal style (the informal style condition), and a dialog system that controlled the speech level (the shifting speech level condition). Each participant experienced only a single experimental condition due to the concern that the order effect and learning effect would be large if the experiment were conducted using a within-subjects design. Each participant was randomly assigned to one of the experimental conditions. The contents of the responses were controlled under the three conditions. The presence or absence of a shift in the speech level was an independent variable.
The participants were recruited through Lancers, a crowdsourcing service (n = 134). The gender ratio and average age of participants in each experimental condition were roughly equivalent.
  • Shifting speech level condition (males: 28, females: 18; average age: 41.3, S.D.: 10.0, max: 66, min: 22);
  • Formal style condition (males: 28, females: 20; average age: 39.3, S.D.: 9.5, max: 56, min: 20);
  • Informal style condition (males: 24, females: 16; average age: 42.3, S.D.: 11.7, max: 70, min: 24).
The required sample size for the experiment was calculated using G*Power 3.1.9.7 with a medium effect size (0.25) and a statistical power of 0.75. The calculation indicated that a sample size of n = 141 would be appropriate, which is roughly consistent with the sample size of this study (n = 134).
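The G*Power calculation can be approximately reproduced with statsmodels, as in the sketch below. This is only a cross-check under the stated parameters (effect size f = 0.25, α = 0.05, power = 0.75, three groups) and is not part of the authors' procedure.

```python
# Approximate reproduction of the sample size calculation (a cross-check, not the authors'
# procedure). Requires statsmodels; effect_size is Cohen's f and the result is the total N.
from statsmodels.stats.power import FTestAnovaPower

total_n = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.75, k_groups=3)
print(round(total_n))  # expected to be roughly 140, in line with the reported n = 141
```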
The dialog between the user and the dialog system automatically ended after 15 turns [42,43]. Approval for the experiments was obtained from the Research Ethics Committee of Tokyo Polytechnic University. Informed consent was obtained from all experimental participants. All experiments were carried out in accordance with the guidelines of the Research Ethics Committee of Tokyo Polytechnic University.

4.2. Procedure

First, the dialog situation of the experiment was explained to the participants, who were instructed to interact with the dialog system according to this dialog state. An example of the dialog conditions between the user and dialog system is displayed in Table 2. During this experiment, the user and dialog system were of the same age, had similar interests, and were meeting for the first time. However, the participants were told that the dialog partner was not a human but a dialog system. This instruction was based on the guidelines for the Situation Track of the Dialog System Live Competition [42,43]. Both this study and the aforementioned competition aimed to develop a dialog system capable of engaging in natural, human-like dialog. Some evaluation methods, such as the Turing test, do not explicitly state who the dialog partner is. However, because this study envisions a society in which humans and dialog systems interact symbiotically, we consider it crucial to evaluate dialog systems as a service; to this end, we specifically considered contexts in which the human user is aware that the dialog system is not human. Next, we asked the participants to register as users of Telegram [44], a messaging service, and they interacted with the dialog system through Telegram. Finally, the participants evaluated their impressions of the dialog system, at which point the experiment ended. During the experiment, the dialog states were set with reference to the Dialog System Live Competition 2 [42,43].
The questionnaire consisted of 10 items that were designed to investigate the human-like nature of the dialog system, the recognition of changes in the speech level of the dialog system by the participants, their evaluation of the breakdown of the dialog, and their overall impression of the dialog. The participants responded to each item on a 7-point Likert scale (1 = strongly disagree, 4 = neither agree nor disagree, and 7 = strongly agree). The content of each question is presented below:
  • Q1: The dialog system behavior was like a human being.
  • Q2: The dialog system behaved in such a way that it maintained an appropriate social distance from me.
  • Q3: The utterances of the dialog system considered the social relationship with me.
  • Q4: There was a change in the wording of the dialog system (e.g., formal or informal style).
  • Q5: I felt that the utterances of the dialog system were a mixture of formal and informal styles.
  • Q6: I decided on my own wording with the wording of the dialog system in mind.
  • Q7: The dialog with the dialog system was not broken down.
  • Q8: The sentence endings of the dialog system were appropriate for the aforementioned situations (e.g., age and relationship between the user and system).
  • Q9: The dialog with the dialog system was enjoyable.
  • Q10: The dialog system was approachable.
Among these 10 items, Q1 to Q3 aimed to investigate the human-like nature of the dialog system, Q4 to Q6 were established to investigate whether or not the participants in the experiment were cognizant of the speech level of the dialog system, Q7 and Q8 aimed to investigate their evaluation of the dialog breakdown, and Q9 and Q10 were designed to evaluate their overall impression of the dialog. While Q2 and Q3 do not directly assess human likeness, a key aspect of human likeness considered in this study is the ability to dynamically adapt one’s attitude in response to one’s relationship with the dialog partner. Therefore, Q2 and Q3 were included as evaluation indicators of human likeness. We created the questionnaire items for Q2, Q3, Q4, Q5, and Q6 because no existing literature providing similar scales was available. For the other questionnaire items, we prepared each item with reference to existing evaluation scales for interactive robots and dialog systems [45,46,47] in order to comprehensively evaluate the dialog system. Specifically, Q1 was created by referring to a scale for evaluating the social presence, anthropomorphism, and lifelikeness of dialog robots [45,46], and Q7 and Q8 were created by referring to a scale for evaluating the detection of dialog breakdown in dialog systems [47]. Q9 and Q10 were created by referring to evaluation scales regarding the enjoyment and friendliness of interactions with a dialog robot [47]. The validity of the results of the evaluation scales in this experiment is discussed in Section 4.3, in which Cronbach’s alpha is calculated.

4.3. Statistical Analysis

This study aimed to verify the effect of dynamic speech level shifting by a dialog system on dialog with users. A statistical analysis was conducted to examine this effect from the perspective of subjective evaluation, using the aforementioned questionnaire items as dependent variables. By analyzing each question separately, the psychological effects of speech level shifts in dialog systems can be identified from multiple perspectives. The questionnaire evaluation values are depicted in Figure 3. A one-way analysis of variance (ANOVA) [48,49,50,51] that does not assume equal variances was adopted as the statistical analysis method to determine whether there was a significant difference in the average values of the three experimental conditions for each item (α = 0.05). As mentioned in Section 4.2, the 10-item questionnaire was classified into four categories: "Human-like", "Recognition of speech level", "Dialog breakdown", and "Overall impression". Therefore, one-way ANOVA was conducted three times for "Human-like", three times for "Recognition of speech level", twice for "Dialog breakdown", and twice for "Overall impression"; that is, a separate ANOVA was conducted for every item in every category. The Bonferroni method was used to correct the p-values according to the number of times one-way ANOVA was applied within each category. η² (the sum of squares of the factor divided by the total sum of squares) was calculated as the effect size [52]. The approximate benchmarks are η² = 0.1 for a small effect, η² = 0.25 for a medium effect, and η² = 0.4 for a large effect. As this is a novel study investigating the effect of dynamic speech level shifting performed by a dialog system, no established precedent exists to serve as a standard for interpreting the magnitude of the effects; however, the effect size reported with each analysis result can be used to roughly interpret the strength of the effect. The Tukey method was used for post hoc comparisons when a main effect was observed in the ANOVA, with Cohen's d as the effect size. The approximate benchmarks are d = 0.2 (small), d = 0.5 (medium), and d = 0.8 (large).
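A minimal sketch of this per-item analysis is given below. It assumes that the heteroscedasticity-robust ANOVA is Welch's one-way ANOVA, which is consistent with the fractional denominator degrees of freedom reported in Section 4.4; the data layout, helper names, and the three-test Bonferroni example are illustrative.

```python
# Sketch of the per-item analysis: Welch's one-way ANOVA (no equal-variance assumption),
# eta squared, and a within-category Bonferroni correction. Illustrative, not the authors' code.
import numpy as np
from scipy import stats

def welch_anova(*groups):
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v
    grand_mean = np.sum(w * m) / np.sum(w)
    numerator = np.sum(w * (m - grand_mean) ** 2) / (k - 1)
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    f_stat = numerator / denominator
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2, stats.f.sf(f_stat, df1, df2)

def eta_squared(*groups):
    # Factor sum of squares divided by the total sum of squares, as described above.
    allx = np.concatenate(groups)
    ss_factor = sum(len(g) * (np.mean(g) - allx.mean()) ** 2 for g in groups)
    ss_total = np.sum((allx - allx.mean()) ** 2)
    return ss_factor / ss_total

# Example for one item, with Bonferroni correction for the 3 tests in its category:
# f, df1, df2, p = welch_anova(scores_shift, scores_formal, scores_informal)
# p_corrected = min(p * 3, 1.0)
# eta2 = eta_squared(scores_shift, scores_formal, scores_informal)
```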

4.4. Results

The ANOVA results revealed main effects for Q4 (F(2, 86.5) = 7.31, p < 0.01, η² = 0.10), i.e., "There was a change in the wording of the dialog system (e.g., formal versus informal style)", and Q5 (F(2, 86.5) = 7.62, p < 0.01, η² = 0.10), i.e., "I felt that the utterances of the dialog system were a mixture of formal and informal styles", and a marginal effect for Q6 (F(2, 86.5) = 2.44, p < 0.1, η² = 0.03), i.e., "I decided on my own wording while keeping the wording of the dialog system in mind"; these items examined whether the participants correctly perceived the speech level changes of the dialog system. The results of Q4, Q5, and Q6 are depicted in Figure 3. Multiple comparisons using the Tukey method for Q4, Q5, and Q6 demonstrated that the shifting speech level condition was rated higher than the informal style condition for Q4 (p < 0.01, d = 0.11) and Q5 (p < 0.01, d = 0.12).
Cronbach’s alpha coefficients were calculated to assess the reliability of the questionnaire. The alpha coefficients were calculated for each of the four evaluation categories shown in Figure 3. The values shown below are the mean (standard deviation) of the alpha coefficients calculated for the three experimental conditions for the four categories.
  • Human-like: 0.72 (0.10);
  • Recognition of speech level: 0.77 (0.07) (excluding Q6), 0.40 (0.07) (including Q6);
  • Dialog breakdown: 0.67 (0.11);
  • Overall impression: 0.81 (0.09).
The alpha coefficients for "Human-like", "Dialog breakdown", and "Overall impression" are reasonably high. The alpha coefficient for "Recognition of speech level" is low when all three items are included but rises to a reasonably high value when Q6 is excluded. In other words, the internal consistency of Q4 and Q5, which showed significant differences in Figure 3, is reasonably high.
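The coefficients above follow the standard Cronbach's alpha formula; a short sketch is given below, where the item matrix layout is illustrative.

```python
# Sketch of Cronbach's alpha for one category under one condition (illustrative layout).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array with one row per participant and one column per questionnaire item."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# e.g., "Recognition of speech level" without Q6 for one condition:
# cronbach_alpha(scores[["Q4", "Q5"]].to_numpy())
```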
The formal speech style rate of the experimental participants was examined and found to be 62.0% under the shifting speech level condition, 80.4% under the formal style condition, and 22.8% under the informal style condition. These results suggest that the participants tended to use the same speech level as the dialog system. Moreover, four participants had a formal style rate of 0% and six had a formal style rate of 100%, indicating that some participants did not apply any shift in their speech level. Next, we analyzed the extent of speech level shifting performed by the participants. In 19 (41.3%) of the dialog logs, the user either did not change their speech level or changed it only for greetings or assistance. In 19 (41.3%) of the logs, the user changed their speech level within two turns, according to the speech level shift of the dialog system. In the remaining 8 (17.3%) dialog logs, the user changed their speech level regardless of the speech level of the dialog system.
Figure 3. Impression evaluation results of text dialog experiments. The error bars represent the standard error.

5. Discussion

In this section, the contributions and limitations of this paper are discussed based on the experimental results.

5.1. User Impressions of Text Dialog System

Among the conditions, there were no significant differences in evaluation items Q1 to Q3, which aimed to evaluate the human-like characteristics of the system. The average score for all conditions was approximately 5 points, indicating that the dialog system was evaluated as human-like by the experimental participants. The human likeness of a dialog system is an important evaluation index and is considered significant in live dialog system competitions [42]. In previous studies [8,27,37,38], a significant correlation was observed between the dialog strategies used by dialog systems and their perceived human likeness. These studies [8,37,38,39,40] considered the speech level in addition to applying politeness strategies (such as jokes, compliments, and apologies) based on politeness theory [21] to the design of the dialog systems; the differences between the experimental conditions were therefore emphasized more strongly than in this study. These findings suggest that differences in the speech level and politeness strategies may need to be combined to significantly change impressions of the human likeness of a dialog system through the design of language expressions.

Moreover, many previous studies [8,10,11,12,13,38] focused on dialog with communication robots and virtual agents rather than text-based chat, and the modality of a dialog system may affect the impressions it creates. The emergence of ChatGPT [53] in recent years has drawn attention to text-based dialog between humans and dialog systems. Thus, research on the design of dialog systems that focus on text-based chat will become increasingly important in the future.

In this experiment, a rule-based method was employed to control the utterances of the dialog system. In recent years, extensive research has been performed on large language model-based dialog systems; however, rule-based methods are also widely used in dialog systems for services provided by companies [54]. Although this research does not provide state-of-the-art engineering techniques, we believe that the speech level shift module of the dialog system presented in this paper can be applied to existing rule-based dialog systems, such as those employed by companies. By employing a rule-based system, the utterances of the dialog system during the experiment can be easily controlled while the participants speak freely. Some experimental paradigms, such as the Wizard of Oz approach, do not require autonomous systems; however, psychological experiments using rule-based dialog systems, such as those performed in this study, can verify the psychological effects of speech level shifting by a dialog system in a situation similar to a practical dialog system use case.
The shifting speech level condition was rated significantly higher than the informal style condition for Q4 and Q5, the items that indicated whether the participants were aware of the speech level changes. This result demonstrates that the participants perceived the speech level changes of the system more strongly under the condition in which the speech level was deliberately shifted. In contrast, there was no significant difference in Q6. As the mean score for Q6 was higher than four under all experimental conditions (one-sample tests, ps < 0.01, statistically significant after Bonferroni multiple-comparison correction), it is assumed that the participants used the speech level of the dialog system as a reference, although the degree to which they did so did not differ significantly across conditions.
No significant differences among the experimental conditions were observed for Q7 and Q8, the items that indicated a breakdown of the dialog; thus, the degree of dialog breakdown did not differ among the conditions. Moreover, the mean scores for Q7 and Q8 were higher than four under all experimental conditions (one-sample tests, ps < 0.01, statistically significant after Bonferroni multiple-comparison correction), indicating that the dialog system could provide an appropriate dialog for the context and situation during the experiment.
No significant differences among the conditions were identified for Q9 and Q10, which evaluated the overall impression of the dialog. The mean scores for Q9 and Q10 were above four points in all experimental conditions (one-sample tests, ps < 0.01, statistically significant after Bonferroni multiple-comparison correction), indicating that the dialog system could realize enjoyable and friendly dialog for the participants.
The above results demonstrate that the user impressions were good under all experimental conditions. Thus, the quality of the dialog system during the experiment was sufficient. In the following sections, we describe an analysis that focused on verifying the effects of the speech level shift of the dialog system, which was the main objective of this study.

5.2. Promotion of Speech Level Shift

The formal style rate in the user utterances tended to be similar to that of the dialog system. We therefore analyzed the dialog logs between the participants and the dialog system, focusing on the shifting speech level condition. As an example of a successful dialog, Table 3 presents the dialog log whose formal style rate (26.67%) was closest to that of the first-meeting core conversations in the BTSJ Japanese natural conversation corpus. At the beginning of the dialog, both the user and the dialog system used the formal style; however, the dialog system shifted its speech level to the informal style at the fifth utterance (bolded part). From the fifth to the eleventh utterance, the user continuously spoke in the informal style, and the dialog system used the informal style to match the user. We thus observed cases in which users also shifted their speech level in response to a speech level shift by the dialog system. In this dialog example, the speech level was shifted from the formal style to the informal style, and the dialog proceeded such that the user and dialog system reduced their psychological distance. An example of inappropriate dialog is shown in Table 4. In this example, the dialog system does not provide appropriate responses to the user's questions and opinions. Because the dialog system used in this experiment is rule-based, it cannot generate flexible responses to the user's questions or to the development of new topics, which is a weakness of the system. However, the results presented in Figure 3 show that this experiment nevertheless succeeded in giving users the impression that the dialog was appropriate.
In 19 (41.3%) of the logs, the user changed their speech level within two turns, following the speech level of the dialog system. According to these results, the speech level shift of the dialog system successfully encouraged several users to shift their own speech level. Furthermore, we investigated the correlation between the impression evaluation values and the number of speech level shifts, conducting a correlation analysis with the evaluation values of Q4, Q5, and Q6 for the 46 experimental participants in the shifting speech level condition. No significant correlation was found with any of the impression evaluation items (Q4: correlation coefficient r = 0.21, Q5: r = 0.03, Q6: r = 0.02; ps > 0.1). These results suggest that awareness of the dialog system's speech level shifts does not necessarily lead to an increased frequency of user speech level shifts.

5.3. Speech Level Recognition and Users’ Impressions

The results of the previous section demonstrate that, although some users shifted their speech level according to the shift in the speech level of the dialog system, a certain number of users interacted without adjusting their speech to the level of the dialog system. Under the shifting speech level condition, the speech level used by the dialog system changed depending on that of the user; thus, differences in the users' perception of the dialog system's speech level may have caused variations in their impressions of the system. Based on the evaluation of Q6, namely "I decided on my own wording with the wording of the dialog system in mind", we clustered the experimental participants under the shifting speech level condition whose evaluation value was 3 or lower into a low speech level recognition group and those whose evaluation value was 5 or higher into a high speech level recognition group.
We compared the two groups' evaluations of the human likeness of the dialog system (Figure 4). The results of the t-tests revealed a tendency (p < 0.1, d = 0.94) for the high speech level recognition group to rate the dialog system higher on Q3, i.e., "The utterances of the dialog system considered the social relationship with me", and the effect size was large. In this analysis, the p-values were corrected using Bonferroni's method. No significant differences were observed for the other items. This result indicates that users who determined their speech level while paying attention to the speech level of the dialog system tended to feel that the system was attentive to its relationship with them. For users in the high speech level recognition group, the dynamic speech level shift of the dialog system may thus facilitate the attribution of intentionality [55] and sociability [56] to the system.
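A sketch of this grouping and comparison is shown below. Whether the reported t-tests assumed equal variances is not stated in the paper, so Welch's t-test is used here as an assumption; the data frame layout and variable names are illustrative.

```python
# Sketch of the Q6-based grouping and the Q1-Q3 group comparison (illustrative, not the
# authors' code). Assumes a pandas DataFrame with one row per participant in the shifting
# condition and columns Q1..Q10.
import numpy as np
import pandas as pd
from scipy import stats

def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
                        / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled_sd

def compare_recognition_groups(df: pd.DataFrame, items=("Q1", "Q2", "Q3")):
    low = df[df["Q6"] <= 3]   # low speech level recognition group
    high = df[df["Q6"] >= 5]  # high speech level recognition group
    for item in items:
        t, p = stats.ttest_ind(high[item], low[item], equal_var=False)  # Welch's t-test
        p_corrected = min(p * len(items), 1.0)  # Bonferroni correction over the items
        print(item, round(t, 2), round(p_corrected, 3),
              round(cohens_d(high[item], low[item]), 2))
```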

5.4. Limitations and Future Work

This study has several limitations. First, the experimental findings were limited to specific situations: we used Japanese dialog situations in which the speakers were of the same age and meeting for the first time. Experiments with situations in which the speakers are friends or of different ages are also necessary. Another limitation is that the experiment was conducted using only a text dialog system. Future experiments using interactive robots and virtual agents will provide useful knowledge for the research fields of human-agent and human-robot interaction. Similarly, this work could be extended to interaction design for voice assistants, which lack physical embodiment [57]. Although speech level shifts are observed in only a limited number of languages, many languages express degrees of respect in other ways, such as "can" versus "could" in English; thus, research focusing on the language forms of each country is possible. Several studies have achieved the categorization and generation of polite utterances for the English language [58,59]. Verification focusing on language and culture is considered important in interaction research [60,61,62]. Additionally, this study focused on changes in sentence endings in dialog and did not analyze the length of the dialog. In the future, it will be important to analyze not only the form of sentence endings (speech level) but also the quantity and quality of the dialog.
Finally, the number and timing of speech level shifts in this experiment were not controlled because the speech level of the dialog system changed depending on the formal style rate of the user. However, as a speech level shift is generally performed in response to the dialog situation and the attitude of the interlocutor, it would be reasonable to design a dialog system that performs speech level shifts in response to the formal style rate of the user as an engineering study. Moreover, it may be possible to conduct experiments in which the number and timing of the speech level shifts are completely controlled as a psychological study.
This experiment revealed that 41.3% of the participants performed speech level shifts in response to the speech level shifts performed by the dialog system; however, rigorously discussing the significance of this result is difficult (e.g., whether it exceeds the chance level), necessitating further research. This is because the research question posed by this study is relatively novel, and no existing data are available on the percentage of humans who engage in speech level shifts in response to dialog systems, complicating the determination of a general probability. Future research should consider methods for identifying the causes of speech level shifts performed by users, as well as a quantitative analysis of speech levels in human–dialog system dialog based on dialog data with sufficient quantity and quality.
In this study, we used the simplified face violation estimation formula shown in Equation (1) and Table 1 to implement the dialog system; however, in real dialog, appropriate speech level shifting may not be possible without considering various demographic variables, including gender, age, and nationality. In this study, we conducted dialog experiments using the simplified personas listed in Table 2. Future research will be required to implement a complex politeness dialog model based on a wider range of variables.

6. Conclusions

In this study, we investigated the effect of dynamic speech level shifts in Japanese performed by a dialog system on user interaction through psychological experiments. Statistical data from the BTSJ Japanese natural conversation corpus were used to design the speech level shifts of the dialog system. The results of the psychological experiments demonstrated that the users shifted their speech level in accordance with the speech level shift of the dialog system at a certain frequency. Furthermore, in certain cases, we observed dialog that reduced the psychological distance between the users and the dialog system. The questionnaire results revealed no significant differences in the evaluation of the human-like quality of the dialog system between the proposed system and the baseline systems (dialog systems without shifting speech levels). However, users who were cognizant of the speech level used by the system felt that the dialog system paid attention to its relationship with them. According to these results, a dialog system can realize dialog in which the user and system both adjust their relationship through dynamic speech level shifts. It would be interesting to conduct further experiments that consider not only the speech level and politeness but also the semantics of the dialog in future work.

Author Contributions

N.N., T.M. and D.K. conceived the experiment, N.N. and T.M. conducted the experiment, and N.N. and T.M. analyzed the results. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by JSPS KAKENHI (Grant Numbers JP22H04869, JP20H05572, and JP18H03581).

Institutional Review Board Statement

Approval for the experiments was obtained from the Research Ethics Committee of Tokyo Polytechnic University. All experiments were carried out in accordance with the guidelines of the ethics committee.

Informed Consent Statement

Informed consent was obtained from all experimental participants.

Data Availability Statement

The data presented in this study can be obtained on request from the corresponding author.

Acknowledgments

We are grateful to Mayumi Usami of the National Institute for Japanese Language and Linguistics for the advice provided during this study.

Conflicts of Interest

The authors declare no competing interests.

References

  1. Apple. Siri. Available online: https://www.apple.com/jp/siri/ (accessed on 21 April 2023).
  2. Wu, X.; Ito, K.; Iida, K.; Tsuboi, K.; Klyen, M. Rinna: Zyosikousei zinkou tinou [Rinna: A high-school-girl artificial intelligence]. In Proceedings of the Twenty-Second Annual Meeting of the Association for Natural Language Processing, Sendai, Japan, 8–10 March 2016; pp. 306–309. [Google Scholar]
  3. Takiura, M. Honorifics seen from politeness, politeness seen from honorifics: An overview focusing on their pragmatic relativity. Jpn. J. Lang. Soc. 2008, 11, 23–38. [Google Scholar]
  4. Usami, M. On styles in Japanese language: Focusing on ’Speech-Level Shift’ in Japanese conversational discourse. Jpn. J. Lang. Soc. 2015, 18, 7–22. [Google Scholar]
  5. Miyamoto, T.; Katagami, D.; Tanaka, T.; Kanamori, H.; Yoshihara, Y.; Fujikake, K. Should a driving support agent provide explicit instructions to the user? Video-based study focused on politeness strategies. In Proceedings of the Ninth International Conference on Human-Agent Interaction, Virtual Event, 9–11 November 2021; pp. 157–164. [Google Scholar]
  6. Miyamoto, T.; Katagami, D.; Shigemitsu, Y.; Usami, M.; Tanaka, T.; Kanamori, H.; Yoshihara, Y.; Fujikake, K. Influence of Social Distance Expressed by Driving Support Agent’s Utterance on Psychological Acceptability. Front. Psychol. 2021, 12, 526942. [Google Scholar] [CrossRef] [PubMed]
  7. Bar-Or, E.; Regev, T.; Shaviv, P.; Tractinsky, N. Towards a sociolinguistics-based framework for the study of politeness in human-computer interaction. arXiv 2022, arXiv:2202.09901. [Google Scholar]
  8. Miyamoto, T.; Katagami, D.; Shigemitsu, Y. Improving relationships based on positive politeness between humans and life-like agents. In Proceedings of the 5th International Conference on Human-Agent Interaction, Bielefeld, Germany, 17–20 October 2017; pp. 451–455. [Google Scholar]
  9. Wang, N.; Johnson, W.L. The politeness effect in an intelligent foreign language tutoring system. In Proceedings of the International Conference on Intelligent Tutoring Systems, Montreal, QC, Canada, 23–27 June 2008. [Google Scholar]
  10. Salem, M.; Ziadee, M.; Sakr, M. Effects of politeness and interaction context on perception and experience of HRI. In Proceedings of the International Conference on Social Robotics, Bristol, UK, 27–29 October 2013; pp. 531–541. [Google Scholar]
  11. Torrey, C.; Fussell, S.R.; Kiesler, S. How a robot should give advice. In Proceedings of the International Conference on Human-Robot Interaction, Tokyo, Japan, 3–6 March 2013; pp. 275–282. [Google Scholar]
  12. Takayama, L.; Groom, V.; Nass, C. I’m sorry, Dave: I’m afraid I won’t do that: Social aspects of human-agent conflict. In Proceedings of the CHI Conference on Human Factors in Computing Systems, Boston, MA, USA, 4–9 April 2009; pp. 2099–2108. [Google Scholar]
  13. Srinivasan, V.; Takayama, L. Help me please: Robot politeness strategies for soliciting help from humans. In Proceedings of the CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, 7–12 May 2016; pp. 4945–4955. [Google Scholar]
  14. Lee, J.G.; Lee, K.M.; Ryu, S.H. Vehicle politeness in driving situations. Future Internet 2019, 11, 48. [Google Scholar] [CrossRef]
  15. Lee, J.G.; Lee, K.M. Polite speech strategies and their impact on drivers’ trust in autonomous vehicles. Comput. Hum. Behav. 2022, 127, 107015. [Google Scholar] [CrossRef]
  16. Holtgraves, T.M.; Ross, S.J.; Weywadt, C.R.; Han, T.L. Perceiving artificial social agents. Comput. Hum. Behav. 2007, 23, 2163–2174. [Google Scholar] [CrossRef]
  17. Rana, K.; Madaan, R.; Shukla, J. Effect of polite triggers in chatbot conversations on user experience across gender, age, and personality. In Proceedings of the 30th IEEE International Conference on Robot & Human Interactive Communication, Vancouver, BC, Canada, 8–12 August 2021; pp. 813–819. [Google Scholar]
  18. Firdaus, M.; Shandilya, A.; Ekbal, A.; Bhattacharyya, P. Being polite: Modeling politeness variation in a personalized dialog agent. IEEE Trans. Comput. Soc. Syst. 2022, 10, 1455–1464. [Google Scholar] [CrossRef]
  19. Mukherjee, S.; Hudeček, V.; Dušek, O. Polite Chatbot: A Text Style Transfer Application. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, Dubrovnik, Croatia, 2–4 May 2023; pp. 87–93. [Google Scholar]
  20. Song, M.; Zhang, H.; Xing, X.; Duan, Y. Appreciation vs. apology: Research on the influence mechanism of chatbot service recovery based on politeness theory. J. Retail. Consum. Serv. 2023, 73, 103323. [Google Scholar] [CrossRef]
  21. Brown, P.; Levinson, S.C. Politeness: Some Universals in Language Usage; Cambridge University Press: Cambridge, UK, 1987; Volume 4. [Google Scholar]
  22. Usami, M. Politeness in Japanese dyadic conversations between unacquainted people: The influence of power asymmetry. In Proceedings of the 10th World Congress of Applied Linguistics, Amsterdam, The Netherlands, 8–14 August 1993; pp. 8–14. [Google Scholar]
  23. Usami, M. Politeness and Japanese Conversational Strategies: Implications for the Teaching of Japanese. Ph.D. Thesis, Harvard Graduate School of Education, Cambridge, MA, USA, 1994. [Google Scholar]
  24. Ide, S. Formal forms and discernment: Two neglected aspects of universals of linguistic politeness. Multilingua 1989, 8, 223–248. [Google Scholar] [CrossRef]
  25. Kageyama, Y.; Chiba, Y.; Nose, T.; Ito, A. Improving user impression in spoken dialog system with gradual speech form control. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, 12–14 May 2018; pp. 235–240. [Google Scholar]
  26. Khan, S.; Fazil, M.; Imoize, A.L.; Alabduallah, B.I.; Albahlal, B.M.; Alajlan, S.A.; Almjally, A.; Siddiqui, T. Transformer Architecture-Based Transfer Learning for Politeness Prediction in Conversation. Sustainability 2023, 15, 10828. [Google Scholar] [CrossRef]
  27. Mishra, K.; Firdaus, M.; Ekbal, A. GenPADS: Reinforcing politeness in an end-to-end dialogue system. PLoS ONE 2023, 18, e0278323. [Google Scholar] [CrossRef] [PubMed]
  28. Mishra, K.; Priya, P.; Ekbal, A. Help me heal: A reinforced polite and empathetic mental health and legal counseling dialogue system for crime victims. In Proceedings of the AAAI Conference on Artificial Intelligence, Limassol, Cyprus, 5–8 June 2023; Volume 37, pp. 14408–14416. [Google Scholar]
  29. Priya, P.; Firdaus, M.; Ekbal, A. A multi-task learning framework for politeness and emotion detection in dialogues for mental health counselling and legal aid. Expert Syst. Appl. 2023, 224, 120025. [Google Scholar] [CrossRef]
  30. Mishra, K.; Priya, P.; Ekbal, A. PAL to lend a helping hand: Towards building an emotion adaptive polite and empathetic counseling conversational agent. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 12254–12271. [Google Scholar]
  31. Purohit, A.K.; Upadhyaya, A.; Holzer, A. ChatGPT in Healthcare: Exploring AI Chatbot for Spontaneous Word Retrieval in Aphasia. In Proceedings of the Companion Publication of the Conference on Computer Supported Cooperative Work and Social Computing, Virtual Event, 23–27 October 2023; pp. 1–5. [Google Scholar]
  32. Ribino, P. The role of politeness in human–machine interactions: A systematic literature review and future perspectives. Artif. Intell. Rev. 2023, 56, 445–482. [Google Scholar] [CrossRef]
  33. Matsumoto, Y. Reexamination of the universality of face: Politeness phenomena in Japanese. J. Pragmat. 1988, 12, 403–426. [Google Scholar] [CrossRef]
  34. Usami, M. Discourse politeness in Japanese conversation: Some implications for a universal theory of politeness. Jpn. Lang. Lit. 2002, 38, 351. [Google Scholar]
  35. Usami, M. BTSJ-Japanese Natural Conversation Corpus with Transcripts and Recordings. NINJAL Institute-Based Projects: Multiple Approaches to Analyzing the Communication of Japanese Language Learners; National Institute for Japanese Language and Linguistics: Tachikawa, Japan, 2021. [Google Scholar]
  36. Miyamoto, T.; Katagami, D.; Shigemitsu, Y.; Usami, M.; Tanaka, T.; Kanamori, H.; Yoshihara, Y.; Fujikake, K. Proposal of driving support agent which speak based on politeness theory. In Proceedings of the International Conference on Human-Computer Interaction 2019, Gipuzkoa, Spain, 25–28 June 2019; Volume 11596. [Google Scholar]
  37. Nagai, N.; Miyamoto, T.; Katagami, D. Evaluation of dialogue systems with different end of sentences using Seq2Seq model. J. Jpn. Soc. Fuzzy Theory Intell. Inform. 2020, 32, 523–527. [Google Scholar]
  38. Miyamoto, T.; Katagami, D.; Usami, M. A politeness control method for conversational agents considering social relationships with users. In Proceedings of the Advances in Artificial Intelligence-Selected Papers from the Annual Conference of Japanese Society of Artificial Intelligence, Virtual Event, 8–11 June 2021; pp. 224–231. [Google Scholar]
  39. Iwashita, M.; Katagami, D. Psychological effects of compliment expressions by communication robots on humans. In Proceedings of the 2020 International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
  40. Higashino, K.; Kimoto, M.; Iio, T.; Fujita, K. Is Politeness Better than Impoliteness? Comparisons of Robot’s Encouragement Effects Toward Performance, Moods, and Propagation. Int. J. Soc. Robot. 2023, 15, 717–729. [Google Scholar] [CrossRef]
  41. MeCab. Available online: https://taku910.github.io/mecab/ (accessed on 21 April 2023).
42. Higashinaka, R.; Funakoshi, K.; Inaba, M.; Tsunomori, Y.; Takahashi, T.; Akama, R.; Usami, M.; Kawabata, Y.; Mizukami, M. The dialogue system live competition 2. JSAI Tech. Rep. SIG-SLUD 2019, 87, 42–49. [Google Scholar]
  43. Higashinaka, R.; Minato, T.; Sakai, K.; Funayama, T.; Nishizaki, H.; Nagai, T. Spoken dialogue system development at the dialogue robot competition. J. Acoust. Soc. Jpn. 2021, 77, 512–520. [Google Scholar]
  44. Telegram. Available online: https://telegram.org/ (accessed on 21 April 2023).
  45. Bartneck, C.; Kulić, D.; Croft, E.; Zoghbi, S. Measurement Instruments for the Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety of Robots. Int. J. Soc. Robot. 2009, 1, 71–81. [Google Scholar]
  46. Heerink, M.; Kröse, B.; Evers, V.; Wielinga, B. Assessing Acceptance of Assistive Social Agent Technology by Older Adults: The Almere Model. Int. J. Soc. Robot. 2010, 2, 361–375. [Google Scholar]
47. Higashinaka, R.; D’Haro, L.F.; Abu Shawar, B.; Banchs, R.; Funakoshi, K.; Inaba, M.; Tsunomori, Y.; Takahashi, T.; Sedoc, J. Overview of the dialogue breakdown detection challenge 4. In Proceedings of the Tenth International Workshop on Spoken Dialogue Systems Technology, Syracuse, Italy, 24–26 April 2019. [Google Scholar]
  48. Schmider, E.; Ziegler, M.; Danay, E.; Beyer, L.; Bühner, M. Is It Really Robust? Methodology 2010, 6, 147–151. [Google Scholar]
  49. Harwell, M.R.; Rubinstein, E.N.; Hayes, W.S.; Olds, C.C. Summarizing Monte Carlo Results in Methodological Research: The One- and Two-Factor Fixed Effects ANOVA Cases. J. Educ. Behav. Stat. 1992, 17, 315–339. [Google Scholar]
  50. Blanca, M.; Alarcón, R.; Arnau, J.; Bono, R.; Bendayan, R. Non-normal data: Is ANOVA still a valid option? Psicothema 2017, 29, 552–557. [Google Scholar]
  51. Brown, M.B.; Forsythe, A.B. The Small Sample Behavior of Some Statistics Which Test the Equality of Several Means. Technometrics 1974, 16, 129–132. [Google Scholar]
  52. Cohen, J. Statistical Power Analysis for the Behavioral Sciences; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 1988. [Google Scholar]
53. OpenAI. ChatGPT. Available online: https://openai.com/blog/chatgpt (accessed on 21 April 2023).
  54. Makany, T.; Roh, S.; Hara, K.; Hua, J.M.; Goh Si Ying, F.; Teh Yang Jie, W. Beyond anthropomorphism: Unraveling the true priorities of chatbot usage in SMEs. In Proceedings of the 5th International Conference on Conversational User Interfaces, Eindhoven, The Netherlands, 19–21 July 2023; pp. 1–5. [Google Scholar]
  55. Dennett, D.C. The Intentional Stance; MIT Press: Cambridge, UK, 1989. [Google Scholar]
  56. Reeves, B.; Nass, C. The Media Equation: How People Treat Computers, Television, and New Media Like Real People And Places; Cambridge University Press: New York, NY, USA, 1996. [Google Scholar]
57. Luger, E.; Sellen, A. “Like having a really bad PA”: The gulf between user expectation and experience of conversational agents. In Proceedings of the CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, 7–12 May 2016; pp. 5286–5297. [Google Scholar]
  58. Danescu-Niculescu-Mizil, C.; Sudhof, M.; Jurafsky, D.; Leskovec, J.; Potts, C. A computational approach to politeness with application to social factors. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL, Sofia, Bulgaria, 4–9 August 2013; pp. 250–259. [Google Scholar]
  59. Niu, T.; Bansal, M. Polite dialogue generation without parallel data. Trans. Assoc. Comput. Linguist. 2018, 6, 373–389. [Google Scholar]
  60. Nomura, T. Cultural differences in social acceptance of robots. In Proceedings of the 26th IEEE International Symposium on Robot and Human Interactive Communication, Lisbon, Portugal, 28 August 2017–1 September 2017; pp. 534–538. [Google Scholar]
  61. Bartneck, C.; Suzuki, T.; Kanda, T.; Nomura, T. The influence of people’s culture and prior experiences with Aibo on their attitude towards robots. AI Soc. 2007, 21, 217–230. [Google Scholar]
  62. Koda, T.; Takeda, Y. Perception of culture-specific gaze behaviors of agents and gender effects. In Proceedings of the 6th International Conference on Human-Agent Interaction, Southampton, UK, 15–18 December 2018; pp. 138–143. [Google Scholar]
Figure 2. Example of a dialog between the dialog system (left) and the user (right). See Table 3 for detailed dialog examples with English translations. The lower part shows the Telegram user interface; the user types a message and presses Enter to send it to the dialog system.
Figure 4. Results of the impression evaluation for the low and high speech level recognition groups in the text dialog experiment. The error bars represent the standard error.
Table 1. Example of speaker attributes based on Equation (1). In [36], the social relationship between the user and the dialog system, as well as the current dialog situation, is defined using three variables (D, P, and Rx); this definition was adopted in the present study. A sketch of the presumed underlying formula is given after the table.
Variable: Attributes and Situation
D: First meeting or Friend
P: Younger or Same generation or Older
Rx: Chat or Request
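For reference, the three variables in Table 1 enter a single politeness weight through Equation (1), which is introduced earlier in this article. The sketch below is a minimal reconstruction assuming Equation (1) follows the weightiness formula of Brown and Levinson’s politeness theory [21], whose D, P, and Rx notation [36] adopts; the symbols S (speaker), H (hearer), and x (the act in question) are supplied here only for illustration.

W_x = D(S, H) + P(H, S) + R_x

Here, D(S, H) is the social distance between the speaker and the hearer (First meeting or Friend), P(H, S) is the relative power of the hearer over the speaker (Younger, Same generation, or Older), and R_x is the ranking of the imposition of act x in the given situation (Chat or Request).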
Table 3. Example dialog with a formal style rate of 26.67% under the shifting speech level condition. S = system and U = user (participant). Note that the English translations in parentheses may not convey the same level of politeness or formality as the original Japanese.
Speaker: Utterance
S1: Hazimemasite. Yorosiku onegai itasi masu. (Nice to meet you. Thank you for joining us.)
U1: Yorosiku onegai si masu. (Nice to meet you, too.)
S2: Kaiwa no te-ma ga ryokou tte koto na n desu kedo, sassoku desu kedo nanika imamade ixtu ta tokoro de yokaxtu ta ba syotte ari masu ka? (We’re going to talk about travel today. So, have you been to any great places you’d like to recommend?)
U2: Yakusima ha yokaxtu ta desu yo. (Yakushima was a great place to visit.)
S3: He-! Watasi ha ixtu ta koto nai n desu yo ne. Doko ga osusume desu ka? (Oh, really? I’ve never been there. What would you recommend?)
U3: Yaku sugi rando ga osusume desu. (I would recommend Yakusugi Land.)
S4: Naruhodo! Watasi mo sugoi i ki taku naxtu te ki ta n da kedo, sore tte dou ixtu ta n desu ka? (I see! I really want to go there too. How did you get there?)
U4: Rentaka- wo tukai masi ta. (I rented a car.)
S5: A- nara watasi mo ike sou da kara, kondo taimingu wo mi te ixtu te miru ne! Sorenisitemo ii tokoro da ne, kekkou ryokou tte suru no? (Ah, I see. I might be able to go, too. Do you travel a lot?)
U5: Hima sae are ba, ryokou ni dekake teru yo. (Whenever I have free time, I go on trips.)
S6: I ya- ii tokoro wo sixtu teru nante sugoi ne! Sonna anata ni kiki tai koto ga aru n desu kedo, kore kara saki kou iu tokoro ni ixtu te mi tai na tte omoxtu teru tokoro ha ari masu ka! Aimai demo ii n da kedo. (Wow, you know some great places. I have a question for you. Do you have any places you want to go in the future, even if it’s vague?)
U6: Mon san misheru toka ixtu te mi tai na. (I’d like to visit Mont Saint-Michel.)
S7: Humu, itugoro iko u to katte aru? (Hmm, do you have a specific time in mind?)
U7: San nen i nai kana. (Within the next three years, I think.)
S8: Unun. Doko ni ryokou ni iku to si te mo yappari kesi ki wo mi ni iku ryokou ni naru no kana? (I see. When it comes to traveling, it’s all about seeing the scenery, isn’t it?)
U8: Umi ga kirei da to omou. (I think the sea is beautiful.)
S9: Demo ryokou to ie ba oisii gohan mo sute gatai yo ne, nani ka osusume no gohan tte aru no? (But when it comes to traveling, it’s hard to pass up on delicious food, right? Do you have any food recommendations?)
U9: Oosaka no tako ya ki kana. (Osaka-style takoyaki.)
S10: Naruhodo! Ryokou tte i ku no mo tano sii kedo zyunbi si tari sira beru no mo tano sii yo ne-. (I see. It’s fun not only to travel, but also to prepare and research for the trip.)
U10: Sou da ne. (That’s right.)
S11: Demo yappari ryokou no hanasi wo suru no ha tano sii ne! (So talking about travel is really fun, isn’t it?)
U11: Tano sii yo ne! (It’s so fun!)
S12: Gomen. Hito ni kou iu hanasi ki ku koto ga amari nai node katte ni sugoku mo ri a gaxtu tyai masi ta. (I’m sorry. I don’t get to talk to people about this kind of stuff very often, so I got really excited.)
U12: Sonna koto nai yo. (Don’t worry.)
S13: Ano- mosi yokaxtu tara kondo ryokou si ta toki ni syasin toka oku xtu te kure mase n ka? Mata hana si tai desu si. (Um, if it’s okay with you, could you send me some pictures when you travel next time? I’d love to talk again.)
U13: Okke-. (Sure.)
S14: Maa ki ga mu i tara o nega i si masu! (Well, if you feel like it, please do!)
U14: Ha-i. (Okay.)
S15: Axtu, sorosoro o wari no zikan mitai desu. Kyou ha arigatou gozai masi ta! (Oh, it looks like our time is almost up. Thank you for joining us today!)
U15: Un, arigatou. (Yes, thank you.)
S16: Otukare sama desi ta-. (Thank you and good job!)
Table 4. An example where the dialog system was unable to provide appropriate responses to the user’s questions or opinions. S = system and U = user (participant).
Speaker: Utterance
U6: Tōkyō ni itte mitai desu ga, Tōkyō tte kankōchi to iu kanji ga amari shimasen yo ne? (I would like to visit Tokyo, but it doesn’t really give off the vibe of a tourist destination, does it?)
S7: Fumu… itsu goro ikou toka tte arimasu ka? (Hmm… do you have any plans on when you might go?)
(Omission)
U8: Keshiki o miru no mo ii desu ga, watashi wa gotōchi gurume mo tanoshimi ni shiteimasu. (While enjoying the scenery is nice, I’m also looking forward to trying local cuisine.)
S9: Demo ryokō to ieba oishii gohan mo sute-gatai desu yo ne… Nanika osusume no gohan tte arun desu ka? (But when it comes to traveling, delicious food is irresistible, right? Do you have any recommended dishes?)
