Evaluating Human–Automation Etiquette Strategies to Mitigate User Frustration and Improve Learning in Affect-Aware Tutoring

Abstract: Human–automation etiquette applies human–human etiquette conventions to human–computer interaction (HCI). The research described in this paper investigates how to mitigate user frustration and support student learning through changes in the style in which a computer tutor interacts with a learner. Frustration can significantly impact the quality of learning in tutoring. This study examined an approach to mitigate frustration through the use of different etiquette strategies to change the amount of imposition that feedback places on the learner. An experiment was conducted to explore how varying the interaction style of system feedback, by means of different etiquette strategies, impacted aspects of the learning process. Participants solved mathematics problems under different frustration conditions with feedback given in different etiquette styles. Changing etiquette strategies from one math problem to the next led to changes in motivation, confidence, satisfaction, and performance. The most effective etiquette strategies differed depending on whether the user was frustrated. This work aims to provide mechanisms to support the promotion of individualized learning in the context of high-level math instruction by basing affect-aware adaptive tutoring system design on varying etiquette strategies.


Introduction
This work investigates the intersection of human-human etiquette strategies, student frustration, and the interaction style between a learner and an intelligent tutoring system. Preliminary results of this work were presented in [1]. Human emotion can drive the direction of conversation and plays a key role in communication [2]. Both positive emotions (e.g., happiness and fulfilment) and negative emotions (e.g., boredom and frustration) are significant components in communication, especially in learning [3][4][5]. Negative emotions, notably frustration, have significant consequences such as lower task productivity [6][7][8][9], longer decision-making time [10,11], and lower learning efficiency [12].

Intelligent Tutoring Systems
Student learning is supported by a human tutor's ability to respond to questions, analyze answers, and provide customized feedback. In much the same way, computer-based intelligent tutoring systems (ITSs) enable learning by providing customized feedback to users through instructional content and teaching strategies [13,14]. ITS research aims to apply the best practices of human tutors while attempting to develop new methods for ITS teaching and learning [15][16][17]. However, in contrast to human tutors, ITSs have limited ability to adjust their interaction behavior based on the emotional state of the student to appropriately meet the needs of the student [18]. Affect-aware systems (also called affective systems) include emotion as a factor, and typically adjust the task difficulty level of problems and provide adaptive feedback to consider user emotions [8,19].
Appl. Sci. 2018, 8, 895
An ITS is a form of adaptive system: a computer-based system designed to respond to the current context by changing its behavior without explicit human control. Adaptive systems can adjust their behavior by tracking the condition of the users [20], and fall into four categories: (1) adapting the allocation of functions between the human and the automation system; (2) adapting the information displayed to the user; (3) changing the user's task priority by directing their attention; and (4) changing the interaction style between the human and the system. The interplay of human factors considerations when changing the interaction style is one reason that this approach has been less utilized than the others. For instance, while humans use various interaction styles when they face certain situations, adjusting the way computers deliver information violates the human factors principle of consistency in the context of human-computer interaction (HCI) [20]. However, a consistent feedback style may not be the best in every situation. Furthermore, given the complexity and subtlety of the interplay between frustration and HCI, mitigating frustration in human-computer interaction through system changes has been less studied [8].

Lessons Learned from Human-Human Communication and Learning
People interact differently in human-human interaction when they perceive the emotional states of others [21]. For example, physicians use special communication skills to deliver bad news when they detect their patients' negative emotions [22]. A human tutor may change his or her speaking style to enhance a student's motivation or mitigate frustration, considering factors besides performance in order to maximize student learning. In education, various factors influence effective student learning. Keller [23] proposed four steps for encouraging and sustaining students' motivation in the learning process: attention, relevance, confidence, and satisfaction (ARCS). The ARCS model has been used to improve learning effectiveness in distance learning [24], employee education [25], and manufacturing training [26]. Higher levels of motivation, confidence, perceived satisfaction, and overall performance lead to higher rates of engagement in a combination of classroom and online learning [27]. Feedback can be used not only to enhance performance, but also to enhance precursors to performance such as motivation, confidence, and satisfaction [23].

Etiquette Strategies
Communication in human-human interaction can serve as a basis to investigate the utility of changing the interaction style of an ITS. Social behaviors in human interactions are governed by expectations between the speaker and hearer based on conventional norms. Conventional requirements for social behavior are codified in etiquette. When people share the same model of etiquette, they expect the same level of social behaviors from each other. Interactions between people with inappropriate etiquette may be unproductive, confusing, or even potentially dangerous [28]. Etiquette includes three independent factors: social power, social distance, and imposition. People may hold similar expectations when interacting with computers.
Etiquette strategies between humans were developed to redress the affronts posed by face-threatening acts (FTAs) [29,30]. An FTA is an act by the speaker that opposes the desires of the hearer, damaging the hearer's face. Positive face is characterized by the desire to be liked and admired. Ignoring someone threatens positive face. Negative face is the desire to be unimpeded in one's actions, where the speaker does not impose on the hearer [29].
The three social variables are social power (i.e., the ability of one person to impose their will on another), social distance (e.g., level of familiarity), and imposition (i.e., degree of threat of an FTA). Social power and social distance are decided by the relationship between speakers and hearers. It may take a long time to change the aspects of social power and social distance between two entities, if they can be changed at all. However, the level of imposition can be adjusted by using different interaction styles, since it refers to the amount of demand or burden [29,31]. Consequently, the concept of different etiquette strategies is based on the idea that it is easier to adjust the imposition from speaker to hearer to mitigate FTAs [29].
Cooperation to maintain each other's face is facilitated by etiquette strategies. Four types of etiquette strategies have been identified [29]. A bald strategy does not consider the level of imposition on the hearer from the speaker. "Pass the salt" is a direct request that does not attempt to minimize the threat to the hearer's face. Positive politeness minimizes the imposition and social distance between speaker and hearer by giving compliments or making assertions of familiarity and solidarity. "That is a nice coat, where did you get it?" prefaces the request for information by paying a compliment. Negative politeness assumes that the speaker is in some way imposing on the hearer. "I don't want to bother you but..." or "I was wondering if..." attempt to be respectful, but the speaker knows that there is some level of imposition in the request. Off-record utterances make requests of the hearer only indirectly, using general language that requires the hearer to infer the true meaning. For example, a speaker could say "Wow, it sure is getting cold in here." This requires the hearer to infer that the speaker is really asking for the temperature to be raised [29].
The effectiveness of different interaction styles with etiquette was examined to see how these strategies could potentially enhance or inhibit effective tutoring [32]. When communicating with their students, human tutors were able to select one of three etiquette strategies as they saw fit: bald, positive politeness, or negative politeness. Etiquette strategies were used by human tutors in tutoring conversations, both positively and negatively. Observations from conversation examples showed that positive politeness was used to encourage the students when they struggled to solve problems. However, the tutors' responses about the problem answer (e.g., "No, that is wrong") may have led to negative impressions for students even though they were not part of the intentional feedback based on etiquette strategies. The study suggested that human tutors use different interaction strategies to tailor tutoring even though there were violations of the rules of conversation.

Application of Etiquette Strategies in Tutoring
The concept of etiquette and politeness has been applied to automation [33]. Miller et al. [34] developed computational models of communication focused on politeness and etiquette, and established roles of social interactions such as managing power, familiarity relationship, urgency, and indebtedness. Etiquette was used to make natural and polite interactions between humans and computer systems [35][36][37].
Various systems for training and tutoring have explored the concept of etiquette. A virtual factory training system for a manufacturing plant was developed to teach employees using two levels of politeness: direct and indirect (polite). Results showed that indirect interaction led to higher student motivation [38]. The virtual factory training system demonstrated beneficial effects of two etiquette strategies (positive and negative politeness) on learning efficiency [39]. In a similar manner, a language and culture learning system explicitly delivered language content and taught social norms by using face-to-face interactions with etiquette and anthropomorphism [40]. A disease and hospital information system was developed to convey information politely [41]. The participants' ratings of politeness and appropriateness were higher in the bald, positive politeness, and negative politeness conditions, but lower in the off-record condition because it required subtlety and consideration of context to be properly comprehended.

Ability of an ITS to Adapt Interaction Style
To summarize the discussion above, research suggests that feedback can be used to address performance, motivation, confidence, and satisfaction [23]. Furthermore, observation of human tutors reveals that they change their etiquette strategies to support student learners [32]. Finally, using etiquette strategies in human-computer interaction may be a viable strategy to adapt the interaction style of an ITS. Taken together, this leads to hypothesis H1, which asks whether changes in etiquette strategies by an ITS can lead to different outcomes in performance, motivation, confidence, and satisfaction.
Hypothesis H1. Changing etiquette strategies in tutoring leads to differences in performance, motivation, confidence, and satisfaction.

Frustration in Human-Computer Interaction
Emotion can influence the quality of the interaction between a human and a computer. In general, human operators accept machines as a team member, and therefore expect appropriate reactions from machines [42]. Even though computer systems provide benefits in productivity, frustration is one of the most common experiences in HCI [43]. Frustration is an emotional state where achieving a goal is blocked by obstacles [44]. Aggression is one of the consequences of frustration, which is a complex emotion related to disappointment and anger [45]. Frustration has been shown to reduce the quality of ongoing performance by eliciting responses that interfered with the completion of a given task [6]. In an experiment conducted with children, frustration significantly reduced perceptual-motor performance, especially in boys [7].
Despite ever increasing technological capabilities, frustration remains a recurring problem for users of computer-based systems. Therefore, frustration continues to be of significant interest in HCI. Frustration has been shown to be both frequent and damaging to productivity. On average, users waste 42-43% of their time due to frustration when using computers [46].
Previous work found that task performance is influenced by the level of frustration. For example, a higher level of frustration led to a lower performance score on a digit-symbol substitution test [47]. Likewise, operators' task performance was diminished when they were frustrated by system delays in a robot vehicle teleoperating task [48]. Frustration led to lower user satisfaction, lower motivation, and drove the users to seek alternative systems [46,49].

The Impact of Frustration on Learning with an ITS
In learning, higher frustration caused slower response times [50] and delayed content acquisition [51]. Frustration also reduced the motivation of students [52] and led to a lack of confidence among students in computer science [53]. Studies have explored how to account for user frustration in the development of effective tutoring systems. Different heuristic strategies have been used to mitigate user frustration, including mirroring student actions to show empathy; adjusting the authority level of the tutoring system to reduce pressure; and changing the voice, motion, and gestures of the avatar in the tutoring system to provide encouragement for the students [18]. The intelligent tutor's strategies effectively supported the students by encouraging them to continue their tasks although they were frustrated [18,54]. These studies showed that frustration is a topic worth exploring for reasons other than its relation to productivity. Affect-aware computer systems would benefit from a more human-like ability to sense and respond to frustration [55].
The concept of automation etiquette applies human-human etiquette conventions to HCI [33]. If a system can incorporate an understanding of the user's affective state into its reasoning, the interaction between the user and the computer system could be made more sophisticated. Computers could appropriately modify their behavior with users to further joint performance. For instance, in tutoring, human tutors are finely attuned to their students' emotional states. If computers could be more attuned, they may be able to provide appropriate responses in stressful situations where human emotion is impacting the ability to function. Initial studies explored the effects of various interaction styles and etiquette strategies to potentially enhance human-human tutoring [32], increase the situation awareness of users in HCI [28], and lead to higher reliability of the system from the user's perspective [35]. In combination with advances in tutoring, human-computer interfaces that incorporate more empathy and affect could enable ITSs to more authentically embody the richness of human social interactions [18,19].

Application of Etiquette Strategies to Mitigate Frustration
To summarize the discussion above, human tutors are attuned to their students' emotional states. Human tutors have been shown to change their etiquette strategies to enhance outcomes [32]. Frustration can decrease performance [51], motivation [52], satisfaction [46,49], and confidence [53]. While some heuristics have been used to mitigate frustration [18], it is an open question whether the frustration level affects which etiquette strategy most effectively impacts an outcome. Hypothesis H2 is therefore presented to test whether the most effective etiquette strategy changes with the learner's level of frustration.
Hypothesis H2. When users are frustrated, the most effective etiquette strategies are different from when they are not frustrated.

Impact
Understanding the effects of different etiquette strategies on users' performance, motivation, confidence, and satisfaction can contribute to the design of an effective HCI system to enhance the quality of interactions between users and systems. Such a system could support a student emotionally as well as cognitively. An experiment was conducted to investigate the effects of etiquette strategies in tutoring while the participants solved mathematics problems under different levels of frustration. This work aims to provide mechanisms to support the promotion of individualized learning in the context of high-level math instruction. The goal was to develop an understanding of how different etiquette strategies can have differential effects not only on performance, but also on the learning precursors of motivation, confidence, and satisfaction. In the same way that human tutors adapt their feedback to learners when they become frustrated, an adaptive ITS could change its communication style.

Materials and Methods
The objective of this study was to explore the ability of etiquette strategies to mitigate user frustration and improve task performance, motivation, confidence, and satisfaction in tutoring.

Participants
A total of 40 university students (23 males, 17 females) participated, averaging 21.1 years old (range: 18-29). They averaged 5.7 h (range: 1-15) of computer use daily. Participants last attended a mathematics class an average of 1.35 years ago (range: 1-3). All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of Iowa State University (15-142).

Task
Participants solved mathematics problems in algebra, geometry, trigonometry, calculus, statistics, and probability. The problems were taken from the Graduate Record Examination (GRE) practice book. The GRE is an exam used for admissions into graduate school. Twenty trials with one math problem each were provided (see Figure 1). All problems had a historical GRE correct rate of 30-40%. The same level of task difficulty ensured that participants would require feedback frequently in order to solve the problem. Problems were displayed on a computer monitor. Participants were provided pencils and scratch paper.

Independent Variables
The independent variables were frustration (levels: high, low) and etiquette strategy (levels: bald, positive-politeness, negative-politeness, off-record, no-feedback). Frustration was induced by imposing a time constraint and by changing the label of the level of task difficulty on the problems, even though all problems had the same level of difficulty. Frustration comes from unfulfilled expectations [56]. All problems were of a similar difficulty level (30-40% GRE correct rate). However, half of the twenty problems were labeled "easy" and the other half were labeled "hard". Thus, if a problem is labeled as easy but is actually hard, participants are expected to become frustrated because their experience with the problem differs from their expectation. Recognizing the difference between expected and actual difficulty has been shown to cause frustration [57,58]. A pilot test determined the level of difficulty such that the problems produced a measurable level of frustration (when mislabeled "easy") but were not so hard that subjects gave up. Additionally, a time constraint was employed to manipulate frustration [59]. The time constraint was each participant's average completion time over their last five practice problems. Beeps at 1 min, 30 s, and 10 s remaining reminded the participant of the time constraint. The manipulations were designed to elicit frustration to a level that did not cause the user to give up on the task.
The independent variable of etiquette strategies had five levels: the four different etiquette strategies and a baseline condition of no feedback. Table 1 shows the same feedback being presented in each etiquette strategy.
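As a rough illustration of how one hint can be delivered under each of the five levels, the following Python sketch wraps the same content in different phrasings. The wording of every template is hypothetical and is not taken from Table 1; the names `Etiquette` and `render_feedback` are likewise assumptions made only for this example.

```python
from enum import Enum
from typing import Optional


class Etiquette(Enum):
    """The five levels of the etiquette-strategy variable."""
    BALD = "bald"
    POSITIVE = "positive-politeness"
    NEGATIVE = "negative-politeness"
    OFF_RECORD = "off-record"
    NO_FEEDBACK = "no-feedback"


def render_feedback(strategy: Etiquette, action: str, observation: str) -> Optional[str]:
    """Phrase one hint according to an etiquette strategy.

    `action` is an imperative fragment such as "check the sign of the second term".
    `observation` is an indirect remark used only by the off-record strategy.
    All phrasings are illustrative, not the study's actual wording.
    """
    if strategy is Etiquette.NO_FEEDBACK:
        return None  # baseline condition: nothing is shown
    if strategy is Etiquette.BALD:
        return action[0].upper() + action[1:] + "."  # direct, no redress
    if strategy is Etiquette.POSITIVE:
        return f"Good effort so far! Let's {action} together."  # solidarity
    if strategy is Etiquette.NEGATIVE:
        return f"I don't want to bother you, but could you {action}?"  # acknowledges imposition
    return observation  # off-record: the learner must infer the request


if __name__ == "__main__":
    hint = "check the sign of the second term"
    remark = "Sign errors are common in expressions like this one."
    for strategy in Etiquette:
        print(f"{strategy.value}: {render_feedback(strategy, hint, remark)}")
```

Keeping the hint content fixed and varying only the wrapper mirrors the idea that the strategies change the imposition of the feedback rather than its information.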

Etiquette Strategies Preference
One possible use for etiquette strategies would be to match the appropriate etiquette strategy to the preference of the learner, much as it has been argued that the presentation of information should be matched to a student's learning style. Past research has documented that learners will express a preference for how information should be presented to them [60]. Before the experiment, the participants were asked to read the definitions and examples of the four etiquette strategies and to rate their preference for each (10-point Likert scale). These baseline data were used to compute the correlation between their preferences and trial results.
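The association between a baseline preference rating and a per-trial measure can be expressed as a Spearman rank correlation, the coefficient named in the Data Analysis section. The following is a minimal standard-library sketch; the function names and the sample numbers are illustrative and are not study data.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    preference = [8, 3, 6, 9, 5]    # hypothetical 10-point baseline ratings
    satisfaction = [7, 4, 4, 9, 6]  # hypothetical per-trial ratings
    print(round(spearman_rho(preference, satisfaction), 3))
```

Average ranks are used for ties because Likert ratings on a 10-point scale will often repeat across participants.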

Independent Variable Manipulation Verification (Frustration)
Scores on the frustration subscale of the NASA Task Load Index (TLX) [61] served as a subjective measure of frustration. To verify the independent variable manipulation, participant responses were compared between low and high frustration in the no-feedback condition.

Task Performance
Both an objective and a subjective measure of performance were used. A rubric was used to objectively grade each participant's score (see Table 2). The TLX performance subscale scores provided a subjective measure of performance. After each trial, participants were asked to rate motivation, confidence, and satisfaction on a 10-point Likert scale.

Feedback Appropriateness and Effectiveness
After each trial, participants were asked to rate feedback appropriateness and feedback effectiveness using a Likert scale from 0-10.

Mental and Temporal Workload
NASA TLX is a subjective assessment tool that rates perceived workload in multiple dimensions. The participants' mental demand and temporal demand were measured through NASA TLX subscales after each trial.

Experimental Design
This was a within-subject, 2 (frustration: low, high) × 5 (etiquette strategy: bald, positive-politeness, negative-politeness, off-record, no-feedback) experimental design. A within-subject design was used to block the effect of individual differences such as math skill level. Each combination of the independent variable levels was tested twice, for a total of 20 trials. Condition order was counterbalanced using Latin squares to account for learning effects.
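The 2 × 5 design with two repetitions per cell can be enumerated in a few lines. The sketch below is only an assumption about how such a schedule might be generated: it uses a simple cyclic Latin square over the ten conditions and repeats each participant's block twice, whereas the exact square and repetition scheme used in the study are not specified here. The names `latin_square` and `trial_order` are hypothetical.

```python
from itertools import product

FRUSTRATION = ("low", "high")
STRATEGIES = ("bald", "positive-politeness", "negative-politeness",
              "off-record", "no-feedback")

# The 2 x 5 = 10 within-subject conditions.
CONDITIONS = list(product(FRUSTRATION, STRATEGIES))


def latin_square(n):
    """Cyclic n x n Latin square: each value appears once per row and column."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def trial_order(participant_id, repetitions=2):
    """Return one participant's trial sequence (20 trials by default).

    Assumption: the participant's row of the square gives one block of
    10 conditions, and that block is repeated `repetitions` times.
    """
    square = latin_square(len(CONDITIONS))
    row = square[participant_id % len(CONDITIONS)]
    block = [CONDITIONS[i] for i in row]
    return block * repetitions


if __name__ == "__main__":
    for trial, (frustration, strategy) in enumerate(trial_order(3), start=1):
        print(trial, frustration, strategy)
```

With 40 participants and ten rows, each row of the square would be used by four participants. A cyclic square balances the position of each condition but not which condition precedes which; a balanced (Williams) design would be needed for that.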

Procedure
After the consent process, briefing, and demographic survey, participants reviewed and practiced problems until they felt comfortable. The time constraint for high frustration trials was the average completion time of the last five practice trials. Between trials, participants completed a post-trial survey and the NASA TLX. Opinions and tactics were gathered in a post-experiment survey. Finally, a debriefing explained the true goal of the study, as participants were initially told that the study purpose was to test their mathematics problem-solving ability.
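The per-participant time limit and the reminder beeps can be sketched as follows. This is a minimal illustration of the rule stated above (mean of the last five practice completion times; beeps at 1 min, 30 s, and 10 s remaining). The function names, the handling of limits shorter than a warning, and the sample times are assumptions.

```python
from statistics import mean

# Seconds remaining at which a reminder beep sounds (1 min, 30 s, 10 s).
BEEP_REMAINING_S = (60, 30, 10)


def time_limit(practice_times_s, window=5):
    """Time limit in seconds: mean completion time of the last `window` practice trials."""
    if len(practice_times_s) < window:
        raise ValueError(f"need at least {window} practice trials")
    return mean(practice_times_s[-window:])


def beep_times(limit_s):
    """Elapsed times (s) at which to beep.

    Assumption: a warning is skipped when the limit is not longer than it.
    """
    return [limit_s - remaining for remaining in BEEP_REMAINING_S
            if remaining < limit_s]


if __name__ == "__main__":
    practice = [150, 140, 130, 120, 110, 100, 90]  # hypothetical completion times (s)
    limit = time_limit(practice)
    print(limit, beep_times(limit))
```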

Data Analysis
The Shapiro-Wilk test was used to check the normality of the data. Bartlett's test was used to test the homogeneity of variance. Measured data were analyzed with ANOVA tests. Post-hoc analysis used Tukey's honestly significant difference (HSD) test in order to distinguish pairwise means that are significantly different from each other. Tukey results are presented as a series of letters for each group. If two groups do not share a letter, then they are significantly different from each other. The results are reported as significant for p < 0.05, and marginally significant for p < 0.10 [62]. Cohen's d was calculated to check effect size [63]. The Cohen's d results are reported as a small effect for 0.20 < d < 0.50, a medium effect for 0.50 < d < 0.80, and a large effect for d > 0.80. Spearman's rank order correlation coefficient was computed to test the association between two ranked variables: participants' baseline rating of etiquette strategies versus each dependent variable.
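The effect-size convention above can be made concrete with a short sketch. It assumes the pooled-standard-deviation form of Cohen's d, which may differ from the variant used for these within-subject data. It also assumes that the thresholds apply to the magnitude of d, that the unspecified boundary values 0.50 and 0.80 fall into the lower category, and that values of 0.20 or below are called "negligible" (a label not used in the text).

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two samples using the pooled standard deviation (an assumption)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def effect_label(d):
    """Map |d| to the reporting categories; boundary handling is an assumption."""
    size = abs(d)
    if size > 0.80:
        return "large"
    if size > 0.50:
        return "medium"
    if size > 0.20:
        return "small"
    return "negligible"


if __name__ == "__main__":
    # Illustrative numbers only, not study data.
    print(cohens_d([2, 4, 6], [1, 3, 5]))
    for d in (0.72, 0.81, -0.41):
        print(d, effect_label(d))
```

Under these assumptions, d = 0.72 is labelled medium, d = 0.81 large, and d = −0.41 small.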

Interaction Style Preferences
Before starting the trials, participants had significantly different preferences among the etiquette strategies, F(3,117) = 12.6, p < 0.001. Figure 2a indicates the baseline ratings of each etiquette strategy. Significant pairwise differences between strategies are indicated in the figure when the two groups do not share a letter, based on Tukey's HSD. For example, bald and positive-politeness were not significantly different from each other (and therefore are both labelled as A in Figure 2a), and likewise negative-politeness and off-record were not different from each other (labelled as B in Figure 2a). However, every group labelled A was significantly different (p < 0.05) from every group labelled B. The following pairs of groups were found to be significantly different: bald and negative-politeness; bald and off-record; positive-politeness and negative-politeness; and positive-politeness and off-record. From the participant rating data, it was possible to determine each participant's first preference by identifying the strategy they rated highest among the four. Figure 2b illustrates the distribution of participants' first preferences among the etiquette strategies. The baseline etiquette strategy ratings were not correlated with any of the dependent variables measured after each trial (math problem).
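The letter notation can be read mechanically: two groups differ significantly exactly when their letter sets have no letter in common. The short sketch below applies that rule to the labels described for Figure 2a; the function name `significant_pairs` is an assumption made for this example.

```python
from itertools import combinations


def significant_pairs(letters):
    """Return the pairs of groups whose letter sets share no letter."""
    return [(a, b) for a, b in combinations(letters, 2)
            if not set(letters[a]) & set(letters[b])]


# Letters as described for the baseline preference ratings (Figure 2a).
baseline_letters = {
    "bald": "A",
    "positive-politeness": "A",
    "negative-politeness": "B",
    "off-record": "B",
}

if __name__ == "__main__":
    for pair in significant_pairs(baseline_letters):
        print(" vs ".join(pair))
```

A group labelled with two letters (for example "AB") would share a letter with both A and B groups and would therefore differ significantly from neither.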

Independent Variable Manipulation Verification (Frustration)
The TLX frustration subscale was significantly higher for high frustration than low frustration, F(1,39) = 48.5, p < 0.001, d = 0.72 (see Figure 3). The figure indicates significant pairwise differences between groups when they do not share a letter. This verifies the manipulation of frustration through problem labelling and time constraints. Anecdotal participant comments in the high frustration conditions included: "I do not have enough time to solve problems," "Is it really an easy problem?" "I am so frustrated," "There is no hope."

Task Performance
The participants correctly solved significantly more problems in low frustration than high frustration, F(1,39) = 127.4, p < 0.001, d = 0.81. The main effect of etiquette strategies on task performance (score) was significant, F(4,156) = 2.77, p = 0.029. Figure 4a indicates significant (p < 0.05) pairwise differences between groups when they do not share a letter, based on Tukey's HSD. In the low frustration condition, bald was significantly different from negative-politeness, off-record, and no-feedback; positive-politeness was significantly different from negative-politeness, off-record, and no-feedback. Every strategy in the high frustration condition was significantly different from every strategy in the low frustration condition. The interaction was significant, F(4,156) = 3.28, p = 0.013.
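The degrees of freedom in the reported F statistics are consistent with a fully within-subject analysis of 40 participants. As a quick check, assuming a standard repeated-measures ANOVA with no sphericity correction, an effect with k levels has k − 1 effect degrees of freedom and (n − 1)(k − 1) error degrees of freedom. The helper name `rm_anova_df` is hypothetical.

```python
def rm_anova_df(n_participants, effect_df):
    """Return (effect df, error df) for a within-subject effect.

    The error term is the effect-by-participant interaction, so its
    degrees of freedom are (n - 1) * effect_df.
    """
    return effect_df, (n_participants - 1) * effect_df


if __name__ == "__main__":
    n = 40
    print("frustration (2 levels):", rm_anova_df(n, 2 - 1))
    print("etiquette strategy (5 levels):", rm_anova_df(n, 5 - 1))
    print("interaction (2 x 5):", rm_anova_df(n, (2 - 1) * (5 - 1)))
    print("baseline preference (4 strategies):", rm_anova_df(n, 4 - 1))
```

These give (1, 39), (4, 156), (4, 156), and (3, 117), matching the F(1,39), F(4,156), and F(3,117) values reported in the text.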
The participants rated their own subjective performance significantly lower in high frustration than low frustration, F(1,39) = 30.2,p < 0.001, d = −0.41.The main effect of etiquette strategies on subjective rating of performance was significant, F(4,156) = 11.6,p < 0.001.The interaction was not significant, F(4,156) = 1.01, p = 0.41. Figure 4b indicates significant (p < 0.05) pairwise differences when two groups do not share a letter, based on Tukey's HSD.In the low frustration condition, no-feedback was significantly different from all four etiquette strategies.In the high frustration condition, bald and off-record were significantly different from negative-politeness; no-feedback was significantly different from all four etiquette strategies.Across frustration conditions, high/bald, high/off-record, and high/no-feedback were all significantly different than all four etiquette strategies in low

Task Performance
The participants correctly solved significantly more problems in low frustration than high frustration, F(1,39) = 127.4,p < 0.001, d = 0.81.The main effect of etiquette strategies on task performance (score) was significant, F(4,156) = 2.77, p = 0.029.Figure 4a indicates significant (p < 0.05) pairwise differences between groups when they do not share a letter, based on Tukey's HSD.
Appl.Sci.2018, 8, x FOR PEER REVIEW 9 of 17 conditions included: "I do not have enough time to solve problems," "Is it really an easy problem?" "I am so frustrated," "There is no hope."

Task Performance
The participants correctly solved significantly more problems in low frustration than high frustration, F(1,39) = 127.4, p < 0.001, d = 0.81. The main effect of etiquette strategies on task performance (score) was significant, F(4,156) = 2.77, p = 0.029. Figure 4a indicates significant (p < 0.05) pairwise differences between groups when they do not share a letter, based on Tukey's HSD. In the low frustration condition, bald was significantly different from negative-politeness, off-record, and no-feedback; positive-politeness was significantly different from negative-politeness, off-record, and no-feedback. Every strategy in the high frustration condition was significantly different from every strategy in the low frustration condition. The interaction was significant, F(4,156) = 3.28, p = 0.013.
The participants rated their own subjective performance significantly lower in high frustration than low frustration, F(1,39) = 30.2, p < 0.001, d = −0.41. The main effect of etiquette strategies on subjective rating of performance was significant, F(4,156) = 11.6, p < 0.001. The interaction was not significant, F(4,156) = 1.01, p = 0.41. Figure 4b indicates significant (p < 0.05) pairwise differences when two groups do not share a letter, based on Tukey's HSD. In the low frustration condition, no-feedback was significantly different from all four etiquette strategies. In the high frustration condition, bald and off-record were significantly different from negative-politeness; no-feedback was significantly different from all four etiquette strategies. Across frustration conditions, high/bald, high/off-record, and high/no-feedback were all significantly different from all four etiquette strategies in low frustration; high/positive-politeness was significantly different from low/positive-politeness and low/negative-politeness.

Motivation
The main effect of frustration on motivation was not significant, F(1,39) = 0.11, p = 0.75. The main effect of etiquette strategies on motivation was significant, F(4,156) = 5.45, p < 0.001. The interaction was not significant, F(4,156) = 0.96, p = 0.43. Figure 5a indicates significant (p < 0.05) pairwise differences between groups when they do not share a letter, based on Tukey's HSD. In the low frustration condition, off-record was significantly different from bald, positive-politeness, and negative-politeness; no-feedback was significantly different from all four etiquette strategies. In the high frustration condition, positive-politeness was significantly different from bald, negative-politeness, and no-feedback. Across frustration conditions, low/positive-politeness was significantly different from bald, negative-politeness, off-record, and no-feedback in the high frustration condition; low/no-feedback was significantly different from positive-politeness and off-record in the high frustration condition.

Confidence
Participants had significantly more confidence about tasks in low frustration than high frustration, F(1,39) = 12.8, p < 0.001, d = 0.47. The main effect of etiquette strategies on confidence was significant, F(4,156) = 9.66, p < 0.001. The interaction was not significant, F(4,156) = 0.71, p = 0.59. Figure 5b indicates significant (p < 0.05) pairwise differences between groups when they do not share a letter, based on Tukey's HSD. In the low frustration condition, no-feedback was significantly different from all four etiquette strategies. In the high frustration condition, both off-record and no-feedback were significantly different from bald, positive-politeness, and negative-politeness. Across frustration conditions, high/bald was significantly different from positive-politeness, negative-politeness, off-record, and no-feedback in the low frustration condition; high/bald was different from low/positive-politeness and low/negative-politeness; high/negative-politeness was significantly different from low/no-feedback; high/off-record was significantly different from all four etiquette strategies in low frustration; high/no-feedback was significantly different from all four etiquette strategies and no-feedback in the high frustration condition.

Satisfaction
Participants were significantly more satisfied with overall feedback in low frustration than high frustration, F(1,39) = 7.32, p = 0.010, d = 0.22. The main effect of etiquette strategies on satisfaction with feedback was significant, F(4,156) = 9.43, p < 0.001. The interaction was not significant, F(4,156) = 0.56, p = 0.69. Figure 6a indicates significant (p < 0.05) pairwise differences between groups when they do not share a letter, based on Tukey's HSD. In the low frustration condition, positive-politeness was significantly different from off-record; and no-feedback was significantly [...] significantly different from low/positive-politeness and low/negative-politeness; high/positive-politeness was significantly different from low/bald, low/off-record, and low/no-feedback; high/negative-politeness was significantly different from low/positive-politeness and low/no-feedback; high/off-record was significantly different from low/positive-politeness and low/negative-politeness; high/no-feedback was significantly different from all four etiquette strategies in low frustration.
Feedback was rated as marginally more effective in low frustration than high frustration, F(1,39) = 3.06, p = 0.088, d = 0.14. The main effect of etiquette strategies on participants' ratings of feedback effectiveness was significant, F(4,156) = 10.31, p < 0.001. The interaction was not significant, F(4,156) = 1.07, p = 0.37. Figure 7b indicates significant (p < 0.05) pairwise differences between groups when they do not share a letter, based on Tukey's HSD. In the low frustration condition, no-feedback was significantly different from all four etiquette strategies. In the high frustration condition, positive-politeness was significantly different from bald, negative-politeness, off-record, and no-feedback; negative-politeness was significantly different from no-feedback. Across frustration conditions, both high/bald and high/negative-politeness were significantly different from low/positive-politeness and low/no-feedback; high/positive-politeness was significantly different from low/no-feedback; high/off-record was significantly different from low/positive-politeness and low/negative-politeness; high/no-feedback was significantly different from all four etiquette strategies in low frustration.

Mental and Temporal Workload
The main effect of frustration on mental demand was not significant, F(1,39) = 0.03, p = 0.87. The main effect of etiquette strategies on mental demand was significant, F(4,156) = 6.69, p < 0.001. The interaction was not significant, F(4,156) = 0.32, p = 0.87. Figure 8a indicates significant (p < 0.05) pairwise differences between groups when they do not share a letter, based on Tukey's HSD. In the low frustration condition, negative-politeness was significantly different from bald and no-feedback. In the high frustration condition, bald was significantly different from negative-politeness and off-record; positive-politeness was significantly different from negative-politeness; negative-politeness was significantly different from no-feedback. Across frustration conditions, high/bald was significantly different from low/negative-politeness; high/negative-politeness was significantly different from low/bald, low/positive-politeness, low/off-record, and low/no-feedback; high/off-record was significantly different from low/bald and low/no-feedback; high/no-feedback was significantly different from low/negative-politeness.
Feedback was significantly more temporally demanding in high frustration than low frustration, F(1,39) = 70.3, p < 0.001, d = 1.23. The main effect of etiquette strategies on temporal workload was significant, F(4,156) = 4.82, p = 0.001. The interaction was significant, F(4,155) = 2.54, p = 0.042. Figure 8b indicates significant (p < 0.05) pairwise differences between groups when they do not share a letter, based on Tukey's HSD. In the low frustration condition, both bald and positive-politeness were significantly different from negative-politeness and off-record; negative-politeness was significantly different from no-feedback. In the high frustration condition, no strategies were significantly different from each other. Across frustration conditions, all high frustration conditions were significantly different from all etiquette strategies in low frustration.

Discussion
Results demonstrated that etiquette strategies significantly influenced motivation, confidence, satisfaction, and performance. However, the null hypothesis for H1 was only partially rejected. Mathematical problem scores in the low frustration condition were higher when the bald strategy was provided (as hypothesized in H1), but in the high frustration condition, there were no differences in scores between any etiquette strategies. However, the time constraints in the high frustration condition may have resulted in a ceiling effect, as some participants ran out of time to solve a given problem. When compared to positive politeness, negative politeness led to higher performance in the high frustration condition.
Positive politeness resulted in higher motivation and satisfaction when compared to no feedback in the low frustration condition. On the other hand, motivation and satisfaction were not driven by the interaction style of the feedback in the high frustration condition. In the high frustration condition, participants who were given feedback with negative politeness had higher confidence in their work than when they were not given any feedback. Moreover, positive politeness led to higher satisfaction with feedback than no feedback in the high frustration condition. Thus, positive and negative politeness effectively worked to increase confidence and satisfaction with feedback. These results demonstrated that users' motivation, confidence, satisfaction, and performance vary depending upon the etiquette strategies used in tutoring. Thus, it may be feasible to build an adaptive tutoring system that changes interaction styles in order to improve performance, motivation, confidence, and satisfaction.
The results did not lead to the rejection of hypothesis H2. When participants were frustrated and provided feedback with positive and negative politeness, their self-assessed performance, motivation, confidence, and satisfaction were higher than when they were provided bald, off-record, and no feedback. Thus, the most effective etiquette strategies differed when users were frustrated.
The results provided evidence that people's performance, motivation, confidence, and satisfaction can be affected by a change of etiquette strategy. In addition, there was no correlation between the four dependent variables and participants' baseline etiquette strategy preference ratings, and so no evidence that the best strategy for these participants was fixed and based on their own preferences.
Although frustration is a common and natural emotion people experience while learning, it affects learners' self-esteem, distractibility, and ability to follow directions [64]. A tutor's feedback can be a great help to mitigate students' frustration and ultimately reduce the consequences of frustration. The results of this study show that different feedback interaction styles impact different aspects of the learning process. For example, the participants performed better when receiving feedback based on bald and positive politeness under low frustration, while they performed better with negative politeness feedback under high frustration. Their satisfaction with performance showed a similar pattern: participants were more satisfied when they received positive politeness feedback under low frustration, but negative politeness feedback under high frustration. These results demonstrated that different etiquette strategies were helpful in improving the participants' performance when they were highly frustrated. This provides evidence that choosing the proper interaction style can mitigate the influences of frustration. Likewise, the participants' ratings of motivation, satisfaction, and confidence showed a similar tendency. Since motivation, satisfaction, and confidence are directly connected to the students' learning goals, providing appropriate feedback to support these is crucial to enhance effective learning [23]. These results are applicable not only to a human tutor but also to a computer tutor.

Conclusions
The results of this work lay the foundation for using etiquette strategies as a method to realize affect-aware ITSs that can support a student emotionally as well as cognitively. Results demonstrated that varying the interaction style of feedback presentation in an ITS has differential effects depending on the emotional state of the learner. Furthermore, results demonstrated that there is not one "best" strategy to simultaneously improve motivation, confidence, satisfaction and performance. Different etiquette strategies influence these factors differently, depending on the learner's current emotional and learning state. Further research is needed to establish the interaction of strategy impacts.
Frustration is one of the most frequently occurring emotions in the use of computers [43] and in learning [18]. In the same way human tutors adapt their feedback to learners when they become frustrated, an adaptive computer system could change its communication style. Based on an understanding of the user's emotional state, the system could adapt its interaction style to mitigate frustration, improve human-computer interaction, and potentially improve task performance. This study provided a basic understanding of the role of different interaction styles of feedback under varying user emotional states and can be used to form the basis of an adaptive tutoring system. This experiment used only math problems. It is possible that the type of task will greatly influence the best feedback strategy. Further work will be needed to generalize the results of this study. The level of frustration, although moderate on an absolute scale, had a significant effect on the appropriateness and effectiveness ratings of the feedback. Future work will study the effect of higher levels of frustration on motivation, confidence, satisfaction and performance. Future work could also consider personality factors such as learner attributional style, perceived competency, or self, which may influence motivation and hence learning [65].
In human-computer tutoring, most of the real-time adaptation is triggered by poor performance and results in a change to the task difficulty or problem content. However, a good human tutor will be aware of the emotional state of the learner and adapt their interaction style to support aspects of the student's learning that underlie performance such as a student's motivation, confidence, or satisfaction. This work aims to provide mechanisms to support the promotion of individualized learning in the context of high-level math instruction. Future work will look at the ability to adapt interaction styles depending on the emotional state of the students as well as the goal of the tutor. The results presented here could be used to derive the logic of etiquette strategy adaptation to form the basis of an adaptive tutoring agent. In on-going work, an adaptive tutoring system was designed to improve the learning factors of motivation, confidence, satisfaction, and performance using a rule set developed based on the current data set [66], to trigger the most appropriate etiquette strategy for a given combination of factors and frustration level.
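One way such adaptation logic might look is a lookup keyed on frustration level and tutoring goal. The sketch below is illustrative only and is not the rule set of [66]: the table entries are a loose reading of the comparisons summarized in the Discussion, and the names RULES and select_strategy, as well as the optional fallback argument, are assumptions made for the example.

```python
from typing import Optional

# Illustrative mapping of (frustration level, tutoring goal) -> etiquette strategy.
# Entries paraphrase comparisons reported in the Discussion; this is an
# assumed simplification, not the published rule set.
RULES = {
    ("low", "performance"): "bald",
    ("high", "performance"): "negative-politeness",
    ("low", "motivation"): "positive-politeness",
    ("low", "satisfaction"): "positive-politeness",
    ("high", "confidence"): "negative-politeness",
    ("high", "satisfaction"): "positive-politeness",
}

def select_strategy(frustration: str, goal: str,
                    fallback: Optional[str] = None) -> Optional[str]:
    """Look up a strategy for the given state and goal.

    Returns `fallback` (None by default) when the table has no entry,
    so the caller decides what to do for combinations it does not cover.
    """
    return RULES.get((frustration, goal), fallback)

print(select_strategy("high", "performance"))
print(select_strategy("high", "motivation"))
```

Combinations missing from the table, such as motivation under high frustration, are left to the caller rather than guessed.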

Figure 2. (a) Average and standard error of strategies preference (n = 40); (b) Count of preferred strategy.

Figure 4. Mean and standard error of (a) problem score and (b) NASA task load index (TLX) performance (n = 40).

Figure 8. Mean and standard error of TLX (a) mental demand and (b) temporal demand (n = 40).

Table 1. Example sentences of etiquette strategies.