Design of Effective Robotic Gaze-Based Social Cueing for Users in Task-Oriented Situations: How to Overcome In-Attentional Blindness?

Abstract: Robotic eye-gaze-based cueing has been studied and shown, in controlled environments, to be effective in achieving the social functions that human gaze serves. However, its dynamic adaptability in various real interactions has not been explored in depth. This paper addresses a case where a simplistic robotic gaze fails to achieve effective social cueing in human–robot communication, primarily due to in-attentional blindness (IB), and presents a method that enables the robot to deliver gaze-based social signals adequately. To understand the implications of IB, which frequently arises in task-oriented situations, and to find ways to overcome the limitations it imposes, we designed a set of 1-on-1 communication experiments consisting of a robotic tutor and human learners participating in multiple-choice quiz sessions (a task-oriented situation). Here, multimedia contents were utilized alongside the robot as visual stimuli competing for the human’s attention. We hypothesized that quiz scores would increase when humans successfully recognize the robot’s gaze-based cue signals hinting at answers. We compared the impacts of two different cueing methods: oblique cueing (OC), where cues were provided straightforwardly regardless of whether participants were potentially experiencing IB, and leading cueing (LC), where the robot first achieved eye contact and secured the participant’s attention before signaling the cue. By comparing the test scores of the control group with no cueing (NC) and the two experimental groups, OC and LC, we found a significant increase in test scores only when the LC method was utilized. This experiment illustrates the importance of proactively guiding a user’s attention through sophisticated interaction design in order to attain that attention and successfully deliver cues. In future studies, we aim to evaluate different methods by which a social robot can intentionally shift a human’s attention, such as incorporating stimuli from various multi-modal human communication channels.


Introduction
Social cueing is an essential element in human communication, and human gaze plays a crucial role in non-verbal social cueing. Human gaze is a complex behavior integrating physical, psychological, and social functions, resulting in several combinations of primitive behaviors such as saccades, vestibulo-ocular reflexes, smooth pursuits, or vergences. The functionality of human gaze can include information gathering, signaling of intention/attention, expressing emotional states, or even providing …
Since gaze plays an important role in sending attention signals [21], gaze-based social cueing has been actively studied in the HRI domain. Yoshikawa et al. demonstrated the effect of eye-gazing in face-to-face human-robot interaction and highlighted the importance of responsive gaze control [22]. Palinko et al. conducted a comparison study on eye-gaze and head-gaze; although their results showed higher performance with eye-gaze, they did not imply that either one alone can transfer enough information in social communication [23]. Atienza and Zelinsky developed an active vision system for effective human intention recognition in collaborative tasks and showed the efficiency of combining natural gaze and gestures [24]. Miyauchi et al. showed the effectiveness of active eye contact as a means of meta-communication [25], and Boucher et al. also justified the importance of gaze in human-human and human-robot interaction [26]. Other studies systematized the gaze functions in human-robot interaction according to five social contexts: establishing liveliness, signaling social attention, regulating interaction processes, supporting interaction content, and projecting mental states [7,21].

In-attentional Blindness during Task-Related Interaction
While most prior work has focused on gaze control as a single action, recent studies imply that "in-attentional blindness" in visual cueing and social communication has often been disregarded in robot-based studies. In-attentional blindness (IB), also known as perceptual blindness, is a phenomenon in which an individual is not aware of unexpected stimuli in plain sight when focusing on a task [19,20]. When a person's attention is occupied by a certain task, this phenomenon can cause them to miss social cues or visual gaze from robotic systems during human-robot interaction and communication. Furthermore, because a gaze cue is itself a visual stimulus, it may not be effectively perceived when IB keeps the user from looking at the robot's eyes. Because people might not see the robot's face, some studies do not even take robot gaze into account during interaction [27][28][29].
This well-known psychological factor provides a consideration in experimental design: robotic systems need the capacity to avoid or overcome any interaction that may involve IB. A few potential solutions could include (1) passively waiting for the right time for a user not to focus on other tasks before delivering a visual cue, (2) proactively attracting the user's attention by using modalities other than visual stimuli, then delivering the visual cue only after the person's attention has been drawn to the robot, or (3) reinforcing the visual cue in terms of frequency, duration, and amplitude.
In this work, we would like to employ the second approach by first establishing joint attention through the use of non-visual cues and then delivering the intended visual cue. Our hypotheses, therefore, are composed as follows.

Hypotheses
To monitor the presence of in-attentional blindness (IB) in human-robot interactions and to provide a social communication strategy to overcome the challenges of IB, we set the following hypotheses for this study.

Hypothesis 1 (H1).
A robot's gaze-based social cue, designed to aid in improving the user's performance in a task-oriented communication, may suffer from IB during a human-robot interaction.
To evaluate H1, we designed a robotic system (both a robotic body and its software simulator [30]) to conduct human-robot collaborative quiz solving, in which the robot guides a conversational quiz experiment. We measured the impact of the robot's gaze-based cue and evaluated H1 while monitoring the presence of IB. The details of the experimental design can be found in Section 4.2.

Hypothesis 2 (H2).
Our proposed approach based on a foundation of joint-attention will effectively improve the performance in a human-robot collaborative task.
To evaluate H2, we designed a two-stage gaze control framework which initially worked to re-attain the user's attention through the use of non-visual cues, before proceeding with a more basic form of gaze-based social cueing. The details of the experimental design are also described in Section 4.2.

Robotic Platform and Gaze Control
We created a custom-designed robotic platform with dynamic eye-gaze and a multi-modal robotic communication simulator, which we named "My Own Cognitive Communication Agent" (hereafter referred to as "MOCCA") [30]. The MOCCA system is composed of robotic hardware working in conjunction with a computer-based simulated environment created in Unity™. The robotic body and its virtual twin are designed to provide synchronized responses in real time to user inputs. Our experiments use the virtual MOCCA, in conjunction with an eye-tracker that tracks the human participants' gaze targets to obtain quantifiable data required for evaluating the efficacy of robotic gaze in real-time human-robot gaze interactions. More broadly, we have used the MOCCA system to study the interactive and responsive characteristics of robotic gaze control and mutual gaze in human-robot interaction scenarios [31,32].

Experimental Design
To analyze the effectiveness of different gaze-based social cueing methods in IB situations, we designed a social and collaborative communication scenario, which features a robotic tutor and a human learner. In this scenario, the MOCCA software provides multiple-choice questions to participants in 1-on-1 quiz sessions. While the participant considers possible answers to the questions, the MOCCA character uses gaze-based expression to give a cue revealing the answer to each question. We hypothesized that the total quiz scores would increase when participants successfully recognize these cues, as they would consequently choose the answer with the help of the robotic gaze-based cueing.
As such, we utilized quiz scores as a metric for evaluating the efficacy of social cues. To ensure that the evaluation was objective, quiz questions primarily asked participants to name the capitals of various countries (The quiz questions are shown in Appendix A). This scenario differs significantly from the guessing game question [33,34], in which participants' baseline performances can be assumed to be the same owing to the random nature of the problem. Instead, the performance of participants in response to our questions would be highly dependent on the participants' prior knowledge. As a result, we first needed to ensure that we conducted the comparative analysis from the same baseline.
The experiments were conducted in a controlled environment to minimize distractions: participants were seated facing a wall and wore earphones so that they could focus on the robot's sound.

Participants
We recruited 93 participants (45 males and 48 females, aged 20 to 29) and randomly assigned them to three groups: the "no cue" (NC) control group (30 participants: 15 males and 15 females), the "oblique cue" (OC) group (30 participants: 14 males and 16 females), and the "leading cue" (LC) group (33 participants: 16 males and 17 females). All participants were asked to take multiple-choice quizzes with the virtual MOCCA robot, and each participant answered 12 questions in total: the first set of 6 questions (Pre-test set) was used to verify group equivalence, and the second set of 6 questions (Test set) was used for comparative analysis. In order to assess participants' baseline scores, the MOCCA robot gave no hints to any participant during the Pre-test set, regardless of group. During the following Test set, the MOCCA robot applied a different intervention to each group. For the NC control group, the robot continued providing no hints. For the other two groups (OC and LC), the robot hinted at the correct answer using two different types of rapid gaze-based cues. For the OC group, the robot used an indirect and passive cueing method, which could give rise to an IB situation. Conversely, for the LC group, the robot tried to guide participants' attention to itself through a directed and contextual cueing method before giving hints.

Procedures
The way MOCCA hosted the participants of each group is detailed below. During each question, the MOCCA robot presented four choices displayed around her head on the screen, as shown in Figure 1. The choices were intentionally placed one at each corner so that the participant would be able to notice the eye movements of MOCCA while reading through the four choices. In order to provide a cue revealing the answer to participants, the MOCCA robot made a quick, head-eye combined movement toward the right choice, as shown in Figure 1.
Appl. Sci. 2020, 10, 0 5 of 12
Participants in the OC group were asked to read through the four choices independently and select an answer at their leisure. We intended MOCCA to provide participants with a hint after they had read through all the choices, which we estimated would take two seconds after all choices first appeared on the screen. For the LC group, the MOCCA robot loosely controlled the participants' pace and attention by reading out the four choices. Once MOCCA completed this action, she instructed participants to answer on the count of three, and then provided a hint on the count of two. We expected that audibly counting to three would direct the participants' visual attention to the MOCCA robot during the countdown.
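The cue timing described above can be sketched as a simple event timeline per question. This is only an illustration under stated assumptions: the function name `cue_timeline`, the event labels, the reading time per choice, and the interval between counts are hypothetical; only the two-second OC delay and the "hint on the count of two, answer on the count of three" ordering come from the text.

```python
def cue_timeline(method, oc_delay=2.0, read_secs_per_choice=1.5, count_interval=1.0):
    """Return a list of (time_in_seconds, event) tuples for one quiz question.

    oc_delay is the two-second estimate given in the text; the other two
    durations are assumed values chosen only for illustration.
    """
    events = [(0.0, "choices appear")]
    if method == "NC":
        # Control group: no cue is ever given.
        return events
    if method == "OC":
        # Oblique cueing: the cue fires after a fixed delay, whether or not
        # the participant is looking at the robot.
        events.append((oc_delay, "gaze cue toward answer"))
        return events
    if method == "LC":
        # Leading cueing: the robot reads the choices aloud, then counts to
        # three; the cue is given on "two" and the answer is due on "three".
        t = 0.0
        for i in range(1, 5):
            events.append((t, f"robot reads choice {i}"))
            t += read_secs_per_choice
        for count in (1, 2, 3):
            events.append((t, f"robot counts {count}"))
            if count == 2:
                events.append((t, "gaze cue toward answer"))
            if count == 3:
                events.append((t, "participant answers"))
            t += count_interval
        return events
    raise ValueError(f"unknown cueing method: {method!r}")


def cue_time(method, **kwargs):
    """Return the time of the gaze cue, or None if the method gives no cue."""
    for t, event in cue_timeline(method, **kwargs):
        if event == "gaze cue toward answer":
            return t
    return None
```

With these assumed durations, the LC cue arrives only after the audible countdown has begun, whereas the OC cue arrives while the participant may still be reading the choices.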
After Q3: Were you aware that MOCCA gave you hints during the quiz sessions? (1: I was not at all aware, 2: I was slightly aware, 3: I was somewhat aware, 4: I was moderately aware, 5: I was clearly aware).

Results
The results of the study described in Section 4 showed a clear presence of IB and the efficacy of our approach. The quiz scores of each group, used as evidence of the effectiveness of the corresponding social cue, are compared in Table 1, which reports the mean and standard deviation of each set score. A one-way ANOVA on these data indicated no significant differences among the three groups' scores on the Pre-test set, in which no social cues were provided. This demonstrates group equivalence and a well-balanced assignment of participants.
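As a rough illustration of this group-equivalence check, the following sketch computes the one-way ANOVA F statistic from per-group score lists. The function name `one_way_anova_f` and the example scores are hypothetical placeholders, not the study's data or analysis code. With three groups and 93 participants, the degrees of freedom would be 2 (between groups) and 90 (within groups).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up placeholder scores (NOT the study's data), one list per group.
placeholder_scores = [
    [1, 2, 3],  # stands in for NC
    [2, 3, 4],  # stands in for OC
    [3, 4, 5],  # stands in for LC
]
f_stat, df_b, df_w = one_way_anova_f(placeholder_scores)
```

The F statistic is then compared against the F distribution with the returned degrees of freedom; a small F on the Pre-test set is what one would expect if the groups start from the same baseline.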
As for the Test set, the score of the OC group was higher than that of the NC group, although this difference was not statistically significant, despite MOCCA presenting hints to the OC group. The effect of social cueing in the LC group, however, was more apparent: the scores of the LC group were significantly higher than those of both the OC and NC groups. These results are displayed in Figure 2. We also examined the increase in scores between quiz sets (Pre-test set vs. Test set) within each group. The increase in the Test set score was significant only in the LC group, as displayed in Figure 3.
Additionally, the post-survey results showed no statistically significant difference in the likeability of MOCCA among the participant groups, although participants expressed general fondness (an average of 3.5 on a 5-level Likert scale), as displayed in the left side of Figure 4. However, across the 93 participants, likeability showed a visible linear increase as the social cueing mechanism became more proactive (NC < OC < LC), which implies that LC can be effective in developing a more interactive and effective social robotic cueing mechanism. The participants' responses on the trust level, shown in the right side of Figure 4, show no statistically significant difference and no consistent trend. Nevertheless, the distinct decline in trust level in the oblique cue (OC) group suggests that a random and unexpected social cue can play a negative role in gaining trust in social human-robot interaction.
The third question from the post-survey, as depicted in Figure 5, aligns with our anticipations in the cue design as well as with the trust levels above. The distribution shows a clear difference in cue awareness between OC and LC.
The system also tracked and recorded participants' gaze trajectories in real time while they were responding to the questions. Figures 6 and 7 show the gaze distributions in the OC and LC groups. In the oblique cueing (OC) case (Figure 6), the gaze trajectories are distributed over most of the choice buttons but rarely over the robotic face, which suggests that the robot's social cueing had little effect on the human's attention. However, in the leading cueing (LC) case (Figure 7), the users' attention was focused mainly on the robot's face and on the button toward which the robot directed its leading cue.
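One way to quantify such gaze distributions is to compute the share of eye-tracker samples that fall inside each area of interest (AOI), such as the robot's face or a choice button. The sketch below is a minimal illustration under assumed conventions: the `Rect` type, the `gaze_share` function, and the idea of rectangular AOIs in screen coordinates are hypothetical and are not taken from the MOCCA software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned screen region; coordinates are in arbitrary screen units."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def gaze_share(samples, aois):
    """Return the fraction of gaze samples landing in each named AOI.

    samples: iterable of (x, y) gaze points.
    aois: dict mapping a name to a Rect; AOIs are assumed not to overlap.
    Samples outside every AOI are counted under the key "elsewhere".
    """
    counts = {name: 0 for name in aois}
    counts["elsewhere"] = 0
    total = 0
    for x, y in samples:
        total += 1
        for name, rect in aois.items():
            if rect.contains(x, y):
                counts[name] += 1
                break
        else:
            counts["elsewhere"] += 1
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: c / total for name, c in counts.items()}
```

Under this measure, the pattern described for the OC group would appear as a small share for the face AOI, and the pattern described for the LC group as large shares for the face and the cued button.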

Discussions
The findings from the experiment supported the formulated hypotheses. Our first hypothesis, H1, anticipated that the impact of the robot's gaze-based social cues might be limited due to in-attentional blindness. By comparing the test scores of the NC group (to which no cues were provided) with those of the experimental OC group (to which cues were provided when participants might be reading the choices), we found that there was no significant increase in the OC group's test scores even though the robot had hinted at the answers.
To overcome this breakdown in social communication in IB cases, we designed a "leading cueing" method, in which the robot deliberately guided the participant's attention by achieving eye contact before signaling the correct answer. We hypothesized in H2 that this LC method would improve participants' performance by making it more likely that participants chose the right answers. By comparing the LC group's test scores with those of the NC and OC groups, respectively, we found that the robot's leading cues led to a significant increase in the test scores of the LC group when compared to those of the other two groups.
As Fisher et al. [11] explained, the social and communicative function of the robotic gaze can only be fulfilled when the robot successfully secures the user's attention and willingness to take the message. This experiment also demonstrated the importance of proactively guiding and leading a user's attention through sophisticated interaction design to successfully capture the user's attention in potential IB situations. Moreover, Morgan et al. [35] showed that the cue-agent's mental state directly impacts participants' performance on a perceptual task. Thus, a meaningful follow-up study would be to use the robot's modalities to control its mental state and examine whether doing so further improves how IB is handled.
Furthermore, participants in the LC group reported significantly higher awareness of the robot's help cues (4.3/5.0), whereas most participants in the OC group were largely unaware of the cues (2.2/5.0).
The quiz task used in our experiment can be generalized to situations in which the user needs to focus on a certain task. That is, for the robot to communicate effectively with the user in a general situation in which the user and the robot collaborate, the robot's communication method and timing must be designed with the occurrence of IB in mind. In addition, our quiz questions asked for the capitals of certain countries, but this was only to create a set of common-sense questions. If the task required expertise, IB could arise even more evidently. Thus, if the robot is able to deliver collaborative social cues more effectively, the user's task performance and the likeability and reliability of the robot can be expected to increase.

Conclusions
In this study, we have examined the ability of a social robot to adequately deliver collaborative gaze-based social signals to draw the participant's attention which is easily disturbed by the presence of various visual stimuli. We primarily considered the existence and importance of in-attentional blindness in human-robot interaction with a focus on task-oriented communication by designing and conducting 1-on-1 collaborative quiz experiments. In accordance with the results, we can conclude that a robot's gaze-based social cue may suffer from IB during a human-robot interaction (Hypothesis 1 (H1)); and the proposed proactive attention attraction based on a foundation of joint-attention will effectively improve the performance in a human-robot collaborative task (Hypothesis 2 (H2)).
In future studies, we aim to evaluate different methods by which a social robot can intentionally affect or control a human's attention, such as by incorporating stimuli of various modalities (e.g., eye tracking, vocal interaction, or body gestures). Through these multi-modal approaches, we will expand our study to examine how to effectively increase joint attention, how to avoid in-attentional blindness even in multi-user interaction scenarios, and how to effectively utilize the users' eye gaze data in real time.