Evaluating the Nao Robot in the Role of Personal Assistant: The Effect of Gender in Robot Performance Evaluation

Using the Wizard of Oz (WoZ) and video capture techniques, this paper evaluated the performance of the Nao robot in the role of a personal assistant, together with the impact of the assigned gender (male/female) on the perceived performance of the robot assistant. With a sample of 39 computer science students, the study assessed perceived enjoyment, intention to use, perceived sociability, trust, intelligence, animacy, anthropomorphism, and sympathy, using the Unified Theory of Acceptance and Use of Technology (UTAUT) and the Godspeed Questionnaire (GSQ). The analysis identified a significant effect of the gender assigned to the robot on perceived intelligence and sympathy.


Introduction
Starting as manufacturing machines, robots have expanded into a wider range of application areas such as education [1][2][3], personal assistance [4], health [5,6], and many others. Multiple research studies have described different elements to consider in the design of interactions with socially intelligent robots [6][7][8][9]. The theory of "Social Intelligent Robots" [6] establishes that the interaction with a robot must meet four criteria. Being Socially Evocative relies on anthropomorphizing and capitalizes on the feelings evoked. Being Socially Situated requires being able to react to other social agents and objects in the environment. Being Sociable means proactively engaging with humans to satisfy internal social aims. Displaying Social Intelligence requires showing deep models of human cognition and social competence during the interaction.
One significant social element in Human-Robot Interaction (HRI) is the gender assigned to the robot. Multiple research studies [10][11][12][13][14][15][16] have demonstrated that stereotypes transferred to the robot through its gender play a significant role in the user's interaction: gender affects the user's perception of the robot's performance and attitude when executing tasks stereotypically associated with each gender [10]. Similar tests applied to different robot models reached the same results, and also demonstrated that participants feel more comfortable interacting with robots aligned with stereotypical gender roles [11]. Building on this general finding that gender affects user-robot interaction, other research has developed novel methods in the field, demonstrating differences in the information elicited from users when interacting with robots of different genders [12] and in the level of persuasiveness attributed to the robot depending on the robot's and the user's gender [13]. The effect of robot gender can be perceived even when only isolated variables such as voice [14] or aesthetic appeal [15] are changed.
The current research evaluated the effectiveness of the Nao robot in the role of personal assistant, using criteria such as trust, intention to use, perceived enjoyment, and perceived sociability. In addition, this work evaluated the impact of gender on the perception of this performance by measuring perceived enjoyment, intention to use, perceived sociability, trust, intelligence, animacy, anthropomorphism, sympathy, service value, and topic comfort.

Wizard of Oz (WoZ)
The WoZ technique is used especially with technologies that are still under development. In HRI, the WoZ technique [17] involves simulating the end state of the technology by having people operate the robot. This simulation creates the illusion that the robot works autonomously.

Video-Based Human-Robot Interaction (VHRI)
The VHRI [18] technique is one of the easiest and most affordable techniques, and it allows interaction with advanced technologies to be evaluated in controlled environments. It consists of recording videos of the desired interaction with the robot, which participants then watch.

Unified Theory of Acceptance and Use of Technology (UTAUT)
UTAUT [19] is a theoretical framework created to evaluate the acceptance and usage of technologies. It integrates elements of other theories such as the theory of reasoned action, the motivational model, social cognitive theory, and innovation diffusion theory, among others [15].

Godspeed Questionnaire (GSQ)
The GSQ [20] is a standardized instrument in the HRI field that has been translated into multiple languages and is supported by previous multicultural research. Using semantic differential scales, the GSQ evaluates constructs such as anthropomorphism, animacy, likeability, and perceived intelligence.
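As an illustration of how semantic differential ratings can be turned into construct scores, the following minimal sketch averages the item ratings that belong to each construct for one participant. It assumes a 1 to 5 rating range and scoring by item mean; the item names and example ratings are hypothetical placeholders, not the wording or data of the questionnaire used in this study.

```python
from statistics import mean

# Hypothetical item-to-construct mapping; the item names are illustrative
# placeholders, not the wording of the published questionnaire.
CONSTRUCT_ITEMS = {
    "anthropomorphism": ["anthro_1", "anthro_2", "anthro_3"],
    "perceived_intelligence": ["intel_1", "intel_2", "intel_3"],
}


def score_constructs(responses, construct_items=CONSTRUCT_ITEMS, scale=(1, 5)):
    """Average each construct's item ratings for one participant."""
    low, high = scale
    scores = {}
    for construct, items in construct_items.items():
        ratings = [responses[item] for item in items]
        for rating in ratings:
            # Reject ratings outside the assumed scale range.
            if not low <= rating <= high:
                raise ValueError(f"rating {rating} outside {low}-{high}")
        scores[construct] = mean(ratings)
    return scores


# Made-up ratings for a single participant, for illustration only.
example = {
    "anthro_1": 3, "anthro_2": 4, "anthro_3": 2,
    "intel_1": 5, "intel_2": 4, "intel_3": 4,
}
print(score_constructs(example))
```

Per-participant construct scores of this kind are the values that would then be compared between groups.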

The Robot
For this research, we used the Nao robot, model V6 (see Figure 1), from SoftBank Robotics [21]. The main characteristics of this robot are: 58 cm height, 25 degrees of freedom in limb movement, 4 directional speakers, a limited manipulation of objects, verbal communication capacity, and humanoid appeal.

Design
For this research, we designed two different scenarios:

First Scenario: Gender Evaluation Effect
In the first scenario, we applied the video capture technique and the GSQ to evaluate the impact of the gender assigned to the robot on its perceived performance as a personal assistant. We created two sets of videos of the robot performing the same activities, dialogues, and movements. The only difference was the voice of the robot in each set: in one the voice emulated a male tone, and in the other a female tone. The participant group was randomly divided into two subgroups. Using headphones, each subgroup watched a single set of videos (male or female). In the videos, the robot performed activities related to the role of personal assistant, such as taking notes in a meeting or tracking agenda activities. After watching the videos, each participant completed a survey in which the robot was consistently referred to by the gender of the videos presented.
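The random division into two subgroups can be sketched as follows. The paper does not describe the randomization procedure, so this is only one plausible approach: simple per-participant random assignment, which naturally produces subgroups of unequal size. The function name, condition labels, and seed are assumptions made for the example.

```python
import random


def assign_conditions(participant_ids,
                      conditions=("female_voice", "male_voice"),
                      seed=None):
    """Assign each participant independently to one video condition."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    return {pid: rng.choice(conditions) for pid in participant_ids}


# 39 hypothetical participant identifiers (1..39).
assignment = assign_conditions(range(1, 40), seed=2024)

# Count how many participants landed in each condition.
counts = {}
for condition in assignment.values():
    counts[condition] = counts.get(condition, 0) + 1
print(counts)
```

If equal subgroup sizes were required, shuffling the participant list and splitting it in half would be the usual alternative.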
The survey used in this first part included items of the GSQ, plus some additional items related to the gender effect evaluation. Among the evaluated items are: animacy, anthropomorphism, intelligence, sympathy, value of the service, robot role, and type of information shared.

Second Scenario: Interaction Evaluation
In the second scenario, we applied the WoZ technique and the UTAUT questionnaire items. The participants interacted directly with the robot in a building reception area, where the robot acted as a personal assistant. It provided information regarding university courses, study programs, and professors' office locations, among other administrative information. The flow of the activity depended on the participants' interaction and questions. No gender variables were evaluated in this scenario, and the robot used a standard androgynous voice.
For this scenario, the operator of the robot had a script with the most common questions and answers. In case the participants asked other specific questions, the operator could also type a custom answer for the robot to speak.
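The operator workflow described above (scripted answers first, typed answers as a fallback) can be sketched as below. This is a hypothetical illustration, not the software used in the study: the script entries, function names, and the `speak` callable are assumptions. On a real robot, `speak` would be bound to the robot's text-to-speech service; here it is any function that accepts a string.

```python
# Hypothetical script entries; the paper does not list the actual
# questions or answers available to the operator.
SCRIPTED_ANSWERS = {
    "office_location": "The professor's office is on the second floor.",
    "study_programs": "I can tell you about the available study programs.",
}


def choose_reply(question_key, typed_reply=None, script=SCRIPTED_ANSWERS):
    """Return (reply, source): scripted answer if present, else typed text."""
    if question_key in script:
        return script[question_key], "script"
    if typed_reply and typed_reply.strip():
        return typed_reply.strip(), "typed"
    raise ValueError("no scripted answer and no typed reply provided")


def wizard_turn(question_key, speak, typed_reply=None):
    """Pick the reply for one participant question and send it to the robot."""
    reply, source = choose_reply(question_key, typed_reply)
    speak(reply)
    return source


# Stand-in for the robot's speech output: collect what would be spoken.
spoken = []
wizard_turn("office_location", spoken.append)
wizard_turn("unscripted", spoken.append, typed_reply="The library opens at eight.")
print(spoken)
```

Returning the source of each reply would also let the operator log how often the script covered the participants' questions.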
After the interaction, the participants completed a second survey evaluating this second experience with the robot. For this evaluation, we applied criteria from the UTAUT questionnaire such as intention to use, perceived enjoyment, perceived sociability, and trust.

Population
Both scenarios were applied to a group of 39 computer science students drawn from two different groups, comprising 30 male and nine female participants. The mean age was 21.21 years (SD = 2.32). Of the 39 participants, 17 received the videos of Nao as a female personal assistant, while the other 22 received the videos of Nao as a male personal assistant.

First Scenario: Gender Evaluation Effect
According to the one-way ANOVAs applied to the group, significant differences were identified for sympathy, F(1,37) = 6.13, p = 0.018, and for intelligence, F(1,37) = 4.47, p = 0.036 (alpha = 0.05). In both categories, the robot in the female role achieved higher mean values.
Regarding the significant difference identified in perceived sympathy, the female robot achieved higher values (M = 4.39, SD = 0.65) than the male robot (M = 3.76, SD = 0.87). Within the sympathy category, the female robot scored higher on the items related to being friendly, kind, and pleasant.
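For readers who want to see how a one-way ANOVA F statistic relates to group means and standard deviations, the sketch below computes F from summary statistics alone. It assumes the reported SDs are sample standard deviations and uses the subgroup sizes of 17 (female) and 22 (male). Because the inputs are rounded summary values rather than the item-level data, the result should land close to, but not exactly on, the reported F(1,37) = 6.13. The function name is an illustrative choice.

```python
def f_from_summary(groups):
    """One-way ANOVA F from (n, mean, sd) tuples, one tuple per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n

    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares, recovered from each sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Sympathy ratings as reported: (n, M, SD) for the female and male conditions.
sympathy = [(17, 4.39, 0.65), (22, 3.76, 0.87)]
f_value, df1, df2 = f_from_summary(sympathy)
print(f"F({df1},{df2}) = {f_value:.2f}")
```

With only two groups, this F is the square of the pooled-variance t statistic, so the same comparison could equally be run as an independent-samples t-test.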
Regarding perceived intelligence, the female robot also reached higher values (M = 3.98, SD = 0.77) than the male robot (M = 3.34, SD = 0.87). Within the intelligence category, the female robot scored highest on the items related to being responsible and reasonable.
No significant difference was identified in the value of the service provided according to the gender of the robot, F(1,37) = 0.33, p = 0.571 (alpha = 0.05). The female service value perception was M = 623.47, SD = 314.30, while the male service value perception was M = 270.45, SD = 294.66 (see Figure 2). Finally, no significant differences related to gender were identified in the kind of information the participants felt comfortable sharing with the robot (see Figure 3). However, there was a significant difference between the kinds of information that the participants felt comfortable sharing; "data and information" was the material that participants were most commonly willing to share with the robot (see Table 1).

Conclusions
The performance of the robot as an assistant was evaluated during the direct WoZ interaction, where it obtained high performance values. The levels of sociability, enjoyment, intention to use, and trust were satisfactory according to the standards of the test. Trust, however, was consistently the least favorable category according to the UTAUT standards. The lack of trust in the robot or the technology remains a challenge for this kind of interaction. Future evaluations should also consider the length of exposure to this kind of technology as a variable.
Regarding the gender role evaluated, we identified significant differences in perceived intelligence and sympathy. The robot representing the female role achieved higher values. No significant differences were identified in animacy, anthropomorphism, service value, and topic comfort.
The higher intelligence ratings for the female robot were related to the perception of responsibility and reasonability in the female assistant. Regarding sympathy, the female assistant was perceived as being more kind, friendly, and pleasant.

Analysis
This research evaluated the effectiveness of the Nao robot in the role of personal assistant. The results are consistent with the previous references. The lower level of trust, relative to other variables such as intention to use, is consistent with the participants' unwillingness to share meaningful information with robots: they preferred to share only material in the "data and information" category. It appears that social robots still struggle to earn trust from participants beyond their functionality. Although their performance during tasks is impressive, reliance at deeper levels has yet to be established.
Regarding the effect that the robot's gender has on human interaction, this study shows a significant difference in the perception of intelligence and sympathy. Further investigation is required to disentangle the effect of the selected role, which is conventionally assigned to women, from the mental models linked to the gender itself. According to the current results, in the female representation of the role, the attribution of intelligence could be related to the higher perception of responsibility and reasonability, while the higher sympathy rating might be related to a higher perception of being friendly, kind, and pleasant. This is consistent with previous research stating that better performance is attributed to robots executing tasks aligned with traditional roles. However, more research is required to determine whether that is the only factor causing this significant difference, or whether there is a more positive general perception of the female gender regarding the intelligence and sympathy variables.
Regarding service value, it is interesting that no significant gap was identified between genders, even though one of them was rated higher in its performance. Similar research might also be important to identify gaps in gender perception across different roles and scenarios.
Finally, no significant differences were found in the variables of animacy and anthropomorphism, which might be explained by the fact that no changes were made to the aesthetics or movements of the robot. The reliability of the test was thus maintained across both video sets, as no significant gender-related difference was identified for these variables.
As an internal risk assessment of the current experiment, it is important to consider that the population surveyed comprised only 39 people. Most of them were male, and all were computer science students. Future research should refine the tests applied in this experiment and complement the results found. In addition, extending the scenario to a wider spectrum of roles might also improve the ecological validity of the results.