There is a continuous development of virtual agents and their possible applications. Virtual agents are computer interfaces mostly represented through a virtual embodied character and are humanlike in the way they communicate by using verbal and nonverbal cues [1]. Based on these abilities, those agents are promoted as future personal everyday-life assistants, which might help scheduling and remembering appointments, inform about the weather, or assist with shopping [2]. Although the latest commercial agents (such as Siri or Alexa) are solely speech-based and represented by a voice only, an embodied character was found to enhance the human–agent interaction in a positive way [3]. A meta-analysis revealed the impact of an embodied representation [3]: a comparison of 46 papers showed that the representation of a humanoid face had, in most cases, a more positive outcome compared to the absence of a represented face [3]. Hence, the human–agent interaction seems to benefit from an embodied character.
Furthermore, multiple studies demonstrate that the appearance of those embodied characters affects different social variables (e.g., motivation [4] or the agent’s general evaluation [5]). Thus, a virtual agent’s appearance has a high impact on the human–agent interaction and is of great importance.
Accordingly, it is important to adapt the appearance of the agent to the user’s needs and preferences in order to enhance beneficial effects of human–agent interaction. With regard to the manifold application fields, the target groups and potential users are diverse and will differ in multiple variables. The application as a virtual assistant, for example, seems to be especially beneficial for people in need of support, such as elderly or cognitively impaired people [6]. Nowadays, those people are dependent on other people’s help, but for some tasks, virtual agents can assist them in daily life and thereby enable a more self-determined life. This special target group largely differs from digital natives, such as students, who are usually employed in user studies (e.g., regarding technical skills, prior experiences with technologies, or cognitive abilities). Therefore, it is important to analyze the specific needs and preferences of older and/or cognitively impaired users not only with regard to functions, but also with regard to appearance variables. Besides specific target groups, users with particular attitudes, personality traits, or prior experiences will also interact with a virtual assistant. These characteristics also need to be taken into account when the virtual agent’s appearance is analyzed. This might eventually help to enhance the acceptance of these new technologies and to enrich the social interaction.
The best opportunity to tailor the appearance to the user’s needs seems to be customization by the users themselves. However, this process is expensive, time-consuming, and mostly requires technical skills, which special target groups might lack or find intimidating. Moreover, results showed that free customization and an own design process of a virtual agent’s appearance do not lead to higher user satisfaction compared to a choice out of different options [7]. Therefore, it can be more helpful to know the potential user group and its specific characteristics (e.g., personality traits such as tendency to anthropomorphize) and tailor the appearance or the choice of appearances to these needs. In order to do so, the impact that user characteristics such as age, gender, cognitive impairments, or personality traits have on the perception and evaluation of different appearance variables needs to be analyzed. Until now, little has been known about the preferences of special target groups such as elderly or cognitively impaired people [8] or the influence of user characteristics such as personality traits on the perception of appearance variables. Prior research [8] used qualitative approaches to analyze the preferences of elderly people in regard to a virtual assistant’s appearance. Although this gives interesting insights, quantitative research is needed to specify these findings further. Additionally, no study has investigated the preferences of cognitively impaired people, who are a highly relevant user group [6]. The current approach aims to fill this research gap and explores the effect of user characteristics on the perception and evaluation of different appearance variables in two studies.
Based on previous research, the species of an agent and the degree of realism were identified as key variables affecting human–agent interaction. Accordingly, prior research demonstrates that especially for a specific target group such as elderly people, those appearance variables are the ones of most interest [9] and thus need to be investigated in more depth.
Although most embodied characters are humanoid [8], a humanoid character does not necessarily have to be the best choice, since animals, for instance, were found to enhance buying intention [10] or learning outcomes [11] more than humanoid characters. Therefore, it is highly important to take the species of the virtual agent into account more. However, most studies did not systematically investigate the effect of species, since most of the stimulus materials used also differed regarding other variables such as realism. Realism, in itself, has been shown to be relevant not only in terms of uncanny valley effects [12], but it was also demonstrated that a cartoon stylization can have positive effects on the social interaction [3] and the agent’s overall evaluation such as appeal or perceived friendliness [13]. However, realism has not been investigated systematically in combination with the species of the agent. Thus, there is a lack of controlled, systematic research on the effect of species and realism and their potential interactions.
Therefore, the present studies aim to examine the perception and evaluation of these variables in more depth and additionally explore the effects of user characteristics.
3. Study 1
3.1. Outline and Derivation of Hypotheses
Since various application fields result in different target groups and those target groups might differ in their preferences and needs, Study 1 focuses on target group-related differences (namely, the preferences of seniors and cognitively impaired people in comparison to students) in the perception and evaluation of appearance variables. Modern technologies give great opportunities to support elderly or cognitively impaired people in their daily life and to maintain their autonomous living as long as possible [2]. Virtual assistance systems are, for example, able to remind people, help them to structure their daily life, or support them with making appointments. It has already been demonstrated that people in need of support accept virtual assistants [6]. These findings are mainly focused on elderly people, but might also be valuable for other target groups, such as cognitively impaired people, as they have similar needs of support. Vardoulakis et al. [23] designed a social relational agent for elderly people (based on a Wizard of Oz setting) in order to care for their mental health and investigate different topics that users may like to talk about with a conversational virtual agent. After talking for one week with the agent on a daily basis, participants stated high levels of companionship, support, and satisfaction and felt comfortable having a virtual agent to talk to at their homes. Moreover, it has been shown that even if interaction problems occur during interaction with a speech-based conversational agent, elderly people and people with cognitive impairments state good levels of acceptance [6].
In summary, an embodied virtual agent is assumed to enhance the daily-life assistance of elderly people and people with special needs. However, those findings are based on interactions with agents that have one specific appearance and thus cannot be generalized for all agents. Since appearance was found to affect the perception and evaluation of a virtual agent in multiple ways, it needs to be investigated how these target groups perceive and evaluate different appearance variables. Thus, what remains open is the question of how a virtual agent for these target groups should look. There are some insights with regard to e-commerce. The results of focus groups with 25 elderly people indicate that abstract agents were preferred, because they are less distracting than humans or humanoid agents [8]. Although these results provide initial insights, it is unclear whether they are transferable to other applications and tasks. Since in the context of e-commerce, the presented products need to be in the user’s focus, a humanoid agent might be distracting; however, in the context of personal assistance, a humanoid agent could be perceived as more serious and engaging. A qualitative interview study in the context of daily-life assistance gave first hints that age-related differences with regard to appearance preferences exist [9]. In contrast to the findings of Chattaraman et al. [8], seniors stated in these interviews that they would prefer a more realistic humanoid agent, while students rejected this kind of appearance [9]. It seems as if seniors, who are mostly less experienced with technologies, strive for higher familiarity than students in order to foster their trust in the agent and to remove boundaries of technology usage. However, both studies used qualitative methods and therefore their results are difficult to generalize. The current study aims to investigate these target groups and their preferences in more detail. Based on the findings of Straßmann and Krämer [9], we hypothesize the following:
Hypothesis 1 (H1).
Seniors evaluate humanoid agents more positively compared to nonhumanoid agents (animals and robots).
Hypothesis 2 (H2).
Students evaluate nonhumanoid agents (animals and robots) more positively than humanoid ones.
Hypothesis 3 (H3).
Seniors and students evaluate varying degrees of realism differently.
Since Yaghoubzadeh et al. [6] showed that virtual assistance is also accepted by cognitively impaired people and as this technology might be highly beneficial for these people, it is important to investigate their special needs. To the authors’ knowledge, there are no findings reported to date about the preferences of people with cognitive impairments regarding appearance variables. Nonetheless, it has been shown that people with intellectual disabilities preferred simpler visual representations of hyperlinks to browse more easily through the internet [24]. Consequently, it can be assumed that people with cognitive impairments also prefer a simpler visual appearance (e.g., a cartoon stylization or reduced detail) to avoid distraction. At the same time, the same principles that are assumed for seniors might apply, in the sense that this target group requires higher familiarity to enhance acceptance and trust. The current study explores the perception of appearance variables by people with cognitive impairments for the first time and intends to answer the following research question:
How do cognitively impaired people perceive and evaluate different species and degrees of realism?
The first study aims at investigating the effect of the target group on the evaluation of different appearance variables. Therefore, three different target groups (students, seniors, and cognitively impaired people) participated in this study. A mixed design with one between-subjects factor (3 target groups) and two within-subjects factors (3 species × 5 degrees of realism) was used. All participants evaluated 30 different virtual agents (two agents per species, each shown at all five degrees of realism), which differed mainly in regard to species and realism (see Figure 2). The order of the presentation was randomized to prevent sequence effects. The questionnaire was programmed with the tool SoSciSurvey [25]. Participants were invited to the lab, since not all target groups have internet access and in order to avoid high drop-out rates. The local ethics committee approved the study.
Overall, 59 participants from three different target groups completed the questionnaire; 22 students (12 female; age: M = 21.45, SD = 3.99), 21 seniors (14 female; age: M = 68.14, SD = 8.42), and 16 people with cognitive impairments (3 female; age: M = 45.81, SD = 19.33) took part. About half of the participants (31 participants: 11 students, 12 seniors, and 8 cognitively impaired people) knew what a virtual agent is, but only 13 persons (8 students, 4 seniors, and 1 cognitively impaired person) had interacted with one before. The students’ data were collected at a large European university, while the cognitively impaired people were recruited in a European Health Care foundation and seniors participated partly in both places. Students were incentivized by course credits, while elderly and cognitively impaired people received 7 Euro as compensation for their participation. All participants classified as seniors needed to be at least 65 years old. For the cognitively impaired sample, no age restriction was used, but participants were all affiliated with a psychosocial care facility and were chosen by skilled staff of the Health Care Foundation. No concrete diagnoses were provided for these participants, as on the one hand, these are often very difficult to derive even for skilled persons, and on the other hand, we wanted to explore the preferences of this generally heterogeneous group. Since a virtual assistant should be useful and effective for people with all kinds of impairments and needs of support, the goal was not to determine the preferences of people with a specific impairment. During the recruitment process, it was checked that those people had a certain degree of cognitive impairment, but that they were nevertheless able to express their own opinion correctly.
3.2.2. Stimulus Material
As the stimulus material, 30 different static pictures were used. All pictures had the same size and showed the head of a virtual character, which varied in species and realism.
With regard to species, humans, animals, and robots were chosen. Since even within each of these categories the range of possible appearances is enormous, a pretest with a total of 18 agents (6 for each species) was conducted first. Overall, 24 people (18 female, age: M = 32.46, SD = 12.12) who were not participants of the two main studies evaluated these agents in a between-subjects design on single items measuring perceived realism, likability of the agent, and willingness to interact. To support the generalizability of the results, two agents for each species were chosen (for example pictures, see Figure 2) based on their likability scores from the pretest. To ensure that observed differences in the main studies would be caused by the manipulation only, the chosen stimuli should not differ in perceived likability, realism, and willingness to interact. In addition, those stimuli with the highest likability scores were chosen. In the end, a woman, a man (both created with Autodesk’s character builder [26]), a fox, a giraffe, the Nao robot, and a more anthropomorphic robot were used. The results of the pretest demonstrated that the agents within each species were not evaluated radically differently. Therefore, both versions of each species were collapsed for the calculations.
Realism was manipulated in 5 different degrees based on prior research [5]. As mentioned above, multiple subcategories influence the overall perception of realism [9] and therefore are assumed to influence the overall perception of the agent. To investigate these effects in more detail, two subcategories were manipulated: degree of detail and stylization. Thus, the manipulation can be divided into realistic appearances and stylized appearances with a more cartoon-like look. While the realistic appearances are further divisible into high detail and low detail (comparable to the concepts of Gulz and Haake [27]), the stylization was either applied to the proportions, to the shading, or both (see [5]). As realism and its subcategories can be seen as a continuum ranging from high realism to low realism, the created stimuli can be sorted along this continuum: (1) realistic high detail, (2) realistic low detail, (3) stylized shade, (4) stylized proportion, and (5) stylized shade and proportions. With decreasing detail and increasing stylization, the perceived realism of the agent is assumed to decrease as well. The pictures chosen on the basis of the pretest all scored medium–high on their resolution. In addition, the appearances had a high level of detail (1), since details such as skin or fur properties were visible (for the humanoid agents, wrinkles, rashes, and freckles were displayed; for the machine-like agents, reflections were visible; and the fur of the animals was pictured in great detail). To create an appearance with a lower degree of detail (2), those properties were smoothed out. Based on the stimuli with lower detail, the shading (3) was manipulated to create a cartoon-like look. To this end, the outlines were thickened, a soft focus was applied, and the shading itself was manipulated. In order to stylize the agent’s proportions (4), facial properties were varied. Depending on the species, key features such as the eyes and mouth became bigger, while less important parts such as the chin got smaller. Both manipulations (shading and proportions) were applied to create the last degree of realism (5). All manipulations of the degree of realism were done manually with Photoshop Elements using the same criteria for all species. Figure 2 presents an example of the stimulus material used. In the end, 30 different pictures of agents were created systematically.
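The structure of the stimulus set can be sketched in a few lines of Python. This is only an illustration of the crossing described above (three species, two agents per species, five degrees of realism); the dictionary keys, label strings, and the function name `build_stimuli` are assumptions introduced here and are not taken from the study materials.

```python
from itertools import product

# Labels follow the text; the identifiers themselves are illustrative assumptions.
SPECIES_AGENTS = {
    "human": ["woman", "man"],
    "animal": ["fox", "giraffe"],
    "robot": ["Nao", "anthropomorphic robot"],
}

# Ordered from highest (1) to lowest (5) assumed realism.
REALISM_LEVELS = [
    "realistic high detail",
    "realistic low detail",
    "stylized shade",
    "stylized proportion",
    "stylized shade and proportions",
]


def build_stimuli():
    """Return one record per picture: every agent crossed with every realism level."""
    stimuli = []
    for species, agents in SPECIES_AGENTS.items():
        levels = enumerate(REALISM_LEVELS, start=1)
        for agent, (rank, realism) in product(agents, levels):
            stimuli.append(
                {
                    "species": species,
                    "agent": agent,
                    "realism_rank": rank,
                    "realism": realism,
                }
            )
    return stimuli


if __name__ == "__main__":
    stimuli = build_stimuli()
    print(len(stimuli))  # 3 species x 2 agents x 5 realism levels = 30
```

Collapsing the two agents of a species, as described for the analyses, would then correspond to averaging ratings over the `agent` field within each species and realism level.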
As the dependent variable, person perception was measured repeatedly for each of the 30 stimuli. Five different subscales with 12 items overall were used: likability, uncanniness, realism, willingness to use, and appeal. Most items were selected from prior research [13], and self-constructed items to measure uncanniness were added. Given the within-subjects design and the high number of stimuli, we aimed to keep the measurements short. Therefore, realism, willingness to use, and liking were queried using single items. The scale of likability contained 4 items (not attractive (reversed), unlikable (reversed), reliable, and pleasant) and showed good internal consistency (Cronbach’s alpha = 0.81). Furthermore, two items (uncanny and negative) were used to measure uncanniness (Cronbach’s alpha = 0.84). Participants rated their agreement to these items on a 5-point Likert scale ranging from 1: “strongly disagree” to 5: “strongly agree”. Some additional variables were measured, but since these are not in the focus of the study’s research aim, they are not reported in this paper. Since a proportion of the sample had cognitive impairments, the questionnaire was adapted to their special needs in order to guarantee that these participants understood what we intended to measure. The language was adapted in a way that the instructions were simplified, but the items themselves were the same for all three groups to ensure comparability of the results.
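As an illustration of how an internal-consistency value such as the one reported for the likability scale is typically obtained, the following sketch reverse-scores negatively worded items on the 5-point scale and computes Cronbach’s alpha with the standard formula. The ratings are invented for demonstration and the function names are assumptions; this is not the authors’ analysis script.

```python
from statistics import variance


def reverse_score(value, low=1, high=5):
    """Mirror a rating on the scale, e.g. 1 <-> 5 and 2 <-> 4 on a 5-point scale."""
    return low + high - value


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list of scores per item)."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sum score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


if __name__ == "__main__":
    # Hypothetical ratings from five respondents (NOT data from the study).
    not_attractive = [2, 1, 4, 2, 5]  # reversed item
    unlikable = [1, 2, 4, 1, 4]       # reversed item
    reliable = [4, 4, 2, 5, 2]
    pleasant = [5, 4, 2, 4, 1]

    likability_items = [
        [reverse_score(v) for v in not_attractive],
        [reverse_score(v) for v in unlikable],
        reliable,
        pleasant,
    ]
    print(round(cronbach_alpha(likability_items), 2))
```

Reverse-scoring before computing alpha matters: leaving the two negatively worded items unreversed would make them correlate negatively with the other items and deflate the estimate.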
The experimenter welcomed the participants to the lab and instructed them to fill in the questionnaire at the computer. Before the actual study began, the experimenter informed the participants about the aim and procedure to gain proper informed consent. As was done for the measurements, the whole introduction and debriefing material was also adapted to simple language in consultation with trained staff from the Health Care Foundation. Therefore, proper informed consent was also ensured for people with cognitive impairments. Furthermore, the experimenter assisted the participants (especially those in need of support) whenever it was necessary. However, participants were asked to answer the questionnaire autonomously and were informed that they should only rely on the experimenter for comprehension problems. After an initial introduction, participants were asked about their prior experiences with virtual agents. Regardless of their answer, a short description of virtual agents was presented to guarantee that all participants had the same definition of a virtual agent in mind. Further on, participants’ usage intention was queried. Before the stimulus material was presented, participants were asked to imagine a scenario 5 years in the future, where virtual agents are widespread. This future scenario was used to ensure that participants did not think about technical restrictions while they were evaluating the presented stimuli. When the main evaluation began, all 30 stimuli were presented in a randomized order. After each stimulus, the dependent variables were assessed. Toward the end, sociodemographic variables were queried. Finally, a debriefing informed the participants about the main research questions.
To investigate the presented hypotheses and research questions, multiple mixed analyses of variance (ANOVAs) were calculated, with target group as a three-level between-subjects factor (students, seniors, and cognitively impaired people) and two within-subjects factors: species (three levels: human, animal, and robot) and realism (five levels: high detail, low detail, stylized shade, stylized proportions, and stylized shade with stylized proportions). When the assumption of sphericity was violated, the Greenhouse–Geisser correction is reported. Interaction effects between the within-subjects factors and the between-subjects factor were further analyzed by calculating the repeated-measures ANOVAs again separately for each target group. In these cases, post-hoc tests using the Bonferroni correction are presented.
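The results that follow report partial eta squared (ηp2) next to each F-test. As a reading aid, the sketch below shows the standard identity that recovers ηp2 from an F-value and its degrees of freedom, and how a Greenhouse–Geisser epsilon scales the degrees of freedom of a within-subjects effect in a mixed design. The function names are assumptions, and the epsilon value is a placeholder because the estimates are not given in the text. Tests that rest on a different error term (for example, multivariate tests) are not expected to reproduce with this identity.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from F and its degrees of freedom.

    Follows from F = (SS_effect / df_effect) / (SS_error / df_error)
    and eta_p^2 = SS_effect / (SS_effect + SS_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def greenhouse_geisser_df(epsilon, levels, n_participants, n_groups):
    """Corrected (df_effect, df_error) for a within-subjects effect in a mixed design.

    Uncorrected: df_effect = levels - 1 and
    df_error = (levels - 1) * (n_participants - n_groups).
    Both are multiplied by the sphericity estimate epsilon (<= 1).
    """
    df_effect = levels - 1
    df_error = df_effect * (n_participants - n_groups)
    return epsilon * df_effect, epsilon * df_error


if __name__ == "__main__":
    # Species effect on uncanniness as reported in the text: F(2, 112) = 3.735.
    print(round(partial_eta_squared(3.735, 2, 112), 3))  # 0.063

    # With 59 participants in 3 groups and 3 species levels, epsilon = 1.0
    # (sphericity assumed; placeholder value) gives the uncorrected df (2, 112).
    print(greenhouse_geisser_df(1.0, levels=3, n_participants=59, n_groups=3))
```

An epsilon below 1 shrinks both degrees of freedom by the same factor, which is why corrected tests in the results appear with fractional values.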
3.3.1. Perceived Realism
First, we examined whether the manipulation was successful and investigated the effect of the manipulation on perceived realism. A significant main effect for species occurred; F(2, 55) = 37.110, p < 0.001, ηp2 = 0.399. Post-hoc analyses revealed that all three species differed significantly from each other. While humans were rated as the most realistic (M = 3.07, SE = 0.11), animals (M = 2.77, SE = 0.12) were less realistic and robots (M = 2.00, SE = 0.11) were least realistic. Further on, a significant main effect for realism emerged; F(4, 53) = 50.59, p < 0.001, ηp2 = 0.475. The post-hoc analyses showed that there was no difference between a realistic agent with high and low detail, while the three stylized agents were rated as significantly less realistic than the realistic agent with high and low detail. Furthermore, there was no difference in perceived realism between an agent with a stylized proportion and an agent with a stylized proportion and shading, but a stylized shading was rated as being significantly more realistic compared to both the other stylized agents (see Table 1 for means and standard error).
Target-group effects. However, no main effect for target group could be found, indicating that ratings from students, seniors, and cognitively impaired people were, in general, the same regarding the perception of realism.
With regard to likability, no main effect of species was found, while a main effect of realism occurred; F(2.555, 153.093) = 9.861, p < 0.001, ηp2 = 0.150. Post-hoc analyses revealed that agents with both nonstylized degrees of realism were perceived as significantly more likable than agents with stylized proportions or agents whose shade and proportions were stylized (see Table 2). Overall, there was no significant difference in the target groups’ likability evaluation. However, significant interaction effects of the species and target group (F(3.962, 110.938) = 11.151, p < 0.001, ηp2 = 0.285) as well as of realism and target group (F(5.110, 143.093) = 16.881, p < 0.001, ηp2 = 0.376) were found. Post-hoc analyses revealed the following patterns: Students perceived no differences between the three species in likability, while seniors and cognitively impaired people did. Seniors rated the robot (M = 2.16, SE = 0.12) as significantly less likable than humans (M = 3.25, SE = 0.16) and animals (M = 2.94, SE = 0.18). In contrast, cognitively impaired people evaluated robots (M = 3.53, SE = 0.22) as being more likable than animals (M = 2.77, SE = 0.16), while they perceived no differences concerning likability between robots and humans (M = 3.09, SE = 0.24) or humans and animals. With regard to realism, students rated agents with both nonstylized degrees of realism as being significantly more likable than the three stylized versions, while seniors only saw differences in likability between the realistic version with low detail and the stylized proportions. Cognitively impaired people perceived agents with all degrees of realism as being equally likable. (Consult Table 2 for means and post-hoc analyses.)
Further calculations showed significant main effects of species (F(2, 112) = 3.735, p = 0.027, ηp2 = 0.063) and realism (F(2.746, 153.767) = 252.843, p < 0.001, ηp2 = 0.820) on perceived uncanniness. Post-hoc analysis demonstrated that animals (M = 2.12, SE = 0.09) were evaluated as being significantly less uncanny compared to robots (M = 2.38, SE = 0.09), while humans (M = 2.26, SE = 0.08) did not differ significantly from animals or robots. Further on, agents with a stylized shade (also true for the combination with a stylized proportion) were perceived as being less uncanny compared to those with a realistic shading (high detail, low detail, and stylized proportions) (see Table 3). Again, no main effect of target group emerged as being significant. However, significant interaction effects of the species and target group (F(3.959, 110.853) = 7.792, p < 0.001, ηp2 = 0.218) as well as of realism and target group (F(5.492, 153.767) = 11.292, p < 0.001, ηp2 = 0.287) were found. Students again perceived no differences in uncanniness between the species. The same pattern that was found for likability with regard to robots occurred for seniors and cognitively impaired people: seniors perceived robots (M = 2.64, SE = 0.14) as being more uncanny than humans (M = 1.90, SE = 0.13) and animals (M = 2.01, SE = 0.14), while cognitively impaired people evaluated robots (M = 2.13, SE = 0.23) as being least uncanny (humans: M = 2.39, SE = 0.20; animals: M = 2.71, SE = 0.16). Moreover, cognitively impaired people evaluated appearances without any stylization as being more uncanny than students and seniors did (Table 3).
3.3.4. Liking of the Agent
Appearance effects. Further on, we found significant differences between the species with regard to the participants’ liking of the agent (F(2, 112) = 5.232, p = 0.007, ηp2 = 0.085). Post-hoc analyses revealed that humans (M = 2.68, SE = 0.11) were more liked than robots (M = 2.25, SE = 0.11), while animals (M = 2.50, SE = 0.13) did not differ from robots or humans. The degree of realism also affected the participants’ evaluation of liking (F(2.957, 165.615) = 19.350, p < 0.001, ηp2 = 0.257). The lower the degree of realism, the less people liked the appearance.
Agents with a realistic rendering (high and low detail) were significantly more liked than appearances with a cartoon stylization (all three conditions), while an appearance with cartoon-stylized proportions evoked significantly higher ratings of liking than the combination of a stylized shading and stylized proportions (see Table 4). Target groups did not differ in their evaluation of liking. However, again, interaction effects were obtained for target group and species (F(3.809, 106.662) = 4.980, p < 0.001, ηp2 = 0.151) as well as for target group and realism (F(5.815, 165.615) = 4.964, p < 0.001, ηp2 = 0.151). Again, students’ liking of the species was not significantly different, and seniors liked robots (M = 1.52, SE = 0.12) significantly less than humans (M = 2.37, SE = 0.26) and animals (M = 2.26, SE = 0.21). In contrast to prior findings, cognitively impaired people liked animals (M = 3.38, SE = 0.17) more than robots (M = 2.64, SE = 0.25), while there was no significant difference between robots and humans (M = 2.81, SE = 0.24) or humans and animals. Furthermore, for seniors and cognitively impaired people, the degree of realism had no effect on whether they liked the agent. Students liked more realistic agents more than ones with stylized proportions or with stylized proportions and shade (see Table 4).
3.3.5. Willingness to Use
Moreover, the participants’ willingness to use the agent was examined. The mixed ANOVA showed significant main effects for species (F(2, 112) = 3.798, p = 0.025, ηp2 = 0.064) and realism (F(2.916, 163.274) = 16.062, p < 0.001, ηp2 = 0.223). Post-hoc tests showed that participants were more likely to use a humanoid agent (M = 2.62, SE = 0.11) compared to a machinelike one (M = 2.27, SE = 0.10). Similar to participants’ liking behavior, the usage intention decreased with the degree of realism. Post-hoc tests revealed that appearances with a realistic rendering evoked the highest usage intention values, while participants stated that they were least likely to use an agent with stylized shading combined with stylized proportions (see Table 5). Further on, a significant main effect of target group was found (F(2, 56) = 735.755, p < 0.001, ηp2 = 0.929). Cognitively impaired people (M = 3.08, SE = 0.17) stated higher values of usage intention compared to seniors (M = 2.01, SE = 0.15) and students (M = 2.21, SE = 0.15). Additionally, interaction effects of the manipulations and the target groups were found (species: F(3.73, 104.54) = 4.047, p = 0.005, ηp2 = 0.126; realism: F(5.83, 163.27) = 4.913, p < 0.001, ηp2 = 0.149). The same patterns that were shown with regard to liking behavior occurred for usage intention: for students, the species had no effect on their usage intention, while seniors were significantly less willing to use robots (M = 1.52, SE = 0.13) than humans and animals (humans: M = 2.23, SE = 0.27; animals: M = 2.27, SE = 0.21), and cognitively impaired people showed higher usage intention for animals (M = 3.38, SE = 0.18) compared to robots (M = 2.88, SE = 0.23). With regard to realism, students stated a higher usage intention for agents with a realistic rendering, while seniors and cognitively impaired people showed no differences with regard to realism in their usage intention (see Table 5).
3.4. Interim Conclusions
In line with prior research, the presented findings give hints that species and realism affect the agent’s evaluation and user’s usage intention of the agent. While there was no main effect of species on the perceived likability, robots were evaluated as being more uncanny compared to animals. However, humanoid agents were more liked and evoked higher usage intention compared to machinelike agents. Although appearances with stylized shading were perceived as being less uncanny than those with realistic shading, a higher degree of realism was rated as being more likable than a stylized appearance. Overall, the liking of the agent and user’s usage intention decrease with the degree of realism.
Results of Study 1 suggest that overall, there are differences between the target groups. For students, the species of the agent seems to be of lower importance, since they evaluated the different species equally with regard to person perception. Overall, seniors rejected robots and rated them more negatively than humans and animals. Cognitively impaired people rated robots as being more likable and less uncanny than other species, but liked animals more and stated higher usage intention for animals. In contrast, the degree of realism has mostly no effect on person perception for seniors and cognitively impaired people, while students evaluated a more realistic appearance as being more positive. Thus, results indicate that students seem to rely more on realism, while the species is more important for seniors and cognitively impaired people.
4. Study 2
4.1. Outline and Derivation of Hypotheses
Results of Study 1 give the first insights into the effect of species and realism on an agent’s perception and evaluation, especially in regard to target group-related differences. Nevertheless, this study has some limitations that make it difficult to derive generalizable implications. Due to the different target groups and the high effort of recruiting this group of people, the sample size of Study 1 was rather small and a within-subjects design was used. Therefore, these characteristics might have influenced the results and produced more differences between stimuli.
Study 2 therefore employs a between-subjects design in order to complement the findings of Study 1. Instead of focusing on differences between specific user groups, the current study mainly investigates the effects of species (RQ1), realism (RQ2), and their potential interaction effect (RQ3) in a more controlled manner with a larger sample. In addition, to further contribute to the question of user characteristics, different moderating variables are taken into account to investigate the influence of personality traits. This is based on research which shows that personality traits have an impact on the evaluation of an artificial entity [29
One user characteristic of interest is the tendency to anthropomorphize, which can be defined as “the tendency to apply human characteristics (i.e., emotions, motivations, and goals) to nonhuman animals, objects, and natural entities” [30
] (p. 214). It has already been shown that anthropomorphism tendency is positively related to the perception of uncanniness [31
]. However, these insights were found in the context of the perception of readers of fiction, and thus it has to be investigated whether they are also valid for the perception of artificial entities. Since both are fictional, the same principles might well be applicable, and it can therefore be assumed that the tendency to anthropomorphize affects the perception of different appearance variables. People who have a higher tendency to anthropomorphize might feel more comfortable interacting with a fictional character, since they are better able to attribute humanoid characteristics to nonhumanoid characters. Therefore, we assume the following:
Hypothesis 4 (H4).
A higher tendency to anthropomorphize leads to more liking of nonhumanoid agents.
In the context of human–robot interaction, negative attitudes and anxieties towards these technologies were found to affect the interaction [32
]. We assume that these effects are also transferable to virtual agents. People with negative attitudes or anxieties toward agents might evaluate varying appearance variables differently. For instance, people who are afraid of social harm caused by virtual agents might be more likely to wish for a clear distinction between the virtual and real world and therefore prefer a higher degree of stylization and more artificial species.
Hypothesis 5 (H5).
Negative attitudes and anxieties towards virtual agents moderate the effect of species and realism on liking and usage intention.
For Study 2, an online study with a 3 (species: human, animal, and robot) × 5 (degree of realism: high detail, low detail, cartoon proportions, cartoon shade, and cartoon proportions and shade) between-subjects design was conducted. The questionnaire was accepted for the recruitment of participants by the SoSciSurvey panel [33
]. This panel has about 65,000 active members. While the age is almost balanced, most of those people in the panel have a high level of education. The local ethics committee approved the study.
Overall, 792 people filled in the questionnaire. Gender was not equally balanced, since 304 (39%) men and 471 (59%) women participated. In addition, 17 people (2%) did not want to state their gender. Age ranged from 15 to 80 years, with an average age of 39 years (M = 38.63, SD = 14.77). When participants had interacted with an agent before, most of them had talked to an agent without an embodiment (78%).
4.2.2. Stimulus Material
Appearance was manipulated in the same way as in Study 1. Species was varied between humans, animals, and robots (with two variations of each species), and each was rendered in five different degrees of realism (high detail, low detail, stylized shade, stylized proportions, and stylized shade and proportions). Thus, the same 30 pictures (see Figure 2
) were used, but this time, in a between-subjects design. Every participant saw and evaluated only one picture.
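The stimulus structure described above (3 species × 2 variations × 5 degrees of realism = 30 pictures, one picture per participant) can be sketched as follows. This is only an illustrative sketch: the text does not state how participants were allocated to pictures, so simple random assignment is an assumption here, and the names `build_stimuli` and `assign_stimulus` as well as the seed are hypothetical.

```python
import itertools
import random

SPECIES = ["human", "animal", "robot"]
REALISM = [
    "high detail",
    "low detail",
    "stylized shade",
    "stylized proportions",
    "stylized shade and proportions",
]
VARIANTS = [1, 2]  # two variations of each species


def build_stimuli():
    """Return every species x variant x realism combination (30 pictures)."""
    return [
        {"species": s, "variant": v, "realism": r}
        for s, v, r in itertools.product(SPECIES, VARIANTS, REALISM)
    ]


def assign_stimulus(stimuli, rng):
    """Between-subjects design: each participant sees exactly one picture.

    Assumption: simple random assignment; the actual allocation
    procedure is not described in the text.
    """
    return rng.choice(stimuli)


stimuli = build_stimuli()
rng = random.Random(42)  # fixed seed only to make the sketch reproducible
assignments = [assign_stimulus(stimuli, rng) for _ in range(792)]
```

Because the two variations of each species are collapsed in the analyses, each of the 15 design cells (species × realism) is represented by two pictures.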
As dependent variables, participants evaluated the stimulus material regarding person perception, liking, perceived usefulness, usage intention, and trust. Person perception was measured with five different subscales: anthropomorphism, likability, appeal, trustworthiness, and competence. These were five-point semantic differentials and most of the items originated from the Goodspeed Questionnaire [28
] and the measures of McDonnell et al. [13
]. Anthropomorphism (e.g., unreal–real or machine-like–humanoid) and likability were measured with six pairs of adjectives, while appeal (e.g., appealing–not appealing or attractive–unattractive), trustworthiness (e.g., trustworthy–not trustworthy or reliable–not reliable), and competence (e.g., competent–incompetent, intelligent–not intelligent) contained four item pairs. Liking was a self-constructed scale with five items (e.g., if I had a personal virtual agent, I would wish that the agent would look exactly like this) to measure how much participants liked the presented appearance. To measure usage intention, perceived usefulness, and trust, a scale from Heerink et al. [34
] was transferred to the application of virtual agents. Participants rated the items of those scales on a five-point Likert scale ranging from 1: “strongly disagree” to 5: “strongly agree”. All dependent variables showed good or excellent internal consistency (see Table 6
To measure the users’ tendency to anthropomorphize things, a scale with 10 items (e.g., I sometimes wonder if my computer deliberately runs more slowly after I have shouted at it) was used [30
]. From the original scale with 20 items, only those 10 items focusing on the present behavior and feelings were used. The internal consistency was good (Cronbach’s alpha = 0.82). The users’ attitude toward agents (three items, e.g., I think it’s a good idea to use a virtual agent) and anxiety towards agents (four items, e.g., if I should use a virtual agent, I would be afraid to make mistakes with it) originate from Heerink et al. [34
]. These scales showed an acceptable reliability (Cronbach’s alpha = 0.84 and 0.78, respectively). Additionally, the scale of negative attitude towards robots [35
] with its three subscales was transferred to the context of virtual agents. Although the Cronbach’s alpha values were not sufficient (ranging from 0.59 to 0.69), this scale was used, since it was very relevant for the presented research questions and is a well-established scale in the realm of human–robot interaction. Again, participants used a five-point Likert scale to state their agreement for all moderating variables. Furthermore, participants stated their age, gender, educational background, and prior experiences with virtual agents.
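For readers unfamiliar with the reliability coefficient reported for these scales, the following minimal sketch shows how Cronbach’s alpha is computed from item-level responses using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The response matrix is invented purely for illustration and is not data from the study; the function name `cronbach_alpha` is likewise hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of participants, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five participants to three five-point Likert items.
example_responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(example_responses)
```

Alpha approaches 1 when the items vary together across participants and drops when the item variances are large relative to the variance of the sum score, which is the pattern behind the lower values reported for the three subscales of the negative attitude scale.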
In the beginning, participants were welcomed and thanked for their willingness to participate. After this initial introduction, their tendency to anthropomorphize was queried. Thereafter, general questions about virtual agents such as prior experiences or the participant’s attitude towards this technology were assessed. Before the picture of the agent was presented, participants were asked to imagine the same future scenario as the one used in Study 1. Each participant saw one of the 30 different agents and rated it with regard to the dependent variables. After the participants’ sociodemographics were measured, participants were debriefed. At the end, participants had the chance to take part in a lottery.
4.3.1. Person Perception
To investigate the effect of species and realism on person perception, a multivariate analysis of variance (MANOVA) with two between-subject factors (species and realism) on anthropomorphism, likability, appeal, trustworthiness, and competence was calculated. Main effects of species on all variables except for competence were found (see Table 7
). All three species differed with regard to anthropomorphism; while animals scored highest, humans reached moderate values and robots were perceived as being least anthropomorphic (Table 8
). With regard to likability, appeal, and trustworthiness, animals differed significantly from humans and robots: animals were perceived as more likable, more appealing, and more trustworthy than humans and robots (see Table 8
In addition, a significant main effect of realism occurred for anthropomorphism (F(4, 456) = 7.459, p < 0.001, ηp2 = 0.061) and appeal (F(4, 456) = 3.493, p = 0.008, ηp2 = 0.030). Post-hoc analyses with Bonferroni correction revealed that the stylization of the shade leads to lower anthropomorphism, since a stylized shade (also in combination with a stylized proportion) differed significantly from both versions without any stylization (high and low detail) (see Table 9
). In addition, the realistic style with lower detail was perceived as being more appealing than the complete cartoon stylization. Interaction effects of both independent variables were only found for anthropomorphism (Table 7
). Animals were seen as more anthropomorphic than humans and robots for most of the realism degrees, but when the cartoon stylization was applied to both proportion and shade, no difference between the species occurred (Figure 3
4.3.2. Liking of the Agent
Moreover, a two-factorial ANOVA with species and realism as independent variables and liking as the dependent variable showed a main effect only for species (F(2, 734) = 7.578, p = 0.001, ηp2 = 0.020), indicating that animals (M = 1.96, SD = 0.96) and robots (M = 1.94, SD = 0.99) were more liked than humans (M = 1.68, SD = 0.74). However, no main effect for realism occurred, nor was an interaction effect found.
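The effect size reported alongside each F-test can be recovered from the F-value and its degrees of freedom, since partial eta squared equals SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error). The short sketch below applies this identity to the species effect on liking reported above; the function name `partial_eta_squared` is hypothetical and the snippet is an illustration, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F-test.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of species on liking: F(2, 734) = 7.578
liking_species = partial_eta_squared(7.578, 2, 734)
```

Rounded to three decimals, this reproduces the reported value of 0.020, that is, species accounts for about 2% of the variance in liking that is not explained by the other effects in the model.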
4.3.3. Usage Intention, Perceived Usefulness, and Trust
A second MANOVA has been calculated with usage intention, perceived usefulness, and trust towards virtual agents as dependent variables and species and realism as independent variables. We found significant main effects of species (usage intention: F(2, 744) = 1.769, p = 0.008, ηp2 = 0.013 and perceived usefulness: F(2, 744) = 6.899, p = 0.001, ηp2 = 0.018), while no effect of realism nor an interaction effect was revealed. Usage intention was higher for robots than for humans and animals, while robots also were perceived as being more useful than animals (Table 10
4.3.4. Moderating Variables
To investigate the effect of user characteristics such as age, gender, anthropomorphic tendency, attitudes toward agents, anxieties toward agents, and the negative attitude toward agents on usage intention and liking of the agent, two hierarchical regression analyses were calculated separately. Predictors were entered in the following order: age and gender (step 1); the tendency to anthropomorphize (step 2); and attitude toward agents, anxieties toward agents, and all three subscales of negative attitude toward agents (step 3). As presented in Table 11
, the anthropomorphism tendency and participants’ attitude toward agents significantly explain the variance of usage intention and liking of the agent. The greater the tendency to anthropomorphize or the more positive the participant’s attitude toward agents, the higher the participant’s usage intention and liking of the agent. Moreover, the anxiety toward agents and negative attitudes toward the social influence of agents significantly contribute to the explanation of the variance of usage intention (Table 11
). The usage intention increases with anxiety towards agents and decreases with increasing negative attitudes towards the social influence of agents.
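The logic of entering predictors in blocks can be illustrated with a small sketch that fits nested ordinary least squares models and tracks the gain in explained variance (ΔR²) per step. Everything in it is an assumption made for illustration: the data are simulated rather than taken from the study, the variable names are hypothetical, step 3 is reduced to two predictors (the three subscales of negative attitude are omitted), and the pure-Python solver merely stands in for whatever statistics software was actually used.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    rows = [[1.0] + list(values) for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Simulated (hypothetical) data: usage intention depends on the tendency to
# anthropomorphize and on attitude, but not on age, gender, or anxiety.
rng = random.Random(0)
n = 200
age = [rng.uniform(15, 80) for _ in range(n)]
gender = [float(rng.randint(0, 1)) for _ in range(n)]
anthro = [rng.uniform(1, 5) for _ in range(n)]
attitude = [rng.uniform(1, 5) for _ in range(n)]
anxiety = [rng.uniform(1, 5) for _ in range(n)]
usage = [0.4 * a + 0.6 * t + rng.gauss(0, 1) for a, t in zip(anthro, attitude)]

steps = [
    [age, gender],                              # step 1: demographics
    [age, gender, anthro],                      # step 2: + anthropomorphism tendency
    [age, gender, anthro, attitude, anxiety],   # step 3: + attitude and anxiety
]
r2 = [r_squared(columns, usage) for columns in steps]
delta_r2 = [r2[0]] + [r2[i] - r2[i - 1] for i in range(1, len(r2))]
```

Each entry of `delta_r2` is the additional variance explained by the block entered at that step, over and above all earlier blocks. This is why the order of entry matters when interpreting the contribution of, for example, attitudes beyond demographics and the tendency to anthropomorphize.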
To investigate a possible moderating influence of anthropomorphism tendency, attitudes toward agents, and anxieties toward agents on the effects of both independent variables (species and realism) on participants’ liking and usage intention, multiple moderation analyses using the PROCESS macro by Hayes were calculated separately. The effects of the potential moderating variables are presented in Figure 4
and Figure 5
. Against our predictions, all 12 moderation analyses revealed no moderation effects of anthropomorphism tendency, attitudes toward agents, and anxieties toward agents on the effect of species or realism on liking and usage intention.
4.4. Interim Conclusion
All three species vary in their perceived degree of anthropomorphism, in the sense that animals were perceived as most anthropomorphic, humans reached moderate anthropomorphism ratings, and robots were evaluated as least anthropomorphic. Additionally, animals were seen as more likable, more appealing, and more trustworthy than humans and robots. While animals and robots were more liked than humanoid agents, participants’ usage intention was higher for robots than for animals and humans.
The stylization of the shade leads to lower anthropomorphism ratings, while a realistic agent with low detail was evaluated as being more appealing than an agent with stylized shades and proportions. The degree of realism had no effect on likability, liking of the agent, or usage intention.
Regression analyses revealed that participants’ tendency to anthropomorphize as well as their general attitude towards agents predicts the liking of the agent and participants’ usage intention. However, no moderating effects were found for user characteristics.
5. General Discussion
In the context of human–agent interaction, the appearance of the agents was found to have main effects. Prior research demonstrates that the species (e.g., human or animal) and realism (e.g., cartoon shade) of the agent affect the agent’s evaluation [5
] and are therefore important for the outcome of the human–agent interaction. However, until now, both factors had not been examined in a controlled and systematic way. Therefore, we varied three different species (human, animal, and robot) and five degrees of realism (realistic-style high detail, realistic-style low detail, cartoon-stylized shade, cartoon-stylized proportions, and complete cartoon stylization) within two different studies. We investigated how different species (RQ1) and degrees of realism (RQ2) were evaluated and whether there is an interaction effect of both variables (RQ3). Since many applications have special target groups (e.g., people in need of support), we further investigated the preferences of different user groups (H1, H2, H3, and RQ4) and the influence of further user characteristics such as personality traits (H4 and H5). When the influences of the target group and user characteristics are examined, the appearance can be tailored to the needs of the users. This will enhance the interaction and acceptance of virtual agents. In the following, the results of both studies are summarized and discussed based on prior research.
5.1. Effects of Species (RQ1)
With regard to species and research question 1 (RQ1), both studies present contradicting findings. While Study 1 showed that users preferred humans to robots and showed higher usage intention for humans, this was not the case in Study 2. People seem to prefer humanoid agents in a direct comparison (as in the within-subjects design of Study 1), but this is not true when only one agent is presented (as in the between-subjects design of Study 2). The between-subjects design showed that animals were perceived as more likable, appealing, and trustworthy than humans and robots. Moreover, participants liked animals and robots more than humans, and robots evoked higher usage intention than humans and animals. As the results of Study 2 represent a wider sample (no specific target groups and higher sample size) and these results are more generalizable than those of Study 1, nonhumanoid agents seem to be evaluated more positively than humanoid ones. However, most of the agents employed in current systems are humanoid. Based on our findings, the gold standard of providing a humanoid agent needs to be reconsidered. While humanoid agents might be more appropriate for specific target groups such as seniors, the results of Study 2 emphasize that a majority liked nonhumanoid agents more than humanoid ones.
5.2. Effects of Realism (RQ2)
The investigation of research question 2 (RQ2) also led to contradictory findings. Study 1 demonstrates that participants liked appearances with a higher degree of cartoon stylization less than those with a realistic style. The same pattern was obtained for the usage intention of the participants. Furthermore, agents with a realistic stylization were evaluated as being more likable. However, these findings were not replicated in our second study, since the degree of realism had no effect on likability, liking of the agent, or participants’ usage intention. Previous studies mostly showed that lower degrees of realism evoke a more positive perception of the agent [10
]. However, there is prior research that supports findings of Study 1, since Van Wissen et al. [22
] showed that participants preferred a realistic-looking agent over a cartoon-stylized agent to be their virtual nurse. The most reasonable explanation for these results seems to be the type of application, since Ring et al. [5
] and also van Wissen et al. [22
] showed that realistic agents are more appropriate for medical tasks. In line with this, Robertson et al. [37
] found that people even stated that they get angry about the use of cartoon-stylized agents in medical applications, since it somehow stultifies a very serious task. In our studies, the task itself was not explicitly defined, but participants were asked to imagine the agents as their own personal assistant in everyday life. Thus, this application field might underlie the same principles as a medical task, since both applications aim to provide support and help. Therefore, an appearance with a higher degree of realism seems to be more appropriate. The discrepancy between both studies might be caused by the different designs, since Study 1 used a within-subjects design, while participants in Study 2 only evaluated one agent. Realism seems to be more important when users are able to compare different appearances. Since the different realism degrees were manipulated in five small steps (relying on different subcategories), the differences might be much clearer in a direct comparison. While one can easily discern the agent’s species (and the perception associated therewith), the realism of the agent is more complex and multilayered and its degrees are more subtle. Therefore, differences might be stronger in a within-subjects design.
5.3. Interaction Effects of Species and Realism (RQ3)
Furthermore, the present studies examined the interaction effects of species and realism for the first time in a systematic manner (RQ3). However, nearly no interaction effects were found (with the exception of perceived anthropomorphism). It can be concluded that the stylization of an agent has similar effects on all kinds of species (such as humans, robots, and animals). In line with this, prior research showed that animals with atypical features such as enlarged eyes were rated as less familiar [38
], which is also true for humanoid agents with stylized proportions. While in Study 1, the effects of realism degrees were bigger (higher effect sizes), in Study 2, effects of the agent’s species were stronger. Thus, as described above, the degrees of realism seem to be more important in a direct comparison (e.g., when users are able to choose from alternatives), while the species was more decisive in a between-subjects design. When people only see one agent, the species of the agent is more decisive for the impression-building process.
5.4. Effects of the Target Group (H1, H2, H3, and RQ4)
The examination of target groups was based on initial hints that there are age-related differences with regard to appearance preferences [9
]. Based on these findings, we assumed that seniors evaluate humanoid agents more positively than nonhumanoid ones (H1), while students might rate nonhumanoid agents more positively (H2). This assumption was found partly to be true with regard to the evaluation of robots. While students liked robots the most and showed highest values of usage intention for robots, seniors clearly rejected machinelike agents. These findings are in line with the results of prior qualitative research [9
]. In interviews, seniors stated that they would prefer a humanoid agent, since the interaction with it is more familiar as they talk to humans all the time, while animals or other species cannot answer appropriately [9
]. However, these results cannot be generalized to all people in need of support, since cognitively impaired people evaluated robots more positively and stated higher usage intention and liking for animals; this group seems to prefer nonhumanoid agents. While for seniors, a high familiarity seems to be of importance and only a humanoid appearance is appropriate as their assistant, cognitively impaired people are less restricted to humans. The cognitively impaired people are probably used to receiving assistance in their daily life, while seniors mostly were more afraid to eventually be in need of support. Therefore, for seniors, it might be more important to have something familiar and serious such as a humanoid agent, while cognitively impaired people are a little more open-minded in the sense that they are willing to accept assistance from different species. Differences between those target groups were also found in relation to realism. Thus, hypothesis H3 was confirmed. However, in contrast to the findings of Straßmann and Krämer [9
], seniors evaluated a cartoon stylization more positively, while students rated realistic agents more positively. The overall patterns of Study 1 indicate that for people in need of support (seniors and cognitively impaired people), the species is more important for their evaluation, liking, and intention to use. In contrast, for students, the degree of realism was more important than it was for seniors and cognitively impaired people. Results emphasize that there is no universal appearance factor that is appropriate for various user groups.
5.5. Effects of User Characteristics (H4 and H5)
Overall, no moderating effects of personality or attitudes were found. Therefore, hypotheses H4 and H5 need to be rejected. Nevertheless, anthropomorphism tendency and participants’ attitude toward virtual agents predict the liking of the agent and usage intention. While those variables were found to not moderate the effects of appearance variables, they influence whether people are willing to interact with a virtual agent and whether they like it. People with a high tendency to anthropomorphize and a more positive attitude toward virtual agents liked the agent more. Furthermore, the attitude and anxiety towards virtual agents affect the willingness to use it. Thus, to enhance the interaction with virtual agents, people’s attitudes need to be improved. Although we found no moderating effects with regard to appearance, future research should investigate whether appearance variables can have a positive effect on people’s attitudes or decrease anxieties toward agents in general.
6. Limitations and Future Work
For both studies, several limitations have to be taken into account. One major limitation is caused by the presentation of pictures only. Since participants evaluated the agent based on static pictures, only limited generalizations to user behavior in real human–agent interactions can be made. Although the used method was also beneficial, since participants could not get distracted by the agent’s behavior or interaction characteristics and therefore evaluated appearance factors only, studies with agents that interact with the users are needed. Thus, future studies should investigate whether the presented findings are transferable to user behavior in real interactions and whether there is still an influence of appearance on perceptions.
Even though the stimulus material was prepared in a more systematic way than in previous studies and was based on a pretest, the stimuli are still characterized by other appearance variables besides species and realism (e.g., styling of humanoid agents, chosen animals, or colors). To minimize the effect of those variables, two versions for each species have been chosen and evaluations have been collapsed, which leads to a higher generalizability of the findings. In addition, as mentioned above, the degrees of realism that were manipulated were relatively similar. Although the very systematic manipulation of the appearance factors is a big strength of the present research, with regard to our findings, it might also be a limitation. As differences in realism have only been found in a within-subjects design, it has to be mentioned that these differences are very subtle and might only be detectable in a direct comparison. Despite the fact that the participants were instructed to imagine a future scenario in order to reduce mental restrictions, e.g., with regard to technical implementations, participants might still have been limited in their imagination.
Study 1 aimed to investigate the preferences of different target groups and especially focused on people in need of support. Therefore, elderly and cognitively impaired people participated. It has to be noted that some participants of the cognitive impairment condition could also be seen as seniors, since their age was over 65. However, the distinction has been made based on participants’ cognitive skills, which were evaluated by trained employees of an established Health Care Foundation. Since people with different sorts of impairments participated (these were not specified further, except for the fact that those people had low cognitive skills), findings are limited and cannot be generalized to specific impairment groups (e.g., autistic people). Since Study 1 indicates that people with cognitive impairments, in general, have specific preferences, future studies could focus on specific impairments to obtain deeper insights into the preferences and perceptions of this target group. In addition, the sample size of the cognitively impaired subsample was relatively low, since the recruitment of these participants is not easy. Therefore, the findings of this study can rather be seen as initial hints. However, since virtual assistance is highly beneficial for those users, our research is a valuable start in the investigation of the needs of this target group.
The within-subjects design of Study 1 also has to be reflected critically, since the evaluation of all 30 pictures took almost one hour for some of the participants. Especially for people with cognitive impairments, the participation most probably was very exhausting, although the language of the questionnaire was adapted to the participants’ skills to lower the cognitive effort. In order to avoid fatigue, the experimenter helped the participants and gave them a break whenever it was needed. However, it might be beneficial to replicate the effects of the target group in a between-subjects design.
In the present research, two studies have been conceptualized to supplement each other, but the results of both studies are contradictory rather than supporting. Possible explanations have been addressed in the discussion part, but nevertheless, further studies are needed to clarify these contradicting findings. Although such contradictions might decrease comprehensiveness of the implications, with regard to open science and the need for replications of scientific findings, contradicting results might occur more often. Overall, these results highlight that multiple (even more than two) studies are always needed to achieve reliable insights.