The use of intelligent virtual agents (IVA) to support humans in social contexts will depend on their social acceptability. Acceptance will be related to the human’s perception of the IVAs as well as the IVAs’ ability to respond and adapt their conversation appropriately to the human. Adaptation implies computer-generated speech (synthetic speech), such as text-to-speech (TTS). In this paper, we present the results of a study to investigate the effect of voice type (human voice vs. synthetic voice) on two aspects: (1) the IVA’s likeability and voice impression in the light of co-presence, and (2) the interaction outcome, including human–agent trust and behavior change intention. The experiment included 118 participants who interacted with either the virtual advisor with TTS or the virtual advisor with human voice to gain tips for reducing their study stress. Participants in this study found the voice of the virtual advisor with TTS to be more eerie, but they rated both agents, with recorded voice and with TTS, similarly in terms of likeability. They further showed a similar attitude towards both agents in terms of co-presence and building trust. These results challenge previous studies that favor human voice over TTS, and suggest that even if human voice is preferred, TTS can deliver equivalent benefits.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.