Exploring Users’ Mental Models for Anthropomorphized Voice Assistants through Psychological Approaches

Park, Dasom; Namkung, Kiechan

doi:10.3390/app112311147



by Dasom Park ¹,† and Kiechan Namkung ²,*,†

¹ Department of Smart Experience Design, TED, Kookmin University, Seoul 02707, Korea
² Industry Academic Cooperation Foundation, Kookmin University, Seoul 02707, Korea
* Author to whom correspondence should be addressed.
† Equally contributed.

Appl. Sci. 2021, 11(23), 11147; https://doi.org/10.3390/app112311147

Submission received: 30 September 2021 / Revised: 22 November 2021 / Accepted: 22 November 2021 / Published: 24 November 2021

(This article belongs to the Special Issue State-of-the-Art in Human Factors and Interaction Design)


Abstract

Various perspectives are being studied to increase the usability and persistence of voice assistants (VAs) as their use rapidly expands to various domains. In particular, many studies note that users personify VAs. Systems designed to suit the differing mental models that users hold while using them can provide a positive user experience, increasing usability and persistence. Therefore, to determine important factors to consider in personifying VAs, this research structures the mental models of users of personified VAs. To analyze these mental models, it uses two psychological approaches that were not applied in previous studies. In Study 1, each user’s thinking process is derived through ZMET (Zaltman metaphor elicitation technique) as a consensus map. Afterward, in Study 2, correlations between the key components analyzed in Study 1 are validated through RG (repertory grid technique). As a result, the research found three different psychological structures. The first is that of users who feel human-like empathy and warmth when using a VA. The second is that of users who seek help with problem-solving. The last is that of users who regard an anthropomorphic VA as just a machine; users with this mental model expect the potential for development as a machine rather than the personification of the VA. Ultimately, this research is meaningful in that it analyzes each user’s psychological mechanism for personified VAs through a psychological approach and derives three new mental models in detail.

Keywords: voice assistants; users’ mental models; anthropomorphized agent; psychological approaches; ZMET (Zaltman metaphor elicitation technique); repertory grid

1. Introduction

The size of the artificial intelligence (AI) industry market is growing at an alarming rate. Today, we have easy access to a wide range of products and services equipped with AI technology. Artificial intelligence assistants with voice recognition technology are one of the most frequently used interactive products. Amazon launched a product called Echo, which was built with its assistant Alexa, and took up 70% of the market share in the U.S. [1]. In 2011, Apple launched an assistant called Siri, which is currently used by more than 500 million users [2]. Smart speakers, a representative product equipped with voice recognition technology-based artificial intelligence assistants, exceeded 74.2 million users by 2019 [3], and more than 190 million smart speakers are expected to be used by 2023 [4]. This explosive growth can be attributed to the hands-free technique, a unique characteristic of voice user interfaces controlled by voice, but is not comprehensive enough to explain the adoption of this new technology [5].

Many researchers have explored new models to find clues to increase the usability and persistence of voice assistants. These mainly adopt the computers are social actors (CASA) [6] paradigm, which assumes that users personify non-human objects. Of course, whether AI systems such as voice assistants can have a mind is an area of intense debate in the cognitive science and philosophy communities [7], but whether humans perceive a mind in their interactions with AI is an empirical research topic on human perception [8]. Work under this paradigm has demonstrated that humans interact socially with AI [9] and identify and assign human personalities [10], and this was further expanded into the argument that people apply stereotypes in human-computer interaction [11]. These stereotypes affect users’ mental models, which can be very important because the congruence between a mental model and the actual system’s functioning informs decisions regarding use [12]. However, prior work on voice assistants has focused on eliciting and validating the emotional elements that users perceive through interactions with voice assistants personified with various personalities, rather than on deriving and quantifying the mental models of these users. Most previous studies verify users’ positive empathy when emotional intervention is provided in conversation-based human-computer interactions, but there are also users who want to keep their distance and have no need for these emotional factors [13], so it may be very important to group users’ various emotional responses to VAs.

Therefore, our work aims to derive users’ mental models of voice assistants from an anthropomorphic perspective, and as a methodology, we use a psychological approach. The findings may explain the perceived emotional factors that users have in their interactions with voice assistants validated in prior studies and the fundamental reasons for their effectiveness, which will help design future voice assistants.

Research question (RQ): How is the mental model of users who use anthropomorphized voice assistants formalized?

2. Theoretical Background

2.1. Anthropomorphized Personality

A system that provides assistance, monitoring, and companion services to users is known as a companion system, which shares the vision that computers are partners, not tools [13]. The field that studies the interaction between these companion systems and users is called user-companion interaction (UCI), and various studies exist on this topic. In qualitative studies in the UCI field, Julia et al. found that users tend to individually ascribe (mostly human-like) characteristics to the system in order to turn it into a potential relational partner. They defined the characteristics of relational ascriptions, the reasons for their formation, the factors that influence their content and quality, and the factors that are influenced by the ascriptions [14].

Anthropomorphism, referred to as a prevalent way of thinking [15], is one of the most frequently studied concepts in the study of human-AI interaction [16]. These studies mainly address how anthropomorphism imbues human characteristics, motivations, intentions, or emotions to real or imagined behavior of nonhuman agents [17] and affects interactions with users. Since non-human agents such as AI possess features that more or less resemble human features [16], how the human-like features affect user interaction is a notable question.

Among the many characteristics of human beings, personality helps us decide whether we want to converse with a counterpart, and is an essential factor that allows us to adjust our expectations [18]. The correct matching of a digital agent’s personality to user expectations is a crucial prerequisite for a positive user experience [19]. Previous studies have focused on classifying the personalities of digital agents and on identifying and verifying the positive emotional elements that users receive through them.

The consistent personality of digital agents created through anthropomorphism can help users predict their behavior [20]. To do this, digital agents and users need a shared understanding of acceptable behavior [21]. The traditional approach for the classification of personalities is the big five model by McCrae & Costa [22], consisting of openness, conscientiousness, extraversion, agreeableness, and neuroticism (OCEAN), among which extraversion is the most prevalent dimension in HCI studies, as it has high informative value and is easy to observe [23]. Several other studies have long explored the personalities of non-human objects [24,25,26,27], and based on these studies Poushneh [28] identified seven personalities of voice assistants, namely functional intelligence, aesthetic appeal, protective quality, sincerity, creativity, sociability, and emotional intelligence. Also, Aggarwal & McGill [29] found that schema matching is one of the mechanisms involved in anthropomorphism: people preferred the target product more when the activated human schema was congruent with the product’s characteristics.

Although the personality of a non-human agent expressed through anthropomorphism is not expressed by its internal states and motivation [30], many prior studies have demonstrated that the mood and emotion an agent expresses influence the user’s decision making. For this reason, developers are working on algorithms to give non-human agents notable personalities, but few studies have thoroughly explored the underlying mechanisms of anthropomorphism [16].

2.2. Perceived Emotion

The technical support provided by digital agents is well known [31,32], but little is known about their emotional support. Emotional support is defined as the empathy and reassurance provided by others [33,34]. Many studies have shown the effectiveness of this emotional support [34,35,36,37]. Users feel emotional satisfaction through interaction with digital agents of various personalities, leading to positive support for users [38,39].

Kim and his colleagues found that the anthropomorphism of AI assistants affects users’ perceived warmth and pleasure [40]. Warmth captures traits related to the social object’s perceived intentions, including but not limited to trustworthiness, sincerity, and friendliness [41]. Warmth and competence are two fundamental dimensions used by individuals in social perception theory to assess other people [42]. Gelbrich, Hagel, & Orsingher [41] argued that the perceived warmth of digital agents increases user satisfaction and persistence of use. In particular, they demonstrated through experiments that the effect on persistence only occurs when a digital agent provides emotional support. This assessment of warmth is strongly linked to users’ perceptions of satisfaction [43].

Another essential form of emotional support from digital agents is increased trust. Trust is a multifaceted concept that refers to the belief that others will behave with benevolence, integrity, predictability, or competence [44]. In interactions with artificial intelligence, trust plays a crucial role in initial adoption and successful utilization by users [45]. People tend to believe that the more human-like mental abilities a technology has, the more capable it is of performing its intended function [11,46]. Based on these studies, Waytz, Heafner, & Epley [47] argued that anthropomorphic digital agents in autonomous vehicles, where users have to surrender personal control, increase users’ trust. Furthermore, the anthropomorphism of digital agents was shown to increase users’ perceived emotions, such as amiability, sociability, preference, and likability [48,49,50].

Most of the perceived emotional words users have in their interactions with anthropomorphic agents are extracted from humanities and sociology studies that deal with interactions between humans. The studies mentioned above take a common way of hypothesizing and validating that anthropomorphism of digital agents will increase perceived sensitivity, such as that seen in human interactions. However, since the perceived emotions arising from interactions between humans are many and various, it does not seem easy to verify one by one whether users have those perceived emotions in their interactions with digital agents or not.

2.3. Motivations and Values of Using Voice Assistants

Artificial intelligence has become an essential topic for individuals and businesses in recent years, especially with the growth of voice assistants [51]. Voice assistants (VA) are a type of voice-enabled artificial intelligence, and are revolutionizing consumer consumption culture. They are becoming a more significant part of consumers’ social lives [28]. Nevertheless, despite the increasing availability of voice agents in the market, research has found that the majority of users either end up using them minimally or stop using them completely [52,53,54,55]. One of the primary obstacles to the prolonged use of voice assistants is the difference between users’ expectations and agents’ performance [45]. To reduce this gap, it is vital to accurately identify the motivation, attitude, and expected value of using voice assistants.

McLean & Osei-Frimpong [5] studied the motivation for using voice assistants with uses and gratifications theory (U&GT) [56], a theoretical motivational paradigm that can be used to understand individuals’ motivations to adopt technology [57]. This study found that household size influences the motivation for and use of voice assistants and identified that in-home voice assistants are used for utilitarian purposes. They also identified the unique social benefits of voice assistants, namely social presence and social attraction, and found that individuals are motivated by the utilitarian, symbolic, and social benefits provided by voice assistants, whereas hedonic benefits only motivate the use of in-home voice assistants in smaller households.

Ashfaq, Yun, & Yu [58] used the partial least squares structural equation modeling (PLS-SEM) approach to explore smart speaker users’ attitudes and continuance intentions. This study explored the effects of perceived coolness on consumers’ attitudes toward smart speakers through perceived values (i.e., functional, hedonic, economic, and social values), and the findings showed that consumer attitudes toward smart speakers were influenced by functional, hedonic, and economic values, but not by social value. In particular, the hedonic value mentioned in both studies is also an essential factor, studied as a general perception of enjoyment, fun, or playfulness associated with using a certain technology or service [59]. These motivational values are also defined as utilitarian benefits for individuals gathering information to complete tasks, hedonic benefits for finding pleasure, and symbolic benefits for reaffirming social status [60].

Other studies have identified a variety of variables affecting users’ behavioral intention, including product-related factors, platform-related factors, privacy concerns [61], privacy-related variables, value-related variables [62], software-related determinants, and hardware-related determinants [63]. Chattaramana et al. [64] expanded social response theory [6,65] to distinguish task-oriented from social-oriented voice assistants, and demonstrated differences in the responses of users interacting with them.

On the other hand, research on the intention and expectations of use for non-human digital assistants has also been addressed in the field of interaction with robots. However, human-robot interaction (HRI) research has paid less attention to voice as an interface than to the influence of appearance, which is a shortcoming of previous work [66]. As these studies have become less novel, there have been efforts at new research. Kwon, Jung, & Knepper [67] argued that humans construct distinct theory of mind models for machines and people, that people attribute more human mental models to more social robots, and that a robot’s behavior can change mental models. Furthermore, Edward et al. [11] argued that it is appropriate to apply the constructivist paradigm to understand the role of expectations in human mental models. The constructivist paradigm in communication research rests on the assumption that people accumulate and integrate communication knowledge, tendencies, and abilities through previous interactions, and that those form the basis of later social cognitive and message construction behaviors [11].

2.4. The Usage of VA

Voice assistants are software agents that can interpret human speech and respond via synthesized voices. Although each currently available voice assistant has unique features, they share some similarities and are able to perform basic tasks. In addition to these tasks, voice assistants can add other features, often called “skills,” that expand their abilities by interfacing with other programs via voice commands [68].

One of the most frequent areas of study in recent years is the use of VA in vehicles. In particular, regarding the personification of VA, Braun and his colleagues studied the characteristics of in-vehicle VA for various driving scenarios [19]. They argued that trust is important among the characteristics of VA in vehicles [19]. Trust in these digital systems is an essential prerequisite for long-term adoption, and arguably even more so in a driving environment that values safety [69].

In addition, VA offers a number of advantages in healthcare. Voice assistants have the potential to remove barriers and give patients more control over their health, adding value to patient engagement. Whether by optimizing communication or providing critical new features, voice assistants are becoming an essential part of the healthcare industry’s shift towards patient-centricity [68]. Dimitri Dojchinovski, Andrej Ilievski, & Marjan Gusev studied the concept of a VA that contributes to informing patients of their heart conditions through voice recognition home healthcare systems such as CardioCube and Pilo [68,70,71]. Other recent research has shown that VA can benefit dementia sufferers by providing an ever-present voice that can answer the same questions again and again without losing patience and offer encouragement when needed [72].

Another important application of VA is education. VA, which has a language-based HMI that provides information without time-demanding searching, is a way to bring more attractiveness into education and increase motivation in students [73]. In particular, VA has the advantage of attracting children from an early age, so it is widely used in children’s education [74] and is also applied to entertainment applications for children [75].

Like the use of VA for children, VA can provide great advantages for the elderly. They can easily access online services through speech-centric multimodal interaction with VA [76], and health care services with VA provided to them can be an important factor that influences whether they want to continue living in a smart home environment [77,78].

In addition to the use of VAs covered in these previous studies, there are scenarios for the use of various VAs such as library management, tour guides, and audiobooks [68].

2.5. Mental Model

The mental model theory has obscure origins [79,80]; it is known as the concept first mentioned by psychologist Craik [81]. This theory posits that humans perceive the real external world by applying clues that underlie their brain’s behavior and the theory has attracted much attention from researchers, particularly psychologists.

Zaltman [82] stated that mental models are a set of thoughts that are used in processing or reacting to the information generated by an event. Humans hold large numbers of mental models unconsciously and notice them only in some cases, where the expectations and mental models produced by dual experience conflict. Carey [83] also noted that humans form their behaviors based on incomplete facts, past experiences, and even intuitive perceptions when activating the mental models that exist in their unconscious to understand the world around them. Mental model research has been addressed in various disciplines; from a cognitive science perspective, mental models can be understood as a set of knowledge elements [84].

The research on mental models can be divided into two main branches: the first is centered on internal mental processes and cognitive phenomena. The second is applied mental models to support better interaction between people and the external world [85]. In particular, there have been many studies of mental models in human-computer interaction (HCI). Norman [86] emphasized that the root cause of a problem in using technology products arises from differences between users and designers’ mental models. This was a new perspective on HCI, which led designers to try various methods to provide designs that matched the user’s mental model [87,88].

In HCI, mental models have been actively applied in the field of interface and product design, but are expanding into the realm of robots and artificial intelligence (AI) as technology advances rapidly [8,12,89]. Non-human objects such as robots and AI build social relationships with humans, where humans’ mental models are influenced by the objects’ physical shape and behavior [67]. Many studies have shown that people express surprise, amazement, happiness, disappointment, amusement, uneasiness, and confusion in their encounters with an AI [8]. This phenomenon is the same in conversations with non-human objects, as the speech frequency or nature of the conversation affects users’ impressions [90,91]. These studies imply that people create mental models with structure and order through observations of and interactions with non-human objects, and that we can understand these interactions more deeply by understanding the elements of this mental model. Eventually, understanding a person’s mental model of non-human objects can help us design them [92].

The methodologies used to derive a mental model of a user in the field of HCI can be divided into four categories: verbalization, rating, sketches, and card sorting [85]. Verbalization is mainly done through interview techniques such as think aloud and laddering [79], and laddering techniques are frequently used to explore the relationship between answers [92]. Rating is a method of asking people to rate using questionnaires [93,94], but it is not often used. Additionally, sketching is visualizing how they thought of a concept or the pathways from the start to a specific point in a system [55,94]; card sorting is an effective way to derive hierarchical structures [95,96]. Since these methods have their respective strengths and weaknesses, it is necessary to compare their differences using various methods to derive a user’s mental model accurately.

The mental model is a clue to the user and a view of the external world [86], and it is more important to consider the mental model that a majority of people have in common than the mental model of a single individual. However, it is not easy to derive and quantify these mental models, as they are constructed from life experiences and accumulated knowledge [86], exist only in the head of an individual, and cannot be directly observed. Therefore, deriving mental models requires indirect reasoning based on observing and analyzing people [85]. If designers understand and develop the user’s mental model from a user perspective based on various approaches, the mental model can link the interface or product with the user.

As shown by prior research, mental models have been flexibly applied to different targets and have been essential to understanding the users who encounter a new technology whenever one emerges. Therefore, VA, which has recently emerged as a device equipped with new interaction technologies, is an interesting topic in the mental model research area. However, there are very few cases of studying the mental model of VA, and much research will be needed considering the potential for future development of VA.

2.6. Psychological Approach

HCI research has developed based on various studies, primarily through psychological perspectives, to address multiple errors or build user-centered designs. The user research methodologies used in HCI research are primarily rooted in psychology as tools for understanding users who use products or services and know their world. Psychology is the study to understand human behavior, and the area used for that purpose is gradually expanding. Psychological approaches are beneficial for user-centered design in that they embrace ideas and knowledge built on multiple experiences and derive theoretical frames such as mental models.

Constructivism is a theory of knowledge and learning in which the individual generates his or her knowledge, and constructs knowledge in the process of tackling problems [97,98]. The theory of “personal constructivism” and “radical constructivism” has recently drawn new attention in the HCI field.

Personal constructivism is also referred to as personal construct psychology (PCP) or personal construct theory (PCT), which originated with the pioneering work of George Kelly [99]. Kelly’s theory of personal constructs was the first attempt to devise a theory of personality and psychotherapy based on a formal model of the organization of human knowledge [100]. He developed personal constructs to organize people’s experiences, which predict how the world and individuals might behave [101]. In terms of PCP theory, they note that the external world has a significant impact on a person’s constructions [101].

Radical constructivism (RC) emphasizes the ability of human beings to use the understanding they create to help them navigate life, regardless of whether or not such understandings match an external reality [102]. Furthermore, it rejects the possibility of objective knowledge because all knowledge depends on the knower’s structure [103]. This notion of RC is mainly compatible with recent studies involving adaptive robots or autonomous agents, and for this reason, constructivism has recently gained attention again [104].

Various assessment techniques have been developed from PCP, divided into grid-based methods and non-grid-based methods [105]. The most famous method for grid-based methods is the repertory grid technique (RG) devised by Kelly. RG is the most widely used of all PCP-based psychological evaluation techniques [106] because it is closely related to PCP theory, can combine quantitative and qualitative analysis, can calculate multiple measurement ratings from the input data on a mathematical basis, and has good accessibility to statistics software [107].

RG builds a mental map of the user’s world as a subjective representation (language) of the individual [106]. In particular, from a design perspective, the goal of RG is to analyze elements, not subjects, and by analyzing the personal constructs generated with different participants, information such as perception-related consumer preference behavior can be obtained [107]. RG is a highly flexible method of data collection used in a wide range of research, including business, education, tourism, and forensic psychology [108]. Leach and his colleagues revealed that the analysis of the RG provides useful qualitative information on progress at various stages of therapy, along with quantitative measures (such as self-ideal self-discrepancy as a measure of self-esteem) [109]. Research exploring the meanings of various kinds of outdoor spaces for people argues that the psychological meaning of ‘the natural’ is constructed differently by each individual [110]. RG is also being used in the field of design. In research to collect user experience requirements in the early stages of product development, RG was used to derive deeper and more detailed user experiences [107]. In particular, in this work, RG is viewed as a useful user-centered design tool, as it can derive detailed information that most users are not aware of themselves. RG is mainly characterized by its ability to quantitatively measure individual psychological evaluations through multiple statistical analysis methods, and if the results from those methods are similar, confidence in the conclusions can be increased [109]. Statistical analysis methods used in RG include principal components analysis (PCA), hierarchical cluster analysis (HCA), multidimensional scaling (MDS), and the biplot. Among them, HCA in particular is mainly used in research such as psychotherapy because it makes it easy to analyze the elements that make up a single individual’s psychology in the form of a tree.
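To make the tree-building idea behind HCA concrete, the following is a minimal sketch of average-linkage agglomerative clustering on a repertory grid. The grid, the construct names, the 1-5 rating scale, and the choice of Euclidean distance with average linkage are all illustrative assumptions and are not taken from the study; statistical packages offer many other distance and linkage options.

```python
from itertools import combinations
from math import dist

# Hypothetical repertory grid for one participant (not data from the study):
# each construct is rated 1-5 across the same four elements.
grid = {
    "warm":       [5, 4, 2, 1],
    "empathetic": [5, 5, 1, 1],
    "efficient":  [1, 2, 5, 4],
    "accurate":   [2, 1, 5, 5],
}

def average_linkage(grid):
    """Agglomerative clustering; returns the merge history in merge order."""
    clusters = [(name,) for name in grid]
    merges = []

    def gap(pair):
        a, b = pair
        # Mean Euclidean distance over all cross-cluster construct pairs.
        total = sum(dist(grid[x], grid[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the two clusters that are currently closest to each other.
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((a, b, round(gap((a, b)), 2)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

merges = average_linkage(grid)
for left, right, height in merges:
    print(left, "+", right, "at distance", height)
```

Read from first to last, the merge history is the tree: constructs that join early (here the two emotional and the two functional constructs) are rated similarly by the participant, and the final merge height shows how far apart the two groups are.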

A representative non-grid-based method is the laddering technique [111,112]. Laddering is an in-depth interview technique based on means-end theory, in which the interviewer observes and asks specific questions about product attributes and results [113]. Laddering involves a tailored interviewing format using a series of directed probes, with the express goal of determining sets of linkages between the critical perceptual elements across the range of attributes (A), consequences (C), and values (V) [114]. In other words, the laddering process repeatedly elicits constructs and derives new components by asking the respondent questions about a preferred construct, such as “Why is that important to you?” [115]. We can use laddering to produce a direct and useful understanding of consumers while simultaneously seeing the connections in consumer perceptual processes [114]. Laddering is primarily used in marketing domains to better understand consumer decision-making, but has also been successfully used as a research method for creating design recommendations in the HCI field. These studies used laddering and association techniques to understand why users choose a particular website over others [116], evaluate children’s likeability of games [117], and develop user-friendly mobile city applications [118]. The PCP-derived methodologies, RG and laddering, differ in their results and investigation methods, but they are essentially similar in what they explore. After all, both are methodologies that organize human psychological experiences and make it easy to identify perceptual elements for a target.
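The attribute-consequence-value linkages described above are often summarized by counting how frequently one element leads directly to another across interviews. The sketch below illustrates that tallying step only; the ladders, the element labels, and the cutoff of 2 are hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical ladders (not data from the study); each runs
# attribute -> consequence -> value.
ladders = [
    ["human-like voice", "feels friendly", "belonging"],
    ["human-like voice", "feels friendly", "comfort"],
    ["quick answers", "saves time", "accomplishment"],
    ["quick answers", "saves time", "comfort"],
]

def direct_links(ladders):
    """Count how often one element directly leads to the next one."""
    links = Counter()
    for ladder in ladders:
        # Pair each element with its immediate successor in the ladder.
        for source, target in zip(ladder, ladder[1:]):
            links[(source, target)] += 1
    return links

links = direct_links(ladders)

CUTOFF = 2  # assumed threshold: keep links that appear in at least two ladders
dominant = sorted(pair for pair, n in links.items() if n >= CUTOFF)
print(dominant)
```

Links that survive the cutoff are the ones a researcher would typically draw when summarizing the shared means-end structure; raising or lowering the cutoff trades detail for readability.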

An investigative methodology called the Zaltman metaphor elicitation technique (ZMET), which is similar in purpose to these two methodologies, is commonly used in marketing. Since people are not aware of their mental activities, and people’s cognitive processes are complex on an unconscious level, it is not easy for them to recognize the cognitive process independently or express it in language [119]. ZMET is a method for visualizing the cognitive processes of people, which allows consumers to collect their own pictures and thus control the stimuli used in guided conversations. Through this, we can surface mental models that drive consumer thinking and behavior, and identify and characterize consumers’ ‘deep metaphor’ [120]. ZMET is conducted as a guided conversation built around the pictures that participants have collected. The guided conversation is a personal, one-on-one interview that includes a variety of steps, only a subset of which are used in any particular project [120]. Most of the studies using ZMET recruited and interviewed 10–20 participants [120,121,122,123,124], but there is also a study that experimented only with heavy users as needed [122].

In ZMET, ‘deep metaphor’ is a metaphor that structures what we think, act, and say. If deep metaphor is entirely unconscious, ‘surface metaphor’ is the closest metaphor to daily life, and ‘metaphor theme’ is a metaphor that exists under surface metaphor and is not entirely trapped in unconsciousness [125]. Deep metaphors and emotions work interchangeably, and without deep metaphors, it may be impossible to understand emotions [125,126].

Many studies using ZMET have included RG and laddering techniques in the interview process, as they are effective in deriving structures based on thought and behavior [120]. Such studies have used ZMET to identify consumer psychology or perceptions in marketing or advertising [121,123,127], to derive women’s emotional responses to intimate advertising [123], and to understand consumers in service marketing research [127]. ZMET has also produced valuable results in various other fields of research, including Internet experience building [128], brand image [129], and 3G mobile banking services [130]. In user experience design, analyzing users’ deep psychological factors with ZMET has been helpful in research on the user experience of the Facebook interface [122] and in UI usability evaluation [131].

The responses derived from ZMET can be structured by visualization methods such as a consensus map, with the greatest feature being that the interaction between the derived factors and the importance of each factor can be identified together. This allows us to identify which factors affect users and identify the link between the set of user thinking and emotions.

3. Method

This study used two psychological approaches to identify the standardized mental model of users using VAs from an anthropomorphism perspective (Figure 1). First, in Study 1, we used the ZMET methodology, which allows respondents to project their mental models onto several images that represent important cognitive (thought) and emotional aspects, to find out the personified personality, emotional factors, usage, and values that users associate with VAs. In Study 2, we used RG to analyze the psychological factors behind the VA use of each individual participant from Study 1. To this end, we conducted a hierarchical cluster analysis (HCA) to verify the results obtained in Study 1 in more depth.

3.1. Study 1: ZMET

We recruited 19 participants without targeting a specific user group (Table 1). To determine each participant’s experience and familiarity with VAs, we checked their total duration of VA use, the type of device used, and the frequency of use. This was to find out whether differences in VA experience appear as differences in mental models.

Since our study aims to derive the mental models of unspecified participants, general information such as the gender and age of participants is not a significant factor influencing the results of this study. The data used in this study were collected in compliance with the Korean Statistical Act and handled according to research ethics. Since the participants were Korean, we selected VA devices that support the Korean language.

The experimental devices were of four types: Apple Siri (smartphone), Google Home Mini (AI speaker), SK NUGU (AI speaker), and Naver Clova (AI speaker). Participants used each VA once during the free exploration process and then took part in the ZMET interviews.

The in-depth ZMET interviews took about two hours per participant. We first checked the experimental design through a pilot test and then revised the design for the main test based on the pilot results (Figure 2).

In the pilot test, we checked one of the essential processes in ZMET: the image acquisition method. We compared two ways of collecting images, providing participants with magazines and letting them freely collect images from websites, to find out which was more effective.

First, we asked participants to freely access websites on a computer to collect the images they wanted, and then provided them with various kinds of magazines to collect images once again. In some ZMET studies, participants were informed of the research in advance and were asked to bring the images they had collected on the day of the interview. In this work, however, participants collected images during the interview to ensure that all of them did so under the same conditions.

In the pilot test, when participants were asked to explore websites, they differed in their familiarity with the site structure, and the sheer number of images on the websites made it difficult for them to collect images. On the other hand, when collecting images from magazines, almost all participants tended to select images from fashion magazines among the various kinds provided (fashion, travel, living, and art). Participants responded that fashion magazines not only contain many images of various human facial expressions and emotions, but also contain advertisement images of many familiar products frequently used in everyday life, which makes them very suitable for expressing individual feelings about anthropomorphized voice assistants. Therefore, in the main experiment, we decided to ask participants to collect images from fashion magazines. In addition, when creating collage images in the vignette, the last step of ZMET, we informed participants that they could collect additional images from magazines, but most of them could no longer find the images they wanted, so we decided to provide coloring tools for drawing the images they could not find.

Participants were provided with the fashion magazine GQ Korea to collect images.

First, through a pre-test questionnaire, we asked participants about their reasons for using VA, the benefits and values of using it, what they expect from VA, and their overall idea of it. This was intended to compare the metaphorical images derived from the ZMET interviews with the thoughts that participants were consciously aware of (their answers to the questionnaire).

Then, in the free navigation phase, we allowed participants to use the VA freely. In this process, we gave them a minimal guide so that all of them used the same functions, allowing us to investigate their thoughts on using VA on the same basis.

We instructed the participants to use 12 functions of the VA (music, timer, weather, Web search, small talk, scheduling, translation, speaker settings, calculation, traffic navigation, etc.) that all of the VAs support in common; these were determined by referring to the user manual provided by the manufacturer of each VA.

Finally, we conducted a ZMET interview consisting of seven steps (Table 2). The overall experiment took approximately 2 h per participant.

Before the ZMET interviews, we prepared the following for each participant: coloring tools for the collage image (one of the outputs of ZMET), one VA device, and one magazine for collecting images.

3.2. Study 2: Repertory Grid

Study 2 was conducted with the same participants as Study 1 (ZMET).

Study 2 followed Study 1 sequentially. After the Study 1 ZMET interview ended, participants once again freely navigated the 12 functions. We then started the repertory grid technique by providing each participant with constructs derived from that participant’s ZMET interview data (Figure 3). The RG session took about an hour in total. As in Study 1, we conducted a pilot test before the main experiment.

In the pilot test, RG was conducted immediately after the ZMET interview, without free navigation. However, many participants did not remember the contents of their interaction with the VA during the free navigation in Study 1, so we decided to add a free navigation phase before the start of Study 2.

The pilot test also revealed that, among the 12 functions corresponding to the elements (columns) of the repertory grid form (Figure 4), the ‘smart home’ function had not been set up in the actual experimental environment and did not work properly. Therefore, this function was replaced with ‘scheduling’.

In the repertory grid form, the elements (columns) are the 12 VA functions used in the free navigation phase, and the constructs (rows) are the personal factors that each participant feels when using VA, organized from the ZMET interview (Study 1) data. We conducted the experiment after confirming these constructs with the participants. Participants were asked to write the opposite pole of each construct according to their own subjective meaning, rather than a predefined opposite. The number of construct pairs ranged from eight to 12, depending on the interview data.

Participants considered each pair of constructs and rated each of the 12 elements on it using a seven-point scale.
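
For illustration only, the grid described above can be represented as a small data structure. The following Python sketch is not the form used in the study: the class name RepertoryGrid, the method ratings_for, the construct pairs, and all ratings are hypothetical, and it assumes that a rating of 7 lies closest to the first pole of a construct pair, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class RepertoryGrid:
    elements: list        # column labels, e.g. VA functions
    constructs: list      # row labels as (pole, opposite pole) tuples
    ratings: list         # ratings[row][column], integers from 1 to 7

    def __post_init__(self):
        # Check the shape of the grid and the range of the scale.
        if len(self.ratings) != len(self.constructs):
            raise ValueError("one row of ratings is needed per construct pair")
        for row in self.ratings:
            if len(row) != len(self.elements):
                raise ValueError("each row needs one rating per element")
            if any(not 1 <= r <= 7 for r in row):
                raise ValueError("ratings must lie on the seven-point scale")

    def ratings_for(self, pole):
        """Return {element: rating} for the construct whose first pole matches."""
        for (first, _), row in zip(self.constructs, self.ratings):
            if first == pole:
                return dict(zip(self.elements, row))
        raise KeyError(pole)


# Hypothetical example: 3 elements and 2 invented construct pairs.
# Assumption: 7 is closest to the first pole, 1 to the opposite pole.
grid = RepertoryGrid(
    elements=["music", "weather", "small talk"],
    constructs=[("warm", "cold"), ("helpful", "unhelpful")],
    ratings=[[4, 3, 7],
             [6, 7, 4]],
)
small_talk_warmth = grid.ratings_for("warm")["small talk"]
```

Each row of such a grid is one construct's rating profile across the elements, which is the input that a cluster analysis of constructs compares.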

4. Results

4.1. Results of Study 1

We used two visualization methods, the collage image and the consensus map, to understand users’ mental models. The collage image is a technique in which participants tell the story of their thoughts or feelings about a topic in visual language, which allows researchers to discover users’ hidden thoughts and feelings [132].

A consensus map represents the set of thoughts that consumers share about issues, products, services, and the companies that promise to provide them. It is also a mental model that shows the thoughts and feelings shared by a particular group of consumers on a particular topic, together with the links between them [82]. Using these two methods, participants represented their mental models as visual images, and we diagrammed the mental model shared across participants through data analysis.

4.1.1. Collage Image

In the final phase of the ZMET interview, participants were asked to express the images, feelings and thoughts about VAs using collage techniques.

Through the collage image process, participants expressed their thoughts or emotions by combining images selected during the interview (Table A1). The results express each individual’s view of the VA’s personified personality, emotional factors, and the motivation and value of using VA. Each image element in a collage carries an idea that the participant wanted to express (see Table 3). Examples of participants’ collage images are shown in Figure 5.

(a): P6

“It is a very cute friend to me, but it is a tremendous analyst and has tremendous knowledge. Sometimes it’s stupid, but it’s not always like that. I think it’s piled up in a veil to some extent.”

(b): P14

“It would be gentle and kind as it responds very kindly whenever and whatever I ask. Also, rather than expressing my feelings, I chose the image of a secretary because I thought that the speaker itself was an ‘artificial intelligence secretary’. (…) If the speaker is actually working as a human, wouldn’t it be like this?”

(c): P18

“Speakers are intangible and very high-dimensional, so I think they will be smart. (…) Human life is finite, but speakers are infinite, so it seems that it will continue to develop in the future.”

4.1.2. Consensus Map of Using VA

We derived 18 constructs from all of the interviews. These constructs were derived from the interview data based on the convergent rule proposed by Zaltman & Coulter [120]. Typically, for a construct to be included on the consensus map, it must have been mentioned by at least one-third of the participants, and a construct pair must have been mentioned by at least one-quarter of the participants. Consequently, the consensus map, on average, captures 80 percent of the constructs mentioned by each participant [120]. The consensus map based on these data represents the connections between the constructs, and these connections represent the thought process of the entire group of participants [124].
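
As an illustration only, the inclusion thresholds described above can be sketched in Python. The function name consensus_map, the participant labels, and the construct names are hypothetical and are not the study's interview data. The sketch also assumes that a link is drawn only when both of its constructs pass the one-third threshold, which the rule as stated does not spell out.

```python
from collections import Counter
from fractions import Fraction


def consensus_map(constructs_by_participant, links_by_participant,
                  construct_share=Fraction(1, 3), pair_share=Fraction(1, 4)):
    """Return (constructs, links) that pass the inclusion thresholds."""
    n = len(constructs_by_participant)

    # Count how many participants mentioned each construct (once per person).
    construct_counts = Counter(
        construct
        for constructs in constructs_by_participant.values()
        for construct in set(constructs)
    )
    kept_constructs = {
        c for c, k in construct_counts.items() if k >= construct_share * n
    }

    # Count how many participants linked each unordered construct pair.
    link_counts = Counter(
        link
        for links in links_by_participant.values()
        for link in {frozenset(pair) for pair in links}
    )
    # Assumption: a link is drawn only if both of its constructs were kept.
    kept_links = {
        link for link, k in link_counts.items()
        if k >= pair_share * n and link <= kept_constructs
    }
    return kept_constructs, kept_links


# Hypothetical toy data: four participants, not the study's interviews.
constructs = {
    "A": {"friend", "helpful", "clever"},
    "B": {"helpful", "clever"},
    "C": {"helpful"},
    "D": {"helpful"},
}
links = {
    "A": {("friend", "clever"), ("helpful", "clever")},
    "B": {("clever", "helpful")},
    "C": set(),
    "D": set(),
}
kept_constructs, kept_links = consensus_map(constructs, links)
```

In this toy input, "friend" is mentioned by only one of four participants and so falls below the one-third threshold, which also removes the link attached to it.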

In this work, we diagrammed the relationships between constructs by linking the constructs derived from qualitative analysis of the interview scripts. Figure 6 is the consensus map of VA; it represents the set of constructs commonly mentioned among participants and shows their collective orientation. The constructs can be grouped into three themes, (1) use for empathy and fun, (2) helpful, and (3) expectations for VA, and the themes can in turn be divided into the human and the mechanical aspects of VA.

(a): Theme 1: use for empathy and fun

The constructs included in the theme “use for empathy and fun” lie in the left, light gray area of the consensus map (Figure 6). This theme comprises constructs that represent the human aspect of VA, the human factor. Even though the VA they were using is only a machine, participants felt a sense of friendship and family with it and formed a bond. Furthermore, they were comforted by the VA and relied on it. The participants stated that they felt warmth in this process. They also felt joy, empathy, and comfort at the same time when using VA, as well as the feeling of chatting with friends.

“It feels like a friend A LOT.”
(P6)

“It’s mechanical, but when I play words or say hello, I get the idea that I can be friends or family when I’m lonely.”
(P7)

“Comfort and empathy, I would say. (…) I want to be comforted when I’m emotionally unstable. There are some things that are hard to tell my friends or family, that’s when I want to rely on voice assistants.”
(P1)

(b): Theme 2: helpful

The theme “helpful” had the most constructs of all the themes, and “help” was what users felt most strongly. While using VA, the convenience of searching through various functions and of controlling and configuring things by voice gave users a reason to use it. In addition, as continuous use accumulated, this functional convenience turned into the idea that the VA is “working for me” and “helping me”. These thoughts formed an image of VA as a personal assistant for each individual. In other words, the fact that a VA always waits quietly until users call it, and responds immediately to a wake-up word at the moment it is wanted, reminded users of a secretary who is always on standby.

“The voice assistant works for me. When I ask for a certain work, it performs that work and shows it to me, so the part seems like a top-down relationship to me.”
(P17)

“As VAs search and inform information, I think of an image like a secretary from them.”
(P14)

These factors work positively on users, giving VAs the image of a ‘friendly’ secretary and a ‘helpful’ person. However, because users also perceived the mechanical aspect of VA as ‘clever’, we found that there was also a mental structure in which participants expected VA to act as a ‘robot assistant’.

(c): Theme 3: expectations for VA

The final theme is “expectations for VA”, in which, unlike the previous themes, the user clearly recognizes VA as a machine, one that holds potential for future development and raises expectations for the future. Participants perceived the VA as highly mechanical in its tone of voice and complex speech, especially when parts of a conversation were unnatural. However, the participants said that the stiff tone of speech actually made the VA seem clever. When the VA gave a wrong answer, the participants felt strongly that it was a machine. Although users perceived this mechanical side through various factors, they also saw the possibility of development, expecting the machine’s technical shortcomings to be resolved. Participants who had been using VA consistently for a long time were especially looking forward to this development. In the past, conversations were more unnatural, but as they become more natural and human-like, participants look forward to the future even more.

“I wanted to express a speaker in an intangible space. Human life is finite, but speakers are infinite, so it seems that it will continue to develop in the future.”
(P18)

“There are certainly more advantages than disadvantages, and I feel uncomfortable without it. I feel very good if the conversation continues smoothly, like a conversation with a real person. I expected it to continue to develop in the future.”
(P16)

Themes 1 and 2 are, in the end, connected to the convenience and usefulness of VA, whereas theme 3 is distinct from them.

Among the constructs, “clever” appeared in all three themes, where it can be interpreted as a “smart” friend, a “smart” secretary, and a “smart” machine. Users described VAs as ‘clever’, which reflects a main feature of VAs: they answer users’ questions based on the information they hold.

4.2. Results of Study 2

We used the statistical software RStudio to analyze the RG data. Among the analytical methods available, we used HCA to analyze the relationships between each participant’s constructs (Appendix B, Figure A1). The relationships between each participant’s constructs revealed by this analysis were similar to that participant’s ZMET results. The main constructs that determine participants’ psychological judgments differed between individuals, but they did not deviate from the three themes derived from Study 1.

The following analysis results are examples of HCA containing the constructs of the participants constituting the three themes derived from Study 1.

A hierarchical cluster analysis (HCA) was conducted to identify the key constructs derived from this study and explore the correlations between them. This allowed us to explain the associations between participants’ constructs and to identify similarities. In this work, distance was measured as Euclidean distance, a traditional measure in cluster analysis, and clusters were formed using Ward’s method.
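
To make the clustering step concrete, the following is a minimal Python sketch of agglomerative clustering with Ward's criterion. It is a stand-in for illustration, not the RStudio analysis used in this study: the function names, construct labels, and ratings are all hypothetical. The sketch merges, at each step, the two clusters whose union gives the smallest increase in within-cluster sum of squares, computed from the squared Euclidean distance between cluster centroids, so its merge costs should not be expected to match the heights reported by any particular R function.

```python
from itertools import combinations


def centroid(rows):
    """Mean rating profile of a cluster of construct rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def ward_cost(rows_a, rows_b):
    """Increase in within-cluster sum of squares if two clusters merge."""
    ca, cb = centroid(rows_a), centroid(rows_b)
    squared_euclidean = sum((x - y) ** 2 for x, y in zip(ca, cb))
    size_a, size_b = len(rows_a), len(rows_b)
    return size_a * size_b / (size_a + size_b) * squared_euclidean


def ward_clustering(labels, rows):
    """Agglomerate constructs; return the merges as (left, right, cost)."""
    clusters = {label: [row] for label, row in zip(labels, rows)}
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters that is cheapest to merge.
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        cost = ward_cost(clusters[left], clusters[right])
        clusters[(left, right)] = clusters.pop(left) + clusters.pop(right)
        merges.append((left, right, cost))
    return merges


# Hypothetical ratings: three invented constructs rated on four elements.
labels = ["warm-cold", "empathic-detached", "developing-stagnant"]
rows = [
    [6, 5, 2, 6],
    [6, 4, 2, 6],
    [2, 6, 6, 3],
]
merges = ward_clustering(labels, rows)
for left, right, cost in merges:
    print(left, "+", right, "cost:", round(cost, 2))
```

With these invented ratings, the two constructs with nearly identical rating profiles merge first, and the third joins them only at a much higher cost, which is the pattern a dendrogram would display as one tight cluster and one distant branch.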

Figure 7 is the result of an HCA analysis of P1. It can be seen that ‘to develop further—retreat’ and ‘talking to a child—Learn’ are grouped into one cluster, and ‘empathy—do not understand’ and ‘human answer—inhumane’ are grouped into one cluster. In particular, the attributes of ‘empathy—do not understand’ and ‘human answer—non-human’ appear as key constructs that constitute the psychology of P1. In other words, it can be seen that P1 considers the psychological components of “how much do they answer like a human?” and “how much do they sympathize with me?” as most important in identifying VA. These results can be considered to correspond to theme 1 and 2 among the three themes expressed in the consensus map of Study 1.

In the case of P14, ‘helping—hinder’ and ‘supportive—dominant’ are grouped together and then clustered with ‘luxurious—poor’ to form an essential criterion for judgment (Figure 8). The first grouping represents an image of a VA that helps and assists, and the ‘luxurious—poor’ construct is associated with the image of a personal secretary. This correlation shows that P14 thinks of VA as a secretary from whom help can be received, which corresponds to theme 2 among the results of Study 1 (Figure 6).

Finally, the analysis results for P18 correspond to the “mechanical aspect—expectation for VA” group, which is theme 3 of Study 1. ‘Developing—unimproving’ and ‘infinite—limited’, which represent the future and possibilities of VA, are clustered closest to each other and form one psychological criterion.

Also, the intelligent aspects of VA such as ‘intelligent—dumb’ and ‘correct—goes wrong’ are closely clustered. The derived clusters show that machines also have constant development and potential like humans. These possibilities reflect the psychology of wondering and anticipating the future because they strongly feel VA’s intelligent aspects (Figure 9).

5. Discussion

As previous studies have shown, the anthropomorphism of VA is generally known to influence users positively [17,19,20,28,38,39,41,47,48,49,50]. However, it is not efficient to verify the positive influences of the many anthropomorphic elements one by one, so we derived users’ mental models by categorizing elements into larger groups. In other words, the purpose of this study was not to verify the positive influences of the individual anthropomorphic elements covered in previous studies, but to identify the underlying psychological mechanisms of users. Across the two studies, we found that the psychological mechanisms of users dealing with VA fall into three clearly distinct categories.

Participants who perceive VA as a friend or family member and form a bond with it experienced enjoyment and comfort through their interaction with VA. For participants with this psychological mechanism, the warmth of the VA’s friendly tone and its empathy are the most important criteria for use. This corresponds to the positive results claimed by most prior studies on the anthropomorphic elements of VA. Another group of users places the utmost importance on “help”, the fundamental purpose for which VAs were developed. Participants with this psychological mechanism tend to search and control things frequently, and eventually regard VA as a clever personal secretary. Satisfying these participants essentially requires technological advancement, but we found that a kind and clever tone and attitude could offset the VA’s technical constraints. The anthropomorphic factors that portray a clever secretary are the most important criteria for this class of participants. Users with these two types of mental models have been partially revealed in previous studies, but our results show that there is also a group of users with a new mental model that has not been identified before. This group clearly recognized VA as a mechanical device and perceived it as strongly mechanical because of poor human factors such as complex speech and tone. In addition, they do not expect the machine to meet any particular emotional needs. Rather, these mechanical elements gave them the impression that the VA is clever. Most of the users who showed this tendency had been using VA steadily for a long time (P11, P16, P18). We assume that their expectations for the development of VA are high because they have seen it develop gradually over a long time. These results show that there are users who are not yet satisfied with the level of anthropomorphism of VA. However, they also show that emphasizing the mechanical aspect of VA may satisfy users even if the level of anthropomorphism does not continue to develop technically.

6. Conclusions and Limitations

Our experiment is meaningful because we grouped users by studying their psychological mechanisms with respect to comprehensive factors such as the voice, attitude, and response content of anthropomorphized VA. The mental models identified in our study can serve as an essential consideration when developing VA. In particular, it may be effective to select one of the three themes according to the intended purpose of the VA to be developed and to emphasize the anthropomorphism factors accordingly. In other words, efforts to personify VAs as friends or secretaries should consider which of the three mental models proposed in this study suits the nature of the application.

However, we did not address the negative aspects of anthropomorphic VA mentioned by the users who participated in the experiment, and our study has a limitation in that it has not been able to deal with all VA products with various purposes. Since this study was conducted based on the Korean context, it has limitations in translating the breadth of the Korean language related to emotional aspects used in the experiments. In addition, due to the nature of the research using psychological methodology, our study had fewer samples of participants than general quantitative studies and did not deal with technical statistics. Subsequent studies will need to be carried out to address these issues in the future.

Author Contributions

Conceptualization, D.P. and K.N.; Data curation, D.P.; Formal analysis, K.N.; Methodology, D.P. and K.N.; Software, K.N.; Visualization, D.P.; Writing—original draft, D.P.; Writing—review & editing, K.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Explanation of Collage Images.

No	Collage Image	Summarized Explanation
P1		I will keep my eyes on the voice assistants. I was a bit more negative on my first day using it, but the positive feelings have gotten bigger. I think it’ll get better as time passes. My biggest thought is to keep my eyes on them. The reaction voice assistant comes with sometimes is cute.
P2		I feel like I’m the trainer of the voice assistant. Though the voice assistant should be able to learn by itself, I am the one who’s educating it one by one. Even though there are many inconveniences, I think it’s user friendly. I felt the kindness there.
P3		Overall, I tried to express a lot of symmetry. In particular, the eyes monitor me (user), but there is also a sense of monitoring to do it properly. I wanted to express a neutral position that was not biased toward one side by expressing the images of positive and negative in the color of the space.
P4		The seas show that I have to go to the sea to see if there are ships and what is. The parts under the sea represents the parts where I feel stuffy, where it is yet to be developed. The scent of diffuser describes how the voice assistant is always by my side.
P5		Because the voice assistant gives more comfort than frustration, my collage was created with images without frustration.
P6		It is a very cute friend to me, but it is a tremendous analyst and has tremendous knowledge. Sometimes it’s stupid, but it’s not always like that. I think it’s piled up in a veil to some extent.
P7		It’s mechanical, but when I play words or say hello, I get the idea that I can be friends or family when I’m lonely.
P8		There seemed to be some constant logic when making a robot. Based on this logic, they create a complex feeling, wrap it finely to create a neat appearance, and we think that this system supports, and does things that express charm for us.
P9		I wanted to reveal the image of VA that was covered with the veil, so I looked for images that have low transparency. VA’s personalities that are revealed briefly are expressed in small and overlapping ways to show images.
P10		A situation in which you cannot read the context of the outside world, and you do not know the outside, trapped in your own world.
P11		It’s smart, and the plane flies in order to show quickness. The stone broke instead of the egg, which shows the new side/aspect.
P12		There is a figure of VA at the top, and there is a figure of me using VA at the bottom. I want to show a professional appearance, but in reality, it shows me using it while hiding. I wanted to emphasize that aspect through contrasting yellow.
P13		I expressed my heart. I expressed such a confused mind that I don’t know what I can do while using the voice assistant.
P14		It would be gentle and kind as it responds very kindly whenever and whatever I ask. Also, rather than expressing my feelings, I chose the image of a secretary because I thought that the speaker itself was an ‘artificial intelligence secretary’. If the speaker is actually working as a human, wouldn’t it be like this?
P15		There are hard and cold objects in this, but they exist without being mixed in people. I expressed the voice assistant being unable to determine what value itself should provide and the image of it roaming around as itself is not clear about things.
P16		I feel uncomfortable without it. I feel very good if the conversation continues smoothly, like a conversation with a real person. I expected it to continue to develop in the future. The response of the voice assistant and my feelings changed according to the question, so I expressed it in an endless road in this respect.
P17		The voice assistant works for me. When I ask for a certain work, it performs that work and shows it to me, so the part seems like a top-down relationship to me. A speaker that is very frustrating but has infinite possibilities.
P18		Speakers are intangible and very high-dimensional, so I think they will be smart. I wanted to express a speaker in an intangible space. Human life is finite, but speakers are infinite, so it seems that it will continue to develop in the future.
P19		I wanted to express the achromatic, mechanical feeling. I wanted to show the convenience of using it without touching it and the voice assistant I think of through the use of a metal watch. Overall, I tried to emphasize the achromatic feel. People were static and not laughing on a gray background, indicating the rigidity of the voice assistant.

Appendix B

Figure A1. HCA data of Repertory Grid.

References

Alexa Devices Maintain 70% Market Share in U.S. According to Survey. Available online: https://marketingland.com/alexa-devices-maintain-70-market-share-in-u-s-according-to-survey-265180 (accessed on 30 August 2020).
Business-Insider. Apple Says that 500 Million Customers Use Siri Business Insider. Available online: https://www.businessinsider.com/apple-says-siri-has-500-million-users-2018-1 (accessed on 24 July 2020).
Global Smart Speaker Users 2019. Available online: https://www.emarketer.com/content/global-smart-speaker-users-2019 (accessed on 28 May 2019).
Business-Insider. Smart Speakers Are Becoming so Popular, More People will Use Them than Wearable Tech Products This Year. Business-Insider. Available online: https://www.businessinsider.com/more-us-adults-will-use-smart-speakers-than-wearables-in-2018-2018-5 (accessed on 13 May 2020).
McLean, G.; Osei-Frimpong, K. Hey Alexa… examine the variables influencing the use of artificial intelligent in-home voice assistants. Comput. Hum. Behav. 2019, 99, 28–37.
Nass, C.; Moon, Y. Machines and mindlessness: Social responses to computers. J. Soc. Issues 2000, 56, 81–103.
Epstein, R.; Roberts, G.; Beber, G. Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer, 1st ed.; Epstein, R., Roberts, G., Beber, G., Eds.; Springer: Dordrecht, The Netherlands, 2009.
Shank, D.B.; Graves, C.; Gott, A.; Gamez, P.; Rodriguez, S. Feeling our way to machine minds: People’s emotions when perceiving mind in artificial intelligence. Comput. Hum. Behav. 2019, 98, 104–220.
Nass, C.; Moon, Y.; Green, N. Are machines gender neutral? Gender-stereotypic responses to computers with voices. J. Appl. Soc. Psychol. 1997, 27, 864–876.
Nass, C.; Moon, Y.; Carney, P. Are people polite to computers? Responses to computer-based interviewing systems. J. Appl. Soc. Psychol. 1999, 29, 1093–1110.
Edwards, A.; Edwards, C.; Westerman, D.; Spence, P.R. Initial expectations, interactions, and beyond with social robots. Comput. Hum. Behav. 2019, 90, 308–314.
Schrills, T.; Franke, T. How to answer why-evaluating the explanations of ai through mental model analysis. arXiv 2002, arXiv:2002.02526.
Wahl, M.; Krüger, J.; Frommer, J. Users’ Sense-Making of an Affective Intervention in Human-Computer Interaction. In Proceedings of the International Conference on Human-Computer Interaction, Toronto, ON, Canada, 17–22 July 2016; Springer: Cham, Switzerland, 2016; pp. 71–79.
Krüger, J.; Wahl, M.; Frommer, J. Users’ relational ascriptions in user-companion interaction. In Proceedings of the International Conference on Human-Computer Interaction, Toronto, ON, Canada, 17–22 July 2016; Springer: Cham, Switzerland, 2016; pp. 128–137.
Mithen, S.; Boyer, P. Anthropomorphism and the evolution of cognition. J. R. Anthropol. Inst. 1996, 2, 717–722.
Li, X.; Sung, Y. Anthropomorphism brings us closer: The mediating role of psychological distance in User–AI assistant interactions. Comput. Hum. Behav. 2021, 118, 106680.
Epley, N.; Waytz, A.; Cacioppo, J.T. On seeing human: A three-factor theory of anthropomorphism. Psychol. Rev. 2007, 114, 864–886.
Cafaro, A.; Vilhjálmsson, H.H.; Bickmore, T. First impressions in human–Agent virtual encounters. ACM Trans. Comput.-Hum. Interact. 2016, 23, 24.
Braun, M.; Mainz, A.; Chadowitz, R.; Pfleging, B.; Alt, F. At your service: Designing voice assistant personalities to improve automotive user interfaces. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow Scotland, UK, 4–9 May 2019.
André, E.; Klesen, M.; Gebhard, P.; Allen, S.; Rist, T. Integrating models of personality and emotions into lifelike characters. In International Workshop on Affective Interactions; Lecture Notes in Computer Science; Paiva, A., Ed.; Springer: Berlin/Heidelberg, Germany, 2000; Volume 1814, pp. 150–165.
Jung, M.F. Affective Grounding in Human-Robot Interaction. In Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction (HRI ’17), Vienna, Austria, 6–9 March 2017; ACM: New York, NY, USA, 2017; pp. 263–273.
McCrae, R.R.; Costa, P.T. Personality, coping, and coping effectiveness in an adult sample. J. Personal. 1986, 54, 385–404.
Kammrath, L.K.; Ames, D.R.; Scholer, A.A. Keeping up impressions: Inferential rules for impression change across the Big Five. J. Exp. Soc. Psychol. 2007, 43, 450–457.
Aaker, J.L. Dimensions of brand personality. J. Mark. Res. 1997, 34, 347–356.
Chen, Q.; Rodgers, S. Development of an instrument to measure web site personality. J. Interact. Advert. 2006, 7, 4–46.
Eysenck, H.J. Four ways five factors are not basic. Personal. Individ. Differ. 1992, 13, 667–673.
Goldberg, L.R. The development of markers for the big-five factor structure. Psychol. Assess. 1992, 4, 26–42.
Poushneh, A. Humanizing voice assistant: The impact of voice assistant personality on consumers’ attitudes and behaviors. J. Retail. Consum. Serv. 2021, 58, 1–10. [Google Scholar] [CrossRef]
Aggarwal, P.; McGill, A.L. Is that car smiling at me? Schema congruity as a basis for evaluating anthropomorphized products. J. Consum. Res. 2007, 34, 468–479. [Google Scholar] [CrossRef]
Damasio, A. The Strange Order of Things: Life, Feeling, and the Making of Cultures; Pantheon: New York, NY, USA, 2019. [Google Scholar]
Holzwarth, M.; Janiszewski, C.; Neumann, M.M. The influence of avatars on online consumer shopping behavior. J. Mark. 2006, 70, 19–36. [Google Scholar] [CrossRef]
Ng, I.C.; Wakenshaw, S.Y. The internet-of-things: Review and research directions. Int. J. Res. Mark. 2017, 34, 3–21. [Google Scholar] [CrossRef]
Dunkel-Schetter, C.; Folkman, S.; Lazarus, R.S. Correlates of social support receipt. J. Personal. Soc. Psychol. 1987, 53, 71–80. [Google Scholar] [CrossRef][Green Version]
Menon, K.; Dubé, L. The effect of emotional provider support on angry versus anxious consumers. Int. J. Res. Mark. 2007, 24, 268–275. [Google Scholar] [CrossRef]
Duhachek, A. Coping: A multidimensional, hierarchical framework of responses to stressful consumption episodes. J. Consum. Res. 2005, 32, 41–53. [Google Scholar] [CrossRef]
Turner, J.W.; Robinson, J.D.; Tian, Y.; Neustadtl, A.; Angelus, P.; Russell, M.; Mun, S.K.; Levine, B. Can messages make a difference? The association between e-mail messages and health outcomes in diabetes patients. Hum. Commun. Res. 2013, 39, 252–268. [Google Scholar] [CrossRef]
Hill, C.A. Seeking emotional support: The influence of affiliative need and partner warmth. J. Personal. Soc. Psychol. 1991, 60, 112–121. [Google Scholar] [CrossRef]
Lee, E.; Lee, J.; Sung, Y. Effects of Users Characteristics and Perceived Value on VPA Satisfaction. Korean J. Consum. Advert. Psychol. 2019, 20, 31–53. [Google Scholar]
Wada, K.; Shibata, T. Living with seal robots-its sociopsychological and physiological influences on the elderly at a care house. IEEE Trans. Robot. 2007, 23, 972–980. [Google Scholar] [CrossRef]
Kim, A.; Cho, M.; Ahn, J.; Sung, Y. Effects of gender and relationship type on the response to artificial intelligence. Cyberpsychology Behav. Soc. Netw. 2019, 22, 249–253. [Google Scholar] [CrossRef]
Gelbrich, K.; Hagel, J.; Orsingher, C. Emotional support from a digital assistant in technology-mediated services: Effects on customer satisfaction and behavioral persistence. Int. J. Res. Mark. 2021, 38, 176–193. [Google Scholar] [CrossRef]
Fiske, S.T.; Cuddy, A.J.C.; Glick, P.; Xu, J. A model of (often mixed) stereotype content: Competence and warmth respectively follow from perceived status and competition. J. Personal. Soc. Psychol. 2002, 82, 878–902. [Google Scholar] [CrossRef]
Gao, Y.L.; Mattila, A.S. Improving consumer satisfaction in green hotels: The roles of perceived warmth, perceived competence, and CSR motive. Int. J. Hosp. Manag. 2014, 42, 20–31. [Google Scholar] [CrossRef]
McKnight, D.H.; Chervany, N.L. Trust and distrust definitions: One bite at a time. Trust. Cyber-Soc. 2001, 2245, 27–54. [Google Scholar]
Rheua, M.; Shina, J.Y.; Penga, W.; Huh-Yoob, J. Systematic review: Trust-building factors and implications for conversational agent design. Int. J. Hum. -Comput. Interact. 2021, 37, 81–96. [Google Scholar] [CrossRef]
Pierce, J.R.; Kilduff, G.J.; Galinsky, A.D.; Sivanathan, N. From glue to gasoline: How competition turns perspective takers unethical. Psychol. Sci. 2013, 24, 1986–1994. [Google Scholar] [CrossRef] [PubMed]
Waytz, A.; Heafner, J.; Epley, N. The mind in the machine: Anthropomorphism increases trust in an autonomous vehicle. J. Exp. Soc. Psychol. 2014, 52, 113–117. [Google Scholar] [CrossRef]
Bickmore, T.W.; Picard, R.W. Establishing and maintaining long-term human-computer relationships. ACM Trans. Comput.-Hum. Interact. 2005, 12, 293–327. [Google Scholar] [CrossRef]
Broadbent, E.; Kumar, V.; Li, X.; Sollers, J., 3rd; Stafford, R.Q.; MacDonald, B.A.; Wegner, D.M. Robots with display screens: A robot with a more human-like face display is perceived to have more mind and a better personality. PLoS ONE 2013, 8, e72589. [Google Scholar]
Yam, K.C.; Bigman, Y.E.; Tang, P.M.; Ilies, R.; De Cremer, D.; Soh, H.; Gray, K. Robots at work: People prefer—and forgive—service robots with perceived feelings. J. Appl. Psychol. 2020. Advance online publication. [Google Scholar] [CrossRef]
Guzman, A.L. Voices in and of the machine: Source orientation toward mobile virtual assistants. Comput. Hum. Behav. 2018, 90, 343–350. [Google Scholar] [CrossRef]
Cho, M.; Lee, S.; Lee, K.P. Once a Kind Friend is Now a Thing: Understanding How Conversational Agents at Home are Forgotten. In Proceedings of the 2019 on Designing Interactive Systems Conference (DIS ’19), San Diego, CA, USA, 23–28 June 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 1557–1569. [Google Scholar]
Cowan, B.R.; Pantidi, N.; Coyle, D.; Morrissey, K.; Clarke, P.; Al-Shehri, S.; Earley, D.; Bandeira, N. "What can i help you with?": Infrequent users’ experiences of intelligent personal assistants. In Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI’17), Vienna, Austria, 4–7 September 2017; Association for Computing Machinery: New York, NY, USA, 2017. Article 43. pp. 1–12. [Google Scholar]
Marchick, A. The 2017 Voice Report by Alpine (fka VoiceLabs). Available online: https://medium.com/@marchick/the-2017-voice-report-by-alpine-fka-voicelabs-24c5075a070f (accessed on 29 August 2020).
Rieh, S.Y.; Yang, J.Y.; Yakel, E.; Markey, K. Conceptualizing institutional repositories: Using co-discovery to uncover mental models. In Proceedings of the third symposium on Information interaction in context, New Brunswick, NJ, USA, 18–21 August 2010; pp. 165–174. [Google Scholar]
Katz, E.; Blumler, J.G.; Gurevitch, M. Utilization of Mass Communication by the Individual. In The Uses of Mass Communications: Current Perspectives on Gratifications Research; Sage: Beverly Hills, CA, USA, 1974; pp. 19–32. [Google Scholar]
Grellhesl, M.; Punyaunt-Carter, N.M. Using the uses and gratifications theory to understand gratifications sought through text messaging practices of male and female undergraduate students. Comput. Hum. Behav. 2012, 28, 2175–2181. [Google Scholar] [CrossRef]
Ashfaq, M.; Yun, J.; Yu, S. My smart speaker is cool! perceived coolness, perceived values, and users’ attitude toward smart speakers. Int. J. Hum.-Comput. Interact. 2020, 37, 560–573. [Google Scholar] [CrossRef]
Yoo, J.; Choi, S.; Choi, M.; Rho, J. Why people use Twitter: Social conformity and social value perspectives. Online Inf. Rev. 2014, 38, 265–283. [Google Scholar] [CrossRef]
Rauschnabel, P.A.; He, J.; Ro, Y.K. Antecedents to the adoption of augmented reality smart glasses: A closer look at privacy risks. J. Bus. Res. 2018, 92, 374–384. [Google Scholar] [CrossRef]
Park, K.; Kwak, C.; Lee, J.; Ahn, J.H. The effect of platform characteristics on the adoption of smart speakers: Empirical evidence in South Korea. Telemat. Inform. 2018, 35, 2118–2132. [Google Scholar] [CrossRef]
Kowalczuk, P. Consumer acceptance of smart speakers: A mixed methods approach. J. Res. Interact. Mark. 2018, 12, 418–431. [Google Scholar] [CrossRef]
Yang, H.; Lee, H. Understanding user behavior of virtual personal assistant devices. Inf. Syst. e-Bus. Manag. 2019, 17, 65–87. [Google Scholar] [CrossRef]
Chattaramana, V.; Kwon, W.; Gilbert, J.E.; Ross, K. Should AI-Based, conversational digital assistants employ social- or task- oriented interaction style? A task-competency and reciprocity perspective for older adults. Comput. Hum. Behav. 2019, 90, 315–330. [Google Scholar] [CrossRef]
Reeves, B.; Nass, C. The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places; CSLI Publications and Cambridge University Press: Cambridge, UK, 1996. [Google Scholar]
Tamagawa, R.; Watson, C.I.; Kuo, I.H.; MacDonald, B.A.; Broadbent, E. The effects of synthesized voice accents on user perceptions of robots. Int. J. Soc. Robot. 2011, 3, 253–262. [Google Scholar] [CrossRef]
Kwon, M.; Jung, M.F.; Knepper, R.A. Human Expectations of Social Robots. In Proceedings of the 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Christchurch, New Zealand, 7–10 March 2016, ISSN 2167-2148. [Google Scholar]
Hoy, M.B. Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants. Interactive home healthcare system with integrated voice assistant. In Proceedings of the 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 20–24 May 2019; pp. 284–288. [Google Scholar]
Hergeth, S.; Lorenz, L.; Vilimek, R.; Krems, J.F. Keep your scanners peeled: Gaze behavior as a measure of automation trust during highly automated driving. Hum. Factors 2016, 58, 509–519. [Google Scholar] [CrossRef]
Cardiocube Voice-Based Ai Software. Available online: https://www.cardiocube.com/ (accessed on 21 November 2021).
The Intelligent Healthcare Companion for the Home. Available online: https://www.pillohealth.com/ (accessed on 21 November 2021).
Wolters, M.K.; Kelly, F.; Kilgour, J. Designing a spoken dialogue interface to an intelligent cognitive assistant for people with dementia. Health Inform. J. 2016, 22, 854–866. [Google Scholar] [CrossRef] [PubMed]
Ondáš, S.; Pleva, M.; Hládek, D. How chatbots can be involved in the education process. In Proceedings of the 2019 17th International Conference on Emerging eLearning Technologies and Applications (ICETA), Starý Smokovec, Slovakia, 21–22 November 2019; pp. 575–580. [Google Scholar]
Terzopoulos, G.; Satratzemi, M. Voice assistants and artificial intelligence in education. In Proceedings of the 9th Balkan Conference on Informatics (BCI’19), Sofia, Bulgaria, 26–28 September 2019; Association for Computing Machinery: New York, NY, USA, 2019; Volume 34, pp. 1–6. [Google Scholar]
Alonso, R.; Concas, E.; Reforgiato Recupero, D. An abstraction layer exploiting voice assistant technologies for effective human—Robot interaction. Appl. Sci. 2021, 11, 9165. [Google Scholar] [CrossRef]
Teixeira, A.; Hämäläinen, A.; Avelar, J.; Almeida, N.; Németh, G.; Fegyó, T.; Zainkó, C.; Csapó, T.; Tóth, B.; Oliveira, A.; et al. Speech-centric Multimodal Interaction for Easy-to-access Online Services—A Personal Life Assistant for the Elderly. Procedia Comput. Sci. 2014, 27, 389–397. [Google Scholar] [CrossRef]
Fernando, N.; Tan, F.T.C.; Vasa, R.; Mouzaki, K.; Aitken, I. Examining digital assisted living: Towards a case study of smart homes for the elderly. In Proceedings of the 24th European Conference on Information Systems ECIS, Istanbul, Turkey, 12–15 June 2016; pp. 12–15. [Google Scholar]
Portet, F.; Vacher, M.; Golanski, C.; Roux, C.; Meillon, B. Design and evaluation of a smart home voice interface for the elderly: Acceptability and objection aspects. Pers. Ubiquitous Comput. 2013, 17, 127–144. [Google Scholar] [CrossRef]
Johnson-Laird, P.N. Mental models and thought. In The Cambridge Handbook of Thinking and Reasoning; Cambridge University Press: Cambridge, UK, 2005; pp. 185–208. [Google Scholar]
Volkamer, M.; Renaud, K. Mental models general introduction and review of their application to human-centred security. Number Theory Cryptogr. 2013, 8260, 255–280. [Google Scholar]
Craik, K.J.W. The Nature of Explanation; CUP Archive: Cambridge, UK, 1967; p. 445. [Google Scholar]
Zaltman, G. How Customers Think: Essential Insights into the Mind of the Market; Harvard Business School Press: Boston, MA, USA, 2003. [Google Scholar]
Carey, S. Cognitive science and science education. Am. Psychol. 1986, 41, 1123–1130. [Google Scholar] [CrossRef]
Devedzic, V.; Radovic, D. A Framework for Building Intelligent Manufacturing Systems. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 1999, 29, 422–439. [Google Scholar] [CrossRef]
Xie, B.; Zhou, J.; Wang, H. How influential are mental models on interaction performance? Exploring the gap between users’ and designers’ mental models through a new quantitative method. Adv. Hum. -Comput. Interact. 2017, 2017, 1–14. [Google Scholar] [CrossRef]
Norman, D.A. The Psychology of Everyday Things (The Design of Everyday Things); Basic Books: New York, NY, USA, 1988. [Google Scholar]
Helander, M.G.; Landauer, T.K.; Prabhu, P.V. Handbook of Human-Computer Interaction, 2nd ed.; North Holland/Elsevier: Amsterdam, The Netherlands, 1997. [Google Scholar]
Van der veer, G.C.; Melguizo, M.C.P. Mental models. In The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications; CRC Press: Boca Raton, FL, USA, 2002; pp. 52–80. [Google Scholar]
Powers, A.; Kiesler, S. The advisor robot: Tracing people’s mental model from a robot’s physical attributes. In Proceedings of the 1st ACM SIGCHI/SIGART conference on Human-robot interaction, Salt Lake City, UT, USA, 2–3 March 2006; pp. 218–225. [Google Scholar]
Nass, C.; Brave, S. Wired for Speech: How Voice Activates and Advances the Human–Computer Relationship; MIT Press: Cambridge, MA, USA, 2005. [Google Scholar]
Nass, C.; Lee, K.M. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. J. Exp. Psychol. 2001, 7, 171–181. [Google Scholar] [CrossRef]
Kiesler, S. Fostering common ground in human-robot interaction. In Proceedings of the IEEE International Workshop on Robots and Human Interactive Communication (RO-MAN 2005), Nashville, TN, USA, 13–15 August 2005; pp. 158–163. [Google Scholar]
Rowe, A.L.; Cooke, N.J. Measuring mental models: Choosing the right tools for the job. Hum. Resour. Dev. Q. 1995, 6, 243–255. [Google Scholar] [CrossRef]
Yamada, Y.; Ishihara, K.; Yamaoka, T. A study on an usability measurement based on the mental model. In International Conference on Universal Access in Human-Computer Interaction; Springer: Berlin/Heidelberg, Germany, 2011; pp. 168–173. [Google Scholar]
Hsu, Y.C. The effects of metaphors on novice and expert learners’ performance and mental-model development. Interact. Comput. 2006, 18, 770–792. [Google Scholar] [CrossRef]
Ziefle, M.; Bay, S. Mental models of a cellular phone menu. Comparing older and younger novice users. In Proceedings of the International Conference on Mobile Human-Computer Interaction, Glasgow, UK, 13 September 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 25–37. [Google Scholar]
Brooks, J.; Brooks, M. In Search of Understanding: The Case for the Constructivist Classrooms; ASCD: Alexandria, VA, USA, 1999. [Google Scholar]
Wilson, B.; Lowry, M. Constructivist Learning on the Web. New Dir. Adult Contin. Educ. 2000, 2000, 79–88. [Google Scholar] [CrossRef]
Kelly, G.A. A brief introduction to personal construct psychology. In Perspectives in Personal Construct Psychology; Bannister, D., Ed.; Academic Press: Cambridge, MA, USA, 1970; pp. 1–30. [Google Scholar]
Botella, L. Personal Construct Psychology and social constructionism. In Proceedings of the XI International Congress on Personal Construct Psychology, Barcelona, Spain; 1995; pp. 1–22. [Google Scholar]
Raskin, J.D. Constructivism in psychology: Personal construct psychology, radical constructivism, and social constructionism. Am. Commun. J. 2002, 5, 1–25. [Google Scholar]
Glaserfeld, E. Radical Constructivism: A Way of Knowing and Learning; The Falmer Press: London, UK, 1995. [Google Scholar]
Maturana, H.R.; Varela, F.J. The Tree of Knowledge: The Biological Roots of Human Understanding; Shambhala: Boston, MA, USA, 1987. [Google Scholar]
Ziemke, T. The construction of ‘reality’ in the robot: Constructivist perspectives on situated artificial intelligence and adaptive robotics. Found. Sci. 2001, 6, 163–233. [Google Scholar] [CrossRef]
Walker, B.M.; Winter, D.A. The elaboration of personal construct psychology. Annu. Rev. Psychol. 2007, 58, 453–477. [Google Scholar] [CrossRef]
Botella, L.; Feixas, G. Teoría de los Constructos Personales: Aplicaciones a la Práctica Psicológica; Laertes: Barcelona, Spain, 1988. [Google Scholar]
Tomico, O.; Pifarré, M.; Lloveras, J. Experience landscapes: A subjective approach to explore user-product interaction. Int. Des. Conf.-Des. 2006, 2006, 393–400. [Google Scholar]
Burr, V.; McGrane, A.; King, N. Personal construct qualitative methods. In Handbook of Research Methods in Health Social Sciences; Liamputtong, P., Ed.; Springer: Singapore, 2017. [Google Scholar]
Leach, C.; Freshwater, K.; Aldridge, J.; Sunderland, J. Analysis of repertory grids in clinical practice. Br. J. Clin. Psychol. 2001, 40, 225–248. [Google Scholar] [CrossRef]
Burr, V.; King, N.; Heckmann, M. The qualitative analysis of repertory grid data: Interpretive Clustering. Qual. Res. Psychol. 2020, 1–25. [Google Scholar] [CrossRef]
Gutman, J.; Reynolds, T.J. An investigation at the levels of cognitive abstraction utilized by the consumers in product differentiation. In Attitude Research under the Sun; Eighmey, J., Ed.; American Marketing Association: Chicago, IL, USA, 1979. [Google Scholar]
Reynolds, T.J.; Gutman, J. Laddering: Extending the repertory grid methodology to construct attribute-consequence-value hierarchies. Pers. Values Consum. Psychol. 1984, 2, 155–167. [Google Scholar]
Bech-Larsen, T.; Nielsen, N.A. A comparison of five elicitation techniques for elicitation of attributes of low involvement products. J. Econ. Psychol. 1999, 20, 315–341. [Google Scholar] [CrossRef]
Reynolds, T.J.; Gutman, J. Laddering theory, method, analysis, and interpretation. J. Advert. Res. 1988, 28, 11–31. [Google Scholar]
Fransella, F. Some skills and tools for personal construct practitioners. In International Handbook of Personal Construct Psychology; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2003; pp. 105–121. [Google Scholar]
Subramony, D. Introducing a "Means-End" Approach to Human-Computer Interaction: Why Users Choose Particular Web Sites Over Others; Association for the Advancement of Computing in Education (AACE): Waynesville, NC, USA, 2002. [Google Scholar]
Zaman, B. Introducing contextual laddering to evaluate the likeability of games with children. Cogn. Technol. Work. 2008, 10, 107–117. [Google Scholar] [CrossRef]
Jans, G.; Calvi, L. Using Laddering and Association Techniques to Develop a User-Friendly Mobile (City) Application. In On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1956–1965. [Google Scholar]
Kagan, J. Surprise, Uncertainty, and Mental Structures; Harvard University Press: Cambridge, MA, USA, 2002. [Google Scholar]
Zaltman, G.; Coulter, R.H. Seeing the voice of the customer: Metaphor-based advertising research. J. Advert. Res. 1995, 35, 35–51. [Google Scholar]
Coulter, R.A.; Zaltman, G.; Coulter, K.S. Interpreting consumer perceptions of advertising: An application of the zaltman metaphor elicitation technique. J. Advert. 2001, 30, 1–21. [Google Scholar] [CrossRef]
Ho, S.; Liao, C.; Sun, P. Explore Consumers’ Experience In Using Facebook Through Mobile Devices. PACIS 2012 Proceedings, 7 July 2012. Available online: http://aisel.aisnet.org/pacis2012/713 (accessed on 17 July 2020).
Siergiej, E.; Mentor, F. Intimate Advertising: A Study of Female Emotional Responses Using the ZMET. Explorations 2009, IV, 108–119. [Google Scholar]
Glenn, L.C.; Jerry, C.O. Mapping consumers’ mental models with ZMET. Psychol. Mark. 2002, 19, 477–502. [Google Scholar]
Zaltman, G.; Zaltman, L.H. Marketing Metaphoria: What Deep Metaphors Reveal about the Minds of Consumers; Harvard Business Press: Boston, MA, USA, 2008. [Google Scholar]
Luntz, F. Word That Work: It’s Not What You Say, It’s What People Hear; Hachette UK: Paris, France, 2007. [Google Scholar]
Hancock, C.C.; Foster, C. Exploring the ZMET methodology in services marketing. J. Serv. Mark. 2019, 8, 1–11. [Google Scholar] [CrossRef]
Annamma, J.; Sherry, J., Jr.; Venkatesh, A.; Deschenes, J. Perceiving images and telling tales: A visual and verbal analysis of the meaning of the Internet. J. Consum. Psychol. 2009, 19, 556–566. [Google Scholar]
Zaltman, G.; Coulter, R.H. Using the Zaltman metaphor elicitation technique to understand brand images. Assoc. Consum. Res. 1994, 21, 281–295. [Google Scholar]
Lee, M.S.Y.; McGoldrick, P.J.; Keeling, K.A.; Doherty, J. Using ZMET to explore barriers to the adoption of 3G mobile banking services. Int. J. Retail. Distrib. Manag. 2003, 31, 340–348. [Google Scholar] [CrossRef]
Bias, R.G.; Moon, B.M.; Hoffman, R.R. Concept mapping usability evaluation: An exploratory study of a new usability inspection method. Int. J. Hum.–Comput. Interact. 2015, 31, 571–583. [Google Scholar] [CrossRef]
Eakin, E. Penetrating the Mind by Metaphor. New York Times, 23 February 2002. [Google Scholar]

Figure 1. Research Method and Structure.

Figure 2. Study 1 Procedure.

Figure 3. Study 2 Procedure.

Figure 4. The structure of the Repertory Grid.

Figure 5. Collage Image of all participants.

Figure 6. Consensus map of using VA.

Figure 7. Result of HCA (P1).

Figure 8. Result of HCA (P14).

Figure 9. Result of HCA (P18).

Table 1. Participants Information.

No	Gender	Age	Total Duration of VA Use	Type of VA Used	Frequency of Use
P1	Female	30	5 years	Smart Phone	once a week
P2	Female	29	3 years	Smart Phone, AI Speaker	once a week
P3	Female	46	A year	Smart Phone, AI Speaker	10–14 times a week
P4	Male	29	6 months	Navigation	5 times a week
P5	Male	31	2 years	Navigation, AI Speaker	every day
P6	Female	28	A year and a half	Smart Phone, AI Speaker	1–2 times a week
P7	Female	37	6 years	Smart Phone, AI Speaker	once a week
P8	Male	29	3 months	Smart Phone, AI Speaker	once a week
P9	Female	25	2 months	Smart Phone, AI Speaker	every day
P10	Female	43	A year and a half	Smart Phone, AI Speaker	once a week
P11	Male	29	A year	Navigation	once a week
P12	Male	27	A month	Smart Phone	once a week
P13	Male	39	A month	Smart Phone	once a week
P14	Male	30	2 years	Smart Phone, AI Speaker	5 times a week
P15	Female	34	2 years	Smart Phone, AI Speaker	every day
P16	Female	29	4 years	Smart Phone, AI Speaker	every day
P17	Female	29	3 years	Smart Phone	1–2 times a week
P18	Female	28	5 years	Smart Phone	every day
P19	Male	35	2 years	Smart Phone	2–3 times a week

Table 2. ZMET Interview Process.

Category		Process
Pre-test Questionnaire	Laddering	Why do you use a voice assistant? If you have not used one, what would you use it for?
		What are the consequences and/or values of using a voice assistant?
		What role would you like the voice assistant to play?
		What are your opinions/thoughts about voice assistants?
Free Navigation		Use the provided voice assistant until you feel you have sufficiently tried the functions in the categories below. (music, timer, weather, web search, small talk, scheduling, translation, speaker settings, calculation, traffic navigation, finding a location, alarm)
ZMET	Step 1 collect image	Please clip images from the provided magazine that express the emotions or evoke the impressions you had while using the voice assistant in the previous step.
	Step 2 storytelling	Please explain the images you have collected.
	Step 3 missing image	Was there an image that you were looking for in the magazine but could not find while you were clipping the images?
		Why were you looking for those images?
		Is there any image that you used as a replacement because the one you needed wasn’t in the magazine?
	Step 4 construct elicitation	(If so,) why did you think that it could replace the image you were looking for?
		Please classify the images you found according to your criteria.
		If you are done sorting, please label them by category.
		Please explain the theme of the categories. Explain your own criteria of sorting the images, and the way you sorted them.
		Was there an additional category that came up while you were deciding the categories? If there was, describe some images that would fit the category.
	Step 5 Metaphor elaboration	Out of all the images, please choose the one that best describes your thoughts and emotions about voice assistants.
	Step 5 Metaphor elaboration	What should be added to the selected image (such as colors or other images) to clarify your thoughts and emotions? Please use the sticky notes to give an explanation.
	Step 6 sensory image	Please describe an image that is the opposite of the images in the themes classified in Step 4.
		Please describe the voice assistant using sensory images.
		Why did you think so?
	Step 7 the vignette	Please create a collage that describes your thoughts and feelings for a voice assistant.
	Step 7 the vignette	Please explain your collage.

Table 3. Examples of Collage Image Analysis.

Image	Theme	Meaning
Image (a) P6	Shoulder contact with a person with a dog’s face.	I am happy that I am using the VA. VA, my cute friend, and me.
	The sun behind the dog’s face.	Emphasize cute and friendly images.
	A person is watching a screen from a dark background.	The VA is searching for the information I am asking about.
	A dim-faced person.	It’s hard to tell because it’s piled up in a veil—the VA’s double side like a friend.
Image (b) P14	A luxurious bag.	(1) A bag that a secretary is likely to carry. (2) The luxurious feeling reminds me of a secretary.
	A kite-flying man.	A free-spreading seeker of information in the VA.
	A person is watching a screen from a dark background.	VA searching for information for me reminds me of a secretary.
Image (c) P18	Nose sculptures and sunglasses.	An intelligent, high-dimensional machine, unlike humans; this expresses that it is both similar to and different from humans.
	Various colors.	Emphasis on VA moving forward.
	Earth and stars.	VA’s potential for development with endless and infinite space.
	People are standing on the black line.	The black line is the borderline between man and VA, and those standing above represent the present in contrast to the forward VA.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Park, D.; Namkung, K. Exploring Users’ Mental Models for Anthropomorphized Voice Assistants through Psychological Approaches. Appl. Sci. 2021, 11, 11147. https://doi.org/10.3390/app112311147


