The Effect of Social Presence and Chatbot Errors on Trust

Abstract: This article explores the potential of Artificial Intelligence (AI) chatbots for creating positive change by supporting customers in the digital realm. Our study, which focuses on customers and their declarative psychological responses to an interaction with a virtual assistant, fills a gap in digital marketing research, where little attention has been paid to the impact of Error and Gender, as well as to the extent to which Social Presence and Perceived Competence mediate the relationships between Anthropomorphic design cues and Trust. We provide consistent evidence of the significant negative effect of erroneous conversational interfaces on several constructs considered in our conceptual model, such as perceived competence, trust, and positive consumer responses. We also support previous research findings and confirm that people employ biased thinking across gender and that this categorization also influences their acceptance of chatbots taking social roles. The results of an empirical study demonstrated that highly anthropomorphized female chatbots that engage in social behaviors significantly shape positive consumer responses, even in the error condition. Moreover, female virtual assistants are much more commonly forgiven when committing errors compared to male chatbots.


Introduction
Artificial Intelligence-based messaging solutions, namely conversational bots, represent one of the first stepping stones in order for companies to become faster, more efficient, and more capable of providing customers with relevant and personalized experiences. Indeed, chatbots can be perceived as the perfect illustration of the development and implementation of customer-centric artificial intelligence that mimics human behavior, which has a wide range of applications in various fields, such as education, healthcare, financial services, e-commerce, etc.
Chatbots are able to understand the user's intent by deciphering verbal or written requests and responding with appropriate information. According to a report produced by Deloitte [1] and the investigations of human-robot interactions conducted by Holtgraves et al. [2], chatbots leverage Artificial Intelligence to process language, which enables them to perceive the intended meaning of human speech. Moreover, based on the studies conducted by Radziwill and Benton [3] and Deloitte Digital [4], chatbots have been shown to be intelligent and autonomous agents which learn from past interactions. According to Deloitte India [1], chatbots can now be perceived as intelligent agents, capable of learning from every interaction and understanding queries just like humans.
• Maturing of chatbot platforms. Due to the increased popularity of chatbot technology, the platforms are developing, allowing business users to easily build, train, and manage chatbots by themselves [4].

• Development of e-commerce. In 2018, the global e-commerce retail sales amounted to USD 2.8 billion, representing an increase of 112% compared to 2014 (USD 1.3 billion) [20,21]. Holtgraves et al. [2] posited that many e-service providers are willing to incorporate intelligent bots that use natural language processing in order to boost profits and increase customer loyalty. Also, according to a report developed by Global Market Insights [23], chatbots can provide a personalized shopping experience and eventually improve customer satisfaction, which has led to a faster acceptance of chatbots in the e-commerce sector.

The Importance of Chatbots in Customer Service
According to Cui et al. [24], customer service is one of the most resource-intensive departments within an organization and plays an important role in its ability to generate revenues. Therefore, chatbots can be a great way to augment and replace human personnel in customer service, since they are also capable of answering higher value queries [25]. Holzwarth et al. [26] considered that customers are usually unhappy with their online experience, as companies are impersonal and do not provide the necessary level of assistance, especially in the case of unfamiliar or complex product categories. Chatbots in customer service should be perceived as a combination of three elements [25]:
1. Interface between human and chatbots, which is increasingly using voice interaction by leveraging Natural Language Processing and Artificial Intelligence;
2. Intelligence, as chatbots are more active in areas of broad expertise due to the advancements in machine learning and other techniques which allow them to understand and solve requests, as well as to learn from each interaction;
3. Integration, as chatbots can access a wide range of information from various sources due to the integration between systems and platforms, such as workforce management systems [26].
Sivaramakrishnan et al. [27] also argued that one of the barriers to online shopping is the lack of real-time customer service coupled with inaccurate product information displayed on retail websites. Thus, the lack of social interaction and of a pleasant purchase experience deters many customers from online shopping. In order to deliver the hedonic and utilitarian benefits people seek when purchasing goods online, companies can leverage chatbots with anthropomorphic features.
According to an Accenture study on chatbots [27], Chief Information Officers and Chief Technology Officers around the world see great potential in chatbots, especially for companies' operations and for their future architecture. More than fifty percent of the interviewed executives think that bots are able to deliver a large return on investment in exchange for minimal effort. Thus, McGoldrick et al. [28] argued that these will entail automatic social responses from users, increased information, and entertainment value and will eventually lead to higher satisfaction during the purchase experience.
Among the top three benefits of implementing chatbots, the executives interviewed by Accenture mentioned enhanced employee productivity, an improved ability to manage client queries by networking with other bots, and the provision of a personalized and unique shopping experience with 24/7 access to information. Therefore, companies across different industries are willing to leverage the benefits of intelligent bots in order to streamline their activities, automate tasks, improve productivity, customer acquisition, and retention, and foster the engagement of both employees and clients. Also, Gartner [20] predicted that twenty-five percent of all firms will integrate chatbot technology into their customer service by 2020. This paper makes two contributions. Firstly, it fills a gap in marketing research, where little attention has been paid to the impact of the new forms of interconnection between customers and organizations. The paper thus discusses an option for facing the new competitive reality, involving sustainable marketing strategies. Secondly, the model developed and tested in this paper contributes to the study of factors such as error and gender, and of the extent to which Social Presence and Perceived Competence mediate the relationships between anthropomorphic design cues and trust. These insights can inform the deployment of competitive chatbots tailored to gaining customer satisfaction and loyalty, and can better assist the management of companies in adopting responsible business operations.

Literature View on Anthropomorphism and Social Presence
The way in which people respond to chatbots is not only an important research topic in the marketing field, but its practical relevance is also paramount since recent technological developments coupled with constantly changing consumer behavior are fundamentally transforming interactions between clients and companies.
The Social Information Processing theory, developed by Walther et al. [29], was one of the first theoretical models to explain how people develop impressions and relations in human-computer interactions. Also, Holtgraves et al. [2], Shawar and Atwell [5], and Taddei [30] suggested that people rely on various social cues, such as language, interactivity, and the capability of expressing emotions, which serve as relational targets when interacting with computers.
Walther et al. [29] and Brave et al. [31] posited that humans mindlessly apply social rules to machines as a result of conscious attention to various contextual cues, such as usernames and profile pictures. This is the foundation of the CASA paradigm (Computers Are Social Actors), as according to Brave et al., people apply social heuristics to machine interactions and they tend to concentrate more on social cues and neglect the asocial traits of the conversational agents [31]. Wang et al. [32] made an important remark on the automatic social responses and posited that the tendency to engage information processing heuristics is not influenced by familiarity with novel technologies, which might influence the website's perceived socialness.
Holzwarth et al. [26] stated that the Media Equation theory posits that people are inclined to treat machines as social entities which engage in social behaviors and make social attributions like humans do [30,33], even though they are aware that computers do not experience human-like emotions [30]. Thus, people apply social conventions in human-computer interactions [32] and personify the technology regardless of the way in which it is represented on screen, whether through voice interaction, or by an agent [30]. Also, social reactions to machines were found to increase when the machines engaged in more human-like behaviors or when more social cues were presented [34].
According to Kilteni et al. [35] and Ho Moon et al. [36], a social presence or virtual togetherness represents the feeling that the conversational partner is living in the same world, and is also capable of reacting to human queries. Given the importance of this concept in the context of human-computer interactions, several studies have looked into the impact of anthropomorphism of virtual agents on the perceived social presence on a website.
Holzwarth et al. [26] found that the addition of an avatar on a retail website generated the feeling of social presence which led to an improved brand attitude, higher satisfaction with the retailer, as well as greater purchase intentions. Also, Wang et al. [32] posited that the usage of humanlike traits as cues in an online retail context results in increased perceptions of social presence on the website, which is in line with the previously described theories. Prior research conducted by Bente et al. [37] showed that anthropomorphic agents who were represented by images or used human-like language made people feel more immersed in the virtual environment, eliciting at the same time social responses from humans.
Moreover, Go and Sundar [38] argued that identity cues represent a key factor in developing certain expectations in relation to a chatbot's performance in an interaction. Also, these expectations influence an individual's psychological, attitudinal, and behavioral responses to conversational agents.
However, Kim et al. [39] had a contrasting view and showed that the presence of an anthropomorphized helper in a computer game can potentially undermine perceived autonomy, which eventually reduces the overall game enjoyment. Lastly, Visser et al. [40] demonstrated that anthropomorphic agents who exhibit social behavior significantly shape trusting beliefs towards virtual assistants. Social presence mediated the effect of conversational cues on perceived expertise and perceived friendliness, as well as on attitudes toward the website and behavioral intention. Also, perceived homophily mediated the effect of message interactivity on the consumer responses. In addition, perceived dialogue was a significant mediator in the relationship between message interactivity and perceived expertise, perceived friendliness, attitudes toward the website, and behavioral intention. Visual anthropomorphic cues moderated the effect of message interactivity on perceived contingency, such that when the agent was accompanied by a non-human icon, message interactivity positively influenced perceived contingency. Message interactivity moderated the effect of visual cues, such that anthropomorphic visual cues compensated for negative consequences resulting from the lack of interactivity.

Conceptual Model and Research Hypotheses
We place our research in the context of the digitalization of business models, as well as the advancements of artificial intelligence, natural language processing, and machine learning in the area of human-computer interaction (HCI), with particular emphasis on interactions in the e-commerce sector. In doing so, we argue for paying greater attention to the determinants of trusting beliefs and positive consumer responses in a human-chatbot interaction. For the development of our conceptual model and the research hypotheses, we relied on two communication theories and one paradigm (Social Information Processing Theory, Media Equation Theory, and the CASA Paradigm), as well as on previous research findings. By extending these theories to the context of consumer behavior, we propose that during an online commercial experience, consumers respond positively to social cues designed to portray a real customer service representative, as this increases the sense of realism. Figure 1 depicts our conceptual model. We focused on the direct relationships between anthropomorphic design cues and social presence, as well as between anthropomorphic design cues and perceived competence. Then, we focused on two prerequisites for trusting beliefs, social presence and chatbots' perceived competence, and assessed whether they mediate the effect of anthropomorphism on trust. Also, the error rate was used as a moderator in the relationship between anthropomorphic design cues and perceived competence.

Effects of Social Presence and Perceived Competence
Two important links in our conceptual model, namely Social Presence and Perceived Competence, were conceptualized in this study as trust determinants. While the notion of trust was historically investigated in organizational and societal contexts as stated by Følstad et al. [41], it seems that the study of trust in technology is controversial. However, the Human-Computer Interaction literature started to explore the concept of trust in technology. Hancock et al. [42] posited that trust is crucial in online interactions as it influences a customer's willingness to accept the information provided by the machines, to follow their suggestions, as well as to benefit from the advantages inherent in chatbot systems.
Another antecedent of trust in the context of human-computer interaction is the chatbots' perceived competence. Previous research results from Hancock et al. [42], Corritore et al. [43], and Desai et al. [44] posited that system functionality and performance-based factors (i.e., reliability and accuracy) form the trust toward that technology. Moreover, the performance consistency was found to have the largest influence on the trust in robots. Følstad et al. [41] also showed that the ability to correctly respond to customers' queries and provide helpful suggestions were critical factors in the development of trust towards chatbots used for customer service.
Thus, we hypothesized the following:

Downstream Consequences of Interactions with Chatbots
Based on the previous research findings, we postulate the following hypotheses:

Hypothesis 2a. Social presence has a positive influence on trust toward the chatbot.
Hypothesis 2b. Perceived competence has a positive influence on trust toward the chatbot.
Given the hypothesized effects of Anthropomorphic Design Cues on Social Presence and Perceived Competence, and consequently on Trust, it seems likely that Social Presence and Perceived Competence mediate the effects of Anthropomorphic Design Cues on trusting beliefs.
Hence, we proposed the following two hypotheses:
Hypothesis 3a. Social presence mediates the effect of anthropomorphic design cues on trust.
Hypothesis 3b. Perceived competence mediates the effect of anthropomorphic design cues on trust.
Previous research results showed the positive influence of social presence and perceived competence on consumer responses. Social presence was found to positively influence customer satisfaction in online service encounters, as stated by Brave et al. [31], Verhagen et al. [45], and Gimpel et al. [46]. Moreover, Etemad-Sajadi [47] showed that social presence creates the feeling of the employees' presence and improves the customer experience in a retail interaction. In addition, Sivaramakrishnan et al. [27] observed that when a machine serves as an information agent, this positively influences the customer's purchase intentions. Wang et al. [32] and Hassanein and Head [48] obtained similar results showing the indirect effect of social presence on purchase intentions and even on recommendation intentions.
Regarding perceived competence, Honig and Oron-Gilad [49] showed that an erroneous anthropomorphic avatar lowers the customer's trust and the service encounter satisfaction. Moreover, their willingness to use the service again and to purchase goods depends on the recovery strategies: a compensation strategy is more efficient in improving the service encounter satisfaction, while the option or apology strategies are better at increasing willingness to use the service again.
Lastly, previous research studies also offered consistent and robust support for the effect of trust on positive consumer responses. Chattaraman et al. [50] argued that virtual agents facilitate trust development through their expressiveness during interactions where they respond to queries and fulfill customer requests. Kunkel et al. [51] showed that trust depends on the perceived expertise of the recommendation source, playing a key role in the client's intention to follow the recommendations. In addition, King and He [52], McKnight and Chervany [53], and Cyr et al. [54] found that trust in online retail interactions is essential to willingness to make online purchases.

The Moderating Role of Chatbots' Error
In order to check how the basic structure of our model interacts with previous research findings, chatbots' error completes the conceptual model as a moderator. It is known that people often develop expectations about robots based on how media presents them, which is frequently as perfect machines. Since conversational agents are perceived as social actors, this elicits mental models from Human-to-Human Interaction, as stated by Mirnig et al. [55]. Moreover, Ragni et al. [56] showed that people generally perceive robots as being functional, competent, and intelligent. Thus, there is a tendency to assume that robots are flawless.
The exploration of errors committed by social robots is clearly a new topic. However, several exploratory studies have assessed people's perceptions of faulty robots, as well as their reactions when interacting with erroneous machines.
Ragni et al. [56] demonstrated that agents which commit mistakes are perceived to be less reliable, less competent, and having weaker reasoning abilities, which negatively affects the objective task performance. Based on the research findings on faulty robots, we proposed that an anthropomorphic chatbot will raise performance expectations and the susceptibility to errors will negatively influence the perceived competence.
Thus, we postulated the following hypotheses:
Hypothesis 4a. Chatbots' error moderates the effect of anthropomorphic design cues on perceived competence, such that an erroneous virtual agent will reduce perceived competence.
Hypothesis 4b. Given that chatbots' error moderates the effect of anthropomorphic design cues on perceived competence, it ultimately tempers the effect on trust, which constitutes a moderated mediation.
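The mediation and moderated mediation hypotheses can be written compactly as a set of regression equations. The following is only a sketch of one conventional regression-based formulation; the text does not state the estimation approach, and all symbols (ADC for anthropomorphic design cues, E for the error condition, SP, PC, T for trust, and the coefficient names) are notational assumptions introduced here for illustration.

```latex
\begin{align}
\mathit{SP} &= i_1 + a_1\,\mathit{ADC} + \varepsilon_1 \\
\mathit{PC} &= i_2 + a_2\,\mathit{ADC} + a_3\,E + a_4\,(\mathit{ADC} \times E) + \varepsilon_2 \\
T &= i_3 + c'\,\mathit{ADC} + b_1\,\mathit{SP} + b_2\,\mathit{PC} + \varepsilon_3
\end{align}
% Indirect effect through social presence (Hypothesis 3a):        a_1 b_1
% Moderation of the ADC -> PC path by error (Hypothesis 4a):      a_4
% Conditional indirect effect through perceived competence:       (a_2 + a_4 E)\, b_2
% Index of moderated mediation (Hypothesis 4b):                   a_4 b_2
```

Under this formulation, the moderated mediation is supported when the product a_4 b_2 differs from zero, that is, when the indirect effect of the design cues on trust through perceived competence changes between the error and no-error conditions.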

Experimental Study
The study participants (N = 240, MAge = 35.35, SDAge = 10.61; 44.6% female) were recruited via Amazon's Mechanical Turk (MTurk), which is a crowdsourcing marketplace where virtual workers engage in "human intelligence tasks" (HITs). In order to ensure high-quality responses, the study participants needed to have HIT approval ratings above 95% to be able to take part in our study. All participants were citizens of the United States of America and were paid $0.30 each for their participation. Moreover, the study participants were randomly assigned to one of the four experimental conditions, which varied depending on the manipulations of anthropomorphic design cues and chatbots' error.

Experimental Design and Procedure
In order to test the differences that gender might entail, an experiment was conducted in which participants had to communicate with an anthropomorphized chatbot, which was presented either as a Male (Paul) or as a Female (Sarah). The study employed a 2 (Chatbots' Gender: Male vs. Female) × 2 (Chatbots' Error: Yes vs. No) between-subjects experimental design, presented in Table 2.
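A balanced random assignment to the four cells of such a design could be sketched as follows. This is a minimal illustration and not the procedure used in the study: the function name, the participant identifiers, and the seed are hypothetical, and the only facts taken from the text are the two factors and the sample size of 240.

```python
import random
from collections import Counter
from itertools import product

GENDERS = ("Male", "Female")  # chatbot presented as Paul or Sarah
ERRORS = ("Yes", "No")        # erroneous vs. correct recommendation


def assign_conditions(participant_ids, seed=None):
    """Randomly assign participants to the 2 x 2 cells in equal numbers."""
    cells = list(product(GENDERS, ERRORS))
    if len(participant_ids) % len(cells) != 0:
        raise ValueError("sample size must be divisible by the number of cells")
    per_cell = len(participant_ids) // len(cells)
    # One slot per participant, with every cell repeated equally often.
    slots = [cell for cell in cells for _ in range(per_cell)]
    rng = random.Random(seed)  # seed is only for reproducibility of the sketch
    rng.shuffle(slots)
    return dict(zip(participant_ids, slots))


# Hypothetical participant identifiers 1..240.
assignment = assign_conditions(list(range(1, 241)), seed=42)
cell_sizes = Counter(assignment.values())
print(cell_sizes)
```

Shuffling a pre-filled list of slots, rather than drawing a condition independently for each participant, guarantees equal cell sizes.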

Table 2. Experimental design.
Chatbots' Error    Male chatbot (Paul)    Female chatbot (Sarah)
Yes                N = 60                 N = 60
No                 N = 60                 N = 60

At the beginning of the study, the participants were made aware of the scope of the study, the three parts involved, and the duration of each part. Then, the participants proceeded with a pre-survey, where we controlled for several important alternative explanations for the rationale of the experiment, such as testing tech savviness or the importance of human relationships in a purchase situation. After that, we provided participants with the following instructions: "You are now proceeding with the second part of this study and your tasks are structured as follows: Go [...]"
Following the introduction to the study, participants had the chance to explore the website and, after clicking on the chat icon, they were greeted by the virtual assistant. Here, all participants were randomly assigned to one of the four experimental conditions. For all the participants, the chatbot made an introductory statement consistent with their condition: "Hello! I am Paul/Sarah. I am here to welcome you to the world of sportswear and assist with product recommendations".
After this, study participants were taken through the pre-defined interaction script, which was designed to eventually lead to a product recommendation. Given that the chatbot has to understand what the customer needs in order to provide an accurate product recommendation, the virtual assistant followed a Q&A (Questions and Answers) format. At the beginning of the interaction, the virtual assistant asked the respondents several questions about their country of residence, the category of sports apparel they were interested in, their frequency of working out, and their preferred fit, color, material, and brand. Once a question was asked, the virtual assistant waited until the participant answered. Then, the chatbot responded to the participant's answer with a programmed delay only in the high anthropomorphic condition, which was used to create the impression of interacting with a real person who is actually typing an answer.
If participants did not follow the pre-defined script when responding, or asked other questions which were not related to the main subject, the chatbot was programmed to redirect the conversation back to the pre-defined script. At the end of the interaction, the virtual assistant provided the participant with a product recommendation, as well as two options: either to make a purchase or to add the recommended product or products to Favorites. Moreover, the virtual assistant offered to provide the participants with additional information on the delivery options. The experimental task took around 5 min to complete.
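The scripted question flow with redirection can be illustrated with a small state machine. This is only a sketch under stated assumptions and does not reproduce the Watson Assistant dialog used in the study: the question wording, the class name ScriptedDialog, and the rule for detecting an off-script reply (an empty input or an input ending in a question mark) are all hypothetical.

```python
# Hypothetical question script, paraphrasing the topics listed in the text.
SCRIPT = [
    ("country", "Which country do you live in?"),
    ("category", "Which category of sports apparel are you interested in?"),
    ("frequency", "How often do you work out?"),
    ("fit", "Which fit do you prefer?"),
    ("color", "Which color do you prefer?"),
    ("material", "Which material do you prefer?"),
    ("brand", "Which brand do you prefer?"),
]

DONE = "Thank you, I have everything I need for a recommendation."


class ScriptedDialog:
    """Walks through the script and steers off-script input back to it."""

    def __init__(self, script):
        self.script = script
        self.step = 0
        self.answers = {}

    def current_question(self):
        if self.step < len(self.script):
            return self.script[self.step][1]
        return None

    def handle(self, user_input):
        text = user_input.strip()
        if self.step >= len(self.script):
            return DONE
        # Assumed off-script rule: empty input or a counter-question.
        if not text or text.endswith("?"):
            return "Let's get back to your recommendation. " + self.current_question()
        key = self.script[self.step][0]
        self.answers[key] = text  # store the stated preference
        self.step += 1
        next_question = self.current_question()
        return next_question if next_question is not None else DONE


dialog = ScriptedDialog(SCRIPT)
first_reply = dialog.handle("Can you tell me a joke?")  # off-script, redirected
second_reply = dialog.handle("United States")           # on-script, stored
```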
After closing the chat with the virtual assistant, participants had to respond to a series of questions revolving around the variables used in our conceptual model: perceptions of the chatbot's anthropomorphic cues, perceptions of social presence, perceived competence, trust, as well as positive consumer responses. Finally, we assessed the willingness to disclose personal information and the perceived warmth towards the chatbot, as well as registered demographic information.

The Development of Stimulus Material
The development of the stimulus material for the experiments consisted of three steps: website implementation and online store set-up, chatbot development, and integration of the chatbot into the online store, as presented in Figure 2.
Lastly, we also used the AJAX Search for WooCommerce plugin to implement a search inside the products' characteristics. This plugin was linked with IBM Watson [57] in order to allow the virtual assistant to provide product recommendations to the customers.
2. Chatbot development. For the chatbot's development, we used Watson Assistant, which is IBM's chatbot platform for creating conversational interfaces that can be deployed on various channels. Watson Assistant is pre-trained with relevant content from different industries and is capable of searching for answers in an existing knowledge base, while also asking for clarification if the user's input is not sufficient [57]. The Watson model uses Intents, Entities, and a Dialog. We implemented a new intent called #provide_recommendation that, once triggered, starts a conversation with the customer, gathers data, and provides a product recommendation based on the user preferences stated in the dialog box. The conversation is triggered once the customer clicks on the chat icon, at which point the chatbot greets the user and presents its intention to provide a recommendation. Also, the chatbot was trained for other intents which were provided by default (e.g., #greetings). In addition, we added a node that recognizes when the user response matches the input condition for the newly added intent. Lastly, the chatbot stores the user preferences during the conversation with each customer, and it was also programmed to guide the user back to the normal conversation flow in case of deviations.
In order to select equally attractive avatars for the male and female versions of the virtual assistants, we decided to use only one picture and apply the gender swap functionality in two photo editing applications: FaceApp and Adobe Photoshop. In this way, we could ensure that both genders have the same face shape, skin tone, eye color, hair color, as well as the same smile. Also, we selected the picture from an online database which provides free content to be used for personal or commercial purposes [58].
3. Chatbot integration into the online store. To integrate the chatbot into our store, we installed and customized the Watson Assistant plugin, where we added all the credentials and keys from our own chatbot. Also, we stored all conversations that participants had with the virtual assistant in our database by using a unique key.

Manipulation of Gender
In order to manipulate the gender cues, the chatbot was presented with an avatar and an identity (i.e., Paul, Sarah) (Figure 4). Also, to induce perceptions of interactivity and expertise, both virtual agents were highly anthropomorphized and thus, they were designed to use natural sentences, to act humanlike, as well as to reply with pre-determined delays in order to make the dialogue seem similar to a human-human conversation. All the study participants were exposed to the gender manipulation while chatting as follows: half of the study participants took part under the Male chatbot condition and the other half under the Female chatbot condition.
Given that interactivity was found to be an important social cue that people rely on as a relational target [2], we manipulated the response timing, as well as the delay between two responses for both genders, in order to make the dialogue seem similar to a real interaction between humans. For this purpose, we counted the number of seconds needed for a human to type all the questions and answers from the script, which was used to train the chatbot. We also introduced a typing icon in order to create the impression that a real person is actually answering the query. Lastly, irrespective of the anthropomorphic condition, the chatbot was programmed to communicate using natural sentences and to be capable of responding to relevant questions, as well as to a set of general questions, after which the agent was programmed to redirect the discussion to the normal flow.
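The response-timing manipulation can be approximated in code. In the study the delays were derived from the measured time a human needed to type each scripted message; the sketch below swaps in a simpler length-based estimate, and the typing speed and the lower and upper bounds are hypothetical values chosen only for illustration.

```python
def typing_delay(message, chars_per_second=5.0, min_delay=1.0, max_delay=8.0):
    """Estimate how long (in seconds) to show the typing icon before replying.

    chars_per_second, min_delay, and max_delay are assumed values, not
    parameters reported in the study.
    """
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    raw_delay = len(message) / chars_per_second
    # Clamp so that very short or very long messages still feel natural.
    return max(min_delay, min(raw_delay, max_delay))


greeting = "Hello! I am Sarah. I am here to welcome you to the world of sportswear."
delay_seconds = typing_delay(greeting)
```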
In order to select equally attractive avatars for the male and female versions of the virtual assistants, we decided to use only one picture and apply the gender swap functionality in an online photo editor.

Manipulation of Chatbots' Error
For the manipulation of chatbots' error, we programmed the virtual assistant to randomly assign the error condition to half of the study participants. The forced mistake consisted of an erroneous product recommendation for the opposite gender, color, and different brand compared to the features indicated by the participant at the beginning of the interaction. We chose this error type that can be attributed to the basic knowledge of the chatbot given that the main scope of the virtual assistant in this study was to provide an accurate product recommendation based on the attributes specified during their interaction. Thus, we wanted to assess how participants respond to an erroneous chatbot in an online service encounter.
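The forced mistake can be illustrated with a toy recommender that, in the error condition, returns a product mismatching the stated gender, color, and brand. This is a sketch under stated assumptions rather than the study's implementation: the Product type, the catalog entries, and the recommend function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    gender: str
    color: str
    brand: str


# Hypothetical catalog; product and brand names are placeholders.
CATALOG = [
    Product("Running tee", "women", "blue", "BrandA"),
    Product("Training tee", "men", "black", "BrandB"),
    Product("Gym tank", "women", "black", "BrandB"),
]


def recommend(catalog, prefs, error):
    """Return a matching product, or a fully mismatching one if error is True."""
    for product in catalog:
        same = (
            product.gender == prefs["gender"],
            product.color == prefs["color"],
            product.brand == prefs["brand"],
        )
        # Error condition: gender, color, and brand must all differ.
        if (error and not any(same)) or (not error and all(same)):
            return product
    return None  # no suitable product in the catalog


prefs = {"gender": "women", "color": "blue", "brand": "BrandA"}
correct = recommend(CATALOG, prefs, error=False)
erroneous = recommend(CATALOG, prefs, error=True)
```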

Dependent and Independent Variables
The dependent variables were Trust and Positive Consumer Responses. In order to address the variations in the definitions of trust, this study considered it as a multidimensional construct, composed of the following factors: sincerity, honesty, truthfulness, credibility, and reliability. In addition, we also measured the overall trust in the virtual assistant. In total, we used six seven-point scale items from Gupta et al. [59] (α = 0.92), Soh et al. [60] (α = 0.96), and Carroll and Ahuvia [61] (α = 0.81). The positive consumer responses were assessed through service encounter satisfaction, purchase intentions, and patronage intentions. The consumer responses were measured using seven items assessed on a seven-point scale, adapted from Holzwarth et al. [26] (α = 0.95), Etemad-Sajadi et al. [47] (α = 0.89), Chattaraman et al. [50] (α = 0.91), Cyr et al. [62] (α = 0.93), and Maxham and Netemeyer [63] (α = 0.89). The anthropomorphic design cues served as an independent variable in our conceptual model. We used one seven-point scale item from Nowak and Rauh [64] (α = 0.84), and proposed a new item in order to measure the perception of human-like characteristics.

Mediating Variables
The conceptual model consisted of two mediators: Social Presence and Perceived Competence. Previously validated scales provided the basis for all of the items used to measure the variables included in the conceptual model. We employed a 7-point Likert Scale, thus the response options for each item ranged from 1 (strongly disagree/does not apply at all/not descriptive at all) to 7 (strongly agree/fully apply/extremely descriptive). For the measurement of Social Presence (SP), we used five seven-point scale items (human contact, human warmth, sociability, source of comfort, sense of support when in need) from King and He [52] (α = 0.88) and Bruwer et al. [65] (α = 0.88). The number of items used to measure perceptions of socialness in the previous sources was reduced to adapt them to the context of our research.
Perceived Competence (PC) was assessed using six seven-point scale items from Cho [66], who reported a composite reliability (CR) of 0.99 when the scale was used with clothing purchases, and Holzwarth et al. [26], who reported a Cronbach's alpha of 0.93. Lastly, we also assessed the degree of warmth towards the virtual assistant and social disclosure. For warmth towards the virtual assistant, we adapted the warmth index proposed by Aaker et al. [67] to the context of our research, using three seven-point scale items (α = 0.83).
To measure Social Disclosure, we also used three seven-point scale items adapted from Cho [66], who reported a composite reliability (CR) of 0.90 when the scale was used with clothing purchases.
The variables, the measured items, the sources and the reliability of the scales used to assess the variables in our conceptual model are listed in Table 3 below:

Alternative Explanations
As the experiment involved an interaction with a chatbot on a retail website selling sports apparel, we decided to control for several alternative explanations, such as the participants' tech savviness and the degree of their openness to new technological developments.
We also controlled for the importance of human relations. The rationale is that if human-human interactions are essential for the respondents, the chatbot's identity as a conversation partner will be perceived differently, creating expectations similar to those of real-life conversations.
Therefore, we expect respondents who are less tech savvy and who give a high importance to human relations will feel less comfortable and more uncertain when interacting with virtual agents than those who are eager to embrace technological developments and communicate in a computer-mediated environment. Moreover, we controlled for the frequency of working out, shopping online, as well as purchasing sports apparel online. In addition, the acceptance of errors in a service encounter was controlled as well.
To ensure that the respondents read the survey questions thoroughly, we added one trap question asking them to choose a particular answer to a multiple-choice question. Table 4 shows that the Cronbach's alpha values (CA; Bagozzi and Yi [68]) for all the measurement constructs in the experimental study indicate excellent internal consistency (CA > 0.9). Therefore, the variables used in the study can be considered highly reliable, as well as consistent in explaining the variance they capture. To provide evidence of construct validity, we employed a Confirmatory Factor Analysis. We found a significant chi-square (χ²(240) = 610.872, p = 0.00) and a high value for the Root Mean Square Error of Approximation (RMSEA = 0.145), which exceeds the cut-off value of 0.06 suggested by MacKinnon [69], implying that the model is not close-fitting. Given that the initial model produced a poor fit to the data, we examined the modification indices and subsequently removed two items from the Trust construct (Tables 5 and 6). Then, we used maximum likelihood estimation with robust standard errors, and the results confirmed that the fit indexes CFI (0.969), TLI (0.962), and SRMR (0.037) improved significantly, while RMSEA (0.066) fell within the acceptable range (Steiger [70]). Therefore, we can confirm that there is a good fit between the conceptual model and the observed data. Moreover, both the composite reliability for each factor (CR ≥ 0.84; Bagozzi and Yi [68]) and Cronbach's α (CA ≥ 0.84; MacKinnon [69]) exceeded the recommended thresholds, indicating reliable measures and good internal consistency.
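For readers who wish to reproduce this kind of reliability check, the sketch below computes Cronbach's alpha from raw item scores using the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of the summed score). It is an illustration only and not the procedure used to produce Table 4; the function name `cronbach_alpha` and the example responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of columns, one list of respondent scores per scale item.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items need the same number of respondents")
    # Summed scale score per respondent.
    totals = [sum(col[i] for col in items) for i in range(n)]
    # Sample variances (n - 1 denominator) of each item and of the total.
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical seven-point responses: three items, six respondents.
example_items = [
    [5, 6, 3, 7, 4, 2],
    [5, 7, 3, 6, 4, 2],
    [4, 6, 2, 7, 5, 3],
]
alpha = cronbach_alpha(example_items)
print(f"alpha = {alpha:.3f}")
```

Values close to 1 indicate that the items vary together, which is the sense in which the constructs above are described as internally consistent.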

Measurement Model
Also, Table 4 shows that all standardized factor loadings are significantly associated with their latent variable at 0.001 level and their estimated loadings exceed the cut-off value of 0.5 (Hair et al. [71]), ranging from 0.831 (SP3) to 0.910 (SP2).
Moreover, Table 4 shows that each factor's Composite Reliability (CR) is higher than 0.84, above the threshold suggested by Bagozzi and Yi [68], which confirms that the scales used have stable and adequate measurement properties. The Average Variance Extracted also exceeds the cut-off value proposed by Bagozzi and Yi [68] (AVE ≥ 0.51).
The lowest AVE is registered for the Trust construct (0.72), while the highest AVE is for Social Presence (0.81).
We can therefore confirm that at least 72% of the variance in the observed variables was explained by the chosen constructs in the empirical experiment, which supports convergent validity.
In addition, the Average Variance Extracted (AVE) for each factor exceeded the highest squared inter-construct correlations associated with that construct (Fornell and Larcker [71]). Thus, the results confirm that the measurement model is characterized by convergent and discriminant validity (Table 5).
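The convergent and discriminant validity checks described above can be sketched as follows, assuming standardized loadings from a fitted CFA are already available. AVE is taken as the mean squared standardized loading, CR as (Σλ)² / ((Σλ)² + Σ(1 − λ²)), and the Fornell-Larcker criterion requires each squared inter-construct correlation to stay below the AVE of both constructs involved. All function names, loadings, and correlations in this sketch are hypothetical and are not the values reported in Tables 4 to 6.

```python
def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings of one construct."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability from standardized loadings."""
    sum_loadings_sq = sum(loadings) ** 2
    # Error variance of a standardized indicator is 1 - loading^2.
    error_variance = sum(1 - l * l for l in loadings)
    return sum_loadings_sq / (sum_loadings_sq + error_variance)


def fornell_larcker_ok(ave_by_construct, correlations):
    """True if every squared correlation is below both constructs' AVE.

    correlations: dict mapping (construct_a, construct_b) -> correlation.
    """
    for (a, b), r in correlations.items():
        if r * r >= min(ave_by_construct[a], ave_by_construct[b]):
            return False
    return True


# Hypothetical loadings for two constructs.
loadings = {
    "SocialPresence": [0.85, 0.88, 0.90],
    "Trust": [0.82, 0.86, 0.87, 0.84],
}
ave = {name: average_variance_extracted(l) for name, l in loadings.items()}
cr = {name: composite_reliability(l) for name, l in loadings.items()}
# Hypothetical inter-construct correlation.
discriminant = fornell_larcker_ok(ave, {("SocialPresence", "Trust"): 0.70})
print(ave, cr, discriminant)
```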
Lastly, we checked the standardized factor loadings to identify other potential issues with the Confirmatory Factor Analysis (Table 6). The results show that all standardized factor loadings are significantly associated with their latent variable at the 0.001 level and that their estimated loadings exceed the cut-off value of 0.5 of Hair et al. [72], ranging from 0.831 (SP3) to 0.912 (PC5).

Research Model
Through our empirical study, we wanted to check whether error would make a significant difference in participants' perceptions when they were exposed to male or female chatbots. Even though Gender was initially conceptualized as a moderator, we decided to consider it as an independent variable, based on the assumption that people tend to create new digital divides and employ biased thinking across gender, as demonstrated by Brandtzaeg and Følstad [21]. Also, Mou et al. [15] posited that gender cues are the primary information sought by people in computer-mediated environments, and Walther et al. [32] stated that gender categorization influences the acceptance of machines taking social roles. Previous research findings by Brave et al. [31] showed that people mindlessly apply gender stereotypes to computers. Male-voiced computers were found to convey more compelling comments, being perceived as more competent and friendlier.
Moreover, individuals tend to assume that women are more knowledgeable about "feminine topics", while their dominant behavior in an interaction is poorly received. Also, Zumstein and Hundertmark [14] found that male chatbots instill more confidence for technical inquiries, while in customer support centers people rather expect to interact with female agents.

Gender Manipulation Check
We performed a chi-square test in order to determine the extent to which participants correctly identified the name of the virtual assistant with whom they interacted. The results show that 70.9% of the participants exposed to Paul could correctly identify the name of the chatbot with whom they interacted, while in the case of the female chatbot, 80.9% of the respondents correctly reported that they chatted with Sarah. Thus, the results confirm that our gender manipulations were effective.

Chatbots' Error Manipulation Check
To check the chatbots' error manipulation, a chi-square test was used to determine whether the study participants perceived the chatbot's error or not. The results revealed that 65% of the participants in the erroneous condition noticed the error (χ²(1, 119) = 45.882, p < 0.001), and when asked for the degree of severity, the average was 5.46. Therefore, we can confirm that the error manipulation was effective.
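A manipulation check of this kind can be illustrated with a Pearson chi-square test on a 2 × 2 table that crosses the experimental condition with whether the participant reported noticing an error. The sketch below is a minimal version without continuity correction; the function name `chi_square_2x2` and the cell counts are made up for illustration and are not the study's data.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    table: [[a, b], [c, d]] of observed counts.
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square survival function
    # has the closed form erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = condition (error, no error),
# columns = participant reported an error (yes, no).
observed = [[30, 10], [8, 32]]
statistic, p = chi_square_2x2(observed)
print(f"chi2(1) = {statistic:.3f}, p = {p:.5f}")
```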

Significant Mean Differences Imposed by Chatbots' Gender and Errors
The results showed that the participants in the female condition did not perceive the virtual assistant as significantly more human-like or as having significantly more human characteristics (M_Sarah = 4.84, F(1, 236) = 0.047, p = 0.828) compared to the participants in the male condition (M_Paul = 4.80, t = 0.218, p = 0.828). However, the results confirmed that Gender predicts significant differences in the positive consumer responses (M_Sarah = 4.65, M_Paul = 4.21, F(1, 236) = 4.04, t = 1.97, p < 0.05). More specifically, participants exposed to the female chatbot reported significantly higher patronage intentions compared to those interacting with the male version (M_Sarah = 4.61, M_Paul = 4.14, t = 1.98, p < 0.05). We also identified a significant main effect of Gender on Social Disclosure, as participants exposed to the female virtual assistant were more inclined to provide her with personal information (M_Sarah = 3.85, F(1, 236) = 11.77, p ≤ 0.001) compared to those exposed to the male version of the chatbot (M_Paul = 3.04, t = 3.412, p < 0.001).
Beyond the results reported above, no other main or interaction effects of the manipulations were significant. Second, we ran another 2 (Gender: Paul vs. Sarah) × 2 (Chatbots' Error: Yes vs. No) ANOVA on the trust scale. We found no significant main effect of Gender on perceived Trust (F(1, 236) = 1.625, p = 0.204), which implies that H1c is not supported. Also, the interaction between Gender and Chatbots' Error did not predict significant differences in the Trust towards the chatbot (F(1, 236) = 1.953, p = 0.164). However, there was a main effect of Chatbots' Error on the perceived Trust (F(1, 236) = 7.22, p < 0.01). Indeed, the participants in the erroneous condition (M_Erroneous = 4.56) reported significantly lower levels of Trust compared to those exposed to a non-erroneous virtual assistant (M_Non-Erroneous = 5.05, t = 2.67, p < 0.01).
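To make the structure of a 2 (Gender) × 2 (Chatbots' Error) between-subjects ANOVA concrete, the sketch below computes the three F ratios from raw scores. It assumes equal cell sizes, which keeps the sums of squares simple; whether the study's cells were exactly balanced is not stated, so this is an illustration rather than a reproduction of the analysis. With 240 participants spread over four cells, the error degrees of freedom are 240 − 4 = 236. The function name and the scores are hypothetical.

```python
from statistics import mean


def two_way_anova_balanced(cells):
    """F ratios for a balanced 2 x 2 between-subjects design.

    cells: dict mapping (a, b) with a, b in {0, 1} to a list of scores.
    Returns F for factor A, factor B, the A x B interaction, and the df.
    """
    sizes = {len(scores) for scores in cells.values()}
    if set(cells) != {(0, 0), (0, 1), (1, 0), (1, 1)} or len(sizes) != 1:
        raise ValueError("this sketch handles only a balanced 2 x 2 design")
    n = sizes.pop()
    grand = mean([x for scores in cells.values() for x in scores])
    cell_mean = {key: mean(scores) for key, scores in cells.items()}
    a_mean = {a: mean([cell_mean[(a, 0)], cell_mean[(a, 1)]]) for a in (0, 1)}
    b_mean = {b: mean([cell_mean[(0, b)], cell_mean[(1, b)]]) for b in (0, 1)}

    ss_a = 2 * n * sum((a_mean[a] - grand) ** 2 for a in (0, 1))
    ss_b = 2 * n * sum((b_mean[b] - grand) ** 2 for b in (0, 1))
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in (0, 1) for b in (0, 1)
    )
    ss_within = sum(
        (x - cell_mean[key]) ** 2 for key, scores in cells.items() for x in scores
    )
    df_error = 4 * n - 4
    ms_within = ss_within / df_error  # each effect has 1 df, so MS = SS
    return {
        "F_A": ss_a / ms_within,
        "F_B": ss_b / ms_within,
        "F_AxB": ss_ab / ms_within,
        "df": (1, df_error),
    }


# Hypothetical trust scores; A = gender (0 = Paul, 1 = Sarah), B = error (0 = no, 1 = yes).
scores = {(0, 0): [4, 6], (0, 1): [2, 4], (1, 0): [5, 7], (1, 1): [3, 5]}
result = two_way_anova_balanced(scores)
print(result)
```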

The Effects of Chatbots' Gender and Errors
Third, we used a 2 × 2 ANOVA to test the effect of Gender on Perceived Competence. The results showed that Gender alone does not predict significant differences in the Perceived Competence (M_Sarah = 4.84, M_Paul = 4.46, F(1, 236) = 3.17, p = 0.069). Descriptively, Sarah was perceived as more competent than Paul in both the erroneous and non-erroneous conditions, and especially when the virtual assistant was programmed to commit errors (M_Sarah = 4.58, M_Paul = 3.98, t = 1.924, p = 0.057), although this difference did not reach significance at the 0.05 level. Therefore, H1b is not supported. In contrast, Chatbots' Error alone predicts significant differences in the Perceived Competence (F(1, 236) = 12.92, p < 0.001). Finally, the interaction between Chatbots' Gender and Error does not predict significant differences in the Perceived Competence (F(1, 236) = 1.26, p = 0.261), so the moderating effect proposed in Hypothesis 4a is not supported.

Downstream Consequences in the Conceptual Model
In order to derive managerial implications, we also looked at the downstream consequences of interactions with chatbots, namely at the positive consumer responses that such encounters might entail. The results of a Univariate Analysis of Variance conducted on the Trust scale showed that Social Presence has a significant main effect on Trust (F(30, 182) = 11.26, p < 0.001), which supports H2a. Also, we found that error predicts significant differences in perceived Trust (F(1, 182) = 16.18, p < 0.001).

Moderated Mediation Analysis
We employed a moderated parallel multiple mediation model of Hayes [73] (SPSS Macro PROCESS, Model 7, bootstrap samples = 10,000) to test whether Chatbots' Error moderates the underlying process through Social Presence and Perceived Competence. Thus, Gender was modeled to exert its effect on Trust indirectly through two mediators: Social Presence and Perceived Competence, as we wanted to model multiple mechanisms at the same time in an integrated model.
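The logic of a first-stage moderated mediation test can be sketched in a few lines. The example below is a deliberate simplification and not a re-implementation of the PROCESS macro: it uses a single mediator instead of two parallel ones, binary codes for Gender (x) and Chatbots' Error (w), and a percentile bootstrap. The a path (x to mediator) is estimated separately at each level of w, the b path (mediator to outcome, controlling for x) is pooled, and the index of moderated mediation is (a at w = 1 minus a at w = 0) × b. All names and the simulated data are hypothetical.

```python
import random
from statistics import StatisticsError, mean


def _cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def index_of_moderated_mediation(rows):
    """rows: list of (x, w, m, y) tuples with binary x and w."""
    def a_path(level):
        # Effect of x on the mediator within one level of the moderator.
        m1 = [m for x, w, m, _ in rows if w == level and x == 1]
        m0 = [m for x, w, m, _ in rows if w == level and x == 0]
        return mean(m1) - mean(m0)

    xs = [r[0] for r in rows]
    ms = [r[2] for r in rows]
    ys = [r[3] for r in rows]
    sxx, smm, sxm = _cov(xs, xs), _cov(ms, ms), _cov(xs, ms)
    sxy, smy = _cov(xs, ys), _cov(ms, ys)
    # OLS slope of y on m, controlling for x (two-predictor closed form).
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return (a_path(1) - a_path(0)) * b


def bootstrap_ci(rows, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the index."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_boot:
        sample = rng.choices(rows, k=len(rows))
        try:
            estimates.append(index_of_moderated_mediation(sample))
        except (StatisticsError, ZeroDivisionError):
            continue  # degenerate resample (empty cell or no variance)
    estimates.sort()
    lower = estimates[round((1 - level) / 2 * n_boot)]
    upper = estimates[round((1 + level) / 2 * n_boot) - 1]
    return lower, upper


def simulate_rows(n_per_cell, seed):
    """Simulated data in which x affects m only when w = 1 (assumption)."""
    rng = random.Random(seed)
    rows = []
    for x in (0, 1):
        for w in (0, 1):
            for _ in range(n_per_cell):
                m = 4 + 1.0 * x * w + rng.gauss(0, 0.5)
                y = 2 + 0.8 * m + rng.gauss(0, 0.5)
                rows.append((x, w, m, y))
    return rows


rows = simulate_rows(n_per_cell=60, seed=1)
estimate = index_of_moderated_mediation(rows)
ci = bootstrap_ci(rows, n_boot=1000)
print(f"index = {estimate:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

A confidence interval that excludes zero would indicate that the indirect effect differs between the error and no-error conditions. The simulated data are built so that this holds, which is an assumption of the sketch and says nothing about the study's results.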
The results showed that Gender does not have a significant effect on either Social Presence or Perceived Competence. The main study results for the effects of chatbots' gender and errors are presented in Figure 5.

General Discussion, Research Contributions, and Practical Implications
The results of our empirical study confirmed that gender cues are essential in creating positive consumer responses. The study participants who interacted with the female virtual assistant reported significantly higher patronage intentions and willingness to disclose personal information. Moreover, even in the error condition, the female chatbot was much more commonly forgiven and registered significantly higher levels of socialness perceptions and service encounter satisfaction. The results are in line with previous research findings and reinforce the idea that people develop digital divides based on gender stereotypes and expect to interact with female agents during online customer support interactions. We also shed light on the reluctance of customers to engage in retaliatory behavior when interacting with a female chatbot, which was forgiven much more readily than the male version. This study also demonstrated that the Chatbots' Error rate predicts differences in the trusting beliefs and positive consumer responses. Indeed, respondents are more willing to purchase a product after an error-free interaction. They are also more satisfied with the service encounter and more inclined to engage in patronage behavior towards the virtual assistant and website. Moreover, the results of the control checks showed that the provision of an error-free experience is essential for customers who do not value human relations in a purchase situation.
In addition, this study also showed that perceptions of social presence and competence play a critical role in developing strong trusting beliefs. In this way, more evidence is provided that social presence and perceived competence are indeed paramount determinants of trust in an interaction with a virtual assistant.
Moreover, beyond the significant effect on trust, the empirical study supports the applicability of perceived competence as a determinant of positive consumer responses, particularly service encounter satisfaction, purchase intentions, and patronage intentions. This provides evidence that the perceived competence of a customer service representative, a key element in offline service encounters, is essential to the development of positive consumer responses in online interactions as well.
Finally, this study has several implications. First, we make a solid case for the development and implementation of chatbots to provide product recommendations in a retail context given the high scores obtained for the positive consumer responses (service encounter satisfaction, purchase intentions, and patronage intentions) after respondents interacted with the virtual assistants.
Moreover, we believe that strong relationships between customers and brands are difficult to develop in an online service encounter due to the lack of humanness in this setting.
Therefore, we suggest that retail firms that rely significantly on their online presence give extensive thought and allocate appropriate resources to designing virtual assistants with anthropomorphic design cues.
If successfully implemented, social cues can create stronger perceptions of socialness and warmth, which can translate into emotional bonds that provide firms with a sustainable competitive advantage. Because we found error to significantly influence Perceived Competence, Trust, and Positive Consumer Responses, online practitioners should devote resources to developing virtual assistants that are capable of providing customers with a compelling error-free experience. We also showed that this approach is essential in order to efficiently target people who are open to new technological developments, as well as customers for whom human relations are not important in a purchase situation.
Lastly, given the significant differences in positive consumer responses when manipulating the gender of the chatbot, we advise practitioners to strongly consider the deployment of female virtual assistants for customer care interactions in a retail context. The study results showed that this is particularly important when targeting consumers who believe that machines are flawless. Thus, presenting them with a female virtual assistant will create stronger perceptions of warmth, generosity, and kindness, as well as higher levels for purchase intentions and service encounter satisfaction.

Limitations and Avenues for Future Research
Our research is subject to several limitations. First, the manipulation of the anthropomorphic design cues covered limited elements, such as identity cues (name), visual cues (avatar), and interactivity (delayed answers, typing icons). However, our current findings on the significant warmth perceptions produced by highly anthropomorphized agents should encourage further exploration of relevant social cues in an online retail context. Thus, we encourage researchers to design and deploy more advanced virtual assistants that leverage speech recognition, as well as improved visual and identity cues (e.g., a three-dimensional model). This might provide new insights into how people respond to virtual assistants mimicking emotions in real-time voice interactions and whether customers would apply social heuristics and mental models from human-to-human interaction. Besides the previously mentioned social cues and the suggested technical improvements, marketing practitioners could also study the impact of the age of virtual assistants on customers' perceptions and assess whether younger or older ones would lead to more positive consumer responses.
Second, we used a sample of respondents (N = 240) recruited via Amazon's Mechanical Turk. Even though the sample served to increase the external validity and general reliability of the research findings, the study results cannot be generalized to all customer segments in the online environment. Thus, in order to cross-validate the results, future research might incorporate more heterogeneous samples for better customer segmentation. Third, the study empirically established that anthropomorphic female virtual assistants lead to positive consumer responses in the context of sportswear, a relatively low-risk product category.
We encourage researchers to assess whether the effects of anthropomorphic design cues hold in other contexts, such as riskier products (e.g., mortgages) or hedonic products (e.g., luxury products). We believe that a deep understanding of the most relevant purchase situations in which anthropomorphized virtual assistants improve socialness perceptions and perceived competence, and subsequently lead to trust and positive consumer responses, would help companies to employ social cues wisely in the online environment.
Lastly, the mediation paths for both social presence and perceived competence were not statistically significant in either of the empirical studies. Therefore, we encourage researchers to test other potential mediators or moderators of the relationship between Anthropomorphic Design Cues and Trust. An interesting construct would be the consumer search strategy in order to assess which strategy will facilitate the development of an optimal online experience marked by enjoyment, namely the flow experience.

Conclusions
Sustainability is a complex phenomenon, which manifests itself in the direction of a healthy development of organizations, integrating social, economic, and environmental problems into their development strategies [74][75][76][77]. Artificial intelligence, along with other new digital technologies, fundamentally redefines the business environment, supporting sustainable development by creating value for customers, stakeholders, and for the environment. The rapid development in the last few years of Artificial Intelligence and of digital technologies has led to an increased pro-activity in adopting new strategies related to the relations between consumers and organizations in a sustainable way. The recent developments in the fields of Artificial Intelligence, Machine Learning, and Natural Language Processing are clearly blurring the line between humans and non-humans. We suggest that despite the complexity behind the customers' psychological and behavioral answers to virtual assistants, blurring this line even further could compensate for the lack of human contact in online environments, while also increasing customers' willingness to trust the technology and engage in positive consumer responses.
First and foremost, this study is, to the best of our knowledge, one of the first to consider the impact of erroneous chatbots on how consumers perceive the interactions with the virtual assistants. Thus, we make an important contribution to the marketing literature by providing consistent evidence of the significant negative effect of erroneous virtual assistants on several constructs considered in our conceptual model: perceived competence, trust, as well as positive consumer responses. In this article, we explored whether and under which conditions anthropomorphism facilitates trusting beliefs and ultimately, encourages positive consumer responses in an online service encounter. We also assessed the extent to which an interaction with an erroneous virtual assistant negatively influences customers' declarative psychological responses. The results of an experimental study confirmed that gender cues are essential in creating positive consumer responses. The study participants who interacted with the female virtual assistant reported significantly higher patronage intentions and willingness to disclose personal information. Moreover, even in the error condition, the female chatbot was much more frequently forgiven and registered significantly higher levels of socialness perceptions and service encounter satisfaction.
This article also demonstrated that the Chatbots' Error rate predicts differences in the trusting beliefs and positive consumer responses. Indeed, respondents are more willing to purchase a product after an error-free interaction. Also, they are more satisfied with the service encounter and are more inclined to engage in patronage behavior towards the virtual assistant and website after such an interaction.