From Typing to Talking: Unveiling AI’s Role in the Evolution of Voice Assistant Integration in Online Shopping

Abstract: This study develops a theoretical framework integrating the Technology Acceptance Model (TAM) and Uses and Gratifications Theory (UGT) to predict and understand voice shopping intentions, particularly through AI-driven voice assistants. This research delves into the dual aspects of AI voice shopping platforms: the functional attributes outlined by the TAM and personal gratifications highlighted by the UGT, such as enjoyment, performance expectancy, and perceived safety. It uncovers a favorable user attitude towards voice shopping, emphasizing the significant role of performance expectancy and perceived utility on behavioral intentions. Key insights include the critical importance of security and privacy for user trust and the acceptance of new AI technologies, and the necessity of a balanced approach that merges functional, emotional, and security aspects for successful AI integration in daily technology use. Contrary to expectations, this study reveals a weak relationship between social norms and perceived usefulness, suggesting a misalignment with societal expectations. This research enriches the understanding of voice shopping using virtual assistants, offering valuable insights into consumer behavior and AI technology acceptance. It highlights practical implications for AI research, the development of voice-based software, and AI-driven advertising strategies, emphasizing the communication of benefits and emotional resonance in voice-enabled AI assistants for consumer purchases.


Introduction
The development of artificial intelligence (AI) in voice assistant technology, epitomized by products like the Echo Dot with Alexa, has led the surge in AI adoption in various industries according to Amazon's Black Friday sales data [1]. These AI-driven voice assistants, capable of mimicking human interaction, have transcended basic functionalities such as scheduling and music playback. They are now increasingly incorporated into home automation systems and business operations, marking a fundamental shift in human-technology interaction. Major corporations like Domino's Pizza and Dunkin' Donuts are utilizing AI voice assistants for customer service, reflecting a broader trend in AI integration in the business sector [2,3].
The global market for AI voice assistants is predicted to expand dramatically, with projections indicating a growth to $27.16 billion by 2026, up from $10.7 billion in 2020 [4]. Additionally, the expected shipment of over 409.4 million AI-enabled smart speakers by 2025 further underscores this trend [5]. Despite this rapid growth, the use of AI for e-commerce transactions remains in its early stages. As of now, only 15% of smart speaker users in the U.S. frequently use AI for shopping [6].
While the expansion of AI voice assistants in personal and commercial domains is evident, there is a notable research gap in understanding their full impact on the shopping process. Current studies primarily focus on the initial stages of interaction, like product queries or adding items to a virtual cart, emphasizing the importance of trust and perceived human likeness [7]. However, these studies do not adequately explore the role of AI in the later stages of shopping, particularly in final purchase decisions. This suggests that while AI technology enhances browsing and preliminary shopping activities, it is less frequently used for completing transactions, indicating a reluctance at the final buying stage [8].
Existing research on voice shopping technologies primarily utilizes traditional models, such as the Technology Acceptance Model (TAM), to understand user acceptance [9]. However, this approach often falls short of capturing the complex interplay of psychological, experiential, and technological factors that influence consumer behavior in AI-driven voice shopping. As a result, there is a growing recognition of the need to integrate additional theoretical frameworks, like the Uses and Gratifications Theory (UGT), to provide a more holistic understanding of these dynamics [10].
In parallel, current studies tend to focus predominantly on the initial stages of consumer interaction with voice shopping, such as product queries or adding items to a cart [11]. This leaves a significant gap in research concerning the later stages of the shopping process, particularly in terms of making final purchase decisions [8].
Moreover, while technological aspects like functionality and ease of use are well explored [12], there is a notable gap in understanding the impact of personal gratifications and psychological factors on consumer acceptance and the continued use of voice shopping technologies. Factors such as enjoyment, performance expectancy, and social norms are yet to be fully examined in this context [13].
Additionally, the personalization aspect and social roles of voice assistants, particularly how these influence consumer perception and satisfaction, remain underexplored areas. The extent to which consumers perceive voice assistants as friends, secretaries, or simply tools, and how well these AI agents cater to individual preferences, are crucial aspects that have not been adequately addressed in relation to consumer satisfaction and purchase intentions in voice shopping [14]. These overlooked areas present a rich avenue for future research, promising insights into the evolving relationship between consumers and AI-driven voice shopping technologies.
The main aim of the paper is to develop and validate a theoretical framework that explains and predicts consumer acceptance and intentions to use AI voice shopping technologies, particularly through voice assistants like the Echo Dot. This objective is addressed by integrating traditional TAM variables (such as perceived usefulness, perceived ease of use, and perceived safety) with key gratifications from the UGT, including enjoyment, performance expectancy, and social norms. This paper seeks to offer a comprehensive understanding of consumer behavior in the context of voice shopping, taking into account both the functional attributes of the technology and the personal gratifications that drive user engagement.
A secondary objective of this paper is to address the limitations of existing TAM studies in the context of voice shopping by including additional theoretical perspectives such as UGT to provide a more nuanced understanding of user behavior. This involves exploring the psychological and experiential aspects that influence consumer acceptance of AI-driven voice shopping platforms. This paper aims to extend beyond the traditional focus on perceived risks associated with such technologies, emphasizing utility, enjoyment, performance expectancy, and perceived safety as crucial factors influencing voice shopping intentions.

Voice Shopping
Voice shopping, powered by AI-driven devices like smart speakers and smartphones, has revolutionized the retail industry, offering consumers a smooth, interactive shopping experience. This advancement, seen in both screen-based and non-screen devices like Amazon's Echo Dot and Apple's HomePod, represents a significant change in consumer shopping behavior. The academic world has taken note, exploring the advantages of voice shopping through AI, such as increased speed, user-friendliness, and potentially higher sales conversion rates compared to traditional methods [15].
The integration of AI in voice shopping is a dynamic field influenced by factors like personality traits, trust, and privacy concerns, which shape customer experiences [16]. Research reveals how trust and privacy concerns interact with personality traits to impact customers' shopping perceptions. Additionally, AI-powered voice assistants are poised to significantly alter consumer decision-making and the dynamics of consumer-firm interactions, presenting new marketing challenges and opportunities [17].
The expanding role of AI-based voice assistants across various sectors is also noteworthy, highlighting their evolution from support tools to key elements in enhancing experiences and aiding decision-making [18]. The growth in voice shopping is driven by the increasing availability of smart speakers and AI voice assistants, valued for their efficiency and convenience [7,19]. Psychological factors like social norms, perceived safety, and performance expectancy are critical in determining customer experience, impacting both purchasing decisions and the willingness to adopt AI for shopping [16,20].
Regarding information searching, reading reviews, and evaluating alternatives, barriers like perceived safety concerns emerge as significant. Concerns about personal data security and perceived risks associated with AI voice shopping platforms can negatively impact the intention to use these services. Addressing privacy and security effectively is thus critical to building trust and wider acceptance of voice shopping [19,20].
Another critical aspect is the personalization and social role of voice assistants. How consumers perceive these AI agents (as friends or secretaries) and how well they cater to individual preferences significantly influences consumer attitudes. Enhanced personalization and a relatable social role can greatly improve the shopping experience, leading to higher satisfaction and increased purchasing intentions [11].
While voice shopping is on the rise, it still faces challenges in achieving broader acceptance. Improving the technical sophistication of voice assistants, addressing privacy concerns, and enhancing user autonomy are key areas for focus. By addressing these challenges, consumer confidence can be boosted, leading to greater acceptance and integration of voice shopping in everyday commerce [7,21]. In studying voice shopping intentions, various theoretical frameworks like the UTAUT and TAM have been applied, focusing on factors such as ease of use, usefulness, social influence, and hedonic motivation. These factors are crucial in shaping consumer perceptions and the acceptance of AI voice shopping [22,23].
The study of voice shopping intentions lacks a universally accepted theoretical approach, indicating a diversity of theories without consensus on the most comprehensive one. This diversity highlights the complexity of the field and underscores the need for an integrated theoretical framework that encompasses the various facets of consumer behavior in voice shopping. A review of recent studies, as shown in Table 1, confirms this variety of theoretical approaches. Each study employs different theories, suggesting that no single theory completely explains voice shopping intention. This evidence points to the multifaceted nature of consumer behavior analysis in the context of voice shopping.

Authors | Theory Used | Conclusions
[16] | Goldberg's Big Five Factors, Theory of Reasoned Action (TRA-privacy) | Impact of personality, trust, privacy concerns on customer experience performance in voice shopping.
[20] | Unified Theory of Acceptance and Use of Technology (UTAUT) | Influence of anxiety on customer adoption of voice shopping within the UTAUT framework.
[24] | Voice Behavior Theory, Attention-Based View of the Firm | Influence of initiative selling tactics on the acceptance of subsidiary initiatives by MNC headquarters.
[25] | Pluralist Industrial Relation View, Unary Human Resource Management Theory | Inspection and integration of employee voice research in view of its nature and effects.
[26] | Ref. [27]'s Anthropomorphism Model | Impact of perceived human likeness of voice assistants on purchase intentions in voice shopping.
[28] | Extended Technology Acceptance Model (TAM), Theory of Diffusion of Innovations (DOI) | Understanding the intention to use mobile shopping applications and its influence on price sensitivity.
[29] | Technology Acceptance Model, Unified Theory of Acceptance and Use of Technology, Social Exchange Theory, Privacy Concerns | Exploring antecedents of voice-activated assistant usage intentions (perceived usefulness, perceived ease of use, social influence, trust and security concerns).
[30] | Uses and Gratifications Theory | This study focused on four gratification dimensions in voice shopping (life efficiency, entertainment, social presence, affordance) and found that these dimensions, along with satisfaction, impact purchase intention.
[31] | CASA (Computers Are Social Actors) | Interaction with advertisements increases brand and product recognition compared to non-interactive advertisements. Contextual relevance enhances brand and product recognition, but talker variability does not affect it.
[32] | Communication privacy management (CPM) theory | This study concluded that the anthropomorphism of virtual voice assistants significantly influences perceived safety, which is a critical factor affecting consumer engagement in voice shopping.

Technology Acceptance
Integrating an artificial intelligence (AI) perspective, the combined use of the TAM and the UGT offers a robust framework for analyzing voice shopping through AI-enabled platforms. The TAM primarily focuses on the practical aspects of technology use, such as perceived usefulness, ease of use, or safety. This model helps in understanding how the functional attributes of a technology influence its acceptance and usage. On the other hand, the UGT delves into the psychological and experiential motivations behind technology use. The UGT offers insights into aspects like enjoyment and performance expectancy, which are crucial in understanding why consumers choose to engage with technology beyond its functional capabilities. In the context of AI-driven voice shopping, while the TAM provides a foundation for assessing functional attributes like ease of use, UGT complements this by shedding light on the psychological engagement and experiential aspects that are unique to these AI interfaces. Together, the TAM and the UGT offer a comprehensive framework for understanding the multifaceted nature of technology acceptance and usage. This dual-theoretical approach allows for a comprehensive analysis that covers both the practical utility and the experiential elements of AI in voice shopping, leading to a more nuanced understanding of consumer behavior in this innovative domain. The selection of the TAM and the UGT as the foundational theories for studying voice shopping intention is particularly apt given their relevance and thoroughness in capturing the complexities of AI-influenced shopping behaviors.

TAM
The TAM, conceptualized in [33], is rooted in behavioral psychology, particularly the Theory of Reasoned Action (TRA) proposed in [34]. Initially developed to evaluate computer-based information systems, TAM has demonstrated remarkable flexibility and applicability across various technological areas, including mobile commerce and AI-driven voice assistants. This adaptability underscores TAM's enduring significance in contemporary research, especially for investigating the impacts of emerging technologies like AI in voice shopping [12].
By incorporating the principles of the TAM and the UGT, researchers can effectively dissect and understand the multifaceted nature of consumer interaction with AI in voice shopping platforms. This approach enables a deeper exploration of how AI technology influences consumer decision-making and engagement, providing valuable insights for further development and the optimization of AI voice shopping experiences.
When adapting the TAM to the realm of AI-driven voice shopping, its emphasis on two central factors, perceived ease of use and perceived usefulness, becomes particularly significant. In the context of AI, where user interactions are often facilitated by sophisticated algorithms and voice recognition technologies, these elements are crucial. The simplicity of the AI interface and the practical benefits it provides, such as time-saving and convenience, are key determinants of user acceptance in voice shopping [35]. The focused constructs of the TAM, distinct from broader models that encompass a wide range of variables, offer precise predictors of technology acceptance, making it a fitting model for investigating voice shopping powered by AI. The model's adaptability and its application in AI-enhanced voice shopping reflect its ability to yield valuable insights into user behavior and technology acceptance in the dynamic field of digital commerce.
However, the application of the TAM in areas like e-commerce and online banking has not been without criticism, particularly regarding its scope in explaining the acceptance of innovative technologies like AI in voice shopping. Critics have noted that the TAM may neglect external contextual factors such as social influences, individual differences, and situational variables, all of which can significantly affect technology acceptance behaviors [36]. This limitation calls for additional theoretical frameworks to achieve a deeper understanding of user behavior, particularly in the face of rapidly evolving AI technologies, like voice assistants and voice shopping platforms [37]. To address this gap, integrating UGT with TAM can offer a more comprehensive view of the factors influencing the acceptance of AI-driven voice shopping.

Uses and Gratifications Theory
Utilizing the Uses and Gratifications Theory (UGT) to examine AI-driven voice shopping platforms offers a unique perspective, emphasizing the psychological and emotional aspects of media use. The UGT explores how entertainment, information acquisition, and personal identity shape the appeal of these platforms, revealing why consumers might favor AI-enhanced voice shopping for personal gratification reasons beyond mere functional capabilities [10,11].
The significance of the UGT grows in the digital age, where it assesses not just technology's functional attributes but also personal preferences, psychological needs, and social influences. This approach helps to understand the deeper motivations behind choosing AI-driven technologies. Specifically, in voice shopping, the UGT, combined with the TAM, sheds light on consumer behavior, showing how enjoyment, perceived usefulness, and social norms influence attitudes toward AI commerce. Performance expectancy and social influence also emerge as key factors in online shopping intentions, highlighting the importance of understanding both functional and experiential aspects for a comprehensive view of consumer engagement with AI voice shopping [38,39].
Studies in other AI contexts, like social media and chatbots, reinforce this perspective, stressing the importance of personal enjoyment, social connection, and privacy concerns in technology use. These findings suggest that while technological attributes are essential, users' subjective experiences and social contexts play an important role in their adoption and interaction with digital AI technologies [40,41].
Overall, incorporating the UGT alongside technological frameworks like the TAM provides a more complete understanding of user behavior in digital technologies, especially in AI-infused applications like voice shopping, by valuing both the functional attributes and the user-centric psychological and social elements [42].
The application of the UGT in studying purchase intentions is crucial, focusing on variables beyond just the technological features and connecting more closely with individual attitudes. The key gratifications identified include enjoyment, performance expectancy, and social norms. Research confirms the significant influence of enjoyment on consumer decisions [43] and the role of performance expectancy in predicting the continued use of online platforms, directly impacting purchasing behavior [13].
Additionally, the influence of social norms on consumer behavior is a major factor. Studies highlight the importance of social influences in shaping how consumers interact with and perceive AI platforms, underscoring the critical role of UGT gratifications in shaping consumer purchase behaviors in digital contexts [44]. By examining these factors, the UGT offers a comprehensive view of the complex nature of consumer behavior in the digital era, recognizing that individual attitudes and social context are as influential as technological attributes in determining purchase intentions.
In summary, the blend of the TAM and the UGT in the context of AI-driven voice shopping allows for an exploration of not just functional attributes like ease of use and usefulness, but also the personal gratifications that drive users towards this technology, emphasizing aspects such as convenience and personalization.

Proposed Model and Hypothesis Development
Our proposed model, grounded in the TAM and the UGT, aims to comprehensively examine consumer behavior in voice shopping, particularly through voice assistants like the Echo Dot. This model integrates traditional TAM variables (perceived usefulness, perceived ease of use, and perceived safety) with key gratifications from the UGT, namely enjoyment, performance expectancy, and social norms, to provide a thorough understanding of consumer behavior in the context of voice shopping. Below we present the variables of the study:

1. Perceived Usefulness and Perceived Ease of Use: In the realm of voice shopping, the concept of perceived usefulness refers to the degree to which a user believes that using a system will enhance their shopping experience [9]. Supporting this, refs. [45,46] found that perceived usefulness, alongside perceived enjoyment and innovativeness, significantly impacts millennials' behavioral intention to use voice-activated shopping technologies. In parallel, perceived ease of use, another fundamental aspect of the TAM, signifies the extent to which a person believes that using a system will be effort-free [9]. This factor is essential in voice shopping, where systems that are user-friendly are more likely to gain acceptance among consumers.

2. Perceived Safety: Perceived safety in online shopping, especially in the context of voice commerce, includes the perceived risk of information security and privacy. Consumers' perception of safety and risk significantly influences their willingness to engage in online shopping. For instance, perceived risk was found to negatively influence the intention to use smart speakers for online shopping, highlighting the importance of perceived safety in the decision-making process [47].

3. Enjoyment: Enjoyment is crucial for the acceptance of online games, significantly affecting users' behavioral intentions [48]. These findings suggest that the enjoyment aspect of the UGT is directly linked to perceived usefulness, as it enhances user satisfaction and positive attitudes towards technology use. In the context of voice shopping, enjoyment refers to the pleasure and satisfaction users derive from the experience. This form of gratification is crucial as it positively influences attitudes towards the acceptance of voice shopping technologies. Enjoyment enhances the user experience, making it more likely for individuals to perceive voice shopping as useful and beneficial [43].

4. Performance Expectancy: Performance expectancy in voice shopping pertains to the anticipated improvement in shopping efficiency and effectiveness. This gratification, coming from the UGT, is a significant predictor of user satisfaction and the subsequent intention to continue using voice shopping platforms. It underscores the practical benefits and conveniences offered by voice shopping, thereby reinforcing its perceived usefulness [13].

5. Social Norms: Social norms refer to the influence of societal expectations and peer behavior on individual actions. In the context of social media, subjective norms were found to be influential in shaping users' attitudes towards content creation, although they did not directly affect behavior [44]. This indicates that while social norms can guide user perceptions of usefulness, their direct impact on user behavior might be limited.
In focusing on artificial intelligence, the influence of a positive mindset is significant in shaping perceptions of an AI technology's usefulness, notably in applications like voice shopping. This positive perception can markedly change how users assess the utility of AI technologies [49]. This aligns with the importance of enjoyment, a key factor in the context of AI that is vital for user engagement [50]. Within the UGT framework, enjoyment is more than a secondary element: it becomes a primary influencer of user satisfaction and gratification in interactions with AI technologies.
Our model, grounded in both the UGT and the TAM, identifies enjoyment as a key predictor of consumer acceptance in AI-driven voice shopping technologies. This focus is influenced by the research presented in [14,51], which highlights the importance of enjoyment in shaping user experiences with new AI technologies. By adopting the definition of enjoyment presented in [51], we aim to apply a consistent and comprehensive approach to analyzing this factor within our UGT-based framework.
Therefore, our hypothesis asserts that in the sphere of AI-driven voice shopping, enjoyment significantly impacts user satisfaction and gratification, which in turn affects the acceptance and continued usage of these technologies. This aligns with the emphasis the UGT places on personal fulfillment and pleasure as vital drivers of technology engagement, especially in the rapidly evolving field of AI.

H1. Enjoyment exerts a positive effect on the perceived usefulness of voiced virtual assistants.
Performance expectancy refers to users' belief in an AI system's ability to enhance task performance, significantly influencing the satisfaction and gratification derived from the technology. This concept revolves around how effectively AI meets users' needs and desires, aligning with the perspective presented in [9].
Research on the UGT framework, when applied to AI, suggests that the higher the users' expectations of the AI system's efficiency, the more satisfaction they derive from it [37]. In the context of AI, performance expectancy under the UGT implies that when users anticipate that an AI technology will efficiently meet their personal or professional objectives, their fulfillment and satisfaction are boosted, leading to an increased and sustained use of the AI system [13]. Therefore, within the UGT framework for AI technologies, performance expectancy is intricately linked to meeting users' specific needs and desires. Demonstrating effective performance is key to ensuring higher user gratification and continued interaction with AI technologies.

H2. Performance expectancy has a positive effect on the perceived usefulness of voiced virtual assistants.
Social norms, understood as the collective expectations and behaviors within one's social circle, play a critical role in the UGT, regarding an individual's perception of technology's benefits. These norms are seen as societal standards or guidelines that dictate appropriate behavior in specific social contexts [52]. In the UGT framework, the impact of social norms on technology acceptance is attributed to the influence and pressure to conform to these prevailing norms. Research within the UGT context highlights the influence of social norms in shaping users' perceptions of a technology's benefits in the TAM [37]. When individuals sense that their social or professional network expects them to adopt a certain technology, this perception enhances their belief in the technology's benefits. Within the UGT, social norms serve as a social driver, encouraging individuals to see a technology as more advantageous if it aligns with peer or colleague expectations. This reflects the emphasis the UGT places on the social and psychological motivations behind media and technology usage.
H3. Social norms have a positive effect on the perceived usefulness of voiced virtual assistants.
Perceived usefulness is vital in the context of technology acceptance, particularly in the case of VAs. Perceived usefulness refers to an individual's belief about the usefulness of a technology product or service, and it has been shown to be a significant predictor of acceptance and usage behavior [9]. Studies have found that consumers are more likely to adopt and use technology products and services that they perceive to be useful [12]. In the case of VAs, consumers may view them as useful for tasks such as setting reminders, playing music, or controlling other smart home devices [53]. Additionally, consumers may perceive VAs to be useful as a means of accessing information or completing tasks quickly and efficiently [12].
Perceived usefulness exerts a more direct and robust influence on technology acceptance intention compared to perceived ease of use [54]. Perceived usefulness takes precedence as a primary factor in an individual's intention to adopt new technology, while perceived ease of use plays a secondary role [9]. Perceived usefulness has positive effects on users' attitudes and behavioral intention to accept new learning systems [55]. The study presented in [56] reached the conclusion that the perceived usefulness of information technology has a positive impact on users' intention to use that particular technology. Additionally, the work presented in [57] discovered a positive association between perceived usefulness and users' intention to adopt a health information system. The work presented in [58] showed that the perceived usefulness of a new technology is positively associated with the intention to use that technology. Finally, the work presented in [59] showed that perceived utility positively influences user attitudes and acceptance of smart devices.
Perceived ease of use refers to how much an individual thinks that employing a particular system would entail little effort or difficulty [9]. Perceived ease of use has long been considered an important factor in the acceptance of technology. The original TAM indicated that perceived ease of use works through perceived usefulness and has an indirect effect on behavioral intention to use [9]. In his original TAM, Davis proposed relationships between what we consider to be traditional variables. We adopt his approach and propose the following hypotheses:

H4. The perceived ease of use of voiced virtual assistants is positively related to voice shopping intention.

H5. The perceived usefulness of voiced virtual assistants is positively related to voice shopping intention.

H6. The perceived ease of use of voiced virtual assistants is positively related to the perceived usefulness of voiced virtual assistants.
In recent years, the definition of perceived safety has changed due to changing customer behavior and the shift to online transactions. It was originally called perceived risk and was limited to concepts such as fraud or product quality, but, currently, it is described as the sense of uncertainty about potential adverse outcomes associated with the utilization of a product or service [60]. These authors identified various types of risks and argued that it is important to include the perceived risk variable in the TAM. Customers relate to each other and assess risk when evaluating products/services for purchase, which can cause anxiety and discomfort. Various studies have concluded that perceived safety is an important factor in VA acceptance [61,62]. Studies on acceptance have demonstrated that perceived risk stands out as a significant factor causing people's hesitancy towards utilizing this technology [16].
H7. The perceived safety of VAs is positively related to voice shopping intention.
The proposed model is shown in Figure 1. This model aims to offer an extensive understanding of how these factors collectively influence the acceptance and intention of using voice assistants for shopping. By assessing these variables, the model intends to provide valuable insights into the dynamics of consumer behavior in the evolving domain of voice commerce.
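
Read together, H1 to H7 can be summarized as two linear structural relations. The following is only a sketch under stated assumptions: the abbreviations (ENJ for enjoyment, PE for performance expectancy, SN for social norms, PEOU for perceived ease of use, PU for perceived usefulness, PS for perceived safety, INT for voice shopping intention), the coefficients indexed by hypothesis number, and the error terms are notation introduced here for illustration. The linear form is likewise an assumption, since this section does not state how the model is estimated.

```latex
\begin{align}
\mathrm{PU}  &= \beta_{1}\,\mathrm{ENJ} + \beta_{2}\,\mathrm{PE} + \beta_{3}\,\mathrm{SN} + \beta_{6}\,\mathrm{PEOU} + \zeta_{1} \\
\mathrm{INT} &= \beta_{4}\,\mathrm{PEOU} + \beta_{5}\,\mathrm{PU} + \beta_{7}\,\mathrm{PS} + \zeta_{2}
\end{align}
```

Under this notation, each hypothesis H$k$ corresponds to the expectation that $\beta_{k} > 0$.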


Methodology
The data collection process took place in Spain in June 2022 and involved conducting an online survey through a market research company. Participants were selected based on their familiarity with VA technology in their daily lives and with the concept of voice shopping. All participants had previous experience with VAs and were aware of their existence. They accessed a dedicated web platform designed for this study, where they were required to answer questions regarding their perceptions of VAs and voice shopping. We set an age limit of 50 because the pilot study results showed that users older than 50 are generally unfamiliar with the concept of VAs. The final survey sample consisted of 906 users of VAs, 49% male and 51% female, aged 18 to 50 years.

To ensure the survey's content validity in the context of VAs, an extensive review of the literature was conducted to tailor the variables within the model appropriately. The survey then underwent a rigorous validation process involving multiple experts. All the variables in the survey were evaluated using a seven-point Likert scale, with 1 indicating strong disagreement and 7 indicating strong agreement. Before the main survey, a pilot study was carried out to confirm that respondents comprehended the survey items effectively.

Enjoyment, performance expectancy, and social norms were adapted from the scales presented in [63], and each consisted of three items. Perceived safety was adapted from the scale presented in [64] and comprised two items. Perceived usefulness was adapted from the scales presented in [65] and consisted of three items. Perceived ease of use was adapted from the scales presented in [66] and comprised two items. Lastly, the scale for measuring voice shopping intention was derived from the work presented in [67] and consisted of three items.

Results

Testing the Measurement Model
To investigate the proposed hypotheses, we utilized Structural Equation Modeling (SEM), an advanced statistical method that combines multiple regression with confirmatory factor analysis to simultaneously estimate a series of interrelated dependency relationships. In recent years, there has been a growing number of studies in the field of information systems employing SEM to test proposed models [68].
To ensure the reliability, dimensionality, and validity of our measurement scales, we conducted both exploratory and confirmatory factor analyses using SPSS (version 24) and SmartPLS 4. Internal reliability was assessed using Cronbach's alpha, which exceeded the required minimum value of 0.70 for all variables, indicating satisfactory levels of reliability [69]. Convergent validity was confirmed by checking item loadings, all of which exceeded the recommended threshold of 0.70 [70].
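As an illustration of the internal reliability check described above, the following minimal sketch computes Cronbach's alpha from item scores using the standard formula. The function name `cronbach_alpha` and the response data are hypothetical and are not taken from the study's dataset; the study itself reports using SPSS and SmartPLS for this step.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of item columns, each a list with one score per respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Summed scale score for each respondent
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the sample variances of the individual items
    item_variance_sum = sum(variance(column) for column in items)
    total_variance = variance(totals)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical seven-point Likert responses: three items, six respondents
item1 = [7, 6, 5, 3, 2, 6]
item2 = [6, 6, 5, 2, 3, 7]
item3 = [7, 5, 4, 3, 2, 6]

alpha = cronbach_alpha([item1, item2, item3])
reliable = alpha > 0.70  # threshold cited in the text
```

A value above 0.70 would meet the minimum cited in the text; values very close to 1 can also signal redundant items.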
The composite reliability values for all variables exceeded 0.70, signifying appropriate internal consistency [71]. Additionally, the average variance extracted (AVE) was equal to or greater than 0.50 for every variable, indicating that each variable captured a sufficient amount of variance relative to measurement error [72]. In summary, our measurement scales met the criteria for reliability. The results of these analyses are presented in Table 2.

We also assessed discriminant validity to confirm that the measurement scales used in our study represent separate and distinct concepts. To do so, we compared the AVE values with the squared correlation coefficients found in the respective rows and columns (both presented in Table 3), following the criteria recommended in [70,72]. This comparison provided robust evidence of discriminant validity.

Furthermore, we examined the HTMT (Heterotrait-Monotrait) values, as advised in [69]. HTMT values below 0.95 are typically deemed appropriate, particularly for concepts that exhibit a high degree of similarity. As shown in Table 4, all HTMT values remained below the 0.95 threshold, providing further confirmation of the validity of our measurement model.
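The reliability and discriminant validity criteria above can be sketched in a few lines. The example below uses the standard formulas for AVE and composite reliability based on standardized loadings, together with a comparison of AVE against a squared correlation. All function names, loadings, and the correlation value are hypothetical; they are not the values reported in Tables 2 and 3.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """Composite reliability from standardized loadings."""
    loading_sum = sum(loadings)
    # Error variance of each item is 1 - loading^2 for standardized loadings
    error_sum = sum(1 - l * l for l in loadings)
    return loading_sum ** 2 / (loading_sum ** 2 + error_sum)

def discriminant_ok(ave_a, ave_b, correlation_ab):
    """Both AVE values should exceed the squared inter-construct correlation."""
    return min(ave_a, ave_b) > correlation_ab ** 2

# Hypothetical standardized loadings for two constructs
usefulness_loadings = [0.82, 0.85, 0.79]
safety_loadings = [0.88, 0.90]

ave_pu = average_variance_extracted(usefulness_loadings)
ave_ps = average_variance_extracted(safety_loadings)
cr_pu = composite_reliability(usefulness_loadings)

# Hypothetical correlation between the two constructs
valid = discriminant_ok(ave_pu, ave_ps, correlation_ab=0.55)
```

With these illustrative inputs, both constructs pass the 0.50 AVE threshold and the composite reliability passes 0.70, which mirrors the checks the text describes.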

Testing the Structural Model
We evaluated the structural model's path analysis using a bootstrapping method with 5000 re-samples. The current structural model explained 52.6% of the variance in voice shopping intention. The results indicate a satisfactory model, as the percentage of variance explained for every dependent variable in the structural model was higher than 10% [73].
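To make the bootstrapping step concrete, the sketch below resamples respondents with replacement 5000 times and derives a standard error and t-value for a single path. It is a simplified stand-in: it uses an ordinary least-squares slope between two observed variables rather than the PLS estimation performed by SmartPLS, and the data, seed, and function names are hypothetical.

```python
import random
from statistics import stdev

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def bootstrap_t(x, y, n_resamples=5000, seed=0):
    """Return (estimate, bootstrap standard error, t-value) for the slope."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_resamples):
        # Draw n respondents with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # skip a degenerate resample with no variation in x
        estimates.append(slope(xs, ys))
    estimate = slope(x, y)
    std_error = stdev(estimates)
    return estimate, std_error, estimate / std_error

# Hypothetical data: perceived usefulness scores and a noisy intention score
data_rng = random.Random(42)
usefulness = [data_rng.uniform(1, 7) for _ in range(100)]
intention = [0.5 * v + data_rng.gauss(0, 1) for v in usefulness]

estimate, std_error, t_value = bootstrap_t(usefulness, intention)
```

The t-value is the original-sample estimate divided by the standard deviation of the bootstrap estimates, which is the usual basis for the significance tests reported for path coefficients.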
Most of the hypotheses received support in the study (as indicated in Table 5). The research findings demonstrate a significant and positive relationship between enjoyment and perceived usefulness (H1) as well as between performance expectancy and perceived usefulness (H2). These results align with the propositions made by the studies in [37,49]. However, the relationship between social norm and perceived usefulness (H3) did not yield significant results. Furthermore, the association between perceived ease of use and perceived usefulness was found to be significantly positive (H4). Regarding the hypotheses related to behavioral intention, all of them exhibited statistical significance. H5, H6, and H7, which examined the relationships involving perceived usefulness (H5), perceived ease of use (H6), and perceived safety (H7), were all supported.

Table 6 captures the indirect effect of gratifications from the model on the variable 'voice shopping intention'. We can verify the indirect effect between enjoyment and voice shopping intention through perceived usefulness, with a significant t-value of 4.928 and a p-value of 0.000. Similarly, there is an indirect effect between performance expectancy and voice shopping intention, evidenced by a t-value of 7.915 and a p-value of 0.000. An indirect relationship occurs when one or more intervening variables are present, serving to transmit the influence of variable X on variable Y, as explained by the work presented in [74]. This is particularly relevant in mediation analysis, where researchers explore whether changes in an independent variable lead to changes in a mediating variable, which in turn lead to changes in the dependent variable.
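The indirect relationship described above can be written in the standard product-of-coefficients form. This is a generic formulation rather than a confirmed description of the exact computation used in the study. Here X stands for a gratification (enjoyment or performance expectancy), M for perceived usefulness, and Y for voice shopping intention; the symbols a, b, and c' are notation introduced for this illustration.

```latex
\begin{aligned}
M &= a\,X + \varepsilon_M, \\
Y &= c'\,X + b\,M + \varepsilon_Y, \\
\text{indirect effect} &= a \cdot b, \qquad
t = \frac{\hat{a}\,\hat{b}}{\mathrm{SE}_{\text{boot}}\!\left(\hat{a}\,\hat{b}\right)}.
\end{aligned}
```

Under this formulation, the t-value for an indirect effect is the estimated product divided by its bootstrap standard error, and a direct path c' of zero corresponds to an effect transmitted entirely through the mediator.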

Conclusions and Discussion
Our study had a central objective: to develop a theoretical framework capable of explaining and potentially predicting the acceptance of voice shopping intentions. Specifically, we focused on extending the TAM with various gratifications obtained from the UGT. Previous research has separately analyzed the acceptance of virtual assistants and voice shopping. However, the simultaneous examination of both topics has not yielded significant conclusions. In contrast to other published studies that primarily focus on perceived risks associated with such technologies, our work emphasizes the utility of this new technology in voice commerce. Our research aimed to elucidate the role of utility in shaping users' perceptions during voice shopping interactions. The findings of our study revealed a significant impact of performance expectancy and perceived utility on behavioral intention, indicating a favorable attitude among users toward voice shopping. The utilization of gratifications allows for a more comprehensive explanation of the intention to use VAs for voice shopping when compared to the use of simpler models of technology acceptance such as the TAM and the UTAUT. Our study demonstrates that the presence of enjoyment, performance expectancy, and perceived safety, as gratifications, contributes to explaining the acceptance of voice shopping with VAs as a phenomenon.

As anticipated, the results provided support for most of the hypotheses pertaining to gratifications. The outcomes presented in Table 5 and Figure 2 underscore the significance of enjoyment and performance expectancy concerning perceived usefulness (H1 and H2). The presence of features related to enjoyment and performance expectancy contributes to an explanation of the perception of usefulness in voice shopping with VAs. These findings align with the work presented in [64], where the authors also highlighted the importance of emotions and expectations when using VAs, emphasizing the role of emotions in shaping user experiences and showing how user expectations and the gratifications sought from VAs influence overall satisfaction and continued usage.

However, Hypothesis 3, which concerns the relationship between social norms and perceived usefulness, does not follow the same pattern. This phenomenon can be understood through the lens of the Diffusion of Innovation Theory [75], which suggests that social norms can impact technology acceptance decisions when there is a perceived alignment between the innovation and existing social norms. If users believe that using VAs for voice shopping is not in harmony with established social norms, or if these norms do not support this technology, then the influence of social norms on technology adoption may be weakened. Furthermore, the Diffusion of Innovation Theory proposes that the adoption of an innovation is influenced by how much this innovation aligns with each user's existing experiences and values. The absence of a significant relationship between social norms and perceived usefulness could be due to a perceived misalignment between these two factors. In simpler terms, this insignificant relationship might be attributed to a perceived lack of compatibility between societal expectations and the practical utility of using VAs for voice shopping. This could be because users already have prior experience with many technologies, and they may no longer expect to be socially accepted solely based on their use of technology.

While the results regarding the relationships between PEOU and PU (H4) and PEOU and VSHOP (H6) are statistically significant, it is essential to highlight the limited effect that the PEOU variable has on these relationships. Some studies have suggested that the influence of perceived ease of use on technology acceptance may have weakened over time, especially in the context of adopting VAs [76][77][78][79]. Pre-implementation testing demonstrated that perceived ease of use has a direct and significant impact on an individual's intention to use a system before they have any direct experience with it. However, as individuals gain experience with the system over time, the influence of perceived ease of use on their intention to use gradually diminishes and eventually becomes negligible. Although the direct influence of perceived ease of use on behavioral intention to use is weak, it does exert an indirect impact on behavioral intention through perceived usefulness, as concluded by the work presented in [78].
The results of H5 and H7 highlight the significance of perceived usefulness and perceived safety, respectively, when it comes to adopting new technology. Therefore, a higher level of perceived usefulness (preceded by enjoyment and performance expectancy) is essential for facilitating the acceptance of voice shopping. Similarly, a higher degree of perceived safety is also necessary for the acceptance of voice shopping with VAs. These findings align with recent and prior literature [32], suggesting that one of the primary factors influencing the acceptance of this technology is the confidence users have in the device's security.

Implications for Theory
We believe this study provides a valuable contribution to the relatively limited existing literature on voice shopping with virtual assistants. There is a scarcity of studies that specifically apply the TAM and the UGT to VAs, and the available research may be somewhat biased. This study has the potential to enrich the literature by applying the TAM to VAs in the underexplored domain of voice shopping. Given the unique characteristics of VAs, many users may not fully comprehend their utility, which can lead to inconclusive findings. Therefore, we found it essential to investigate the factors that could enhance the behavioral intention of using VAs for voice shopping. Furthermore, this study enhances the literature on voice shopping with virtual assistants by identifying two key gratifications in the acceptance of this technology: perceived usefulness and perceived safety. The influence of the former can be partly attributed to the emotions that arise from using these technologies and the expectations associated with them. The role of the latter finds support in prior literature, which emphasizes that users need to perceive a technology as secure before embracing it [32].
Unlike other studies that have focused on specific VAs, this study has deliberately avoided such bias by not concentrating on any particular virtual assistant. Prior publications, as exemplified by [80], often narrow their focus to a particular existing virtual assistant, and their results could be influenced by users' prior judgments and knowledge. Despite the growing interest in VAs in the literature, there is a limited understanding of the role of voice shopping when using these devices. Currently, there is no consensus on the factors influencing the acceptance of this phenomenon, especially those associated with the characteristics of VAs. This study served two primary purposes. Firstly, it allowed us to contribute to the literature on virtual assistants by proposing a different perspective on their acceptance. Secondly, it shed some light on the phenomenon of voice shopping and the factors that may be necessary for its acceptance.

Implications for Practice
This study offers substantial practical implications in fields deeply intertwined with artificial intelligence, including AI research, the development of AI-enhanced voice-based software, and strategic approaches in AI-driven advertising. Consistent with the findings of the original Technology Acceptance Model (TAM) presented in [9], this research underscores that developers of AI voice assistants (VAs) have effectively cultivated a sense of usefulness in these services, thereby bolstering user interaction. It illuminates the necessity of focusing not only on the functional capabilities of AI devices but also on integrating emotional elements like user enjoyment, which is increasingly pivotal in AI interactions. This research further accentuates the importance of security and privacy in the realm of emerging AI technologies. For AI systems, particularly those involving voice commands and shopping, ensuring secure transactions and safeguarding personal data is a critical factor for gaining user trust and acceptance.
For entities operating in the AI sector, managing and accurately conveying expectations is crucial to prevent misconceptions among users, given the significant impact of these expectations on the adoption of new AI technologies. A comprehensive approach that blends functional, emotional, and security aspects is needed for the successful adoption and integration of AI advancements in everyday technology applications. Considering the myriad interactions users have with various platforms daily, it is essential to communicate the benefits of utilizing AI voice assistants for voice-enabled purchases. Although the use of VAs for voice purchases is still emerging, this technology offers consumers the ability to make purchases through voice commands, marking a shift in e-commerce practices. The results of this study suggest that the emotional bond formed between the user and the AI assistant, facilitated by human-like voice features, influences key aspects such as enjoyment during the shopping experience, perceived safety, and the perceived usefulness of voice-assisted purchases. The ongoing challenge lies in finding the optimal voice for AI assistants, one that resonates with users and conveys feelings of safety, enjoyment, utility, and ease, thereby creating a positive emotional journey throughout the voice shopping experience.

Limitations and Future Research
Like many studies, this research has its limitations. This study focused on general information and purchase searches, and it would be valuable to test the empirical model for specific product or information searches. It is reasonable to assume that the significance of localization may differ depending on the type of search task conducted, such as hedonic versus utilitarian product searches. This study did not consider the various VA devices available in the market, each of which has a unique voice. Future research could involve examining VAs to determine which has better acceptance for voice shopping and which voice has the most and least effects. In addition, future researchers can enhance their understanding of the influence of VAs on consumer engagement by incorporating additional anthropomorphic factors into the model. Lastly, we see the in-depth study of the emotions transmitted by these devices as essential, as suggested by the study presented in [7], which makes evident the need to understand the role of emotions in order to understand the user experience. Analyzing what emotions are transmitted in a possible shopping scenario could shed light on what the necessary emotional conditions should be to facilitate and encourage voice shopping.

Table 1. Voice shopping acceptance studies.

Table 2. Measurement model results.