1. Introduction
Live streaming commerce has rapidly become one of the most popular and profitable forms of e-commerce [1]. Recent advances in natural language generation technologies, such as ChatGPT and Wenxin Yiyan, have accelerated the integration of deep learning into avatar-based interfaces, enabling virtual anchors to operate continuously while reducing barriers to participation and time constraints. For example, virtual influencer Lil Miquela has built a strong Instagram presence and endorsed major fashion brands such as Prada and Chanel, whereas Chinese virtual idol Luo Tianyi engages fans in real time through live streaming supported by social media, motion capture, and artificial intelligence [2]. Like virtual idols, virtual anchors rely on digital personas and interactive techniques, including voice interaction and personalized recommendations [3]. Unlike virtual influencers, however, who primarily attract consumers through their personal image [4], virtual anchors can create interactive and immersive experiences that translate brand exposure into direct sales [5]. In live-streaming settings, they facilitate consumer decision-making through real-time, dynamic interaction, thereby increasing sales conversion, user engagement, and loyalty [6,7].
Conventional research on sensory attribute-product functionality consistency has mainly focused on aligning external distal sensory cues (e.g., visual shape, color, and sound) with actual product functions. In live-streaming contexts, consumers primarily rely on distal sensory information such as sight and sound, while being unable to directly access proximal sensory perceptions related to taste, smell, and touch through a screen [
8]. This lack of direct sensory engagement often leads consumers to question the authenticity of product information. To compensate for this limitation, livestream anchors can use sensory language to convey vicarious sensory perceptions. Sensory language refers to language that describes product attributes in ways that appeal to consumers’ senses. Existing research suggests that anchors can enhance consumers’ vicarious sensory perceptions by providing vivid and precise descriptions of taste, touch, and smell, thereby partially compensating for the difficulty of conveying such sensory experiences directly [
9]. Human livestream anchors, for example, can offer authentic and detailed sensory descriptions that help consumers indirectly access otherwise unavailable sensory information [
10]. However, the extent to which virtual anchors can achieve similar effects remains unclear, given their technological limitations in conveying sensory information [
11]. Although sensory marketing research has made considerable progress, especially in relation to vision and hearing [
12], much less is known about whether and how AI-powered virtual anchors can compensate for the absence of proximal sensory perceptions through linguistic descriptions. In particular, it remains unclear whether consumers can form vicarious proximal sensory perceptions (e.g., taste, smell, or touch) through AI virtual anchors and through what mechanisms such effects may occur. To address this gap, the present study examines how virtual anchors compensate for the lack of direct proximal sensory access through proximal sensory language descriptions and evaluates the effectiveness of this mechanism in live-streaming settings.
In product descriptions, virtual anchors often rely on artificial intelligence (AI) technologies, such as large language models, to generate content. Unlike human influencers, who may draw upon their own sensory perceptions when providing language descriptions, AI-generated language may exhibit specific biases. These include over-generalization of sensory perceptions, unconscious reactions to specific items, or difficulty accurately capturing subtle product differences [
13]. These biases may cause generated descriptions to deviate from a product’s actual sensory characteristics or from consumers’ expected experiences, thereby undermining perceived authenticity. Accordingly, when virtual anchors convey vicarious sensory perceptions of products, proximal sensory description fit—that is, the extent to which descriptions match actual product attributes—becomes especially important. Because consumers cannot directly verify the accuracy of such descriptions, any bias or distortion in sensory information may increase skepticism about their authenticity. Prior research suggests that consumers’ perceptions of the authenticity and credibility of virtual anchors’ product descriptions significantly shape purchase intention [
14]. Therefore, this study examines how, in the absence of direct physical interaction, virtual anchors can strengthen consumers’ perceptual identification and emotional connection with products by improving AI-generated proximal sensory description fit. Ensuring the accuracy and consistency of such descriptions is essential for reducing authenticity concerns caused by descriptive bias and, in turn, promoting purchase intention.
During live streaming, clear product displays (e.g., appearance, details, and parameters) help convey key product information and support consumers’ product perceptions [15]. As distal sensory cues, these visual displays can reduce uncertainty surrounding proximal sensory descriptions and provide consumers with a basis for validating the anchor’s claims. However, in virtual anchor live streaming, some online retailers cannot offer conventional product displays because of technological and spatial limitations [16], while some brands instead choose to engage viewers through creative content and virtual images [17]. Given the limitations of virtual anchors in proximal sensory description, the present study further explores whether product display can compensate for deficiencies in proximal sensory description fit and analyzes the interaction between product display and proximal sensory description fit.
The present study makes three contributions. First, it examines how greater alignment between the vicarious proximal sensory perceptions conveyed by virtual anchors through sensory language and actual product attributes enhances purchase intention. In doing so, it provides a clearer basis for brands and retailers to develop strategies that strengthen consumer engagement through vicarious proximal sensory perceptions in virtual anchor livestreaming. Second, drawing on cognitive–emotional theory, we propose a chained mediation model to explain how proximal sensory description fit influences consumer responses in virtual anchor livestreaming. Specifically, the study shows that optimized proximal sensory description fit enhances perceived authenticity and attractiveness, thereby increasing purchase intention. This finding also advances understanding of how consumer perceptions are formed in AI-mediated commerce. Third, we show that incorporating product display during livestreams helps virtual anchors communicate product information more effectively, which in turn strengthens perceived authenticity and purchase intention. In practical virtual livestreaming sales contexts, the combination of sensory descriptions and product display can better stimulate consumer interest and purchase desire, thereby enhancing the attractiveness and conversion effectiveness of livestream sales.
3. Overview of Studies
We conducted two main experiments to test the hypotheses. Study 1 tested the conceptual model: it examined whether proximal sensory description fit positively influences purchase intention (H1) and whether this effect operates through the chained mediation of perceived authenticity and attractiveness (H2). Study 2 built on this result by examining the moderating role of the presence or absence of product display (H3). Both studies used the same avatars and virtual backgrounds in the stimulus materials to control for extraneous variables. For ease of reference, all hypotheses and their corresponding empirical tests are summarized in Supplementary Section S6. The conceptual framework underpinning this study is illustrated in Figure 1.
Study 1: The impact of proximal sensory description fit on purchase intention.
The primary objective of Study 1 was to test the effect of proximal sensory description fit during a virtual anchor’s live stream on purchase intention, and to verify the chained mediation of perceived authenticity and attractiveness between proximal sensory description fit and purchase intention. Before the main experiment, a manipulation test was conducted to assess the degree of fit between each proximal sensory attribute (i.e., taste, smell, touch) and the product’s features.
Manipulation Check.
Chocolate was selected as the test product for two reasons. First, in everyday life, consumers evaluate chocolate from a multi-sensory perspective that encompasses taste, texture, and smell, which closely matches the objectives of this study. Second, chocolate is a well-known product with a balanced gender distribution among its consumers and a wide variety of brands, so participants were unlikely to associate the stimulus with a specific brand during the experiment.
A sample of 150 participants was recruited from the Credamo online platform (61.3% female, Mage = 32, SDage = 8.49). Participants were drawn from the platform’s random user pool. Screening criteria were set to safeguard user privacy and security while meeting the requirements of the study. All participants who completed the procedure and met the required standards received compensation. We determined the required sample size using an F-test (effect size f = 0.25; α error probability = 0.05; power (1 − β error probability) = 0.95; number of groups = 3) in G*Power (version 3.1.9.7) [52]. Participants viewed images of chocolate products selected by the researchers. The images did not contain any logos or other influencing factors. Based on the simulated average duration for answering the questionnaire items, responses that took less than half or more than three times the average duration were removed. Prior to the analysis, 14 participants were excluded for exceeding the allotted response time, and 18 participants were excluded for not following the instructions. The final number of usable responses was 118 (53.5% female, Mage = 32, SDage = 8.6). Sample characteristics are provided in Supplementary Section S3. a. Preliminary Study Sample Characteristics.
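The duration-based screening rule described above can be illustrated with a minimal sketch. The function name, the inclusive bounds, the example completion times, and the 100 s reference value are all assumptions made for illustration; the text does not report the simulated average duration or how boundary cases were handled.

```python
def filter_by_duration(durations, reference_duration, lower=0.5, upper=3.0):
    """Keep completion times within [lower, upper] times the reference duration."""
    low_cut = lower * reference_duration
    high_cut = upper * reference_duration
    # Inclusive bounds are an assumption; the paper only says "less than half"
    # and "more than three times" the average duration are removed.
    return [d for d in durations if low_cut <= d <= high_cut]


# Hypothetical completion times in seconds. The value 100 stands in for the
# simulated average duration, which is not reported in the text.
kept = filter_by_duration([30, 50, 95, 120, 300, 301], reference_duration=100)
print(kept)  # [50, 95, 120, 300]
```

Passing the reference duration explicitly reflects that the threshold is described as coming from simulated run-throughs of the questionnaire rather than from the sample itself.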
Following a review of the literature, participants answered three questions designed to identify the sensory modality with which the product was most compatible (“I think this product is more compatible with the taste/smell/touch senses”; 1 = strongly disagree to 7 = strongly agree). The scale was adapted from “Making Sense? The Sensory-Specific Nature of Virtual Influencer Effectiveness”. Because all participants were Chinese, the English measurement items were translated into Chinese, and English-speaking experts were invited to review and revise the translated items. Demographic information was then collected.
The objective of the manipulation test was to determine how well proximal sensory language descriptions along each sensory dimension align with the product’s attributes. For a given product, different sensory dimensions contribute to the consumer experience in different proportions [53]. For multi-sensory products such as chocolate, the core sensory dimension is taste, followed by touch and smell [54]. A between-group analysis of variance was conducted on the screened data to assess the degree of fit between the selected product and the three sensory dimensions. The results indicated substantial differences in fit across the three proximal sensory descriptions, with taste receiving the highest fit rating (Mtaste = 5.5847, SDtaste = 1.057; Molfactory = 4.3051, SDolfactory = 1.429; Mtouch = 4.4407, SDtouch = 1.593). This outcome supports the manipulation used in the main experiment.
Participants and procedure.
After the manipulation test, we proceeded to the main experiment. A total of 300 participants from the online platform Credamo (64% female, Mage = 31.2, SDage = 8.13) took part. The required sample size was determined using an F-test (effect size f = 0.25; α error probability = 0.05; power (1 − β error probability) = 0.95; number of groups = 2) in G*Power [52]. The total sample size exceeded the minimum recommended value of 249. Participants were randomly assigned to one of two conditions (proximal sensory description fit: high vs. low) in a between-subjects design and watched the corresponding live video clip. This ensured that no participant was exposed to the other experimental manipulation and reduced potential threats to internal validity. Following two simulations, responses that were shorter than half or longer than three times the average duration were removed. Given the video length and a reasonable completion time for the questionnaire, durations above three times the average indicated extremely prolonged sessions, whereas durations below half the average likely indicated participants who did not engage meaningfully or exited quickly. Prior to the analysis, 22 participants were excluded because their responses were too long or too short in duration, and 16 participants were excluded for failing to follow the instructions. This left 262 usable responses (50.6% female, Mage = 31.4, SDage = 8.12). Sample characteristics are provided in Supplementary Section S3. b. Sample Characteristics for Experiment 1.
All participants were instructed to watch a recorded live stream of the virtual anchor. To keep the live stream constant except for the proximal sensory description fit of the product, two video clips of approximately 20 s each were created, one for each experimental condition (see Supplementary Section S1. b. Proximal sensory description fit description materials). The script described the product from a gustatory perspective in the higher-fit video and from a tactile perspective in the lower-fit video. Confounding variables were strictly controlled when generating the videos to protect internal validity. Specifically, the virtual anchors in the two clips had the same physical attributes, including appearance, hairstyle, and clothing. The products carried no identifying features, and the background color and complexity were consistent across the live streaming backgrounds.
We developed the scripts from words commonly used by anchors in live streaming to describe products from a sensory perspective. For example, the script describing the chocolate in terms of taste read: “Families, today I bring you this box of chocolates. Silky smooth in the mouth, melting on the tongue to bring a mellow taste, rich cocoa aroma in the mouth quickly filled, accompanied by milk and a hint of nuts, so that the sweetness and bitterness is balanced, while the light caramel flavor aroma filled the whole mouth, a memorable, don’t miss it!”. The script describing the chocolate from a tactile perspective read: “Families, this box of chocolates is brought to you today. The product’s packaging is characterized by its exquisite appearance, characterized by clean and simple design elements, a smooth and delicate texture, the absence of any rough texture or granularity, and a tactile experience that is both smooth and firm, with a slight coolness when gently rubbed. The product’s texture is moderately firm, with a warm touch, and it is not easily broken, don’t miss it!”. To enhance content validity, three professors of marketing were invited to assess the readability of each script, and the scripts were revised according to their feedback. We then provided the final scripts to our virtual anchor service provider, who generated the voiceovers and finalized the video clips.
After watching the video, participants indicated their purchase intention by answering three questions adapted from prior literature [55] (“I will purchase the product promoted by this virtual anchor in the live streaming room.”, “I will purchase the product that this virtual anchor promotes in the live streaming.”, “I would consider this virtual anchor’s live streaming as my preferred place to shop.”; 1 = strongly disagree to 7 = strongly agree; α = 0.891). We then asked three questions each to assess perceived authenticity [46] (“I think the virtual anchor’s description of this product is authentic.”, “The virtual anchor’s description makes me think that if I were to touch this product for real, the experience would be highly consistent with the description.”, “I think the virtual anchor’s description of this product gave me an insightful experience of the product’s realism.”; 1 = strongly disagree to 7 = strongly agree; α = 0.764) and attractiveness [56] (“The virtual anchor’s description of the product made me interested in the product.”, “The virtual anchor’s description of the product made me think that the product was worth trying.”, “The virtual anchor’s description of the product made the product more attractive than other similar products.”; 1 = strongly disagree to 7 = strongly agree; α = 0.863). These two scales were used to assess the mediating effects. Because all participants were Chinese, we consulted an English linguist to ensure the accuracy and cultural appropriateness of the scale translations. We then collected demographic information. Common method bias was tested for the scales measuring the mediators and the dependent variable. We first conducted Kaiser-Meyer-Olkin (KMO) and Bartlett’s sphericity tests to confirm suitability for factor analysis, and then conducted, in sequence, Harman’s single-factor test, confirmatory factor analysis (CFA; single-factor vs. three-factor), and a common latent factor (CLF) model test. The findings suggest that common method bias has a negligible impact on the measurements and that the core findings are robust. The detailed values are reported in Supplementary Section S4 (Common Method Bias Tests).
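For readers less familiar with the reliability coefficients (α) reported for each scale, the following is a minimal sketch of how Cronbach's alpha is computed for a multi-item scale. The ratings are hypothetical and are not the study's data; the function name is also an illustrative choice.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per scale item,
    with positions aligned by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum score
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 7-point ratings from five respondents on a three-item scale.
ratings = [
    [5, 6, 4, 7, 3],
    [5, 7, 4, 6, 3],
    [6, 6, 5, 7, 2],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))  # 0.949
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why three items that move together across respondents yield a value close to 1.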
Results and discussion.
To test our main hypotheses, we conducted an analysis of variance (ANOVA) using proximal sensory description fit (0 = low fit, 1 = high fit) as the independent variable and purchase intention as the dependent variable. The results showed that purchase intention was significantly higher in the high-fit condition than in the low-fit condition (Mhigh = 5.62, SDhigh = 0.89 vs. Mlow = 4.62, SDlow = 1.24; F(1, 262) = 56.55, p < 0.001, η2 = 0.179, 95% CI [0.740, 1.265]). The results of the independent samples t-test are reported in Supplementary Section S5 (Independent samples t-test).
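As an illustration of the quantities reported above, the sketch below computes the F statistic and η2 for a one-way ANOVA from raw scores. The two groups of ratings are hypothetical and much smaller than the study's sample; only the formulas (between- and within-group sums of squares, η2 = SS_between / SS_total) are standard.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


# Hypothetical purchase-intention ratings (7-point scale), not the study's data.
low_fit = [4, 5, 3, 5, 4, 6]
high_fit = [6, 5, 7, 6, 5, 7]
f_stat, df1, df2, eta_sq = one_way_anova([low_fit, high_fit])
print(round(f_stat, 2), df1, df2, round(eta_sq, 3))
```

With two groups, this F statistic equals the square of the independent samples t statistic, which is why the ANOVA and the t-test lead to the same conclusion.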
To test our prediction that proximal sensory description fit promotes purchase intention through perceived authenticity and attractiveness, we conducted a series of mediation analyses (PROCESS Model 6, Hayes, 2017; 95% CI and 50,000 bootstrap samples). The results showed that higher proximal sensory description fit significantly increased perceived authenticity (b = 0.832, t = 9.37, p < 0.001), and perceived authenticity significantly increased attractiveness (b = 0.926, t = 15.88, p < 0.001). The indirect effect of proximal sensory description fit on purchase intention through perceived authenticity and attractiveness was significant (95% CI [0.3878, 0.7029]). The overall serial indirect effect was also significant (b = 1.002, t = 7.519, p < 0.001, 95% CI [0.749, 1.208]). For details, see Figure 2.
As shown in Figure 2, higher proximal sensory description fit increased perceived authenticity, which in turn enhanced attractiveness and ultimately promoted purchase intention. Overall, Study 1 provides support for both H1 and H2. It should also be noted that Study 1 examined the effect of proximal sensory description fit under conditions in which product display was present. Therefore, the positive effect observed in this study holds in a context where visual product information is provided.
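The logic of a serial (chained) mediation test can be sketched as follows. This is not the PROCESS macro and does not reproduce the study's analysis; it is a simplified stand-in that estimates the serial indirect effect as the product of three regression paths (fit to authenticity, authenticity to attractiveness, attractiveness to purchase intention) and builds a percentile bootstrap interval. The simulated data, path values, seeds, and the reduced number of bootstrap samples are all assumptions chosen for illustration.

```python
import random


def ols(columns, y):
    """OLS coefficients [intercept, b1, ...] via the normal equations."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(d[r] * d[c] for d in design) for c in range(k)] for r in range(k)]
    xty = [sum(d[r] * yi for d, yi in zip(design, y)) for r in range(k)]
    aug = [xtx[r] + [xty[r]] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[pivot] = aug[pivot], aug[p]
        for r in range(k):
            if r != p:
                factor = aug[r][p] / aug[p][p]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[p])]
    return [aug[r][k] / aug[r][r] for r in range(k)]


def serial_indirect(x, m1, m2, y):
    """Product of paths X -> M1, M1 -> M2 (given X), M2 -> Y (given X, M1)."""
    a1 = ols([x], m1)[1]
    d21 = ols([x, m1], m2)[2]
    b2 = ols([x, m1, m2], y)[3]
    return a1 * d21 * b2


def bootstrap_ci(x, m1, m2, y, n_boot=300, seed=0):
    """95% percentile bootstrap interval for the serial indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample respondents
        estimates.append(
            serial_indirect(*([v[i] for i in idx] for v in (x, m1, m2, y)))
        )
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Simulated data with hypothetical path values (0.8, 0.9, 0.5).
rng = random.Random(42)
x = [i % 2 for i in range(200)]                      # 0 = low fit, 1 = high fit
m1 = [0.8 * xi + rng.gauss(0, 0.1) for xi in x]      # perceived authenticity
m2 = [0.9 * a + rng.gauss(0, 0.1) for a in m1]       # attractiveness
y = [0.5 * b + 0.2 * a + 0.1 * xi + rng.gauss(0, 0.1)
     for xi, a, b in zip(x, m1, m2)]                 # purchase intention
point = serial_indirect(x, m1, m2, y)
low, high = bootstrap_ci(x, m1, m2, y)
print(round(point, 3), round(low, 3), round(high, 3))
```

An interval that excludes zero is read as evidence for the chained pathway. The sketch uses 300 resamples to keep the pure-Python run short, whereas the analysis reported above used 50,000.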
Study 2: The influence of proximal sensory description fit and product display interaction.
Study 1 provided preliminary evidence that, when products are displayed, the positive effect of proximal sensory description fit on purchase intention operates through the chained mediation mechanism. The objective of Study 2 was to examine how the presence or absence of product display moderates this relationship (H3). In Study 2, we again generated videos with identical avatars and backgrounds, except that one group of participants viewed a live stream without a picture of the chocolate, while the other group viewed a live stream with a picture of the chocolate. Our framework suggests that product display moderates both the relationship between proximal sensory description fit and purchase intention and the relationship between proximal sensory description fit and perceived authenticity. Specifically, by presenting visual information, a distal sensory cue, product displays can further shape consumers’ perceptions of proximal sensory descriptions and their purchase decisions.
Participants and procedure.
A total of 300 participants from the online platform Credamo (64.4% female, Mage = 31.75, SDage = 8.4) took part in the experiment. The sample size was determined by an a priori power analysis in G*Power and exceeded the minimum recommended sample size of 249. Building upon Experiment 1, we retained the two product-display-present videos from Experiment 1 and added two matched product-display-absent videos, so that product display could be manipulated as a moderating variable. Participants were randomly assigned to one of four conditions in a 2 (Product display: present vs. absent) × 2 (Proximal Sensory Description Fit: high vs. low) between-subjects design. Following two simulations, responses that were shorter than half or longer than three times the average duration were removed. Durations above three times the mean may indicate exceptionally long dwell times, while durations below half the mean may indicate participants who did not engage meaningfully or exited quickly. Prior to data analysis, 17 participants were excluded because their responses were too long or too short in duration, and 24 participants were excluded for failing to follow instructions. The final usable sample size was 259 (62% female, Mage = 32, SDage = 8.2). Sample characteristics are provided in Supplementary Section S3. c. Sample Characteristics for Experiment 2.
All eligible participants were randomly assigned to one of the four cells of the 2 × 2 between-subjects design (Product display: present vs. absent; Proximal Sensory Description Fit: high vs. low). Across the four videos, virtual anchor appearance, background, video length, speaking rate, tone, script structure, and overall information load were kept as consistent as possible, with only Product display and Proximal Sensory Description Fit being manipulated. Each of the four videos was approximately 23 s long (see Supplementary Section S2. Experimental Stimuli for Study 2). As in Experiment 1, confounding variables were strictly controlled when creating the videos. After viewing the assigned video, participants completed three-item measures of purchase intention, perceived authenticity, and attractiveness. The resulting data were used to test the main effect of Proximal Sensory Description Fit and the moderating effect of Product display.
Results and discussion.
A two-way analysis of variance (ANOVA) was conducted with Product Display (0 = absence, 1 = presence) and Proximal Sensory Description Fit (0 = low proximal sensory description fit, 1 = high proximal sensory description fit) as the independent variables and purchase intention as the dependent variable. The results showed significant main effects of proximal sensory description fit (F(1, 259) = 9.838, p = 0.002, η2 = 0.037) and Product display (F(1, 259) = 10.174, p = 0.002, η2 = 0.039). More importantly, the interaction between proximal sensory description fit and Product display was significant (F(1, 259) = 6.003, p = 0.015, η2 = 0.023). Simple-effects analyses showed that when Product display was absent, the effect of proximal sensory description fit on purchase intention was not significant (Mlow = 4.090, SDlow = 1.485 vs. Mhigh = 4.200, SDhigh = 1.530; F(1, 259) = 0.267, p = 0.606, η2 = 0.001, 95% CI [−0.309, 0.529]). When Product display was present, however, the effect was significant (Mlow = 4.209, SDlow = 1.710 vs. Mhigh = 5.103, SDhigh = 1.68; F(1, 259) = 13.982, p < 0.001, η2 = 0.052, 95% CI [0.423, 1.365]). This interaction pattern is presented in Figure 3.
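The interaction pattern can be read directly from the four reported cell means. The short sketch below computes the simple effect of fit within each product-display condition and the resulting interaction contrast (the difference between the two simple effects). The cell means are taken from the text above; the dictionary layout and function name are illustrative choices.

```python
# Cell means for purchase intention as reported for Study 2.
cell_means = {
    ("absent", "low"): 4.090,
    ("absent", "high"): 4.200,
    ("present", "low"): 4.209,
    ("present", "high"): 5.103,
}


def simple_effect(display):
    """High-fit minus low-fit mean within one product-display condition."""
    return cell_means[(display, "high")] - cell_means[(display, "low")]


effect_absent = simple_effect("absent")
effect_present = simple_effect("present")
interaction_contrast = effect_present - effect_absent
print(round(effect_absent, 3), round(effect_present, 3), round(interaction_contrast, 3))
# 0.11 0.894 0.784
```

The two simple effects (0.110 and 0.894) coincide with the midpoints of the confidence intervals reported for the display-absent and display-present conditions, respectively.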
To examine the moderated mediation effect involving perceived authenticity and attractiveness, we conducted a series of moderated mediation analyses (PROCESS Model 85, Hayes, 2017; 95% CI and 50,000 bootstrap samples). The results indicated that Product display significantly moderated the indirect effect of proximal sensory description fit on purchase intention through perceived authenticity and attractiveness (b = −0.5820, SE = 0.1996, 95% CI [−0.9954, −0.1928]). Specifically, the chain-mediated effect of perceived authenticity and attractiveness was significant when Product display was present (b = 0.6511, SE = 0.1570, 95% CI [0.3606, 0.9766]), but not significant when Product display was absent (b = 0.0692, SE = 0.1312, 95% CI [−0.1829, 0.3349]).
A simplified slope diagram was constructed based on the mean values of the four cells. As shown in Figure 4, the slope for the product-display-present condition is steeper than that for the product-display-absent condition for both purchase intention and perceived authenticity. Specifically, when product display was present, higher proximal sensory description fit was associated with a more pronounced increase in purchase intention and perceived authenticity, whereas in the absence of product display, these increases were flatter. This pattern indicates that product display strengthens the positive effect of proximal sensory description fit on consumer responses, thereby supporting H3.
The findings of Study 2 support H3, showing that Product display strengthens the positive effect of proximal sensory description fit on purchase intention. When the virtual anchor’s proximal sensory description fits the product well, a clear visual product display gives consumers intuitive and verifiable product information, which can considerably enhance perceived authenticity and promote purchase intention. When proximal sensory description fit is low, a clear and detailed visual product display offers supplementary information that can, to a certain extent, offset the shortfall, thereby strengthening consumers’ perceived authenticity and purchase intention.
4. General Discussion
The present study demonstrates that the alignment between the proximal sensory language descriptions of virtual anchors and products can influence consumers’ purchase intention. Specifically, Experiment 1 provides empirical evidence that heightened fit in proximal sensory descriptions during virtual anchor livestreams positively impacts purchase intention. This fit primarily enhances consumers’ perceived authenticity, thereby increasing the attractiveness of products or brands and promoting purchase intention. The findings of Experiment 2 demonstrate that product display significantly and positively moderates the relationship between proximal sensory description fit and purchase intention.
Theoretical contributions.
The present study offers three significant contributions to the field. First, building on extant sensory marketing research, this study examines proximal sensory language descriptions in virtual anchor live streaming contexts. Specifically, it investigates the fit between proximal sensory language descriptions and product attributes. Proximal sensory description fit is not introduced as a new construct; rather, it extends sensory congruence research to AI-driven virtual anchor live streaming. In such settings, virtual anchors cannot deliver direct proximal sensory cues due to their physical constraints. Accordingly, we propose that proximal sensory language can evoke vicarious proximal sensory perceptions for consumers. This perspective complements prior sensory marketing work, which has predominantly emphasized distal sensory cues. Existing research on sensory marketing in virtual anchor live streaming has largely focused on congruence between distal cues and product attributes, leaving the impact of proximal sensory language–attribute fit on purchase intention underexplored. To address this gap, the present study adopts a language-based proximal-cue perspective to articulate product attributes. It further examines how virtual anchors’ proximal sensory descriptions shape consumers’ purchase intention. By enriching product descriptions to compensate for virtual anchors’ inability to provide direct proximal sensory experiences, consumers’ perceived product authenticity and purchase intention can be strengthened [
57]. Our empirical evidence supports this account, showing that higher proximal sensory description fit is positively associated with consumers’ purchase intention. Overall, this study clarifies how language-evoked proximal sensory cues influence purchasing behavior within sensory marketing, with particular relevance to virtual anchor live streaming.
Second, based on cognitive–emotional theory, we constructed a chained mediation mechanism to examine how proximal sensory description fit influences consumer responses in virtual anchor livestreaming. Unlike prior research on human influencers’ livestreams, where sensory language enhances perceived authenticity by implying direct product experience—that is, suggesting that the influencer has personally used the promoted product [
9]—the present study shows that, in a virtual anchor context, proximal sensory description fit enhances consumer responses through the sequential mediation of perceived authenticity and attractiveness. Specifically, when virtual anchors describe proximal sensory perceptions accurately and appropriately, consumers perceive greater authenticity [
58], which in turn enhances attractiveness and ultimately increases purchase intention. Empirical analysis further indicates that, in AI-mediated commercial environments lacking direct physical contact and tangible product experience, proximal sensory language can function as an important substitute cue that shapes consumers’ cognitive evaluations and emotional engagement. In this way, it partially compensates for the absence of direct sensory input in AI-driven interactions. More broadly, these findings deepen our understanding of how consumer perceptions are formed in AI-mediated consumption contexts, showing that virtual anchors can strengthen consumers’ cognitive and emotional responses by optimizing proximal sensory description fit, thereby enhancing purchase intention.
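The chained mediation described above can be written out as a set of regression equations. The following is a minimal sketch in standard serial-mediation notation, not necessarily the exact specification estimated in this study; the symbols are assumptions introduced here for illustration, with $X$ denoting proximal sensory description fit, $M_1$ perceived authenticity, $M_2$ attractiveness, and $Y$ purchase intention.

```latex
\begin{align}
M_1 &= i_1 + a_1 X + e_1, \\
M_2 &= i_2 + a_2 X + d_{21} M_1 + e_2, \\
Y   &= i_3 + c' X + b_1 M_1 + b_2 M_2 + e_3, \\
\text{serial indirect effect} &= a_1 \, d_{21} \, b_2, \\
\text{total effect } c &= c' + a_1 b_1 + a_2 b_2 + a_1 d_{21} b_2 .
\end{align}
```

Under this notation, the sequential path from fit through perceived authenticity to attractiveness and then to purchase intention corresponds to the product $a_1 d_{21} b_2$; the account given above implies that this product is positive.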
Finally, this study identifies product display as a moderating variable that helps consumers interpret product information. During virtual anchor livestreams, proximal sensory descriptions facilitate the formation of indirect sensory perceptions of products, while distal visual product displays enable consumers to confirm and verify the described product information without direct contact [
59]. Specifically, the virtual anchor’s descriptions of sensory attributes (e.g., taste or texture) create these indirect experiences, while the visual product displays serve as an objective, verifiable reference (e.g., product appearance and details). Although virtual anchors cannot fully replicate the authenticity of human influencers, combining these two kinds of cues helps narrow that gap. The interaction between distal and proximal sensory perceptions significantly enhances consumers’ perception of product authenticity and purchase intention. This finding indicates that the completeness with which vicarious sensory perceptions are conveyed during virtual anchor livestreams positively influences consumer purchasing decisions, underscoring the importance of coordinating distal and proximal sensory perceptions.
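The moderating role of product display can be expressed as an interaction term in a regression of purchase intention. This is a hedged sketch under assumed notation and coding, not the analysis reported in Experiment 2: $X$ denotes proximal sensory description fit, $W$ denotes product display (assumed here to be coded 0 for absent and 1 for present), and $Y$ denotes purchase intention.

```latex
\begin{align}
Y &= b_0 + b_1 X + b_2 W + b_3 (X \times W) + e, \\
\frac{\partial Y}{\partial X} &= b_1 + b_3 W .
\end{align}
```

With this coding, the effect of fit on purchase intention is $b_1$ without a product display and $b_1 + b_3$ with one, so a positive moderation effect corresponds to $b_3 > 0$.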
Management contributions.
The present study offers three key implications for management practice. First, the proximal sensory descriptions of product attributes provided by virtual anchors positively influence perceived authenticity and purchase intention. In live streams, anchors can attract and retain viewers by pairing sensory vocabulary specific to the product or brand with product displays, thereby boosting consumer interest. Virtual anchors should describe products in detail from the sensory perspectives that suit the product’s characteristics in order to evoke emotional resonance with consumers. Brands can develop and train AI models using specialized sensory vocabulary libraries. For instance, presentations of food items should emphasize sensory elements such as texture and color that appeal to taste or smell, while presentations of apparel should focus on tactile aspects such as fabric feel and texture. This approach has been shown to heighten viewer interest and enhance product authenticity [
60]. Our findings indicate that proximal sensory description fit enhances viewer engagement in virtual anchor livestreams and influences purchasing decisions through precise product descriptions, which may in turn help attract more viewers and consumers.
Second, our results indicate that consumers’ perception of authenticity in a virtual anchor’s livestream directly influences how attractive they find the products and, in turn, their purchasing decisions. Companies can therefore enhance product attractiveness by improving the sense of realism in virtual anchors’ live streams. For instance, businesses can convey richer emotional content through virtual anchors during broadcasts, such as facial expressions, intonation, and body language [
61], thereby strengthening audience trust and emotional engagement. This “show-and-tell” strategy matches sensory cues to different product categories and is particularly helpful for products that rely on non-visual senses, where it helps consumers form more authentic and natural connections. Such approaches may enhance brand loyalty and purchase intention and, in turn, improve market performance and profitability.
Finally, our results indicate that product displays positively influence consumer purchasing decisions during virtual anchor livestreams. Brands can combine the strengths of human and virtual anchors by attending not only to the virtual anchor’s verbal delivery and visual presentation but also to customized product displays tailored to each item’s attributes. For product characteristics that consumers cannot directly perceive, such as food flavors, perfume scents, or clothing textures, virtual anchors can be given vivid facial reactions, appropriate intonation changes, and expressive body language. These human-like qualities make their descriptions more authentic and relatable. Compared with monotonous mechanical narration, a virtual avatar that shows positive “reactions” to the food flavors it describes may strengthen audience trust and attractiveness. Our findings indicate that combining high-quality, information-rich visual product displays with concrete and credible proximal sensory descriptions helps compensate for the absence of proximal sensory perceptions in virtual live streaming. This combination enhances the perceived authenticity of products and thereby supports consumer purchasing decisions. This study thus offers a theoretical perspective and practical guidance for improving the transmission of proximal sensory perceptions in virtual anchor live streaming, with the aim of better meeting consumer needs. From a technological standpoint, more sophisticated visual product displays or more precise language descriptions have the potential to enhance consumer immersion and perceived authenticity, and thus their purchasing decisions.
Limitations and future research.
The present study focuses on virtual livestreaming as a specific online shopping context. However, laboratory and online simulation settings may differ from real shopping environments and therefore cannot fully capture the complex psychological and behavioral processes involved in actual purchase decision-making. Although simulation experiments allow for greater control over variables and help ensure the reproducibility and reliability of the findings, they cannot account for all external factors present in real shopping situations, such as consumers’ mood fluctuations, social interactions, and immediate feedback [
62]. Therefore, future research should further examine differences in consumer behavior between virtual livestreaming and other e-commerce contexts, ideally through field experiments conducted on different online shopping platforms. In addition, platform-related factors, including user groups, product categories, and interaction methods, may influence consumers’ purchasing decisions in different ways [
63], and thus merit more systematic comparative investigation. Combining multiple research methods with field data would enable a more comprehensive assessment of the actual impact of virtual anchors on consumer purchasing decisions and provide stronger theoretical and practical support for the continued optimization of e-commerce practices.
The virtual anchors in this study primarily relied on sensory language to describe products when conveying vicarious proximal sensory perceptions. However, this approach is limited by the lack of multidimensional sensory stimuli, which may reduce consumers’ sensory engagement and immersion and, in turn, weaken purchase intention [
64]. In addition, unlike conventional livestreaming, the present study did not include real-time interaction or consumer feedback, such as instant question-and-answer exchanges, customized recommendations, or audience emotional responses. These factors may also influence consumers’ emotional connection and purchase motivation. Therefore, future research could further examine marketing effects across different product types and levels of sensory fit. Incorporating multi-sensory experiences (e.g., visual, auditory, and tactile cues) would make it possible to conduct multi-category experiments [
65] and compare how sensory descriptions operate across product categories. At the same time, strengthening real-time interaction in virtual anchor livestreaming may further improve consumers’ shopping experiences and purchase intention. More broadly, integrating multiple sensory stimuli may enable consumers to form a richer perception of products, thereby enhancing emotional connection and purchase motivation.
The assessment of proximal sensory description fit is often influenced by participants’ subjective perceptions, which may introduce bias into the research findings. Consumers’ responses to sensory stimuli may vary according to factors such as sensory sensitivity, individual differences, and cultural background [
66]. Therefore, relying solely on self-reports or subjective evaluations may not fully capture their sensory perceptions. In the present study, the use of Likert scales may have introduced subjective bias because participants’ responses can be affected by limits in self-reflection and context-dependent judgment. To improve measurement objectivity, future research could incorporate more precise physiological methods, such as eye-tracking and EEG, to capture consumers’ real-time responses while observing virtual livestreaming content [
67]. Such methods may help researchers assess proximal sensory description fit more accurately, reduce subjective bias, and improve the reliability and precision of the findings. More broadly, combining self-report measures with physiological indicators may provide a stronger methodological basis for future research on virtual shopping experiences.
The present study also highlights several boundary conditions that warrant further investigation. First, the use of chocolate as a hedonic product raises the question of whether the effect of proximal sensory description fit can be generalized to utilitarian products such as office supplies or tools, which may place greater emphasis on functional attributes than on sensory experience. Second, the findings are based on a Chinese sample, and their generalizability may therefore be constrained by cultural context. In particular, the observed relationships involving AI acceptance, communication styles, and sensory preferences may vary across cultures. Future research should therefore compare different product types (hedonic vs. utilitarian) and include cross-cultural samples to further clarify the boundary conditions of the effects identified in this study.