1. Introduction
Visual social platforms have become a central arena for brand communication and social commerce. On platforms such as Xiaohongshu, brands increasingly rely on a stream of image-and-text posts to stimulate attention, conversation, and downstream commercial outcomes. Yet producing “hot posts”—content that attracts disproportionately high user response—remains difficult and resource-intensive. For brand managers, the core strategic question is not only what to post, but also who should post it and when it should be released. These decisions jointly determine whether platform distribution mechanisms translate into exposure and whether exposure translates into user response at an acceptable cost.
As social media platforms have proliferated, user engagement has become a key metric for evaluating communication effectiveness, understanding content diffusion, and assessing firms’ long-term performance. Recent scholarship increasingly treats customer (or consumer) engagement as a strategic variable that shapes competitive advantage and long-run outcomes [1]. A 2025 Special Issue of the Journal of the Academy of Marketing Science further emphasizes that, in emerging contexts such as social media, social commerce, and live streaming, engagement behaviors are deeply embedded in firm–customer interaction processes [2].
Existing research identifies a wide range of psychological, relational, and content-level antecedents of engagement. Studies emphasize value co-creation [3], individual motives such as information seeking and entertainment, and message features such as sentiment, length, and calls to action [4,5]. Influencer marketing research further shows that influencer tier, credibility, and parasocial relationships affect how users respond to branded content [6,7]. However, much of this literature remains user- and message-centric. It often treats the brand, creator, and platform as background conditions, offering limited guidance on how brands and creators jointly design posts on a particular platform to achieve high engagement and cost-adjusted performance. As a result, we know much about why users might want to engage, but less about how brands and creators, in relation to platform affordances, make concrete design choices that systematically relate to engagement outcomes.
We highlight three gaps. First, relatively few studies adopt an explicit creator–brand–platform perspective and treat these actors as strategic design levers rather than fixed context variables. Second, most empirical work examines one dimension at a time, such as text features [8] or influencer tier [9], and often relies on relatively small or manually coded samples, making it difficult to identify whether high-engagement posts follow systematic patterns in the joint space of creator, content, and timing choices. Third, although the Technology Affordance Actualization (TAA) framework links technological possibilities to behavioral outcomes [10], it has rarely been operationalized at the post level in social commerce, especially in large-scale behavioral data on regional platforms such as Xiaohongshu.
The TAA framework provides a process view of how digital technologies enable action [11]. In this view, affordances are action potentials that arise from the relationship between IT artifacts and goal-directed actors; actualization refers to the concrete actions through which actors realize these potentials; and outcomes are the observable consequences of those actions [12]. In social media contexts, prior work highlights affordances such as visibility, interactivity, connectivity, multimodal presentation, self-presentation, and temporality [13,14]. Yet most TAA studies describe these affordances at a conceptual level and focus on organizations or platforms rather than on measurable post-level design choices that can be systematically analyzed at scale.
We address these gaps by viewing each image–text post as a technical design through which brands and creators actualize platform affordances. Building on TAA, we translate its “who, how, when” process into a Creator–Content–Timing (CCT) framework. The creator dimension captures how creator tiers, brand status, and sponsorship arrangements relate to the actualization of visibility and commercial collaboration affordances. The content dimension describes how textual and visual cues—such as text length, brand mentions, and facial cues—actualize multimodal, self-presentation, and interaction affordances. The timing dimension captures how posting season, day, and within-month timing relate to the actualization of temporal affordances. Within this framework, engagement outcomes provide an empirical footprint of how CCT choices are associated with post-level performance, including potential trade-offs between reach and efficiency across different creator–brand–platform configurations.
Guided by this TAA-based CCT perspective, we examine the following two research questions.
RQ1. At the post level, how are creator, content, and timing choices—interpreted as alternative ways of actualizing platform affordances—associated with engagement scale (total interactions), engagement intensity (interactions per 1000 views), and cost-adjusted engagement efficiency on a visual social commerce platform?
RQ2. Do high-engagement posts concentrate in a limited number of recurring Creator–Content–Timing configurations, rather than being scattered across the CCT space, and how do these configurations differ in engagement scale and efficiency?
Empirically, we analyze 138,713 image-and-text posts linked to 100 beauty brands on Xiaohongshu between December 2022 and February 2024. Beauty is a highly visual, competition-intensive category, making it a suitable category for studying how multimodal design choices relate to engagement in social commerce. China is also one of the world’s most advanced and large-scale social commerce markets, and Xiaohongshu is the leading visual social commerce platform with a predominantly young, beauty-oriented user base, which makes this context theoretically informative for studying how creator–brand–platform configurations shape engagement. For each post we obtain creator and brand attributes, text and image features, and timestamps. We construct post-level variables using machine learning, text mining, and computer vision, and estimate regression and clustering models.
This study makes three contributions. Theoretically, it extends TAA to a creator–brand–platform setting by operationalizing affordance actualization as measurable CCT choices and configurations at the post level. Methodologically, it combines large-scale multimodal data with computational feature extraction and regression and clustering models to identify systematic design patterns behind high engagement and cost-adjusted performance. Managerially, it offers evidence-based guidance on creator selection, content design, and scheduling that helps brands improve both engagement intensity and cost-adjusted engagement on visual social commerce platforms.
3. Method
3.1. Sample Selection
We study Xiaohongshu, the leading visual social commerce platform in China that combines lifestyle content with embedded e-commerce. Because industry context is an important determinant of user engagement behavior, we focus on a single category, the beauty and personal-care sector, which is one of the platform’s most active and commercially important domains. Beauty and personal-care brands are prototypical lifestyle brands [54] that rely heavily on esthetic visual cues to communicate values and brand meanings in social media environments [55]. This setting is therefore well suited for examining how creator, content, and timing choices relate to user engagement.
Our sampling procedure proceeds in several steps. We first obtain a list of beauty brands from a professional third-party data provider, HUITUNDATA, which reports brand-level advertising expenditure on Xiaohongshu. We select the top 100 beauty brands by cumulative advertising expenditure between May and October 2023, excluding non-brand channel accounts (e.g., “Tmall”). For these 100 focal brands, HUITUNDATA provides all associated Xiaohongshu posts between December 2022 and February 2024. From this universe, we retain high-engagement posts (for each brand, the top 3000 posts by total interactions; see Section 3.3.1) and restrict attention to image-and-text posts for which we can successfully retrieve the first cover image. This yields 169,773 posts for the main empirical analysis.
3.2. Data Collection and Preprocessing
We harmonize numeric, text, and time fields across data sources and standardize all timestamps to local time. To ensure data completeness and consistency, we conduct extensive preprocessing on missing values, outliers, and variable types (numeric, textual, and date-time). Where necessary, raw fields are recast into appropriate integer or floating-point formats after handling missing entries. We then drop posts with missing core information (e.g., deleted notes) and posts for which the first cover image is unavailable. This step removes 31,060 observations and leaves 138,713 image-and-text posts, which serve as our final sample of high-engagement posts.
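The type recasting and filtering steps above can be illustrated with a minimal sketch. The field names (`post_id`, `title`, `cover_image_url`, `likes`) and the three example records are hypothetical and are not taken from the HUITUNDATA export; the paper does not report the actual preprocessing code.

```python
from typing import Optional

# Hypothetical names for the "core information" a post must carry.
CORE_FIELDS = ("post_id", "title", "cover_image_url")


def to_number(raw: Optional[str]) -> Optional[float]:
    """Recast a raw string field to float; return None if missing or unparseable."""
    if raw is None:
        return None
    cleaned = raw.strip().replace(",", "")
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def keep_post(post: dict) -> bool:
    """Keep a post only if every core field is present and non-empty."""
    return all(post.get(field) not in (None, "") for field in CORE_FIELDS)


raw_posts = [
    {"post_id": "a1", "title": "Lip tint review",
     "cover_image_url": "https://example.com/a1.jpg", "likes": "1,116"},
    {"post_id": "a2", "title": "",  # stands in for a deleted note
     "cover_image_url": "https://example.com/a2.jpg", "likes": "87"},
    {"post_id": "a3", "title": "Serum haul",
     "cover_image_url": None, "likes": "n/a"},  # cover image unavailable
]
clean_posts = [dict(p, likes=to_number(p["likes"])) for p in raw_posts if keep_post(p)]

# Sample accounting as reported in the text: 169,773 - 31,060 = 138,713.
final_sample_size = 169_773 - 31_060
```

Only the first record survives the filter in this toy example, and the last line reproduces the reported sample accounting.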
(1) Creator and brand variables
Creator tier. We classify creators into five tiers based on follower count, following industry practice, as follows: top-tier (≥500,000 followers), mid-tier (50,000–499,999), emerging (5000–49,999), ordinary users (300–4999), and casual users (≤299). We construct a categorical variable, Influencer_tier, using these thresholds.
Brand-generated accounts. We identify brand-owned accounts by matching the creator name with the brand’s Chinese and English names (Brand_generated_content = 1 if matched; 0 otherwise).
Brand status. We code brand status using the Brand Finance Cosmetics 50 (2023) ranking and supplementary information on publicly listed beauty companies. Brands are grouped into four categories—global leaders, international conglomerates, leading domestic brands, and other brands. We also compute brand-level posting intensity as the number of associated posts for each brand during the sample period.
(2) Sponsorship and cost variables
The dataset indicates whether a post is registered as a commercial note via Xiaohongshu’s official collaboration system (Sponsored = 1; 0 otherwise). For each post we observe the platform’s estimated advertising quotation, which serves as a proxy for the advertising cost if the post is promoted.
(3) Image features
We analyze only the first image of each post, as it appears as the cover in users’ feeds and is likely to drive initial attention. We use a commercial computer-vision API to extract up to 11 high-confidence semantic labels (e.g., person, product, and indoor scene) and to detect faces. For images with detected faces, the API outputs the number of faces and face attributes such as age, expression intensity, attractiveness score, gender score, and image quality. For posts with multiple faces, we compute averages of the face attributes.
For clarity, we treat facial attractiveness and related variables strictly as algorithmic outputs that approximate how automated systems might classify visual inputs; we do not interpret these scores as normative judgements of individuals. Ethical and cultural implications are discussed in the Discussion section.
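As a rough illustration of how per-face outputs might be collapsed into post-level variables, the sketch below averages each attribute over the detected faces. The dictionary keys are hypothetical stand-ins for the API's response fields, and returning `None` for images without faces is an assumption; the paper does not state how face attributes are stored for such images.

```python
from statistics import mean
from typing import Optional

# Hypothetical attribute keys; the real API response uses its own field names.
FACE_ATTRIBUTES = ("age", "expression", "attractiveness", "gender_score")


def summarize_faces(faces: list) -> dict:
    """Collapse a list of per-face attribute dicts into post-level averages."""
    summary: dict = {"num_faces": len(faces)}
    for attr in FACE_ATTRIBUTES:
        values = [face[attr] for face in faces if face.get(attr) is not None]
        # Assumption: attributes are left undefined (None) when no face is detected.
        avg: Optional[float] = mean(values) if values else None
        summary[f"avg_{attr}"] = avg
    return summary


two_faces = [
    {"age": 22, "expression": 40, "attractiveness": 70, "gender_score": 10},
    {"age": 26, "expression": 60, "attractiveness": 80, "gender_score": 30},
]
post_level = summarize_faces(two_faces)
no_face = summarize_faces([])
```

With the two hypothetical faces above, the post-level age is the simple mean of 22 and 26, and a cover image without faces yields `num_faces = 0` with undefined averages.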
Appendix A Table A1 provides a comprehensive variable dictionary. This refined dataset provides the empirical foundation for the subsequent analysis.
3.3. Variable Operationalization
3.3.1. Dependent Variables and Operationalization of High Engagement
(1) Total engagement
We define user engagement as the total volume of active user responses to social media marketing content on the platform [56], including likes, favorites, comments, and shares. In this study, high-engagement posts are operationalized ex ante at the sampling stage as follows: for each focal brand, we rank all associated posts within the study period by total interactions and retain the top 3000 posts. All analyses are conducted within this high-engagement subsample of 138,713 image–text posts.
For each post, we construct three related dependent variables that capture the following different facets of engagement: total interactions, engagement rate (interactions per 1000 estimated views), and cost-adjusted engagement (interactions per 1000 CNY of estimated advertising cost).
(2) Engagement_rate
This rate measures how many interactions a post generates for every 1000 impressions.
(3) Int_per_cost
Int_per_cost is the number of engagements generated per 1000 CNY of estimated advertising cost. It captures cost-adjusted engagement efficiency.
All three variables are highly right-skewed. In regression models, we use their natural logarithms as follows: ln_Total_engagement, ln_Engagement_rate, and ln_Int_per_cost.
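The three outcomes and their log transforms follow directly from the definitions above. The sketch below is a minimal illustration; the function name, argument names, and the example post are hypothetical, and rejecting non-positive inputs is an assumption, since the text does not say how zero values are handled before taking logarithms.

```python
import math


def engagement_outcomes(likes, favorites, comments, shares, est_views, est_cost_cny):
    """Compute the three dependent variables and their natural logarithms."""
    total = likes + favorites + comments + shares
    if min(total, est_views, est_cost_cny) <= 0:
        # Assumption: ratios and logarithms are only defined for positive inputs.
        raise ValueError("total interactions, views, and cost must be positive")
    rate = 1000 * total / est_views         # interactions per 1000 estimated views
    per_cost = 1000 * total / est_cost_cny  # interactions per 1000 CNY of estimated cost
    return {
        "Total_engagement": total,
        "Engagement_rate": rate,
        "Int_per_cost": per_cost,
        "ln_Total_engagement": math.log(total),
        "ln_Engagement_rate": math.log(rate),
        "ln_Int_per_cost": math.log(per_cost),
    }


# Hypothetical post: 1000 interactions, 20,000 estimated views, 2000 CNY quotation.
example = engagement_outcomes(800, 150, 40, 10, est_views=20_000, est_cost_cny=2_000)
```

For this hypothetical post, the engagement rate is 50 interactions per 1000 views and the cost-adjusted engagement is 500 interactions per 1000 CNY.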
3.3.2. Creator-Related Variables
Creator-level covariates capture how brands and creators configure visibility and commercial collaboration.
Creator-related variables include follower counts, posting frequency, brand status, and creator type, which distinguishes between brand-owned official accounts and non-brand accounts. According to Xiaohongshu’s platform classification, accounts were divided into five influencer tiers, namely top-tier influencers with at least 500,000 followers, mid-tier influencers with 50,000 to 499,999 followers, emerging influencers with 5000 to 49,999 followers, ordinary users with 300 to 4999 followers, and casual users with 299 or fewer followers. This categorical measure is represented as the influencer type variable.
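The five-tier classification can be written as a simple threshold rule. The function name and tier labels below are illustrative; only the follower thresholds come from the text.

```python
def influencer_tier(followers: int) -> str:
    """Map a follower count to one of the five tiers defined in the text."""
    if followers < 0:
        raise ValueError("follower count cannot be negative")
    if followers >= 500_000:
        return "top-tier"
    if followers >= 50_000:
        return "mid-tier"
    if followers >= 5_000:
        return "emerging"
    if followers >= 300:
        return "ordinary"
    return "casual"


# Tier of a creator at the reported mean and median follower counts.
tier_at_mean = influencer_tier(59_073)
tier_at_median = influencer_tier(12_323)
```

Under these thresholds, a creator at the sample mean of 59,073 followers falls in the mid-tier group, whereas a creator at the median of 12,323 followers is classified as emerging, which reflects the skewed follower distribution.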
The frequency of brand mentions was quantified by matching brand names (in both English and Chinese) from the reference list across post titles, textual content, and hashtags. Brand status was coded according to the Cosmetics 50 2023 global ranking published by Brand Finance, supplemented with data from publicly listed cosmetics firms. Based on these sources, brands were grouped into four categories, namely global leaders, international conglomerates, leading domestic brands, and other brands. This classification enables the evaluation of how brand reputation and market position may moderate user engagement. Brand-level posting intensity is measured as the number of posts associated with each brand during the sample period, which also captures the brand’s historical visibility. These variables allow us to test how different brand–creator configurations relate to engagement.
3.3.3. Content-Related Variables
Content variables are derived from the post text and the cover image.
(1) Text features
We extract and preprocess titles, captions, and hashtags using Python 3.12.4. (i) Text length and prompts. Text length is the number of characters in the cleaned post text. We also obtain tag_count, the number of topic tags attached to the post. Question_count, the number of question marks in the title and body, is used as a proxy for explicit conversational prompts. (ii) Brand mentions and pattern. We match brand names (Chinese, English, and common abbreviations) in the title, body, and hashtags, and construct a Mentioned Own Brand dummy equal to 1 if the creator account matches the focal brand name and 0 otherwise. Brand_pattern is a categorical variable capturing brand-mention patterns: 0 = no brand mentioned; 1 = only the focal brand mentioned (own brand only); 2 = both the focal brand and other brands mentioned (own + other brands); and 3 = only other brands mentioned (other brands only).
(2) Visual features
We derive visual variables from the first image because it serves as the cover in users’ feeds and is most likely to drive initial attention. We use the Tencent Image Recognition API to extract up to 11 high-confidence semantic tags per image (e.g., person, product, and indoor scene), which summarize salient visual elements. Most images received five or six tags, accounting for 51.61% (86,396 images) and 39.41% (65,940 images) of the total sample, respectively. We analyze these tags to identify the principal visual components and their distribution within the dataset.
Given prior research emphasizing facial representation, we also use the Tencent Facial Recognition API, which identified faces in 39,141 images. Num_faces is the number of detected faces in the cover image. On average, each cover image featured approximately 0.37 faces; 72% of images contained no face, 23% contained exactly one face, and the maximum was five faces per image. Avg_age, Avg_expression, Avg_attractiveness, and Avg_gender_score are the average values of age, expression, attractiveness, and gender scores across detected faces, where applicable. For images containing multiple faces, we average the facial attributes, including age, smiling expression, attractiveness score, gender, image quality score, and hairstyle length. These variables capture whether and how human faces and their attributes are used in visual self-presentation.
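Returning to the text features in item (1), the counting and brand-matching rules can be sketched as follows. This is a minimal illustration under stated assumptions: text cleaning is reduced to whitespace removal, both ASCII and full-width question marks are counted, brand matching is a case-insensitive substring test, and the brand aliases (`BrandA`, `品牌A`, `BrandB`) are placeholders rather than brands from the sample.

```python
import re


def text_features(title: str, body: str, hashtags: list) -> dict:
    """Compute text length, tag count, and question count for one post."""
    text = f"{title} {body}"
    cleaned = re.sub(r"\s+", "", text)  # assumption: cleaning = dropping whitespace
    return {
        "text_length": len(cleaned),
        "tag_count": len(hashtags),
        # Assumption: count ASCII and full-width question marks alike.
        "question_count": text.count("?") + text.count("？"),
    }


def brand_pattern(text: str, focal_aliases: list, other_aliases: list) -> int:
    """Return the brand-mention pattern code (0, 1, 2, or 3) defined in the text."""
    lowered = text.casefold()
    has_focal = any(alias.casefold() in lowered for alias in focal_aliases)
    has_other = any(alias.casefold() in lowered for alias in other_aliases)
    if has_focal and has_other:
        return 2  # own + other brands
    if has_focal:
        return 1  # own brand only
    if has_other:
        return 3  # other brands only
    return 0      # no brand mentioned


features = text_features("Is it good?", "Yes ok", ["#lipstick", "#review"])
pattern = brand_pattern("Comparing branda and BrandB", ["BrandA", "品牌A"], ["BrandB"])
```

Plain substring matching can produce false positives when a short alias appears inside an unrelated word, so a production pipeline would likely need stricter matching rules than this sketch.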
3.3.4. Timing Variables
To operationalize temporality affordances, we code posting time at multiple temporal granularities, capturing within-day, weekly, within-month, seasonal/quarterly, and holiday-window patterns of attention. Specifically, we include (i) season and quarter dummies; (ii) month of year; (iii) day-of-month groups (early month: days 1–4; mid-month; and late month); (iv) day-of-week dummies (Monday–Sunday); and (v) time-of-day groups (e.g., early morning, daytime, evening, and late night). These variables allow us to test whether posts that are timed to coincide with periods of elevated attention are more likely to achieve high engagement.
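A minimal sketch of how such timing variables might be derived from a local timestamp is given below. Only the early-month window (days 1–4) is taken from the text; the mid/late cut-off at day 24, the hour boundaries for the time-of-day groups, and the month-to-season mapping are assumptions made for illustration.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
# Assumption: meteorological seasons for the Northern Hemisphere.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}


def day_of_month_group(day: int) -> str:
    if day <= 4:   # early-month window stated in the text
        return "early"
    if day <= 24:  # assumed cut-off between mid-month and late month
        return "mid"
    return "late"


def time_of_day_group(hour: int) -> str:
    # All hour boundaries below are assumed for illustration.
    if 5 <= hour < 9:
        return "early_morning"
    if 9 <= hour < 18:
        return "daytime"
    if 18 <= hour < 23:
        return "evening"
    return "late_night"


def timing_features(ts: datetime) -> dict:
    """Derive categorical timing variables from a local-time timestamp."""
    return {
        "season": SEASONS[ts.month],
        "quarter": (ts.month - 1) // 3 + 1,
        "month": ts.month,
        "day_of_month_group": day_of_month_group(ts.day),
        "weekday": WEEKDAYS[ts.weekday()],
        "time_of_day": time_of_day_group(ts.hour),
    }


sunday_evening = timing_features(datetime(2023, 12, 3, 20, 30))
```

Each categorical output would then be expanded into dummy variables before entering a regression, with one level omitted as the reference category.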
3.4. Model Specification
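To assess how creator, content, and timing relate to user engagement, we estimate a series of linear regression models at the post level:

$$Y_i = \beta_0 + \mathbf{Creator}_i^{\top}\boldsymbol{\beta}_1 + \mathbf{Content}_i^{\top}\boldsymbol{\beta}_2 + \mathbf{Timing}_i^{\top}\boldsymbol{\beta}_3 + \mathbf{Controls}_i^{\top}\boldsymbol{\gamma} + \varepsilon_i,$$

where $Y_i$ is one of the engagement outcomes for post $i$ (Total_engagement, Engagement_rate, or Int_per_cost, each entered in natural logarithms as described in Section 3.3.1); $\mathbf{Creator}_i$ is the vector of creator-related variables; $\mathbf{Content}_i$ is the vector of content-related variables; $\mathbf{Timing}_i$ is the vector of timing variables; $\mathbf{Controls}_i$ are additional controls (e.g., brand-level posting intensity, log estimated ad cost where applicable); and $\varepsilon_i$ is the error term.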
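To make the estimation step concrete, the sketch below fits a post-level linear regression by ordinary least squares with a dummy-coded creator tier and one continuous predictor. It is a dependency-free illustration, not the authors' estimation code: the toy data, the coefficient values, and all function names are hypothetical, and the outcome is generated without noise so that the recovered coefficients can be checked directly.

```python
def dummy_code(value, levels):
    """Dummy-code a categorical value; the first level is the omitted reference."""
    return [1.0 if value == level else 0.0 for level in levels[1:]]


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (perfect collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(features, outcome):
    """Return OLS coefficients (intercept first) from the normal equations."""
    x = [[1.0] + row for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, outcome)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical toy data: creator tier and log text length for six posts.
TIERS = ["casual", "mid-tier", "top-tier"]  # "casual" is the reference level
posts = [("casual", 3.0), ("top-tier", 4.0), ("casual", 5.0),
         ("top-tier", 3.5), ("mid-tier", 4.5), ("mid-tier", 6.0)]
features = [dummy_code(tier, TIERS) + [ln_len] for tier, ln_len in posts]

# Noise-free outcome built from made-up coefficients:
# intercept, mid-tier dummy, top-tier dummy, log text length.
true_beta = [6.0, 0.3, 0.5, -0.2]
ln_total_engagement = [
    true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], row))
    for row in features
]
beta_hat = ols(features, ln_total_engagement)
```

Because one tier is omitted as the reference category, the tier coefficients are read as log-point differences relative to casual users. With real data, a statistical package that also reports standard errors would be the natural choice.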
3.5. Descriptive Statistics
The final dataset contains 138,713 image–text posts. Creators have on average 59,073 followers (median = 12,323), with a highly skewed distribution (maximum 17.3 million). The platform’s estimated advertising quotation averages 5856 CNY per post (median = 1200). Posts generate on average 21,576 estimated views and 1734 total interactions, including 1116 likes, 369 favorites, 178 comments, and 71 shares. The mean Engagement_rate is 73.5 interactions per 1000 views (median = 66.3). Most posts do not contain @-mentions, and only about 9% include at least one question mark. Approximately 28% of cover images contain at least one human face, with detected faces having an average age around 24 years and relatively high algorithmic attractiveness scores. These statistics are consistent with Xiaohongshu’s positioning as a young, female-skewed beauty and lifestyle community.
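Two quick arithmetic checks help in reading these figures. The reported interaction components sum to the reported total, and the ratio of mean interactions to mean views (about 80.4 per 1000 views) need not equal the mean of post-level rates (73.5), because a mean of ratios generally differs from a ratio of means. The two-post example below is hypothetical and only illustrates that distinction.

```python
from statistics import mean

# Reported sample means by interaction type.
likes, favorites, comments, shares = 1116, 369, 178, 71
total_interactions = likes + favorites + comments + shares

# Ratio of the reported means, expressed per 1000 estimated views.
ratio_of_means = 1000 * total_interactions / 21_576

# Hypothetical two-post example: (interactions, estimated views).
toy_posts = [(50, 1_000), (2_000, 20_000)]
mean_of_rates = mean(1000 * i / v for i, v in toy_posts)
pooled_rate = 1000 * sum(i for i, _ in toy_posts) / sum(v for _, v in toy_posts)
```

In the toy example the per-post rates are 50 and 100, so their mean is 75, whereas pooling interactions and views gives a higher rate because the high-view post dominates the pooled ratio.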
5. Discussion and Implications
5.1. General Discussion
Social commerce platforms have become a core channel for brand dissemination. A key issue for brand managers is whether the choice of creator, content, and timing can generate high-engagement “hot posts.” Drawing on Technology Affordance Actualization (TAA) theory, this study proposes the Creator–Content–Timing (CCT) framework, which explains how the affordances of visibility, interactivity, commercial collaboration, multimodality, self-presentation, and temporality are actualized. Through automated text and image analysis of 138,713 Xiaohongshu posts, the study shows how creator tier, brand status, and collaboration mode relate to user engagement.
First, regarding creator–brand collaboration, the study finds that collaborating with creators who have large follower bases does not necessarily lead to higher user engagement. Nano- and micro-influencers and non-sponsored posts show higher engagement per exposure and higher engagement per unit of cost. Consistent with previous research, mid-sized creators often outperform both small and large influencers in interaction efficiency. While mid- and top-tier influencers generate the highest total interactions, smaller creators achieve the highest interaction rates once cost and exposure are taken into account.
Second, in terms of content, this study identifies three key pathways for realizing affordances, including interaction affordance, multimodal self-presentation, and commercial intent. Post length is negatively correlated with total interaction volume and rate. Hashtags and specific visual elements, especially young female faces, attract more user attention and promote interaction. Compared to explicit promotional content, posts with ambiguous or hidden brand identifiers tend to generate more engagement.
Third, timing matters. Posts published during leisure times (such as Sunday or end of month) tend to generate more engagement than posts published during busy periods (such as weekdays or autumn).
Fourth, the clustering analysis based on the CCT framework shows that high engagement is concentrated in a few recurring CCT combinations, indicating that affordance actualization is not random or unpredictable but is shaped by creator choice, content cues, and timing.
5.2. Theoretical Implications
First, this study extends the Technology Affordance Actualization (TAA) perspective from a focus on platform functions to a multimodal social commerce context. Prior TAA research has primarily emphasized platform-level affordances, whereas our findings illustrate how such affordances are actualized through choices of creator, content cues, and posting timing (CCT). Building on this perspective, we operationalize and quantify CCT-related variables to capture affordance actualization pathways in practice.
Second, we contribute to the influencer marketing literature by distinguishing engagement volume, engagement rate, and engagement cost-efficiency (interactions per unit cost, Int_per_cost). Our results show that ordinary users and ordinary-user collectives achieve higher engagement rates per exposure and exhibit cost-efficiency advantages relative to other creator tiers. In addition, within our sample, sponsored posts demonstrate significantly lower interaction efficiency and cost-efficiency than non-sponsored posts. Moreover, using a clustering approach based on creator type and posting frequency, we identify three affordance-actualization pathways—coverage-driven, activity-driven, and resource-constrained—which provide a comparative framework for understanding heterogeneous routes to highly engaging posts.
Third, in social media content marketing research, we integrate machine learning, text mining, and computer vision to quantify multiple dimensions of multimodal cues using a Xiaohongshu dataset. The results suggest that high engagement is not purely random; rather, it is systematically associated with different types of CCT configurations. This offers a data-driven framework for explaining engagement differences with multimodal evidence.
5.3. Practical Implications
First, our findings suggest a differentiated allocation strategy for influencer selection and budget deployment. Brands may assign high-budget campaigns aimed at broad reach to top-tier influencers, while allocating efficiency-oriented tasks (e.g., engagement efficiency and community maintenance) to nano- and micro-influencers. Brands should also pay greater attention to non-sponsored content in their content portfolio. The CCT framework can serve as a planning and evaluation tool to coordinate creator selection, content design, and scheduling decisions.
Second, content design should balance exposure and engagement. Compared with lengthy and information-dense copy, concise and focused messages—combined with clear questions and an appropriate number of hashtags—tend to be more conducive to interaction. Over-emphasizing a single focal brand may reduce engagement, whereas moderating brand salience or embedding the focal brand in a multi-brand context may mitigate the “single-brand penalty.” Visually, face-related cues may attract attention and facilitate engagement; however, higher facial attractiveness does not automatically translate into higher efficiency. Authentic visual presentations that resonate with users may yield more stable effects.
Third, brands should optimize posting schedules. Highly engaging posts are more likely to emerge during users’ leisure time. Shifting content to lower-utilization yet higher-efficiency time windows (e.g., Sundays, specific winter periods, or end-of-month windows) may improve engagement outcomes without increasing the budget.
5.4. Societal and Ethical Considerations
First, algorithm-generated facial attractiveness scores are treated solely as technical variables; however, the training data may embed cultural and gendered biases. If such scores are used directly for content optimization and decision-making, they may reinforce narrow esthetic standards and exclusionary norms. Platform designers and brands should therefore apply these signals with caution.
Second, several variables in this study rely on platform algorithms and third-party tools. Estimated views and advertising costs are taken from the HUITUNDATA platform, and the cost measure reflects quoted prices rather than actual spending. Facial attributes (e.g., age, expression, and attractiveness) are produced by commercial computer-vision APIs. These measures may contain unknown noise, commercial rules, or cultural bias. Although we conducted multiple robustness checks (including alternative outcomes and multicollinearity diagnostics), measurement error and algorithmic bias cannot be fully ruled out. Future research may integrate platform log data, controlled advertising campaigns, or human-coded image and text features to validate and refine these measures, and to more explicitly audit fairness and bias in appearance-related visual attributes.
In sum, on visual social commerce platforms, user engagement can be understood as an outcome of affordance actualization. The alignment among creator, content, and timing significantly shapes actualization efficiency. Brands may dynamically adjust CCT configurations to guide creator selection, content design, and scheduling optimization.