A Statistical Framework for Modeling Behavioral Engagement via Topic and Psycholinguistic Features: Evidence from High-Dimensional Text Data

Li, Dan; Zhang, Yi

doi:10.3390/math13152374

by Dan Li ¹,* and Yi Zhang ¹,²

¹ College of Economics and Management, Taiyuan University of Technology, Taiyuan 030024, China

² School of Business Liu Guojun School of Management, Changzhou University, Changzhou 213159, China

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(15), 2374; https://doi.org/10.3390/math13152374

Submission received: 12 June 2025 / Revised: 11 July 2025 / Accepted: 21 July 2025 / Published: 24 July 2025

(This article belongs to the Special Issue Machine Learning and Statistical Methods to Prediction and Optimal Decision-Making)

Abstract

This study investigates how topic-specific expression by women delivery riders on digital platforms predicts their community engagement, emphasizing the mediating role of self-disclosure and the moderating influence of cognitive and emotional language features. Using unsupervised topic modeling (Top2Vec, Topical Vectors via Embeddings and Clustering) and psycholinguistic analysis (LIWC, Linguistic Inquiry and Word Count), the paper extracted eleven thematic clusters and quantified self-disclosure intensity, cognitive complexity, and emotional polarity. A moderated mediation model was constructed to estimate the indirect and conditional effects of topic probability on engagement behaviors (likes, comments, and views) via self-disclosure. The results reveal that self-disclosure significantly mediates the influence of topical content on engagement, with emotional negativity amplifying and cognitive complexity selectively enhancing this pathway. Indirect effects differ across topics, highlighting the heterogeneous behavioral salience of expressive themes. The findings support a statistically grounded, semantically interpretable framework for predicting user behavior in high-dimensional text environments. This approach offers practical implications for optimizing algorithmic content ranking and fostering equitable visibility for marginalized digital labor groups.

Keywords: self-disclosure; topic modeling; moderated mediation; cognitive complexity; emotional polarity; women gig workers; community engagement; psycholinguistics; text-based prediction

MSC: 62J99; 91E45; 90B50

1. Introduction

The proliferation of high-dimensional text data on digital platforms has created new opportunities and challenges for predictive modeling and decision making under uncertainty. As online expression becomes a proxy for latent user behavior, structured modeling of unstructured text has gained prominence in computational social science, requiring the integration of dimensionality reduction, probabilistic inference, and latent semantic analysis [1,2].

Topic modeling, especially embedding-based algorithms such as Top2Vec (Topical Vectors via Embeddings and Clustering), offers a robust framework for unsupervised feature extraction by mapping linguistic data into continuous vector spaces and identifying dense clusters via manifold learning and density-based clustering. Compared to generative models like Latent Dirichlet Allocation (LDA), Top2Vec leverages semantic embeddings and hierarchical density estimation, enabling the automatic discovery of coherent themes without pre-specifying the number of components [3,4]. While topic models uncover thematic structures, they do not, by themselves, explain downstream behavioral outcomes. To address this gap, conditional process modeling, such as moderated mediation analysis, has emerged as a powerful tool in statistical causal inference, enabling the estimation of interaction effects and indirect pathways across hierarchical models. In particular, Hayes’ PROCESS framework formalizes multistage dependencies with bootstrapped confidence intervals, enhancing the interpretability of complex interactions in behavioral data [5].

Recent advances in text regression have increasingly leveraged sparse modeling techniques, particularly LASSO and its variants, to handle the curse of dimensionality and extract interpretable signals from high-dimensional textual data [6,7,8]. A common strategy involves feeding topic proportions from unsupervised models (e.g., LDA) into supervised pipelines to predict numeric outcomes, as in Bulut’s (2014) Topic machine for ad conversions [9] or Freo and Luati’s (2024) evaluation of LASSO-type selectors for short-text pricing tasks [6]. However, such pipelines typically assume linear, unmediated relationships between topic features and outcomes, neglecting both the semantic richness of user-generated expressions and the latent psychological or behavioral mechanisms involved.

To address these limitations, this study advances the field in three substantial ways. First, building on critiques of LDA’s limitations in sparse and affect-rich environments, the paper adopts Top2Vec with contextual embeddings to generate semantically coherent themes tailored to short, fragmented worker discourse. Second, instead of plugging topics directly into regression, we embed them in a moderated mediation process model, consistent with advances in high-dimensional causal inference [7,8], capturing how topic-level experiences shape engagement via self-disclosure and are moderated by cognitive complexity and emotional polarity. This structure goes beyond standard LASSO pipelines, aligning with recent calls to bridge sparse modeling with psychological process tracing [10]. Third, focusing on women platform workers, an under-represented yet policy-relevant population, adds a socially grounded dimension to sparse text regression. Overall, the study synthesizes advances in embedding-based topic modeling [11], causal sparse learning, and time-aware LASSO optimization [7], producing a replicable behavioral framework for digital labor research.

At a conceptual level, the integration of semantic topic distributions with moderated mediation frameworks allows for the joint modeling of latent constructs (e.g., self-disclosure) and observable outcomes (e.g., user engagement), where topic probability vectors serve as predictors and psycholinguistic features act as moderators. This modeling strategy provides both explanatory and predictive power, especially when dealing with emotionally loaded and context-dependent textual inputs. Recent studies have begun to combine natural language processing (NLP) and psycholinguistic modeling to investigate outcomes such as marketing virality, mental health detection, and public opinion diffusion [2,4,12,13]. Research shows that semantic salience and stylistic coherence interact to shape attention allocation and feedback behavior across digital contexts. For instance, studies in message effectiveness and organizational psychology reveal that message content and its presentation style jointly determine audience engagement, confirming that semantic and affective components act as statistically separable but interdependent predictors [14,15,16,17]. However, few have developed structured models that explicitly integrate semantic content (e.g., topic embeddings) and expressive style (e.g., linguistic indicators) within a moderated mediation framework to predict real-world engagement outcomes with statistical rigor.

In this study, the combined modeling approach is applied to the expressive behaviors of a marginalized labor group: Chinese women delivery riders. Despite representing a growing but under-recognized segment of platform labor, female couriers face systemic constraints, including algorithmic visibility biases, platform design inequities, and limited access to feedback mechanisms [18,19]. These constraints not only reduce participation rates but also affect digital visibility and symbolic presence on public forums. Preliminary observations suggest that women’s posts often receive significantly less engagement than those of male riders, indicating a pattern of expressive disparity that may be driven by both content and stylistic cues [20]. Understanding how these cues influence attention and feedback is therefore central to designing more inclusive digital participation environments.

Drawing on a dataset of 2144 posts from Baidu Tieba, the study explores how women riders express their work-related experiences and how these expressions translate into social feedback, measured via likes, replies, and views. Using Top2Vec for topic extraction and LIWC (Linguistic Inquiry and Word Count)-based psycholinguistic coding [21], a conditional process model is built that evaluates how work experience themes ($X$) influence engagement ($Y$) through the mediating role of self-disclosure ($M$), moderated by cognitive complexity ($W_{1}$) and emotional polarity ($W_{2}$). The paper poses the following research questions:

RQ1: Which types of work experience content are more likely to elicit high levels of self-disclosure?
RQ2: Does the intensity of self-disclosure predict higher engagement (e.g., views, comments, and likes)?
RQ3: Do cognitive and affective linguistic features moderate these expression–engagement pathways?

This paper contributes in three key ways. First, it introduces a scalable framework that embeds semantically meaningful topics into a moderated-mediation model, moving beyond direct LASSO-topic regressions. Second, it reveals that semantic themes and expressive style jointly influence engagement, providing interpretable levers for digital HRM. Third, it offers practical insights for optimizing content delivery, interface communication, and inclusive design in gig-platform ecosystems.

The remainder of this article is organized as follows: Section 2 outlines the theoretical background and hypothesis development. Section 3 describes the modeling framework, variable construction, and estimation approach. Section 4 presents the empirical results. Section 5 discusses implications. Section 6 concludes.

2. Theoretical Background and Hypotheses

To understand the mechanisms by which expressive textual features influence community engagement, this study draws upon two complementary theoretical frameworks: Self-Disclosure Theory and Social Exchange Theory. Together, they offer a coherent conceptual structure for modeling how digital content—particularly user-generated narratives—elicits social responses in online labor platforms.

2.1. Self-Disclosure of Women Delivery Riders’ Work Experience

In the platform economy, takeaway riders often work in flexible, task-based, or self-employed roles. However, these jobs are frequently characterized by institutional precarities such as vague platform rules, lack of employment protection, and insufficient managerial support [22]. In China, an increasing number of women have joined this digital labor force, forming a growing yet marginalized segment of the gig workforce. These women riders not only face algorithmic pressure and customer demands but also a “double dilemma” of labor vulnerability and constrained expressive agency [23]. Within this context, some women riders have begun to articulate their work experiences, release emotional tension, and seek peer validation on external digital platforms such as Baidu Tieba and WeChat groups. These off-platform disclosures have gradually evolved into informal, yet vital, support and feedback mechanisms [24].

Self-disclosure, defined as the voluntary communication of personal experiences, emotions, or opinions during social interactions [25], has become a prevalent expressive strategy among women delivery riders. According to Self-Disclosure Theory, individuals strategically reveal personal information to achieve psychological and social goals, such as seeking support, gaining visibility, or constructing identity [26,27]. In digital labor contexts, such disclosures are not merely emotional catharses but also reflect attempts to navigate systemic marginalization. The subject matter of one’s expression—whether it concerns platform injustice, algorithmic control, or customer conflict—plays a crucial role in shaping the intensity and emotional depth of disclosure [20,28]. Content that involves risk, unfair treatment, or emotional stress tends to activate stronger motivations for personal narrative sharing [16].

Women platform workers are especially inclined to adopt expressive styles that emphasize identity, empathy, and emotional transparency. Studies have shown that female workers often employ first-person perspectives, affective language, and detail-oriented storytelling, features that distinguish their narratives as authentic and socially urgent [23,29,30]. These expressions not only facilitate emotional self-regulation but also serve as mechanisms for asserting their visibility and right to be heard. Within the current limitations of on-platform feedback systems, self-disclosure becomes a vital mode of symbolic resistance and community signaling. Therefore, identifying which themes in women riders’ work narratives most strongly evoke self-disclosure is critical for understanding the psycholinguistic foundations of platform labor expression. It also addresses a broader structural challenge: how the platform economy can recognize and respond to the voices of digitally marginalized workers.

Building on Self-Disclosure Theory, the study further argues that topics involving risk, stress, or injustice (such as algorithmic control, regulatory opacity, or customer mistreatment) tend to trigger deeper personal sharing, as individuals seek support, validation, or symbolic recognition [24,25]. Psycholinguistic research shows that these high-stakes disclosures are often accompanied by identifiable linguistic features, including first-person pronouns, affective language, and elaborative reasoning (e.g., causal connectives and insight words). These features, commonly captured via LIWC metrics, not only reflect emotional authenticity and cognitive effort but also act as social signals that enhance interpretability and credibility [31,32]. In digital settings, such expressions are empirically linked to increased audience responsiveness and social reciprocity, further reinforcing the disclosure–engagement dynamic. Accordingly, this paper proposes the following hypothesis:

H1: The thematic content of women delivery riders’ work experiences significantly influences the intensity of self-disclosure in online posts.

2.2. Platform Expression: Work Experience, Self-Disclosure, and Community Engagement

In the context of digital labor platforms, self-disclosure functions not only as a means of personal expression but also as a strategic mechanism to elicit social feedback and build interactive relationships [33]. According to Social Exchange Theory [34], individuals engage in interactions based on expected rewards and perceived reciprocity, especially when communication carries emotional value or social risk. In online labor communities, expressions that signal vulnerability, effort, or authenticity are often met with increased responsiveness, as they create a sense of shared experience or moral obligation [35].

Drawing on Social Exchange Theory, the paper models engagement as a form of social reciprocity. Posts that signal emotional vulnerability, cognitive effort, or authenticity are more likely to be interpreted by community members as personally costly or socially valuable contributions [36,37,38]. In return, audiences may respond with reciprocal behaviors—such as likes, supportive comments, or content amplification—as a form of symbolic reward [39]. This exchange logic implies that engagement is not merely reactive but grounded in perceived expressive investment, making self-disclosure a key mediating mechanism between thematic content and community interaction.

First, the content theme of a post critically shapes its potential to trigger community engagement. Posts addressing topics with high emotional salience—such as performance anxiety, customer conflict, or institutional injustice—are more likely to activate users’ attention and stimulate affective responses [40,41]. These emotionally and socially charged themes serve as exchange signals, indicating that the expressers are offering personally costly information, thereby inviting reciprocal engagement from others.

Second, self-disclosure itself acts as an interactional catalyst. Empirical studies have shown that high levels of disclosure—especially those marked by first-person language, affective vocabulary, and detailed personal narratives—enhance the perceived credibility, sincerity, and emotional intensity of the message [40,42]. From a social exchange perspective, such disclosures create an imbalance of openness that prompts others to reciprocate through likes, comments, or support. Furthermore, disclosure enhances interpretability by providing contextual and psychological cues that help the audience relate to or empathize with the speaker [43].

Crucially, the relationship between work experience topics and community engagement may be indirect, mediated by the level of self-disclosure. High-risk or emotionally loaded topics tend to elicit deeper disclosure, which, in turn, increases the likelihood and strength of engagement responses. This content–disclosure–engagement chain embodies a psycholinguistic feedback loop grounded in perceived social value and affective reciprocity [33,44]. Among women riders in particular, emotional disclosures are often infused with the desire for recognition, solidarity, and empowerment in the face of labor oppression and gendered marginality [20].

In sum, both the thematic relevance and the disclosure intensity of labor-related expressions determine whether and how platform communities engage with them. These patterns are especially salient among women delivery riders, whose expressions are often more affectively framed and socially motivated. Based on this rationale, the following hypotheses are proposed:

H2a: Self-disclosure increases the likelihood and intensity of community engagement.

H2b: Self-disclosure mediates the relationship between work experience topics and online community engagement.

2.3. Cognitive Complexity as a Linguistic Moderator

Beyond the semantic content of disclosures, the cognitive complexity of language—defined as the degree of causal reasoning, structural coherence, and analytical elaboration in a post—plays a pivotal role in shaping how expressions are both constructed and socially interpreted [45]. In psycholinguistic research, cognitively complex narratives are often associated with deeper cognitive effort, epistemic engagement, and uncertainty resolution. These features enhance the perceived credibility, intentionality, and seriousness of the message, making them more persuasive and socially valuable [32,46]. From a signaling theory perspective, cognitive complexity functions as a quality signal in digital environments. When platform workers articulate their experiences with greater coherence and elaboration, they reveal not only the factual content of their narratives but also their communicative investment and psychological engagement. Such signals are especially critical in algorithmically mediated labor contexts, where trustworthiness and message salience often rely on linguistic cues rather than formal credentials [47]. Importantly, cognitive complexity may modulate two distinct stages in the expressive process:

First, it influences the transformation of internal experience into outward disclosure. This extends the logic of the Elaboration Likelihood Model (ELM) beyond its traditional outcome focus. While the ELM primarily addresses how message elaboration affects persuasion and attitude change, our model applies its central tenets upstream, highlighting how individuals with higher cognitive capacity are more likely to engage in effortful expressive construction, particularly when facing complex or emotionally charged work experiences [48,49]. In this sense, cognitive elaboration acts as a precondition for self-disclosure, especially in contexts where expression entails social risk or epistemic demand (e.g., algorithmic injustice and institutional ambiguity). This idea aligns with theories of dual-threshold disclosure and social cognition, suggesting that high-barrier content is more likely to be externalized by users who possess sufficient processing capacity.

Second, cognitive complexity enhances the impact of disclosure on social engagement. Posts that exhibit higher levels of structural clarity, analytical depth, and linguistic sophistication are more likely to be perceived as sincere, valuable, and worthy of response. From a social exchange standpoint, such disclosures signal intentionality and reciprocity potential, thereby motivating community members to engage through likes, comments, or peer support [33,45]. In digital signaling environments, effortful expression is interpreted as a costly signal, increasing its persuasive weight and interpretive salience.

However, the moderating effect of cognitive complexity is not uniform across all experiential themes. Posts about regulatory barriers or platform mismanagement may benefit from greater elaboration to establish legitimacy and signal reasoned critique. In contrast, emotionally saturated topics—such as harassment, bodily harm, or customer abuse—may rely more on emotional immediacy and authenticity than on analytical sophistication. Excessive complexity in such contexts may even blunt empathic responses or reduce interpretive clarity [50,51]. Therefore, we expect the moderating role of cognitive complexity to vary across themes, reflecting the functional demands and emotional stakes of each topic. Based on this reasoning, the paper proposes the following hypotheses:

H3a: Cognitive complexity strengthens the effect of work experience topics on the intensity of self-disclosure.

H3b: Cognitive complexity amplifies the effect of self-disclosure on community engagement.

H3c: The moderating effect of cognitive complexity varies across different work experience topics.

2.4. The Moderating Role of Emotional Polarity

Emotion in digital expression functions not merely as a personal state but as a communicative signal with social consequences. In the context of digitally mediated labor discourse, the emotional polarity of language—particularly the degree of negative affective tone—plays a central role in shaping how self-disclosure is interpreted and responded to by platform communities. According to Social Sharing Theory, individuals are more likely to share experiences that are emotionally intense, especially those involving negative affect such as anger, fear, or frustration [52,53]. These emotions serve key social functions: they validate subjective experiences, foster collective understanding, and invite empathic alignment or moral judgment from others. In this way, emotional disclosures act as affective signals that not only convey internal states but also actively mobilize social responses [54,55].

For marginalized platform workers—particularly women riders—negative emotional polarity often reflects lived vulnerability and institutional marginality. Expressions of fear, anxiety, injustice, or burnout become both cathartic outlets and calls for recognition in an ecosystem that frequently lacks formal grievance mechanisms. Such disclosures offer communities a chance to align emotionally, share burdens, and signal solidarity, thus driving engagement behaviors like comments, upvotes, and support messages [33,56].

However, the amplifying effect of negative emotion is topic contingent. In posts dealing with algorithmic injustice, customer abuse, or labor exploitation, emotionally forceful expression may serve to emphasize urgency, raise moral salience, and elicit stronger social reaction. These topics inherently carry a high emotional load, and negative tone enhances the perceived sincerity, legitimacy, and deservedness of the disclosure. In contrast, for posts centered on routine logistics, technical issues, or mundane observations, excessive negativity may appear misplaced or melodramatic, undermining credibility and discouraging engagement. Thus, the communicative value of emotional polarity is not fixed but is instead dynamically conditioned by the thematic framing of the disclosure. Based on this integration of emotional framing and social sharing dynamics, the research proposes the following:

H4a: Negative emotional polarity enhances the impact of self-disclosure on community engagement.

H4b: The moderating effect of emotional polarity is contingent upon the work experience topics.

2.5. Path Modeling and Structural Assumptions

To formalize the hypothesized relationships, this study adopts a two-stage moderated mediation framework that captures how the semantic content and expressive form of user-generated posts influence community engagement. Specifically, the model estimates how work experience topics affect self-disclosure intensity, which, in turn, predicts engagement behaviors.

This sequential pathway is structured as follows:

In Stage 1, topics influence self-disclosure, with this effect moderated by cognitive complexity. This reflects the idea that semantically rich but cognitively elaborate expressions are more likely to prompt intense personal narratives (H1 and H3a).

In Stage 2, self-disclosure drives engagement, and this path is jointly moderated by both cognitive complexity and emotional polarity. These moderators affect how disclosure is interpreted, either as a credible signal of intent or as an affective trigger, consistent with H2b, H3b, and H4a. The model also allows for content-contingent moderation, wherein the effects of complexity and polarity are not uniform but vary depending on the thematic context of the expression (H3c and H4b).

Statistically, these relationships are encoded using the PROCESS macro (Model 4 and Model 58), which estimates both direct effects and conditional indirect effects through bootstrapped confidence intervals. This design enables us to disentangle the explanatory contributions of thematic content, disclosure intensity, and psycholinguistic framing, providing a robust account of expressive behavior in digital labor contexts.

Accordingly, Figure 1 summarizes the proposed path structure, highlighting both the mediated pathways and the moderating influences that shape the expressive impact of women riders’ narratives.
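
The two-stage structure described above can be illustrated with a minimal Python sketch. It is not the PROCESS macro itself: it fits two ordinary least-squares equations with NumPy and computes a percentile bootstrap interval for the conditional indirect effect. The function names, the synthetic data, and the coefficient values are illustrative assumptions, and the moderators are assumed to be mean-centered so that the value 0 represents an average post.

```python
import numpy as np


def ols(design, target):
    """Least-squares coefficients; `design` must already hold an intercept column."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def conditional_indirect(x, m, y, w1, w2, w1_at=0.0, w2_at=0.0):
    """Indirect effect of x on y through m at chosen moderator values.

    Stage 1 (assumed form): m = a0 + a1*x + a2*w1 + a3*x*w1
    Stage 2 (assumed form): y = b0 + c*x + b1*m + b2*w1 + b3*w2 + b4*m*w1 + b5*m*w2
    """
    ones = np.ones_like(x)
    a = ols(np.column_stack([ones, x, w1, x * w1]), m)
    b = ols(np.column_stack([ones, x, m, w1, w2, m * w1, m * w2]), y)
    a_path = a[1] + a[3] * w1_at                    # X -> M, conditional on W1
    b_path = b[2] + b[5] * w1_at + b[6] * w2_at     # M -> Y, conditional on W1, W2
    return a_path * b_path


def bootstrap_ci(x, m, y, w1, w2, w1_at=0.0, w2_at=0.0, n_boot=500, seed=0):
    """95% percentile bootstrap interval for the conditional indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)            # resample rows with replacement
        draws[k] = conditional_indirect(
            x[idx], m[idx], y[idx], w1[idx], w2[idx], w1_at, w2_at
        )
    return np.percentile(draws, [2.5, 97.5])


# Synthetic data with known (made-up) coefficients, for illustration only.
rng = np.random.default_rng(42)
n = 2000
x, w1, w2 = rng.normal(size=(3, n))
m = 0.5 + 0.4 * x + 0.1 * w1 + 0.2 * x * w1 + rng.normal(scale=0.05, size=n)
y = (1.0 + 0.3 * x + 0.5 * m - 0.1 * w1 + 0.2 * w2
     + 0.15 * m * w1 + 0.25 * m * w2 + rng.normal(scale=0.05, size=n))

effect = conditional_indirect(x, m, y, w1, w2)      # evaluated at W1 = W2 = 0
ci_low, ci_high = bootstrap_ci(x, m, y, w1, w2)
```

In this synthetic setup the indirect effect at average moderator values is the product of the simulated stage-1 and stage-2 slopes (0.4 × 0.5), so the estimate should land near 0.2. Evaluating the function at other values of `w1_at` and `w2_at` gives the conditional indirect effects that the hypotheses refer to.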

3. Data and Modeling Methods

This section defines the symbolic variables, outlines the statistical modeling framework, and describes the process by which semantic and psycholinguistic features were extracted and analyzed. The model integrates a topic-based predictor ($X$), a psycholinguistically derived mediator ($M$), and a multi-indicator outcome ($Y$), with cognitive complexity ($W_{1}$) and emotional polarity ($W_{2}$) serving as moderators for the two-stage conditional process; these symbolic variables are defined later in this section. Figure 1 illustrates the moderated mediation structure, where both the $X \to M$ and $M \to Y$ pathways are tested for moderation effects.

3.1. Data Collection and Preprocessing

The data for this study come from Baidu Tieba, a mainstream community-based information platform in China, and related online labor forums, and focus on public discussions posted by delivery riders about their daily work. Baidu Tieba is a social network platform characterized by open content, strong topic focus, and a stable interaction structure, making it an essential channel for flexible workers such as takeaway riders and online carriers to share their experiences, vent their emotions, and gain group support [57]. Compared with communities dominated by elite users (e.g., Zhihu), Tieba forums are characterized more by grassroots expression from the laborers’ perspective and have a higher user penetration rate, which provides a more representative data field for studying women riders in the Yangtze River Delta region.

Data were collected with a web crawler written in Python 3.10, which systematically captured all posts and replies in takeaway rider-related forums from January 2023 to March 2025 that contained keywords such as “women riders”, “delivery riders”, “riders”, “platform salary”, and “system deduction”. The collected content is posted independently by users as semi-structured text and includes key metadata such as posting time, user ID, content body, view count, comment count, and like count. The platform provides a gender field for user accounts; by combining this field with semantic features, posting language, and geographic location information, the study screened for text samples from users identified as “women takeout riders” and focused on posts in Yangtze River Delta communities.

The original sample consists of approximately 2144 main posts and 16,898 replies. During preprocessing and sample screening, invalid content (such as advertisements, emoji retweets, and entries with missing metadata) was eliminated. In addition, multiple posts associated with duplicate IDs were merged, and the maximum number of posts per user was limited to three. The final valid sample comprises 2144 main posts from women riders, along with the corresponding comments and views, which are used in the subsequent calculation of the engagement rate index.

In addition, to enhance the scientific control of the sample structure, the study employs a stratified random sampling strategy. This approach balances the sample distribution based on the dimensions of posting time (quarterly) and community activity (high vs. low interaction). This ensures that the obtained samples are representative and diverse in terms of time and information density.
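
As a rough illustration of the screening and sampling steps described above, the following pandas sketch caps each user at three posts and then draws a stratified random sample by quarter and activity level. The column names (`user_id`, `posted_at`, `comment_count`), the rule of keeping each user’s earliest posts, and the median split between high and low interaction are assumptions made for the example; the text does not specify them.

```python
import pandas as pd


def cap_posts_per_user(posts, max_posts=3):
    """Keep at most `max_posts` posts per user (assumption: the earliest ones)."""
    ordered = posts.sort_values(["user_id", "posted_at"])
    return ordered.groupby("user_id").head(max_posts)


def stratified_sample(posts, frac, seed=0):
    """Sample the same fraction from every quarter x activity-level stratum."""
    strata = posts.assign(
        quarter=posts["posted_at"].dt.to_period("Q").astype(str),
        # Assumption: "high interaction" means at or above the median comment count.
        high_activity=posts["comment_count"] >= posts["comment_count"].median(),
    )
    return strata.groupby(["quarter", "high_activity"]).sample(
        frac=frac, random_state=seed
    )


# Toy data, for illustration only.
demo = pd.DataFrame({
    "user_id": ["a"] * 5 + ["b"] * 2 + ["c"],
    "posted_at": pd.to_datetime([
        "2023-01-05", "2023-02-10", "2023-04-01", "2023-07-15", "2023-10-20",
        "2024-01-03", "2024-05-09", "2025-02-11",
    ]),
    "comment_count": [0, 4, 1, 9, 2, 7, 0, 3],
})

capped = cap_posts_per_user(demo)          # user "a" drops from 5 posts to 3
sampled = stratified_sample(capped, frac=0.5)
```

Sampling within each stratum, rather than from the pooled data, is what keeps quarters and activity levels represented in proportion.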

3.2. Variable Definitions

This section presents the mathematical framework used to model the relationship between work-related textual expressions and online community engagement. The proposed model integrates topic probabilities, psycholinguistic indicators, and platform metadata to evaluate how expressive features influence user interaction outcomes. A moderated mediation model is developed and formalized as follows:

Let the dataset consist of $n$ observations, where each post $i \in \{1, 2, \dots, n\}$ is associated with the variables in Table 1:

Although the raw data source consists of unstructured user-generated content (i.e., delivery riders’ forum posts), all variables used in the modeling process were aggregated at the daily level. Each observation in the dataset thus corresponds to a single day, not an individual post. Variables such as topic expression frequencies Topic 1 to Topic 11 (T1–T11), self-disclosure index (SI), cognitive complexity index (CCI), emotional polarity index (EPI), and engagement rate (ER) were transformed into continuous, time-series variables that capture aggregated discourse patterns per day.

This procedure ensured that the final dataset is low dimensional and relatively dense, satisfying the assumptions of OLS-based path analysis. As such, the subsequent PROCESS models applied to this dataset yield interpretable coefficients and valid bootstrapped confidence intervals for mediation and moderation paths [58].
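
The daily aggregation step can be sketched as follows. The example assumes that post-level scores are averaged within each calendar day; the text does not state the aggregation function, so the mean is an illustrative choice, and the column names are hypothetical.

```python
import pandas as pd


def aggregate_daily(posts, feature_cols):
    """Collapse post-level features to one row per calendar day (daily mean)."""
    day = posts["posted_at"].dt.floor("D")
    return posts.groupby(day)[feature_cols].mean()


# Toy post-level table: T1 is a topic probability, SI a self-disclosure score.
posts = pd.DataFrame({
    "posted_at": pd.to_datetime([
        "2024-03-01 08:15", "2024-03-01 21:40", "2024-03-02 12:00",
    ]),
    "T1": [0.25, 0.75, 0.5],
    "SI": [1.0, 3.0, 2.0],
})

daily = aggregate_daily(posts, ["T1", "SI"])   # two rows: 1 March and 2 March
```

Each row of `daily` is then one observation in the path model, which is why the modeling dataset is far smaller and denser than the raw text.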

3.3. Topic Modeling and Psycholinguistic Feature Extraction

To uncover latent themes in high-dimensional text data, this study employs Top2Vec, an unsupervised topic modeling algorithm that leverages semantic embeddings. Unlike traditional generative models such as LDA, Top2Vec integrates document and word vector spaces to identify dense topic clusters based on contextual meaning rather than word co-occurrence frequency [4]. Top2Vec offers several advantages that make it especially well suited for analyzing fragmented, affect-laden user-generated content:

No need to pre-specify the number of topics: The model automatically detects the optimal number of topics based on data topology.
Semantic coherence: By capturing contextual meaning through embeddings, Top2Vec improves interpretability, particularly valuable for short and emotionally expressive texts.
Scalability: The method is computationally efficient and robust when applied to large-scale, high-dimensional corpora [3,4].

3.3.1. Algorithmic Workflow of Top2Vec

Given a corpus $D = \{d_{1}, d_{2}, \dots, d_{n}\}$, Top2Vec performs the following major steps:

Document Embedding: Each post $d_{i} \in D$ is mapped into a high-dimensional vector space $ℝ^{m}$ using a pretrained embedding model. The paper used the pretrained Chinese BERT model (bert-base-Chinese) as the embedding model in Top2Vec, enabling multilingual sentence-level representation [11].

$v_{i} = \mathrm{Embed}(d_{i}) \in ℝ^{m}$

(1)
Dimensionality Reduction: The embedding matrix $V = {[v_{1}, v_{2}, \dots, v_{n}]}^{⊤} \in ℝ^{n \times m}$ is reduced to a lower-dimensional manifold using UMAP [3].

$\tilde{V} = \mathrm{UMAP}(V) \in ℝ^{n \times k}, \quad k < m$

(2)
Topic Discovery via Density Clustering: The reduced vectors $\tilde{V}$ are clustered using HDBSCAN, a density-based clustering algorithm that identifies variable-density topic groups without pre-specifying the number of clusters.
Topic Vector and Word Proximity: For each discovered topic $t_{j}$, a centroid vector $μ_{j}$ is computed. The most representative words $\{w_{1}, w_{2}, \dots, w_{s}\}$ are then selected based on cosine similarity to $μ_{j}$:

$\mathrm{sim}(w, t_{j}) = \cos(\mathrm{Embed}(w), μ_{j})$

(3)
Post Assignment: Each post $d_{i}$ is assigned to the nearest topic $t_{j}$ based on vector similarity, forming a categorical distribution over topics. The full pipeline is illustrated in Figure 2.
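The last two steps (centroid computation, word ranking by Equation (3), and post assignment) can be illustrated with a small pure-Python sketch. It assumes the clustering has already been done and uses hypothetical two-dimensional vectors. It does not call the Top2Vec library, and all function names are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of the document vectors in one cluster."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def top_words(word_vecs, mu, s):
    """Rank words by cosine similarity to the topic centroid mu and keep the top s."""
    return sorted(word_vecs, key=lambda w: cosine(word_vecs[w], mu), reverse=True)[:s]

def assign_topic(doc_vec, centroids):
    """Index of the centroid most similar to the document vector."""
    sims = [cosine(doc_vec, mu) for mu in centroids]
    return max(range(len(sims)), key=sims.__getitem__)

# Hypothetical toy vectors, already grouped into two clusters.
cluster_a = [[1.0, 0.1], [0.9, 0.2]]
cluster_b = [[0.1, 1.0], [0.2, 0.8]]
centroids = [centroid(cluster_a), centroid(cluster_b)]
word_vecs = {"order": [1.0, 0.0], "rain": [0.0, 1.0], "helmet": [0.3, 0.9]}

topic_b_words = top_words(word_vecs, centroids[1], s=2)
assigned = assign_topic([0.8, 0.3], centroids)
```

This sketch makes a hard assignment to the single nearest topic. A soft topic distribution could be obtained by normalizing the similarity scores, but the paper does not specify the normalization it used.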

To ensure that embedding and clustering choices were methodologically sound, we benchmarked multiple combinations of topic modeling pipelines. Top2Vec with Chinese BERT embedding and UMAP dimensionality reduction yielded the highest semantic coherence (NPMI = 0.52), substantially outperforming LDA-based alternatives (NPMI = 0.378). Moreover, the Top2Vec with BERT–UMAP–HDBSCAN achieved a Silhouette score of 0.37, indicating more compact and separable topic clusters than PCA- or t-SNE-based reductions (Silhouette < 0.22) (detailed in Appendix A Table A1). These results justify the adoption of BERT–UMAP–HDBSCAN as a theoretically and empirically grounded pipeline for modeling psycholinguistic structures in multilingual gig-work narratives.

In this study, Top2Vec identifies 11 dominant topics from 2144 women rider posts, each represented by keyword vectors and importance scores (see Appendix A Table A1). These topics are later used as input variables for statistical modeling (see Section 4), capturing various dimensions of work experience.

3.3.2. Psycholinguistic Feature Extraction by LIWC

Using the LIWC 2015 dictionary, the paper computes the following:

Self-disclosure intensity (M) was calculated from the breadth, depth, and length of each post as $SI = (Breadth + Depth) \times Length$ and then standardized, capturing both the richness and extent of personal expression [33].
Cognitive complexity (W₁) was calculated as the sum of LIWC cognitive process terms related to causation (because), insight (think), tentativeness (perhaps), certainty (definitely), and differentiation (different) [59].
Emotional polarity (negative) (W₂) was derived from the affective tone of posts using the formula $EPI = \frac{1 + pos}{2 + pos + neg}$ , centered at 0.5 [60], where “pos” and “neg” represent the number of positive and negative emotion words.
Engagement rate (Y), the dependent variable, was computed as (likes + comments)/views for each post. This metric reflects audience interaction normalized by visibility [61].

All linguistic features are z-standardized before modeling. Table 2 provides a summary of the main variables used in this study, including their symbol, names, definitions, and measurement methods.
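The index formulas above translate directly into code. The sketch below is illustrative: the function names are hypothetical, the LIWC counts are assumed to be already available, the use of the population standard deviation for z-scores is an assumption, and so is returning zero for posts with no views.

```python
from statistics import mean, pstdev

def self_disclosure(breadth, depth, length):
    """SI = (Breadth + Depth) x Length, before standardization."""
    return (breadth + depth) * length

def emotional_polarity(pos, neg):
    """EPI = (1 + pos) / (2 + pos + neg); equals 0.5 when pos == neg."""
    return (1 + pos) / (2 + pos + neg)

def engagement_rate(likes, comments, views):
    """ER = (likes + comments) / views; zero-view posts map to 0.0 (assumption)."""
    return (likes + comments) / views if views else 0.0

def z_standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical counts for three posts.
si_raw = [self_disclosure(2, 3, 40), self_disclosure(1, 1, 10), self_disclosure(4, 2, 25)]
si_z = z_standardize(si_raw)
epi_neutral = emotional_polarity(0, 0)
er_example = engagement_rate(likes=8, comments=2, views=100)
```

The additive smoothing in the EPI formula keeps the index defined for posts with no emotion words, which then sit exactly at the 0.5 midpoint.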

3.4. Moderated Mediation Modeling Framework

The paper defines a two-equation system to capture the mediation and moderation mechanisms, extending Hayes’ conditional process framework (Model 4 and Model 58):

Stage 1 (first-stage moderation):

M_{j} = α_{0} + α_{1} X_{j} + α_{2} (X_{j} \times W_{1 j}) + α_{3} (X_{j} \times W_{2 j}) + γ^{⊤} Z_{j} + ε_{1 j}

(4)

Stage 2 (second-stage moderation):

Y_{j} = β_{0} + β_{1} X_{j} + β_{2} M_{j} + β_{3} (M_{j} \times W_{1 j}) + β_{4} (M_{j} \times W_{2 j}) + θ^{⊤} Z_{j} + ε_{2 j}

(5)

In this model, the parameter $α_{2}$ captures the first-stage moderation effect of cognitive complexity ($W_{1}$) on the relationship between topical content ($X$) and self-disclosure ($M$), while $α_{3}$ captures the first-stage moderation effect of emotional polarity ($W_{2}$) on the same path. In the second stage, $β_{3}$ and $β_{4}$ represent the moderation effects of cognitive complexity ($W_{1}$) and emotional polarity ($W_{2}$), respectively, on the impact of self-disclosure ($M$) on engagement ($Y$); they quantify how the strength of self-disclosure’s impact on engagement varies with linguistic style. The residuals are $ε_{1 j}, ε_{2 j} \sim N(0, σ^{2})$.

Equation (4) models how topic content influences self-disclosure; Equation (5) captures the effect of self-disclosure and its interaction with the linguistic moderators, through the terms $M_{j} \cdot W_{1 j}$ and $M_{j} \cdot W_{2 j}$, on engagement outcomes. Together, these parameters quantify how both linguistic style and emotional expression moderate the strength and direction of content-to-engagement pathways through self-disclosure. The model structure allows us to assess conditional indirect effects of topical expressions, moderated by both cognitive and emotional linguistic features.

3.5. Indirect Effect Estimation and Bootstrapping

The conditional indirect effect of $X$ on $Y$ via $M$, moderated by $W_{1}$ and $W_{2}$, is given by:

{IE}_{j} = (α_{1} + α_{2} W_{1 j} + α_{3} W_{2 j}) \cdot (β_{2} + β_{3} W_{1 j} + β_{4} W_{2 j})

(6)
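Equation (6) is simply the product of two conditional slopes, which the sketch below evaluates at low, mean, and high moderator values. The coefficients are hypothetical placeholders chosen for illustration, not estimates from the paper, and the moderators are assumed to be standardized so that −1, 0, and +1 correspond to −1 SD, the mean, and +1 SD.

```python
def conditional_indirect_effect(a1, a2, a3, b2, b3, b4, w1, w2):
    """Equation (6): (a1 + a2*W1 + a3*W2) * (b2 + b3*W1 + b4*W2)."""
    first_stage = a1 + a2 * w1 + a3 * w2   # conditional X -> M slope
    second_stage = b2 + b3 * w1 + b4 * w2  # conditional M -> Y slope
    return first_stage * second_stage

# Hypothetical coefficients for illustration only.
coef = dict(a1=0.50, a2=0.20, a3=0.00, b2=0.30, b3=0.10, b4=0.05)

# Indirect effect at -1 SD, mean, and +1 SD of W1, holding W2 at its mean.
effects = {
    w1: conditional_indirect_effect(**coef, w1=w1, w2=0.0)
    for w1 in (-1.0, 0.0, 1.0)
}
```

Because both stages depend on $W_{1}$, the indirect effect is quadratic rather than linear in the moderator, which is why it is usually reported at several probe values.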

The paper followed Hayes’ PROCESS modeling framework, combining Model 4 (simple mediation: $X \to M \to Y$) and Model 58 (moderated mediation with first- and second-stage moderators $W_{1}$ and $W_{2}$). Specifically, Model 4 was used to test the mediating effect of self-disclosure ($M$) between topic content ($X$) and engagement ($Y$), while Model 58 examined how cognitive complexity ($W_{1}$) and emotional polarity ($W_{2}$) moderate the $X \to M$ and $M \to Y$ paths. All continuous variables were mean centered before creating interaction terms to reduce multicollinearity.

This study uses bias-corrected bootstrapping (5000 iterations) within the PROCESS macro, which performs OLS-based regression and generates confidence intervals for both conditional indirect effects and interaction terms, enabling direct interpretation of coefficient direction and magnitude. The following steps were implemented:

Simple slope analysis at −1 SD, mean, and +1 SD levels of $W_{1}$ and $W_{2}$ to interpret moderation strength;
Johnson–Neyman technique to locate regions of significance for continuous moderators;
All models include control variables $Z$ (time and user ID).
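To make the bootstrap logic concrete, the sketch below resamples observations with replacement and recomputes the simple indirect effect $a \cdot b$ each time. It is not the PROCESS macro: it uses plain percentile intervals rather than bias-corrected ones, omits moderators and controls, and runs on synthetic data generated only for illustration.

```python
import random

def _scp(u, v):
    """Centered sum of cross-products between two equal-length sequences."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))

def indirect_effect(x, m, y):
    """a*b, with a from M ~ X and b (coefficient on M) from Y ~ X + M, by closed-form OLS."""
    sxx, smm, sxm = _scp(x, x), _scp(m, m), _scp(x, m)
    sxy, smy = _scp(x, y), _scp(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        estimates.append(indirect_effect(
            [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic data with a true indirect effect of 0.8 * 0.5 = 0.4.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
m = [0.8 * xi + rng.gauss(0, 0.5) for xi in x]
y = [0.5 * mi + 0.1 * xi + rng.gauss(0, 0.5) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
lo, hi = bootstrap_ci(x, m, y)
```

An interval that excludes zero is the significance criterion used throughout the results tables.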

4. Empirical Results

To examine how women delivery riders’ work experience expressions influence community engagement, the research estimates a moderated mediation model for each of the eleven identified topics. The original theoretical model included cognitive complexity ($W_{1}$) and emotional polarity ($W_{2}$) as moderators in both stages of the process. However, preliminary analyses revealed that the interaction terms involving $W_{2}$ in the first stage (topic × emotional polarity → self-disclosure) were statistically insignificant across all topics ($p > 0.1$). They also indicated that the moderating effect of positive emotional polarity (EPI_P) is statistically insignificant across all tested pathways. Accordingly, the final model specification excludes $W_{2}$ from the first-stage moderation path and retains only negative emotional polarity (EPI_N) as a second-stage moderator, improving parsimony and statistical stability. The revised empirical model is:

M_{j} = α_{0} + α_{1} X_{j} + α_{2} (X_{j} \times W_{1 j}) + γ^{⊤} Z_{j} + ε_{1 j}

(7)

Y_{j} = β_{0} + β_{1} X_{j} + β_{2} M_{j} + β_{3} (M_{j} \times W_{1 j}) + β_{4} (M_{j} \times W_{2 j}) + θ^{⊤} Z_{j} + ε_{2 j}

(8)

where

$X_{j}$ : topic probability for observation $j$ (extracted via Top2Vec);
$M_{j}$ : self-disclosure intensity (measured using LIWC);
$Y_{j}$ : engagement rate;
$W_{1 j}$ : cognitive complexity index;
$W_{2 j}$ : negative emotional polarity index;
$α_{2}$ : first-stage moderation coefficient ( $X \to M$ modulated by $W_{1 j}$ );
$β_{3}$ : second-stage moderation coefficient ( $M \to Y$ modulated by $W_{1 j}$ );
$β_{4}$ : second-stage moderation coefficient ( $M \to Y$ modulated by $W_{2 j}$ );
$ε_{1 j}$ , $ε_{2 j}$ : residuals.

To structure the analysis of how topical expressions influence user engagement via self-disclosure, we draw on the indirect and conditional process model (Model 4 and Model 58) and decompose the indirect effect into four distinct components. These include simple mediation, first-stage moderation, second-stage moderation, and conditional moderated mediation. This decomposition allows us to trace not only whether, but also how and under what conditions, topic-level expression leads to increased engagement. The four effect types are summarized below.
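The formulas for the four components named above do not appear in this section. One reading that is consistent with Equations (6)–(8) and with the $ME_{1}$, $ME_{2}$, and $IE$ values reported in Sections 4.3.3 and 4.4 is sketched below. The symbol names follow the later text, but mapping these expressions to the formula numbers (9)–(12) cited later is an assumption.

```latex
\begin{align}
E      &= \alpha_{1}\,\beta_{2}
       && \text{(simple mediation)} \\
ME_{1} &= \alpha_{1} + \alpha_{2} W_{1}
       && \text{(first-stage moderation)} \\
ME_{2} &= \beta_{2} + \beta_{3} W_{1} + \beta_{4} W_{2}
       && \text{(second-stage moderation)} \\
IE     &= ME_{1} \cdot ME_{2}
       && \text{(conditional moderated mediation)}
\end{align}
```

Under this reading, $IE$ is Equation (6) with $α_{3} = 0$, which matches the revised model in Equations (7) and (8).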

Table 3 presents the empirical results for direct effects, mediated pathways, moderation by psycholinguistic variables, and conditional indirect effects across topics. The formulations allow us to assess both linear and interaction-based paths, aligning with the hypothesis structure developed in Section 3. All models were implemented using the Hayes PROCESS macro (Model 4 and Model 58) with 5000 bootstrap resamples and 95% bias-corrected confidence intervals. All continuous predictors were mean centered prior to constructing interaction terms to reduce multicollinearity.

4.1. Descriptive Statistics and Variable Correlation Analysis

To evaluate the distributional properties, multicollinearity risk, and inter-relationships among key variables, this study conducted descriptive statistical analysis and nonparametric correlation testing on the final sample of $N = 644$ days. All statistical procedures were implemented using SPSS 26.0.

Descriptive Distribution and Variance

As shown in Figure 3 and Table 4, the engagement rate (ER) exhibits a positively skewed distribution (skewness = 1.48), with a mean of 0.235 and standard deviation of 0.109, indicating that most posts receive low interaction, with a small number of posts attracting disproportionate attention. This long-tailed pattern is consistent with prior findings on expressive inequality in digital labor [42]. Although some topic expression variables exhibited right-skewed or zero-inflated distributions, the data were aggregated at the daily level and retained sufficient variance for reliable estimation. Since PROCESS relies on OLS-based path models with bootstrapped confidence intervals, it remains robust to mild sparsity and non-normality in predictors [53]. Therefore, the use of PROCESS estimation is methodologically valid in this context.

As Figure 3 and Table 4 show, the self-disclosure index (SI) also demonstrates strong non-normality (skewness = 4.81; kurtosis = 35.95), with values ranging from 0 to 259.23. These extreme values reflect heterogeneity in riders’ expressive behaviors, with some women riders producing much more elaborate and personal posts than others. Topic-related variables (T1–T11), derived from Top2Vec probability distributions, also show sparse activation patterns, where most topics occur infrequently per post. For example, T1 (Service Regulations and Skill Training) has the highest average frequency (mean = 1.798), whereas T7 (Risk Perception and Safe Riding) and T10 (Delivery Process Feedback) have lower means (≤0.2), suggesting unequal salience of lived experiences across themes. Table 4 reports descriptive statistics and multicollinearity diagnostics (VIFs) for all predictors, including topical variables (T1–T11), psycholinguistic indices (CCI, EPI_N, and EPI_P), and the self-disclosure mediator (SI). As ER serves as the dependent variable, its VIF is not reported. All Variance Inflation Factors (VIFs) for predictor variables remain below 3.2, well under the threshold of 5, confirming no serious multicollinearity in subsequent regression analyses.

To further assess inter-variable associations, Figure 4 and Appendix A Table A3 present a correlation heatmap based on Kendall’s Tau-b, which is suitable for the non-normal distributions observed in our variables. The results reveal a moderate positive correlation between self-disclosure (SI) and engagement rate (ER) ($τ = 0.47$, $p < 0.01$), supporting the idea that more self-revealing posts attract higher interaction. Cognitive complexity (CCI) is positively associated with both SI ($τ = 0.61$, $p < 0.01$) and ER ($τ = 0.52$, $p < 0.01$), suggesting that analytical language use corresponds to deeper disclosure and broader visibility. Emotional polarity indices exhibit inverse trends: negative polarity (EPI_N) is positively correlated with ER ($τ = 0.18$, $p < 0.01$), whereas positive polarity (EPI_P) exhibits a weak negative correlation with SI ($τ = -0.01$, $p < 0.01$). This implies that emotionally negative content is more expressive and likely to engage others, consistent with predictions from social sharing theory [53]. Topic variables such as T1 (Service Regulations and Skill Training), T2 (Income Dependency Behavior), and T4 (Technological Control and Labor Burden) correlate significantly with SI and ER, implying that content with institutional or financial stress may trigger greater expressive intensity and thus interaction.
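For readers unfamiliar with the statistic, Kendall’s Tau-b can be computed by counting concordant and discordant pairs and correcting the denominator for ties. The sketch below is a simple quadratic-time illustration on toy data; it is not the SPSS routine used in the study.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue          # tied on both variables: contributes to neither factor
            elif dx == 0:
                ties_x += 1       # tied on x only
            elif dy == 0:
                ties_y += 1       # tied on y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0

# Toy example with one tie in each variable.
tau = kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3])
```

The tie correction matters here because zero-inflated topic variables produce many tied pairs, which would otherwise bias the coefficient toward zero.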

These findings provide preliminary support for the mediation and moderation hypotheses tested in later sections and demonstrate the psycholinguistic regularities embedded in women riders’ digital narratives.

4.2. Mediation Effects Across Topics

To assess the extent to which topical expressions influence engagement through self-disclosure, the study first examines the simple mediation paths ($E = α_{1} \cdot β_{2}$) across all 11 topics, following the conditional process model introduced earlier (Table 5, Table 6 and Table 7). Table 5 presents the direct effects of each topic on both self-disclosure intensity (SI) and engagement rate (ER). All paths are statistically significant at the 95% bootstrap confidence level, as evidenced by confidence intervals that exclude zero. This indicates that topic content not only influences users’ willingness to disclose but also directly affects the downstream engagement behavior, supporting Hypothesis H1.

Table 6 further summarizes the indirect effects of each topic on engagement via self-disclosure, estimated with PROCESS Model 4. All indirect effects are statistically significant, with bootstrap confidence intervals excluding zero and relatively low standard errors. The magnitude of standardized coefficients ($β_{std}$) in Table 5 and Table 6 offers critical insight into the strength and optimization potential of each communication pathway. For instance, Topic 6 (Scenario-Adaptive Behavior) shows a robust mediation structure, with standardized coefficients of 0.4127 (CI [0.0171, 0.0569], excluding zero) for the T6 → SI path and 0.1414 (CI [0.0214, 0.0456], excluding zero) for the SI → ER path. This indicates a strong upstream effect of topic content on self-disclosure, as well as a significant downstream impact of disclosure on engagement. In contrast, topics such as T1 (Service Regulations and Skill Training) and T3 (Order Anomalies and Emergency Response) also show statistically significant effects, but their $β_{std}$ values are relatively smaller, reflecting a weaker transmission from experience sharing to audience interaction.

The magnitude of mediation effects varies substantially across topics, revealing that different types of expressive content activate the disclosure-to-engagement pathway to different degrees. As shown in Table 7, Topic 6 (Scenario-Adaptive Behavior) exhibits the strongest mediated effect (0.6119), followed closely by Topic 10 (Delivery Process Feedback) at 0.5436, and Topic 7 (Risk Perception and Safe Riding) at 0.5160. These high values indicate that user narratives involving real-time adaptation, logistical complexity, or feedback loops are particularly effective in eliciting engagement via self-disclosure. Conversely, topics such as T1 (Service Regulations and Skill Training) and T2 (Income Dependency Behavior) show relatively weaker indirect effects (0.0171 and 0.0441, respectively) despite statistically significant direct paths to self-disclosure. This suggests that informational or procedural content—while still influential—may not strongly motivate emotional expression or community response. The contrast between Topic 3 (Order Anomalies and Emergency Response) and Topic 6 (Scenario-Adaptive Behavior) is especially illustrative: while both deal with operational pressure, T6 supports higher engagement due to stronger upstream and downstream path coefficients. These findings support Hypotheses H2a and H2b by confirming that topical variation affects the strength of the expressive–behavioral pathway and highlight the importance of content framing in amplifying user response.

To evaluate the strategic optimization potential of mediated engagement pathways, we computed the relative contribution of each topic’s indirect effect to the overall mediated effect (Table 7 and Figure 5). Topics T6 (Scenario-Adaptive Behavior), T10 (Delivery Process Feedback), and T7 (Risk Perception and Safe Riding) emerged as the top contributors, together accounting for 47.35% of the total mediated influence. When expanded to the top five topics, cumulative contribution reached 70.14%, indicating that a small subset of topics drives the majority of expressive-to-engagement transmission. These high-leverage pathways provide a targeted basis for content optimization, recruitment messaging, and algorithmic promotion strategies on digital labor platforms.
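The relative-contribution arithmetic can be checked from the figures quoted above. The sketch below uses only the three reported top effects and the reported 47.35% share; the implied total is a derived quantity, not a value stated in the paper, and the helper name is illustrative.

```python
def contribution_shares(effects):
    """Each topic's indirect effect as a fraction of the summed indirect effects."""
    total = sum(effects.values())
    return {topic: value / total for topic, value in effects.items()}

# Indirect effects reported in the text for the three leading topics.
top3 = {"T6": 0.6119, "T10": 0.5436, "T7": 0.5160}
reported_top3_share = 0.4735

# Total mediated effect implied by the reported cumulative share (derived, not reported).
implied_total = sum(top3.values()) / reported_top3_share

# Shares within the top-three group only.
within_top3 = contribution_shares(top3)
```

Under these figures the eleven indirect effects would sum to roughly 3.53, so the remaining eight topics jointly account for about 1.86.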

4.3. Moderated Mediation Analysis

To investigate how the indirect influence of topic content on user engagement varies under different psychological conditions, we implemented a dual-stage moderated mediation model using PROCESS v4.2 (Model 58). This model incorporates two moderators: cognitive complexity (CCI), which influences both the likelihood of expression and its communicative efficacy, and negative emotional polarity (EPI_N), which modulates the translation of expression into audience response. The conditional indirect effect is defined in Formulas (10) and (11).

4.3.1. Moderating Effects of Cognitive Complexity

Cognitive complexity (CCI) plays a dual role in the expressive pathway. First, it significantly moderates the likelihood that a topic leads to self-disclosure (first-stage moderation). As shown in Table 8, the interaction term $X \times CCI \to SI$ is significant in multiple topics, most notably T4, T6, T8, and T11. For instance, in Topic 6 (Scenario-Adaptive Behavior), this interaction term is $α_{2}^{'} = 7.2905$, with a 95% bootstrap CI of [3.5806, 12.7726] that excludes zero, indicating that users with higher cognitive linguistic expression are substantially more likely to disclose in response to adaptive or situational challenges. Second, CCI also moderates the downstream path from self-disclosure to engagement ($SI \times CCI \to ER$), with small yet robust effects across T4–T11 (e.g., $β_{3}^{'} = 0.0005$, CI [0.0002, 0.0009] for T6). This suggests that cognitively structured disclosures are more likely to resonate with the audience and elicit behavioral responses.

In Appendix A Table A4, the paper highlights in bold only those moderation effects that are statistically significant at the 95% level, defined as confidence intervals that do not include zero. The Johnson–Neyman analysis in Appendix A Table A4 further reveals that this amplification only emerges when CCI surpasses a critical threshold (e.g., CCI > 5.0029, corresponding to 17.08% of the sample above the cut-off), reflecting a selective activation mechanism. These findings jointly support Hypotheses H3a, H3b, and H3c, and they underscore the conditional nature of cognitive effort: it is not universally applied but emerges selectively under semantically demanding themes.

4.3.2. Emotional Polarity as a Secondary Amplifier

Negative emotional polarity (EPI_N), in contrast, functions as a consistent but less potent second-stage moderator. The interaction term $SI \times EPI_{N} \to ER$ is statistically significant across topics T2–T10, with bootstrap confidence intervals excluding zero and standardized coefficients ranging from 0.0097 (T8) to 0.0131 (T10). For example, in Topic 10 (Delivery Process Feedback), the emotional moderation effect reaches $β_{4}^{'} = 0.0134$, CI [0.0053, 0.0201]. Unlike CCI, the Johnson–Neyman region for EPI_N typically spans the entire observed value range, indicating a universal amplification effect: regardless of topic or cognitive framing, emotionally negative disclosures are more likely to elicit engagement (see Appendix A Table A5). Nevertheless, the absolute magnitude of EPI_N’s effect remains modest, suggesting its role is less about triggering expression than about subtly enhancing resonance.

4.3.3. From Disclosure to Interaction: A Gated Amplification Pathway

While self-disclosure (SI) exhibits statistically significant direct effects on engagement (ER) across all topics (ranging from 0.0009 to 0.0017), with bootstrap confidence intervals excluding zero, its practical impact remains limited in the absence of moderation (see Table 9). However, once cognitive and emotional moderators are introduced, the indirect pathway becomes substantially amplified, revealing the layered nature of digital expression.

Table 9 summarizes this amplification process by comparing three quantities: the first-stage moderation strength ($ME_{1}$), capturing how cognitive complexity (CCI) affects the topic → self-disclosure ($X \to SI$) path; the second-stage moderation strength ($ME_{2}$), reflecting how both CCI and emotional polarity (EPI_N) enhance the $SI \to ER$ linkage; and the direct effect, the unmoderated path from SI to ER, which serves as a baseline.

A clear pattern emerges: topics with high $ME_{2}$ values consistently achieve stronger overall expressive impact, even if their baseline $SI \to ER$ coefficient is modest. For Topic 10 (Delivery Process Feedback), for example, the raw disclosure-to-engagement effect is 0.0016 (CI [0.0005, 0.0012]), which is average in scale, but under strong emotional and cognitive moderation ($ME_{2} = 0.2950$), the effective pathway becomes highly leveraged. Topic 6 (Scenario-Adaptive Behavior) shows an exceptional $ME_{1}$ value of 78.86, indicating that cognitive complexity powerfully amplifies disclosure behavior in this theme. While its $ME_{2}$ value is smaller (0.0419), the compounded effect positions T6 as a highly expressive theme under cognitively demanding conditions. In contrast, Topic 1 (Service Regulations and Skill Training) and Topic 3 (Order Anomalies and Emergency Response) show both low $ME_{1}$ and $ME_{2}$ values (Topic 1: $ME_{2} = 0.0039$, CI [0.0012, 0.0019]; Topic 3: $ME_{2} = 0.0149$, CI [0.0013, 0.002]), suggesting that these themes generate weaker audience response even when disclosed. This comparison underscores a “gated amplification” model: $ME_{1}$ governs whether the rider chooses to speak; $ME_{2}$ determines whether the disclosure resonates.

Together, these findings support Hypotheses H4a and H4b. They suggest that expression alone is insufficient: for content to meaningfully engage others, it must pass both a cognitive filter (the cognitive complexity of disclosure) and an emotional amplifier (affective urgency of disclosure). This dual-layered mechanism offers critical insight into how expressive behaviors on digital platforms transition into social attention, and it confirms the theoretical logic behind our moderated mediation model.

4.3.4. Robustness Checks

To assess the stability of our findings, we conducted several robustness checks beyond the bootstrapped confidence intervals reported in all PROCESS estimates. First, we implemented a LASSO-topic baseline using the same Top2Vec-derived topics and control variables to benchmark the explanatory power of our PROCESS framework. The LASSO model selected seven non-zero coefficients at an optimal penalty of $α = 0.00637$ and achieved a cross-validated RMSE of 0.7989, MAE of 0.6223, and $R^{2} = 0.3618$. While our PROCESS model yielded a slightly higher RMSE (0.8008) and MAE (0.6437), with $R^{2} = 0.3588$, it offered additional explanatory insight by uncovering significant mediated effects (e.g., $T6 \to SI \to ER$, $β_{2}^{6} = 0.0303$, $p < 0.01$) and moderated pathways (e.g., $SI \times EPI_{N} \to ER$, $β_{4}^{6} = 0.0112$, $p < 0.01$). These results demonstrate that the PROCESS model maintains comparable predictive performance while capturing psychological mechanisms that are inaccessible to sparse regression. Second, we re-estimated the PROCESS model using alternative engagement metrics (e.g., comment and view volumes), with the results remaining directionally stable. Third, subsample analyses splitting high vs. low self-disclosers (based on median SI) indicated that key indirect effects, particularly those involving T5 and T6, persisted across both groups.
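The three fit metrics used in the LASSO comparison are defined as follows. This sketch runs on toy predictions and does not reproduce the cross-validation procedure; the function name is illustrative.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - sse / ss_tot
    return rmse, mae, r2

# Toy values for illustration only.
rmse, mae, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])

# If the outcome has unit variance, R^2 should be close to 1 - RMSE^2.
implied_r2_lasso = 1 - 0.7989 ** 2
```

The reported figures fit the unit-variance reading, since $1 - 0.7989^{2} \approx 0.362$ is close to the reported $R^{2} = 0.3618$. This suggests, though the paper does not state it here, that the engagement outcome was standardized for this comparison.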

4.4. Effect Strengths and Optimization Potential

To assess the combined influence of topic content, cognitive framing, and emotional tone on behavioral engagement, the study computed the moderated mediation effect for each topic using Formula (12).

As shown in Table 10 and Figure 6, the key findings are as follows: Topic 10 (Delivery Process Feedback) demonstrates the highest conditional indirect effect ($IE^{10} = 5.4349$), indicating that under cognitively complex and emotionally charged conditions ($W_{1} = 5.0029$, $W_{2} = 9.05$), this topic exhibits extraordinary amplification from expression to interaction. Despite having no first-stage moderator ($α_{2}^{10} = 0$), its strong emotional modulation ($β_{4}^{10} = 0.029$) drives its leading performance. Topic 6 (Scenario-Adaptive Behavior) ranks second ($IE^{6} = 3.3033$), driven by high values for both cognitive moderation on expression ($α_{2}^{6} = 11.7272$) and strong topic–disclosure linkage ($α_{1}^{6} = 20.1933$). This suggests that adaptive behaviors resonate particularly strongly when expressed under cognitive load, such as during complex decision making. Topic 11 (Customer Complaints and Compensation) and Topic 4 (Technological Control and Labor Burden) also show prominent effects ($IE^{11} = 2.283$ and $IE^{4} = 1.1763$, respectively), both benefiting from dual-stage moderation and emotionally salient content. In contrast, Topics 1 and 3 produce negligible indirect effects ($IE^{1} = 0.0171$; $IE^{3} = 0.1356$), primarily due to the absence of moderating mechanisms and weaker base path coefficients. These themes, centered on service regulations, skill training, order anomalies, and emergency response, may provoke less psychological investment, limiting their expressive and social utility.
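The reported values can be cross-checked against Equation (6) with $α_{3} = 0$, that is, $IE = (α_{1} + α_{2} W_{1}) \cdot ME_{2}$. The sketch below assumes that the first-stage strength $ME_{1}$ equals $α_{1} + α_{2} W_{1}$ evaluated at the Johnson–Neyman threshold; this is a reading of the text rather than a formula it states. All inputs are the figures quoted in Sections 4.3.3, 4.4, and 5.2.

```python
def first_stage_strength(a1, a2, w1):
    """Conditional X -> M slope, a1 + a2 * W1 (Equation (7) evaluated at W1)."""
    return a1 + a2 * w1

W1 = 5.0029  # cognitive complexity value quoted in the text

# Topic 6: coefficients quoted in Section 4.4, ME2 quoted in Section 4.3.3.
me1_t6 = first_stage_strength(a1=20.1933, a2=11.7272, w1=W1)
ie_t6 = me1_t6 * 0.0419

# Topic 10: ME1 and ME2 as quoted in the text.
ie_t10 = 18.43 * 0.2950
```

Under this assumption `me1_t6` rounds to the reported 78.86, and both products land within rounding distance of the reported $IE^{6} = 3.3033$ and $IE^{10} = 5.4349$, which supports reading the conditional indirect effect as the product of the two stage strengths.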

This variation across topics supports the utility of a parametrically moderated mediation model, where cognitive and emotional states not only shape disclosure likelihood but significantly alter the downstream behavioral effect. The quantification of each topic’s pathway strength offers a model-driven basis for prioritizing content categories within behavioral prediction systems.

5. Discussion

5.1. Summary of Core Mechanisms

This study models the behavioral impact of content expression through a two-stage conditional process framework, disentangling how topic-specific disclosures translate into audience engagement under varying cognitive and emotional conditions. The empirical results presented jointly reveal three core mechanisms.

First, topic content exerts a direct and indirect effect on user behavior. As shown in Table 5 and Table 6, all eleven themes extracted via Top2Vec significantly influence both self-disclosure intensity and engagement rate, with mediation effects ranging from 0.0039 (T1) to 0.0303 (T6). This confirms that thematic expression—such as scenario adaptation (T6) or Delivery Process Feedback (T10)—acts as a behavioral signal even when not explicitly structured, aligning with prior findings on user-generated semantic cues.

Second, the transition from expression to interaction is not linear but contingent on cognitive and emotional moderators. The analysis in Section 4.3 demonstrates that cognitive complexity ($W_{1}$) significantly enhances the effect of topic exposure on disclosure (first-stage moderation), while negative emotional polarity ($W_{2}$) strengthens the effect of disclosure on engagement (second-stage moderation). This layered dependency justifies the model’s staged architecture and supports the use of conditional path formulas in behavioral modeling.

Third, the full moderated mediation effect, as computed in Section 4.4 using both $α$ and $β$ parameters, clarifies which topic pathways are most behaviorally effective under real-world variability. The results show that topics with both high base coefficients and significant modulation terms (e.g., T10 and T6) yield the strongest conditional indirect effects, reaching 5.43 for T10 compared with 0.0171 for a low-modulation topic such as T1.

Cumulatively, these mechanisms highlight that online user behavior is best understood not by linear correlations but through structured cognitive–affective models, where content, cognition, and emotion interact multiplicatively.

5.2. Psycholinguistic Moderators of Behavioral Response

The distinction between the first-stage and second-stage moderation effects provides key insight into how expression becomes actionable engagement. As formalized in Equations (10)–(12), the two-stage model isolates two distinct cognitive–affective pathways as follows: first, the expression pathway: how topic exposure ($X$) prompts users to disclose personal experience ($M$), modulated by cognitive complexity ($W_{1}$); second, the conversion pathway: how self-disclosure ($M$) influences subsequent engagement ($Y$), modulated by both $W_{1}$ and emotional polarity ($W_{2}$).
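As a hedged sketch of the two pathways just described, a generic dual-stage moderated mediation specification can be written as follows. The coefficient names ($a_k$, $b_k$, $c'$), intercepts, and error terms follow common conditional-process notation and are assumptions for illustration; they need not match the symbols used in Equations (10)–(12).

```latex
\begin{aligned}
M &= i_M + a_1 X + a_2 W_1 + a_3 X W_1 + \varepsilon_M, \\
Y &= i_Y + c' X + b_1 M + b_2 W_1 + b_3 M W_1 + b_4 W_2 + b_5 M W_2 + \varepsilon_Y, \\
\mathrm{IE}(W_1, W_2) &= (a_1 + a_3 W_1)\,(b_1 + b_3 W_1 + b_5 W_2).
\end{aligned}
```

Under this form, the first factor of $\mathrm{IE}$ corresponds to the expression pathway and the second to the conversion pathway, so the indirect effect varies with both moderators.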

Empirical findings indicate that first-stage moderation plays a dominant role. For example, Topic 6 (Scenario-Adaptive Behavior) and Topic 10 (Delivery Process Feedback) exhibit high $ME_{1}$ values (78.86 and 18.43, respectively), indicating that cognitively complex users are especially sensitive to situational or operational content, which encourages them to share more. This supports the idea that expression is a function of topic resonance × user cognition, rather than topic presence alone.

By contrast, second-stage moderation ($ME_{2}$) shows narrower variance across topics, ranging between 0.0039 and 0.2950. While still statistically significant, its behavioral leverage appears more limited. In most cases, the direct effect of self-disclosure on engagement remains within a narrow band (0.0009 to 0.0017), with modest amplification from $W_{2}$ (emotional polarity). However, even modest second-stage gains (e.g., T10’s $ME_{2} = 0.2950$) can result in high overall indirect effects, as shown in Table 10.

These results suggest an important asymmetry: while both stages contribute, engagement is primarily driven by whether the user is cognitively activated to speak rather than how affectively intense the speech is. This reinforces the importance of modeling first-stage modulation explicitly, especially in contexts where voice visibility is optional (e.g., digital labor forums). This finding aligns with prior studies linking self-disclosure with engagement in digital communities [33,62] but extends them by modeling the conditional structure of disclosure rather than treating it as a static mediator. While previous studies have emphasized the existence of such pathways, our quantification of conditional indirect effects allows for actionable prediction.

5.3. Role of Conditional Moderation in Behavior Prediction

While traditional mediation models can capture the average indirect influence of topic content on engagement via self-disclosure, they are structurally limited in two respects: first, they assume homogeneity across users in expressive behavior; second, they overlook how downstream outcomes (e.g., interaction metrics) are shaped by psychological context. By contrast, our conditional process model captures the heterogeneity of mediation effects as a function of cognitive and emotional states.

The comparison between unmoderated mediation effects ($E$) and fully moderated indirect effects ($IE$) illustrates this point. As shown in Table 10, the baseline indirect effect of Topic 1 is merely 0.0039, yet it increases more than fourfold (to 0.0171) under cognitive and emotional modulation. More strikingly, for Topic 10, the indirect effect rises from 0.0295 to 5.4349 under moderation, an amplification by a factor of 184. These differences are not trivial; they demonstrate how similar content can yield drastically different behavioral outcomes depending on the user’s internal psychological state.
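The amplification factors quoted above can be checked with a few lines of arithmetic. The sketch below uses only the four values cited in the text from Table 10; the dictionary layout and the helper name `amplification` are illustrative choices.

```python
# Indirect effects quoted in the text from Table 10: (baseline E, moderated IE).
effects = {
    "T1": (0.0039, 0.0171),
    "T10": (0.0295, 5.4349),
}


def amplification(baseline: float, moderated: float) -> float:
    """Ratio of the moderated indirect effect to the unmoderated one."""
    return moderated / baseline


ratios = {topic: amplification(e, ie) for topic, (e, ie) in effects.items()}
for topic, ratio in ratios.items():
    print(f"{topic}: x{ratio:.1f}")
```

The ratio for T1 falls between four and five, and the ratio for T10 rounds to 184, in line with the figures stated in the paragraph.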

This magnification stems from the structure of the conditional indirect effect (see Equation (12)).

In this equation, $W_{1}$ represents cognitive complexity and $W_{2}$ emotional polarity. The multiplicative interaction between stages reflects a compounding effect, wherein users with high cognitive complexity experiencing negative affect are more likely to convert self-disclosure into downstream engagement, even when the topic content itself is neutral. This pattern is consistent with psychological theories of signal amplification, which posit that emotionally and cognitively primed individuals perceive even weak stimuli as salient [47].

Importantly, the findings reveal that relying solely on simple mediation paths, while statistically significant, may obscure conditional variability that is critical for behavioral modeling. Platform interventions based on average effects risk underperforming, especially in psychologically heterogeneous environments.

While prior studies have applied process-based models to analyze social influence and self-disclosure on digital platforms [63,64], such applications often limit the analysis to simple mediation or moderation paths, without accounting for higher-order interaction effects between psychological states and content features. The use of a dual-stage conditional process model (PROCESS v4.2) expands this tradition by enabling simultaneous estimation of both moderated mediation and conditional indirect effects. This approach allows for a more nuanced understanding of how users’ cognitive complexity and emotional polarity jointly shape the pathway from textual expression to audience engagement, a level of integration rarely implemented in existing behavioral modeling of online discourse. In doing so, we bridge a methodological gap between psycholinguistic modeling and interpretable behavioral prediction in platform contexts.

In sum, the conditional mediation structure not only improves model fit but also enhances interpretability by identifying which user–content combinations produce the greatest behavioral return. Such quantified pathway strengths offer a principled basis for behavioral prediction and optimization in digital decision-making contexts.
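To make the compounding structure concrete, the following minimal sketch evaluates a conditional indirect effect of the generic form (a1 + a3·W1)(b1 + b3·W1 + b5·W2) over a grid of moderator values. The coefficient values are hypothetical placeholders, not estimates from this study, and the coefficient names are assumptions that may differ from the paper's notation. The probe values for CCI and EPI_N are the ones listed in Tables A4 and A5.

```python
from itertools import product

# Hypothetical path coefficients (NOT estimates from the paper).
A1, A3 = 0.50, 0.80                 # X -> M, and X*W1 -> M
B1, B3, B5 = 0.001, 0.0005, 0.010   # M -> Y, M*W1 -> Y, M*W2 -> Y


def conditional_indirect_effect(w1: float, w2: float) -> float:
    """Product of the conditional first-stage and second-stage paths."""
    first_stage = A1 + A3 * w1
    second_stage = B1 + B3 * w1 + B5 * w2
    return first_stage * second_stage


# Moderator probe values as listed in Tables A4 (CCI) and A5 (EPI_N).
CCI_LEVELS = (2.388, 3.6955, 5.0029)
EPI_N_LEVELS = (0.0, 0.0499, 0.129)

grid = {
    (w1, w2): conditional_indirect_effect(w1, w2)
    for w1, w2 in product(CCI_LEVELS, EPI_N_LEVELS)
}
for (w1, w2), ie in grid.items():
    print(f"CCI={w1:<7} EPI_N={w2:<7} IE={ie:.5f}")
```

Because both factors depend on the moderators, a change in W1 shifts the indirect effect through both stages at once, which is why averaged (unconditional) estimates can understate the spread across users.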

5.4. Modeling High-Dimensional Discourse for Predictive Decision Making

This study contributes methodologically by embedding high-dimensional textual representations into a psychologically grounded statistical modeling framework for behavior prediction. The proposed pipeline—comprising semantic embedding, dimensionality reduction, and moderated mediation modeling—offers a generalizable and interpretable tool for decision-oriented research in digital contexts.

First, the research enhances topic regression pipelines by integrating semantically coherent topic embeddings derived from Top2Vec into a dual-stage conditional process model. Unlike traditional bag-of-words representations or sparsely selected LASSO features, our method preserves narrative integrity while capturing psychologically meaningful discourse patterns. Specifically, we employ a pre-trained BERT-based Chinese embedding model to extract sentence-level semantics from fragmented, affect-rich user posts. Benchmarking results show a significant improvement in thematic coherence compared to LDA (NPMI = 0.52 vs. 0.378), demonstrating the advantages of transformer-based contextual embeddings in short-form, emotionally charged texts.
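For readers unfamiliar with the coherence metric quoted above, the sketch below computes a document-level NPMI score for a topic's top words using the standard NPMI definition. The function name `npmi_coherence`, the toy corpus, and the handling of edge cases are assumptions for illustration; this does not reproduce the authors' evaluation procedure.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs, using document-level co-occurrence."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Share of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w_i, w_j in combinations(topic_words, 2):
        p_ij = prob(w_i, w_j)
        if p_ij == 0:
            scores.append(-1.0)  # words never co-occur: lower bound of NPMI
            continue
        denom = -math.log(p_ij)
        if denom == 0:
            scores.append(0.0)   # assumed convention when both words are in every document
            continue
        pmi = math.log(p_ij / (prob(w_i) * prob(w_j)))
        scores.append(pmi / denom)
    return sum(scores) / len(scores)


# Hypothetical toy corpus (tokenized posts), loosely echoing Table A2 keywords.
docs = [
    ["order", "surge", "parking"],
    ["order", "surge", "distance"],
    ["refund", "compensation"],
    ["refund", "compensation", "penalties"],
]
coherent = npmi_coherence(["order", "surge"], docs)
incoherent = npmi_coherence(["order", "refund"], docs)
print(coherent, incoherent)
```

NPMI ranges from -1 (words never co-occur) to 1 (words always co-occur), so higher averages indicate topics whose top words tend to appear together.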

This structured integration contrasts with prior approaches using LDA proportions or LASSO-selected terms [29,65], which either overlook semantic nuance or ignore theoretical mediation pathways. By embedding Top2Vec-derived constructs into a causal process model, our framework achieves both predictive validity and psychological interpretability, enabling the estimation of both direct and conditional effects, capabilities that are absent in most unsupervised topic pipelines.

Second, to improve topic separation and interpretability, the paper adopts UMAP for embedding dimensionality reduction. Compared with PCA and t-SNE, UMAP better preserves both local and global semantic structures, achieving higher topic coherence (NPMI = 0.52) and tighter cluster compactness (silhouette score = 0.37). These enhancements ensure that the resulting topic features are not only theoretically grounded but also statistically robust in subsequent modeling.
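The silhouette score used to compare cluster compactness can be illustrated with a small pure-Python sketch. It implements the usual mean silhouette coefficient with Euclidean distance on hypothetical 2-D points; it does not run UMAP, PCA, or t-SNE, and the point coordinates and label assignments are invented for illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    def mean_dist(p, others):
        return sum(math.dist(p, q) for q in others) / len(others)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is included in the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean_dist(p, members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical reduced embeddings: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
tight = silhouette_score(points, [0, 0, 0, 1, 1, 1])   # labels match the groups
mixed = silhouette_score(points, [0, 1, 0, 1, 0, 1])   # labels cut across groups
print(round(tight, 3), round(mixed, 3))
```

A labeling that matches the geometric groups scores close to 1, whereas a labeling that cuts across them scores much lower, which is the sense in which a higher silhouette score indicates tighter clusters.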

Third, the study applies a dual-stage moderated mediation model (PROCESS v4.2) to trace how topic content influences user engagement through the mediating role of self-disclosure, conditional on users’ cognitive complexity and emotional polarity. This allows us to capture not only average indirect effects but also how psycholinguistic states amplify or suppress behavioral responses. The resulting conditional path coefficients offer actionable insights into when and for whom expressive content becomes behaviorally consequential, providing a principled basis for predictive behavioral modeling and decision optimization.

Fourth, the proposed methodology is inherently modular and extensible. While our empirical application focuses on Chinese-language posts from a food delivery platform, each component—semantic embedding, dimensionality reduction, and process modeling—can be adapted across languages (e.g., LaBSE and XLM-R), domains (e.g., ride-hailing and crowdsourcing), and user communities. This flexibility makes the approach suitable for predictive modeling in high-dimensional, heterogeneous data environments.

Finally, the results show that cognitive and affective pathways linking content to behavior are quantifiable and interpretable. Compared to existing work using LIWC or shallow token features in Chinese [59], our framework offers deeper interpretive resolution and causal mapping. In summary, this study introduces a replicable, psychologically informed, and semantically enriched modeling pipeline that bridges the gap between discourse abstraction and behavioral prediction. It opens new possibilities for decision-making applications in digital labor and broader computational social science domains.

6. Conclusions and Decision-Making Implications

This study introduces a psychologically informed, statistically grounded framework for modeling how semantically rich textual content influences behavioral engagement in gig economy platforms. By integrating BERT-based Top2Vec embeddings, UMAP dimensionality reduction, and a moderated mediation structure (PROCESS v4.2), we offer a replicable and generalizable pipeline for digital behavior prediction.

The key findings are threefold. First, topic content exerts significant direct and indirect influence on user engagement, with self-disclosure acting as a robust mediator across all topic pathways. Second, cognitive complexity and emotional polarity moderate both stages of the mediation process, yielding highly heterogeneous indirect effects. This interaction structure reveals that identical content may trigger vastly different engagement outcomes depending on users’ psychological profiles. Third, by quantifying indirect pathway strength under different moderator conditions, the model offers granular insights into which communication signals are most likely to elicit meaningful interaction.

From a decision-making perspective, the proposed pipeline enables optimization at multiple levels. Platforms can (1) prioritize content types (e.g., scenario-adaptive narratives or Delivery Process Feedback) with stronger mediated effects; (2) identify user segments most responsive to specific discursive cues; and (3) tailor recommendation systems to align with users’ cognitive and emotional tendencies. Unlike conventional classification or regression approaches that treat features as static predictors, our dual-stage structure incorporates psychological realism, enhancing both predictive accuracy and interpretability.

The generalizability of the method across domains and user types further positions it as a versatile tool for real-world applications. Whether used for content targeting in platform labor, discourse analysis in social media, or personalized recommendation on digital platforms, this integrated framework bridges the gap between text analytics and behavioral decision making. It contributes not only to methodological rigor but also to actionable insight generation in high-dimensional, user-centered environments.

Future research may extend this framework across cultural or occupational contexts, refine temporal dynamics, or incorporate platform-level metrics such as sharing frequency, sentiment trajectory, or real-time algorithmic feedback to build adaptive prediction models. In addition, while the study focuses on a single Chinese platform, the proposed framework—comprising multilingual embeddings, topic modeling, and regression—is designed to be transferable. Nevertheless, the paper recognizes that linguistic and contextual differences may affect signal salience. Future research should validate the model across alternative domains (e.g., ride-hailing), languages (e.g., English and Spanish), and user types to examine its robustness beyond the studied setting.

Author Contributions

Conceptualization, D.L. and Y.Z.; Methodology, D.L.; Software (Python 3.10; LIWC-22; SPSS 26.0), D.L.; Validation, D.L. and Y.Z.; Formal analysis, D.L.; Investigation, D.L. and Y.Z.; Resources, D.L. and Y.Z.; Data curation, D.L.; Writing—original draft preparation, D.L.; Writing—review and editing, Y.Z.; Visualization, D.L.; Supervision, Y.Z.; Project administration, Y.Z.; Funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Social Science Fund of China (Grant numbers: 22BGL124, 20&ZD128, and 24AGL025). The authors have received research support from Alibaba Group Holding Limited (Project Code: 202269).

Data Availability Statement

The datasets generated during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no financial or non-financial interests that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:

SI	Self-Disclosure Index
EPI	Emotional Polarity Index
CCI	Cognitive Complexity Index
ER	Engagement Rate
DE	Direct Effect
Top2Vec	Topical Vectors via Embeddings and Clustering
LIWC	Linguistic Inquiry and Word Count
LDA	Latent Dirichlet Allocation
NLP	Natural Language Processing
ELM	Elaboration Likelihood Model
T1 to T11	Topic 1 to Topic 11

Appendix A

Table A1. Comparison of dimensionality reduction methods.

Method	Topic Coherence (NPMI)	Silhouette Score
UMAP	0.52	0.37
t-SNE	0.41	0.28
PCA	0.36	0.22

Note: NPMI evaluates topic semantic coherence; silhouette score assesses cluster compactness.

Table A2. Topic names and their definitions based on community discourse analysis.

No.	Topic Name	Keyword Examples (with Importance Scores)	Topic Definition
T1	Service Regulations and Skill Training	polite (0.9798), training (0.9769), ID card (0.9728), health certificate (0.9705), punishment (0.9608), and inspection (0.9585)	Expressions related to sudden delivery issues, including process interruptions and immediate responses [66].
T2	Income Dependency Behavior	hourly wage (0.9867), actual payment (0.9263), income (0.91196), making money (0.9112), planning (0.7588)	Emphasizes the tension between task traceability, visualized workflows, and the pressure to maintain delivery efficiency [67].
T3	Order Anomalies and Emergency Response	wrong delivery (0.8708), order urging (0.6966), reservation (0.7588), and meal preparation (0.7777)	Captures the stress of coordinating multiple orders and managing delivery time windows [68].
T4	Technological Control and Labor Burden	returned orders (0.9290), subsidies (0.9161), arrival (0.8723), and photo-taking (0.8003)	Strategic responses to different delivery environments and adaptive coping with contextual challenges [69].
T5	Time Constraints and Task Pressure	order grabbing (0.9326), back-and-forth trips (0.9241), safety (0.9106), on the way (0.8939), rest (0.8671), and intersections (0.8897)	Safety-oriented ride behaviors adopted to cope with anxiety over health and safety in high-risk environments [70].
T6	Scenario-adaptive Behavior	order surge (0.9651), business district (0.9518), vehicle (0.9475), office buildings (0.9183), distance (0.9408), and parking (0.9338)	Psychological labor rooted in income instability, performance penalties, and emotional stress [67].
T7	Risk Perception and Safe Riding	battery (0.9772), elevator (0.9367), resignation (0.9478), complaints (0.9273), and accidents (0.9139)	Responsibility sharing and role-perceived fairness pressure caused by algorithmic and institutional structures [71].
T8	Earnings Anxiety and Rating Pressure	stress (0.1018), negative reviews (0.0912), income (0.0817), and wage (0.0803)	Reflections on opaque dispatch systems and strategies regarding delivery process challenges [67].
T9	Perceived Fairness Pressure	inaccurate (0.8993), provision (0.8805), responsibility (0.8444), proof (0.8114), and rules (0.8025)	Conflicts arising from service disputes and the platform’s compensation mechanisms [72].
T10	Delivery Process Feedback	cancellation (0.6849), meal preparation (0.6272), pickup (0.6036), climbing stairs (0.5810), route (0.4002), and process (0.2429)	Expressions related to sudden delivery issues, including process interruptions and immediate responses [28].
T11	Customer Complaints and Compensation	refund (0.8599), compensation (0.5912), penalties (0.3628), and meal damage (0.7290)	Emphasizes the tension between task traceability, visualized workflows, and the pressure to maintain delivery efficiency [73].

Table A3. Kendall’s Tau correlation matrix.

	SI	ER	CCI	EPI_P	EPI_N	T1	T2	T3	T4	T5	T6	T7	T8	T9	T10	T11
SI	1.00 ***	0.47 ***	0.61 ***	0.028	0.17 ***	0.50 ***	0.51 ***	0.41 ***	0.30 ***	0.31 ***	0.35 ***	0.26 ***	0.30 ***	0.32 ***	0.30 ***	0.33 ***
ER	0.47 ***	1.00 ***	0.52 ***	−0.006	0.18 ***	0.46 ***	0.46 ***	0.34 ***	0.21 ***	0.27 ***	0.29 ***	0.20 ***	0.26 ***	0.27 ***	0.24 ***	0.29 ***
CCI	0.61 ***	0.52 ***	1.00 ***	0.039	0.18 ***	0.53 ***	0.53 ***	0.42 ***	0.30 ***	0.31 ***	0.34 ***	0.24 ***	0.29 ***	0.31 ***	0.29 ***	0.33 ***
EPI_P	0.028	−0.006	0.039	1.00 ***	−0.36 **	−0.012	0.014	0.006	−0.072	0.036	−0.053	0.005	−0.012	0	−0.039	−0.036
EPI_N	0.17 ***	0.18 ***	0.18 ***	−0.36 **	1.00 ***	0.21 ***	0.18 ***	0.10 **	0.18 ***	0.06	0.13 ***	0.04	0.06	0.07	0.14 ***	0.11 **
T1	0.50 ***	0.46 ***	0.53 ***	−0.012	0.21 ***	1.00 ***	0.39 ***	0.31 ***	0.24 ***	0.29 ***	0.25 ***	0.22 ***	0.23 ***	0.25 ***	0.26 ***	0.31 ***
T2	0.51 ***	0.46 ***	0.53 ***	0.014	0.18 ***	0.39 ***	1.00 ***	0.34 ***	0.23 ***	0.28 ***	0.30 ***	0.21 ***	0.21 ***	0.25 ***	0.28 ***	0.29 ***
T3	0.41 ***	0.34 ***	0.42 ***	0.006	0.10 **	0.31 ***	0.34 ***	1.00 ***	0.26 ***	0.28 ***	0.27 ***	0.22 ***	0.26 ***	0.26 ***	0.21 ***	0.28 ***
T4	0.30 ***	0.21 ***	0.30 ***	−0.072	0.18 ***	0.24 ***	0.23 ***	0.26 ***	1.00 ***	0.19 ***	0.22 ***	0.08 *	0.14 ***	0.23 ***	0.20 ***	0.17 ***
T5	0.31 ***	0.27 ***	0.31 ***	0.036	0.06	0.29 ***	0.28 ***	0.28 ***	0.19 ***	1.00 ***	0.13 ***	0.17 ***	0.15 ***	0.24 ***	0.20 ***	0.20 ***
T6	0.35 ***	0.29 ***	0.34 ***	−0.053	0.13 ***	0.25 ***	0.30 ***	0.27 ***	0.22 ***	0.13 ***	1.00 ***	0.14 ***	0.15 ***	0.14 ***	0.23 ***	0.22 ***
T7	0.26 ***	0.20 ***	0.24 ***	0.005	0.04	0.22 ***	0.21 ***	0.22 ***	0.08 *	0.17 ***	0.14 ***	1.00 ***	0.15 ***	0.09 *	0.19 ***	0.24 ***
T8	0.30 ***	0.26 ***	0.29 ***	−0.012	0.06	0.23 ***	0.21 ***	0.26 ***	0.14 ***	0.15 ***	0.15 ***	0.15 ***	1.00 ***	0.17 ***	0.12 **	0.22 ***
T9	0.32 ***	0.27 ***	0.31 ***	0	0.07	0.25 ***	0.25 ***	0.26 ***	0.23 ***	0.24 ***	0.14 ***	0.09 *	0.17 ***	1.00 ***	0.17 ***	0.17 ***
T10	0.30 ***	0.24 ***	0.29 ***	−0.039	0.14 ***	0.26 ***	0.28 ***	0.21 ***	0.20 ***	0.20 ***	0.23 ***	0.19 ***	0.12 **	0.17 ***	1.00 ***	0.24 ***
T11	0.33 ***	0.29 ***	0.33 ***	−0.036	0.11 **	0.31 ***	0.29 ***	0.28 ***	0.17 ***	0.20 ***	0.22 ***	0.24 ***	0.22 ***	0.17 ***	0.24 ***	1.00 ***

Note: * for $p < 0.1$, ** for $p < 0.05$, and *** for $p < 0.01$.
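As an illustration of the statistic reported in Table A3, the sketch below computes Kendall's tau with the tau-b tie correction by direct pair counting. Whether the authors used tau-b or another variant is not stated here, so the choice of tau-b is an assumption, and the sample sequences are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (O(n^2) pair count)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue            # tied on both variables: excluded from both totals
        if dx == 0:
            ties_x += 1         # tied on x only
        elif dy == 0:
            ties_y += 1         # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical scores for four posts, with one tie in each variable.
tau = kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3])
print(tau)
```

Rank-based measures of this kind depend only on the ordering of observations, which makes them a common choice for skewed count data such as likes, comments, and views.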

Table A4. Moderating effect of cognitive complexity index (CCI) on the path of self-disclosure.

X	Path	CCI	$β_{3}$	SE	$t$	$p$	LLCI	ULCI	Johnson–Neyman Value(s)	Below%	Above%
T4	T4 × CCI→SI	2.388	−1.8428	3.8361	−0.4804	0.6311	−9.3757	5.6901
		3.6955	1.6798	2.426	0.6924	0.4889	−3.0842	6.4437
		5.0029	5.2024	1.5245	3.4125	0.0007	2.2087	8.196
	SI × CCI→ER	2.388	−0.0006	0.0005	−1.0401	0.2987	−0.0016	0.0005	4.5197	75.7764	24.2236
		3.6955	0	0.0003	0.0514	0.959	−0.0006	0.0006
		5.0029	0.0006	0.0002	3.4032	0.0007	0.0002	0.0009
T6	T6 × CCI→SI	2.388	−7.3372	4.5758	−1.6035	0.1093	−16.3226	1.6483	2.0017	7.4534	92.5466
		3.6955	2.195	2.6998	0.813	0.4165	−3.1065	7.4966	4.0171	61.3354	38.6646
		5.0029	11.7272	1.6912	6.9344	0	8.4063	15.0481
	SI × CCI→ER	2.388	−0.0006	0.0005	−1.1005	0.2715	−0.0016	0.0004	4.6287	78.1056	21.8944
		3.6955	0	0.0003	−0.09	0.9283	−0.0006	0.0005
		5.0029	0.0005	0.0002	2.9149	0.0037	0.0002	0.0009
T8	T8 × CCI→SI	2.388	−6.8584	5.1558	−1.3302	0.1839	−16.9826	3.2659	4.9884	82.9193	17.0807
		3.6955	−1.5979	3.0857	−0.5178	0.6047	−7.6571	4.4613
		5.0029	3.6626	1.833	1.9981	0.0461	0.0632	7.262
	SI × CCI→ER	2.388	−0.0005	0.0005	−1.0463	0.2958	−0.0016	0.0005	4.5442	76.5528	23.4472
		3.6955	0	0.0003	0.0354	0.9718	−0.0006	0.0006
		5.0029	0.0006	0.0002	3.3202	0.001	0.0002	0.0009
T11	T11 × CCI→SI	2.388	−0.8671	4.863	−0.1783	0.8585	−10.4164	8.6822	4.3443	71.8944	28.1056
		3.6955	2.7941	3.0946	0.9029	0.3669	−3.2827	8.871
		5.0029	6.4554	1.8361	3.5157	0.0005	2.8498	10.0609
	SI × CCI→ER	2.388	−0.0007	0.0005	−1.2959	0.1955	−0.0017	0.0004	4.5559	76.8634	23.1366
		3.6955	0	0.0003	−0.1646	0.8693	−0.0006	0.0005
		5.0029	0.0006	0.0002	3.4008	0.0007	0.0002	0.0009
T5	SI × CCI→ER	2.388	−0.0006	0.0005	−1.2132	0.2255	−0.0016	0.0004	4.5642	76.8634	23.1366
		3.6955	0	0.0003	−0.1025	0.9184	−0.0006	0.0006
		5.0029	0.0006	0.0002	3.3438	0.0009	0.0002	0.0009
T7	SI × CCI→ER	2.388	−0.0006	0.0005	−1.1071	0.2687	−0.0016	0.0004	4.5314	76.2422	23.7578
		3.6955	0	0.0003	0.01	0.992	−0.0006	0.0006
		5.0029	0.0006	0.0002	3.3875	0.0007	0.0002	0.0009
T9	SI × CCI→ER	2.388	−0.0008	0.0005	−1.639	0.1017	−0.0019	0.0002	1.5077	3.882	96.118
		3.6955	−0.0001	0.0003	−0.4558	0.6487	−0.0007	0.0004	4.6011	77.795	22.205
		5.0029	0.0006	0.0002	3.4395	0.0006	0.0002	0.0009
T10	SI × CCI→ER	2.388	−0.0008	0.0005	−1.5109	0.1313	−0.0018	0.0002	0.9732	3.5714	96.4286
		3.6955	−0.0001	0.0003	−0.3243	0.7458	−0.0007	0.0005	4.5732	77.0186	22.9814
		5.0029	0.0006	0.0002	3.4502	0.0006	0.0003	0.0009

Note: Rows whose bias-corrected 95% confidence interval (LLCI to ULCI) excludes zero indicate statistically significant moderation effects. Coefficients are estimated via PROCESS v4.2 using 5000 bootstrap samples. Johnson–Neyman values report the moderator thresholds at which the moderation effect becomes significant. Non-significant rows are retained for completeness, following standard threshold reporting practices.
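The tabulated values can be sanity-checked with simple arithmetic. The sketch below takes the three "T4 × CCI→SI" rows of Table A4 and verifies two things: each reported $t$ equals the coefficient divided by its standard error, and the coefficient changes linearly with CCI, as expected if each row is a conditional effect at a given moderator level. That reading of the column is an interpretation, not something the table states explicitly.

```python
# Rows for "T4 × CCI→SI" copied from Table A4: (CCI, coefficient, SE, reported t).
rows = [
    (2.388, -1.8428, 3.8361, -0.4804),
    (3.6955, 1.6798, 2.4260, 0.6924),
    (5.0029, 5.2024, 1.5245, 3.4125),
]

# 1) Each t statistic should equal coefficient / SE up to rounding.
recomputed_t = [coef / se for _, coef, se, _ in rows]

# 2) If the conditional effect is linear in CCI, consecutive rows imply the same slope.
slopes = [
    (c2 - c1) / (w2 - w1)
    for (w1, c1, _, _), (w2, c2, _, _) in zip(rows, rows[1:])
]

for (cci, _, _, t_reported), t in zip(rows, recomputed_t):
    print(f"CCI={cci}: t reported={t_reported}, recomputed={t:.4f}")
print("implied slopes per unit of CCI:", [round(s, 3) for s in slopes])
```

Both implied slopes agree to about two decimal places, which is consistent with a single interaction coefficient generating the three tabulated conditional effects.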

Table A5. Moderating effect of negative emotional polarity index (EPI_N) on the path of self-disclosure.

Path	EPI_N	$β_{4}$	SE	$t$	$p$	LLCI	ULCI
T2:SI × EPI_N→ER	0	0.0303	0.0033	9.2472	0	0.0238	0.0367
	0.0499	0.0239	0.0025	9.5815	0	0.019	0.0287
	0.129	0.0137	0.0039	3.502	0.0005	0.006	0.0214
T3:SI × EPI_N→ER	0	0.001	0.0002	4.7571	0	0.0006	0.0013
	0.0499	0.0015	0.0002	8.1639	0	0.0011	0.0018
	0.129	0.0023	0.0004	6.0837	0	0.0016	0.003
T4:SI × EPI_N→ER	0	0.0013	0.0002	6.3946	0	0.0009	0.0017
	0.0499	0.0018	0.0002	10.2507	0	0.0015	0.0022
	0.129	0.0027	0.0004	7.1358	0	0.002	0.0035
T5:SI × EPI_N→ER	0	0.0011	0.0002	5.6241	0	0.0007	0.0015
	0.0499	0.0018	0.0002	10.1216	0	0.0014	0.0021
	0.129	0.0027	0.0004	7.7034	0	0.002	0.0034
T6:SI × EPI_N→ER	0	0.0011	0.0002	5.5868	0	0.0007	0.0015
	0.0499	0.0017	0.0002	9.4572	0	0.0014	0.0021
	0.129	0.0026	0.0004	7.3721	0	0.0019	0.0033
T7:SI × EPI_N→ER	0	0.0012	0.0002	6.0433	0	0.0008	0.0016
	0.0499	0.0018	0.0002	10.5299	0	0.0015	0.0022
	0.129	0.0028	0.0004	7.9282	0	0.0021	0.0035
T8:SI × EPI_N→ER	0	0.0013	0.0002	6.4397	0	0.0009	0.0016
	0.0499	0.0017	0.0002	10.2584	0	0.0014	0.0021
	0.129	0.0025	0.0003	7.195	0	0.0018	0.0032
T9:SI × EPI_N→ER	0	0.0012	0.0002	5.9102	0	0.0008	0.0016
	0.0499	0.0017	0.0002	10.0795	0	0.0014	0.002
	0.129	0.0026	0.0003	7.4181	0	0.0019	0.0033
T10:SI × EPI_N→ER	0	0.043	0.0115	3.7577	0.0002	0.0206	0.0655
	0.0499	0.029	0.0093	3.1297	0.0018	0.0108	0.0472
	0.129	0.0068	0.013	0.5251	0.5997	−0.0187	0.0323
T11:SI × EPI_N→ER	0	0.0012	0.0002	6.0001	0	0.0008	0.0016
	0.0499	0.0017	0.0002	9.8758	0	0.0014	0.0021
	0.129	0.0026	0.0004	7.3121	0	0.0019	0.0033

Note: CIs excluding 0 indicate statistical significance at the 95% confidence level.
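The significance rule in the note can be applied mechanically. The following sketch copies the confidence limits for T2 and T10 from Table A5 and flags which conditional effects have intervals that exclude zero; the helper name `excludes_zero` is illustrative.

```python
# (topic, EPI_N level, LLCI, ULCI) copied from Table A5 for two topics.
rows = [
    ("T2", 0.0, 0.0238, 0.0367),
    ("T2", 0.0499, 0.0190, 0.0287),
    ("T2", 0.129, 0.0060, 0.0214),
    ("T10", 0.0, 0.0206, 0.0655),
    ("T10", 0.0499, 0.0108, 0.0472),
    ("T10", 0.129, -0.0187, 0.0323),
]


def excludes_zero(llci: float, ulci: float) -> bool:
    """True when the interval lies entirely on one side of zero."""
    return llci > 0 or ulci < 0


significant = {
    (topic, level): excludes_zero(lo, hi) for topic, level, lo, hi in rows
}
for key, flag in significant.items():
    print(key, "significant" if flag else "not significant")
```

Of the six rows shown, only the T10 effect at EPI_N = 0.129 has an interval that spans zero.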

References

Park, W.H.; Shin, D.R.; Mutahira, H. An Integrated Approach to Bayesian Weight Regulations and Multitasking Learning Methods for Generating Emotion-Based Content in the Metaverse. Expert Syst. Appl. 2025, 259, 125197.
Feuerriegel, S.; Maarouf, A.; Bär, D.; Geissler, D.; Schweisthal, J.; Pröllochs, N.; Robertson, C.E.; Rathje, S.; Hartmann, J.; Mohammad, S.M.; et al. Using Natural Language Processing to Analyse Text Data in Behavioural Science. Nat. Rev. Psychol. 2025, 4, 96–111.
Govindarajan, U.H.; Narang, G.; Singh, D.K.; Yadav, V.S. Blockchain Technologies Adoption in Healthcare: Overcoming Barriers amid the Hype Cycle to Enhance Patient Care. Technol. Forecast. Soc. Change 2025, 213, 124031.
Chaudhary, A.; Milios, E.; Rajabi, E. Top2Label: Explainable Zero Shot Topic Labelling Using Knowledge Graphs. Expert Syst. Appl. 2024, 242, 122676.
Karatepe, O.M.; Rezapouraghdam, H.; Hassannia, R.; Karatepe, T.; Kim, T.T. Test of a Moderated Mediation Model of Green Human Resource Management, Workplace Spirituality, Environmental Commitment, and Green Behavior. Int. J. Hosp. Manag. 2025, 126, 104010.
Freo, M.; Luati, A. Lasso-Based Variable Selection Methods in Text Regression: The Case of Short Texts. AStA Adv. Stat. Anal. 2024, 108, 69–99.
Adamek, R.; Smeekes, S.; Wilms, I. Lasso Inference for High-Dimensional Time Series. J. Econom. 2023, 235, 1114–1143.
Wang, Y.; Zou, B.; Xu, J.; Xu, C.; Tang, Y.Y. ALR-HT: A Fast and Efficient Lasso Regression without Hyperparameter Tuning. Neural Netw. 2025, 181, 106885.
Bulut, A. TopicMachine: Conversion Prediction in Search Advertising Using Latent Topic Models. IEEE Trans. Knowl. Data Eng. 2014, 26, 2846–2858.
Taha, K. Text Regression Analysis: A Review, Empirical, and Experimental Insights. IEEE Access 2024, 12, 137333–137344.
Liu, Y.; Zhou, X.; Zhang, Z.; Yang, X. BETM: A New Pre-Trained BERT-Guided Embedding-Based Topic Model. Big Data Res. 2025, 41, 100551.
Berger, J.A.; Milkman, K.L. What Makes Online Content Viral? SSRN Electron. J. 2012, 49, 192–205.
Wang, Y.; Sun, J.; Ma, Z. Dual Impact of Information Complexity and Individual Characteristics on Information and Disease Propagation. Mathematics 2025, 13, 1949.
Balaji, M.S.; Behl, A.; Jain, K.; Baabdullah, A.M.; Giannakis, M.; Shankar, A.; Dwivedi, Y.K. Effectiveness of B2B Social Media Marketing: The Effect of Message Source and Message Content on Social Media Engagement. Ind. Mark. Manag. 2023, 113, 243–257.
Chae, M.-J.; Rodríguez-Vilá, O.; Bharadwaj, S. Real-Time Marketing Messages and Consumer Engagement in Social Media. J. Bus. Res. 2025, 191, 115266.
Bourguignon, B.; Terho, H.; Hajjem, A. How B2B Social Media Content Strategies Generate Engagement across Different Social Media Platforms. Ind. Mark. Manag. 2025, 125, 413–430.
Lu, X.; Zhou, X.; Gan, S.; He, X.; Chen, X.; Xiao, Y.; Liu, Y. SAEQ: Semantic Anomaly Event Quantifier for Event Detection and Judgement in Social Media. Expert Syst. Appl. 2025, 271, 126522.
Alacovska, A.; Bucher, E.; Fieseler, C. Algorithmic Paranoia: Gig Workers’ Affective Experience of Abusive Algorithmic Management. New Technol. Work Employ. 2024, 1–15.
Barta, K.; Andalibi, N. Theorizing Self Visibility on Social Media: A Visibility Objects Lens. ACM Trans. Comput.-Hum. Interact. 2024, 31, 1–28.
Carbone, E.; Loewenstein, G.; Scopelliti, I.; Vosgerau, J. He Said, She Said: Gender Differences in the Disclosure of Positive and Negative Information. J. Exp. Soc. Psychol. 2024, 110, 104525.
Setten, E.; Chen, S. Playing with Emotions: Text Analysis of Emotional Tones in Gender-Casted Children’s Media. J. Bus. Res. 2024, 175, 114541.
Martínez-Sykora, A.; McLeod, F.; Cherrett, T.; Friday, A. Exploring Fairness in Food Delivery Routing and Scheduling Problems. Expert Syst. Appl. 2024, 240, 122488.
Liu, J.; Pei, S.; Zhang, X. (Michael) Online Food Delivery Platforms and Female Labor Force Participation. Inf. Syst. Res. 2024, 35, 1074–1091.
Zhang, J.; Özpolat, K.; Karamemis, G.; Schniederjans, D. To Disclose or Not? The Impact of Prosocial Behavior Disclosure on the Attainment of Social Capital on Social Networking Sites. Decis. Support Syst. 2025, 192, 114437.
Xiao, Z.; Gong, X.; Cheung, C.M.K. Self-Disclosure in Online Social Networks: The Needs-Affordances-Features Perspective. Inf. Manag. 2025, 62, 104102.
Stevic, A.; Koban, K.; Matthes, J. Tell Me More: Longitudinal Relationships between Online Self-Disclosure, Co-Rumination, and Psychological Well-Being. Comput. Hum. Behav. 2025, 165, 108540.
Shaffer, D.R.; Ogden, J.K. On Sex Differences in Self-Disclosure during the Acquaintance Process: The Role of Anticipated Future Interaction. J. Personal. Soc. Psychol. 1986, 51, 92–101.
Won, J.; Lee, D.; Lee, J. Understanding Experiences of Food-Delivery-Platform Workers under Algorithmic Management Using Topic Modeling. Technol. Forecast. Soc. Change 2023, 190, 122369.
Zhang, Y.; Li, D.; Liu, S. Research on the Impact of the Public Safety Emergencies on Women Riders’ Preference of Shanghai Real-Time Crowdsourcing Logistics Platform. Sage Open 2024, 14, 21582440241255804.
Li, D.; Zhang, Y. Exploring Asymmetric Gender-Based Satisfaction of Delivery Riders in Real-Time Crowdsourcing Logistics Platforms. Symmetry 2024, 16, 1499.
Lin, H.; Wang, C.; Hao, Q. A Novel Personality Detection Method Based on High-Dimensional Psycholinguistic Features and Improved Distributed Gray Wolf Optimizer for Feature Selection. Inf. Process. Manag. 2023, 60, 103217.
Ahmed, K.; Khan, M.A.; Haq, I.; Mazroa, A.A.; M.S., S.; Innab, N.; Alajmi, M.; Alkahtani, H.K. Social Media’s Dark Secrets: A Propagation, Lexical and Psycholinguistic Oriented Deep Learning Approach for Fake News Proliferation. Expert Syst. Appl. 2024, 255, 124650.
Kim, H.-S.; Pak, J.; Chung, M.-Y.; Kim, Y. How Self-Disclosure Builds Cancer Communities through Authentic Stories on YouTube: Mediating Role of User Participation in Self-Disclosure Reciprocity. Comput. Hum. Behav. 2024, 156, 108226.
Cropanzano, R.; Anthony, E.L.; Daniels, S.R.; Hall, A.V. Social Exchange Theory: A Critical Review with Theoretical Remedies. Acad. Manag. Ann. 2017, 11, 479–516.
Chernyak-Hai, L.; Rabenu, E. The New Era Workplace Relationships: Is Social Exchange Theory Still Relevant? Ind. Organ. Psychol. 2018, 11, 456–481.
Lin, T.-C.; Huang, C.-C. Withholding Effort in Knowledge Contribution: The Role of Social Exchange and Social Cognitive on Project Teams. Inf. Manag. 2010, 47, 188–196.
Nunkoo, R.; Ramkissoon, H. Power, Trust, Social Exchange and Community Support. Ann. Touris. Res. 2012, 39, 997–1023.
Luqman, A.; Zhang, Q.; Hina, M. Employees’ Proactiveness on Enterprise Social Media and Social Consequences: An Integrated Perspective of Social Network and Social Exchange Theories. Inf. Manag. 2023, 60, 103843.
Ren, X. How Customized Managerial Responses Influence Subsequent Consumer Ratings: The Language Style Matching Perspective. Decis. Support Syst. 2024, 180, 114188.
Fu, X.; Liu, X.; Li, Z. Catching Eyes of Social Media Wanderers: How Pictorial and Textual Cues in Visitor-Generated Content Shape Users’ Cognitive-Affective Psychology. Tour. Manag. 2024, 100, 104815.
Kim, A.J.; Johnson, K.K.P. Power of Consumers Using Social Media: Examining the Influences of Brand-Related User-Generated Content on Facebook. Comput. Hum. Behav. 2016, 58, 98–108.
Leung, X.Y.; Sun, J.; Asswailem, A. Attractive Females versus Trustworthy Males: Explore Gender Effects in Social Media Influencer Marketing in Saudi Restaurants. Int. J. Hosp. Manag. 2022, 103, 103207.
Teepapal, T. AI-Driven Personalization: Unraveling Consumer Perceptions in Social Media Engagement. Comput. Hum. Behav. 2025, 165, 108549.
Meier, Y.; Krämer, N.C. The Privacy Calculus Revisited: An Empirical Investigation of Online Privacy Decisions on between- and within-Person Levels. Commun. Res. 2024, 51, 178–202.
Ichien, N.; Lin, N.; Holyoak, K.J.; Lu, H. Cognitive Complexity Explains Processing Asymmetry in Judgments of Similarity versus Difference. Cognit. Psychol. 2024, 151, 101661.
Powell, P.A.; Roberts, J. Situational Determinants of Cognitive, Affective, and Compassionate Empathy in Naturalistic Digital Interactions. Comput. Hum. Behav. 2017, 68, 137–148.
Fawcett, S.E.; Jin, Y.H.; Fawcett, A.M.; Magnan, G. I Know It When I See It: The Nature of Trust, Trustworthiness Signals, and Strategic Trust Construction. Int. J. Logist. Manag. 2017, 28, 914–938. [Google Scholar] [CrossRef]
Liao, L.; Huang, T. The Effect of Different Social Media Marketing Channels and Events on Movie Box Office: An Elaboration Likelihood Model Perspective. Inf. Manag. 2021, 58, 103481. [Google Scholar] [CrossRef]
Shi, J.; Hu, P.; Lai, K.K.; Chen, G. Determinants of Users’ Information Dissemination Behavior on Social Networking Sites: An Elaboration Likelihood Model Perspective. Internet Res. 2018, 28, 393–418. [Google Scholar] [CrossRef]
Vachhani, S.J. Networked Feminism in a Digital Age—Mobilizing Vulnerability and Reconfiguring Feminist Politics in Digital Activism. Gend. Work Organ. 2024, 31, 1031–1048. [Google Scholar] [CrossRef]
Song, Y.; Lin, Q.; Kwon, K.H.; Choy, C.H.Y.; Xu, R. Contagion of Offensive Speech Online: An Interactional Analysis of Political Swearing. Comput. Hum. Behav. 2022, 127, 107046. [Google Scholar] [CrossRef]
Rimé, B.; Finkenauer, C.; Luminet, O.; Zech, E.; Philippot, P. Social Sharing of Emotion: New Evidence and New Questions. Eur. Rev. Soc. Psychol. 1998, 9, 145–189. [Google Scholar] [CrossRef]
Rodríguez Hidalgo, C.T.; Tan, E.S.H.; Verlegh, P.W.J. The Social Sharing of Emotion (SSE) in Online Social Networks: A Case Study in Live Journal. Comput. Hum. Behav. 2015, 52, 364–372. [Google Scholar] [CrossRef]
Gaspar, R.; Pedro, C.; Panagiotopoulos, P.; Seibt, B. Beyond Positive or Negative: Qualitative Sentiment Analysis of Social Media Reactions to Unexpected Stressful Events. Comput. Hum. Behav. 2016, 56, 179–191. [Google Scholar] [CrossRef]
Stsiampkouskaya, K.; Joinson, A.; Piwek, L.; Ahlbom, C.-P. Emotional Responses to Likes and Comments Regulate Posting Frequency and Content Change Behaviour on Social Media: An Experimental Study and Mediation Model. Comput. Hum. Behav. 2021, 124, 106940. [Google Scholar] [CrossRef]
Pham, S.; Churruca, K.; Ellis, L.A.; Braithwaite, J. Utilisation of the Internet and Support Communities on Facebook for Gestational Diabetes Mellitus Self-Management and Empowerment: A Cross-Sectional Online Survey Study. Inf. Commun. Soc. 2025, 1–19. [Google Scholar] [CrossRef]
Lei, Y.-W. Delivering Solidarity: Platform Architecture and Collective Contention in China’s Platform Economy. Am. Sociol. Rev. 2021, 86, 279–309. [Google Scholar] [CrossRef]
Ozkan-Tektas, O.; Basgoze, P. Pre-Recovery Emotions and Satisfaction: A Moderated Mediation Model of Service Recovery and Reputation in the Banking Sector. Eur. Manag. J. 2017, 35, 388–395. [Google Scholar] [CrossRef]
McHaney, R.; Tako, A.; Robinson, S. Using LIWC to Choose Simulation Approaches: A Feasibility Study. Decis. Support Syst. 2018, 111, 1–12. [Google Scholar] [CrossRef]
Walther, J.B. Selective Self-Presentation in Computer-Mediated Communication: Hyperpersonal Dimensions of Technology, Language, and Cognition. Comput. Hum. Behav. 2007, 23, 2538–2557. [Google Scholar] [CrossRef]
Surucu-Balci, E.; Balci, G.; Yuen, K.F. Social Media Engagement of Stakeholders: A Decision Tree Approach in Container Shipping. Comput. Ind. 2020, 115, 103152. [Google Scholar] [CrossRef]
Wang, L.; Yan, J.; Lin, J.; Cui, W. Let the Users Tell the Truth: Self-Disclosure Intention and Self-Disclosure Honesty in Mobile Social Networking. Int. J. Inf. Manag. 2017, 37, 1428–1440. [Google Scholar] [CrossRef]
Cai, R.; Wang, Y.-C.; Sun, J. Customers’ Intention to Compliment and Complain via AI-Enabled Platforms: A Self-Disclosure Perspective. Int. J. Hosp. Manag. 2024, 116, 103628. [Google Scholar] [CrossRef]
Leite, F.P.; Pontes, N.; Baptista, P.D.P. Oops, I’ve Overshared! When Social Media Influencers’ Self-Disclosure Damage Perceptions of Source Credibility. Comput. Hum. Behav. 2022, 133, 107274. [Google Scholar] [CrossRef]
Wu, X.; Liang, R.; Zhang, Z.; Cui, Z. Multi-Block Linearized Alternating Direction Method for Sparse Fused Lasso Modeling Problems. Appl. Math. Modell. 2025, 137, 115694. [Google Scholar] [CrossRef]
Zhang, Y.; Shi, X.; Abdul-Hamid, Z.; Li, D.; Zhang, X.; Shen, Z. Factors Influencing Crowdsourcing Riders’ Satisfaction Based on Online Comments on Real-Time Logistics Platform. Transp. Lett. 2023, 15, 363–374. [Google Scholar] [CrossRef]
Jing, Z.; Yuru, L.; Yue, Z. More Reliance, More Injuries: Income Dependence, Workload and Work Injury of Online Food-Delivery Platform Riders. Saf. Sci. 2023, 167, 106264. [Google Scholar] [CrossRef]
Veen, A.; Barratt, T.; Goods, C. Platform-Capital’s ‘App-Etite’ for Control: A Labour Process Analysis of Food-Delivery Work in Australia. Work Employ. Soc. 2020, 34, 388–406. [Google Scholar] [CrossRef]
Newlands, G. Algorithmic Surveillance in the Gig Economy: The Organization of Work through Lefebvrian Conceived Space. Organ. Stud. 2021, 42, 719–737. [Google Scholar] [CrossRef]
Zhang, R.; Yu, Z.; Yao, W. Navigating the Complexities of Online Opinion Formation: An Insight into Consumer Cognitive Heuristics. J. Retail. Consum. Serv. 2024, 81, 103966. [Google Scholar] [CrossRef]
Nguyen-Phuoc, D.Q.; Mai, N.X.; Ho-Mai, N.T.; Nguyen, M.H.; Oviedo-Trespalacios, O. What Factors Contribute to In-Role and Extra-Role Safety Behavior among Food Delivery Riders? Transp. Res. Part F Psychol. Behav. 2024, 102, 177–198. [Google Scholar] [CrossRef]
Xiang, Y.; Du, J.; Zheng, X.N.; Long, L.R.; Xie, H.Y. Judging in the Dark: How Delivery Riders Form Fairness Perceptions under Algorithmic Management. J. Bus. Ethics 2024, 199, 653–670. [Google Scholar] [CrossRef]
Zhang, Y.; Huang, H. Unraveling How Poor Logistics Service Quality of Cross-Border E-Commerce Influences Customer Complaints Based on Text Mining and Association Analysis. J. Retail. Consum. Serv. 2025, 84, 104237. [Google Scholar] [CrossRef]

Figure 1. Conceptual model of the pathway from expression to empowerment: H1 = topics influence self-disclosure; H2a = self-disclosure drives engagement; H2b = topics affect engagement; H3a/H3b = cognitive complexity moderates the effects of topics/self-disclosure; H4a/H4b = emotional polarity moderates the effects of self-disclosure/topics.

Figure 2. Flow chart of the Top2Vec algorithm.

Figure 3. Distribution of key variables with skewness and kurtosis.

Figure 4. Kendall’s Tau correlation matrix among main variables. Notes: for the significance of the correlation coefficients, see Table A3. The color gradient is centered at $\tau = 0.30$ to visually emphasize meaningful correlation strengths.

Figure 5. Relative contribution of indirect effects by topics. Note: Values reflect relative contribution of indirect effects by topics. Topics with 95% bootstrap CI excluding zero are considered statistically significant (see Table 6).

Figure 6. Conditional indirect effects by topic category. Note: Values reflect conditional indirect effects. Topics with 95% bootstrap CI excluding zero are considered statistically significant (see Table A4 and Table A5). Topics with the three largest standardized effect sizes are highlighted using diagonal hatching for visual emphasis.

Table 1. The symbols with definitions of variables.

Symbol	Definition
$X_{i} \in \mathbb{R}^{k}$	Vector of topic probabilities for post $i$, representing work experience themes $T_{1}$ to $T_{k}$.
$T_{i} \in \{1, 2, \dots, k\}$	Dominant topic category for post $i$, used in subsample analysis.
$M_{i} \in [0, 1]$	Self-disclosure index (SI), a composite score based on breadth, depth, and duration of expression.
$Y_{i} \in [0, 1]$	Engagement rate (ER), defined as the ratio of audience interactions (likes + comments) to post views.
$W_{1 i} \in [0, 1]$	Cognitive complexity index (CCI), measured by LIWC cognitive function word ratio.
$W_{2 i} \in [0, 1]$	Emotional polarity index (EPI), normalized sentiment score centered at 0.5.
$Z_{i}$	Control variables: post time, user ID, etc.

Table 2. Description of variables.

Symbol	Variable Name	Definition	Measurement Method
$X$	Work Experience Topic	Categorical variable indicating dominant topic T1, T2, …, T11	Top2Vec dominant cluster
$M$	Self-Disclosure Index (SI)	Composite index representing expressive depth and breadth	Standardized sum [33]: $SI = (Breadth + Depth) \times Length$
$W_{1}$	Cognitive Complexity Index (CCI)	Proportion of cognitive mechanism words (e.g., causation and tentativeness)	Output from LIWC dictionary [59]
$W_{2}$	Emotional Polarity Index (EPI)	Affective tone of post content	$EPI = \frac{1 + pos}{2 + pos + neg}$ , centered at 0.5 [60]
$Y$	Engagement Rate (ER)	Relative audience interaction rate for each post	$ER = \frac{likes + comments}{views}$ [61]
$Z$	Control Variables	Posting time and user ID	Categorical or continuous
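
The measurement formulas in Table 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function names are hypothetical, `pos` and `neg` are assumed to be the positive and negative emotion scores that feed the EPI, and the SI function returns the raw product before any standardization.

```python
def emotional_polarity_index(pos: float, neg: float) -> float:
    """EPI = (1 + pos) / (2 + pos + neg); equals 0.5 when pos == neg."""
    if pos < 0 or neg < 0:
        raise ValueError("pos and neg must be non-negative")
    return (1.0 + pos) / (2.0 + pos + neg)


def engagement_rate(likes: int, comments: int, views: int) -> float:
    """ER = (likes + comments) / views."""
    if views <= 0:
        raise ValueError("views must be positive")
    return (likes + comments) / views


def self_disclosure_index(breadth: float, depth: float, length: float) -> float:
    """SI = (Breadth + Depth) * Length, before any standardization."""
    return (breadth + depth) * length


# Hypothetical post: balanced emotion words, 30 likes and 5 comments on 200 views.
print(emotional_polarity_index(0.2, 0.2))  # neutral tone sits at the 0.5 center
print(engagement_rate(30, 5, 200))
```

Because the EPI adds 1 to the numerator and 2 to the denominator, a post with no emotion words at all also maps to 0.5, which is why the index is described as centered there.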

Table 3. Summary of path effects in the conditional process model.

Effect Type	Formula	Equation
Simple Mediation	$E = \alpha_{1} \cdot \beta_{2}$	(9)
First-Stage Moderation	$ME_{1} = \frac{\partial M}{\partial X} = \alpha_{1} + \alpha_{2} W_{1}$	(10)
Second-Stage Moderation	$ME_{2} = \frac{\partial Y}{\partial M} = \beta_{2} + \beta_{3} W_{1} + \beta_{4} W_{2}$	(11)
Moderated Mediation	$IE_{X \to M \to Y}^{j} = (\alpha_{1}^{j} + \alpha_{2}^{j} W_{1}) (\beta_{2}^{j} + \beta_{3}^{j} W_{1} + \beta_{4}^{j} W_{2})$	(12)
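
Equations (9) to (12) translate directly into small functions. The sketch below is illustrative only; the function names are hypothetical, and the example inputs are the T6 coefficients and moderator values reported in Table 10.

```python
def simple_mediation(alpha1: float, beta2: float) -> float:
    """Equation (9): E = alpha1 * beta2."""
    return alpha1 * beta2


def first_stage_effect(alpha1: float, alpha2: float, w1: float) -> float:
    """Equation (10): ME1 = alpha1 + alpha2 * W1."""
    return alpha1 + alpha2 * w1


def second_stage_effect(beta2: float, beta3: float, beta4: float,
                        w1: float, w2: float) -> float:
    """Equation (11): ME2 = beta2 + beta3 * W1 + beta4 * W2."""
    return beta2 + beta3 * w1 + beta4 * w2


def conditional_indirect_effect(alpha1, alpha2, beta2, beta3, beta4, w1, w2):
    """Equation (12): IE = ME1 * ME2."""
    return (first_stage_effect(alpha1, alpha2, w1)
            * second_stage_effect(beta2, beta3, beta4, w1, w2))


# T6 row of Table 10: alpha1, alpha2, beta2, beta3, beta4, W1, W2.
t6 = dict(alpha1=20.1933, alpha2=11.7272, beta2=0.0303,
          beta3=0.0006, beta4=0.0017, w1=5.0029, w2=5.05)

print(round(simple_mediation(t6["alpha1"], t6["beta2"]), 4))            # about 0.6119
print(round(first_stage_effect(t6["alpha1"], t6["alpha2"], t6["w1"]), 4))  # about 78.8633
print(round(conditional_indirect_effect(**t6), 4))                       # about 3.3033
```

With these inputs the functions reproduce the T6 entries for $E$ in Table 7, $ME_{1}$ in Table 9, and the conditional indirect effect in Table 10, up to rounding.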

Table 4. Descriptive statistics of key variables ($N = 644$).

Variables	Min	Max	Mean	Standard Deviation	Variance	VIF
ER	0.097	1	0.235	0.109	0.012
SI	0	259.229	14.311	24.350	592.915	1.680
T1	0	15	1.798	2.718	7.39	3.112
T2	0	12	1.112	1.761	3.101	2.428
T3	0	7	0.595	1.130	1.277	2.027
T4	0	4	0.228	0.579	0.335	1.345
T5	0	4	0.210	0.524	0.275	1.289
T6	0	3	0.211	0.498	0.248	1.279
T7	0	4	0.144	0.428	0.183	1.268
T8	0	3	0.175	0.472	0.223	1.228
T9	0	4	0.185	0.522	0.272	1.353
T10	0	4	0.157	0.448	0.201	1.276
T11	0	5	0.172	0.495	0.246	1.394
EPI_P	0	0.389	0.039	0.074	0.005	1.173
EPI_N	0	0.333	0.050	0.079	0.006	1.202
CCI	0	6.208	3.695	1.307	1.71	2.337

Table 5. Regression coefficients, direct effects.

Path	$\alpha_{1}^{j} / \beta_{1}^{j}$	$\beta_{std}$	SE	Bootstrap CIs
T1 → SI	4.3957 ***	0.4907	0.308	(0.0150, 0.0211)
T2 → SI	6.9963 ***	0.506	0.4707	(0.0192, 0.0305)
T3 → SI	9.1129 ***	0.4229	0.7706	(0.0196, 0.0370)
T4 → SI	13.7441 ***	0.3267	1.5691	(0.0130, 0.0435)
T5 → SI	15.0642 ***	0.3243	1.7341	(0.0224, 0.0545)
T6 → SI	20.1933 ***	0.4127	1.7587	(0.0171, 0.0569)
T7 → SI	17.4328 ***	0.3061	2.1395	(0.0190, 0.0516)
T8 → SI	13.44 ***	0.2605	1.9663	(0.0321, 0.0668)
T9 → SI	13.5635 ***	0.2906	1.7626	(0.0266, 0.0644)
T10 → SI	18.4264 ***	0.3392	2.0171	(0.0176, 0.0517)
T11 → SI	16.6056 ***	0.3379	1.8254	(0.0212, 0.0584)
T1 → ER	0.0181 ***	0.4485	0.0015	(0.0150, 0.0211)
T2 → ER	0.0249 ***	0.3999	0.0024	(0.0192, 0.0305)
T3 → ER	0.0283 ***	0.2922	0.0037	(0.0196, 0.0370)
T4 → ER	0.0282 ***	0.1493	0.0071	(0.0130, 0.0435)
T5 → ER	0.0385 ***	0.1842	0.0078	(0.0224, 0.0545)
T6 → ER	0.037 ***	0.1682	0.0086	(0.0171, 0.0569)
T7 → ER	0.0353 ***	0.1379	0.0098	(0.0190, 0.0516)
T8 → ER	0.0494 ***	0.2131	0.0084	(0.0321, 0.0668)
T9 → ER	0.0455 ***	0.2169	0.0077	(0.0266, 0.0644)
T10 → ER	0.0346 ***	0.1418	0.0092	(0.0176, 0.0517)
T11 → ER	0.0398 ***	0.1802	0.0083	(0.0212, 0.0584)

Notes: $\alpha_{1}^{j} / \beta_{1}^{j}$ denotes the unstandardized estimate; $\beta_{std}$ denotes the standardized estimate; SE denotes standard error; CI denotes the 95% bootstrap confidence interval; *** $p < 0.01$.

Table 6. Mediating effects of self-disclosure.

Path	$\beta_{2}^{j}$	$\beta_{std}$	SE	Bootstrap CIs
T1 → SI → ER	0.0039 ***	0.0942	0.0009	(0.0026, 0.0064)
T2 → SI → ER	0.0063 ***	0.1061	0.0017	(0.0045, 0.0113)
T3 → SI → ER	0.0118 ***	0.122	0.0031	(0.0081, 0.0203)
T4 → SI → ER	0.022 ***	0.1187	0.0059	(0.0145, 0.0379)
T5 → SI → ER	0.0241 ***	0.1143	0.0059	(0.0162, 0.0391)
T6 → SI → ER	0.0303 ***	0.1414	0.0062	(0.0214, 0.0456)
T7 → SI → ER	0.0296 ***	0.1132	0.0057	(0.0202, 0.0427)
T8 → SI → ER	0.0215 ***	0.0929	0.0054	(0.0139, 0.0349)
T9 → SI → ER	0.0217 ***	0.1014	0.0061	(0.0136, 0.0374)
T10 → SI → ER	0.0295 ***	0.1234	0.0064	(0.0212, 0.0465)
T11 → SI → ER	0.0266 ***	0.1187	0.0059	(0.0180, 0.0408)

Notes: $\beta_{2}^{j}$ denotes the unstandardized estimate; $\beta_{std}$ denotes the standardized estimate; SE denotes standard error; CI denotes the 95% bootstrap confidence interval; *** $p < 0.01$.
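
The bootstrap confidence intervals reported for the mediated paths can be illustrated with a percentile bootstrap of the product of coefficients. The sketch below is a simplified stand-in, not the PROCESS macro used in the study: it assumes a single predictor, one mediator, no covariates, and synthetic data generated inside the block; all function names are hypothetical.

```python
import random
import statistics


def slope(x, y):
    """OLS slope of y on x (one predictor plus intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def indirect_effect(x, m, y):
    """a * b: a is the slope of M on X, b the slope of Y on M given X."""
    a = slope(x, m)
    # Frisch-Waugh-Lovell: partial X out of both M and Y, then regress.
    rm, ry = residuals(x, m), residuals(x, y)
    b = sum(u * v for u, v in zip(rm, ry)) / sum(u * u for u in rm)
    return a * b


def percentile_bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for a * b, resampling observations."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx]))
    draws.sort()
    lower = draws[int((1 - level) / 2 * n_boot)]
    upper = draws[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Synthetic data only (not the paper's posts): true a = 4.0, true b = 0.02.
gen = random.Random(42)
x = [gen.uniform(0, 5) for _ in range(200)]
m = [4.0 * xi + gen.gauss(0, 2) for xi in x]
y = [0.02 * mi + 0.01 * xi + gen.gauss(0, 0.05) for mi, xi in zip(m, x)]

point = indirect_effect(x, m, y)
ci_low, ci_high = percentile_bootstrap_ci(x, m, y)
print(f"indirect effect = {point:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```

An interval that excludes zero, as in every row of Table 6, is the criterion the tables use to call a mediated path statistically significant.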

Table 7. The relative contribution of topics.

Rank	Variables	$E$	Relative Contribution (%)	Cumulative Contribution (%)
1	T6	0.6119	17.33%	17.33%
2	T10	0.5436	15.40%	32.73%
3	T7	0.516	14.62%	47.35%
4	T11	0.4417	12.51%	59.86%
5	T5	0.363	10.28%	70.14%
6	T4	0.3024	8.57%	78.71%
7	T9	0.2943	8.34%	87.05%
8	T8	0.289	8.19%	95.24%
9	T3	0.1075	3.04%	98.28%
10	T2	0.0441	1.25%	99.53%
11	T1	0.0171	0.48%	100%

Note: Values reflect relative contribution of topics. Topics with 95% bootstrap CI excluding zero are considered statistically significant (see Table 6).
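
The relative and cumulative contributions in Table 7 follow from dividing each topic's indirect effect $E$ by the sum over all topics. A minimal sketch, assuming the $E$ values are taken from the table as printed (the function name is hypothetical):

```python
from itertools import accumulate

# Indirect effects E per topic, copied from Table 7.
E = {"T6": 0.6119, "T10": 0.5436, "T7": 0.516, "T11": 0.4417,
     "T5": 0.363, "T4": 0.3024, "T9": 0.2943, "T8": 0.289,
     "T3": 0.1075, "T2": 0.0441, "T1": 0.0171}


def relative_contributions(effects):
    """Return (topic, share %, cumulative %) rows, largest effect first."""
    total = sum(effects.values())
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    shares = [(name, 100.0 * value / total) for name, value in ranked]
    cumulative = accumulate(share for _, share in shares)
    return [(name, share, cum) for (name, share), cum in zip(shares, cumulative)]


rows = relative_contributions(E)
for name, share, cum in rows:
    print(f"{name}\t{share:.2f}%\t{cum:.2f}%")
```

Run on the tabulated values, the first row gives T6 with a share of about 17.33%, matching the table.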

Table 8. Moderating effects of negativity and cognitive complexity.

Variables	Path	$\alpha_{2}' / \beta_{3}' / \beta_{4}'$	Bootmean	BootSE	Bootstrap CIs
T2	SI × EPI_N→ER	0.0116 ***	0.0113	0.0034	(0.0047, 0.018)
T3	SI × EPI_N→ER	0.0104 ***	0.0103	0.0033	(0.0038, 0.0168)
T4	T4 × CCI→SI	2.6942 ***	2.8085	1.0609	(0.8329, 5.0519)
	SI × CCI→ER	0.0004 ***	0.0005	0.0002	(0.0001, 0.001)
	SI × EPI_N→ER	0.0112 ***	0.0109	0.0038	(0.0032, 0.0181)
T5	SI × CCI→ER	0.0005 ***	0.0005	0.0002	(0.0002, 0.001)
	SI × EPI_N→ER	0.0124 ***	0.0121	0.0036	(0.0046, 0.019)
T6	T6 × CCI→SI	7.2905 ***	7.317	2.3556	(3.5806, 12.7726)
	SI × CCI→ER	0.0004 ***	0.0005	0.0002	(0.0001, 0.0009)
	SI × EPI_N→ER	0.0112 ***	0.0109	0.0033	(0.0042, 0.017)
T7	SI × CCI→ER	0.0004 ***	0.0005	0.0002	(0.0002, 0.0009)
	SI × EPI_N→ER	0.0125 ***	0.0122	0.0036	(0.0049, 0.0189)
T8	T8 × CCI→SI	4.0234 ***	4.1305	1.2867	(1.7482, 6.9008)
	SI × CCI→ER	0.0004 ***	0.0005	0.0002	(0.0001, 0.0009)
	SI × EPI_N→ER	0.0097 ***	0.0093	0.0034	(0.0025, 0.016)
T9	SI × CCI→ER	0.0005 ***	0.0006	0.0002	(0.0002, 0.0011)
	SI × EPI_N→ER	0.0109 ***	0.0108	0.0036	(0.0037, 0.0175)
T10	SI × CCI→ER	0.0005 ***	0.0006	0.0002	(0.0002, 0.001)
	SI × EPI_N→ER	0.0134 ***	0.0131	0.0037	(0.0053, 0.0201)
T11	T11 × CCI→SI	2.8002 ***	3.0919	1.4477	(0.6758, 6.4133)
	SI × CCI→ER	0.0005 ***	0.0005	0.0002	(0.0002, 0.001)
	SI × EPI_N→ER	0.0106 ***	0.0103	0.0033	(0.0033, 0.0167)

Notes: $\alpha_{2}' / \beta_{3}' / \beta_{4}'$ denotes the unstandardized estimate; SE denotes standard error; CI denotes the 95% bootstrap confidence interval; *** $p < 0.01$.

Table 9. The second-stage moderation effects of topics.

Rank (by $ME_{2}$)	Variables	$ME_{1}$	$ME_{2}$	Direct Effect (DE)	DE’s Bootstrap CIs
1	T10	18.4264	0.2950	0.0016	(0.0005, 0.0012)
2	T11	48.9013	0.0467	0.0016	(0.0006, 0.0013)
3	T7	17.4328	0.0435	0.0017	(0.001, 0.0016)
4	T6	78.8633	0.0419	0.0015	(0.0013, 0.002)
5	T9	13.5635	0.0384	0.0016	(0.0013, 0.0019)
6	T8	31.7636	0.0365	0.0016	(0.0012, 0.0019)
7	T5	15.0642	0.0339	0.0016	(0.0013, 0.002)
8	T2	6.9963	0.0314	0.0009	(0.0013, 0.0019)
9	T4	39.7712	0.0296	0.0016	(0.0012, 0.0019)
10	T3	9.1129	0.0149	0.0013	(0.0013, 0.002)
11	T1	4.3957	0.0039	0.0009	(0.0012, 0.0019)

Note: Topics with 95% bootstrap CI excluding zero are considered statistically significant.

Table 10. The conditional indirect effect ($IE_{X \to M \to Y}^{j}$).

Rank	Variables	$\alpha_{1}^{j}$	$\alpha_{2}^{j}$	$\beta_{2}^{j}$	$\beta_{3}^{j}$	$\beta_{4}^{j}$	$W_{1}$	$W_{2}$	$IE_{X \to M \to Y}^{j}$
1	T10	18.4264	0	0.0295	0.0006	0.029	5.0029	9.05	5.4349
2	T6	20.1933	11.7272	0.0303	0.0006	0.0017	5.0029	5.05	3.3033
3	T11	16.6056	6.4554	0.0266	0.0006	0.0017	5.0029	10.05	2.283
4	T4	13.7441	5.2024	0.022	0.0006	0.0015	5.0029	3.05	1.1763
5	T8	13.44	3.6626	0.0215	0.0006	0.0017	5.0029	7.05	1.159
6	T7	17.4328	0	0.0296	0.0006	0.0018	5.0029	6.05	0.7582
7	T9	13.5635	0	0.0217	0.0006	0.0017	5.0029	8.05	0.5207
8	T5	15.0642	0	0.0241	0.0005	0.0018	5.0029	4.05	0.5105
9	T2	6.9963	0	0.0063	0	0.0239	5.0029	1.05	0.2196
10	T3	9.1129	0	0.0118	0	0.0015	5.0029	2.05	0.13555
11	T1	4.3957	0	0.0039	0	0	5.0029	0.05	0.0171

Note: All conditional indirect effects (IE) in the final column were computed based on statistically significant coefficients ($p < 0.01$) from the PROCESS Model 4 and Model 58 outputs. Bootstrap confidence intervals do not include zero.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, D.; Zhang, Y. A Statistical Framework for Modeling Behavioral Engagement via Topic and Psycholinguistic Features: Evidence from High-Dimensional Text Data. Mathematics 2025, 13, 2374. https://doi.org/10.3390/math13152374

