Measuring Psychological Constructs from Social Media Text Using the Word Embedding Projection Approach

Deng, Xudong; Li, Yijun

doi:10.3390/bs16050762

Open AccessArticle

Measuring Psychological Constructs from Social Media Text Using the Word Embedding Projection Approach

by

Xudong Deng

^*

and

Yijun Li

School of Management, Harbin Institute of Technology, Harbin 150001, China

^*

Author to whom correspondence should be addressed.

Behav. Sci. 2026, 16(5), 762; https://doi.org/10.3390/bs16050762

Submission received: 7 April 2026 / Revised: 2 May 2026 / Accepted: 11 May 2026 / Published: 13 May 2026

Download

Browse Figures

Versions Notes

Abstract

Social media text offers new opportunities for tracking psychological constructs over time, but such measurement requires methods that are interpretable and stable in longitudinal settings. Grounded in Conceptual Space Theory and semantic projection, this study develops the Word Embedding Projection Approach (WEPA), which represents psychological constructs as semantic axes defined by theory-driven anchor words in a domain-specific word embedding space. WEPA scores are interpreted as indicators of expressed psychological salience in user-generated text. Using panel data from Keep, a Chinese fitness-oriented social media platform (177,829 users; 2,668,298 weekly observations), we evaluate WEPA across four studies. Study I examines the measurement foundation through anchor-word validation, human-coded benchmarks, and cross-temporal semantic-axis stability checks. Studies II and III show theoretically interpretable predictive associations between WEPA-derived scores and subsequent exercise duration for goal-setting constructs and the four source dimensions of self-efficacy. Study IV provides exploratory relative-week trajectories of aggregate construct dynamics. Overall, WEPA shows strong agreement with human-coded benchmarks for goal specificity and physiological states and achieves broader text coverage than a dictionary-based baseline. These findings suggest that WEPA offers a promising approach to theory-driven psychological measurement from domain-specific social media text.

Keywords:

social media text; psychological measurement; Word Embedding Projection Approach; Conceptual Space Theory; semantic projection; self-efficacy; goal-setting theory; computational behavioral science

1. Introduction

User-generated content on social media offers new opportunities for large-scale, non-intrusive psychological measurement because it contains rich digital traces of users’ thoughts, feelings, and behavior (Berger et al., 2020; Eichstaedt & Weidman, 2020; Feng & Ivanov, 2023; Golder & Macy, 2014; Sultan et al., 2023; Zhang & Hu, 2024). Word embedding methods have been used to study macro-level phenomena such as stereotypes, ideology, and cultural change (Grand et al., 2022; Kozlowski et al., 2019). However, their validity for measuring individual psychological constructs, such as self-efficacy and goal commitment, remains less clear, especially in longitudinal settings where scores must remain interpretable and comparable over time. Dictionary-based methods, including LIWC-style tools, provide transparent lexical indicators but rely on closed vocabularies and exact word matching, which can lead to sparse coverage in short texts (Eichstaedt et al., 2020; Hartmann et al., 2019). Contextual language models and large language models offer richer semantic representations, but their outputs can be difficult to interpret as fixed construct dimensions and may be sensitive to context, prompts, model updates, or representational drift (Demszky et al., 2023; Nikadon et al., 2025; Sclar et al., 2023; Zhou et al., 2024).

To address this measurement problem, we extend semantic projection to individual-level psychological construct measurement and develop the Word Embedding Projection Approach (WEPA). WEPA does not introduce semantic projection as a new computational principle; instead, it adapts the geometric logic of word embeddings to a psychometric measurement task. In WEPA, theory-driven anchor words define semantic axes, and user text is projected onto these axes to obtain construct scores. This design links Conceptual Space Theory with domain-specific static embeddings, localized anchor-word dictionaries, and validation procedures to form an interpretable workflow for measuring psychological constructs from platform text. In this study, WEPA scores are interpreted as indicators of expressed psychological salience in user-generated text. They capture how strongly users’ language reflects construct-related meanings, not direct observations of latent psychological states or external motivational conditions.

WEPA makes three specific contributions. First, it applies semantic projection to individual-level psychological measurement in a large-scale Chinese digital fitness community, where short texts and domain-specific expressions challenge closed-vocabulary methods. Second, it demonstrates context-specific anchor-word development and validation, using Goal-Setting Theory and Social Cognitive Theory to guide the operationalization of psychological constructs in Chinese fitness-community text. Third, it treats semantic projection as a measurement workflow, not only a scoring technique, by examining measurement foundation, criterion validity, and cross-temporal semantic-axis stability. In this sense, WEPA should be understood as a psychometric implementation of semantic projection, not as a universal replacement for dictionary-based, supervised-learning or contextual language-model approaches.

We evaluate WEPA using an observational design and a stepwise validation strategy. Our analysis draws on panel data from Keep, a Chinese digital fitness community, comprising 177,829 users and 2,668,298 weekly observations. Study I establishes the measurement foundation through anchor-word validation, human-coded benchmarks, and cross-temporal semantic-axis stability checks. Studies II and III examine criterion validity by testing whether WEPA-derived construct scores are associated with subsequent exercise duration for goal-setting constructs and the four source dimensions of self-efficacy. Study IV provides an exploratory relative-week analysis of aggregate construct trajectories after aligning users by weeks since registration. Because the study is observational and based on a single platform, the findings should be interpreted as predictive associations and exploratory descriptive patterns, not as causal effects or universal psychological dynamics. To support transparent and reproducible research, we provide an open-source implementation of the WEPA scoring procedure.

2. Theoretical Foundations and WEPA Design

2.1. Text as a Basis for Psychological Measurement

Language is central to human social interaction and systematically reflects personality traits, behavioral intentions, and psychological tendencies (Herderich et al., 2024; Pennebaker et al., 2015). Research at the intersection of linguistics and psychology has shown that patterns of language use are associated with cognitive structure, affective states, motivational orientation, and related psychological processes (Humphreys & Wang, 2017; Kjell et al., 2019; Lee & Han, 2023; Orrù et al., 2024). This body of work provides a foundation for using social media text to measure individual psychological constructs.

In psychometrics, psychological constructs are latent variables that cannot be observed directly and must be inferred from observable indicators. In classical reflective models, indicators are assumed to be caused by the latent construct and are therefore expected to be correlated and partly interchangeable because they share the same underlying source. Under this logic, adding or removing a small number of indicators affects measurement precision, not the ontological meaning of the construct itself (Flake et al., 2017; Kyle et al., 2020). By extension, digital traces embedded in user-generated content can be treated as reflective linguistic indicators. Lexical choices and broader patterns of language use can thus be understood as manifestations of latent constructs such as self-efficacy and goal commitment, not as constitutive components of those constructs.

Treating user-generated content as reflective indicators implies three assumptions. First, latent psychological constructs give rise to recurring linguistic features and behavioral expressions. In a fitness context, for example, individuals with high self-efficacy may be more likely to use words such as capable and confident. Second, different textual features, including lexical choice and semantic orientation, can serve as related indicators because they share common variance generated by the same latent construct. Third, adding or removing individual words or expressions changes measurement reliability rather than the theoretical boundaries of the construct. Prior research has used this logic to identify latent structures linking language use to personality and emotion, suggesting that language-based indicators can capture psychological patterns with temporal regularity and behavioral relevance (Eichstaedt & Weidman, 2020; Golder & Macy, 2014).

2.2. Conceptual Space and Construct Representation

Conceptual Space Theory treats abstract concepts as mental representations that can be formalized in a multidimensional geometric space (Aceves & Evans, 2024; Douven et al., 2011). In WEPA, word embeddings define this semantic space at the word level, and theory-driven anchor words define construct-specific semantic axes. User text is then projected onto these axes to generate construct scores, which can later be aggregated to describe group- or community-level patterns when appropriate, as illustrated in Figure 1.

The core operation of WEPA is the mapping from word-level semantic representations to user-level construct indicators. In psychometric terms, constructs are the latent psychological variables of interest, dimensions are their theoretical components, and indicators are the projection scores generated from text. Conceptual Space Theory formalizes the semantic structure of a construct, while psychometric theory clarifies that the latent construct itself is inferred from observable indicators. Thus, a bipolar semantic axis does not replace the latent construct. It defines a theory-guided measurement direction in language space, and the projection score serves as a text-based indicator of expressed construct salience.

A bipolar dimension refers to a structure in which opposing tendencies on a single latent construct, such as extraversion–introversion or high pleasure–low pleasure, are treated as opposite ends of the same continuum (Barchard et al., 2024; Reise et al., 2013). Goal commitment illustrates this structure. High commitment may be represented by prototypes such as persist, determination, and complete, while low commitment may be represented by quit, abandon, and give up. These semantically opposed prototypes define the geometric axis of the construct, and a user’s text can be projected onto the axis to obtain a reflective linguistic indicator. For example, the expression, “I will persist and finish my 5K run today” falls on the positive side of the axis, indicating relatively high expressed goal commitment.

When a construct contains multiple theoretically distinct dimensions, these dimensions can be represented separately and, when justified, summarized into an overall score (Reise et al., 2013; Rodriguez et al., 2016). Self-efficacy illustrates this case. In Social Cognitive Theory, self-efficacy is shaped by mastery experience, vicarious experience, social persuasion, and physiological and affective states (Bandura, 1977). In this study, these four sources are operationalized as text-based dimensions that can be examined separately and aggregated as an overall self-efficacy indicator, allowing WEPA to quantify multidimensional constructs while preserving their internal organization.

2.3. Measurement Stability and Longitudinal Comparability

Measurement stability and longitudinal comparability are central concerns when psychological constructs are measured repeatedly over time. A related but stricter concept is measurement invariance, which refers to whether a measurement approach preserves a consistent construct scale across time, groups, or contexts (Leitgöb et al., 2022; Liu et al., 2017). In longitudinal text analysis, observed changes are interpretable only when they reflect changes in construct-related expression, not shifts in word meaning or instability in the measurement procedure.

Strict measurement invariance is difficult to establish in text-based settings because language evolves and contextualized models may generate representations that vary with input context. WEPA therefore adopts an operational strategy focused on stability and comparability. It uses static word embeddings, fixed semantic axes, and anchor-word aggregation. Static embeddings provide fixed word vectors after training. Fixed axes keep the measurement direction constant across samples and time. Anchor-word aggregation reduces dependence on any single period-specific expression by representing each pole as a prototype rather than as an individual word.

These design choices are intended to support stable measurement, but they do not constitute a full test of measurement invariance in the strict psychometric sense. We therefore examine cross-temporal semantic-axis stability in Section 4.1.3, focusing on both axis orientation and the rank-order consistency of anchor words over time.

2.4. Alternative Text-Based Measurement Approaches

Text-based psychological measurement can be implemented through several methodological families. Dictionary-based approaches, such as LIWC-style lexicons, are transparent and easy to apply, but they depend on predefined word lists and may have limited coverage in domain-specific social media text (Eichstaedt et al., 2020; Hartmann et al., 2019; Pennebaker et al., 2015). Supervised machine-learning models can improve task-specific prediction when large labeled datasets are available, although they require construct-specific annotations and often provide limited interpretability as theory-defined construct scores (Kadhim, 2019; Speer et al., 2024). Transformer-based and large language models offer strong semantic flexibility, but their outputs may be sensitive to prompts, model versions, proprietary training procedures, and computational cost (Demszky et al., 2023; Nikadon et al., 2025; Sclar et al., 2023; Zhou et al., 2024). Semantic projection methods, including WEPA, occupy an intermediate position by representing meaning continuously in embedding space while maintaining an explicit link between construct definitions and measurement rules (Grand et al., 2022; Kozlowski et al., 2019).

WEPA builds on the semantic-projection tradition and adapts it to theory-driven psychological measurement. It is designed for constructs that can be represented through theoretically meaningful semantic poles and extends anchor-word-based measurement by projecting text in embedding space instead of relying on exact word matching. For this reason, the dictionary-based method provides the closest baseline for the benchmark analysis in Study I. Broader comparisons with supervised classifiers, transformer encoders, and LLM-based scoring remain valuable directions for future research when high-quality labeled data and stable model access are available. Table 1 summarizes the main strengths and limitations of these approaches and positions WEPA within the broader landscape of text-based psychological measurement.

2.5. Computational Implementation of WEPA

Having clarified the measurement rationale and the trade-offs among alternative approaches, we now describe how WEPA constructs semantic axes, vectorizes user text, and computes projection-based construct scores.

2.5.1. Core Computational Procedure

WEPA operationalizes semantic projection in four steps. First, the positive and negative prototype vectors are computed by averaging the word vectors in the corresponding anchor-word sets:

proto_{vec}_{d} = \frac{1}{| A_{d} |} \sum_{a \in A_{d}} Embed (a), d \in {p o s, n e g}

(1)

Second, the difference between the two prototype vectors is normalized to obtain the unit semantic-axis vector:

axis_vec = \frac{proto_{vec}_{p o s} - proto_{vec}_{n e g}}{∥ proto_{vec}_{p o s} - proto_{vec}_{n e g} ∥}

(2)

Third, the mean of all valid word vectors in the user’s text is computed:

text_vec = \frac{1}{| W |} \sum_{w \in W} Embed (w)

(3)

Finally, projecting the text vector onto the semantic axis yields the WEPA score:

wepa_score = text_vec \cdot axis_vec

(4)

Here,

A_{p o s}

and

A_{n e g}

denote the positive and negative anchor-word sets,

{proto}_{v} {ec}_{d}

denotes the centroid for polarity d, and W contains the cleaned tokens after preprocessing and stop-word removal. The embedding function

Embed (\cdot)

returns the learned vector for in-vocabulary tokens and a zero vector for out-of-vocabulary (OOV) tokens. OOV tokens remain in the denominator but add no directional semantic information, pulling low-coverage texts closer to the neutral point. We examine the implications of this choice in the textual-sparsity sensitivity analysis, as discussed in Section 5.1.

WEPA uses a single bipolar semantic axis for each construct. Positive values indicate stronger alignment with the positive pole, and larger absolute values indicate greater deviation from the neutral point. Substantively, these scores capture construct-related linguistic salience. A high score means that a user’s language is positioned closer to one semantic pole than the other. This distinction matters when classical theories refer to external conditions, such as challenging goals or social encouragement, while social media text records how users express these experiences. In the remainder of the paper, we refer to these WEPA-derived scores as construct scores unless clarification is needed.

2.5.2. Implementation Steps

Table 2 summarizes the WEPA workflow used in this study.

For corpus preparation and embedding training, we build a cleaned 816 MB corpus from fitness-community text. Chinese text is preprocessed by removing URLs, @-mentions, nonlinguistic symbols, and one-character tokens. We then train domain-specific static word vectors with cntext (v2.2.0) (Deng & Nan, 2022). Based on pilot comparisons, the final GloVe model uses 200 dimensions, a context window of 15 words, and 20 training iterations. Embedding quality is assessed through synonym retrieval for 50 high-frequency fitness terms, with 90% of retrieved neighbors judged semantically relevant in the fitness domain.

The WEPA scoring procedure is implemented in cntext, an open-source Python text-analysis package developed and maintained by the first author. The package provides functions for preprocessing, embedding training, semantic-axis construction, and text projection. To support reproducibility, the repository includes installation instructions, a minimal WEPA example, API descriptions, and basic tests for semantic-axis construction and text scoring. When applying WEPA to new constructs or domains, researchers should focus especially on theoretically grounded construct definitions and rigorous anchor-word development and validation.

3. Validation Strategy, Data, and Measurement

3.1. Validation Strategy

WEPA is evaluated through a structured validation framework that combines measurement validation with exploratory longitudinal analysis (see Figure 2). Study I establishes the measurement foundation through anchor-word validation, benchmark-based evaluation against human annotations, and cross-temporal stability assessment. Studies II and III examine criterion validity by testing whether construct scores are associated with subsequent exercise behavior. Study IV provides an exploratory descriptive analysis of relative-week dynamics by aligning observed users according to weeks since registration and examining aggregate construct trajectories over time. Across the four studies, the analysis proceeds from measurement foundation to criterion validity and then to exploratory evidence on longitudinal psychological dynamics in a naturalistic digital setting.

3.2. Data and Sample

The data come from Keep, a leading fitness app in China that combines exercise tracking with social functions such as posts, check-ins, and topic discussions. This setting provides both user-generated text containing psychological cues and behavioral records, such as exercise duration, that can serve as criterion measures.

We collected publicly accessible secondary digital trace data from 1 February 2015 to 1 June 2021. The sample was expanded from active seed users through follower and followee networks, with duplicate records removed during data collection. This network-based procedure was intended to broaden coverage across active-user neighborhoods, not to produce a probability sample of the entire platform population. Data collection involved no direct interaction with users and accessed no private or restricted content.

We cleaned the raw data and aggregated them to the user–week level. Weekly aggregation increases text density, reduces day-level noise, and aligns with the weekly rhythm of exercise behavior. Exercise duration and other numerical indicators were summed by week, while all posts published by a user within a given week were concatenated chronologically to form the weekly text. Before analysis, user identifiers were irreversibly hashed, personally identifiable information was removed, and results were reported only in aggregate form. We removed user–week observations with empty text and excluded users with missing registration information or fewer than two valid observation weeks, yielding a final sample of 177,829 users and 2,668,298 weekly observations.

3.3. Variable Measurement

Exercise engagement intensity is measured by weekly cumulative exercise duration (ExDur, minutes per week), transformed as

ln (x + 1)

to reduce right skew. This variable captures behavioral investment and serves as the objective behavioral criterion in the validation analyses.

The independent variables are WEPA-based construct scores derived from users’ weekly text. The raw WEPA score is the scalar projection of the user–week text vector onto a construct-specific semantic axis, as defined in Equation (4). The goal-setting constructs include goal commitment, goal specificity, and perceived goal difficulty. Self-efficacy is measured using Bandura’s four-source model, including mastery experience, vicarious experience, social persuasion, and physiological states. The sum of these four dimensions forms the overall self-efficacy score, while the dimensions are also examined separately.

All control variables are time-varying and cover demographic characteristics, expressive volume, behavioral inertia, and social incentives. User age (age) is measured in years as the difference between the observation week and the user’s date of birth. Expressive volume (StrLen) is the number of characters posted during the week. Behavioral inertia (DurChg) is the first difference in logged exercise duration between adjacent weeks. Social engagement (SocEngage) is the weekly sum of discussion and check-in activities, and social feedback (SocFeedback) is the weekly sum of likes and comments received. Both social variables are transformed as

ln (x + 1)

. All continuous predictors, including WEPA scores and controls, are standardized in regression analyses unless otherwise stated.

For panel identification, userid is used as the anonymized user identifier, and WeekI records the number of weeks between the current observation week and the user’s registration week. Table 3, Table 4 and Table 5 report variable descriptions, summary statistics, and the correlation matrix. Summary statistics are presented on the original scale after 1% winsorization, including raw WEPA projection scores. The correlation matrix is computed using winsorized and standardized data.

The correlation matrix shows several notable relationships. Goal commitment is negatively correlated with goal difficulty (

r = - 0.544

), indicating that more difficulty-related expression tends to coincide with lower commitment in this sample. The four self-efficacy dimensions are highly correlated (

r = 0.79

–

0.96

), suggesting substantial shared variance. Social engagement is strongly correlated with exercise duration (

r = 0.696

), supporting the inclusion of social controls in the regression models.

4. Validation Studies

4.1. Study I: Measurement Foundation and Benchmark Validation

Study I establishes the measurement foundation of WEPA through anchor-word development, anchor-word validation, benchmark-based evaluation against human annotations, and cross-temporal stability assessment. Because WEPA generates a single construct score for each user–week observation and the focal constructs are expected to vary over time, classical reliability measures such as internal consistency and test–retest reliability are not directly applicable. We therefore examine whether the anchor-word dictionaries exhibit the expected semantic structure, whether WEPA scores align with human judgments and outperform a dictionary-based baseline, and whether semantic axes retain stable orientation across the observation period.

4.1.1. Anchor-Word Development and Validation

We identified focal psychological constructs from Goal-Setting Theory and Social Cognitive Theory. The goal-setting constructs include goal commitment, goal specificity, and goal difficulty. We also examine self-efficacy and its four source dimensions, including mastery experience, vicarious experience, social persuasion, and physiological states. These constructs cover both unidimensional and multidimensional structures, are theoretically central to exercise behavior, and can be meaningfully represented through bipolar semantic dimensions.

In the Chinese fitness domain, anchor words were selected to capture theoretically meaningful semantic contrasts. Goal specificity contrasts quantified goal indicators with vague outcome expectations. Goal difficulty contrasts high-effort or challenge-related expressions with low-intensity or easy activities. Goal commitment contrasts determination and follow-through with hesitation, delay, or disengagement. For self-efficacy, mastery experience contrasts success with failure-related expressions, vicarious experience contrasts learning from others with unguided trial-and-error, social persuasion contrasts supportive with discouraging feedback, and physiological states contrast energetic or positive conditions with fatigued or depleted ones.

Some dimensions require context-specific interpretation. The goal difficulty axis captures difficulty-related expressions in online fitness discourse, including perceived obstacles, fatigue, and challenge-related language, and should not be treated as a direct measure of objective goal difficulty in the original Goal-Setting Theory. Similarly, the social persuasion axis captures platform expressions related to encouragement or discouragement, whose behavioral meaning may depend on whether they reflect stable support or compensatory help-seeking.

Table 6 presents representative English glosses of selected Chinese anchor words to help international readers to understand their semantic content and polarity. The empirical analysis uses the original Chinese anchor words.

The anchor-word dictionaries were developed in three stages. First, theory pooling drew on Goal-Setting Theory, Social Cognitive Theory, and fitness-community language to generate 260 candidate words. After screening against the 200-dimensional GloVe vocabulary trained on the 816 MB corpus, 251 valid candidates remained. Second, three eHealth scholars independently evaluated the fit between candidate words and construct definitions. Words were retained when at least two experts agreed on their assigned category. Inter-rater agreement was high, with Krippendorff’s

α = 0.8249

, and 232 theoretically relevant anchor words were retained.

Finally, we conducted semantic-space diagnostics. For each construct, we calculated average cosine similarity within the positive pole, within the negative pole, and between poles. This diagnostic assessed whether words within the same pole were, on average, closer to one another than to words from the opposite pole. Table 7 shows that all constructs follow this expected pattern. Although the absolute cosine values are moderate, this is plausible because anchor words are not intended to be strict synonyms. They function as theoretically related expressions aligned with the same pole, supporting subsequent semantic-axis construction.

4.1.2. Human Benchmark and Algorithm Comparison

To evaluate WEPA against human benchmarks, we selected two theoretically distinct constructs. Goal specificity represents a relatively concrete cognitive–behavioral dimension of Goal-Setting Theory and tests whether WEPA can recover quantified goal statements. Physiological states represent an affective–somatic source of self-efficacy and provide a more difficult test because expressions such as energized or exhausted are often implicit and fragmented in short social-media text.

We constructed two annotated datasets from the Keep corpus. The goal-specificity benchmark includes 518 posts with quantifiable goals, and the physiological-states benchmark includes 536 posts with identifiable bodily or emotional descriptions. All texts contain at least 10 characters. The same three eHealth scholars annotated both datasets. They identified quantifiable behavioral indicators for goal specificity and physiological or affective indicators for physiological states without inferring broader motivational meanings. Inter-annotator agreement was high for both constructs, with Cronbach’s

α = 0.965

for goal specificity and 0.922 for physiological states.

We use a dictionary-based method as a transparent closed-vocabulary baseline. Both methods rely on the same anchor-word sets, but their scoring logic differs. The dictionary method relies on exact keyword matching, whereas WEPA uses distributed semantic aggregation. This baseline is not intended to represent all possible dictionary systems. It provides a conservative reference for evaluating whether semantic projection improves coverage and agreement with human judgments under short-text conditions.

The dictionary-based method computes scores using the normalized difference in anchor-word frequencies (Eichstaedt et al., 2020):

d i c t_s c o r e = \frac{| W \cap A_{pos} | - | W \cap A_{neg} |}{| W \cap A_{pos} | + | W \cap A_{neg} | + 1}

(5)

where

W

denotes the set of words in the focal text, and

A_{pos}

and

A_{neg}

denote the positive and negative anchor-word sets. Stronger agreement with human annotations indicates a better capture of the intended construct as expressed in the text.

Table 8 reports the benchmark comparison. For goal specificity, WEPA achieves a Spearman rank correlation of

ρ = 0.895

(

p < 0.001

), higher than the dictionary baseline (

ρ = 0.733

,

p < 0.001

). For physiological states, the contrast is larger. The dictionary method covers only 52 of 536 texts (9.7%) and shows weak agreement with human judgments (

ρ = 0.161

,

p < 0.001

), whereas WEPA scores all texts and achieves strong agreement (

ρ = 0.850

,

p < 0.001

).

These results suggest that semantic projection improves both coverage and benchmark agreement under realistic short-text conditions. The advantage is especially clear for constructs expressed through implicit, affective, or fragmented language, where exact lexical matching often fails. At the same time, the benchmark should not be interpreted as direct human-annotation validation for all seven constructs. It provides evidence across two theoretically distinct cases, one relatively concrete cognitive–behavioral construct and one affective–somatic construct.

4.1.3. Cross-Temporal Stability of Semantic Axes

The benchmark comparison establishes that WEPA captures the intended constructs in human-annotated texts. For longitudinal measurement, it is also necessary to examine whether semantic axes retain stable orientation over time. If axis orientation shifts because of linguistic evolution or corpus-specific estimation variance, observed score changes may partly reflect measurement artifacts.

To assess cross-temporal robustness, we distinguish between axis orientation and semantic structure. We partition the full corpus by calendar year from 2015 to 2021 and train separate yearly embedding models using the same hyperparameters as the main specification. Each yearly embedding space is aligned to the full-period reference space using a Procrustes transformation, so that all yearly models share a common coordinate system (Yao et al., 2018).

The full-period embedding is used as the reference because it provides a more stable estimate of the semantic space than yearly embeddings, especially in early sparse periods, and avoids anchoring the analysis to any single-year realization (Cassani et al., 2021). This choice does not introduce forward-looking bias because the analysis does not involve prediction. The full-period embedding serves only as a technical reference frame for comparing semantic representations across time.

For each construct, we compute two indicators. Cosine similarity between the yearly semantic axis and the full-period axis captures axis stability, or whether the overall direction of the construct axis remains stable. Spearman rank correlation captures rank-order consistency, or whether the relative ordering of anchor words along the construct dimension is preserved. We place less emphasis on 2015 because the Keep platform was launched in February 2015 and the first-year corpus is sparse, approximately 23 MB, compared with 134 MB in 2016.

Figure 3 presents the results. Panel (a) reports semantic axis stability based on cosine similarity, and Panel (b) reports rank-order consistency based on Spearman correlations.

These two indicators capture complementary aspects of temporal stability. Axis stability reflects whether the overall geometric direction is preserved, while rank-order consistency reflects whether the internal semantic structure of anchor words remains stable. The results show that axis orientation varies across constructs, but rank-order consistency remains consistently high after 2016. This pattern suggests that WEPA provides a stable semantic measurement structure over time, especially at the level of internal semantic ordering. The analysis supports the longitudinal interpretability of WEPA scores, although it should not be treated as a full proof of measurement invariance across platforms or contexts.

4.2. Study II: Criterion Validity of Goal-Setting Constructs

Study II evaluates whether WEPA-derived scores for goal commitment, goal specificity, and goal difficulty are associated with subsequent exercise behavior in theoretically consistent directions.

4.2.1. Hypotheses

Goal-Setting Theory argues that clear, specific, and appropriately challenging goals, when combined with feedback, can improve individual performance (Locke & Latham, 2019). Building on this logic, we develop expectations for three goal-setting constructs. Because WEPA measures these constructs through users’ own expressions in platform text, the hypotheses concern text-based indicators of goal-related psychological states rather than experimentally assigned goal conditions.

Goal commitment reflects persistence, motivational investment, and determination. Users with higher commitment are more likely to sustain exercise behavior (Brandstätter & Bernecker, 2022). Goal specificity refers to quantified and actionable goal statements, such as run 5 km per week. Specific goals reduce behavioral ambiguity and provide clearer implementation paths (Wirth et al., 2009). Accordingly, WEPA goal commitment and goal specificity scores are expected to be positively associated with future exercise duration.

Goal difficulty requires a context-specific interpretation. In classical Goal-Setting Theory, difficult goals can promote performance when they are accepted and supported by sufficient ability and resources (Senko & Harackiewicz, 2005). In online fitness discourse, however, expressions such as too hard, I cannot keep going, or I am exhausted may signal perceived obstacles, fatigue, or frustration. The WEPA goal difficulty score is therefore interpreted as expressed difficulty in platform text, and we expect it to be negatively associated with subsequent exercise duration.

Based on these arguments, we propose the following hypotheses:

Hypothesis 1.

The WEPA goal commitment score is positively associated with future exercise duration.

Hypothesis 2.

The WEPA goal specificity score is positively associated with future exercise duration.

Hypothesis 3.

The WEPA goal difficulty score is negatively associated with future exercise duration.

4.2.2. Model Specification

The three goal-setting constructs are derived from users’ weekly text using WEPA and measured as continuous construct scores. Because goal commitment and goal difficulty are conceptually related and empirically correlated (

r = - 0.544

; see Table 5), estimating all three constructs in a joint model would complicate interpretation. Joint estimation would yield partial effects conditional on other constructs, which is less aligned with the goal of assessing criterion validity for each construct individually. We therefore estimate separate individual fixed-effects panel models for each construct.

Multicollinearity diagnostics indicate that joint estimation is statistically feasible, with maximum condition indices below 2.07 and variance inflation factors below 1.57. The use of separate models is therefore motivated by interpretability rather than by multicollinearity concerns.

Using individual fixed-effects models, we estimate the following regression for each construct:

E x D u r_{i, t + 1} = α + β_{k} \cdot C o n s t r u c t_{k, i, t} + γ \cdot C o n t r o l s_{i, t} + μ_{i} + ϵ_{i, t}, k \in {1, 2, 3}

(6)

where

E x D u r_{i, t + 1}

is log-transformed exercise duration at time

t + 1

, and

C o n s t r u c t_{k, i, t}

denotes goal commitment, goal specificity, or goal difficulty at time t.

μ_{i}

represents individual fixed effects, and standard errors are clustered at the user level. The time-varying controls include age, weekly text length, exercise-duration change, social engagement, and social feedback.

Criterion validity is assessed through temporally ordered predictive associations between construct scores at time t and exercise behavior at time

t + 1

. Prediction refers to cross-temporal statistical association, not causal prediction or intervention effects. Weekly aggregation limits temporal precision because users may exercise or post at any point within a week. Directional consistency between construct scores and the external behavioral criterion provides evidence for criterion validity within this observational framework.

4.2.3. Results

Table 9 reports the criterion-validity results. Goal commitment is positively associated with exercise duration at time

t + 1

(

β = 0.069

,

p < 0.001

), supporting H1. Goal specificity is also positively associated with exercise duration (

β = 0.019

,

p < 0.001

), supporting H2. Goal difficulty is negatively associated with exercise duration (

β = - 0.053

,

p < 0.001

), supporting H3.

Because the dependent variable is log-transformed, the coefficients can be interpreted approximately as semi-elasticities. A one-standard-deviation increase in goal commitment, goal specificity, and goal difficulty corresponds to about +7.1%, +1.9%, and −5.2%, respectively, in exercise duration at

t + 1

. Using the raw mean of weekly exercise duration before log transformation as a reference (15.61 min/week), these effects translate into about +1.12, +0.30, and −0.81 min/week. These modest raw-minute equivalents should be interpreted as statistically reliable behavioral associations, not as large individual-level intervention effects.

These findings provide criterion-validity evidence for WEPA in measuring unidimensional psychological constructs.

4.3. Study III: Criterion Validity of Self-Efficacy Dimensions

Study III extends the criterion-validity assessment to multidimensional measurement. It examines whether WEPA-derived scores for overall self-efficacy and its four source dimensions are associated with subsequent exercise behavior in theoretically expected directions.

4.3.1. Hypotheses

According to Social Cognitive Theory, self-efficacy is shaped by four sources of information, including mastery experience, vicarious experience, social persuasion, and physiological states (Bandura, 1977). This four-dimensional structure makes self-efficacy a useful case for testing whether WEPA can distinguish behavioral associations among dimensions within the same construct. Because these dimensions are measured through linguistic expressions in platform text, the hypotheses concern WEPA-derived textual indicators rather than a direct experimental manipulation of self-efficacy sources.

Mastery experience reflects successful task completion and is widely regarded as the strongest source of self-efficacy. Vicarious experience arises from observing similar others succeed and provides behavioral models for action. Expressions of prior accomplishment or learning from others should therefore be positively associated with future exercise duration. Physiological states reflect subjective evaluations of physical condition and emotional arousal. Positive bodily and emotional states may support behavioral maintenance, while fatigue or negative affect may weaken efficacy beliefs. Thus, the WEPA physiological states score is also expected to be positively associated with future exercise duration.

Social persuasion requires a context-specific interpretation in digital fitness communities. In Bandura’s original theory, persuasive encouragement from others can strengthen efficacy beliefs. In platform discourse, however, references to external affirmation, such as, “I need likes”, “please encourage me”, or “I need support to continue”, may appear as compensatory expressions during motivational blockage. We therefore expect the WEPA social persuasion score to be negatively associated with subsequent exercise duration in this platform context. This hypothesis reflects the user-expressed salience of persuasion- or support-related language and should not be interpreted as a general claim that social persuasion weakens self-efficacy.

Based on these arguments, we propose the following hypotheses.

Hypothesis 4.

The WEPA self-efficacy score is positively associated with future exercise duration.

Hypothesis 5.

The WEPA mastery experience score is positively associated with future exercise duration.

Hypothesis 6.

The WEPA vicarious experience score is positively associated with future exercise duration.

Hypothesis 7.

The WEPA physiological states score is positively associated with future exercise duration.

Hypothesis 8.

The WEPA social persuasion score is negatively associated with future exercise duration.

4.3.2. Model Specification

Study III adopts the joint model as the primary specification because the four source dimensions represent theoretically distinct components of the same multidimensional construct. The goal is to assess each dimension’s association with subsequent behavior and to examine its relative contribution after accounting for the other sources. This logic is consistent with the theoretical role of self-efficacy sources in Social Cognitive Theory.

Confirmatory factor analysis results support this multidimensional structure (CFI = 0.999, TLI = 0.998, RMSEA = 0.038). Although the four dimensions are highly correlated (

r = 0.79

–

0.96

), the joint model remains informative because it captures conditional associations among theoretically related components. Multicollinearity diagnostics indicate that the model is estimable. The VIFs for MastExp and SocPer are 13.09 and 10.31, slightly above the conventional threshold of 10, while the maximum condition index is 8.56, well below the critical threshold of 30 (Kalnins & Hill, 2023; O’Brien, 2007). These diagnostics call for a cautious interpretation of coefficient magnitudes without invalidating the joint specification.

Using an individual fixed-effects model, the joint specification is written as follows:

E x D u r_{i, t + 1} = α + \sum_{m = 1}^{4} β_{m} \cdot D i m e n s i o n_{m, i, t} + γ \cdot C o n t r o l s_{i, t} + μ_{i} + ϵ_{i, t},

(7)

where

D i m e n s i o n_{m, i, t}

denotes mastery experience, vicarious experience, social persuasion, or physiological states at time t. The joint model estimates each source dimension conditional on the other three dimensions.

As a sensitivity check, we also estimate separate single-dimension models:

E x D u r_{i, t + 1} = α + β_{m} \cdot D i m e n s i o n_{m, i, t} + γ \cdot C o n t r o l s_{i, t} + μ_{i} + ϵ_{i, t}, m \in {1, 2, 3, 4} .

(8)

The separate models estimate gross associations between each source expression and subsequent exercise behavior, while the joint model estimates conditional associations. This comparison helps to assess whether the joint estimates are sensitive to intercorrelations among the four dimensions. The controls, fixed effects, and clustered standard errors follow Study II.

4.3.3. Results

Table 10 reports the criterion-validity results for overall self-efficacy and its four source dimensions. All models use the same analysis sample. The overall self-efficacy score is positively associated with exercise duration at time

t + 1

(

β = 0.008

,

p < 0.001

), supporting H4.

The separate models show positive gross associations for mastery experience (

β = 0.025

,

p < 0.001

), vicarious experience (

β = 0.052

,

p < 0.001

), social persuasion (

β = 0.003

,

p < 0.05

), and physiological states (

β = 0.025

,

p < 0.001

). The joint model reveals a more differentiated pattern. Mastery experience (

β = 0.117

,

p < 0.001

), vicarious experience (

β = 0.122

,

p < 0.001

), and physiological states (

β = 0.025

,

p < 0.001

) remain positively associated with exercise duration, supporting H5, H6, and H7. Social persuasion becomes negatively associated with exercise duration in the joint model (

β = - 0.231

,

p < 0.001

), supporting H8.

This contrast is substantively informative. Social persuasion is weakly positive when entered alone but negative after controlling for the other three sources, suggesting that its residual component may capture support-seeking, motivational vulnerability, or difficulty-related appeals for encouragement after mastery, vicarious, and physiological-affective signals are accounted for.

Because the joint model is the primary specification, we interpret its coefficients as approximate semi-elasticities. A one-standard-deviation increase in mastery experience, vicarious experience, physiological states, and social persuasion corresponds to about +12.4%, +13.0%, +2.5%, and −20.6% in exercise duration at

t + 1

, or approximately +1.94, +2.03, +0.40, and −3.22 min/week using the same raw-mean reference. These modest raw-minute equivalents support criterion validity for text-based measurement but should not be interpreted as clinically meaningful changes in exercise behavior.

Overall, these findings provide criterion-validity evidence for WEPA in multidimensional psychological measurement. They also suggest that self-efficacy expressions in online fitness communities should be interpreted as contextually embedded linguistic indicators, not as direct one-to-one replications of classical self-efficacy sources.

4.4. Study IV: Exploratory Relative-Week Dynamics

Study IV provides an exploratory descriptive analysis of relative-week dynamics in a naturalistic digital setting. It is not intended as a formal test of ecological validity, a hypothesis test, or the evidence of individual developmental trajectories. Instead, it examines whether WEPA-derived construct scores reveal broad and interpretable aggregate patterns when observed users are aligned by time since platform registration.

Because users differ in activity duration and participation continuity, directly tracing individual trajectories would be misleading. Some users disengage quickly, some return after interruptions, and later relative weeks contain a more selected subset of retained users. We therefore align users by registration week (

t = 0

), aggregate construct scores at each relative week, and interpret the results as patterns among observable users at each stage rather than as trajectories of the original registration cohort.

For each construct, this procedure produces an aggregate relative-week trajectory based on the user–week records observed during that relative week. To smooth short-term fluctuations, we apply a three-week centered moving average. For comparability across constructs, each raw WEPA score is standardized using its full-sample mean and standard deviation before aggregation. Figure 4 therefore reports mean standardized WEPA scores rather than raw projection values.

When all seven trajectories are considered together, three descriptive patterns emerge. First, mastery experience, vicarious experience, and social persuasion show a pronounced early decline that becomes much flatter after approximately Week 20, with all three trajectories later fluctuating close to the zero line. Mastery experience generally remains the highest among the three, although their differences become smaller after the early stage.

Second, goal difficulty and goal specificity both move upward, but with different temporal profiles. Goal difficulty converges rapidly toward the zero line and remains close to it after approximately Week 20. Goal specificity starts from the lowest level and increases more gradually across the observation window. Even in later weeks, it remains below zero, making it the clearest long-term directional pattern in the figure.

Third, physiological states and goal commitment display weaker recovery-like patterns. Both decline in the early stage and later move partially toward the zero line. Compared with goal specificity and goal difficulty, these two trajectories show smaller changes and should be interpreted as mild aggregate adjustments.

Taken together, the trajectories suggest an approximate descriptive transition around Week 20. This point is a visual reference, not a statistically estimated breakpoint. Before Week 20, several trajectories change relatively quickly; after Week 20, most constructs stabilize near the neutral range or fluctuate within a narrower band. The main long-term pattern is therefore not a uniform increase in motivational intensity, but gradual improvement in goal-related specificity alongside a broader stabilization of other construct-related expressions among retained observable users.

5. Robustness Checks

This section reports five robustness checks covering textual sparsity, technical implementation, anchor-word perturbation, subgroup heterogeneity, and measurement-error bias. The textual-sparsity analysis examines whether the findings depend on limited valid-token information. Technical sensitivity, anchor-word perturbation, and subgroup analyses use goal commitment as a representative construct, while SIMEX covers all core constructs.

5.1. Sensitivity to Textual Sparsity

Because WEPA scores are computed from the average vector of valid tokens, extremely short user–week texts may yield less stable construct scores. We therefore conduct a sensitivity analysis based on ValidStrLen, the effective length of cleaned text used in WEPA scoring. First, we exclude observations with fewer than five valid tokens. Second, we compare shorter valid-text observations (

0 < ValidStrLen < 20

) with longer observations (

ValidStrLen \geq 20

). Goal-setting constructs are estimated separately, consistent with Study II, and self-efficacy dimensions are estimated jointly, consistent with Study III.

As shown in Table 11, all focal coefficients retain their original directions and remain statistically significant after observations with fewer than five valid tokens are excluded. The shorter- and longer-text subgroup results also show stable coefficient directions. These findings suggest that the main patterns are not primarily driven by extremely sparse user–week texts.

5.2. Technical Robustness

We test whether WEPA scores remain stable under alternative technical specifications. Using the main specification M1, defined as GloVe with 200 dimensions, a window size of 15, and the full corpus, we construct four alternatives. M2 replaces GloVe with Word2Vec, M3 reduces the vector dimension to 100, M4 reduces the context window to 10, and M5 uses a 50% random subsample of the corpus.

For each specification, we recompute the WEPA goal commitment score and correlate it with the human benchmark. As shown in Table 12, all WEPA correlations remain above 0.86 and close to the main specification (

ρ = 0.890

), while substantially outperforming the dictionary baseline. These results indicate that WEPA’s benchmark agreement is not driven by a single embedding architecture or hyperparameter setting.

5.3. Dictionary Robustness

This check examines whether WEPA depends heavily on a small number of anchor words. Using goal commitment as the representative construct, we randomly remove one, two, and three positive–negative anchor-word pairs, recompute the construct scores, and re-estimate their predictive association with exercise duration at

t + 1

.

As shown in Table 13, the coefficient for goal commitment remains stable across perturbation conditions, ranging from 0.067 to 0.072 across R0–R3. The maximum deviation from the baseline estimate does not exceed 4.35%. This pattern suggests that WEPA captures a broader semantic structure defined by the anchor-word set, not the mechanical effect of a few high-frequency words.

5.4. Cross-Group Robustness

This check evaluates whether the association between goal commitment and future exercise duration remains stable across gender subsamples. If the association appeared only in one subgroup, WEPA’s generalizability would be limited.

Table 14 shows that goal commitment is positively associated with future exercise duration in both female and male subsamples (

β_{female} = 0.074

,

p < 0.001

;

β_{male} = 0.059

,

p < 0.001

). A between-group difference test indicates no significant coefficient difference across gender groups (

p = 0.087

). The signs of the control variables are also consistent across subsamples, supporting the stability of the focal association across gender.

5.5. Estimation Robustness

Finally, we assess whether measurement error in WEPA scores attenuates the regression estimates. Because these scores are text-based proxies for latent constructs, some measurement noise is unavoidable. Under classical measurement error, noise in explanatory variables biases coefficients toward zero.

We apply SIMEX by adding artificial measurement error with multipliers

λ \in {0, 0.5, 1, 1.5, 2}

, running 30 simulations at each value, and extrapolating the coefficient path to

λ = - 1

using a quadratic function. This procedure estimates the extent to which the original coefficients are attenuated by measurement error.

Table 15 shows that the naive estimates are attenuated by

- 11.2 %

to

- 15.1 %

across constructs. For goal commitment, the naive estimate is

β_{naive} = 0.0689

, while the SIMEX-corrected estimate is

β_{simex} = 0.0792

, implying a bias of

- 13.0 %

. For self-efficacy, the corresponding bias is

- 13.2 %

.

Figure 5 visualizes the SIMEX curves. For constructs with positive coefficients in the main models, the estimates move toward zero as measurement error increases; for constructs with negative coefficients, the estimates move toward zero from below. This pattern is consistent with classical measurement-error theory.

SIMEX correction addresses attenuation from measurement noise, but it does not alter the observational nature of the study or establish causality. Overall, the corrected estimates suggest that the baseline coefficients are conservative. Measurement noise modestly attenuates association strength without changing coefficient signs, supporting the robustness of WEPA-based construct scores.

6. Discussion

6.1. Main Findings

This study developed the Word Embedding Projection Approach (WEPA) and evaluated its usefulness for theory-driven psychological measurement from social media text. Using long-panel data from the Keep fitness community, the results provide convergent evidence on measurement performance, behavioral relevance, temporal comparability, and exploratory longitudinal applicability. WEPA showed strong agreement with human annotation, achieved broader text coverage than the dictionary-based baseline, and produced construct scores meaningfully associated with subsequent exercise behavior. The relative-week analysis further showed that WEPA can recover interpretable aggregate temporal patterns in user-generated text. These findings support the usefulness of WEPA as an interpretable measurement approach. At the same time, they should be understood as evidence of measurement performance, predictive associations, and exploratory descriptive utility, not as evidence of causal effects or strict measurement invariance.

Several substantive patterns emerged across constructs. For goal-setting constructs, goal commitment and goal specificity were positively associated with later exercise duration, consistent with Goal-Setting Theory (Locke & Latham, 2019). Goal difficulty showed a negative association, reflecting the platform-specific meaning of difficulty-related expressions, which often capture perceived obstacles, fatigue, or reduced persistence. For self-efficacy, mastery experience, vicarious experience, and physiological states were positively associated with later exercise behavior, while social persuasion showed a negative association. This result does not imply that social persuasion weakens self-efficacy. It suggests that users’ own mentions of encouragement or support may become salient when they are struggling, seeking reassurance, or experiencing motivational vulnerability (Bandura, 1977).

The temporal analysis adds a descriptive perspective on aggregate psychological dynamics. Construct-related language changed systematically across participation stages, with early declines followed by stabilization for several dimensions and gradual increases in goal specificity and goal difficulty. These patterns do not establish individual developmental trajectories, because later relative weeks contain a more selected group of continuing users and participation intensity varies across users. They nevertheless illustrate how WEPA can describe aggregate changes in theory-relevant psychological language in naturalistic digital settings.

6.2. Contributions to Theory and Measurement

This study makes three contributions to computational behavioral science and psychological measurement. First, it introduces WEPA as a theory-driven and interpretable approach for measuring psychological constructs from social media text. WEPA operationalizes constructs as semantic projections along theoretically grounded bipolar axes, making it most suitable for constructs that can be meaningfully represented through contrastive semantic poles. By linking anchor words, semantic axes, and construct scores, WEPA connects psychological theory with distributional semantics and offers an interpretable complement to dictionary-based methods and deep learning models (Grand et al., 2022; Kozlowski et al., 2019). Its application to more than 2.6 million user–week observations demonstrates practical feasibility for large-scale longitudinal text analysis, while broader scalability across platforms, languages, and computational environments remains a direction for future evaluation.

Second, the study contributes to validation practice by treating text-based psychological measurement as a layered validation problem. The validation strategy combines benchmark-based evaluation, predictive associations with subsequent behavior, temporal-stability diagnostics, and exploratory longitudinal analysis. These components address different questions, including whether WEPA aligns with human judgment, whether WEPA-derived scores are associated with later behavior, whether semantic axes remain sufficiently comparable across time, and whether construct-related language shows interpretable aggregate temporal patterns. This structure helps to avoid treating annotation agreement, behavioral association, temporal comparability, and descriptive dynamics as one undifferentiated form of validity (Flake et al., 2017; Kyle et al., 2020).

Third, the study clarifies how classical psychological constructs should be interpreted when measured through user-generated text. The findings broadly support the relevance of Goal-Setting Theory and Social Cognitive Theory in digital fitness communities, but they also show that text-based scores capture expressed psychological salience rather than the external motivational conditions described in classical theories. This distinction is especially important for goal difficulty and social persuasion. Difficulty-related language may reflect perceived obstacles or fatigue, while persuasion-related language may reflect users’ own mentions of encouragement or support. WEPA therefore provides a structured way to study theory-relevant psychological language while keeping the interpretation of digital trace measures within appropriate boundaries.

6.3. Practical Implications

WEPA has practical implications for monitoring psychological signals in digital health platforms. By converting user-generated text into interpretable construct scores, it can help to identify aggregate signals that are difficult to observe through behavioral logs alone. Changes in expressed goal commitment, perceived difficulty, or self-efficacy-related language may indicate motivational strain, declining persistence, or emerging support needs. These signals can support the platform-level monitoring and retrospective evaluation of community dynamics.

WEPA may also support more theory-informed intervention design. Platforms could move beyond generic engagement metrics and examine whether users’ language reflects clearer goals, stronger commitment, positive efficacy-related experiences, or greater expressed difficulty. Such information may inform feedback messages, goal-setting prompts, community support features, personalized encouragement strategies, and the timing of intervention delivery.

These applications require caution. WEPA scores reflect expressed psychological salience in text, not direct diagnoses of users’ internal states or objective motivational conditions. This approach is most appropriate for aggregate analysis, content design, and platform-level decision support. Individual-level applications require additional validation, ethical safeguards, and transparency to avoid overinterpretation or intrusive behavioral profiling.

6.4. Limitations and Future Research

Several limitations should be acknowledged. First, the generalizability of WEPA requires further validation. The present study is based on a single Chinese fitness-oriented social media platform, and both the embedding space and anchor-word dictionaries were developed for this specific linguistic and behavioral context. This localization strengthens construct validity in the Keep community but limits direct portability across platforms, languages, and cultural settings. In addition, the network-based sampling procedure and the exclusion of users with insufficient observation records may overrepresent socially active and more persistent users. Future research should examine whether the WEPA workflow can be replicated in other digital environments and how anchor-word dictionaries can be adapted through cross-domain validation, cross-language alignment, theory-guided seed words, large-language-model expansion, and expert review.

Second, the measurement evidence remains bounded in scope. Study I provides anchor-word validation, human-coded benchmarks for selected constructs, and cross-temporal semantic-axis stability checks. These analyses support longitudinal comparability, although they do not provide a full test of measurement invariance in the strict psychometric sense (Leitgöb et al., 2022; Liu et al., 2017). WEPA is also most suitable for constructs that can be meaningfully represented through contrastive semantic poles; extending the approach to non-bipolar constructs remains a direction for future research. Some construct dimensions, especially the four self-efficacy sources, show high empirical correlations, suggesting substantial shared variance across theoretically distinct dimensions in platform text. Although the textual-sparsity sensitivity analysis suggests that the main results are not driven by extremely sparse texts, weekly aggregation still involves a trade-off between measurement stability and temporal precision. Future work should develop larger annotated benchmarks, examine discriminant validity across related constructs, and explore alternative aggregation strategies.

Third, future applications of WEPA need to improve the scalability of anchor-word development and expand beyond text-only measurement. In the current study, anchor-word dictionaries were developed through theory-guided selection and expert validation, which supports construct validity but remains labor-intensive when WEPA is extended to new constructs, platforms, or languages. Large language models may help to generate candidate anchor words from theory-guided seed terms, explain semantic distinctions among candidates, and support cross-language adaptation, although expert review and empirical validation remain necessary. Multimodal large language models also make it increasingly feasible to align text, images, audio, video, and behavioral traces within a shared semantic space. Future research could extend WEPA from text-based measurement to multimodal semantic projection and move psychological measurement from weekly aggregation toward daily or event-level analysis (Sudarsanam et al., 2025; Zhu et al., 2024). Because the present study is observational, the reported associations should be interpreted as predictive associations. Future work could combine WEPA with experimental or quasi-experimental designs to support stronger causal inference.

Overall, WEPA should be understood as a promising and context-sensitive measurement workflow, not as a fully general solution for psychological measurement from all social media text. The present study provides evidence from Chinese fitness-oriented social media text. Broader claims require replication across platforms, languages, populations, and theoretical constructs.

Author Contributions

Conceptualization, X.D.; methodology, X.D.; software, X.D.; validation, X.D.; formal analysis, X.D.; investigation, X.D.; data curation, X.D.; writing—original draft preparation, X.D.; writing—review and editing, X.D.; supervision, Y.L.; project administration, Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study is fully exempt from institutional ethical review under the Measures for the Ethical Review of Life Science and Medical Research Involving Human Beings (Order No. 4 [2023], National Health Commission of the People’s Republic of China). According to these Measures, research that uses publicly available data and involves no intervention, no interaction with human participants, and no collection of identifiable personal information falls within the statutory exemption scope. The data used in this study consist solely of publicly accessible social media text that was voluntarily posted by users. Prior to analysis, all data underwent complete anonymization and de-identification, with the permanent removal of all personal identifiers (including usernames, IDs, contact information, and any traceable attributes). No human subjects, experiments, surveys, or interventions were conducted, and all analyses were performed exclusively at the aggregated group level. Therefore, no individual can be identified directly or indirectly from the processed dataset. In compliance with the above national regulatory framework, ethical approval was not required for this study.

Informed Consent Statement

Ethical approval for the waiver of informed consent was granted because the study employed only publicly accessible and de-identified secondary data, did not involve direct contact with users, and did not report any identifiable individual-level information.

Data Availability Statement

The code for WEPA scoring and the complete Chinese anchor-word dictionaries are available in the open-source Python package cntext (version 2.2.0): https://github.com/hiDaDeng/cntext, accessed on 10 May 2026. The repository includes installation instructions, a minimal WEPA example, API descriptions, and basic tests for semantic-axis construction and text scoring. The raw platform data contain user-generated content and are not publicly redistributed to protect user privacy.

Acknowledgments

The authors are grateful to the anonymous reviewers for their valuable comments and constructive suggestions, which have greatly improved the quality of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Aceves, P., & Evans, J. A. (2024). Mobilizing conceptual spaces: How word embedding models can inform measurement and theory within organization science. Organization Science, 35(3), 788–814. [Google Scholar] [CrossRef]
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191–215. [Google Scholar] [CrossRef]
Barchard, K., Carroll, J. M., Reynolds, S., & Russell, J. A. (2024). Testing Bipolarity. Psychological Methods. Advance online publication. [CrossRef]
Berger, J., Humphreys, A., Ludwig, S., Moe, W. W., Netzer, O., & Schweidel, D. A. (2020). Uniting the tribes: Using text for marketing insight. Journal of Marketing, 84(1), 1–25. [Google Scholar] [CrossRef]
Brandstätter, V., & Bernecker, K. (2022). Persistence and Disengagement in Personal Goal Pursuit. Annual Review of Psychology, 73, 271–299. [Google Scholar] [CrossRef] [PubMed]
Cassani, G., Bianchi, F., & Marelli, M. (2021). Words with Consistent Diachronic Usage Patterns are Learned Earlier: A Computational Analysis Using Temporally Aligned Word Embeddings. Cognitive Science, 45(4), e12963. [Google Scholar] [CrossRef] [PubMed]
Demszky, D., Yang, D., Yeager, D. S., Bryan, C. J., Clapper, M., Chandhok, S., Eichstaedt, J., Hecht, C. A., Jamieson, J. P., Johnson, M., Jones, M., Krettek-Cobb, D., Lai, L., JonesMitchell, N., Ong, D. C., Dweck, C., Gross, J. J., & Pennebaker, J. W. (2023). Using large language models in psychology. Nature Reviews Psychology, 2, 688–701. [Google Scholar] [CrossRef]
Deng, X., & Nan, P. (2022). cntext: A Python tool for text mining. Computer software. Zenodo. Available online: https://github.com/hiDaDeng/cntext (accessed on 10 May 2026).
Douven, I., Decock, L., Dietz, R., & Égré, P. (2011). Vagueness: A conceptual spaces approach. Journal of Philosophical Logic, 42, 137–160. [Google Scholar] [CrossRef]
Eichstaedt, J. C., Kern, M. L., Yaden, D. B., Schwartz, H. A., Giorgi, S., Park, G., Hagan, C. A., Tobolsky, V. A., Smith, L. K., Buffone, A., & Iwry, J. (2020). Closed- and open-vocabulary approaches to text analysis: A review, quantitative comparison, and recommendations. Psychological Methods, 26(4), 398–427. [Google Scholar] [CrossRef]
Eichstaedt, J. C., & Weidman, A. (2020). Tracking fluctuations in psychological states using social media language: A case study of weekly emotion. European Journal of Personality, 34(5), 845–858. [Google Scholar] [CrossRef]
Feng, R., & Ivanov, A. (2023). Gender differences in emotional valence and social media content engagement behaviors in pandemic diaries: An analysis based on microblog texts. Behavioral Sciences, 13(1), 34. [Google Scholar] [CrossRef]
Flake, J., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research. Social Psychological and Personality Science, 8(4), 370–378. [Google Scholar] [CrossRef]
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology, 40(1), 129–152. [Google Scholar] [CrossRef]
Grand, G., Blank, I. A., Pereira, F., & Fedorenko, E. (2022). Semantic projection recovers rich human knowledge of multiple object features from word embeddings. Nature Human Behaviour, 6, 975–987. [Google Scholar] [CrossRef]
Hartmann, J., Huppertz, J., Schamp, C., & Heitmann, M. (2019). Comparing automated text classification methods. International Journal of Research in Marketing, 36(1), 20–38. [Google Scholar] [CrossRef]
Herderich, A., Freudenthaler, H. H., & Garcia, D. (2024). A computational method to reveal psychological constructs from text data. Psychological Methods. Advance online publication. [CrossRef]
Humphreys, A., & Wang, R. J.-H. (2017). Automated text analysis for consumer research. Journal of Consumer Research, 44(6), 1274–1306. [Google Scholar] [CrossRef]
Kadhim, A. I. (2019). Survey on supervised machine learning techniques for automatic text classification. Artificial Intelligence Review, 52, 273–292. [Google Scholar] [CrossRef]
Kalnins, A., & Hill, K. P. (2023). The VIF score. What is it good for? Absolutely nothing. Organizational Research Methods, 28, 58–75. [Google Scholar] [CrossRef]
Kjell, O., Kjell, K., Garcia, D., & Sikström, S. (2019). Semantic measures: Using natural language processing to measure, differentiate, and describe psychological constructs. Psychological Methods, 24(1), 92–115. [Google Scholar] [CrossRef] [PubMed]
Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). The geometry of culture: Analyzing the meanings of class through word embeddings. American Sociological Review, 84(5), 905–949. [Google Scholar] [CrossRef]
Kyle, G., Landon, A. C., Vaske, J., & Wallen, K. (2020). Tools for assessing the psychometric adequacy of latent variables in conservation research. Conservation Biology, 34(6), 1353–1363. [Google Scholar] [CrossRef]
Lee, J. W., & Han, D. H. (2023). Data analysis of psychological approaches to soccer research: Using LDA topic modeling. Behavioral Sciences, 13(10), 787. [Google Scholar] [CrossRef]
Leitgöb, H., Seddig, D., Asparouhov, T., Behr, D., Davidov, E., De Roover, K., Jak, S., Meitinger, K., Menold, N., Muthén, B., Rudnev, M., Schmidt, P., & van de Schoot, R. (2022). Measurement invariance in the social sciences: Historical development, methodological challenges, state of the art, and future perspectives. Social Science Research, 110, 35–65. [Google Scholar] [CrossRef] [PubMed]
Liu, Y., Millsap, R., West, S., Tein, J., Tanaka, R., & Grimm, K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22, 486–506. [Google Scholar] [CrossRef]
Locke, E. A., & Latham, G. (2019). The development of goal setting theory: A half century retrospective. Motivation Science, 5(2), 93–105. [Google Scholar] [CrossRef]
Nikadon, J., Suitner, C., Erseghe, T., Džanko, L., & Formanowicz, M. (2025). BERTAgent: The development of a novel tool to quantify agency in textual data. Journal of Experimental Psychology: General, 154(7), 1855–1877. [Google Scholar] [CrossRef]
O’Brien, R. M. (2007). A caution regarding rules of thumb for variance inflation factors. Quality & Quantity, 41, 673–690. [Google Scholar] [CrossRef]
Orrù, L., Cuccarini, M., Moro, C., & Turchi, G. P. (2024). Clinical questions and psychological change: How can artificial intelligence support mental health practitioners? Behavioral Sciences, 14(12), 1225. [Google Scholar] [CrossRef]
Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015. Technical Report. University of Texas at Austin. Available online: https://repositories.lib.utexas.edu/handle/2152/31333 (accessed on 10 May 2026).
Reise, S., Bonifay, W., & Haviland, M. G. (2013). Scoring and modeling psychological measures in the presence of multidimensionality. Journal of Personality Assessment, 95, 129–140. [Google Scholar] [CrossRef]
Rodriguez, A., Reise, S., & Haviland, M. G. (2016). Applying bifactor statistical indices in the evaluation of psychological measures. Journal of Personality Assessment, 98, 223–237. [Google Scholar] [CrossRef] [PubMed]
Sclar, M., Choi, Y., Tsvetkov, Y., & Suhr, A. (2023). Quantifying language models’ sensitivity to spurious features in prompt design or: How I learned to start worrying about prompt formatting. arXiv, arXiv:2310.11324. [Google Scholar] [CrossRef]
Senko, C., & Harackiewicz, J. (2005). Achievement goals, task performance, and interest: Why perceived goal difficulty matters. Personality and Social Psychology Bulletin, 31(12), 1739–1753. [Google Scholar] [CrossRef]
Speer, A. B., Perrotta, J., & Kordsmeyer, T. L. (2024). Taking it easy: Off-the-shelf versus fine-tuned supervised modeling of performance appraisal text. Organizational Research Methods, 29, 73–91. [Google Scholar] [CrossRef]
Sudarsanam, P., Martín-Morató, I., & Virtanen, T. (2025). Representation learning for semantic alignment of language, audio, and visual modalities. arXiv, arXiv:2505.14562. [Google Scholar] [CrossRef]
Sultan, M., Scholz, C., & van den Bos, W. (2023). Leaving traces behind: Using social media digital trace data to study adolescent wellbeing. Computers in Human Behavior Reports, 10, 100281. [Google Scholar] [CrossRef]
Wirth, J., Künsting, J., & Leutner, D. (2009). The impact of goal specificity and goal type on learning outcome and cognitive load. Computer in Human Behavior, 25(2), 299–305. [Google Scholar] [CrossRef]
Yao, Z., Sun, Y., Ding, W., Rao, N., & Xiong, H. (2018). Dynamic word embeddings for evolving semantic discovery. In Proceedings of the eleventh acm international conference on web search and data mining (pp. 673–681). Association for Computing Machinery. [Google Scholar] [CrossRef]
Zhang, N., & Hu, W. (2024). Do psychological ownership and communicative presence matter? examining how user-generated content in E-commerce live streaming influences consumers’ purchase intention. Behavioral Sciences, 14(8), 696. [Google Scholar] [CrossRef] [PubMed]
Zhou, L., Schellaert, W., Martínez-Plumed, F., Moros-Daval, Y., Ferri, C., & Hernández-Orallo, J. (2024). Larger and more instructable language models become less reliable. Nature, 634, 61–68. [Google Scholar] [CrossRef]
Zhu, X., Guo, C., Feng, H., Huang, Y., Feng, Y., Wang, X., & Wang, R. (2024). A review of key technologies for emotion analysis using multimodal information. Cognitive Computation, 16, 1504–1530. [Google Scholar] [CrossRef]

Figure 1. Conceptual-space representation of text-based psychological measurement. Word embeddings define the semantic space, theory-driven anchor words define bipolar construct axes, and user text is projected onto these axes to generate observable construct indicators. Aggregated scores can then be used to describe group- or community-level patterns.

Figure 2. Three-layer validation framework.

Figure 3. Cross-temporal stability of semantic axes. Panel (a) shows semantic axis stability based on cosine similarity. Panel (b) shows rank-order consistency based on Spearman correlations.

Figure 4. Exploratory relative-week trajectories of psychological construct scores. Each construct score is standardized using its full-sample mean and standard deviation before aggregation by relative week.

Figure 5. SIMEX curves for core constructs.

Table 1. Alternative approaches to text-based psychological measurement.

Approach	Main Strengths	Main Limitations
Dictionary-based methods	Transparent, interpretable, and computationally efficient	Limited by predefined categories, exact word matching, and domain-specific coverage
Supervised machine learning	Strong task-specific prediction when high-quality labeled data are available	Requires construct-specific labels; model outputs may be less interpretable as theory-defined construct scores
Transformer or LLM-based scoring	Flexible semantic understanding and strong contextual language processing	Sensitive to prompts, model versions, computational cost, and reproducibility constraints
Semantic projection methods, including WEPA	Continuous semantic scoring with explicit anchors and interpretable geometric structure	Require meaningful semantic axes, validated anchors, and stability diagnostics

Note: This table summarizes conceptual trade-offs among method families and does not report supervised benchmarks across all model families.

Table 2. WEPA implementation workflow.

Step	Stage	Description
1	Construct definition	Specify the theoretical construct and its dimensional structure. The construct should have a clear theoretical definition and, where appropriate, a bipolar interpretation.
2	Anchor construction	Select and review anchor words for the construct poles. Anchor words should be theory-driven, domain-sensitive, and expert-reviewed to support content validity.
3	Corpus preparation	Collect and clean the domain corpus. The corpus should represent the target language environment and be denoised before embedding training.
4	Embedding training	Train a domain-specific static word embedding model and assess whether the resulting semantic space captures domain-relevant relations through semantic-neighbor inspection.
5	Semantic projection	Construct semantic axes from anchor centroids, compute mean vector of cleaned text tokens, and project text vector onto the axis to obtain WEPA construct score.
6	Measurement validation	Evaluate whether scores show theoretically expected measurement properties through anchor validation, benchmark comparison, temporal stability checks, and criterion-validity analyses.
7	Empirical application	Use the validated scores in substantive analyses, such as panel regressions, predictive association models, or exploratory longitudinal analyses.

Table 3. Variable descriptions.

Variables	Description
Panel Identifiers
userid	De-identified platform ID, used for individual fixed effects.
WeekI	Number of weeks between the current observation week and the user’s registration week.
Dependent Variable
ExDur	Log-transformed weekly exercise duration in minutes, computed as $ln (x + 1)$ .
Independent Variables (WEPA projection scores)
GoalCommit	Projection score of the weekly text vector onto the Goal Commitment semantic axis.
GoalSpec	Projection score of the weekly text vector onto the Goal Specificity semantic axis.
GoalDiff	Projection score of the weekly text vector onto the Goal Difficulty semantic axis.
SelfEfficacy	Sum of the four subdimensional projection scores: MastExp + VicExp + SocPer + PhyState.
MastExp	Projection score of the weekly text vector onto the Mastery Experience semantic axis.
VicExp	Projection score of the weekly text vector onto the Vicarious Experience semantic axis.
SocPer	Projection score of the weekly text vector onto the Social Persuasion semantic axis.
PhyState	Projection score of the weekly text vector onto the Physiological States semantic axis.
Time-Varying Control Variables
age	Age in years, calculated as the difference between the observation week and the user’s date of birth.
StrLen	Character length of the user–week text, computed as the number of characters in the text string.
ValidStrLen	Effective length of the cleaned user–week text, measured by valid tokens for WEPA scoring.
DurChg	First difference in logged exercise duration between adjacent weeks: $E x D u r_{i, t} - E x D u r_{i, t - 1}$ .
SocEngage	Log-transformed weekly social engagement, computed as $ln (x + 1)$ from discussion and check-in activities.
SocFeedback	Log-transformed weekly social feedback, computed as $ln (x + 1)$ from likes and comments received.

Note: WEPA variables are raw scalar projection scores. Continuous predictors are standardized in regression analysis. ValidStrLen is used only in the textual-sparsity sensitivity analysis.

Table 4. Summary statistics.

Variable	Users	Obs	Mean	S.D.	Min	Max
WeekI	177,829	2,668,298	97.636	74.244	2.000	338.000
ExDur	177,829	2,668,298	2.514	2.343	0.000	8.504
SelfEfficacy	177,829	2,668,298	6.680	2.006	3.616	12.337
MastExp	177,829	2,668,298	1.950	0.608	1.072	3.755
VicExp	177,829	2,668,298	1.175	0.369	0.608	3.999
SocPer	177,829	2,668,298	1.929	0.625	1.049	3.989
PhyState	177,829	2,668,298	1.626	0.500	0.887	3.535
GoalCommit	177,829	2,668,298	1.101	0.484	0.403	3.291
GoalSpec	177,829	2,668,298	−0.354	0.408	−0.926	2.656
GoalDiff	177,829	2,668,298	−1.079	0.404	−1.623	2.124
age	177,829	2,668,298	29.459	8.002	10.003	80.000
StrLen	177,829	2,668,298	50.559	58.858	1.000	188.000
ValidStrLen	177,829	2,668,298	44.198	51.364	0.000	164.000
DurChg	177,829	2,668,298	0.482	1.458	−8.368	8.349
SocEngage	177,829	2,668,298	1.811	0.820	0.693	3.258
SocFeedback	177,829	2,668,298	2.646	1.757	0.000	5.501

Table 5. Correlation matrix.

Variable	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)	(10)	(11)	(12)	(13)	(14)
(1) ExDur	1.000
(2) GoalCommit	−0.053	1.000
(3) GoalSpec	−0.072	0.084	1.000
(4) GoalDiff	−0.025	−0.544	0.147	1.000
(5) SelfEfficacy	−0.030	0.862	−0.171	−0.669	1.000
(6) MastExp	−0.005	0.831	−0.197	−0.647	0.977	1.000
(7) VicExp	−0.002	0.798	−0.085	−0.677	0.918	0.858	1.000
(8) SocPer	−0.035	0.803	−0.222	−0.601	0.965	0.957	0.834	1.000
(9) PhyState	−0.071	0.841	−0.146	−0.615	0.937	0.894	0.792	0.872	1.000
(10) age	0.045	0.036	0.050	−0.007	0.026	0.024	0.015	0.024	0.036	1.000
(11) StrLen	0.308	−0.074	0.092	0.045	−0.172	−0.179	−0.149	−0.163	−0.162	0.055	1.000
(12) DurChg	0.333	−0.021	−0.066	−0.025	0.011	0.018	0.011	0.013	−0.002	0.003	0.069	1.000
(13) SocEngage	0.696	−0.011	0.044	−0.021	−0.045	−0.043	−0.033	−0.047	−0.048	0.129	0.507	0.218	1.000
(14) SocFeedback	0.510	−0.128	0.047	0.114	−0.185	−0.193	−0.185	−0.176	−0.147	0.120	0.653	0.145	0.756	1.000

Table 6. English glosses of selected chinese anchor words.

Construct	Direction	Anchor-Word Glosses
Goal Specificity	Positive	kg; BMI; pace; minutes; calories; heart rate
Goal Specificity	Negative	lose weight; get stronger; look better; look attractive; feel better
Goal Difficulty	Positive	difficult; grit one’s teeth; tearing; breakthrough; overcome; limit
Goal Difficulty	Negative	basic; beginner; entry-level; activation; soothing; easy
Goal Commitment	Positive	determination; firm resolve; persistence; plan; follow through; deliver
Goal Commitment	Negative	delay; hesitation; interruption; termination; retreat
Mastery Experience	Positive	success; achievement; victory; reach the target; complete
Mastery Experience	Negative	failure; give up; lose control; stall; come to an end
Vicarious Experience	Positive	seek help; ask for advice; consult; expert; senior; imitate; refer to; template
Vicarious Experience	Negative	alone; no one to teach; rely on feeling; grope one’s way; fool around; act blindly
Social Persuasion	Positive	recognition; support; affirmation; encouragement; praise; compliment; trust
Social Persuasion	Negative	denial; doubt; ridicule; sarcasm; mockery; neglect; feel disheartened
Physiological State	Positive	vitality; energy; happiness; confidence; fully energized; alert; high-spirited
Physiological State	Negative	weakness; drowsiness; fatigue; stress; self-abasement; listlessness

Note: The English entries are interpretive glosses provided for readability and are not used as operational dictionaries in the empirical analysis. Where necessary, contextualized phrases rather than literal translations are used to preserve construct meaning.

Table 7. Semantic-space diagnostic results for anchor-word sets.

Construct	Average Within-POS Similarity	Average Within-NEG Similarity	Average Between-Pole Similarity
Goal Specificity	0.3399	0.3216	0.2282
Goal Commitment	0.2095	0.1586	0.0899
Vicarious Experience	0.1967	0.0723	0.0426
Mastery Experience	0.4068	0.2077	0.1763
Physiological State	0.3086	0.2549	0.1524
Social Persuasion	0.2930	0.1505	0.0834
Goal Difficulty	0.3054	0.4454	0.2968

Note: The table reports average cosine similarities for each construct in the embedding space. When both within-pole similarities exceed the between-pole similarity, the anchor-word set is considered to show the expected pattern of separation between poles.

Table 8. Benchmark Comparison: WEPA vs. dictionary-based method.

Measurement Method	Goal Specificity		Physiological State
Measurement Method	$ρ$	Coverage	$ρ$	Coverage
Dictionary-based	0.733	32.4%	0.161	9.7%
WEPA	0.895	100%	0.850	100%

Note: N = 518 (goal specificity) and 536 (physiological state) human-annotated texts. Coverage denotes the percentage of texts that receive a non-missing score.

Table 9. Criterion validity of goal-setting constructs.

	DV: ${ExDur}_{i, t + 1}$
	(1) Baseline Model	(2) GoalCommit	(3) GoalSpec	(4) GoalDiff
GoalCommit		0.069 ***
		(0.002)
GoalSpec			0.019 ***
			(0.002)
GoalDiff				−0.053 ***
				(0.002)
age	−0.942 ***	−0.945 ***	−0.944 ***	−0.942 ***
	(0.004)	(0.004)	(0.004)	(0.004)
StrLen	0.011 ***	0.011 ***	0.010 ***	0.011 ***
	(0.002)	(0.002)	(0.002)	(0.002)
DurChg	0.066 ***	0.066 ***	0.066 ***	0.065 ***
	(0.001)	(0.001)	(0.001)	(0.001)
SocEngage	0.685 ***	0.681 ***	0.685 ***	0.682 ***
	(0.004)	(0.004)	(0.004)	(0.004)
SocFeedback	−0.059 ***	−0.054 ***	−0.058 ***	−0.055 ***
	(0.002)	(0.002)	(0.002)	(0.002)
Ind FE	Yes	Yes	Yes	Yes
Observations	2,567,730	2,567,730	2,567,730	2,567,730
N. of Users	175,114	175,114	175,114	175,114
Within $R^{2}$	0.336	0.336	0.336	0.336

Note: Standard errors in parentheses and clustered at the user level. *** p < 0.001.

Table 10. Criterion validity of self-efficacy dimensions.

	DV: ${ExDur}_{i, t + 1}$
			Separate Models				Joint Model
	(1) Baseline	(2) Overall SE	(3) MastExp	(4) VicExp	(5) SocPer	(6) PhyState	(7) SE Sources
SelfEfficacy		0.008 ***
		(0.000)
MastExp			0.025 ***				0.117 ***
			(0.002)				(0.006)
VicExp				0.052 ***			0.122 ***
				(0.002)			(0.003)
SocPer					0.003 ***		−0.231 ***
					(0.002)		(0.006)
PhyState						0.025 ***	0.025 ***
						(0.002)	(0.004)
age	−0.942 ***	−0.942 ***	−0.941 ***	−0.942 ***	−0.942 ***	−0.943 ***	−0.943 ***
	(0.004)	(0.004)	(0.004)	(0.004)	(0.004)	(0.004)	(0.004)
StrLen	0.011 ***	0.012 ***	0.012 ***	0.012 ***	0.011 ***	0.012 ***	0.011 ***
	(0.002)	(0.002)	(0.002)	(0.002)	(0.002)	(0.002)	(0.002)
DurChg	0.066 ***	0.066 ***	0.066 ***	0.066 ***	0.066 ***	0.066 ***	0.066 ***
	(0.001)	(0.001)	(0.001)	(0.001)	(0.001)	(0.001)	(0.001)
SocEngage	0.685 ***	0.683 ***	0.683 ***	0.681 ***	0.685 ***	0.684 ***	0.680 ***
	(0.004)	(0.004)	(0.004)	(0.004)	(0.004)	(0.004)	(0.004)
SocFeedback	−0.059 ***	−0.057 ***	−0.057 ***	−0.054 ***	−0.058 ***	−0.057 ***	−0.052 ***
	(0.002)	(0.002)	(0.002)	(0.002)	(0.002)	(0.002)	(0.002)
Ind FE	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Observations	2,567,730	2,567,730	2,567,730	2,567,730	2,567,730	2,567,730	2,567,730
N. of Users	175,114	175,114	175,114	175,114	175,114	175,114	175,114
Within $R^{2}$	0.336	0.336	0.336	0.336	0.336	0.336	0.337

Note: Standard errors in parentheses are clustered at the user level. Separate models capture gross associations, whereas the joint model captures conditional associations. *** p < 0.001.

Table 11. Robustness of WEPA estimates to textual sparsity.

	DV: ${ExDur}_{i, t + 1}$
Construct	Full Sample	ValidStrLen $\geq 5$	$0 <$ ValidStrLen $< 20$	ValidStrLen $\geq 20$
Goal-setting constructs
GoalCommit	0.069 ***	0.079 ***	0.042 ***	0.103 ***
	(0.002)	(0.002)	(0.002)	(0.005)
GoalSpec	0.019 ***	0.024 ***	0.013 ***	0.029 ***
	(0.002)	(0.002)	(0.002)	(0.004)
GoalDiff	−0.053 ***	−0.059 ***	−0.025 ***	−0.109 ***
	(0.002)	(0.002)	(0.002)	(0.004)
Self-efficacy dimensions
MastExp	0.117 ***	0.156 ***	0.055 ***	0.242 ***
	(0.006)	(0.007)	(0.006)	(0.015)
VicExp	0.122 ***	0.128 ***	0.081 ***	0.187 ***
	(0.003)	(0.004)	(0.003)	(0.007)
SocPer	−0.231 ***	−0.301 ***	−0.122 ***	−0.480 ***
	(0.006)	(0.007)	(0.005)	(0.013)
PhyState	0.025 ***	0.042 ***	0.009 *	0.068 ***
	(0.004)	(0.005)	(0.004)	(0.009)
Observations	2,567,730	2,298,356	1,185,853	1,381,877
Within $R^{2}$ for goal-setting models	0.336	0.342	0.260	0.368
Within $R^{2}$ for self-efficacy models	0.337	0.343	0.261	0.370
Controls	Yes	Yes	Yes	Yes
Individual FE	Yes	Yes	Yes	Yes

Note: Standard errors in parentheses are clustered at the user level. Goal-setting constructs are estimated separately, whereas self-efficacy dimensions are estimated jointly. All models control for DurChg, SocEngage, SocFeedback, age, and StrLen, with individual fixed effects. * p < 0.05, *** p < 0.001.

Table 12. Technical robustness across model specifications.

Model	Embedding Algorithm	Dimension	Window	Corpus	WEPA ( $ρ$ )	Dictionary ( $ρ$ )	Change vs. M1 (%)
M1 (Baseline)	GloVe	200	15	100%	0.890	0.658	—
M2	Word2Vec	200	15	100%	0.887	0.658	−0.3
M3	GloVe	100	15	100%	0.881	0.658	−1.0
M4	GloVe	200	10	100%	0.889	0.658	−0.1
M5	GloVe	200	15	50%	0.863	0.658	−3.0

Note: N = 518 human-annotated texts. The dictionary-based method is unaffected by model parameter settings.

Table 13. Robustness to anchor-word perturbation.

	DV: ${ExDur}_{i, t + 1}$
	R0 (Baseline)	R1 (Remove 1 Pair)	R2 (Remove 2 Pairs)	R3 (Remove 3 Pairs)
GoalCommit	0.069 ***	0.067 ***	0.071 ***	0.072 ***
	(0.002)	(0.002)	(0.002)	(0.002)
age	−0.945 ***	−0.945 ***	−0.945 ***	−0.945 ***
	(0.004)	(0.004)	(0.004)	(0.004)
StrLen	0.011 ***	0.011 ***	0.011 ***	0.012 ***
	(0.002)	(0.002)	(0.002)	(0.002)
DurChg	0.066 ***	0.066 ***	0.066 ***	0.066 ***
	(0.001)	(0.001)	(0.001)	(0.001)
SocEngage	0.681 ***	0.681 ***	0.681 ***	0.680 ***
	(0.004)	(0.004)	(0.004)	(0.004)
SocFeedback	−0.054 ***	−0.054 ***	−0.053 ***	−0.053 ***
	(0.002)	(0.002)	(0.002)	(0.002)
Ind FE	Yes	Yes	Yes	Yes
Observations	2,567,848	2,567,848	2,567,848	2,567,848
N. of Users	175,119	175,119	175,119	175,119
Within $R^{2}$	0.336	0.336	0.336	0.337

Note: Standard errors in parentheses and clustered at the user level. *** p < 0.001.

Table 14. Criterion validity across gender subsamples.

	DV: ${ExDur}_{i, t + 1}$
	(1) Full Sample	(2) Female	(3) Male
GoalCommit	0.069 ***	0.074 ***	0.059 ***
	(0.002)	(0.003)	(0.003)
$D u r C h g$	0.066 ***	0.071 ***	0.057 ***
	(0.001)	(0.001)	(0.002)
$S o c E n g a g e$	0.681 ***	0.661 ***	0.720 ***
	(0.004)	(0.005)	(0.007)
$S o c F e e d b a c k$	−0.054 ***	−0.049 ***	−0.063 ***
	(0.002)	(0.003)	(0.004)
$a g e$	−0.945 ***	−0.941 ***	−0.949 ***
	(0.004)	(0.005)	(0.006)
$S t r L e n$	0.011 ***	0.009 ***	0.018 ***
	(0.002)	(0.002)	(0.003)
Ind FE	Yes	Yes	Yes
Observations	2,567,730	1,609,225	958,505
N. of Users	175,114	107,600	67,514
Within $R^{2}$	0.336	0.332	0.344

Note: Standard errors in parentheses and clustered at the user level. *** p < 0.001.

Table 15. SIMEX estimates of measurement error.

Variable	Naive Estimate $β_{naive}$	Corrected Estimate $β_{simex}$	Bias (%)	Interpretation
MastExp	0.0249	0.0286	−12.9%	Underestimated
VicExp	0.0523	0.0589	−11.2%	Underestimated
SocPer	0.0035	0.0041	−14.6%	Underestimated
PhyState	0.0254	0.0294	−13.5%	Underestimated
GoalSpec	0.0185	0.0218	−15.1%	Underestimated
GoalDiff	−0.0526	−0.0594	−11.5%	Underestimated
GoalCommit	0.0689	0.0792	−13.0%	Underestimated
SelfEfficacy	0.0076	0.0087	−13.2%	Underestimated

Note: Bias (%)

= \frac{| β_{naive} | - | β_{simex} |}{| β_{simex} |} \times 100 %

. Negative values indicate attenuation, meaning that naive estimates understate the absolute magnitude of the corrected estimates.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Deng, X.; Li, Y. Measuring Psychological Constructs from Social Media Text Using the Word Embedding Projection Approach. Behav. Sci. 2026, 16, 762. https://doi.org/10.3390/bs16050762

AMA Style

Deng X, Li Y. Measuring Psychological Constructs from Social Media Text Using the Word Embedding Projection Approach. Behavioral Sciences. 2026; 16(5):762. https://doi.org/10.3390/bs16050762

Chicago/Turabian Style

Deng, Xudong, and Yijun Li. 2026. "Measuring Psychological Constructs from Social Media Text Using the Word Embedding Projection Approach" Behavioral Sciences 16, no. 5: 762. https://doi.org/10.3390/bs16050762

APA Style

Deng, X., & Li, Y. (2026). Measuring Psychological Constructs from Social Media Text Using the Word Embedding Projection Approach. Behavioral Sciences, 16(5), 762. https://doi.org/10.3390/bs16050762

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Measuring Psychological Constructs from Social Media Text Using the Word Embedding Projection Approach

Abstract

1. Introduction

2. Theoretical Foundations and WEPA Design

2.1. Text as a Basis for Psychological Measurement

2.2. Conceptual Space and Construct Representation

2.3. Measurement Stability and Longitudinal Comparability

2.4. Alternative Text-Based Measurement Approaches

2.5. Computational Implementation of WEPA

2.5.1. Core Computational Procedure

2.5.2. Implementation Steps

3. Validation Strategy, Data, and Measurement

3.1. Validation Strategy

3.2. Data and Sample

3.3. Variable Measurement

4. Validation Studies

4.1. Study I: Measurement Foundation and Benchmark Validation

4.1.1. Anchor-Word Development and Validation

4.1.2. Human Benchmark and Algorithm Comparison

4.1.3. Cross-Temporal Stability of Semantic Axes

4.2. Study II: Criterion Validity of Goal-Setting Constructs

4.2.1. Hypotheses

4.2.2. Model Specification

4.2.3. Results

4.3. Study III: Criterion Validity of Self-Efficacy Dimensions

4.3.1. Hypotheses

4.3.2. Model Specification

4.3.3. Results

4.4. Study IV: Exploratory Relative-Week Dynamics

5. Robustness Checks

5.1. Sensitivity to Textual Sparsity

5.2. Technical Robustness

5.3. Dictionary Robustness

5.4. Cross-Group Robustness

5.5. Estimation Robustness

6. Discussion

6.1. Main Findings

6.2. Contributions to Theory and Measurement

6.3. Practical Implications

6.4. Limitations and Future Research

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI