Article

From Headlines to Forecasts: Narrative Econometrics in Equity Markets

by Davit Hayrapetyan 1,* and Ruben Gevorgyan 2
1 Faculty of Philosophy and Psychology, Yerevan State University, Yerevan 0025, Armenia
2 Faculty of Economics and Management, Yerevan State University, Yerevan 0025, Armenia
* Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2025, 18(9), 524; https://doi.org/10.3390/jrfm18090524
Submission received: 12 August 2025 / Revised: 5 September 2025 / Accepted: 15 September 2025 / Published: 18 September 2025
(This article belongs to the Section Financial Markets)

Abstract

This study investigates whether firm-specific narratives extracted from the news add predictive content to monthly stock return models. Using BERTopic, a topic-modeling approach built on Bidirectional Encoder Representations from Transformers (BERT) embeddings, we processed Microsoft (MSFT) news and constructed monthly narrative activations (binary presence and decay weighting). These narrative activations are used in autoregressive integrated moving-average models with exogenous regressors (ARIMA-X) to analyze MSFT monthly log returns alongside the U.S. Economic Policy Uncertainty (EPU) index from February 2021 to March 2025. Decay models using a similarity-distilled BERT embedding yielded three significant narratives: Media and Public Perception (MPP) (β = 0.0128, p = 0.002), Currency and Macro Environment (CME) (β = −0.0143, p < 0.001), and Tech and Semiconductor Ecosystem (TSE) (β = −0.0606, p = 0.014). Binary activation identifies reputational shocks: the MPP indicator predicts lower returns at one- and two-month lags (β = −0.0758, p = 0.043; β = −0.1048, p = 0.007). A likelihood-ratio test comparing ARIMA-X models with narrative regressors to a baseline ARIMA (no narratives) rejects the null hypothesis that narratives add no improvement in fit (p < 0.01). Firm-level narratives enhance monthly forecasts beyond conventional predictors; decay activation and similarity-distilled embeddings perform best. Demonstrated on Microsoft as a proof of concept, the ticker-agnostic design scales to multiple firms and sectors, contingent on sufficient firm-tagged news coverage for external validity.


1. Introduction

1.1. Motivation and Positioning: Firm-Level Narrative Econometrics in Modern Markets

Financial market information processing has been transformed by digital communication, algorithmic trading, and high-frequency decision-making, reshaping how news enters prices (Deveikyte et al., 2022; Tetlock, 2007). Traditional asset-pricing models, which are premised on rapid, fully rational updates, do not fully account for persistent anomalies and heterogeneous reactions. This shortfall motivates a behavioral finance lens: in practice, investors rely on heuristics and affect under uncertainty, and narratives operate as interpretive frames that reduce ambiguity, guide expectations, and influence choices beyond simple risk–return calculus. Classical work distinguishes measurable risk from hard-to-quantify uncertainty (Bernanke, 1983; Bloom, 2009; Knight, 1921). Macro indices, such as the Economic Policy Uncertainty (EPU) index, proxy fluctuations in uncertainty and sentiment but only partially capture how concrete stories shape belief formation. Accordingly, we treat narrative variables as complementary information and evaluate their incremental contribution relative to baselines that already include macro/uncertainty controls.
Shiller’s “narrative economics” (Shiller, 2017, 2019) argues that widely shared stories propagate like epidemics, shaping sentiments and behaviors. Recent NLP advances, especially transformer-based embeddings and BERT-based topic modeling (BERTopic), make such stories measurable at scale and trackable over time at the firm level (Grootendorst, 2022). At the market level, Blanqué et al. (2022) show that systematically measured economic and geopolitical narratives improve the prediction of S&P 500 dynamics, motivating firm-specific analysis.
We address this gap by proposing a reusable pipeline that extracts company-specific narratives from tagged news (transformer embeddings + BERTopic), converts them into monthly activations—both decay-weighted (persistence) and binary (spikes)—and augments standard stock-level ARIMA-X baselines by adding narrative activations as exogenous regressors. Using Microsoft (MSFT) as a news-dense proof of concept, we ask whether firm narratives provide incremental predictive content conditional on standard controls, which activation design is most informative at the monthly horizon, and how encoder choice affects topic structure and downstream predictability. Our forecasting framework follows the Box–Jenkins ARIMA tradition, which remains widely used in economics and the social sciences, including recent epidemiological forecasting during COVID-19 (Das, 2020), thereby grounding our narrative variables within a well-established methodology.
Unlike prior narrative-economics studies that build macro or societal narratives to explain aggregate equity indices (e.g., Blanqué et al., 2022), we identify and quantify firm-level narratives for Microsoft as proof of concept. This study advances narrative econometrics from market-wide storylines to firm-specific, reputation- and technology-oriented narratives that measurably improve stock-level forecasts. Building on this positioning, we formalize the behavioral conduit through which firm-level narratives travel from attention to portfolio actions.

1.2. Behavioral Mechanism: From Narrative to Investment Action

Narratives influence markets through a three-stage process.
  • Attention and salience. Repetition and coherence push themes over attention thresholds, concentrating limited attention on a small set of frames.
  • Belief updating and sentiment. Frames act as cognitive “shortcuts,” shaping cash flow expectations (growth stories, product cycles) and risk perceptions (legal, regulatory, security). This yields shifts in (i) expected cash flows, ΔEt[CF], and (ii) required returns via risk premia and left-tail risk.
  • Internalization through actions. As ideas spread through analysts, buy-side notes, and media coverage, investors adjust their portfolios every month to align with their views on market trends.
These stages translate into tractable monthly predictions, which we distill into testable implications.

1.3. Testable Implications (Consistent with a Monthly Horizon)

  • Growth narratives (e.g., AI-product) → higher Et[CF], positive drift, and moderate rise in idiosyncratic volatility during adoption.
  • Constraint/legal narratives (e.g., antitrust) → higher downside/skew perceptions, tighter discount rates, muted or negative drift, steeper downside option skew.
  • Security/reputation narratives → transient risk-premium spikes, elevated idiosyncratic volatility, weaker multiples.
Our empirical design targets the persistent component of this mechanism; event-window dynamics are complementary (see Sections 6.1.3, 10.2, and 10.3). To anchor these implications, we situate our approach within the narrative economics, investor attention, and text-based asset pricing literature.

2. Literature Review and Theoretical Background

2.1. Narrative Economics and Investor Expectations

Economic narratives, which are linked short stories that cohere into broader themes, shape market dynamics by structuring perceptions and actions. As behavioral scripts, narratives influence social norms and economic choices (Schank & Abelson, 1977) and can propagate in crisis episodes, shifting sentiments and behaviors (Shiller, 2019). Behavioral finance shows that investors rely on heuristics under uncertainty (Kahneman & Tversky, 1979) and that stories affect decisions by framing beliefs and emotions (Akerlof & Shiller, 2010). Conviction Narrative Theory formalizes this mechanism: investors use narratives to navigate radical uncertainty and commit to their positions (Tuckett & Nikolic, 2017). In the following sections, we consider narratives as complements to standard risk-based predictors and not substitutes. Having outlined why narratives matter, we turn to how they can be rendered measurable in financial settings in the following section.

2.2. Quantifying Narratives in Financial Markets

At scale, narratives interact with attention and memory, shaping expectations and their flow. Market-level evidence links measured economic and geopolitical themes to aggregate outcomes (Blanqué et al., 2022). Attention is selective; investors sometimes down-weight threatening information (the ostrich effect), altering its diffusion and salience (Karlsson et al., 2009). Narratives operate at personal, social, and collective levels, and shared storylines coordinate behavior under uncertainty (Roos & Reccius, 2023). Empirically, narrative content moves emotions such as fear and optimism, with market impact (Taffler et al., 2024). From a valuation perspective, narrative breaks, transformations, and adjustments shift expectations and risk premiums with varying degrees of intensity (Damodaran, 2017). Empirically, the question is whether narrative measures improve models that are conditional on fundamentals and uncertainty proxies (e.g., EPU). This motivates modern NLP pipelines, especially embedding-based topic models, which can operationalize narrative structures.

2.3. Natural Language Processing and Narrative Modeling

Recent advances in NLP have enabled the systematic measurement of financial narratives using transformer-based embeddings and embedding-driven topic models, such as BERTopic (Grootendorst, 2022). Beyond genre studies demonstrating machine detection of narrativity (Piper et al., 2021), finance applications show that narrative signals are informative for predicting stock returns. Narratives matter, especially when macroeconomic uncertainty is high (Mangee, 2021). Emotionally charged stories are correlated with asset value changes (Bhargava et al., 2023). Sentiment-based narrative intensity is related to volatility and trading strategies (Blanqué et al., 2022). These developments motivate our firm-level approach of using transformer embeddings with BERTopic to discover company-specific themes and construct monthly narrative activations for econometric testing, estimated alongside conventional predictors to assess their incremental values. Concurrently, forecasting has shifted from lexicons to contextual encoders, which we leverage in our design.

2.4. Recent NLP Advances for Financial Forecasting

Transformer-based models have accelerated text-driven financial forecasting. Domain-specialized encoders such as FinBERT adapt BERT to financial corpora and improve performance on sentiment and stance classification used in return prediction and the analysis of regulatory filings and company news. At the model scale, BloombergGPT, a 50B-parameter finance LLM, demonstrated robust gains on finance benchmarks, illustrating the value of domain pre-training for entity-rich, numerically dense text. Recent surveys have synthesized these trends across forecasting, risk, and compliance tasks, underscoring the shift from lexicons to contextual embeddings and large-scale modeling (Wu et al., 2023; Du et al., 2025). These tools invite firm-level implementation, but their portability and scope require comparative grounding.

2.5. Comparative Grounding and External Validity

While our empirical demonstration centers on Microsoft, the pipeline is ticker-agnostic and designed for extension to other companies and sectors in the future. Prior cross-firm work shows that firm-specific textual indicators predict both fundamentals and returns: the tone of individual-firm news forecasts earnings and stock returns across S&P 500 constituents (Tetlock et al., 2008); 10-K tone is priced around filing dates in broad cross-sections when finance-specific lexicons are used (Loughran & McDonald, 2011; Jegadeesh & Wu, 2013); and earnings-call text on political risk measures firm-level exposures with real and market effects across industries (Hassan et al., 2019). Related evidence shows that news categories and tone explain return variation across many stocks, further supporting portability (Boudoukh et al., 2013). Accordingly, we view our MSFT study as a replicable case and outline a panel extension for future work (multi-firm implementation with sector fixed effects and topic-by-sector interactions to test heterogeneity). Simultaneously, topic pipelines face familiar concerns regarding construct validity, coverage, and identification, which we acknowledge and address.

2.6. Limitations, Peer Criticisms, and Our Positioning

Narrative/topic pipelines face well-known concerns: (i) construct validity—topic coherence correlates with human interpretability but does not guarantee reliability/stability across resamples or hyperparameters; (ii) dataset bias—provider coverage, language filters, and paywalls can skew the news sample; (iii) econometric identification—text features are typically predictors, not instruments, so causal claims must be avoided; and (iv) temporal drift—domain shifts can degrade embeddings over time. We address these by (a) using embedding-based BERTopic with fixed seeds and pre-specified Uniform Manifold Approximation and Projection (UMAP) (McInnes et al., 2018) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) (Campello et al., 2015) settings; (b) releasing topic artifacts (IDs, keywords, example headlines) and code for auditability; (c) evaluating narratives incrementally within ARIMA with exogenous regressors at a monthly horizon aligned to standard uncertainty controls; and (d) treating non-significant results as null rather than over-interpreting (Grootendorst, 2022). These limits delineate the gap targeted by our study and motivate the specific objectives we pursue.

3. Research Gaps and Objectives

Despite advances in narrative economics and text-based finance, there are still key gaps. Most evidence operationalizes market-level narratives for aggregate indices rather than constructing systematic firm-level time series; identification often relies on sentiment/lexicon counts instead of unsupervised, embedding-based discovery with transparent topic governance (objective retention/merges and deterministic labels tied to stable IDs); empirical designs emphasize event-window/daily reactions, leaving the monthly, slow-moving narrative channel underexplored; and econometric integrations rarely augment standard Box–Jenkins (ARIMA/ARIMA-X) baselines with narratives as exogenous covariates evaluated for incremental value conditional on macro/uncertainty controls (e.g., EPU). Comparative evidence on design choices, such as encoder architecture (e.g., AMiniLM vs. PMPNet) and activation rules (binary vs. decay), is limited, and practical, reproducible pipelines with clear operating guidelines are scarce. Accordingly, this study (i) builds a reusable firm-level pipeline (transformer embeddings with BERT-based topic modeling) that converts company-tagged news into monthly narrative activations; (ii) augments ARIMA-X baselines to test the incremental predictive contribution of narratives conditional on standard controls; (iii) compares encoder and activation designs and traces their implications for topic structure, persistence, and fit; and (iv) advances transparency and practice via an explicit governance protocol and concise guidance on how these signals can inform monitoring and risk oversight. To sharpen these objectives, we state the focal research question and its rationale.

4. Research Problem

This study examines whether firm-level narratives extracted from the news with transformer-based topic modeling explain stock price changes better than traditional measures of economic uncertainty, such as the Economic Policy Uncertainty (EPU) index. It also aims to show that, beyond general economic discussion, news coverage carries company-specific content (innovation, leadership, brand strategy, corporate events) that financial models should take into account. We now map this problem into a conceptual architecture that links narrative activations to returns through expectations and risk-premium channels.

5. Conceptual Model and Interlinkages

Narratives are considered complements to standard risk-based predictors of asset returns. Let Nt denote the vector of monthly topic activations (binary and decay) extracted from news articles tagged with firms. From Nt, we form narrative indices used in the estimation. Uncertainty controls (e.g., EPU) operate in parallel and can moderate narrative effects. The implied return identity is
$$r_t \approx \underbrace{f(\Delta E[CF])}_{\text{growth channel}} - \underbrace{g(\Delta RP)}_{\text{risk-premium channel}} + \text{noise}$$
We estimate the corresponding reduced-form with an ARIMA-X baseline as follows:
$$r_t = \mu + \varphi(L)\, r_{t-1} + \beta_{N_t} N_t + \beta_{EPU}\, EPU_t + \varepsilon_t$$
where φ(L) is the ARMA lag polynomial in the lag operator L and εt is the innovation term. The coefficients are interpreted as percentage points (pp). This structure clarifies (a) what each indicator measures, (b) where it enters the behavioral mechanism, and (c) how it is implemented in the model.
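As an illustration of the reduced form, the sketch below fits the simplest special case, an AR(1) with one narrative regressor and EPU, by ordinary least squares on synthetic data. It is a stand-in rather than the estimation routine used in the study: the function names, the simulated coefficients, and the standardized EPU series are all assumptions made for the example, and a full ARIMA-X fit would normally rely on a maximum-likelihood routine from a time-series library.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_arx1(returns, narrative, epu):
    """OLS fit of r_t = mu + phi*r_{t-1} + beta_N*N_t + beta_EPU*EPU_t + e_t."""
    rows = [[1.0, returns[t - 1], narrative[t], epu[t]]
            for t in range(1, len(returns))]
    y = returns[1:]
    k = 4
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    mu, phi, beta_n, beta_epu = solve_linear_system(xtx, xty)
    return {"mu": mu, "phi": phi, "beta_N": beta_n, "beta_EPU": beta_epu}


# Synthetic series (NOT the paper's data): T = 50 months, known coefficients.
rng = random.Random(0)
T = 50
narrative = [rng.random() for _ in range(T)]   # hypothetical activation in [0, 1]
epu = [rng.gauss(0.0, 1.0) for _ in range(T)]  # hypothetical standardized EPU
returns = [0.0]
for t in range(1, T):
    returns.append(0.01 + 0.2 * returns[t - 1] + 0.05 * narrative[t]
                   - 0.01 * epu[t] + rng.gauss(0.0, 0.001))

coef = fit_arx1(returns, narrative, epu)
print(coef)
```

With low simulated noise, the fitted values should land close to the coefficients used to generate the series, which makes the roles of φ, the narrative coefficient, and the EPU coefficient easy to check by eye.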
With the mechanism specified, we describe the data, extraction pipeline, and econometric design used for testing.

6. Materials and Methods

6.1. Conceptual and Empirical Justification of the Methodology

The dataset comprises the monthly stock log returns of Microsoft Corporation (MSFT), the U.S. Monthly Economic Policy Uncertainty (EPU) index created by Baker et al. (2016), and a firm-tagged financial news corpus spanning 1 February 2021–31 March 2025 (T = 50). The firm-tagged corpus was pulled via the Marketaux API, which tracks over 5000 news sources globally (30+ languages; 80+ markets) and provides per-article metadata and entity tags; timestamps are in UTC. We queried the /v1/news/all endpoint with symbols = MSFT, filter_entities = true (returns only MSFT entities for each hit), must_have_entities = true, language = en, group_similar = true, and monthly published_after/published_before bounds, using limit/page pagination as specified by the API. From 1 February 2021 to 31 March 2025, we collected 43,848 unique, rigorously pre-processed items covering earnings, regulation, products, and macro context relevant to Microsoft’s valuation and investor sentiments. Articles were de-duplicated using the API’s similarity grouping plus an internal near-duplicate filter (earliest retained); non-English items and off-topic content were excluded from the analysis. We preserved each item’s source domain using Marketaux’s source endpoint to facilitate domain-level auditing and released a complete domain list (with counts) with the code.
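The query described above can be sketched as a URL builder, one request per month and page. The parameter names are those listed in the text; the host name, the `api_token` parameter, the timestamp format, and the `limit` value are assumptions for illustration and should be checked against the provider's documentation. No network call is made.

```python
import calendar
from datetime import date
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed host; the text names only the /v1/news/all path.
BASE_URL = "https://api.marketaux.com/v1/news/all"


def month_bounds(year, month):
    """First and last calendar day of a month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def build_query(year, month, page=1, limit=50, api_token="YOUR_TOKEN"):
    """Build one paginated monthly request URL for MSFT-tagged English news."""
    start, end = month_bounds(year, month)
    params = {
        "symbols": "MSFT",
        "filter_entities": "true",
        "must_have_entities": "true",
        "language": "en",
        "group_similar": "true",
        # Timestamp format is an assumption; bounds are treated as UTC.
        "published_after": f"{start.isoformat()}T00:00:00",
        "published_before": f"{end.isoformat()}T23:59:59",
        "limit": limit,   # hypothetical page size
        "page": page,
        "api_token": api_token,  # placeholder, not a real credential
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query(2021, 2)
params = parse_qs(urlparse(url).query)
print(params["published_after"], params["published_before"])
```

Looping `build_query` over the 50 months of the sample and incrementing `page` until a response comes back empty would reproduce the pagination pattern the text describes.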

6.1.1. Inclusion/Exclusion Criteria

Included were English-language articles with an identified MSFT entity and valid timestamps from mainstream outlets and reputable aggregators, as returned by Marketaux. Non-English items, near-duplicates, and off-topic pieces lacking the MSFT entity after filtering were excluded. The news series, returns, and EPU were aggregated monthly to align frequencies and minimize microstructure noise.
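A minimal sketch of the monthly alignment step follows: article timestamps are bucketed by UTC calendar month, and log returns are computed from month-end prices as r_t = ln(P_t / P_{t−1}). The function names, timestamps, and prices are made up for the example and are not taken from the study's data.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def month_key(ts_utc):
    """Map an ISO-8601 timestamp with an explicit offset or 'Z' to 'YYYY-MM' in UTC."""
    dt = datetime.fromisoformat(ts_utc.replace("Z", "+00:00")).astimezone(timezone.utc)
    return f"{dt.year:04d}-{dt.month:02d}"


def monthly_counts(timestamps):
    """Number of articles per UTC month."""
    return Counter(month_key(ts) for ts in timestamps)


def monthly_log_returns(month_end_prices):
    """month_end_prices: dict 'YYYY-MM' -> price. Returns dict of log returns."""
    months = sorted(month_end_prices)
    return {cur: math.log(month_end_prices[cur] / month_end_prices[prev])
            for prev, cur in zip(months, months[1:])}


# Hypothetical inputs for illustration only.
counts = monthly_counts([
    "2021-02-28T23:30:00Z",
    "2021-03-01T00:10:00Z",
    "2021-03-15T12:00:00Z",
])
log_returns = monthly_log_returns({"2021-01": 100.0, "2021-02": 110.0})
print(counts, log_returns)
```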

6.1.2. Coverage Considerations

Using a third-party aggregator can introduce sampling bias (English-language emphasis, provider-specific source coverage, and under-representation of paywalled content). We mitigate these concerns by (i) publishing the full domain-level source list and counts, (ii) applying transparent deterministic filters, and (iii) evaluating narrative variables conditional on standard controls (e.g., EPU) in the ARIMA-X framework.
We start on 1 February 2021 to capture post-COVID recovery and the onset of a new U.S. administration and to span multiple regimes—monetary tightening, the Russia–Ukraine war, and tech-sector volatility—providing variation in uncertainty and narratives. The February 1 start also avoids a partial initial month and provides a brief warm-up for the decay weights and ARIMA lags. The 31 March 2025 cut-off is the last fully observed month (no look-ahead).
We used BERTopic to identify narrative signals and combined them into monthly topic distributions, creating indicators that capture both ongoing narrative trends and short-lived spikes, and converted qualitative news into structured inputs for econometric modeling. Narrative activations enter as exogenous regressors in the Box–Jenkins ARIMA-X baselines. All hyperparameters and preprocessing rules were specified generically (no ticker-specific lexicons), making the pipeline ticker-agnostic and directly extendable to other firms and time horizons. All variables were pre-screened for appropriateness (descriptive statistics and Augmented Dickey–Fuller (ADF) and Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests): returns are stationary; narrative activations/indices are bounded and stationary; EPU enters in levels; full screening results are available in the open repository.
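The two activation designs can be illustrated with a short sketch. The exact rules are not spelled out in this section, so the following is an assumption-laden reading: binary activation flags months in which a topic's share of articles reaches a threshold (5% is used here, one of the thresholds mentioned in the sensitivity analysis), and decay activation is an exponentially weighted average whose memory is set by a half-life in months. The function names and the half-life interpretation of the decay horizon are hypothetical.

```python
def binary_activation(shares, threshold=0.05):
    """1 if the topic's monthly share of articles reaches the threshold, else 0."""
    return [1 if s >= threshold else 0 for s in shares]


def decay_activation(shares, half_life=2.0):
    """Exponentially weighted average: a_t = (1 - lam) * s_t + lam * a_{t-1}.

    lam = 0.5 ** (1 / half_life), so a one-off spike loses half of its
    weight every `half_life` months. The series stays in [0, 1] when the
    input shares do.
    """
    lam = 0.5 ** (1.0 / half_life)
    out, prev = [], 0.0
    for s in shares:
        prev = (1.0 - lam) * s + lam * prev
        out.append(prev)
    return out


# Hypothetical monthly topic shares for one narrative.
shares = [0.02, 0.05, 0.10, 0.00, 0.00]
print(binary_activation(shares))
print(decay_activation(shares, half_life=2.0))
```

The binary series reacts only in the months when coverage crosses the threshold, while the decay series keeps a fading memory of earlier coverage, which is the distinction between spikes and persistence drawn in the text.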

6.1.3. Temporal Aggregation Rationale (Monthly Design)

We adopt a monthly horizon to capture the persistent component of firm-level narratives, which evolve through repetition and decay rather than single-day sentiment spikes. Monthly sampling also reduces timestamp misalignment between intraday news and market close, as well as microstructure noise and short-lived overreactions or reversals. The monthly horizon aligns with the frequency of standard uncertainty/sentiment controls and keeps the ARIMA-X specification parsimonious in a single-firm setting. The pipeline itself is frequency-agnostic: the embedding → BERTopic → decay/binary activation workflow and all hyperparameters are specified generically, so the procedure can be replicated at weekly or daily horizons using rolling windows. In the Discussion section, we outline higher-frequency extensions that focus on event-driven dynamics. Monthly activation thus proxies the internalization stage, that is, the period over which attention, belief updates, and institutional frictions translate narrative salience into portfolio rebalancing and sentiment. Given these data and monthly alignment choices, BERTopic provides an unsupervised backbone for narrative discovery.

6.2. Topic Modeling with BERTopic

Narrative extraction began with BERTopic, a topic-modeling framework that clusters document embeddings into interpretable topics and tracks them over time. Unlike traditional models such as Latent Dirichlet Allocation (LDA), BERTopic leverages BERT-based sentence embeddings to capture the semantic content of news articles.
For each document $d_i$, compute its embedding $e_i$:
$$e_i = \mathrm{BERT}(d_i), \quad e_i \in \mathbb{R}^{384}$$
Embeddings are reduced from a high-dimensional space to a low-dimensional space:
$$y_i = \mathrm{UMAP}(e_i), \quad y_i \in \mathbb{R}^{2}$$
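UMAP aims to preserve the local manifold structure and can be formulated as minimizing the following cross-entropy loss:
$$L_{\mathrm{UMAP}} = \sum_{i} \sum_{j} \left[ w_{ij} \log \frac{w_{ij}}{v_{ij}} + (1 - w_{ij}) \log \frac{1 - w_{ij}}{1 - v_{ij}} \right]$$
where $w_{ij}$ is the similarity measure derived from local neighbourhoods of the embeddings and $v_{ij}$ is the corresponding similarity between $y_i$ and $y_j$ in the low-dimensional space.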
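To make the objective concrete, the sketch below evaluates the standard UMAP cross-entropy for a given low-dimensional layout, writing w for the high-dimensional neighbourhood similarities and v for their low-dimensional counterparts. The names `low_dim_similarity` and `umap_cross_entropy`, the curve parameters a = b = 1, and the toy inputs are assumptions for illustration; the actual library minimizes this objective by sampled stochastic gradient descent rather than by evaluating the full double sum.

```python
import math


def low_dim_similarity(y_i, y_j, a=1.0, b=1.0):
    """Low-dimensional similarity v_ij = 1 / (1 + a * ||y_i - y_j||^(2b))."""
    d2 = sum((p - q) ** 2 for p, q in zip(y_i, y_j))
    return 1.0 / (1.0 + a * d2 ** b)


def umap_cross_entropy(w, y, a=1.0, b=1.0, eps=1e-12):
    """Cross-entropy between high-dim weights w[(i, j)] and the layout y."""
    total = 0.0
    for (i, j), w_ij in w.items():
        v_ij = low_dim_similarity(y[i], y[j], a, b)
        # Clip to avoid log(0) at the boundaries.
        w_c = min(max(w_ij, eps), 1.0 - eps)
        v_c = min(max(v_ij, eps), 1.0 - eps)
        total += w_c * math.log(w_c / v_c)
        total += (1.0 - w_c) * math.log((1.0 - w_c) / (1.0 - v_c))
    return total


# Toy example: one pair of points with high-dimensional similarity 0.5.
w = {(0, 1): 0.5}
matched = umap_cross_entropy(w, [(0.0, 0.0), (1.0, 0.0)])     # distance 1 -> v = 0.5
mismatched = umap_cross_entropy(w, [(0.0, 0.0), (3.0, 0.0)])  # distance 3 -> v = 0.1
print(matched, mismatched)
```

The loss is zero when the layout reproduces the neighbourhood similarity exactly and grows as the two points are pushed further apart than that similarity warrants.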
The clusters were identified using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). HDBSCAN defines clusters as connected components that remain stable across a range of density thresholds. Cluster extraction can be formally expressed by the stability measure $S(C)$ for cluster $C$ as follows:
$$S(C) = \sum_{x_i \in C} \left( \lambda_{x_i} - \lambda_{\mathrm{birth}}(C) \right)$$
where
  • $\lambda_{x_i}$ is the density level at which data point $x_i$ leaves the cluster;
  • $\lambda_{\mathrm{birth}}(C)$ denotes the density level at which cluster $C$ first appears.
Clusters with the highest stability S(C) were selected.
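The stability computation and the selection step can be sketched in a few lines. The λ values below are invented for the example, and `keep_parent` encodes the usual HDBSCAN rule (a parent cluster is kept in place of its children when its own stability is at least the sum of theirs), which the text summarizes but does not spell out.

```python
def cluster_stability(lambda_leave, lambda_birth):
    """S(C): sum over points of (lambda at which the point leaves - lambda at birth)."""
    return sum(lam - lambda_birth for lam in lambda_leave)


def keep_parent(parent_stability, child_stabilities):
    """True if the parent is more stable than its children combined."""
    return parent_stability >= sum(child_stabilities)


# Hypothetical cluster: born at lambda = 0.5, three points leave at 1.0, 1.5, 2.0.
s_parent = cluster_stability([1.0, 1.5, 2.0], lambda_birth=0.5)

# Two hypothetical children that split off later and dissolve quickly.
s_child_a = cluster_stability([2.5, 2.5], lambda_birth=2.0)
s_child_b = cluster_stability([2.25], lambda_birth=2.0)

print(s_parent, s_child_a, s_child_b, keep_parent(s_parent, [s_child_a, s_child_b]))
```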
By combining UMAP dimensionality reduction with unsupervised HDBSCAN clustering, BERTopic produces well-separated topic groups without requiring the number of topics to be set in advance, which matters when new narratives emerge in a large text collection.
All steps were pre-specified with fixed seeds and hyperparameters; the exact UMAP/HDBSCAN/BERTopic configs, topic-governance thresholds, and ARIMA-X scripts were released in a public repository (see Data Availability). Because representation quality shapes clusters, we compare sentence encoders and report intrinsic and extrinsic checks on them.
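For orientation, a configuration along the following lines would reproduce the settings reported in Section 6.3.3 (cosine UMAP with n_neighbors = 15 and min_dist = 0.0; HDBSCAN with min_cluster_size = 25 and min_samples = 5; seed 42). This is a sketch, not the released configuration: `CONFIG` and `build_topic_model` are hypothetical names, `n_components = 2` is inferred from the two-dimensional projection described above, and the third-party imports are kept inside the function so the settings can be inspected without those packages installed.

```python
CONFIG = {
    "encoder": "all-MiniLM-L6-v2",
    "umap": {
        "n_neighbors": 15,
        "n_components": 2,   # assumption: matches the 2-D projection in the text
        "min_dist": 0.0,
        "metric": "cosine",
        "random_state": 42,
    },
    "hdbscan": {
        "min_cluster_size": 25,
        "min_samples": 5,
    },
}


def build_topic_model(config=CONFIG):
    """Assemble a BERTopic model from the fixed settings (requires third-party packages)."""
    # Local imports: bertopic, hdbscan, sentence-transformers, and umap-learn
    # are only needed when the model is actually built.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from umap import UMAP

    return BERTopic(
        embedding_model=SentenceTransformer(config["encoder"]),
        umap_model=UMAP(**config["umap"]),
        hdbscan_model=HDBSCAN(prediction_data=True, **config["hdbscan"]),
    )
```

With the packages installed, `topics, probs = build_topic_model().fit_transform(docs)` would assign each document in a list `docs` to a topic; swapping `CONFIG["encoder"]` is enough to rerun the comparison with another sentence encoder under identical clustering settings.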

6.3. Sentence Encoders, Domain Adaptation, and Model Choice

6.3.1. Alternatives

We evaluated four Sentence-BERT (SBERT)-family encoders for narrative discovery: paraphrase-MPNet-base-v2 (PMPNet), all-MiniLM-L6-v2 (AMiniLM), paraphrase-MiniLM-L12-v2 (PMiniLM), and multi-qa-MiniLM-L6-cos-v1 (MQMiniLM). Finance-specific models (e.g., FinBERT variants) are primarily optimized for sentiment classification tasks. Without task-specific fine-tuning, they tend to cluster broad tones rather than semantically coherent, firm-level narratives. Therefore, we prioritized high-quality general semantic encoders that transfer information across heterogeneous news sources.

6.3.2. Domain Adaptation Safeguards

Financial text features tickers, product code names, numerals, and hyphenated multiword terms. We preserved named entities and multi-word expressions, encoded headline + deck (≤256 tokens), clustered in cosine space with ℓ2-normalized embeddings, and represented topics with class-based term frequency-inverse document frequency (c-TF-IDF) so that domain terms dominated even when globally rare.
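The c-TF-IDF weighting mentioned above can be sketched as follows, using the formulation from the BERTopic paper: the L1-normalized frequency of a term within a topic, multiplied by log(1 + A / f_t), where A is the average number of words per topic and f_t is the term's frequency across all topics. The function names and the toy token lists are illustrative assumptions.

```python
import math
from collections import Counter


def c_tf_idf(class_tokens):
    """class_tokens: dict topic_id -> list of tokens (all documents of the topic joined)."""
    counts = {c: Counter(tokens) for c, tokens in class_tokens.items()}
    avg_words = sum(len(t) for t in class_tokens.values()) / len(class_tokens)
    term_freq = Counter()
    for per_class in counts.values():
        term_freq.update(per_class)
    scores = {}
    for c, per_class in counts.items():
        n_words = len(class_tokens[c])
        scores[c] = {
            term: (n / n_words) * math.log(1.0 + avg_words / term_freq[term])
            for term, n in per_class.items()
        }
    return scores


def top_terms(scores, k=3):
    """Top-k terms per topic, ties broken alphabetically."""
    return {
        c: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for c, s in scores.items()
    }


# Toy topics (hypothetical): 'microsoft' appears in both and is down-weighted.
toy = {
    "cloud": ["azure", "cloud", "azure", "microsoft"],
    "chips": ["gpu", "chips", "gpu", "microsoft"],
}
ranked = top_terms(c_tf_idf(toy))
print(ranked)
```

Terms shared by every topic score lowest, while topic-specific terms rise to the top even if they are rare in the corpus as a whole, which is the property the text relies on.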

6.3.3. Intrinsic Validation (Clustering Quality)

Using identical BERTopic settings (UMAP metric = cosine, n_neighbors = 15, min_dist = 0.00; HDBSCAN min_cluster_size = 25, min_samples = 5; seed = 42), we obtain the following (see Table 1):

6.3.4. Reason for Selecting These Metrics

In BERTopic, c_v evaluates the semantic consistency of top keywords and is widely used as a proxy for human interpretability, while the silhouette score evaluates separation in the embedding manifold and flags over-fragmentation/merging. Perplexity-style cross-validation is designed for probabilistic LDA and is not diagnostic for c-TF-IDF topics; prior work shows that perplexity can diverge from human judgments, whereas coherence tracks them more closely. Accordingly, we ranked the encoders by c_v, using the silhouette score as a guardrail against degenerate clustering.
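As a reminder of what the silhouette guardrail measures, here is a small pure-Python sketch of the mean silhouette score under cosine distance. It is illustrative only: the function names and toy vectors are assumptions, at least two clusters are required, and in practice HDBSCAN noise points (label −1) would be dropped before scoring.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def silhouette_score(points, labels, dist=cosine_distance):
    """Mean of (b - a) / max(a, b); points in singleton clusters score 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to own cluster; b: mean distance to nearest other cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy embeddings: two tight directions in the plane.
points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(good, bad)
```

A labeling that follows the geometry scores close to 1, while one that mixes the two directions turns negative, which is the kind of degenerate clustering the guardrail is meant to catch.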

6.3.5. Extrinsic Validation

The topics were converted to monthly series via binary and decay activation. In ARIMA-X comparisons against a no-narratives baseline, AMiniLM yields fewer, more persistent topics with lower collinearity, producing larger ΔAIC improvements and more 5% significant narrative families at the monthly horizon. PMPNet offers finer semantic granularity useful for diagnostics/labels but tends to net out within a month. Therefore, we used AMiniLM activations for the main results and reported PMPNet for a robustness comparison.

6.3.6. Evaluation and Sensitivity Analysis

Beyond c_v/Silhouette, we report (i) encoder sensitivity, (ii) activation-rule sensitivity (5% vs. 10% binary threshold; h ∈ {1, 2, 3} months for decay), and (iii) extrinsic fit (ΔAIC, significant families) relative to ARIMA baselines. To support interpretability, we published the ID, deterministic label, and top-n keywords, enabling independent audit; estimation uses IDs/activations and not labels. The full configs/seeds are listed in our repository. Figure 1 summarizes the proposed end-to-end workflow of the model. Two representation-level properties further explain the observed differences in the models and their downstream predictability.

6.3.7. Model Selection and Null Results Policy

For each encoder–activation design, we selected ARIMA-X orders (p, q) using AIC/BIC/HQIC, then assessed the incremental value of the narrative block via Holm-adjusted individual p-values, a joint Wald test, ΔAIC relative to a no-narratives baseline, and residual variance. The models were further validated with nested likelihood ratio tests against ARIMA and ARIMA + EPU baselines and with residual diagnostics (Ljung–Box, Autoregressive Conditional Heteroskedasticity (ARCH), Jarque–Bera). When the narrative block is non-significant, we retain and report the best model as a null result to avoid selective reporting and inform the extraction-design choice. With T = 50 monthly observations, hold-out or walk-forward splits materially shrink the estimation window and inflate the variance of accuracy estimates; accordingly, we interpret the findings as incremental, in-sample associations.
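Three of the quantities used in this protocol are easy to compute by hand, and the sketch below does so with the standard library only: Holm step-down adjustment of individual p-values, ΔAIC between nested models, and a likelihood-ratio test with its chi-square p-value. The function names and the log-likelihood values in the example are hypothetical and are not results from the study.

```python
import math


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


def chi2_sf(stat, df):
    """Upper-tail chi-square probability for integer df.

    Uses Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1), starting from
    Q(1/2, x) = erfc(sqrt(x)) for odd df or Q(1, x) = exp(-x) for even df.
    """
    x = stat / 2.0
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


def delta_aic(loglik_base, k_base, loglik_full, k_full):
    """Negative values favor the model with narratives."""
    return aic(loglik_full, k_full) - aic(loglik_base, k_base)


def lr_test(loglik_base, loglik_full, df):
    """Likelihood-ratio statistic and p-value for df added regressors."""
    stat = 2.0 * (loglik_full - loglik_base)
    return stat, chi2_sf(stat, df)


# Hypothetical fit statistics: baseline with 3 parameters, narratives add 2.
stat, p_value = lr_test(loglik_base=100.0, loglik_full=106.0, df=2)
print(stat, p_value, delta_aic(100.0, 3, 106.0, 5), holm_adjust([0.01, 0.04, 0.03]))
```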
  • Dimensionality and hubness. AMiniLM’s 384-dimensional space exhibits less high-dimensional hubness/anisotropy than PMPNet’s higher-dimensional space, aiding density-based separation and reducing spurious topic-bridges.
  • Granularity and synonymy. As a distilled similarity-oriented model, AMiniLM is less sensitive to minor word-order/phrase variants, which helps consolidate technology/product synonyms (e.g., “Copilot,” “GenAI,” “GPT-based features”) into a single, persistent narrative rather than several micro-topics.
These properties translate into smoother narrative activation series (more persistence, less fragmentation) and lower cross-topic collinearity, which are favorable for monthly ARIMA-X estimation. For transparency, we kept the encoder settings and all clustering parameters fixed across models; observed performance differences, therefore, reflect the encoder representation rather than tuning differences.

6.4. Topic Governance and Labeling Protocol

  • Unsupervised extraction of features. Topics are discovered with BERTopic (transformer embeddings) using fixed random seeds and pre-specified hyperparameters; no firm-specific heuristics are used.
  • Objective retention. A topic is retained if it meets (i) coherence ≥ τ and (ii) monthly activation prevalence ≥ p% of months. Highly redundant topics are filtered when the monthly activation series correlation with an already retained topic exceeds |ρ| ≥ r.
  • Deterministic labeling. Each retained topic receives a deterministic label built from its top c-TF-IDF words plus a concise domain tag (e.g., “AI Strategy and Ecosystem”), appended to the topic ID (e.g., NP01: MSFT Stock and Market Performance).
  • Transparency. We provide an appendix table with ID–label mapping and top-k keywords to enable independent auditing or relabeling.
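The retention rule in the protocol above can be sketched as a simple filter. The thresholds τ, p, and r are not given numerically in this section, so the values used here (0.5, 30% of months, 0.9) are placeholders, as are the topic IDs, coherence scores, and activation series; topics are assumed to arrive in priority order, so an earlier topic wins when two series are redundant.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def retain_topics(topics, tau, min_prevalence, max_abs_corr):
    """Apply coherence, prevalence, and redundancy filters; return kept IDs."""
    kept = []
    for topic in topics:
        series = topic["activation"]
        prevalence = sum(1 for v in series if v > 0) / len(series)
        if topic["coherence"] < tau or prevalence < min_prevalence:
            continue
        if any(abs(pearson(series, k["activation"])) >= max_abs_corr for k in kept):
            continue  # redundant with an already retained topic
        kept.append(topic)
    return [t["id"] for t in kept]


# Hypothetical topics with made-up coherence and monthly activations.
topics = [
    {"id": "T01", "coherence": 0.60, "activation": [1, 0, 1, 1, 0, 1]},
    {"id": "T02", "coherence": 0.30, "activation": [1, 1, 1, 0, 1, 1]},  # low coherence
    {"id": "T03", "coherence": 0.70, "activation": [1, 0, 1, 1, 0, 1]},  # duplicates T01
    {"id": "T04", "coherence": 0.55, "activation": [0, 1, 1, 0, 1, 1]},
    {"id": "T05", "coherence": 0.80, "activation": [0, 0, 0, 0, 0, 1]},  # rarely active
]
kept_ids = retain_topics(topics, tau=0.5, min_prevalence=0.3, max_abs_corr=0.9)
print(kept_ids)
```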
Equipped with a transparent topic-governance protocol, we present the discovered narrative families and their dynamics.

7. Financial Narrative Structures and Temporal Dynamics

7.1. Terminology and Narrative Taxonomy

To ensure uniform usage, we adopted canonical labels for firm-level narrative families. These labels are expository and map one-to-one to numeric topic IDs used in estimation (NPxx/NAxx). At the first mention, we introduce the full label and abbreviation; thereafter, we use the abbreviation.
MSFT Stock and Market Performance (SMP). Coverage centered on Microsoft’s share price, returns, index moves, and immediate market reaction to events. Excludes trading-signal commentary (see TTMS). Topic IDs: NP01, NA01.
Industry Trends and Cloud Market (ITC). Sectoral/cloud developments (Azure, competition with AWS/GCP), market share, adoption, pricing, partner ecosystems, and demand signals. Excludes upstream hardware supply (see TSE) and formal financial disclosures (see FRE/EFC). Topic IDs: NP02.
Security and Conflict (SCF). Cybersecurity incidents, outages, vulnerabilities, regulatory/sovereign threats, and geopolitical conflicts with operational or demand implications. Topic IDs: NP03.
AI Strategy and Ecosystem (AISE). Microsoft’s AI roadmaps, platform and product strategies (such as Copilot), partnerships, and competitive positioning within the AI stack and search landscape. If the primary angle is hardware supply, code TSE as primary and AISE as secondary, and vice versa. Topic IDs: NP04.
Earnings and Financial Communication (EFC). Management’s interpretive and forward-looking communication to investors outside the formal report package: earnings call Q&A, guidance framing, key performance indicator (KPI) narratives, and IR messaging. Numbers and statutory disclosures remain under the FRE. Topic IDs: NP05.
Media and Public Perception (MPP). Third-party reports and opinions that shape reputation/brand and broad stakeholder sentiment (TV/web/print). Not primarily about price recap (SMP) or trading signals (TTMS). Topic IDs: NP06, NA04.
Currency and Macro Environment (CME). Foreign exchange (FX) rates (USD strength/weakness), inflation, growth, and macro market sessions that influence translation effects or demand. Topic IDs: NP07, NA03.
Financial Reports and Earnings (FRE). Formal, scheduled financial disclosures and their direct coverage: quarterly/annual results, earnings per share (EPS)/revenue, guidance figures, and statutory reports/press releases. The interpretive spin during calls belongs to EFC. Topic IDs: NP08, NA05.
Tech and Semiconductor Ecosystem (TSE). Upstream hardware and compute supply chain (CPUs/GPUs/accelerators, foundry capacity, platforms from AMD/Intel/etc.) that condition Microsoft’s product delivery. If the main frame is AI product strategy, code AISE primary and TSE secondary. Topic IDs: NA02.
Technical Trading and Market Sentiment (TTMS). Trading-oriented takes and indicators (patterns, candlesticks, momentum/relative strength index (RSI), short-term “buy/sell” chatter) and near-term sentiment shifts framed as signals. Factual price recaps belong to SMP. Topic IDs: NA06.
Cloud Tools and AI Platforms (CTAIP). Cloud services and developer ecosystems that enable building, deploying, and scaling applications, databases, and runtimes (e.g., MongoDB Atlas), Azure services (Azure AI/OpenAI, AKS/Kubernetes), integrations, and partner solutions. Covers product releases, feature updates, partnerships, reference architectures, and developer adoption/use cases. Not primarily about semiconductor supply chain (TSE), formal earnings/filings (FRE), or market price recaps/trading signals (SMP/TTMS). Topic IDs: NA07.
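For auditing, the ID-to-label mapping above can be written as a lookup table. The sketch below transcribes the topic IDs listed in this section; the helper names are hypothetical, and reading the NP prefix as PMPNet and the NA prefix as AMiniLM is an inference from the retained-topic counts (eight and seven), not a stated convention.

```python
# Topic ID -> canonical abbreviation, transcribed from the taxonomy above.
TOPIC_LABELS = {
    "NP01": "SMP", "NP02": "ITC", "NP03": "SCF", "NP04": "AISE",
    "NP05": "EFC", "NP06": "MPP", "NP07": "CME", "NP08": "FRE",
    "NA01": "SMP", "NA02": "TSE", "NA03": "CME", "NA04": "MPP",
    "NA05": "FRE", "NA06": "TTMS", "NA07": "CTAIP",
}

def encoder_of(topic_id):
    """Assumed prefix convention: NP = PMPNet, NA = AMiniLM."""
    return "PMPNet" if topic_id.startswith("NP") else "AMiniLM"

def labels_for(encoder):
    """Set of narrative abbreviations available under one encoder."""
    return {lab for tid, lab in TOPIC_LABELS.items() if encoder_of(tid) == encoder}

def shared_labels():
    """Narrative families that appear under both encoders."""
    return labels_for("PMPNet") & labels_for("AMiniLM")

print(sorted(shared_labels()))  # ['CME', 'FRE', 'MPP', 'SMP']
```

Each ID maps to exactly one label, while a label may appear once per encoder, which is what makes cross-encoder comparison of the shared families possible.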
First, we document the structures obtained using the PMPNet embeddings.

7.2. Narratives Identified by PMPNet Embeddings

To extract clear stories from Microsoft financial news, we used the PMPNet sentence-embedding model along with BERTopic for unsupervised clustering. The initial clustering produced fourteen topics, eight of which were retained because they were interpretable, related to Microsoft’s financial and technological situation, and had favorable coherence scores. We excluded the remaining six because of thematic misalignment or an off-topic nature (see Appendix A Table A1).
The first retained topic, SMP, emerged as the most persistent and broadly relevant narrative in the study. It encompassed high-frequency terms such as “microsoft,” “nasdaq,” “msft,” “market,” and “earnings,” and served as a foundational narrative reflecting core investor discourse surrounding Microsoft’s stock behavior and performance within the broader technology sector. Although its coherence score was moderate (0.456), its document frequency and continuous presence throughout the observation period highlighted its structural centrality. In contrast, the narrative identified as ITC achieved a much higher coherence score (0.878) and grouped content related to global economic shifts and the growth of the cloud computing industry. This topic reflects Microsoft’s strategic alignment with macroeconomic digitalization trends and its competitive positioning in cloud services, with peak visibility in March 2023.
Another important narrative, SCF, centered on cybersecurity risks and geopolitical developments. This topic included references to cyber threats, international tensions (e.g., Ukraine and Russia), and operational risks, reflecting the intersection of digital security and global instability. Although the narrative was active across all months, its salience peaked in July 2024, followed by a sharp decay (1.299), highlighting the event-driven and episodic nature of the topic. In parallel, the narrative labeled AISE captured Microsoft’s ongoing involvement in artificial intelligence, including its collaborations with OpenAI and strategic positioning against competitors such as Google. With a coherence score of 0.594 and a moderate decay rate (0.157), this narrative peaked in November 2023 and demonstrated both structural and episodic features tied to AI-related product launches and announcements.
We also identified narratives related to investor communication, particularly EFC, which comprises terminology from quarterly calls, conference transcripts, and investor briefings. Although the coherence score was lower (0.228), its activity followed the pattern of earnings seasons and showed a faster decay rate (0.362), indicating short bursts of media attention at specific points in time. A related but more externally facing narrative, MPP, was active for 42 months, peaking in April 2022. It reflects Microsoft’s public image, which is shaped by media coverage, leadership statements, and high-visibility announcements. Although it appears less often in documents, its coherence (0.752) and decay rate (0.172) make it an important part of how the public views the company, particularly in response to events.
In the macroeconomic domain, CME narratives emerged in October 2021 and remained active until late 2024 (see Figure 2). With the highest coherence score for all retained topics (0.952), the discussions were related to forex fluctuations, global sessions, and market-wide financial dynamics. Although not specific to Microsoft, its inclusion reflects broader contextual forces that modulate stock valuation. The narrative peaked in July 2023 and had a relatively steep decay rate of 0.397, consistent with short-lived surges in attention during periods of monetary policy shifts or currency volatility. Finally, the FRE narrative exhibited a unique dynamic, beginning in February 2022 and peaking immediately, but with an estimated negative decay rate (−0.035), implying a gradual resurgence of interest over time. This counterintuitive trend likely reflects an increasing focus on detailed earnings breakdowns and secondary financial disclosures rather than headline results. We then contrast these with AMiniLM, highlighting the differences in persistence and collinearity relevant for monthly modeling.

7.3. Narratives Identified by AMiniLM Embeddings

This section presents the results of the narrative identification and temporal analysis based on the application of the AMiniLM embedding model to financial news related to Microsoft Corporation. We extracted seven thematically coherent and semantically robust narratives. These narratives were selected based on topic coherence scores and substantive relevance to Microsoft’s financial and technological discourse. Irrelevant or diffuse topics were excluded to enhance analytic precision (see Appendix A Table A2).
The central narrative, SMP, was the most persistent and foundational theme, capturing Microsoft’s ongoing relevance to the financial media landscape. This theme consistently linked stock performance to broader trends in technology and investor sentiment, achieving a coherence score of 0.446. Another high-quality narrative, TSE, demonstrated strong thematic clarity (coherence = 0.769) and captured the dynamics between Microsoft and hardware firms, such as AMD and Intel, particularly in the context of AI and processing infrastructure.
The CME narrative reflects the state of the global economy, especially trends in foreign exchange and inflation, which affect Microsoft’s stock price indirectly. This theme had a moderate coherence value of 0.668. MPP had the highest coherence (0.921), covering discussion of how Microsoft is perceived by the public and portrayed in news and broadcast media. Additional narratives include FRE (coherence = 0.649), which emphasizes quarterly disclosures; CTAIP (coherence = 0.680), which traces the evolution of Microsoft’s cloud-based ecosystems and developer tools; and TTMS (coherence = 0.916). Binary activation matrices were constructed for both the AMiniLM and PMPNet models to track the salience of narratives over time, which allowed us to compare model performance under different activation conditions.
Beyond thematic structure, each narrative exhibits distinct temporal dynamics, including emergence, persistence, and decay (see Figure 3). The SMP narrative was active across the entire observation window (February 2021–March 2025), with a decay rate of 0.001, indicating minimal decline and continuous relevance. The TSE narrative followed a similar trajectory with high durability (49 months) but showed greater volatility (decay = 0.150), peaking around major industry events.
The CME narrative remained active for 40 months, with a relatively low decay rate (0.085), underscoring the steady background role of macroeconomic conditions. In contrast, MPP peaked sharply in April 2022 before declining more rapidly (decay = 0.141), reflecting the event-driven nature of media cycles. The FRE narrative exhibited a flat frequency with no measurable decay, mirroring the regular cadence of quarterly reporting. TTMS manifested as a one-time spike observed only in March 2021, suggesting a reactive narrative tied to brief episodes of speculative activity.
Finally, CTAIP emerged later in the timeline, active for 27 months, with a faster decay rate (0.191) but growing relevance during the AI-driven narratives of the 2023–2024 period. Overall, these patterns show that financial narratives vary not only in what they say but also in how long they last and how quickly they fade, which are important features for including them in time-series models.
After characterizing narrative dynamics, we connect their continuous diffusion and event-spike aspects using decay and binary activation.

8. Integrating Narrative Intensity and Regime Dynamics: From Decay-Based Modeling to Binary Activation

Examining how investor interest in financial narratives evolves is useful, but it is also crucial to detect clear shifts in narrative importance, which may indicate changing market conditions, communication strategies, or unexpected events. To address this, we added a simple binary method to the decay-based system that tracks when specific events spark narratives, thereby revealing regime changes. This two-part approach, which combines exponential decay and binary activation, allows for a deeper understanding of both persistent and temporary narrative effects in financial time series.
As detailed in Section 7, decay patterns summarize slow-moving narrative drift; here, we pair them with binary spikes to flag concentrated episodes. The narratives include structural themes such as stock performance and cloud market expansion, as well as episodic topics such as cybersecurity breaches, earnings announcements, and AI initiatives. The modeled exponential decay captures the strength, decline, and temporal influence of investor attention.
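One way to obtain a decay rate of the kind reported in Section 7 is a log-linear fit to the monthly counts from the peak month onward. The sketch below is an assumption about the estimator, not a description of the one used here: it fits c_t ≈ c_peak · exp(−λ(t − t_peak)) by ordinary least squares on log counts and skips zero-count months. The function name and the toy series are hypothetical.

```python
import math

def decay_rate(counts):
    """Estimate lambda in c_t ~ c_peak * exp(-lambda * (t - t_peak)), peak month onward."""
    peak = max(range(len(counts)), key=lambda i: counts[i])  # first month with the maximum
    # Log counts from the peak onward; zero-count months are skipped (assumption).
    pts = [(i - peak, math.log(c)) for i, c in enumerate(counts) if i >= peak and c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive observations from the peak onward")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    return -slope  # positive when attention fades, negative when it rebuilds

# Hypothetical series: a run-up to the peak, then exact exponential fading at 0.2 per month.
fading = [5, 20] + [100 * math.exp(-0.2 * k) for k in range(6)]
# Hypothetical series: an immediate peak followed by a gradual resurgence.
rebuilding = [50, 10, 20, 40, 49]

print(round(decay_rate(fading), 3))  # 0.2
print(decay_rate(rebuilding) < 0)    # True
```

Under this reading, a negative estimate such as the one reported for FRE arises when coverage trends upward after an early peak.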
However, not all narratives followed a smooth diffusion process in the early stages. Some emerge as sharp spikes driven by media bursts, strategic disclosures, or crises. To capture this, we introduced a binary thresholding framework that flags high-intensity narrative periods based on the monthly share of the total narrative frequency. By applying 5% and 10% activation thresholds, we identified periods when a narrative was more prominent than expected, signaling increased investor attention and market responsiveness.
Typically, major announcements or external shocks align with the 10% threshold, highlighting months dominated by focused storytelling. In contrast, the 5% threshold captures smaller but meaningful changes in discussion intensity, supporting better medium-term tracking of narrative impact.
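The thresholding step can be sketched as follows. The share is taken here as the narrative’s document count divided by the month’s total document volume, which is one reading of the definition; the function name and the toy counts are hypothetical.

```python
def binary_activation(topic_counts, month_totals, threshold):
    """Return a 0/1 flag per month: 1 when the topic's share exceeds the threshold."""
    flags = []
    for count, total in zip(topic_counts, month_totals):
        share = count / total if total > 0 else 0.0  # months without documents stay inactive
        flags.append(1 if share > threshold else 0)
    return flags

# Hypothetical monthly counts for one narrative and for all documents.
counts = [2, 6, 12, 0]
totals = [100, 100, 100, 0]

active_5 = binary_activation(counts, totals, 0.05)
active_10 = binary_activation(counts, totals, 0.10)
print(active_5)   # [0, 1, 1, 0]
print(active_10)  # [0, 0, 1, 0]
```

Any month flagged at 10% is also flagged at 5%, so the two indicator sets are nested and should not be entered into the same regression.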
The resulting binary activation matrices were used with both the AMiniLM and PMPNet models, which allowed us to observe how each responded to different themes, conversation styles, and key events in various activation situations. We operationalize this binary layer below, reporting the activation periods under each encoder and threshold.

9. Narratives’ Binary Activation

9.1. Narratives’ Binary Activation by PMPNet

For the PMPNet model, a 10% threshold yielded a selective set of high-impact narrative activations (see Figure 4).
The MPP narrative was among the earliest to be activated, with notable concentrations in October 2021 and throughout 2022, consistent with elevated public discourse on Microsoft’s messaging and visibility. FRE narratives showed distinct activations corresponding to quarterly cycles (February 2022, February–May 2023, and May 2024), although only certain disclosures attracted above-threshold attention. The CME narrative was activated in April and July 2023, indicating temporally concentrated responses to macroeconomic shifts. A single, highly focused activation of the AISE narrative occurred in November 2023, plausibly reflecting a major announcement or collaboration with OpenAI. No activations were recorded at this level for the SMP, ITC, and EFC narratives, indicating that these narratives were consistently present but lacked sharp monthly concentrations.
At a 5% threshold, PMPNet revealed a richer and more textured narrative timeline. CME was activated in seven months between 2021 and mid-2024, including multiple policy-sensitive periods such as October 2021 and April 2024. SCF became visible at this level, with activations in February–March 2022 and July 2024, closely mapping to periods of geopolitical tension and cybersecurity concerns. The AISE narrative was activated in both February and November 2023, emphasizing its episodic yet impactful nature. EFC appeared in April and May 2024, reflecting the uneven investor attention paid to corporate communication events. However, foundational narratives, such as SMP and ITC, remained below the threshold across all months, reaffirming their nature as temporally diffuse, high-frequency background narratives.
Combining the PMPNet model with topic modeling identified a range of meaningful, financially relevant narratives about Microsoft. These narratives show very different patterns of emergence, salience, and decay, covering long-term topics such as stock performance and cloud infrastructure, as well as short-term events caused by security issues or media coverage. This heterogeneity in temporal behavior provides a critical empirical foundation for integrating narratives into econometric models. Specifically, treated as exogenous regressors, the narrative time series can be included in ARIMA-X or Generalized ARCH with exogenous variables (GARCH-X) models to assess how they relate to Microsoft’s stock returns and volatility. Determining which narratives persist and which fade quickly helps build a better model of how investor attention, sentiment, and media discussion affect market behavior. Applying the same thresholds to AMiniLM revealed a complementary activation landscape with distinct reputational and reporting cycles.

9.2. Narratives’ Binary Activation by AMiniLM

The results for the AMiniLM embedding model at the 10% threshold revealed sparse but distinct activation events (see Figure 5).
The TTMS narrative was the first to be activated, registering a strong spike in March 2021, which is likely associated with a brief period of market volatility or speculative trading in Microsoft. MPP demonstrated clustered activations between December 2021 and August 2022, aligned with extended waves of public messaging and media campaigns. The CME narrative became salient in April and July 2023, coinciding with macroeconomic developments affecting valuation and trading sentiment. Meanwhile, the CTAIP narrative emerged only in June 2023, reflecting the niche and temporally concentrated nature of developer-centric discourses. The FRE narrative crossed the 10% activation threshold intermittently during the key reporting periods of February 2022, February and May 2023, and May 2024, indicating that only selected earnings announcements attracted media attention.
Lowering the threshold to 5% produced a broader activation landscape for AMiniLM. The TTMS narrative extended its activity to April 2021 but disappeared thereafter, reinforcing its episodic nature. In contrast, FRE showed a more widespread activation pattern across earnings cycles from November 2021 to early 2025. MPP activations became more frequent, with several occurring in 2021 and 2022, suggesting sustained interest in Microsoft’s communication strategies. The CME narrative was activated in additional months (October 2022, January and April 2023, and January 2024), showing its sensitivity to changes in global policy. CTAIP gained salience throughout 2023 and 2024, suggesting a gradually intensifying narrative linked to technological infrastructure. The TSE narrative was activated only briefly in early 2024, supporting its identification as a narrowly focused, event-dependent theme. We synthesized these structural and activation differences using a direct cross-model comparison.

10. Discussion of Cross-Model Comparison: AMiniLM vs. PMPNet in Narrative Extraction

10.1. Empirical Comparison: Coverage, Persistence, and Collinearity

Both the AMiniLM and PMPNet embedding models showed strong agreement on Microsoft’s main financial narratives, focusing on key topics such as SMP, FRE, MPP, and overall economic conditions. SMP was the first topic in both models (NP01 and NA01), with nearly the same keywords and a tiny difference in decay rate (0.002 for PMPNet compared to 0.001 for AMiniLM), indicating that both models saw this theme as important and stable during the 50-month analysis period. Both models also identified similar FRE narratives that started in early to mid-2022 and received steady or even growing attention over time, with PMPNet showing a small increase (negative decay of −0.035), while AMiniLM remained steady (0.000).
With identical clustering settings, AMiniLM produces fewer redundant micro-topics and smoother monthly activations than PMPNet, yielding narratives that are less collinear and more persistent; this aligns with the observed improvement in ARIMA-X fit when using AMiniLM-based narratives.
In the area of media discussions, both models found similar stories focused on public communication and how the media presented information, reaching their highest point in April 2022 and showing similar declines afterward. Similarly, both methods found macroeconomic stories but reacted differently over time; AMiniLM showed a slower macro theme, whereas PMPNet focused on specific events, leading to a more rapid decline.
However, notable divergences emerged in the scope and granularity of the model-specific narratives. PMPNet identified important strategic themes that focused on global and geopolitical issues, such as a specific SCF story related to Microsoft’s risks from cyber threats and global politics and an AISE story that highlighted competition with OpenAI and changes in regulations. These narratives show clear event-driven structures, peaking in 2023–2024 and decaying rapidly thereafter. Additionally, PMPNet uniquely captures an EFC theme that aligns tightly with investor call cycles, providing invaluable guidance for earnings-season modeling.
In contrast, AMiniLM proved to be more attuned to narrower technical narratives. It successfully isolated content on TSE, developer tools, and speculative trading signals, the latter appearing only briefly but with high semantic specificity. These topics, which are missing or only slightly mentioned in the PMPNet results, show that AMiniLM is adept at selecting detailed and specific discussions, particularly in areas such as hardware, developer tools, and short-term trading culture.
In terms of narrative dynamics, both models showed similar longevity for stable narratives (e.g., SMP, ITC), whereas model-unique narratives exhibited more pronounced contrasts. The SCF of PMPNet peaked late (July 2024) and decayed swiftly, indicating its response to exogenous geopolitical shocks in the US. In contrast, the narrative for AMiniLM’s CTAIP showed a more moderate decay and a shorter duration, reflecting internal product-driven developments.
In summary, the comparison shows that both models effectively extract key stories, but PMPNet is better at finding big-picture, strategic, and global discussions, whereas AMiniLM is more skilled at pinpointing short, technical, and developer-focused topics. These strengths indicate that using a combination of both models can provide the best understanding when examining financial discussions for economic analysis or predictions.
Binary thresholding analysis shows that shifts in narratives are better identified by how concentrated coverage is than by simple frequency counts. The 10% threshold pinpoints clear, strong events, which are well suited to models that use binary interventions and analyze structural breaks. The 5% threshold provides a more inclusive signal set, enabling the modeling of recurring medium-intensity narrative surges. The comparison shows that AMiniLM responds better to early speculative and technical signals, whereas PMPNet captures more complex but less frequent narratives, such as those related to AI, security, and communication with investors. These differences should help in choosing appropriate embedding models for narrative analysis in economics, particularly when building time-series frameworks that target different narrative types and temporal resolutions.

10.2. Interpreting Encoder-Specific Findings: Investor Attention and Market Response

Our monthly objective is to capture the slow-moving narrative drift that shapes portfolio rebalancing. The divergence between AMiniLM and PMPNet can be understood through investor behavior and market response characteristics:
  • Salience and coherence thresholds. Narrative effects materialize when stories are coherent and consistently signaled to the audience. AMiniLM consolidates semantically close headlines into single, salient themes, whereas PMPNet’s finer granularity splits these into multiple micro-topics that individually fail to clear the attention threshold at the monthly horizon.
  • Horizon mismatch and netting out. Fine-grained narratives often matter over days around events (earnings, product launches, legal filings), but net out within a month because of overreactions, reversals, and liquidity provisions. Monthly ARIMA-X therefore favors persistent themes over micro-stories.
  • Category learning (information compression). Investors cognitively organize firm news into compact categories (e.g., AI product pipelines and antitrust/legal risks). The representation geometry of AMiniLM yields fewer, more stable clusters that approximate these categories, thereby increasing signal persistence. Over-segmented micro-topics disperse attention and weaken pricing power.
  • Narrative competition and collinearity. When semantically adjacent micro-topics co-occur in a month, they compete in regressions and introduce multicollinearity, reducing statistical power, even if the underlying economic effect is common to them.
  • Media-style heterogeneity vs. economic content. PMPNet preserves subtle phraseological differences across different outlets. At a monthly cadence, such stylistic variation can inflate fragmentation without adding distinct economic information, whereas AMiniLM downweights microvariation and emphasizes the economic core of the storyline.
Taken together, these mechanisms explain why AMiniLM-based narratives show greater explanatory power for monthly returns, whereas PMPNet may be preferable for event-window or daily/weekly analyses, where finer distinctions are more informative.

10.3. When Might PMPNet Be Preferable?

Our choice of encoder was driven by the monthly theme-level objective. PMPNet’s finer sensitivity to phrasing can be advantageous for event-window or daily/weekly studies, where the goal is to separate closely related storylines around earnings calls, product launches, or regulatory actions. In contrast, for long-horizon narrative drifts, the cluster compactness and temporal stability of AMiniLM were beneficial. Future studies should also consider ensemble encoders or simple post-processing for isotropy to combine these strengths. We now translate these narrative series into ARIMA-X estimates and assess their incremental contributions.

11. Empirical Modeling Results

11.1. Economic Interpretation of the Coefficients

We interpret all ARIMA-X coefficients on narrative regressors in percentage points (pp) of the monthly return. For binary indicators (0/1), β equals the pp change in return when the narrative is active (1) versus inactive (0) at a given lag. For decay activations (bounded, unitless), β represents the pp change per unit increase in narrative intensity (from lower to higher). Positive betas on growth/attention indices (e.g., MPP) reflect the expectations channel: salience and attention raise perceived cash-flow prospects, whereas negative betas on macro/constraint narratives (e.g., CME, TSE) reflect risk premia: heightened tail risk requiring higher discount rates. These mappings are consistent with narrative economics and attention literature (Shiller, 2017, 2019; Tetlock, 2007; Tuckett & Nikolic, 2017) and uncertainty–risk-premium mechanisms (Bloom, 2009; Pastor & Veronesi, 2013; Baker et al., 2016). With the interpretation conventions fixed, we first report decay-based specifications.
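As a worked illustration of this convention, suppose the dependent variable is the monthly log return in decimal units (an assumption; the scaling is not restated in this section). A coefficient β on a 0/1 indicator then corresponds to 100·β log points, and the implied change in the simple return is 100·(exp(β) − 1) percent. The helper name below is hypothetical; the two coefficients are the MPP estimates reported in this paper.

```python
import math

def log_points_to_simple_pct(beta):
    """Convert a log-return coefficient (decimal units) to a simple-return change in percent."""
    return 100.0 * (math.exp(beta) - 1.0)

# MPP coefficients reported in the paper; the decimal log-return scaling is assumed.
beta_mpp_binary = -0.0758  # binary indicator, first reported lag
beta_mpp_decay = 0.0128    # decay activation, per unit of narrative intensity

print(round(100 * beta_mpp_binary, 2))                        # -7.58 (log points)
print(round(log_points_to_simple_pct(beta_mpp_binary), 2))    # -7.3  (simple return, percent)
```

For coefficients of this size the two scales differ by a fraction of a percentage point, so the log-point reading is a close approximation.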

11.2. ARIMA-X Modeling of Microsoft Returns Using Decay-Based Narrative Time Series

This section compares two ARIMA-X modeling approaches to evaluate how financial narratives predict MSFT monthly log returns. We report the best AIC/BIC/HQIC ARIMA-X for each encoder and distinguish between significant and non-significant narrative effects. Where the narrative block is non-significant (e.g., PMPNet), we interpret the result as having no incremental predictive content at the monthly horizon and use it to compare the extraction quality across encoders. Controls follow a parsimonious, monthly design: we benchmark narratives against an ARIMA baseline with EPU; additional outcome-adjacent market variables (e.g., volume/volatility) are not included to avoid endogeneity and over-parameterization at T = 50.
Figure 6 compares in-sample fitted values with actual MSFT monthly log returns for ARIMA-X models that include decay-activated narrative indices. Top: PMPNet, ARIMA-X(1, 0, 1). Bottom: AMiniLM, ARIMA-X(3, 0, 3). EPU is included as a control in both models (not plotted).
For PMPNet embeddings, decay-weighted narrative variables were entered into the ARIMA-X models. A grid search examined ARIMA(p, d, q) configurations, where p is the number of autoregressive lags, d the difference order, and q the number of moving-average lags; in our monthly return application, stationarity tests support d = 0. With p and q ranging from 1 to 5, the search selected ARIMA(1, 0, 1), with AIC = −129.5, BIC = −106.74, HQIC = −120.83, and log-likelihood = 76.72.
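The three selection criteria are functions of the log-likelihood, the number of estimated parameters k, and the number of observations n. The sketch below uses the standard definitions; the values k = 12 (one AR term, one MA term, the innovation variance, eight narrative regressors, and EPU, with no constant) and n = 49 are assumptions that are not stated in the text. Under these assumptions the formulas give BIC ≈ −106.74 and HQIC ≈ −120.83, as reported, and AIC ≈ −129.44 against the reported −129.5, a gap consistent with rounding of the log-likelihood.

```python
import math

def information_criteria(llf, k, n_obs):
    """Standard AIC, BIC, and HQIC from a log-likelihood, parameter count, and sample size."""
    aic = 2 * k - 2 * llf
    bic = k * math.log(n_obs) - 2 * llf
    hqic = 2 * k * math.log(math.log(n_obs)) - 2 * llf
    return aic, bic, hqic

# Reported log-likelihood for the PMPNet decay model, ARIMA(1, 0, 1).
LLF_PMPNET = 76.72
# Assumed, not stated in the text: 1 AR + 1 MA + variance + 8 narratives + EPU.
K_PMPNET = 12
# Assumed effective sample size.
N_OBS = 49

aic, bic, hqic = information_criteria(LLF_PMPNET, K_PMPNET, N_OBS)
print(round(aic, 2), round(bic, 2), round(hqic, 2))  # -129.44 -106.74 -120.83
```

In a grid search over (p, q), each candidate is fitted, these criteria are computed from its log-likelihood, and the order with the lowest values is retained; BIC and HQIC penalize the extra AR and MA terms of higher-order models more heavily than AIC.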
The diagnostic tests showed no significant residual autocorrelation (Ljung–Box p = 0.54), no evidence against residual normality (Jarque–Bera p = 0.93), and no significant ARCH effects at the 5% level (ARCH p = 0.08). Despite this fit, no narrative coefficients were significant at the 5% level, including MPP (β = −0.0140, p = 0.349) and CME (β = −0.0268, p = 0.392). Higher-order models yielded similar insignificance, except for a marginally significant coefficient in ARIMA(3, 0, 3) (β = 0.0008, p = 0.046), which was too small to be economically meaningful.
These results suggest that while PMPNet captures broad narrative information, its diffuse thematic structure may limit its predictive power for short-term returns. In contrast, using the same method, the AMiniLM-based ARIMA-X model found that ARIMA(3, 0, 3) was the best choice (AIC = −123.53, BIC = −95.148, HQIC = −112.76, log-likelihood = 76.76). Diagnostics confirmed excellent residual behavior (Ljung–Box p = 0.79; Jarque–Bera p = 0.84; ARCH p = 0.37).
In this AMiniLM model, several narratives showed significant associations with MSFT returns. The MPP narrative was positively associated with returns (β = 0.0128, p = 0.002), whereas CME had a strong negative effect (β = −0.0143, p < 0.001). TSE also had a significant negative influence (β = −0.0606, p = 0.014). The other narratives and EPU remained insignificant. The AMiniLM model showed lower residual variance (σ² = 0.0021) than PMPNet (σ² = 0.0028), suggesting that AMiniLM’s better-defined topic groups improve narrative-based forecasting.
Overall, AMiniLM’s focused embeddings better capture narrative effects relevant to investor behavior, whereas PMPNet’s broader representations may require refinement for short-term market predictions. We then examined thresholded binary designs to isolate high-intensity episodes.

11.3. ARIMA-X Modeling Using Binary Narrative Activation: Threshold-Based Impact Assessment on MSFT Returns

This section examines the ARIMA-X models that include binary narrative activations based on the AMiniLM and PMPNet embeddings. We coded narrative presence as active if it exceeded 5% or 10% of the monthly document volume (see Figure 7).
For PMPNet at the 10% threshold, the optimal model was ARIMA(2, 0, 2) (AIC = −112.7, BIC = −84.042, HQIC = −101.8, log-likelihood = 71.36). Significant coefficients emerged for SCF (β ≈ −0.0967) and AISE (β ≈ 0.0845). However, high z-scores and low standard errors indicate potential overfitting or multicollinearity due to sparse activation. Media, macro, and earnings narratives were directionally consistent but statistically insignificant, and the EPU index remained irrelevant.
Lowering the threshold to 5% changed the best PMPNet model to ARIMA(3, 0, 3), with AIC = −112.5, BIC = −79.991, HQIC = −100.12, and log-likelihood = 73.25. Here, MPP narratives were the only significant predictors (β = −0.0517, p = 0.032), emphasizing their reputational influence. Other narratives, particularly those on AISE and SCF, lost significance. Although the 10% threshold provided sharper signals, it resulted in sparse activation and potential instability. The 5% threshold captured more sustained narrative effects but also introduced overlap and noise.
Binary activation models were also estimated with the AMiniLM embeddings. At the 10% threshold, the ARIMA(1, 0, 1) model fit well (AIC = −127.9, BIC = −105.18, HQIC = −119.27, log-likelihood = 75.94), and the MPP narrative had a significant negative association with returns (β = −0.0758, p = 0.043). FRE narratives showed borderline significance (β = 0.1076, p = 0.059), whereas macroeconomic, AISE, and TTMS narratives were insignificant. Lowering the threshold to 5% broadened narrative inclusion; the selected ARIMA(3, 0, 3) model (AIC = −114.4) revealed delayed impacts, particularly for MPP (β = −0.1048, p = 0.007), consistent with behavioral finance theories of narrative contagion.
These findings underscore the trade-off between narrative signal clarity and temporal coverage in binary-activation modeling. AMiniLM-based models provide clearer and more reliable results, showing that simple embedding structures are useful for understanding narrative econometrics. Although the broader semantic scope of PMPNet is insightful, it appears to be less effective for precise short-term forecasting without further calibration. We interpreted these associations in light of confounding risks, alternative mechanisms, and identification limits.

12. Discussion of Empirical Modeling Results

12.1. Enhancing Explanatory Power with Embedding-Based Narrative Variables in ARIMA-X Models

This comparison shows that including embedding-based narrative variables in ARIMA-X models significantly improves fit relative to the baseline model that lacks narratives or EPU (see Appendix A, Table A3). The incremental value is assessed via nested tests (ARIMA + EPU → ARIMA + EPU + Narratives), reporting ΔAIC/ΔBIC/ΔHQIC, LR p-values, and residual variance changes. We do not interpret coefficients from models in which the narrative block is jointly nonsignificant. All augmented models, whether they used AMiniLM or PMPNet embeddings and whether narratives entered with decay weighting or binary thresholds, fit significantly better than the baseline, as shown by likelihood-ratio (LLR) tests with p-values below 0.01. The AMiniLM decay-based model (ARIMA(3, 0, 3)) provided the best balance of fit and interpretability, identifying three significant narrative families (media, macro, and technology) and reaching the highest log-likelihood of 76.76.
Even PMPNet’s decay-based model, despite lacking statistically significant narratives, markedly improved over the baseline in terms of log likelihood (76.72 vs. 68.49, LLR p < 0.0001), emphasizing the relevance of embedding-derived narrative structures. Binary integration strategies supported this finding: AMiniLM maintained key media stories with reasonable delays at both the 10% and 5% thresholds, whereas PMPNet identified additional themes, such as the AISE narrative. Overall, the results indicate that embedding-based narrative features offer useful information on stock return patterns beyond standard time-series data.
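The nested comparison described above can be illustrated with a short calculation. The sketch below computes the likelihood-ratio statistic, 2 × (log-likelihood of the augmented model − log-likelihood of the baseline), and converts it to a p-value under a chi-square reference distribution. The degrees of freedom (the number of added regressors) are not stated in this section, so `df` is an explicit assumption here, and the resulting p-value is illustrative rather than a reproduction of the reported LLR p-values. The function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable.

    Uses the power series for the regularized lower incomplete gamma
    function, which is adequate for the moderate statistics seen here.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def lr_test(ll_restricted, ll_full, df):
    """Likelihood-ratio statistic and p-value for two nested models."""
    stat = 2.0 * (ll_full - ll_restricted)
    return stat, chi2_sf(stat, df)


# Log-likelihoods from Table A3 (baseline vs. AMiniLM decay-based model).
# df=3 is a hypothetical choice, not a value taken from the paper.
stat, p_value = lr_test(68.49, 76.76, df=3)
print(f"LR statistic = {stat:.2f}, p = {p_value:.4f} (assumed df = 3)")
```

The statistic depends only on the two log-likelihoods, whereas the p-value is sensitive to the assumed degrees of freedom, so the latter should be set to the actual number of restrictions being tested.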

12.2. Confounding Factors and Interpretation of the Results

The ARIMA with exogenous regressors (ARIMA-X) results were associational and may have reflected confounding factors. First, common shocks—macro releases, industry events, regulatory actions, or geopolitical developments—can simultaneously shift both narrative coverage and returns, producing correlations that are not structural effects of narratives. Second, reverse causality in news supply is possible; price moves can spur coverage and alter the topic mix, creating feedback between the returns and narratives. Third, measurement and sample bias may arise from provider coverage, English-only filtering, paywalls, or de-duplication choices that shape observed data. To limit these risks, we model returns with AR terms, use an ex-ante month-end cut-off (no look-ahead), aggregate to monthly frequency to reduce microstructure noise and timestamp misalignment, align controls to monthly frequency, and interpret only statistically significant coefficients while treating non-significant coefficients as nulls. Throughout, we describe the findings as predictive associations conditional on controls and not causal effects.

12.3. Alternative Explanations

Narrative–return associations may arise from information co-determination (stories co-move with fundamentals and macro exposures), media supply reactions to volatility or large price moves, liquidity/flows that track attention rather than beliefs, or time-varying risk appetites that jointly shift coverage tone and discount rates. These mechanisms are consistent with the patterns we observe, but do not imply a structural causal channel from narratives to returns; they motivate our conservative and incremental interpretations of the results.

12.4. Scope and Future Identification

Establishing causality would require exogenous variation in news supply or narratives (e.g., natural experiments, instrumented coverage, difference-in-differences around plausibly exogenous shocks, or structural vector autoregressive (VAR)/event study designs with credible exclusion restrictions). These designs are beyond the scope of our single-firm, methods-first study; therefore, we confine claims to incremental predictive content and flag causal identification as a priority for multi-firm extensions in future research.

12.5. Methodological Trade-Offs in Narrative Temporal Encoding

The results underscore a fundamental methodological distinction between the two narrative-modeling strategies. Decay-modeled narratives, represented as continuous variables that fade gradually over time, provide a detailed view of how narrative salience evolves, allowing us to observe both immediate and fading associations with returns. In contrast, binary presence models, built using fixed monthly thresholds (5% or 10% of the total narrative volume), are easier to interpret and offer straightforward information about specific events. However, these models do not trace the time profile of investor attention.
The choice between AMiniLM and PMPNet embeddings further affected the coherence and discriminability of the narrative clusters. AMiniLM produces sharper and more compact narratives that translate more reliably into econometric predictors. In contrast, although paraphrase embeddings capture richer semantic nuance, they produce broader, less distinct clusters that weaken their explanatory power in time-series models.
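The two temporal encodings can be contrasted with a minimal sketch. It assumes that the binary flag is set when a month's document count reaches a fixed share (5% or 10%) of the narrative's total document volume, and that the decay series carries past activation forward with a factor exp(−λ) per month. Both rules are illustrative readings of the description above, not the authors' exact implementation, and the function names and document counts are hypothetical.

```python
import math


def binary_activation(monthly_docs, threshold=0.10):
    """Flag months whose document count reaches threshold x total volume."""
    total = sum(monthly_docs)
    if total == 0:
        return [0] * len(monthly_docs)
    cutoff = threshold * total
    return [1 if n >= cutoff else 0 for n in monthly_docs]


def decay_activation(monthly_docs, lam):
    """Accumulate activation, discounting the past by exp(-lam) each month."""
    keep = math.exp(-lam)
    activation, level = [], 0.0
    for n in monthly_docs:
        level = n + keep * level
        activation.append(level)
    return activation


docs = [1, 20, 3, 0, 16]  # hypothetical monthly document counts
flags_10 = binary_activation(docs, threshold=0.10)
flags_05 = binary_activation(docs, threshold=0.05)
decayed = decay_activation(docs, lam=math.log(2))  # assumed one-month half-life

print(flags_10)  # sparse: only the two large spikes are flagged
print(flags_05)  # broader: the smaller third month is flagged as well
print([round(v, 3) for v in decayed])
```

In this toy series the 10% rule isolates the two spikes, the 5% rule also admits a weaker month, and the decay series stays positive in the quiet fourth month, which mirrors the clarity-versus-coverage trade-off discussed in the text.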

12.6. Statistical and Behavioral Implications

Decay-based ARIMA-X models using AMiniLM embeddings consistently yielded the best empirical performance. The ARIMA-X(3, 0, 3) model showed the best fit, and its estimates showed significant associations with MPP (positive) and CME (negative) narratives. These findings are consistent with behavioral finance theories in which stories shape expectations, especially during periods of reputational change or economic uncertainty.
In contrast, although decay-based models with paraphrase embedding performed well according to the AIC, they did not generate statistically significant narrative coefficients. This evidence suggests that narrative compactness, rather than linguistic richness, is essential for modeling market sentiment in econometric terms.
Binary presence models, particularly those using a 10% threshold, have proven effective for flagging high-intensity, event-related narrative episodes. For instance, the AMiniLM-based ARIMA-X(1, 0, 1) model identified a significant negative association with a one-month lag between public perception narratives and financial returns. The 5% threshold model extended this insight, revealing a delayed two-month lag for similar reputational narratives, particularly in paraphrase-based models. These results indicate that narrative associations can be contemporaneous or lagged and occur at different speeds, which aligns with theories about how people absorb stories and react to them later.
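The one- and two-month lags discussed above amount to shifting the monthly narrative indicator before it enters the regression. The following sketch shows that alignment step only; the indicator values and the `lag` helper are hypothetical, and the regression itself is omitted.

```python
def lag(series, k):
    """Shift a monthly series so position t holds the value from month t - k.

    The first k positions have no history and are filled with None.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(series)
    k = min(k, n)
    return [None] * k + list(series[: n - k])


mpp = [0, 1, 0, 0, 1, 0]  # hypothetical binary MPP activations by month
mpp_l1 = lag(mpp, 1)
mpp_l2 = lag(mpp, 2)

# Keep only months where both lagged regressors are available.
rows = [
    (t, mpp_l1[t], mpp_l2[t])
    for t in range(len(mpp))
    if mpp_l1[t] is not None and mpp_l2[t] is not None
]
print(rows)
```

Each lag shortens the usable sample by one month, which matters in a series of roughly 50 monthly observations.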
Our empirical scope is limited to a single firm. Therefore, we refrain from making broad claims regarding external validity. The contribution is primarily methodological: a transparent, replicable pipeline for firm-level narrative identification and integration in ARIMA-X. The approach is portable to other firms/industries because it relies on (i) firm-tagged news rather than hand-crafted, firm-specific dictionaries; (ii) unsupervised, embedding-based topic discovery; and (iii) activation rules (decay/binary) that are parameterized and not hand-tuned per ticker. Future work will apply the same specifications to multiple firms and sectors to evaluate their cross-sectional stability. These statistical and behavioral patterns inform both modeling choices and real-world use, which we distilled into actionable guidance.

13. Practical and Theoretical Insights

13.1. Mechanism and Modeling Implications: Attention, Persistence, and Design Choices

Our results are consistent with narratives acting as attention-driven frames that update beliefs and are incorporated through monthly decisions (fund flows, analyst revisions, and risk budget resets). This phenomenon explains why coherent, persistent themes (AMiniLM-based) show explanatory power, whereas finer micro-stories are often netted out within a month. Future work can connect the same narrative activations to behavioral proxies (e.g., abnormal retail search, analyst tone, options skew, and flow data) to map each stage of internalization more directly and accurately.
Narrative indices are best used as monitoring overlays rather than as stand-alone signals. For investors and risk desks, decay-based MPP and TSE indices can act as early warning indicators when elevated in high-uncertainty regimes (e.g., high EPU), informing risk budgets, hedge overlays, or scenario narratives. Binary spikes flag event-related episodes that require human review. For policymakers, a sustained elevation in negative narrative breadth can guide communication and surveillance priorities. We do not propose mechanical trading rules; the results indicate incremental in-sample associations over a monthly horizon.
Several conclusions emerge from this comparative analysis.
  • Decay-modeled narratives, particularly those built on AMiniLM embeddings, provided the strongest and most interpretable signals. They are best suited for identifying narrative momentum and persistence, especially in innovation or risk-sensitive discourse.
  • Binary presence models complement decay-based methods by identifying sharp, high-intensity narrative spikes. Setting the correct thresholds is important: the 10% rule ensures that the signals are clear but might miss the effects that occur later or are spread out, whereas the 5% rule covers a longer time but can cause more background noise and confusion in the data.
  • Narrative type matters. MPP narratives consistently emerged as statistically significant predictors, regardless of the threshold or embedding model. These findings validate the narrative economics claim that reputation, trust, and communication shape investor sentiment in ways distinct from traditional fundamentals or macroeconomic indicators.
  • The embedding architecture is consequential. AMiniLM provides better topic organization, is easier to interpret, and performs better than paraphrase models in creating useful econometric features.
  • Hybrid strategies that combine decay-based weighting with binary filtering or rolling-window methods may improve results by capturing both rapid shifts and long-term sentiment.
  • The results support semantically informed approaches to narrative econometrics, in which model design, embedding selection, and temporal encoding are aligned with the way financial markets operate.
Translating these design lessons into practice, we use our monthly narrative activations (binary/decay; labels tied to stable IDs) to outline stakeholder-specific implementations.

13.2. Operational Guidance for Industry and Policy

13.2.1. Investors (Narrative Indices)

Build Firm Narrative Momentum (FNM) from “growth” families (e.g., AI-Product) and Negative Narrative Pressure (NNP) from “constraint” families (Antitrust/Legal, Security/Privacy); optionally track Balance = FNM − NNP. Triggers: onset at the rolling 12-month 80th percentile for two months; escalation at the 90th percentile. Actions: +50–100 basis points tilt when FNM onsets and NNP < median; small put-spread overlay (1–2% notional) when NNP escalates; freeze adds when negative-narrative breadth ≥ 3.
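The trigger logic can be sketched as follows. The example assumes that percentiles are computed over the trailing 12 months excluding the current month, that "onset" requires two consecutive months at or above the 80th percentile, and that "escalation" is flagged whenever the current value reaches the 90th percentile. These details, the interpolation rule, the function names, and the index values are assumptions for illustration; the portfolio actions listed above are deliberately not encoded.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def narrative_triggers(index, window=12, onset_q=80, escalation_q=90, persist=2):
    """Label each month 'none', 'onset', or 'escalation' for one index."""
    labels, run = [], 0
    for t, value in enumerate(index):
        if t < window:
            labels.append("none")  # not enough history for a rolling percentile
            continue
        history = index[t - window:t]
        # Count consecutive months at or above the onset percentile.
        run = run + 1 if value >= percentile(history, onset_q) else 0
        if value >= percentile(history, escalation_q):
            labels.append("escalation")
        elif run >= persist:
            labels.append("onset")
        else:
            labels.append("none")
    return labels


# Hypothetical NNP index: 12 months of history, then 4 months to classify.
nnp = list(range(1, 13)) + [10, 10.5, 13, 2]
labels = narrative_triggers(nnp)
print(labels[12:])
```

Because the percentile window rolls forward, a sustained elevation raises its own threshold over time, so the rule reacts to changes relative to recent history rather than to absolute levels.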

13.2.2. Financial Institutions (Risk Models)

Include FNM/NNP as exogenous variables in volatility/Value-at-Risk engines (e.g., GARCH-X) and in Probability of Default (PD)/rating-migration components for Expected Credit Loss/IFRS 9/Current Expected Credit Loss (CECL); apply percentile shocks to FNM/NNP in stress scenarios; and link narrative thresholds to risk-appetite statements.

13.2.3. Policymakers (Monitoring)

Publish a Narrative Stability Monitor: sector-level NNP (value-weighted), breadth of sectors above thresholds, and simple context-shift flags. Early warnings: yellow if ≥3 sectors are ≥80th percentile for 2 months; red if ≥5 sectors are ≥90th percentile. Uses: communication timing and scenario design, with the methodology documented for transparency. We conclude by summarizing the behavioral channels, design implications, and portability of the pipeline.
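The early-warning rule can be expressed as a small breadth check across sectors. In this sketch each sector supplies the percentile rank of its NNP for the two most recent months. The yellow rule follows the text; evaluating the red rule on the latest month only, the "green" default label, and the sector names and ranks are assumptions for illustration.

```python
def stability_flag(sector_ranks):
    """Return 'red', 'yellow', or 'green' from sector-level NNP percentile ranks.

    sector_ranks maps a sector name to its ranks (0-100), oldest first.
    """
    sustained_80 = sum(
        1 for ranks in sector_ranks.values()
        if len(ranks) >= 2 and min(ranks[-2:]) >= 80
    )
    at_90 = sum(1 for ranks in sector_ranks.values() if ranks and ranks[-1] >= 90)
    if at_90 >= 5:
        return "red"
    if sustained_80 >= 3:
        return "yellow"
    return "green"


# Hypothetical percentile ranks for the previous and latest month.
ranks = {
    "tech": [85, 92],
    "energy": [81, 88],
    "banks": [90, 95],
    "health": [40, 82],
    "retail": [10, 20],
}
flag = stability_flag(ranks)
print(flag)
```

Here three sectors have been at or above the 80th percentile for two months, while only two are at or above the 90th, so the monitor reports yellow rather than red.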

14. Conclusions

This study advances narrative economics by showing that firm-level narrative signals constructed from news texts via semantic embeddings and integrated into a conventional ARIMA-X framework enter asset pricing through two behavioral channels. First, the expectations channel: persistent, decay-weighted narratives are associated with shifts in beliefs about future cash flows, which is consistent with diffusion-based conviction formation under conditions of radical uncertainty. Second, a risk-premia channel: negative, macro-oriented, and technology-ecosystem narratives co-move with returns in ways consistent with time-varying left-tail risk and precautionary discounting. These results position narratives as complements—not substitutes—to standard predictors; they proxy attention and salience that shape belief updating and perceived risk when uncertainty is elevated.
Methodologically, model comparisons underscore that design choices in text representation and temporal encoding are meaningful. Decay-modeled narratives built with AMiniLM embeddings deliver the strongest month-horizon associations and predictions, suggesting that investors respond more to compact, coarse-grained thematic bundles than to the fine-grained paraphrase distinctions emphasized by PMPNet. This pattern aligns with the limited attention and categorical processing documented in behavioral finance. The decay versus binary contrast maps naturally onto the diffusion of stories (persistence) versus high-intensity event spikes, implying distinct temporal footprints of narrative salience and different use cases: decay for background belief formation and binary activation for detecting short-lived event reactions when the thresholds are calibrated.
Across specifications and thresholds, public messaging and media perception narratives are among the most stable and replicable findings, consistently predicting future MSFT returns with negative coefficients, particularly at longer time lags. This uncovers a reputational risk channel that traditional models often undervalue. More broadly, the evidence demonstrates that narratives are measurable signals that systematically shape investor behavior and returns and that the choice of embedding model is not merely technical but consequential for economic inference.
We emphasize that the ARIMA-X estimates are associational rather than causal; concurrent macro/industry shocks and news supply dynamics may confound their interpretation. However, the reusable and transparent pipeline introduced here provides a micro-to-macro bridge for narrative economics and yields testable implications: narrative–return associations should strengthen when macro uncertainty is high, compact reputation-focused themes should matter more for large visible firms, and the breadth of negative narratives should correlate with risk premium proxies. Practically, the results motivate investor dashboards that track decay-based narrative indices, risk models that incorporate reputation story flows, and policy monitoring tools that flag narrative-driven shifts in perceived risk. Future research can extend these insights by combining behavioral theory with richer text data, ongoing decay functions, and calibrated cutoff rules to study how narratives shape price formation, volatility regimes, and trading behavior across multiple time horizons. Finally, we temper these conclusions with scope conditions and avenues for refinement in future research.

15. Limitations

Despite the study’s robust and substantively meaningful results, we acknowledge its limitations.
  • Activation Design (Binary Baseline): We deliberately use binary narrative activations to target the structural presence of slow-moving negative themes at a monthly horizon and to avoid overfitting in a single-firm sample. The pipeline is modular: we outline intensity-aware extensions (valence-weighted and arousal indices) and context metrics (dispersion and month-to-month shifts) that can be estimated from the same topic assignment. These measures complement, rather than replace, the binary baseline by isolating amplitude and reframing dynamics. Evaluating their incremental value relative to the binary specification is reserved for future work.
  • Model Form and Identification Limits: The estimates are reduced-form and linear-additive; they assume the weak exogeneity of the narrative series and stationarity of the residuals. With T = 50 monthly observations, the order selection and coefficients are subject to small-sample uncertainty and model selection risk. Narrative activations are measured with error, can be collinear with macro controls, and are sensitive (though not fragile) to activation rules (binary thresholds/decay half-life). Monthly aggregation improves alignment with controls but can smoothen short-lived relations. Therefore, we applied Holm adjustments, treated non-significant results as null, and avoided structural claims.
  • Temporal Granularity: The analysis was conducted on monthly data, which may not fully capture the high-frequency dynamics through which narrative shocks and price adjustments unfold over days or weeks. While future studies could benefit from employing weekly or daily data to better model fast-moving market responses, it is also important to acknowledge that narratives typically do not change abruptly. Unlike price movements, which may react quickly to information, they tend to evolve gradually, reflecting sustained shifts in discourse, sentiment, and thematic focus. Therefore, a monthly resolution remains appropriate for capturing the structural and behavioral trajectories of narrative influence over time.
  • Embedding Model Constraints: Although AMiniLM and PMPNet provide strong semantic foundations, no embedding model is perfect. Limitations in capturing domain-specific finance language or emerging slang can affect the granularity of topic clustering.
  • Narrative Decay Assumptions: The exponential decay model assumes that the influence of a narrative fades gradually. However, this assumption may be too simple for real-world investor reactions, in which sentiment can shift abruptly or intensify.
  • Binary Thresholding Sensitivity: The choice of 5% or 10% thresholds for binary narrative presence is arbitrary and could influence the results. Different threshold calibrations or dynamic thresholding strategies may yield further refinements.
  • Single-Stock Focus: Focusing only on Microsoft (MSFT) provides detailed insights, but more evidence is needed to determine whether these insights apply to other companies, industries, or economic situations.
  • Narrative Interpretation Subjectivity: Despite the methodological rigor in topic modeling, the manual interpretation and labeling of narratives still involve an element of researcher subjectivity that may introduce bias.
These limitations do not detract from the study’s core contributions, but highlight the need for methodological refinement and future research.
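The Holm adjustment mentioned under model form and identification limits is a step-down correction for multiple comparisons. A minimal sketch of the standard procedure is given below; the p-values are hypothetical and are not taken from the estimated models, and the function name is illustrative.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # hypothetical raw p-values for three narratives
adj = holm_adjust(raw)
print([round(p, 4) for p in adj])
```

With three tests, the smallest raw p-value is tripled and the second is doubled, so a coefficient that is significant at the 5% level before adjustment may no longer be significant afterwards.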

Author Contributions

Conceptualization, D.H. and R.G.; Methodology, D.H. and R.G.; Software, R.G.; Validation, D.H. and R.G.; Formal analysis, D.H.; Investigation, R.G.; Resources, D.H.; Data curation, R.G.; Writing—original draft preparation, D.H.; Writing—review and editing, D.H. and R.G.; Visualization, D.H.; Supervision, R.G.; Project administration, R.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data and pipeline codes for the inputs, narrative extraction, and modeling can be accessed at https://figshare.com/ (accessed on 12 August 2025) using the identifier https://doi.org/10.6084/m9.figshare.29410043.v1.

Conflicts of Interest

The authors declare no competing financial interests or personal relationships that could influence the work reported in this study.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
AIC: Akaike Information Criterion
ADF: Augmented Dickey–Fuller
AISE: AI Strategy & Ecosystem (narrative family)
AMiniLM: all-MiniLM-L6-v2 (sentence-embedding model)
API: Application Programming Interface
ARCH: Autoregressive Conditional Heteroskedasticity
ARIMA: Autoregressive Integrated Moving Average
ARIMA-X: Autoregressive Integrated Moving Average with Exogenous Variables
BERT: Bidirectional Encoder Representations from Transformers
BIC: Bayesian Information Criterion
CECL: Current Expected Credit Loss
CME: Currency & Macro Environment (narrative family)
c-TF-IDF: Class-based Term Frequency–Inverse Document Frequency
c_v: Topic coherence metric (BERTopic)
ECL: Expected Credit Loss
EFC: Earnings & Financial Communication (narrative family)
EPS: Earnings Per Share
EPU: Economic Policy Uncertainty
FX: Foreign Exchange
FNM: (Growth/)Momentum narrative index
FRE: Financial Reports & Earnings (narrative family)
GARCH-X: Generalized Autoregressive Conditional Heteroskedasticity with Exogenous Variables
GDELT: Global Database of Events, Language, and Tone
HDBSCAN: Hierarchical Density-Based Spatial Clustering of Applications with Noise
HQIC: Hannan–Quinn Information Criterion
IFRS 9: International Financial Reporting Standard 9 (Financial Instruments)
IR: Investor Relations
ITC: Industry Trends & Cloud Market (narrative family)
KPI: Key Performance Indicator
LDA: Latent Dirichlet Allocation
LLM: Large Language Model
LLR: Likelihood Ratio
MPP: Media & Public Perception (narrative family)
MQMiniLM: multi-qa-MiniLM-L6-cos-v1 (sentence-embedding model)
MSFT: Microsoft Corporation stock ticker symbol
NLP: Natural Language Processing
NNP: Negative Narrative Pressure (index)
PD: Probability of Default
PMiniLM: paraphrase-MiniLM-L12-v2 (sentence-embedding model)
PMPNet: paraphrase-MPNet-base-v2 (sentence-embedding model)
pp: percentage points
RSI: Relative Strength Index
SBERT: Sentence-BERT
SCF: Security & Conflict (narrative family)
SMP: MSFT Stock & Market Performance (narrative family)
TSE: Tech & Semiconductor Ecosystem (narrative family)
TTMS: Technical Trading & Market Sentiment (narrative family)
UMAP: Uniform Manifold Approximation and Projection
VAR: Vector Autoregression
VaR: Value at Risk

Appendix A

Table A1. Narrative structures and temporal dynamics via PMPNet embedding.
Top Keywords (Abridged) | Topic ID | Proposed Label | Topic Coherence (c_v, Unitless ∈ [0, 1]) | Start Month | End Month | Duration (Months) | Peak Monthly Docs (n) | Peak Month | Decay Parameter λ (per Month)
microsoft, nasdaq, msft, market, earnings, ai, tech | NP01 | MSFT Stock & Market Performance (SMP) | 0.456 | February 2021 | March 2025 | 50 | 1228 | March 2021 | 0.002
market, global, cloud, research | NP02 | Industry Trends & Cloud Market (ITC) | 0.878 | February 2021 | March 2025 | 50 | 137 | March 2023 | 0.059
crowdstrike, cybersecurity, ukraine, russia | NP03 | Security & Conflict (SCF) | 0.422 | February 2021 | March 2025 | 50 | 92 | July 2024 | 1.299
openai, google, ai, search, microsoft | NP04 | AI Strategy & Ecosystem (AISE) | 0.594 | February 2021 | March 2025 | 50 | 77 | November 2023 | 0.157
earnings call, investors, transcript | NP05 | Earnings & Financial Communication (EFC) | 0.228 | February 2021 | March 2025 | 50 | 45 | April 2024 | 0.362
announces, street, tv, globe, media | NP06 | Media & Public Perception (MPP) | 0.752 | May 2021 | October 2024 | 42 | 7 | April 2022 | 0.172
usd, forex, session, currency trends | NP07 | Currency & Macro Environment (CME) | 0.952 | October 2021 | July 2024 | 34 | 6 | July 2023 | 0.397
quarterly, revenue, results, reports | NP08 | Financial Reports & Earnings (FRE) | 0.649 | February 2022 | May 2024 | 28 | 2 | February 2022 | −0.035
Table A2. Narrative structures and temporal dynamics via AMiniLM-base embedding.
Top Keywords (Abridged) | Topic ID | Proposed Label | Topic Coherence (c_v, Unitless ∈ [0, 1]) | Start Month | End Month | Duration (Months) | Peak Monthly Docs (n) | Peak Month | Decay Parameter λ (per Month)
microsoft, nasdaq, stocks, msft, tech, ai | NA01 | MSFT Stock & Market Performance (SMP) | 0.446 | February 2021 | March 2025 | 50 | 1244 | March 2021 | 0.001
amd, processors, intel, ai | NA02 | Tech & Semiconductor Ecosystem (TSE) | 0.769 | March 2021 | March 2025 | 49 | 20 | January 2024 | 0.150
usd, forex, fx, north, trading | NA03 | Currency & Macro Environment (CME) | 0.668 | October 2021 | January 2025 | 40 | 9 | April 2023 | 0.085
tv, newsmax, interviews, globe | NA04 | Media & Public Perception (MPP) | 0.921 | October 2021 | December 2024 | 39 | 6 | April 2022 | 0.141
quarterly, reports, earnings, results | NA05 | Financial Reports & Earnings (FRE) | 0.649 | November 2021 | March 2025 | 41 | 2 | February 2022 | 0.000
pattern, hammer, selling, buy, struggling | NA06 | Technical Trading & Market Sentiment (TTMS) | 0.916 | March 2021 | March 2021 | 1 | 11 | March 2021 | -
mongodb, azure, ai, developer, applications | NA07 | Cloud Tools & AI Platforms (CTAIP) | 0.680 | October 2021 | December 2024 | 27 | 3 | June 2023 | 0.191
Table A3. Comparative performance of ARIMA-X models with embedding-based narrative variables vs. baseline model.
Embedding | Narrative Integration | Best ARIMA (p, 0, q) | Log Likelihood (Unitless) | AIC (Unitless) | BIC (Unitless) | HQIC (Unitless) | Significant Narratives (Lag, Holm-adj. p) | LLR p-Value vs. Baseline
AMiniLM | Decay based | (3, 0, 3) | 76.76 | −123.5 | −95.148 | −112.76 | Media, Macro, Tech | 0.0054
PMPNet | Decay based | (1, 0, 1) | 76.72 | −129.5 | −106.74 | −120.83 | No significant | <0.0001
AMiniLM | Binary (10%) | (1, 0, 1) | 75.94 | −127.9 | −105.18 | −119.27 | Media (1 lag, p = 0.043) | 0.0001
AMiniLM | Binary (5%) | (3, 0, 3) | 73.19 | −114.4 | −84.446 | −103.07 | Media (2 lag, p = 0.007) | 0.0076
PMPNet | Binary (10%) | (2, 0, 2) | 71.36 | −112.7 | −84.042 | −101.8 | Security, AI | 0.0015
PMPNet | Binary (5%) | (3, 0, 3) | 73.25 | −112.5 | −79.991 | −100.12 | Media (2 lag, p = 0.032) | 0.0011
Baseline | No Narratives and EPU | (1, 0, 1) | 68.49 | −131 | −125.23 | −128.79 | |
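The information criteria in Table A3 follow from the log-likelihood, the number of estimated parameters k, and the sample size n through the standard definitions AIC = 2k − 2ℓ, BIC = k ln n − 2ℓ, and HQIC = 2k ln(ln n) − 2ℓ. The sketch below evaluates these formulas for the AMiniLM decay-based row. The values k = 15 and n = 49 are not reported in the paper; they are assumptions back-solved here because they give values close to the tabulated ones.

```python
import math


def information_criteria(log_lik, k, n):
    """Return (AIC, BIC, HQIC) for a model with k parameters and n observations."""
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n) - 2 * log_lik
    hqic = 2 * k * math.log(math.log(n)) - 2 * log_lik
    return aic, bic, hqic


# Log-likelihood from Table A3; k and n are assumed, not reported values.
aic, bic, hqic = information_criteria(76.76, k=15, n=49)
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}, HQIC = {hqic:.2f}")
```

Because each criterion penalizes additional parameters, a model with a higher log-likelihood can still have a less favorable AIC or BIC than a smaller model, which is why the table reports the likelihood-ratio test alongside the criteria.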

Figure 1. Conceptual framework. Narrative activations (binary or decay-weighted) are aggregated into indices that proxy attention/salience and affect monthly returns through two channels: cash flow expectations (growth) and time-varying risk premia. Economic Policy Uncertainty (EPU) is used as a control and moderator. The return equation was estimated as an autoregressive integrated moving average with exogenous regressors (ARIMA-X), with narrative indices treated as supplementary covariates to standard predictors.
Figure 2. Temporal dynamics of narrative intensities identified with PMPNet embeddings over active periods (2021–2025).
Figure 3. Temporal dynamics of narrative intensities identified with AMiniLM embeddings over active periods (2021–2025).
Figure 4. Binary activation periods of financial narratives based on PMPNet embeddings.
Figure 5. Binary activation periods of financial narratives based on AMiniLM embeddings.
Figure 6. ARIMA-X model fits with narratives and Economic Policy Uncertainty (EPU) intervention (PMPNet and AMiniLM, decay-weighted narrative activation).
Figure 7. ARIMA-X model fits with narratives and Economic Policy Uncertainty (EPU) intervention (PMPNet and AMiniLM, binary narrative presence, 10% and 5%).
Table 1. Empirical comparison results of different embedding methods.
Model       Silhouette Score    Topic Coherence (c_v)
PMPNet      −0.019              0.580
AMiniLM     0.016               0.537
PMiniLM     0.047               0.495
MQMiniLM    0.002               0.486
