Article

Strategic Complexity and Behavioral Distortion: Retail Investing Under Large Language Model Augmentation

Faculty of Business and Economics, RISEBA University of Applied Sciences, Meza iela 3, LV-1048 Riga, Latvia
*
Author to whom correspondence should be addressed.
Int. J. Financial Stud. 2025, 13(4), 210; https://doi.org/10.3390/ijfs13040210
Submission received: 1 September 2025 / Revised: 13 October 2025 / Accepted: 28 October 2025 / Published: 6 November 2025
(This article belongs to the Special Issue Advances in Behavioural Finance and Economics 2nd Edition)

Abstract

This conceptual article introduces Perceived Cognitive Assistance (PCA)—a novel psychological construct capturing how interactive support from Large Language Models (LLMs) alters investors’ perception of their cognitive capacity to execute complex trading strategies. PCA formalizes a behavioral shift: LLM-empowered retail investors may transition from intuitive heuristics to institutional-grade strategies—sometimes without adequate comprehension. This empowerment–distortion duality forms the core of the theoretical contribution. To validate this model empirically, the article outlines a five-step research agenda comprising psychological diagnostics, trading behavior analysis, market efficiency tests, and a Behavioral Shift Index (BSI). One agenda component—a dual-agent simulation framework—enables causal benchmarking in post-LLM environments. The simulation comprises two components: (1) the Virtual Trader, a cognitively degraded benchmark approximating bounded human reasoning, and (2) the Digital Persona, a psychologically emulated agent grounded in behaviorally plausible logic. Together, these components offer a method for isolating the cognitive uplift attributable to LLM assistance and for evaluating its behavioral implications under controlled conditions. The article contributes by specifying a testable link from established decision frameworks (Theory of Planned Behavior, Technology Acceptance Model, and Risk-as-Feelings) to two estimators: a moderated regression for individual decisions (Equation (1)) and a composite Behavioral Shift Index derived from trading logs (Equation (2)). We state directional, falsifiable predictions for the regression coefficients and for index dynamics, and we outline an identification and robustness plan—versioned, time-locked, and auditable—to be executed in the subsequent empirical phase. The result is a clear operational pathway from theory to measurement and testing, prior to empirical implementation.
No empirical results are reported here; the contribution is the operational, falsifiable architecture and its implementation plan, to be executed in a separate preregistered study.

1. Introduction

Over the past decade, institutional investors have increasingly relied on algorithmic and AI-enhanced tools to improve decision quality, execution speed, and portfolio optimization (Harris, 2024; Elly et al., 2025). In contrast, retail investors (RIs) have historically lacked access to such cognitive infrastructure. The recent diffusion of Large Language Models (LLMs)—including ChatGPT, Claude, and BloombergGPT—onto retail-facing platforms marks a potential inflection point in the cognitive capacity of non-professional market participants (Dong et al., 2024; Lopez-Lira & Tang, 2024).
Unlike earlier retail-oriented fintech tools such as robo-advisors or stock screeners, LLMs offer real-time, dialogic, and context-sensitive support. These systems can simulate strategies, explain complex derivatives, interpret macroeconomic signals, and respond to natural-language queries (Nie et al., 2024; J. Yang et al., 2025). This shift enables retail users to access and apply concepts—like volatility arbitrage, delta-neutral positioning, or GARCH-based risk modeling—that were previously limited to institutional domains (Kirtac & Germano, 2024; Valeyre & Aboura, 2024).
While this accessibility represents a major step toward democratizing financial intelligence, it also raises underexplored behavioral risks. LLMs blur the boundary between information access and interpretive authority (Z. Chen et al., 2025; Tatsat & Shater, 2025). They do not merely present data but shape how it is framed and understood, often producing compelling narratives that encourage confidence in users’ understanding and strategy formulation. Empirical studies already point to an “illusion of understanding” phenomenon, where perceived competence rises without corresponding gains in actual decision accuracy (Bahaj et al., 2025; F. Sun et al., 2025).
This inflation of perceived cognitive control may distort investor behavior, particularly under uncertainty (Jia et al., 2024). LLM-generated narratives can simplify complex tradeoffs, reduce the perceived risk of advanced strategies, and increase user trust—even when system outputs are heuristically generated or overfit (Boussioux, 2024; Jia et al., 2024). Such conditions may accelerate transitions from intuitive strategies, such as price momentum, to structurally complex volatility-based trades, such as straddles or iron condors, without adequate comprehension of nonlinear risks (Kirtac & Germano, 2024; Henning et al., 2025). In this study, we further propose that PCA acts not merely as an internal cognitive mechanism but as a conditional moderator. Specifically, PCA shapes the degree to which the availability of LLM cognitive support translates into actual adoption and effective execution of complex investment strategies.
Importantly, these psychological effects—perceived control inflation (overconfidence), intention–behavior divergence as described by risk-as-feelings, and coordination effects induced by LLM-supported heuristics—can scale across retail cohorts as LLM assistants diffuse. Retail investors already account for a significant share of trading volume in instruments such as 0DTE (zero-days-to-expiration) options (Bandi et al., 2023; Xu, 2025). If a large number of users adopt similar prompts or follow comparable trade logic generated by LLMs, the result may be algorithmic coherence—a form of emergent behavioral alignment where strategy convergence distorts price discovery, amplifies volatility, and contributes to reflexive feedback loops (Y. Peng, 2024; Y. Yang et al., 2025). This phenomenon has been documented in the context of risk-premia crowding and critical transitions in complex financial systems, and has been linked more recently to LLM-mediated coordination effects (Y. Peng, 2024; Y. Yang et al., 2025). In such environments, even independently rational decisions may aggregate into destabilizing collective patterns, particularly when LLM-generated narratives are similar, fluently persuasive, and widely adopted (L. Chen et al., 2025; Y. Yang et al., 2025).
Yet academic attention has focused disproportionately on LLM productivity and output fidelity, while largely neglecting retail behavioral shifts. A retrieval-augmented literature review of over 120 papers published since 2023, performed by the authors in preparation for this research, shows strong emphasis on AI-enhanced forecasting, coding, and analysis—but a near absence of research into the psychological and strategic consequences for retail investors.
Relative to adjacent studies, our scope is investor-level and cognitive rather than macro-market or capability benchmarking. Forecasting and sentiment extraction studies primarily evaluate the predictive performance of LLMs; capability papers catalog model architectures and task scores; and macro co-movement work studies cross-asset linkages under stress or inflationary regimes, such as analyses of safe-haven properties and interactions between gold, technology equities, and cryptocurrencies during inflationary periods (Dimitriadis et al., 2025). Complementary evidence on cross-market connectedness shows time-varying spillovers: Kayani et al. (2024) employ quantile connectedness and a time-varying-parameter vector autoregression (TVP-VAR) to document asymmetric transmission between digital and traditional assets and renewable-energy prices, and Goodell et al. (2023) use TVP-VAR to map shock propagation across traditional assets, cryptocurrencies, and renewable energy through COVID-19 and the Russia–Ukraine war. By contrast, our contribution is micro-behavioral and operational: we theorize how LLM-scaffolded perceived control (PCA) shifts strategy selection at the investor level and we specify falsifiable estimators to test those shifts (Equation (1) moderated regression; Equation (2) BSI).
Central research question: How do large language models (LLMs) reshape retail investors’ strategy selection and risk posture through perceived cognitive assistance, and how can this be operationalized into testable investor-level predictions (Equation (1)) and a composite Behavioral Shift Index (BSI; Equation (2))?
Significance. This question matters because LLMs lower cognitive and operational frictions to higher-complexity tactics (for example, multi-leg options), potentially enabling informed adoption or, alternatively, inducing miscalibrated confidence. Distinguishing empowerment from distortion is essential for investor-protection policy, platform design, and for interpreting any market-level footprints of retail behavior within efficiency diagnostics.
This article explicitly adopts a conceptual stance, introducing two major theoretical and methodological innovations to the behavioral finance literature. First, it proposes PCA as a distinctive psychological construct within Theory of Planned Behavior (TPB) frameworks, specifically capturing investor perceptions of cognitive facilitation provided by interactive LLMs. Second, acknowledging inherent methodological constraints, it advances an innovative dual-agent simulation approach, designed explicitly for causal benchmarking of LLM impacts on investor behavior. These contributions collectively pave the way for rigorous future empirical evaluation that proceeds on two distinct tracks: (i) investor-level tests of the mechanism using Equation (1) and Equation (2) (BSI), and (ii) a dual-agent counterfactual (Virtual Trader; Digital Persona) whose purpose is to evaluate whether the LLM-Augmented Trader (LAT) workflow generates alpha beyond randomness and to benchmark attribution in post-diffusion settings.
In brief, the contribution is architectural: Equation (1) captures TPB-based micro-causality with LLM × PCA moderation, while Equation (2) provides an observable, time-series diagnostic of realized behavior change. Implementation refers to the explicit operationalization of these estimators, directional sign tests linking theory to coefficients and index movements, and a preregistered, version-controlled plan for estimation and replication.
The rest of this article is structured as follows: Section 2 develops the behavioral and theoretical foundations. Section 3 presents model-implied results and the theoretical model. Section 4 offers a discussion and a future research agenda. Section 5 presents the conclusions.

1.1. The Distinctive Nature of LLMs vs. Previous Investment Technologies

LLMs represent a cognitive inflection point in the evolution of retail financial tools. Unlike prior systems such as robo-advisors or screeners—which operated through static interfaces and rule-based outputs—LLMs function as interactive reasoning agents capable of simulating real-time, context-aware dialog (Gao et al., 2024; Ferrag et al., 2025). This qualitative distinction marks a transition from passive access to co-constructed inference.
LLMs uniquely support four capabilities critical to retail investor cognition: (1) dialogic simulation of strategic reasoning (Costarelli et al., 2024; Shinn et al., 2024); (2) seamless integration of technical, macroeconomic, and derivative contexts within a single thread (Wu et al., 2023; Xue et al., 2024); (3) iterative strategy refinement based on reciprocal logic rather than filter selection (Yu et al., 2023; H. Yang et al., 2024); and (4) hypothesis generation that surfaces novel relationships often missed by heuristic-based tools (Wu et al., 2023; Xue et al., 2024).
These features shift the investor’s role from selector to co-creator. Rather than choosing among predefined outputs, users engage in cognitive scaffolding that reduces complexity barriers and narrows the gap between intent and execution (Gao et al., 2024; Ferrag et al., 2025). While prior platforms extended reach, LLMs actively shape perceived competence and behavioral control, warranting a new theoretical lens on investor cognition in AI-mediated environments.

1.2. Do Retail Investors Favor Lower-Risk, Intuitive Strategies over Complex, High-Risk Ones?

Retail investors (RIs) consistently exhibit preferences for intuitive, low-complexity strategies that minimize analytical burden and perceived risk. This behavior is shaped by cognitive limitations, lack of institutional-grade tools, and behavioral biases such as overconfidence and recency effects (Aqham et al., 2024; D. Singh et al., 2024). Such preferences are reinforced by platform design and social cues, which often emphasize trend-following signals or simplified narratives (Briere, 2023; Xue et al., 2024).
For example, simple momentum investing and dip-buying exemplify this pattern, relying on extrapolation of recent price trends and demanding minimal technical knowledge (Du et al., 2025; Wheat & Eckerd, 2024). Empirical studies show that retail flows gravitate toward recent winners, particularly in lower-literacy or high-volatility contexts (Miguel & Su, 2019; Wheat & Eckerd, 2024). Dividend investing and ETF buy-and-hold strategies are similarly favored for their perceived safety, familiarity, and low monitoring costs (Graham & Kumar, 2006; Gempesaw et al., 2023).
By contrast, uptake of structurally complex strategies—such as multi-leg options, volatility overlays, or delta-neutral positions—remains limited among retail cohorts (Bryzgalova et al., 2023; Bogousslavsky & Muravyev, 2024). These instruments require comprehension of Greeks, nonlinear payoffs, and market-maker dynamics—factors that impose high cognitive barriers (Alsup, 2023; Naranjo et al., 2023). Even with increased access to derivatives post-2019, most retail activity is limited to directional calls and puts (Bryzgalova et al., 2023; Bogousslavsky & Muravyev, 2024).
This behavioral asymmetry forms the baseline condition for our investigation: whether LLMs reduce complexity perception and facilitate measurable transitions to higher-risk, sophistication-intensive strategies.
In this study, complexity is defined not as academic richness or institutional usage, but as the number of informational dimensions, conditional relationships, and decision nodes involved in strategy execution. This definition aligns with decision science literature, which conceptualizes complexity as the increase in required cognitive operations and branching conditionality under uncertainty (Tversky & Kahneman, 1986; Payne et al., 1993).

2. Materials and Methods

The materials for this study are the theoretical constructs and estimands (the quantities the model intends to estimate in the empirical phase) specified in the main text and appendices: the TPB-based moderated regression (Equation (1)), the Behavioral Shift Index (Equation (2)), the model-implied propositions in Section 3 and the five-step empirical agenda. Methods comprise the operational definitions for Equations (1) and (2), the estimation blueprint (unit of observation, controls, and robustness) to be applied in the subsequent empirical phase, the dual-agent benchmarking design for counterfactual attribution, and market-level diagnostics under the Efficient Market Hypothesis (EMH)/Adaptive Markets Hypothesis (AMH). Implementation proceeds under preregistered versioning, time-locked inputs, and auditable artifacts, as documented in the accompanying protocol.
Theoretical Model—LLM Impact on Retail: Our unit of analysis is the investor period (for example, weekly observations), which motivates a moderated regression in Equation (1) to test whether exposure to LLM assistance and perceived control/assistance (PCA) jointly predict behavioral outcomes and whether the LLM × PCA interaction is positive as hypothesized; estimation uses linear or generalized models with repeated-observation robustness, as appropriate. Equation (2) complements this by aggregating period-over-period changes in multi-leg share, trading frequency, concentration, and volatility exposure into a Behavioral Shift Index (BSI) that provides a time-series diagnostic of realized behavior change. Because post-diffusion settings lack clean human controls, attribution is supported by two information-identical synthetic baselines—the Virtual Trader (bounded cognition) and the Digital Persona (behaviorally plausible benchmark)—evaluated on time-locked inputs. All model specifications, inputs, and artifacts are preregistered and version-controlled, with snapshots that freeze model epoch, tools, prompts, and market state for replay and auditability.
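The aggregation step behind the BSI can be sketched in code. The snippet below assumes an equal-weighted mean of standardized period-over-period changes across the four behavioral components; the actual weighting and standardization scheme is fixed by Equation (2) and the preregistered protocol, and the function and variable names here are illustrative only.

```python
import numpy as np

def behavioral_shift_index(multi_leg_share, frequency, concentration, vol_exposure):
    """Illustrative BSI: equal-weighted mean of standardized period-over-period
    changes in the four behavioral components. The actual weights and
    standardization are fixed by Equation (2) in the preregistered protocol."""
    components = np.column_stack([multi_leg_share, frequency, concentration, vol_exposure])
    deltas = np.diff(components, axis=0)          # period-over-period changes
    sd = deltas.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                             # guard against flat series
    z = (deltas - deltas.mean(axis=0)) / sd       # standardize each component
    return z.mean(axis=1)                         # equal-weighted composite

# toy trajectory: an abrupt shift toward multi-leg, higher-frequency trading
multi_leg = [0.1, 0.1, 0.1, 0.1, 0.4, 0.4, 0.4, 0.4]
freq      = [10, 10, 10, 10, 20, 20, 20, 20]
conc      = [0.3] * 8
vol       = [0.2] * 8
bsi = behavioral_shift_index(multi_leg, freq, conc, vol)  # peaks at the shift period
```

In this toy series the index spikes exactly at the period where the behavioral composition changes, which is the diagnostic signature the BSI is designed to surface within-investor and across cohorts.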

2.1. Extending the Theory of Planned Behavior (TPB) for LLM-Augmented Contexts

This section investigates whether exposure to LLMs triggers a shift in retail investors’ behavior—from low-complexity strategies to cognitively demanding, risk-intensive structures. To model this behavioral migration, we extend the Theory of Planned Behavior (TPB), drawing on complementary psychological and technology adoption constructs.

Theory of Planned Behavior (TPB): A Starting Point

TPB posits that Attitude, Subjective Norms, and Perceived Behavioral Control (PBC) shape behavioral intention, which in turn predicts action (Ajzen, 1991; Armitage & Conner, 2001). In retail investing, these constructs map onto observable phenomena such as stated intentions toward financial actions and actual execution patterns (East, 1993; Xiao & Wu, 2006).
In principle, TPB is well suited to model behavioral transitions in retail investing. It allows us to trace a chain from LLM exposure → psychological construct shifts → changes in trading intention and execution. However, despite its utility, TPB exhibits critical limitations when applied to LLM-mediated investor behavior:
  • Lack of system-oriented constructs: TPB does not account for technology-specific beliefs such as perceived usefulness or ease of use, which other models capture (Davis, 1989; Venkatesh & Davis, 2000). These complementary models help explain why some users develop trust in AI systems while others do not (Davis et al., 1989; Venkatesh et al., 2003).
  • Neglect of affective mechanisms: TPB assumes rational intention formation (Ajzen, 1991; Sussman & Gifford, 2019). It does not incorporate emotional arousal, anticipatory anxiety, or vivid scenario framing (Sniehotta, 2009; Alhamad & Donyai, 2021)—factors that are central in LLM interactions, which often simulate high-stakes decision environments.
  • No structured behavioral outputs: While TPB links intention to behavior conceptually, it provides no direct link to observable data such as portfolio changes, option usage, or trade frequency—key metrics in financial behavior modeling (East, 1993; Cucinelli et al., 2016).
To address these gaps, we propose four extensions: (1) an innovative Perceived Cognitive Assistance (PCA) sub-construct, (2) Technology Acceptance Model (TAM) integration, (3) Risk-as-Feelings Theory integration, and (4) the Behavioral Shift Index (BSI).

2.2. Perceived Cognitive Assistance (PCA): Extending Perceived Behavioral Control for AI-Augmented Decisions

Perceived Cognitive Assistance (PCA) is an individual’s subjective belief in their enhanced cognitive capability to execute cognitively demanding tasks, facilitated by intelligent, context-aware, and interactive support provided by LLMs, independent of actual performance outcomes or proficiency with the LLM itself.
This definition deliberately separates three distinct elements: (1) the psychological state of perceived capability enhancement, (2) the availability of LLM support as an environmental condition, and (3) the actual performance outcomes that may or may not align with these perceptions. This separation is crucial for understanding how LLMs fundamentally alter users’ self-perceived behavioral boundaries in ways that differ from traditional technology adoption.
To isolate the moderating effect of PCA, it is conceptualized as an emergent, temporally lagged construct: initial LLM exposure fosters perceived cognitive assistance, which then moderates subsequent LLM engagement effects on complex strategy adoption. This temporal separation mitigates endogeneity concerns inherent in cross-sectional models. In this role, PCA determines the strength and effectiveness of cognitive scaffolding by LLMs in shaping investor behavioral outcomes (see Table 1).
PCA builds upon established cognitive science concepts, notably the notion that external tools can scaffold human reasoning (Hutchins, 1995) and that individuals monitor and calibrate their cognitive readiness through metacognitive awareness (Flavell, 1979). In AI-mediated decision contexts, these foundations converge within emerging human-AI collaboration research, which highlights how interactive systems reshape users’ perceived cognitive boundaries (D. Wang et al., 2020).
To establish discriminant validity, we differentiate PCA from four related but distinct constructs:
(a) Technology Self-Efficacy. While technology self-efficacy captures confidence in using a technology effectively (Compeau & Higgins, 1995; Marakas et al., 1998), PCA specifically addresses the perceived enhancement of domain-specific capabilities through AI support. A trader might have high self-efficacy in using an LLM (knowing how to prompt, interpret outputs) while having low PCA (not believing it enhances their trading capabilities), or vice versa. PCA builds upon but is conceptually distinct from domain-specific self-efficacy (Bandura, 1997), which captures an individual’s belief in their ability to perform a task based on internal mastery or experience. In contrast, PCA reflects a perceived expansion of one’s capability boundaries specifically induced by external, AI-driven scaffolding, irrespective of genuine skill acquisition.
(b) Cognitive Offloading. Cognitive offloading describes the delegation of memory or computation to external tools (e.g., calculators, to-do lists), but does not entail the internalized sense of behavioral readiness for novel or analytically intensive tasks (Risko & Gilbert, 2016; Gerlich, 2025). PCA differs fundamentally because it captures not just task delegation but the belief in expanded personal capability boundaries: the belief that one is cognitively able to engage in complex tasks due to real-time AI support—even in the absence of skill acquisition. Empirical support for this distinction is growing, as both preliminary studies (A. K. Singh et al., 2023; Spatharioti et al., 2023) and recent peer-reviewed evidence (Steyvers et al., 2025) consistently demonstrate that interactions with LLMs or dialogic AI tools inflate users’ self-assessed competence, even when their actual decision accuracy remains unchanged. Specifically, it was found that users exposed to AI-generated financial narratives rated themselves as more financially knowledgeable but failed to interpret basic derivative setups correctly (Jakesch et al., 2023; Spatharioti et al., 2023).
(c) Trust in Automation. Trust in automation concerns the reliability, transparency, and dependability of the system (Parasuraman et al., 2000), rather than the user’s own felt competence in executing decisions under system assistance (J. D. Lee & See, 2004). PCA is orthogonal to trust—users might trust an LLM’s outputs while not feeling it enhances their capabilities, or might feel empowered by an LLM despite harboring doubts about its reliability. This distinction is crucial for understanding the “illusion of understanding” phenomenon.
(d) Perceived Usefulness. Perceived usefulness from the Technology Acceptance Model (TAM) reflects beliefs about system utility in task performance (Davis, 1989; Venkatesh & Davis, 2000), but it does not capture the user’s self-appraisal of increased capability to act (Davis, 1989; King & He, 2006). PCA specifically captures the user’s belief about their own enhanced capabilities, not just improved outcomes. An investor might find an LLM useful for gathering information while not feeling it makes them a more capable trader.
Drawing on dual-process theory (Kahneman, 2011) and metacognitive research (Kruger & Dunning, 1999), we propose that PCA can manifest through two distinct pathways:
  • Positive pathway: When supported by adequate understanding and factual confirmation of LLM effectiveness, PCA facilitates risk democratization with informed confidence.
  • Negative pathway: When PCA outpaces actual comprehension and factual confirmation, it may lead to behavioral distortion.
This duality depends on factors such as actual vs. perceived understanding, the transparency of LLM outputs, availability of LLM effectiveness validations, and asymmetries in informational framing. To formalize this relationship, we specify the following regression model (see Appendix A):
B_{it} = \beta_0 + \beta_1 \mathrm{LLM}_{it} + \beta_2 \mathrm{PCA}_{it} + \beta_3 (\mathrm{LLM}_{it} \times \mathrm{PCA}_{it}) + \beta_4 A_{it} + \beta_5 \mathrm{SN}_{it} + \gamma^{\top} \mathrm{Controls}_{it} + \varepsilon_{it} \quad (1)
Symbols: i indexes investors; t indexes time periods; B_{it} is the behavior measure (specified per test); LLM_{it} is LLM engagement intensity; PCA_{it} is perceived cognitive assistance; A_{it} is attitude toward complex strategies; SN_{it} denotes subjective norms; Controls_{it} is the vector of control covariates; γ is the coefficient vector on Controls; ε_{it} is the error term.
In this regression model with moderation, PCA specifically moderates the cognitive scaffolding effectiveness of LLM capabilities. The interaction term (LLM × PCA) explicitly captures whether and how strongly perceived cognitive assistance enhances the relationship between exposure to LLM capabilities and behavioral outcomes. A statistically significant, positive β3 indicates PCA’s critical moderating role.

Expected patterns and when they may fail. We expect a positive link between LLM use and the uptake of more complex strategies, a positive link between perceived assistance and that same uptake, and a stronger combined effect when both are high. These expectations can fail in recognizable situations: when costs or frictions are high, when risk is unusually punitive, when tools or data are unreliable, or when the investor does not feel meaningfully supported. If the analysis does not show these positive patterns—or they disappear once basic cost and risk checks are applied—the claim is not supported.

In practice, Equation (1) will be estimated at the investor–period level (for example, weekly observations), using the pre-specified behavioral measures (option-trade frequency, the share of multi-leg strategies, and concentration across underlyings). Estimation will use standard linear models with errors robust to repeated observations per investor (or appropriate generalized models if the outcome is a count or rate).

Variable clarifications. A_{it} is the investor’s attitude toward using complex strategies; SN_{it} captures perceived social influence (for example, peer or community approval); PCA reflects the investor’s felt ability to handle complex tasks with help from an LLM assistant. Equation (2) then summarizes observed changes in these behaviors as a Behavioral Shift Index for within-investor and cohort comparisons; empirical estimation is reserved for the subsequent phase.
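To make the estimation logic concrete, the sketch below simulates investor-period data with a positive LLM × PCA interaction and recovers the coefficients of Equation (1) by ordinary least squares. It is a minimal illustration under assumed parameter values: the variable names, data-generating process, and coefficient magnitudes are all hypothetical, and the preregistered analysis would additionally use cluster-robust errors per investor and the pre-specified controls.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# hypothetical investor-period covariates (simulated for illustration only)
llm = rng.uniform(0, 1, n)    # LLM engagement intensity
pca = rng.uniform(0, 1, n)    # perceived cognitive assistance
att = rng.normal(0, 1, n)     # attitude toward complex strategies (A)
sn  = rng.normal(0, 1, n)     # subjective norms (SN)

# assumed data-generating process with a positive interaction (beta3 = 0.8)
b = (0.2 + 0.3 * llm + 0.1 * pca + 0.8 * llm * pca
     + 0.2 * att + 0.1 * sn + rng.normal(0, 0.3, n))

# OLS design mirroring Equation (1): intercept, main effects, interaction, A, SN
X = np.column_stack([np.ones(n), llm, pca, llm * pca, att, sn])
beta_hat, *_ = np.linalg.lstsq(X, b, rcond=None)
# beta_hat[3] estimates the LLM x PCA moderation effect (hypothesized positive)
```

A recovered beta_hat[3] close to the simulated 0.8 illustrates the directional sign test: under the hypothesized mechanism the interaction coefficient should be positive, and its absence or reversal would count as disconfirming evidence.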
For measurement, we propose preliminary scale items that capture PCA’s unique characteristics, rated on a 7-point Likert scale (for example: “When I have access to ChatGPT, I feel capable of executing trading strategies that would otherwise be beyond my abilities”; “AI assistance makes complex financial concepts manageable for me”).
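A minimal scoring sketch for such a scale follows; the equal-weight mean and the [0, 1] rescaling are assumptions for exposition, and psychometric validation (reliability, factor structure) is deferred to the empirical phase.

```python
def pca_score(item_responses, scale_max=7):
    """Toy PCA score: mean of 7-point Likert responses rescaled to [0, 1].
    Equal weighting is an assumption; factor-analytic weights may replace it."""
    if not item_responses:
        raise ValueError("need at least one item response")
    if any(not 1 <= r <= scale_max for r in item_responses):
        raise ValueError("responses must lie on the 1..scale_max scale")
    return (sum(item_responses) / len(item_responses) - 1) / (scale_max - 1)

# responses to the two illustrative items above
score = pca_score([6, 5])
```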
Task complexity will be handled using pre-specified observable features of the trade decision (for example, the number of legs and conditional steps), with refinements to measurement deferred to the empirical phase.
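As a concrete illustration of this operationalization, the toy function below scores a strategy by counting its decision nodes. The feature names and the additive form are assumptions for exposition, not the pre-specified measure.

```python
def strategy_complexity(n_legs: int, n_conditional_steps: int, n_info_dimensions: int) -> int:
    """Toy ordinal complexity score: counts the legs, conditional relationships,
    and informational dimensions involved in executing a strategy."""
    return n_legs + n_conditional_steps + n_info_dimensions

# a single-leg momentum buy vs. a four-leg iron condor with exit rules
momentum = strategy_complexity(n_legs=1, n_conditional_steps=0, n_info_dimensions=1)
iron_condor = strategy_complexity(n_legs=4, n_conditional_steps=2, n_info_dimensions=3)
```

Even this crude count orders the two examples as the working definition intends: the multi-leg options structure scores far above the momentum trade.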
While PCA is most likely to manifest in contexts involving complex, interpretively demanding tasks and interactive AI systems, its emergence is less probable—but not impossible—with simple, rule-based tasks or non-dialogic technologies. Future empirical work should specify these contextual moderators to delineate PCA’s operational boundaries.

2.3. Technology Acceptance Model (TAM): Explaining Variation in LLM Uptake and Reliance

While PCA explains how RIs may feel empowered to engage in more complex financial behavior due to LLM support, it does not explain why some investors develop this perception while others do not. To address this heterogeneity in technology engagement, we integrate constructs from the TAM (Davis et al., 1989; Venkatesh & Davis, 2000), which focuses on users’ perceptions of new technologies. TAM posits that two core beliefs—Perceived Usefulness (PU) and Perceived Ease of Use (PEOU)—predict the adoption, frequency, and persistence of technology usage. These constructs are particularly relevant to financial LLMs, which vary in interface design, response quality, and interpretability. These beliefs influence not only the initial adoption of LLMs but also how deeply and persistently investors integrate LLMs into their strategy formulation process.

Empirical studies in behavioral fintech adoption show that PU and PEOU significantly predict usage of robo-advisors, mobile trading apps, and decision-support dashboards (Pavlou & Fygenson, 2006; Belanche et al., 2019). Similar logic applies to LLM engagement in financial contexts, though few studies have yet modeled this in relation to perceived behavioral control. TAM constructs moderate the formation of PCA and influence TPB dimensions—providing the upstream logic for adoption intensity and behavior shift.

Operationally, we treat PU and PEOU as upstream determinants of LLM engagement (uptake, intensity, reliance). In Equation (1) they enter through the engagement term and its interaction with PCA, and in Equation (2) their effects are expected to surface as directional shifts in ΔFrequency and ΔMultiLeg within the Behavioral Shift Index (BSI).

2.4. Risk-as-Feelings Theory: Modeling Affective Divergence

Affective Finance Theory (AFT) examines how affective states, discrete emotions, and emotion-laden narratives shape judgments and market behavior. It combines three mechanisms that are directly relevant to LLM-augmented retail trading: (i) fast, valenced “good/bad” signals that guide judgments under uncertainty (the affect heuristic), (ii) appraisal-specific effects whereby emotions such as fear or anger shift risk perception and choice in predictable directions, and (iii) narrative-driven conviction that enables action under radical uncertainty. Representative syntheses include the Annual Review survey on emotion and decision-making (Lerner et al., 2015), the affect-heuristic literature (Finucane et al., 2000; Slovic et al., 2007), appraisal-tendency results showing opposite risk effects for fear versus anger (Lerner & Keltner, 2001), and narrative accounts such as Conviction Narrative Theory and narrative economics that link story-driven conviction to market outcomes (Shiller, 2017; Johnson et al., 2023; Tuominen, 2023). Within this broader AFT program, we focus analytically on the Risk-as-Feelings (RaF) hypothesis: immediate feelings at the point of choice can diverge from cognitive risk evaluations and, when they do, feelings often dominate behavior (Loewenstein et al., 2001). RaF is the most appropriate AFT mechanism for our setting for three reasons: (1) Horizon fit—our interest is in short-horizon execution where in-the-moment affect is most behaviorally potent; (2) Identifiable predictions—RaF implies measurable intention–behavior gaps (e.g., plan vs. execution) that our design can target; and (3) Clean operationalization—RaF maps directly onto our observables: transient increases in volatility exposure and compositional shifts in strategy mix.
While TPB and TAM provide robust cognitive models of intention formation and technology adoption, they do not account for a critical behavioral dimension: emotionally driven decision divergence. Retail investing—especially under conditions of high volatility or uncertainty—is often shaped not by rational intent, but by visceral reactions such as fear, excitement, regret, or urgency. In this context, RaF (Loewenstein et al., 2001; Kobbeltved & Wolff, 2009) offers indispensable explanatory value. This theory proposes that emotions experienced at the moment of decision-making—not just cognitive evaluations of outcomes—can override planned behavior (Loewenstein et al., 2001; Kobbeltved & Wolff, 2009). The divergence between cognitive intention and behavioral execution is particularly pronounced in domains involving risk and delayed outcomes, such as financial trading (C. Peng, 2024). Even if TPB constructs (e.g., high PCA) predict strategy engagement, emotionally charged LLM narratives may derail or distort execution. This theory explains why users may hesitate—or leap—despite previously formed plans.
Consistent with this mechanism, we pre-specify two bias risks most directly linked to affective divergence: overconfidence (miscalibrated probability judgments and over-precision) and confirmation bias (selective exposure to belief-congruent evidence). Detection proceeds via investor-level metrics reported in Section 4.4 and is estimated for the LLM-Augmented Trader (LAT) against two information-identical baselines—the Virtual Trader (VT) and the Digital Persona (DP)—on time-locked inputs.
In Equation (1), RaF motivates tests for execution-side divergence conditional on intentions (LLM engagement, PCA) and their interaction; episodes consistent with RaF should coincide with short-run deviations in behavior even when planned intentions are stable. In Equation (2), RaF predicts temporary spikes in ΔVolExposure and, at times, higher concentration within the Behavioral Shift Index (BSI), while the TAM path (Perceived Usefulness/Ease of Use) predicts sustained increases in ΔFrequency and ΔMultiLeg via higher LLM reliance. Together these AFT-RaF and TAM channels yield complementary, falsifiable sign patterns that the BSI can track over time.
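The contrast drawn here, transient RaF-consistent spikes versus sustained TAM-consistent level shifts, can be sketched as a simple classifier over a BSI-component series. This is an illustrative sketch: the baseline window and z-threshold are assumptions, not values the article specifies.

```python
import statistics

def classify_shift(series, window=5, spike_z=2.0):
    """Label post-baseline points of a BSI-component series as transient
    'spike' (RaF-consistent) or 'normal', and flag a sustained level
    shift (TAM-consistent) when a majority of points stay elevated.

    Illustrative sketch: the baseline window and z-threshold are
    assumptions, not values from the article.
    """
    base = series[:window]
    mu = statistics.mean(base)
    sd = statistics.pstdev(base) or 1e-9  # guard against a flat baseline
    labels = ["spike" if (x - mu) / sd > spike_z else "normal"
              for x in series[window:]]
    sustained = labels.count("spike") > len(labels) / 2
    return labels, sustained
```

A lone "spike" with `sustained=False` matches the RaF prediction of short-lived ΔVolExposure surges, while `sustained=True` matches the TAM prediction of durable increases in ΔFrequency and ΔMultiLeg.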

2.5. Behavioral Shift Index (BSI): Empirical Operationalization

To connect psychological shifts with observable market behaviors, we develop the BSI—a composite, time-series-friendly measure that quantifies the extent to which LLM adoption alters investor strategy (see Appendix A):
$$\mathrm{BSI}_{i,t} \;=\; w_1\,\Delta\mathrm{MultiLeg}_{i,t} \;+\; w_2\,\Delta\mathrm{Frequency}_{i,t} \;+\; w_3\,\Delta\mathrm{Concentration}_{i,t} \;+\; w_4\,\Delta\mathrm{VolExposure}_{i,t}, \qquad \sum_{k=1}^{4} w_k = 1,\quad w_k \ge 0$$
Symbols: i indexes investors; t indexes time; w_k ≥ 0 and ∑_{k=1}^{4} w_k = 1; MultiLeg_{i,t} = share of option trades that are multi-leg; Frequency_{i,t} = option-trade count per period; Concentration_{i,t} = portfolio concentration across underlyings; VolExposure_{i,t} = volatility exposure proxy (e.g., share of open positions with material vega or a designated vol intensity score); Δ denotes the period-over-period change.
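Equation (2) can be sketched computationally as a weighted sum of standardized component deltas. The equal weights and the z-standardization against each component's own history are illustrative assumptions; the article requires only that weights be non-negative, sum to one, and be declared before analysis.

```python
import statistics

def bsi(deltas, weights=(0.25, 0.25, 0.25, 0.25)):
    """Behavioral Shift Index for one investor-period (Equation (2) sketch).

    deltas maps each component ('multileg', 'frequency', 'concentration',
    'volexposure') to its history of period-over-period changes, with the
    current period last. Each component is z-standardized against its own
    history, then combined with the declared weights.
    """
    keys = ("multileg", "frequency", "concentration", "volexposure")
    assert abs(sum(weights) - 1.0) < 1e-9 and all(w >= 0 for w in weights)
    score = 0.0
    for w, key in zip(weights, keys):
        hist = deltas[key]
        sd = statistics.pstdev(hist) or 1.0  # a flat history contributes zero
        score += w * (hist[-1] - statistics.mean(hist)) / sd
    return score
```

A cohort-level series of such scores is what the time-series comparisons described below would track.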
This index instantiates the theoretical lenses as observable deltas: TPB constructs (attitude, norms, perceived control via PCA) map to ΔFrequency and ΔMultiLeg; TAM operates upstream by shaping LLM engagement intensity; and Risk-as-Feelings predicts short-horizon divergence, anticipated as spikes in ΔVolExposure. Accordingly, the BSI serves as the outcome-side counterpart to Equation (1) when testing whether LLM exposure and PCA are associated with measurable shifts in trading behavior. The BSI thus enables time-series and cross-sectional comparisons across investor cohorts with varying levels of LLM usage, linking psychological frameworks to observable market data—a methodological advancement rarely implemented in TPB-based financial studies. To compute the index for an investor or a cohort, we take period-over-period changes from trading logs in (i) option-trade frequency, (ii) the share of multi-leg strategies, (iii) concentration across underlyings, and (iv) option risk exposure (for example, a higher share of trades that increase overall leverage). These components are standardized and combined with weights declared before analysis.
What the index should show in practice. If LLMs genuinely lower the effort required to use complex tactics, the BSI should rise when LLM use and perceived assistance rise. Among its components, we expect (i) more multi-leg constructions, (ii) more overall attempts, and (iii) more use of volatility-linked positions during short, emotionally charged windows. Concentration may drop at first as investors experiment across several tactic families, then stabilize as learning consolidates. The direction of change in the BSI should be consistent with the positive patterns tested in Equation (1).
Falsification and limits. These expectations are easy to disconfirm: if the expected movements do not follow increases in LLM use in the index and its components, or if any movement disappears once basic checks for cost, risk, liquidity, and execution quality are applied, the claim is not supported. We do not assert durable, repeatable excess returns at the market level; where temporary effects appear, they should fade as participants adapt. This defines what the propositions do and do not predict and provides a clear basis for later empirical checks. In addition to these continuous proxies, we code each trade with a Strategy Structural Complexity (SSC) label on a four-level ordinal scale, C0–C3. SSC is used for descriptive distributions and robustness checks (e.g., the share of trades at C2–C3), while ΔMultiLeg and ΔVolExposure remain the primary BSI complexity proxies.
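The SSC label can be assigned mechanically from trade structure. The rules and field names below are an illustrative operationalization of the C0–C3 scale, not the authors' published coding scheme.

```python
def ssc_label(trade):
    """Assign a Strategy Structural Complexity code C0-C3 to one trade.

    Illustrative mapping (field names are hypothetical): path-dependent
    or delta-neutral structures -> C3; multi-leg (>2 legs) or explicit
    volatility posture -> C2; simple two-leg directional setups -> C1;
    single-leg trades -> C0.
    """
    if trade.get("path_dependent", False):
        return "C3"
    if trade.get("legs", 1) > 2 or trade.get("vol_linked", False):
        return "C2"
    if trade.get("legs", 1) == 2:
        return "C1"
    return "C0"
```

The per-period share of trades labeled C2–C3 then provides the robustness check described in the text.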

2.6. Proposed Diagnostic Framework for Detecting Behavioral Shifts: Integrating Efficient Market Hypothesis and Adaptive Market Hypothesis

While the TPB, PCA, Risk-as-Feelings Theory, and TAM offer robust psychological accounts of how LLMs alter individual-level intentions and behaviors, they do not address the question of market consequences. Specifically, these frameworks are not designed to evaluate whether LLM-induced behavioral shifts among retail investors will materialize in price formation, volatility dynamics, or the informational efficiency of markets. To assess these market-level consequences, a market-oriented framework is required—one that enables the identification of deviations from random-walk pricing, tests for persistent performance anomalies, and distinguishes between behavioral distortions and adaptive learning. This diagnostic need is particularly acute in the context of technological discontinuities, where behavioral changes may be rapid, widespread, and nonlinear. In this study, we adopt a dual-framework approach grounded in financial market theory: the Efficient Market Hypothesis (EMH) (Fama, 1970; Abdullahi, 2021) and the Adaptive Markets Hypothesis (AMH) (A. W. Lo, 2004). These two perspectives serve complementary functions. The EMH provides a null hypothesis framework for testing whether retail behavior—augmented by LLMs—generates abnormal returns or price patterns inconsistent with informational efficiency. The AMH, in contrast, offers an interpretive model that explains such deviations not as irrationality but as the result of bounded rational adaptation to changing cognitive and technological conditions.
This dual integration allows us to move beyond individual cognition and intention formation, toward diagnosing whether the cumulative behavioral effects of LLM adoption—particularly changes in strategy complexity, timing, and coordination—leave observable traces in market structure. Our aim is not to model systemic behavior directly, but to assess whether LLM-induced retail activity contributes to detectable anomalies such as return persistence, volatility clustering, or adaptive convergence. In this sense, the EMH and AMH serve as complementary diagnostic tools: one identifying statistical deviations, the other interpreting them through bounded rational adaptation.

2.6.1. EMH as Baseline Diagnostic Framework

The EMH serves as a null model against which persistent trading advantages or price anomalies linked to LLM use can be tested. The operational diagnostics we apply are summarized in Table 2. In its weak form, the EMH posits that historical price information is fully incorporated into current prices, rendering trend-following ineffective (Fama, 1970; Showalter & Gropp, 2019). The semi-strong form extends this logic to all publicly available information, precluding informational arbitrage (Diamond & Perkins, 2022).
By supplying concrete statistical tests, the EMH serves as a null hypothesis framework. If LLM-augmented trades do not yield statistically significant or repeatable performance gains, the market remains informationally efficient. However, if anomalies are observed, this suggests a deviation from pure market rationality—precisely where the AMH becomes relevant. At the market level, the contribution is a dual-frame diagnostic: the EMH provides the null (no persistent alpha or anomalies), while the AMH interprets any transients as adaptive learning—thereby making the micro-level predictions from Equation (1) and the BSI refutable against macro evidence.
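One standard weak-form diagnostic of the kind Table 2 summarizes is the variance-ratio statistic: under a random walk, the variance of q-period returns is q times the one-period variance. The sketch below is unadjusted, omitting the overlapping windows and heteroskedasticity-robust standard errors of the full Lo-MacKinlay test.

```python
import statistics

def variance_ratio(returns, q=2):
    """Unadjusted variance-ratio statistic VR(q) for weak-form checks.

    VR(q) ~ 1 under a random walk; values persistently above 1 indicate
    positive autocorrelation (trend-following would have traction),
    values below 1 indicate mean reversion. Sketch only: no overlapping
    windows and no robust standard errors.
    """
    n = len(returns) - len(returns) % q  # trim to a multiple of q
    one_period_var = statistics.pvariance(returns[:n])
    q_period = [sum(returns[i:i + q]) for i in range(0, n, q)]
    return statistics.pvariance(q_period) / (q * one_period_var)
```

Applied to the returns of LLM-associated securities, persistent departures of VR(q) from one would be the kind of anomaly this null framework flags.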

2.6.2. AMH as an Evolutionary Framework

The AMH reconceptualizes market efficiency as dynamic and context-dependent, shaped by cycles of agent adaptation, feedback effects, and technological innovation (A. W. Lo, 2004). Unlike the EMH, which assumes stable, rational processing of available information, the AMH accounts for cognitive limitations, path dependence, and environmentally conditioned learning in financial behavior (A. Lo, 2004). LLMs represent a cognitive inflection point in this evolutionary process: they collapse informational silos, simulate expert reasoning, and scaffold complex strategy development through interactive and multimodal feedback (Z. Wang et al., 2023; Bewersdorff et al., 2025). While these behavioral changes may temporarily produce patterns inconsistent with traditional EMH formulations (e.g., clustered returns, volatility overshoots), the AMH views such anomalies as part of an adaptive cycle, not evidence of market irrationality. Market participants—once exposed to new tools or conditions—learn, adapt, and modify behavior over time (Khuntia & Pattanayak, 2018; A. W. Lo, 2019). The AMH thus provides an interpretive lens for determining whether LLM-induced deviations are transient signals of adaptation or signs of persistent inefficiency. Table 3 outlines illustrative diagnostics grounded in AMH logic.
Ultimately, the AMH complements the EMH by interpreting observed deviations not as violations, but as learning dynamics. This dual-frame approach supports empirical efforts to disentangle short-term distortions from long-run cognitive evolution, providing a behavioral bridge between micro-level strategy shifts and macro-level financial adaptation.
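Because the AMH treats efficiency as time-varying, its diagnostics are tracked over rolling windows rather than tested once. A minimal sketch (the window length is an assumption) monitors first-order return autocorrelation: excursions that later decay toward zero read as adaptive transients, while persistent elevation would instead point to a durable inefficiency.

```python
def rolling_autocorr(returns, window=50):
    """First-order return autocorrelation in rolling windows, an
    AMH-style diagnostic of time-varying weak-form efficiency.

    Illustrative sketch: window length is an assumption. Episodes of
    elevated |autocorrelation| that decay toward zero are read as
    adaptive transients; persistent elevation challenges efficiency.
    """
    out = []
    for start in range(len(returns) - window + 1):
        w = returns[start:start + window]
        mean = sum(w) / window
        num = sum((w[i] - mean) * (w[i - 1] - mean) for i in range(1, window))
        den = sum((x - mean) ** 2 for x in w)
        out.append(num / den if den else 0.0)
    return out
```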

3. Results

Because this is a conceptual article, we report model-implied results rather than sample estimates. These model-implied results matter because they translate the theory into precise, falsifiable sign tests and platform-design checks to be executed in the preregistered empirical phase. The model predicts that exposure to LLM assistance and perceived cognitive assistance jointly increase adoption of structurally complex strategies; in Equation (1), this corresponds to positive main effects for LLM engagement and perceived control/assistance, and a positive interaction, with signs stated for later falsification. At the behavior level, Equation (2) (BSI) should rise with LLM exposure and be stronger when perceived assistance is high. At the market level, persistent alpha or systematic post-entry drift would challenge weak-form efficiency, whereas transient patterns are consistent with adaptive dynamics. These implications set out the testable contribution that later empirical work will adjudicate. In practice, this frames investor-protection and platform-design implications—especially where confidence may rise faster than competence—and links directly to the efficiency diagnostics specified in the manuscript for distinguishing adaptive transients from persistent anomalies.
Under the TAM path, higher perceived usefulness/ease should raise LLM uptake and reliance; in Equation (1) this implies positive coefficients on the LLM engagement term (and its interaction with PCA) and, in Equation (2), increases in ΔFrequency and the share of multi-leg strategies. Under the Affective Finance/Risk-as-Feelings path, short-horizon affective swings should manifest as transient increases in ΔVolExposure and, at times, higher concentration—indicating execution that departs from prior plan. PCA moderates these effects: when perceived capability is calibrated, we expect complexity uptake without destabilizing risk shifts; when PCA outpaces actual skill, we expect confidence-led complexity and volatility exposure that the BSI will flag. Taken together, Equation (1) yields falsifiable sign predictions on engagement and moderation, and the BSI provides a time-series diagnostic distinguishing capability-driven from affect-driven shifts.

Theoretical Model—LLM Impact on Retail Investor Strategy Migration

The statements below answer the central research question by specifying falsifiable sign predictions in Equation (1), expected movements in the Behavioral Shift Index (Equation (2)), and market-level implications to be interpreted with the EMH/AMH diagnostics, with empirical estimation reserved for the preregistered program. This study advances a behavioral finance model that theorizes how LLMs facilitate strategic migration among retail investors—particularly the shift from heuristic-based, low-complexity strategies to structurally sophisticated approaches. Strategic complexity denotes the structural and procedural sophistication of a trade. Higher strategic complexity is reflected by moving beyond single-leg or simple two-leg directional setups toward multi-leg constructions, explicit volatility posture or time-staging (e.g., calendars/diagonals), and path-dependent or delta-neutral structures that require cross-Greek management. Empirically, we will label each executed or recommended trade with a four-level Strategy Structural Complexity (SSC) code (C0–C3) and use Equation (1) to test whether higher PCA is associated with a greater likelihood of selecting C2–C3 structures, conditional on controls. The model, depicted in Figure 1, is grounded in an extended TPB, enriched by the novel construct of PCA, and positioned within a dual diagnostic framework informed by the EMH and the AMH. Here, ‘implementation’ means operationalizing the model into estimable forms: an individual-level moderated regression (Equation (1)) and a cohort-level diagnostic index (Equation (2)), accompanied by explicit sign predictions and a preregistered estimation plan. In reduced form, we evaluate whether the BSI varies with LLM engagement (as shaped by Technology Acceptance Model constructs—Perceived Usefulness and Perceived Ease of Use) and PCA, including their interaction, conditional on controls.
At the core of the model lies a foundational relationship between the Independent Variable (IDV)—retail investors’ current trading preferences and risk attitudes—and the Dependent Variable (DV), defined as observable changes in strategy type, risk profile, and trade effectiveness. Empirical research demonstrates that retail investors, in the absence of external cognitive augmentation, systematically favor intuitive, simple strategies that minimize analytical burden and perceived complexity. These include approaches such as price momentum, dividend investing, and passive ETF allocations, which are accessible without requiring advanced knowledge of market microstructure, derivatives, or volatility dynamics (Foltice & Langer, 2015; D’Hondt et al., 2023; Chui et al., 2022).
This behavioral asymmetry reflects the cognitive and procedural barriers that limit retail engagement with complex instruments, including multi-leg options, volatility overlays, and delta-neutral strategies. The direct IDV → DV pathway thus anticipates that retail investors, constrained by their baseline cognitive resources and risk preferences, will predominantly select low-complexity strategies and avoid structurally sophisticated approaches. This leads to predictable patterns in strategy composition, portfolio concentration, and risk exposure.
Proposition 1.
Retail investors, without cognitive augmentation, exhibit a systematic preference for simple, intuitive strategies and avoid structurally complex approaches.
The introduction of LLMs, however, modifies this direct relationship by introducing three interrelated mechanisms that mediate and moderate the pathway from baseline investor preferences to observable strategic behavior.
Factor 1: LLM Capabilities. LLMs provide a qualitatively distinct form of cognitive scaffolding compared to prior retail-facing tools. Unlike static information platforms or rule-based advisors, LLMs offer real-time, dialogic, and multimodal reasoning support. These affordances enable users to simulate trading scenarios, integrate technical and macroeconomic signals, and comprehend complex instruments that would otherwise exceed unaided understanding.
Factor 2: LLM-Enhanced Strategy Integration. The cognitive scaffolding provided by LLMs reduces interpretive and procedural complexity, facilitating operational adoption of advanced strategies. Retail investors, supported by LLMs, can incorporate volatility-linked instruments, multi-leg derivatives, or delta-neutral positions into their decision routines—often embedding these within familiar heuristic frameworks.
Proposition 2.
LLM-supported strategies enable retail investors to operationalize structurally complex approaches, improving accessibility and the potential for enhanced risk-adjusted returns.
Factor 3: Behavioral Mediation via PCA and Moderating Role of LLM Capabilities. The behavioral impact of LLMs on retail investor strategy selection is conceptualized as part of a moderated psychological process. This model introduces PCA as a novel subdimension of PBC, capturing the investor’s subjective belief that complex tasks become tractable with intelligent system support, even in the absence of true domain mastery. PCA functions as a first-stage moderator, shaping the strength of the relationship between investors’ baseline predispositions—such as risk appetite, cognitive orientation, and strategy familiarity—and their willingness to adopt structurally complex approaches. Under conditions of high PCA and accessible LLM support, investors perceive greater self-efficacy and reduced cognitive barriers, increasing the likelihood of engaging with complex strategies such as volatility overlays or multi-leg derivatives. Conversely, when PCA is low or LLM capabilities are absent or underutilized, the cognitive and procedural barriers that constrain strategic migration remain intact.
Proposition 3a.
PCA moderates the relationship between baseline investor predispositions and strategic behavior; higher PCA strengthens the likelihood of adopting complex, risk-intensive strategies.
Proposition 3b.
The moderating effect of PCA is itself conditioned by LLM capabilities; greater LLM accessibility and functionality amplify the influence of PCA on behavioral outcomes.
These mechanisms operate along two distinct pathways. The Empowerment Pathway reflects the democratizing potential of LLMs to enhance access to institutional-grade strategies, supporting performance gains and strategic sophistication. The Distortion Pathway, by contrast, highlights the risk of behavioral miscalibration, where inflated perceptions of competence outpace actual understanding—amplifying risk-taking, overconfidence, and suboptimal decision-making. These pathways are conditional, not universal. The shift toward complex strategies should be muted or absent when costs or frictions are high, when risk is unusually punitive, when tools or data are unreliable, or when the investor does not feel meaningfully supported by the system. In emotionally intense episodes, short-term surges in risk-taking can occur even if long-run preferences do not change; this is the distortion side of the model.
Our claims are refutable. In later tests, we expect to see more frequent attempts, greater use of multi-leg constructions, and—especially in short, emotional episodes—more use of volatility-linked positions when LLM use and perceived assistance are higher. If these patterns do not appear, or if any apparent effects vanish once basic cost, risk, and execution checks are applied, the model is not supported. At the market level, we do not claim durable, repeatable abnormal returns; where temporary patterns arise, they should fade as participants adapt.
In sum, the model offers a causally structured and empirically testable account of how LLMs reshape retail investor behavior—not only by altering cognitive capacity but by fundamentally restructuring the psychological conditions under which complex strategies are adopted or avoided. Through its integration of PCA, TPB extensions, and market-level diagnostics via the EMH and AMH, the model enables evaluation of whether LLM-driven strategic migration constitutes an adaptive evolution or introduces new forms of behavioral distortion in retail trading ecosystems.

4. Discussion and Future Research Agenda

The model’s contribution is to specify a falsifiable link from LLM exposure and perceived assistance to observable shifts in retail trading complexity, with explicit estimators (Equations (1) and (2)) and an identification plan suitable for post-LLM environments. We emphasize boundaries: perceived uplift may outpace competence; intention–behavior gaps can widen under affect; and index-level diagnostics must separate transient adaptation from persistent anomalies. Interpreting individual-level findings against the EMH/AMH offers a principled way to distinguish short-run distortions from longer-run learning. The empirical phase is deliberately deferred; the present contribution is the operational mapping from theory to tests under a preregistered, auditable workflow.

4.1. Empirical Validation of the Theoretical Model

This study develops a behavioral finance model of how LLMs influence retail investor behavior. The model integrates constructs from the TPB, the TAM, Risk-as-Feelings, and PCA to explain strategy migration from intuitive, low-complexity approaches toward higher-complexity option strategies and volatility-linked instruments. The contribution of this article is conceptual-methodological: a fully operationalized, falsifiable architecture comprising an individual-level moderated regression (Equation (1)) and a Behavioral Shift Index (Equation (2)), with directional sign tests and an identification blueprint. Equation (1) will test whether exposure to LLM assistance and perceived control or complexity predict the behavioral measures, and whether their interaction is positive as hypothesized. Equation (2) will provide a composite index for within-investor and cohort comparisons. In parallel, we will report investor-level bias-metric deltas (calibration, selective exposure, disposition) for LAT vs. VT/DP so that any observed ‘confidence lift’ can be distinguished from genuine capability gains. Inference will be robust to repeated observations per investor, and we will use investor-specific and time effects to control for unobserved heterogeneity and common shocks. Timing checks and sensitivity analyses will probe identification. At the market level, results will be interpreted against efficient- and adaptive-market diagnostics already specified in the manuscript. Empirical results are intentionally out of scope; the empirical program is specified as a preregistered agenda. To translate the model’s causal logic into observable outcomes, we propose a five-step agenda (Figure 2) addressing two empirical challenges: First, the detection of LLM-induced behavioral change: whether exposure to LLMs is associated with measurable shifts in strategy complexity, risk-taking, and execution patterns. 
Second, causal benchmarking under post-LLM diffusion: a valid comparison of LLM-assisted trades with bounded-human counterfactuals when live control groups are infeasible. Our solution is a dual-agent benchmark (Virtual Trader and Digital Persona) that isolates decision-maker effects and is replicable. For falsifiability, we pre-register directional signs for Equation (1)—β1 > 0 (LLM main effect), β2 > 0 (PCA), β3 > 0 (LLM × PCA)—and expect the BSI to increase with LLM exposure and to be amplified by PCA; failure to observe these patterns would refute the model.
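The pre-registered signs can be checked mechanically once data exist. As a sketch under stated assumptions (Equation (1)'s exact variable list is not reproduced in this section, the data below are simulated, and the control block is a single stand-in), an ordinary-least-squares fit of the moderated specification should recover positive estimates for β1, β2, and β3:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated investor-period data generated under the model's sign
# predictions (illustrative only; coefficients are assumptions).
llm = rng.uniform(0, 1, n)       # LLM engagement intensity
pca = rng.uniform(0, 1, n)       # Perceived Cognitive Assistance
control = rng.normal(0, 1, n)    # stand-in for the control block
complexity = (0.8 * llm + 0.6 * pca + 0.5 * llm * pca
              + 0.2 * control + rng.normal(0, 0.3, n))

# Moderated specification in the spirit of Equation (1):
# complexity ~ b0 + b1*LLM + b2*PCA + b3*(LLM x PCA) + controls
X = np.column_stack([np.ones(n), llm, pca, llm * pca, control])
beta, *_ = np.linalg.lstsq(X, complexity, rcond=None)
b1, b2, b3 = beta[1], beta[2], beta[3]  # pre-registered signs: all > 0
```

Failure of any of these sign checks on real data would refute the corresponding prediction.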
The five-step agenda is structured to ensure logical alignment with the theoretical model. Step 1 tests the enabling role of LLM capabilities (Factor 1) in reducing cognitive barriers to strategic complexity. Step 2 validates the baseline condition (Independent Variable) of retail investors’ historical preference for intuitive, low-risk strategies. Step 3 examines psychological mediation (Factor 3) via PCA and TPB constructs. Step 4 operationalizes performance attribution and tests LLM-enabled strategy integration (Factor 2) using the Dual Simulation Framework. Finally, Step 5 captures the dependent variable—observable changes in behavior and risk attitude—while embedding the analytical components necessary to address the detection and attribution of LLM-induced behavioral change.
Step 1: Cognitive Preconditions for Strategy Migration (Factor 1). The first step assesses whether LLMs reduce cognitive barriers that have historically limited retail engagement with complex instruments. This investigation draws on thematic literature reviews in AI accessibility, human–computer interaction, and fintech adoption. Comparative case studies of LLM-based tools are used to evaluate whether LLMs facilitate real-time reasoning, knowledge democratization, and cognitive scaffolding in financial decision-making. This step tests the foundational role of LLM capabilities (Factor 1) as the initiating mechanism in behavioral migration.
Step 2: Baseline Preferences and Strategic Behavior (Independent Variable). The second step empirically validates the baseline condition central to the theoretical model: retail investors’ historical preference for intuitive, low-complexity strategies. Desk research and market analysis document how platform design, user interface affordances, and limited cognitive resources have nudged retail investors toward momentum trading, dividend capture, and other heuristic-based approaches. This step establishes the counterfactual from which LLM-induced behavioral change is evaluated.
Step 3: Psychological Mediation via PCA and TPB Constructs (Factor 3). Building on the model’s psychological foundations, the third step investigates how LLM exposure alters investor psychology. Survey instruments based on the TPB will be designed to measure changes in TPB constructs—augmented by the PCA construct, which captures AI-scaffolded self-efficacy. Optional modules, including expert interviews and field experiments, assess confidence calibration and perceived competence. This step operationalizes Factor 3, testing the mediating role of PCA in enabling behavioral migration.
Step 4: Performance Attribution via Dual-Agent Simulation (Factor 2). The fourth step addresses the second challenge by testing whether LLM-assisted trades outperform human baselines using a dual-agent simulation framework comprising a VT and a DP (see Section 2.3, Appendix B). The VT applies empirically grounded cognitive degradations to model bounded rationality, representing a lower-bound human counterfactual. The DP simulates psychologically plausible, LLM-independent decision-making through structured prompting. This simulation framework operationalizes Factor 2 of the theoretical model by evaluating whether LLMs facilitate superior strategy integration and decision quality. Performance is benchmarked across Sharpe ratios, drawdown metrics, and trade consistency. Triangulated inference logic distinguishes true cognitive uplift from heuristic mimicry or distortion effects.
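Two of Step 4's benchmarking metrics admit minimal implementations (trade-consistency scoring is omitted; annualization conventions are left out as an assumption):

```python
import statistics

def sharpe(returns, rf=0.0):
    """Per-period Sharpe ratio of a return series (annualization omitted)."""
    excess = [r - rf for r in returns]
    sd = statistics.pstdev(excess)
    return statistics.mean(excess) / sd if sd else 0.0

def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

Computing these for the VT, DP, and LLM-assisted trade streams on identical inputs yields the cross-agent benchmark comparison the step describes.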
Step 5: Dependent Variable—Change in Behavior and Risk Attitude. The final step captures the dependent variable: measurable shifts in retail investor behavior and risk orientation. Change is operationalized via the BSI, trading frequency, portfolio concentration, and volatility exposure. This step incorporates the full apparatus necessary to address the first challenge, structured around four analytical components: (1) temporal attribution and exposure anchoring; (2) identification of behavioral markers and strategy migration; (3) triangulation of psychological mechanisms via PCA diagnostics; and (4) attribution integrity using matched-asset and macro-event controls. These processes ensure that observed behavioral changes are causally linked to LLM exposure rather than confounded by exogenous factors. Market-level diagnostics are incorporated to evaluate whether individual behavioral shifts aggregate into detectable market anomalies. EMH tests assess deviations from informational efficiency, while AMH diagnostics interpret such deviations as bounded rational adaptation or reflexive feedback. Together, these frameworks support attribution across both micro (behavioral) and macro (market) levels, distinguishing cognitive uplift from collective distortion or convergence effects.
The empirical agenda integrates both quantitative and qualitative methodologies to ensure comprehensive validation of the proposed model. The initial steps employ established techniques including thematic literature reviews, comparative case studies, and desk research, widely recognized for their role in theory-driven behavioral investigation and technology evaluation (Webster & Watson, 2002; R. K. Yin, 2018). Survey instruments and field experiments grounded in the Theory of Planned Behavior (Ajzen, 1991) and validated through psychometric standards (DeVellis, 2016) capture psychological mediation and perceived cognitive assistance. The simulation-based performance attribution and market-level diagnostics build on rigorous agent-based and econometric methods, detailed in Appendix B and Appendix C. This mixed-method design reflects best practices in behavioral finance and information systems research, supporting both construct validity and empirical robustness across micro- and macro-level analyses.
Taken together, this five-step agenda provides a coherent, theoretically grounded, and empirically rigorous framework for validating the proposed model. By integrating psychological mediation, simulation-based benchmarking, and market diagnostics, the agenda enables causal inference in environments where traditional control groups are infeasible, advancing both theoretical understanding and methodological practice in LLM-augmented retail investing research.

4.2. Detecting LLM-Induced Behavioral Change in Retail Investing

This section operationalizes the causal pathway: LLM Exposure → Psychological Mediators (TPB, PCA) → Behavioral Shift (BSI) → Market Diagnostics (EMH/AMH). Appendix A and Appendix C provide full operational, statistical, and formulaic details; here we focus on theoretical anchoring and empirical linkage.
  • Temporal Exposure Anchoring. The launches of ChatGPT (November 2022) and GPT-4 (March 2023) serve as exogenous structural breaks in time-series analyses of retail trading behavior. We follow event-study methodology refined for behavioral finance, aligning periods of LLM diffusion with shifts in investor activity. Diffusion intensity is proxied by LLM-related search and engagement metrics (e.g., Google Trends, Reddit, YouTube), a strategy grounded in the literature on technology adoption and investor attention (Kirtac & Germano, 2024).
  • Strategy Shifts in Retail Trading (BSI Changes). Behavioral indicators—such as multi-leg options trading, increased turnover, portfolio concentration, and reduced holding durations—are used to capture shifts in investor strategies (Lopez-Lira & Tang, 2024). These serve as observable manifestations of perceived behavioral control (PCA) and overconfidence, consistent with prior work linking psychological distortions to risky retail trading (Glaser & Weber, 2007; Han et al., 2022). These metrics compose the Behavioral Shift Index (BSI), defined and mathematically specified in Appendix A.
  • Psychological Mediation via TPB and PCA Calibration. To directly investigate psychological pathways, we implement TPB-based surveys measuring changes in its constructs following LLM exposure. TPB has robust empirical support across domains and informs our understanding of intention-to-behavior processes (Ajzen, 1991; Gimmelberg et al., 2025b). Qualitative interviews and confidence-calibration tasks supplement these surveys to detect PCA-induced miscalibration, in which increased confidence does not correspond to improved trading outcomes.
  • Attribution Control via Market Diagnostics (EMH/AMH). We apply matched-asset counterfactuals—comparing LLM-associated securities to similar but presumably unexposed ones—enabling difference-in-differences estimation that accounts for confounders such as macro shocks, earnings announcements, and social sentiment (Tetlock, 2007). Patterns of behavioral change accompanied by price movement would align with competitive equilibrium. In contrast, persistent behavior with negligible price adjustment may indicate AMH dynamics, whereas sustained inefficiencies could suggest EMH deviations.
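The matched-asset difference-in-differences contrast described above can be illustrated with a minimal sketch. The panel layout, column names, and numbers below are hypothetical placeholders, not study data; the estimator is the standard two-by-two contrast of group means.

```python
import pandas as pd

# Hypothetical asset-week panel; column names are illustrative, not from the study.
# treated = 1 for LLM-associated securities, post = 1 after the GPT-4 launch week.
panel = pd.DataFrame({
    "turnover": [0.8, 0.9, 1.0, 1.6, 0.7, 0.8, 0.9, 1.0],
    "treated":  [1, 1, 1, 1, 0, 0, 0, 0],
    "post":     [0, 0, 1, 1, 0, 0, 1, 1],
})

def did_estimate(df: pd.DataFrame, outcome: str) -> float:
    """Difference-in-differences on group means:
    (treated post - treated pre) - (control post - control pre)."""
    m = df.groupby(["treated", "post"])[outcome].mean()
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

print(did_estimate(panel, "turnover"))  # ≈ 0.25 for the toy numbers above
```

In the full design, the same contrast would be estimated in a regression with controls for macro shocks, earnings announcements, and social sentiment, as noted above.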

4.3. Dual Simulation Benchmarking: The Virtual Trader and Digital Persona Framework

In this section, we address the fundamental evaluation challenge: the absence of a valid human control group once LLMs are integrated into trading workflows. Such a control would require matched LLM-naive investors operating over prolonged periods under equivalent conditions—a setup that is logistically, ethically, and epistemologically untenable once cognitive exposure to LLMs has occurred. To overcome this, we propose a Dual-Agent Simulation Framework, depicted in Figure 3. Two complementary agents—the Virtual Trader (VT) and the Digital Persona (DP)—serve as computational stand-ins for human decision-making baselines. The VT models a lower bound on performance, constrained by documented cognitive limitations, while the DP captures mid-range yet psychologically plausible investor behavior. Their juxtaposition against LLM-augmented outcomes facilitates epistemic triangulation, offering structured inference in lieu of unobtainable human controls. This framework provides an empirically grounded alternative for assessing LLM-augmented trading performance.

4.3.1. Virtual Trader: A Cognitively Degraded Counterfactual

The VT serves as a cognitively bounded counterfactual, designed to estimate the lower boundary of plausible human decision performance under real-world constraints. Rather than representing a sub-human or pathologically impaired agent, it models unaided retail investor cognition operating under empirically documented limitations: attentional fatigue, input omission, recency bias, volatility misestimation, and heuristic simplification. These degradation parameters do not simulate irrationality or failure but instead reflect well-established cognitive constraints observed in behavioral finance and decision science (H. A. Simon, 1955; Barberis & Thaler, 2002). By framing the VT as a lower bound not on absolute human potential but on unaided, real-time performance under cognitive load, the model clarifies what LLMs are expected to overcome. Importantly, this simulation does not benchmark against idealized expert behavior, but against a psychologically grounded baseline reflective of the conditions under which most retail investors operate. This provides a meaningful performance comparator for testing whether LLM augmentation offers true cognitive uplift in strategy design, execution, and outcome.
The degradation process draws on the bounded rationality framework (H. A. Simon, 1955) and reflects cognitive asymmetries documented in experimental markets, such as misestimation of implied volatility, recency bias, and limited multimodal integration. The methodology enables performance attribution—across Sharpe ratios, entry-exit timing, or drawdown metrics—while ensuring that the only difference between LLM and VT trades is the presence or absence of cognitive augmentation.
This approach embeds a key epistemological assumption: that LLMs represent an optimal or near-optimal strategic baseline. Recent research warns against such assumptions. LLMs exhibit hallucination, narrative overconfidence, and probabilistic miscalibration, especially in domains involving structured reasoning and stochastic inference (Ganguli et al., 2022a; L. Huang et al., 2025; Varshney et al., 2025). Treating the LLM as an infallible reference risks overestimating its advantage and must be mitigated via validation protocols, including signal corroboration, consensus scoring, and rejection of low-confidence outputs.
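One possible form of the mitigation protocols mentioned above (signal corroboration, consensus scoring, rejection of low-confidence outputs) is sketched below. The function name, thresholds, and vote labels are illustrative assumptions, not part of the preregistered protocol.

```python
from collections import Counter
from typing import Optional

def consensus_signal(votes: list[str], confidences: list[float],
                     min_conf: float = 0.6, min_agreement: float = 0.5) -> Optional[str]:
    """Hypothetical guardrail: keep only sufficiently confident model outputs,
    then require a majority among the survivors; otherwise reject the signal."""
    kept = [v for v, c in zip(votes, confidences) if c >= min_conf]
    if not kept:
        return None  # all outputs fell below the confidence floor
    signal, n = Counter(kept).most_common(1)[0]
    return signal if n / len(kept) > min_agreement else None

# Three independent LLM queries on the same trade idea (toy values):
print(consensus_signal(["buy", "buy", "hold"], [0.9, 0.7, 0.4]))  # "buy"
print(consensus_signal(["buy", "sell"], [0.9, 0.9]))              # None (no majority)
```

Any such thresholds would need to be version-locked in the protocol before execution, consistent with the auditability requirements stated later in the agenda.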
While the VT models cognitive constraints across validated dimensions, the current implementation treats these as modular and additive for purposes of tractability and simulation stability. This simplification facilitates calibration and attribution in the baseline framework. However, behavioral research highlights that such impairments often interact nonlinearly—e.g., fatigue amplifies attentional lapses, and emotional arousal skews volatility interpretation (Loewenstein et al., 2001; Kahneman, 2011). These limitations are acknowledged and addressed in Appendix A.3, which introduces a rule-based interdependency module as a forward-looking enhancement for simulating cascading cognitive effects under stress.
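The modular, additive degradation described above might be implemented along the following lines. All operator names and parameter values (omission rate, decay, volatility bias) are illustrative placeholders rather than calibrated estimates from the study.

```python
import random

# Illustrative degradation operators for the Virtual Trader; values are placeholders.
def omit_inputs(signals: dict, omission_rate: float, rng: random.Random) -> dict:
    """Input omission: each signal is independently dropped with fixed probability."""
    return {k: v for k, v in signals.items() if rng.random() >= omission_rate}

def recency_weight(series: list[float], decay: float) -> float:
    """Recency bias: exponentially down-weight older observations."""
    weights = [decay ** (len(series) - 1 - i) for i in range(len(series))]
    return sum(w * x for w, x in zip(weights, series)) / sum(weights)

def misestimate_vol(true_vol: float, bias: float) -> float:
    """Volatility misestimation: multiplicative distortion of implied volatility."""
    return true_vol * (1 + bias)

rng = random.Random(42)
signals = {"momentum": 0.8, "iv_skew": -0.2, "macro": 0.1}
degraded = omit_inputs(signals, omission_rate=0.3, rng=rng)
perceived_trend = recency_weight([0.01, 0.02, -0.05], decay=0.5)  # last return dominates
perceived_vol = misestimate_vol(0.25, bias=0.2)                   # 20% overestimate
```

Because each operator acts independently on the information set, the sketch reproduces the modular, additive structure; the rule-based interdependency module of Appendix A.3 would instead let one impairment condition the parameters of another.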

4.3.2. Digital Persona: Human Plausibility via LLM Emulation

The Digital Persona (DP) provides a conceptually distinct counterfactual. Rather than degrading an LLM output, this approach builds behavioral emulation from the ground up. Each persona is constructed using structured prompts encoding psychological traits, demographic attributes, and trading history, instructing the LLM to “think like” a specific type of retail investor. This technique draws on agent-based simulation theory and recent advances in LLM-enabled computational social science (Gao et al., 2023; H. Jiang et al., 2024; Hartley et al., 2025).
DPs offer mid-range performance estimates rooted in plausibility rather than constraint. They integrate constructs such as financial literacy, emotional reactivity, risk aversion, and platform familiarity. Unlike the VT, they do not assume LLM optimality and are evaluated based on whether their responses align with typical human behaviors under given constraints. Recent empirical studies demonstrate that LLM personas can exhibit realistic investment behavior when designed using psychological trait encoding, such as the Big Five personality model (Borman et al., 2024; Hartley et al., 2025).
However, DPs face significant ecological limitations. They do not learn from error, lack reward-based feedback loops, and exhibit no temporal memory of past trades—traits that are central to real investor cognition (Lux & Zwinkels, 2018). Furthermore, LLM-generated personas are sensitive to temperature settings, prompt design, and output randomness, which raises concerns regarding behavioral coherence and reproducibility, especially in longitudinal simulations (Gao et al., 2023; H. Jiang et al., 2024).
To mitigate such risks, simulations will be run under constrained randomness parameters with standard temperature controls and prompt regularization. Nonetheless, these limitations may challenge the authenticity of simulated investor behavior, particularly in high-stakes, feedback-sensitive domains like trading, which warrant further research.

4.3.3. Epistemic Triangulation and Inference Logic

This dual-agent design supports three structured inference scenarios:
  • Both baselines underperform the LLM → Strong evidence for LLM-enabled cognitive uplift and strategic enhancement.
  • Only the degraded Virtual Trader underperforms → Indicates that the LLM mimics best-practice human logic without necessarily surpassing it.
  • Digital Persona outperforms the LLM → Suggests that LLMs may introduce distortions or risk-taking strategies inconsistent with typical investor behavior.
This triangulation approach (see Figure 3) aligns with emerging standards in simulation-based causal attribution, particularly in environments where treatment exposure (LLM usage) alters the cognitive architecture irreversibly (Holland, 1986; Athey & Wager, 2019). It also reflects broader principles of counterfactual realism and hybrid benchmarking in AI–human comparison studies (Gui & Toubia, 2023; Anthis et al., 2025).
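The three inference scenarios can be expressed as a small decision rule. The scoring inputs (e.g., Sharpe ratios) and the tie-handling below are assumptions for illustration only.

```python
def triangulate(lat: float, vt: float, dp: float) -> str:
    """Map risk-adjusted performance of the LLM-Augmented Trader (lat) against
    the Virtual Trader (vt) and Digital Persona (dp) baselines onto the three
    inference scenarios. Inputs are illustrative scores; thresholds and
    tie-handling are placeholders, not preregistered rules."""
    if lat > vt and lat > dp:
        return "cognitive uplift"        # both baselines underperform the LLM
    if lat > vt and dp >= lat:
        return "mimicry of human logic"  # only the degraded VT underperforms
    if dp > lat:
        return "possible distortion"     # Digital Persona outperforms the LLM
    return "inconclusive"

print(triangulate(lat=1.2, vt=0.4, dp=0.8))  # "cognitive uplift"
```

In practice each comparison would rest on effect sizes with uncertainty rather than point comparisons, as specified in the bias-quantification plan of Section 4.4.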
To address potential algorithmic myopia and improve generalizability, a human-validation stage is specified as part of the preregistered empirical agenda (see Appendix B). Its execution and results lie outside the scope of the present conceptual contribution; the cited protocol defines how it will be conducted (Gao et al., 2023; Q. Wang et al., 2025).
As specified in our preregistration protocol (Gimmelberg et al., 2025a), the LLM-Augmented Trader (LAT) operates as a capability-routed, multi-model system. Tasks are dispatched to the most suitable large language model (LLM) based on a maintained capability matrix: complex tool-orchestration and multi-step tool-calling are routed to the top-performing GPT-5 family model; multimodal parsing that requires combined PDF-and-image handling is routed to the top-performing Claude Sonnet family model; task types where current evidence indicates weaker tool-calling are not routed to that family until evidence changes. Each task invocation is recorded in a registry capturing provider, model family, model alias, release channel, client/library version, and prompt/protocol version. We distinguish two change classes: within-family upgrades (e.g., Claude Sonnet 4.1 to 4.5; GPT-5 minor refresh) are treated as minor, whereas cross-family substitutions (e.g., moving a tool-orchestration task from Gemini to GPT-5) are treated as major. All changes are time-locked and posted to the Open Science Framework (OSF) change log (Gimmelberg et al., 2025a).
Validation is structured on two layers: first, information-identical replays by the Virtual Trader (a bounded-rule comparator) and a Digital Persona (a behaviorally specified counterfactual) isolate decision-maker effects; second, a small practitioner vignette panel is specified to rate plausibility and ecological fit.
Data validation accompanies both layers via time-stamped decision snapshots and vendor cross-checks (prices, implied volatility, open interest) with schema validation of machine-readable artifacts. This architecture accommodates rolling model improvements while preserving auditability and replication at the capability-class level; empirical execution belongs to the separate, preregistered study.
Asset selection in this study is procedural and dynamic: at each decision timestamp, candidates are selected by a time-locked screener; therefore, no fixed ex ante asset list exists. For replication, the realized asset set and the strategy instances corresponding to the reported runs will be archived at study launch and on each protocol version bump as OSF artifacts (time-stamped screener exports and contemporaneous option-chain snapshots), with Appendix B, Appendix B.1 providing the sampling rules, consistent strategy families, and the data collection and quality control procedures applied at decision time. Concrete trading strategy families are enumerated in Appendix B, Appendix B.1 and match the OSF preregistered workflow.

4.4. Behavioral Bias: Quantification and Controls

This study pre-specifies bias quantification and control logic before data collection.
Our amplification test is defined ex ante as pairwise differences in bias metrics between LAT and each baseline (LAT–VT; LAT–DP). We report effect sizes and uncertainty for (i) probability calibration/over-precision, (ii) confirmation/selective exposure in evidence slates, and (iii) the disposition effect, with signs and magnitudes interpreted as amplification if LAT is statistically worse than both baselines under identical information. Directionally, LLM scaffolding may reduce selective exposure by broadening evidence collection, yet it can increase over-precision via fluently narrated but overconfident rationales; if such contrasts fail to materialize once basic cost/risk controls are applied, we will conclude no amplification.
Controls are implemented via two replay agents that receive the same information as the LLM-Augmented Trader (LAT): a Virtual Trader (VT; bounded-rule comparator) and a Digital Persona (DP; behaviorally specified baseline). These serve as control groups at this stage; the practitioner vignette panel rates plausibility only. We will quantify three bias families using established measures. First, calibration error of decisions is evaluated with proper scoring/Brier-based calibration (and, where applicable, coverage of stated intervals), following the verification literature on reliability and strictly proper scoring rules (Dawid, 1982; Tilmann & Raftery, 2007). Second, the disposition effect is measured using the standard Proportion of Gains Realized (PGR) minus Proportion of Losses Realized (PLR) specification, adapted to options via mark-to-market classification of gains and losses at decision time (Shefrin & Statman, 1985; Odean, 1998; Weber & Camerer, 1998). Third, selective-exposure/confirmation is operationalized as the share of belief-congruent evidence in the decision’s recorded evidence slate, anchored in the selective-exposure meta-analysis and a finance-specific confirmation bias literature (Nickerson, 1998; Hart et al., 2009; Pouget et al., 2017). We avoid ad hoc thresholds; any change with inferential impact will be version-bumped in the protocol and disclosed prior to execution. The experiment has not started; preregistration is finalized. Operational rubrics, inputs, and worked audit examples for all bias metrics are provided in Appendix B, and in the preregistered protocol (Gimmelberg et al., 2025a).
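The three pre-specified bias families can be sketched with simple list-based inputs. The function names and toy values below are illustrative assumptions, while the formulas follow the cited specifications: the Brier score as a strictly proper scoring rule, Odean's PGR minus PLR, and the congruent-evidence share.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error of stated probabilities against realized 0/1 outcomes
    (a strictly proper scoring rule; lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def disposition_effect(gains_realized: int, gains_paper: int,
                       losses_realized: int, losses_paper: int) -> float:
    """Odean-style PGR - PLR; positive values indicate a disposition effect.
    For options, gains/losses are classified mark-to-market at decision time."""
    pgr = gains_realized / (gains_realized + gains_paper)
    plr = losses_realized / (losses_realized + losses_paper)
    return pgr - plr

def selective_exposure(evidence_labels: list[str]) -> float:
    """Share of belief-congruent items in a decision's recorded evidence slate."""
    return evidence_labels.count("congruent") / len(evidence_labels)

print(brier_score([0.9, 0.6, 0.2], [1, 1, 0]))                       # ≈ 0.07
print(disposition_effect(30, 70, 10, 90))                            # ≈ 0.2
print(selective_exposure(["congruent", "congruent", "incongruent"])) # ≈ 0.667
```

The amplification test would then compare each metric pairwise (LAT–VT; LAT–DP) on information-identical replays, with uncertainty estimates as described above.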

5. Conclusions

This study proposes an operational behavioral finance model of LLM-augmented retail trading centered on perceived cognitive assistance (PCA). PCA may enable informed adoption of complex strategies or inflate confidence beyond competence; distinguishing these pathways is the empirical task addressed by Equation (1), the Behavioral Shift Index (Equation (2)), and the preregistered bias diagnostics. The construct is distinct from cognitive offloading, trust in automation, and perceived usefulness because it concerns investors’ self-appraisal of capability under real-time AI scaffolding (Logg et al., 2019; Jakesch et al., 2023). Empirically separating genuine capability gains from inflated self-assessment is difficult in real-time trading. The preregistered diagnostics and the BSI mitigate but do not eliminate this risk; identification of PCA is treated as a falsifiable claim corroborated by convergent evidence from surveys, behavior, and bias probes (Logg et al., 2019; Karinshak et al., 2023). Future work must therefore not only track behavioral outcomes but also refine tools for parsing underlying cognitive states.
Theoretically, PCA extends the Theory of Planned Behavior to AI-augmented contexts by specifying a form of perceived control that arises from machine scaffolding. Empirically, studies must distinguish actual capability enhancement from perceived enhancement; practically, platforms and regulators should anticipate confidence outrunning competence and design appropriate guardrails.
Validation follows a preregistered five-step agenda: assess cognitive preconditions and baseline strategy preferences; measure psychological mediation via PCA, attitudes, and norms; implement a dual-agent simulation (Virtual Trader via cognitive degradation; Digital Persona via structured prompting) for causal benchmarking; and evaluate behavioral change using BSI-linked metrics and performance outcomes. The agenda tests whether LLMs enable strategic sophistication or amplify behavioral risk when live control groups are infeasible; if sign predictions and BSI dynamics fail, the model is refuted.
Limitations—scope and constructs. The model posits directional migration from intuitive to complex strategies under LLM support, yet real-world behavior is nonlinear, shock-sensitive, and prone to intention–behavior gaps (Shefrin & Statman, 1985; Barberis et al., 1999; A. W. Lo, 2004). Intention–behavior gaps in fast-paced domains such as trading are well documented (Webb & Sheeran, 2006; Sheeran & Webb, 2016), raising doubts about whether perceived capability consistently translates into execution, particularly under stress or ambiguity.
Limitations—construct overlap and interaction risk. While PCA offers a theoretically distinct mechanism, introducing new measurement and interpretation complexities, it shares features with automation trust (Parasuraman & Riley, 1997), cognitive offloading (Risko & Gilbert, 2016), and perceived usefulness (Davis, 1989). This proximity raises concerns about discriminant validity. Moreover, the concurrent use of TPB, TAM, and Risk-as-Feelings introduces a risk of internal interaction effects. For example, vivid emotion-laden scenarios (Loewenstein et al., 2001) may override Attitudinal or Normative intention components. The model currently treats these mechanisms as additive rather than dynamic, limiting its predictive granularity.
Limitations—testability and generalizability. Behavioral proxies imperfectly capture internal states (Buçinca et al., 2020), so we triangulate surveys, behavior, and bias probes and treat inferences as provisional. External validity may be constrained by early-adopter samples (Venkatesh et al., 2003; Khan & Shabbir, 2025) and abstraction in simulation environments; future work should integrate brokerage APIs or high-fidelity market simulators to reintroduce frictions (Byrd et al., 2019; Vyetrenko et al., 2020) and broaden assets and settings (Henning et al., 2025).

Author Contributions

D.G.: conceptualization, methodology, visualization, writing—original draft preparation; I.L.: methodology, writing—review and editing, supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

GenAI Use Statement: During the preparation of this manuscript, the author(s) used Anthropic Claude Opus 4.1 and Claude Sonnet 4, OpenAI ChatGPT-4o and ChatGPT-5 with reasoning capabilities for the purposes of literature search assistance, retrieval-augmented generation (RAG) for source identification, initial draft text generation to support non-native English writing, and typographical error checking. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Behavioral Models Tests and Components

Appendix A.1. Components and Units of Measurement in the TPB-Based Regression Model

The regression equation and the units of measurement of its components are as follows:
B_i(t) = β0 + β1·LLM_i(t) + β2·PCA_i(t) + β3·[LLM_i(t) × PCA_i(t)] + β4·A_i(t) + β5·SN_i(t) + γᵀ·Controls_i(t) + ε_i(t)  (A1)
where
  • Bi(t): behavioral outcome for investor i at time t (dependent variable; specified per test; see Table A1).
  • LLMi(t): LLM engagement intensity.
  • PCAi(t): perceived cognitive assistance.
  • Ai(t), SNi(t): TPB components (attitude toward complex strategies; subjective norms).
  • Controlsi(t): vector of control covariates.
  • β0–β5: regression coefficients, representing the estimated effects of each explanatory variable on the behavioral outcome Bi(t).
  • γ: coefficient vector on Controlsi(t).
  • εi(t): residual error.
Interpretation of coefficients is determined by how the dependent variable Bi(t) is operationalized.
Operationalization crosswalk: Ai(t), SNi(t), and PCAi(t) are defined in the main text; LLMi(t) is engagement intensity; Bi(t) is selected from Table A1; the interaction LLMi(t) × PCAi(t) encodes moderated scaffolding implied by the theoretical model.
Table A1. Possible operationalizations of Bi(t).
Operationalization of Bi(t) | Unit of Measurement | Interpretation
Number of complex trades per week | Count (integer) | A discrete behavior frequency
Proportion of complex trades | Ratio (0–1) | A share or percentage
Portfolio risk score | Volatility or risk points | Risk exposure level
BSI score | Unitless composite index | Behavioral complexity measure
Change in strategy complexity | Delta in ordinal/index score | Magnitude of strategy shift
Self-assessment of behavior (Likert) | Ordinal (1–5/1–7) | Perceived behavior intensity
Implementation notes for Equation (A1). The unit of analysis is an investor observed over regular time periods (for example, weeks; days where logs allow). The dependent variable is one of the pre-specified behavioral measures listed in Appendix A.1 (for example, frequency of option trades, share of multi-leg strategies, concentration across underlyings). The main explanatory variables are the investor’s exposure to LLM assistance at time t, perceived control or complexity (PCA), and their interaction. Estimation will use ordinary least squares with standard errors that are robust to repeated observations per investor. If the dependent variable is a non-negative count or a bounded rate, a generalized linear model with an appropriate link function will be used. Baseline adjustments include investor-specific effects and time effects to absorb common shocks. Section 4.1 outlines the identification and robustness steps that will be applied in the empirical phase.
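The estimation strategy for Equation (A1) can be sketched on synthetic data. The data-generating process, coefficient values, and variable scales below are assumptions used only to show that the moderated term β3 is recoverable by ordinary least squares; the clustered standard errors and fixed effects described in the implementation notes are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # synthetic investor-week observations

# Synthetic regressors for Equation (A1); scales are illustrative only.
LLM = rng.uniform(0, 1, n)   # LLM engagement intensity
PCA = rng.uniform(1, 7, n)   # perceived cognitive assistance (Likert-scaled)
A   = rng.uniform(1, 7, n)   # attitude toward complex strategies
SN  = rng.uniform(1, 7, n)   # subjective norms
eps = rng.normal(0, 0.1, n)  # residual error

# Data-generating process with a known moderated effect (beta3 = 0.5):
B = 1.0 + 0.8 * LLM + 0.3 * PCA + 0.5 * LLM * PCA + 0.2 * A + 0.1 * SN + eps

# OLS on the Equation (A1) design matrix, including the interaction term.
X = np.column_stack([np.ones(n), LLM, PCA, LLM * PCA, A, SN])
beta_hat, *_ = np.linalg.lstsq(X, B, rcond=None)
print(beta_hat[3])  # estimate of beta3, the moderated-scaffolding coefficient
```

With real data, the same design matrix would be passed to a panel estimator with investor and time effects and cluster-robust inference, or to a generalized linear model for count or rate outcomes.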

Appendix A.2. Components and Units of Measurement in the BSI Formula

BSI_i,t = w1·ΔMultiLeg_i,t + w2·ΔFrequency_i,t + w3·ΔConcentration_i,t + w4·ΔVolExposure_i,t
Σ_{k=1}^{4} wk = 1,  wk ≥ 0
where
  • Symbols: i indexes investors; t indexes time; k ∈ {1, 2, 3, 4} indexes the four BSI components (MultiLeg, Frequency, Concentration, VolExposure); wk are non-negative component weights with Σ_{k=1}^{4} wk = 1; w = (w1, …, w4).
  • t: a specific time point or interval (e.g., week 1 after ChatGPT launch).
  • ΔMultiLegi,t: Change in proportion of multi-leg options trades.
  • ΔFrequencyi,t: Change in trading frequency.
  • ΔConcentrationi,t: Change in portfolio concentration (e.g., top-3 asset weight).
  • ΔVolExposurei,t: Change in exposure to implied volatility.
Weights w1 to w4 can be calibrated based on pilot data or expert input. Each component represents a change metric over time and is typically normalized.
BSI operationalizes the theoretical lenses into observable components: ΔMultiLeg and ΔFrequency reflect intention-to-execution channels linked to TPB/PCA; ΔVolExposure captures affect-driven divergence highlighted by Risk-as-Feelings; concentration reflects strategic reallocation. As such, BSI provides the cohort-level diagnostic complementary to Equation (1).
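Once the components are normalized, the BSI formula defined above reduces to a constrained weighted sum. A minimal sketch, with placeholder weights and component changes:

```python
def bsi(deltas: dict, weights: dict) -> float:
    """Behavioral Shift Index: weighted sum of normalized change components;
    weights must be non-negative and sum to one."""
    assert all(w >= 0 for w in weights.values())
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * deltas[k] for k in weights)

# Illustrative weights and standardized component changes (placeholder values,
# pending the calibration on pilot data or expert input described above):
weights = {"multi_leg": 0.4, "frequency": 0.2, "concentration": 0.2, "vol_exposure": 0.2}
deltas  = {"multi_leg": 0.15, "frequency": 0.30, "concentration": 0.05, "vol_exposure": 0.10}
print(bsi(deltas, weights))  # 0.4*0.15 + 0.2*0.30 + 0.2*0.05 + 0.2*0.10 ≈ 0.15
```

The assertions encode the weight constraints stated with the formula, so a miscalibrated weight vector fails loudly rather than silently rescaling the index.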
Table A2. Units of each Δ component.
Component | Δ Variable Description | Unit of Measurement
ΔMultiLeg_t | Change in proportion of multi-leg option trades | Ratio (0–1) or percentage points
ΔFrequency_t | Change in trading frequency | Trades per period (count)
ΔConcentration_t | Change in portfolio concentration | Ratio (0–1) or percentage points
ΔVolExposure_t | Change in volatility exposure | Volatility units or standardized scores
Table A3. Comparison: TPB Regression vs. BSI Model.
Aspect | TPB Regression Model | BSI Composite Index
Formula | B_i(t) = β0 + β1·LLM_i(t) + … | BSI_t = w1·ΔMultiLeg_t + …
Dependent Variable | Behavior/intention of individual i at time t | Index at time t (cohort level)
Output Unit | Likert, %, or count | Unitless composite
Input Variables | TPB constructs + LLM exposure | Trading behavior metrics
Time scale | Cross-sectional | Time-series or panel
Use case | Explains how LLMs shift intentions | Measures actual behavior change
Level of Analysis | Micro (individual) | Meso/Macro (cohort or group)
Methodology | Regression (causal inference) | Composite diagnostics

Appendix A.3. Complementarity Within the Research Model

The TPB-based regression model and the BSI serve complementary purposes within the research framework. The TPB model provides a psychological and explanatory foundation. It quantifies how cognitive variables such as Attitude, Subjective Norms, and Perceived Behavioral Control—along with LLM engagement—affect behavioral intention and individual decision-making. This is essential for understanding the mechanisms by which LLMs influence investor psychology and behavior at the micro level.
In contrast, the BSI is diagnostic and empirical. It measures whether the theorized behavioral changes actually materialize in observable trading activity at the cohort or market level. The BSI aggregates multiple dimensions of trading behavior—strategy complexity, frequency, concentration, and volatility exposure—into a unified, time-sensitive index. It is especially valuable for detecting shifts over time or between exposed and non-exposed investor groups.
Together, these two models enable a full-circle investigation: the TPB regression tests the cognitive causality of LLM-driven behavior, while the BSI validates those effects in market behavior. They support both causal inference (via regression) and behavioral diagnostics (via index tracking), ensuring that psychological shifts are not only statistically modeled but also empirically verified in real-world trading data.

Appendix A.4. Convergent/Discriminant Validity Test for PCA

To establish construct validity, PCA will be subjected to convergent and discriminant validity testing using confirmatory factor analysis (CFA). Survey items will be developed in parallel with validated scales for perceived usefulness (Venkatesh & Davis, 2000), trust in automation (J. D. Lee & See, 2004; Hoff & Bashir, 2015), and overconfidence bias (Bruine De Bruin et al., 2007; Parker et al., 2020). Discriminant validity will be evaluated through the Fornell-Larcker criterion (Fornell & Larcker, 1981) and HTMT ratios (Henseler et al., 2015). In addition, exploratory factor analysis (EFA) will be conducted on pilot data to ensure PCA items form a distinct factor structure reflective of AI-scaffolded self-efficacy in decision-making under complexity (Y.-Y. Wang & Chuang, 2023; Morales-García et al., 2024).
To operationalize PCA, we design psychometric items that specifically target AI-scaffolded self-efficacy rather than general trust or perceived utility. The following illustrative items reflect discriminant targeting:
  • PCA (AI-scaffolded self-efficacy):
    “I feel confident executing complex trading strategies because the AI helps me understand the required steps.”
  • Automation Trust (system reliability):
    “I trust AI systems to make accurate and responsible trading recommendations.”
  • Perceived Usefulness (task enhancement):
    “AI systems improve the overall quality of my trading performance.”
These items will be included in confirmatory factor analysis (CFA) to establish convergent and discriminant validity. The PCA scale specifically targets user belief in their own cognitive capability under AI support, not belief in the AI system itself.
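The HTMT discriminant-validity check planned above can be computed directly from an item-correlation matrix. The toy matrix below assumes two items per construct and is purely illustrative; the 0.85 cutoff is the common criterion from Henseler et al. (2015).

```python
import numpy as np

def htmt(corr: np.ndarray, idx_a: list, idx_b: list) -> float:
    """Heterotrait-monotrait ratio: mean absolute between-construct item
    correlation divided by the geometric mean of the within-construct ones."""
    hetero = np.mean([abs(corr[i, j]) for i in idx_a for j in idx_b])
    mono_a = np.mean([abs(corr[i, j]) for i in idx_a for j in idx_a if i < j])
    mono_b = np.mean([abs(corr[i, j]) for i in idx_b for j in idx_b if i < j])
    return hetero / np.sqrt(mono_a * mono_b)

# Toy item-correlation matrix: items 0-1 load on PCA, items 2-3 on automation trust.
R = np.array([
    [1.0, 0.7, 0.3, 0.2],
    [0.7, 1.0, 0.2, 0.3],
    [0.3, 0.2, 1.0, 0.6],
    [0.2, 0.3, 0.6, 1.0],
])
print(htmt(R, [0, 1], [2, 3]))  # well below the 0.85 discriminant-validity cutoff
```

A ratio near or above 0.85 would signal that the PCA items do not separate cleanly from automation-trust items, triggering the item-revision loop implied by the EFA pilot stage.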

Appendix B. Simulation Agent Design: Virtual Trader and Digital Persona Framework for Causal Inference

This appendix outlines the methodological design of the dual-agent simulation framework used to evaluate the causal contribution of large language model (LLM) augmentation in retail investor behavior. The framework comprises two epistemically distinct but empirically comparable agents: the Virtual Trader, constructed through a degradation-based simulation of bounded rational cognition, and the Digital Persona, generated via behaviorally conditioned prompting to emulate plausible retail investor decision patterns. Together, these agents enable counterfactual inference in environments where traditional control groups are infeasible due to the irreversible cognitive effects introduced by LLM adoption (Gerlich, 2025; H.-P. Lee et al., 2025).
The Virtual Trader serves as a cognitively constrained counterfactual, allowing for decision-level pairwise comparisons against LLM-assisted trades using identical informational inputs. Its architecture applies degradation across ten empirically validated cognitive domains to simulate the limitations of unaided human reasoning under financial uncertainty. This approach is grounded in bounded rationality theory (H. A. Simon, 1955; Giarlotta & Petralia, 2024), the AMH (A. W. Lo, 2004), and recent studies documenting persistent human–AI asymmetries in volatility interpretation, multimodal reasoning, and emotional regulation (Z. Wang et al., 2023; K. Zhang et al., 2025).
The Digital Persona, by contrast, is designed to model human plausibility rather than constraint. It is constructed using structured prompts that encode demographic, psychological, and experiential investor attributes. By querying the LLM to “think like” a representative retail trader under given market conditions, this agent provides a behaviorally realistic baseline for evaluating whether LLM-assisted trades deviate from or merely replicate human decision heuristics. The logic draws on advances in generative agent modeling (J. S. Park et al., 2023; Ghaffarzadegan et al., 2024) and computational behavioral simulation (Lux & Zwinkels, 2018; Gao et al., 2024), and addresses the need for multi-perspective benchmarking in AI–human comparison research (Han et al., 2024; Ying et al., 2025).
This dual-agent design allows for triangulated causal inference: the Virtual Trader estimates the lower bound of unaided human cognition, while the Digital Persona reflects a demographically plausible behavioral midpoint. Their comparison to validated LLM-assisted trades facilitates multidimensional performance attribution, helping disentangle true cognitive augmentation from strategy mimicry or behavioral distortion.
The dual-agent simulation framework involves several methodological constraints and design choices that reflect established practices in agent-based computational economics rather than requiring extensive literature justification for each assumption. These include the bounded cognitive degradation parameters applied to the Virtual Trader, the non-learning and non-adaptive architecture of the Digital Personas, the exclusion of memory and feedback loops, the omission of social learning mechanisms, and the simplification of cognitive interdependencies through modular rather than fully interactive impairments. Such constraints align with the principle that agent-based models benefit from transparency about simplifying assumptions and computational tractability considerations, where the primary requirement is clear documentation of design choices and their implications rather than exhaustive literature support for every modeling decision (LeBaron, 2006; Lux & Zwinkels, 2018). This approach follows the established agent-based modeling philosophy that emphasizes generative explanation through algorithmic specification while maintaining empirical adequacy in the model’s micro-specification (Epstein, 1999; Conte & Paolucci, 2014). Additionally, the simulation framework enables systematic assumption robustness testing of the cognitive constraint parameters and behavioral typologies drawn from the literature, revealing which assumptions most strongly influence model outcomes and validating the internal consistency of the theoretical framework. The framework prioritizes methodological transparency, sensitivity analysis, and robust validation over extensive citation requirements for technical implementation details, consistent with contemporary standards in computational finance research.
While not designed to measure PCA directly, the simulation results may be consistent with PCA-mediated behavioral changes, particularly in areas of multimodal integration (where LLM-assisted agents can simultaneously process technical indicators, volatility data, and macroeconomic signals), strategic complexity adoption (reflected in migration from simple momentum strategies to sophisticated volatility-based instruments), and risk perception calibration (evidenced by systematic differences in volatility interpretation and position sizing between LLM-augmented and cognitively constrained baseline agents).

Appendix B.1. Dataset Construction and Requirements

  • Trade Sample Size
To ensure statistical robustness, we require a minimum of 500 matched trade pairs (LLM-assisted and simulated), supporting the following:
  • Paired t-tests for return differentials (α = 0.05, power = 0.80).
  • Regression-based analyses of risk-adjusted performance.
  • Subgroup comparisons by strategy type and market regime.
This figure is based on power calculations for medium effect sizes (Cohen’s d ≈ 0.5) and standard econometric thresholds (Kang, 2021; McNulty, 2021; Firoozye et al., 2023).
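The stated power target can be checked with the standard normal-approximation sample-size formula for a two-sided paired t-test. The sketch below uses only the Python standard library and is illustrative rather than a reproduction of the authors' actual power calculation; it shows that the headline paired test needs far fewer than 500 pairs, leaving the surplus for regression and subgroup analyses.

```python
from math import ceil
from statistics import NormalDist

def paired_sample_size(d: float = 0.5, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation pair count for a two-sided paired t-test.

    d is Cohen's d on the within-pair return differential.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value (1.96 at alpha=0.05)
    z_beta = z.inv_cdf(power)            # power quantile (0.84 at power=0.80)
    return ceil(((z_alpha + z_beta) / d) ** 2)

n_core = paired_sample_size()
print(n_core)  # 32 pairs for the headline test; 500 leaves ample room for subgroups
```

Smaller effect sizes inflate the requirement sharply (d = 0.2 already demands roughly 200 pairs), which is one reason a 500-pair floor is a sensible buffer.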
  • Asset Selection
To minimize idiosyncratic noise and ensure strategic generalizability, the dataset shall include 20–30 low-correlation assets (|ρ| < 0.3), drawn from the following:
  • Equities (e.g., AAPL, MSFT, IWM, QQQ).
  • Volatility ETFs (e.g., VXX, UVXY).
  • Macro instruments (e.g., TLT, GLD, BTC).
This structure enables cross-sectional validation and controls for strategy clustering by asset type or beta exposure (Narayan et al., 2023; C. Sun, 2023).
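One way to operationalize the |ρ| < 0.3 requirement is a greedy pairwise screen over candidate tickers. The sketch below is a minimal illustration; the correlation values are invented for the example, not measured, and the realized universe would come from the registered screener exports described later.

```python
def screen_low_correlation(corr, tickers, threshold=0.3):
    """Greedily keep assets whose pairwise |rho| with every kept asset is below threshold.

    corr is a symmetric nested dict: corr[a][b] = Pearson correlation of a and b.
    """
    kept = []
    for t in tickers:
        if all(abs(corr[t][k]) < threshold for k in kept):
            kept.append(t)
    return kept

# Illustrative (not measured) correlations among four candidate instruments.
corr = {
    "AAPL": {"AAPL": 1.0, "QQQ": 0.85, "TLT": -0.2, "GLD": 0.1},
    "QQQ":  {"AAPL": 0.85, "QQQ": 1.0, "TLT": -0.25, "GLD": 0.05},
    "TLT":  {"AAPL": -0.2, "QQQ": -0.25, "TLT": 1.0, "GLD": 0.28},
    "GLD":  {"AAPL": 0.1, "QQQ": 0.05, "TLT": 0.28, "GLD": 1.0},
}
universe = screen_low_correlation(corr, ["AAPL", "QQQ", "TLT", "GLD"])
print(universe)  # QQQ is dropped: |rho(AAPL, QQQ)| = 0.85 >= 0.3
```

A greedy screen is order-dependent; in practice one would rank candidates (e.g., by liquidity) before screening so the most tradable assets are considered first.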
  • Strategic Variants
Each trade pair shall be assigned one of several canonical strategies, covering both intuitive and complex forms, such as the following:
  • Momentum (price breakout).
  • Contrarian (mean-reversion with volatility filter).
  • Volatility overlays (e.g., long straddles).
  • Hybrid (momentum + IV skew).
  • Event-driven (earnings, Fed policy).
  • Passive hold with timing overlays.
These categories are drawn from both academic typologies and practical trading classifications (Galariotis, 2014; Ghosh, 2025).
  • Directional: 1–9-week momentum/continuation or breakouts when long-trend and Story gates hold.
  • Volatility overlays: long-vol versus short-vol expressions chosen by IV/term/skew context to express dispersion or containment.
  • Theta plays: income-oriented overlays that monetize time decay while respecting structural levels and volatility posture.
  • Earnings-print plays: event-constrained tactics around expected move and term structure under explicit guardrails.
  • STORY (including Dividend STORY) with tactical overlays: actions conditioned on a validated narrative, using overlays to scale in/out or harvest carry without breaking the thesis.
  • Temporary ranging long stocks: range-bound/channel tactics and pin-type overlays when structural levels and positioning imply containment.
The simulations reported in this paper use only the families enumerated above; individual option combinations are selected within-family under the risk policy and are not enumerated here.
  • Market Regime Coverage
Trades should be distributed across the following:
  • Bull vs. bear markets.
  • High vs. low volatility regimes (based on VIX > 20 threshold).
  • Earnings/non-earnings weeks.
This enables sensitivity analysis of LLM vs. human divergence under varying cognitive loads (Q. Wang et al., 2024; W. W. Li et al., 2025).
  • Replicability note.
Because selection is procedural rather than a fixed list, the authoritative realized set for this paper will be posted to OSF as ‘Registered Asset Universe’ artifacts: time-stamped screener exports and contemporaneous option-chain snapshots for each decision timestamp. Inputs are restricted to information available at the decision time (no-lookahead), with cross-vendor sanity checks for prices, implied volatility, and open interest, T + 1 option flow confirmation when flow inferences are used, and schema validation of chain completeness; any procedural change is version-bumped and logged on OSF. Threshold values and internal cutoffs are maintained in the registry.
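The no-lookahead restriction amounts to a point-in-time filter over timestamped inputs. A minimal sketch is shown below; the record layout and payload labels are hypothetical stand-ins for the OSF artifact schema, which is not specified here.

```python
from datetime import datetime, timezone

def point_in_time_inputs(records, decision_time):
    """Return only records observable at the decision timestamp (no-lookahead).

    Each record is (observed_at, payload); observed_at must not exceed decision_time.
    """
    return [r for r in records if r[0] <= decision_time]

def ts(s):
    """Parse an ISO timestamp as UTC."""
    return datetime.fromisoformat(s).replace(tzinfo=timezone.utc)

feed = [
    (ts("2025-06-02T14:00:00"), "option-chain snapshot"),
    (ts("2025-06-02T15:30:00"), "screener export"),
    (ts("2025-06-03T09:30:00"), "T+1 option-flow confirmation"),  # not yet observable
]
visible = point_in_time_inputs(feed, ts("2025-06-02T16:00:00"))
print([p for _, p in visible])  # the T+1 record is excluded at decision time
```

The same filter can double as a schema gate: any record lacking a parseable `observed_at` should be rejected rather than silently passed through.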

Appendix B.2. Agent Architecture Overview

Table A4. Agent Architecture.
Agent Type | Design Logic | Epistemic Role | Simulation Method
Virtual Trader | Cognitive constraint modeling | Bounded human logic (lower bound) | Degradation
Digital Persona | Behavioral emulation via prompting | Demographically plausible behavior | 2-step prompting
The Virtual Trader is built by applying empirically validated cognitive limitations to the same informational input that drives LLM trade generation. The Digital Persona is constructed by prompting an LLM to simulate a plausible retail investor under realistic constraints. These simulations are matched against validated LLM-assisted trades to evaluate whether observed performance uplifts are attributable to augmentation or strategy selection bias.
This dual-agent structure reflects two epistemological lenses: degradation identifies the cognitive uplift of LLMs by modeling human limitations, while emulation reveals whether LLMs deviate from plausible human behavior. Their complementary roles resolve the fundamental causal attribution problem in post-LLM adoption settings.

Appendix B.3. Simulation Design: Cognitive Degradation Parameters

To simulate boundedly rational human decision-making under market complexity, the Virtual Trader applies empirically validated degradations across several cognitive dimensions. Each domain reflects a systematic constraint that distinguishes unaided human cognition from LLM-assisted reasoning, drawing on experimental research in financial decision-making, behavioral economics, and AI–human asymmetry studies (Henning et al., 2025; Mainali & Weber, 2025).
These degradations are modular, probabilistically weighted, and dynamically interacting, enabling flexible simulation of realistic investor performance under cognitive load. The table below summarizes the performance differential between LLMs and human agents by cognitive domain.
The following Table A5 presents a theory-informed proposal for cognitive asymmetries between LLM-assisted and unaided human decision-making relevant to financial trading tasks. These ten dimensions are not intended as definitive, empirically validated categories, but as an initial conceptual framework derived from adjacent research in behavioral economics, cognitive psychology, and emerging AI–human performance studies. While preliminary findings from these fields suggest plausible performance differentials, systematic empirical evaluation of these distinctions—particularly in complex financial decision environments—remains limited. Table A5 summarizes expected qualitative differences across each dimension, serving as a structural guide for simulation parameterization in the VT model. To avoid overstating the evidentiary status of these assumptions, an exhaustive literature review has been omitted from this paper; however, relevant studies informing specific dimensions are cited within the accompanying simulation parameter descriptions.
Table A5. Proposed degradation dimensions.
Dimension | LLMs | Humans
Processing Speed | Quick execution per trade scenario; process vast amounts of data consistently without fatigue; large context window | Slower evaluation; multimodal switching cost; limited “context window”
Input Retention | No memory loss across input streams; able to handle complex multisource data simultaneously | Limited working memory; chance of omitting secondary inputs under cognitive load
Bias Resistance | Reduced susceptibility to emotional and cognitive biases; biases present but potentially mitigable; unaffected by prior beliefs or emotions | Well-documented vulnerability to emotional and cognitive biases in trading
Multimodal Integration | Synthesizes chart patterns, macro data, and options Greeks simultaneously into coherent reasoning | Often relies on few dominant signals or simple heuristics due to overload or lack of training
Fatigue and Focus | Performance stable over time; no degradation across multiple trade evaluations | Pattern detection errors rise after several evaluations; accuracy drops with time-on-task
Volatility Interpretation | Parses IV skew, gamma, and vega structures with consistency and fewer errors | High variance in interpretation; misestimation of IV thresholds common among retail traders
Analytical Depth | Capable of generating detailed, structured financial narratives and scenario analysis | Dependent on intuition and experience; struggles with scaling across large datasets
Adaptability and Learning | Retrainable on new datasets; consistent output over fixed training distribution | Adapt in real-time through experience, error correction, and feedback loops
Emotional Resilience | Unaffected by stress, mood, or fear of loss; fully consistent under pressure | Emotional stress can degrade judgment, increase error rates
Creativity and Intuition | Generates output from statistical pattern recognition; lacks true creativity or intuition | Capable of intuitive judgment and creative problem-solving under novel or ambiguous conditions
These degradations, once operationalized, are modular and adjustable, enabling calibration for alternative cognitive baselines, investor profiles, and trade complexity:
1.
Information Processing Constraints
LLMs demonstrate superior processing speed across multiple data modalities relevant to trading analysis. LLMs process text at 30–100+ tokens per second,3 analyzing entire financial statements in seconds (Kiely, 2025), while human reading speed averages 200–300 words per minute, with comprehension decreasing for complex material (Rayner, 1998; Brysbaert, 2019). Research shows reading speeds above 500–600 words per minute cause significant comprehension loss (Rayner et al., 2016), and human attention spans average only 76.24 s for sustained tasks (A. J. Simon et al., 2023). Financial analysis requires sequential processing of numerical data, ratios, and contextual information—tasks where humans are fundamentally limited by serial processing constraints due to a working memory capacity of approximately 7 ± 2 information chunks (Miller, 1956) and cognitive load limitations in financial decision-making (Broadbent, 1958; Parte et al., 2018; Rose et al., 2004). In contrast, LLMs can simultaneously analyze price charts, CSV files, and tabular data in seconds without modality-switching costs (S. Yin et al., 2024; Y. Zhang et al., 2024) while maintaining a context window of 200,000 tokens4 (~150,000 words). Document processing studies demonstrate AI systems analyze legal and financial documents 50–275 times faster than human professionals (Grossman & Cormack, 2011; Martin et al., 2024). Trading analysis may amplify these differentials because financial reasoning is cognitively complex and time-consuming for humans (Rubinstein, 2013), whereas recent studies show LLMs extract predictive signals from financial text quickly—e.g., GPT-style scores from news headlines predict next-day returns and subsequent drift, and LLMs classify central-bank “Fedspeak” in ways that map to market reactions, often matching or exceeding traditional baselines (Lopez-Lira & Tang, 2024; Hansen & Kazinnik, 2024; Jadhav et al., 2025).
Based on these processing limits and sequential-attention constraints, we restrict the Virtual Trader’s evaluation throughput relative to the LLM over the same interval. Simulation Parameter: The Virtual Trader evaluates only 25% of the trade setups the LLM processes. Formula: Throughput Reduction = 0.75.
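The throughput parameter can be implemented as a seeded subsampling step over the LLM's setup queue. This is a minimal sketch under the stated 0.75 reduction; the seeding convention is an assumption made here for reproducibility, not part of the paper's specification.

```python
import random

def throttle_setups(setups, reduction=0.75, seed=7):
    """Virtual Trader throughput: drop `reduction` share of LLM-processed setups.

    Returns the sorted indices of setups the Virtual Trader actually evaluates.
    """
    rng = random.Random(seed)          # fixed seed so runs are replicable
    keep_n = round(len(setups) * (1 - reduction))
    return sorted(rng.sample(range(len(setups)), keep_n))

evaluated = throttle_setups(list(range(100)))
print(len(evaluated))  # 25 of 100 setups reach the Virtual Trader
```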
2.
Input Retention
Humans experience systematic information dropout under cognitive load, particularly when processing multiple data streams simultaneously. Research demonstrates that working memory capacity limits humans to effectively monitoring 3–5 information streams (Cowan, 2001), with performance degradation of 20–50% when exceeding these limits (Lavie, 2005). Studies show inattentional blindness rates of 44–65% under high cognitive load (Simons & Chabris, 1999), while divided attention tasks result in 30–40% information loss for secondary inputs (Middlebrooks et al., 2017). Trading environments amplify these limitations, with 51% of incidents attributable to situation awareness failures when monitoring multiple information sources (Leaver & Reader, 2016). In contrast, LLMs maintain 95–98% information retention across all input streams without selective attention constraints (J. Wang et al., 2024), processing multiple data modalities simultaneously without performance degradation. Simulation Parameter: Random dropout of 1–2 informational classes per trade with p = 0.35 probability across input types.
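The dropout parameter can be sketched as a per-trade random deletion of informational classes. The class labels below are hypothetical placeholders; the p = 0.35 probability and the 1–2 class range come from the parameter as stated.

```python
import random

INPUT_CLASSES = ["chart", "iv_surface", "macro", "flow", "news"]  # illustrative labels

def degrade_inputs(inputs, p_dropout=0.35, rng=None):
    """With probability p_dropout, silently drop 1-2 informational classes per trade."""
    rng = rng or random.Random(42)
    retained = dict(inputs)
    if rng.random() < p_dropout:
        for cls in rng.sample(list(retained), k=rng.choice([1, 2])):
            del retained[cls]
    return retained

inputs = {c: object() for c in INPUT_CLASSES}
rng = random.Random(42)
runs = [len(degrade_inputs(inputs, rng=rng)) for _ in range(1000)]
dropped_share = sum(1 for n in runs if n < len(INPUT_CLASSES)) / len(runs)
print(round(dropped_share, 2))  # roughly the 0.35 dropout probability
```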
3.
Bias Resistance
The comparative analysis of cognitive biases in LLMs versus human decision-makers reveals a nuanced landscape that defies simplistic categorization (Sumita et al., 2024). While LLMs demonstrably exhibit systematic cognitive biases, including anchoring, confirmation, and overconfidence effects (Y. Zhang et al., 2023; Lou & Sun, 2024), research confirms these biases are “measurable and mitigatable” through sophisticated intervention strategies. Advanced techniques such as the AwaRe (Awareness Reminder) methodology, multi-agent frameworks achieving 81% accuracy improvements, and self-help debiasing protocols have shown substantial bias reduction capabilities (Sumita et al., 2024). When these sophisticated mitigation approaches are systematically employed, LLMs arguably demonstrate superior bias resistance compared to human traders, who face quantifiable performance costs of 2–7% annually from overconfidence and 3–5% from disposition effects (Campbell & Sharpe, 2009; J. Park et al., 2010; Cen et al., 2013). This suggests that technological solutions, when properly implemented with comprehensive bias monitoring, may offer meaningful advantages over human-only decision-making systems. Simulation Parameter (unified degradation formula): VTDS = α1(A) + α2(O) + α3(R) + α4(Re) + α5(C) + α6(H) + α7(F) + ε, where research-based coefficients for anchoring (0.795), overconfidence (0.572), recency (0.71), representativeness (0.773), confirmation (0.69), heuristic (0.434), and framing biases (0.649–0.815) are weighted according to trading strategy and market regime requirements.
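The VTDS formula above is a weighted sum of bias activations. The sketch below implements it with the cited coefficients; since framing is quoted as a range (0.649–0.815), the midpoint default is an assumption made here, and the activation inputs are illustrative.

```python
COEFFS = {  # research-based weights quoted in the text
    "anchoring": 0.795, "overconfidence": 0.572, "recency": 0.71,
    "representativeness": 0.773, "confirmation": 0.69, "heuristic": 0.434,
}

def vtds(bias_activations, framing_coeff=0.732, noise=0.0):
    """Virtual Trader Degradation Score: weighted sum of active biases plus noise term.

    bias_activations maps bias name -> activation in [0, 1]; framing_coeff must lie
    in the cited 0.649-0.815 range (the midpoint default is illustrative).
    """
    assert 0.649 <= framing_coeff <= 0.815
    score = sum(COEFFS[b] * bias_activations.get(b, 0.0) for b in COEFFS)
    score += framing_coeff * bias_activations.get("framing", 0.0)
    return score + noise

# A trade where anchoring and framing are fully active, all other biases dormant:
score = vtds({"anchoring": 1.0, "framing": 1.0})
print(round(score, 3))  # 0.795 + 0.732 = 1.527
```

In use, the activations themselves would be regime- and strategy-conditioned, which is where the "weighted according to trading strategy and market regime requirements" clause would enter.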
4.
Multimodal Integration
Multimodal integration reflects the fundamental disparity between human cognitive processing bandwidth and LLM parallel processing capabilities in real-time financial decision-making contexts (Vaswani et al., 2017; S. Yin et al., 2024). Human traders face systematic temporal constraints when attempting to synthesize multiple information streams simultaneously within trading timeframes due to well-established cognitive bottlenecks in concurrent information processing (Sweller et al., 2011; Pernagallo & Torrisi, 2022). Research demonstrates that high cognitive load conditions under time pressure produce measurable performance degradation in both speed and accuracy, with systematic decrements documented across laboratory and real-world environments (Ferrari, 2001; Rendon-Velez et al., 2016; Castro et al., 2019). In contrast, LLMs can process equivalent multimodal information through parallel channels without temporal switching costs or capacity degradation (W. Zhang et al., 2024; J. Huang et al., 2025). The constraint therefore operates as a processing bandwidth limitation per unit of time rather than an absolute analytical capacity (Pernagallo & Torrisi, 2022)—human traders can process extensive signal sets given sufficient time, but face systematic throughput restrictions under real-time market conditions. The convergent evidence across disciplines suggests specific parameters for realistic trading simulations: cognitive capacity limits of 3–4 items in working memory (Cowan, 2001; Chai et al., 2018), degrading to effective processing of only a few complex multimodal signals under trading conditions; and attention switching costs, with measurable reaction time increases of 200–500 ms when switching between data modalities, critical in fast-moving markets (Kuchinsky et al., 2024). This evidence motivates the signal constraint implemented in Virtual Trader simulations.
The parameter reflects fundamental cognitive architecture limitations rather than lack of training or expertise. Simulation Parameter: Use of only 4–5 signal types per trade setup.
5.
Cognitive Fatigue and Focus
LLMs can evaluate complex trades faster than humans, who additionally suffer from attention degradation over time. Based on cognitive fatigue research, sustained attention tasks cause gradual performance degradation and vigilance decrement (Pimenta et al., 2014), with accuracy dropping 15–25% after 4–6 consecutive evaluations (Behrens et al., 2023). Research demonstrates that vigilance tasks cause performance decrements within 15 min of sustained attention (Warm et al., 2008), with cognitive fatigue significantly impacting analytical accuracy. Human working memory is limited to 3–5 chunks of information (Cowan, 2001), constraining simultaneous processing of complex financial data. Studies show decision quality deteriorates with sustained cognitive load—physicians’ surgical scheduling dropped by 33% toward shift end (Persson et al., 2019; Grignoli et al., 2025). LLMs process information at computational speeds without fatigue, maintaining consistent performance (Bartsch et al., 2023; Y. Li et al., 2025). Considering these limitations—working memory constraints, vigilance decrements, decision fatigue, and the need for breaks—we conservatively estimate humans can effectively evaluate 25% of LLM throughput. This operationalization reflects robust evidence that repeated decision-making depletes cognitive resources and degrades performance accuracy (Eriksen & Eriksen, 1974; Baumeister, 2018). Simulation Parameter: Add 25% to the false positive rate after the 5th trade in each simulation session5.
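The fatigue parameter can be sketched as a step change in the false positive rate after the fifth trade of a session. "Add 25%" is read here as a multiplicative inflation of the base rate (×1.25); if the authors intend an absolute +0.25, the uplift term would change accordingly.

```python
def false_positive_rate(base_rate, trade_index, fatigue_onset=5, uplift=0.25):
    """Inflate the false positive rate by `uplift` once trade_index exceeds onset.

    trade_index is 1-based within a simulation session.
    """
    return base_rate * (1 + uplift) if trade_index > fatigue_onset else base_rate

rates = [false_positive_rate(0.10, i) for i in range(1, 9)]
print(rates)  # 0.10 for trades 1-5, 0.125 thereafter
```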
6.
Volatility Interpretation
Human traders systematically misinterpret implied volatility (IV) skew and term structure, leading to biased option pricing and suboptimal strike selection. This distortion arises from a combination of perceptual biases, overestimation of tail risks, and systematic overpricing of volatility—particularly among retail market participants. Empirical studies document that traders frequently overweight low-probability events embedded in the IV skew (Felix et al., 2020) and that retail-driven option markets exhibit inflated implied volatilities inconsistent with objective risk (Choy, 2015). Laboratory and field experiments further reveal that volatility perception is path-dependent and prone to misestimation, with traders underestimating volatility after calm periods and overestimating it following recent market turbulence (Payzan-LeNestour et al., 2023). These findings justify the introduction of a volatility interpretation degradation in the Virtual Trader simulation, specifically, a 30% probability that the trader misreads IV skew or term structure. Simulation Parameter: Introduce a 30% chance of IV misreading and random strike selection error for option construction.
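The 30% misreading parameter can be sketched as a Bernoulli draw per trade: on a misread, the perceived skew is distorted and the strike is chosen at random. The sign flip used here is one simple stand-in for "misreading IV skew"; the paper does not specify the distortion functional form.

```python
import random

def interpret_iv(true_skew, strikes, rng=None, p_misread=0.30):
    """With 30% probability, misread IV skew and pick a random strike.

    Returns (perceived_skew, chosen_strike); a faithful read targets the middle strike.
    """
    rng = rng or random.Random(0)
    if rng.random() < p_misread:
        return -true_skew, rng.choice(strikes)       # sign flip + random strike error
    return true_skew, strikes[len(strikes) // 2]     # faithful interpretation

rng = random.Random(0)
reads = [interpret_iv(0.12, [95, 100, 105, 110], rng=rng) for _ in range(2000)]
misread_share = sum(1 for skew, _ in reads if skew != 0.12) / len(reads)
print(round(misread_share, 2))  # near the stipulated 0.30
```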
7.
Analytical Depth
LLMs consistently generate structured, multi-scenario logic, including defined entry, target, and stop-loss frameworks (Yu et al., 2024; Liu et al., 2025). In contrast, human traders typically rely on single-scenario forecasts and rarely employ formal risk controls. Industrial and experimental data show most retail investors omit stop-loss planning, contributing to disposition effects (Odean, 1998; Richards et al., 2017), and narrow scenario planning is common without external scaffolding (Meissner & Wulf, 2013). Overconfidence further exacerbates this neglect (Fenton-O’Creevy et al., 2003). Accordingly, the Virtual Trader uses a single forecast and omits structured exit logic to realistically simulate this cognitive degradation. Simulation Parameter: Virtual Trader uses one forecast only; omits stop-loss or target logic.
8.
Adaptability and Learning
Recent academic research underscores the structural learning asymmetry between LLM-driven trading agents and human retail investors. Gu et al. (2024) demonstrate that LLM-enhanced portfolio management systems employ reinforcement learning and multimodal inputs to dynamically adjust strategic positions across evolving market conditions, including volatility shifts and macroeconomic regime transitions. W. Zhang et al. (2024) similarly show that LLM-powered financial agents integrate memory retrieval and tool augmentation to modify trade logic responsively as external conditions change. In contrast, extensive behavioral finance evidence documents the cognitive rigidity of retail investors, finding that individual traders exhibit low strategic adaptability resulting in poor performance persistence (Seth et al., 2020; Tan et al., 2023). This contrast—LLM-enabled regime-sensitive adjustment versus human heuristic inertia—necessitates divergent simulation logic to ensure realistic benchmarking. Simulation Parameter: Human traders will be simulated with no regime-level adaptation, applying a fixed strategic rulebook consistently across market regime transitions (e.g., low to high volatility periods), reflecting empirically observed patterns of strategic inertia and low learning capacity among retail investors. In alignment with the simulation design, this parameter applies at the trade-to-trade and market condition level, not within individual intraday sessions.
9.
Emotional Resilience
Recent research confirms that Large Language Models (LLMs) demonstrate structural emotional neutrality, maintaining consistent decision-making even under simulated emotional pressure. Schlegel et al. (2025) show that LLMs like ChatGPT-4 solve emotionally complex tasks without performance degradation, applying emotional reasoning consistently regardless of emotionally charged content. Similarly, Fan et al. (2025) report that LLM-generated outputs exhibit dampened emotional tone and resist volatile mood swings, reinforcing their immunity to stress-induced distortions. In contrast, human retail investors exhibit pronounced emotional reactivity. Bawalle et al. (2025) document widespread panic selling among overconfident investors, driven by fear following losses. Finet et al. (2025) further reveal that negative emotions in novice traders impair decision quality, triggering premature exits or irrational trades. These asymmetries justify modeling retail investors with a 15–20% probability of early exit after losses. This parameter appropriately reflects empirical behavior, contrasting with LLMs’ emotional consistency, and provides a realistic, evidence-based input for behavioral trading simulations. Simulation Parameter: 15–20% chance of early exit if previous trade was a loss, capturing empirically documented patterns of panic selling and emotional reactivity among retail investors. This parameter, combined with ε ~ U(0, 0.05), realistically reflects both the average probability and individual variability in loss-induced premature exits observed in behavioral finance research. Formula: P(early exit) = 0.15 + ε, with ε ~ U(0, 0.05).
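The early-exit parameter P(early exit) = 0.15 + ε with ε ~ U(0, 0.05) translates directly into a draw per losing trade, yielding probabilities in [0.15, 0.20]. A minimal sketch, with a fixed seed added here for reproducibility:

```python
import random

def early_exit_probability(prev_trade_was_loss, rng=None):
    """P(early exit) = 0.15 + eps, eps ~ U(0, 0.05), applied only after a losing trade."""
    rng = rng or random.Random(1)
    return 0.15 + rng.uniform(0.0, 0.05) if prev_trade_was_loss else 0.0

rng = random.Random(1)
probs = [early_exit_probability(True, rng=rng) for _ in range(1000)]
print(round(min(probs), 3), round(max(probs), 3))  # both within [0.15, 0.20]
```

Drawing ε per trade (rather than once per agent) captures the stated within-individual variability; a per-agent draw would instead model stable heterogeneity across investors.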
10.
Creativity and Intuition
The relationship between LLMs and human traders in the domain of creative strategy generation reflects a structural asymmetry rather than a contradiction. Recent research confirms that LLMs apply statistical logic with high internal consistency but lack genuine creative generalization, limiting their ability to independently generate novel, unconventional strategies (Lopez-Lira, 2025; Song et al., 2025). In contrast, human traders possess true intuitive and innovative capacity, but this same attribute exposes them to measurable strategic misfires, where gut-driven decisions frequently diverge from data-supported optimality (Uhr et al., 2021; Akepanidtaworn et al., 2023; Escobar & Pedraza, 2023). This behavioral trade-off—innovation potential versus misalignment risk—is well documented, though the precise misfire rate remains debated. For simulation purposes, a 10% misalignment probability for human traders provides a conservative yet defensible operationalization of this cognitive asymmetry. Simulation Parameter: A 10% chance of generating a misaligned, unconventional strategy, reflecting the lower-bound empirical risk of human intuitive misfires while maintaining methodological caution.
  • Interdependency Modeling
Cognitive dimensions are not treated as independent; their interaction effects are implemented via a dynamic rule-based engine, allowing for conditionally weighted impairments:
  • Fatigue increases dropout probability by 20%.
  • Emotional arousal increases IV misreading likelihood by 30%.
  • Bias susceptibility amplifies under volatile market regimes.
Interaction terms can be drawn from experimental cognitive models and task-level simulation studies.
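The rule-based interaction engine can be sketched as conditional multipliers applied to the baseline impairment parameters. The fatigue and arousal multipliers follow the bullets above; the volatility amplification factor (1.15) is an illustrative assumption, since the text states only that bias susceptibility "amplifies" in volatile regimes.

```python
def apply_interactions(params, state):
    """Conditionally scale impairment parameters per the rule-based interaction engine.

    params: baseline values, e.g. {"dropout": 0.35, "iv_misread": 0.30, "bias_weight": 1.0}
    state:  boolean flags such as fatigued, emotionally_aroused, volatile_regime.
    """
    adjusted = dict(params)
    if state.get("fatigued"):
        adjusted["dropout"] *= 1.20           # fatigue raises dropout probability by 20%
    if state.get("emotionally_aroused"):
        adjusted["iv_misread"] *= 1.30        # arousal raises IV misreading by 30%
    if state.get("volatile_regime"):
        adjusted["bias_weight"] *= 1.15       # assumed amplification under volatility
    return adjusted

base = {"dropout": 0.35, "iv_misread": 0.30, "bias_weight": 1.0}
out = apply_interactions(base, {"fatigued": True, "emotionally_aroused": True})
print(out)  # dropout -> 0.42, iv_misread -> 0.39, bias_weight unchanged
```

Keeping the rules as pure functions of (params, state) makes the sensitivity analysis described in Appendix B.3 straightforward: each multiplier can be swept independently.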
  • LLM Trade Validation Prior to Degradation
To ensure that degradation comparisons reflect cognitive impairment rather than input error, only LLM trades passing a two-step confidence filter are subjected to simulation:
  • Signal Alignment Test: Trade rationale must show internal coherence across chart structure, IV skew (implied volatility), and GEX (dealer gamma exposure) positioning.
  • Confidence Scoring: Trade explanations must be rated “high confidence” by the LLM’s self-reflective prompt or a second-model verification (Ganguli et al., 2022a; Shinn et al., 2024).
This safeguards the integrity of the Virtual Trader as a bounded cognitive simulation rather than a corrupted input model.
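The two-step filter amounts to a conjunction of a signal-alignment check and a confidence gate. The sketch below is a schematic: the boolean coherence flags and the confidence label stand in for the LLM-side rationale checks and self-reflective/second-model scoring, whose concrete formats are not specified in the text.

```python
def validate_llm_trade(trade):
    """Two-step confidence filter: signal alignment, then confidence scoring.

    trade carries boolean coherence flags (chart, IV skew, GEX) plus a
    confidence label from self-reflection or second-model verification.
    """
    aligned = all(trade[k] for k in ("chart_coherent", "iv_skew_coherent", "gex_coherent"))
    confident = trade["confidence"] == "high"
    return aligned and confident

trades = [
    {"chart_coherent": True, "iv_skew_coherent": True, "gex_coherent": True, "confidence": "high"},
    {"chart_coherent": True, "iv_skew_coherent": False, "gex_coherent": True, "confidence": "high"},
    {"chart_coherent": True, "iv_skew_coherent": True, "gex_coherent": True, "confidence": "medium"},
]
eligible = [t for t in trades if validate_llm_trade(t)]
print(len(eligible))  # only the first trade passes both gates
```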

Appendix B.4. Digital Persona: Prompt-Based Behavioral Emulation via LLM Prompting

The Digital Persona simulation models what a plausible, unaided retail investor might decide under identical market conditions to those seen by an LLM-assisted agent. Rather than simulating cognitive degradation, as with the Virtual Trader, the Digital Persona framework aims to emulate behaviorally grounded human decision-making using a structured prompting methodology. This approach enables a second axis of causal attribution: it allows for the evaluation of whether LLM-enhanced trades reflect true cognitive augmentation or merely replicate human-like behavior under complex financial conditions.
This modeling strategy draws upon agent-based simulation research (Lux & Zwinkels, 2018), generative persona frameworks (J. S. Park et al., 2023), and psychographic investor classification from behavioral finance (Almansour & Elkrghli, 2023; Z. Jiang et al., 2024). By capturing decision logic anchored in recognizable archetypes and modulated by psychological traits, Digital Personas provide a mid-range behavioral baseline for benchmarking LLM outputs.
  • Investor Archetype Layer
Personas can be defined through empirically grounded investor types—each with clear demographic and behavioral attributes. This structure ensures intuitive prompt generation, supports experimental reproducibility, and allows researchers to systematically vary key cognitive–emotional traits across simulations, consistent with documented heterogeneity in retail investor behavior (Barber & Odean, 2000; Gerrans et al., 2023; Kumar, 2009). Specifically, extensive research confirms that age, digital fluency, investment experience, and susceptibility to cognitive biases systematically influence both strategy preferences and trading behavior (Grinblatt & Keloharju, 2009; Korniotis & Kumar, 2009; Warkulat & Pelster, 2024). The proposed archetypes reflect these empirically established dimensions, as defined in Table A6.
Table A6. Persona Types.
Persona Type Example | Core Characteristics | Typical Strategy Profile | Behavioral Biases
Conservative Retiree | Age 65+, income-focused, low digital literacy | Bond ETFs, dividend stocks, low turnover | High loss aversion, inertia
Tech-Savvy Millennial | Age 25–35, high digital fluency, growth-seeking | Crypto, speculative tech, high-frequency trades | FOMO, recency bias
Experienced Amateur | Age 40–55, 10+ years market exposure, rule-based cognition | Options spreads, sector rotation, event trades | Overconfidence, underreaction
Novice Enthusiast | Age 20–30, <2 years experience, social media influenced | Meme stocks, trend following, short-hold momentum | Herding, anchoring, thrill-seeking
Each example archetype is internally defined by risk tolerance, financial literacy, emotional disposition, and trading platform familiarity—allowing prompt structures to reflect not just socioeconomic labels but actionable decision tendencies. This typology aligns with recent calls in behavioral finance to integrate psychological realism into computational models of retail investor behavior (Lux & Zwinkels, 2018; Bortoli et al., 2019).
  • Causal Modeling Constraints
To maintain the scientific validity of the simulation, the following intentional exclusions are applied:
  • No memory or feedback loops: Personas do not retain trade history or adapt based on past outcomes.
  • No reinforcement logic: Strategy choice is not influenced by synthetic rewards or simulated gain/loss records.
  • No social learning or community modeling: Personas are isolated from group sentiment, crowd signals, or simulated feedback mechanisms.
  • No path-dependent evolution: Risk profiles, biases, and strategy tendencies are fixed within each simulation episode.
These exclusions are by design. Dynamic learning, memory, and reinforcement structures are reserved for future iterations involving hybrid agents. In this framework, the Digital Persona functions as a bounded and non-adaptive control, preserving clean counterfactual logic for causal inference.
  • Fidelity Anchors and Behavior Modulation
To enforce behavioral realism, the following mechanisms are implemented:
  • Risk-aligned decision logic: Personas with low risk tolerance avoid leveraged instruments or multi-leg derivatives unless strongly justified by signals.
  • Emotionally modulated responses: Anxious profiles avoid volatility overlays during VIX spikes; overconfident personas may chase trend reversals.
  • Trait-behavior congruence: Internal consistency is maintained between persona traits and the chosen strategy (e.g., a Conservative Retiree cannot execute a long straddle on TSLA).
  • Heuristic structuring: Output logic emulates retail mental models—such as round-number targeting, recency-based signal preference, or “confirmation bias” pattern reinforcement.
  • Validation Protocols and Reproducibility Controls
To ensure methodological transparency and replicability, each simulation undergoes the following validation tests:
Table A7. Validation Protocol.
Test Type | Purpose
Temporal Consistency Test | Re-running persona prompt with identical input at different times yields similar outputs.
Cross-Scenario Coherence | Behavioral traits remain stable across regimes (bull, bear, neutral).
Framing Effect Control | Varying prompt phrasing does not cause illogical strategic divergence.
Temperature Sensitivity | Outputs are stable across runs at fixed temperature (set at 0.4).
Randomness Mitigation | Multiple iterations confirm output stability and noise resilience.
Prompt scaffolding is additionally standardized through template locking to ensure lexical and structural consistency across archetypes.
This validation protocol operationalizes the view that LLM–human comparisons must be anchored not only in accuracy but also in ecological and psychological plausibility (Lux & Zwinkels, 2018; Salemi & Zamani, 2024).

Appendix B.5. Performance Metrics and Evaluation Framework

Trade-level outcomes are compared using both absolute and risk-adjusted metrics to enable multi-dimensional attribution of performance differences to cognitive constraints rather than strategy selection or random market fluctuations, in line with established risk-adjusted performance standards and error-of-omission diagnostics in financial research (Barber & Odean, 2000; A. W. Lo & MacKinlay, 1988; Sharpe, 1994):
Table A8. Performance Metrics.
Metric | Description
Return | Absolute net gain/loss per trade (LLM vs. Virtual Trader).
Sharpe Ratio | Risk-adjusted return measured over a rolling window, standardizing performance relative to volatility exposure.
Max Drawdown | Largest cumulative loss from peak to trough.
Win Rate | Percentage of trades yielding positive returns, reflecting execution consistency.
Missed Opportunity Rate | Percentage of profitable LLM trades skipped by the Virtual Trader, isolating the behavioral and cognitive cost of bounded rationality and conservatism.
Latency-adjusted ROI | Return per unit of evaluation time, accounting for cognitive processing speed as a contributor to performance differentials.
These complementary metrics allow for rigorous decomposition of performance differences, ensuring that cognitive uplift from LLM augmentation is empirically distinguishable from alternative explanations such as strategy bias, volatility regime effects, or random variation.
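The first four metrics in Table A8 can be computed directly from per-trade return series. A minimal sketch with illustrative inputs (the Sharpe ratio here is per-period and unannualized; the example arrays are toy data, not study output):

```python
import numpy as np

def sharpe_ratio(returns: np.ndarray, rf: float = 0.0) -> float:
    """Per-period Sharpe ratio: mean excess return over its volatility."""
    excess = returns - rf
    return float(excess.mean() / excess.std(ddof=1))

def max_drawdown(returns: np.ndarray) -> float:
    """Largest peak-to-trough decline of the cumulative equity curve."""
    equity = np.cumprod(1.0 + returns)
    peaks = np.maximum.accumulate(equity)
    return float(((equity - peaks) / peaks).min())

def win_rate(returns: np.ndarray) -> float:
    """Share of trades with a positive return."""
    return float((returns > 0).mean())

def missed_opportunity_rate(llm_returns: np.ndarray,
                            vt_executed: np.ndarray) -> float:
    """Share of profitable LLM trades that the Virtual Trader skipped."""
    profitable = llm_returns > 0
    skipped = ~vt_executed
    return float((profitable & skipped).sum() / profitable.sum())

# Illustrative per-trade returns and Virtual Trader execution flags
llm = np.array([0.02, -0.01, 0.03, 0.01, -0.02])
vt_exec = np.array([True, True, False, True, False])
print(win_rate(llm), missed_opportunity_rate(llm, vt_exec))  # 0.6 0.333...
```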

Appendix B.6. Limitations and Extensions

Despite the methodological rigor of the dual-agent simulation framework, several limitations persist that constrain internal validity, ecological realism, and generalizability. These issues are addressed in part by a set of proposed extensions, designed to enhance future iterations of this study.
  • Residual Methodological Constraints
  • Baseline Dependence on LLM Validation
    The quality of the Virtual Trader simulation is constrained by the reliability of the LLM-assisted trade it mirrors. Without robust hallucination detection and contradiction screening, the benchmarked trade may encode subtle structural flaws. Even validated LLMs exhibit confident errors under stress scenarios or adversarial prompts (Ganguli et al., 2022b; Shinn et al., 2024).
  • Static Agent Architecture
    Digital Personas are designed as non-learning, non-adaptive agents to preserve causal inference integrity. This excludes path-dependent decision-making, memory of past trades, and incentive-based learning—all of which play meaningful roles in real-world behavior. While this design choice ensures clean counterfactual attribution, it limits realism, particularly for experienced investors (Lux & Zwinkels, 2018).
  • Absence of Trader-Type Calibration
    Degradation parameters applied to the Virtual Trader are drawn from empirical aggregate benchmarks but are not yet tailored to specific archetypes (e.g., high-frequency traders, swing traders, retirees). This generalization may mask behaviorally relevant asymmetries in bounded cognition across subgroups (Almansour & Elkrghli, 2023; Ruggeri et al., 2023).
  • Ecological Validity Gaps
    The simulation environment abstracts away real-world frictions such as bid-ask spreads, slippage, mobile interface constraints, and platform-induced cognitive load. These omitted variables are known to shape live trade execution behavior (Barber & Odean, 2002; Wheeler & Varner, 2024).
  • Validation Architecture
To address foundational concerns around LLM fallibility and model reproducibility, a multi-tiered validation framework is proposed:
  • LLM Baseline Reliability
  • Ensemble Model Comparison: Cross-validation across distinct LLM families (e.g., GPT-4, Claude, Gemini).
  • Confidence Scoring and Contradiction Detection: Flag low-certainty and conflicting outputs.
  • Expert Calibration: Manual review of a sample of LLM trades by human finance experts.
  • Adversarial and Regime Testing
  • Hallucination Stress Tests: Detecting factual inconsistencies in rationale.
  • Bias Amplification Monitoring: Checking for recency, confirmation, and availability biases.
  • Scenario Boundaries: Evaluating LLM consistency in volatile markets and during black-swan events.
  • Reproducibility and Robustness Protocols
To ensure internal coherence and external generalizability, a triple-validation architecture is recommended:
Table A9. Internal & External Coherence.
Validation Tier | Purpose
Internal | Trait–behavior coherence in Personas; degradation logic stability across trade types
External | Consistency with behavioral finance findings and retail trading datasets
Temporal | Simulation outcomes replayed on out-of-sample market data for cross-period validation
Reproducibility is further supported by bounding randomness (LLM temperature = 0.4), prompt anchoring, and multiple reruns per scenario.

Appendix B.7. Sensitivity and Stress Testing

Simulation fidelity can be improved through parameter sensitivity analysis, including the following:
  • ±20% variation across all degradation weights.
  • Alternate prompt constructions for the same persona.
  • LLM output variation across prompt phrasings and temperature ranges.
  • Strategy divergence testing under high-volatility, low-liquidity, and regulatory change scenarios.
These extensions allow the framework to quantify how tightly results depend on model configuration choices.
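The ±20% sweep over degradation weights can be enumerated as a full factorial grid. A sketch in Python; the parameter names and baseline values are illustrative placeholders, not the study's calibrated set:

```python
import itertools

# Hypothetical baseline degradation weights for the Virtual Trader;
# parameter names and values are illustrative, not the calibrated set.
BASELINE = {"processing_speed": 0.25, "working_memory": 0.40, "fatigue": 0.15}

def sweep(baseline: dict, pct: float = 0.20):
    """Yield every configuration in the full factorial grid of
    low (-20%), base, and high (+20%) settings per weight."""
    levels = {k: (v * (1 - pct), v, v * (1 + pct)) for k, v in baseline.items()}
    keys = list(levels)
    for combo in itertools.product(*(levels[k] for k in keys)):
        yield dict(zip(keys, combo))

configs = list(sweep(BASELINE))
print(len(configs))  # 3 levels ^ 3 parameters = 27 configurations
```

Each configuration would be replayed through the simulation, and the dispersion of outcomes across the 27 runs quantifies sensitivity to the calibration choices.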
  • Pathways for Extension and Scalability
Future research should explore the following:
  • Hybrid Agents: Combining degradation-based impairments with stable persona traits to simulate boundedly rational learning agents.
  • Reinforcement-Conditioned Personas: Selectively enabling feedback loops in simulation sessions longer than 10 trades.
  • Memory Integration: Using tokenized state representation to simulate persistent trade cognition.
  • Real-Time Market APIs: Embedding streaming volatility and flow data into simulation contexts.
  • Cross-Market Generalization: Testing framework adaptability in international and decentralized finance (DeFi) markets.

Appendix B.8. Human Validation

Human validation of simulated trades will be employed to mitigate concerns regarding algorithmic specificity and to enhance the external validity of findings. Following established validation methodologies from clinical decision-making research (Peabody et al., 2000; Lakens, 2017), a panel of 30–50 practicing traders recruited via brokerage partners will evaluate anonymized trade vignettes executed during the dual-agent simulation. Each trader independently decides whether they would “buy,” “sell,” “hold,” or “skip” each scenario, providing a direct comparative benchmark against simulated outcomes. A logistic mixed-effects model will then assess the alignment between human and simulated decisions, formally testing equivalence using the two one-sided tests (TOST) procedure recommended by Lakens (2017). Although the number of respondents is modest, the clustered data structure yields approximately 400 decision points, providing sufficient statistical power for inference (Snijders & Bosker, 2012). Key limitations of this approach, particularly potential digital-literacy bias among brokerage-sourced respondents, will be explicitly acknowledged in the discussion to avoid undue generalizations.
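The equivalence step can be sketched with a normal-approximation TOST on two agreement proportions. This is a simplified stand-in for the planned mixed-effects analysis: the ±10-percentage-point margin and the example counts are illustrative assumptions, and the pre-registered analysis would embed equivalence bounds in the logistic mixed-effects model instead.

```python
from math import sqrt, erf

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def tost_two_proportions(p1, n1, p2, n2, margin=0.10):
    """Two one-sided tests (TOST) for equivalence of two proportions,
    normal approximation. The +/-10 percentage-point margin is an
    illustrative choice, not the study's pre-registered bound."""
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    p_lower = 1.0 - norm_cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = norm_cdf((diff - margin) / se)        # H0: diff >= +margin
    return max(p_lower, p_upper)  # equivalence claimed if below alpha

# e.g., 72% "buy" among 200 human decisions vs. 70% among 200 simulated
p = tost_two_proportions(0.72, 200, 0.70, 200)
print(p < 0.05)  # True: equivalent within the illustrative margin
```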

Appendix B.9. Behavioral Bias Operationalization (Investor-Level)

This subsection pre-specifies investor-level bias measures for the Large Language Model (LLM)-Augmented Trader (LAT) and its two replay controls—the Virtual Trader (VT; bounded-rule comparator) and the Digital Persona (DP; behaviorally specified baseline). These measures are computed from existing workflow artifacts and are compatible with pre-registration and versioning.
Table A10. Behavioral bias operationalization (investor-level).
Measure | Operational Rule (Summary) | Inputs (From Workflow) | Interpretation
Calibration error (probabilistic) | For decisions with explicit probabilities, compute the Brier score and decompose it into reliability/resolution/uncertainty; where interval forecasts are present, report empirical coverage vs. nominal. Aggregate by agent and period. | Decision records with stated probabilities and/or intervals; realized outcomes | Lower Brier and higher reliability imply better calibration
Disposition effect (PGR–PLR) | Compute Proportion of Gains Realized (PGR) minus Proportion of Losses Realized (PLR); for options, define gains/losses via mark-to-market relative to entry/basis at decision time. | Trade logs and timestamps; mark-to-market states; closing/roll actions | Higher PGR–PLR indicates a stronger disposition effect
Selective exposure (confirmation) | Share of belief-congruent vs. incongruent evidence tokens in each decision’s recorded evidence slate; aggregate by agent and period. | LAT/DP evidence logs (tokenized sources/claims), decision label | Higher congruent-share indicates stronger selective exposure
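Two of the pre-specified measures reduce to short computations over decision records. A minimal sketch with a toy trade log; the field names (`paper_gain`, `realized`) are illustrative, not the workflow's schema:

```python
def brier_score(probs, outcomes):
    """Mean squared error of stated probabilities against 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def disposition_effect(trades):
    """PGR minus PLR across sell/close decision points. Each record
    flags whether the position showed a paper gain at decision time
    and whether it was realized (closed) at that point."""
    gains = [t for t in trades if t["paper_gain"]]
    losses = [t for t in trades if not t["paper_gain"]]
    pgr = sum(t["realized"] for t in gains) / len(gains)
    plr = sum(t["realized"] for t in losses) / len(losses)
    return pgr - plr

# Toy log: three paper-gain and three paper-loss decision points
trades = [
    {"paper_gain": True,  "realized": True},
    {"paper_gain": True,  "realized": True},
    {"paper_gain": True,  "realized": False},
    {"paper_gain": False, "realized": True},
    {"paper_gain": False, "realized": False},
    {"paper_gain": False, "realized": False},
]
print(round(disposition_effect(trades), 3))               # 2/3 - 1/3 = 0.333
print(round(brier_score([0.8, 0.6, 0.3], [1, 0, 0]), 3))  # 0.163
```

The Brier decomposition into reliability/resolution/uncertainty and the interval-coverage check would follow the same pattern but require binned forecasts, so they are omitted here.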

Appendix B.10. Strategy Structural Complexity (SSC) Coding (C0–C3)

Purpose and rubric. To enable like-for-like comparisons across LAT, VT, and DP (and later the human vignette panel), each trade is labeled with an ordinal SSC code reflecting its structural sophistication: C0 minimal (single-leg; no time-staging; no explicit vega posture; no conditional steps); C1 basic structured (two-leg directional/covered; one conditional decision node; no ratio or time-staging); C2 defined-risk or time-staged (multi-leg defined-risk spreads such as iron butterflies/condors, or calendars/diagonals with explicit term-structure rationale; two or more conditional steps; explicit long/short-vol stance); C3 path-dependent/cross-Greek (delta-neutral or ratio structures; staged exits; explicit cross-Greek management; three or more conditional steps). Tie-break rules: calendars/diagonals default to C2; ratio backspreads default to C3; covered calls/cash-secured puts default to C1; four-leg defined-risk spreads default to C2 unless a delta-neutral management plan is explicit (then C3). The SSC label is used as a PCA-relevant outcome for Equation (1)—e.g., Pr(C ≥ 2)—and for descriptive reporting of complexity distributions by condition and over time; it does not alter the pre-specified continuous components. The coding aligns with the strategy families listed in Appendix B and mirrored in the OSF preregistration; the brief SSC guide and worked examples will be uploaded to the OSF record as a new component.
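The rubric and its tie-break defaults can be expressed as a deterministic coder. A sketch assuming hypothetical feature fields (`family`, `legs`, `conditional_steps`, and so on) extracted from each trade record; the actual coding sheet on OSF would fix the schema:

```python
def ssc_code(trade: dict) -> int:
    """Ordinal SSC label (0 = C0 ... 3 = C3) from structural features
    of a trade record; field names are illustrative placeholders."""
    fam = trade.get("family", "")
    # Tie-break defaults from the rubric
    if fam == "ratio_backspread":
        return 3
    if fam in ("calendar", "diagonal"):
        return 2
    if fam in ("covered_call", "cash_secured_put"):
        return 1
    if trade.get("legs", 1) == 4 and trade.get("defined_risk"):
        return 3 if trade.get("delta_neutral_plan") else 2
    # General rubric, checked from most to least complex
    if (trade.get("delta_neutral_plan") or trade.get("cross_greek")
            or trade.get("conditional_steps", 0) >= 3):
        return 3
    if (trade.get("conditional_steps", 0) >= 2 or trade.get("time_staged")
            or (trade.get("legs", 1) >= 3 and trade.get("defined_risk"))):
        return 2
    if trade.get("legs", 1) == 2 or trade.get("conditional_steps", 0) == 1:
        return 1
    return 0

print(ssc_code({"family": "iron_condor", "legs": 4, "defined_risk": True}))  # 2
print(ssc_code({"family": "long_call", "legs": 1}))                          # 0
```

The Pr(C ≥ 2) outcome for Equation (1) is then simply `ssc_code(trade) >= 2` aggregated by condition.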

Appendix C. Candidate Empirical Methods for Market-Level Diagnostics

This appendix outlines candidate empirical techniques for observing whether cumulative behavioral changes induced by LLM usage among retail investors manifest in measurable market patterns. These methods are designed to complement the BSI and the investor-level diagnostics presented in Step 5 and Section 2.6 of the main text. While the BSI captures within-investor transitions in risk profile and strategic complexity, the methods below serve as diagnostic tools for evaluating whether such shifts leave detectable traces at the asset or market level.
These diagnostics are not treated as predefined hypotheses or core validation tools, but as optional extensions that align with the dual theoretical lens of EMH and AMH presented in Section 5.
The selection of candidate diagnostics follows established empirical practices in financial econometrics and behavioral finance. All methods listed below have been extensively applied in prior research to detect deviations from informational efficiency, structural shifts, and adaptive behavioral patterns in asset markets. Techniques such as matched asset comparisons (Barber et al., 2009; Abadie et al., 2010), structural break tests (Bai & Perron, 2003; Pástor & Veronesi, 2013), and variance ratio analysis (A. W. Lo & MacKinlay, 1988) represent standard approaches for identifying price anomalies and testing market efficiency. Similarly, diagnostics aligned with the AMH—such as anomaly decay modeling (Mclean & Pontiff, 2016), phased adaptation timelines (Hommes, 2013; A. W. Lo, 2019), and behavioral clustering metrics (Khorana et al., 1999; Barber et al., 2021)—are well documented in both theoretical and applied studies. While the specific application to LLM-induced retail investor behavior is novel, the methodological foundation draws directly from validated tools widely used in the empirical study of market dynamics.

Appendix C.1. EMH-Aligned Diagnostics: Detecting Deviations from Informational Efficiency

These methods aim to assess whether LLM-induced behaviors give rise to price patterns inconsistent with the weak-form or semi-strong-form EMH, such as persistent abnormal returns, delayed price reactions, or volatility clustering.
Table A11. EMH Alignment Diagnostic.
Method | Description | Key Metric
Matched Asset Counterfactuals | Compare high–LLM-exposure assets with volatility- and beta-matched control assets to isolate LLM effects on performance and volatility | Differential Sharpe ratios, return drift, realized volatility
Event Window Structural Breaks | Use LLM release points (e.g., ChatGPT, GPT-4) as anchors to detect structural breaks in return distributions or volatility regimes | Chow tests, Bai–Perron segmentation
Return Drift/Variance Ratio Testing | Examine whether LLM-flagged trades or tickers show non-random post-entry price behavior | Variance ratio, autocorrelation, CARs
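The variance ratio diagnostic follows Lo and MacKinlay (1988). A minimal sketch of the point estimate, checked on a simulated random walk; the heteroskedasticity-robust test statistic is omitted and would be added for formal inference:

```python
import numpy as np

def variance_ratio(prices: np.ndarray, q: int = 2) -> float:
    """Lo-MacKinlay variance ratio: variance of q-period log returns
    over q times the variance of 1-period log returns. VR = 1 under a
    random walk; VR > 1 suggests momentum, VR < 1 mean reversion.
    Point estimate only (no heteroskedasticity-robust z-statistic)."""
    log_p = np.log(prices)
    r1 = np.diff(log_p)
    rq = log_p[q:] - log_p[:-q]
    return float(rq.var(ddof=1) / (q * r1.var(ddof=1)))

# Simulated geometric random walk: the estimate should sit near 1.0
rng = np.random.default_rng(0)
walk = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 2000)))
print(round(variance_ratio(walk, q=5), 2))
```

Applied to LLM-flagged tickers, a sustained departure of VR from 1 in post-entry windows would be the non-randomness signal this diagnostic targets.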

Appendix C.2. AMH-Aligned Diagnostics: Interpreting Adaptive Behavioral Evolution

These methods evaluate whether observed anomalies persist, decay, or adapt over time—consistent with the AMH’s interpretation of market behavior as the result of bounded rational learning and environmental adjustment.
Table A12. AMH Alignment Diagnostic.
Method | Description | Key Metric
Time-Interacted Anomaly Decay | Model anomaly persistence as a function of time since LLM diffusion, capturing performance normalization | Time-interacted Sharpe ratio regressions, cohort decay coefficients
Three-Stage Adaptation Timeline | Operationalize AMH as a timeline: Stage 1 (0–6 months: surge); Stage 2 (6–18 months: crowding); Stage 3 (18+ months: absorption) | Time-windowed anomaly strength
Strategy Clustering and Mimicry | Detect convergence in retail behavior via prompt similarity, repeated ticker targeting, or synchronized entry timing | Cosine similarity (prompts); correlation matrices (tickers); timestamp clustering

Appendix C.3. Integration with Core Behavioral Framework

These candidate methods extend the individual-level measurement framework—centered around the BSI—into observable financial data. Their role is diagnostic: to identify whether LLM-induced behavioral shifts accumulate into statistically significant, time-sensitive, or cohort-specific effects in asset prices and flows.
Researchers may implement these methods based on data access and the specific characteristics of their sample (e.g., brokerage-level execution logs, prompt history, or Reddit/Twitter behavioral proxies). These tools also allow for triangulation: behavioral survey data (e.g., perceived control, confidence miscalibration) can be paired with asset-level diagnostics to assess whether psychological shifts are market-reflected.

Notes

1. GARCH: Generalized autoregressive conditional heteroskedasticity model.
2. https://osf.io/ (accessed on 30 August 2025).
3. For most LLMs, 4 tokens approximately equal 3 words; https://www.baseten.co/ (accessed on 30 August 2025).
4. https://www.anthropic.com/claude/sonnet (accessed on 30 August 2025).
5. While both Processing Speed and Cognitive Fatigue limitations may contribute to reduced human throughput relative to LLM performance, the current framework treats these as potentially overlapping but conceptually distinct constraints. The 25% throughput estimate suggested for processing speed analysis may partially or fully encompass fatigue-related decrements. Further empirical investigation is required to decompose these effects and determine whether they represent independent or confounded limitations. For the purposes of this simulation framework, we implement separate parameters while acknowledging that this represents a preliminary operationalization pending empirical validation.

References

  1. Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association, 105(490), 493–505. [Google Scholar] [CrossRef]
  2. Abdullahi, M. (2021). The efficient market hypothesis: A critical review of equilibrium models and empirical evidence. African Scholar Journal of Mgt. Science and Entrepreneurship, 23(7), 379–386. Available online: http://www.africanscholarpublications.com/wp-content/uploads/2022/03/AJMSE_Vol23_No7_Dec2021-23.pdf (accessed on 10 February 2025).
  3. Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211. [Google Scholar] [CrossRef]
  4. Akepanidtaworn, K., Mascio, R. D., Imas, A., & Schmidt, L. D. W. (2023). Selling fast and buying slow: Heuristics and trading performance of institutional investors. The Journal of Finance, 78(6), 3055–3098. [Google Scholar] [CrossRef]
  5. Alhamad, H., & Donyai, P. (2021). The validity of the theory of planned behavior for understanding people’s beliefs and intentions toward reusing medicines. Pharmacy, 9(1), 58. [Google Scholar] [CrossRef]
  6. Almansour, B. Y., & Elkrghli, S. (2023). Behavioral finance factors and investment decisions: A mediating role of risk perception. Cogent Economics & Finance, 11(2), 2239032. [Google Scholar] [CrossRef]
  7. Alsup, A. (2023, April 25). Retail investors play a losing game with complex options, according to research. UF Warrington News. Available online: https://news.warrington.ufl.edu/faculty-and-research/retail-investors-play-a-losing-game-with-complex-options-according-to-research/ (accessed on 22 June 2025).
  8. Anthis, J. R., Liu, R., Richardson, S. M., Kozlowski, A. C., Koch, B., Evans, J., Brynjolfsson, E., & Bernstein, M. (2025). LLM social simulations are a promising research method. arXiv. [Google Scholar] [CrossRef]
  9. Aqham, A., Endaryati, E., Subroto, V., & Kusumajaya, R. (2024). Behavioral biases in investment decisions: A mixed-methods study on retail investors in emerging markets. Journal of Management and Informatics, 3, 568–586. [Google Scholar] [CrossRef]
  10. Armitage, C. J., & Conner, M. (2001). Efficacy of the theory of planned behavior: A meta-analytic review. The British Journal of Social Psychology, 40(Pt 4), 471–499. [Google Scholar] [CrossRef]
  11. Athey, S., & Wager, S. (2019). Estimating treatment effects with causal forests: An application. arXiv. [Google Scholar] [CrossRef]
  12. Bahaj, A., Rahimi, H., Chetouani, M., & Ghogho, M. (2025). Gauging overprecision in LLMs: An empirical study. arXiv. [Google Scholar] [CrossRef]
  13. Bai, J., & Perron, P. (2003). Computation and analysis of multiple structural change models. Journal of Applied Econometrics, 18(1), 1–22. [Google Scholar] [CrossRef]
  14. Bandi, F. M., Fusari, N., & Renò, R. (2023). 0DTE option pricing. Social Science Research Network. [Google Scholar] [CrossRef]
  15. Bandura, A. (1997). Self-efficacy: The exercise of control (pp. ix, 604). W H Freeman/Times Books/Henry Holt & Co. [Google Scholar]
  16. Barber, B. M., Huang, X., Odean, T., & Schwarz, C. (2021). Attention induced trading and returns: Evidence from robinhood users. Social Science Research Network. [Google Scholar] [CrossRef]
  17. Barber, B. M., & Odean, T. (2000). Trading is hazardous to your wealth: The common stock investment performance of individual investors. The Journal of Finance, 55(2), 773–806. [Google Scholar] [CrossRef]
  18. Barber, B. M., & Odean, T. (2002). Online investors: Do the slow die first? The Review of Financial Studies, 15(2), 455–488. [Google Scholar] [CrossRef]
  19. Barber, B. M., Odean, T., & Zhu, N. (2009). Do retail trades move markets? The Review of Financial Studies, 22(1), 151–186. [Google Scholar] [CrossRef]
  20. Barberis, N., Huang, M., & Santos, T. (1999). Prospect theory and asset prices. Social Science Research Network. [Google Scholar] [CrossRef]
  21. Barberis, N., & Thaler, R. H. (2002). A survey of behavioral finance. Social Science Research Network. Available online: https://papers.ssrn.com/abstract=332266 (accessed on 24 June 2025).
  22. Bartsch, H., Jorgensen, O., Rosati, D., Hoelscher-Obermaier, J., & Pfau, J. (2023). Self-consistency of large language models under ambiguity. arXiv. [Google Scholar] [CrossRef]
  23. Baumeister, R. F. (2018). Self-regulation and self-control: Selected works of Roy F. Baumeister (1st ed.). Routledge. [Google Scholar] [CrossRef]
  24. Bawalle, A. A., Khan, M. S. R., & Kadoya, Y. (2025). Overconfidence, financial literacy, and panic selling: Evidence from Japan. PLoS ONE, 20(3), e0315622. [Google Scholar] [CrossRef]
  25. Behrens, M., Gube, M., Chaabene, H., Prieske, O., Zenon, A., Broscheid, K.-C., Schega, L., Husmann, F., & Weippert, M. (2023). Fatigue and human performance: An updated framework. Sports Medicine, 53(1), 7–31. [Google Scholar] [CrossRef]
  26. Belanche, D., Casaló Ariño, L., & Flavian, C. (2019). Artificial intelligence in fintech: Understanding robo-advisors adoption among customers. Industrial Management & Data Systems, 119, 1411–1430. [Google Scholar] [CrossRef]
  27. Bewersdorff, A., Hartmann, C., Hornberger, M., Seßler, K., Bannert, M., Kasneci, E., Kasneci, G., Zhai, X., & Nerdel, C. (2025). Taking the next step with generative artificial intelligence: The transformative role of multimodal large language models in science education. Learning and Individual Differences, 118, 102601. [Google Scholar] [CrossRef]
  28. Bogousslavsky, V., & Muravyev, D. (2024). An anatomy of retail option trading. SSRN Electronic Journal. [Google Scholar] [CrossRef]
  29. Borman, H., Leontjeva, A., Pizzato, L., Jiang, M., & Jermyn, D. (2024). Do LLM personas dream of bull markets? Comparing human and AI investment strategies through the lens of the five-factor model. arXiv. [Google Scholar] [CrossRef]
  30. Bortoli, D. D., Costa, D. d., Jr., Goulart, M., & Campara, J. (2019). Personality traits and investor profile analysis: A behavioral finance study. PLoS ONE, 14(3), e0214062. [Google Scholar] [CrossRef]
  31. Boussioux, L. (2024). Narrative AI and the human-AI oversight paradox in evaluating early-stage innovations. Available online: https://pubsonline.informs.org/do/10.1287/2f948394-3eb2-40b0-aaff-6c621f5f5ab9 (accessed on 22 June 2025).
  32. Briere, M. (2023). Retail investors’ behavior in the digital age: How digitalization is impacting investment decisions. Social Science Research Network. [Google Scholar] [CrossRef]
  33. Broadbent, D. E. (1958). Perception and communication (pp. v, 338). Pergamon Press. [Google Scholar] [CrossRef]
  34. Bruine De Bruin, W., Parker, A. M., & Fischhoff, B. (2007). Individual differences in adult decision-making competence. Journal of Personality and Social Psychology, 92(5), 938–956. [Google Scholar] [CrossRef]
  35. Brysbaert, M. (2019). How many words do we read per minute? A review and meta-analysis of reading rate. Journal of Memory and Language, 109, 104047. [Google Scholar] [CrossRef]
  36. Bryzgalova, S., Pavlova, A., & Sikorskaya, T. (2023). Retail trading in options and the rise of the big three wholesalers. The Journal of Finance, 78(6), 3465–3514. [Google Scholar] [CrossRef]
  37. Buçinca, Z., Lin, P., Gajos, K. Z., & Glassman, E. L. (2020, March 17–20). Proxy tasks and subjective measures can be misleading in evaluating explainable ai systems. 25th International Conference on Intelligent User Interfaces. IUI ’20: 25th International Conference on Intelligent User Interfaces (pp. 454–464), Cagliari, Italy. [Google Scholar] [CrossRef]
  38. Byrd, D., Hybinette, M., & Balch, T. H. (2019). ABIDES: Towards high-fidelity market simulation for AI research. arXiv. [Google Scholar] [CrossRef]
  39. Campbell, S. D., & Sharpe, S. A. (2009). Anchoring bias in consensus forecasts and its effect on market prices. Journal of Financial and Quantitative Analysis, 44(2), 369–390. [Google Scholar] [CrossRef]
  40. Castro, S. C., Strayer, D. L., Matzke, D., & Heathcote, A. (2019). Cognitive workload measurement and modeling under divided attention. Journal of Experimental Psychology. Human Perception and Performance, 45(6), 826–839. [Google Scholar] [CrossRef]
  41. Cen, L., Hilary, G., & Wei, J. (2013). The role of anchoring bias in the equity market: Evidence from analysts’ earnings forecasts and stock returns. The Journal of Financial and Quantitative Analysis, 48(1), 47–76. [Google Scholar] [CrossRef]
  42. Chai, W. J., Abd Hamid, A. I., & Abdullah, J. M. (2018). Working memory from the psychological and neurosciences perspectives: A review. Frontiers in Psychology, 9, 401. [Google Scholar] [CrossRef]
  43. Chen, L., Zhang, Y., Feng, J., Chai, H., Zhang, H., Fan, B., Ma, Y., Zhang, S., Li, N., Liu, T., Sukiennik, N., Zhao, K., Li, Y., Liu, Z., Xu, F., & Li, Y. (2025). AI agent behavioral science. arXiv. [Google Scholar] [CrossRef]
  44. Chen, Z., Chen, J., Chen, J., & Sra, M. (2025). Standard benchmarks fail—Auditing LLM agents in finance must prioritize risk. arXiv. [Google Scholar] [CrossRef]
  45. Choy, S.-K. (2015). Retail clientele and option returns. Journal of Banking & Finance, 51, 26–42. [Google Scholar] [CrossRef]
  46. Chui, A. C. W., Subrahmanyam, A., & Titman, S. (2022). Momentum, reversals, and investor clientele. Review of Finance, 26(2), 217–255. [Google Scholar] [CrossRef]
  47. Compeau, D. R., & Higgins, C. A. (1995). Computer self-efficacy: Development of a measure and initial test. MIS Quarterly, 19(2), 189. [Google Scholar] [CrossRef]
  48. Conte, R., & Paolucci, M. (2014). On agent-based modeling and computational social science. Frontiers in Psychology, 5, 668. [Google Scholar] [CrossRef]
  49. Costarelli, A., Allen, M., Hauksson, R., Sodunke, G., Hariharan, S., Cheng, C., Li, W., Clymer, J., & Yadav, A. (2024). GameBench: Evaluating strategic reasoning abilities of LLM agents. arXiv. [Google Scholar] [CrossRef]
  50. Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. The Behavioral and Brain Sciences, 24, 87–114; discussion 114. [Google Scholar] [CrossRef]
  51. Cucinelli, D., Gandolfi, G., & Soana, M.-G. (2016). Customer and advisor financial decisions: The theory of planned behavior perspective. International Journal of Business and Social Science, 7(12), 80–92. [Google Scholar]
  52. Davis, F. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319. [Google Scholar] [CrossRef]
  53. Davis, F., Bagozzi, R., & Warshaw, P. (1989). User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35, 982–1003. [Google Scholar] [CrossRef]
  54. Dawid, P. (1982). The well-calibrated bayesian. Journal of the American Statistical Association, 77(379), 605–610. [Google Scholar] [CrossRef]
  55. DeVellis, R. F. (2016). Scale development: Theory and applications. SAGE Publications. [Google Scholar]
  56. D’Hondt, C., Petitjean, M., & Elhichou, Y. (2023). Uncovering the profile of passive exchange-traded fund retail investors. Forthcoming in Finance. SSRN Working Paper No. 3522963. Available online: https://ssrn.com/abstract=3522963 (accessed on 22 June 2025).
  57. Diamond, N., & Perkins, G. (2022). Using intermarket data to evaluate the efficient market hypothesis with machine learning. arXiv. [Google Scholar] [CrossRef]
  58. Dimitriadis, K. A., Koursaros, D., & Savva, C. S. (2025). Exploring the dynamic nexus of traditional and digital assets in inflationary times: The role of safe havens, tech stocks, and cryptocurrencies. Economic Modelling, 151, 107195. [Google Scholar] [CrossRef]
  59. Dong, M. M., Stratopoulos, T. C., & Wang, V. X. (2024). A scoping review of ChatGPT research in accounting and finance. International Journal of Accounting Information Systems, 55, 100715. [Google Scholar] [CrossRef]
  60. Du, J., Huang, D., Liu, Y.-J., Shi, Y., Subrahmanyam, A., & Zhang, H. (2025). Nominal Prices, Retail Investor Participation, and Return Momentum. Management Science, Advance Online Publication, 1423. [Google Scholar] [CrossRef]
  61. East, R. (1993). Investment decisions and the theory of planned behavior. Journal of Economic Psychology, 14(2), 337–375. [Google Scholar] [CrossRef]
  62. Elly, A., John, D., Okunola, A., & Notiny, B. (2025). The impact of AI on algorithmic trading and investment strategies: Analyzing performance and risk management. Available online: https://www.researchgate.net/profile/Abiodun-Okunola-6/publication/390172832_The_Impact_of_AI_on_Algorithmic_Trading_and_Investment_Strategies_Analyzing_Performance_and_Risk_Management/links/67e321c6fe0f5a760f9034a5/The-Impact-of-AI-on-Algorithmic-Trading-and-Investment-Strategies-Analyzing-Performance-and-Risk-Management.pdf (accessed on 22 June 2025).
  63. Epstein, J. M. (1999). Agent-based computational models and generative social science. Complexity, 4(5), 41–60. [Google Scholar] [CrossRef]
  64. Eriksen, B. A., & Eriksen, C. W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics, 16(1), 143–149. [Google Scholar] [CrossRef]
  65. Escobar, L., & Pedraza, A. (2023). Active trading and (poor) performance: The social transmission channel. Journal of Financial Economics, 150(1), 139–165. [Google Scholar] [CrossRef]
  66. Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. Journal of Finance, 25(2), 383–417. [Google Scholar] [CrossRef]
  67. Fan, W., Zhu, Y., Wang, C., Wang, B., & Xu, W. (2025). Consistency of responses and continuations generated by large language models on social media. arXiv. [Google Scholar] [CrossRef]
  68. Felix, L., Kraussl, R., & Stork, P. (2020). Implied volatility sentiment: A tale of two tails. Quantitative Finance, 20(5), 823–849. [Google Scholar] [CrossRef]
  69. Fenton-O’Creevy, M., Nicholson, N., Soane, E., & Willman, P. (2003). Trading on illusions: Unrealistic perceptions of control and trading performance. Journal of Occupational and Organizational Psychology, 76(1), 53–68. [Google Scholar] [CrossRef]
  70. Ferrag, M. A., Tihanyi, N., & Debbah, M. (2025). From LLM reasoning to autonomous AI agents: A comprehensive review. arXiv. [Google Scholar] [CrossRef]
  71. Ferrari, J. R. (2001). Procrastination as self-regulation failure of performance: Effects of cognitive load, self-awareness, and time limits on ‘working best under pressure’. European Journal of Personality, 15(5), 391–406. [Google Scholar] [CrossRef]
  72. Finet, A., Kristoforidis, K., & Laznicka, J. (2025). Emotional drivers of financial decision-making: Unveiling the link between emotions and stock market behavior. Journal of Next-Generation Research 5.0, 1(3), 1–25. [Google Scholar] [CrossRef]
  73. Finucane, M. L., Alhakami, A., Slovic, P., & Johnson, S. M. (2000). The affect heuristic in judgments of risks and benefits. Journal of Behavioral Decision Making, 13(1), 1–17. [Google Scholar] [CrossRef]
  74. Firoozye, N., Tan, V., & Zohren, S. (2023). Canonical portfolios: Optimal asset and signal combination. arXiv. [Google Scholar] [CrossRef]
  75. Flavell, J. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34(10), 906–911. [Google Scholar] [CrossRef]
  76. Foltice, B., & Langer, T. (2015). Profitable momentum trading strategies for individual investors. Financial Markets and Portfolio Management, 29(2), 85–113. [Google Scholar] [CrossRef]
  77. Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39–50. [Google Scholar] [CrossRef]
  78. Galariotis, E. (2014). Contrarian and momentum trading: A review of the literature. Review of Behavioral Finance, 6, 63–82. [Google Scholar] [CrossRef]
  79. Ganguli, D., Hernandez, D., Lovitt, L., DasSarma, N., Henighan, T., Jones, A., Joseph, N., Kernion, J., Mann, B., Askell, A., Bai, Y., Chen, A., Conerly, T., Drain, D., Elhage, N., Showk, S. E., Fort, S., Hatfield-Dodds, Z., Johnston, S., … Clark, J. (2022a, June 21–24). Predictability and surprise in large generative models. 2022 ACM Conference on Fairness Accountability and Transparency (pp. 1747–1764), Seoul, Republic of Korea. [Google Scholar] [CrossRef]
  80. Ganguli, D., Lovitt, L., Kernion, J., Askell, A., Bai, Y., Kadavath, S., Mann, B., Perez, E., Schiefer, N., Ndousse, K., Jones, A., Bowman, S., Chen, A., Conerly, T., DasSarma, N., Drain, D., Elhage, N., El-Showk, S., Fort, S., … Clark, J. (2022b). Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv. [Google Scholar] [CrossRef]
  81. Gao, C., Lan, X., Li, N., Yuan, Y., Ding, J., Zhou, Z., Xu, F., & Li, Y. (2023). Large language models empowered agent-based modeling and simulation: A survey and perspectives. arXiv. [Google Scholar] [CrossRef]
  82. Gao, C., Lan, X., Li, N., Yuan, Y., Ding, J., Zhou, Z., Xu, F., & Li, Y. (2024). Large language models empowered agent-based modeling and simulation: A survey and perspectives. Humanities and Social Sciences Communications, 11(1), 1259. [Google Scholar] [CrossRef]
  83. Gempesaw, D., Henry, J. J., & Xiao, H. (2023). Retail ETF investing. Social Science Research Network. [Google Scholar] [CrossRef]
  84. Gerlich, M. (2025). AI tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies, 15(1), 6. [Google Scholar] [CrossRef]
  85. Gerrans, P., Abisekaraj, S. B., & Liu, Z. (2023). The fear of missing out on cryptocurrency and stock investments: Direct and indirect effects of financial literacy and risk tolerance. Journal of Financial Literacy and Wellbeing, 1(1), 103–137. [Google Scholar] [CrossRef]
  86. Ghaffarzadegan, N., Majumdar, A., Williams, R., & Hosseinichimeh, N. (2024). Generative agent-based modeling: An introduction and tutorial. System Dynamics Review, 40, e1761. [Google Scholar] [CrossRef]
  87. Ghosh, P. (2025). Types of trading strategies: Momentum, mean reversion, and style differences explained. QuantInsti. Available online: https://www.quantinsti.com/articles/types-trading-strategies/ (accessed on 24 June 2025).
  88. Giarlotta, A., & Petralia, A. (2024). Simon’s bounded rationality. Decisions in Economics and Finance, 47(1), 327–346. [Google Scholar] [CrossRef]
  89. Gimmelberg, D., Belinskiy, A., Valentine, A., Iveta, L., Kaže, V., & Filatov, A. (2025a). FACET—Four agent causal evaluation toolkit. Large language models for retail equity and options traders. OSF Registration. Open Science Framework. [Google Scholar] [CrossRef]
  90. Gimmelberg, D., Głowacka, M., Belinskiy, A., Korotkii, S., Artamov, V., & Ludviga, I. (2025b). Bridging human expertise and AI: Evaluating the role of large language models in retail investors’ decision-making. International Journal of Finance & Banking Studies, 14(1), 20–29. [Google Scholar] [CrossRef]
  91. Glaser, M., & Weber, M. (2007). Overconfidence and trading volume. The Geneva Risk and Insurance Review, 32(1), 1–36. [Google Scholar] [CrossRef]
  92. Goodell, J. W., Yadav, M. P., Ruan, J., Abedin, M. Z., & Malhotra, N. (2023). Traditional assets, digital assets and renewable energy: Investigating connectedness during COVID-19 and the Russia-Ukraine war. Finance Research Letters, 58, 104323. [Google Scholar] [CrossRef]
  93. Graham, J. R., & Kumar, A. (2006). Do dividend clienteles exist? Evidence on dividend preferences of retail investors. The Journal of Finance, 61(3), 1305–1336. [Google Scholar] [CrossRef]
  94. Grignoli, N., Manoni, G., Gianini, J., Schulz, P., Gabutti, L., & Petrocchi, S. (2025). Clinical decision fatigue: A systematic and scoping review with meta-synthesis. Family Medicine and Community Health, 13(1), e003033. [Google Scholar] [CrossRef] [PubMed]
  95. Grinblatt, M., & Keloharju, M. (2009). Sensation seeking, overconfidence, and trading activity. The Journal of Finance, 64(2), 549–578. [Google Scholar] [CrossRef]
  96. Grossman, M. R., & Cormack, G. V. (2011). Technology-assisted review in e-discovery can be more effective and more efficient than exhaustive manual review. Richmond Journal of Law and Technology, 17(3), 11. Available online: http://jolt.richmond.edu/v17i3/article11.pdf (accessed on 25 June 2025).
  97. Gu, J., Ye, J., Wang, G., & Yin, W. (2024, November 14–17). Adaptive and explainable margin trading via large language models on portfolio management. 5th ACM International Conference on AI in Finance (pp. 248–256), Brooklyn, NY, USA. [Google Scholar] [CrossRef]
  98. Gui, G., & Toubia, O. (2023). The challenge of using LLMs to simulate human behavior: A causal inference perspective. SSRN Electronic Journal. [Google Scholar] [CrossRef]
  99. Han, X., Sakkas, N., Danbolt, J., & Eshraghi, A. (2022). Persistence of investor sentiment and market mispricing. Financial Review, 57(3), 617–640. [Google Scholar] [CrossRef]
  100. Han, X., Wang, N., Che, S., Yang, H., Zhang, K., & Xu, S. X. (2024, November 14–17). Enhancing investment analysis: Optimizing AI-Agent collaboration in financial research. ICAIF ’24: 5th ACM International Conference on AI in Finance (pp. 538–546), Brooklyn, NY, USA. [Google Scholar] [CrossRef]
  101. Hansen, A. L., & Kazinnik, S. (2024). Can ChatGPT decipher Fedspeak? SSRN working paper No. 4399406. Available online: https://ssrn.com/abstract=4399406 (accessed on 22 June 2025).
  102. Harris, L. (2024). Algorithmic trading and portfolio optimization using big data analytics. Available online: https://www.researchgate.net/publication/386076052_Algorithmic_Trading_and_Portfolio_Optimization_Using_Big_Data_Analytics (accessed on 22 June 2025).
  103. Hart, W., Albarracín, D., Eagly, A. H., Brechan, I., Lindberg, M. J., & Merrill, L. (2009). Feeling validated versus being correct: A meta-analysis of selective exposure to information. Psychological Bulletin, 135(4), 555–588. [Google Scholar] [CrossRef]
  104. Hartley, J., Hamill, C., Batra, D., Seddon, D., Okhrati, R., & Khraishi, R. (2025). How personality traits shape LLM risk-taking behavior. arXiv. [Google Scholar] [CrossRef]
  105. Henning, T., Ojha, S. M., Spoon, R., Han, J., & Camerer, C. F. (2025). LLM trading: Analysis of LLM agent behavior in experimental asset markets. arXiv. [Google Scholar] [CrossRef]
  106. Henseler, J., Ringle, C., & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modeling. Journal of the Academy of Marketing Science, 43, 115–135. [Google Scholar] [CrossRef]
  107. Hoff, K. A., & Bashir, M. (2015). Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors, 57(3), 407–434. [Google Scholar] [CrossRef]
  108. Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960. [Google Scholar] [CrossRef]
  109. Hommes, C. (2013). Behavioral rationality and heterogeneous expectations in complex economic systems (1st ed.). Cambridge University Press. [Google Scholar] [CrossRef]
  110. Huang, J., Xiao, M., Li, D., Jiang, Z., Yang, Y., Zhang, Y., Qian, L., Wang, Y., Peng, X., Ren, Y., Xiang, R., Chen, Z., Zhang, X., He, Y., Han, W., Chen, S., Shen, L., Kim, D., Yu, Y., … Tsujii, J. (2025). Open-FinLLMs: Open multimodal large language models for financial applications. arXiv. [Google Scholar] [CrossRef]
  111. Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., & Liu, T. (2025). A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2), 1–55. [Google Scholar] [CrossRef]
  112. Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116(1), 770–799. [Google Scholar] [CrossRef]
  113. Hutchins, E. (1995). Cognition in the wild (pp. xviii, 381). The MIT Press. [Google Scholar]
  114. Jadhav, A., Pang, N., & Zhou, Y. (2025). Large language models in equity markets: Applications, opportunities, and risks. Frontiers in Artificial Intelligence, 8, 1608365. [Google Scholar] [CrossRef] [PubMed]
  115. Jakesch, M., Hancock, J., & Naaman, M. (2023). Human heuristics for AI-generated language are flawed. Proceedings of the National Academy of Sciences of the United States of America, 120, e2208839120. [Google Scholar] [CrossRef] [PubMed]
  116. Jia, J., Yuan, Z., Pan, J., McNamara, P. E., & Chen, D. (2024). Decision-making behavior evaluation framework for LLMs under uncertain context. arXiv. [Google Scholar] [CrossRef]
  117. Jiang, H., Zhang, X., Cao, X., Breazeal, C., Roy, D., & Kabbara, J. (2024). PersonaLLM: Investigating the ability of large language models to express personality traits. arXiv. [Google Scholar] [CrossRef]
  118. Jiang, Z., Peng, C., & Yan, H. (2024). Personality differences and investment decision-making. Journal of Financial Economics, 153, 103776. [Google Scholar] [CrossRef]
  119. Johnson, S. G. B., Bilovich, A., & Tuckett, D. (2023). Conviction narrative theory: A theory of choice under radical uncertainty. Behavioral and Brain Sciences, 46, 1–26. [Google Scholar] [CrossRef]
  120. Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux. [Google Scholar]
  121. Kang, H. (2021). Sample size determination and power analysis using the G*Power software. Journal of Educational Evaluation for Health Professions, 18, 17. [Google Scholar] [CrossRef]
  122. Karinshak, E., Liu, S. X., Park, J. S., & Hancock, J. T. (2023). Working with AI to persuade: Examining a large language model’s ability to generate pro-vaccination messages. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1), 1–29. [Google Scholar] [CrossRef]
  123. Kayani, U., Ullah, M., Aysan, A. F., Nazir, S., & Frempong, J. (2024). Quantile connectedness among digital assets, traditional assets, and renewable energy prices during extreme economic crisis. Technological Forecasting and Social Change, 208, 123635. [Google Scholar] [CrossRef]
  124. Khan, M. A., & Shabbir, H. (2025). Digital literacy and retail investing: Exploring market dynamics, efficiency, and stability in the digital era. Journal of Digital Literacy and Learning, 1, 20. [Google Scholar]
  125. Khorana, A., Chang, E. C., & Cheng, J. W. (1999). An examination of herd behavior in equity markets: An international perspective. Social Science Research Network. [Google Scholar] [CrossRef]
  126. Khuntia, S., & Pattanayak, J. K. (2018). Adaptive market hypothesis and evolving predictability of bitcoin. Economics Letters, 167, 26–28. [Google Scholar] [CrossRef]
  127. Kiely. (2025). Understanding performance benchmarks for LLM inference. Baseten Blog, Baseten. Available online: https://www.baseten.co/blog/understanding-performance-benchmarks-for-llm-inference/ (accessed on 24 June 2025).
  128. King, W. R., & He, J. (2006). A meta-analysis of the technology acceptance model. Information & Management, 43(6), 740–755. [Google Scholar] [CrossRef]
  129. Kirtac, K., & Germano, G. (2024). Sentiment trading with large language models. Finance Research Letters, 62, 105227. [Google Scholar] [CrossRef]
  130. Kobbeltved, T., & Wolff, K. (2009). The risk-as-feelings hypothesis in a theory-of-planned-behavior perspective. Judgment and Decision Making, 4(7), 567–586. [Google Scholar] [CrossRef]
  131. Korniotis, G. M., & Kumar, A. (2009). Do older investors make better investment decisions? Social Science Research Network. Available online: https://papers.ssrn.com/abstract=767125 (accessed on 25 June 2025).
  132. Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134. [Google Scholar] [CrossRef]
  133. Kuchinsky, S. E., Gallun, F. J., & Lee, A. K. C. (2024). Note on the dual-task paradigm and its use to measure listening effort. Trends in Hearing, 28, 23312165241292215. [Google Scholar] [CrossRef]
  134. Kumar, A. (2009). Who gambles in the stock market? The Journal of Finance, 64(4), 1889–1933. [Google Scholar] [CrossRef]
  135. Lakens, D. (2017). Equivalence tests: A practical primer for t tests, correlations, and meta-analyses. Social Psychological and Personality Science, 8(4), 355–362. [Google Scholar] [CrossRef] [PubMed]
  136. Lavie, N. (2005). Distracted and confused? Selective attention under load. Trends in Cognitive Sciences, 9(2), 75–82. [Google Scholar] [CrossRef]
  137. Leaver, M., & Reader, T. W. (2016). Human factors in financial trading: An analysis of trading incidents. Human Factors, 58(6), 814–832. [Google Scholar] [CrossRef] [PubMed]
  138. LeBaron, B. (2006). Chapter 24 agent-based computational finance. In L. Tesfatsion, & K. L. Judd (Eds.), Handbook of computational economics (pp. 1187–1233). Elsevier. [Google Scholar] [CrossRef]
  139. Lee, H.-P., Sarkar, A., Tankelevitch, L., Drosos, I., Rintel, S., Banks, R., & Wilson, N. (2025, April 26–May 1). The impact of generative AI on critical thinking: Self-reported reductions in cognitive effort and confidence effects from a survey of knowledge workers. 2025 CHI Conference on Human Factors in Computing Systems (pp. 1–22), Yokohama, Japan. [Google Scholar] [CrossRef]
  140. Lee, J. D., & See, K. A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors: The Journal of the Human Factors and Ergonomics Society, 46(1), 50–80. [Google Scholar] [CrossRef]
  141. Lerner, J. S., & Keltner, D. (2001). Fear, anger, and risk. Journal of Personality and Social Psychology, 81(1), 146–159. [Google Scholar] [CrossRef] [PubMed]
  142. Lerner, J. S., Li, Y., Valdesolo, P., & Kassam, K. S. (2015). Emotion and decision making. Annual Review of Psychology, 66(1), 799–823. [Google Scholar] [CrossRef]
  143. Li, W. W., Kim, H., Cucuringu, M., & Ma, T. (2025). Can LLM-based financial investing strategies outperform the market in long run? arXiv. [Google Scholar] [CrossRef]
  144. Li, Y., Miao, Y., Ding, X., Krishnan, R., & Padman, R. (2025). Firm or fickle? Evaluating large language models consistency in sequential interactions. arXiv. [Google Scholar] [CrossRef]
  145. Liu, Z., Guo, X., Lou, F., Zeng, L., Niu, J., Wang, Z., Xu, J., Cai, W., Yang, Z., Zhao, X., Li, C., Xu, S., Chen, D., Chen, Y., Bai, Z., & Zhang, L. (2025). Fin-R1: A large language model for financial reasoning through reinforcement learning. arXiv. [Google Scholar] [CrossRef]
  146. Lo, A. (2004). Reconciling efficient markets with behavioral finance: The adaptive markets hypothesis. Journal of Investment Consulting, 7, 21–44. [Google Scholar]
  147. Lo, A. W. (2004). The adaptive markets hypothesis. The Journal of Portfolio Management, 30(5), 15–29. [Google Scholar] [CrossRef]
  148. Lo, A. W. (2019). Adaptive markets: Financial evolution at the speed of thought (2nd ed.). Princeton University Press. [Google Scholar]
  149. Lo, A. W., & MacKinlay, A. C. (1988). Stock market prices do not follow random walks: Evidence from a simple specification test. The Review of Financial Studies, 1(1), 41–66. [Google Scholar] [CrossRef]
  150. Loewenstein, G. F., Weber, E. U., Hsee, C. K., & Welch, N. (2001). Risk as feelings. Psychological Bulletin, 127(2), 267. [Google Scholar] [CrossRef]
  151. Logg, J., Minson, J., & Moore, D. (2019). Algorithm appreciation: People prefer algorithmic to human judgment. Organizational Behavior and Human Decision Processes, 151, 90–103. [Google Scholar] [CrossRef]
  152. Lopez-Lira, A. (2025). Can large language models trade? Testing financial theories with LLM agents in market simulations. arXiv. [Google Scholar] [CrossRef]
  153. Lopez-Lira, A., & Tang, Y. (2024). Can ChatGPT forecast stock price movements? Return predictability and large language models. arXiv. [Google Scholar] [CrossRef]
  154. Lou, J., & Sun, Y. (2024). Anchoring bias in large language models: An experimental study. arXiv. [Google Scholar] [CrossRef]
  155. Lux, T., & Zwinkels, R. C. J. (2018). Empirical validation of agent-based models. In Handbook of computational economics (pp. 437–488). Elsevier. [Google Scholar] [CrossRef]
  156. MacKinlay, A. C. (1997). Event studies in economics and finance. Journal of Economic Literature, 35(1), 13–39. [Google Scholar]
  157. Mainali, M., & Weber, R. O. (2025). Exploring cognitive attributes in financial decision-making. arXiv. [Google Scholar] [CrossRef]
  158. Marakas, G. M., Yi, M. Y., & Johnson, R. D. (1998). The multilevel and multifaceted character of computer self-efficacy: Toward clarification of the construct and an integrative framework for research. Information Systems Research, 9(2), 126–163. [Google Scholar] [CrossRef]
  159. Martin, L., Whitehouse, N., Yiu, S., Catterson, L., & Perera, R. (2024). Better call GPT, comparing large language models against lawyers. arXiv. [Google Scholar] [CrossRef]
  160. Martinez-Blasco, M., Serrano, V., Prior, F., & Cuadros, J. (2023). Analysis of an event study using the Fama–French five-factor model: Teaching approaches including spreadsheets and the R programming language. Financial Innovation, 9(1), 76. [Google Scholar] [CrossRef]
  161. Mclean, R. D., & Pontiff, J. (2016). Does academic research destroy stock return predictability? The Journal of Finance, 71(1), 5–32. [Google Scholar] [CrossRef]
  162. McNulty, K. (2021). Handbook of regression modeling in people analytics: With examples in R and Python. Available online: https://peopleanalytics-regression-book.org/gitbook/power-tests.html (accessed on 25 June 2025).
  163. Meissner, P., & Wulf, T. (2013). Cognitive benefits of scenario planning: Its impact on biases and decision quality. Technological Forecasting and Social Change, 80(4), 801–814. [Google Scholar] [CrossRef]
  164. Middlebrooks, C. D., Kerr, T., & Castel, A. D. (2017). Selectively distracted: Divided attention and memory for important information. Psychological Science, 28(8), 1103–1115. [Google Scholar] [CrossRef]
  165. Miguel, A. F., & Su, D. (2019). Explaining differences in the flow-performance sensitivity of retail and institutional mutual funds—International evidence. Theoretical Economics Letters, 9(7), 2711–2731. [Google Scholar] [CrossRef]
  166. Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97. [Google Scholar] [CrossRef]
  167. Morales-García, W. C., Sairitupa-Sanchez, L. Z., Morales-García, S. B., & Morales-García, M. (2024). Adaptation and psychometric properties of a brief version of the general self-efficacy scale for use with artificial intelligence (GSE-6AI) among university students. Frontiers in Education, 9, 1293437. [Google Scholar] [CrossRef]
  168. Naranjo, A., Nimalendran, M., & Wu, Y. (2023). Betting on elusive returns: Retail trading in complex options. Social Science Research Network. [Google Scholar] [CrossRef]
  169. Narayan, S. W., Rehman, M. U., Ren, Y.-S., & Ma, C. (2023). Is a correlation-based investment strategy beneficial for long-term international portfolio investors? Financial Innovation, 9(1), 64. [Google Scholar] [CrossRef]
  170. Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175–220. [Google Scholar] [CrossRef]
  171. Nie, Y., Kong, Y., Dong, X., Mulvey, J. M., Poor, H. V., Wen, Q., & Zohren, S. (2024). A survey of large language models for financial applications: Progress, prospects and challenges. arXiv. [Google Scholar] [CrossRef]
  172. Odean, T. (1998). Are investors reluctant to realize their losses? The Journal of Finance, 53(5), 1775–1798. [Google Scholar] [CrossRef]
  173. Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39(2), 230–253. [Google Scholar] [CrossRef]
  174. Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans, 30(3), 286–297. [Google Scholar] [CrossRef]
  175. Park, J., Konana, P., Gu, B., Kumar, A., & Raghunathan, R. (2010). Confirmation bias, overconfidence, and investment performance: Evidence from stock message boards. Social Science Research Network. [Google Scholar] [CrossRef]
  176. Park, J. S., O’Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. arXiv. [Google Scholar] [CrossRef]
  177. Parker, J. A., Schoar, A., & Sun, Y. (2020). Retail financial innovation and stock market dynamics: The case of target date funds. National Bureau of Economic Research. [Google Scholar] [CrossRef]
  178. Parte, L., Garvey, A. M., & Gonzalo-Angulo, J. A. (2018). Cognitive load theory: Why it’s important for international business teaching and financial reporting. Journal of Teaching in International Business, 29(2), 134–160. [Google Scholar] [CrossRef]
  179. Pavlou, P. A., & Fygenson, M. (2006). Understanding and predicting electronic commerce adoption: An extension of the theory of planned behavior. MIS Quarterly, 30(1), 115. [Google Scholar] [CrossRef]
  180. Payne, J. W., Bettman, J. R., & Johnson, E. J. (1993). The adaptive decision maker. Cambridge University Press. [Google Scholar]
  181. Payzan-LeNestour, E., Pradier, L., & Putniņš, T. J. (2023). Biased risk perceptions: Evidence from the laboratory and financial markets. Journal of Banking & Finance, 154, 106685. [Google Scholar] [CrossRef]
  182. Pástor, Ľ., & Veronesi, P. (2013). Political uncertainty and risk premia. Journal of Financial Economics, 110(3), 520–545. [Google Scholar] [CrossRef]
  183. Peabody, J. W., Luck, J., Glassman, P., Dresselhaus, T. R., & Lee, M. (2000). Comparison of vignettes, standardized patients, and chart abstraction: A prospective validation study of 3 methods for measuring quality. JAMA, 283(13), 1715–1722. [Google Scholar] [CrossRef]
  184. Peng, C. (2024). Emotion-impacted decision-making under risks. Advances in Social Behavior Research, 13, 68–76. [Google Scholar] [CrossRef]
  185. Peng, Y. (2024). Internet sentiment exacerbates intraday overtrading, evidence from A-Share market. arXiv. [Google Scholar] [CrossRef]
  186. Pernagallo, G., & Torrisi, B. (2022). A theory of information overload applied to perfectly efficient financial markets. Review of Behavioral Finance, 14(2), 223–236. [Google Scholar] [CrossRef]
  187. Persson, E., Barrafrem, K., Meunier, A., & Tinghög, G. (2019). The effect of decision fatigue on surgeons’ clinical decision making. Health Economics, 28(10), 1194–1203. [Google Scholar] [CrossRef]
  188. Pimenta, A., Carneiro, D., Novais, P., & Neves, J. (2014). Analysis of human performance as a measure of mental fatigue. In M. Polycarpou, A. C. P. L. F. de Carvalho, J.-S. Pan, M. Woźniak, H. Quintian, & E. Corchado (Eds.), Hybrid artificial intelligence systems (pp. 389–401). Springer International Publishing. [Google Scholar] [CrossRef]
  189. Pouget, S., Sauvagnat, J., & Villeneuve, S. (2017). A mind is a terrible thing to change: Confirmatory bias in financial markets. The Review of Financial Studies, 30(6), 2066–2109. [Google Scholar] [CrossRef]
  190. Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422. [Google Scholar] [CrossRef]
  191. Rayner, K., Schotter, E. R., Masson, M. E. J., Potter, M. C., & Treiman, R. (2016). So much to read, so little time: How do we read, and can speed reading help? Psychological Science in the Public Interest, 17(1), 4–34. [Google Scholar] [CrossRef]
  192. Rendon-Velez, E., Van Leeuwen, P. M., Happee, R., Horváth, I., Van Der Vegte, W. F., & De Winter, J. C. F. (2016). The effects of time pressure on driver performance and physiological activity: A driving simulator study. Transportation Research Part F: Traffic Psychology and Behavior, 41, 150–169. [Google Scholar] [CrossRef]
  193. Richards, D. W., Rutterford, J., Kodwani, D., & Fenton-O’Creevy, M. (2017). Stock market investors’ use of stop losses and the disposition effect. The European Journal of Finance, 23(2), 130–152. [Google Scholar] [CrossRef]
  194. Risko, E. F., & Gilbert, S. J. (2016). Cognitive offloading. Trends in Cognitive Sciences, 20(9), 676–688. [Google Scholar] [CrossRef]
  195. Rose, J. M., Roberts, F. D., & Rose, A. M. (2004). Affective responses to financial data and multimedia: The effects of information load and cognitive load. International Journal of Accounting Information Systems, 5(1), 5–24. [Google Scholar] [CrossRef]
  196. Rubinstein, A. (2013). Response time and decision making: An experimental study. Judgment and Decision Making, 8(5), 540–551. [Google Scholar] [CrossRef]
  197. Ruggeri, K., Ashcroft-Jones, S., Abate Romero Landini, G., Al-Zahli, N., Alexander, N., Andersen, M. H., Bibilouri, K., Busch, K., Cafarelli, V., Chen, J., Doubravová, B., Dugué, T., Durrani, A. A., Dutra, N., Garcia-Garzon, E., Gomes, C., Gracheva, A., Grilc, N., Gürol, D. M., … Stock, F. (2023). The persistence of cognitive biases in financial decisions across economic groups. Scientific Reports, 13(1), 10329. [Google Scholar] [CrossRef] [PubMed]
  198. Salemi, A., & Zamani, H. (2024). Evaluating retrieval quality in retrieval-augmented generation. arXiv. Available online: http://arxiv.org/abs/2404.13781 (accessed on 31 August 2025).
  199. Schlegel, K., Sommer, N. R., & Mortillaro, M. (2025). Large language models are proficient in solving and creating emotional intelligence tests. Communications Psychology, 3(1), 80. [Google Scholar] [CrossRef]
  200. Seth, H., Talwar, S., Bhatia, A., Saxena, A., & Dhir, A. (2020). Consumer resistance and inertia of retail investors: Development of the resistance adoption inertia continuance (RAIC) framework. Journal of Retailing and Consumer Services, 55, 102071. [Google Scholar] [CrossRef]
  201. Sharpe, W. F. (1994). The Sharpe ratio. The Journal of Portfolio Management, 21(1), 49–58. [Google Scholar] [CrossRef]
  202. Sheeran, P., & Webb, T. L. (2016). The intention–behavior gap. Social and Personality Psychology Compass, 10(9), 503–518. [Google Scholar] [CrossRef]
  203. Shefrin, H., & Statman, M. (1985). The disposition to sell winners too early and ride losers too long: Theory and evidence. The Journal of Finance, 40(3), 777–790. [Google Scholar] [CrossRef]
  204. Shiller, R. J. (2017). Narrative economics. American Economic Review, 107(4), 967–1004. [Google Scholar] [CrossRef]
  205. Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., & Yao, S. (2024). Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36, 1–19. Available online: https://proceedings.neurips.cc/paper_files/paper/2023/hash/1b44b878bb782e6954cd888628510e90-Abstract-Conference.html (accessed on 25 February 2024).
  206. Showalter, S., & Gropp, J. (2019). Validating weak-form market efficiency in United States stock markets with trend deterministic price data and machine learning. arXiv. [Google Scholar] [CrossRef]
  207. Simon, A. J., Gallen, C. L., Ziegler, D. A., Mishra, J., Marco, E. J., Anguera, J. A., & Gazzaley, A. (2023). Quantifying attention span across the lifespan. Frontiers in Cognition, 2, 1207428. [Google Scholar] [CrossRef]
  208. Simon, H. A. (1955). A behavioral model of rational choice. The Quarterly Journal of Economics, 69(1), 99–118. [Google Scholar] [CrossRef]
  209. Simons, D. J., & Chabris, C. F. (1999). Gorillas in our midst: Sustained inattentional blindness for dynamic events. Perception, 28(9), 1059–1074. [Google Scholar] [CrossRef] [PubMed]
  210. Singh, A. K., Devkota, S., Lamichhane, B., Dhakal, U., & Dhakal, C. (2023). The confidence-competence gap in large language models: A cognitive study. arXiv. [Google Scholar] [CrossRef]
  211. Singh, D., Malik, G., & Jha, A. (2024). Overconfidence bias among retail investors: A systematic review and future research directions. Investment Management and Financial Innovations, 21(1), 302–316. [Google Scholar] [CrossRef]
  212. Slovic, P., Finucane, M. L., Peters, E., & MacGregor, D. G. (2007). The affect heuristic. European Journal of Operational Research, 177(3), 1333–1352. [Google Scholar] [CrossRef]
  213. Sniehotta, F. (2009). An experimental test of the theory of planned behavior. Applied Psychology: Health and Well-Being, 1, 257–270. [Google Scholar] [CrossRef]
  214. Snijders, T. A., & Bosker, R. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling. SAGE Publications Ltd. Available online: https://uk.sagepub.com/en-gb/eur/multilevel-analysis/book234191 (accessed on 26 June 2025).
  215. Song, J., Xu, Z., & Zhong, Y. (2025). Out-of-distribution generalization via composition: A lens through induction heads in Transformers. Proceedings of the National Academy of Sciences, 122(6), e2417182122. [Google Scholar] [CrossRef]
  216. Spatharioti, S. E., Rothschild, D. M., Goldstein, D. G., & Hofman, J. M. (2023). Comparing traditional and LLM-based search for consumer choice: A randomized experiment. arXiv. [Google Scholar] [CrossRef]
  217. Steyvers, M., Tejeda, H., Kumar, A., Belem, C., Karny, S., Hu, X., Mayer, L., & Smyth, P. (2025). What large language models know and what people think they know. Nature Machine Intelligence, 7(2), 221–231. [Google Scholar] [CrossRef]
  218. Sumita, Y., Takeuchi, K., & Kashima, H. (2024). Cognitive biases in large language models: A survey and mitigation experiments. arXiv. [Google Scholar] [CrossRef]
  219. Sun, C. (2023). Factor correlation and the cross section of asset returns: A correlation-robust approach. Available online: https://wp.lancs.ac.uk/finec2023/files/2023/02/FEC-2023-049-Chuanping-Sun-Final.pdf (accessed on 24 June 2025).
  220. Sun, F., Li, N., Wang, K., & Goette, L. (2025). Large language models are overconfident and amplify human bias. arXiv. [Google Scholar] [CrossRef]
  221. Sussman, R., & Gifford, R. (2019). Causality in the theory of planned behavior. Personality and Social Psychology Bulletin, 45(6), 920–933. [Google Scholar] [CrossRef]
  222. Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive load theory. Springer Science & Business Media. [Google Scholar]
  223. Tan, L., Zhang, X., & Zhang, X. (2023). Retail and institutional investor trading behaviors: Evidence from China. Social Science Research Network. [Google Scholar] [CrossRef]
  224. Tatsat, H., & Shater, A. (2025). Beyond the black box: Interpretability of LLMs in finance. arXiv. [Google Scholar] [CrossRef]
  225. Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139–1168. [Google Scholar] [CrossRef]
  226. Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378. [Google Scholar] [CrossRef]
  227. Tuominen, J. (2023). Decisions under uncertainty are more messy than they seem. Behavioral and Brain Sciences, 46, e109. [Google Scholar] [CrossRef]
  228. Tversky, A., & Kahneman, D. (1986). Rational choice and the framing of decisions. The Journal of Business, 59(4), S251–S278. [Google Scholar] [CrossRef]
  229. Uhr, C., Meyer, S., & Hackethal, A. (2021). Smoking hot portfolios? Trading behavior, investment biases, and self-control failure. Journal of Empirical Finance, 63, 73–95. [Google Scholar] [CrossRef]
  230. Valeyre, S., & Aboura, S. (2024). LLMs for time series: An application for single stocks and statistical arbitrage. arXiv. [Google Scholar] [CrossRef]
  231. Varshney, N., Raj, S., Mishra, V., Chatterjee, A., Saeidi, A., Sarkar, R., & Baral, C. (2025). Investigating and addressing hallucinations of LLMs in tasks involving negation. In T. Cao, A. Das, T. Kumarage, Y. Wan, S. Krishna, N. Mehrabi, J. Dhamala, A. Ramakrishna, A. Galystan, A. Kumar, R. Gupta, & K.-W. Chang (Eds.), Proceedings of the 5th workshop on trustworthy NLP (TrustNLP 2025) (pp. 580–598). Association for Computational Linguistics. [Google Scholar] [CrossRef]
  232. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems 30 (pp. 5998–6008). Curran Associates. [Google Scholar] [CrossRef]
  233. Venkatesh, V., & Davis, F. D. (2000). A theoretical extension of the technology acceptance model: Four longitudinal field studies. Management Science, 46(2), 186–204. [Google Scholar] [CrossRef]
  234. Venkatesh, V., Morris, M. G., Davis, G. B., & Davis, F. D. (2003). User acceptance of information technology: Toward a unified view. MIS Quarterly, 27(3), 425–478. [Google Scholar] [CrossRef]
  235. Vyetrenko, S., Byrd, D., Petosa, N., Mahfouz, M., Dervovic, D., Veloso, M., & Balch, T. (2020, October 15–16). Get real: Realism metrics for robust limit order book market simulations. First ACM International Conference on AI in Finance (pp. 1–8), New York, NY, USA. [Google Scholar] [CrossRef]
  236. Wang, D., Churchill, E., Maes, P., Fan, X., Shneiderman, B., Shi, Y., & Wang, Q. (2020, April 25–30). From human-human collaboration to Human-AI collaboration: Designing AI systems that can work together with people. Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (p. 6), Honolulu, HI, USA. [Google Scholar] [CrossRef]
  237. Wang, J., Jiang, H., Liu, Y., Ma, C., Zhang, X., Pan, Y., Liu, M., Gu, P., Xia, S., Li, W., Zhang, Y., Wu, Z., Liu, Z., Zhong, T., Ge, B., Zhang, T., Qiang, N., Hu, X., Jiang, X., … Zhang, S. (2024). A comprehensive review of multimodal large language models: Performance and challenges across different tasks. arXiv. [Google Scholar] [CrossRef]
  238. Wang, Q., Gao, Y., Tang, Z., Luo, B., & He, B. (2024). Enhancing LLM trading performance with fact-subjectivity aware reasoning. arXiv. [Google Scholar] [CrossRef]
  239. Wang, Q., Tang, Z., & He, B. (2025). From ChatGPT to DeepSeek: Can LLMs simulate humanity? arXiv. [Google Scholar] [CrossRef]
  240. Wang, Y.-Y., & Chuang, Y.-W. (2023). Artificial intelligence self-efficacy: Scale development and validation. Education and Information Technologies, 29(4), 4785–4808. [Google Scholar] [CrossRef]
  241. Wang, Z., Li, Y., Wu, J., Soon, J., & Zhang, X. (2023). FinVis-GPT: A multimodal large language model for financial chart analysis. arXiv. [Google Scholar] [CrossRef]
  242. Warkulat, S., & Pelster, M. (2024). Social media attention and retail investor behavior: Evidence from r/wallstreetbets. International Review of Financial Analysis, 96, 103721. [Google Scholar] [CrossRef]
  243. Warm, J. S., Parasuraman, R., & Matthews, G. (2008). Vigilance requires hard mental work and is stressful. Human Factors, 50(3), 433–441. [Google Scholar] [CrossRef] [PubMed]
  244. Webb, T., & Sheeran, P. (2006). Does changing behavioral intentions engender behavior change? A meta-analysis of the experimental evidence. Psychological Bulletin, 132(2), 249–268. [Google Scholar] [CrossRef]
  245. Weber, M., & Camerer, C. F. (1998). The disposition effect in securities trading: An experimental analysis. Journal of Economic Behavior & Organization, 33(2), 167–184. [Google Scholar] [CrossRef]
  246. Webster, J., & Watson, R. T. (2002). Analyzing the past to prepare for the future: Writing a literature review. MIS Quarterly, 26(2), xiii–xxiii. [Google Scholar]
  247. Wheat, C., & Eckerd, G. (2024). Returns-chasing and dip-buying among retail investors. Research Snapshot. Available online: https://www.jpmorganchase.com/institute/all-topics/financial-health-wealth-creation/returns-chasing-and-dip-buying-among-retail-investors (accessed on 22 June 2025).
  248. Wheeler, A., & Varner, J. D. (2024). Scalable agent-based modeling for complex financial market simulations. arXiv. [Google Scholar] [CrossRef]
  249. Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., Kambadur, P., Rosenberg, D., & Mann, G. (2023). BloombergGPT: A large language model for finance. arXiv. [Google Scholar] [CrossRef]
  250. Xiao, J. J., & Wu, G. (2006). Applying the theory of planned behavior to retain credit counseling clients. In D. C. Bagwell (Ed.), Proceedings of the association for financial counseling and planning education (pp. 91–101). [Google Scholar] [CrossRef]
  251. Xu, M. (2025, May 2). 0DTEs decoded: Positioning, trends, and market impact. Volatility Insights. Available online: https://www.cboe.com/insights/posts/0-dt-es-decoded-positioning-trends-and-market-impact/ (accessed on 22 June 2025).
  252. Xue, S., Zhou, F., Xu, Y., Jin, M., Wen, Q., Hao, H., Dai, Q., Jiang, C., Zhao, H., Xie, S., He, J., Zhang, J., & Mei, H. (2024). WeaverBird: Empowering financial decision-making with large language model, knowledge base, and search engine. arXiv. [Google Scholar] [CrossRef]
  253. Yang, H., Zhang, B., Wang, N., Guo, C., Zhang, X., Lin, L., Wang, J., Zhou, T., Guan, M., Zhang, R., & Wang, C. D. (2024). FinRobot: An open-source AI agent platform for financial applications using large language models. arXiv. [Google Scholar] [CrossRef]
  254. Yang, J., Tang, Y., Li, Y., Zhang, L., & Zhang, H. (2025). Dynamic hedging strategies in derivatives markets with LLM-Driven sentiment and news analytics. arXiv. [Google Scholar] [CrossRef]
  255. Yang, Y., Zhang, Y., Wu, M., Zhang, K., Zhang, Y., Yu, H., Hu, Y., & Wang, B. (2025). TwinMarket: A scalable behavioral and social simulation for financial markets. arXiv. [Google Scholar] [CrossRef]
  256. Yin, R. K. (2018). Case study research and applications: Design and methods (6th ed.). Sage. Available online: https://www.scribd.com/document/687414473/YIn-2018-Case-Study (accessed on 23 June 2025).
  257. Yin, S., Fu, C., Zhao, S., Li, K., Sun, X., Xu, T., & Chen, E. (2024). A survey on multimodal large language models. National Science Review, 11(12), nwae403. [Google Scholar] [CrossRef]
  258. Ying, L., Collins, K. M., Wong, L., Sucholutsky, I., Liu, R., Weller, A., Shu, T., Griffiths, T. L., & Tenenbaum, J. B. (2025). On benchmarking human-like intelligence in machines. arXiv. [Google Scholar] [CrossRef]
  259. Yu, Y., Li, H., Chen, Z., Jiang, Y., Li, Y., Zhang, D., Liu, R., Suchow, J. W., & Khashanah, K. (2023). FinMem: A performance-enhanced LLM trading agent with layered memory and character design. arXiv. [Google Scholar] [CrossRef]
  260. Yu, Y., Yao, Z., Li, H., Deng, Z., Cao, Y., Chen, Z., Suchow, J. W., Liu, R., Cui, Z., Zhang, D., Subbalakshmi, K., Xiong, G., He, Y., Huang, J., Li, D., & Xie, Q. (2024). FinCon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making. arXiv. [Google Scholar] [CrossRef]
  261. Zhang, K., Yang, J., Inala, J. P., Singh, C., Gao, J., Su, Y., & Wang, C. (2025). Towards understanding graphical perception in large multimodal models. arXiv. [Google Scholar] [CrossRef]
  262. Zhang, W., Zhao, L., Xia, H., Sun, S., Sun, J., Qin, M., Li, X., Zhao, Y., Zhao, Y., Cai, X., Zheng, L., Wang, X., & An, B. (2024, August 25–29). A multimodal foundation agent for financial trading: Tool-augmented, diversified, and generalist. 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 4314–4325), Barcelona, Spain. [Google Scholar] [CrossRef]
  263. Zhang, Y., Li, Y., Cui, L., Cai, D., Liu, L., Fu, T., Huang, X., Zhao, E., Zhang, Y., Chen, Y., Wang, L., Luu, A. T., Bi, W., Shi, F., & Shi, S. (2023). Siren’s song in the AI ocean: A survey on hallucination in large language models. arXiv. [Google Scholar] [CrossRef]
  264. Zhang, Y., Pan, Y., Zhong, T., Dong, P., Xie, K., Liu, Y., Jiang, H., Wu, Z., Liu, Z., Zhao, W., Zhang, W., Zhao, S., Zhang, T., Jiang, X., Shen, D., Liu, T., & Zhang, X. (2024). Potential of multimodal large language models for data mining of medical images and free-text reports. Meta-Radiology, 2(4), 100103. [Google Scholar] [CrossRef]
Figure 1. Theoretical Model.
Figure 2. Five-step empirical research agenda.
Figure 3. Triangulation and Inference Logic.
Table 1. Comparison of Traditional PBC and PCA as Moderating Construct.

| Feature | Traditional PBC | PCA (AI-Scaffolded) |
|---|---|---|
| Source of confidence | Internal experience and knowledge | Access to intelligent external systems |
| Nature of reasoning | Self-reliant, effortful processing | Co-constructed with AI guidance |
| Knowledge access | Stored internally | Queried or retrieved dynamically |
| Behavioral boundary | Defined by personal capability | Extended by perceived machine cognition |
| Example | "I know how to trade options" | "I can trade options because GPT explains it" |
Table 2. EMH Operational Diagnostics.

| EMH Test Focus | Operational Measure | Interpretation |
|---|---|---|
| Weak-form efficiency | Autocorrelation in post-trade returns; Hurst exponent analysis (Hurst, 1951) | Significant autocorrelation or persistence → potential inefficiency |
| Random walk behavior | Variance ratio tests (A. W. Lo & MacKinlay, 1988); cumulative return drift | Systematic drift → violation of EMH randomness |
| Risk-adjusted performance | Sharpe ratio (Sharpe, 1994) comparisons between LLM-assisted and baseline trades | Sustained alpha with LLM → weak-form EMH violation |
| Price adjustment speed | Event study of asset price movement (MacKinlay, 1997; Martinez-Blasco et al., 2023) after LLM-identified signals | Delayed reactions suggest semi-strong inefficiency |
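The weak-form diagnostics in Table 2 can be made concrete in a few lines of code. The sketch below, a minimal illustration and not the paper's implementation, assumes a daily net-return series stored as a NumPy array; the function names and the overlapping-sum form of the variance ratio are our simplifications of the Lo–MacKinlay statistic.

```python
import numpy as np

def lag1_autocorr(returns):
    """Lag-1 autocorrelation of a return series (weak-form EMH diagnostic).
    Values far from zero indicate serial dependence in post-trade returns."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()
    return float(np.dot(r[:-1], r[1:]) / np.dot(r, r))

def variance_ratio(returns, q=2):
    """Simplified Lo-MacKinlay variance ratio VR(q):
    Var(q-period returns) / (q * Var(1-period returns)).
    VR near 1 is consistent with a random walk; persistent drift pushes VR above 1."""
    r = np.asarray(returns, dtype=float)
    var1 = r.var(ddof=1)
    rq = np.convolve(r, np.ones(q), mode="valid")  # overlapping q-period sums
    return float(rq.var(ddof=1) / (q * var1))

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio (returns assumed already net of the risk-free rate)."""
    r = np.asarray(returns, dtype=float)
    return float(np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1))
```

In the proposed agenda these statistics would be computed separately for LLM-assisted and baseline trade streams, e.g., comparing `sharpe(llm_returns)` against `sharpe(baseline_returns)` and testing whether `variance_ratio` departs from 1 after LLM-identified signals.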
Table 3. AMH Operational Diagnostics.

| AMH Construct | Observable Indicator | Measurement Method |
|---|---|---|
| Cognitive adaptation | Increase in complex strategies (e.g., spreads, straddles, delta-neutral) | Trade classification (pre/post LLM use) |
| Strategic evolution | Higher frequency of volatility exposure, use of Greeks in decision-making | Strategy tagging, prompt log analysis |
| Tool-conditioned efficiency | Return stabilization or reduced drawdowns in LLM-assisted trades | Rolling Sharpe ratios, drawdown histograms |
| Behavioral sophistication | Reduced herding, greater asset diversification | Correlation matrix of asset choices among users |
| Cross-sectional diffusion | Spread of institutional-grade strategies into retail segments | Option flow segmentation by account type |
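The tool-conditioned-efficiency measures in Table 3 (rolling Sharpe ratios and drawdowns) are straightforward to operationalize. A minimal sketch, assuming a daily net-return series; the 63-day window (roughly one trading quarter) and the function names are illustrative assumptions, not specifications from the paper.

```python
import numpy as np

def rolling_sharpe(returns, window=63, periods_per_year=252):
    """Trailing-window annualized Sharpe ratio; NaN until a full window is available.
    A stabilizing series after LLM adoption would signal tool-conditioned efficiency."""
    r = np.asarray(returns, dtype=float)
    out = np.full(r.shape, np.nan)
    for t in range(window, len(r) + 1):
        w = r[t - window:t]
        sd = w.std(ddof=1)
        if sd > 0:
            out[t - 1] = np.sqrt(periods_per_year) * w.mean() / sd
    return out

def max_drawdown(returns):
    """Maximum peak-to-trough decline of the cumulative wealth path, as a fraction."""
    wealth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    peaks = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / peaks))
```

Comparing `max_drawdown` and the distribution of `rolling_sharpe` values across pre- and post-LLM trade logs would yield the drawdown histograms and return-stabilization evidence the table calls for.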
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gimmelberg, D.; Ludviga, I. Strategic Complexity and Behavioral Distortion: Retail Investing Under Large Language Model Augmentation. Int. J. Financial Stud. 2025, 13, 210. https://doi.org/10.3390/ijfs13040210
