Article

A Large Language Model-Driven Strategic Evaluation Framework via Time-Series Directed Acyclic Graphs

Strategic Assessments and Consultation Institute, Academy of Military Science, Beijing 100091, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(4), 2007; https://doi.org/10.3390/app16042007
Submission received: 4 January 2026 / Revised: 11 February 2026 / Accepted: 12 February 2026 / Published: 18 February 2026
(This article belongs to the Special Issue Applied Machine Learning in Industry 4.0)

Abstract

Strategic evaluation is essential for decision-making under uncertainty. Yet existing qualitative and quantitative methods—including chat-oriented large language model (LLM) evaluations—are difficult to deploy in complex, dynamic environments. They often fail to represent nonlinear causal dependencies among indicators, account for temporal lags, or support scalable reasoning. To address these limitations, we propose an LLM-driven strategic evaluation framework with three innovations. First, the framework integrates LLMs across the evaluation lifecycle and couples their qualitative reasoning with quantitative model computation, improving both efficiency and deployability. Second, we introduce a Time-Series Directed Acyclic Graph (TS-DAG) indicator system that explicitly encodes causal structure and time-lagged interdependencies. Third, we develop an LLM-driven procedure that automatically derives the TS-DAG architecture and instantiates its computational parameters, reducing reliance on expert-only construction. We validate the framework through an empirical study of the new energy vehicle market, complemented by baseline algorithm comparisons and sensitivity analyses. The results show that the proposed framework can uncover core indicators, capture competitive dynamics, and explain long-term strategic outcomes across varying environmental conditions. Overall, the framework provides a robust solution for strategic evaluation in complex settings, bridging qualitative strategic reasoning and quantitative, data-driven analysis.

1. Introduction

As global competition accelerates and decision environments grow more complex, strategic evaluation has become increasingly important. Strategic evaluation provides a structured way to analyze, quantify, and predict the consequences of alternative strategies, thereby supporting decision-making [1]. However, strategic settings typically involve multiple interacting agents, nonlinear causal dependencies, and substantial uncertainty. These factors make rigorous evaluation challenging, yet essential for modern governance and enterprise planning.
Strategic evaluation typically involves two coupled mechanisms. First, strategic actions directly affect target indicators through resource allocation, capability development, market positioning, and related levers. Second, these direct effects propagate through an indicator network, where changes in one variable induce downstream adjustments in others. For example, increased Research and Development (R&D) investment may reduce short-term profitability but later improve innovation output and market share [2]. Crucially, these dependencies are both causal and time-lagged: strategic effects rarely appear immediately and often emerge after heterogeneous delays. This high-dimensional, nonlinear propagation makes strategic evaluation methodologically demanding, yet essential for anticipating long-term consequences and managing uncertainty.
Traditional strategic evaluation methods span both qualitative and quantitative paradigms. On the qualitative side, SWOT (Strengths, Weaknesses, Opportunities, and Threats) is commonly used to organize internal and external factors [3,4]. PESTEL (Political, Economic, Social, Technological, Environmental, and Legal) analysis supports macro-environmental scanning [5,6], and the Balanced Scorecard links strategic objectives to multidimensional performance indicators [7]. Despite their widespread use, these approaches rely heavily on manual expert input. This reliance introduces subjectivity, constrains scalability, and limits the granularity and data-driven interpretability needed to model complex strategic dynamics.
In the quantitative domain, multi-criteria decision-making (MCDM) methods are widely used to provide numerical assessments [8]. Representative examples include AHP (Analytic Hierarchy Process), which derives weights from hierarchical comparisons [9]; TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution), which ranks alternatives by their distance to ideal reference points [10]; and ELECTRE (Elimination and Choice Expressing Reality), which supports decision-making via outranking relations [11]. In practice, when these methods are applied to qualitative strategy evaluation, experts must still specify how each strategy affects the underlying indicators [12,13]. The model then propagates these expert-defined impacts to produce final evaluation scores. Despite their usefulness, most MCDM pipelines rely on linear aggregation and compress complex strategic dynamics into a single scalar. As a result, they struggle to represent nonlinear causal dependencies and time-lagged effects that often govern how strategic influence propagates in real settings.
Large language models (LLMs) have recently reshaped how complex problems are formulated and solved. Built on the Transformer architecture [14], these models use self-attention and are pre-trained on large-scale corpora via self-supervised learning [15,16]. This progress has also changed the landscape of strategic evaluation. LLMs provide strong semantic reasoning and contextual understanding, which helps translate high-level strategic concepts into operational, measurable indicators. When integrated with evaluation pipelines, LLMs can structure qualitative considerations into quantitative representations and automate the interpretation of unstructured inputs. This capability complements traditional models by better handling nonlinear interactions and supporting more precise, data-driven strategic assessment.
Accordingly, integrating LLMs into both qualitative [17,18,19] and quantitative [20,21,22] evaluation paradigms has become an active research direction. A common line of work treats LLMs as virtual domain experts and uses dialog-driven prompting to elicit strategic judgments or simulate decision-making [21]. However, many existing studies remain primarily conversational and are not tightly coupled with a rigorous evaluation pipeline. Using LLMs as standalone evaluators raises practical concerns: their generative nature limits interpretability, and stochastic outputs can reduce reliability. As a result, LLM-only assessments may be fluent yet fail to meet the precision and robustness requirements of high-stakes strategic evaluation.
In summary, despite progress in both traditional methods and LLMs, strategic evaluation still faces three key limitations: (1) Modeling efficiency and integration. Strategic evaluation is knowledge-intensive and highly complex, which makes model construction costly. Traditional approaches rely on prolonged expert intervention, while many LLM-based efforts remain at the level of ad hoc dialog. A systematic framework that embeds LLMs across the evaluation lifecycle—while maintaining scientific rigor and interpretability—has yet to be established. (2) Dynamic causality and temporal lags. Strategic indicators interact through nonlinear causal mechanisms and time-lagged dependencies. However, many existing paradigms rely on static structures or linear aggregation, which cannot faithfully represent how strategic influence propagates over time. (3) Scalability of indicator-system construction. Building high-dimensional indicator systems and specifying their relationships is difficult to scale when it depends primarily on domain experts. This motivates automated methods that can derive indicator architectures and instantiate their parameters in a way that better reflects underlying strategic dynamics.
To address these challenges, we propose a methodology that couples LLM-based reasoning with explicit structural modeling. The contributions are summarized as follows:
1.
LLM-Driven Strategic Evaluation Framework: We present a framework that integrates LLMs across the evaluation lifecycle to support the analysis of complex strategic environments. Evaluation elements are elicited and refined by the LLM, and candidate indicators are derived to align with the evaluation purpose and rules.
2.
TS-DAG Indicator System Architecture: A time-series directed acyclic graph (TS-DAG) representation is introduced. Acyclicity is enforced for zero-lag dependencies while causal and time-lagged relationships among indicators are encoded, enabling sequential computation and interpretable propagation of strategic influence over time.
3.
LLM-Driven TS-DAG Instantiation: We propose an automated procedure in which the TS-DAG structure is derived and node-level computational forms and parameters are instantiated by an LLM, reducing reliance on expert-only modeling. The resulting indicator system captures nonlinear relationships, causal chains, and delayed effects.
4.
Empirical Validation in the NEV Market: The proposed framework is validated in a new energy vehicle (NEV) market case study. Competitive dynamics among three representative participants (legacy automaker, technology innovator, and price leader) are analyzed, and the influence of key variables (e.g., consumer confidence, subsidy level, and battery breakthroughs) on market-share outcomes is evaluated. In addition, we conduct comparative experiments against two baselines (AHP-based and prompt-based) to assess the robustness of market-share evolution.

2. Materials and Methods

2.1. LLM-Driven Strategic Evaluation Framework

Figure 1 provides an overview of the proposed LLM-driven strategic evaluation framework. The workflow comprises four interconnected phases: (i) evaluation-element specification, (ii) indicator-system construction, (iii) strategic deduction and evaluation, and (iv) post hoc data analysis. All LLMs used as algorithmic components are denoted by $\Phi$, and the corresponding prompts are provided in the Supplementary Materials. By coupling expert input with LLM-based reasoning at each phase, the framework forms a closed-loop pipeline that maps problem definitions to quantitative insights.
Evaluation-Element Determination: The workflow begins with human–AI collaboration. Domain experts and the LLM jointly specify the evaluation constraints, including (i) the evaluation topic (strategic context), (ii) the evaluation objects (entities under review), (iii) the evaluation purpose (target outcomes), and (iv) the evaluation rules (boundaries and logical constraints).
Indicator System Construction: This phase uses LLM-based semantic reasoning to derive the indicator system topology (e.g., causal and hierarchical relations) and to instantiate the functional dependencies and parameters needed for quantitative computation.
Strategic Deduction and Evaluation: Given perturbations in key environmental variables, the framework generates strategy sequences and evaluates their impacts. The LLM acts as an inference engine that maps each strategic action to quantitative perturbations on the indicator system, enabling influence to propagate through the network over time.
Data Analysis: After deduction, the framework aggregates simulation traces and extracts actionable insights through (i) trend analysis (temporal evolution), (ii) correlation analysis (interdependencies), and (iii) importance analysis (key drivers).
The following subsections detail the core components of the framework, including the TS-DAG indicator system, its automated construction, the dynamic deduction mechanism, and the post hoc data analysis methods.

2.2. Temporal-Sequential Directed Acyclic Graph

Traditional indicator systems are often implemented as hierarchical trees, general network structures, or hybrid tree–network structures. While effective for static settings, these representations are ill-suited to strategic evaluation, where indicators span multiple domains (e.g., politics, economics, military, and diplomacy) and frequently exhibit cross-domain dependencies. Tree-structured systems typically compute node values in a strict bottom-up order. This design makes it difficult to represent lateral dependencies among indicators at the same level. Network-based systems can express such dependencies but may introduce cycles. Cycles can create ambiguous update orders and redundant computation; for example, if A influences B, B influences C, and C influences A, then A may be updated multiple times within a single iteration. Moreover, strategic influence is rarely instantaneous: causal effects often emerge with temporal delays. Conventional tree and network representations do not explicitly encode these time-lagged dependencies, limiting their ability to model how strategic effects propagate over time.
To address these challenges, we propose a Temporal-Sequential Directed Acyclic Graph (TS-DAG) representation for strategic indicator systems (Figure 2). The TS-DAG contains three node types: input nodes, constant nodes, and compute nodes. Each compute node derives its state from a function of its predecessors’ current values and time-lagged (historical) values.

2.2.1. Mathematical Modeling

A TS-DAG contains heterogeneous node types, each serving a distinct role in the computation graph. Unlike standard static graphs, its dependencies include temporal offsets: a node at time $t$ may depend on the value of another node at time $t-\tau$.
To represent this structure and ensure computability, we formalize a TS-DAG as a triple $G = (V, E, F)$, where $V$ is the node set, $E$ is the directed edge set with temporal lags, and $F = \{f_v \mid v \in V_{\mathrm{comp}}\}$ is the collection of computation functions for the compute nodes.
Node Model. Nodes in strategic evaluation systems play heterogeneous roles: some encode fixed parameters, some ingest exogenous time-series signals, and others compute derived states. To capture this heterogeneity, we partition $V$ into three pairwise disjoint subsets:
$$V = V_{\mathrm{const}} \cup V_{\mathrm{input}} \cup V_{\mathrm{comp}}, \qquad V_i \cap V_j = \varnothing \quad \forall i \neq j$$
where $V_{\mathrm{const}}$ denotes constant nodes with time-invariant values, $V_{\mathrm{input}}$ denotes input nodes driven by external time series, and $V_{\mathrm{comp}}$ denotes compute nodes whose states are derived via functions in $F$.
Edge Model. Temporal dependencies often include explicit time lags: a node's state at time $t$ may depend on another node's state at time $t-\tau$. We therefore define the directed edge set as $E \subseteq V \times V \times \mathbb{N}_0$. Each edge $e = (u, v, \tau) \in E$ indicates that node $v$ consumes the value of node $u$ with lag $\tau$.
Function Model. The function model specifies how node states evolve and ensures that a TS-DAG is executable. The update rule depends on the node type. For constant nodes $v \in V_{\mathrm{const}}$, the state is time-invariant:
$$x_v(t) = c_v, \quad \forall t \ge 0$$
where $c_v \in \mathbb{R}$ is a predefined constant.
For input nodes $v \in V_{\mathrm{input}}$, the state is given by an exogenous time series:
$$x_v(t) = I_v(t), \quad \forall t \ge 0$$
where $I_v : \mathbb{N}_0 \to \mathbb{R}$ denotes the external input signal.
For each compute node $v \in V_{\mathrm{comp}}$, we define the predecessor set $S_v = \{(u, \tau) \mid (u, v, \tau) \in E\}$ and its cardinality $k_v = |S_v|$. We then introduce an indexing bijection $\sigma_v : \{1, \dots, k_v\} \to S_v$ such that $\sigma_v(j) = (u_j, \tau_j)$.
At time $t$, we assemble the input vector $z_v(t) \in \mathbb{R}^{k_v}$ by retrieving each predecessor value at the required lag. To keep the update rule well-defined when $t - \tau_j < 0$, we enforce the following boundary condition:
$$z_{v,j}(t) = \begin{cases} x_{u_j}(t - \tau_j) & \text{if } t - \tau_j \ge 0 \\ \Gamma(u_j) & \text{otherwise} \end{cases}$$
where $\Gamma : V \to \mathbb{R}$ provides a default value for each node.
Given $z_v(t)$, the node state is computed by applying the update function $f_v \in F$:
$$x_v(t) = f_v(z_v(t)), \quad f_v : \mathbb{R}^{k_v} \to \mathbb{R}$$
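To make the three node types and the boundary condition concrete, the following Python sketch stores each compute node's predecessors with their lags and assembles $z_v(t)$ accordingly. The class and field names (`TSDAG`, `gather_inputs`, etc.) are our own illustrative choices, not from the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class TSDAG:
    """Minimal TS-DAG container: V_const, V_input, V_comp, and defaults Gamma."""
    const_nodes: Dict[str, float]                   # V_const: name -> c_v
    input_nodes: Dict[str, Callable[[int], float]]  # V_input: name -> I_v(t)
    # V_comp: name -> (predecessor list [(u_j, tau_j)], update function f_v)
    compute_nodes: Dict[str, Tuple[List[Tuple[str, int]],
                                   Callable[[List[float]], float]]]
    defaults: Dict[str, float] = field(default_factory=dict)  # Gamma(u)

    def gather_inputs(self, v: str, t: int,
                      history: Dict[str, List[float]]) -> List[float]:
        """Assemble z_v(t): each predecessor's value at its lag tau_j, or the
        default Gamma(u_j) when t - tau_j < 0 (the boundary condition)."""
        preds, _ = self.compute_nodes[v]
        return [history[u][t - tau] if t - tau >= 0
                else self.defaults.get(u, 0.0)
                for (u, tau) in preds]
```

For example, a compute node with predecessors `[("demand", 0), ("share", 1)]` reads the current demand and the previous share; at $t = 0$ the lagged share falls back to its default value.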

2.2.2. Topological Constraints and Sequential Computation

A key property of TS-DAGs is that temporal edges can form feedback loops across time. Specifically, a past state of node u can influence node v, and a past state of node v can, at a different lag, influence node u. In contrast, we forbid instantaneous mutual dependence within the same time step; otherwise, the update order would be undefined and the computation ill-posed. We enforce this constraint by defining the zero-lag subgraph and requiring it to be acyclic.
Zero-Lag Subgraph. We define the zero-lag subgraph $G_0 = (V, E_0)$, where $E_0 = \{(u, v) \mid (u, v, 0) \in E\}$ contains exactly the edges with zero temporal lag. We require $G_0$ to be a directed acyclic graph (DAG). This constraint eliminates instantaneous circular dependence and yields a well-defined computation order at each time step.
Topological Ordering. To derive an executable computation sequence, Kahn's algorithm [23] is applied to the zero-lag subgraph $G_0$. This procedure constructs a linear ordering $L = (v_1, v_2, \dots, v_{|V|})$ that satisfies the precedence constraint:
$$\forall (u, v, 0) \in E : \mathrm{pos}(u) < \mathrm{pos}(v)$$
where $\mathrm{pos}(v)$ denotes the position of node $v$ in $L$. As a result, when $x_v(t)$ is computed for any $v \in V_{\mathrm{comp}}$, all required zero-lag predecessor states at time $t$ have already been computed.
We define the zero-lag in-degree of node $v$ as $d^0_{\mathrm{in}}(v) = |\{u \mid (u, v, 0) \in E\}|$. Kahn's algorithm initializes a queue with the root node set $R = \{v \in V \mid d^0_{\mathrm{in}}(v) = 0\}$. This set includes $V_{\mathrm{const}}$, $V_{\mathrm{input}}$, and any compute node that depends only on historical states. The algorithm then repeatedly removes a node from the queue, appends it to $L$, and deletes its outgoing zero-lag edges. These deletions decrement the corresponding in-degrees and may expose new roots.
We validate acyclicity by checking whether the algorithm produces a full ordering. If $|L| < |V|$ at termination, then $G_0$ contains a directed cycle and the TS-DAG specification is invalid. Algorithm 1 details the full procedure.
Algorithm 1 Topological Sort for TS-DAG
Require: TS-DAG $G = (V, E, F)$
Ensure: A linear ordering $L$, or $\bot$ if a cycle is detected
1: $E_0 \leftarrow \{(u, v) \mid (u, v, 0) \in E\}$
2: $L \leftarrow [\,]$; $Q \leftarrow [\,]$
3: $d_{\mathrm{in}}[v] \leftarrow 0$ for all $v \in V$
4: for each $(u, v) \in E_0$ do
5:   $d_{\mathrm{in}}[v] \leftarrow d_{\mathrm{in}}[v] + 1$
6: for each $v \in V$ with $d_{\mathrm{in}}[v] = 0$ do
7:   $Q.\mathrm{Enqueue}(v)$
8: while $Q \neq \varnothing$ do
9:   $u \leftarrow Q.\mathrm{Dequeue}()$; $L.\mathrm{Append}(u)$
10:  for each $(u, v) \in E_0$ do
11:    $d_{\mathrm{in}}[v] \leftarrow d_{\mathrm{in}}[v] - 1$
12:    if $d_{\mathrm{in}}[v] = 0$ then
13:      $Q.\mathrm{Enqueue}(v)$
14: return $|L| = |V|$ ? $L$ : $\bot$
Because the graph topology is fixed during simulation (only node states $x_v(t)$ evolve), $L$ is computed once at initialization. It is then cached and reused for all time steps $t \in \{0, 1, \dots, T\}$, which amortizes the $O(|V| + |E_0|)$ sorting cost across the full horizon.
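A direct Python rendering of Algorithm 1 (a sketch; the function and variable names are ours) operates on edge triples $(u, v, \tau)$ and sorts only the zero-lag subgraph:

```python
from collections import deque

def topological_sort_zero_lag(nodes, edges):
    """Kahn's algorithm on the zero-lag subgraph G0 (sketch of Algorithm 1).

    nodes: iterable of node names; edges: iterable of (u, v, tau) triples.
    Returns a linear ordering L, or None (standing in for the bottom symbol)
    when G0 contains a directed cycle.
    """
    indeg = {v: 0 for v in nodes}
    adj = {v: [] for v in nodes}
    for u, v, tau in edges:
        if tau == 0:                      # keep only the zero-lag subgraph E0
            adj[u].append(v)
            indeg[v] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)   # root set R
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:                  # delete outgoing zero-lag edges
            indeg[v] -= 1
            if indeg[v] == 0:             # a new root is exposed
                queue.append(v)
    return order if len(order) == len(indeg) else None
```

Note that a cross-time feedback loop (e.g., $u \to v$ at lag 0 and $v \to u$ at lag 1) is accepted, since only zero-lag edges can invalidate the ordering.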

2.2.3. TS-DAG State Propagation

Given the topological ordering $L$, the TS-DAG is evaluated sequentially over the simulation horizon. At each time step $t$, nodes are processed in the order specified by $L$. This ordering resolves all instantaneous (zero-lag) dependencies before any dependent computation is executed.
For each compute node $v \in V_{\mathrm{comp}}$, we first assemble its input vector $z_v(t) \in \mathbb{R}^{k_v}$ using Equation (4). We then instantiate the node state by applying its transfer function, as specified in Equation (5).
Algorithm 2 summarizes the full propagation procedure. We establish correctness via two invariants. (i) For any zero-lag edge $(u, v, 0) \in E$, the ordering places $u$ before $v$, so $x_u(t)$ is available when we construct $z_v(t)$. (ii) For any lagged edge with $\tau > 0$, temporal causality ensures that $x_u(t - \tau)$ has already been computed in an earlier time step.
The algorithm runs in $O(T \cdot (|V| + |E|))$ time and uses $O(|V| \cdot T)$ space to store node trajectories.
Algorithm 2 TS-DAG State Propagation
Require: TS-DAG $G = (V, E, F)$, horizon $T$, default mapping $\Gamma$
Ensure: State trajectories $\{x_v(t)\}_{v \in V, t \in [0, T]}$
1: $L \leftarrow \mathrm{TopologicalSort}(G)$ (Algorithm 1)
2: if $L = \bot$ then return Error
3: for $t = 0$ to $T$ do
4:   for each $v$ in $L$ do
5:     if $v \in V_{\mathrm{const}}$ then
6:       $x_v(t) \leftarrow c_v$
7:     else if $v \in V_{\mathrm{input}}$ then
8:       $x_v(t) \leftarrow I_v(t)$
9:     else
10:      for $j = 1$ to $k_v$ do
11:        $(u_j, \tau_j) \leftarrow \sigma_v(j)$
12:        $z_{v,j}(t) \leftarrow x_{u_j}(t - \tau_j)$ if $t - \tau_j \ge 0$ else $\Gamma(u_j)$
13:      $x_v(t) \leftarrow f_v(z_v(t))$
14: return $\{x_v(t)\}_{v \in V, t \in [0, T]}$
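The propagation loop of Algorithm 2 can be sketched in Python as follows. This is a minimal stand-alone version with our own function and parameter names; it inlines the zero-lag topological sort so the snippet is self-contained.

```python
from collections import deque

def propagate(nodes, edges, horizon, const, inputs, funcs, defaults):
    """Sequential TS-DAG evaluation, a minimal sketch of Algorithm 2.

    nodes: node names; edges: (u, v, tau) triples; horizon: last step T.
    const[v] = c_v, inputs[v] = I_v (a callable of t), funcs[v] = f_v taking
    the assembled z-vector, defaults[v] = Gamma(v).
    Returns {v: [x_v(0), ..., x_v(T)]}.
    """
    # Topological sort of the zero-lag subgraph (computed once, then reused).
    indeg = {v: 0 for v in nodes}
    adj = {v: [] for v in nodes}
    for u, v, tau in edges:
        if tau == 0:
            adj[u].append(v)
            indeg[v] += 1
    q = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    if len(order) != len(nodes):
        raise ValueError("zero-lag subgraph contains a cycle")

    preds = {v: [] for v in nodes}        # predecessor sets S_v with lags
    for u, v, tau in edges:
        preds[v].append((u, tau))

    x = {v: [] for v in nodes}            # state trajectories
    for t in range(horizon + 1):
        for v in order:                   # zero-lag dependencies come first
            if v in const:
                x[v].append(const[v])
            elif v in inputs:
                x[v].append(inputs[v](t))
            else:
                z = [x[u][t - tau] if t - tau >= 0 else defaults[u]
                     for (u, tau) in preds[v]]
                x[v].append(funcs[v](z))
    return x
```

For instance, with a constant base of 100, a subsidy input of 0.1, and $\mathrm{demand}(t) = \mathrm{base} \cdot (1 + \mathrm{subsidy}) + 0.5 \cdot \mathrm{demand}(t-1)$, the lagged self-feedback compounds over the horizon.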

2.3. Automated Construction of the Strategic TS-DAG

Section 2.2 establishes the formal foundation of the TS-DAG. However, manually instantiating a TS-DAG for a complex strategic scenario is labor-intensive and prone to cognitive bias. To mitigate these issues, we propose an automated construction framework that couples the semantic reasoning capability of LLMs with the structural rigor of graph theory. The framework derives the network topology and instantiates node-level parametric logic in a single pipeline.

2.3.1. LLM-Driven TS-DAG Structure Generation

Manually constructing TS-DAGs for complex strategic evaluation is impractical. The indicator space is combinatorial, and temporal interdependencies further increase the modeling burden.
We propose an LLM-driven, top-down recursive decomposition procedure. The LLM derives candidate indicators, decomposes composite indicators into sub-indicators, and assigns temporal lags to the resulting dependencies. This design preserves semantic alignment with high-level strategic intent while enabling systematic graph construction.
Strategic evaluation is specified by four elements: (i) the evaluation theme (domain of interest), (ii) the evaluation object (entity under evaluation), (iii) the evaluation purpose (decision intent), and (iv) the evaluation rule (domain constraints and normative criteria). Together, these elements define the evaluation scope and the criteria used to judge outcomes.
We encode them as a Strategic Context Tuple $\mathcal{C} = \langle \mathrm{Theme}, \mathrm{Object}, \mathrm{Purpose}, \mathrm{Rule} \rangle$. This tuple serves as the semantic anchor that guides subsequent TS-DAG construction.
Given $\mathcal{C}$, an LLM-driven agent derives the indicator hierarchy via top-down recursive decomposition. It starts with a root concept node derived from $\mathcal{C}$ and then iteratively expands the graph.
For each node $v$, the agent classifies it as either atomic (directly observable) or composite (derived from other indicators). It assigns atomic nodes to $V_{\mathrm{input}}$ and composite nodes to $V_{\mathrm{comp}}$, recursively decomposing the latter until all leaves are atomic. Figure 3 illustrates this structure-generation process.
The decomposition is inherently temporal. Instead of assuming instantaneous dependencies and repairing cycles post hoc, the agent infers a temporal lag $\tau$ for each predecessor by reasoning about the underlying causal mechanism. This design captures feedback across time while enforcing acyclicity in the zero-lag subgraph $G_0$.
Concurrently, the agent synthesizes the symbolic form of each transfer function $f_v$, leaving parameters unspecified for subsequent calibration. Algorithm 3 formalizes the overall procedure.
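The recursive expansion described above can be exercised with stub functions standing in for the two LLM calls. In the sketch below (our own function names), `decompose` replaces $\Phi(\mathrm{Decompose}, \cdot)$ and `synthesize` replaces $\Phi(\mathrm{SynthesizeSymbolic}, \cdot)$; a node is treated as atomic when its decomposition is empty.

```python
def generate_structure(root, decompose, synthesize):
    """Top-down recursive TS-DAG structure generation, a sketch.

    decompose(v) stands in for the LLM call Phi(DECOMPOSE, v, C): it returns
    [] when v is atomic, otherwise a list of (predecessor, lag) pairs.
    synthesize(v, preds) stands in for Phi(SYNTHESIZE_SYMBOLIC, ...) and
    returns a symbolic function form (here, just a string).
    """
    V = {root}
    V_input, V_comp = set(), set()
    E, funcs = set(), {}
    stack = [root]
    while stack:
        v = stack.pop()
        preds = decompose(v)
        if not preds:                     # atomic => input node
            V_input.add(v)
            continue
        V_comp.add(v)                     # composite => compute node
        for u, tau in preds:
            if u not in V:                # expand each new predecessor
                V.add(u)
                stack.append(u)
            E.add((u, v, tau))            # record the lagged edge
        funcs[v] = synthesize(v, preds)   # symbolic form, parameters deferred
    return V, V_input, V_comp, E, funcs
```

A lagged self-dependency (e.g., market share depending on its own previous value) is recorded as an edge without re-expanding the node, since the node is already in $V$.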

2.3.2. Data-Driven Parameter Identification

The symbolic transfer functions generated in Algorithm 3 must be instantiated with concrete parameters before we can perform quantitative evaluation. However, manual tuning is impractical, and labeled strategic data are typically unavailable.
We therefore adopt an iterative refinement procedure. In this procedure, the LLM-driven agent generates diagnostic scenarios and evaluates the resulting model outputs against domain knowledge.
For each compute node $v \in V_{\mathrm{comp}}$ with predecessor set $S_v = \{(u_j, \tau_j)\}_{j=1}^{k_v}$, we instantiate its parameters via the following iterative procedure.
Step 1 (Initialize). The agent generates a scenario set $X_v \in \mathbb{R}^{N \times k_v}$ that covers boundary cases, typical ranges, and extreme conditions. It also proposes an initial parameter vector $\theta_v^{(0)}$:
$$X_v, \theta_v^{(0)} \leftarrow \Phi(\mathrm{Initialize}, v, S_v, \mathcal{C})$$
Step 2 (Forward evaluation). At iteration $t$, we evaluate the symbolic function on all scenarios:
$$\hat{y}_v^{(t,n)} = f_v(x_v^{(n)}; \theta_v^{(t)}), \quad n = 1, \dots, N$$
Algorithm 3 LLM-Driven TS-DAG Structure Generation
Require: Strategic Context $\mathcal{C}$, LLM-driven Agent $\Phi$
Ensure: TS-DAG structure $(V, E)$ with symbolic functions $\{f_v\}_{v \in V_{\mathrm{comp}}}$
1: $V \leftarrow \{v_{\mathrm{root}}\}$; $V_{\mathrm{input}} \leftarrow \varnothing$; $V_{\mathrm{comp}} \leftarrow \varnothing$; $E \leftarrow \varnothing$
2: $S \leftarrow [v_{\mathrm{root}}]$
3: while $S \neq \varnothing$ do
4:   $v \leftarrow S.\mathrm{Pop}()$
5:   if $\mathrm{type}_v = \mathrm{atomic}$ then
6:     $V_{\mathrm{input}} \leftarrow V_{\mathrm{input}} \cup \{v\}$
7:   else
8:     $V_{\mathrm{comp}} \leftarrow V_{\mathrm{comp}} \cup \{v\}$
9:     $\{(u_i, \tau_i)\}_{i=1}^{k} \leftarrow \Phi(\mathrm{Decompose}, v, \mathcal{C})$
10:    for $i = 1$ to $k$ do
11:      if $u_i \notin V$ then
12:        $V \leftarrow V \cup \{u_i\}$; $S.\mathrm{Push}(u_i)$
13:      $E \leftarrow E \cup \{(u_i, v, \tau_i)\}$
14:    $f_v \leftarrow \Phi(\mathrm{SynthesizeSymbolic}, v, \{(u_i, \tau_i)\})$
15: return $(V, E, \{f_v\}_{v \in V_{\mathrm{comp}}})$
Step 3 (Feedback). The agent validates each prediction against domain knowledge and returns directional feedback:
$$d_v^{(t)} = \Phi(\mathrm{Evaluate}, X_v, \hat{y}_v^{(t)}, \mathcal{C})$$
where $d_v^{(t,n)} \in \{-1, 0, +1\}$ indicates whether $\hat{y}_v^{(t,n)}$ is underestimated, appropriate, or overestimated.
Step 4 (Update). The agent aggregates the feedback and adjusts the parameters:
$$\theta_v^{(t+1)} = \Phi(\mathrm{Adjust}, \theta_v^{(t)}, d_v^{(t)}, X_v, \mathcal{C})$$
Stopping criterion. We terminate the refinement when either the parameter change is small or the feedback indicates sufficient alignment:
$$\|\theta_v^{(t+1)} - \theta_v^{(t)}\| < \epsilon \quad \text{or} \quad \frac{1}{N} \sum_{n=1}^{N} |d_v^{(t,n)}| < \delta$$
After $T$ iterations, we obtain the final parameter vector $\hat{\theta}_v = \theta_v^{(T)}$. We execute this procedure sequentially along the topological ordering to respect computational dependencies.
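The refinement loop can be illustrated with a deterministic stand-in for the LLM oracle. This is only a sketch: `feedback` replaces the $\Phi(\mathrm{Evaluate}, \cdot)$ call, and the simple scalar update is our own simplification of the $\Phi(\mathrm{Adjust}, \cdot)$ call, which the paper delegates to the LLM rather than to a fixed rule.

```python
def refine_parameters(f, scenarios, theta0, feedback, step=0.1,
                      eps=1e-3, delta=0.05, max_iter=100):
    """Iterative parameter refinement (Steps 1-4), a sketch with a stand-in
    oracle. f(x, theta) is the symbolic transfer function on one scenario x;
    feedback(x, y) returns -1/0/+1 for under-/well-/over-estimated outputs.
    """
    theta = theta0
    for _ in range(max_iter):
        preds = [f(x, theta) for x in scenarios]               # Step 2: forward pass
        d = [feedback(x, y) for x, y in zip(scenarios, preds)] # Step 3: feedback
        mean_abs = sum(abs(di) for di in d) / len(d)
        theta_next = theta - step * sum(d) / len(d)            # Step 4: adjust (simplified)
        if abs(theta_next - theta) < eps or mean_abs < delta:  # stopping criterion
            return theta_next
        theta = theta_next
    return theta
```

With $f(x; \theta) = \theta x$ and an oracle that flags deviations from a hidden target slope of 2.0, the loop walks $\theta$ from 1.0 up toward 2.0 and stops once the feedback is neutral.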

2.4. Strategic Action Quantification and Propagation

The TS-DAG executes on numerical inputs, whereas strategic decisions are naturally expressed in language (e.g., 'launch a diplomatic initiative' or 'accelerate technology deployment'). A principled mechanism is therefore required to translate each qualitative action into quantitative perturbations of the exogenous input nodes $V_{\mathrm{input}}$.
We use an LLM-driven agent as a domain-aware evaluator. Given a chronologically ordered action sequence $S = (s_1, s_2, \dots, s_K)$, where action $s_k$ is enacted at time $t_k$, the agent infers the impact of $s_k$ on each input node. This inference is context-dependent: it conditions on both the prior action history $(s_1, \dots, s_{k-1})$ and the current system state $x(t_k - 1) = \{x_v(t_k - 1)\}_{v \in V}$. As a result, identical actions can yield different outcomes under different histories and states.
For each action $s_k$ and input node $v \in V_{\mathrm{input}}$, the agent returns a discrete impact label $I_v^{(k)} \in \mathcal{I}$, where $\mathcal{I} = \{L^+, S^+, N, S^-, L^-\}$ denotes large positive, small positive, neutral, small negative, and large negative effects.
We convert this qualitative label into a multiplicative factor via a predefined mapping $M : \mathcal{I} \to \mathbb{R}^+$, with values $\{1.4, 1.1, 1.0, 0.9, 0.6\}$, respectively. We then update each input node as
$$x_v(t_k) = x_v(t_k - 1) \cdot M\big(\Phi(s_k, (s_1, \dots, s_{k-1}), x(t_k - 1), v)\big), \quad \forall v \in V_{\mathrm{input}}$$
where $\Phi(\cdot)$ denotes the agent's impact-evaluation function. After all input nodes are updated, Algorithm 2 propagates the perturbations through the TS-DAG, yielding the resulting direct and indirect consequences across the indicator system.
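The label-to-factor mapping above can be written directly. In this sketch, the dictionary literal mirrors the mapping $M$ from the text, while `impact_labels` stands in for the per-node labels returned by the LLM evaluator $\Phi$; the node names in the example are illustrative.

```python
# Mapping M: impact label -> multiplicative factor, as defined in the text.
IMPACT_FACTOR = {"L+": 1.4, "S+": 1.1, "N": 1.0, "S-": 0.9, "L-": 0.6}

def apply_action(input_state, impact_labels):
    """Update each input node: x_v(t_k) = x_v(t_k - 1) * M(label).

    input_state: {node: value at t_k - 1}; impact_labels: {node: label},
    standing in for the labels returned by the LLM evaluator.
    Nodes without a label are treated as neutral (N).
    """
    return {v: x * IMPACT_FACTOR[impact_labels.get(v, "N")]
            for v, x in input_state.items()}
```

For example, a small positive shock to consumer confidence scales it by 1.1, while a large negative shock to subsidies scales them by 0.6; untouched inputs pass through unchanged before Algorithm 2 propagates the perturbations.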
We implement strategic deduction as an interaction between two roles: decision-makers and evaluators. Decision-makers propose strategic actions, whereas evaluators assess their implications. This separation mirrors real-world practice, where strategy formulation and impact evaluation require distinct expertise.
Within the TS-DAG architecture, we operationalize this interaction as a three-phase cyclic process: (i) strategic gaming, (ii) strategic evaluation, and (iii) graph propagation. Figure 4 illustrates the overall procedure.
Strategic gaming. Multiple agents propose strategic actions, each representing a distinct stakeholder within the simulated environment. Each agent derives its proposal from the current system state and its anticipation of other agents’ responses.
Strategic evaluation. This phase quantifies the joint effect of the proposed (and potentially conflicting) actions. Given the current actions, the action history, and each participant’s capability metrics, the evaluator derives a net perturbation for each input node. The evaluation accounts for interaction effects, i.e., actions may reinforce, offset, or amplify one another.
Graph propagation. We then propagate the evaluated perturbations through the TS-DAG. Starting from the perturbed input nodes $V_{\mathrm{input}}$, the TS-DAG executes its causal update rules to compute all downstream node states. This propagation exposes both immediate effects and delayed, higher-order consequences.

2.5. Extracting Strategic Insights Through Comparative Analysis

The goal of strategic deduction is to derive actionable insights. However, a single simulation run may mainly reflect idiosyncratic conditions. To extract robust patterns, results are compared across deduction runs under different key-variable configurations. This comparative design helps reveal the underlying logic of strategic dynamics rather than scenario-specific outcomes.
Our workflow proceeds in two steps. First, key-variable combinations that define the strategic landscape are identified. Second, for each combination, multiple deduction rounds are run, and the resulting trajectories are analyzed. Insights are then extracted using three complementary analyses: trend analysis, correlation analysis, and feature importance analysis.

2.5.1. Trend Analysis

Trend analysis characterizes how each agent’s indicators evolve over time within a fixed deduction scenario. To make this analysis precise, we first define the data generated by each deduction round.
Deduction definition. Fix a key-variable configuration $K = \{k_1, k_2, \dots, k_m\}$, under which $N$ independent deduction rounds are run. The $r$-th round ($r \in \{1, 2, \dots, N\}$) is denoted by the tuple:
$$R_K^{(r)} = (K, C^{(r)}, D_K^{(r)})$$
where $C^{(r)}$ specifies the contextual background (e.g., initial conditions and environmental parameters), and $D_K^{(r)}$ stores the resulting trajectories of all agents' indicators.
Strategic Indicator Trend Analysis. For participant $i$ in the $r$-th deduction round under key-variable combination $K$, the strategic indicators form a time-indexed vector $x_{i,K}^{(r)}(t) = [x_{i,K,1}^{(r)}(t), x_{i,K,2}^{(r)}(t), \dots, x_{i,K,p}^{(r)}(t)] \in \mathbb{R}^p$, where $p$ is the number of tracked indicators and $t \in \{0, 1, \dots, T\}$ represents discrete time steps within the deduction horizon $T$.
Analyzing how different agents' strategic indicators evolve over time can reveal conflicts or synergies among indicators, as well as competitive and cooperative dynamics among agents on the same indicator.

2.5.2. Correlation Analysis

Correlation analysis quantifies two types of relationships: (i) correlations among indicators and (ii) correlations between indicators and key variables. These analyses capture both intrinsic dependencies in the indicator system and sensitivity to external conditions.
Inter-indicator correlation. To derive patterns that generalize across scenarios, we aggregate indicator values across all key-variable configurations and deduction rounds. For indicator $j$, we define the aggregated set
$$X_j = \{\, x_{i,\mathcal{K},j}^{(r)}(t) \mid \mathcal{K} \in \mathbb{K},\; r \in \{1, \ldots, N\},\; i \in \mathcal{P},\; t \in \{0, \ldots, T\} \,\}$$
where $\mathbb{K}$ denotes the set of key-variable configurations and $\mathcal{P}$ denotes the participant set.
The Pearson correlation between indicators $j$ and $l$ is then computed as
$$\rho_{j,l} = \frac{\mathrm{Cov}(X_j, X_l)}{\sigma_{X_j}\,\sigma_{X_l}}$$
The coefficient is interpreted as follows: $\rho_{j,l} > 0$ indicates positive co-movement, $\rho_{j,l} < 0$ indicates negative co-movement, and $\rho_{j,l} \approx 0$ indicates weak dependence.
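As an illustration, the pooled computation can be sketched with NumPy. The row layout (one row per (K, r, i, t) observation of all p indicators) is an assumed storage convention, not part of the paper's specification.

```python
import numpy as np

def inter_indicator_correlation(samples):
    """Pearson correlations among indicators pooled over all key-variable
    configurations, rounds, participants, and time steps.

    samples: array of shape (n_obs, p); each row is one (K, r, i, t)
    observation of the p tracked indicators (layout is an assumption).
    """
    X = np.asarray(samples, dtype=float)
    # rowvar=False treats columns as variables, giving a (p, p) matrix.
    return np.corrcoef(X, rowvar=False)

# Toy data: indicators 0 and 1 co-move; indicator 2 moves against them.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([base, base + 0.1 * rng.normal(size=200), -base])
rho = inter_indicator_correlation(X)
```

In this toy case, the off-diagonal entries recover the built-in co-movement: rho[0, 1] is strongly positive and rho[0, 2] strongly negative.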
Indicator-variable correlation. To quantify how each participant responds to external conditions, we compute the correlation between its indicators and the key variables.
One-hot encoding. Each key-variable configuration $\mathcal{K} = \{k_1, k_2, \ldots, k_m\}$ is one-hot encoded. Let $V_q$ be the value set of the $q$-th key variable, with cardinality $|V_q| = d_q$. For a value $v \in V_q$, we define the one-hot vector $\mathbf{e}_q(v) \in \{0,1\}^{d_q}$, where the entry corresponding to $v$ is 1 and all others are 0. We then concatenate all one-hot vectors to obtain
$$\mathbf{h}_{\mathcal{K}} = [\mathbf{e}_1(k_1), \mathbf{e}_2(k_2), \ldots, \mathbf{e}_m(k_m)] \in \{0,1\}^{D}, \qquad D = \sum_{q=1}^{m} d_q$$
Paired dataset. For participant $i$ and indicator $j$, we construct the paired dataset
$$Z_{i,j} = \{\, (x_{i,\mathcal{K},j}^{(r)}(t),\; \mathbf{h}_{\mathcal{K}}) \mid \mathcal{K} \in \mathbb{K},\; r \in \{1, \ldots, N\},\; t \in \{0, \ldots, T\} \,\}$$
Correlation. The Pearson correlation between indicator values and the $s$-th one-hot component of the key variables is computed as
$$\rho_{i,j,s} = \frac{\sum_{(x,\mathbf{h}) \in Z_{i,j}} (x - \mu_{i,j})(h_s - \mu_s)}{|Z_{i,j}| \cdot \sigma_{i,j}\,\sigma_s}$$
where $h_s$ denotes the $s$-th component of $\mathbf{h}_{\mathcal{K}}$. The coefficient $\rho_{i,j,s}$ identifies which participants are most sensitive to specific key variables.
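A minimal sketch of the encoding and the component-wise correlation follows; the ordering of values within each value set is an assumption, and the toy data are purely illustrative.

```python
import numpy as np

def one_hot_config(config, value_sets):
    """Concatenated one-hot encoding h_K of a key-variable configuration.

    config:     tuple (k_1, ..., k_m), one value per key variable.
    value_sets: ordered value lists V_q, one per key variable.
    """
    parts = []
    for value, values in zip(config, value_sets):
        e = np.zeros(len(values))
        e[values.index(value)] = 1.0
        parts.append(e)
    return np.concatenate(parts)

def indicator_variable_correlation(x, H, s):
    """Pearson correlation between indicator samples x and the s-th
    one-hot component, over the paired dataset Z_{i,j}."""
    return np.corrcoef(np.asarray(x, float), np.asarray(H, float)[:, s])[0, 1]

# The NEV case: CCI (3 levels), GSL (3 levels), BTB (2 levels), so D = 8.
value_sets = [["High", "Medium", "Low"],
              ["Strong", "Medium", "Weak"],
              ["Yes", "No"]]
h = one_hot_config(("High", "Weak", "Yes"), value_sets)
```

Here h concatenates [1,0,0] for CCI=High, [0,0,1] for GSL=Weak, and [1,0] for BTB=Yes, giving an 8-dimensional vector.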

2.5.3. Feature Importance Analysis

Feature importance analysis quantifies how strategic actions and exogenous conditions contribute to the ultimate outcome, represented by the top-level node in the TS-DAG. Because the underlying relationships can be nonlinear and interaction-driven, we use Random Forest (RF) regression as a flexible, nonparametric estimator.
Experimental protocol and randomness control. A set of independent simulation trajectories indexed by scenario-action pairs $(\mathcal{K}, r)$ is analyzed. Here, $\mathcal{K} \in \mathbb{K}$ denotes a key-variable configuration and $r$ indexes independently generated action sequences. All analyses use trajectories sampled over a common horizon $t \in \{0, \ldots, T\}$. To ensure reproducibility, the pseudo-random seed is fixed for RF training and for the permutation procedure.
Top-level objective. Let $\mathcal{N}_{\mathrm{top}}$ denote the set of top-level nodes in the TS-DAG. For a target $j^* \in \mathcal{N}_{\mathrm{top}}$, we denote its value for participant $i$ under scenario $\mathcal{K}$ in run $r$ at time $t$ by $x_{i,\mathcal{K},j^*}^{(r)}(t)$.
Block-wise RF for action importance (lagged features). The observations form a panel time series. To avoid temporal leakage, lagged predictors are used and run-wise (block) splits are performed so that all time steps from the same run remain within a single fold.
Let $g(i, \mathcal{K}, r)$ denote the run identifier. For $t \geq 1$, we define the feature vector as
$$\mathbf{f}_{i,\mathcal{K},r,t} = [\mathbf{a}_{i,\mathcal{K},r}(t-1),\; \mathbf{c}(\mathcal{K})]$$
where $\mathbf{a}_{i,\mathcal{K},r}(t-1)$ stacks the lag-1 strategic action variables (e.g., RD_Money, MKT_Money, Price, and Profit for each agent), and $\mathbf{c}(\mathcal{K})$ encodes the macro-environmental variables (Consumer Confidence, Subsidy Level, Battery Technology). Samples at $t = 0$ are excluded because lagged values are not available.
The feature–target dataset is constructed as
$$\mathcal{D}_{\mathrm{RF}} = \bigcup_{i \in \mathcal{P}} \bigcup_{\mathcal{K} \in \mathbb{K}} \bigcup_{r=1}^{N} \bigcup_{t=1}^{T} \{ (\mathbf{f}_{i,\mathcal{K},r,t},\; y_{i,\mathcal{K},r,t}) \}, \qquad y_{i,\mathcal{K},r,t} = x_{i,\mathcal{K},j^*}^{(r)}(t)$$
An RF regressor $F_{\mathrm{RF}}$ is trained using a fixed hyperparameter setting shared across targets. Generalization is assessed via run-wise (block) cross-validation (GroupKFold grouped by $g(i, \mathcal{K}, r)$). Predictive performance is summarized using $R^2$ and mean absolute error (MAE).
Permutation importance and rank stability. To mitigate known biases of impurity-based importance under correlated predictors, permutation importance is computed on each held-out fold, and the resulting distributions are aggregated. For feature $j$, we define its importance as the expected degradation in validation score after permuting $f_j$:
$$I_j = \mathbb{E}[\Delta S_j], \qquad \Delta S_j = S(\hat{y}, y) - S(\hat{y}^{\pi(j)}, y)$$
where $S$ denotes the validation metric (we use $R^2$), and $\hat{y}^{\pi(j)}$ is the prediction after permuting feature $j$ in the validation set.
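The block-wise protocol can be sketched with scikit-learn. The feature matrix, group layout, and the reduced tree count here are illustrative assumptions; the paper's actual hyperparameters are given in Section 3.2.2.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GroupKFold

def blockwise_permutation_importance(F, y, groups, n_splits=5, seed=42):
    """Run-wise (block) CV: all time steps sharing a run id stay in one
    fold. Permutation importance is computed on each held-out fold and
    averaged, yielding the expected R^2 degradation per feature."""
    importances = []
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(F, y, groups):
        rf = RandomForestRegressor(n_estimators=100, random_state=seed)
        rf.fit(F[train_idx], y[train_idx])
        result = permutation_importance(
            rf, F[test_idx], y[test_idx],
            n_repeats=5, random_state=seed, scoring="r2")
        importances.append(result.importances_mean)
    return np.mean(importances, axis=0)

# Toy panel: the target depends only on feature 0; feature 1 is noise.
rng = np.random.default_rng(42)
F = rng.normal(size=(300, 2))
y = 3.0 * F[:, 0] + 0.1 * rng.normal(size=300)
groups = np.repeat(np.arange(30), 10)   # 30 runs x 10 time steps each
imp = blockwise_permutation_importance(F, y, groups)
```

In the toy panel, permuting the informative feature degrades the held-out R² far more than permuting the noise feature, so imp[0] dominates imp[1].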

3. Results

3.1. Case Description

To validate the framework, the new energy vehicle (NEV) market is used as a case study. We model a competitive landscape with three archetypal participants: a legacy automaker, a technology innovator, and a price leader. Across diverse market scenarios, their strategic performance is simulated and evaluated.

3.1.1. Evaluation Elements

Evaluation Topic. We identify the key drivers that determine market share across different types of NEV manufacturers.
Evaluation Objects. The three agents represent distinct enterprise archetypes.
  • Legacy Automaker (Legacy). A traditional premium brand with strong brand equity, mature manufacturing capability, and extensive offline channels. It is comparatively slower in adopting intelligent and software-defined features.
  • Technology Innovator (Tech). A technology-centric player whose competitive advantage lies in intelligent driving and user experience. It maintains an avant-garde brand image and typically adopts a direct-sales model.
  • Price Leader (Price). A cost-performance-oriented player targeting mass-market households with aggressive pricing and relatively high specifications.
Evaluation Purpose. Each participant follows a distinct objective. Legacy aims to preserve leadership in the premium segment and maintain its EV market share at least at the level achieved in the internal combustion engine era. Tech aims to become the leading smart-EV brand, monetize technological advantages through sustained profitability, and cultivate a high-end user community. Price aims to rapidly expand market share and achieve economies of scale.
Evaluation Rules. We apply three principles. First, we evaluate balanced performance across multiple dimensions rather than optimizing a single metric (e.g., market share, financial health, and brand equity). Second, we emphasize sustained performance over time to distinguish short-term gains from long-term advantages. Third, we evaluate robustness across environmental conditions and rank strategies by their performance across scenarios, rather than by peak performance in a single favorable setting.

3.1.2. Key Variables

We consider three macro-level variables that shape the NEV market: the Consumer Confidence Index (CCI), the Government Subsidy Level (GSL), and the Battery Technology Bottleneck (BTB).
Consumer Confidence Index (CCI) captures market sentiment with three levels (High, Medium, Low). It indicates consumers’ willingness to purchase high-end consumer goods.
Government Subsidy Level (GSL) specifies the intensity of government subsidies for high-end consumption (Strong, Medium, Weak). It directly affects vehicle prices and manufacturers’ profits.
Battery Technology Bottleneck (BTB) denotes whether a critical battery technology has achieved a breakthrough (Yes/No). Such breakthroughs can fundamentally alter cost structures and performance.

3.2. Experimental Setup

3.2.1. LLM Parameters

We use qwen3-max-20260123 [24] for all LLM-driven components, including (i) deriving the TS-DAG structure, (ii) instantiating node parameters, and (iii) evaluating strategic actions online during deduction. We set the temperature to 0 to ensure deterministic outputs, and the maximum context length to 256k tokens.
To handle malformed outputs, we implement an automatic repair policy that focuses strictly on syntactic validation (i.e., JSON schema compliance). The repair loop follows a feedback-based refinement logic: upon detecting a formatting error, the system constructs a new prompt that contains (1) the original task, (2) the specific error message, and (3) the required output schema. This composite input is re-submitted to the LLM to guide it toward producing a valid structure. We allow up to three retries per query.
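The repair loop can be sketched as follows. The `call_llm` callable is a hypothetical stand-in for the actual model API, and only JSON syntax is validated here, mirroring the strictly syntactic policy described above.

```python
import json

MAX_RETRIES = 3

def generate_with_repair(task_prompt, schema_hint, call_llm):
    """Feedback-based repair loop for malformed LLM outputs (a sketch).

    On a parse failure, the new prompt combines (1) the original task,
    (2) the specific error message, and (3) the required output schema.
    """
    prompt = task_prompt
    for _ in range(1 + MAX_RETRIES):        # initial attempt + 3 retries
        raw = call_llm(prompt)
        try:
            return json.loads(raw)          # syntactically valid output
        except json.JSONDecodeError as err:
            prompt = (f"{task_prompt}\n\nYour previous output was invalid: "
                      f"{err}\n\nReturn JSON matching: {schema_hint}")
    raise ValueError("No valid output after retries")

# Toy stub that fails once, then returns valid JSON.
attempts = {"n": 0}
def stub_llm(prompt):
    attempts["n"] += 1
    return "{bad json" if attempts["n"] == 1 else '{"nodes": []}'

result = generate_with_repair("Generate TS-DAG nodes", '{"nodes": [...]}', stub_llm)
```

With the stub, the first attempt fails parsing, the second (repaired) attempt succeeds, and the parsed object is returned.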

3.2.2. Random Forest Parameterization

For feature-importance analysis, we train a Random Forest regressor with n_estimators=500, max_depth=12, min_samples_leaf=4, and max_features=sqrt. We set random_state=42 to control randomness. We compute permutation importance with n_repeats=5.
We assess generalization via run-wise (block) 5-fold GroupKFold cross-validation. The grouping variable is the run identifier (run_id).

3.2.3. Strategic Deduction Settings

Experiments are designed to cover the macro-environmental space induced by the key variables. Consumer Confidence has 3 levels, Government Subsidy has 3 levels, and Battery Technology has 2 levels. Together, they define 3 × 3 × 2 = 18 scenarios.
For each scenario, 10 independent strategic action sequences are generated. This yields 180 simulation runs in total. These runs form the dataset used for correlation and feature-importance analyses.
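The scenario grid is a plain Cartesian product, which can be enumerated directly; the level labels follow Section 3.1.2.

```python
from itertools import product

# Key-variable levels from Section 3.1.2.
CCI = ["High", "Medium", "Low"]       # Consumer Confidence Index
GSL = ["Strong", "Medium", "Weak"]    # Government Subsidy Level
BTB = ["Yes", "No"]                   # Battery Technology Bottleneck

scenarios = list(product(CCI, GSL, BTB))                # 3 * 3 * 2 = 18
runs = [(K, r) for K in scenarios for r in range(10)]   # 18 * 10 = 180
```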

3.2.4. Comparison Algorithms

To benchmark the indicator system derived in Section 2.3, we compare TS-DAG against two baselines. The first baseline is an AHP-based direct evaluation method (AHP-based). The second baseline generates a TS-DAG via direct prompting with τ = 0 (prompt-based). Implementation details are provided in the Supplementary Materials.

3.2.5. Comparison Indicators

We evaluate the proposed method and the baselines using three metrics that jointly quantify competitive separation and temporal stability.
  • Discriminability ($\mathcal{D}$): We quantify how well a method separates competing agents. We compute $\mathcal{D}$ as the temporal mean of the across-agent standard deviation of market shares. Larger $\mathcal{D}$ indicates clearer separation and avoids degenerate uniform outcomes.
  • Volatility ($\mathcal{V}$): We quantify short-term instability in the simulated trajectories. We define $\mathcal{V}$ as the temporal mean of the absolute market-share differences between consecutive time steps. Smaller $\mathcal{V}$ implies smoother trajectories and supports reliable long-horizon analysis.
  • Stable Discriminability Score (SDS): We combine separation and stability via $\mathrm{SDS} = \mathcal{D} / (\mathcal{V} + \epsilon)$. This metric rewards methods that preserve separation (high $\mathcal{D}$) while suppressing jitter (low $\mathcal{V}$). Larger SDS indicates better overall performance.
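The three metrics above can be sketched from a market-share trajectory matrix. One detail the text leaves implicit is the averaging order for V; here absolute step-to-step changes are averaged over both agents and time, which is one reasonable reading.

```python
import numpy as np

def stability_metrics(mkts, eps=1e-9):
    """mkts: array of shape (T, A), market shares of A agents over T steps.

    D  : temporal mean of the across-agent standard deviation (separation).
    V  : mean absolute step-to-step change (instability), averaged over
         agents and time (an assumed averaging order).
    SDS: D / (V + eps).
    """
    mkts = np.asarray(mkts, dtype=float)
    D = np.std(mkts, axis=1).mean()
    V = np.abs(np.diff(mkts, axis=0)).mean()
    return D, V, D / (V + eps)

# Well-separated, perfectly smooth trajectories: high D, zero V, huge SDS.
traj = np.array([[0.5, 0.3, 0.2],
                 [0.5, 0.3, 0.2],
                 [0.5, 0.3, 0.2]])
D, V, sds = stability_metrics(traj)
```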

3.2.6. Parameter Settings for Sensitivity Analysis

To evaluate robustness with respect to action-impact scaling, we conduct a sensitivity analysis on the mapping M in Equation (12). We consider three configurations: Baseline, Conservative, and Aggressive.
Baseline is the default setting used in the main experiments. Conservative compresses the magnitude of action impacts toward the neutral factor (1.0). Aggressive expands the impact range to stress-test the deduction dynamics under stronger perturbations. Table 1 reports the numerical values used for each impact level.

3.3. Automated Generation of the Evaluation Indicator System

Using the LLM-driven TS-DAG structure-generation procedure, the framework derives an evaluation indicator system for the NEV case. The generated specification explicitly defines node types, initializes node states, and instantiates symbolic transfer functions that govern temporal market evolution.
The resulting TS-DAG comprises three node categories: Exogenous Inputs (decision variables), State Variables (accumulated assets with temporal memory), and Computational Nodes (instantaneous derivations). In this formulation, $i \in \{\mathrm{Legacy}, \mathrm{Tech}, \mathrm{Price}\}$ indexes the agent and $t$ denotes the discrete time step.
To parameterize the simulator, we set a unified constant configuration. Specifically, we set the monthly Total Market Demand (TMD) to 50,000 units, the reference Consumer Confidence Index (CCI) to 1.0, and the base Net Promoter Score ($NPS_{base}$) to 150. We encode temporal persistence using decay factors ($k_{decay} = 0.98$, $k_{br} = 0.95$, $k_{nps} = 0.97$) and model efficiencies using coefficients ($RDF = 1$, $MKT = 0.5$). We compute the Total Attraction Index with fixed component weights: $W_p = 0.4$ (Product), $W_c = 0.3$ (Price), and $W_b = 0.3$ (Brand). The full graph specification is provided in Table 2.
This representation links macro conditions to micro decisions within a single causal–temporal model. The key variables (e.g., CCI, GSL, BTB) act as scenario-level modulators that are exogenous to individual agents and shared across participants. In contrast, the input nodes encode agent-specific decision levers (e.g., investment, pricing, and profit-related choices). During deduction, these inputs propagate through meso-level state variables (e.g., brand power and NPS) via TS-DAG state-update equations. The propagation ultimately determines top-level outcomes such as market share. Overall, the TS-DAG provides an interpretable mechanism in which macro conditions define the operating regime, while agent-level decisions drive endogenous competitive dynamics through structured intermediate states.
The automatically generated TS-DAG captures several nuanced mechanisms of business operations (Figure 5). First, the transfer functions for PPS and BPS combine a decay factor ($k < 1$) with a logarithmic investment return. This design enforces continuous spending to sustain competitiveness, which operationalizes the "Red Queen" effect.
Second, the framework derives PPS2 (Price Power Score) as the ratio PPS/Price. This formulation induces an explicit trade-off: increasing Price improves unit profit (Profit) but reduces PPS2. A lower PPS2 decreases the Total Attraction Index and can therefore reduce market share (MKTS).
Third, the TS-DAG instantiates a feedback loop between product quality and brand perception. Specifically, NPS increases with relative product superiority, $(PPS_i - \overline{PPS}_{total})$. A higher NPS then increases brand power (BPS), capturing the long-term effect of word-of-mouth. This mechanism is particularly important for the Tech agent.
Finally, the resource constraints (RDB, MKTB) impose a survival boundary. The RDB update rule includes a replenishment term derived from Sales and Profit. This closes the loop between commercial success and future innovation capacity.
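As a sketch, one per-agent TS-DAG time step can be written out from the Table 2 transfer functions and the constants above. The update order (NPS compares the previous-step product power; BPS uses the freshly updated NPS) is an assumption the table leaves implicit, and the pps_mean argument is illustrative.

```python
import math

# Constants from the generated TS-DAG configuration (Section 3.3).
K_DECAY, K_BR, K_NPS = 0.98, 0.95, 0.97
RDF, MKT = 1.0, 0.5
NPS_BASE = 150.0
W_P, W_C, W_B = 0.4, 0.3, 0.3

def step(state, action, pps_mean):
    """One TS-DAG update for a single agent (a sketch of Table 2).

    state:    dict with PPS, BPS, NPS at t-1.
    action:   dict with RD_Money, MKT_Money, Price at t.
    pps_mean: mean product power across all agents at t-1 (assumed timing).
    """
    pps = state["PPS"] * K_DECAY + math.log(1 + action["RD_Money"] * RDF)
    nps = state["NPS"] * K_NPS + (state["PPS"] - pps_mean) * 2
    bps = (state["BPS"] * K_BR + math.log(1 + action["MKT_Money"] * MKT)
           + (nps - NPS_BASE) * 0.1)
    pps2 = pps / action["Price"]                     # price power score
    total = pps * W_P + pps2 * W_C + bps * W_B       # total attraction
    return {"PPS": pps, "BPS": bps, "NPS": nps, "Total": total}

# Demo with Legacy's initial values from Table 2 (pps_mean is illustrative).
legacy_next = step({"PPS": 120.0, "BPS": 150.0, "NPS": 160.0},
                   {"RD_Money": 50.0, "MKT_Money": 45.0, "Price": 40.0},
                   pps_mean=116.7)
```

Market share would then be each agent's Total divided by the sum of Total over all agents, as in the last row of Table 2.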

3.4. Analysis of Single-Run Evolutionary Dynamics

To examine whether the generated strategies yield coherent temporal dynamics, we analyze key indicators from a representative single simulation run. Figure 6 plots the resulting trajectories for the three interacting agents.
The market-share trajectory (MKTS) shows a pronounced shift in competitive leadership. At initialization, Legacy leads the market due to its higher initial brand power and accumulated resources. Over time, Tech exhibits sustained growth and eventually overtakes Legacy, resulting in a clear crossover.
The underlying drivers are consistent with the TS-DAG mechanisms. Tech achieves higher Profit and NPS. Its larger R&D investment improves product quality (PPS), which increases NPS. In turn, higher NPS feeds back to strengthen brand power (BPS). Moreover, higher unit profitability replenishes the R&D budget (RDB), enabling sustained reinvestment and preventing competitiveness from decaying.
In contrast, Price highlights the limitations of a purely cost-driven strategy in this setting. It maintains the highest Price Power Score (PPS2), i.e., the strongest cost-performance ratio. However, its market share declines over time. This decline is primarily driven by thin margins, which limit RDB accumulation, and by a lower brand ceiling. Overall, the simulation suggests that cost-performance alone (PPS2) is insufficient to secure market dominance when competitors jointly optimize profitability, reputation (NPS), and product innovation.

3.5. Statistical Correlation Analysis of Multi-Run Simulations

To extract insights that generalize beyond a single run, we perform a correlation analysis across all simulation runs. We quantify statistical dependencies (i) among generated indicators and (ii) between indicators and macro-environmental key variables. Figure 7 reports the resulting Pearson correlation heatmaps.
Figure 7a reports inter-indicator correlations aggregated across all agents. By pooling agents, we highlight structural market mechanisms rather than agent-specific artifacts. We observe a pronounced negative correlation between the Price Power Score (PPS2) and most other performance indicators. This pattern suggests that emphasizing cost-performance (low price relative to product power) can trade off against profitability and brand accumulation, which slows long-term asset growth.
Excluding direct computational links, the Brand Power Score (BPS) shows the strongest positive correlation with Market Share (MKTS). The Net Promoter Score (NPS) is also strongly aligned with market dominance. These results are consistent with the single-run dynamics (Figure 6): Tech overtakes Legacy primarily through higher NPS and sustained brand accumulation (BPS), rather than through price competition.
Figure 7b relates macro-environmental key variables (Consumer Confidence, Subsidy Level, and Battery Technology) to core indicators. Although these correlations are weaker than intra-system correlations, they still reveal interpretable policy and technology effects. For example, strong subsidies correlate positively with $MKTS_{tech}$ but negatively with $MKTS_{price}$. This indicates that financial incentives disproportionately benefit premium, innovation-oriented players.
We also observe that moderate Consumer Confidence and Subsidy levels correlate positively with core indicators across all agents. Finally, battery-technology breakthroughs correlate positively with most global performance indicators, but negatively with $MKTS_{tech}$. This suggests that an industry-wide technology leap can reduce the innovation leader's relative advantage and narrow the competitive gap.

3.6. Feature Importance Analysis via Random Forest Regression

Figure 8 summarizes permutation-importance estimates obtained under run-wise (block) cross-validation. Across all three agents, lagged R&D investment (RD_Money) ranks as the most important predictor of subsequent market share. This indicates that sustained innovation input is the primary lever for competitive advantage in this market.
Lagged marketing investment (MKT_Money) is consistently the second most important feature. This highlights the role of demand shaping and brand exposure in converting technological capability into realized market share.
The stable ordering across $MKTS_{legacy}$, $MKTS_{tech}$, and $MKTS_{price}$ suggests that the dominance of R&D over marketing is structural, rather than an artifact of a specific participant.
Table 3 reports cross-validated predictive performance. The positive $R^2$ values and low MAE across all three targets indicate that the learned mapping from lagged actions to subsequent market share is statistically meaningful. This supports using permutation-based importance to interpret the dominant strategic drivers.

3.7. Comparison Algorithm Analysis

To assess the quality of the generated indicator system, we benchmark TS-DAG against two baselines. The first baseline is an AHP-based direct evaluation method (AHP-based). The second baseline generates a TS-DAG via direct prompting (prompt-based).
Table 4 summarizes the results. TS-DAG achieves the lowest volatility, $\mathcal{V} = 2.49 \times 10^{-3}$ (var $6.11 \times 10^{-8}$), while maintaining competitive separation, $\mathcal{D} = 8.52 \times 10^{-2}$ (var $1.22 \times 10^{-5}$). As a result, TS-DAG attains the highest stable discriminability score, $\mathrm{SDS} = 3.44 \times 10^{1}$ (var $6.35$). It substantially outperforms the AHP-based method ($\mathrm{SDS} = 5.77$; var $7.25$) and the prompt-based method ($\mathrm{SDS} = 4.47$; var $3.67$). The definitions of $\mathcal{V}$, $\mathcal{D}$, and SDS are given in Section 3.2.5.
Table 4 indicates that TS-DAG yields substantially lower volatility while preserving competitive separation. This combination leads to the highest SDS among the compared methods. Overall, the results suggest that TS-DAG produces trajectories that are both stable and discriminative, which is critical for reliable strategic evaluation.

3.8. Algorithm Sensitivity Analysis

Figure 9 compares three configurations of the impact mapping M in Equation (12): Baseline, Conservative, and Aggressive. Aggressive amplifies action-induced perturbations. This setting disproportionately benefits the Price agent by strengthening short-term market acquisition under cost-competitive positioning. In contrast, Conservative compresses action-to-indicator effects. This stabilizes trajectories and favors Legacy by preserving its accumulated brand and resource advantages. Overall, Baseline lies between these extremes and yields the most stable market-share evolution.
To further quantify this stability, we examine ranking consistency under parameter perturbations. Across 180 runs with different parameter settings, the MKTS ranking changes in only 5 runs (97.2% unchanged), indicating that the qualitative conclusions are robust to the impact-weight configuration. We therefore adopt Baseline as the default setting in the main experiments.

4. Conclusions

This paper introduces an LLM-driven strategic evaluation framework that couples large language models with a time-series directed acyclic graph (TS-DAG). The TS-DAG enforces an explicit causal structure at each time step while capturing time-lagged propagation, and the LLM helps derive the indicator system, instantiate model components, and quantify action impacts during deduction. Experiments in the new energy vehicle market demonstrate that the framework yields coherent competitive dynamics and supports interpretable, scenario-dependent analysis. However, the framework inherits LLM risks (hallucination, prompt sensitivity, and bias), which can affect indicator definitions and action-to-impact mappings. External validity is constrained by the stylized simulator and the finite scenario set. In addition, performance may degrade under distribution shift, and uncertainty is not explicitly quantified.
Future work will extend the framework in three directions. First, evaluation will be coupled with explicit decision and optimization modules so that strategies can be selected, refined, and validated in a closed loop. Second, robustness and external validity will be improved by calibrating TS-DAG components with real-world evidence, expanding scenario coverage, and stress-testing under distribution shift with uncertainty-aware analysis. Third, we will develop domain-specialized models and safer prompting/guardrail mechanisms to reduce hallucination, mitigate bias, and improve the reliability of action-to-impact mappings.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app16042007/s1, Section S1: Prompts for LLM-Driven Agent, Section S1.1: LLM-Driven TS-DAG Structure Generation-Generate Input Node, Section S1.2: LLM-Driven TS-DAG Structure Generation-Generate Formula, Section S1.3: Data-Driven Parameter Identification-Initialize Input Parameter, Section S1.4: Data-Driven Parameter Identification-Evaluate Feedback, Section S1.5: Data-Driven Parameter Identification-Adjust Parameters, Section S1.6: Strategic Action Quantification and Propagation, Section S2: Comparison Algorithms Description, Section S2.1: AHP-based Algorithm, Section S2.2: Prompt-based Algorithm.

Author Contributions

Conceptualization, M.Z. and X.Z.; methodology, M.Z. and Y.Y.; software, M.Z.; formal analysis, M.Z.; writing—original draft preparation, M.Z.; writing—review and editing, X.Z. and Y.Y.; visualization, M.Z.; supervision, G.Y. and L.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code and raw data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Punt, A.E.; Butterworth, D.S.; de Moor, C.L.; De Oliveira, J.A.; Haddon, M. Management strategy evaluation: Best practices. Fish Fish. 2016, 17, 303–334. [Google Scholar] [CrossRef]
  2. Xin, Z.; Kui, G.; Xie, Z. Modes of Research-oriented Strategy Deduction and Its Intelligent Decision-making Assistance Methods. In 2024 36th Chinese Control and Decision Conference (CCDC); IEEE: Xi’an, China, 2024; pp. 3882–3885. [Google Scholar]
  3. Puyt, R.W.; Lie, F.B.; Madsen, D.Ø. From SOFT approach to SWOT analysis, a historical reconstruction. J. Manag. Hist. 2025, 31, 333–373. [Google Scholar] [CrossRef]
  4. Benzaghta, M.A.; Elwalda, A.; Mousa, M.M.; Erkan, I.; Rahman, M. SWOT analysis applications: An integrative literature review. J. Glob. Bus. Insights 2021, 6, 54–72. [Google Scholar] [CrossRef]
  5. Yüksel, I. Developing a multi-criteria decision making model for PESTEL analysis. Int. J. Bus. Manag. 2012, 7, 52. [Google Scholar] [CrossRef]
  6. Azis, S.S.A.; Salleh, K.M.; Ab Rahman, N.H.; Aziz, N.; Abdullah, S. Asset Management using PESTLE-SWOT Analysis: Strategic Aspects for Integrated Dam Catchment Management. Plan. Malays. 2025, 23, 790–804. [Google Scholar] [CrossRef]
  7. Chehimi, M.; Naro, G. Balanced Scorecards and sustainability Balanced Scorecards for corporate social responsibility strategic alignment: A systematic literature review. J. Environ. Manag. 2024, 367, 122000. [Google Scholar] [CrossRef]
  8. Mahdiraji, H.A.; Kazimieras Zavadskas, E.; Kazeminia, A.; Abbasi Kamardi, A. Marketing strategies evaluation based on big data analysis: A CLUSTERING-MCDM approach. Econ. Res. Ekon. Istraživanja 2019, 32, 2882–2892. [Google Scholar] [CrossRef]
  9. Darko, A.; Chan, A.P.C.; Ameyaw, E.E.; Owusu, E.K.; Pärn, E.; Edwards, D.J. Review of application of analytic hierarchy process (AHP) in construction. Int. J. Constr. Manag. 2019, 19, 436–452. [Google Scholar] [CrossRef]
  10. Behzadian, M.; Otaghsara, S.K.; Yazdani, M.; Ignatius, J. A state-of the-art survey of TOPSIS applications. Expert Syst. Appl. 2012, 39, 13051–13069. [Google Scholar] [CrossRef]
  11. Govindan, K.; Jepsen, M.B. ELECTRE: A comprehensive literature review on methodologies and applications. Eur. J. Oper. Res. 2016, 250, 1–29. [Google Scholar] [CrossRef]
  12. Chou, S.Y.; Chang, Y.H.; Shen, C.Y. A fuzzy simple additive weighting system under group decision-making for facility location selection with objective/subjective attributes. Eur. J. Oper. Res. 2008, 189, 132–145. [Google Scholar] [CrossRef]
  13. Edwards, W.; Barron, F. SMARTS and SMARTER: Improved Simple Methods for Multiattribute Utility Measurement. Organ. Behav. Hum. Decis. Processes 1994, 60, 306–325. [Google Scholar] [CrossRef]
  14. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.u.; Polosukhin, I. Attention is All you Need. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; Volume 30. [Google Scholar]
  15. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  16. OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; et al. GPT-4 Technical Report. arXiv 2024, arXiv:2303.08774. [Google Scholar]
  17. Puyt, R.W.; Madsen, D.Ø. Evaluating ChatGPT-4’s historical accuracy: A case study on the origins of SWOT analysis. Front. Artif. Intell. 2024, 7, 1402047. [Google Scholar] [CrossRef]
  18. Brath, R.; Bradley, A.; Jonker, D. Strategic management analysis: From data to strategy diagram by LLM. arXiv 2024, arXiv:2409.06643. [Google Scholar] [CrossRef]
  19. Kharlashkin, L.; Morooka, E.; Tereshchenko, Y.; Hämäläinen, M. ORACLE: Time-Dependent Recursive Summary Graphs for Foresight on News Data Using LLMs. arXiv 2025, arXiv:2512.15397. [Google Scholar] [CrossRef]
  20. Svoboda, I.; Lande, D. Enhancing Multi-Criteria Decision Analysis with AI: Integrating Analytic Hierarchy Process and GPT-4 for Automated Decision Support. arXiv 2024, arXiv:2402.07404. [Google Scholar] [CrossRef]
  21. Wang, H.; Zhang, F.; Mu, C. One for All: A General Framework of LLMs-based Multi-Criteria Decision Making on Human Expert Level. arXiv 2025, arXiv:2502.15778. [Google Scholar]
  22. Phoka, T.; Wathahong, T.; Boriwan, P. An LLM–MCDM Framework with Lin’s Concordance Correlation Coefficient for Recommendation Systems: A Case Study in Food Preference. Appl. Sci. 2026, 16, 117. [Google Scholar] [CrossRef]
  23. Kahn, A.B. Topological sorting of large networks. Commun. ACM 1962, 5, 558–562. [Google Scholar] [CrossRef]
  24. Yang, A.; Li, A.; Yang, B.; Zhang, B.; Hui, B.; Zheng, B.; Yu, B.; Gao, C.; Huang, C.; Lv, C.; et al. Qwen3 Technical Report. arXiv 2025, arXiv:2505.09388. [Google Scholar] [CrossRef]
Figure 1. Schematic overview of the LLM-driven strategic evaluation framework.
Figure 2. An example of a TS-DAG.
Figure 3. Schematic diagram of the structure generation of TS-DAG based on the LLM-driven approach.
Figure 4. Schematic diagram of strategic deduction based on TS-DAG.
Figure 5. Diagram of the indicator-system structure specified in Table 2.
Figure 6. An example of the indicator trajectories in a single simulation run.
Figure 7. Correlation analysis of evaluation indicators and environmental variables. (a) shows inter-indicator correlations aggregated across all agents. (b) shows the sensitivity of core indicators to macro-environmental key variables.
Figure 8. Permutation-based feature importance for market-share prediction under a Random Forest model. Panels (a–c) correspond to the Legacy, Tech, and Price agents, respectively, and show cross-validated importance distributions for lagged strategic actions. Across all three agents, RD_Money is consistently the most important driver of MKTS, followed by MKT_Money.
Figure 9. Sensitivity analysis under different impact-weight configurations. Panels (a–c) report the resulting market share trajectories for the Legacy, Tech, and Price agents, respectively.
Table 1. Parameter settings for the mapping M under different sensitivity scenarios.

Setting      | Large Positive | Small Positive | Neutral | Small Negative | Large Negative
Baseline     | 1.4            | 1.1            | 1.0     | 0.9            | 0.6
Conservative | 1.2            | 1.05           | 1.0     | 0.95           | 0.8
Aggressive   | 1.6            | 1.2            | 1.0     | 0.8            | 0.4
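The mapping M converts a qualitative impact label into a multiplicative factor on an indicator's value. A minimal sketch of how Table 1's baseline setting could be applied is given below; the dictionary name M_BASELINE and the function apply_impact are our own illustrative names, not identifiers from the paper, while the five labels and their factors are taken directly from the table.

```python
import math

# Baseline row of Table 1: qualitative label -> multiplicative factor.
M_BASELINE = {
    "large_positive": 1.4,
    "small_positive": 1.1,
    "neutral": 1.0,
    "small_negative": 0.9,
    "large_negative": 0.6,
}

def apply_impact(value: float, label: str, mapping: dict = M_BASELINE) -> float:
    """Scale an indicator value by the factor that M assigns to a qualitative label."""
    return value * mapping[label]

# e.g. a "large positive" shock to an indicator value of 100 under the baseline setting
shocked = apply_impact(100.0, "large_positive")
assert math.isclose(shocked, 140.0)
```

Swapping in the Conservative or Aggressive rows of Table 1 only requires passing a different mapping, which is what makes the sensitivity scenarios cheap to rerun.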
Table 2. The automatically generated indicator system for NEV market simulation. Initial values at t = 0 are listed per agent (L: Legacy, T: Tech, P: Price); computed nodes carry no state and hence no initial value.

Exogenous Decision Inputs (Per Step)
- RD_Money_i (R&D Investment; Input; t = 0: L 50, T 30, P 15): exogenous (action)
- MKT_Money_i (Marketing Investment; Input; t = 0: L 45, T 30, P 15): exogenous (action)
- Price_i (Average Selling Price; Input; t = 0: L 40, T 35, P 25): exogenous (action)
- Profit_i (Profit Per Unit; Input; t = 0: L 8, T 6, P 4): exogenous (action)

Core Competitiveness Metrics (State Variables)
- PPS_i (Product Power Score; State; t = 0: L 120, T 130, P 100): PPS_i[t] = PPS_i[t-1] · k_decay + ln(1 + RD_Money_i / RDF)
- BPS_i (Brand Power Score; State; t = 0: L 150, T 110, P 80): BPS_i[t] = BPS_i[t-1] · k_br + ln(1 + MKT_Money_i / MKT) + (NPS_i - NPS_base) · 0.1
- NPS_i (Net Promoter Score; State; t = 0: L 160, T 180, P 150): NPS_i[t] = NPS_i[t-1] · k_nps + (PPS_i - PPS̄_total) · 2

Intermediate & Market Logic
- PPS2_i (Price Power Score; Computed): PPS_i[t] / Price_i[t]
- Total_i (Total Attraction Index; Computed): PPS_i · W_p + PPS2_i · W_c + BPS_i · W_b
- MKTS_i (Market Share; Computed): Total_i[t] / Σ_j Total_j[t]
- Sales_i (Sales Volume; Computed): TMD · MKTS_i[t]
- Online_i (Online Buzz Index; Computed): 10 · ln(1 + MKT_Money_i) + 5 · ln(1 + Sales_i)

Resource Constraints (Budget States)
- RDB_i (R&D Fund Balance; State; t = 0: L 500, T 300, P 150): RDB_i[t] = RDB_i[t-1] - RD_Money_i + Profit_i · Sales_i / 10,000
- MKTB_i (MKT Budget Balance; State; t = 0: L 300, T 200, P 100): MKTB_i[t] = MKTB_i[t-1] - MKT_Money_i
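The transfer functions of Table 2 can be transcribed almost directly into code. The sketch below (ours, not the authors' implementation) advances one agent's state by one time step in the table's topological order. The constant names (k_decay, k_br, k_nps, RDF, the MKT scaling divisor, the weights W_p/W_c/W_b, NPS_base, TMD) follow the table, but their numeric values here are illustrative placeholders, since the paper's calibration is not reproduced in this excerpt; the cross-agent aggregates (mean PPS and the sum of Total_j) must be supplied by the caller, which in a full multi-agent run means computing every agent's Total before normalizing market shares.

```python
import math

# Illustrative placeholder constants (names from Table 2, values assumed).
K_DECAY, K_BR, K_NPS = 0.95, 0.95, 0.9   # persistence factors for PPS, BPS, NPS
RDF, MKT_SCALE = 10.0, 10.0              # investment scaling divisors (RDF, MKT)
W_P, W_C, W_B = 0.5, 0.3, 0.2            # attraction weights W_p, W_c, W_b
NPS_BASE, TMD = 150.0, 100_000.0         # NPS reference level, total market demand

def step(state, action, pps_mean, total_sum):
    """Advance one agent's TS-DAG nodes by one step.

    `pps_mean` (cross-agent mean PPS) and `total_sum` (sum of Total_j over all
    agents) are aggregates that the caller computes across agents."""
    s = dict(state)
    # Core competitiveness metrics, evaluated in topological order:
    s["PPS"] = state["PPS"] * K_DECAY + math.log(1 + action["RD_Money"] / RDF)
    s["NPS"] = state["NPS"] * K_NPS + (s["PPS"] - pps_mean) * 2
    s["BPS"] = (state["BPS"] * K_BR + math.log(1 + action["MKT_Money"] / MKT_SCALE)
                + (s["NPS"] - NPS_BASE) * 0.1)
    # Intermediate & market logic:
    s["PPS2"] = s["PPS"] / action["Price"]
    s["Total"] = s["PPS"] * W_P + s["PPS2"] * W_C + s["BPS"] * W_B
    s["MKTS"] = s["Total"] / total_sum
    s["Sales"] = TMD * s["MKTS"]
    s["Online"] = (10 * math.log(1 + action["MKT_Money"])
                   + 5 * math.log(1 + s["Sales"]))
    # Resource constraints (budget states):
    s["RDB"] = state["RDB"] - action["RD_Money"] + action["Profit"] * s["Sales"] / 10_000
    s["MKTB"] = state["MKTB"] - action["MKT_Money"]
    return s

# Example: the Legacy agent's initial values from Table 2, with made-up aggregates.
state0 = {"PPS": 120.0, "BPS": 150.0, "NPS": 160.0, "RDB": 500.0, "MKTB": 300.0}
action0 = {"RD_Money": 50.0, "MKT_Money": 45.0, "Price": 40.0, "Profit": 8.0}
out = step(state0, action0, pps_mean=116.7, total_sum=300.0)
```

The two-pass structure (all Totals first, then normalization into MKTS) is what the Σ_j term in the market-share node forces on any implementation.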
Table 3. Cross-validated predictive performance of the Random Forest models for market share prediction.

Target      | R^2          | MAE
MKTS_legacy | 5.33 × 10^-1 | 1.52 × 10^-3
MKTS_tech   | 8.86 × 10^-1 | 4.53 × 10^-3
MKTS_price  | 8.64 × 10^-1 | 5.54 × 10^-3
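The permutation-importance procedure behind Figure 8 shuffles one feature column at a time and measures how much the fitted model's R^2 drops. A stdlib-only sketch of that idea is shown below; the paper uses a cross-validated Random Forest, but here a trivial hand-written predictor stands in so the example stays self-contained, and all function names are our own.

```python
import random

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each feature column is shuffled, per feature."""
    rng = random.Random(seed)
    base = r2(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the feature's link to the target
            Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - r2(y, [predict(row) for row in Xp]))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy check: the predictor uses only feature 0, so shuffling feature 1 is harmless.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * i + 1.0 for i in range(20)]
imp = permutation_importance(lambda row: 2.0 * row[0] + 1.0, X, y)
```

With a real Random Forest one would fit per fold and average the drops across folds, which is what produces the importance distributions shown in panels (a–c) of Figure 8.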
Table 4. Comparative performance of TS-DAG and baseline methods in terms of volatility (V), discriminability (D), and stable discriminability score (SDS).

Method       | V (Mean)     | V (Var)      | D (Mean)     | D (Var)      | SDS (Mean)  | SDS (Var)
TS-DAG       | 2.49 × 10^-3 | 6.11 × 10^-8 | 8.52 × 10^-2 | 1.22 × 10^-5 | 3.44 × 10^1 | 6.35
AHP-based    | 4.20 × 10^-2 | 3.35 × 10^-4 | 2.20 × 10^-1 | 7.65 × 10^-3 | 5.77        | 7.25
Prompt-based | 4.00 × 10^-2 | 4.39 × 10^-5 | 1.73 × 10^-1 | 3.23 × 10^-3 | 4.47        | 3.67
Share and Cite

Zou, M.; Zhu, X.; Ye, Y.; You, G.; Ma, L. A Large Language Model-Driven Strategic Evaluation Framework via Time-Series Directed Acyclic Graphs. Appl. Sci. 2026, 16, 2007. https://doi.org/10.3390/app16042007
