The MSMDA framework operates on a simulation environment defined by a set of firms $\mathcal{F} = \{1, \dots, N\}$ with $N = |\mathcal{F}|$, a discrete time index $t \in \{1, \dots, T\}$ with horizon $T$, and a strategy space $\mathcal{S} = \{s_1, \dots, s_K\}$. At each time step $t$, firm $i$ chooses strategy $s_{i,t} \in \mathcal{S}$, sets price $p_{i,t}$, and produces quantity $q_{i,t}$. Realized sales are determined by the market clearing condition $q^{\mathrm{sold}}_{i,t} = \min\{q_{i,t},\, D_i(p_{i,t}, p_{-i,t}, m_t)\}$, where $D_i(\cdot)$ is the firm-level demand function. Profit accrues as $\pi_{i,t} = p_{i,t}\, q^{\mathrm{sold}}_{i,t} - c_i(q_{i,t})$ and assets evolve as $A_{i,t+1} = A_{i,t} + \pi_{i,t}$. Macroeconomic variables, GDP and inflation, are endogenous aggregates of firm-level actions. The six methodological innovations build on this shared foundation and are described in the subsections below.
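As a concrete illustration of this per-period accounting, the following minimal sketch steps one firm through a single period. The linear demand curve and the constant unit cost are hypothetical stand-ins for $D_i$ and $c_i$, not specifications from the framework.

```python
# Illustrative single-period update for one firm. The demand curve
# D_i(p) = max(0, d0 - d1 * p) and the unit cost are hypothetical
# placeholders for the framework's D_i and c_i.
def firm_step(price, quantity, assets, d0=100.0, d1=2.0, unit_cost=3.0):
    demand = max(0.0, d0 - d1 * price)   # firm-level demand D_i
    sold = min(quantity, demand)          # market clearing: q_sold = min(q, D_i)
    profit = price * sold - unit_cost * quantity   # pi_t = p * q_sold - c(q)
    return sold, profit, assets + profit           # A_{t+1} = A_t + pi_t

sold, profit, assets = firm_step(price=10.0, quantity=50.0, assets=200.0)
```

Aggregating `profit` and `sold` across all firms would then yield the endogenous macro variables described above.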
Figure 1 illustrates the overall information flow among the six components and their convergence toward policy recommendations.
3.1. Strategy Temporal Pattern Recognition
The STPR component addresses the fundamental challenge of identifying recurring and statistically significant patterns in the sequence of strategic choices made by firms across time. Unlike conventional clustering approaches that treat each time period as an independent observation, STPR explicitly models the temporal dependencies inherent in strategic decision-making by embedding the strategy adoption process within a context-augmented Hidden Markov Model (HMM). The key intuition is that firms do not choose strategies in a vacuum but respond to a combination of their own recent history, observed competitor behavior, and prevailing macroeconomic conditions. Formally, let $s_{i,t} \in \mathcal{S}$ denote the strategy of firm $i$ at time $t$. The conditional transition probability from strategy history $s_{i,t-k:t-1}$ to current strategy, given market history $m_{t-k:t-1}$, is modeled as

$$
P\big(s_{i,t} = s \mid s_{i,t-k:t-1},\, m_{t-k:t-1}\big) = \frac{\exp\!\big(\theta_s^{\top}\, \phi(s_{i,t-k:t-1}, m_{t-k:t-1})\big)}{\sum_{s' \in \mathcal{S}} \exp\!\big(\theta_{s'}^{\top}\, \phi(s_{i,t-k:t-1}, m_{t-k:t-1})\big)},
$$

where $k$ is the memory depth hyperparameter, $\phi(\cdot)$ is a feature map that extracts relevant summary statistics from the strategy history and market context, and $\theta_s$ is a learned weight vector estimated via a modified Baum–Welch algorithm that incorporates gradient-based updates to handle the continuous market context features. The statistical significance of each identified pattern $p$ is assessed through the pattern significance metric

$$
\mathrm{PS}(p) = \big\| \bar{\beta}_p \big\|,
$$

the magnitude of the average effect $\bar{\beta}_p$ of pattern $p$ on GDP and inflation, estimated by local projection regression [32]. Only patterns with $\mathrm{PS}(p)$ exceeding the 95th percentile of a block-bootstrap null distribution are retained in subsequent analyses. The bootstrap null is constructed by resampling $B$ block-bootstrap samples of the strategy indicator series with block length $\ell^{*}$ chosen by the Politis–White procedure [16], recomputing $\mathrm{PS}(p)$ on each pseudo-sample, and taking the empirical 95th percentile of the resulting distribution as the critical value. This approach preserves the serial correlation present in the strategy time series and avoids the well-documented size distortion that arises when independent bootstrap methods are applied to temporally dependent data [14,15].
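The moving-block bootstrap null described above can be sketched as follows. The strategy indicator series, the block length, and the pattern statistic are synthetic stand-ins chosen only to make the example self-contained.

```python
import numpy as np

def block_bootstrap(x, block_len, n_boot, rng):
    """Moving-block bootstrap: resample contiguous blocks (preserving
    serial correlation) and concatenate back to the original length."""
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=(n_boot, n_blocks))
    idx = (starts[:, :, None] + np.arange(block_len)).reshape(n_boot, -1)[:, :n]
    return x[idx]

rng = np.random.default_rng(1)
# Synthetic strategy indicator series; the statistic below is a simple
# stand-in for PS(p), not the paper's local-projection estimate.
strategy_indicator = (rng.random(200) < 0.3).astype(float)
stat = lambda s: np.abs(s.mean() - 0.3)
null = np.array([stat(b) for b in block_bootstrap(strategy_indicator, 8, 500, rng)])
critical = np.quantile(null, 0.95)   # 95th-percentile critical value
```

A pattern would be retained only when its observed statistic exceeds `critical`.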
The hidden state of the augmented HMM is defined as $z_t = (\varphi_t, c_t)$, where $\varphi_t \in \{1, \dots, \Phi\}$ is a discrete latent market phase and $c_t$ is the standardized macroeconomic context vector constructed from GDP growth, inflation, and unemployment. For each strategy $s \in \mathcal{S}$ and phase $\varphi$, the emission probability is specified as

$$
P\big(s_{i,t} = s \mid \varphi_t = \varphi,\, c_t\big) = \frac{\exp\!\big(\alpha_{s,\varphi} + \beta_s^{\top} c_t\big)}{\sum_{s' \in \mathcal{S}} \exp\!\big(\alpha_{s',\varphi} + \beta_{s'}^{\top} c_t\big)},
$$

where $\alpha_{s,\varphi}$ is a phase-specific strategy intercept and $\beta_s$ measures how macroeconomic conditions shift the probability of strategy $s$. The phase-transition kernel governs the evolution of the latent phase as

$$
P\big(\varphi_{t+1} = \varphi' \mid \varphi_t = \varphi,\, c_t\big) = \frac{\exp\!\big(\Lambda_{\varphi\varphi'} + \gamma_{\varphi'}^{\top} c_t\big)}{\sum_{\varphi''} \exp\!\big(\Lambda_{\varphi\varphi''} + \gamma_{\varphi''}^{\top} c_t\big)},
$$

where $\Lambda$ is the baseline phase-transition log-odds matrix and $\gamma_{\varphi'}$ is a destination-phase context sensitivity vector. Estimation uses an expectation–maximization routine in which the forward–backward step computes the smoothed phase posteriors $\hat{\gamma}_t(\varphi) = P(\varphi_t = \varphi \mid \cdot)$ and the pairwise posteriors $\hat{\xi}_t(\varphi, \varphi') = P(\varphi_t = \varphi,\, \varphi_{t+1} = \varphi' \mid \cdot)$, and the maximization step updates the parameters $(\alpha, \beta, \Lambda, \gamma)$ by maximizing the expected complete-data log-likelihood. The multinomial-logit emission and transition subproblems are solved by gradient updates inside each maximization step. This formulation allows the model to identify market phases in which specific strategies dominate and to characterize the macroeconomic conditions that trigger transitions between phases.
The degree of within-strategy temporal persistence is quantified by the lag-$\ell$ autocorrelation function

$$
\rho_s(\ell) = \frac{\mathrm{Cov}\big(\mathbb{1}[s_{i,t} = s],\, \mathbb{1}[s_{i,t+\ell} = s]\big)}{\bar{\pi}_s (1 - \bar{\pi}_s)},
$$

where $\bar{\pi}_s$ is the marginal prevalence of strategy $s$. The overall unpredictability of strategy sequences is captured by the strategic entropy rate

$$
H = -\sum_{s \in \mathcal{S}} \pi_s \sum_{s' \in \mathcal{S}} P_{ss'} \log P_{ss'},
$$

where $\pi$ is the stationary distribution of the estimated Markov chain with transition matrix $P$. A low value of $H$ indicates that firms tend to maintain their current strategy from one period to the next, while a high value signals frequent and unpredictable transitions. Together, the persistence autocorrelation and the entropy rate provide a concise two-dimensional characterization of the strategic dynamics of each market phase.
3.2. Strategy Transition Detection and Analysis
The STDA component focuses on the detection and characterization of statistically significant strategy transitions, defined as moments at which a firm changes its strategic orientation in a manner that reflects genuine reoptimization rather than sampling noise. A transition event $T_{i,t}$ for firm $i$ at time $t$ is defined as

$$
T_{i,t} = \mathbb{1}\big[ s_{i,t} \neq s_{i,t-1} \big] \cdot \mathbb{1}\big[ |\Delta V_{i,t}| > \tau \big],
$$

where $\Delta V_{i,t}$ is the change in the estimated value function of firm $i$ at time $t$, and $\tau$ is a significance threshold determined by parametric bootstrap under the null hypothesis of pure random switching. The first indicator restricts attention to actual strategy changes, while the second filters out value-function fluctuations below the noise floor. Transition events propagate through the market network, triggering cascades in which the strategic change of one firm induces similar changes in neighboring firms. The cascade propagation measure is defined as

$$
C_{i,t} = \sum_{j \in \mathcal{N}(i)} w_{ij} \max_{0 \le d \le \delta} T_{j,\, t+d},
$$

where $\mathcal{N}(i)$ is the market neighborhood of firm $i$ consisting of firms with overlapping product portfolios, $w_{ij}$ are Jaccard similarity proximity weights, and $\delta$ is the propagation delay in periods. A cascade event is declared when $C_{i,t}$ exceeds the 95th percentile of its empirical distribution, and the cascade size is defined as the number of firms that transition within the window $[t, t+\delta]$.
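A minimal sketch of the cascade measure, assuming the neighborhood-weighted window form above; the toy transition matrix and Jaccard weights are illustrative stand-ins.

```python
import numpy as np

def cascade_measure(transitions, W, delay):
    """For each firm i and time t, sum the proximity weights w_{ij} of
    neighbours j that transition within `delay` periods of t.
    `transitions` is a (firms x time) binary matrix of events T_{i,t};
    W holds Jaccard-similarity proximity weights (zero diagonal)."""
    n_firms, T = transitions.shape
    C = np.zeros((n_firms, T))
    for t in range(T):
        window = transitions[:, t:min(T, t + delay + 1)].max(axis=1)
        C[:, t] = W @ window
    return C

# Toy example: firm 0 transitions at t=2; its close neighbour firm 1 follows at t=3.
trans = np.zeros((3, 6))
trans[0, 2] = 1.0
trans[1, 3] = 1.0
W = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
C = cascade_measure(trans, W, delay=2)
```

Declaring a cascade would then amount to comparing each `C[i, t]` with the 95th percentile of its empirical distribution.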
Multi-scale transition detection is achieved by applying the continuous wavelet transform to the population-level strategy signal $S(t)$ as

$$
W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} S(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt,
$$

where $\psi^{*}$ is the complex conjugate of the Morlet mother wavelet $\psi(\eta) = \pi^{-1/4} e^{i \omega_0 \eta} e^{-\eta^2/2}$ with center frequency $\omega_0$, and $a$ and $b$ are the scale and position parameters, respectively. A significant wavelet coefficient at scale $a$ and position $b$ satisfies

$$
\frac{|W(a, b)|^2}{\sigma^2} > \frac{1}{2}\, P_k\, \chi^2_2(\alpha),
$$

where $P_k$ is the red-noise background power spectrum at Fourier frequency $k$ corresponding to scale $a$, $\sigma^2$ is the signal variance, and $\chi^2_2(\alpha)$ is the $\alpha$-level critical value of the chi-squared distribution with two degrees of freedom. The wavelet approach distinguishes transitions at short timescales driven by idiosyncratic price shocks from those at long timescales corresponding to fundamental shifts in market structure.
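A self-contained numpy sketch of the Morlet transform is shown below; the center frequency `w0 = 6.0`, the scales, and the test signal are illustrative assumptions, and only the wavelet power (the left-hand side of the significance test) is computed.

```python
import numpy as np

def morlet_cwt(signal, scales, w0=6.0, dt=1.0):
    """Continuous wavelet transform with a Morlet mother wavelet
    psi(eta) = pi^{-1/4} exp(i w0 eta) exp(-eta^2 / 2); w0 = 6 is an
    illustrative choice, not taken from the paper."""
    n = len(signal)
    t = (np.arange(n) - n // 2) * dt
    W = np.empty((len(scales), n), dtype=complex)
    for k, a in enumerate(scales):
        eta = t / a
        psi = np.pi ** -0.25 * np.exp(1j * w0 * eta) * np.exp(-eta ** 2 / 2)
        # correlate the signal with the conjugate wavelet, normalised by 1/sqrt(a)
        W[k] = np.convolve(signal, np.conj(psi)[::-1], mode="same") * dt / np.sqrt(a)
    return W

rng = np.random.default_rng(2)
# Synthetic strategy signal: a period-16 oscillation plus noise.
x = np.sin(2 * np.pi * np.arange(256) / 16) + 0.3 * rng.standard_normal(256)
scales = np.array([4.0, 8.0, 16.0])
power = np.abs(morlet_cwt(x - x.mean(), scales)) ** 2
```

Power concentrates at the scale matching the oscillation period; a full significance test would additionally compare `power / x.var()` against the red-noise background times the chi-squared critical value.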
The causal attribution of transition events to observable firm-level and macroeconomic factors is conducted via LASSO-regularized logistic regression with transition occurrence as the binary outcome,

$$
\hat{\beta} = \arg\min_{\beta} \left\{ -\sum_{i,t} \Big[ T_{i,t} \log \sigma\big(\beta^{\top} x_{i,t}\big) + (1 - T_{i,t}) \log\big(1 - \sigma(\beta^{\top} x_{i,t})\big) \Big] + \lambda \|\beta\|_1 \right\},
$$

where the feature vector $x_{i,t}$ comprises firm-level profitability, market-share volatility, leverage ratio, pricing pressure, and macroeconomic conditions including GDP growth and inflation, $\sigma(\cdot)$ is the logistic function, and $\lambda$ is the regularization parameter selected by five-fold cross-validation.
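The L1-penalized fit can be sketched with a plain proximal-gradient (ISTA) loop; the learning rate, iteration count, and synthetic data below are illustrative assumptions, and in practice $\lambda$ would be chosen by cross-validation rather than fixed.

```python
import numpy as np

def lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """L1-regularised logistic regression fitted by proximal gradient
    descent (ISTA); the intercept is left unpenalised."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w = w - lr * (X.T @ (p - y) / n)                     # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft threshold
        b = b - lr * (p - y).mean()
    return w, b

# Toy data: only the first of three features drives transitions, so the
# L1 penalty should shrink the irrelevant coefficients toward zero.
rng = np.random.default_rng(3)
X = rng.standard_normal((400, 3))
y = (1.0 / (1.0 + np.exp(-2.0 * X[:, 0])) > rng.random(400)).astype(float)
w, b = lasso_logistic(X, y, lam=0.05)
```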
3.3. Strategy-Macro Causality Analysis
The SMCA component addresses the fundamental question of how micro-level strategic decisions aggregate to produce macro-level economic outcomes, and conversely, how macroeconomic conditions feed back into strategic decision-making. The analysis proceeds in three stages: linear Granger causality testing, nonlinear causality quantification, and doubly robust counterfactual policy analysis. The macroeconomic state vector and the strategic composition vector are defined respectively as $M_t$, collecting GDP growth and inflation, and $S_t = (\sigma_{1,t}, \dots, \sigma_{K,t})^{\top}$, where $\sigma_{k,t}$ is the fraction of firms employing strategy $k$ at time $t$.
The multivariate Granger causality from $S_t$ to $M_t$ is quantified by the log-determinant ratio

$$
G_{S \to M} = \ln \frac{\det \hat{\Sigma}_M(k)}{\det \hat{\Sigma}_M(k, \ell)},
$$

where $\hat{\Sigma}_M(k)$ is the residual covariance matrix of $M_t$ regressed on $k$ of its own lags, and $\hat{\Sigma}_M(k, \ell)$ is the residual covariance when $\ell$ lags of $S_t$ are additionally included. A positive value of $G_{S \to M}$ indicates that knowledge of the past strategic composition improves prediction of future macroeconomic conditions beyond what is available from the macro history alone. Statistical significance is assessed via the likelihood ratio test statistic

$$
LR = T\, G_{S \to M} \;\sim\; \chi^2_{\ell\, d_S d_M}
$$

under the null of no Granger causality, where $d_M$ and $d_S$ are the dimensions of $M_t$ and $S_t$, respectively.
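The log-determinant ratio can be computed from two least-squares VAR fits; the simulated bivariate system below, in which $S$ drives $M$ by construction, is a synthetic stand-in for the model's series.

```python
import numpy as np

def var_residual_cov(Y, lags_own, X=None, lags_other=0):
    """Residual covariance of Y regressed (with intercept) on `lags_own`
    of its own lags, optionally adding `lags_other` lags of X."""
    T = Y.shape[0]
    p = max(lags_own, lags_other)
    rows = []
    for t in range(p, T):
        row = [1.0]
        for l in range(1, lags_own + 1):
            row.extend(Y[t - l])
        for l in range(1, lags_other + 1):
            row.extend(X[t - l])
        rows.append(row)
    Z = np.array(rows)
    target = Y[p:]
    beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ beta
    return resid.T @ resid / len(resid)

# Synthetic system in which the strategy shares S Granger-cause macro M.
rng = np.random.default_rng(4)
T = 500
S = rng.standard_normal((T, 2))
M = np.zeros((T, 2))
for t in range(1, T):
    M[t] = 0.4 * M[t - 1] + 0.6 * S[t - 1] + 0.1 * rng.standard_normal(2)

Sig_r = var_residual_cov(M, lags_own=1)                       # restricted
Sig_u = var_residual_cov(M, lags_own=1, X=S, lags_other=1)    # unrestricted
G = np.log(np.linalg.det(Sig_r)) - np.log(np.linalg.det(Sig_u))
```

By construction the past of $S$ is informative here, so `G` comes out strictly positive.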
Because the asymptotic chi-squared approximation in Equation (14) can be unreliable in finite samples when the underlying time series exhibit persistent autocorrelation, we supplement the asymptotic $p$-value with a block-bootstrap $p$-value. Under the null hypothesis the restricted VAR residuals are resampled in contiguous blocks of length $\ell^{*}$ periods, the block length being determined by the Politis–White spectral estimator applied to the autocorrelation function of the residual series. Each of the $B$ bootstrap replications generates a pseudo macroeconomic series under the null that $S_t$ has no predictive content for $M_t$, and the bootstrap $p$-value is the fraction of replications for which the pseudo likelihood ratio statistic exceeds the observed statistic. The robustness of the block length choice is verified by repeating the entire procedure with a range of block lengths around $\ell^{*}$, and the rejection decision is invariant across this range.
To capture the nonlinear causal relationships that linear Granger tests may miss, we define the kernel nonlinear causality ratio

$$
\mathrm{KNC} = \frac{\frac{1}{T}\sum_t \big\| M_{t+1} - \hat{M}^{\mathrm{lin}}_{t+1} \big\|^2}{\frac{1}{T}\sum_t \big\| M_{t+1} - \hat{M}^{\mathrm{ker}}_{t+1} \big\|^2},
$$

where $\hat{M}^{\mathrm{lin}}_{t+1}$ is the prediction of a linear VAR that includes both $M_t$ and $S_t$, and $\hat{M}^{\mathrm{ker}}_{t+1}$ is the prediction of a Nadaraya–Watson kernel regression estimator with a Gaussian RBF kernel, the bandwidth being selected by leave-one-out cross-validation. A ratio significantly above unity confirms that nonlinear strategic interactions contribute additional predictive information beyond the linear component.
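The kernel regression and its leave-one-out bandwidth selection can be sketched as follows; the one-dimensional sine-shaped target and the candidate bandwidth grid are illustrative assumptions.

```python
import numpy as np

def nw_predict(X_train, y_train, X_query, h):
    """Nadaraya-Watson regression with a Gaussian RBF kernel of bandwidth h."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * h ** 2))
    return (K @ y_train) / K.sum(axis=1)

def loo_bandwidth(X, y, grid):
    """Leave-one-out CV: predict each point from all the others for each
    candidate h and keep the bandwidth with the smallest squared error."""
    best_h, best_err = None, np.inf
    for h in grid:
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2.0 * h ** 2))
        np.fill_diagonal(K, 0.0)          # exclude the point itself
        pred = (K @ y) / K.sum(axis=1)
        err = ((y - pred) ** 2).mean()
        if err < best_err:
            best_h, best_err = h, err
    return best_h

rng = np.random.default_rng(5)
X = rng.uniform(-2, 2, size=(300, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(300)   # nonlinear target
h = loo_bandwidth(X, y, grid=[0.05, 0.1, 0.2, 0.5, 1.0])
pred = nw_predict(X, y, X, h)
```

On a nonlinear target like this one, the kernel fit's mean squared error is far below the signal variance, which is exactly the regime in which $\mathrm{KNC}$ exceeds unity.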
As a fully nonparametric complement, the transfer entropy from $S_t$ to $M_t$ is computed as

$$
TE_{S \to M} = \sum p\big(M_{t+1}, M_t, S_t\big) \log \frac{p\big(M_{t+1} \mid M_t, S_t\big)}{p\big(M_{t+1} \mid M_t\big)},
$$

measuring the reduction in uncertainty about the next macroeconomic state that is gained by knowing the current strategic composition, over and above the reduction already achieved by knowing the current macro state. The statistical significance of $TE_{S \to M}$ is again assessed by block bootstrap, with the same block length and number of replications as the Granger causality test, by computing transfer entropy on $B$ block-bootstrapped null series in which the temporal alignment between $S_t$ and $M_t$ is destroyed within each block boundary.
For counterfactual policy analysis under a hypothetical intervention $S_t = s^{*}$, let $X_t$ denote the pre-intervention state and let $Y_{t+h}$ denote the macroeconomic outcome at horizon $h$. Because $S_t$ is a vector of strategy shares, the treatment density is estimated by a generalized propensity score $\hat{g}(S_t \mid X_t)$ rather than by an exact discrete propensity. The doubly robust estimator is

$$
\hat{\mu}^{\mathrm{DR}}(s^{*}) = \frac{1}{T} \sum_{t} \Big[ \hat{m}(s^{*}, X_t) + \hat{w}_t \big( Y_{t+h} - \hat{m}(S_t, X_t) \big) \Big],
$$

with stabilized weight

$$
\hat{w}_t = \frac{K_h(S_t - s^{*})}{\max\{\hat{g}(S_t \mid X_t),\, \epsilon\}},
$$

where $K_h$ is a Gaussian kernel centered at the target composition $s^{*}$, $\hat{m}$ is the outcome regression, $\hat{g}$ is estimated from the observed strategy-composition process, and $\epsilon$ is a trimming constant that prevents unstable inverse-density weights. The estimator remains consistent if either the outcome regression or the generalized propensity model is correctly specified. Confidence intervals for the counterfactual projections are obtained by the same moving block bootstrap, resampling the time series of observations in blocks of length $\ell^{*}$ and recomputing the estimator on each bootstrap sample.
3.4. Dynamic Market Stability Index
The DMSI is constructed as a weighted sum of four component indices, each capturing a distinct dimension of market health. The four components correspond to separate instability channels observed in the simulations: price instability, strategic concentration, entry–exit imbalance, and sensitivity to standardized shocks. A fixed-weight volatility index was used as a baseline during model development, but it does not distinguish among these channels and is less informative when the source of instability changes across phases. The adaptive formulation retains the interpretation of a composite index while allowing the most predictive component to receive more weight in each market phase. The overall index at time $t$ is

$$
\mathrm{DMSI}_t = \sum_{j=1}^{4} w_{j,t}\, C_{j,t},
$$

subject to the normalization constraint $\sum_{j=1}^{4} w_{j,t} = 1$ for all $t$. The four components are defined as follows. The price stability component is

$$
C^{\mathrm{price}}_t = \exp\!\left( - \frac{\overline{|\Delta p_t|}}{\bar{p}_t} \right),
$$

the exponentially damped ratio of the average absolute recent price change $\overline{|\Delta p_t|}$ to the mean price level $\bar{p}_t$, which is high when recent price changes are small relative to their mean, and low during episodes of volatile price fluctuation. The strategy diversity component is

$$
C^{\mathrm{div}}_t = -\frac{1}{\log K} \sum_{k=1}^{K} \sigma_{k,t} \log \sigma_{k,t},
$$

the normalized Shannon entropy of the current strategy distribution, which peaks at one when strategies are equally prevalent and falls toward zero when the market is dominated by a single strategy. The entry–exit balance component is

$$
C^{\mathrm{ee}}_t = 1 - \frac{|E_t - X_t|}{E_t + X_t},
$$

where $E_t$ and $X_t$ are the numbers of firm entries and exits at time $t$, so that this component is maximized when entry and exit rates are balanced, reflecting a market in demographic equilibrium, and is minimized during episodes of mass exit or entry. The shock resilience component is

$$
C^{\mathrm{res}}_t = 1 - \frac{1}{|\mathcal{E}|} \sum_{e \in \mathcal{E}} \frac{\big\| M^{(e)}_t - M_t \big\|}{\| M_t \|},
$$

measuring the average fractional macro-state displacement following a standardized shock $e$, where $\mathcal{E}$ is a fixed set of demand, cost, and regulatory perturbations applied identically across all time periods to ensure comparability.
The adaptive component weights are determined by a softmax function over the phase-specific relevance of each component as

$$
w_{j,t} = \frac{\exp\!\big(\kappa\, R^2_{j,t}\big)}{\sum_{j'=1}^{4} \exp\!\big(\kappa\, R^2_{j',t}\big)},
$$

where $R^2_{j,t}$ is the partial $R^2$ of component $j$ in predicting future market distress, estimated from a rolling window of 50 periods, and $\kappa$ controls the concentration of the weight distribution. This mechanism automatically upweights the most relevant stability dimensions during each market phase, producing a measure that, in our simulations, is better calibrated than the fixed-weight baseline.
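Three of the ingredients above can be sketched compactly; the partial-$R^2$ values and the concentration parameter `kappa` in the example are illustrative stand-ins.

```python
import numpy as np

def softmax_weights(partial_r2, kappa=5.0):
    """Adaptive DMSI weights: softmax over each component's partial R^2
    in predicting future distress; kappa controls concentration."""
    z = kappa * np.asarray(partial_r2, dtype=float)
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def strategy_diversity(shares):
    """Normalised Shannon entropy of the strategy distribution, in [0, 1]."""
    s = np.asarray(shares, dtype=float)
    s = s[s > 0]
    return float(-(s * np.log(s)).sum() / np.log(len(shares)))

def entry_exit_balance(entries, exits):
    """1 when entry and exit counts are balanced, 0 under one-sided churn."""
    if entries + exits == 0:
        return 1.0
    return 1.0 - abs(entries - exits) / (entries + exits)

w = softmax_weights([0.30, 0.10, 0.05, 0.05])    # price channel most predictive
div = strategy_diversity([1 / 3, 1 / 3, 1 / 3])  # equal prevalence -> 1
bal = entry_exit_balance(4, 4)                   # balanced demographics -> 1
```

The softmax guarantees the normalization constraint on the weights by construction.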
An early-warning instability risk signal is derived from the first and second derivatives of the DMSI as

$$
\mathrm{Risk}_t = \mathbb{1}\!\left[ -\big( \dot{D}_t + \lambda\, \ddot{D}_t \big) > \theta \right],
$$

where $\dot{D}_t$ and $\ddot{D}_t$ denote the discrete first and second time derivatives of $\mathrm{DMSI}_t$, and the weight $\lambda > 0$ ensures that a rapidly accelerating decline in DMSI triggers an elevated risk signal even before the derivative becomes negative. The threshold $\theta$ is selected on the first half of each scenario by maximizing the F1 score over a grid of candidate values, with distress defined as a subsequent DMSI decline below the scenario-specific 10th percentile within 20 periods. When evaluated on the held-out second half of the simulations, the same threshold achieves average precision of 78% and average recall of 71%. Scenario-level precision ranges from 74% to 82%, and scenario-level recall ranges from 68% to 75%.
3.5. Adaptive Rationality Equilibrium
A classical Nash equilibrium [33] requires that each agent chooses the best response to the strategies of all other agents, implicitly assuming unbounded cognitive capacity and foresight. Following the standard reinforcement learning framework [34], we model agents that learn through interaction with their environment. The bounded rationality literature, originating with Simon [35], acknowledges that real decision-makers operate under cognitive constraints but typically imposes uniform limitations across all agents and time periods, which is at odds with the empirical evidence for substantial heterogeneity in strategic sophistication among market participants. The Adaptive Rationality Equilibrium (ARE) introduced here accommodates heterogeneous, time-varying rationality by modeling each firm's action selection as a convex combination of fully RL-rational and fully myopic payoffs. Formally, the effective Q-value that firm $i$ uses to select its strategy at time $t$ is

$$
\tilde{Q}_{i,t}(s) = \rho_{i,t}\, Q^{\mathrm{RL}}_{i,t}(s) + (1 - \rho_{i,t})\, u^{\mathrm{myo}}_{i,t}(s),
$$

where $Q^{\mathrm{RL}}_{i,t}(s)$ is the discounted cumulative return estimated by the PPO critic, $u^{\mathrm{myo}}_{i,t}(s)$ is the myopic single-period payoff under strategy $s$, and $\rho_{i,t} \in [0, 1]$ is the rationality parameter of firm $i$ at time $t$. The PPO policy $\pi_{\theta}$ conditions on the firm state $x_{i,t}$ and on the current rationality value. The ARE-adjusted advantage entering the clipped PPO objective is

$$
\tilde{A}_{i,t} = \rho_{i,t}\, \hat{A}_{i,t} + (1 - \rho_{i,t}) \big( u^{\mathrm{myo}}_{i,t} - \bar{u}^{\mathrm{myo}} \big),
$$

where $\hat{A}_{i,t}$ is the generalized-advantage estimate from the PPO rollout and $\bar{u}^{\mathrm{myo}}$ is the within-batch mean myopic payoff. The inner-loop update maximizes

$$
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_{i,t}\!\left[ \min\!\Big( r_{i,t}(\theta)\, \tilde{A}_{i,t},\; \mathrm{clip}\big(r_{i,t}(\theta),\, 1 - \epsilon,\, 1 + \epsilon\big)\, \tilde{A}_{i,t} \Big) \right],
$$

where $r_{i,t}(\theta) = \pi_{\theta}(s_{i,t} \mid x_{i,t}) / \pi_{\theta_{\mathrm{old}}}(s_{i,t} \mid x_{i,t})$. The rationality parameter is updated after each PPO epoch by the projected meta-gradient step

$$
\rho_i^{(u+1)} = \Pi_{[\rho_{\min}, \rho_{\max}]}\!\left( \rho_i^{(u)} + \eta_{\rho}\, \frac{1}{B} \sum_{b=1}^{B} \nabla_{\rho}\, \hat{J}_{i,b}\big(\rho_i^{(u)}\big) \right),
$$

where $u$ indexes PPO epochs, $B$ is the minibatch size, $\hat{J}_{i,b}$ is the minibatch estimate of firm $i$'s return objective, $\eta_{\rho}$ is the meta-learning rate, and $\Pi_{[\rho_{\min}, \rho_{\max}]}$ enforces the feasible rationality interval. This two-level optimization specifies how ARE is embedded in the PPO framework rather than treating rationality as an exogenous post-processing parameter.
Proposition 1. An Adaptive Rationality Equilibrium exists in any compact, convex joint strategy–rationality space, where the best-response correspondence maps each (strategy profile, rationality profile) pair to the set of jointly optimal strategy and rationality choices. Existence follows from Kakutani's fixed-point theorem applied to the nonempty, convex-valued, upper hemicontinuous best-response correspondence on the compact, convex product space $\Delta(\mathcal{S})^N \times [\rho_{\min}, \rho_{\max}]^N$.
In the implemented simulation, this condition is represented by the mixed-strategy simplex over the three available strategies and the projected interval $[\rho_{\min}, \rho_{\max}]$ for rationality. The PPO policy produces smooth mixed-action probabilities, and the projection in Equation (29) keeps the rationality component compact. The proposition therefore supports the existence of a fixed point for the smoothed learning dynamics used in the numerical ARE search, while the location of the reported optimum remains a simulation-specific result.
3.6. Information Asymmetry Propagation
Information asymmetry among market participants is a fundamental driver of strategic heterogeneity and market inefficiency. Firms with superior private information about demand conditions, cost structures, or competitor intentions can exploit this advantage to earn above-market returns, while the gradual diffusion of private information into public knowledge through observed prices and quantities determines the speed at which markets approach informational efficiency. The IAP component quantifies both the degree of private information held by individual firms and the rate at which it dissipates. The information advantage of firm $i$ at time $t$ is defined as the KL divergence between the firm's private predictive distribution and the publicly available predictive distribution for the next-period macroeconomic state as

$$
IA_{i,t} = D_{\mathrm{KL}}\!\Big( P\big(M_{t+1} \mid \mathcal{I}^{\mathrm{priv}}_{i,t}\big) \,\Big\|\, P\big(M_{t+1} \mid \mathcal{I}^{\mathrm{pub}}_{t}\big) \Big),
$$

where $\mathcal{I}^{\mathrm{priv}}_{i,t}$ is the private information set of firm $i$ and $\mathcal{I}^{\mathrm{pub}}_{t}$ is the set of publicly observable variables at time $t$. A higher value of $IA_{i,t}$ indicates that firm $i$ holds more informative beliefs about future market conditions than can be inferred from public data alone. The market-wide information diffusion rate is measured by the IAP metric

$$
\mathrm{IAP}_t = 1 - \frac{\frac{1}{N}\sum_{i} IA_{i,t+1}}{\frac{1}{N}\sum_{i} IA_{i,t}},
$$

which equals zero when information advantages are perfectly persistent, approaches one when they are completely eliminated within a single period, and can be negative when strategic obfuscation causes the gap between private and public knowledge to widen. The empirical relationship between information advantage and strategic transition probability is captured by the logistic link function

$$
P\big(T_{i,t} = 1\big) = \sigma\big(a + b\, IA_{i,t}\big),
$$

where $\sigma(\cdot)$ is the sigmoid function and the coefficients $(a, b)$ are estimated by maximum likelihood. A significant positive value of $b$ confirms that firms with larger information advantages are more likely to transition strategies in the current period, consistent with the hypothesis that private information enables firms to anticipate changing market conditions and proactively adjust their behavior.
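The information-advantage and diffusion-rate computations can be sketched directly; the three-state forecast distributions below are hypothetical inputs, and the helper averages over a single firm for brevity.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between discrete predictive distributions;
    eps guards against zero probabilities before renormalising."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float((p * np.log(p / q)).sum())

def iap_rate(adv_now, adv_next):
    """Information diffusion rate: 0 if advantages persist unchanged,
    1 if they vanish within one period, negative if the gap widens."""
    return 1.0 - adv_next / adv_now

# A firm's private forecast vs the public forecast over three macro states.
private = [0.7, 0.2, 0.1]
public = [0.4, 0.4, 0.2]
adv = kl_divergence(private, public)      # information advantage IA_{i,t} > 0

rate_persist = iap_rate(adv, adv)         # advantage unchanged  -> 0.0
rate_dissipate = iap_rate(adv, 0.0)       # advantage eliminated -> 1.0
```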