1. Introduction
The price of a financial instrument is the emergent outcome of a large population of heterogeneous decision-makers, each acting on private information, private objectives and private cognitive frames. Since Bachelier first described price movements as a random walk [
1], and through the successive refinements introduced by Mandelbrot in his work on fractal geometry of finance [
2,
3], it has become widely accepted that the behaviour of prices does not match the assumptions of classical Gaussian finance. Fat tails, volatility clustering and scale invariance are no longer anomalies to be explained away but structural features of the object of study. The field of econophysics was born from the recognition that the statistical mechanics of complex systems, the theory of self-similar stochastic processes [
4,
5], multi-resolution analysis via wavelets [
6,
7], and a substantial body of empirical work on volatility, volume and stylized facts of financial returns [
8,
9,
10,
11,
12,
13,
14,
15,
16] offer a conceptual toolbox far better suited to financial dynamics than ideal-market assumptions.
Within this trajectory, a unified perspective has recently emerged under the name of Multiscale Relativistic Quantum Finance (MRQF) [
17,
18]. MRQF combines three ideas that were previously treated in isolation: the multiscale nature of the price signal, a relativistic formalism that posits a maximum instantaneous speed of price propagation
, and a quantum interpretation of liquidity in which a gauge boson—called the financion—mediates the interaction between a bullish operator
and a bearish operator
. The theory formalizes the market as a black body that absorbs financions, introduces a finance specific quantum of action
and recasts decisions in the transformed energy–entropy plane
rather than in the familiar price–time plane. The claim of MRQF is conceptually strong: the familiar phenomenology of trending, laterality and exhaustion is not merely described but explained through state transitions in the
plane. A companion line of work [
19,
20] has developed a thermodynamic decision-support engine on the same two-dimensional projection, showing that the combinatorial complexity of a seven-factor problem can be reduced by more than two orders of magnitude without loss of solution quality.
Yet a gap persists between MRQF as a theory and MRQF as a trading system. The original papers characterize the mathematical structure of the market and the behaviour of its elementary constituents, but they do not specify how these constituents should be instantiated as software agents, how they should exchange information, how they should handle real-time data, or how their collective behaviour should be coordinated. At the same time, the literature on multi-agent systems applied to algorithmic trading has grown rapidly [
21,
22,
23,
24], with recent contributions showing that cooperative multi-agent deep reinforcement learning can outperform single-agent baselines on currency-pair data [
21,
23]. In particular, the framework of Shavandi and Khedmati [
21], which instantiates expert trading agents on distinct timeframes and connects them through a hierarchical feedback mechanism in which knowledge flows from higher to lower timeframes, is the closest existing work to the present contribution and deserves an explicit comparison. Our approach differs in three respects that we consider structural rather than cosmetic. First, the agent taxonomy in [
21] is organized along timeframes alone (e.g., daily, hourly, minute), while MRQF-MAS organizes agents along the operator categories derived from MRQF (institutional, commercial, retail), which jointly determine a characteristic timeframe rather than being identified with it. Second, the information flow in [
21] is strictly top-down along the hierarchy, whereas MRQF-MAS introduces a peer-to-peer lateral channel among sub-agents of the same specialization, with cooperative–competitive fusion governed by an explicit threshold
. Third, ref. [
21] operates on the price–time plane and uses a pragmatic feature set, while MRQF-MAS operates on the MRQF energy–entropy plane augmented with probabilistic observables, inheriting a theoretical structure that is absent in [
21]. The most common architectural pattern in the broader stream remains a centralized orchestrator that aggregates signals from specialized agents. Peer-to-peer cooperation, in which agents share beliefs laterally and converge through consensus rather than through top-down arbitration, is less explored in finance, although in the adjacent domain of large language model agents the benefits of lateral communication have recently been documented: multi-agent debate, iterative consensus and peer-to-peer belief exchange have been shown to improve reasoning and factual accuracy over single-agent or orchestrator-only baselines [
25,
26,
27,
28]. MRQF-MAS imports this architectural principle into finance and anchors it to the MRQF theoretical framework.
The present work closes both gaps. We develop MRQF-MAS, a multi-agent cooperative framework that inherits its taxonomy, its interaction primitives and its decision space directly from MRQF, and that extends the theory with three architectural contributions. First, we instantiate the institutional, commercial and retail operators as cooperative agents and decompose each of them into five specialized sub-agents covering the signal, energy, entropy, risk and execution dimensions. Second, we introduce a horizontal cooperation protocol that lets sub-agents exchange beliefs directly through financion-mediated messages, complementing rather than replacing the vertical orchestration layer. Third, we define a shared knowledge base that stores trajectories, resonance levels and learned patterns, and that is updated online as agents execute decisions and observe their outcomes. To support the theoretical exposition, we provide a synthetic EUR/USD case study that reproduces the MRQF scale-invariance law and that illustrates the behaviour of the architecture on a realistic tick-level signal.
The paper is organized as follows.
Section 2 recalls the theoretical foundations of MRQF, restricted to the elements strictly needed for the subsequent development.
Section 3 describes how the theory is mapped onto computable features and how the state vector in the prospect environment is constructed.
Section 4 introduces the MRQF multi-agent cooperative architecture.
Section 5 formalizes the interaction dynamics by mapping scattering, annihilation and pair production from physics onto the multi-agent system.
Section 6 specifies the trading framework on the nine-region plane.
Section 7 presents the algorithmic implementation, including the cooperative coordination algorithm.
Section 8 discusses the synthetic EUR/USD case study.
Section 9 examines the limits of the quantum-relativistic analogy and the boundaries of its applicability.
Section 10 reports the extended empirical validation against orchestrator-only baselines and classical quant models.
Section 11 presents the sensitivity and complexity analysis of the framework.
Section 12 discusses the overall results and their implications.
Section 13 draws conclusions and outlines directions for future work.
8. Synthetic EUR/USD Case Study
To illustrate the behaviour of MRQF-MAS on a realistic signal we construct a synthetic EUR/USD tick series that obeys the MRQF scaling law by design, and we run the architecture on it. The aim is not to demonstrate real-market performance—which would require extensive back-testing on historical data and is outside the scope of this theoretical contribution—but to verify that the architecture is internally consistent, that its estimators recover the postulated scaling exponent, and that its decision trajectory on the
plane behaves as predicted. An overview of the resulting tick series is shown in
Figure 5.
The synthetic data is generated from a fractionally integrated innovation process with target scaling exponent
and baseline price level consistent with the EUR/USD spot rate observed in recent months.
Figure 5a shows a representative 2000-tick realization; the visual appearance reproduces the alternation of directional moves and consolidation that is familiar from real tick data.
Figure 5b shows the volatility scaling in log–log coordinates: the slope of the measured points is
, close to but below the nominal value
. This systematic downward bias is itself informative: it signals that the particular realization, while generated from a process with true exponent 0.618, exhibits a lower local scaling consistent with a specific regime. MRQF-MAS picks up this mismatch without instrumenting it explicitly: the Signal sub-agents of the three operator-level agents report
estimates in the range
, and the Risk sub-agent reflects the discrepancy in its reliability score
.
Figure 5c shows the rolling estimates of
and
along the same realization. Both time series fluctuate in the narrow band that the scale discretisation imposes, but the joint evolution visible in
Figure 5d shows that the trajectory visits a restricted region of the plane, concentrated around
, and remains in that region throughout the simulated period. The interpretation is immediate: the synthetic process is in a regime of moderate energy and high-to-medium entropy, which MRQF-MAS classifies as the transitional zone between Region VIII and Region IX. Consistently with the strategy classes of
Section 6.2, the cooperative consensus favours chaos-management strategies with tight stop-losses and modest exposure, and the reliability-gated risk module keeps the size per trade at approximately one third of the maximum admissible size. The behaviour of the architecture on this synthetic realization is therefore the expected one: an appropriately conservative response to a regime that combines non-trivial energy with significant disorder. Part of the gap between the nominal exponent
and the measured
is also attributable to the finite-sample bias of wavelet-based scaling estimators, which is well documented in the Hurst-exponent literature and is known to systematically under-estimate the true exponent on samples below
ticks [
38]. The MRQF-MAS reliability sub-agent reflects this epistemic uncertainty in its score
, and the cooperative consensus is correspondingly cautious.
Two observations deserve emphasis. The first is that the trajectory on the plane is remarkably stable: successive points in
Figure 5d remain within a localized cluster, without the high-frequency oscillation that would be visible in a price–time plot of the same data. This stability is the structural reason why the
plane supports robust decisions. The second is that the synthetic regime identified by the architecture coincides with the a priori classification that an analyst would assign on the basis of the generating process: the case study is therefore a consistency check rather than a predictive test, and the architecture passes it.
A proper predictive evaluation, on historical market data with realistic transaction costs and market impact, is the natural next step and is discussed in
Section 10 as a priority for future work.
Table 4 reports the metrics produced by MRQF-MAS on this synthetic realization.
10. Extended Empirical Validation
The synthetic case study of
Section 8 was designed as a consistency check for the
formalism, not as a performance evaluation. In this section we address the principal concern raised by peer review—that a quantum-motivated architecture must be benchmarked on genuine market data rather than on a parametric proxy—by running the complete framework on the official EUR/USD daily reference rates published by the European Central Bank over the period from 4 January 1999 to 6 April 2026, totalling 6978 business-day observations, and by evaluating it as a high-volatility regime classifier rather than as a profit-oriented trading system. This repositioning is consistent with the architectural design of MRQF-MAS, whose distinctive contribution is regime-aware abstention through the
region map; claiming direct profit extraction would be over-reach, while claiming a structured regime filter is both defensible and operationally useful as a pre-processing layer for downstream decision systems. Throughout this section the framework is run in diagnostic mode: Execution.execute is the no-op branch (no order is sent), the quantity emitted at each step is the binary high-volatility regime label produced by RegimeLabel together with its one-step switch flag, and these labels—not trading returns—are scored against the realized-volatility ground truth; the order-emission branch of Algorithm 1 is retained for completeness but not exercised here.
The data source is the ECB euro foreign exchange reference rates, published daily at 16:00 CET and consumed here through the embedded historical file distributed by the CurrencyConverter Python package version 0.18.17 [see
Appendix A for access details]. The series spans 27 years and covers the 2001 dot-com aftermath, the 2008 global financial crisis, the 2010 European sovereign debt episode, the 2011 Swiss-franc ceiling regime, the 2015 Swiss National Bank floor removal, the 2016 Brexit referendum, the 2020 COVID-19 shock, and the 2022 Russia–Ukraine conflict. Mean daily log return is approximately
basis points with standard deviation of
basis points; excess kurtosis is
and skewness is essentially zero, consistent with the leptokurtic, near-symmetric distribution documented for major FX pairs in the stylised-facts literature [
16]. This real-data profile differs materially from the parametric GARCH proxy of the previous version of the manuscript: fat tails emerge from genuine macroeconomic episodes rather than from a calibrated innovation process, and regime shifts are clustered in time rather than injected at prescribed checkpoints.
The evaluation metric is the classification performance against a high-volatility regime indicator defined as the realized 22-day standard deviation exceeding its trailing 80th percentile, a standard construction in the regime-switching literature [
11]. This ground-truth classifier—flagging the high-volatility regime whenever the realized 22-day standard deviation of daily log returns exceeds its trailing 80th percentile—is the reference against which all five classifiers are scored [
11,
39,
40]. The threshold amounts to
basis points, and
of the observations are labelled as belonging to the high-volatility regime, giving a non-trivial but unbalanced classification task. The first 500 days are reserved for warm-up and excluded from evaluation to allow the multiscale wavelet decomposition, the cumulant estimator and the shared knowledge base to reach a steady state; all metrics reported below refer to the out-of-warm-up portion of the sample, which contains 81 distinct regime onsets.
Three non-trivial baseline families—four classifiers in total—are considered. The rolling-volatility threshold classifier mirrors the ground-truth construction with a shorter calibration window, and represents the simplest practitioner heuristic. The Generalized Autoregressive Conditional Heteroscedasticity (GARCH)-like classifier fits a
autoregressive conditional heteroscedasticity recursion on the full series and flags regimes as the top 20% of the implied conditional variance. We consider two GARCH variants to address the legitimate concern that fixed coefficients may understate the baseline. The first variant, GARCH(1,1), uses literature-default coefficients
,
,
and represents an out-of-the-box deployment without parameter fitting. The second variant, GARCH-fitted, estimates
by maximum likelihood on the pre-2013 sub-sample only and applies the resulting parameters to the full series, in order to provide a strong reference baseline that exploits the autoregressive structure of conditional variance without inducing look-ahead bias. The fitted parameters are
,
,
, with persistence
consistent with the near-unit-root behaviour widely documented for major FX series [
11,
16]. Both GARCH(1,1) baselines use a constant conditional mean and Gaussian innovations estimated by (quasi-)maximum likelihood; because the regime label depends only on the ranking of the conditional-variance path, the Gaussian quasi-MLE is consistent for (
ω,
α,
β) even under non-Gaussian innovations [
41]. As a robustness check we refit the model with Student-t innovations on the same ECB series: the estimated degrees of freedom are about 7.3 (an AIC improvement of about 269 over the Gaussian fit), confirming heavy tails, while the conditional-variance parameters are essentially unchanged (
α: 0.029 → 0.031,
β: 0.968 → 0.968, persistence 0.997 → 0.998). The implied volatility ranking is identical (Spearman ρ = 1.000) and the top-20% regime labels coincide on 99.97% of days, so the comparison of
Table 5 is invariant to the Gaussian-versus-Student-t innovation choice; we retain the Gaussian specification in the headline table. The momentum detector flags a regime whenever the magnitude of the cumulative 22-day return exceeds its trailing 80th percentile; it operationalizes the time-series-momentum effect [
42,
43] and plays the role of a non-volatility competitor, testing whether directional persistence alone is informative about volatility regimes. MRQF-MAS is run as specified in
Section 4,
Section 5,
Section 6 and
Section 7, with the same threshold
, the same nine-band decomposition and the same golden-ratio cooperation weight
used in the earlier analyses.
Table 5 reports the metrics together with bootstrap 95% confidence intervals on the Matthews correlation coefficient (1000 resamples with replacement of the evaluation set). MRQF-MAS attains an accuracy of
, a precision of
, a recall of
, an F1 score of
, MCC
with 95% CI
, and a median detection latency of two trading days. The rolling-volatility baseline attains
,
,
,
and MCC
with 95% CI
; the MRQF-MAS improvement of
MCC points over this baseline is statistically significant under the bootstrap test, with non-overlapping confidence intervals. The GARCH(1,1) variant with literature-default coefficients attains accuracy
, F1
and MCC
with 95% CI
, slightly above MRQF-MAS on aggregate F1 and MCC. The MLE-fitted GARCH variant raises this further to MCC
with 95% CI
, F1
, accuracy
, precision
and recall
, opening a statistically significant gap of approximately
MCC points over MRQF-MAS with non-overlapping confidence intervals. The momentum classifier lags both (
accuracy,
F1,
MCC), confirming that directional persistence is a weak indicator of volatility regimes on daily FX data.
Two observations matter for the practical interpretation of these numbers. First, MRQF-MAS is a high-precision, moderate-recall classifier: when it flags a regime, it is right 82% of the time, but it flags fewer than it should. This is a useful profile when false alarms are costly—for instance, when the output gates expensive downstream actions such as portfolio de-risking—but it is not the profile that would maximize aggregate F1, which is why both the out-of-the-box and the MLE-fitted GARCH baselines win on F1 and on MCC. Second, GARCH exploits a narrower feature set (past squared returns) than MRQF-MAS (multiscale wavelet energy, entropy, kurtosis, cooperation signal) and nonetheless dominates the aggregate metrics; this is an honest signal that the additional machinery of MRQF-MAS does not buy raw detection accuracy. The architectural gain that MRQF-MAS provides—a structured assignment to one of nine
regions, each with a distinct economic reading, and an agent-traceable audit trail of how the classification was reached—is not measured by classification metrics in isolation and must be argued at the architectural level, as is done in the next paragraph and in
Section 12.
It is therefore useful to state explicitly the theoretical relationship between the two approaches. GARCH is a univariate parametric model of conditional variance: it specifies a single latent quantity
that evolves as an autoregressive function of past squared innovations and of its own lagged values, and uses a threshold on
as the regime indicator. Its interpretive output is a point estimate of volatility, and nothing more. MRQF-MAS, by contrast, does not estimate a scalar variance: it decomposes the incoming return stream on the two-dimensional energy–entropy plane, assigns the current observation to one of nine regions each with a distinct economic reading (low-energy/low-entropy corresponds to quiet accumulation, high-energy/high-entropy to stressed dispersion, and so on), and aggregates across agents through a cooperation rule parameterized by the golden ratio
. The two approaches are not substitutes but complements: GARCH is preferable when the sole requirement is an aggregate regime flag with maximum F1; MRQF-MAS is preferable when the downstream consumer needs to know not only whether the market is in a stressed regime, but also which type of stress—directional dispersion, multiscale energy concentration, tail-driven disagreement among sub-agents—because that structure informs different mitigation actions. Read this way, the comparison in
Table 5 is not a head-to-head race but a placement of the two tools on a Pareto frontier between raw classification accuracy and structural interpretability.
The precision–recall profile of MRQF-MAS deserves a separate comment, because it is by design rather than by accident. Recall of
corresponds to approximately 54% of high-volatility days being flagged, against 82% precision on the days that are flagged.
Figure 6 plots the full precision–recall trade-off curves obtained by sweeping the operating threshold of each method, with the default operating points marked by filled symbols; MRQF-MAS occupies the upper-left region of the plot consistent with its high-precision design profile. Lowering the cooperation threshold
would increase recall at the cost of precision; the sensitivity analysis of
Section 11 shows that the trade-off can be tuned along a smooth curve. The default
is set by an a priori convention rather than by post hoc optimization on the evaluation metric: it corresponds to a one-fifth share of the cooperative-belief unit interval, which we adopt as a neutral starting point in the absence of a closed-form derivation from the MRQF constants. The empirically optimal value on the present series is
, which would push MCC to approximately
and bring MRQF-MAS within
MCC points of the MLE-fitted GARCH baseline; we report this explicitly to allow the reader to judge how much of the residual performance gap is attributable to threshold tuning. We elect to retain the convention-based default in the headline numbers because the architectural argument we wish to defend—that the framework produces a regime decomposition that is structurally interpretable on the
plane—is independent of a fine threshold tuning that would conflict with the spirit of an a priori, theoretically motivated parameterisation. The GARCH-like baseline, with precision
and recall
, occupies a different point on the same Pareto frontier and is the natural choice when the consumer tolerates more false alarms.
To address the legitimate concern that the validation might rely on in-sample tuning, we report a temporal out-of-sample split that partitions the evaluation period into two disjoint sub-samples at 1 January 2013. The pre-split sub-sample (2169 observations from 2001 to 2012, regime base rate
) yields MCC
, accuracy
, precision
and recall
. The post-split sub-sample (1171 observations from 2013 to 2026, covering the 2014 oil-price collapse, the 2020 COVID shock and the 2022 Ukraine crisis, regime base rate
) yields MCC
, accuracy
, precision
and recall
. Two facts must be reported about this split. First, the post-split sub-sample has a higher regime base rate than the pre-split, so absolute MCC values are not directly comparable across the two segments; this caveat applies symmetrically to all baselines and does not invalidate the comparison among them within either segment. Second, a chi-squared homogeneity test on the two confusion matrices yields
(
), indicating that the absolute counts of true and false detections differ significantly between the two eras, as expected given the different macroeconomic context and base rates. The relevant claim is not that the classifier is identically behaved on the two segments, but that its hyperparameters fixed on theoretical grounds in
Section 2,
Section 3,
Section 4,
Section 5,
Section 6 and
Section 7 produce a robust regime detector across two disjoint macroeconomic regimes, with precision actually rising on the held-out post-2013 sub-sample. This is the evidence we offer against the hypothesis of in-sample overfitting; a stronger out-of-sample protocol—rolling re-fits, expanding window, multi-pair held-outs—is identified in
Section 13 as a priority direction for future empirical work.
Table 8 decomposes these metrics across the pre-split and post-split sub-samples.
Figure 7 complements the aggregate metrics with a crisis-by-crisis diagnostic on six named episodes. The 2008 global financial crisis is captured in full: both MRQF-MAS and the rolling-volatility baseline classify 100% of the 101 days of the window as stressed, against a ground-truth rate of 92%. The 2022 Ukraine shock is also fully captured (
detection rate, matching the ground-truth 83%). The 2020 COVID shock is partially captured (
MRQF-MAS,
rolling-vol against
ground truth), reflecting the extreme narrowness of the event and the 22-day window effect.
The 2010 European sovereign debt episode is detected markedly better by MRQF-MAS (
) than by rolling-vol (
), indicating that the multiscale cooperation layer adds value in slower-building regimes. The 2015 Swiss-franc episode requires a clarification: although the abandonment of the EUR/CHF floor on 15 January 2015 produced a 30-class single-day move on EUR/CHF, its propagation to EUR/USD was muted, with a daily log return of only
on that date (approximately
relative to the trailing 100-day standard deviation) and no day in the 2015 first-quarter window ranking among the eight largest absolute daily moves of the entire 1999–2026 EUR/USD series. The classifier’s failure to flag this window therefore reflects the fact that the event was a CHF cross-rate dislocation rather than an EUR/USD volatility regime; the ground-truth indicator captures its modest spillover (rate
) but neither MRQF-MAS nor the rolling-volatility baseline activates. This episode highlights a structural limitation of the chosen single-pair daily evaluation: the framework cannot detect cross-asset stress that does not propagate at the daily resolution to the pair under analysis. The 2016 Brexit referendum is correctly left unflagged by both methods, in agreement with the low ground-truth rate of 9%; the pair recovered quickly and the window-based indicator does not classify the episode as a sustained regime. The per-episode detection rates are reported in
Table 6.
An ablation study isolates the contribution of the two distinctive architectural elements of MRQF-MAS: the horizontal cooperation layer of
Section 4.3, and the shared knowledge base of
Section 4.4. The full model is re-run with each component disabled in turn, all other parameters fixed, and the resulting classifiers are evaluated against the same ground truth.
Table 7 reports the outcome. Removing the cooperation layer degrades precision from
to
and MCC from
to
, while recall increases marginally from
to
: the cooperation step visibly sharpens the classifier by filtering out marginal signals that would otherwise trigger false positives, at the cost of missing a few additional onsets. Removing the shared knowledge base, at daily frequency and on this particular ground-truth construction, produces no measurable change in classification metrics. We report this null result openly rather than concealing it, because the architectural rationale for the SKB is theoretical and design-level rather than empirical at this scale: the SKB encodes the persistent memory of
trajectories that the cooperation rules of
Section 4.3 require to detect consistency violations between current and historical agent beliefs, and its expected operational benefit accrues in three settings none of which are stressed by the present daily single-pair evaluation. First, at intraday and tick frequencies the same regime can be revisited many times within a session, so consistency—with stored
trajectories—provides incremental information that is absent at daily resolution where each observation is essentially a fresh draw from a slow process. Second, on multi-instrument portfolios the SKB acts as a cross-asset memory that allows the cooperation layer to detect joint regime onsets that any single sub-agent would miss, a setting that is structurally outside the scope of a single-pair daily classifier. Third, in consecutive-decision settings such as portfolio rebalancing or order scheduling the SKB enforces continuity between successive decisions, a property that only manifests across decisions and not within a single classification step. The null result reported here is therefore expected, narrowly bounded to the present evaluation regime, and explicitly framed in
Section 13 as a priority direction for future empirical work; we deliberately retain the SKB component in the baseline architecture to preserve the structural integrity of the design and to enable the cited extensions without architectural refactoring.
The takeaway is that MRQF-MAS behaves on real ECB data as a structured regime filter with high precision and moderate recall, improving upon a direct rolling-volatility baseline by a consistent margin, remaining competitive with—but not uniformly superior to—a well-tuned GARCH estimator, and capturing the major crisis episodes of the last two decades with a two-day median detection latency. Crucially, the reported numbers are out-of-warm-up on genuine reference rates, not on a parametric proxy; the evaluation protocol specifies ground truth, warm-up cutoff, threshold, metrics and ablation variants, and the full pipeline is reproducible from the CurrencyConverter package version 0.18.17 and the seed fixed in
Appendix A. The claim we defend is architectural interpretability with competitive classification performance, not outperformance on an aggregate F1 score.
Table 8.
Temporal out-of-sample split of MRQF-MAS (full) on the ECB EUR/USD evaluation set, partitioned at 1 January 2013. The post-split sub-sample was never inspected during parameter design; absolute counts differ between sub-samples (chi-squared homogeneity test, p < 0.001) due in part to the different regime base rate (0.172 pre vs. 0.248 post), but precision is highest in the held-out period, providing evidence that the theory-motivated default hyperparameters are not overfitted to the pre-2013 era.
Table 8.
Temporal out-of-sample split of MRQF-MAS (full) on the ECB EUR/USD evaluation set, partitioned at 1 January 2013. The post-split sub-sample was never inspected during parameter design; absolute counts differ between sub-samples (chi-squared homogeneity test, p < 0.001) due in part to the different regime base rate (0.172 pre vs. 0.248 post), but precision is highest in the held-out period, providing evidence that the theory-motivated default hyperparameters are not overfitted to the pre-2013 era.
| Sub-Sample | Period | Obs. | Regime Days | Accuracy | Precision | Recall | MCC |
|---|
| Pre-split | 2001–2012 | 2169 | 374 | 0.891 | 0.740 | 0.570 | 0.587 |
| Post-split | 2013–2026 | 1171 | 290 | 0.873 | 0.961 | 0.507 | 0.640 |
11. Sensitivity and Complexity Analysis
Reviewers raised two legitimate concerns about the robustness and the practical feasibility of the architecture: the dependence of the results on the hyperparameters , and on the granularity of the bands; and the computational cost of a design that instantiates three operator-level agents, each decomposed into five sub-agents, with a lateral cooperation protocol. This section addresses both concerns quantitatively.
The sensitivity analysis was performed by sweeping each of the three hyperparameters over a realistic range while keeping the others at their default values, and by re-running the regime-detection pipeline of
Section 10 on the real ECB EUR/USD series. The scaling exponent
was varied through the proxy of the scaling-window length
, which controls how many observations are aggregated in the feature extraction step and thereby determines the effective Hurst-like exponent captured by the Signal sub-agent; values of
between 100 and 1500 business days correspond to proxied
values between
and
. The cooperation threshold
was varied between
and
around its default of
; its effect is modelled through the hysteresis length of the cooperative signal, which directly reflects how stringent the lateral consensus is before a regime flag is raised. The band granularity was varied from 3 to 20, covering the range from extremely coarse tertile partitions to fine-grained bin structures that approach a continuum.
Figure 8 reports the resulting Matthews correlation coefficients. The three panels convey a consistent message. Panel (a) shows that the classifier is stable over a broad plateau around the golden-mean value
, with MCC values in the narrow range
across
; the default value sits near the top of the plateau, and the sensitivity at the default point is minimal. Panel (b) shows that the cooperation threshold
has the most pronounced effect of the three: MCC peaks at
for
and degrades monotonically to
at
, as increasingly stringent hysteresis suppresses genuine regime transitions; the default
achieves MCC
, within
of the sweep optimum, indicating that the chosen value is defensible but that fine-tuning on the evaluation metric could yield a modest improvement. Panel (c) shows that the band granularity is the least influential parameter of the three: MCC values are essentially constant across the tested range, reflecting the fact that the nine-band map already provides sufficient resolution on this task. The overall sensitivity profile is flat around the theory-motivated defaults, which is the behaviour one would expect of an architecture whose parameters are chosen on physical grounds rather than fitted by cross-validation on the evaluation metric.
A natural companion to the sensitivity sweep is the question of stability: do close inputs lead to close outputs, or are there chaotic opportunities in which similar inputs produce very different decisions? Away from the cooperative/competitive boundary the answer is the former. The (
E,
S) mapping of
Section 3.1 is built from bounded, clipped rolling statistics—a window standard deviation for
E and a robust mean absolute deviation for
S—and is therefore Lipschitz-continuous in the input window: a small perturbation of the incoming ticks induces a proportionally small perturbation of (
E,
S), and the cooperative consensus of Equation (5), being a confidence-weighted average on its fusion branch, inherits the same continuity. The single locus of sensitive dependence is the
branch boundary: when the dispersion
σ hovers at
an arbitrarily small input change can flip the consensus between the cooperative-fusion and competitive-argmax branches, so close inputs can yield discontinuous—informally chaotic—output flips. This is precisely the oscillation-across-the-threshold failure mode of
Section 4.3, controlled by the two-tick hysteresis, which commits the system to the active branch for at least two consecutive evaluations and absorbs the dispersion noise that would otherwise drive the flipping. The smoothness observed away from the boundary is consistent with the sensitivity plateau of
Figure 8: close values of
φ and
map to close MCC, and the response degrades gradually rather than abruptly—the signature of a mapping that is continuous except on the measure-zero switch boundary where hysteresis intervenes.
The complexity and latency analysis was performed by timing the full inference step of MRQF-MAS on windows of increasing length, from
to
observations.
Figure 9a reports the per-step latency in microseconds as a function of
, alongside a reference line denoting the theoretical linear scaling
. The measured latency ranges from approximately
μs at
to
μs at
, with sub-linear scaling in the intermediate range due to the efficiency of vectorised wavelet operations, and linear scaling in the regime where the wavelet transform dominates. For practical purposes, the inference latency is in the low hundreds of microseconds across all tested window sizes, which is well below the typical inter-arrival time of FX reference prints during liquid hours and therefore does not constitute a bottleneck for real-time deployment.
Figure 9b reports the per-component breakdown at
: the wavelet-based Signal sub-agent is the dominant cost (0.32 μs), followed by the cooperation layer with its 15 sub-agent evaluations (0.22 μs), the SKB input/output (0.12 μs), the
mapping (0.08 μs) and the Meta-Orchestrator arbitration (0.05 μs). The cooperation layer, which is the architectural novelty whose feasibility was queried in the previous referee round, represents 29% of the total latency and is therefore a substantial but not dominant contributor to the overall cost. All timings were obtained on a single core of an AMD Ryzen 7 5825U CPU (8 cores, 16 threads) with 16 GB of RAM running Windows 11 (64-bit), using the Python 3.12 scientific stack of
Appendix A (NumPy 2.4, SciPy 1.13, PyWavelets 1.6, Matplotlib 3.10); under this configuration the complete out-of-warm-up ECB EUR/USD run over the 6478 evaluation observations completes end-to-end in approximately 1.2 s of wall-clock time, including feature extraction, cooperative inference and metric computation.
Taken together, the sensitivity and complexity analyses support two conclusions. First, MRQF-MAS is not fragile with respect to its hyperparameters: the classification performance varies smoothly and the default values sit close to the optimum on the real ECB series, which is what one would expect of an architecture whose parameters are motivated by theoretical considerations rather than fitted by cross-validation. Second, the cooperation layer is computationally feasible: its overhead is measured in tens of microseconds, which is orders of magnitude below the time scales on which FX markets generate actionable information. These results address, without exhausting, the concerns about ad hoc parameter choices and architectural complexity that motivated this analysis.
12. Discussion
The primary advantage of MRQF-MAS over classical quantitative trading models, including deep-reinforcement-learning-based trading agents [
44,
45] and direct policy search under the fractal market hypothesis [
46,
47], lies in the grounding of its decision structure in a physical theory rather than in curve fitting. A quantitative model that captures multifractal scaling through a fitted exponent and switches between regimes on the basis of volatility thresholds can achieve competitive performance in back-testing, but it does not carry an explanation of why it performs well. In non-stationary environments, this opacity becomes a liability: when performance degrades, the practitioner has no principled handle to diagnose whether the underlying regime has changed, whether the fit has become stale, or whether the parameters have drifted. MRQF-MAS, by contrast, localizes every aspect of its behaviour within a theoretical object: the scaling exponent is an observable quantity linked to the volatility scaling law; the energy
and the entropy
are state variables of the market-as-black-body model; the cooperation primitives are direct analogues of the MRQF interaction dynamics. When performance degrades, diagnostics can be performed layer by layer: at the Signal layer, one inspects the time-series of the estimated scaling exponent
and the spectral power distribution to detect drifts or the appearance of new resonances; at the Energy/Entropy layer, one examines the marginal histograms of
and
and checks whether the current occupancy is covered by the SKB; at the Risk layer, one measures the inter-sub-agent dispersion
and the reliability score
to identify regimes in which agents disagree systematically; at the cooperation layer, one inspects the distribution of consensus type (cooperative vs. competitive) over a rolling window and verifies that the threshold
remains well-calibrated; at the Execution layer, one reconciles realized outcomes against recorded state vectors to check whether the SKB is reflecting the current regime. Each of these inspection points is localized and has a bounded failure mode, which is the operational counterpart of the theoretical layering that MRQF imposes on its implementation.
The comparison with orchestrator-only multi-agent architectures is more subtle. Centralized orchestration is simpler to implement, easier to audit end-to-end and has fewer failure modes in production. The horizontal cooperation layer of MRQF-MAS introduces additional complexity: belief messages can deadlock if not properly sequenced, consensus rules can oscillate if the threshold
is set too tightly, and the SKB becomes a shared resource that must be protected against concurrency issues. We argue, however, that this complexity is justified by the information-flow properties it enables. To support this claim, we ran a Monte Carlo experiment in which both architectures were asked to detect a synthetic regime shift embedded in a noisy signal, with 200 independent runs per configuration. The setup is intentionally stylised—a step-function shift superposed on Gaussian noise—in order to isolate the effect of the lateral cooperation mechanism from other confounds; it is not a trading performance result, and proper benchmarking on historical market data remains future work.
Figure 10 reports the empirical distributions: the cooperative architecture detects the shift with a mean latency of 0.8 ticks against 3.6 ticks for the orchestrator-only variant—roughly a four-fold reduction on this specific task—at the cost of a moderate increase in pre-shift false positives (mean 13.1 against 9.5). The trade-off is favourable for trading applications, where late detection of a regime change typically incurs larger losses than a few additional false positives that the risk gating layer can filter out. Lateral sharing of beliefs among sub-agents of the same specialization allows the system to detect regime changes that no single agent would identify in isolation: when the Signal sub-agents of the three operator-level agents agree that the scaling exponent has shifted from 0.62 to 0.48, the Meta-Orchestrator receives this information as a converged consensus rather than as three independent signals, and it can therefore act with less hesitation. The architectural cost is real but bounded; the informational gain, both quantitatively and qualitatively, is the structural justification for the design.
Two limitations must be acknowledged explicitly. First, the constants
and
of MRQF are context-dependent: they must be calibrated per instrument, per venue and per epoch, and their stability over time is an empirical question that the literature has not yet settled. The present work assumes that these constants are available to the architecture as hyperparameters; a fully self-calibrating version would need to estimate them online, with the attendant risk of estimation bias in regimes of low statistical power. Second, the empirical evaluation reported in
Section 10 is confined to daily EUR/USD reference rates; the behaviour on intraday tick data, on emerging-market currency pairs, on cross-instrument portfolios and on short-horizon strategies is reserved for future work. In particular, the ablation finding that the SKB contributes no measurable value at daily frequency on a single pair is a specific and transparent null result of the present evaluation; its expected contribution in higher-frequency and multi-instrument settings is identified in
Section 13 as a priority research direction.
The objective of MRQF-MAS is not to outperform established volatility estimators such as GARCH-type models in terms of aggregate classification metrics, but to provide a structurally interpretable regime decomposition. While scalar models reduce market dynamics to a single latent variance variable, MRQF-MAS operates on a two-dimensional energy–entropy space, producing regime assignments that carry distinct economic interpretations. In this sense, the framework is intended as a complementary decision layer that can be used upstream of classical volatility estimators, rather than as a direct replacement.
The architectural complexity of MRQF-MAS—comprising multiple operator-level agents, specialized sub-agents, and a horizontal cooperation protocol—should be interpreted in light of its design objectives. The framework is not optimized for minimal predictive pipelines, but for interpretability, traceability, and extensibility in multi-agent settings. Each component contributes to an explicit representation of market structure, allowing decisions to be decomposed, audited, and attributed across agents. This design choice trades raw predictive simplicity for structural transparency, which is a desirable property in decision-support systems operating under uncertainty.
13. Conclusions and Future Work
This paper has presented MRQF-MAS, a multi-agent cooperative decision framework that closes the gap between the Multiscale Relativistic Quantum Finance theory and its operational use as a market-regime classifier. By instantiating the institutional, commercial and retail operators as first-class agents, by decomposing each agent into five specialized sub-agents covering signal, energy, entropy, risk and execution, by introducing a horizontal cooperation protocol that allows sub-agents to share beliefs peer-to-peer, and by anchoring the system to a shared knowledge base of historical trajectories, resonance levels and learned patterns, we have derived a concrete architecture that inherits its interaction primitives and its decision space directly from the underlying physical theory. The mapping between MRQF interaction dynamics—scattering, annihilation, pair production—and multi-agent coordination protocols grounds the cooperation layer in a physical analogy that is more than metaphorical: each primitive corresponds to a specific pattern of information flow and state change. Crucially, the framework is presented and evaluated as a regime-detection system—not as a profit-oriented trading strategy—which is the scope consistent with the architectural choice of regime-aware abstention through the region map.
The contribution of MRQF-MAS lies neither in raw classification performance nor in the individual novelty of any single component—multi-agent systems, wavelet-based volatility estimators, and regime-detection models all have established studies—but in the specific combination of three elements that, to our knowledge, has not previously been articulated as a unified architecture: a theoretically motivated parameterization of agent behaviour through the MRQF constants
,
and the golden ratio
; a two-dimensional energy–entropy decision space whose nine regions admit distinct economic readings rather than a single scalar volatility flag; and a horizontal cooperation protocol that allows sub-agents of different operator-level agents to exchange beliefs peer-to-peer under a shared knowledge base, rather than through a centralized orchestrator. The result is what we describe as a structurally interpretable regime decomposition, complementary to classical volatility models rather than designed to outperform them in aggregate metrics. The two layers of evidence reported in
Section 10—competitive aggregate scores against three baselines, and per-region
assignments traceable through the agent hierarchy—jointly support this positioning.
The empirical validation of
Section 10, carried out on 6978 business-day observations of the official ECB EUR/USD reference rate over 1999–2026, shows that MRQF-MAS operates as a structured high-volatility regime classifier with accuracy
, precision
, Matthews correlation coefficient
with bootstrap 95% confidence interval
, and a two-day median detection latency. On the same data, the framework improves upon a direct rolling-volatility baseline by
MCC points (with non-overlapping bootstrap confidence intervals), is comparable to an out-of-the-box GARCH(1,1) baseline (MCC
), and trails an MLE-fitted GARCH(1,1) by approximately
MCC points (MCC
, 95% CI
). We report this gap openly: it confirms that MRQF-MAS does not deliver state-of-the-art aggregate detection performance, and the contribution we claim is structural rather than performance-based. A temporal out-of-sample split at 1 January 2013 confirms that the theory-motivated default hyperparameters are not overfitted to the first decade of data, with precision rising from
to
in the held-out post-2013 sub-sample. The ablation study isolates the contribution of the cooperation layer (precision improvement from
to
) and transparently reports a null contribution of the shared knowledge base on this daily single-pair task, which is interpreted as an invitation to evaluate the SKB in higher-frequency, multi-instrument settings where its expected informational value is larger. The crisis-by-crisis diagnostic confirms full capture of the 2008 global financial crisis and the 2022 Ukraine shock, partial capture of the 2020 COVID episode and the 2010 European sovereign debt tensions, and explicit acknowledgement that the 2015 Swiss-franc episode—a cross-rate dislocation that did not propagate to EUR/USD at daily resolution—lies outside the structural scope of a single-pair daily classifier and must be addressed by the multi-instrument extensions identified in
Section 13.
Several directions remain open for future development. The highest-priority direction is the extension of the empirical evaluation to intraday tick data on multiple currency pairs and to multi-asset portfolios, where the shared knowledge base is expected to display the architectural contribution that is not measurable at daily single-pair resolution; this extension would also allow the characterization of the regimes in which MRQF-MAS outperforms or underperforms classical volatility estimators under realistic transaction-cost and latency constraints. A second direction is the integration of deep-learning components into the sub-agents and the systematic benchmarking against modern sequence models: the Signal sub-agent could be replaced by a learned wavelet architecture, the Energy and Entropy sub-agents could exploit attention-based estimators, and the cooperation rule of
Section 4.3 could be replaced by a learned consensus operator trained on historical disagreements and their resolutions; a parallel benchmark against LSTM and transformer-based regime classifiers, evaluated on the same MCC and latency metrics, would complete the comparative landscape with data-driven sequence models analogous to the comparison reported here against parametric volatility estimators. A third direction is the extension to multi-asset portfolios, where the SKB would encode cross-asset correlations and the cooperation layer would arbitrate among agents specialized by asset class rather than by operator type. A fourth direction concerns market microstructure validation: the MRQF interpretation of the market as a black body populated by financions predicts specific signatures in the order-book dynamics that should be testable against high-frequency data. We conjecture that the financion-as-gauge-boson framework, read as a structured analogy, will produce empirically verifiable predictions about the timing of liquidity absorption and emission around large market-moving events. These predictions are the natural testing ground for the theory and the object of the next steps in this line of research.
For the sake of intellectual honesty, and to guide the research agenda this manuscript opens, we close with five residual weaknesses of MRQF-MAS, each offered as a priority research direction. First, predictive performance is not competitive: the MLE-fitted GARCH(1,1) baseline attains MCC 0.694 against 0.604 for MRQF-MAS, a statistically significant gap; closing it through hybrid MRQF/GARCH ensembles, learned consensus operators, or richer feature pipelines is the first priority. Second, the architectural overhead of three operator-level agents, fifteen sub-agents, a cooperation protocol and a shared knowledge base is substantial relative to the modest precision gain over a rolling-volatility baseline, and a deployment-cost versus interpretability-benefit study is needed to justify it. Third, the shared knowledge base contributes nothing at daily frequency; demonstrating its value empirically in the multi-frequency and multi-instrument extensions identified above remains an outstanding obligation. Fourth, the multiscale nature of MRQF, its most distinctive feature, is not stressed by a daily-resolution evaluation and requires intraday tick data to validate. Fifth, and arguably most important, the interpretability we cite as the principal contribution is asserted at the design level but not yet quantified; developing a metric suite for the interpretability of (E, S) regime decompositions—covering audit-trail completeness, decision-attribution entropy, and expert-agreement scores—is the final priority, since it is the metric under which the framework is meant to be judged.
Finally, we note that the present work does not include a direct comparison with modern deep sequence models such as LSTM or transformer-based architectures. This omission is intentional, as the focus of the paper is on establishing a theoretically grounded and interpretable framework rather than on benchmarking predictive performance against data-driven models. A systematic comparison with such architectures, under the same evaluation protocol and metrics, is identified as a priority direction for future work.