1. Introduction
Private consumption is central to aggregate demand, yet its declines often precede official crisis statistics. Anticipating these events matters not only for forecast accuracy but also for the timely protection of household welfare and for macroeconomic stability. Recent research on tail risks emphasizes that tight financial conditions can shift the distribution of growth in a negative direction, making early, policy-relevant warning a central policy objective [1,2]. Historical evidence shows that credit-intensive expansions are often followed by severe recessions, with significant declines in output, investment, and, in particular, private spending [3,4]. This link between financial imbalances and real activity underpins the search for indicators that can be observed early and are relevant to demand management.
Early warning systems have traditionally focused on currency and banking crises, where pre-crisis anomalies can be detected in real time. A key finding is that information embedded in financial aggregates and cross-market prices can foreshadow real economic stress long before broader activity indicators move [5,6]. For consumption, however, early warnings targeting private demand (rather than systemic defaults) remain underdeveloped, despite their direct policy relevance. Fiscal channels make this omission more consequential. Balance-sheet strains and tight credit availability can quickly depress spending, exacerbating recessions precisely when insurance mechanisms weaken [2]. Microfounded consumption theories likewise predict that recessions are more severe when liquidity constraints bind and precautionary motives dominate, which underscores why timely detection of emerging stress is essential for social and fiscal policy [7].
A smaller strand of work looks specifically at risks closer to the consumption margin. Studies of household balance sheets show that high leverage and large debt-service burdens can depress consumption for prolonged periods after adverse shocks, effectively turning balance-sheet indicators into early signals of weak private demand [2]. Other contributions highlight the predictive power of survey-based indicators and search data for consumption dynamics, indicating that information beyond the national accounts can flag turning points in household spending in real time. These findings strengthen the case for an explicit demand-side early-warning framework that combines traditional financial conditions with broader measures of household stress.
Methodologically, data-rich macroeconomics suggests that a wide range of information can help identify early turning points. Diffusion indices and factor-based approaches compress extensive panels (financial and real) into compact signals that lead the national accounts, providing a natural foundation for the design of early warning systems [8,9]. Applying these ideas to private consumption can exploit rapidly evolving financial data alongside more slowly revised consumption statistics. However, the mapping from financial conditions to demand varies over time, with variance rising during periods of stress. Time-varying-parameter models with stochastic volatility account for these features, accommodating structural change and heteroskedasticity while maintaining interpretability [10,11]. A regime-change perspective also helps explain transitions across countries with different speeds of shock transmission [12].
Thus, the TVP-SV-VARX model is well-suited to isolating the dynamic real-financial transmission channels affecting consumption. By letting coefficients and volatilities evolve, it separates persistent behavioral change from transient noise, while the exogenous block absorbs high-frequency financial conditions whose predictive content is inherently time-varying [10,13]. The model’s residual structure provides a rigorous basis for detecting sudden anomalies. Tree ensembles are particularly attractive in practice. Random forests stabilize noisy predictors through bagging and random subspaces, while gradient boosting builds additive trees that capture high-order interactions; both handle mixed-scale variables and heavy-tailed noise with minimal preprocessing [14,15,16]. These properties fit financial data, where nonlinear thresholds and interactions are common.
Nonlinear and high-dimensional learners improve real-time inference on large panels, especially during crises, complementing traditional factor models and reducing specification risk [17,18]. For nowcasting consumption in particular, information beyond the national accounts (search queries and current financial indicators) provides early signals of spending changes [19]. Developing early warning systems for consumption declines also requires evaluation criteria tailored to event rarity and policy costs. For scarce positive events, precision–recall analysis is more meaningful than ROC curves because it conditions on flagged events and appropriately penalizes false positives [20]. Proper scoring rules, such as the Brier score, assess probability calibration, which is essential for authorities to translate signals into transparent, rule-based responses [21].
Policy application also requires clear thresholds and an understanding of the loss function: a well-calibrated signal with a clear operating point can provide lead time without causing signal fatigue. The early warning literature recommends adjusting thresholds for asymmetric losses and verifying stability across timeframes and model classes, often via ensemble confirmation rules [22,23]. These principles guide our operational implementation. In small open economies, changes in external financing and domestic credit can rapidly alter households’ propensity to consume. Analyses of fragile growth emphasize that deteriorating financial conditions compress the low end of the growth distribution, consistent with sharp declines in consumption under stress [1]. Demand-side early-warning systems thus complement bank-focused models. Our approach integrates these insights. The TVP-SV-VARX model extracts the evolving linkages between the real economy and finance and generates economically interpretable residuals; a tree-based learner then classifies anomalies in these residuals as precursors to short-term declines in consumption. This hybrid design aims to combine the advantages of both: structural interpretability facilitates policy communication, while flexible pattern recognition delivers timeliness.
The information set is intentionally broad. Diffusion-index logic and factor-based models suggest that widening coverage (credit, interest rates, spreads, prices, and forward-looking surveys) increases the probability of detecting early signs of weakening demand before they reach consumption [8,9]. In practice, this improves resilience to revisions and data gaps. Historical evidence underscores the importance of credit anomalies: credit-financed booms significantly deepen and prolong subsequent recessions, with consumption among the hardest-hit components [3,4]. Anomaly-aware classifiers trained on real-time features of financial conditions are therefore well-suited to this environment.
We position our contribution within a growing trend toward nonlinear tools for macroprudential and stabilization purposes. Comparative “horse race” studies show that ensembles and modern learners often outperform traditional single-equation benchmarks in crisis forecasting, especially when combined with simple confirmation rules [23]. Our framework applies these findings to the consumption margin.
The evaluation strategy follows established procedures for detecting rare events. Precision–recall curves guide the selection of operating points, while Brier scores and real-time stability tests ensure that probabilities remain well-calibrated across years. These diagnostics, coupled with clear communication of lead times, reinforce policy-relevant performance claims [20,21].
In doing so, we also aim to bridge a gap in practice. While warnings targeting the banking system are increasingly incorporated into policy tools, demand-side warnings remain ad hoc. By anchoring the signals in a “structure plus learning” architecture, the proposed system transforms emerging financial anomalies into actionable probabilities of short-term consumption weakness.
The result is a research agenda with direct policy benefits. If emerging financial anomalies can indeed predict sharp declines in consumption with sufficient lead time, authorities gain an opportunity to deploy targeted social support and precautionary fiscal measures to cushion household spending and support demand. The remainder of this paper formalizes the framework, details the data, and reports out-of-sample performance at policy-relevant operating points. In this way, we situate our contribution within the broader macroeconomic forecasting literature, which offers both structure and flexibility. Advances in high-dimensional methods mean the choice is no longer between economics and machine learning; the task is to combine them under evaluation metrics that reward the right trade-offs for rare and costly events. Our design and results directly address this need.
Against this background, this study develops a quarterly early-warning model for Romania using explainable ensemble learning. First, it combines a time-varying parameter VAR with stochastic volatility and exogenous drivers (TVP-SV-VARX) with tree-based machine-learning classifiers to transform macro-financial anomalies into calibrated probabilities of short-term private consumption declines. Second, it implements a strictly time-based, rare-event evaluation design and reports the trade-offs between discrimination, calibration, and lead time at policy-relevant operating points. Third, it translates these results into a governance-ready WATCH–AMBER–RED framework with explicit confirmation and coverage rules, designed for use in data-poor environments where predictive tools must support, rather than replace, structural judgement.
In Romania, these issues are particularly salient. Private consumption accounts for a large share of GDP, and the economy is strongly integrated into European financial markets, so shifts in external financing conditions and sovereign spreads pass through quickly to domestic credit and interest rates. Episodes of macroeconomic tightening and fiscal slippage have repeatedly exposed the sensitivity of household demand to changing financial conditions and confidence. This combination of high consumption shares, external exposure, and recurrent fiscal pressures makes Romania a natural test case for a demand-side early-warning framework.
3. Materials and Methods
This study employs a two-layer early warning pipeline, combining an economic filter with flexible classifiers, applied to Romanian quarterly macro-financial data. The structural layer (time-varying and volatility-aware) extracts innovations that deviate from the model’s conditional dynamics; the learning layer maps these innovations into the probability of a short-term consumption decline. The empirical sample covers the first quarter of 2009 through the first quarter of 2025 and favors assessment tools appropriate for rare events and probabilistic integrity. All steps, from data collection to reporting, are reproducible in RStudio (Version 2024.12.1+563), using a fixed seed and exportable validation results [20,21].
Research question:
Can emergent anomalies in Romania’s financial conditions predict abrupt downturns in private consumption, offering policymakers the lead time required to stage social and fiscal measures?
Hypotheses:
Hypothesis 1 (H1). Time-varying financial anomalies improve the early detection of consumption downturns relative to static benchmarks.
Hypothesis 2 (H2). A hybrid structural-plus-machine-learning pipeline yields better rare-event discrimination and more reliable probabilistic signalling than either component in isolation.
The empirical study window extends from the first quarter of 2009 to the first quarter of 2025, sampled quarterly. Data series are provided by the National Bank of Romania (BNR), Eurostat, and the European Central Bank (ECB), covering yield curves, credit volumes and spreads, prices, labor market indicators, forward-looking surveys, and national accounts consumption. Data alignment uses end-of-quarter values; level variables (such as credit volumes and consumption) enter the model in log-differences, while index series are rebased to a common reference period for consistency. Modelling, validation, and reporting are performed in RStudio. Given the class imbalance, the evaluation focuses on precision–recall geometry and proper scoring rules rather than accuracy. The structural backbone is a time-varying-parameter VAR with stochastic volatility and exogenous drivers (TVP-SV-VARX). In compact form,

y_t = c_t + \sum_{i=1}^{p} B_{i,t} y_{t-i} + \Gamma_t x_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \Sigma_t),

with drifting coefficients (c_t, B_{i,t}, \Gamma_t) and log-volatilities evolving as random walks. The exogenous block x_t absorbs high-frequency financial conditions whose predictive content is inherently time-varying.
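The full TVP-SV-VARX is estimated in R with stochastic volatility. As a stylized univariate analogue (not the paper’s estimator; all names and settings are illustrative), a random-walk-coefficient Kalman filter shows how drifting parameters generate the one-step-ahead innovations used downstream:

```python
import numpy as np

def tvp_filter(y, X, q=0.01, r=0.01):
    """Kalman filter for y_t = x_t' beta_t + eps_t with random-walk
    coefficients beta_t = beta_{t-1} + eta_t. Returns the filtered
    coefficient paths and the one-step-ahead innovation residuals."""
    T, k = X.shape
    beta, P = np.zeros(k), np.eye(k)
    Q = q * np.eye(k)
    betas, innov = np.zeros((T, k)), np.zeros(T)
    for t in range(T):
        P_pred = P + Q                       # coefficients drift each period
        x = X[t]
        innov[t] = y[t] - x @ beta           # innovation before updating
        S = x @ P_pred @ x + r               # innovation variance
        K = P_pred @ x / S                   # Kalman gain
        beta = beta + K * innov[t]
        P = P_pred - np.outer(K, x) @ P_pred
        betas[t] = beta
    return betas, innov

# toy example: the true coefficient drifts from 1 to 2 over the sample
rng = np.random.default_rng(0)
T = 200
X = rng.normal(size=(T, 1))
beta_true = np.linspace(1.0, 2.0, T)
y = beta_true * X[:, 0] + 0.1 * rng.normal(size=T)
betas, innov = tvp_filter(y, X)
```

The filtered path tracks the drifting coefficient, and the innovation series is exactly the kind of residual that the anomaly layer standardizes and feeds to the classifiers.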
From this system, innovation residuals \hat{\varepsilon}_t = y_t - \hat{y}_{t|t-1} are obtained. These are expanded into a feature set X_t of standardized anomaly measures computed over a rolling window of length w, a set that explicitly targets the “emergent anomalies” in the research question while preserving interpretability through the structural filter. Early warning is posed as supervised classification on pairs (X_t, y_{t+h}), where y_{t+h} \in \{0, 1\} flags downturn episodes (with a stricter downturn_strict variant for robustness). Two complementary learners are employed: Random Forest and Extreme Gradient Boosting (XGBoost). They were selected for their robustness to outliers, non-linear thresholds, and cross-predictor interactions characteristic of financial-conditions data. For policy-usable probabilities, base scores are mapped through a sparse logistic calibration,

p_t = \Lambda(\alpha + \beta \hat{s}_t),

with \hat{s}_t the base-model score and \Lambda(z) = 1 / (1 + e^{-z}) the logistic link.
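The calibration step is a plain L1-penalised logit fitted on the base score. A minimal Python sketch (the study’s pipeline is in R; `scikit-learn` stands in here, and the toy scores are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibrate(scores_train, y_train, scores_new):
    """Fit p = Lambda(alpha + beta * s) with an L1 penalty and map
    new base-model scores into calibrated probabilities."""
    logit = LogisticRegression(penalty="l1", solver="liblinear")
    logit.fit(scores_train.reshape(-1, 1), y_train)
    return logit.predict_proba(scores_new.reshape(-1, 1))[:, 1]

# toy scores: downturn quarters received visibly higher base scores
s = np.array([0.10, 0.15, 0.20, 0.25, 0.80, 0.85, 0.90, 0.95])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p = calibrate(s, y, np.array([0.2, 0.9]))
```

The logistic link guarantees monotonicity: a higher base score can never receive a lower calibrated probability, which keeps the ranking of alerts intact while repairing their probabilistic scale.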
Temporal validation follows a rolling-origin, blocked design with purge and embargo between training and test blocks to prevent look-ahead and adjacency leakage. Hyperparameters (e.g., the anomaly-window length w and residual lag order p) are selected by out-of-sample PR-AUC, with ROC-AUC as a secondary criterion. In practice, w and p are restricted to moderate values to balance the need to capture cyclical dynamics against the limited sample size. We explore w in {20, 28, 32}, corresponding roughly to five to eight years of quarterly data, and p in {1, 2, 3}, which is sufficient to absorb short-run dynamics without exhausting degrees of freedom. The specifications reported in Table 1 are those that consistently deliver the highest out-of-sample PR-AUC within this constrained hyperparameter space.
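The split logic can be sketched as follows (Python; the `purge` and `embargo` lengths in quarters are illustrative defaults, not the paper’s exact settings):

```python
def rolling_origin_splits(n_obs, train_len, test_len, purge=1, embargo=1):
    """Blocked rolling-origin splits for time-ordered data.
    A `purge` gap separates each training block from its test block,
    and an `embargo` gap is skipped before the next origin, so that
    labels defined over future quarters cannot leak across folds."""
    splits, origin = [], 0
    while origin + train_len + purge + test_len <= n_obs:
        train = range(origin, origin + train_len)
        test_start = origin + train_len + purge
        test = range(test_start, test_start + test_len)
        splits.append((list(train), list(test)))
        origin += test_len + embargo          # roll the origin forward
    return splits

# 65 quarters (2009 Q1-2025 Q1), 28-quarter training blocks, 4-quarter tests
splits = rolling_origin_splits(n_obs=65, train_len=28, test_len=4)
```

Because every test index lies strictly after the purged training block, hyperparameter selection by PR-AUC on these folds cannot exploit future information.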
Performance is summarised by the area under the precision–recall curve (PR-AUC) and the Brier score,

BS = \frac{1}{T} \sum_{t=1}^{T} (p_t - y_t)^2,

complemented by confusion matrices at operating thresholds and a lead-time diagnostic defined as the median gap (in quarters) between first signal and label onset. Operating points for governance use a three-tier scheme. Let p_t^{RF} and p_t^{XGB} denote the calibrated scores from Random Forest and XGBoost. A RED alert is defined as

p_t^{RF} \geq 0.83 \quad \text{or} \quad (p_t^{RF} \geq 0.70 \text{ and } p_t^{XGB} \geq 0.70),

with AMBER and WATCH at lower thresholds, and a coverage gate requiring sufficient anomaly coverage over the active window. Such policy-aware thresholds and ensemble confirmation reflect asymmetric losses and help retain lead time without inflating false alarms [22,23].
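The tier triggers and coverage gate reduce to a few comparisons (Python sketch; the thresholds are those reported for Romania, while the function and argument names are ours):

```python
def alert_tier(p_rf, p_xgb, p_xgb_recall, coverage,
               red=0.83, joint=0.70, amber=0.70, watch=0.56, gate=0.90):
    """Map calibrated ensemble scores into WATCH/AMBER/RED tiers.
    Signals are suppressed unless anomaly-feature coverage over the
    active window meets the gate, guarding against thin inputs."""
    if coverage < gate:
        return "NONE"                       # coverage gate fails: no signal
    if p_rf >= red or (p_rf >= joint and p_xgb >= joint):
        return "RED"                        # pre-authorised measures
    if p_xgb >= amber:
        return "AMBER"                      # scenario preparation
    if p_xgb_recall >= watch:
        return "WATCH"                      # internal vigilance
    return "NONE"

# a late-2022-style reading: RF just under the RED line, no XGB confirmation
tier = alert_tier(p_rf=0.82, p_xgb=0.50, p_xgb_recall=0.60, coverage=0.95)
```

With these inputs the rule returns WATCH: the Random Forest score alone, however close to the RED line, cannot escalate without cross-model confirmation.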
To verify the incremental information in anomalies, an ablation is run in which the classifier is trained on features without the time-varying anomaly layer; the degradation in precision–recall performance relative to the TVP variant confirms H1, consistent with gains from parameter and volatility drift in macro-financial settings [10,34]. Label robustness is established by re-estimating on the downturn_strict label, with stable model rankings supporting H2’s ensemble premise. The framework is predictive, not causal: the TVP-SV-VARX disciplines the feature space, while the tree-based learners supply the non-linear separators necessary for rare-event discrimination, all implemented end-to-end in RStudio [14,16,20].
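For reference, the headline diagnostics (PR-AUC, Brier score, lead time) can be computed as below (Python with `scikit-learn`; the toy series and the look-back window are illustrative only):

```python
import numpy as np
from sklearn.metrics import average_precision_score, brier_score_loss

def median_lead_time(signal, label, max_look=4):
    """Median gap (in quarters) between the first warning inside a
    look-back window and each onset of a downturn-label episode."""
    onsets = [t for t in range(len(label))
              if label[t] == 1 and (t == 0 or label[t - 1] == 0)]
    gaps = []
    for t0 in onsets:
        fired = [t for t in range(max(0, t0 - max_look), t0 + 1) if signal[t]]
        if fired:
            gaps.append(t0 - fired[0])     # distance from earliest warning
    return float(np.median(gaps)) if gaps else float("nan")

# toy series: calibrated probabilities p, realised labels y, warnings s
p = np.array([0.10, 0.60, 0.90, 0.80, 0.10, 0.20])
y = np.array([0, 0, 1, 1, 0, 0])
s = (p >= 0.56).astype(int)
pr_auc = average_precision_score(y, p)     # primary, rare-event geometry
brier = brier_score_loss(y, p)             # probability calibration
lead = median_lead_time(s, y)
```

In this toy case the warning at t = 1 precedes the downturn onset at t = 2, giving a one-quarter lead of the kind the paper reports.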
4. Results
Between the first quarter of 2009 and the first quarter of 2025, Romania’s macro-financial landscape encompassed periods of expansion, stress, and regime change—conditions that can alter the transmission of financial conditions to private spending. Against this backdrop, this study tested a two-layer early warning system: a TVP-SV-VARX filter that provides time-varying, volatility-aware anomalies, and a machine-learning layer (random forests, XGBoost, and sparse L1 logistic regression for calibration) that converts these anomalies into the probability of a short-term consumption decline. Performance was evaluated using PR-AUC (primary), ROC-AUC (secondary), the Brier score, confusion matrices at pre-set thresholds, and lead-time diagnostics.
Table 1 shows comparative results across specifications. TVP Random Forest (w = 28, p = 3) was best in class (PR-AUC = 0.869; ROC-AUC = 0.889) and, crucially for deployment, sustained a tight operating threshold of 0.83, with precision = 1.00, recall = 0.83, F1 = 0.91, TP = 5, FP = 0, FN = 1, and a median lead time of one quarter. TVP XGBoost (w = 20, p = 2) ranked second (PR-AUC = 0.752; ROC-AUC = 0.851), with precision = 1.00, recall = 0.67, and F1 = 0.70. TVP L1 Logit (w = 28, p = 1) achieved the lowest Brier score (0.055) while maintaining a ROC-AUC of 0.883, supporting its use as a calibration layer. The ablation without TVP anomalies (no_tvp XGB) dropped to PR-AUC = 0.505 and recall = 0.33, confirming the value of the TVP layer.
Figure 1 visualizes this ranking. The Random Forest bars dominate on PR-AUC while remaining strong on ROC-AUC; XGBoost (PR) follows closely; L1 Logit nearly matches on ROC-AUC but is optimized for calibration by design; and the no_tvp XGB bars show a markedly lower PR-AUC. This contrast demonstrates that modeling time variation and volatility at the structural level matters most precisely where rare-event detection is crucial.
Table 2 outlines the governance logic. RED is triggered when RF ≥ 0.83, or when RF ≥ 0.70 and XGB ≥ 0.70 jointly hold; AMBER is triggered when XGB ≥ 0.70; and WATCH uses the recall-oriented XGB on the strict label with a threshold of 0.56. A coverage gate (≥0.90) ensures that alerts are issued only when anomalous features are sufficiently present. This grading translates econometric improvements into graded actions (internal vigilance, scenario preparation, pre-authorized action).
Table 2.
Operating thresholds and ensemble confirmation. Early-warning tiers (Romania, 2009 Q1–2025 Q1).
| Tier | Spec (Variant\|Model\|w,p\|Label) | Threshold | Trigger (Compact) | Intended Use |
|---|---|---|---|---|
| WATCH | tvp\|xgb (recall)\|20,2\|downturn_strict | 0.56 | XGB_recall ≥ 0.56 | Internal vigilance, intensified monitoring |
| AMBER | tvp\|xgb (PR)\|20,2\|downturn | 0.70 | XGB ≥ 0.70 | Scenario prep, stakeholder pre-coordination |
| RED | tvp\|rf\|28,3\|downturn | 0.83 | RF ≥ 0.83 or (RF ≥ 0.70 and XGB ≥ 0.70) | Pre-authorised measures; external signalling |
| Gate | (data quality) | ≥0.90 coverage | Emit any signal only if anomaly-feature coverage ≥ 0.90 | Safeguard against thin inputs |
Figure 2.
Lead Time Overview.
Table 3 documents the error profile. The RED tier (RF threshold 0.83) achieves FP = 0, consistent with costly interventions. AMBER (XGB threshold 0.70) maintains precision = 1.00 with a lower recall of 0.67, suitable for mobilization without public signalling. WATCH (XGB recall threshold 0.56) improves sensitivity (recall = 0.80) at the cost of additional false positives, making it appropriate for enhanced surveillance.
Figure 3 shows observed versus predicted frequencies by probability bin. The L1 logit model’s low Brier score (0.055) and near-diagonal reliability curve support its use for probability communication, while the Random Forest remains the basis for decision-making. This division of roles improves transparency, providing stakeholders with calibrated numbers alongside clear operational warnings.
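A reliability check of this kind amounts to comparing, within probability bins, the mean predicted probability with the observed downturn frequency (Python sketch; the data are illustrative, and `sklearn.calibration.calibration_curve` does the binning):

```python
import numpy as np
from sklearn.calibration import calibration_curve

# illustrative calibrated probabilities and realised outcomes
p = np.array([0.05, 0.10, 0.20, 0.15, 0.80, 0.90, 0.85, 0.95])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# frac_pos: observed frequency per bin; mean_pred: average forecast per bin
frac_pos, mean_pred = calibration_curve(y, p, n_bins=2)
gap = float(np.max(np.abs(frac_pos - mean_pred)))  # deviation from diagonal
```

A near-diagonal reliability curve corresponds to a small `gap`; large deviations in any bin indicate probabilities that should not be quoted to decision-makers without recalibration.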
Table 4 and Figure 2 confirm operational readiness. The median lead time across models is one quarter, with a tight interquartile range (IQR); average anomaly-feature coverage is 0.95, above the 0.90 gate. For policy applications, this means the system typically provides a one-quarter margin and issues warnings only when financial anomaly signals are broad-based rather than driven by a single indicator.
Table 5 (covering the past 12 quarters) relates these statistics to recent experience. The Random Forest recorded near-miss probabilities (approximately 0.818–0.823) from Q2 to Q4 2022, slightly below the 0.83 RED threshold, and declined from Q1 2023 onward; the RED tier was therefore never triggered. The PR-selected XGB peaked in Q1 2023 at approximately 0.503 (below 0.70). The recall-selected XGB produced WATCH-level peaks in Q3–Q4 2022, consistent with elevated but unconfirmed pressure. This pattern suggests that the layered system raises alertness without inducing signal fatigue, and that in recent years it would have implied heightened internal vigilance but no activation of costly RED-level measures.
Figure 4 overlays the RF, XGB (PR), and XGB (Recall) probabilities on the 0.56/0.70/0.83 lines. The figure explains the absence of escalation after 2022: RF never exceeds 0.83, and the joint condition RF ≥ 0.70 and XGB ≥ 0.70 is never observed. The tiering thus preserves the one-quarter lead while filtering out isolated peaks, a design that reflects asymmetric policy losses.
The TVP-SV-VARX layer underpins these results. By allowing drift in coefficients and volatility, it filters innovations into standardized anomalies that reflect the regime-dependent transmission from financial conditions to consumption. The no_tvp deficits in Table 1 and Figure 1 are therefore diagnostic, not incidental: structural time variation is the econometric mechanism that turns unstable relationships into stable early warning signals. In practice, the system responds both to the level of stress and to how that stress transmits to household spending, precisely the combination a demand-side policy tool requires.
5. Discussion
The results demonstrate that combining anomaly-driven, time-varying macro-financial filters with flexible classifiers can provide reliable early warning of declines in private consumption in Romania. The TVP-SV-VARX layer isolates economically significant shocks in macro-financial transmission before they show up as weaker household spending. A Random Forest with a strict threshold then achieves a median lead of one quarter in backtesting with no false positives, while the complementary XGBoost signal provides earlier but more tentative warnings. Overall, the results support a viable policy workflow in which risk assessments escalate from internal monitoring to scenario preparation and, only when warranted, to public policy measures [22,23].
The research question, whether anomalies in financial conditions can predict sharp declines in consumption with useful lead times, is answered in the affirmative. The TVP layer extracts the time-varying transmission of interest rates, spreads, and credit conditions to household demand; the machine-learning layer then applies nonlinear thresholds that static linear models would miss. The finding that warnings can be issued with high sensitivity and without false alarms at a one-quarter lead directly expands the room policymakers have to design temporary transfers or liquidity support. This is consistent with the literature emphasizing that, for rare events, early warning tools must prioritize timeliness and precision over raw accuracy [20,21].
The first hypothesis (H1), that time-varying financial anomalies improve early detection relative to a static benchmark, is supported by the ablation exercise. Removing the anomaly layer (no_tvp) leads to significant drops in PR-AUC and recall, validating H1. From an econometric perspective, shifts in coefficients and volatility capture the regime-dependent propagation of financial stress to real activity [10,11]. This implies that the system responds not only to the degree of financial constraint but also to the way policy and economic developments shape household behaviour [34]. The second hypothesis (H2), that the hybrid structural-plus-machine-learning pipeline outperforms its individual components, is also confirmed. The tree-based learner captures interactions and thresholds in the anomaly space that are difficult to specify in advance, while the structural filter normalises the feature set and reduces spurious patterns. This division of labour aligns with the proven strength of tree ensembles in handling noisy, interacting predictors [14,16]. In practice, Random Forest acts as the primary decision engine, providing the binary WATCH–AMBER–RED escalations at fixed thresholds, whereas the sparse logistic regression model serves as a separate calibration layer that maps base scores into well-behaved probabilities for communication and governance. This explicit separation between the discrete decision layer and the probabilistic calibration layer meets operational and accountability requirements [21].
The calibration results are crucial for implementation. The reliability plots show that predicted probabilities track observed frequencies across bins, which is essential if government agencies are to base pre-authorized actions on probability intervals rather than ad hoc judgments. Passing scores through the sparse logistic calibration layer reduces probabilistic error while maintaining high discriminatory power, in line with the view that proper scoring and calibration are prerequisites for risk communication in a policy setting [21]. Transparent dashboards can thus complement the WATCH/AMBER/RED tiers with realistic probabilities in briefing documents.
Recent behavior demonstrates the governance value of tiering. Lacking confirmation from the complementary model, the near-RED readings at the end of 2022 were not escalated, preventing signal fatigue. At the same time, the sensitivity-oriented WATCH tier flagged elevated but unconfirmed risk, legitimizing increased monitoring without triggering costly interventions. This graded escalation is precisely what the early warning literature recommends where losses are asymmetric and the political costs of false alarms are high [22,23].
Figure 4 summarises the escalation sequence and the cross-model confirmation logic across the WATCH–AMBER–RED tiers.
Limitations and next steps are clear. First, the framework is predictive, not causal; it should inform structural judgment, not replace it. For policy use, this implies that model signals are best treated as inputs into scenario analysis, targeted monitoring, and the pre-positioning of temporary measures rather than as mechanical triggers for intervention. Authorities should combine the model’s probabilities with institutional knowledge and stress-testing before committing to costly actions. Second, subject to strict coverage requirements, expanding the exogenous block to include high-frequency sentiment or credit-card issuance proxies could improve short-term sensitivity. Third, regular re-estimation and monitoring of the TVP layer can prevent parameter staleness as regimes evolve [10,34]. Within these limits, the evidence suggests that emerging financial anomalies, filtered through the TVP-SV-VARX and classified by a robust learner, can provide reliable, rule-based early warning of private consumption declines, confirming the research question and hypotheses.
6. Conclusions
The findings support the development of an operational early warning system that translates macro-financial anomalies into timely guidance on private consumption risks. A median warning time of one quarter effectively enables proactive policymaking, allowing decisions to shift from reacting to a recession to preparing for it.
In budgetary practice, the warning tiers map naturally onto degrees of preparedness. The WATCH state calls for enhanced monitoring and simple diagnostics: checking data integrity, updating nowcasts, and testing transfer mechanisms. The AMBER state supports scenario building without public signalling: targeted beneficiary selection, legal review of temporary measures, and identification of administrative bottlenecks. The RED state supports activation of time-limited support—modest, reversible instruments such as temporary transfers to vulnerable households, short-term utility-arrears relief, or a narrow VAT deferral that is phased out as pressures ease.
Although the empirical implementation is specific to Romania, the architecture is generic. Other small open economies with similar macro-financial constraints can adopt the same two-layer design, provided that local variables, sample periods, and thresholds are re-estimated. In such settings, the TVP-SV-VARX layer would again extract country-specific macro-financial anomalies, while the ensemble classifier and calibration layer would be re-tuned to local rare-event patterns, preserving the balance between flexibility and interpretability.
Beyond the core framework, similar tiered logic can be applied to supervisory, labour-market, and social policy tools. WATCH-level signals justify enhanced monitoring of household credit and targeted analytical work, while AMBER-level signals support preparatory steps such as eligibility reviews, data-sharing arrangements, and the design of temporary support measures. RED-level signals can then trigger pre-authorised, time-limited interventions, such as targeted transfers or employer-retention schemes, subject to clear exit criteria. Throughout, calibrated probabilities remain primarily an internal input, while external communication focuses on risk levels and readiness rather than model outputs, limiting the risk of amplifying volatility. Governance arrangements—such as a cross-ministerial committee that sets thresholds, reviews warnings, and periodically stress-tests alternative rules—are essential for ensuring that the system remains evidence-based and accountable. Over time, linking activation levels to observed welfare and demand outcomes can inform recalibration and help assess whether early warnings are translated into effective, proportionate policy responses.