Figure 1.
Overall architecture of the proposed CMWSL stock-index volatility forecasting and risk-warning framework.
Figure 2.
Rolling walk-forward evaluation protocol. The blue bar denotes the full daily market sample, the brown bar marks the design and model-freeze period, and the gold bar marks the rolling out-of-sample evaluation period. Each step advances the forecast origin by one trading day. The training window (1260 days, shaded) rolls forward to generate the out-of-sample forecast sequence spanning 2016–2025. All feature construction, model fitting, warning-threshold calibration, and evaluation are performed strictly within the boundary of each training window, ensuring full causal integrity.
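The rolling protocol in this caption can be sketched as an index generator. This is a minimal illustration, not the authors' code: the function name `walk_forward_windows` is hypothetical, and the convention that the last training row sits `horizon` days before the forecast origin (so that its target is already observed) is an assumption about how leakage is avoided.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_obs: int, train_len: int = 1260, horizon: int = 1
) -> Iterator[Tuple[range, int]]:
    """Yield (training rows, forecast origin) pairs, advancing one day per step.

    Assumption: the target of row s covers days s+1..s+horizon, so at origin t
    only rows s <= t - horizon have a fully observed target.
    """
    first_origin = train_len + horizon - 1
    for origin in range(first_origin, n_obs):
        last_train = origin - horizon
        yield range(last_train - train_len + 1, last_train + 1), origin


if __name__ == "__main__":
    for rows, origin in walk_forward_windows(n_obs=1263):
        print(f"train rows {rows.start}..{rows.stop - 1} -> forecast origin {origin}")
```

Each yielded window always contains exactly `train_len` rows, and every training row precedes the origin by at least `horizon` days.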
Figure 3.
Quantile-conditioned ΔQLIKE surfaces. Panel (a) reports Wavelet-LightGBM minus HAR averaged across all indices in the full 2016–2025 package. Panel (b) reports Spillover-LGB minus Wavelet-LGB for the pooled DJIA and S&P 500 spillover package. Negative values indicate that the more advanced model improves accuracy.
Figure 4.
Time-varying gain paths based on 126-day rolling average ΔQLIKE. Panel (a) reports Wavelet-LGB minus HAR averaged across all indices in the full 2016–2025 package. Panel (b) reports Spillover-LGB minus Wavelet-LGB for the pooled DJIA and S&P 500 spillover package. Negative values indicate that the more advanced model improves accuracy. Shaded regions denote pooled top-decile VIX stress windows.
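A gain path of this kind can be computed as follows. The sketch assumes the normalized QLIKE form, ratio minus log ratio minus one, which is one common convention; the captions do not state which variant the paper uses, and the function names are hypothetical.

```python
import math
from collections import deque


def qlike(actual_var: float, forecast_var: float) -> float:
    """Normalized QLIKE loss (assumed form); zero for a perfect forecast."""
    ratio = actual_var / forecast_var
    return ratio - math.log(ratio) - 1.0


def rolling_delta_qlike(actual, advanced, baseline, window=126):
    """Rolling mean of QLIKE(advanced) - QLIKE(baseline).

    Returns None until a full window is available. Negative values mean the
    advanced model has the lower loss over the trailing window.
    """
    path, buf, total = [], deque(maxlen=window), 0.0
    for a, f_adv, f_base in zip(actual, advanced, baseline):
        delta = qlike(a, f_adv) - qlike(a, f_base)
        if len(buf) == window:
            total -= buf[0]  # drop the value about to leave the window
        buf.append(delta)
        total += delta
        path.append(total / window if len(buf) == window else None)
    return path
```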
Figure 5.
QLIKE across indices and forecast horizons in the main Rogers–Satchell specification. Lower values indicate better performance.
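For reference, the standard Rogers–Satchell daily variance estimator uses only open, high, low, and close prices. The sketch below shows that estimator and a simple average over a block of days; how the paper aggregates daily values into its h-day target is not given in the captions, so the averaging helper is an assumption.

```python
import math


def rogers_satchell_variance(o: float, h: float, l: float, c: float) -> float:
    """Daily Rogers-Satchell variance: ln(H/C)ln(H/O) + ln(L/C)ln(L/O)."""
    return math.log(h / c) * math.log(h / o) + math.log(l / c) * math.log(l / o)


def average_rs_volatility(bars) -> float:
    """Volatility over a block of (open, high, low, close) bars.

    Assumption: daily variances are averaged, then square-rooted.
    """
    mean_var = sum(rogers_satchell_variance(*bar) for bar in bars) / len(bars)
    return math.sqrt(max(mean_var, 0.0))
```

The estimator is drift-independent: a bar that opens at its low and closes at its high contributes zero variance.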
Figure 6.
Winner-and-margin map for the main Rogers–Satchell specification. Each tile reports the winner and its QLIKE margin over the runner-up. Only three model colors appear in the tiles because HV-22 does not win any index–horizon cell in this specification.
Figure 7.
Average QLIKE across the main and robustness settings. Lower values indicate better performance.
Figure 8.
Cell-level QLIKE differences relative to HAR. Negative values indicate that the nonlinear model outperforms HAR.
Figure 9.
Precision–recall comparison for the warning task using the S&P 500 at a single forecast horizon. The black dotted horizontal line marks the no-skill baseline implied by the event prevalence.
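The no-skill line in a precision–recall plot equals the share of positive (high-risk) days, because a classifier that flags days at random achieves that precision at every recall level. A minimal sketch with hypothetical function names:

```python
def precision_recall_points(labels, scores):
    """Return (recall, precision) pairs as the threshold is lowered.

    labels are 0/1 event indicators; tied scores share one threshold.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    points, tp, fp = [], 0, 0
    for i, (score, label) in enumerate(pairs):
        tp += label
        fp += 1 - label
        # Emit a point only once all observations at this score are counted.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((tp / n_pos, tp / (tp + fp)))
    return points


def no_skill_precision(labels):
    """Event prevalence: the horizontal baseline of the PR plot."""
    return sum(labels) / len(labels)
```

At the lowest threshold every day is flagged, so the final point of the curve always sits on the no-skill line at recall 1.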
Figure 10.
Event-aligned warning diagnostics for the S&P 500 during the 2020 stress period.
Figure 11.
Statistical evidence heatmap for each ablation step. (Left): Diebold–Mariano two-tailed test for Step 1 (HAR vs. Wavelet-LGB). (Right): Clark–West one-tailed test for Step 2 (Wavelet-LGB vs. Spillover-LGB). Teal tones indicate that the advanced model wins at the stated significance level; navy tones indicate that the simpler model wins. Colour intensity scales with the strength of the statistical evidence. Significance is flagged with the codes ***, **, *, and a single dot.
Figure 12.
QLIKE gain ladder per index. Each panel shows the incremental QLIKE gain (positive = improvement over baseline) for Step 1 (amber bars when DM-significant, light grey otherwise) and Step 2 (teal bars when CW-significant, light grey otherwise). Significance is flagged with the codes ***, **, *, and a single dot.
Figure 13.
Horizon-dependent SHAP feature attribution (top-12 mean absolute values) for wavelet_lightgbm (S&P 500). Navy bars denote wavelet features; grey bars denote non-wavelet features. The shift from panel (a) to panel (b) indicates that medium-scale frequency-domain information adds value selectively as the forecast horizon lengthens, consistent with the quantile-conditioned and rolling-window evidence in Figure 3 and Figure 4. (a) Shorter horizon: persistence and short-run signals dominate; wavelet features appear further down the ranking. (b) Longer horizon: wavelet energy at the d2/d3 scales rises sharply in the ranking.
Figure 14.
SHAP beeswarm for wavelet_lightgbm (S&P 500). Each point is one observation; colour encodes the feature value. SWT energy features at the d2/d3 decomposition levels rank among the most influential predictors, and high energy values consistently produce positive SHAP contributions, indicating that medium-scale frequency-domain volatility is associated with upward forecast revisions.
Table 1.
Positioning of the present study relative to representative prior works. ✓ = present; ∘ = partial; — = absent. Within-: within-index wavelet decomposition; Cross-: cross-index wavelet spillover features; Warning: integrated early-warning layer; Public: all data from public sources; Walk-fwd: rolling walk-forward evaluation; DL base.: deep-learning model included as baseline.
Table 2.
Public data blocks used in the baseline and robustness specifications. The table is organized by predictor block, source, and empirical role in the forecasting pipeline.
| Block | Variables/Symbols | Source | Main Role |
|---|---|---|---|
| Equity indices | S&P 500, Nasdaq-100, DJIA | Stooq | Daily OHLC prices and volatility targets |
| Liquidity proxies | SPY, QQQ, DIA | Stooq | ETF-based volume and trading-activity proxies |
| Implied volatility | VIXCLS, VXNCLS, VXDCLS | FRED (CBOE series) | Index-specific forward-looking risk proxies |
| Rates and conditions | DFF, DGS10, T10Y3M, NFCI, USRECD | FRED | Macro-financial and regime controls |
| Expanded public risk data | DGS2, BAMLH0A0HYM2, BAMLC0A0CM, USEPUINDXD, RVXCLS, VXVCLS | FRED | Credit, policy-uncertainty, and volatility-term-structure robustness block |
Table 3.
Core notation used in the methodological formulation.
| Symbol | Category | Meaning |
|---|---|---|
| i | Cross-sectional index | Market index identifier (S&P 500, Nasdaq-100, or DJIA) |
| t | Time index | Forecast origin on trading day t |
| h | Forecast horizon | Prediction horizon in trading days (1, 5, or 10) |
| | Forecast target | Future average volatility over the next h trading days |
| | Transformed target | Log-transformed target used for model fitting |
| | Feature block | Daily, weekly, and monthly persistence features |
| | Feature block | Technical, implied-volatility, and macro-financial predictors |
| | Feature block | Causal wavelet summaries from volatility and risk series |
| | Calibrated forecast | Post-floor volatility prediction used for evaluation and warning linkage |
| | Warning threshold | Rolling in-window 90th percentile for defining high-risk states |
| | Warning score | Predicted high-risk probability from the logistic warning layer |
Table 4.
Leakage-free pseudo-code of the rolling multiscale forecasting and warning procedure.
| Step | Operation |
|---|---|
| 1 | For each index i and horizon h, define a rolling training window of 1260 trading days and a one-step-ahead out-of-sample forecast origin. |
| 2 | Construct the OHLC-based volatility target, persistence terms, technical indicators, implied-volatility proxies, and macro-financial predictors using only data available up to time t. |
| 3 | Apply the causal non-decimated wavelet transform to selected high-value series and compute multiscale summaries such as recent coefficients, short-window means, standard deviations, and local energy terms. |
| 4 | Form the feature vector and estimate the horizon-specific boosting model on the transformed training target. |
| 5 | Generate the raw forecast and apply the horizon-specific lower-tail calibration safeguard to obtain the final regression forecast. |
| 6 | Compute the rolling high-risk threshold, estimate the logistic warning model on the training window, and choose the decision threshold that maximizes the training-window score. |
| 7 | Output the calibrated volatility forecast, the warning probability, and the warning label for the current out-of-sample date, then roll the window forward and repeat. |
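
The control flow of Steps 1 and 5 to 7 can be sketched in a few lines. This is an illustration under loud assumptions, not the published pipeline: `fit_placeholder_model` is a hypothetical stand-in for the boosting model, the lower-tail safeguard is assumed to be a floor at a low in-window percentile (`floor_q`), and the logistic warning layer is replaced by a plain comparison against the in-window 90th percentile.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_placeholder_model(train_log_vol):
    """Hypothetical stand-in: mean of the last 22 log-volatilities."""
    recent = train_log_vol[-22:]
    return sum(recent) / len(recent)


def rolling_forecast_and_warning(vol, train_len=1260, floor_q=1.0, risk_q=90.0):
    """Return (origin, forecast, warning flag) for each out-of-sample day."""
    results = []
    for origin in range(train_len, len(vol)):
        window = vol[origin - train_len:origin]  # data strictly before origin
        log_forecast = fit_placeholder_model([math.log(v) for v in window])
        raw = math.exp(log_forecast)             # undo the log transform
        floor = percentile(window, floor_q)      # assumed lower-tail safeguard
        forecast = max(raw, floor)
        threshold = percentile(window, risk_q)   # in-window high-risk level
        results.append((origin, forecast, forecast > threshold))
    return results
```

Everything used at a given origin is computed from the preceding `train_len` observations only, which is the property the table describes as leakage-free.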
Table 5.
Main regression QLIKE across indices and forecast horizons under the Rogers–Satchell target. Best, second-best, and third-best values within each index–horizon cell are marked by gold shading with bold type, peach shading with underline, and blue-gray shading, respectively.
| Index | Model | h = 1 | h = 5 | h = 10 |
|---|---|---|---|---|
| S&P 500 | HAR | −9.0316 | −8.9372 | −8.8682 |
| | HV-22 | −8.9405 | −8.8601 | −8.7836 |
| | LightGBM | −8.7386 | −8.9022 | −8.8749 |
| | Wavelet-LightGBM | −8.7327 | −8.8937 | −8.8603 |
| Nasdaq-100 | HAR | −8.5635 | −8.4940 | −8.4316 |
| | HV-22 | −8.5049 | −8.4472 | −8.3873 |
| | LightGBM | −8.4476 | −8.5125 | −8.4621 |
| | Wavelet-LightGBM | −8.3827 | −8.5026 | −8.4923 |
| DJIA | HAR | −9.0021 | −8.9139 | −8.8548 |
| | HV-22 | −8.9398 | −8.8619 | −8.7869 |
| | LightGBM | −8.5186 | −8.8111 | −8.8636 |
| | Wavelet-LightGBM | −8.2222 | −8.8649 | −8.8934 |
Table 6.
Average QLIKE across the main and robustness settings. Ranking is applied within each row: gold shading with bold type marks the best model, peach shading with underline marks the second-best model, and blue-gray shading marks the third-best model.
| Setting | HAR | LightGBM | Wavelet-LightGBM |
|---|---|---|---|
| Main RS | −8.7885 | −8.6812 | −8.6494 |
| Expanded Data | −9.5949 | −9.5641 | −9.5704 |
| Parkinson | −9.3019 | −9.3091 | −9.3143 |
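
The Parkinson row refers to the range-based estimator that uses only the daily high and low. Its standard daily form is shown below for reference; how the paper aggregates it over the forecast horizon is not stated in the captions.

```python
import math


def parkinson_variance(high: float, low: float) -> float:
    """Daily Parkinson variance: ln(H/L)^2 / (4 ln 2)."""
    return math.log(high / low) ** 2 / (4.0 * math.log(2.0))
```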
Table 7.
Main warning results under the Rogers–Satchell target. PR-AUC is maximized and Brier score is minimized within each index–horizon cell. Gold shading with bold type marks the best value, and peach shading with underline marks the second-best value.
| Index | Model | PR(1) | Brier(1) | PR(5) | Brier(5) | PR(10) | Brier(10) |
|---|---|---|---|---|---|---|---|
| S&P 500 | Naive Threshold | 0.4895 | 0.0948 | 0.5041 | 0.0985 | 0.4474 | 0.1041 |
| | Logistic-Raw | 0.5275 | 0.1327 | 0.5048 | 0.1271 | 0.5530 | 0.1143 |
| Nasdaq-100 | Naive Threshold | 0.4652 | 0.1072 | 0.5141 | 0.1090 | 0.4496 | 0.1148 |
| | Logistic-Raw | 0.5496 | 0.1418 | 0.5960 | 0.1258 | 0.4852 | 0.1399 |
| DJIA | Naive Threshold | 0.4733 | 0.0954 | 0.4882 | 0.0972 | 0.4712 | 0.0998 |
| | Logistic-Raw | 0.4900 | 0.1402 | 0.5782 | 0.1069 | 0.4672 | 0.1112 |
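
The Brier score is the mean squared difference between the predicted probability and the 0/1 outcome. A useful reference point, sketched below with hypothetical helper names, is the score of a constant forecast equal to the event prevalence, which works out to prevalence times one minus prevalence.

```python
def brier_score(labels, probs):
    """Mean squared error of probability forecasts against 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def climatology_brier(labels):
    """Brier score of always predicting the sample event prevalence."""
    base = sum(labels) / len(labels)
    return base * (1.0 - base)
```

For a 10% event rate the constant-prevalence forecast scores 0.09, so Brier values should be read against that level rather than against zero.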
Table 8.
Event-based warning timing summary using a five-day pre-event detection window. Hit rate and median lead are maximized; false-alarm days per event are minimized within each horizon. Gold shading with bold type marks the best value within each horizon and metric.
| Horizon | Model | Hit Rate | Median Lead | FA/Event | Events |
|---|---|---|---|---|---|
| h = 1 | Logistic-Raw | 0.821 | 5.000 | 5.149 | 460 |
| | Naive-Threshold | 0.613 | 4.000 | 1.005 | 460 |
| h = 5 | Logistic-Raw | 0.597 | 5.000 | 16.558 | 134 |
| | Naive-Threshold | 0.649 | 3.833 | 3.641 | 134 |
| h = 10 | Logistic-Raw | 0.602 | 5.000 | 37.330 | 81 |
| | Naive-Threshold | 0.556 | 4.333 | 6.463 | 81 |
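
One way to compute event-based timing metrics of this kind is sketched below. The exact definitions behind the table are not given in the caption, so the following are assumptions: an event counts as hit if any warning fires in the `window` days before it, the lead is measured from the earliest such warning, and a false alarm is any warning day that falls in no pre-event window.

```python
from statistics import median


def event_timing_summary(warnings, event_days, window=5):
    """Hit rate, median lead, and false alarms per event (assumed definitions)."""
    warn_days = [t for t, flag in enumerate(warnings) if flag]
    credited, leads = set(), []
    for event in event_days:
        early = [t for t in warn_days if event - window <= t < event]
        credited.update(early)
        if early:
            leads.append(event - min(early))  # days of advance notice
    false_alarms = [t for t in warn_days if t not in credited]
    n_events = len(event_days)
    return {
        "hit_rate": len(leads) / n_events,
        "median_lead": median(leads) if leads else None,
        "fa_per_event": len(false_alarms) / n_events,
    }
```

Under these definitions the lead is capped at `window` days, and a model that warns often can raise its hit rate at the cost of more false alarms per event.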
Table 9.
Three-step ablation test results. Columns report mean QLIKE for each model, the Diebold–Mariano statistic and p-value for Step 1 (HAR vs. Wavelet-LGB) and Step 2 (Wavelet-LGB vs. Spillover-LGB), and the Clark–West statistic and p-value for the nested Step-2 comparison. DM tests are two-tailed with Newey–West HAC standard errors; the CW test is one-tailed (H0: spillover carries no incremental information). Significance is flagged with the codes ***, **, *, and a single dot. QLIKE convention: lower values indicate better forecasting accuracy.
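| Index | | Mean QLIKE: HAR | Mean QLIKE: Wav-LGB | Mean QLIKE: Spill-LGB | Step 1 DM: statistic | Step 1 DM: p-value | Step 2 DM: statistic | Step 2 DM: p-value | Step 2 CW (nested): statistic | Step 2 CW (nested): p-value |
|---|---|---|---|---|---|---|---|---|---|---|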
| DJIA | | | | | | *** | | | | * |
| | | | | | | * | | | | ** |
| | | | | | | | | | | 0.0816 . |
| Nasdaq-100 | | | | | | <0.0001 *** | | | | |
| | | | | | | | | | | |
| | | | | | | *** | | | | |
| S&P 500 | | | | | | ** | | | | * |
| | | | | | | | | | | * |
| | | | | | | | | | | <0.0001 *** |
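
A two-tailed Diebold–Mariano test with a Newey–West (Bartlett-kernel) long-run variance can be sketched as follows. The lag length `lag` is a free parameter here; the caption does not say how the authors chose it, and the function name is hypothetical. The p-value uses the standard normal approximation.

```python
import math


def diebold_mariano(loss_a, loss_b, lag=0):
    """Return (statistic, two-sided p-value) for equal predictive accuracy.

    A positive statistic means model A has the larger average loss.
    """
    d = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(d)
    mean = sum(d) / n
    dev = [x - mean for x in d]

    def autocov(k):
        return sum(dev[t] * dev[t - k] for t in range(k, n)) / n

    # Newey-West long-run variance with Bartlett weights 1 - k / (lag + 1).
    lrv = autocov(0) + 2.0 * sum(
        (1.0 - k / (lag + 1)) * autocov(k) for k in range(1, lag + 1)
    )
    if lrv <= 0.0:
        raise ValueError("long-run variance is not positive")
    stat = mean / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # 2 * (1 - Phi(|stat|))
    return stat, p_value
```

Swapping the two loss series flips the sign of the statistic and leaves the p-value unchanged, which is why the table needs the mean QLIKE columns to show which model wins.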
Table 10.
QLIKE comparison of benchmark models and the HAR-LSTM deep-learning baseline under the same 1260-day rolling walk-forward protocol. The LSTM is trained on the three HAR features (1-, 5-, and 22-day realized variance) with 16 hidden units and chronological early stopping. Best values per index–horizon cell are marked by gold shading with bold type; second-best values are marked by peach shading with underline. Italics identify model names.
| Index | Model | h = 1 | h = 5 | h = 10 |
|---|---|---|---|---|
| S&P 500 | HAR | 0.5204 | 0.3168 | 0.3085 |
| | HV-22 | 0.6122 | 0.3939 | 0.3932 |
| | LightGBM | 0.8149 | 0.3518 | 0.3019 |
| | wavelet_lightgbm | 0.8210 | 0.3603 | 0.3165 |
| | LSTM | 0.6072 | 0.3946 | 0.3510 |
| Nasdaq-100 | HAR | 0.4798 | 0.2763 | 0.2733 |
| | HV-22 | 0.5386 | 0.3231 | 0.3176 |
| | LightGBM | 0.5955 | 0.2578 | 0.2428 |
| | wavelet_lightgbm | 0.6607 | 0.2677 | 0.2126 |
| | LSTM | 0.5176 | 0.3097 | 0.2910 |
| DJIA | HAR | 0.5169 | 0.3144 | 0.3048 |
| | HV-22 | 0.5795 | 0.3665 | 0.3727 |
| | LightGBM | 1.0023 | 0.4173 | 0.2960 |
| | wavelet_lightgbm | 1.3000 | 0.3634 | 0.2662 |
| | LSTM | 0.5957 | 0.3653 | 0.3430 |
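
The three HAR inputs named in the caption are the latest daily realized variance and its trailing 5-day and 22-day averages. A minimal sketch, with a hypothetical function name and assuming the averages include the current day:

```python
def har_features(rv, t):
    """Return (daily, weekly, monthly) realized-variance features at day t.

    Uses only observations up to and including day t.
    """
    if t < 21:
        raise ValueError("need at least 22 observations up to day t")
    daily = rv[t]
    weekly = sum(rv[t - 4:t + 1]) / 5.0
    monthly = sum(rv[t - 21:t + 1]) / 22.0
    return daily, weekly, monthly
```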