Regime-Aware LightGBM for Stock Market Forecasting: A Validated Walk-Forward Framework with Statistical Rigor and Explainable AI Analysis
Abstract
1. Introduction
1. LightGBM Classifier with Rich Feature Engineering. We employ a LightGBM gradient boosting classifier operating on 63 normalized features spanning technical, macroeconomic, cross-asset (BTC), and market interaction categories. An ablation study demonstrates that cross-asset features contribute the most predictive value.
2. Rolling Hidden Markov Model Regime Detection. We introduce an online (rolling) Gaussian HMM that is refitted every 63 trading days using only past data, eliminating look-ahead bias in the regime labels. At each time step t, the HMM is trained exclusively on data from the start of the sample up to day t, ensuring that regime labels reflect only information available at the time of prediction.
3. Rigorous Walk-Forward Validation. We replace the static chronological split with a walk-forward cross-validation protocol employing 100 expanding-window folds, 10-day purge windows (equal to the prediction horizon), and per-fold scaler fitting to prevent data leakage.
4. Comprehensive Statistical Validation. We provide block bootstrap confidence intervals for the Sharpe ratio (10,000 resamples, with blocks that preserve serial dependence), the Deflated Sharpe Ratio [9] to correct for the full strategy search space (N = 250 trials), Lo’s autocorrelation-adjusted Sharpe SE [11], probability calibration analysis via Brier score and Expected Calibration Error (ECE), a seven-variant ablation study, comparison against four baseline models (XGBoost, Logistic Regression, SMA crossover, time-series momentum), and multi-dimensional sensitivity analysis across transaction costs, prediction horizons, VIX thresholds, and sub-periods.
2. Related Work
2.1. Machine Learning in Financial Forecasting
2.2. Ensemble Methods and Model Combination
2.3. Regime Detection in Financial Markets
2.4. Deep Learning Approaches
2.5. Technical and Momentum Baselines
2.6. Cross-Asset Information and Bitcoin as a Leading Indicator
2.7. Statistical Validation in Quantitative Finance
3. Materials and Methods
3.1. Data
3.2. Feature Engineering
3.2.1. Technical Indicators (30+ Features)
- Trend: SMA ratios at 5 periods (5, 10, 20, 50, 200 days), EMA ratios (12, 26 days), normalized MACD (line, signal, histogram), ADX with directional indicators (+DI, –DI), Aroon oscillator, and Ichimoku Cloud components (Tenkan, Kijun ratios, cloud width).
- Momentum: RSI(14) with Wilder smoothing, Stochastic %K/%D(14,3), Williams %R, CCI(20), Rate of Change at 3 periods (5, 10, 20 days), and True Strength Index (TSI).
- Volatility: Bollinger Band width and %B position, ATR(14) normalized, and Keltner Channel width.
- Volume: OBV slope (10-day linear regression), Chaikin Money Flow (CMF-20), volume ratio, and VWAP deviation.
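As an illustration of one indicator from the list above, the following is a minimal sketch of RSI(14) with Wilder smoothing in plain Python. The function name `wilder_rsi` and the handling of a perfectly flat window (returning 50) are assumptions made for this example, not details taken from the paper.

```python
def wilder_rsi(closes, period=14):
    """Return RSI values aligned to closes[period:], using Wilder smoothing."""
    if len(closes) <= period:
        raise ValueError("need more than `period` prices")
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # Seed the averages with simple means over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period

    def to_rsi(gain, loss):
        if loss == 0:
            # Assumed convention: 100 when only gains, 50 when completely flat.
            return 100.0 if gain > 0 else 50.0
        return 100.0 - 100.0 / (1.0 + gain / loss)

    rsi = [to_rsi(avg_gain, avg_loss)]
    # Wilder smoothing: new_avg = (old_avg * (period - 1) + current) / period.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        rsi.append(to_rsi(avg_gain, avg_loss))
    return rsi
```

A strictly rising price series yields an RSI of 100 at every step, and a strictly falling one yields 0, which gives a quick sanity check of the smoothing logic.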
3.2.2. Price-Derived Features
3.2.3. Cross-Asset Features (BTC Leading Indicator)
3.2.4. Market Interaction Features (SPY)
3.2.5. Macroeconomic Features
- VIX Term Structure: VIX level, VIX percentage change (5, 10 days), VIX3M level, and a term structure ratio computed from VIX3M and VIX, where values below zero indicate market panic (backwardation). The VIX term structure is a well-documented predictor of future volatility and equity returns [8].
- Credit Spreads: HYG/LQD ratio (credit risk proxy), 5-day and 20-day percentage changes, and mean reversion ratio. Credit spreads capture systemic risk and have been shown to predict equity returns, particularly during stress periods.
- Yield Curve: TLT/SHY ratio (curve steepness proxy) and its trend relative to the 50-day moving average. The yield curve slope is a leading indicator of economic activity and has historically predicted equity market returns with a lag of 6–12 months.
- Safe Haven Flows (Gold/Equity Ratio): Gold (GLD) percentage change, US Dollar Index (UUP) percentage change, and crucially, the GLD/SPY ratio (gold-to-equity ratio). This ratio captures “flight to safety” dynamics: when investors rotate from equities to gold, the ratio rises, often signaling capitulation that precedes equity market rebounds [16]. The SHAP analysis (Section 4.11) confirms that the gold/equity ratio is among the top 3 global predictors, particularly effective in identifying contrarian buying opportunities after panic-driven selloffs.
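The ratio-based macro features above (HYG/LQD, TLT/SHY, GLD/SPY) and their percentage changes reduce to two small operations. The sketch below shows them in plain Python; the helper names `ratio_series` and `pct_change` are hypothetical, and the price values in the usage lines are made-up placeholders rather than market data.

```python
def ratio_series(numer, denom):
    """Element-wise price ratio, e.g. HYG/LQD or GLD/SPY."""
    return [n / d for n, d in zip(numer, denom)]


def pct_change(series, lag):
    """Fractional change over `lag` observations; None where history is too short."""
    return [
        None if i < lag else series[i] / series[i - lag] - 1.0
        for i in range(len(series))
    ]


# Placeholder prices purely for illustration.
hyg = [80.0, 80.4, 80.1, 79.8, 79.9, 80.3, 80.6]
lqd = [110.0, 110.2, 110.1, 110.4, 110.3, 110.5, 110.2]

credit_ratio = ratio_series(hyg, lqd)        # credit risk proxy
credit_chg_5d = pct_change(credit_ratio, 5)  # 5-day change of the ratio
```

The same two helpers cover the gold-to-equity ratio by passing GLD and SPY closes instead.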
3.3. Target Definition
3.4. Model Architecture: LightGBM Classifier
Hyperparameter Selection
3.5. Regime Detection via Rolling Hidden Markov Model
3.5.1. Model Selection: Number of States
3.5.2. HMM Features
1. 20-day rolling returns;
2. 20-day annualized volatility;
3. VIX level (normalized by 100);
4. Market breadth proxy: fraction of positive daily returns in a 20-day window.
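The four HMM inputs listed above can be assembled as in the following sketch. It assumes that volatility is annualized by multiplying the sample standard deviation of daily returns by the square root of 252, and that the breadth proxy is computed on the same price series; both are assumptions for illustration, and `hmm_features` is a hypothetical name.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor
WINDOW = 20


def hmm_features(closes, vix):
    """Build one [return, vol, vix, breadth] row per day once 20 daily returns exist."""
    daily = [b / a - 1.0 for a, b in zip(closes, closes[1:])]
    rows = []
    for t in range(WINDOW, len(closes)):
        window = daily[t - WINDOW:t]  # the 20 daily returns ending at day t
        roll_ret = closes[t] / closes[t - WINDOW] - 1.0
        ann_vol = statistics.stdev(window) * math.sqrt(TRADING_DAYS)
        breadth = sum(r > 0 for r in window) / WINDOW
        rows.append([roll_ret, ann_vol, vix[t] / 100.0, breadth])
    return rows
```

Each row uses only prices up to and including day t, so the resulting matrix can be truncated at any day without leaking future information.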
3.5.3. Rolling Fitting Procedure
1. The HMM is fitted exclusively on the feature matrix built from past data (from the start of the sample to day t);
2. To reduce computational cost, the model is refitted every 63 trading days (approximately one quarter), and the most recent fitted model is cached between refits;
3. The regime at time t is decoded using the Viterbi algorithm on a 120-day context window;
4. States are mapped to regimes by sorting mean returns: the state with the lowest mean return is bear (0), middle is sideways (1), highest is bull (2).
- Bull regime: Mean annualized volatility , VIX typically <20, positive 20-day returns;
- Sideways regime: Mean annualized volatility , VIX typically 20–30, near-zero 20-day returns;
- Bear regime: Mean annualized volatility , VIX typically >30, negative 20-day returns.
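Two pieces of the rolling procedure above are independent of the HMM library: the refit schedule and the mapping from raw state ids to ordered regimes. The sketch below covers only those two pieces and assumes a three-state model; `first_fit_day` and both function names are hypothetical, and the HMM fitting and Viterbi decoding themselves are deliberately left out.

```python
REFIT_EVERY = 63  # trading days between refits, as stated in the text
REGIME_NAMES = {0: "bear", 1: "sideways", 2: "bull"}


def is_refit_day(t, first_fit_day):
    """True on the first fit day and every 63 trading days after it."""
    return t >= first_fit_day and (t - first_fit_day) % REFIT_EVERY == 0


def map_states_to_regimes(state_mean_returns):
    """Map raw HMM state ids to 0/1/2 by ascending mean return.

    state_mean_returns[s] is the fitted mean of the return feature for state s.
    """
    order = sorted(range(len(state_mean_returns)),
                   key=lambda s: state_mean_returns[s])
    return {state: rank for rank, state in enumerate(order)}


# Raw state 1 has the lowest mean return, so it becomes bear (0).
mapping = map_states_to_regimes([0.02, -0.03, 0.001])
labels = {state: REGIME_NAMES[rank] for state, rank in mapping.items()}
```

Re-deriving the mapping after every refit matters because an HMM assigns state ids arbitrarily, so the id that meant "bull" in one fit may mean "bear" in the next.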
3.6. Walk-Forward Validation with Purge and Embargo
- Number of folds: 100;
- Minimum training window: 504 days (≈2 years);
- Test window: 21 days (≈1 month) per fold;
- Step size: 21 days between consecutive folds;
- Purge window: 10 days (equal to the prediction horizon h);
- Embargo: 0 days.
1. The feature scaler (RobustScaler) is fitted exclusively on the training data and used to transform the test data;
2. The target variable uses forward-shifted returns (shifted by the prediction horizon h), computed before the train/test split;
3. The purge window of h days between the training and test sets ensures that no training sample has a target label that overlaps temporally with the test period.
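The fold layout implied by the parameters above can be sketched as follows. The sketch assumes folds are laid out from the start of the sample, with an expanding training window that always begins at index 0; the paper does not state this detail, and `walk_forward_folds` is a hypothetical name.

```python
def walk_forward_folds(n_samples, n_folds=100, min_train=504,
                       test_size=21, step=21, purge=10):
    """Return (train_end, test_start, test_end) index triples, end-exclusive.

    Training rows are [0, train_end); test rows are [test_start, test_end).
    The `purge` rows in between are dropped so that no training label,
    which looks `purge` days ahead, overlaps the test window.
    """
    folds = []
    for k in range(n_folds):
        test_start = min_train + purge + k * step
        test_end = test_start + test_size
        if test_end > n_samples:
            break  # not enough data left for a full test window
        train_end = test_start - purge
        folds.append((train_end, test_start, test_end))
    return folds


folds = walk_forward_folds(2614)
print(len(folds), folds[0], folds[-1])
```

Under these assumptions, 100 full folds require 504 + 10 + 100 × 21 = 2614 trading days. The per-fold scaler would be fitted on rows `[0, train_end)` only and then applied to the test rows.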
3.7. Swing Trading Backtest
3.7.1. Signal Generation with Hysteresis
- Entry (BUY): Smoothed probability is above the entry threshold and 3-day momentum is positive;
- Normal Exit: Smoothed probability is below the exit threshold and the minimum 30-day holding period is satisfied;
- Trailing Stop: Gain from entry exceeds 15%, drop from peak exceeds 10%, and model confidence is declining;
- Emergency Exit: Loss exceeds 20% and the raw (unsmoothed) probability is below its cutoff.
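The entry and normal-exit rules above form a small state machine. The following sketch implements only those two rules. It assumes fixed thresholds of 0.60 and 0.40 (hypothetical values; the paper describes adaptive, percentile-based thresholds), an EMA with span 5 for smoothing, and "3-day momentum" interpreted as the 3-day change in the smoothed probability. The trailing stop and emergency exit are omitted.

```python
def ema(values, span=5):
    """Exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


def hysteresis_positions(probs, entry=0.60, exit_=0.40, min_hold=30, span=5):
    """Return a 0/1 position per day from raw up-move probabilities."""
    smooth = ema(probs, span)
    position, held, out = 0, 0, []
    for t, p in enumerate(smooth):
        # Assumed reading of "3-day momentum": smoothed probability is rising.
        momentum_up = t >= 3 and smooth[t] > smooth[t - 3]
        if position == 0 and p > entry and momentum_up:
            position, held = 1, 0
        elif position == 1:
            held += 1
            # Exit only below the lower threshold and after the minimum hold.
            if p < exit_ and held >= min_hold:
                position = 0
        out.append(position)
    return out
```

The gap between the entry and exit thresholds is the dead zone: a probability that drifts between 0.40 and 0.60 changes nothing, which is what suppresses whipsaw trades.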
3.7.2. VIX Emergency Override
3.7.3. Transaction Costs
3.8. Statistical Validation Framework
3.8.1. Block Bootstrap Sharpe Ratio Confidence Intervals
3.8.2. Deflated Sharpe Ratio
3.8.3. Bootstrap Alpha Test
3.9. Probability Calibration Assessment
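- Brier Score: $\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N}(p_i - y_i)^2$, where $p_i$ is the predicted probability and $y_i$ the true label. $\mathrm{BS} = 0$ indicates perfect calibration; $\mathrm{BS} = 0.25$ corresponds to a random coin flip on a balanced binary problem.
- Expected Calibration Error (ECE): $\mathrm{ECE} = \sum_{b=1}^{B}\frac{n_b}{N}\,\lvert \bar{p}_b - \bar{y}_b \rvert$, where samples are binned into $B$ uniform bins, $n_b$ is the number of samples in bin b, and $\bar{p}_b$, $\bar{y}_b$ are the average predicted and actual probabilities in bin b. ECE = 0 indicates perfect calibration.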
- Reliability Diagram: A visual representation of calibration, plotting the observed fraction of positive outcomes against the predicted probability for each bin. A perfectly calibrated model lies on the diagonal.
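Both calibration metrics can be computed in a few lines. The sketch below uses 10 equal-width bins for the ECE, which is an assumed default rather than a value taken from the paper; the function names are likewise illustrative.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between average confidence and hit rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_p = sum(p for p, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(mean_p - mean_y)
    return ece


# A constant 0.5 forecast on balanced labels: Brier 0.25, ECE 0.
print(brier_score([0.5] * 4, [0, 1, 0, 1]))                 # 0.25
print(expected_calibration_error([0.5] * 4, [0, 1, 0, 1]))  # 0.0
```

The coin-flip case shows why the two metrics are reported together: a forecast can be perfectly calibrated (ECE of 0) while carrying no information (Brier score of 0.25).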
3.10. Ablation Study Design
3.11. Sensitivity Analysis Design
1. Transaction costs: 0, 5, 10, 20, and 50 basis point total cost (commission + slippage);
2. Entry/exit thresholds: Grid search over a 5×5 grid of entry and exit thresholds;
3. Prediction horizon: $h \in \{5, 10, 15\}$ trading days;
4. VIX emergency threshold (baseline: VIX > 40);
5. Sub-period analysis: 2015–2019 (pre-COVID), 2020–2022 (pandemic/bear), 2023–2025 (recovery/AI boom).
3.12. Baseline Models
1. XGBoost Classifier [26]: Same hyperparameter structure (800 rounds, depth 6, learning rate 0.03) and same 63-feature input as LightGBM. Tests whether the choice of boosting framework matters.
2. Logistic Regression: The simplest possible ML baseline (linear decision boundary). Tests whether the non-linearity of tree-based models contributes predictive value.
3. SMA 50/200 Crossover: A classical technical trading rule that generates a buy signal when the 50-day SMA exceeds the 200-day SMA. No ML involved.
4. Time-Series Momentum [21]: The 12-month return minus 1-month return. Generates a buy signal when momentum is positive. No ML involved.
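The two non-ML baselines can be written directly from their descriptions. In the sketch below, 12 months and 1 month are approximated as 252 and 21 trading days, which is an assumption; the function names are illustrative.

```python
def sma(values, window, t):
    """Simple moving average of the `window` values ending at index t."""
    return sum(values[t - window + 1:t + 1]) / window


def sma_crossover_signal(closes, t, fast=50, slow=200):
    """1 when the fast SMA is above the slow SMA at day t, else 0."""
    if t < slow - 1:
        return 0  # not enough history for the slow average
    return int(sma(closes, fast, t) > sma(closes, slow, t))


def ts_momentum_signal(closes, t, long_days=252, short_days=21):
    """1 when (12-month return minus 1-month return) is positive at day t."""
    if t < long_days:
        return 0  # not enough history for the 12-month return
    ret_12m = closes[t] / closes[t - long_days] - 1.0
    ret_1m = closes[t] / closes[t - short_days] - 1.0
    return int(ret_12m - ret_1m > 0)
```

Both functions read only prices up to day t, so they can be evaluated day by day inside the same backtest loop as the ML signals.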
4. Results
4.1. Overall Performance: 51-Stock Universe
- 9 out of 51 stocks (17.6%) generated positive alpha over Buy-and-Hold;
- 12 out of 51 stocks (23.5%) achieved win rates ;
- The median time in market was 94.2%, indicating the system favors staying invested and exits only during detected danger periods;
- The median number of trades per stock was 22 over the full period, consistent with a swing trading frequency of approximately 2 trades per year.
4.2. Out-of-Sample Validation: S&P 500 Universe
4.3. Regime-Specific Performance
4.4. Statistical Significance Analysis
4.4.1. Portfolio-Level Statistical Significance
- Portfolio Sharpe ratio: 1.184;
- 95% block bootstrap CI: [0.526, 1.840];
- p-value (SR > 0): <0.001;
- Deflated Sharpe Ratio (N = 250): 0.686 (not significant at 5%, but substantially above zero).
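A block bootstrap interval of the kind reported above can be sketched as follows. This version uses a circular moving-block bootstrap with a fixed block length and a percentile interval; the block length of 20 days, the 252-day annualization, and the function names are assumptions for illustration, and the paper's exact resampling scheme may differ.

```python
import math
import random
import statistics


def annualized_sharpe(returns, periods=252):
    """Mean over standard deviation of per-period returns, annualized."""
    sd = statistics.stdev(returns)
    return 0.0 if sd == 0 else statistics.fmean(returns) / sd * math.sqrt(periods)


def block_bootstrap_sharpe_ci(returns, block_size=20, n_resamples=10_000,
                              alpha=0.05, seed=0):
    """Percentile CI for the Sharpe ratio from a circular moving-block bootstrap."""
    rng = random.Random(seed)
    n = len(returns)
    n_blocks = math.ceil(n / block_size)
    stats = []
    for _ in range(n_resamples):
        sample = []
        for _ in range(n_blocks):
            start = rng.randrange(n)
            # Copy a contiguous block, wrapping around the end of the series.
            sample.extend(returns[(start + j) % n] for j in range(block_size))
        stats.append(annualized_sharpe(sample[:n]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

Resampling whole blocks rather than single days keeps short-range autocorrelation and volatility clustering inside each block, which an i.i.d. bootstrap would destroy and which typically widens the interval.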
4.4.2. Cross-Sectional Top-K Portfolio Construction
4.5. Ablation Study Results
1. BTC cross-asset features are the most valuable component. Removing BTC features (variant F) produces the largest Sharpe ratio degradation (ΔSharpe = −0.100) and strategy return decline (−319.9%), and it is the only variant whose Sharpe ratio p-value exceeds 0.05. This supports the hypothesis that Bitcoin acts as a leading indicator for high-beta technology stocks.
2. The rolling HMM regime feature provides stock-dependent value. Averaged across the ablation tickers, removing the HMM regime feature (variant B) reduces the Sharpe ratio by 0.030. However, the effect is highly stock-dependent: the regime feature substantially helps crypto-correlated stocks (MSTR: +1257% strategy return with regime vs. without; SMCI: +621%; DKNG: +60%) but hurts others (AMD: −948%; TSLA: −602%). This suggests that the rolling HMM is most valuable for stocks with strong regime sensitivity.
3. Macroeconomic features contribute moderate but consistent value. Removing macro features (variant E) decreases the Sharpe ratio by 0.051 and pushes the p-value to borderline significance (0.045), indicating that macroeconomic conditioning provides a meaningful but not dominant contribution.
4. VIX override trades return for safety. Removing the VIX circuit breaker (variant D) increases both the Sharpe ratio (+0.060) and strategy returns (+363.4%). This finding suggests that during VIX spikes, the model’s signals remain profitable on average: the override sacrifices returns for tail-risk protection. In practice, this is a risk management decision rather than a prediction accuracy issue.
5. Post-processing has minimal aggregate impact. Variants C and G (no post-processing) achieve slightly higher Sharpe ratios (+0.012) than the full pipeline, suggesting that the adaptive threshold mechanism captures most of the signal value without requiring EMA smoothing, trailing stops, or minimum hold periods.
4.6. Baseline Model Comparison
1. LightGBM outperforms all alternative models in terms of the Sharpe ratio. The full LightGBM pipeline achieves a Sharpe of 0.938 vs. 0.795 for XGBoost (+18.0%), 0.743 for Logistic Regression (+26.2%), 0.609 for SMA crossover (+54.0%), and 0.314 for momentum (+198.7%). The advantage over XGBoost (18%) is meaningful, suggesting that LightGBM’s histogram-based split finding and leaf-wise growth provide tangible benefits over XGBoost’s level-wise approach for this feature set.
2. All ML models outperform technical baselines. The three ML classifiers (LightGBM, XGBoost, Logistic) all achieve substantially higher Sharpe ratios than the SMA crossover and momentum baselines, providing evidence that the ML approach captures information beyond what simple technical rules exploit. Even Logistic Regression (Sharpe 0.743) outperforms the SMA crossover (0.609), suggesting the feature set contains genuine predictive information accessible to linear models.
3. Technical baselines exhibit lower drawdowns but also lower returns. The SMA crossover achieves a maximum drawdown of −34.1% vs. −67.9% for LightGBM, reflecting the trend-following nature of moving average strategies that naturally exit during sustained declines. However, SMA’s total return (325.5%) is far lower than LightGBM (2715.9%). This trade-off motivates the volatility-targeting extension discussed in Section 4.9.
4.7. Probability Calibration Analysis
4.8. Sensitivity Analysis
4.8.1. Transaction Cost Sensitivity
4.8.2. Prediction Horizon Sensitivity
4.8.3. Sub-Period Analysis
- 2015–2019 (pre-COVID): Positive alpha (+10.9%) with Sharpe 0.684. The relatively stable macro environment favored the model’s learned patterns.
- 2020–2022 (pandemic/bear): Negative alpha (−56.3%) with Sharpe 0.524. The unprecedented volatility and rapid regime shifts challenged the model, though the Sharpe ratio remained positive.
- 2023–2025 (recovery/AI boom): The strongest period, with Sharpe 1.135, though negative alpha (−181.9%) indicates the model could not fully capture the extraordinary AI-driven rally of these high-momentum stocks.
4.8.4. Threshold Robustness
4.9. Volatility Targeting and Drawdown Reduction
4.10. Walk-Forward Accuracy and Current Signals
4.11. Explainability Analysis: SHAP Values Across Regimes
4.11.1. Global Feature Importance
4.11.2. Directional Impact: SHAP Summary Plot
- Yield Curve (macro_yield_curve): High values (steep yield curve, TLT/SHY high) are associated with positive SHAP values, indicating the model interprets a steep yield curve as bullish for equities—consistent with monetary policy theory where an upward-sloping curve signals economic expansion.
- Gold/Equity Ratio (macro_gold_vs_mkt): Extreme high values (spikes in gold relative to equities) produce strong positive SHAP outliers, suggesting the model identifies “flight to safety” reversals—when gold spikes relative to stocks, the model anticipates a mean reversion recovery in equities.
- Momentum (mom_3m): Low momentum values (blue) push SHAP values positive, confirming a mean reversion strategy where the model buys after sustained declines, consistent with the findings of our previous XAI analysis [5].
4.11.3. Regime-Dependent Feature Importance
- Bear Regime: The model prioritizes dist_sma_200 (mean |SHAP| = 0.498), macro_yield_curve (0.452), and mom_3m (0.356). This represents a mean reversion/macro safety strategy: during downturns, the model assesses whether the stock has deviated sufficiently from its long-term average and whether the yield curve signals economic stability.
- Sideways Regime: The dominant feature is macro_yield_curve (0.583), followed by dist_sma_200 (0.362) and the regime variable itself (0.265). The model adopts a macro-driven patience approach, heavily relying on the macroeconomic backdrop to distinguish between temporary consolidation and regime transitions.
- Bull Regime: The yield curve remains important (0.434), but macro_gold_vs_mkt (0.380) and mkt_beta_63 (0.290) gain prominence. This reflects a risk appetite/beta strategy: during bull markets, the model monitors safe-haven flows (gold vs. equities) and market sensitivity (beta) to time entries, consistent with the observation that high-beta stocks amplify market movements.
4.11.4. Local Interpretability: Case Studies
Successful Prediction (True Positive)
- Primary driver: macro_gold_vs_mkt contributed +2.11 to the log-odds, indicating an elevated gold-to-equity ratio that the model interprets as a contrarian buying opportunity after a flight to safety.
- Supporting factors: Negative 3-month momentum (mom_3m = −0.994, contributing +0.51) and a large negative deviation from the 200-day SMA (dist_sma_200 = −1.495, contributing +0.35) confirmed oversold conditions.
- Macro confirmation: The yield curve ratio contributed +0.28, indicating a supportive macroeconomic environment for recovery.
Failed Prediction (False Positive)
- Conflicting signals: While dist_sma_50 (+0.71) and mom_3m (+0.43) pushed the probability up (suggesting oversold conditions), the market correlation (mkt_corr_63 = 0.814, contributing −0.43) and the regime variable (regime = −1, contributing −0.36) pushed it down.
- Regime warning: The negative regime contribution indicates that the model detected a deteriorating market environment, but the oversold technical signals overrode this warning.

5. Discussion
5.1. Statistical Significance and Multiple Testing
5.2. Ablation Insights: What Actually Drives Performance?
- The rolling HMM is most valuable for stocks with strong regime sensitivity, where the causal regime label provides information that the model cannot extract from other features;
- For stocks like AMD and TSLA, where the model’s feature set (VIX levels, market returns, credit spreads) already captures regime-relevant information, the rolling HMM label introduces more noise than signal;
- This finding suggests a practical approach: use regime conditioning selectively for stocks with high regime sensitivity (e.g., crypto-correlated or high-beta names) while omitting it for stocks where technical/macro features provide sufficient conditioning.
5.3. Macroeconomic Features as Primary Predictors
5.4. Accuracy vs. Profitability
1. The model’s raw predictions are smoothed (EMA-5) to filter out noisy fluctuations;
2. The hysteresis mechanism (entry/exit thresholds with a dead zone) prevents whipsaw trading on marginal signals;
3. The 30-day minimum holding period forces the system to ride out short-term volatility within broader trends;
4. Adaptive thresholds (percentile-based) automatically calibrate to the stock-specific signal distribution.
5.5. Robustness and Sensitivity
- The strategy is cost-robust: even at 50 bps total cost (5× baseline), the Sharpe ratio declines by only 7.3% relative to the zero-cost case (from 0.806 to 0.747);
- The adaptive threshold mechanism eliminates sensitivity to initial parameter choices, with the Sharpe ratio varying by only 0.002 across the full 5×5 threshold grid;
- The Sharpe ratio remains positive across all three sub-periods (0.524 to 1.135), though alpha is negative during the high-momentum 2020–2022 and 2023–2025 periods when the strategy underperformed Buy-and-Hold on these explosive growth stocks.
5.6. Limitations
1. Survivorship Bias: The 51-stock universe was selected from current (February 2026) NASDAQ-100 constituents, potentially excluding delisted or demoted stocks that may have performed poorly during the backtest period (2015–2026). This is a well-known and potentially severe source of upward bias in equity backtests [29]. Stocks that are currently in the NASDAQ-100 are, by definition, those that have survived and generally appreciated in value, creating a favorable selection bias that inflates both the strategy’s and the benchmark’s returns. The S&P 500 scan (Section 4.2) partially mitigates this concern by testing on a broader universe, but it also uses current constituents. A fully survivorship-bias-free evaluation would require reconstructing the historical index composition at each rebalance date, including stocks that were subsequently delisted, acquired, or demoted. This is a critical limitation that could materially affect the reported performance metrics, and we prioritize it as future work.
2. Partial Alpha Generation: Only 9 out of 51 NASDAQ-100 stocks (17.6%) outperformed Buy-and-Hold on a total return basis, a rate confirmed by the S&P 500 scan (33/199, 16.6%). For the remaining stocks, the system generated positive Sharpe ratios but lagged the benchmark on cumulative returns, suggesting that the strategy is most effective for stocks with pronounced regime sensitivity (e.g., SMCI, MSTR, CTAS) rather than for the broader universe. The consistent ∼17% alpha-positive rate across two independent universes suggests this is a structural characteristic of the framework rather than a sampling artifact.
3. Ablation Universe: The ablation study was conducted on seven representative tickers. While these span mega-cap (NVDA, AVGO), high-beta (AMD, TSLA, SMCI), crypto-correlated (MSTR, DKNG), and growth (AVGO) categories, extending the ablation to the full 51-stock universe would provide a more comprehensive component valuation.
4. Rolling HMM Limitations: The online regime detection, while free of look-ahead bias, produces noisier regime labels than the offline alternative. The ablation study shows that the rolling HMM helps crypto-correlated stocks but hurts momentum-driven stocks, suggesting that a stock-adaptive regime conditioning approach (e.g., only applying regime features when they improve validation performance) should be explored.
5. Sub-Period Variation: While the strategy maintains positive Sharpe ratios across all sub-periods (range: 0.524 to 1.135 across three stocks), alpha relative to Buy-and-Hold varies substantially, with the strongest relative performance in stable markets (2015–2019) and the weakest during high-momentum periods (2023–2025).
6. Large Maximum Drawdowns: The base strategy exhibits maximum drawdowns of −60% to −93% on the most volatile stocks (SMCI, MSTR, ENPH), which are inconsistent with any practical risk management framework. While the volatility-targeting overlay (Section 4.9) reduces drawdowns to approximately −30% to −42%, these remain substantial. Practitioners should combine the strategy with explicit position-sizing limits, stop-loss thresholds, and portfolio-level risk budgets rather than relying on the strategy’s exit signals alone.
6. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Fama, E.F. Efficient capital markets: A review of theory and empirical work. J. Financ. 1970, 25, 383–417.
2. Gu, S.; Kelly, B.; Xiu, D. Empirical asset pricing via machine learning. Rev. Financ. Stud. 2020, 33, 2223–2273.
3. Fischer, T.; Krauss, C. Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 2018, 270, 654–669.
4. Jiang, W. Applications of deep learning in stock market prediction: Recent progress. Expert Syst. Appl. 2021, 184, 115537.
5. Pagliaro, A. Forecasting significant stock market price changes using machine learning: Extra Trees classifier leads. Electronics 2023, 12, 4551.
6. Pagliaro, A. Artificial intelligence vs. efficient markets: A critical reassessment of predictive models in the big data era. Electronics 2025, 14, 1721.
7. Hamilton, J.D. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 1989, 57, 357–384.
8. Ang, A.; Bekaert, G. International asset allocation with regime shifts. Rev. Financ. Stud. 2002, 15, 1137–1187.
9. Bailey, D.H.; López de Prado, M. The Deflated Sharpe Ratio: Correcting for selection bias, backtest overfitting, and non-normality. J. Portf. Manag. 2014, 40, 94–107.
10. Harvey, C.R.; Liu, Y.; Zhu, H. … and the cross-section of expected returns. Rev. Financ. Stud. 2016, 29, 5–68.
11. Lo, A.W. The statistics of Sharpe ratios. Financ. Anal. J. 2002, 58, 36–52.
12. Khaidem, L.; Saha, S.; Dey, S.R. Predicting the direction of stock market prices using random forest. arXiv 2016, arXiv:1605.00003.
13. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 3146–3154.
14. Zhang, K.; Zhong, G.; Dong, J.; Wang, S.; Wang, Y. Stock market prediction based on generative adversarial network. Procedia Comput. Sci. 2019, 147, 400–406.
15. Sezer, O.B.; Gudelek, M.U.; Ozbayoglu, A.M. Financial time series forecasting with deep learning: A systematic literature review: 2005–2019. Appl. Soft Comput. 2020, 90, 106181.
16. Kumar, D.; Pawar, P.P.; Addula, S.R.; Meesala, M.K.; Oni, O.; Cheema, Q.N. A smart optimization model for reliable signal detection in financial markets using ELM and blockchain technology. FinTech 2025, 4, 56.
17. Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259.
18. Lim, B.; Arık, S.Ö.; Loeff, N.; Pfister, T. Temporal Fusion Transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 2021, 37, 1748–1764.
19. Oreshkin, B.N.; Carpov, D.; Chapados, N.; Bengio, Y. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In Proceedings of the International Conference on Learning Representations (ICLR); OpenReview.net: Online, 2020.
20. Lo, A.W.; Mamaysky, H.; Wang, J. Foundations of technical analysis: Computational algorithms, statistical inference, and empirical implementation. J. Financ. 2000, 55, 1705–1765.
21. Moskowitz, T.J.; Ooi, Y.H.; Pedersen, L.H. Time series momentum. J. Financ. Econ. 2012, 104, 228–250.
22. Corbet, S.; Meegan, A.; Larkin, C.; Lucey, B.; Yarovaya, L. Exploring the dynamic relationships between cryptocurrencies and other financial assets. Econ. Lett. 2018, 165, 28–34.
23. Politis, D.N.; Romano, J.P. The stationary bootstrap. J. Am. Stat. Assoc. 1994, 89, 1303–1313.
24. Chong, T.T.L.; Ng, W.K. Technical analysis and the London stock exchange: Testing the MACD and RSI rules using the FT30. Appl. Econ. Lett. 2008, 15, 1111–1114.
25. Zadrozny, B.; Elkan, C. Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2002; pp. 694–699.
26. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 785–794.
27. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4768–4777.
28. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
29. Brown, S.J.; Goetzmann, W.; Ibbotson, R.G.; Ross, S.A. Survivorship bias in performance studies. Rev. Financ. Stud. 1992, 5, 553–580.





| Indicator | Ticker | Role |
|---|---|---|
| VIX (Short-Term Volatility) | ^VIX | Fear gauge, emergency cutoff |
| VIX3M (Medium-Term Volatility) | ^VIX3M | Term structure analysis |
| S&P 500 ETF | SPY | Market benchmark, beta computation |
| Bitcoin | BTC-USD | Leading indicator for crypto-proxy stocks |
| High Yield Bonds | HYG | Credit risk/risk-on appetite |
| Investment Grade Bonds | LQD | Credit spread reference |
| 20-Year Treasury Bonds | TLT | Long-term interest rate proxy |
| 1–3 Year Treasury Bonds | SHY | Short-term rate, yield curve proxy |
| Gold | GLD | Safe-haven demand |
| US Dollar Index | UUP | Global liquidity conditions |

| K | Log-Lik | Params | BIC | AIC |
|---|---|---|---|---|
| 2 | 19,074 | 31 | −37,902 | −38,086 |
| 3 | 20,273 | 50 | −40,149 | −40,446 |
| 4 | 20,941 | 71 | −41,319 | −41,741 |
| 5 | 20,786 | 94 | −40,826 | −41,384 |
| 6 | 22,066 | 119 | −43,189 | −43,894 |

| Variant | Description | Component Removed |
|---|---|---|
| A | Full pipeline (baseline) | None |
| B | No HMM regime feature | Rolling HMM regime label |
| C | No post-processing | EMA smoothing, hysteresis, trailing stop, min-hold |
| D | No VIX override | VIX > 40 emergency exit |
| E | No macro features | All macro_* and vix_* features |
| F | No BTC features | All btc_* cross-asset features |
| G | Simple threshold | All post-processing + VIX override |

| # | Ticker | Strat % | B&H % | Alpha % | Sharpe | WR % | MaxDD % |
|---|---|---|---|---|---|---|---|
| 1 | SMCI * | 2102.9 | 1493.0 | +610.0 | 0.816 | 65.5 | −84.5 |
| 2 | NVDA | 1920.5 | 3529.8 | −1609.3 | 0.906 | 70.8 | −64.5 |
| 3 | MSTR * | 1479.6 | 1133.3 | +346.4 | 0.754 | 70.6 | −86.9 |
| 4 | ENPH | 1446.7 | 2479.1 | −1032.3 | 0.753 | 66.7 | −92.5 |
| 5 | AMD | 1187.2 | 2262.9 | −1075.6 | 0.759 | 66.7 | −66.4 |
| 6 | TSLA | 977.9 | 1931.8 | −953.9 | 0.701 | 65.2 | −75.5 |
| 7 | KLAC | 934.3 | 1289.3 | −354.9 | 0.788 | 69.2 | −44.4 |
| 8 | MU | 712.5 | 801.9 | −89.5 | 0.682 | 76.0 | −60.9 |
| 9 | AVGO | 658.8 | 1112.7 | −454.0 | 0.714 | 66.7 | −46.7 |
| 10 | MELI | 451.5 | 789.4 | −337.9 | 0.566 | 63.0 | −71.4 |
| 11 | PANW | 446.6 | 634.4 | −187.8 | 0.624 | 72.4 | −42.3 |
| 12 | COST | 422.0 | 510.4 | −88.4 | 0.831 | 72.2 | −28.8 |
| 13 | MSFT | 410.0 | 460.2 | −50.2 | 0.702 | 59.1 | −40.3 |
| 14 | NOW | 373.5 | 426.7 | −53.2 | 0.556 | 64.0 | −55.1 |
| 15 | AAPL | 372.3 | 486.9 | −114.6 | 0.634 | 69.0 | −35.5 |

| Ticker | Sector | Alpha % | Sharpe | WR % | MaxDD % |
|---|---|---|---|---|---|
| CTAS | Industrials (Uniform Rental) | +181.0 | 0.885 | 68.2 | −26.6 |
| JPM | Financials (Banking) | +94.3 | 0.712 | 63.5 | −38.1 |
| DE | Industrials (Agriculture Equip) | +72.8 | 0.654 | 61.9 | −41.2 |
| ISRG | Healthcare (Surgical Robotics) | +68.5 | 0.621 | 62.7 | −35.8 |
| TMO | Healthcare (Life Sciences) | +53.2 | 0.578 | 61.1 | −39.4 |

| Ticker | Strat % | B&H % | Alpha % | Sharpe | Days | InMkt % |
|---|---|---|---|---|---|---|
| AMD | 1134.3 | 1154.8 | −20.5 | 2.083 | 672 | 97.8 |
| SMCI | 546.9 | 376.7 | +170.2 | 1.274 | 672 | 94.5 |
| TSLA | 388.0 | 401.5 | −13.5 | 1.347 | 672 | 95.5 |
| NVDA | 333.0 | 438.1 | −105.1 | 1.440 | 672 | 95.1 |
| MSTR | 325.1 | 298.0 | +27.1 | 1.075 | 672 | 96.1 |
| GOOGL | 310.8 | 305.6 | +5.2 | 2.168 | 672 | 98.4 |
| MELI | 268.4 | 254.0 | +14.4 | 1.252 | 672 | 99.1 |
| AAPL | 264.4 | 284.6 | −20.2 | 2.192 | 672 | 98.5 |
| MRVL | 219.7 | 164.4 | +55.3 | 1.032 | 672 | 92.1 |
| MU | 196.3 | 311.0 | −114.8 | 1.046 | 672 | 90.0 |

| Ticker | Strat % | B&H % | Alpha % | Sharpe | Days | InMkt % |
|---|---|---|---|---|---|---|
| ENPH | 633.4 | 490.3 | +143.1 | 1.344 | 635 | 89.1 |
| DKNG | 1.7 | −24.2 | +25.9 | 0.252 | 281 | 81.5 |
| MSTR | 50.7 | 43.8 | +6.9 | 0.550 | 635 | 95.0 |
| GILD | 83.8 | 83.4 | +0.5 | 0.827 | 635 | 93.2 |
| MRNA | −70.4 | −64.7 | −5.7 | −1.278 | 315 | 99.0 |
| DDOG | −37.7 | −28.9 | −8.8 | −0.749 | 252 | 93.3 |
| COIN | −50.9 | −38.7 | −12.2 | −1.762 | 136 | 77.2 |
| DASH | 14.9 | 29.1 | −14.2 | 0.659 | 140 | 87.9 |
| PEP | −8.3 | 7.1 | −15.3 | −0.350 | 635 | 93.4 |
| COST | 10.7 | 27.6 | −16.9 | 0.076 | 635 | 89.3 |

| Ticker | Sharpe | 95% CI (Block) | p-value | DSR (N = 250) | Lo SE |
|---|---|---|---|---|---|
| AMD | 1.016 | [0.305, 1.678] | 0.003 | 0.524 | 0.345 |
| TSLA | 0.874 | [0.194, 1.573] | 0.006 | 0.353 | 0.359 |
| NVDA | 0.923 | [0.246, 1.623] | 0.003 | 0.396 | 0.350 |
| Portfolio | 1.184 | [0.526, 1.840] | <0.001 | 0.686 | 0.335 |

| K | Return % | Ann. % | Sharpe | MaxDD % | Turnover | Tickers |
|---|---|---|---|---|---|---|
| 3 | 382.0 | 41.8 | 0.847 | −65.0 | 0.083 | HOOD, AMD, NVDA |
| 5 | 332.8 | 38.4 | 0.816 | −63.9 | 0.094 | +TSLA |
| 10 | 414.9 | 43.8 | 1.038 | −38.6 | 0.089 | +COST, KLAC, SMCI, MSFT, AVGO |

| Variant | Strat % | Sharpe | WR % | p-value | ΔSharpe |
|---|---|---|---|---|---|
| D: No VIX override | 1580.5 | 0.830 | 83.3 | 0.016 | +0.060 |
| C: No post-processing | 1680.2 | 0.782 | 72.0 | 0.037 | +0.012 |
| G: Simple threshold | 1680.2 | 0.782 | 72.0 | 0.037 | +0.012 |
| A: Full pipeline | 1217.2 | 0.770 | 67.1 | 0.023 | — |
| B: No HMM regime | 1124.4 | 0.740 | 68.7 | 0.035 | −0.030 |
| E: No macro features | 1229.9 | 0.719 | 69.8 | 0.045 | −0.051 |
| F: No BTC features | 897.2 | 0.670 | 66.7 | 0.052 | −0.100 |

| Model | Sharpe | Return % | MaxDD % | WR % | Trades | OOF Acc. |
|---|---|---|---|---|---|---|
| LightGBM (ours) | 0.938 | 2715.9 | −67.9 | 75.7 | 24 | 55.7% |
| XGBoost | 0.795 | 1551.2 | −69.6 | 69.5 | 27 | 52.2% |
| Logistic Regression | 0.743 | 1154.0 | −68.8 | 62.9 | 27 | 51.5% |
| SMA 50/200 Crossover | 0.609 | 325.5 | −34.1 | 87.5 | 6 | N/A |
| Momentum 12–1 | 0.314 | 122.4 | −55.8 | 65.0 | 15 | N/A |

| Metric | Value |
|---|---|
| Brier Score | 0.256 |
| ECE | 0.066 |
| Random Baseline (Brier) | 0.250 |
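
For reference, the two calibration metrics in the table can be computed as in the following sketch. It assumes binary labels, probabilities for the positive class, and equal-width bins; the number of bins (`n_bins=10`) is an assumption, since the binning used in the paper is not stated here. The function names are hypothetical.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between mean predicted probability and observed
    frequency, computed over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(observed - mean_prob)
    return ece


# A constant 0.5 forecast on balanced labels reproduces the 0.250 random baseline.
baseline_brier = brier_score([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1])
baseline_ece = expected_calibration_error([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1])
```

The baseline row follows directly from the definition: an uninformative forecast of 0.5 has squared error 0.25 on every sample, regardless of the outcome.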

| Metric | Uncalibrated | Isotonic |
|---|---|---|
| Brier Score | 0.254 | 0.279 |
| ECE | 0.059 | 0.135 |
| Sharpe Ratio | 0.938 | 0.834 |

| Cost (bps) | Strat % | Sharpe | MaxDD % | WR % | Trades |
|---|---|---|---|---|---|
| 0 | 1476.7 | 0.806 | −68.6 | 67.6 | 25.7 |
| 5 | 1437.5 | 0.800 | −68.7 | 67.6 | 25.7 |
| 10 | 1399.2 | 0.794 | −68.7 | 67.6 | 25.7 |
| 20 | 1325.5 | 0.783 | −68.9 | 67.6 | 25.7 |
| 50 | 1125.2 | 0.747 | −69.2 | 67.6 | 25.7 |
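
A minimal sketch of how a proportional transaction cost might enter a long/flat backtest is shown below. The cost model (a charge in basis points on every unit of position change, applied before the period's return) is an assumption made for illustration and is not confirmed as the one used to produce the table; `net_equity` is a hypothetical helper.

```python
def net_equity(returns, positions, cost_bps):
    """Compound strategy returns, charging cost_bps on each unit of position change.

    returns[t] is the asset return over period t and positions[t] is the
    exposure held during that period (1 = long, 0 = flat).
    """
    cost = cost_bps / 10_000
    equity = 1.0
    previous = 0.0
    for r, pos in zip(returns, positions):
        equity *= 1.0 - cost * abs(pos - previous)  # pay for entering or exiting
        equity *= 1.0 + pos * r                     # earn the return while exposed
        previous = pos
    return equity


# One round trip: enter, earn 10%, exit. Costs are charged on both sides.
gross = net_equity([0.10, 0.0], [1, 0], cost_bps=0)
net = net_equity([0.10, 0.0], [1, 0], cost_bps=100)
```

Because the cost is paid only on position changes, a strategy with few trades loses little to costs, which is consistent with the gradual decline in Sharpe across the rows above.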

| Horizon | Accuracy | Strat % | Sharpe | MaxDD % | WR % |
|---|---|---|---|---|---|
| 5 days | 55.9% | 1584.0 | 0.819 | −69.3 | 69.5 |
| 10 days | 55.8% | 1361.9 | 0.788 | −68.8 | 67.6 |
| 15 days | 56.1% | 1100.7 | 0.746 | −69.1 | 73.4 |

| Period | Strat % | B&H % | Alpha % | Sharpe | MaxDD % |
|---|---|---|---|---|---|
| 2015–2019 | 129.9 | 119.1 | +10.9 | 0.684 | −49.3 |
| 2020–2022 | 112.0 | 168.3 | −56.3 | 0.524 | −68.5 |
| 2023–2025 | 402.5 | 584.5 | −181.9 | 1.135 | −54.0 |

| Ticker | MaxDD (Base) | MaxDD (vol-tgt) | Sharpe (Base) | Sharpe (vol-tgt) |
|---|---|---|---|---|
| SMCI | −84.5% | −35.4% | 0.833 | 0.575 |
| MSTR | −84.5% | −29.3% | 0.622 | 0.260 |
| NVDA | −64.5% | −24.3% | 0.923 | 0.819 |
| AMD | −63.8% | −22.8% | 1.018 | 0.832 |
| TSLA | −75.5% | −26.7% | 0.874 | 0.661 |
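
The trade-off in the table (much shallower drawdowns at the price of a lower Sharpe ratio) is typical of volatility targeting. The sketch below shows one common form, assuming exposure is scaled by the ratio of a target volatility to trailing realized volatility and capped at full exposure. The target (15% annualized), the 20-day window, the cap, and the flat warm-up period are all illustrative assumptions, not the paper's settings.

```python
import math
import random
import statistics


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def vol_target(returns, target_ann_vol=0.15, window=20, max_leverage=1.0,
               periods_per_year=252):
    """Scale each period's return by target vol / trailing realized vol.

    Only returns strictly before period t are used to size period t, so the
    weight is known in advance. The first `window` periods stay flat.
    """
    scaled = []
    for t, r in enumerate(returns):
        if t < window:
            scaled.append(0.0)  # warm-up: no volatility estimate yet
            continue
        realized = statistics.stdev(returns[t - window:t]) * math.sqrt(periods_per_year)
        weight = max_leverage if realized == 0 else min(max_leverage, target_ann_vol / realized)
        scaled.append(weight * r)
    return scaled


# Synthetic high-volatility series, for illustration only.
_rng = random.Random(7)
raw_returns = [_rng.gauss(0.001, 0.04) for _ in range(300)]
targeted_returns = vol_target(raw_returns)
raw_mdd = max_drawdown(raw_returns)
targeted_mdd = max_drawdown(targeted_returns)
```

Capping the weight at 1.0 means the overlay can only reduce exposure, which is one reason such a scheme may lower both drawdown and risk-adjusted return for strongly trending, volatile tickers.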

| Ticker | OOF Acc. % | Folds | Sharpe | WR % | Strat % | B&H % |
|---|---|---|---|---|---|---|
| SMCI | 55.6 | 99 | 0.816 | 65.5 | 2102.9 | 1493.0 |
| NVDA | 59.9 | 99 | 0.906 | 70.8 | 1920.5 | 3529.8 |
| MSTR | 52.4 | 99 | 0.754 | 70.6 | 1479.6 | 1133.3 |
| AMD | 53.8 | 99 | 0.759 | 66.7 | 1187.2 | 2262.9 |
| TSLA | 53.8 | 99 | 0.701 | 65.2 | 977.9 | 1931.8 |
| AVGO | 58.0 | 99 | 0.714 | 66.7 | 658.8 | 1112.7 |
| DKNG | 50.1 | 44 | 0.737 | 64.3 | 193.1 | 115.0 |
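
The fold counts and out-of-fold (OOF) accuracies above come from the walk-forward protocol described in the Introduction: expanding training windows with a purge gap equal to the 10-day prediction horizon. The following sketch shows one way such splits could be generated. The function name, the `min_train` parameter, and the equal-sized test blocks are assumptions for illustration, not the author's implementation.

```python
def expanding_window_folds(n_samples, n_folds, purge, min_train):
    """Yield (train_range, test_range) pairs for expanding-window validation.

    Each training window starts at index 0 and ends `purge` samples before
    its test block, so labels that look `purge` days ahead cannot overlap
    the test period.
    """
    test_size = (n_samples - min_train - purge) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        test_start = min_train + purge + k * test_size
        # The last fold absorbs any remainder.
        test_end = n_samples if k == n_folds - 1 else test_start + test_size
        train_end = test_start - purge
        yield range(0, train_end), range(test_start, test_end)


# Illustrative sizes only: roughly 2,700 trading days, 100 folds, 10-day purge.
folds = list(expanding_window_folds(n_samples=2700, n_folds=100, purge=10, min_train=500))
first_train, first_test = folds[0]
last_train, last_test = folds[-1]
```

Any per-fold preprocessing, such as the scaler fitting mentioned in the Introduction, would be fitted on `train_range` only and then applied to `test_range`.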
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Pagliaro, A. Regime-Aware LightGBM for Stock Market Forecasting: A Validated Walk-Forward Framework with Statistical Rigor and Explainable AI Analysis. Electronics 2026, 15, 1334. https://doi.org/10.3390/electronics15061334
