Modeling Time-Varying Volatility via Multi-Scale Structures and Dynamic Attention Networks: Evidence from High-Frequency Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Data and Preprocessing
2.1.1. Intraday Data and Trading Session
2.1.2. Daily Realized Measures
2.1.3. Descriptive Statistics
2.1.4. No Look-Ahead Design and OOS Protocol
2.1.5. Multifractal Dynamics Evidence and Motivation for Attention
2.1.6. Exploratory Visualization
2.2. Methodology
2.2.1. Variable Nomenclature (For Non-Finance Readers)
2.2.2. VaR as a Conditional Quantile and Pinball Loss
2.2.3. Approximation of Fractional Integration via HAR Tokenization
2.2.4. SA-HAR-J-Net: BiLSTM with Time Self-Attention and Bounded Output
2.2.5. Rolling OOS Training with Periodic Refitting
- Fix a start date t0, lookback length L, and refit interval K (in prediction days).
- For each prediction day t = t0, t0 + 1,…:
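The two steps above can be sketched as a walk-forward loop. This is a minimal illustration, not the authors' implementation: the `fit` and `predict` callables, the toy return series, and the choice to refit on all data strictly before day t are assumptions made here for the example.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def rolling_oos(
    returns: Sequence[float],
    t0: int,
    L: int,
    K: int,
    fit: Callable[[Sequence[float]], object],
    predict: Callable[[object, Sequence[float]], float],
) -> Tuple[List[float], List[int]]:
    """Walk forward from day t0, refitting every K prediction days."""
    if t0 < L:
        raise ValueError("t0 must leave at least L days of history")
    forecasts: List[float] = []
    refit_days: List[int] = []
    model = None
    for t in range(t0, len(returns)):
        if (t - t0) % K == 0:
            # Refit on data strictly before day t, so nothing from day t leaks in.
            model = fit(returns[:t])
            refit_days.append(t)
        # The forecast for day t only sees the L most recent past days.
        forecasts.append(predict(model, returns[t - L:t]))
    return forecasts, refit_days


# Toy stand-ins (hypothetical): the "model" is just the historical mean return.
toy_returns = [0.001 * ((i % 7) - 3) for i in range(100)]
forecasts, refit_days = rolling_oos(
    toy_returns, t0=60, L=60, K=20,
    fit=lambda hist: mean(hist),
    predict=lambda model, window: model,
)
```

With 100 toy days, t0 = 60 and K = 20, the loop produces 40 forecasts and refits on days 60 and 80. Between refits the parameters stay frozen while the input window keeps sliding forward.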
2.3. Rolling OOS Protocol and Reproducibility
2.3.1. SA-HAR-J-Net Configuration
2.3.2. Benchmark Model Specifications
3. Results
3.1. Main Predictive Performance

3.2. Multi-Quantile Analysis
3.3. Robustness: Historical Simulation Window Length
3.4. Robustness Checks
3.5. Interpretability and Economic Drivers
4. Discussion
4.1. Strong Persistence, Multi-Scale Structure, and Implications for Modeling
4.2. Why Adaptive Reweighting Can Matter for Tail Risk
4.3. Volatility-Scaled Bounding and Market Microstructure Considerations
4.4. Limitations and Future Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Supplementary Results: Tail Event Ranking Metrics
| Model | ROC-AUC (1% Tail Events) | PR-AUC (1% Tail Events) | ROC-AUC (5% Tail Events) | PR-AUC (5% Tail Events) |
|---|---|---|---|---|
| FHS | 0.871 | 0.265 | 0.766 | 0.254 |
| SA-HAR-J-Net | 0.834 | 0.240 | 0.779 | 0.248 |
| HAR-J | 0.874 | 0.155 | 0.776 | 0.236 |
| LGBM-Q | 0.708 | 0.071 | 0.658 | 0.139 |
| CNN-LSTM-Q | 0.698 | 0.043 | 0.712 | 0.111 |
| GARCH | 0.501 | 0.013 | 0.545 | 0.063 |
| GJR-GARCH | 0.501 | 0.013 | 0.545 | 0.063 |
| HS | 0.476 | 0.011 | 0.440 | 0.047 |
| Variable | Description | Estimate | Std. Error | t-Stat | p-Value |
|---|---|---|---|---|---|
| Intercept | Constant | $1.94 \times 10^{-5}$ | $7.61 \times 10^{-6}$ | 2.55 | 0.011 |
| clag1 | Continuous (daily) | 0.358 | 0.027 | 13.24 | <0.001 |
| croll5d | Continuous (5-day) | −0.027 | 0.054 | −0.50 | 0.618 |
| croll22d | Continuous (22-day) | 0.620 | 0.088 | 7.02 | <0.001 |
| jlag1 | Jump (daily) | 0.168 | 0.071 | 2.35 | 0.019 |
| jroll5d | Jump (5-day) | 0.736 | 0.203 | 3.63 | <0.001 |
| jroll22d | Jump (22-day) | −1.209 | 0.428 | −2.83 | 0.005 |
| retlag1 | Return lag 1 | −0.003 | 0.000 | −10.91 | <0.001 |
Appendix B. Supplementary Results: Sensitivity of Volatility-Scaled Cap Multiplier
| m | QL (1%) | Viol (1%) | QL (5%) | Viol (5%) |
|---|---|---|---|---|
| 4 | 0.073% | 1.83% | 0.200% | 3.88% |
| 5 | 0.069% | 0.91% | 0.200% | 4.11% |
| 6 (Baseline) | 0.071% | 0.46% | 0.199% | 3.65% |
| 7 | 0.071% | 0.46% | 0.197% | 4.11% |
| 8 | 0.071% | 0.46% | 0.196% | 4.11% |
References
1. Jorion, P. Value at Risk: The New Benchmark for Managing Financial Risk, 3rd ed.; McGraw-Hill: New York, NY, USA, 2007.
2. McNeil, A.J.; Frey, R.; Embrechts, P. Quantitative Risk Management: Concepts, Techniques and Tools; revised edition; Princeton University Press: Princeton, NJ, USA, 2015.
3. Cont, R. Empirical properties of asset returns: Stylized facts and statistical issues. Quant. Financ. 2001, 1, 223.
4. Engle, R.F. Risk and volatility: Econometric models and financial practice. Am. Econ. Rev. 2004, 94, 405–420.
5. Engle, R.F. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 1982, 50, 987–1007.
6. Bollerslev, T. Generalized autoregressive conditional heteroskedasticity. J. Econom. 1986, 31, 307–327.
7. Granger, C.W.J.; Joyeux, R. An introduction to long-memory time series models and fractional differencing. J. Time Ser. Anal. 1980, 1, 15–29.
8. Geweke, J.; Porter-Hudak, S. The estimation and application of long memory time series models. J. Time Ser. Anal. 1983, 4, 221–238.
9. Ding, Z.; Granger, C.W.J.; Engle, R.F. A long memory property of stock market returns and a new model. J. Empir. Financ. 1993, 1, 83–106.
10. Baillie, R.T.; Bollerslev, T.; Mikkelsen, H.O. Fractionally integrated generalized autoregressive conditional heteroskedasticity. J. Econom. 1996, 74, 3–30.
11. Andersen, T.G.; Bollerslev, T.; Diebold, F.X.; Labys, P. Modeling and forecasting realized volatility. Econometrica 2003, 71, 579–625.
12. Barndorff-Nielsen, O.E.; Shephard, N. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. J. R. Stat. Soc. Ser. B 2002, 64, 253–280.
13. Andersen, T.G.; Bollerslev, T.; Diebold, F.X.; Labys, P. The distribution of realized exchange rate volatility. J. Am. Stat. Assoc. 2001, 96, 42–55.
14. Barndorff-Nielsen, O.E.; Shephard, N. Power and bipower variation with stochastic volatility and jumps. J. Financ. Econom. 2004, 2, 1–37.
15. Corsi, F. A simple approximate long-memory model of realized volatility. J. Financ. Econom. 2009, 7, 174–196.
16. Patton, A.J.; Sheppard, K. Good volatility, bad volatility: Signed jumps and the persistence of volatility. Rev. Econ. Stat. 2015, 97, 683–697.
17. Hillebrand, E. Neglecting parameter changes in GARCH models. J. Econom. 2005, 129, 121–138.
18. McAleer, M.; Medeiros, M.C. Realized volatility: A review. Econom. Rev. 2008, 27, 10–45.
19. Mandelbrot, B.B. The variation of certain speculative prices. J. Bus. 1963, 36, 394.
20. Hurst, H.E. Long-term storage capacity of reservoirs. Trans. Am. Soc. Civ. Eng. 1951, 116, 770–799.
21. Peng, C.-K.; Buldyrev, S.V.; Havlin, S.; Simons, M.; Stanley, H.E.; Goldberger, A.L. Mosaic organization of DNA nucleotides. Phys. Rev. E 1994, 49, 1685–1689.
22. Kantelhardt, J.W.; Zschiegner, S.A.; Koscielny-Bunde, E.; Havlin, S.; Bunde, A.; Stanley, H.E. Multifractal detrended fluctuation analysis of nonstationary time series. Phys. A Stat. Mech. Its Appl. 2002, 316, 87–114.
23. Torrence, C.; Compo, G.P. A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc. 1998, 79, 61–78.
24. Bacry, E.; Delour, J.; Muzy, J.-F. Multifractal random walk. Phys. Rev. E 2001, 64, 026103.
25. Calvet, L.E.; Fisher, A.J. Multifractality in asset returns: Theory and evidence. Rev. Econ. Stat. 2002, 84, 381–406.
26. Carpenter, J.N.; Whitelaw, R.F.; Lynch, A.W. The real value of China’s stock market. J. Financ. Econ. 2021, 139, 679–696.
27. Hansen, P.R.; Lunde, A. Realized variance and market microstructure noise. J. Bus. Econ. Stat. 2006, 24, 127–161.
28. Brownlees, C.T.; Gallo, G.M. Financial econometric analysis at ultra-high frequency: Data handling concerns. Comput. Stat. Data Anal. 2006, 51, 2232–2245.
29. Amaya, D.; Christoffersen, P.; Jacobs, K.; Vasquez, A. Does realized skewness predict the cross-section of equity returns? J. Financ. Econ. 2015, 118, 135–167.
30. Sainath, T.N.; Vinyals, O.; Senior, A.; Sak, H. Convolutional, long short-term memory, fully connected deep neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, South Brisbane, Australia, 19–24 April 2015; pp. 4580–4584.
31. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inform. Process. Syst. 2017, 30.
33. Koenker, R.; Bassett, G. Regression quantiles. Econometrica 1978, 46, 33–50.
34. Christoffersen, P. Evaluating interval forecasts. Int. Econ. Rev. 1998, 39, 841–862.
35. Kupiec, P.H. Techniques for verifying the accuracy of risk measurement models. Division of research and statistics, division of monetary affairs. Fed. Reserve Board 1995, 95, 73–84.
36. Engle, R.F.; Manganelli, S. CAViaR: Conditional autoregressive value at risk by regression quantiles. J. Bus. Econ. Stat. 2004, 22, 367–381.
37. Andersen, T.G.; Bollerslev, T.; Diebold, F.X.; Ebens, H. The distribution of realized stock return volatility. J. Financ. Econ. 2001, 61, 43–76.
38. Boudoukh, J.; Richardson, M.; Whitelaw, R.F. The best of both worlds: A hybrid approach to calculating value at risk. Risk 1998, 11, 64–67.
39. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inform. Process. Syst. 2017, 30. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html (accessed on 6 April 2026).
40. Glosten, L.R.; Jagannathan, R.; Runkle, D.E. On the relation between the expected value and the volatility of the nominal excess return on stocks. J. Financ. 1993, 48, 1779–1801.
41. Barone-Adesi, G.; Giannopoulos, K.; Vosper, L. VaR without correlations for portfolios of derivative securities. J. Futures Mark. 1999, 19, 583–602.
42. Diebold, F.X.; Mariano, R.S. Comparing predictive accuracy. J. Bus. Econ. Stat. 2002, 20, 134–144.

| Variable | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| log_ret | 0.000320 | 0.019224 | −0.130337 | 0.157928 |
| rv5 | 0.000212 | 0.000396 | 0.000000 | 0.007538 |
| bv | 0.000195 | 0.000357 | 0.000000 | 0.007463 |
| J | 0.000023 | 0.000086 | 0.000000 | 0.002912 |
| rsk | 0.146477 | 0.857009 | −6.708204 | 6.708204 |
| rkt | 3.911952 | 2.085242 | 1.664353 | 45.000000 |
| Series | H(2) | $\alpha_{\min}$ | $\alpha_{\max}$ | $\Delta\alpha$ |
|---|---|---|---|---|
| ChiNext log realized volatility | 1.1547 | 0.9817 | 1.2259 | 0.2442 |
| Mean | Std. Dev. | Min | Max |
|---|---|---|---|
| 1.0360 | 0.0944 | 0.8625 | 1.2616 |
| Configuration | m | Scale Range (s) | q Range | H(2) | $\Delta\alpha$ | $R^2$ (q = 2) |
|---|---|---|---|---|---|---|
| Baseline | 1 | [16, 512] | [−5, 5] | 1.155 | 0.244 | 0.996 |
| Scale Range: [16, 256] | 1 | [16, 256] | [−5, 5] | 1.114 | 0.223 | 0.997 |
| Scale Range: [32, 512] | 1 | [32, 512] | [−5, 5] | 1.152 | 0.285 | 0.994 |
| Scale Range: [32, N/4] | 1 | [32, 728] | [−5, 5] | 1.158 | 0.216 | 0.996 |
| Number of Scales: 15 | 1 | [16, 512] | [−5, 5] | 1.160 | 0.148 | 0.997 |
| Number of Scales: 25 | 1 | [16, 512] | [−5, 5] | 1.130 | 0.180 | 0.995 |
| q Range: [−4, 4] | 1 | [16, 512] | [−4, 4] | 1.155 | 0.160 | 0.996 |
| q Range: [−6, 6] | 1 | [16, 512] | [−6, 6] | 1.155 | 0.311 | 0.996 |
| q Step: 0.5 | 1 | [16, 512] | [−5, 5] | 1.155 | 0.263 | 0.996 |
| Detrend Order: m = 2 | 2 | [16, 512] | [−5, 5] | 1.070 | 0.269 | 0.998 |
| Series | H(2) | $\alpha_{\min}$ | $\alpha_{\max}$ | $\Delta\alpha$ | $R^2$ (q = 2) |
|---|---|---|---|---|---|
| Original | 1.155 | 0.982 | 1.226 | 0.244 | 0.996 |
| Shuffled (permute time) | 0.519 | 0.463 | 0.557 | 0.095 | 0.997 |
| IAAFT surrogate | 1.129 | 1.064 | 1.157 | 0.093 | 0.998 |
| Symbol | Variable Name | Interpretation (Financial/Physical Meaning) |
|---|---|---|
| $P_{t,i}$ | Intraday price | Five-minute close price; raw high-frequency signal |
| $r_{t,i}$ | Intraday return | Five-minute log return; within-day movement intensity used to build realized measures |
| $r_t$ | Daily return | Close-to-close log return; target whose left-tail quantiles define VaR |
| $RV_t$ | Realized variance | Total within-day variation; how strongly prices fluctuated during day t |
| $BV_t$ | Bipower variation | Continuous-volatility proxy; less sensitive to jump-like spikes than $RV_t$ |
| $J_t$ | Jump proxy | Proxy of discontinuous jumps: excess of $RV_t$ over $BV_t$ |
| $RSK_t$ | Realized skewness | Asymmetry of intraday returns; captures downside-dominated risk when negative |
| $RKT_t$ | Realized kurtosis | Tail heaviness; larger values indicate fat tails and more extreme outliers |
| $\Delta^2 RSK_t$ | Skewness acceleration | Rapid changes in asymmetry; highlights turning points in tail shape |
| , , | HAR components | Daily/weekly/monthly aggregates approximating heterogeneous investor horizons (short/medium/long) |
| $a_\ell$ | Attention weight | Cognitive-inspired focus on salient historical days; a higher $a_\ell$ means higher relevance for tail risk |
| $\mathrm{cap}_t$ | Vol-scaled safety buffer | Volatility-scaled bound that stabilizes tail forecasts under noisy high-frequency inputs |
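The realized measures in the table above can be illustrated with textbook estimators computed from one day's five-minute log returns. This is a sketch under stated assumptions: the π/2 scaling for bipower variation, the truncation of the jump proxy at zero, and the skewness and kurtosis formulas (in the style of reference [29]) are standard choices that this excerpt does not confirm, and small-sample corrections are omitted.

```python
import math
from typing import Dict, Sequence


def realized_measures(r: Sequence[float]) -> Dict[str, float]:
    """Daily realized measures from one day's intraday log returns."""
    n = len(r)
    # Realized variance: sum of squared intraday returns.
    rv = sum(x * x for x in r)
    # Bipower variation: scaled sum of products of adjacent absolute returns.
    bv = (math.pi / 2.0) * sum(abs(r[i]) * abs(r[i - 1]) for i in range(1, n))
    # Jump proxy: excess of RV over BV, truncated at zero (assumed here).
    jump = max(rv - bv, 0.0)
    if rv > 0.0:
        rsk = math.sqrt(n) * sum(x ** 3 for x in r) / rv ** 1.5
        rkt = n * sum(x ** 4 for x in r) / rv ** 2
    else:
        # Skewness and kurtosis are undefined when there is no variation.
        rsk, rkt = float("nan"), float("nan")
    return {"RV": rv, "BV": bv, "J": jump, "RSK": rsk, "RKT": rkt}


# Hypothetical day with 45 bars in which a single bar carries all the movement.
one_spike = [0.0] * 44 + [0.01]
measures = realized_measures(one_spike)
```

For this single-spike day the formulas give RKT = n = 45 and RSK = √45 ≈ 6.708, the upper bounds of both statistics for n = 45. The descriptive statistics report exactly these extremes (45.000000 and ±6.708204), which is consistent with 45 intraday returns per day, although that reading is an inference rather than something stated here.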
| Model | Key Settings/Parameters |
|---|---|
| SA-HAR-J-Net (neural quantile model) | Rolling OOS expanding window; lookback L = 60; HAR tokens: daily/weekly (5)/monthly (22); per-token features (RV, BV, J, RSK, $\Delta^2 RSK_t$, RKT) (dim = 6); BiLSTM hidden = 64 (per direction); attention dim = 128; dropout = 0.3; lr = $5 \times 10^{-4}$; batch = 256; max epochs = 20 with patience = 20; refit every 20 prediction days; bounded VaR with cap multiplier m = 6.0 and $\varepsilon = 10^{-8}$. |
| HS [38] (nonparametric) | Historical simulation with rolling window W = 250 trading days for the main comparison (sensitivity also reported for W ∈ {125, 500}). |
| HAR-J [15] (linear quantile regression) | Quantile regression at α = 0.05 with HAR-style regressors: (clag1, croll5, croll22, jlag1, jroll5, jroll22, rlag1). |
| LGBM-Q [39] (tree-based quantile model) | LightGBM quantile objective (α = 0.05) using the HAR-J feature set; learning_rate = 0.05; num_leaves = 31; n_estimators = 2000; best_iteration = 57. |
| CNN–LSTM-Q [30] (deep sequence baseline) | Window size = 20; Conv1D filters = 32; LSTM units = 50/50; dropout = 0.20; lr = $1 \times 10^{-3}$; batch = 32; trained 17 epochs (quantile loss). |
| GARCH [6] (parametric volatility model) | Student-t GARCH(1,1): $\mu$ = 0.0421, $\omega$ = 0.0247, $\alpha_1$ = 0.0639, $\beta_1$ = 0.9318, $\nu$ = 6.8667. |
| GJR-GARCH [40] (parametric volatility model) | Student-t GJR-GARCH(1,1): $\mu$ = 0.0393, $\omega$ = 0.0262, $\alpha_1$ = 0.0605, $\gamma_1$ = 0.0098, $\beta_1$ = 0.9297, $\nu$ = 6.8808. |
| FHS [41] (semi-parametric) | Filtered historical simulation with a HAR-J-style volatility filtering stage (coefficients reported) and empirical tail estimation on standardized residuals. |
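The HAR-style regressors named in the table above (clag1, croll5, croll22, jlag1, jroll5, jroll22, rlag1) can be sketched as lagged values and trailing means. The construction below is an assumption for illustration: it takes `c` to be a continuous-variation series, `j` a jump series and `r` daily returns, and it defines each rolling term as the mean over the 5 or 22 days ending at day t − 1, so that no day-t information enters the features.

```python
from typing import Dict, Sequence


def har_features(
    c: Sequence[float], j: Sequence[float], r: Sequence[float], t: int
) -> Dict[str, float]:
    """HAR-style features for predicting day t from days t-1 and earlier."""
    if t < 22:
        raise ValueError("need at least 22 past days for the monthly term")

    def trailing_mean(x: Sequence[float], w: int) -> float:
        # Mean over days t-w, ..., t-1; day t itself is excluded.
        return sum(x[t - w:t]) / w

    return {
        "clag1": c[t - 1],
        "croll5": trailing_mean(c, 5),
        "croll22": trailing_mean(c, 22),
        "jlag1": j[t - 1],
        "jroll5": trailing_mean(j, 5),
        "jroll22": trailing_mean(j, 22),
        "rlag1": r[t - 1],
    }


# Toy inputs (hypothetical): a linear ramp makes the averages easy to check.
c_toy = [float(i) for i in range(30)]
j_toy = [0.0] * 30
r_toy = [0.0] * 21 + [-0.01] + [0.0] * 8
feats = har_features(c_toy, j_toy, r_toy, t=22)
```

The three horizons are the daily, weekly (5) and monthly (22) aggregates listed for the HAR tokens. A quantile regression or a tree-based quantile model can then be fitted on these seven columns.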
| Learning_Rate | Num_Leaves | n_Estimators | Best_Iteration | Best Val QL |
|---|---|---|---|---|
| 0.05 | 31 | 2000 | 57 | 0.001278 |
| Window | Epochs | Batch | Train Loss | Val Loss | lr | Conv Filters | LSTM1 | LSTM2 | Dropout |
|---|---|---|---|---|---|---|---|---|---|
| 20 | 17 | 32 | 0.01391 | 0.01000 | $1 \times 10^{-3}$ | 32 | 50 | 50 | 0.20 |
| Model | QL | Viol. Rate | ES | Clust. | Kupiec p | Christ. p |
|---|---|---|---|---|---|---|
| SA-HAR-J-Net | 0.197% | 3.88% | −3.55% | 2 | 26.43% | 15.95% |
| HS | 0.225% | 6.16% | −3.92% | 5 | 27.99% | 2.18% |
| HAR-J | 0.212% | 2.28% | −5.22% | 1 | 0.36% | 21.56% |
| LGBM-Q | 0.215% | 5.94% | −3.73% | 4 | 38.20% | 7.28% |
| CNN-LSTM-Q | 0.211% | 5.25% | −3.96% | 2 | 81.09% | 48.44% |
| GARCH | 0.256% | 0.68% | −9.63% | 0 | 0% | - |
| GJR-GARCH | 0.252% | 0.91% | −8.31% | 0 | 0% | - |
| FHS | 0.204% | 2.05% | −4.40% | 0 | 0.14% | - |
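The Kupiec p column in the table above can be checked with the unconditional coverage likelihood-ratio test. The sketch below uses only the standard library. The out-of-sample length of 438 days is not stated in this excerpt: it is inferred here from the violation rates, since 17/438 ≈ 3.88% and 27/438 ≈ 6.16%, and should be treated as an assumption.

```python
import math
from typing import Tuple


def kupiec_pof(violations: int, n_obs: int, alpha: float) -> Tuple[float, float]:
    """Kupiec proportion-of-failures test: returns (LR statistic, p-value)."""
    x, n = violations, n_obs
    if not 0 < x < n:
        raise ValueError("this sketch handles 0 < violations < n_obs only")
    p_hat = x / n
    # Likelihood ratio of the observed violation rate against the nominal alpha.
    lr = 2.0 * (
        x * math.log(p_hat / alpha)
        + (n - x) * math.log((1.0 - p_hat) / (1.0 - alpha))
    )
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Assumed counts: 17 violations in 438 days for SA-HAR-J-Net at alpha = 5%.
lr_stat, p_val = kupiec_pof(17, 438, 0.05)
# Assumed counts: 27 violations in 438 days for HS at alpha = 5%.
lr_hs, p_hs = kupiec_pof(27, 438, 0.05)
```

Under these assumed counts the test gives p ≈ 0.264 for SA-HAR-J-Net and p ≈ 0.280 for HS, in line with the reported 26.43% and 27.99%. The models with a Kupiec p near zero are those whose violation rates sit far from the 5% target.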
| Model 1 | Model 2 | DM Statistic | p-Value |
|---|---|---|---|
| HAR-J | LGBM-Q | −0.2509 | 0.8020 |
| LGBM-Q | CNN-LSTM-Q | 0.7073 | 0.4797 |
| HAR-J | HS | −0.6669 | 0.5052 |
| SA-HAR-J-Net | HAR-J | −1.78 | 0.076 |
| SA-HAR-J-Net | FHS | −0.79 | 0.431 |
| SA-HAR-J-Net | LGBM-Q | −1.14 | 0.255 |
| SA-HAR-J-Net | CNN-LSTM-Q | −0.69 | 0.491 |
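The pairwise comparisons above follow the Diebold–Mariano approach [42]. A minimal one-step-ahead version is sketched below. It assumes the loss differential is defined as Model 1 minus Model 2, ignores autocovariance terms, and uses a normal approximation for the p-value; the paper's exact variant (for example a small-sample correction or a t reference distribution) is not given in this excerpt.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def diebold_mariano(
    loss_a: Sequence[float], loss_b: Sequence[float]
) -> Tuple[float, float]:
    """DM statistic and two-sided normal p-value for one-step-ahead losses."""
    if len(loss_a) != len(loss_b):
        raise ValueError("loss series must have equal length")
    # Loss differential: negative values favour model A.
    d = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(d)
    # statistics.variance is the sample variance (n - 1 denominator).
    dm = mean(d) / math.sqrt(variance(d) / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value


def two_sided_normal_p(z: float) -> float:
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Toy daily losses (hypothetical) for two competing forecasts.
dm_stat, dm_p = diebold_mariano([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0])
p_check = two_sided_normal_p(-1.78)
```

Under this sign convention a negative statistic means the first model has the lower average loss. The normal approximation maps a statistic of −1.78 to a p-value of about 0.075, close to the 0.076 reported for SA-HAR-J-Net against HAR-J.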
| Regressor | Coef. | Std. Err. | t-Stat | p-Value |
|---|---|---|---|---|
| clag1 | −4.3084 | 3.6385 | −1.1841 | 0.2365 |
| croll5 | −22.8610 | 7.4711 | −3.0599 | 0.0022 |
| croll22 | −58.6138 | 15.3361 | −3.8219 | 0.0001 |
| jlag1 | −31.5226 | 7.3358 | −4.2971 | 0.0000 |
| jroll5 | 26.4945 | 30.2555 | 0.8757 | 0.3813 |
| jroll22 | 178.4416 | 67.7801 | 2.6327 | 0.0085 |
| rlag1 | 0.1777 | 0.0388 | 4.5778 | 0.0000 |
| Model | $\mu$ | $\omega$ | $\alpha_1$ | $\gamma_1$ | $\beta_1$ | $\nu$ |
|---|---|---|---|---|---|---|
| GARCH(1,1)-t | 0.0421 | 0.0247 | 0.0639 | - | 0.9318 | 6.8667 |
| GJR-GARCH(1,1)-t | 0.0393 | 0.0262 | 0.0605 | 0.0098 | 0.9297 | 6.8808 |
| Model | Viol (E) | QL (E) | ES (E) | Viol (L) | QL (L) | ES (L) |
|---|---|---|---|---|---|---|
| SA-HAR-J-Net | 4.57% | 0.00178 | −0.0275 | 3.20% | 0.00216 | −0.0470 |
| HS | 7.76% | 0.00219 | −0.0355 | 4.57% | 0.00231 | −0.0456 |
| HAR-J | 2.28% | 0.00209 | −0.0465 | 2.28% | 0.00215 | −0.0579 |
| LGBM-Q | 5.94% | 0.00206 | −0.0345 | 5.94% | 0.00225 | −0.0401 |
| CNN-LSTM-Q | 5.48% | 0.00202 | −0.0370 | 5.02% | 0.00219 | −0.0425 |
| GARCH | 0.91% | 0.00243 | −0.0794 | 0.46% | 0.00268 | −0.1303 |
| GJR-GARCH | 0.91% | 0.00241 | −0.0794 | 0.91% | 0.00264 | −0.0869 |
| FHS | 1.37% | 0.00190 | −0.0337 | 2.74% | 0.00218 | −0.0492 |
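The Viol and QL columns above can be illustrated with the pinball (quantile) loss of Koenker and Bassett [33] and a simple violation count. The sketch assumes that a VaR forecast is expressed as a return quantile (a negative number for the left tail) and that a violation is a day whose realized return falls below it; the sign convention used in the paper is not shown in this excerpt.

```python
from typing import Dict, Sequence


def pinball_loss(r: float, q: float, alpha: float) -> float:
    """Pinball loss of quantile forecast q for realized return r."""
    u = r - q
    # Returns below the forecast are weighted by (1 - alpha), others by alpha.
    return u * (alpha - (1.0 if u < 0 else 0.0))


def evaluate_var(
    returns: Sequence[float], var_forecasts: Sequence[float], alpha: float
) -> Dict[str, float]:
    """Average quantile loss (QL) and violation rate (Viol) of VaR forecasts."""
    if len(returns) != len(var_forecasts):
        raise ValueError("returns and forecasts must have equal length")
    n = len(returns)
    pairs = list(zip(returns, var_forecasts))
    ql = sum(pinball_loss(r, q, alpha) for r, q in pairs) / n
    viol = sum(1 for r, q in pairs if r < q) / n
    return {"QL": ql, "Viol": viol}


# Toy example (hypothetical): a constant -2% VaR forecast at alpha = 5%.
toy_rets = [0.01, -0.03, 0.00, -0.01]
toy_vars = [-0.02, -0.02, -0.02, -0.02]
scores = evaluate_var(toy_rets, toy_vars, alpha=0.05)
```

In the toy data one of four days breaches the forecast, so Viol = 0.25 and QL = 0.003125. If the two halves in the table are equal in length, averaging the SA-HAR-J-Net half-sample losses, (0.00178 + 0.00216)/2 = 0.00197, agrees with its full-sample QL of 0.197%.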
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Zhang, K.; Wu, S.; Zhu, D. Modeling Time-Varying Volatility via Multi-Scale Structures and Dynamic Attention Networks: Evidence from High-Frequency Data. Mathematics 2026, 14, 1257. https://doi.org/10.3390/math14081257
