Article

Deep Learning and Transformer Architectures for Volatility Forecasting: Evidence from U.S. Equity Indices

by
Gergana Taneva-Angelova
* and
Dimitar Granchev
Faculty of Economics and Social Sciences, University of Plovdiv Paisii Hilendarski, 4000 Plovdiv, Bulgaria
*
Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2025, 18(12), 685; https://doi.org/10.3390/jrfm18120685
Submission received: 4 November 2025 / Revised: 23 November 2025 / Accepted: 26 November 2025 / Published: 2 December 2025
(This article belongs to the Special Issue Quantitative Methods for Financial Derivatives and Markets)

Abstract

Volatility forecasting plays a crucial role in financial markets, portfolio management, and risk control. Classical econometric models such as GARCH, ARIMA, and HAR-RV are widely used but face limitations in capturing the nonlinear and regime-dependent dynamics of financial volatility. This study compares traditional econometric models (HAR-RV, ARIMA, GARCH) with deep learning (DL) architectures (LSTM, CNN-LSTM, PatchTST-lite, and Vanilla Transformer) in forecasting realized variance (RV) for major U.S. equity indices (S&P 500, NASDAQ 100, and the Dow Jones Industrial Average) over the period 2000–2025. RV is used as the dependent variable because it is a standard model-free proxy for market volatility. Forecast accuracy is evaluated across forecast horizons of h = 1, 5, 22 days using QLIKE, RMSE, and MAE, along with Diebold–Mariano (DM) significance tests and overfitting diagnostics. Results show that Transformer-based models achieve the lowest errors and strongest generalization, particularly at short horizons and during volatile periods. Overall, the findings highlight the growing advantage of AI-driven models in delivering stable and economically meaningful volatility forecasts, supporting more effective portfolio allocation and risk management—especially in environments marked by rapid market shifts and structural breaks.
JEL Classification:
C22; C45; C58; G17; G11

1. Introduction

Volatility forecasting is a fundamental component of modern financial analysis, as it underpins risk management, asset allocation, derivative pricing, and monetary policy decisions. Accurate volatility forecasts help investors and policymakers anticipate uncertainty and make informed strategic choices.
Volatility plays a central role in financial economics because it directly shapes key mechanisms of risk and asset valuation. First, volatility determines the risk premium investors require as compensation for uncertainty, thereby influencing expected returns across asset classes. Second, precise volatility estimates are essential for widely used pricing frameworks—including CAPM, Black-Scholes, and stochastic-volatility models—where volatility enters as a core input. Third, volatility determines the efficiency of portfolio diversification and allocation through mean-variance optimization, making its forecast quality critical for constructing stable, risk-adjusted portfolios. In risk management, volatility is an explicit component of Value-at-Risk (VaR) and Expected Shortfall (ES). Forecasting errors therefore translate directly into miscalibrated capital buffers, producing either understated downside risk or overly conservative capital requirements. As an indicator of market stress, volatility also shapes investor psychology, market liquidity, and transaction costs. Rising volatility widens bid-ask spreads, increases slippage, and reduces market depth. Moreover, volatility acts as a key indicator of market regimes–distinguishing tranquil from turbulent periods–and is widely employed in Markov-switching models to identify state transitions. Persistent increases in volatility help flag crisis conditions and structural breaks, including events such as the Global Financial Crisis (GFC), COVID-19, and inflation shocks. Taken together, these properties illustrate why improvements in volatility forecasting carry both statistical and economic relevance: they enhance pricing accuracy, strengthen portfolio allocation, improve risk measurement, and provide earlier detection of regime shifts and systemic stress.
Traditionally, econometric approaches such as the Autoregressive Integrated Moving Average (ARIMA) (Box & Jenkins, 1970) and the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) (Engle, 1982; Bollerslev, 1986) have been widely used, valued for their simplicity and theoretical grounding. ARIMA captures linear dependencies on past values and forecast errors, while GARCH models time-varying variance and the well-known clustering of volatility in financial markets. Extensions such as EGARCH introduce asymmetry by recognizing that negative shocks can increase volatility more strongly than positive ones (Nelson, 1991). However, their parametric and largely linear structure makes them poorly suited for the empirical properties of realized volatility, which include long memory, nonlinear adjustments, and regime-dependent behavior documented extensively in the literature (Andersen et al., 2003; O. E. Barndorff-Nielsen & Shephard, 2002; Andersen et al., 2001). By examining whether AI-based architectures respond more effectively to shifts in volatility regimes than classical econometric models, this study addresses a key economic question: to what extent can model architecture enhance the detection of structural changes, information flows, and market conditions that influence risk-related decisions? This perspective ensures that the comparison across model classes reflects not only statistical accuracy but also the mechanisms through which markets process information.
The model classes considered in this study correspond to established market mechanisms. ARIMA captures short-run autocorrelation in volatility proxies, GARCH models conditional variance and leverage effects, and HAR-RV reproduces long-memory behavior by combining daily, weekly, and monthly components. DL architectures extend these ideas by modelling nonlinear interactions and structural breaks, while Transformer-based models employ self-attention to identify relevant segments of volatility history, making them well suited to abrupt shifts and heterogeneous information flows.
Although models such as Stochastic Volatility, Realized GARCH, GARCH-MIDAS, and HAR-J are well established in the literature, they are not included here to maintain methodological consistency. These models rely on heterogeneous input structures—such as intraday realized measures, jump components, or mixed-frequency information—which are not directly comparable to the daily data framework used in this study. Moreover, the primary objective of this paper is to compare standard econometric benchmarks with deep-learning and Transformer architectures. Introducing specialized extensions such as MIDAS or Realized GARCH would substantially broaden the scope of the analysis and reduce the clarity and interpretability of the comparison. Importantly, several of the structural features targeted by these advanced econometric models are already absorbed endogenously by modern AI architectures, particularly those with attention mechanisms. For these reasons, the analysis focuses on a consistent and comparable set of models and leaves the exploration of such extensions for future research.
As a primary analytical contribution, the study applies a consistent and unified evaluation framework that implements the same forecasting, generalization, and overfitting diagnostics across classical, deep-learning, and Transformer models. To the best of our knowledge, existing studies typically evaluate these model classes separately, over different time periods, or under non-comparable criteria, which limits direct comparability. Here, a single methodological protocol is applied uniformly to all architectures, enabling a more structured and comparable assessment of their performance under different market conditions.
A further methodological contribution is the use of a multi-estimator framework for realized volatility. Instead of relying on a single proxy, the analysis evaluates all model classes across three established estimators—Close-to-Close, Parkinson, and Yang-Zhang. Based on the available literature, many empirical studies rely on only one estimator, reducing robustness and comparability. By applying an identical forecasting and validation protocol to all three RV measures, the study provides a more robust and estimator-independent assessment of model accuracy and illustrates how different architectures respond to intraday variation, overnight movements, and combined sources of volatility.
An additional contribution is the enriched economic interpretation of the forecasts. The analysis shows how models behave during calm periods—when volatility evolves smoothly—and during crisis regimes, when it becomes sharply nonlinear and shocks intensify. This contrast clarifies which model architectures deliver more timely responses, more stable risk assessments, and more accurate uncertainty estimates under different market conditions. In this way, forecasting accuracy is not evaluated in isolation but is placed within the broader decision-making context relevant for risk managers, investors, and institutional participants.
The remainder of this paper is structured as follows: Section 2 reviews the relevant literature; Section 3 presents the data and methodology; Section 4 reports the empirical results; Section 5 provides the economic interpretation of forecast results; and Section 6 concludes.

2. Literature Review

Volatility forecasting has traditionally relied on classical econometric methods such as GARCH, ARIMA, and their extensions. Reviewing both econometric and deep learning perspectives provides a fuller understanding of how different models capture volatility dynamics. This section reviews key empirical contributions in both volatility forecasting and general time-series modelling, summarised in Table 1. The empirical literature on volatility can be broadly divided into two main strands. The first encompasses forecasting-oriented studies, which evaluate statistical models such as ARIMA, GARCH, and HAR-RV, as well as more recent deep-learning architectures, with a primary focus on predictive accuracy. The second strand examines the economic nature of volatility itself, including risk premia, volatility-of-volatility, uncertainty shocks, and their macro-financial transmission mechanisms (Engle et al., 2013). Although these two lines of research are related, they address distinct questions. The present study belongs to the forecasting-oriented literature but also incorporates economic motivation from the second strand in order to strengthen the interpretation and economic relevance of the results.
The GARCH family remains central to volatility modeling and has been extensively applied in both standalone and hybrid forecasting frameworks. Extensions such as EGARCH, GJR-GARCH, and GARCH-MIDAS capture asymmetry and the influence of mixed-frequency macroeconomic factors (Ersin & Bildirici, 2023; Asgharian et al., 2013; Virk et al., 2024). Recent studies suggest that incorporating nonlinear elements through neural-network components, such as LSTM, can further enhance predictive accuracy, significantly improving long-horizon forecasts (Ersin & Bildirici, 2023). Empirical evidence generally confirms that GARCH-type models outperform linear specifications such as ARIMA when forecasting stock-return variance (Asgharian et al., 2013). Including macroeconomic variables can enhance long-horizon forecasts but may also lead to overfitting or data-mining bias under certain conditions (Virk et al., 2024). Recent findings by Bildirici and Ersin confirm that adding LSTM components to GARCH-MIDAS frameworks substantially improves long-horizon performance (Ersin & Bildirici, 2023). Although ARIMA remains effective for modelling linear dynamics, it struggles to handle abrupt structural changes (Ferreira & Medeiros, 2021). Nonlinear models such as LSTM can adapt better during such periods, though ARIMA may still outperform LSTM in certain short-term contexts (Harikumar & Muthumeenakshi, 2025).
Recurrent neural networks, especially long short-term memory (LSTM) architectures (Hochreiter & Schmidhuber, 1997; Greff et al., 2017), have shown strong predictive accuracy by learning longer-term dependencies in financial time-series data (Hochreiter & Schmidhuber, 1997; Greff et al., 2017; Z. Zhang et al., 2025). LSTM networks address the vanishing gradient problem through memory cells that preserve long-horizon dynamics, a property relevant to persistent volatility (Hochreiter & Schmidhuber, 1997; Greff et al., 2017). Building on this foundation, hybrid convolutional LSTM (CNN-LSTM) models (Z. Zhang et al., 2025; Shi et al., 2015; Borovykh et al., 2017) combine convolutional layers, which extract local features, with recurrent layers that capture temporal structure. This design allows the model to represent short-term movements alongside long-run dependencies.
Transformer-based (TRF) architectures (Vaswani et al., 2017) have recently advanced sequence modeling by replacing recurrence with attention mechanisms. This approach allows the model to analyze relationships across all time steps simultaneously, making Transformers efficient at capturing both local and global patterns in financial data. For time-series forecasting, the PatchTST framework partitions the input into overlapping patches before applying attention, enhancing computational efficiency for longer-horizon predictions (Nie et al., 2023). While machine-learning studies document notable forecasting gains, most evaluate narrow model sets, short samples, or single horizons, leaving open important questions about their economic relevance and robustness under varying market regimes (Chun et al., 2025; Souto & Moradi, 2024; Zeng et al., 2023).
Recent evidence shows that Transformer architectures can successfully forecast synthetic Ornstein–Uhlenbeck processes and daily S&P 500 dynamics (Brugiere & Turinici, 2025). Prior work suggests that Transformers often predict log-quadratic variation more accurately than daily returns (Brugiere & Turinici, 2025; Souto & Moradi, 2024). Modern Transformer variants improve scalability and long-horizon accuracy (Nie et al., 2023). In financial applications, models such as PatchTST, Informer, and Autoformer frequently outperform first-generation Transformers (Nie et al., 2023; Souto & Moradi, 2024; Zeng et al., 2023). Zeng et al. also demonstrate that hybrid CNN-Transformer architectures outperform both traditional deep learning and econometric models when applied to financial time series (Zeng et al., 2023). A Quantformer model combining sentiment analysis and investor factor construction has been shown to outperform other quantitative factor models in stock-price prediction (Z. Zhang et al., 2025).
Convolutional neural networks (CNNs) are particularly effective at capturing short-range patterns in time-series data, while Transformers are more effective at modeling long-range dependencies. Recent evidence shows that hybrid CNN-Transformer architectures can combine the advantages of local feature extraction and long-range dependency modelling, outperforming traditional deep-learning and econometric models in financial time-series forecasting (Zeng et al., 2023). More broadly, machine-learning (ML) models tend to outperform classical volatility models across horizons, though real trading performance may be affected by transaction costs and implementation frictions (Chun et al., 2025). Chun, Cho, and Ryu also find that ML-based volatility models outperform GARCH and HAR-RV benchmarks across multiple horizons, especially when used for volatility-timing strategies (Chun et al., 2025).
Recent studies on realized volatility estimation show that multi-grid techniques reduce microstructure noise (L. Zhang et al., 2005), while combining multiple estimators enhances overall forecast accuracy (Patton & Sheppard, 2015). Jump-robust measures such as power and bipower variation further enhance volatility estimation and risk measurement (Patton & Sheppard, 2015; Tauchen & Zhou, 2011), reflecting the ongoing development of more accurate and robust realized-volatility estimators. Parallel to these econometric developments, recent work has compared ML models (such as LSTM and CNN) with classical approaches for realized-volatility forecasting. However, despite substantial interest in machine-learning applications, empirical research applying Transformer-based architectures specifically to realized-variance forecasting remains limited. This study contributes to this emerging strand by evaluating Transformer, LSTM and CNN-LSTM models alongside representative classical benchmarks (ARIMA, GARCH, HAR-RV) across multiple horizons and major U.S. indices over a long sample (2000–2025).
While the literature provides important insights into volatility modelling, several gaps remain. HAR-RV and GARCH models perform well for short horizons but often weaken during periods of structural change, indicating the need for models that can capture nonlinear and regime-dependent behavior. Recent machine-learning studies report accuracy gains, yet many analyse isolated model families or focus on single indices, limiting cross-model comparability across horizons and market conditions. Although Transformer-based architectures have shown strong results in general time-series forecasting, their application to realized-variance forecasting remains limited. These gaps motivate our unified framework, which jointly evaluates classical econometric, deep-learning, and Transformer models across multiple indices and forecast horizons, enabling a consistent comparison and a clearer economic interpretation of the observed performance differences.

3. Materials and Methods

This section outlines the data sources, preprocessing procedures, and modeling design employed in the study. It describes how RV measures were constructed, how datasets were synchronized across indices, and how econometric, DL, and TRF frameworks were trained and evaluated under a unified experimental setup.

3.1. Data Collation and Preprocessing

The empirical analysis is based on daily price data for three major U.S. equity indices (the S&P 500, NASDAQ 100 and DJIA), covering the period from January 2000 to August 2025 (Table 2). This long historical window provides sufficient depth for constructing multi-horizon volatility forecasts and ensures stable estimation of realized-variance measures. It also captures a wide range of market environments, allowing models to be evaluated under both low- and high-volatility conditions.
The datasets were obtained from Investing.com (2025) (https://www.investing.com/, accessed on 5 September 2025), and include daily open, high, low, and close (OHLC) prices, ensuring full consistency across indices. The series contain 6450 daily observations per index, spanning nearly 25 years of trading data (Table 2). After applying a 120-day rolling lookback window, around 6300 supervised sequences were produced for three horizons—1, 5, and 22 days (h = 1, 5, 22), representing daily, weekly, and monthly intervals. Logarithmic returns were computed from daily closing prices to capture compounding effects and short-term dynamics.
RV was estimated using three range-based measures: Close-to-Close, Parkinson (Parkinson, 1980) and Yang–Zhang (Yang & Zhang, 2000). A logarithmic transformation of realized variance (log(RV)) was then applied to stabilize variance, improving model stability and convergence, and forms the target variable for all models.
While realized variance is theoretically defined as the sum of squared intraday returns, we rely on range-based estimators due to the absence of consistent intraday data for the entire period and across all indices. Range-based realized measures (Parkinson, 1980) have been shown to be high-frequency-efficient and less noisy than close-to-close variance estimates, offering a practical and theoretically justified proxy when only daily OHLC data are available. Consistent with the realized-volatility literature, we therefore treat our dependent variable as a range-based proxy for integrated volatility (L. Zhang et al., 2005; Patton & Sheppard, 2009; O. Barndorff-Nielsen & Shephard, 2004). RV is chosen as the dependent variable because it provides a model-free, data-driven measure of integrated volatility whose asymptotic and finite-sample properties are well established in empirical finance (Andersen et al., 2003; O. E. Barndorff-Nielsen & Shephard, 2002; Andersen et al., 2001). Following Andersen, Bollerslev, Diebold and Labys (Andersen et al., 2003; Andersen et al., 2001) and O. E. Barndorff-Nielsen and Shephard (2002), RV provides a more accurate benchmark of integrated volatility than squared returns or parametric GARCH-implied variance. Using three complementary RV definitions improves measurement robustness by capturing multiple dimensions of price variation and reducing estimator-specific bias (L. Zhang et al., 2005; Andersen et al., 2003; O. Barndorff-Nielsen & Shephard, 2004).
The choice of the 2000–2025 period is motivated by both statistical and economic considerations. Statistically, a long sample is essential for training deep learning models, and evaluating multi-horizon forecasts, and ensuring stable estimation of realized-variance measures. Economically, this period encompasses several distinct volatility regimes—from the dot-com aftermath and the Global Financial Crisis to the COVID-19 shock and the post-pandemic tightening cycle—allowing us to examine whether AI-based architectures adapt more effectively to structural changes than classical econometric models.
Data were split 80/20 into training and testing sets, with the test sample beginning in August 2020 to capture major high-volatility episodes such as the COVID-19 crisis, the 2022 inflation shock, and the 2023–2025 market adjustment.
Before constructing the target variable, the raw series were cleaned and aligned chronologically. Missing or duplicated rows were removed to ensure consistent and gap-free returns, as such issues can distort both log-return and RV calculations. Daily logarithmic returns were calculated as follows:
r_t = \ln\left( P_t / P_{t-1} \right)
where P_t is the daily closing price. In financial econometrics, realized variance (RV) is theoretically defined as the sum of squared intraday log-returns (Andersen et al., 2003; O. E. Barndorff-Nielsen & Shephard, 2002; Patton, 2011):
RV_t = \sum_{i=1}^{M_t} r_{t,i}^2
Because intraday data are unavailable for the full 2000–2025 sample, RV is approximated using daily information. The simplest approximation is the squared daily log-return (Patton, 2011):
RV_t = r_t^2
This approximation captures daily price variability and is widely used in long-horizon studies when intraday data are unavailable. Although high-frequency data can yield finer estimates, daily squared returns offer a reliable basis for long-horizon analysis. To improve measurement quality, two additional range-based estimators are employed—the Parkinson (Parkinson, 1980) measure and the Yang–Zhang (Yang & Zhang, 2000) estimator—which use daily high, low, open, and close prices to capture intraday variation when true intraday data are unavailable. These estimators are known to be high-frequency-efficient and less noisy than close-to-close volatility, making them suitable for long-sample forecasting studies (Parkinson, 1980; L. Zhang et al., 2005; O. Barndorff-Nielsen & Shephard, 2004). Variance is used instead of volatility because it is additive and provides a more stable modeling scale. Following Andersen et al. (2003) and Patton (2011), RV is theoretically defined as the sum of squared intraday returns. Since intraday observations are unavailable, we approximate this quantity using the three daily OHLC-based estimators described above. To stabilize the distribution and reduce skewness, we apply the logarithmic transformation (Corsi, 2005):
\log RV_t = \log\left( r_t^2 \right)
which becomes the dependent variable for all models:
y_t = \log RV_t
This transformation yields the log-realized variance, which normalizes scale, stabilizes variance, and produces an approximately Gaussian distribution that enhances model performance across both classical and deep-learning frameworks. Moreover, because RV_t is strictly positive, the logarithmic transformation is always well-defined, and forecast values can be mapped back to the variance scale through exponentiation (Andersen et al., 2003; Andersen et al., 2001; Corsi, 2005). After generating predictions in the log-variance domain, forecasts are converted to the RV domain using:
\widehat{RV}_{t+h} = \exp\left( \log \widehat{RV}_{t+h} \right)
where t is the current time index and t + h denotes the future point corresponding to an h-step-ahead forecast.
Forecasting log(RV) rather than raw RV is standard in modern financial econometrics due to its improved statistical properties—lower skewness, stabilized variance, and better learning behavior—which contribute to higher predictive accuracy across linear and nonlinear specifications (Andersen et al., 2003; O. E. Barndorff-Nielsen & Shephard, 2002; Corsi, 2005). The resulting forecasts remain interpretable for applications such as volatility targeting and Value-at-Risk estimation.
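To make the estimator pipeline concrete, the following sketch computes the Close-to-Close (squared log-return) and Parkinson proxies, applies the log transform, and maps forecasts back to the variance scale. This is an illustrative pure-Python sketch rather than the authors' code; the function names and the small floor `eps` (guarding log(0) on zero-return days) are our own, and the Yang–Zhang estimator, which additionally combines overnight and open-to-close components, is omitted for brevity.

```python
import math

def rv_close_to_close(closes):
    """Squared daily log-returns: RV_t = r_t^2 with r_t = ln(P_t / P_{t-1})."""
    rets = [math.log(closes[i] / closes[i - 1]) for i in range(1, len(closes))]
    return [r * r for r in rets]

def rv_parkinson(highs, lows):
    """Parkinson (1980) range-based estimator: (ln(H_t / L_t))^2 / (4 ln 2)."""
    return [math.log(h / l) ** 2 / (4.0 * math.log(2.0)) for h, l in zip(highs, lows)]

def to_log_rv(rv, eps=1e-12):
    """Modelling target y_t = log(RV_t); eps is a numerical floor against log(0)."""
    return [math.log(max(v, eps)) for v in rv]

def from_log_rv(log_rv):
    """Back-transform forecasts to the variance scale via exponentiation."""
    return [math.exp(v) for v in log_rv]
```

Because RV_t = 0 on a zero-return day, the floor before taking logs is a practical necessity in any implementation of this transform.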
To ensure comparability across model classes, all input features were standardized using training-sample statistics only. The modeling framework employed a 120-day rolling lookback window and three forecast horizons (h = 1, 5, 22), capturing both short and medium-term volatility persistence. Predicted values were transformed back from the logarithmic to the variance scale for evaluation in risk management contexts (Table 3). This multi-horizon rolling-window setup is well established in realized-variance forecasting (Taylor, 2005; Andersen et al., 2003; O. E. Barndorff-Nielsen & Shephard, 2002).
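The rolling-window setup described above can be sketched as follows. This is our illustrative reconstruction (function and parameter names are not from the paper), showing how a 120-day lookback and an h-step-ahead target produce supervised pairs from a log(RV) series.

```python
def make_sequences(y, lookback=120, horizon=1):
    """Build (input window, h-step-ahead target) pairs from a series y.

    Mirrors the paper's setup: a 120-day lookback and horizons h in {1, 5, 22}.
    """
    X, targets = [], []
    for i in range(lookback, len(y) - horizon + 1):
        X.append(y[i - lookback:i])        # past `lookback` observations
        targets.append(y[i + horizon - 1])  # value h steps ahead
    return X, targets
```

Applied to roughly 6450 observations per index, this construction yields about 6300 sequences per horizon, consistent with the counts reported in Section 3.1.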
Table 3 summarises the main model families considered in the study. The comparison highlights several methodological trade-offs: classical econometric models remain transparent and efficient but struggle with nonlinear dynamics and regime changes; DL models provide greater flexibility but require more data and offer limited interpretability; and TRF architectures deliver strong long-range modelling capacity but are still relatively new in volatility forecasting. To mitigate the black-box limitations of DL and TRF models, the study adopts a transparent and reproducible framework, combining volatility decomposition (Close-to-Close, Parkinson, Yang–Zhang), interpretable loss metrics (QLIKE, MAE, RMSE), and visual diagnostics such as loss curves and true-vs-forecast panels. This ensures that the empirical results remain robust, comparable, and economically meaningful.
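For concreteness, the three loss metrics can be computed as in the sketch below. We assume the standard Patton (2011) form of QLIKE, RV/F − log(RV/F) − 1, evaluated on the variance scale; the paper does not spell out its exact implementation, so treat this as an illustration rather than the authors' code.

```python
import math

def qlike(actual, forecast):
    """Mean QLIKE loss (Patton, 2011 form): RV/F - log(RV/F) - 1, zero at a perfect forecast."""
    terms = [a / f - math.log(a / f) - 1.0 for a, f in zip(actual, forecast)]
    return sum(terms) / len(terms)

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
```

QLIKE is asymmetric by design: it penalizes under-prediction of variance more heavily than over-prediction, which is why it is preferred for risk-management applications.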
Figure 1 presents the structured workflow adopted in this study, integrating classical econometric methods with contemporary ML and TRF architectures. The framework ensures methodological rigor and comparability across model families. It shows how econometric and AI-based approaches complement each other in capturing volatility dynamics and translating empirical evidence into actionable insights for financial decision-making and policy applications.

3.2. Methodology

The methodology section outlines the analytical framework and modeling strategies employed in this study. It details the structure, estimation procedures, and validation methods applied to three major model groups used to forecast RV across major U.S. indices.

3.2.1. Classical Econometric Models

The ARIMA model (Box & Jenkins, 1970) is one of the most established tools in financial time-series forecasting. It combines an autoregressive (AR) term, which captures dependence on past observations, a moving average (MA) term, which reflects past forecast errors, and an integration (I) term that ensures stationarity through differencing. Following the classical Box-Jenkins (Box & Jenkins, 1970) specification, a general non-seasonal ARIMA(p, d, q) process can be written as:
\phi(L)\,(1 - L)^d\, y_t = c + \theta(L)\,\varepsilon_t
\varepsilon_t \sim \mathrm{i.i.d.}(0, \sigma^2)
where L denotes the lag operator, \phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p represents the autoregressive polynomial of order p, (1 - L)^d is the differencing operator applied d times, and \theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q denotes the moving average polynomial of order q. The innovation term \varepsilon_t is assumed to be independently and identically distributed with zero mean and variance \sigma^2.
Although ARIMA models are often applied to returns, in volatility forecasting it is more appropriate to model the logarithm of realized variance, log(RV). This transformation stabilizes variance and reduces skewness, improving linear model performance when capturing persistence in RV (Taylor, 2005; Andersen et al., 2003; O. E. Barndorff-Nielsen & Shephard, 2002; Andersen et al., 2001). The use of log(RV) is well established in the realized-variance literature and generally produces more interpretable forecasts than modeling prices or returns directly (Andersen et al., 2003; O. E. Barndorff-Nielsen & Shephard, 2002). ARIMA models provide a fundamental baseline, effectively capturing linear dependencies and short-term autocorrelation in the conditional mean. However, they assume constant variance, which is unrealistic for financial data characterized by volatility clustering. To address this, Engle (1982) introduced the ARCH model, later generalized by Bollerslev (1986) into the GARCH framework.
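As a stylized illustration of the autoregressive component, the sketch below fits an AR(1) model to a log(RV) series by least squares, an ARIMA(1,0,0) special case. The paper's actual ARIMA orders are selected separately; the helper names here are our own.

```python
def fit_ar1(y):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t (an ARIMA(1,0,0) special case)."""
    x, z = y[:-1], y[1:]                 # lagged regressor and response
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    phi = (sum((a - mx) * (b - mz) for a, b in zip(x, z))
           / sum((a - mx) ** 2 for a in x))
    c = mz - phi * mx
    return c, phi

def ar1_forecast(c, phi, last, h):
    """Iterate the AR(1) recursion h steps ahead from the last observation."""
    y = last
    for _ in range(h):
        y = c + phi * y
    return y
```

On persistent log(RV) series the estimated phi is typically close to one, which is precisely the long-memory behavior that motivates the HAR-RV extension discussed below.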
While ARIMA models the conditional mean, GARCH models the conditional variance, capturing how volatility evolves over time. This enables GARCH to represent persistent volatility and time-varying risk. In practice, the two models are often combined—ARIMA captures mean dynamics, and GARCH models residual volatility—yielding a more complete description of financial time series. In practice, the GARCH (1,1) specification is the most widely used:
\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2
Here, α measures the immediate impact of new shocks, while β reflects volatility persistence. This compact and flexible specification allows GARCH to complement ARIMA. In this study, GARCH(1,1) is estimated on daily log-returns, and its conditional variance is compared against the log(RV) targets derived from the three realized-variance estimators.
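The GARCH(1,1) recursion above can be traced directly. This is a sketch of the variance filter only, not of parameter estimation; initializing at the unconditional variance ω/(1 − α − β) is a common convention we assume here, and the parameter values in the example are illustrative.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Filter the conditional variance path sigma_t^2 = omega + alpha*eps_{t-1}^2 + beta*sigma_{t-1}^2.

    Requires alpha + beta < 1 so the unconditional variance used for
    initialization, omega / (1 - alpha - beta), is finite and positive.
    """
    sigma2 = [omega / (1.0 - alpha - beta)]  # start at the unconditional variance
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2
```

In practice omega, alpha, and beta are obtained by maximum likelihood on the daily log-returns; the filter above then yields the conditional-variance series that is compared against the log(RV) targets.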
While ARIMA and GARCH remain essential tools for modeling linear dependence and conditional heteroskedasticity (Box & Jenkins, 1970; Engle, 1982; Bollerslev, 1986), they are limited in capturing the long-memory effects and regime shifts often present in RV. To address this, Corsi (2009) introduced the HAR-RV model, which incorporates volatility components over multiple horizons.
Formally:
RV_{t+1} = \beta_0 + \beta_d RV_t^{(d)} + \beta_w RV_t^{(w)} + \beta_m RV_t^{(m)} + \epsilon_t
where RV_t^{(d)}, RV_t^{(w)}, and RV_t^{(m)} represent the realized variance computed over daily, weekly and monthly intervals, respectively.
The HAR-RV model bridges the simplicity of traditional econometrics with the ability to capture long-memory dynamics, serving as a robust benchmark before moving to nonlinear and DL models.
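The HAR-RV regressors can be built as simple rolling means of past RV, as in this sketch. Window lengths of 1, 5, and 22 trading days follow the usual daily/weekly/monthly convention of Corsi (2009); the code itself is our illustration, not the authors' implementation.

```python
def har_features(rv):
    """Daily, weekly (5-day mean) and monthly (22-day mean) RV components.

    Returns one (daily, weekly, monthly) tuple per date t >= 21, i.e. once a
    full 22-day history is available, for use as HAR-RV regressors.
    """
    feats = []
    for t in range(21, len(rv)):
        daily = rv[t]
        weekly = sum(rv[t - 4:t + 1]) / 5.0
        monthly = sum(rv[t - 21:t + 1]) / 22.0
        feats.append((daily, weekly, monthly))
    return feats
```

The coefficients beta_d, beta_w, and beta_m are then estimated by OLS of RV_{t+1} on these three components, which is what lets a purely linear model mimic long-memory behavior.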

3.2.2. Advanced Models

To complement classical econometric methods, this study employs ML and DL models designed to capture nonlinear dependencies and long-range dynamics frequently observed in RV. These approaches are well suited to volatility forecasting, where regime shifts, structural breaks, and asymmetric responses to shocks are common.
The LSTM network (Hochreiter & Schmidhuber, 1997) models persistent volatility patterns through memory cells and gating mechanisms that preserve long-range information. A hybrid CNN-LSTM architecture (Shi et al., 2015) is also implemented: convolutional layers extract short-term local variations, while LSTM layers capture slower-moving components. Both models are trained on log(RV) using the QLIKE loss, which is standard for variance forecasting. The LSTM's core update rule can be summarized as (Greff et al., 2017):
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
where $c_t$ is the memory cell, $f_t$ and $i_t$ are the forget and input gates, $\tilde{c}_t$ is the candidate state, and $\odot$ denotes element-wise (Hadamard) multiplication.
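A single LSTM step implementing this update can be sketched in NumPy. The fused gate layout and shapes below are illustrative only; deep-learning frameworks order and store the gate blocks differently:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: c_t = f_t * c_{t-1} + i_t * c~_t (elementwise).

    W, U, b stack the input, forget, candidate, and output blocks
    (shapes: W (4H, D), U (4H, H), b (4H,)), a common fused layout.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])           # input gate
    f = sigmoid(z[H:2 * H])       # forget gate
    g = np.tanh(z[2 * H:3 * H])   # candidate state c~_t
    o = sigmoid(z[3 * H:4 * H])   # output gate
    c = f * c_prev + i * g        # memory-cell update from the text
    h = o * np.tanh(c)            # hidden state passed to the next step
    return h, c

# Illustrative random weights; D = input size, H = hidden size.
rng = np.random.default_rng(1)
D, H = 3, 4
W = rng.standard_normal((4 * H, D))
U = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_cell_step(rng.standard_normal(D), np.zeros(H), np.zeros(H), W, U, b)
```

The forget gate f_t is what lets the cell retain volatility information over long spans, which is the property the text attributes to the LSTM.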
Following Y. Zhang et al. (2025), the CNN-LSTM hybrid uses RV as the target variable, with the convolutional layer extracting local temporal features. In line with Borovykh et al. (2017), the architecture first applies one-dimensional convolutional filters to capture these local patterns:
$$z_t = \mathrm{ReLU}\left(\sum_{k=0}^{K-1} W_k\, x_{t-k} + b\right)$$
where $W_k$ are learnable convolutional kernels of length $K$, which act as sliding filters moving across the input sequence to extract local temporal features; the resulting activations $z_t$ are then passed to the LSTM layer, which models long-term dependencies.
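The convolutional stage can be illustrated with a minimal causal 1-D convolution over a univariate series. This is a sketch of the operation itself, not the study's exact layer configuration:

```python
import numpy as np

def causal_conv1d(x, kernel, bias=0.0):
    """z_t = ReLU(sum_{k=0}^{K-1} W_k * x_{t-k} + b), with zero left-padding.

    Causal: z_t depends only on x_t and earlier observations.
    """
    K = len(kernel)
    x_pad = np.concatenate([np.zeros(K - 1), np.asarray(x, float)])
    # For each t, take the window [x_{t-K+1}, ..., x_t], reverse it so that
    # kernel index k multiplies x_{t-k}, then apply the dot product.
    z = np.array([kernel @ x_pad[t:t + K][::-1] for t in range(len(x))])
    return np.maximum(z + bias, 0.0)  # ReLU

identity = causal_conv1d([1.0, 2.0, 3.0], np.array([1.0, 0.0]))  # W_0 = 1
lagged = causal_conv1d([1.0, 2.0, 3.0], np.array([0.0, 1.0]))    # W_1 = 1
```

A kernel concentrated on W_0 reproduces the input, while a kernel on W_1 shifts it by one step, which shows how learned kernels encode short-lag volatility patterns.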
Transformer models (Vaswani et al., 2017) use self-attention to evaluate pairwise dependencies across the entire input window without recurrence. This enables them to capture both local and global patterns in log(RV). The study employs two compact variants: a lightweight encoder and the PatchTST-lite model (Nie et al., 2023), which partitions the input into overlapping patches to increase efficiency and improve long-horizon forecasting. Following Nie et al. (2023), we implement a lightweight version of the PatchTST architecture (PatchTST-lite) by adopting the reduced embedding size, fewer encoder blocks, and simplified attention configuration recommended in the original implementation, which preserves predictive performance while substantially lowering computational cost. Both models are trained on log(RV) using the QLIKE loss and evaluated via anchored walk-forward validation (Nie et al., 2023), in which the start of the training window remains anchored while the training sample expands and the test window moves forward in time. Given a sequence of inputs $X = (x_1, \ldots, x_T)$, the self-attention mechanism maps it into contextualized representations (Vaswani et al., 2017):
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
where $Q = XW_Q$, $K = XW_K$, and $V = XW_V$ are the query, key, and value matrices.
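The mechanism can be sketched directly from the formula; the code below implements single-head, unmasked attention for illustration only:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Returns the contextualized representations and the attention weights.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Self-attention: Q, K, V all derived from the same input window X.
rng = np.random.default_rng(2)
X = rng.standard_normal((6, 4))  # 6 time steps, model dimension 4
out, w = scaled_dot_product_attention(X, X, X)
```

Each row of the weight matrix sums to one, so every output step is a learned convex combination of all input steps — the "pairwise dependencies across the entire window" described in the text.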
Building on the CNN-LSTM framework, which captures short-term volatility bursts and longer-term persistence, the PatchTST-lite Transformer extends this idea by dividing the input into overlapping patches processed through multi-head self-attention (Nie et al., 2023). This structure allows the model to learn both local and global patterns in log(RV). The final prediction is obtained from the encoder output:
$$\hat{y}_{t+h} = W_o\,\mathrm{TransformerEncoder}(Z) + b_o$$
where $Z = (z_1, z_2, \ldots, z_N)$ denotes the patch representations and $h \in \{1, 5, 22\}$ is the forecast horizon. By attending over $N$ patches rather than $L$ individual time steps, PatchTST-lite reduces attention complexity from $O(L^2)$ to $O(N^2)$, improving scalability for long input windows.
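The patching step itself is a simple strided slicing operation. The sketch below uses illustrative patch length and stride values, not the study's actual configuration:

```python
import numpy as np

def make_patches(x, patch_len=16, stride=8):
    """Split a series of length L into N overlapping patches (N x patch_len).

    Attention is then applied over the N patches instead of the L time
    steps, reducing its cost from O(L^2) to O(N^2).
    """
    x = np.asarray(x, float)
    starts = range(0, len(x) - patch_len + 1, stride)
    return np.stack([x[s:s + patch_len] for s in starts])

# A 64-step window with patch length 16 and stride 8 yields 7 patches.
patches = make_patches(np.arange(64.0), patch_len=16, stride=8)
```

With stride < patch_len the patches overlap, so no boundary information is lost while the attention sequence is still shortened by roughly the stride factor.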
Both Transformer variants—the lightweight encoder and PatchTST-lite—are trained under identical settings, and the anchored walk-forward procedure closely mirrors real-world forecasting, as the training sample expands over time. To strengthen inference, we also conduct regime-specific evaluation (calm vs. turbulent periods) and apply Diebold–Mariano (Diebold & Mariano, 1995) tests to assess the significance of model-performance differences.
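The anchored walk-forward scheme can be sketched as an index generator; the fold sizes in the example are illustrative:

```python
def anchored_walk_forward(n_obs, initial_train, test_size):
    """Yield (train_idx, test_idx) pairs with a fixed anchor at t = 0.

    The training window expands fold by fold while the test window rolls
    forward, so every forecast is strictly out-of-sample in time.
    """
    start = initial_train
    while start + test_size <= n_obs:
        train_idx = list(range(0, start))                 # anchored, expanding
        test_idx = list(range(start, start + test_size))  # rolling forward
        yield train_idx, test_idx
        start += test_size

# Illustrative sizes: 100 observations, 60 initial training points,
# 10-step test blocks -> 4 folds.
folds = list(anchored_walk_forward(100, 60, 10))
```

Because the anchor never moves, later folds train on strictly more history, mimicking how a forecaster's information set grows in practice.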

4. Results

The empirical results in this section compare the forecasting performance of the competing models across multiple horizons and volatility regimes. The evaluation framework integrates three complementary dimensions: forecast accuracy, statistical significance, and model reliability. Forecast accuracy is assessed using MAE, RMSE, and QLIKE, which quantify how closely model predictions track realized volatility. The DM test (Diebold & Mariano, 1995) determines whether differences in forecast errors between model pairs are statistically significant. Model reliability is examined through overfitting diagnostics, comparing in-sample and out-of-sample losses to assess generalization quality. This three-layer framework provides a transparent and coherent basis for comparing classical and data-driven models across all volatility estimators and horizons.

4.1. Descriptive Statistics and Preliminary Analysis

Before evaluating the forecasting models, we examine how RV behaves across the three indices: S&P 500, NASDAQ 100, and DJIA. Three RV estimators are used: Close-to-Close, Parkinson, and Yang–Zhang. All RV measures are analyzed in log form to stabilize variance and reduce extreme values. The descriptive analysis reveals that RV is strongly right-skewed and leptokurtic across all indices, indicating that extreme volatility spikes occur far more often than expected under a normal distribution. The mean values exceed the medians, confirming heavy right tails, while dispersion measures vary substantially over time. These features reflect volatility clustering—tranquil periods followed by abrupt spikes. Even after log-transformation, the series display heterogeneity and outliers, consistent with fat-tailed volatility distributions. When examining the time dynamics of log(RV), we observe gradually decaying autocorrelations and persistent shocks, implying that the underlying processes are mean-reverting but exhibit long memory. This pattern is more pronounced in range-based estimators, which incorporate intraday variation indirectly. This helps explain why models such as HAR-RV and GARCH(1,1) perform reliably: both depend on serial correlation.
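As a concrete illustration of the range-based family, the Parkinson estimator converts the daily high-low range into a variance measure (the Yang–Zhang estimator additionally combines overnight and open-to-close components; only Parkinson is sketched here):

```python
import numpy as np

def parkinson_rv(high, low):
    """Daily Parkinson realized variance: (ln(H/L))^2 / (4 ln 2).

    Uses the intraday high-low range, which is why range-based
    estimators incorporate intraday variation indirectly.
    """
    h = np.asarray(high, dtype=float)
    l = np.asarray(low, dtype=float)
    return np.log(h / l) ** 2 / (4.0 * np.log(2.0))

# Two illustrative trading days with modest intraday ranges.
rv = parkinson_rv([101.0, 102.0], [99.0, 100.0])
```

Taking log(rv) of the resulting series then yields the log(RV) target described in the text, with its variance-stabilizing effect.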
Formal cross-correlation matrices and unit-root tests were not included, as these diagnostics fall outside the primary objective of the study, which is to compare forecasting models rather than to analyse inter-index dependence or long-run stochastic properties. The use of log-transformed realized volatility is well established in the literature to yield stationary, mean-reverting series suitable for econometric and machine-learning forecasting frameworks. Given the stable statistical behavior of the series and the extensive empirical evidence supporting their stationarity, additional correlation and stationarity tests were deemed unnecessary for the purposes of this analysis.

4.2. Classical Model Performance

This subsection examines the empirical performance of the classical econometric models. The evaluation focuses on forecast accuracy (MAE, RMSE, QLIKE), overfitting diagnostics, and overall model behavior across horizons and volatility regimes.

4.2.1. Point Forecast Accuracy (MAE, RMSE, QLIKE)

Forecast accuracy is evaluated using MAE, RMSE, and QLIKE loss. Figure 2 presents the average performance of ARIMA(1,0,1), GARCH(1,1), and HAR-RV models across the three realized-volatility estimators (Close, Parkinson, and Yang–Zhang), aggregated over all indices and forecast horizons (h = 1, 5, 22). The results show a clear and consistent pattern: HAR-RV achieves the lowest MAE and RMSE values, confirming its strong ability to capture multi-scale volatility dynamics. By incorporating lagged RV terms, the HAR-RV adjusts more smoothly to persistent volatility patterns, leading to better short- and medium-term forecasts. GARCH(1,1) delivers intermediate results: it reduces forecast errors more effectively than ARIMA(1,0,1) but remains less accurate than HAR-RV (Appendix A, Table A1). Its QLIKE values remain low, reflecting good clustering capture but weaker amplitude precision. ARIMA(1,0,1) systematically exhibits the highest error values in all three metrics, indicating limited adaptability to heteroskedastic behavior. Among the RV estimators, the Yang–Zhang measure yields the lowest overall errors, indicating superior sensitivity to daily and overnight price variation and providing the most stable target dynamics for classical models.
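For reference, the QLIKE loss underlying these comparisons can be computed as follows. This is a standard formulation shown as an illustrative sketch, not the study's exact code:

```python
import numpy as np

def qlike(forecast_var, realized_var):
    """QLIKE loss: RV/h - ln(RV/h) - 1, which is zero iff forecast = RV.

    The loss is asymmetric: under-predicting variance is penalized more
    heavily than over-predicting it by the same factor, which is why
    QLIKE is favored for risk-management evaluation.
    """
    ratio = np.asarray(realized_var, float) / np.asarray(forecast_var, float)
    return ratio - np.log(ratio) - 1.0
```

Averaging this per-observation loss over the test window gives the values reported in the heatmaps.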
In summary, these results confirm that the strong persistence of RV is essential for reliable volatility forecasts. HAR-RV provides the most robust classical benchmark, while ARIMA remains limited by its inability to adapt to time-varying volatility.

4.2.2. Statistical Significance (DM Test)

The DM test evaluates whether two competing models generate significantly different forecast errors over the same evaluation period. A low p-value (below 0.05) indicates that the models differ significantly, while higher values suggest that their forecasts are statistically similar (Appendix A Table A2).
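A minimal sketch of the DM statistic with a rectangular long-run variance correction is shown below; the paper's exact implementation (lag truncation, small-sample adjustments) may differ:

```python
import math
import numpy as np

def diebold_mariano(loss_a, loss_b, h=1):
    """DM statistic on the loss differential d_t = L_A,t - L_B,t.

    Uses h-1 autocovariance lags in the long-run variance (appropriate
    for h-step-ahead forecasts) and a two-sided normal p-value.
    """
    d = np.asarray(loss_a, float) - np.asarray(loss_b, float)
    T = len(d)
    d_bar = d.mean()
    dc = d - d_bar
    var = np.mean(dc ** 2)
    for k in range(1, h):                       # long-run variance correction
        var += 2.0 * np.mean(dc[k:] * dc[:-k])
    dm = d_bar / math.sqrt(var / T)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # 2 * (1 - Phi(|dm|))
    return dm, p_value

# Illustrative check: model B's losses are systematically ~0.5 lower.
rng = np.random.default_rng(3)
base = rng.standard_normal(300) ** 2
noise = rng.standard_normal(300) * 0.1
dm, p = diebold_mariano(base + 0.5 + noise, base, h=1)
```

A positive DM value with a small p-value indicates that the first model's losses are significantly larger, matching the sign convention described for Figure 8.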
Figure 3 presents pairwise p-values for all model comparisons. At the short horizon (h = 1), almost all model comparisons show highly significant differences (p ≈ 0.000). This confirms that all three classical models generate clearly distinct short-term forecasts. The largest contrast appears between HAR-RV and GARCH(1,1), emphasizing the role of lagged RV terms in improving short-horizon prediction. At the long horizon (h = 22), statistical differences weaken. The p-value between ARIMA and HAR-RV (p ≈ 0.17) shows that their long-run forecasts are statistically similar. In contrast, GARCH(1,1) remains significantly different from both models, reflecting its stronger dependence on persistence dynamics. The average heatmap across all horizons supports these observations: HAR-RV and GARCH remain statistically distinct, whereas ARIMA and HAR-RV converge as the forecast horizon increases.
Overall, the DM test confirms that the classical models are not interchangeable. HAR-RV consistently delivers superior short-term accuracy, ARIMA(1,0,1) becomes comparable at medium and long horizons, and GARCH(1,1) captures volatility persistence but does not exceed HAR-RV in accuracy.

4.2.3. Overfitting Diagnostics for Classical Models

Model robustness is assessed through in-sample and out-of-sample QLIKE losses for ARIMA(1,0,1), GARCH(1,1), and HAR-RV (Figure 4). A Train/Test ratio close to 1 signals good generalization, while substantial deviations indicate mis-specification. The diagnostics combine out-of-sample losses, train/test ratios, rolling-window behavior, and parameter-path stability as complementary indicators of robustness. Residual portmanteau tests (e.g., Ljung–Box) are not included, as they serve a supportive rather than decisive role in forecasting evaluation.
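The Train/Test ratio diagnostic reduces to a simple per-horizon computation; the loss values in the example are illustrative:

```python
import numpy as np

def overfitting_audit(losses):
    """Per-horizon Train/Test mean-loss ratios.

    Input: {horizon: (train_losses, test_losses)}. Ratios near 1 signal
    balanced generalization; ratios well below 1 mean the test loss
    exceeds the training loss, i.e., some degree of overfitting.
    """
    return {h: float(np.mean(tr) / np.mean(te))
            for h, (tr, te) in losses.items()}

# Hypothetical QLIKE losses at two horizons.
ratios = overfitting_audit({1: ([0.40], [0.50]), 22: ([0.30], [0.50])})
```

Tracking these ratios alongside parameter paths gives the combined robustness picture used in Figures 4 and 5.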
Figure 5 reports parameter stability and QLIKE-based diagnostics for the GARCH(1,1) model applied to the S&P 500, NASDAQ 100, and DJIA. The left panels track the evolution of ω ,   α 1   and β 1 under expanding-window refits. Across all indices, the persistence term α 1 + β 1 remains below 1, confirming covariance stationarity and stable conditional-variance dynamics. The right panels display train and test QLIKE losses. Ratios between 0.74 and 0.87 remain below 1, confirming mild and well-controlled overfitting and strong generalization performance.
Overall, the overfitting diagnostics indicate that all classical models generalize well and remain stable across market conditions. Train/Test QLIKE ratios below one, together with stable parameter paths, show that these models capture persistent volatility dynamics without fitting noise. From a financial perspective, this confirms that ARIMA(1,0,1), HAR-RV, and GARCH(1,1) produce reliable volatility estimates that are robust to shifts in market regimes—an essential property for risk management, trading strategies, and forecasting applications.

4.2.4. Forecasting Results

The forecasting performance of the classical models confirms the conclusions drawn from the error and overfitting analyses. Across all indices, RV displays clear clustering and persistence patterns, which ARIMA(1,0,1) captures only partially. GARCH(1,1) adapts better to variance shifts and follows observed peaks and troughs more closely, especially during volatility spikes. However, its forecasts tend to revert too quickly toward the mean, leading to underestimation during extended high-volatility periods. In contrast, the HAR-RV model delivers the smoothest and most consistent forecasts, effectively tracing the underlying volatility dynamics across both short and medium horizons. Its multi-lag structure enables it to incorporate long-term memory effects, which improves synchronization with RV; under the Yang–Zhang estimator in particular, it provides a stable representation of long-range dependence and filters short-term noise more effectively than ARIMA and GARCH. To further illustrate these dynamics, Figure 6 presents representative forecast panels for ARIMA(1,0,1), GARCH(1,1), and HAR-RV across three major U.S. indices and three RV estimators (Close, Parkinson, Yang–Zhang) over forecast horizons of h = 1, 5, 22 days. The plots show the logarithmic variance (true vs. predicted) and demonstrate how each model captures volatility movements at different time scales.
Across all indices, HAR-RV consistently aligns most closely with realized variance. Its use of multi-period lags allows for smoother yet responsive adaptation to volatility persistence, effectively capturing both short-term spikes and gradual shifts with relatively low error dispersion. GARCH(1,1) also models volatility clustering well, though it tends to overreact in calm periods and slightly underestimate prolonged extremes during turbulent episodes. This moderate bias is particularly visible in high-volatility windows, where GARCH reverts too rapidly toward its conditional mean. ARIMA(1,0,1), by contrast, produces flatter and less adaptive forecasts, reflecting its linear structure and limited ability to capture volatility feedback effects. As a result, its predictions deviate more strongly from RV, especially for short-horizon forecasts. At longer horizons (h = 22), both HAR-RV and GARCH(1,1) maintain coherent predictive patterns, while ARIMA forecasts gradually diverge from RV, confirming the superior robustness of models explicitly built on variance dynamics. Among the RV estimators, Yang–Zhang yields the smallest and most stable forecast errors across models, suggesting that its inclusion of both overnight and intraday information improves the detection of persistent variance components. These forecasting patterns are consistent across all three indices and reinforce the advantages of RV-based models when long-memory effects are present. Combined with the overfitting diagnostics, these results motivate the transition toward AI-based architectures capable of capturing nonlinear dependencies and regime-specific volatility patterns.

4.3. Advanced Models Performance

This section examines the empirical performance of the advanced models—DL and Transformer architectures. These approaches are designed to capture nonlinearities, long-term dependencies, and complex interactions within realized variance (RV) that linear econometric models often fail to represent. The evaluation covers three main dimensions: (1) forecast accuracy across horizons (h = 1, 5, 22), (2) training stability and convergence, and (3) statistical significance of performance differences based on the DM test. All models are trained on log-realized variance and validated using an anchored walk-forward procedure, which expands the training window sequentially while keeping the initial anchor point fixed. This approach reflects realistic forecasting conditions and ensures consistent out-of-sample evaluation.

4.3.1. Forecast Accuracy for Advanced Models (MAE, RMSE, QLIKE)

Figure 7 summarizes the aggregated results across all indices and realized variance estimators. Each cell in the heatmap reports the mean error value for a given model and horizon, with darker blue shades corresponding to lower errors (i.e., stronger predictive performance). The Transformer and PatchTST-lite architectures consistently achieve the lowest error values across all metrics, with the advantage becoming particularly pronounced at longer horizons (h = 22), where recurrent models gradually lose calibration. The LSTM model performs competitively at short horizons (h = 1) but loses accuracy as the horizon extends. In contrast, CNN-LSTM exhibits the weakest calibration overall, reflected in its higher RMSE and QLIKE losses (Appendix A, Table A3). These results indicate that attention-based architectures outperform recurrent and convolutional models by better capturing multi-scale temporal dependencies and the nonlinear persistence characteristic of financial volatility.

4.3.2. Statistical Significance (Diebold–Mariano)

To verify whether the observed differences in forecasting accuracy are statistically meaningful, the DM test is applied across all horizons and RV measures. The DM statistic compares the forecast loss differentials between pairs of models: positive values indicate that the model in the row performs better, while negative values imply weaker performance. Figure 8 presents pairwise DM statistics for horizons h = 1, 5, 22, together with the aggregated mean heatmap. Each cell in the heatmap shows the mean DM value between two models. Warmer colors (red) signal statistically significant outperformance of the row model (p < 0.05), whereas cooler colors (blue) indicate the opposite. At short horizons (h = 1), differences in predictive accuracy are generally minor, suggesting that most architectures adapt similarly to near-term volatility fluctuations. However, as the horizon increases (h = 5 and h = 22), Transformer and PatchTST-lite show consistently superior and statistically significant performance relative to the recurrent models. This reflects their stronger ability to model long-term persistence and nonlinear variance dynamics. Among all models, CNN-LSTM performs the weakest, showing predominantly negative DM values relative to its counterparts.
As the forecast horizon lengthens, performance differences become clearer and more systematic. While all models perform similarly over short horizons (h = 1), the Transformer and PatchTST-lite architectures gain a clear advantage at medium and long horizons (h = 5, 22). Their ability to capture long-term dependencies and persistent volatility dynamics translates into statistically significant improvements over the recurrent models.

4.3.3. Overfitting Diagnostics for Deep Learning Models

To verify that the DL models produce reliable and stable forecasts, an overfitting audit was conducted. The purpose of this analysis is to assess whether the models learn structural patterns in volatility instead of memorizing the training data. For this reason, both the Train/Test loss ratios and the learning curves were examined across forecast horizons (h = 1, 5, 22).
Figure 9 provides a visual overview of model generalization and stability across horizons. In Panel (a), the average Train/Test loss ratios remain close to one and below the red reference line, indicating a balanced relationship between training and test losses and suggesting that the models do not overfit to a meaningful extent. As the forecast horizon increases (h = 22), the uncertainty bands widen slightly, which is expected when predicting further into the future. At the short horizon (h = 1), both training and validation losses decline quickly and converge within 8–10 epochs, with minimal difference between them. This pattern indicates stable learning dynamics and only minor overfitting, making additional regularization unnecessary. At the medium horizon (h = 5), the validation loss begins to plateau earlier than the training loss, creating a small but visible gap between the two. This suggests mild overfitting, which could be mitigated through small increases in dropout, weight decay, or early stopping. At the long horizon (h = 22), the gap between training and validation losses becomes clearer. While the training loss continues to decrease, the validation loss remains higher and more volatile, indicating partial overfitting. In this case, lower learning rates and stronger regularization, or a shorter input window may help stabilize learning.
Overall, the DL models generalize well at short horizons, show some divergence at medium horizons, and require tighter regularization for longer-term forecasts. These diagnostics confirm that the networks are generally well-calibrated and exhibit only modest overfitting as the forecasting horizon and model difficulty increase.

4.3.4. Forecasting Results

To complement the quantitative evaluation, Figure 10 illustrates the predicted and RV trajectories for representative cases across the three forecast horizons (h = 1, 5, 22). These visual comparisons provide an intuitive view of how each model tracks volatility fluctuations and responds to volatility clusters and regime shifts. All models were trained and validated using an anchored walk-forward procedure to ensure a realistic, time-consistent evaluation of out-of-sample performance. Model complexity was controlled through early stopping and out-of-sample validation to limit overfitting.
At the short horizon (Panel a, Figure 10, h = 1), both Transformer and LSTM follow the realized variance closely, capturing most short-lived volatility bursts. Their forecasts react quickly to new information and exhibit minimal delay after market shocks, reflecting strong sensitivity to high-frequency dynamics. At the medium horizon (Panel b, h = 5), forecasts become smoother and less reactive to short-term noise. The PatchTST-lite model maintains consistent alignment with the overall variance level, while recurrent architectures (LSTM and CNN-LSTM) show lagged responses and tend to underpredict during turbulent episodes. This behavior suggests that attention-based models integrate information more effectively across multiple time scales. At the long horizon (Panel c, h = 22), all models naturally show wider deviations from realized variance due to accumulated forecast uncertainty. Nevertheless, the Transformer continues to reproduce broad volatility regimes more accurately than the other architectures. It captures both the persistence of calm periods and the amplitude of volatility spikes, demonstrating robust adaptability even at longer horizons.
Taken together, the visual results confirm the quantitative findings: attention-based models such as Transformer and PatchTST-lite deliver more stable and well-calibrated volatility forecasts across horizons, whereas recurrent models perform well in short-term dynamics but gradually lose precision as the forecast horizon lengthens.

4.4. Comparative Evaluation and Statistical Significance

To complement the statistical testing, this section applies SHAP analysis to understand which factors drive model performance and forecast error variability across datasets (Figure 11). For QLIKE, the analysis reveals that RV_close is the dominant determinant of forecasting accuracy, followed by ARIMA(1,0,1) and HAR-RV, while the forecast horizon H exerts a smaller but consistent influence. The choice of index has little impact on error magnitude, indicating that differences in performance are primarily driven by model architecture and the specification of the RV measure, rather than by market-specific characteristics. For the DM statistic, the largest SHAP contributions come from ARIMA(1,0,1) and RV_close, indicating that these features explain most of the statistical differences between models. GARCH(1,1) and HAR-RV also rank highly, confirming that classical econometric frameworks continue to provide robust benchmarks for comparative evaluation.
In contrast, DL architectures exhibit smaller individual SHAP contributions, implying that their predictive strength stems from broader multi-feature interactions rather than reliance on a single dominant driver. Across both QLIKE and DM metrics, the SHAP patterns consistently show that volatility-based inputs (e.g., RV_close, Yang–Zhang) remain key predictors of accuracy, while the choice of model family determines whether these improvements become statistically significant. Short horizons (h = 1, 5) contribute more strongly to DM dominance, whereas long horizons (h = 22) tend to dilute these differences due to higher forecast uncertainty.
The SHAP bar and beeswarm plots offer complementary insights: the mean (SHAP) bars highlight the relative importance of each factor, while the beeswarm plots capture the direction and variability of their effects. Higher SHAP values indicate factors that increase forecasting errors, whereas negative values correspond to better accuracy. The close similarity between QLIKE and DM SHAP patterns confirms that the same underlying factors shape both absolute error size and relative model dominance. Overall, the findings show that model architecture—rather than index choice or horizon length—is the key determinant of forecasting quality. The SHAP results therefore reinforce the broader empirical conclusion: attention-based models, particularly PatchTST-lite, achieve the most reliable and accurate volatility forecasts by leveraging rich, multi-feature interactions rather than isolated predictors.

4.5. Subsample Robustness Analysis

The sample is divided into four economically distinct periods: pre-GFC (2000–2006), GFC (2007–2009), post-GFC (2010–2019), and the COVID-19 period (2020–2025). For each subsample, we recompute the QLIKE error for the best-performing classical model and the best DL architecture (Table 4).
Across all regimes, the ranking of models remains broadly consistent. HAR-RV delivers the lowest classical-model errors during tranquil periods, when volatility persistence dominates and shocks are moderate. GARCH(1,1) becomes the strongest classical benchmark during crisis episodes such as the GFC and COVID-19, reflecting its ability to respond rapidly to large volatility shocks. Among the advanced models, Transformer-based architectures consistently achieve the lowest QLIKE errors in stress periods, particularly when volatility shifts rapidly and exhibits nonlinear propagation. At shorter horizons in calm regimes, recurrent models such as LSTM and CNN-LSTM perform competitively, but their advantage diminishes during turbulent periods and as the forecast horizon increases.
These results indicate that the main conclusions of the study are not driven by a single historical window. Instead, the model rankings remain stable across major economic regimes, confirming that (a) classical models outperform in stable, low-volatility conditions, while (b) Transformers dominate in crisis-driven and high-uncertainty environments, where long-range dependencies and nonlinear dynamics become critical.
The variation in best-performing models across subsamples reflects differences in market regimes. During tranquil periods (Pre-GFC and Post-GFC), long-memory dynamics dominate and HAR-RV achieves the lowest QLIKE errors, whereas in crisis environments (GFC, COVID-19) GARCH(1,1) performs best because it reacts more rapidly to sudden volatility shocks. Since QLIKE penalizes volatility underestimation, regime shifts naturally lead to changes in the model ranking.

5. Economic Interpretations of Forecast Results

From the perspective of volatility forecasting and market dynamics, the comparison of predictive models reveals several economic mechanisms that shape the behavior of financial uncertainty. Volatility does not evolve smoothly over time; instead, it clusters into prolonged calm periods followed by episodes of heightened turbulence. This well-documented clustering phenomenon explains why investors cycle between increased risk-taking during tranquil phases and heightened caution when volatility rises. Because volatility directly influences risk premia, investor behavior, and the speed at which information is incorporated into asset prices, improvements in forecasting accuracy carry clear economic, rather than purely statistical, relevance.
Traditional econometric models such as GARCH(1,1) and HAR-RV remain reliable baselines for capturing persistence and volatility clustering (Ersin & Bildirici, 2023; Asgharian et al., 2013; Virk et al., 2024). They react to recent shocks and incorporate long-memory components, making them useful for understanding how markets update risk assessments.
The strong performance of Transformer-based architectures shows that volatility is driven not merely by random shocks, but by the way financial news and macroeconomic events propagate through markets—unevenly, in waves, and with varying intensity. Investor reactions are asymmetric: negative news often triggers sharp overreactions, while positive information is absorbed more gradually. Transformers capture these informational cascades, behavioral asymmetries, and long-memory effects far more effectively than classical linear models. As a result, the empirical findings relate directly to themes of market efficiency, asymmetric information transmission, and the economics of uncertainty.
Because classical models rely on fixed parametric structures, they struggle to detect abrupt regime shifts, asymmetric responses to bad news, and structural breaks that increasingly characterize modern markets. During macroeconomic shocks, geopolitical events, or panic-driven sell-offs, these models tend to react too slowly, leading to underestimated risk premia and delayed adjustments in investor positioning.
In terms of news processing, differences in forecast accuracy reveal that classical models implicitly assume smooth information arrival and linear shock propagation (Ersin & Bildirici, 2023; Asgharian et al., 2013; Virk et al., 2024), which explains their weaker performance during crisis episodes. Attention-based architectures, by contrast, extract relevant signals from irregular and clustered news flows, enabling them to adapt to state-dependent information dynamics. Their superior accuracy suggests that markets react selectively, rather than uniformly—a view aligned with behavioral finance and regime-dependent risk pricing.
The economic implications extend directly to risk management and portfolio construction. More accurate volatility forecasts improve the calibration of VaR and ES, reduce the risk of underestimating losses, and support timely adjustments in leverage, margin requirements, and exposure limits. Transformer models detect volatility spikes earlier than GARCH, LSTM, or CNN-LSTM, allowing faster deleveraging and more effective crisis responses (Nie et al., 2023; Zeng et al., 2023). In volatility-managed strategies, models that recognize regime changes early provide a notable advantage: because portfolio exposure scales inversely with expected volatility, earlier detection of spikes leads to higher risk-adjusted performance—an area where Transformers consistently outperform other frameworks.
Deep-learning models such as LSTM and CNN-LSTM capture nonlinear and non-stationary dynamics well (Harikumar & Muthumeenakshi, 2025) because they learn hidden dependencies, repeated patterns, and sudden shifts in market behavior. However, their performance depends strongly on the choice of window length (e.g., 20, 60, or 120 days). Short windows cause the model to overweight recent events and overestimate volatility, leading to excessive caution and lost return opportunities. Long windows overly smooth shocks, causing delayed reactions—especially in crises, when rapid de-risking is critical. Their sensitivity to data scaling can also distort shock magnitude and delay portfolio adjustments. Thus, despite their strong nonlinear capabilities, their forecasts may be less stable under extreme market conditions.
In contrast, Transformer models generalize far more robustly across horizons and market regimes because they do not rely on a fixed time window or on sequential data processing. Their self-attention mechanism allows them to compare all observations simultaneously and extract the most relevant signals from the entire series, whether the market is calm or highly turbulent. This enables them to capture both local shocks and global regime shifts, even when these occur abruptly.
From a financial standpoint, this capability is essential: the model automatically increases attention to accelerating price movements, rising cross-asset correlations, negative-news clusters, and growing liquidity stress. Transformers therefore detect early signs of risk-off behavior, widening risk premia, and mounting market uncertainty much faster than recurrent and classical models. This results in earlier reductions in exposure, more precise VaR and ES estimates, and more agile portfolio-management decisions during periods when accuracy is most valuable.
Among all tested models, PatchTST-lite delivers the strongest out-of-sample performance across indices and horizons. This is consistent with evidence that time-series Transformers such as Informer and Autoformer capture long-range dependencies more effectively while avoiding the gradient-decay limitations of recurrent networks (Nie et al., 2023; Zeng et al., 2023). Low QLIKE and RMSE values across the S&P 500, NASDAQ 100, and DJIA confirm the architecture's ability to model regime changes and the dynamics of uncertainty.
Finally, consistent with the literature on realized volatility and multi-estimator frameworks (L. Zhang et al., 2005; Patton & Sheppard, 2009; O. Barndorff-Nielsen & Shephard, 2004; Tauchen & Zhou, 2011), forecasting accuracy improves when diverse volatility signals are combined, such as daily returns, range-based measures, extreme price moves, volume pressure, news shocks, and shifts in correlations. HAR models achieve this through multi-scale aggregation, while attention-based models learn these heterogeneous components directly. Integrating these sources provides a more complete representation of market risk, leading to improved estimation of risk premia, more efficient exposure management, and earlier detection of emerging market stress.
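The HAR multi-scale aggregation mentioned above is easy to make concrete: realized variance is regressed on its own daily value plus weekly (5-day) and monthly (22-day) trailing averages. A minimal feature-construction sketch (function name illustrative, not from the paper's code):

```python
def har_features(rv):
    """Build HAR-RV regressors: the daily RV plus its 5-day and 22-day
    trailing means, each aligned to predict the next day's RV."""
    X, y = [], []
    for t in range(21, len(rv) - 1):          # need 22 trailing observations
        daily = rv[t]
        weekly = sum(rv[t - 4: t + 1]) / 5.0
        monthly = sum(rv[t - 21: t + 1]) / 22.0
        X.append((daily, weekly, monthly))
        y.append(rv[t + 1])                   # next-day target
    return X, y

rv = [1.0] * 30                               # flat toy series for illustration
X, y = har_features(rv)
```

The three regressors approximate the behavior of investors operating at daily, weekly, and monthly frequencies, which is exactly the heterogeneous-horizon structure that attention-based models learn without being told the scales in advance.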
In sum, the superior accuracy of attention-based architectures indicates that financial volatility is driven by long-memory dynamics, nonlinear information flows, regime shifts, and asymmetric investor reactions. Models capable of learning these mechanisms produce forecasts that more accurately reflect the economic structure of uncertainty.

6. Conclusions

This study provides a unified empirical comparison of classical econometric models, DL architectures, and modern Transformer frameworks in forecasting realized variance for major U.S. equity indices. The results show that model architecture is a central determinant of forecasting accuracy and robustness across horizons. Classical models such as GARCH(1,1) and HAR-RV remain reliable baselines under stable market conditions, where persistence and multi-scale variance dynamics dominate. However, their flexibility is limited during turbulent periods and structural breaks.
DL models enhance predictive accuracy by capturing nonlinearities and long-range dependencies in volatility dynamics, yet the shift toward attention-based architectures represents a substantive methodological transition. Lightweight Transformer variants such as PatchTST-lite demonstrate stronger generalization, greater adaptability to regime changes, and more stable performance across horizons. Their ability to learn long-memory effects, nonlinear information propagation, and heterogeneous investor reactions allows them to outperform both classical and recurrent neural models.
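The patching idea behind PatchTST-style models can be sketched directly; note the patch length and stride below are illustrative defaults, since the paper's PatchTST-lite configuration is not reproduced here.

```python
def make_patches(series, patch_len=16, stride=8):
    """PatchTST-style tokenization: slice a series into (possibly overlapping)
    patches. Each patch becomes one Transformer token, so attention runs over
    roughly len(series)/stride tokens instead of one token per time step."""
    return [series[s: s + patch_len]
            for s in range(0, len(series) - patch_len + 1, stride)]

patches = make_patches(list(range(64)), patch_len=16, stride=8)
```

Shorter token sequences reduce attention cost, and each token now carries local sub-series structure, which is one reason lightweight patch-based variants generalize well on volatility series.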
The findings carry important implications for practitioners and policymakers. More accurate volatility forecasts strengthen risk-management practices, improve exposure-scaling rules, and support more reliable calibration of VaR and ES during turbulent periods. Since volatility is a key state variable for risk premia and capital allocation, models capable of capturing regime-dependent dynamics provide earlier and more informative signals about shifting market uncertainty and investor behavior. The strong performance of Transformer architectures highlights the growing relevance of explainable AI in volatility modeling and contributes to bridging the gap between predictive accuracy and economic interpretability.
The study also identifies several limitations that open avenues for further research. First, range-based realized-volatility measures rely on daily OHLC data and do not fully capture intraday variation. Second, the empirical evaluation focuses on statistical accuracy rather than economic backtesting through volatility-managed strategies, option-hedging experiments, or extended VaR/ES validation. Third, despite their strong performance, Transformer models remain relatively opaque, motivating future work on hybrid designs that embed structural economic constraints.
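The first limitation can be illustrated with the Parkinson range estimator, one of the range-based measures used in the appendix tables: it exploits the daily high-low range but, as noted above, still misses the full intraday path. A minimal sketch:

```python
import math

def parkinson_variance(high, low):
    """Parkinson (1980) range-based daily variance estimate from high/low
    prices: ln(H/L)^2 / (4 ln 2). More efficient than squared close-to-close
    returns, but blind to intraday path details and overnight gaps."""
    return math.log(high / low) ** 2 / (4.0 * math.log(2.0))

quiet = parkinson_variance(100.5, 100.0)   # narrow range -> small variance
wild = parkinson_variance(105.0, 95.0)     # wide range -> large variance
```

Two days with identical high-low ranges but very different intraday paths receive the same estimate, which is precisely why high-frequency realized measures are flagged as a direction for future work.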
Promising directions for future research include integrating Transformer-based volatility forecasts into option-pricing and risk-management frameworks, assessing their impact on implied-volatility surfaces and dynamic-hedging accuracy, and extending model design to incorporate macro-financial variables, sentiment indicators, or high-frequency realized measures.
Switching-regime models and economically constrained Transformers also represent an important avenue, as they can detect abrupt market transitions, capture asymmetric volatility responses (leverage effects), and model time-varying risk premia. These features bring models closer to real-world financial behavior, where risk, uncertainty, and investor reactions depend on the prevailing regime rather than remaining constant.
Overall, the evidence shows that attention-based architectures represent the next generation of volatility-forecasting tools. Their combination of predictive accuracy, scalability, and economic relevance enhances the methodological toolkit and deepens our understanding of how financial markets process information and transmit uncertainty across regimes.

Author Contributions

Conceptualization, G.T.-A. and D.G.; methodology, G.T.-A.; software, G.T.-A.; validation, G.T.-A. and D.G.; formal analysis, G.T.-A. and D.G.; investigation, G.T.-A. and D.G.; resources, D.G.; data curation, G.T.-A.; writing—original draft preparation, G.T.-A. and D.G.; visualization, G.T.-A.; supervision, G.T.-A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the public domain at https://www.investing.com. These data were derived from the following publicly accessible resource: Investing.com (https://www.investing.com).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

All appendix tables are reproduced directly from Python-generated outputs produced within the empirical forecasting pipeline.
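For reference, the DM statistics reported in Tables A2 and A4 compare average forecast losses between model pairs. Below is a simplified version of the statistic (h = 1, squared-error loss, no HAC variance correction; the paper's exact loss and correction settings are not reproduced here), with hypothetical error series:

```python
import math
import statistics

def diebold_mariano(e1, e2):
    """Simplified Diebold-Mariano statistic on squared-error losses.
    Positive values indicate model 1 has larger average losses than model 2."""
    d = [a * a - b * b for a, b in zip(e1, e2)]   # loss differentials
    n = len(d)
    return statistics.fmean(d) / math.sqrt(statistics.pvariance(d) / n)

e1 = [1.0, 1.1, 0.9, 1.2, 1.0]   # hypothetical forecast errors, model 1
e2 = [0.5, 0.4, 0.6, 0.5, 0.5]   # hypothetical forecast errors, model 2
stat = diebold_mariano(e1, e2)
```

Swapping the two models flips the sign of the statistic, which is why the tables below report signed values: a negative entry means the first listed model has the smaller losses.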
Table A1. Forecasting Errors of Classical Volatility Models Across Indices, RV Estimators, and Forecast Horizons (MAE, RMSE, QLIKE).
Index | RV | H | Model | MAE | RMSE | QLIKE
s&p_500 | close | 1 | ARIMA(1,0,1) | 2.190466 | 2.943689 | 15.86222
s&p_500 | close | 1 | GARCH(1,1) | 2.062541 | 3.007749 | 1.586882
s&p_500 | close | 1 | HAR-RV | 1.897341 | 2.608537 | 4.457791
s&p_500 | close | 5 | ARIMA(1,0,1) | 2.191737 | 2.944996 | 15.92837
s&p_500 | close | 5 | GARCH(1,1) | 2.085445 | 3.035018 | 1.588451
s&p_500 | close | 5 | HAR-RV | 1.916942 | 2.616756 | 4.718463
s&p_500 | close | 22 | ARIMA(1,0,1) | 2.197184 | 2.950601 | 16.21369
s&p_500 | close | 22 | GARCH(1,1) | 2.171591 | 3.131785 | 1.62144
s&p_500 | close | 22 | HAR-RV | 1.950925 | 2.669916 | 5.794402
s&p_500 | parkinson | 1 | ARIMA(1,0,1) | 1.545719 | 1.891499 | 2.001075
s&p_500 | parkinson | 1 | GARCH(1,1) | 0.979665 | 1.202144 | 0.450411
s&p_500 | parkinson | 1 | HAR-RV | 0.69657 | 0.871594 | 0.481789
s&p_500 | parkinson | 5 | ARIMA(1,0,1) | 1.548011 | 1.894096 | 2.005168
s&p_500 | parkinson | 5 | GARCH(1,1) | 1.01858 | 1.243338 | 0.475387
s&p_500 | parkinson | 5 | HAR-RV | 0.78301 | 0.982462 | 0.657114
s&p_500 | parkinson | 22 | ARIMA(1,0,1) | 1.557792 | 1.905159 | 2.02269
s&p_500 | parkinson | 22 | GARCH(1,1) | 1.15604 | 1.389518 | 0.569135
s&p_500 | parkinson | 22 | HAR-RV | 1.011914 | 1.280252 | 1.568271
s&p_500 | yang_zhang | 1 | ARIMA(1,0,1) | 1.212102 | 1.539617 | 1.071757
s&p_500 | yang_zhang | 1 | GARCH(1,1) | 0.66332 | 0.797295 | 0.293285
s&p_500 | yang_zhang | 1 | HAR-RV | 0.04841 | 0.077893 | 0.003229
s&p_500 | yang_zhang | 5 | ARIMA(1,0,1) | 1.21386 | 1.54181 | 1.074058
s&p_500 | yang_zhang | 5 | GARCH(1,1) | 0.68969 | 0.816587 | 0.295144
s&p_500 | yang_zhang | 5 | HAR-RV | 0.17158 | 0.241657 | 0.032598
s&p_500 | yang_zhang | 22 | ARIMA(1,0,1) | 1.22138 | 1.551146 | 1.083891
s&p_500 | yang_zhang | 22 | GARCH(1,1) | 0.798647 | 0.910986 | 0.328554
s&p_500 | yang_zhang | 22 | HAR-RV | 0.930589 | 1.481286 | 10.57867
nasdaq_100 | close | 1 | ARIMA(1,0,1) | 2.355122 | 3.08994 | 39.16095
nasdaq_100 | close | 1 | GARCH(1,1) | 1.99676 | 2.825239 | 1.512568
nasdaq_100 | close | 1 | HAR-RV | 1.82554 | 2.439304 | 3.504109
nasdaq_100 | close | 5 | ARIMA(1,0,1) | 2.357519 | 3.092874 | 39.623
nasdaq_100 | close | 5 | GARCH(1,1) | 2.011634 | 2.844247 | 1.513021
nasdaq_100 | close | 5 | HAR-RV | 1.859593 | 2.482687 | 4.316211
nasdaq_100 | close | 22 | ARIMA(1,0,1) | 2.367773 | 3.105451 | 41.65694
nasdaq_100 | close | 22 | GARCH(1,1) | 2.071724 | 2.915277 | 1.529352
nasdaq_100 | close | 22 | HAR-RV | 1.982175 | 2.617589 | 4.969706
nasdaq_100 | parkinson | 1 | ARIMA(1,0,1) | 1.834097 | 2.323519 | 3.073861
nasdaq_100 | parkinson | 1 | GARCH(1,1) | 1.000879 | 1.206354 | 0.46129
nasdaq_100 | parkinson | 1 | HAR-RV | 0.675013 | 0.845099 | 0.425372
nasdaq_100 | parkinson | 5 | ARIMA(1,0,1) | 1.83781 | 2.327707 | 3.09232
nasdaq_100 | parkinson | 5 | GARCH(1,1) | 1.027914 | 1.234052 | 0.478126
nasdaq_100 | parkinson | 5 | HAR-RV | 0.827037 | 1.054998 | 0.615849
nasdaq_100 | parkinson | 22 | ARIMA(1,0,1) | 1.853629 | 2.345531 | 3.172278
nasdaq_100 | parkinson | 22 | GARCH(1,1) | 1.127312 | 1.337739 | 0.543594
nasdaq_100 | parkinson | 22 | HAR-RV | 1.046467 | 1.402988 | 0.991795
nasdaq_100 | yang_zhang | 1 | ARIMA(1,0,1) | 1.692844 | 2.352314 | 2.111175
nasdaq_100 | yang_zhang | 1 | GARCH(1,1) | 0.507771 | 0.653675 | 0.254583
nasdaq_100 | yang_zhang | 1 | HAR-RV | 0.049286 | 0.083513 | 0.003853
nasdaq_100 | yang_zhang | 5 | ARIMA(1,0,1) | 1.696449 | 2.356699 | 2.121979
nasdaq_100 | yang_zhang | 5 | GARCH(1,1) | 0.518426 | 0.660913 | 0.250222
nasdaq_100 | yang_zhang | 5 | HAR-RV | 0.181447 | 0.258763 | 0.037915
nasdaq_100 | yang_zhang | 22 | ARIMA(1,0,1) | 1.711911 | 2.375321 | 2.168591
nasdaq_100 | yang_zhang | 22 | GARCH(1,1) | 0.575185 | 0.707292 | 0.25044
nasdaq_100 | yang_zhang | 22 | HAR-RV | 0.767306 | 1.143081 | 0.550826
dow_jones | close | 1 | ARIMA(1,0,1) | 2.21004 | 2.958364 | 8.852452
dow_jones | close | 1 | GARCH(1,1) | 2.034265 | 2.888398 | 1.557542
dow_jones | close | 1 | HAR-RV | 1.873253 | 2.485833 | 4.554812
dow_jones | close | 5 | ARIMA(1,0,1) | 2.211575 | 2.960264 | 8.872977
dow_jones | close | 5 | GARCH(1,1) | 2.057825 | 2.916608 | 1.559809
dow_jones | close | 5 | HAR-RV | 1.917561 | 2.53866 | 5.925871
dow_jones | close | 22 | ARIMA(1,0,1) | 2.218175 | 2.968396 | 8.961539
dow_jones | close | 22 | GARCH(1,1) | 2.146747 | 3.016324 | 1.595059
dow_jones | close | 22 | HAR-RV | 1.899516 | 2.525767 | 5.345172
dow_jones | parkinson | 1 | ARIMA(1,0,1) | 1.587541 | 1.961046 | 1.521659
dow_jones | parkinson | 1 | GARCH(1,1) | 0.914155 | 1.195255 | 0.41325
dow_jones | parkinson | 1 | HAR-RV | 0.698655 | 0.951657 | 0.505822
dow_jones | parkinson | 5 | ARIMA(1,0,1) | 1.590467 | 1.964217 | 1.523238
dow_jones | parkinson | 5 | GARCH(1,1) | 0.951811 | 1.232999 | 0.436347
dow_jones | parkinson | 5 | HAR-RV | 0.764694 | 1.037907 | 0.682624
dow_jones | parkinson | 22 | ARIMA(1,0,1) | 1.602964 | 1.97774 | 1.530081
dow_jones | parkinson | 22 | GARCH(1,1) | 1.08666 | 1.369178 | 0.524821
dow_jones | parkinson | 22 | HAR-RV | 0.94467 | 1.25347 | 1.37604
dow_jones | yang_zhang | 1 | ARIMA(1,0,1) | 1.040494 | 1.282162 | 1.104241
dow_jones | yang_zhang | 1 | GARCH(1,1) | 0.622354 | 0.764667 | 0.28261
dow_jones | yang_zhang | 1 | HAR-RV | 0.047167 | 0.078626 | 0.003472
dow_jones | yang_zhang | 5 | ARIMA(1,0,1) | 1.041462 | 1.283341 | 1.106915
dow_jones | yang_zhang | 5 | GARCH(1,1) | 0.649438 | 0.783229 | 0.284522
dow_jones | yang_zhang | 5 | HAR-RV | 0.170321 | 0.243659 | 0.034213
dow_jones | yang_zhang | 22 | ARIMA(1,0,1) | 1.045579 | 1.288369 | 1.118378
dow_jones | yang_zhang | 22 | GARCH(1,1) | 0.760427 | 0.876905 | 0.317268
dow_jones | yang_zhang | 22 | HAR-RV | 0.697468 | 1.008185 | 1.51359
Table A2. Diebold–Mariano Test Results.
Index | RV | H | Model A | Model B | DM_Stat | p_Value
s&p_500 | close | 1 | ARIMA(1,0,1) | GARCH(1,1) | 11.95388 | 0
s&p_500 | close | 1 | ARIMA(1,0,1) | HAR-RV | 9.803763 | 0
s&p_500 | close | 1 | HAR-RV | GARCH(1,1) | 21.0731 | 0
s&p_500 | close | 5 | ARIMA(1,0,1) | GARCH(1,1) | 8.099955 | 4.44 × 10^-16
s&p_500 | close | 5 | ARIMA(1,0,1) | HAR-RV | 6.452851 | 1.10 × 10^-10
s&p_500 | close | 5 | HAR-RV | GARCH(1,1) | 14.78607 | 0
s&p_500 | close | 22 | ARIMA(1,0,1) | GARCH(1,1) | 4.3154571 | 1.59 × 10^-5
s&p_500 | close | 22 | ARIMA(1,0,1) | HAR-RV | 3.197967 | 0.001384
s&p_500 | close | 22 | HAR-RV | GARCH(1,1) | 6.057798 | 1.38 × 10^-9
s&p_500 | parkinson | 1 | ARIMA(1,0,1) | GARCH(1,1) | 14.14948 | 0
s&p_500 | parkinson | 1 | ARIMA(1,0,1) | HAR-RV | 14.07068 | 0
s&p_500 | parkinson | 1 | HAR-RV | GARCH(1,1) | 1.857147 | 0.06329
s&p_500 | parkinson | 5 | ARIMA(1,0,1) | GARCH(1,1) | 7.581582 | 3.42 × 10^-14
s&p_500 | parkinson | 5 | ARIMA(1,0,1) | HAR-RV | 6.74267 | 0.56 × 10^-11
s&p_500 | parkinson | 5 | HAR-RV | GARCH(1,1) | 4.786655 | 1.70 × 10^-6
s&p_500 | parkinson | 22 | ARIMA(1,0,1) | GARCH(1,1) | 3.836474 | 0.000125
s&p_500 | parkinson | 22 | ARIMA(1,0,1) | HAR-RV | 1.077854 | 0.281099
s&p_500 | parkinson | 22 | HAR-RV | GARCH(1,1) | 4.0776984 | 4.55 × 10^-5
s&p_500 | yang_zhang | 1 | ARIMA(1,0,1) | GARCH(1,1) | 19.56839 | 0
s&p_500 | yang_zhang | 1 | ARIMA(1,0,1) | HAR-RV | 27.12783 | 0
s&p_500 | yang_zhang | 1 | HAR-RV | GARCH(1,1) | −27.6376 | 0
s&p_500 | yang_zhang | 5 | ARIMA(1,0,1) | GARCH(1,1) | 8.862698 | 0
s&p_500 | yang_zhang | 5 | ARIMA(1,0,1) | HAR-RV | 11.86448 | 0
s&p_500 | yang_zhang | 5 | HAR-RV | GARCH(1,1) | −13.7568 | 0
s&p_500 | yang_zhang | 22 | ARIMA(1,0,1) | GARCH(1,1) | 4.318028 | 1.57 × 10^-5
s&p_500 | yang_zhang | 22 | ARIMA(1,0,1) | HAR-RV | −1.66134 | 0.096644
s&p_500 | yang_zhang | 22 | HAR-RV | GARCH(1,1) | 1.795406 | 0.072589
nasdaq_100 | close | 1 | ARIMA(1,0,1) | GARCH(1,1) | 11.72362 | 0
nasdaq_100 | close | 1 | ARIMA(1,0,1) | HAR-RV | 11.16776 | 0
nasdaq_100 | close | 1 | HAR-RV | GARCH(1,1) | 21.51898 | 0
nasdaq_100 | close | 5 | ARIMA(1,0,1) | GARCH(1,1) | 8.095809 | 4.44 × 10^-16
nasdaq_100 | close | 5 | ARIMA(1,0,1) | HAR-RV | 7.490017 | 6.88 × 10^-14
nasdaq_100 | close | 5 | HAR-RV | GARCH(1,1) | 5.670492 | 1.42 × 10^-8
nasdaq_100 | close | 22 | ARIMA(1,0,1) | GARCH(1,1) | 4.394012 | 1.11 × 10^-5
nasdaq_100 | close | 22 | ARIMA(1,0,1) | HAR-RV | 4.065009 | 4.80 × 10^-5
nasdaq_100 | close | 22 | HAR-RV | GARCH(1,1) | 6.726306 | 1.74 × 10^-11
nasdaq_100 | parkinson | 1 | ARIMA(1,0,1) | GARCH(1,1) | 15.02516 | 0
nasdaq_100 | parkinson | 1 | ARIMA(1,0,1) | HAR-RV | 15.64822 | 0
nasdaq_100 | parkinson | 1 | HAR-RV | GARCH(1,1) | −2.28995 | 0.022024
nasdaq_100 | parkinson | 5 | ARIMA(1,0,1) | GARCH(1,1) | 8.794855 | 0
nasdaq_100 | parkinson | 5 | ARIMA(1,0,1) | HAR-RV | 8.528795 | 0
nasdaq_100 | parkinson | 5 | HAR-RV | GARCH(1,1) | 4.159174 | 3.19 × 10^-5
nasdaq_100 | parkinson | 22 | ARIMA(1,0,1) | GARCH(1,1) | 4.720948 | 2.35 × 10^-6
nasdaq_100 | parkinson | 22 | ARIMA(1,0,1) | HAR-RV | 4.045961 | 5.21 × 10^-5
nasdaq_100 | parkinson | 22 | HAR-RV | GARCH(1,1) | 4.419376 | 9.90 × 10^-6
nasdaq_100 | yang_zhang | 1 | ARIMA(1,0,1) | GARCH(1,1) | 26.16152 | 0
nasdaq_100 | yang_zhang | 1 | ARIMA(1,0,1) | HAR-RV | 29.00147 | 0
nasdaq_100 | yang_zhang | 1 | HAR-RV | GARCH(1,1) | −21.0011 | 0
nasdaq_100 | yang_zhang | 5 | ARIMA(1,0,1) | GARCH(1,1) | 11.78916 | 0
nasdaq_100 | yang_zhang | 5 | ARIMA(1,0,1) | HAR-RV | 12.84313 | 0
nasdaq_100 | yang_zhang | 5 | HAR-RV | GARCH(1,1) | −9.30105 | 0
nasdaq_100 | yang_zhang | 22 | ARIMA(1,0,1) | GARCH(1,1) | 5.840253 | 5.21 × 10^-9
nasdaq_100 | yang_zhang | 22 | ARIMA(1,0,1) | HAR-RV | 5.004617 | 5.60 × 10^-7
nasdaq_100 | yang_zhang | 22 | HAR-RV | GARCH(1,1) | 6.228023 | 4.71 × 10^-10
dow_jones | close | 1 | ARIMA(1,0,1) | GARCH(1,1) | 10.67274 | 0
dow_jones | close | 1 | ARIMA(1,0,1) | HAR-RV | 6.429368 | 1.28 × 10^-10
dow_jones | close | 1 | HAR-RV | GARCH(1,1) | 21.93289 | 0
dow_jones | close | 5 | ARIMA(1,0,1) | GARCH(1,1) | 7.241718 | 4.43 × 10^-13
dow_jones | close | 5 | ARIMA(1,0,1) | HAR-RV | 2.83222 | 0.004623
dow_jones | close | 5 | HAR-RV | GARCH(1,1) | 11.67495 | 0
dow_jones | close | 22 | ARIMA(1,0,1) | GARCH(1,1) | 3.860148 | 0.000113
dow_jones | close | 22 | ARIMA(1,0,1) | HAR-RV | 1.91295 | 0.055754
dow_jones | close | 22 | HAR-RV | GARCH(1,1) | 4.412203 | 1.02 × 10^-5
dow_jones | parkinson | 1 | ARIMA(1,0,1) | GARCH(1,1) | 16.45357 | 0
dow_jones | parkinson | 1 | ARIMA(1,0,1) | HAR-RV | 15.16031 | 0
dow_jones | parkinson | 1 | HAR-RV | GARCH(1,1) | 4.832531 | 1.35 × 10^-6
dow_jones | parkinson | 5 | ARIMA(1,0,1) | GARCH(1,1) | 8.837937 | 0
dow_jones | parkinson | 5 | ARIMA(1,0,1) | HAR-RV | 6.730461 | 1.69 × 10^-11
dow_jones | parkinson | 5 | HAR-RV | GARCH(1,1) | 5.459573 | 4.77 × 10^-8
dow_jones | parkinson | 22 | ARIMA(1,0,1) | GARCH(1,1) | 4.542445 | 5.56 × 10^-6
dow_jones | parkinson | 22 | ARIMA(1,0,1) | HAR-RV | 0.555469 | 0.578574
dow_jones | parkinson | 22 | HAR-RV | GARCH(1,1) | 4.153913 | 3.27 × 10^-5
dow_jones | yang_zhang | 1 | ARIMA(1,0,1) | GARCH(1,1) | 16.25125 | 0
dow_jones | yang_zhang | 1 | ARIMA(1,0,1) | HAR-RV | 21.52219 | 0
dow_jones | yang_zhang | 1 | HAR-RV | GARCH(1,1) | −24.5077 | 0
dow_jones | yang_zhang | 5 | ARIMA(1,0,1) | GARCH(1,1) | 7.324583 | 2.40 × 10^-13
dow_jones | yang_zhang | 5 | ARIMA(1,0,1) | HAR-RV | 9.423073 | 0
dow_jones | yang_zhang | 5 | HAR-RV | GARCH(1,1) | −11.9976 | 0
dow_jones | yang_zhang | 22 | ARIMA(1,0,1) | GARCH(1,1) | 3.551459 | 0.000383
dow_jones | yang_zhang | 22 | ARIMA(1,0,1) | HAR-RV | −0.66664 | 0.505002
dow_jones | yang_zhang | 22 | HAR-RV | GARCH(1,1) | 2.185329 | 0.028865
Table A3. Forecast Error Values for Advanced Volatility Models (LSTM, CNN-LSTM, Transformer, PatchTST-lite) Across RV Estimators and Forecast Horizons (h = 1, 5, 22).
Index | RV | H | Model | MAE | RMSE | QLIKE
s&p_500 | close | 1 | LSTM | 1.863586 | 2.433881 | 4.608724
s&p_500 | close | 1 | CNN-LSTM | 1.878374 | 2.430279 | 4.927867
s&p_500 | close | 1 | Transformer | 1.81857 | 2.419457 | 3.866168
s&p_500 | close | 1 | PatchTST-lite | 1.845279 | 2.437099 | 4.244226
s&p_500 | close | 5 | LSTM | 1.906283 | 2.451163 | 5.255926
s&p_500 | close | 5 | CNN-LSTM | 1.898324 | 2.449351 | 5.276246
s&p_500 | close | 5 | Transformer | 1.910918 | 2.458969 | 5.579458
s&p_500 | close | 5 | PatchTST-lite | 1.870784 | 2.463561 | 4.807492
s&p_500 | close | 22 | LSTM | 1.888447 | 2.459761 | 4.807224
s&p_500 | close | 22 | CNN-LSTM | 1.896499 | 2.468959 | 4.7817
s&p_500 | close | 22 | Transformer | 1.892453 | 2.47475 | 4.881677
s&p_500 | close | 22 | PatchTST-lite | 1.901906 | 2.478622 | 5.532154
s&p_500 | parkinson | 1 | LSTM | 0.717675 | 0.909185 | 0.57689
s&p_500 | parkinson | 1 | CNN-LSTM | 0.709363 | 0.899914 | 0.573365
s&p_500 | parkinson | 1 | Transformer | 0.707107 | 0.897727 | 0.593991
s&p_500 | parkinson | 1 | PatchTST-lite | 0.716675 | 0.909292 | 0.610773
s&p_500 | parkinson | 5 | LSTM | 0.791176 | 1.001354 | 0.767843
s&p_500 | parkinson | 5 | CNN-LSTM | 0.802061 | 1.017191 | 0.884593
s&p_500 | parkinson | 5 | Transformer | 0.78312 | 0.992067 | 0.805681
s&p_500 | parkinson | 5 | PatchTST-lite | 0.790207 | 0.997929 | 0.785396
s&p_500 | parkinson | 22 | LSTM | 0.849273 | 1.061342 | 0.755378
s&p_500 | parkinson | 22 | CNN-LSTM | 0.86087 | 1.076602 | 0.839096
s&p_500 | parkinson | 22 | Transformer | 0.853919 | 1.073778 | 0.858656
s&p_500 | parkinson | 22 | PatchTST-lite | 0.846725 | 1.067773 | 0.989817
s&p_500 | yang_zhang | 1 | LSTM | 0.107 | 0.1561 | 0.013388
s&p_500 | yang_zhang | 1 | CNN-LSTM | 0.091686 | 0.13737 | 0.010077
s&p_500 | yang_zhang | 1 | Transformer | 0.095119 | 0.143745 | 0.011135
s&p_500 | yang_zhang | 1 | PatchTST-lite | 0.097809 | 0.146895 | 0.011331
s&p_500 | yang_zhang | 5 | LSTM | 0.213754 | 0.300934 | 0.057485
s&p_500 | yang_zhang | 5 | CNN-LSTM | 0.203555 | 0.28591 | 0.051547
s&p_500 | yang_zhang | 5 | Transformer | 0.221165 | 0.303101 | 0.05141
s&p_500 | yang_zhang | 5 | PatchTST-lite | 0.226451 | 0.308445 | 0.053766
s&p_500 | yang_zhang | 22 | LSTM | 0.504765 | 0.666804 | 0.311311
s&p_500 | yang_zhang | 22 | CNN-LSTM | 0.499061 | 0.658474 | 0.297989
s&p_500 | yang_zhang | 22 | Transformer | 0.500213 | 0.659152 | 0.280434
s&p_500 | yang_zhang | 22 | PatchTST-lite | 0.51695 | 0.693152 | 0.323325
nasdaq_100 | close | 1 | LSTM | 1.846054 | 2.377758 | 4.0135
nasdaq_100 | close | 1 | CNN-LSTM | 1.857555 | 2.382425 | 4.35926
nasdaq_100 | close | 1 | Transformer | 1.875776 | 2.380452 | 4.597229
nasdaq_100 | close | 1 | PatchTST-lite | 1.851063 | 2.373445 | 4.46793
nasdaq_100 | close | 5 | LSTM | 1.868726 | 2.387098 | 4.695791
nasdaq_100 | close | 5 | CNN-LSTM | 1.839794 | 2.394148 | 3.875657
nasdaq_100 | close | 5 | Transformer | 1.818934 | 2.375125 | 3.565255
nasdaq_100 | close | 5 | PatchTST-lite | 1.822905 | 2.38162 | 3.837673
nasdaq_100 | close | 22 | LSTM | 1.91853 | 2.417123 | 5.293182
nasdaq_100 | close | 22 | CNN-LSTM | 1.921711 | 2.423189 | 5.222672
nasdaq_100 | close | 22 | Transformer | 1.883382 | 2.416631 | 4.564707
nasdaq_100 | close | 22 | PatchTST-lite | 1.869928 | 2.408149 | 4.328153
nasdaq_100 | parkinson | 1 | LSTM | 0.656474 | 0.836234 | 0.48399
nasdaq_100 | parkinson | 1 | CNN-LSTM | 0.672187 | 0.85334 | 0.519447
nasdaq_100 | parkinson | 1 | Transformer | 0.656146 | 0.835782 | 0.495191
nasdaq_100 | parkinson | 1 | PatchTST-lite | 0.651614 | 0.834936 | 0.506241
nasdaq_100 | parkinson | 5 | LSTM | 0.71256 | 0.907154 | 0.619259
nasdaq_100 | parkinson | 5 | CNN-LSTM | 0.720531 | 0.917135 | 0.66056
nasdaq_100 | parkinson | 5 | Transformer | 0.721002 | 0.911427 | 0.577625
nasdaq_100 | parkinson | 5 | PatchTST-lite | 0.721852 | 0.921964 | 0.697614
nasdaq_100 | parkinson | 22 | LSTM | 0.758366 | 0.964748 | 0.703118
nasdaq_100 | parkinson | 22 | CNN-LSTM | 0.771627 | 0.982671 | 0.721489
nasdaq_100 | parkinson | 22 | Transformer | 0.789826 | 1.006427 | 0.827204
nasdaq_100 | parkinson | 22 | PatchTST-lite | 0.775539 | 0.978418 | 0.641463
nasdaq_100 | yang_zhang | 1 | LSTM | 0.095747 | 0.141525 | 0.010995
nasdaq_100 | yang_zhang | 1 | CNN-LSTM | 0.090325 | 0.135626 | 0.00953
nasdaq_100 | yang_zhang | 1 | Transformer | 0.090233 | 0.133595 | 0.009185
nasdaq_100 | yang_zhang | 1 | PatchTST-lite | 0.084479 | 0.126826 | 0.00867
nasdaq_100 | yang_zhang | 5 | LSTM | 0.201309 | 0.283641 | 0.049458
nasdaq_100 | yang_zhang | 5 | CNN-LSTM | 0.196577 | 0.281939 | 0.049162
nasdaq_100 | yang_zhang | 5 | Transformer | 0.196317 | 0.276675 | 0.043372
nasdaq_100 | yang_zhang | 5 | PatchTST-lite | 0.205044 | 0.280398 | 0.044174
nasdaq_100 | yang_zhang | 22 | LSTM | 0.468609 | 0.629263 | 0.281611
nasdaq_100 | yang_zhang | 22 | CNN-LSTM | 0.498324 | 0.656997 | 0.290477
nasdaq_100 | yang_zhang | 22 | Transformer | 0.478947 | 0.629071 | 0.262585
nasdaq_100 | yang_zhang | 22 | PatchTST-lite | 0.500796 | 0.671027 | 0.334713
dow_jones | close | 1 | LSTM | 1.847802 | 2.443665 | 4.426976
dow_jones | close | 1 | CNN-LSTM | 1.861948 | 2.444142 | 4.7542
dow_jones | close | 1 | Transformer | 1.83436 | 2.431458 | 4.090214
dow_jones | close | 1 | PatchTST-lite | 1.838722 | 2.442273 | 4.400191
dow_jones | close | 5 | LSTM | 1.854053 | 2.45159 | 4.648342
dow_jones | close | 5 | CNN-LSTM | 1.840471 | 2.445538 | 4.327708
dow_jones | close | 5 | Transformer | 1.847194 | 2.434596 | 4.547064
dow_jones | close | 5 | PatchTST-lite | 1.860548 | 2.453651 | 5.127373
dow_jones | close | 22 | LSTM | 1.903721 | 2.4594 | 5.574053
dow_jones | close | 22 | CNN-LSTM | 1.865801 | 2.448277 | 4.67001
dow_jones | close | 22 | Transformer | 1.837964 | 2.443301 | 4.284449
dow_jones | close | 22 | PatchTST-lite | 1.846041 | 2.449109 | 4.300733
dow_jones | parkinson | 1 | LSTM | 0.66404 | 0.837485 | 0.481757
dow_jones | parkinson | 1 | CNN-LSTM | 0.666295 | 0.839295 | 0.468118
dow_jones | parkinson | 1 | Transformer | 0.66793 | 0.843683 | 0.510677
dow_jones | parkinson | 1 | PatchTST-lite | 0.674127 | 0.846206 | 0.460885
dow_jones | parkinson | 5 | LSTM | 0.714868 | 0.905726 | 0.609055
dow_jones | parkinson | 5 | CNN-LSTM | 0.720986 | 0.911346 | 0.629025
dow_jones | parkinson | 5 | Transformer | 0.719854 | 0.905898 | 0.592983
dow_jones | parkinson | 5 | PatchTST-lite | 0.724323 | 0.912757 | 0.635256
dow_jones | parkinson | 22 | LSTM | 0.765173 | 0.960267 | 0.612206
dow_jones | parkinson | 22 | CNN-LSTM | 0.781241 | 0.983521 | 0.658715
dow_jones | parkinson | 22 | Transformer | 0.76393 | 0.971118 | 0.835195
dow_jones | parkinson | 22 | PatchTST-lite | 0.76366 | 0.968694 | 0.647512
dow_jones | yang_zhang | 1 | LSTM | 0.085828 | 0.125242 | 0.008233
dow_jones | yang_zhang | 1 | CNN-LSTM | 0.085122 | 0.128343 | 0.008456
dow_jones | yang_zhang | 1 | Transformer | 0.092888 | 0.13516 | 0.009149
dow_jones | yang_zhang | 1 | PatchTST-lite | 0.085793 | 0.126192 | 0.008315
dow_jones | yang_zhang | 5 | LSTM | 0.183967 | 0.254589 | 0.038657
dow_jones | yang_zhang | 5 | CNN-LSTM | 0.189308 | 0.265428 | 0.043682
dow_jones | yang_zhang | 5 | Transformer | 0.194017 | 0.275715 | 0.042837
dow_jones | yang_zhang | 5 | PatchTST-lite | 0.19478 | 0.279234 | 0.046453
dow_jones | yang_zhang | 22 | LSTM | 0.455768 | 0.604879 | 0.226007
dow_jones | yang_zhang | 22 | CNN-LSTM | 0.455406 | 0.602958 | 0.230627
dow_jones | yang_zhang | 22 | Transformer | 0.456496 | 0.601709 | 0.223307
dow_jones | yang_zhang | 22 | PatchTST-lite | 0.464264 | 0.620634 | 0.264777
Table A4. Diebold–Mariano Test Results for Advanced Forecasting Models.
Index | RV | H | Model1 | Model2 | DM_Stat | p_Value
s&p_500 | close | 1 | LSTM | CNN-LSTM | −4.05481 | 5.02 × 10^-5
s&p_500 | close | 1 | LSTM | Transformer | 6.614859 | 3.72 × 10^-11
s&p_500 | close | 1 | LSTM | PatchTST-lite | 2.267691 | 0.023348
s&p_500 | close | 1 | CNN-LSTM | Transformer | 7.853962 | 4.00 × 10^-15
s&p_500 | close | 1 | CNN-LSTM | PatchTST-lite | 4.36369 | 1.28 × 10^-5
s&p_500 | close | 1 | Transformer | PatchTST-lite | −2.49469 | 0.012607
s&p_500 | close | 5 | LSTM | CNN-LSTM | −0.26253 | 0.792914
s&p_500 | close | 5 | LSTM | Transformer | −1.4326 | 0.151972
s&p_500 | close | 5 | LSTM | PatchTST-lite | 2.133038 | 0.032922
s&p_500 | close | 5 | CNN-LSTM | Transformer | −1.2994 | 0.193806
s&p_500 | close | 5 | CNN-LSTM | PatchTST-lite | 2.131552 | 0.033044
s&p_500 | close | 5 | Transformer | PatchTST-lite | 6.246839 | 4.19 × 10^-10
s&p_500 | close | 22 | LSTM | CNN-LSTM | 0.197285 | 0.843604
s&p_500 | close | 22 | LSTM | Transformer | −0.47491 | 0.634848
s&p_500 | close | 22 | LSTM | PatchTST-lite | −1.46417 | 0.143146
s&p_500 | close | 22 | CNN-LSTM | Transformer | −0.40941 | 0.682241
s&p_500 | close | 22 | CNN-LSTM | PatchTST-lite | −1.22987 | 0.218746
s&p_500 | close | 22 | Transformer | PatchTST-lite | −1.48687 | 0.13705
s&p_500 | parkinson | 1 | LSTM | CNN-LSTM | 0.855671 | 0.39218
s&p_500 | parkinson | 1 | LSTM | Transformer | −1.46178 | 0.143801
s&p_500 | parkinson | 1 | LSTM | PatchTST-lite | −3.69061 | 0.000224
s&p_500 | parkinson | 1 | CNN-LSTM | Transformer | −2.02616 | 0.042749
s&p_500 | parkinson | 1 | CNN-LSTM | PatchTST-lite | −4.11184 | 3.93 × 10^-5
s&p_500 | parkinson | 1 | Transformer | PatchTST-lite | −1.90779 | 0.056418
s&p_500 | parkinson | 5 | LSTM | CNN-LSTM | −4.75083 | 2.03 × 10^-6
s&p_500 | parkinson | 5 | LSTM | Transformer | −2.01915 | 0.043472
s&p_500 | parkinson | 5 | LSTM | PatchTST-lite | −1.02773 | 0.304077
s&p_500 | parkinson | 5 | CNN-LSTM | Transformer | 2.838139 | 0.004538
s&p_500 | parkinson | 5 | CNN-LSTM | PatchTST-lite | 2.969584 | 0.002982
s&p_500 | parkinson | 5 | Transformer | PatchTST-lite | 1.224402 | 0.220801
s&p_500 | parkinson | 22 | LSTM | CNN-LSTM | −3.26076 | 0.001111
s&p_500 | parkinson | 22 | LSTM | Transformer | −1.85124 | 0.064134
s&p_500 | parkinson | 22 | LSTM | PatchTST-lite | −1.96586 | 0.049315
s&p_500 | parkinson | 22 | CNN-LSTM | Transformer | −0.27871 | 0.780467
s&p_500 | parkinson | 22 | CNN-LSTM | PatchTST-lite | −1.1249 | 0.260633
s&p_500 | parkinson | 22 | Transformer | PatchTST-lite | −1.70589 | 0.088029
s&p_500 | yang_zhang | 1 | LSTM | CNN-LSTM | 6.645132 | 3.03 × 10^-11
s&p_500 | yang_zhang | 1 | LSTM | Transformer | 4.87451 | 1.09 × 10^-6
s&p_500 | yang_zhang | 1 | LSTM | PatchTST-lite | 4.128009 | 3.66 × 10^-5
s&p_500 | yang_zhang | 1 | CNN-LSTM | Transformer | −4.96016 | 7.04 × 10^-7
s&p_500 | yang_zhang | 1 | CNN-LSTM | PatchTST-lite | −4.48589 | 7.26 × 10^-6
s&p_500 | yang_zhang | 1 | Transformer | PatchTST-lite | −0.78763 | 0.430911
s&p_500 | yang_zhang | 5 | LSTM | CNN-LSTM | 3.053755 | 0.00226
s&p_500 | yang_zhang | 5 | LSTM | Transformer | 0.935302 | 0.349633
s&p_500 | yang_zhang | 5 | LSTM | PatchTST-lite | 0.641425 | 0.521246
s&p_500 | yang_zhang | 5 | CNN-LSTM | Transformer | 0.025789 | 0.979426
s&p_500 | yang_zhang | 5 | CNN-LSTM | PatchTST-lite | −0.46598 | 0.64123
s&p_500 | yang_zhang | 5 | Transformer | PatchTST-lite | −1.16236 | 0.24509
s&p_500 | yang_zhang | 22 | LSTM | CNN-LSTM | 1.701804 | 0.088792
s&p_500 | yang_zhang | 22 | LSTM | Transformer | 1.773549 | 0.076138
s&p_500 | yang_zhang | 22 | LSTM | PatchTST-lite | −0.54716 | 0.584271
s&p_500 | yang_zhang | 22 | CNN-LSTM | Transformer | 1.011579 | 0.31174
s&p_500 | yang_zhang | 22 | CNN-LSTM | PatchTST-lite | −1.21484 | 0.224426
s&p_500 | yang_zhang | 22 | Transformer | PatchTST-lite | −2.5074 | 0.012162
nasdaq_100 | close | 1 | LSTM | CNN-LSTM | −2.90702 | 0.003649
nasdaq_100 | close | 1 | LSTM | Transformer | −6.05367 | 1.42 × 10^-9
nasdaq_100 | close | 1 | LSTM | PatchTST-lite | −3.57979 | 0.000344
nasdaq_100 | close | 1 | CNN-LSTM | Transformer | −1.3599 | 0.173862
nasdaq_100 | close | 1 | CNN-LSTM | PatchTST-lite | −0.59893 | 0.54922
nasdaq_100 | close | 1 | Transformer | PatchTST-lite | 0.991723 | 0.321333
nasdaq_100 | close | 5 | LSTM | CNN-LSTM | 4.444677 | 8.80 × 10^-6
nasdaq_100 | close | 5 | LSTM | Transformer | 6.772077 | 1.27 × 10^-11
nasdaq_100 | close | 5 | LSTM | PatchTST-lite | 7.034635 | 2.00 × 10^-12
nasdaq_100 | close | 5 | CNN-LSTM | Transformer | 3.247749 | 0.001163
nasdaq_100 | close | 5 | CNN-LSTM | PatchTST-lite | 0.254486 | 0.79912
nasdaq_100 | close | 5 | Transformer | PatchTST-lite | −2.50227 | 0.01234
nasdaq_100 | close | 22 | LSTM | CNN-LSTM | 0.751886 | 0.45212
nasdaq_100 | close | 22 | LSTM | Transformer | 4.855544 | 1.2 × 10^-6
nasdaq_100 | close | 22 | LSTM | PatchTST-lite | 5.676986 | 1.37 × 10^-8
nasdaq_100 | close | 22 | CNN-LSTM | Transformer | 3.625774 | 0.000288
nasdaq_100 | close | 22 | CNN-LSTM | PatchTST-lite | 4.26292 | 2.02 × 10^-5
nasdaq_100 | close | 22 | Transformer | PatchTST-lite | 2.224686 | 0.026102
nasdaq_100 | parkinson | 1 | LSTM | CNN-LSTM | −6.68295 | 2.34 × 10^-11
nasdaq_100 | parkinson | 1 | LSTM | Transformer | −1.75878 | 0.078615
nasdaq_100 | parkinson | 1 | LSTM | PatchTST-lite | −4.12464 | 3.71 × 10^-5
nasdaq_100 | parkinson | 1 | CNN-LSTM | Transformer | 2.736359 | 0.006212
nasdaq_100 | parkinson | 1 | CNN-LSTM | PatchTST-lite | 1.609056 | 0.107604
nasdaq_100 | parkinson | 1 | Transformer | PatchTST-lite | −1.63946 | 0.101117
nasdaq_100 | parkinson | 5 | LSTM | CNN-LSTM | −2.88732 | 0.003885
nasdaq_100 | parkinson | 5 | LSTM | Transformer | 2.775087 | 0.005519
nasdaq_100 | parkinson | 5 | LSTM | PatchTST-lite | −3.86848 | 0.00011
nasdaq_100 | parkinson | 5 | CNN-LSTM | Transformer | 3.458512 | 0.000543
nasdaq_100 | parkinson | 5 | CNN-LSTM | PatchTST-lite | −2.22442 | 0.02612
nasdaq_100 | parkinson | 5 | Transformer | PatchTST-lite | −4.08435 | 4.42 × 10^-5
nasdaq_100 | parkinson | 22 | LSTM | CNN-LSTM | −0.45979 | 0.645669
nasdaq_100 | parkinson | 22 | LSTM | Transformer | −3.76484 | 0.000167
nasdaq_100 | parkinson | 22 | LSTM | PatchTST-lite | 2.589052 | 0.009624
nasdaq_100 | parkinson | 22 | CNN-LSTM | Transformer | −1.7318 | 0.08331
nasdaq_100 | parkinson | 22 | CNN-LSTM | PatchTST-lite | 1.489599 | 0.13633
nasdaq_100 | parkinson | 22 | Transformer | PatchTST-lite | 5.309007 | 1.10 × 10^-7
nasdaq_100 | yang_zhang | 1 | LSTM | CNN-LSTM | 3.224419 | 0.001262
nasdaq_100 | yang_zhang | 1 | LSTM | Transformer | 3.479625 | 0.000502
nasdaq_100 | yang_zhang | 1 | LSTM | PatchTST-lite | 5.313043 | 1.08 × 10^-7
nasdaq_100 | yang_zhang | 1 | CNN-LSTM | Transformer | 1.813045 | 0.069825
nasdaq_100 | yang_zhang | 1 | CNN-LSTM | PatchTST-lite | 2.946305 | 0.003216
nasdaq_100 | yang_zhang | 1 | Transformer | PatchTST-lite | 2.026179 | 0.042746
nasdaq_100 | yang_zhang | 5 | LSTM | CNN-LSTM | 0.271497 | 0.786009
nasdaq_100 | yang_zhang | 5 | LSTM | Transformer | 1.685478 | 0.091896
nasdaq_100 | yang_zhang | 5 | LSTM | PatchTST-lite | 1.625618 | 0.104031
nasdaq_100 | yang_zhang | 5 | CNN-LSTM | Transformer | 1.678537 | 0.093242
nasdaq_100 | yang_zhang | 5 | CNN-LSTM | PatchTST-lite | 1.689702 | 0.091085
nasdaq_100 | yang_zhang | 5 | Transformer | PatchTST-lite | −0.49622 | 0.61974
nasdaq_100 | yang_zhang | 22 | LSTM | CNN-LSTM | −0.6793 | 0.49695
nasdaq_100 | yang_zhang | 22 | LSTM | Transformer | 1.328308 | 0.184076
nasdaq_100 | yang_zhang | 22 | LSTM | PatchTST-lite | −1.98504 | 0.04714
nasdaq_100 | yang_zhang | 22 | CNN-LSTM | Transformer | 1.576398 | 0.114934
nasdaq_100 | yang_zhang | 22 | CNN-LSTM | PatchTST-lite | −1.36517 | 0.172199
nasdaq_100 | yang_zhang | 22 | Transformer | PatchTST-lite | −2.80679 | 0.005004
dow_jones | close | 1 | LSTM | CNN-LSTM | −2.49616 | 0.012555
dow_jones | close | 1 | LSTM | Transformer | 1.802849 | 0.071412
dow_jones | close | 1 | LSTM | PatchTST-lite | 0.281314 | 0.77847
dow_jones | close | 1 | CNN-LSTM | Transformer | 6.027225 | 1.67 × 10^-9
dow_jones | close | 1 | CNN-LSTM | PatchTST-lite | 2.26839 | 0.023305
dow_jones | close | 1 | Transformer | PatchTST-lite | −1.50088 | 0.133387
dow_jones | close | 5 | LSTM | CNN-LSTM | 1.349998 | 0.177017
dow_jones | close | 5 | LSTM | Transformer | 0.628707 | 0.529541
dow_jones | close | 5 | LSTM | PatchTST-lite | −2.80439 | 0.005041
dow_jones | close | 5 | CNN-LSTM | Transformer | −1.67281 | 0.094365
dow_jones | close | 5 | CNN-LSTM | PatchTST-lite | −2.24626 | 0.024687
dow_jones | close | 5 | Transformer | PatchTST-lite | −2.17685 | 0.029492
dow_jones | close | 22 | LSTM | CNN-LSTM | 6.345317 | 2.22 × 10^-10
dow_jones | close | 22 | LSTM | Transformer | 8.178957 | 4.44 × 10^-16
dow_jones | close | 22 | LSTM | PatchTST-lite | 7.197178 | 6.15 × 10^-13
dow_jones | close | 22 | CNN-LSTM | Transformer | 2.343471 | 0.019105
dow_jones | close | 22 | CNN-LSTM | PatchTST-lite | 2.358735 | 0.018337
dow_jones | close | 22 | Transformer | PatchTST-lite | −0.16868 | 0.86605
dow_jones | parkinson | 1 | LSTM | CNN-LSTM | 4.47613 | 7.60 × 10^-6
dow_jones | parkinson | 1 | LSTM | Transformer | −4.05704 | 4.97 × 10^-5
dow_jones | parkinson | 1 | LSTM | PatchTST-lite | 2.775005 | 0.00552
dow_jones | parkinson | 1 | CNN-LSTM | Transformer | −5.7337 | 9.83 × 10^-9
dow_jones | parkinson | 1 | CNN-LSTM | PatchTST-lite | 0.97873 | 0.327713
dow_jones | parkinson | 1 | Transformer | PatchTST-lite | 7.044831 | 1.86 × 10^-12
dow_jones | parkinson | 5 | LSTM | CNN-LSTM | −1.40625 | 0.159649
dow_jones | parkinson | 5 | LSTM | Transformer | 1.218164 | 0.223162
dow_jones | parkinson | 5 | LSTM | PatchTST-lite | −1.33656 | 0.181365
dow_jones | parkinson | 5 | CNN-LSTM | Transformer | 1.754444 | 0.079355
dow_jones | parkinson | 5 | CNN-LSTM | PatchTST-lite | −0.60848 | 0.542869
dow_jones | parkinson | 5 | Transformer | PatchTST-lite | −1.97576 | 0.048182
dow_jones | parkinson | 22 | LSTM | CNN-LSTM | −1.94776 | 0.051444
dow_jones | parkinson | 22 | LSTM | Transformer | −2.20136 | 0.027711
dow_jones | parkinson | 22 | LSTM | PatchTST-lite | −1.07218 | 0.283641
dow_jones | parkinson | 22 | CNN-LSTM | Transformer | −1.4778 | 0.139462
dow_jones | parkinson | 22 | CNN-LSTM | PatchTST-lite | 0.216919 | 0.828272
dow_jones | parkinson | 22 | Transformer | PatchTST-lite | 2.620095 | 0.008791
dow_jones | yang_zhang | 1 | LSTM | CNN-LSTM | −0.96838 | 0.332854
dow_jones | yang_zhang | 1 | LSTM | Transformer | −2.77878 | 0.005456
dow_jones | yang_zhang | 1 | LSTM | PatchTST-lite | −0.34766 | 0.728096
dow_jones | yang_zhang | 1 | CNN-LSTM | Transformer | −3.55754 | 0.000374
dow_jones | yang_zhang | 1 | CNN-LSTM | PatchTST-lite | 0.728885 | 0.466072
dow_jones | yang_zhang | 1 | Transformer | PatchTST-lite | 3.162585 | 0.001564
dow_jones | yang_zhang | 5 | LSTM | CNN-LSTM | −2.59626 | 0.009425
dow_jones | yang_zhang | 5 | LSTM | Transformer | −1.15066 | 0.249871
dow_jones | yang_zhang | 5 | LSTM | PatchTST-lite | −3.27493 | 0.001057
dow_jones | yang_zhang | 5 | CNN-LSTM | Transformer | 0.168915 | 0.865864
dow_jones | yang_zhang | 5 | CNN-LSTM | PatchTST-lite | −0.82035 | 0.412014
dow_jones | yang_zhang | 5 | Transformer | PatchTST-lite | −1.61533 | 0.106239
dow_jones | yang_zhang | 22 | LSTM | CNN-LSTM | −1.39544 | 0.162883
dow_jones | yang_zhang | 22 | LSTM | Transformer | 0.255507 | 0.798332
dow_jones | yang_zhang | 22 | LSTM | PatchTST-lite | −1.66104 | 0.096706
dow_jones | yang_zhang | 22 | CNN-LSTM | Transformer | 0.696543 | 0.486089
dow_jones | yang_zhang | 22 | CNN-LSTM | PatchTST-lite | −1.50736 | 0.131719
dow_jones | yang_zhang | 22 | Transformer | PatchTST-lite | −2.21077 | 0.027052

References

  1. Andersen, T. G., Bollerslev, T., Diebold, F. X., & Ebens, H. (2001). The distribution of realized stock return volatility. Journal of Financial Economics, 61(1), 43–76. [Google Scholar] [CrossRef]
  2. Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2003). Modeling and forecasting realized volatility. Econometrica, 71(2), 579–625. [Google Scholar] [CrossRef]
  3. Asgharian, H., Hou, A. J., & Javed, F. (2013). The importance of macroeconomic variables in forecasting stock return variance: A GARCH-MIDAS approach. Journal of Forecasting, 32(7), 600–612. [Google Scholar] [CrossRef]
  4. Barndorff-Nielsen, O., & Shephard, N. (2004). Power and Bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics, 2(1), 1–37. [Google Scholar] [CrossRef]
  5. Barndorff-Nielsen, O. E., & Shephard, N. (2002). Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B, 64(2), 253–280. Available online: https://www.jstor.org/stable/3088799 (accessed on 25 September 2025). [CrossRef]
  6. Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3), 307–327. [Google Scholar] [CrossRef]
  7. Borovykh, A., Bohte, S., & Oosterlee, C. W. (2017). Conditional time series forecasting with convolutional neural networks. arXiv, arXiv:1703.04691. [Google Scholar] [CrossRef]
  8. Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis: Forecasting and control. Holden-Day. [Google Scholar]
  9. Brugiere, P., & Turinici, G. (2025). Transformer for time series: An application to the S&P500. In K. Arai (Ed.), Advances in information and communication (Vol. 1285). Springer. [Google Scholar] [CrossRef]
  10. Chun, D., Cho, H., & Ryu, D. (2025). Volatility forecasting and volatility-timing strategies: A machine learning approach. Research in International Business and Finance, 75, 102723. [Google Scholar] [CrossRef]
  11. Corsi, F. (2005). Measuring and modelling realized volatility: From tick-by-tick to long memory [Ph.D. thesis, Faculty of Economics, University of Lugano]. Available online: https://susi.usi.ch/usi/documents/317904 (accessed on 20 September 2025).
  12. Corsi, F. (2009). A simple approximate long-memory model of realized volatility. Journal of Financial Econometrics, 7(2), 174–196. [Google Scholar] [CrossRef]
  13. Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13(3), 253–263. [Google Scholar] [CrossRef]
  14. Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4), 987–1007. [Google Scholar] [CrossRef]
  15. Engle, R. F., Ghysels, E., & Sohn, B. (2013). Stock market volatility and macroeconomic fundamentals. Review of Economics and Statistics, 95(3), 776–797. Available online: http://www.jstor.org/stable/43554794 (accessed on 20 September 2025). [CrossRef]
  16. Ersin, Ö. Ö., & Bildirici, M. (2023). Financial volatility modeling with the GARCH-MIDAS-LSTM approach: The effects of economic expectations, geopolitical risks and industrial production during COVID-19. Mathematics, 11(8), 1785. [Google Scholar] [CrossRef]
  17. Ferreira, I. H., & Medeiros, M. C. (2021). Modeling and forecasting intraday market returns: A machine learning approach. arXiv, arXiv:2112.15108. [Google Scholar] [CrossRef]
  18. Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2017). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222–2232. [Google Scholar] [CrossRef]
  19. Harikumar, Y., & Muthumeenakshi, M. (2025). An innovative study on stock price prediction for investment decision through ARIMA and LSTM with recurrent neural network. New Mathematics and Natural Computation, 21(3), 763–783. [Google Scholar] [CrossRef]
  20. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  21. Investing.com. (2025). Historical data for S&P 500, NASDAQ 100, and Dow Jones industrial average. Available online: https://www.investing.com/ (accessed on 5 September 2025).
  22. Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica, 59(2), 347–370. [Google Scholar] [CrossRef]
  23. Nie, Y., Nguyen, N. H., Sinthong, P., & Kalagnanam, J. (2023). A time series is worth 64 words: Long-term forecasting with transformers. arXiv, arXiv:2211.14730. [Google Scholar] [CrossRef]
  24. Parkinson, M. (1980). The extreme value method for estimating the variance of the rate of return. Journal of Business, 53(1), 61–65. Available online: https://www.jstor.org/stable/2352357 (accessed on 15 September 2025). [CrossRef]
  25. Patton, A. J. (2011). Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics, 160(1), 246–256. [Google Scholar] [CrossRef]
  26. Patton, A. J., & Sheppard, K. (2009). Optimal combinations of realised volatility estimators. International Journal of Forecasting, 25(2), 218–238. Available online: https://www.sciencedirect.com/science/article/pii/S0169207009000107#sec2 (accessed on 15 September 2025). [CrossRef]
  27. Patton, A. J., & Sheppard, K. (2015). Good volatility, bad volatility: Signed jumps and the persistence of volatility. Review of Economics and Statistics, 97(3), 683–697. [Google Scholar] [CrossRef]
  28. Shi, X. J., Chen, Z. R., Wang, H., Yeung, D. Y., Wong, W. K., & Woo, W. C. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Advances in Neural Information Processing Systems, 28, 802–810. [Google Scholar] [CrossRef]
  29. Souto, H. G., & Moradi, A. (2024). Can transformers transform financial forecasting? China Finance Review International. ahead-of-print. [Google Scholar] [CrossRef]
  30. Tauchen, G., & Zhou, H. (2011). Realized jumps on financial markets and predicting credit spreads (FEDS Working Paper 2006-35). SSRN. [Google Scholar] [CrossRef]
  31. Taylor, S. J. (2005). Asset price dynamics, volatility, and prediction. Princeton University Press. [Google Scholar]
  32. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., & Polosukhin, I. (2017). Attention Is All You Need. arXiv, arXiv:1706.03762. [Google Scholar] [CrossRef]
  33. Virk, N., Javed, F., Awartani, B., & Hyde, S. (2024). A reality check on the GARCH-MIDAS volatility models. European Journal of Finance, 30(6), 575–596. [Google Scholar] [CrossRef]
  34. Yang, D., & Zhang, Q. (2000). Drift-independent volatility estimation based on high, low, open, and close prices. Journal of Business, 73(3), 477–491. [Google Scholar] [CrossRef]
  35. Zeng, Z., Kaur, R., Siddagangappa, S., Rahimi, S., Balch, T., & Veloso, M. (2023). Financial time series forecasting using CNN and transformer. arXiv, arXiv:2304.04912. [Google Scholar] [CrossRef]
  36. Zhang, L., Mykland, P. A., & Aït-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association, 100(472), 1394–1411. Available online: https://www.jstor.org/stable/27590680 (accessed on 20 September 2025).
  37. Zhang, Y., Zhang, T., & Hu, J. (2025). Forecasting stock market volatility using CNN-BiLSTM-attention model with mixed-frequency data. Mathematics, 13(11), 1889. [Google Scholar] [CrossRef]
  38. Zhang, Z., Chen, B., Zhu, S., & Langrené, N. (2025). Quantformer: From attention to profit with a quantitative transformer trading strategy. arXiv, arXiv:2404.00424. [Google Scholar] [CrossRef]
Figure 1. Framework of the comparative volatility forecasting architecture.
Figure 2. Forecast accuracy of classical volatility models based on mean errors (MAE, RMSE, QLIKE). Mean forecasting errors of ARIMA(1,0,1), GARCH(1,1), and HAR-RV models across RV estimators (Close, Parkinson, Yang–Zhang), averaged across indices and forecast horizons (h = 1, 5, 22). Lower bars indicate higher predictive accuracy. HAR-RV consistently achieves the lowest MAE and RMSE values, followed by GARCH(1,1), while ARIMA(1,0,1) performs weakest across all metrics. A full comparison of MAE, RMSE, and QLIKE for all classical models and all RV estimators is reported in Appendix A Table A1. Figures produced in Python 3.10 using Matplotlib 3.10.7.
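The three error metrics compared in Figure 2 can be written down compactly. The following is a minimal sketch (not the paper's exact code) of MAE, RMSE, and the QLIKE loss in the Patton (2011) form RV/F − log(RV/F) − 1, which is zero when the forecast F equals realized variance and penalizes under-prediction more heavily than over-prediction:

```python
import numpy as np

def qlike(rv, forecast):
    """QLIKE loss (Patton 2011 form): rv/f - log(rv/f) - 1.
    Zero at a perfect forecast; asymmetric, punishing under-prediction more."""
    r = np.asarray(rv, dtype=float) / np.asarray(forecast, dtype=float)
    return float(np.mean(r - np.log(r) - 1.0))

def rmse(rv, forecast):
    """Root mean squared error between realized variance and forecast."""
    return float(np.sqrt(np.mean((np.asarray(rv) - np.asarray(forecast)) ** 2)))

def mae(rv, forecast):
    """Mean absolute error between realized variance and forecast."""
    return float(np.mean(np.abs(np.asarray(rv) - np.asarray(forecast))))

rv = np.array([1.0, 2.0, 1.5])
perfect = qlike(rv, rv)              # 0.0 by construction
under = qlike(rv, 0.5 * rv)          # under-prediction: larger penalty
over = qlike(rv, 2.0 * rv)           # over-prediction: smaller penalty
```

The asymmetry is why QLIKE is preferred for volatility evaluation: under-forecasting variance is the economically costlier mistake in risk management.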
Figure 3. Diebold–Mariano p-values across forecast horizons for classical volatility models (ARIMA(1,0,1), HAR-RV, and GARCH(1,1)). Darker colors correspond to smaller p-values (approaching p ≈ 0), indicating stronger statistical significance, as shown by the continuous color scale. The detailed DM results for the classical models are reported in Appendix A Table A2. Figures produced in Python 3.10 using Matplotlib (version 3.10.7) and Seaborn (version 0.13.2) for uniform visualization.
Figure 4. Overfitting diagnostics of classical volatility models (ARIMA, HAR-RV). Panel (a) shows Train/Validation ratios across horizons (h = {1, 5, 22}), with boxplots centered near the 1.0 benchmark. Panel (b) tracks mean ratio trends; the red dashed line denotes the benchmark value of 1.0, indicating the ideal train/validation ratio. Panel (c) provides a heatmap of average values in the 0.17–0.28 range, indicating stable and conservative fits. Figures produced in Python 3.10 using Matplotlib (version 3.10.7) and Seaborn (version 0.13.2).
Figure 5. Overfitting diagnostics for the classical GARCH(1,1) model across U.S. equity indices (S&P 500, NASDAQ 100, DJIA). Each row corresponds to one index. The left panels display the evolution of estimated GARCH parameters (ω, α₁, β₁) under expanding-window re-estimation, capturing parameter stability over time. The red dashed horizontal line denotes the persistence boundary α₁ + β₁ = 1. The right panels compare Train and Test QLIKE losses, providing evidence on out-of-sample generalization performance. All indices satisfy α₁ + β₁ < 1 and exhibit stable generalization. Figures produced in Python 3.10 using Matplotlib (version 3.10.7).
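The persistence boundary in Figure 5 can be illustrated with a short simulation of the GARCH(1,1) recursion. The parameter values below (ω = 0.05, α₁ = 0.08, β₁ = 0.90, persistence 0.98) are hypothetical, chosen only to show why α₁ + β₁ < 1 matters: the unconditional variance ω/(1 − α₁ − β₁) is finite only inside the boundary.

```python
import numpy as np

def simulate_garch11(omega, alpha, beta, n, seed=42):
    """Simulate a GARCH(1,1) process:
    sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}.
    Requires alpha + beta < 1 for covariance stationarity."""
    assert alpha + beta < 1, "persistence must stay below 1"
    rng = np.random.default_rng(seed)
    uncond = omega / (1.0 - alpha - beta)   # unconditional variance
    sigma2 = np.empty(n)
    eps = np.empty(n)
    sigma2[0] = uncond                      # start at the long-run level
    eps[0] = rng.normal(0.0, np.sqrt(sigma2[0]))
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = rng.normal(0.0, np.sqrt(sigma2[t]))
    return eps, sigma2

eps, sigma2 = simulate_garch11(omega=0.05, alpha=0.08, beta=0.90, n=20000)
# With these parameters the long-run variance is 0.05 / (1 - 0.98) = 2.5,
# and the simulated conditional variance fluctuates around that level.
```

As α₁ + β₁ approaches 1, volatility shocks decay ever more slowly, which is the regime the re-estimated parameter paths in Figure 5 are monitored against.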
Figure 6. Representative forecast panels from classical volatility models (ARIMA(1,0,1), GARCH(1,1), and HAR-RV) across selected U.S. indices. The blue curve denotes the realized (true) variance and serves as the benchmark series. HAR-RV exhibits the closest alignment with realized variance, followed by GARCH(1,1), while ARIMA(1,0,1) demonstrates weaker responsiveness to volatility dynamics. Numerical forecasting results corresponding to these panels are reported in Appendix A Table A1. Figures produced in Python 3.10 using Matplotlib (version 3.10.7).
Figure 7. Forecast accuracy of advanced models across horizons (h = 1, 5, 22). Transformer and PatchTST-lite architectures show the strongest overall accuracy and stability, while CNN-LSTM remains the least robust across horizons. All numerical forecast error values (MAE, RMSE, QLIKE) are reported in Appendix A Table A3, where results are provided for each index, RV estimator, and forecast horizon. Figures produced in Python 3.10 using Matplotlib (version 3.10.7) and Seaborn (version 0.13.2).
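PatchTST-style models such as the PatchTST-lite evaluated in Figure 7 first tokenize the look-back window into patches before applying attention (Nie et al., 2023). The numpy sketch below shows only that patching step; the patch_len and stride values are hypothetical, as the exact PatchTST-lite configuration is not restated here.

```python
import numpy as np

def patchify(series, patch_len=16, stride=8):
    """Split a 1-D look-back window into overlapping patches -- the
    tokenization step behind PatchTST-style transformer models."""
    series = np.asarray(series, dtype=float)
    n_patches = (len(series) - patch_len) // stride + 1
    return np.stack([series[i * stride : i * stride + patch_len]
                     for i in range(n_patches)])

window = np.arange(120.0)      # 120-day lookback, as used in this study
patches = patchify(window)     # shape: (14, 16) for these patch settings
```

Each patch then becomes one attention token, so a 120-step window is compressed from 120 tokens to 14, which is the source of the efficiency gains attributed to patching.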
Figure 8. Pairwise DM p-values for advanced models across forecast horizons. Panels (a–c) correspond to h = 1, h = 22, and the average across horizons, respectively. Rows and columns denote the compared models, and each cell reports the corresponding DM p-value, with color intensity reflecting the magnitude of the p-value as indicated by the color scale. All numerical DM statistics and associated p-values underlying Figure 8 are reported in Appendix A Table A4. Figures produced in Python 3.10 using Seaborn (version 0.13.2) and Matplotlib (version 3.10.7).
Figure 9. Overfitting diagnostics across forecast horizons for deep volatility models. Panel (a) displays the mean Train/Test loss ratio with uncertainty bands, while Panels (bd) show representative learning curves for short (h = 1), medium (h = 5), and long (h = 22) horizons. Panels produced in Python 3.10 using Matplotlib (version 3.10.7) and Seaborn (version 0.13.2).
Figure 10. Forecasted log-variance dynamics across indices and horizons. The figure shows the predicted log-variance trajectories (log σ²) produced by different DL models (LSTM, CNN-LSTM, Transformer, and PatchTST-lite) compared with RV. Across all panels, the models capture the main cyclical patterns and volatility clusters, especially during major market events (e.g., 2020–2022). At the 1-day horizon (Panel (a)), forecasts closely track realized variance with minimal smoothing, while at 5- and 22-day horizons (Panels (b,c)), the predictions become smoother due to horizon-driven aggregation effects. All numerical forecast error values (MAE, RMSE, QLIKE) are reported in Appendix A Table A3, where results are provided for each index, RV estimator, and forecast horizon. Figures produced in Python 3.10 using Matplotlib (version 3.10.7).
Figure 11. SHAP-based explainability for volatility forecasting errors. Panels (A,B) show QLIKE: mean(SHAP) ranking and beeswarm summary. Panels (C,D) show DM: mean(SHAP) ranking and beeswarm summary. Across both metrics, RV_close emerges as the most influential factor, followed by ARIMA(1,0,1) and HAR-RV, while the horizon variable H has a smaller but steady effect. Index variables make negligible contributions. Warmer tones correspond to higher feature values; positive SHAP values indicate higher forecast error, whereas negative values reflect improvements in accuracy. Figures produced in Python 3.10 using the SHAP library (version 0.49.1) and Matplotlib (version 3.10.7).
Table 1. Summary of Related Studies on Volatility and Time-Series Forecasting Models.
Authors (Year) | Models | Main Findings
I. Forecasting-Oriented Studies
Ersin and Bildirici (2023) | GARCH-MIDAS-LSTM | Hybrid GARCH–LSTM improves long-horizon forecasts; nonlinear terms enhance accuracy
Asgharian et al. (2013) | GARCH-MIDAS | GARCH outperforms ARIMA; macro factors improve long-term prediction
Virk et al. (2024) | GARCH-MIDAS | Macro variables help long-run forecasts but may induce overfitting
Harikumar and Muthumeenakshi (2025) | ARIMA vs. LSTM | ARIMA may outperform LSTM short-term; trade-off between linearity and adaptability
Brugiere and Turinici (2025) | Transformer | Attention captures both short- and long-term patterns; superior to classical models
Souto and Moradi (2024) | Transformer variants (Informer, Autoformer, PatchTST) | Improved scalability and long-horizon accuracy
Z. Zhang et al. (2025) | Quantformer (Transformer + sentiment) | Integrates sentiment and investor behavior; best predictive accuracy
Zeng et al. (2023) | CNN–Transformer | Combines local and global patterns; outperforms ARIMA and DeepAR
Chun et al. (2025) | ML (RF, LSTM, GARCH, HAR-RV) | ML models outperform classical ones; strong volatility-timing ability
II. Economic Volatility and Realized-Volatility Measurement Studies
L. Zhang et al. (2005) | Multi-grid RV | Multi-grid RV reduces microstructure noise
Patton and Sheppard (2009) | Combined RV estimators | Averaging across estimators improves forecast accuracy
O. Barndorff-Nielsen and Shephard (2004) | Power and bipower variation | Jump-robust RV estimators
Tauchen and Zhou (2011) | Bipower variation | Detects volatility jumps; improves credit spread forecasts
Source: Author’s compilation.
Table 2. Data Collection and Preprocessing.
Preprocessing Step | Description of Procedure
1. Data source | Daily open-high-low-close (OHLC) prices for the S&P 500, NASDAQ 100, and DJIA were retrieved from Investing.com (2025) (https://www.investing.com, accessed on 5 September 2025) for the period 2000–2025 at a 1-day frequency.
2. Log-return calculation | Logarithmic returns computed from daily closing prices to capture continuous-compounding effects.
3. Realized variance (RV) | Constructed using three estimators: Close-to-Close, Parkinson (range-based), and Yang-Zhang (volatility with overnight adjustment).
4. Log transformation | Applied to RV, i.e., log(RV), to stabilize variance, mitigate skewness, and enhance model convergence.
5. Target variable | The dependent variable is log(RV_t) for all models and forecast horizons.
6. Forecast horizon | Three horizons: h = 1, 5, 22 days (representing daily, weekly, and monthly volatility forecasts).
7. Sample | 6450 daily observations per index, yielding ≈6330 supervised sequences using a 120-day lookback window.
Source: Author’s compilation.
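Steps 2–3 of Table 2 can be sketched directly from the cited definitions (Parkinson, 1980; Yang & Zhang, 2000). The code below is an illustrative implementation, not the paper's pipeline; the synthetic OHLC path and all numeric values are for demonstration only.

```python
import numpy as np

def close_to_close(close):
    """Sample variance of daily log returns (Close-to-Close RV proxy)."""
    r = np.diff(np.log(np.asarray(close, dtype=float)))
    return float(np.var(r, ddof=1))

def parkinson(high, low):
    """Parkinson (1980) range-based daily variance estimator."""
    hl = np.log(np.asarray(high, dtype=float) / np.asarray(low, dtype=float))
    return float(np.mean(hl ** 2) / (4.0 * np.log(2.0)))

def yang_zhang(o, h, l, c):
    """Yang-Zhang (2000) drift-independent estimator: overnight variance
    plus weighted open-to-close and Rogers-Satchell components."""
    o, h, l, c = (np.asarray(x, dtype=float) for x in (o, h, l, c))
    m = len(c) - 1                              # number of return observations
    overnight = np.log(o[1:] / c[:-1])          # close-to-open (gap) returns
    open_close = np.log(c[1:] / o[1:])          # open-to-close returns
    rs = np.mean(np.log(h[1:] / o[1:]) * np.log(h[1:] / c[1:])
                 + np.log(l[1:] / o[1:]) * np.log(l[1:] / c[1:]))
    k = 0.34 / (1.34 + (m + 1) / (m - 1))
    return float(np.var(overnight, ddof=1)
                 + k * np.var(open_close, ddof=1) + (1 - k) * rs)

# Synthetic OHLC path (no overnight gaps), purely for illustration
rng = np.random.default_rng(1)
r = rng.normal(0.0, 0.01, 2000)                 # daily log returns, var 1e-4
close = 100.0 * np.exp(np.cumsum(r))
open_ = np.concatenate(([100.0], close[:-1]))   # open at prior close
high = np.maximum(open_, close) * 1.001
low = np.minimum(open_, close) * 0.999

rv_cc = close_to_close(close)   # near the simulated daily variance of 1e-4
rv_pk = parkinson(high, low)
rv_yz = yang_zhang(open_, high, low, close)
```

In the study these daily estimates feed the log transformation of step 4, log(RV), before any model sees them.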
Table 3. Comparative Overview of Volatility Forecasting Models.
Model Group | Key Representative Models | Main Strengths | Main Limitations
Classical Econometric Models | ARIMA, GARCH, HAR-RV | Simple, interpretable, and theoretically grounded; capture volatility clustering and persistence with limited data. | Assume linearity; weak for nonlinear, asymmetric, or regime-shifting dynamics; limited long-memory adaptation.
DL Models | LSTM, CNN-LSTM | Capture nonlinear and long-term dependencies (LSTM) and localized patterns (CNN); flexible for multivariate inputs. | Require large datasets; risk of overfitting; high computational cost; less interpretable.
Transformer (TRF) Models | Transformer, PatchTST-lite | Model long-range dependencies efficiently; scalable, robust, and state-of-the-art for time-series forecasting. | Still emerging in volatility research; require tuning and substantial computation; limited interpretability (black-box).
Source: Author’s compilation.
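The HAR-RV specification listed in Table 3 regresses next-period RV on daily, weekly (5-day), and monthly (22-day) average RV (Corsi, 2009). The sketch below fits that regression by OLS on a toy persistent series — illustrative only, not the paper's estimation code.

```python
import numpy as np

def har_design(rv):
    """Build the HAR-RV design matrix: constant, daily RV, weekly (5-day)
    average, and monthly (22-day) average, each predicting next-day RV."""
    rv = np.asarray(rv, dtype=float)
    X, y = [], []
    for t in range(21, len(rv) - 1):
        X.append([1.0, rv[t], rv[t - 4:t + 1].mean(), rv[t - 21:t + 1].mean()])
        y.append(rv[t + 1])
    return np.array(X), np.array(y)

# Persistent toy "realized variance" series (AR(1)-style, illustrative only)
rng = np.random.default_rng(7)
rv = np.empty(1000)
rv[0] = 1.0
for t in range(1, 1000):
    rv[t] = 0.1 + 0.9 * rv[t - 1] + rng.normal(0.0, 0.05)

X, y = har_design(rv)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # [const, b_daily, b_weekly, b_monthly]
pred = X @ beta
```

The three averaging windows give HAR-RV its approximate long-memory behavior at essentially the cost of a linear regression, which is why it remains the strongest classical baseline in this study.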
Table 4. Subsample robustness: best model by regime (QLIKE loss).
Subsample | Best Classical (QLIKE) | Best DL (QLIKE)
Pre-GFC (2000–2006) | HAR-RV (low error) | LSTM/CNN-LSTM
GFC (2007–2009) | GARCH(1,1) | Transformer
Post-GFC (2010–2019) | HAR-RV | Transformer
COVID-19 and post-COVID-19 (2020–2025) | GARCH(1,1) | Transformer
The subsample rankings in Table 4 are derived directly from the QLIKE losses generated in Python for each model and each historical regime. For every subsample window (Pre-GFC, GFC, Post-GFC, COVID-19), the model with the lowest average out-of-sample QLIKE value was selected as the best-performing specification. Source: Author’s compilation.
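The selection rule described above — picking the specification with the lowest average out-of-sample QLIKE per regime window — reduces to a one-line comparison. The loss values below are hypothetical placeholders; the actual numbers come from the study's Python pipeline.

```python
def best_model(losses_by_model):
    """Return the model with the lowest average out-of-sample QLIKE,
    mirroring the per-regime selection rule behind Table 4."""
    return min(losses_by_model,
               key=lambda m: sum(losses_by_model[m]) / len(losses_by_model[m]))

# Hypothetical per-window QLIKE losses for one regime (illustrative only)
regime_losses = {
    "HAR-RV": [0.21, 0.25, 0.23],
    "GARCH(1,1)": [0.18, 0.22, 0.20],
    "ARIMA(1,0,1)": [0.30, 0.34, 0.31],
}
winner = best_model(regime_losses)  # "GARCH(1,1)" for these toy numbers
```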

Taneva-Angelova, G.; Granchev, D. Deep Learning and Transformer Architectures for Volatility Forecasting: Evidence from U.S. Equity Indices. J. Risk Financial Manag. 2025, 18, 685. https://doi.org/10.3390/jrfm18120685
