Modeling Time-Varying Volatility via Multi-Scale Structures and Dynamic Attention Networks: Evidence from High-Frequency Data

Zhang, Kaidi; Wu, Shaobing; Zhu, Dong

doi:10.3390/math14081257


by Kaidi Zhang ^1,2, Shaobing Wu ^2,* and Dong Zhu ^2,*

¹ School of Management and Economics, The Chinese University of Hong Kong, Shenzhen 518172, China

² College of Physics and Optoelectronic Engineering, Shenzhen University, Shenzhen 518060, China

* Authors to whom correspondence should be addressed.

Mathematics 2026, 14(8), 1257; https://doi.org/10.3390/math14081257

Submission received: 16 February 2026 / Revised: 3 April 2026 / Accepted: 7 April 2026 / Published: 10 April 2026


Abstract

Accurate tail-risk forecasting in emerging markets is frequently compromised by the nonlinear dynamics and time-varying long memory of high-frequency volatility. In this study, we employ multifractal detrended fluctuation analysis (MF-DFA) to characterize this complex market behavior, revealing pronounced multifractality and strong persistence that conflict with the static assumptions of classical linear models. The multifractal analysis serves only to motivate and guide the model design; it does not supply input features to the model. To bridge the gap between fractal diagnostics and predictive modeling, we propose SA-HAR-J-Net, an attention-based framework with dynamic reweighting. The architecture integrates HAR-style multi-horizon inputs with a bidirectional LSTM (BiLSTM) encoder and a temporal self-attention mechanism. Crucially, the attention module acts as a dynamic reweighting system: it lets the model adaptively emphasize the historical days that are most relevant under changing market conditions, thereby mimicking the time-varying correlations inherent in multifractal processes. Furthermore, we incorporate jump proxies and realized higher moments to better capture extreme tail dynamics. Under a strict expanding-window out-of-sample protocol, the proposed method achieves significantly lower quantile loss and better calibration than established econometric and machine learning benchmarks for Value-at-Risk (VaR) forecasting. This work provides a robust framework for tail-risk monitoring that aligns deep learning architectures with the stylized facts of multifractal markets.

Keywords:

multifractal analysis; attention-based dynamic reweighting; tail risk; high-frequency volatility

MSC:

91G60; 62M10; 68T05

1. Introduction

Episodes such as the 2008 global financial crisis remain a stark reminder that tail risk is not only large in magnitude but also persistent in time. Even after the initial shock dissipates, market stress can propagate through trading and information channels, while volatility often exhibits slow decay and clustering rather than rapid mean reversion. In this context, Value-at-Risk (VaR), defined as a conditional left-tail quantile, continues to play a central role in risk limits, capital allocation, and daily monitoring [1,2]. Yet reliable VaR forecasting is inherently difficult because financial risk is generated by a complex system with nonlinear interactions, dynamic feedback, heavy tails, and regime-dependent persistence [3,4]. Modeling tail risk under such conditions faces several practical challenges. First, market dynamics are multi-scale: short-horizon shocks, medium-horizon volatility clustering, and long-horizon persistence can coexist and interact, so a single representative horizon is rarely adequate. Second, temporal evolution matters: the effective memory of volatility and the relevance of past information can shift across regimes, making fixed-memory or fixed-weight forecasting rules fragile. Third, modern data environments demand both flexibility and transparency: models must exploit rich information (including high-frequency inputs) while, where possible, preserving economically meaningful connections between inputs and risk forecasts. A growing literature addresses parts of this problem through nonlinear econometric models, realized measures, and multi-horizon structures. ARCH/GARCH models extend linear specifications to capture volatility clustering [5,6], and long-memory extensions based on fractional integration or fractional differencing have been proposed to represent power-law persistence [7,8,9,10].
High-frequency intraday returns further enable realized measures that proxy latent volatility and separate continuous variation from jumps [11,12,13,14]. The heterogeneous autoregressive (HAR) paradigm provides a parsimonious approximation of heterogeneous trading horizons via daily–weekly–monthly components [15]. Nevertheless, classical HAR-type specifications typically aggregate horizons with static linear weights, which can be restrictive when persistence shifts across regimes or when tail shape information enters nonlinearly [16,17,18].

To motivate an attention-based dynamic reweighting mechanism, we adopt a multifractal perspective on realized volatility. Fractal finance emphasizes that volatility is multi-scale and may exhibit scaling regularities associated with heavy tails and long memory [19]. Detrended fluctuation analysis (DFA) and multifractal DFA (MF-DFA) quantify generalized Hurst exponents and the singularity spectrum for potentially nonstationary time series [20,21,22], while wavelet time–frequency representations provide a complementary view of intermittency and scale-dependent energy bursts [23]. Multifractal cascade models provide a theoretical basis for multi-scale volatility modeling and motivate interpreting volatility as a complex dynamical process with regime-dependent memory [24,25]. For tail-risk forecasting, this matters because multifractality implies heterogeneous local scaling and time-varying effective persistence: a fixed memory kernel may overweight irrelevant history in some regimes and underweight salient episodes in others. We study the ChiNext Index (399006.SZ), a growth-oriented market segment in which volatility frequently exhibits bursts and time-varying persistence [26]. Starting from 139,920 five-minute price bars, we construct a daily realized-measure panel via an explicit intraday-to-daily pipeline [27,28]. We then report multifractal diagnostics for log-realized volatility in Section 2.1.5: the generalized Hurst profile is strongly q-dependent, and the estimated Hurst exponent at q = 2 exceeds unity (H(2) = 1.1547), suggesting extremely strong persistence and potentially nonstationary scaling behavior. A shuffle test indicates that multifractality is primarily driven by temporal correlations rather than by the marginal distribution, while wavelet power and rolling DFA-based estimates reveal regime-dependent persistence across scales.
These empirical patterns motivate tail-risk models that can adapt their effective memory and reweight past information as regimes change. Accordingly, we develop a regime-adaptive hybrid quantile framework for one-day-ahead VaR forecasting that retains the HAR-style multi-horizon structure while enabling time-varying reweighting of historical information. Specifically, we (i) construct a high-frequency realized-measure panel for ChiNext, including realized variance (RV), bipower variation (BV), a jump proxy, realized skewness (RSK), realized kurtosis (RKT), and an A-share-specific second-difference skewness feature [29]; (ii) propose SA-HAR-J-Net, which integrates HAR tokenization with BiLSTM encoding and time self-attention as a dynamic reweighting module [30,31,32]; and (iii) evaluate performance under an expanding-window rolling out-of-sample protocol with a strict no-look-ahead design, reporting pinball loss and standard coverage/independence backtests [33,34,35,36]. Importantly, the multifractal and wavelet analyses serve only as diagnostic evidence and architectural motivation; they are not used as supervision signals in model training.

Figure 1 outlines our methodological pipeline, which links high-frequency market microstructure with deep quantile learning. The framework proceeds from constructing realized measures from five-minute bars to diagnosing volatility features via multifractal analysis. Central to this approach is a rolling out-of-sample training scheme that forecasts tail risk without look-ahead bias, followed by a multi-dimensional evaluation of VaR reliability. The remainder of this paper is organized as follows. Section 2 describes the data construction, realized measures, multifractal diagnostics, and the proposed modeling framework; Section 2.3 details the rolling out-of-sample protocol and benchmark specifications. Section 3 reports the main empirical results and robustness checks, and Sections 4 and 5 provide the discussion and concluding remarks.

2. Materials and Methods

2.1. Data and Preprocessing

2.1.1. Intraday Data and Trading Session

Ali and Khurram (2025) report that five-minute sampling is the optimal frequency for high-frequency data and conduct their volatility modeling at this frequency [37]. In this study, we collect the five-minute intraday prices of the ChiNext Index (399006.SZ), officially compiled and released by the Shenzhen Stock Exchange (SZSE), from the Wind Financial Terminal (WFT). We restrict the sample to regular trading hours (CN_RTH), i.e., 09:30–11:30 and 13:00–15:00 Beijing time. Let P_t,i denote the i-th five-minute closing price on trading day t, and define the intraday log returns as

r_{t, i} = \log P_{t, i} - \log P_{t, i - 1}

(1)

This transformation converts raw prices into scale-free increments. In high-frequency settings, it separates intraday variability from price levels, making realized measures comparable over time. The daily close-to-close log return is denoted by r_t.

We apply basic data-quality screening before constructing realized measures. Specifically, we (i) remove obvious data errors such as nonpositive prices and duplicated timestamps; (ii) enforce regular-session filtering (CN_RTH) and ignore non-trading intervals; and (iii) discard trading days with fewer than 10 intraday five-minute returns (i.e., fewer than 11 five-minute price bars) rather than interpolating missing bars. This minimum-coverage rule serves as a general safeguard for data quality. All realized measures for a given day are computed from that day's cleaned intraday return sequence. In our final sample, intraday coverage is complete: n_t = 47 for all trading days (equivalently, 48 five-minute price bars per day).
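The screening rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes that one day of bars arrives in chronological order with zero-padded "HH:MM" bar-close timestamps, so that the regular session corresponds to bar closes in (09:30, 11:30] and (13:00, 15:00], which gives 48 bars per full day. The function names are hypothetical.

```python
import numpy as np

def in_regular_session(hhmm: str) -> bool:
    """CN_RTH filter on bar-close stamps: (09:30, 11:30] and (13:00, 15:00].

    Assumes zero-padded "HH:MM" strings, so string comparison orders times correctly.
    """
    return ("09:30" < hhmm <= "11:30") or ("13:00" < hhmm <= "15:00")


def intraday_log_returns(times, prices, min_returns=10):
    """Screen one day of 5-minute bars and return r_{t,i} = log P_{t,i} - log P_{t,i-1}.

    Returns None when fewer than `min_returns` returns survive: the day is
    discarded rather than interpolated.
    """
    seen, kept = set(), []
    for hhmm, p in zip(times, prices):        # input assumed to be in chronological order
        if hhmm in seen or p <= 0 or not in_regular_session(hhmm):
            continue                          # duplicated stamp, bad price, or off-session bar
        seen.add(hhmm)
        kept.append(float(p))
    if len(kept) < min_returns + 1:           # 10 returns require at least 11 bars
        return None
    return np.diff(np.log(np.asarray(kept)))
```

With a full session of 48 clean bars, the function returns the 47 intraday returns described in the text.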

2.1.2. Daily Realized Measures

We compute the following daily realized measures from the intraday return sequence {r_t,i}, i = 1, …, n_t, of each trading day t. Realized variance, bipower variation, and jump proxy. We define realized variance (RV) and bipower variation (BV) as:

{R V}_{t} = \sum_{i = 1}^{n_{t}} r_{t, i}^{2}

(2)

{B V}_{t} = \frac{n_{t}}{n_{t} - 1} μ_{1}^{- 2} \sum_{i = 2}^{n_{t}} |r_{t, i}| |r_{t, i - 1}|

(3)

where µ₁ = (2/π)^{1/2}, which equals E|Z| for a standard normal random variable Z. The nonnegative daily jump proxy is then defined as:

J_{t} = \max(RV_{t} - BV_{t}, 0)

(4)
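Equations (2) to (4) translate directly into a few lines of NumPy. The sketch below assumes its input is one day's cleaned intraday log-return array; the function names are illustrative rather than taken from the replication package.

```python
import numpy as np

MU1 = np.sqrt(2.0 / np.pi)  # mu_1 = E|Z| for a standard normal Z

def realized_variance(r):
    """Eq. (2): sum of squared intraday returns."""
    r = np.asarray(r, dtype=float)
    return float(np.sum(r ** 2))

def bipower_variation(r):
    """Eq. (3): n/(n-1) * mu_1^{-2} * sum_{i=2}^{n} |r_i| |r_{i-1}|."""
    r = np.asarray(r, dtype=float)
    n = r.size
    return float(n / (n - 1) * MU1 ** -2 * np.sum(np.abs(r[1:]) * np.abs(r[:-1])))

def jump_proxy(r):
    """Eq. (4): nonnegative part of RV - BV."""
    return max(realized_variance(r) - bipower_variation(r), 0.0)
```

Because BV is built from products of adjacent absolute returns, a single large return enters BV only through its two neighboring products. This is why J_t turns positive on a day with one isolated jump, while on a day with uniformly sized returns BV exceeds RV and J_t is truncated to zero.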

Economically, RV_t measures the total intraday price fluctuation intensity. The contrast between RV_t and BV_t helps isolate abrupt jump-like moves, which are often associated with news shocks and tail risk episodes. In addition, we compute realized skewness (RSK) and realized kurtosis (RKT) using standardized higher-order realized moments [29]:

{R S K}_{t} = \frac{{(n_{t})}^{\frac{1}{2}} \sum_{i = 1}^{n_{t}} r_{t, i}^{3}}{{(\sum_{i = 1}^{n_{t}} r_{t, i}^{2})}^{\frac{3}{2}}}

(5)

{R K T}_{t} = \frac{n_{t} \sum_{i = 1}^{n_{t}} r_{t, i}^{4}}{{(\sum_{i = 1}^{n_{t}} r_{t, i}^{2})}^{2}}

(6)

To capture rapidly changing asymmetry in the A-share tails, we include the second difference of realized skewness:

\Delta^{2} RSK_{t} = RSK_{t} - 2\, RSK_{t - 1} + RSK_{t - 2}

(7)

The final daily feature panel includes {r_t, RV_t, BV_t, J_t, RSK_t, ∆²RSK_t, RKT_t}.
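A matching sketch for Equations (5) to (7), again assuming a NumPy array of one day's intraday returns (the helper names are hypothetical). The second difference is applied to the daily RSK series, and its first two values are left undefined because they need two earlier days.

```python
import numpy as np

def realized_skewness(r):
    """Eq. (5): sqrt(n) * sum r^3 / (sum r^2)^(3/2)."""
    r = np.asarray(r, dtype=float)
    rv = np.sum(r ** 2)
    return float(np.sqrt(r.size) * np.sum(r ** 3) / rv ** 1.5)

def realized_kurtosis(r):
    """Eq. (6): n * sum r^4 / (sum r^2)^2."""
    r = np.asarray(r, dtype=float)
    rv = np.sum(r ** 2)
    return float(r.size * np.sum(r ** 4) / rv ** 2)

def second_difference(series):
    """Eq. (7): x_t - 2 x_{t-1} + x_{t-2} over a daily series; first two entries are NaN."""
    x = np.asarray(series, dtype=float)
    out = np.full_like(x, np.nan)
    out[2:] = x[2:] - 2.0 * x[1:-1] + x[:-2]
    return out
```

Both moments are scale-free: multiplying every return by a constant leaves RSK and RKT unchanged, which keeps them comparable across calm and turbulent days.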

2.1.3. Descriptive Statistics

The daily realized measure dataset spans 2 January 2014 to 12 January 2026, with 2915 trading days. The intraday panel contains 139,920 five-minute price bars after session filtering and basic data-quality screening. Each trading day contributes 47 intraday returns (computed as first differences of log prices) corresponding to 48 five-minute bars in a full regular session. Table 1 reports descriptive statistics computed from the realized measure panel.

2.1.4. No Look-Ahead Design and OOS Protocol

All predictors are shifted by one day, so that the VaR forecast for day t depends only on information available up to day t − 1. Throughout, t denotes the forecast (evaluation) day for r_t and VaR_t^(α); no future data enter any prediction, which rules out look-ahead bias. We adopt an expanding-window rolling OOS scheme with periodic refitting, early stopping, and GPU training. In our main configuration, the lookback length is L = 60 trading days, and models are refit every K = 20 prediction days. All models are evaluated on the same aligned out-of-sample period (11 March 2024 to 12 January 2026); the exact training configuration and benchmark specifications are reported in Section 2.3.

2.1.5. Multifractal Dynamics Evidence and Motivation for Attention

We provide multifractal evidence from the ChiNext realized volatility process and link these empirical signatures to the design choice of a time self-attention mechanism in SA-HAR-J-Net. We focus on two complementary views: a static multifractal diagnosis (distributional complexity vs. temporal correlation) and a dynamic diagnosis (time-varying regimes across multiple scales). The multifractal analysis is employed solely for research motivation and model design; its outputs are not used as input features for the model.

MF-DFA: definition and estimation. Let {x_t}, t = 1, …, N, denote the daily log-realized volatility series. MF-DFA quantifies multi-scale power-law fluctuations for potentially nonstationary time series by combining detrending with q-order fluctuation functions [21,22]. We first remove the sample mean and form the cumulative profile

Y (i) = \sum_{t = 1}^{i} (x_{t} - \bar{x}), i = 1, \dots, N,

(8)

where x̄ = N⁻¹ ∑_{t=1}^{N} x_t is the sample mean. For a given scale s, we split the profile into N_s = ⌊N/s⌋ nonoverlapping segments. Because N is generally not a multiple of s, we repeat the segmentation starting from the opposite end of the profile, so that the remainder is not discarded; this yields 2N_s segments. In each segment ν, we fit a polynomial trend P_ν,m(i) of order m by least squares and compute the detrended variance

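F^{2}(v, s) = \frac{1}{s} \sum_{i = 1}^{s} \{Y[(v - 1) s + i] - P_{v, m}(i)\}^{2}, \quad v = 1, \dots, N_{s}

F^{2}(v, s) = \frac{1}{s} \sum_{i = 1}^{s} \{Y[N - (v - N_{s}) s + i] - P_{v, m}(i)\}^{2}, \quad v = N_{s} + 1, \dots, 2 N_{s}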

(9)

For q ≠ 0, the q-order fluctuation function is defined as

F_{q} (s) = {\{\frac{1}{{2 N}_{s}} \sum_{v = 1}^{{2 N}_{s}} {[F^{2} (v, s)]}^{\frac{q}{2}}\}}^{\frac{1}{q}}

(10)

and for q = 0 we use the logarithmic averaging limit

F_{0}(s) = \exp\{\frac{1}{4 N_{s}} \sum_{v = 1}^{2 N_{s}} \ln[F^{2}(v, s)]\}

(11)
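The estimation steps in Equations (8) to (11) can be sketched as below. This is an illustrative NumPy implementation under our own choices (integer scales, polynomial detrending with `np.polyfit`, zero-based segment starts); it is not the authors' code, and the scale grid used in the paper is not reproduced here.

```python
import numpy as np

def mfdfa_fluctuations(x, scales, qs, m=1):
    """Return F_q(s) with shape (len(qs), len(scales)), following Eqs. (8)-(11)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    Y = np.cumsum(x - x.mean())                                  # Eq. (8): cumulative profile
    F = np.empty((len(qs), len(scales)))
    for j, s in enumerate(scales):
        Ns = N // s
        idx = np.arange(s)
        # Segments taken from the start and from the end of the profile: 2*Ns in total.
        starts = [v * s for v in range(Ns)] + [N - (v + 1) * s for v in range(Ns)]
        F2 = np.empty(2 * Ns)
        for v, a in enumerate(starts):
            seg = Y[a:a + s]
            trend = np.polyval(np.polyfit(idx, seg, m), idx)      # order-m least-squares trend
            F2[v] = np.mean((seg - trend) ** 2)                   # Eq. (9)
        for k, q in enumerate(qs):
            if q == 0:
                F[k, j] = np.exp(0.5 * np.mean(np.log(F2)))       # Eq. (11)
            else:
                F[k, j] = np.mean(F2 ** (q / 2.0)) ** (1.0 / q)   # Eq. (10)
    return F
```

Since F_q(s) is a power mean of the segment variances, it is nondecreasing in q at every scale; for white noise, the slope of log F_2(s) on log s is close to 0.5.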

If the series exhibits multifractal scaling, then over an appropriate range of scales s we expect

F_{q}(s) \sim s^{H(q)}

(12)

where H(q) is the generalized Hurst exponent, estimated for each q as the slope of a log–log regression of log F_q(s) on log s. To summarize multifractality compactly, we define the mass exponent function τ(q) and obtain the singularity strength α and the singularity spectrum f(α) via the Legendre transform:

τ (q) = q H (q) - 1

(13)

α (q) = \frac{d τ (q)}{d q}

(14)

f (α) = q α - τ (q)

(15)

The spectrum width ∆α = α_max − α_min provides an interpretable scalar measure of multifractal complexity.
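Given a matrix of fluctuation functions, the generalized Hurst exponents and the Legendre-transform quantities in Equations (12) to (15) can be approximated as follows. The sketch assumes F_q(s) is stored with one row per q, and it approximates dτ/dq by finite differences (`np.gradient` with second-order edges), which is a numerical approximation rather than an exact transform.

```python
import numpy as np

def multifractal_spectrum(scales, qs, Fq):
    """H(q) by log-log OLS (Eq. 12); tau, alpha, f(alpha) by Eqs. (13)-(15).

    Fq has shape (len(qs), len(scales)); qs must contain at least 3 values.
    """
    log_s = np.log(np.asarray(scales, dtype=float))
    qs = np.asarray(qs, dtype=float)
    H = np.array([np.polyfit(log_s, np.log(row), 1)[0] for row in np.asarray(Fq)])
    tau = qs * H - 1.0                                  # Eq. (13): mass exponent
    alpha = np.gradient(tau, qs, edge_order=2)          # Eq. (14): d tau / d q, numerically
    f_alpha = qs * alpha - tau                          # Eq. (15)
    width = float(alpha.max() - alpha.min())            # spectrum width Delta alpha
    return H, tau, alpha, f_alpha, width
```

A monofractal input, in which every row scales as s^H, returns α(q) = H for all q and ∆α = 0, which provides a convenient sanity check before applying the function to real data.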

Static diagnosis: temporal correlation vs. distribution.

Table 2 reports global multifractal parameters. The estimated Hurst exponent at q = 2 is H(2) = 1.1547. A value of H(2) > 1 is consistent with pronounced persistence, possible nonstationarity, and strong long-memory dynamics in the log-realized volatility series. The singularity spectrum spans α_min = 0.9817 to α_max = 1.2259, implying a multifractal width ∆α = 0.2442, i.e., heterogeneous local scaling behavior consistent with nonlinear dynamics.

To identify whether multifractality is driven primarily by temporal correlations rather than the marginal distribution, we conduct a shuffle test. We randomly permute the volatility series, which preserves the unconditional distribution but destroys time ordering. Figure 2a compares H(q) between the original and shuffled series; the shuffled curve collapses toward an approximately flat profile near 0.5, consistent with the removal of temporal dependence, while the original curve remains distinctly q-dependent. This discrepancy confirms that multifractality mainly originates from temporal correlations (long memory). Figure 2b reports the singularity spectrum f(α), whose width ∆α quantifies multifractal complexity.

Dynamic diagnosis: multi-scale co-movement and regime shifts.

We further examine time variation by aligning (a) log-realized volatility, (b) the wavelet power spectrum, and (c) the rolling Hurst exponent (DFA, 500-day window), as shown in Figure 3. Table 3 summarizes the rolling statistics: Mean = 1.0360, Std. Dev. = 0.0944, Min = 0.8625, and Max = 1.2616, indicating pronounced regime dependence.
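A rolling DFA exponent of the kind plotted in Figure 3c can be sketched with a standard monofractal DFA (q = 2, linear detrending). The 500-day window follows the text; the scale grid (10 to 125 days), the forward-only segmentation, and the step size are our assumptions, since these settings are not given here.

```python
import numpy as np

def dfa_exponent(x, scales, m=1):
    """Monofractal DFA: slope of log F_2(s) on log s (q = 2, order-m detrending)."""
    x = np.asarray(x, dtype=float)
    Y = np.cumsum(x - x.mean())
    F2 = []
    for s in scales:
        Ns = Y.size // s
        idx = np.arange(s)
        segs = Y[: Ns * s].reshape(Ns, s)           # forward segments only (a simplification)
        resid = [seg - np.polyval(np.polyfit(idx, seg, m), idx) for seg in segs]
        F2.append(np.sqrt(np.mean(np.square(resid))))
    return float(np.polyfit(np.log(scales), np.log(F2), 1)[0])

def rolling_dfa(x, window=500, scales=(10, 20, 40, 80, 125), step=1):
    """DFA exponent on each trailing window x[e-window:e], one value per window end e."""
    x = np.asarray(x, dtype=float)
    ends = range(window, x.size + 1, step)
    return np.array([dfa_exponent(x[e - window:e], scales) for e in ends])
```

For reference, the exponent is near 0.5 for white noise and near 1.5 for its cumulative sum, so values between 1 and 1.3, as in Table 3, sit between these two benchmark cases.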

Importantly, volatility spikes coincide with bursts of wavelet energy across a wide range of periods and with shifts in the rolling Hurst exponent, implying multi-scale co-movement and time-varying long memory. This directly motivates the use of self-attention in SA-HAR-J-Net: attention can adaptively reweight historical tokens and focus on regime-relevant horizons and days, rather than relying on fixed weights.

Robustness of multifractal evidence and non-stationarity.

As reported in Table 2, the baseline Hurst exponent is H(2) ≈ 1.1547. Values of H(2) exceeding unity depart from stationary fractional Gaussian noise and are consistent with very strong persistence and possibly nonstationary or integrated-like behavior in the high-frequency volatility process. At the same time, such values should be interpreted with caution, because finite-sample bias in MF-DFA may also inflate H(2) even for highly persistent stationary processes. To the extent that this persistence is genuine, shocks do not decay at a stationary rate but exhibit local trending characteristics, which challenges traditional constant-parameter linear models and motivates the adaptive, regime-aware mechanism of SA-HAR-J-Net. To ensure that these findings are not artifacts of algorithmic parameter choices, we conduct an extensive robustness analysis, summarized in Table 4. Variations in the scaling range (s), the number of scales, and the moment range (q) consistently produce H(2) > 1.11 and multifractal spectrum widths ∆α > 0.14. Increasing the detrending polynomial order to m = 2 reduces H(2) to approximately 1.070 while retaining strong power-law scaling (R² ≈ 0.998), and it does not change any substantive conclusion of the study. These results indicate that the observed long memory and multifractal properties are robust to the methodological configuration.

Building on the basic shuffle test in Figure 2a, we apply Iterative Amplitude Adjusted Fourier Transform (IAAFT) surrogate testing to separate the sources of multifractality (Table 5). The shuffled series removes all temporal correlation, whereas the IAAFT surrogate preserves the linear autocorrelation (including long-memory dependence) and the unconditional distribution while destroying nonlinear phase relations. Shuffling collapses H(2) to approximately 0.519, consistent with a temporally uncorrelated series. The IAAFT surrogate retains a high H(2) ≈ 1.129, reflecting linear long memory, but its multifractal spectrum width ∆α drops sharply from 0.244 to 0.093. Because the surrogate keeps both the marginal distribution and the linear dependence yet loses most of the spectrum width, this reduction points to nonlinear temporal dynamics, rather than heavy-tailed distributions or linear long memory alone, as an important driver of the observed multifractality. This evidence motivates the use of nonlinear deep learning architectures for tail-risk forecasting.
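Both surrogates can be generated in a few lines. The sketch below follows the standard IAAFT iteration, alternately imposing the original Fourier amplitudes and the original value distribution and ending on the distribution step; the iteration count and helper names are our assumptions, and this is not necessarily the authors' implementation.

```python
import numpy as np

def shuffle_surrogate(x, rng):
    """Random permutation: same marginal distribution, no temporal dependence."""
    return rng.permutation(np.asarray(x, dtype=float))

def iaaft_surrogate(x, rng, n_iter=200):
    """IAAFT surrogate: approximately keeps the power spectrum (linear autocorrelation)
    and exactly keeps the value distribution, while randomizing the Fourier phases."""
    x = np.asarray(x, dtype=float)
    target_amp = np.abs(np.fft.rfft(x))
    sorted_x = np.sort(x)
    y = rng.permutation(x)                                           # random starting point
    for _ in range(n_iter):
        phases = np.angle(np.fft.rfft(y))
        y = np.fft.irfft(target_amp * np.exp(1j * phases), n=x.size)  # impose the spectrum
        y = sorted_x[np.argsort(np.argsort(y))]                      # impose values by rank
    return y

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, used here only to compare surrogates."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))
```

Running MF-DFA on many such surrogates and comparing H(2) and ∆α with the original series reproduces the logic of Table 5: shuffling removes all dependence, while IAAFT keeps the linear part.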

2.1.6. Exploratory Visualization

Figure 4 provides exploratory visualization of returns and realized measures, and Figure 5 reports the feature correlation matrix used in the HAR-style specification. The visual inspection in Figure 4 highlights several key stylized facts that motivate our model specification. As shown in Figure 4a, the return series is characterized by time-varying volatility. The non-normality of the data is evident from the heavy tails observed in the kernel density estimate (Figure 4b) and the Q–Q plot (Figure 4c).

Crucially for our forecasting framework, the ACF of squared returns in Figure 4d exhibits a slow, hyperbolic-like decay, consistent with the well-known long-memory feature of financial volatility. Furthermore, the decomposition of realized measures in Figure 4e,f reveals that total variation is driven by both a continuous component (approximated by BV) and a discontinuous jump component (J).

2.2. Methodology

2.2.1. Variable Nomenclature (For Non-Finance Readers)

To facilitate cross-disciplinary understanding, Table 6 provides a detailed glossary of the notation alongside its financial and physical interpretation. We specifically distinguish between raw high-frequency inputs and derived realized measures, such as variance, skewness, and kurtosis, which collectively capture the magnitude, asymmetry, and tail heaviness of the market fluctuations used to forecast extreme risk. For consistency with the subsequent correlation analysis and quantile regression modeling, the key variables shown in Figure 5 and used in the related model specifications are defined here in advance: the lagged daily continuous volatility c_lag₁; the weekly and monthly aggregated continuous volatility c_roll₅ and c_roll₂₂; the lagged daily jump j_lag₁; the weekly and monthly aggregated jumps j_roll₅ and j_roll₂₂; and the lagged daily log return r_lag₁. Detailed economic interpretations are provided in subsequent sections.

2.2.2. VaR as a Conditional Quantile and Pinball Loss

Let F_{t−1} denote the information set up to time t − 1. For α ∈ (0, 1), the one-step-ahead VaR is defined as the conditional α-quantile

{V a R}_{t}^{(α)} = q_{α} (r_{t}| F_{t - 1})

(16)

Interpreting VaR as a conditional quantile focuses the task on tail shape modeling rather than mean prediction. Smaller α corresponds to rarer but more severe losses, which is central for stress-aware risk management. We estimate VaR directly by minimizing the pinball (quantile) loss. For a forecast q̂_t at level α, define

L_{α}(r_{t}, \hat{q}_{t}) = \begin{cases} α (r_{t} - \hat{q}_{t}), & r_{t} \geq \hat{q}_{t} \\ (1 - α)(\hat{q}_{t} - r_{t}), & r_{t} < \hat{q}_{t} \end{cases}

(17)

SA-HAR-J-Net is trained jointly for α ∈ {0.01, 0.05} by minimizing the equally weighted sum of the two pinball losses over the training sample. Joint training at the 1% and 5% levels encourages coherent learning across different tail severities.
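The pinball loss in Equation (17) and the equally weighted joint objective can be written as follows. This NumPy sketch evaluates the losses on arrays of realized returns and forecasts; in actual training the same expression would be computed on the deep learning framework's tensors so that it is differentiable. Function names are illustrative.

```python
import numpy as np

def pinball_loss(r, q_hat, alpha):
    """Mean of Eq. (17): alpha*(r - q) if r >= q, else (1 - alpha)*(q - r)."""
    u = np.asarray(r, dtype=float) - np.asarray(q_hat, dtype=float)
    return float(np.mean(np.where(u >= 0, alpha * u, (alpha - 1.0) * u)))

def joint_loss(r, q01, q05):
    """Equally weighted sum of the 1% and 5% pinball losses."""
    return pinball_loss(r, q01, 0.01) + pinball_loss(r, q05, 0.05)
```

The asymmetric weights are what make the minimizer a quantile: at α = 0.05, a return falling below the forecast costs 0.95 per unit, whereas a return above it costs only 0.05, so the loss is minimized by a constant forecast equal to the empirical 5% quantile.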

2.2.3. Approximation of Fractional Integration via HAR Tokenization

Standard short-memory volatility models, such as GARCH, imply an exponential decay of autocorrelation, whereas the multifractal analysis reveals the power-law decay characteristic of fractionally integrated processes. To explicitly encode this long-memory structure without the computational cost of fractional differencing, we adopt the Heterogeneous Autoregressive (HAR) framework as a discrete approximation to the fractional integration operator (1 − L)^{−d}, where L here denotes the lag operator rather than the lookback length.

Let the base realized feature vector be x_t = (RV_t, BV_t, J_t, RSK_t, ∆²RSK_t, RKT_t)^T ∈ ℝ⁶. A fractionally integrated process can be represented as an infinite-order autoregressive process, AR(∞), whose coefficients decay hyperbolically. The HAR framework approximates this decay using a cascade of heterogeneous time scales (daily, weekly, and monthly):

x_{t}^{(d)} = x_{t}

(18)

x_{t}^{(ω)} = \frac{1}{5} \sum_{k = 1}^{5} x_{t - k} \approx \frac{1}{5} \int_{t - 5}^{t} x(s) \, ds

(19)

x_{t}^{(m)} = \frac{1}{22} \sum_{k = 1}^{22} x_{t - k} \approx \frac{1}{22} \int_{t - 22}^{t} x(s) \, ds

(20)

By aggregating high-frequency information over these cascading horizons, the HAR components capture the superposition of volatility processes with different time constants, which is known to generate long-memory behavior that is difficult to distinguish empirically from true fractional integration (d > 0). We therefore construct the input tensor not merely as a set of statistical averages but as a multi-scale representation of the fractional memory kernel:

X_{t} = \left[ x_{t - l}^{(d)}, x_{t - l}^{(ω)}, x_{t - l}^{(m)} \right]_{l = 0}^{L - 1} \in M_{3 \times L}(\mathbb{R}^{6})

(21)

This structured input allows the subsequent BiLSTM and self-attention layers to learn a nonlinear, adaptive weighting of the three horizons over time, which can be interpreted as an implicit, time-varying counterpart of the fractional order d(t), and thereby to handle the regime-dependent persistence observed in the ChiNext index.
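The HAR tokenization in Equations (18) to (21), combined with the one-day shift, can be sketched as follows. Assumptions: daily features are stored as a (T, 6) array in chronological order, the weekly and monthly components use lags 1 to 5 and 1 to 22 exactly as written in Equations (19) and (20), and the sequence for forecast day t is built from tokens dated t − L, …, t − 1, so that day t itself is never used. Function names are hypothetical.

```python
import numpy as np

def har_tokens(x):
    """Daily/weekly/monthly components of Eqs. (18)-(20) for a (T, 6) feature array.

    Returns a (T, 3, 6) array; rows without 22 prior days are NaN.
    """
    x = np.asarray(x, dtype=float)
    T, d = x.shape
    out = np.full((T, 3, d), np.nan)
    for t in range(22, T):
        out[t, 0] = x[t]                        # x_t^(d)
        out[t, 1] = x[t - 5:t].mean(axis=0)     # (1/5)  * sum_{k=1}^{5}  x_{t-k}
        out[t, 2] = x[t - 22:t].mean(axis=0)    # (1/22) * sum_{k=1}^{22} x_{t-k}
    return out

def input_sequence(tokens, t, L=60):
    """Length-L input for forecast day t from token rows t-L, ..., t-1, flattened to (L, 18)."""
    if t < L or t > tokens.shape[0]:
        raise ValueError("forecast day t needs L earlier token rows")
    window = tokens[t - L:t]                    # excludes day t: one-day shift
    if np.isnan(window).any():
        raise ValueError("insufficient history for the HAR components")
    return window.reshape(L, -1)                # per step: 6 daily, 6 weekly, 6 monthly values
```

Flattening each (3, 6) token into 18 values yields exactly the per-step input dimension ℝ¹⁸ fed to the BiLSTM.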

2.2.4. SA-HAR-J-Net: BiLSTM with Time Self-Attention and Bounded Output

We flatten each time step into a vector in ℝ¹⁸ (three horizons × six features) and feed the length-L sequence into a one-layer bidirectional LSTM, producing hidden states {h_ℓ}, ℓ = 1, …, L, where h_ℓ ∈ ℝ^{2H} and H = 64. Concretely, let u_ℓ ∈ ℝ¹⁸ denote the flattened input at time step ℓ. The forward and backward LSTM recursions can be written as

(\vec{h_{l}}, \vec{c_{l}}) = \mathrm{LSTM}(u_{l}, \vec{h_{l - 1}}, \vec{c_{l - 1}})

(22)

(\overset{\leftarrow}{h_{l}}, \overset{\leftarrow}{c_{l}}) = \mathrm{LSTM}(u_{l}, \overset{\leftarrow}{h_{l + 1}}, \overset{\leftarrow}{c_{l + 1}})

(23)

and the bidirectional hidden representation is the concatenation

h_{l} = [\vec{h_{l}}; \overset{\leftarrow}{h_{l}}] \in R^{2 H}

(24)

Within each LSTM direction, a standard gated update is given by

i_{l} = σ (W_{i} u_{l} + U_{i} h_{l - 1} + b_{i})

(25)

f_{l} = σ (W_{f} u_{l} + U_{f} h_{l - 1} + b_{f})

(26)

o_{l} = σ (W_{o} u_{l} + U_{o} h_{l - 1} + b_{o})

(27)

\tilde{c}_{l} = \tanh(W_{c} u_{l} + U_{c} h_{l - 1} + b_{c})

(28)

c_{l} = f_{l} ⊙ c_{l - 1} + i_{l} ⊙ {\tilde{c}}_{l}

(29)

h_{l} = o_{l} ⊙ \tanh(c_{l})

(30)

where σ(·) is the logistic sigmoid and ⊙ denotes elementwise multiplication; in the backward direction, the recursion runs from ℓ = L down to 1, so h_{ℓ−1} and c_{ℓ−1} are replaced by h_{ℓ+1} and c_{ℓ+1}. For each forecast day t, the BiLSTM encoder takes the length-L input sequence and produces hidden states (h_ℓ), ℓ = 1, …, L, where each h_ℓ ∈ ℝ^{2H} summarizes the information at position ℓ in the lookback window. Although the final state h_L could serve as a summary of the window on its own, the proposed model does not use only this final hidden state for prediction. Instead, a time self-attention layer is applied to the entire hidden-state sequence to adaptively reweight historical information across the L positions. Specifically, the attention score, normalized attention weight, and context vector are defined as:

s_{l} = v^{\top} \tanh(W h_{l})

(31)

a_{l} = \frac{\exp(s_{l})}{\sum_{j = 1}^{L} \exp(s_{j})}

(32)

c = \sum_{l = 1}^{L} a_{l} h_{l}

(33)

where s_ℓ is the scalar attention score associated with hidden state h_ℓ, a_ℓ is the corresponding normalized attention weight, and c ∈ ℝ^{2H} is the attention-aggregated context vector (distinct from the LSTM cell state c_ℓ). Here, ℓ = 1, …, L indexes the positions in the lookback window, and L denotes the lookback length; in our main specification, L = 60 trading days. The context vector c is then fed into a fully connected layer (followed by dropout) to generate two raw outputs, (z_0.01,t, z_0.05,t), corresponding to the 1% and 5% VaR levels. Finally, these raw outputs are mapped to bounded VaR forecasts through

\hat{q}_{α, t} = - σ(z_{α, t}) \, \mathrm{cap}_{t}

(34)

\mathrm{cap}_{t} = m \sqrt{RV_{t - 1} + ε}

(35)

where m > 0 is a fixed multiplier and ε > 0 avoids degeneracy. Because σ(·) takes values in (0, 1), each forecast lies in the interval (−cap_t, 0). The attention weights a_ℓ act as a cognitively inspired reweighting of past days, allowing the model to focus on salient risk episodes within long histories. Unconstrained deep learning architectures are susceptible to generating economically implausible, unbounded risk estimates during extreme market stress because of the nonlinear extrapolation of noise. By anchoring the output bound to the prevailing volatility state (cap_t), the mechanism encodes the assumption that, although extreme tail events (jumps) deviate from normal expectations, the maximum forecast loss remains constrained by the current market regime. This structural limit is therefore intended to reduce the risk that the model overreacts to transient high-frequency shocks.

At the same time, this bounded-output design may come at a modest cost in complementary ex post tail event ranking diagnostics, especially at the 1% tail level (see Appendix A), while helping maintain numerical stability and economically plausible VaR forecasts.
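The attention pooling of Equations (31) to (33) and the bounded output of Equations (34) and (35) can be illustrated with a NumPy forward pass. This sketch omits the BiLSTM and the fully connected layer and takes the hidden states and raw outputs z as given; the attention dimension d_a is our assumption, because it is not reported in this section, and the function names are hypothetical.

```python
import numpy as np

def attention_pool(hidden, W, v):
    """Eqs. (31)-(33). hidden: (L, 2H) BiLSTM states; W: (d_a, 2H); v: (d_a,).

    Returns the context vector c with shape (2H,) and the attention weights a with shape (L,).
    """
    scores = np.tanh(hidden @ W.T) @ v          # s_l = v^T tanh(W h_l)
    scores = scores - scores.max()              # stability shift; softmax weights unchanged
    a = np.exp(scores) / np.exp(scores).sum()   # a_l = exp(s_l) / sum_j exp(s_j)
    return a @ hidden, a                        # c = sum_l a_l h_l

def bounded_var(z, rv_prev, m=6.0, eps=1e-8):
    """Eqs. (34)-(35): q_hat = -sigmoid(z) * cap_t with cap_t = m * sqrt(RV_{t-1} + eps)."""
    cap = m * np.sqrt(rv_prev + eps)
    return -cap / (1.0 + np.exp(-np.asarray(z, dtype=float)))
```

Setting v = 0 makes every score equal, so the weights collapse to 1/L and c becomes the plain average of the hidden states; learned weights let the model depart from this uniform baseline. With z = 0, the forecast sits exactly at −cap_t/2.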

2.2.5. Rolling OOS Training with Periodic Refitting

We implement an expanding-window rolling out-of-sample procedure with periodic refitting:

Fix a start date t₀, lookback length L, and refit interval K (in prediction days).
For each prediction day t = t₀, t₀ + 1,…:

(a) Construct all realized features and HAR tokens using information up to t − 1 (via one-day shifting).

(b) If t = t₀ or (t − t₀) mod K = 0, train SA-HAR-J-Net on all data available up to t − 1, using an internal validation split and early stopping.

(c) Use the most recently fitted model to predict the pair of VaR forecasts for day t at the 0.01 and 0.05 levels from the input tensor X_t.
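The refit schedule of steps (a) to (c) can be expressed as a short loop. In this sketch, `fit` and `predict` are hypothetical callables standing in for model training (with its internal validation split and early stopping) and inference; the only logic shown is the expanding window and the every-K refit rule.

```python
def rolling_oos(n_days, t0, K, fit, predict):
    """Expanding-window OOS loop with a refit every K prediction days.

    fit(end=t) must train only on observations dated before t (i.e., up to t-1);
    predict(model, t) must build X_t from data up to t-1 and return the
    (1%, 5%) VaR pair for day t. Both callables are hypothetical placeholders.
    """
    model, forecasts, refit_days = None, {}, []
    for t in range(t0, n_days):
        if t == t0 or (t - t0) % K == 0:     # step (b): periodic refit
            model = fit(end=t)               # expanding window: all data before day t
            refit_days.append(t)
        forecasts[t] = predict(model, t)     # step (c): most recently fitted model
    return forecasts, refit_days
```

With K = 20, each fitted model serves at most 20 consecutive prediction days, and every forecast is produced by a model trained strictly before the forecast day.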

All hyperparameters reported in Section 2.3 match the implementation and the exported metadata file.

2.3. Rolling OOS Protocol and Reproducibility

2.3.1. SA-HAR-J-Net Configuration

The rolling OOS configuration is provided in the accompanying replication package (saharjnet_meta.json). In our experiments, the model uses L = 60 lookback days, refits every 20 prediction days, trains for up to 20 epochs with patience 20, and uses a batch size of 256 and a learning rate of 5 × 10⁻⁴ on GPU. The per-token feature dimension is 6, corresponding to (RV, BV, J, RSK, ∆²RSK, RKT). Dropout is set to 0.3, and the VaR cap uses multiplier m = 6.0 with ε = 10⁻⁸. The value m = 6.0 is chosen as a baseline that yields stable and economically plausible VaR forecasts; as shown in the robustness check (Appendix B), the main empirical findings remain qualitatively similar when m varies within [5, 7]. SA-HAR-J-Net is implemented in Python 3.12.10 and trained in PyTorch 2.7.1 with GPU acceleration via CUDA 11.8. Benchmark models use TensorFlow 2.19.0/Keras (CNN–LSTM), LightGBM 4.6.0 (LGBM), and statsmodels 0.14.5 (HAR-J, GARCH, GJR-GARCH, and FHS). The overall framework is shown in Figure 6.

2.3.2. Benchmark Model Specifications

For reproducibility, we report compact parameter/hyperparameter summaries for representative benchmark models, based on the exported tables included in the replication package. All models are evaluated on the same out-of-sample test period (11 March 2024 to 12 January 2026) by aligning forecasts to the intersection of available prediction dates. For every test day t, all predictors are constructed using information up to t − 1 via one-day shifting, and model hyperparameters are selected using the training/validation split only (no test-set tuning). For the historical simulation benchmark with rolling window W = 250, the window is fully populated at the start of the test period, so no backward-filling of missing forecasts is needed in the test-set evaluation. Key settings and parameters of all models are listed in Table 7.

Table 8 reports the selected LightGBM quantile-regression hyperparameters [39], and Table 9 reports the CNN–LSTM configuration and final losses. Extending the CNN–LSTM-Q lookback window to L = 60 did not improve its performance: the pinball loss increased relative to the configuration reported in Table 9.

Figure 7 illustrates the training stability of the CNN–LSTM baseline. The model is trained for 17 epochs (Table 9) and converges to a stable regime, with a final training loss of 0.01391 and a final validation loss of 0.01000. A validation loss below the training loss is not unusual in regularized training, because dropout is active only when the training loss is computed and the training loss is averaged over noisy minibatches while the weights are still being updated.

3. Results

We evaluate VaR forecasts using average quantile loss, violation rate, Expected Shortfall (ES), and standard coverage/independence tests. Although SA-HAR-J-Net is trained jointly for the 1% and 5% quantiles, for readability we present the 5% results as the main benchmark and report complementary 1% evidence in the multi-quantile diagnostics and the supplementary ranking metrics (Appendix A). In this paper, a violation occurs when the realized return is lower than the one-step-ahead predicted VaR at level α. The reported violation rate is the sample frequency of such exceedances over the common test period.
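The evaluation quantities defined above can be computed as below. This is a sketch under stated assumptions: VaR is expressed as a (negative) return quantile q_t, the realized ES is assumed to be the average realized return on violation days (consistent with how ES is interpreted in the discussion of Table 10), and a violation cluster is assumed to be a run of at least two consecutive violations (consistent with Table 10 reporting zero clusters for FHS despite a nonzero violation rate). The function names are illustrative.

```python
import math
from typing import List, Sequence


def pinball_loss(r: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Average quantile (pinball) loss: mean of (alpha - 1{u < 0}) * u with u = r - q."""
    total = 0.0
    for rt, qt in zip(r, q):
        u = rt - qt
        total += (alpha - (1.0 if u < 0 else 0.0)) * u
    return total / len(r)


def hits(r: Sequence[float], q: Sequence[float]) -> List[int]:
    """Violation indicators: 1 when the realized return falls below the VaR forecast."""
    return [1 if rt < qt else 0 for rt, qt in zip(r, q)]


def realized_es(r: Sequence[float], q: Sequence[float]) -> float:
    """Average realized return on violation days (NaN when there are no violations)."""
    tail = [rt for rt, qt in zip(r, q) if rt < qt]
    return sum(tail) / len(tail) if tail else math.nan


def count_clusters(h: Sequence[int]) -> int:
    """Number of runs of at least two consecutive violations."""
    clusters, run = 0, 0
    for x in list(h) + [0]:  # trailing 0 closes a run that ends on the last day
        if x:
            run += 1
        else:
            if run >= 2:
                clusters += 1
            run = 0
    return clusters
```

The violation rate is then `sum(hits(r, q)) / len(r)`.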

3.1. Main Predictive Performance

The comparative results for the 5% VaR level (α = 0.05) are summarized in Table 10. Overall, the proposed SA-HAR-J-Net achieves the lowest average quantile loss (0.00197) among all competing models. In terms of reliability, its violation rate of 3.88% lies below the nominal 5% level, but the deviation is not statistically significant.

Neither the Kupiec test (p = 0.2643) nor the Christoffersen test (p = 0.1595) rejects its null hypothesis, so there is no evidence against correct unconditional coverage or against the independence of violations; the model captures tail dynamics without significant clustering of violations. In contrast, the benchmark models reveal divergent weaknesses in risk calibration. On the one hand, the GARCH and GJR-GARCH frameworks are excessively conservative, with violation rates far below the nominal level (0.68% and 0.91%, respectively). Although this conservatism limits violations, the Kupiec test rejects correct coverage (p < 0.001), implying that these models overstate risk and would lead to holding more capital than necessary. On the other hand, HS, LGBM-Q, and CNN–LSTM-Q err in the opposite direction, with violation rates above the nominal 5% level (6.16%, 5.94%, and 5.25%, respectively), indicating a tendency to understate risk; the Kupiec test does not reject correct coverage for any of the three, but HS fails the Christoffersen independence test (p = 0.0218). HAR-J and FHS are also conservative (violation rates of 2.28% and 2.05%), and the Kupiec test rejects correct unconditional coverage for both (p = 0.0036 and p = 0.0014), indicating that these models overstate risk in this sample. As noted earlier, the multifractal analysis informs only the research motivation and model design; multifractal measures are not used as input features.

Table 10 also shows that SA-HAR-J-Net attains the least negative ES value (−0.0355) among the compared models. This suggests that, conditional on a violation, the average realized tail loss under SA-HAR-J-Net is less severe than under the benchmark forecasts, whereas the more negative ES values of the GARCH-type models are consistent with their excessive conservatism in this sample.
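The two backtests can be sketched in their standard likelihood-ratio forms (Kupiec proportion of failures; Christoffersen independence against first-order Markov dependence), each compared with a chi-square distribution with one degree of freedom. This is an illustrative implementation, not the authors' code; the helper names are hypothetical and `hits` is a 0/1 violation sequence.

```python
import math
from typing import Sequence, Tuple


def _xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _chi2_1_sf(lr: float) -> float:
    """Survival function of a chi-square(1) variable: P(X > lr) = erfc(sqrt(lr / 2))."""
    return math.erfc(math.sqrt(lr / 2.0))


def kupiec_pof(hits: Sequence[int], alpha: float) -> Tuple[float, float]:
    """Kupiec proportion-of-failures LR test of unconditional coverage."""
    n, x = len(hits), sum(hits)
    pi_hat = x / n
    ll0 = _xlogy(n - x, 1.0 - alpha) + _xlogy(x, alpha)
    ll1 = _xlogy(n - x, 1.0 - pi_hat) + _xlogy(x, pi_hat)
    lr = max(-2.0 * (ll0 - ll1), 0.0)  # clip tiny negative rounding errors
    return lr, _chi2_1_sf(lr)


def christoffersen_ind(hits: Sequence[int]) -> Tuple[float, float]:
    """Christoffersen LR test of independence of violations."""
    n = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, cur in zip(hits[:-1], hits[1:]):
        n[(prev, cur)] += 1  # count day-to-day transitions
    n00, n01, n10, n11 = n[(0, 0)], n[(0, 1)], n[(1, 0)], n[(1, 1)]
    pi01 = n01 / (n00 + n01) if n00 + n01 else 0.0
    pi11 = n11 / (n10 + n11) if n10 + n11 else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    ll0 = _xlogy(n00 + n10, 1.0 - pi) + _xlogy(n01 + n11, pi)
    ll1 = (_xlogy(n00, 1.0 - pi01) + _xlogy(n01, pi01)
           + _xlogy(n10, 1.0 - pi11) + _xlogy(n11, pi11))
    lr = max(-2.0 * (ll0 - ll1), 0.0)
    return lr, _chi2_1_sf(lr)
```

As a consistency check, 17 violations in 438 test days (17/438 ≈ 3.88%; the 438-day length is inferred from the two 219-day halves in Section 3.4) give a Kupiec p-value of about 0.264, in line with the 26.43% reported for SA-HAR-J-Net in Table 10. The Christoffersen test additionally needs the ordering of violations, which is not reported here.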

Figure 8. Time-series backtesting of α = 0.05 (5%) one-day-ahead VaR forecasts for all models on the common test set (ChiNext Index, 399006.SZ).

To compare predictive accuracy statistically, we report selected Diebold–Mariano (DM) tests [42] based on the loss differential series. Table 11 summarizes the DM statistics and p-values for representative model pairs; the full set of pairwise DM tests is provided in the replication package. A negative DM statistic indicates that Model 1 has the lower average loss. Although SA-HAR-J-Net has a lower average loss than each of the four benchmarks it is compared against, none of these differences is significant at the 5% level; the comparison with HAR-J (DM = −1.78, p = 0.076) is significant only at the 10% level. Figure 8 reports time-series backtesting of 5% (α = 0.05) one-day-ahead VaR forecasts for all models on the ChiNext Index (399006.SZ) test set.
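A minimal sketch of the DM statistic, assuming the standard form: the loss differential d_t = L1_t − L2_t, a rectangular-kernel long-run variance with h − 1 autocovariance lags (h = 1 for one-day-ahead forecasts, so no lags), and a two-sided standard normal p-value. Whether the paper applies a small-sample correction (e.g., Harvey–Leybourne–Newbold) is not stated; the function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def diebold_mariano(loss1: Sequence[float], loss2: Sequence[float],
                    h: int = 1) -> Tuple[float, float]:
    """DM test of equal predictive accuracy; negative DM means model 1 has lower loss."""
    d = [a - b for a, b in zip(loss1, loss2)]
    n = len(d)
    dbar = sum(d) / n

    def autocov(k: int) -> float:
        return sum((d[t] - dbar) * (d[t - k] - dbar) for t in range(k, n)) / n

    lrv = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    dm = dbar / math.sqrt(lrv / n)
    p = math.erfc(abs(dm) / math.sqrt(2.0))  # two-sided normal p-value
    return dm, p
```

In practice, `loss1` and `loss2` would be the daily pinball losses of two models over the common test period.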

3.2. Multi-Quantile Analysis

As shown in Appendix A, some benchmarks rank 1% tail events better than SA-HAR-J-Net: FHS attains a higher ROC-AUC and PR-AUC (0.871 and 0.265, compared with 0.834 and 0.240 for SA-HAR-J-Net), and HAR-J attains a higher ROC-AUC (0.874), although its PR-AUC is lower (0.155). At the 5% tail, the gap is smaller, and SA-HAR-J-Net attains the highest ROC-AUC (0.779). This suggests a practical trade-off between ex post tail event ranking and quantile calibration. Since Value-at-Risk forecasting is primarily a quantile calibration task, we base our main conclusions on pinball loss and standard backtesting diagnostics. Figure 9 shows multi-quantile diagnostics for α = 0.01, 0.025, 0.05, 0.075, and 0.10, including violation rate, pinball loss, and ES.

3.3. Robustness: Historical Simulation Window Length

Historical simulation depends on the rolling window length. Figure 10 reports the sensitivity of the violation rate to the HS window size.
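A minimal sketch of the rolling historical simulation benchmark: the VaR for day t is the empirical α-quantile of the previous W returns. The use of NumPy's default linear interpolation in `np.quantile` is an assumption, since the paper's quantile convention is not stated here; `hs_var` is a hypothetical name.

```python
import numpy as np


def hs_var(returns, alpha=0.05, window=250):
    """Rolling historical-simulation VaR: empirical alpha-quantile of the previous `window` returns."""
    r = np.asarray(returns, dtype=float)
    var = np.full(r.shape, np.nan)  # no forecast until a full window is available
    for t in range(window, len(r)):
        var[t] = np.quantile(r[t - window:t], alpha)  # uses returns t-window .. t-1 only
    return var
```

The window-length sensitivity in Figure 10 amounts to repeating this calculation for W ∈ {125, 250, 500} and evaluating each forecast series on the same test days.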

Table 12 details the coefficient estimates for the HAR-J quantile regression at the 5% level and reveals a hierarchy in risk drivers. The previous day's continuous volatility (c_lag₁) is statistically insignificant (p = 0.2365), whereas the continuous volatility accumulated over the past week and month (c_roll₅ and c_roll₂₂) enters with large, significant negative coefficients (p = 0.0022 and p = 0.0001). This suggests that persistent market stress is a stronger predictor of downside risk than short-lived daily fluctuations. The previous day's jump component (j_lag₁) also has a significant negative coefficient (p < 0.001), shifting the 5% quantile downward. In contrast, the lagged return (r_lag₁) has a significant positive coefficient (p < 0.001): a positive return on the previous day raises the predicted quantile and thus alleviates tail risk.

Table 13 summarizes the maximum likelihood estimates for the benchmark GARCH models, which capture two stylized facts of financial returns. First, volatility is highly persistent: the sum of the ARCH and GARCH coefficients (α₁ + β₁) is 0.9957 for GARCH and 0.9902 for GJR-GARCH, implying that volatility shocks decay very slowly. Second, the estimated degrees-of-freedom parameter (ν ≈ 6.9) is small (the Gaussian case corresponds to ν → ∞), which confirms the presence of heavy tails and justifies the use of the Student-t distribution. Additionally, the GJR-GARCH leverage parameter (γ₁ = 0.0098) is small, suggesting only a modest asymmetry in the response to positive versus negative shocks.
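The practical meaning of persistence can be made concrete with the half-life of a variance shock, ln(0.5)/ln(persistence). The sketch below also shows the GJR-GARCH(1,1) variance recursion (γ₁ = 0 gives plain GARCH). Counting γ₁/2 toward GJR persistence assumes symmetric innovations; the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def persistence(alpha1: float, beta1: float, gamma1: float = 0.0) -> float:
    """alpha1 + beta1 (+ gamma1 / 2 for GJR, assuming symmetric innovations)."""
    return alpha1 + beta1 + 0.5 * gamma1


def half_life(p: float) -> float:
    """Number of periods for a variance shock to decay by half: ln(0.5) / ln(p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("half-life is defined only for 0 < persistence < 1")
    return math.log(0.5) / math.log(p)


def gjr_variance_path(eps: Sequence[float], omega: float, alpha1: float,
                      beta1: float, gamma1: float = 0.0,
                      sigma2_0: Optional[float] = None) -> List[float]:
    """sigma2[t] = omega + (alpha1 + gamma1 * 1{eps[t-1] < 0}) * eps[t-1]**2 + beta1 * sigma2[t-1].

    The recursion starts from the unconditional variance unless sigma2_0 is given.
    """
    if sigma2_0 is None:
        sigma2_0 = omega / (1.0 - persistence(alpha1, beta1, gamma1))
    path = [sigma2_0]
    for e in eps[:-1]:
        arch = alpha1 + (gamma1 if e < 0 else 0.0)  # leverage term for negative shocks
        path.append(omega + arch * e * e + beta1 * path[-1])
    return path
```

With the Table 13 estimates, this gives half-lives of roughly 161 trading days for GARCH and 141 trading days for GJR-GARCH.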

3.4. Robustness Checks

To assess stability across market regimes, we split the common test period into two equal halves (early vs. late; 219 trading days each) and recompute the main evaluation metrics within each subsample; Table 14 reports the resulting violation rates, quantile losses, and Expected Shortfall values. Figure 11 complements this split with rolling performance metrics over the test period. The top row (rolling quantile loss, panels a–c) indicates that SA-HAR-J-Net maintains the lowest loss trajectory at all three quantile levels (α = 1%, 5%, and 10%), and the bottom row (rolling violation rate, panels d–f) indicates that its violation rates remain stable around the target levels, supporting its robustness to changing market conditions.
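The subsample split and rolling diagnostics reduce to two simple operations, sketched below. The rolling window length used for Figure 11 is not stated here, so any value passed to `rolling_mean` is a hypothetical choice; applied to daily pinball losses it gives a rolling loss, and applied to the 0/1 violation sequence it gives a rolling violation rate.

```python
from typing import List, Sequence, Tuple


def rolling_mean(xs: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; the first value covers xs[0:window]."""
    out, s = [], 0.0
    for i, x in enumerate(xs):
        s += x
        if i >= window:
            s -= xs[i - window]  # drop the observation leaving the window
        if i >= window - 1:
            out.append(s / window)
    return out


def split_halves(xs: Sequence[float]) -> Tuple[list, list]:
    """Early and late halves of the test period (the late half gets the extra day if the length is odd)."""
    mid = len(xs) // 2
    return list(xs[:mid]), list(xs[mid:])
```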

3.5. Interpretability and Economic Drivers

Figure 12 presents a composite diagnostic analysis that connects predictive performance with interpretable economic drivers. The radar chart in Figure 12a visually distinguishes the proposed SA-HAR-J-Net (blue polygon); its minimal enclosed area across the four normalized metrics indicates that the model achieves the best trade-off between accuracy and stability, clearly outperforming the GARCH baselines.

More importantly, a comparison of Figure 12b,c reveals a robust consensus on the drivers of tail risk. The statistical HAR-J results highlight monthly continuous volatility (c_roll22) as a dominant negative factor (p < 0.001).

This finding is independently corroborated by the LightGBM SHAP analysis, which similarly identifies c_roll22 as the most influential feature. Such consistency between the linear “white-box” inference and the nonlinear “black-box” importance scores suggests that the models are capturing genuine volatility persistence rather than fitting spurious noise.

4. Discussion

Our empirical results indicate that SA-HAR-J-Net improves tail-quantile forecasting accuracy relative to classical and machine learning benchmarks on a common rolling out-of-sample period. Beyond the reduction in pinball loss, the results are consistent with the diagnostic evidence that ChiNext volatility is multi-scale and exhibits regime-dependent persistence. This section discusses how the multifractal findings inform the modeling choices and what the empirical results imply for tail risk forecasting.

4.1. Strong Persistence, Multi-Scale Structure, and Implications for Modeling

A central finding of our multifractal diagnostics (Section 2.1.5) is that ChiNext realized volatility exhibits pronounced multi-scale structure, and the estimated generalized Hurst exponent at q = 2 frequently exceeds unity [e.g., H(2) ≈ 1.15]. Although finite-sample effects, methodological specifications, and measurement noise embedded in realized volatility measures may affect the estimation of scaling behavior, the result H(2) > 1 indicates exceptionally strong persistence, long memory, and potential nonstationary or near-integrated dynamics in the log-volatility series. This evidence does not establish a structural law of motion; rather, it highlights that fixed-memory or static-parameter specifications may be restrictive when persistence varies across regimes.
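To make the quantity H(2) concrete, the sketch below estimates the generalized Hurst exponent h(q) by MF-DFA: integrate the demeaned series into a profile, remove a polynomial trend in non-overlapping segments of size s, average the detrended variances with exponent q/2, and regress log F_q(s) on log s. It is a simplified illustration, not the authors' implementation: only forward segments are used (Kantelhardt et al. [22] also use segments taken from the end of the series), and `mfdfa_hq` is a hypothetical name.

```python
import numpy as np


def mfdfa_hq(x, scales, q=2.0, order=1):
    """Generalized Hurst exponent h(q) by multifractal DFA.

    x: 1-D series (e.g. daily log realized volatility); scales: segment sizes s;
    order: degree of the detrending polynomial (order=1 is DFA-1).
    """
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())
    log_fq = []
    for s in scales:
        s = int(s)
        n_seg = len(profile) // s
        t = np.arange(s)
        f2 = np.empty(n_seg)
        for v in range(n_seg):
            seg = profile[v * s:(v + 1) * s]
            trend = np.polyval(np.polyfit(t, seg, order), t)
            f2[v] = np.mean((seg - trend) ** 2)  # detrended variance of segment v
        if q == 0:
            fq = np.exp(0.5 * np.mean(np.log(f2)))  # q -> 0 limit
        else:
            fq = np.mean(f2 ** (q / 2.0)) ** (1.0 / q)
        log_fq.append(np.log(fq))
    slope, _ = np.polyfit(np.log(np.asarray(scales, dtype=float)), log_fq, 1)
    return slope
```

On simulated data, white noise gives h(2) near 0.5 and its cumulative sum (a random walk) gives a value near 1.5, so estimates above 1 indicate a nonstationary, random-walk-like profile. Permuting a series in time removes temporal dependence and pulls H(2) toward 0.5, the pattern seen for the shuffled series in Table 5 (0.519).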

4.2. Why Adaptive Reweighting Can Matter for Tail Risk

The classical HAR paradigm represents heterogeneous horizons through daily/weekly/monthly components but aggregates them with fixed linear weights. When the effective memory and the relevance of specific horizons shift across regimes, static aggregation can become suboptimal. SA-HAR-J-Net addresses this limitation by combining a BiLSTM encoder (flexible sequence representation) with a time self-attention layer that performs data-driven reweighting of historical information. This adaptive mechanism is consistent with the empirical evidence of time-varying persistence and provides an interpretable way to emphasize regime-relevant episodes and horizons.
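The contrast between fixed and adaptive aggregation can be illustrated with attention pooling over encoder states. The exact parameterization of the paper's time self-attention layer is not given in this section, so the sketch below uses additive (Bahdanau-style) attention pooling as a stand-in; the matrices `w` and `v` would be learned in a real model, and the shapes L = 60 and d = 2 × 64 = 128 follow the configuration in Table 7.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def attention_pool(h: np.ndarray, w: np.ndarray, v: np.ndarray):
    """Additive attention pooling over time.

    h: (L, d) encoder states (e.g. BiLSTM outputs, d = 2 * hidden size)
    w: (d, k) projection; v: (k,) scoring vector.
    Returns the context vector (d,) and the attention weights a_l (L,), which sum to 1.
    """
    scores = np.tanh(h @ w) @ v  # one relevance score per time step
    a = softmax(scores)
    return a @ h, a
```

With `v` set to zero, every weight collapses to 1/L and the context becomes a plain average of the states, i.e., a fixed aggregation; learned scores let the weights shift toward regime-relevant days instead.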

4.3. Volatility-Scaled Bounding and Market Microstructure Considerations

The volatility-scaled bounded output mapping acts as a stability device for extreme-tail forecasting. In high-frequency settings, noisy inputs can induce overly volatile tail-quantile predictions; bounding the VaR forecast relative to recent realized volatility (e.g., via $\sqrt{RV}$) discourages implausibly extreme extrapolations and improves numerical stability during stress episodes. This design can introduce mild conservatism (violation rates slightly below the nominal level), reflecting a trade-off between sharpness and calibration. Institutional features of China's A-share market (e.g., price-limit regimes) provide additional motivation for using volatility-adaptive constraints, although we do not explicitly model price-limit mechanisms.
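One possible form of the bounded mapping is sketched below, assuming cap_t = m·√(RV + ε) computed from the most recent realized variance and a logistic squashing of the unconstrained network output z into (−cap_t, 0). The logistic form and the name `bounded_var` are assumptions; a hard clip, `max(raw_var, -cap)`, would be a simpler alternative.

```python
import math


def bounded_var(z: float, rv_prev: float, m: float = 6.0, eps: float = 1e-8) -> float:
    """Map an unconstrained network output z to a VaR forecast in (-cap, 0).

    cap = m * sqrt(rv_prev + eps), with rv_prev the most recent realized variance.
    The logistic squashing is an assumed form of the bounded mapping.
    """
    cap = m * math.sqrt(rv_prev + eps)
    # numerically stable logistic function
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        s = ez / (1.0 + ez)
    return -cap * s
```

For scale, at the sample-mean RV in Table 1 (0.000212), m = 6 bounds the forecast at roughly an 8.7% daily loss.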

4.4. Limitations and Future Directions

Several limitations merit emphasis. First, the empirical study focuses on a single emerging-market index; extending the analysis to other markets and asset classes would clarify external validity. Second, multifractal estimates depend on methodological choices, and systematic robustness checks across scale ranges and window specifications remain important. Third, the attention mechanism may offer additional interpretive insight, but the present paper does not provide a dedicated empirical validation of attention-weight interpretability. Future work may extend the framework to multi-step and joint VaR–ES forecasting, explore hierarchical or regime-aware attention structures, and investigate whether summaries of the singularity spectrum f(α) can improve stability and performance when included as additional state variables.

5. Conclusions

In conclusion, this study bridges the gap between complexity-science diagnostics and deep learning forecasting for high-frequency financial markets. Using multifractal analysis on the ChiNext Index, we document a pronounced multi-scale structure in log-realized volatility, where the generalized Hurst exponent at q = 2 frequently exceeds unity. This evidence points to extremely strong persistence and nonstationary scaling behavior, suggesting that multifractality is driven primarily by temporal correlations. Motivated by these diagnostics, we propose SA-HAR-J-Net, a regime-adaptive framework. By integrating HAR-style multi-horizon tokenization with a BiLSTM encoder and a time self-attention mechanism, the model relaxes the static constraints of traditional specifications. The attention layer acts as a dynamic reweighting module that adaptively emphasizes regime-relevant historical episodes to accommodate time-varying persistence. Under a strict rolling out-of-sample protocol with no look-ahead bias, SA-HAR-J-Net achieves superior calibration and lower pinball loss at the 1% and 5% tail quantiles. These findings suggest that connecting empirical multifractal diagnostics to adaptive sequence models provides a robust pathway for tail risk monitoring in nonstationary emerging markets.

Author Contributions

Conceptualization, K.Z., S.W. and D.Z.; methodology, K.Z.; software, K.Z.; validation, K.Z.; investigation, K.Z.; writing—original draft preparation, K.Z.; writing—review and editing, S.W. and D.Z.; supervision, D.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw high-frequency intraday data can be obtained from official China exchange data channels and/or financial data vendors (e.g., Wind) and are subject to data-provider licensing restrictions. The data can also be obtained from Tushare (https://tushare.pro/document/2?doc_id=370, accessed on 15 February 2026).

Acknowledgments

During the preparation of this manuscript, the authors used AI-assisted tools for language editing and code assistance. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Supplementary Results: Tail Event Ranking Metrics

VaR forecasting is fundamentally a quantile calibration task, so our main conclusions are based on pinball loss and standard coverage/backtesting diagnostics. As a complementary analysis, we evaluate how well models rank extreme tail risk days by treating the realized return tail event as a binary label and using the predicted VaR as a risk score. Table A1 reports ROC-AUC and PR-AUC (Average Precision) at the 1% and 5% tails. These ranking metrics are reported as complementary ex post diagnostics only. They are not used in model training, hyperparameter selection, or the main VaR backtesting evaluation.

Table A1. Tail event ranking performance measured by ROC-AUC and PR-AUC (Average Precision).

	1% Tail Events		5% Tail Events
Model	ROC-AUC	PR-AUC	ROC-AUC	PR-AUC
FHS	0.871	0.265	0.766	0.254
SA-HAR-J-Net	0.834	0.240	0.779	0.248
HAR-J	0.874	0.155	0.776	0.236
LGBM-Q	0.708	0.071	0.658	0.139
CNN-LSTM-Q	0.698	0.043	0.712	0.111
GARCH	0.501	0.013	0.545	0.063
GJR-GARCH	0.501	0.013	0.545	0.063
HS	0.476	0.011	0.440	0.047

Note: The binary label is defined as an unconditional tail event using the realized return quantile within the evaluation sample. This tail event label is used for ex post ranking diagnostics only; it is not used in model training, hyperparameter selection, or VaR backtesting. Columns are grouped by the target quantile level (α = 0.01 and α = 0.05).
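The ranking metrics in Table A1 can be sketched as follows, assuming that the tail label marks days whose return is at or below the empirical α-quantile of the evaluation sample and that the risk score is −VaR (a more negative VaR means higher predicted risk). ROC-AUC is computed through the Mann–Whitney rank statistic and Average Precision as Σ (R_n − R_{n−1})·P_n over score thresholds. The exact empirical-quantile convention and the function names are assumptions.

```python
import math
from typing import List, Sequence


def tail_labels(returns: Sequence[float], alpha: float) -> List[int]:
    """1 for days whose return is at or below the empirical alpha-quantile of the sample."""
    k = max(1, math.ceil(alpha * len(returns)))
    threshold = sorted(returns)[k - 1]
    return [1 if r <= threshold else 0 for r in returns]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the Mann-Whitney statistic, with average ranks for tied scores."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(lab for _, lab in pairs)
    n_neg = n - n_pos
    rank_sum = sum(rk for rk, (_, lab) in zip(ranks, pairs) if lab)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP = sum over thresholds of (recall gain) * precision; tied scores form one threshold."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap
```

For a model's VaR series `var_forecasts` and realized returns `ret`, the 1% ranking metrics would be `roc_auc(tail_labels(ret, 0.01), [-v for v in var_forecasts])` and the analogous `average_precision` call.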

Table A2. FHS HAR-J style volatility filtering coefficients.

Variable	Description	Estimate	Std. Error	t-Stat	p-Value
Intercept	Constant	1.94 × 10⁻⁵	7.61 × 10⁻⁶	2.55	0.011
c_lag₁	Continuous (daily)	0.358	0.027	13.24	<0.001
c_roll_5d	Continuous (5-day)	−0.027	0.054	−0.50	0.618
c_roll_22d	Continuous (22-day)	0.620	0.088	7.02	<0.001
j_lag₁	Jump (daily)	0.168	0.071	2.35	0.019
j_roll_5d	Jump (5-day)	0.736	0.203	3.63	<0.001
j_roll_22d	Jump (22-day)	−1.209	0.428	−2.83	0.005
ret_lag₁	Return lag 1	−0.003	0.000	−10.91	<0.001

Note: Model specification:

$$\sigma_{t} = \beta_{0} + \beta_{1} C_{t-1} + \beta_{2} C_{t-1:t-5} + \beta_{3} C_{t-1:t-22} + \beta_{4} J_{t-1} + \beta_{5} J_{t-1:t-5} + \beta_{6} J_{t-1:t-22} + \beta_{7} r_{t-1} + \varepsilon_{t},$$

where C denotes the continuous component, J denotes the jump component, and the subscript t−1:t−k denotes the k-day rolling aggregate over days t−k to t−1 (the c_roll and j_roll regressors above). BV stands for bipower variation.
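Given the filter's fitted volatilities, the FHS step can be sketched as: standardize in-sample returns by the fitted σ_t, take the empirical α-quantile of the standardized residuals, and rescale it by the one-step-ahead volatility forecast. This follows the generic FHS recipe; whether returns are demeaned first, and the quantile convention (NumPy's default linear interpolation here), are assumptions. The function name `fhs_var` is hypothetical.

```python
import numpy as np


def fhs_var(returns, sigma_fitted, sigma_next, alpha=0.05):
    """Filtered historical simulation VaR.

    returns, sigma_fitted: in-sample returns and the filter's fitted volatilities;
    sigma_next: the filter's one-step-ahead volatility forecast.
    VaR = sigma_next * empirical alpha-quantile of the standardized residuals r / sigma.
    """
    r = np.asarray(returns, dtype=float)
    s = np.asarray(sigma_fitted, dtype=float)
    if np.any(s <= 0):
        raise ValueError("fitted volatilities must be positive")
    z = r / s  # standardized residuals
    return sigma_next * np.quantile(z, alpha)
```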

Appendix B. Supplementary Results: Sensitivity of Volatility-Scaled Cap Multiplier

Table A3 reports the sensitivity of the results to the volatility-scaled cap multiplier (Var_Cap_Mult) for m ∈ {4, 5, 6, 7, 8}, in terms of QL and Viol at the 1% and 5% levels, with m = 6 serving as the baseline.

Table A3. Volatility-scaled cap multiplier sensitivity.

m	QL (1%)	Viol (1%)	QL (5%)	Viol (5%)
4	0.073%	1.83%	0.200%	3.88%
5	0.069%	0.91%	0.200%	4.11%
6 (Baseline)	0.071%	0.46%	0.199%	3.65%
7	0.071%	0.46%	0.197%	4.11%
8	0.071%	0.46%	0.196%	4.11%

References

Jorion, P. Value at Risk: The New Benchmark for Managing Financial Risk, 3rd ed.; McGraw-Hill: New York, NY, USA, 2007.
McNeil, A.J.; Frey, R.; Embrechts, P. Quantitative Risk Management: Concepts, Techniques and Tools; revised edition; Princeton University Press: Princeton, NJ, USA, 2015.
Cont, R. Empirical properties of asset returns: Stylized facts and statistical issues. Quant. Financ. 2001, 1, 223.
Engle, R.F. Risk and volatility: Econometric models and financial practice. Am. Econ. Rev. 2004, 94, 405–420.
Engle, R.F. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 1982, 50, 987–1007.
Bollerslev, T. Generalized autoregressive conditional heteroskedasticity. J. Econom. 1986, 31, 307–327.
Granger, C.W.J.; Joyeux, R. An introduction to long-memory time series models and fractional differencing. J. Time Ser. Anal. 1980, 1, 15–29.
Geweke, J.; Porter-Hudak, S. The estimation and application of long memory time series models. J. Time Ser. Anal. 1983, 4, 221–238.
Ding, Z.; Granger, C.W.J.; Engle, R.F. A long memory property of stock market returns and a new model. J. Empir. Financ. 1993, 1, 83–106.
Baillie, R.T.; Bollerslev, T.; Mikkelsen, H.O. Fractionally integrated generalized autoregressive conditional heteroskedasticity. J. Econom. 1996, 74, 3–30.
Andersen, T.G.; Bollerslev, T.; Diebold, F.X.; Labys, P. Modeling and forecasting realized volatility. Econometrica 2003, 71, 579–625.
Barndorff-Nielsen, O.E.; Shephard, N. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. J. R. Stat. Soc. Ser. B 2002, 64, 253–280.
Andersen, T.G.; Bollerslev, T.; Diebold, F.X.; Labys, P. The distribution of realized exchange rate volatility. J. Am. Stat. Assoc. 2001, 96, 42–55.
Barndorff-Nielsen, O.E.; Shephard, N. Power and bipower variation with stochastic volatility and jumps. J. Financ. Econom. 2004, 2, 1–37.
Corsi, F. A simple approximate long-memory model of realized volatility. J. Financ. Econom. 2009, 7, 174–196.
Patton, A.J.; Sheppard, K. Good volatility, bad volatility: Signed jumps and the persistence of volatility. Rev. Econ. Stat. 2015, 97, 683–697.
Hillebrand, E. Neglecting parameter changes in GARCH models. J. Econom. 2005, 129, 121–138.
McAleer, M.; Medeiros, M.C. Realized volatility: A review. Econom. Rev. 2008, 27, 10–45.
Mandelbrot, B.B. The variation of certain speculative prices. J. Bus. 1963, 36, 394.
Hurst, H.E. Long-term storage capacity of reservoirs. Trans. Am. Soc. Civ. Eng. 1951, 116, 770–799.
Peng, C.-K.; Buldyrev, S.V.; Havlin, S.; Simons, M.; Stanley, H.E.; Goldberger, A.L. Mosaic organization of DNA nucleotides. Phys. Rev. E 1994, 49, 1685–1689.
Kantelhardt, J.W.; Zschiegner, S.A.; Koscielny-Bunde, E.; Havlin, S.; Bunde, A.; Stanley, H.E. Multifractal detrended fluctuation analysis of nonstationary time series. Phys. A Stat. Mech. Its Appl. 2002, 316, 87–114.
Torrence, C.; Compo, G.P. A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc. 1998, 79, 61–78.
Bacry, E.; Delour, J.; Muzy, J.-F. Multifractal random walk. Phys. Rev. E 2001, 64, 026103.
Calvet, L.E.; Fisher, A.J. Multifractality in asset returns: Theory and evidence. Rev. Econ. Stat. 2002, 84, 381–406.
Carpenter, J.N.; Lu, F.; Whitelaw, R.F. The real value of China's stock market. J. Financ. Econ. 2021, 139, 679–696.
Hansen, P.R.; Lunde, A. Realized variance and market microstructure noise. J. Bus. Econ. Stat. 2006, 24, 127–161.
Brownlees, C.T.; Gallo, G.M. Financial econometric analysis at ultra-high frequency: Data handling concerns. Comput. Stat. Data Anal. 2006, 51, 2232–2245.
Amaya, D.; Christoffersen, P.; Jacobs, K.; Vasquez, A. Does realized skewness predict the cross-section of equity returns? J. Financ. Econ. 2015, 118, 135–167.
Sainath, T.N.; Vinyals, O.; Senior, A.; Sak, H. Convolutional, long short-term memory, fully connected deep neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, South Brisbane, Australia, 19–24 April 2015; pp. 4580–4584.
Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inform. Process. Syst. 2017, 30.
Koenker, R.; Bassett, G. Regression quantiles. Econometrica 1978, 46, 33–50.
Christoffersen, P. Evaluating interval forecasts. Int. Econ. Rev. 1998, 39, 841–862.
Kupiec, P.H. Techniques for verifying the accuracy of risk measurement models. J. Deriv. 1995, 3, 73–84.
Engle, R.F.; Manganelli, S. CAViaR: Conditional autoregressive value at risk by regression quantiles. J. Bus. Econ. Stat. 2004, 22, 367–381.
Andersen, T.G.; Bollerslev, T.; Diebold, F.X.; Ebens, H. The Distribution of Realized Stock Return Volatility. J. Financ. Econ. 2001, 61, 43–76.
Boudoukh, J.; Richardson, M.; Whitelaw, R.F. The best of both worlds: A hybrid approach to calculating value at risk. Risk 1998, 11, 64–67.
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inform. Process. Syst. 2017, 30. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html (accessed on 6 April 2026).
Glosten, L.R.; Jagannathan, R.; Runkle, D.E. On the relation between the expected value and the volatility of the nominal excess return on stocks. J. Financ. 1993, 48, 1779–1801.
Barone-Adesi, G.; Giannopoulos, K.; Vosper, L. VaR without correlations for portfolios of derivative securities. J. Futures Mark. 1999, 19, 583–602.
Diebold, F.X.; Mariano, R.S. Comparing predictive accuracy. J. Bus. Econ. Stat. 2002, 20, 134–144.

Figure 1. The process integrates four key stages: (1) construction of realized measures from five-minute intraday bars; (2) multifractal analysis to diagnose volatility persistence; (3) rolling out-of-sample training with periodic refitting to prevent look-ahead bias; and (4) comprehensive VaR evaluation focusing on tail reliability and quantile accuracy.

Figure 2. Static multifractal diagnostics for ChiNext log-realized volatility. (a) Shuffle test comparing generalized Hurst profiles H(q) for the original series and its shuffled counterpart (distribution preserved, temporal dependence removed). (b) Singularity spectrum f(α) summarizing multifractal complexity via the width ∆α.

Figure 3. Dynamic multifractal diagnostics: (a,b) log-realized volatility, (c) rolling DFA-based Hurst exponent H(2) (window = 500), and (d) wavelet power spectrum (time-scale energy), where power denotes the squared magnitude of the wavelet coefficients. Co-movement across panels is consistent with regime-dependent persistence and motivates adaptive reweighting via time self-attention.

Figure 4. Exploratory visualization of daily returns and realized measures for the ChiNext Index (399006.SZ). (a) Log-returns time series, (b) return distribution with a normal fit, (c) Q–Q plot for normality check, (d) ACF of squared returns (volatility clustering), (e) realized variance (RV5) and bipower variation (BV), and (f) jump component series J = RV5 − BV.

Figure 5. Correlation matrix of realized features used in the HAR-style specification.

Figure 6. Technical roadmap of SA-HAR-J-Net, from data preprocessing and realized measure construction to HAR tokenization, BiLSTM encoding, time self-attention aggregation, and rolling OOS evaluation.

Figure 7. Training and validation learning curves for the CNN-LSTM quantile baseline (diagnostic).

Figure 9. Multi-quantile diagnostics across α ∈ {0.01, 0.025, 0.05, 0.075, 0.10}: (a) violation rate, (b) pinball (quantile) loss, and (c) Expected Shortfall (ES).

Figure 10. Sensitivity of HS calibration to rolling window length (5% VaR, ChiNext Index). (a) Violation rates rise from 5.94% (W = 125) to 6.62% (W = 500), above the nominal 5% level, indicating mildly liberal coverage. (b) U-shaped pinball loss is minimized at W = 250 (0.00225).

Figure 11. Rolling performance diagnostics. Rolling Loss (a–c) and Rolling Violation Rates (d–f) at significance levels α = 1%, 5%, and 10%.

Figure 12. Intermediate evidence used in the empirical evaluation: (a) radar chart of normalized metrics, (b) HAR-J coefficient estimates, (c) SHAP summaries for LGBM-Q, and (d) HAR-J residual diagnostics.

Table 1. Descriptive statistics of daily returns and realized measures (399006.SZ).

Variable	Mean	Std. Dev.	Min	Max
log_ret	0.000320	0.019224	−0.130337	0.157928
rv5	0.000212	0.000396	0.000000	0.007538
bv	0.000195	0.000357	0.000000	0.007463
J	0.000023	0.000086	0.000000	0.002912
rsk	0.146477	0.857009	−6.708204	6.708204
rkt	3.911952	2.085242	1.664353	45.000000

Note: In Table 1, rv5 denotes RV_t computed from five-minute intraday returns within the regular trading session.

Table 2. Global multifractal parameters of the ChiNext realized volatility series.

Series	H(2)	α_min	α_max	∆α
ChiNext log realized volatility	1.1547	0.9817	1.2259	0.2442

Table 3. Descriptive statistics of the time-varying Hurst exponent [rolling DFA, H(2)].

Mean	Std. Dev.	Min	Max
1.0360	0.0944	0.8625	1.2616

Table 4. Robustness check of MF-DFA parameters. The baseline configuration uses m = 1, s ∈ [16, 512], q ∈ [−5, 5] with ∆q = 1.0, and 20 scales.

Configuration	m	Scale Range (s)	q Range	H(2)	∆α	R² (q = 2)
Baseline	1	[16, 512]	[−5, 5]	1.155	0.244	0.996
Scale Range: [16, 256]	1	[16, 256]	[−5, 5]	1.114	0.223	0.997
Scale Range: [32, 512]	1	[32, 512]	[−5, 5]	1.152	0.285	0.994
Scale Range: [32, N/4]	1	[32, 728]	[−5, 5]	1.158	0.216	0.996
Number of Scales: 15	1	[16, 512]	[−5, 5]	1.160	0.148	0.997
Number of Scales: 25	1	[16, 512]	[−5, 5]	1.130	0.180	0.995
q Range: [−4, 4]	1	[16, 512]	[−4, 4]	1.155	0.160	0.996
q Range: [−6, 6]	1	[16, 512]	[−6, 6]	1.155	0.311	0.996
q Step: 0.5	1	[16, 512]	[−5, 5]	1.155	0.263	0.996
Detrend Order: m = 2	2	[16, 512]	[−5, 5]	1.070	0.269	0.998

Table 5. Multifractal properties of the original series compared with shuffled and IAAFT surrogate data.

Series	H(2)	α_min	α_max	∆α	R² (q = 2)
Original	1.155	0.982	1.226	0.244	0.996
Shuffled (permute time)	0.519	0.463	0.557	0.095	0.997
IAAFT surrogate	1.129	1.064	1.157	0.093	0.998

Table 6. Variable nomenclature and interpretations.

Symbol	Variable Name	Interpretation (Financial/Physical Meaning)
P_t,i	Intraday price	Five-minute close price; raw high-frequency signal
r_t,i	Intraday return	Five-minute log return; within-day movement intensity used to build realized measures
r_t	Daily return	Close-to-close log return; target whose left-tail quantiles define VaR
RV_t	Realized variance	Total within-day variation; how strongly prices fluctuated during day t
BV_t	Bipower variation	Continuous-volatility proxy; less sensitive to jump-like spikes than RV_t
J_t	Jump proxy	Proxy of discontinuous jumps: excess of RV_t over BV_t
RSK_t	Realized skewness	Asymmetry of intraday returns; captures downside-dominated risk when negative
RKT_t	Realized kurtosis	Tail heaviness; larger values indicate fat tails and more extreme outliers
∆²RSK_t	Skewness acceleration	Rapid changes in asymmetry; highlight turning points in tail shape
$x_{t}^{(d)}$, $x_{t}^{(w)}$, $x_{t}^{(m)}$	HAR components	Daily/weekly/monthly aggregates approximating heterogeneous investor horizons (short/medium/long)
a_ℓ	Attention weight	Cognitive-inspired focus on salient historical days; a higher a_ℓ means higher relevance for tail risk
cap_t	Vol-scaled safety buffer	Volatility-scaled bound that stabilizes tail forecasts under noisy high-frequency inputs
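The daily realized measures in this table can be sketched from one day's intraday returns using the standard definitions: RV as the sum of squared returns, BV as (π/2)·Σ|r_i||r_{i−1}| [14], J as the excess of RV over BV, and RSK and RKT as in Amaya et al. [29]. These formulas, the truncation of J at zero (consistent with the zero minimum of J in Table 1), and the second-difference definition of ∆²RSK_t are assumptions, since the exact definitions appear in an earlier section; the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def realized_measures(r: Sequence[float]) -> Dict[str, float]:
    """Daily realized measures from one day's intraday (e.g. five-minute) log returns."""
    n = len(r)
    rv = sum(x * x for x in r)
    bv = (math.pi / 2.0) * sum(abs(r[i]) * abs(r[i - 1]) for i in range(1, n))
    jump = max(rv - bv, 0.0)  # truncated at zero (assumption)
    if rv > 0:
        rsk = math.sqrt(n) * sum(x ** 3 for x in r) / rv ** 1.5
        rkt = n * sum(x ** 4 for x in r) / rv ** 2
    else:
        rsk = rkt = math.nan
    return {"RV": rv, "BV": bv, "J": jump, "RSK": rsk, "RKT": rkt}


def skew_acceleration(rsk: Sequence[float], t: int) -> float:
    """Second difference RSK_t - 2 RSK_{t-1} + RSK_{t-2} (assumed definition of the Delta^2 RSK_t feature)."""
    return rsk[t] - 2.0 * rsk[t - 1] + rsk[t - 2]
```

Under these definitions |RSK| ≤ √N and RKT ≤ N for N intraday returns, with equality when a single return dominates the day. The extremes in Table 1 (|rsk| = 6.708204 = √45, rkt = 45) would be consistent with N = 45, although this is an inference rather than a documented setting.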

Table 7. Summary of model configurations and key parameters used in the empirical comparison. The FHS volatility-filtering coefficients are reported in Appendix Table A2.

Model	Key Settings/Parameters
SA-HAR-J-Net (neural quantile model)	Rolling OOS expanding window; lookback L = 60; HAR tokens: daily/weekly(5)/monthly(22); per-token features (RV, BV, J, RSK, ∆²RSK_t, RKT) (dim = 6); BiLSTM hidden = 64 (per direction); attention dim = 128; dropout = 0.3; lr = 5 × 10⁻⁴; batch = 256; max epochs = 20 with patience = 20; refit every 20 prediction days; bounded VaR with cap multiplier m = 6.0 and ε = 10⁻⁸.
HS [38] (nonparametric)	Historical simulation with rolling window W = 250 trading days for the main comparison (sensitivity also reported for W ∈ {125, 500}).
HAR-J [15] (linear quantile regression)	Quantile regression at α = 0.05 with HAR-style regressors: (c_lag₁, c_roll₅, c_roll₂₂, j_lag₁, j_roll₅, j_roll₂₂, r_lag₁).
LGBM-Q [39] (tree-based quantile model)	LightGBM quantile objective (α = 0.05) using the HAR-J feature set; learning_rate = 0.05; num_leaves = 31; n_estimators = 2000; best_iteration = 57.
CNN–LSTM-Q [30] (deep sequence baseline)	Window size = 20; Conv1D filters = 32; LSTM units = 50/50; dropout = 0.20; lr = 1 × 10⁻³; batch = 32; trained 17 epochs (quantile loss).
GARCH [6] (parametric volatility model)	Student-t GARCH(1,1): µ = 0.0421, ω = 0.0247, α₁ = 0.0639, β₁ = 0.9318, ν = 6.8667.
GJR-GARCH [40] (parametric volatility model)	Student-t GJR-GARCH(1,1): µ = 0.0393, ω = 0.0262, α₁ = 0.0605, γ₁ = 0.0098, β₁ = 0.9297, ν = 6.8808.
FHS [41] (semi-parametric)	Filtered historical simulation with a HAR-J-style volatility filtering stage (coefficients reported) and empirical tail estimation on standardized residuals.

Table 8. LightGBM quantile model hyperparameters (summary).

learning_rate	num_leaves	n_estimators	best_iteration	Best Val. QL
0.05	31	2000	57	0.001278

Table 9. CNN–LSTM quantile model training summary (window-based sequence baseline).

Window	Epochs	Batch	Train Loss	Val Loss	lr	Conv Filters	LSTM1	LSTM2	Dropout
20	17	32	0.01391	0.01000	1 × 10⁻³	32	50	50	0.20
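The window-based CNN–LSTM-Q baseline (window = 20) needs overlapping input windows aligned with next-day targets. A minimal sketch, assuming the target for day t is paired with features from days t−20 to t−1; `make_windows` is a hypothetical helper, and the baseline's input features are not specified in this table.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


# Hypothetical helper (illustrative only): build (window, feature) sequences.
def make_windows(features, target, window=20):
    """Pair each target y[t] with the feature window ending at day t - 1.

    features: shape (T, k); target: shape (T,).
    Returns X with shape (T - window, window, k) and y with shape (T - window,).
    """
    f = np.asarray(features, dtype=float)
    y = np.asarray(target, dtype=float)
    # Windows along axis 0 have shape (T - window + 1, k, window).
    views = sliding_window_view(f, window, axis=0)
    # Put time before features: (n_windows, window, k).
    X = np.moveaxis(views, -1, 1)
    # The last window ends at day T - 1 and has no next-day target, so drop it.
    return X[:-1].copy(), y[window:].copy()
```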

Table 10. 5% VaR backtesting results on the common test set.

Model	QL	Viol. Rate	ES	Clust.	Kupiec p	Christ. p
SA-HAR-J-Net	0.197%	3.88%	−3.55%	2	26.43%	15.95%
HS	0.225%	6.16%	−3.92%	5	27.99%	2.18%
HAR-J	0.212%	2.28%	−5.22%	1	0.36%	21.56%
LGBM-Q	0.215%	5.94%	−3.73%	4	38.20%	7.28%
CNN-LSTM-Q	0.211%	5.25%	−3.96%	2	81.09%	48.44%
GARCH	0.256%	0.68%	−9.63%	0	0%	-
GJR-GARCH	0.252%	0.91%	−8.31%	0	0%	-
FHS	0.204%	2.05%	−4.40%	0	0.14%	-

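Note: QL denotes the average pinball loss at α = 0.05, and Viol. Rate the share of test days on which the realized return falls below the VaR forecast. QL, Viol. Rate, and ES are reported here as percentages; in decimal form, QL corresponds to 5 decimals and ES to 4 decimals, matching Table 14. Kupiec p and Christ. p are the p-values of the Kupiec unconditional coverage test and the Christoffersen independence test, respectively. Clust. denotes the number of violation clusters (runs of consecutive VaR violations). “-” indicates that the Christoffersen independence test is not applicable or not well-defined because too few violations occur on consecutive days (Clust. = 0) in the evaluation sample.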
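The backtesting statistics in Table 10 can be computed from realized returns and VaR forecasts. The sketch below assumes VaR is stated as a return quantile, a violation is a day with a return below VaR, and a cluster is a run of at least two consecutive violations (consistent with Clust. = 0 for models that have several violations); all function names are hypothetical. As a check, the Kupiec test with 17 violations in 438 test days (3.88%, where 438 = 2 × 219 from Table 14) gives p ≈ 0.264, in line with the 26.43% reported for SA-HAR-J-Net. For the independence test, this sketch applies the 0·log 0 = 0 convention, so it returns a finite value even when no two violations are consecutive, a case the table marks with “-”.

```python
import math

import numpy as np

# Illustrative sketch; all function names below are hypothetical.


def pinball_loss(returns, var, alpha=0.05):
    """Average pinball (quantile) loss of VaR forecasts stated as return quantiles."""
    u = np.asarray(returns, dtype=float) - np.asarray(var, dtype=float)
    # alpha * u when u >= 0 and (alpha - 1) * u when u < 0
    return float(np.mean(np.maximum(alpha * u, (alpha - 1.0) * u)))


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _chi2_1_sf(lr):
    """Chi-square(1) survival function: P(X > lr) = erfc(sqrt(lr / 2))."""
    return math.erfc(math.sqrt(max(lr, 0.0) / 2.0))


def kupiec_pof(n_viol, n_obs, alpha=0.05):
    """Kupiec unconditional coverage LR test; returns (LR, p-value)."""
    pi_hat = n_viol / n_obs
    ll_null = _xlogy(n_obs - n_viol, 1.0 - alpha) + _xlogy(n_viol, alpha)
    ll_alt = _xlogy(n_obs - n_viol, 1.0 - pi_hat) + _xlogy(n_viol, pi_hat)
    lr = -2.0 * (ll_null - ll_alt)
    return lr, _chi2_1_sf(lr)


def christoffersen_ind(hits):
    """Christoffersen independence LR test on a 0/1 violation sequence.

    Returns (LR, p-value), or (nan, nan) when either transition row is empty.
    """
    h = [int(v) for v in hits]
    n = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for prev, cur in zip(h[:-1], h[1:]):
        n[(prev, cur)] += 1  # transition counts n_ij
    n0 = n[(0, 0)] + n[(0, 1)]
    n1 = n[(1, 0)] + n[(1, 1)]
    if n0 == 0 or n1 == 0:
        return math.nan, math.nan
    pi01, pi11 = n[(0, 1)] / n0, n[(1, 1)] / n1
    pi = (n[(0, 1)] + n[(1, 1)]) / (n0 + n1)
    ll_null = _xlogy(n[(0, 0)] + n[(1, 0)], 1.0 - pi) + _xlogy(n[(0, 1)] + n[(1, 1)], pi)
    ll_alt = (_xlogy(n[(0, 0)], 1.0 - pi01) + _xlogy(n[(0, 1)], pi01)
              + _xlogy(n[(1, 0)], 1.0 - pi11) + _xlogy(n[(1, 1)], pi11))
    lr = -2.0 * (ll_null - ll_alt)
    return lr, _chi2_1_sf(lr)


def count_clusters(hits):
    """Number of runs of at least two consecutive violations."""
    clusters, run = 0, 0
    for v in list(hits) + [0]:  # trailing 0 closes a final run
        if v:
            run += 1
        else:
            if run >= 2:
                clusters += 1
            run = 0
    return clusters
```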

Table 11. Diebold–Mariano tests for predictive accuracy (5% VaR, common test period).

Model 1	Model 2	DM Statistic	p-Value
HAR-J	LGBM-Q	−0.2509	0.8020
LGBM-Q	CNN-LSTM-Q	0.7073	0.4797
HAR-J	HS	−0.6669	0.5052
SA-HAR-J-Net	HAR-J	−1.78	0.076
SA-HAR-J-Net	FHS	−0.79	0.431
SA-HAR-J-Net	LGBM-Q	−1.14	0.255
SA-HAR-J-Net	CNN-LSTM-Q	−0.69	0.491
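The Diebold–Mariano comparisons can be sketched as below, assuming the loss differential is d_t = L_{1,t} − L_{2,t} with pinball losses, so that a negative statistic favors Model 1 (consistent with SA-HAR-J-Net, which has the lowest QL in Table 10, showing negative values). The long-run variance uses a rectangular window up to lag h − 1 with no small-sample correction; the exact variant used in the paper is not stated here, and `diebold_mariano` is a hypothetical name.

```python
import math

import numpy as np


# Hypothetical helper (illustrative only): two-sided Diebold-Mariano test.
def diebold_mariano(loss1, loss2, h=1):
    """DM test on d_t = loss1_t - loss2_t; returns (statistic, p-value).

    The long-run variance sums autocovariances up to lag h - 1 (rectangular
    window). A negative statistic means model 1 has lower average loss.
    """
    d = np.asarray(loss1, dtype=float) - np.asarray(loss2, dtype=float)
    T = len(d)
    d_bar = d.mean()
    dc = d - d_bar
    # gamma_k = (1/T) * sum_t dc_t * dc_{t-k}
    gammas = [float(np.dot(dc[k:], dc[:T - k]) / T) for k in range(h)]
    long_run_var = gammas[0] + 2.0 * sum(gammas[1:])
    stat = d_bar / math.sqrt(long_run_var / T)
    # Two-sided p-value under N(0, 1): P(|Z| > |stat|)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

For one-step-ahead VaR forecasts (h = 1), the variance term reduces to the sample variance of d_t.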

Table 12. HAR-J quantile regression coefficients (α = 0.05).

Regressor	Coef.	Std. Err.	t-Stat	p-Value
c_lag₁	−4.3084	3.6385	−1.1841	0.2365
c_roll₅	−22.8610	7.4711	−3.0599	0.0022
c_roll₂₂	−58.6138	15.3361	−3.8219	0.0001
j_lag₁	−31.5226	7.3358	−4.2971	0.0000
j_roll₅	26.4945	30.2555	0.8757	0.3813
j_roll₂₂	178.4416	67.7801	2.6327	0.0085
r_lag₁	0.1777	0.0388	4.5778	0.0000
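The t-Stat and p-Value columns follow from the first two columns: t = Coef./Std. Err., with a two-sided p-value. A quick check, assuming a standard-normal reference distribution (for large samples this is practically identical to Student-t); the recomputed values agree with the table up to rounding, e.g., t ≈ −1.1841 and p ≈ 0.236 for c_lag₁. The helper `t_and_p` is hypothetical.

```python
import math


# Hypothetical helper (illustrative only).
def t_and_p(coef, se):
    """t-statistic and two-sided p-value under a standard-normal reference."""
    t_stat = coef / se
    # P(|Z| > |t|) = erfc(|t| / sqrt(2)) for Z ~ N(0, 1)
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value


# (Coef., Std. Err.) pairs copied from Table 12.
table12 = {
    "c_lag1": (-4.3084, 3.6385),
    "c_roll22": (-58.6138, 15.3361),
    "j_roll22": (178.4416, 67.7801),
}

for name, (coef, se) in table12.items():
    t_stat, p_value = t_and_p(coef, se)
    print(f"{name}: t = {t_stat:.4f}, p = {p_value:.4f}")
```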

Table 13. GARCH(1,1)-t and GJR-GARCH(1,1)-t parameter estimates.

Model	μ	ω	α₁	γ₁	β₁	ν
GARCH(1,1)-t	0.0421	0.0247	0.0639	-	0.9318	6.8667
GJR-GARCH(1,1)-t	0.0393	0.0262	0.0605	0.0098	0.9297	6.8808

Note: μ denotes the conditional mean, ω the variance intercept, α₁ the ARCH term, β₁ the GARCH persistence term, and ν the degrees of freedom of the Student-t innovations. The leverage/asymmetry parameter γ₁ applies only to the GJR-GARCH specification; “-” indicates not applicable.
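The estimates in Table 13 imply highly persistent conditional variance. The sketch computes the persistence α₁ + β₁ (plus γ₁/2 for GJR-GARCH, assuming symmetric standardized innovations such as the Student-t), the implied unconditional variance ω/(1 − persistence), the half-life of a variance shock, and the one-step variance recursion. Function names are hypothetical, and the units of ω and of the implied variance follow the return scaling used in estimation, which this table does not state.

```python
import math


# Hypothetical helpers (illustrative only).
def persistence(alpha1, beta1, gamma1=0.0):
    """alpha1 + beta1 (+ gamma1 / 2, assuming P(z < 0) = 1/2 for symmetric z)."""
    return alpha1 + 0.5 * gamma1 + beta1


def garch_summary(omega, alpha1, beta1, gamma1=0.0):
    """Return (persistence, unconditional variance, half-life in periods)."""
    p = persistence(alpha1, beta1, gamma1)
    uncond_var = omega / (1.0 - p)           # finite only when p < 1
    half_life = math.log(0.5) / math.log(p)  # periods for a variance shock to halve
    return p, uncond_var, half_life


def next_variance(omega, alpha1, beta1, eps, sigma2, gamma1=0.0):
    """One-step GJR-GARCH(1,1) recursion; gamma1 = 0 gives GARCH(1,1).

    eps is the demeaned return r_t - mu; gamma1 * eps^2 is added only when eps < 0.
    """
    leverage = gamma1 * eps ** 2 if eps < 0 else 0.0
    return omega + alpha1 * eps ** 2 + leverage + beta1 * sigma2


print(garch_summary(0.0247, 0.0639, 0.9318))          # GARCH(1,1)-t
print(garch_summary(0.0262, 0.0605, 0.9297, 0.0098))  # GJR-GARCH(1,1)-t
```

For the GARCH(1,1)-t estimates, the persistence is 0.9957, which implies a half-life of roughly 161 periods (trading days, if the model is estimated on daily returns).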

Table 14. Subsample robustness in the common test period (early half vs. late half; 219 days each).

Model	Viol (E)	QL (E)	ES (E)	Viol (L)	QL (L)	ES (L)
SA-HAR-J-Net	4.57%	0.00178	−0.0275	3.20%	0.00216	−0.0470
HS	7.76%	0.00219	−0.0355	4.57%	0.00231	−0.0456
HAR-J	2.28%	0.00209	−0.0465	2.28%	0.00215	−0.0579
LGBM-Q	5.94%	0.00206	−0.0345	5.94%	0.00225	−0.0401
CNN-LSTM-Q	5.48%	0.00202	−0.0370	5.02%	0.00219	−0.0425
GARCH	0.91%	0.00243	−0.0794	0.46%	0.00268	−0.1303
GJR-GARCH	0.91%	0.00241	−0.0794	0.91%	0.00264	−0.0869
FHS	1.37%	0.00190	−0.0337	2.74%	0.00218	−0.0492

Note: Viol (E/L), QL (E/L), and ES (E/L) denote early/late-half metrics, respectively. Viol is reported as a percentage, QL to 5 decimals, and ES to 4 decimals.
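The full-sample figures in Table 10 can be recovered from the two halves, which provides a consistency check between the tables. A sketch, assuming that each half's violation rate is a count divided by 219, that QL is a per-day average (so the full-sample QL is the mean of the two halves), and that ES is an average over violation days (so it combines as a violation-weighted mean). Under these assumptions, the SA-HAR-J-Net halves (10 and 7 violations) reproduce the 3.88% violation rate, 0.197% QL, and −3.55% ES reported in Table 10. The function `combine_halves` is hypothetical.

```python
import math

N_HALF = 219  # days per half, from the Table 14 caption


# Hypothetical helper (illustrative only).
def combine_halves(viol_e, viol_l, ql_e, ql_l, es_e, es_l, n_half=N_HALF):
    """Rebuild full-sample (violations, rate, QL, ES) from equal-length halves."""
    k_e, k_l = round(viol_e * n_half), round(viol_l * n_half)  # violation counts
    rate = (k_e + k_l) / (2 * n_half)
    ql = (ql_e + ql_l) / 2.0  # equal halves, so the plain mean is exact
    total = k_e + k_l
    # ES averages over violation days, so weight each half by its count.
    es = (k_e * es_e + k_l * es_l) / total if total else math.nan
    return k_e, k_l, rate, ql, es


# SA-HAR-J-Net row of Table 14
print(combine_halves(0.0457, 0.0320, 0.00178, 0.00216, -0.0275, -0.0470))
```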

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Zhang, K.; Wu, S.; Zhu, D. Modeling Time-Varying Volatility via Multi-Scale Structures and Dynamic Attention Networks: Evidence from High-Frequency Data. Mathematics 2026, 14, 1257. https://doi.org/10.3390/math14081257


