Article

DFC-LSTM: A Novel LSTM Architecture Integrating Dynamic Fractal Gating and Chaotic Activation for Value-at-Risk Forecasting

by Yilong Zeng 1, Boyan Tang 1,2, Zhefang Zhou 1,3,* and Raymond S. T. Lee 1,3,*
1 Faculty of Science and Technology, Beijing Normal-Hong Kong Baptist University, Zhuhai 519000, China
2 The Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518000, China
3 Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, Beijing Normal-Hong Kong Baptist University, Zhuhai 519087, China
* Authors to whom correspondence should be addressed.
Mathematics 2025, 13(22), 3587; https://doi.org/10.3390/math13223587
Submission received: 9 October 2025 / Revised: 4 November 2025 / Accepted: 6 November 2025 / Published: 8 November 2025
(This article belongs to the Section E5: Financial Mathematics)

Abstract

Accurate Value-at-Risk (VaR) forecasting is challenged by the non-stationary, fractal, and chaotic dynamics of financial markets. Standard deep learning models like LSTMs often rely on static internal mechanisms that fail to adapt to shifting market complexities. To address these limitations, we propose a novel architecture: the Dynamic Fractal–Chaotic LSTM (DFC-LSTM). This model incorporates two synergistic innovations: a multifractal-driven dynamic forget gate that utilizes the multifractal spectrum width ( Δ α ) to adaptively regulate memory retention, and a chaotic oscillator-based dynamic activation that replaces the standard tanh function with the peak response of a Lee Oscillator’s trajectory. We evaluate the DFC-LSTM for one-day-ahead 95% VaR forecasting on S&P 500 and AAPL stock data, comparing it against a suite of state-of-the-art benchmarks. The DFC-LSTM consistently demonstrates superior statistical calibration, passing coverage tests with significantly higher p-values—particularly on the volatile AAPL dataset, where several benchmarks fail—while maintaining competitive economic loss scores. These results validate that embedding the intrinsic dynamical principles of financial markets into neural architectures leads to more accurate and reliable risk forecasts.

1. Introduction

In modern financial risk management, accurate forecasting of asset return volatility is essential. Volatility serves as a fundamental risk measure that informs investment decisions and constitutes the primary input for risk assessment frameworks such as Value-at-Risk (VaR). As a widely adopted metric for market risk, VaR is extensively used by financial institutions for regulatory capital requirements and internal risk management [1]. Consequently, producing precise and adaptive VaR forecasts is critical for effectively navigating the complexities and turbulence of financial markets.
Throughout the history of VaR estimation, various methods have been developed, with the parametric approach being one of the primary paradigms. This method evaluates risk by analyzing the return distribution of an asset over a look-back period and estimating its volatility and expected return. Within this framework, accurate modeling of volatility has become crucial, leading to the development of many sophisticated volatility models, among which the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) family is the most prominent [2]. Efforts to improve predictive accuracy have also explored the use of more complex probability density functions and the incorporation of time-varying higher-order conditional moments. While GARCH models are highly regarded for their statistical rigor and ability to capture ‘stylized facts’ such as volatility clustering, their foundation relies on strict statistical assumptions—for example, specific error distributions. This structural rigidity often results in suboptimal performance amid the profound non-linearity and dynamic complexity of financial markets, particularly during periods of extreme stress [3].
To overcome the limitations of traditional models, both academia and industry have increasingly turned to deep learning (DL), which offers fewer modeling constraints and enhanced feature extraction capabilities [4,5]. Given the pervasive long-memory property in financial time series [6], Recurrent Neural Networks (RNNs), capable of retaining historical information, are considered more appropriate for processing such data than Feed-forward Neural Networks (FNNs) [7]. However, conventional RNNs encounter difficulties with vanishing or exploding gradients during the training of long sequences [8]. The Long Short-Term Memory (LSTM) network, introduced by Hochreiter & Schmidhuber [9], addresses these issues through its gating mechanisms, enabling the model to capture features over extended time horizons. Consequently, LSTMs are theoretically well-suited for volatility forecasting tasks [5].
Building on this foundation, researchers have pursued various approaches to further enhance LSTM performance in financial forecasting. A prominent trend involves developing hybrid models that combine the statistical insights of GARCH models with the sequential learning strengths of LSTMs. Early studies, such as those by Kim & Won [4] and Hu et al. [10], demonstrated the effectiveness of this approach by feeding GARCH-derived predictions as external features into LSTM networks. More recent advancements have integrated these methodologies more deeply; for instance, Zhao et al. [11] proposed re-engineering and embedding the mathematical structure of GARCH models directly within the LSTM cell, which not only improved the model’s interpretability in a financial context but also yielded superior predictive performance over traditional econometric models. Simultaneously, another avenue of research enhances model capabilities by incorporating external information, notably utilizing Natural Language Processing (NLP) techniques to extract market sentiment from textual data as an additional input [12]. Since these hybrid and information-enhanced methods aim to improve volatility forecast accuracy, they have naturally been extended to the more complex task of VaR estimation, with promising results [3].
Despite notable advances in fusion strategies, current state-of-the-art forecasting models still encounter two fundamental bottlenecks rooted in the intrinsic properties of financial data. The first pertains to input feature fidelity: financial time series are inherently highly non-stationary [13], with their mean and variance evolving dynamically over time. Deeper still, this non-stationarity often manifests as intricate fractal structures, where the complex temporal correlations give rise to multifractal characteristics—beyond mere fat-tail distributions [14]. Multifractal analysis, a powerful tool in nonlinear dynamics, can reveal the “bursty” and heterogeneous nature of market fluctuations at fine scales [15]. However, a persistent technical challenge in standard multifractal analysis—the use of non-overlapping segmentation—can introduce spurious fluctuations that undermine the stability of fractal measurements [16]. Nonetheless, its core metric, the multifractal spectrum width ( Δ α ), remains an effective indicator of the degree of inhomogeneity and local complexity in market dynamics. Recent studies have demonstrated that incorporating such refined fractal features into machine learning models can substantially improve prediction accuracy [17]. This underscores that, without a robust fractal perspective, even advanced features derived from GARCH models or sentiment analysis may only offer a partial picture, failing to fully capture the deep, high-fidelity structural state of the market.
Beyond this input-level challenge lies a more fundamental bottleneck: the responsiveness of the model architecture itself. Most existing deep learning models rely on static gating mechanisms and simple activation functions (e.g., ReLU or tanh), which reveal inherent limitations when processing highly dynamic and non-stationary financial signals [8]. This mismatch stems from a disparity between the complexity of the signals—characterized by chaos, nonlinearity, and rapid changes—and the fixed logic embedded within such models. While the nature of financial chaos remains an active research area, its significance in describing nonlinear market behaviors is widely recognized [18]. Static gates, for instance, cannot adaptively modify their memory management strategies in response to fluctuations’ intensity—which can be quantified by the multifractal spectrum width—leading to suboptimal information retention. Additionally, conventional activation functions like tanh are insufficient for capturing the rich, chaotic nonlinearities inherent in financial data, impeding the model’s ability to characterize intrinsic complexity and abrupt transitions. This mechanistic rigidity means that, even with perfect input features, the internal processing may fail to respond appropriately to market dynamics, resulting in information decay during transmission [19]. Therefore, developing an adaptive architecture capable of addressing both feature fidelity and responsiveness—by dynamically adjusting to market complexity—is a key challenge for advancing financial forecasting models.
To systematically address the challenges outlined above, this paper introduces the Dynamic Fractal–Chaotic Long Short-Term Memory (DFC-LSTM) network. The central idea is to embed the intrinsic dynamical principles of financial markets—namely, fractal geometry and deterministic chaos—directly into the core computational units of the network, thereby simultaneously tackling both the responsiveness and fidelity bottlenecks. The proposed architecture incorporates two synergistic innovations. First, the traditional static forget gate is replaced by a multifractal-driven dynamic forget gate, which leverages the multifractal spectrum width ( Δ α ) to adaptively regulate memory retention in real time, according to the evolving complexity of the market. Second, the conventional tanh activation function is substituted with a chaotic oscillator-based dynamic activation, which evolves along a complex nonlinear trajectory; its peak response is used as the activation output, addressing the complexity mismatch inherent in static nonlinearities. Additionally, by using the multifractal spectrum width—an effective high-fidelity measure of the market state—as the direct input to modulate the network’s gating mechanisms, the DFC-LSTM enhances its fidelity to the true internal market structure, ensuring that decision-making is grounded in a more accurate perception of market dynamics.
The main contributions of this paper can be summarized as follows:
  • We introduce the DFC-LSTM, an architecture that embeds the intrinsic dynamical features of financial markets—namely, fractal complexity and chaos—directly into the core mechanisms of the neural network cell.
  • We develop a dynamic forget gate modulated by a robust, real-time fractal indicator derived from an Overlapped Sliding Window Multifractal Detrended Fluctuation Analysis (OSW-MF-DFA), which enables an adaptive memory policy responsive to shifts in market regimes.
  • We are the first to incorporate the maximum trajectory response of a chaotic oscillator as a dynamic activation function within a VaR forecasting LSTM framework, effectively addressing the complexity mismatch challenge in deep learning models.
  • We empirically demonstrate that the DFC-LSTM delivers more accurate and reliable VaR forecasts across datasets with varying volatility characteristics, including a broad market index and a highly volatile individual stock.
The remainder of this paper is organized as follows: Section 2 provides a detailed exposition of the DFC-LSTM methodology. Section 3 describes the data and experimental design. Section 4 presents the empirical results, while Section 5 discusses their implications. Finally, Section 6 offers concluding remarks and outlines potential avenues for future research.

2. Materials and Methods

This section is dedicated to the detailed exposition of the novel components and the final architecture of our proposed model. We begin by introducing the optimized multifractal analysis used to quantify market complexity. We then describe the chaotic oscillator that serves as a dynamic activation system. Finally, we present the complete Dynamic Fractal–Chaotic Long Short-Term Memory (DFC-LSTM) architecture, illustrating how the preceding components are integrated to create a financially-aware, adaptive neural network.

2.1. Multifractal Analysis for Market Complexity Quantification

To obtain a high-fidelity measure of market complexity, we require a method capable of capturing the nonlinear, scale-dependent fluctuations inherent in financial time series. The Multifractal Detrended Fluctuation Analysis (MF-DFA) is particularly well-suited for this task as it is designed for non-stationary data [17]. As a robust and widely-used technique from econophysics, MF-DFA characterizes the scaling properties of a time series by examining its fluctuation at different scales [20]. However, the standard MF-DFA procedure, which partitions the time series into non-overlapping segments, can be prone to spurious fluctuations arising from discontinuities at the segment boundaries [16]. To enhance the robustness and reliability of the analysis, this study adopts an optimized variant that utilizes an overlapping smoothing window, a method often referred to as Overlapped Sliding Window MF-DFA (OSW-MF-DFA). This optimization, which has been verified and successfully applied by numerous scholars to generate more stable multifractal measurements, forms the basis of our approach [21,22].
In our study, the OSW-MF-DFA is implemented from scratch and applied over a rolling window of the Realized Volatility series. A window length of T = 252 days was chosen for this rolling calculation, corresponding to approximately one trading year, which is a standard choice in financial time series analysis for capturing annual patterns and ensuring a stable estimate of the evolving fractal structure. The OSW-MF-DFA procedure consists of the following steps:
  • Profile Construction: For a given time series x_k of length N, its profile Y(i) is first created by subtracting the series mean x̄ and then computing the cumulative sum:
    Y(i) = \sum_{k=1}^{i} (x_k - \bar{x}), \quad i = 1, \ldots, N
  • Overlapped Segmentation: The profile Y(i) is partitioned using a sliding window of length s. To ensure a more robust averaging, we employ an overlapping strategy. For each scale s, an overlap length l is defined based on a fixed overlap ratio of 1/3, such that l = s/3. This specific ratio has been demonstrated in the literature to offer a robust balance between enhancing measurement stability and maintaining computational efficiency [21,22]. The window then slides forward along the profile with a step size of step = s − l. This process generates a larger set of N_s = (N − s)/step + 1 overlapping segments, providing a more comprehensive analysis of the data at each scale.
  • Local Detrending: For each segment v ∈ {1, …, N_s}, the local trend is removed by a polynomial fit. A least-squares polynomial, y_v(i), is calculated for the data in each segment. In this study, we use a first-order polynomial (m = 1), which corresponds to linear detrending. Linear detrending is the standard approach for financial data, effectively removing local trends without the risk of overfitting to noise sometimes associated with higher-order polynomials [23].
  • Variance Calculation: The variance (mean square error) for each detrended segment is calculated as:
    F^2(s, v) = \frac{1}{s} \sum_{i=1}^{s} \left[ Y\big((v-1) \cdot \mathrm{step} + i\big) - y_v(i) \right]^2
  • q-th Order Fluctuation Function: The q-th order fluctuation function, F_q(s), is obtained by averaging the variances over all segments. Our implementation uses a range of moments q ∈ [−5, 5]; this range is standard in multifractal analysis, as it is generally sufficient to capture the scaling behavior associated with both small (q < 0) and large (q > 0) fluctuations [20,23]. F_q(s) is defined as:
    F_q(s) = \begin{cases} \left[ \frac{1}{N_s} \sum_{v=1}^{N_s} \big( F^2(s, v) \big)^{q/2} \right]^{1/q} & \text{if } q \neq 0 \\[4pt] \exp\left[ \frac{1}{2 N_s} \sum_{v=1}^{N_s} \ln\big( F^2(s, v) \big) \right] & \text{if } q = 0 \end{cases}
  • Scaling Exponent Estimation: For a time series with fractal properties, F_q(s) exhibits a power-law relationship with the scale s, such that F_q(s) \sim s^{h(q)}. The generalized Hurst exponent, h(q), is estimated as the slope of a linear regression on the log-log plot of F_q(s) versus s.
  • Multifractal Spectrum Derivation: The final step is to derive the multifractal spectrum f(α) from the generalized Hurst exponents h(q). This is achieved through a Legendre transform of the mass exponent τ(q):
    \tau(q) = q \, h(q) - 1
    \alpha(q) = \frac{d\tau(q)}{dq}
    f(\alpha) = q \, \alpha(q) - \tau(q)
    In our implementation, the derivative is computed numerically. The width of the singularity spectrum, Δα = α_max − α_min, is then calculated. A larger Δα indicates a higher degree of multifractality. This Δα value serves as the high-fidelity indicator of the market's structural state, which is used as a dynamic driver for our proposed DFC-LSTM model.
While a comprehensive sensitivity analysis varying each parameter is beyond the scope of this study, the selected configuration represents a standard, robust, and computationally reasonable approach well-supported by existing research in the field.
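To make the procedure above concrete, the following NumPy sketch implements the full OSW-MF-DFA pipeline under the stated configuration (1/3 overlap ratio, linear detrending, q ∈ [−5, 5]). The function name, the scale grid, and the granularity of the q-grid are illustrative choices rather than the exact values of our implementation.

```python
import numpy as np

def osw_mf_dfa(x, scales, q_list=np.arange(-5.0, 5.01, 0.5), m=1):
    """Sketch of OSW-MF-DFA; returns the spectrum width (delta alpha)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    Y = np.cumsum(x - x.mean())                      # 1. profile construction

    log_F = np.zeros((len(q_list), len(scales)))
    for j, s in enumerate(scales):
        step = s - s // 3                            # 2. overlap l = s/3
        starts = np.arange(0, N - s + 1, step)
        t = np.arange(s)
        F2 = np.empty(len(starts))
        for v, st in enumerate(starts):
            seg = Y[st:st + s]
            coef = np.polyfit(t, seg, m)             # 3. local linear detrending
            F2[v] = np.mean((seg - np.polyval(coef, t)) ** 2)   # 4. variance
        for i, q in enumerate(q_list):               # 5. fluctuation function
            if abs(q) < 1e-12:                       # q = 0: logarithmic average
                log_F[i, j] = 0.5 * np.mean(np.log(F2))
            else:
                log_F[i, j] = np.log(np.mean(F2 ** (q / 2.0))) / q

    # 6. generalized Hurst exponents: slopes of log F_q(s) versus log s
    h = np.array([np.polyfit(np.log(scales), log_F[i], 1)[0]
                  for i in range(len(q_list))])

    tau = q_list * h - 1                             # 7. mass exponent
    alpha = np.gradient(tau, q_list)                 # numerical Legendre transform
    return alpha.max() - alpha.min()                 # spectrum width delta alpha
```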

2.2. Chaotic Oscillator as a Dynamic Activation System

To address the model responsiveness bottleneck and the associated “complexity mismatch”, we replace the standard static activation function (e.g., tanh) with a dynamic system. The decision to use an oscillator represents a paradigm shift grounded in neuroscience, which reveals that the brain operates not on simple feed-forward firing, but on a substrate of continuous and chaotic oscillations [24]. Financial markets, as complex systems driven by collective human behavior, may be more faithfully modeled by such dynamic, oscillatory principles. After all, market price fluctuations are the emergent result of this collective behavior—the aggregation of countless individual decisions to buy and sell, which creates complex supply and demand dynamics. Instead of a simple static mapping like tanh, the oscillator provides a dynamic processing unit. Its chaotic, non-periodic, and bounded trajectory is theoretically a better match for encoding the complex, ‘bursty’ nature of these incoming financial signals.

2.2.1. The Lee Oscillator Model

We employ the Lee Oscillator, a form of Chaotic Oscillatory Cell (COC) known for its ability to generate complex, non-periodic, and bounded dynamics that emulate biologically plausible neural activity [25]. Specifically, this study uses an advanced variant, the Lee Oscillator with Retrograde Signaling (LORS), which enhances the original design by incorporating retrograde signaling mechanisms observed in neuroscience [26,27], boosting its biological plausibility. The neural architecture of the LORS is depicted in Figure 1. Its dynamic behavior is governed by the following set of equations:
f(\mu; x) = \tanh(\mu x)
S_t = i + e \cdot \tanh(i)
E_{t+1} = f(a_1 \mathrm{LORS}_t + a_2 E_t - a_3 I_t + a_4 S_t - \xi_E)
I_{t+1} = f(b_1 \mathrm{LORS}_t - b_2 E_t - b_3 I_t + b_4 S_t - \xi_I)
\Omega_{t+1} = f(S_t)
\mathrm{LORS}_{t+1} = [E_{t+1} - I_{t+1}] \cdot e^{-k S_t^2} + \Omega_{t+1}
where E_t and I_t are the states of the excitatory and inhibitory neurons. The term i represents the base input stimulus (i.e., the pre-activation value from the neural network layer), which is modulated into the external stimulus S_t. The parameters a_j and b_j are weights governing the internal connections, and ξ_E and ξ_I are the corresponding thresholds. The term LORS_t is the oscillator's final output at time step t. Crucially, the exponential term e^{−k S_t²} in Equation (12) acts as a decay operator; it dampens the contribution of the internal state difference ([E_{t+1} − I_{t+1}]) when the input stimulus S_t becomes large, which helps to stabilize the oscillator dynamics and prevent runaway activation.
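To make the iteration explicit, a minimal NumPy sketch of a single LORS trajectory is given below. The parameter defaults (mu, e, k, and the number of internal steps) are placeholders for illustration; the actual T1–T10 weight and threshold settings are those listed in Table 1.

```python
import numpy as np

def lors_trajectory(i, a, b, xi_e, xi_i, mu=1.0, e=0.5, k=50.0, n_steps=100):
    """Iterate the LORS equations for a scalar stimulus i and return the
    output trajectory. Defaults are placeholders, not the tuned settings."""
    f = lambda u: np.tanh(mu * u)
    S = i + e * np.tanh(i)              # external stimulus from base input i
    E = I_ = lors = 0.0
    traj = []
    for _ in range(n_steps):
        E_next = f(a[0] * lors + a[1] * E - a[2] * I_ + a[3] * S - xi_e)
        I_next = f(b[0] * lors - b[1] * E - b[2] * I_ + b[3] * S - xi_i)
        omega = f(S)
        # decay operator e^{-k S^2} damps (E - I) for large stimuli
        lors = (E_next - I_next) * np.exp(-k * S ** 2) + omega
        E, I_ = E_next, I_next
        traj.append(lors)
    return np.array(traj)
```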

2.2.2. Parameterization and Diverse Dynamics

A key aspect of our methodology is the use of a diverse set of ten parameterized Lee oscillators. The initial eight types (T1–T8) were rigorously derived from systematic studies of the oscillator’s bifurcation behavior and exhibit a broad range of dynamics, from simple bifurcations to dense chaos [28]. To better capture the specific characteristics of financial markets, we extend this set with two additional configurations, T9 and T10, which were empirically designed to model phenomena such as abrupt regime switching and persistent high uncertainty. The specific parameter settings for each of the 10 types are detailed in Table 1. Their corresponding bifurcation diagrams are illustrated in Figure 2. These diagrams plot the steady-state oscillator output ( LORS t ) on the y-axis against the input stimulus (i) on the x-axis, visually confirming the rich variety of dynamic behaviors—ranging from simple fixed points and periodic oscillations to complex chaotic regimes—generated by these parameter sets.

2.2.3. Activation Derivation via Max-over-Time Pooling

The output of the Lee Oscillator is a temporal trajectory, which is architecturally incompatible with standard neural network layers that expect a scalar activation. To distill this dynamic information into a practical scalar value, we employ a Max-over-Time (MoT) pooling strategy. For a given pre-activation value x and a selected oscillator type (treated as a model hyperparameter), the oscillator's internal states evolve over a predefined number of steps, N_hist. The MoT mechanism then selects the single maximum response value from this entire internal evolution. The final activation g(x) is thus defined as:
g(x) = \max_{t \in [1, N_{hist}]} \{ \mathrm{LORS}_t(x) \}
This process effectively transforms each parameterized oscillator into a unique, static, yet highly nonlinear “meta-activation function”. This mechanism ensures that the activation process is no longer a simple mapping but a dynamic history-dependent computation that can produce a far richer set of responses compared to a simple ‘tanh’ function. To maintain computational efficiency during training, we utilize a pre-computation strategy where the input-output mapping of each meta-activation function is stored in a lookup table.
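The sketch below illustrates how such a lookup table could be pre-computed, reusing the lors_trajectory function from the previous sketch. The grid bounds, resolution, and oscillator parameters shown here are illustrative assumptions, not the values of our implementation.

```python
import numpy as np

# Pre-compute the meta-activation g(x) = max_t LORS_t(x) on a grid of
# pre-activation values, then serve activations by interpolation.
grid = np.linspace(-5.0, 5.0, 2001)
a, b = [5.0, 5.0, 5.0, 5.0], [1.0, 1.0, 1.0, 1.0]   # placeholder oscillator weights
table = np.array([lors_trajectory(x, a, b, xi_e=0.0, xi_i=0.0).max()
                  for x in grid])                    # Max-over-Time pooling

def g_coc(x):
    """Lookup-table meta-activation via linear interpolation on the grid."""
    return np.interp(x, grid, table)
```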
A critical challenge in training a network with the g_coc activation is twofold. First, the Max-over-Time (MoT) pooling operation (Equation (13)) is non-differentiable. Second, the standard Delta Rule for backpropagation [29], which relies on the derivative d_I = 1 − tanh(I)² (where I is the pre-activation input), is ill-suited for this architecture. In the complex chaotic regions, the traditional ‘tanh’ function is insufficient to describe the non-linear dynamics. Furthermore, as noted in the Chaotic Oscillatory Neural Network (CONN) literature [27], the system is highly sensitive to the input I, and the derivative d_I rapidly approaches zero as the input magnitude increases, leading to the well-known vanishing gradient problem.
To overcome these issues, we employ a surrogate gradient strategy. This approach bypasses both the non-differentiable MoT and the unstable input I by computing a proxy gradient based only on the stable output z = g_coc(x). The mathematical rationale is as follows: we treat the final output z as if it were the result of a simple hyperbolic tangent function, z ≈ tanh(I). This implies an inverse relationship using the inverse hyperbolic tangent (atanh), I ≈ atanh(z). By substituting this approximation of I back into the original derivative formula, we derive the surrogate gradient d_surrogate:
d_{\mathrm{surrogate}} = 1 - \tanh\big(\operatorname{atanh}(z)\big)^2 = 1 - z^2
This 1 − z² approximation is equivalent to indirectly estimating the derivative of the activation function using the square of the output z. This method avoids the direct computation of the unstable input I from the chaotic region, which significantly improves numerical stability and greatly alleviates the vanishing gradient problem. This strategy is implemented using a custom function defining separate forward and backward computational paths: the forward pass uses the efficient, pre-computed lookup table to find the true z, while the backward pass applies the 1 − z² surrogate gradient to the incoming error signal. This ensures that the network can be trained efficiently. Our internal comparisons confirmed that this method provides superior performance and stability compared to both direct automatic differentiation (which is computationally explosive) and simple linear interpolation of the lookup table's gradient.
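In PyTorch, this forward/backward split maps naturally onto a custom autograd Function. The following sketch is one plausible realization, with tensor-based linear interpolation standing in for our lookup implementation; the class name and interpolation details are illustrative.

```python
import torch

class ChaoticActivation(torch.autograd.Function):
    """Forward: look up the pre-computed g_coc value on a grid.
    Backward: apply the 1 - z^2 surrogate gradient to the incoming error."""

    @staticmethod
    def forward(ctx, x, grid, table):
        # linear interpolation over the pre-computed (grid, table) pair
        idx = torch.bucketize(x, grid).clamp(1, len(grid) - 1)
        x0, x1 = grid[idx - 1], grid[idx]
        y0, y1 = table[idx - 1], table[idx]
        z = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        ctx.save_for_backward(z)
        return z

    @staticmethod
    def backward(ctx, grad_out):
        (z,) = ctx.saved_tensors
        return grad_out * (1.0 - z ** 2), None, None   # surrogate 1 - z^2

# usage: z = ChaoticActivation.apply(pre_act, grid_t, table_t), where
# grid_t and table_t are torch tensors built from the lookup table
```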

2.3. The DFC-LSTM Architecture

The proposed Dynamic Fractal–Chaotic LSTM (DFC-LSTM) integrates principles from multifractal analysis and chaos theory directly into the cell structure of a standard LSTM network. This approach is inspired by recent works that have successfully modified the internal gates of recurrent units to adapt to the non-stationary characteristics of time series, such as the Multifractal Gated Recurrent Unit (MF-GRU) [13]. While the MF-GRU adapts the GRU’s update gate, our DFC-LSTM introduces two distinct innovations to the LSTM architecture: one modifying the forget gate and another replacing the candidate memory state activation.
The overall architecture of the DFC-LSTM cell at time step t is visualized in Figure 3.
At each time step t, the DFC-LSTM cell takes the previous hidden state h_{t−1}, the previous cell state C_{t−1}, and the current input vector x_t to compute the new hidden state h_t and cell state C_t. The input vector x_t is composed of the primary time series features and the corresponding multifractal spectrum width Δα_t. Let the combined input for the gates be combined_t = [x_t, h_{t−1}]. The cell's dynamics are governed by the following equations:
e_t = \sigma(\Delta\alpha_t)
f_t = \sigma\big( (W_{f1} \cdot \mathrm{combined}_t + b_{f1}) \cdot e_t + (W_{f2} \cdot \mathrm{combined}_t + b_{f2}) \cdot (1 - e_t) \big)
i_t = \sigma(W_i \cdot \mathrm{combined}_t + b_i)
\tilde{C}_t = g_{coc}(W_C \cdot \mathrm{combined}_t + b_C)
o_t = \sigma(W_o \cdot \mathrm{combined}_t + b_o)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
h_t = o_t \odot \tanh(C_t)
where ⊙ denotes element-wise multiplication, σ is the sigmoid function, and W and b are learnable weights and biases. Equation (16) defines the Multifractal-driven Dynamic Forget Gate. Here, the normalized complexity measure e_t acts as a dynamic interpolator between two "expert" weight networks, (W_{f1}, b_{f1}) and (W_{f2}, b_{f2}). This structure allows the gate's sensitivity to past information to change based on the current market complexity. We chose this pre-activation interpolation design—where the weighted sum from the two expert networks is combined before the sigmoid function—over a post-activation interpolation (blending the outputs of two separate sigmoid gates) because it allows for a more expressive non-linear interaction between the experts and is computationally slightly more efficient, requiring only one sigmoid calculation.
A critical design choice in our architecture is the selective replacement of the activation functions. A standard LSTM cell utilizes the ‘tanh’ function in two distinct roles. The first role is in the generation of the candidate memory state (C̃_t), where it processes the current inputs to formulate "new" information. It is precisely this ‘tanh’ that we replace with the Chaotic Oscillator Activation (g_coc(·)), as defined in Equation (18) and detailed in Section 2.2. The rationale is to use a complex, dynamic system to interpret and encode the chaotic nature of the incoming financial signal.
The second role of ‘tanh’ in a standard LSTM is to squash the updated cell state C_t before the final output, as shown in Equation (21). The purpose of this function is purely mathematical: to bound the cell state and ensure the numerical stability of the hidden state h_t that is passed to the next time step. In line with standard deep learning practice, we preserve this standard ‘tanh’ function for its role as a robust output regularizer, rather than replacing it with a complex oscillator, which could introduce unnecessary dynamics and potentially destabilize the training process. This selective modification allows our DFC-LSTM to harness the expressive power of chaotic dynamics for information processing while maintaining the stability of its internal state updates.
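Putting the pieces together, the following PyTorch sketch shows one way a single DFC-LSTM step (Equations (15)–(21)) could be realized. The class and argument names are ours for illustration, and g_coc is any callable implementing the chaotic meta-activation (e.g., a wrapper around ChaoticActivation.apply from the earlier sketch).

```python
import torch
import torch.nn as nn

class DFCLSTMCell(nn.Module):
    """Sketch of one DFC-LSTM step; not the authors' exact implementation."""

    def __init__(self, input_size, hidden_size, g_coc):
        super().__init__()
        d = input_size + hidden_size
        self.Wf1 = nn.Linear(d, hidden_size)   # forget-gate "expert" 1
        self.Wf2 = nn.Linear(d, hidden_size)   # forget-gate "expert" 2
        self.Wi = nn.Linear(d, hidden_size)    # input gate
        self.Wc = nn.Linear(d, hidden_size)    # candidate memory
        self.Wo = nn.Linear(d, hidden_size)    # output gate
        self.g_coc = g_coc

    def forward(self, x_t, delta_alpha_t, h_prev, c_prev):
        combined = torch.cat([x_t, h_prev], dim=-1)
        e_t = torch.sigmoid(delta_alpha_t)              # complexity interpolator
        # pre-activation interpolation between the experts, then one sigmoid
        f_t = torch.sigmoid(self.Wf1(combined) * e_t
                            + self.Wf2(combined) * (1.0 - e_t))
        i_t = torch.sigmoid(self.Wi(combined))
        c_tilde = self.g_coc(self.Wc(combined))         # chaotic activation
        o_t = torch.sigmoid(self.Wo(combined))
        c_t = f_t * c_prev + i_t * c_tilde
        h_t = o_t * torch.tanh(c_t)                     # standard output squash
        return h_t, c_t
```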

3. Data and Experimental Design

This section details the datasets, benchmark models, and evaluation framework used to assess the performance of the proposed DFC-LSTM model.

3.1. Data Description

Our empirical analysis is conducted on two distinct and highly liquid financial time series: the S&P 500 index (SPY) and Apple Inc. (Cupertino, CA, USA) stock (AAPL), providing a comprehensive testbed that covers both a broad market index and a major individual equity. The daily data for both assets are obtained from the Wind Economics Database of China. To ensure a rigorous and consistent comparative analysis, the study period is synchronized to the common overlapping interval of both datasets, spanning from 7 October 2011 to 7 February 2025. For each asset within this unified period, we construct three key time series for our models: the daily log-return ( r t ), the Realized Volatility (RV), and the multifractal spectrum width ( Δ α ).
The daily log-return ( r t ), expressed in percentage points, is calculated from the daily closing prices P t as:
r_t = 100 \times \big( \ln(P_t) - \ln(P_{t-1}) \big)
We employ the daily RV as the proxy for the true latent volatility, as it provides a nearly unbiased and efficient measure by leveraging high-frequency data [6]. For a given trading day t with M intraday 5 min returns, RV is defined as the sum of squared intraday returns:
RV_t = \sum_{j=1}^{M} r_{t,j}^2
where r_{t,j} is the j-th 5-min intraday log-return on day t. The multifractal spectrum width, Δα, is subsequently computed using the Overlapped Sliding Window Multifractal Detrended Fluctuation Analysis (OSW-MF-DFA), as detailed in Section 2.1, with a rolling window of T = 252 trading days applied to the RV series. After merging and cleaning the data, the final dataset for the S&P 500 contains 3603 daily observations, and the dataset for AAPL contains 3354 daily observations.
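As a hypothetical illustration, the three series could be assembled as follows, where daily and intraday are placeholder DataFrames standing in for the Wind extracts, osw_mf_dfa is the sketch from Section 2.1, and the column names and scale grid are assumptions:

```python
import numpy as np
import pandas as pd

# daily log-return in percentage points
r_t = 100 * np.log(daily['close']).diff()

# realized volatility: sum of squared 5-min intraday log-returns per day
rv_t = intraday.groupby(intraday.index.date)['ret_5min'].apply(
    lambda r: (r ** 2).sum())

# rolling 252-day multifractal spectrum width over the RV series
delta_alpha = pd.Series(
    [osw_mf_dfa(rv_t.values[i - 252:i], scales=np.arange(10, 64))
     for i in range(252, len(rv_t) + 1)],
    index=rv_t.index[251:])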
Figure 4 provides a visual comparison of these three time series for both the S&P 500 and AAPL. The plots for the log-returns ( r t ) in both figures clearly exhibit volatility clustering, a well-known stylized fact where periods of high volatility are followed by periods of high volatility, and vice-versa. This is particularly evident around major crisis events such as the COVID-19 market crash in early 2020. The RV plots further confirm this persistence in volatility regimes. Most importantly, the plots for the multifractal spectrum width ( Δ α ) demonstrate its effectiveness as a real-time indicator of market complexity. The Δ α series displays significant temporal variation and exhibits pronounced spikes that coincide with periods of high market turmoil, visually justifying its use as a dynamic driver for the model’s forget gate.
To provide quantitative evidence for these visual observations, Table 2 summarizes the descriptive statistics and results of key diagnostic tests for the key series. The statistics reveal several crucial properties. First, the high kurtosis values and the highly significant Jarque-Bera (JB) test statistics (p-values ≈ 0) for all series confirm the well-documented non-normality and fat-tailed nature of financial data. Second, the significant Augmented Dickey-Fuller (ADF) test results and non-significant Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test results provide strong evidence that the series are stationary, making them suitable for time series modeling. Most importantly, the extremely small p-values of the ARCH-LM test for all series indicate a strong presence of conditional heteroscedasticity (ARCH effects). This quantitatively confirms the volatility clustering observed in the figures and provides a compelling justification for the use of volatility-aware models, such as GARCH and our proposed DFC-LSTM, over simpler linear models.

3.2. Benchmark Models

To rigorously evaluate the performance of our proposed DFC-LSTM model, we select a set of contemporary, high-performing deep learning models as benchmarks. The chosen models are drawn from recent, high-impact publications and represent the current state-of-the-art in time series forecasting. This strategy ensures that our DFC-LSTM is tested against a challenging and meaningful set of alternatives. The selection is further guided by the successful application of these models to the same datasets used in our study, ensuring their relevance.
  • GARCH-LSTM (Internal): Our first advanced benchmark is the GARCH-LSTM model proposed by Zhao et al. [11]. This model represents a deep fusion of econometric theory and recurrent networks, where the mathematical structure of a GJR-GARCH model is directly embedded into the LSTM cell architecture. In its original publication, this architecture was shown to achieve superior predictive performance over traditional econometric models on stock index data, including the S&P 500.
  • Standard LSTM: A standard LSTM network serves as a crucial baseline for ablation purposes. This model is configured with hyperparameters (e.g., hidden size, layers) identical to our DFC-LSTM. Its inclusion allows for a direct assessment of the performance gains attributable solely to our two proposed innovations: the multifractal-driven dynamic gate and the chaotic oscillator activation.
  • Gated Recurrent Unit (GRU): We include the GRU, a popular and powerful variant of gated RNNs introduced by Cho et al. [30]. The GRU simplifies the LSTM architecture by combining the forget and input gates into a single “update gate”. It often achieves performance comparable to LSTM with fewer parameters, making it an essential and efficient benchmark for comparison against our more complex DFC-LSTM cell.
  • Temporal Convolutional Network (TCN): As a strong non-recurrent benchmark, we include the TCN. Zhang et al. [31] proposed this architecture for VaR forecasting, demonstrating that its use of causal and dilated convolutions allows it to outperform traditional GARCH models and standard recurrent networks. Its successful application to the AAPL stock dataset in the original paper makes it a highly relevant benchmark.
  • Transformer: We include the standard Transformer model, which has become a dominant architecture in sequence modeling since its introduction by Vaswani et al. [32]. Its architecture abandons recurrence entirely in favor of a self-attention mechanism. While its effectiveness in general time series forecasting has been debated [33], its paradigm-shifting influence makes it an indispensable benchmark to test whether a specialized recurrent model like DFC-LSTM can outperform a powerful, general-purpose attention-based architecture.
  • Informer: To further challenge our model, we also benchmark against the Informer model, an efficient Transformer variant specifically designed for long sequence forecasting [34]. Informer introduces a ProbSparse self-attention mechanism to reduce the computational complexity of the standard Transformer, making it a more optimized and formidable baseline for financial time series analysis.
By benchmarking against this comprehensive suite of models, we effectively bypass the need for direct comparison with traditional GARCH models, as the superiority of several of these benchmarks has already been established in the cited literature. Our experiments are thus designed to test whether the DFC-LSTM can advance the state-of-the-art beyond these powerful and diverse deep learning solutions.

3.3. Experimental Settings and Evaluation Framework

3.3.1. Experimental Settings

All experiments are implemented in Python 3.10 using the PyTorch 2.0 framework with CUDA 11.8 acceleration on a system equipped with an NVIDIA RTX 3060 GPU, an Intel i7-11800H CPU, and 32GB of RAM. For all models, the data is chronologically split into three sets: an initial 70% for training, the next 10% for validation, and the final 20% for out-of-sample testing. The 70% training set was used to fit the models. The 10% validation set was used exclusively for model selection, specifically to identify the optimal chaotic oscillator type (T1–T10) for the DFC-LSTM based on statistical robustness (as detailed in Appendix A.3). The final 20% test set was used only once for the final out-of-sample performance evaluation reported in this paper.
To ensure a fair comparison, the benchmark models (Standard LSTM, GRU, TCN, etc.) were trained using only the primary time series features ( r t and R V t ). The multifractal spectrum width ( Δ α t ) was not provided as an input feature to the benchmarks. It was used exclusively as an architectural component within the DFC-LSTM’s dynamic gate, allowing us to test the benefit of the proposed architecture rather than simply the benefit of an additional feature. Furthermore, to ensure a consistent comparison, a standardized set of hyperparameters is used for all deep learning models. Specifically, we employ a look-back window of 60 trading days, a hidden size of 128 neurons, and a dropout rate of 0.2. All models are trained for 100 epochs using a batch size of 64 with the AdamW optimizer and a learning rate of 0.001. The task is framed as a quantile regression problem, where the models are trained to directly predict the 5th percentile of the return distribution ( α = 0.05 ) by minimizing a quantile loss function. The quantile loss L τ for a given quantile τ is defined as:
L_\tau(y, \hat{y}_\tau) = \begin{cases} \tau \, (y - \hat{y}_\tau) & \text{if } y - \hat{y}_\tau > 0 \\ (\tau - 1)(y - \hat{y}_\tau) & \text{if } y - \hat{y}_\tau \leq 0 \end{cases}
where y is the true return, ŷ_τ is the predicted quantile (the VaR forecast), and τ is set to 0.05. This quantile loss is the appropriate objective for training models to predict a specific quantile like VaR, unlike metrics such as RMSE or R-squared, which are designed to evaluate the accuracy of point forecasts (e.g., predicting the mean).
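For reference, a minimal PyTorch version of this pinball loss might look as follows (the function name is ours):

```python
import torch

def quantile_loss(y, y_hat, tau=0.05):
    """Pinball loss for the tau-th quantile, averaged over the batch."""
    diff = y - y_hat
    return torch.mean(torch.where(diff > 0, tau * diff, (tau - 1.0) * diff))
```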

3.3.2. Evaluation Framework and Metrics

The out-of-sample VaR forecasts from all models are evaluated using a comprehensive backtesting framework designed to assess both statistical accuracy and practical economic implications.
Violation Rate (VR)
This is the empirical frequency of violations, calculated as the proportion of days where the actual loss exceeded the predicted VaR [35]. A well-calibrated model should have a VR close to the target level α .
VR = \frac{1}{T} \sum_{t=1}^{T} I\big( r_t < \mathrm{VaR}_t(\alpha) \big)
where T is the total number of out-of-sample observations, and I ( · ) is an indicator function that equals 1 if a violation occurs and 0 otherwise.
Unconditional Coverage (UC) Test
This is the Kupiec’s proportion of failures (POF) test, which statistically assesses whether the observed violation rate is significantly different from the expected rate α [36]. The null hypothesis ( H 0 ) is that the model’s violation rate is correct ( V R = α ). The likelihood ratio test statistic is:
LR_{uc} = -2 \ln \left[ \frac{(1-\alpha)^{T-N} \, \alpha^{N}}{(1-VR)^{T-N} \, VR^{N}} \right] \sim \chi^2(1)
where N is the number of violations, and χ²(1) denotes the chi-square distribution with one degree of freedom. A p-value greater than 0.05 indicates that the model passes the test.
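A compact implementation of this test might look as follows; SciPy's chi-squared survival function gives the p-value, and the clipping guard for degenerate violation rates is our addition:

```python
import numpy as np
from scipy.stats import chi2

def kupiec_pof(returns, var_forecasts, alpha=0.05):
    """Kupiec proportion-of-failures test; returns (LR_uc, p-value)."""
    hits = np.asarray(returns) < np.asarray(var_forecasts)   # violations
    T, N = len(hits), int(hits.sum())
    vr = np.clip(N / T, 1e-10, 1 - 1e-10)    # guard degenerate 0/1 rates
    log_lik = lambda p: (T - N) * np.log(1 - p) + N * np.log(p)
    lr_uc = -2.0 * (log_lik(alpha) - log_lik(vr))
    return lr_uc, chi2.sf(lr_uc, df=1)
```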
Conditional Coverage (CC) Test
This is Christoffersen's test, which extends the UC test by jointly examining if the violation rate is correct and if the violations are independent (i.e., not clustered) [37]. The null hypothesis (H_0) is that violations are correctly specified and independent. The test statistic is LR_cc = LR_uc + LR_ind, where LR_ind tests for the independence of violations:
LR_{ind} = -2 \ln \left[ \frac{(1-\pi_2)^{n_{00}+n_{10}} \, \pi_2^{n_{01}+n_{11}}}{(1-\pi_{01})^{n_{00}} \, \pi_{01}^{n_{01}} \, (1-\pi_{11})^{n_{10}} \, \pi_{11}^{n_{11}}} \right] \sim \chi^2(1)
Here, n_{ij} is the number of times state j occurred after state i (where state 1 is a violation and state 0 is not), and π_{ij} are the corresponding transition probabilities. The LR_cc statistic follows a χ²(2) distribution. A p-value greater than 0.05 is required to pass the test.
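The following sketch combines the transition counts with the Kupiec statistic from the previous snippet to form the joint test; the small-sample guards (eps and the max in the denominators) are our additions:

```python
import numpy as np
from scipy.stats import chi2

def christoffersen_cc(returns, var_forecasts, alpha=0.05):
    """Conditional coverage test: LR_cc = LR_uc + LR_ind ~ chi2(2)."""
    hits = (np.asarray(returns) < np.asarray(var_forecasts)).astype(int)
    n = np.zeros((2, 2))                      # n[i, j]: state j after state i
    for prev, cur in zip(hits[:-1], hits[1:]):
        n[prev, cur] += 1
    pi01 = n[0, 1] / max(n[0, 0] + n[0, 1], 1.0)
    pi11 = n[1, 1] / max(n[1, 0] + n[1, 1], 1.0)
    pi2 = (n[0, 1] + n[1, 1]) / n.sum()       # pooled violation probability
    eps = 1e-10
    ll = lambda p, n0, n1: (n0 * np.log(max(1 - p, eps))
                            + n1 * np.log(max(p, eps)))
    lr_ind = -2.0 * (ll(pi2, n[0, 0] + n[1, 0], n[0, 1] + n[1, 1])
                     - ll(pi01, n[0, 0], n[0, 1])
                     - ll(pi11, n[1, 0], n[1, 1]))
    lr_uc, _ = kupiec_pof(returns, var_forecasts, alpha)
    lr_cc = lr_uc + lr_ind
    return lr_cc, chi2.sf(lr_cc, df=2)
```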
Dynamic Quantile (DQ) Test
This test, proposed by Engle & Manganelli [38], provides a more rigorous assessment than the CC test by jointly examining if the violation rate is correct and if violations (or 'hits') are independent of all available information. The test statistic is based on the regression of the demeaned hit sequence Hit_t = I(r_t < VaR_t(α)) − α against its own lags and other explanatory variables (such as the VaR forecast itself). The null hypothesis (H_0) is that all regressors have zero coefficients. The resulting test statistic LR_dq follows a χ²(k) distribution, where k is the number of regressors. A p-value greater than 0.05 is required, indicating that the model is dynamically well-specified and violations are unpredictable.
Violation Independence (Ljung-Box) Test
This test provides a direct assessment of violation independence by applying the Ljung-Box Q-statistic [39] to the violation indicator series I_t = I(r_t < VaR_t(α)). The test is used to detect serial correlation in this binary series over the first m lags (we use m = 10 in our study). The null hypothesis (H_0) is that the violations are independently distributed. A p-value greater than 0.05 indicates no significant violation clustering, corroborating the findings of the CC and DQ tests.
Basel “Traffic Light” Approach
This is a practical regulatory framework for model validation [40]. It classifies a model’s performance into three zones based on the observed number of violations over a test period. The zones are defined by the cumulative binomial probability of observing N violations for a given α . For example, in a 250-day window at the α = 0.05 level (which we use), the expected number of violations is 12.5. The “Green Zone” would correspond to a range of violations (e.g., 8 to 17) that is statistically probable. The zones are:
  • Green Zone: The model is accepted.
  • Yellow Zone: The model is accepted but incurs a capital multiplier penalty.
  • Red Zone: The model is statistically rejected.
While our test set is longer than 250 days, we apply this same probabilistic logic to classify our models.
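One way to operationalize this zone classification is via the cumulative binomial probability, as sketched below. The 0.95 and 0.9999 cut-offs follow the Basel convention; applying them at α = 0.05 over a longer window, as we do, is the extension described above.

```python
from scipy.stats import binom

def traffic_light_zone(n_violations, T, alpha=0.05):
    """Classify a VaR model by the cumulative binomial probability of
    observing at most n_violations in T days at coverage level alpha."""
    p = binom.cdf(n_violations, T, alpha)
    if p < 0.95:
        return "green"        # accepted
    elif p < 0.9999:
        return "yellow"       # accepted with capital multiplier penalty
    return "red"              # statistically rejected
```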
Regulator’s Quadratic Loss Function (RQL)
This loss function reflects a regulator’s perspective by penalizing violations based on the squared magnitude of the excess loss, thus placing a higher penalty on larger failures [41]. The average loss is calculated as:
RQL = \frac{1}{T} \sum_{t=1}^{T} L_t, \quad \text{where} \quad L_t = \begin{cases} 1 + (r_t - \mathrm{VaR}_t)^2 & \text{if } r_t < \mathrm{VaR}_t \\ 0 & \text{otherwise} \end{cases}
Firm’s Loss Function (FS)
This loss function reflects a firm’s perspective by incorporating the opportunity cost of capital. It penalizes violations quadratically while also applying a linear penalty to non-violations, representing the cost of holding excessive capital reserves [42]. The average loss is:
FS = \frac{1}{T} \sum_{t=1}^{T} L_t, \quad \text{where} \quad L_t = \begin{cases} (r_t - \mathrm{VaR}_t)^2 & \text{if } r_t < \mathrm{VaR}_t \\ -\beta \cdot \mathrm{VaR}_t & \text{otherwise} \end{cases}
where β is the firm’s cost of capital, set to 0.0001 in our experiments. A superior model is expected to have a VR close to α , high p-values for the UC and CC tests, and low values for the RQL and FS loss functions.
In this study, we focus on the 95% confidence level for VaR forecasting. This level is not only a widely adopted benchmark in both industry practice and regulatory frameworks [35,40] but also provides a more statistically robust environment for model backtesting. Higher quantiles, such as 99%, lead to a very small number of expected violations in the out-of-sample period. This data sparsity is known to diminish the statistical power of standard coverage tests (e.g., UC and CC tests), making the backtesting results less reliable and sensitive to singular events [36,37,43,44]. Therefore, a thorough evaluation at the 95% level is sufficient to rigorously validate the performance and innovation of our proposed architecture while ensuring the robustness of our empirical conclusions.

4. Results

This section presents the empirical results of our out-of-sample 95% VaR backtesting. We first conduct a detailed ablation study to disentangle the contributions of our two proposed innovations: the dynamic fractal gate and the chaotic oscillator activation. Following this, we compare the full DFC-LSTM model against a comprehensive suite of state-of-the-art benchmarks. To ensure a thorough validation of statistical robustness, we incorporate a rigorous set of advanced backtests, including the Dynamic Quantile (DQ) test and Ljung-Box independence test.

4.1. Ablation Study

To isolate the impact of each component, we compare the full DFC-LSTM against the Standard LSTM (which has static gates and static ‘tanh’ activation) and two new variants:
  • MF-LSTM (Gate-Only): Uses the multifractal-driven dynamic forget gate but retains the standard ‘tanh’ activation.
  • Chaotic-LSTM (Chaos-Only): Uses the standard LSTM forget gate but replaces the ‘tanh’ activation with the dynamic g_coc (Lee Oscillator).
Table 3 and Table 4 present the ablation results.
The ablation study reveals a critical synergistic effect. On the S&P 500 (Table 3), a challenging series with high persistence, both the Gate-Only (A) and Chaos-Only (B) variants fail the rigorous Dynamic Quantile (DQ) test, with p-values of 0.0443 and 0.0232 respectively. The standard LSTM also fails the Ljung-Box (L-B) independence test. In contrast, the full DFC-LSTM (C) is one of the few models to successfully pass the DQ test (p = 0.1971), demonstrating that both innovations are required to work in synergy to achieve correct dynamic specification.
The results on the volatile AAPL dataset (Table 4) are even more striking. Here, the Standard LSTM (Baseline) fails the L-B independence test (p = 0.0216). In contrast, both the Gate-Only (A) and Chaos-Only (B) components are robust enough on their own to fix this issue and pass all tests. However, the full DFC-LSTM (C) achieves a perfect DQ p-value of 1.0000, demonstrating the highest possible level of statistical calibration and dynamic specification. This confirms that the two innovations are complementary, leading to a model that is significantly more robust than its individual parts.

4.2. Comparative Benchmark Results

We now compare the full DFC-LSTM against the suite of benchmarks, including re-tuned Transformer and Informer models. These models were optimized to ensure a fair and robust comparison against architectures not originally designed for financial noise (see Appendix A.2 for tuning details).
Table 5 presents the results for the S&P 500. The key finding is that this dataset exhibits strong violation persistence, as all models tested, including our own, fail the Ljung-Box (L-B) independence test. This highlights the difficulty of the task and the importance of the more advanced DQ test. Here, the DFC-LSTM-T10 stands out: it passes the rigorous DQ test with a p-value of 0.1971. In contrast, several key benchmarks, including the TCN (p = 0.0290) and Standard GRU (p = 0.0453), fail this test. This demonstrates that the DFC-LSTM provides superior dynamic specification even when simple independence is hard to achieve.
Table 6 details the results for the more challenging AAPL dataset. Here, the superiority of the DFC-LSTM is unambiguous. The Standard LSTM baseline fails the L-B independence test (p = 0.0216). In contrast, the DFC-LSTM (T1) passes all four statistical backtests (UC, CC, DQ, and L-B) and achieves a perfect DQ p-value of 1.0000. While other advanced benchmarks (GARCH-LSTM, TCN, GRU) also pass these tests, the DFC-LSTM’s perfect dynamic specification score, combined with the failure of its own baseline (Standard LSTM), provides the strongest evidence of its superior design and robustness for volatile assets.

5. Discussion

The empirical results presented in Section 4 consistently highlight the superior statistical robustness of the proposed DFC-LSTM architecture. A deeper interpretation of these findings begins with the ablation study (Section 4.1), which provides critical insights into the model’s inner workings. For the challenging S&P 500 series (Table 3), both the Gate-Only and Chaos-Only variants failed the rigorous DQ test. The full DFC-LSTM, however, successfully passed this test. This strongly suggests that for a persistent, broad market index, the two innovations are not individually sufficient. Instead, they act synergistically: the fractal gate adapts the model’s memory to shifting volatility regimes, while the chaotic activation provides the necessary nonlinear capacity to model the dynamics within those regimes. Only the full model, combining both, was able to achieve the correct dynamic specification. This synergistic relationship is further clarified by the results on the volatile AAPL dataset (Table 4), where the Standard LSTM baseline failed the L-B independence test. Here, both individual components (Gate-Only and Chaos-Only) were robust enough on their own to fix this issue and pass all tests. This indicates that for highly non-stationary series, either innovation is a significant improvement. However, the full DFC-LSTM achieved a perfect DQ p-value of 1.0000, demonstrating a level of calibration that even the individual components could not reach. This confirms that the innovations are complementary, leading to the most robust model.
This superior statistical robustness is reinforced when evaluating the model against the full benchmark suite. The primary criterion for a VaR model’s viability is its statistical validity, and our expanded backtesting framework, incorporating the DQ and Ljung-Box tests, underscores this point. On the S&P 500 (Table 5), we found that violation independence is an extremely difficult property to capture, with all models failing the L-B test. However, the DFC-LSTM was one of the few models (unlike the TCN and GRU) to pass the rigorous DQ test, proving its superior dynamic specification. On the AAPL dataset (Table 6), the DFC-LSTM’s superiority is unambiguous. It resolves the failure of the Standard LSTM baseline and achieves a perfect DQ score (1.0000), a mark of robustness unmatched by any other model. This demonstrates that the DFC-LSTM’s theory-informed design translates into empirically superior calibration.
From a practical perspective, this statistical reliability is paramount. While the DFC-LSTM demonstrates competitive economic loss scores (RQL and FS), its true value lies in its balance of statistical reliability and economic performance. A model with a slightly lower RQL but a failing DQ p-value (like the TCN on S&P 500) or a failing independence test (like the Standard LSTM on AAPL) is a poor choice for any risk manager. The DFC-LSTM provides an exceptional balance: it is among the best in terms of economic loss while also being the most statistically robust and dynamically well-specified model in the entire study. This success stands in contrast to the failure of the tuned Transformer-based models, which, especially on the S&P 500, suggests that their general-purpose self-attention mechanism may be ill-suited for the high-noise, low-signal environment of single-step-ahead VaR forecasting. In contrast, our DFC-LSTM, which embeds domain knowledge (fractals, chaos) directly into a recurrent cell, proves to be a far more effective architecture.
A final practical consideration is the computational cost. In our experiments, the DFC-LSTM models completed training in approximately 700 s on average. This is not only a reasonable absolute time but is also highly competitive, proving notably faster than more complex benchmarks like the GARCH-LSTM and TCN. This efficiency stems from its design. The fractal analysis (OSW-MF-DFA) component is a one-time preprocessing step performed before training begins and thus adds no overhead to the model training loop. Furthermore, the chaotic activation function's cost is well-managed; while not computationally identical to a standard tanh, its reliance on a pre-computed lookup table for forward propagation and a simple surrogate gradient (1 − z²) for backward propagation means its additional overhead is minimal. The primary marginal cost arises from the dynamic fractal gate (Equation (16)), which requires one additional set of "expert" weights compared to a standard LSTM's forget gate. Therefore, the total training cost represents only a modest increase over a standard LSTM and is more efficient than several other benchmark models. We argue this minor computational cost is a highly justifiable price for the model's demonstrably superior statistical reliability and dynamic calibration.

6. Conclusions

This paper addresses the persistent challenge of accurate Value-at-Risk (VaR) forecasting in the face of complex, non-stationary financial market dynamics. To overcome the limitations of conventional models, we propose the Dynamic Fractal–Chaotic LSTM (DFC-LSTM), a novel architecture that integrates principles from fractal analysis and chaos theory directly into the core computational units of the LSTM cell. The model introduces two synergistic innovations: a multifractal-driven dynamic forget gate that adapts memory retention to real-time market complexity, and a chaotic oscillator-based dynamic activation function that resolves the “complexity mismatch” inherent in static processors.
Our empirical results, based on backtesting on both the S&P 500 index and the volatile AAPL stock, demonstrate the superiority of the DFC-LSTM. The model consistently delivers statistically valid VaR forecasts, passing both unconditional and conditional coverage tests with significantly higher p-values than a comprehensive suite of state-of-the-art benchmarks, including GARCH-LSTM, TCN, and Transformer-based models. This statistical robustness is particularly salient on the challenging AAPL dataset, where several benchmarks fail. While maintaining highly competitive economic loss scores, the DFC-LSTM’s primary strength lies in its exceptional calibration and reliability.
The findings strongly suggest that directly embedding the intrinsic dynamical principles of financial markets into the neural architecture is an effective strategy for improving financial risk modeling. The success of the DFC-LSTM validates our hypothesis that a theory-informed, adaptive architecture leads to more accurate and dependable risk forecasts. Future research could extend this framework in several directions: applying it to other asset classes such as commodities or cryptocurrencies; testing its performance at more extreme risk levels (e.g., 99% VaR) and for complementary risk measures like Expected Shortfall (ES); validating its robustness under different market conditions using more rigorous protocols, such as rolling-window estimations; assessing its practical economic implications; investigating a broader range of chaotic systems for the activation mechanism; or expanding the architecture to a multivariate setting to model systemic risk.

Author Contributions

Conceptualization, Y.Z. and R.S.T.L.; methodology, Y.Z.; software, Y.Z.; validation, Y.Z. and Z.Z.; formal analysis, Y.Z. and Z.Z.; investigation, Y.Z.; resources, B.T.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, Z.Z., R.S.T.L., and B.T.; visualization, Y.Z.; supervision, Z.Z., R.S.T.L., and B.T.; project administration, Z.Z. and R.S.T.L.; funding acquisition, Z.Z. and R.S.T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Guangdong Provincial Key Laboratory of IRADS (2022B1212010006).

Informed Consent Statement

The study described in this article did not involve humans.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from the Wind Economics Database and are available at https://www.wind.com.cn/ (accessed on 24 April 2025) for users with a valid subscription.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Nomenclature

Table A1 provides a summary of the key mathematical symbols and variables used throughout this paper.
Table A1. List of key symbols and definitions.
SymbolDefinition
r t Daily log-return at time t.
P t Daily closing price at time t.
R V t Realized Volatility at time t.
α Significance level for VaR (e.g., 0.05).
τ Quantile level for the loss function (e.g., 0.05).
VaR t ( α ) Value-at-Risk forecast at time t for significance level α .
Δ α Multifractal spectrum width; Δ α t is the value at time t.
TRolling window length for OSW-MF-DFA (e.g., 252).
sScale (segment length) in MF-DFA.
qMoment (order) for fluctuation function.
F q ( s ) q-th order fluctuation function.
h ( q ) Generalized Hurst exponent.
τ ( q ) Mass exponent.
g c o c ( · ) Chaotic Oscillator Activation function.
LORS t Output of the Lee Oscillator at its internal step t.
E t , I t States of the excitatory and inhibitory neurons in the oscillator.
N h i s t Number of internal evolution steps for the oscillator.
d surrogate Surrogate gradient for g c o c ( · ) , defined as 1 z 2 .
x t Input vector to the LSTM cell at time t.
h t Hidden state of the LSTM cell at time t.
C t Cell state of the LSTM cell at time t.
C ˜ t Candidate memory state.
f t Forget gate output.
i t Input gate output.
o t Output gate output.
e t Dynamic interpolator for the forget gate, derived from Δ α t .
W , b Learnable weights and biases of the network.
σ ( · ) Sigmoid activation function.
Element-wise multiplication.
V R Violation Rate.
L R u c Likelihood Ratio statistic for Unconditional Coverage (Kupiec) test.
L R c c Likelihood Ratio statistic for Conditional Coverage (Christoffersen) test.
L R d q Likelihood Ratio statistic for Dynamic Quantile (Engle-Manganelli) test.
R Q L Regulator’s Quadratic Loss function.
F S Firm’s (Sarma) Loss function.
β Firm’s cost of capital parameter for the FS loss.
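To tie these symbols together, the following is a minimal PyTorch sketch of a single DFC-LSTM cell step. It is a schematic reading, not the paper's exact equations: the table-lookup form of g_coc(·), the linear map from Δα_t to the interpolator e_t, and the convex blend inside the forget gate are simplified, hypothetical stand-ins, while the backward pass applies the surrogate gradient d_surrogate = 1 − z² as defined above.

```python
import torch
import torch.nn as nn

class ChaoticActivation(torch.autograd.Function):
    """g_coc(.): forward looks up the Lee Oscillator's peak response from a
    precomputed table (hypothetical lookup scheme); backward uses the
    surrogate gradient d_surrogate = 1 - z^2, as tanh would."""

    @staticmethod
    def forward(ctx, x, lors_table, x_grid):
        # nearest-grid-point lookup of the precomputed LORS response
        idx = torch.bucketize(x, x_grid).clamp(max=lors_table.numel() - 1)
        z = lors_table[idx]
        ctx.save_for_backward(z)
        return z

    @staticmethod
    def backward(ctx, grad_out):
        (z,) = ctx.saved_tensors
        return grad_out * (1.0 - z ** 2), None, None


class DFCLSTMCell(nn.Module):
    def __init__(self, n_in, n_h, lors_table, x_grid):
        super().__init__()
        self.gates = nn.Linear(n_in + n_h, 4 * n_h)  # f, i, o, candidate C~
        self.to_e = nn.Linear(1, 1)                  # maps Delta-alpha_t to e_t
        self.register_buffer("lors_table", lors_table)
        self.register_buffer("x_grid", x_grid)

    def g_coc(self, z):
        return ChaoticActivation.apply(z, self.lors_table, self.x_grid)

    def forward(self, x_t, h_t, C_t, dalpha_t):
        f, i, o, c = self.gates(torch.cat([x_t, h_t], dim=-1)).chunk(4, dim=-1)
        e_t = torch.sigmoid(self.to_e(dalpha_t))          # dynamic interpolator
        # hypothetical blend: e_t shifts the forget gate between its learned
        # value and a neutral 0.5 retention level as market complexity changes
        f_t = e_t * torch.sigmoid(f) + (1.0 - e_t) * 0.5
        i_t, o_t = torch.sigmoid(i), torch.sigmoid(o)
        C_t = f_t * C_t + i_t * self.g_coc(c)             # g_coc replaces tanh
        h_t = o_t * self.g_coc(C_t)
        return h_t, C_t
```

In use, dalpha_t would carry the current OSW-MF-DFA spectrum width for each sample, shaped (batch, 1), so that memory retention is modulated by market complexity at every step.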

Appendix A.2. Benchmark Model Configuration

To ensure a fair comparison, a standardized set of core hyperparameters was employed for the majority of the benchmark models (Standard LSTM, Standard GRU, TCN, GARCH-LSTM) as well as our proposed DFC-LSTM. This standard configuration included a look-back window of 60 days, a hidden dimension of 128, a learning rate of 0.001 (using the AdamW optimizer), a batch size of 64, and 100 training epochs.
Recognizing that Transformer-based architectures (Transformer and Informer) can be particularly sensitive to parameterization, especially in noisy financial time series, specific tuning was performed for these two models. Different configurations, including variations in hidden size, were tested. The final configurations selected for presentation in the main results tables (Section 4) represent these tuned settings and differ slightly between the datasets and models, reflecting the outcome of this tuning process.
Crucially, several regularization and stabilization techniques were applied universally across all models during training to enhance robustness. These included L2 Weight Decay, Dropout within model architectures, and Gradient Clipping.
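Concretely, this shared setup corresponds to a training loop along the following lines. This is a sketch under stated assumptions: `model` and `train_loader` are assumed to exist, a standard pinball loss at τ = 0.05 (the quantile level from Table A1) stands in for the training objective, and the weight-decay coefficient and clipping norm are hypothetical placeholder values, since their exact magnitudes are not reported here.

```python
import torch

def pinball_loss(var_pred, r, tau=0.05):
    # standard quantile (pinball) loss for a tau-quantile (VaR) forecast
    err = r - var_pred
    return torch.mean(torch.maximum(tau * err, (tau - 1.0) * err))

# shared configuration: AdamW at lr = 0.001 with L2 weight decay
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

for epoch in range(100):                      # 100 training epochs
    for x, r in train_loader:                 # x: (batch=64, look_back=60, features)
        loss = pinball_loss(model(x), r)
        optimizer.zero_grad()
        loss.backward()
        # gradient clipping, applied universally across all models
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
```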
Table A2 details the specific architectural configurations used for generating the main results presented in this paper, highlighting the distinct parameters for the tuned Transformer and Informer models alongside the standard configuration used for others.
Table A2. Hyperparameter settings for all models used in the main results.

| Model | Look-Back | Hidden Size (d_model) | Layers | Heads (n_heads) | FFN Dim (d_ff) | Other Arch. Params |
|---|---|---|---|---|---|---|
| Standard LSTM | 60 | 128 | 1 | N/A | N/A | Dropout = 0.2 |
| Standard GRU | 60 | 128 | 1 | N/A | N/A | Dropout = 0.2 |
| TCN | 60 | 128 | 2 | N/A | N/A | Kernel Size = 2, Dropout = 0.2 |
| GARCH-LSTM | 60 | 128 | 1 | N/A | N/A | Dropout = 0.2 |
| DFC-LSTM (Family) | 60 | 128 | 1 | N/A | N/A | Dropout = 0.2 |
| Transformer (Tuned, SP500 & AAPL) | 60 | 64 | 1 (Encoder) | 4 | 256 | Dropout = 0.2 |
| Informer (Tuned, SP500) | 60 | 128 | 2 (Encoder) | 4 | 128 | Factor = 3, Dropout = 0.2 |
| Informer (Tuned, AAPL) | 60 | 64 | 2 (Encoder) | 4 | 128 | Factor = 3, Dropout = 0.2 |

Appendix A.3. DFC-LSTM Oscillator Type Performance (95% VaR)

To address potential model-selection bias regarding the 10 chaotic oscillator types (T1–T10), this section provides the full backtesting results for all 10 DFC-LSTM variants on both datasets for the primary 95% VaR level. The optimal oscillator type for each dataset, presented in the main results (Section 4), was selected based on performance on the validation set. The selection process prioritized models that passed all statistical backtests (UC, CC, DQ, L-B) while achieving high p-values, particularly for the DQ and L-B tests, indicating superior statistical robustness.
Table A3 and Table A4 present these detailed results. For the S&P 500 dataset, Type 10 (T10) was selected as it was one of the few variants to pass the DQ test. For the AAPL dataset, Type 1 (T1) was chosen, notably achieving a perfect DQ p-value of 1.0000. This transparent reporting demonstrates the performance variation across oscillator types and validates our selection process.
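One way to encode this selection rule is sketched below. Here `results` is a hypothetical mapping from oscillator type to validation p-values, and the fallback branch covers cases (as on S&P 500, Table A3) where no variant clears every test at the 5% level.

```python
def select_oscillator(results, level=0.05):
    """Pick the oscillator type from validation backtest p-values (sketch)."""
    # keep only variants that pass UC, CC, DQ and L-B at the chosen level
    passing = {t: p for t, p in results.items()
               if min(p["UC"], p["CC"], p["DQ"], p["LB"]) > level}
    pool = passing if passing else results     # fall back if none pass all four
    # prioritize high DQ p-values, breaking ties with L-B, as described above
    return max(pool, key=lambda t: (pool[t]["DQ"], pool[t]["LB"]))

# illustration using the out-of-sample p-values from Table A3: returns "T10"
best = select_oscillator({
    "T3":  {"UC": 0.1549, "CC": 0.2704, "DQ": 0.0622, "LB": 0.0001},
    "T10": {"UC": 0.9381, "CC": 0.6475, "DQ": 0.1971, "LB": 0.0349},
})
```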
Table A3. Full backtesting results for all 10 DFC-LSTM oscillator types on S&P 500 (α = 0.05).

| Model Variant | VR | UC p-val | CC p-val | DQ p-val | L-B p-val | Basel Zone | RQL Loss | FS Loss |
|---|---|---|---|---|---|---|---|---|
| DFC-LSTM (T1) | 0.0846 | 0.0001 | 0.0004 | 0.0309 | 0.0047 | Red | 0.1875 | 0.1029 |
| DFC-LSTM (T2) | 0.0832 | 0.0002 | 0.0004 | 0.0146 | 0.0039 | Yellow | 0.1819 | 0.0988 |
| DFC-LSTM (T3) | 0.0621 | 0.1549 | 0.2704 | 0.0622 | 0.0001 | Green | 0.1434 | 0.0814 |
| DFC-LSTM (T4) | 0.1368 | 0.0000 | 0.0000 | 0.0000 | 0.0105 * | Red | 0.3014 | 0.1646 |
| DFC-LSTM (T5) | 0.0353 | 0.0578 | 0.0933 | 0.0059 | 0.0001 | Green | 0.0873 | 0.0522 |
| DFC-LSTM (T6) | 0.0353 | 0.0578 | 0.0933 | 0.0004 | 0.0001 | Green | 0.0831 | 0.0480 |
| DFC-LSTM (T7) | 0.0395 | 0.1833 | 0.1198 | 0.0448 | 0.0015 | Green | 0.0943 | 0.0550 |
| DFC-LSTM (T8) | 0.0663 | 0.0575 | 0.0937 | 0.0297 | 0.0010 | Yellow | 0.1552 | 0.0890 |
| DFC-LSTM (T9) | 0.0818 | 0.0004 | 0.0010 | 0.0200 | 0.0056 | Yellow | 0.1807 | 0.0990 |
| **DFC-LSTM (T10)** | 0.0494 | 0.9381 | 0.6475 | 0.1971 | 0.0349 | Green | 0.1203 | 0.0711 |

Note: The selected model (T10) for the main paper results is shown in bold. L-B p-val is for Q(10). p-values below 0.05 indicate failure at the 5% significance level. * Fails at the 5% level but passes at the 1% level.
Table A4. Full backtesting results for all 10 DFC-LSTM oscillator types on AAPL (α = 0.05).

| Model Variant | VR | UC p-val | CC p-val | DQ p-val | L-B p-val | Basel Zone | RQL Loss | FS Loss |
|---|---|---|---|---|---|---|---|---|
| **DFC-LSTM (T1)** | 0.0485 | 0.8576 | 0.5406 | 1.0000 | 0.2415 | Green | 0.1256 | 0.0774 |
| DFC-LSTM (T2) | 0.0258 | 0.0017 | 0.0055 | 1.0000 | 0.1278 | Green | 0.0641 | 0.0386 |
| DFC-LSTM (T3) | 0.0409 | 0.2692 | 0.1565 | 1.0000 | 0.0216 | Green | 0.1033 | 0.0627 |
| DFC-LSTM (T4) | 0.0288 | 0.0068 | 0.0218 | 0.6349 | 0.0028 | Green | 0.0767 | 0.0482 |
| DFC-LSTM (T5) | 0.0485 | 0.8576 | 0.5406 | 0.9982 | 0.0495 | Green | 0.1314 | 0.0832 |
| DFC-LSTM (T6) | 0.0530 | 0.7234 | 0.3224 | 1.0000 | 0.0739 | Green | 0.1459 | 0.0931 |
| DFC-LSTM (T7) | 0.0303 | 0.0125 | 0.0394 | 0.6824 | 0.0087 | Green | 0.0752 | 0.0452 |
| DFC-LSTM (T8) | 0.0212 | 0.0001 | 0.0004 | 1.0000 | 0.6990 | Green | 0.0456 | 0.0248 |
| DFC-LSTM (T9) | 0.0379 | 0.1362 | 0.2023 | 1.0000 | 0.1665 | Green | 0.0981 | 0.0605 |
| DFC-LSTM (T10) | 0.0545 | 0.5972 | 0.3418 | 0.9952 | 0.1053 | Green | 0.1477 | 0.0934 |

Note: The selected model (T1) for the main paper results is shown in bold. L-B p-val is for Q(10). p-values below 0.05 indicate failure at the 5% significance level.

Appendix A.4. 99% VaR Backtesting Results

To evaluate model performance under stricter regulatory conditions, this section provides supplementary backtesting results for the 99% Value-at-Risk (VaR) level ( α = 0.01 ). Forecasting and backtesting at this extreme quantile are known to be challenging due to the infrequent nature of violations, which can limit the power and reliability of statistical tests [43]. Nonetheless, these results offer valuable insights into the models’ ability to handle extreme tail risk.
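For intuition, the Kupiec unconditional coverage statistic [36] illustrates this power problem: at α = 0.01 only about one violation is expected per 100 trading days, so the likelihood ratio is computed from a handful of events and its χ²(1) approximation becomes coarse. A standard implementation might look as follows; the sample sizes in the comment are illustrative, not the paper's test-set length.

```python
import numpy as np
from scipy.stats import chi2

def kupiec_uc(x, T, alpha):
    """LR_uc = -2 ln[ L(alpha) / L(x/T) ]; asymptotically chi-squared(1)."""
    pi = x / T                                   # observed violation rate
    if x == 0:
        lr = -2.0 * T * np.log(1.0 - alpha)      # 0^0 = 1 in the likelihood
    else:
        lr = -2.0 * ((T - x) * np.log(1.0 - alpha) + x * np.log(alpha)
                     - (T - x) * np.log(1.0 - pi) - x * np.log(pi))
    return lr, chi2.sf(lr, df=1)                 # statistic and p-value

# a perfectly calibrated count gives LR = 0 and p = 1, e.g. 7 hits in 700 days
lr, p = kupiec_uc(7, 700, 0.01)
```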
The performance of the DFC-LSTM is compared against key benchmarks. For the DFC-LSTM, the configuration (oscillator type and gate input window T) yielding the best validation performance for the specific 99% VaR task on each dataset is presented.
Table A5 and Table A6 summarize these 99% VaR backtesting results.
Table A5. Out-of-sample VaR backtesting results for S&P 500 at α = 0.01 (99% VaR).

| Model | VR | UC p-val | CC p-val | DQ p-val | L-B p-val | Basel Zone | RQL Loss | FS Loss |
|---|---|---|---|---|---|---|---|---|
| DFC-LSTM (T6, Gate T90) | 0.0102 | 0.9665 | 0.9298 | 0.0000 | 0.0000 | Green | 0.0154 | 0.0055 |
| Standard LSTM | 0.0085 | 0.6726 | 0.0000 * | 0.0040 | 0.0025 | Green | 0.0119 | 0.0038 |
| GARCH-LSTM | 0.0085 | 0.6726 | 0.0000 * | 0.0027 | 0.0025 | Green | 0.0121 | 0.0040 |
| TCN | 0.0056 | 0.2039 | 0.0000 * | 0.8408 | 0.9998 | Green | 0.0069 | 0.0016 |

Note: L-B p-val is for Q(10). p-values below 0.05 indicate failure at the 5% significance level. * CC test p-value near 0.0 may indicate numerical limits or strong clustering.
Table A6. Out-of-sample VaR backtesting results for AAPL at α = 0.01 (99% VaR).

| Model | VR | UC p-val | CC p-val | DQ p-val | L-B p-val | Basel Zone | RQL Loss | FS Loss |
|---|---|---|---|---|---|---|---|---|
| DFC-LSTM (T6, Gate T60) | 0.0078 | 0.5503 | 0.8045 | 1.0000 | 0.9991 | Green | 0.0115 | 0.0042 |
| Standard LSTM | 0.0061 | 0.2725 | 0.0000 * | 0.8706 | 0.9997 | Green | 0.0091 | 0.0035 |
| GARCH-LSTM | 0.0030 | 0.0348 | 0.0000 * | 0.9990 | 1.0000 | Yellow | 0.0048 | 0.0023 |
| TCN | 0.0000 | 0.0000 | 0.0000 * | 1.0000 | 1.0000 | Red | 0.0000 | 0.0006 |

Note: L-B p-val is for Q(10). p-values below 0.05 indicate failure at the 5% significance level. * CC test p-value near 0.0 may indicate numerical limits or strong clustering.
For the S&P 500 (Table A5), the DFC-LSTM utilized Oscillator Type 6 (T6) combined with the T = 90 gate input (Δα computed with T = 90), as determined by validation performance. This configuration achieves high UC (0.9665) and CC (0.9298) p-values, indicating appropriate coverage and independence according to these tests. However, like most benchmarks at this stringent 99% level, it fails the DQ and L-B tests, highlighting the difficulty of capturing the dynamics of rare extreme events for the index.
On the more volatile AAPL dataset (Table A6), the DFC-LSTM again demonstrates superior performance. The best configuration for this task, identified through validation, employed Oscillator Type 6 (T6) but, interestingly, utilized the shorter-term T = 60 gate input (Δα computed with T = 60). This DFC-LSTM variant is the only model tested that passes all statistical backtests, including the CC test (p = 0.8045), where the other models fail, and it achieves a perfect DQ p-value (1.0000). This contrasts sharply with the TCN, which fails completely (Red zone), and the GARCH-LSTM (Yellow zone). The success of the T = 60 gate input at the 99% level for AAPL, compared with T = 252 at the 95% level, suggests that a faster, more responsive gate dynamic may be advantageous for capturing the rapid onset of extreme tail events in highly volatile individual stocks. These 99% results reinforce the DFC-LSTM's robustness in modeling extreme risks where conventional and benchmark models often struggle.

Appendix A.5. Sensitivity Analysis of OSW-MFDFA Window Length (T)

This appendix examines the influence of the OSW-MFDFA rolling window length T on the DFC-LSTM model’s performance and the characteristics of the resulting Δ α t signal, which serves as the dynamic input to the forget gate. The parameter T critically determines the balance between the signal’s responsiveness to recent market changes and its stability over time. While the primary analysis employs T = 252 (approximately one trading year) based on standard practice and validation results (Appendix A.3), this sensitivity analysis compares its effects against a medium-term window of T = 90 (approximately one quarter) and a short-term window of T = 60 (approximately two months).
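For reference, a compact sketch of the Δα computation for one such window follows, using standard MF-DFA [20]: build the profile, detrend each segment with a local polynomial, form the q-th order fluctuation functions F_q(s), estimate h(q) from log–log slopes, and obtain the spectrum width from τ(q) = q·h(q) − 1 and α = dτ/dq. The scale and q grids below are illustrative defaults, not the paper's exact settings, and the segmentation is one-directional for brevity.

```python
import numpy as np

def delta_alpha(x, scales=(16, 32, 64), qs=np.linspace(-5, 5, 21), order=1):
    """Multifractal spectrum width of one window via MF-DFA (sketch)."""
    Y = np.cumsum(x - np.mean(x))                        # profile
    Fq = np.zeros((len(qs), len(scales)))
    for j, s in enumerate(scales):
        n = len(Y) // s
        segs = Y[: n * s].reshape(n, s)
        t = np.arange(s)
        # segment-wise variance of residuals after local polynomial detrending
        F2 = np.array([np.mean((seg - np.polyval(np.polyfit(t, seg, order), t)) ** 2)
                       for seg in segs])
        for i, q in enumerate(qs):
            if abs(q) < 1e-9:                            # q -> 0 limit (log average)
                Fq[i, j] = np.exp(0.5 * np.mean(np.log(F2)))
            else:
                Fq[i, j] = np.mean(F2 ** (q / 2.0)) ** (1.0 / q)
    # generalized Hurst exponents h(q) from log-log slopes of F_q(s) vs. s
    h = np.array([np.polyfit(np.log(scales), np.log(Fq[i]), 1)[0]
                  for i in range(len(qs))])
    tau = qs * h - 1.0                                   # mass exponent tau(q)
    alpha = np.gradient(tau, qs)                         # alpha = d tau / d q
    return alpha.max() - alpha.min()                     # spectrum width
```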

Appendix A.5.1. Effect on Gate Dynamics (Δα_t Signal)

The choice of window length T directly shapes the dynamics of the Δ α t signal used for gating. Figure A1 illustrates the Δ α t time series generated using T = 60 , T = 90 , and T = 252 for both datasets. The plots reveal a clear trade-off: shorter windows (e.g., T = 60 , green dotted line) produce a more volatile Δ α t signal that reacts rapidly to short-term market fluctuations, potentially capturing transient changes in complexity. Longer windows (e.g., T = 252 , red solid line) yield a much smoother signal, reflecting more persistent, structural shifts in market multifractality but potentially lagging in response to sudden events. The T = 90 signal (blue dashed line) represents an intermediate case. This visual comparison confirms that T significantly modulates the characteristics of the gate control signal.
Figure A1. Comparison of Δα_t series using different window lengths (T = 60, T = 90, T = 252) for the S&P 500 (a) and AAPL (b). (a) S&P 500 Δα_t sensitivity. (b) AAPL Δα_t sensitivity.
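Under the same assumptions, the one-sided rolling construction behind Figure A1 is a thin wrapper around the single-window function above: Δα_t at day t uses only the trailing T returns, so the gate signal never sees future data. Here `returns` is assumed to be a 1-D NumPy array of daily log-returns, and the small scale grid keeps several segments per window even at T = 60.

```python
import numpy as np

def rolling_delta_alpha(returns, T, scales=(4, 8, 16)):
    """One-sided rolling Delta-alpha: the value at day t uses returns
    from t-T to t-1 only (no look-ahead)."""
    out = np.full(len(returns), np.nan)        # undefined for the first T days
    for t in range(T, len(returns)):
        out[t] = delta_alpha(returns[t - T:t], scales=scales)
    return out

# the three signals underlying the curves in Figure A1
signals = {T: rolling_delta_alpha(returns, T) for T in (60, 90, 252)}
```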

Appendix A.5.2. Effect on DFC-LSTM VaR Performance (95% Level)

To evaluate how these different gate dynamics affect the final VaR forecasts, we perform two comparisons at the 95% VaR level, presented in Table A7 and Table A8. Panel A in each table assesses the sensitivity of the baseline optimal models (DFC-LSTM T10 for S&P 500 and T1 for AAPL, originally selected using T = 252 validation, as shown in Table 5 and Table 6, respectively) when their gate input is switched to Δα computed with T = 90 and T = 60. Panel B provides a stronger comparison by showing the performance of the best-performing oscillator type identified specifically for the T = 90 and T = 60 settings (selected through separate validation runs).
Table A7. Sensitivity analysis results for DFC-LSTM on S&P 500 (α = 0.05) with varying gate window lengths (T).

| Model Config | Oscillator | VR | UC p-val | CC p-val | DQ p-val | L-B p-val | Basel Zone | RQL Loss | FS Loss |
|---|---|---|---|---|---|---|---|---|---|
| Panel A: Sensitivity of Baseline Optimal Oscillator (T10) | | | | | | | | | |
| DFC-LSTM (Gate T = 252) | T10 (Baseline) | 0.0494 | 0.9381 | 0.6475 | 0.1971 | 0.0349 | Green | 0.1203 | 0.0711 |
| DFC-LSTM (Gate T = 90) | T10 | 0.0885 | 0.0000 | 0.0001 | 0.0213 | 0.0073 | Red | 0.1969 | 0.1085 |
| DFC-LSTM (Gate T = 60) | T10 | 0.1190 | 0.0000 | 0.0000 | 0.0000 | 0.0137 | Red | 0.2641 | 0.1452 |
| Panel B: Best Oscillators Optimized for Shorter Windows | | | | | | | | | |
| DFC-LSTM (Gate T = 90) | T9 (Best for T = 90) | 0.0493 | 0.9372 | 0.5923 | 0.2500 | 0.0155 | Green | 0.1222 | 0.0730 |
| DFC-LSTM (Gate T = 60) | T9 (Best for T = 60) | 0.0566 | 0.4356 | 0.6237 | 0.0169 | 0.0010 | Green | 0.1360 | 0.0796 |

Note: L-B p-val is for Q(10). p-values below 0.05 indicate failure at the 5% significance level.
The analysis indicates varying sensitivity depending on the dataset.
For the S&P 500 (Table A7), Panel A shows that the performance of the baseline optimal model (T10 oscillator) deteriorates substantially when forced to use shorter gate window lengths (T = 90 and T = 60). Panel B examines whether optimizing the oscillator specifically for shorter windows can recover performance. While the best model for T = 90 (using the T9 oscillator) achieves good statistical properties (passing the UC, CC, and DQ tests), it still fails the L-B test and offers no significant advantage over the T = 252 baseline (T10 oscillator). Furthermore, the best model for T = 60 (also the T9 oscillator) fails both the DQ and L-B tests. This strongly suggests that, for a broad and relatively persistent market index, the smoother, longer-term Δα signal computed with T = 252 provides the most stable and effective gate control overall, even when the oscillator choice is optimized for shorter windows.
Table A8. Sensitivity analysis results for DFC-LSTM on AAPL (α = 0.05) with varying gate window lengths (T).

| Model Config | Oscillator | VR | UC p-val | CC p-val | DQ p-val | L-B p-val | Basel Zone | RQL Loss | FS Loss |
|---|---|---|---|---|---|---|---|---|---|
| Panel A: Sensitivity of Baseline Optimal Oscillator (T1) | | | | | | | | | |
| DFC-LSTM (Gate T = 252) | T1 (Baseline) | 0.0485 | 0.8576 | 0.5406 | 1.0000 | 0.2415 | Green | 0.1256 | 0.0774 |
| DFC-LSTM (Gate T = 90) | T1 | 0.0419 | 0.3297 | 0.1878 | 1.0000 | 0.0295 | Green | 0.1129 | 0.0713 |
| DFC-LSTM (Gate T = 60) | T1 | 0.0419 | 0.3297 | 0.1878 | 1.0000 | 0.0295 | Green | 0.1117 | 0.0701 |
| Panel B: Best Oscillators Optimized for Shorter Windows | | | | | | | | | |
| DFC-LSTM (Gate T = 90) | T10 (Best for T = 90) | 0.0574 | 0.4013 | 0.3096 | 1.0000 | 0.0409 | Green | 0.1574 | 0.1002 |
| DFC-LSTM (Gate T = 60) | T10 (Best for T = 60) | 0.0543 | 0.6239 | 0.3213 | 1.0000 | 0.0921 | Green | 0.1487 | 0.0947 |

Note: L-B p-val is for Q(10). p-values below 0.05 indicate failure at the 5% significance level.
In contrast, the DFC-LSTM shows greater robustness on the more volatile AAPL dataset (Table A8). Panel A confirms that the baseline optimal model (T1 oscillator) maintains strong DQ performance (p = 1.0000) even with shorter gate inputs (T = 90, T = 60), although it fails the L-B test in those cases. Panel B shows the results when the oscillator is optimized for shorter windows (validation selects T10 for both T = 90 and T = 60). These optimized models also achieve perfect DQ scores. However, they struggle with the L-B independence test (the T = 90 model fails; the T = 60 model passes only marginally) and exhibit noticeably higher economic losses (RQL and FS) than the T = 252 baseline (T1 oscillator).
Overall, this extended sensitivity analysis confirms that the choice of T influences gate dynamics and VaR performance. While shorter windows might seem appealing for reactivity, the T = 252 window, combined with the oscillator selected under that regime, consistently provides the best balance of statistical robustness (especially passing independence tests) and competitive economic performance across both datasets. This reinforces the choice of T = 252 as the primary window length for our main analysis.

References

1. Abad, P.; Benito, S.; López, C. A comprehensive review of value at risk methodologies. Span. Rev. Financ. Econ. 2014, 12, 15–32.
2. Bollerslev, T. Generalized autoregressive conditional heteroskedasticity. J. Econom. 1986, 31, 307–327.
3. Kakade, K.; Jain, I.; Mishra, A.K. Value-at-Risk forecasting: A hybrid ensemble learning GARCH-LSTM based approach. Resour. Policy 2022, 78, 102903.
4. Kim, H.Y.; Won, C.H. Forecasting the volatility of stock price index: A hybrid model integrating LSTM with multiple GARCH-type models. Expert Syst. Appl. 2018, 103, 25–37.
5. Lin, Y.; Lin, Z.; Liao, Y.; Li, Y.; Xu, J.; Yan, Y. Forecasting the realized volatility of stock price index: A hybrid model integrating CEEMDAN and LSTM. Expert Syst. Appl. 2022, 206, 117736.
6. Andersen, T.G.; Bollerslev, T.; Diebold, F.X.; Labys, P. Modeling and forecasting realized volatility. Econometrica 2003, 71, 579–625.
7. Cao, J.; Li, Z.; Li, J. Financial time series forecasting model based on CEEMDAN and LSTM. Phys. A Stat. Mech. Its Appl. 2019, 519, 127–139.
8. Wang, J.; Lee, R. Chaotic Recurrent Neural Networks for Financial Forecast. Am. J. Neural Netw. Appl. 2021, 7, 7–14.
9. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
10. Hu, Y.; Ni, J.; Wen, L. A hybrid deep learning approach by integrating LSTM-ANN networks with GARCH model for copper price volatility prediction. Phys. A Stat. Mech. Its Appl. 2020, 557, 124907.
11. Zhao, P.; Zhu, H.; Ng, W.S.H.; Lee, D.L. From GARCH to Neural Network for Volatility Forecast. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI: Washington, DC, USA, 2024; Volume 38, pp. 16998–17006.
12. Cao, Y.; Choo, W.C.; Matemilola, B.T. Value-at-risk forecasting based on textual information and a hybrid deep learning-based approach. Int. Rev. Econ. Financ. 2025, 103, 104403.
13. Yu, X.; Zhang, D.; Zhu, T.; Jiang, X. Novel hybrid multi-head self-attention and multifractal algorithm for non-stationary time series prediction. Inf. Sci. 2022, 613, 541–555.
14. Kwapień, J.; Blasiak, P.; Drożdż, S.; Oświęcimka, P. Genuine multifractality in time series is due to temporal correlations. Phys. Rev. E 2023, 107, 034139.
15. Li, Y.; Yin, M.; Khan, K.; Su, C.W. The impact of COVID-19 on shipping freights: Asymmetric multifractality analysis. Marit. Policy Manag. 2023, 50, 1047–1061.
16. Bashan, A.; Bartsch, R.; Kantelhardt, J.W.; Havlin, S. Comparison of detrending methods for fluctuation analysis. Phys. A Stat. Mech. Its Appl. 2008, 387, 5080–5090.
17. Chen, F.; Sha, Y.; Ji, H.; Peng, K.; Liang, X. Integrating Multifractal Features into Machine Learning for Improved Prediction. Fractal Fract. 2025, 9, 205.
18. Vogl, M. Controversy in financial chaos research and nonlinear dynamics: A short literature review. Chaos Solitons Fractals 2022, 162, 112444.
19. Lee, R.S.T. LEE-Associator—A transient chaotic auto-association network for progressive memory recalling. Neural Netw. 2006, 19, 644–666.
20. Kantelhardt, J.W.; Zschiegner, S.A.; Koscielny-Bunde, E.; Havlin, S.; Bunde, A.; Stanley, H.E. Multifractal detrended fluctuation analysis of nonstationary time series. Phys. A Stat. Mech. Its Appl. 2002, 316, 87–114.
21. Tang, Y.; Zhu, P.F. Research of long memory, risk and efficiency of bull and bear based on CSI300 index futures: From the perspective of multifractality. Manag. Rev. 2019, 31, 59–70.
22. Zhang, S.; Fang, W. Multifractal Behaviors of Stock Indices and Their Ability to Improve Forecasting in a Volatility Clustering Period. Entropy 2021, 23, 1018.
23. Datta, R.P. Analysis of Indian Foreign Exchange Markets: A Multifractal Detrended Fluctuation Analysis (MFDFA) Approach. Int. J. Empir. Econ. 2024, 3, 2450006.
24. Freeman, W.J. Neurodynamics: An Exploration in Mesoscopic Brain Dynamics; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2000.
25. Lee, R.S.T. A transient-chaotic autoassociative network (TCAN) based on Lee oscillators. IEEE Trans. Neural Netw. 2004, 15, 1228–1243.
26. Levitan, I.B.; Kaczmarek, L.K. The Neuron: Cell and Molecular Biology; Oxford University Press: Oxford, UK, 2002.
27. Wong, M.H.; Lee, R.S.; Liu, J.N.K. Wind shear forecasting by Chaotic Oscillatory-based Neural Networks (CONN) with Lee Oscillator (retrograde signalling) model. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 2040–2047.
28. Lee, R.S.T. Chaotic type-2 transient-fuzzy deep neuro-oscillatory network (CT2TFDNN) for worldwide financial prediction. IEEE Trans. Fuzzy Syst. 2019, 28, 731–745.
29. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
30. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1724–1734.
31. Zhang, C.-X.; Li, J.; Huang, X.-F.; Zhang, J.-S.; Huang, H.-C. Forecasting stock volatility and value-at-risk based on temporal convolutional networks. Expert Syst. Appl. 2022, 207, 117951.
32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
33. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? arXiv 2022, arXiv:2205.13504.
34. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI: Washington, DC, USA, 2021; Volume 35, pp. 11106–11115.
35. Jorion, P. Value at Risk: The New Benchmark for Managing Financial Risk, 3rd ed.; McGraw-Hill: Columbus, OH, USA, 2007.
36. Kupiec, P.H. Techniques for verifying the accuracy of risk measurement models. J. Deriv. 1995, 3, 73–84.
37. Christoffersen, P.F. Evaluating interval forecasts. Int. Econ. Rev. 1998, 39, 841–862.
38. Engle, R.F.; Manganelli, S. CAViaR: Conditional autoregressive value at risk by regression quantiles. J. Bus. Econ. Stat. 2004, 22, 367–381.
39. Ljung, G.M.; Box, G.E.P. On a measure of lack of fit in time series models. Biometrika 1978, 65, 297–303.
40. Basel Committee on Banking Supervision. Amendment to the Capital Accord to Incorporate Market Risks; Bank for International Settlements: Basel, Switzerland, 1996.
41. Lopez, J.A. Methods for evaluating value-at-risk estimates. FRBNY Econ. Policy Rev. 1999, 5, 179–188.
42. Sarma, M.; Thomas, S.; Shah, A. Selection of value-at-risk models. J. Forecast. 2003, 22, 337–358.
43. Campbell, S.D. A review of backtesting and backtesting procedures. J. Risk 2005, 9, 1–17.
44. Nieppola, O. Backtesting Value-at-Risk Models; Acta Wasaensia No. 211; University of Vaasa: Vaasa, Finland, 2009.
Figure 1. Neural architecture of the Lee Oscillator with Retrograde Signaling (LORS).
Figure 2. Bifurcation diagrams for the ten LORS types.
Figure 3. Architecture of the Dynamic Fractal–Chaotic LSTM (DFC-LSTM) cell.
Figure 4. Time series plots for the S&P 500 (left column) and AAPL (right column) datasets. (a) S&P 500 log-return (r_t). (b) AAPL log-return (r_t). (c) S&P 500 realized volatility (RV). (d) AAPL realized volatility (RV). (e) S&P 500 multifractal spectrum width (Δα). (f) AAPL multifractal spectrum width (Δα).
Table 1. Parameter settings for the 10 types of Lee Oscillators used in experiments.

| Param. | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 |
|---|---|---|---|---|---|---|---|---|---|---|
| a1 | 0.0 | 0.5 | 0.5 | −0.5 | −0.9 | −0.9 | −5.0 | −5.0 | 1.0 | 3.0 |
| a2 | 5.0 | 0.55 | 0.6 | 0.55 | 0.9 | 0.9 | 5.0 | 5.0 | −1.0 | 3.0 |
| a3 | 5.0 | 0.55 | 0.55 | 0.55 | 0.9 | 0.9 | 5.0 | 5.0 | −1.0 | 3.0 |
| a4 | 1.0 | −0.5 | 0.5 | −0.5 | −0.9 | −0.9 | −5.0 | −5.0 | −1.0 | 2.0 |
| b1 | 0.0 | 0.5 | −0.5 | −0.5 | 0.9 | 0.9 | 1.0 | 1.0 | −1.0 | 0.45 |
| b2 | −1.0 | −0.55 | −0.6 | −0.55 | −0.9 | −0.9 | −1.0 | −1.0 | 2.0 | −0.45 |
| b3 | 1.0 | −0.55 | −0.55 | −0.55 | −0.9 | −0.9 | −1.0 | −1.0 | 2.0 | −0.45 |
| b4 | 0.0 | −0.5 | 0.5 | 0.5 | 0.9 | 0.9 | 1.0 | 1.0 | −1.0 | 1.0 |
| μ | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| k | 500 | 50 | 50 | 50 | 50 | 300 | 50 | 300 | 50 | 50 |
| e | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |
Table 2. Descriptive statistics and diagnostic tests for key time series.

| Asset | Variable | Mean | Std. Dev. | Max | Min | Skewness | Kurtosis | JB Stat. | ADF Stat. | KPSS Stat. | ARCH-LM Stat. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S&P 500 | Log-return (r_t) | 0.0455 | 1.0795 | 8.9683 | −12.7652 | −0.7651 | 14.1626 | 30,463.51 *** | −13.15 *** | 0.0262 | 1352.02 *** |
| S&P 500 | Realized vol. (RV) | 0.6021 | 1.5651 | 40.7336 | 0.0088 | 13.4219 | 247.4446 | 9.30 × 10⁶ *** | −8.20 *** | 0.2758 | 2129.69 *** |
| S&P 500 | Delta alpha (Δα) | 0.9612 | 0.4303 | 3.2802 | 0.1179 | 0.8126 | 1.5395 | 752.38 *** | −4.88 *** | 0.3370 | 3444.55 *** |
| AAPL | Log-return (r_t) | 0.0843 | 1.7732 | 11.3081 | −13.7713 | −0.2515 | 5.7126 | 4595.97 *** | −18.62 *** | 0.0482 | 367.29 *** |
| AAPL | Realized vol. (RV) | 1.7135 | 2.7526 | 49.1503 | 0.0856 | 9.4176 | 126.2427 | 2.28 × 10⁶ *** | −9.98 *** | 0.4247 * | 783.30 *** |
| AAPL | Delta alpha (Δα) | 1.0840 | 0.3014 | 2.4594 | 0.2631 | −0.0253 | 0.4327 | 26.53 *** | −4.22 *** | 0.4128 * | 3129.08 *** |

Note: *** and * denote rejection of the null hypothesis at the 1% and 10% significance levels, respectively.
Table 3. Ablation study results for S&P 500 at α = 0.05.

| Model | VR | UC p-val | CC p-val | DQ p-val | L-B p-val | Basel Zone | RQL Loss | FS Loss |
|---|---|---|---|---|---|---|---|---|
| Standard LSTM (Baseline) | 0.0409 | 0.2517 | 0.1742 | 0.0729 | 0.0046 | Green | 0.0974 | 0.0567 |
| (A) DFC-LSTM (Gate-Only) | 0.0522 | 0.7908 | 0.7261 | 0.0443 | 0.0035 | Green | 0.1281 | 0.0760 |
| (B) Chaotic-LSTM (T10) | 0.0522 | 0.7908 | 0.7261 | 0.0232 | 0.0001 | Green | 0.1257 | 0.0737 |
| (C) DFC-LSTM (Full, T10) | 0.0494 | 0.9381 | 0.6475 | 0.1971 | 0.0349 | Green | 0.1203 | 0.0711 |

Note: L-B p-val is for the Ljung-Box Q(10) statistic. p-values below 0.05 indicate failure to pass the test at the 5% significance level.
Table 4. Ablation study results for AAPL at α = 0.05.

| Model | VR | UC p-val | CC p-val | DQ p-val | L-B p-val | Basel Zone | RQL Loss | FS Loss |
|---|---|---|---|---|---|---|---|---|
| Standard LSTM (Baseline) | 0.0409 | 0.2692 | 0.1565 | 0.3298 | 0.0216 | Green | 0.1066 | 0.0660 |
| (A) DFC-LSTM (Gate-Only) | 0.0379 | 0.1362 | 0.3291 | 0.5586 | 0.2560 | Green | 0.1024 | 0.0648 |
| (B) Chaotic-LSTM (T6) | 0.0530 | 0.7234 | 0.6715 | 0.5132 | 0.1132 | Green | 0.1397 | 0.0869 |
| (C) DFC-LSTM (Full, T1) | 0.0485 | 0.8576 | 0.5406 | 1.0000 | 0.2415 | Green | 0.1256 | 0.0774 |

Note: L-B p-val is for the Ljung-Box Q(10) statistic. p-values below 0.05 indicate failure to pass the test at the 5% significance level.
Table 5. Full out-of-sample VaR backtesting results for S&P 500 at α = 0.05.

| Model | VR | UC p-val | CC p-val | DQ p-val | L-B p-val | Basel Zone | RQL Loss | FS Loss |
|---|---|---|---|---|---|---|---|---|
| DFC-LSTM (T10) | 0.0494 | 0.9381 | 0.6475 | 0.1971 | 0.0349 | Green | 0.1203 | 0.0711 |
| GARCH-LSTM | 0.0465 | 0.6694 | 0.4946 | 0.0946 | 0.0065 | Green | 0.1144 | 0.0680 |
| TCN | 0.0437 | 0.4337 | 0.3205 | 0.0290 | 0.0031 | Green | 0.1057 | 0.0621 |
| Standard LSTM | 0.0409 | 0.2517 | 0.1742 | 0.0729 | 0.0046 | Green | 0.0974 | 0.0567 |
| Standard GRU | 0.0381 | 0.1292 | 0.2131 | 0.0453 | 0.0013 | Green | 0.0903 | 0.0524 |
| Transformer (Tuned) | 0.0606 | 0.2074 | 0.4377 | 0.0925 | 0.0023 | Green | 0.1423 | 0.0818 |
| Informer (Tuned) | 0.0748 | 0.0047 | 0.0059 | 0.0617 | 0.0093 | Yellow | 0.1688 | 0.0942 |

Note: p-values below 0.05 indicate failure to pass the test at the 5% significance level.
Table 6. Full out-of-sample VaR backtesting results for AAPL at α = 0.05.

| Model | VR | UC p-val | CC p-val | DQ p-val | L-B p-val | Basel Zone | RQL Loss | FS Loss |
|---|---|---|---|---|---|---|---|---|
| DFC-LSTM (T1) | 0.0485 | 0.8576 | 0.5406 | 1.0000 | 0.2415 | Green | 0.1256 | 0.0774 |
| GARCH-LSTM | 0.0455 | 0.5866 | 0.3784 | 0.6519 | 0.1193 | Green | 0.1194 | 0.0742 |
| TCN | 0.0470 | 0.7183 | 0.4624 | 0.8539 | 0.1759 | Green | 0.1270 | 0.0803 |
| Standard LSTM | 0.0409 | 0.2692 | 0.1565 | 0.3298 | 0.0216 | Green | 0.1066 | 0.0660 |
| Standard GRU | 0.0470 | 0.7183 | 0.4624 | 0.1250 | 0.1759 | Green | 0.1221 | 0.0754 |
| Transformer (Tuned) | 0.0939 | 0.0000 | 0.0000 | 0.0001 | 0.0018 | Red | 0.2793 | 0.1856 |
| Informer (Tuned) | 0.0545 | 0.5972 | 0.3418 | 0.1435 | 0.1053 | Green | 0.1458 | 0.0914 |

Note: p-values below 0.05 indicate failure to pass the test at the 5% significance level.