An Improved GRU Financial Time Series Prediction Model

Li, Yong

doi:10.3390/fractalfract10040227

Business School, China University of Political Science and Law, Beijing 100085, China

Fractal Fract. 2026, 10(4), 227; https://doi.org/10.3390/fractalfract10040227

Submission received: 15 February 2026 / Revised: 21 March 2026 / Accepted: 27 March 2026 / Published: 28 March 2026

(This article belongs to the Special Issue Multifractal Analysis and Complex Systems)

Abstract

Forecasting financial time series (FTS) is essential for analyzing and understanding the dynamics of financial markets. Traditional recurrent neural network (RNN) models often achieve low prediction accuracy on non-stationary data with abrupt changes, because their gating mechanisms struggle to capture the evolving trends in FTS. This paper introduces variational mode decomposition (VMD) and multifractal analysis to enhance the gating mechanism of the gated recurrent unit (GRU). By quantifying the changing characteristics of FTS, the proposed model dynamically adjusts the gating weights. In addition, a state fusion strategy is employed to improve the utilization of historical information. Experiments are conducted on daily data of the SSE 50, CSI 300, and CSI 1000 indices from 4 January 2002 to 26 December 2025. The results demonstrate that, compared with traditional models, the proposed model better captures the evolving characteristics of FTS and achieves higher prediction accuracy.

Keywords:

multifractal characteristics; financial time series; gated recurrent unit; attention mechanism; forecasting

1. Introduction

Mining the critical information embedded in financial time series (FTS) and constructing effective predictive models is key to analyzing and understanding the dynamics of financial markets. However, this task is challenging because FTS data are nonlinear, non-stationary, and serially correlated. With the advancement of artificial intelligence, training models on historical data and other relevant features provides a robust approach to forecasting future financial market trends [1]. Current research on FTS forecasting, in both academia and industry, falls primarily into two categories. The first comprises deep learning prediction models built on neural networks. Recurrent neural network (RNN) models, endowed with memory, fit nonlinear relationships well [2]. Nevertheless, when learning long sequences, standard RNNs are prone to vanishing and exploding gradients, which hinder their ability to capture long-span nonlinear dependencies [3]. To address this issue, Hochreiter and Schmidhuber [4] proposed the long short-term memory (LSTM) model. The LSTM introduces a gating mechanism that uses activation functions and the Hadamard product to control information flow, thereby strengthening its ability to represent long sequences [5]. Numerous researchers have achieved varying degrees of success in forecasting FTS with LSTM models or their variants [6,7,8]. However, iteratively adjusting the number and combination of parameters within the LSTM framework can be time-consuming and can degrade its overall performance [9,10]. Cho et al. [11] proposed the gated recurrent unit (GRU), which merges the LSTM's forget and input gates into a single update gate, thereby reducing the number of trainable parameters, speeding up convergence, and improving computational efficiency. This simplification also lowers the risk of overfitting, especially when time series data are limited. Particularly for data containing substantial noise and outliers, the streamlined structure of the GRU offers better regularization effects [12].

The second category comprises Transformer models built on attention mechanisms. The Transformer uses self-attention to capture global dependencies among sequence elements directly and multi-head attention to extract correlational information from different representation subspaces. Its purely feedforward architecture permits a high degree of parallelization, which substantially improves training efficiency [13]. In addition, its stacked multi-layer encoder–decoder structure provides powerful hierarchical feature extraction [14]. Yao et al. [15] demonstrate that the parallel processing of multiple attention functions through the multi-head attention mechanism in Transformers can accurately capture the temporal evolution of extreme risks in global stock markets. Bolchini et al. [16] indicate that Transformer models, by leveraging high-dimensional embedding algorithms and attention mechanisms, can identify market noise, such as irrational trading behavior. Furthermore, through an analysis of the model's information-mining mechanism, their study helps explain the past "disappearance of the monthly momentum effect" in the A-share market. However, some scholars [17,18] point out that the Transformer architecture has limitations in computational efficiency, model structure, training requirements, and adaptability to specific tasks and data; in particular, it is prone to overfitting when a task has strong inherent sequential-structure priors [19]. In scenarios with limited data, LSTM and GRU often outperform Transformer models [20].

Financial markets are complex nonlinear dynamic systems, shaped by factors ranging from macroeconomic conditions and political events to investor expectations. This complexity arises because FTS evolve through alternating periods of moderate fluctuation and drastic booms or busts. During transitions between bull and bear markets or at the onset of structural trends, price levels may undergo systematic increases or decreases, resulting in trend mutations. This inherent nature makes FTS exhibit stronger volatility and randomness than other natural time series, such as water flow variations or audio signals. Owing to the intense nonlinear effects in the evolution of FTS, the internal gating mechanisms of existing predictive models struggle to effectively capture changing trends, performing particularly poorly on segments characterized by abrupt and drastic changes [21]. According to existing literature, no single model or set of indicators has yet proven to be a completely reliable forecasting tool. To address these challenges, this study proposes VMD-MF-GRU, an improved gated recurrent unit model that integrates variational mode decomposition (VMD), multifractal analysis (MF), and a state fusion strategy. By optimizing the traditional GRU gating mechanism, the proposed model enhances adaptive learning of abrupt-change features, thereby improving the accuracy and stability of FTS forecasting.

FTS are typically a mixture of long-term trends, medium-term cycles, and short-term fluctuations [22], whose driving mechanisms and statistical properties vary significantly. Since employing a single model to capture all these patterns simultaneously can easily lead to overfitting or underfitting, we adopt the VMD algorithm—an effective denoising method [23]—to separate the multi-scale complex modes contained within the series. After decomposition, the relatively stable trend and cyclical components, which exhibit stronger regularity, are extracted and modeled first. The randomness in the residual is then processed separately. This sequential approach helps prevent noise from interfering with the overall model, allowing predictions to focus more effectively on the potentially predictable signals underlying the series.

FTS are inherently characterized by time-varying complexity, a property that can be quantified effectively with multifractal analysis [24]. Specifically, the multifractal spectrum width measures how unevenly the probability measure is distributed across scales, thereby reflecting the complexity and non-stationarity of the system at a given point in time. A wider spectrum indicates greater unevenness in local fluctuation structures, suggesting a tendency toward disorder and instability [25]. During extreme events such as financial crises or market crashes, the spectrum width often expands significantly, signaling heightened heterogeneity and irregularity of volatility [26]. This time-varying nature makes the multifractal spectrum width a valuable indicator of the local dynamics of FTS.

In the GRU, the update gate determines the proportion of the previous hidden state that flows into the current state, thereby controlling how much historical information the model retains. In traditional GRU architectures, the update gate weights are fixed after training, which limits the model's ability to adapt to the evolving complexity of FTS. To address this limitation, recent research has explored adaptive mechanisms for GRU gating structures. For instance, Wang et al. [27] introduced an optimized gated recurrent unit that enhances information processing and learning efficiency through a refined unit architecture and learning mechanism. In the context of multimodal sentiment analysis, Shi [28] developed learnable gating-sensitivity parameters that dynamically adjust gate biases using real-time noise estimates, thereby improving model robustness. While the existing literature has extensively validated multifractal analysis for characterizing volatility and complexity in financial markets, the direct association between spectrum width and structural regime transitions remains an open question. In this study, we adopt a pragmatic, data-driven approach: we use the spectrum width as an adaptive signal to modulate the GRU gating mechanism, and the empirical results demonstrate the effectiveness of this heuristic strategy. Together, these findings suggest that incorporating adaptability into GRU gating mechanisms is an effective means of enhancing model performance.

Building on the above analysis, this study proposes the following theoretical hypothesis: when financial time series enter a period of high complexity—i.e., when the multifractal spectrum width is large—the structural stability of long-term historical patterns diminishes, and their predictive power for future states weakens accordingly. In such cases, the model should reduce its reliance on long-range historical information and instead focus more on local, recent data features to enhance its adaptability to current structural changes. The validity of this hypothesis is supported on two levels. First, empirical studies have shown that multifractal characteristics are closely associated with market states, with the spectrum width widening significantly during crises—a pattern that reflects structural changes in the data-generating process. Second, the update gate in GRU inherently governs the retention of historical information; introducing an adaptive adjustment mechanism tied to data complexity enables the model’s memory strength to align with the intrinsic dynamic characteristics of the data.

To realize this dynamic adjustment mechanism, the proposed VMD-MF-GRU model incorporates an adaptive update gate control strategy based on the multifractal spectrum width. Specifically, the multifractal spectrum width at the current time step is passed through an activation function to dynamically adjust the weight matrix of the update gate, allowing the model to adaptively compress or expand the retention of historical information in response to real-time changes in data complexity. When the spectrum width is large—indicating high complexity—the update gate is suppressed, and the model reduces its reliance on long-term memory. Conversely, when the spectrum width is small—indicating low complexity—the update gate operates at its conventional level, enabling the model to fully leverage historical information for stable prediction. This design directly links the temporal structural instability captured by the multifractal spectrum width to the adaptive regulation of the GRU memory mechanism, endowing the model with the theoretical capacity to dynamically adjust memory strength according to the intrinsic state of the data. As such, it provides a clear theoretical foundation for improving predictive performance on non-stationary financial time series.

To validate the proposed approach, this study uses China’s major stock market indices—the SSE 50, CSI 300, and CSI 1000—as examples. External influencing factors are systematically analyzed, and the VMD-MF-GRU model is applied to individually forecast the decomposed trend, cyclical, and random components, each characterized by distinct scales and practical implications.

2. Fundamental Theories and Methods

2.1. VMD

Influenced by factors such as economic cycles, valuation levels, event-driven dynamics, and market sentiment, stock prices evolve as cyclical, non-stationary time series, so decomposing and predicting the raw series is crucial for determining market trends. As a signal decomposition method, VMD recasts signal decomposition as a variational optimization problem. Specifically, VMD seeks a set of band-limited modes that jointly reconstruct the signal while minimizing the sum of their estimated bandwidths, thereby achieving adaptive, non-recursive signal decomposition. The VMD procedure for non-stationary and nonlinear signals is as follows:

Step 1: Construct a constrained variational model. The input signal F(t) is decomposed into K components, and the corresponding constrained variational model is:

\min_{\{\omega_{k}\},\{u_{k}\}} \left\{ \sum_{k=1}^{K} \left\| \partial_{t} \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_{k}(t) \right] e^{-j \omega_{k} t} \right\|_{2}^{2} \right\}, \quad \text{s.t.} \quad \sum_{k=1}^{K} u_{k}(t) = F(t) .

(1)

where δ(t) is the Dirac delta function, j is the imaginary unit, {u_k} and {ω_k} denote the set of K decomposed modes and their corresponding center frequencies, respectively, and ∗ denotes convolution.

Step 2: Solve the constructed constrained variational model. A quadratic penalty factor β and a Lagrange multiplier λ(t) are introduced to transform the constrained problem into an unconstrained one. The augmented Lagrangian expression is given by:

\mathcal{L}(\{\omega_{k}\},\{u_{k}\},\lambda) = \beta \sum_{k=1}^{K} \left\| \partial_{t} \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_{k}(t) \right] e^{-j \omega_{k} t} \right\|_{2}^{2} + \left\| F(t) - \sum_{k=1}^{K} u_{k}(t) \right\|_{2}^{2} + \left\langle \lambda(t), F(t) - \sum_{k=1}^{K} u_{k}(t) \right\rangle .

(2)

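The saddle point of the augmented Lagrangian is found with the alternating direction method of multipliers (ADMM), with the mode and center-frequency updates carried out in the frequency domain by means of the Fourier isometry (Parseval's theorem). The convergence criterion is: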

\sum_{k=1}^{K} \frac{\left\| u_{k}^{n+1} - u_{k}^{n} \right\|_{2}^{2}}{\left\| u_{k}^{n} \right\|_{2}^{2}} < \varepsilon

(3)

The variables u_k, ω_k, and λ are updated alternately until the condition in Equation (3) is met, where ε > 0 is a preset convergence tolerance; the resulting u_k are the optimal modes.
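The ADMM iterations can be illustrated with a short NumPy sketch. It is a simplified version, not the author's implementation: it works on the one-sided spectrum returned by `np.fft.rfft`, omits the boundary mirroring used in the original VMD algorithm, spreads the initial center frequencies uniformly, and introduces two assumed parameters, a dual step size `tau` (`tau = 0` switches the multiplier update off) and a tolerance `eps` that plays the role of ε in Equation (3); `beta` corresponds to β in Equation (2). The function name `vmd` is hypothetical.

```python
import numpy as np


def vmd(signal, K=3, beta=2000.0, tau=0.0, eps=1e-7, max_iter=500):
    """Simplified VMD via ADMM on the one-sided spectrum (no boundary mirroring).

    beta: bandwidth weight (beta in Eq. (2)); tau: dual step size (assumed, 0 disables it);
    eps: convergence tolerance of Eq. (3). Returns modes sorted by centre frequency.
    """
    f = np.asarray(signal, dtype=float)
    T = f.size
    f_hat = np.fft.rfft(f)                      # spectrum at non-negative frequencies
    freqs = np.fft.rfftfreq(T)                  # normalized frequencies in [0, 0.5]
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(K) / K              # initial centre frequencies, spread uniformly
    lam_hat = np.zeros(freqs.size, dtype=complex)
    for _ in range(max_iter):
        u_old = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Mode update: Wiener-filter-like band-pass centred on omega[k]
            u_hat[k] = (f_hat - others + lam_hat / 2) / (1 + 2 * beta * (freqs - omega[k]) ** 2)
            power = np.abs(u_hat[k]) ** 2
            # Centre-frequency update: power-weighted mean frequency of the mode
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Dual ascent on the reconstruction constraint sum_k u_k = F
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_old) ** 2, axis=1)
        scale = np.sum(np.abs(u_old) ** 2, axis=1) + 1e-12
        if np.sum(change / scale) < eps:        # stopping rule of Eq. (3)
            break
    modes = np.fft.irfft(u_hat, n=T, axis=1)    # back to real-valued time-domain modes
    order = np.argsort(omega)
    return modes[order], omega[order]
```

For a series made of two pure tones, `vmd(x, K=2)` separates them into two modes centered near the tone frequencies. How the K modes of a real index series would be grouped into trend, cyclical, and random components is not determined by this sketch.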

2.2. Multifractal Analysis

Multifractal analysis computes a probability measure and a singularity exponent for each interval of the series; the range over which the singularity exponent is distributed then characterizes the degree of variation within non-stationary data segments. The calculation steps are as follows:

Step 1: Probability measure. The financial time series is divided into fixed windows of scale s. The probability measure for the i-th window of length s is expressed as:

P_{i}(s) = \frac{u_{i}(s)}{\sum_{j=1}^{N} u_{j}(s)} .

(4)

where u_i(s) represents the sum of all sample values within the i-th interval when the window size is s, and N is the total number of time windows of size s.

Step 2: Determine the singularity exponent. The singularity exponent α characterizes the local singularity of the probability measure, which scales as P_i(s) ∝ s^{α_i}. The singularity exponent of the i-th window is:

α_{i} = \frac{\ln P_{i} (s)}{\ln s}

(5)

The value of α_i reflects the magnitude of the probability measure within the interval covered by the window. The α_i are distributed within a finite interval [α_min, α_max], with −∞ < α_min ≤ α_max < +∞, where α_min and α_max are the lower and upper bounds of the singularity of the variable's distribution, respectively. Let N_α(s) denote the number of units (or boxes) within the fractal that share the same singularity exponent α.

Step 3: Construct the multifractal spectrum and calculate the spectrum width. If the FTS exhibits multifractal characteristics, then within a certain range of scales, N_α(s) and the window size s satisfy the following relationship:

N_{\alpha}(s) \propto s^{-f(\alpha)}

(6)

where f(α) is the multifractal spectrum.

f(\alpha) = -\lim_{s \to 0} \frac{\ln N_{\alpha}(s)}{\ln s}

(7)

For the i-th interval with window size s, corresponding to the singularity set {α}, the width of the multifractal spectrum is expressed as:

\Delta \alpha_{i} = \max(\alpha_{i}) - \min(\alpha_{i}) .

(8)

Δα_i reflects the degree of inhomogeneity in the probability measure distribution of the fractal structure and is used to characterize the degree of variation in non-stationary financial time series data. A larger Δα_i indicates a more uneven distribution and more drastic changes within that interval.

By applying a fixed-size sliding window across the time series and performing multifractal feature extraction on the resulting segments, we obtain the spectrum width matrix E.

E = \begin{bmatrix} E_{1} \\ E_{2} \\ \vdots \\ E_{N} \end{bmatrix} = \begin{bmatrix} \Delta\alpha_{1}^{1} & \Delta\alpha_{1}^{2} & \cdots & \Delta\alpha_{1}^{s} \\ \Delta\alpha_{2}^{1} & \Delta\alpha_{2}^{2} & \cdots & \Delta\alpha_{2}^{s} \\ \vdots & \vdots & \ddots & \vdots \\ \Delta\alpha_{N}^{1} & \Delta\alpha_{N}^{2} & \cdots & \Delta\alpha_{N}^{s} \end{bmatrix}

(9)
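The following NumPy sketch estimates the spectrum width of Equations (4), (5), and (8) at a single box size and assembles a sliding-window matrix in the layout of Equation (9) (rows are windows, columns are box sizes, which is our reading of the superscripts). Two points the text leaves open are fixed here by assumption: the scale in Equation (5) is taken as the relative box size s/L, where L is the number of samples used, so that α is positive and a uniform measure gives α = 1; and the input must be strictly positive so that every box carries positive mass. Signed VMD components would first need some positivity transform, which is not specified. The function names are hypothetical.

```python
import numpy as np


def spectrum_width(segment, s):
    """Single-scale estimate of the spectrum width (Eqs. (4), (5), (8)) for one segment."""
    x = np.asarray(segment, dtype=float)
    if np.any(x <= 0):
        raise ValueError("the measure needs strictly positive values")
    n_boxes = x.size // s
    if n_boxes < 2:
        raise ValueError("the segment must hold at least two boxes of size s")
    u = x[: n_boxes * s].reshape(n_boxes, s).sum(axis=1)   # u_i(s): box sums
    p = u / u.sum()                                        # P_i(s), Eq. (4)
    rel_scale = 1.0 / n_boxes                              # assumed relative box size s / L
    alpha = np.log(p) / np.log(rel_scale)                  # alpha_i, Eq. (5)
    return alpha.max() - alpha.min()                       # Delta alpha, Eq. (8)


def spectrum_width_matrix(series, window, scales):
    """Sliding-window matrix E of Eq. (9): one row per window, one column per box size."""
    x = np.asarray(series, dtype=float)
    n_windows = x.size - window + 1
    E = np.empty((n_windows, len(scales)))
    for i in range(n_windows):
        segment = x[i : i + window]
        for j, s in enumerate(scales):
            E[i, j] = spectrum_width(segment, s)
    return E
```

As a check on the definitions: a constant segment puts equal mass in every box, so Δα = 0; a 16-sample segment whose second half is three times its first half, split into two boxes of size 8, gives P = (0.25, 0.75) and Δα = log2(3) ≈ 1.585.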

2.3. Improved GRU

The gating mechanism of the traditional GRU struggles to capture jumps in financial markets, leading to poor prediction performance on segments with abrupt changes. A GRU consists primarily of a reset gate and an update gate. The reset gate determines how the previous moment's information is combined with the current input, while the update gate decides how much of the previous moment's state information is retained at the current moment. Previous studies have shown that the update gate plays a particularly crucial role in regulating information flow within the GRU architecture, and its configuration significantly affects the model's ability to capture long-term dependencies [29]. However, the gating components of the traditional GRU exhibit strong randomness during training, making it difficult to learn non-stationary abrupt-change features, which adversely affects prediction performance. To address this limitation, this study improves the GRU with multifractal analysis: the multifractal spectrum width is used to quantify data variation, and a new dynamically adjusted weight structure replaces the traditional update gate. Specifically, the update gate weight matrix of the original GRU is split into two new matrices, W_z1 and W_z2; their products with the gate input are multiplied element-wise by σ(E) and (1 − σ(E)), respectively, where σ(E) is the multifractal spectrum width matrix passed through an activation function. This yields two temporary gate outputs, as illustrated in Figure 1.

In the VMD-MF-GRU, the outputs of the two temporary update gates are summed to yield the overall update gate output z_t, which is then used in the hidden-state update; all other update rules remain those of the original GRU. The new GRU hidden state at time t is computed as follows:

z_{t}^{1} = σ (W_{z 1} [h_{t - 1}, x_{t}] ⊙ σ (E) + b_{z 1})

(10)

z_{t}^{2} = σ (W_{z 2} [h_{t - 1}, x_{t}] ⊙ (1 - σ (E)) + b_{z 2})

(11)

z_{t} = z_{t}^{1} + z_{t}^{2}

(12)

r_{t} = σ (W_{r} [h_{t - 1}, x_{t}] + b_{r})

(13)

a_{t} = \tanh(W_{a} [r_{t} ⊙ h_{t - 1}, x_{t}] + b_{a})

(14)

h_{t} = h_{t - 1} ⊙ (1 - z_{t}) + z_{t} ⊙ a_{t}

(15)

y_{t} = σ (W_{a} \cdot h_{t})

(16)

where x_t is the input vector at time t; z_t and r_t are the outputs of the update gate and reset gate at time t, respectively; h_{t−1} and h_t denote the hidden states (outputs) at times t − 1 and t, respectively; σ(·) is the sigmoid activation function; E represents the multifractal spectrum width matrix of the input data; z_t^1 and z_t^2 are the outputs of the two temporary update gates; W_z1 and W_z2 are the weight matrices of the temporary update gates; W_r and W_a are the weight matrices of the reset gate and the candidate state, respectively; and b_z1, b_z2, b_r, and b_a are the bias vectors.
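One forward step of the modified cell can be sketched in NumPy, following Equations (10) to (15) as written. This is an illustration under assumptions, not the author's code: the spectrum width enters each step as a value `e_t` that broadcasts over the hidden units (for example, one scalar width per time step); the output weights of Equation (16) are given a separate hypothetical name `W_y` so that the matrix shapes agree; and `init_params` and `mf_gru_step` are hypothetical helpers with arbitrary initialization. Training by backpropagation through time is not shown.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def init_params(n_in, n_hidden, seed=0):
    """Random parameters for the sketch; names mirror Eqs. (10)-(16), W_y is hypothetical."""
    rng = np.random.default_rng(seed)
    n_cat = n_hidden + n_in  # length of [h_{t-1}, x_t]
    p = {}
    for gate in ("z1", "z2", "r", "a"):
        p["W_" + gate] = rng.normal(0.0, 0.1, size=(n_hidden, n_cat))
        p["b_" + gate] = np.zeros(n_hidden)
    p["W_y"] = rng.normal(0.0, 0.1, size=(1, n_hidden))
    return p


def mf_gru_step(x_t, h_prev, e_t, p):
    """One time step of Eqs. (10)-(16); e_t must broadcast to the hidden size."""
    hx = np.concatenate([h_prev, x_t])                        # [h_{t-1}, x_t]
    g = sigmoid(e_t)                                          # sigma(E) at step t
    z1 = sigmoid((p["W_z1"] @ hx) * g + p["b_z1"])            # Eq. (10)
    z2 = sigmoid((p["W_z2"] @ hx) * (1.0 - g) + p["b_z2"])    # Eq. (11)
    z = z1 + z2                                               # Eq. (12): values lie in (0, 2)
    r = sigmoid(p["W_r"] @ hx + p["b_r"])                     # Eq. (13)
    a = np.tanh(p["W_a"] @ np.concatenate([r * h_prev, x_t]) + p["b_a"])  # Eq. (14)
    h_t = h_prev * (1.0 - z) + z * a                          # Eq. (15)
    y_t = sigmoid(p["W_y"] @ h_t)                             # Eq. (16), separate output weights
    return h_t, y_t
```

A quick sanity check: with every weight and bias set to zero, both temporary gates output 0.5, so z_t = 1 and the hidden state equals the candidate a_t = tanh(0) = 0, while the output is sigmoid(0) = 0.5.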

To integrate multifractal characteristics dynamically into the time series modeling process, we design an adaptive adjustment mechanism. At each time step t, the pre-computed multifractal spectrum width E is fed into a learnable mapping function to generate a scaling factor σ(E). This factor dynamically adjusts the two temporary update gates within the GRU, enabling the model to modify its memory and update rates according to the current fractal complexity of the market. In this way, the spectrum width is coupled directly with the recurrent training process of the GRU.

The newly designed VMD-MF-GRU enhances the memory capacity of the update gate, enabling it to adjust its weights adaptively based on the multifractal spectrum width. As illustrated in Figure 1, the update gate controls how much of the previous moment's state information is carried into the current state, meaning that a larger update gate retains more information from the previous state. The reset gate governs how much information from the previous state is written into the current candidate state a_t: the smaller the reset gate, the less information from the previous state is incorporated.

2.4. State Fusion Strategy for VMD-MF-GRU

The hidden state of a recurrent neural network is determined not only by the input at the current time step but also by the hidden state from the previous time step. However, traditional recurrent neural networks often fail to fully account for the potential influence of earlier historical states on the current state. To enhance the network’s utilization of historical information, this study proposes an improved state fusion strategy for recurrent neural networks (see Figure 2). The core idea is to employ a one-dimensional convolutional neural network with a kernel size of 1 to fuse the hidden states at each time step, thereby strengthening the model’s ability to learn long-range dependencies within the data. In the specific implementation, the hidden state at the current moment is not only passed to the next moment but is also fused with the hidden state at the next moment. This fused representation then propagates forward through subsequent network layers as time iterates. Through this mechanism, cross-temporal fusion of hidden states is effectively achieved, enhancing the model’s capability to capture time-dependent features. The update expression for the hidden state is as follows:

\tilde{h}_{t} = \mathrm{conv}(\mathrm{cat}(\tilde{h}_{t - 1}, h_{t})) .

(17)

where h̃_t represents the output of the hidden layer at time t after the recurrent state fusion strategy is applied, cat(⋅) denotes concatenation along a specified dimension, and conv(⋅) denotes a convolution with a kernel size of 1.
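Because a kernel-size-1 convolution mixes channels without looking at neighboring positions, Equation (17) reduces to a per-step linear map on the concatenated states. The NumPy sketch below assumes that cat(⋅) stacks h̃_{t−1} and h_t along the feature (channel) axis and that the first fused state equals the first hidden state; neither choice is fixed by the text, and `fuse_states` is a hypothetical name. In a deep-learning framework the same map corresponds to a one-dimensional convolution with 2H input channels, H output channels, and kernel size 1.

```python
import numpy as np


def fuse_states(hidden_states, W, b):
    """State fusion of Eq. (17) over a sequence of GRU hidden states.

    hidden_states: array (T, H) holding h_1..h_T.
    W: (H, 2H) weights of the kernel-size-1 convolution; b: (H,) bias.
    """
    h = np.asarray(hidden_states, dtype=float)
    fused = np.empty_like(h)
    fused[0] = h[0]  # assumption: the first fused state is the first hidden state
    for t in range(1, h.shape[0]):
        stacked = np.concatenate([fused[t - 1], h[t]])  # cat(h~_{t-1}, h_t) along features
        fused[t] = W @ stacked + b  # a 1x1 convolution acts as a linear map per position
    return fused
```

The two extremes illustrate the recursion: weights that copy only the first half of the concatenation carry h̃_1 forward unchanged, while weights that copy only the second half return the plain GRU states.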

2.5. FTS Forecasting Workflow

The workflow for FTS forecasting using the proposed VMD-MF-GRU model consists of three main stages: data preprocessing, model training, and prediction. The overall process is illustrated in Figure 3. First, the VMD method is applied to decompose the original FTS into trend, cyclical, and random components. Second, multifractal algorithms are integrated to extract variation characteristics from each component, alongside a state fusion strategy that progressively integrates hidden layer state information. After training and validation, the modified GRU model forecasts each component individually. Finally, these component forecasts are superimposed to reconstruct a complete FTS prediction.
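The decompose, forecast, and superimpose flow can be summarized in a few lines. To keep the block self-contained, the trained per-component VMD-MF-GRU models are replaced by a hypothetical `naive_last_value` stand-in that repeats the last observation; any callable taking `(component, horizon)` could be plugged in instead. Both function names are illustrative.

```python
import numpy as np


def naive_last_value(component, horizon):
    """Stand-in forecaster (hypothetical): repeats the last observed value."""
    return np.full(horizon, component[-1], dtype=float)


def forecast_by_components(components, forecasters, horizon):
    """Forecast each decomposed component separately, then superimpose the forecasts."""
    if len(components) != len(forecasters):
        raise ValueError("need one forecaster per component")
    parts = [f(np.asarray(c, dtype=float), horizon) for c, f in zip(components, forecasters)]
    return np.sum(parts, axis=0)
```

Keeping one forecaster per component mirrors the design in the text, where trend, cyclical, and random components are modeled individually before their forecasts are added back together.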

3. Experimental Design

3.1. Data Sources

To validate the effectiveness of the proposed model, three stock indices (the SSE 50, CSI 300, and CSI 1000) were selected for empirical analysis. The CSI 300 is the primary research object; the prediction method and parameter settings for the other two indices are analogous to those for the CSI 300. Daily data were obtained for the period from 4 January 2002 to 26 December 2025. To evaluate the predictive performance of the proposed model, the dataset is partitioned in temporal order into training, validation, and test sets to prevent look-ahead bias. Specifically, data from 4 January 2002 to 31 July 2019 form the training set for model parameter learning. The subsequent period from 1 August 2019 to 31 December 2023 serves as the validation set for hyperparameter tuning and early stopping to mitigate overfitting. The remaining data from 1 January 2024 to 26 December 2025 are reserved as the test set for final model evaluation. This chronological split ensures that the model is evaluated on out-of-sample data that genuinely reflect unseen future periods, thereby providing a reliable assessment of its generalization capability.
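The chronological split can be expressed as date masks. The boundary dates are taken from the text; the function name `chronological_split` and the use of NumPy `datetime64` arrays are illustrative choices, and trading-calendar handling is left out.

```python
import numpy as np

# Inclusive date ranges of the three subsets, as stated in the text
SPLITS = {
    "train": ("2002-01-04", "2019-07-31"),
    "validation": ("2019-08-01", "2023-12-31"),
    "test": ("2024-01-01", "2025-12-26"),
}


def chronological_split(dates, values):
    """Split aligned dates/values into train, validation and test subsets by date."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    values = np.asarray(values)
    out = {}
    for name, (start, end) in SPLITS.items():
        mask = (dates >= np.datetime64(start)) & (dates <= np.datetime64(end))
        out[name] = values[mask]
    return out
```

Because the ranges are contiguous and non-overlapping, every observation in the sample period falls into exactly one subset, and every training observation precedes every validation and test observation.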

Following previous studies [26,27,30], this study adopts a dual criterion of "theoretical drive + statistical verification" to select influencing factors. First, based on classical asset pricing theory, we construct an initial factor pool covering four dimensions: valuation, growth, monetary policy, and sentiment. Second, through correlation analysis and univariate predictive ability tests, we eliminate redundant and ineffective factors, ultimately selecting the quantile of the price-to-earnings (PE) ratio, the GDP growth rate, the three-year interest rate, and a market sentiment index as the core predictive variables. This factor set achieves an adjusted R² of 77.85% in explaining the historical fluctuations of the cyclical component, indicating that it captures the primary sources of variation. Further robustness tests reveal that introducing additional macroeconomic variables does not significantly improve the model's predictive accuracy, which supports the effectiveness and relative completeness of the current factor set. Market sentiment is measured with China's Stock Market Investor Sentiment Index (CICSI) from Shanghai Jiao Tong University. All other data are sourced from the Wind database.

We normalize the input data to prevent neurons from saturating and losing their learning ability, a common issue with sigmoid activation functions when input values are large. Scaling the input variables into the interval [0, 1] allows the sigmoid function to operate in its effective range. The input variables are normalized as follows:

x_{i}^{*} = \frac{x_{i} - x_{\min}}{x_{\max} - x_{\min}}, i = 1, 2, \dots, n .

(18)

where x_i^* is the normalized result, ranging from 0 to 1; x_i is the value in the original time series; and x_min and x_max are the minimum and maximum values of the original time series, respectively.

The normalization method for the output variable is as follows:

y_{\mathrm{scaled}} = y \times (\max - \min) + \min .

(19)

where max and min define a range selected based on the effective operating interval of the neural network’s activation function; y_scaled is the final output data fed into the model after processing.
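Equations (18) and (19) translate directly into NumPy. The `reference` argument is an illustrative addition: by default x_min and x_max are computed over the series itself, as the text defines them, while passing only the training split is a common alternative that keeps later observations out of the scaling. The inverse transform `denormalize`, used to map forecasts back to index points, is an assumed post-processing step, and the target interval [−1, 1] in the check below is a hypothetical choice (one that would suit a tanh output layer).

```python
import numpy as np


def minmax_normalize(x, reference=None):
    """Eq. (18): map x into [0, 1] using the min and max of `reference` (default: x itself)."""
    x = np.asarray(x, dtype=float)
    ref = x if reference is None else np.asarray(reference, dtype=float)
    x_min, x_max = ref.min(), ref.max()
    if x_max == x_min:
        raise ValueError("constant reference series cannot be min-max scaled")
    return (x - x_min) / (x_max - x_min), x_min, x_max


def rescale(y, lo, hi):
    """Eq. (19): map y in [0, 1] onto [lo, hi] (lo, hi play the roles of min, max)."""
    return np.asarray(y, dtype=float) * (hi - lo) + lo


def denormalize(x_star, x_min, x_max):
    """Inverse of Eq. (18), e.g. to return forecasts to the original scale (assumed step)."""
    return np.asarray(x_star, dtype=float) * (x_max - x_min) + x_min
```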

3.2. Parameter Comparison Experiment

To comprehensively evaluate the predictive performance and generalization capability of the proposed model, we design a rigorous experimental protocol encompassing four key aspects: forecasting strategy, parameter optimization, training stopping criterion, and evaluation methodology. First, regarding the forecasting strategy, considering the time-varying nature of FTS, we adopt a rolling window forecasting approach. This strategy maintains a fixed window size on the training set and progressively slides forward to generate multiple training–testing cycles, thereby simulating real-world trading environments as closely as possible and assessing the model’s robustness under different market conditions. Second, in the model training phase, we employ a grid search approach combined with a validation set for hyperparameter optimization. Third, to prevent overfitting and reduce training time, we introduce an early stopping criterion. Specifically, during the training process, if the loss function on the validation set does not decrease for ten consecutive epochs, training is halted, and the model weights corresponding to the lowest validation loss are saved as the final model. Finally, in the model evaluation phase, we strictly adhere to an out-of-sample evaluation protocol. Throughout the hyperparameter optimization and model selection process, the test set remains completely isolated and is used only once in the final evaluation. To assess the predictive performance of the model, the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and coefficient of determination (R²) are adopted as evaluation metrics. Their calculation formulas are as follows:

M A E = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |

(20)

M A P E = \frac{1}{n} \sum_{i = 1}^{n} | \frac{y_{i} - {\hat{y}}_{i}}{y_{i}} |

(21)

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(22)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(23)

where n is the total number of observed samples, y_i is the actual value, ŷ_i is the predicted value, and ȳ is the mean of the actual sample values.
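The four evaluation metrics can be computed together as below, with MAPE returned as a fraction (matching the magnitudes reported in the text) and R² in its standard form 1 − SS_res/SS_tot. The function name `evaluate` is illustrative.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """MAE, MAPE (as a fraction), RMSE and R^2 for aligned actual/predicted values."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    err = y - y_hat
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y))  # undefined if any actual value is zero
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}
```

For actual values (1, 2, 3, 4) and predictions (1, 2, 3, 5), the single error of 1 gives MAE = 0.25, MAPE = 0.0625, RMSE = 0.5, and R² = 1 − 1/5 = 0.8.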

The hyperparameters involved in VMD-MF-GRU mainly include the number of network layers and the number of hidden layer nodes. The selection of these hyperparameters influences the training outcome. Based on empirical knowledge, the search range for the number of hidden layer nodes is set to [16, 64]. Subsequently, the optimal number of network layers and hidden layer nodes for VMD-MF-GRU is determined through comparative experiments. The number of training iterations for the model is set to 1200. Under the same training dataset, the four evaluation metrics are employed as selection criteria. The prediction results of the VMD-MF-GRU model under different hyperparameters are presented in Table 1.
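The hyperparameter search and the early-stopping rule from the experimental protocol can be sketched together. This is hypothetical scaffolding: `run_epoch` and `score` are callbacks the caller would supply (training one epoch and returning the validation loss with the current weights, or training a full configuration and returning its validation error). The candidate layer counts are not given in the text, and the node grid (16, 32, 64) is one possible discretization of the stated range [16, 64]. Treating the 1200 training iterations as a maximum number of epochs is also an assumption.

```python
def train_with_early_stopping(run_epoch, max_epochs=1200, patience=10):
    """Stop once validation loss has not decreased for `patience` consecutive epochs.

    run_epoch(epoch) -> (validation_loss, weights); the best weights are returned.
    """
    best_loss, best_weights, best_epoch = float("inf"), None, -1
    stale = 0
    for epoch in range(max_epochs):
        val_loss, weights = run_epoch(epoch)
        if val_loss < best_loss:
            best_loss, best_weights, best_epoch = val_loss, weights, epoch
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_loss, best_weights


def grid_search(score, layer_options=(1, 2, 3), node_options=(16, 32, 64)):
    """Return the (layers, nodes) pair with the lowest validation score, plus all scores."""
    results = {(n_layers, n_nodes): score(n_layers, n_nodes)
               for n_layers in layer_options for n_nodes in node_options}
    best = min(results, key=results.get)
    return best, results
```

With a patience of 10, a run whose validation loss bottoms out at epoch 2 and then stalls is stopped after epoch 12, and the weights from epoch 2 are kept.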

Table 1 shows that the model achieves its optimal predictive performance when the number of network layers is 2 and the number of nodes in each hidden layer is 32. Under this configuration, the model achieves strong performance across all metrics. Specifically, the MAE is 0.4583, the RMSE is 0.5235, the MAPE is 0.0198, and the R² is 0.7523. The prediction error initially decreases and then increases as the number of nodes in each hidden layer grows. This trend suggests that an overly simple network structure may lead to underfitting, while an excessively complex one can result in overfitting. When the number of nodes in each hidden layer is held constant, increasing the number of network layers tends to degrade model performance.

To further analyze the impact of economic cycles, valuation levels, monetary policy, and market sentiment on stock prices, we configure different input window sizes to calculate the RMSE of the VMD-MF-GRU’s predictions for the cyclical components obtained from sequence decomposition. The results are presented in Table 2.

As shown in Table 2, the RMSE is minimized when the window size is set to 9 months. Therefore, the time step for training is set to 9. MSE is adopted as the loss function, tanh is used as the activation function for the output layer, and the Adam method is employed as the parameter optimization algorithm.
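The selected time step can be turned into supervised training samples by sliding a fixed-length window along a component series. The one-step-ahead target (`horizon=1`) is an assumption, because the forecast horizon is not stated; each row corresponds to one observation at whatever frequency the series is sampled. The function name `make_windows` is illustrative.

```python
import numpy as np


def make_windows(series, time_step=9, horizon=1):
    """Build (input window, target) pairs for supervised training.

    Each row of X holds `time_step` consecutive observations; y holds the value
    `horizon` steps after the end of that window.
    """
    x = np.asarray(series, dtype=float)
    n = x.size - time_step - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested window")
    X = np.stack([x[i : i + time_step] for i in range(n)])
    y = x[time_step + horizon - 1 : time_step + horizon - 1 + n]
    return X, y
```

For a series 0, 1, ..., 11 and a time step of 9, this yields three windows, the first covering 0 to 8, with targets 9, 10, and 11.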

3.3. Decomposition of FTS Data and Its Corresponding Multifractal Spectrum Characteristics

Figure 4 presents the CSI 300 time series curve, along with its trend, cyclical, and random components as derived from VMD. The trend component primarily represents the long-term direction of stock prices, usually associated with company fundamentals such as sustained earnings growth, industry position consolidation, management efficiency improvements, or a positive long-term macroeconomic outlook. It reflects the core evolution path of asset value over several years or longer spans, stripped of short- to medium-term fluctuations. The cyclical component captures the medium-term fluctuations around the trend. These fluctuations often exhibit a certain degree of cyclicity and are related to macroeconomic cycles, industry boom–bust cycles, market sentiment cycles, or the allocation cycles of large capital. The random component is typically driven by unpredictable, instantaneous information shocks, such as unexpected company news, minor policy adjustments, market rumors, local liquidity changes, or pure trading noise.
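For readers who want to reproduce a decomposition of this kind, the sketch below is a compact NumPy port of the standard ADMM updates of variational mode decomposition (mirror extension, Wiener-filter mode updates, spectral-centroid frequency updates). The paper does not report its VMD settings here, so `K=3`, `alpha=2000`, and `tau=0` are assumptions, and `split_components`, which labels the lowest-frequency mode as the trend and the highest as the random component, is a hypothetical mapping rather than the author's procedure.

```python
import numpy as np


def vmd(signal, K=3, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Variational mode decomposition via the standard ADMM updates.

    Returns (modes, omega): modes has shape (K, n) where n is the signal length
    (an odd final sample is dropped), and omega holds each mode's centre
    frequency in cycles per sample.
    """
    f = np.asarray(signal, dtype=float)
    if f.size % 2:
        f = f[:-1]                                    # work with an even length
    half = f.size // 2
    # Mirror-extend both ends to reduce boundary effects.
    f_mirr = np.concatenate([f[:half][::-1], f, f[-half:][::-1]])
    T = f_mirr.size
    freqs = np.arange(T) / T - 0.5                    # matches fftshift ordering
    f_hat_plus = np.fft.fftshift(np.fft.fft(f_mirr))
    f_hat_plus[: T // 2] = 0                          # keep the analytic (positive) half
    u_hat = np.zeros((T, K), dtype=complex)
    omega = 0.5 / K * np.arange(K)                    # uniformly spread initial centres
    lam = np.zeros(T, dtype=complex)                  # Lagrange multiplier
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            residual = f_hat_plus - (u_hat.sum(axis=1) - u_hat[:, k]) - lam / 2
            # Wiener-filter update of mode k around its current centre frequency.
            u_hat[:, k] = residual / (1.0 + alpha * (freqs - omega[k]) ** 2)
            power = np.abs(u_hat[T // 2:, k]) ** 2
            omega[k] = freqs[T // 2:] @ power / power.sum()  # spectral centroid
        lam = lam + tau * (u_hat.sum(axis=1) - f_hat_plus)   # dual ascent (off if tau=0)
        if np.sum(np.abs(u_hat - u_prev) ** 2) / T < tol:
            break
    # Rebuild Hermitian spectra, return to the time domain, and drop the mirrors.
    full = np.zeros_like(u_hat)
    full[T // 2:] = u_hat[T // 2:]
    full[np.arange(T // 2, 0, -1)] = np.conj(u_hat[T // 2:])
    full[0] = np.conj(full[-1])
    modes = np.real(np.fft.ifft(np.fft.ifftshift(full, axes=0), axis=0)).T
    return modes[:, half:half + f.size], omega


def split_components(modes, omega):
    """Hypothetical grouping: lowest-frequency mode = trend, highest = random,
    all modes in between summed into the cyclical component."""
    order = np.argsort(omega)
    trend = modes[order[0]]
    random_part = modes[order[-1]]
    cyclical = modes[order[1:-1]].sum(axis=0)
    return trend, cyclical, random_part
```

Because the three groups are sums of modes, adding them back together approximately recovers the input series, which is what allows the additive reconstruction used later for the final forecast.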

Figure 5a presents the cyclical component of the CSI 300 index alongside the time-varying spectrum width, which quantifies index changes through multifractal analysis. The two curves move largely in step: in segments with abrupt changes the spectrum width is large, approaching 1, whereas in segments with relatively stable fluctuations it varies little, staying within [0, 0.3]. The spectrum width therefore characterizes the cyclical variation of the FTS effectively. Figure 5b displays the random component and its corresponding spectrum width over time. Where the random component fluctuates sharply, the spectrum width approaches 1; in relatively stable segments, it stays within [0, 0.2]. The spectrum width thus also captures the stochastic variation induced by external random factors. Consequently, incorporating multifractal features into the GRU allows it to characterize the variation and trends of FTS data, and using the spectrum width to adaptively adjust the update gate weights is expected to enhance prediction accuracy.
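A time-varying spectrum width such as the one in Figure 5 can be estimated on sliding windows. The sketch below uses multifractal detrended fluctuation analysis (MF-DFA), one common estimator; whether Section 2.2 uses exactly this estimator is an assumption, and the window length, step, scales, and q range are hypothetical choices. For brevity the segments are taken from the start of each window only.

```python
import numpy as np


def mfdfa_spectrum_width(x, q=np.arange(-5.0, 6.0), scales=(16, 32, 64, 128), order=1):
    """Width (alpha_max - alpha_min) of the singularity spectrum, estimated with MF-DFA."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())                  # step 1: cumulative profile
    log_fq = np.empty((len(scales), len(q)))
    for i, s in enumerate(scales):
        n_seg = profile.size // s
        segs = profile[: n_seg * s].reshape(n_seg, s)
        idx = np.arange(s)
        # Step 2: detrend each segment with a polynomial fit and keep its variance.
        var = np.array([np.mean((seg - np.polyval(np.polyfit(idx, seg, order), idx)) ** 2)
                        for seg in segs])
        # Step 3: q-th order fluctuation function (geometric mean when q == 0).
        for j, qq in enumerate(q):
            if np.isclose(qq, 0.0):
                log_fq[i, j] = 0.5 * np.mean(np.log(var))
            else:
                log_fq[i, j] = np.log(np.mean(var ** (qq / 2.0))) / qq
    # Step 4: generalized Hurst exponents h(q) are the log-log slopes over scales.
    h = np.polyfit(np.log(scales), log_fq, 1)[0]
    # Step 5: Legendre transform, tau(q) = q h(q) - 1 and alpha = d tau / d q.
    tau = q * h - 1.0
    alpha = np.gradient(tau, q)
    return float(alpha.max() - alpha.min())


def rolling_spectrum_width(x, window=512, step=64, **kwargs):
    """Spectrum width on sliding windows; returns (window end indices, widths)."""
    x = np.asarray(x, dtype=float)
    ends = np.arange(window, x.size + 1, step)
    widths = np.array([mfdfa_spectrum_width(x[e - window:e], **kwargs) for e in ends])
    return ends, widths
```

Applied to a decomposed component, `rolling_spectrum_width` yields a curve that can be plotted against the component in the manner of Figure 5.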

3.4. Prediction Results of the VMD-MF-GRU

To verify the effectiveness and practicality of the VMD-MF-GRU, its prediction results are compared and analyzed against those of the LSTM, GRU, and Transformer models.

3.4.1. Frequency-Division Prediction of FTS

For the prediction of the cyclical component, valuation levels, economic cycles, monetary policy, and market sentiment indices are selected as influencing factors. For the prediction of the random component, changes in monetary policy and market sentiment indices are used as influencing factors. The prediction results for the decomposed components of the CSI 300 time series are shown in Figure 6.

For both the trend and cyclical components, the VMD-MF-GRU model outperforms the Transformer, LSTM, and GRU models. We attribute this improvement to the introduction of multifractal features, which enhances the memory capacity of the update gate. Moreover, because the influencing factors are strongly correlated with the target variable, adaptively adjusting the gate weights according to the spectrum width lets the model learn the abrupt changes and time lags induced by valuation levels, economic cycles, monetary policy, and market sentiment. Consequently, prediction performance in segments with abrupt changes improves markedly, as illustrated in Figure 7.
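To make the idea of a width-dependent update gate concrete, the sketch below shows one GRU step in which the spectrum width shifts the update-gate pre-activation. This is a hypothetical modulation chosen for illustration (the additive term `gamma * width_t`, the parameter layout, and `init_params` are all our assumptions); the paper's exact formulation is given in Section 2.3 and Figure 1. The step uses the convention h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t, so a larger width lets more of the new candidate state through. The hidden size of 32 follows Table 1.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def gru_step_with_width(x_t, h_prev, width_t, params, gamma=1.0):
    """One GRU step whose update gate is shifted by the spectrum width (hypothetical)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    # Hypothetical modulation: add gamma * width to the update-gate pre-activation.
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz + gamma * width_t)
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)   # candidate state
    h_t = (1.0 - z) * h_prev + z * h_tilde                 # blend old and new state
    return h_t, z


def init_params(n_in, n_hidden, seed=0):
    """Small random weights for the update gate, reset gate, and candidate (hypothetical)."""
    rng = np.random.default_rng(seed)
    params = []
    for _ in range(3):
        params += [rng.normal(0.0, 0.1, size=(n_hidden, n_in)),
                   rng.normal(0.0, 0.1, size=(n_hidden, n_hidden)),
                   np.zeros(n_hidden)]
    return tuple(params)


params = init_params(n_in=4, n_hidden=32)
x_t, h0 = np.ones(4), np.zeros(32)
_, z_calm = gru_step_with_width(x_t, h0, 0.1, params)   # stable segment
_, z_jump = gru_step_with_width(x_t, h0, 0.9, params)   # abrupt-change segment
```

With a positive `gamma`, `z_jump` exceeds `z_calm` element by element, so segments with a wide spectrum update the hidden state more aggressively.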

For the random component, changes in the market sentiment index are the dominant influencing factor. Because this factor is highly random, it interferes with the gating unit's memory of long-term sequential features, so the gains on the random component are limited. The prediction accuracy and error statistics of the different models for each decomposed component are presented in Table 3; for the random component, the VMD-MF-GRU outperforms the Transformer on all four metrics but does not surpass the GRU (MAE 0.2258 vs. 0.2225; R² 0.6595 vs. 0.6958).

Table 3 indicates that, for the trend and cyclical components, the VMD-MF-GRU model achieves the smallest prediction errors and the highest coefficient of determination (R²) of the four models. Compared with the LSTM, GRU, and Transformer models, it reduces the MAE by 10.52%, 6.74%, and 12.53%, respectively, for the trend component and by 14.34%, 10.53%, and 22.06%, respectively, for the cyclical component. These reductions indicate that the average gap between predicted and true values has narrowed, i.e., the forecasts are more consistently accurate. In terms of RMSE, the reductions are 11.39%, 7.94%, and 15.98% for the trend component and 12.56%, 8.81%, and 26.02% for the cyclical component. Because RMSE weights large errors more heavily than MAE, these reductions suggest that the model makes fewer large errors during extreme fluctuations. The MAPE decreases by 10.80%, 8.73%, and 18.51% for the trend component and by 11.83%, 9.68%, and 27.93% for the cyclical component. Since MAPE is scale-free, these gains in relative accuracy indicate that the improvement holds across samples of different magnitudes. Finally, R² increases (relative to each benchmark) by 18.48%, 8.89%, and 23.56% for the trend component and by 18.79%, 13.26%, and 25.91% for the cyclical component, showing that the VMD-MF-GRU model explains a larger share of the variation in both components.
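The four metrics and the percentage changes quoted above can be computed as in the sketch below. The conventions are assumptions where the text is silent: MAPE is returned as a fraction, R² is 1 − SS_res/SS_tot, and each percentage is the change relative to the benchmark value. Applied to the Table 3 entries, `relative_change(1.1870, 1.0518)` gives about −11.39 (the RMSE reduction versus LSTM for the trend component) and `relative_change(0.7079, 0.8387)` gives about 18.48 (the corresponding R² gain).

```python
import numpy as np


def forecast_metrics(y_true, y_pred):
    """MAE, RMSE, MAPE (as a fraction), and R² for one forecast series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))          # never smaller than MAE for the same errors
    mape = np.mean(np.abs(err / y_true))       # undefined if any true value is zero
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


def relative_change(baseline, proposed):
    """Percentage change of `proposed` relative to `baseline` (negative = reduction)."""
    return 100.0 * (proposed - baseline) / baseline
```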

To test whether the performance improvements of the VMD-MF-GRU model are statistically significant, we employ the Diebold–Mariano (DM) test, whose null hypothesis (H₀) is that the two competing models have equal predictive accuracy. The DM statistic is calculated under both the squared loss (to assess the RMSE reduction) and the absolute loss (to assess the MAE reduction). The results are presented in Table 4.

Table 4 shows that the DM statistic is negative in every configuration, with p-values of 0.0001, well below the 0.01 significance level. H₀ is therefore rejected throughout: the negative statistics indicate that the VMD-MF-GRU incurs the smaller forecast loss against each benchmark model, and this improvement in predictive accuracy is unlikely to be due to chance.
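The DM statistic described above can be sketched as follows. This is a minimal version of the standard test with an asymptotic normal p-value, assuming one-step-ahead forecasts (`h=1`) and no Harvey–Leybourne–Newbold small-sample correction; the text does not state which variant produced Table 4. The loss differential is defined as the proposed model's loss minus the benchmark's, so a negative statistic favours the proposed model.

```python
import math

import numpy as np


def diebold_mariano(e_model, e_bench, loss="squared", h=1):
    """Diebold–Mariano test of equal predictive accuracy.

    d_t = L(e_model_t) - L(e_bench_t); a negative statistic means the first model
    has the smaller average loss. Returns (DM statistic, two-sided p-value) based
    on the asymptotic standard normal distribution.
    """
    e1 = np.asarray(e_model, dtype=float)
    e2 = np.asarray(e_bench, dtype=float)
    if loss == "squared":
        d = e1 ** 2 - e2 ** 2
    elif loss == "absolute":
        d = np.abs(e1) - np.abs(e2)
    else:
        raise ValueError("loss must be 'squared' or 'absolute'")
    T = d.size
    d_bar = d.mean()
    # Long-run variance of d: autocovariances up to lag h-1 (rectangular window).
    dc = d - d_bar
    lrv = np.dot(dc, dc) / T
    for k in range(1, h):
        lrv += 2.0 * np.dot(dc[k:], dc[:-k]) / T
    dm = d_bar / math.sqrt(lrv / T)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|dm|))
    return float(dm), p_value
```

For `h > 1` the rectangular-window variance can turn negative in small samples, in which case a Newey–West weighting is the usual remedy.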

To quantify the incremental contribution of the state fusion strategy, we conducted an ablation study in which a variant without the fusion module serves as the control. The variant uses the same data partitioning, hyperparameter configuration, and training strategy as the full model, so that the comparison is fair. The experimental results are presented in Table 5.

Table 5 demonstrates that removing the state fusion module noticeably degrades the predictive performance of the VMD-MF-GRU model. Compared with the full model, the variant without state fusion shows MAE increases of 6.61%, 10.78%, and 2.79% for the trend, cyclical, and random components, respectively, RMSE increases of 7.61%, 16.38%, and 12.74%, and R² decreases of 3.61%, 2.88%, and 3.09%. These results support the central role of the proposed state fusion strategy in capturing temporal dependencies and indicate that the performance gains stem from this module rather than from incidental factors.

3.4.2. FTS Forecasting

Following the additive time series model, the predicted values of the three components are summed to reconstruct the final FTS forecast. Figure 8 compares the CSI 300 predictions of the different models: the VMD-MF-GRU predictions are the closest to the actual values and best follow the underlying trend of the data.
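The additive reconstruction amounts to an element-wise sum of the three component forecasts, as in the minimal sketch below. The function name `reconstruct_forecast` and the numbers in the usage line are hypothetical and serve only to show the arithmetic.

```python
import numpy as np


def reconstruct_forecast(trend_pred, cyclical_pred, random_pred):
    """Additive model: final forecast = trend + cyclical + random component forecasts."""
    parts = [np.asarray(p, dtype=float) for p in (trend_pred, cyclical_pred, random_pred)]
    if len({p.shape for p in parts}) != 1:
        raise ValueError("component forecasts must share one shape")
    return parts[0] + parts[1] + parts[2]


# Hypothetical two-day component forecasts for an index level.
final = reconstruct_forecast([4000.0, 4010.0], [12.0, -3.0], [0.5, -1.5])
```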

To further assess the generalizability of the VMD-MF-GRU model, we extended our analysis to include two additional major indices—the SSE 50 and CSI 1000—alongside the CSI 300. Together, these three indices represent the most important and representative broad-based indices in China’s A-share market, capturing different tiers in terms of market capitalization, industry coverage, and stylistic characteristics. Table 6 reports the prediction accuracy of each model across these indices.

Table 6 shows that the VMD-MF-GRU model achieves the highest prediction accuracy of the four models on all three indices. LSTM and GRU perform similarly, while the Transformer performs least favorably. A possible explanation is that the design of the Transformer does not align well with the characteristics of financial market data. Self-attention relates every element of the input sequence to every other element through learned similarity scores and thus processes the sequence as a whole. This approach is highly effective for structured, strongly correlated data such as language, but it may be less robust for FTS, which typically exhibit high noise, non-stationarity, short-range dependencies, and limited effective data. On such data, the Transformer's global processing can become a drawback: it tends to overfit noise or spurious correlations and fails to capture the locally informative signals that carry genuine predictive power. In contrast, the incremental, localized processing of recurrent networks may be better suited to the complex, ambiguous, and noisy character of financial data. The proposed VMD-MF-GRU uses its gating mechanism to extract historical information from the influencing factors and the frequency-decomposed components; the multifractal spectrum width further enhances its ability to learn abrupt changes, and the state fusion mechanism improves the extraction of temporal dependencies. Consequently, the experimental results show that the VMD-MF-GRU model captures the nonlinear variation of FTS more effectively and delivers more accurate forecasts than the benchmark models.

4. Conclusions

The gating mechanism of traditional memory network models is insufficient for extracting and learning abrupt changes in FTS, which leads to poor prediction performance in segments with drastic booms and busts. Based on a systematic analysis of the factors influencing FTS variations, this study proposes the VMD-MF-GRU model. First, VMD decomposes FTS data into trend, cyclical, and random components; extracting and separating the multi-scale modes embedded in the series in this way improves the prediction accuracy of stock market movements. Second, by adaptively adjusting the update gate weights according to the width of the multifractal singularity spectrum, the model captures abrupt changes in FTS. This modification of the GRU update mechanism lets the model focus on local temporal features, complementing the decomposition-based features with multifractal ones and ultimately enhancing prediction performance. Third, a state fusion strategy within the recurrent network improves the utilization of historical information. Comparative experiments with the LSTM, GRU, and Transformer models show that the VMD-MF-GRU achieves lower MAE, MAPE, and RMSE values and a higher R² than the other models. The proposed model is therefore reliable and provides more accurate FTS forecasts, and it can serve as an effective tool for predicting changes in FTS data.

Abnormal stock price fluctuations often precede the accumulation and release of financial risks. Quantitative methods transform these fluctuations from vague market perceptions into precise data features, enabling early warning signals to be captured before risks spiral out of control. Quantitative analysis not only characterizes the magnitude and frequency of fluctuations, distinguishing rational adjustments from market failures, but also offers deeper insight into the transmission paths of risks. This “visualization” of risk gives regulators a basis for early intervention, shifting financial risk management from reactive measures to proactive prevention and thereby helping to keep localized fluctuations from escalating into systemic crises.

Based on this rationale, the methodological framework of this paper relies primarily on multifractal analysis to enhance the gating mechanism of the GRU architecture for modeling and predicting complex time series. Many scholars have also explored how recent theoretical advances in nonlinear and complex time series modeling can refine financial time series forecasting. For instance, the bispectral analysis technique developed for Markov switching bilinear models [31] provides closed-form solutions for higher-order moments and density functions. On the one hand, it enriches the theoretical understanding of a model’s nonlinear dynamics; on the other, it offers additional insight into the nonlinear dependence structures among the FTS components obtained through VMD. Another example is the Log-TSV-GARCH model [32], which combines a logarithmic shock transformation with a dynamic threshold mechanism and is designed to capture abrupt regime shifts and asymmetric shock responses. Integrating these theories and methods into the present framework to better characterize and predict nonlinear features in FTS is a key direction for our future research.

Funding

This research was funded by the Fundamental Research Funds for the Central Universities, the Scientific Research and Innovation Project of China University of Political Science and Law, grant number 24KYGH008.

Data Availability Statement

The data that support the findings of this study are derived from the following sources: (1) Wind database (https://www.wind.com.cn), used under license; (2) National Bureau of Statistics of China, publicly available data. Restrictions apply to the Wind data, which are not publicly available but can be accessed from the corresponding author upon reasonable request and with permission from Wind Information Co., Ltd.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Zhu, T. Latent factor model in asset pricing: A deep learning approach in the Chinese stock market. Financ. Res. Lett. 2025, 86, 108519.
2. Yang, J.; Chang, B.; Zhang, Y.; Luo, W.; Ge, S.; Wu, M. CNN coal and rock recognition method based on hyperspectral data. Int. J. Coal Sci. Technol. 2022, 9, 63.
3. Chen, W.; Hussain, W.; Cauteruccio, F.; Zhang, X. Deep Learning for Financial Time Series Prediction: A State-of-the-Art Review of Standalone and Hybrid Models. CMES—Comput. Model. Eng. Sci. 2023, 139, 187–224.
4. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
5. Li, C.; Wang, Z.; Rao, M.; Belkin, D.; Song, W.; Jiang, H.; Yan, P.; Li, Y.; Lin, P.; Hu, M.; et al. Long short-term memory networks in memristor crossbar arrays. Nat. Mach. Intell. 2019, 1, 49–57.
6. Yao, Y.; Zhang, Z.-Y.; Zhao, Y. Stock index forecasting based on multivariate empirical mode decomposition and temporal convolutional networks. Appl. Soft Comput. 2023, 142, 110356.
7. Jiang, Y.; Olmo, J.; Atwi, M. Deep reinforcement learning for portfolio selection. Glob. Financ. J. 2024, 62, 101016.
8. Martelo, S.; León, D.; Hernandez, G. Multivariate Financial Time Series Forecasting with Deep Learning. In Workshop on Engineering Applications; Springer: Cham, Switzerland, 2022.
9. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555.
10. Cho, K.; Van Merriënboer, B.; Gulçehre, Ç.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 1724–1734.
11. Li, P.; Wei, Y.; Yin, L. Research on Stock Price Prediction Method Based on the GAN-LSTM-Attention Model. Comput. Mater. Contin. 2025, 82, 609–625.
12. Buczyński, M.; Chlebus, M.; Kopczewska, K.; Zajenkowski, M. Financial Time Series Models—Comprehensive Review of Deep Learning Approaches and Practical Recommendations. Eng. Proc. 2023, 39, 79.
13. Sun, G.; Deng, S. Financial Time Series Forecasting: A Comparison Between Traditional Methods and AI-Driven Techniques. J. Comput. Signal Syst. Res. 2025, 2, 86–93.
14. Giantsidi, S.; Tarantola, C. Deep learning for financial forecasting: A review of recent trends. Int. Rev. Econ. Financ. 2025, 104, 104719.
15. Yao, Y.; Wang, X.; Chen, W.; Chen, Z. Extreme Risk Spillover among Global Stock Markets Based on Transformer-LSTM Quantile Regression. Chin. J. Manag. Sci. 2025, 33, 1–13.
16. Bolchini, C.; Cassano, L.; Miele, A. Resilience of deep learning applications: A systematic literature review of analysis and hardening techniques. Comput. Sci. Rev. 2024, 54, 100682.
17. Wang, S. A Stock Price Prediction Method Based on BiLSTM and Improved Transformer. IEEE Access 2023, 11, 104211–104223.
18. Yang, K.; Wei, Y.; Li, S.; He, J. Asymmetric risk spillovers between Shanghai and Hong Kong stock markets under China’s capital account liberalization. N. Am. J. Econ. Financ. 2020, 51, 101100.
19. Zhang, Y.; Dong, Z.; Xu, W. Integrative stock price trend prediction via hierarchical LLM text processing and patch-based transformer with co-attention. Expert Syst. Appl. 2026, 302, 130441.
20. Li, S.; Xu, S. Enhancing stock price prediction using GANs and transformer-based attention mechanisms. Empir. Econ. 2025, 68, 373–403.
21. Tsaknaki, I.Y.; Lillo, F.; Mazzarisi, P. Bayesian autoregressive online change-point detection with time-varying parameters. Commun. Nonlinear Sci. Numer. Simul. 2025, 142, 108500.
22. Wang, J.; Chen, Z. Factor-GAN: Enhancing stock price prediction and factor investment with Generative Adversarial Networks. PLoS ONE 2024, 19, e0306094.
23. Li, S.; Zhang, S.; He, J.; Ren, T. How does investor sentiment impact systemic risk contagion across industries? Evidence from the Chinese stock market. Appl. Econ. 2026, 58, 517–539.
24. Li, Y. Multifractal Characteristics of China’s Stock Market and Slump’s Fractal Prediction. Fractal Fract. 2022, 6, 499.
25. Li, Y. The Importance of Non-Systemically Important Banks—A Network-Based Analysis for China’s Banking System. Fractal Fract. 2023, 7, 735.
26. Cao, G.; Han, Y.; Li, Q.; Xu, W. Asymmetric MF-DCCA method based on risk conduction and its application in the Chinese and foreign stock markets. Phys. A Stat. Mech. Its Appl. 2017, 468, 119–130.
27. Wang, X.; Xu, J.; Shi, W.; Liu, J. OGRU: An Optimized Gated Recurrent Unit Neural Network. J. Phys. Conf. Ser. 2019, 1325, 12–89.
28. Shi, W. Construction of adaptive gated recurrent hierarchical network with cross-modal dynamic interaction and its application in multimodal sentiment analysis. Discov. Appl. Sci. 2025, 7, 1404.
29. Shaikh, Z.M.; Ramadass, S. Unveiling deep learning powers: LSTM, BiLSTM, GRU, BiGRU, RNN comparison. Indones. J. Electr. Eng. Comput. Sci. 2024, 35, 263.
30. Ruan, L.; Yang, L. Investor-company interactions and stock price crash risk: Evidence from China. Res. Int. Bus. Financ. 2025, 76, 102830.
31. Cavicchioli, M.; Ghezal, A.; Zemmouri, I. (Bi)spectral analysis of Markov switching bilinear time series. Stat. Methods Appl. 2025, 1–30.
32. Alraddadi, R. The Log-TSV-GARCH Model: A threshold-based volatility framework with logarithmic shocks for exchange rate dynamics. AIMS Math. 2025, 10, 19495–19511.

Figure 1. Structure of improved gated recurrent unit.

Figure 2. VMD-MF-GRU State Fusion.

Figure 3. Flowchart of FTS forecasting.

Figure 4. Time series curves of the CSI 300 index components.

Figure 5. Cyclical component, random component, and corresponding spectrum width over time for the CSI 300 index: (a) cyclical component; (b) random component.

Figure 6. Decomposed component predictions for the CSI 300 index using different models. (a) trend component prediction; (b) cyclical component prediction; (c) random component prediction.

Figure 7. Predictions for the cyclical and random components in abrupt-change segments using different models. (a) cyclical component prediction; (b) random component prediction.

Figure 8. Comparison of prediction results for the CSI 300 using different models.

Table 1. Prediction performance of the VMD-MF-GRU under different parameters.

Number of Hidden Layers	Nodes in Layer 1	Nodes in Layer 2	Nodes in Layer 3	MAE	RMSE	MAPE	R²
2	16	16	—	0.4754	0.5642	0.0220	0.5915
2	32	16	—	0.5398	0.6777	0.0268	0.4512
2	32	32	—	0.4583	0.5235	0.0198	0.7523
2	64	32	—	0.4844	0.5732	0.0216	0.6245
2	64	64	—	0.5258	0.5946	0.0229	0.6865
3	16	16	16	0.4777	0.5814	0.0217	0.6732
3	32	16	16	0.5498	0.6372	0.0241	0.6647
3	32	32	16	0.4989	0.5781	0.0227	0.5817
3	32	32	32	0.5573	0.6843	0.0252	0.4957

Table 2. RMSE of prediction results under different input window settings.

Window Size	RMSE	Window Size	RMSE
3	0.0844	15	0.0952
6	0.0836	18	0.1185
9	0.0809	21	0.1294
12	0.0816	24	0.1396

Table 3. Prediction accuracy and errors of different models for the CSI 300 components.

Model	Component	MAE	RMSE	MAPE (%)	R²
LSTM	Trend	0.0913	1.1870	0.0388	0.7079
	Cyclical	0.2252	0.0852	0.0538	0.6898
	Random	0.2582	0.0834	0.4118	0.6694
GRU	Trend	0.0876	1.1425	0.0379	0.7702
	Cyclical	0.2156	0.0817	0.0525	0.7235
	Random	0.2225	0.0996	0.3291	0.6958
Transformer	Trend	0.0934	1.2518	0.0425	0.6788
	Cyclical	0.2475	0.1007	0.0659	0.6508
	Random	0.2723	0.0982	0.5721	0.5791
VMD-MF-GRU	Trend	0.0817	1.0518	0.0346	0.8387
	Cyclical	0.1929	0.0745	0.0475	0.8194
	Random	0.2258	0.0871	0.3318	0.6595

Table 4. Diebold–Mariano test results.

Compared Model	Component	Loss Function	DM Statistic	p-Value
LSTM	Trend	Squared Error	−4.285	0.0001 ***
	Trend	Absolute Error	−4.127	0.0001 ***
	Cyclical	Squared Error	−4.823	0.0001 ***
	Cyclical	Absolute Error	−4.251	0.0001 ***
GRU	Trend	Squared Error	−3.572	0.0001 ***
	Trend	Absolute Error	−3.146	0.0001 ***
	Cyclical	Squared Error	−3.967	0.0001 ***
	Cyclical	Absolute Error	−3.552	0.0001 ***
Transformer	Trend	Squared Error	−4.723	0.0001 ***
	Trend	Absolute Error	−4.093	0.0001 ***
	Cyclical	Squared Error	−5.124	0.0001 ***
	Cyclical	Absolute Error	−4.896	0.0001 ***

Note: *** denotes significance at the 1% level.

Table 5. Results of ablation experiments.

Model	Component	MAE	RMSE	MAPE (%)	R²
VMD-MF-GRU (with state fusion strategy)	Trend	0.0817	1.0518	0.0346	0.8387
	Cyclical	0.1929	0.0745	0.0475	0.8194
	Random	0.2258	0.0871	0.3318	0.6595
VMD-MF-GRU (without state fusion strategy)	Trend	0.0876	1.1425	0.0379	0.7702
	Cyclical	0.2156	0.0817	0.0525	0.7235
	Random	0.2225	0.0996	0.3291	0.6958

Table 6. Prediction accuracy and errors of different models for the SSE 50, CSI 300, and CSI 1000.

Index	Model	MAE	RMSE	MAPE (%)	R²
SSE 50	LSTM	0.5721	1.4269	0.4157	0.7821
	GRU	0.5266	1.3465	0.3703	0.7266
	Transformer	0.6247	1.5326	0.5269	0.6247
	VMD-MF-GRU	0.4811	1.2192	0.3488	0.8211
CSI 300	LSTM	0.5624	1.3978	0.3269	0.7624
	GRU	0.5168	1.3168	0.3016	0.7516
	Transformer	0.6123	1.5267	0.4206	0.6123
	VMD-MF-GRU	0.4181	1.1282	0.2192	0.8418
CSI 1000	LSTM	0.5817	1.4398	0.4267	0.7581
	GRU	0.5326	1.3382	0.3641	0.7132
	Transformer	0.6429	1.5807	0.4334	0.6429
	VMD-MF-GRU	0.4583	1.2571	0.2314	0.8045

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
