A Hybrid Model for Stock Index Forecasting Integrating Adaptive Frequency-Domain Decomposition and Enhanced Transformer Encoder

Zheng, Hairong; Zeng, Xiaozheng; Hu, Guoyu; Zhang, Tingting

doi:10.3390/math14122202

¹ College of Economics and Management, Fujian Agriculture and Forestry University, Fuzhou 350002, China

² Agriculture and Forestry Artificial Intelligence Research Institute, College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China

* Authors to whom correspondence should be addressed.

Mathematics 2026, 14(12), 2202; https://doi.org/10.3390/math14122202

Submission received: 6 May 2026 / Revised: 15 June 2026 / Accepted: 16 June 2026 / Published: 18 June 2026

Abstract

Stock index price series are composed of superimposed multi-frequency components, including long-term trends, cyclical fluctuations, and stochastic noise. Effectively decoupling these heterogeneous components and modeling them separately is key to improving forecasting accuracy. Existing methods under the “decomposition–prediction” paradigm mostly employ fixed-scale decomposition, and their forecasting models are not specifically adapted to the non-stationary and high-noise characteristics of financial data, which limits adaptivity and the capture of local dynamics. This paper proposes a frequency-aware adaptive multi-scale decomposition Transformer hybrid model (FAMS-Transformer). At the decomposition level, the fast Fourier transform is used to dynamically identify dominant cycles, thereby adaptively decoupling trends and fluctuations and overcoming the limitations of fixed-scale decomposition. At the forecasting level, a lightweight depthwise separable convolution is embedded between the self-attention layer and the feed-forward network of the Transformer encoder, enhancing the model’s ability to capture local temporal dynamics and achieving collaborative modeling of global dependencies and local information. Comparative experiments against 15 baseline models, including LSTM, Transformer, TimesNet, and FreTS, on three representative Chinese market indices (the Shanghai Composite Index, the Shenzhen Component Index, and the Small and Medium Enterprises 100 Index) across four prediction horizons from one step to 15 steps show that FAMS-Transformer achieves the best forecasting accuracy in all scenarios. The coefficient of determination for 15-step prediction stays between 0.730 and 0.928. The model also performs well on the S&P 500 dataset. Ablation studies and significance tests further validate the effectiveness of each core module and the statistical significance of the performance improvements.

Keywords:

stock index forecasting; adaptive frequency-domain decomposition; Transformer; multi-step forecasting

MSC:

68T07; 91G80; 62M20

1. Introduction

Accurate forecasting of stock index prices is a central issue in financial research, as its outcomes directly inform national economic policy making and the optimization of market investment decisions [1]. However, stock index price series inherently contain multi-frequency components, including long-term trends, cyclical fluctuations, and random noise [2], whose statistical properties and evolutionary patterns differ markedly, making the forecasting task extremely challenging. Consequently, effectively decoupling and separately modeling these heterogeneous components is crucial for improving prediction accuracy. In the evolution of stock forecasting methods, Black and Scholes (1973) [3] and Merton (1973) [4] constructed a general continuous-time pricing framework. Since then, stochastic volatility models (Heston, 1993) and jump-diffusion extensions (Cont and Tankov, 2004) have enriched the parametric description of financial time series [5,6]. However, such models rest on strict distributional assumptions; when these assumptions fail, their pricing deviations and risk measurement errors increase significantly. In contrast, researchers have improved the characterization of nonlinear features of financial sequences through traditional statistical models such as ARIMA and GARCH [7], as well as machine learning methods such as SVR and LSTM [8,9]. In recent years, the Transformer [10] has brought a new technical paradigm to sequence modeling; its self-attention mechanism can effectively capture long-range dependencies and offers significant advantages in stock prediction tasks [11].

However, these models share a common limitation: when directly processing raw stock series, the model must simultaneously handle the superposition of trends, multiple periodic fluctuations, and noise, which not only increases the learning difficulty but also renders the model overly sensitive to local noise, especially as the forecast horizon lengthens. To overcome this limitation, the “decomposition–forecasting” hybrid modeling paradigm has begun to gain attention in stock forecasting [1,12,13,14]. The core idea of this paradigm is to first decompose the original series into relatively simple components through data preprocessing techniques, then forecast each component separately and fuse the results. Existing studies have integrated data-driven adaptive decomposition methods such as EMD, EEMD, CEEMDAN, and VMD with predictors, separating signal from noise and substantially improving prediction accuracy [15,16].

Furthermore, the Fourier transform and the wavelet transform, as classical frequency-domain analysis tools, have been successfully applied to time series tasks governed by clear physical laws, such as wind speed prediction and power load prediction [17,18], and provide a technical path distinct from EMD-type methods. Their application to stock forecasting, however, is still in its infancy. Moreover, the decomposition scale used is mostly preset, which makes it difficult to adapt flexibly to the significant differences in cyclical structure across stocks and market periods [8,19].

Unlike the above fixed-scale frequency-domain decomposition strategies, in the general time series forecasting domain, some cutting-edge research has begun to explore more flexible frequency-domain modeling approaches. TimesNet automatically identifies multi-periodic components in time series via FFT and reshapes the one-dimensional sequence into two-dimensional tensors to leverage convolutional networks for modeling [20]; FEDformer utilizes Fourier transform in a frequency enhancement module to capture the global properties of the series [21]; FreTS designs a frequency-domain MLP architecture that achieves efficient forecasting by focusing on key frequency components [22]. Although these models have achieved progress on their respective tasks, they were not specifically designed to accommodate the non-stationarity and heavy noise characteristics of financial data [23], and their transfer potential to stock forecasting tasks has not yet been fully verified.

At the same time, choosing the right prediction model is another complex challenge. The Transformer encoder excels at capturing global dependencies, but its standard self-attention mechanism is insufficiently sensitive to the local temporal patterns of sequences [24]. To address this deficiency, LogTrans adopted convolutional self-attention to enhance local information capture [25]. However, such improvements are mostly generic architectural optimizations and rarely consider the specific requirements of stock forecasting scenarios—namely, how to enhance the perception of typical financial time series features such as short-term trend reversals and local volatility clustering through a lightweight design while preserving the advantage of global dependency modeling.

In summary, although existing research demonstrates the effectiveness of decomposition strategies and artificial intelligence models in stock price forecasting, two shortcomings remain. First, existing frequency-domain decomposition methods for stock prediction rely on fixed scales and are not adapted to the characteristics of financial data; they lack a mechanism that adjusts the decomposition strategy according to the inherent cyclical structure of the stock price series itself. Second, Transformer encoders lack an effective, lightweight means of jointly modeling global dependencies and local dynamics in stock price series.

To address the above issues, this paper proposes a Frequency-Aware Multi-Scale Decomposition Transformer (FAMS-Transformer) predictive framework, which achieves refined modeling of financial time series through a hybrid architecture of frequency-domain adaptive decomposition and enhanced encoding. Based on three representative indices in the Chinese market—the Shanghai Composite Index, the Shenzhen Component Index, and the SME 100 Index—and incorporating various external market factors such as carbon trading prices, crude oil prices, global stock indices, exchange rates, macroeconomic variables, and sentiment indicators, we set four forecast horizons of one step, five steps, 10 steps, and 15 steps, and systematically compare with fifteen baseline models: LSTM, SVR, Transformer, Autoformer, Informer, iTransformer, PatchTST, TimesNet, FreTS, DLinear, Koopa, LightTS, FEDformer, TiDE and FiLM. The main contributions of this paper are reflected in the following three aspects:

First, a frequency-aware adaptive multi-scale decomposition mechanism tailored for stock forecasting is proposed. Unlike the fixed-scale decomposition or globally uniform frequency selection strategies commonly adopted in existing research, this paper utilizes the fast Fourier transform (FFT) to dynamically identify the dominant periods of each input sequence in the frequency domain, based on which the trend decomposition scale is adaptively determined, thereby achieving the decoupling of trend and fluctuation in the time domain. This mechanism makes the decomposition process entirely driven by the cyclical characteristics of the stock data itself, avoiding the constraints of preset parameters on the decomposition effect, and provides a new solution for the adaptive modeling of non-stationary and heavily noisy financial sequences.

Second, a Transformer encoder integrated with an intermediate enhancement mechanism is constructed to realize collaborative modeling of global dependencies and local dynamics. This paper embeds a lightweight depthwise separable convolution module between the standard self-attention layer and the feed-forward network, which retains the advantage of self-attention in capturing long-range dependencies while enhancing the encoder’s ability to perceive short-term temporal patterns and local fluctuation features. This design compensates for the insensitivity of traditional Transformer encoders to local sequence information at a relatively low computational cost, providing a more balanced architectural scheme for sequence encoding in stock forecasting scenarios.

Third, the forecasting performance and robustness of the FAMS-Transformer hybrid model are validated through a systematic empirical study. Comparative experiments on three Chinese stock indices with different volatility characteristics, covering both short- and long-horizon multi-step forecasting scenarios, demonstrate that FAMS-Transformer outperforms the fifteen baseline models on all evaluation metrics; in the most challenging 15-step forecasting, the coefficient of determination remains stable between 0.730 and 0.928.

The ablation experiments confirm the independent contributions and complementary effects of the adaptive decomposition module and the intermediate depthwise separable one-dimensional convolution module, and significance testing further verifies the statistical reliability of the performance advantages. The experiment on the S&P 500 dataset further supports the generalization ability of the model, indicating that the FAMS-Transformer hybrid model is robust under different market structures. The remainder of this paper is organized as follows: Section 2 reviews related work; Section 3 elaborates on the architecture and core mechanisms of the FAMS-Transformer model; Section 4 details the experimental design used to evaluate the model; Section 5 reports and discusses the experimental results; Section 6 concludes the paper, discusses the research limitations, and outlines future directions.

2. Related Work

This paper aims to improve the multi-step forecasting accuracy of stock index prices through a hybrid architecture that integrates adaptive frequency-domain decomposition with an enhanced Transformer encoder. Centered on this objective, this section reviews the relevant literature from three perspectives: the decomposition–forecasting paradigm in stock prediction, the application of frequency-domain decomposition methods to financial time series, and the use of Transformer encoders in stock price forecasting.

2.1. Research on the Decomposition–Forecasting Paradigm in Stock Prediction

Stock index price series are characterized by inherent non-stationarity, high noise levels, and the superposition of multiple frequency components [2], posing significant challenges to accurate forecasting. As early as 1973, Black and Scholes proposed the assumption that the stock price obeys the geometric Brownian motion [3]. On this basis, Merton systematically extended the continuous-time option pricing theory [4], which laid a mathematical foundation for all subsequent continuous-time financial models. Although continuous-time models have evolved [5,26], they rely on fixed mathematical assumptions and are difficult to adapt to complex nonlinear changes in the stock market [6]. Meanwhile, traditional statistical models such as ARIMA and GARCH rely on strong linear assumptions [27,28] and exhibit limited performance when handling nonlinear financial series. With the development of machine learning technology, nonlinear models such as SVR, LSTM, and GRU can learn complex patterns adaptively from data without pre-setting strict distribution assumptions, and gradually become mainstream tools in the field of stock forecasting [29,30]. However, it is difficult for these models to effectively distinguish the differential evolution of long-term trends, cyclical fluctuations and random noise. Consequently, as the forecast horizon lengthens, the risk of overfitting to noise increases significantly and predictive performance deteriorates markedly. This has prompted researchers not only to optimize the prediction model, but also to improve data preprocessing techniques.

The “decomposition–prediction” hybrid modeling paradigm captures key time-frequency information, reduces data noise, and improves the predictive ability of financial models by transforming time series data into intrinsic mode functions (IMFs) with distinct spectral structure and periodic characteristics [31]. Empirical mode decomposition (EMD), proposed by Huang et al. [32], is the most representative adaptive decomposition method under this paradigm; it decomposes non-stationary signals into a number of IMFs under weak linear assumptions. Nevertheless, standard EMD suffers from deficiencies such as mode mixing and sensitivity to local extrema. Later methods such as EEMD [33], CEEMDAN [16], and VMD [34] address these deficiencies in different ways.

Building on these decomposition techniques, a series of hybrid frameworks tailored for stock forecasting have been proposed. Gao’s StockCI model integrates CEEMDAN with Informer: it first decomposes the original stock price series into multiple IMFs using CEEMDAN to reduce non-stationarity, then applies the ProbSparse self-attention mechanism of Informer to model each component over long sequences, outperforming baseline models such as ARIMA, RNN, and LSTM on high-frequency A-share market data [12]. The CoML framework proposed by He et al. adopts a three-stage strategy of decomposition–reconstruction–forecasting. After CEEMDAN decomposition, a fine-to-coarse algorithm reconstructs the components into high-frequency fluctuation terms and low-frequency trend terms, which are then modeled separately by BiLSTM, SVR, and MLP and ensembled, demonstrating strong performance in both emerging and developed markets [1]. The ASDH model by Ge and Lin further introduces an adaptive selection mechanism that automatically matches the optimal predictor among algorithms such as GABP, KNN, and ARMA according to the frequency characteristics of each component [14]. The CVASD-MDCM-Informer framework proposed by Su et al. employs an adaptive VMD optimized by the CPO algorithm for decomposition and incorporates a multi-scale dilated convolution module to enhance the capacity to capture short-term fluctuations and long-term trends [13]. The above research consistently shows that using data preprocessing techniques to decompose time series has better performance than traditional methods that directly predict the original sequence.

However, it is worth noting that the decomposition strategy used in the above methods performs a unified decomposition operation on the entire input sequence. Such a strategy struggles to flexibly adapt to the dynamic changes in cyclical structure across different time windows. When market conditions undergo rapid shifts, a fixed decomposition pattern may fail to promptly capture newly emerging dominant cycles, thereby limiting the model’s adaptability in complex market environments.

2.2. Application of Frequency-Domain Decomposition Methods to Financial Time Series

Unlike EMD-type methods that perform decomposition in the time domain, the Fourier transform and wavelet transform provide an alternative technical path for signal decomposition from a frequency-domain perspective. The Fourier transform maps a time-domain signal into the frequency domain, separating the frequency components hidden in the time-domain observations [35], and the fast Fourier transform (FFT) allows this transformation to be computed efficiently [19]. The exploration of such frequency-domain methods in financial forecasting is still in its infancy. Chen and Chen combined FFT with a fuzzy time series model and applied it to forecasting the Taiwan Weighted Index and the Dow Jones Industrial Average [19]. Zhang et al. proposed a WT-ARIMA-LSTM hybrid model that uses the wavelet transform for multi-scale decomposition of stock index futures prices, with ARIMA and LSTM processing components of different frequencies [8]. Jin et al. combined EMD with an attention-based LSTM and incorporated investor sentiment analysis [36]. These works report improved accuracy and provide preliminary evidence for the feasibility of frequency-domain analysis in stock forecasting. However, most of them adopt preset, fixed decomposition scales and do not adjust dynamically to the local statistical characteristics of the input data.

In the field of general time series prediction, researchers have proposed more flexible frequency-domain modeling methods such as TimesNet [20] and FEDformer [21]. More recently, Yemets et al. proposed an FFT-based feature extraction scheme that extends the Transformer input with the phase and magnitude of the complex FFT coefficients as additional features, validating the effectiveness of fusing time-domain and frequency-domain information on multiple datasets [37]. Zhang et al. [38] provided a systematic review of the applications of the Fourier transform, Laplace transform, and wavelet transform in time series analysis, pointing out that frequency-domain transforms have unique advantages in capturing global frequency components.

However, the aforementioned general frequency-domain models were not specifically designed to adapt to the characteristics of financial data [23]. The dominant cycles in financial time series may drift with changes in the market environment, whereas models such as TimesNet and FEDformer adopt a globally uniform strategy in the selection of frequency components, i.e., applying the same frequency selection rules to all input sequences, making it difficult to adequately capture such dynamic changes. Therefore, how to achieve adaptive decomposition in the frequency domain that matches the cyclical characteristics of the stock price series itself remains a problem that requires further in-depth exploration.

2.3. Transformer Architecture and Its Application in Stock Prediction

The Transformer was first proposed by Vaswani et al. [10]. Its core innovation is the replacement of traditional recurrent or convolutional structures with a self-attention mechanism, which enables efficient, parallelized modeling of dependencies between any positions in a sequence. This advantage has allowed it to expand rapidly from natural language processing to time series forecasting tasks [39]. Wang et al. applied the Transformer to stock index forecasting and achieved better predictive performance than traditional methods such as LSTM on four major global indices (the CSI 300, S&P 500, Hang Seng Index, and Nikkei 225), validating the potential of the self-attention mechanism in financial time series modeling [11].

However, the self-attention mechanism of the standard Transformer suffers from a structural deficiency when encoding stock series. The attention weights computed via dot-product operations reflect the global similarity between any two points in the sequence, and they are insensitive to temporal features such as local fluctuation patterns and short-term trend reversals embedded in adjacent time steps [22]. In stock forecasting scenarios, short-term volatility clustering and abrupt trend reversals are common market phenomena, and the low sensitivity of standard self-attention to such local dynamics may lead to delayed model responses.

To address this issue, researchers have proposed improvement schemes from different perspectives, enhancing the Transformer across various dimensions such as computational efficiency, sequence decomposition, or feature partitioning. Examples include LogTrans [25], Reformer [40], Informer [41], Autoformer [42], and PatchTST [43].

However, these works were rarely designed specifically for stock forecasting scenarios, and they do not jointly optimize global dependency modeling and local dynamic capture in such scenarios.

To overcome the above shortcomings, the FAMS-Transformer hybrid model proposed in this paper dynamically identifies dominant cycles using FFT at the decomposition level, thereby remedying the inflexibility of fixed-scale decomposition. At the encoding level, a lightweight depthwise separable convolution is embedded between the self-attention layer and the feed-forward network within the Transformer encoder, preserving the advantage of global dependency modeling while enhancing the capacity to perceive local temporal dynamics. The following sections elaborate on the design principles and implementation details of each module.

3. Method

This paper proposes a hybrid model, FAMS-Transformer, which takes into account the influence of external stock market factors and supports multi-step forecasting. The model is trained end to end and consists of three modules: a frequency-adaptive decomposition module, a predictor module (Enhanced Transformer Encoder), and a feature fusion module. This section describes the working principle of each module.

3.1. Frequency-Adaptive Decomposition Module

The trend, periodic, and noise components that are superimposed in the time domain occupy different frequency ranges in the frequency domain. To convert the time-domain data to the frequency domain, we use the Fourier transform for data preprocessing. The standard (continuous) Fourier transform, however, is defined in integral form for continuous signals. Since stock price index data are discrete, we adopt the fast Fourier transform (FFT) [17], an efficient algorithm for the discrete Fourier transform, to capture the periodic patterns inherent in the data. Accordingly, the discrete input X_{t} is transformed into the frequency domain:

\mathcal{X}_{t} = \mathrm{FFT}(X_{t}) = A(X_{t}) e^{i Φ(X_{t})} = \left[ \sum_{n = 0}^{N - 1} X_{t}[t, n] \cdot e^{- \frac{2 π i}{N} k n} \right]_{k = 0}^{N - 1} = \mathrm{Re}\left( \sum_{n = 0}^{N - 1} X_{t}[t, n] \cdot e^{- \frac{2 π i}{N} k n} \right) + i \cdot \mathrm{Im}\left( \sum_{n = 0}^{N - 1} X_{t}[t, n] \cdot e^{- \frac{2 π i}{N} k n} \right), \quad \forall t \in \{1, 2, \dots, T\}

(1)

where \mathcal{X}_{t} \in \mathbb{C}^{\frac{T}{2} \times N} denotes the output of the fast Fourier transform (FFT), representing the time series data in the frequency domain; A(X_{t}) and Φ(X_{t}) are the amplitude and phase, respectively. To simplify the notation, we introduce the abbreviation \mathcal{X}_{t} = \mathrm{Re}(\mathcal{X}_{t}) + i \cdot \mathrm{Im}(\mathcal{X}_{t}), where \mathrm{Re}(\mathcal{X}_{t}) and \mathrm{Im}(\mathcal{X}_{t}) correspond to the real and imaginary parts in Equation (1). In the subsequent sections, for brevity, we refer to Equation (1) as the FFT.
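The amplitude and phase form and the real and imaginary form in Equation (1) describe the same complex coefficients. The following is a minimal NumPy sketch on a hypothetical toy window (the series, its length, and the variable names are illustrative and not taken from the paper). It uses `np.fft.rfft`, which returns N // 2 + 1 coefficients for a real input of length N.

```python
import numpy as np

# Hypothetical toy window: constant level 2.0 plus a sine with period 8.
N = 16
n = np.arange(N)
x = 2.0 + np.sin(2 * np.pi * n / 8)

# Real-input FFT along time; N // 2 + 1 complex coefficients are returned.
X_f = np.fft.rfft(x)

amplitude = np.abs(X_f)    # A(X_t) in Equation (1)
phase = np.angle(X_f)      # Phi(X_t) in Equation (1)

# Polar form and rectangular form reconstruct the same coefficients.
polar = amplitude * np.exp(1j * phase)
rect = X_f.real + 1j * X_f.imag

# Index of the strongest non-constant component (index 0 is the mean level).
dominant = int(np.argmax(amplitude[1:])) + 1
print(dominant, N // dominant)  # frequency index and the period it implies
```

Here the sine completes two cycles in the window, so the strongest non-constant coefficient sits at frequency index 2, which corresponds to a period of 8 steps.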

For stock index forecasting tasks, different indices and technical indicators often exhibit distinct numerical ranges and fluctuation amplitudes. Standardization can effectively mitigate the impact of scale inconsistencies and thus enhance the model’s generalization capability. Therefore, before applying the FFT along the temporal dimension, the model first standardizes the original time series at the input layer. Given an input sequence X \in \mathbb{R}^{B \times L \times D}, its statistics are defined as μ = mean(X) and σ = std(X), from which a standardized representation X^{'} is obtained. Here, μ and σ are the mean and standard deviation along the temporal dimension, respectively, and ϵ is a numerical stability term.

X^{'} = \frac{X - μ}{σ + ϵ} \in \mathbb{R}^{B \times L \times D}

(2)
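As an illustration of the standardization step, the sketch below normalizes each feature of each sample along the temporal axis. The function name `standardize`, the value of `eps`, and the choice to add `eps` to the standard deviation are assumptions made for this example; the text identifies ϵ only as a numerical stability term.

```python
import numpy as np

def standardize(x, eps=1e-5):
    """Standardize x of shape (B, L, D) along the temporal axis L.

    eps is added to the standard deviation (an assumed placement) so that
    constant windows do not cause a division by zero.
    """
    mu = x.mean(axis=1, keepdims=True)     # shape (B, 1, D)
    sigma = x.std(axis=1, keepdims=True)   # shape (B, 1, D)
    return (x - mu) / (sigma + eps), mu, sigma

# Hypothetical batch: 2 samples, 32 time steps, 3 features on a price-like scale.
rng = np.random.default_rng(0)
x = rng.normal(loc=3000.0, scale=50.0, size=(2, 32, 3))
x_norm, mu, sigma = standardize(x)
print(x_norm.shape)
```

Returning `mu` and `sigma` alongside the standardized window makes it possible to map predictions back to the original scale, which is a common design choice rather than something the text specifies.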

After input standardization, the model identifies periodic structure in the frequency domain and performs the trend–fluctuation decomposition in the time domain. Given the standardized sequence X^{'} \in \mathbb{R}^{B \times L \times D}, the fast Fourier transform is first applied along the time dimension:

X_{f} = FFT (X^{'})

(3)

The TimesNet model proposed by Wu et al. builds on FFT-based period discovery and amplitude-weighted fusion, and it models strongly periodic data well [20]. TimesNet assumes that the amplitude of a frequency reflects the relative strength of the corresponding periodic component in the current input sequence, so the amplitude can serve as a reference when fusing representations of different periods. In this paper, frequency intensity is used as a data-driven dynamic prior to adjust the relative contribution of the features associated with different frequency structures in the fusion stage. The specific fusion process is described in Section 3.3.

Next, the amplitude of each frequency component obtained from the FFT is computed:

A = \mathrm{mean}(|X_{f}|)

(4)

where A \in \mathbb{R}^{L / 2} represents the average intensity of each frequency component. The larger the amplitude of a frequency component, the more dominant the corresponding oscillation mode in the sequence. Therefore, the k frequency indices with the largest amplitudes are selected:

\{f_{1}, f_{2}, \dots, f_{k}\} = \mathrm{TopK}(A)

(5)

These indices are then mapped to the corresponding periods:

p_{i} = \left\lfloor \frac{L}{f_{i}} \right\rfloor, \quad i \in \{1, \dots, k\}

(6)

Then, based on the estimated period set \{p_{1}, \dots, p_{k}\}, we select the maximum period as the trend scale:

p_{trend} = \max(p_{1}, \dots, p_{k})

(7)
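Equations (3) to (7) can be sketched for a single input window as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `dominant_periods`, the averaging of amplitudes over the feature axis only, and the exclusion of the zero-frequency term (which would otherwise make the division in Equation (6) undefined) are choices made for this example.

```python
import numpy as np

def dominant_periods(window, k=3):
    """Top-k frequency indices, their periods, and the trend scale.

    window: standardized array of shape (L, D) holding one historical window.
    """
    L = window.shape[0]
    X_f = np.fft.rfft(window, axis=0)           # Eq. (3): FFT along time
    A = np.abs(X_f).mean(axis=1)                # Eq. (4): mean amplitude per frequency
    # Eq. (5): k largest amplitudes, skipping index 0 (assumed: mean level ignored).
    top = np.argsort(A[1:])[::-1][:k] + 1
    periods = L // top                          # Eq. (6): floor(L / f_i)
    return top, periods, int(periods.max())     # Eq. (7): largest period

# Hypothetical window with cycles of 24, 8, and 4 steps and two features.
L = 48
n = np.arange(L)
base = (np.sin(2 * np.pi * n / 24)
        + 0.6 * np.sin(2 * np.pi * n / 8)
        + 0.3 * np.sin(2 * np.pi * n / 4))
window = np.stack([base, 0.5 * base], axis=1)

freqs, periods, p_trend = dominant_periods(window, k=3)
print(freqs, periods, p_trend)
```

Because only the historical window is passed to the function, the selected periods cannot depend on values from the prediction interval.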

The trend component is extracted by dynamic moving average:

X_{trend} = MovingAvg (X^{'}, p_{trend})

(8)

where MovingAvg denotes a one-dimensional average pooling operation with boundary padding. The fluctuation component is defined as:

X_{vol} = X^{'} - X_{trend}

(9)
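The decomposition in Equations (8) and (9) can be illustrated with a moving average that keeps the output length equal to the input length. The sketch below assumes replicate padding (repeating the first and last values) as the boundary handling; the text states only that boundary padding is used, so this choice and the function name `moving_avg` are assumptions.

```python
import numpy as np

def moving_avg(x, kernel):
    """Average pooling along time with replicate padding.

    x: array of shape (L, D). Returns an array of the same shape.
    """
    left = (kernel - 1) // 2
    right = kernel - 1 - left
    padded = np.concatenate(
        [np.repeat(x[:1], left, axis=0), x, np.repeat(x[-1:], right, axis=0)],
        axis=0,
    )
    # A cumulative sum gives every window sum in one pass.
    csum = np.cumsum(
        np.concatenate([np.zeros((1, x.shape[1])), padded], axis=0), axis=0
    )
    return (csum[kernel:] - csum[:-kernel]) / kernel

# Hypothetical series: level 1.0 plus a cycle of period 8.
L = 32
n = np.arange(L)
x = (1.0 + np.sin(2 * np.pi * n / 8))[:, None]

x_trend = moving_avg(x, kernel=8)   # Eq. (8) with p_trend = 8
x_vol = x - x_trend                 # Eq. (9)
print(x_trend.shape, x_vol.shape)
```

When the window length equals the period of the cycle, each full window averages the cycle out, so away from the boundaries the trend recovers the level 1.0 and the fluctuation component carries the cycle.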

If the frequency decomposition is not handled carefully, the model may come to rely on periodic structure that extends beyond the input window of a sample, resulting in data leakage. Therefore, for each test sample, the dominant frequencies and the corresponding periods are computed only from the historical input window of that sample; data from the prediction interval take no part in FFT period identification, construction of the decomposition scale, or calculation of the fusion weights.

3.2. Enhanced Transformer Predictor Module

The Transformer adopts an encoder–decoder architecture. The encoder maps the input sequence to a sequence of contextual representations, and the decoder then converts these representations into the output. This architecture provides a solution for processing long sequence data, and the Transformer family has therefore developed rapidly in the field of time series prediction in recent years [24].

The multi-head self-attention mechanism in the Transformer encoder collects and integrates information from different representation subspaces by running multiple attention heads in parallel, thereby achieving richer feature extraction and enhancing the model’s ability to capture the evolution pattern of the input sequence. The self-attention mechanism proposed by Vaswani et al. [10] is defined as follows:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V

(10)

where Q = XW^Q, K = XW^K, and V = XW^V are the query, key, and value matrices, respectively, obtained as the outputs of three distinct linear projections of the same input X, and d_k is the dimension of the keys.
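Equation (10) can be written in a few lines of NumPy. The sketch below uses hypothetical sizes (6 tokens, model width 8, key width 4) and random projection matrices purely for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention, Eq. (10). Returns output and weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # pairwise similarity
    weights = softmax(scores)                        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                          # 6 tokens, d_model = 8
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))

out, w = attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape, w.shape)
```

Each row of `w` is a probability distribution over all positions, which is why every output token can draw on any other token regardless of distance.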

In the multi-head attention mechanism, each attention function is executed in parallel on projected versions of the query, key, and value matrices. The outputs of all attention functions are then concatenated and passed through a linear layer to produce the final result. Multi-head attention is expressed as:

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \dots, \mathrm{head}_{h}) W^{O}

(11)

\mathrm{head}_{i} = \mathrm{Attention}(Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V})

(12)

where i = 1, …, h, the matrices W_{i}^{Q}, W_{i}^{K}, W_{i}^{V} are the projection weights of the i-th head, and W^{O} is the output projection matrix.
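A compact way to realize Equations (11) and (12) is to store the per-head projections as column blocks of one matrix per role and split the result into heads. The sketch below follows that common formulation; the function name, the sizes, and the random weights are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """X: (N, d_model); all weight matrices: (d_model, d_model); h heads.

    Columns [i * d_k, (i + 1) * d_k) of W_q, W_k, W_v act as the projections
    of head i.
    """
    N, d_model = X.shape
    d_k = d_model // h

    def split(M):                                   # (N, d_model) -> (h, N, d_k)
        return M.reshape(N, h, d_k).transpose(1, 0, 2)

    Q, K, V = split(X @ W_q), split(X @ W_k), split(X @ W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)
    heads = softmax(scores) @ V                     # Eq. (12), all heads at once
    concat = heads.transpose(1, 0, 2).reshape(N, d_model)
    return concat @ W_o                             # Eq. (11)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, h=2)
print(out.shape)
```

Computing all heads in one batched matrix product is equivalent to running h separate attention functions and concatenating their outputs, but avoids a Python loop.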

In essence, each attention head can be seen as observing the sequence from a different perspective: some heads focus on short-term dependencies, some highlight periodic patterns, and some capture long-term trends or local window structures. Figure 1 shows the structure of the multi-head attention mechanism.

Unlike the original encoder proposed by Vaswani et al., and in order to combine global dependency modeling with local dynamic capture at low computational overhead, this paper inserts a lightweight depthwise separable convolution module between the self-attention layer and the feed-forward network. Self-attention first aggregates the global context; the depthwise separable convolution then imposes local temporal constraints on its output features; finally, the feed-forward network applies a nonlinear transformation to the representations that fuse global and local information. The three components are functionally complementary. Figure 2 shows the improved Transformer encoder architecture.

Depthwise separable convolution decomposes the standard convolution operation into two steps. The first step is a depthwise (channel-by-channel) convolution, which processes the temporal dynamics of each channel independently and reduces redundant computation. The second step is a pointwise convolution, which integrates multi-channel information through a 1 × 1 convolution while avoiding over-modeling of the correlation between channels [44]. Depthwise separable convolution thus decouples temporal modeling from cross-feature interaction, which reduces the number of parameters without degrading model performance [45,46,47]. Figure 3 shows a simplified schematic of depthwise separable convolution.
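The parameter saving can be illustrated with a short count. The sketch below compares the weights of a standard one-dimensional convolution with those of its depthwise separable counterpart; biases are ignored, and the channel width of 128 and kernel size of 3 are assumed values chosen only for illustration, since they are not stated in this section.

```python
def conv1d_params(c_in, c_out, kernel_size):
    """Weights of a standard 1D convolution (bias ignored)."""
    return c_in * c_out * kernel_size


def depthwise_separable_params(c_in, c_out, kernel_size):
    """Weights of a depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = c_in * kernel_size  # one kernel per input channel
    pointwise = c_in * c_out        # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


# Assumed, illustrative sizes (not taken from the paper).
d_model, kernel_size = 128, 3
standard = conv1d_params(d_model, d_model, kernel_size)
separable = depthwise_separable_params(d_model, d_model, kernel_size)
ratio = separable / standard  # equals 1 / kernel_size + 1 / c_out
```

For these sizes the standard convolution has 49,152 weights and the separable version 16,768, roughly one third as many.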

After the frequency decomposition is completed, the model separately models the resulting components, namely the trend component X_{trend} and the fluctuation component X_{vol}. For a given input sequence X \in R^{B \times L \times D}, the time series is partitioned into multiple patches of length P, X^{*} \to \{P_{1}, P_{2}, \dots, P_{N}\}, and an embedded representation Z \in R^{B \cdot D \times N \times d_{model}} is obtained through linear mapping, where N denotes the number of patches.
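The patching step can be sketched for a single univariate window as follows. The function name `make_patches` and the patch count N = ⌊(L − P) / S⌋ + 1 are assumptions for a sliding window of length P with stride S and no end padding; the original implementation may pad the sequence and obtain a different N.

```python
def make_patches(series, patch_len, stride):
    """Split a sequence into overlapping patches of length patch_len.

    Assumes no padding, so N = (L - patch_len) // stride + 1.
    """
    if len(series) < patch_len:
        raise ValueError("series is shorter than one patch")
    n_patches = (len(series) - patch_len) // stride + 1
    return [series[i * stride: i * stride + patch_len] for i in range(n_patches)]


# Toy look-back window of length L = 10 with P = 4 and S = 2.
series = list(range(10))
patches = make_patches(series, patch_len=4, stride=2)
```

Each patch is subsequently mapped to a d_{model}-dimensional token by the linear embedding described above.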

\tilde{Z} = Z + Dropout (Attention (Z))

(13)

Each encoder layer applies three modules in turn, namely the self-attention block, the convolution block, and the FFN block, as given in Equations (13)–(17):

Z_{mid} = Conv1D(\tilde{Z})

(14)

Z^{'} = \tilde{Z} + Dropout (Z_{mid})

(15)

Z^{(l)} = EncoderLayer(Z^{(l-1)})

(16)

Conv1D uses depthwise separable convolutions (groups = d_{model}) and performs local modeling along the patch dimension.

Z_{out} = Z^{'} + FFN (Z^{'})

(17)
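The convolution block of Equations (14) and (15) can be sketched in plain Python for a single sample, with the patch tokens stored as an N × d_{model} list of rows. Everything below is an illustrative assumption: zero "same" padding, odd kernel lengths, a pointwise step after the depthwise step, and dropout omitted as at inference time.

```python
def depthwise_conv1d(z, kernels):
    """Convolve each channel along the patch axis with its own kernel.

    z: N x d list of rows; kernels: d kernels of odd length; zero 'same' padding.
    """
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for c in range(d):
        half = len(kernels[c]) // 2
        for t in range(n):
            acc = 0.0
            for j, w in enumerate(kernels[c]):
                src = t + j - half
                if 0 <= src < n:
                    acc += w * z[src][c]
            out[t][c] = acc
    return out


def pointwise_conv1d(z, weight):
    """1 x 1 convolution: mix channels at every position (weight is d_out x d_in)."""
    return [[sum(w * v for w, v in zip(w_row, row)) for w_row in weight] for row in z]


def conv_block(z_tilde, kernels, weight):
    """Z' = Z~ + Conv1D(Z~), with dropout omitted."""
    z_mid = pointwise_conv1d(depthwise_conv1d(z_tilde, kernels), weight)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(z_tilde, z_mid)]


# Toy input: N = 4 patches, d_model = 2 channels.
z_tilde = [[3.0, 1.0], [6.0, 2.0], [9.0, 3.0], [12.0, 4.0]]
kernels = [[1 / 3, 1 / 3, 1 / 3], [0.0, 1.0, 0.0]]  # smoothing kernel, identity kernel
identity = [[1.0, 0.0], [0.0, 1.0]]                 # pointwise weights
z_prime = conv_block(z_tilde, kernels, identity)
```

Because the depthwise step only looks at neighbouring patches within one channel, it adds local context to each token without mixing features; the pointwise step then recombines the channels.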

After the Transformer encoder, the output has shape Z_{out} \in R^{B \times D \times d_{model} \times N}, and the final prediction Y \in R^{B \times T \times D} is obtained through a linear mapping.

3.3. Feature Fusion Module

The feature fusion module combines the encoder outputs for the trend component X_{trend} and the fluctuation component X_{vol} to achieve adaptive fusion of different dynamic modes. The branch predictions Y_{trend} and Y_{vol} are first obtained as:

Y_{trend} = f (X_{trend})

(18)

Y_{vol} = f (X_{vol})

(19)

In TimesNet, the frequency amplitude is normalized by Softmax, and the weighted fusion of different periodic representations is carried out [20]. In this paper, the frequency intensity W is used as the weight reference for the adaptive fusion of the trend component and the fluctuation component. The specific process is shown in Equations (20)–(24). It is worth noting that the frequency intensity is only used as a frequency-aware empirical dynamic weighting strategy, rather than an optimal weight estimation with strict theoretical guarantees.

W = Softmax (A_{f_{1}}, \dots, A_{f_{k}})

(20)

s = mean(W)

(21)

Learnable parameters w_{1} and w_{2} are introduced to allocate the base weights:

w = Softmax (w_{1}, w_{2})

(22)

The frequency intensity information is then used to construct the dynamic weights:

S = Softmax (A_{f_{1}}, \dots, A_{f_{k}})

(23)

s = mean(S)

(24)

The fusion weights are then α = w_{1} \cdot s and β = w_{2} \cdot (2 - s).

Finally, we express the fusion result as:

Y = \frac{α \cdot Y_{trend} + β \cdot Y_{vol}}{α + β}

(25)

The predicted output \hat{Y} \in R^{B \times T \times 1} of the model corresponds only to the closing index. Since the input is standardized before training, de-standardization is required:

{\hat{Y}}_{final} = \hat{Y} \cdot σ + μ

(26)
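The fusion and de-standardization steps of Equations (22)–(26) can be traced with a small numerical sketch. The function names and all input values are hypothetical. The text does not state the axes over which the softmax and the mean are taken; the sketch takes both over the k amplitudes of a single window, in which case s equals 1/k by construction, so an implementation in which s varies with the spectrum would need a different axis choice.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fusion_weights(w_logits, amplitudes):
    """Return (alpha, beta, s) from learnable logits and FFT amplitudes."""
    w1, w2 = softmax(w_logits)               # base weights, Eq. (22)
    intensity = softmax(amplitudes)          # Eq. (23)
    s = sum(intensity) / len(intensity)      # Eq. (24); assumed axis, see lead-in
    return w1 * s, w2 * (2 - s), s


def fuse(y_trend, y_vol, alpha, beta):
    """Normalized weighted combination of the two branch forecasts, Eq. (25)."""
    return [(alpha * t + beta * v) / (alpha + beta) for t, v in zip(y_trend, y_vol)]


def destandardize(y, mu, sigma):
    """Map standardized forecasts back to the price scale, Eq. (26)."""
    return [v * sigma + mu for v in y]


# Hypothetical values: two dominant periods, equal learnable logits.
alpha, beta, s = fusion_weights([0.0, 0.0], [3.0, 1.0])
y_norm = fuse([0.2, 0.4], [-0.2, 0.0], alpha, beta)
y_final = destandardize(y_norm, mu=3000.0, sigma=400.0)
```

With these inputs s = 0.5, α = 0.25, and β = 0.75, so the fused standardized forecast is [−0.1, 0.1], which maps back to [2960, 3040] on the price scale.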

3.4. The Process of the Proposed Model

This study constructs a hybrid stock index forecasting model that combines adaptive frequency-domain decomposition with an enhanced Transformer encoder. The model first uses the fast Fourier transform to perform an adaptive frequency-domain decomposition of the price sequence.

The enhanced Transformer encoder then produces the forecasts, and the frequency intensity serves as the weight reference for adaptively fusing the trend component and the fluctuation component, yielding a refined model of the financial time series.

Algorithm 1 gives the pseudocode of the overall procedure of the FAMS-Transformer hybrid model, which combines the adapted Transformer encoder with the FFT-based feature extraction scheme. Figure 4 shows the flowchart of the model.

Algorithm 1: Overall Forecasting Procedure

Input: Historical multivariate time series X \in R^{B \times L \times D}; prediction horizon H; number of dominant periods k; patch length P; stride S

Output: Forecasted sequence \hat{Y} \in R^{B \times H \times C}

Input normalization
    μ \leftarrow Mean(X)
    σ \leftarrow \sqrt{Var(X)} + ϵ
    X_{norm} \leftarrow (X - μ) / σ

Frequency-adaptive decomposition
    P, A \leftarrow AdaptivePeriodExtraction(X_{norm}, k)
    p_{trend} \leftarrow \max(P)
    X_{trend} \leftarrow MovingAvgDynamic(X_{norm}, p_{trend})
    X_{vol} \leftarrow X_{norm} - X_{trend}

Dual-branch forecasting
    {\hat{Y}}_{trend} \leftarrow PatchTSTBranch(X_{trend}, P, S)
    {\hat{Y}}_{vol} \leftarrow PatchTSTBranch(X_{vol}, P, S)

Frequency-aware fusion
    w \leftarrow Softmax(w_{fusion})
    s \leftarrow Mean(Softmax(A))
    w_{trend} \leftarrow w_{1} \cdot s
    w_{vol} \leftarrow w_{2} \cdot (2 - s)
    {\hat{Y}}_{norm} \leftarrow \frac{w_{trend} ⊙ {\hat{Y}}_{trend} + w_{vol} ⊙ {\hat{Y}}_{vol}}{w_{trend} + w_{vol} + ϵ}

De-normalization
    \hat{Y} \leftarrow {\hat{Y}}_{norm} ⊙ σ + μ
    return \hat{Y}
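The frequency-adaptive decomposition stage of Algorithm 1 can be sketched for one univariate window. This is a sketch under stated assumptions rather than the authors' implementation: a naive DFT stands in for the FFT, a period is taken as the window length divided by the frequency index, and the moving average pads both ends by repeating the edge values. The function bodies behind the names `AdaptivePeriodExtraction` and `MovingAvgDynamic` are not given in the text, so the versions below are hypothetical.

```python
import cmath
import math


def dft_amplitudes(x):
    """Amplitude of each frequency bin f = 1..L//2 (naive O(L^2) DFT)."""
    n = len(x)
    amps = {}
    for f in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * f * t / n) for t, v in enumerate(x))
        amps[f] = abs(coeff)
    return amps


def adaptive_period_extraction(x, k):
    """Return the periods and amplitudes of the k strongest frequency bins."""
    amps = dft_amplitudes(x)
    top = sorted(amps, key=amps.get, reverse=True)[:k]
    return [len(x) // f for f in top], [amps[f] for f in top]


def moving_avg_dynamic(x, window):
    """Moving average of the given window, edge-padded to keep the length."""
    left, right = (window - 1) // 2, window // 2
    padded = [x[0]] * left + list(x) + [x[-1]] * right
    return [sum(padded[t:t + window]) / window for t in range(len(x))]


# Synthetic window: constant level 5 plus cycles of period 8 and period 16.
L = 64
x = [5 + math.sin(2 * math.pi * t / 8) + 0.5 * math.sin(2 * math.pi * t / 16)
     for t in range(L)]

periods, amps = adaptive_period_extraction(x, k=2)
p_trend = max(periods)                       # longest dominant period sets the window
x_trend = moving_avg_dynamic(x, p_trend)
x_vol = [a - b for a, b in zip(x, x_trend)]  # fluctuation = series minus trend
```

On this synthetic window the two dominant periods are 8 and 16, and averaging over the longer period removes both cycles away from the edges, leaving the constant level as the trend.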

4. Experimental Design

4.1. Data Experiment Settings

This study selects three representative indices of the Chinese stock market for investigation: the Shanghai Composite Index (SSE), the Shenzhen Component Index (SZSE), and the SME 100 Index (SME 100). The closing price is taken as the forecast target, while other trading data (such as opening price, price change, and trading volume) serve as feature inputs for model training and technical indicator construction. The data span from 10 April 2014 to 19 March 2025. Trading suspension dates are excluded to avoid interference with the model results. All three datasets are split into training, validation, and testing sets in a ratio of 6:1:3. Figure 5 illustrates the trends of the selected datasets; panels (a), (b), and (c) show the SSE, the SZSE, and the SME 100, respectively.

Table 1 presents the descriptive statistics of the three stock price indices. The three indices form a sharp contrast in terms of volatility characteristics. Specifically, the SSE exhibits the lowest volatility (Std = 404.88) with a leptokurtic and heavy-tailed distribution (Kurtosis = 3.679). The SZSE displays the widest absolute fluctuation range (Std = 2038.12), with its kurtosis approaching that of a normal distribution (0.009). The SME 100 shows the highest relative volatility, characterized by a platykurtic and thin-tailed distribution (Kurtosis = −0.203) and a more dispersed price distribution. The skewness values of all three indices are positive, indicating right-skewed price distributions.

Recent research indicates that the stock market is a highly complex stochastic system that is also influenced by a variety of external factors [48,49,50]. Accordingly, we incorporate a range of external indicators into the feature inputs, including carbon trading market prices, crude oil market prices, gold market prices, global stock market prices, foreign exchange market prices, macroeconomic activity, and sentiment indicators. The specific names and symbols of each feature are listed in Table 2.

4.2. Evaluation Metrics

This paper employs the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE, %), and the coefficient of determination (R²) as the evaluation metrics for the model’s forecasting performance, whose corresponding formulas are defined as follows:

MAE = \frac{1}{N} \sum_{i = 1}^{N} |Y_{i} - {\hat{Y}}_{i}|

(27)

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (Y_{i} - {\hat{Y}}_{i})^{2}}

(28)

MAPE = \frac{1}{N} \sum_{i = 1}^{N} |\frac{Y_{i} - {\hat{Y}}_{i}}{Y_{i}}| \times 100 %

(29)

R^{2} = 1 - \frac{\sum_{i = 1}^{N} (Y_{i} - {\hat{Y}}_{i})^{2}}{\sum_{i = 1}^{N} (Y_{i} - \bar{Y})^{2}}

(30)

where Y_{i} denotes the true value, {\hat{Y}}_{i} denotes the predicted value, \bar{Y} is the mean of the true values, and N is the number of samples. MAE, RMSE, and MAPE measure the deviation between the true and predicted values; smaller values indicate that the predictions are closer to the true values. Among them, MAE and RMSE measure the absolute magnitude of the prediction error, which directly reflects the fitting accuracy of the model to the trend and fluctuation components; MAPE normalizes the error in percentage form, which is convenient for horizontal comparison. R² measures the goodness of fit of the model relative to the constant mean baseline; a value closer to 1 indicates a better fit.
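Equations (27)–(30) translate directly into code. The sketch below is a plain-Python illustration with hypothetical function names and toy values; it assumes that no true value is zero, since MAPE is undefined in that case.

```python
import math


def mae(y, y_hat):
    """Mean absolute error, Eq. (27)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean square error, Eq. (28)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y, y_hat):
    """Mean absolute percentage error in percent, Eq. (29); requires y != 0."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def r2(y, y_hat):
    """Coefficient of determination relative to the constant-mean baseline, Eq. (30)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred),
          mape(y_true, y_pred), r2(y_true, y_pred))
```

Note that R² becomes negative whenever the residual sum of squares exceeds the total sum of squares, that is, when a model predicts worse than the constant mean of the true values.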

4.3. Baseline Models

To verify the effectiveness of the proposed framework, this paper selects fifteen time series prediction models as baselines for comparison and divides them into two categories.

The first category comprises general sequence modeling methods. The long short-term memory network (LSTM) selectively retains and forgets information through a gating mechanism, which alleviates the vanishing-gradient problem that traditional recurrent neural networks face on long sequences [51]. Support vector regression (SVR) uses the kernel trick to map data into a high-dimensional feature space and performs linear regression there, thereby fitting a nonlinear relationship in the original space [9]. The standard Transformer processes sequence data in parallel and captures long-range dependencies with a self-attention mechanism and an encoder–decoder structure [10]. Models of this kind are not designed specifically for time series prediction and lack an effective adaptation mechanism for the non-stationary characteristics of financial data.

The second category covers long-horizon time series prediction models proposed in recent years, which enhance the ability to capture complex temporal patterns through frequency-domain transformation, multi-period decomposition, or lightweight design. Building on the Transformer, Autoformer uses a built-in series decomposition module to progressively separate trend-cyclical and seasonal components, replaces self-attention with an autocorrelation mechanism, and mines period-based subsequence dependencies through the fast Fourier transform [42]. Informer improves the efficiency of long-sequence prediction through three innovations: ProbSparse self-attention, self-attention distilling, and a generative decoder [37]. FEDformer combines seasonal-trend decomposition with a frequency-enhanced attention module and maintains linear complexity by randomly selecting frequency components, which improves the distribution consistency between the predicted and real sequences [21]. iTransformer inverts the modeling dimension of the traditional Transformer: it treats the entire sequence of each variable as a token, learns the correlation between variables with self-attention, and captures univariate temporal dynamics with a feed-forward network [52]. PatchTST divides long sequences into fixed-length patches as tokens and models each variable independently, capturing local temporal patterns efficiently with a shorter attention window [43]. TimesNet uses the FFT to convert one-dimensional time series into two-dimensional tensors in order to capture multi-period patterns, and uses an Inception module to extract variations within and between periods [20]. FreTS uses a frequency-domain MLP architecture that converts the time-domain signal to the frequency domain for global learning and exploits the energy compaction property of the Fourier transform to focus on the key frequency components [22]. 
DLinear decomposes the original time series into trend and seasonal components by a moving average operation, and then sums them after simple linear layer prediction [53]. Based on Koopman theory, Koopa uses a Fourier filter to decompose non-stationary time series into time-invariant and time-varying components, and learns dynamics hierarchically through a modular Koopman predictor [54]. LightTS uses two downsampling strategies, interval sampling and continuous sampling, to shorten the length of the input sequence and then process it with MLP, which significantly improves the efficiency without losing accuracy [55]. FiLM uses Legendre polynomials to compress historical information into memory units, and introduces Fourier transform to enhance the capture of frequency-domain modes, thereby reducing noise interference [56]. TiDE (Time-series Dense Encoder), proposed by Google in 2023, is a long-horizon forecasting model built upon a multi-layer perceptron (MLP) architecture. It combines a projection of the historical series with an encoding of future covariates through residual connections, achieving predictive performance comparable to or better than Transformer-based models with a streamlined fully connected design [57].

4.4. Experiment Design and Objectives

This study systematically evaluates the predictive performance and design rationale of FAMS-Transformer across five dimensions:

(1): Multi-step prediction accuracy comparison against baseline models. FAMS-Transformer is compared with 15 baselines—LSTM, SVR, Transformer, Autoformer, Informer, FEDformer, TimesNet, FreTS, DLinear, Koopa, LightTS, FiLM, PatchTST, iTransformer, and TiDE—across four prediction horizons (1 step, 5 steps, 10 steps, and 15 steps), to assess whether it outperforms all baselines in every setting.
(2): Cross-market validation. The core comparison is replicated on the S & P 500 index under identical prediction settings to examine whether the performance advantage of FAMS-Transformer generalizes across different market environments.
(3): Significance testing. Paired t-tests and Wilcoxon signed-rank tests are conducted on per-sample prediction errors to determine whether the performance advantage of FAMS-Transformer over each baseline is statistically significant.
(4): Volatility-regime analysis. Test samples within each dataset are partitioned into low-volatility (below the first quartile, Q1), medium-volatility (between Q1 and the median, Q2), and high-volatility (above Q2) regimes based on historical volatility. Model performance is evaluated independently within each regime to verify whether FAMS-Transformer maintains its advantage across varying volatility conditions.
(5): Ablation experiments. Three progressively structured groups of ablation variants are designed to quantify the independent contribution of each core component and to validate specific design choices.

Module necessity ablation. The intermediate convolutional module is removed (w/o Conv), the multi-scale decomposition module is removed (w/o Decomp), and both are removed simultaneously (w/o Both). Together with the full model, these four configurations quantify the independent and joint contributions of the two modules.

Local-feature mechanism design ablation. The depthwise separable convolution adopted in this paper is replaced with standard convolution (Standard Conv1D), dilated depthwise separable convolution (Dilated Depthwise Conv1D), and the original Transformer with the intermediate convolutional module entirely removed (Original Transformer), to verify the rationale for adopting depthwise separable convolution.

Period-selection strategy ablation. The adaptive FFT-based period identification adopted in this paper (Adaptive-period) is replaced with a uniform fixed period applied to all windows (Fixed-period) and a randomly assigned period per window (Random-period), to examine whether the adaptive period strategy yields a genuine improvement in predictive accuracy.
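The volatility-regime partition described in item (4) can be sketched as follows. The function name, the treatment of values that fall exactly on a quartile boundary, and the toy volatility values are assumptions; how the historical volatility of each test sample is computed is not specified in this section, so it is taken as a given input.

```python
import statistics


def assign_regimes(volatility):
    """Label each sample low / medium / high using the first quartile and the median."""
    q1, q2, _ = statistics.quantiles(volatility, n=4)  # cut points Q1, Q2, Q3
    regimes = []
    for v in volatility:
        if v < q1:
            regimes.append("low")
        elif v <= q2:
            regimes.append("medium")
        else:
            regimes.append("high")
    return regimes


# Hypothetical per-sample historical volatilities.
vols = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
regimes = assign_regimes(vols)
```

Under this partition the low and medium regimes each hold about a quarter of the test samples and the high regime holds the remaining half; the evaluation metrics are then computed separately on each subset.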

5. Experimental Results and Discussion

5.1. Comparison with Baseline Models

Table 3 summarizes four evaluation metrics (MAE, RMSE, MAPE, and R²) for FAMS-Transformer and the 15 baseline models on three datasets, the Shanghai Composite Index (SSE), the Shenzhen Component Index (SZSE), and the SME 100 Index (SME 100), at four prediction horizons of 1, 5, 10, and 15 steps. Bold marks the best value of each metric, and underlining marks the second-best value.

Figure 6 plots the MAE, RMSE, and MAPE of all models for one-step prediction as line charts; panels (a), (b), and (c) correspond to the SSE, SZSE, and SME 100 datasets, respectively. Single-step prediction is the most basic time series prediction task: it reflects the fitting ability of a model most directly and is not affected by error accumulation. Moreover, Table 3 shows that in the multi-step experiments (5, 10, and 15 steps) the error distribution of each model closely resembles the single-step case. To avoid redundancy, only the line charts for single-step prediction are shown here.

Figure 7, Figure 8 and Figure 9 show the predicted and true values of each model on the SSE, SZSE, and SME 100 datasets, respectively. Panels (a), (b), (c), and (d) in each figure correspond to prediction horizons of 1, 5, 10, and 15 steps. To show the fitting differences between architectures clearly, only five representative baselines (DLinear, FEDformer, iTransformer, PatchTST, and TimesNet) are compared with FAMS-Transformer.

Table 3 and Figure 6, Figure 7, Figure 8 and Figure 9 show that FAMS-Transformer achieves the best or near-best prediction accuracy at every prediction horizon and on every dataset. Where it is not first, the gap is small: for 15-step prediction on the SME 100 dataset, iTransformer reaches R² = 0.930 against 0.928 for FAMS-Transformer, which does not constitute a substantial difference. As the prediction horizon extends from 1 step to 15 steps, the accuracy of all models decreases, but the rate of decay differs markedly. FAMS-Transformer decays the most slowly: its R² decreases from 0.959 to 0.730 (a drop of 0.229) on SSE and only from 0.990 to 0.928 (a drop of 0.062) on SME 100. In sharp contrast, the first category of baselines degrades severely or even collapses: the R² of SVR is negative at all four horizons on SSE, and the R² of LSTM for one-step prediction on SME 100 is only 0.084. This contrast suggests that the adaptive decomposition and enhanced encoding mechanism gives FAMS-Transformer more robust long-horizon modeling capability, whereas traditional models without time-series-specific adaptation struggle to maintain effective feature extraction in long-horizon financial forecasting.

Within the second category of long-horizon prediction models, iTransformer and PatchTST stand out. iTransformer is highly competitive at long horizons and achieves the best R² for 15-step prediction on SZSE and SME 100. PatchTST performs strongly at short horizons (one-step MAE = 27.637 on SSE, second only to FAMS-Transformer), but its relative ranking declines as the horizon increases. The performance of FreTS on SZSE is also noteworthy: its 10-step R² (0.924) ranks first among all models, reflecting the generalization potential of the frequency-domain MLP architecture under specific market structures. The performance of TimesNet varies considerably across datasets (one-step R² = 0.771 on SSE versus 0.938 on SME 100), suggesting that its fixed period identification strategy adapts poorly to different market volatility characteristics.

Based on the above analysis, FAMS-Transformer achieves the best or near-best performance in all prediction scenarios. In the most challenging 15-step prediction, its R² remains between 0.730 and 0.928. We attribute this advantage to the dynamic identification of the dominant period and the explicit trend–fluctuation decoupling performed by the frequency-adaptive decomposition mechanism. The trend branch provides a robust base for long-range prediction, and the fluctuation branch captures local dynamics. The two are adaptively weighted by frequency-aware fusion, which helps the model maintain strong prediction accuracy in long-horizon and high-fluctuation scenarios.

5.2. Verification of S & P 500 Data Set

Table 4 reports four evaluation metrics (MAE, RMSE, MAPE, and R²) for FAMS-Transformer and the 15 baseline models on the S & P 500 dataset at four prediction horizons of 1, 5, 10, and 15 steps. Bold marks the best value of each metric, and underlining marks the second-best value. Figure 10 shows the predicted and true values of each model on the S & P 500 dataset; panels (a), (b), (c), and (d) correspond to prediction horizons of 1, 5, 10, and 15 steps, respectively. To show the fitting differences between architectures clearly, only five representative baselines (DLinear, FEDformer, iTransformer, PatchTST, and TimesNet) are compared with FAMS-Transformer.

Table 4 and Figure 10 show that FAMS-Transformer predicts the S & P 500 accurately: all four metrics are strong at all four horizons, and its lead over the baselines is larger than in the A-share experiments. The results suggest that the adaptive decomposition and enhanced encoding mechanism of FAMS-Transformer remains effective in the US market, and that its performance is robust in a mature market with different volatility characteristics and trading mechanisms. Nevertheless, these results do not fully rule out the possibility of dataset-specific adaptation, and the generalizability of the model remains bounded by the coverage of the current test data.

Additionally, we evaluate directional accuracy (DA)—the proportion of samples for which the predicted direction of price change (up or down) matches the actual direction—as a supplementary metric from the perspective of investment practice. DA values for all models fall within the narrow range of 0.48–0.53, with FAMS-Transformer marginally outperforming most baselines but showing limited advantage. It should be noted that the proposed method is primarily designed for point forecasting of financial time series, and improvements in MAE, RMSE, and R² do not necessarily translate into directional prediction advantages. This result is consistent with the near-random-walk nature of short-term price movements: under the efficient market framework, historical price information alone is insufficient to reliably predict the direction of price changes. Meaningful improvement in DA likely requires the incorporation of non-price information sources such as news text and policy signals. Complete DA results are provided in Appendix A, Table A1.
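A minimal sketch of the directional accuracy computation is given below. It assumes that the direction of each forecast is measured relative to the last observed value before the forecast, a reference point that the text does not state explicitly; the function name and the toy values are hypothetical.

```python
def directional_accuracy(previous, actual, predicted):
    """Share of samples whose predicted direction of change matches the actual one.

    previous: last observed value before each forecast (assumed reference point).
    """
    def sign(v):
        return (v > 0) - (v < 0)

    hits = sum(sign(a - p0) == sign(p - p0)
               for p0, a, p in zip(previous, actual, predicted))
    return hits / len(actual)


# Toy values for illustration only.
previous = [10.0, 11.0, 10.5]
actual = [11.0, 10.5, 12.0]      # up, down, up
predicted = [10.5, 11.2, 11.0]   # up, up, up
da = directional_accuracy(previous, actual, predicted)
```

In this toy case the second forecast has a small point error (11.2 against 10.5) but the wrong direction, which illustrates why low MAE or RMSE does not by itself imply high DA.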

5.3. Significance Testing

Table 5 summarizes the Wilcoxon signed-rank test results between FAMS-Transformer and the 15 baseline models across four datasets and four prediction horizons, and Table 6 reports the corresponding paired t-test results. In both tables, ***, **, and * denote significance at the 0.01, 0.05, and 0.1 levels, respectively, and (ns) indicates non-significance.

Across all 240 comparisons (15 baselines × 4 datasets × 4 prediction horizons) at α = 0.05, FAMS-Transformer significantly outperforms the baselines in 91.3% of cases under the Wilcoxon signed-rank test and in 88.8% under the paired t-test. The two tests agree in the vast majority of comparisons. The few cases of disagreement typically arise because the t-test is sensitive to departures from normality, whereas the Wilcoxon test is more robust to distributional shape; that the latter still indicates significance suggests that the error distribution of FAMS-Transformer is indeed systematically better than that of the baseline.

The 21 cases that are non-significant under the Wilcoxon test (8.7% of 240) exhibit a clear hierarchical pattern across model categories. First, non-significance relative to the strongest baselines, PatchTST and iTransformer, occurs exclusively at a prediction length of pl = 1, predominantly on the SSE dataset. One possible interpretation is that at a one-step horizon, the prediction target is largely determined by the local morphology of the most recent time steps, and multi-scale periodic structure has yet to unfold. The patching mechanism of PatchTST and the inverted attention of iTransformer are naturally suited to capturing such short-range local patterns. Second, non-significance relative to Autoformer is concentrated at pl = 5. Autoformer incorporates a built-in series decomposition module that, like the frequency-domain decomposition of FAMS-Transformer, structures the input through trend-cycle separation. At moderate prediction horizons, the fixed-scale moving-average decomposition used by Autoformer can still adequately capture the dominant periodic components, and the performance gap between the two decomposition strategies falls below the threshold of statistical discriminability. This observation, in turn, suggests that the decomposition strategy itself may possess general effectiveness.

It should be noted that the above discussion on the causes of non-significant results remains a possible interpretation based on observed experimental patterns, rather than a conclusion directly verified by additional causal experiments. The primary role of the significance analysis is to identify the prediction horizons and baseline comparisons under which the proposed method maintains a more stable advantage; the specific causes of non-significant results require further investigation through dedicated controlled experiments.

5.4. Volatility-Regime Analysis

Taking one-step-ahead prediction as an example, Table 7 reports the R² of each model across three volatility regimes on all four datasets. Complete volatility-regime analysis results for the Shanghai Composite Index (SSE), Shenzhen Component Index (SZSE), SME 100 Index (SMESE), and S & P 500 Index (SP500), covering all four prediction horizons and four evaluation metrics, are provided in Appendix A Table A2, Table A3, Table A4 and Table A5, respectively. As shown in Table 7, FAMS-Transformer ranks first in R² in 11 out of the 12 regime-by-dataset scenarios, with the sole exception being the medium-volatility regime on SZSE (R² = 0.993, ranked second, trailing PatchTST by only 0.001). This result demonstrates that the performance advantage of FAMS-Transformer is not confined to any specific volatility environment—even in the low-volatility regime, where the decomposition strategy has the least to act upon, the model remains the best among all 16 models. Moreover, its R² in the low-volatility regime persists at the same level as, or slightly above, that in the medium- and high-volatility regimes (e.g., SMESE: low = 0.9944, medium = 0.9944, high = 0.9840; SP500: low = 0.998, medium = 0.997, high = 0.993), thereby ruling out the concern that the model may overfit to high-volatility patterns.

5.5. Ablation Experiment

Table 8 reports the evaluation metrics of the full model and its ablation variants across three datasets (SSE, SZSE, and SMESE) at different prediction horizons. Table 9 compares the parameter count (Params) and computational cost (FLOPs) of four architectural variants—the depthwise separable convolution adopted by FAMS-Transformer, standard convolution (Standard Conv1D), dilated depthwise convolution (Dilated Depthwise Conv1D), and the original Transformer without the intermediate convolutional module (Original Transformer)—across the four prediction horizons.

The analysis proceeds from three perspectives: module necessity, local-feature mechanism design, and period-selection strategy.

(1): Module necessity ablation. As shown in Table 8, the full model outperforms all module-removal variants across the three datasets, confirming the complementary relationship between the two core modules. Taking SSE 15-step prediction as an example: the full model achieves R² = 0.730; removing the decomposition module (w/o Decomp) reduces it to 0.718; removing the intermediate convolution (w/o Conv) reduces it to 0.726; and removing both (w/o Both) reduces it to 0.715. The independent R² increment of the decomposition module (ΔR² ≈ 0.012) and that of the convolutional module (ΔR² ≈ 0.004) are each numerically modest. Their joint contribution (ΔR² = 0.015, full model vs. w/o Both) exceeds either independent increment and is close to their sum (0.012 + 0.004 = 0.016), indicating that the two modules make largely additive, complementary contributions: the decomposition module provides structurally separated trend and fluctuation components, upon which the convolutional module extracts local patterns. Among these, the decomposition module provides a modest but consistent contribution across most settings, and the model’s final performance should be understood as the result of the joint operation of multi-scale decomposition, local feature extraction, period selection, and the fusion mechanism.
(2): Local-feature mechanism design ablation. As shown in Table 9, depthwise separable convolution yields better average error metrics than standard convolution across all three datasets while requiring fewer parameters and lower FLOPs, suggesting that the channel-mixing capacity of standard convolution does not translate into improved predictive performance in this setting and instead introduces additional computational overhead. Dilated depthwise convolution performs comparably to, but does not consistently surpass, the standard depthwise separable version, indicating that simply expanding the receptive field does not reliably improve forecasting performance on financial time series.
(3): Period-selection strategy ablation. As shown in Table 8, the adaptive period strategy adopted in this paper yields R² values nearly identical to those of the fixed period strategy (SSE pl = 1: 0.9592 vs. 0.9593), whereas random period assignment leads to a systematic performance degradation (a drop of 0.0020 on SSE pl = 1). This result indicates that the period information extracted by adaptive FFT captures genuine periodic structure rather than passively fitting noise specific to individual windows. If the adaptive mechanism were merely fitting noise, randomizing the period would not cause systematic degradation and cases in which random outperforms adaptive would be expected. The core value of the adaptive period strategy lies in its ability to automatically achieve strong predictive accuracy in a data-driven manner: the period parameter is determined entirely by the frequency-domain characteristics of each input window, requiring no manual specification.

In addition, to examine whether the dominant periods identified by the adaptive mechanism vary randomly across adjacent windows, we identify the top two (Top-2) dominant periods for each test sample via FFT using only its historical input window (excluding data from the prediction horizon), and compute the Jaccard similarity between the period sets selected for each pair of adjacent test windows:

J (P_{t}, P_{t - 1}) = \frac{|P_{t} \cap P_{t - 1}|}{|P_{t} \cup P_{t - 1}|}

(31)

A Jaccard value closer to 1 indicates greater consistency between the period sets selected from adjacent windows. Table 10 reports the Jaccard similarity statistics of the dominant period sets between adjacent test windows. As shown in Table 10, the mean Jaccard similarity across all four datasets exceeds 0.92, with the median reaching 1.0, indicating a high degree of consistency in the dominant periods identified across adjacent windows. It should be noted that this analysis does not, in a strict sense, completely rule out the possibility of spurious local periodicity fitting. The aim of the present analysis is to provide supplementary empirical evidence from the perspective of period continuity, building upon the existing ablation comparisons.
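The period-continuity check of Equation (31) can be sketched as follows. The function name, the convention for two empty sets, and the toy period sets are illustrative assumptions rather than values from Table 10.

```python
import statistics


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of periods."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


# Hypothetical Top-2 dominant period sets for four consecutive test windows.
period_sets = [{5, 10}, {5, 10}, {5, 20}, {5, 20}]
similarities = [jaccard(prev, cur) for prev, cur in zip(period_sets, period_sets[1:])]
mean_j = statistics.mean(similarities)
median_j = statistics.median(similarities)
```

With Top-2 sets, a single changed period lowers the similarity of that pair from 1 to 1/3, so a mean above 0.92 with a median of 1.0 implies that the large majority of adjacent windows select exactly the same two periods.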

6. Conclusions

This paper proposes a Frequency-Aware Adaptive Multi-Scale Decomposition Transformer forecasting framework (FAMS-Transformer). Multi-step forecasting experiments are conducted at four horizons ranging from one to 15 steps on four representative market indices (the Shanghai Composite Index, the Shenzhen Component Index, the SME 100 Index, and the S&P 500 Index), with comparisons against fifteen baseline models: LSTM, SVR, Transformer, Autoformer, Informer, iTransformer, PatchTST, TimesNet, FreTS, DLinear, Koopa, LightTS, FEDformer, TiDE, and FiLM. The experimental results demonstrate that FAMS-Transformer can effectively separate and model the heterogeneous components inherent in stock index price series (long-term trends, cyclical fluctuations, and random noise) and achieves superior predictive performance both at the most challenging 15-step forecast horizon and on high-volatility indices such as the SZSE Component Index and the SME 100 Index. These findings indicate that the synergistic design of adaptive decomposition and enhanced encoding can effectively address two core limitations in the prevailing decomposition–forecasting paradigm: the lack of adaptivity in decomposition strategies and the insufficient capacity of Transformer encoders to capture local temporal dynamics. Although the performance differences among alternative local modeling approaches in the ablation experiments are generally modest, the removal of the frequency-adaptive decomposition module leads to a discernible decline in prediction accuracy across all three datasets, thereby validating the effectiveness of the FFT-based dynamic identification of dominant periodic components and the consequent adaptive trend–fluctuation decoupling. This mechanism makes the decomposition process depend entirely on the intrinsic periodic characteristics of the price data.
Compared with the fixed-scale or globally uniform frequency selection strategies commonly adopted in existing studies, the proposed approach can more flexibly adapt to variations in periodic structure under different market regimes. Furthermore, the depthwise separable convolution reduces computational overhead while improving predictive accuracy, striking a favorable balance among prediction performance, model complexity, and architectural simplicity.
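To make the computational argument for the depthwise separable convolution concrete, the sketch below factorizes a one-dimensional convolution into a per-channel (depthwise) filter followed by a 1×1 (pointwise) channel mix, and compares its weight count with that of a standard convolution. It is an illustrative sketch only: the function names, the zero "same" padding, the odd kernel size, the omission of bias terms, and the example width of 64 channels with kernel size 3 are assumptions, not the configuration reported in this paper.

```python
def depthwise_separable_conv1d(x, depthwise, pointwise):
    """Apply a depthwise separable 1-D convolution with zero 'same' padding.

    x:         list of T time steps, each a list of C channel values
    depthwise: C kernels of odd length K, one per channel
    pointwise: C x C_out mixing matrix (the 1x1 convolution)
    """
    n_steps, n_channels = len(x), len(x[0])
    kernel = len(depthwise[0])
    pad = kernel // 2
    # Depthwise step: each channel is filtered by its own kernel.
    filtered = [[0.0] * n_channels for _ in range(n_steps)]
    for t in range(n_steps):
        for c in range(n_channels):
            acc = 0.0
            for j in range(kernel):
                s = t + j - pad
                if 0 <= s < n_steps:  # positions outside the window are zero
                    acc += depthwise[c][j] * x[s][c]
            filtered[t][c] = acc
    # Pointwise step: channels are mixed independently at every time step.
    n_out = len(pointwise[0])
    return [
        [sum(filtered[t][c] * pointwise[c][o] for c in range(n_channels))
         for o in range(n_out)]
        for t in range(n_steps)
    ]


def param_counts(channels_in, channels_out, kernel):
    """Weight counts (biases ignored) of a standard vs. a separable conv."""
    standard = channels_in * channels_out * kernel
    separable = channels_in * kernel + channels_in * channels_out
    return standard, separable


# Hypothetical layer width and kernel size, chosen only for illustration.
standard, separable = param_counts(64, 64, 3)
```

Under these assumed sizes the standard convolution has 12,288 weights and the separable form 4,288, roughly a threefold reduction; the saving grows with the kernel size.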

Nevertheless, several limitations of this study warrant acknowledgment. First, the current model relies primarily on historical price data and structured external indicators, without yet incorporating unstructured information such as news text and social media sentiment. This limits the model’s capacity to respond to policy shocks and market turbulence, and is also an important reason for the limited improvement in Directional Accuracy. Future work may integrate tools such as financial sentiment analysis to construct a multimodal input framework, or design classification-oriented branches for directional prediction, so as to concurrently improve point forecasting accuracy and directional judgment. Second, the model involves the coordinated operation of multiple components—including adaptive decomposition, the attention mechanism, and fusion weighting—resulting in a complex decision process. The lack of interpretability constitutes a substantive limitation of the present method. Under the frequency-domain decomposition architecture, standard SHAP analysis and attention weight visualization cannot readily penetrate to the upstream frequency-domain operations; this challenge is a common difficulty faced by this technical approach. Although the current ablation experiments have covered module necessity and period selection strategies, the independent contributions of the fusion weighting mechanism and threshold sensitivity have not yet been sufficiently isolated. Future research will prioritize the adaptation of SHAP to frequency-domain attribution, the temporal alignment of attention weights with financial events for visualization, and dynamic visualization of fusion weights, so as to progressively enhance the transparency and trustworthiness of the model in financial applications. 
Furthermore, although this study integrates various external market indicators—including carbon trading prices, crude oil prices, and macroeconomic variables—into a unified input space, the marginal contribution of each category of external factors to prediction accuracy has not been separately quantified. Subsequent work may disentangle the differential impacts of distinct factors across diverse market environments through systematic controlled experiments, thereby providing a basis for the refined selection of input features. Finally, the absence of time-series foundation models such as TimesFM from the baseline comparison constitutes a limitation of the present study. In future work, we will consider systematically comparing the proposed method with foundation models including TimesFM under zero-shot, fine-tuning, or unified pretraining evaluation frameworks.

Author Contributions

Conceptualization, G.H. and T.Z.; methodology, G.H. and T.Z.; software, G.H. and T.Z.; validation, X.Z. and T.Z.; formal analysis, H.Z.; investigation, H.Z.; resources, G.H.; data curation, X.Z. and T.Z.; writing—original draft preparation, X.Z.; writing—review and editing, H.Z. and G.H.; visualization, X.Z.; supervision, G.H. and T.Z.; project administration, G.H. and T.Z.; funding acquisition, T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Fujian Province, grant number 2026J008100.

Data Availability Statement

The data used in this study were obtained from the Wind Financial Database under institutional subscription and are not publicly available due to commercial license restrictions. Reasonable requests for non-confidential processed data may be directed to the corresponding author.

Acknowledgments

The authors would like to thank the open-source community for the computational frameworks that supported the empirical analysis.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ARIMA	Autoregressive Integrated Moving Average
ASDH	Adaptive Selection Decomposition Hybrid
BiLSTM	Bidirectional Long Short-Term Memory
CEEMDAN	Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
CoML	Committee of Multi-Scale Nonlinear Learning
CPO	Crested Porcupine Optimizer
EEMD	Ensemble Empirical Mode Decomposition
EMD	Empirical Mode Decomposition
FAMS	Frequency-Aware Adaptive Multi-Scale
FEDformer	Frequency Enhanced Decomposed Transformer
FFT	Fast Fourier Transform
FiLM	Frequency Improved Legendre Memory
GABP	Genetic Algorithm Back Propagation
GARCH	Generalized Autoregressive Conditional Heteroskedasticity
IMF	Intrinsic Mode Function
KNN	K-Nearest Neighbors
LSTM	Long Short-Term Memory
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
MDCM	Multi-Scale Dilated Convolution Module
MLP	Multilayer Perceptron
MSPE	Mean Square Percentage Error
R²	Coefficient of Determination
RMSE	Root Mean Square Error
SME 100	Small and Medium Enterprises 100 Index
SSE	Shanghai Stock Exchange Composite Index
SVR	Support Vector Regression
SZSE	Shenzhen Stock Exchange Component Index
VMD	Variational Mode Decomposition
WT	Wavelet Transform

Appendix A

Table A1. Directional accuracy (DA) of all models across four datasets and four prediction horizons.

Model	SSE				SP500				SZSE				SMESE
Model	1 Step	5 Steps	10 Steps	15 Steps	1 Step	5 Steps	10 Steps	15 Steps	1 Step	5 Steps	10 Steps	15 Steps	1 Step	5 Steps	10 Steps	15 Steps
OurModel	0.5206	0.5241	0.5226	0.5167	0.5267	0.5302	0.5211	0.5112	0.5245	0.5101	0.5166	0.5093	0.5245	0.5192	0.51	0.5173
Autoformer	0.5245	0.5174	0.5086	0.4876	0.48	0.4969	0.5044	0.4551	0.5039	0.5008	0.4954	0.4587	0.5155	0.4997	0.4986	0.4792
DLinear	0.5039	0.5197	0.5196	0.5011	0.48	0.4378	0.4289	0.4179	0.491	0.5161	0.5167	0.4809	0.4845	0.5168	0.516	0.4848
FEDformer	0.5026	0.5003	0.488	0.4637	0.4911	0.4738	0.4584	0.4597	0.4871	0.4617	0.4558	0.4331	0.4884	0.4754	0.4735	0.4525
FiLM	0.4987	0.4891	0.4967	0.4886	0.4866	0.4946	0.4583	0.4425	0.4858	0.4598	0.4802	0.4731	0.4781	0.4671	0.4949	0.4835
FreTS	0.4871	0.4992	0.5116	0.5079	0.4911	0.4993	0.4323	0.4156	0.4858	0.4738	0.5125	0.505	0.509	0.4868	0.5126	0.5007
Informer	0.509	0.4997	0.5177	0.5125	0.4543	0.4179	0.3989	0.3777	0.4729	0.4782	0.4825	0.4912	0.4961	0.4837	0.4907	0.5017
Koopa	0.4974	0.5067	0.5043	0.4932	0.4811	0.5007	0.4777	0.4668	0.5052	0.4961	0.4756	0.4801	0.518	0.4917	0.4832	0.489
LightTS	0.5116	0.5044	0.5159	0.525	0.4399	0.4434	0.4127	0.3925	0.4781	0.4702	0.537	0.537	0.4897	0.4782	0.5282	0.5262
LSTM	0.518	0.5088	0.5026	0.5297	0.4543	0.4179	0.3989	0.3777	0.4832	0.4795	0.4722	0.521	0.4884	0.4707	0.4836	0.5148
PatchTST	0.4807	0.4992	0.5025	0.4959	0.4889	0.4805	0.462	0.4477	0.4948	0.4801	0.5027	0.4945	0.5103	0.4917	0.513	0.5092
SVR	0.5013	0.5171	0.528	0.5283	0.4543	0.4179	0.3989	0.3777	0.4897	0.4982	0.4982	0.5035	0.5064	0.5039	0.5021	0.5031
TiDE	0.5193	0.4972	0.4988	0.4928	0.4967	0.4682	0.4614	0.4483	0.4974	0.4731	0.4759	0.4703	0.5013	0.4816	0.4854	0.4822
TimesNet	0.5116	0.5132	0.5117	0.5129	0.4911	0.4902	0.47	0.4925	0.5077	0.4679	0.4691	0.4809	0.4948	0.4811	0.4902	0.4738
Transformer	0.5	0.5026	0.4992	0.483	0.4543	0.4179	0.3989	0.3777	0.4729	0.471	0.4549	0.4476	0.5077	0.4899	0.4782	0.4823
iTransformer	0.491	0.5023	0.5168	0.5004	0.4933	0.4859	0.4659	0.4476	0.5026	0.4938	0.5134	0.5033	0.5039	0.5127	0.522	0.5154

Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
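For readers who wish to reproduce a directional accuracy figure of the kind reported in Table A1, one common definition is sketched below. The definition itself is an assumption, since the exact formula is not restated in this appendix: a forecast counts as a hit when the predicted move relative to the previous actual value has the same up/not-up direction as the realized move. The function name and the toy series are hypothetical.

```python
def directional_accuracy(actual, predicted):
    """Share of steps whose predicted direction matches the realized one.

    Assumed definition: both moves are measured against the previous
    actual value, and a zero move is grouped with downward moves.
    """
    hits = 0
    total = 0
    for t in range(1, len(actual)):
        true_up = (actual[t] - actual[t - 1]) > 0
        pred_up = (predicted[t] - actual[t - 1]) > 0
        hits += int(true_up == pred_up)
        total += 1
    return hits / total


# Toy series: the forecast gets three of the four directions right.
actual = [100.0, 102.0, 101.0, 103.0, 104.0]
predicted = [100.0, 101.0, 103.0, 102.5, 103.5]
da = directional_accuracy(actual, predicted)
```

Under this definition, a value near 0.5 is close to what uninformed guessing would achieve when upward and downward moves are roughly balanced, which is the scale against which the entries in Table A1 can be read.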

Table A2. Volatility-regime analysis results for SSE (all models, four prediction horizons, four evaluation metrics).

Horizon	Model	Low				Medium				High
		R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)
1 step	OurModel	0.988	16.777	21.749	0.540	0.969	24.630	30.937	0.766	0.900	37.794	54.709	1.199
	Autoformer	0.802	73.504	90.157	2.380	0.675	77.240	100.594	2.393	0.645	75.690	103.051	2.367
	DLinear	0.900	54.631	64.068	1.796	0.830	59.378	72.759	1.855	0.512	91.005	120.689	2.876
	FEDformer	0.814	73.303	87.335	2.398	0.816	59.613	75.783	1.862	0.536	91.950	117.787	2.897
	FiLM	0.894	56.049	65.938	1.833	0.789	65.316	81.065	2.042	0.359	109.123	138.374	3.444
	FreTS	0.980	22.986	28.476	0.752	0.965	25.848	32.980	0.807	0.889	39.242	57.552	1.244
	Informer	0.058	154.710	196.491	5.210	0.230	126.328	154.943	4.018	−0.219	163.563	190.832	5.288
	Koopa	0.967	29.309	36.644	0.941	0.930	37.506	46.855	1.161	0.681	67.787	97.546	2.134
	LightTS	0.941	41.245	49.271	1.344	0.890	48.702	58.677	1.526	0.671	71.192	99.155	2.245
	LSTM	0.576	109.657	131.893	3.577	0.544	99.418	119.261	3.092	0.152	130.071	159.158	4.080
	PatchTST	0.987	18.212	22.807	0.586	0.969	24.445	31.030	0.761	0.893	40.241	56.661	1.276
	SVR	−1.806	290.832	339.146	9.698	−0.720	190.430	231.592	6.061	−1.120	203.285	251.643	6.652
	TiDE	0.948	37.938	46.167	1.230	0.873	50.547	62.851	1.572	0.642	74.823	103.358	2.369
	TimesNet	0.913	48.825	59.775	1.577	0.813	59.797	76.319	1.845	0.483	93.640	124.226	2.952
	Transformer	0.553	106.910	135.381	3.510	0.708	75.137	95.358	2.322	0.793	62.650	78.579	2.000
	iTransformer	0.986	18.961	24.027	0.613	0.966	25.765	32.611	0.801	0.893	38.750	56.601	1.230
5 steps	OurModel	0.941	38.819	47.511	1.262	0.848	49.694	62.065	1.553	0.633	79.280	114.811	2.503
	Autoformer	0.634	96.348	118.030	3.115	0.417	94.092	121.726	2.919	0.463	106.637	139.017	3.330
	DLinear	0.550	111.394	130.966	3.494	−0.097	144.965	166.930	4.445	−0.162	158.419	204.385	4.897
	FEDformer	0.589	105.034	125.167	3.441	0.427	96.870	120.672	3.022	0.210	125.787	168.562	3.936
	FiLM	0.847	65.552	76.383	2.161	0.697	71.204	87.771	2.240	0.365	119.841	151.061	3.781
	FreTS	0.904	50.287	60.613	1.665	0.826	53.077	66.456	1.671	0.617	80.825	117.298	2.551
	Informer	0.295	133.330	163.876	4.386	0.184	121.745	143.961	3.801	0.372	127.499	150.296	4.045
	Koopa	0.927	43.802	52.716	1.432	0.808	56.693	69.789	1.777	0.560	86.769	125.774	2.737
	LightTS	0.923	42.829	54.000	1.381	0.786	60.295	73.761	1.874	0.557	91.159	126.262	2.861
	LSTM	0.545	110.688	131.625	3.554	0.178	127.006	144.532	3.920	0.329	124.680	155.350	3.878
	PatchTST	0.941	39.500	47.420	1.284	0.844	50.249	63.037	1.572	0.599	83.923	120.036	2.648
	SVR	−1.562	267.112	312.389	8.919	−0.712	168.247	208.574	5.367	−0.623	194.300	241.557	6.307
	TiDE	0.788	76.083	89.936	2.479	0.480	92.746	114.986	2.890	−0.092	154.645	198.148	4.846
	TimesNet	0.870	59.473	70.413	1.937	0.685	72.899	89.459	2.279	0.165	130.226	173.304	4.086
	Transformer	0.459	112.999	143.514	3.727	0.617	79.625	98.658	2.480	0.693	80.002	104.989	2.520
	iTransformer	0.936	41.017	49.400	1.334	0.847	50.102	62.347	1.567	0.597	80.984	120.376	2.553
10 steps	OurModel	0.903	54.038	63.772	1.758	0.705	67.408	83.126	2.122	0.273	113.955	154.228	3.587
	Autoformer	0.489	118.347	146.638	3.805	−0.057	121.489	157.280	3.797	−0.255	160.337	202.685	5.047
	DLinear	0.584	106.098	132.364	3.295	−0.243	144.522	170.575	4.438	−0.415	160.943	215.173	4.979
	FEDformer	0.644	104.624	122.358	3.427	0.337	99.771	124.564	3.143	−0.362	168.935	211.175	5.294
	FiLM	0.867	64.435	74.851	2.099	0.620	76.291	94.331	2.388	0.046	134.273	176.704	4.217
	FreTS	0.872	61.637	73.390	2.016	0.710	69.078	82.370	2.170	0.370	107.516	143.574	3.378
	Informer	0.189	146.963	184.695	4.796	−0.008	126.121	153.603	3.958	0.218	136.159	159.984	4.324
	Koopa	0.869	64.018	74.349	2.091	0.684	69.525	85.955	2.189	0.314	108.309	149.866	3.429
	LightTS	0.896	51.323	66.011	1.652	0.713	68.915	81.974	2.158	0.314	109.281	149.834	3.432
	LSTM	0.004	162.253	204.664	5.433	0.237	103.728	133.632	3.320	0.186	133.283	163.253	4.299
	PatchTST	0.901	54.285	64.453	1.762	0.697	69.054	84.180	2.170	0.208	121.269	161.048	3.818
	SVR	−0.847	237.443	278.756	7.876	−0.475	147.288	185.803	4.711	−0.457	172.179	218.406	5.593
	TiDE	0.830	73.379	84.461	2.402	0.609	77.855	95.685	2.439	−0.059	144.377	186.146	4.538
	TimesNet	0.844	69.058	81.041	2.237	0.505	86.684	107.599	2.701	−0.222	150.119	199.995	4.681
	Transformer	0.355	129.178	164.764	4.277	0.478	87.397	110.542	2.760	0.552	95.869	121.085	3.062
	iTransformer	0.892	58.482	67.425	1.901	0.716	66.625	81.574	2.096	0.243	113.190	157.448	3.559
15 steps	OurModel	0.859	70.479	80.933	2.294	0.562	75.809	93.319	2.387	−0.128	143.975	184.619	4.536
	Autoformer	0.468	126.041	157.078	4.026	−0.651	143.326	181.170	4.499	−0.382	163.779	204.302	5.123
	DLinear	0.735	95.309	110.922	3.148	0.378	92.015	111.153	2.903	−0.339	163.920	201.120	5.160
	FEDformer	0.592	119.434	137.470	3.925	0.022	116.779	139.413	3.694	−0.763	194.840	230.819	6.135
	FiLM	0.830	76.705	88.663	2.506	0.422	88.071	107.225	2.765	−0.351	161.426	202.029	5.067
	FreTS	0.814	78.774	92.864	2.560	0.566	77.151	92.877	2.417	0.083	122.632	166.454	3.866
	Informer	0.259	150.509	185.411	4.871	−0.141	126.888	150.577	4.000	0.098	141.386	165.103	4.486
	Koopa	0.854	69.565	82.291	2.259	0.477	82.905	101.963	2.595	−0.143	145.579	185.838	4.577
	LightTS	0.829	73.144	89.074	2.377	0.517	81.317	97.944	2.549	0.010	131.664	172.952	4.155
	LSTM	0.423	131.572	163.635	4.314	0.339	92.704	114.633	2.942	0.327	117.800	142.607	3.775
	PatchTST	0.859	70.313	80.982	2.287	0.556	77.330	93.955	2.434	−0.202	148.621	190.527	4.674
	SVR	−0.544	227.195	267.542	7.475	−0.397	131.167	166.647	4.224	−0.355	157.203	202.293	5.115
	TiDE	0.821	78.313	91.155	2.561	0.412	88.901	108.092	2.793	−0.409	166.162	206.295	5.219
	TimesNet	0.790	81.368	98.573	2.657	0.157	103.644	129.411	3.241	−0.529	170.432	214.945	5.324
	Transformer	0.379	137.319	169.696	4.461	0.336	95.774	114.903	3.027	0.493	99.630	123.802	3.174
	iTransformer	0.857	70.825	81.297	2.302	0.563	76.344	93.251	2.399	−0.107	139.054	182.843	4.381

Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
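The negative R² entries in the regime-level tables can look surprising next to moderate MAE values. The sketch below uses the standard textbook definitions of the four metrics to show how this arises; it is an illustration under assumptions, not the evaluation code of this study, and the function name and the toy numbers are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Standard R², MAE, RMSE and MAPE (%) for paired observations."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                     # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # spread of the targets
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "MAE": mae, "RMSE": rmse, "MAPE": mape}


# Toy subset with little variation in the target and a constant 50-point bias.
actual = [3000.0, 3010.0, 2990.0, 3005.0]
biased = [a + 50.0 for a in actual]
metrics = regression_metrics(actual, biased)
```

Because R² compares the squared error with the variance of the targets inside the evaluated subset, a forecast whose error exceeds that variance scores below zero even when its percentage error is small, as in this toy case where MAE and RMSE are both 50 and R² is strongly negative.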

Table A3. Volatility-regime analysis results for SZSE (all models, four prediction horizons, four evaluation metrics).

Horizon	Model	Low				Medium				High
		R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)
1 step	OurModel	0.994	83.164	105.512	0.781	0.993	116.663	150.602	1.054	0.956	173.246	257.992	1.637
	Autoformer	0.928	281.764	363.855	2.660	0.949	326.837	413.662	2.972	0.811	383.833	535.405	3.621
	DLinear	0.937	291.603	341.314	2.834	0.952	332.121	401.975	3.097	0.786	425.763	569.549	4.002
	FEDformer	0.919	315.599	386.191	3.048	0.947	344.444	423.296	3.162	0.778	453.729	580.416	4.228
	FiLM	0.927	309.069	367.309	2.964	0.933	387.962	474.707	3.533	0.692	524.741	683.426	4.884
	FreTS	0.990	112.387	137.032	1.090	0.992	129.534	159.479	1.202	0.953	181.207	266.976	1.723
	Informer	−0.449	1421.353	1634.282	14.321	0.120	1513.941	1717.201	15.158	−1.031	1627.552	1753.467	16.106
	Koopa	0.983	138.184	175.584	1.285	0.983	190.322	237.440	1.730	0.882	292.911	423.292	2.744
	LightTS	0.961	225.498	268.166	2.185	0.976	229.764	284.032	2.173	0.857	345.167	465.839	3.256
	LSTM	−0.077	1080.988	1409.105	11.100	0.140	1465.799	1697.605	14.537	−1.112	1561.856	1788.246	15.551
	PatchTST	0.993	87.316	111.166	0.820	0.994	110.027	142.569	0.996	0.954	184.385	263.243	1.744
	SVR	−0.429	1259.521	1623.370	12.830	−0.018	1570.417	1846.683	15.232	−0.546	1240.730	1530.266	12.641
	TiDE	0.964	212.417	255.880	2.018	0.969	252.670	323.032	2.282	0.842	348.066	489.645	3.278
	TimesNet	0.943	265.625	324.201	2.496	0.954	308.161	391.902	2.781	0.804	398.501	545.450	3.737
	Transformer	−0.272	1264.250	1531.305	12.859	0.178	1399.981	1659.734	14.079	−0.851	1442.257	1674.261	14.521
	iTransformer	0.992	97.670	122.682	0.923	0.992	129.172	165.707	1.171	0.955	182.642	262.014	1.725
5 steps	OurModel	0.963	195.681	245.609	1.859	0.961	265.907	334.353	2.444	0.853	373.325	540.487	3.497
	Autoformer	0.861	388.675	479.403	3.660	0.892	449.904	559.852	4.055	0.747	511.907	710.332	4.774
	DLinear	0.829	419.483	531.872	3.729	0.866	474.048	623.382	4.171	0.608	675.869	884.331	6.255
	FEDformer	0.777	520.858	606.666	5.080	0.867	518.039	621.548	4.714	0.665	618.842	816.636	5.708
	FiLM	0.889	363.290	427.669	3.517	0.895	449.851	551.129	4.123	0.706	587.973	764.968	5.446
	FreTS	0.940	262.849	315.140	2.582	0.958	283.234	348.091	2.689	0.848	384.839	549.774	3.636
	Informer	−0.368	1201.104	1502.535	12.331	−0.001	1488.158	1702.805	14.891	−0.493	1596.494	1725.110	15.677
	Koopa	0.945	244.848	302.569	2.342	0.949	305.608	385.836	2.787	0.830	402.599	582.731	3.759
	LightTS	0.947	247.211	295.618	2.371	0.948	311.280	388.935	2.887	0.833	422.357	577.425	3.922
	LSTM	−0.316	1242.099	1473.679	12.583	0.108	1377.917	1607.288	13.849	−0.310	1475.723	1615.765	14.517
	PatchTST	0.964	195.681	243.913	1.858	0.960	275.171	342.374	2.537	0.839	394.977	566.997	3.697
	SVR	−0.418	1185.776	1529.739	12.165	−0.021	1449.971	1719.820	14.215	−0.189	1242.160	1539.469	12.319
	TiDE	0.864	387.685	473.767	3.701	0.851	539.993	657.157	4.916	0.516	764.428	982.407	7.089
	TimesNet	0.897	347.596	411.648	3.328	0.900	443.257	539.412	4.018	0.645	632.676	840.741	5.888
	Transformer	−0.451	1268.735	1547.812	12.992	0.036	1412.432	1670.570	14.327	−0.449	1522.578	1699.601	15.072
	iTransformer	0.961	201.666	252.979	1.917	0.961	270.617	335.076	2.489	0.846	377.824	554.529	3.538
10 steps	OurModel	0.922	292.494	356.016	2.780	0.928	367.050	458.140	3.362	0.675	551.348	740.735	5.149
	Autoformer	0.748	504.609	641.203	4.783	0.769	655.575	821.125	6.061	0.388	798.000	1017.013	7.485
	DLinear	0.801	437.441	569.832	3.906	0.847	505.340	667.032	4.470	0.472	708.593	945.138	6.590
	FEDformer	0.789	491.381	586.675	4.776	0.788	651.944	786.326	6.037	0.353	828.403	1046.204	7.672
	FiLM	0.900	344.525	404.280	3.277	0.902	431.014	534.146	3.950	0.559	649.633	863.137	6.052
	FreTS	0.895	345.946	413.595	3.370	0.935	359.606	434.920	3.433	0.732	502.619	673.147	4.749
	Informer	−0.561	1269.006	1595.413	13.009	−0.133	1609.074	1817.937	15.992	−0.869	1639.881	1777.434	16.180
	Koopa	0.893	359.707	418.535	3.442	0.911	418.694	510.793	3.851	0.685	532.905	729.623	5.002
	LightTS	0.903	325.625	398.016	3.127	0.935	356.668	436.252	3.335	0.765	469.852	629.868	4.439
	LSTM	−0.876	1460.096	1749.128	14.855	−0.237	1654.682	1899.523	16.596	−0.907	1603.532	1795.660	15.951
	PatchTST	0.920	292.083	360.951	2.769	0.926	367.631	463.526	3.361	0.659	558.828	759.084	5.229
	SVR	−0.283	1109.155	1446.286	11.378	0.046	1396.788	1668.117	13.572	−0.140	1114.316	1388.356	11.188
	TiDE	0.878	380.086	445.691	3.640	0.883	473.839	583.698	4.372	0.491	705.263	927.635	6.538
	TimesNet	0.883	364.969	437.338	3.432	0.879	489.904	594.911	4.460	0.362	790.474	1038.242	7.324
	Transformer	−0.751	1412.339	1689.807	14.397	−0.130	1558.622	1815.284	15.760	−0.900	1630.318	1792.289	16.164
	iTransformer	0.920	300.025	361.479	2.856	0.928	372.024	458.590	3.421	0.667	537.778	749.907	5.025
15 steps	OurModel	0.911	348.887	422.726	3.291	0.867	468.858	575.951	4.351	0.418	701.957	919.613	6.531
	Autoformer	0.801	507.611	630.895	4.894	0.705	720.542	858.858	6.769	0.084	958.787	1153.788	8.915
	DLinear	0.851	458.184	546.659	4.450	0.801	591.977	704.881	5.707	0.312	804.844	999.730	7.520
	FEDformer	0.779	561.669	665.343	5.380	0.716	729.219	842.973	6.928	0.008	990.288	1200.630	9.188
	FiLM	0.894	380.013	459.723	3.608	0.831	536.764	650.837	4.984	0.276	803.267	1025.636	7.446
	FreTS	0.891	384.216	466.548	3.629	0.891	426.103	522.805	4.117	0.565	588.500	795.049	5.583
	Informer	−0.130	1195.373	1503.123	11.909	−0.236	1532.163	1758.409	15.563	−0.895	1521.537	1659.832	15.102
	Koopa	0.909	346.277	425.734	3.252	0.862	475.865	587.058	4.394	0.370	753.895	956.705	7.019
	LightTS	0.866	421.844	517.397	4.031	0.867	463.731	577.593	4.516	0.600	598.149	762.161	5.693
	LSTM	0.165	932.952	1292.256	9.289	0.073	1262.565	1522.296	12.855	−0.131	1084.795	1282.162	10.924
	PatchTST	0.909	348.204	426.893	3.281	0.866	472.356	579.718	4.384	0.389	724.806	942.166	6.739
	SVR	−0.094	1139.857	1479.147	11.155	−0.028	1343.391	1603.507	13.513	−0.195	1062.589	1317.987	10.801
	TiDE	0.889	386.739	471.022	3.674	0.831	537.309	650.413	5.004	0.242	827.944	1049.836	7.685
	TimesNet	0.890	385.343	468.168	3.638	0.809	564.981	690.563	5.249	0.303	798.640	1006.910	7.375
	Transformer	−0.188	1283.304	1541.541	12.787	−0.277	1513.086	1787.276	15.609	−0.969	1536.232	1692.021	15.291
	iTransformer	0.911	346.125	422.079	3.274	0.876	454.598	557.169	4.216	0.452	670.697	892.155	6.257

Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.

Table A4. Volatility-regime analysis results for SMESE (all models, four prediction horizons, four evaluation metrics).

Horizon	Model	Low				Medium				High
		R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)
1 step	OurModel	0.994	54.730	68.879	0.787	0.994	75.572	95.223	1.090	0.984	114.953	159.521	1.619
	Autoformer	0.934	186.910	237.242	2.721	0.961	194.418	249.636	2.858	0.935	244.988	320.028	3.471
	DLinear	0.943	185.487	218.913	2.778	0.958	213.396	261.493	3.181	0.921	279.378	353.560	3.948
	FEDformer	0.932	203.167	239.970	3.002	0.956	213.950	267.748	3.131	0.906	309.724	386.404	4.315
	FiLM	0.933	196.380	237.524	2.879	0.946	238.969	294.740	3.446	0.878	352.055	440.772	4.917
	FreTS	0.990	74.027	90.127	1.097	0.993	87.277	106.631	1.301	0.983	119.697	163.896	1.691
	Informer	0.010	705.115	916.003	11.348	0.155	989.708	1167.999	16.193	0.340	827.158	1023.176	12.987
	Koopa	0.985	87.495	111.733	1.258	0.985	122.966	157.002	1.763	0.958	191.803	258.460	2.702
	LightTS	0.963	146.808	176.072	2.178	0.975	164.908	201.671	2.476	0.947	222.976	289.044	3.166
	LSTM	0.018	700.085	912.235	11.188	−0.037	1086.247	1294.147	17.579	0.195	899.190	1130.379	14.056
	PatchTST	0.993	59.118	74.409	0.852	0.994	73.214	95.811	1.057	0.983	121.230	163.118	1.708
	SVR	−0.489	864.530	1123.240	13.798	−0.222	1235.546	1404.887	19.580	0.041	1007.194	1233.896	15.160
	TiDE	0.967	138.479	168.477	2.008	0.975	158.329	200.947	2.276	0.939	234.881	310.664	3.304
	TimesNet	0.944	174.004	218.647	2.499	0.956	206.662	265.369	2.961	0.914	286.094	370.132	4.016
	Transformer	−0.079	683.956	956.382	11.145	0.014	1040.008	1262.038	17.011	0.223	845.373	1110.257	13.342
	iTransformer	0.992	65.212	81.673	0.944	0.993	83.025	105.934	1.201	0.983	121.472	166.027	1.708
5 steps	OurModel	0.965	128.254	161.037	1.874	0.970	174.230	214.358	2.550	0.931	249.274	338.760	3.480
	Autoformer	0.851	271.262	332.902	3.969	0.918	282.489	354.885	4.073	0.881	356.645	443.593	4.998
	DLinear	0.836	275.502	348.990	3.792	0.895	297.490	400.954	4.089	0.810	445.191	561.214	6.194
	FEDformer	0.810	319.217	375.790	4.756	0.897	319.034	396.498	4.606	0.840	406.757	514.314	5.641
	FiLM	0.896	231.853	277.998	3.422	0.922	283.436	344.169	4.118	0.848	395.872	501.587	5.478
	FreTS	0.944	168.828	203.070	2.534	0.963	197.032	236.314	2.968	0.935	241.759	328.747	3.421
	Informer	−0.963	991.441	1207.270	15.774	−0.493	1323.093	1510.494	21.461	−0.064	1145.952	1328.542	17.687
	Koopa	0.947	160.308	198.438	2.347	0.960	200.379	247.918	2.914	0.914	279.096	377.574	3.881
	LightTS	0.951	157.358	190.394	2.316	0.956	208.087	260.614	3.075	0.923	271.026	356.217	3.765
	LSTM	−0.475	873.934	1046.509	13.789	−0.180	1150.003	1342.707	18.585	0.157	1001.016	1182.334	15.277
	PatchTST	0.965	130.459	160.600	1.903	0.968	180.034	219.441	2.642	0.926	256.696	351.019	3.594
	SVR	−0.534	823.451	1066.928	13.277	−0.185	1183.214	1346.032	18.778	0.087	1013.088	1230.246	14.993
	TiDE	0.874	249.235	305.506	3.647	0.895	323.581	400.289	4.647	0.769	499.815	619.300	7.011
	TimesNet	0.921	197.099	241.692	2.878	0.938	243.116	306.857	3.472	0.854	377.793	491.512	5.309
	Transformer	−0.371	730.660	1008.731	11.951	−0.133	1095.013	1315.705	17.943	0.214	906.332	1141.491	14.153
	iTransformer	0.964	130.004	163.841	1.891	0.969	180.486	218.347	2.642	0.931	245.568	339.128	3.430
10 steps	OurModel	0.931	188.787	228.249	2.745	0.953	228.761	286.945	3.301	0.844	366.679	464.328	5.178
	Autoformer	0.792	325.792	396.388	4.756	0.878	373.490	460.484	5.399	0.635	559.805	710.358	7.901
	DLinear	0.825	276.953	363.673	3.794	0.885	333.855	446.775	4.495	0.753	452.091	584.552	6.401
	FEDformer	0.809	319.958	379.209	4.775	0.869	381.038	476.667	5.538	0.667	555.362	678.898	7.767
	FiLM	0.914	215.268	254.324	3.137	0.936	263.214	332.497	3.783	0.791	426.476	536.997	6.032
	FreTS	0.904	223.962	269.093	3.330	0.944	259.919	311.454	3.851	0.882	313.079	403.614	4.514
	Informer	−0.775	923.466	1157.162	14.729	−0.213	1259.793	1450.680	20.266	−0.201	1131.601	1288.675	17.545
	Koopa	0.906	225.554	265.940	3.299	0.939	260.980	326.445	3.774	0.815	399.125	505.708	5.624
	LightTS	0.907	216.544	265.043	3.203	0.942	264.432	318.211	3.854	0.890	301.805	390.828	4.307
	LSTM	−0.569	904.879	1087.739	14.265	−0.054	1176.761	1352.345	18.800	0.023	955.828	1162.199	14.953
	PatchTST	0.930	185.873	229.504	2.710	0.953	226.338	286.880	3.264	0.835	376.798	478.145	5.326
	SVR	−0.380	788.079	1020.376	12.660	−0.025	1171.246	1333.410	18.128	0.083	927.306	1125.889	14.161
	TiDE	0.895	239.839	281.845	3.515	0.924	293.285	363.679	4.261	0.752	461.785	585.145	6.475
	TimesNet	0.896	236.105	279.546	3.433	0.922	286.359	368.106	4.054	0.680	530.108	665.642	7.478
	Transformer	−0.549	806.564	1081.041	13.081	−0.096	1158.761	1379.086	18.783	−0.016	975.529	1185.326	15.356
	iTransformer	0.930	191.650	229.942	2.787	0.951	231.745	290.634	3.347	0.845	358.877	463.212	5.081
15 steps	OurModel	0.930	215.187	261.455	3.134	0.917	284.316	353.797	4.099	0.737	461.052	575.596	6.489
	Autoformer	0.831	321.766	405.884	4.812	0.818	409.024	525.421	5.925	0.525	643.416	773.606	9.082
	DLinear	0.871	300.946	354.740	4.517	0.862	369.175	456.341	5.617	0.695	511.939	619.716	7.285
	FEDformer	0.807	359.110	434.725	5.328	0.810	453.760	536.388	6.709	0.514	659.922	782.896	9.288
	FiLM	0.914	237.474	289.537	3.486	0.896	317.106	396.690	4.594	0.667	528.816	647.758	7.432
	FreTS	0.900	253.271	312.015	3.649	0.917	291.983	354.687	4.313	0.813	378.851	485.797	5.514
	Informer	−0.142	809.488	1056.612	12.782	−0.142	1137.856	1315.131	18.342	−0.073	1025.637	1163.087	15.961
	Koopa	0.930	210.591	261.612	3.056	0.914	282.808	360.444	4.066	0.703	505.073	612.316	7.130
	LightTS	0.876	285.651	348.618	4.209	0.897	325.478	395.530	4.880	0.824	387.341	471.017	5.636
	LSTM	0.154	666.577	909.225	10.409	0.066	1028.068	1189.068	16.221	0.217	848.023	993.697	12.962
	PatchTST	0.928	217.080	265.628	3.173	0.913	292.222	363.680	4.221	0.721	479.351	593.188	6.746
	SVR	−0.175	835.873	1071.463	12.966	−0.091	1126.431	1285.451	17.721	0.059	897.988	1089.131	13.925
	TiDE	0.910	238.887	296.152	3.512	0.896	318.332	397.369	4.616	0.653	542.278	661.264	7.629
	TimesNet	0.909	246.469	298.209	3.634	0.860	372.991	461.113	5.429	0.630	564.132	683.108	7.861
	Transformer	−0.079	760.487	1026.981	12.125	−0.153	1102.827	1321.512	17.960	0.017	929.960	1113.407	14.648
	iTransformer	0.928	218.934	265.972	3.194	0.922	276.768	342.902	3.990	0.753	443.067	558.361	6.249

Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.

Table A5. Volatility-regime analysis results for SP500 (all models, four prediction horizons, four evaluation metrics).

Horizon	Model	Low				Medium				High
		R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)
1 step	OurModel	0.998	32.431	39.666	0.588	0.997	40.844	50.970	0.757	0.993	56.103	75.828	1.214
	Autoformer	0.989	74.216	92.231	1.353	0.991	73.667	90.108	1.374	0.980	94.139	126.806	1.994
	DLinear	0.974	127.508	138.246	2.296	0.986	97.607	113.218	1.823	0.965	136.110	167.858	2.932
	FEDformer	0.984	92.261	108.664	1.679	0.990	78.154	94.577	1.489	0.973	111.171	148.100	2.393
	FiLM	0.982	102.759	114.170	1.891	0.988	89.132	104.136	1.706	0.954	152.704	192.220	3.327
	FreTS	0.997	41.884	50.343	0.737	0.996	48.120	59.627	0.882	0.991	65.128	84.934	1.382
	Informer	−2.691	1471.585	1654.355	24.891	−1.697	1345.167	1576.799	22.912	−0.359	844.309	1046.050	16.365
	Koopa	0.986	76.329	100.215	1.406	0.991	70.771	91.380	1.317	0.973	110.876	146.137	2.425
	LightTS	0.911	222.279	256.790	3.754	0.943	186.551	228.755	3.211	0.955	148.631	191.361	3.025
	LSTM	−2.783	1535.223	1674.817	26.309	−1.787	1438.691	1602.965	25.100	−0.690	1081.966	1166.537	22.454
	PatchTST	0.997	38.079	46.638	0.691	0.996	49.287	62.270	0.917	0.990	69.677	90.801	1.512
	SVR	−8.920	2527.457	2712.290	43.577	−6.330	2363.952	2599.587	41.388	−3.630	1703.820	1931.044	34.200
	TiDE	0.990	72.055	86.369	1.305	0.992	70.789	87.407	1.328	0.975	106.213	143.068	2.298
	TimesNet	0.990	71.644	86.670	1.285	0.992	66.995	84.138	1.235	0.981	94.225	124.024	2.046
	Transformer	−2.743	1469.539	1666.071	24.764	−1.781	1359.824	1601.324	23.122	−0.388	819.271	1057.223	15.590
	iTransformer	0.997	34.734	44.078	0.627	0.996	46.873	61.635	0.871	0.989	68.935	92.217	1.468
5 steps	OurModel	0.990	67.941	84.062	1.239	0.990	76.811	96.296	1.427	0.977	110.516	140.746	2.351
	Autoformer	0.986	85.508	103.305	1.571	0.986	91.298	112.879	1.715	0.959	147.768	187.696	3.188
	DLinear	0.790	383.949	395.014	6.919	0.884	304.002	325.236	5.650	0.894	254.523	301.356	5.361
	FEDformer	0.974	117.780	137.936	2.179	0.984	95.667	118.876	1.845	0.951	155.781	204.458	3.351
	FiLM	0.985	91.668	106.212	1.703	0.987	88.665	106.863	1.705	0.950	157.481	206.966	3.436
	FreTS	0.991	68.208	82.767	1.241	0.989	77.053	98.776	1.435	0.975	114.869	147.653	2.442
	Informer	−2.931	1554.858	1708.511	26.520	−1.860	1406.096	1611.600	24.275	−0.485	941.826	1128.144	18.353
	Koopa	0.987	84.013	99.572	1.551	0.988	87.502	106.089	1.649	0.970	124.956	161.406	2.657
	LightTS	0.853	298.182	330.504	5.120	0.914	236.384	279.536	4.119	0.935	182.373	235.720	3.639
	LSTM	−3.842	1802.208	1896.148	31.395	−2.522	1668.475	1788.173	29.786	−1.218	1292.667	1378.581	26.696
	PatchTST	0.989	77.889	90.197	1.427	0.988	86.146	104.701	1.615	0.969	131.691	164.135	2.814
	SVR	−9.084	2560.671	2736.250	44.209	−6.400	2368.778	2592.187	41.771	−3.620	1758.898	1989.762	35.046
	TiDE	0.959	158.984	174.170	2.925	0.979	113.808	137.255	2.165	0.952	158.402	203.424	3.379
	TimesNet	0.984	92.933	109.164	1.704	0.988	86.267	104.766	1.649	0.957	151.805	192.358	3.305
	Transformer	−2.134	1325.352	1525.522	22.197	−1.276	1193.691	1437.435	20.218	−0.141	785.927	988.717	15.030
	iTransformer	0.990	73.623	86.471	1.348	0.989	79.318	99.762	1.493	0.973	120.548	152.728	2.571
10 steps	OurModel	0.981	97.854	117.873	1.793	0.981	106.693	130.314	2.012	0.958	148.962	188.168	3.154
	Autoformer	0.958	133.545	175.316	2.494	0.972	124.816	158.728	2.389	0.924	202.375	254.095	4.382
	DLinear	0.707	453.326	465.480	8.130	0.839	356.641	382.625	6.553	0.869	279.530	332.354	5.878
	FEDformer	0.964	146.382	163.595	2.702	0.979	116.316	138.574	2.227	0.931	185.279	241.987	3.978
	FiLM	0.969	138.166	151.434	2.536	0.981	113.520	131.884	2.154	0.944	178.655	216.944	3.829
	FreTS	0.974	123.108	138.694	2.222	0.976	124.307	146.334	2.331	0.950	166.078	204.873	3.508
	Informer	−3.380	1667.124	1799.307	28.619	−2.163	1517.714	1696.017	26.405	−0.732	1050.928	1210.690	20.840
	Koopa	0.976	116.990	132.396	2.162	0.982	110.780	129.433	2.095	0.951	164.281	203.511	3.516
	LightTS	0.702	433.342	469.584	7.459	0.821	350.752	403.935	6.094	0.887	240.465	308.934	4.800
	Lstm	−4.109	1851.907	1943.212	32.308	−2.711	1718.861	1837.122	30.665	−1.413	1365.212	1429.046	28.672
	PatchTST	0.975	124.142	136.901	2.280	0.983	108.065	126.048	2.051	0.948	172.991	209.800	3.722
	SVR	−9.416	2613.387	2774.536	45.231	−6.667	2433.625	2640.578	42.972	−3.756	1784.885	2006.349	35.709
	TiDE	0.937	199.273	215.801	3.657	0.973	132.912	157.467	2.564	0.910	210.025	276.006	4.535
	TimesNet	0.973	124.173	142.013	2.265	0.980	112.987	133.637	2.141	0.947	172.453	211.214	3.694
	Transformer	−2.660	1469.980	1644.816	24.860	−1.649	1333.538	1552.163	22.812	−0.391	903.518	1084.893	17.632
	iTransformer	0.972	128.440	143.989	2.352	0.980	114.595	136.499	2.169	0.947	172.102	212.383	3.691
15 steps	OurModel	0.950	179.595	197.206	3.292	0.967	144.479	171.944	2.719	0.937	190.317	234.345	4.000
	Autoformer	0.941	190.695	214.031	3.513	0.969	138.345	167.586	2.674	0.912	214.287	276.513	4.617
	DLinear	0.832	342.262	359.967	6.078	0.915	243.808	276.848	4.422	0.921	210.217	262.479	4.376
	FEDformer	0.938	197.376	218.391	3.620	0.971	133.234	162.904	2.544	0.921	198.774	262.079	4.273
	FiLM	0.956	169.649	183.220	3.117	0.977	124.321	145.303	2.389	0.929	201.868	248.381	4.309
	FreTS	0.941	198.525	213.625	3.555	0.957	167.856	197.079	3.104	0.929	203.719	248.573	4.281
	Informer	−3.766	1808.125	1916.060	31.368	−2.581	1651.082	1795.127	29.168	−1.063	1184.813	1339.926	23.453
	Koopa	0.966	147.199	162.236	2.724	0.978	120.298	140.775	2.297	0.938	190.764	233.209	4.055
	LightTS	0.146	732.377	811.007	12.443	0.408	629.090	730.007	10.801	0.702	391.236	509.185	7.431
	Lstm	−4.540	1992.091	2065.816	35.027	−3.201	1843.494	1944.406	33.170	−1.607	1428.461	1506.161	29.547
	PatchTST	0.963	153.844	168.016	2.834	0.978	120.247	140.662	2.302	0.936	196.898	236.553	4.206
	SVR	−9.247	2655.957	2809.624	46.118	−6.884	2470.190	2663.770	43.837	−3.903	1841.699	2065.504	36.551
	TiDE	0.936	205.664	222.264	3.763	0.968	143.936	171.008	2.744	0.925	196.056	255.986	4.174
	TimesNet	0.961	156.808	173.800	2.895	0.977	124.534	145.394	2.392	0.932	193.550	243.284	4.125
	Transformer	−2.283	1412.301	1590.272	23.842	−1.466	1265.148	1489.725	21.586	−0.243	842.167	1040.133	16.091
	iTransformer	0.964	153.227	167.289	2.824	0.977	123.846	145.073	2.358	0.937	191.974	233.472	4.069

Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.

References

He, Q.; Liang, Y.; Lin, Y.; Pan, D.; Yue, Y. Committee of Multi-Scale Nonlinear Learning Frameworks for Accurate Stock Price Forecasting. Eng. Appl. Artif. Intell. 2025, 162, 112325. [Google Scholar] [CrossRef]
Zhang, C.; Sjarif, N.N.A.; Ibrahim, R. Deep Learning Models for Price Forecasting of Financial Time Series: A Review of Recent Advancements: 2020–2022. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2024, 14, e1519. [Google Scholar]
Black, F.; Scholes, M. The Pricing of Options and Corporate Liabilities. J. Political Econ. 1973, 81, 637–654. [Google Scholar] [CrossRef] [PubMed]
Merton, R.C. Theory of Rational Option Pricing. Bell J. Econ. Manag. Sci. 1973, 4, 141–183. [Google Scholar] [CrossRef]
Heston, S.L. A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options. Rev. Financ. Stud. 1993, 6, 327–343. [Google Scholar] [CrossRef]
Cont, R. Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues. Quant. Financ. 2001, 1, 223. [Google Scholar] [CrossRef]
Li, Z.; Liao, Y.; Hu, B.; Ni, L.; Lu, Y. A Financial Deep Learning Framework: Predicting the Values of Financial Time Series with ARIMA and LSTM. Int. J. Web Serv. Res. 2022, 19, 1–15. [Google Scholar] [CrossRef]
Zhang, J.; Liu, H.; Bai, W.; Li, X. A Hybrid Approach of Wavelet Transform, ARIMA and LSTM Model for the Share Price Index Futures Forecasting. N. Am. J. Econ. Financ. 2024, 69, 102022. [Google Scholar] [CrossRef]
Wang, S.-W.; Huang, C.-Y. A Hybrid SVR-Based Framework for Cryptocurrency Price Forecasting and Strategy Backtesting. Appl. Artif. Intell. 2026, 40, 2612793. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar]
Wang, C.; Chen, Y.; Zhang, S.; Zhang, Q. Stock Market Index Prediction Using Deep Transformer Model. Expert Syst. Appl. 2022, 208, 118128. [Google Scholar] [CrossRef]
Gao, M.-C. StockCI: A Hybrid Model Integrating CEEMDAN and Informer for Enhanced Long-Term Stock Price Forecasting. Complex Intell. Syst. 2025, 12, 74. [Google Scholar] [CrossRef]
Su, J.; Lau, R.Y.K.; Du, Y.; Yu, J.; Zhang, H. A Novel Hybrid Framework for Stock Price Prediction Integrating Adaptive Signal Decomposition and Multi-Scale Feature Extraction. Appl. Sci. 2025, 15, 12450. [Google Scholar] [CrossRef]
Ge, S.; Lin, A. An Adaptive Selection Decomposition Hybrid Model for Stock Time Series Forecasting. Nonlinear Dyn. 2025, 113, 4647–4669. [Google Scholar] [CrossRef]
Minh, H.B.; An, N.H.; Tuan, N.M. Multi-Step-Ahead Time Series Forecasting Based on CEEMDAN Decomposition and Temporal Convolutional Networks. In Proceedings of the 2022 International Conference on Advanced Computing and Analytics (ACOMPA), Ho Chi Minh City, Vietnam, 21–23 November 2022; IEEE: Ho Chi Minh City, Vietnam, 2022; pp. 54–59. [Google Scholar]
Torres, M.E.; Colominas, M.A.; Schlotthauer, G.; Flandrin, P. A Complete Ensemble Empirical Mode Decomposition with Adaptive Noise. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; IEEE: Prague, Czech Republic, 2011; pp. 4144–4147. [Google Scholar]
Zhang, Y.; Chen, Y.; Qi, Z.; Wang, S.; Zhang, J.; Wang, F. A Hybrid Forecasting System with Complexity Identification and Improved Optimization for Short-Term Wind Speed Prediction. Energy Convers. Manag. 2022, 270, 116221. [Google Scholar] [CrossRef]
Alipour, M.; Aghaei, J.; Norouzi, M.; Niknam, T.; Hashemi, S.; Lehtonen, M. A Novel Electrical Net-Load Forecasting Model Based on Deep Neural Networks and Wavelet Transform Integration. Energy 2020, 205, 118106. [Google Scholar] [CrossRef]
Chen, M.-Y.; Chen, B.-T. Online Fuzzy Time Series Analysis Based on Entropy Discretization and a Fast Fourier Transform. Appl. Soft Comput. 2014, 14, 156–166. [Google Scholar] [CrossRef]
Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. arXiv 2022, arXiv:2210.02186. [Google Scholar]
Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency Enhanced Decomposed Transformer for Long-Term Series Forecasting. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2022. [Google Scholar]
Yi, K.; Zhang, Q.; Fan, W.; Wang, S.; Wang, P.; He, H.; Lian, D.; An, N.; Cao, L.; Niu, Z. Frequency-Domain MLPs Are More Effective Learners in Time Series Forecasting. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Neural Information Processing Systems Foundation, Inc.: South Lake Tahoe, NV, USA, 2023. [Google Scholar]
Wang, M.; Wang, H.; Zhang, F. Correctformer: A Transformer Architecture for Correcting Periodic Drift in Time-Series Forecasting. Neural Netw. 2026, 196, 108375. [Google Scholar] [CrossRef] [PubMed]
Tang, Z.; Ji, T.; Kang, J.; Huang, Y.; Tang, W. Learning Global and Local Features of Power Load Series through Transformer and 2D-CNN: An Image-Based Multi-Step Forecasting Approach Incorporating Phase Space Reconstruction. Appl. Energy 2025, 378, 124786. [Google Scholar] [CrossRef]
Li, S.; Jin, X.; Xuan, Y.; Zhou, X.; Chen, W.; Wang, Y.-X.; Yan, X. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
Shehzad, H.T.; Anwar, M.A.; Razzaq, M. A Comparative Predicting Stock Prices Using Heston and Geometric Brownian Motion Models. arXiv 2023, arXiv:2302.07796. [Google Scholar] [CrossRef]
Box, G.E.P.; Pierce, D.A. Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models. J. Am. Stat. Assoc. 1970, 65, 1509–1526. [Google Scholar] [CrossRef]
Li, W.-J.; Zhang, D.-Q. GARCH-FIS: A Hybrid Forecasting Model with Dynamic Volatility-Driven Parameter Adaptation. arXiv 2026, arXiv:2603.14793. [Google Scholar]
Beniwal, M. Adaptive Weighted Genetic Algorithm-Optimized SVR for Robust Long-Term Forecasting of Global Stock Indices for Investment Decisions. arXiv 2025, arXiv:2512.15113. [Google Scholar]
Seabe, P.L.; Moutsinga, C.R.B.; Pindza, E. Forecasting Cryptocurrency Prices Using LSTM, GRU, and Bi-Directional LSTM: A Deep Learning Approach. Fractal Fract. 2023, 7, 203. [Google Scholar] [CrossRef]
Büyükşahin, Ü.Ç.; Ertekin, Ş. Improving Forecasting Accuracy of Time Series Data Using a New ARIMA-ANN Hybrid Method and Empirical Mode Decomposition. Neurocomputing 2019, 361, 151–163. [Google Scholar] [CrossRef]
Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-Stationary Time Series Analysis. Proc. R. Soc. Lond. A 1998, 454, 903–995. [Google Scholar] [CrossRef]
Wu, Z.; Huang, N.E. Ensemble Empirical Mode Decomposition: A Noise-Assisted Data Analysis Method. Adv. Adapt. Data Anal. 2009, 1, 1–41. [Google Scholar] [CrossRef]
Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544. [Google Scholar] [CrossRef]
Song, D.; Chung Baek, A.M.; Kim, N. Forecasting Stock Market Indices Using Padding-Based Fourier Transform Denoising and Time Series Deep Learning Models. IEEE Access 2021, 9, 83786–83796. [Google Scholar] [CrossRef]
Jin, Z.; Yang, Y.; Liu, Y. Stock Closing Price Prediction Based on Sentiment Analysis and LSTM. Neural Comput. Appl. 2020, 32, 9713–9729. [Google Scholar] [CrossRef]
Yemets, K.; Izonin, I.; Dronyuk, I. Time Series Forecasting Model Based on the Adapted Transformer Neural Network and FFT-Based Features Extraction. Sensors 2025, 25, 652. [Google Scholar] [CrossRef] [PubMed]
Zhang, Q.; Yang, P.; Wen, H.; Li, X.; Wang, H.; Sun, F.; Song, Z.; Lai, Z.; Ma, R.; Han, R.; et al. Beyond the Time Domain: Recent Advances on Frequency Transforms in Time Series Analysis. arXiv 2025, arXiv:2504.07099. [Google Scholar]
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
Kitaev, N.; Kaiser, Ł.; Levskaya, A. Reformer: The Efficient Transformer. arXiv 2020, arXiv:2001.04451. [Google Scholar]
Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; AAAI: Washington, DC, USA, 2021. [Google Scholar]
Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems 34 (NeurIPS 2021), Online, 6–14 December 2021. [Google Scholar]
Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A Time Series Is Worth 64 Words: Long-Term Forecasting with Transformers. arXiv 2022, arXiv:2211.14730. [Google Scholar]
Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017. [Google Scholar]
Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2017. [Google Scholar]
Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2020. [Google Scholar]
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
Vaziri, J.; Farid, D.; Nazemi Ardakani, M.; Hosseini Bamakan, S.M.; Shahlaei, M. A Time-Varying Stock Portfolio Selection Model Based on Optimized PSO-BiLSTM and Multi-Objective Mathematical Programming under Budget Constraints. Neural Comput. Appl. 2023, 35, 18445–18470. [Google Scholar] [CrossRef]
Wu, J.M.-T.; Li, Z.; Herencsar, N.; Vo, B.; Lin, J.C.-W. A Graph-Based CNN-LSTM Stock Price Prediction Algorithm with Leading Indicators. Multimed. Syst. 2023, 29, 1751–1770. [Google Scholar] [CrossRef]
Li, X.; Sun, Y. Stock Intelligent Investment Strategy Based on Support Vector Machine Parameter Optimization Algorithm. Neural Comput. Appl. 2020, 32, 1765–1775. [Google Scholar] [CrossRef]
Tu, X.; Fu, L.; Wang, Q. Carbon Price Prediction Based on Multidimensional Association Rules and Optimized Multi-Factor LSTM Model. Energy 2025, 329, 136768. [Google Scholar] [CrossRef]
Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024. [Google Scholar]
Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; AAAI: Washington, DC, USA, 2023. [Google Scholar]
Liu, Y.; Li, C.; Wang, J.; Long, M. Koopa: Learning Non-Stationary Time Series Dynamics with Koopman Predictors. In Proceedings of the Advances in Neural Information Processing Systems 36 (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
Zhang, T.; Zhang, Y.; Cao, W.; Bian, J.; Yi, X.; Zheng, S.; Li, J. Less Is More: Fast Multivariate Time Series Forecasting with Light Sampling-Oriented MLP Structures. arXiv 2022, arXiv:2207.01186. [Google Scholar]
Zhou, T.; Ma, Z.; Wang, X.; Wen, Q.; Sun, L.; Yao, T.; Yin, W.; Jin, R. FiLM: Frequency Improved Legendre Memory Model for Long-Term Time Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems 35 (NeurIPS 2022), New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
Das, A.; Kong, W.; Leach, A.; Mathur, S.; Sen, R.; Yu, R. Long-Term Forecasting with TiDE: Time-Series Dense Encoder. arXiv 2023, arXiv:2304.08424. [Google Scholar]

Figure 1. Multi-head attention mechanism structure diagram.

Figure 2. Improved Transformer encoder architecture.

Figure 3. Simplified schematic of depthwise separable convolution.

Figure 4. Flowchart of the FAMS-Transformer hybrid model.

Figure 5. Daily closing price trends of the three stock indices.

Figure 6. Comparison of error metrics of different models on three datasets under one-step forecasting.

Figure 7. Comparison of predicted and true value curves for each model on the SSE dataset at different prediction steps.

Figure 8. Comparison of predicted and true value curves for each model on the SZSE dataset at different prediction steps.

Figure 9. Comparison of predicted and true value curves for each model on the SME100 dataset at different prediction steps.

Figure 10. Comparison of predicted and true value curves for each model on the S&P 500 dataset at different prediction steps.

Table 1. Descriptive statistics of SSE, SZSE, and SME100.

Index	Count	Max	Min	Mean	Standard Deviation	Skewness	Kurtosis
SSE	2661	5166.350	2003.490	3137.932	404.876	0.344	3.679
SZSE	2661	18,098.270	7089.440	10,926.937	2038.115	0.533	0.009
SME100	2661	11,996.520	4465.450	7131.459	1423.868	0.537	−0.203
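The statistics in Table 1 can be reproduced along the lines of the minimal sketch below. The table does not state which conventions were used, so the sketch assumes a sample standard deviation (divisor n − 1) and moment-based skewness and kurtosis (divisor n), and it returns both raw and excess kurtosis because the two differ by 3. The function name `describe` and the toy series are illustrative only.

```python
import math


def describe(values):
    """Summary statistics in the layout of the table (conventions assumed)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Central moments with divisor n (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("constant series has undefined skewness and kurtosis")
    return {
        "count": n,
        "max": max(values),
        "min": min(values),
        "mean": mean,
        "std": math.sqrt(m2 * n / (n - 1)),   # sample standard deviation
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,             # equals 3 for a normal distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # equals 0 for a normal distribution
    }


# Toy series, not taken from the index data.
stats = describe([1.0, 2.0, 3.0, 4.0, 10.0])
print(stats)
```

When comparing kurtosis values across indices, it helps to confirm that all of them follow the same convention (raw or excess).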

Table 2. Specific names and symbols of the input features.

	Classification	Indicator Name
Market Trading Indicators	Price-Volume Indicators	Open price, Close price, Price change, Volume
Market Trading Indicators	Technical Indicators	P/E ratio (TTM), P/B ratio (MRQ), P/S ratio (TTM), P/CF ratio (TTM), 5-day/10-day Moving Average, MACD, Momentum Indicator, Bollinger Bands, Williams Variable Accumulation/Distribution
External Factor Indicators	Commodity Indicators	Carbon trading prices (Beijing, Shanghai, Shenzhen); Crude oil prices (WTI, Brent); Gold prices (Shanghai Gold Exchange closing price, London spot gold closing price)
External Factor Indicators	Global Capital Market Indicators	Global stock indices (S&P 500, Dow Jones Industrial Average, Hang Seng Index, Nikkei 225); Foreign exchange rates (EUR/CNY, JPY/CNY, HKD/CNY)
External Factor Indicators	Macroeconomic Variables	Money supply and inflation (China M2 year-on-year growth rate (monthly), China CPI month-on-month, China CPI year-on-year, China CPI consumer goods year-on-year); China goods export growth rate (monthly); Interest rates and credit (China 10-year and 1-year government bond yield spread (monthly), Ratio of China net bond issuance to year-end market capitalization (monthly))
External Factor Indicators	Sentiment Indicators	VIX closing price

All data are sourced from the WIND database.

Table 3. Multi-step prediction results of FAMS-Transformer and the baseline models on the three datasets.

Dataset	Model	1 Step				5 Steps				10 Steps				15 Steps
Dataset	Model	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²
SSE	OurModel	26.403	38.406	0.835	0.959	42.624	63.142	1.350	0.889	56.182	82.315	1.782	0.808	67.248	96.728	2.135	0.730
	Lstm	113.066	137.803	3.584	0.475	113.901	139.141	3.598	0.459	129.992	158.636	4.123	0.286	120.347	150.010	3.847	0.350
	SVR	228.231	278.131	7.472	−1.139	219.559	268.478	7.185	−1.014	207.981	255.782	6.801	−0.857	198.637	245.824	6.493	−0.745
	Transformer	81.574	105.836	2.611	0.690	96.165	120.741	3.089	0.593	111.527	139.510	3.596	0.448	117.328	143.801	3.756	0.403
	Autoformer	75.475	98.089	2.380	0.734	89.915	115.920	2.832	0.624	104.965	137.516	3.311	0.463	114.042	145.033	3.613	0.392
	Informer	148.229	181.721	4.840	0.087	118.590	145.204	3.744	0.411	125.345	155.486	3.977	0.314	134.265	165.092	4.228	0.213
	FEDformer	74.978	95.321	2.387	0.749	98.259	126.694	3.116	0.551	108.891	139.113	3.459	0.451	116.273	146.080	3.705	0.384
	iTransformer	27.828	40.193	0.881	0.955	44.086	65.581	1.397	0.880	57.252	84.704	1.815	0.796	67.681	97.868	2.148	0.723
	PatchTST	27.637	39.563	0.874	0.957	44.363	65.195	1.405	0.881	58.528	85.343	1.854	0.793	69.690	99.681	2.210	0.713
	TimesNet	67.430	90.993	2.125	0.771	77.410	105.912	2.455	0.687	81.024	109.825	2.566	0.658	91.405	122.090	2.899	0.570
	FreTS	29.363	41.686	0.934	0.952	47.729	68.158	1.516	0.870	59.180	83.443	1.884	0.802	71.048	98.067	2.263	0.722
	DLinear	68.350	89.396	2.176	0.779	92.096	122.120	2.882	0.583	99.533	131.162	3.131	0.512	123.005	154.731	3.903	0.309
	Koopa	44.877	65.984	1.412	0.880	53.735	77.886	1.702	0.830	64.802	91.393	2.061	0.763	75.712	105.471	2.402	0.679
	LightTS	53.720	72.363	1.706	0.855	61.871	83.103	1.959	0.807	66.245	93.077	2.087	0.754	84.518	112.798	2.669	0.633
	TiDE	54.441	74.768	1.724	0.845	86.072	117.556	2.722	0.614	83.925	115.831	2.657	0.619	88.312	120.529	2.799	0.58
	FiLM	76.844	100.133	2.440	0.723	73.577	97.177	2.346	0.736	73.727	103.619	2.332	0.695	85.964	117.088	2.724	0.604
SZSE	OurModel	124.367	182.953	1.158	0.985	206.896	306.086	1.931	0.958	282.320	409.791	2.635	0.923	345.009	492.748	3.220	0.887
	Lstm	1369.424	1639.565	13.728	−0.190	1328.959	1558.876	13.353	−0.090	1361.853	1617.977	13.721	−0.195	1088.812	1370.676	10.873	0.126
	SVR	1356.614	1671.816	13.565	−0.237	1328.803	1639.327	13.274	−0.205	1285.854	1593.720	12.840	−0.159	1256.020	1560.495	12.545	−0.133
	Transformer	1368.789	1622.991	13.820	−0.166	1500.539	1725.613	15.043	−0.335	1667.031	1887.478	16.658	−0.626	1580.839	1815.125	15.834	−0.533
	Autoformer	330.818	443.570	3.084	0.913	395.564	518.707	3.655	0.879	511.272	679.042	4.747	0.790	563.955	724.182	5.251	0.756
	Informer	1520.958	1702.362	15.195	−0.283	1233.787	1479.367	12.406	0.019	1288.809	1544.958	12.897	−0.089	1179.318	1434.305	11.782	0.043
	FEDformer	371.260	470.922	3.479	0.902	494.360	624.938	4.611	0.825	552.153	698.491	5.155	0.777	596.261	751.435	5.588	0.737
	iTransformer	136.504	192.525	1.273	0.984	213.463	313.735	1.994	0.956	282.465	412.148	2.639	0.922	340.639	489.736	3.184	0.888
	PatchTST	127.265	184.421	1.187	0.985	216.459	317.364	2.021	0.955	286.451	418.211	2.672	0.920	352.035	503.643	3.284	0.882
	TimesNet	324.116	430.632	3.005	0.918	417.421	551.698	3.887	0.864	431.481	578.179	4.011	0.847	463.216	611.000	4.316	0.826
	FreTS	141.058	196.246	1.338	0.983	243.994	335.867	2.306	0.949	297.260	408.382	2.842	0.924	365.141	494.837	3.488	0.886
	DLinear	349.852	448.188	3.311	0.911	407.736	546.838	3.771	0.866	461.339	604.424	4.309	0.833	584.959	734.819	5.524	0.749
	Koopa	207.161	298.054	1.920	0.961	267.891	382.312	2.493	0.934	340.774	469.976	3.187	0.899	394.082	543.740	3.675	0.862
	LightTS	266.857	351.074	2.538	0.945	305.507	405.234	2.862	0.926	326.067	433.316	3.055	0.914	388.962	510.445	3.671	0.879
	TiDE	271.075	369.550	2.527	0.940	443.331	597.655	4.122	0.840	433.526	590.476	4.036	0.841	459.871	620.316	4.287	0.821
	FiLM	407.282	525.208	3.794	0.878	399.561	524.269	3.732	0.877	375.403	521.600	3.494	0.876	447.862	607.176	4.173	0.828
SME100	OurModel	81.759	114.418	1.165	0.990	137.772	194.836	1.972	0.972	183.906	257.603	2.632	0.951	222.256	308.989	3.179	0.928
	Lstm	894.928	1122.965	14.270	0.084	941.185	1134.317	14.924	0.058	873.126	1080.850	13.890	0.134	862.426	1063.915	13.407	0.150
	SVR	1035.499	1259.150	16.175	−0.151	1023.054	1241.835	15.955	−0.129	1001.112	1215.803	15.599	−0.095	987.008	1198.412	15.376	−0.079
	Transformer	856.209	1116.354	13.829	0.095	890.111	1144.961	14.382	0.040	951.746	1199.412	15.305	−0.066	895.266	1138.797	14.391	0.026
	Autoformer	208.789	271.455	3.017	0.947	268.800	346.569	3.835	0.912	327.340	424.967	4.688	0.866	351.160	453.881	5.023	0.845
	Informer	840.468	1040.687	13.506	0.214	942.630	1164.505	15.039	0.007	908.386	1120.765	14.420	0.069	817.880	1018.858	12.925	0.220
	FEDformer	242.317	304.776	3.483	0.933	322.260	400.957	4.611	0.882	353.874	442.420	5.071	0.855	383.527	481.322	5.511	0.826
	iTransformer	89.912	123.116	1.285	0.989	140.043	197.208	2.004	0.972	184.316	257.680	2.640	0.951	220.359	306.320	3.157	0.930
	PatchTST	84.535	117.391	1.205	0.990	142.360	200.222	2.038	0.971	187.066	262.563	2.679	0.949	228.014	316.611	3.264	0.925
	TimesNet	222.274	291.709	3.159	0.938	257.510	340.329	3.667	0.915	278.991	371.423	3.968	0.898	336.187	434.568	4.806	0.858
	FreTS	93.675	124.326	1.363	0.989	160.229	213.114	2.318	0.967	195.697	258.518	2.868	0.950	241.419	315.305	3.542	0.925
	DLinear	226.103	283.639	3.303	0.942	263.217	346.951	3.743	0.912	298.706	385.267	4.292	0.890	380.595	472.080	5.551	0.833
	Koopa	134.102	186.166	1.908	0.975	176.521	242.306	2.518	0.957	224.024	301.377	3.208	0.933	254.150	342.975	3.634	0.912
	LightTS	178.248	227.495	2.607	0.962	200.543	260.122	2.889	0.950	221.059	284.139	3.183	0.940	264.035	334.520	3.826	0.916
	TiDE	177.254	234.757	2.530	0.960	284.586	375.160	4.063	0.897	277.809	369.460	3.971	0.899	293.102	388.366	4.193	0.887
	FiLM	262.498	335.494	3.748	0.918	260.354	339.592	3.720	0.916	241.051	325.202	3.445	0.922	285.604	380.603	4.084	0.891

Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
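For reference, the four metrics reported in the table can be computed as in the minimal sketch below. It assumes the standard textbook definitions, which the table itself does not restate, and the function name `regression_metrics` and the toy values are illustrative rather than taken from the authors' code. Under these definitions R² drops below zero whenever a model's squared error exceeds that of a constant forecast at the mean of the true values, which would be consistent with the negative entries reported for some baselines.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (%) and R2 under the standard definitions (assumed)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE assumes no true value is zero, which holds for index levels.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for a constant target series")
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Toy values, not taken from the paper's data.
y_true = [3000.0, 3050.0, 3100.0, 3150.0]
good = regression_metrics(y_true, [3010.0, 3040.0, 3110.0, 3140.0])
biased = regression_metrics(y_true, [3300.0, 3350.0, 3400.0, 3450.0])
print(good)
print(biased)  # R2 is negative here: worse than predicting the mean
```

The second call illustrates a forecast with a large constant bias: its MAE and RMSE are both 300, and its R² is strongly negative even though it tracks the shape of the series.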

Table 4. Multi-step prediction results of FAMS-Transformer and the baseline models on the S&P 500 dataset.

Dataset	Model	1 Step				5 Steps				10 Steps				15 Steps
Dataset	Model	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²
SP500	OurModel	43.114	57.491	0.853	0.997	64.302	87.986	1.268	0.992	84.765	113.812	1.671	0.987	107.485	139.476	2.118	0.98
	Lstm	1352.164	1498.576	24.623	−1.295	1621.364	1727.415	30.065	−2.063	1582.292	1691.607	29.385	−1.952	1662.456	1763.336	30.959	−2.225
	SVR	2198.776	2439.126	39.726	−5.081	2217.17	2451.434	40.134	−5.168	2243.873	2469.67	40.721	−5.293	2271.16	2489.939	41.31	−5.43
	Transformer	1216.485	1467.396	21.162	−1.201	1255.306	1479.128	22.099	−1.246	1352.903	1570.076	23.964	−1.543	1393.996	1612.991	24.723	−1.698
	Autoformer	80.666	104.397	1.573	0.989	90.395	118.882	1.787	0.985	114.539	153.451	2.277	0.976	142.943	178.802	2.85	0.967
	Informer	1219.263	1450.294	21.367	−1.15	1366.067	1571.478	24.289	−1.535	1440.401	1626.381	25.858	−1.729	1508.868	1684.644	27.224	−1.943
	FEDformer	93.871	119.282	1.854	0.985	118.862	149.092	2.354	0.977	129.942	163.07	2.582	0.973	140.929	176.762	2.809	0.968
	iTransformer	50.164	68.887	0.988	0.995	74.752	98.108	1.479	0.99	99.394	126.75	1.967	0.983	117.723	147.656	2.329	0.977
	PatchTST	52.332	69.014	1.039	0.995	86.849	110.171	1.719	0.988	108.755	135.802	2.156	0.981	126.124	155.279	2.494	0.975
	TimesNet	77.614	99.941	1.522	0.99	102.721	130.621	2.051	0.982	110.861	140.114	2.185	0.98	116.517	150.913	2.318	0.976
	FreTS	51.7	66.576	1.000	0.995	69.798	93.321	1.374	0.991	103.543	130.962	2.009	0.982	127.498	160.504	2.463	0.973
	DLinear	120.416	141.543	2.35	0.98	198.175	232.915	3.797	0.944	214.181	257.551	4.093	0.932	215.072	263.01	4.122	0.928
	Koopa	85.981	115.093	1.716	0.986	80.151	106.269	1.59	0.988	99.812	128.455	1.979	0.983	116.081	146.047	2.299	0.978
	LightTS	185.861	227.257	3.331	0.947	184.166	229.792	3.344	0.946	339.258	428.274	6.075	0.811	414.287	524.201	7.333	0.715
	TiDE	83.007	108.863	1.643	0.988	126.585	157.964	2.509	0.974	136.07	171.541	2.693	0.97	151.568	188.284	2.993	0.963
	FiLM	114.851	142.365	2.308	0.979	96.874	131.728	1.966	0.982	117.246	144.812	2.327	0.978	130.626	161.699	2.603	0.973

Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.

Table 5. Significance test results—Wilcoxon signed-rank test (α = 0.05).

Baseline	SSE				SZSE				SME100				SP500
Baseline	pl = 1	pl = 5	pl = 10	pl = 15	pl = 1	pl = 5	pl = 10	pl = 15	pl = 1	pl = 5	pl = 10	pl = 15	pl = 1	pl = 5	pl = 10	pl = 15
Autoformer	***	***	**	***	*	ns	***	***	***	ns	***	***	***	ns	***	***
DLinear	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***
FEDformer	***	***	***	***	***	***	***	***	***	***	***	***	***	***	ns	***
FiLM	***	***	***	ns	***	***	ns	*	***	***	ns	***	***	***	***	***
FreTS	***	***	***	***	***	***	***	***	***	***	***	***	***	ns	***	***
Informer	***	***	***	**	***	***	***	***	***	***	***	***	***	***	***	***
iTransformer	ns	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***
Koopa	***	***	***	***	***	***	***	***	***	***	***	***	***	***	ns	***
LightTS	***	***	***	***	***	ns	***	*	***	ns	***	ns	***	***	***	***
Lstm	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***
PatchTST	ns	***	***	***	***	***	***	***	ns	***	***	*	***	***	*	***
SVR	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***
TiDE	***	**	***	ns	***	***	***	ns	***	***	***	***	***	***	***	**
TimesNet	ns	ns	***	***	**	***	***	***	***	***	***	***	***	*	**	***
Transformer	**	ns	***	ns	***	***	***	***	***	***	***	***	***	***	***	***

Note: *** p < 0.01; ** p < 0.05; * p < 0.1; ns, p ≥ 0.1.
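The star notation in the note can be reproduced with the minimal sketch below. It assumes that the test is applied to paired per-sample errors of the proposed model and one baseline, which this table does not spell out, and it uses the large-sample normal approximation without tie or continuity corrections. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used instead. The function names and the toy error values are illustrative.

```python
import math


def significance_label(p):
    """Map a p-value to the star notation used in the table note."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return "ns"


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test, large-sample normal approximation.

    Zero differences are dropped and tied |differences| get average ranks.
    No tie or continuity correction is applied (a simplification).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return w_plus, p


# Hypothetical absolute errors: the baseline is worse on every sample.
model_err = [float(i % 5) + 1.0 for i in range(20)]
baseline_err = [m + 0.1 * (i + 1) for i, m in enumerate(model_err)]
w, p = wilcoxon_signed_rank(model_err, baseline_err)
print(w, p, significance_label(p))
```

A cell marked "ns" therefore means only that the paired errors do not differ at the 0.1 level under the test, not that the two models are equivalent.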

Table 6. Significance test results—paired t-test (α = 0.05).

Baseline	SSE				SZSE				SME100				SP500
Baseline	pl = 1	pl = 5	pl = 10	pl = 15	pl = 1	pl = 5	pl = 10	pl = 15	pl = 1	pl = 5	pl = 10	pl = 15	pl = 1	pl = 5	pl = 10	pl = 15
Autoformer	***	**	ns	***	ns	ns	***	ns	***	ns	***	***	***	ns	***	***
DLinear	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***
FEDformer	***	***	***	***	***	***	***	***	***	***	***	***	***	ns	*	***
FiLM	***	***	***	***	***	***	***	*	***	***	***	ns	***	***	***	***
FreTS	***	***	***	***	***	***	***	***	***	***	***	***	***	ns	***	***
Informer	***	***	***	**	***	***	***	***	***	***	***	***	***	***	***	***
iTransformer	ns	***	***	***	***	***	***	***	***	***	***	***	**	*	**	***
Koopa	***	***	***	***	***	***	***	***	***	***	***	***	***	***	**	***
LightTS	*	***	***	***	***	ns	***	***	***	ns	***	***	***	***	***	***
Lstm	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***
PatchTST	ns	***	***	***	*	***	***	***	ns	*	***	ns	***	***	ns	***
SVR	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***	***
TiDE	*	ns	***	**	***	***	***	ns	***	***	***	ns	**	***	***	***
TimesNet	**	ns	***	***	ns	***	***	***	***	ns	***	***	***	ns	ns	***
Transformer	ns	ns	***	ns	***	***	***	***	***	***	***	***	***	***	***	***

Note: *** p < 0.01; ** p < 0.05; * p < 0.1; ns, p ≥ 0.1.

Table 7. R² of all models across three volatility regimes on four datasets (prediction horizon pl = 1).

Model	SSE			SZSE			SME100			SP500
Model	Low	Medium	High	Low	Medium	High	Low	Medium	High	Low	Medium	High
FAMS-Transformer (Ours)	0.988	0.969	0.900	0.994	0.993	0.956	0.994	0.994	0.984	0.998	0.997	0.993
Autoformer	0.802	0.675	0.645	0.928	0.949	0.811	0.934	0.961	0.935	0.989	0.991	0.980
DLinear	0.900	0.830	0.512	0.937	0.952	0.786	0.943	0.958	0.921	0.974	0.986	0.965
FEDformer	0.814	0.816	0.536	0.919	0.947	0.778	0.932	0.956	0.906	0.984	0.990	0.973
FiLM	0.894	0.789	0.359	0.927	0.933	0.692	0.933	0.946	0.878	0.982	0.988	0.954
FreTS	0.980	0.965	0.889	0.990	0.992	0.953	0.990	0.993	0.983	0.997	0.996	0.991
Informer	0.058	0.230	−0.219	−0.449	0.120	−1.031	0.010	0.155	0.340	−2.691	−1.697	−0.359
Koopa	0.967	0.930	0.681	0.983	0.983	0.882	0.985	0.985	0.958	0.986	0.991	0.973
LightTS	0.941	0.890	0.671	0.961	0.976	0.857	0.963	0.975	0.947	0.911	0.943	0.955
Lstm	0.576	0.544	0.152	−0.077	0.140	−1.112	0.018	−0.037	0.195	−2.783	−1.787	−0.690
PatchTST	0.987	0.969	0.893	0.993	0.994	0.954	0.993	0.994	0.983	0.997	0.996	0.990
SVR	−1.806	−0.720	−1.120	−0.429	−0.018	−0.546	−0.489	−0.222	0.041	−8.920	−6.330	−3.630
TiDE	0.948	0.873	0.642	0.964	0.969	0.842	0.967	0.975	0.939	0.990	0.992	0.975
TimesNet	0.913	0.813	0.483	0.943	0.954	0.804	0.944	0.956	0.914	0.990	0.992	0.981
Transformer	0.553	0.708	0.793	−0.272	0.178	−0.851	−0.079	0.014	0.223	−2.743	−1.781	−0.388
iTransformer	0.986	0.966	0.893	0.992	0.992	0.955	0.992	0.993	0.983	0.997	0.996	0.989

Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.

Table 8. Ablation experiment results.

Dataset	Model	1 Step				5 Steps				10 Steps				15 Steps
Dataset	Model	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²	MAE	RMSE	MAPE (%)	R²
SSE	OurModel	26.403	38.406	0.835	0.9592	42.624	63.142	1.35	0.8886	56.182	82.315	1.782	0.8077	67.248	96.728	2.135	0.7298
SSE	w/o Conv	26.341	38.206	0.834	0.9596	42.911	63.483	1.359	0.8874	56.693	82.89	1.799	0.805	67.879	97.439	2.155	0.7258
SSE	w/o Decomp	27.322	39.86	0.866	0.9561	43.87	64.89	1.391	0.8823	57.939	84.761	1.839	0.7961	69.002	98.756	2.191	0.7183
SSE	w/o Both	26.925	39.237	0.853	0.9574	44.044	64.908	1.396	0.8823	56.928	83.472	1.806	0.8023	69.561	99.354	2.208	0.7149
SSE	Standard Conv1D	27.394	40.217	0.868	0.9553	44.239	65.789	1.403	0.879	58.133	85.086	1.844	0.7946	69.65	99.51	2.212	0.714
SSE	Dilated Depthwise Conv1D	27.809	41.132	0.88	0.9532	44.166	65.416	1.4	0.8804	57.734	84.516	1.832	0.7973	69.015	98.927	2.191	0.7174
SSE	Original Transformer	27.222	39.817	0.861	0.9562	44.367	65.445	1.406	0.8803	57.57	84.092	1.826	0.7993	69.709	99.518	2.213	0.714
SSE	Fixed-period	26.377	38.38	0.834	0.9593	42.653	63.032	1.351	0.889	56.135	82.398	1.78	0.8073	67.346	96.942	2.138	0.7286
SSE	Random-period	26.957	39.36	0.851	0.9572	42.968	63.962	1.36	0.8857	57.262	83.497	1.816	0.8022	68.401	98.487	2.171	0.7199
SZSE	OurModel	124.367	182.953	1.158	0.9852	206.896	306.086	1.931	0.958	282.32	409.791	2.635	0.9234	345.009	492.748	3.22	0.887
SZSE	w/o Conv	129.113	186.939	1.204	0.9845	211.057	310.648	1.971	0.9567	285.788	414.583	2.667	0.9216	348.847	497.872	3.255	0.8846
SZSE	w/o Decomp	128.602	188.944	1.201	0.9842	214.322	315.395	2.003	0.9554	290.062	420.557	2.709	0.9193	350.837	500.619	3.276	0.8834
SZSE	w/o Both	128.596	186.089	1.201	0.9847	214.9	315.425	2.007	0.9554	281.854	408.257	2.631	0.9239	354.938	505.683	3.314	0.881
SZSE	Standard Conv1D	132.816	193.178	1.24	0.9835	216.269	317.52	2.022	0.9548	291.147	421.924	2.719	0.9187	354.817	503.229	3.315	0.8821
SZSE	Dilated Depthwise Conv1D	126.984	187.337	1.183	0.9845	215.772	317.705	2.017	0.9547	290.415	420.392	2.713	0.9193	351.059	501.037	3.278	0.8832
SZSE	Original Transformer	130.853	190.965	1.22	0.9839	217.147	318.273	2.029	0.9546	285.112	414.542	2.663	0.9216	354.491	505.042	3.31	0.8813
SZSE	Fixed-period	124.701	183.676	1.16	0.9851	208.842	307.783	1.95	0.9575	281.837	409.472	2.63	0.9235	344.511	492.87	3.215	0.8869
SZSE	Random-period	126.882	186.047	1.182	0.9847	208.908	308.895	1.946	0.9572	284.202	411.84	2.658	0.9226	346.931	493.677	3.242	0.8866
SME 100	OurModel	81.759	114.418	1.165	0.9905	137.772	194.836	1.972	0.9722	183.906	257.603	2.632	0.9508	222.256	308.989	3.179	0.9283
SME 100	w/o Conv	85.914	119.689	1.226	0.9896	138.094	195.156	1.975	0.9721	185.729	260.403	2.658	0.9498	224.789	312.302	3.216	0.9267
SME 100	w/o Decomp	85.878	119.839	1.229	0.9896	140.45	198.42	2.013	0.9712	188.856	264.447	2.707	0.9482	223.462	308.316	3.203	0.9286
SME 100	w/o Both	86.682	119.18	1.237	0.9897	141.599	199.557	2.029	0.9708	187.238	262.251	2.683	0.949	229.306	317.643	3.286	0.9242
SME 100	Standard Conv1D	88.166	123.075	1.26	0.989	142.748	200.979	2.047	0.9704	190.583	266.748	2.733	0.9473	229.608	316.871	3.291	0.9246
SME 100	Dilated Depthwise Conv1D	86.426	121.321	1.235	0.9893	141.619	200.107	2.029	0.9707	189.504	265.043	2.716	0.9479	227.539	315.249	3.26	0.9254
SME 100	Original Transformer	88.303	123.087	1.262	0.989	142.142	200.195	2.037	0.9707	186.055	260.411	2.666	0.9497	229.023	316.948	3.281	0.9245
SME 100	Fixed-period	82.287	115.509	1.172	0.9903	135.977	192.44	1.945	0.9729	183.93	257.645	2.632	0.9508	223.328	310.15	3.195	0.9277
SME 100	Random-period	86.47	118.653	1.232	0.9898	137.047	193.691	1.966	0.9725	182.277	255.674	2.616	0.9516	224.551	310.251	3.216	0.9277

Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
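Table 8 lists absolute errors, which makes the size of each ablation effect hard to compare across datasets with different index levels. One way to read the table is to express each variant's RMSE as a percentage increase over the full model. The sketch below does this for the SSE 15-step column only; the RMSE values are copied from Table 8, while the helper name `relative_increase` is an illustrative assumption.

```python
# RMSE values copied from Table 8 (SSE dataset, 15-step horizon).
rmse_15_step_sse = {
    "OurModel": 96.728,
    "w/o Conv": 97.439,
    "w/o Decomp": 98.756,
    "w/o Both": 99.354,
}


def relative_increase(variant_value, reference_value):
    """Percentage increase of a variant's error over the reference error."""
    return 100.0 * (variant_value - reference_value) / reference_value


full_model_rmse = rmse_15_step_sse["OurModel"]

# Percentage RMSE increase of each ablated variant relative to the full model.
degradation = {
    name: relative_increase(value, full_model_rmse)
    for name, value in rmse_15_step_sse.items()
    if name != "OurModel"
}

for name, pct in degradation.items():
    print(f"{name:<12s} RMSE increase: {pct:.2f}%")
```

On this column, removing the decomposition costs more than removing the convolution, and removing both costs the most. The same calculation can be repeated for any other dataset, horizon, or metric in the table.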

Table 9. Comparison of efficiency metrics (parameter count and FLOPs) for four architecture variants across the four prediction horizons.

Model	1 Step		5 Steps		10 Steps		15 Steps
Model	Params	FLOPs	Params	FLOPs	Params	FLOPs	Params	FLOPs
Standard Conv1D	7.891 M	1.538 G	7.898 M	1.538 G	7.905 M	1.539 G	7.913 M	1.539 G
w/o Decomp	6.322 M	1.232 G	6.328 M	1.232 G	6.335 M	1.233 G	6.343 M	1.233 G
Dilated Depthwise Conv1D	6.322 M	1.232 G	6.328 M	1.232 G	6.335 M	1.233 G	6.343 M	1.233 G
Original Transformer	6.316 M	1.231 G	6.322 M	1.231 G	6.329 M	1.231 G	6.337 M	1.232 G
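The gap in Table 9 between the Standard Conv1D variant and the depthwise variants follows from how convolution weights scale with channel width. The sketch below counts parameters for a single 1D convolution layer using the usual formula (in_channels / groups) × out_channels × kernel_size, plus one bias per output channel. The channel width of 512 and kernel size of 3 are illustrative assumptions, not values confirmed in this part of the text, and the function name `conv1d_params` is hypothetical.

```python
def conv1d_params(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Number of learnable parameters in one 1D convolution layer."""
    weights = (in_channels // groups) * out_channels * kernel_size
    return weights + (out_channels if bias else 0)


# Assumed sizes for illustration only.
d_model, kernel = 512, 3

# Standard convolution: every output channel sees every input channel.
standard = conv1d_params(d_model, d_model, kernel)

# Depthwise convolution: one filter per channel (groups == channels).
depthwise = conv1d_params(d_model, d_model, kernel, groups=d_model)

# Pointwise (1x1) convolution that mixes channels after the depthwise step.
pointwise = conv1d_params(d_model, d_model, 1)

# Depthwise separable convolution = depthwise followed by pointwise.
separable = depthwise + pointwise

print(f"standard : {standard:,}")
print(f"depthwise: {depthwise:,}")
print(f"separable: {separable:,}")
```

Dilation changes the spacing between kernel taps, not the number of weights, so a dilated depthwise layer has the same parameter count as an undilated one with the same kernel size.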

Table 10. Statistics of Jaccard similarity of dominant period sets between adjacent test windows.

Dataset	Num. Windows	Mean Jaccard	Median Jaccard	Full-Overlap Ratio	Most Frequent Periods
SSE	762	0.9693	1	0.9528	29, 15, 11
SZSE	762	0.9693	1	0.9528	29, 15, 11
SME 100	762	0.9693	1	0.9528	29, 15, 11
S&P 500	884	0.92	1	0.879	29, 15, 11, 7, 5
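The statistics in Table 10 can be illustrated with a small sketch: extract the dominant periods of each window from its discrete Fourier spectrum, then compare the period sets of adjacent windows with the Jaccard index |A ∩ B| / |A ∪ B|. The selection rule used here (top-k amplitudes, period rounded to the nearest integer), the synthetic series, and the function names are all assumptions for illustration; the paper's exact procedure is described in Section 3.1 and is not reproduced here. A plain DFT is used instead of an FFT library so the example has no dependencies.

```python
import cmath
import math


def dominant_periods(window, top_k=3):
    """Set of integer periods belonging to the top_k largest DFT amplitudes."""
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]  # remove the zero-frequency component
    amplitudes = []
    for freq in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * freq * t / n)
            for t, x in enumerate(centered)
        )
        amplitudes.append((abs(coeff), freq))
    amplitudes.sort(reverse=True)
    return {round(n / freq) for _, freq in amplitudes[:top_k]}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Synthetic series with two fixed cycles of length 30 and 12 (illustrative).
series = [
    math.sin(2 * math.pi * t / 30) + 0.5 * math.sin(2 * math.pi * t / 12)
    for t in range(180)
]

# Sliding windows with stride 1, mimicking adjacent test windows.
window_len = 60
windows = [series[s:s + window_len] for s in range(len(series) - window_len + 1)]

period_sets = [dominant_periods(w, top_k=2) for w in windows]
scores = [jaccard(a, b) for a, b in zip(period_sets, period_sets[1:])]

mean_jaccard = sum(scores) / len(scores)
full_overlap_ratio = sum(s == 1.0 for s in scores) / len(scores)

print(f"mean Jaccard = {mean_jaccard:.4f}, full-overlap ratio = {full_overlap_ratio:.4f}")
```

Because the synthetic cycles never change, every pair of adjacent windows shares the same period set. On real index data the sets occasionally differ, which is what pulls the mean Jaccard and full-overlap ratio in Table 10 below 1.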
