Article

A Hybrid Model for Stock Index Forecasting Integrating Adaptive Frequency-Domain Decomposition and Enhanced Transformer Encoder

1 College of Economics and Management, Fujian Agriculture and Forestry University, Fuzhou 350002, China
2 Agriculture and Forestry Artificial Intelligence Research Institute, College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
* Authors to whom correspondence should be addressed.
Mathematics 2026, 14(12), 2202; https://doi.org/10.3390/math14122202
Submission received: 6 May 2026 / Revised: 15 June 2026 / Accepted: 16 June 2026 / Published: 18 June 2026

Abstract

Stock index price series are composed of superimposed multi-frequency components, including long-term trends, cyclical fluctuations, and stochastic noise. Effectively decoupling these heterogeneous components and modeling them separately is key to improving forecasting accuracy. Existing methods under the “decomposition–prediction” paradigm mostly employ fixed-scale decomposition, and their forecasting models are not specifically adapted to the non-stationary, high-noise characteristics of financial data, which limits their adaptivity and their ability to capture local dynamics. This paper proposes a frequency-aware adaptive multi-scale decomposition Transformer hybrid model (FAMS-Transformer). At the decomposition level, the fast Fourier transform is used to dynamically identify dominant cycles and thereby adaptively decouple trends and fluctuations, overcoming the limitations of fixed-scale decomposition. At the forecasting level, a lightweight depthwise separable convolution is embedded between the self-attention layer and the feed-forward network of the Transformer encoder, enhancing the model’s ability to capture local temporal dynamics and enabling joint modeling of global dependencies and local information. Comparative experiments against 15 baseline models, including LSTM, Transformer, TimesNet, and FreTS, were conducted on three representative Chinese market indices (the Shanghai Composite Index, the Shenzhen Component Index, and the Small and Medium Enterprises 100 Index) across four prediction horizons ranging from one to 15 steps. FAMS-Transformer achieves the best forecasting accuracy in all scenarios, and its coefficient of determination for 15-step prediction remains stable between 0.730 and 0.928. The model also performs well on the S&P 500 dataset. Ablation studies and significance tests further validate the effectiveness of each core module and the statistical significance of the performance improvements.

1. Introduction

Accurate forecasting of stock index prices is a central issue in financial research, as its outcomes directly inform national economic policy making and the optimization of market investment decisions [1]. However, stock index price series inherently contain multi-frequency components, including long-term trends, cyclical fluctuations, and random noise [2], whose statistical properties and evolutionary patterns differ markedly, making the forecasting task extremely challenging. Consequently, effectively decoupling these heterogeneous components and modeling them separately is crucial for improving prediction accuracy. Early in the evolution of stock forecasting methods, Black and Scholes (1973) [3] and Merton (1973) constructed a general continuous-time pricing framework [4]. Since then, stochastic volatility models (Heston, 1993) and jump-diffusion extensions (Cont and Tankov, 2004) have enriched the parametric description of financial time series [5,6]. However, such models rely on strict distributional assumptions; when these assumptions fail, their pricing deviations or risk measurement errors increase significantly. Researchers have therefore also turned to traditional statistical models such as ARIMA and GARCH [7], as well as machine learning methods such as SVR and LSTM [8,9], to better characterize the nonlinear features of financial series. In recent years, the Transformer [10] has brought a new technical paradigm to sequence modeling: its self-attention mechanism can effectively capture long-range dependencies and offers significant advantages in stock prediction tasks [11].
However, these models share a common limitation: when processing raw stock series directly, they must simultaneously handle superimposed trends, multiple periodic fluctuations, and noise, which not only increases the learning difficulty but also makes them overly sensitive to local noise, especially as the forecast horizon lengthens. To overcome this limitation, the “decomposition–forecasting” hybrid modeling paradigm has begun to attract attention in stock forecasting [1,12,13,14]. Its core idea is to first decompose the original series into relatively simple components through data preprocessing techniques, then forecast each component separately and fuse the results. Existing studies have integrated data-driven adaptive decomposition methods such as EMD, EEMD, CEEMDAN, and VMD with various predictors, successfully separating signal from noise and substantially improving prediction accuracy [15,16].
Furthermore, the Fourier transform and the wavelet transform, as classical frequency-domain analysis tools, have been successfully applied to time series tasks governed by clear physical laws, such as wind speed prediction and power load prediction [17,18], and provide a technical path distinct from EMD-type methods; however, their application to stock forecasting is still in its infancy. Moreover, the decomposition scales used are mostly preset, which makes it difficult to adapt flexibly to the significant differences in cyclical structure across stocks and market periods [8,19].
Unlike the above fixed-scale frequency-domain decomposition strategies, in the general time series forecasting domain, some cutting-edge research has begun to explore more flexible frequency-domain modeling approaches. TimesNet automatically identifies multi-periodic components in time series via FFT and reshapes the one-dimensional sequence into two-dimensional tensors to leverage convolutional networks for modeling [20]; FEDformer utilizes Fourier transform in a frequency enhancement module to capture the global properties of the series [21]; FreTS designs a frequency-domain MLP architecture that achieves efficient forecasting by focusing on key frequency components [22]. Although these models have achieved progress on their respective tasks, they were not specifically designed to accommodate the non-stationarity and heavy noise characteristics of financial data [23], and their transfer potential to stock forecasting tasks has not yet been fully verified.
At the same time, choosing the right prediction model is another complex challenge. The Transformer encoder excels at capturing global dependencies, but its standard self-attention mechanism is insufficiently sensitive to the local temporal patterns of sequences [24]. To address this deficiency, LogTrans adopted convolutional self-attention to enhance local information capture [25]. However, such improvements are mostly generic architectural optimizations and rarely consider the specific requirements of stock forecasting scenarios—namely, how to enhance the perception of typical financial time series features such as short-term trend reversals and local volatility clustering through a lightweight design while preserving the advantage of global dependency modeling.
In summary, although existing research demonstrates the advantages and effectiveness of decomposition strategies and artificial intelligence models in stock price forecasting, some shortcomings remain. First, the frequency-domain decomposition methods used in stock prediction rely on fixed scales and are not specifically adapted to the characteristics of financial data; they lack a mechanism for adaptively adjusting the decomposition strategy to the inherent cyclical structure of the stock price series itself. Second, Transformer encoders lack an effective lightweight means of jointly modeling global dependencies and local dynamics in stock price series.
To address the above issues, this paper proposes a Frequency-Aware Multi-Scale Decomposition Transformer (FAMS-Transformer) predictive framework, which achieves refined modeling of financial time series through a hybrid architecture of frequency-domain adaptive decomposition and enhanced encoding. Based on three representative indices in the Chinese market—the Shanghai Composite Index, the Shenzhen Component Index, and the SME 100 Index—and incorporating various external market factors such as carbon trading prices, crude oil prices, global stock indices, exchange rates, macroeconomic variables, and sentiment indicators, we set four forecast horizons of one step, five steps, 10 steps, and 15 steps, and systematically compare with fifteen baseline models: LSTM, SVR, Transformer, Autoformer, Informer, iTransformer, PatchTST, TimesNet, FreTS, DLinear, Koopa, LightTS, FEDformer, TiDE and FiLM. The main contributions of this paper are reflected in the following three aspects:
First, a frequency-aware adaptive multi-scale decomposition mechanism tailored for stock forecasting is proposed. Unlike the fixed-scale decomposition or globally uniform frequency selection strategies commonly adopted in existing research, this paper utilizes the fast Fourier transform (FFT) to dynamically identify the dominant periods of each input sequence in the frequency domain, based on which the trend decomposition scale is adaptively determined, thereby achieving the decoupling of trend and fluctuation in the time domain. This mechanism makes the decomposition process entirely driven by the cyclical characteristics of the stock data itself, avoiding the constraints of preset parameters on the decomposition effect, and provides a new solution for the adaptive modeling of non-stationary and heavily noisy financial sequences.
Second, a Transformer encoder integrated with an intermediate enhancement mechanism is constructed to realize collaborative modeling of global dependencies and local dynamics. This paper embeds a lightweight depthwise separable convolution module between the standard self-attention layer and the feed-forward network, which retains the advantage of self-attention in capturing long-range dependencies while enhancing the encoder’s ability to perceive short-term temporal patterns and local fluctuation features. This design compensates for the insensitivity of traditional Transformer encoders to local sequence information at a relatively low computational cost, providing a more balanced architectural scheme for sequence encoding in stock forecasting scenarios.
Third, the forecasting performance and robustness of the FAMS-Transformer hybrid model are validated through a systematic empirical study. Comparative experiments on three Chinese stock indices with different volatility characteristics, covering both short- and long-horizon multi-step forecasting scenarios, demonstrate that FAMS-Transformer outperforms the fifteen baseline models on all evaluation metrics; in the most challenging 15-step forecasting, the coefficient of determination remains stable between 0.730 and 0.928.
The ablation experiments confirm the independent contributions and complementary effects of the adaptive decomposition module and the intermediate depthwise separable one-dimensional convolution module, and significance tests further verify the statistical reliability of the performance advantages. Experiments on the S&P 500 dataset further verify the generalization ability of the model, indicating that the FAMS-Transformer hybrid model is robust across different market structures. The remainder of this paper is organized as follows: Section 2 reviews related work; Section 3 elaborates on the architecture and core mechanisms of the FAMS-Transformer model; Section 4 describes the experimental design used to evaluate the model; Section 5 reports and discusses the experimental results; and Section 6 concludes the paper, discusses the limitations of the research, and outlines future directions.

2. Related Work

This paper aims to improve the multi-step forecasting accuracy of stock index prices through a hybrid architecture that integrates adaptive frequency-domain decomposition with an enhanced Transformer encoder. Centered on this objective, this section reviews the relevant literature from three perspectives: the decomposition–forecasting paradigm in stock prediction, the application of frequency-domain decomposition methods to financial time series, and the use of Transformer encoders in stock price forecasting.

2.1. Research on the Decomposition–Forecasting Paradigm in Stock Prediction

Stock index price series are characterized by inherent non-stationarity, high noise levels, and the superposition of multiple frequency components [2], posing significant challenges to accurate forecasting. As early as 1973, Black and Scholes assumed that stock prices follow geometric Brownian motion [3]. Building on this, Merton systematically extended continuous-time option pricing theory [4], laying a mathematical foundation for subsequent continuous-time financial models. Although continuous-time models have continued to evolve [5,26], they rely on fixed mathematical assumptions and struggle to adapt to complex nonlinear changes in the stock market [6]. Meanwhile, traditional statistical models such as ARIMA and GARCH rely on strong linear assumptions [27,28] and perform poorly on nonlinear financial series. With the development of machine learning, nonlinear models such as SVR, LSTM, and GRU, which learn complex patterns adaptively from data without strict distributional assumptions, have gradually become mainstream tools for stock forecasting [29,30]. However, these models have difficulty distinguishing the distinct evolution of long-term trends, cyclical fluctuations, and random noise. Consequently, as the forecast horizon lengthens, the risk of overfitting to noise increases significantly and predictive performance deteriorates markedly. This has prompted researchers to improve not only the prediction models but also the data preprocessing techniques.
The “decomposition–prediction” hybrid modeling paradigm transforms time series data into intrinsic mode functions (IMFs) with distinct spectral structures and periodic characteristics, which helps capture key time–frequency information, reduce data noise, and substantially improve the predictive ability of financial models [31]. Empirical mode decomposition (EMD), proposed by Huang et al. [32], is the most representative adaptive decomposition method under this paradigm; it decomposes non-stationary signals into a number of IMFs without requiring linearity assumptions. Nevertheless, standard EMD suffers from deficiencies such as mode mixing and sensitivity to local extrema, which later methods such as EEMD [33], CEEMDAN [16], and VMD [34] address in different ways.
Building on these decomposition techniques, a series of hybrid frameworks tailored for stock forecasting have been proposed. Gao’s StockCI model integrates CEEMDAN with Informer: it first decomposes the original stock price series into multiple IMFs using CEEMDAN to reduce non-stationarity, then applies the ProbSparse self-attention mechanism of Informer to model each component over long sequences, outperforming baseline models such as ARIMA, RNN, and LSTM on high-frequency A-share market data [12]. The CoML framework proposed by He et al. adopts a three-stage strategy of decomposition–reconstruction–forecasting. After CEEMDAN decomposition, a fine-to-coarse algorithm reconstructs the components into high-frequency fluctuation terms and low-frequency trend terms, which are then modeled separately by BiLSTM, SVR, and MLP and ensembled, demonstrating strong performance in both emerging and developed markets [1]. The ASDH model by Ge and Lin further introduces an adaptive selection mechanism that automatically matches the optimal predictor among algorithms such as GABP, KNN, and ARMA according to the frequency characteristics of each component [14]. The CVASD-MDCM-Informer framework proposed by Su et al. employs an adaptive VMD optimized by the CPO algorithm for decomposition and incorporates a multi-scale dilated convolution module to enhance the capacity to capture short-term fluctuations and long-term trends [13]. The above research consistently shows that using data preprocessing techniques to decompose time series has better performance than traditional methods that directly predict the original sequence.
However, it is worth noting that the decomposition strategy used in the above methods performs a unified decomposition operation on the entire input sequence. Such a strategy struggles to flexibly adapt to the dynamic changes in cyclical structure across different time windows. When market conditions undergo rapid shifts, a fixed decomposition pattern may fail to promptly capture newly emerging dominant cycles, thereby limiting the model’s adaptability in complex market environments.

2.2. Application of Frequency-Domain Decomposition Methods to Financial Time Series

Unlike EMD-type methods, which perform decomposition in the time domain, the Fourier transform and the wavelet transform provide an alternative technical path for signal decomposition from a frequency-domain perspective. The Fourier transform maps a time-domain signal into the frequency domain, separating the different frequency components hidden in the time-domain observations [35], and the fast Fourier transform (FFT) makes this transformation efficient, with O(N log N) complexity [19]. The exploration of such frequency-domain methods in financial forecasting, however, is still in its infancy. Chen and Chen combined the FFT with a fuzzy time series model to forecast the Taiwan Weighted Index and the Dow Jones Industrial Average [19]. Zhang et al. proposed a WT-ARIMA-LSTM hybrid model that uses the wavelet transform for multi-scale decomposition of stock index futures prices, with ARIMA and LSTM handling the components of different frequencies [8]. Jin et al. combined EMD with an attention-based LSTM and incorporated investor sentiment analysis [36]. These works achieved notable improvements and preliminarily verified the feasibility of frequency-domain analysis in stock forecasting. However, most of them adopt preset, fixed decomposition scales and cannot adjust dynamically to the local statistical characteristics of the input data.
In general time series forecasting, researchers have proposed more flexible frequency-domain modeling methods such as TimesNet [20] and FEDformer [21]. More recently, Yemets et al. proposed an FFT-based feature extraction scheme that augments the Transformer input with the phase and magnitude of the complex Fourier coefficients as additional features, validating the effectiveness of fusing time-domain and frequency-domain information on multiple datasets [37]. Zhang et al. [38] systematically reviewed the applications of the Fourier, Laplace, and wavelet transforms in time series analysis, pointing out that frequency-domain transforms have unique advantages in capturing global frequency components.
However, the aforementioned general frequency-domain models were not specifically designed to adapt to the characteristics of financial data [23]. The dominant cycles in financial time series may drift with changes in the market environment, whereas models such as TimesNet and FEDformer adopt a globally uniform strategy in the selection of frequency components, i.e., applying the same frequency selection rules to all input sequences, making it difficult to adequately capture such dynamic changes. Therefore, how to achieve adaptive decomposition in the frequency domain that matches the cyclical characteristics of the stock price series itself remains a problem that requires further in-depth exploration.

2.3. Transformer Architecture and Its Application in Stock Prediction

Transformer was first proposed by Vaswani et al. [10], with its core innovation being the replacement of traditional recurrent or convolutional structures with a self-attention mechanism, thereby enabling efficient parallelized modeling of dependencies between any positions in a sequence. This advantage has allowed it to rapidly expand from the field of natural language processing to time series forecasting tasks [39]. Wang et al. applied Transformer to stock index forecasting and achieved superior predictive performance over traditional methods such as LSTM on four major global indices (the CSI 300, S&P 500, Hang Seng Index, and Nikkei 225), validating the potential of the self-attention mechanism in financial time series modeling [11].
However, the self-attention mechanism of the standard Transformer suffers from a structural deficiency when encoding stock series. The attention weights computed via dot-product operations reflect the global similarity between any two points in the sequence, and they are insensitive to temporal features such as local fluctuation patterns and short-term trend reversals embedded in adjacent time steps [22]. In stock forecasting scenarios, short-term volatility clustering and abrupt trend reversals are common market phenomena, and the low sensitivity of standard self-attention to such local dynamics may lead to delayed model responses.
To address this issue, researchers have proposed improvement schemes from different perspectives, enhancing the Transformer across various dimensions such as computational efficiency, sequence decomposition, or feature partitioning. Examples include LogTrans [25], Reformer [40], Informer [41], Autoformer [42], and PatchTST [43].
However, these works were rarely designed specifically for stock forecasting, where global dependency modeling and local dynamic capture must be optimized jointly.
To overcome these shortcomings, the FAMS-Transformer hybrid model proposed in this paper dynamically identifies dominant cycles using the FFT at the decomposition level, remedying the inflexibility of fixed-scale decomposition. At the encoding level, it embeds a lightweight depthwise separable convolution between the self-attention layer and the feed-forward network of the Transformer encoder, preserving the advantage of global dependency modeling while enhancing the perception of local temporal dynamics. The following sections elaborate on the design principles and implementation details of each module.

3. Method

This paper proposes FAMS-Transformer, a hybrid model for multi-step forecasting that accounts for the influence of external stock market factors. It is trained end to end and consists of three modules: a frequency-adaptive decomposition module, a predictor module (the enhanced Transformer encoder), and a feature fusion module. This section elaborates on the working principle of each module.

3.1. Frequency-Adaptive Decomposition Module

Trend, periodic, and noise components that are superimposed in the time domain occupy different frequency ranges in the frequency domain, so we use the Fourier transform to move the data from the time domain to the frequency domain during preprocessing. The standard (continuous) Fourier transform, however, is defined for continuous signals through an integral. Since stock index data are discrete, we instead use the discrete Fourier transform computed with the fast Fourier transform (FFT) [17], which effectively captures the periodic patterns inherent in the data. Accordingly, the discrete input $X_t$ is transformed into the frequency domain:
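$$\mathcal{X}_t = \mathrm{FFT}(X_t) = A(X_t)\,e^{i\Phi(X_t)} = \left[\sum_{n=0}^{N-1} X_t^{t,n}\, e^{-\frac{2\pi i}{N}kn}\right]_{k=0}^{N-1} = \mathrm{Re}\left[\sum_{n=0}^{N-1} X_t^{t,n}\, e^{-\frac{2\pi i}{N}kn}\right] + i\cdot\mathrm{Im}\left[\sum_{n=0}^{N-1} X_t^{t,n}\, e^{-\frac{2\pi i}{N}kn}\right], \quad t \in \{1, 2, \dots, T\} \tag{1}$$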
where $\mathcal{X}_t \in \mathbb{C}^{\frac{T}{2} \times N}$ denotes the output of the FFT, i.e., the time series in the frequency domain, and $A(X_t)$ and $\Phi(X_t)$ are the amplitude and phase, respectively. To simplify notation, we write $\mathcal{X}_t = \mathrm{Re}\,\mathcal{X}_t + i\cdot\mathrm{Im}\,\mathcal{X}_t$, where $\mathrm{Re}\,\mathcal{X}_t$ and $\mathrm{Im}\,\mathcal{X}_t$ are the real and imaginary parts in Equation (1). In the following, we refer to Equation (1) simply as the FFT.
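As a quick numerical check of Equation (1), the sketch below evaluates the DFT sum directly and compares it with NumPy's FFT, then verifies the amplitude–phase form $A e^{i\Phi}$ and the real/imaginary split. It is an illustration written for this section (the helper name `dft_direct` is hypothetical), not the authors' code. It also shows that for real-valued input `np.fft.rfft` keeps only the $N/2 + 1$ non-redundant frequency bins.

```python
import numpy as np


def dft_direct(x: np.ndarray) -> np.ndarray:
    """Evaluate X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N) for k = 0..N-1 in O(N^2)."""
    n_len = x.shape[-1]
    n = np.arange(n_len)
    k = n.reshape(-1, 1)
    basis = np.exp(-2j * np.pi * k * n / n_len)  # basis[k, n]
    return x @ basis.T


rng = np.random.default_rng(0)
x = rng.standard_normal(16)

X = np.fft.fft(x)                      # same transform, computed in O(N log N)
assert np.allclose(X, dft_direct(x))

amplitude, phase = np.abs(X), np.angle(X)
assert np.allclose(amplitude * np.exp(1j * phase), X)  # X = A * e^{i * Phi}
assert np.allclose(X.real + 1j * X.imag, X)            # X = Re X + i * Im X

X_half = np.fft.rfft(x)                # real input: only N // 2 + 1 bins are kept
assert X_half.shape == (16 // 2 + 1,)
assert np.allclose(X_half, X[: 16 // 2 + 1])
```

The direct sum is only practical for short series; the FFT gives identical coefficients at much lower cost.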
In stock index forecasting, different indices and technical indicators often have distinct numerical ranges and fluctuation amplitudes. Standardization mitigates the impact of these scale inconsistencies and thus improves the model’s generalization capability. Therefore, before applying the FFT along the temporal dimension, the model first standardizes the original time series at the input layer. Given an input sequence $X \in \mathbb{R}^{B \times L \times D}$, its statistics are $\mu = \mathrm{mean}(X)$ and $\sigma = \mathrm{std}(X)$, where $\mu$ and $\sigma$ are the mean and standard deviation along the temporal dimension, respectively, and $\epsilon$ is a numerical stability term. The standardized sequence, which we continue to denote by $X$, is
$$X \leftarrow \frac{X - \mu}{\sigma} \in \mathbb{R}^{B \times L \times D}$$
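A minimal NumPy sketch of the input standardization, assuming the statistics are computed per sample and per variable along the time axis (axis 1 of a $(B, L, D)$ array). The text does not say exactly where $\epsilon$ enters; this sketch assumes it is added to the variance before the square root, and `standardize` is a hypothetical helper name.

```python
import numpy as np


def standardize(x: np.ndarray, eps: float = 1e-5):
    """Standardize x of shape (B, L, D) along the temporal axis.

    Returns the standardized array plus mu and sigma (each of shape (B, 1, D)).
    Assumption: eps is added to the variance inside the square root.
    """
    mu = x.mean(axis=1, keepdims=True)
    sigma = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return (x - mu) / sigma, mu, sigma


rng = np.random.default_rng(0)
prices = rng.normal(loc=3000.0, scale=50.0, size=(2, 96, 4))  # B=2, L=96, D=4
x_std, mu, sigma = standardize(prices)
print(x_std.shape)  # (2, 96, 4)
```

Keeping `mu` and `sigma` per sample means each input window is rescaled using only its own history.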
After input standardization, the time series is modeled periodically in the frequency domain, and the trend–fluctuation decomposition is carried out in the time domain. Given the normalized sequence $X \in \mathbb{R}^{B \times L \times D}$, the FFT is first applied along the time dimension:
$$X_f = \mathrm{FFT}(X)$$
The TimesNet model proposed by Wu et al. builds on FFT-based period discovery and amplitude-weighted fusion and performs strongly on data with pronounced periodicity [20]. TimesNet assumes that frequency amplitudes reflect the relative strength of the corresponding periodic components in the current input sequence, so they can serve as a reference for fusing different periodic representations. In this paper, frequency intensity is likewise used as a data-driven dynamic prior that adjusts the relative contributions of the features associated with different frequency structures in the fusion stage, as detailed in Section 3.3.
Next, the amplitude of each frequency component is computed from the FFT output:
$$A = \mathrm{mean}\left(\left|X_f\right|\right)$$
Here, $A \in \mathbb{R}^{L/2}$ represents the average intensity of the different frequency components. The larger the amplitude of a frequency component, the more significant the corresponding oscillation mode in the sequence. Therefore, the $k$ frequency indices with the largest amplitudes are selected:
$$\{f_1, f_2, \dots, f_k\} = \mathrm{TopK}(A)$$
and mapped to their corresponding periods:
$$p_i = \frac{L}{f_i}, \quad i \in \{1, \dots, k\}$$
Then, from the estimated period set $\{p_1, \dots, p_k\}$, the largest period is selected as the trend scale:
$$p_{\text{trend}} = \max(p_1, \dots, p_k)$$
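The period-discovery steps (amplitude averaging, Top-k selection, period mapping, and the choice of $p_{\text{trend}}$) can be sketched in NumPy as follows. This follows the TimesNet-style recipe the text cites, but several details are assumptions of the sketch: `np.fft.rfft` returns $L/2 + 1$ bins including the zero-frequency term, which is excluded here; the mean is taken over the batch and variable axes so that $A$ is indexed by frequency only; and periods use integer division. The function name `dominant_periods` is hypothetical.

```python
import numpy as np


def dominant_periods(x: np.ndarray, k: int = 2):
    """Return (top-k frequency indices, periods, trend scale) for x of shape (B, L, D)."""
    length = x.shape[1]
    xf = np.fft.rfft(x, axis=1)                  # X_f: (B, L // 2 + 1, D), complex
    amp = np.abs(xf).mean(axis=(0, 2))           # A: mean amplitude per frequency index
    amp[0] = 0.0                                 # drop the zero-frequency (mean) term
    top_f = np.argsort(-amp, kind="stable")[:k]  # f_1..f_k, largest amplitude first
    periods = length // top_f                    # p_i = L / f_i, rounded down
    p_trend = int(periods.max())                 # largest period -> trend scale
    return top_f, periods, p_trend


# Toy series with a 24-step cycle (stronger) and an 8-step cycle (weaker).
t = np.arange(96)
series = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 8)
x = series[None, :, None]                        # shape (B=1, L=96, D=1)

top_f, periods, p_trend = dominant_periods(x, k=2)
# expected: frequencies 4 and 12, periods 24 and 8, p_trend 24
```

Because the window length is a multiple of both cycles, each cycle falls exactly on one frequency bin, which makes the recovered periods easy to check by hand.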
The trend component is extracted with a moving average whose window adapts to $p_{\text{trend}}$:
$$X_{\text{trend}} = \mathrm{MovingAvg}(X, p_{\text{trend}})$$
Here, MovingAvg denotes one-dimensional average pooling with boundary padding. The fluctuation component is defined as:
$$X_{\text{vol}} = X - X_{\text{trend}}$$
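A minimal sketch of the trend–fluctuation split. The text specifies one-dimensional average pooling with boundary padding; this sketch assumes stride 1 and padding by repeating the first and last values (as in Autoformer-style series decomposition), so that the trend has the same length as the input, and it splits the padding asymmetrically when the window is even. `moving_avg` and `decompose` are hypothetical helper names.

```python
import numpy as np


def moving_avg(x: np.ndarray, window: int) -> np.ndarray:
    """Stride-1 average pooling along axis 1 of x (B, L, D) with edge-value padding."""
    front = (window - 1) // 2
    back = window - 1 - front
    padded = np.concatenate(
        [np.repeat(x[:, :1, :], front, axis=1), x, np.repeat(x[:, -1:, :], back, axis=1)],
        axis=1,
    )  # length L + window - 1
    # Window sums via a cumulative sum with a leading zero row.
    csum = np.cumsum(padded, axis=1)
    csum = np.concatenate([np.zeros_like(csum[:, :1, :]), csum], axis=1)
    return (csum[:, window:, :] - csum[:, :-window, :]) / window


def decompose(x: np.ndarray, p_trend: int):
    """X_trend = MovingAvg(X, p_trend); X_vol = X - X_trend."""
    x_trend = moving_avg(x, p_trend)
    return x_trend, x - x_trend


x = np.arange(1.0, 6.0)[None, :, None]    # series 1, 2, 3, 4, 5 with shape (1, 5, 1)
x_trend, x_vol = decompose(x, p_trend=3)
print(x_trend.ravel())                    # [4/3, 2, 3, 4, 14/3]
```

By construction `x_trend + x_vol` reproduces the input exactly, so the split loses no information.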
If the frequency decomposition is not handled properly, information from outside the input window could enter the identified periodic structure and cause data leakage. Therefore, for each test sample, the dominant frequencies and corresponding periods are computed only from the sample’s historical input window, and data from the forecast interval are not used in FFT period identification, decomposition-scale construction, or fusion-weight calculation.

3.2. Enhanced Transformer Predictor Module

The Transformer adopts an encoder–decoder architecture: the encoder maps the input sequence to a sequence of contextual representations, which the decoder then converts into the output. Because this architecture handles long sequences effectively, the Transformer family has developed rapidly in time series forecasting in recent years [24].
The multi-head self-attention mechanism in the Transformer encoder collects and integrates information from different representation subspaces by running multiple attention heads in parallel, thereby achieving richer feature extraction and enhancing the model’s ability to capture the evolution pattern of the input sequence. The self-attention mechanism proposed by Vaswani et al. [10] is defined as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
where $Q = XW^Q$, $K = XW^K$, and $V = XW^V$ are the query, key, and value matrices, obtained from three distinct linear projections of the same input, and $d_k$ is the dimension of the keys.
In multi-head attention, the attention function is applied in parallel to $h$ projected versions of the query, key, and value matrices. The outputs of all heads are then concatenated and passed through a linear layer to produce the final result:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^O$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$$
where $i = 1, \dots, h$; $W_i^Q$, $W_i^K$, and $W_i^V$ are the projection matrices of the $i$-th head, and $W^O$ is the output projection matrix.
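The attention and multi-head equations can be sketched in NumPy for a single sequence. The sketch assumes the common implementation trick of projecting with full $d_{\text{model}} \times d_{\text{model}}$ matrices and splitting the result into $h$ slices, which is equivalent to using separate per-head matrices $W_i^Q, W_i^K, W_i^V$ (the column blocks of the full matrices). Masking, biases, and dropout are omitted, and the function names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ v


def multi_head(x, w_q, w_k, w_v, w_o, h):
    """Multi-head self-attention over x of shape (N, d_model); all weights (d_model, d_model)."""
    n, d_model = x.shape
    d_head = d_model // h

    def split(t):  # (N, d_model) -> (h, N, d_head); head i gets columns i*d_head:(i+1)*d_head
        return t.reshape(n, h, d_head).transpose(1, 0, 2)

    heads = attention(split(x @ w_q), split(x @ w_k), split(x @ w_v))  # (h, N, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)             # Concat(head_1..head_h)
    return concat @ w_o


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))  # N = 5 tokens, d_model = 8
w_q, w_k, w_v, w_o = (rng.standard_normal((8, 8)) for _ in range(4))
out = multi_head(x, w_q, w_k, w_v, w_o, h=2)
print(out.shape)  # (5, 8)
```

Computing all heads in one batched `attention` call mirrors how the heads run in parallel in the equations.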
Intuitively, each attention head observes the sequence from a different perspective: some heads may focus on short-term dependencies, others on periodic patterns, and others on long-term trends or local window structures, so that together they capture long-range dependencies effectively. Figure 1 shows the structure of the multi-head attention mechanism.
Unlike the original encoder proposed by Vaswani et al., and to limit computational overhead while supporting both global dependency modeling and local dynamic capture, this paper introduces a lightweight depthwise separable convolution module between the self-attention layer and the feed-forward network. Self-attention first aggregates the global context; the depthwise separable convolution then imposes local temporal constraints on its output features; finally, the feed-forward network applies a nonlinear transformation to the representations that combine global and local information. The three components are thus functionally complementary. Figure 2 shows the improved Transformer encoder architecture.
Depthwise separable convolution factorizes a standard convolution into two steps. The first step is a depthwise (channel-wise) convolution, which processes the temporal dynamics of each channel independently and reduces redundant computation; the second step is a pointwise convolution, which integrates information across channels through a 1 × 1 convolution while avoiding over-modeling of inter-channel correlations [44]. By decoupling temporal modeling from cross-feature interaction, depthwise separable convolution reduces the number of parameters without compromising model performance [45,46,47]. Figure 3 shows a simplified schematic of depthwise separable convolution.
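The two-step factorization can be sketched for a single sequence laid out as channels × time steps. This is an illustrative NumPy version assuming stride 1, "same" zero padding, and no biases; as is usual in deep learning, "convolution" here means cross-correlation. The helper `param_counts` compares weight counts for a standard convolution ($C_{\text{out}} \cdot C_{\text{in}} \cdot K$) and its depthwise separable counterpart ($C_{\text{in}} \cdot K + C_{\text{out}} \cdot C_{\text{in}}$); all names are hypothetical.

```python
import numpy as np


def depthwise_separable_conv1d(x, dw_kernel, pw_weight):
    """x: (C, N) channels x time; dw_kernel: (C, K), one filter per channel;
    pw_weight: (C_out, C), the 1x1 pointwise convolution."""
    c, n = x.shape
    k = dw_kernel.shape[1]
    left = (k - 1) // 2
    xp = np.pad(x, ((0, 0), (left, k - 1 - left)))  # 'same' zero padding along time
    # Step 1 (depthwise): each channel is filtered independently by its own kernel.
    dw = np.stack([np.correlate(xp[ch], dw_kernel[ch], mode="valid") for ch in range(c)])
    # Step 2 (pointwise): a 1x1 convolution mixes information across channels.
    return pw_weight @ dw


def param_counts(c_in, c_out, k):
    """Weight counts (biases ignored): standard conv vs. depthwise separable conv."""
    return c_out * c_in * k, c_in * k + c_out * c_in


rng = np.random.default_rng(1)
x = rng.standard_normal((4, 10))     # 4 channels, 10 time steps
dw_k = rng.standard_normal((4, 3))   # depthwise kernels, K = 3
pw_w = rng.standard_normal((6, 4))   # pointwise weights, C_out = 6
y = depthwise_separable_conv1d(x, dw_k, pw_w)
print(y.shape)                       # (6, 10)
print(param_counts(64, 64, 3))       # (12288, 4288)
```

In the encoder described here, the channels would correspond to the $d_{\text{model}}$ features and the time axis to the sequence of patches; with 64 channels and a kernel of size 3, the factorized form uses about a third of the weights of a standard convolution.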
After frequency decomposition, the model handles the two resulting components, the trend component $X_{\text{trend}}$ and the fluctuation component $X_{\text{vol}}$, separately. For a given input sequence $X \in \mathbb{R}^{B \times L \times D}$, the time series is partitioned into patches of length $P$, $X \rightarrow \{P_1, P_2, \ldots, P_N\}$, and a linear mapping yields the embedded representation $Z \in \mathbb{R}^{(B \cdot D) \times N \times d_{\text{model}}}$, where $N$ denotes the number of patches.
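A minimal NumPy sketch of this patching step follows. It assumes channel-independent processing (each of the $D$ variables becomes its own sequence, which is why the batch axis becomes $B \cdot D$), a stride $S$ as listed in Algorithm 1, and no end padding, so that $N = \lfloor (L - P)/S \rfloor + 1$. The embedding matrix `w_embed` is a hypothetical stand-in for the learned linear mapping.

```python
import numpy as np

def make_patches(x, patch_len, stride):
    """x: (B, L, D) -> patches of shape (B*D, N, patch_len)."""
    B, L, D = x.shape
    n = (L - patch_len) // stride + 1
    # Channel independence: move variables next to the batch axis and flatten them
    xc = x.transpose(0, 2, 1).reshape(B * D, L)
    # Start index of every patch plus offsets inside the patch: shape (N, patch_len)
    idx = stride * np.arange(n)[:, None] + np.arange(patch_len)[None, :]
    return xc[:, idx]

def embed(patches, w_embed):
    # Linear mapping from patch_len to d_model: (B*D, N, P) @ (P, d_model)
    return patches @ w_embed

x = np.arange(2 * 16 * 3, dtype=float).reshape(2, 16, 3)  # B=2, L=16, D=3
patches = make_patches(x, patch_len=4, stride=2)
z = embed(patches, np.ones((4, 8)))  # d_model = 8
print(patches.shape, z.shape)  # (6, 7, 4) (6, 7, 8)
```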
Each encoder layer passes through three modules in turn, namely the self-attention block, the convolution block, and the FFN block:
$$\tilde{Z} = Z + \mathrm{Dropout}(\mathrm{Attention}(Z))$$
$$Z_{\text{mid}} = \mathrm{Conv1D}(\tilde{Z})$$
$$Z' = \tilde{Z} + \mathrm{Dropout}(Z_{\text{mid}})$$
$$Z_{\text{out}} = Z' + \mathrm{FFN}(Z')$$
Conv1D uses depthwise separable convolutions (groups = $d_{\text{model}}$) and performs local modeling along the patch dimension. Layers are stacked as $Z^{l} = \mathrm{EncoderLayer}(Z^{l-1})$.
After the Transformer encoder, the output has shape $Z_{\text{out}} \in \mathbb{R}^{B \times D \times d_{\text{model}} \times N}$, and the final prediction $Y \in \mathbb{R}^{B \times T \times D}$ is obtained through a linear mapping.
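The layer ordering can be sketched in NumPy as follows. For brevity, the sketch assumes a single attention head, ReLU in the FFN, no normalization layers, and no dropout (inference mode); the parameter dictionary and its keys are hypothetical. The convolution is depthwise along the patch axis (one kernel per model channel, matching groups = $d_{\text{model}}$), followed by a pointwise $1 \times 1$ mixing step.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(z, p):
    # z: (batch, N, d_model); single head for brevity
    q, k, v = z @ p["wq"], z @ p["wk"], z @ p["wv"]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def conv_block(z, p):
    # Depthwise step along the patch axis: kernel p["dw"][c] is applied only to channel c
    n, K = z.shape[1], p["dw"].shape[1]
    pad = K // 2
    zp = np.pad(z, ((0, 0), (pad, pad), (0, 0)))
    out = np.zeros_like(z)
    for j in range(K):
        out += zp[:, j:j + n, :] * p["dw"][:, j]
    # Pointwise step: 1x1 convolution mixing the d_model channels
    return out @ p["pw"]

def ffn(z, p):
    return np.maximum(z @ p["w1"], 0.0) @ p["w2"]

def encoder_layer(z, p):
    z_tilde = z + self_attention(z, p)          # global context aggregation
    z_prime = z_tilde + conv_block(z_tilde, p)  # local temporal constraint
    return z_prime + ffn(z_prime, p)            # nonlinear transformation

rng = np.random.default_rng(0)
d, d_ff, K = 16, 32, 3
shapes = {"wq": (d, d), "wk": (d, d), "wv": (d, d), "dw": (d, K),
          "pw": (d, d), "w1": (d, d_ff), "w2": (d_ff, d)}
params = {name: 0.1 * rng.normal(size=s) for name, s in shapes.items()}
z = rng.normal(size=(6, 7, d))  # (B*D, N, d_model)
for _ in range(2):  # stacking: Z^l = EncoderLayer(Z^{l-1})
    z = encoder_layer(z, params)
print(z.shape)  # (6, 7, 16)
```

With all weights set to zero, every block contributes nothing and the layer reduces to the identity, which makes the residual structure easy to check.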

3.3. Feature Fusion Module

The trend component $X_{\text{trend}}$ and the fluctuation component $X_{\text{vol}}$ are each passed through the encoder-based predictor $f$ to obtain the forecasts $Y_{\text{trend}}$ and $Y_{\text{vol}}$, which are then fused to combine the different dynamic modes adaptively:
$$Y_{\text{trend}} = f(X_{\text{trend}})$$
$$Y_{\text{vol}} = f(X_{\text{vol}})$$
In TimesNet, the frequency amplitudes are normalized by Softmax and used to weight the fusion of different periodic representations [20]. In this paper, the frequency intensity $W$ is used as the weight reference for the adaptive fusion of the trend and fluctuation components. The specific process is shown in Equations (20)–(24). Note that the frequency intensity serves only as a frequency-aware, empirical dynamic weighting strategy, not as an optimal weight estimate with strict theoretical guarantees.
Using the amplitudes of the $k$ dominant frequencies, the frequency intensity is computed as:
$$W = \mathrm{Softmax}(A_{f_1}, \ldots, A_{f_k})$$
$$s = \mathrm{mean}(W)$$
Learnable parameters $w_{\text{fusion}}$ are introduced to allocate the base weights:
$$w = (w_1, w_2) = \mathrm{Softmax}(w_{\text{fusion}})$$
The frequency intensity and the base weights are then combined into dynamic weights:
$$\alpha = w_1 \cdot s, \qquad \beta = w_2 \cdot (2 - s)$$
Finally, the fusion result is expressed as:
$$Y = \frac{\alpha \cdot Y_{\text{trend}} + \beta \cdot Y_{\text{vol}}}{\alpha + \beta}$$
The model output $\hat{Y} \in \mathbb{R}^{B \times T \times 1}$ corresponds only to the closing price of the index. Because the inputs are standardized before training, the output must be de-standardized:
$$\hat{Y}_{\text{final}} = \hat{Y} \cdot \sigma + \mu$$
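A small NumPy sketch of the fusion weights and the de-standardization step follows. It takes the fusion equations literally, with the softmax taken over the $k$ dominant-frequency amplitudes of each window and the fluctuation weight taken as $\beta = w_2 \cdot (2 - s)$; the function names are hypothetical. Note that every softmax row sums to one, so its mean over $k$ entries equals $1/k$.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse(y_trend, y_vol, amplitudes, w_fusion):
    """amplitudes: (B, k) amplitudes A_f1..A_fk; w_fusion: two learnable logits."""
    w1, w2 = softmax(np.asarray(w_fusion, dtype=float))
    s = softmax(amplitudes, axis=-1).mean()  # frequency intensity; a softmax over k entries averages to 1/k
    alpha, beta = w1 * s, w2 * (2.0 - s)
    return (alpha * y_trend + beta * y_vol) / (alpha + beta)

def destandardize(y_hat, mu, sigma):
    # Invert the z-score standardization applied to the inputs
    return y_hat * sigma + mu

y = fuse(np.array([1.0]), np.array([3.0]), np.array([[1.0, 2.0]]), [0.0, 0.0])
print(y)  # [2.5]
print(destandardize(y, mu=3000.0, sigma=200.0))  # [3500.]
```

In this toy case $w_1 = w_2 = 0.5$ and $s = 0.5$, so $\alpha = 0.25$ and $\beta = 0.75$, and the fused value is $0.25 \cdot 1 + 0.75 \cdot 3 = 2.5$.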

3.4. The Process of the Proposed Model

This study constructs a hybrid stock index forecasting model that combines adaptive frequency-domain decomposition with an enhanced Transformer encoder. The fast Fourier transform is used to decompose the stock price series adaptively in the frequency domain.
The enhanced Transformer encoder then forecasts each component, and the frequency intensity serves as a weight reference for adaptively fusing the trend and fluctuation components, enabling refined modeling of financial time series.
Algorithm 1 gives the pseudocode of the overall procedure of the proposed model, which combines the FFT-based feature extraction scheme with the adapted Transformer network, and Figure 4 shows the flowchart of the FAMS-Transformer hybrid model.
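Algorithm 1: Overall Forecasting Procedure
Input: historical multivariate time series $X \in \mathbb{R}^{B \times L \times D}$; prediction horizon $H$; number of dominant periods $k$; patch length $P$; stride $S$
Output: forecasted sequence $\hat{Y} \in \mathbb{R}^{B \times H \times C}$
// Input normalization
$\mu \leftarrow \mathrm{Mean}(X)$
$\sigma \leftarrow \sqrt{\mathrm{Var}(X) + \epsilon}$
$X_{\text{norm}} \leftarrow (X - \mu)/\sigma$
// Frequency-adaptive decomposition
$\mathcal{P}, A \leftarrow \mathrm{AdaptivePeriodExtraction}(X_{\text{norm}}, k)$
$p_{\text{trend}} \leftarrow \max(\mathcal{P})$
$X_{\text{trend}} \leftarrow \mathrm{MovingAvgDynamic}(X_{\text{norm}}, p_{\text{trend}})$
$X_{\text{vol}} \leftarrow X_{\text{norm}} - X_{\text{trend}}$
// Dual-branch forecasting
$\hat{Y}_{\text{trend}} \leftarrow \mathrm{PatchTSTBranch}(X_{\text{trend}}, P, S)$
$\hat{Y}_{\text{vol}} \leftarrow \mathrm{PatchTSTBranch}(X_{\text{vol}}, P, S)$
// Frequency-aware fusion
$w \leftarrow \mathrm{Softmax}(w_{\text{fusion}})$
$s \leftarrow \mathrm{Mean}(\mathrm{Softmax}(A))$
$w_{\text{trend}} \leftarrow w_1 \cdot s$
$w_{\text{vol}} \leftarrow w_2 \cdot (2 - s)$
$\hat{Y}_{\text{norm}} \leftarrow (w_{\text{trend}} \hat{Y}_{\text{trend}} + w_{\text{vol}} \hat{Y}_{\text{vol}}) / (w_{\text{trend}} + w_{\text{vol}} + \epsilon)$
// De-normalization
$\hat{Y} \leftarrow \hat{Y}_{\text{norm}} \cdot \sigma + \mu$
return $\hat{Y}$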
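To trace the data flow of Algorithm 1, the following NumPy sketch implements it end to end under stated assumptions. Normalization statistics are computed per sample and per variable along the time axis. AdaptivePeriodExtraction follows a TimesNet-style recipe: FFT amplitudes are averaged over samples and variables, the zero frequency is excluded, and frequency index $f$ is converted to period $\lfloor L/f \rfloor$. MovingAvgDynamic is taken to be a centered moving average with edge replication. The PatchTST branch is replaced by a hypothetical `naive_branch` that repeats the last value, so the sketch shows the pipeline, not the authors' trained model.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_period_extraction(x, k):
    """x: (B, L, D). Returns the top-k periods and their amplitudes A of shape (B, k)."""
    L = x.shape[1]
    amp = np.abs(np.fft.rfft(x, axis=1))          # (B, L//2 + 1, D)
    energy = amp.mean(axis=(0, 2))                # average over samples and variables
    top = np.argsort(energy[1:])[::-1][:k] + 1    # k strongest non-zero frequencies
    return L // top, amp.mean(axis=2)[:, top]

def moving_avg(x, window):
    """Centered moving average along axis 1 with edge replication; keeps length L."""
    front = (window - 1) // 2
    back = window - 1 - front
    xp = np.concatenate([np.repeat(x[:, :1], front, axis=1), x,
                         np.repeat(x[:, -1:], back, axis=1)], axis=1)
    c = np.concatenate([np.zeros_like(xp[:, :1]), np.cumsum(xp, axis=1)], axis=1)
    return (c[:, window:] - c[:, :-window]) / window

def naive_branch(z, horizon):
    # Hypothetical stand-in for PatchTSTBranch: repeat the last observed value
    return np.repeat(z[:, -1:, :], horizon, axis=1)

def fams_forecast(x, horizon, k, w_fusion, branch=naive_branch, eps=1e-5):
    # Input normalization
    mu = x.mean(axis=1, keepdims=True)
    sigma = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    x_norm = (x - mu) / sigma
    # Frequency-adaptive decomposition
    periods, A = adaptive_period_extraction(x_norm, k)
    x_trend = moving_avg(x_norm, int(periods.max()))
    x_vol = x_norm - x_trend
    # Dual-branch forecasting
    y_trend, y_vol = branch(x_trend, horizon), branch(x_vol, horizon)
    # Frequency-aware fusion
    w1, w2 = softmax(np.asarray(w_fusion, dtype=float))
    s = softmax(A, axis=1).mean()
    w_trend, w_vol = w1 * s, w2 * (2.0 - s)
    y_norm = (w_trend * y_trend + w_vol * y_vol) / (w_trend + w_vol + eps)
    # De-normalization
    return y_norm * sigma + mu

t = np.arange(96)
x = np.stack([np.sin(2 * np.pi * t / 24), np.cos(2 * np.pi * t / 12)], axis=-1)[None]  # (1, 96, 2)
periods, _ = adaptive_period_extraction(x, k=2)
print(sorted(periods.tolist()))  # [12, 24]
print(fams_forecast(x, horizon=5, k=2, w_fusion=[0.0, 0.0]).shape)  # (1, 5, 2)
```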

4. Experimental Design

4.1. Data Experiment Settings

This study examines three representative indices of the Chinese stock market: the Shanghai Composite Index (SSE), the Shenzhen Component Index (SZSE), and the SME 100 Index. The closing price is the forecast target, while other trading data (such as the opening price, price change, and trading volume) serve as feature inputs for model training and for constructing technical indicators. The data span from 10 April 2014 to 19 March 2025. Trading-suspension dates are excluded to avoid interference with the results. Each of the three datasets is split into training, validation, and test sets in a 6:1:3 ratio. Figure 5 illustrates the selected series; subfigures (a), (b), and (c) show the Shanghai Composite Index (SSE), the Shenzhen Component Index (SZSE), and the Small and Medium 100 Index (SME 100), respectively.
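A 6:1:3 split of this kind is usually made chronologically, so that the test period lies entirely after the training period. The text does not state the ordering explicitly, so the sketch below assumes a chronological, unshuffled split; the function name is hypothetical.

```python
def chronological_split(n_samples, ratios=(6, 1, 3)):
    """Return index ranges (train, val, test) for a time-ordered dataset."""
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_samples)  # any remainder goes to the test set
    return train, val, test

train, val, test = chronological_split(1000)
print(len(train), len(val), len(test))  # 600 100 300
```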
Table 1 presents the descriptive statistics of the three stock price indices, which contrast sharply in their volatility characteristics. The SSE exhibits the lowest volatility (Std = 404.88) and a leptokurtic, heavy-tailed distribution (excess kurtosis = 3.679). The SZSE displays the widest absolute fluctuation range (Std = 2038.12), with an excess kurtosis close to that of a normal distribution (0.009). The SME 100 shows the highest relative volatility, with a platykurtic, thin-tailed distribution (excess kurtosis = −0.203) and a more dispersed price distribution. All three indices have positive skewness, indicating right-skewed price distributions.
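Because a normal distribution has an excess kurtosis of zero, the kurtosis values above are read against 0 rather than 3. A minimal sketch of the moment-based estimators follows. The paper does not state which estimator it used (library defaults such as pandas apply a small-sample bias correction), so the numbers this sketch produces may differ slightly from Table 1.

```python
import numpy as np

def skew_and_excess_kurtosis(x):
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = (d ** 2).mean(), (d ** 3).mean(), (d ** 4).mean()
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return float(skew), float(excess_kurt)

print(skew_and_excess_kurtosis([1, 2, 3, 4, 5]))  # (0.0, -1.3)
```

A flat, evenly spread sample such as 1 to 5 is platykurtic (negative excess kurtosis), like the SME 100 series; a long right tail yields positive skewness.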
Recent research indicates that the stock market is a highly complex stochastic system that is also influenced by a variety of external factors [48,49,50]. Accordingly, we incorporate a broad set of external indicators into the feature inputs, including carbon trading market prices, crude oil market prices, gold market prices, global stock market prices, foreign exchange market prices, macroeconomic activity, and sentiment indicators. The names and symbols of all features are listed in Table 2.

4.2. Evaluation Metrics

This paper employs the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE, %), and the coefficient of determination (R2) as the evaluation metrics for the model’s forecasting performance, whose corresponding formulas are defined as follows:
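$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Y_i - \hat{Y}_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2}$$
$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{Y_i - \hat{Y}_i}{Y_i}\right| \times 100\%$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2}$$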
where $Y_i$ denotes the true value, $\hat{Y}_i$ the predicted value, $\bar{Y}$ the mean of the true values, and $N$ the number of samples. MAE, RMSE, and MAPE measure the deviation between the true and predicted values; smaller values indicate predictions closer to the true values. MAE and RMSE measure the absolute magnitude of the prediction error and directly reflect how accurately the model fits the trend and fluctuation components; MAPE expresses the error as a percentage, which facilitates comparison across indices with different price levels. R² measures the goodness of fit relative to a constant-mean baseline; values closer to 1 indicate a better fit.
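The four metrics can be computed with a few lines of NumPy. The sketch assumes one-dimensional arrays of true and predicted closing prices with no zero true values (required by MAPE); the function name is hypothetical.

```python
import numpy as np

def evaluate(y_true, y_pred):
    y, yh = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    err = y - yh
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y)) * 100.0  # assumes no zero true values
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    return {"MAE": float(mae), "RMSE": float(rmse), "MAPE": float(mape), "R2": float(r2)}

metrics = evaluate([1, 2, 3, 4], [1, 2, 3, 5])
print({name: round(value, 4) for name, value in metrics.items()})
# {'MAE': 0.25, 'RMSE': 0.5, 'MAPE': 6.25, 'R2': 0.8}
```

Here the single error of 1 on the last point gives a residual sum of squares of 1, while the true values have a total sum of squares of 5 around their mean of 2.5, hence R² = 1 − 1/5 = 0.8.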

4.3. Baseline Models

To verify the effectiveness of the proposed framework, this paper selects fifteen representative time series forecasting models as baselines and divides them into two categories.
The first category comprises general sequence modeling methods. The long short-term memory network (LSTM) selectively retains and forgets information through gating mechanisms, effectively alleviating the vanishing gradient problem that traditional recurrent neural networks face on long sequences [51]. Support vector regression (SVR) uses the kernel trick to map data into a high-dimensional feature space in which linear regression is performed, thereby fitting nonlinear relationships in the original space [9]. The standard Transformer achieves parallel processing and long-range dependency capture through self-attention and an encoder–decoder structure [10]. These models are not designed specifically for time series forecasting and lack an effective mechanism for adapting to the non-stationarity of financial data.
The second category comprises long-term time series forecasting models proposed in recent years, which strengthen the capture of complex temporal patterns through frequency-domain transformation, multi-period decomposition, or lightweight design. Built on the Transformer, Autoformer uses a built-in series decomposition module to progressively separate trend-cyclical and seasonal components, and replaces self-attention with an auto-correlation mechanism that discovers period-based subsequence dependencies through the fast Fourier transform [42]. Informer improves the efficiency of long-sequence forecasting through three innovations: ProbSparse self-attention, self-attention distilling, and a generative-style decoder [37]. FEDformer combines seasonal-trend decomposition with frequency-enhanced attention modules, maintains linear complexity by randomly selecting a subset of frequency components, and improves the distributional consistency between predicted and actual sequences [21]. iTransformer inverts the modeling dimensions of the conventional Transformer: it treats the entire series of each variable as a token, learns inter-variable correlations with self-attention, and captures univariate temporal dynamics with a feedforward network [52]. PatchTST divides long sequences into fixed-length patches used as tokens and models each variable independently, efficiently capturing local temporal patterns with a shorter attention window [43]. TimesNet uses the FFT to transform one-dimensional time series into two-dimensional tensors that capture multi-period patterns, and applies an Inception module to extract intra-period and inter-period variations [20]. FreTS uses a frequency-domain MLP architecture that converts time-domain signals into the frequency domain for global learning, exploiting the energy-compaction property of the Fourier transform to focus on key frequency components [22].
DLinear decomposes the original time series into trend and seasonal components with a moving average, forecasts each with a single linear layer, and sums the two outputs [53]. Based on Koopman theory, Koopa uses a Fourier filter to decompose non-stationary time series into time-invariant and time-variant components and learns their dynamics hierarchically through modular Koopman predictors [54]. LightTS applies two downsampling strategies, interval sampling and continuous sampling, to shorten the input sequence before processing it with MLPs, which substantially improves efficiency without sacrificing accuracy [55]. FiLM uses Legendre polynomial projections to compress historical information into memory units and introduces the Fourier transform to strengthen the capture of frequency-domain patterns, thereby reducing noise interference [56]. TiDE (Time-series Dense Encoder), proposed by Google in 2023, is a long-horizon forecasting model built on a multi-layer perceptron (MLP) architecture. It combines the historical series with projected future covariates through residual MLP blocks, achieving predictive performance comparable to or better than Transformer-based models with a streamlined, fully connected design [57].

4.4. Experiment Design and Objectives

This study systematically evaluates the predictive performance and design rationale of FAMS-Transformer across five dimensions:
(1)
Multi-step prediction accuracy comparison against baseline models. FAMS-Transformer is compared with 15 baselines—LSTM, SVR, Transformer, Autoformer, Informer, FEDformer, TimesNet, FreTS, DLinear, Koopa, LightTS, FiLM, PatchTST, iTransformer, and TiDE—across four prediction horizons (1 step, 5 steps, 10 steps, and 15 steps), to assess whether it outperforms all baselines in every setting.
(2)
Cross-market validation. The core comparison is replicated on the S&P 500 index under identical prediction settings to examine whether the performance advantage of FAMS-Transformer generalizes across different market environments.
(3)
Significance testing. Paired t-tests and Wilcoxon signed-rank tests are conducted on per-sample prediction errors to determine whether the performance advantage of FAMS-Transformer over each baseline is statistically significant.
(4)
Volatility-regime analysis. Test samples within each dataset are partitioned into low-volatility (below the first quartile, Q1), medium-volatility (between Q1 and the median, Q2), and high-volatility (above Q2) regimes based on historical volatility. Model performance is evaluated independently within each regime to verify whether FAMS-Transformer maintains its advantage across varying volatility conditions.
(5)
Ablation experiments. Three progressively structured groups of ablation variants are designed to quantify the independent contribution of each core component and to validate specific design choices.
Module necessity ablation. The intermediate convolutional module is removed (w/o Conv), the multi-scale decomposition module is removed (w/o Decomp), and both are removed simultaneously (w/o Both). Together with the full model, these four configurations quantify the independent and joint contributions of the two modules.
Local-feature mechanism design ablation. The depthwise separable convolution adopted in this paper is replaced with standard convolution (Standard Conv1D), dilated depthwise separable convolution (Dilated Depthwise Conv1D), and the original Transformer with the intermediate convolutional module entirely removed (Original Transformer), to verify the rationale for adopting depthwise separable convolution.
Period-selection strategy ablation. The adaptive FFT-based period identification adopted in this paper (Adaptive-period) is replaced with a uniform fixed period applied to all windows (Fixed-period) and a randomly assigned period per window (Random-period), to examine whether the adaptive period strategy yields a genuine improvement in predictive accuracy.

5. Experimental Results and Discussion

5.1. Comparison with Baseline Models

Table 3 summarizes the four evaluation metrics (MAE, RMSE, MAPE, and R²) of FAMS-Transformer and the 15 baseline models on the three datasets, namely the Shanghai Composite Index (SSE), the Shenzhen Component Index (SZSE), and the Small and Medium 100 Index (SME 100), at prediction horizons of one, five, 10, and 15 steps. Bold values mark the best result for each metric, and underlined values mark the second best.
Figure 6 plots the MAE, RMSE, and MAPE of all models for 1-step prediction as line charts; subfigures (a), (b), and (c) correspond to the SSE, SZSE, and SME 100 datasets, respectively. Single-step prediction is the most basic time series forecasting task: it most directly reflects the fitting ability of a model and is not affected by error accumulation. Moreover, Table 3 shows that in the multi-step experiments (five, 10, and 15 steps) the error distributions of the models closely resemble those of single-step prediction. To avoid redundancy, only the single-step line charts are therefore shown.
Figure 7, Figure 8 and Figure 9 show the fitted curves of predicted versus actual values for each model on the SSE, SZSE, and SME 100 datasets, respectively; subfigures (a), (b), (c), and (d) in each figure correspond to prediction horizons of one, five, 10, and 15 steps. To show the differences in fit across architectures clearly, only five representative baselines (DLinear, FEDformer, iTransformer, PatchTST, and TimesNet) are compared with FAMS-Transformer.
Table 3 and Figures 6–9 show that FAMS-Transformer achieves the best or near-best prediction accuracy at all horizons and on all datasets. For example, at the 15-step horizon on the SME 100 dataset, iTransformer's R² is 0.930 and FAMS-Transformer's is 0.928; the two values are very close and do not constitute a substantial gap. As the horizon is extended from 1 to 15 steps, the accuracy of all models declines, but the extent of the decline differs markedly. FAMS-Transformer degrades the most slowly: its R² falls from 0.959 to 0.730 (a decrease of 0.229) on SSE and only from 0.990 to 0.928 (a decrease of 0.062) on SME 100. In sharp contrast, the first category of baselines suffers severe performance degradation or even collapse: SVR's R² is negative at all four horizons on SSE, and LSTM reaches an R² of only 0.084 for one-step prediction on SME 100. This contrast suggests that the adaptive decomposition and enhanced encoding of FAMS-Transformer provide more robust long-horizon modeling, whereas traditional models without time-series-specific adaptation struggle to extract effective features in long-horizon financial forecasting.
Among the second category of long-term forecasting models, iTransformer and PatchTST stand out. iTransformer is highly competitive in long-horizon prediction and achieves the best R² at 15 steps on SZSE and SME 100. PatchTST performs strongly in short-horizon prediction (SSE 1-step MAE = 27.637, second only to FAMS-Transformer), but its relative ranking declines as the horizon increases. FreTS performs notably well on SZSE, ranking first among all models in 10-step R² (0.924), which reflects the generalization potential of the frequency-domain MLP architecture under specific market structures. The performance of TimesNet varies considerably across the three datasets (1-step R² = 0.771 on SSE versus 0.938 on SME 100), which may indicate that its fixed period identification strategy adapts insufficiently to different market volatility characteristics.
Overall, FAMS-Transformer achieves the best or near-best performance in all prediction scenarios. In the most challenging 15-step prediction, its R² remains stable between 0.730 and 0.928. We attribute this advantage to the dynamic identification of dominant periods and the explicit trend–fluctuation decoupling performed by the frequency-adaptive decomposition mechanism. The trend branch provides a robust basis for long-range prediction, while the fluctuation branch captures local dynamics; the two are adaptively weighted by frequency-aware fusion, so that the model maintains strong accuracy in long-horizon and high-volatility scenarios.

5.2. Verification on the S&P 500 Dataset

Table 4 reports the four evaluation metrics (MAE, RMSE, MAPE, and R²) of FAMS-Transformer and the 15 baseline models on the S&P 500 dataset at prediction horizons of one, five, 10, and 15 steps. Bold values mark the best result for each metric, and underlined values mark the second best. Figure 10 shows the fitted curves of predicted versus actual values for each model on the S&P 500 dataset; subfigures (a), (b), (c), and (d) correspond to prediction horizons of one, five, 10, and 15 steps, respectively. To show the differences in fit across architectures clearly, only five representative baselines (DLinear, FEDformer, iTransformer, PatchTST, and TimesNet) are compared with FAMS-Transformer.
Table 4 and Figure 10 show that FAMS-Transformer forecasts the S&P 500 with high accuracy. It performs well on all four metrics at all four horizons, and its lead over the baselines is larger than in the A-share experiments. These results suggest that the adaptive decomposition and enhanced encoding mechanisms of FAMS-Transformer are also effective in the US market, and that its performance is robust in a mature market with different volatility characteristics and trading mechanisms. Nevertheless, these results do not fully rule out dataset-specific adaptation, and the generalizability of the model remains bounded by the coverage of the current test data.
Additionally, we evaluate directional accuracy (DA)—the proportion of samples for which the predicted direction of price change (up or down) matches the actual direction—as a supplementary metric from the perspective of investment practice. DA values for all models fall within the narrow range of 0.48–0.53, with FAMS-Transformer marginally outperforming most baselines but showing limited advantage. It should be noted that the proposed method is primarily designed for point forecasting of financial time series, and improvements in MAE, RMSE, and R2 do not necessarily translate into directional prediction advantages. This result is consistent with the near-random-walk nature of short-term price movements: under the efficient market framework, historical price information alone is insufficient to reliably predict the direction of price changes. Meaningful improvement in DA likely requires the incorporation of non-price information sources such as news text and policy signals. Complete DA results are provided in Appendix A, Table A1.
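Directional accuracy needs a reference point for the "change". The text does not state it, so the sketch below assumes, hypothetically, that both the actual and the predicted change are measured relative to the last observed true price before the forecast target, and counts a hit when the two signs agree. The function name is illustrative.

```python
import numpy as np

def directional_accuracy(y_prev, y_true, y_pred):
    """y_prev: last observed true price before each target; y_true / y_pred: actual and
    predicted prices. A hit is counted when the signs of the two changes agree
    (a zero change only matches another zero change)."""
    y_prev, y_true, y_pred = (np.asarray(a, dtype=float) for a in (y_prev, y_true, y_pred))
    actual_dir = np.sign(y_true - y_prev)
    pred_dir = np.sign(y_pred - y_prev)
    return float(np.mean(actual_dir == pred_dir))

print(directional_accuracy([10, 10, 10, 10], [11, 9, 12, 8], [10.5, 10.2, 9.9, 9.0]))  # 0.5
```

Note that a forecast can be close to the true price in absolute terms (low MAE) and still land on the wrong side of the previous price, as in the second and third samples above, which is why point-accuracy gains need not raise DA.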

5.3. Significance Testing

Table 5 summarizes the Wilcoxon signed-rank test results between FAMS-Transformer and the 15 baseline models across four datasets and four prediction horizons, and Table 6 reports the corresponding paired t-test results. In both tables, ***, **, and * denote significance at the 0.01, 0.05, and 0.1 levels, respectively, and (ns) indicates non-significance.
Across all 240 comparisons (15 baselines × 4 datasets × 4 prediction horizons) at α = 0.05, FAMS-Transformer significantly outperforms the baseline in 91.3% of cases under the Wilcoxon signed-rank test and in 88.8% under the paired t-test. The two tests agree in the vast majority of comparisons. The few disagreements most likely arise because the t-test is sensitive to departures from normality, whereas the Wilcoxon test is more robust to distributional shape; the fact that the Wilcoxon test still indicates significance in these cases suggests that the errors of FAMS-Transformer are systematically smaller than those of the baseline.
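Both tests operate on paired per-sample errors at the same test points. The sketch below computes both statistics in plain Python, assuming the paired quantity is the difference between the baseline's and FAMS-Transformer's absolute errors. It approximates the two-sided p-values with the standard normal distribution (adequate only for large test sets) and applies no tie correction to the Wilcoxon variance. In practice, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` provide exact or better-calibrated versions; the function names here are hypothetical.

```python
import math

def normal_two_sided_p(z):
    # Two-sided p-value under a standard normal approximation
    return math.erfc(abs(z) / math.sqrt(2.0))

def paired_t(baseline_err, model_err):
    d = [b - m for b, m in zip(baseline_err, model_err)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    t = mean / math.sqrt(var / n)
    return t, normal_two_sided_p(t)  # large-n approximation of the t distribution

def wilcoxon_signed_rank(baseline_err, model_err):
    d = [b - m for b, m in zip(baseline_err, model_err) if b != m]  # drop zero differences
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # tied |d| values share their average rank
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, normal_two_sided_p((w_plus - mean_w) / sd_w)

base = [1.0, 2.0, 3.0, 4.0, 5.0]  # baseline absolute errors (toy values)
ours = [0.5, 1.0, 1.5, 2.0, 2.5]  # FAMS-Transformer absolute errors (toy values)
print(round(paired_t(base, ours)[0], 4))     # 4.2426
print(wilcoxon_signed_rank(base, ours)[0])  # 15.0
```

The t-test works with the mean of the differences and is therefore sensitive to a few large errors, whereas the Wilcoxon statistic uses only the ranks of the differences, which is why the two can disagree on heavy-tailed error distributions.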
The 21 non-significant cases under the Wilcoxon test exhibit a clear hierarchical pattern across model categories. First, non-significance relative to the strongest baselines, PatchTST and iTransformer, occurs exclusively at a prediction length (pl) of 1, predominantly on the SSE dataset. One possible interpretation is that at a one-step horizon the prediction target is largely determined by the local shape of the most recent time steps, and multi-scale periodic structure has not yet come into play. The patching mechanism of PatchTST and the inverted attention of iTransformer are naturally suited to capturing such short-range local patterns. Second, non-significance relative to Autoformer is concentrated at pl = 5. Autoformer incorporates a built-in series decomposition module that, like the frequency-domain decomposition of FAMS-Transformer, structures the input through trend-cycle separation. At moderate horizons, the fixed-scale moving-average decomposition of Autoformer may still capture the dominant periodic components adequately, so that the performance gap between the two decomposition strategies is too small to be detected statistically. This observation in turn suggests that series decomposition itself may be broadly effective.
It should be noted that the above discussion on the causes of non-significant results remains a possible interpretation based on observed experimental patterns, rather than a conclusion directly verified by additional causal experiments. The primary role of the significance analysis is to identify the prediction horizons and baseline comparisons under which the proposed method maintains a more stable advantage; the specific causes of non-significant results require further investigation through dedicated controlled experiments.

5.4. Volatility-Regime Analysis

Taking one-step-ahead prediction as an example, Table 7 reports the R² of each model in three volatility regimes on all four datasets. Complete volatility-regime results for the Shanghai Composite Index (SSE), Shenzhen Component Index (SZSE), SME 100 Index (SMESE), and S&P 500 Index (SP500), covering all four prediction horizons and four evaluation metrics, are provided in Appendix A, Table A2, Table A3, Table A4 and Table A5, respectively. As shown in Table 7, FAMS-Transformer ranks first in R² in 11 of the 12 regime-by-dataset scenarios; the sole exception is the medium-volatility regime on SZSE (R² = 0.993, ranked second, trailing PatchTST by only 0.001). This indicates that the performance advantage of FAMS-Transformer is not confined to a specific volatility environment: even in the low-volatility regime, where the decomposition strategy has the least to act upon, the model remains the best of all 16 models. Moreover, its R² in the low-volatility regime is equal to or slightly above that in the medium- and high-volatility regimes (e.g., SMESE: low = 0.9944, medium = 0.9944, high = 0.9840; SP500: low = 0.998, medium = 0.997, high = 0.993), which argues against the concern that the model overfits to high-volatility patterns.
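The regime split described in Section 4.4 can be sketched as follows. The paper specifies the quartile thresholds but not how historical volatility is computed, so the sketch assumes, hypothetically, the standard deviation of log returns within each sample's input window; `regime_labels` and `r2_by_regime` are illustrative names.

```python
import numpy as np

def historical_volatility(windows):
    """windows: (n_samples, L) input price windows -> std of log returns per window."""
    returns = np.diff(np.log(windows), axis=1)
    return returns.std(axis=1)

def regime_labels(vol):
    q1, q2 = np.quantile(vol, [0.25, 0.5])  # first quartile and median
    return np.where(vol < q1, "low", np.where(vol < q2, "medium", "high"))

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def r2_by_regime(y, y_hat, vol):
    labels = regime_labels(vol)
    return {g: float(r2(y[labels == g], y_hat[labels == g])) for g in ("low", "medium", "high")}

vol = np.arange(8.0)
print(regime_labels(vol).tolist())
# ['low', 'low', 'medium', 'medium', 'high', 'high', 'high', 'high']
```

With these thresholds the low and medium regimes each hold about a quarter of the test samples and the high regime holds about half, so R² in the high regime is estimated from the largest subsample.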

5.5. Ablation Experiment

Table 8 reports the evaluation metrics of the full model and its ablation variants across three datasets (SSE, SZSE, and SMESE) at different prediction horizons. Table 9 compares the parameter count (Params) and computational cost (FLOPs) of four architectural variants—the depthwise separable convolution adopted by FAMS-Transformer, standard convolution (Standard Conv1D), dilated depthwise convolution (Dilated Depthwise Conv1D), and the original Transformer without the intermediate convolutional module (Original Transformer)—across the four prediction horizons.
The analysis proceeds from three perspectives: module necessity, local-feature mechanism design, and period-selection strategy.
(1)
Module necessity ablation. As shown in Table 8, the full model outperforms all ablation variants on the three datasets, which is consistent with a complementary relationship between the two core modules. Taking SSE 15-step prediction as an example, the full model achieves R² = 0.730; removing the decomposition module (w/o Decomp) reduces it to 0.718, removing the intermediate convolution (w/o Conv) reduces it to 0.726, and removing both (w/o Both) reduces it to 0.715. The individual R² increments of the decomposition module (ΔR² ≈ 0.012) and the convolutional module (ΔR² ≈ 0.004) are each modest. Their joint contribution (ΔR² = 0.015, full model vs. w/o Both) exceeds either individual increment and is close to their sum (0.016), indicating that the two modules contribute largely non-overlapping gains: the decomposition module provides structurally separated trend and fluctuation components, from which the convolutional module extracts local patterns. The decomposition module makes a modest but consistent contribution in most settings, and the final performance of the model should be understood as the result of the joint operation of multi-scale decomposition, local feature extraction, period selection, and the fusion mechanism.
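The increments above follow directly from the SSE 15-step R² values. Writing the joint effect as the sum of the two individual effects plus an interaction term makes the comparison explicit:

```latex
\begin{aligned}
\Delta R^2_{\text{Decomp}} &= 0.730 - 0.718 = 0.012 \\
\Delta R^2_{\text{Conv}}   &= 0.730 - 0.726 = 0.004 \\
\Delta R^2_{\text{Both}}   &= 0.730 - 0.715 = 0.015 \\
\Delta R^2_{\text{Both}} - \left(\Delta R^2_{\text{Decomp}} + \Delta R^2_{\text{Conv}}\right) &= 0.015 - 0.016 = -0.001
\end{aligned}
```

Since each reported R² is rounded to three decimals, an interaction term of −0.001 is within rounding error; in this setting the two gains are approximately additive.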
(2)
Local-feature mechanism design ablation. As shown in Table 9, depthwise separable convolution yields better average error metrics than standard convolution across all three datasets while requiring fewer parameters and lower FLOPs, suggesting that the channel-mixing capacity of standard convolution does not translate into improved predictive performance in this setting and instead introduces additional computational overhead. Dilated depthwise convolution performs comparably to, but does not consistently surpass, the standard depthwise separable version, indicating that simply expanding the receptive field does not reliably improve forecasting performance on financial time series.
(3)
Period-selection strategy ablation. As shown in Table 8, the adaptive period strategy adopted in this paper yields R² values nearly identical to those of the fixed period strategy (SSE, pl = 1: 0.9592 vs. 0.9593), whereas random period assignment causes a systematic performance degradation (a drop of 0.0020 on SSE at pl = 1). This suggests that the periods extracted by the adaptive FFT reflect genuine periodic structure rather than noise specific to individual windows: if the adaptive mechanism were merely fitting noise, randomizing the period would not cause systematic degradation, and cases in which random assignment outperforms the adaptive strategy would be expected. The main value of the adaptive period strategy is that it reaches strong predictive accuracy in a data-driven manner: the period parameter is determined entirely by the frequency-domain characteristics of each input window and requires no manual specification.
In addition, to examine whether the dominant periods identified by the adaptive mechanism vary randomly across adjacent windows, we identify the top two (Top-2) dominant periods for each test sample via FFT using only its historical input window (excluding data from the prediction horizon), and compute the Jaccard similarity between the period sets selected for each pair of adjacent test windows:
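$$J(\mathcal{P}_t, \mathcal{P}_{t-1}) = \frac{\left|\mathcal{P}_t \cap \mathcal{P}_{t-1}\right|}{\left|\mathcal{P}_t \cup \mathcal{P}_{t-1}\right|}$$
where $\mathcal{P}_t$ denotes the Top-2 dominant period set of test window $t$.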
A Jaccard value closer to 1 indicates greater consistency between the period sets selected from adjacent windows. Table 10 reports the Jaccard similarity statistics of the dominant period sets between adjacent test windows. As shown in Table 10, the mean Jaccard similarity across all four datasets exceeds 0.92, with the median reaching 1.0, indicating a high degree of consistency in the dominant periods identified across adjacent windows. It should be noted that this analysis does not, in a strict sense, completely rule out the possibility of spurious local periodicity fitting. The aim of the present analysis is to provide supplementary empirical evidence from the perspective of period continuity, building upon the existing ablation comparisons.
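The continuity check can be reproduced in outline with the NumPy sketch below. It extracts the Top-2 FFT periods of each sliding input window (mean removed, zero frequency excluded, period taken as $\lfloor L/f \rfloor$) and computes the Jaccard similarity between adjacent windows. These extraction details and the helper names are assumptions for illustration.

```python
import numpy as np

def top_k_periods(window, k=2):
    L = len(window)
    amp = np.abs(np.fft.rfft(window - window.mean()))
    top = np.argsort(amp[1:])[::-1][:k] + 1  # strongest non-zero frequency indices
    return {int(L // f) for f in top}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def adjacent_jaccard(series, L, k=2):
    sets = [top_k_periods(series[t:t + L], k) for t in range(len(series) - L + 1)]
    return np.array([jaccard(sets[t], sets[t - 1]) for t in range(1, len(sets))])

t = np.arange(200)
series = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 12)
print(sorted(top_k_periods(series[:96])))           # [12, 24]
print(jaccard({5, 10}, {5, 20}))                    # 0.3333333333333333
print(float(adjacent_jaccard(series, 96).mean()))   # 1.0
```

For a strictly periodic toy series every window selects the same two periods, so all adjacent similarities equal 1; on real price data, values below 1 mark windows where the selected period set changes.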

6. Conclusions

This paper proposes a Frequency-Aware Adaptive Multi-Scale Decomposition Transformer forecasting framework (FAMS-Transformer). Multi-step forecasting experiments at four horizons ranging from one to 15 steps are conducted on four representative market indices (the Shanghai Composite Index, the Shenzhen Component Index, the SME 100 Index, and the S&P 500 Index), with comparisons against fifteen baseline models: LSTM, SVR, Transformer, Autoformer, Informer, iTransformer, PatchTST, TimesNet, FreTS, DLinear, Koopa, LightTS, FEDformer, TiDE, and FiLM. The experimental results show that FAMS-Transformer effectively separates and models the heterogeneous components inherent in stock index price series, namely long-term trends, cyclical fluctuations, and random noise, and achieves superior predictive performance both at the most challenging 15-step horizon and on high-volatility indices such as the Shenzhen Component Index and the SME 100 Index. These findings indicate that the synergistic design of adaptive decomposition and enhanced encoding can effectively address two core limitations of the prevailing decomposition–forecasting paradigm: the lack of adaptivity in decomposition strategies and the insufficient capacity of Transformer encoders to capture local temporal dynamics. Although the performance differences among alternative local modeling approaches in the ablation experiments are generally modest, removing the frequency-adaptive decomposition module leads to a discernible decline in prediction accuracy on all three datasets, supporting the effectiveness of the FFT-based dynamic identification of dominant periods and the resulting adaptive trend–fluctuation decoupling. This mechanism makes the decomposition entirely data-driven, governed by the intrinsic periodic characteristics of the price data themselves.
Compared with the fixed-scale or globally uniform frequency selection strategies commonly adopted in existing studies, the proposed approach adapts more flexibly to changes in periodic structure across market regimes. In addition, the depthwise separable convolution improves predictive accuracy while keeping computational overhead low relative to a standard convolution, striking a favorable balance between prediction performance and model complexity.
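Why a depthwise separable convolution is cheap can be seen from a minimal NumPy sketch. The channel width of 64 and kernel size of 3 are hypothetical (the layer sizes are not restated in this section), biases are omitted from the forward pass, and the residual connection and normalization that surround the module between self-attention and the feed-forward network are not shown.

```python
from __future__ import annotations

import numpy as np


def depthwise_separable_conv1d(x: np.ndarray, dw_kernel: np.ndarray, pw_weight: np.ndarray) -> np.ndarray:
    """x: (seq_len, channels); dw_kernel: (channels, k); pw_weight: (channels, out_channels).

    Depthwise step: every channel is filtered by its own length-k kernel ("same" zero padding).
    Pointwise step: a 1x1 convolution (a matrix product) mixes channels at each time step.
    Biases are omitted for brevity.
    """
    seq_len, channels = x.shape
    k = dw_kernel.shape[1]
    left = k // 2
    padded = np.pad(x, ((left, k - 1 - left), (0, 0)))
    depthwise = np.empty((seq_len, channels))
    for c in range(channels):
        # cross-correlation (no kernel flip), matching deep-learning convolution layers
        depthwise[:, c] = np.correlate(padded[:, c], dw_kernel[c], mode="valid")
    return depthwise @ pw_weight


def conv_param_counts(channels: int, k: int) -> tuple[int, int]:
    """Weights + biases of a standard vs. a depthwise separable 1-D conv (channels in = channels out)."""
    standard = channels * channels * k + channels
    separable = (channels * k + channels) + (channels * channels + channels)
    return standard, separable


print(conv_param_counts(64, 3))  # (12352, 4416)
```

For this hypothetical width the separable variant needs roughly 2.8 times fewer parameters than a standard convolution, and the gap grows with the kernel size because the k-dependent term scales with `channels` rather than `channels ** 2`.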
Nevertheless, this study has several limitations. First, the model relies primarily on historical price data and structured external indicators and does not yet incorporate unstructured information such as news text and social media sentiment. This limits its capacity to respond to policy shocks and market turbulence and is likely an important reason for the limited improvement in directional accuracy (DA). Future work may integrate tools such as financial sentiment analysis into a multimodal input framework, or add a classification branch dedicated to directional prediction, so as to improve point-forecast accuracy and directional judgment together. Second, the model coordinates multiple components, including adaptive decomposition, the attention mechanism, and fusion weighting, so its decision process is complex, and the resulting lack of interpretability is a substantive limitation of the present method. In a frequency-domain decomposition architecture, standard SHAP analysis and attention-weight visualization cannot readily trace predictions back through the upstream frequency-domain operations, a difficulty shared by this class of methods. Although the current ablation experiments cover module necessity and period selection strategies, the independent contribution of the fusion weighting mechanism and the sensitivity to thresholds have not yet been sufficiently isolated. Future research will prioritize adapting SHAP to frequency-domain attribution, temporally aligning attention weights with financial events for visualization, and visualizing fusion weights dynamically, so as to progressively improve the transparency and trustworthiness of the model in financial applications.
Furthermore, although this study integrates various external market indicators, including carbon trading prices, crude oil prices, and macroeconomic variables, into a unified input space, the marginal contribution of each category of external factors to prediction accuracy has not been quantified separately. Systematic controlled experiments could disentangle the differential effects of these factors across market environments and provide a basis for more refined input feature selection. Finally, the baseline comparison does not include time-series foundation models such as TimesFM, which is a further limitation of the present study. In future work, we will consider systematically comparing the proposed method with foundation models, including TimesFM, under zero-shot, fine-tuning, or unified pretraining evaluation frameworks.

Author Contributions

Conceptualization, G.H. and T.Z.; methodology, G.H. and T.Z.; software, G.H. and T.Z.; validation, X.Z. and T.Z.; formal analysis, H.Z.; investigation, H.Z.; resources, G.H.; data curation, X.Z. and T.Z.; writing—original draft preparation, X.Z.; writing—review and editing, H.Z. and G.H.; visualization, X.Z.; supervision, G.H. and T.Z.; project administration, G.H. and T.Z.; funding acquisition, T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Fujian Province, grant number 2026J008100.

Data Availability Statement

The data used in this study were obtained from the Wind Financial Database under institutional subscription and are not publicly available due to commercial license restrictions. Reasonable requests for non-confidential processed data may be directed to the corresponding author.

Acknowledgments

The authors would like to thank the open-source community for the computational frameworks that supported the empirical analysis.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ARIMA: Autoregressive Integrated Moving Average
ASDH: Adaptive Selection Decomposition Hybrid
BiLSTM: Bidirectional Long Short-Term Memory
CEEMDAN: Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
CoML: Committee of Multi-Scale Nonlinear Learning
CPO: Crested Porcupine Optimizer
EEMD: Ensemble Empirical Mode Decomposition
EMD: Empirical Mode Decomposition
FAMS: Frequency-Aware Multi-Scale
FEDformer: Frequency Enhanced Decomposed Transformer
FFT: Fast Fourier Transform
FiLM: Frequency Improved Legendre Memory
GABP: Genetic Algorithm Back Propagation
GARCH: Generalized Autoregressive Conditional Heteroskedasticity
IMF: Intrinsic Mode Function
KNN: K-Nearest Neighbors
LSTM: Long Short-Term Memory
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error
MDCM: Multi-Scale Dilated Convolution Module
MLP: Multilayer Perceptron
MSPE: Mean Square Percentage Error
R²: Coefficient of Determination
RMSE: Root Mean Square Error
SME 100: Small and Medium Enterprises 100 Index
SSE: Shanghai Stock Exchange Composite Index
SVR: Support Vector Regression
SZSE: Shenzhen Stock Exchange Component Index
VMD: Variational Mode Decomposition
WT: Wavelet Transform

Appendix A

Table A1. Directional accuracy (DA) of all models across four datasets and four prediction horizons.
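| Model | SSE 1 step | SSE 5 steps | SSE 10 steps | SSE 15 steps | SP500 1 step | SP500 5 steps | SP500 10 steps | SP500 15 steps | SZSE 1 step | SZSE 5 steps | SZSE 10 steps | SZSE 15 steps | SMESE 1 step | SMESE 5 steps | SMESE 10 steps | SMESE 15 steps |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FAMS-Transformer | 0.5206 | 0.5241 | 0.5226 | 0.5167 | 0.5267 | 0.5302 | 0.5211 | 0.5112 | 0.5245 | 0.5101 | 0.5166 | 0.5093 | 0.5245 | 0.5192 | 0.51 | 0.5173 |
| Autoformer | 0.5245 | 0.5174 | 0.5086 | 0.4876 | 0.48 | 0.4969 | 0.5044 | 0.4551 | 0.5039 | 0.5008 | 0.4954 | 0.4587 | 0.5155 | 0.4997 | 0.4986 | 0.4792 |
| DLinear | 0.5039 | 0.5197 | 0.5196 | 0.5011 | 0.48 | 0.4378 | 0.4289 | 0.4179 | 0.491 | 0.5161 | 0.5167 | 0.4809 | 0.4845 | 0.5168 | 0.516 | 0.4848 |
| FEDformer | 0.5026 | 0.5003 | 0.488 | 0.4637 | 0.4911 | 0.4738 | 0.4584 | 0.4597 | 0.4871 | 0.4617 | 0.4558 | 0.4331 | 0.4884 | 0.4754 | 0.4735 | 0.4525 |
| FiLM | 0.4987 | 0.4891 | 0.4967 | 0.4886 | 0.4866 | 0.4946 | 0.4583 | 0.4425 | 0.4858 | 0.4598 | 0.4802 | 0.4731 | 0.4781 | 0.4671 | 0.4949 | 0.4835 |
| FreTS | 0.4871 | 0.4992 | 0.5116 | 0.5079 | 0.4911 | 0.4993 | 0.4323 | 0.4156 | 0.4858 | 0.4738 | 0.5125 | 0.505 | 0.509 | 0.4868 | 0.5126 | 0.5007 |
| Informer | 0.509 | 0.4997 | 0.5177 | 0.5125 | 0.4543 | 0.4179 | 0.3989 | 0.3777 | 0.4729 | 0.4782 | 0.4825 | 0.4912 | 0.4961 | 0.4837 | 0.4907 | 0.5017 |
| Koopa | 0.4974 | 0.5067 | 0.5043 | 0.4932 | 0.4811 | 0.5007 | 0.4777 | 0.4668 | 0.5052 | 0.4961 | 0.4756 | 0.4801 | 0.518 | 0.4917 | 0.4832 | 0.489 |
| LightTS | 0.5116 | 0.5044 | 0.5159 | 0.525 | 0.4399 | 0.4434 | 0.4127 | 0.3925 | 0.4781 | 0.4702 | 0.537 | 0.537 | 0.4897 | 0.4782 | 0.5282 | 0.5262 |
| LSTM | 0.518 | 0.5088 | 0.5026 | 0.5297 | 0.4543 | 0.4179 | 0.3989 | 0.3777 | 0.4832 | 0.4795 | 0.4722 | 0.521 | 0.4884 | 0.4707 | 0.4836 | 0.5148 |
| PatchTST | 0.4807 | 0.4992 | 0.5025 | 0.4959 | 0.4889 | 0.4805 | 0.462 | 0.4477 | 0.4948 | 0.4801 | 0.5027 | 0.4945 | 0.5103 | 0.4917 | 0.513 | 0.5092 |
| SVR | 0.5013 | 0.5171 | 0.528 | 0.5283 | 0.4543 | 0.4179 | 0.3989 | 0.3777 | 0.4897 | 0.4982 | 0.4982 | 0.5035 | 0.5064 | 0.5039 | 0.5021 | 0.5031 |
| TiDE | 0.5193 | 0.4972 | 0.4988 | 0.4928 | 0.4967 | 0.4682 | 0.4614 | 0.4483 | 0.4974 | 0.4731 | 0.4759 | 0.4703 | 0.5013 | 0.4816 | 0.4854 | 0.4822 |
| TimesNet | 0.5116 | 0.5132 | 0.5117 | 0.5129 | 0.4911 | 0.4902 | 0.47 | 0.4925 | 0.5077 | 0.4679 | 0.4691 | 0.4809 | 0.4948 | 0.4811 | 0.4902 | 0.4738 |
| Transformer | 0.5 | 0.5026 | 0.4992 | 0.483 | 0.4543 | 0.4179 | 0.3989 | 0.3777 | 0.4729 | 0.471 | 0.4549 | 0.4476 | 0.5077 | 0.4899 | 0.4782 | 0.4823 |
| iTransformer | 0.491 | 0.5023 | 0.5168 | 0.5004 | 0.4933 | 0.4859 | 0.4659 | 0.4476 | 0.5026 | 0.4938 | 0.5134 | 0.5033 | 0.5039 | 0.5127 | 0.522 | 0.5154 |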
Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
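Table A1 reports directional accuracy, but this section does not restate the exact DA definition. The sketch below assumes one common variant: a forecast counts as a hit when the sign of its change relative to the last observed price matches the sign of the realized change, with a non-positive change treated as "down".

```python
def directional_accuracy(last_obs, y_true, y_pred):
    """Share of forecasts whose predicted direction (vs. the last observed price) matches the realized one.

    Assumption: a non-positive change is counted as "down".
    """
    hits = sum(
        (true - prev > 0) == (pred - prev > 0)
        for prev, true, pred in zip(last_obs, y_true, y_pred)
    )
    return hits / len(y_true)


last_obs = [100.0, 101.0, 102.0, 101.0]
y_true = [101.0, 102.0, 101.0, 103.0]   # up, up, down, up
y_pred = [100.5, 100.8, 101.5, 100.9]   # up, down, down, down
print(directional_accuracy(last_obs, y_true, y_pred))  # 0.5
```

Under this definition, guessing up or down with a fair coin scores 0.5 on average, which puts the values of roughly 0.51 to 0.53 reported for FAMS-Transformer in context.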
Table A2. Volatility-regime analysis results for SSE (all models, four prediction horizons, four evaluation metrics).
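| Horizon | Model | Low R² | Low MAE | Low RMSE | Low MAPE (%) | Medium R² | Medium MAE | Medium RMSE | Medium MAPE (%) | High R² | High MAE | High RMSE | High MAPE (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 step | FAMS-Transformer | 0.988 | 16.777 | 21.749 | 0.540 | 0.969 | 24.630 | 30.937 | 0.766 | 0.900 | 37.794 | 54.709 | 1.199 |
| 1 step | Autoformer | 0.802 | 73.504 | 90.157 | 2.380 | 0.675 | 77.240 | 100.594 | 2.393 | 0.645 | 75.690 | 103.051 | 2.367 |
| 1 step | DLinear | 0.900 | 54.631 | 64.068 | 1.796 | 0.830 | 59.378 | 72.759 | 1.855 | 0.512 | 91.005 | 120.689 | 2.876 |
| 1 step | FEDformer | 0.814 | 73.303 | 87.335 | 2.398 | 0.816 | 59.613 | 75.783 | 1.862 | 0.536 | 91.950 | 117.787 | 2.897 |
| 1 step | FiLM | 0.894 | 56.049 | 65.938 | 1.833 | 0.789 | 65.316 | 81.065 | 2.042 | 0.359 | 109.123 | 138.374 | 3.444 |
| 1 step | FreTS | 0.980 | 22.986 | 28.476 | 0.752 | 0.965 | 25.848 | 32.980 | 0.807 | 0.889 | 39.242 | 57.552 | 1.244 |
| 1 step | Informer | 0.058 | 154.710 | 196.491 | 5.210 | 0.230 | 126.328 | 154.943 | 4.018 | −0.219 | 163.563 | 190.832 | 5.288 |
| 1 step | Koopa | 0.967 | 29.309 | 36.644 | 0.941 | 0.930 | 37.506 | 46.855 | 1.161 | 0.681 | 67.787 | 97.546 | 2.134 |
| 1 step | LightTS | 0.941 | 41.245 | 49.271 | 1.344 | 0.890 | 48.702 | 58.677 | 1.526 | 0.671 | 71.192 | 99.155 | 2.245 |
| 1 step | LSTM | 0.576 | 109.657 | 131.893 | 3.577 | 0.544 | 99.418 | 119.261 | 3.092 | 0.152 | 130.071 | 159.158 | 4.080 |
| 1 step | PatchTST | 0.987 | 18.212 | 22.807 | 0.586 | 0.969 | 24.445 | 31.030 | 0.761 | 0.893 | 40.241 | 56.661 | 1.276 |
| 1 step | SVR | −1.806 | 290.832 | 339.146 | 9.698 | −0.720 | 190.430 | 231.592 | 6.061 | −1.120 | 203.285 | 251.643 | 6.652 |
| 1 step | TiDE | 0.948 | 37.938 | 46.167 | 1.230 | 0.873 | 50.547 | 62.851 | 1.572 | 0.642 | 74.823 | 103.358 | 2.369 |
| 1 step | TimesNet | 0.913 | 48.825 | 59.775 | 1.577 | 0.813 | 59.797 | 76.319 | 1.845 | 0.483 | 93.640 | 124.226 | 2.952 |
| 1 step | Transformer | 0.553 | 106.910 | 135.381 | 3.510 | 0.708 | 75.137 | 95.358 | 2.322 | 0.793 | 62.650 | 78.579 | 2.000 |
| 1 step | iTransformer | 0.986 | 18.961 | 24.027 | 0.613 | 0.966 | 25.765 | 32.611 | 0.801 | 0.893 | 38.750 | 56.601 | 1.230 |
| 5 steps | FAMS-Transformer | 0.941 | 38.819 | 47.511 | 1.262 | 0.848 | 49.694 | 62.065 | 1.553 | 0.633 | 79.280 | 114.811 | 2.503 |
| 5 steps | Autoformer | 0.634 | 96.348 | 118.030 | 3.115 | 0.417 | 94.092 | 121.726 | 2.919 | 0.463 | 106.637 | 139.017 | 3.330 |
| 5 steps | DLinear | 0.550 | 111.394 | 130.966 | 3.494 | −0.097 | 144.965 | 166.930 | 4.445 | −0.162 | 158.419 | 204.385 | 4.897 |
| 5 steps | FEDformer | 0.589 | 105.034 | 125.167 | 3.441 | 0.427 | 96.870 | 120.672 | 3.022 | 0.210 | 125.787 | 168.562 | 3.936 |
| 5 steps | FiLM | 0.847 | 65.552 | 76.383 | 2.161 | 0.697 | 71.204 | 87.771 | 2.240 | 0.365 | 119.841 | 151.061 | 3.781 |
| 5 steps | FreTS | 0.904 | 50.287 | 60.613 | 1.665 | 0.826 | 53.077 | 66.456 | 1.671 | 0.617 | 80.825 | 117.298 | 2.551 |
| 5 steps | Informer | 0.295 | 133.330 | 163.876 | 4.386 | 0.184 | 121.745 | 143.961 | 3.801 | 0.372 | 127.499 | 150.296 | 4.045 |
| 5 steps | Koopa | 0.927 | 43.802 | 52.716 | 1.432 | 0.808 | 56.693 | 69.789 | 1.777 | 0.560 | 86.769 | 125.774 | 2.737 |
| 5 steps | LightTS | 0.923 | 42.829 | 54.000 | 1.381 | 0.786 | 60.295 | 73.761 | 1.874 | 0.557 | 91.159 | 126.262 | 2.861 |
| 5 steps | LSTM | 0.545 | 110.688 | 131.625 | 3.554 | 0.178 | 127.006 | 144.532 | 3.920 | 0.329 | 124.680 | 155.350 | 3.878 |
| 5 steps | PatchTST | 0.941 | 39.500 | 47.420 | 1.284 | 0.844 | 50.249 | 63.037 | 1.572 | 0.599 | 83.923 | 120.036 | 2.648 |
| 5 steps | SVR | −1.562 | 267.112 | 312.389 | 8.919 | −0.712 | 168.247 | 208.574 | 5.367 | −0.623 | 194.300 | 241.557 | 6.307 |
| 5 steps | TiDE | 0.788 | 76.083 | 89.936 | 2.479 | 0.480 | 92.746 | 114.986 | 2.890 | −0.092 | 154.645 | 198.148 | 4.846 |
| 5 steps | TimesNet | 0.870 | 59.473 | 70.413 | 1.937 | 0.685 | 72.899 | 89.459 | 2.279 | 0.165 | 130.226 | 173.304 | 4.086 |
| 5 steps | Transformer | 0.459 | 112.999 | 143.514 | 3.727 | 0.617 | 79.625 | 98.658 | 2.480 | 0.693 | 80.002 | 104.989 | 2.520 |
| 5 steps | iTransformer | 0.936 | 41.017 | 49.400 | 1.334 | 0.847 | 50.102 | 62.347 | 1.567 | 0.597 | 80.984 | 120.376 | 2.553 |
| 10 steps | FAMS-Transformer | 0.903 | 54.038 | 63.772 | 1.758 | 0.705 | 67.408 | 83.126 | 2.122 | 0.273 | 113.955 | 154.228 | 3.587 |
| 10 steps | Autoformer | 0.489 | 118.347 | 146.638 | 3.805 | −0.057 | 121.489 | 157.280 | 3.797 | −0.255 | 160.337 | 202.685 | 5.047 |
| 10 steps | DLinear | 0.584 | 106.098 | 132.364 | 3.295 | −0.243 | 144.522 | 170.575 | 4.438 | −0.415 | 160.943 | 215.173 | 4.979 |
| 10 steps | FEDformer | 0.644 | 104.624 | 122.358 | 3.427 | 0.337 | 99.771 | 124.564 | 3.143 | −0.362 | 168.935 | 211.175 | 5.294 |
| 10 steps | FiLM | 0.867 | 64.435 | 74.851 | 2.099 | 0.620 | 76.291 | 94.331 | 2.388 | 0.046 | 134.273 | 176.704 | 4.217 |
| 10 steps | FreTS | 0.872 | 61.637 | 73.390 | 2.016 | 0.710 | 69.078 | 82.370 | 2.170 | 0.370 | 107.516 | 143.574 | 3.378 |
| 10 steps | Informer | 0.189 | 146.963 | 184.695 | 4.796 | −0.008 | 126.121 | 153.603 | 3.958 | 0.218 | 136.159 | 159.984 | 4.324 |
| 10 steps | Koopa | 0.869 | 64.018 | 74.349 | 2.091 | 0.684 | 69.525 | 85.955 | 2.189 | 0.314 | 108.309 | 149.866 | 3.429 |
| 10 steps | LightTS | 0.896 | 51.323 | 66.011 | 1.652 | 0.713 | 68.915 | 81.974 | 2.158 | 0.314 | 109.281 | 149.834 | 3.432 |
| 10 steps | LSTM | 0.004 | 162.253 | 204.664 | 5.433 | 0.237 | 103.728 | 133.632 | 3.320 | 0.186 | 133.283 | 163.253 | 4.299 |
| 10 steps | PatchTST | 0.901 | 54.285 | 64.453 | 1.762 | 0.697 | 69.054 | 84.180 | 2.170 | 0.208 | 121.269 | 161.048 | 3.818 |
| 10 steps | SVR | −0.847 | 237.443 | 278.756 | 7.876 | −0.475 | 147.288 | 185.803 | 4.711 | −0.457 | 172.179 | 218.406 | 5.593 |
| 10 steps | TiDE | 0.830 | 73.379 | 84.461 | 2.402 | 0.609 | 77.855 | 95.685 | 2.439 | −0.059 | 144.377 | 186.146 | 4.538 |
| 10 steps | TimesNet | 0.844 | 69.058 | 81.041 | 2.237 | 0.505 | 86.684 | 107.599 | 2.701 | −0.222 | 150.119 | 199.995 | 4.681 |
| 10 steps | Transformer | 0.355 | 129.178 | 164.764 | 4.277 | 0.478 | 87.397 | 110.542 | 2.760 | 0.552 | 95.869 | 121.085 | 3.062 |
| 10 steps | iTransformer | 0.892 | 58.482 | 67.425 | 1.901 | 0.716 | 66.625 | 81.574 | 2.096 | 0.243 | 113.190 | 157.448 | 3.559 |
| 15 steps | FAMS-Transformer | 0.859 | 70.479 | 80.933 | 2.294 | 0.562 | 75.809 | 93.319 | 2.387 | −0.128 | 143.975 | 184.619 | 4.536 |
| 15 steps | Autoformer | 0.468 | 126.041 | 157.078 | 4.026 | −0.651 | 143.326 | 181.170 | 4.499 | −0.382 | 163.779 | 204.302 | 5.123 |
| 15 steps | DLinear | 0.735 | 95.309 | 110.922 | 3.148 | 0.378 | 92.015 | 111.153 | 2.903 | −0.339 | 163.920 | 201.120 | 5.160 |
| 15 steps | FEDformer | 0.592 | 119.434 | 137.470 | 3.925 | 0.022 | 116.779 | 139.413 | 3.694 | −0.763 | 194.840 | 230.819 | 6.135 |
| 15 steps | FiLM | 0.830 | 76.705 | 88.663 | 2.506 | 0.422 | 88.071 | 107.225 | 2.765 | −0.351 | 161.426 | 202.029 | 5.067 |
| 15 steps | FreTS | 0.814 | 78.774 | 92.864 | 2.560 | 0.566 | 77.151 | 92.877 | 2.417 | 0.083 | 122.632 | 166.454 | 3.866 |
| 15 steps | Informer | 0.259 | 150.509 | 185.411 | 4.871 | −0.141 | 126.888 | 150.577 | 4.000 | 0.098 | 141.386 | 165.103 | 4.486 |
| 15 steps | Koopa | 0.854 | 69.565 | 82.291 | 2.259 | 0.477 | 82.905 | 101.963 | 2.595 | −0.143 | 145.579 | 185.838 | 4.577 |
| 15 steps | LightTS | 0.829 | 73.144 | 89.074 | 2.377 | 0.517 | 81.317 | 97.944 | 2.549 | 0.010 | 131.664 | 172.952 | 4.155 |
| 15 steps | LSTM | 0.423 | 131.572 | 163.635 | 4.314 | 0.339 | 92.704 | 114.633 | 2.942 | 0.327 | 117.800 | 142.607 | 3.775 |
| 15 steps | PatchTST | 0.859 | 70.313 | 80.982 | 2.287 | 0.556 | 77.330 | 93.955 | 2.434 | −0.202 | 148.621 | 190.527 | 4.674 |
| 15 steps | SVR | −0.544 | 227.195 | 267.542 | 7.475 | −0.397 | 131.167 | 166.647 | 4.224 | −0.355 | 157.203 | 202.293 | 5.115 |
| 15 steps | TiDE | 0.821 | 78.313 | 91.155 | 2.561 | 0.412 | 88.901 | 108.092 | 2.793 | −0.409 | 166.162 | 206.295 | 5.219 |
| 15 steps | TimesNet | 0.790 | 81.368 | 98.573 | 2.657 | 0.157 | 103.644 | 129.411 | 3.241 | −0.529 | 170.432 | 214.945 | 5.324 |
| 15 steps | Transformer | 0.379 | 137.319 | 169.696 | 4.461 | 0.336 | 95.774 | 114.903 | 3.027 | 0.493 | 99.630 | 123.802 | 3.174 |
| 15 steps | iTransformer | 0.857 | 70.825 | 81.297 | 2.302 | 0.563 | 76.344 | 93.251 | 2.399 | −0.107 | 139.054 | 182.843 | 4.381 |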
Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
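Tables A2 to A5 score each model separately within low-, medium-, and high-volatility regimes. The regime definition is not restated in this section, so the sketch below assumes one plausible scheme: a 20-day trailing standard deviation of log returns, split at its terciles over the test period. The four metrics use their standard definitions; under them, a negative R² within a regime simply means the model's squared error exceeds that of predicting the regime's mean price, which is how the negative entries in these tables arise.

```python
from __future__ import annotations

import numpy as np


def regression_metrics(y: np.ndarray, yhat: np.ndarray) -> dict[str, float]:
    """R², MAE, RMSE and MAPE (%) with their standard definitions."""
    err = y - yhat
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE(%)": float(100.0 * np.mean(np.abs(err / y))),
    }


def rolling_volatility(prices: np.ndarray, window: int = 20) -> np.ndarray:
    """Std of the last `window` log returns ending at each time step (NaN until enough history)."""
    log_ret = np.diff(np.log(prices))
    vol = np.full(len(prices), np.nan)
    for t in range(window, len(prices)):
        vol[t] = log_ret[t - window:t].std()
    return vol


def metrics_by_regime(y: np.ndarray, yhat: np.ndarray, vol: np.ndarray) -> dict[str, dict[str, float]]:
    """Split test points into volatility terciles (an assumed scheme) and score each regime separately."""
    lo, hi = np.quantile(vol, [1 / 3, 2 / 3])
    masks = {"Low": vol <= lo, "Medium": (vol > lo) & (vol <= hi), "High": vol > hi}
    return {name: regression_metrics(y[m], yhat[m]) for name, m in masks.items()}


# Synthetic example: a random-walk index and a noisy forecast of it.
rng = np.random.default_rng(1)
prices = 3000 * np.exp(np.cumsum(rng.normal(0, 0.01, 300)))
vol = rolling_volatility(prices)[20:]          # drop the warm-up NaNs
y = prices[20:]
yhat = y * (1 + rng.normal(0, 0.005, len(y)))
for regime, scores in metrics_by_regime(y, yhat, vol).items():
    print(regime, {name: round(value, 3) for name, value in scores.items()})
```

Since R² is normalized by the price variance inside each regime, a narrow price range in a given regime can push R² down even when MAE and RMSE are small, so the error-based columns are worth reading alongside it.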
Table A3. Volatility-regime analysis results for SZSE (all models, four prediction horizons, four evaluation metrics).
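| Horizon | Model | Low R² | Low MAE | Low RMSE | Low MAPE (%) | Medium R² | Medium MAE | Medium RMSE | Medium MAPE (%) | High R² | High MAE | High RMSE | High MAPE (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 step | FAMS-Transformer | 0.994 | 83.164 | 105.512 | 0.781 | 0.993 | 116.663 | 150.602 | 1.054 | 0.956 | 173.246 | 257.992 | 1.637 |
| 1 step | Autoformer | 0.928 | 281.764 | 363.855 | 2.660 | 0.949 | 326.837 | 413.662 | 2.972 | 0.811 | 383.833 | 535.405 | 3.621 |
| 1 step | DLinear | 0.937 | 291.603 | 341.314 | 2.834 | 0.952 | 332.121 | 401.975 | 3.097 | 0.786 | 425.763 | 569.549 | 4.002 |
| 1 step | FEDformer | 0.919 | 315.599 | 386.191 | 3.048 | 0.947 | 344.444 | 423.296 | 3.162 | 0.778 | 453.729 | 580.416 | 4.228 |
| 1 step | FiLM | 0.927 | 309.069 | 367.309 | 2.964 | 0.933 | 387.962 | 474.707 | 3.533 | 0.692 | 524.741 | 683.426 | 4.884 |
| 1 step | FreTS | 0.990 | 112.387 | 137.032 | 1.090 | 0.992 | 129.534 | 159.479 | 1.202 | 0.953 | 181.207 | 266.976 | 1.723 |
| 1 step | Informer | −0.449 | 1421.353 | 1634.282 | 14.321 | 0.120 | 1513.941 | 1717.201 | 15.158 | −1.031 | 1627.552 | 1753.467 | 16.106 |
| 1 step | Koopa | 0.983 | 138.184 | 175.584 | 1.285 | 0.983 | 190.322 | 237.440 | 1.730 | 0.882 | 292.911 | 423.292 | 2.744 |
| 1 step | LightTS | 0.961 | 225.498 | 268.166 | 2.185 | 0.976 | 229.764 | 284.032 | 2.173 | 0.857 | 345.167 | 465.839 | 3.256 |
| 1 step | LSTM | −0.077 | 1080.988 | 1409.105 | 11.100 | 0.140 | 1465.799 | 1697.605 | 14.537 | −1.112 | 1561.856 | 1788.246 | 15.551 |
| 1 step | PatchTST | 0.993 | 87.316 | 111.166 | 0.820 | 0.994 | 110.027 | 142.569 | 0.996 | 0.954 | 184.385 | 263.243 | 1.744 |
| 1 step | SVR | −0.429 | 1259.521 | 1623.370 | 12.830 | −0.018 | 1570.417 | 1846.683 | 15.232 | −0.546 | 1240.730 | 1530.266 | 12.641 |
| 1 step | TiDE | 0.964 | 212.417 | 255.880 | 2.018 | 0.969 | 252.670 | 323.032 | 2.282 | 0.842 | 348.066 | 489.645 | 3.278 |
| 1 step | TimesNet | 0.943 | 265.625 | 324.201 | 2.496 | 0.954 | 308.161 | 391.902 | 2.781 | 0.804 | 398.501 | 545.450 | 3.737 |
| 1 step | Transformer | −0.272 | 1264.250 | 1531.305 | 12.859 | 0.178 | 1399.981 | 1659.734 | 14.079 | −0.851 | 1442.257 | 1674.261 | 14.521 |
| 1 step | iTransformer | 0.992 | 97.670 | 122.682 | 0.923 | 0.992 | 129.172 | 165.707 | 1.171 | 0.955 | 182.642 | 262.014 | 1.725 |
| 5 steps | FAMS-Transformer | 0.963 | 195.681 | 245.609 | 1.859 | 0.961 | 265.907 | 334.353 | 2.444 | 0.853 | 373.325 | 540.487 | 3.497 |
| 5 steps | Autoformer | 0.861 | 388.675 | 479.403 | 3.660 | 0.892 | 449.904 | 559.852 | 4.055 | 0.747 | 511.907 | 710.332 | 4.774 |
| 5 steps | DLinear | 0.829 | 419.483 | 531.872 | 3.729 | 0.866 | 474.048 | 623.382 | 4.171 | 0.608 | 675.869 | 884.331 | 6.255 |
| 5 steps | FEDformer | 0.777 | 520.858 | 606.666 | 5.080 | 0.867 | 518.039 | 621.548 | 4.714 | 0.665 | 618.842 | 816.636 | 5.708 |
| 5 steps | FiLM | 0.889 | 363.290 | 427.669 | 3.517 | 0.895 | 449.851 | 551.129 | 4.123 | 0.706 | 587.973 | 764.968 | 5.446 |
| 5 steps | FreTS | 0.940 | 262.849 | 315.140 | 2.582 | 0.958 | 283.234 | 348.091 | 2.689 | 0.848 | 384.839 | 549.774 | 3.636 |
| 5 steps | Informer | −0.368 | 1201.104 | 1502.535 | 12.331 | −0.001 | 1488.158 | 1702.805 | 14.891 | −0.493 | 1596.494 | 1725.110 | 15.677 |
| 5 steps | Koopa | 0.945 | 244.848 | 302.569 | 2.342 | 0.949 | 305.608 | 385.836 | 2.787 | 0.830 | 402.599 | 582.731 | 3.759 |
| 5 steps | LightTS | 0.947 | 247.211 | 295.618 | 2.371 | 0.948 | 311.280 | 388.935 | 2.887 | 0.833 | 422.357 | 577.425 | 3.922 |
| 5 steps | LSTM | −0.316 | 1242.099 | 1473.679 | 12.583 | 0.108 | 1377.917 | 1607.288 | 13.849 | −0.310 | 1475.723 | 1615.765 | 14.517 |
| 5 steps | PatchTST | 0.964 | 195.681 | 243.913 | 1.858 | 0.960 | 275.171 | 342.374 | 2.537 | 0.839 | 394.977 | 566.997 | 3.697 |
| 5 steps | SVR | −0.418 | 1185.776 | 1529.739 | 12.165 | −0.021 | 1449.971 | 1719.820 | 14.215 | −0.189 | 1242.160 | 1539.469 | 12.319 |
| 5 steps | TiDE | 0.864 | 387.685 | 473.767 | 3.701 | 0.851 | 539.993 | 657.157 | 4.916 | 0.516 | 764.428 | 982.407 | 7.089 |
| 5 steps | TimesNet | 0.897 | 347.596 | 411.648 | 3.328 | 0.900 | 443.257 | 539.412 | 4.018 | 0.645 | 632.676 | 840.741 | 5.888 |
| 5 steps | Transformer | −0.451 | 1268.735 | 1547.812 | 12.992 | 0.036 | 1412.432 | 1670.570 | 14.327 | −0.449 | 1522.578 | 1699.601 | 15.072 |
| 5 steps | iTransformer | 0.961 | 201.666 | 252.979 | 1.917 | 0.961 | 270.617 | 335.076 | 2.489 | 0.846 | 377.824 | 554.529 | 3.538 |
| 10 steps | FAMS-Transformer | 0.922 | 292.494 | 356.016 | 2.780 | 0.928 | 367.050 | 458.140 | 3.362 | 0.675 | 551.348 | 740.735 | 5.149 |
| 10 steps | Autoformer | 0.748 | 504.609 | 641.203 | 4.783 | 0.769 | 655.575 | 821.125 | 6.061 | 0.388 | 798.000 | 1017.013 | 7.485 |
| 10 steps | DLinear | 0.801 | 437.441 | 569.832 | 3.906 | 0.847 | 505.340 | 667.032 | 4.470 | 0.472 | 708.593 | 945.138 | 6.590 |
| 10 steps | FEDformer | 0.789 | 491.381 | 586.675 | 4.776 | 0.788 | 651.944 | 786.326 | 6.037 | 0.353 | 828.403 | 1046.204 | 7.672 |
| 10 steps | FiLM | 0.900 | 344.525 | 404.280 | 3.277 | 0.902 | 431.014 | 534.146 | 3.950 | 0.559 | 649.633 | 863.137 | 6.052 |
| 10 steps | FreTS | 0.895 | 345.946 | 413.595 | 3.370 | 0.935 | 359.606 | 434.920 | 3.433 | 0.732 | 502.619 | 673.147 | 4.749 |
| 10 steps | Informer | −0.561 | 1269.006 | 1595.413 | 13.009 | −0.133 | 1609.074 | 1817.937 | 15.992 | −0.869 | 1639.881 | 1777.434 | 16.180 |
| 10 steps | Koopa | 0.893 | 359.707 | 418.535 | 3.442 | 0.911 | 418.694 | 510.793 | 3.851 | 0.685 | 532.905 | 729.623 | 5.002 |
| 10 steps | LightTS | 0.903 | 325.625 | 398.016 | 3.127 | 0.935 | 356.668 | 436.252 | 3.335 | 0.765 | 469.852 | 629.868 | 4.439 |
| 10 steps | LSTM | −0.876 | 1460.096 | 1749.128 | 14.855 | −0.237 | 1654.682 | 1899.523 | 16.596 | −0.907 | 1603.532 | 1795.660 | 15.951 |
| 10 steps | PatchTST | 0.920 | 292.083 | 360.951 | 2.769 | 0.926 | 367.631 | 463.526 | 3.361 | 0.659 | 558.828 | 759.084 | 5.229 |
| 10 steps | SVR | −0.283 | 1109.155 | 1446.286 | 11.378 | 0.046 | 1396.788 | 1668.117 | 13.572 | −0.140 | 1114.316 | 1388.356 | 11.188 |
| 10 steps | TiDE | 0.878 | 380.086 | 445.691 | 3.640 | 0.883 | 473.839 | 583.698 | 4.372 | 0.491 | 705.263 | 927.635 | 6.538 |
| 10 steps | TimesNet | 0.883 | 364.969 | 437.338 | 3.432 | 0.879 | 489.904 | 594.911 | 4.460 | 0.362 | 790.474 | 1038.242 | 7.324 |
| 10 steps | Transformer | −0.751 | 1412.339 | 1689.807 | 14.397 | −0.130 | 1558.622 | 1815.284 | 15.760 | −0.900 | 1630.318 | 1792.289 | 16.164 |
| 10 steps | iTransformer | 0.920 | 300.025 | 361.479 | 2.856 | 0.928 | 372.024 | 458.590 | 3.421 | 0.667 | 537.778 | 749.907 | 5.025 |
| 15 steps | FAMS-Transformer | 0.911 | 348.887 | 422.726 | 3.291 | 0.867 | 468.858 | 575.951 | 4.351 | 0.418 | 701.957 | 919.613 | 6.531 |
| 15 steps | Autoformer | 0.801 | 507.611 | 630.895 | 4.894 | 0.705 | 720.542 | 858.858 | 6.769 | 0.084 | 958.787 | 1153.788 | 8.915 |
| 15 steps | DLinear | 0.851 | 458.184 | 546.659 | 4.450 | 0.801 | 591.977 | 704.881 | 5.707 | 0.312 | 804.844 | 999.730 | 7.520 |
| 15 steps | FEDformer | 0.779 | 561.669 | 665.343 | 5.380 | 0.716 | 729.219 | 842.973 | 6.928 | 0.008 | 990.288 | 1200.630 | 9.188 |
| 15 steps | FiLM | 0.894 | 380.013 | 459.723 | 3.608 | 0.831 | 536.764 | 650.837 | 4.984 | 0.276 | 803.267 | 1025.636 | 7.446 |
| 15 steps | FreTS | 0.891 | 384.216 | 466.548 | 3.629 | 0.891 | 426.103 | 522.805 | 4.117 | 0.565 | 588.500 | 795.049 | 5.583 |
| 15 steps | Informer | −0.130 | 1195.373 | 1503.123 | 11.909 | −0.236 | 1532.163 | 1758.409 | 15.563 | −0.895 | 1521.537 | 1659.832 | 15.102 |
| 15 steps | Koopa | 0.909 | 346.277 | 425.734 | 3.252 | 0.862 | 475.865 | 587.058 | 4.394 | 0.370 | 753.895 | 956.705 | 7.019 |
| 15 steps | LightTS | 0.866 | 421.844 | 517.397 | 4.031 | 0.867 | 463.731 | 577.593 | 4.516 | 0.600 | 598.149 | 762.161 | 5.693 |
| 15 steps | LSTM | 0.165 | 932.952 | 1292.256 | 9.289 | 0.073 | 1262.565 | 1522.296 | 12.855 | −0.131 | 1084.795 | 1282.162 | 10.924 |
| 15 steps | PatchTST | 0.909 | 348.204 | 426.893 | 3.281 | 0.866 | 472.356 | 579.718 | 4.384 | 0.389 | 724.806 | 942.166 | 6.739 |
| 15 steps | SVR | −0.094 | 1139.857 | 1479.147 | 11.155 | −0.028 | 1343.391 | 1603.507 | 13.513 | −0.195 | 1062.589 | 1317.987 | 10.801 |
| 15 steps | TiDE | 0.889 | 386.739 | 471.022 | 3.674 | 0.831 | 537.309 | 650.413 | 5.004 | 0.242 | 827.944 | 1049.836 | 7.685 |
| 15 steps | TimesNet | 0.890 | 385.343 | 468.168 | 3.638 | 0.809 | 564.981 | 690.563 | 5.249 | 0.303 | 798.640 | 1006.910 | 7.375 |
| 15 steps | Transformer | −0.188 | 1283.304 | 1541.541 | 12.787 | −0.277 | 1513.086 | 1787.276 | 15.609 | −0.969 | 1536.232 | 1692.021 | 15.291 |
| 15 steps | iTransformer | 0.911 | 346.125 | 422.079 | 3.274 | 0.876 | 454.598 | 557.169 | 4.216 | 0.452 | 670.697 | 892.155 | 6.257 |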
Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
Table A4. Volatility-regime analysis results for SMESE (all models, four prediction horizons, four evaluation metrics).
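| Horizon | Model | Low R² | Low MAE | Low RMSE | Low MAPE (%) | Medium R² | Medium MAE | Medium RMSE | Medium MAPE (%) | High R² | High MAE | High RMSE | High MAPE (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 step | FAMS-Transformer | 0.994 | 54.730 | 68.879 | 0.787 | 0.994 | 75.572 | 95.223 | 1.090 | 0.984 | 114.953 | 159.521 | 1.619 |
| 1 step | Autoformer | 0.934 | 186.910 | 237.242 | 2.721 | 0.961 | 194.418 | 249.636 | 2.858 | 0.935 | 244.988 | 320.028 | 3.471 |
| 1 step | DLinear | 0.943 | 185.487 | 218.913 | 2.778 | 0.958 | 213.396 | 261.493 | 3.181 | 0.921 | 279.378 | 353.560 | 3.948 |
| 1 step | FEDformer | 0.932 | 203.167 | 239.970 | 3.002 | 0.956 | 213.950 | 267.748 | 3.131 | 0.906 | 309.724 | 386.404 | 4.315 |
| 1 step | FiLM | 0.933 | 196.380 | 237.524 | 2.879 | 0.946 | 238.969 | 294.740 | 3.446 | 0.878 | 352.055 | 440.772 | 4.917 |
| 1 step | FreTS | 0.990 | 74.027 | 90.127 | 1.097 | 0.993 | 87.277 | 106.631 | 1.301 | 0.983 | 119.697 | 163.896 | 1.691 |
| 1 step | Informer | 0.010 | 705.115 | 916.003 | 11.348 | 0.155 | 989.708 | 1167.999 | 16.193 | 0.340 | 827.158 | 1023.176 | 12.987 |
| 1 step | Koopa | 0.985 | 87.495 | 111.733 | 1.258 | 0.985 | 122.966 | 157.002 | 1.763 | 0.958 | 191.803 | 258.460 | 2.702 |
| 1 step | LightTS | 0.963 | 146.808 | 176.072 | 2.178 | 0.975 | 164.908 | 201.671 | 2.476 | 0.947 | 222.976 | 289.044 | 3.166 |
| 1 step | LSTM | 0.018 | 700.085 | 912.235 | 11.188 | −0.037 | 1086.247 | 1294.147 | 17.579 | 0.195 | 899.190 | 1130.379 | 14.056 |
| 1 step | PatchTST | 0.993 | 59.118 | 74.409 | 0.852 | 0.994 | 73.214 | 95.811 | 1.057 | 0.983 | 121.230 | 163.118 | 1.708 |
| 1 step | SVR | −0.489 | 864.530 | 1123.240 | 13.798 | −0.222 | 1235.546 | 1404.887 | 19.580 | 0.041 | 1007.194 | 1233.896 | 15.160 |
| 1 step | TiDE | 0.967 | 138.479 | 168.477 | 2.008 | 0.975 | 158.329 | 200.947 | 2.276 | 0.939 | 234.881 | 310.664 | 3.304 |
| 1 step | TimesNet | 0.944 | 174.004 | 218.647 | 2.499 | 0.956 | 206.662 | 265.369 | 2.961 | 0.914 | 286.094 | 370.132 | 4.016 |
| 1 step | Transformer | −0.079 | 683.956 | 956.382 | 11.145 | 0.014 | 1040.008 | 1262.038 | 17.011 | 0.223 | 845.373 | 1110.257 | 13.342 |
| 1 step | iTransformer | 0.992 | 65.212 | 81.673 | 0.944 | 0.993 | 83.025 | 105.934 | 1.201 | 0.983 | 121.472 | 166.027 | 1.708 |
| 5 steps | FAMS-Transformer | 0.965 | 128.254 | 161.037 | 1.874 | 0.970 | 174.230 | 214.358 | 2.550 | 0.931 | 249.274 | 338.760 | 3.480 |
| 5 steps | Autoformer | 0.851 | 271.262 | 332.902 | 3.969 | 0.918 | 282.489 | 354.885 | 4.073 | 0.881 | 356.645 | 443.593 | 4.998 |
| 5 steps | DLinear | 0.836 | 275.502 | 348.990 | 3.792 | 0.895 | 297.490 | 400.954 | 4.089 | 0.810 | 445.191 | 561.214 | 6.194 |
| 5 steps | FEDformer | 0.810 | 319.217 | 375.790 | 4.756 | 0.897 | 319.034 | 396.498 | 4.606 | 0.840 | 406.757 | 514.314 | 5.641 |
| 5 steps | FiLM | 0.896 | 231.853 | 277.998 | 3.422 | 0.922 | 283.436 | 344.169 | 4.118 | 0.848 | 395.872 | 501.587 | 5.478 |
| 5 steps | FreTS | 0.944 | 168.828 | 203.070 | 2.534 | 0.963 | 197.032 | 236.314 | 2.968 | 0.935 | 241.759 | 328.747 | 3.421 |
| 5 steps | Informer | −0.963 | 991.441 | 1207.270 | 15.774 | −0.493 | 1323.093 | 1510.494 | 21.461 | −0.064 | 1145.952 | 1328.542 | 17.687 |
| 5 steps | Koopa | 0.947 | 160.308 | 198.438 | 2.347 | 0.960 | 200.379 | 247.918 | 2.914 | 0.914 | 279.096 | 377.574 | 3.881 |
| 5 steps | LightTS | 0.951 | 157.358 | 190.394 | 2.316 | 0.956 | 208.087 | 260.614 | 3.075 | 0.923 | 271.026 | 356.217 | 3.765 |
| 5 steps | LSTM | −0.475 | 873.934 | 1046.509 | 13.789 | −0.180 | 1150.003 | 1342.707 | 18.585 | 0.157 | 1001.016 | 1182.334 | 15.277 |
| 5 steps | PatchTST | 0.965 | 130.459 | 160.600 | 1.903 | 0.968 | 180.034 | 219.441 | 2.642 | 0.926 | 256.696 | 351.019 | 3.594 |
| 5 steps | SVR | −0.534 | 823.451 | 1066.928 | 13.277 | −0.185 | 1183.214 | 1346.032 | 18.778 | 0.087 | 1013.088 | 1230.246 | 14.993 |
| 5 steps | TiDE | 0.874 | 249.235 | 305.506 | 3.647 | 0.895 | 323.581 | 400.289 | 4.647 | 0.769 | 499.815 | 619.300 | 7.011 |
| 5 steps | TimesNet | 0.921 | 197.099 | 241.692 | 2.878 | 0.938 | 243.116 | 306.857 | 3.472 | 0.854 | 377.793 | 491.512 | 5.309 |
| 5 steps | Transformer | −0.371 | 730.660 | 1008.731 | 11.951 | −0.133 | 1095.013 | 1315.705 | 17.943 | 0.214 | 906.332 | 1141.491 | 14.153 |
| 5 steps | iTransformer | 0.964 | 130.004 | 163.841 | 1.891 | 0.969 | 180.486 | 218.347 | 2.642 | 0.931 | 245.568 | 339.128 | 3.430 |
| 10 steps | FAMS-Transformer | 0.931 | 188.787 | 228.249 | 2.745 | 0.953 | 228.761 | 286.945 | 3.301 | 0.844 | 366.679 | 464.328 | 5.178 |
| 10 steps | Autoformer | 0.792 | 325.792 | 396.388 | 4.756 | 0.878 | 373.490 | 460.484 | 5.399 | 0.635 | 559.805 | 710.358 | 7.901 |
| 10 steps | DLinear | 0.825 | 276.953 | 363.673 | 3.794 | 0.885 | 333.855 | 446.775 | 4.495 | 0.753 | 452.091 | 584.552 | 6.401 |
| 10 steps | FEDformer | 0.809 | 319.958 | 379.209 | 4.775 | 0.869 | 381.038 | 476.667 | 5.538 | 0.667 | 555.362 | 678.898 | 7.767 |
| 10 steps | FiLM | 0.914 | 215.268 | 254.324 | 3.137 | 0.936 | 263.214 | 332.497 | 3.783 | 0.791 | 426.476 | 536.997 | 6.032 |
| 10 steps | FreTS | 0.904 | 223.962 | 269.093 | 3.330 | 0.944 | 259.919 | 311.454 | 3.851 | 0.882 | 313.079 | 403.614 | 4.514 |
| 10 steps | Informer | −0.775 | 923.466 | 1157.162 | 14.729 | −0.213 | 1259.793 | 1450.680 | 20.266 | −0.201 | 1131.601 | 1288.675 | 17.545 |
| 10 steps | Koopa | 0.906 | 225.554 | 265.940 | 3.299 | 0.939 | 260.980 | 326.445 | 3.774 | 0.815 | 399.125 | 505.708 | 5.624 |
| 10 steps | LightTS | 0.907 | 216.544 | 265.043 | 3.203 | 0.942 | 264.432 | 318.211 | 3.854 | 0.890 | 301.805 | 390.828 | 4.307 |
| 10 steps | LSTM | −0.569 | 904.879 | 1087.739 | 14.265 | −0.054 | 1176.761 | 1352.345 | 18.800 | 0.023 | 955.828 | 1162.199 | 14.953 |
| 10 steps | PatchTST | 0.930 | 185.873 | 229.504 | 2.710 | 0.953 | 226.338 | 286.880 | 3.264 | 0.835 | 376.798 | 478.145 | 5.326 |
| 10 steps | SVR | −0.380 | 788.079 | 1020.376 | 12.660 | −0.025 | 1171.246 | 1333.410 | 18.128 | 0.083 | 927.306 | 1125.889 | 14.161 |
| 10 steps | TiDE | 0.895 | 239.839 | 281.845 | 3.515 | 0.924 | 293.285 | 363.679 | 4.261 | 0.752 | 461.785 | 585.145 | 6.475 |
| 10 steps | TimesNet | 0.896 | 236.105 | 279.546 | 3.433 | 0.922 | 286.359 | 368.106 | 4.054 | 0.680 | 530.108 | 665.642 | 7.478 |
| 10 steps | Transformer | −0.549 | 806.564 | 1081.041 | 13.081 | −0.096 | 1158.761 | 1379.086 | 18.783 | −0.016 | 975.529 | 1185.326 | 15.356 |
| 10 steps | iTransformer | 0.930 | 191.650 | 229.942 | 2.787 | 0.951 | 231.745 | 290.634 | 3.347 | 0.845 | 358.877 | 463.212 | 5.081 |
| 15 steps | FAMS-Transformer | 0.930 | 215.187 | 261.455 | 3.134 | 0.917 | 284.316 | 353.797 | 4.099 | 0.737 | 461.052 | 575.596 | 6.489 |
| 15 steps | Autoformer | 0.831 | 321.766 | 405.884 | 4.812 | 0.818 | 409.024 | 525.421 | 5.925 | 0.525 | 643.416 | 773.606 | 9.082 |
| 15 steps | DLinear | 0.871 | 300.946 | 354.740 | 4.517 | 0.862 | 369.175 | 456.341 | 5.617 | 0.695 | 511.939 | 619.716 | 7.285 |
| 15 steps | FEDformer | 0.807 | 359.110 | 434.725 | 5.328 | 0.810 | 453.760 | 536.388 | 6.709 | 0.514 | 659.922 | 782.896 | 9.288 |
| 15 steps | FiLM | 0.914 | 237.474 | 289.537 | 3.486 | 0.896 | 317.106 | 396.690 | 4.594 | 0.667 | 528.816 | 647.758 | 7.432 |
| 15 steps | FreTS | 0.900 | 253.271 | 312.015 | 3.649 | 0.917 | 291.983 | 354.687 | 4.313 | 0.813 | 378.851 | 485.797 | 5.514 |
| 15 steps | Informer | −0.142 | 809.488 | 1056.612 | 12.782 | −0.142 | 1137.856 | 1315.131 | 18.342 | −0.073 | 1025.637 | 1163.087 | 15.961 |
| 15 steps | Koopa | 0.930 | 210.591 | 261.612 | 3.056 | 0.914 | 282.808 | 360.444 | 4.066 | 0.703 | 505.073 | 612.316 | 7.130 |
| 15 steps | LightTS | 0.876 | 285.651 | 348.618 | 4.209 | 0.897 | 325.478 | 395.530 | 4.880 | 0.824 | 387.341 | 471.017 | 5.636 |
| 15 steps | LSTM | 0.154 | 666.577 | 909.225 | 10.409 | 0.066 | 1028.068 | 1189.068 | 16.221 | 0.217 | 848.023 | 993.697 | 12.962 |
| 15 steps | PatchTST | 0.928 | 217.080 | 265.628 | 3.173 | 0.913 | 292.222 | 363.680 | 4.221 | 0.721 | 479.351 | 593.188 | 6.746 |
| 15 steps | SVR | −0.175 | 835.873 | 1071.463 | 12.966 | −0.091 | 1126.431 | 1285.451 | 17.721 | 0.059 | 897.988 | 1089.131 | 13.925 |
| 15 steps | TiDE | 0.910 | 238.887 | 296.152 | 3.512 | 0.896 | 318.332 | 397.369 | 4.616 | 0.653 | 542.278 | 661.264 | 7.629 |
| 15 steps | TimesNet | 0.909 | 246.469 | 298.209 | 3.634 | 0.860 | 372.991 | 461.113 | 5.429 | 0.630 | 564.132 | 683.108 | 7.861 |
| 15 steps | Transformer | −0.079 | 760.487 | 1026.981 | 12.125 | −0.153 | 1102.827 | 1321.512 | 17.960 | 0.017 | 929.960 | 1113.407 | 14.648 |
| 15 steps | iTransformer | 0.928 | 218.934 | 265.972 | 3.194 | 0.922 | 276.768 | 342.902 | 3.990 | 0.753 | 443.067 | 558.361 | 6.249 |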
Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
Table A5. Volatility-regime analysis results for SP500 (all models, four prediction horizons, four evaluation metrics).
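| Horizon | Model | Low R² | Low MAE | Low RMSE | Low MAPE (%) | Medium R² | Medium MAE | Medium RMSE | Medium MAPE (%) | High R² | High MAE | High RMSE | High MAPE (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 step | FAMS-Transformer | 0.998 | 32.431 | 39.666 | 0.588 | 0.997 | 40.844 | 50.970 | 0.757 | 0.993 | 56.103 | 75.828 | 1.214 |
| 1 step | Autoformer | 0.989 | 74.216 | 92.231 | 1.353 | 0.991 | 73.667 | 90.108 | 1.374 | 0.980 | 94.139 | 126.806 | 1.994 |
| 1 step | DLinear | 0.974 | 127.508 | 138.246 | 2.296 | 0.986 | 97.607 | 113.218 | 1.823 | 0.965 | 136.110 | 167.858 | 2.932 |
| 1 step | FEDformer | 0.984 | 92.261 | 108.664 | 1.679 | 0.990 | 78.154 | 94.577 | 1.489 | 0.973 | 111.171 | 148.100 | 2.393 |
| 1 step | FiLM | 0.982 | 102.759 | 114.170 | 1.891 | 0.988 | 89.132 | 104.136 | 1.706 | 0.954 | 152.704 | 192.220 | 3.327 |
| 1 step | FreTS | 0.997 | 41.884 | 50.343 | 0.737 | 0.996 | 48.120 | 59.627 | 0.882 | 0.991 | 65.128 | 84.934 | 1.382 |
| 1 step | Informer | −2.691 | 1471.585 | 1654.355 | 24.891 | −1.697 | 1345.167 | 1576.799 | 22.912 | −0.359 | 844.309 | 1046.050 | 16.365 |
| 1 step | Koopa | 0.986 | 76.329 | 100.215 | 1.406 | 0.991 | 70.771 | 91.380 | 1.317 | 0.973 | 110.876 | 146.137 | 2.425 |
| 1 step | LightTS | 0.911 | 222.279 | 256.790 | 3.754 | 0.943 | 186.551 | 228.755 | 3.211 | 0.955 | 148.631 | 191.361 | 3.025 |
| 1 step | LSTM | −2.783 | 1535.223 | 1674.817 | 26.309 | −1.787 | 1438.691 | 1602.965 | 25.100 | −0.690 | 1081.966 | 1166.537 | 22.454 |
| 1 step | PatchTST | 0.997 | 38.079 | 46.638 | 0.691 | 0.996 | 49.287 | 62.270 | 0.917 | 0.990 | 69.677 | 90.801 | 1.512 |
| 1 step | SVR | −8.920 | 2527.457 | 2712.290 | 43.577 | −6.330 | 2363.952 | 2599.587 | 41.388 | −3.630 | 1703.820 | 1931.044 | 34.200 |
| 1 step | TiDE | 0.990 | 72.055 | 86.369 | 1.305 | 0.992 | 70.789 | 87.407 | 1.328 | 0.975 | 106.213 | 143.068 | 2.298 |
| 1 step | TimesNet | 0.990 | 71.644 | 86.670 | 1.285 | 0.992 | 66.995 | 84.138 | 1.235 | 0.981 | 94.225 | 124.024 | 2.046 |
| 1 step | Transformer | −2.743 | 1469.539 | 1666.071 | 24.764 | −1.781 | 1359.824 | 1601.324 | 23.122 | −0.388 | 819.271 | 1057.223 | 15.590 |
| 1 step | iTransformer | 0.997 | 34.734 | 44.078 | 0.627 | 0.996 | 46.873 | 61.635 | 0.871 | 0.989 | 68.935 | 92.217 | 1.468 |
| 5 steps | FAMS-Transformer | 0.990 | 67.941 | 84.062 | 1.239 | 0.990 | 76.811 | 96.296 | 1.427 | 0.977 | 110.516 | 140.746 | 2.351 |
| 5 steps | Autoformer | 0.986 | 85.508 | 103.305 | 1.571 | 0.986 | 91.298 | 112.879 | 1.715 | 0.959 | 147.768 | 187.696 | 3.188 |
| 5 steps | DLinear | 0.790 | 383.949 | 395.014 | 6.919 | 0.884 | 304.002 | 325.236 | 5.650 | 0.894 | 254.523 | 301.356 | 5.361 |
| 5 steps | FEDformer | 0.974 | 117.780 | 137.936 | 2.179 | 0.984 | 95.667 | 118.876 | 1.845 | 0.951 | 155.781 | 204.458 | 3.351 |
| 5 steps | FiLM | 0.985 | 91.668 | 106.212 | 1.703 | 0.987 | 88.665 | 106.863 | 1.705 | 0.950 | 157.481 | 206.966 | 3.436 |
| 5 steps | FreTS | 0.991 | 68.208 | 82.767 | 1.241 | 0.989 | 77.053 | 98.776 | 1.435 | 0.975 | 114.869 | 147.653 | 2.442 |
| 5 steps | Informer | −2.931 | 1554.858 | 1708.511 | 26.520 | −1.860 | 1406.096 | 1611.600 | 24.275 | −0.485 | 941.826 | 1128.144 | 18.353 |
| 5 steps | Koopa | 0.987 | 84.013 | 99.572 | 1.551 | 0.988 | 87.502 | 106.089 | 1.649 | 0.970 | 124.956 | 161.406 | 2.657 |
| 5 steps | LightTS | 0.853 | 298.182 | 330.504 | 5.120 | 0.914 | 236.384 | 279.536 | 4.119 | 0.935 | 182.373 | 235.720 | 3.639 |
| 5 steps | LSTM | −3.842 | 1802.208 | 1896.148 | 31.395 | −2.522 | 1668.475 | 1788.173 | 29.786 | −1.218 | 1292.667 | 1378.581 | 26.696 |
| 5 steps | PatchTST | 0.989 | 77.889 | 90.197 | 1.427 | 0.988 | 86.146 | 104.701 | 1.615 | 0.969 | 131.691 | 164.135 | 2.814 |
| 5 steps | SVR | −9.084 | 2560.671 | 2736.250 | 44.209 | −6.400 | 2368.778 | 2592.187 | 41.771 | −3.620 | 1758.898 | 1989.762 | 35.046 |
| 5 steps | TiDE | 0.959 | 158.984 | 174.170 | 2.925 | 0.979 | 113.808 | 137.255 | 2.165 | 0.952 | 158.402 | 203.424 | 3.379 |
| 5 steps | TimesNet | 0.984 | 92.933 | 109.164 | 1.704 | 0.988 | 86.267 | 104.766 | 1.649 | 0.957 | 151.805 | 192.358 | 3.305 |
| 5 steps | Transformer | −2.134 | 1325.352 | 1525.522 | 22.197 | −1.276 | 1193.691 | 1437.435 | 20.218 | −0.141 | 785.927 | 988.717 | 15.030 |
| 5 steps | iTransformer | 0.990 | 73.623 | 86.471 | 1.348 | 0.989 | 79.318 | 99.762 | 1.493 | 0.973 | 120.548 | 152.728 | 2.571 |
| 10 steps | FAMS-Transformer | 0.981 | 97.854 | 117.873 | 1.793 | 0.981 | 106.693 | 130.314 | 2.012 | 0.958 | 148.962 | 188.168 | 3.154 |
| 10 steps | Autoformer | 0.958 | 133.545 | 175.316 | 2.494 | 0.972 | 124.816 | 158.728 | 2.389 | 0.924 | 202.375 | 254.095 | 4.382 |
| 10 steps | DLinear | 0.707 | 453.326 | 465.480 | 8.130 | 0.839 | 356.641 | 382.625 | 6.553 | 0.869 | 279.530 | 332.354 | 5.878 |
| 10 steps | FEDformer | 0.964 | 146.382 | 163.595 | 2.702 | 0.979 | 116.316 | 138.574 | 2.227 | 0.931 | 185.279 | 241.987 | 3.978 |
| 10 steps | FiLM | 0.969 | 138.166 | 151.434 | 2.536 | 0.981 | 113.520 | 131.884 | 2.154 | 0.944 | 178.655 | 216.944 | 3.829 |
| 10 steps | FreTS | 0.974 | 123.108 | 138.694 | 2.222 | 0.976 | 124.307 | 146.334 | 2.331 | 0.950 | 166.078 | 204.873 | 3.508 |
| 10 steps | Informer | −3.380 | 1667.124 | 1799.307 | 28.619 | −2.163 | 1517.714 | 1696.017 | 26.405 | −0.732 | 1050.928 | 1210.690 | 20.840 |
| 10 steps | Koopa | 0.976 | 116.990 | 132.396 | 2.162 | 0.982 | 110.780 | 129.433 | 2.095 | 0.951 | 164.281 | 203.511 | 3.516 |
| 10 steps | LightTS | 0.702 | 433.342 | 469.584 | 7.459 | 0.821 | 350.752 | 403.935 | 6.094 | 0.887 | 240.465 | 308.934 | 4.800 |
| 10 steps | LSTM | −4.109 | 1851.907 | 1943.212 | 32.308 | −2.711 | 1718.861 | 1837.122 | 30.665 | −1.413 | 1365.212 | 1429.046 | 28.672 |
| 10 steps | PatchTST | 0.975 | 124.142 | 136.901 | 2.280 | 0.983 | 108.065 | 126.048 | 2.051 | 0.948 | 172.991 | 209.800 | 3.722 |
| 10 steps | SVR | −9.416 | 2613.387 | 2774.536 | 45.231 | −6.667 | 2433.625 | 2640.578 | 42.972 | −3.756 | 1784.885 | 2006.349 | 35.709 |
| 10 steps | TiDE | 0.937 | 199.273 | 215.801 | 3.657 | 0.973 | 132.912 | 157.467 | 2.564 | 0.910 | 210.025 | 276.006 | 4.535 |
| 10 steps | TimesNet | 0.973 | 124.173 | 142.013 | 2.265 | 0.980 | 112.987 | 133.637 | 2.141 | 0.947 | 172.453 | 211.214 | 3.694 |
| 10 steps | Transformer | −2.660 | 1469.980 | 1644.816 | 24.860 | −1.649 | 1333.538 | 1552.163 | 22.812 | −0.391 | 903.518 | 1084.893 | 17.632 |
| 10 steps | iTransformer | 0.972 | 128.440 | 143.989 | 2.352 | 0.980 | 114.595 | 136.499 | 2.169 | 0.947 | 172.102 | 212.383 | 3.691 |
| 15 steps | FAMS-Transformer | 0.950 | 179.595 | 197.206 | 3.292 | 0.967 | 144.479 | 171.944 | 2.719 | 0.937 | 190.317 | 234.345 | 4.000 |
| 15 steps | Autoformer | 0.941 | 190.695 | 214.031 | 3.513 | 0.969 | 138.345 | 167.586 | 2.674 | 0.912 | 214.287 | 276.513 | 4.617 |
| 15 steps | DLinear | 0.832 | 342.262 | 359.967 | 6.078 | 0.915 | 243.808 | 276.848 | 4.422 | 0.921 | 210.217 | 262.479 | 4.376 |
| 15 steps | FEDformer | 0.938 | 197.376 | 218.391 | 3.620 | 0.971 | 133.234 | 162.904 | 2.544 | 0.921 | 198.774 | 262.079 | 4.273 |
| 15 steps | FiLM | 0.956 | 169.649 | 183.220 | 3.117 | 0.977 | 124.321 | 145.303 | 2.389 | 0.929 | 201.868 | 248.381 | 4.309 |
| 15 steps | FreTS | 0.941 | 198.525 | 213.625 | 3.555 | 0.957 | 167.856 | 197.079 | 3.104 | 0.929 | 203.719 | 248.573 | 4.281 |
| 15 steps | Informer | −3.766 | 1808.125 | 1916.060 | 31.368 | −2.581 | 1651.082 | 1795.127 | 29.168 | −1.063 | 1184.813 | 1339.926 | 23.453 |
| 15 steps | Koopa | 0.966 | 147.199 | 162.236 | 2.724 | 0.978 | 120.298 | 140.775 | 2.297 | 0.938 | 190.764 | 233.209 | 4.055 |
| 15 steps | LightTS | 0.146 | 732.377 | 811.007 | 12.443 | 0.408 | 629.090 | 730.007 | 10.801 | 0.702 | 391.236 | 509.185 | 7.431 |
| 15 steps | LSTM | −4.540 | 1992.091 | 2065.816 | 35.027 | −3.201 | 1843.494 | 1944.406 | 33.170 | −1.607 | 1428.461 | 1506.161 | 29.547 |
| 15 steps | PatchTST | 0.963 | 153.844 | 168.016 | 2.834 | 0.978 | 120.247 | 140.662 | 2.302 | 0.936 | 196.898 | 236.553 | 4.206 |
| 15 steps | SVR | −9.247 | 2655.957 | 2809.624 | 46.118 | −6.884 | 2470.190 | 2663.770 | 43.837 | −3.903 | 1841.699 | 2065.504 | 36.551 |
| 15 steps | TiDE | 0.936 | 205.664 | 222.264 | 3.763 | 0.968 | 143.936 | 171.008 | 2.744 | 0.925 | 196.056 | 255.986 | 4.174 |
| 15 steps | TimesNet | 0.961 | 156.808 | 173.800 | 2.895 | 0.977 | 124.534 | 145.394 | 2.392 | 0.932 | 193.550 | 243.284 | 4.125 |
| 15 steps | Transformer | −2.283 | 1412.301 | 1590.272 | 23.842 | −1.466 | 1265.148 | 1489.725 | 21.586 | −0.243 | 842.167 | 1040.133 | 16.091 |
| 15 steps | iTransformer | 0.964 | 153.227 | 167.289 | 2.824 | 0.977 | 123.846 | 145.073 | 2.358 | 0.937 | 191.974 | 233.472 | 4.069 |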
Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.

References

1. He, Q.; Liang, Y.; Lin, Y.; Pan, D.; Yue, Y. Committee of Multi-Scale Nonlinear Learning Frameworks for Accurate Stock Price Forecasting. Eng. Appl. Artif. Intell. 2025, 162, 112325.
2. Zhang, C.; Sjarif, N.N.A.; Ibrahim, R. Deep Learning Models for Price Forecasting of Financial Time Series: A Review of Recent Advancements: 2020–2022. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2024, 14, e1519.
3. Black, F.; Scholes, M. The Pricing of Options and Corporate Liabilities. J. Political Econ. 1973, 81, 637–654.
4. Merton, R.C. Theory of Rational Option Pricing. Bell J. Econ. Manag. Sci. 1973, 4, 141–183.
5. Heston, S.L. A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options. Rev. Financ. Stud. 1993, 6, 327–343.
6. Cont, R. Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues. Quant. Financ. 2001, 1, 223.
7. Li, Z.; Liao, Y.; Hu, B.; Ni, L.; Lu, Y. A Financial Deep Learning Framework: Predicting the Values of Financial Time Series with ARIMA and LSTM. Int. J. Web Serv. Res. 2022, 19, 1–15.
8. Zhang, J.; Liu, H.; Bai, W.; Li, X. A Hybrid Approach of Wavelet Transform, ARIMA and LSTM Model for the Share Price Index Futures Forecasting. N. Am. J. Econ. Financ. 2024, 69, 102022.
9. Wang, S.-W.; Huang, C.-Y. A Hybrid SVR-Based Framework for Cryptocurrency Price Forecasting and Strategy Backtesting. Appl. Artif. Intell. 2026, 40, 2612793.
10. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
11. Wang, C.; Chen, Y.; Zhang, S.; Zhang, Q. Stock Market Index Prediction Using Deep Transformer Model. Expert Syst. Appl. 2022, 208, 118128.
12. Gao, M.-C. StockCI: A Hybrid Model Integrating CEEMDAN and Informer for Enhanced Long-Term Stock Price Forecasting. Complex Intell. Syst. 2025, 12, 74.
13. Su, J.; Lau, R.Y.K.; Du, Y.; Yu, J.; Zhang, H. A Novel Hybrid Framework for Stock Price Prediction Integrating Adaptive Signal Decomposition and Multi-Scale Feature Extraction. Appl. Sci. 2025, 15, 12450.
14. Ge, S.; Lin, A. An Adaptive Selection Decomposition Hybrid Model for Stock Time Series Forecasting. Nonlinear Dyn. 2025, 113, 4647–4669.
15. Minh, H.B.; An, N.H.; Tuan, N.M. Multi-Step-Ahead Time Series Forecasting Based on CEEMDAN Decomposition and Temporal Convolutional Networks. In Proceedings of the 2022 International Conference on Advanced Computing and Analytics (ACOMPA), Ho Chi Minh City, Vietnam, 21–23 November 2022; IEEE: Ho Chi Minh City, Vietnam, 2022; pp. 54–59.
16. Torres, M.E.; Colominas, M.A.; Schlotthauer, G.; Flandrin, P. A Complete Ensemble Empirical Mode Decomposition with Adaptive Noise. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; IEEE: Prague, Czech Republic, 2011; pp. 4144–4147.
17. Zhang, Y.; Chen, Y.; Qi, Z.; Wang, S.; Zhang, J.; Wang, F. A Hybrid Forecasting System with Complexity Identification and Improved Optimization for Short-Term Wind Speed Prediction. Energy Convers. Manag. 2022, 270, 116221.
18. Alipour, M.; Aghaei, J.; Norouzi, M.; Niknam, T.; Hashemi, S.; Lehtonen, M. A Novel Electrical Net-Load Forecasting Model Based on Deep Neural Networks and Wavelet Transform Integration. Energy 2020, 205, 118106.
19. Chen, M.-Y.; Chen, B.-T. Online Fuzzy Time Series Analysis Based on Entropy Discretization and a Fast Fourier Transform. Appl. Soft Comput. 2014, 14, 156–166.
20. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. arXiv 2022, arXiv:2210.02186.
21. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency Enhanced Decomposed Transformer for Long-Term Series Forecasting. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2022.
22. Yi, K.; Zhang, Q.; Fan, W.; Wang, S.; Wang, P.; He, H.; Lian, D.; An, N.; Cao, L.; Niu, Z. Frequency-Domain MLPs Are More Effective Learners in Time Series Forecasting. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Neural Information Processing Systems Foundation, Inc.: South Lake Tahoe, NV, USA, 2023.
23. Wang, M.; Wang, H.; Zhang, F. Correctformer: A Transformer Architecture for Correcting Periodic Drift in Time-Series Forecasting. Neural Netw. 2026, 196, 108375.
24. Tang, Z.; Ji, T.; Kang, J.; Huang, Y.; Tang, W. Learning Global and Local Features of Power Load Series through Transformer and 2D-CNN: An Image-Based Multi-Step Forecasting Approach Incorporating Phase Space Reconstruction. Appl. Energy 2025, 378, 124786.
25. Li, S.; Jin, X.; Xuan, Y.; Zhou, X.; Chen, W.; Wang, Y.-X.; Yan, X. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019.
26. Shehzad, H.T.; Anwar, M.A.; Razzaq, M. A Comparative Predicting Stock Prices Using Heston and Geometric Brownian Motion Models. arXiv 2023, arXiv:2302.07796.
27. Box, G.E.P.; Pierce, D.A. Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models. J. Am. Stat. Assoc. 1970, 65, 1509–1526.
28. Li, W.-J.; Zhang, D.-Q. GARCH-FIS: A Hybrid Forecasting Model with Dynamic Volatility-Driven Parameter Adaptation. arXiv 2026, arXiv:2603.14793.
29. Beniwal, M. Adaptive Weighted Genetic Algorithm-Optimized SVR for Robust Long-Term Forecasting of Global Stock Indices for Investment Decisions. arXiv 2025, arXiv:2512.15113.
30. Seabe, P.L.; Moutsinga, C.R.B.; Pindza, E. Forecasting Cryptocurrency Prices Using LSTM, GRU, and Bi-Directional LSTM: A Deep Learning Approach. Fractal Fract. 2023, 7, 203.
31. Büyükşahin, Ü.Ç.; Ertekin, Ş. Improving Forecasting Accuracy of Time Series Data Using a New ARIMA-ANN Hybrid Method and Empirical Mode Decomposition. Neurocomputing 2019, 361, 151–163.
32. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-Stationary Time Series Analysis. Proc. R. Soc. Lond. A 1998, 454, 903–995.
33. Wu, Z.; Huang, N.E. Ensemble Empirical Mode Decomposition: A Noise-Assisted Data Analysis Method. Adv. Adapt. Data Anal. 2009, 1, 1–41.
34. Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544.
35. Song, D.; Chung Baek, A.M.; Kim, N. Forecasting Stock Market Indices Using Padding-Based Fourier Transform Denoising and Time Series Deep Learning Models. IEEE Access 2021, 9, 83786–83796.
36. Jin, Z.; Yang, Y.; Liu, Y. Stock Closing Price Prediction Based on Sentiment Analysis and LSTM. Neural Comput. Appl. 2020, 32, 9713–9729.
37. Yemets, K.; Izonin, I.; Dronyuk, I. Time Series Forecasting Model Based on the Adapted Transformer Neural Network and FFT-Based Features Extraction. Sensors 2025, 25, 652.
38. Zhang, Q.; Yang, P.; Wen, H.; Li, X.; Wang, H.; Sun, F.; Song, Z.; Lai, Z.; Ma, R.; Han, R.; et al. Beyond the Time Domain: Recent Advances on Frequency Transforms in Time Series Analysis. arXiv 2025, arXiv:2504.07099.
39. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
40. Kitaev, N.; Kaiser, Ł.; Levskaya, A. Reformer: The Efficient Transformer. arXiv 2020, arXiv:2001.04451.
41. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; AAAI: Washington, DC, USA, 2021.
42. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems 34 (NeurIPS 2021), Online, 6–14 December 2021.
43. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A Time Series Is Worth 64 Words: Long-Term Forecasting with Transformers. arXiv 2022, arXiv:2211.14730.
44. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017.
45. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2018.
46. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2019.
47. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
48. Vaziri, J.; Farid, D.; Nazemi Ardakani, M.; Hosseini Bamakan, S.M.; Shahlaei, M. A Time-Varying Stock Portfolio Selection Model Based on Optimized PSO-BiLSTM and Multi-Objective Mathematical Programming under Budget Constraints. Neural Comput. Appl. 2023, 35, 18445–18470.
49. Wu, J.M.-T.; Li, Z.; Herencsar, N.; Vo, B.; Lin, J.C.-W. A Graph-Based CNN-LSTM Stock Price Prediction Algorithm with Leading Indicators. Multimed. Syst. 2023, 29, 1751–1770.
50. Li, X.; Sun, Y. Stock Intelligent Investment Strategy Based on Support Vector Machine Parameter Optimization Algorithm. Neural Comput. Appl. 2020, 32, 1765–1775.
51. Tu, X.; Fu, L.; Wang, Q. Carbon Price Prediction Based on Multidimensional Association Rules and Optimized Multi-Factor LSTM Model. Energy 2025, 329, 136768.
52. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024.
53. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; AAAI: Washington, DC, USA, 2023.
54. Liu, Y.; Li, C.; Wang, J.; Long, M. Koopa: Learning Non-Stationary Time Series Dynamics with Koopman Predictors. In Proceedings of the Advances in Neural Information Processing Systems 36 (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023.
55. Zhang, T.; Zhang, Y.; Cao, W.; Bian, J.; Yi, X.; Zheng, S.; Li, J. Less Is More: Fast Multivariate Time Series Forecasting with Light Sampling-Oriented MLP Structures. arXiv 2022, arXiv:2207.01186.
56. Zhou, T.; Ma, Z.; Wang, X.; Wen, Q.; Sun, L.; Yao, T.; Yin, W.; Jin, R. FiLM: Frequency Improved Legendre Memory Model for Long-Term Time Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems 35 (NeurIPS 2022), New Orleans, LA, USA, 28 November–9 December 2022.
57. Das, A.; Kong, W.; Leach, A.; Mathur, S.; Sen, R.; Yu, R. Long-Term Forecasting with TiDE: Time-Series Dense Encoder. arXiv 2023, arXiv:2304.08424.
Figure 1. Multi-head attention mechanism structure diagram.
Figure 2. Improved Transformer encoder architecture.
Figure 3. Simplified schematic of depthwise separable convolution.
Figure 4. Flowchart of the FAMS-Transformer hybrid model.
Figure 5. Daily closing price trends of the three stock indices.
Figure 6. Comparison of error metrics of different models on the three datasets for one-step forecasting.
Figure 7. Fitting curves of the predicted and true values of each model on the SSE dataset under different prediction steps.
Figure 8. Fitting curves of the predicted and true values of each model on the SZSE dataset under different prediction steps.
Figure 9. Fitting curves of the predicted and true values of each model on the SME100 dataset under different prediction steps.
Figure 10. Fitting curves of the predicted and true values of each model on the S&P 500 dataset under different prediction steps.
Table 1. Descriptive statistics of SSE, SZSE, and SME 100.

| Index | Count | Max | Min | Mean | Standard Deviation | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| SSE | 2661 | 5166.350 | 2003.490 | 3137.932 | 404.876 | 0.344 | 3.679 |
| SZSE | 2661 | 18,098.270 | 7089.440 | 10,926.937 | 2038.115 | 0.533 | 0.009 |
| SME100 | 2661 | 11,996.520 | 4465.450 | 7131.459 | 1423.868 | 0.537 | −0.203 |
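Table 1 does not state which estimators were used for the standard deviation, skewness, and kurtosis. The following is a minimal sketch that computes the same summary statistics for one closing-price series with NumPy and SciPy, assuming the sample standard deviation (ddof=1), SciPy's default moment-based skewness, and excess (Fisher) kurtosis, under which a normal distribution scores 0 (pass `fisher=False` for raw kurtosis, which adds 3). The helper name `describe_series` and the synthetic input are illustrative, not the authors' code or data.

```python
import numpy as np
from scipy import stats


def describe_series(close):
    """Table 1-style summary statistics for one closing-price series (hypothetical helper)."""
    x = np.asarray(close, dtype=float)
    return {
        "count": int(x.size),
        "max": float(x.max()),
        "min": float(x.min()),
        "mean": float(x.mean()),
        # Assumption: sample standard deviation (ddof=1).
        "std": float(x.std(ddof=1)),
        # SciPy default: biased moment estimator of skewness.
        "skewness": float(stats.skew(x)),
        # fisher=True reports excess kurtosis; a normal distribution gives 0.
        "kurtosis": float(stats.kurtosis(x, fisher=True)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic random walk standing in for a 2661-day index series (not the SSE data).
    prices = 3000 + np.cumsum(rng.normal(0, 20, size=2661))
    for name, value in describe_series(prices).items():
        print(f"{name:>9}: {value:.3f}")
```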
Table 2. Specific names and symbols of the input features.

| Classification | Category | Indicator Name |
|---|---|---|
| Market Trading Indicators | Price-Volume Indicators | Open price, Close price, Price change, Volume |
| | Technical Indicators | P/E ratio (TTM), P/B ratio (MRQ), P/S ratio (TTM), P/CF ratio (TTM), 5-day/10-day Moving Average, MACD, Momentum Indicator, Bollinger Bands, Williams Variable Accumulation/Distribution |
| External Factor Indicators | Commodity Indicators | Carbon trading prices (Beijing, Shanghai, Shenzhen); Crude oil prices (WTI, Brent); Gold prices (Shanghai Gold Exchange closing price, London spot gold closing price) |
| | Global Capital Market Indicators | Global stock indices (S&P 500, Dow Jones Industrial Average, Hang Seng Index, Nikkei 225); Foreign exchange rates (EUR/CNY, JPY/CNY, HKD/CNY) |
| | Macroeconomic Variables | Money supply and inflation (China M2 year-on-year growth rate (monthly), China CPI month-on-month, China CPI year-on-year, China CPI consumer goods year-on-year); China goods export growth rate (monthly); Interest rates and credit (China 10-year and 1-year government bond yield spread (monthly), Ratio of China net bond issuance to year-end market capitalization (monthly)) |
| | Sentiment Indicators | VIX closing price |
All data are sourced from the WIND database.
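Table 2 lists moving averages, MACD, a momentum indicator, and Bollinger Bands among the technical inputs but does not give their window lengths. The sketch below computes these from closing prices with pandas using common textbook settings (5/10-day moving averages, MACD from 12/26-day exponential averages with a 9-day signal line, 10-day momentum, 20-day Bollinger Bands at ±2 standard deviations). These parameters and the function name `technical_indicators` are assumptions, not the authors' configuration. Williams Variable Accumulation/Distribution is omitted because it also needs open, high, and low prices.

```python
import numpy as np
import pandas as pd


def technical_indicators(close: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: a subset of the Table 2 technical indicators from closing prices."""
    out = pd.DataFrame(index=close.index)
    out["ma5"] = close.rolling(5).mean()
    out["ma10"] = close.rolling(10).mean()
    # MACD line and signal line from exponential moving averages (assumed 12/26/9 spans).
    ema12 = close.ewm(span=12, adjust=False).mean()
    ema26 = close.ewm(span=26, adjust=False).mean()
    out["macd"] = ema12 - ema26
    out["macd_signal"] = out["macd"].ewm(span=9, adjust=False).mean()
    # Momentum: price change over an assumed 10-day look-back.
    out["momentum10"] = close - close.shift(10)
    # Bollinger Bands: 20-day mean +/- 2 rolling (population) standard deviations.
    mid = close.rolling(20).mean()
    sd = close.rolling(20).std(ddof=0)
    out["boll_mid"] = mid
    out["boll_upper"] = mid + 2 * sd
    out["boll_lower"] = mid - 2 * sd
    return out


if __name__ == "__main__":
    close = pd.Series(100 + np.cumsum(np.random.default_rng(0).normal(0, 1, 60)))
    print(technical_indicators(close).tail(3).round(3))
```

Early rows are NaN until each rolling window is full, so such rows would typically be dropped or aligned before building model inputs.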
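Table 3. Multi-step forecasting results of FAMS-Transformer and the baseline models on the three datasets.

| Dataset | Model | 1-step MAE | 1-step RMSE | 1-step MAPE (%) | 1-step R² | 5-step MAE | 5-step RMSE | 5-step MAPE (%) | 5-step R² | 10-step MAE | 10-step RMSE | 10-step MAPE (%) | 10-step R² | 15-step MAE | 15-step RMSE | 15-step MAPE (%) | 15-step R² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SSE | OurModel | 26.403 | 38.406 | 0.835 | 0.959 | 42.624 | 63.142 | 1.350 | 0.889 | 56.182 | 82.315 | 1.782 | 0.808 | 67.248 | 96.728 | 2.135 | 0.730 |
| SSE | Lstm | 113.066 | 137.803 | 3.584 | 0.475 | 113.901 | 139.141 | 3.598 | 0.459 | 129.992 | 158.636 | 4.123 | 0.286 | 120.347 | 150.010 | 3.847 | 0.350 |
| SSE | SVR | 228.231 | 278.131 | 7.472 | −1.139 | 219.559 | 268.478 | 7.185 | −1.014 | 207.981 | 255.782 | 6.801 | −0.857 | 198.637 | 245.824 | 6.493 | −0.745 |
| SSE | Transformer | 81.574 | 105.836 | 2.611 | 0.690 | 96.165 | 120.741 | 3.089 | 0.593 | 111.527 | 139.510 | 3.596 | 0.448 | 117.328 | 143.801 | 3.756 | 0.403 |
| SSE | Autoformer | 75.475 | 98.089 | 2.380 | 0.734 | 89.915 | 115.920 | 2.832 | 0.624 | 104.965 | 137.516 | 3.311 | 0.463 | 114.042 | 145.033 | 3.613 | 0.392 |
| SSE | Informer | 148.229 | 181.721 | 4.840 | 0.087 | 118.590 | 145.204 | 3.744 | 0.411 | 125.345 | 155.486 | 3.977 | 0.314 | 134.265 | 165.092 | 4.228 | 0.213 |
| SSE | FEDformer | 74.978 | 95.321 | 2.387 | 0.749 | 98.259 | 126.694 | 3.116 | 0.551 | 108.891 | 139.113 | 3.459 | 0.451 | 116.273 | 146.080 | 3.705 | 0.384 |
| SSE | iTransformer | 27.828 | 40.193 | 0.881 | 0.955 | 44.086 | 65.581 | 1.397 | 0.880 | 57.252 | 84.704 | 1.815 | 0.796 | 67.681 | 97.868 | 2.148 | 0.723 |
| SSE | PatchTST | 27.637 | 39.563 | 0.874 | 0.957 | 44.363 | 65.195 | 1.405 | 0.881 | 58.528 | 85.343 | 1.854 | 0.793 | 69.690 | 99.681 | 2.210 | 0.713 |
| SSE | TimesNet | 67.430 | 90.993 | 2.125 | 0.771 | 77.410 | 105.912 | 2.455 | 0.687 | 81.024 | 109.825 | 2.566 | 0.658 | 91.405 | 122.090 | 2.899 | 0.570 |
| SSE | FreTS | 29.363 | 41.686 | 0.934 | 0.952 | 47.729 | 68.158 | 1.516 | 0.870 | 59.180 | 83.443 | 1.884 | 0.802 | 71.048 | 98.067 | 2.263 | 0.722 |
| SSE | DLinear | 68.350 | 89.396 | 2.176 | 0.779 | 92.096 | 122.120 | 2.882 | 0.583 | 99.533 | 131.162 | 3.131 | 0.512 | 123.005 | 154.731 | 3.903 | 0.309 |
| SSE | Koopa | 44.877 | 65.984 | 1.412 | 0.880 | 53.735 | 77.886 | 1.702 | 0.830 | 64.802 | 91.393 | 2.061 | 0.763 | 75.712 | 105.471 | 2.402 | 0.679 |
| SSE | LightTS | 53.720 | 72.363 | 1.706 | 0.855 | 61.871 | 83.103 | 1.959 | 0.807 | 66.245 | 93.077 | 2.087 | 0.754 | 84.518 | 112.798 | 2.669 | 0.633 |
| SSE | TiDE | 54.441 | 74.768 | 1.724 | 0.845 | 86.072 | 117.556 | 2.722 | 0.614 | 83.925 | 115.831 | 2.657 | 0.619 | 88.312 | 120.529 | 2.799 | 0.58 |
| SSE | FiLM | 76.844 | 100.133 | 2.440 | 0.723 | 73.577 | 97.177 | 2.346 | 0.736 | 73.727 | 103.619 | 2.332 | 0.695 | 85.964 | 117.088 | 2.724 | 0.604 |
| SZSE | OurModel | 124.367 | 182.953 | 1.158 | 0.985 | 206.896 | 306.086 | 1.931 | 0.958 | 282.320 | 409.791 | 2.635 | 0.923 | 345.009 | 492.748 | 3.220 | 0.887 |
| SZSE | Lstm | 1369.424 | 1639.565 | 13.728 | −0.190 | 1328.959 | 1558.876 | 13.353 | −0.090 | 1361.853 | 1617.977 | 13.721 | −0.195 | 1088.812 | 1370.676 | 10.873 | 0.126 |
| SZSE | SVR | 1356.614 | 1671.816 | 13.565 | −0.237 | 1328.803 | 1639.327 | 13.274 | −0.205 | 1285.854 | 1593.720 | 12.840 | −0.159 | 1256.020 | 1560.495 | 12.545 | −0.133 |
| SZSE | Transformer | 1368.789 | 1622.991 | 13.820 | −0.166 | 1500.539 | 1725.613 | 15.043 | −0.335 | 1667.031 | 1887.478 | 16.658 | −0.626 | 1580.839 | 1815.125 | 15.834 | −0.533 |
| SZSE | Autoformer | 330.818 | 443.570 | 3.084 | 0.913 | 395.564 | 518.707 | 3.655 | 0.879 | 511.272 | 679.042 | 4.747 | 0.790 | 563.955 | 724.182 | 5.251 | 0.756 |
| SZSE | Informer | 1520.958 | 1702.362 | 15.195 | −0.283 | 1233.787 | 1479.367 | 12.406 | 0.019 | 1288.809 | 1544.958 | 12.897 | −0.089 | 1179.318 | 1434.305 | 11.782 | 0.043 |
| SZSE | FEDformer | 371.260 | 470.922 | 3.479 | 0.902 | 494.360 | 624.938 | 4.611 | 0.825 | 552.153 | 698.491 | 5.155 | 0.777 | 596.261 | 751.435 | 5.588 | 0.737 |
| SZSE | iTransformer | 136.504 | 192.525 | 1.273 | 0.984 | 213.463 | 313.735 | 1.994 | 0.956 | 282.465 | 412.148 | 2.639 | 0.922 | 340.639 | 489.736 | 3.184 | 0.888 |
| SZSE | PatchTST | 127.265 | 184.421 | 1.187 | 0.985 | 216.459 | 317.364 | 2.021 | 0.955 | 286.451 | 418.211 | 2.672 | 0.920 | 352.035 | 503.643 | 3.284 | 0.882 |
| SZSE | TimesNet | 324.116 | 430.632 | 3.005 | 0.918 | 417.421 | 551.698 | 3.887 | 0.864 | 431.481 | 578.179 | 4.011 | 0.847 | 463.216 | 611.000 | 4.316 | 0.826 |
| SZSE | FreTS | 141.058 | 196.246 | 1.338 | 0.983 | 243.994 | 335.867 | 2.306 | 0.949 | 297.260 | 408.382 | 2.842 | 0.924 | 365.141 | 494.837 | 3.488 | 0.886 |
| SZSE | DLinear | 349.852 | 448.188 | 3.311 | 0.911 | 407.736 | 546.838 | 3.771 | 0.866 | 461.339 | 604.424 | 4.309 | 0.833 | 584.959 | 734.819 | 5.524 | 0.749 |
| SZSE | Koopa | 207.161 | 298.054 | 1.920 | 0.961 | 267.891 | 382.312 | 2.493 | 0.934 | 340.774 | 469.976 | 3.187 | 0.899 | 394.082 | 543.740 | 3.675 | 0.862 |
| SZSE | LightTS | 266.857 | 351.074 | 2.538 | 0.945 | 305.507 | 405.234 | 2.862 | 0.926 | 326.067 | 433.316 | 3.055 | 0.914 | 388.962 | 510.445 | 3.671 | 0.879 |
| SZSE | TiDE | 271.075 | 369.550 | 2.527 | 0.940 | 443.331 | 597.655 | 4.122 | 0.840 | 433.526 | 590.476 | 4.036 | 0.841 | 459.871 | 620.316 | 4.287 | 0.821 |
| SZSE | FiLM | 407.282 | 525.208 | 3.794 | 0.878 | 399.561 | 524.269 | 3.732 | 0.877 | 375.403 | 521.600 | 3.494 | 0.876 | 447.862 | 607.176 | 4.173 | 0.828 |
| SME100 | OurModel | 81.759 | 114.418 | 1.165 | 0.990 | 137.772 | 194.836 | 1.972 | 0.972 | 183.906 | 257.603 | 2.632 | 0.951 | 222.256 | 308.989 | 3.179 | 0.928 |
| SME100 | Lstm | 894.928 | 1122.965 | 14.270 | 0.084 | 941.185 | 1134.317 | 14.924 | 0.058 | 873.126 | 1080.850 | 13.890 | 0.134 | 862.426 | 1063.915 | 13.407 | 0.150 |
| SME100 | SVR | 1035.499 | 1259.150 | 16.175 | −0.151 | 1023.054 | 1241.835 | 15.955 | −0.129 | 1001.112 | 1215.803 | 15.599 | −0.095 | 987.008 | 1198.412 | 15.376 | −0.079 |
| SME100 | Transformer | 856.209 | 1116.354 | 13.829 | 0.095 | 890.111 | 1144.961 | 14.382 | 0.040 | 951.746 | 1199.412 | 15.305 | −0.066 | 895.266 | 1138.797 | 14.391 | 0.026 |
| SME100 | Autoformer | 208.789 | 271.455 | 3.017 | 0.947 | 268.800 | 346.569 | 3.835 | 0.912 | 327.340 | 424.967 | 4.688 | 0.866 | 351.160 | 453.881 | 5.023 | 0.845 |
| SME100 | Informer | 840.468 | 1040.687 | 13.506 | 0.214 | 942.630 | 1164.505 | 15.039 | 0.007 | 908.386 | 1120.765 | 14.420 | 0.069 | 817.880 | 1018.858 | 12.925 | 0.220 |
| SME100 | FEDformer | 242.317 | 304.776 | 3.483 | 0.933 | 322.260 | 400.957 | 4.611 | 0.882 | 353.874 | 442.420 | 5.071 | 0.855 | 383.527 | 481.322 | 5.511 | 0.826 |
| SME100 | iTransformer | 89.912 | 123.116 | 1.285 | 0.989 | 140.043 | 197.208 | 2.004 | 0.972 | 184.316 | 257.680 | 2.640 | 0.951 | 220.359 | 306.320 | 3.157 | 0.930 |
| SME100 | PatchTST | 84.535 | 117.391 | 1.205 | 0.990 | 142.360 | 200.222 | 2.038 | 0.971 | 187.066 | 262.563 | 2.679 | 0.949 | 228.014 | 316.611 | 3.264 | 0.925 |
| SME100 | TimesNet | 222.274 | 291.709 | 3.159 | 0.938 | 257.510 | 340.329 | 3.667 | 0.915 | 278.991 | 371.423 | 3.968 | 0.898 | 336.187 | 434.568 | 4.806 | 0.858 |
| SME100 | FreTS | 93.675 | 124.326 | 1.363 | 0.989 | 160.229 | 213.114 | 2.318 | 0.967 | 195.697 | 258.518 | 2.868 | 0.950 | 241.419 | 315.305 | 3.542 | 0.925 |
| SME100 | DLinear | 226.103 | 283.639 | 3.303 | 0.942 | 263.217 | 346.951 | 3.743 | 0.912 | 298.706 | 385.267 | 4.292 | 0.890 | 380.595 | 472.080 | 5.551 | 0.833 |
| SME100 | Koopa | 134.102 | 186.166 | 1.908 | 0.975 | 176.521 | 242.306 | 2.518 | 0.957 | 224.024 | 301.377 | 3.208 | 0.933 | 254.150 | 342.975 | 3.634 | 0.912 |
| SME100 | LightTS | 178.248 | 227.495 | 2.607 | 0.962 | 200.543 | 260.122 | 2.889 | 0.950 | 221.059 | 284.139 | 3.183 | 0.940 | 264.035 | 334.520 | 3.826 | 0.916 |
| SME100 | TiDE | 177.254 | 234.757 | 2.530 | 0.960 | 284.586 | 375.160 | 4.063 | 0.897 | 277.809 | 369.460 | 3.971 | 0.899 | 293.102 | 388.366 | 4.193 | 0.887 |
| SME100 | FiLM | 262.498 | 335.494 | 3.748 | 0.918 | 260.354 | 339.592 | 3.720 | 0.916 | 241.051 | 325.202 | 3.445 | 0.922 | 285.604 | 380.603 | 4.084 | 0.891 |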
Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
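The four metrics in Tables 3 and 4 have standard definitions; the paper's own implementation is not shown in this section, so the NumPy sketch below is only an illustration (`forecast_metrics` is a hypothetical name). It also makes clear why some entries in Table 3 are negative: R² = 1 − SS_res/SS_tot drops below zero whenever a model's squared error exceeds that of always predicting the mean of the test targets, which is how the negative R² values of SVR, Lstm, Transformer, and Informer should be read.

```python
import numpy as np


def forecast_metrics(y_true, y_pred):
    """MAE, RMSE, MAPE (%) and R^2 as conventionally defined (hypothetical helper)."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    err = y - p
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # MAPE assumes no zero targets, which holds for index levels.
    mape = 100.0 * np.mean(np.abs(err / y))
    # R^2 = 1 - SS_res / SS_tot; negative when worse than predicting the mean of y.
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "MAPE (%)": mape, "R2": r2}


if __name__ == "__main__":
    y = [100.0, 200.0, 300.0, 400.0]
    print(forecast_metrics(y, [110.0, 190.0, 310.0, 390.0]))
```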
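Table 4. Multi-step forecasting results of FAMS-Transformer and the baseline models on the SP500 dataset.

| Dataset | Model | 1-step MAE | 1-step RMSE | 1-step MAPE (%) | 1-step R² | 5-step MAE | 5-step RMSE | 5-step MAPE (%) | 5-step R² | 10-step MAE | 10-step RMSE | 10-step MAPE (%) | 10-step R² | 15-step MAE | 15-step RMSE | 15-step MAPE (%) | 15-step R² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SP500 | OurModel | 43.114 | 57.491 | 0.853 | 0.997 | 64.302 | 87.986 | 1.268 | 0.992 | 84.765 | 113.812 | 1.671 | 0.987 | 107.485 | 139.476 | 2.118 | 0.98 |
| SP500 | Lstm | 1352.164 | 1498.576 | 24.623 | −1.295 | 1621.364 | 1727.415 | 30.065 | −2.063 | 1582.292 | 1691.607 | 29.385 | −1.952 | 1662.456 | 1763.336 | 30.959 | −2.225 |
| SP500 | SVR | 2198.776 | 2439.126 | 39.726 | −5.081 | 2217.17 | 2451.434 | 40.134 | −5.168 | 2243.873 | 2469.67 | 40.721 | −5.293 | 2271.16 | 2489.939 | 41.31 | −5.43 |
| SP500 | Transformer | 1216.485 | 1467.396 | 21.162 | −1.201 | 1255.306 | 1479.128 | 22.099 | −1.246 | 1352.903 | 1570.076 | 23.964 | −1.543 | 1393.996 | 1612.991 | 24.723 | −1.698 |
| SP500 | Autoformer | 80.666 | 104.397 | 1.573 | 0.989 | 90.395 | 118.882 | 1.787 | 0.985 | 114.539 | 153.451 | 2.277 | 0.976 | 142.943 | 178.802 | 2.85 | 0.967 |
| SP500 | Informer | 1219.263 | 1450.294 | 21.367 | −1.15 | 1366.067 | 1571.478 | 24.289 | −1.535 | 1440.401 | 1626.381 | 25.858 | −1.729 | 1508.868 | 1684.644 | 27.224 | −1.943 |
| SP500 | FEDformer | 93.871 | 119.282 | 1.854 | 0.985 | 118.862 | 149.092 | 2.354 | 0.977 | 129.942 | 163.07 | 2.582 | 0.973 | 140.929 | 176.762 | 2.809 | 0.968 |
| SP500 | iTransformer | 50.164 | 68.887 | 0.988 | 0.995 | 74.752 | 98.108 | 1.479 | 0.99 | 99.394 | 126.75 | 1.967 | 0.983 | 117.723 | 147.656 | 2.329 | 0.977 |
| SP500 | PatchTST | 52.332 | 69.014 | 1.039 | 0.995 | 86.849 | 110.171 | 1.719 | 0.988 | 108.755 | 135.802 | 2.156 | 0.981 | 126.124 | 155.279 | 2.494 | 0.975 |
| SP500 | TimesNet | 77.614 | 99.941 | 1.522 | 0.99 | 102.721 | 130.621 | 2.051 | 0.982 | 110.861 | 140.114 | 2.185 | 0.98 | 116.517 | 150.913 | 2.318 | 0.976 |
| SP500 | FreTS | 51.7 | 66.576 | 1.000 | 0.995 | 69.798 | 93.321 | 1.374 | 0.991 | 103.543 | 130.962 | 2.009 | 0.982 | 127.498 | 160.504 | 2.463 | 0.973 |
| SP500 | DLinear | 120.416 | 141.543 | 2.35 | 0.98 | 198.175 | 232.915 | 3.797 | 0.944 | 214.181 | 257.551 | 4.093 | 0.932 | 215.072 | 263.01 | 4.122 | 0.928 |
| SP500 | Koopa | 85.981 | 115.093 | 1.716 | 0.986 | 80.151 | 106.269 | 1.59 | 0.988 | 99.812 | 128.455 | 1.979 | 0.983 | 116.081 | 146.047 | 2.299 | 0.978 |
| SP500 | LightTS | 185.861 | 227.257 | 3.331 | 0.947 | 184.166 | 229.792 | 3.344 | 0.946 | 339.258 | 428.274 | 6.075 | 0.811 | 414.287 | 524.201 | 7.333 | 0.715 |
| SP500 | TiDE | 83.007 | 108.863 | 1.643 | 0.988 | 126.585 | 157.964 | 2.509 | 0.974 | 136.07 | 171.541 | 2.693 | 0.97 | 151.568 | 188.284 | 2.993 | 0.963 |
| SP500 | FiLM | 114.851 | 142.365 | 2.308 | 0.979 | 96.874 | 131.728 | 1.966 | 0.982 | 117.246 | 144.812 | 2.327 | 0.978 | 130.626 | 161.699 | 2.603 | 0.973 |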
Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
Table 5. Significance test results—Wilcoxon signed-rank test (α = 0.05).
BaselineSSESZSESMESESP500
pl = 1pl = 5pl = 10pl = 15pl = 1pl = 5pl = 10pl = 15pl = 1pl = 5pl = 10pl = 15pl = 1pl = 5pl = 10pl = 15
Autoformer************ns*********ns*********ns******
DLinear************************************************
FEDformer******************************************ns***
FiLM*********ns******ns*******ns***************
FreTS***************************************ns******
Informer***********************************************
iTransformerns*********************************************
Koopa******************************************ns***
LightTS***************ns*******ns***ns************
Lstm************************************************
PatchTSTns*********************ns*****************
SVR************************************************
TiDE********ns*********ns***********************
TimesNetnsns**************************************
Transformer**ns***ns************************************
Note: *** p < 0.01; ** p < 0.05; * p < 0.1; ns, p ≥ 0.1.
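Table 5 does not specify which paired quantities enter the Wilcoxon signed-rank test. A common choice, assumed here, is to pair the absolute forecast errors of FAMS-Transformer and one baseline on the same test points. The sketch below uses `scipy.stats.wilcoxon` (two-sided by default) and maps each p-value to the symbols defined in the table note; `wilcoxon_compare` and `stars` are hypothetical helper names.

```python
import numpy as np
from scipy.stats import wilcoxon


def stars(p):
    """Significance symbols as defined in the table note."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return "ns"


def wilcoxon_compare(y_true, pred_ours, pred_base):
    """Two-sided Wilcoxon signed-rank test on paired absolute errors (assumed pairing)."""
    y = np.asarray(y_true, dtype=float)
    err_ours = np.abs(y - np.asarray(pred_ours, dtype=float))
    err_base = np.abs(y - np.asarray(pred_base, dtype=float))
    res = wilcoxon(err_ours, err_base)  # alternative="two-sided" by default
    return float(res.pvalue), stars(res.pvalue)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.normal(3000, 50, 200)
    # Synthetic forecasts: the "baseline" is noisier than "ours".
    p_value, label = wilcoxon_compare(y, y + rng.normal(0, 5, 200), y + rng.normal(0, 15, 200))
    print(p_value, label)
```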
Table 6. Significance test results—paired t-test (α = 0.05).
BaselineSSESZSESMESESP500
pl = 1pl = 5pl = 10pl = 15pl = 1pl = 5pl = 10pl = 15pl = 1pl = 5pl = 10pl = 15pl = 1pl = 5pl = 10pl = 15
Autoformer*****ns***nsns***ns***ns*********ns******
DLinear************************************************
FEDformer***************************************ns****
FiLM*******************************ns************
FreTS***************************************ns******
Informer***********************************************
iTransformerns*****************************************
Koopa***********************************************
LightTS*************ns*********ns******************
Lstm************************************************
PatchTSTns*******************ns****ns******ns***
SVR************************************************
TiDE*ns**************ns*********ns***********
TimesNet**ns******ns************ns*********nsns***
Transformernsns***ns************************************
Note: *** p < 0.01; ** p < 0.05; * p < 0.1; ns, p ≥ 0.1.
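The paired t-test in Table 6 can be sketched in the same way. Assuming again that the inputs are paired absolute errors, the statistic is t = mean(d) / (sd(d) / sqrt(n)), where d is the per-point difference between the two models' errors, sd(d) its sample standard deviation, and n the number of test points. The hypothetical helper below computes this by hand and checks it against `scipy.stats.ttest_rel`; the toy error values are illustrative only.

```python
import numpy as np
from scipy.stats import ttest_rel


def paired_t(err_ours, err_base):
    """Paired t statistic computed by hand and via SciPy, plus SciPy's two-sided p-value."""
    a = np.asarray(err_ours, dtype=float)
    b = np.asarray(err_base, dtype=float)
    d = a - b  # per-point error differences
    t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
    res = ttest_rel(a, b)
    return float(t_manual), float(res.statistic), float(res.pvalue)


if __name__ == "__main__":
    # Toy absolute errors for five test points (illustrative numbers only).
    print(paired_t([1, 2, 3, 4, 5], [2, 2, 5, 5, 7]))
```

A negative t means the first model's errors are smaller on average. Unlike the Wilcoxon test, the t-test assumes the differences are roughly normal, which is one reason the two tables can disagree for some cells.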
Table 7. R² of all models across three volatility regimes on four datasets (prediction horizon pl = 1).

| Model | SSE Low | SSE Medium | SSE High | SZSE Low | SZSE Medium | SZSE High | SMESE Low | SMESE Medium | SMESE High | SP500 Low | SP500 Medium | SP500 High |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FAMS-Transformer (Ours) | 0.988 | 0.969 | 0.900 | 0.994 | 0.993 | 0.956 | 0.994 | 0.994 | 0.984 | 0.998 | 0.997 | 0.993 |
| Autoformer | 0.802 | 0.675 | 0.645 | 0.928 | 0.949 | 0.811 | 0.934 | 0.961 | 0.935 | 0.989 | 0.991 | 0.980 |
| DLinear | 0.900 | 0.830 | 0.512 | 0.937 | 0.952 | 0.786 | 0.943 | 0.958 | 0.921 | 0.974 | 0.986 | 0.965 |
| FEDformer | 0.814 | 0.816 | 0.536 | 0.919 | 0.947 | 0.778 | 0.932 | 0.956 | 0.906 | 0.984 | 0.990 | 0.973 |
| FiLM | 0.894 | 0.789 | 0.359 | 0.927 | 0.933 | 0.692 | 0.933 | 0.946 | 0.878 | 0.982 | 0.988 | 0.954 |
| FreTS | 0.980 | 0.965 | 0.889 | 0.990 | 0.992 | 0.953 | 0.990 | 0.993 | 0.983 | 0.997 | 0.996 | 0.991 |
| Informer | 0.058 | 0.230 | −0.219 | −0.449 | 0.120 | −1.031 | 0.010 | 0.155 | 0.340 | −2.691 | −1.697 | −0.359 |
| Koopa | 0.967 | 0.930 | 0.681 | 0.983 | 0.983 | 0.882 | 0.985 | 0.985 | 0.958 | 0.986 | 0.991 | 0.973 |
| LightTS | 0.941 | 0.890 | 0.671 | 0.961 | 0.976 | 0.857 | 0.963 | 0.975 | 0.947 | 0.911 | 0.943 | 0.955 |
| Lstm | 0.576 | 0.544 | 0.152 | −0.077 | 0.140 | −1.112 | 0.018 | −0.037 | 0.195 | −2.783 | −1.787 | −0.690 |
| PatchTST | 0.987 | 0.969 | 0.893 | 0.993 | 0.994 | 0.954 | 0.993 | 0.994 | 0.983 | 0.997 | 0.996 | 0.990 |
| SVR | −1.806 | −0.720 | −1.120 | −0.429 | −0.018 | −0.546 | −0.489 | −0.222 | 0.041 | −8.920 | −6.330 | −3.630 |
| TiDE | 0.948 | 0.873 | 0.642 | 0.964 | 0.969 | 0.842 | 0.967 | 0.975 | 0.939 | 0.990 | 0.992 | 0.975 |
| TimesNet | 0.913 | 0.813 | 0.483 | 0.943 | 0.954 | 0.804 | 0.944 | 0.956 | 0.914 | 0.990 | 0.992 | 0.981 |
| Transformer | 0.553 | 0.708 | 0.793 | −0.272 | 0.178 | −0.851 | −0.079 | 0.014 | 0.223 | −2.743 | −1.781 | −0.388 |
| iTransformer | 0.986 | 0.966 | 0.893 | 0.992 | 0.992 | 0.955 | 0.992 | 0.993 | 0.983 | 0.997 | 0.996 | 0.989 |
Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
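Table 7 splits the one-step test set into low, medium, and high volatility regimes, but the splitting rule is not given in this section. One plausible scheme, assumed here, is to compute a trailing realized volatility of log returns (20 observations), cut it at its terciles, and evaluate R² separately within each regime. The window length, the tercile cut points, and the names `r2` and `r2_by_volatility_regime` are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def r2(y, p):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)


def r2_by_volatility_regime(y_true, y_pred, window=20):
    """R^2 within low/medium/high terciles of trailing realized volatility (assumed scheme)."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    log_ret = np.diff(np.log(y), prepend=np.nan)  # log_ret[0] is undefined (NaN)
    vol = np.full(y.size, np.nan)
    for t in range(window, y.size):
        # Sample std of the `window` most recent log returns, using data up to and including t.
        vol[t] = np.std(log_ret[t - window + 1:t + 1], ddof=1)
    valid = ~np.isnan(vol)
    lo, hi = np.quantile(vol[valid], [1 / 3, 2 / 3])
    regimes = {
        "Low": valid & (vol <= lo),
        "Medium": valid & (vol > lo) & (vol <= hi),
        "High": valid & (vol > hi),
    }
    return {name: r2(y[mask], p[mask]) for name, mask in regimes.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    y = 3000 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))
    y_hat = y + rng.normal(0, 15, y.size)  # stand-in forecasts
    print({k: round(v, 3) for k, v in r2_by_volatility_regime(y, y_hat).items()})
```

Because R² is normalized by the variance within each regime, a short, nearly flat high-volatility segment can yield a low or negative R² even when absolute errors are modest.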
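Table 8. Ablation experimental results.

| Dataset | Model | 1-step MAE | 1-step RMSE | 1-step MAPE (%) | 1-step R² | 5-step MAE | 5-step RMSE | 5-step MAPE (%) | 5-step R² | 10-step MAE | 10-step RMSE | 10-step MAPE (%) | 10-step R² | 15-step MAE | 15-step RMSE | 15-step MAPE (%) | 15-step R² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SSE | OurModel | 26.403 | 38.406 | 0.835 | 0.9592 | 42.624 | 63.142 | 1.35 | 0.8886 | 56.182 | 82.315 | 1.782 | 0.8077 | 67.248 | 96.728 | 2.135 | 0.7298 |
| SSE | w/o Conv | 26.341 | 38.206 | 0.834 | 0.9596 | 42.911 | 63.483 | 1.359 | 0.8874 | 56.693 | 82.89 | 1.799 | 0.805 | 67.879 | 97.439 | 2.155 | 0.7258 |
| SSE | w/o Decomp | 27.322 | 39.86 | 0.866 | 0.9561 | 43.87 | 64.89 | 1.391 | 0.8823 | 57.939 | 84.761 | 1.839 | 0.7961 | 69.002 | 98.756 | 2.191 | 0.7183 |
| SSE | w/o Both | 26.925 | 39.237 | 0.853 | 0.9574 | 44.044 | 64.908 | 1.396 | 0.8823 | 56.928 | 83.472 | 1.806 | 0.8023 | 69.561 | 99.354 | 2.208 | 0.7149 |
| SSE | Standard Conv1D | 27.394 | 40.217 | 0.868 | 0.9553 | 44.239 | 65.789 | 1.403 | 0.879 | 58.133 | 85.086 | 1.844 | 0.7946 | 69.65 | 99.51 | 2.212 | 0.714 |
| SSE | Dilated Depthwise Conv1D | 27.809 | 41.132 | 0.88 | 0.9532 | 44.166 | 65.416 | 1.4 | 0.8804 | 57.734 | 84.516 | 1.832 | 0.7973 | 69.015 | 98.927 | 2.191 | 0.7174 |
| SSE | Original Transformer | 27.222 | 39.817 | 0.861 | 0.9562 | 44.367 | 65.445 | 1.406 | 0.8803 | 57.57 | 84.092 | 1.826 | 0.7993 | 69.709 | 99.518 | 2.213 | 0.714 |
| SSE | Fixed-period | 26.377 | 38.38 | 0.834 | 0.9593 | 42.653 | 63.032 | 1.351 | 0.889 | 56.135 | 82.398 | 1.78 | 0.8073 | 67.346 | 96.942 | 2.138 | 0.7286 |
| SSE | Random-period | 26.957 | 39.36 | 0.851 | 0.9572 | 42.968 | 63.962 | 1.36 | 0.8857 | 57.262 | 83.497 | 1.816 | 0.8022 | 68.401 | 98.487 | 2.171 | 0.7199 |
| SZSE | OurModel | 124.367 | 182.953 | 1.158 | 0.9852 | 206.896 | 306.086 | 1.931 | 0.958 | 282.32 | 409.791 | 2.635 | 0.9234 | 345.009 | 492.748 | 3.22 | 0.887 |
| SZSE | w/o Conv | 129.113 | 186.939 | 1.204 | 0.9845 | 211.057 | 310.648 | 1.971 | 0.9567 | 285.788 | 414.583 | 2.667 | 0.9216 | 348.847 | 497.872 | 3.255 | 0.8846 |
| SZSE | w/o Decomp | 128.602 | 188.944 | 1.201 | 0.9842 | 214.322 | 315.395 | 2.003 | 0.9554 | 290.062 | 420.557 | 2.709 | 0.9193 | 350.837 | 500.619 | 3.276 | 0.8834 |
| SZSE | w/o Both | 128.596 | 186.089 | 1.201 | 0.9847 | 214.9 | 315.425 | 2.007 | 0.9554 | 281.854 | 408.257 | 2.631 | 0.9239 | 354.938 | 505.683 | 3.314 | 0.881 |
| SZSE | Standard Conv1D | 132.816 | 193.178 | 1.24 | 0.9835 | 216.269 | 317.52 | 2.022 | 0.9548 | 291.147 | 421.924 | 2.719 | 0.9187 | 354.817 | 503.229 | 3.315 | 0.8821 |
| SZSE | Dilated Depthwise Conv1D | 126.984 | 187.337 | 1.183 | 0.9845 | 215.772 | 317.705 | 2.017 | 0.9547 | 290.415 | 420.392 | 2.713 | 0.9193 | 351.059 | 501.037 | 3.278 | 0.8832 |
| SZSE | Original Transformer | 130.853 | 190.965 | 1.22 | 0.9839 | 217.147 | 318.273 | 2.029 | 0.9546 | 285.112 | 414.542 | 2.663 | 0.9216 | 354.491 | 505.042 | 3.31 | 0.8813 |
| SZSE | Fixed-period | 124.701 | 183.676 | 1.16 | 0.9851 | 208.842 | 307.783 | 1.95 | 0.9575 | 281.837 | 409.472 | 2.63 | 0.9235 | 344.511 | 492.87 | 3.215 | 0.8869 |
| SZSE | Random-period | 126.882 | 186.047 | 1.182 | 0.9847 | 208.908 | 308.895 | 1.946 | 0.9572 | 284.202 | 411.84 | 2.658 | 0.9226 | 346.931 | 493.677 | 3.242 | 0.8866 |
| SME 100 | OurModel | 81.759 | 114.418 | 1.165 | 0.9905 | 137.772 | 194.836 | 1.972 | 0.9722 | 183.906 | 257.603 | 2.632 | 0.9508 | 222.256 | 308.989 | 3.179 | 0.9283 |
| SME 100 | w/o Conv | 85.914 | 119.689 | 1.226 | 0.9896 | 138.094 | 195.156 | 1.975 | 0.9721 | 185.729 | 260.403 | 2.658 | 0.9498 | 224.789 | 312.302 | 3.216 | 0.9267 |
| SME 100 | w/o Decomp | 85.878 | 119.839 | 1.229 | 0.9896 | 140.45 | 198.42 | 2.013 | 0.9712 | 188.856 | 264.447 | 2.707 | 0.9482 | 223.462 | 308.316 | 3.203 | 0.9286 |
| SME 100 | w/o Both | 86.682 | 119.18 | 1.237 | 0.9897 | 141.599 | 199.557 | 2.029 | 0.9708 | 187.238 | 262.251 | 2.683 | 0.949 | 229.306 | 317.643 | 3.286 | 0.9242 |
| SME 100 | Standard Conv1D | 88.166 | 123.075 | 1.26 | 0.989 | 142.748 | 200.979 | 2.047 | 0.9704 | 190.583 | 266.748 | 2.733 | 0.9473 | 229.608 | 316.871 | 3.291 | 0.9246 |
| SME 100 | Dilated Depthwise Conv1D | 86.426 | 121.321 | 1.235 | 0.9893 | 141.619 | 200.107 | 2.029 | 0.9707 | 189.504 | 265.043 | 2.716 | 0.9479 | 227.539 | 315.249 | 3.26 | 0.9254 |
| SME 100 | Original Transformer | 88.303 | 123.087 | 1.262 | 0.989 | 142.142 | 200.195 | 2.037 | 0.9707 | 186.055 | 260.411 | 2.666 | 0.9497 | 229.023 | 316.948 | 3.281 | 0.9245 |
| SME 100 | Fixed-period | 82.287 | 115.509 | 1.172 | 0.9903 | 135.977 | 192.44 | 1.945 | 0.9729 | 183.93 | 257.645 | 2.632 | 0.9508 | 223.328 | 310.15 | 3.195 | 0.9277 |
| SME 100 | Random-period | 86.47 | 118.653 | 1.232 | 0.9898 | 137.047 | 193.691 | 1.966 | 0.9725 | 182.277 | 255.674 | 2.616 | 0.9516 | 224.551 | 310.251 | 3.216 | 0.9277 |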
Note: bold values indicate the best-performing method, and underlined values indicate the second-best method.
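Ablation differences in Table 8 are often small, so it can help to express each variant's error relative to a reference. The hypothetical helper below returns the percentage by which one model lowers an error metric (MAE, RMSE, or MAPE) relative to another; a negative value means it is worse. The example calls use SSE one-step RMSE values taken from Table 8.

```python
def relative_reduction(baseline, variant):
    """Percent by which `variant` lowers an error metric relative to `baseline` (negative = worse)."""
    return 100.0 * (baseline - variant) / baseline


if __name__ == "__main__":
    # SSE, one-step RMSE values from Table 8.
    print(f"{relative_reduction(39.817, 38.406):.2f}%")  # OurModel vs Original Transformer: 3.54%
    print(f"{relative_reduction(38.206, 38.406):.2f}%")  # OurModel vs w/o Conv: -0.52%
```

The second call shows that at one step on SSE the full model's RMSE is slightly above the w/o Conv variant, so gains attributed to the convolution module are better judged at the longer horizons.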
Table 9. Efficiency comparison (parameters and FLOPs) of four architecture variants across four prediction steps.

| Model | 1-step Params | 1-step FLOPs | 5-step Params | 5-step FLOPs | 10-step Params | 10-step FLOPs | 15-step Params | 15-step FLOPs |
|---|---|---|---|---|---|---|---|---|
| Standard Conv1D | 7.891 M | 1.538 G | 7.898 M | 1.538 G | 7.905 M | 1.539 G | 7.913 M | 1.539 G |
| w/o Decomp | 6.322 M | 1.232 G | 6.328 M | 1.232 G | 6.335 M | 1.233 G | 6.343 M | 1.233 G |
| Dilated Depthwise Conv1D | 6.322 M | 1.232 G | 6.328 M | 1.232 G | 6.335 M | 1.233 G | 6.343 M | 1.233 G |
| Original Transformer | 6.316 M | 1.231 G | 6.322 M | 1.231 G | 6.329 M | 1.231 G | 6.337 M | 1.232 G |
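Table 9 shows that a standard Conv1D raises the parameter count from about 6.32 M to about 7.89 M, while the depthwise variant (w/o Decomp) adds only a few thousand parameters over the original Transformer (6.322 M vs. 6.316 M at one step). The counting rules below, written for the usual 1-D convolution weight layout (out_channels × in_channels/groups × kernel, plus one bias per output channel), show why: a standard convolution scales with C_in·C_out·k, a depthwise one only with C·k. The width and kernel size in the example are hypothetical; the paper's actual layer settings are not given in this section.

```python
def conv1d_params(c_in, c_out, k, bias=True):
    """Standard 1-D convolution: c_out filters, each spanning all c_in channels and k taps."""
    return c_in * c_out * k + (c_out if bias else 0)


def depthwise_conv1d_params(channels, k, bias=True):
    """Depthwise 1-D convolution (groups == channels): one k-tap filter per channel."""
    return channels * k + (channels if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Depthwise step followed by a 1x1 pointwise convolution that mixes channels."""
    return depthwise_conv1d_params(c_in, k, bias) + conv1d_params(c_in, c_out, 1, bias)


if __name__ == "__main__":
    d_model, k = 512, 3  # hypothetical width and kernel size, not taken from the paper
    print(conv1d_params(d_model, d_model, k))               # 786944
    print(depthwise_conv1d_params(d_model, k))              # 2048
    print(depthwise_separable_params(d_model, d_model, k))  # 264704
```

Whether the pointwise stage is included changes the count by roughly C_in·C_out, which is why a purely depthwise insertion is far cheaper than a full separable block.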
Table 10. Statistics of Jaccard similarity of dominant period sets between adjacent test windows.

| Dataset | Num. Windows | Mean Jaccard | Median Jaccard | Full-Overlap Ratio | Most Frequent Periods |
|---|---|---|---|---|---|
| SSE | 762 | 0.9693 | 1 | 0.9528 | 29, 15, 11 |
| SZSE | 762 | 0.9693 | 1 | 0.9528 | 29, 15, 11 |
| SMESE | 762 | 0.9693 | 1 | 0.9528 | 29, 15, 11 |
| S&P500 | 884 | 0.92 | 1 | 0.879 | 29, 15, 11, 7, 5 |
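Table 10 measures how stable the FFT-selected dominant periods are from one test window to the next. The selection rule is not spelled out in this section, so the following is a minimal sketch under assumptions: take the top-k amplitude bins of the real FFT (excluding the zero-frequency term), convert bin index f to a period n/f for a window of length n, and compare consecutive windows with the Jaccard index |A ∩ B| / |A ∪ B|. This resembles the FFT-based period selection used by TimesNet, but `top_k`, the window length, the step, and all function names here are illustrative.

```python
import numpy as np


def dominant_periods(x, top_k=3):
    """Periods (in samples) of the top_k largest FFT amplitude bins, excluding the DC term."""
    x = np.asarray(x, dtype=float)
    n = x.size
    amp = np.abs(np.fft.rfft(x - x.mean()))
    amp[0] = 0.0  # drop the zero-frequency (mean) component
    top_bins = np.argsort(amp)[::-1][:top_k]  # bin indices with the largest amplitudes
    return {int(round(n / f)) for f in top_bins if f > 0}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def adjacent_window_jaccard(series, window, step=1, top_k=3):
    """Jaccard similarity of dominant-period sets between consecutive sliding windows."""
    x = np.asarray(series, dtype=float)
    sets = [dominant_periods(x[s:s + window], top_k)
            for s in range(0, x.size - window + 1, step)]
    return [jaccard(a, b) for a, b in zip(sets, sets[1:])]


if __name__ == "__main__":
    t = np.arange(360)
    # Synthetic mixture of cycles with periods 12, 30 and 8 samples.
    x = 3 * np.sin(2 * np.pi * t / 12) + 2 * np.sin(2 * np.pi * t / 30) + np.sin(2 * np.pi * t / 8)
    print(sorted(dominant_periods(x[:120])))                  # [8, 12, 30]
    print(adjacent_window_jaccard(x, window=120, step=120))  # [1.0, 1.0]
```

On real index data the periods drift between windows, which is what pushes the mean Jaccard below 1; the full-overlap ratio is simply the fraction of adjacent pairs scoring exactly 1.0.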

Share and Cite

MDPI and ACS Style

Zheng, H.; Zeng, X.; Hu, G.; Zhang, T. A Hybrid Model for Stock Index Forecasting Integrating Adaptive Frequency-Domain Decomposition and Enhanced Transformer Encoder. Mathematics 2026, 14, 2202. https://doi.org/10.3390/math14122202

AMA Style

Zheng H, Zeng X, Hu G, Zhang T. A Hybrid Model for Stock Index Forecasting Integrating Adaptive Frequency-Domain Decomposition and Enhanced Transformer Encoder. Mathematics. 2026; 14(12):2202. https://doi.org/10.3390/math14122202

Chicago/Turabian Style

Zheng, Hairong, Xiaozheng Zeng, Guoyu Hu, and Tingting Zhang. 2026. "A Hybrid Model for Stock Index Forecasting Integrating Adaptive Frequency-Domain Decomposition and Enhanced Transformer Encoder" Mathematics 14, no. 12: 2202. https://doi.org/10.3390/math14122202

APA Style

Zheng, H., Zeng, X., Hu, G., & Zhang, T. (2026). A Hybrid Model for Stock Index Forecasting Integrating Adaptive Frequency-Domain Decomposition and Enhanced Transformer Encoder. Mathematics, 14(12), 2202. https://doi.org/10.3390/math14122202

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
